Chapter 7
Testing and Debugging CGI
CONTENTS
· Debugging Techniques
· Testing Forms
· Parrot: Echoing the Browser Request
· Common Errors
· Summary
Debugging CGI programs can be difficult because they rely on information
from several different sources. You can test your CGI programs in two ways:
interactively over the Web, or stand-alone using a debugger. Each approach
has its own advantages and disadvantages.
In this chapter, you learn some common debugging techniques that use CGI
scripts and common debuggers as tools. You then learn some very common CGI
errors and their solutions.
Debugging Techniques
There are two different approaches
to testing and debugging CGI programs: testing the program over the Web
server as a CGI program and testing it as a stand-alone program. Although
you can open HTML and other files directly from a Web browser, you need to
have a Web server running in order to test the results of a CGI program from
a Web browser. If you already have a server from which you can test your CGI
programs or if you set up a personal or experimental server for testing
purposes, how can you debug your CGI programs?
There are several steps you can
take. First, see if your program works. If it doesn't and if you receive a
server error message, your program did not execute correctly. If you do not
receive a server error message but your output is incorrect, then there is
most likely a problem either with one of your algorithms or with the
expected data.
There are several potential server
error messages, the simplest being ones such as "file not found" (404). One
of the most common server error messages when your CGI program is not
working properly is "server error" (500), which means that your CGI program
did not send an appropriate response to the server. The server always
expects CGI headers (such as Content-Type) and usually some data; if the
appropriate headers are not sent, then the server will return a 500 error.
Tip: Many servers redirect stderr to a file. The NCSA and Apache servers,
for example, log error messages and stderr to the file logs/error_log by
default. This is an invaluable resource for debugging CGI programs, because
you can often determine the exact nature of the problem by looking at this
log file. You can also log certain information to this file from within
your CGI program by printing messages to stderr.
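As a minimal sketch of the logging idea in this tip, the helper below writes
a tagged message to a stream and flushes it immediately. The name cgi_log and
the tag poll.cgi are hypothetical; a CGI program would normally pass stderr
so that the message lands in the server's error log.

```c
#include <stdio.h>

/* Write "[tag] message" to the given stream and flush it right away,
 * so the line reaches the log even if the program crashes later.
 * Returns the number of characters written, or a negative value on error.
 * cgi_log is a hypothetical helper, not part of any CGI library. */
int cgi_log(FILE *log, const char *tag, const char *message)
{
    int written = fprintf(log, "[%s] %s\n", tag, message);

    fflush(log);
    return written;
}
```

A call such as cgi_log(stderr, "poll.cgi", "parsing form input") placed before
each major step shows in the log how far the program got before it failed.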
For example, the following program
returns the error 500 because the header is invalid:
#include <stdio.h>

int main()
{
    /* the header name and the MIME type are deliberately misspelled */
    printf("Cotnent-Tpye: txet/plain\r\n\r\n");
    printf("Hello, World!\n");
    return 0;
}
If you check your server error
logs, you are likely to find a message that says the headers are invalid.
If you know your program should
return the appropriate headers (that is, you have the proper print
statements in the proper places), then your program has failed somewhere
before the headers are sent. For example, the following C code seems to be a
valid CGI program:
#include <stdio.h>
#include <string.h>

int main()
{
    char *name;

    strcpy(name,NULL);
    printf("Content-Type: text/plain\r\n\r\n");
    printf("Hello, world!\n");
    return 0;
}
This program compiles without errors and the headers it prints are valid,
but when you try to run it from the Web server, the server returns an error
500. The reason is clear in this contrived example: strcpy() is asked to
copy from a NULL pointer into name, which was never pointed at valid memory,
and this typically produces a segmentation fault. Because the program
crashes before the header is sent, the server never receives valid
information and so must return an error 500. Removing the strcpy() line from
the program fixes the problem.
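In a real program you usually cannot just delete the offending line. A
minimal sketch of a more defensive approach, assuming a hypothetical helper
named safe_copy (it is not a standard library function), copies into a
fixed-size buffer and treats a NULL source as an empty string:

```c
#include <string.h>

/* Copy src into dst (a buffer of dst_size bytes), always NUL-terminating.
 * A NULL src is treated as an empty string instead of crashing.
 * Returns 0 on success, -1 if dst is unusable or src had to be truncated.
 * safe_copy is a hypothetical helper written for this example. */
int safe_copy(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    if (dst == NULL || dst_size == 0)
        return -1;
    if (src == NULL)
        src = "";
    len = strlen(src);
    if (len >= dst_size) {
        /* keep what fits and report the truncation to the caller */
        memcpy(dst, src, dst_size - 1);
        dst[dst_size - 1] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);
    return 0;
}
```

With name declared as an array, for example char name[64], the call
safe_copy(name, sizeof name, value) cannot crash on a NULL value, so the
program still reaches the statements that print the headers.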
Another common browser message is
Document contains no data. This message appears when a successful status
code (200) and Content-Type are sent but no data is. If you know your
program should print data following the header, you can infer that the
problem lies between the header and body output. Consider the modified code:
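#include <stdio.h>
#include <string.h>

int main()
{
    char *name;

    printf("Content-Type: text/plain\r\n\r\n");
    /* stdout is buffered when it is not a terminal; flush so the header
       actually reaches the server before the crash */
    fflush(stdout);
    strcpy(name,NULL);
    printf("Hello, world!\n");
    return 0;
}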
If you compile and run this
program as a CGI, you will receive a Document contains no data message but
no error. However, there is supposed to be data: "Hello, world!". Again, the
error is clear: You cannot copy a NULL string to a variable. Because the
program crashes after the header is printed, the body is never sent, and
consequently, the browser thinks the document has no data. The error message
helps you narrow down the location of the error and quickly identify the
problem.
With a compiled language such as C, server error 500 generally means that
the program has crashed before the header has been sent. Any syntax errors
in the code are caught at compile-time. However, because scripting languages
such as Perl are interpreted rather than compiled ahead of time, you don't
know whether there are syntax errors until you actually run the program. If
there are syntax errors, then the program will fail immediately and once
again, you will see the familiar error 500. For example:
#!/usr/local/bin/perl
pirnt "Content-Type: text/plain\n\n";
print "Hello, World!\n";
There is a typo in the first print
statement, so the program will not run, and consequently, the server
receives no headers and sends an error 500. If your server logs stderr to an
error file, you can find exactly where the syntax errors are by checking the
log.
How can you debug your program if
it runs correctly, does not crash, but returns the incorrect output?
Normally, you could run your program through a debugger and watch the
important variables to see exactly where your program is flawed. However,
you cannot run the CGI program through a debugger if it is being run by the
server. If you are testing your CGI program in this manner, you want to take
advantage of the server and the browser to locate the error.
The poor man's method of debugging
is to include a lot of print statements throughout the code. Because
everything printed to the stdout is sent to the browser, you can look at the
values of various variables from your Web browser. For example, the
following code is supposed to output the numbers 1 factorial (1), 2
factorial (2), and 3 factorial (6):
#include <stdio.h>

int main()
{
    int product = 1;
    int i;

    printf("Content-Type: text/html\r\n\r\n");
    printf("<html><head>\n");
    printf("<title>1, 2, and 6</title>\n");
    printf("</head>\n\n");
    printf("<body>\n");
    for (i=1; i<=3; i++)
        printf("<p>%d</p>\n",product*i);
    printf("</body></html>\n");
    return 0;
}
When you compile and run this program as a CGI, you get 1, 2, and 3.
Suppose for the moment that this is a vastly complex program and that you
cannot for the life of you figure out why this code is not working properly.
To give you more information and help you trace the problem, you could print
the values of product and i at each stage of the loop.
#include <stdio.h>

int main()
{
    int product = 1;
    int i;

    printf("Content-Type: text/html\r\n\r\n");
    printf("<html><head>\n");
    printf("<title>1, 2, and 6</title>\n");
    printf("</head>\n\n");
    printf("<body>\n");
    for (i=1; i<=3; i++) {
        /* print product and i */
        printf("<p>product = %d i = %d<br>\n",product,i);
        printf("%d</p>\n",product*i);
    }
    printf("</body></html>\n");
    return 0;
}
With this additional information, you can see that the value of product is
not updating each time; it remains 1 at each iteration. Updating product
inside the loop fixes the problem:
#include <stdio.h>

int main()
{
    int product = 1;
    int i;

    printf("Content-Type: text/html\r\n\r\n");
    printf("<html><head>\n");
    printf("<title>1, 2, and 6</title>\n");
    printf("</head>\n\n");
    printf("<body>\n");
    for (i=1; i<=3; i++) {
        product = product * i;
        printf("<p>%d</p>\n",product);
    }
    printf("</body></html>\n");
    return 0;
}
Although using print statements is
a simple and workable solution, it can be an inconvenient one, especially if
you use a compiled language such as C. Each time you are debugging the
program or making a slight change, you need to add or remove print
statements and recompile. It would be easier if you could just run the
program directly from within a debugger.
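One way to soften this inconvenience is to leave the trace statements in the
code and switch them on at run time. The following is only a sketch: the
environment variable CGI_DEBUG and the function names are hypothetical (they
are not part of the CGI specification), and your server would have to be
configured to pass the variable on to the program.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Interpret the value of a debug switch: only the exact string "1"
 * turns tracing on; NULL (variable not set) leaves it off. */
int debug_flag_from(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

/* CGI_DEBUG is a hypothetical environment variable chosen for this
 * example; it is not defined by CGI or by any particular server. */
int debug_enabled(void)
{
    return debug_flag_from(getenv("CGI_DEBUG"));
}

/* Print one trace line for the factorial loop, but only when enabled.
 * Returns the number of characters written (0 when tracing is off). */
int trace_loop(FILE *out, int enabled, int product, int i)
{
    if (!enabled)
        return 0;
    return fprintf(out, "<p>product = %d i = %d</p>\n", product, i);
}
```

Inside the loop you would call trace_loop(stdout, debug_enabled(), product, i).
The trace output then appears only when the switch is set, and you do not
need to recompile to remove it.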
Tip: cgiwrapd, a feature of Nathan Neulinger's cgiwrap, displays useful
debugging information such as environment variables and the standard input.
It enables you to redirect stderr to stdout so that you see the error output
in the Web browser rather than in the error log file. For more information
about cgiwrap (and cgiwrapd), see URL: http://www.umr.edu/~cgiwrap/.
You could run the program from
within a debugger if you could correctly simulate a CGI program from the
command line. This is possible but difficult because of the many variables
you need to set. There are several environment variables that the CGI
program might or might not rely on. For example, if you are testing a CGI
program from the command line that accepts form input, you need to at least
set the environment variable REQUEST_METHOD so that your program knows where
to get the information. You must also properly URL encode the input, a
non-trivial matter if you use a lot of non-alphanumeric characters.
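To give an idea of what the encoding step involves, here is a minimal sketch
of a URL encoder for form data. The function name url_encode and the choice
of which characters to leave unescaped are assumptions made for this example;
check them against the encoding your browser actually produces.

```c
#include <ctype.h>
#include <stddef.h>

/* Encode src as form data into dst (a buffer of dst_size bytes).
 * Letters, digits, and - _ . * are copied unchanged, a space becomes '+',
 * and every other byte becomes %XX in uppercase hexadecimal.
 * Returns 0 on success, -1 if dst is too small.
 * url_encode is a hypothetical helper written for this example. */
int url_encode(const char *src, char *dst, size_t dst_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t used = 0;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;

        if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '*') {
            if (used + 1 >= dst_size)
                return -1;
            dst[used++] = (char)c;
        } else if (c == ' ') {
            if (used + 1 >= dst_size)
                return -1;
            dst[used++] = '+';
        } else {
            /* three output characters plus room for the final NUL */
            if (used + 3 >= dst_size)
                return -1;
            dst[used++] = '%';
            dst[used++] = hex[c >> 4];
            dst[used++] = hex[c & 0x0F];
        }
    }
    if (dst_size == 0)
        return -1;
    dst[used] = '\0';
    return 0;
}
```

You would encode each name and each value separately and then join them with
= and & yourself, because those separators must not be escaped.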
There are two ways to address this
problem. The first is a somewhat minimalist approach. Determine and set as
many environment variables and other information as you need and then run
the program. For example, if you are testing program.cgi and you know that
you are using the GET method and that the input string is
name=Eugene&age=21
you could do the following (from
the UNIX csh shell with the gdb debugger):
% setenv REQUEST_METHOD GET
% setenv QUERY_STRING 'name=Eugene&age=21'
% gdb program.cgi
Because all of the necessary
information is set, the debugger runs the program without any problems
almost as if the program were running from a Web server. You could create
more advanced implementations of this solution. For example, instead of
setting each variable manually, you could write a wrapper script that sets
all of the appropriate environment variables and the input and runs the
program through the debugger.
The second way to address the
problem of simulating a CGI program from the command line is to actually run
the program from the Web server and save the state information to a file.
Then, when you are ready to debug, load the state file and use that
information as the state information. Several CGI programming libraries have
implemented features that save and load state information. Although this is
a good solution for obtaining and testing CGI programs using the exact same
information you would have under real Web conditions, it also requires
modification of the code every time you save or load state information. This
might not be a desirable task.
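The following sketch shows the general shape of such save and load routines.
It is not taken from any particular CGI library: the structure cgi_state, the
two-line file format, and the function names are all assumptions made for
illustration, and only the request method and the query string are stored.

```c
#include <stdio.h>
#include <string.h>

#define STATE_FIELD_MAX 1024

/* The two pieces of request state used in the gdb example.
 * cgi_state is a hypothetical structure for this sketch. */
struct cgi_state {
    char request_method[STATE_FIELD_MAX];
    char query_string[STATE_FIELD_MAX];
};

/* Write the state as two lines: method first, then query string.
 * Returns 0 on success, -1 on a write error. */
int save_state(FILE *out, const struct cgi_state *state)
{
    if (fprintf(out, "%s\n%s\n",
                state->request_method, state->query_string) < 0)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}

/* Read one line into field and strip the trailing line ending. */
static int read_field(FILE *in, char *field, size_t size)
{
    if (fgets(field, (int)size, in) == NULL)
        return -1;
    field[strcspn(field, "\r\n")] = '\0';
    return 0;
}

/* Load a state previously written by save_state().
 * Returns 0 on success, -1 if either line is missing. */
int load_state(FILE *in, struct cgi_state *state)
{
    if (read_field(in, state->request_method,
                   sizeof state->request_method) != 0)
        return -1;
    return read_field(in, state->query_string,
                      sizeof state->query_string);
}
```

When run by the server, the program would fill the structure from the
environment and call save_state(); under the debugger it would call
load_state() instead. Values longer than STATE_FIELD_MAX - 1 characters are
not handled by this sketch.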
Testing Forms
The main difficulty in testing
forms is testing CGI programs that accept and parse input. A CGI program
that just sends some output to the Web server, possibly based on the value
of one environment variable such as HTTP_ACCEPT, is very simple to test from
the command line because you usually do not need to worry about presetting
the appropriate variables. I have already listed a few different ways of
setting the input so that your CGI program runs properly from the command
line. These are fairly good general solutions for debugging your programs.
One possible source of bugs is not
knowing what type of input you are actually receiving. For example, suppose
you wrote some code that parsed data from the following HTML form and
returned the data in a different format:
<html><head>
<title>Form</title>
</head>
<body>
<h1>Form</h1>
<form action="/cgi-bin/poll.cgi" method=POST>
<p>Name: <input name="name"></p>
<p>Do you like (check all that apply):<br>
<input type=checkbox name="vegetable" value="carrot">Carrots?<br>
<input type=checkbox name="vegetable" value="celery">Celery?<br>
<input type=checkbox name="vegetable" value="lettuce">Lettuce?</p>
<input type=submit>
</form>
</body></html>
Remember, if the user does not check any checkboxes, then none of that
information is submitted to the CGI program. If you, the CGI programmer,
forgot this and assumed that you would have a blank value for "vegetable"
rather than no entry labeled "vegetable" at all, your CGI program might
produce some surprising output. Because you did not properly predict what
kind of input you would receive, you inadvertently introduced a bug in your
program.
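The difference between a blank value and a missing field is easy to see in
code. The sketch below assumes a hypothetical helper, find_field, that
searches raw URL-encoded input; it returns NULL when the field is absent,
which is exactly the case the checkbox form produces when nothing is checked.

```c
#include <stddef.h>
#include <string.h>

/* Look for the first "name=value" pair in a URL-encoded string.
 * Returns a pointer to the start of the value (the value is empty if
 * that pointer is at '&' or at the end of the string), or NULL when the
 * field is absent altogether. The value is NOT URL-decoded here.
 * find_field is a hypothetical helper written for this example. */
const char *find_field(const char *input, const char *name)
{
    size_t name_len = strlen(name);
    const char *pair = input;

    while (pair != NULL && *pair != '\0') {
        if (strncmp(pair, name, name_len) == 0 && pair[name_len] == '=')
            return pair + name_len + 1;
        pair = strchr(pair, '&');
        if (pair != NULL)
            pair++;             /* step past the '&' to the next pair */
    }
    return NULL;
}
```

A program that processes the poll form should test the result against NULL
before using it, instead of assuming that "vegetable" is always present.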
Avoiding this situation means
making sure the input looks as you expect it to look. You can use the
program test.cgi in Listing 7.1 as a temporary CGI program for processing
forms in order to see the exact format of the input. test.cgi simply lists
the environment variables and values and information from the stdin if it
exists.
------------------------------------
Listing 7.1. test.cgi.
#!/usr/local/bin/perl

print "Content-type: text/plain\n\n";

# list every environment variable the server passed to the program
print "CGI Environment:\n\n";
foreach $env_var (keys %ENV) {
    print "$env_var = $ENV{$env_var}\n";
}

# echo the raw request body, if the server announced one
if ($ENV{'CONTENT_LENGTH'}) {
    print "\nStandard Input:\n\n";
    read(STDIN,$buffer,$ENV{'CONTENT_LENGTH'});
    print $buffer;
}
--------------------------------
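If your CGI programs are written in C, the same step of reading the standard
input looks roughly like the sketch below. The function name read_body is
hypothetical; the environment variable CONTENT_LENGTH is the same one that
test.cgi uses.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a request body of the size announced in CONTENT_LENGTH.
 * content_length is the raw string value, as getenv("CONTENT_LENGTH")
 * would return it, and may be NULL. At most buf_size - 1 bytes are read
 * and the result is NUL-terminated. Returns the number of bytes read.
 * read_body is a hypothetical helper written for this example. */
size_t read_body(FILE *in, const char *content_length,
                 char *buf, size_t buf_size)
{
    long wanted;
    size_t got;

    if (buf_size == 0)
        return 0;
    buf[0] = '\0';
    if (content_length == NULL)
        return 0;
    wanted = strtol(content_length, NULL, 10);
    if (wanted <= 0)
        return 0;
    if ((size_t)wanted > buf_size - 1)
        wanted = (long)(buf_size - 1);
    got = fread(buf, 1, (size_t)wanted, in);
    buf[got] = '\0';
    return got;
}
```

A CGI program would call read_body(stdin, getenv("CONTENT_LENGTH"), buffer,
sizeof buffer) after printing its headers and then print the buffer.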
Tip: If you want to quickly test a CGI program that is supposed to process
a form, you know the exact format of the form input, and you don't want to
spend time putting together the proper HTML form, you can telnet directly to
the port of the Web server from a UNIX machine and enter the request by
hand. For example, if you wanted to post the following data:
name=Eugene&age=21
to URL: http://hcs.harvard.edu/cgi-bin/test.cgi, you would use the following
(the blank line after the headers is required, and Content-Length is the
number of bytes in the data):
% telnet hcs.harvard.edu 80
Trying 140.247.73.252...
Connected to hcs.harvard.edu.
Escape character is '^]'.
POST /cgi-bin/test.cgi HTTP/1.0
Content-Length: 18

name=Eugene&age=21
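Counting the bytes by hand is error-prone, so it can help to let a small
program build the request text for you. This is only a sketch: the function
name build_post_request is hypothetical, and the Content-Type header it adds
is the usual one for form data rather than something the tip above requires.

```c
#include <stdio.h>
#include <string.h>

/* Format a minimal HTTP/1.0 POST request into out (out_size bytes).
 * Content-Length is computed from the body, and a blank line separates
 * the headers from the body. Returns the length of the request, or -1
 * if it does not fit in the buffer.
 * build_post_request is a hypothetical helper written for this example. */
int build_post_request(char *out, size_t out_size,
                       const char *path, const char *body)
{
    int n = snprintf(out, out_size,
                     "POST %s HTTP/1.0\r\n"
                     "Content-Type: application/x-www-form-urlencoded\r\n"
                     "Content-Length: %lu\r\n"
                     "\r\n"
                     "%s",
                     path, (unsigned long)strlen(body), body);

    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```

You could print the result and paste it into the telnet session, or write it
to a file and send the file to the server with whatever tool you prefer.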
Parrot: Echoing the Browser Request
Although test.cgi displays the
input parsed by the server, it does not return the exact request that the
browser has sent. Sometimes, being able to see this low-level request can be
useful. First, seeing how the browser communicates with the server is useful
for learning purposes. Second, you can see the exact format of the request,
look for variations in the input, and correct the appropriate bugs in your
program.
I wrote a program called parrot, shown in Listing 7.2, in Perl for UNIX
platforms. Parrot is essentially a very small, very stupid Web server that
can handle one connection at a time and just echoes the browser's request
back to the browser. To start the program, type parrot at the command line.
You can optionally specify the port number by typing parrot n, where n is
the port number. If the machine already has an HTTP server running or if
you're not the site administrator, it is a good idea to pick a high port
such as 8000 or 8080, because the default, port 80, may already be in use
and normally requires root privileges on UNIX. To use parrot, point your
browser at http://localhost:8000/ (substituting the port number you picked).
---------------------------------------------
Listing 7.2. The parrot program.
#!/usr/local/bin/perl

$debug = 0;

### trap signals
$SIG{'INT'} = 'buhbye';
$SIG{'TERM'} = 'buhbye';
$SIG{'KILL'} = 'buhbye';    # KILL cannot be caught on UNIX; no effect

### define server variables
($port) = @ARGV;
$port = 80 unless $port;

$AF_INET = 2;
$SOCK_STREAM = 1;
if (-e "/ufsboot") { # Solaris; other OS's may also have this value
    $SOCK_STREAM = 2;
}
$SO_REUSEADDR = 0x04;
$SOL_SOCKET = 0xffff;
$sockaddr = 'S n a4 x8';

($name, $aliases, $proto) = getprotobyname('tcp');
select(fake_handle); $| = 1; select(stdout);
select(real_handle); $| = 1; select(stdout);

### listen for connection
$this = pack($sockaddr, $AF_INET, $port, "\0\0\0\0");
socket(fake_handle, $AF_INET, $SOCK_STREAM, $proto) || die "socket: $!";
setsockopt(fake_handle, $SOL_SOCKET, $SO_REUSEADDR, pack("l",1));
bind(fake_handle,$this) || die "bind: $!";
listen(fake_handle,5) || die "listen: $!";

### handle one connection at a time
while (1) {
    @request = ();
    ($addr = accept (real_handle,fake_handle)) || die $!;
    ($af, $client_port, $inetaddr_e) = unpack($sockaddr, $addr);
    @inetaddr = unpack('C4',$inetaddr_e);
    $client_iname = gethostbyaddr($inetaddr_e,$AF_INET);
    $client_iname = join(".", @inetaddr) unless $client_iname;
    print "connection from $client_iname\n" unless (!$debug);

    # read first line
    $input = <real_handle>;
    $input =~ s/[\r\n]//g;
    push(@request,$input);
    $POST = 0;
    if ($input =~ /^POST/) {
        $POST = 1;
    }

    # read header
    $done = 0;
    $CONTENT_LENGTH = 0;
    while (($done == 0) && ($input = <real_handle>)) {
        $input =~ s/[\r\n]//g;
        if ($input =~ /^$/) {
            $done = 1;
        }
        elsif ($input =~ /^[Cc]ontent-[Ll]ength:/) {
            ($CONTENT_LENGTH = $input) =~ s/^[Cc]ontent-[Ll]ength: //;
            $CONTENT_LENGTH =~ s/[\r\n]//g;
        }
        push(@request,$input);
    }

    # read body if POST
    if ($POST) {
        read(real_handle,$buffer,$CONTENT_LENGTH);
        push(@request,split("\n",$buffer));
    }

    &respond(@request);
    close(real_handle);
}

### send the collected request lines back as plain text
sub respond {
    local(@request) = @_;

    # HTTP headers
    print real_handle "HTTP/1.0 200 Transaction ok\r\n";
    print real_handle "Server: Parrot\r\n";
    print real_handle "Content-Type: text/plain\r\n\r\n";

    # body
    foreach (@request) {
        print real_handle "$_\n";
    }
}

### signal handler: close the listening socket and exit
sub buhbye {
    close(fake_handle);
    exit;
}
--------------------------------------------------------
As an example of parrot's usefulness for CGI programming, I wanted to learn
how to use the HTML File Upload feature that Netscape supported in its 2.0
browser. However, the RFC on File Upload was flexible, and I was interested
specifically in how Netscape implemented it. Because Netscape did not
document this feature well, I created a sample file upload form and had it
connect to the parrot server. After I submitted the file, parrot returned
exactly what Netscape had submitted. After obtaining the format of the
upload, I was able to write the scripts in Chapter 14 that correctly handle
file upload.
Common Errors
There are several common errors
people tend to make when programming CGI. A large percentage of the problems
people generally have with CGI programming (other than a lack of conceptual
understanding that this book hopefully addresses) falls under one of the
categories described next. You should be familiar with all of these errors,
their symptoms, and their solutions; they will save you a lot of time
chasing after tiny mistakes.
The most common mistake is not to
send a proper CGI header. You need to have either a Content-Type or a
Location CGI header, and you can send only one or the other but not both.
Each line should technically end with a carriage return and a line feed (CRLF),
although a line feed alone usually works. The headers and the body of the
CGI response must be separated by a blank line.
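These header rules are simple enough to check mechanically. The sketch below
is a rough sanity check that you could run over captured output from your
program; the function name has_valid_cgi_header is hypothetical, and the
check assumes that Content-Type or Location is the first header printed.

```c
#include <ctype.h>
#include <string.h>

/* Return 1 if text begins with prefix, ignoring letter case. */
static int starts_with_nocase(const char *text, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char)*text) != tolower((unsigned char)*prefix))
            return 0;
        text++;
        prefix++;
    }
    return 1;
}

/* Check the start of a CGI response: it must begin with a Content-Type
 * or Location header, and a blank line must end the header block.
 * Both CRLF and bare LF line endings are accepted.
 * Returns 1 if the response looks valid, 0 otherwise.
 * has_valid_cgi_header is a hypothetical helper for this example. */
int has_valid_cgi_header(const char *response)
{
    int starts_ok = starts_with_nocase(response, "Content-Type:") ||
                    starts_with_nocase(response, "Location:");
    int has_blank_line = strstr(response, "\r\n\r\n") != NULL ||
                         strstr(response, "\n\n") != NULL;

    return starts_ok && has_blank_line;
}
```

The misspelled header from the first example in this chapter fails this
check, as does output in which the blank line after the header is missing.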
Assuming you use the proper header format, you also want to make sure you
use the proper MIME type. If you are sending an image, make sure you send
the proper MIME type for that image rather than text/html or some other
wrong type. Finally, if you are using an nph script, the program must send
an HTTP status line as well, followed by the headers, a blank line, and the
body:
HTTP/1.0 200 Ok
Content-Type: text/plain

Hello, World!
One common problem especially
pertinent to UNIX systems is making sure the server can run the scripts. You
want to make sure first that the server recognizes the program as a CGI
program, which means that it is either in a designated scripts directory
(such as cgi-bin) or its extension is recognized as a CGI extension (that
is, *.cgi). Second, the server must be able to run the script. Normally,
this means that the program must be world-executable; if it is a script, it
must be world-readable as well. Additionally, it means you must be familiar
with how your server is configured.
Always use complete pathnames when writing a CGI program. A CGI program can
take advantage of the PATH environment variable when it runs another
program, but it is more secure and reliable to use the full pathname rather
than rely on the environment variable. Additionally, make sure that you
refer to the data files you open and close by complete pathnames rather
than relative pathnames.
There are situations in which you use paths relative to the document root
rather than the complete path. For example, within HTML files, the path is
always listed as relative to the document root. If your GIF file is located
in
/usr/local/etc/httpd/htdocs/images/pic.gif
and your document root is
/usr/local/etc/httpd/htdocs/
you reference this picture as
<img src="/images/pic.gif">
and not as
<img src="/usr/local/etc/httpd/htdocs/images/pic.gif">
This latter tag will give you a broken image message. In general, use paths
relative to the document root within HTML files and use full paths for data
files and other such input and output.
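The relationship between the two kinds of path can be written down in a few
lines of code. The sketch below assumes a hypothetical function url_to_file
that joins a document root and a root-relative path; it does no security
checking (for example, it does not reject ".." components), so treat it as
an illustration of the mapping only.

```c
#include <stdio.h>
#include <string.h>

/* Join a document root and a root-relative URL path into a filesystem
 * path. Trailing '/' characters on doc_root are ignored.
 * Returns 0 on success, -1 if url_path does not start with '/' or the
 * result does not fit in out (a buffer of out_size bytes).
 * url_to_file is a hypothetical helper written for this example. */
int url_to_file(const char *doc_root, const char *url_path,
                char *out, size_t out_size)
{
    size_t root_len = strlen(doc_root);
    int n;

    if (url_path[0] != '/')
        return -1;
    while (root_len > 0 && doc_root[root_len - 1] == '/')
        root_len--;
    n = snprintf(out, out_size, "%.*s%s",
                 (int)root_len, doc_root, url_path);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return 0;
}
```

Given the document root and the img tag from the example above, the function
produces the full pathname of the GIF file, which is the form a CGI program
should use when it opens the file itself.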
Know what type of input to expect.
Remember that certain form elements such as checkboxes have the unique
quality that they only get passed to the server when they have been checked,
and you need to make note of these quirks. Finally, if you're using an
NCSA-style authentication for your Web server, you want to make sure you set
the limitations on both GET and POST.
There are many language-specific problems that are often useful to know,
especially if you are using several different languages. C users should
remember to link the proper libraries when compiling and to make sure the
include files are in the proper place. Watch out for pointer code that could
cause segmentation faults within the program. Finally, use full pathnames.
Summary
You can approach testing and debugging CGI programs from two perspectives:
actually testing the programs over the Web and testing them from the command
line. Each has its own advantages and disadvantages. Testing your programs
over the Web enables you to see whether your CGI program works properly
under expected conditions given real input. On the other hand, it can be a
difficult and sometimes inefficient process. Testing from the command line
gives you greater flexibility to debug your programs thoroughly, but you
give up testing your scripts with real input from a true Web environment.
You can also learn a lot by determining the exact format and content of the
input from the Web.
Most CGI errors can be attributed
to a few common errors. Before you spend a lot of time doing exhaustive
testing and debugging, check to make sure you did not make one of the
following mistakes:
· Sent an improper CGI header.
· Did not use complete pathnames, or did not properly differentiate between
real pathnames and pathnames relative to the document root.
· Did not compile your code properly (there are syntax or other errors).
· Did not correctly predict the type of information you received. For
example, a checkbox on a form does not guarantee that the CGI program
receives any input related to that checkbox.