Passing data to your CGI script
You don't need to do
much to ensure that your CGI script receives the necessary data from your
Web server. The CGI has already defined how this is done and the task is
performed automatically every time your Web server executes a CGI script.
All of the relevant data sent to the server from the Web browser, such as
form input, plus the HTTP request headers, is passed from the server to the
CGI script either in environment variables or through standard input (stdin),
which is the default location at which your program receives input. Because
this task is done for you, all you have to know is where to look for the
information you need.
Environment Variables
When a Web browser
requests a CGI script from a Web server, the server starts the CGI program
in what is termed a stateless environment. This means that each CGI
script runs in its own environment, set up fresh for that request. It does
not inherit values from the environment that the Web server is running under.
This is important because many Web browsers can be requesting the same CGI
script at the same time, and the Web server can start many copies of the same
script. Each copy of the script that is running concurrently must run
independently from all the other copies; otherwise, conflicts may arise.
Because the Web server sets up a new environment for your CGI script, it
places almost all of the information available to the script in environment
variables. Table 2.1 lists the CGI environment variables.
Table 2.1: CGI Environment Variables
| Variable | Meaning |
| --- | --- |
| AUTH_TYPE | Contains the authentication method used to validate the Web browser, if any is used. An example of an authentication method is a username/password scheme. |
| CONTENT_LENGTH | The length of the user-provided content from the Web page requesting the CGI script, which is sent via the user's Web browser. Because the user-provided content is passed to the CGI script as a string, this value is in bytes, with each byte representing one character. |
| CONTENT_TYPE | Contains the type of the data that accompanies the browser's request for the CGI script. Examples are text/html or image/jpeg. |
| GATEWAY_INTERFACE | Holds the version of the Common Gateway Interface being used. For version 1.1 of the CGI specification, this variable would be CGI/1.1. |
| PATH_INFO | Holds additional path information for the CGI script. This is usually the virtual path to another document in the document root that the CGI script will use. This value is set from the information appended to the URL requesting the CGI script. See PATH_TRANSLATED for an example. |
| PATH_TRANSLATED | Holds the value of PATH_INFO after the Web server has translated it from a virtual path into a physical path on the server's file system. For example, for the URL http://www.robertm.com/cgi-bin/answer.pl/states/ca.html, PATH_INFO would be /states/ca.html. If the document root were /usr/local/www (a hypothetical location), PATH_TRANSLATED would be /usr/local/www/states/ca.html. |
| QUERY_STRING | Contains the user-provided data when the request method is GET. This data is appended along with a question mark to the referenced URL. For example, in the URL http://www.robertm.com/cgi-bin/answer.pl?state=CA, the QUERY_STRING would be "state=CA". |
| REMOTE_ADDR | Stores the IP address of the machine running the Web browser requesting the CGI script. |
| REMOTE_HOST | Stores the domain name of the machine running the Web browser requesting the CGI script. If this information is unavailable to the Web server, REMOTE_ADDR will be set and REMOTE_HOST will not be set. |
| REMOTE_IDENT | Stores the user's login name only if the Web server supports identification. |
| REMOTE_USER | Stores the username the Web browser specified for authentication. This is only set if the server supports authentication and the CGI script is protected. |
| REQUEST_METHOD | Contains the request method used to request the CGI script. This can contain any of the valid HTTP request methods such as GET, HEAD, POST, PUT, and so on. |
| SCRIPT_NAME | Stores the virtual path and name of the CGI script being executed. This is used for self-referencing URLs. |
| SERVER_NAME | Contains the name, either domain name or IP address, of the machine running the Web server. |
| SERVER_PORT | Contains the port number on which the Web browser sent the request to the Web server. |
| SERVER_PROTOCOL | Contains the name and version of the protocol being used to make the request for the CGI script. In most cases, this will be the HTTP protocol and will look something like HTTP/1.0. |
| SERVER_SOFTWARE | Stores the name and version of the Web server software that executed the CGI script. For example, for the Netscape Communications Server version 1.1, the variable would be set to Netscape-Communications/1.1. |
In addition to the
CGI environment variables, the Web server makes available all the HTTP
request headers received from the Web browser. These are also placed in
environment variables, all of which have the prefix HTTP_. Table 2.2 lists
the HTTP request header environment variables.
Table 2.2: HTTP Request Header Environment Variables
| HTTP Request Header | Meaning |
| --- | --- |
| HTTP_ACCEPT | Contains a comma-separated list of media types the browser can accept in response from the Web server. Examples are audio/basic, image/gif, text/*, */*. The last two examples contain the wildcard *, which is a stand-in for any string of characters. text/* means that all forms of text can be accepted; */* means that the browser will accept any content type. |
| HTTP_ACCEPT_ENCODING | Contains the valid encoding methods the browser can receive in response from the Web server. Examples are x-zip, x-stuffit, and x-tar. |
| HTTP_ACCEPT_LANGUAGE | Contains the browser's preferred language for a response from the Web server. However, responses in any language not specified in this variable are allowed. An example is en-GB, which is the English of the United Kingdom. |
| HTTP_AUTHORIZATION | Contains authorization information from the Web browser. Its value is used for the browser to authenticate itself with the Web server. There is not a single specific format for possible values of this field, and new formats may be added. One example is the user/password scheme, where the value, in my case, would be user robertm:mypassword. |
| HTTP_CHARGE_TO | Formats for this field are still undetermined. However, it is available to contain information for the account that is to be charged for the costs of receiving the requested data. |
| HTTP_FROM | Contains the name of the requesting user as supplied by the Web browser in an e-mail address format. Some examples are robertm@deltanet.com and rmcdanie@primenet.com. |
| HTTP_IF_MODIFIED_SINCE | Can contain a value specified in a valid ARPANET date standard, such as Weekday, DD-Mon-YY HH:MM:SS TIMEZONE. This field can be used in conjunction with the GET method to return the requested document only if it has changed since the date specified. |
| HTTP_PRAGMA | Holds the value of any special directives for the Web server. For instance, a proxy Web server has one valid value for a pragma request header, no-cache, which means that the proxy server should always request the document from the real Web server instead of returning a nonexpired cached copy. |
| HTTP_REFERER | Contains the URI (uniform resource identifier, which is a superset of URLs) of the document that contained the link to the currently requested document. An example would be http://www.thepalace.com/web-pages.html. |
| HTTP_USER_AGENT | Contains the name of the Web browser software that requested the document. An example is Mozilla/2.0 (Win95; I), which would be the user agent for the Netscape 2.0 browser for Windows 95. |
Clearly there are
many environment variables available to your CGI script. For the most part,
you will only use a few of these. Of course, your objective will determine
which variables you need for your project. Listing 2.1 shows a CGI script
that displays the values of the CGI and HTTP request header environment
variables.
Listing 2.1: The display.pl CGI Script
#!/usr/local/bin/perl
print "Content-type: text/html\n\n";
print "AUTH_TYPE = $ENV{'AUTH_TYPE'}<BR>\n";
print "CONTENT_LENGTH = $ENV{'CONTENT_LENGTH'}<BR>\n";
print "CONTENT_TYPE = $ENV{'CONTENT_TYPE'}<BR>\n";
print "GATEWAY_INTERFACE = $ENV{'GATEWAY_INTERFACE'}<BR>\n";
print "PATH_INFO = $ENV{'PATH_INFO'}<BR>\n";
print "PATH_TRANSLATED = $ENV{'PATH_TRANSLATED'}<BR>\n";
print "QUERY_STRING = $ENV{'QUERY_STRING'}<BR>\n";
print "REMOTE_ADDR = $ENV{'REMOTE_ADDR'}<BR>\n";
print "REMOTE_HOST = $ENV{'REMOTE_HOST'}<BR>\n";
print "REMOTE_IDENT = $ENV{'REMOTE_IDENT'}<BR>\n";
print "REMOTE_USER = $ENV{'REMOTE_USER'}<BR>\n";
print "REQUEST_METHOD = $ENV{'REQUEST_METHOD'}<BR>\n";
print "SCRIPT_NAME = $ENV{'SCRIPT_NAME'}<BR>\n";
print "SERVER_NAME = $ENV{'SERVER_NAME'}<BR>\n";
print "SERVER_PORT = $ENV{'SERVER_PORT'}<BR>\n";
print "SERVER_PROTOCOL = $ENV{'SERVER_PROTOCOL'}<BR>\n";
print "SERVER_SOFTWARE = $ENV{'SERVER_SOFTWARE'}<BR>\n";
print "HTTP_ACCEPT = $ENV{'HTTP_ACCEPT'}<BR>\n";
print "HTTP_ACCEPT_ENCODING = $ENV{'HTTP_ACCEPT_ENCODING'}<BR>\n";
print "HTTP_ACCEPT_LANGUAGE = $ENV{'HTTP_ACCEPT_LANGUAGE'}<BR>\n";
print "HTTP_AUTHORIZATION = $ENV{'HTTP_AUTHORIZATION'}<BR>\n";
print "HTTP_CHARGE_TO = $ENV{'HTTP_CHARGE_TO'}<BR>\n";
print "HTTP_FROM = $ENV{'HTTP_FROM'}<BR>\n";
print "HTTP_IF_MODIFIED_SINCE = $ENV{'HTTP_IF_MODIFIED_SINCE'}<BR>\n";
print "HTTP_PRAGMA = $ENV{'HTTP_PRAGMA'}<BR>\n";
print "HTTP_REFERER = $ENV{'HTTP_REFERER'}<BR>\n";
print "HTTP_USER_AGENT = $ENV{'HTTP_USER_AGENT'}<BR>\n";
Note: Once again, to run this program on a Windows machine, remove the line #!/usr/local/bin/perl.
Place this code in a file called display.pl in your cgi-bin directory. You
can then run it from your Web browser with a URL in the form
http://www.robertm.com/cgi-bin/display.pl. Remember, www.robertm.com is specific to my machine.
In its place you need to specify the domain name or IP address of the
machine running your Web server. Also, try running this script with
different Web browsers and on different machines, or maybe even create an
HTML page that has a link to the script, and notice how the values of the
environment variables change.
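If you would rather not name every variable by hand, a loop over the %ENV associative array is one alternative. The following is only a sketch, not a replacement for Listing 2.1: the subroutine names escape_html and env_listing are hypothetical, and the HTML escaping is an added precaution so that values containing < or & display correctly.

```perl
#!/usr/local/bin/perl
# A sketch: list whatever environment variables the server set,
# rather than naming each one as Listing 2.1 does.

# escape_html is a hypothetical helper. It replaces the characters
# that have special meaning in HTML with their entity equivalents.
sub escape_html {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# env_listing (also hypothetical) builds one "NAME = value<BR>" line
# per variable, sorted by variable name.
sub env_listing {
    my (%env) = @_;
    my $html = '';
    foreach my $name (sort keys %env) {
        $html .= escape_html($name) . " = " . escape_html($env{$name}) . "<BR>\n";
    }
    return $html;
}

print "Content-type: text/html\n\n";
print env_listing(%ENV);
```

Because the loop prints everything in %ENV, the output will likely include variables that are not part of CGI, such as PATH, depending on how your Web server sets up the script's environment.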
Standard Input
Under most
circumstances, all the information your script needs will be contained in
the environment variables. However, in some cases the Web server passes data
to your CGI script by using standard input. When a Web browser requests a
CGI script from a Web server with the request method of POST, which is most
often used with forms in HTML, the user-provided data, if any, is sent via
standard input. The Web server still assigns values to most of the
environment variables discussed earlier. In fact, when the user-provided
data is sent via standard input, you should always check the value of
CONTENT_LENGTH before working with the data sent since the Web server does
not send an EOF (End Of File) at the end of the data.
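As a rough illustration of the point about CONTENT_LENGTH, the sketch below picks the source of the user-provided data based on REQUEST_METHOD. The subroutine name get_user_string and its two parameters are assumptions made for this example; only the environment variable names come from Table 2.1.

```perl
# A sketch, assuming the request method is either GET or POST.
# get_user_string is a hypothetical name. $env is a reference to a
# hash such as %ENV, and $in is the filehandle holding POST data.
sub get_user_string {
    my ($env, $in) = @_;
    my $method = $env->{'REQUEST_METHOD'} || 'GET';
    my $user_string = '';

    if ($method eq 'POST') {
        # Read exactly CONTENT_LENGTH bytes, because no EOF marks
        # the end of the data.
        my $length = $env->{'CONTENT_LENGTH'} || 0;
        read($in, $user_string, $length) if $length > 0;
    } else {
        # For GET, the data arrives in QUERY_STRING instead.
        $user_string = $env->{'QUERY_STRING'};
        $user_string = '' unless defined $user_string;
    }
    return $user_string;
}

# In a real CGI script, the data comes from %ENV and standard input.
my $user_string = get_user_string(\%ENV, \*STDIN);
```

Passing the environment and the filehandle as arguments is a design choice made here so the subroutine can be tried from the command line; a script could just as well refer to %ENV and STDIN directly.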
URL Encoding
Whether the Web
server sends the user-provided data via standard input or by assigning it to
the QUERY_STRING environment variable, the data is always sent as one long
string of name/value pairs that is URL encoded. This encoding consists of
changing all spaces to plus signs (+) and converting certain special
characters into a percent sign (%) followed by their two-digit hexadecimal
code. Before working with the data, you need to decode the string and
separate the name/value pairs.
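To make the encoding concrete, here is a small sketch of the encoding direction, which is normally done by the Web browser rather than by your script. The subroutine name url_encode is hypothetical, and the set of characters left unencoded is an assumption; browsers differ slightly in which characters they leave alone.

```perl
# url_encode is a hypothetical helper that mimics what a browser
# does to each field name and value before sending a form.
sub url_encode {
    my ($text) = @_;
    # Convert everything except letters, digits, underscore, period,
    # hyphen, and space to a percent sign plus two hex digits.
    $text =~ s/([^A-Za-z0-9_.\- ])/sprintf("%%%02X", ord($1))/ge;
    # Spaces become plus signs.
    $text =~ s/ /+/g;
    return $text;
}

# Prints: name=Robert+%26+Co.+%3D+%231
print url_encode("name") . "=" . url_encode("Robert & Co. = #1") . "\n";
```

Notice that the & and = typed as part of the value are converted to %26 and %3D, while the = that separates the name from the value is left as is.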
Each name/value pair
consists of a field name and value separated by an equal sign (=). The field
name is usually taken from the NAME attribute in one of the <INPUT>, <TEXTAREA>,
or <SELECT> tags of an HTML form, and the value is usually data entered by
the user submitting the form. The name/value pairs are separated by an
ampersand sign (&). In Perl, a useful function called split separates a
string into substrings at intervals that you specify. Below is an example of
how you can split the name/value pairs. Each name/value pair is first placed
into an array. Then the name and value are separated and placed into an
associative array, with the name acting as the key and the value being
assigned to the array element. By the way, an associative array is an array
that is indexed by strings rather than integers. For associative arrays, the
index is referred to as the key. So, for $name{'first'} = "Robert", the
array is name, the key is first, and the value is Robert.
Listing 2.2 splits up
the name/value pairs, but remember that the query string is URL encoded as
well. You must decode the contents of the string in addition to splitting
the name/value pairs. Listing 2.3 adds a line before the split that changes
all plus signs (+) to spaces, and code within the foreach loop that replaces
hexadecimal codes with their character equivalents.
Listing 2.2: Perl Code to Split Name/Value Pairs
# This line places each name/value pair as a separate
# element in the name_value_pairs array.
@name_value_pairs = split(/&/, $user_string);
# This loops over each element in the name_value_pairs
# array, splits it on the = sign, and places the value
# into the user_data associative array with the name as the
# key.
foreach $name_value_pair (@name_value_pairs) {
    ($name, $value) = split(/=/, $name_value_pair);
    # If the name value pair has already been given a value,
    # as in the case of multiple items being selected, then
    # separate the items with a " : ".
    if (defined($user_data{$name})) {
        $user_data{$name} .= " : " . $value;
    } else {
        $user_data{$name} = $value;
    }
}
Listing 2.3: Perl Code to URL Decode User-Provided Data
# This line changes the + signs to spaces.
$user_string =~ s/\+/ /g;
# This line places each name/value pair as a separate
# element in the name_value_pairs array.
@name_value_pairs = split(/&/, $user_string);
# This loops over each element in the name_value_pairs
# array, splits it on the = sign, and places the value
# into the user_data associative array with the name as the
# key.
foreach $name_value_pair (@name_value_pairs) {
    ($name, $value) = split(/=/, $name_value_pair);
    # These two lines decode the values from any URL
    # hexadecimal encoding. The first section searches for a
    # hexadecimal number and the second part converts the
    # hex number to decimal and returns the character
    # equivalent.
    $name =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/ge;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/ge;
    # If the name value pair has already been given a value,
    # as in the case of multiple items being selected, then
    # separate the items with a " : ".
    if (defined($user_data{$name})) {
        $user_data{$name} .= " : " . $value;
    } else {
        $user_data{$name} = $value;
    }
}
You might wonder why the entire string is not hexadecimal URL decoded before
it is split, even though plus signs are replaced with spaces at this stage.
Some of the special characters that are converted to hexadecimal when URL
encoding takes place are the +, &, and = signs. If these hexadecimal codes
were converted back before the plus signs were replaced with spaces or the
string was split, any of these special characters could alter where a value
is split or what value is actually displayed, causing incorrect results. This
is why hexadecimal encoding is done. It enables your CGI script to
distinguish between a symbol typed by the user and one being used for a
special purpose. For example, the name/value separator & would not be changed
to hexadecimal, whereas an & symbol typed by the user would be changed.
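The effect of the ordering can be seen with a small experiment. This is a sketch only: decode_hex is a hypothetical helper that wraps the same substitution used in Listing 2.3, and the sample string assumes a user typed AT&T into a field named company.

```perl
# decode_hex is a hypothetical helper wrapping the hexadecimal
# substitution from Listing 2.3.
sub decode_hex {
    my ($text) = @_;
    $text =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/ge;
    return $text;
}

# The user typed "AT&T", so the browser sent the & as %26.
my $user_string = "company=AT%26T&state=CA";

# Correct order: split on the separators first, then decode each piece.
my @right = map { decode_hex($_) } split(/&/, $user_string);
# @right is ("company=AT&T", "state=CA")

# Wrong order: decoding first turns %26 into a real &, which the
# split then mistakes for a separator.
my @wrong = split(/&/, decode_hex($user_string));
# @wrong is ("company=AT", "T", "state=CA")

print scalar(@right), " pairs when split first\n";     # prints 2
print scalar(@wrong), " pieces when decoded first\n";  # prints 3
```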
The code samples in
both Listings 2.2 and 2.3 are not complete, ready-to-execute CGI scripts.
Rather, they are just examples of the lines of code that perform the URL
decoding. Chapter 4 will incorporate these lines of code into a subroutine
for use within the examples in this book.
Returning the results
Whenever a CGI script
is called, it needs to return a result to the Web server, which then sends
it to the Web browser that requested it. A CGI script also has the option of
bypassing the Web server and returning the result directly to the Web
browser. Whether the results are being sent to the Web server or directly to
the Web browser, the CGI script must specify a valid header.
When a CGI script
completes execution, it typically sends its results back to the Web server
via standard output. The Web server receives the results, formats the proper
HTTP response header, and returns all of the data to the Web browser. The
first thing the CGI script must return to the Web server is a parsed header.
Parsed Headers
Every CGI script must
precede any data returned to the Web server with a parsed header. A parsed
header consists of the lines output by your CGI script that get parsed by the
Web server. This parsed header is in the same format as an HTTP header and
can contain any of the HTTP response headers listed in Table 2.4. Parsed
headers must always be immediately followed by a blank line. Any lines in
the parsed header that are not directives to the Web server are sent back to
the Web browser as part of the HTTP response header. The current version of
CGI, version 1.1, specifies three server directives, which are shown in
Table 2.3.
Table 2.3: Server Directives for Parsed Headers
| Directive | Meaning |
| --- | --- |
| Content-type | Specifies to the Web server the MIME type of the data being returned by the CGI script. |
| Location | Contains either the virtual path or the URL of a document that your CGI script wants returned to the Web browser requesting your script. |
| Status | Returns to the Web server an HTTP status line, which will then be returned to the Web browser. Status lines consist of a three-digit status code and the reason string. Examples are 404 Not Found and 403 Forbidden. |
Here's an example of a parsed header being returned in a CGI script:
#!/usr/local/bin/perl
print "Content-type: text/html\n\n";
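The other two directives in Table 2.3 can be used in much the same way. The sketch below is illustrative only: the subroutine names, the /old path check, and the /moved.html target are all hypothetical, chosen just to show one header of each kind.

```perl
#!/usr/local/bin/perl
# A sketch of the Location and Status directives. The subroutine
# names and the paths used here are hypothetical.

# Asks the server to return another document in place of any
# output from this script.
sub location_header {
    my ($target) = @_;
    return "Location: $target\n\n";
}

# Reports a status line to the server, followed by a content type
# for the explanatory page that comes after the header.
sub status_header {
    my ($code, $reason) = @_;
    return "Status: $code $reason\nContent-type: text/html\n\n";
}

if (defined $ENV{'PATH_INFO'} && $ENV{'PATH_INFO'} eq '/old') {
    # Send the browser to the document's new home.
    print location_header('/moved.html');
} else {
    print status_header(404, 'Not Found');
    print "<H1>Not Found</H1>\n";
}
```

In both cases the header still ends with a blank line, which is what tells the Web server that the parsed header is complete.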
Bypassing the Server
Most Web servers
allow you to send the output from your CGI script directly back to the Web
browser rather than through the Web server. For the Netscape Communications
server, you can activate this feature by preceding the name of your CGI
script with nph-.
When your CGI script
sends its output directly back to the Web browser, it has to specify a
nonparsed header that must contain the proper HTTP response headers. Table
2.4 lists the HTTP response headers.
Table 2.4: HTTP Response Headers
| HTTP Response Header | Meaning |
| --- | --- |
| ALLOWED | Specifies to the requesting browser which request methods are allowed. Examples are GET, HEAD, and PUT. |
| CONTENT-ENCODING | Specifies which encoding method is used. Examples are x-zip, x-stuffit, and x-tar. |
| CONTENT-LANGUAGE | Specifies the language the returning document is in. An example is en, which is English in one of its forms. |
| CONTENT-LENGTH | Specifies the size in bytes of the returning data. |
| CONTENT-TRANSFER-ENCODING | Specifies the encoding of the data between the Web server and the Web browser. The default is binary. |
| CONTENT-TYPE | Contains the type of the data being transferred. Examples are text/html and image/gif. |
| COST | Will contain the cost of the retrieval of the object being requested. The format of this header has not yet been specified. |
| DATE | Contains a creation date of the requested object in a valid ARPANET format. |
| DERIVED-FROM | Can contain a version number for the requested object, allowing for version control of editable documents. |
| EXPIRES | Contains an expiration date for the requested information, after which the document should be retrieved again. This header is used primarily for caching mechanisms and is in an ARPANET date format. |
| LAST-MODIFIED | Contains the date when the requested object was last modified. This header is in an ARPANET date format. |
| LINK | Holds information about the document being returned. You can use it to specify information such as the inclusion of another URL within the returned document or the creator of the returned object. |
| MESSAGE-ID | Contains a unique identifier for the HTTP message. |
| PUBLIC | Fairly similar to the ALLOWED response header. However, it specifies the request methods that anyone can use, not just the requesting browser. Examples for this header are GET, HEAD, and TEXTSEARCH. |
| TITLE | Contains the title of the document being returned. For an HTML file, this is equivalent to the value contained within the <TITLE></TITLE> tags. |
| URI | Gives the URI (uniform resource identifier) where the requested object can be found. This will not always be the URL the user entered in the Web browser requesting the returned object. However, it will point to an object that should be the same as the one being returned, with some degree of variance. An example is http://www.robertm.com/Group-one/section1.html; vary=language, version which gives a URI with the same document, which might vary in language or version. |
| VERSION | Defines the version of an object that can be changed. Its format is currently undefined. |
You do not need to provide every HTTP response header to have a valid
nonparsed header. For example, a CGI script with a valid nonparsed header
would look like this:
#!/usr/local/bin/perl
print "HTTP/1.0 200 OK\n";
print "Server: Netscape-Communications/1.1\n";
print "Content-type: text/html\n\n";
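A fuller nph- script might take the protocol and server name from the environment variables in Table 2.1 instead of hard-coding them. The following is a sketch under that assumption: nph_header is a hypothetical subroutine name, and the fallback to HTTP/1.0 when SERVER_PROTOCOL is not set is a choice made for this example. It keeps the \n line endings used in the listing above.

```perl
#!/usr/local/bin/perl
# A sketch of a script meant to be saved with an nph- prefix.
# nph_header is a hypothetical name; $env is a reference to a hash
# such as %ENV and $type is the MIME type of the data that follows.
sub nph_header {
    my ($env, $type) = @_;
    # Fall back to HTTP/1.0 if the server did not set the protocol.
    my $protocol = $env->{'SERVER_PROTOCOL'} || 'HTTP/1.0';
    my $header = "$protocol 200 OK\n";
    # Include the Server line only when the software name is known.
    $header .= "Server: $env->{'SERVER_SOFTWARE'}\n" if $env->{'SERVER_SOFTWARE'};
    $header .= "Content-type: $type\n\n";
    return $header;
}

# Turn off output buffering so data reaches the browser as it is printed.
$| = 1;
print nph_header(\%ENV, 'text/html');
print "<HTML><BODY>Sent without being parsed by the server.</BODY></HTML>\n";
```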