Why use CGI?
The Common Gateway Interface,
or CGI, is a standard for communication between Web documents and CGI
scripts you write. CGI scripting, or programming, is the act of creating a
program that adheres to this standard of communication. A CGI script is
simply a program that in some way communicates with your Web documents. Web
documents are any kind of file used on the Web. They can be HTML documents,
text files, image files, or any number of other file formats. The existence
of this gateway between programs you write and your Web documents allows you
to create much more dynamic and interactive Web pages than you could with
HTML alone.
This chapter will help you
understand the role of CGI scripting within the World Wide Web and will show
why you would want to use it. First, you will be introduced to some of the
key elements and terminology of the Web, such as HTTP, URLs, HTML, and CGI.
Then you will learn some of the advantages of CGI scripts.
The World Wide Web
Many people have heard of the
World Wide Web, but not everyone knows what it is. Even people who use it
may have trouble defining it precisely. The World Wide Web is a global
collection of interconnected documents on the Internet. Because the World
Wide Web has grown explosively and has been advertised so extensively, many
people think it is the same thing as the Internet. However, the World Wide
Web is only a part of the Internet.
The Internet has been around
for over three decades. It began as a Department of Defense program for
enabling computers to communicate over great distances without requiring a
central server to route the communications traffic. Since those early days,
the Internet has grown substantially. Early on it was adopted by the
academic community, and more recently it has been commercialized. The
federal government no longer funds the Internet directly, leaving private
and public telecommunications companies in charge of the major backbones (the
major network connections of the Internet). The telecommunications companies
charge Internet service providers for connections to the backbone, and
Internet service providers in turn charge companies and individuals for
their access to the Internet. The Internet itself is nothing more than an
enormous number of networked computers all over the world. Like any computer
network, the Internet has various software programs running on it, such as
e-mail, newsgroups, FTP, gopher, and the World Wide Web.
The World Wide Web, or Web, was
born in 1989 at CERN (the European Laboratory for Particle Physics). Since
then, it has grown at a phenomenal rate. Today, Web traffic accounts for
somewhere between one third and one half of the total traffic on the
Internet. Because the Internet consists of many other sources of traffic,
many of which have been around for decades, this is an impressive feat.
So, what is the Web? In simple
terms, the Web is a part of the Internet that uses the Hypertext Transfer
Protocol (HTTP) to deliver hypertext and images, which a Web browser then
displays in a graphical environment.
Hypertext refers to the ability to present text documents that are
interlinked. You might click on a portion of the text in a document and be
taken to another section of text in a different document. The Web is based
on the concept of hypermedia, which is a superset of hypertext. Think of
hypermedia as various forms of media (text, graphics, sound files, and so
on) that are interlinked. For example, you could click on a text link in one
document and display a graphic image. Figure 1.1 illustrates both a text
link and an image link. Clicking on the word "resume" would take you to a
page with the actor's résumé, and clicking on the picture itself would take
you to a larger version of the same image. In the early days of the Web,
text links always appeared as underlined text in a different color, and graphic
links were always enclosed within a colored box. Now, however, the current
shape of the mouse pointer gives you a better indication of what is and
isn't a link. If the mouse pointer changes into a hand with the index finger
extended, as shown below the "resume" link in Figure 1.1, the object being
pointed to is a link to another document. Documents on the Web are
interlinked so you can navigate between them by selecting links. The name
World Wide Web alludes to the Web's spiderweb-like nature.
Clients and Servers
To understand the World Wide
Web and CGI programming, you must understand the division between Web
clients and Web servers and how HTTP facilitates the interaction between the
two. Simply put, a server handles requests from various clients. For
example, suppose you are using a word processing program to edit files on
another computer. Your computer would be the client because it is requesting
the file from another computer. The other computer would be the server
because it is handling your computer's request. With networked computers,
clients and servers are very common. A server typically runs on a different
machine than the client, although this is not always the case. The
interaction between the two usually begins on the client side. The client
software requests an object or transaction from the server software, which
either handles the request or denies it. If the request is handled, the
object is sent back to the client software. On the World Wide Web, servers
are known as Web servers, and clients are known as Web browsers. Web
browsers request documents from Web servers, allowing you to view documents
on the World Wide Web. There's a good chance that you have already used a
Web browser. Some of the most common browsers are Netscape's Navigator,
Microsoft's Internet Explorer, and NCSA's Mosaic. Like most software
companies that distribute Web browsers, these companies also distribute Web
server software.
The process of viewing a
document on the Web starts when a Web browser sends a request to a Web
server. The Web browser sends details about itself and the file it is
requesting to the Web server in HTTP request headers. The Web server
receives and reviews the HTTP request headers for any relevant information,
such as the name of the file being requested, and sends back the file with
HTTP response headers. The Web browser then uses the HTTP response headers
to determine how to display the file or data being returned by the Web
server. (There's more information on these headers in Chapter 2.)
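As a rough illustration of the exchange just described, the following Python
sketch builds the text of a simple request and splits a canned response into
its status line, headers, and body. The function names, the host
www.example.com, the browser name, and the sample response are all assumptions
made up for this example; a real browser sends many more headers.

```python
def build_request(host, path, user_agent="ExampleBrowser/1.0"):
    """Return the text of a minimal HTTP GET request (illustrative only)."""
    lines = [
        f"GET {path} HTTP/1.0",       # request line: method, file, protocol
        f"Host: {host}",              # which server the request is meant for
        f"User-Agent: {user_agent}",  # details about the browser itself
        "Accept: text/html",          # kinds of documents the browser accepts
        "",                           # a blank line ends the request headers
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw):
    """Split a raw HTTP response into (status line, headers dict, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# A made-up response, standing in for what a Web server might send back.
SAMPLE_BODY = "<html><body>Hi</body></html>"
SAMPLE_RESPONSE = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    f"Content-Length: {len(SAMPLE_BODY)}\r\n"
    "\r\n"
    + SAMPLE_BODY
)

if __name__ == "__main__":
    print(build_request("www.example.com", "/index.html"))
    status, headers, body = parse_response(SAMPLE_RESPONSE)
    # The Content-Type response header tells the browser how to display the body.
    print(status, headers["content-type"])
```

The Content-Type header in the response is the piece the browser relies on
most when deciding how to display what it receives.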
Note: This discussion
barely scratches the surface of what is actually happening, but it is
enough for our study of CGI scripting. If you want more details on HTTP
headers, check Chapter 2 as well as the "Useful Web Pages" section of the
Appendix.
When a Web browser requests a CGI script from a Web server, the server
starts the CGI script and passes the HTTP request headers to it. The
information stored in the request headers is available for your script to
use. Normally, when a CGI script is finished executing, the output is passed
back to the Web server, which formats an HTTP response header and sends the
information to the Web browser. It is possible, however, for your CGI script
to format the HTTP response header and send the data directly to the Web
browser. You can use this approach to reduce the workload of your Web
server.
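A minimal sketch of such a script, written here in Python, might look like the
following. It assumes the server follows the usual CGI convention of exposing
request details as environment variables (REQUEST_METHOD, and request headers
under names such as HTTP_USER_AGENT). The function name render_response is
hypothetical, and the script takes the normal route of printing a Content-Type
line and leaving the rest of the response header to the server.

```python
import html
import os
import sys


def render_response(environ):
    """Build the CGI output: a header line, a blank line, then the document."""
    # Under the CGI convention, request details arrive as environment variables.
    method = environ.get("REQUEST_METHOD", "GET")
    agent = environ.get("HTTP_USER_AGENT", "unknown")
    body = (
        "<html><body>\n"
        f"<p>Request method: {html.escape(method)}</p>\n"
        f"<p>Browser: {html.escape(agent)}</p>\n"
        "</body></html>\n"
    )
    # The blank line separates the header from the document body.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The Web server captures whatever the script writes to standard output.
    sys.stdout.write(render_response(os.environ))
```

Keeping the output in a function that takes the environment as an argument
makes the script easy to try from the command line without a Web server.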
Whether the Web browser is
requesting a file or a CGI script, the browser has to know the location of
the Web server and the name of the file in order to make the request. With
the millions of documents on the Web, you might wonder how the Web browser
knows exactly where to look for the file you want to see. You probably also
realize that many files on the Web have the exact same name. So how do the
Web browsers get the correct document? Each file on the Web has a unique
identifier that not only sets it apart from other documents but also
describes where it is located. These unique identifiers are called uniform
resource locators, or URLs.
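To make the idea concrete, here is a small Python sketch that splits a URL
into the pieces a browser needs: where the server is and which file to ask
for. It uses the standard library's urllib.parse.urlsplit; the function name
describe_url and the example address are made up for illustration.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Return the main parts of a URL as a dictionary."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,  # protocol to use, such as http
        "host": parts.hostname,  # location of the Web server
        "port": parts.port,      # None when the URL does not name a port
        "path": parts.path,      # name of the file or script on that server
        "query": parts.query,    # extra data passed along with the request
    }


if __name__ == "__main__":
    info = describe_url("http://www.example.com/cgi-bin/hello.cgi?name=Ann")
    for key, value in info.items():
        print(f"{key}: {value}")
```

Two files with the same name on different servers still have different URLs,
because the host part differs.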
Uniform Resource Locators
The uniform resource locator
(URL) is like an address for Web documents. Every document on the Web has a
unique URL, and each part of the URL pro. |