CMPS 335 Advanced Web Publishing
Perl and CGI Programming
Processing Forms
CGI and CGI Environment Variables
The Common Gateway Interface (CGI) is the standard that defines how a Web
server passes a browser's request to a program on the server.
It is the interface between a Web page being displayed by a browser and a
program that runs on the server. A Web page can call a CGI
program for a service and the CGI program usually communicates back to
the Web page by creating an HTML document. The HTML document is sent
back to the browser through the Web server. All Web browsers and servers
communicate using HTTP requests and responses. Every HTTP request begins
with a request line that names the request method. Browsers commonly use
the GET and POST methods: GET is typically used when requesting HTML
documents, and POST is frequently used when passing form data from the
browser to the server.
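To make the difference between the two methods concrete, here is a minimal sketch that builds the text of a simple HTTP/1.0 form request. The helper name build_request and the path /cgi-bin/example.cgi are hypothetical, and a real browser sends additional headers that are omitted here.

```perl
use strict;
use warnings;

# Hypothetical helper: build the text of a simple HTTP/1.0 form request.
sub build_request {
    my ($method, $path, $data) = @_;
    if ($method eq 'GET') {
        # GET: the data travels in the URL, after a question mark.
        return "GET $path?$data HTTP/1.0\r\n\r\n";
    }
    # POST: the data travels in the body, after the headers and a blank line.
    return "POST $path HTTP/1.0\r\n"
         . "Content-Type: application/x-www-form-urlencoded\r\n"
         . "Content-Length: " . length($data) . "\r\n"
         . "\r\n"
         . $data;
}

# Hypothetical script path, used only for illustration.
print build_request('POST', '/cgi-bin/example.cgi', 'age=21');
```

With GET the form data is visible in the URL; with POST the server learns the size of the body from the Content-Length header.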
One of the keystones of CGI is a collection of data called
environment variables. These variables are the primary mechanism
by which a Web server communicates with CGI scripts. A
CGI script is a program that is run on a Web server triggered
by a request from a Web browser. Environment variables are
created by the server and their values are automatically set by
the server each time a CGI script is executed. These variables and
their values are stored as key-value pairs in a special hash called %ENV.
Some of the environment variables give information about the server and
its hardware and do not change from one request to the next. Other variables
contain information about the data available to a script and can be
different every time a script is executed. Your CGI script can use any of
these environment variables as needed.
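As a minimal sketch of how a script can inspect %ENV, the following prints every variable the server has set, one per line, as plain text. The helper name format_env is hypothetical, and the variables you see depend entirely on your server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return "KEY = value" lines for a hash, sorted by key.
sub format_env {
    my (%env) = @_;
    return map { "$_ = $env{$_}\n" } sort keys %env;
}

# A plain-text response: header line, blank line, then the listing.
print "Content-type: text/plain\n\n";
print format_env(%ENV);
```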
The following three environment variables contain information about
the data that is being passed from the browser to the server and
they are essential to your CGI script.
Variable Name (key)   Value
-----------------------------------------------------------
REQUEST_METHOD        GET or POST
QUERY_STRING          Form data sent from the browser (GET method)
CONTENT_LENGTH        Length of the posted data (POST method)
REQUEST_METHOD has the value GET or POST, depending on how the
form data is submitted. QUERY_STRING contains the data appended to the URL
of a link or to the URL in the ACTION attribute of the <FORM> tag.
CONTENT_LENGTH contains the size, in bytes, of the data submitted with the
POST method. The data itself is read from the standard input, known as
STDIN.
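The way these three variables work together can be sketched as follows. The helper name raw_form_data is hypothetical; it takes a reference to a hash such as %ENV and an input filehandle, so it can be tried without a Web server. It returns the data still in encoded form.

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw, still-encoded form data.
# $env is a reference to a hash such as %ENV; $in is the input filehandle.
sub raw_form_data {
    my ($env, $in) = @_;
    my $method = $env->{REQUEST_METHOD} // '';
    if ($method eq 'GET') {
        # GET: the data is already in QUERY_STRING.
        return $env->{QUERY_STRING} // '';
    }
    if ($method eq 'POST') {
        # POST: read exactly CONTENT_LENGTH bytes from the input.
        my $buffer = '';
        read($in, $buffer, $env->{CONTENT_LENGTH} // 0);
        return $buffer;
    }
    return '';    # any other method: no form data
}

# In a real CGI script: my $raw = raw_form_data(\%ENV, \*STDIN);
```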
URL Encoding Conventions
URL encoding is a scheme used by a Web browser to encode
data to be passed to a Web server for processing by a CGI script.
The browser collects the contents of all NAME and VALUE
attributes from the form, encodes them as name=value pairs, and sends
them to the server. URL encoding follows these rules:
- Each NAME is joined to its VALUE by an equal sign (=).
- NAME=VALUE pairs are separated by ampersands (&).
- Spaces in the input are represented by plus signs (+).
- Special characters are encoded as a percent sign followed by the
character's two-digit hexadecimal code (%xx).
Some of the commonly encoded special characters are:
Character   URL Encoded String
------------------------------
&           %26
/           %2F
:           %3A
;           %3B
@           %40
~           %7E
Example: Form Data (two VALUES are entered by the visitor)
NAME Attribute    VALUE Attribute
----------------------------------
studentname       David Smith
age               21
Encoded data: studentname=David+Smith&age=21
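The encoding rules can be sketched in a few lines of Perl. The helper names url_encode and encode_form are hypothetical, and the sketch assumes the input is a string of single-byte characters. Normally the browser does this work for you; the code is only meant to show what happens to the data.

```perl
use strict;
use warnings;

# Hypothetical helper: encode one name or value.
# Letters, digits, and - _ . * are left alone; a space becomes +;
# every other byte becomes %XX (two uppercase hex digits).
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.* ])/sprintf("%%%02X", ord($1))/eg;
    $text =~ tr/ /+/;
    return $text;
}

# Hypothetical helper: join name/value pairs into one encoded string.
sub encode_form {
    my @fields = @_;    # flat list: name, value, name, value, ...
    my @pairs;
    while (my ($name, $value) = splice(@fields, 0, 2)) {
        push @pairs, url_encode($name) . '=' . url_encode($value);
    }
    return join('&', @pairs);
}

print encode_form(studentname => 'David Smith', age => 21), "\n";
# prints: studentname=David+Smith&age=21
```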
Processing of Form Data
The easiest and most common way to get input from your visitors
is with a form on your Web page. The following figure gives a basic
description of how CGI works.
***********   (1)    ***********   (2)    ***********
*   Web   * -------> *   Web   * -------> *   CGI   *
* browser * <------- * server  * <------- * script  *
***********   (4)    ***********   (3)    ***********
Steps:
- Browser's request
A visitor fills out the form in the browser. When the Submit
button is clicked, the form data is encoded and sent from the
browser to the server using one of two request methods: GET or
POST. The POST method carries the data in the body of the request
rather than in the URL, so it is not subject to URL length limits,
and it is the one generally used.
- Data available to a CGI script
Form data sent via the GET method is available to the CGI script
in the %ENV environment hash, using the hash key QUERY_STRING. Data
transferred using the POST method is available to the CGI script in the
input buffer represented by the filehandle STDIN, and the exact amount
of data in the standard input, in bytes, is stored in CONTENT_LENGTH.
Most CGI scripts get their input through the POST method, but the
QUERY_STRING variable is handy for special circumstances, such as passing
encoded data appended to the URL in the ACTION attribute of the
<FORM> tag or to the URL in an <A> tag. The appended data begins with a
question mark (?).
- Output from the script
The data available to the script is encoded as a long stream of
name=value pairs separated by ampersands. The CGI script parses
the stream of input data, processes the data, and outputs the requested
information formatted in HTML. The CGI script is responsible for
generating the response header
Content-type: text/html
followed by a blank line that terminates the header transmission
(Perl statement: print "Content-type: text/html\n\n";).
The response header tells the browser that the data returned by the
server is HTML text.
- Server's response
The server sends the HTML document to the browser. The browser
interprets the HTML document and displays the Web page in the
browser window.
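The output step can be sketched as a very small script: the response header, the blank line, and then the HTML document. The helper name html_response is hypothetical, and the sketch assumes the title and body are fixed text that needs no HTML escaping.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the complete response for a given page.
sub html_response {
    my ($title, $body) = @_;
    return "Content-type: text/html\n\n"          # header + blank line
         . "<html><head><title>$title</title></head>\n"
         . "<body>$body</body></html>\n";
}

print html_response('Hello', '<p>Hello from a CGI script.</p>');
```

If the blank line after the header is missing, the server cannot tell where the header ends and the document begins, and it typically reports an error instead of returning the page.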
When the Submit button is clicked, the browser collects all the
form data and sends the data to the server in a long stream of encoded
name=value pairs separated by ampersands.
Examples:
number1=48&operator=mul&number2=7
studentname=Mary+Smith&class=Junior
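To see what parsing such a stream involves, here is a minimal sketch that decodes the first example into a hash and uses the result. The helper name decode_form is hypothetical. Unlike the course's parsing subroutines, it takes the stream as an argument, and when a name is repeated it simply keeps the last value.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one encoded stream into a hash reference.
sub decode_form {
    my ($stream) = @_;
    my %data;
    foreach my $pair (split /&/, $stream) {
        next unless length $pair;                  # skip empty pairs
        my ($key, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ($key, $value) {
            tr/+/ /;                               # + back to a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;   # %XX back to a character
        }
        $data{$key} = $value;
    }
    return \%data;
}

my $form = decode_form('number1=48&operator=mul&number2=7');
print $form->{number1} * $form->{number2}, "\n" if $form->{operator} eq 'mul';
# prints: 336
```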
For the stream of encoded form data to be useful, a CGI script
must be used to parse the encoded data. Form parsing scripts
serve as the backbone of CGI scripts that process form data.
A form parsing script called Parse_Form_POST is shown below.
The Parse_Form_POST script parses the encoded form data sent
from the browser and places name-value pairs in a hash called %formdata.
Each NAME attribute from the HTML form corresponds to a key in
the %formdata hash and each VALUE attribute (or data typed by the visitor
in text boxes) corresponds to a value in the %formdata hash. You
need to include this form parsing script as a subroutine in your Perl
CGI scripts.
The Parse_Form_POST script
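# The subroutine is in the /cgi-bin/subroutines.lib file
# For parsing form data sent by the POST method
sub Parse_Form_POST
{
    # Read exactly CONTENT_LENGTH bytes of posted data from standard input
    read (STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    @pairs = split(/&/, $buffer);
    foreach $pair (@pairs)
    {
        # Split on the first = only
        ($key, $value) = split (/=/, $pair, 2);
        $value = "" unless defined $value;
        # Decode: + becomes a space, %xx becomes the character with that code
        $key =~ tr/+/ /;
        $key =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        # A repeated name (e.g. checkboxes) gets its values joined by commas
        if (exists $formdata{$key})
        {
            $formdata{$key} .= ", $value";
        }
        else
        {
            $formdata{$key} = $value;
        }
    }
}
1;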
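The if/else at the end of the parsing loop decides what happens when the same NAME arrives more than once, as with a group of checkboxes. The following sketch isolates that step. The helper name add_field and the field name hobby are hypothetical. The sketch tests for the key with exists, so a first value of 0 or an empty string still counts as present.

```perl
use strict;
use warnings;

# Hypothetical helper: the first value for a key is stored as-is,
# later values for the same key are appended after a comma and a space.
sub add_field {
    my ($formdata, $key, $value) = @_;
    if (exists $formdata->{$key}) {
        $formdata->{$key} .= ", $value";
    } else {
        $formdata->{$key} = $value;
    }
}

my %formdata;
add_field(\%formdata, 'hobby', $_) for ('chess', 'tennis');
print "$formdata{hobby}\n";                  # prints: chess, tennis

# The joined value can be split back into a list when needed.
my @hobbies = split /, /, $formdata{hobby};
```

Splitting on ", " only recovers the original values if none of them contains a comma followed by a space.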
The Parse_Form script
# The subroutine is in the /cgi-bin/subroutines.lib file
# For parsing form data sent by either the POST or GET method
sub Parse_Form
{
    if ($ENV{'REQUEST_METHOD'} eq 'GET')
    {
        @pairs = split(/&/, $ENV{'QUERY_STRING'});
    }
    elsif ($ENV{'REQUEST_METHOD'} eq 'POST')
    {
        read (STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
        @pairs = split(/&/, $buffer);
        # Also pick up any data appended to the URL in the ACTION attribute
        if ($ENV{'QUERY_STRING'})
        {
            @getpairs = split(/&/, $ENV{'QUERY_STRING'});
            push(@pairs,@getpairs);
        }
    }
    else
    {
        print "Content-type: text/html\n\n";
        print "Use POST or GET";
        return;
    }
    foreach $pair (@pairs)
    {
        # Split on the first = only
        ($key, $value) = split (/=/, $pair, 2);
        $value = "" unless defined $value;
        # Decode: + becomes a space, %xx becomes the character with that code
        $key =~ tr/+/ /;
        $key =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        # Strip HTML comments from the value (assumed intent: the pattern
        # is empty in the source listing)
        $value =~ s/<!--.*?-->//gs;
        # A repeated name (e.g. checkboxes) gets its values joined by commas
        if (exists $formdata{$key})
        {
            $formdata{$key} .= ", $value";
        }
        else
        {
            $formdata{$key} = $value;
        }
    }
}
1;
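Because a parsing subroutine takes its input from %ENV and STDIN, you can try it from the command line by supplying both yourself. The following is a minimal sketch of that idea. The helper name with_fake_request is hypothetical, the anonymous subroutine is only a stand-in for a real parser, and the body is assumed to be a string of single-byte characters.

```perl
use strict;
use warnings;

# Hypothetical test helper: run $code as if the server had delivered
# a request with the given method, query string, and body.
sub with_fake_request {
    my ($method, $query, $body, $code) = @_;
    local $ENV{REQUEST_METHOD} = $method;
    local $ENV{QUERY_STRING}   = $query;
    local $ENV{CONTENT_LENGTH} = length $body;
    local *STDIN;                          # restored when the sub returns
    open(STDIN, '<', \$body) or die "cannot open in-memory input: $!";
    return $code->();
}

# Stand-in for a parser: read the body using CONTENT_LENGTH, then
# append the query string, as a POST with data in the URL would need.
my $seen = with_fake_request('POST', 'course=335', 'age=21', sub {
    read(STDIN, my $buffer, $ENV{CONTENT_LENGTH});
    return "$buffer&$ENV{QUERY_STRING}";
});
print "$seen\n";    # prints: age=21&course=335
```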