Section 1: Basic Questions

This section aims to deal with basic questions, addressing the role and
nature of CGI, and its place in Web programming. Questions/answers which
just don't appear to 'fit' under any other section may also be included
here.

1.1: What is CGI?

[ from the CGI reference http://hoohoo.ncsa.uiuc.edu/cgi/overview.html ]

The Common Gateway Interface, or CGI, is a standard for external
gateway programs to interface with information servers such as HTTP servers.
A plain HTML document that the Web daemon retrieves is static,
which means it exists in a constant state: a text file that doesn't change.
A CGI program, on the other hand, is executed in real-time, so that it
can output dynamic information.

[Table of Contents] [Index]

1.2: Is it a script or a program?

The distinction is semantic.   Traditionally, compiled executables
(binaries) are called programs, and interpreted programs are usually
called scripts.   In the context of CGI, the distinction has become
even more blurred than before.   The words are often used interchangably
(including in this document).   Current usage favours the word "scripts"
for CGI programs.

[Table of Contents] [Index]

1.3: When do I need to use CGI?

There are innumerable caveats to this answer, but basically any
Webpage containing a form will require a CGI script or program
to process the form inputs.

[Table of Contents] [Index]

1.4: Should I use CGI or JAVA?

[answer to this non-question hopes to try and reduce the noise level of
the recurrent "CGI vs JAVA" threads].

CGI and JAVA are fundamentally different, and for most applications
are NOT interchangable.

CGI is a protocol for running programs on a WWW server.
Typical applications include accessing a database, submitting
an order, or posting messages to a bulletin board.
JAVA is a programming language, and is an alternative to C, Perl, etc
rather than to CGI.

Java was designed for network applications, and its close association
with the Web stems from its ability to run programs safely on the
Client machine, and its adoption in browsers (especially Netscape).

In certain instances the two may be combined in a single application:
for example a JAVA applet to define a region of interest from a
geographical map, together with a CGI script to process a query
for the area defined.

[Table of Contents] [Index]

1.5: Should I use CGI or SSI or ... { PHP/ASP/... }

CGI and SSI (Server-Side Includes) are often interchangable, and it may
be no more than a matter of personal preference.   Here are a few
guidelines:
  1) CGI is a common standard agreed and supported by all major HTTPDs.
     SSI is NOT a common standard, but an innovation of NCSA's HTTPD
     which has been widely adopted in later servers.   CGI has the
     greatest portability, if this is an issue.
  2) If your requirement is sufficiently simple that it can be done
     by SSI without invoking an exec, then SSI will probably be
     more efficient.   A typical application would be to include
     sitewide 'house styles', such as toolbars, netscapeised <body>
     tags or embedded CSS stylesheets.
  3) For more complex applications - like processing a form -
     where you need to exec (run) a program in any case, CGI
     is usually the best choice.

Many more recent variants on the theme of SSI are now available.
Probably the best-known are PHP which embeds server-side scripting
in a pre-html page, and ASP which is a Microsoft proprietary API
with the backing of MS marketing.

[Table of Contents] [Index]

1.6: Should I use CGI or an API?

APIs are proprietary programming interfaces supported by particular
platforms.   By using an API, you lose all portability.   If you know
your application will only ever run on one platform (OS and HTTPD),
and it has a suitable API, go ahead and use it.   Otherwise stick to CGI.

[Table of Contents] [Index]

1.7: So what are in a nutshell the options for webserver programming?

Too many to enumerate - but I'll try and summarise.  Briefly, there
are several decisions you have to make, including:
  * Power.  Is it up to a complex task?
  * Complexity.  How much programming manpower is it worth?
  * Portability.  Might you want to run your program on another system?

So here's an overview of the main options.  It's inevitably subjective,
but may be helpful to someone:

			Power		Complexity	Portability
Basic SSI:		Low		Low		Medium
Enhanced SSI[2]:	Medium		Medium		Low
CGI:			High		Medium		High[4]
Enhanced CGI-like[5]:	High		Medium		Medium[6]
Server API:		v.High		High		Low
Servlets[7]:		High		High		Medium

[2] For example, PHP, ASP.
[4] Subject to choice of programming language.
[5] For example: mod_perl or fastcgi, both of which can improve
    efficiency with respect to standard CGI.
[6] You port them by converting to standard CGI
[7] Servlets are really a server API, and make sense if you're already
    running a JAVA VM, but will probably hit your server hard if not.

[Table of Contents] [Index]

1.8: What do I absolutely need to know?

If you're already a programmer, CGI is extremely straightforward, and just
three resources should get you up to speed in the time it takes to read them:
  1) Installation notes for your HTTPD.   Is it configured to run CGI
     scripts, and if so how does it identify that a URL should be executed?
     (Check your manuals, READMEs, ISP webpages/FAQS, and if you still can't
     find it ask your server administrator).
  2) The CGI specification at NCSA tells you all you need to know
     to get your programs running as CGI applications.
     http://hoohoo.ncsa.uiuc.edu/cgi/interface.html
  3) WWW Security FAQ.   This is not required to 'get it working', but
     is essential reading if you want to KEEP it working!
     http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html

If you're NOT already a programmer, you'll have to learn.   If you would
find it hard to write, say, a 'grep' or 'cat' utility to run from the
commandline, then you will probably have a hard time with CGI.   Make
sure your programs work from the commandline BEFORE trying them with CGI,
so that at least one possible source of errors has been dealt with.

[Table of Contents] [Index]

1.9: Does CGI create new security risks?

Yes.   Period.
There is a lot you can do to minimise these.   The most important thing
to do is read and understand Lincoln Stein's excellent WWW security
FAQ, at http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html .

[Table of Contents] [Index]

1.10: Do I need to be on Unix?

No, but it helps.   The Web, along with the Internet itself, C, Perl,
and almost every other Good Thing in the last 20 years of computing,
originated in Unix.   At the time of writing, this is still the
most mature and best-supported platform for Web applications.

[Table of Contents] [Index]

1.11: Do I have to use Perl?

No - you can use any programming language you please.   Perl is simply
today's most popular choice for CGI applications.   Some other widely-
used languages are C, C++, TCL, BASIC and - for simple tasks -
even shell scripts.

Reasons for choosing Perl include its powerful text manipulation
capabilities (in particular the 'regular' expression) and the fantastic
WWW support modules available.

[Table of Contents] [Index]

1.12: What languages should I know/use?

It isn't really that important.  Use what you're comfortable with,
or what you're constrained (eg by your manager) to use.

If you're just dabbling with programming, Perl is a good choice, simply
because of the wealth of ready-to-run Perl/CGI resources available.

If you're serious about programming, you should be at home in a
range of languages.  C, the industry standard, is a must (at least to
the level of comfortably reading other people's code).  You'll
certainly want at least one scripting language such as Perl, Python
or Tcl.  C++ is also a very good idea.

In response to a Usenet newbie question:
>  I am seriously wanting to learn some CGI programming languages

J.M. Ivler wrote some eloquent words of wisdom:
> If you want to learn a programming language, learn a programming language.
> If you want to learn how to do CGI programming, learn a programming
> language first.
> 
> My book is one of the few that tackles two languages at the same time.
> Why? because it's not about languages (which are just syntax for logic).
> CGI programming is about programming, and how to leverage the experience
> for the person coming to the site, or maintaining the site, or in some way
> meeting some requirements. Language is just a tool to do so.

[Table of Contents] [Index]

1.13: Do I have to put it in cgi-bin?

see next question

[Table of Contents] [Index]

1.14: Do I have to call it *.cgi? *.pl?

Maybe.   It depends on your server installation.

These types of filenames are commonly used conventions - no more.
It is up to the server administrator whether or not CGI scripts are
enabled, and (if so) what conventions tell the server to run or
to print them.

If you are running your own server, read the manual.
If you're on ISP or other rented webspace, check their webpages for
information or FAQs.   As a last resort, ask the server administrator.

[Table of Contents] [Index]

1.15: What is the "CGI Overhead", and should I be worried about it?

The CGI Overhead is a consequence of HTTP being a stateless protocol.
This means that a CGI process must be initialised for every "hit"
from a browser.

In the first instance, this usually means the server forking a
new process.  This in itself is a very small overhead, but it
can become important on a heavily-used server if the number of
processes grows to problem levels.  If the CGI programs are
themselves long-running, this is heavily exacerbated.

In the second place, the CGI program must initialise.  In the
case of a compiled language such as C or C++ this is negligible,
but there is a penalty to pay for scripting languages such as Perl.

Thirdly, CGI is often used as 'glue' to a backend program, such as
a database, which may take some considerable time to initialise.
This represents a major overhead, which must be avoided in any
serious application.  The most usual solution is for the backend
program to run as a separate server doing most of the work, while
the actual CGI simply carries messages.

Fourthly, some CGI scripts are just plain inefficient, and may
take hundreds of times the resources they need.  Programs using
system() or `backtick` notation often fall into this category.

Note that there are ways to reduce or eliminate all these overheads,
but these tend to be system- or server-specific.  The best-supported
server is probably Apache, as commercial server-vendors like to push
their proprietary solutions in preference to CGI.

[Table of Contents] [Index]

1.16: What is CGIWrap, and how does it affect my program?

[ quoted from http://www.umr.edu/~cgiwrap/intro.html ]

> CGIWrap is a gateway program that allows general users to use CGI scripts
> and HTML forms without compromising the security of the http server.
> Scripts are run with the permissions of the user who owns the script. In
> addition, several security checks are performed on the script, which will not
> be executed if any checks fail. 
> 
> CGIWrap is used via a URL in an HTML document. As distributed, cgiwrap
> is configured to run user scripts which are located in the
> ~/public_html/cgi-bin/ directory. 

See http://www.umr.edu/~cgiwrap/

[Table of Contents] [Index]

1.17: How do I decode the data in my Form?

The normal format for data in HTTP requests is URLencoded.   All Form data
is encoded in a string, of the form
	param1=value1&param2=value2&...paramn=valuen
Many non-alphanumeric characters are "escaped" in the encoding:
the character whose hexadecimal number is "XY" will be represented by
the character string "%XY".

Decoding this string is a fundamental function of every CGI library.

Another format is "multipart/form-data", also known as "file upload".
You will get this from the HTML markup
<form method="POST" enctype="multipart/form-data">

(but note you must accept URLencoded input in any case, since not all
browsers support multipart forms).

Most(?) CGI libraries will handle this transparently.

[Table of Contents] [Index]