General Computing
Web Programming
This page deals with HTTP (the Web server protocol), CGI (programs that can be run from
a browser) and Perl (the string manipulation language of choice for most Web programmers).
See Web Authoring if you want to know what
to actually put in a Web document.
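As a rough illustration of what a CGI program does, the sketch below reads the query string that the Web server places in the QUERY_STRING environment variable and writes a header, a blank line and an HTML body. It is written in Python only to keep the example short; the scripts listed on this page are mostly Perl. The function name build_response and the form field "name" are assumptions made for the example and do not come from any of the packages listed here.

```python
import os
from html import escape
from urllib.parse import parse_qs


def build_response(query_string):
    """Return a complete CGI response (header, blank line, body) as a string."""
    params = parse_qs(query_string)
    # "name" is a hypothetical form field used only for this illustration.
    name = params.get("name", ["world"])[0]
    # Escape the value so that user input cannot inject markup into the page.
    body = "<html><body><p>Hello, {}!</p></body></html>".format(escape(name))
    # A CGI response is one or more header lines, an empty line, then the document.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The Web server passes the part of the URL after "?" in QUERY_STRING.
    print(build_response(os.environ.get("QUERY_STRING", "")), end="")
```

Run from a server's CGI directory, a request ending in ?name=Ada would produce a page greeting Ada; with no query string the default value is used.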
CGI and HTTP
- CGI Resource Index. Archive and catalog of
CGI scripts, documentation and resources.
- Matt's Script Archive. Many useful
Perl and CGI scripts. Includes the Simple Search Perl script, which is used here by members
of the mathematics department. Matt Wright.
- CGI City. A large collection of CGI/Perl
scripts and links to documentation.
- Free CGI Collection. A collection of
CGI scripts including Intermediate Search, which is a more developed version of Matt
Wright's Simple Search, and Xavatoria
Search, which is an indexing search engine for larger sites. Both behave much like
AltaVista. Fluid Dynamics.
- Extropia.com. A well documented and rich
collection of CGI scripts available for download. Has grown out of earlier work by Selena
Sol and Gunther Birznieks.
- Gossamer Threads. Several shareware Perl
scripts, including Links, which is designed to maintain a Yahoo-style directory of links.
- MHonArc. An email-to-html
converter written in Perl, suitable for archiving a mailing list. Its main competitor is
Hypermail, a C program written by Kevin Hughes, but that seems to be no longer actively
supported. Earl Hood, University of California, Irvine.
- Majordomo. One of the Internet's most popular
free mailing list management packages. Written in Perl. Great Circle Associates.
- WWWiz Magazine. Online version of the magazine devoted
to the Web.
- Web Developer's Virtual Library. Over 500 pages of
web development resources.
- WebSoft. A project of the Department
of Information and Computer Science at the University of California, Irvine, investigating
the use of the Web as the infrastructure for a software engineering environment. The
project has produced the following tools:
- MOMspider. A web-roaming
robot for maintenance of wide area webs.
- libwww-perl. A library of
Perl packages which provides a simple and consistent programming interface to the Web.
- wwwstat. A package for
processing a sequence of httpd access_log files and printing a log summary in HTML format
suitable for publishing or further analysis.
- Internet Count. A free service which can provide
count statistics for your site.
- Xenu's Link Sleuth. Software to
find broken links on a web site. Tilman Hausherr.
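The kind of log summary described for wwwstat above can be sketched in a few lines. This is not wwwstat's own code or output format, just a minimal illustration which assumes that the access_log lines follow the Common Log Format; the function names summarize and to_html and the sample log lines are made up for the example.

```python
import re
from collections import Counter
from html import escape

# Common Log Format: host ident user [time] "request" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize(lines):
    """Count requests per path and per status code, skipping malformed lines."""
    paths, statuses = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        parts = match.group("request").split()
        # A request looks like 'GET /index.html HTTP/1.0'; keep the path.
        path = parts[1] if len(parts) >= 2 else "(unknown)"
        paths[path] += 1
        statuses[match.group("status")] += 1
    return paths, statuses


def to_html(paths, top=10):
    """Render the most requested paths as a small HTML table."""
    rows = "".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(escape(path), count)
        for path, count in paths.most_common(top)
    )
    return "<table><tr><th>Path</th><th>Requests</th></tr>{}</table>".format(rows)


# Hypothetical sample lines, invented for this illustration.
sample = [
    '192.0.2.1 - - [10/Oct/1998:13:55:36 +1000] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.2 - - [10/Oct/1998:13:56:01 +1000] "GET /index.html HTTP/1.0" 304 -',
    '192.0.2.1 - - [10/Oct/1998:13:57:12 +1000] "GET /missing.html HTTP/1.0" 404 209',
]
paths, statuses = summarize(sample)
print(to_html(paths))
```

In practice the lines would be read from the server's access_log file rather than from a list in the script.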
Site Search Engines
The following search engines consist of just one or two Perl scripts, and are suitable
for small to medium sites.
- Simple Search. A very
small, elegant search engine. Easy to install, modify and customize. Does not use an index
file, so the search is always up-to-date. Suitable for sites containing a few hundred
files (on this site, Simple Search takes about 14 seconds to search 5MB of text in about
1000 documents). Matt Wright.
- ICE. An indexing search engine. Easy to
install, very fast, and produces a compact index file (on this site, the index file is 12%
of the size of the original files, and searches are nearly instantaneous). Does not search
the titles of html documents, although this is easily fixed if you know Perl. Excellent
installation documentation is provided by webreference.internet.com,
although this has not been updated for Version 2. Christian Neuss.
- Xavatoria Search. An
indexing search engine. Easy to install. Its unusually powerful search syntax is modelled
on AltaVista's simple search, and includes phrases, wildcards, requires/excludes and so
on. As with AltaVista, output of results is paged and includes a 2-line summary of each
document. On the downside, it is slower than ICE and produces a relatively large index file
(on this site, the index file is 49% of the size of the original files and searches take
less than 2 seconds). The Statistical Science Web uses a much modified version of
Xavatoria which reduces the index file to about 33% of the size of the original files. Fluid
Dynamics.
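The difference between the two approaches in the list above (reading every file at query time, as Simple Search does, versus consulting a prebuilt index file, as ICE and Xavatoria do) can be sketched as follows. This is a minimal illustration and not the code of any of these engines; the document names and contents are invented, and a real engine would store its index on disk rather than in memory.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def scan_search(documents, term):
    """Search by reading every document at query time (no index file)."""
    term = term.lower()
    return sorted(name for name, text in documents.items()
                  if term in WORD.findall(text.lower()))


def build_index(documents):
    """Build an inverted index mapping each word to the documents containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in WORD.findall(text.lower()):
            index[word].add(name)
    return index


def index_search(index, term):
    """Answer a query with a single lookup in the prebuilt index."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical documents, invented for this illustration.
documents = {
    "a.html": "Perl and CGI scripts",
    "b.html": "Search engines written in C",
    "c.html": "A Perl search engine",
}
index = build_index(documents)
print(scan_search(documents, "perl"))
print(index_search(index, "perl"))
```

Both functions return the same documents. The scan is always up to date but its cost grows with the total amount of text, while the index lookup is fast but the index must be rebuilt whenever the files change and takes up extra disk space.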
The search engines below are written in C with Web interfaces in Perl. Installation is
more involved than for the engines above, but indexing and searching are generally faster.
These engines are typically server orientated. That is, they are designed for installation
by Web server administrators, with individual users able to configure their own index
files.
- Excite for Web Servers. Perhaps
the slickest of the freely available search engines. Features a fuzzy search style, and
the ability to find "more like this". For literal key word searches, Glimpse or
SWISH for example may be more suitable. Index file is about 15% of the size of the
original files. Comes ready compiled, and with very complete installation assistance.
However, you do need to have root privileges on your Web server to install it. The Byte
Magazine site is a good example of Excite in use.
- Glimpse. A powerful indexing and query
system. Returns not only document names but the lines of each document in which keywords
were found. With WebGlimpse, provides search capabilities for a Web site. WebGlimpse
automatically adds a search box to every html file in your site, and allows searches of
the "neighbourhood" of each file. Index file can be very small or moderately
large depending on the options you choose. The HIV
InfoWeb is a nice example of Glimpse (not WebGlimpse) in action. Udi Manber, Sun Wu
and Burra Gopal, University of Arizona.
- ht://Dig. A popular search engine developed at San
Diego State University and used by a number of North American universities. Includes its
own spider and can index files through the http server, allowing you to index pages
produced dynamically by CGI programs. Like Glimpse, can display the context of successful
keywords. Andrew Scherpbier.
- SWISH-E. Simple Web Indexing System
for Humans - Enhanced. One of the most popular web site search engines. Originally by
Kevin Hughes, EIT, now enhanced by a team at the Berkeley Digital Library SunSITE,
University of California. Of the search engines in this section, this is the easiest to
install if you want to index only your own files rather than an entire server. Well
documented, even with its own mailing list. Can limit searches to specified html tags,
such as meta tags, titles or headings. Claims the index file is 2-5% of the size of the
original files, but on this site the index is 25% of the size of the original files.
- SWISH++. A new version of SWISH
written in C++. Testing by the author suggests that
it is substantially faster than any of the other search engines listed here.
Documentation is terse but adequate. Installation requires an up-to-date C++ compiler.
Paul Lucas.
- Swish-Web. An alternative
web interface for the SWISH search engine. Rod Clark, Small Hours.
- Webinator. A new indexing search
engine, free for the first 10,000 documents. Already has a good number of high profile
users. You need root privileges to install it. Thunderstone.
For high end commercial search engines capable of handling very large sites, see the
reviews by the US Department of Education
and Network Computing Magazine.
- Ultraseek Server. Seems
to be the current winner amongst the commercial search engines. Easier to install and with
more features than any of the search engines above. On this site Ultraseek writes an index
file of about 11% of the size of the original files, and uses another 9MB of disk space
for program and documentation files. Ultraseek can be seen in action on the NewsWorks, Sun
Microsystems and CNN Interactive sites.
For information on other search engines, see the survey by Search Engine Watch.
Perl
- Perl Language Home Page. The
definitive site, but orientated towards experts rather than beginners. Tom Christiansen.
- Perl Tutorial. A
nicely written first tutorial, written around a single example, mailing the output from a
form. National Center for Supercomputing Applications, University of Illinois at
Urbana-Champaign.
- Perl Notes. CGI Programming techniques in Perl, by Selena Sol
and Gunther Birznieks. Designed as a reference, but good as a second tutorial.
- Perl 5 Reference Guide. The Perl 5 Desktop Reference, by
Johan Vromans, HTMLified by Rex Swain.
- cgi-lib.pl. "The de facto standard
library for creating CGI scripts in the Perl language." I must admit that I haven't
actually used this (or any other) Perl library, and the Perl Language Home Page prefers
newer libraries. Steven Brenner, Cambridge University.
- Robin's Perl for Win32 Page.
Includes socket programming examples. Robin Chatterjee.
- Simple Perl Databases. An online
tutorial with example Perl scripts for maintaining and searching a simple database. Brent
Michalski.
Web Site Hosting
- Aunic. Registration of .au domain names.
- InterNIC. Registration of North American domain
names.
- AumCom. Offers a complete web site hosting service.
Uses the world's highest bandwidth facility, the "Giga-Center" in Silicon
Valley, California.