A mini-tutorial:

<URL:http://www.webthing.com/tutorials/login.html>

Login on the Web

Abstract

For largely historical reasons, the concept of login is not built into the Web, and is poorly supported. Implementing a system that supports login is harder than it might at first appear. In CGI it is a uniquely hard task, because (for security reasons) the authentication information is explicitly excluded from what the server passes to the program.

This tutorial gives an overview of login methods for secure [1] Web-based applications, followed by a more detailed description of how HTTP Authentication works. It then describes (with code extracts) a system that uses HTTP Authentication with CGI in a portable manner to implement a complex system of dynamic protections for a Web-based FileServer. We conclude with a few exercises, designed to consolidate the reader's understanding of the subject.

Constructing a Login System

HTTP is a stateless protocol, and login implies maintenance of state information, which must therefore be added on top of it. There are two main mechanisms [2,3] for this:

HTTP Authentication
Cookies

In terms of HTTP, these are very similar: both work by passing an additional header that contains state information. However, they are less similar to work with, and each has its own advantages and drawbacks.

Which to Use?

It is my considered opinion that for any serious Web application, this choice should be dictated by the impression you wish to present to the users of the system:

HTTP Authentication generates the familiar login/password browser authentication sequence. This has a relatively formal feel, and may be reassuring to users who need confidence in the security of their data. The main drawback is that it is essentially inflexible, and cannot easily be adapted to more complex tasks, such as presenting restricted (e.g. read-only) access to anonymous users.

Cookies may be presented as you see fit, but are typically set using a login/password HTML Form and CGI script. If security is a concern, you will have to encrypt passwords yourself, as neither the browser nor the server will do it for you. You may also have to work harder to inspire users' confidence in the security of your system. At the time of writing, cookies are also regarded with suspicion by many users, due to privacy concerns [4].

The chief advantage of cookies is flexibility: you are handling the whole process yourself and have more control. The ability to set a persistent cookie, and so save the user from having to re-authenticate each time she returns, may be a major advantage in some cases, but is inherently insecure if, for example, more than one user might be sharing a browser.
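
As a minimal sketch of the persistent-cookie option, the C++ fragment below builds the Set-Cookie header that a login script might emit. The function name, the cookie name user and the lifetime are illustrative assumptions, not part of any system described here, and a real scheme would carry a verifiable token instead of a bare username.

```cpp
#include <sstream>
#include <string>

// Build a Set-Cookie header line for a CGI response.
// max_age_seconds > 0 makes the cookie persistent; 0 leaves it as a
// session cookie that the browser discards on exit.
std::string set_cookie_header(const std::string& name,
                              const std::string& value,
                              long max_age_seconds) {
  std::ostringstream out;
  out << "Set-Cookie: " << name << "=" << value << "; Path=/";
  if (max_age_seconds > 0)
    out << "; Max-Age=" << max_age_seconds;
  out << "; HttpOnly";  // keep the cookie away from page scripts
  return out.str();
}
```

Leaving the lifetime at 0 is the safer choice wherever a browser may be shared, for the reason given above.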

Programming to work with both methods

A program may be written to work with either option. A construct I have used in a number of systems is represented by the Perl code:


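sub getuser {
  my $defaultuser = shift ;
  # The first true value wins: the server-authenticated user, then the
  # cookie user, then the caller's default. Failing all three, challenge.
  $ENV{'REMOTE_USER'} || &cookie_user || $defaultuser || &authenticate ;
}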

This little example summarises the difference in the approaches:

First, if the user is authenticated by HTTP, we use this. No programming effort required.
If not, we parse the Cookie (if any) for username/password. This is more work, because we must write routines to set and decode the authentication details. However, a small set of functions to manage cookie-users is a one-off reusable exercise.
If neither option is set, we fall back to behaviour specified in the program: either allow the user in as a default user, or (if we called getuser with no arguments) insist that she authenticates.
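
The same fallback chain can be sketched in C++. This is an illustration under stated assumptions, not the author's code: the cookie name user is hypothetical, the environment values are passed in as arguments so the logic can be exercised on its own, and the cookie is trusted without the password check a real system would need. An empty result stands for "nobody identified, so issue a challenge".

```cpp
#include <string>

// Return the value of cookie `name` from a Cookie header such as
// "theme=dark; user=alice", or "" if it is absent or empty.
std::string cookie_value(const std::string& header, const std::string& name) {
  std::size_t pos = 0;
  while (pos < header.size()) {
    std::size_t end = header.find(';', pos);
    if (end == std::string::npos) end = header.size();
    // Skip the whitespace that follows each ";" separator.
    while (pos < end && header[pos] == ' ') ++pos;
    const std::size_t eq = header.find('=', pos);
    if (eq != std::string::npos && eq < end &&
        header.compare(pos, eq - pos, name) == 0 && eq + 1 < end)
      return header.substr(eq + 1, end - eq - 1);
    pos = end + 1;
  }
  return "";
}

// First non-empty source wins: REMOTE_USER, then the cookie, then the
// caller's default. The pointers are what std::getenv("REMOTE_USER")
// and std::getenv("HTTP_COOKIE") would return (null if unset).
std::string get_user(const char* remote_user, const char* cookie_header,
                     const std::string& default_user) {
  if (remote_user && *remote_user) return remote_user;
  if (cookie_header) {
    const std::string from_cookie = cookie_value(cookie_header, "user");
    if (!from_cookie.empty()) return from_cookie;
  }
  return default_user;  // may be empty: the caller must then authenticate
}
```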

We'll return to a variant on this construct later.

In summary:

In programming terms, working with cookies involves more initial effort, but is much easier to extend to complex applications. Cookies are usually my preferred choice for ordinary Intranet systems.

In human terms, I consider HTTP authentication usually preferable on the Web.

Server Mechanics

As with any programming task, there's more than one way to do it. You can customise a webserver - either directly or using an API if one is provided - or you can use CGI for a portable solution. This is of course purely an implementor's decision, and does not affect your users.

Login using HTTP Authentication with CGI

HTTP Authentication is built into HTTP Servers (and of course browsers). The underlying mechanism is:

1. The browser requests a document from the server.
2. The server issues an authentication challenge.
3. The browser prompts the user for credentials (typically via a username/password popup).
4. The browser sends a new request to the server, including the credentials (username and encoded password) entered.
5. The server validates the credentials supplied, and (if acceptable) returns the document requested.
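
To make the browser's second request concrete: under the Basic scheme the credentials are simply the string username:password in base64, which is an encoding and not encryption. The C++ sketch below builds the header a browser would send. The function names are illustrative and belong to no library mentioned here.

```cpp
#include <string>

// Standard base64 encoding with '=' padding.
std::string base64_encode(const std::string& in) {
  static const char tbl[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  std::size_t i = 0;
  // Whole groups of three bytes become four output characters.
  for (; i + 2 < in.size(); i += 3) {
    const unsigned n = (static_cast<unsigned char>(in[i]) << 16) |
                       (static_cast<unsigned char>(in[i + 1]) << 8) |
                       static_cast<unsigned char>(in[i + 2]);
    out += tbl[(n >> 18) & 63];
    out += tbl[(n >> 12) & 63];
    out += tbl[(n >> 6) & 63];
    out += tbl[n & 63];
  }
  if (i + 1 == in.size()) {         // one byte left over
    const unsigned n = static_cast<unsigned char>(in[i]) << 16;
    out += tbl[(n >> 18) & 63];
    out += tbl[(n >> 12) & 63];
    out += "==";
  } else if (i + 2 == in.size()) {  // two bytes left over
    const unsigned n = (static_cast<unsigned char>(in[i]) << 16) |
                       (static_cast<unsigned char>(in[i + 1]) << 8);
    out += tbl[(n >> 18) & 63];
    out += tbl[(n >> 12) & 63];
    out += tbl[(n >> 6) & 63];
    out += '=';
  }
  return out;
}

// The request header carrying Basic credentials.
std::string basic_authorization(const std::string& user,
                                const std::string& password) {
  return "Authorization: Basic " + base64_encode(user + ":" + password);
}
```

Because anyone who sees the header can reverse the encoding, Basic credentials are only as private as the connection that carries them.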

Scope and Duration of Authentication

If the credentials were accepted once, the browser will return the same credentials in future requests to the server for the duration of the session [5]. As far as the server is concerned, this may be interpreted as "until told otherwise".
The first three steps will thus normally be omitted after the browser has authenticated successfully once.
Credentials are regarded by the browser as valid for other URLs in the same hierarchy (typically a directory) as one that has validated successfully, and are supplied automatically when requesting another URL.
The Server can permit a particular set of credentials (i.e. user) to access some areas and deny access to others, by rejecting the credentials when another access is attempted. This restarts the authentication process at step (2) above.

Setting up HTTP Authentication

From the above, we see that HTTP Authentication does not merely support a simple login scheme: it is one. Setting it up is a server configuration issue, and is completely transparent to CGI, just as it is to a static document - be it HTML or any other media type. You will need to consult your server manuals for details of how to configure protection, but the key point to remember is that your CGI program will only ever run when the User has already been authenticated by the Server.

If you are serving static documents, or indeed dynamic documents whose permissions can be determined in advance (e.g. certain specified user(s) or group(s) are always permitted), you generally can and should use the Server's configuration in preference to CGI.

The User's identity (the username entered in the browser dialogue box) is available to CGI in the REMOTE_USER environment variable. No other information is available to CGI[6], due to security risks (although 'extended' CGI-like tools may sometimes give you this information anyway).
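
Reading the variable takes very little code. The following C++ sketch (the function name and the wording of the messages are hypothetical) builds a complete CGI response from whatever std::getenv("REMOTE_USER") returned, and copes with the variable being unset.

```cpp
#include <string>

// Build a complete CGI response that reports who is logged in.
// remote_user is the result of std::getenv("REMOTE_USER"): a null
// pointer (or empty string) means the request was not authenticated.
std::string whoami_response(const char* remote_user) {
  const std::string body =
      (remote_user && *remote_user)
          ? std::string("You are logged in as ") + remote_user + ".\n"
          : std::string("You are browsing anonymously.\n");
  // The blank line separates the CGI headers from the document body.
  return "Content-type: text/plain\n\n" + body;
}
```

A program would typically print the result with std::cout << whoami_response(std::getenv("REMOTE_USER")); from its main function, having included <cstdlib> and <iostream>.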

Hence what HTTP Authentication gives you automatically is:

Users are required to supply a valid username and password to access a designated area of your Server.
Different URLs may permit different users. Most servers also support Groups.
A User ID is supplied to CGI programs.

For many purposes, this alone is perfectly adequate. However, what it does not provide for includes:

Logout (cancelling of credentials)
Mixed (authenticated and unauthenticated) access to a common URL
Dynamic determination of document permissions

The first two of these are in fact impossible [7], and must be worked around using either server configuration or CGI (as before, server configuration should be preferred if it can do what you need). The third can be accomplished with CGI, but has no non-programming alternatives.

How-to . . .

Logout: HTTP has no provision to cancel a user's credentials, and there is no general[8] way to do so. The workaround is to overwrite the user's credentials with those of another valid user at your site. Create a valid but unprivileged user ID, and a Logout URL which is permitted only to this user. This URL is now a logout button. This of course still leaves you the human task of persuading your users to use it.
Mixed Access: It is not possible[7] to provide open and authenticated access to a single URL. However, it is entirely possible to offer mixed access to a document or program, by mapping a protected and an unprotected URL to the same document or program. In the case of a program, it may of course behave differently according to the value of REMOTE_USER (which is set only when access is authenticated).
Dynamic Permissions: I'm not even going to try and talk generalities about this. Instead I'll outline a working system.

An Example using Dynamic Authentication

The File Manager component of the Virtual Desktop at <URL:http://www.webthing.com/> has a fairly complex dynamic authentication requirement, requiring permissions to be computed from a database. The authentication function is required to determine:

The logged in user. This is dealt with by the HTTPD, as described below.
The owner of the file area being accessed. This is an argument to every call to the file manager, and so easy to determine.
The protection of the file area being accessed. This is an attribute of every Directory and File in the Virtual Desktop (Attachments inherit the attributes of their parent File), and may be Public, Private or Workgroup. They may be changed - by the owner or another authorised workgroup member - at any time.
The authorization level at which the logged in user accesses the owner's desktop. This is determined from the owner's Workgroup file, and may permit the user read-only or read/write access to Workgroup-protected areas, and may also be updated by the owner at any time.

Having cross-referenced these, it must either allow the attempted operation, or permit the user to re-authenticate (if access is denied, it may be available to the user under a different userid, so the user is immediately invited to re-login).

Implementation Details

The first decision was to use HTTP Authentication, for the reasons already described. To do this, I protected the /desk/ URL under which the file manager resides, using a .htaccess[9] file:


AuthType Basic
AuthName WebThing
AuthDBMUserFile		/my/path/to/passwdfile
AuthDBMGroupFile	/my/path/to/passwdfile
require valid-user

When the server receives a request for a URL under the directory protected by this configuration file, it will:

Require a valid user. That is to say the browser must supply an Authorization HTTP header, which the server will check against the entries in the AuthDBMUserFile (the DBM is used for efficiency [10]).
If a valid Authorization header is supplied, REMOTE_USER is set.
If a valid Authorization header is NOT supplied, the server issues an authentication challenge to the browser. We will see the anatomy of this in a minute, when I explain how to do the same thing with CGI.
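
The server does all of this itself, and CGI never sees the header. Purely to illustrate what the first check involves, here is a C++ sketch that unpacks a Basic Authorization header value into username and password. The function names are hypothetical, the scheme name is matched case-sensitively for brevity, and the lookup against the password file is not shown.

```cpp
#include <string>

// Decode standard base64. Returns "" if the input contains a
// character outside the base64 alphabet.
std::string base64_decode(const std::string& in) {
  static const std::string tbl =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  unsigned buffer = 0;
  int bits = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (in[i] == '=') break;  // padding: no more data follows
    const std::size_t v = tbl.find(in[i]);
    if (v == std::string::npos) return "";
    buffer = (buffer << 6) | static_cast<unsigned>(v);
    bits += 6;
    if (bits >= 8) {          // a whole byte is now available
      bits -= 8;
      out += static_cast<char>((buffer >> bits) & 0xFF);
    }
  }
  return out;
}

// Split a header value such as "Basic QWxh..." into user and password.
// Returns false unless the scheme is Basic and the decoded text
// contains the ':' separator.
bool parse_basic(const std::string& header_value,
                 std::string& user, std::string& password) {
  const std::string scheme = "Basic ";
  if (header_value.compare(0, scheme.size(), scheme) != 0) return false;
  const std::string decoded =
      base64_decode(header_value.substr(scheme.size()));
  const std::size_t colon = decoded.find(':');
  if (colon == std::string::npos) return false;
  user = decoded.substr(0, colon);
  password = decoded.substr(colon + 1);  // may itself contain ':'
  return true;
}
```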

Note that the server is permitting any valid user to access any desktop file: the more complex task of dynamic protections is handled by CGI. However, the Server has done the first crucial part of the work for us, by determining the identity of the user, and the rest is mere bookkeeping.

Authentication with CGI

The core of the CGI authentication is the authenticate method of the CGI++ Library (<URL:http://www.webthing.com/cgiplusplus/>). Here it is in full:


void CGI::authenticate(
	const char* authtype,
	const char* realm,
	void callback (const int) = 0
  ) const {
// The challenge is the scheme, a space, then realm as a quoted string
  cout	<< "Status: 401 Authentication Required\n"
	   "WWW-Authenticate: " << authtype
		<< " realm=\"" << realm  << "\"\n" ;

  if ( callback )
    callback(401) ;	// caller prints remaining headers and the errordoc
  else
    cout
	<< "Content-type: text/plain\n"
	"\nPlease enter your username and password to access this document.\n"
 ;
  exit(0) ;
}

The first two arguments to this are the same as the AuthType and AuthName directives from the Apache configuration file we saw earlier, and the first two lines output by CGI::authenticate() are equivalent (though not identical - this is not NPH-CGI) to the authentication challenge issued by the Server when no credentials were supplied. Specifically:

Status is a CGI header, that will be interpreted by the server and converted to a response (which may be HTTP/1.0, 1.1 or other according to the server version in use). Status 401 is the code to require authentication from the browser.
The authtype specifies the format the browser should use to encode the credentials entered by the user for transmission. Basic Authentication is the only method currently supported by today's browsers, although some servers also support the more secure Digest Authentication.
The realm is simply a name.

The rest of CGI::authenticate deals with printing a customised error document. Since CGI++ is a library, it has no knowledge of the application, and what kind of error document would be appropriate, so it permits the caller to supply a callback function for this. If no callback is supplied, CGI++ itself prints a minimalist 1-line errordoc.
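
A callback for this purpose might look like the following C++ sketch. The names error_document and errorfunc and the wording of the page are assumptions made for illustration. The one firm requirement, which follows from the listing above, is that the callback must finish the CGI headers itself, because authenticate() has printed only the Status and WWW-Authenticate lines by the time it is called.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Build the error document for a given HTTP status, including the
// Content-type header and the blank line that ends the CGI headers.
std::string error_document(int status) {
  std::ostringstream out;
  out << "Content-type: text/html\n\n"
      << "<html><head><title>Access denied</title></head><body>\n"
      << "<h1>Error " << status << "</h1>\n"
      << "<p>You are not authorised for this operation. "
         "Try logging in under a different user ID.</p>\n"
      << "</body></html>\n";
  return out.str();
}

// Callback with the signature that CGI::authenticate() accepts.
void errorfunc(const int status) {
  std::cout << error_document(status);
}
```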

Note that CGI error documents can only ever be seen by a logged in user attempting an unauthorised operation, since the CGI won't run in the first place until the user has authenticated with the Server.

Bookkeeping

With these basic building blocks in place, the complex authentication task has been reduced to mere bookkeeping, of the kind familiar to every programmer. In pseudo-code outline:


  if ( remote_user == owner )
	pass = true ;	// I'm accessing my own data
  else {

 // look up the level of access required to perform the required
 // operation on the specified data
    level = protection_of(required_op, specified_data) ;
    switch ( level ) {
      case Public:  pass = true ;  break ;	// anyone can do this
      case Private: pass = false ; break ;	// we've already dealt with the owner.
      default:
 // look up user's authorised level of access to owner's desktop
	pass = ( workgroups(owner).auth_level(remote_user) >= level ) ;
    }
  }

  if ( pass )
    do_what_i_asked ;	// authorized - do what was wanted
  else
// present the user with an authentication dialogue - permit re-login
    cgi.authenticate("Basic", "WebThing", errorfunc) ;
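
For readers who want something they can compile, the outline above might be fleshed out in C++ as follows. Everything here is an assumption made for illustration and not the Virtual Desktop's actual code: the Level enumeration and its ordering, the in-memory map standing in for the Workgroup files, and the convention that the caller has already looked up the level the operation requires.

```cpp
#include <map>
#include <string>

// Ordered so that a higher value demands (or grants) more.
// Public doubles as "no workgroup rights" for non-members.
enum Level { Public = 0, ReadOnly = 1, ReadWrite = 2, Private = 3 };

// owner -> (workgroup member -> level granted by that owner)
typedef std::map<std::string, std::map<std::string, Level> > Workgroups;

// Level at which `user` may access `owner`'s desktop.
Level auth_level(const Workgroups& wg, const std::string& owner,
                 const std::string& user) {
  const Workgroups::const_iterator o = wg.find(owner);
  if (o == wg.end()) return Public;
  const std::map<std::string, Level>::const_iterator m = o->second.find(user);
  return m == o->second.end() ? Public : m->second;
}

// True if remote_user may perform an operation needing `required`.
bool authorised(const std::string& remote_user, const std::string& owner,
                Level required, const Workgroups& wg) {
  if (remote_user == owner) return true;  // accessing my own data
  switch (required) {
    case Public:  return true;            // anyone can do this
    case Private: return false;           // owner already dealt with
    default:      return auth_level(wg, owner, remote_user) >= required;
  }
}
```

When authorised() returns false, the program would go on to issue the 401 challenge so that the user can log in again under another user ID.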

Exercises

Exercises 1-8 should use only your Server's configuration file(s) - CGI is not required. For the remainder, use your choice of CGI programming language.

1. Create a password-protected document doc1.html on your server. Create a User ID user1 permitted to read the document.
2. Create a second password-protected document doc2.html that is protected and not readable by user1. Create a user user2 permitted to see doc2 but not doc1.
3. Now read the two documents in sequence - note what happens in your browser.
4. Create a third user user3 with permission to read both documents. Re-read the two documents in sequence as user3. Note that now user3 cannot "log out".
5. Create a document bye.html which none of your users can read, and a user nobody authorized to read bye.html but nothing else. Reading bye.html now "logs you out", in the sense that you can no longer read doc1 or doc2 without re-authenticating.
6. Create an unprotected link link.html to doc1.html. Note that you can now read the document anonymously as link.html, but the URL doc1.html remains protected.
7. If your server supports SSI, add a line to echo var=REMOTE_USER to doc1.html. Note that it is set when you read doc1.html, but not when you come via link.html. If your server doesn't support SSI, use a simple CGI script instead.
8. Hit your browser's "back" button several times. Anything you see now is in cache, and may still be there when you exit the browser. But cache control is a whole tutorial in itself.

For the following, set up a protected directory permitting any authenticated user. You will be using CGI to control access.

9. Convert getuser and CGI::authenticate() to your chosen programming language (you may omit the callback function). Write a function cookie_user, and a corresponding set_cookie_user function. You may, if you wish, omit password protection from the cookie scheme (this gives you non-secure user identification which is nevertheless interchangeable with REMOTE_USER in your programs).
10. Write a "login_as" program, to read a userid from an HTML form, and then authenticate the user as that userid. Make sure it works with both REMOTE_USER and Cookies.
11. Consider an application[11] that permits you to make entries in - or linked into - several protected workspaces. Design an authentication program to run the bookkeeping for this. Consider whether your function is scalable: what are the average and maximum number of permissions lookups your program makes for N workspaces, as N varies?

Guru exercise: find TWO reasons why HTTP authentication is more secure than an equivalently-encrypted cookie, and TWO reasons why the reverse is the case. Assume competent implementation in both cases (if you can improve on two, I'd like to know).