404 Processing part 2

In 404 processing part 1 I looked at the user side of handling "Page not found" (404) errors. In this post I look at the technical implementation of 404 errors and at how you can even use 404 processing to your advantage.

In the early days of the web, web servers were pretty simple: you put a set of documents on the file system of a server, and the URL mapped to the files on that file system. If a file was not found, the server returned a 404 error. The web has evolved, and so has the handling of requests on the server side.

These days you have a layered structure for handling requests. Front-end caches serve the most static content, web servers serve static and semi-static content, and application servers serve all dynamic content. If a device or server cannot find an appropriate resource for a URL, it simply passes the request on to the next layer. Simple caches tend to be dumb but fast; application servers are very intelligent but slow.

High-volume web sites receive a lot of requests for unknown resources. Passing every one of these requests on to the application server is not ideal for performance; you would like to answer the majority of this traffic on the front-end servers. Some strategies for handling unknown resources on the front-end machines are:

  • Configure the web front ends to forward only requests for dynamic pages to the application server. This is typically done with a mapping based on URL patterns (for example *.html).
  • Use a limited number of subfolders for dynamic URLs. All other folders should be handled by the web servers. With IBM WebSphere, for example, you get this one for free.
  • Configure the most common 404s to be handled by the 404 processing in the web servers.
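To make the first two strategies concrete, here is a minimal sketch of the decision a front end could make for each request. It is not how any particular web server implements this; the class name FrontEndRouter, the Decision values, and the example suffixes and folders are all assumptions made up for illustration.

```java
import java.util.List;
import java.util.Set;

// Hypothetical front-end routing decision. All names and patterns here are
// illustrative assumptions, not the configuration of a real web server.
final class FrontEndRouter {

    enum Decision { SERVE_STATIC, FORWARD_TO_APP_SERVER, RESPOND_404 }

    private final List<String> dynamicSuffixes; // URL pattern mapping, e.g. ".html"
    private final List<String> dynamicFolders;  // limited set of dynamic subfolders, e.g. "/app/"
    private final Set<String> staticFiles;      // paths the front end can serve itself

    FrontEndRouter(List<String> dynamicSuffixes,
                   List<String> dynamicFolders,
                   Set<String> staticFiles) {
        this.dynamicSuffixes = dynamicSuffixes;
        this.dynamicFolders = dynamicFolders;
        this.staticFiles = staticFiles;
    }

    Decision route(String path) {
        // Known static content never leaves the front end.
        if (staticFiles.contains(path)) {
            return Decision.SERVE_STATIC;
        }
        // Only requests inside a dynamic subfolder go to the application server...
        for (String folder : dynamicFolders) {
            if (path.startsWith(folder)) {
                return Decision.FORWARD_TO_APP_SERVER;
            }
        }
        // ...or requests that match a dynamic URL pattern.
        for (String suffix : dynamicSuffixes) {
            if (path.endsWith(suffix)) {
                return Decision.FORWARD_TO_APP_SERVER;
            }
        }
        // Everything else is answered with a 404 on the front end itself.
        return Decision.RESPOND_404;
    }
}
```

With this sketch, a router built with the suffix ".html" and the folder "/app/" forwards /news/today.html and /app/search to the application server, while an unknown path such as /wp-login.php gets its 404 on the front end without ever reaching the slow layer.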

Many web sites have a requirement for short URLs (like http://www.mirabeau.nl/arnoud) to support printed or TV advertising. This is trivial to build into an application environment; in Java, for example, you would use a J2EE Filter to implement this feature. But this requires you to pass all unknown traffic to an application server, which can hurt performance. An alternative is to change the 404 processing in the web server: when the 404 processing starts, you run some custom code to check whether a short URL is defined for the request. If there is one, you either pass the request on to the application server or give the client a 3xx response with the proper URL. It is not easy, but it can be done.
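The custom code in the 404 processing could look roughly like the sketch below. It only covers the redirect variant, and it is an illustration under assumptions: the class ShortUrlNotFoundHandler, its Response type, the in-memory lookup table, and the choice of a 302 status are all hypothetical, and how you hook such code into the 404 handling depends entirely on your web server.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical 404 hook: runs only after the web server has already decided
// that no resource exists for the requested path.
final class ShortUrlNotFoundHandler {

    // Minimal response description; location is null unless we redirect.
    static final class Response {
        final int status;
        final String location;

        Response(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    // Short path -> full URL. An in-memory map is an assumption for this sketch.
    private final Map<String, String> shortUrls;

    ShortUrlNotFoundHandler(Map<String, String> shortUrls) {
        this.shortUrls = shortUrls;
    }

    Response handleNotFound(String requestPath) {
        String target = shortUrls.get(normalize(requestPath));
        if (target != null) {
            // A short URL is defined: send the client a 3xx with the proper URL.
            return new Response(302, target);
        }
        // No short URL defined: this really is a 404.
        return new Response(404, null);
    }

    // Treat "/Arnoud/" and "/arnoud" as the same short URL.
    private static String normalize(String path) {
        String p = path.toLowerCase(Locale.ROOT);
        while (p.length() > 1 && p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p;
    }
}
```

Because the lookup only runs for requests that would have been a 404 anyway, normal traffic pays nothing for it. Whether to answer with 301 or 302 is a design choice: a 302 keeps you free to point the same short URL at a different page for the next campaign.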
