Wednesday 8 October 2008


Thanks to Google, search has become the default way of finding information. At the same time, search is difficult to implement and the technology is still immature. Web content search interfaces differentiate between searching structured and unstructured content. This post covers searching structured content.

Structured content is content that is classified based on metadata, such as tags, taxonomies and properties of the content. The most intuitive way to search structured content is to filter the result set based on a selection of metadata. This is called guided or faceted search.

The classic example of structured content is information about a product you can buy. For example, each digital camera has more or less the same properties. Websites built to find and compare digital cameras allow you to view them side by side. Structured content also opens the way to navigation based on the properties, often called guided navigation, faceted search or faceted browsing. "Find me all cameras that have 7 megapixels" could be such a query.
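To make the camera example concrete, here is a minimal sketch of faceted filtering in Python. The catalog, the field names and the two helper functions are all made up for illustration; a real system would run this against a database or search index rather than a list in memory.

```python
from collections import Counter

# Hypothetical catalog; models, vendors and prices are invented for this sketch.
CAMERAS = [
    {"model": "A100", "vendor": "Acme", "megapixels": 7, "price": 199},
    {"model": "A200", "vendor": "Acme", "megapixels": 10, "price": 299},
    {"model": "B7", "vendor": "Bolt", "megapixels": 7, "price": 179},
    {"model": "C12", "vendor": "Core", "megapixels": 12, "price": 449},
]


def filter_by_facets(items, selection):
    """Keep the items whose properties match every selected facet value."""
    return [
        item
        for item in items
        if all(item.get(facet) == value for facet, value in selection.items())
    ]


def facet_counts(items, facet):
    """Count how many items fall under each value of a single facet."""
    return Counter(item[facet] for item in items if facet in item)


# "Find me all cameras that have 7 megapixels"
seven_mp = filter_by_facets(CAMERAS, {"megapixels": 7})
print([camera["model"] for camera in seven_mp])

# Counts per vendor within the current result set, as shown next to facet links
print(dict(facet_counts(seven_mp, "vendor")))
```

The counts per facet value are what a guided navigation typically shows next to each link, so the user can see how many results remain before clicking.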

Some data is structured by default; other data can be classified automatically based on textual analysis. This is mostly relevant for text documents. Advanced techniques exist to detect the language of a document and extract some of the topics it covers. This allows you to relate content, which news sites often use to show a set of related content links. Yet another category is structure that is created by user behavior. Amazon has many examples of such links: users who bought X also bought Y.
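Structure derived from user behavior can be sketched with simple co-occurrence counting. The example below is only an illustration under my own assumptions: the order data and the `also_bought` function are hypothetical, and this is not how Amazon or any specific vendor computes such links.

```python
from collections import Counter

# Hypothetical order history; each order is the list of products bought together.
ORDERS = [
    ["camera", "memory card", "tripod"],
    ["camera", "memory card"],
    ["camera", "bag"],
    ["tripod", "bag"],
]


def also_bought(orders, product, top_n=3):
    """Return the products that most often share an order with `product`."""
    counts = Counter()
    for order in orders:
        if product in order:
            # Count every other product in the same order once.
            counts.update(set(order) - {product})
    return [name for name, _ in counts.most_common(top_n)]


print(also_bought(ORDERS, "camera", top_n=1))
```

Real systems normalize these counts, since a product that appears in almost every order would otherwise be "related" to everything.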

Guided search is mostly relevant for:



These days it is crucial to combine guided search with textual search. Users who know a particular model number type it into a search box; users who only have a vague idea of what they want (price, vendor) will use guided navigation.
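A minimal sketch of such a combination, assuming a small in-memory catalog: the free-text part is a naive substring match on the model and vendor names, and the guided part is an exact match on selected properties. The product data and the `search` function are hypothetical; a production system would use a proper full-text index.

```python
# Hypothetical catalog; the field names and values are invented for this sketch.
PRODUCTS = [
    {"model": "A100", "vendor": "Acme", "megapixels": 7},
    {"model": "A200", "vendor": "Acme", "megapixels": 10},
    {"model": "B7", "vendor": "Bolt", "megapixels": 7},
]


def search(items, query="", facets=None):
    """Return items that contain every query term and match every facet."""
    terms = query.lower().split()
    facets = facets or {}
    results = []
    for item in items:
        # Naive text match: every term must occur in the model or vendor name.
        text = f"{item['model']} {item['vendor']}".lower()
        text_ok = all(term in text for term in terms)
        # Guided match: every selected property must have the selected value.
        facets_ok = all(item.get(key) == value for key, value in facets.items())
        if text_ok and facets_ok:
            results.append(item)
    return results


# A user who knows the model number types it in the search box.
print(search(PRODUCTS, query="a200"))

# A user with only a vague idea narrows down by vendor, then by megapixels.
print(search(PRODUCTS, facets={"vendor": "Acme"}))
print(search(PRODUCTS, query="acme", facets={"megapixels": 7}))
```

Both inputs narrow the same result set, so the user can start with either one and refine with the other.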

Many websites employ an in-house built system for guided search. I have built and worked on several homegrown systems myself. There are also some vendors offering solutions for guided search of structured content. The best known are:



Some great resources on search are: Searchtools, CMS Watch and Enterprisesearchblog.