Information Filtering

Information filtering deals with the delivery of information that the user is likely to find interesting or useful. An information filtering system assists users by filtering a data source and delivering the relevant information to them. When the delivered information comes in the form of suggestions, the information filtering system is called a recommender system. Because users have different interests, the system must be personalized to accommodate each individual user’s interests. This requires gathering feedback from the user in order to build a profile of his or her preferences.

Two major approaches exist for information filtering: content-based filtering and collaborative filtering. A content-based filtering system selects items based on the correlation between the content of the items and the user’s preferences, while a collaborative filtering system chooses items based on the correlation between the user’s preferences and those of other people with similar preferences.
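
As a rough illustration of the collaborative idea, the sketch below predicts a rating for an item the user has not seen from the ratings of similar users. The ratings data, the function names, and the choice of cosine similarity as the "correlation" measure are all assumptions made for this example; they are not taken from any particular system.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "ann": {"a": 5, "b": 4, "c": 1},
    "bob": {"a": 5, "b": 5, "d": 4},
    "cat": {"a": 1, "c": 5, "d": 2},
}

def similarity(u, v):
    """Cosine similarity between two users over the items both have rated."""
    shared = set(ratings[u]) & set(ratings[v])
    if not shared:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in shared)
    norm_u = sqrt(sum(ratings[u][i] ** 2 for i in shared))
    norm_v = sqrt(sum(ratings[v][i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        weight = similarity(user, other)
        num += weight * their_ratings[item]
        den += weight
    # None means nobody comparable has rated the item yet
    return num / den if den else None
```

Here `predict("ann", "d")` leans toward bob's rating of item `d` rather than cat's, because ann's past ratings agree much more closely with bob's.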

The content-based approach has its roots in the research field of information retrieval, which has been studied since the late 1950s. In an information retrieval system, a user enters a request for information and the system responds by identifying information sources that are relevant to the query. Many techniques used in information retrieval systems can also be employed by a content-based filtering system, such as the vector space model, latent semantic indexing and relevance feedback. Content-based filtering differs from information retrieval in the way the user’s interests are represented: instead of using a query, an information filtering system tries to model the user’s long-term interests.
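
A minimal sketch of content-based filtering with the vector space model might look as follows. It assumes plain term-frequency vectors, whitespace tokenization, and a user profile built by summing the vectors of documents the user liked; these simplifications and all function names are chosen for illustration only (a real system would typically add term weighting such as TF-IDF).

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    """Bag-of-words term-frequency vector for a piece of text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_profile(liked_texts):
    """Model long-term interests as the sum of the liked documents' vectors."""
    profile = Counter()
    for text in liked_texts:
        profile.update(vectorize(text))
    return profile

def rank(profile, candidates):
    """Order candidate documents by similarity to the user profile."""
    return sorted(candidates,
                  key=lambda text: cosine(profile, vectorize(text)),
                  reverse=True)
```

The profile plays the role that the query plays in information retrieval, except that it persists and accumulates as the user gives more feedback.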

Tapestry, an experimental mail system developed at the Xerox Palo Alto Research Center, was one of the first information filtering systems to include collaborative filtering. In Tapestry a user manually constructs a filter query based on the document content and on annotations from other users. For example, a user can request articles containing the word “internet” that the user Joe has evaluated as “excellent”.
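
The example query above can be mimicked with a simple predicate that combines a content test with an annotation test. This is not Tapestry's actual query language; the `Article` class, its fields, and the `matches` function are hypothetical names used only to show the idea.

```python
from dataclasses import dataclass, field

@dataclass
class Article:
    title: str
    body: str
    # Hypothetical annotation store: user name -> evaluation label
    annotations: dict = field(default_factory=dict)

def matches(article, word, annotator, evaluation):
    """True if the body contains the word and the annotator gave that evaluation."""
    # Naive whitespace tokenization; punctuation handling is omitted
    has_word = word.lower() in article.body.lower().split()
    return has_word and article.annotations.get(annotator) == evaluation

articles = [
    Article("Networks", "The internet is growing fast", {"Joe": "excellent"}),
    Article("Gardening", "How to grow tomatoes", {"Joe": "excellent"}),
    Article("Protocols", "Routing on the internet", {"Joe": "poor"}),
]

# Articles containing "internet" that Joe has evaluated as "excellent"
selected = [a.title for a in articles if matches(a, "internet", "Joe", "excellent")]
```

Only the first article satisfies both the content condition and the social condition.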

Although information filtering is often divided into content-based and collaborative filtering, the two approaches can also be used together. Hybrid systems that follow this approach are based on the idea that incorporating both content and social information could lead to a better filtering technique.

Alternative approaches for filtering information have been proposed as well. Demographic filtering systems, for example, use demographic information such as age, gender and education to identify the types of users that like a certain item. Economic filtering systems select items based on the costs and benefits of producing and viewing them. An example of economic filtering is a system that adaptively schedules banner advertisements on the internet. Ad systems exist that learn to display the ads that will yield the highest possible click-through rate based on the past behavior of the user. Directing ads to a more targeted population could help internet providers and advertising agents increase their ad revenues.
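
One very simple way to learn which ad to show from past behavior is to track impressions and clicks per ad and pick the ad with the highest estimated click-through rate. The sketch below assumes a purely greedy policy with a smoothed estimate; the `AdSelector` class and its methods are invented for this example, and a practical system would also need some exploration so that rarely shown ads still get a chance.

```python
class AdSelector:
    """Greedy ad chooser based on smoothed click-through rate (CTR) estimates."""

    def __init__(self, ads):
        self.impressions = {ad: 0 for ad in ads}
        self.clicks = {ad: 0 for ad in ads}

    def record(self, ad, clicked):
        """Log one impression of an ad and whether the user clicked it."""
        self.impressions[ad] += 1
        if clicked:
            self.clicks[ad] += 1

    def ctr(self, ad):
        """Smoothed estimate (assumption): (clicks + 1) / (impressions + 2)."""
        return (self.clicks[ad] + 1) / (self.impressions[ad] + 2)

    def choose(self):
        """Return the ad with the highest estimated CTR so far."""
        return max(self.impressions, key=self.ctr)
```

The smoothing keeps an ad with no impressions at an estimate of 0.5 rather than an undefined or zero rate, so new ads are not ruled out before any data exists.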
