Machine Learning

Information filtering can be seen as a classification task. Based on training data a user model is induced that enables the filtering system to classify unseen items into a positive class c (relevant to the user) or a negative class  (irrelevant to the user). The training set consists of the items that were acquired from feedback. These items form training instances that all have an attribute. This attribute specifies the class of the item based on either the rating of the user or on implicit evidence.

Formally, an item is described as a vector  of n components. The components can have binary, nominal or numerical attributes and are derived from either the content of the items or from information about the users’ preferences. The task of the learning method is to select a function  based on a training set of m input vectors that can classify any item in the collection.

The function will either be able to classify an unseen item as positive or negative at once by returning a binary value or return a numerical value. In that case a threshold can be used to determine if the item is relevant or irrelevant to the user.

Leave a Reply