Collaborative Filtering

Collaborative filtering, also referred to as social filtering, filters information by using the recommendations of other people. It is based on the idea that people who agreed in their evaluation of certain items in the past are likely to agree again in the future. A person who wants to see a movie for example, might ask for recommendations from friends. The recommendations of some friends who have similar interests are trusted more than recommendations from others. This information is used in the decision on which movie to see.

Neighborhood-based approach

Most collaborative filtering systems apply the so called neighborhood-based technique. In the neighborhood-based approach a number of users is selected based on their similarity to the active user. A prediction for the active user is made by calculating a weighted average of the ratings of the selected users.

To illustrate how a collaborative filtering system makes recommendations consider the example in movie ratings table below. This shows the ratings of five movies by five people. A “+” indicates that the person liked the movie and a “-“ indicates that the person did not like the movie. To predict if Ken would like the movie “Fargo”, Ken’s ratings are compared to the ratings of the others. In this case the ratings of Ken and Mike are identical and because Mike liked Fargo, one might predict that Ken likes the movie as well.

Movie ratings

 

Amy

Jef

Mike

Chris

Ken

The Piano

+

+

Pulp Fiction

+

+

+

Clueless

+

+

Cliffhanger

+

+

Fargo

+

+

?

Instead of just relying on the most similar person, a prediction is normally based on the weighted average of the recommendations of several people. The weight given to a person’s ratings is determined by the correlation between that person and the person for whom to make a prediction. As a measure of correlation the Pearson correlation coefficient can be used. In this example a positive rating has the value 1 while a negative rating has the value –1, but in other cases a rating could also be a continuous number. The ratings of person X and Y of the item k are written as  and , while  and  are the mean values of their ratings. The correlation between X and Y is then given by:

In this formula k is an element of all the items that both X and Y have rated. A prediction for the rating of person X of the item i based on the ratings of people who have rated item i is computed as follows:

Where Y consists of all the n people who have rated item i. Note that a negative correlation can also be used as a weight. For example, because Amy and Jef have a negative correlation and Amy did not like “Farg” could be used as an indication that Jef will enjoy “Fargo”.

The Pearson correlation coefficient is used by several collaborative filtering systems including GroupLens [Resnick et al. 1994] and Ringo [Shardanand 1994, Shardanand & Maes 1995]. GroupLens, a system that filters articles on Usenet, was the first to incorporate a neighborhood-based algorithm. In GroupLens a prediction is made by computing the weighted average of deviations from the neighbor’s mean:

Note that if no one has rated item i the prediction is equal to the average of all the ratings person X has made. Ringo recommends music albums and artists a person might be interested in. Shardanand considers several collaborative algorithms and reports that a constrained Pearson r algorithm performs best for Ringo’s information domain. The constrained Pearson measure is similar to the normal Pearson measure but uses the mean value of possible rating values (in this case the average is 4) instead of the mean values of the ratings of person X and Y.

Selecting Neighborhoods

Many collaborative filtering systems have to be able to handle a large number of users. Making a prediction based on the ratings of thousands of people has serious implications for run-time performance. Therefore, when the number of users reaches a certain amount a selection of the best neighbors has to be made. Two techniques, correlation-thresholding and best-n-neighbor, can be used to determine which neighbors to select. The first technique selects only those neighbors who’s correlation is greater than a given threshold. The second technique selects the best n neighbors with the highest correlation.

The Sparsity Problem

Although having to deal with too many ratings can be an issue, for most collaborative filtering systems, having to deal with too few ratings is a far more serious problem. This problem occurs when the amount of items become very large reducing the number of items users have rated to a tiny percentage. In such a situation it is likely that two people have few rated items in common making the correlation coefficient less reliable. This is called the sparsity problem. Several solutions have been proposed to overcome this problem:

  • Implicit ratings. Some systems, such as a later extension of GroupLens [Miller et al. 1997], try to increase the number of ratings by inferring them from the user’s behavior. However, a user still has to explore an item before the system can infer a rating.
  • Dimensionality reduction. By reducing the dimensionality of the information space, the ratings of two users can be used for predictions even if they did not rate the same items. Pazzani and Billsus [Pazzani & Billsus 1998] use an non correlation-based approach for making predictions based on a neural network. User’s ratings are represented by a matrix consisting of Boolean features. The matrix is projected into a lower dimensional space by using latent semantic indexing. Although Pazzani and Billsus report an improvement in prediction accuracy the computational complexity of the algorithm is a serious issue. The approach they use for collaborative filtering is described in more detail in section 2.4.5.
  • Content description. Using the content of an item instead of the item itself could increase the amount of information people have in common. This is a hybrid approach to combining content-based filtering and collaborative filtering and is described in the next section in more detail.

Item-To-Item approach

Another approach Shardanand and Maes consider for Ringo uses the correlation of artists and albums to generate predictions. This approach is simply an inversion of the neighborhood-based approach. Instead of measuring the similarities between people the ratings are used to measure the correlation between items. The Pearson correlation coefficient can again be used as a measure. For example, the ratings of the movies “Fargo” and “Pulp Fiction” have a perfect correlation. Based on this correlation one might predict that Ken likes “Fargo” given the fact that he liked “Pulp Fiction”.

Classification Approach

Collaborative filtering can also be formulated as a classification problem. To illustrate this,  consider again the example in Movie rating table. In order to predict if Ken likes the movie ‘Fargo’ a learning method has to determine the class of this movie. The learning method can be trained by using the four movies that Ken has rated as training instances. The movies ‘The Piano’, ‘Pulp Fiction’ and ‘Cliffhanger’ are labeled positive while the movie ‘Clueless’ is labeled as a negative training instance. A movie can be directly represented as a vector, where each component of the vector corresponds to a rating of a different user. This would mean however that many values are missing because normally an item will not have been rated by all the users. For learning methods that cannot handle missing values the items have to be represented in a different way.

Pazzani and Billsus [Pazzani & Billsus 1998] use n Boolean features for every user, where n is the number of all the possible rating values. A feature i is assigned a value 1 (true) if the user has given the item a rating i or 0 (false) otherwise. In this example this would lead to the Boolean feature vectors that are shown in the following table.

Movie ratings as Boolean feature vectors

 

The Piano

Pulp Fiction

Clueless

Cliffhanger

Fargo

Amy +

0

0

1

0

0

Amy –

1

1

0

1

1

Jef +

0

1

0

0

1

Jef –

1

0

0

1

0

Mike +

1

1

0

1

1

Mike –

0

0

1

0

0

Chris +

0

0

1

0

0

Chris –

0

1

0

1

1

Class

+

+

+

?

Once the user ratings are converted into this vector format almost any supervised learning method can be used to classify unseen items. Pazzani and Billsus first reduce the dimensionality of the vectors by applying singular value decomposition and then use a neural network as a learning method.

7 Responses to "Collaborative Filtering"

  1. Nasro says:

    Hello , Could You pleaze Give us an example for calculating a recommadation
    thank you

  2. Ronnie Graham says:

    Similarity[Number of users][Number of users] = 0;//assume zero similarity
    for User1 in set of all users
    for User2 in set of all users
    Similarity[User1][User2] = User1*User2;//where ‘*’ is the dot product ~ cosine similarity

    now we have the similarities between all users
    now calulate recommendations for all users

    UserRecomendations[Number of users][Number of Items] = 0;//zero recommendations for all
    for User1 in set of all users
    for User2 in set of all users
    for item in set of all items
    if (Similarity[User1][User2] * User2[item] > UserRecomendations[User1][item]) {
    UserRecomendations[User1][item] = Similarity[User1][User2] * User2[item];
    }

    now UserRecomendations contains the best recomendations for each user, just remove the entries that have already been rated by the user and recomend the top 5 or so and it should work if you have a large enough data set.

    and yes this has asymptotic run time of O(N^2*I) which isn’t good, it can probably be improved, but hey this is just a forum comment.

  3. Priya says:

    CF is memory based or model based??

  4. Chirag says:

    How to I calculate the Prediction Rating Matrix ?

Leave a Reply