Both content-based filtering and collaborative filtering have their strengths and weaknesses. Three specific problems can be distinguished for content-based filtering:
- Content description. In some domains, generating a useful description of the content can be very difficult. In domains where the items consist of music or video, for example, a representation of the content is not always possible with today’s technology.
- Over-specialization. A content-based filtering system will not select an item unless the user’s previous behavior provides evidence of interest in it. Additional techniques have to be added to give the system the capability to make suggestions outside the scope of what the user has already shown interest in.
- Subjective domain problem. Content-based filtering techniques have difficulty distinguishing subjective qualities such as point of view and humor.
A collaborative filtering system doesn’t have these shortcomings. Because there is no need for a description of the items being recommended, the system can deal with any kind of information. Furthermore, the system is able to recommend items whose content is very different from what the user has previously shown interest in. Finally, because recommendations are based on the opinions of others, it is well suited for subjective domains like art. However, collaborative filtering does introduce certain problems of its own:
- Early rater problem. Collaborative filtering systems cannot provide recommendations for new items since there are no user ratings on which to base a prediction. Even if users start rating the item, it will take some time before the item has received enough ratings to support accurate recommendations. Similarly, recommendations will also be inaccurate for new users who have rated few items.
- Sparsity problem. In many information domains the number of items far exceeds what a person is able (and willing) to explore. This makes it hard to find items that have been rated by enough people to base predictions on.
- Gray sheep. Collaborative filtering needs groups of users with overlapping characteristics. Even if such groups exist, individuals who do not consistently agree or disagree with any group of people will receive inaccurate recommendations.
A system that combines content-based filtering and collaborative filtering could take advantage of both the representation of the content and the similarities among users. Although there are several ways to combine the two techniques, a distinction can be made between two basic approaches. A hybrid approach combines the two types of information in a single method, while the alternative is to run the two filtering techniques independently and use their recommendations.
Collaboration via Content
Collaborative filtering looks for the correlation between user ratings to make predictions. Such correlation is most meaningful when users have many rated items in common. As stated earlier, in large domains with many items this is not always the case. Furthermore, the lack of access to the content of the items prevents similar users from being matched unless they have rated exactly the same items. For example, if one user liked the movie “Rocky” and another liked the movie “Rocky II”, they would not necessarily be matched together. A hybrid approach called collaboration via content deals with these issues by incorporating both the information used by content-based filtering and the information used by collaborative filtering.
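The matching problem described above can be made concrete with a minimal sketch. The user names, the 1 to 5 rating scale, and the rating values are invented for illustration, and returning `None` when fewer than two items are co-rated is an assumption of this sketch rather than a rule taken from the text.

```python
from math import sqrt

def rating_correlation(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated.

    Returns None when the users share fewer than two rated items or when
    either user's co-rated scores are all identical (no variance).
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return None
    xs = [ratings_a[item] for item in common]
    ys = [ratings_b[item] for item in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)

# Hypothetical users: similar taste, but no item rated by both.
user_a = {"Rocky": 5}
user_b = {"Rocky II": 5}
no_overlap = rating_correlation(user_a, user_b)

# Hypothetical users with three co-rated items.
user_c = {"Rocky": 5, "Rocky II": 4, "Fargo": 2}
user_d = {"Rocky": 4, "Rocky II": 5, "Fargo": 1}
with_overlap = rating_correlation(user_c, user_d)

print("no overlap:", no_overlap)
print("with overlap:", with_overlap)
```

The first pair cannot be compared at all on ratings alone, even though both users liked closely related movies; this is the gap that content information is meant to fill.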
In collaboration via content, both the rated items and the content of the items are used to construct a user profile. The terms that describe the content of the items are selected using content-based techniques. The weight of each term indicates how important that term is to the user. The table below shows an example of the information available to make a prediction about the movie “Fargo” for Ken with collaboration via content. Five terms are shown which describe the sort of movie a user is interested in.
Five terms and movie ratings
| User | comedy | historical | high-school | violence | black-humor | Fargo |
|---|---|---|---|---|---|---|
| Amy | 1 | 0 | 1.2 | 0.2 | 0.2 | – |
| Jef | 2.1 | 0 | 0.5 | 3 | 2.2 | + |
| Mike | 1.3 | 1.5 | 0.2 | 3.2 | 1.9 | + |
| Chris | 1.1 | 2 | 2.8 | 0.8 | 0 | – |
| Ken | 0.8 | 1.1 | 0 | 2 | 1.2 | ? |
Just as with collaborative filtering, the Pearson correlation coefficient can be used to compute the correlation between users. Instead of user ratings, however, term weights are used. Because user profiles share many more terms than users share rated items, this method has more data from which to determine similarity than collaborative filtering, and users having too few rated items in common is no longer a problem. Furthermore, unlike content-based filtering, predictions are based on the impressions of other users, which can lead to recommendations outside the user’s usual range of interests. However, to make a recommendation about an item, enough users must still have rated that item. Just as with collaborative filtering, a new item cannot be recommended as long as no user has rated it.
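As an illustration, the sketch below applies this idea to the values in the table above. Two things are assumptions of the sketch and not stated in the text: the “+” and “–” ratings are encoded as +1 and -1, and the prediction is a correlation-weighted vote normalized by the sum of absolute correlations. Other prediction formulas are equally possible.

```python
from math import sqrt

# Term weights per user, copied from the table, in the order:
# comedy, historical, high-school, violence, black-humor
PROFILES = {
    "Amy":   [1.0, 0.0, 1.2, 0.2, 0.2],
    "Jef":   [2.1, 0.0, 0.5, 3.0, 2.2],
    "Mike":  [1.3, 1.5, 0.2, 3.2, 1.9],
    "Chris": [1.1, 2.0, 2.8, 0.8, 0.0],
    "Ken":   [0.8, 1.1, 0.0, 2.0, 1.2],
}

# Assumed encoding of the Fargo column: "+" becomes +1, "–" becomes -1.
FARGO = {"Amy": -1, "Jef": 1, "Mike": 1, "Chris": -1}

def pearson(xs, ys):
    """Pearson correlation of two equal-length weight vectors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # a flat profile carries no similarity information
    return cov / (dev_x * dev_y)

def predict(target, profiles, ratings):
    """Correlation-weighted vote of the users who rated the item."""
    sims = {u: pearson(profiles[target], profiles[u]) for u in ratings}
    norm = sum(abs(s) for s in sims.values())
    if norm == 0:
        return 0.0, sims
    score = sum(sims[u] * ratings[u] for u in ratings) / norm
    return score, sims

score, sims = predict("Ken", PROFILES, FARGO)
for user, sim in sorted(sims.items(), key=lambda kv: -kv[1]):
    print(f"{user:6s} r = {sim:+.3f}")
print("predicted Fargo score for Ken:", round(score, 3))
```

With the table values, Mike’s profile correlates most strongly with Ken’s, while Amy and Chris correlate negatively. Since the two negatively correlated users disliked “Fargo” and the two positively correlated users liked it, every weighted vote points the same way and the sketch predicts a positive rating for Ken.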
Combining Recommendations
Another approach to combining collaborative and content-based filtering is to make predictions based on a weighted average of the content-based recommendation and the collaborative recommendation. The rank of each recommended item could serve as the measure for its weight, so that the highest-ranked recommendation receives the highest weight.
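One possible reading of this scheme is sketched below. The text does not specify how ranks become weights, so the linear mapping from rank to weight, the `content_share` parameter, the zero weight for items missing from one list, and the title “Movie X” are all assumptions made for illustration.

```python
def rank_weights(ranked_items):
    """Map a best-first list to weights in (0, 1]; the top item gets 1.0."""
    n = len(ranked_items)
    return {item: (n - pos) / n for pos, item in enumerate(ranked_items)}

def combine(content_ranked, collab_ranked, content_share=0.5):
    """Weighted average of the rank-derived weights from both recommenders.

    Items missing from one list get weight 0 from that recommender.
    Ties are broken alphabetically so the result is deterministic.
    """
    content_w = rank_weights(content_ranked)
    collab_w = rank_weights(collab_ranked)
    scores = {
        item: content_share * content_w.get(item, 0.0)
        + (1 - content_share) * collab_w.get(item, 0.0)
        for item in set(content_w) | set(collab_w)
    }
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical best-first lists from the two recommenders.
content_ranked = ["Fargo", "Rocky", "Rocky II"]
collab_ranked = ["Rocky II", "Fargo", "Movie X"]

combined = combine(content_ranked, collab_ranked)
print(combined)
```

In this example “Fargo” comes out on top because both recommenders rank it highly, while items that appear in only one list fall to the bottom. Changing `content_share` shifts trust toward one recommender or the other.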