The Bayesian classifier is a probabilistic method for the classification task. A document D is represented as a Boolean feature vector of m Boolean features, each indicating whether or not a certain term appears in the document. A document D is classified as relevant if the probability that D belongs to class c, given that it contains or does not contain specific terms, is larger than the probability that D does not belong to class c given the features of D:
\[
P(c \mid D) > P(\bar{c} \mid D)
\]
where, according to Bayes' rule,
\[
P(c \mid D) = \frac{P(D \mid c)\,P(c)}{P(D)}
\]
The denominator in the equation above can be left out, since it is the same for both classes and does not affect the comparison. Under the assumption that the words in the document are conditionally independent, the probability is proportional to:
\[
P(c \mid D) \propto P(c) \prod_{i=1}^{m} P(d_i \mid c)
\]
where d_1, ..., d_m are the Boolean features of D.
The probabilities P(c), P(d_i \mid c), and P(d_i \mid \bar{c}) for each feature d_i may be estimated from training samples. This approach is also called a naive Bayesian classifier because of the independence assumption.
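A minimal sketch of this approach, assuming documents are given as sets of terms and that the probability estimates use add-one (Laplace) smoothing to avoid zero counts; all function and variable names here are illustrative, not from the original text:

```python
import math

def train(docs, labels, vocab):
    """Estimate P(c) and the per-term probabilities P(d_i | c) and
    P(d_i | not c) from labeled training documents (sets of terms),
    with add-one smoothing."""
    pos = [d for d, y in zip(docs, labels) if y]
    neg = [d for d, y in zip(docs, labels) if not y]
    p_c = len(pos) / len(docs)
    # P(term present | class), smoothed so no probability is 0 or 1
    p_t_pos = {t: (sum(t in d for d in pos) + 1) / (len(pos) + 2) for t in vocab}
    p_t_neg = {t: (sum(t in d for d in neg) + 1) / (len(neg) + 2) for t in vocab}
    return p_c, p_t_pos, p_t_neg

def classify(doc, p_c, p_t_pos, p_t_neg, vocab):
    """Return True if P(c | D) > P(not c | D) under the independence
    assumption; the common denominator P(D) is dropped, and log
    probabilities are summed to avoid floating-point underflow."""
    score_pos = math.log(p_c)
    score_neg = math.log(1 - p_c)
    for t in vocab:
        if t in doc:  # Boolean feature d_i = 1
            score_pos += math.log(p_t_pos[t])
            score_neg += math.log(p_t_neg[t])
        else:         # Boolean feature d_i = 0
            score_pos += math.log(1 - p_t_pos[t])
            score_neg += math.log(1 - p_t_neg[t])
    return score_pos > score_neg
```

For example, after training on a few documents labeled relevant or not, `classify` compares the two (unnormalized) log posteriors and returns the winning class.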
An example of a system that has implemented a Bayesian classifier is Syskill & Webert [Pazzani et al. 1996]. Syskill & Webert is a software agent that learns to classify pages on the World Wide Web. The user rates explored pages as either hot or cold, and these pages are treated by a naive Bayesian classifier as positive and negative examples. Experiments with the Syskill & Webert agent show that the Bayesian classifier performs well at the classification task even though the independence assumption is false: terms do not appear independently of each other in documents.