The Bayesian classifier is a probabilistic method for the classification task. A document *D* is represented as a Boolean feature vector $\mathbf{d} = (d_1, d_2, \dots, d_m)$ which contains *m* Boolean features that indicate whether or not a certain term appears in the document. A document *D* is classified as relevant if the probability that *D* belongs to class *c* given that it contains or does not contain specific terms is larger than the probability that *D* does not belong to class *c* given the features of *D*:

$$P(c \mid d_1, d_2, \dots, d_m) > P(\bar{c} \mid d_1, d_2, \dots, d_m)$$

where, according to Bayes’ rule

$$P(c \mid d_1, d_2, \dots, d_m) = \frac{P(d_1, d_2, \dots, d_m \mid c)\, P(c)}{P(d_1, d_2, \dots, d_m)}$$

The denominator in the equation above is the same for both classes and can therefore be left out when the two posterior probabilities are compared. Under the assumption that the words in the document are conditionally independent given the class, the probability is proportional to:

$$P(c \mid d_1, d_2, \dots, d_m) \propto P(c) \prod_{i=1}^{m} P(d_i \mid c)$$
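As a toy illustration of this comparison (the numbers are invented for the example), suppose $m = 2$ and the training samples yield

$$P(c) = 0.4,\quad P(d_1 \mid c) = 0.8,\quad P(d_2 \mid c) = 0.7,$$
$$P(\bar{c}) = 0.6,\quad P(d_1 \mid \bar{c}) = 0.3,\quad P(d_2 \mid \bar{c}) = 0.5.$$

For a document containing both terms,

$$P(c) \prod_{i=1}^{2} P(d_i \mid c) = 0.4 \cdot 0.8 \cdot 0.7 = 0.224 \;>\; 0.6 \cdot 0.3 \cdot 0.5 = 0.09 = P(\bar{c}) \prod_{i=1}^{2} P(d_i \mid \bar{c}),$$

so the document is classified as relevant.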

The probabilities $P(c)$, $P(\bar{c})$, $P(d_i \mid c)$ and $P(d_i \mid \bar{c})$ may be estimated from training samples. This approach is also called a naive Bayesian classifier because of the independence assumption.
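The estimation-and-comparison procedure described above can be sketched in Python as follows. Laplace smoothing (to avoid zero probabilities) and log-probabilities (to avoid numerical underflow) are added assumptions of this sketch, not part of the description above; the function names are hypothetical.

```python
import math

def train(docs, labels, vocabulary):
    """Estimate P(c), P(d_i | c) and P(d_i | not-c) from training samples.
    docs: list of sets of terms; labels: list of booleans (relevant or not)."""
    pos = [d for d, y in zip(docs, labels) if y]
    neg = [d for d, y in zip(docs, labels) if not y]
    p_c = len(pos) / len(docs)
    # P(term present | class), with Laplace smoothing (add-one, two outcomes)
    p_t_pos = {t: (sum(t in d for d in pos) + 1) / (len(pos) + 2) for t in vocabulary}
    p_t_neg = {t: (sum(t in d for d in neg) + 1) / (len(neg) + 2) for t in vocabulary}
    return p_c, p_t_pos, p_t_neg

def classify(doc, p_c, p_t_pos, p_t_neg):
    """Return True if P(c | d_1, ..., d_m) > P(not-c | d_1, ..., d_m).
    The shared denominator P(d_1, ..., d_m) is dropped, as in the text."""
    log_pos = math.log(p_c)
    log_neg = math.log(1 - p_c)
    for t in p_t_pos:
        if t in doc:  # Boolean feature d_i = 1: term present
            log_pos += math.log(p_t_pos[t])
            log_neg += math.log(p_t_neg[t])
        else:         # Boolean feature d_i = 0: term absent
            log_pos += math.log(1 - p_t_pos[t])
            log_neg += math.log(1 - p_t_neg[t])
    return log_pos > log_neg
```

With a handful of hand-labeled training documents, `train` produces the probability tables and `classify` applies the decision rule to a new document's term set.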

An example of a system that has implemented a Bayesian classifier is Syskill & Webert [Pazzani et al. 1996]. Syskill & Webert is a software agent that learns to classify pages on the World Wide Web. The user has to rate explored pages as either hot or cold, and a naive Bayesian classifier treats these pages as positive and negative examples, respectively. Experiments with the Syskill & Webert agent show that the Bayesian classifier performs well at the classification task even though the independence assumption is false: terms do not appear independently of each other in documents.