Bayesian filters or Bayesian method
What are Bayesian filters?
Definition: Bayesian filters, also known as Bayesian classifiers, are techniques used in e-mail security to filter and classify e-mails according to their probability of being spam or legitimate mail. These filters are based on Bayes’ theorem, a statistical concept developed by mathematician Thomas Bayes.
In short, it is a probabilistic method for filtering e-mail, based on the statistical distribution of keywords in messages. The algorithm learns from a corpus of spam and ham (legitimate messages) that is as varied as possible, so that it can recognize the type of each message received.
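As a reminder, Bayes' theorem applied to a single keyword can be written as follows. The symbols are notation introduced here for illustration only: S stands for "the message is spam", H for "the message is ham" and w for "the message contains the keyword".

```latex
P(S \mid w) = \frac{P(w \mid S)\,P(S)}{P(w \mid S)\,P(S) + P(w \mid H)\,P(H)}
```

P(S) and P(H) are the proportions of spam and ham in the training corpus, and P(w | S) is the fraction of spam messages that contain the keyword. A keyword that appears far more often in spam than in ham therefore pushes P(S | w) towards 1.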
How do Bayesian filters work?
Bayesian filters work by analyzing email content and assigning probabilities to certain characteristics or keywords that may be associated with spam or legitimate mail. Their operation can be summarized in a few steps:
- Filter training: Initially, the filter is trained using a set of known email samples, both spam and legitimate mail. These samples serve as the basis for establishing initial probabilities.
- Assigning weights: The filter assigns weights to various email characteristics, such as keywords, headers, sender addresses and so on. These weights are based on the probabilities calculated in the training step.
- Probability calculation: When a new email arrives, the filter calculates the probability of it being spam or legitimate mail by combining the probabilities associated with its characteristics. A calculation formula based on Bayes’ theorem is used to obtain an overall probability.
- Classification: Depending on the calculated probability, the email is classified as spam or legitimate mail. A predefined threshold can be used to determine the cut-off point between the two categories.
- Continuous improvement: Bayesian filters improve over time by adjusting the weights assigned to features based on feedback and manual corrections made by users.
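The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular product: the class name `BayesianFilter`, its methods, the whitespace tokenizer, the add-one smoothing and the 0.5 threshold are all assumptions made for the example, and only word presence in the message body is used as a characteristic.

```python
import math
from collections import Counter


class BayesianFilter:
    """Toy naive Bayes spam filter over lowercase word tokens (hypothetical)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Assumption: plain whitespace splitting; real filters also use headers, senders, etc.
        return text.lower().split()

    def train(self, text, label):
        """Training and later user corrections: count messages containing each word."""
        self.message_counts[label] += 1
        self.word_counts[label].update(set(self.tokenize(text)))

    def word_likelihood(self, word, label):
        """Estimate P(word | label) with add-one smoothing (the word's 'weight')."""
        return (self.word_counts[label][word] + 1) / (self.message_counts[label] + 2)

    def spam_probability(self, text):
        """Combine the per-word likelihoods with Bayes' theorem, in log space."""
        total = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior: smoothed share of training messages carrying this label.
            score = math.log((self.message_counts[label] + 1) / (total + 2))
            for word in set(self.tokenize(text)):
                score += math.log(self.word_likelihood(word, label))
            log_score[label] = score
        # Normalize the two scores into P(spam | text).
        return 1 / (1 + math.exp(log_score["ham"] - log_score["spam"]))

    def classify(self, text, threshold=0.5):
        """Apply the cut-off threshold to the overall probability."""
        return "spam" if self.spam_probability(text) >= threshold else "ham"


spam_filter = BayesianFilter()
spam_filter.train("win free money now", "spam")
spam_filter.train("free viagra offer", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("server security report attached", "ham")

print(spam_filter.classify("free money offer"))         # spam
print(spam_filter.classify("security meeting monday"))  # ham
```

Calling `train` again with a message the user has re-labelled is how this sketch models continuous improvement: the counts, and therefore the weights, shift with each correction.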
Bayesian filters are effective for filtering spam, as they adapt to new spam trends by updating their probabilities and weights. However, they can still produce false positives (legitimate mail flagged as spam) and false negatives (spam that gets through), so other email security techniques are needed to complement them.
Examples
Two databases are built, one for spam and one for ham (legitimate messages). During a learning phase, a dictionary of keywords is created in which each term is associated with a probability of indicating spam. For example: viagra 100%, security 20%, messaging 10% and free 60%. Then, when an e-mail is analyzed, the probabilities of the lexicon words it contains are averaged to give the e-mail a score. Following our example, if an e-mail contains the words “free messaging server security”, three of its words are in the lexicon (“server” is not) and it obtains a score of (20% + 10% + 60%) / 3, i.e. 30%. It is treated as a legitimate message, since the score is below 50%. Trained on a large number of spam and ham messages, this technique gives very good results.
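The arithmetic of this simplified example can be reproduced with a short Python sketch. The names `LEXICON` and `average_score` are hypothetical, and the message text is an assumption: it uses the lexicon word "messaging" so that three keywords match, as in the calculation above.

```python
# Lexicon from the example: keyword -> probability that it indicates spam.
LEXICON = {"viagra": 1.00, "security": 0.20, "messaging": 0.10, "free": 0.60}


def average_score(text, lexicon=LEXICON):
    """Return the mean probability of the lexicon words found in text, or None."""
    found = [lexicon[word] for word in text.lower().split() if word in lexicon]
    if not found:
        return None  # no known keyword, so no score can be computed
    return sum(found) / len(found)


score = average_score("free messaging server security")
print(f"{score:.0%}")                      # 30%
print("spam" if score >= 0.5 else "ham")   # ham
```

Note that this averaging is only the simplified scoring used in the example; it is not the same as combining the probabilities with Bayes' theorem.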
Applications
Most e-mail clients with built-in anti-spam features (Thunderbird, Outlook, etc.) rely almost exclusively on Bayesian filters. In ALTOSPAM, Bayesian filtering is one of the 15 technologies used. The score obtained (between 0 and 100%) indicates whether the e-mail is more likely to be spam or ham.