SpamTitan

Bayes Database

Thomas Bayes was an 18th-century statistician whose methods are widely used for classification. SpamTitan Gateway contains a Bayesian classifier that identifies spam by examining tokens: short words or phrases that commonly appear in spam (or non-spam) messages.

Incoming mail is scored based on a spam pattern library, bad attachments, bad links, and feedback that end users submit from their email clients. The Bayesian engine then 'learns' to recognize new spam patterns, and also to forget old spam patterns that might block legitimate messages.

For example, if the Bayesian database has learned one hundred spam messages containing the phrase 'penis enlargement', the classifier can be confident that a new message containing the same phrase is spam, and it raises the spam score of that message.
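The idea behind the example above can be illustrated with a minimal sketch. It is not SpamTitan's actual formula: the function name, the message counts, and the simple ratio used here are assumptions chosen only to show how token counts turn into a spam probability.

```python
def token_spam_probability(spam_hits, ham_hits, spam_total, ham_total):
    """Estimate how strongly a token indicates spam (0.0 = ham, 1.0 = spam).

    spam_hits / ham_hits: learned messages of each class containing the token.
    spam_total / ham_total: all learned messages of each class.
    All names and the formula are illustrative assumptions.
    """
    spam_rate = spam_hits / spam_total
    ham_rate = ham_hits / ham_total
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_rate / (spam_rate + ham_rate)


# A phrase seen in 100 of 500 learned spam messages and in no ham messages.
only_in_spam = token_spam_probability(100, 0, 500, 500)

# The same phrase after it also turns up in one legitimate message.
mostly_in_spam = token_spam_probability(100, 1, 500, 500)
```

A token that appears only in learned spam pushes the estimate to its maximum, while a single appearance in clean mail pulls it slightly back. This is how the classifier can also "forget" patterns that start showing up in legitimate messages.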

Go to Anti-Spam Engine > Settings > Bayesian Database to enable Bayesian analysis and configure the Bayes database settings (default: enabled).

Setting

Description

Bayesian Analysis:

Click Enable to turn Bayesian analysis on, or Disable to turn it off (default: enabled).

Spam Messages:

Shows the number of messages that have been learned as spam by the Bayesian classifier.

Note

Bayesian scoring will not start until at least 200 spam messages and at least 200 ham (non-spam) messages have been learned.

Ham Messages:

Shows the number of messages that have been learned as not spam by the Bayesian classifier, i.e. clean mail.

Note

Bayesian scoring will not start until at least 200 spam messages and at least 200 ham (non-spam) messages have been learned.
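The minimum described in the notes above can be expressed as a simple check against the Spam Messages and Ham Messages counters. This is a sketch only: the function name is hypothetical, and it assumes the 200-message minimum applies to each class separately.

```python
MIN_LEARNED = 200  # minimum from the notes above, assumed to apply per class


def bayes_scoring_active(spam_learned, ham_learned, minimum=MIN_LEARNED):
    """Return True once enough spam AND enough ham has been learned."""
    return spam_learned >= minimum and ham_learned >= minimum


# Plenty of spam learned, but not yet enough clean mail.
still_training = bayes_scoring_active(spam_learned=5000, ham_learned=150)

# Both counters have reached the minimum.
ready = bayes_scoring_active(spam_learned=200, ham_learned=200)
```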

Tokens:

The total number of individual tokens learned by the Bayesian classifier.

Oldest Token:

The timestamp of when the oldest token was learned.

Newest Token:

The timestamp of when the newest token was learned.

Last Expired:

Indicates when tokens were last expired from the database.

Auto Learning:

If enabled [ON], the anti-spam engine automatically feeds high-scoring mail (as spam) and low-scoring mail (as non-spam) into the Bayesian classifier (default: enabled).

If disabled [OFF], messages are only learned when users confirm spam in their quarantine reports or release messages (false positives) from quarantine.

Nonspam Threshold:

A message must score below this threshold for the anti-spam engine to feed it into the Bayesian classifier as a clean message (default: 0.1).

Spam Threshold:

A message must score above this threshold for the anti-spam engine to feed it into the Bayesian classifier as spam (default: 10.0).

Important

The anti-spam engine requires a message to score at least 3 points from the message headers and 3 points from the message body to auto-learn the message as spam. Therefore, the minimum working value for this setting is 6.
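The way the two thresholds and the header/body requirement combine can be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes the header and body points are available as separate values. It mirrors only the rules stated above, not the engine's internal code.

```python
def auto_learn_decision(score, header_points, body_points,
                        nonspam_threshold=0.1, spam_threshold=10.0):
    """Decide whether a scored message is auto-learned.

    Returns "ham", "spam", or None (message is not auto-learned).
    Defaults match the documented Nonspam and Spam Threshold defaults.
    """
    if score < nonspam_threshold:
        return "ham"
    # Spam also needs at least 3 points from headers and 3 from the body.
    if score > spam_threshold and header_points >= 3 and body_points >= 3:
        return "spam"
    return None


clean = auto_learn_decision(score=0.0, header_points=0, body_points=0)
spam = auto_learn_decision(score=12.0, header_points=4, body_points=8)
# High total score, but almost all of it comes from the body: not learned.
lopsided = auto_learn_decision(score=12.0, header_points=2, body_points=10)
```

Because a message needs 3 header points and 3 body points before it can be learned as spam, lowering the Spam Threshold below 6 has no practical effect in this model.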

Force Bayes Expire:

Because messages can be auto-learned, the Bayesian database could potentially grow until your disk is full. To control this, tokens are expired from the database periodically when all of the following criteria are met:

  • The last expiration was attempted at least twelve hours ago.

  • The number of tokens in the database is greater than 100,000.

  • The number of tokens in the database is greater than the specified bayes_expiry_max_db_size.

  • There is at least a twelve-hour difference between the oldest and newest token times.
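The expiry criteria listed above can be sketched as a single check. This is an illustration under stated assumptions: the function and argument names are hypothetical, all four criteria are assumed to be required together, and the max_db_size value in the example is illustrative rather than a documented default.

```python
from datetime import datetime, timedelta

TWELVE_HOURS = timedelta(hours=12)
MIN_TOKENS = 100_000  # fixed token floor from the criteria above


def expiry_due(now, last_expire, token_count, max_db_size,
               oldest_token, newest_token):
    """Return True only when every listed expiry criterion holds."""
    return (
        now - last_expire >= TWELVE_HOURS
        and token_count > MIN_TOKENS
        and token_count > max_db_size  # stands in for bayes_expiry_max_db_size
        and newest_token - oldest_token >= TWELVE_HOURS
    )


now = datetime(2024, 1, 2, 12, 0)
due = expiry_due(
    now=now,
    last_expire=now - timedelta(hours=13),
    token_count=180_000,
    max_db_size=150_000,  # illustrative value
    oldest_token=now - timedelta(days=30),
    newest_token=now,
)
# Same database, but an expiry was attempted only an hour ago.
too_soon = expiry_due(now, now - timedelta(hours=1), 180_000, 150_000,
                      now - timedelta(days=30), now)
```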

Reset Bayes Database:

Resets the Bayesian database, clearing all learned tokens.