Pete Freitag

Analyzing Words in Spam Emails

Updated on November 02, 2021
By Pete Freitag
misc

We recently analyzed our Bayesian spam filter corpus (the SpamAssassin token database) and came up with a list of words with a high spam/ham ratio, meaning words that show up far more often in spam than in legitimate mail (ham).

By ranking on the spam/ham ratio rather than the raw spam count, we came up with a better list of words to avoid. Most lists would have you avoid words like "click" and "here", but those words are used so often in legitimate email that they have a low spam/ham ratio.
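To make the difference between the two rankings concrete, here is a minimal Python sketch. It is not the script we ran against the token database: the function name `rank_by_spam_ham_ratio`, the add-one smoothing on the ham count, and every count in the sample data are assumptions made up for illustration.

```python
from typing import Dict, List, Tuple


def rank_by_spam_ham_ratio(
    spam_counts: Dict[str, int],
    ham_counts: Dict[str, int],
) -> List[Tuple[str, float]]:
    """Return (token, ratio) pairs sorted by spam/ham ratio, highest first."""
    ranked = []
    for token, spam in spam_counts.items():
        ham = ham_counts.get(token, 0)
        # Assumption: add 1 to the ham count so tokens never seen in ham
        # do not cause a division by zero.
        ranked.append((token, spam / (ham + 1)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical counts, invented for this example (not real corpus data).
spam_counts = {"click": 900, "here": 850, "refinance": 400}
ham_counts = {"click": 800, "here": 1200, "refinance": 1}

# Ranking by raw spam count puts the common words first.
by_count = sorted(spam_counts, key=spam_counts.get, reverse=True)

# Ranking by ratio pushes them down, because they are common in ham too.
by_ratio = [token for token, _ in rank_by_spam_ham_ratio(spam_counts, ham_counts)]

print(by_count)
print(by_ratio)
```

With these invented numbers, "click" and "here" lead the raw-count ranking but fall behind "refinance" once ham usage is taken into account, which is the effect described above.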



spam email bayesian deliverability semantics

Analyzing Words in Spam Emails was first published on August 03, 2005.

