Analyzing Words in Spam Emails
Updated on November 02, 2021
By Pete Freitag
We recently did some analysis on our Bayesian spam filter corpus (the SpamAssassin token database) and came up with a list of words with a high spam/ham ratio.
By using the spam/ham ratio rather than the raw spam count, we came up with a better list of words to avoid. Most lists would have you avoid words like "click" and "here", but they are used so much in legitimate email that they have a low spam/ham ratio.
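To make the difference between the two rankings concrete, here is a minimal Python sketch. It assumes each token comes with a spam count and a ham count, which is roughly what a Bayesian token database stores. The function names, the `smoothing` term (added to the ham count so a word never seen in ham does not divide by zero), and the counts themselves are all made up for illustration; they are not taken from our corpus or from SpamAssassin.

```python
from typing import Dict, List, Tuple


def spam_ham_ratio(spam_count: int, ham_count: int, smoothing: float = 1.0) -> float:
    """Ratio of spam hits to ham hits for one token.

    `smoothing` is an assumption of this sketch: it is added to the ham
    count so tokens with zero ham hits still get a finite ratio.
    """
    return spam_count / (ham_count + smoothing)


def rank_by_ratio(counts: Dict[str, Tuple[int, int]],
                  smoothing: float = 1.0) -> List[Tuple[str, float]]:
    """Sort tokens from highest to lowest spam/ham ratio."""
    ratios = [(token, spam_ham_ratio(spam, ham, smoothing))
              for token, (spam, ham) in counts.items()]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


def rank_by_spam_count(counts: Dict[str, Tuple[int, int]]) -> List[str]:
    """Sort tokens by raw spam count only, ignoring ham."""
    return sorted(counts, key=lambda token: counts[token][0], reverse=True)


# Hypothetical (spam_count, ham_count) pairs, invented for this example.
EXAMPLE_COUNTS = {
    "click": (900, 800),
    "here": (1000, 950),
    "freeprize": (300, 1),  # hypothetical token that rarely appears in ham
}

if __name__ == "__main__":
    print("By spam count:", rank_by_spam_count(EXAMPLE_COUNTS))
    for token, ratio in rank_by_ratio(EXAMPLE_COUNTS):
        print(f"{token}: {ratio:.2f}")
```

With these invented numbers, "here" and "click" top the list when sorted by spam count alone, but fall to the bottom once their ham counts are taken into account.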
Analyzing Words in Spam Emails was first published on August 03, 2005.
If you like reading about spam, email, bayesian, deliverability, or semantics then you might also like:
- Battling Comment Spam
- Trick or Treat - Web 2.0 Goodies for ColdFusion
- Spammers now using ASCII Art
- ReturnPath acquires BondedSender