August 03, 2005
Analyzing Words in Spam Emails
We recently did some analysis on our Bayesian spam filter corpus (the SpamAssassin token database) and came up with a list of words with a high spam/ham ratio.
By ranking on the spam/ham ratio rather than the raw spam count, we came up with a better list of words to avoid. Most lists would have you avoid words like click and here, but those words are used so often in legitimate email that they have a low spam/ham ratio.
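To make the difference between the two rankings concrete, here is a minimal sketch in Python. It does not read the SpamAssassin token database. The token counts are made up for illustration, and the function names, the add-one smoothing, and the `min_total` cutoff are all assumptions of this sketch, not part of our actual analysis.

```python
def spam_ham_ratio(spam_count, ham_count, smoothing=1.0):
    """Ratio of spam to ham occurrences for one token.

    The smoothing term (an assumption of this sketch) avoids division
    by zero for tokens that never appear in ham.
    """
    return (spam_count + smoothing) / (ham_count + smoothing)


def rank_tokens(counts, min_total=5):
    """Return (token, ratio) pairs sorted by ratio, highest first.

    Tokens seen fewer than min_total times overall are skipped, since
    a ratio built on a handful of messages is mostly noise.
    """
    ranked = []
    for token, (spam, ham) in counts.items():
        if spam + ham < min_total:
            continue
        ranked.append((token, spam_ham_ratio(spam, ham)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical (spam_count, ham_count) pairs, invented for this example.
counts = {
    "click": (900, 800),
    "here": (1000, 950),
    "viagra": (400, 1),
    "mortgage": (300, 9),
    "rare": (2, 0),
}

ranking = rank_tokens(counts)
for token, ratio in ranking:
    print(f"{token}: {ratio:.2f}")
```

With these invented counts, click and here have the highest raw spam counts but land at the bottom of the ratio ranking, because they show up almost as often in ham.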
Tags: spam, email, bayesian, deliverability, semantics
Pete Freitag is a software engineer, and web developer located in










