Analyzing Words in Spam Emails
August 03, 2005
We recently analyzed the corpus behind our Bayesian spam filter (the SpamAssassin token database) and came up with a list of words that have a high spam/ham ratio.
We ranked words by their spam/ham ratio rather than by their raw spam count, and that gave us a better list of words to avoid. Most lists would have you avoid words like "click" and "here". However, those words appear so often in legitimate email that their spam/ham ratio is actually low.
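To illustrate why the ranking changes, here is a minimal sketch that ranks tokens both ways. It assumes you have already extracted per-token spam and ham counts into a dictionary. The token counts below are made-up numbers, not data from our corpus. The helper names (`spam_ham_ratio`, `rank_by_ratio`, `rank_by_spam_count`), the add-one smoothing, and the `min_total` cutoff for rare tokens are all hypothetical choices made for this example. They are not SpamAssassin's own scoring.

```python
from typing import Dict, List, Tuple

# Hypothetical token counts: token -> (spam_count, ham_count).
# These numbers are invented for illustration only.
TOKEN_COUNTS: Dict[str, Tuple[int, int]] = {
    "click": (900, 600),
    "here": (1200, 1500),
    "viagra": (300, 2),
    "unsubscribe": (400, 150),
    "meeting": (20, 800),
}


def spam_ham_ratio(spam: int, ham: int) -> float:
    """Spam/ham ratio with add-one smoothing (an assumption) so a zero ham count cannot divide by zero."""
    return (spam + 1) / (ham + 1)


def rank_by_ratio(counts: Dict[str, Tuple[int, int]], min_total: int = 10) -> List[Tuple[str, float]]:
    """Sort tokens by descending spam/ham ratio, skipping tokens seen fewer than min_total times."""
    scored = [
        (token, spam_ham_ratio(spam, ham))
        for token, (spam, ham) in counts.items()
        if spam + ham >= min_total
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def rank_by_spam_count(counts: Dict[str, Tuple[int, int]]) -> List[str]:
    """Sort tokens by descending raw spam count, ignoring how often they appear in ham."""
    return sorted(counts, key=lambda token: counts[token][0], reverse=True)


if __name__ == "__main__":
    # ['here', 'click', 'unsubscribe', 'viagra', 'meeting']
    print("By spam count:", rank_by_spam_count(TOKEN_COUNTS))
    # ['viagra', 'unsubscribe', 'click', 'here', 'meeting']
    print("By spam/ham ratio:", [token for token, _ in rank_by_ratio(TOKEN_COUNTS)])
```

With these sample numbers, "here" and "click" top the list by raw spam count. Ranked by ratio, they fall behind words that rarely appear in legitimate mail. If your spam and ham corpora differ greatly in size, you would probably want to divide each count by the number of messages in its corpus before taking the ratio.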
Tags: spam, email, bayesian, deliverability, semantics