Better spam defense for Django comments

I've finally found the time to package up an improved version of my Django comment validation.

The original's simple link count worked surprisingly well for a while, but microbes are always evolving, so this had to as well. (For those of you wondering when I'm going to stop being stubborn and use TypePad's AntiSpam service ... not yet. :^)

The first improvement is a list of banned IPs: if the commenter is posting from one, the comment is killed. Even if the IP hasn't been banned, it's checked for previous comments that weren't clean enough to be marked public; each one found counts against the current comment.
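To make the IP check concrete, here is a minimal sketch in plain Python. It is not the packaged code: the `PriorComment` record, the `check_ip` function, and the one-point-per-comment penalty are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PriorComment:
    """Hypothetical stand-in for a stored comment record."""
    ip_address: str
    is_public: bool


def check_ip(ip, banned_ips, prior_comments):
    """Return (rejected, penalty) for a new comment posted from `ip`.

    A banned IP rejects the comment outright. Otherwise every earlier
    non-public comment from the same IP adds one point (an assumed
    weighting) that counts against the new comment.
    """
    if ip in banned_ips:
        return True, 0
    penalty = sum(
        1 for c in prior_comments
        if c.ip_address == ip and not c.is_public
    )
    return False, penalty


history = [
    PriorComment("10.0.0.5", is_public=False),
    PriorComment("10.0.0.5", is_public=True),
    PriorComment("10.0.0.5", is_public=False),
    PriorComment("10.0.0.9", is_public=False),
]
rejected, penalty = check_ip("10.0.0.5", {"192.0.2.1"}, history)
```

In a real Django app the history lookup would presumably be a query against the comment table rather than a list scan.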

The second is more complex. You can now create blacklists of phrases that count against comments. The comment text, after removing stop words, is compared to each blacklist to derive their Tanimoto coefficient (for two sets of terms, the number of shared terms divided by the number of distinct terms across both), and that is multiplied by the weight assigned to the blacklist. The weighted score lets you be more aggressive about certain phrases.
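The blacklist scoring can be sketched roughly as follows. This is an illustration under assumptions, not the packaged code: the stop-word list, the word-level tokenizing, the function names, and summing the weighted scores across blacklists are all hypothetical choices.

```python
import re

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = frozenset(
    {"a", "an", "and", "the", "to", "of", "in", "is", "for", "on"}
)


def tokenize(text):
    """Lowercase the text and return its set of non-stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def tanimoto(a, b):
    """Tanimoto coefficient of two sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def blacklist_score(text, blacklists):
    """Sum weight * Tanimoto over (terms, weight) blacklist pairs.

    Summing across blacklists is an assumption; the weighted
    per-blacklist score is what the post describes.
    """
    words = tokenize(text)
    return sum(
        weight * tanimoto(words, set(terms))
        for terms, weight in blacklists
    )


# One blacklist with weight 2.0: 2 shared terms out of 4 distinct ones.
score = blacklist_score("Buy cheap pills now", [({"cheap", "pills"}, 2.0)])
```

A higher weight makes a partial match against that blacklist count for more, which is the knob for being more aggressive about certain phrases.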

Finally, this version includes these batch tools for comment administration, and adds the ability to ban the IPs of multiple comments at once.

You still just need to add this version to your INSTALLED_APPS setting, but you'll also need to run manage.py syncdb to install the tables and the initial blacklist data (don't look too closely if you're easily offended).

This has been working pretty well here, and it's pretty tweakable. Some things to consider:

Update: 18 May 2009

After several requests and way too long, I've set this up as a Bitbucket project. There you'll find better instructions, the full source in both downloadable form and a Mercurial repository, and an issue tracker. The code itself has had numerous improvements since this was posted, too, including support for Akismet or TypePad AntiSpam.

comments (3)

ger

2 February 2009

10:05

Spain

Please can you host the code in someplace like github, googlecode...?
You know, wiki, bugtracking, more visibility...

Miles

2 February 2009

14:57

United States

Our strategy is to use Akismet, and if Akismet flags the comment as spam, to show a basic (rather ugly) Captcha. We haven't had a spam get through in months (and, as far as I know, no false positives).

Joe

30 April 2009

2:55

United States

Could you post a simple readme or updated sample for the new version? I noticed "fairview" and "fairview.comments" are now just "fccv" ..
