Those who have tried commenting recently will probably have noticed that their comments weren’t showing up for a while. This was because my filters had a common term in them, which essentially forced me to moderate everything manually. The list of banned terms is quite extensive, so it took me a while to find exactly what I’d done wrong and correct it. Long story short, you can now comment and mostly expect your comment to pop up straight away (unless of course you’re talking about your wonderful job at a casino where you get to play poker whilst selling viagra and cialis and watching free porn, in which case probably not).
Sorry for any inconvenience or confusion this may have caused recently — I’ve added a note above the comment posting area notifying people that moderation is in force and as such their comments may not appear straight away. If you have any further troubles, feel free to use the contact form (which, incidentally, also had problems with the “Confirm” page markup until around midnight — thanks Steve for pointing that out — I was still getting all messages sent via it, fortunately).

How did you know I was gonna comment about that Casino I was at?
See, I had to manually approve that! Geeeeez :P
vis.it dA.l3group f0r seriou5ly co0l warz and sig.n up t o our p0ker chann3l. Pl4y off w3ekly for gr43t priz3s, m!n bet is $10O0.
http://www.dalegroup.net/
Ah, but that misses the purpose of comment spam entirely — it’s all about inbound links with appropriate keywords! The actual “message” doesn’t matter so much as the content (e.g. individual words), unlike email spam which is another story altogether…
Hmmm, I have something I wrote in VB (yes, snicker and get over it) that could be adapted to filter every conceivable variation on a bad word. Not that I could be stuffed adapting it, so this comment is quite pointless, but that’s me.
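For anyone curious what that kind of filter might look like, here is a rough sketch in Python. It is not the VB code mentioned above, just one assumed approach: undo a handful of common character swaps, strip out separators, then look for banned words. The `LEET_MAP` table and the function names are made up for illustration, and it covers only a small set of variations, nowhere near every conceivable one.

```python
import re

# Hypothetical substitution table: a few common character swaps only.
LEET_MAP = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t",
    "!": "i", "$": "s", "@": "a",
})


def normalise(text):
    """Lower-case, undo the swaps above, and drop everything except a-z."""
    text = text.lower().translate(LEET_MAP)
    # Removing dots and spaces lets "p.0.k.e.r" and "p 0 k e r" match too.
    return re.sub(r"[^a-z]", "", text)


def contains_banned(text, banned_words):
    """Return the banned words found in the normalised text."""
    flat = normalise(text)
    return [word for word in banned_words if word in flat]


print(contains_banned("sig.n up t o our p0ker chann3l", ["poker", "casino"]))
```

Because spaces are stripped as well, this will also match banned words that only appear when two innocent words are run together, so a real filter would probably want something gentler.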
I think what would be more interesting is REGEX-based rules that work out things such as the numbers-to-letters ratio (and other such things), because most bodies of real text contain few numbers and are mostly just letters, not symbols etc.
So on top of your word filter you could compare a post with other rules.
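As a rough illustration of the numbers-to-letters idea, here is a minimal Python sketch. The function names and the `max_ratio` threshold of 0.1 are assumptions picked for the example, not values taken from any real filter.

```python
import re


def digit_letter_ratio(text):
    """Return the number of digits divided by the number of letters."""
    digits = len(re.findall(r"\d", text))
    # [^\W\d_] matches word characters that are neither digits nor underscores.
    letters = len(re.findall(r"[^\W\d_]", text))
    if letters == 0:
        return float("inf") if digits else 0.0
    return digits / letters


def looks_spammy(text, max_ratio=0.1):
    """Flag text whose digit-to-letter ratio exceeds the (assumed) threshold."""
    return digit_letter_ratio(text) > max_ratio


print(looks_spammy("Just a normal comment about the post."))
print(looks_spammy("w1n 1000 pr1z3s n0w"))
```

A rule like this would sit on top of the word filter rather than replace it, since a perfectly honest comment that quotes a few prices or dates could also trip the threshold.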
Also maybe you could have “grey words”. If a certain word is used more than once per post, or too often within a set amount of time (across lots of posts), the post gets flagged as spam. This may help to stop spam flooding. Sure, you may get one spam message, but it is better than 100 of the same thing.
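The grey-word idea could be sketched roughly like this in Python. The class name, the limits and the one-hour window are all assumptions for the sake of the example; nothing here comes from an existing spam plugin.

```python
import re
import time
from collections import Counter, deque


class GreyWordFilter:
    """Flag posts that overuse 'grey' words, per post or over a time window."""

    def __init__(self, grey_words, max_per_post=1, max_per_window=5,
                 window_seconds=3600):
        self.grey_words = {w.lower() for w in grey_words}
        self.max_per_post = max_per_post
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        # For each grey word, the timestamps of recent posts that used it.
        self._seen = {w: deque() for w in self.grey_words}

    def is_spam(self, text, now=None):
        now = time.time() if now is None else now
        words = re.findall(r"[a-z']+", text.lower())
        counts = Counter(w for w in words if w in self.grey_words)
        flagged = False
        for word, n in counts.items():
            times = self._seen[word]
            # Forget posts that have fallen outside the time window.
            while times and now - times[0] > self.window_seconds:
                times.popleft()
            times.append(now)
            if n > self.max_per_post or len(times) > self.max_per_window:
                flagged = True
        return flagged


grey = GreyWordFilter(["poker"], max_per_window=2, window_seconds=60)
print(grey.is_spam("Anyone up for poker?", now=0))
print(grey.is_spam("poker poker poker", now=1))
```

The first spam message in a flood still gets through, as the comment says, but the repeats start getting caught once the window count is exceeded.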
Although Josh, your word list seems to work well. Very few spam messages make it live to this site.
Yeah, there are some REGEX rules (I think… kind of like .htaccess-style stuff? Let me know if I’m wrong) in the list, BUT I don’t know if they’re actually doing anything (coz I ripped the list from a Movable Type website, which can use REGEX for filtering with MT-Blacklist or whatever). I’m sure one of the big WordPress spam management packages does grey words or something like it, but I just don’t see the reason to implement it at the minute because, like you said, filtering here works pretty okay.
Nick — g0t pseudo-code version of it? Would be an interesting exercise even if it weren’t used.
p.s. you need to get a blog site/website up and running!
Ahhhh, pseudo-code, evil. You’re starting to sound like thill. I maintain that the only documentation I am ever likely to need is remarking within the code. Unfortunately the BOS does not share this view… My own website? What do you take me for, some sort of cheap exhibitionist? :)
Heh, a comment spammer just hit this post :P Must happen all the time, I only just took note though :P