Die spammers die!

Sat, 15 Jan 2005

Using tricks from Parker and Dorothea, I've grown my own referrer-spam-fighting fu. In addition, I've translated my old bot-fighting rules (which were using ModRewrite) into less-heavy SetEnvIf rules. That should make Kevin a little happier.

I have a few more strings to add to the collective killfile:

jmsimonr|middlecay|neweighweb|targetindustries|zalaszentgrot|zone-b51|
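For readers who want to see how strings like these are typically used, here is a minimal sketch, not my actual killfile rule. It assumes mod_setenvif is available, and the variable name spam_ref is made up for illustration. Note that the list is written without a trailing pipe here: an empty alternative at the end of a group would match every referrer.

```apache
# Hypothetical sketch: flag any request whose Referer contains a killfile string.
# "spam_ref" is an assumed variable name, not one taken from the original rules.
SetEnvIfNoCase Referer "(jmsimonr|middlecay|neweighweb|targetindustries|zalaszentgrot|zone-b51)" spam_ref

# Let everyone in, then refuse the flagged requests.
# With "Order allow,deny", a matching Deny overrides the Allow.
Order allow,deny
Allow from all
Deny from env=spam_ref
```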

One problem I've noticed is that referrer spam seems to be passing through on my comments, like this:

203.115.21.204 - - [16/Jan/2005:07:04:24 +0000] "GET /cgi-imps/mt/mt-comments.cgi?entry_id=193;parent_id=96 HTTP/1.0" 200 5265 "http://www.hdic.net/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.1.4322)"

So I've put a .htaccess file in my Movable Type CGI directory, and it looks like this:

SetEnvIf Referer "^http://niceperson.org" local_referal
SetEnvIf Referer "^http://www.niceperson.org" local_referal
SetEnvIf Referer "^$" local_referal
order deny,allow
deny from all
allow from env=local_referal

I don't know yet whether this actually works, but it's the same code I use in my image directory (to prevent other sites from stealing my bandwidth). It doesn't work. Damn. Anyone know if this sort of rule just doesn't work on CGI directories? My global referrer-blocking rules aren't working either.

Update 18.1.2005 0:15 — It turns out .htaccess files weren't enabled for my CGI directory (different document root from my web hierarchy). My wonderful admin has now enabled enough stuff to make it work, so I'm a happy bunny.
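I don't know exactly what my admin changed, but a setup like the following is the usual way to let a directory honor .htaccess rules of this kind. This is a sketch under assumptions: the directory path is a placeholder, and it belongs in the main server configuration rather than in .htaccess itself.

```apache
# Main server configuration (e.g. httpd.conf), not .htaccess.
# The path below is a placeholder for the real CGI directory.
<Directory "/path/to/cgi-dir">
    # FileInfo permits SetEnvIf in .htaccess;
    # Limit permits Order, Allow and Deny.
    AllowOverride FileInfo Limit
</Directory>
```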

Yesterday I also implemented Jacques Distler's method of renaming mt-comments.cgi so that spammers can't find my comment pages so easily. This has a strong chance of eliminating comment spam (at least temporarily, until the spammers write smarter bots). I suspect that the referrer spammers are not significantly daunted by 410 status codes (over a span of 5 hours yesterday, our friends at fidelity-funding requested the non-existent mt-comments.cgi a total of 52 times), but at least I'm serving them less data.
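One common way to answer requests for the old script name with a 410 is mod_alias's Redirect directive. This is only a sketch of that general technique, not necessarily how it is done on this site. The path is taken from the log excerpt above, and it assumes mod_alias is loaded and that the directive is permitted wherever it is placed.

```apache
# Answer requests for the old, renamed script with "410 Gone".
# Assumes mod_alias; the URL path comes from the access-log line shown earlier.
Redirect gone /cgi-imps/mt/mt-comments.cgi
```

The response body for a 410 is tiny compared with a full comments page, which is the "serving them less data" part.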

Update 1.2.2005 13:05 — I was wrong, this code still doesn't work. See Denying non-local referrers for the correct method.

Comments

Dorothea Salo says:

Try taking out some quotation marks, especially around ^$. I don't know why, but I have a rule relating to ^$ also, and it just wouldn't fly if I quote-marked it.

pjm says:

I think I'm with Dorothea on that. The ^ and $ are regular expression special characters to anchor the match at the start and end of the string (^ is start, $ is end) so ^$ is, specifically, a null string. I don't think the quotes are necessary, because there's nothing to quote.

Laurabelle says:

I've taken out the quotation marks, because it shouldn't hurt. Still, I think that if that were failing, I wouldn't be able to access any file in my MT directory, and I can. The rule is deny-everyone-except-local where local is defined as niceperson.org or a blank referer, so if the blank referer weren't matching, then I wouldn't have been able to load mt.cgi like I did a few minutes ago.

I did some Googling after I posted, and I think perhaps Apache just isn't configured to recognize or use .htaccess. I'll drop a line to my Friendly Neighborhood Sysadmin.

Laurabelle says:

Okay, I know for sure that the rule works on my /images directory, because it blocks images from showing in the preview of my Atom feed on Bloglines.

If anyone cares, the rules for images are the same as what I was trying in my MT directory, except that it applies only to image files:

<FilesMatch "\.(gif|jpe?g|png)$">
Order deny,allow
Deny from all
Allow from env=local_referal
</FilesMatch>

This has the same effect as the ModRewrite rules I wrote almost two years ago, but it's much easier on Apache. As I understand it, ModRewrite is powerful but heavy, so it's generally more efficient to use normal rules when possible.
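For comparison, a mod_rewrite version of the same idea usually looks something like the sketch below. These are not my original rules from two years ago, just a typical form of the technique, and they assume mod_rewrite is enabled for the directory.

```apache
# Hypothetical mod_rewrite equivalent of the image rule above.
RewriteEngine On
# Let blank referrers through...
RewriteCond %{HTTP_REFERER} !^$
# ...and referrers from this site, with or without "www.".
RewriteCond %{HTTP_REFERER} !^http://(www\.)?niceperson\.org [NC]
# Everything else asking for an image gets 403 Forbidden.
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```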

pjm says:

Renaming mt-comment and mt-tb are both very good and useful steps. I've found that adding some caps to the new names is also helpful - if their bots find the new names, apparently they're not always bright enough to do so in a case-sensitive way. That plus MT-Blacklist has had me spam-free for a few months now.

W.r.t. Ed's mod_rewrite rule, he left a comment on my entry suggesting that what he was seeing wasn't specifically referrer spam - it seemed to be attempts to exploit his site as an open proxy, with referrer spam just piggy-backed on. So it's no wonder his rules don't always help us. I just left 'em out and went with the regex plus user-agent block, which has helped hugely.

Laurabelle says:

Ohhhh, that makes sense (about Ed's rule). It also correlates with other things I've been reading about referrer-spammers exploiting open HTTP proxies in order to mask their true origins.

Good idea about case-sensitivity. I'll institute that when I'm done posting this comment.

I don't black-list, but I use MT-Bayesian. It filters spam very effectively; unfortunately it's apparently going through a phase of marking everything as spam. 90% of my own comments get marked as spam!

ptt_ says:

Some comment pages use the "type the letters you see in this image" hurdle, to allow only humans to comment.

Laurabelle says:

True, but that's a different problem from the one I was trying to solve with blocking referers. :-)

What you describe is called a "captcha," and I don't find that I need that right now. I've actually more or less solved the comment-spam problem. (Maybe I'm just not popular enough for anyone to try to circumvent the measures I currently have in place.)

Trackback spam isn't susceptible to captchas, of course, but I haven't had too much of a problem with that either.

Jeff says:

Captchas can also be defeated in a couple of ways. Advanced OCR techniques can defeat most simple captchas and even some of the more distorted ones, and there is always the option of getting a human to solve the captcha for the spammer.

One scenario (reported to be in use): a porn site operator who is also a spammer needs yahoo.com accounts, so he writes a script that autofills the signup form, captures the captcha, and feeds it to his own "new user" page. Some schmuck signing up there decodes the captcha; once the yahoo.com email account has been verified, the schmuck gets his porn and just thinks that the porno page has implemented a captcha itself.

Anyway, between "social engineering" techniques and advanced pattern recognition algorithms, captchas are just another arms race.

Jeff

Laurabelle's Blog says:

Google and comment spam

This weekend I noticed that Googlebot had started indexing the URLs of my newly renamed comment script, even though it...

Heal Your Church Web Site says:

Using .htaccess to deal with a recent flood of trackback ping spam

"Holy smokes, I've been hit! My comment spam 'secret code' filter is working like a charm - no spam in weeks, but now they've decided to spam through trackback. The other day I had two new trackback pings on older...

Docs.Rage.Net says:

Docs.Rage.NET: /apache/env.html.en

This was suggested as being relevant by a visitor.

Docs.Rage.Net says:

Docs.Rage.NET: /apache/urlmapping.html.en

This was suggested as being relevant by a visitor.
