I am teh 133test

Wed, 31 Aug 2005

As I said in my last post on Movable Type:

I am moving my blog to WordPress and to http://blog.niceperson.org/. I haven’t quite figured out how to do the redirects automagically yet, but I’ll figure something out. In the meantime, please update your links and bookmarks.

I’m not completely sure what I’m going to do with the main site, but I have some ideas.

Problem A (automagic redirection) has now been solved with the careful application of RewriteRules. Unfortunately it’s not spot-on for the single-post pages, because Movable Type and WordPress use different algorithms for creating slugs, but that only means you’ll get the daily archive (which probably only contains one post anyway). It’s a quick, painless solution that requires no rebuilding of archives whatsoever. I’m pretty happy.
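For the curious, the mapping those rules perform can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual RewriteRules: the old path layout (`/archive/YYYY/MM/DD/slug.html`), the new daily-archive layout, and the name `redirect_target` are all assumptions, since the post does not spell them out.

```python
import re

# Hypothetical old Movable Type layout: /archive/YYYY/MM/DD/slug.html
# (assumed for illustration; the real layout is not given in the post).
OLD_POST = re.compile(r"^/archive/(\d{4})/(\d{2})/(\d{2})/[^/]+\.html$")

# New blog host, as named in the post.
NEW_BASE = "http://blog.niceperson.org"


def redirect_target(old_path):
    """Return the daily-archive URL for an old single-post path, or None.

    The slug is deliberately dropped: the two systems build slugs
    differently, so only the date part is carried over.
    """
    match = OLD_POST.match(old_path)
    if match is None:
        return None
    year, month, day = match.groups()
    # Assumed WordPress daily-archive layout: /YYYY/MM/DD/
    return f"{NEW_BASE}/{year}/{month}/{day}/"
```

Throwing away the slug and keeping only the date is what makes the approach cheap: nothing has to know how either system names its posts.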

I have not deleted my Movable Type page hierarchy yet, merely moved it from archive/ to .archive/ (just in case there’s anything I haven’t anticipated). We’ll see how many 404s I get from locations I’ve forgotten to redirect.

Inevitable

Tue, 12 Apr 2005

Well, I knew it was going to happen eventually. The Trackback spammers have finally found my weblog. They left fifteen little spams over the last three days, and I found it rather ironic that my last entry about fighting spam got more spam-pings than any other.

This minor deluge of spam has at last prodded me into action; as I promised in that very entry, I have renamed my comment and trackback scripts. That should slow down the spammers, if only temporarily.

Google and comment spam

Tue, 1 Feb 2005

This weekend I noticed that Googlebot had started indexing the URLs of my newly renamed comment script, even though it wasn't actually crawling the pages. Since comment and trackback spammers don't need the content of the page, just the URL, this was not enough for me. I wrote to Google to find out what could be done.

From: Laura Melton
To: Googlebot
Subject: Googlebot is not following my robots.txt
Date: Sat, 29 Jan 2005 21:03:39 -0800

Hi,

My robots.txt file (http://www.niceperson.org/robots.txt) disallows
bots from indexing my /cgi-imps/ directory (what I've renamed
/cgi-bin/).  Yet Google has indexed scripts in my /cgi-imps/ directory.
Why is this?

Thanks,

Laura Melton

My effort was not wholly in vain. Yesterday I received this reply:

From: Googlebot
To: Laura Melton
Subject: Re: Googlebot is not following my robots.txt
Date: Mon, 31 Jan 2005 14:11:42 -0800

Hi Laura,

Thank you for your note. Although your robots.txt file prevents our robots
from crawling your pages, it will not prevent our robots from adding a
link to your page without crawling it. We reviewed the link you mention:
www.niceperson.org/cgi-imps/mt/mt-tb.cgi/45. Our robots found this page
because another page linked to it. Our robots added the link to our index
without actually visiting or crawling the page. This is why the page does
not have a detailed title or description.

Although a robots.txt file usually prevents pages from appearing in our
search results, the only fool-proof ways to keep them out of our index are
to make sure that no sites link to them, password protect them, or remove
the robots.txt file and use a NOINDEX meta tag instead.

In this case, we suggest that you remove the links to this page or use
NOINDEX meta tags. To obtain a list of the links that point to a page,
perform a Google search on the URL. From the search results page, select
the 'Find web pages that contain the term' link, and Google will provide
you with webpages that mention that address. For more information on meta
tags, please visit http://www.google.com/remove.html#exclude_pages

You can also remove this page by submitting your robots.txt file for
immediate review at http://services.google.com:8882/urlconsole/controller.
However, please note that this will only temporarily remove this page from
our search results. If sites continue to link to this page, it may be
included again in our search results.

Regards,
The Google Team

This is not what I wanted to hear, but it is informative. Contrary to my prior belief, the robots.txt file does not completely block Googlebot from directories and files: it stops the crawling, but not the listing of bare URLs that other pages link to. Moreover, as long as my robots.txt blocks a page, Googlebot never fetches it and so never sees its meta tags. I'm not getting rid of my robots.txt because I need it for file types that can't carry meta tags. For example, my robots.txt has been successful in blocking off my images directory. In any case, I can't give my Trackback script a meta tag either. Screw the meta tag approach.
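The distinction is easy to see with Python's standard `urllib.robotparser` module, which answers exactly one question: may this agent fetch this URL? The robots.txt content below is an assumption (the post only says that /cgi-imps/ is disallowed), and `may_crawl` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; the real file is not quoted in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-imps/
"""


def may_crawl(path, agent="Googlebot"):
    """Return True if robots.txt allows `agent` to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "http://www.niceperson.org" + path)


print(may_crawl("/cgi-imps/mt/mt-tb.cgi/45"))  # the trackback URL from Google's reply
print(may_crawl("/index.html"))                # an ordinary page
```

Nothing in that check speaks to whether a search engine may list a URL it learned about from someone else's link, which is the gap Google's reply describes.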

Removing all links to those scripts is sheer absurdity, of course, as is password protection. As for my rel="nofollow" attempt, Google says only that marked links will get no PageRank credit, not that Google won't index them. It was mere wishful thinking on my part.

This only proves what Phil and I said about Google's proposed solution: Google isn't interested in a solution for anyone but themselves. It's mind-bogglingly easy for spammers to find Movable Type comment or trackback scripts via Google (allinurl: "mt comments cgi"), so if Google really wanted to make the spammers' work harder, they'd make their bots a little less greedy (or give bloggers ways to slap them down).

So what can Movable Type bloggers do? I've submitted my robots.txt to Google's automated removal system, so in a week or so all links to my scripts should be gone, at least temporarily. As a more permanent work-around, I'm also going to rename my comment and trackback scripts (again) so that they don't contain the words comment or trackback. Current ideas are annotate and see-also, but I haven't made up my mind. Whatever I name the scripts, the comment spammers will have to work much harder to find them.

Stylish

Fri, 21 Jan 2005

If you tried to submit a comment a couple of hours ago, you probably noticed that the script formerly known as mt-comments.cgi was producing nice fat 500 server errors. Yup, that was me, tweaking (as ever) without having any clue what I was doing. Great learning experience, bad customer service. Bad Laurabelle.

(Note to self: Make better testing environment ASAP.)

What was I so clumsily tweaking? I was trying another tactic on the same problem I had trouble with earlier. The task was to translate a certain script from PHP to Perl, and it was harder than I had thought it would be; or rather, the parts I thought would be hard were easy, and parts I thought would be easy were very difficult.

The code in question is what I use for determining what style my site will use on any given occasion. The algorithm is fairly simple. If today is a holiday for which I have defined a stylesheet, use that style. Otherwise, if the browser has a cookie for the visitor's favorite style, use that one. If neither of those pans out, then use a seasonal style. (By the way, the form for setting a cookie is on the front page, at the bottom of the left sidebar.)
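That three-step fallback can be sketched roughly as follows. This is Python rather than the PHP or Perl the site uses, and the style names, holiday dates, season boundaries, and the cookie name `style` are all made up for illustration.

```python
from datetime import date

# Hypothetical holiday stylesheets, keyed by (month, day).
HOLIDAY_STYLES = {
    (10, 31): "halloween",
    (12, 25): "christmas",
}


def seasonal_style(today):
    """Pick a style from the month (assumed northern-hemisphere seasons)."""
    if today.month in (12, 1, 2):
        return "winter"
    if today.month in (3, 4, 5):
        return "spring"
    if today.month in (6, 7, 8):
        return "summer"
    return "autumn"


def choose_style(today, cookies):
    """Holiday style first, then the visitor's cookie, then the season."""
    holiday = HOLIDAY_STYLES.get((today.month, today.day))
    if holiday is not None:
        return holiday
    favorite = cookies.get("style")  # hypothetical cookie name
    if favorite:
        return favorite
    return seasonal_style(today)
```

For example, `choose_style(date(2005, 1, 21), {})` falls all the way through to the seasonal branch, while the same call on 31 October ignores any cookie and returns the holiday style.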

It turns out that taking my Perl code out of the MT template and sticking it in a plugin didn't make it any more correct, but it did make it somewhat easier to debug. That is, until I stopped getting reasonably comprehensible parsing errors and started getting server errors.

The log entry for these errors looked like this:

[Fri Jan 21 20:23:39 2005] [error] [client xxx.xxx.xxx.xxx] malformed header from script. Bad header=<!-- Script and navigation hea: LBM-comments.cgi, referer: http://www.niceperson.org

I had a beast of a time figuring out what the problem was, because to put it mildly, the MT plugin interface is not terribly well documented, at least for people who don't want to pay for it. So I bumbled along on my own and eventually worked it out. For those of you who find yourselves in my situation and have found my site by googling malformed header from script, I had to fix two things that were not obvious to me. First, make sure there is a line:

1;

at the end of your file. I don't really have a clue what this does, but it's important. Secondly, don't try to print your output; return it.

Once I got that bit worked out, the page loaded, but without styles. If you came along at that point, a plain white page was all you saw. First it was completely unstyled, and then it was merely albino, and then I finally cracked it. I lost count of how many bugs there were in my initial code, but there were a lot of them.

In fact, the cookies still don't work, but I'm not too surprised about that because my cookie fu is not very strong. The cookie code works in PHP (except for the cookie-deletion part, which never worked, I don't know why) but not in Perl. I'm not going to worry terribly much about it, because I suspect that nobody uses that feature anyway. Besides, does it matter so much if the comments pages are a different style from the rest? (However, if someone with more plugin-fu and cookie-fu and, I dare say, Perl-fu would care to look at some code that almost-works, I won't refuse.)

I call my plugin MTStylish. If anyone thinks s/he would find it useful or interesting, I would be glad to share.

RIP Movable Type

Thu, 13 May 2004

It looks like I won’t be upgrading to MT 3.0. Six Apart is trying to put a happy face on it, but their pricing scheme is frankly absurd, and Shelley is right that there’s no guarantee of a free version past 3.0. It’s really too bad, and I think (hope) that Ben and Mena will be unhappily surprised by the disappearance of their user base.

Some people are planning to migrate as soon as possible, others (like Shelley) are glad they beat the rush, and still others never used it in the first place. Me, I’m planning to stick with 2.65 for a while. I don’t currently need an upgrade, because I’ve already got all the features I need. In the meantime, I’ll investigate my options more thoroughly before charting my course. Maybe I’ll go with WordPress or Textpattern, or maybe I’ll just grow my own.