In case you found the notice below a little confusing, here's the news: Trackbacks are back in business!
I've just finished hacking together a triple-bypass for the Movable Type / MT Blacklist system we've been suffering from.
The new system isn't amazingly sophisticated, but it does have some useful features:
Downside? Well, pings can take up to a minute to appear. But then, they used to take a minute to post. Also, if your ping fails for some strange reason, you don't get an error. It just says "Okay!" and drops it down the toilet. So if you have any problems, let me know.
Update: Doesn't seem to work for trackbacks from Wordpress. I'm not sure why; it seems that Wordpress does something funny/different when it sends the ping. Investigating. I don't know how to use Wordpress.
Update: 30,322 trackbacks received by the new system. 238 were passed on to stage three, where they still need to get past MT Blacklist. There's still quite a bit of crap getting past stage two, but it is weeding out 99.5% of the spam while using 0.4% of the resources of the old system. I count that as a win.
I'll look at tweaking stage two further. That part is all my own code and written in Python, and since it's a generic trackback/comment filtering system, I can easily re-use it in Minx, so the effort won't be wasted. But I think it's good enough for now.
I've tried pinging from MT 2.6, MT 3.1 and Wordpress, and they all went through fine. If you are running something else, feel free to give this post a ping and we'll see how it goes. Remember that it can take up to a minute for your trackback to appear.
Update: New feature that will be going in tonight: Tunable throttling. Right now there's a hard limit: 5 pings from any one (non-munu) source in a batch (which represents one minute) and they all get deleted. But if some spammer sends us 10 pings in one minute and 4 the next minute, the 4 pings go through.
With the new system, it holds a grudge. So if you send 10 pings in one minute, the next minute it still counts you as having sent those pings - less a tunable percentage (currently 10%). So in minute one, you send 10 pings, they get dumped. Minute two, it considers you to have already sent 9 pings, so anything else you send gets dumped and bumps the counter up. At 90% grudge level, it would take 9 minutes before it would let them send even one ping through.
Any number less than 1 gets dropped completely, so people sending a ping here, a ping there, shouldn't have any problems. And as before, it doesn't affect intra-munu pings at all.
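The grudge throttle described above can be sketched roughly as follows. This is not the actual Snark! code. The names `GrudgeThrottle` and `process_batch`, and the example source `munu.example`, are made up for illustration. It assumes the carried-over count shrinks by the tunable percentage once per batch, which is one reading of the description; the real system may decay differently.

```python
from collections import Counter


class GrudgeThrottle:
    """Per-source ping throttle that carries a decayed count between batches.

    Hypothetical sketch: limit, decay and whitelist are assumed parameters.
    """

    def __init__(self, limit=5, decay=0.10, whitelist=()):
        self.limit = limit              # pings per source before the batch is dumped
        self.decay = decay              # fraction of the grudge forgiven each batch
        self.whitelist = set(whitelist) # sources that are never throttled
        self.scores = {}                # source -> carried-over ping count

    def process_batch(self, sources):
        """Take the source of each ping in one batch; return the accepted ones."""
        counts = Counter(s for s in sources if s not in self.whitelist)

        # Effective count = grudge carried from earlier batches + this batch.
        totals = {}
        for source in set(self.scores) | set(counts):
            totals[source] = self.scores.get(source, 0.0) + counts.get(source, 0)

        # A source at or over the limit loses every ping in this batch.
        accepted = [
            s for s in sources
            if s in self.whitelist or totals[s] < self.limit
        ]

        # Decay the grudge; anything below 1 is forgotten completely.
        self.scores = {}
        for source, total in totals.items():
            carried = total * (1.0 - self.decay)
            if carried >= 1.0:
                self.scores[source] = carried
        return accepted


throttle = GrudgeThrottle(limit=5, decay=0.10, whitelist={"munu.example"})
minute_one = throttle.process_batch(["spammer"] * 10 + ["friend", "munu.example"])
minute_two = throttle.process_batch(["spammer"] * 4)
print(minute_one)  # ['friend', 'munu.example']
print(minute_two)  # []
```

In this sketch the spammer's ten pings leave a carried count of 9 after the first batch, so the four pings in the next batch are dumped too and push the count higher, while the occasional pinger's count falls below 1 and is forgotten.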
Update: Oh yeah, I've expanded the intra-munu thingy to a more general whitelist, so we can unblock friendly but prolific pingers if need be.
Update: Version 1.1 of the Snark!™ Trackback Filtering System is up and running. When you need to protect a blog that's precious to you, go with the very best: Go Snark!™
Posted by Pixy Misa at November 22, 2005 12:23 PM | TrackBack
It's interesting to watch. With the time delay/throttle system, the more spam we get, the less likely any of it is to get through. Which means the more munuvians we add, the less spam we get.
And having done away with the cursed trackbacks, the server is now frolicking gaily around the paddock, meaning that it's time to dump another 200 users on it.
Posted by Pixy Misa at November 22, 2005 01:13 PM
Gah. Now I just need to do something about comment spam.
Posted by Pixy Misa at November 22, 2005 01:13 PM
Okay. So it turns out that the truly insane side of the trackback script is the display routine. You know when you click on "3 Trackbacks" and it lists them? That program is completely nuts. So now I have to replace that too.
Posted by Pixy Misa at November 22, 2005 01:21 PM
Another note: In the ten days since we moved to the new server, we've received over 500,000 trackbacks.
Ugh.
Posted by Pixy Misa at November 22, 2005 01:23 PM
We got 300 trackbacks in the past four minutes, all spam.
One got past the stage one filter.
Not perfect, but pretty good.
Posted by Pixy Misa at November 22, 2005 01:30 PM
The Googlebot is merrily rampaging through our CGI scripts, running every single comment and trackback script, all 250,000-odd of them.
I shot it through the head.
Someone remind me to remind people to use static pages for comments.
Also someone remind me to move the log files before they fill up the directory and crash the web server.
Thanks!
Posted by Pixy Misa at November 22, 2005 01:34 PM
More specifically, I prevented Google from indexing our CGI scripts, which it bloody well shouldn't be doing in the first place. If you want your comments indexed by Google, use a static page. They almost certainly already get indexed via your individual archive pages, so blocking Google from splorp.cgi isn't likely to matter. And you should be using static pages anyway. Static pages good! Dynamic pages bad! (At least if you're running MT 2.6...)
Posted by Pixy Misa at November 22, 2005 01:39 PM
Pixy - move the log files before they crash the server.
You're welcome.
What with all this hacking at MT and various codes in order to make Munuviana an ideal blog hosting service you might have kindofsorta stumbled into, well, an ideal blog hosting service. More ideal. The idealest.
Posted by Simon at November 22, 2005 01:41 PM
Right now, we have Google and Yahoo and MSN trying to index our trackbacks. I'm trying to find a simple way to tell them to piss off. Since they're following links from other domains, I don't think the usual robots.txt will do the job. And Yahoo comes from dozens of different IP addresses at once. Grrr.
Maybe some user-agent blocking. Hmm. Yes, that will do it.
Posted by Pixy Misa at November 22, 2005 01:55 PM
There is no god but Pixy, blessed be his name!
Those stupid search engines are the reason we have spam in the first place!
Oh, yeah--Yay!
Posted by Susie at November 22, 2005 02:12 PM
I shut it down for a few minutes to rewrite the blacklist system, but it's back now.
Your pings went through.
Posted by Pixy Misa at November 22, 2005 02:22 PM
I want to help I really do. But at the risk of betraying myself a fool - how do I even know what kind of page I'm using for comments and if I have the bad one, how do I change it?
Posted by Stephen Macklin at November 22, 2005 03:42 PM
MT 3 defaults to static, but MT 2.6 defaults to dynamic. I'll have to post a how-to.
Posted by Pixy Misa at November 22, 2005 10:25 PM