I wrote code for a botnet today

There’s a piece of software out there trying to cut down on blog spam, and it behaves annoyingly badly. It’s bad in a particular way that drives me up the wall. It prevents reasonable behavior, and barely blocks bad behavior of spammers.

In particular, it stops all requests that lack an HTTP Referer: header. All requests. Not just POST to the comment CGI, which might appear to make sense. Not just POST. All requests.

There are two problems with this. First, it assumes a static attacker, which is a poor description of spammers. Second, it has high auxiliary costs.

So I wrote 28 characters of code for a spamming botnet. This assumes there’s a variable “site” holding the site being spammed, and the line goes in the header-printing block:

printf("Referer: %s\n", site);

That’s it. I just broke the “Bad Behavior” plugin, because that’s what the comment link referer will look like. (If I were to put in site plus path, that would be about 4 lines of code. Mostly because it’s been long enough since I’ve dealt with C string handling that I’d have to look up how to split the string and drop the last component.) I’d link to it, but you know, I can’t see the site.
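
For the curious, the site-plus-path variant is roughly this much string handling. This is a hedged sketch in C, not code from any real tool: `parent_url` is a hypothetical helper name, and it assumes the input is an ordinary absolute URL.

```c
#include <string.h>

/*
 * Copy `url` into `out`, dropping the last path component but keeping the
 * trailing slash, e.g. "http://example.com/blog/post.html" becomes
 * "http://example.com/blog/". A URL with no path is copied unchanged.
 * Returns 0 on success, -1 if `out` is too small.
 */
int parent_url(const char *url, char *out, size_t outlen)
{
    /* Skip past "scheme://" so its slashes are not mistaken for the path. */
    const char *scheme_end = strstr(url, "://");
    const char *host = scheme_end ? scheme_end + 3 : url;
    const char *last_slash = strrchr(host, '/');
    size_t len = last_slash ? (size_t)(last_slash - url) + 1 : strlen(url);

    if (len >= outlen)
        return -1;
    memcpy(out, url, len);
    out[len] = '\0';
    return 0;
}
```

The point stands either way: a check that can be satisfied by a handful of lines of string copying is not much of a barrier.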

Incidentally, I didn’t contribute that code anywhere. It’s a thought experiment, which Bad Behavior’s author should have done years ago.

Good security design takes into account obvious next steps by attackers. It considers impacts on privacy and liberty. Missing those, security designs are at best acceptable, and at worst oppressive.

[Update: I realized I’m violating my own advice here, by saying “that’s wrong.” So let me be prescriptive: Don’t use the referer header for security. Just don’t. Don’t even try. You might try to redesign blog posting to take into account a particular blog post, but that would require breaking commenting directly from the front page of a blog.] [Update 2: added link to WMV video around ‘my own advice.’]

10 thoughts on “I wrote code for a botnet today”

  1. With all due respect, you’re very wrong about not using the Referer header for security. You obviously cannot rely on it when it’s correct since it can be forged trivially, but when it’s incorrect (or inconsistent with legitimate activity), it’s very close to 100% reliable as an indicator of a problem.
    Consider this: If you get a comment from someone claiming to be “the author of thedave.ca”, you’ll have no particular reason to trust or distrust me since you don’t know if I’m lying or not, but if you get a comment from someone claiming to be “the author of Emergentchaos.com” you immediately know I’m lying.
    In the same way, if I get a contact form submission from a request with a Referer which doesn’t have a contact form on it, I know it’s spam.
    Even with homepage based commenting, if I only have five articles on the homepage, how can a comment on a post from 17 posts ago (which expired from the homepage 12 posts ago) claiming a referer of the main page possibly be legitimate?
    Refusing requests without a Referer that aren’t even form submissions is pretty dumb, but one bad implementation doesn’t negate the possibility that someone will create a good implementation.
    The fact that spammers could work around it with a small number of lines of code is also not a reason not to take advantage of the fact that it does work *today* and may work tomorrow. Humanity might blow ourselves up next month, but I’d still recommend paying your mortgage today just in case we don’t.
    Some spammers are particularly adaptive, others are surprisingly boneheaded and never adapt, why not pick off the low hanging fruit?

  2. The observation about low-hanging fruit is a good one. For years I stopped at least half the spam coming into my site just by closing the SMTP connection if the remote system sent a command before receiving the response from the previous one. Legitimate mail servers would always wait for the response, but spambots often blindly spewed SMTP commands without waiting. (I put a couple of short, deliberate delays into the transaction to make this behavior more obvious.) I figured they’d adapt eventually to this technique but it took quite a long time before I saw a significant decline in effectiveness.

  3. It all depends on how much wheat you’re willing to throw away in order not to have to put up with chaff.
    Of course, when you can’t see the lost wheat, it is difficult to calibrate your thresher.
    (We hope you enjoy this special edition of Infosec Analogies. The regulars: “Security is like a {car, house, barn door, ‘Star Wars scene’}” will be back next week.) :^)

  4. There’s a paper out there somewhere that recommends using the “Origin: ” header. “Origin” would only be sent on POST requests, where privacy is less important (you’re telling the webserver to change its state), and it has, in theory, a well-defined meaning.
    And it’s spelled correctly, too!

  5. I don’t track my comment submission that closely, but looking back at my comment feedback (which does include the Referer), I can’t find a single case of a legitimate piece of feedback coming in that didn’t also have a valid Referer.
    Looking at email as an example of how well spammers do or do not adapt, SMTP “early talker” detection (inappropriate pipelining) still catches a two-digit percentage of spam, with a 0% false positive rate in my own testing. I see thousands of delivery attempts daily that EHLO/HELO as “localhost” or put my server’s MX or the destination domain in the EHLO/HELO, something I’ve blocked outright for 4-5 years.
    Intelligent/adaptive spammers are one part of the market, but if the email experience is at all similar to the web based spammers, the “low hanging fruit” includes the majority of delivery attempts, although probably not the majority of spammers.

  6. The Dave,
    If a measure is cheap, then it may be worth using. But cheap involves both low cost to implement, and low side effects.
    In this case, blocking GET requests which are purely reading costs the blogger in question real page views. There are enough of those that the software comes with advice that I give up privacy to read those blogs. It’s not an unknown fail.
    If you advertise, the plugin is costing you page views without any gain in security when people are reading.
    There are lots of measures which are cheap and easy which have low side effects. This isn’t one of them.

  7. To put it another way, should referer be treated as the only data to make a spam decision (which is what this software does) or an element of a scoring system?

  8. The annoying thing about blogspam is that it is not even aimed at the readers of the blogs. It is typically aimed at the search engines.
    Years ago I proposed that we might use RDFa to mark sections of a blog that were from external sources. That would then allow a search engine to avoid giving credence to blogspam.
    I gave a presentation on this to the W3C.
    At the time it was not core for VeriSign so I did not follow up. Now that I am independent I guess I could write it up as a concrete proposal if there is interest.

  9. I’m not suggesting blocking GETs or POSTs without a Referer, just adding code to check for a valid Referer before processing comments/feedback/etc regardless of whether the content comes in via GET or POST.
    I like the idea of marking content as “from external”, but isn’t this similar to rel=nofollow as far as avoiding weighting links?
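
One way to read comments 1, 7 and 9 together with the post's objection is to treat the Referer as one input to a score, and to look at it only on submissions, never on plain reads. What follows is a minimal C sketch under stated assumptions: `struct request`, `spam_score`, the `site_prefix` argument and every weight are hypothetical names and placeholder values chosen for illustration, not anything taken from Bad Behavior or another real plugin.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical request summary; the field names are illustrative only. */
struct request {
    int is_submission;    /* 1 when the request submits a comment or form */
    const char *referer;  /* NULL when the header is absent */
    int link_count;       /* number of links in the submitted text */
};

/*
 * Return 1 if `referer` starts with `site_prefix`, 0 otherwise.
 * Pass the prefix with a trailing slash ("http://example.com/") so that
 * "http://example.com.evil.test/" does not match.
 */
static int referer_matches_site(const char *referer, const char *site_prefix)
{
    return referer != NULL &&
           strncmp(referer, site_prefix, strlen(site_prefix)) == 0;
}

/*
 * Spam score for a request; higher means more suspicious.
 * Plain reads are never scored, so readers who suppress the Referer
 * header can still view pages. The weights are arbitrary placeholders.
 */
int spam_score(const struct request *req, const char *site_prefix)
{
    int score = 0;

    if (!req->is_submission)
        return 0;
    if (req->referer == NULL)
        score += 2;   /* missing header: a hint, not proof */
    else if (!referer_matches_site(req->referer, site_prefix))
        score += 4;   /* present but inconsistent with this site */
    if (req->link_count > 3)
        score += 3;   /* an unrelated second signal */
    return score;
}
```

A caller would compare the result against a threshold of its own choosing and hold borderline comments for moderation rather than refuse them. A correct-looking Referer earns no trust here, since it is trivially forged; only a missing or inconsistent one adds to the score.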
