Buffer Overflows and History: a request

overflow.jpg

One of my long-term interests in security is the ongoing cost of secrecy. My current favorite example is the stack smashing buffer overflow. These were known and understood no later than 1972, and clearly documented in the Computer Security Technology Planning Study:

The code performing this function does not check the source and destination addresses properly, permitting portions of the monitor to be overlaid by the user. This can be used to inject code into the monitor that will permit the user to seize control of the machine. (Page 61)

I believe that more open discussion of the technique by Aleph One led to a variety of defensive techniques getting baked into compilers and operating systems. Those defenses are now widespread, and it’s getting hard to find a stack smashing attack 10 or so years later. Had we not let the problem fester in secret, we’d be better off.

I’ve been told that the Bendix G-20 and the Burroughs B5500 had hardware level protection against buffer overflows as an intentional security mechanism. That is, there was an understanding that user supplied data could alter the flow of control.

I’m wondering if this is documented as clearly as the statement in the Security Technology Planning Study. It is very clear what the attack is and what the impact is. I’ve spent some time looking for a similarly clear published statement about one or the other of those machines. (Or heck, even a clear statement of the stack smashing attacks, rather than fuzzy statements about problems.)

Can you help me find such a thing?

Photo: Overflowing Glass 3, by nosheep on Stock.xchng.
[Update: We’ve got very interesting debate flowing in the comments.]

12 thoughts on “Buffer Overflows and History: a request

  1. I don’t have specific pointers yet but you can find most/all of the documentation for the B5000 series here:
    http://www.bitsavers.org/pdf/burroughs/B5000_5500_5700/
    My dad worked on repairing them way back when and has this to say:
    “Here is a link to a bunch of manuals. A B5700 is a re-skinned B5500 with racing stripes but the same internal guts. Most of the systems delivered were B5500s
    http://www.bitsavers.org/pdf/burroughs/B5000_5500_5700/
    The reference manual 1021326, the third item in the list, will give you a good overview of how things worked.
    If you want more detail, look at the processor training manual B5281.55
    School for this was 12 weeks in Detroit and then you understood enough to be dangerous. Troubleshooting was down to the register bit and input/output gate level AND you fixed the board. No card swapping either.”

  2. Thanks Andy!
    I’d actually come across that site, but wasn’t able to find anything about overflows. Maybe I missed it? The terminology has changed a lot.

  3. It was known as “array bounds checking” – if you search on that you may have more luck. If all compilers had it these days, we’d all be a lot safer.
    But I think it became very unpopular as languages started to deal less with arrays of defined size, and more with strings of undefined size.

  4. Bounds checking was the ticket. I recall seeing this problem with standard Fortran and variations called WATFOR and WATFIV from the University of Waterloo back around 1976-1978. In normal Fortran the problem could happen anytime unless you compiled with bounds checking. WATFOR/WATFIV were designed for students and did this automatically, except you could bypass it when using call by reference to pass arrays of arbitary sizes. Some students had programs that would dump arbitrary core or write and execute arbitrary instructions. I don’t recall anyone smashing a stack but then you didn’t need to.

  5. I seem to recall Peter Neumann talking on comp.risks about finding buffer overflows in Multix back in the mid-60s, but I can’t find a reference to it at the moment.

  6. As I read the blog post, Adam isn’t asking for a reference to defenses like array bounds checking; he’s asking for a reference to a clear description of the attack — namely, something that makes clear just how easily, and how thoroughly, systems can be exploited if they contain a buffer overrun / array out-of-bounds error. [Bold added by Adam: Yes!]
    Of course it was well known that array bounds checking can help prevent some kinds of bugs — e.g., bugs that might cause your program to crash or cause random memory corruption. But this misses the point. The eye-opener about Aleph One’s paper is that it shows how easy it is to inject malicious code and cause the system to transfer control to the malicious code. That was truly eye-opening.
    I don’t recall seeing this clearly documented decades earlier. Yes, there are the Karger papers, but based on my memory, I don’t believe they clearly explain the stack-smashing attack.
    A clear description of the stack-smashing exploit was important: it changed people’s thinking about the severity of buffer overrun bugs. Aleph One’s paper made very clear, in a way that earlier papers did not, how serious these bugs are.
    P.S. Adam, while it’s probably fair to say that Aleph One’s paper led to significant improvements to buffer overrun defense, it’s also my impression that it led to a tremendous increase in exploitation of buffer overrun vulnerabilities for malice. It’s not clear to me if it was the publication of Aleph One’s paper that led to better defenses — or if it was the significant uptick in malicious buffer overrun attacks following publication of Aleph One’s paper that stimulated work on better defenses. I’m not sure how we would tell. So while I agree with your general thesis, I wonder whether Aleph One’s paper is the best illustration of it.
    For other illustrations of this thesis, see the foreword and endnote to Bellovin’s paper on DNS security (http://www.cs.columbia.edu/~smb/papers/dnshack.pdf), which covers the flip side: Bellovin decided not to disclose the vulnerability for fear of making things worse, but attackers learned of it anyway and this probably delayed deployment of defenses. See also the story about DDoS with reflectors: if I recall correctly, Vern Paxson discovered the threat (I think?), but didn’t publish because he couldn’t come up with any way to defend against the threat, so publishing seemed like it would only make things worse. Years later, Barros rediscovered the problem and found a clever defense (http://web.archive.org/web/20060331022605/http://www.research.att.com/lists/ietf-itrace/2000/09/msg00044.html is the message, I think) that stimulated new work on defenses against this kind of attack. It turns out that reflector-based DDoS probably isn’t very important today, because of the rise of botnets, but that’s a story “that could have been”. And ask David Jefferson or David Wagner sometime about the story with the Diebold virus vulnerability (the “Hursti II” hack) and how much delayed disclosure helped, for another example of the flip side of the story.

  7. Got another update on this:
    “If they would look at the B5500 Reference Manual 1021326 and go to pages 4-4 to 4-10 there is information on the Interrupt Controller. In particular they should refer to the Stack Overflow, Invalid Address and Invalid Index interrupts.
    Section 3 of the manual talks about how the stack concept worked in the system. In there are explanations of the various registers and how their error detection would cause these interrupts to be set.
    Normally you would not expect these interrupts to be set because they caused a job to abort. Since this was a multi-tasking system you only lost the one job and the system continued to run and start the next job in the queue. If you had a two processor system, they worked in a master/slave configuration and you actually ran two independent jobs simultaneously while sharing a common memory and I/O system”

  8. I read pages 4-4 to 4-10 of B5500 Reference Manual 1021326. I don’t see how this is relevant to Adam’s request. I don’t see any warning that overflow of user-supplied data can alter the program’s control flow and thereby breach security. I don’t see any warning that overruns of user data buffers can be used by an attacker to take control of the running code. This is nothing like the Aleph One paper. I’m unimpressed.

  9. Anonymous geek,
    Thanks for checking through that for us!
    As to the question of “Is Aleph’s paper the best illustration?” I chose it very intentionally. Yes, the paper led to a massive and unfortunate rise in stack smashing for fun and profit. However, we’re now in a decline of such things, and most modern OSs now have explicit defenses. Had the paper come out in ’85, most of those defenses would have been in place when the Internet came along. We’d have a lot less code to comb through.

  10. This is a tricky topic, but I tend to agree with Anonymous Geek. Also, I think it is worth considering that this might not have been a “festering secret” as much as nobody bothered to develop/control the weakness for “fun and profit” as it were. It’s sort of like saying electricity was a festering secret for hundreds of years before we figured out how to use it.
    It may also be worth really fleshing out the cost model – I don’t see how Aleph One’s paper reduced costs nor how secrecy increased them.

  11. Pete,
    There were clearly people who understood the technique and could exploit it. For example, Robert Morris Jr.
    Feel free to attempt to flesh out the cost model. From where I sit, it’s pretty overwhelmingly obvious that a tutorial on writing X makes it easier for more people to do X, and that not discussing X meant that the problem was harder to prevent.

  12. @Adam –
    Sure there were people who understood it, and in the aftermath of the Morris worm, a number of folks wrote about it, including Gene Spafford in 1991. I am not sure why these don’t qualify as public information – what people knew about buffer overflows was published. (There is a more interesting question here about distribution of information in pre-Internet days, I think).
    Regarding the cost model, we are caught in a chicken/egg causation question. My model would include the costs associated with incidents prior to Aleph One’s paper (e.g. Morris worm) to the costs after the paper (e.g. almost everything else). In addition, I would factor in the costs of patch development and application. I think maybe your patch timing paper could be used at a macro level as well to help flesh this out.
    In any case, it seems to me that the costs after the paper were many times more than the costs prior to publication. What am I missing?

Comments are closed.