Data on Data Breaches

At the FIRST conference in Seville, Spain, I delivered a presentation about “Data on Data Breaches” that Adam and I put together. The slides, with the notes I made to act as “cue cards” for me, are available as a large PDF file on a slow web server.
The main points I tried to make are:
That with the availability of breach reports direct from states with central reporting, such as New York, it is possible to measure part of our ignorance when we rely solely on published breach reports — even the best available sources (such as Attrition’s DLDOS) undercount breaches dramatically, and are biased toward larger incidents.
That we are still at the leading edge of an explosion of information, and that we should not draw hasty conclusions until more facts are in.
That, as Emil Faber might put it, “Knowledge is Good” and is not that painful to provide.
And finally, primary materials such as breach reports are useful artifacts not only because they tell us dry facts in a standardized format (though that IS nice), but also because the notices themselves are interesting evidence of how firms talk to their customers about a difficult topic.
I’ll be writing more on this subject now that I have received the fourth batch of breach reports from my pals in New York, and my other pals in New Hampshire have made such materials available on-line.

Why Johnny Can’t Bank Safely

Stuart E. Schechter, Rachna Dhamija, Andy Ozment, and Ian Fischer have written a paper which examines the behavior of persons doing on-line banking under various experimentally-manipulated conditions.
The paper is getting some attention, for example in the New York Times and at Slashdot.
What Schechter et al. find is that despite increasingly alarming indicators that something may be amiss, subjects frequently provided their passwords to an on-line banking site with which they were at least somewhat familiar. Absence of indicators that SSL is used, and absence of an image-based site authenticity indicator (such as SiteKey, although the authors do not mention which bank was involved in the study), are almost entirely ignored by subjects. Only a relatively dire IE7-style warning page seems to dissuade the subjects, and even then over a third logged in even when their real credentials, at their real bank, were involved.
The press is focusing on the SiteKey angle. The hook seems to be this: even when this highly-touted anti-phishing feature is absent (and a suspicious text box left in its place), people merrily supply their passwords. Therefore, SiteKey doesn’t help.
Another aspect of this study is worthy of note. One of the experimental treatments was whether subjects used their own account credentials, or whether — as instructed by the researchers — they played the role of a fictitious person using credentials supplied by the researchers (with and without a lecture about security).
Unshockingly enough, people behaved “more securely” (my words, not the study’s) when their real bank accounts were on the line.
So, even if we know that people act more securely when they have some skin in the game, how do we explain it when they nonetheless do seemingly dumb things?
This is where I want to see some follow-up work. If the SiteKey-style images aren’t there, and if people have been warned to look for them, what were they thinking when they just clicked on by? Why were they thinking that? Why weren’t they thinking precisely what they had been told to think, namely that this could be an attempt at fraud? When a blatant message was presented, the equivalent of a blinking neon sign, it helped, but why did a third of people disregard it? Did they read it? Was it “pop-up fatigue” at work? Do people not care about SSL indicators because they’ve seen one too many “secure login” pages that collect creds via HTTP-based forms and simply POST them via SSL? Is it that all this web security stuff is indistinguishable from magic (hard to believe of the young Harvard-area types who were the subjects of this study, but hey, maybe they were visiting from Somerville or Boston)?
These are important questions, and more and more is riding on them.
I haven’t seen any figures on losses due to phishing that I can remember offhand, but I strongly suspect that they are on the rise. Moreover, as operating systems and web browsers become more secure, it’s increasingly important for businesses like banks to understand the human side of these technologies because that’s where fraudsters will take aim. What people think when they interact with computers, the mental models they use, how they react to cues presented to them by applications and web sites, and how all of these mix with things they already know (or believe) about sites (“It must be reliable — it’s FooBarCoLand National Bank”) are things that will increase in importance.
I’m eager to learn more.
(Credit where credit’s due: 0, 1)

More on Godin and Tufte

There’s another good article on Juice Analytics, “Godin, Tufte, and Types of Infographics:” (hey, guys, where are the author names? Author names only show in RSS, not the web page?)

Tufte frustrates on a number of levels. He is enormously influential in business. Businesses send people to his seminars and they come back energized with the essential truthfulness of his message. Yet weeks later those principles are abandoned by the lack of practicality of his message. No one in business is going to design a graph in Adobe Illustrator as he can. They use Excel. Seldom can we spend days or weeks refining and testing a graph. The work must be done and then we move on.

So I totally agree with this, and ask, why aren’t we asking more of Excel? Why can’t we get graphics of Tuftian quality from it? As I’ve said, I’m really fond of the ribbon design, and if enough customers were asking for great, well-defined improvements in graphical excellence, I suspect Excel would ship them. (A personal example: I’d like to be able to lock a set of graphs to the same scales for the axes, so I can create small multiples more easily. I have some graphs today that slice one data set differently, and I have to work hard to make the scales the same.)
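
To make the "same scales" wish concrete, here is a minimal sketch of the calculation a charting tool would need to do: take every slice of the data, find one range that covers them all, and apply it to each graph. The data, the slice names, and the `shared_axis_limits` helper are all hypothetical, made up for illustration; this is not anything Excel exposes.

```python
def shared_axis_limits(slices, padding=0.05):
    """Return one (low, high) range covering every series, plus a small margin."""
    values = [v for series in slices.values() for v in series]
    low, high = min(values), max(values)
    margin = (high - low) * padding  # a little breathing room at both ends
    return (low - margin, high + margin)


# Hypothetical data: one data set sliced three different ways.
slices = {
    "by region": [12, 40, 33, 8],
    "by quarter": [20, 25, 18, 30],
    "by product": [5, 60, 28],
}

limits = shared_axis_limits(slices)

# Every small multiple gets the same y-axis, so the panels compare honestly.
for name in slices:
    print(f"{name}: y-axis from {limits[0]:.2f} to {limits[1]:.2f}")
```

The point of computing the range once, across all slices, is that no single panel gets to rescale itself and exaggerate its own variation.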

It would be really interesting to see if the community of excellence around Excel could come up with ideas.

(In another post, Zach points to Re-Visions of Minard.)

Tufte, Godin, Juice Analytics

[Image: napoleons-march.jpg]

Juice Analytics comments on “Godin’s take on Tufte”:

(Godin) I think this is one of the worst graphs ever made.

He’s very happy because it shows five different pieces of information on three axes and if you study it for 15 minutes it really is worth 1000 words.

I don’t think that is what graphs are for. I think you are trying to make a point in two seconds for people who are too lazy to read the forty words underneath.

I think Seth has it just right. Personally, I can hardly resist a well-constructed infographic, but I have an unnatural interest in data. For the many business users, better to construct information displays that are simple and to the point.

So, Seth’s points are good. They’re made in this video presentation at GEL 2006 (Google video, worth watching).

I’m really irritated by Juice’s words. It is never better to construct information displays that are simple and to the point, absent an understanding of why you’re constructing a display. If your point is “Napoleon lost a lot of lives attacking Russia” maybe a bar graph would do. Sometimes complex reasoning requires complex data. The question is not “Should your graphics be simple and to the point,” but rather “do my graphics help present the data and help people reason about it?”

To put it another way, start from the user story, use case, or scenario, and construct your information presentations to help that story along. Then, and only then, should you make it as simple and to the point as possible, but no simpler.

One Graph, Zero Credibility


Let’s see... we’ve got shadows, random colors, and the colors are graduated, and so is the background. Displaying 13 digits takes 109,341 bytes (in the original), for a remarkable data density of roughly 0.0001 digits per byte.
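
For anyone who wants to check the arithmetic, the data density is just digits shown divided by file size. A quick sketch, using only the two numbers quoted above:

```python
digits_shown = 13          # digits displayed in the graph, per the post
file_size_bytes = 109_341  # size of the original image, per the post

density = digits_shown / file_size_bytes
bytes_per_digit = file_size_bytes / digits_shown

print(f"{density:.6f} digits per byte")
print(f"{bytes_per_digit:.0f} bytes per digit")
```

Flipped around, that is more than eight thousand bytes spent on each digit.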

Anti-Phishing Working Group? You can, I hope, do better.

Via the F-Secure blog, who don’t have per-post links.

A Picture (or Three) Is Worth A Thousand Words

Iang over at Financial Cryptography talks about the importance of not just which cryptographic algorithm to use, but which mode it is implemented with. He uses three pictures from Mark Pustilnik’s paper “Documenting And Evaluating The Security Guarantees Of Your Apps” that are such a great illustration of the problem that I have to include them here.
Adam and I have both been to Tufte’s courses on Presenting Data and Information and these strike me as the kind of illustrations he would appreciate. The beauty of them is that as a non-cryptographer, you don’t need to understand the technical differences between ECB and CBC modes, because the illustrations demonstrate them far better than any text could.
[Edit: In the comments, nicko points out this extremely clever idea was originally done with the Tux logo from Linux and that the images can be found on Wikipedia in the section on block cipher modes of operation.]
Figure 2a Plaintext
Figure 2b ECB Encryption
Figure 2c CBC Encryption
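
For readers who would rather see the effect in code than in pictures, here is a minimal sketch of why the ECB image still shows the outline while the CBC image looks like noise. It assumes a toy Feistel cipher built from SHA-256, which I made up purely for illustration; it is not secure, and it is not the cipher used to produce the figures. The key, IV, and function names are likewise hypothetical.

```python
import hashlib

BLOCK = 16  # bytes per block


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel cipher. Illustrative only, NOT secure."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hashlib.sha256(key + bytes([rnd]) + right).digest()[:8]
        left, right = right, xor_bytes(left, f)
    return left + right


def ecb_encrypt(key: bytes, plaintext: bytes) -> list:
    """ECB: every block is encrypted independently of the others."""
    return [toy_encrypt_block(key, plaintext[i:i + BLOCK])
            for i in range(0, len(plaintext), BLOCK)]


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> list:
    """CBC: each block is XORed with the previous ciphertext block first."""
    out, prev = [], iv
    for i in range(0, len(plaintext), BLOCK):
        mixed = xor_bytes(plaintext[i:i + BLOCK], prev)
        prev = toy_encrypt_block(key, mixed)
        out.append(prev)
    return out


key = b"hypothetical key"
iv = bytes(range(BLOCK))       # fixed IV so the demo is repeatable; real use needs a random one
plaintext = b"A" * BLOCK * 4   # four identical blocks, like a flat region of an image
                               # (length is a multiple of BLOCK; no padding is handled here)

ecb = ecb_encrypt(key, plaintext)
cbc = cbc_encrypt(key, iv, plaintext)

print("distinct ECB ciphertext blocks:", len(set(ecb)))
print("distinct CBC ciphertext blocks:", len(set(cbc)))
```

Under ECB, identical plaintext blocks always encrypt to identical ciphertext blocks, so large uniform areas of a picture stay visibly uniform. Under CBC, each block depends on everything before it, so the repetition disappears.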

Presentations and the Web

[Image: bad-presentation.jpg]

It’s easy to put presentations on the web, just like it’s easy to create them. Neither is easy to do well. I’d like to talk not only about good slide creation, but also about how to distribute a presentation in a useful way. It’s not easy to create good presentations, even when you have good content. Simson Garfinkel pointed me to a great source on “The Design of Presentation Slides.” It’s based on actual research about presentation style and retention. It turns out that a full-sentence headline, a graphical representation of data, and conclusions to draw from the data presented are far more memorable than bulleted sentence fragments (right).

This style also works well when the presentation is actually a presentation of some other organized thinking, such as a scientific paper or progress report. When the presentation is an accompaniment to something, I believe the research that says the headline sentence, data, and conclusion style leads to better retention. What about when there is no other handout?

There’s an expectation that speakers at a conference or workshop will provide slides. From the perspective of the conference organizers, requesting slides offers some small assurance that the speaker has prepared, and allows the conference attendees to have the slides as a reminder of the talk. From the reminder perspective, outline slides are actually very useful. There’s rarely an expectation of handouts that aren’t the slides. Perhaps the most useful (generically) is an actual outline, created with a tool designed for that purpose. A real outline is useful because it is less constrained by the genre: ideas can be more than active fragments, and the printed page imposes fewer constraints on both sentence and block than the slide. An outline’s not so useful as data, but who has data these days?

So I think I may move away from my habit of providing multiple formats of the slides themselves, and move to putting up a three-part web page with outline, references, and any details of the argument that seem to require elucidation. Perhaps even a short essay.

I would do this because the two scenarios are so different: One involves having me at the front of a room, using slides to illustrate and orient around my words. The other, without me there, means that the message needs to be self-contained.

Tony Chor on Presenting at MIX

Tony Chor has a good post on “Backstage at MIX06.” The effort that goes into a good presentation, including the practice, the extra machines, the people to keep them in sync, etc., is really impressive:

Normally, when I do a presentation and demo, both the demos and the presentation are on the same machine. I advance the slides and do the demo myself. Sometimes, for a big talk like my keynote at Hack-in-the-Box, we separate out the slides and demo onto separate machines (especially when the demos have pre-release bits like Windows Vista or IE7) and maybe I’ll have someone help me with the demos/slides to keep things running more smoothly.

Well, MIX took that to a whole new level. First, the demo machine was backstage, connected to a monitor, keyboard, and mouse via a switch. We also had a backup demo machine hooked up.