New Species Discovered on Flickr

Semachrysa jade

There’s a very cool story on NPR about “A New Species Discovered … On Flickr.” An entomologist was looking at some photos, and saw a bug he’d never seen. Check out the photographer’s site or Flickr pages. The paper is “A charismatic new species of green lacewing discovered in Malaysia (Neuroptera, Chrysopidae): the confluence of citizen scientist, online image database and cybertaxonomy”:

The online images were then randomly examined by the senior author (SLW) who determined that this distinctive species was not immediately recognizable as any previously described species. Links to the images were forwarded to additional experts in chrysopid taxonomy to elicit comment on its possible taxonomic identity. After extensive discussion it was concluded that the species was likely new to science but its generic placement inconclusive based solely upon the images at hand.

I find it fascinating that the distinction of a new species is keyed on a morphological difference like this. I know nothing about the Chrysopidae, and this is just a lay comment, but substantially larger variations occur in dogs without driving the claim of a new species. Does anyone know what makes for a new chrysopid?

Photo by Kurt, aka Hock Ping Guek.

Emergent Map: Streets of the US

This is really cool. All Streets is a map of the United States made of nothing but roads. A surprisingly accurate map of the country emerges from the chaos of our roads:

Allstreets poster

All Streets consists of 240 million individual road segments. No other features — no outlines, cities, or types of terrain — are marked, yet canyons and mountains emerge as the roads course around them, and sparser webs of road mark less populated areas. More details can be found here, with additional discussion of the previous version here.

In the discussion page, “Fry” writes:

The result is a map made of 240 million segments of road. It’s very difficult to say exactly how many individual streets are involved — since a winding road might consist of dozens or even hundreds of segments — but I’m sure there’s someone deep inside the Census Bureau who knows the exact number.

Which raises a fascinating question: is there a Platonic definition of “a road”? Is the question answerable in the sort of concrete way that I can say “there are 2 pens in my hand”? We tend to believe that things are countable, but as you try to count them at larger scales, the question of what is a discrete thing grows in importance. We see this when map software tells us to “continue on Foo Street.” Most drivers don’t care about such instructions; the road is the same road, insofar as you can drive in a straight line and be on what seems the same “stretch of pavement.” All that differs is the signs (if there are signs). There’s a story that when Bostonians named Washington Street after our first President, they changed the names of all the streets as they cross Washington Street, to draw attention to the great man. Are those different streets? They are likely different segments, but I think that knowing the number of streets in the US requires not an ontological analysis of the nature of a street, but rather a purpose-driven one. Who needs to know how many individual streets are in the US? What would they do with that knowledge? Will they count gravel roads? What about new roads, under construction, or roads in the process of being torn up? This weekend, with the “Carmageddon” closing of the 405 in LA, does the 405 count as a road?

Only with these questions answered could someone answer the question of “how many streets are there?” People often steam-roller over such issues to get to answers when they need them, and that may be ok, depending on what details are flattened. Me, I’ll stick with “a great many,” since it is accurate enough for all my purposes.
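To make the point concrete, here is a minimal sketch using made-up segment data. The records, field layout, and all three counting rules are hypothetical, invented for illustration; none of this is the Census Bureau's format. The same five segments give three different "street counts" depending on the definition chosen.

```python
# Hypothetical road segments: (segment_id, street_name, town).
# Invented for illustration; this is not real Census data.
segments = [
    (1, "Washington St", "Boston"),
    (2, "Washington St", "Boston"),
    (3, "Winter St", "Boston"),
    (4, "Summer St", "Boston"),   # same stretch of pavement as Winter St, renamed
    (5, "Washington St", "Salem"),
]

def count_segments(segs):
    """Definition 1: every segment is its own street."""
    return len(segs)

def count_by_name(segs):
    """Definition 2: a street is a distinct name, wherever it appears."""
    return len({name for _, name, _ in segs})

def count_by_name_and_town(segs):
    """Definition 3: a street is a distinct (name, town) pair."""
    return len({(name, town) for _, name, town in segs})

print(count_segments(segments))          # 5
print(count_by_name(segments))           # 3
print(count_by_name_and_town(segments))  # 4
```

None of the three answers is wrong; each one fits a different purpose.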

So the takeaway for you? Well, there are two. First, even with the seemingly most concrete of questions, definitions matter a lot. When someone gives you big numbers and they influence behavior, be sure to understand what they measured and how, and what decisions they made along the way. In information security, a great many people announce seemingly precise and often scary-sounding numbers that, on investigation, mean far different things than they seem to. (Or, more often, far less.)

And second, despite what I wrote above, it’s not the whole country that emerges. It’s the contiguous 48. Again, watch those definitions, especially for what’s not there.

Previously on Emergent Chaos: Steve Coast’s “Map of London” and “Map of Where Tourists Take Pictures.”

Another personal data invariant that varies

Just about anything a database might store about a person can change. People’s birthdays change (often because they’re incorrectly reported or recorded). People’s gender can change. One thing I thought didn’t change was blood type, but David Molnar pointed out to me that I’m wrong:

Donors for allogeneic stem-cell transplantation are selected based on their HLA type (tissue type), and not on their blood type. Therefore, it is quite common that the donor and patient have different blood types. The blood type is determined by the red cells. After transplant and bone-marrow recovery the red cells will come from the donor and have the donor’s blood type. As an example, if the patient is blood type A, and the donor is blood type O, the patient after transplant will become blood type O. The long-term outcome of an allogeneic stem-cell transplant is affected only to a small degree by the blood types of the donor and recipient. If an ABO difference exists, the transplant itself may create some technical difficulties, but these can be easily overcome. Red-cell recovery may be delayed after such transplants, and the patient may need support with red-cell transfusions for a prolonged period of time. More importantly, the patient should be aware that the blood type has changed or will change, and that old blood type cards are no longer valid. IBMT will provide you with a laminated card that indicates that your blood type may have changed. After your bone-marrow function has fully recovered, you may receive red cells of your new blood type. During the transplant process, usually red cells of blood type O are used, since these can be used for any patient (universal donor).
(“Indiana Blood and Marrow Transplantation”)

David continues:

The Seattle Cancer Care Alliance is the #1 by volume in the U.S and does several thousand per year. So that means several people per day are having their blood type changed right here in Seattle.

Do your database and e-health record support updating your blood type record?
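As a minimal sketch of what "supporting updates" might mean, the record below keeps a dated history of blood types instead of a single overwritable field. The class, field names, and dates are hypothetical, chosen only for illustration; a real e-health schema would need much more (provenance, who made the change, and so on).

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class PatientRecord:
    """Hypothetical record that keeps a dated history for blood type."""
    name: str
    blood_type_history: list = field(default_factory=list)  # (effective_date, blood_type)

    def set_blood_type(self, blood_type, effective):
        # Append rather than overwrite, so old cards and lab results stay explainable.
        self.blood_type_history.append((effective, blood_type))
        self.blood_type_history.sort(key=lambda entry: entry[0])

    def blood_type_on(self, when):
        """Return the blood type in effect on a given date, or None if unknown."""
        current = None
        for effective, blood_type in self.blood_type_history:
            if effective <= when:
                current = blood_type
        return current

patient = PatientRecord("Example Patient")
patient.set_blood_type("A", date(1970, 1, 1))
patient.set_blood_type("O", date(2011, 6, 1))   # e.g. after a transplant from a type O donor
print(patient.blood_type_on(date(2000, 1, 1)))  # A
print(patient.blood_type_on(date(2012, 1, 1)))  # O
```

Treating the "invariant" as a time series costs little and avoids having to decide later which value was the true one.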

Congratulations to the CVE team!

The CVE Web site now contains 30,000 unique information security issues with publicly known names. CVE, which began in 1999 with just 321 common names on the CVE List, is considered the international standard for public software vulnerability names. Information security professionals and product vendors from around the world use CVE Identifiers (CVE-IDs) as a standard method for identifying vulnerabilities, and for cross-linking among products, services, and other repositories that use the identifiers.

See the CVE News page. I remember proposing that we have a CVE-1. I’m tremendously proud to have helped get such a useful thing off the ground, and really happy for the CVE team.

From the Heresy Desk

Theatre Security

Before Bruce Schneier started using the term, “Security Theatre” was a term I heard from what I call Real Security People. I was designing a security-oriented NOC, and I interviewed people who built secure sites for a couple of governments, banks, and others. They said that what The Adversary thinks you can do is more important than what you can do. I was told that perception is the majority of security: “Maybe not two-thirds, but definitely more than half.” As the team built the system, we took this to heart, which made it more fun, at the very least. But I also heard from someone I know who nmapped our system, received an nmap in return, and decided it wasn’t a good idea to go further. In that case, at least, the security theatre worked.

We also used a bit of security-through-obscurity. We tweaked some of our network protocols so that they were merely incompatible with the off-the-shelf stuff. Our protocol banners lied. We particularly enjoyed having them declare that they were known vulnerable in odd ways. It was at least informative that the random attacks that came by were not tailored. No one ever tried Sparc vulnerabilities on that server claiming to be SunOS 4 with Bind 3. They hit it with the Windows buffer overflows anyway. That was disappointing, but we also learned an important lesson — the only people who care what your banners say are the good guys. The bad guys find it more economical to just spray you with whatever exploits they have in their bag of tricks. Or at least most of the bad guys.

Security through obscurity has gotten a bad rep in part because there are people who think that merely being obscure is being secure. There are also people who think that a mediocre security system can be made secure by being obscure. If, however, you start with good security and then put a bit of obscurity on the top, it’s a bonus. Think of security as armor and obscurity as camouflage. Camouflage is not armor; obscurity is not security. People who tell you it is are trying to sell you something. However, if an attacker is faced with armored things that are also camouflaged, their job is harder. If you back up the camouflage with good log analysis, then you can take the element of surprise away from the attacker. The total effect is good security theatre, a theatre that might result in deterrence. Just be honest about it, especially to yourself. If the attacker discovers you have no armor behind the camouflage, then you have a well-prepared opponent.

There are other reasons to eschew obscurity. It isn’t scalable, and it doesn’t lead to market solutions. You can’t shop around for the best obscurity. The notion of a global secret is somewhere between ironic and silly. This is why DRM systems don’t work against determined attackers. However, not everything needs to be open, scalable, and market-driven. If you are building a system that is closed, proprietary, and local (such as the secure NOC I was working on), obscurity can be a valuable spice in the dish that makes a tasty meal tastier.

We are also seeing changes in the threat model that justify a revision in our defense model. A few years ago, the attackers were using broadcast attacks. They didn’t look at the lies we told them because they were unskilled attackers throwing all the handy exploits they had. They wouldn’t see embarrassments that didn’t fit their model. I have a story about that I’ll post soon.

The trend is toward attacks that are slow, targeted, and aimed at a clear goal: money. The attackers also want not only to succeed, but to succeed undetected. A measure that increases the attacker’s uncertainty increases the attacker’s risk of being caught.

Here’s an informal example. Suppose I divide my system into an external “red” network and an internal “black” network. All connections use TLS with AES-256, but on the black network, we are not using standard AES; we’re using a modified AES that real cryptographers agree is as secure, just incompatible with AES. Call it AEN for Advanced Encryption Non-standard. Cryptographers have a formal notion of this that they call “family keys.” AEN is my spice. On the black network, you’re expected to use AEN. We just compiled it into OpenSSL where AES was supposed to be. The resulting system is just as secure as one that uses AES everywhere, but has this extra little twist. It makes the attacker’s job harder, and makes our job of detecting an attack easier. It has costs, of course, which you can think of as well as I can. But in my system, which is not only closed but which I want to be closed, they’re not bad costs to pay. Even better, if I publicize that I’ve done this, I might convince an attacker to target someone else.

If you remember that obscurity is not security, that it is camouflage rather than armor, that it is not scalable, that it is only as good as the obscurity itself is, there might be places you can use it effectively. Also, not all security theatre is bad. What is bad is only having theatre and not backing up obscurity with real security.
Photo of theatre security courtesy of Luigi Rosa.

Periodic Spiral

The periodic table is under-appreciated as a design masterpiece, and as an iconic representation of science. The table works as a taxonomy, showing someone who knows how to read it a great deal of information about the elements based on their arrangement in space.

So it’s pretty audacious to come out with a re-design:
The Periodic Spiral envisions a remedy to the flaws in conventional periodic tables by illustrating hydrogen’s ambiguous relationship to the noble gases and halogens while recognizing its relationship to the alkali metals; it also fully integrates the lanthanons and actinons into the design.

Via Information Esthetics.

Do Kings Play Chess on Folding Glass Stools?

Over at the OSVDB blog, blogauthor writes:

On September 29, Stefan Esser posted an advisory in which he said “While searching for applications that are vulnerable to a new class of vulnerabilities inside PHP applications we took a quick look…”. This lead me to remember an article last year titled Microsoft unveils details of software security process in which Window Snyder (former Microsoft security strategist) said “These are entire classes of vulnerabilities that I haven’t seen externally. When they found these, (the developers) went on a mission, found them in all parts of the system, and got rid of them.” referring to vulnerabilities that were proactively removed. The article goes on to say “Moreover, the company found and fixed two classes of vulnerabilities that have not been discovered elsewhere, she said.”

Anyone else curious about these? Less than a year, and three new classes of vulnerabilities? Come on Window, you left Microsoft, you can speak up now! Steffan, spill the beans, give us details!

So, here are the details. No, just kidding. I can’t talk about the details, but what I can talk about are taxonomies. I can talk about taxonomies for hours. I think, by analogy, that stack smashing may be an order. Perhaps a family. Closely related are the integer overflow and format string. Each places code in the expected path of execution, overwriting it. More distant are command stuffing (my term for the classic “; echo $stuff > /etc/passwd”) or SQL injection. Cross site scripting belongs to the phylum of code/data separation, or perhaps the family of output validation.

I’m not sure if there’s a taxonomy here at all. By taxonomy I mean a repeatable, exclusive, reproducible system of questions that a variety of experts can ask of a sample and classify it in the same way. To be a taxonomy, you need exclusivity. You can’t be both a person and a penguin. Not all data fits neatly into taxonomies because of that exclusivity requirement. You can, for example, be both a Mac and Windows user. Thus, being a Mac or PC user isn’t a good taxonomic classification.
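The exclusivity requirement can be checked mechanically. The sketch below is only an illustration under my own assumptions: the samples, the "questions," and the function names are all hypothetical. A scheme passes only if every sample lands in exactly one category.

```python
def classify(sample, questions):
    """Return every category whose question the sample answers 'yes' to."""
    return [category for category, question in questions.items() if question(sample)]

def is_exclusive(samples, questions):
    """True only if each sample falls into exactly one category."""
    return all(len(classify(s, questions)) == 1 for s in samples)

# Hypothetical samples and questions, invented for illustration.
animals = [{"name": "Alice", "species": "person"},
           {"name": "Pingu", "species": "penguin"}]
species_questions = {
    "person": lambda s: s["species"] == "person",
    "penguin": lambda s: s["species"] == "penguin",
}

users = [{"name": "Bob", "uses": {"Mac"}},
         {"name": "Carol", "uses": {"Mac", "Windows"}}]
platform_questions = {
    "Mac user": lambda s: "Mac" in s["uses"],
    "Windows user": lambda s: "Windows" in s["uses"],
}

print(is_exclusive(animals, species_questions))  # True
print(is_exclusive(users, platform_questions))   # False: Carol is in both categories
```

A check like this only covers the samples you feed it, so it can show that a scheme is not a taxonomy but cannot prove that it is one.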

What’s the natural ordering of relations of emergent phenomena?

Oh, the title? It’s a mnemonic for the Linnaean taxonomy of life: kingdom, phylum, class, order, family, genus, species. And the photo is Drawers of Curiosities, by smalleyta.

What’s in a Name?

A rose by any other name might smell as sweet, but it would certainly be confusing to order online. Consistent naming is useful, but requires much effort to get right. In identity management, which I hadn’t thought of as closely related to taxonomies, Zooko has argued that names can be “secure, decentralized or human memorable (pick any two).” I think this applies to taxonomists as well. All of this was prompted by the February 11th Economist, which has two articles on taxonomy! The first is an article on naming consistency in biology, “Today we have naming of parts,” and the second is “Names for Sale”:

Last year, for example, America’s president, vice president and defence secretary each got a beetle (Agathidium bushi, A. cheneyi, A. rumsfeldi) courtesy of two Republican coleopterists. Admittedly, the beetles in question eat slime mould, which caused a few titters among taxonomists of a Democrat persuasion, but it is clearly an act of gross speciesism to criticise the dining habits of other organisms, so the titters were sotto voce. And it is not only politicians who are benefiting. Sting, a musician, has his own tree frog (Hyla stingi), and several spiders also bear the names of entertainers (Calponia harrisonfordi, Pachygnatha zappa) who clearly have taxonomists as fans.

Ironically, the last post I offered up on this subject was “A Profusion of Taxonomies,” after which, on that topic, the rest was silence.

“Portland 151” rose photo by Brian Lopez.

A Profusion of Taxonomies

In “In the Classification Kingdom, Only the Fittest Survive,” Carol Kaesuk Yoon writes about the profusion of naming schemes for animals:

Then there’s uBio, which has sidestepped the question of codes and regulations altogether and instead aims to record every single name ever used for any organism, scientific or common, correct or incorrect, down to the last variation and misspelling, as a way of linking all information ever recorded about an organism together.

The All Species Foundation aims not only to record all names but also to find every species and describe it, all in 25 years. And then there’s Wikispecies, Species 2000, the Electronic Catalogue of Names of Known Organisms and many more. Some have already come and gone, or nearly so, and others are expiring for lack of sustained funds.

So ZooBank finds itself born in the midst of a Cambrian explosion of initiatives, a proliferation not merely of Web sites and databases but of ideas about how to accomplish the task of naming and organizing all of life. And though disorder may be the most abhorrent thing to a tidy taxonomist, sometimes a little chaos can be healthy. [mmm, chaos!]

And I used to think this was simple. But as Clay Shirky has pointed out, vocabularies are most useful for a particular task, and different tasks, even in the same domain, may require slightly different “meta-data.” (That is, the information about the data in the taxonomy.)

I’ll note that uBio sounds a lot like the CVE, which is a computer vulnerability concordance (concordance at Wikipedia), even though not everyone agrees with that definition.

A few Typologies of Bloggers

First, a very brief bit of terminology: A typology is a way to organize things, much like a taxonomy. Each item within a typology has clearly distinguishing characteristics, but there’s no hierarchy such as animals, vertebrates, mammals, hominids, humans. To be honest, I’m not sure if this is a typology or just some categories. But “A few categories…” would be far less fun as a headline.

At BlogNashville, Rebecca MacKinnon discussed the concept of “bridge bloggers,” those bloggers who make an effort to blog about their country in a way that an outsider or foreigner can understand. It’s a great concept, but I’m having trouble finding a good link. Anyone? So much of what so many bloggers say is “inside baseball,” things that are hard for folks outside the club to understand (or even understand why you might bother to say them). This doesn’t just happen across national boundaries; it also takes place across organizational or professional lines. Milbloggers and peace bloggers often seem to be on different planets. No one takes the time to explain their orientation.

There are a few information security bridge bloggers: Steven Hofmeyer at nthWorld, the mysterious John at “Internet Security: Be Careful,” and Deb Radcliffe at “Security Chief.” Some people might say Bruce Schneier fits into the category; his last book was intended as a bridge, but his blog doesn’t always seem to fit.

In a closely related post, “An update from the Weblog Workshop” Ethan Zuckerman posts:

Shinsuke Nakajima from NAIST introduces three ways to think about key bloggers: topic-finders, agitators and summarizers. He talks most about the second two types and methods for detecting them. Summarizers, unsurprisingly, link to lots of people. Agitators can be found by looking for a drastic change in entries posted within a thread, or a drastic change in topic.

It’s not original, but still important to note that there’s a split between personal life bloggers (the “Livejournal crowd”) and issue bloggers. Many people maintain both.

And look, once again, it’s Technorati’s tag. Isn’t there a way to hide that?

My Categories Suck

The categories I’ve set for this blog are non-functional. I have 16 categories, of which maybe 4 are ever exclusive.

Do you look at my categorization of posts? Do you look at the category archives?
Should I create a new set of categories? If so, what? (mmm, Choicepoint! Not.) Should I abandon categories and go to tagging? If so, what Movable Type/MarsEdit add-on should I use?

A Few Ideas Connected by the Tag “Folksonomy”

Nude Cybot, in an email in which he promises to emerge soon, presumably to be exceptionally cold, mentions that folksonomies have hit Wired News. The Wired article points out that there are more “cat” (16,297) tagged images than “dog” (14,041) in Flickr. But the conclusion they draw from this, “If the photo-sharing site Flickr is any indication, the world of digital photographers is dominated by cat people” is very dependent on the search. Puppy (2145) beats kitten (1912). As I discuss in Economics of Taxonomies, the cost of easy classification can be difficulty in searching. Deciding which tags are close enough to kitten to be included in the count is subjective. (Flickr suggests “Related: cat, cats, cute” and that you “See also: kitty, animal, kittens, pet, animals, pets, black, sleeping, sleep, bw, white.”)
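A quick sketch shows how much the answer depends on which tags you decide to count. The four counts are the ones quoted above; the two groupings are my own hypothetical choices, and the sums assume no photo carries more than one of these tags.

```python
# Tag counts as quoted in the post above.
tag_counts = {"cat": 16297, "dog": 14041, "puppy": 2145, "kitten": 1912}

def total(tags):
    """Sum the counts for a chosen set of tags (assumes no photo has two of them)."""
    return sum(tag_counts[t] for t in tags)

# Two hypothetical ways of deciding which tags "count".
adults_only = {"cat people": ["cat"], "dog people": ["dog"]}
young_only = {"cat people": ["kitten"], "dog people": ["puppy"]}

for label, grouping in [("adults only", adults_only), ("young only", young_only)]:
    winner = max(grouping, key=lambda side: total(grouping[side]))
    print(label, "->", winner)
# adults only -> cat people
# young only -> dog people
```

Add in "kitty," "cats," or "pets" and the result shifts again, which is exactly the subjectivity problem.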

This relates closely to the idea of Keynes’ Beauty Contests, where your goal is not really to decide which is the most beautiful woman out of a set of photos published by the newspaper, but to select the one picked by the most other people. This might indicate that those skilled at groupthink will do well in a folksonomy-centric world.

A different way to state that, which would get far fewer nods, because the ideas are more rare, would be to say that those with different orientations may well be disadvantaged by their need to spend energy observing the mainstream, unless they use those analyses to guide their decisions and actions to take advantage of the orientation differences. In this way, those Microsofties with iPods could be doing their company a great service.

[Prior posts include “Folksonomies, Tested,” and “Economics of Taxonomies.”]

Folksonomies, Tested

I’ve just stumbled across this abstract comparing full-text searching to controlled vocabulary searching. The relevance to Clay’s posts on controlled vocabularies is that our intuitive belief that controlled vocabulary helps searching may be wrong. Unfortunately, the full paper is $30; perhaps someone with an academic library can comment.

…In this paper, we focus on an experiment in which different component indexing and retrieval methods were tested. The results are surprising. Earlier work had often shown that controlled vocabulary indexing and retrieval performed better than full-text indexing and retrieval…, but the differences in performance were often so small that some questioned whether those differences were worth the much greater cost of controlled vocabulary indexing and retrieval … In our experiment, we found that full-text indexing and retrieval of software components provided comparable precision but much better recall than controlled vocabulary indexing and retrieval of components. There are a number of explanations for this somewhat counter-intuitive result, including the nature of software artifacts, and the notion of relevance that was used in our experiment. We bring to the fore some fundamental questions related to reuse repositories.
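For readers who don’t live with these terms, here is a minimal sketch of the two measures the abstract compares. The component IDs and result sets are hypothetical, invented to show the shape of a “comparable precision, much better recall” outcome; they are not the paper’s data.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical component IDs, invented for illustration; not the paper's data.
relevant = {"c1", "c2", "c3", "c4"}
controlled_vocab_results = {"c1", "c9"}                    # 1 hit out of 2 retrieved
full_text_results = {"c1", "c2", "c3", "c7", "c8", "c9"}   # 3 hits out of 6 retrieved

print(precision_recall(controlled_vocab_results, relevant))  # (0.5, 0.25)
print(precision_recall(full_text_results, relevant))         # (0.5, 0.75)
```

In this toy case both methods return the same proportion of junk, but full-text search finds three times as many of the components you actually wanted.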