I knew that when I left the world of academia and became an industrial scientist, my relationship with the rest of my field would change. I knew I wouldn't be writing so many papers, but I thought I'd still have access to what others had written. This week I had an experience that showed me how hard it really is for the general public to get access to science as it happens, and what a privilege it is to be a professor.
I needed some data about what results to expect from an experiment a coworker had done. When I was in school or worked at the national laboratories, I could look that information up in a database at my library. (I'm referring to the database that I discussed in a previous post.) The database costs thousands of dollars a year, but that's nothing for a national laboratory employing 900 PhDs. The University of Michigan had a subscription, but I was shocked to discover that Case Western Reserve University does not. They have a pretty good materials science graduate program there, so I expected them to carry it. My company pays for a third-party service to search intellectual property databases and the technical literature, but they didn't have it either. It looks like in order to get the data that I need to make a comparison between our experiments and a model, I will have to drive to Michigan and visit the library in Ann Arbor. Or ask a favor of a colleague.
A similar thing happens with what I've called the technical literature. That's all the peer-reviewed articles that have been used for hundreds of years to present new scientific results in journals. As an industrial scientist, I depend on keeping abreast of new developments in my field, and looking up solutions to old problems. In academia, that was trivial: universities maintain digital subscriptions to the major journals, so I could just download articles from my desk. At my current job, our third-party service can alert me to new articles I should read, but they won't retrieve them for me. I have to go to a library that has a subscription to that journal and photocopy it. If no local library has it, I have to pay the publisher $30-45 for a copy.
Why? Databases and journals cost money. The five or so major scientific publishers have been raising subscription prices at a rate that far outstrips inflation - something like the rapidly increasing price of an undergraduate education. They can get away with it because it is very close to a single-payer system. Very few private libraries subscribe to these journals; the places that do are major corporate research labs like IBM's Thomas J. Watson Research Center. The entire national laboratory system's subscriptions are paid for by the US Government, and ultimately most public and private universities' subscriptions are too, through research grants. The publishers ask for more money and the government pays, and the government doesn't try to do too much about it because of the publishers' lobbying power. Who loses? The general public, and the thousands of less-than-immense corporate research labs, like mine, scattered throughout the US.
I came away from this experience with a strong sense of why technology companies are concentrated in cities with major research universities. It's the library. There are some scientific problems that just can't be solved without one.
Hardwired for abstractions
Recently I was part of a discussion about whether a particular musician's work was art or schlock. Someone commented that those who call it schlock are members of the same class of self-appointed arbiters of taste that pollute every artistic field.
And I thought, yes, the leaders in every field of human endeavor, not just art, maintain their lead by constantly creating new abstractions, forcing others to learn them. Trading coconuts for bananas became gold currency which became paper currency which became credit which became junk bonds which became packaged mortgages.
And that's what I suspect differentiates genetically modern humans from other species: we are hardwired for abstractions. A successful quest for food or sex leads us to continue pursuing the winning strategy. In apes, it ends there. But we humans are saddled with self-awareness, which forces us to create a worldview. Every success or failure has to be integrated into that worldview with an abstraction, or else it makes us feel anxious that we don't know how the world works. So: I killed the deer because I left a heap of apples for it to eat. I got laid because I used that cologne. These explanations give us a handle on what to do next. They might be utterly wrong, but they dispel the angst of believing we're ignorant.
This post was an abstraction. But was it art or schlock?
Confidence and trust when presentations fail
We're never surprised when an inexperienced coworker puts up a chart that doesn't make any sense or fails to prove their point. But we expect better from scientists. And when a scientist puts up a bad graph in a professionally produced brochure, while trying to sell data to other scientists ... well, it's a cautionary tale. When this happens, should you merely lose confidence in the presenter or actively distrust them? Let me go through a few of the rules of data visualization with the help of a counterexample from the International Centre for Diffraction Data.
The ICDD sells a database called the Powder Diffraction File which is used by crystallographers. Here, they've plotted the size of the database (which they call its "value") on the left Y axis, and the price on the right Y axis. Time is on the X axis, so you're looking at history from left to right. The first problem here (common to all double-Y graphs) is that it takes effort to sort out which line corresponds to the numbers on which side. The blue line corresponds to the numbers on the left Y axis, but you can only learn that by finding the word "Entries" in the legend and then finding the same word along the left Y axis. They should have attached an arrow to the blue line, pointing leftwards.
The second, and most glaring, problem is that the Y axes don't start at zero. Look folks, whenever you make a bar chart or a scatter plot or anything where vertical position implies "more", the bottom should be "zero" unless you have a damn good reason for it not to be. I don't care what Excel's defaults are: a truncated axis doesn't communicate effectively and it can be actively misleading. In this case, zero entries is no database, and zero price is free, which is exactly what the ICDD is competing against. As it is, I can't really tell how much the database costs today, except that it's somewhere between $3000 and $8000.
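One way to avoid inheriting a spreadsheet's truncated axis is to compute the limits yourself. What follows is a minimal sketch in Python, not anything that Excel or the ICDD provides: `nice_ceiling` and `zero_based_limits` are hypothetical helper names, and rounding the top up to 1, 2, 5 or 10 times a power of ten is just one common convention.

```python
import math


def nice_ceiling(value):
    """Round a positive number up to 1, 2, 5 or 10 times a power of ten."""
    if value <= 0:
        raise ValueError("value must be positive")
    exponent = math.floor(math.log10(value))
    fraction = value / 10 ** exponent
    for step in (1, 2, 5, 10):
        if fraction <= step:
            return step * 10 ** exponent
    return 10 ** (exponent + 1)  # only reached through floating-point rounding


def zero_based_limits(values):
    """Return (bottom, top) axis limits whose bottom is always zero."""
    return 0, nice_ceiling(max(values))
```

For prices between $3000 and $8000, `zero_based_limits([3000, 8000])` returns limits of 0 and 10000.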
Probably. That's the third problem: the gridlines don't line up with the axis labels. The 2011 list price could actually be more than $8000, because "$8000" is the second of eight labels on the right Y axis whereas the red line treads below the second of only seven horizontal lines across the graph. AAAARRRGGH.
Fourth, I have to ask why the right Y axis goes up to a price of $38000 when they don't sell anything worth more than $8000. (Probably.) Should they have started the right Y axis at zero and topped it out at $10000? Probably.
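To get a feel for how much the axis range matters, here is a rough sketch in Python that computes how far up the plot a value lands for a given pair of axis limits. The name `axis_fraction` is made up for illustration, and the $3000 and $38000 limits are the approximate values read off the chart in this post, not figures published by the ICDD.

```python
def axis_fraction(value, axis_min, axis_max):
    """Fraction of the axis height at which `value` is drawn (0 = bottom, 1 = top)."""
    return (value - axis_min) / (axis_max - axis_min)


# Price axis roughly as drawn in the brochure: $3000 at the bottom, $38000 at the top.
as_drawn = axis_fraction(8000, 3000, 38000)

# Price axis running from zero to $10000 instead.
zero_based = axis_fraction(8000, 0, 10000)

print(f"as drawn: {as_drawn:.2f}, zero-based: {zero_based:.2f}")
```

An $8000 price sits about 14% of the way up the axis as drawn, but 80% of the way up an axis that runs from zero to $10000.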
The truth is, they plotted the number of "entries" and got a nice-looking upwards slope. Then they plotted the "price" and got two noisy, up-and-down but vaguely increasing swaths across the middle of the plot. So they jacked up the maximum value of the "price" axis until the list price of the first data point, 1987, sat on top of the number of entries for that year. Hence the seemingly random top price of $38000. For the minimum on the price axis, they either left Excel's default nonzero value, or they actually changed it from zero to $3000 to make their prices look smaller. I can understand that they're trying to imply that as time has gone on, users have actually gotten proportionally more value (entries) for a given price. But in order to make a proportionality argument, you have to put a straight line through zero ... and zero is off the bottom of this plot.
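The proportionality claim can also be checked without any plot at all: if users really get proportionally more for their money, the number of entries per dollar should rise from year to year. A minimal sketch in Python follows. The function names and the three data points are invented placeholders for illustration, not the ICDD's actual figures.

```python
def entries_per_dollar(entries, prices):
    """Ratio of database size to list price for each year."""
    return [e / p for e, p in zip(entries, prices)]


def is_strictly_increasing(values):
    """True if each value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Hypothetical placeholder data, NOT read from the ICDD chart.
years = [1990, 2000, 2010]
entries = [50_000, 120_000, 300_000]
prices = [4_000, 6_000, 8_000]

ratios = entries_per_dollar(entries, prices)
for year, ratio in zip(years, ratios):
    print(year, round(ratio, 1))
print("more entries per dollar every year:", is_strictly_increasing(ratios))
```

On a plot of entries against price, each of these ratios is the slope of the straight line from the origin to that year's point, which is why the argument only holds up on axes that include zero.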
Putting yourself in your audience's shoes and presenting information well is a learned skill. Failing to do so makes you look like an amateur, which is forgivable but doesn't build your audience's confidence in you. It's worse to fail to be honest about how well the actual numbers support your interpretation of them. That manipulation erodes your audience's trust in you, which is almost always a more valuable asset than winning a particular argument.
[Figure from http://www.icdd.com/products/2011SalesCatalog.pdf]