Confidence and trust when presentations fail

We're never surprised when an inexperienced coworker puts up a chart that doesn't make any sense or fails to prove their point.  But we expect better from scientists.  And when a scientist puts up a bad graph in a professionally produced brochure, while trying to sell data to other scientists ... well, it's a cautionary tale.  When this happens, should you merely lose confidence in the presenter or actively distrust them?  Let me go through a few of the rules of data visualization with the help of a counterexample from the International Centre for Diffraction Data.
[Chart from the ICDD 2011 sales catalog: http://www.icdd.com/products/2011SalesCatalog.pdf]
The ICDD sells a database called the Powder Diffraction File, which is used by crystallographers.  Here, they've plotted the size of the database (which they call its "value") on the left Y axis and the price on the right Y axis.  Time is on the X axis, so you're looking at history from left to right.  The first problem here (common to all double-Y graphs) is that it takes effort to sort out which line corresponds to the numbers on which side.  The blue line corresponds to the numbers on the left Y axis, but you can only learn that by finding the word "Entries" in the legend and then finding the same word along the left Y axis.  They should have attached an arrow to the blue line, pointing leftwards.
The second, and most glaring, problem is that the Y axes don't start at zero.  Look, folks: whenever you make a bar chart or a scatter plot or anything where vertical position implies "more", the bottom should be "zero" unless you have a damn good reason for it not to be.  I don't care what Excel's defaults are; a truncated axis doesn't communicate effectively, and it can be actively misleading.  In this case, zero entries is no database, and zero price is free, which is exactly what the ICDD is competing against.  As it is, I can't really tell how much the database costs today, except that it's somewhere between $3000 and $8000.

Probably.  The 2011 list price could actually be more than $8000, because "$8000" is the second of eight labels on the right Y axis, whereas the red line runs below the second of only seven horizontal gridlines across the graph.  That's the third problem: the gridlines don't line up with the labels.  AAAARRRGGH.
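To see how far off a reading can be, here is a small Python sketch. It assumes the right axis runs from $3000 at the bottom to $38000 at the top with its eight labels evenly spaced (which puts "$8000" on the second label), and, as a further assumption the chart itself doesn't settle, that the seven gridlines are also evenly spaced from bottom to top. The function names are hypothetical helpers for illustration.

```python
AXIS_BOTTOM, AXIS_TOP = 3000.0, 38000.0  # right-axis range described in the post
N_LABELS = 8      # price labels, assumed evenly spaced from bottom to top
N_GRIDLINES = 7   # horizontal gridlines, ASSUMED evenly spaced from bottom to top


def value_at_height(fraction):
    """Price at a given fraction of the plot height (0.0 = bottom, 1.0 = top)."""
    return AXIS_BOTTOM + fraction * (AXIS_TOP - AXIS_BOTTOM)


def label_value(k):
    """Price printed on the k-th label, counting from 1 at the bottom."""
    return value_at_height((k - 1) / (N_LABELS - 1))


def gridline_value(k):
    """Price that actually lies under the k-th gridline, counting from 1 at the bottom."""
    return value_at_height((k - 1) / (N_GRIDLINES - 1))


for k in (1, 2, 3):
    print(k, round(label_value(k)), round(gridline_value(k)))
# 1 3000 3000
# 2 8000 8833
# 3 13000 14667
```

Under those assumptions, a line sitting just under the second gridline could be anywhere up to roughly $8800, not under $8000.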

Fourth, I have to ask why the right Y axis goes up to a price of $38000 when they don't sell anything worth more than $8000.  (Probably.)  Should they have started the right Y axis at zero and topped it out at $10000?  Probably. 
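Both fixes are easy to compute. The Python sketch below uses made-up prices, not figures read off the ICDD chart. `apparent_ratio` shows how a $3000 baseline makes a 1.5x price increase get drawn as a 3x one, and `zero_based_axis` builds an axis that starts at zero and ends on a round number using a common 1-2-5 rounding rule. The helper names are hypothetical, and the rounding rule is one heuristic among several, not what Excel or the ICDD actually does.

```python
import math


def apparent_ratio(old, new, baseline=0.0):
    """How many times taller `new` is drawn than `old` when the axis starts at `baseline`."""
    return (new - baseline) / (old - baseline)


def nice_ceiling(x):
    """Round a positive number up to 1, 2 or 5 times a power of ten."""
    if x <= 0:
        raise ValueError("x must be positive")
    scale = 10 ** math.floor(math.log10(x))
    for step in (1, 2, 5):
        if x <= step * scale:
            return step * scale
    return 10 * scale  # i.e. the next power of ten


def zero_based_axis(values):
    """(bottom, top) for an axis that starts at zero and ends on a round number."""
    return 0, nice_ceiling(max(values))


# Hypothetical list prices, for illustration only.
prices = [4000, 5200, 6100, 7800]

print(apparent_ratio(4000, 6000))        # 1.5: the real ratio
print(apparent_ratio(4000, 6000, 3000))  # 3.0: what a $3000 baseline draws
print(zero_based_axis(prices))           # (0, 10000)
```

With a highest price just under $8000, this gives an axis from $0 to $10000, the same range asked about above.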

The truth is, they plotted the number of "entries" and got a nice-looking upward slope.  Then they plotted the "price" and got two noisy, up-and-down but vaguely increasing swaths across the middle of the plot.  So they jacked up the maximum value of the "price" axis until the list price of the first data point, 1987, sat on top of the number of entries for that year.  Hence the seemingly random top price of $38000.  For the minimum on the price axis, they either left Excel's default nonzero value or actually changed it from zero to $3000 to make their prices look smaller.  I can understand that they're trying to imply that as time has gone on, users have gotten proportionally more value (entries) for a given price.  But to make a proportionality argument, you have to put a straight line through zero ... and zero is off the bottom of this plot.  On a double-Y chart, the two lines sitting at the same height only means a fixed number of entries per dollar if both axes start at zero.
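The alignment trick, and why it can't carry a proportionality claim, can be checked numerically. In this Python sketch only the $3000 and $38000 price-axis limits come from the chart; the entries axis (0 to 200,000) and the "1987" figures are invented, since the post gives no entry counts. `aligning_max` (a hypothetical helper) solves for the price-axis top that puts one year's price on top of that year's entries, and `entries_per_dollar_if_aligned` reports the entries-per-dollar ratio that two touching lines actually imply.

```python
def height(value, lo, hi):
    """Vertical position of `value` as a fraction of plot height on an axis from lo to hi."""
    return (value - lo) / (hi - lo)


def aligning_max(entries0, entries_axis, price0, price_lo):
    """Price-axis top that puts price0 exactly on top of entries0.

    Solves (price0 - price_lo) / (price_top - price_lo) == height(entries0, *entries_axis)
    for price_top.
    """
    h = height(entries0, *entries_axis)
    return price_lo + (price0 - price_lo) / h


def entries_per_dollar_if_aligned(price, entries_axis, price_axis):
    """Entries per dollar implied when the entries line touches the price line at `price`."""
    h = height(price, *price_axis)
    lo, hi = entries_axis
    return (lo + h * (hi - lo)) / price


# Invented numbers for illustration; not the ICDD's data.
entries_axis = (0, 200_000)
print(round(aligning_max(20_000, entries_axis, 6000, 3000)))  # 33000

truncated = (3000, 38000)  # price axis with a $3000 floor, as on the chart
zeroed = (0, 38000)        # same top, but starting at zero
for price in (4000, 7000):
    print(price,
          round(entries_per_dollar_if_aligned(price, entries_axis, truncated), 2),
          round(entries_per_dollar_if_aligned(price, entries_axis, zeroed), 2))
# 4000 1.43 5.26
# 7000 3.27 5.26
```

With the $3000 floor, two places where the lines touch can hide entries-per-dollar ratios that differ by more than a factor of two. With both axes starting at zero, touching lines always mean the same ratio, which is the straight line through zero that a proportionality argument needs.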

Putting yourself in your audience's shoes and presenting information well is a learned skill.  Failing to do so makes you look like an amateur, which is forgivable but doesn't build your audience's confidence in you.  It's worse to fail to be honest about how well the actual numbers support your interpretation of them.  That manipulation erodes your audience's trust in you, which is almost always a more valuable asset than winning a particular argument.
