I don’t actually want to talk about dataviz here; I want to talk about the data that is visualized. I routinely see graphs that are not (necessarily) bad or misleading as graphs, but that present data which should never have been put on one graph together. There are plenty of examples of unreasonably constrained axes, or simply incorrect bar heights, but that’s not the problem for today.
Today, I want to give an example of data that is displayed as if the information is comparable, when it isn’t – like dollars and scores, or percentage improvement versus totals. What do I mean? I have a great example!
This graph is a masterpiece of the errors I am talking about. And it seems the very recently deceased Dr. Coulson is being maligned by a wiki article on Cato attributing this graph to him. (At the very least, the original seems to have kept dollars and percentages separate.) This graph tries hard to make incomparable data comparable, by displaying percentage change of a variety of incomparable datasets — which is better than showing the comparable raw data, right?
Well, no. At least not here. But why are they incomparable?
First, we have NAEP scores, which are measured inconsistently: the meaning of the metric changed repeatedly over the period displayed, as academic standards were altered to reflect the changing abilities and needs of students.
They are also scores, and as I’m sure everyone is aware, the difference between a 1300 and a 1400 on the SAT is much smaller than the difference between a 1500 and a 1600. Percentage improvements on these tests are not a great comparison. They are also range-bound numbers: the scale runs from 0 to 500, so the math scores not only fail to scale linearly, but in most cases literally cannot double, because they are already around 300.
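To make the range-bound point concrete, here is a minimal sketch. The 0 to 500 scale and the score of roughly 300 come from the paragraph above; the helper name `max_percent_gain` is made up for illustration.

```python
def max_percent_gain(current: float, ceiling: float) -> float:
    """Largest percentage increase possible before hitting the scale ceiling."""
    if current <= 0:
        raise ValueError("current must be positive")
    return (ceiling - current) / current * 100.0

# Score on a 0-500 scale, currently around 300 (figures from the text above).
score_cap = max_percent_gain(300, 500)
print(f"Maximum possible score gain: {score_cap:.1f}%")

# Doubling would need a score of 600, which is off the scale entirely.
doubling_possible = 2 * 300 <= 500
print("Doubling possible?", doubling_possible)
```

A dollar figure has no such ceiling, so putting a "percent change in spending" line next to a "percent change in score" line compares something that can grow without limit to something that cannot rise by more than about two thirds.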
Next, the basis for all of these numbers is non-constant, in an interesting way. The chart presents enrollment as a total, but ignores the changing demographic mix. And no, this isn’t about the soft bigotry of low expectations; it’s about the expanding school population. Expanding? Yes, because the number enrolled is constant while the total school-age population is shrinking, so a growing share of it is in school. (Chart by Bill McBride)
The 1970s were the height of the baby boom, and the percentage of people who were going to school was still on an upward trend.
The totals were flat, but the demographic split wasn’t: the percentage of low achievers, who are the least likely to attend, was increasing. And the demographic composition of schools matters. But I won’t get into divergent birth rates and similar demographic issues any further for now.
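Here is a quick sketch of why composition matters. The numbers are entirely made up for illustration (they are not NAEP or enrollment data), and the group names are placeholders. Total enrollment stays at 100 in both periods, every group's average score goes up, and the overall average still goes down, because the mix shifted.

```python
def weighted_mean(groups):
    """groups: iterable of (enrollment, mean_score) pairs."""
    groups = list(groups)
    total = sum(n for n, _ in groups)
    return sum(n * s for n, s in groups) / total

# Hypothetical numbers, NOT real data: (enrollment, mean score) per group.
early = {"group_a": (80, 310), "group_b": (20, 270)}
late = {"group_a": (60, 316), "group_b": (40, 280)}

early_avg = weighted_mean(early.values())
late_avg = weighted_mean(late.values())

for name in early:
    print(f"{name}: {early[name][1]} -> {late[name][1]}")
print(f"overall: {early_avg:.1f} -> {late_avg:.1f}")
```

A flat "overall score" line can therefore hide improvement inside every group, which is why a chart built on totals says little without the mix behind it.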
But what about cost? I mean, clearly that can’t be deceptive: we’re spending more, because we keep hiring more teachers, like the chart seems to show! But we aren’t. Teachers only increased by 50% in that time, not nearly 100%. The chart isn’t wrong, though: they’re hiring more staff (largely to deal with regulations, as I’m sure Cato would agree).
And this also explains why total cost went up — we have way more non-teacher staff, many of whom are much more expensive. We’re also neglecting the fact that the country is richer, and as a share of GDP, we’re way behind, because we pay teachers the same amount, but the economy as a whole grew. But that’s a different issue.
So yes, we can show a bunch of numbers correctly on a chart, but it won’t mean what it looks like if we’re sloppy — or purposefully misleading.