Shorrock’s Law of Limits

I recently saw an interesting new insight into the dynamics of over-optimization failures, stated by Steven Shorrock: “When you put a limit on a measure, if that measure relates to efficiency, the limit will be used as a target.” This seems to be a combination of several dynamics that can co-occur in at least a couple of ways, and despite my extensive earlier discussion of related issues, I think it’s worth laying out these dynamics along with a few examples to illustrate them.

When limits become targets

First, there is a general fact about constrained optimization: for certain classes of problems, the best solution necessarily involves hitting one of the limits. This is formalized in the theory behind Dantzig’s simplex method, where a linear objective over a convex feasible region always attains its optimum at an extreme point of that region. (Convexity is important, but we’ll get back to it later.)

When a regulator imposes a limit on a system, it’s usually because they see a problem with exceeding that limit. If the limit is a binding constraint — that is, if you limit something critical to the process, and require a lower level of the metric than is currently being produced — the best response is to hug the limit as closely as possible. If we limit how many hours a pilot can fly (the initial prompt for Shorrock’s law) or a trucker can drive, the best way to comply is to get as close to the limit as possible, which minimizes how much it impacts overall efficiency.
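The limit-hugging response can be sketched numerically. Below is a minimal, stylized illustration: a firm whose profit is linear and increasing in hours worked, facing a regulatory cap. All the numbers (rates, caps) are invented for illustration.

```python
# Stylized sketch: when each marginal hour is profitable and a regulatory cap
# binds, the profit-maximizing choice sits exactly at the cap.

def best_hours(revenue_per_hour, cost_per_hour, cap, step=0.5):
    """Brute-force the profit-maximizing hours subject to hours <= cap."""
    candidates = [i * step for i in range(int(cap / step) + 1)]
    profit = lambda h: (revenue_per_hour - cost_per_hour) * h
    return max(candidates, key=profit)

# As long as each hour is profitable, the chosen hours hug the limit...
print(best_hours(revenue_per_hour=60, cost_per_hour=35, cap=11))  # -> 11.0
# ...and tightening the cap just moves the optimum to the new limit.
print(best_hours(revenue_per_hour=60, cost_per_hour=35, cap=10))  # -> 10.0
```

The point of the sketch is that nothing about the optimization changes when the cap moves; the cap itself becomes the answer.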

There are often good reasons not to track a given metric: it may be unclear how to measure, or expensive to measure. A large part of the reason that companies don’t optimize for certain factors is that they aren’t tracked. What isn’t measured isn’t managed — but once there is a legal requirement to measure it, it’s much cheaper to start using that data for management. The companies now have something they must track, and once they are tracking hours, it would be wasteful not to also optimize for them.

Even when the limit is only sometimes reached in practice before the regulation is put in place, formalizing the metric and the limitation makes it more explicit — leading to reification of the metric. This isn’t only because of the newly required cost of tracking the metric; it’s also because what used to be a difficult-to-conceptualize factor like “tiredness” now has a newly available, albeit imperfect, metric.

Lastly, there is the motivation to cheat. Before fuel efficiency standards, there was no incentive for companies to explicitly target the metric. Once the limit was put into place, companies needed to pay attention — and paying attention to a specific feature means that decisions are made with this new factor in mind. The newly reified metric gets gamed, and suddenly there is a ton of money at stake. And sometimes the easiest way to perform better is to cheat.

So there are a lot of reasons that regulators should worry about creating targets, and ignoring second-order effects caused by these rules is naive at best. If we expect the benefits to just exceed the costs, we should adjust those expectations sharply downward, and if we haven’t given fairly concrete and explicit consideration to how the rule will be gamed, we should expect to be unpleasantly surprised. That doesn’t imply that metrics can’t improve things, and it doesn’t even imply that regulations aren’t often justifiable. But it does mean that the burden of proof for justifying new regulation needs to be higher than we might previously have assumed.


Goodhart’s law, Changing Wikipedia, and the Hawthorne Effect

I’ve been interested in Goodhart’s law for a long time, and in the past couple years even wrote a few articles about it. So I’ve left a column on Tweetdeck running with a search for Goodhart’s law, to see how it is used and discussed.

If you’re not familiar, the popular paraphrase of Goodhart’s law is “When a measure becomes a target, it ceases to be a good measure.” This quote bothered me for a long time, since it is a significant generalization of Goodhart’s original phrasing, “any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” This was confusing until I saw a tweet saying that the popular paraphrase is known as “Strathern’s Variation,” and I found that others had noted the same thing. This prompted me to investigate.

Digging through the Wikipedia edit history, I found a reference to Strathern that had since been edited out, citing a 2007 publication, “Wireless Communications: The Future.” This wasn’t available online, and I was fairly sure I had seen the quote before then anyway. Digging, I found the original source: a 1997 paper by Strathern. So on August 4th, I edited Wikipedia to include the fact that the frequently quoted paraphrase of Goodhart’s law is actually hers, and added a link.

From August 1st-4th, I count 14 mentions of “Goodhart’s Law” on Twitter. That’s probably par for the course; it gets mentioned around 100 times a month. But before August, I can find only one mention of Strathern in connection with the quote this year — the one prompting my investigation — as opposed to 3 in the month ending September 2nd, and another several dozen in the week since, due to bots retweeting a Techcrunch article that leads with the quote. This isn’t yet statistically significant, but it’s an interesting impact to notice.
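One rough way to back up the “not yet statistically significant” claim: conditional on the total number of mentions across two equal-length periods, the split is Binomial(n, 1/2) under a constant mention rate, so an exact binomial test applies. A sketch using the counts above (1 mention the month before the edit, 3 the month after):

```python
from math import comb

def two_sided_binomial_p(k, n, p=0.5):
    """Exact two-sided binomial test: probability of an outcome
    at least as extreme (as unlikely) as observing k out of n."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    return sum(prob for prob in pmf if prob <= pmf[k] + 1e-12)

# 1 Strathern mention before the edit vs. 3 after: conditional on the
# 4 total mentions, is a 1-vs-3 split surprising under a constant rate?
print(round(two_sided_binomial_p(1, 4), 3))  # -> 0.625, nowhere near significance
```

With counts this small, even a 3:1 ratio is entirely consistent with chance, which is exactly the caveat in the text.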

The problem with writing this article, then, is that it brings further attention to the issue — and that highlights the difference between Goodhart’s law and the Hawthorne Effect, an earlier and simpler observation that the act of paying attention to something changes it. The appearance of the article potentially warps how well my measure represents the effect of the original edit, but it’s not placing any pressure on the measure.

Chasing Superior Good Syndrome vs. Baumol’s (or Scott’s) Cost Disease

Slatestarcodex had an excellent (as always) piece on “Considerations on Cost Disease.” It goes over a number of reasons, aside from Baumol’s cost disease, why everything in certain sectors, namely healthcare and education, has gotten much more expensive. I think it misses an important dynamic, though, that I’d like to lay out.

First, though, he has a list of eight potential answers, each of which he partly dismisses. Cost increases are really happening, and markets mostly work, so it’s not simply a market failure. Government inefficiency and overregulation don’t really explain large parts of the problem, nor does fear of lawsuits. Risk tolerance has decreased, but that seems not to have been the sole issue. Cost shirking by some people might increase costs a bit, but that isn’t the whole picture. Finally, not on that list but implicitly explored when Scott refers to “politics,” is Moloch.

I think it’s a bit strange to end a piece with a long list of partial answers, which plausibly explain the vast majority of the issue, with “What’s happening? I don’t know and I find it really scary.” But I think there is another dynamic that’s being ignored — and I would be surprised if an economist ignored it, but I’ll blame Scott’s eclectic ad-hoc education for why he doesn’t discuss the elephant in the room — superior goods.

Superior Goods

For those who don’t remember their Economics classes, imagine a guy who makes $40,000/year and eats chicken for dinner 3 nights a week. He gets a huge 50% raise, to $60,000/year, and suddenly has extra money to spend — his disposable income probably tripled or quadrupled. Before the hedonic treadmill kicks in, and he decides to waste all the money on higher rent and nicer cars, he changes his diet. But he won’t start eating chicken 10 times a week — he’ll start eating steak. When people get more money, they replace cheap “inferior” goods with expensive “superior” goods. And steak is a superior good.
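In textbook terms, the distinction is income elasticity of demand: negative for inferior goods, positive (and above one for luxuries) for superior goods. Here is a minimal sketch using the example’s income figures; the dinner quantities are invented for illustration.

```python
def income_elasticity(q0, q1, y0, y1):
    """Arc income elasticity of demand: %change in quantity divided by
    %change in income, using the midpoint formula for both."""
    dq = (q1 - q0) / ((q1 + q0) / 2)
    dy = (y1 - y0) / ((y1 + y0) / 2)
    return dq / dy

# Income rises from $40k to $60k; chicken dinners fall 3 -> 1 per week,
# steak dinners rise 0.5 -> 2 per week (quantities invented).
print(income_elasticity(3, 1, 40_000, 60_000))    # -> -2.5: negative, so inferior
print(income_elasticity(0.5, 2, 40_000, 60_000))  # -> 3.0: above 1, so superior
```

The sign and magnitude are the whole story: when incomes rise broadly, spending shifts predictably away from the negative-elasticity goods and toward the high-elasticity ones.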

But how many times a week will people eat steak? Two? Five? Americans as a whole got really rich in the 1940s and 1950s, and needed someplace to start spending their newfound wealth. What do people spend extra money on? Entertainment is now pretty cheap, and there are only so many nights a week you see a movie, and only so many $20/month MMORPGs you’re going to pay for. You aren’t going to pay 5 times as much for a slightly better video game or movie — and although you might pay double for 3D-Imax, there’s not much room for growth in that 5%.

The Atlantic had a piece on this several years ago, with the following chart;

Food, including rising steak consumption, decreased to a negligible part of people’s budgets, while housing started rising. In this chart, the reason healthcare hasn’t shot up to the extent Scott discussed is, as the article notes, that most of the cost is paid via pre-tax employer spending. The other big change the article discusses is that after 1950 or so, everyone got cars and commuted from their more expensive suburban houses — which is effectively an implicit increase in housing cost.

And at some point, bigger houses and nicer cars begin to saturate; a Tesla is nicer than my Hyundai, and I’d love one, but not enough to upgrade for 3x the cost. I know how much better a Tesla is — I’ve seen them.

Limitless Demand, Invisible Supply

There are only a few things for which we have limitless demand but very limited ability to judge the impact of our spending. What are they?

I think this is one big missing piece of the puzzle; in both healthcare and education, we want improvements, and they are worth a ton, but we can’t figure out how much the marginal spending improves things. So we pour money into these sectors.

Scott thinks this means that teachers’ and doctors’ wages should rise, but they don’t. I think it’s obvious why: the supply isn’t very limited. And the marginal impact of two teachers versus one, or a team of doctors versus one, isn’t huge. (Class size matters, but we have tons of teachers — with no shortage in sight, there is no price pressure.)

What sucks up the increased money? Dollars, both public and private, chasing hard to find benefits.

I’d spend money to improve my health, both mental and physical, but how? Extra medical diagnostics to catch problems, pricier but marginally more effective drugs, chiropractors, probably useless supplements — all are exploding in popularity. How much do they improve health? I don’t really know — not much, but I’d probably try something if it might be useful.

I’m spending a ton of money on preschool for my kids. Why? Because it helps, according to the studies. How much better is the $15,000/year daycare versus the $8,000 a year program a friend of mine runs in her house? Unclear, but I’m certainly not the only one spending big bucks. Why spend less, if education is the most superior good around?

How much better is Harvard than a subsidized in-state school, or four years of that school versus two years of cheap community college before transferring in? The studies seem to suggest that most of the benefit is really selection: the kids who get into the better schools were already going to do well. And Scott knows that this is happening.

We pour money into schools and medicine in order to improve things, but where does the money go? Into efforts to improve things, of course. But I’ve argued at length before that bureaucracy is bad at incentivizing things, especially when goals are unclear. So the money goes to sinkholes like more bureaucrats and clever manipulation of the metrics that are used to allocate the money.

As long as we’re incentivized to improve things that we’re unsure how to improve, the incentives to pour money into them unwisely will continue, and costs will rise. That’s not the entire answer, but it’s a central dynamic that leads to many of the things Scott is talking about — so hopefully that reduces Scott’s fears a bit.

Deceptive Dataviz and Confusing Data; Uncomparables in Education

I don’t actually want to talk about dataviz here; I want to talk about the data that is visualized. I routinely see graphs that are not (necessarily) bad as graphs, but that present data which should never have been put on a graph together. There are plenty of examples of unreasonably constrained axes, or simply incorrect bar heights — but that’s not the problem for today.

Today, I want to give an example of data that is displayed as if the information is comparable, when it isn’t – like dollars and scores, or percentage improvement versus totals. What do I mean? I have a great example!

This graph is a masterpiece of the errors I am talking about. And it seems the very recently deceased Dr. Coulson is being maligned by a wiki article on Cato attributing this graph to him. (At the very least, the original seems to have kept dollars and percentages separate.) This graph tries hard to make incomparable data comparable, by displaying the percentage change of a variety of incomparable datasets — which is better than just showing the raw, incomparable data, right?

Well, no. At least not here. But why are they incomparable?

First, we have NAEP scores, which are inconsistently measured over the time period; the meaning of the metric changed repeatedly over the time period displayed, as academic standards have been altered to reflect the changing abilities and needs of students.

They are also scores, and as I’m sure everyone is aware, the difference between a 1300 and a 1400 on the SAT is much smaller than the difference between a 1500 and a 1600. Percentage improvements on these tests are not a great comparison. NAEP scores are also range-bound; they fall in the range 0–500, so doubling the math score is not only nonlinear but, in most cases, literally impossible; it’s already around 300.
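The range-bound problem is simple arithmetic. Assuming the 0–500 scale and a current score around 300 (the figures mentioned above), there is a hard ceiling on the percent improvement that is even possible, while an unbounded series like dollars has no such ceiling:

```python
SCALE_MAX = 500  # NAEP scores are bounded above

def max_percent_gain(score):
    """Largest percent improvement arithmetically possible on the bounded scale."""
    return (SCALE_MAX - score) / score * 100

print(round(max_percent_gain(300), 1))  # -> 66.7: a 100% gain (doubling) is impossible
# Spending, by contrast, is unbounded: a 150% rise is perfectly possible, so
# plotting both percent changes on one axis guarantees a flat-looking score line.
```

Put differently, the chart’s visual contrast between flat scores and soaring spending is partly baked into the units before any data is plotted.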

Next, the basis for all of these numbers is non-constant, in an interesting way. The chart presents enrollment as a total, but ignores the changing demographic mix — and no, this isn’t about the soft bigotry of low expectations, it’s about the expanding school population. Expanding? Yes — because enrollment is roughly constant while the school-age population is shrinking, so the share of the population attending school is growing. (Chart by Bill McBride)

The 1970s were the height of the baby boom — and the percentage of people who were going to school was still on an upwards trend;

The totals were flat, but the demographic split wasn’t, and the percentage of low achievers, who are the least likely to attend, was increasing. And the demographic composition of schools matters. But I won’t get into divergent birth rates and similar demographic issues any further for now.
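The arithmetic behind “flat enrollment but an expanding school population” is worth making explicit: if the number enrolled holds steady while the school-age population shrinks, the attendance rate rises, pulling in students who previously would not have attended. A toy sketch (both population figures invented):

```python
def attendance_rate(enrolled, school_age_population):
    """Fraction of the school-age population actually enrolled."""
    return enrolled / school_age_population

# Enrollment flat at 45M while the school-age population shrinks (invented numbers):
print(round(attendance_rate(45e6, 52e6), 2))  # earlier cohort -> 0.87
print(round(attendance_rate(45e6, 47e6), 2))  # later cohort   -> 0.96: a larger share attends
```

A constant numerator over a shrinking denominator means the marginal student changes, which is exactly why the demographic mix of test-takers shifts even when the headline enrollment total doesn’t move.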

But what about cost? I mean, clearly that can’t be deceptive — we’re spending more because we keep hiring more teachers, like the chart seems to show! But we aren’t — teachers only increased by 50% in that time, not nearly 100%. But the chart isn’t wrong — they’re hiring more staff (largely to deal with regulations, as I’m sure Cato would agree).

And this also explains why total cost went up — we have way more non-teacher staff, many of whom are much more expensive. We’re also neglecting the fact that the country is richer: as a share of GDP, teacher pay has fallen behind, because we pay teachers the same amount while the economy as a whole grew. But that’s a different issue.

So yes, we can show a bunch of numbers correctly on a chart, but it won’t mean what it looks like if we’re sloppy — or purposefully misleading.