It’s time for some game theory, about game theory, about game theory.

“Guys,” he writes. “It’s time for some game theory.” Game theory, for the uninitiated, is a branch of mathematics that uses computational models to predict the behavior of human beings in potentially conflictual situations. It’s complex, involves a lot of formal logic and algebra, and is mostly useless. Game theory models human actions on the presumption that everyone is constantly trying to maximize their potential gain against everyone around them; this is why its most famous example concerns prisoners — isolated people, cut off from all the noncompetitive ties that constitute society.

I agree with Sam Kriss about a few points he made in his article on Garland’s now famous/infamous thread. I take issue with his unjustified attack on John Nash, but I don’t blame him for his ignorance. Not many of us know game theorists — though I happen to have a nice one sitting down the hall from me. But the attack on game theory itself seems silly; it’s the basis for a ton of microeconomics and decision theory that was written in the past 50 years, and may have prevented nuclear war. Given what he said, then, Sam Kriss is a bit of an idiot — but unlike Garland, at least he’s a game theoretically optimal one.

Game theory is about describing and understanding the interaction of multiple parties that act at least somewhat rationally — and while Kriss’s straw man isn’t entirely wrong, it’s certainly not right. It’s not always complex, doesn’t always require algebra, and has essentially nothing to do with formal logic — another field I assume Kriss knows nothing about.

An Example

A journalist wants to sound intelligent to their audience, but knows little about most subjects. They have several choices: learn enough to actually be educated on a subject, learn just enough to sound educated to a lay-person, fake it by using technical terms they don’t really understand, or avoid the technical content entirely and sound dumb.

This has different effects on different people. Journalists pay a cost to get it right, and pay a cost for sounding dumb. The important difference is that in some cases — being only partly educated, or faking it — there is a consequence for getting caught or called out, and a benefit to the knowledgeable for doing so.

A game theorist would represent this notionally, as below. (We don’t know the exact values of each option, but a sketch is helpful for thinking about it.) Each cell shows the payout to that person for that choice. In the second and third columns, the result depends on whether the journalist is called out by someone knowledgeable: the first entry is the payout if they are not called out, the second if they are. Obviously, these numbers are not exact, but they are useful for understanding the dynamic, without resorting to “formal logic” or “complex” “algebra.”

Payoffs to:   | Learn | Learn a bit | Fake it  | Sound Dumb
--------------|-------|-------------|----------|-----------
Journalist    |  -5   |   -1 / -3   |  0 / -20 |  -10
Knowledgeable |  +1   |   -2 / 0    | -2 / +50 |  -1
Lay Public    |  +5   |   +1        | -1 / +1  |  -1

(Of course, the exact values differ by area — the cost for a business journalist to be ignorant might be higher, since their audience is mostly knowledgeable people. Similarly, the choice isn’t discrete — journalists can pick how much to learn, anywhere from a bare minimum to a PhD, and so there is a continuum of options. But this is sufficient for our purposes.)

If you think about the above table for a while, you can see that journalists would love to be able to fake it — but the cost if they are called out is high. Being moderately informed, however, has little downside. Even if someone knowledgeable calls them out, they only look a little silly, and the benefit for someone knowledgeable to do so is small, or nonexistent. So they know enough about game theory to describe it loosely, but not enough to appreciate why it’s useful.
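
To make that concrete, here’s a minimal sketch in R (the language of the graphs later in this post), taking the notional payoffs from the table and assuming a single probability p that someone knowledgeable calls the journalist out. The numbers, like everything in the table, are illustrative:

```r
# Expected payoff to the journalist for each strategy, using the
# notional payoffs above and a probability p of being called out.
journalist_payoff <- function(p) {
  c(learn       = -5,                       # no call-out risk
    learn_a_bit = (1 - p) * -1 + p * -3,    # mild cost if caught
    fake_it     = (1 - p) *  0 + p * -20,   # disastrous if caught
    sound_dumb  = -10)                      # always looks bad
}

journalist_payoff(0.05)  # call-outs rare: faking it barely edges out learning a bit
journalist_payoff(0.30)  # call-outs common: learning a bit is clearly best
```

With these made-up numbers, faking it stops paying as soon as the call-out probability rises above roughly 1/18; even occasional push-back from knowledgeable readers is enough to make “learn a bit” the best response.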

This is the first insight from a game theoretic explanation — journalists become moderately educated on most subjects, and rarely fake knowledge completely. But they also aren’t usually interested in becoming really educated, because doing so has too little benefit for them.

The second insight, though, gets to the famous “prisoner’s dilemma” he mocked. Hundreds or thousands of readers would benefit from knowledgeable journalists, but no individual journalist has enough incentive to become truly knowledgeable. This means that the public gains little, but not nothing, from reading their semi-informed thoughts. This has little to do with prisoners, except in the sense that we are all held captive by the idiocy promoted by lazy journalism. In economics, this is referred to as a market failure: there is insufficient incentive for smart journalism, and little incentive for educated people to call out journalists on their semi-educated status.

Ideally, journalists should learn more, and then stick to their strengths — everyone would win, at the expense of journalists working a bit harder to develop those strengths. Instead, we have a news culture that rewards writers for moderate ignorance. And that’s why it’s optimal for Kriss to stay ignorant of game theory — while still obeying its dictates.

Evaluating Ben Franklin’s Alternative to Regression Models for Decision Making

Recently, Gwern pointed me to a blog post by Chris Stucchio that makes the impressive-sounding claim that “a pro/con list is 75% as good as [linear regression],” which he goes on to show based on a simulation. I was intrigued, as this seemed counterintuitive. I thought making choices would be a bit harder than that, especially when you have lots of choices — and it is, kind of. But first, let’s set up the motivation for the problem, before I show you pretty graphs of how the method performs.

Motivation

Let’s posit a decision maker with a set of options, each of which has some number of characteristics that they have preferences about. How should they choose? It’s not easy to figure out exactly which option they would like most — especially if they want the perfect answer! Decision theory has a panoply of tools, like Multi-Attribute Decision Theory, each with whole books written about them. But you don’t want to spend $20,000 on consultants and model building to choose what ice cream to order; those methods are complicated, and you have a relatively simple decision.

For example, someone is choosing a car. They know that they want fuel efficiency of more than 30 miles per gallon, they want at least 5 seats for their whole family to fit, they prefer a sedan to an SUV or small car, and they would like it to cost under $15,000. Specifying how much they care about each, however, is hard; do they care about price twice as much as the number of seats? Do they care about fuel efficiency more or less than speed?

Instead of asking people to specify their utility function, as many decision theory methods would require, most people just look at the options and pick the one they like most. That works OK, but given cognitive biases and sales pitches that convince them to do something they’ll regret later, a person might be better off with something a bit more structured. That’s where Chris brings in Ben Franklin’s advice.

…my Way is, to divide half a Sheet of Paper by a Line into two Columns, writing over the one Pro, and over the other Con. Then…I put down under the different Heads short Hints of the different Motives…I find at length where the Ballance lies…I come to a Determination accordingly.

Chris interprets “where the Ballance lies” as which list, Pro or Con, has more entries.

The question he asks is how much worse this fairly basic method, which statisticians refer to as “unit-weighted regression,” is than a more complex regression model with exact preference weights.

Where did “75% as Good” come from?

Chris set up a simulation that showed that, given two random choices and random rankings, with a high number of attributes to consider, 75% of the time the choice given by Ben Franklin’s method is the same as that given by a method that uses the (usually unknown) exact preference weights. This is helpful, since we frequently don’t have enough data to arrive at a good approximation of those weights when considering a decision. (For example, we may want to assist senior management with a decision, but we don’t want to pester them with lots of questions in order to elicit their preferences.)
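
I don’t have his exact code in front of me, but a minimal sketch of that two-option simulation looks something like this (the binary features, the exponentially distributed “true” weights, and the coin-flip tie-breaking are my assumptions, not necessarily his setup):

```r
# Two options with random binary features: compare the choice made using
# the true (unknown) preference weights to Ben Franklin's longer-list rule.
set.seed(42)
n_features <- 20
agree <- replicate(20000, {
  weights <- rexp(n_features)           # the decision maker's true weights
  a <- rbinom(n_features, 1, 0.5)       # option A's features
  b <- rbinom(n_features, 1, 0.5)       # option B's features
  d_true <- sum((a - b) * weights)      # exact-weight comparison
  d_unit <- sum(a - b)                  # which pro list is longer?
  if (d_unit == 0) d_unit <- sample(c(-1, 1), 1)  # tie: flip a coin
  sign(d_true) == sign(d_unit)
})
mean(agree)  # comes out near 0.75 under these assumptions
```

In other words, the question is how often the unweighted sum lands on the same side as the weighted one; with only two options, being on the right side is all that matters.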

Following the simulation, he proves that, given certain assumptions, this bound is exact. I’m not going to get into those assumptions, but I will note that they probably overstate the actual error rate in the given case; most of the time, there are not many features, and when there are, features that have very low weights wouldn’t be included, which will help the classification, as I’ll show below.

But first, there’s a different problem: he only considers two options. So let’s get to my question, and then back to our car buyer.

Multiple Options

It should be fairly intuitive that picking the best option is harder given more choices. If we picked randomly between two options, we’d get the right choice 50% of the time, without even a pro-con list. (And coin-flipping might be a good idea if you’re not sure what to do — Steven Levitt tried it, and according to the NBER working paper he wrote, it’s surprisingly effective. Despite this, most people don’t like the idea.)

But most choices have more than two options, and that makes the problem harder. First, I don’t have any fair three-sided coins. And second, our random guess now gets it right only a third of the time. But how does Ben Franklin’s method do?
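
The real code for the graphs below is linked at the end of this post; here is a simplified sketch of the many-option version of the same hypothetical setup, counting a pick as correct whenever it scores as well as the true best (so literally identical options don’t register as mistakes):

```r
# How often does the option with the longest pro list score as well as the
# option the exact (unknown) weights would pick?
set.seed(42)
pick_match <- function(n_options, n_features, reps = 5000) {
  mean(replicate(reps, {
    weights <- rexp(n_features)                 # assumed true weights
    opts <- matrix(rbinom(n_options * n_features, 1, 0.5),
                   nrow = n_options)            # one row per option
    true_scores <- as.vector(opts %*% weights)  # exact-weight scores
    scores <- rowSums(opts)                     # pro-list lengths
    ties <- which(scores == max(scores))
    pick <- ties[sample.int(length(ties), 1)]   # break ties at random
    true_scores[pick] == max(true_scores)       # did we pick a best option?
  }))
}

sapply(c(2, 3, 10, 100), pick_match, n_features = 5)
```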

First, this shows the case Chris analyzed, with only two options, compared to three:

[Figure: how often the pro/con choice matches the exact-weight choice, for two options versus three, as the number of features grows]
The method does slightly worse, but it’s almost as good as long as there aren’t lots of dimensions. Intuitively, that makes sense; when there are only a couple of things you care about, one of the options probably has more of them than the other — so unless one of those factors is much more important than the rest, it’s unlikely that the exact weights make a big difference. We can check this intuition by looking at our performance with many more options:

[Figure: the same comparison, with many more options]
With only a few things that we care about, pro/con lists still perform incredibly well, even when there are tons of choices. In fact, with few enough features, they perform even better. This makes sense; if there is a choice that is clearly best, we can pick it, since it has everything we want. This points to part of the problem with how the problem was set up: we are looking only at whether each option has or lacks each thing we want — not at its value.

If we have a lot of cars to choose from, and we only care about the 4 things we listed (30 MPG, 5 seats, sedan, cost under $15,000), picking one that satisfies all of our preferences is easy. But that doesn’t mean we pick the best one! Given a choice between a five-seater sedan that gets 40 MPG and costs $14,000 and one that gets 32 MPG and costs $14,995, our method calls it a tie. (It’s “correct” only because we assumed each feature is binary.) There are plenty of algorithmic ways around this that are a bit more complex, but any manual pro/con list would make the difference apparent without adding complexity.
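
To see that collapse concretely, here are those two (made-up) cars in code:

```r
# Both hypothetical cars satisfy all four binary criteria, so the
# pro/con score ties, even though the first car dominates the second.
meets_criteria <- function(mpg, seats, sedan, price) {
  c(mpg >= 30, seats >= 5, sedan, price < 15000)
}
car_a <- meets_criteria(mpg = 40, seats = 5, sedan = TRUE, price = 14000)
car_b <- meets_criteria(mpg = 32, seats = 5, sedan = TRUE, price = 14995)
sum(car_a) == sum(car_b)  # TRUE: the binary method can't tell them apart
```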

Interestingly, however, with many choices, the method starts working much worse when there are many feature dimensions. Why? In a sense, it’s actually because we don’t have enough choices. But first, let’s talk about weak preferences, and why they make the problem seem harder than it really is.

Who Cares?

If we actually have a list of 10 or 15 features, odds are good that some of them don’t really matter. In algorithm design, we need a computer to make decisions without asking us, so a binary classifier can have problems picking the best of many choices with lots of features — but people don’t have that issue.

If I were to give you a list of 10 things you might care about for a car, some of them won’t matter to you nearly as much as others. So… if we drop elements of the pro/con list that are less than 1/5 as important as the average, how does the method perform?

[Figure: performance after dropping features less than 1/5 as important as the average]
And this is why I suggested above that when building a Pro/Con list, we normally leave off really low importance items — and that helps a bit, usually.
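
In the sketch from earlier, the filter amounts to masking out weak features before counting the pro list; the 1/5-of-average threshold is the one mentioned above, and the rest of the setup is still my assumed one:

```r
# Same sketch as before, but the pro/con list ignores any feature whose
# weight is less than a fifth of the average weight.
set.seed(42)
pick_match_filtered <- function(n_options, n_features, reps = 5000) {
  mean(replicate(reps, {
    weights <- rexp(n_features)
    keep <- weights >= mean(weights) / 5           # the "who cares?" filter
    opts <- matrix(rbinom(n_options * n_features, 1, 0.5),
                   nrow = n_options)
    true_scores <- as.vector(opts %*% weights)     # truth still uses everything
    scores <- rowSums(opts[, keep, drop = FALSE])  # pro lists skip weak features
    ties <- which(scores == max(scores))
    pick <- ties[sample.int(length(ties), 1)]
    true_scores[pick] == max(true_scores)
  }))
}

sapply(c(2, 10, 100), pick_match_filtered, n_features = 15)
```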

When we have lots of choices, the low-importance features add noise, not useful information:

[Figure: with many choices, dropping low-importance features improves performance]
Of course, we need to be careful, because it’s not that simple! Dropping features when we only have a few of them is a bad idea — we’ll miss the best choice.

[Figure: with only a few features, dropping any of them hurts performance]
The Curse of Dimensionality versus Irrelevant Metrics

We can drop low-importance features, but why does the method work so much worse with more features in the first place? Because, given a lot of features, there are a huge number of possibilities. Five features allow 2⁵ possibilities — 32. Any option that has all five things we want (or most of them) will be the best choice — and ignoring some features, even if they are low weight, will miss that. If we have 50 features, though, we’ll never have 2⁵⁰ options to search through for one that has everything we might want — so we need to pay attention to the most important features. And that’s the curse of dimensionality.
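
A quick back-of-the-envelope calculation shows how fast that happens, assuming as before that each option has each feature independently with probability 1/2:

```r
# Chance that at least one of 100 random options has every feature we want,
# as the number of features grows.
n_features <- c(5, 10, 20, 50)
p_perfect  <- 0.5 ^ n_features    # probability a single option is perfect
1 - (1 - p_perfect) ^ 100         # probability some option out of 100 is perfect
# ~96% with 5 features, ~9% with 10, and essentially zero with 20 or 50.
```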

If I were really a statistician, that would be an answer. But as a decision theorist, I’d say it means that our metric is a problem. Picking bad metrics can be a disaster, as I have argued at length elsewhere. And our car buyer shows us why.

There are easily a hundred dimensions we could consider when buying a car. Looking at the engine alone, we might consider torque, horsepower, and top speed, to name a few. But most of these dimensions are irrelevant, so we would ignore them in favor of the 4 things we really care about, listed above; picking the car with the best engine torque that didn’t seat 5 would be a massive failure.

And in our analysis here, these dimensions are collapsed into a binary, both in our heuristic pro/con list, and in the base case we compared against! As mentioned earlier, this ignores the difference between 32 MPG and 40 MPG, or between $14,000 and $14,995 — both differences we do care about.

And that’s where I think Ben Franklin is cleverer than we gave him credit for initially. He says “I find at length where the Ballance lies…I come to a Determination accordingly.” That sounds like he’s going to list the options, think about the Pros and Cons, and then make a decision — not on the basis of which list is longer — but simply by looking at the question with the information presented clearly.

Note: Code to generate the graphs in R can be found here: https://github.com/davidmanheim/Random-Stuff/blob/master/MultiOption_Pro_Con_Graphs.R