Monday, December 21, 2009

Figures Lie, and Liars will Figure (v. 2.0)

Last summer, when my guy Mahmoud was having his latest electoral triumph, there were some folks who were convinced that there was some fraud going on. Now, I was one of the folks who was pretty sure of it myself. But I certainly couldn't prove it. And it wasn't just in Iran where you'd see rigged elections, which often happen in plain sight. In fact, I was sort of hoping that the election was rigged, as rigged elections are a lot of fun in my book, especially when you look at the aftermath of these things. But what I thought was interesting was how the statistics used in the Washington Post article were interpreted to "prove" that the fraud took place. Essentially, the odds of any one specific event happening, given the almost limitless possible permutations, are astronomically small. So arguments that build on that idea are on shaky ground to begin with.

But the use of data and statistics is something that we're becoming more and more comfortable with. In sports, folks in basketball are looking for more useful statistics (disclaimer - I have nothing to do with this statistic, and actually think that it looks hokey, but I love the name) to analyze player performance. Sometimes, the results are somewhat comical, and entirely counterintuitive to our everyday observations, as Bill Simmons points out (about 1/3 of the way down) when commenting on stats-guru Wayne Winston's take on Tim Thomas. And in baseball, we saw voters overlook traditional counting numbers (wins, losses, HR, RBI), and focus more on efficiency numbers (ERA, WHIP, OBP, SLG), when selecting Greinke, Lincecum, and Mauer as the best pitchers and hitters (Pujols was a lock no matter what criteria you use).

Of course, as more and more people employ statistics, you have more and more people misusing them, whether by intent or by ignorance. One of my pet peeves is the use of macro-level data to evaluate singular events, under the assumption that the data is perfect.

Recently, there was a situation in the NFL that led to a lot of commentary and angst. Basically, the Patriots are winning by 6, have the ball on their own 29-yard line, and it's 4th and 2. And coach Bill Belichick decides to go for it. Only they don't have their personnel right, so they have to burn their last timeout. Then they come back out of the timeout and line up on offense again, in a 5-WR, empty backfield. I'll be honest - as I was watching the game, I figured that they'd try to draw the Colts offsides, or maybe run a QB keeper to get the 2 yards.

I was legitimately shocked when they actually snapped the ball and threw. And immediately, I knew that there'd be a maelstrom of dissenting opinion on this one. And the argument would basically fall along two lines. The first group would be the supporters, whose support would be statistically based, likely building off of the argument that the average offensive play gains 5 yards, etc. The second group would be the dissenters, who would argue for conservatism, talk about how that's not how you play the game, that it's disrespectful to your defense, etc. Normally, I am all for statistical evidence. But one of the biggest things that the stat folks forget is that just because you have a stat, it doesn't mean that it's appropriate to use in a given situation.

There's a basic assumption being made here that (in my opinion) isn't perfectly clear. You are assuming that the use of macro-level probabilities is appropriate for making micro-level decisions, regardless of context. I'd argue that the context matters, and that the probabilities for winning should be considered in light of the actual players involved.
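To make the stat-folks' side of the argument concrete, here's the kind of back-of-the-envelope win-probability calculation they'd run on the Belichick call. Every number below is a made-up placeholder, not a measured figure - which is exactly the macro-level data problem I'm complaining about:

```python
# Back-of-the-envelope comparison of "go for it" vs. "punt" on that
# 4th-and-2. ALL of these probabilities are illustrative assumptions,
# not measured values.
P_CONVERT = 0.60          # assumed chance of gaining the 2 yards
P_WIN_IF_CONVERT = 1.00   # converting essentially ices the game
P_WIN_IF_FAIL = 0.47      # assumed chance the defense holds from the 29
P_WIN_IF_PUNT = 0.70      # assumed chance the defense holds a ~70-yard drive

def win_prob_go_for_it():
    # weighted average over the two outcomes of the 4th-down play
    return P_CONVERT * P_WIN_IF_CONVERT + (1 - P_CONVERT) * P_WIN_IF_FAIL

def win_prob_punt():
    return P_WIN_IF_PUNT

print(f"go for it: {win_prob_go_for_it():.3f}")  # 0.788 under these assumptions
print(f"punt:      {win_prob_punt():.3f}")       # 0.700
```

The point isn't the specific answer - nudge any of those generic, macro-level inputs a little and the conclusion flips, which is the whole problem.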

If you're a poker player, then you will often see people playing in tournaments make decisions exclusively based on pot odds. Basically, if you are a 2-to-1 underdog, but you're getting the appropriate incentive to take the risk (for example, the pot is laying you 3-to-1), you should take on the risk because the price is correct. However, there's something implied that is often not considered: essentially, what you are saying is that any one chip is worth the same, regardless of the situation, regardless of who owns that chip. Now, if you had a computer simulator and could re-run that scenario an infinite number of times, the stat-based decision would be the optimal method. But life is a one-shot deal, and the outcomes at a micro level need to be examined in terms of the macro level.
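For the record, the pure pot-odds rule is trivially mechanical, which is part of its appeal. A minimal sketch (the chip amounts are arbitrary):

```python
# Minimal sketch of a pure pot-odds decision - the "macro" rule
# described above, which treats every chip as equal in value.

def should_call(pot, bet_to_call, win_probability):
    """Call when the pot is laying a better price than your odds of winning.

    pot: chips already in the pot (including the opponent's bet)
    bet_to_call: chips you must put in to continue
    win_probability: your estimated chance of winning the hand
    """
    pot_odds = bet_to_call / (pot + bet_to_call)  # break-even win probability
    return win_probability > pot_odds

# A 2-to-1 underdog (win probability 1/3) getting 3-to-1 from the pot:
# 1/3 beats the 0.25 break-even point, so the pure pot-odds rule says call.
print(should_call(pot=300, bet_to_call=100, win_probability=1/3))  # True
```

Notice that nothing in that function knows who's at the table, how deep the stacks are, or what the payout structure looks like - which is the objection that follows.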

In that same situation, wouldn't you love to have statistics that could tell you things like a) expected tournament winnings, b) tournament win probability, c) the probability of finishing out of the money, d) the probability that this decision changes how other players play against me (and how it does so), etc.? All of these things are just as useful as knowing the odds of winning that one hand, and unless it's an all-in, much more useful. However, these stats aren't readily available. The odds of winning the hand are much easier to attain, so that's the information that people will often use. And there are a lot of folks who are more than happy to use that information because that's the best they've got. Evidently this is the new math.

To paraphrase Phil Hellmuth, I'm not sure if I buy into the new math. Essentially, this methodology implies that the user is of (at best) equal skill to every other player in the tournament, and that the size of any player's chip count is irrelevant. No poker player would ever admit to the first part (in that sense, the new math does prevent hubris, which I suppose is good), and the second part is clearly wrong, both statistically and psychologically. And I would venture to guess that none of the good poker players really use pot odds exclusively when they're making their calculations. They're always adding on things like implied odds, the odds that an opponent is bluffing, the odds based on what they believe an opponent has, how well they know an opponent, and how well the opponent knows them. Now, all of these things are "soft" data - no hard evidence. But they make up the majority of the decision-making process.

This isn't a 1-to-1 correlation, but in my opinion, the Belichick story and the poker example are in the same realm. The historical data, which accumulates basic information on individual plays, does not (at least the data that I've come across doesn't) sort out things like a) at what point in the game a given play was run, b) the game situation, c) the formations that were used, d) the caliber of the teams that were involved, e) playoff implications for the teams involved, etc. For the folks who defended the Belichick move solely based on these historical stats, I'd argue that the defense is questionable. I don't disagree with the outcome, but I do take issue with how you defend it. Basically, I'd be much happier with an argument of, "Listen, Peyton Manning's there on the sidelines, and he's wanting the punt, and just about craps his pants when he sees that the Patriots are going for it. That makes it the right call."

That said, I was pretty shocked when the Patriots actually went for it, and didn't just try to draw them offsides. I'm pretty neutral about the effectiveness of the Belichick decision, but I loved it. I think that the probability of the outcome is close enough to defend either position, and I generally love counterintuitive thinking done by confident people. That said, I absolutely hated the fact that the Patriots screwed up 3rd down and burned their last timeout (which kept them from challenging the spot on the last play).

Now this past week, we had another interesting situation, with Mike Tomlin of the Steelers deciding to kick an onside kick right after his team had taken a 2-point lead, with 4 minutes left to go in the game. Basically, Tomlin's argument is that he's going 2-for-1, like they do in basketball all the time. You see arena football teams onside kick as the clock winds down so that they get the "extra" possession. Here, Tomlin's saying that the best-case scenario is that you win the game outright if you get the ball back. And in the worst-case scenario, his guy (who had already thrown for 400+ yards) will get the ball back with about 1:30 left and 75 yards to go, needing a TD for the win. Interestingly, Tomlin was perfectly fine with the worst-case scenario, which was a lot like the punt scenario. So, here you see both aspects of the Belichick decision. Of course, the counterintuitive thinking leading to the gutsy call is evident. But the worst-case scenario that Tomlin accepted was very similar to what Peyton would have been working with after the punt that never happened. And we got to see how both things played out. Tomlin's counterintuitive gamble backfired, and the Packers got the ball at the Steelers' 39. But Big Ben got his hands on the ball with 2:06 left and used every second to march 86 yards and get the W.

I suppose if you were to take Tomlin's outcomes and map them back to Belichick, our hero Genius was left with a damned-if-you-do, damned-if-you-don't situation. Which is what the pundits love, since everyone gets to pile on, regardless of how well-informed they are.

-Chairman

4 comments:

Robby said...

"In that same situation, wouldn't you love to have statistics that could tell you things like a) expected tournament winnings, b) tournament win probability, c) the probability of finishing out of the money, d) the probability that this decision changes how other players play against me (and how it does so), etc. All of these things are just as useful as knowing the odds of winning that one hand, and unless it's an all-in, much more useful information. However, these stats aren't readily available statistics."

This information is mostly available through ICM (the Independent Chip Model). The most useful applications are all-in situations in SNGs and calculating your worth at a final table. Of course, simple models quickly become less useful in more difficult situations (non all-in spots, or when blinds are so big that their next location is of great importance), but I feel as though both of these NFL situations were simple enough that relying on historical data applied to the correct calculation is conclusive.

We can agree that the misapplication of statistics is dangerous but what evidence is there of that happening in the analysis of these plays by the sports statistics crowd? With a simple calculation and large amounts of data available it is much more dangerous to ignore statistics.

Chairman said...

Robby - I'm moderately familiar with the notion of chip equity, but I'm definitely not intimately familiar with ICM. What is its methodology in calculating the things that I list? Also, the point that you made about ICM is that it's most useful when you know with precision what the outcomes are (i.e., the problem being considered is relatively well-structured). But what if you're 40 minutes into a 300-person, deep-stack tourney?

In sports like baseball, where you have hundreds of games and thousands of events for each player, the numbers start to work for you. In football, where you have relatively fewer events, I'd be less confident. Additionally, wouldn't you be concerned with taking broad data in a sport where philosophy seems to drive decisions (like the Klosterman article that Ryan noted in IJAB)?

Now, if there were data that considered the specific situation (one play to get 2 yards, empty set backfield, two very good teams, etc.), I'd be more inclined to trust it.

It's like the false positive rates in detection. If you apply drug or cancer tests to the general public, you get an alarmingly high rate of false positives. However, once you narrow the places from which you acquire your sample (i.e., testing people with family histories of cancer, or testing people who are acquaintances of known drug users), those false positives go down dramatically.

Robby said...

"What is its methodology in calculating the things that I list?"

It pretty much just factors in the pay scale.

"But what if you're 40 minutes into a 300 person, deep stack tourney? "

Given a reasonably top-heavy pay scale these factors are negligible. Play to maximize chips.

I think the argument may boil down to how difficult you think these situations are to analyze. I think they appear to be very simple problems, where it is possible to come up with a formula which factors in all the important possibilities. Then, given a sufficient amount of historical data, looking at the statistics proves extremely valuable. Whereas in more difficult cases, statistics may appear to be much less useful, and a coach's gut feeling would be more valuable.

I think arguing against the use of statistics in these scenarios would be similar to arguing that because we can't accurately model global warming we shouldn't trust a weatherman's forecast for tomorrow.

Chairman said...

You're definitely right about the complexity/uncertainty of the problem leading to different data requirements.

I think that my complaint boils down to this: I dislike the use of a generic yard on a generic play as the data that drives my decision when I know much more about what's going on. In baseball, you get more instances of relatively specific situations (like a given batter vs. pitcher), since they keep track of so many things (a cultural thing in baseball, for sure).

With regard to the ICM model - basic models like that also hinge on the assumption that any one chip is worth the same, regardless of the player holding it. I just don't know if I buy into that notion.
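For what it's worth, here's a minimal sketch of the standard Malmuth-Harville version of ICM, with made-up stacks and payouts. Note that the model's core input is that a player's chance of finishing 1st is proportional to their stack - that's the chip-proportionality assumption I'm wary of - even though the dollar equity it spits out ends up nonlinear in chips:

```python
# Hedged sketch of the Malmuth-Harville ICM calculation: P(1st) is
# proportional to stack size, and lower finishes are computed by
# removing each finisher and renormalizing. Stacks/payouts are made up.
from itertools import permutations

def icm_equity(stacks, payouts):
    """Return each player's ICM prize equity.

    stacks: list of chip counts
    payouts: prizes for 1st, 2nd, ... (may be shorter than stacks)
    """
    n = len(stacks)
    equity = [0.0] * n
    for order in permutations(range(n)):
        # probability of this exact finishing order
        p, remaining = 1.0, sum(stacks)
        for player in order:
            p *= stacks[player] / remaining
            remaining -= stacks[player]
        # credit each paid place with its share of this order's probability
        for place, player in enumerate(order):
            if place < len(payouts):
                equity[player] += p * payouts[place]
    return equity

# Three players left, 50/30 payout split for 1st/2nd:
print(icm_equity([5000, 3000, 2000], [50, 30]))
```

Under these numbers, the chip leader holds half the chips but ends up with less than half of the 80-unit prize pool - so even this model admits that the leader's chips are individually worth less than the short stack's.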