I got Nate Silver’s book ‘The Signal and the Noise’ as an early Christmas present (thanks Dad!) and just finished it last night. I’m hoping to touch on a few different points from the book over time, but today I’m going to focus on one where I agree with Nate: the importance of having some theoretical background to guide a predictive model. That sounds overly fancy, but it will be clear as day in a second.
When people make predictions, they use a model. Sometimes that’s a mathematical model, like the ones I tend to talk about, but sometimes it’s just a rule or heuristic, like always bet on black. That model can really be anything; you could make predictions based on a wild guess but then your model would be something like flipping a coin or randomly generating a number in your head. There’s always some basis for a prediction.
However, there isn’t always a guiding reason for that model. Let’s say you want to predict the winner of NFL games. You decide to flip a coin and take the home team if it comes up heads. There isn’t much of a guiding reason there; you could be making a grand statement that single NFL games are basically random and you feel good just flipping a coin. But there’s no theory or reasoning behind it. You would actually be ignoring established knowledge, like the fact that the home team wins more than 50% of the time (and you can do even better than that). You could decide to follow the choices of a soccer-loving octopus, but you’d have to ask yourself how the octopus is making decisions.
A great place to look at various models is in predicting the winner of an election. Nate Silver is (now) (somewhat) famous for his 538 blog, where he predicts which state will swing for which presidential nominee as well as various governor and Senate/House races. To do this, he uses polling information combined in some manner. The theory is fairly straightforward: if you ask people who they’re going to vote for, and adjust for noise and bias in the sample, then you have a good idea of who they’ll vote for in November. But you could also try other models, like those listed in this Cracked article. Perhaps you like the Redskins rule: if the Redskins lose their most recent home game before the election, the incumbent party (or perhaps the last party to win the popular vote) will also lose. Or the Summer Olympics rule, where the incumbent nearly always wins as long as a country that has previously held an Olympics hosts the most recent Games.
This is where having a theoretical reason for your model comes in handy: what if some of the models disagree? London hosted the most recent Summer Olympics and had hosted them before, so that predicted an Obama victory. Nate Silver’s polling data also went for Obama. But the Redskins lost their game against Carolina, predicting a Romney victory. Before the election happened, could we form a preference for one of the models over the others? After finding out the result, could we say what happened and what went right (or wrong)? Particularly with rare events like Presidential elections, you don’t get a lot of feedback. One wrong prediction isn’t a death-knell for a model unless you think it should be 100% correct; maybe the Redskins rule works but happened to be off this year? The Olympics rule was wrong once too.
If you have a theoretical model, you gain three benefits. First, if something goes right you can say why. Nate Silver can point to the polls and say that we should have known Obama would win because when asked, people told us they would vote for Obama. That makes sense. That doesn’t mean the theory is right, but the correct outcome helps assure us it’s probably on the right track. Second, if something goes wrong you can start asking questions about the theory. Is it correct? Is the relationship between my predictors (like polls, or how the Redskins play in one specific week) and the outcome different from what I thought? Is there something important that I’m not taking into account? Does my theory make any sense at all? Or perhaps this was dumb luck; maybe things just turned out differently from what I predicted due to noise. Having a theory in the first place can help answer that last question most specifically. Third, having a theory can help you form your model in the first place. Silver discusses this in contrasting weather forecasters, who have pretty solid physical models of the weather, with earthquake researchers, who might have some theories about why earthquakes happen but are greatly handicapped by the fact that they can’t observe the rocks miles below the surface.
Let’s apply this to the Redskins. First, what theory would generate that model? The Cracked article notes that you might think about public excitement at a football game, which reflects happiness with the state of the universe, providing extra home benefit to the Redskins. However, the article also notes that it’s hard to think of why it would apply to that one game. The incumbent has been in office for over three years at that point; why not all Redskins home games? Should this apply to the Ravens, who are close enough to presumably benefit from happiness in DC? Should it apply to the Nationals since they’ve moved to town, or the Senators? How about teams everywhere, since this is a national election that we’re talking about? The answer, of course, is that there wasn’t really a theory that produced the rule. Someone happened to notice that the pattern held, and then it (kind of) worked in the two elections prior to this year’s. In other words, this was not a theory-based prediction but a data-based prediction. I’m sure I’ll have more to say about the theory/data distinction in the future.
That being said, what does the prediction tell us about how elections work? The answer is not much; the rule would lead to predictions like “Dan Snyder (or some Redskins players/coaches) could sabotage the team in election years to determine the election”. That seems highly unlikely. And since the rule has been wrong twice since it was put out there (ignoring the popular vote twist that was added to explain its immediate failure in Bush/Kerry), we can ask what those failures tell us about the theory. One option is that it failed out of dumb luck; the method is sound but happened to run into some noise. Being wrong two out of three times seems like bad luck for an ironclad rule though. So what could be done to tweak the rule? One change was already made; it isn’t simply the winner that counts but the winner of the popular vote. That didn’t hold up either. This is the point where you should think about the fundamentals of the theory, and as we just discussed the fundamentals are pretty weak. In short, it seems like the Redskins rule is a prediction model doomed to a poor future thanks to a lack of any reason for it to work.
We can look more briefly at the other rules in the Cracked article for a comparison. The Olympics rule, it seems to me, falls in the same camp. The Olympics have little to do with the Presidential election; it isn’t even limited to the United States. I would put the Oscars rule in the same boat, although I could see an argument that movies with positive outcomes get made when people are happy, while movies with negative or sad stories get made when people are unhappy. That strikes me as flimsy, but plausible enough, and the record is still intact. I find the other three rules more plausible. If Vigo County is pretty representative of the country as a whole, then it seems like it should be a good indicator of the national outcome. It would be a miniature version of Silver’s polls. Similarly, asking kids to vote is a kind of poll, and kids presumably get their opinions from their parents or the same sources that will influence their parents. And the Halloween mask rule is kind of another poll, with the assumption that people buy the mask of the person they would want to vote for. Of course, if any of those predictions start failing in 2016, we would want to examine our theory again.
All of this applies to sports as well. In my football model, for example, the theory is that certain statistics predict team quality and that team quality predicts winning (along with home field advantage). There are more specific assumptions, like which statistics are the important ones and that regression is the best way to determine how to combine those statistics. But there is a theory and it is testable, which is the important part. If my model started predicting poorly (like it has been this year), I can look at my assumptions and examine whether they still hold. I will likely be doing this in the offseason.

Another interesting case is the connection between team salary and winning. The data show that there is a connection between the two, but it is strictly correlational. You could generate a theory to explain it, like spending more money causes a team to win more. This theory alone is wrong though; if every player on the Wizards currently got a raise, they would not win any more games. We could modify the theory to say that teams typically pay more for better players, and thus teams who spend more tend to have better players than teams who spend less. That theory makes intuitive sense and explains why the salary predictions may not work every time; sometimes teams will spend a lot of money on a player who isn’t good, or sometimes they will spend little money on a good player. But this money-to-players-to-winning theory tells us that we don’t need to worry about salary too much if we want to predict team quality, and instead we can focus on player quality. Observing the failure to predict some teams, like the Knicks squads that were paid well despite being full of crummy players, allows us to evaluate our theory and revise it.
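To make the regression idea concrete, here is a minimal sketch of the kind of model described above. Everything in it is invented for illustration: the games, the statistics, and the “true” coefficients are simulated, not my actual model, and the point is only to show how regression recovers the weight each statistic deserves.

```python
import numpy as np

# Hypothetical data: each row is a game. The features are differences
# between the home and away team on two made-up statistics, plus a
# constant column that absorbs home-field advantage.
rng = np.random.default_rng(0)
n_games = 200
X = np.column_stack([
    np.ones(n_games),            # intercept = home-field advantage
    rng.normal(size=n_games),    # stat 1 difference (home - away)
    rng.normal(size=n_games),    # stat 2 difference (home - away)
])

# The assumed "theory": how much each feature really matters.
true_coefs = np.array([2.5, 6.0, 3.0])
# Observed point margins = theory + game-to-game noise.
margin = X @ true_coefs + rng.normal(scale=7.0, size=n_games)

# Ordinary least squares estimates the weights from the data.
coefs, *_ = np.linalg.lstsq(X, margin, rcond=None)
print(coefs)  # close to [2.5, 6.0, 3.0], within sampling noise
```

Because the data were generated from a known “theory,” we can check that the fitted coefficients land near the true ones; with real games, a drift between fitted and expected coefficients is exactly the kind of signal that would send you back to re-examine your assumptions.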
A final example that toes the line between theory and data is NBA player evaluation, specifically RAPM. RAPM has no theoretical basis; it is a product of a data analysis method (I guess the theory could be that the players on the court determine which team will score more points, but I think that’s true of every player evaluation system). This makes it difficult to evaluate RAPM and the predictions it makes; it predicts that having LeBron James on the court will cause his team to do well, as will having Nick Collison. It predicts that having J.J. Hickson will cause his team to do poorly. But we really have no idea why; we feed in numbers and we get predictions out. There is little we can say when RAPM happens to fail, and there is little that a coach or GM could use RAPM for, practically speaking, other than to try to acquire and then play guys with high ratings. On the other hand, RAPM does do a pretty good job of predicting future performance. It has a good track record and is a definite improvement over other systems that do have clear theories, like PER. So RAPM lives in a kind of hazy place between a lack of theory and quality predictions; it makes good predictions, and its track record continues to hold up, but we have little insight into why, or into what it tells us about why some players are good and others are bad. Thus not all atheoretical predictions are bad, but they do introduce their own difficulties.
To sum up, having a theory that guides your predictive model is nice because a) the theory should tell us something about the world, b) the theory tells us why a prediction is the way it is, c) theories can be updated or rejected based on the outcomes of the predictions, and d) a theory makes predictions ‘actionable’ in that we can examine the theory to make other associated predictions or decisions (like if we predict that a hurricane is going to hit New Orleans, we can evacuate New Orleans). Those are a lot of benefits, and in the long run having a theory should lead to better and better models as prediction outcomes reveal where and how to change the theories. But in at least some situations, good predictions can be made without a theory.