Awwww yeah.

Yes, the time has finally arrived to talk about the football model (although if anyone wants to talk about Flight of the Conchords, that’s also cool).  Here’s the deal: over the past two years I’ve been working on a model to predict NFL winners against the spread.  I have a model that works, in that it would make money (if I lived somewhere where I could bet on NFL games), but then I bought a new computer and didn’t want to install MATLAB on it, which is what the original program was written in.  So I decided to rewrite things in R, and while I was at it, see if I could improve the model.  I tested three models: my previous one, which is a stripped-down version in terms of the number of parameters; a ‘kitchen sink’ model with everything I could throw in there; and an intermediate model.  My data set covers all games from 2004 to present, including playoffs.  I include playoff games with the rest of the data because I don’t think they are fundamentally different from other football games, perhaps with the exception of week 17 games for teams that have nothing to play for.  While I was checking on the models, I thought I would also expand from predicting score differences to predicting total points (used for the over/under) and win probabilities via a logistic regression (used for the money line).  I don’t have historical over/unders or money lines, but I do have the spreads and accompanying odds (because not every game is run at -110) from Bodog for 2008 and 2009, and the lines from mrnfl.com for the earlier seasons.  Given the interest in potential legitimate betting, I’m going to focus on the past two seasons.

The first thing I wanted to check was how well the models fit the data.  For the logistic regression on win probability, the AIC (a measure of fit; smaller means better) drops as I move from the sleek model to the middle model to the kitchen sink, which is what you would expect: more predictors means better fit.  A likelihood ratio test also says that each bigger model is a significant improvement even after accounting for the extra predictors, so for win probability, bigger is better.  The point-difference fit is a little different, since it’s a linear model rather than a logistic one.  The best R-squared and adjusted R-squared values actually belong to the middle model; perhaps the kitchen sink has too many collinear predictors, or more of its predictors are simply irrelevant to point differential.  The same is true for the total-points prediction.
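For anyone curious what this comparison looks like in R, here’s a minimal sketch on simulated data.  The predictor names (`pt_diff_avg`, `yards_avg`, `turnovers`) and the nesting structure are made up for illustration — the actual model’s inputs aren’t shown in this post.

```r
# Simulated stand-in data; predictor names are hypothetical, not the real model's
set.seed(1)
n <- 500
d <- data.frame(
  pt_diff_avg = rnorm(n),  # hypothetical: average past point differential
  yards_avg   = rnorm(n),  # hypothetical: average yardage differential
  turnovers   = rnorm(n)   # hypothetical: turnover differential
)
d$win <- rbinom(n, 1, plogis(0.8 * d$pt_diff_avg + 0.3 * d$yards_avg))

# Three nested logistic models: sleek -> middle -> kitchen sink
sleek  <- glm(win ~ pt_diff_avg,                         data = d, family = binomial)
middle <- glm(win ~ pt_diff_avg + yards_avg,             data = d, family = binomial)
sink   <- glm(win ~ pt_diff_avg + yards_avg + turnovers, data = d, family = binomial)

AIC(sleek, middle, sink)                    # smaller AIC = better fit
anova(sleek, middle, sink, test = "Chisq")  # likelihood ratio tests between nested models
```

For the point-difference and total-points versions, the same comparison uses `lm()` instead of `glm()`, with `summary(fit)$adj.r.squared` giving the adjusted R-squared.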

Of course, the in-sample fit isn’t too interesting.  What we’d really like to know is how well these models predict the future.  Since the future hasn’t happened yet, I did the next best thing: predict the past.  I deleted the 2008 and 2009 seasons from my data and ran through the algorithm the way I would for a new season.  The models all started with the 2004–2007 seasons as known data, plus the results from week 1 of 2008, and used that to predict the results of week 2 of 2008.  Then the models were given the week 2 data and used to predict week 3, and so on through last year’s Super Bowl.  Using the predicted score difference and the line from Bodog, I can decide which way to bet each game and then tell whether that decision was correct.  Using something similar to my betting rule (a modified Kelly criterion), I can also see how much money each model would have bet and made over the course of the past two years.  I can’t evaluate the win-probability and total-points predictions as directly, since I don’t have money lines or over/unders, but I’ll take a look at those too.
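The walk-forward idea can be sketched in a few lines of R.  Everything below is illustrative: the data are simulated, the one-predictor `lm()` stands in for the real models, and the stake sizing is plain (unmodified) Kelly at -110 odds rather than my actual modified rule.

```r
# Made-up season: 20 weeks of 8 games, a hypothetical "rating" predictor
set.seed(2)
games <- data.frame(week = rep(1:20, each = 8), rating = rnorm(160))
games$margin <- 3 * games$rating + rnorm(160, sd = 10)  # actual score difference
games$line   <- 3 * games$rating + rnorm(160, sd = 2)   # closing spread

bankroll <- 100
for (w in 2:20) {
  train <- subset(games, week < w)          # everything seen so far
  test  <- subset(games, week == w)         # the week being predicted
  fit   <- lm(margin ~ rating, data = train)
  pred  <- predict(fit, newdata = test)

  # Bet the side the model favors relative to the line; estimate the cover
  # probability from the model's residual spread (a crude assumption)
  edge <- pred - test$line
  p    <- pnorm(abs(edge) / sd(resid(fit)))  # P(cover) for the favored side
  b    <- 100 / 110                          # net payout per unit at -110
  f    <- pmax((b * p - (1 - p)) / b, 0)     # Kelly fraction, no negative bets
  stake <- f * bankroll / nrow(test)         # split the stake across the week

  covered  <- sign(edge) == sign(test$margin - test$line)
  bankroll <- bankroll + sum(ifelse(covered, stake * b, -stake))
}
bankroll  # ending bankroll after the simulated walk-forward
```

Each week the model is refit on all prior games before predicting the next week, which mirrors how the real backtest feeds in one week of results at a time.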