Picking Winners in the NFL. Also, ROCs

As I said in my last football post, my big project is projecting winners.  Sports analysts do this every week; some people claim to do it well enough to make money (I’m going to end up being one of those people).  Most people, however, are very bad at this.  TMQ annually mocks the analysts at the end of the season because many are roughly at chance, and the few who do better than chance rarely keep it up for more than a season.  This has led to TMQ’s ‘system’ for picking winners, which is simply picking the home team.  Over the span of time my data covers (2004 to present), and excluding week one (because in the end I won’t be making any predictions for the first game of the season), this would lead you to pick the correct winner 56.8% of the time.  Chance would be 50%, so the ‘pick the home team’ system is significantly better than that.  More recently, TMQ has moved to the ‘Isaacson-Tarbell Postulate’ (seen here), which says that you should pick the team with the better record, or, if the records are tied, take the home team.  Thus, you only take the away team if they have a better record.  In my data, this would lead to you being right 61.5% of the time, which is significantly better than just picking the home team.  Can we do better than the IT postulate?

Another way we could try to predict winners is to use points scored.  It turns out that picking the home team unless the away team averages more points scored does about as well as the IT postulate.  But it might be more reasonable to use both points for and points against.  Point differential is generally taken to be more predictive of future performance than actual win percentage (at least in basketball).  We can also be a bit fancier and put these numbers into a logistic regression, like in my NBA playoff predictor.  This has two benefits.  First, we can use statistical tests to compare various models of winning that we might develop.  Second, a logistic regression will pop out win probabilities instead of a simple ‘yes’ or ‘no’, and so we can create ROCs to compare models.
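To make the idea concrete, here is a minimal sketch of that kind of logistic regression. The feature layout and toy numbers are my own assumptions (the post doesn't show its actual data): each game is a row of [home team's average points scored, away team's average points scored], with 1 meaning the home team won. Rather than lean on a stats package, it fits the model by plain gradient descent on the log loss.

```python
import numpy as np

def _sigmoid(z):
    # Clip the logit so np.exp never overflows on extreme weights.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def fit_logistic(X, y, lr=0.01, steps=5000):
    """Fit P(home win) = sigmoid(X @ w + b) by gradient descent on log loss."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = _sigmoid(X @ w + b)          # predicted home-win probabilities
        grad_w = X.T @ (p - y) / len(y)  # gradient of mean log loss w.r.t. w
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def win_probability(x, w, b):
    """Home-win probability for one game's features."""
    return _sigmoid(np.dot(x, w) + b)

# Toy data (made up): [home avg points scored, away avg points scored].
X = np.array([[27, 17], [24, 21], [14, 28], [20, 20], [30, 13], [17, 27]])
y = np.array([1, 1, 0, 1, 1, 0])  # 1 = home team won
w, b = fit_logistic(X, y)
```

The payoff is the last function: instead of a yes/no pick, `win_probability` returns a number between 0 and 1 that we can later threshold at different criteria.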

To know what an ROC is, we have to know a few things.  First, there are four categories that a response could fall into: hit, miss, false alarm, and correct rejection.  A hit is correctly saying that something is there when it actually is.  Since we’re oriented towards the home team, a hit in this case is picking the home team to win when they actually do win.  A miss is the complement: thinking the home team will not win when they do.  A false alarm is claiming that something is there when it isn’t; in this case it is saying the home team will win when the away team wins.  Finally, a correct rejection is when you think the away team will win and they do.

Second, we can turn these counts into a hit rate and a false alarm rate.  The hit rate is the number of hits divided by the number of times there could have been a hit, which is hits + misses (however many times we picked the home team and they won, out of however many times the home team actually won).  The false alarm rate is the same thing but for false alarms (the times we picked the home team but they lost, out of the number of times the home team lost).  Chance, or guessing, is indicated when the hit rate and false alarm rate are equal.  Performing above chance is indicated by a higher hit rate than false alarm rate; the false alarm rate should never be above the hit rate unless the guesser is purposefully getting things wrong, or you don’t have enough trials for a good estimate.

Finally, to make the ROC curve, we need hit and false alarm rates at a variety of criteria.  In the psychology work I do, the criterion is usually confidence.  High-confidence responses have very low false alarm rates and higher, but still relatively low, hit rates.  Confidence tracks accuracy for the most part, so high-confidence responses are usually correct (hence the low false alarm rate), but there aren’t a lot of hits either, because the subject doesn’t want to make false alarms.
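The bookkeeping above can be sketched in a few lines. This assumes picks and outcomes are simple booleans (True = home team picked, True = home team won); the function name and toy data are mine, not from the post.

```python
def hit_and_fa_rates(picked_home, home_won):
    """Count hits/misses/false alarms/correct rejections and return the rates."""
    hits = sum(p and w for p, w in zip(picked_home, home_won))
    misses = sum((not p) and w for p, w in zip(picked_home, home_won))
    false_alarms = sum(p and (not w) for p, w in zip(picked_home, home_won))
    correct_rejections = sum((not p) and (not w) for p, w in zip(picked_home, home_won))
    hit_rate = hits / (hits + misses)                             # out of actual home wins
    fa_rate = false_alarms / (false_alarms + correct_rejections)  # out of actual home losses
    return hit_rate, fa_rate

# Six toy games: we picked the home team four times, the home team won four times.
picks = [True, True, False, True, False, True]
wins  = [True, False, False, True, True, True]
hr, far = hit_and_fa_rates(picks, wins)  # → hit rate 0.75, false alarm rate 0.5
```

Here there are 3 hits, 1 miss, 1 false alarm, and 1 correct rejection, so the hit rate is 3/4 and the false alarm rate is 1/2: above chance, since the hit rate exceeds the false alarm rate.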

Coming back to football, the IT postulate doesn’t allow for confidence ratings; it simply says either the home team will win or the away team will.  But once we move to the probabilities from the logistic regression model, we can set criteria.  For example, we could set a criterion that we will pick the home team to win so long as the probability of that happening is greater than 1%.  This would result in picking the home team almost all the time, and so there will be lots of hits and lots of false alarms.  Or we could set the criterion at 50%; presumably we would then have roughly equal numbers of home and away picks, and so middling hit and false alarm rates.  If we do this for a lot of criteria, we can plot all the hit and false alarm rates and get an accuracy curve.  As long as the curve is above the straight line where hits equal false alarms, accuracy is above chance.
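That criterion sweep can be sketched directly: at each threshold, pick the home team whenever the model's probability clears it, then compute the hit and false alarm rates. The probabilities here are made up for illustration.

```python
def roc_points(probs, home_won, criteria):
    """Trace (false alarm rate, hit rate) pairs over a list of criteria."""
    n_wins = sum(home_won)
    n_losses = len(home_won) - n_wins
    points = []
    for c in criteria:
        picks = [p > c for p in probs]  # pick home team if P(home win) > criterion
        hits = sum(pk and w for pk, w in zip(picks, home_won))
        fas = sum(pk and (not w) for pk, w in zip(picks, home_won))
        points.append((fas / n_losses, hits / n_wins))
    return points  # ready to plot as an ROC curve

# Made-up model probabilities and outcomes for eight games.
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
wins  = [True, True, True, False, True, False, False, False]
curve = roc_points(probs, wins, criteria=[0.05, 0.5, 0.95])
# → [(1.0, 1.0), (0.25, 0.75), (0.0, 0.0)]
```

The 0.05 criterion picks the home team nearly every time (top-right of the curve), the 0.95 criterion almost never (bottom-left), and intermediate criteria trace out the points in between.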

So what happens if we use a logistic regression with the average points scored by the home and away teams?  We do ever so slightly better than the IT postulate.  As I said, the IT postulate doesn’t allow criteria to be set, but we can still compute its hit and false alarm rates and plot that single point, and the curve for the points-scored version goes a tiny bit above that point.  Can we do any better?  Yes: if we also add the average points given up by both the home and away teams, we find a higher ROC curve.  You can see this below; the dotted line is the point differential (points +) accuracy, the solid line is the IT points accuracy, and the triangle is the IT postulate accuracy.
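One common way to summarize "this curve is higher than that one" as a single number is the area under the ROC curve: 0.5 for chance, 1.0 for perfect prediction. The post doesn't compute this, but a trapezoid-rule sketch over (false alarm rate, hit rate) points looks like this:

```python
def auc(points):
    """Area under an ROC curve given (false alarm rate, hit rate) points."""
    pts = sorted(points)  # order along the false-alarm-rate axis
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid between adjacent points
    return area

# Chance lies on the diagonal; a better model bows above it.
chance = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
better = [(0.0, 0.0), (0.2, 0.7), (1.0, 1.0)]
# auc(chance) → 0.5, auc(better) → 0.75
```

A higher curve yields a larger area, which gives a criterion-free way to compare the points-scored model against the points-for-and-against model.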

Coming back to the main point, how can we best pick winners?  TMQ makes the point that many people are basically at chance, or guessing; you can do better than chance just by always picking the home team.  You can do even better than that if you know the teams’ records.  And the plot above shows that you can do better still if you use both teams’ point differentials.  As TMQ likes to say, any of these methods would be super-easy; you don’t need to know who’s playing or have any insider information, you just need to know the teams’ records (or point differentials, usually listed alongside their records).  But can we do better?  The 63% or so accuracy of using point differentials probably isn’t good enough to win you money in Vegas.  My next football posts, carrying on through the season, will hopefully convince you that I can do better.