I had already planned on starting off with a depressing sentence: “the models usually do well with underdogs, so things probably weren’t going to go well given that Luigi picked 11 of the 14 games correctly”. Then I was going to go with “The casinos took a beating this week, and so did the models. Let’s sort through the wreckage.” I think that’s relevant, but I didn’t want to suggest any connection between the model picks and the casinos; the point is just that it was an unusual week. Now that it’s established that the numbers are going to be ugly, let’s get on with it.
In an overwhelming display of irony, the over/under picks finished above .500 this week. Luigi was 8-6, Yoshi 1 was 10-4, and Yoshi 2 was 9-5. That gives them season records of 46-65-5, 55-56-5, and 50-61-5. Hey, Yoshi 1 is nearly at chance!
As mentioned, Luigi was 11-3 at picking the outright winner. Yoshi 1 was 8-6 and Yoshi 2 was 9-5. That puts the season records at 69-47, 67-49, and 67-49. As promised, when the ‘favored’ (at least by the models) team wins, the moneyline picks do poorly. Luigi was 5-9, which isn’t terrible except that three of the winners were favorites, and winning moneyline bets on favorites pay very little. Luigi’s season record is 32-73. Yoshi 1 was 4-9 for a season record of 38-70. Yoshi 2 was 3-9 for a season record of 36-70.
The spread also went poorly. All three models went 5-9. That puts their season records at 47-64-5, 58-53-5, and 50-61-5. Ignoring the ‘too close to call’ games, Luigi was 5-8, Yoshi 1 3-9, and Yoshi 2 4-9. The season records are 41-46-5, 45-42-5, and 40-45-5. The consensus picks were only 1-6 for a season record of 28-27-2.
Given that kind of performance, you wouldn’t expect much for the Hilton SuperContest picks. You would be roughly correct. Luigi went with Tampa +1.5 (win), Arizona +11 (loss), Giants -3 (loss), Atlanta -4 (win), and Chicago -3.5 (win) for only a 3-2 week. Luigi’s season record on the top-five picks is 19-19-2. Yoshi 1 went with Kansas City +7.5 (loss), Arizona +11, Buffalo +10.5 (loss), Tampa +1.5, and Minnesota +4.5 (loss) for a 1-4 record and a season mark of 17-22-1. Yoshi 2 went with Arizona +11, Tampa +1.5, Giants -3, Bengals +3.5 (loss), and Atlanta -4 for a 2-3 record and overall mark of 18-20-2. As poorly as the models are doing, there are people in the actual contest doing worse! Hooray for enjoying the misfortune of others.
Finally we have Bill Simmons, who is starting to run away with our one-sided (in that he doesn’t know it exists) competition. Against his lines, Bill was 7-7 while each of the models was 5-9. That puts the season records at 63-50-3, 49-64-3, 55-58-3, and 48-65-3.
Someone asked in the comments a couple of weeks ago why the picks were doing so poorly. The short answer is that I don’t know; it could be random noise, or it could be something weird about this season. You could try throwing out the first three weeks, when the replacement officials were maybe making things funny. It would certainly help the outright winner numbers; Luigi was 12-20 (37.5%) by the end of that Packers-Seahawks game and has gone 57-27 (67.9%) since. However, those games still happened and they get factored into team performance, which means they’re still influencing the predictions to some extent. Since everyone has played at least eight games now, those first three weeks make up less than half the data, but that’s still a decent amount. If those games were truly odd, they will be swamped by other games later in the season, but it’s still hard to say how odd they were.
It’s also possible that the game has simply changed enough that my models are a bit out of touch. Brian Burke has noted that passing is not only at an all-time high, but is also increasing at an increasing rate; we’re seeing a growth curve instead of a growth line. My models are based on data that go back to 2004, which means that a fair amount of their information comes from games where passing wasn’t quite as prominent as it is today (Brian has graphs for a number of stats that have been changing over time). Perhaps this is the year where the new trends have gotten far enough out of whack with previous years that the models simply aren’t accounting for the stats properly. One way I could account for this would be to simply dump old seasons. However, that means less data, and less data almost always means worse predictions. Instead, I could weight more recent seasons more heavily, so the model ‘uses’ more of 2009 than 2005, for example.
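For what it’s worth, here’s a rough sketch of what I mean by that. The exponential-decay form and the three-year half-life are just illustrative choices for this example, not anything the models actually use:

```python
def season_weights(seasons, current_season, half_life=3.0):
    # Exponential decay: a season that is `half_life` years old counts half
    # as much as the current one. The half-life value is an arbitrary choice.
    return {s: 0.5 ** ((current_season - s) / half_life) for s in seasons}

weights = season_weights(range(2004, 2013), current_season=2012)
# 2012 gets weight 1.0, 2009 gets 0.5, and 2004 gets about 0.16,
# so the model 'uses' more of 2009 than it does of 2005.
```

Weights like these could then go into whatever fitting routine the model uses (most regression tools accept per-observation weights), so the 2004 games would still contribute, just less.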
But then we have to ask, what if a recent season was odd? Maybe the replacement refs have really contaminated this year. When I start making predictions next season, these games are going to be the most important ones in the data set. Won’t that make next year worse? Or what if there’s simply an unusual spike in some stat? If you look at that article I linked to, net yards per attempt had a big spike in 2004 and then wasn’t that high again until about 2010. Had I been using data from 1995 to 2004 and started predicting 2005, it would have emphasized games with passing outcomes that wouldn’t occur again for five years.
This is all to say that it’s hard to decide when a model has stopped performing well, because it’s hard to tell why a model has stopped performing well.

It’s important to note that Luigi has done well since it was created two years ago. It had over 45 correct picks in SuperContest-like conditions each of the past two seasons; add in some pushes and that’s profitable. Picking every game against Bill Simmons’ lines, Luigi has 130 wins or better in those years, which is also high enough to be profitable. Let’s say, for the sake of argument, that Luigi is objectively a 55% accurate picker in favorable circumstances, like against Bill’s lines or when looking at its most certain games. So far we haven’t seen statistical evidence to the contrary. Luigi’s 19-19 record in the SuperContest generates a 95% confidence interval of about 35% to 65%; we can be 95% sure that its true accuracy is somewhere in that range. That’s a big range! Even the larger sample from Bill’s games, 49-64, generates a range from 34% to 53%, which is lower than I’d like to see but still includes numbers over 50% when picking every NFL game. That would suggest that nothing is wrong so far, and we’ve just seen bad luck.

Alternatively, if I compare the proportion so far this year to the proportion for the past two seasons combined (on Bill’s picks), it is significantly worse this year. If that keeps up I’ll think more carefully about making some changes in the offseason. But so far I think it’s too early to make a change or claim that anything bizarre is happening.
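If you want to reproduce those intervals yourself, a simple normal-approximation confidence interval for a binomial proportion gets you essentially the same ranges (pushes are left out of the sample here, and other interval methods will give slightly different endpoints):

```python
import math

def win_pct_ci(wins, losses, z=1.96):
    # Normal-approximation 95% confidence interval for a true win rate.
    # Ties/pushes are excluded from the sample size.
    n = wins + losses
    p = wins / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

print(win_pct_ci(19, 19))   # SuperContest record: roughly (0.34, 0.66)
print(win_pct_ci(49, 64))   # against Bill's lines: roughly (0.34, 0.53)
```

Note how much the extra ~75 games against Bill’s lines tighten the interval; small samples are why a 19-19 record tells you almost nothing.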
Instead, I think the explanation is that predictions are hard and sometimes they go badly. If you’d like to check it out for yourself, I’d recommend taking a look at an NFL prediction tracker. Sort by the spread; you’ll see that only about 10 sites are over the 53% you generally need to be profitable, and well over half of the predictions are below chance (50%) this year. Then look at previous seasons. You’ll see that at most 10-15 systems are profitable, and they fluctuate a lot from year to year. In fact, the third-best system this year (the first two don’t appear in previous seasons) was below chance in the past three years. Take a look at the various Sagarin predictions; they’re usually close to chance and sometimes they bump up to the top of the pack. Maybe in the long term Luigi will end up being a chance picker and I happened to start with two good seasons; maybe it’s above chance and this is a bad year. The only way to find out is with time and more data.
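As a footnote, the 53% figure comes from standard -110 spread pricing, where you risk 110 units to win 100; the bare break-even rate is a one-liner, and 53% is just that number with a little cushion:

```python
# At -110 odds you risk 110 units to win 100, so break-even is where
# expected profit 100*p - 110*(1 - p) equals zero, i.e. p = 110/210.
risk, to_win = 110, 100
breakeven = risk / (risk + to_win)
print(round(breakeven, 4))  # 0.5238
```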