In the last post, I laid out a logistic regression model for picking the winner of an NBA playoff series. But the motivation for doing that came from TrueHoop’s Stat Geek Smackdown, where you’re supposed to pick both the winner and the number of games. The logistic regression does the first but not the second. To pick the number of games, I switched to a multinomial model. A multinomial model is basically the same thing as a logistic regression, except that it fits several discrete outcomes (in this case, winning 0, 1, 2, 3, or 4 games) instead of a binary one (winning or losing the series). We already know that we like the home team to win as long as it is at most one point worse on point differential, so let’s look at the away team.
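To make the “same thing as a logistic regression” point concrete, here is a minimal sketch of how a multinomial logit turns a single predictor (the point differential) into a probability for each number of games won. It assumes the usual softmax form with “0 games won” as the reference category. The coefficients in `COEFS` and the function name `outcome_probabilities` are hypothetical placeholders for illustration, not the estimates behind the graph in this post.

```python
import math

# Hypothetical (intercept, slope) pairs for each outcome "wins k games", k = 0..4.
# These are placeholder values for illustration, NOT fitted estimates.
# Outcome 0 is the reference category, so its coefficients are fixed at zero.
COEFS = {
    0: (0.0, 0.0),
    1: (-0.2, 0.15),
    2: (-0.8, 0.30),
    3: (-0.6, 0.25),
    4: (-1.5, 0.40),
}


def outcome_probabilities(point_diff, coefs=COEFS):
    """Multinomial-logit (softmax) probability of each number of games won.

    point_diff: the team's point differential minus its opponent's,
    so negative values mean the team is worse.
    """
    # Linear score for each outcome, exactly as in a logistic regression.
    scores = {k: a + b * point_diff for k, (a, b) in coefs.items()}
    # Subtract the largest score before exponentiating for numerical stability.
    m = max(scores.values())
    exps = {k: math.exp(s - m) for k, s in scores.items()}
    total = sum(exps.values())
    # Normalize so the probabilities over all outcomes sum to 1.
    return {k: e / total for k, e in exps.items()}


probs = outcome_probabilities(-3.0)
for k, p in probs.items():
    print(f"wins {k} games: {p:.3f}")
```

With only two outcomes the same function collapses to an ordinary logistic regression: the probability of the non-reference outcome is 1 / (1 + exp(-(a + b·x))).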
This looks complicated, so let’s walk through it. Starting at the left, the top curve (which starts at .5 and drops) is the probability of winning 0 games. So if you are ten points worse than your opponent, you can expect to be swept half the time. As you get relatively better, that probability drops. Going down from there, the next curves are for winning 1 game (triangles), then 3 games (x’s), 2 games (pluses), and all four games (diamonds). To make a pick, I simply take the most likely outcome at each point differential. For example, the away team is most likely to be swept (the 0-games-won curve is the highest one) in the range from about -5 to -10.

That complex graph then reduces to the following rules: pick the home team 4-0 if the away team is 5 or more points worse; pick the home team 4-1 if the away team is 3.5 to 5 points worse; and pick the home team 4-2 if the away team is anywhere from 3.5 points worse to 1 point better. If the away team is more than a point better, the home team curves show that 4-2 in the away team’s favor is the pick until the home team is about 6 points worse. Practically speaking, the home team is never that bad relative to the away team (the worst in my sample is under 2.5 points), so we always pick the away team 4-2 when it is favored.
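The decision procedure can be sketched in a few lines: take the outcome with the highest predicted probability, which for this model boils down to the threshold rules listed in the text. The function names `most_likely_outcome` and `pick_series` are hypothetical, and how the exact boundary values (-5, -3.5, and +1) are assigned is an assumption, since the rules as stated overlap at those points.

```python
def most_likely_outcome(probs):
    """Pick the outcome (number of games won) with the highest probability."""
    return max(probs, key=probs.get)


def pick_series(away_minus_home):
    """Return (predicted series winner, winner's games, loser's games).

    away_minus_home: away team's point differential minus the home team's,
    so negative values mean the away team is worse.
    Assumption: ties at exactly -5, -3.5, and +1 go to the first rule that matches.
    """
    if away_minus_home <= -5:
        return ("home", 4, 0)  # away team 5 or more points worse: sweep
    if away_minus_home <= -3.5:
        return ("home", 4, 1)  # 3.5 to 5 points worse
    if away_minus_home <= 1:
        return ("home", 4, 2)  # 3.5 points worse to 1 point better
    # Away team favored; no home team in the sample is bad enough
    # to move the pick off 4-2.
    return ("away", 4, 2)


for diff in (-7, -4, 0, 2.5):
    print(diff, pick_series(diff))
```

Hard-coding the thresholds trades away the full probability curves for a rule that is easy to apply by hand at playoff time.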
Using these rules, how would I have done in the Stat Geek Smackdown? It awards 5 points for correctly picking the winner of a series and 2 bonus points for picking the right number of games. This is a situation where having the most accurate model, as opposed to merely a statistically significant one, matters: a small ‘non-significant’ omission like leaving out home court might shift a predicted probability by only a few percentage points, yet if that shift flips a series pick, it changes the Smackdown score by 5 or 7 points. The model with home court does very well. In 2007 I would have gotten 61 points, only good for 4th but just 4 points behind the winner. In 2008 I would have gotten 80 points, winning by 7. 2009 was a bad year; I would have gotten 59 points, well behind Dave Berri’s 75. This past year I would have gotten 77 points, winning by 6. So over four years I would have a total of 277 points, putting me ahead of everyone who has competed in all four years (a group that includes Henry’s mom, Kevin Pelton, Jeff Ma, and John Hollinger). Except for Henry’s mom, the other guys are hanging out in the 250s.
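Here is a small sketch of the Smackdown scoring described above, showing why a single flipped pick costs 5 or 7 points. The function name `smackdown_points` is hypothetical, and it assumes the 2-point games bonus only counts when the winner is also correct; the season totals are the four figures quoted above.

```python
def smackdown_points(pred_winner, pred_games, true_winner, true_games):
    """Score one series: 5 for the right winner, plus 2 if the length is also right.

    Assumption: the 2-point bonus only applies when the winner is correct.
    """
    if pred_winner != true_winner:
        return 0
    return 5 + (2 if pred_games == true_games else 0)


print(smackdown_points("home", 6, "home", 7))  # 5: right winner, wrong length
print(smackdown_points("home", 6, "home", 6))  # 7: both right
print(smackdown_points("home", 6, "away", 6))  # 0: wrong winner, no bonus

# Totals quoted in the text for 2007, 2008, 2009, and "this past year".
season_totals = [61, 80, 59, 77]
print(sum(season_totals))  # 277
```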
So the multinomial model is fun for picking the number of games, but the logistic model is probably more useful, since winning the series ultimately matters more than how many games it takes. In the future I’ll look at some ways we can use the logistic model to talk about other issues, such as LeBron and Cleveland underachieving in the playoffs and the luck involved in winning the championship.