Predicting the Playoffs: NBA Edition, Part One

A few years ago, Henry Abbott at the TrueHoop blog started what he calls the Stat Geek Smackdown, where people known as stats guys in the NBA community pick the outcome of the various match-ups throughout the playoffs. I thought this was kind of neat, so I wanted to see if I could come up with a model for making the picks. Following on things that Henry mentioned, and that participant (and 2009 champ) Dave Berri said on his website, I decided to use point differential and home court advantage as my predictors. Point differential is the difference between points scored and points given up per game, and can be found pretty easily, such as on ESPN’s standings page. At first I didn’t think that home court advantage would matter much in the playoffs (being skeptical of the general wisdom and all), but you never know until you look. So I put together a spreadsheet with playoff series info going back to 2003, when the Spurs beat the Nets, up through the season that just ended. An entry looks like this:

games won   point difference   opp. difference   difference   home court   team     winner
4           7.6                2.7               4.9          1            Lakers   1

That describes the Lakers’ first round series against Utah in 2009. The Lakers won 4 games (i.e., won the series); they had a point differential of 7.6 through the regular season while their opponents (Utah) had a 2.7, for a difference of 4.9. The Lakers had home court advantage, and won the series. The first thing I did was run a logistic regression predicting winning a series from the point difference. It looks like this:
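To make the fitting step concrete, here is a minimal sketch of a one-predictor logistic regression in plain Python. The data points are made up purely so the example runs; they are not rows from the spreadsheet, and the function names (`win_probability`, `fit_logistic`) and the simple gradient-ascent fitting method are illustrative choices rather than the tool used for the actual analysis.

```python
import math

# Made-up (point difference, won series) pairs, invented only so the sketch runs.
# They are NOT rows from the spreadsheet described above.
toy_series = [
    (-6.0, 0), (-4.0, 0), (-3.0, 0), (-2.0, 1), (-1.0, 0), (-0.5, 0),
    (0.5, 1), (1.0, 1), (2.0, 0), (3.0, 1), (4.0, 1), (6.0, 1),
]

def win_probability(diff, intercept, slope):
    """Logistic curve: chance of winning the series given the point difference."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * diff)))

def fit_logistic(data, learning_rate=0.1, steps=5000):
    """Fit intercept and slope by gradient ascent on the average log-likelihood."""
    intercept, slope = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_intercept = 0.0
        grad_slope = 0.0
        for diff, won in data:
            error = won - win_probability(diff, intercept, slope)
            grad_intercept += error
            grad_slope += error * diff
        intercept += learning_rate * grad_intercept / n
        slope += learning_rate * grad_slope / n
    return intercept, slope

intercept, slope = fit_logistic(toy_series)
for diff in (-5, 0, 5):
    print(diff, round(win_probability(diff, intercept, slope), 2))
```

A positive slope means a bigger regular-season margin over your opponent goes with a higher chance of winning the series. Adding home court would mean one more coefficient and one more gradient term.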

This graph shows the typical probability curve that I described in my earlier post; probability increases slowly at the bottom, speeds up to almost linear through the middle, then slows down again as you get closer to 1 (or 100%).  Obviously there is a relationship between point difference and winning; the more points you typically won by in the regular season compared to your opponent, the more likely you are to win the series against that opponent.  That graph ignores home court advantage; if we take that into account, we get this graph:

The top curve is for the team with home court, the bottom is for the team without. This looks like a pretty big difference, but the effect of home court is only at trend significance (p = .07) in the model. What that means is that even though the home team should do better numerically, there’s too much noise to be sure of that (at least at the typical significance level). And while I’ve seen some other blogs that don’t seem to fully grasp the importance of the p value and pooh-pooh it, this might be a situation where the p value isn’t the final word. After all, we’re interested in the best description of winning a playoff series, so instead maybe we should look at categorization accuracy. What we can do is use the model to predict a winner; for example, if the home team has a 50% or better chance according to the model, we would pick them. If we pick the home team and they win, that is called a hit; if we pick them and they lose, it’s a false alarm. If we pick the away team and the home team wins anyway, it’s a miss; if the away team does win, it’s a correct rejection. The hit rate is the percentage of series won by the home team in which we picked the home team, and the false alarm rate is the percentage of series lost by the home team in which we picked the home team. If the model is better than guessing, the hit rate should be above the false alarm rate. We can get these two numbers for a lot of criterion values (this time it was 50%; we can calculate the hit rate and false alarm rate at each cut-off from 0 to 100%) and plot what is called an ROC curve, which shows the corresponding hit and false alarm rates at all of these criteria. The ROCs for the point difference-only and point difference plus home court models are below.
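The cut-off sweep can be sketched in a few lines of Python. This uses the standard signal-detection definitions, treating "the home team wins" as the signal: the hit rate is hits / (hits + misses) and the false alarm rate is false alarms / (false alarms + correct rejections). The function name `roc_points` and the six toy probabilities and outcomes are hypothetical, invented only for illustration.

```python
def roc_points(home_win_probs, home_won, cutoffs):
    """Return one (false_alarm_rate, hit_rate) pair per cut-off.

    home_win_probs: model probability that the home team wins each series
    home_won:       True if the home team actually won that series
    """
    points = []
    for cutoff in cutoffs:
        hits = misses = false_alarms = correct_rejections = 0
        for prob, won in zip(home_win_probs, home_won):
            picked_home = prob >= cutoff
            if picked_home and won:
                hits += 1
            elif picked_home and not won:
                false_alarms += 1
            elif won:
                misses += 1
            else:
                correct_rejections += 1
        hit_rate = hits / (hits + misses)
        false_alarm_rate = false_alarms / (false_alarms + correct_rejections)
        points.append((false_alarm_rate, hit_rate))
    return points

# Toy inputs, made up for illustration only (not model output from the post).
toy_probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_home_won = [True, True, False, True, False, False]

for cutoff, point in zip([0.0, 0.5, 1.0], roc_points(toy_probs, toy_home_won, [0.0, 0.5, 1.0])):
    print(cutoff, point)
```

Plotting hit rate against false alarm rate for many cut-offs traces out the ROC curve; points above the diagonal mean the model beats guessing. The sketch assumes the data contain at least one series won and one lost by the home team, otherwise the rates are undefined.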

The models are about the same most of the time, but the home court model (solid line) does pop out at some places, indicating that it has a better categorization accuracy (the diagonal line indicates chance, so both models are definitely better than guessing).  So between the ROC and the trend significance value, I feel pretty good about including home court.

So what’s the upshot here? If you didn’t think that home court matters, you would just take the team with the better differential. If you do think home court matters, then the model says that the home team can be almost a point worse (point difference = -1) and be at 50/50. Conversely, the away team can be almost a point better and only be a coin flip. So the benefit of home court advantage, gained by winning more games than your opponent, is that you can actually be a worse team and have a higher chance of winning (although you can’t be too much worse). Many people would question why a team with fewer wins would be the better team, but it’s pretty well established that point differential is a better indicator of team quality than wins.
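The break-even claim can be sanity-checked from numbers quoted elsewhere on this page. A minimal sketch, assuming a symmetric form of the model, logit(p) = slope × (point difference) ± home effect, with + for the team with home court and - for the team without. That coding is an assumption chosen for convenience, not necessarily how the original regression was set up. The two inputs are the rounded percentages given in the comments below (58.9% for an even team with home court, 32.2% for a team one point worse without it), so the recovered coefficients are only approximate.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Probability from log-odds."""
    return 1.0 / (1.0 + math.exp(-z))

# Rounded figures quoted in the comments below.
p_even_with_home = 0.589       # even teams, team with home court
p_point_worse_away = 0.322     # one point worse, without home court

# Assumed symmetric model: logit(p) = slope * diff + home_effect * side,
# where side is +1 with home court and -1 without.
home_effect = logit(p_even_with_home)
slope = -logit(p_point_worse_away) - home_effect

# Point difference at which the team with home court is a coin flip.
break_even_diff = -home_effect / slope

# The same one-point-worse team, but with home court instead.
p_point_worse_home = inv_logit(-slope + home_effect)

print("home effect (log-odds):", round(home_effect, 3))
print("slope per point (log-odds):", round(slope, 3))
print("break-even point difference:", round(break_even_diff, 2))
print("one point worse, with home court:", round(p_point_worse_home, 3))
```

With these inputs the break-even difference comes out a little under one point against the home team, and a team one point worse but with home court lands just under 50%, in line with the "almost a point" and "about even" figures.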

2 Responses to Predicting the Playoffs: NBA Edition, Part One

  1. This is cool stuff. I’m doing some work around home court as well. If I use the regular season home court numbers (.606 for 1998-99 through 2007-08) and run two even teams through a probability model for a seven-game series, I get the home team winning only 53%, but it’s not quite linear. So for example, the Celts in 2010 had a 55% chance of winning the Finals as the away team, but that would have gone to 62% as the home team.
    I like the idea of using this kind of modeling to do hedge betting over a large sample size. You should be able to turn a profit.

    • Alex says:

      Thanks Arturo. Do you mean that the home team wins 60.6% of the time in the regular season? That’s a bigger advantage than I would have guessed. My model says the home team should win 58.9% of the time if even with their opponent, and it’s symmetric: the away team should win 41.1% of the time. I know you’re pretty high on the Celtics, but just using their regular season differential in my model, they were a point worse than the Lakers so I only gave them a 32.2% chance in the Finals. They would have been about even if they had home court though.
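Arturo’s 53% figure for two even teams can be reproduced with a short calculation. A minimal sketch, assuming games are independent, the home side wins each individual game with probability .606, and the team with home court hosts four of the seven games. The function name and the default `home_pattern` (a 2-2-1-1-1 schedule) are illustrative assumptions; under independence the order of home and away games does not change the result.

```python
from itertools import product

def series_win_probability(p_home_game,
                           home_pattern=(True, True, False, False, True, False, True)):
    """Chance that the team with home court wins a best-of-seven between even teams.

    p_home_game:  probability the home side wins any single game
    home_pattern: True where the team with home court hosts the game
    """
    total = 0.0
    # Enumerate every win/loss sequence over seven games; playing out games
    # after a series is decided does not change who reaches four wins first.
    for outcomes in product((True, False), repeat=len(home_pattern)):
        prob = 1.0
        for at_home, won in zip(home_pattern, outcomes):
            p_win = p_home_game if at_home else 1.0 - p_home_game
            prob *= p_win if won else 1.0 - p_win
        if sum(outcomes) >= 4:
            total += prob
    return total

print(round(series_win_probability(0.606), 3))
```

Under these assumptions the team with home court wins the series roughly 53% of the time, well below the 60.6% single-game edge, because it only gets one more home game than its opponent.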
