## NBA Retrodiction Contest Part 1: What Happened?

So with my happy new database in hand, I've been sitting on an update to my project of predicting past outcomes to compare different NBA productivity metrics. No longer. Here goes.

First, you should go look at that database link (and the other posts it links to) to see all the numbers involved and where they come from.  Then you should go take a look at the explanation for how I did the predictions before.  The method here is the same, except I'm only going to look at team point differential (not wins), and any player with fewer than 100 minutes played the previous season is granted his actual production for the season being predicted.  This avoids any issues with rookies.  It also makes the predictions more accurate overall, but that shouldn't give any particular metric an advantage over the others.

As a short description in case you didn't want to go through the links: each player has a predicted productivity level on each metric.  That prediction is simply his productivity the previous season according to that metric, with the exception named above.  This productivity is always a per-minute or per-possession rate.  That rate is then multiplied by the minutes/possessions he actually played in the year being predicted.  This is done to remove any influence of injuries or of having to forecast how many minutes a player will get; we take minutes as a given.  Once every player has a predicted wins/points produced number, those numbers are summed for each team and compared to the point differential the team actually posted that year.  For metrics denominated in wins, such as Wins Produced and Win Shares, the predicted number of wins is converted to point differential by subtracting 41 and dividing by 2.54.
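The procedure above can be sketched in a few lines of Python. This is a hypothetical illustration of the method as described, not the author's actual code; the field names (`last_rate`, `current_minutes`, etc.) are my own inventions.

```python
def predict_team_differential(players, wins_metric=True):
    """Retrodict a team's point differential from last season's rates.

    players: list of dicts, one per player, with (hypothetical) keys:
      last_rate            -- per-minute production last season
      current_rate         -- per-minute production this season
      last_season_minutes  -- minutes played last season
      current_minutes      -- minutes actually played this season
    """
    total = 0.0
    for p in players:
        # Players under 100 minutes last season are granted their
        # actual current-season production (the rookie exception).
        if p["last_season_minutes"] < 100:
            rate = p["current_rate"]
        else:
            rate = p["last_rate"]
        # Multiply the rate by minutes actually played this year,
        # so minutes/injuries are taken as a given.
        total += rate * p["current_minutes"]
    if wins_metric:
        # Wins-based metrics (Wins Produced, Win Shares) are converted
        # to point differential: (wins - 41) / 2.54.
        total = (total - 41) / 2.54
    return total
```

The per-possession case works the same way, with possessions in place of minutes.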

One thing I noticed when I went through this exercise previously is that the metrics that did well at predicting (namely RAPM with a prior assuming that all players should be average) did worse at actually explaining what happened.  You could get Wins Produced to predict better by building in some regression to the mean, but that of course means a worse explanation of the current year.  So to start off, I wanted to see how well each metric explained what actually happened.  To do this, all you do is take productivity for the year, add it up, and compare it to what happened as above; there's no prediction component.  Here are the results.

Anything with a NaN means that there are no numbers for that metric in that year; looking ahead, it also means that there can be no prediction for that metric in the following year.  Two things are quickly evident.  First, many metrics do a good job of explaining what happened.  This isn't a big deal per se.  One of the strengths that Wins Produced has always claimed is that it fits outcomes very well, and one of the criticisms often lobbed at WP is that this is barely an accomplishment; it is extremely easy to make sure player ratings add up to team outcomes.  This leads me to the second evident point: not every metric can clear that low bar.  While both flavors of WP, Win Shares, ezPM, and ASPM all stay within half a point of average team point differential, PER, APM, and both flavors of RAPM do not do as well.  PER and old RAPM aren't even within a point, which is a pretty poor showing.
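The "explanation" check is just a mean absolute error across teams. A minimal sketch, with made-up numbers and assumed variable names:

```python
def mean_abs_error(team_sums, actual_diffs):
    """Average absolute gap, across teams, between a metric's summed
    player values (already in point-differential units) and each
    team's actual point differential."""
    errors = [abs(pred - actual)
              for pred, actual in zip(team_sums, actual_diffs)]
    return sum(errors) / len(errors)

# Toy example: two teams, summed current-season player values vs. reality.
mean_abs_error([2.0, -3.0], [2.5, -2.0])  # -> 0.75
```

A metric that "adds up" to team outcomes will score near zero here; the numbers in the post are this error averaged over the league for each season.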

In general, old WP (with full credit for defensive rebounding) does the best job of telling you what happened in a season, followed by ASPM in a virtual tie with new WP, then ezPM, Win Shares, new RAPM, APM, PER, and old RAPM.  Things are close at the top, though; you're talking about an average error of just under .2 for old WP versus just over .2 for ASPM and new WP.

So this wasn’t especially exciting perhaps, but hopefully everyone can figure out what I did at this point.  It also serves as a strike against old RAPM and PER, and to a lesser extent new RAPM and APM.

Here is part 2 (coming tomorrow).


### 19 Responses to NBA Retrodiction Contest Part 1: What Happened?

1. Most of those should tell the exact same thing; many of those sum directly to point differential. (Is that the same thing as "telling what happened"? I would dispute that terminology…)

• Alex says:

That’s right, they should. But apparently they don’t. It isn’t surprising for RAPM; it is by definition biased away from the estimates that must sum to point differential. PER was never designed to add up to anything, to my knowledge. The rest are all within pretty close approximation.

They all at least tell some version of what happened. Point taken that most sum up properly but tell different stories though.

2. How could APM not sum directly to point differential? That doesn’t make sense. Its very definition is to sum to point differential, is it not?

• Alex says:

It’s likely the replacement-level players. They all get a rating of -3.8. They don’t get a lot of minutes, but there are enough of them across the league to add up. But you’re right, it should add up if everyone had an actual estimate.

• Ah yes — low-minute players were all lumped together so as not to overly skew the few minutes they played (they would automatically drive the residuals of their minutes to 0, thus preventing any of the other players from being informed by those minutes).

3. Guy says:

How can there be a difference between old and new WP? I thought the change simply redistributed value from big defensive rebounders to their teammates. Is that wrong?

• Alex says:

That should be correct. I think it’s because the WoW website has productivity summed across teams if a player is traded, whereas my old WP numbers have it individually for each team. For example, last year Gerald Wallace is a .177 for Portland and Charlotte. But if he was more like a .15 for Portland and a .2 for Charlotte (just making up numbers), they wouldn’t sum up quite properly. The same issue applies to some of the other metrics.

4. ethanluo says:

Hi!
I am new to your site and I found it very interesting to use the +/- metrics to predict outcomes of basketball games.

Just to check: what do the numbers actually mean in the figures you have in this post? Are they standard errors of wins or point differentials for the whole season?

Also, is it possible to benchmark the different metrics by comparing their predicting accuracy on a game-to-game basis? I am currently trying to work on something like that and hope you can give me some advice.

Thanks!

• Alex says:

The numbers are errors from team wins. For example, the 2011 entry for PER is 2. That means that on average, PER (translated into wins) was off by 2 wins for any given team. Pretty much all the other measures do better than that because they have adjustments built in to sum to team wins. There aren’t any errors because no measure (as far as I know) has an error with it; you just get an estimate for each player.

It would definitely be possible to check accuracy game by game, and people have suggested as much; I just didn't have the data at the time. The +/- measures, like APM and RAPM, are based on per-possession data, so you can go that fine-grained if you want.

5. ethanluo says:

Hi! I have downloaded the database you have (thank you so much for that!) and I have questions regarding the year. For example when you have year 2009 are you referring to Season of 2008-09 or all games that take place in 2009? (which then includes games from season 2008-09 and 2009-10)

• Alex says:

It’s been a little while since I looked at the data, but I believe the year refers to when that season ended. So 2008 means the 07-08 season. But I would double-check that to be safe.

• ethanluo says:

Thanks! But how reliable is the data? I might need it for a scientific paper.

• Alex says:

What do you mean by ‘reliable’? All of the numbers I downloaded were what the various metric-makers made available. I can’t promise that I lined them up in the Google doc 100% accurately, particularly in the case of players who were on multiple teams within a single season.

If you want to have rock-solid data, you should get raw play-by-play data and calculate each metric using the algorithms each person describes. There’s no particular reason to trust that anyone did any of their calculations correctly.

• ethanluo says:

Actually I have parsed the play-by-play. But we used it for different purposes. Thank you for your help!