So with my happy new database in hand, I've been sitting on an update to my project of predicting past outcomes to compare different NBA productivity metrics. But that is true no more. Here goes.
First, you should go look at that database link (and the other posts it links to) to see all the numbers involved and where they come from. Then you should go take a look at the explanation for how I did the predictions before. The method here is the same, except that I'm only going to look at team point differential (not wins), and any player with fewer than 100 minutes played the previous season is granted his actual production for the season being predicted. This avoids any issues with rookies. It also makes the predictions more accurate overall, but that shouldn't give any particular metric an advantage over the others.
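The low-minutes rule above can be sketched roughly as follows; the function name, field names, and data layout are my own assumptions for illustration, not from the original analysis:

```python
# Hypothetical sketch of the low-minutes substitution rule.
MINUTES_CUTOFF = 100

def predicted_rate(player):
    """Return the per-minute rate used to predict a player.

    Players with fewer than 100 minutes in the previous season
    (e.g. rookies) are simply granted their actual rate from the
    season being predicted; everyone else uses last season's rate.
    """
    if player["prev_minutes"] < MINUTES_CUTOFF:
        return player["current_rate"]  # granted actual production
    return player["prev_rate"]         # last season's productivity
```
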
As a short description in case you didn't want to go through the links: each player has a predicted productivity level on each metric. That prediction is simply his productivity the previous season according to that metric, with the exception named above. This productivity is always a per-minute or per-possession measure. His productivity is then multiplied by the minutes/possessions he actually played in the year being predicted. This is done to take out any influence of injuries or having to predict how many minutes a player will get; we take it as given. Once every player has a predicted wins/points produced number, the players are simply summed up for each team and then compared to the point differential the team actually posted that year. For metrics that work in wins, such as Wins Produced and Win Shares, the predicted number of wins is converted to point differential by subtracting 41 and dividing by 2.54.
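The team-level step above can be sketched like this; again the names and the list-of-dicts layout are my own illustrative assumptions:

```python
def team_predicted_point_diff(players, metric_in_wins=False):
    """Sum each player's predicted production for one team.

    Each player's per-minute rate is multiplied by the minutes he
    actually played in the year being predicted. For metrics
    denominated in wins (Wins Produced, Win Shares), the team total
    is converted to point differential via (wins - 41) / 2.54.
    """
    total = sum(p["rate"] * p["minutes"] for p in players)
    if metric_in_wins:
        return (total - 41) / 2.54  # wins -> point differential
    return total
```

A point-margin metric (e.g. a plus-minus flavor) would skip the conversion, since its team sum is already in point-differential units.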
One thing I noticed when I went through this exercise previously is that the metrics that did well at predicting (namely RAPM with a prior assuming that all players should be average) did worse at actually explaining what happened. You could get Wins Produced to predict better by building in some regression to the mean, but that of course means a worse explanation of the current year. So to start off, I wanted to see how well each metric explained what actually happened. To do this, all you do is take productivity for the year, add it up, and compare it to what happened as above; there's no prediction component. Here are the results.

Anything with a NaN means that there are no numbers for the metric that year; looking ahead, it also means that there can be no prediction for that metric in the following year. Two things are quickly evident. First, many metrics do a good job of explaining what happened. This isn't a big deal per se. One of the strengths Wins Produced has always claimed is that it fits outcomes very well, and one of the criticisms often lobbed at WP is that this is barely an accomplishment; it is extremely easy to make sure player ratings add up to team outcomes. This leads me to the second evident point: not every metric can clear that low bar. While both flavors of WP, Win Shares, ezPM, and ASPM all stay within half a point of average team point differential, PER, APM, and both flavors of RAPM do not do as well. PER and old RAPM aren't even within a point, which is a pretty poor showing.
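The explanation check described above (no prediction component) boils down to an average absolute error; a minimal sketch, assuming a simple mapping from team to (summed metric total, actual point differential) that I've invented for illustration:

```python
def explanation_error(teams):
    """Average absolute gap between a metric's summed production
    and each team's actual point differential.

    `teams` maps team name -> (metric_total, actual_point_diff);
    this data layout is assumed for the sketch.
    """
    errors = [abs(total - actual) for total, actual in teams.values()]
    return sum(errors) / len(errors)
```

A metric "explains" the season well when this number is small; a wins-based metric's totals would first be converted to point differential as described earlier.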
In general, old WP (with full credit for defensive rebounding) does the best job of telling you what happened in a season, followed by ASPM in a virtual tie with new WP, then ezPM, Win Shares, new RAPM, APM, PER, and old RAPM. Things are close at the top, though; you're talking about an average error of just under 0.2 for old WP versus just over 0.2 for ASPM and new WP.
So this wasn’t especially exciting perhaps, but hopefully everyone can figure out what I did at this point. It also serves as a strike against old RAPM and PER, and to a lesser extent new RAPM and APM.
Here is part 2 (coming tomorrow).