In the first two posts in this series I looked at how well various NBA productivity metrics can explain what happened and predict what will happen. But I think at least some people would say that they would never stick to a single measure when examining a trade, for example. Here I’m going to take that idea seriously.
First I needed to decide what metrics to include in my pool. I dropped APM quickly because I dislike it. While PER fared poorly in my analysis, I left it in because I have so many years of it. In contrast, I had to drop ezPM because there simply aren’t enough seasons of it. I also dropped old RAPM since the new version is such an obvious improvement on it. That means that I kept old and new WP, new RAPM, ASPM, Win Shares, and PER.
What did I keep them for? I wanted to see what combination of the metrics did the best job at explaining what happened and predicting what will happen. I did it the hard way.
I ran 100,000 iterations of a program that selected random weights (drawn from a normal distribution with mean 0 and standard deviation 1) for each of the six metrics in the sample. Each weight was multiplied by the player’s rating on the corresponding metric, those products were summed, and the sum was divided by the sum of the weights to rescale the values back into a reasonable range. This created the ‘blend’ rating for each player. The same weights were also applied to his predicted score on each metric to create a predicted blend rating. Those ratings were then summed up and compared to actual team performance the same way the existing metrics were checked previously.
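A minimal sketch of one iteration of that search, using made-up data (the metric names, player count, and data layout here are my assumptions, not the original code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in ratings: rows are players, columns are the six metrics.
metrics = ["oldWP", "newWP", "RAPM", "ASPM", "WS", "PER"]
ratings = rng.normal(size=(5, len(metrics)))

def blend(ratings, weights):
    """Weighted sum of each player's metric ratings, rescaled by the
    sum of the weights. (If the weights sum to nearly zero this blows
    up; a real run would need to guard against that draw.)"""
    return ratings @ weights / weights.sum()

# One iteration: draw random N(0, 1) weights, compute blend ratings.
weights = rng.normal(size=len(metrics))
blend_ratings = blend(ratings, weights)
```

The same `blend` call, with the same weight vector, would be applied to each player's predicted metric scores to get the predicted blend rating.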
After that finished chugging, I sorted the results by ‘explaining’ error as well as predictive error. The best explanatory blend had an error of .107. Of course, other randomly chosen weights also resulted in low errors. Moving down the sorted list, the error increased somewhat quickly relative to how accurate the blends could be, so I averaged together the weights for the ten lowest-error blends (and eyeballed the individual weights) to see what the common elements were. It turns out that the best way to explain what happened in a season is to use a good chunk of old Wins Produced, a smaller chunk of ASPM, and a little bit of Win Shares. New WP and PER are in there a little bit, but RAPM is pretty much absent.
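The selection step, sorting the blends by error and averaging the weights of the best ones, can be sketched like this (the error values here are random placeholders; in the original they come from comparing summed blend ratings to team performance):

```python
import numpy as np

rng = np.random.default_rng(1)

n_blends, n_metrics, k = 1000, 6, 10
all_weights = rng.normal(size=(n_blends, n_metrics))  # one weight vector per blend
errors = rng.uniform(size=n_blends)                   # stand-in for each blend's error

# Average the weight vectors of the k lowest-error blends.
best = np.argsort(errors)[:k]
avg_weights = all_weights[best].mean(axis=0)
```

The averaged vector is what gets read off for the "good chunk of old WP, smaller chunk of ASPM" style summary.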
Somewhat interestingly, this explanatory blend had a predictive error of about 2.7. But this was not the best predictive blend possible. When I sorted by prediction error, it turned out that you could get as low as 2.38, and you could average across the lowest 50 blends and not move too far off that error. So I did that (the results don’t change much if you average across 10 as I did for the explanatory blend), and the best predictive blend is a good chunk of ASPM, a roughly equal but negative chunk of new WP, a smaller chunk of RAPM, slightly smaller and equal chunks of old WP and Win Shares, and nothing from PER. Now I can already see some people’s eyes lighting up, thinking about the negative weight on WP. Keep in mind that these metrics are all correlated with each other and I have both ‘flavors’ of WP involved. The numbers and their signs would move around if I changed what metrics were in the mix. But with this set, it looks like the best predictive rating relies on all of the metrics except PER.
To check the numbers more consistently with what I did previously with the existing metrics, I made rounded-off versions of the explanatory and predictive blends. The explanatory blend was created by adding together .5*old WP48, .35*ASPM, and .15*WS48. The predictive blend was created by adding together .5*ASPM, -.5*new WP48, .35*RAPM, and .3 each of WS48 and old WP48, then dividing the sum by .95. I then put those blends through the same code that I used for the previous post with the metrics. Here are the same tables from my previous posts but with the blend metrics included. Note that they only go back as far as I have new RAPM, since it’s necessary to calculate the blend.
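In code, the two rounded-off blends work out to something like the sketch below. Note that the .95 divisor in the predictive blend is just the sum of that blend's weights (.5 - .5 + .35 + .3 + .3 = .95), matching the rescaling used in the random search; the explanatory weights already sum to 1, so no division is needed there.

```python
def explanatory_blend(old_wp48, aspm, ws48):
    """Rounded-off explanatory blend; the weights sum to 1."""
    return 0.5 * old_wp48 + 0.35 * aspm + 0.15 * ws48

def predictive_blend(aspm, new_wp48, rapm, ws48, old_wp48):
    """Rounded-off predictive blend; dividing by .95 rescales by the
    sum of the weights (.5 - .5 + .35 + .3 + .3)."""
    total = (0.5 * aspm - 0.5 * new_wp48 + 0.35 * rapm
             + 0.3 * ws48 + 0.3 * old_wp48)
    return total / 0.95
```

The input values would be each player's per-48 (or per-possession, for RAPM and ASPM) ratings on the respective metrics; the functions here just encode the weights.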
As you can see, the explanatory blend does a better job of explaining what happened than any individual metric. Similarly, the predictive blend does a better job at predicting than any individual metric. But they rely on different combinations; explaining what happened is mostly Wins Produced and ASPM whereas predicting what will happen is a fairly equal blend of four metrics contrasted against a fifth; only PER gets left out.
Overall I think there are a couple of key points to take away. One is that ASPM appears to do a really good job overall. It describes what happened well, it predicts the next season well, and it contributes a good amount to both of the best blends. Another is that the different metrics appear to be good at different things. Wins Produced does a good job at explaining what happened. While many people malign how WP treats rebounding, the fact of the matter is that rebounds are useful and someone grabbed them. Comparing new WP to old, we see that it explains what happened slightly worse but makes predictions slightly better.
To make that second point a different way, I’ll make an analogy to the NFL. People often make the distinction between predictive and narrative stats there. Fumble recoveries are a good example. Fumble recoveries are fairly random, with teams showing little to no consistent ability to recover fumbles. Also, a recovery can’t occur unless a fumble was forced in the first place, and forcing fumbles is somewhat consistent. Thus fumble recoveries are a poor predictive measure and forced fumbles are preferable. However, if you tried to talk about why a game turned out the way it did, you would be lost without fumble recoveries. And since in any given game the actual percentage of forced fumbles that are recovered by either team can vary wildly, forced fumbles aren’t as good an indicator. Fumble recoveries are a narrative stat: they tell you what happened and who made it happen. Forced fumbles are a predictive stat: they tell you what is likely to happen in the future.
The same ideas appear to apply to the NBA metrics, and even their ‘best’ blends. With that being said, even the ‘bad’ metrics do an ok job. Despite not being in either blend, PER correlates at .81 with the explanatory blend and .65 with the predictive one. Only APM and old RAPM have a low correlation with the explanatory blend, and everything correlates fairly well with the predictive blend, with ASPM correlating best thanks to its large weight. The explanatory blend has a correlation of .632 with the predictive one. So you could get away with using one metric, but…
The final take-away is that using multiple metrics is the best way to go. If all you had access to was one measure, you could do a decent job (depending on what that measure was). But so many numbers are publicly available now that there isn’t much of an excuse not to gather as many viewpoints as possible. For example, as I’m writing this RAPM would tell you that LeBron is only the 8th best player in the league, behind Dirk, Nick Collison, Ginobili, Dwight, Paul Millsap, Chris Paul, and Luol Deng. Besides the fact that it appears that RAPM doesn’t do the best job of explaining current results, does that seem reasonable? Win Shares has LeBron second behind Ginobili (among players with at least 100 minutes), and of the rest of the group Millsap is next closest at 4th. WP has Manu first, then LeBron. ezPM has LeBron at number one. So a reasonable conclusion seems to be that LeBron has been the best player so far this year, with the possible exception of Ginobili. Nick Collison, on the other hand, is below average according to ezPM, good but not great by Win Shares, and a bit above average according to WP. He probably isn’t the second-best player in the league. However, his high RAPM is a good sign for his productivity next year (mediated by his more average rating on the other metrics).
Just to have something a little fun at the end: using my explanatory blend, I can look at the best players of the past ten years by total production. The top two are LeBron from 2010 and 2009; there’s a good reason the Cavs missed him last year. Chris Paul takes two of the next three in 2009 and 2008, with Shaq’s 2000 getting in the middle at number 4. The top ten rounds out with 2004’s Kevin Garnett, 2009 Wade, 2011 LeBron, 2010 Wade, and 2005 Kevin Garnett. Perhaps the most surprising season is Ben Wallace’s 2002 (good for 20th); it wasn’t just WP that liked him. The worst season goes to Michael Olowokandi in 2000; keep in mind that this is total production. He got 2500 minutes that year! His 2001 also comes in 4th worst. Andrea Bargnani can only look on in awe; he came in 5th worst for his 2011 season.