With my new data set, I thought I would take a look at how the various metrics relate to each other. Please see that post for a description of the data set and links to all the sources. Obviously there are a lot of ways to do this, but here are some first shots.

The most obvious way to start is to check the correlations. Unfortunately, not all of the numbers are available for everyone, so the overlap differs from one correlation to the next. Still, R’s cor function has a useful argument called ‘use’; setting it to "pairwise.complete.obs" uses any available pairwise values, even if a complete set isn’t there for the entire matrix. So the ezPM – old RAPM correlation is based only on 2010, but the ezPM – new RAPM correlation uses both 2010 and 2011, and so on. For everything in this post, I filtered out anyone who played fewer than 500 minutes for a team within a season. Unfortunately, that cuts the data set by about a third (I’m using all 10 years I have, and it goes from about 6000 player-team observations to about 4000), but hopefully the remaining data is more reliable. Anyway, here’s the matrix:
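For readers who don’t use R, here is a minimal Python sketch of the same idea: drop player-team rows under 500 minutes, then compute each pairwise correlation from whatever rows have both values (the equivalent of use = "pairwise.complete.obs"). The field names (minutes, PER, WS48, ezPM), the use of None for a missing value, and the numbers are assumptions for illustration, not the layout or contents of the actual data set.

```python
import math

MIN_MINUTES = 500  # the minutes cutoff used in the post


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (None if undefined)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant column has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def pairwise_corr(rows, metrics, min_minutes=MIN_MINUTES):
    """Correlation for every pair of metrics, each computed from all rows
    (after the minutes filter) where BOTH values are present."""
    kept = [r for r in rows if r["minutes"] >= min_minutes]
    result = {}
    for a in metrics:
        for b in metrics:
            pairs = [(r[a], r[b]) for r in kept
                     if r.get(a) is not None and r.get(b) is not None]
            if len(pairs) < 3:
                result[a, b] = None  # too little overlap to say anything
                continue
            xs, ys = zip(*pairs)
            result[a, b] = pearson(xs, ys)
    return result


# Hypothetical player-team rows; ezPM is missing for one of them.
rows = [
    {"minutes": 2400, "PER": 22.0, "WS48": 0.20, "ezPM": 4.1},
    {"minutes": 1800, "PER": 15.0, "WS48": 0.10, "ezPM": 0.5},
    {"minutes": 3000, "PER": 18.5, "WS48": 0.15, "ezPM": None},
    {"minutes": 900, "PER": 11.0, "WS48": 0.04, "ezPM": -1.8},
    {"minutes": 300, "PER": 30.0, "WS48": 0.01, "ezPM": -5.0},  # < 500 min, dropped
]
matrix = pairwise_corr(rows, ["PER", "WS48", "ezPM"])
```

Here the PER/WS48 cell rests on four rows and the PER/ezPM cell on only three, which is exactly why different cells of the matrix come from different samples. In pandas, DataFrame.corr() also excludes missing values pairwise.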

The ‘old’ and ‘new’ labels got cut off the row names, and ‘new RAPM’ got cut off the last column; sorry about that. But here are a few observations. First, new RAPM (using the previous season as a prior) correlates at .917 with old RAPM (using 0 as a prior), so there isn’t a dramatic difference there. Similarly, old WP48 correlates at .916 with new WP48. Dave Berri reported a correlation of .98; the difference is probably due to how I recombined WP48 values for players who were on multiple teams within a season for old WP. The correlation is obviously still high.

Did these changes affect how the metrics relate to other metrics? If you look at the WP48 and old WP48 columns, most of the correlations stay about the same. WP48 has a much lower correlation with PER than it used to, but nothing else changed by more than .04. RAPM, on the other hand, changed quite a bit. It now correlates noticeably more strongly with every other metric except APM, with which it already correlated fairly highly, and ezPM, where the correlation rose only a little.

In general, the metrics roughly agree with each other. New RAPM correlates at .5 or higher with everything, ezPM at .5 or higher, APM at .4 or higher, and old RAPM at .32 or higher. Old WP correlates at .34 or higher with everything (and at .72 or higher with the other boxscore-based measures); the same two minimums are .32/.57 for new WP, .39/.77 for Win Shares, and .34/.57 for PER. But the correlations are only somewhat helpful; they consider two variables at a time, and we have eight.

We can expand on the correlations by seeing how well each metric is predicted by all the others. I’m going to check this with regression. For example, I can predict a player’s PER from his WS48, WP48, RAPM, etc. Because the old and new versions of WP and RAPM are so highly correlated, I’m only going to use the new versions for this part. I’m also going to use scaled (mean-centered, unit standard deviation) independent variables, so that the size of each weight gives an idea of which metric is most predictive of the metric of interest. I’ll go in the order of the variables in the matrix above.
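A bare-bones version of that kind of regression in Python, as a sketch rather than the code actually used for the post: z-score each predictor, center the outcome, and solve the ordinary least squares normal equations. With every predictor on the same scale, the ratio of two weights is what the comparisons below mean by one metric being "N times as important" as another. The function names are made up for illustration, and the rows must be complete for every metric included, which is why adding ezPM shrinks the sample.

```python
import math


def zscore(col):
    """Mean-center and scale to unit (sample) standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]


def solve(a, b):
    """Solve a small square system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def standardized_regression(y, predictors):
    """OLS of y on z-scored predictors (dict: name -> column, complete rows only).

    Returns ({name: weight}, R squared). Because the predictors are centered,
    the intercept is simply the mean of y and is not reported.
    """
    names = list(predictors)
    z = [zscore(predictors[name]) for name in names]
    n = len(y)
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    # Normal equations: (Z^T Z) b = Z^T y_centered
    ztz = [[sum(zi[t] * zj[t] for t in range(n)) for zj in z] for zi in z]
    zty = [sum(zi[t] * yc[t] for t in range(n)) for zi in z]
    b = solve(ztz, zty)
    fitted = [sum(b[k] * z[k][t] for k in range(len(z))) for t in range(n)]
    ss_res = sum((yc[t] - fitted[t]) ** 2 for t in range(n))
    ss_tot = sum(v * v for v in yc)
    return dict(zip(names, b)), 1 - ss_res / ss_tot
```

Dropping a predictor and refitting, then comparing the two R squared values, is the "removing ezPM" check described below; a ratio such as abs(weights["WS48"] / weights["ezPM"]) gives the relative importance. With correlated predictors, expect individual weights (and even their signs) to move around between specifications.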

PER can be predicted by Win Shares, Wins Produced, RAPM, APM, and ezPM with an R squared of .63. The biggest contributor is Win Shares, followed by ezPM, although WS is nearly three times as important. WP is also a factor, although with a negative weight in this particular regression (remember that all the metrics are positively correlated, so the weights and even their signs will jump around depending on which ones you include in a particular regression). RAPM and APM are not big factors. Given that PER serves mostly as an indicator of popular opinion and is well known to overvalue scoring, a high correlation with it could be viewed as a negative, although I’ll try not to judge.

While ezPM is a significant predictor, there are only two seasons of it. Removing it increases the number of observations dramatically and doesn’t change the R squared. RAPM and APM are now both significant, although slightly less important than WP (which is still negative). Win Shares is far and away the most important factor, with a weight about 8.5 times as large as WP’s. So essentially, the best way to predict PER is to know a player’s Win Shares, and players will do best on PER if they do well on Win Shares but relatively poorly on Wins Produced. Of course, since Win Shares and WP are pretty well correlated, such players are hard to find, and the effect is minor.

Moving to Win Shares, the R squared from all the measures is .78, so it’s more predictable than PER. PER and WP are the two biggest contributors and are about equal, although all the metrics are significant predictors (APM is negative). The spread is also smaller; PER and WP are about twice as important as RAPM, which is third. When I again remove ezPM to increase the sample size, the R squared stays about the same (technically increasing to .79) and the rest of the description above holds.

Wins Produced is next. WP is predicted by the other metrics with an R squared of .746. The biggest contributors are ezPM and Win Shares, with ezPM being a little more important. PER has a negative weight in this regression, and APM and RAPM contribute virtually nothing. When I run the regression again without ezPM, the R squared drops to .64. It looks like ezPM really has the most to say about WP (remember that the other regressions so far didn’t really change when it was removed). RAPM and APM still contribute virtually nothing, Win Shares is far and away the most important predictor, and PER is negative. I was curious, so I checked: you can remove RAPM, APM, and PER and the R squared is basically unchanged from the original regression (.736), and the relative contributions of ezPM and Win Shares stay about the same. So those two alone are roughly sufficient to describe WP as well as it can be described from these metrics.

Next is the first non-boxscore measure, RAPM. It is predicted by the other metrics with an R squared of .77. I found this somewhat surprising because I have a faint memory of boxscore stats not doing a great job of predicting APM (e.g. statistical plus/minus). Of course, it then turns out that APM is far and away the best predictor, nearly four times more important than Win Shares at number 2. ezPM is significant but a minor contributor, and PER and WP are non-significant. Thus it isn’t surprising that removing ezPM lowers the R squared a little (.736) but the rest of the story stays the same. For those who are curious, the boxscore measures alone only predict RAPM with an R squared of .398, led by ezPM and Win Shares; removing ezPM makes it .368 with roughly equal contributions from Win Shares and PER. So RAPM is fairly distinct from the boxscore measures after all.

Speaking of which, we’re on to APM. I don’t like APM, but I’ll cover it anyway. As you might have guessed, it is predicted fairly well (R squared of .727) and most strongly by RAPM. Removing ezPM drops the R squared to .6823 but removing RAPM drops it to .281. ezPM is then the biggest predictor, over two times as important as PER.

Finally we have ezPM, which is boxscore-based but takes individual defense into account. In broad strokes, it would be the next step in the progression from WP (defense is only considered at the team level) to Win Shares (defense is part player, part team). ezPM is predicted by the other metrics with an R squared of .758. The biggest contributor is Wins Produced, over three times as important as PER. The other three metrics are fairly minor contributors. WP is an important piece; the R squared drops to .638 if it is removed, and Win Shares takes over as the biggest contributor.

So to sum up: each metric can be predicted fairly well if you know the other metrics, although RAPM and APM are distinct in that they predict each other well while the boxscore measures contribute relatively little. That got me thinking: maybe all these measures, although sometimes (usually?) viewed as competitors, are all saying roughly the same thing. To get a handle on that, I ran a principal components analysis (PCA). The short explanation is that a PCA turns your k variables (here, the eight metrics) into k uncorrelated variables (components), each of which is a weighted combination of the original variables. These new variables come in a particular order: the first accounts for as much of the variance in the original data as possible, the second for as much of the remaining variance as possible, and so on. Basically it’s a way to turn correlated variables into a smaller set of uncorrelated variables (you don’t need to keep all k) that still describe the data fairly well. You can then look at the weights (loadings) used to create the components to see where the variability in the data set is concentrated.

An example from a class I took is race times for different countries in the Olympics; countries were the observations (like players here) and races were the variables. Lower times across all races might be one component; relatively faster times in sprints as opposed to long races might be another. The interpretation is that countries vary most according to their speed in all the races, and next most by whether they tend to be better at sprints or long runs. The first component would separate your medal contenders from your pretenders, while the second would separate, for example, Caribbean countries from African countries. These components are uncorrelated, though, so a country could score low on the first component (be generally slow) but high on the second (be better at sprints than long races), or really show any pattern.
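The mechanics can be sketched in a few lines of Python. This is not the routine used for the post; it is a minimal illustration that finds the leading components of a correlation matrix by power iteration, removing each component from the matrix (deflation) before looking for the next. Running PCA on the correlation matrix is the same as running it on standardized metrics, which keeps a metric with a large numeric range (PER, say) from dominating just because of its units. The function names are hypothetical.

```python
import math


def power_iteration(mat, iters=10_000, tol=1e-12):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    n = len(mat)
    # Uneven start vector, so it is unlikely to be orthogonal to the target.
    v = [1.0 + 0.1 * i for i in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-15:  # nothing (numerically) left to explain
            return 0.0, v
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient gives the eigenvalue (the variance along v).
    lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def pca(corr, k):
    """First k principal components of a correlation matrix.

    Returns a list of (share of total variance, loadings) pairs, largest first.
    """
    n = len(corr)
    total = sum(corr[i][i] for i in range(n))  # equals n for a correlation matrix
    m = [row[:] for row in corr]
    components = []
    for _ in range(k):
        lam, v = power_iteration(m)
        if sum(v) < 0:  # an eigenvector's sign is arbitrary; pick a readable one
            v = [-x for x in v]
        components.append((lam / total, v))
        # Deflation: subtract this component so the next pass finds the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return components
```

Power iteration is slow when two eigenvalues are nearly tied, so for real work a library eigendecomposition (R's prcomp, or numpy.linalg.eigh) is the better choice; the loop above just makes the "largest variance first, then the next" ordering concrete.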

In this case, the data set is limited to 2010 and 2011, since those are the only years of ezPM that exist. But the loadings are interesting; they basically follow the Olympic example I gave. The first component adds roughly equal amounts of each metric! That is, if you wanted to know whether a player was good or not, the best first pass you could make would be to combine all eight measures in equal parts. That component accounts for about 69% of the variance in player scores. The second component adds about 16% of the variance and is a contrast between the boxscore and non-boxscore metrics. The third adds another 8%; it ignores RAPM and APM altogether and is a contrast between WP/ezPM and PER, with a little Win Shares. I won’t describe the other components because the groupings become somewhat arbitrary and the variance they describe is low. But I think it’s interesting that if you wanted to describe how players vary as succinctly as possible, you would just see what the metrics say as a group.
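Why would equal weights come out on top? A short derivation helps, under the simplifying assumption (not true of the real matrix) that every pair of the k metrics has the same correlation ρ. Writing the correlation matrix as R, with 1 the all-ones vector:

$$
\begin{aligned}
R &= (1-\rho)\,I_k + \rho\,\mathbf{1}\mathbf{1}^\top \\
R\,\mathbf{1} &= (1-\rho)\,\mathbf{1} + \rho k\,\mathbf{1} = \bigl(1 + (k-1)\rho\bigr)\,\mathbf{1} \\
R\,v &= (1-\rho)\,v \quad \text{for any } v \perp \mathbf{1} \\
\text{share}_1 &= \frac{1 + (k-1)\rho}{\operatorname{tr} R} = \frac{1 + (k-1)\rho}{k}
\end{aligned}
$$

So whenever ρ > 0 the equal-weight vector is the first component. For reference only, with k = 8 a 69% first component is what a common correlation of about ρ = (8 × 0.69 − 1)/7 ≈ 0.65 would produce; in that idealized case every later component would get the same (1 − ρ)/k ≈ 4.4%, so departures from equal correlations (for example, boxscore metrics correlating more with each other than with RAPM and APM) are what give the later components their structure.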

As a kind of sanity check, I sorted the 2010 and 2011 players by their score on the first component. LeBron has entries 1 and 3 (3 was last year), with 2010 being the best score by a fair margin. Wade has 2 and 7 (7 was last year), with his 2010 a decent amount higher than LeBron’s 2011. LeBron’s 2010 is higher than Wade’s by about as much as LeBron’s 2011 is higher than the number 13 spot, which is Manu Ginobili’s 2010. Dwight Howard has 4 and 5, Chris Paul 6 and 9, Kevin Durant’s 2010 is 8th, and Steve Nash’s 2010 rounds out the top 10. Scores start getting pretty tight by the time you hit number 11, so OKC supporters don’t need to be terribly offended that Durant apparently took a step back last year; he was still a top 30 guy across two seasons. About the only surprising name in the top 25 might be Greg Oden, who popped in at 16 for his 2010 ‘season’. I guess I should say surprising to me; others might still balk a bit at seeing Kevin Love, Nene, or Chris Andersen in a top 30 list. At the bottom of the list were Josh Powell (2010), Sasha Pavlovic (2010), Jonny Flynn (2011), Josh again, Mo Williams (2011), and Jannero Pargo (2010). Perhaps the ‘best’ player at the wrong end of the list is Aaron Brooks, although I’m sad to see 2011 Jason Maxiell come in at number 30.
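Scoring players on a component is just a weighted sum: standardize each metric across the player-seasons being compared, then multiply by the component’s loadings and add. A minimal Python sketch, with hypothetical player names, metric values and loadings (the real loadings aren’t reproduced here); it assumes every row has all the metrics, which is why the PCA was restricted to the seasons with ezPM.

```python
import math


def zscores(values):
    """Standardize a column to mean 0 and unit (sample) standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


def rank_by_component(rows, metrics, loadings, label="player"):
    """Sort rows (best first) by their score on one principal component.

    Each metric is z-scored across all rows; a row's score is the
    loading-weighted sum of its z-scores.
    """
    z = {m: zscores([r[m] for r in rows]) for m in metrics}
    scores = [sum(w * z[m][i] for m, w in zip(metrics, loadings))
              for i in range(len(rows))]
    order = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
    return [(rows[i][label], scores[i]) for i in order]


# Hypothetical player-seasons and a hypothetical two-metric contrast component
rows = [
    {"player": "Player A", "RAPM": 3.0, "WP48": 0.10},
    {"player": "Player B", "RAPM": 2.0, "WP48": 0.20},
    {"player": "Player C", "RAPM": 1.0, "WP48": 0.30},
]
contrast = [1 / math.sqrt(2), -1 / math.sqrt(2)]  # RAPM minus WP48
ranking = rank_by_component(rows, ["RAPM", "WP48"], contrast)
```

With contrast loadings like these, the top of the list is the players who look better on one family of metrics than the other, which is how the component 2 and component 3 lists are read.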

For those of you who are curious about the RAPM/APM warriors on component 2, the top five are all from 2011: Nick Collison, Jason Collins, Ekpe Udoh, Ronnie Price, and Dirk Nowitzki. They all performed relatively better on RAPM/APM than on the boxscore measures. The boxscore guys were Ed Davis (2011), Drew Gooden, J.J. Hickson, Nazr Mohammed, and Earl Boykins (all 2010). Finally, on component 3 we can see which guys are ‘stat nerd heroes’ (high on WP and ezPM but low on PER): Reggie Evans, Thabo Sefolosha, Shane Battier, Jeff Foster, and Ronnie Brewer; and which guys pass ‘the eye test’: Andrea Bargnani, Amar’e Stoudemire, Marreese Speights, Brook Lopez, and Mo Williams. Speights is a surprise, but the top 30 is a virtual who’s who of NBA stars, which tells me that the component is capturing what it appears to.

This was a fun exercise overall. I hope everyone found it as interesting as I did; I like the idea that we’re all roughly on the same page even if WP guys slam APM and RAPM and APBR guys slam WP and most everyone slams PER. It makes me feel that we’re really arguing about degree for the most part, and not so much about what to do in the first place. And I hope to get good mileage out of this data set in general. When I get a little time, I might try to do some player projections even though the season’s already started. Seems like a big project, but a data set like this may demand it.

Excellent work, Alex! I’d like to add Advanced Statistical Plus/Minus (ASPM) to your collection; I can send you the spreadsheet. I’ve got historical data going back to 1978. It doesn’t appear you’ve got a true statistical plus/minus model in the mix here (one created by regressing box score data onto APM, or in my case, 8-year equally weighted RAPM).

I’d like to see that too.

Sounds good to me! My email is sportsskeptic at gmail (there’s an extra s compared to the blog’s name).

Okay, I posted the new version of ASPM online on Google Docs at https://docs.google.com/open?id=0Bx1NfCUslJwxM2Q1MzFiMjEtNmY5Mi00ZjgxLWIyOTEtODMzMmM4YmQzMmEx It’s an XLSM file–can you open that?

Included are the ratings from 1978 to 2011, for ASPM, WS, and PER.

Success! I’ll have to see about actually getting everything lined up and whatnot, but I have the file at least.

This was really great, Alex. I’m pretty much at peace these days with the state of box score vs. +/- metrics. To me, it seems that box score stats can serve up a nice descriptive view of a player (which is why I like breaking down ezPM into its various components), but when I really want the best estimate of player value, I tend to give RAPM the most weight (even over my “baby”). My only question now is whether, for purposes of prediction, an “ensemble” average of RAPM plus a box score metric (take your pick) gives any improvement whatsoever. I did two sets of predictions this season, one with RAPM alone and the other with a blend of ezPM and RAPM (I should probably do ezPM alone, but for some reason I didn’t think of it). It will be interesting to see how it goes.

I haven’t played with the current version of RAPM yet, but my feeling is that most of the predictive strength of ‘old’ RAPM was simply due to built-in regression to the mean (the 0 prior). The fact that Jerry told everyone to cut his ‘most predictive’ version of RAPM in half to make predictions only makes me believe that more, but I’m hoping to run my prediction test on the new data set when I get the chance, so I’ll find out then.

That all being said, I wouldn’t be surprised if some kind of blend of RAPM and something else would provide better predictions. It’s a separate source of information, and it would have to be completely redundant to not be any better. I think the more interesting question will be what the blend is like; more RAPM with less something else, vice versa, which other metrics boost predictions, etc.
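One simple way to ask what the blend should look like is to pick the single weight w that minimizes squared error when predicting next season’s value from w × RAPM + (1 - w) × (box score metric). For one weight there is a closed-form least-squares answer: w = Σ(a - b)(y - b) / Σ(a - b)², with a the RAPM values, b the box score values and y the outcome. This is a sketch under assumptions: the outcome variable, the lack of an intercept or shrinkage factor, and the names are all hypothetical, and a weight fit on the same seasons it is scored on will look better than it really is, so it should be checked on held-out seasons.

```python
def best_blend_weight(rapm, box, outcome):
    """Least-squares weight w for predicting outcome with
    w * rapm + (1 - w) * box, clamped to [0, 1]."""
    num = sum((a - b) * (y - b) for a, b, y in zip(rapm, box, outcome))
    den = sum((a - b) ** 2 for a, b in zip(rapm, box))
    if den == 0:
        return 0.5  # identical inputs: every weight gives the same prediction
    return min(1.0, max(0.0, num / den))


def mse(pred, actual):
    """Mean squared prediction error."""
    return sum((p - y) ** 2 for p, y in zip(pred, actual)) / len(actual)


# Hypothetical current-season ratings and next-season outcomes
rapm = [1.0, 2.0, 3.0, 4.0]
box = [2.0, 1.0, 4.0, 3.0]
outcome = [1.7, 1.3, 3.7, 3.3]

w = best_blend_weight(rapm, box, outcome)
blend = [w * a + (1 - w) * b for a, b in zip(rapm, box)]
```

Comparing mse(blend, outcome) against mse(rapm, outcome) and mse(box, outcome) answers "does the blend help?", and w itself answers "more RAPM or more box score?".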

One should multiply ratings by a factor <1 for BOTH old AND new RAPM. Usually 0.7 works fine; "half" (0.5) was said because of the unknown effects of the lockout.

It's possible that with old RAPM that factor is closer to 1, but if there's a difference it's really minor.

Frankly, every metric needs to either be doing this or adjusting for aging, because both help with prediction performance.

Eric makes an important point that Jerry made sure to tell me as well when I used RAPM for my projections. I ended up using 0.68 as my multiplier because that was the slope I found when regressing RAPM on ezPM (which theoretically has a 1.0 multiplier).
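The two ways of setting the multiplier mentioned here are easy to sketch: use a fixed factor such as 0.7, or estimate it as a regression slope, as described here for regressing RAPM on ezPM. A minimal Python version with made-up ratings; ols_slope and shrink are illustrative names, and nothing here models the lockout or aging.

```python
def ols_slope(x, y):
    """Slope from an ordinary least-squares fit of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def shrink(ratings, factor):
    """Pull ratings toward 0 (league average) by a constant factor."""
    return [factor * r for r in ratings]


# Hypothetical ratings: regress RAPM (y) on ezPM (x) to get a multiplier
ezpm = [1.0, 2.0, 3.0, 4.0]
rapm = [1.68, 2.36, 3.04, 3.72]
factor = ols_slope(ezpm, rapm)

# Or simply use a fixed factor such as 0.7
projected = shrink([5.0, -2.0], 0.7)
```

Either way the effect is the same kind of regression to the mean: a +5 player is projected at +3.5 with a 0.7 factor.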

I have no doubt that multiplying the other metrics by some <1 factor would help their predictions. The point isn't that this is wrong, but that old RAPM had it built in; this gives it an intrinsic predictive benefit if you use a simple projection like I did in my previous study of the various metrics. I'm guessing that the boxscore metrics would catch up quickly if I gave them some kind of regression to the mean. I also have no doubt that both old and new RAPM would benefit from (additional) regression to the mean.

I don't think I buy the lockout as a reason to reduce estimates. Right now, I'm supposed to believe that LeBron James is best described as a 4.9 RAPM player. He hasn't been that low since 2007 according to any single-season ratings. Why should I suddenly predict that he will be two points worse? Does he need practice time more than other players? What about young players or players in their prime? Dwight Howard has been around a 4 the past few years and a 5.5 last year. Why should I now believe he'll be a 3.8? I think he said to cut them in half because new RAPM doesn't have the inherent regression to the mean that old RAPM had. In any event, the website said that those ratings (prior to being cut in half) were 'the best version to predict outcome of future matchups'. That obviously isn't true if he then felt he had to adjust them. I find it odd. But I'll look into it more systematically in the future; this is obviously just my feeling right now.

No, sorry. The “deprecated by 0.4” means the WEIGHT of those seasons/portions of seasons is 40% (more regression toward the Bayesian prior)–that makes more of an impact if, say, a player played somewhere for 5 seasons, and then was traded middle of last season. The previous years are weighted far less than they would be if they were for the same team.

In my current projection algorithm, I regress toward a Bayesian prior based on MPG and Team Efficiency (you’d be getting more minutes as a mediocre player if you were on a bad team), but also toward a nominally central prior close to the median for all players, not the mean (that’s what I found worked best). I’m going to try to unify those two priors at some point soon.

With regard to new vs. previous seasons, as a point of reference: if you’re trying to do an accurate weighting to project performance from a partial season plus prior years, I’ve found that you should weight the current season at 2, last season at 1, and seasons before that at 0.47^n, where n is the number of years further back (accounting for aging, of course).

If a player switches teams, the previous results should be deprecated by about 0.4.
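Together with the clarification that the 0.4 applies to the weight, the scheme described in these comments can be sketched as a weighted average. Assumptions to flag: "n years further back" is read as counting from last season (so two seasons ago gets 0.47 and three seasons ago 0.47²); minutes played, aging and the Bayesian prior are all ignored, although a full projection would include them; and the function names are hypothetical.

```python
TEAM_SWITCH_FACTOR = 0.4  # share of weight kept for results with a previous team


def season_weight(years_back):
    """Weight for a season: 0 = current (partial) season, 1 = last season, ...

    Assumption: 'n years further back' counts from last season, so two
    seasons ago gets 0.47 ** 1, three seasons ago 0.47 ** 2, and so on.
    """
    if years_back == 0:
        return 2.0
    if years_back == 1:
        return 1.0
    return 0.47 ** (years_back - 1)


def weighted_rating(history):
    """Weighted average of (years_back, rating, same_team) entries.

    This sketch ignores minutes played, aging and any Bayesian prior.
    """
    num = den = 0.0
    for years_back, rating, same_team in history:
        w = season_weight(years_back)
        if not same_team:
            w *= TEAM_SWITCH_FACTOR
        num += w * rating
        den += w
    return num / den


# Hypothetical player who changed teams before this season
history = [(0, 3.0, True), (1, 1.0, False), (2, 2.0, False)]
projection = weighted_rating(history)
```

Here the current season keeps its full weight of 2 while last season drops from 1 to 0.4 and the season before from 0.47 to 0.188, so the projection leans much harder on what the player has done with his new team.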

If you’re doing off-season projections, do the same numbers still apply?

The trade number seems interesting. Does it properly imply that good players who are traded become worse and bad players become better (I’m assuming average would be 0, although maybe I shouldn’t)? Is that a selection effect of some kind? Or does it imply that any traded player becomes worse?

There are many points wrong in that post (you’re even contradicting yourself), but I’ll just say that there’s a difference between predicting *in season* matchups and matchups of an entirely new season.

I think I see the contradiction. Is it that I said that regression should help but then argued against regressing LeBron and Dwight? Fair enough. I guess my argument is really with the extent of the regression. Even in ‘old’ RAPM, which assumes players should be close to average, LeBron has been above his projection the past three years. And why would the lockout make good players worse but bad players better?

No one learns if they don’t see their mistakes. Go ahead and leave a comment or email me.

BTW, Alex, are you on Twitter yet?

Nope. I post so little on Facebook or Google+ that I felt I would be ‘abusing’ Twitter by just following without putting much out there. But maybe I’ll make a New Year’s resolution or something.

There are too many talkers and not enough listeners already!

I’d bet that if you used only the offensive component of RAPM, correlations with the other metrics would be higher (especially with PER).

As far as I know, the WP guys don’t slam RAPM, just APM (or maybe they think it’s the same, I don’t know).

Love the PCA! Definitely something you don’t see every day in basketball-related blog posts.

Interesting to see that informing RAPM with prior data moves it that much closer to the other metrics.

“That is, if you wanted to know if a player was good or not, the best first pass you could make would be to combine all eight measures in equal parts”

Not really. You don’t really know whether an equally weighted mix of the metrics is a better metric than any other metric alone. The “perfect metric” could be in there and subsequently get washed out by crap metrics.

I think that was true with APM (offensive APM was better predicted than defensive APM), so it seems reasonable.

That’s a fair point about combining the metrics. It’s more accurate to say that most of the variance in player metrics is accounted for by an equal combination of the variables. But my guess (which will need to be validated with something like what Evan and I mentioned in the other comment) is that some manner of combination will be better than any given metric alone.

Alex, great post. But out of curiosity, could you add simple stats like Tendex, NBA Eff, and/or even PPG to the correlation matrix?

There are obviously a ton of correlations to potentially look at, so I don’t think I’ll ever post all of them in one place. Points per 48 isn’t too hard though; the correlation is .79 with PER, .44 with Win Shares per 48, .064 with WP48, .26 with the old WP, .22 with old RAPM, .35 with new RAPM, .26 with APM, and .27 with ezPM100. I only used player-teams with over 500 minutes for that. I’m not surprised that PER’s value is so high, but I am surprised that WP’s is so low in its new incarnation; all they did was change defensive rebounds. But the correlation with true shooting is pretty high (about .56) and that seems to fit the theme, and I can’t find anything wrong with the numbers.

Alex: I saw this while checking in on your site for the first time in a while. Interesting data and analysis. I would just add one observation: looking only at correlations can sometimes obscure important differences between metrics in the magnitude of variation among players. For example, the correlation between old and new WP is (as you note) quite high. Yet the change in valuation can be quite substantial. In “old WP,” Dwight Howard was twice as valuable as Ray Allen last season, and contributed about 12 more wins than Allen. However, in “new WP” Howard was worth 50% more than Allen, or about 6 additional wins — a very different assessment of their relative values. Nor is this an isolated example: quite a few players see their measured value change by 2-4 wins.
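A toy illustration of the point: if one metric’s win totals are just a compressed version of another’s, the correlation is a perfect 1.0 even though the gap between two players is cut in half. The win totals below are hypothetical, chosen only so that the first two players mirror the ratios described (twice as valuable and 12 wins more under the old numbers, 50% more and 6 wins more under the new); they are not actual WP values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical win totals for four players under two versions of a metric
old_wins = [24.0, 12.0, 8.0, 4.0]
new_wins = [0.5 * w + 6.0 for w in old_wins]  # -> [18.0, 12.0, 10.0, 8.0]

r = pearson(old_wins, new_wins)      # a perfect 1.0
old_gap = old_wins[0] - old_wins[1]  # 12 wins
new_gap = new_wins[0] - new_wins[1]  # 6 wins
```

Correlation is blind to any linear rescaling, so only something that looks at magnitudes, such as comparing each metric’s spread in wins or running a predictive test, will tell these two versions apart.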

Dave Berri argues, based on the high correlation, that “although there is indeed a difference in the rankings, the difference isn’t very large.” While “very large” is of course a subjective evaluation, I think most basketball fans would feel that changing a player’s value by 4 wins is indeed a very large difference. Certainly the marketplace says the difference is large, as adding 4 wins to your team is worth millions of dollars in salary. Or imagine that NBA salaries were based roughly on OldWP, while actual value was measured correctly by NewWP. Surely a GM that relied on NewWP would have an enormous advantage over his peers, and could be expected (given a reasonable payroll) to routinely put excellent teams on the court. Knowing that Howard is really worth only 50% more than Allen, while the other 29 guys think he’s worth 100% more (whether in dollars allocated, or players traded), is a huge edge.

So in evaluating metrics, I think you want to evaluate how well they capture the degree of variance (perhaps using predictive tests), in addition to whether they get the rankings right.

If you believe in such things (which apparently in WoWland, is not something to be taken for granted):

http://wagesofwins.com/2011/12/23/how-to-judge-predictions/

Good to hear from you, btw. Hope all is well.

Yep, that is true. When I get around to doing my ‘retrodiction predictions’ again, I hope that will help clear some of this up; if old WP is really off by 4 wins on Dwight Howard, the prediction moving forward to the next season will be pretty bad. New WP will presumably do better if the changes made were really for the better.

As a general disclaimer, I’ll also note that I have a much smaller correlation than Dave reports. Presumably I have some mistakes in my database (I’ve found one so far just randomly). If Dave’s database is the correct one, and I have no reason to think it isn’t, then the changes are likely small in general. Some players will have large changes, like Dwight moving by 4 as you point out, but that will mostly (if not entirely) be high minute, high production players. Dave isn’t the only person to say that such players are actually underpaid, and so the market implications there are uncertain. If a max contract already doesn’t pay LeBron enough, it probably still isn’t fair even if we dock him a couple wins. But if it were to turn out that a lot of more middling players had wins that were off by a few, then there would be more of an issue, I agree.

Evan: Happy New Year. Hope you are well. Wow (WOW?), that post is really something. I guess that’s what happens when you turn the asylum over to the inmates….

Alex: Sure, the big changes will be for the starting players, and mostly the good ones. And the spread of talent in the NBA is so large — larger than all the other big sports — that even fairly differently constructed metrics will tend to have pretty high correlations (as you show here). I’m sure that if you averaged all the metrics and identified the top 75 players, those players would each be among the top 100 in virtually every metric (and usually in the top 75). But what that means, I think, is precisely that the important valuation question in the NBA is knowing how to rank those top 75 players and how big the gaps between them are. I wouldn’t give much credit to any metric for telling us (correctly) that the other 300 guys aren’t worth much.

True, the salary cap is a confounding factor. But I think an accurate valuation model is still extremely valuable to a team. If some max salary players are 6 wins better than others, then it matters a lot which of those guys you sign. And if you are packaging players and draft picks to trade for a young stud, it matters a lot whether he’s a 10-win or a 13-win player (just as it matters a lot whether the draft pick you are giving up is worth 2 wins or 4 wins).

BTW, I think Dave’s .98 is based on WP, while you are looking at WP48. His r is higher because MP of course remains the same.
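A deliberately extreme toy example of that mechanism, with three hypothetical players and made-up numbers: when two versions of a per-48 rating are both multiplied by the same, very unequal minute totals, the totals can correlate far more strongly than the rates do. It illustrates the direction of the effect only; it says nothing about how much of the gap between .98 and .916 it explains.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


minutes = [3000, 1000, 500]     # hypothetical minutes played
old_wp48 = [0.10, 0.20, 0.30]   # hypothetical per-48 ratings, version 1
new_wp48 = [0.30, 0.10, 0.20]   # hypothetical per-48 ratings, version 2

# Wins = rate per 48 minutes * (minutes / 48); both versions share the minutes.
old_wins = [r * m / 48 for r, m in zip(old_wp48, minutes)]
new_wins = [r * m / 48 for r, m in zip(new_wp48, minutes)]

r_rates = pearson(old_wp48, new_wp48)   # about -0.5
r_totals = pearson(old_wins, new_wins)  # about 0.94
```

The shared minutes term pushes both versions of the totals in the same direction, so a comparison of WP totals and a comparison of WP48 are not measuring the same thing.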
