## Consistency of Metrics

One of the things that I’ve looked at before as a pet project is the consistency of metrics across seasons.  For example, I mocked up some fake versions of Wins Produced, PER, and NBA Efficiency and found that Wins Produced would be the most consistent across seasons because it emphasizes a more consistent statistic – namely, rebounding.  Now that WP has changed and I have a bunch of other measures, I thought I would revisit the issue with the real thing on the way to doing some other projections.

First, let me clarify what I mean by consistent.  The question is, if you know how a player did last year, how well would you be able to guess how well he’ll do this year?  If a metric is consistent, then you should be able to guess pretty well.  If a metric is inconsistent, then you’ll have no idea: a good player last year could be good again, or average, or below average.  To measure consistency, I’m going to use the simple correlation between a player’s rating in one year and his rating in the next year.
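As a rough sketch of what that correlation captures, here is a plain Pearson correlation on made-up ratings for five hypothetical players (none of these numbers come from the dataset; `pearson_r` is just an illustrative helper name):

```python
from math import sqrt

def pearson_r(prev, curr):
    """Pearson correlation between paired lists of ratings."""
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    cov = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    var_prev = sum((p - mean_prev) ** 2 for p in prev)
    var_curr = sum((c - mean_curr) ** 2 for c in curr)
    return cov / sqrt(var_prev * var_curr)

# Made-up ratings for five hypothetical players (not from the dataset).
last_year = [10.0, 12.0, 15.0, 18.0, 22.0]
same_order = [11.0, 13.0, 15.0, 17.0, 20.0]  # ordering preserved: r close to 1
shuffled = [17.0, 11.0, 20.0, 13.0, 15.0]    # ordering scrambled: r close to 0

print(round(pearson_r(last_year, same_order), 3))
print(round(pearson_r(last_year, shuffled), 3))
```

A metric that behaves like the first pair lets you guess next year from last year; one that behaves like the second pair tells you almost nothing.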

Second, why is consistency important?  Some amount of consistency has to exist, otherwise making predictions is impossible.  Let’s say you’re the GM for the Cavs in 2007.  LeBron is finishing up his rookie contract in a season where you won 50 games and went to the NBA Finals.  LeBron was clearly your best player, and is a top talent in the league overall.  Should you give LeBron a max contract?  If you believe that NBA players are completely inconsistent, the answer is probably no.  Yes, LeBron was good this year but next year he could do anything.  He could be the MVP, he could be average, he could be terrible and sit on the bench all year.  Why would you pay anyone a max contract?  How would you know what to pay anyone?  Your best bet would be to trade anyone who ever did well for 2 or 3 other guys and hope that one of them plays like a superstar.  But if you believe that players are somewhat consistent you can make informed decisions.  LeBron was pretty good in 2007, so he’ll more than likely be pretty good again in 2008, and into the future.  You should sign him, and you have a decent idea of how much to sign him for because you know about how much production he’ll get you if he’s healthy.  You can cut bad players because you know they won’t suddenly become All-Stars.  Consistency is important.  Consistency in the NFL appears to be pretty low; I don’t envy the decision makers in that sport.

Finally, I should describe the dataset.  I took the set I have posted (linked above) and said that a player’s ‘predicted’ rating is simply that rating from the previous year.  For a guy like Al Horford, that’s easy.  He played last year and only for one team.  So his PER, for example, was 20.7 in 2011 and 19.4 in 2010; his predicted PER for 2011 is 19.4.  For players on multiple teams the previous season, their predicted rating is a minute-weighted combination of the ratings from the various teams.  Sticking with the 2011 Hawks, Hilton Armstrong played for three teams in 2010.  He put up a 1.1 PER in 40 minutes for Houston, a 7.5 in 239 minutes for New Orleans, and an 8.4 in 56 minutes for Sacramento.  His predicted PER in 2011 is then 6.9 for both his stint in Atlanta and his stint in Washington.  Rookies are tough to project, so I’ve taken them out for this exercise.  Due to the noise in the measures for low-minute players, I’ve also taken out anyone who played under 500 minutes for a team in a given season.  This leaves about 3300 player-team-season observations in the whole dataset, although some of the measures have fewer since I don’t have numbers (or they don’t exist) for the whole timespan.  Keep in mind that players have separate entries within a single season if they played more than 500 minutes for each team, so they will have different ratings on each team for PER, WS48, old Wins Produced, ezPM, and ASPM.  Hilton Armstrong, for example, outperformed his projected PER on both Atlanta and Washington, but outperformed it by more in Atlanta.  In contrast, new Wins Produced, APM, and old and new RAPM are calculated only at the season level and so they will be the same for players who appear more than once within the same year.
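The minute-weighted combination is just a weighted average. A minimal sketch using the Hilton Armstrong numbers quoted above (the function name is mine, for illustration only):

```python
def minute_weighted_rating(stints):
    """Combine per-team ratings from one season, weighting each by minutes played.

    stints: list of (rating, minutes) pairs.
    """
    total_minutes = sum(minutes for _, minutes in stints)
    return sum(rating * minutes for rating, minutes in stints) / total_minutes

# Hilton Armstrong's 2010 PER by team: Houston, New Orleans, Sacramento.
armstrong_2010 = [(1.1, 40), (7.5, 239), (8.4, 56)]

predicted_2011 = minute_weighted_rating(armstrong_2010)
print(round(predicted_2011, 1))  # 6.9
```

The 239 minutes in New Orleans dominate, which is why the result sits much closer to 7.5 than to the 1.1 from the short Houston stint.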

Ok, so down to business.  Let’s start with PER.  The regression equation for year N predicted from year N-1 is 3.235+.7717*X (X is the N-1 rating); the correlation is .77.  As you might expect, this means that you should predict players to regress to the mean.  An average player (15 in PER) should stay average, but bad players will be better and good players will be worse.  The correlation is fairly high, so PER thinks that players are pretty consistent.  Of course, there’s still plenty of wiggle room; a correlation of .77 means an R squared of .595, so about 40% of the variance in PER comes from something other than a player’s PER in the previous season (age, injuries, teammates, system, other stuff).
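To make the regression to the mean concrete, here is a small sketch that plugs a few prior-year PER values into the equation above. The helper names are hypothetical, and the `no_change_point` calculation (where the line predicts no change at all) is my own addition rather than something computed in the original analysis:

```python
def predict_next(rating, intercept, slope):
    """Predicted year-N rating from the year N-1 rating."""
    return intercept + slope * rating

def no_change_point(intercept, slope):
    """Rating the line maps to itself; ratings on either side are pulled toward it."""
    return intercept / (1 - slope)

PER_INTERCEPT, PER_SLOPE = 3.235, 0.7717  # PER equation quoted above

for last_year in (10, 15, 25):
    predicted = predict_next(last_year, PER_INTERCEPT, PER_SLOPE)
    print(last_year, "->", round(predicted, 2))

print(round(no_change_point(PER_INTERCEPT, PER_SLOPE), 2))
```

A 10 is predicted to rise to about 10.95, a 25 to fall to about 22.53, and a 15 barely moves (14.81). The line predicts no change at roughly 14.2, a bit under the nominal league average of 15.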

Win Shares (WS48) has a regression formula of .0389 + .6059*X with a correlation of .623.  You again get regression to the mean, but the correlation is a fair amount lower.  Wins Produced, which is on the same scale as Win Shares (.1 is average; WP tends to have a wider range of scores though), has a pretty similar equation of .0352+.6285*X and a correlation of .661.  Old Wins Produced (before the defensive rebound adjustment) has an equation of .0285+.6875*X with a correlation of .705.  Unsurprisingly given the consistency of rebounding from year to year, lowering the weight for defensive rebounds has decreased the consistency for Wins Produced.  It has similarly led to an increased (predicted) amount of regression to the mean, but the effect is small; a .2 player is predicted to be .166 under old WP or .161 under new WP, which over 2500 minutes is 8.65 or 8.38 wins.
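The arithmetic behind that comparison, as a quick sketch using the old and new Wins Produced equations above (variable and function names are mine):

```python
def wins_over_minutes(per48, minutes):
    """Convert a per-48-minute win rate into total wins over a given number of minutes."""
    return per48 * minutes / 48

LAST_YEAR = 0.2  # a .2 player in the prior season

old_wp_pred = 0.0285 + 0.6875 * LAST_YEAR  # old Wins Produced equation
new_wp_pred = 0.0352 + 0.6285 * LAST_YEAR  # new Wins Produced equation

print(round(old_wp_pred, 3), round(wins_over_minutes(old_wp_pred, 2500), 2))  # 0.166 8.65
print(round(new_wp_pred, 3), round(wins_over_minutes(new_wp_pred, 2500), 2))  # 0.161 8.38
```

The gap between the two versions is about a quarter of a win over a full season of heavy minutes.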

We’ll step away from boxscore measures briefly to cover the plus/minus types.  Old RAPM (with a 0 prior) has an equation of .0766+.4175*X with a correlation of only .373.  Not only is this the lowest correlation so far, it’s also the first measure to predict average players (in RAPM, average is 0) to improve.  New RAPM, which uses the player’s previous season as a prior, is unsurprisingly much more stable.  It has an equation of -.0109+.7345*X (the intercept isn’t significant) with a correlation of .747.  Thus simply changing the prior is sufficient to move RAPM from the least consistent to nearly the most consistent metric so far.  And to continue beating APM into the ground, its equation is -.215+.4079*X (intercept non-significant) with a correlation of .446.  I’m surprised to see that APM is more stable than old RAPM, but for the most part I don’t care any more.  I hope no one is still looking at APM to inform their decisions.

Finally we have two adapted boxscore measures, ezPM and ASPM.  I call them adapted because ezPM uses play-by-play data instead of what you get from the basic boxscore and ASPM is based on advanced boxscore numbers (and, I believe, uses predictors beyond first-order).  ezPM has an equation of -.159+.575*X (intercept not significant) and a correlation of .569.  ASPM has an equation of -.0471+.6955*X (intercept trends significant) and a correlation of .718.

All of the ‘common’ metrics make a simple prediction of regression to the mean with something of an exception for old RAPM.  Old RAPM apparently believes that below average as well as average players will improve, although to be fair the predicted improvement is very small.  A player starts to regress at a rating of .13, which is barely above average.  The metrics do vary in how much consistency they predict from year to year.  The low end is again old RAPM, with a correlation of only .373, followed by APM at .446.  The others vary from ezPM’s low of .569 to PER’s high of .77.  Ignoring the plus/minus measures, the consensus/average correlation is .67.  That seems fairly reasonable, in part because new RAPM is above that but imposes a certain amount of consistency through its prior.  It also makes a certain amount of sense to me that ezPM is on the low end of the boxscore metrics.  It is the only one to take individual defense into account, which is likely to be more variable than measuring defense (mostly) at the team level.
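Both summary numbers can be reproduced from the figures quoted earlier. This sketch assumes the .67 average is taken over the six boxscore and adapted boxscore measures (that set reproduces it), and that the .13 is where old RAPM’s line predicts no change:

```python
# Year-to-year correlations quoted above for the non-plus/minus measures.
boxscore_r = {
    "PER": 0.77,
    "WS48": 0.623,
    "new WP": 0.661,
    "old WP": 0.705,
    "ezPM": 0.569,
    "ASPM": 0.718,
}

mean_r = sum(boxscore_r.values()) / len(boxscore_r)
print(round(mean_r, 2))  # 0.67

# Old RAPM: predicted = .0766 + .4175 * X. Solving predicted == X gives the
# rating above which a player is expected to decline rather than improve.
old_rapm_crossover = 0.0766 / (1 - 0.4175)
print(round(old_rapm_crossover, 2))  # 0.13
```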

### 4 Responses to Consistency of Metrics

1. EntityAbyss says:

Hey, what is RAPM, and how does it manage to stay consistent? Also, old wins produced here has a .7 correlation year to year, but dberri had it at .8. How did you guys end up with different results?

• Alex says:

RAPM is ridge-regression (or regularized APM). There’s a number of places you could probably find descriptions, but the short version is that instead of the beta weights in the regression coming strictly from the sum of squared errors, there’s also a term that penalizes the size of the betas themselves. In the ‘old’ version of RAPM that I’m aware of, the penalty was for betas that were too far away from 0 (0 is called the prior). This has an assumption that all players should be average and then fits the regression around that. The new version did that for 2002 (I think), then used the 2002 rating as the prior for 2003, and so on up to now. Thus the assumption is that players should be similar to how they performed the previous season. This is how it stays consistent.
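If it helps, here is a toy sketch of that penalty with a single coefficient and made-up numbers. Real RAPM fits every player jointly on lineup data, so this only illustrates how the prior pulls the estimate around; the function name and data are hypothetical:

```python
def ridge_coefficient(x, y, penalty, prior=0.0):
    """One-coefficient ridge fit (no intercept) that shrinks toward `prior`.

    Minimizes sum((y_i - b * x_i)**2) + penalty * (b - prior)**2,
    which has the closed form below.
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return (sxy + penalty * prior) / (sxx + penalty)

# Made-up data: the player is on the court (x = 1) for four stints,
# and y is the point margin in each stint.
on_court = [1.0, 1.0, 1.0, 1.0]
margin = [4.0, -2.0, 6.0, 0.0]

print(ridge_coefficient(on_court, margin, penalty=0.0))             # no penalty: plain least squares
print(ridge_coefficient(on_court, margin, penalty=4.0))             # 'old' style: shrink toward 0
print(ridge_coefficient(on_court, margin, penalty=4.0, prior=3.0))  # 'new' style: shrink toward last year's rating
```

With no penalty the estimate is 2.0. A prior of 0 pulls it down to 1.0, while a prior of 3.0 (standing in for last season’s rating) pulls it up to 2.5, so the estimate ends up closer to whatever the prior says.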

There are probably a few reasons my correlation is lower on WP. I probably still have some errors here and there that I haven’t caught in terms of lining all the player data up correctly. When I’ve done this before, I used adjusted WP48 and then gave each player his known position for the next year; if a guy played mostly at power forward one year and then mostly center the next, I accounted for that. But I didn’t do that this time. I also may have used different minute cut-offs than he did, and I have players split out if they played for multiple teams in the same year instead of combined for their full season. The number can jump around a bit depending on how exactly you organize the data.

2. EntityAbyss says:

Hey Alex. Since the new wins produced has a lower correlation than the previous one, what makes it necessarily better?

• Alex says:

Better than the original version? Well, many people complained that defensive rebounds were overvalued and we know that defensive rebounds in particular show diminishing returns, so reducing their value should make it better; that was the point of the change. In regards to the correlation, lower doesn’t necessarily mean worse. Presumably there’s some value where the correlation is obviously too low to be accurate and some value where it’s too high, but I don’t know if it’s possible to know what those numbers would be. Regardless, I don’t think the value for original or new WP is in danger of being in either area. In the middle it’s a grey area; players obviously change from year to year and it’s unclear what the straight linear correlation ‘should be’ with no accounting for age, changing teams, etc. So the drop in correlation isn’t a concern, I don’t think.