## An Interesting Problem

In my last series of posts, I looked at the predictive ability of Wins Produced, ezPM, RAPM, and APM.  It appeared that RAPM did the best.  One explanation I floated, though I didn’t think it was a good one, is that RAPM assumes more players are average.  This would essentially add a certain amount of regression to the mean across seasons, which may help with predictions.  I decided to look into it a bit more and found some interesting results.

As part of the post, I thought I would walk through another team as an example, which gives me an opportunity to double-check that the numbers are right.  It was a good thing, because I noticed that my code accidentally put all rookies at replacement level for APM (since they had no measure in the previous season) under the average rookie assumption.  That means the APM numbers I reported previously for that case were off.  Don’t worry, I’ll give the correct ones below (and APM is still the worst).  This time I’ll look at the 2010 Denver Nuggets.  Here are their actual performance numbers:

You can see that the metrics generally agree that Afflalo, Allen, Balkman, Carter, Graham, and Petro were below average, while Andersen, Anthony, Billups, Nene, Lawson, Martin, and Smith were above average (although there are some disagreements, and disagreements as to degree).  And here are their performances in 2009:

Ty Lawson is missing since he was a rookie in 2009; I’m just going to assume average rookie performance today (.045 WP48, -1.92 points per 100 possessions).  And everyone gets 0 for ezPM because there are no (usable) numbers for that season.  As you remember, those performances are used to predict performance in 2010.  In this case I’m going to present the player predictions in terms of total points produced over the course of the season.  For ezPM, RAPM, and APM, that’s just the rating times number of possessions divided by 100.  For WP the formula is (WP48-.1)*82*100/(1.927*2.54*48) to get points per 100 possessions, then converted to total points the same way as the other metrics.  Here are the 2010 predictions based on 2009:
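As a minimal sketch of the conversion just described (the function names and the possession count are my own placeholders, not part of the original analysis; the constants come straight from the formula above):

```python
def wp48_to_points_per_100(wp48):
    """Convert WP48 to points per 100 possessions using the formula in the text."""
    return (wp48 - 0.1) * 82 * 100 / (1.927 * 2.54 * 48)


def rating_to_total_points(rating_per_100, possessions):
    """Convert a per-100-possession rating to total points over a season."""
    return rating_per_100 * possessions / 100


# The average rookie WP48 of .045 comes out to about -1.92 points per 100
# possessions, matching the rookie value used for the other metrics.
rookie_rating = wp48_to_points_per_100(0.045)
print(round(rookie_rating, 2))

# Hypothetical possession count, purely for illustration.
print(rating_to_total_points(rookie_rating, 3000))
```

An exactly average player (.100 WP48) maps to 0 points per 100 possessions, which is what puts WP on the same scale as the plus-minus style metrics.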

Then I sum up the points and divide by 82 to get predicted per-game point differential, do this for every team, and find the mean absolute error for each metric.  As presented previously, WP does worse than RAPM (and ezPM in the one season available) at predicting future team point differential.  As I mentioned, one suggestion was that WP had too much of a range; the good players are too good and the bad players too bad.  So I took the predicted points for each player and multiplied them by .8.  For example, Afflalo would be predicted to produce -115 points, Allen -52.75, and so on.  As it turns out, if you do this for all six years of data that I have team performance for, WP improves in every season.  Presented in order of WP, RAPM, ezPM, and APM, the mean absolute error for 2011 is 2.83, 2.51, 2.68, and 3.47; for 2010 (no ezPM) it’s 2.92, 2.97, and 4.58; for 2009 (again no ezPM) it’s 3.16, 3.04, and 4.68; for 2008 (no ezPM or APM) it’s 3.89 and 3.33; and for 2007 it’s 2.41 and 2.48.  If you look at the WP average rookie errors from my previous post, you see that WP improves by .13 to .61 points.  This is enough for it to beat RAPM in two seasons and only be behind by .1 to .5 points in the other seasons.  Previously with average rookie performance it was always behind and typically by closer to a point.
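The team-level calculation can be sketched roughly like this. The function names and all of the numbers are made up for illustration; only the steps (sum player points, divide by 82, optionally multiply by .8, take the mean absolute error across teams) come from the description above.

```python
def predicted_team_diff(player_points, shrink=1.0, games=82):
    """Predicted per-game point differential from predicted player point totals.

    shrink=1.0 leaves the ratings alone; shrink=0.8 applies the regression
    toward average described in the text.
    """
    return shrink * sum(player_points) / games


def mean_absolute_error(predicted, actual):
    """Mean absolute error between predicted and actual team differentials."""
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


# Hypothetical predicted season point totals for the players on two teams.
teams = {
    "Team A": [250.0, 120.0, -60.0, -40.0, 140.0],
    "Team B": [-200.0, 30.0, -90.0, 15.0, -1.0],
}
# Hypothetical actual per-game point differentials for the same teams.
actual = [3.5, -2.0]

for shrink in (1.0, 0.8):
    predicted = [predicted_team_diff(pts, shrink) for pts in teams.values()]
    print(shrink, mean_absolute_error(predicted, actual))
```

Because the .8 is applied to every player, it is the same as multiplying each team’s predicted differential by .8, so it pulls every team’s prediction toward zero by the same proportion.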

That result would suggest that regression to the mean is helpful for WP.  I only checked .8; it’s possible some other number would allow it to beat RAPM consistently.  Here’s the rub: what if you use those regressed ratings to explain the current year (e.g., use 2010’s ratings to explain 2010)?  WP does *worse* in all six seasons.  Pure WP has the best mean absolute error every year (ranging from .123 to .222) while RAPM is the worst (1.69 to 2.28); APM ranges from .705 to .874 and ezPM in its two years has .486 and .491.  But the regressed WP has a range from .506 to .811.  It has fallen behind ezPM and declined by over half a point in accuracy; it overlaps somewhat with APM.
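Here is a rough sketch of why the current-year fit has to get worse. It assumes, as a simplification, that the unregressed ratings add up exactly to each team’s actual point differential (pure WP is close to this but not exact), and the differentials are hypothetical.

```python
def in_sample_mae(actual_diffs, shrink):
    """MAE when a metric that sums exactly to the actual differential is scaled.

    Assumption: the unscaled prediction equals the actual differential, so the
    scaled prediction is shrink * actual.
    """
    errors = [abs(shrink * d - d) for d in actual_diffs]
    return sum(errors) / len(errors)


# Hypothetical per-game point differentials for four teams.
diffs = [6.0, -4.0, 2.0, -3.0]

print(in_sample_mae(diffs, 1.0))  # no scaling, no error under this assumption
print(in_sample_mae(diffs, 0.8))  # error is 20% of the average absolute differential
```

Under that assumption the error from a .8 multiplier is just .2 times the average absolute team differential, so teams far from zero are the ones that get hurt most.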

From this quick look, it appears as though regressing to the mean functions to improve future predictions at the cost of current accuracy.  This appears in the pattern of results for RAPM (best at predicting upcoming seasons, worst at explaining current seasons) and happens with a test of Wins Produced (multiplying the player ratings by .8 improves prediction but hurts explaining current results).  If true, it suggests that my assumption was wrong; the assumption of average players could be enough for RAPM to make the best predictions.  However, it also implies that it does so at the cost of getting the ratings wrong, since RAPM does a relatively poor job at describing current team performance.  This would be problematic because it implies that the player ratings aren’t correct per se, but simply enough in the right direction that the forced regression to the mean smooths things over and makes good predictions.  I think it also suggests that there must actually be a good amount of regression to the mean for players across seasons.


### 4 Responses to An Interesting Problem

1. Guy says:

These findings could show a lot of player regression to the mean, but it’s certainly not clear from these results alone. Just as possible is that your .8 regression is correcting flaws in WP in a crude but effective way (e.g. essentially adding a diminishing returns correction for rebounds and assists). And remember that there’s a practical limit to how poorly WP can ever do, as long as most players stay with the same team from one year to the next. If you just gave every player a rating of .2 * Team Point Differential, your errors might not be that much larger than WP’s.

You could just check to see the amount of regression each metric shows for returning players. For example, is a player’s WP in year N about .8 * WP(n-1)? Do this for each metric, and use the regressed results for all metrics to predict next season, since there’s no reason to give WP alone the benefit of regression.

Doesn’t the fact that ezPM has almost as much player variance, and yet beats both unregressed and regressed WP, suggest that WP has more problems than just next year’s regression to the mean? Also, to the extent players do regress to the mean (which they must), shouldn’t that pose the same obstacle for every metric — why would that uniquely disadvantage WP?

BTW, it’s inevitable that regressed WP does a worse job of predicting current season outcomes. WP (and ezPM as well) is constructed to provide a nearly perfect match with current point differential. Multiply it by any constant other than 1 and you have to introduce errors. The disparity between accuracy in the current season vs next season is not a problem if the difference is simply regression to the mean. But it seems likely WP has bigger issues.

• Alex says:

Using the same dataset, where rookies/low minute players are predicted to perform as average rookies, and looking only at seasons with current and predicted values (which would be 2011 for ezPM, 2007-2010 for RAPM, 2009-2011 for APM, and 1980-2011 for WP), I ran the regression you suggested. WP has a weight of .269, ezPM .19, APM .456, and RAPM .37. I think it’s interesting that RAPM still has a fairly large amount of regression even though it assumes regression to the mean as part of the method.

ezPM is as different from WP as it is from RAPM in terms of player variance – see my response on the other post. But my point was that if you already assume many players are average, you’ve essentially already built in some regression to the mean. Let’s say that the metrics all agree that Kobe Bryant had a great season in 2010, for example. Simply due to their measurement characteristics, RAPM and ezPM will assume that Kobe is closer to the rest of the pack than WP (and APM). If he then regresses to the mean in 2011, RAPM and ezPM will give a better prediction because they already had him at a relatively lower level of performance than WP. So, of these three metrics, WP is uniquely disadvantaged. If APM were accurate in its player ratings, I would predict it to also be disadvantaged this way. (side note: the order of RAPM and regressed WP bounces back and forth, and RAPM beats ezPM in the one year ezPM is available, so I don’t think it’s safe to assume that ezPM beats WP, especially since I didn’t run numbers for ‘optimally regressed’ WP or ezPM)

I didn’t mention in the post, but I did think it was interesting that the other metrics, particularly RAPM, were worse than WP in assessing current outcomes. Every time that came up in the past, it was trivial and not a sign that WP was doing anything correctly; any idiot with a calculator could make a metric that sums up to current point differential. Apparently that is not the case. I find it particularly troublesome that RAPM never gets within a point of the correct differential. Perhaps it has larger issues?

2. Crow says:

What weighted combination of Wins Produced, ezPM, RAPM, and APM produces the best predictions for this team and all teams for one year or a series of years? I’d be interested in that. Unless one metric is vastly better, I’d think a meta-metric or a regressed meta-metric would produce the best predictions, if that is what one is after.