In part 1 I laid out the methods for generating retrodictions for Wins Produced, ezPM, RAPM, and APM. Here in part 2, you get the goods.

Let’s start with my original version, where rookies all got the same average productivity. In 2011, we get the following mean absolute errors: 2.51 RAPM, 2.68 ezPM, 3.32 WP, and 3.95 APM. If rookies get their actual productivity, we get 2.72 RAPM, 2.76 ezPM, 2.77 WP, and 3.50 APM. A couple of things to note: RAPM and ezPM get numerically worse knowing actual rookie production. This is only somewhat interesting for RAPM, since ‘actual’ rookie production for it in 2011 is an assumption of 0 (and you know why if you read part 1). For ezPM it’s only a difference of 0.08 points on average, so probably not a big deal. On the other hand, WP and APM get a decent amount better (by 0.55 and 0.45, respectively). If we had to choose, RAPM seems to make the best picks followed by ezPM, WP, and APM. Looking at one year, though, it’s hard to draw firm conclusions. So…

2010: I can’t use ezPM any more because, according to Evan, the 2009 numbers are not to be trusted. The average rookie errors are 2.97 RAPM, 3.39 WP, and 5.08 APM. Actual rookie errors are 3.22 RAPM, 3.15 WP, and 4.36 APM. Again we see WP and APM improve while RAPM gets worse. APM again brings up the rear, and RAPM is the best under the average-rookie assumption, although WP edges it out (3.15 to 3.22) if actual rookie production is used. The predictions also seem to be generally worse for 2010 than 2011.

2009: The average rookie errors are 3.04 RAPM, 3.77 WP, and 5.02 APM; the actual rookie errors are 2.93, 3.27, and 4.37, respectively. The same pattern is popping up, although this time RAPM improved slightly by knowing rookie production.

2008: There is no 2007 APM data, so APM drops out here. RAPM has average and actual rookie errors of 3.33 and 3.57, while WP has errors of 4.23 and 4.19.

2007: The last year I can look at with more than one metric. RAPM has average and actual rookie errors of 2.48 and 2.50, while WP has errors of 2.54 and 2.68. This is the only season where WP does worse using actual rookie performance.
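For anyone who wants to check the season-by-season pattern, here is a minimal sketch that tabulates how each metric's error changes when actual rookie production replaces the average-rookie guess. The numbers are the mean absolute errors reported above; the names `mae` and `rookie_effect` are made up for this illustration and are not part of the original analysis.

```python
# Mean absolute errors reported above, as (average-rookie, actual-rookie).
mae = {
    2011: {"RAPM": (2.51, 2.72), "ezPM": (2.68, 2.76),
           "WP": (3.32, 2.77), "APM": (3.95, 3.50)},
    2010: {"RAPM": (2.97, 3.22), "WP": (3.39, 3.15), "APM": (5.08, 4.36)},
    2009: {"RAPM": (3.04, 2.93), "WP": (3.77, 3.27), "APM": (5.02, 4.37)},
    2008: {"RAPM": (3.33, 3.57), "WP": (4.23, 4.19)},
    2007: {"RAPM": (2.48, 2.50), "WP": (2.54, 2.68)},
}


def rookie_effect(errors):
    """Return {metric: {season: actual - average}}.

    A negative value means the metric predicted better once it was
    given actual rookie production.
    """
    out = {}
    for season, metrics in errors.items():
        for metric, (avg_rookie, actual_rookie) in metrics.items():
            out.setdefault(metric, {})[season] = round(actual_rookie - avg_rookie, 2)
    return out


effect = rookie_effect(mae)
for metric, by_season in effect.items():
    print(metric, by_season)
```

Positive entries are the seasons where a metric got worse with actual rookie data, so the output makes it easy to see which metrics benefit from that information and which do not.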

Summary: there’s only one season where we can look at ezPM, but it does pretty well that year. It comes in second to RAPM, and is essentially tied with RAPM and WP if actual rookie production is used. Assuming that pattern would hold up, it seems like ezPM does a good job of both explaining existing results and predicting future performance.

APM covers three seasons, and consistently does the worst. There’s some chance that this is due to my replacement player assumption of -3.8. However, since replacement players by definition don’t play many minutes, they shouldn’t affect the results too strongly. APM consistently got better when allowed to use actual rookie performance, suggesting that it can describe within-season performance better than guessing the average. But given its overall performance, APM would not be my first choice as a player metric.

RAPM covers most of the timespan I looked at, and consistently comes in first under the average-rookie assumption. What’s potentially more interesting is that RAPM also did worse when given actual rookie performance in every season except 2009. This is excusable in 2011, when it simply ‘guessed’ that all rookies were average, much like my default filler value, but it makes less sense for ’07, ’08, and ’10. Using WP as the most consistently available alternative, RAPM is better by about half a point of mean absolute error when both use the average-rookie assumption.

Finally, Wins Produced was consistently better than APM but almost always worse than RAPM. It did improve when allowed to use actual rookie production in every season except 2007, which allowed it to make the best predictions in 2010. It was also close to RAPM in 2011 and 2007. So while RAPM is usually ahead, it is not a wide gap; for the four years where both methods can use actual rookie performance (2007 through 2010), RAPM has an average error of 3.06 compared to WP’s 3.32.
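The cross-season averages in this summary can be re-derived from the per-season errors listed earlier. The sketch below is only an arithmetic check, not the code used for the analysis; the variable names are assumptions for illustration. It averages the average-rookie errors over all five seasons and the actual-rookie errors over 2007 through 2010 (2011 is left out because RAPM's ‘actual’ rookie value that year was just an assumed 0).

```python
def mean(values):
    """Plain arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Average-rookie mean absolute errors, 2011 back to 2007.
rapm_avg_rookie = [2.51, 2.97, 3.04, 3.33, 2.48]
wp_avg_rookie = [3.32, 3.39, 3.77, 4.23, 2.54]

# Actual-rookie mean absolute errors, 2010 back to 2007 only.
rapm_actual_rookie = [3.22, 2.93, 3.57, 2.50]
wp_actual_rookie = [3.15, 3.27, 4.19, 2.68]

# Gap between WP and RAPM when both use the average-rookie guess.
gap_avg_rookie = mean(wp_avg_rookie) - mean(rapm_avg_rookie)

rapm_actual_mean = mean(rapm_actual_rookie)
wp_actual_mean = mean(wp_actual_rookie)

print("average-rookie gap (WP - RAPM):", gap_avg_rookie)
print("actual-rookie means:", rapm_actual_mean, wp_actual_mean)
```

The average-rookie gap comes out a little under 0.6, and the four-year actual-rookie means land at roughly 3.06 for RAPM and 3.32 for WP, a gap of a bit more than a quarter of a point.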

In part 3, I’ll talk about some conclusions I drew from the retrodiction results and some methodological issues.

“RAPM seems to make the best picks followed by ezPM, WP, and APM.”

Thanks for reporting those results.

I hope others in the Wages of Wins Network and community will acknowledge the results of this side-by-side test, engage in the discussion of the results and methodology used, or both.

I’ll acknowledge that traditional APM lags, and I will therefore continue to predominantly use RAPM over traditional APM when I refer to an APM-type metric. I say predominantly because I will still check traditional APM for cases where it and RAPM do not agree closely; I think that is worth knowing and investigating. I will continue to refer to a number of metrics as tools for uncovering more about players and metrics, but I will give a bit less or a lot less weight to the lesser performers.

In future metric comparisons that aim to conclude which metric is best, this test suggests it makes sense to compare challengers of any type against RAPM, the current leader. A future comparison of Wins Produced against traditional APM alone, and not against the best current version of RAPM, is going to be, in my view, a clear case of picking an opponent one can beat and avoiding the better opponent.

The rookie issue is a significant one in the comparison. I will have to reread part 1 and some other material on that topic elsewhere. Maybe RAPM can be improved further for rookies in some fashion.

Congratulations on having the guts to post the results when you already knew that RAPM came out ahead. Respect.

Welcome to the darkside, Alex!

