In part 1 I tantalized you with the ability of different NBA productivity metrics to explain what happened. This time we get to the important part, which is their ability to predict the next season.

No wasting time today; here are the results.

You can ignore 2000; since no one was in the database the previous season, everyone was given their actual production (i.e., there is no predicting involved). The winner looks like ASPM with an average error of 2.24 points of differential. Win Shares is next at 2.37, then new RAPM at 2.62, new WP at 2.63, old WP at 2.74, PER at 3.07, old RAPM at 3.23, and APM at 4.24. The one year of ezPM came in after ASPM, new RAPM, and Win Shares.

To put this in a bit more perspective, we can convert these point differential errors to wins. With a very simple prediction method, ASPM is off by 5.7 games on average. Win Shares is only off by 6, so less than half a game across 9 years. New RAPM, which uses the players’ previous rating as a prior, is behind by about a game at 6.7, and virtually tied with new Wins Produced (although the averages are over different years; RAPM is better in three of the four years in the database). Old WP comes in at just under 7 wins of error, and then there’s a jump to PER at 7.8, old RAPM at 8.2, and APM at 10.8.
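The post doesn’t state the exact points-to-wins conversion, but the paired numbers above imply a factor of roughly 2.54 wins per point of season-long differential (e.g. 2.24 × 2.54 ≈ 5.7). A quick sketch of that back-of-the-envelope conversion, with the factor inferred rather than taken from the article:

```python
# Convert each metric's mean absolute error (points of differential)
# into wins. WINS_PER_POINT is inferred from the article's own pairs
# (2.24 -> 5.7, 4.24 -> 10.8), not stated explicitly in the post.
WINS_PER_POINT = 2.54  # assumed conversion factor

mae_points = {
    "ASPM": 2.24, "Win Shares": 2.37, "new RAPM": 2.62, "new WP": 2.63,
    "old WP": 2.74, "PER": 3.07, "old RAPM": 3.23, "APM": 4.24,
}

mae_wins = {m: round(e * WINS_PER_POINT, 1) for m, e in mae_points.items()}
print(mae_wins)  # ASPM -> 5.7, Win Shares -> 6.0, APM -> 10.8
```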

So I should note right now, these results are different from what I got before. RAPM was better than WP previously, and now it’s worse. So now would be the time to double-check that database and let me know; otherwise I’m assuming that I was incorrect before.

Let’s start at the back. APM and PER do not do a very good job of predicting future performance. I don’t have the sense that anyone really uses these to evaluate players besides ESPN, so this isn’t really a big deal beyond the fact that the public will continue to have a poor idea of who the good players are. PER may retain some value as a general measure of public perception, even though you will get highly rated non-superstars, but I think that’s about it.

Next you have Wins Produced and new RAPM. Including a player’s previous season as a prior is helpful for RAPM, which is itself an improvement on APM. So things are heading in the right direction. However, you might hope that RAPM would do better given that it effectively has two previous seasons of information whereas the other metrics only have one. And that’s being generous; 2011’s rating has 2010 as a prior, which has 2009 as a prior, which has 2008 as a prior, etc. For players who have been in the league long enough, 2011’s RAPM rating is influenced by what happened back in 2002.

At the top of the heap (again, barring errors in the data) are ASPM and Win Shares. These are probably the two most complicated metrics, but they’ve apparently earned it. While Win Shares is widely available at basketball-reference.com and they have a whole page dedicated to describing how to calculate it, the actual equations are not present. Nor have I ever actually seen the equations from Basketball on Paper replicated anywhere. Similarly, I haven’t been able to get my hands on the formula for ASPM. I think it was tracked at least somewhat on the APBR site, but it’s become difficult to find things there since the site crashed a little while back. But ASPM uses ‘advanced’ box score metrics and is non-linear (i.e. includes interactions between predictors and/or squared terms and the like). ezPM could be included in this group, since it uses play-by-play level data and would thus be pretty tough for your average person to ever compute on his own. But it appears to be worth it; they all make predictions pretty well.

So you might think that’s the end of the story, but not quite! All good stories come in trilogies. Click here to check out part 3 (coming tomorrow).

ASPM isn’t really complicated; I just haven’t gotten around to writing up the full description (the spreadsheet I sent you has all of the math in it directly).

The general equation is a*MPG + b*TRB% + c*BLK% + d*STL% + e*USG%*{TS%*2*(1-TO%) – f*TO% – g + h*AST% + i*USG%}

Scale so minutes-weighted average in the league = 0, then add j, where j adjusts the total to equal Team Efficiency Differential (either schedule adjusted or not).

The weights were found by minimizing possession-weighted squared error on an unweighted 8-year RAPM dataset, with players under 3000 possessions over those 8 years neglected.

That’s all there is to it! I’ve experimented with nonlinear h coefficients (I think your spreadsheet was nonlinear) and at times nonlinear b coefficients. The improvement is marginal (perhaps 1% in weighted R^2 for each).
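As a rough illustration, the fitting step described above (possession-weighted squared error against a multi-year RAPM target, with low-possession players dropped) might look like the sketch below. The data and the five generic predictors are synthetic stand-ins for the real box-score terms, so the recovered weights mean nothing in themselves:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the real data: rows are players, columns are
# box-score predictors; y is each player's 8-year RAPM rating.
n_players = 500
true_w = np.array([0.3, 0.2, 0.4, 0.1, 0.5])
X = rng.normal(size=(n_players, 5))
y = X @ true_w + rng.normal(scale=0.5, size=n_players)
poss = rng.integers(500, 20000, size=n_players)  # possessions played

# Drop players under 3000 possessions, as in the description above.
keep = poss >= 3000
X, y, w = X[keep], y[keep], poss[keep]

# Possession-weighted least squares: scale each row by sqrt(weight).
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
print(coef)
```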

Maybe I mostly remembered that last part; I thought it was worse. Still, a bit worse than NBA Efficiency :-)

Did you, in your predictions, include any regression? You can get better results with ASPM if you use regression toward a previous-minutes-&-team-efficiency-informed prior.

Nope, no regression at all. This is strictly the bare-bones, equal-footing prediction. I’m hoping to get to more sophisticated projections in the future, but I didn’t want to add anything on top yet. This isn’t meant to be the best possible prediction, but a comparison on level ground.

I thought that was the case, and I agree with your choice here.

I conducted a similar test here:

http://basketballprospectus.com/article.php?articleid=1985

I used basic linear-weights formulae (no team adjustments) from year Y to predict game-level outcomes in year Y+1 (low-minute players were assigned the league average). The dataset was every game from 1987-2011, excluding 1999 and 2000. Here’s how they ranked:

Metric: % Correct
New SPM: 63.2%
Alt Win Score: 63.1%
Old SPM: 63.1%
APMVAL: 62.9%
Production: 62.8%
Thibodeau: 62.7%
Pts Created: 62.6%
NBA Efficiency: 62.5%
Old Win Score: 62.4%
TENDEX: 62.3%
Game Score: 62.0%
Sports Illustrated: 62.0%
VORP: 61.1%
Nash: 60.6%

I also did a test to predict games involving teams with high roster turnover from Y to Y+1. In that test, the APM-based metrics declined in effectiveness, but Alternate Win Score (Dan Rosenbaum’s Win Score tweak) retained its predictiveness even in extreme situations. Because it predicts well in a variety of circumstances, I anointed AWS the reigning king of linear weights metrics.

Thanks Neil. I saw that post when it came up, but I’m not a subscriber so I couldn’t read the whole thing.

How did you convert some of the metrics, like NBA Efficiency or TENDEX, into outcomes? Did you just pick the team with the larger total?

Yeah, the team with the greatest minute-weighted total was predicted to win the game.
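A minimal sketch of that prediction rule, using made-up minutes and ratings (the ratings would be whatever metric is being tested):

```python
def predict_winner(home_players, away_players):
    """Pick the team with the greater minute-weighted metric total.

    Each player is a (minutes, rating) pair, where rating is whatever
    metric (PER, Win Score, NBA Efficiency, etc.) is being tested.
    """
    def weighted_total(players):
        return sum(minutes * rating for minutes, rating in players)

    return "home" if weighted_total(home_players) >= weighted_total(away_players) else "away"

# Toy example: three players per side.
home = [(36, 0.15), (30, 0.05), (20, -0.02)]
away = [(38, 0.10), (28, 0.02), (22, 0.01)]
print(predict_winner(home, away))  # -> home
```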

Ironic that the winner is a stat which Dan R. basically just pulled out of the air in order to make a point (about WP)!

I’m surprised the best is just 63%. Wouldn’t you get about 60% just by picking the home team? Do these metrics really only add another 3%?

According to Scorecasting, NBA home teams have won 60.5% of the time in the last ten years (ending in 2009), but over the history of the league it’s closer to 63%. But I would have to leave any other details/explanations to Neil.

In the sample I used, the home team won 61.5% of the time. So VORP (Kevin Pelton’s very old linear weights metric) and John Nash’s metric actually did worse than picking home teams, and no metric did especially better than home-court. Which sort of adds to Phil Birnbaum’s point about the shortcomings of advanced all-in-one metrics:

http://sabermetricresearch.blogspot.com/2011/01/sabermetric-basketball-statistics-are.html

Of course, the Lewin/Rosenbaum study cited by Phil also found that AWS was the best metric to predict future wins (and outperformed MPG — admittedly a low hurdle), so there seems to be a pattern of it outperforming other metrics. If I was looking at an all-in-one “production” metric, AWS is the one I’d choose.

Alex did you just use the prior season or an average of several seasons for each metric?

Just the previous season. I would need to double-check, but I believe if a player missed a year the program grabbed the last season he did play (e.g. if someone played in 2009 but not 2010, I think 2009 was used to predict 2011). Again, just keeping it simple.
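That fallback logic amounts to something like the following (a hypothetical helper, not the actual program):

```python
def prior_rating(history, year):
    """Return a player's most recent rating before `year`, if any.

    `history` maps season -> rating for one player. E.g. if he played
    in 2009 but not 2010, the 2009 rating is used to predict 2011.
    """
    past = [season for season in history if season < year]
    return history[max(past)] if past else None

ratings = {2008: 1.2, 2009: 0.8}   # no 2010 season on record
print(prior_rating(ratings, 2011))  # falls back to 2009 -> 0.8
```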

Alex, did you adjust the point differential for strength of schedule, like using SRS instead? Given that RAPM adjusts for the strength of the opponents’ players too, it will likely differ more from the raw point differential than from the SRS. Just a guess.

Btw, if you want, you can include my SPM metric in such a comparison as well. 2010-11 for example: http://bbmetrics.wordpress.com/player-ratings/2010-11/

It should also do better if the results are compared to SRS rather than to MOV.

I didn’t adjust for schedule at all. If I work on player projections enough I might get that far in though.

I’ll see if I can get those added. wilq sent me a file with SPM as well; I’ll have to see if they’re the same as your rankings or a different flavor.

I think it would be more fair for RAPM here. Also, I saw that you used the average over the available years. If you just use the last 4 years, RAPM is actually the 2nd best behind ASPM.

ASPM: 2.52
newRAPM: 2.62
WS: 2.73
newWP: 2.84
oldWP: 2.98
PER: 3.21
oldRAPM: 3.21

The SPM shouldn’t be the same as wilq’s. And if you want them in some sort of special format (like Excel, csv, etc.), I can provide them to you via e-mail. Just let me know.

Nice work.

I’d be interested in two other simple tests, just for comparison, to see how well these metrics do compared to some other simple things:

1) Previous year’s team point differential, or possibly the previous year’s point differential / 2, for regression-to-the-mean purposes.

2) Simple player on-court +/- per minute (divided by 5, since there are 5 players per team).

The first one is easy, assuming I lined up the franchises correctly across moves. The mean absolute error for previous year’s point differential is 3.04; if you divide by 2 it’s more like 2.45. I don’t have the data for the second one.
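For anyone wanting to replicate that first baseline, it amounts to something like this sketch. The differentials below are invented for illustration; the 3.04 and ~2.45 figures above come from the real data:

```python
def baseline_mae(prev_diff, next_diff, shrink=1.0):
    """Mean absolute error of predicting next year's point differential
    as this year's differential times a shrinkage factor (shrink=0.5
    corresponds to the divide-by-2 regression-to-the-mean version)."""
    errors = [abs(shrink * p - n) for p, n in zip(prev_diff, next_diff)]
    return sum(errors) / len(errors)

# Made-up team differentials for year Y and year Y+1.
prev = [5.2, -3.1, 0.4, 7.8, -6.0]
nxt  = [3.0, -1.5, 1.2, 4.9, -2.8]
print(baseline_mae(prev, nxt), baseline_mae(prev, nxt, shrink=0.5))
```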
