Last time I looked at the ability of three different metrics (ezPM, WP, and RAPM) to predict how teams would do this past season. That post has all the details, so you should give it a read. It turned out that ezPM did the best and WP the worst, although performance was very close between the three. This time I’ll look at 2010.

In order to predict 2010, I'm primarily using player performance from 2009. Unfortunately, that means I can't use ezPM, because Evan has told me that the data for that season is likely flawed. Doubly unfortunately, it also means I can't use RAPM, because I used Evan's data to get the number of possessions played. So I'm left with Wins Produced. In 2011, WP had a mean absolute error of 8.3 wins and 3.32 points of differential; in 2010, those numbers were 8.47 and 3.39. Very similar. The biggest misses in 2010 were Phoenix and OKC, followed by New Jersey, Milwaukee, Golden State, Minnesota, Sacramento, Toronto, and Boston (that sounds like a lot, but that group is clumped together without a clear cut-off line). Phoenix was better than expected because Channing Frye and Robin Lopez weren't terrible, Goran Dragic improved, and Nash, Richardson, and Stoudemire all got better as well. OKC was also better than expected for the reasons you would think: all their young players improved. Durant alone added 8 wins, and there were also unexpected additions from Harden, Ibaka, and Westbrook.
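For reference, the error measure here is just a mean absolute error over the 30 teams. A minimal sketch, with invented win totals rather than the real 2010 numbers:

```python
# Mean absolute error between predicted and actual team wins.
# These three team values are made up for illustration, not real 2010 data.
predicted = {"PHX": 39, "OKC": 41, "NJN": 30}
actual    = {"PHX": 54, "OKC": 50, "NJN": 12}

mae = sum(abs(predicted[t] - actual[t]) for t in predicted) / len(predicted)
print(round(mae, 2))  # prints 14.0: the average miss in wins for these teams
```

The same calculation on point differential instead of wins gives the second set of numbers quoted above.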

Since I don't have much else to say at this point (although I do hope to do these retrodictions for WP as I have time to type in team performance for past seasons), I'll mention a couple of other things I saw in the data. One is the correlation between the three measures. For 2011, WP48 and ezPM100 had a correlation of .73, which I find perhaps surprisingly high, although there's obviously plenty of room for disagreement. The correlation between WP48 or ezPM100 and RAPM is unknown for 2011 since I didn't use that data. Looking at 2010, the WP48/ezPM100 correlation is down to .55; the WP48/RAPM correlation is .33, and the ezPM100/RAPM correlation is only .16. I have a few other seasons of RAPM, so here are the WP48/RAPM correlations for 2009, 2008, 2007, and 2006: .22, .32, .29, and .29.

If you think that players are fairly separable from their teams (which is part of the reason we do this whole statistical analysis thing in the first place), then these correlations are actually artificially inflated to a certain degree. That's because each player is listed once for each team he plays for in a season, so any player who plays for multiple teams contributes multiple data points. If he plays about the same for each team, and each system picks up on that, the correlation between the systems will increase.
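The inflation effect is easy to see with a toy example. The ratings below are invented, not real player values: two systems disagree on two role players but agree on one star, and listing the star twice (as if he'd been traded mid-season) nudges the correlation upward.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical ratings from two systems: they disagree on the first two
# players but agree that the third is a star.
sys_a = [0.0, 1.0, 10.0]
sys_b = [1.0, 0.0, 10.0]
r_once = pearson(sys_a, sys_b)

# If the star played for two teams, he shows up as two rows in each file:
r_twice = pearson(sys_a + [10.0], sys_b + [10.0])

print(round(r_once, 3), round(r_twice, 3))  # the duplicated row raises r
```

Averaging (or minutes-weighting) multi-team players down to one row per player before correlating would remove this artifact.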

Assuming that the drop in correlation between WP48 and ezPM100 from 2011 to 2010 generalizes, the three systems should be much less similar for 2010 and earlier seasons. If I can get my hands on some good data, I'm looking forward to filling in more retrodictions. I'm guessing there will be bigger differences, and with enough seasons one method may emerge as the preferred option.

Nice work, Alex. I look forward to seeing additional seasons.

A few thoughts:

The best test for metrics IMO is when players change teams or minutes played. So it would be interesting to divide teams in terms of the percentage of player-MP in common with the prior season, to see whether certain metrics do best on high-turnover teams.
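One simple way to operationalize "pct common player-MP" would be the share of this season's minutes that went to players who were also on the roster last season. A sketch with invented minute totals:

```python
# Roster continuity: fraction of current-season minutes played by players
# who were also on the team the prior season.  Minute totals are invented.
prior   = {"Nash": 2600, "Stoudemire": 2200, "Bell": 1900}
current = {"Nash": 2500, "Stoudemire": 2400, "Frye": 2100, "Dragic": 1300}

returning = sum(mp for p, mp in current.items() if p in prior)
continuity = returning / sum(current.values())
print(round(continuity, 3))  # prints 0.59: about 59% of minutes are returning
```

Sorting teams by this number and comparing each metric's errors in the high- and low-turnover halves would be one version of the test.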

Beyond comparing metrics, perhaps your analysis can identify strengths and weaknesses in a given metric. Do poorly predicted ezPM teams have any common characteristics, such as players whose ezPM value depended heavily on assists? Are under-performing WP teams ones that were predicted to rebound strongly? And so on.

Could you extend this to looking at frequently used floor units? This would give you many more 'teams' to look at, with more variance in skill combinations. Quality of opposition becomes a factor, but every metric will have the same handicap there.

Thanks Guy. It might not be too hard to come up with some measure of turnover with the data set I have; I'll look into it. I could also probably do a little work to see what leads to changes in predictions, though that would take somewhat more effort. The toughest part of the project was actually combining all the data from different sources, so I tried to keep only the numbers I needed, which means that for now individual player boxscore stats are left out of my final set. But it's something I should be able to add. Related to that, analyzing at the unit level isn't something I would be eager to jump into. I know J.E. has been doing projects along those lines with RAPM, but I'm not too excited about playing with one of those raw data sets. In the little bit I've looked at the APM data I downloaded, I think I saw minor disagreements in possessions played between APM and ezPM, so I can only imagine the difficulty of reconciling play-by-play data across different metrics.

Speaking of the APM data, have you ever noticed the issue I mentioned? Take Kurt Thomas as an example (he’s near the top of the data set, so he’s who I noticed). The 1 or 2 year APM and error listed on the website does not appear anywhere in the data you can download. It’s true consistently across seasons. Any guesses?

I can see that analyzing units would be a huge increase in work. Hopefully someone will tackle it at some point — I do think that’s the unit of analysis that best allows you to separate players’ individual impact from their teammates’.

Does your datafile include players’ boxscore stats as well as the metrics? If so, I think it would be interesting to identify over- and under-performing teams (in any given metric), and then look at 3 elements: rebounding, assists, and usage (maybe fouls as well). I think those are the variables where there is least consensus about their “true” value. It could tell us a lot if the mis-predicted teams tended to be very strong or weak in certain variables.

I don't know the APM data. But Evan might be able to help here.

I could have boxscore stats included, but only for a couple seasons. It’s just a practical issue – I have a ton of seasons of WP that includes boxscore stats, but it’s summed to the season level, so those stats are not broken out by team played for if someone played for multiple teams within a single season. Then I have the same number of years of WP broken out for each player by team, which I used as my base for all the work, but that file does not have the boxscore stats. The ezPM files have boxscore stats but there are only two usable seasons. The RAPM data has no information other than RAPM values. And the APM data has minutes and possessions, which I think I’m going to go ahead and use, but no other boxscore numbers. Word on the street is that some other people are working on creating a single stockpile with a variety of these metrics, which should be available soon, so hopefully at that point this would all be easier and more of these questions can be answered quickly.

It’s kind of surprising to me that systems that are so uncorrelated produce such similar/effective results. Or would you characterize them as “equally flawed” rather than “equally effective”?

How about averaging ezpm and rapm and testing that, since they were simultaneously the best measures and the least correlated?

I’m not sure that I would want to make a call about flawed versus effective. That depends on how good you think it is to be off by a point or so on average, and these are not complicated predictions; they could definitely be enhanced with an aging curve if nothing else. So when I put it that way, it sounds like they’re doing pretty well.

Combining the metrics in some way is an idea I’ve had, but haven’t gotten around to implementing yet. Maybe in the future.
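One plausible way to combine them, since the metrics live on different scales, would be to standardize each one before averaging. The player values below are invented, purely to show the mechanics:

```python
import statistics

def zscores(values):
    """Standardize a list to mean 0 and standard deviation 1."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]

# Hypothetical ezPM100 and RAPM values for the same five players.
ezpm = [3.1, -0.5, 1.2, -2.0, 0.4]
rapm = [2.0, 0.3, -1.1, -2.5, 1.0]

# Equal-weight average of the two standardized metrics.
combined = [(a + b) / 2 for a, b in zip(zscores(ezpm), zscores(rapm))]
```

Since the two systems are weakly correlated, an average like this could cancel some of each system's noise; unequal weights (or a regression on past wins) would be the next refinement.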