Earlier in the season I took a look at how well a couple of statistical models (ezPM and Wins Produced) could predict team results. This isn’t impressive on its face; everyone makes predictions. But what I wanted to do was put them on even ground to more fairly evaluate which one does better. When people make predictions, they make all sorts of assumptions beyond the quality of the players: how many minutes each player will get, who might get injured, who might get traded, how rookies will do, and so on. My goal was to eliminate as many of the assumptions as possible. You can read the articles to see what I did, but I’ll walk through it below as well. Now that the season is over, I can retrodict the full results. I also got a hold of some other numbers so I can compare more stats. Let’s get to it.

The stats involved are Wins Produced, provided by Dave Berri; ezPM, provided by Evan Z; and RAPM, provided by Jeremias Engelmann. I have WP numbers for every player since 1978. Evan has ezPM calculated for the past three seasons, but has told me that the 2009 data is probably not correct, so I’m not going to use it. Finally, I have RAPM for 2006 through 2010. I am not using the current season’s data because the numbers available on the site include the playoffs, which the other sets do not. That isn’t too important because, as you’ll see, this season’s player data would only be used to predict how teams do next year, and we aren’t doing that because next year hasn’t happened yet. I was also going to include standard APM, but the data I downloaded from basketballvalue doesn’t match what’s posted on the site, as best I can tell, and I haven’t heard back from Aaron about it yet. If you’d like to take a look, a few people have posted similar retrodictions for the past year at the APBR site. They’re using different assumptions than I will be, but you can see numbers from Daniel (DSMok1), Evan, Jeremias’ 1- and 3-year RAPM, and Aaron’s APM.

So how does this work? For every player this year I found the last season they played in the NBA. For a lot of guys, this is 2010. I took their per-minute productivity (for WP) or per-possession productivity (for ezPM and RAPM) and assumed that they would have the same productivity this year. If a player played for multiple teams last season, their productivity is the minute/possession-weighted average across teams; this doesn’t apply to RAPM, where players get a single number for the whole season. If a player played in the past but not last season (like Von Wafer below), they get their productivity from that most recent season, if available. If a player didn’t play before, such as a rookie or someone who isn’t in a dataset for some reason, they get an assumed level of production. For WP that is .045 WP48; for ezPM, -1.95 ezPM100, as was used in my other post; and for RAPM, -3.9 points per 200 possessions. I picked that number because ezPM and RAPM are both per-possession point measures, and -3.9 per 200 possessions is the same rate as -1.95 per 100, so giving them the same rookie/unknown production assumption seems fair.
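The rate-assignment step above can be sketched in a few lines of Python. To be clear, the function names and data layout here are my own illustration, not the actual code or spreadsheets behind the numbers:

```python
# Rookie/unknown defaults from the text: .045 WP48 for Wins Produced,
# -1.95 per 100 possessions for ezPM, and -3.9 per 200 possessions for
# RAPM (the same per-possession rate as ezPM's).
ROOKIE_RATE = {"wp48": 0.045, "ezpm100": -1.95, "rapm200": -3.9}

def carry_forward_rate(history, metric):
    """Return the rate from the most recent season a player appeared in,
    or the rookie/unknown default if he has no prior seasons.
    `history` maps season -> rate for the given metric."""
    if not history:
        return ROOKIE_RATE[metric]
    return history[max(history)]  # max key = most recent season

def weighted_rate(stints):
    """Minute- (or possession-) weighted average for a player who split
    a season across teams. `stints` is a list of (rate, weight) pairs."""
    total = sum(w for _, w in stints)
    return sum(r * w for r, w in stints) / total
```

So a player with a 2010 rate simply carries it forward, and a player traded mid-season gets the weighted blend described above.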

Now I have a per-minute/possession productivity measure for every player this year. One other thing happens for WP: as the only model that cares about position, I assume that we know the position each player will play this year. So WP48 for 2011 is calculated by taking 2010’s ADJ P48, adding 2011’s WP48, and subtracting 2011’s ADJ P48 (the difference between WP48 and ADJ P48 depends only on position, so this just swaps in the 2011 position adjustment). Now that everyone has their per-minute/possession productivity, I calculate total productivity by multiplying by 2011’s actual minutes/possessions played. Why? Because it takes out any assumptions about what rotation a team will use or who will get injured. Every model is on equal ground. All the players on a team have their total production summed up, and thus we have predicted team wins (for WP) or predicted team point differential (for ezPM and RAPM). I’ll use the typical wins-to-points equation, which says that a per-game difference of one point is worth 2.54 wins, so that I can convert between the two and get predictions for both wins and differential for each model.
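That conversion works out to wins = 41 + 2.54 × differential. A minimal sketch, with the constants (41 average wins, 2.54 wins per point) taken straight from the text:

```python
AVG_WINS = 41.0        # average wins in an 82-game season
WINS_PER_POINT = 2.54  # wins per point of per-game differential

def wins_from_diff(diff):
    """Convert a per-game point differential to predicted wins."""
    return AVG_WINS + WINS_PER_POINT * diff

def diff_from_wins(wins):
    """Convert predicted wins back to a per-game point differential."""
    return (wins - AVG_WINS) / WINS_PER_POINT
```

For example, a +4.45 differential maps to about 52.3 wins, and an average (zero-differential) team maps to 41.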

Let’s walk through the Boston Celtics as an example. The Celtics managed to have 21 players suit up this season. Ray Allen was one. Last year he also played for the Celtics, and only the Celtics. He had a .126 WP48/.258 ADJ P48, a .3 ezPM100, and a 3.9 RAPM. Each of those numbers is carried forward as a per-minute/possession productivity prediction for 2011, with the exception that WP adjusts his position, ‘knowing’ that he’ll play at 2 instead of 2.06. So his predictions for 2011 are .132, .3, and 3.9. Ray played 5410 possessions in 2890 minutes, so we multiply each of those numbers appropriately and find that WP predicts Ray would contribute 7.95 wins, ezPM predicts he would contribute 16 points, and RAPM predicts he would contribute 105.5 points. Another Celtic was Luke Harangody. Since he was a rookie, he gets .045 WP48, -1.95 ezPM, and -3.9 RAPM. In the 462 possessions over 241 minutes he played for Boston, WP predicts he would produce .23 wins while ezPM and RAPM predict he would produce -9 points. As a final example, we can look at Von Wafer. He hadn’t played since 2009, so I don’t have an ezPM estimate for him. Also, since there’s no ezPM, I don’t have a count of possessions played. So his WP estimate is based on 2009 and predicts a WP48 of .068 and .78 wins produced. But he’s treated as a rookie for ezPM and RAPM and thus gets -1.95 points per 100 possessions and -20.4 points produced.
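The per-player arithmetic above just follows each metric's own scale: per 48 minutes for WP48, per 100 possessions for ezPM, per 200 for RAPM. A sketch (the function names are mine, not from any of the metrics' authors):

```python
def wp_wins(wp48, minutes):
    """Predicted wins from a WP48 rate over a number of minutes."""
    return wp48 * minutes / 48.0

def ezpm_points(ezpm100, possessions):
    """Predicted points from an ezPM100 rate over a number of possessions."""
    return ezpm100 * possessions / 100.0

def rapm_points(rapm200, possessions):
    """Predicted points from a per-200-possession RAPM rate."""
    return rapm200 * possessions / 200.0
```

Plugging in Ray Allen's numbers reproduces the totals above: wp_wins(0.132, 2890) is about 7.95 wins, ezpm_points(0.3, 5410) about 16 points, and rapm_points(3.9, 5410) about 105.5 points.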

If I add up all the predicted wins produced by the Celtics players, I get 48.9 wins. This can be converted to a per-game point differential of 3.09 by subtracting off 41 (the average number of wins) and dividing by 2.54. Similarly, I add up the predicted points produced by ezPM and get 364.8, which is a per-game differential of 4.45. This converts to 52.3 wins. RAPM produces a 1.34 differential and 44.4 wins. Boston actually won 56 games with a 440/82 = 5.37 point differential. The measure of error I’ll use is absolute deviation. In terms of wins, every model came in below how Boston actually did, but the error would always be a positive number even if someone predicted over. WP has an error of 7.1, ezPM 3.7, and RAPM 11.6. In terms of point differential, WP has an error of 2.28, ezPM .92, and RAPM 4.03. So when it comes to Boston, ezPM produced the best predictions in terms of both wins and point differential. We can look at the predictions to see why this happened. Ray Allen, Paul Pierce, and Kevin Garnett did better than expected for WP to the tune of about 3 wins each, and Marquis Daniels was not as terrible as expected. I can’t evaluate RAPM since I didn’t use its numbers for this season, but according to RAPM the Celtics only had 4 players of any note in the positive range (Rondo, Pierce, Garnett, and Allen). Finally, ezPM was fairly on-target, but it also underestimated Allen and Daniels. It got most of its improvement by being close on Garnett and Rondo (actually slightly overestimating both).
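The team-level roll-up can be sketched the same way: sum the predicted points, convert to a per-game differential and to wins, and take absolute deviations from the actual results. The function name is mine; the 41 and 2.54 constants are from the text:

```python
def team_error(pred_points, actual_wins, actual_diff, games=82):
    """Return (win error, differential error) as absolute deviations,
    given a team's summed predicted points and actual results."""
    pred_diff = pred_points / games           # per-game differential
    pred_wins = 41.0 + 2.54 * pred_diff       # wins-to-points equation
    return abs(actual_wins - pred_wins), abs(actual_diff - pred_diff)
```

With the ezPM numbers for Boston (364.8 predicted points against 56 actual wins and a 5.37 differential), this reproduces the 3.7-win and .92-point errors above.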

Of course, we don’t want to look at just one team to draw conclusions. I went through the same exercise for each team this past season. Here’s the tally: the mean absolute error for wins for WP was 8.3. The biggest misses were Philly (mostly due to Elton Brand’s resurgence) and Minnesota (everyone not named Kevin Love was terrible), followed by a group of Miami, Golden State, and Phoenix. For point differential the error was 3.32, which pretty much matches up with the win error. For ezPM, the error for wins was 7.13 and the error for point differential was 2.78. The biggest error was for Miami, and then similar-sized errors for Philly, Minnesota, Washington, and the Lakers. And finally for RAPM, the average error for wins was 7.8 and the error for point differential was 2.88. RAPM was most wrong about Chicago, Cleveland, Minnesota, and the Spurs.

To summarize, ezPM did best followed by RAPM and WP. They were all within 1.3 wins or half a point of differential, however, so it’s pretty tight. The most surprising team was Minnesota, which all three systems were pretty wrong about. I think it’s interesting that RAPM was wrong about different teams than ezPM and WP though; the two box score methods had a bit more agreement.

Next post, possibly coming later today: 2010.


Could you post all data [or link to a file with it] for all those metrics?

I’ll see if people are ok with that. The majority of it is available from the sites I linked to though.

It’s fine by me. I already make it public.

“The majority of it is available from the sites I linked to though.”

I know but what’s the point of repeating work which you have already done? 😉

I can certainly appreciate that 🙂 See my reply to Guy; my hope is that something like what you want will be available soon. I’m still playing with some of my stuff (like adding in APM), so I’m going to hold off for now.

I wanted to ask about it via e-mail but I can’t find it, so…

“I have WP numbers for every player since 1978” where do they come from?

Did you recreate them? Have you received them from Prof Berri?

Could you “pass them along” to me? 😉

My email address should be in the ‘about’ page. I did get the numbers from Prof. Berri; I also had a (potentially) different set for a smaller time period downloaded from nerdnumbers back when it was still up. Prof Berri told me that they’re working on a big page that will have tons of these things all available, so I would get in touch with him or hope that it pops up soon. I’m hoping it does; I’m sitting on another analysis for now so I can check it against that data set.

Pretty cool. Alex, I think you’re using the wrong rookie value for RAPM. When Jerry talks about “200 possessions”, he really means the same thing as we do when we talk about “100 possessions”. Can you try running the numbers for RAPM again using -1.95 like you did for ezPM? I think it will perform better. Anyway, I’ll take a win any way I can get it!

Thanks Evan. I’ll give it a look. I guess I’m surprised it did so well if I have the number of possessions off by a factor of 2. It means that every player is producing half as many points as they actually should be.

Alex, do these results make you re-consider the effectiveness of adjusted +/- metrics, such as RAPM? Do you think Berri is even aware of RAPM? I assume he is, but probably does not discuss it, because it does not fit into his overall narrative.

Also, if you haven’t already, you should look into LambdaPM, a new +/- technique that combines APM with box score stats.

http://www.sonicscentral.com/apbrmetrics/viewtopic.php?f=2&t=247&sid=c6bcd044c91a94c322f929382232ea16

Not really. I only got to look at one season of RAPM. And I still have a variety of issues with the method. But my current plan is to use the APM data that I’m unsure about so I can look at more seasons, and then I’ll have a better feel for things. I’m sure that Berri is aware of RAPM, but has at least some of the same concerns that I do.

I read the thread for lambdaPM. I don’t see much point in checking it out until there are multiple years available for study.

The name is not Julian and the whole thing doesn’t make sense if you use different numbers for rookies for the different ratings. Not that predicting the past makes any sense at all. In 20 minutes I can put together a player ranking that greatly outperforms all those mentioned, but nobody cares about predicting the past. And nobody should

I had a typo, sorry about that. It’s fixed now.

I’m not sure how the rookie rating would matter. I assumed the same level of performance for each metric; the .045 WP48 is the same as -1.9 points per 100 possessions. That being said, someone else suggested that I allow each metric to use a rookie’s actual production, which would fix that concern since each metric would use its own rating. But I’ll be surprised if it causes any leapfrogging in predictive ability.

If you have a ranking that predicts future outcomes that well, I’m sure there are NBA teams waiting to throw you lots of money. Of course, if you don’t use it to predict previous seasons, thus showing that the rankings work, I don’t think anyone will care.

I was confused by EvanZ’s comment in which he mentioned that you should re-run things with the same rookie value for ezPM and RAPM. Did you do that already? If rookies get the same value everywhere, that’s good. It just didn’t seem that way from the comments. Please don’t use rookies’ “actual production”, that is absolute nonsense.

One more thing that’s not clear to me. Are you using 2010 ratings to “predict” 2011 or 2011 ratings to predict 2011. If you use 2010, what does “I am not using the current season’s data because the numbers available on the site include the playoffs, which the other sets do not” mean? You should not use the current season’s data because that would be cheating, not because it includes playoffs

Because I misunderstood the description for RAPM, rookies were treated as extremely bad players. I’m working up a post that covers more time in general, and that will use the same rookie production assumption for all metrics (.045 WP48/-1.92 points per 100 possessions for ezPM, RAPM, and APM). RAPM performs much better now. Some of that may also be due to going through and correcting mistakes in the database by hand; it would really be nice if everyone would use the same format for player names. I’m also crunching the numbers where each metric gives a rookie his actual production for that year; for example, in predicting 2011 Blake Griffin would get .246 WP48/unknown RAPM (which I’ll explain in a second)/2.56 ezPM100/6.72 APM. Why would you be against that?

2010 is used to predict 2011, and in my other post 2009 is used to predict 2010. I was a little unclear about RAPM, although in the sentence after the one you quote I say that it doesn’t matter because 2011 RAPM wouldn’t be used. But, unless Jeremias standardizes the numbers he publishes it does mean that I wouldn’t feel comfortable using 2011 RAPM to predict 2012 after that season is over (assuming it gets played). It also means that if I run the rookie analysis just described I won’t be able to look at RAPM for 2011 because I won’t use the 2011 ratings.

Alright, if you use the same rookie rating for every metric and if you use ratings from the year prior when evaluating the metrics, then it’s all good. There are still a couple of things I don’t understand but they might not be very important. They include:

“I assumed the same level of performance for each metric”

“Because I misunderstood the description for RAPM, rookies were treated as extremely bad players.”

Which one is it? Do the two statements not contradict each other?

“unless Jeremias standardizes the numbers he publishes”

What do you mean by “standardizes”?

You don’t want to give rookies their “actual production” from that year for a variety of reasons. One is that the best-scoring metric would then be one that gives every non-rookie a zero and gives all rookies some value such that (rookie rating)*possessions corresponds to team point differential.

stats-for-the-nba.appspot.com uses player names from the basketballvalue player files, if that helps you.

For possession numbers you can use the basketballvalue matchup files

In the piece that is up right now, ezPM and RAPM do not use equal ratings for rookies. But, in the follow-up they do. And as I mentioned, RAPM does much better now. Part of that is also probably due to me fixing numbers for players that had not been aligned properly.

Jeremias offers a variety of numbers on his website, but the one-year RAPM for 2011 is not done in the same way as 2006-2010. Namely, it included the playoffs (whereas the rest were regular season) and I think he said he assumed a lambda value instead of allowing the algorithm to choose it. It’s actually worse now; apparently within the past couple weeks he’s made it so that the one-year calculations include the previous season’s rating as a prior for each player (but not for 2011, when 0 is the prior, for some reason). And to be super nitpicky, the names that he uses for 2006 are terrible and don’t follow what he did for more recent seasons. It would be nice if he standardizes the numbers so that each season is calculated the same way.

Your rookie hypothetical doesn’t apply to any of these metrics, where each player gets his own rating and is theoretically independent of all his teammates. Is there a reason to not use actual production with WP/RAPM/APM/ezPM?

“Is there a reason to not use actual production with WP/RAPM/APM/ezPM?”

That would defeat the entire goal of *retrodiction*. Your result should not depend in any way on data coming from the current season.

I agree in general. I’m just trying to think of a particular reason not to. It feels like knowing rookie production should just improve the overall accuracy; is there a reason to think it would make a difference across metrics?

Of course, it would make it more accurate. It would also make it more accurate to use ratings for all the players, not just rookies. But that would not be *retrodiction*. Not sure how to make that argument any clearer. But it’s your thing, so it doesn’t really bother me what you decide to do. Just be aware that most apbr-type folks would raise the same question.

I’m on board with the definition of retrodiction; that’s why I did it the way I did the first time. But, since I’ve been asked about it, I’m just trying to think about what the harm would be. Is there a reason to think that using actual rookie production would give one metric a benefit over the others? And it seemed like there was a decent amount of debate on the APBR board about what exactly to do with rookies. I wouldn’t see the harm in presenting both sets of numbers.

*except minutes, of course

“Namely, it included the playoffs (whereas the rest were regular season)”

everything on the site includes playoffs as of now

“he assumed a lambda value instead of allowing the algorithm to choose it”

that’s not true

“It’s actually worse now; apparently within the past couple weeks he’s made it so that the one-year calculations include the previous season’s rating as a prior for each player”

what’s so bad about this?

“[..]but not for 2011”

Not true. There are two versions for ’11. One works exactly the way you described, with ’10 ratings as a prior

The names, as pointed out before, are the names basketballvalue provides. If a player has a weird name in the basketballvalue data, that’s how it is.

“It would be nice if he standardizes the numbers so that each season is calculated the same way”

I assume you want 0 as a prior for every year? If you want to retrocast ’11 I don’t see a problem with using ’09 data to compute ’10 ratings. Do you?

This is where I’m looking – http://stats-for-the-nba.appspot.com/. Here are the sections I see: a multi-year ranking that sounds like it includes 5 years of data. A 5 year that includes coaches. A ranking for 2011 that includes the playoffs and specifically lists a lambda. It uses 0 as a prior for every player. Then for 2006-2010 (and some of 2002) RAPM that includes playoffs but uses the previous season as a prior. And in terms of single-player RAPM, that’s it.

Turning to your comments: everything including playoffs just means that I would not use RAPM for comparison to other metrics any more because the other metrics do not include playoff data. RAPM gets an inherent advantage by having more data.

Maybe he let cross-validation choose the lambda, but he doesn’t list it for any other RAPM single-season, so I assumed he did something different. Why else would he specifically mention that one?

The bad part of using previous seasons as a prior is that, again, it makes RAPM completely non-comparable to other metrics. If RAPM predicts team outcomes better than the others, is it because it’s an inherently good method? Or because it has extra data from the playoffs? Or because it has extra information by using the players’ previous ratings?

As I described, I don’t see a link for 2011 where 2010 is used as a prior. Maybe I’m just missing it.

I’m not going to argue about who started using weird names, I’m just pointing out that they exist.

0 as a prior seems to be standard, so I suppose that would be the way to go.

The non-coach multi-season version is ’11 with ’10 as prior.

About the fact that RAPM uses playoffs.. if you wanted to forecast ’12 now, wouldn’t you use ’11 playoffs? Why don’t you just use playoff data for the other metrics, too?

“Maybe he let cross-validation choose the lambda, but he doesn’t list it for any other RAPM single-season, so I assumed he did something different. ”

I think the lambdas are listed in the APBR thread.

“Or because it has extra information by using the players’ previous ratings? ”

Again, wouldn’t you go with the best possible combination of data to forecast ’12 now? If the other metrics use less data, that might be a sign that the people maintaining those metrics need to figure out a better way to combine old data with new data. You can’t fault RAPM for using everything that’s available

You mean the very first link? The method says that each year gets the previous year’s ratings fed into it as a prior, so the 2011 rankings include some amount of every year back to 2006. That doesn’t strike me as the same thing as using 2010 as a prior.

I would use the 2011 playoffs for 2012 if those measures were included in WP, ezPM, or APM, but they aren’t. I’m not calculating them myself, so I’m kind of stuck. The idea is to use as similar numbers as possible to make the comparison fair, so RAPM is the odd man out.

They could be in the APBR thread, but searching that is pretty difficult after the hack. And I don’t see any reason why the details shouldn’t be on his website, since it is, you know, his website with his data and results. It should be clear about what is going on.

Again, not faulting RAPM. I’m sure that including the playoffs helps. But the goal is to be as equal as possible because we want to compare the quality of the metrics, not how much data go into them. WP, ezPM, and APM could be calculated including the playoffs, I just don’t have those numbers. I could calculate the retrodictions for WP all the way back to 1979 or so since I have those data; does that make it better than the other metrics since they’ve failed to include older seasons? Obviously the people maintaining those metrics should be watching old games in their free time and updating the play-by-play databases that exist, so as to cater to my every whim. Sadly, I am stuck with what is available to me and a goal of maintaining as much equality as possible.