My retrodiction post apparently got its own thread over at the APBR board. I would say I’m honored, except it seemed to get treated pretty harshly. I’ll try to address the issues that were brought up over there.
Going in order of the posts: J.E. wondered where I got old RAPM scores from as far back as 2000 or 2001. He’s right that they don’t exist; they were accidentally entered in my database as 0 instead of NA from 2000 to 2005. So the errors listed for those seasons should actually reflect what happens if you assume every player is average.
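To make the bug concrete, here is a minimal sketch (player names and values hypothetical, not from my database). Because RAPM is centered so that 0 means league average, a missing rating stored as 0 is silently treated as "this player is exactly average":

```python
# None marks a season with no RAPM available; storing 0 instead would
# bake in the assume-every-player-is-average behavior described above.
ratings = {"Player A": 3.1, "Player B": -1.4, "Player C": None}

def predicted_rating(player, missing_as_zero=True):
    r = ratings.get(player)
    if r is None:
        # 0.0 = league average for a centered metric like RAPM
        return 0.0 if missing_as_zero else float("nan")
    return r

print(predicted_rating("Player A"))  # 3.1 -- a real rating
print(predicted_rating("Player C"))  # 0.0 -- missing, but indistinguishable from "average"
```

That indistinguishability is exactly why the 2000-2005 "RAPM" errors in my table are really the errors of an everyone-is-average baseline.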
I guess I didn’t really expect people to click back through the links like I asked, but it would have been nice. J.E., I don’t have older years of RAPM because, going back from 2006, the names are terrible. I wrote code to line up all these different sources, but people use different names: the RAPM files sometimes use only a first initial, sometimes no first name at all, and sometimes put the first name first and sometimes second after a comma. I fixed some by hand, but I ran out of patience after correcting 2006. I will happily include earlier years if the naming is standardized. The seasons of new RAPM I do have were downloaded around Christmastime, so unless he’s changed the algorithm since then, they are all current.
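For anyone curious what the alignment problem looks like, here is a minimal sketch (the key format, and the idea of collapsing to last name plus first initial, are my illustration, not the exact code I used). Different sources write the same player as "T. Duncan", "Duncan, Tim", or "Tim Duncan", and normalizing them to a common key lets most rows line up automatically:

```python
def normalize(name):
    """Collapse a player name to 'lastname:first-initial' for matching."""
    name = name.strip()
    if "," in name:
        # "Duncan, Tim" -> last name before the comma, first name after
        last, first = [p.strip() for p in name.split(",", 1)]
    else:
        parts = name.split()
        if len(parts) == 1:
            # bare last name: no initial available to disambiguate
            return parts[0].lower()
        first, last = parts[0], parts[-1]
    initial = first[0].rstrip(".").lower() if first else ""
    return f"{last.lower()}:{initial}"

for raw in ["T. Duncan", "Duncan, Tim", "Tim Duncan"]:
    print(normalize(raw))  # all three map to "duncan:t"
```

Even with a scheme like this, collisions (two players sharing a last name and initial) and bare last names still have to be fixed by hand, which is where my patience ran out.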
I did make a mistake when I presented the averages; I should have computed them over the same set of seasons for each metric. My bad. But I presented a table with all the individual numbers, so you can do as mystic did and compute whatever averages you like yourself.
I’m not sure that I really buy J.E.’s claim that whole seasons are too coarse a measure. There are, after all, whole threads dedicated to making season predictions, and J.E. is involved in them. That being said, if I had game-by-game data to make predictions at that level, and the time to compute each metric for every player game by game, I would certainly do that as well. I don’t know that you would expect the results to be drastically different (could one measure do better game by game but not for the season overall?), but I could do it.
As mentioned several times throughout this series of posts, this is the simplest level of prediction possible. The goal is to put the metrics on an even footing, not to make the most accurate predictions possible. So yes, I could use regression to the mean. I could account for age, or minutes played last year, or any of the many factors that influence season-to-season changes in production. I chose not to.
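For readers unfamiliar with the adjustment I'm declining to make, here is a toy sketch of regression to the mean (the shrinkage weight is made up for illustration; a real value would be estimated from year-to-year correlations):

```python
LEAGUE_MEAN = 0.0  # a centered metric like RAPM: 0 = league average
SHRINKAGE = 0.7    # hypothetical weight on last season; 1.0 = naive carry-forward

def project(last_season_value, weight=SHRINKAGE):
    """Shrink last season's rating toward the league mean before predicting."""
    return weight * last_season_value + (1 - weight) * LEAGUE_MEAN

print(project(4.0))               # 2.8: an extreme rating is pulled toward average
print(project(4.0, weight=1.0))   # 4.0: the naive carry-forward I actually used
```

What I actually did corresponds to the weight-of-1.0 case for every metric, which is exactly what keeps the comparison even.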
The point about ASPM being based on multiple-year RAPM is a valid one. It could give ASPM a benefit, since the weights used to calculate the ratings ‘know’ what the connection between box score stats and RAPM will be in future seasons. Of course, RAPM also uses multiple seasons of data, whereas none of the other boxscore measures do. So I guess some metrics get their own little advantages.
Moving down to mystic’s post: I do appreciate his cleaning up the average-across-seasons issue. As for using different rookie ratings, again, this was supposed to be simple. I do think it’s interesting that he thinks the boxscore metrics get an advantage from using the actual values. Do non-boxscore metrics do a worse job of evaluating rookies? Why shouldn’t they benefit from getting in-sample information?
Adjusting for strength of schedule could change the results, but I don’t know how big an effect it would be. Maybe some metrics would benefit more than others from being compared to SRS; I don’t know.
Moving down a bit, J.E. seemed shocked that I would use actual rookie production instead of a general assumption. This time I did indeed only use actual production. The last time, I did it both ways; the results don’t change much, beyond the predictions being generally better if you know how the rookies will do (not surprising). I guess this does break a sacred rule of prediction. On the other hand, again, I don’t know why you would think that knowing rookie production would benefit one metric over the others.
That covers about all the content-related posts so far. The (current) final one is another by J.E. He says that many of my replies to comments are unkind. Let me know if that’s true; I try to keep an even tone as much as I can, or at least to respond in kind. Maybe I’ll start having some warm milk when I check the comments. As to whether or not I’m making an honest attempt to find the best metric: I would like to see other work making a better attempt. I make no claims that what I’ve done so far is perfect, but I’ve attempted to be fair and I’ve presented all the details.
Perhaps I have given Wins Produced more benefit of the doubt than others would, particularly at the APBR board, but outside of the mistakes I’ve mentioned I don’t think I’ve done anything out of line. I was harsh to APM, PER, and the older version of RAPM, but I’m not sure what else to say. APM did terribly. I haven’t read anything at the APBR board to make me think that would be a surprising statement. Even the people who support it, as best I can tell, say that you need to use multi-year APM, and this is single-year. Similarly, I don’t know why it would be surprising that PER would do poorly. I don’t think anyone besides ESPN uses it for player evaluation, and I’ve seen the same comment on the board. As for the previous version of RAPM, with all those mistaken years averaged in, it looked bad. That being said, even over the past four years it wasn’t great. Regardless, it’s clear that using the previous rating as a prior does wonders for the new version’s ratings. I would think that was clear, since you can’t even find the old ratings on the website anymore. I didn’t think I was saying anything that wasn’t common knowledge. Maybe I can get some points for not making any ad hominem attacks? At any rate, I appreciate the feedback. Hopefully I can improve my future projects.