A couple of people asked about potential overfitting for the blends I mentioned in the last retrodiction post. After a couple of days of waiting for the iterations to run, mixed with multiple restarts due to completely unnecessary computer lock-ups, here you go.
To look at overfitting, it was suggested that I use the same method from last time to find the best predictive blend for a particular year and then see how that blend does in other years. The easiest thing for me to do was to look at single seasons, so I did that. I reran my code for 2011, and then again for 2009. Basically, this finds the weights to create a blend of metrics (out of the pool of ASPM, old and new Wins Produced, Win Shares, PER, and RAPM) that best predicts 2011 team point differential from 2010 performance, and then 2009 from 2008. I then looked at how these two blends did across all four years that they could be applied to (RAPM being the limiting factor).
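For the curious, here's a simplified sketch of what finding a blend looks like. This isn't my exact code: the data layout, the mean-absolute-error objective, and the normalization of weights to sum to 1 (the rounded weights below come close to summing to 1) are all stand-ins, but the brute-force random search matches the 100,000-try approach I mention later.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: rows are the 30 teams, columns are the six metrics
# (ASPM, new WP, old WP, Win Shares, PER, RAPM). Each column holds that
# metric's predicted point differential for the target season, built from
# the prior season's player performance; `actual` is what really happened.
metric_preds = rng.normal(0, 4, size=(30, 6))
actual = rng.normal(0, 4, size=30)

def fit_blend(metric_preds, actual, n_tries=100_000):
    """Brute-force random search for blend weights minimizing mean absolute error."""
    best_w, best_err = None, np.inf
    for _ in range(n_tries):
        w = rng.uniform(-2, 2, size=metric_preds.shape[1])  # negative (contrast) weights allowed
        if abs(w.sum()) < 1e-6:
            continue  # avoid dividing by ~0 when normalizing
        w /= w.sum()  # force the weights to sum to 1
        err = np.mean(np.abs(metric_preds @ w - actual))
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err

weights, error = fit_blend(metric_preds, actual)
```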
There definitely is the potential for overfitting. The best 2009 blend was something of an odd combination: 1.3 x ASPM, -1.255 x new WP, .731 x old WP, and .224 x new RAPM (those are slightly rounded-off values). This blend had an error of 2.23 for 2009, which actually was not better than the blend I listed last time (I’ll list it again in a minute, don’t worry). But it was better than any other single metric, and better than the 2011 blend. However, in 2011, 2010, and 2008 it was worse than my ‘best’ blend, and behind ASPM, RAPM, Win Shares, and WP in at least one of those years. This is a sign of overfitting: it did a decent (although still not perfect) job of figuring out how to connect 2008’s performance to 2009, but that connection doesn’t work as well for the other seasons.
The 2011 blend was -.135 x ASPM, -.357 x new WP, .347 x old WP, .468 x Win Shares, .179 x PER, and .5 x RAPM. So whereas the 2009 blend was mostly a contrast between ASPM and WP, with a bit of old WP and RAPM sprinkled in, the 2011 blend pits RAPM, Win Shares, old WP, and a bit of PER against ASPM and new WP. And unlike the 2009 blend, the 2011 blend does do the best job of predicting its own year, even better than my ‘best’ blend of .5 x ASPM, -.5 x new WP, .35 x RAPM, .3 x old WP, and .3 x Win Shares. The error was actually under 2, which makes it the most accurate single-season prediction anywhere in my results. But the overfitting shows up again: the 2011 blend is worse than the best blend in each of the other seasons, and worse than the 2009 blend in 2010. ASPM and Win Shares also beat it in 2010 (but not in the other years), while every other individual metric is worse than it in 2009 and 2008. So the 2011 blend is a fairly successful one overall, but the ‘best’ blend is better.
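To make the cross-season check concrete: applying a fixed blend to another year is just taking the weighted sum of the metrics’ predicted point differentials and scoring it against what actually happened. The weights below are the rounded values quoted above (column order ASPM, new WP, old WP, Win Shares, PER, RAPM); the `season_data` here is random stand-in data, not my actual inputs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in data: for each target year, (metric predictions, actual point
# differential). Columns are ASPM, new WP, old WP, Win Shares, PER, RAPM.
season_data = {
    year: (rng.normal(0, 4, size=(30, 6)), rng.normal(0, 4, size=30))
    for year in (2008, 2009, 2010, 2011)
}

blends = {
    "2009 blend": np.array([1.3, -1.255, 0.731, 0.0, 0.0, 0.224]),
    "2011 blend": np.array([-0.135, -0.357, 0.347, 0.468, 0.179, 0.5]),
    "best blend": np.array([0.5, -0.5, 0.3, 0.3, 0.0, 0.35]),
}

for name, w in blends.items():
    for year, (metric_preds, actual) in season_data.items():
        err = np.mean(np.abs(metric_preds @ w - actual))
        print(f"{name} on {year}: {err:.2f}")
```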
Looking across the four years that all of these metrics exist for, the best average error belongs to the ‘best’ blend at 2.37. Next is the 2011 blend at 2.45, then ASPM edges out the 2009 blend 2.53 to 2.56. Then you have a jump to new RAPM, Win Shares, WP, old WP, old RAPM, and PER. APM doesn’t have a prediction for 2008, which is the worst year of the four for all of these metrics, and skipping the worst year should flatter its average, yet it still comes in last at 4.24. The relatively poor performance of the 2009 blend is a little confusing, especially since it didn’t even do the best job in 2009. It’s possible that even with 100,000 tries I didn’t catch the best weights. Or it could just be error in the third decimal place from my rounding off the weights. But it still did better than most other single metrics, even in years it wasn’t optimized for. The 2011 blend did an even better job, ‘winning’ its year outright and beating every individual metric in two of the other three years as well. So despite showing signs of overfitting, it’s still a decent blend. Perhaps 2008 or 2009 were just atypical seasons?
Overall, the best metric over the four years is the ‘best’ blend I presented last time. That isn’t too surprising, since I picked the weights via the lowest average error across those four years. Given what we just saw for the 2011 and 2009 blends, we should assume that this blend also has some amount of overfitting. If you used it to predict 2007 or 2012, it might not be the best possible combination of metrics. However, there is presumably less overfitting since it’s optimized across four years instead of one.
I would again caution against reading too much into the exact weights, since the metrics are fairly well correlated and the weights could change substantially by taking one metric out or adding one in (see the sketch below). But I do think it’s interesting to look at the commonalities between the 2011 and best blends, since they did the best overall. Both give RAPM a little more weight than old WP and Win Shares. Both use new WP as a contrast. Both put fairly low weight on PER, although it’s a comparatively large part of the 2011 blend. They do treat ASPM differently, though: it’s a big positive part of the best blend and a small contrast predictor in the 2011 blend. I’m probably not going to spend too many neurons on that, though.
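To see why correlated metrics make individual weights shaky, here’s a tiny synthetic demonstration (made-up data, nothing to do with the real metrics): two predictors that mostly measure the same underlying signal split the weight between them almost arbitrarily, and dropping one reshuffles what’s left.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
signal = rng.normal(0, 4, n)         # shared underlying signal (think team quality)
m1 = signal + rng.normal(0, 1, n)    # two "metrics" that mostly measure
m2 = signal + rng.normal(0, 1, n)    # the same thing, so they're highly correlated
y = signal + rng.normal(0, 2, n)     # the outcome being predicted

# Least-squares weights with both metrics: the credit splits between them,
# and the split is sensitive to noise.
w_both, *_ = np.linalg.lstsq(np.column_stack([m1, m2]), y, rcond=None)

# Drop m2 and refit: m1's weight roughly absorbs both shares.
w_one, *_ = np.linalg.lstsq(m1.reshape(-1, 1), y, rcond=None)

print("weights with both metrics:", w_both)
print("weight with m1 alone:", w_one)
```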
So, just to sum up: you can get even lower errors for individual years by optimizing the weights for those particular seasons. This, however, leads to overfitting; the blends don’t do as well (generally speaking) in other seasons. The best predictive blend I reported last time has less of this issue, but presumably still suffers from it somewhat. That being said, it seems apparent that a well-chosen blend should still do better than any individual metric in most years.