NBA Retrodiction Contest: Blend Update

A couple of people asked about potential overfitting in the blends I mentioned in the last retrodiction post.  After a couple of days of waiting for the iterations to run, interrupted by multiple restarts thanks to completely unnecessary computer lock-ups, here you go.

To look at overfitting, it was suggested that I use the same method from last time to find the best predictive blend for a couple of years and then see how that blend does on other years.  The easiest thing for me to do was to look at single seasons, so I did that.  I reran my code for 2011, and then again for 2009.  Basically, this would find the weights to create a blend of metrics (out of the pool of ASPM, old and new Wins Produced, Win Shares, PER, and RAPM) that would do the best job predicting 2011 team point differential from 2010 performance, and then 2009 from 2008.  I then looked at how these two blends did across all four years that they could be applied to (RAPM being the limiting factor).
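To make the weight search concrete, here is a minimal sketch of the brute-force approach described above. All of the data here is synthetic stand-in data (the metric values, the "true" weights, the search range, and the number of tries are invented for illustration; the post itself used 100,000 tries on real metric scores):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: rows are teams, columns are prior-year scores
# from six metrics (think ASPM, old/new WP, Win Shares, PER, RAPM).
n_teams, n_metrics = 30, 6
X = rng.normal(0, 4, size=(n_teams, n_metrics))

# Invented "true" mixing weights, used only to generate fake next-year
# team point differentials with some noise on top.
true_w = np.array([0.5, -0.5, 0.3, 0.3, 0.1, 0.35])
y = X @ true_w + rng.normal(0, 1, size=n_teams)

def blend_error(weights, X, y):
    """Mean absolute error of a weighted blend's predictions."""
    return np.mean(np.abs(X @ weights - y))

# Brute-force random search over weight vectors.
best_w, best_err = None, np.inf
for _ in range(20_000):
    w = rng.uniform(-1.5, 1.5, size=n_metrics)
    err = blend_error(w, X, y)
    if err < best_err:
        best_w, best_err = w, err
```

Because the search just minimizes error on one season's data, whatever noise happens to be in that season gets baked into `best_w`; that is exactly the overfitting risk discussed below.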

There definitely is the potential for overfitting.  The best 2009 blend was something of an odd combination; it’s 1.3 x ASPM, -1.255 x new WP, .731 x old WP, and .224 x new RAPM (and those are the slightly rounded-off values).  This blend had an error of 2.23 for 2009, which actually was not better than the blend I listed last time (I’ll list it again in a minute, don’t worry).  But it was better than any other single metric, and better than the 2011 blend.  However, in 2011, 2010, and 2008 it was worse than my ‘best’ blend and behind ASPM, RAPM, Win Shares, and WP for at least one of those years.  This is a sign of overfitting; it did a decent (although still not perfect) job of figuring out how to connect 2008’s performance to 2009, but this connection doesn’t work as well for the other seasons.

The 2011 blend was -.135 x ASPM, -.357 x new WP, .347 x old WP, .468 x Win Shares, .179 x PER, and .5 x RAPM.  So whereas the 2009 blend was mostly a contrast between ASPM and WP, with a bit of old WP and RAPM sprinkled in, the 2011 blend is a contrast between RAPM, Win Shares, old WP, and a bit of PER against ASPM and new WP.  And as opposed to the 2009 blend, the 2011 blend does do the best job of predicting 2011, even better than my ‘best’ blend of .5 x ASPM, -.5 x new WP, .35 x RAPM, .3 x old WP, .3 x Win Shares.  The error was actually under 2, which makes it the most accurate single-year prediction anywhere in my results.  But the overfitting shows up again.  The 2011 blend is worse than the best blend in each of the other seasons, and worse than the 2009 blend in 2010.  ASPM and Win Shares also beat it in 2010 but not the other years, and every other individual metric is worse than it in 2009 and 2008.  So the 2011 blend is a fairly successful one overall, but the ‘best’ blend is better.

Looking across the four years that all of these metrics exist for, the best average error belongs to the ‘best’ blend at 2.37.  Next is the 2011 blend at 2.45, then ASPM edges out the 2009 blend 2.53 to 2.56.  Then you have a jump to new RAPM, Win Shares, WP, old WP, old RAPM, and PER.  APM, despite not having a prediction for 2008, which is the worst year of the four for all of these metrics, still comes in last at 4.24.  The relatively poor performance of the 2009 blend is a little confusing, especially since it didn’t even do the best job in 2009.  It’s possible that even with 100,000 tries I didn’t catch the best weights.  Or it could just be error in the third decimal place due to me rounding off the weights.  But it still did better than most other single metrics, even in years it wasn’t optimized for.  The 2011 blend did an even better job, ‘winning’ its year outright and beating every individual metric in two of the other three years as well.  Despite showing signs of overfitting in general, this is still a decent blend.  Perhaps 2008 or 2009 were just atypical seasons?

Overall, the best metric over the four years is the ‘best’ blend I presented last time.  That isn’t too surprising since I picked the weights via the lowest average error across those four years.  Given what we just saw for the 2011 and 2009 blends, we should assume that this blend also has some amount of overfitting.  If you used it to predict 2007, or 2012, it may not be the best possible combination of metrics.  However, there is presumably less overfitting since it’s optimized across four years instead of one.

I would again caution against looking at the exact weights too much since the metrics are fairly well correlated and they could change by taking one out or adding one in, but I do think it’s interesting to look at the commonalities between the 2011 and best blends, since they did the best overall.  Both give RAPM a little more weight than old WP and Win Shares.  Both have a contrast against new WP.  Both have fairly low weight on PER, although it’s a comparatively large part of the 2011 blend.  They do treat ASPM differently though, with it being a bigger part of the best blend and a smaller contrast predictor in the 2011 blend.  I’m probably not going to spend too many neurons on that though.

So, just to sum up: you can get even lower errors for individual years by optimizing the weights for those particular seasons.  This, however, leads to overfitting; the blends don’t do as well (generally speaking) in other seasons.  The best predictive blend I reported last time has less of this issue, but presumably still suffers from it somewhat.  That being said, it seems apparent that a well-chosen blend should still do better than any individual metric in most years.


18 Responses to NBA Retrodiction Contest: Blend Update

  1. EvanZ says:

    Isn’t this the kind of thing that PCA is used for? You have a bunch of possibly highly correlated factors, so maybe it would be useful to find the principal components first and then use the first two or maybe three components in your blend. Another thought would be to use a global optimization approach such as simulated annealing, instead of the grid (brute force) approach.
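For what it's worth, the PCA idea can be sketched roughly like this, on synthetic data standing in for six correlated metrics (the latent-factor setup and every number here are invented for illustration, not the actual metric database):

```python
import numpy as np

rng = np.random.default_rng(1)

# Fake player ratings: one shared "true skill" factor plus noise,
# so the six metric columns come out highly correlated.
latent = rng.normal(size=(300, 1))
X = latent + rng.normal(0, 0.5, size=(300, 6))

# PCA via SVD on the standardized data.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
components = Vt                      # rows are principal components
explained = S**2 / np.sum(S**2)      # variance explained per component

# Project onto the first two components; a blend could then be fit
# over these two scores instead of six raw metrics.
scores = Z @ components[:2].T
```

Fitting the predictive weights over two component scores instead of six correlated metrics shrinks the search space, which is the gain EvanZ is pointing at; the component loadings can then be expanded back into per-metric weights.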

    • Alex says:

      I had a little PCA in one of my posts about the database. It found a first component where all the measures are weighted equally. It was also with a different group of metrics, though. A quick look (I didn’t double-check things) at the six from the blend also says equal weights, except for RAPM, which is more like a 60% contribution; that’s the first component. But I’m not sure if the PCA should be expected to pick out the best weights for creating a prediction.

      • “But I’m not sure if the PCA should be expected to pick out the best weights for creating a prediction.”

        Those won’t be the best predictive weights (PCA is not predicting anything), but it will narrow down the number of parameters considerably. In other words, it might make it easier to find the best predictor. You should then be able to take the PCA weights and the predictive blend and reverse-engineer the weights of each individual metric afterward. At least, I think.

        • Alex says:

          Right, I could use the PCA as something to get me in the neighborhood and then just look around there. I’m a little curious, now, how a simple average of the metrics would do. Maybe I’ll take a look later today.

  2. Crow says:

    Evan’s suggestions sound good to me.

    I’d also be curious to see the optimal blend converted back (to the extent possible; it can’t be done simply with overall RAPM, though maybe it could be done better if all 4 factors of RAPM were available) to a simple linear weights formula. How important are defensive rebounds, assists, etc. based on the optimal performing blend of metric values for each discrete stat component?

    I would also like to see an optimal blend without the option of using negative ingredients. Just positive additions. WP as a big negative counterweight may help the search but I have some problems with that approach, with any negative counterweight metric.

    It also might be worth exploring picking and choosing the best-performing parts of metrics and assembling those parts instead of wrestling with the entire metrics, which are probably normally composed of good and bad. Maybe take the treatment of shooting efficiency from WP or EZPM along with the rebounding treatment of alt WP or EZPM, etc. Maybe Evan’s suggestion would aid this process, if I understand it correctly.

    • Alex says:

      I don’t think it’s quite what you’re asking for, but I did run a quick regression predicting the predictive blend from a number of per-48 minute stats, not scaled or anything. The R squared is only .47, which isn’t really surprising given the complexity of everything in the blend and the fact that I’m regressing a predicted performance on an actual performance. But the results are fairly intuitive: missed field goals are bad, made three pointers are worth more than made two pointers, made free throws are good and missed free throws are bad, and rebounds, assists, blocks, and steals are good. Turnovers are bad. The only things that surprise me are that offensive rebounds, while positive, are not a significant predictor, and fouls are beneficial. Maybe fouls are like interceptions in football; it’s a negative outcome but a sign of a good thing (passing in football, and trying on defense in basketball)? Anyway, just a first pass. As a side note, the per-48 minute stats do a much better job with the explanatory blend; the R squared is .94, everything is highly significant, and everything is in the direction I would have predicted.
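This kind of regression can be sketched as follows, on invented per-48 data with made-up coefficients (the stat columns, coefficient values, and noise level are all placeholders, not the actual blend or database):

```python
import numpy as np

rng = np.random.default_rng(2)

# Fake per-48 box-score stats for 300 players; pretend the five
# columns are missed FGs, made 3s, rebounds, assists, turnovers.
n = 300
stats = rng.normal(size=(n, 5))

# Invented "true" contributions used to generate a fake blend score.
coefs = np.array([-1.0, 1.5, 0.7, 0.8, -1.2])
blend_score = stats @ coefs + rng.normal(0, 1.0, size=n)

# Ordinary least squares with an intercept via np.linalg.lstsq.
A = np.column_stack([np.ones(n), stats])
beta, *_ = np.linalg.lstsq(A, blend_score, rcond=None)

# R squared of the fit.
resid = blend_score - A @ beta
r2 = 1 - resid.var() / blend_score.var()
```

Reading the fitted `beta` the way Alex does above (negative on missed shots, positive on makes and rebounds) is what turns an opaque blend back into per-stat weights, with the usual caveat that collinear stats can trade weight between themselves.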

      • I’m curious why you find no significance on OREB, yet Berri found so much. What am I missing?

        • Guy says:

          Evan: Wouldn’t poor shooting efficiency yield more OREB on a per-48 basis? So OREB/48 becomes a signifier of poor-shooting teammates, “cancelling out” the true value we assume it has. The reverse case applies to DREB: suppressing opponent efficiency creates more DREB opportunities. That would be my guess.

        • Alex says:

          Well, this is obviously different from what Berri did. I was predicting the predictive blend score for each player from their per-48 minute stats, not doing anything at the team level. Win Shares and PER also give credit for offensive rebounds, so it is odd. I don’t know why it turned out that way; that’s just what the regression told me. Could be collinearity issues. Maybe offensive rebounding is frowned upon by RAPM or ASPM? Jerry or Daniel would know more about that. If it was a minus there and a plus in the others it could have coincidentally ended up having a weight near 0.

          • One of the criticisms of “over-rebounding” on offense is that you don’t get back on defense. My understanding is that BOS intentionally foregoes offensive rebounding for this reason. If I recall there is a weak inverse correlation between OREB and DRTG. I may be wrong, though.

          • Alex says:

            Sounds familiar to me too. Maybe a different function (besides linear) would be more appropriate, where a little offensive rebounding is good but too much is bad. Although it seems like offensive rebounding at the player level could be a positive; designate one guy to go for the board or hassle the defensive rebounder while everyone else gets back. Any rebounds he gets would be a bonus.

          • Guy says:

            Which Berri finding on OREB are you guys referencing?

          • Alex says:

            I assume Evan meant the weight for WP, but maybe he was thinking of something different.

          • Guy says:

            One reason you might not see a strong positive coefficient for rebounds is that WP includes a position adjustment. So a center with a Reb48 of 14 may have the same WP as a guard with a Reb48 of 7, other factors equal (#s for illustration only). But I can’t see why that would impact OREB more than DREB. Maybe DREB is valued more highly in some of the other metrics because it is positively correlated with some other contribution, such as high defensive value?

  3. Andrew says:

    Love the posts, hope you do a follow on the 2012-2013 data.

    I think having new and old WP in the blend is an issue because they are so highly correlated. The two become almost entirely confounded. I’d also love to see some two-measure combos, like if one starts with Win Shares, what measure consistently gives the most improvement. And the same with others; it might provide some insight into what systematic blind spots some of the measures may have and which are complements to each other.

    • Alex says:

      Thanks Andrew. The two versions of WP are pretty correlated, but not dramatically more than some of the other combinations. I have some correlations in an earlier post, if you haven’t come across it. It isn’t quite what you’re asking for, but the regressions mentioned in that post sort of get at what you’re asking for in terms of complements. For example, PER can be predicted fairly well by Win Shares, so you probably wouldn’t want to use those two alone in a blend. But RAPM doesn’t explain much of PER at all, so it would be a better candidate. My guess is that if you wanted to limit yourself to two metrics your best shot would be some variety of RAPM along with one of the more respectable box score stats, like Win Shares or Wins Produced.

      • Andrew says:

        Alex- I was going by both the correlation you found and the correlation reported by Berri at 98%. Plus they’re conceptually very similar and that’s where the blend gets the negative sign, which makes interpretation tricky.

        I agree with you that new RAPM and the better box score metrics are likely to give the best improvement, though ezPM is interesting too once there’s enough data.

        • Alex says:

          Yeah, they’re definitely highly correlated. I was just pointing out that some of the other measures are pretty high up there too. The actual sizes and signs for the weights are likely always going to be a bit tricky because of how related the metrics are, the box score ones in particular. I think when you do something like this, it’s important to remember that the final result (i.e., getting the best prediction possible) is the important part and interpreting the weights is secondary at best.

