Statistics Can Lie… Sometimes

Just wanted to pass along this post, which I think does a neat job of describing why some scientific studies seem to disagree from week to week (have some wine! no, don’t have wine! coffee is good for you! caffeine is bad!). A quick thought on how it applies to sports is below the jump.

With sports, I think the most important part of this article is the section on generalizability. If you look at any of the NBA all-in-one statistics (PER, WP, RAPM, etc.), they aim to give you a summary of what a player is ‘worth’ to a team. We might imagine that such a measure will generalize; we take a sample (the preseason, the first X games of a season, one year out of a career) and assume/hope it applies to other situations (the regular season, the other 82-X games in the season, next year). However, we don’t have a sample of the NBA; we only have what actually happens. This means that when the circumstances change for a team or a player, what we think we know can change dramatically. Take, for example, the year-to-year correlation chart near the end of this post. It shows that the correlation in Wins Produced for players who change teams is much lower than for players who stay on the same team. Players who change teams are the ones most likely to see a change in their role or in their circumstances in general (perhaps they moved on because they were injured or over/under-producing). I haven’t seen a similar plot for other metrics, but I imagine similar results would come out.
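To make the stay-versus-move idea concrete, here is a minimal simulation sketch. It is not the analysis behind the chart mentioned above, and it uses no real NBA data. It assumes, purely for illustration, that a player's measured productivity is the sum of a stable talent term, a team-context term, and noise, and that changing teams means drawing a new context. The function names and the variances are all made up.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def year_to_year_correlation(n_players, changes_team, rng):
    """Simulate two seasons of a hypothetical all-in-one metric.

    Assumption (not from any real data): metric = talent + context + noise.
    Players who stay keep their context; players who move get a new one.
    """
    year1, year2 = [], []
    for _ in range(n_players):
        talent = rng.gauss(0, 1)
        context1 = rng.gauss(0, 1)
        context2 = rng.gauss(0, 1) if changes_team else context1
        year1.append(talent + context1 + rng.gauss(0, 0.5))
        year2.append(talent + context2 + rng.gauss(0, 0.5))
    return pearson(year1, year2)


rng = random.Random(0)  # fixed seed so the run is repeatable
r_stay = year_to_year_correlation(5000, changes_team=False, rng=rng)
r_move = year_to_year_correlation(5000, changes_team=True, rng=rng)
print(f"stayed: r = {r_stay:.2f}, moved: r = {r_move:.2f}")
```

Under these made-up assumptions the players who move show a clearly lower year-to-year correlation even though their talent never changed. The real gap could also come from other things this sketch ignores, such as injuries or the reasons a player was moved in the first place.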

You don’t have to change teams to have such a change though. Imagine that the Heat switched LeBron and Ray Allen’s duties. LeBron would run off screens and be more of a catch-and-shoot guy while Allen would handle the ball a lot, drive to the rim, and (try to) become an opponent-crushing engine of basketball destruction. LeBron would certainly still be valuable in his new role; he’s a decent enough shooter. Allen may or may not be good in his new role; he used to be the focal point of an offense, but he was much younger then. I think we would all agree that even if they were still productive players with these new responsibilities, both LeBron’s and Allen’s productivity would change. But we don’t really know by how much, and it’s hard to say what the effect on the Heat would be as a team, since it’s likely that Wade would pick up some extra duties to help Ray out. That is, it would be tough to generalize. And that’s with a pretty drastic change in team strategy; what if the Heat just decided to run their offense a little differently by adding more LeBron post sets, or more Wade isolations? We can guess at the consequences, but they would certainly be guesses.

Another way to say this is that I think it’s important to assume that a lot of what we think we know about players is context-dependent. Related to that, a lot of what we think we know about how a player would do in a different context is complete speculation. Four years ago, if you had asked me about J.J. Redick’s ability to lead a team, I would have been doubtful; he was basically a catch-and-shoot guy who came off the bench. But the last two years seem to have demonstrated that he had (or learned) some of what it takes to be more of a focal point: his minutes and starts are up, but most of his numbers have stayed steady despite an uptick in usage. He apparently isn’t completely limited on offense. When people talk about Tyson Chandler’s limitations, I sort of assume that they’re right and he can’t shoot jumpers. But maybe his coaches and teammates won’t let him, because he’s so much better at playing near the rim? Or who’s to say that, if there had been the opportunity, he couldn’t have learned to shoot jumpers? We don’t know, and we can’t know: the information is never in our ‘sample’, because it isn’t really a sample.

I’ve used the NBA as an example, but the ideas really apply to all sports. What we know about soccer players probably depends on their teams and teammates. And the ideas apply to different levels of a single sport as well; I wouldn’t assume that what we know about the NBA applies to the WNBA or the college game outside of a few basic things. Even then I would be hesitant; I would assume that shooting threes is better than shooting long twos at any level, for example, but I could be wrong. There could be a league somewhere where accuracy drops dramatically enough at the three-point line that it wipes out the value of the extra point. And if such a league existed, you’d be foolish to apply what you know about it to the NBA. You should always consider the conditions under which your data are collected before you rush to talk about how they apply to other situations.
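The threes-versus-long-twos point comes down to simple arithmetic, sketched below. A three is worth more per attempt than a long two only when 3 × (three-point accuracy) is greater than 2 × (long-two accuracy), so the break-even three-point accuracy is two thirds of the long-two accuracy. The function names and the shooting percentages are hypothetical, chosen only for illustration, and the sketch ignores fouls, rebounds, and everything else that happens around a shot.

```python
def points_per_shot(shot_value, accuracy):
    """Expected points from one attempt: shot value times make probability."""
    return shot_value * accuracy


def break_even_three_accuracy(long_two_accuracy):
    """Three-point accuracy at which a three and a long two are worth the same.

    Solves 3 * p3 == 2 * p2 for p3.
    """
    return 2 * long_two_accuracy / 3


def three_is_better(three_accuracy, long_two_accuracy):
    """True when a three has the higher expected points per attempt."""
    return points_per_shot(3, three_accuracy) > points_per_shot(2, long_two_accuracy)


# Hypothetical league A: accuracy barely drops at the three-point line.
print(three_is_better(three_accuracy=0.35, long_two_accuracy=0.40))

# Hypothetical league B: accuracy falls off sharply behind the line.
print(three_is_better(three_accuracy=0.28, long_two_accuracy=0.45))
```

In the first made-up league the three wins (about 1.05 expected points against 0.80), and in the second it loses (about 0.84 against 0.90). Which situation a real league is in depends entirely on the accuracies you actually observe there, which is the point about checking the conditions your data came from.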
