Predicting the Past: ezPM Edition

Sorry for the hiatus; sometimes a guy just has to go to Vegas (it was a lot of fun, thanks for asking).  To get back into things, I’m going to follow up on my last post by looking at predictions made by Evan Z’s ezPM model (there are updates, but that’s a good intro post).  If you don’t remember, the idea was that predictive ability should be the hallmark of a model; if you can predict what is going to happen, you obviously understand what is going on.  However, practically speaking, predictions vary for lots of reasons: model of productivity, aging curves, diminishing returns, predicting minutes played, etc.  These differences make it very difficult to compare models in a reasonable manner.  So I laid out a few very basic assumptions and presented prediction data using Wins Produced, the model of choice in the Skeptic household.  I hoped that others would follow suit, and Evan generously did.

A quick word on ezPM (although you should really go to Evan’s blog for a full description and all the work that has gone into it).  As with WP and a number of other metrics, each event that happens on the court (made shot, missed free throw, etc.) has a value at the team level.  ezPM then splits these values among players, giving different credit depending on whether a shot was assisted, what kind of rebound was gathered, and so on.  You add up all the stats a player acquires, each multiplied by its value, and then adjust to per 100 possessions; this is the player’s ezPM100 value.  Since the numbers come from play-by-play data, defense has been added by penalizing a player for what his counterpart does.  Perhaps unlike most other box score stats, ezPM is meant to reflect point differential as opposed to wins.
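To make the “stats times values, per 100 possessions” step concrete, here is a minimal sketch.  The event names and values in `EVENT_VALUES` are made up for illustration and are NOT Evan’s actual ezPM weights.  Subtracting the counterpart’s value is also only a simple stand-in for the defensive penalty, not necessarily how ezPM does it.

```python
from typing import Dict, Optional

# HYPOTHETICAL event values for illustration only. These are NOT the actual
# ezPM weights; see Evan's blog for the real values and credit splits.
EVENT_VALUES: Dict[str, float] = {
    "fg_made_assisted": 0.75,   # shooter gets partial credit on an assisted make
    "fg_made_unassisted": 1.0,
    "fg_missed": -0.75,
    "ft_missed": -0.5,
    "def_rebound": 0.25,
    "off_rebound": 0.5,
}


def raw_value(event_counts: Dict[str, int]) -> float:
    """Sum of (count of each event) x (value of that event)."""
    return sum(EVENT_VALUES[event] * count for event, count in event_counts.items())


def ez_pm100(own_counts: Dict[str, int],
             possessions: float,
             counterpart_counts: Optional[Dict[str, int]] = None) -> float:
    """Net value per 100 possessions.

    If counterpart_counts is given, the counterpart's value is subtracted as a
    simplified (assumed) version of the defensive penalty.
    """
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    net = raw_value(own_counts)
    if counterpart_counts is not None:
        net -= raw_value(counterpart_counts)
    return 100.0 * net / possessions


if __name__ == "__main__":
    # 10 unassisted makes and 8 misses over 200 possessions
    print(ez_pm100({"fg_made_unassisted": 10, "fg_missed": 8}, 200))
```

With these toy values, the example player nets 10 − 6 = 4 points over 200 possessions, or 2.0 per 100.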

Since it’s based on play-by-play data with counterpart information, only one or two seasons of ezPM are available, so only the current (ongoing) season could be predicted.  I followed the same method as in the WP post, with a couple of exceptions to accommodate ezPM.  I started with each player’s ezPM100 value from last season as the measure of his productivity.  Every player was assumed to have that same production this season over however many possessions he has actually played (the data is available on Evan’s site, and the numbers I used run through Feb. 3).  This honors the assumption that we know how much players will play, so we don’t have to predict minutes.  Players who were rookies this year or otherwise didn’t appear in last year’s data were assigned a production of -1.95 ezPM100, a value that Evan and I worked out in the comments on the last post.  Each player then had a point differential predicted from last year’s productivity and this year’s possessions.  The spreadsheet with the results is here (note that the ‘predwin’ column actually refers to point differential).  Point differential was then summed for each team (sheet 1 in that file; you can ignore 2010 since there’s no data to predict that year) and used to predict team wins so far this season.  The prediction comes from an equation connecting point differential and win percentage (each point of differential is worth 3.28%, with a differential of 0 corresponding to a 50% win percentage).  Predicted win percentage was then multiplied by games played to get a predicted number of wins, which was compared to each team’s actual number of wins.  This is all on sheet 2, and the final measure is average absolute error.  ezPM has an average error of 4.74 wins this year through Feb. 3.
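The prediction pipeline described above can be sketched in a few functions.  The -1.95 default and the 3.28%-per-point slope come from the post.  Dividing each team’s season total by games played before applying the slope is my assumption, since the post doesn’t spell out whether the equation uses per-game or total differential.  `PlayerSeason` and the team codes in the usage example are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

ROOKIE_EZPM100 = -1.95      # default for players absent from last season's data
WIN_PCT_PER_POINT = 0.0328  # each point of differential is worth 3.28% win pct


@dataclass
class PlayerSeason:
    team: str
    possessions: float             # possessions played this season (treated as known)
    last_ezpm100: Optional[float]  # last season's ezPM100, or None if unavailable


def player_point_diff(p: PlayerSeason) -> float:
    """Last season's per-100 rate applied to this season's possessions."""
    rate = p.last_ezpm100 if p.last_ezpm100 is not None else ROOKIE_EZPM100
    return rate * p.possessions / 100.0


def team_point_diffs(players: Iterable[PlayerSeason]) -> Dict[str, float]:
    """Sum predicted player point differentials by team."""
    totals: Dict[str, float] = {}
    for p in players:
        totals[p.team] = totals.get(p.team, 0.0) + player_point_diff(p)
    return totals


def predicted_wins(total_diff: float, games: int) -> float:
    # ASSUMPTION: the 3.28%-per-point slope applies to per-game differential,
    # so the season total is divided by games played first.
    win_pct = 0.5 + WIN_PCT_PER_POINT * (total_diff / games)
    return win_pct * games


def mean_absolute_error(predicted: Dict[str, float], actual: Dict[str, float]) -> float:
    """Average of |predicted - actual| over teams present in both dicts."""
    teams = predicted.keys() & actual.keys()
    return sum(abs(predicted[t] - actual[t]) for t in teams) / len(teams)


if __name__ == "__main__":
    # Toy players, not real data: "AAA" has a returning player and a rookie.
    players = [
        PlayerSeason("AAA", 1000, 2.0),
        PlayerSeason("AAA", 1000, None),
        PlayerSeason("BBB", 500, 4.0),
    ]
    print(team_point_diffs(players))
```

Each step maps to one sentence of the method, so swapping in a different productivity measure (WP, Win Shares) would only change `player_point_diff`.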

As with the WP predictions, ezPM had a hard time with the Cavaliers.  They are hugely underperforming, and the biggest offenders according to WP (in no particular order) are Hickson, Mo Williams, Jamison, and Moon.  ezPM also thinks most of the Cavs have underperformed, with the most blame going to Hickson, Moon, Mo Williams, and Anthony Parker.  So that’s a fair amount of overlap.  In terms of who has actually done better than expected, WP picked Varejao, Sessions, Gibson (barely), Manny Harris, and Ryan Hollins (barely); ezPM picked Alonzo Gee, Hollins, Leon Powe, and Sessions.  WP is not very happy with Gee this year, but he’s only played about 300 minutes; ezPM is not very happy with Boobie Gibson, thinks Varejao took a tiny step back, and rates Harris as just below an average rookie.  So there is a little more disagreement here, but nothing too major.  Overall, both WP and ezPM predicted that Cleveland would have 20 wins instead of the 8 they had at the time.

The discussion in the comments on the last post also inspired me to check a couple of things on the WP predictions, since they cover more years.  One finding is that the WP predictions almost always show more variance in wins than actually occurs.  My guess is that this reflects the absence of any diminishing returns or regression to the mean in the model: players who have unusually good or bad years are expected to do the same the next year, and very good (or very bad) teams that add players aren’t expected to get less (or more) out of them, even though we know they will.  Again, these are very simple predictions, so I don’t think it’s a big deal.  If you run a regression predicting actual win percentage from WP-predicted win percentage and use it to adjust the predictions for this season, the average error is virtually identical to ezPM’s.
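A minimal sketch of that adjustment follows.  It assumes a simple one-variable least-squares line, since the post doesn’t say exactly what form the regression took.  Fit the line on past seasons’ (predicted, actual) win percentages, then pass this season’s raw predictions through it.  A fitted slope below 1 pulls predictions toward .500, which is one way to correct the over-dispersion described above.  The numbers in the usage example are made up.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def adjust(predicted: Sequence[float], slope: float, intercept: float) -> List[float]:
    """Map raw predicted win percentages through the fitted line."""
    return [intercept + slope * p for p in predicted]


if __name__ == "__main__":
    # Made-up past seasons: raw predictions are more spread out than reality.
    past_pred = [0.2, 0.5, 0.8]
    past_actual = [0.3, 0.5, 0.7]
    slope, intercept = fit_line(past_pred, past_actual)
    print(slope, intercept)                       # slope < 1 shrinks toward .500
    print(adjust([0.25, 0.75], slope, intercept))
```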

So to sum up, it looks like for this short season (all the data is from before the All-Star break), ezPM and WP make equally accurate and at least somewhat similar predictions.  There certainly are differences, which should be expected since they use different weights on the box score stats.  In general, it would be hard to draw any big, sweeping conclusions anyway since there’s so little data.  So hopefully this offseason I can get more data from more years for a variety of measures and put them all through the same exercise to get a good comparison.  I’m sure someone out there has years of Win Shares (or if I have time I can download them), and I know that 5-year adjusted plus/minus numbers exist, so there must be at least five years of that.  If anyone would like to volunteer the data, or any other measures, I’ll be happy to run it.


4 Responses to Predicting the Past: ezPM Edition

  1. EvanZ says:

    Alex, thanks for doing this. The predictions look pretty good, except in cases where it’s difficult to predict. Duh.

    Portland is beset with injuries. Cleveland lost LeBron (who is probably highly undervalued by box score metrics, in general). The Clips have a pretty good rookie. Etc.

    Aging and multi-year regression should make some improvements, but injuries and rookies are still going to be issues going forward.

    Oh, one last thing…You said:

    ” if you can predict what is going to happen, you obviously understand what is going on. ”

    This is not generally true, and there’s an important distinction to make here. It gets back to the causation vs. correlation issue. If you can predict, then you can predict, and you may know something about a process. But there are many cases where you can predict a process, but not really know much about it at all. For example, we can predict that an obese person is more likely to have a heart attack. I mean you and I, without any training in that area, can make that prediction. But do we “understand” what is going on? I don’t. It’s very complicated.

    • Alex says:

      Fair enough on the prediction issue (although it reminds me of the heart attack story in ‘Blink’ – sometimes “knowing” more isn’t better). But knowing which things make predictions better or worse is definitely informative, and people who know more should be more likely to pick what information is going to help their predictions. Maybe you and I don’t know what causes heart attacks, or even what exactly it is about obese people that makes them more likely to have one, but we know more than people who don’t know that obese people are more likely to have heart attacks. And we know that investigating obese people might help us find out. I think that’s a good thing.

      • EvanZ says:

        “And we know that investigating obese people might help us find out. I think that’s a good thing.”

        Of course, and if you said just that, I wouldn’t have disagreed.

        In the case of obesity, it’s important to know what the cause is, so that it can be treated better. Is it a “bad gene”? Is it simply overeating? Lack of exercise? Knowing that obesity is correlated with the disease is the first step, similar to knowing smoking is correlated with lung cancer. In the latter case, the cure was simple enough. Stop smoking. With obesity, is it as simple as to say “stop eating”? Maybe, but that may not be enough in many cases. And it may not prevent disease that has been long in the making.

  2. Pingback: Predicting the Past: 2011 competition | Sport Skeptic
