I’m a strong believer that in life, predictions are everything. Some people claim that the brain exists to make predictions; almost certainly, that’s what memory is for. But today’s discussion is about sports, so let’s stick to that. Knowing what happened yesterday is important if you write for a newspaper (I assume there are still people who do that?); otherwise it’s only good for helping you figure out what will happen next. There are a million ways to explain things that have already happened, but if you can predict what will happen, chances are you know something about why it will happen, and that’s the big payoff. In the NBA, that translates to saying that if you can consistently predict how a team will do in the next season, you probably know what makes teams good or bad. Today’s post is about figuring out what will happen in the past’s future.
To be more concrete, we’d like to predict how a team will do in an upcoming season. This isn’t an easy thing to do. You have to predict not only how each player in the league will do next year, but also how many minutes (or possessions) he’ll play. There are lots of things to take into account: age, injury, player rotations, teammate productivity, how a rookie will play, etc. Thus when people make predictions, they can come to varying conclusions even when using the same productivity measure. Some people will think that player A will get 3000 minutes while others think he’ll get 2250; some people will think player B will do worse because of his new teammates while others might not take that into account at all.
With all this in mind, it’s hopefully obvious that comparing season predictions across different methods (such as Wins Produced, Win Shares, adjusted +/-, etc) is close to impossible. They’re going to have so many different moving parts that any differences in accuracy could come from one or more of a number of sources, ignoring the noise that might come from only looking at one season’s worth of predictions. So the goal here is to have an extremely simple prediction, based on as few and as simple assumptions as possible, for a number of years. Given my predilections, I’m taking Wins Produced. Hopefully other people will copy the methodology with other metrics so we can get a decent comparison started.
Here’s how it works: the automated WP site (powered by Nerd Numbers) has every player’s minutes and WP48 from the 2001 season through the current 2011 season. I grabbed them. For every player this year, I got his per-minute productivity (WP48) from the previous year (2010). If the player is a rookie, I assumed a WP48 of .045, which I think is the average rookie score (if this is far off, let me know; it’s an easy fix). If a player missed a whole season, he got his productivity from the last time he played (e.g. Josh Childress). Then I multiplied each player’s productivity from the previous season by his minutes played this year. Why this year’s minutes? It puts every method on equal footing; no one has to predict minutes played or injuries. Then I did that for every year in the sample; we’ll ignore 2001 since there’s no data from before that season. After seeing how that turned out, I made one change: if a player played less than 100 minutes the previous season, I assumed he’d play at the rookie level of .045 WP48. Why? Because in 2006, Nene played 3 minutes, during which time he put up the impossible WP48 of -2.132. In 2007 he was healthier and played 1715 minutes, which gave him a projected -76 wins. I thought everyone would find that unreasonable. Players might be under 100 minutes for a season for a couple of reasons, one of which is injury. It might not be correct to assume that an injured player (or a deep benchwarmer) will come back with the ability of an average rookie, but I’m keeping it simple.
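For the programmers out there, the whole rule fits in a few lines. Here’s a minimal Python sketch (my names and function signature are illustrative, not actual code from the spreadsheet; the .045 rookie default and the 100-minute cutoff are the ones described above):

```python
ROOKIE_WP48 = 0.045   # assumed average rookie productivity
MIN_MINUTES = 100     # below this, last season's WP48 is too noisy to trust

def projected_wins(last_wp48, last_minutes, minutes_this_year, is_rookie):
    """Project a player's wins this season from last season's WP48."""
    if is_rookie or last_minutes < MIN_MINUTES:
        wp48 = ROOKIE_WP48   # rookies and tiny-sample players get the rookie rate
    else:
        wp48 = last_wp48
    # WP48 is wins per 48 minutes, so convert to per-minute and scale
    return wp48 / 48 * minutes_this_year
```

Sum that over a roster and you have the team prediction.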
So let’s walk through an example. The least accurate prediction for this year so far (all data includes the games played on Tuesday 2-7) is for the 76ers, but Dave Berri just covered them. Instead I’ll look at the second worst prediction, which is the Cleveland Cavaliers. The table below has each player who has suited up for the Cavs this year.
All the data in the table is for this year except for the last column, which is each player’s predicted wins for this season based on how he did last year. Let’s start with Varejao as an example. He played 31 games before going out with a foot injury, which is bad news for Cleveland as he’s their best player. But remember that we aren’t trying to predict injuries; we’re taking it as given that Varejao has only played 993.7 minutes. Varejao did play last year (2010), putting up a .181 WP48 in 2166 minutes for a total of 8.2 wins produced. Since he isn’t a rookie and played more than 100 minutes, we take his WP48 of .181, divide by 48 to get per-minute productivity, then multiply by the 993.7 minutes he played this year. That gives us a prediction of 3.747 wins produced. He has actually been a bit more productive this year than last, so this prediction is a bit low. Manny Harris is a rookie; he’s given an automatic WP48 of .045. In his 636 minutes we predict that he’ll produce .596 wins, but he’s been a bit above average for a rookie and has actually produced .9 wins. After we get a predicted wins produced for each player, we add them up and get just under 20; if everyone on the Cavs played as they did last year, they should be near 20 wins instead of 8. The biggest offenders are Anthony Parker, Jamario Moon, Antawn Jamison, Mo Williams, and J.J. Hickson; all are playing worse than they did last year. I think there are injury and age issues. Ramon Sessions is the only player who’s raised his game, and thus the Cavs are awful.
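To make the arithmetic explicit, here are those two calculations as plain Python, using just the numbers quoted above:

```python
# Varejao: not a rookie and over 100 minutes last year, so use his 2010 WP48
varejao = 0.181 / 48 * 993.7   # last year's WP48, this year's minutes
# Manny Harris: rookie, so the assumed .045 WP48 over his 636 minutes this year
harris = 0.045 / 48 * 636

print(round(varejao, 3))   # his predicted wins, roughly 3.75
print(round(harris, 3))    # roughly 0.6
```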
This method has obvious weaknesses; as mentioned, it assumes all rookies will play like average rookies. It also doesn’t take age into account. The biggest error in the whole data set is for LeBron in 2005. As a rookie in 2004, LeBron was a little above average (for a rookie) with a WP48 of .066, but not great. Then he made the leap in 2005 and put up a .307. He was expected to produce 4.66 wins and instead generated 21.7. But this kind of error should show up for any method.
As a final bit of playing with the data, as a Pistons fan I wanted to know when the method was most wrong about Detroit. That was 2002, when the players were predicted to generate 28 wins but actually got 50. What the heck happened? The predictions were low for a few players; Cliff Robinson, Rebraca (there’s a name from the past), Stackhouse, and Corliss Williamson all played a win or two better than expected. But the biggest jumps came from Chucky Atkins (3.5 extra wins), Ben Wallace (2.7), and Jon Barry (7.2). Chucky went from a negative contributor to below average but positive. 2002 was his third season, so it would be tempting to say that he was just getting better, but all of his numbers are at his career averages – except his shooting percentages. Out of nowhere, Chucky shot 50 points better from 3 and 50 points better on his overall field goal percentage. It was the second-highest true shooting percentage he ever put up and his best effective field goal percentage ever. Similarly, Ben Wallace had his second-best season ever by WP48 (best ever by Win Shares) and had near-career highs in shooting. He also increased his blocks and decreased his turnovers by substantial amounts. So what the heck did Jon Barry do to increase by 7 wins? We again get career highs in shooting, but also in defensive rebounding (nearly so for total rebounding) and assists. He shot 93% from the free throw line that year. It looks like a big, unexpected jump in shooting accuracy led the Pistons to 50 wins and the second round of the playoffs.
Here’s what you’ve all been waiting for: predictions for each team in each year from 2002 to the present. This is an Excel file with all my work. The ‘WPout’ sheet has the raw data and predicted wins; sheet 1 is a pivot table that adds up wins for each team in each year; sheet 2 has those predicted wins and the actual wins (actual wins were input by hand, so there might be typos; let me know); and finally sheet 3 has the errors. I used the absolute value of predicted minus actual; if a team was predicted to win 40 games but won 50, the error is 10, and it’s also 10 if they actually won 30. The numbers to the right and at the bottom are the averages per team or year, and the number in the bottom right corner is the overall average error, which is pretty much 8. That means that with a very stupid prediction rule, WP48 is on average off by 8 games in predicting how a team will do next year. The worst year was 2002; it doesn’t have the highest average error, but there were only 29 teams then. The best year was 2007. This year’s number is low because not all the games have been played yet.
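For the record, the error column in sheet 3 is nothing fancier than this (a sketch with made-up numbers, not the actual spreadsheet formulas):

```python
def error(predicted, actual):
    # absolute value of predicted wins minus actual wins
    return abs(predicted - actual)

# a team predicted at 40 wins misses by 10 whether it wins 50 or 30
errors = [error(40, 50), error(40, 30)]
average_error = sum(errors) / len(errors)
```

The overall average error in the bottom-right corner is just that mean taken over every team-season.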
So there you are everybody. Have at it.