I’m a strong believer that in life, predictions are everything. Some people claim that the brain exists to make predictions; almost certainly, that’s what memory is for. But today’s discussion is about sports, so let’s stick to that. Knowing what happened yesterday is important if you write for the newspaper (I assume there are still people who do that?), otherwise it’s only good for helping you figure out what will happen next. There are a million ways to explain things that have already happened, but if you can predict what will happen, chances are you know something about why it will happen, and that’s the big payoff. In the NBA, that translates to saying that if you can consistently predict how a team will do in the next season, you probably know what makes teams good or bad. Today’s post is about figuring out what will happen in the past’s future.

To be more concrete, we’d like to predict how a team will do in an upcoming season. This isn’t an easy thing to do. You have to not only predict how each player in the league will do next year, but also how many minutes (or possessions) he’ll play. There are lots of things to take into account: age, injury, player rotations, teammate productivity, how a rookie will play, etc. Thus when people make predictions, they can come to varying conclusions even when using the same productivity measure. Some people will think that player A will get 3000 minutes while others think he’ll get 2250; some people think player B will do worse because of his new teammates while others might not take it into account at all.

With all this in mind, it’s hopefully obvious that comparing season predictions across different methods (such as Wins Produced, Win Shares, adjusted +/-, etc) is close to impossible. They’re going to have so many different moving parts that any differences in accuracy could come from one or more of a number of sources, ignoring the noise that might come from only looking at one season’s worth of predictions. So the goal here is to have an extremely simple prediction, based on as few and as simple assumptions as possible, for a number of years. Given my predilections, I’m taking Wins Produced. Hopefully other people will copy the methodology with other metrics so we can get a decent comparison started.

Here’s how it works: the automated WP site (powered by Nerd Numbers) has every player’s minutes and WP48 from the 2001 season through the current 2011 season. I grabbed them. For every player this year, I got their per-minute productivity (WP48) from the previous year (2010). If the player is a rookie I assumed their WP48 to be .045, which I think is the average rookie score (if this is far off, let me know; it’s an easy fix). If a player missed a whole season, they got their productivity from the last time they played (e.g. Josh Childress). Then I multiplied their productivity from the previous season by their minutes played this year. Why this year? It will put every method on equal footing; no one has to predict minutes played or injuries. Then I did that for every year in the sample; we’ll ignore 2001 since there’s no data for before that season. After seeing how that turned out, I made one change. If a player played fewer than 100 minutes in the previous season, I assume they play at the rookie level of .045 WP48. Why? Because in 2006, Nene played 3 minutes, during which time he put up the impossible WP48 of -2.132. In 2007 he was healthier and played 1715 minutes, which gave him a projected -76 wins. I thought that everyone would find that unreasonable. Players might be under 100 minutes for a season for a couple of reasons, one of which is injury. It might not be correct to assume that an injured player (or a deep benchwarmer) will come back with the ability of an average rookie, but I’m keeping it simple.
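For anyone who wants to copy the methodology with another metric, the rule above can be sketched in a few lines of Python. This is only an illustration of the steps as described, not the code actually used for the post; the function and constant names are made up here, and the caller is assumed to pass in the most recent season a player actually played (for someone who sat out a year).

```python
ROOKIE_WP48 = 0.045   # the post's assumed WP48 for an average rookie
MIN_MINUTES = 100     # below this, last season's WP48 is not trusted


def projected_wins(prev_wp48, prev_minutes, current_minutes):
    """Project a player's wins from last season's WP48 and this season's minutes.

    prev_wp48 and prev_minutes are None for a rookie (hypothetical convention).
    """
    no_history = prev_wp48 is None or prev_minutes is None
    if no_history or prev_minutes < MIN_MINUTES:
        wp48 = ROOKIE_WP48          # rookie, or too few minutes to believe
    else:
        wp48 = prev_wp48
    # WP48 is wins per 48 minutes, so divide by 48 to get wins per minute.
    return wp48 / 48.0 * current_minutes


def team_projection(players):
    """Sum projections over (prev_wp48, prev_minutes, current_minutes) tuples."""
    return sum(projected_wins(*player) for player in players)
```

With the Nene numbers from above, the raw rule gives -2.132 / 48 * 1715, or about -76 wins; with the 100-minute floor, `projected_wins(-2.132, 3, 1715)` comes out to about 1.6 wins instead.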

So let’s walk through an example. The least accurate prediction for this year so far (all data includes the games played on Tuesday 2-7) is for the 76ers, but Dave Berri just covered them. Instead I’ll look at the second worst prediction, which is the Cleveland Cavaliers. The table below has each player who has suited up for the Cavs this year.

All the data in the table is for this year except for the last column, which is their predicted wins for this season based on how the player did last year. Let’s start with Varejao as an example. He played 31 games before going out with a foot injury, which is bad news for Cleveland as he’s their best player. But remember that we aren’t trying to predict injuries; we’re taking it as granted that Varejao has only played 993.7 minutes. Varejao did play last year (2010), putting up .181 WP48 in 2166 minutes for a total of 8.2 wins produced. Since he isn’t a rookie and played more than 100 minutes, we take his WP48 of .181, divide by 48 to get productivity per minute, then multiply by the 993.7 minutes he played this year. That gives us a prediction of 3.747 wins produced. He was actually a bit more productive this year than last so this prediction is a bit low. Manny Harris is a rookie; he’s given an automatic WP48 of .045. In his 636 minutes we predict that he’ll produce .596 wins but he’s been a bit above average for a rookie and has actually produced .9 wins. After we get a predicted wins produced for each player, we add them up and get just under 20; if everyone on the Cavs played as they did last year, they should be near 20 wins instead of 8. The biggest offenders are Anthony Parker, Jamario Moon, Antawn Jamison, Mo Williams, and J.J. Hickson; all are playing worse than they did last year. I think there are injury and age issues. Ramon Sessions is the only player who’s raised his game, and thus the Cavs are awful.

This method has obvious weaknesses; as mentioned, it assumes all rookies will play like average rookies. It also doesn’t take age into account. The biggest error in the whole data set is for LeBron in 2005. As a rookie in 2004 LeBron was a little above average (for a rookie) with a WP48 of .066, but not great. Then he made the leap in 2005 and put up a .307. He was expected to produce 4.66 wins and instead generated 21.7. But any method should make this kind of error.

As one last bit of playing with the data, as a Pistons fan I wanted to know when it was most wrong about Detroit. That was 2002, when the players were predicted to generate 28 wins but actually got 50. What the heck happened? The predictions were low for a few players; Cliff Robinson, Rebraca (there’s a name from the past), Stackhouse, and Corliss Williamson all played a win or two better than expected. But the biggest jumps came from Chucky Atkins (3.5 extra wins), Ben Wallace (2.7) and Jon Barry (7.2). Chucky went from a negative contributor to below average but positive. 2002 was his third season, so it would be tempting to say that he was just getting better, but all of his numbers are at his career averages, except his shooting percentages. Out of nowhere Chucky shot 50 points better from 3 and 50 points better on his overall field goal percentage. It was the second highest true shooting percentage he ever put up and his best effective field goal percentage ever. Similarly, Ben Wallace had his second-best season ever by WP48 (best ever by Win Shares) and had near-career highs in shooting. He also increased his blocks and decreased his turnovers by substantial amounts. So what the heck did Jon Barry do to increase by 7 wins? We again get career highs in shooting but also defensive rebounding (nearly so for total rebounding) and assists. He shot 93% from the free throw line that year. It looks like a big, unexpected jump in shooting accuracy led the Pistons to 50 wins and the second round of the playoffs.

Here’s what you’ve all been waiting for: predictions for each team in each year from 2002 to the present. This is an Excel file with all my work. The ‘WPout’ sheet has the raw data and predicted wins; sheet 1 is a pivot table that adds up wins for each team in each year; sheet 2 has those predicted wins and the actual wins (actual wins were input by hand, so there might be typos; let me know); and finally sheet 3 has the errors. I used the absolute value of predicted minus actual; if a team were predicted to win 40 games but won 50 the error is 10, and it’s also 10 if they actually won 30. The numbers to the right and at the bottom are the averages per team or year, and the number in the bottom right corner is the overall average error, which is pretty much 8. That means that with a very stupid prediction rule, WP48 is on average off by 8 games in predicting how a team will do next year. The worst year was 2002; it doesn’t have the highest average error, but there were only 29 teams then. The best year was 2007. This year’s number is low because not all the games have been played yet.
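The error measure is simple enough to show directly. The snippet below is a minimal sketch of the average absolute error described above, written in Python rather than taken from the spreadsheet; the function name is just for illustration.

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over paired team-season values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(abs(p - a) for p, a in zip(predicted, actual))
    return total / len(predicted)
```

Using the example from the text, a team predicted at 40 wins contributes an error of 10 whether it actually wins 50 or 30, so `mean_absolute_error([40, 40], [50, 30])` is 10.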

So there you are everybody. Have at it.

Alex, I think this is a great exercise. Hopefully others will replicate it using other methods.

As you’d expect, I have a few suggestions if you do more work on this. First, you might consider creating a “Marcel” style estimate of each player’s productivity, using 3 years of data rather than just using last season’s WP48. If predicting year Y, then something like .5*(Y-1)+.33*(Y-2)+.17*(Y-3). Should give you better estimates for veterans. You could also adjust for MP in each season, but probably not worth the trouble.
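A rough sketch of that weighting, in Python, might look like the following. The weights are the ones given in the comment; renormalizing over whatever seasons a player actually has is an added assumption for illustration, not part of the suggestion, and the function name is hypothetical.

```python
MARCEL_WEIGHTS = (0.5, 0.33, 0.17)   # weights for seasons Y-1, Y-2, Y-3


def weighted_wp48(history, weights=MARCEL_WEIGHTS):
    """Blend up to three prior seasons of WP48, most recent first.

    Use None for a season the player did not play. Missing seasons are
    dropped and the remaining weights rescaled (an assumption made here).
    """
    pairs = [(w, v) for w, v in zip(weights, history) if v is not None]
    if not pairs:
        return None                  # no usable history, e.g. a rookie
    total_weight = sum(w for w, _ in pairs)
    return sum(w * v for w, v in pairs) / total_weight
```

A player with only one prior season simply gets that season’s WP48 back, which reduces to the rule used in the post.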

Second, it would be very helpful to estimate a “turnover” metric for each team, i.e. how much the team changed in terms of player-minute composition. It is the teams that change the most we are most interested in, as they do the best job of telling us whether a metric accurately apportioned credit for team productivity. So knowing the average error among the top quartile (or whatever) of high-turnover teams would be very useful.

If this is done with other metrics, it would be useful to run regressions that use the metric plus prior year’s team wins (or point differential) as predictor variables. That would tell you which ones are really adding value.

Also, can you tell us what the average error would be if you simply used the prior season’s team point differential to predict wins? (Or if you don’t have that, just team WP?) How much larger than 8 is the average error?

Yeah, the predictions can definitely be better, and I might look into that (although I think Arturo and Berri have already put in a lot of work on that front). But as the first step, I’m curious to see how other metrics do with the simple rules. After that maybe we get everyone to agree on an aging curve, teammate diminishing returns, etc; those exist for WP and could be used for predictions, but it doesn’t help with comparing across metrics unless everyone agrees to the same functions.

The correlation between the predicted team WP and actual team wins in the data I posted is .71. The average absolute error from the regression is 7.332. The data I have with team wins/point differential is the set that Arturo posted, so it’s only 2003 through 2009 (I dropped 2002 because there isn’t a previous season) but that’s what I used. If I correlate previous year point differential with current year wins, the correlation is .576 and the average absolute error is 7.729. If I correlate previous year point differential with current year point differential, the correlation is .579, so only a tiny improvement if it’s even significant (it doesn’t make sense to look at the error since this is point differential instead of wins).

The error from the regression is a little smaller than the 7.93 in my Excel sheet, mostly I think, because it fixes the mean. The predicted wins produced totals are pretty much always below the actual number of total wins in a season. I assume that’s due to player survival bias; players who are doing poorly and on the way out aren’t there next season and so the players who *are* in the next season are likely to improve. But that’s just a guess. The regression equation is current team wins = .674*predicted team WP + 13.71. But in any case, it improves on the prediction from point differential, which it pretty much has to since it knows where the players will be and how much they’ll play.

Interesting data. You are probably underestimating wins overall because playing time is awarded to players who are most productive. Those who are producing more than last year will get more MP than those who underperform. The rookies who do better than .045 will get a lot of MP, those who don’t will get fewer minutes.

I can’t say that the .71 correlation impresses me a lot — given how consistent basketball players are supposed to be, wouldn’t you have expected that WP48 plus perfect knowledge of playing time could explain more than half of the variance? The error gain from 7.9 to 7.3 doesn’t strike me as very impressive either (I wonder how much adding WP to last year’s differential would improve your R^2? :>). But perhaps other metrics will do even less well….

Nice work…..

The correlation in WP48 for players who change teams is .73, I think, and around .8 in general. So the .71 at the team level seems reasonable enough to me since you’ll also have some noise from the ‘conversion’ between wins produced and team wins (by which I mean wins produced and wins don’t have a perfect correlation, like point differential and wins) and the relative lack of quality of the predictions. Since other metrics tend to have lower within-player correlations, I’m not entirely sure how they might predict team wins better, but that’s exactly what I’m waiting to see. Maybe the noise will average out across players when summed up to team wins?

As a side note, the issue of consistency came up in my last post as well. The WP claim is that NBA players are consistent relative to other sports. There’s no particular claim that they’re consistent on some absolute scale. So explaining half of team variance maybe isn’t that great, but you’d predict that a similarly dumb prediction system for, say, baseball would do even worse. I’m sure hockey is atrocious, and I’m certain football would be a waste of time. Do you know about any work along those lines? Also keep in mind all the stuff not in this model that would improve it; I think it would be pretty easy to get the R squared up for someone with the time and data on hand. The exercise isn’t about the best possible WP predictions but setting a common ground for comparing metrics.

Last I checked, just using last season wins regressed 50% ((Last + 41)/2) to predict this season had an average error of about 6.5 projected over the full season. I think we can agree that any metric that can’t beat that standard isn’t telling us a lot. Maybe using regressed WP48 would do that, but it’s not clear it can.

“The correlation in WP48 for players who change teams is .73, I think, and around .8 in general. So the .71 at the team level seems reasonable enough…”

What is the minimum MP here? If it’s 1000 MP or less these correlations may understate how consistent the player WP48s really are for the purpose of your exercise, since the 2000+ starters produce most of the value and their y-t-y r must be higher.

Also, you need to keep the distinction between metrics and reality clear. The fact that WP correlates at .73 for team changers doesn’t necessarily mean such players change that much — the true correlation may be .9. You can’t use that to evaluate how “good” the team predictions are (it’s like self-grading). Similarly, you can’t assume that joining a good team reduces a player’s productivity — that’s just a WP thesis (and in that case, highly unlikely to be true).

Although you don’t have other metrics to work with, you can evaluate WP against simple evaluation methods. For example, it would be interesting to see how much WP outperforms these metrics:

*Prior year MP

*Share of team wins (prior season team wins, divided by prior season MP)

*Prior year points per 48 (if you have that data)

In the data set I have, using previous wins or previous regressed wins has an average absolute error of 7.76. So basically the same as using point differential, and worse than the WP prediction. Maybe the lower error is based on data from years ago when there was less team turnover? That number comes from putting those through a regression; if you just use the actual number the regressed wins does better but still has an error of 7.83. Obviously the regression makes it better.

Not sure what the qualifications were; I’m at work and don’t have my stuff handy. I’m sure you have lots of this other info around; feel free to jump in!

The 6.5 for regressed prior wins is based on the reports over at APBR: http://sonicscentral.com/apbrmetrics/viewtopic.php?t=2618&postdays=0&postorder=asc&start=90. Maybe the error is unusually low this year for some reason (despite Cleveland!). So it makes sense to use your 7.73 as the benchmark. What you’re reporting is that knowing each returning player’s exact number of MP, the true MP and prior WP for any new players, plus the MP for rookies — all of that buys you only a .4 reduction in the average error. I remain unimpressed, but let’s see what other systems do. My guess is that using last season’s point differential plus MP might do nearly as well. And it wouldn’t surprise me if points per 100 possessions did as well or better than WP. That would be ironic, no?

As of when I collected the data, I have the WP prediction as a tiny bit ahead of the regressed single year for this season so far. But I wouldn’t take part of a single season as an indicator of anything.

Happened to stumble on this old post by Eli W. that breaks down WP’s y-t-y correlation based on MP: http://www.countthebasket.com/blog/2007/12/04/evaluating-player-ratings-year-to-year-correlations/. For starting players who generate most of the productivity (over 2000 MP), the y-t-y correlation is extremely high: about .9 for players on the same team, and .85 for team switchers. So if it’s measuring productivity correctly, wouldn’t that suggest your predictions should correlate with team wins at better than .7?

Hard to say. The way Eli did it, he’s tossing out players who change their minutes drastically, or played less than 1000 minutes in one season. It feels like that would toss out a lot of rookies or injured players, who we might expect to be most likely to have unexpected changes in their productivity. Also, he used adjusted P48, not WP48; it’s possible adjusted P48 is more consistent. I don’t know of any numbers that compare the two directly.

Using AdjP48 should actually reduce his correlation, because some players will change position while he’s implicitly assuming that never happens. However, you make a good point that the <1000 MP players add a lot of volatility. I'm thinking that the 2000+ MP players produce a very large proportion of wins, so the <1000 MP guys just don't matter that much for predicting team wins, but that could be incorrect.

I'm increasingly thinking that something as simple as MP might outperform WP. That is, I think the second of these two-variable regressions might be as strong as the first:

A) 1. Last season point differential, and 2. your predicted WP;

B) 1. Last season point differential, and 2. last season MP (crediting rookies at, say, 800 MP).

Nope. A has R squared .4702 and point differential loses a lot of predictive ability to predicted WP; B has R squared .3327 and last season minutes played isn’t anywhere near significant. It isn’t significant on its own either. But keep trying!

Side note: you might have noticed that the R squared I mentioned is smaller than the R squared I mentioned for predicted WP alone. That’s because they’re based on different portions of years; the WP pred alone is the full set from 2002 to current as in my post while the regression I mentioned above is from Arturo’s data set, which only covers up to 2008-2009. So just to put it all on equal ground using the Arturo-sized data set: using previous season point differential alone, R squared is .332; using previous number of wins it’s .316; using sum of minutes played in previous season by current players it’s .003; using predicted WP it’s .447. If you put previous wins and predicted WP together you get to .4714 and WP is 2.5 times more important (by scaled coefficients, although they’re pretty well correlated) and if you use previous point differential and predicted WP you get to .4702 and WP is still about 2.5 times more important.

Another side note: is there a way to download all the data from a basketball-reference.com search at once as opposed to one page (100 entries) at a time? This was really easy to code if the data are organized properly, so I could do Win Shares in a snap if I had the data like from the automated site.

I am surprised that MP has no predictive power at all. Just checking: you weighted MP(prior) by current MP, the same way you did WP, right?

Weight it how? You mean adjust previous season total minutes to minutes per game and then multiply by games this season to get predicted total minutes played this year?

The idea is to treat MP(prior) as a measure of productivity. So you use it same way you use WP(prior), and weight it by current MP (i.e. MP1 * MP2).

Weighting it that way gets minutes alone to an R squared of .279, minutes and last year’s differential combined to .4143. Still below predicted WP alone (.4475) or WP and previous point differential (.4702). If you use normalized versions of all three you only improve to .496 and WP is over twice as important as the other two. Maybe we should wait for another metric to make some predictions?

You consider that a “victory” for WP, to have an R^2 that is .03 higher than using MP alone? MP is a very crude estimate of NBA coaches’ view of player effectiveness, since bad teams have the same # of minutes to distribute as good teams. But it’s nearly as good as WP.

I guess we have different standards of success here — which is fine. But given this result, I think it will be hard for WP to outperform a simple scoring metric like points/100.

WP actually crushes MP; MP only catches up once you incorporate point differential, which WP already does a good job of predicting. But either way, you set the bar and WP cleared it, so I would call it more your victory than mine. I’m still waiting for anyone else to follow suit. Such as yourself with points/100 maybe? Or adjusted +/-? Or statistical? Or win shares? Or….

I agree it would be interesting to see other metrics put to the same test. And I confess I’m too lazy to create the databases needed to do it. I’d suggest you do a post over at APBRmetrics, linking to this post and inviting others to test other metrics similarly. You may well find takers there.

When you say MP only catches up with the help of prior season’s point differential, well, the whole point (IMO) is to see how much a metric can improve on point differential. We know that team WP equals differential, so even a very poor apportionment of that among players will still have a lot of predictive power as long as most players stay on the same team (which they do). The question is how much more predictive power a metric gives you, compared to knowing prior point differential alone (which we always know). That’s why I feel that an interesting test for variables is how well they predict the teams with the greatest changes in player-minutes. For those teams prior differential is less powerful, and the metric has to carry more of the load.

Of course MP alone is weaker than WP — MP doesn’t “know” anything about point differential. But the fact that differential + MP is almost as good as differential + WP suggests that WP isn’t telling us a whole lot about which players really contributed the most value. I’d guess that simply using team WP (prior) / MP would be essentially as powerful as player WP.

The proper view for improvement over another piece in the model is the partial R squared. Minutes played has a partial R squared of .198 over point differential, so it’s adding something to be sure; WP has .293, so it is adding a decent amount more than point differential. They end up near the same place because R squared is bounded and so the actual improvement is larger than the apparent improvement of .0559 you get from just comparing point differential+minutes to point differential+WP.

This is good stuff. There are so many things that can explain what happens from season to season. I just want a metric that’s based on sound reasoning for how players affect their teammates and what things are about the players themselves.

All players are different. They all improve differently, all recover through injuries at a different rate, and all decline differently. Now if you were to combine this with dberri’s average wp48 increase (improvement of players) from year to year, use the rookies’ actual wp numbers (it’s accepted that rookies are hard to predict), and maybe also coaching, and then see how it correlates, that would be a nice sight. If it explains variance highly, then I think we can agree that WP is a sound way to evaluate players.

When predicting, things like diminishing returns on different stats could come in. Injuries will always be a hassle though.

Just to note…if you take last year’s wins and regress to the mean, the R^2 is 0.465. Let’s call this “Dumbo”:

(W_2009+41)/2

Right now, Dumbo is beating out several other WoW pre-season predictions, including Dre’s (0.449), NBeh (0.403), and Miami Heat Index (0.419). The WoW blogs ahead of Dumbo are Hickory High (0.482), Roblog (0.551), Alex (0.537), and Arturo (0.461). I’m doing the worst (0.356). Although I should remind you that my picks were a blend of WP and Win Shares.

So, how is Hollinger doing? Pretty good. 0.738. Bow down to the man.

It should also be noted that simply using last year’s win totals (what Arturo calls “Bobo”) is also 0.465. So, even Dumbo doesn’t beat Bobo.

The R squared for last year’s wins and last year’s regressed wins will always be the same because regressed wins is just a linear transform of last year’s wins. Dumbo can’t beat Bobo, they can only be equal. As far as it goes, from ’03 to ’09 regressed wins has an R squared of .316 so it’s over-performing this year.
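To see why, here is a small Python check. The win totals are made-up numbers purely for illustration, not real team records, and the helper name is hypothetical; the point is only that a positive linear transform such as (W + 41)/2 leaves the correlation, and therefore R squared, unchanged.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Made-up win totals, for illustration only.
last_wins = [61, 50, 27, 42, 35]
this_wins = [57, 44, 35, 46, 24]

# "Dumbo": last season's wins regressed halfway to 41.
regressed = [(w + 41) / 2 for w in last_wins]

bobo_r2 = r_squared(last_wins, this_wins)
dumbo_r2 = r_squared(regressed, this_wins)
```

The two R squared values agree up to floating-point rounding. What regression to the mean can change is the average absolute error, since it pulls the predictions toward 41 without reordering them.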

You’re changing the subject though Evan. If you’d like to post or send me the ezPM numbers from last year (or further back if you have it) I can run the same analysis on your numbers to see how they’re doing so far. So that we can get back to comparing apples to apples, like the point of the post says.

Cool, I’d like to see this. Here’s a csv file of the 2009 ezpm data. Remember that it’s +/- per 100 possessions. I’m assuming you have some way to convert that to minutes by using team pace. Let us know what you find.

https://spreadsheets.google.com/pub?key=0Al6a2ecvJfTidE1ETG8zV1RwRjhSb0ZCbV9JTkV2eXc&output=csv

My plan was to use possessions so far this year. So Aldridge, for example, had an ezPM of .32 last year. This year so far he’s played 3465 possessions (from the data on your site), so the predicted points added is .32*3465/100 = 11ish. I add up all of Portland’s players and I get the team’s predicted point differential, which I convert to wins with your 2.54*differential + 40.9 formula. Sound good?
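In code, the per-player step of that plan would be something like the Python sketch below. The function names are illustrative only, and the rookie and low-possession defaults are left out because they hadn’t been settled at this point in the thread.

```python
def predicted_points_added(ezpm_per_100, possessions):
    """Points added: last year's ezPM (per 100 possessions) times this year's possessions."""
    return ezpm_per_100 * possessions / 100.0


def team_point_differential(players):
    """Sum predicted points over (ezpm_per_100, possessions) pairs for one team."""
    return sum(predicted_points_added(e, p) for e, p in players)
```

For the Aldridge example, `predicted_points_added(0.32, 3465)` works out to about 11.1 points.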

Yep. Sounds good. And what about for rookies? What’s the equivalent in +/- to 0.045 WP48?

Yeah, just realized I should ask. The WP data I have for this year puts the league-average wp48 at .048, or just over a rookie. The data you have posted puts the average ezPM at -1.88. Should we say rookie ezPM is -1.9? -1.95?

I want it to be equal. What is the win% of a 0.048 team?

yep, just calculated it, that seems right

Which one? -1.9?

-1.95 is a bit closer (can’t imagine it makes much difference either way)

And just to check, the data on the web site includes games played on the 6th?

“The R squared for last year’s wins and last year’s regressed wins will always be the same because regressed wins is just a linear transform of last year’s wins. Dumbo can’t beat Bobo, they can only be equal.”

Also, thanks! I should have realized this.

“And just to check, the data on the web site includes games played on the 6th?”

Actually, it’s through the 3rd. I can update it with data through yesterday’s games. Should I do that?

Nope, just want to check that I get wins for the right time period. The link says the 3rd but the spreadsheet says the 6th, so I assumed the 6th was right.

Here’s the data through the 13th:

https://spreadsheets.google.com/pub?key=0Al6a2ecvJfTidG1QSmFGQ3IycThIN2V6bFhId01hZlE&output=csv

I got it done for through the 3rd. The spreadsheet is at https://spreadsheets.google.com/ccc?key=0AsB8b3QV6LtcdGFka2pUYmRUbFdfQlVVTlljSVIyM3c&hl=en&authkey=CKSUkJ8M .

I have ezPM with an average absolute error so far this year of 6.89. The spreadsheet is laid out similarly to the WP one I posted. The ezout sheet has the raw data and the predicted point differential produced by each player (it’s labeled as predwin, but is actually points). Lastpos is possessions from last season, and lastpos2 is possessions from last season multiplied by possessions this season (from the minute stuff Guy asked about). Sheet 1 has the pivot table to sum up team point differential. Sheet 2 copies those sums over and turns them into predicted wins by changing them to point differential per game multiplied by 2.54 and then adding half the number of games each team played through the 3rd. Atlanta, for example, played 49 games so they would win 24.5 if they were average. They have a predicted differential of just over 2 though, so they’re predicted to win about 5 more games, or just under 30 in total. The columns to the right have the actual team performances; Atlanta was 31-18 after the 3rd, so the prediction was off by about a game and a quarter for them.

In the WP post, games were through the 7th and the predictions had an average absolute error of 5.48. Error should increase as more games are played because there is more room to be wrong by, so I give the preliminary edge to WP for having a smaller error after more games were played. I’d obviously rather look at multiple seasons, or at least look again at the end of the season when all the data will be final.

Let me know if you want the R code I used. It obviously assumes the data are organized a certain way, but I can send the files and describe what I did.

What’s the R^2?

looks like 0.56

One other note, I had it use the rookie assumption for any player with less than 300 possessions, but I don’t think there were any.

The R squared is .56. The WP correlation from this year is .554. So probably not different, but ez is a fraction ahead.

🙂

I win! jk, that’s a push

Yeah, the confidence interval on the correlation is pretty big with one season/30 teams. If you did this for the whole modern era of the NBA you’d still only get to +/- .07ish. Put that way, it’s hard to say if many reasonable methods would be statistically different.
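For a rough sense of how wide that interval is, here is a Python sketch using the standard Fisher z-transform approximation. It assumes independent observations and a roughly normal sampling distribution, which is only approximately true for team seasons, and the function name is made up for illustration.

```python
from math import atanh, sqrt, tanh


def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation r from n points."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = atanh(r)                     # Fisher z-transform of r
    se = 1.0 / sqrt(n - 3)           # approximate standard error on the z scale
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


# An R squared of .56 over 30 teams corresponds to a correlation of about .75.
low, high = correlation_ci(sqrt(0.56), 30)
```

With 30 teams the interval runs from roughly .53 to .87 on the correlation scale, so a gap of a few hundredths between two metrics is well inside the noise.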

I’m a bit confused. How can R^2 = 0.56 and R=0.554? Or are you using different terminology?

Alex: I think your formula for converting point differential to wins is off. Each point is worth about .033 in win%. That is about 2.5 wins (though not exactly) over a full season, but not over fewer games.

I assume he calculates the full season and then converts by using the win %.

I don’t think so: “Atlanta, for example, played 49 games so they would win 24.5 if they were average. They have a predicted differential of just over 2 though, so they’re predicted to win about 5 more games, or just under 30 in total.” By my estimate, +2 points/game over 49 games = .574, or 28.1 wins.

I used the regression Evan posted on his site; in Arturo’s data set a point of differential is actually worth 2.69 wins or .0328 win%. But in case Evan uses that value, I wanted to use his number.

That equation works reasonably well when using full season data. It is incorrect on partial-season data. Point differential obviously can’t produce a fixed number of additional wins regardless of the number of games played.

Alex, thanks for running this. Looking at the output, I’m actually quite pleased. I can see that there are major errors for LAC and MIA, but I completely would expect that given the emergence of Blake and diminishing returns for LeBron. Obviously, WP or any metric might have the same issues. But overall, the predictions are solid.

Guy, you appear to be right. +2/100 poss should be 27.5 wins through 49 games.

Don’t know if this helps or hurts me! 😉

Yep, it changes things if I use win percentage. I used point differential per game *.0328 +.5 and multiplied by games played. It shrinks the error but only moves the R squared up to .563, so not a lot going on there.
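A small sketch of the two conversions being compared. The .0328 figure and the 2.54 wins-per-point figure both come from this thread; the function names are made up for illustration, and the +2 differential is just the round number used in the Atlanta example.

```python
def wins_fixed(diff_per_game, games_played, wins_per_point=2.54):
    """Full-season shortcut: average wins plus a fixed number of wins per point."""
    return games_played / 2 + wins_per_point * diff_per_game

def wins_from_win_pct(diff_per_game, games_played, pct_per_point=0.0328):
    """Convert differential to win%, then scale by the games actually played."""
    return (0.5 + pct_per_point * diff_per_game) * games_played

for games in (49, 82):
    print(games,
          round(wins_fixed(2.0, games), 1),
          round(wins_from_win_pct(2.0, games), 1))
```

The two versions nearly agree at 82 games but diverge at 49, because the fixed version hands out a full season's worth of extra wins no matter how few games have been played.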

Evan, all the numbers I’ve mentioned are R squared, but I looked at correlations as a short cut here and there, and to look at the error. I might have said the wrong one at some point, but all the numbers are definitely R squared values.

Alex, can you put up the new file when you get a chance?

Which new file? With the win% prediction instead of 2.54*differential?

“It shrinks the error but only moves the R squared up to .563, so not a lot going on there.”

Isn’t it the error we care about more in this case? Theoretically, a model could have a high correlation but still miss a lot on its predictions, if the predicted variance was too great or too small. In fact, that’s what we’ve just shown with the mistaken points:win conversion, which artificially pushed teams too far from .500 without changing the ranking at all. It may not make any practical difference here, but unless I’m thinking about this wrong I think the average error is the most important measure.
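The point about correlation versus error can be illustrated with a toy example. The win totals below are invented purely for illustration; the stretched set keeps the same ranking but pushes every team 50% farther from the average prediction.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def mean_abs_error(pred, actual):
    """Average absolute miss, in wins."""
    return mean(abs(p - a) for p, a in zip(pred, actual))

# Made-up win totals for six teams (not real data).
actual = [20, 28, 35, 41, 47, 55]
pred = [24, 27, 38, 40, 44, 53]
avg = mean(pred)
# Same ranking, but every prediction moved 50% farther from the average.
stretched = [avg + 1.5 * (p - avg) for p in pred]

print(pearson_r(pred, actual), pearson_r(stretched, actual))
print(mean_abs_error(pred, actual), mean_abs_error(stretched, actual))
```

The correlation is identical for both sets because a linear rescaling does not change it, while the average absolute error gets worse for the stretched set.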

The error probably is more important since the goal is to predict wins, not multiples of wins. But practically I assume anyone making actual predictions would move them (maybe with a regression) to match the actual total wins possible. It’s limited data, but this year ezPM underestimates the spread in wins by more than WP overestimates it. So practically speaking, someone using either one would want to alter the predictions to fit the actual number of wins, and so (more than likely) fix the variance.

Coincidentally, I did the same kind of conversion to win% for the WP numbers and the absolute error drops to virtually the same value as ezPM. Instead of converting ezPM’s predicted point differential to win% with a regression equation and then to predicted wins, I converted the WP predicted wins to win%, converted it to predicted win% with a regression equation from other seasons, and then to predicted wins.

Well, you can only “fix the variance” after the fact, right? So getting it right in advance is important. Interestingly, a perfect metric will actually have too little variance compared to actual team wins and losses. The total variance in a season is the sum of the variance in teams’ true talent plus random variation. Teams’ actual point differential will vary from their true talent, and wins/losses will vary from differential.

Does WP project too much variance generally, or just in the current season? If it does so generally, that’s an indication of a problem, since it should underestimate variance at least a bit.

I fixed the variance before the fact here, so I don’t think that’s a problem. I also don’t see why the random error away from differential couldn’t cancel across teams as opposed to sum if it were actually random. In that case the variance of actual wins would be the same as the variance of predicted wins. You seem to be suggesting that the noise will always move teams apart from each other.

Sorry, I’m not following — how did you fix the variance without knowing the actual distribution of wins/losses?

The observed variance will equal the sum of variance (talent) and variance (random). The random variance doesn’t cancel out because you are squaring the errors. Let’s say a true 46 win team (+5) will sometimes win 44 and sometimes win 48. Your variance is now (3^2 + 7^2)/2 = 29, not 25.
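The arithmetic in that example can be checked directly. This is only a sketch of the single-team case described above, measuring squared distance from a 41-win league average; the helper name is made up.

```python
def mean_sq_dev(wins, league_avg=41):
    """Mean squared deviation of win totals from the league average."""
    return sum((w - league_avg) ** 2 for w in wins) / len(wins)

true_talent = [46]    # a true 46-win team, +5 from average
observed = [44, 48]   # the same team with luck of -2 or +2 wins

print(mean_sq_dev(true_talent))  # 25.0
print(mean_sq_dev(observed))     # 29.0
```

The extra 4 is exactly the variance of the luck term (2 squared), which is the sense in which the random part adds to the spread instead of cancelling.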

So you want your predictions to have a bit less variance than the actual season. If WP is predicting more, then its estimates of player productivity have too much variance (which would have been my guess).

I regressed the WP predictions, converted to win%, on actual win%. Then I put the WP predictions through the regression equation and multiplied by games played to get ‘final’ predictions. This gives the predictions a smaller SD than actual wins. When I did that for this season (the regression data being prior seasons), the absolute error moves in line with the ezPM predictions, which is what I said before. That’s what I meant by ‘fixing’ it. In any event, it’s also possible that a more reasonable prediction (including teammate effects, aging, etc) would already have variance more in line (or smaller) with actual values, so it seems premature to worry about. But I’m glad you’re finding every reason to think that the WP predictions are doing poorly.
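A rough sketch of that adjustment, assuming actual win% is the outcome and the raw predicted win% is the predictor. All of the numbers below are invented placeholders, not the WP data, and the variable names are hypothetical.

```python
from statistics import mean, pstdev

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

# Hypothetical prior-season data: raw predicted win% and actual win%.
prior_pred = [0.25, 0.35, 0.45, 0.50, 0.60, 0.70, 0.80]
prior_actual = [0.33, 0.38, 0.47, 0.52, 0.55, 0.62, 0.70]
intercept, slope = fit_line(prior_pred, prior_actual)

# Hypothetical current-season raw predictions and games played so far.
current_pred = [0.30, 0.50, 0.75]
games_played = [49, 50, 48]
adjusted_pct = [intercept + slope * p for p in current_pred]
final_wins = [pct * g for pct, g in zip(adjusted_pct, games_played)]

print(round(slope, 3), [round(w, 1) for w in final_wins])
print(pstdev(current_pred), pstdev(adjusted_pct))
```

When the raw predictions are too spread out, the fitted slope comes out below 1, so the adjusted predictions are pulled toward the middle and have a smaller spread than the raw ones.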

For a given team, wins = ‘true wins’ + error. Presumably the error is random with mean 0, since we always assume that. That means the actual wins could be bigger or smaller than ‘true wins’. If you get a sample where the errors turn out right, the observed variance will be smaller than the true variance. It’ll be more or less prevalent depending on how much error there is. I can post some simulation code if you’d like.
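A simulation along those lines might look like the sketch below. The talent and noise standard deviations are arbitrary assumptions chosen for illustration, not estimates from NBA data, and this is not the simulation code offered in the comment above.

```python
import random
from statistics import pvariance

def simulate_seasons(n_seasons=2000, n_teams=30, talent_sd=10.0, noise_sd=4.0, seed=1):
    """Draw true-talent wins plus random noise for each team in many seasons."""
    rng = random.Random(seed)
    talent_vars, observed_vars = [], []
    for _ in range(n_seasons):
        talent = [rng.gauss(41, talent_sd) for _ in range(n_teams)]
        wins = [t + rng.gauss(0, noise_sd) for t in talent]
        talent_vars.append(pvariance(talent))
        observed_vars.append(pvariance(wins))
    return talent_vars, observed_vars

talent_vars, observed_vars = simulate_seasons()
avg_talent = sum(talent_vars) / len(talent_vars)
avg_observed = sum(observed_vars) / len(observed_vars)
# Share of simulated seasons where observed variance came out below talent variance.
share_smaller = sum(o < t for o, t in zip(observed_vars, talent_vars)) / len(talent_vars)
print(avg_talent, avg_observed, share_smaller)
```

Under these assumptions the observed variance is larger on average, but in a minority of individual seasons the noise happens to pull teams together and the observed variance comes out smaller, so both points made above can hold at once.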

Jeez, Alex, we were having such a nice, snark-free discussion here. I’m not “finding” a reason to view the WP predictions poorly. You reported that WP was predicting more variance than actually occurs. That’s surprising for any metric, because the actual league variance should be a bit greater than predicted variance for the reasons I explained. It caught my attention because I’ve always thought the variance in player WP48 was likely too large (mainly because of failure to account for DR on rebounds, and perhaps because assists are overvalued). I may be wrong about that, but it’s not like I’m contriving to invent some new objection.

In any case, I think we agree that it’s important for a metric to get the scale of player differences right, in addition to ranking players correctly. If a metric is understating or overstating the variance in player talent, that’s important. A metric which got the ranking right but the variance wrong would probably be relatively easy to fix, but you still want to know that it needs fixing. Right?

And what do you mean by a “more reasonable prediction” that includes teammate effects? Doesn’t Berri argue that the teammate effect is very small?

Fair enough, but I still think it’s true that in any given season the predicted variance could be high or low without any particular flaw. You’re probably right on average though; even without the random noise issue you probably don’t want to predict extreme values, and this would lead to smaller predicted variance in wins. I wonder how many people worry about fixing their predictions; even on the APBR board, let alone other places, very few predictions add up to the actual number of wins in the league. Maybe most people just don’t think it’s an issue?

The teammate effects (similar claims are made for defense and, I think, age) are small at per-minute productivity levels, but they would obviously become bigger when you multiply out by minutes to move to total wins produced. So if you were picking between two people you expected to play the same amount you could roughly ignore them, but you wouldn’t want to if comparing a starter to a bench guy. Depends on what level you’re interested in.

Interestingly, if I had used my pre-season minutes projections for the Warriors, I would have predicted ~36 wins. Probably not too far off. I had predicted 45 using WP/WS.

Alex, for the revised regressions, what did the mean absolute error turn out to be for WP and ezPM?

Just converting ezPM predictions to wins through the win% equation (I didn’t regress them the same way I did for WP because there’s only the current season data to use) gets the average absolute error to about 4.7. WP is very similar, maybe higher by a couple hundredths. I don’t have the number in front of me. I replaced the google doc with the same file but using the win% equation; it’s at https://spreadsheets.google.com/ccc?key=0AsB8b3QV6LtcdFF1SUtXbUkzQ3BUWGRkeEg0WGpMWXc&hl=en&authkey=CODTtMUC .

Thanks. You know, thinking about this some more, it might make more sense to compare directly to point differential rather than wins.

for example, I have 21 as the prediction. They’ve won 12 games. But if you take their point differential they “should” have won about 16 games to this point. They lost a lot of close games. We all know p.d. is a better predictor over the long run (although I guess Vegas wouldn’t care about that). But the point is our models are really trying to predict p.d.

Sacramento

Yeah, I guess it depends on what exactly you’re trying to do. The stats people might be more interested in point differential. Vegas, sports writers, more general fans are probably interested in wins, playoff seeding, etc. You’d probably want to predict both, and to the extent they’re strongly related if you’re consistently good at one you’ll probably be just as good at the other. It’s hard for me to envision a case where one person would be consistently #1 at predicting point differential but consistently #2 with wins, unless you purposefully did different things to predict the two of them.

Alex/Evan: I was going to ask you two guys what you thought of the idea of predicting 5-man units rather than teams. Seems like it offers a lot of advantages, although also some potential sample size problems. But then today someone named “mystic” over at APBRmetrics posted exactly that kind of analysis. Here is his spreadsheet: http://bbmetrics.files.wordpress.com/2011/02/lineup-check_2007-2011.xls.

He measured the correlations for four metrics, first against net points for the unit and then the unit’s APM, using the top 50 lineups for each of the past 4 seasons. His results (PRA is his metric):

For unadjusted Net:

1. WS/48 with 0.696

2. PRA with 0.667

3. WP48 with 0.593

4. PER with 0.577

For APM:

1. WS/48 with 0.674

2. PRA with 0.656

3. WP48 with 0.568

4. PER with 0.546

I’ll save Alex the trouble of pointing out these are all “small” R^2s. But that’s to be expected with much smaller samples. However, I think it’s still a valid test unless we think the correlations are biased in some way. That is, small samples will create noise, but I can’t see why that noise should help (or hurt) Win Shares more than Wins Produced.

Thoughts?

I gave mystic some data. Interested to see…

Yeah, I saw the post. I’m not sold on fitting to +/-, but no strong objections to people seeing what happens when you do.
