## More on R Squared

After some of the discussion about R squared and sample size in my other post and its comments, I thought I should take a better look at it.  I’m glad I did, because I was reminded of something I should have known but didn’t get 100% right (I should say I haven’t checked my comments since this afternoon, so apologies if someone already caught this).  The question at hand is how does R squared react with increasing sample size.

First, a quick reminder about R squared.  In a simple regression (i.e., one predictor variable and one variable to be predicted), R squared is literally the correlation coefficient squared.  In multiple regression, R squared is the squared correlation of the actual dependent variable values (the thing being predicted) and the predicted values from the regression equation.  In either case, R squared tells you what proportion of the variance in the dependent variable your independent variable(s) explain.  Thus it can range from 0 to 1; you can explain nothing about the variable, or everything, or any proportion in between.
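Both definitions are easy to check numerically. The sketch below is only an illustration: the data are made up, and the seed, coefficients, and the helper name `r_squared` are my own assumptions. It fits an ordinary least squares regression with NumPy and compares the usual "1 minus residual variance over total variance" R squared with the squared correlation.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility


def r_squared(X, y):
    """R squared from an ordinary least squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    predicted = A @ coef
    ss_res = np.sum((y - predicted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot, predicted


# Simple regression: one predictor (made-up data).
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)
r2_simple, _ = r_squared(x, y)
corr_xy = np.corrcoef(x, y)[0, 1]

# Multiple regression: two predictors (made-up data).
X = rng.normal(size=(200, 2))
y2 = X @ np.array([0.5, -0.3]) + rng.normal(size=200)
r2_multi, predicted = r_squared(X, y2)
corr_actual_pred = np.corrcoef(y2, predicted)[0, 1]

print(r2_simple, corr_xy ** 2)            # the two numbers agree
print(r2_multi, corr_actual_pred ** 2)    # the two numbers agree
```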

So, how does sample size influence R squared?  What I should have remembered, and the graphs below will illustrate, is that a larger sample size gives you a better estimate of the ‘real world’ correlation.  Correlations are, after all, subject to random error, and a larger sample reduces that error and makes your estimate move closer to the true value.  My simulation worked like this.  I looked at sample sizes ranging from 2 to 900, 30 of them in total.  For each sample size, I created a data sample that contained two variables with a correlation of .75.  I ran a regression on that sample predicting one variable from the other and collected the R squared.  I created 10 samples for each sample size, so in the end I have 300 samples along with their sample size and R squared.  If I make a scatterplot of those two things, I get the graph below.

The line on the graph is set at the ‘true’ R squared, which is .75^2.  What you can see is that with small sample sizes the R squared is very noisy.  The reason I started with sample size=2 is that you have to have at least two points to calculate a correlation, but with only two you will always have a perfect relationship (two points define a line).  With random noise it’s possible the correlation could be -1 or +1, but either way the R squared will always be 1.  As the sample size increases from 2 to about 100, you notice that the band of dots narrows, sort of like a funnel on its side.  This is because as sample size increases, you’re reducing random noise and getting a better indication of the true relationship between the two variables.  From about 200 out to 900, the band is roughly the same size.  There’s still variability at each sample size value, but it’s due to noise in the sample (i.e., the random samples are drawn independently and may have values a little different from the .75 correlation I specified).
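For anyone who wants to reproduce the funnel, here is a minimal Python sketch of the procedure described above. It is not the code behind the graph. The bivariate normal distribution, the even spacing of the 30 sample sizes, the seed, and the function names are all my assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
TRUE_R = 0.75


def correlated_sample(n, r, rng):
    """Draw n (x, y) pairs from a bivariate normal with population correlation r."""
    cov = [[1.0, r], [r, 1.0]]
    data = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    return data[:, 0], data[:, 1]


def simple_r_squared(x, y):
    """R squared of a one-predictor regression is the squared correlation."""
    return np.corrcoef(x, y)[0, 1] ** 2


# 30 sample sizes from 2 to 900 (even spacing is an assumption), 10 samples each.
sample_sizes = np.unique(np.linspace(2, 900, 30).astype(int))
results = [(n, simple_r_squared(*correlated_sample(n, TRUE_R, rng)))
           for n in sample_sizes for _ in range(10)]

sizes = np.array([n for n, _ in results])
r2 = np.array([value for _, value in results])

# The funnel: R squared is far more spread out at small n than at large n.
spread_small = r2[sizes <= 100].std()
spread_large = r2[sizes >= 500].std()
print(spread_small, spread_large)
```

Plotting `sizes` against `r2` with a horizontal line at `TRUE_R ** 2` should give a picture with the same funnel shape, though the individual points will differ from run to run.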

That exercise isn’t exactly like what happens if you look at wins and salary in the NBA, which is what prompted the discussion.  In that case you’re more likely to look at overlapping samples than completely independent samples.  The simulation above, for example, might be appropriate if you looked at the relationship for the years 2008-2010 versus only 1995; you’ll get a better estimate for 2008-2010 because there’s a larger sample, but those samples are also independent.  I think it’s more likely that people would look at the previous season (say 2010) or the past few seasons (say 2008-2010).  In that case the samples aren’t independent because they both include some of the same data.  To simulate this kind of analysis, I created one data set with 900 points (again, two variables with a correlation of .75).  It turns out that the particular sample I got had a correlation more like .76.  Then I ran a regression predicting one variable from the other using different sample sizes from those 900 observations.  I started with the first two observations, then the first three, first four, etc., up to 900.  I again collected the R squared for each model and plotted it against the amount of data (the sample size) used in the regression.

The line on the graph is again the true value, this time the sample correlation instead of the one specified in creating the sample.  You can see that again with two observations the R squared is 1, as it must be, but then it drops dramatically and quickly heads up towards .58 (which is the .76 sample correlation squared).  So the first 50 observations or so were fairly noisy, but with the full sample of those 50 observations we get a decent estimate of the true relationship.  But then the R squared drops again, before coming back up to .58 and more or less staying there from sample size 250-ish on.  So after the first 50 observations, it turned out that the next 200 or so muddied the water (were noisy) enough to give us bad estimates.  But with great sample size comes great estimates, and the regression is spot-on (in this particular sample) with a sample size of around 700.
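The overlapping-sample version can be sketched the same way. Again this is an illustration under my own assumptions (bivariate normal data, an arbitrary seed), so the path it traces will wander differently from the one in the graph, but it starts at 1 and ends at the full-sample value by construction.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
cov = [[1.0, 0.75], [0.75, 1.0]]

# One data set of 900 points; every regression reuses its first k rows.
data = rng.multivariate_normal([0.0, 0.0], cov, size=900)
x, y = data[:, 0], data[:, 1]

# The reference line: R squared of the full 900-point sample.
full_r2 = np.corrcoef(x, y)[0, 1] ** 2

# R squared from the first k observations, for k = 2, 3, ..., 900.
ks = np.arange(2, 901)
running_r2 = np.array([np.corrcoef(x[:k], y[:k])[0, 1] ** 2 for k in ks])

print(running_r2[0], running_r2[-1], full_r2)
```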

I should note that I’m not saying that you need 200 or 700 observations to get an accurate R squared; that’s just what happened in these simulations.  But in either case (overlapping or independent samples), more is obviously better, and increasing your sample size will lead you asymptotically to the true value.  This must be the case; if you had the ability to measure the entire population you were interested in without error, you would have to find its true R squared (and correlation, mean, variance, etc.).

Returning to the NBA salary issue, I made a mistake in my response to Phil.  I agreed with him that a larger sample should lead to a larger R squared.  That will only be true if the sample this year happened to give you a value below the true one.  So in Phil’s old post, if the value was .256, we have to assume that’s the true value.  Fortunately there’s more than one season of data available, so we can improve that estimate but there’s no particular reason to assume it will go up.

Alternatively, there could be a reason to think the R squared value will increase.  I (and others, including Phil) have argued that salary is really standing in for other variables, like team quality.  To the extent that players are paid appropriately for their ability, salary will serve as an indicator of team quality, and thus wins.  The low R squared we actually see indicates that players are not being paid completely appropriately.  But, if player evaluation is getting better with time, the connection between salary and wins should increase, and thus the R squared will increase.  So if, for example, it turned out that the wins-salary R squared from 2005-2010 (or going forward into the future) is higher than from 2000-2005, that might be an indication that player evaluation is getting better across the league.


### 47 Responses to More on R Squared

1. EvanZ says:

Basically, what you are saying here is that variance decreases with sample size. In other words…what Phil said a couple days ago (or years ago). Or what anyone reading a first year stats book would have known.

“I agreed with him (Phil) that a larger sample should lead to a larger R squared.”

Given that, in your other post, you just commented (after this post was written)…

“My post from last night (this one), I think, makes it clear that Phil’s idea about R squared and sample size at least is wrong. I also believe him to be wrong about how to interpret R squared and coefficient significance tests. And, as I said in that post, we all (including Phil) think that salary is a stand-in. I’ve only ever seen him say it in comments, not in his posts themselves.”

How can Phil be wrong, when in this very post you are illustrating what he said in the first place? Do you even realize what you’re saying? It would be funny, if you weren’t at the same time being so derogatory toward Phil.

You need to take a 30,000 ft look at all this, and not get so caught up in your numerical wizardry. Honestly, I think it’s starting to melt your brain.

• Alex says:

I’m just going to quote from Phil’s comments on my post. “I guarantee you that if you use the number of wins in two seasons (164 games) instead of just one, you’ll get an r-squared higher than .256.” “If you use one game, you get close to .000. If you use 82 games, you get .256. If you use between 0 and 82 games, you’ll get something between .000 and .256. And if you use 164 games, you get something higher than .256.” “However, if you combine 2, or 5, or 10, or 20 seasons, the r-squared would go up and up and up.”

How should I interpret those other than to think that Phil thinks sample size runs R squared up to 1? Or, as he clarified in a later comment, a functional max of .67 in this case? If he said something more specifically about decreased variance or increased accuracy of the estimate, just let me know where and I will be happy to apologize.

I’ll say again, as I did in my post, that my initial reply to Phil in the comments was incorrect. I should not have agreed that the R squared would increase.

2. Evan,
No. What he’s actually saying is that as you increase sample size, sample correlation will approach true population correlation. This is the Law of large numbers.

So Phil’s statement is incorrect. A small sample might give you an indication of the population but it might be higher or lower. A large sample should give you a clear indication.

• EvanZ says:

Now all we’re doing is arguing about what Phil is saying. It’s probably best at this point to see if Phil wants to answer for himself, so I’ll keep my mouth shut.

3. Guy says:

Alex/Arturo: What Phil was describing is what Alex’s 2nd graph clearly illustrates. As sample size increases, the measured correlation rises toward the true correlation you would find with an infinite sample size. The average win:salary correlation for samples of 41 games will be smaller than for samples of 82 games. And if we forced teams to play a 162 game season, the correlation would be still higher. Arturo is incorrect in saying a smaller sample could give you a higher or lower correlation. (Any single sample could, of course, but take a bunch of 41-game samples and a bunch of 82-game samples, and the average correlation will be higher in the larger samples.)

Alex’s first scatterplot is irrelevant to this discussion, because he forces the correlation to be the same at each sample size. Phil’s whole point is that the correlation will be different depending on the sample size.

As the old saying goes, fellas, when you’re in a hole, stop digging.

4. Phil Birnbaum says:

We are talking about two different things here.

You are talking about the number of datapoints in the regression. I agree with you on your point here, that the more datapoints, the closer you get to the true r-squared.

What I am talking about, however, is sample size in terms of number of games that goes into each data point. Completely different thing.

You are saying, “if you have 2,000 data points, where each point is an 82-game season, you get close to the true r-squared.” I agree.

But what I am saying is, “if you have 2,000 data points, where each point is a 41-game season, the true r-squared is larger than if you have 2,000 data points, where each point is a 164-game season.”

I’ll give you an example in a separate comment later.

• Alex says:

What is each data point in this example? If I have 2000 R squared values, each based on 41 games? Or if I have 2000 win/salary pairs, each based on 41 games?

• Phil Birnbaum says:

2000 win/salary pairs. Just like in the Berri regression that we’re talking about.

• Alex says:

Wait, before you said that if you combine seasons the R squared would go up and up and up, and here it sounds like you’re saying the smaller sample (41 games versus 164) would have the larger R squared. Typo?

• Phil Birnbaum says:

• Alex says:

So if I look at the correlation for wins and salary last year (Cleveland, for example, has 61 wins), I get .465. If I look at the correlation for combined wins and salary over the past two years (Cleveland has 127 wins), I get .433. Those are samples with the same number of data points (30) but different sample sizes going into them, 82 games versus 164. If I do the two years prior to last year so that it’s still 164 games but not overlapping with last year’s data (Cleveland has 111 wins), I get a correlation of .317, which is even farther away from .465. So it doesn’t look to me like combining seasons increases the correlation (or equivalently the R squared).

5. Phil Birnbaum says:

Suppose teams have two choices. They can spend \$1 million on salaries, and be expected to go .366 [30-52] (subject to random binomial variation), or they can spend \$2 million on salaries, and be expected to go .634 [52-30] (subject to random binomial variation).

We would agree, wouldn’t we, that salary has a strong effect on wins (by the intermediate effect of buying better players, if you want to say that explicitly)?

We wouldn’t need to see an r-squared before concluding that \$1 million corresponds to an increase in winning percentage of .268, would we?

I’ll wait for your confirmation before continuing with this example.

• Alex says:

I think we’re heading down a bad road here. A city could choose to hire 5 police officers and have a rate of 5 crimes per 100,000 people, or they could choose to hire 10 police officers and have a rate of 10 crimes per 100,000. Would you agree that having more police officers has a strong positive effect on there being more crime?

6. Phil Birnbaum says:

I don’t think this thread is going to go anywhere. I’m trying to argue about the effect and the r-squared, and you seem to be trying to change the subject to correlation vs. causation, which I’ve already told you I agree with you about.

• Phil Birnbaum says:

Wait, maybe I misunderstood where you’re coming from. When I say,

“Suppose teams have two choices. They can spend \$1 million on salaries, and be expected to go .366 [30-52] (subject to random binomial variation), or they can spend \$2 million on salaries, and be expected to go .634 [52-30] (subject to random binomial variation).”

I am not saying these are estimates, which is what you seem to be assuming. I am asking you to hypothesize a league where this is TRUE. Where God has come down and told you that this is the rate at which dollars buy players which create wins.

This is not too unrealistic, and not too hard to imagine, right?

So, I ask you again: assuming this is TRUE, we would agree, wouldn’t we, that salary has a strong effect on wins?

• Alex says:

Hard to argue with God. Sure, go ahead.

7. Phil Birnbaum says:

Okay. Here’s the argument. We both agree that if \$1 million in annual salary buys a .268 increase in winning percentage, that’s salary having a strong effect on wins.

Now, let’s take ten teams. Five (A-E) spend \$1MM; five (V-Z) spend \$2MM. They have a one-game season. The ten season results will most likely look something like this (payroll, WPCT):

\$1MM 1.000
\$1MM 1.000
\$1MM .000
\$1MM .000
\$1MM .000
\$2MM 1.000
\$2MM 1.000
\$2MM 1.000
\$2MM .000
\$2MM .000

That is: the \$1MM teams have a combined .400 record, and the \$2MM teams have a .600 record. Close enough.

The r-squared: .04.

Which means:

— We are in agreement that salary has a strong effect on wins.
— The r-squared between salary and wins is 0.04.

Therefore, the statement “The NBA r-squared between salary and wins is only 0.25” does NOT entitle you to conclude “therefore salary does not have a strong effect on wins.”

Now, this is only one game, I know, but I used an extreme example to make the point.

Let’s try 82 games. Over 82 games, the SD of winning percentage is about .05. So the ten teams might look vaguely like this:

\$1MM .366
\$1MM .391 (+0.5 SD)
\$1MM .341 (-0.5 SD)
\$1MM .441 (+1.5 SD)
\$1MM .291 (-1.5 SD)
\$2MM .634
\$2MM .659 (+0.5 SD)
\$2MM .609 (-0.5 SD)
\$2MM .709 (+1.5 SD)
\$2MM .559 (-1.5 SD)

Now the r-squared is .88. In real life, it won’t be so neat — it is random, after all. But, on average, you’ll get something in that area.

If you use something in between 1 game and 82, you’ll get something in between .04 and .88. If you use more than 82 games, you’ll get something above .88.

(These are all subject to confidence intervals for the r-squareds and beta-hats, of course, but they’ll average to something around .04 and .88. And the actual values depend on how the random variables work out. If they turn out to be .08 and .73, or whatever, the point is the same.)

So:

For the exact same strong relationship between salary and wins, and the exact same number of datapoints (10),

— sometimes you get a high r-squared, and
— sometimes you get a low r-squared.

That’s my point. For this particular question, it’s the regression equation — \$1 million equals 0.268 WPCT — that shows the strength of the relationship, not the r-squared.
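The two r-squareds in this example can be checked directly from the listed records. The sketch below is a quick verification in Python with NumPy. The arrays are the ten records as given above, and the helper names are mine.

```python
import numpy as np

payroll = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2])  # in $MM

# One-game season: five $1MM teams go 2-3, five $2MM teams go 3-2.
one_game = np.array([1, 1, 0, 0, 0, 1, 1, 1, 0, 0], dtype=float)

# 82-game season: the winning percentages listed in the example.
full_season = np.array([.366, .391, .341, .441, .291,
                        .634, .659, .609, .709, .559])


def r_squared(x, y):
    """Squared correlation, which equals R squared for one predictor."""
    return np.corrcoef(x, y)[0, 1] ** 2


def slope(x, y):
    """Regression coefficient: change in WPCT per extra $1MM of payroll."""
    return np.polyfit(x, y, 1)[0]


print(round(r_squared(payroll, one_game), 2))     # 0.04
print(round(r_squared(payroll, full_season), 2))  # 0.88
print(round(slope(payroll, one_game), 3))         # 0.2
print(round(slope(payroll, full_season), 3))      # 0.268
```

The r-squared moves from .04 to .88 while the fitted slope stays in the same neighborhood (.200 for the one-game sample, which came out .400/.600, and .268 for the 82-game sample).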

8. Phil Birnbaum says:

That’s interesting, and unexpected.

It could be one of three things:

1. Just random luck making things go the opposite way than usual for the one season.
2. The effect of compression in salary range, if teams’ regression to the salary mean from season to season is high enough.
3. I’m wrong.

I suspect a combination of #1 and #2, but I will think about it.

However, try this alternative: try last year, but only the first 41 games. Even better, if you have the data, try the first 10 games. Then the first 20. Then the first 30 … all the way up to 82 games (which is .465^2). I bet you will generally see a steady (but perhaps not perfect) increase from a very low r-squared for “first 10 games vs payroll” to .465^2 at the end, for “first 82 games vs. payroll”.

• Alex says:

I got wins through the first 41 games for both last year and the previous year, and both are individually larger than the full season correlations. 2009 full year: .465, 2009 first half: .564. 2008 full year: .202, 2008 first half: .218.

• Phil Birnbaum says:

Wow. Not what I expected. I will try to find time to investigate and see if I can figure out what’s going on.

• EvanZ says:

I’m not sure I would split the season into two like that. Playoff bound teams tend to rest players more and could skew the results.

Why not try Phil’s original idea of interleaving data (take every other game or every fourth game) or simply randomly selecting games from the season? This would help correct for the issues I mentioned.

• Alex says:

All I have is full season results, and a quick snatch-and-grab that I entered manually for the past two seasons (first half is easy since you can search bball reference for a range by game number, like 1 to 41). I don’t have the time to get the data together for what you suggest. If someone does have the data to give me I could run it, or I’d be interested to see the numbers too if someone else can run it.

Although either way, resting players shouldn’t affect the combined seasons approach, and if you sum together the wins and payroll for every team for the past nine years the R squared is only .05. I didn’t look at every number in between (2009, 2008-2009, 2007-2009, etc), but it looks like the R squared in this case actually goes down and down and down with more data.

• Phil Birnbaum says:

OK, here’s what I did.

I looked at the 2003-04 NBA (because I had the data already typed in for some reason). The r-squared of salary vs. wins is .1238.

Then, I created a 10-game season by picking 10 random numbers, and using the actual winning percentage of each team. (Teams were independent: I didn’t make one team’s win another team’s loss. That shouldn’t matter much.)

I then found the r-squared for that 10 game season. I reran that 10 times.

The average of the 10 r-squareds was .0715. That’s substantially less than the .1238 for the full season. 9 of the 10 seasons were less than .1238. The result is statistically significant.

I repeated this for 41-game seasons. This time, the average r-squared was slightly *higher* than the original, at .1301. Five of the 10 r-squareds were higher, and half were lower. The result is not even close to statistically significantly higher.

I bet if I ran the 41-game case for 50 or 100 samples, the average r-squared would be lower and statistically significant. Since the 10-game sample has a lower r-squared, it makes sense that the 41-game sample should, too.

Alex, do you agree, or do you want me to run a bigger sample? Or do you want to do it? I guess I should do it just for completeness. It’s a pain in the butt because I have to use one program to generate the sample, and then another one to run the regression.

I’ll run a 400-game sample too. That should easily give us a higher r-squared.

So, Alex, I think your r-squareds came out similar just by random chance. One or two seasons isn’t enough to guarantee a lower or higher r-squared in real life, even though the EXPECTED r-squared is indeed different.

• Phil Birnbaum says:

Oops. 2002-03, not 2003-04.

• Phil Birnbaum says:

Another oops: I am dumb. Going to 820 games with the method I used (as opposed to real life) won’t improve the r-squared. That’s because my method, by taking the actual W-L records as an unbiased estimate of talent, effectively locks in the binomial error in the original data, instead of reducing it.

I can explain that further later if anyone cares, but I’m having company over in five minutes.

• Phil Birnbaum says:

Oh, possible reason #4:

4. In the NBA, the better team wins the game much more often than in baseball or hockey (not sure about football). That means that there is less binomial variance, because the approximation “if a team is .600, assume it has a 60% chance of winning each individual game” is more false than usual. If there is less binomial (random) variance, that means there is less reduction in variance when you average out multiple seasons.

When I have a chance, I’ll run the same two-season test for baseball. I bet it’ll be different. But don’t trust me until you see it. 🙂

9. Phil Birnbaum says:

BTW, the comment threading on this WordPress template is confusing, and some of my comments are in the wrong place. Just to emphasize, my comment of Jan. 9 at 2:47 still stands. That’s this one:

https://sportskeptic.wordpress.com/2011/01/07/more-on-r-squared/#comment-259

That comment is my theoretical (but non-rigorous) explanation of why the r-squared changes with the number of games.

10. So to parse all that:
There’s a positively sloped correlation between wins and salary but it’s a weak one. So the NBA salary marketplace is inefficient (much like baseball’s used to be). Sounds familiar.

• Guy says:

Arturo: If the NBA salary marketplace is so inefficient, why does your own data show a correlation between salary and player Wins Produced of .73? When you consider the artificially low salaries for younger players, and injuries no one can predict, that seems like an extremely strong relationship.

And you might have found time to at least acknowledge that Phil just showed you were wrong when you said above that “a small sample might give you an indication of the [correlation] but it might be higher or lower.” Sheesh…..

11. Oh and by the way if you want a more realistic looking correlation over the last ten years: Take out the Knicks as an outlier.

12. Phil Birnbaum says:

OK, here’s what I think is really going on.

You can decompose the variance of wins into three (mostly) independent components:

1. Talent due to salary;
2. Talent of players above and beyond salary (e.g. players paid more than they’re worth, players paid less than they’re worth); and
3. Binomial randomness in game outcomes.

From the regression equation of the 02-03 data, we get #1 is an SD due to salary of about 4.2 games (variance 17). [BTW, the regression came up with 1 extra win per each extra \$3.3 million spent.]

From the binomial approximation to normal, we get that the binomial SD #3 is about 4.5 games (variance 21).

What about #2? Alex reports that the correlation last year between salary and wins was .465. By trial and error, I determined that a “talent beyond salary” SD of about 6 games (variance 35) led to a simulated value of around .465.

Summary so far:

Variance due to salary: 17
Variance due to additional talent: 35
Variance due to binomial error: 21
TOTAL: 73

So, the r-squared for a season due to salary should be 17/73, which is about .23. The square root of that, r, is about .48, which is similar to the .465 that Alex found for last year.

If you increase the number of games from 82 to infinity, the “variance due to binomial” per 82 games drops to zero. That’s because, if you play an infinite number of games, the law of large numbers applies and the team will play exactly to its talent.

That reduces the total variance from 73 to 52. So the theoretical maximum r-squared you can get from this simulation is now 17/52, which is .33 (or an r of .57).

And that’s what happens in my simulation. Here are the r’s for various season sizes, over at least 1,000 simulated seasons:

0.47: 82 games per season
0.42: 41 games per season
0.27: 10 games per season
0.10: 1 game per season

0.51: 164 games per season
0.55: 1640 games per season

No surprise: this is exactly as the model predicted.

And this is what I mean when I say, the r and r-squared depend on the number of games in the season.
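The decomposition above also gives a closed-form prediction for each season length, which can be compared with the simulated r’s. This is a sketch under stated assumptions, not the simulation itself: it uses the rounded variances 17, 35, and 21 from the summary, and it assumes that binomial variance, expressed per 82 games, shrinks in proportion to 82 divided by the number of games played. The function name is mine.

```python
import math

VAR_SALARY = 17.0         # talent explained by salary
VAR_OTHER_TALENT = 35.0   # talent above and beyond salary
VAR_BINOMIAL_82 = 21.0    # binomial luck over an 82-game season


def expected_r(games):
    """Predicted salary-wins correlation for a season of `games` games."""
    # Assumption: luck variance per 82 games scales as 82 / games.
    var_luck = VAR_BINOMIAL_82 * 82 / games
    r2 = VAR_SALARY / (VAR_SALARY + VAR_OTHER_TALENT + var_luck)
    return math.sqrt(r2)


for games in (1, 10, 41, 82, 164, 1640):
    print(games, round(expected_r(games), 2))
```

With the rounded inputs the predictions land within about .02 of the simulated values listed above, and they level off at sqrt(17/52), about .57, as the number of games grows.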

• EvanZ says:

Thanks, Phil. That was illuminating.

• Alex says:

I generally agree (I’m sending you an email now). If I could make one small correction, I would say ‘salary due to true talent’, not the other way around. Wins come from talent, and talent can be connected to accurate allocation of salary as well as be unconnected to salary (due to incorrect evaluation). These are still your #1 and #2, but I would portray them a little differently.

13. Guy says:

Alex: Phil’s nice breakdown here also provides you with a helpful way to think about the issue of y-t-y correlation of player statistics we were talking about in another thread. The correlation we expect for any statistic — if there is no change in player ability at all — is a function of sample size and the underlying variance in ability to generate the statistic. If we take Phil’s example above:
Variance due to player talent: 52
Variance due to binomial error: 21
we would expect team wins in year 1 and year 2 to have an r of .84 (sqrt(52/73)), if teams’ true ability were entirely unchanged. However, if the SD of true team win% were just .050, instead of the .088 Phil found (i.e. the differences in team strength were much narrower), then the y-t-y correlation would fall to just .67. Alternatively, if we compare a team’s first-half and second-half record, we would expect an r of just .75. So you can see that the y-t-y correlation of a given statistic will depend both upon sample size AND the variance, even before we get to actual changes in ability that may occur.

• Alex says:

Fair enough, but none of the year-to-year correlations vary sample size. They’re always based on a full season of data. I suppose it could be the case that players have different sample sizes in terms of playing time or opportunities (e.g. players on teams that miss a lot of shots have a larger ‘sample size’ for rebounding compared to players on teams that don’t miss as many shots), but that applies equally to all the statistics (rebounds, assists, etc).

• Guy says:

“Fair enough, but none of the year-to-year correlations vary sample size.”

Sometimes they do. The average player takes 12 FGA2 per 48 minutes, but only 5 FTA and 4 FGA3. And he has 82 rebounding opportunities (arguably — that’s complicated). Estimating opportunities for turnovers is very complex, but clearly is a function of how often a player handles the ball. So I would say sample size varies.

But I would agree that the more important factor is the underlying variance. For rebounds, it’s huge. So if Marcus Camby gets 300 rebounds more than the average center, but takes 150 of those from teammates (just for illustration), he’s only taking about 20 rebounds each from 7 different guys. That will have very little effect on correlation for rebounds, but I’m sure you would agree it’s a very large diminishing return rate. I’m not saying this is evidence of DR, only that it’s entirely possible to have a high DR rate AND a high y-t-y correlation, if the underlying variance is high.

• Alex says:

I take it that, like Phil, you’re unimpressed that WP48 has a correlation of .98 with a version of WP48 where players have half of their rebounds taken away and given to their teammates?

Also, do you mean that a statistic could not have diminishing returns if it has a low variance? Or are you saying that the correlation and DR are independent?

14. Guy says:

Alex: I’m not saying that DR and correlation are completely independent. High DR should, all other things equal, mean a lower y-t-y correlation. But a high variance can produce an r that looks “high” even in the presence of DR, especially if players don’t change teams a lot. Plus you have the issue that even players who change teams will often be asked to play a similar rebounding role as in the past. (You don’t trade for Camby and tell him to stop rebounding.) I assume teams DO try to have the best rebounders take on that responsibility. So I don’t doubt at all that high-Reb48 players are in general better rebounders than low-reb48 guys — it’s just that the raw difference in Reb48 greatly overstates the talent difference.

On Berri’s .98 WP48 correlation after reducing the value of dreb by .5, yes, I’m underwhelmed. He’s made the claim for years that changing coefficients makes “little difference,” and it’s deeply misleading for two reasons. First, it DOES make a difference. Look at his table. Marcus Camby loses four wins in revised WP. That’s about \$7 million in value in the NBA. And that’s what you get from a partial recognition of DR on defensive rebounds only. If you used Hollinger’s coefficients (or Evan Z’s), some of these big rebounders would have their WP reduced by 7 or 8 wins. That difference takes a team from .500 to .600. I can’t imagine you agree with Berri’s claim that this “doesn’t matter.”

Second, the high correlation tells us mainly that the ranking remains similar, while ignoring the magnitude of player differences (neat trick by Berri, don’t you think?). But that only tells us that one of two things is true: either rebounds play a very small role in determining WP48 (so changing the coefficient doesn’t matter), OR rebounds play a very large role in determining WP48 (so rankings remain similar even though the coefficient changes). If rebounds explain a lot of WP48, and scoring plays a small role, then reducing the value of rebounds can’t change the rankings very much; the top players will still mainly be high Reb48 players. Think about it this way: if the final exam counts for 80% of your grade and the mid-term 20%, and the professor decides to reduce all the final exam scores by 50%, the final exam will still account for 67% of your final grade (40 of the remaining 60 points). For “final exam,” substitute “rebounds.” So the lack of change in the rankings with new coefficients is just as much evidence that WP48 is too dominated by rebounds as it is evidence on the other side.

Also, isn’t this a very weird “defense” of WP? Berri’s complete argument is now “Rebounding is an extremely important skill. My rebound coefficients are right, and metrics with different coefficients are badly flawed. But by the way, you’ll basically get the same results no matter what coefficients you use.” How is this a coherent position?

Last thought: Reb48 is highly correlated with eFG%opp at the team level. So WP is effectively assigning credit for good defense to the guys who grab the drebs, even though there is no evidence they deserve that credit. (If you have two guys with equal dreb%, the one on the better defensive team will get more drebs.) That’s why it is so bizarre that Berri brags that his team defensive adjustment hardly changes player WP48 at all. Teams vary a lot in forcing missed shots by opponents — this determines about 25% of wins in the NBA — so a team defensive adjustment SHOULD impact player ratings. The reason it makes relatively little difference in WP is that WP has already captured much of this info through the back door of drebs — but allocated credit for it to the rebounders.

• Alex says:

I disagree about how much the rebound adjustment matters. If the correlation is that high, it means the relative ordering is very much intact. If I’m choosing between Camby and Dwight, I still take Dwight. It isn’t perfect, but it’s very close. It also does account for the magnitude of differences. If you wanted to ignore them, you could use the player rankings instead of their WP48’s and calculate the Spearman correlation. And as I’ve asked Evan in the comments, of course his model (or Hollinger’s) produces different ratings. It’s a different model. The question is, what makes their ratings right?
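To illustrate the distinction between the ordinary (Pearson) correlation and the Spearman rank correlation, here is a small Python sketch. The two rating lists are invented numbers, not real WP48 values, and the helper functions are written out by hand so nothing beyond the standard library is assumed.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation: sensitive to the size of the gaps between values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Rank 1 = smallest value. Assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(x, y):
    """Spearman correlation: Pearson applied to the ranks only."""
    return pearson(ranks(x), ranks(y))

# Hypothetical ratings for five players before and after re-weighting.
# The ordering is identical, but the gaps between players change.
original = [0.350, 0.300, 0.200, 0.150, 0.100]
adjusted = [0.260, 0.250, 0.240, 0.120, 0.100]

print("Pearson: ", round(pearson(original, adjusted), 3))
print("Spearman:", round(spearman(original, adjusted), 3))
```

With these made-up numbers the Spearman value is 1 because the ordering never changes, while the Pearson value falls below 1 because the spacing between players does.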

The FAQ says that defensive rebound % only has a -.2 correlation with opponent eFG, and in my data total defensive rebounds has a correlation of about -.5 with opponent PPS or TFG%. I guess we can argue about what a ‘high’ correlation is, or which measure of rebounding/shooting is better, but either way it tells us that the majority of differences in rebounding across teams has to do with something other than how well they get teams to miss. The team defense adjustment is also calculated at the team level and then carried down to the player level, as with most of the rest of the model. So if you think the team-level model is ok (which I think you’ve said before), I don’t know why this would become a complaint. Rebounds and the stuff in team defense are present in the team model, so their relative importance is already set. You could complain separately about how they’re carried down to the player level (and you have!), but it’s a different issue.

• EvanZ says:

Alex, it’s got to make quite a big difference. How else do you explain Jason Kidd as the fourth highest rated PG according to WP, when he’s 11th in my model? He’s shooting below 0.500 TS(!), which is really, really awful. Supposedly, TS is the biggest component of WP, but Kidd’s WP48 is still at 0.240 last I checked. Don’t you think that’s too high? The only explanation I can give is that he still rebounds well for a point guard.

• Alex says:

He’s 32nd in PER; don’t you think 11 is too high? The only explanation I can give is that your model probably weights something too highly.

He might be rated highly in WP and your model because he’s 5th in the league for point guards in assists per 48 and 3rd in assist/TO ratio, so he takes care of the ball very well. He’s third-last in fouls, 13th in blocks, 7th in steals, and he would be 8th in free throw % except he doesn’t get to the line enough. So about the only thing he doesn’t do extremely well is shoot accurately or often; he doesn’t even qualify to show up in the ESPN stat page because he isn’t on pace to shoot enough.

15. Guy says:

Alex: Is it really your view that changing player valuations 30% or more makes “little difference?” The goal isn’t just a rank ordering — the NBA isn’t a fantasy snake draft. We need to know how much more to pay Dwight than Camby. We need to know if it makes sense to trade players A and B for C. We need to know, in other words, HOW MUCH better one player is than another.

You say “of course” different models produce different ratings, but Berri’s argument is that using Hollinger’s coefficients makes “little difference.” He said that, even though using those coefficients reduced Ben Wallace’s value by 22% (the player he was discussing) while increasing the value of a player like Reggie Miller by about 30%. To me, this is not a recognizable use of the English language.

And I don’t understand why you say “either way it tells us that the majority of differences in rebounding across teams has to do with something other than how well they get teams to miss.” This is not true. A 1 SD change in dreb% adds about 60 defensive rebounds, while a 1 SD decline in FG%opp adds about 75 rebounds. Both matter, but making opponents miss matters a bit more. And surely you agree that giving full credit for each dreb to the rebounder effectively credits him with a lot of the value of opponent misses.

• Alex says:

When you cherry-pick examples it sounds like a big difference, but you can look even at the short list provided in the FAQ and see that 11 of the 20 don’t change by more than 2 wins. And these are all the best players who have the most wins to lose. In fact, none of them gain in WP48, even a guy like Steve Nash. If guards don’t benefit from getting free rebounds, who does? All this adjustment does, after all, is take rebounds from big men and give them to guards.

Practically speaking, the relative value is at least more important, I would say. When a guy comes up for free agency and demands a max contract I don’t think owners decide ‘does he deserve \$120 million?’. I think they ask ‘is he as good as the other guys with max contracts?’.

As I’ve said before, and I’m sure you’re aware, R squared is defined as the proportion of the variance in Y explained by the model. So even if you find a correlation as high as .7, the R squared is only .49 and you’ve only explained half of why your variable differs across observations. If DRB% has a correlation of -.2 with opponent eFG, then getting a team to miss shots only explains 4% of why teams differ in their rebound percentage. 4%! That leaves 96% of the variance to be explained by noise and talent. So even if you thought team DRB% was 92% noise, just dumb luck, getting opponents to miss shots and your team’s ability to rebound would still be equally important.
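The conversion from a correlation to variance explained can be written as a short Python sketch. It assumes a simple one-predictor relationship, where R squared is just the correlation squared; the function and variable names are illustrative, and the 92% noise figure is the hypothetical one from the comment.

```python
def variance_explained(r):
    """R squared for a single predictor: the squared correlation."""
    return r ** 2

# The correlations mentioned in this thread.
for r in (0.7, -0.5, -0.2):
    print(f"r = {r:+.1f} -> R squared = {variance_explained(r):.2f}")
# → r = +0.7 -> R squared = 0.49
# → r = -0.5 -> R squared = 0.25
# → r = -0.2 -> R squared = 0.04

# Hypothetical split of team DRB% variance from the comment above.
explained_by_misses = variance_explained(-0.2)  # share tied to opponent misses
noise = 0.92                                    # assumed share due to luck
left_for_rebounding = 1.0 - explained_by_misses - noise
print(f"left for rebounding ability: {left_for_rebounding:.2f}")
# → left for rebounding ability: 0.04
```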

• Guy says:

Alex: Remember that the 4-win change for top rebounders — which I still submit is a very large change in both basketball and financial terms — is when Berri provides only a partial DR correction and only for drebs. If you used the Hollinger coefficients — which Berri also claims doesn’t matter much — the changes would be even larger. I don’t know how anyone can say these are little differences. Berri routinely calculates WP48 to three decimal places, but now it’s OK if these ratings are only accurate to within +/-30%? If that’s the best WP can do, you may as well rate players by points per game and save a lot of time. But I think we should set the bar higher.

You’ve gotten mixed up on the rebound issue. You’re talking about reb%, but WP uses Reb48 — very different. My point was that using reb48 will result in giving much of the credit for opponent missed shots to the player getting the rebound. That would not be true to the same extent for reb% (although it appears to be true that causing opponents to miss a lot also increases reb%). But again, WP relies on reb48, not reb%. So my argument stands.

• Alex says:

I mentioned the correlation for total rebounds as -.5; I imagine it would be fairly similar for Reb48. So you’re explaining 25% of team variability. That suggests to me at least that teams vary a lot in their talent for getting rebounds beyond their ability to make the opponent miss, which implies that players vary a lot in their talent for rebounds. Is that not fair?

I’m also curious if anyone has checked the flexibility of weights for PER or statistical +/-. Most models have a certain amount of flexibility; do you get vast changes in other rating systems if you change how they weight rebounds, or assists, or what have you?
