I’ve emphasized using R squared a fair amount on the blog, and I didn’t think it would be a big deal. But apparently it’s more controversial than I thought. So this post is about why you should always check your regression’s R squared before worrying about the significance of the variables in the regression.

First, something you might remember from your intro regression class or from the stats book you taught yourself with: in a simple regression (one variable of interest, one predictor), the significance test for the predictor is the same as the significance test for the entire regression. That makes sense; with only one predictor, if it isn’t significant then the regression shouldn’t be either, because all you have left is an intercept. It also means that anything we say about R squared and the test of the regression applies equally to R squared and the test of the predictor.

There’s a straightforward equation that connects the F value of the regression (which you compare to a critical F value to check for significance) to R squared: $F = \frac{R^2\,(N-p-1)}{p\,(1-R^2)}$, where N is the sample size and p is the number of predictors. From here on I’ll just write R for R squared, for simplicity. Since we’re talking about simple regression, p is always 1, so we can break the equation into two parts: R/(1-R) and N-2. You multiply those to get your F value and check whether it’s significant. For example, say you ran a regression with 50 data points (N=50) and an R squared of .5. You have .5/(1-.5) * (50-2) = 48. You would check a table in your book (or get the value from your stats software) and see that to be significant at the usual 5% level, F would need to be 4.04 or greater. Since we’re above that, our regression is significant, which means the predictor is significant. An R squared of .5 means X explains half of the variance in Y, or tells us about half of why people (or teams, or whatever) vary in their Y values. If Y were weight, X were height, and the R squared were .5, we would know that half of the variation in people’s weights is accounted for by differences in their heights.
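For readers who want to check the arithmetic, here is a minimal Python sketch of that formula that keeps the explanatory-power part and the sample-size part separate. The function name `f_from_r2` is just an illustrative choice, and the sketch assumes an ordinary least-squares regression with an intercept.

```python
def f_from_r2(r2: float, n: int, p: int = 1) -> float:
    """Overall regression F statistic from R squared, sample size n, and p predictors."""
    if not 0 <= r2 < 1:
        raise ValueError("R squared must be in [0, 1)")
    explanatory_part = r2 / (1 - r2)      # grows quickly as R squared approaches 1
    sample_size_part = (n - p - 1) / p    # equals N - 2 when p = 1
    return explanatory_part * sample_size_part


# The worked example above: N = 50, R squared = .5, one predictor.
print(f_from_r2(0.5, 50))  # 48.0
```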

So why break F into two parts? Because one part reflects how much explanatory power you have (the R/(1-R) part), and the other is just sample size. If R squared is very small, close to 0, then F will generally be close to 0 and you will not have a significant regression (with an exception coming up in a minute). If it’s large, you will tend to have a large F; say R squared is .9, then that half of F is .9/(1-.9) = 9, and when you multiply by 9 you don’t need much of a sample to get a significant result. But what happens if you have a lot of data? Say you’re looking at data for each team in the league since the Bobcats joined in 2004. You have six complete seasons of data for 30 teams, or 180 data points. The critical F value for a regression with N=180 and one predictor is 3.89, and the N-2 part is 178. That means (after a little algebra) you only need an R squared of about .02 or better to get a significant result. That’s a pretty low bar; your predictor only needs to explain 2% of the variance in your variable of interest. Here’s a graph that shows the R squared necessary for a significant regression at different sample sizes:

You can see that with a low sample size (N=5), you need a fairly high R squared of .77 to get a significant regression (and thus predictor). But as the sample size goes up, the R squared drops very quickly.
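Setting F equal to the critical value and solving for R squared (with p = 1) gives the smallest R squared that reaches significance: R²_min = F_crit / (F_crit + N - 2). For N = 180 that is 3.89/(3.89 + 178) ≈ .021. The Python sketch below recomputes that curve. To stay dependency-free it finds the critical F by numerically integrating the t density (with one predictor, F = t²) and bisecting; that approach and the function names are illustrative assumptions, and in practice `scipy.stats.f.ppf(0.95, 1, n - 2)` would give the same critical values.

```python
import math


def t_prob_between_0_and(x: float, df: int, steps: int = 1000) -> float:
    """P(0 < T < x) for Student's t with df degrees of freedom, by Simpson's rule."""
    const = math.exp(
        math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    )

    def pdf(t: float) -> float:
        return const * (1 + t * t / df) ** (-(df + 1) / 2)

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return total * h / 3


def critical_f(df2: int, alpha: float = 0.05) -> float:
    """Critical value of F(1, df2): the square of the two-sided critical t."""
    target = (1 - alpha) / 2  # P(0 < T < t_crit) for a two-sided test
    lo, hi = 0.0, 100.0
    for _ in range(50):  # bisection on the critical t value
        mid = (lo + hi) / 2
        if t_prob_between_0_and(mid, df2) < target:
            lo = mid
        else:
            hi = mid
    return ((lo + hi) / 2) ** 2


def min_significant_r2(n: int, alpha: float = 0.05) -> float:
    """Smallest R squared that makes a one-predictor regression on n points significant."""
    f_crit = critical_f(n - 2, alpha)
    return f_crit / (f_crit + n - 2)


for n in (5, 10, 30, 50, 100, 180, 400):
    print(f"N = {n:>3}: R^2 needed = {min_significant_r2(n):.3f}")
```

With N = 5 this gives about .77 and with N = 180 about .021, in line with the numbers above.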

So why am I emphasizing R squared? Because it tells you how much of the variance you are explaining; in other words, it tells you how good the predictions you make will be. Let’s look at three different situations: one where the data is pretty clean, one where it’s noisier, and one where it’s very noisy. I generated some random numbers to be the predictor variable (X), then created our variable of interest (Y) from the formula Y = 2*X + 5 + noise. The larger the standard deviation of the noise, the noisier the data. The graphs below show the data for each level of noise along with the line that best fits that data; the R squared and regression equation are in each title.
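A minimal Python sketch of this kind of simulation follows. The post doesn’t give the distribution of X or the three noise standard deviations, so X ~ N(0, 1) and noise SDs of 0.28, 2 and 7 are assumptions, picked only to give R squared values of roughly .98, .5 and .07; the random draws will not reproduce the post’s exact data.

```python
import random


def fit_simple_regression(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1 - sse / sst


def simulate(noise_sd, n=100, seed=0):
    """Generate Y = 2*X + 5 + noise with X ~ N(0, 1) (an assumption) and fit a line."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [2 * x + 5 + rng.gauss(0, noise_sd) for x in xs]
    return fit_simple_regression(xs, ys)


# Hypothetical noise levels for the clean, noisier, and very noisy cases.
for label, sd in (("low", 0.28), ("medium", 2.0), ("high", 7.0)):
    a, b, r2 = simulate(sd)
    print(f"{label:>6} noise: Y = {a:.2f} + {b:.2f}*X, R^2 = {r2:.3f}")
```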

You can see the effect of having noisy data; the Y values become more and more spread out around the regression line. The regression equations change a bit, but each one has a significant predictor, and each 95% confidence interval for the slope contains the true value of 2 (in the third case the high noise actually drove the mean of Y that far below 5). Besides just looking at the spread of the data around the regression line, we can quantify how good each model’s predictions are. The typical comparison is between the model you fit (the regression’s predicted values) and a model that uses only the intercept; that is, you predict the mean of Y every time. For each data point you can then find the residual, which is how wrong the prediction is. For example, in the low noise data the lowest X value is -2.795. The actual Y value that goes with that X is -.507. The regression predicts -.631, so the residual is .124. Using just the intercept, we would guess the mean of Y, which is 4.63, obviously much farther from the correct value. If we square all the residuals, add them up, and divide by 100 (the number of data points) to get the mean squared error, we get .0767 for the low noise regression; it’s usually pretty accurate. The mean squared error for the intercept-only model is 3.86, so we’ve improved our predictions (by this measure, at least) by about 4900% by using X to guess Y. In the medium noise condition, our predictions improve by about 110%. In the high noise condition, however, our predictions only improve by about 7.5%. As the noise increases, we gain less and less by using X to make predictions compared to simply guessing the average of Y. The closer those two sets of predictions are, the less useful X is. But remember, in each case the coefficient for X was appropriate (not significantly different from the true value of 2) and significant (in the high noise case it had a p value of .0078 even though there were only 100 data points).
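These percentage improvements are tied directly to R squared. Dividing both models’ squared errors by the same N leaves their ratio unchanged, so the improvement is MSE_mean/MSE_model - 1 = SST/SSE - 1 = R/(1-R) (with R standing for R squared as before), which is the same explanatory-power part that appears in the F formula. A short Python check using the numbers quoted above; the function names are illustrative.

```python
def improvement_over_mean(mse_mean: float, mse_model: float) -> float:
    """How much smaller the regression's MSE is than guessing mean(Y) every time."""
    return mse_mean / mse_model - 1


def r2_from_mse(mse_mean: float, mse_model: float) -> float:
    """R^2 = 1 - SSE/SST; dividing both sums by the same N leaves the ratio unchanged."""
    return 1 - mse_model / mse_mean


def r2_from_improvement(improvement: float) -> float:
    """Invert improvement = R^2 / (1 - R^2)."""
    return improvement / (1 + improvement)


# Low-noise case: MSE 3.86 for the mean, 0.0767 for the regression.
print(round(improvement_over_mean(3.86, 0.0767), 2))  # 49.33, i.e. about 4900%
print(round(r2_from_mse(3.86, 0.0767), 3))            # 0.98
# The medium (110%) and high (7.5%) noise improvements, translated back to R^2.
print(round(r2_from_improvement(1.10), 2))            # 0.52
print(round(r2_from_improvement(0.075), 3))           # 0.07
```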

There are two things you should consider when you evaluate an analysis you ran. One is whether you got statistically significant results. As I noted above, this becomes more and more likely simply by having more data. The smallest data set I can envision in sports research would be across teams, where you would only have 30 data points. But if you look across players in a season you’ll likely have at least 400 data points (depending on any minute cut-offs you might use); if you look across line-ups that play together in a season you’ll have at least 200 (again depending on cut-offs). The sample sizes are usually fairly big, and the significance thresholds are therefore relatively low.

The other thing you need to evaluate is how much your variable matters. Do you really know much more about Y if you know what X is? I showed above that noise in Y can lower the usefulness of X, but usefulness also depends on how strongly X and Y are connected in the first place. If you’re trying to predict people’s weights, it would be much better to use X=their height than X=their salary. The R squared tells you how well-connected X is to Y. If the R squared is very low, meaning that X is barely connected to Y, it’s a sign that using X to predict Y simply isn’t buying you much; you might as well use the mean of Y for every guess. Unlike significance tests, there aren’t thresholds for R squared to tell you if your value is ‘good enough’. In physics, which has precise measurements in controlled environments, R squared values are typically over .9 and sometimes virtually 1. In psychology or sports, where the environment is noisy even when measurements are precise, and other influences can affect how X and Y are connected, R squared values of .5 or .3 might be good. But R squared has a fixed range, from 0 to 1 (it’s a proportion; in percentage terms you can explain anywhere from 0% to 100% of the variance in Y). And I think everyone can agree that there simply isn’t a lot going on if you can only explain, say, 5% of Y. If that’s the case and you have a significant predictor, especially if you have a good amount of data, you have to temper your enthusiasm about any conclusions you want to draw.

Good post. This has no comments. I don’t know why. I like this blog. Always good work.

Thanks. I think it’s because people aren’t nearly so willing to read things if they aren’t directly related to sports or inflammatory. I should have titled it “Eli W is an idiot” (which I don’t think is true) and badmouthed the research on usage and efficiency. Then everyone would have skimmed the stats part and called me names at least.

Good post. I really like the simulations with different levels of noise in the data. When working at the lineup level, you are always going to be in a situation like your third graph, or worse. But I still think one can learn a lot in these cases, even if the R^2 is very low. If your goal is simply to use the results to predict the dependent variable, then obviously your error will be large. But sometimes that’s not the primary goal. In the usage/efficiency example, the specific coefficient found is just a first-guess estimate that could surely be improved upon (by increasing the sample size, adding omitted variables, etc.). But just being able to find a negative relationship between usage and efficiency (regardless of the actual size) was a big step, considering most past studies (including those Berri has cited in his books and on his blog) had methodological issues that led them to find a positive relationship between the two.

I am a big proponent of doing things the right way when possible (even before Larry Brown came to the Pistons). Has anyone repeated your analysis with more data? I imagine if you collected line-ups from across three years or so that there would be plenty of usable data even if you had a 100-possession cutoff (maybe even higher).

Ryan Parker did reproduce my study on a larger data set from a different season – http://www.basketballgeek.com/2009/10/25/individual-offensive-efficiency-ratings-extracted-from-play-by-play-data/

Alex: This is a neat exercise, but I fear you’ve drawn exactly the opposite conclusion of what it actually shows you. The three graphs show that you can measure the relationship between X and Y fairly well even with a very low R^2. In every case, you are able to see and measure that relationship. If you can do that with an R^2 of .07 and also with an R^2 of .98, then your lesson is precisely that “R-squared is NOT the important part.” And in this case, the R^2 is actually 100% irrelevant, because you’ve already told us that all the remaining variance is just noise. So who cares how much of that variance you “explain?” By definition, it’s not explainable! In your example, the model has explained 100% of the variance that is possible to explain, so it’s actually a perfect model.

And even where the remaining variance is not random, we still may not care about R^2. Let’s say Y is risk of heart disease and X is smoking status, and your model tells me that smoking raises my likelihood of heart disease by 10 percentage points. That tells me everything I want to know — the relevant question now is whether a 10% chance of getting heart disease is worth the pleasure derived from smoking. I don’t care if the model explains 2%, 20%, or 95% of the variance in heart disease risk — that is irrelevant. Now, if the R^2 is .04, it does tell me that there are a lot of OTHER risk factors for heart disease. So I might want to know what those other risks are too. But if all I care about at the moment is deciding whether to smoke — or the relationship between usage and efficiency — then I don’t care about the R^2.

Let’s say I found a significant coefficient for eye color when predicting someone’s weight, and an R squared of .0001. Would you be happy to say that eye color has a relationship with weight? Are you unconcerned with the fact that you are more likely to find significant coefficients simply by having a big sample size?

Or, I’ll ask a different question – how important/informative can X be if we predict Y virtually as well when ignoring X?

The point is that you need to use both the R^2 and the regression statistics; you have to use both tools to interpret your data accurately. Alex does say this in the post above, which is really useful and clear. I think you are missing some of the very useful information he’s given. Don’t reject the tool just because it isn’t as easy as just blethering on about the regression. Proper science is done by assessing both bits of information, and if you don’t want to do proper science, don’t do any at all.

Alex, do you even read these comments? You always reply with questions — which you seem to think have obvious answers, though often they don’t — but you don’t address the arguments and examples presented. I have explained clearly how and why X could be important despite not predicting a large share of variance (as has Eli). Or better yet, did you even read your own post? You have provided an excellent example of how the relationship between X and Y can remain exactly the same while R^2 changes dramatically. Do you really not see that you proved the opposite of the point you were trying to make? (is that a specialty of yours?) The relationship between X and Y is always the same in your models. So it’s either an important relationship or not, but that answer has zero to do with the R^2. Indeed, if I wanted to teach someone that R^2 can sometimes be misleading, or at least not important, I would send them to this post! (To see the data — not your curious interpretation of it.)

I can’t answer your first question because you haven’t given coefficients. If having green eyes adds 15 pounds, then yes, there’s a relationship. If it adds 15 grams, then no. The answer depends on whether the relationship is meaningful in the real world. Eli’s model suggested that a 30% usage player adds about 4 wins to his team by increasing the efficiency of his teammates. Does that mean usage “matters?” I’d say yes, but maybe you would choose to argue 4 wins isn’t a big deal. The point is, it’s that relationship you should care about.

If it adds 15 pounds it’s significant, even if people vary by 75 pounds in their weights? What if they vary by 15 grams but the variance is 5 grams? Usually you’re very attuned to the issue of variance, but you’re ignoring it in this case. If data is very noisy, you simply can’t feel that strongly about what the coefficients tell you.

I can also simulate data where there’s no connection between X and Y and yet I get a significant coefficient – and yes, the R squared will be small. It isn’t difficult or contrived. The issue is that if your data is that noisy, you simply can’t be that confident about your conclusions. That’s all. Do you think that isn’t the case?
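A sketch of that kind of simulation in Python, assuming normally distributed X and Y generated independently, so the true slope is exactly 0. The sample size, number of repetitions, and seed are arbitrary choices, and the critical value uses the normal approximation from `statistics.NormalDist`, which is very close to the t critical value at these sample sizes. At the 5% level, roughly 1 in 20 of these unrelated data sets should still produce a “significant” slope.

```python
import random
from statistics import NormalDist


def slope_t_stat(xs, ys):
    """t statistic for the slope of a one-predictor least-squares regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    sse = sum((y - my - slope * (x - mx)) ** 2 for x, y in zip(xs, ys))
    sigma = (sse / (n - 2)) ** 0.5  # standard error of the regression
    return slope / (sigma / sxx ** 0.5)


def false_alarm_rate(n=200, reps=1000, alpha=0.05, seed=0):
    """Share of regressions on unrelated X and Y that come out 'significant'."""
    rng = random.Random(seed)
    # Normal approximation to the two-sided critical t; very close for n in the hundreds.
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of xs
        if abs(slope_t_stat(xs, ys)) > crit:
            hits += 1
    return hits / reps


print(false_alarm_rate())  # expect a value near 0.05
```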

Guy,

Maybe I’m wrong, but this seems like a correlation vs causation argument. There’s a strong link between ice cream and drowning, but I don’t think limiting ice cream sales will stop people from swimming in the summer.

“If data is very noisy, you simply can’t feel that strongly about what the coefficients tell you. ”

Why?

By noisy, I mean unpredictable. More or less by definition, if something is unpredictable you can’t feel that good about what the coefficients predict. Eli called the data in his usage analysis ‘noisy’, I assume that’s what he meant as well.

Ah, now I see where you’ve gone wrong. You assume your conclusion is true “by definition.” That’s why you keep repeating it, and asking incredulously whether people disagree with it. Until you are willing to consider the possibility that your assumption is not true by definition (which it isn’t), it’s hard to have a productive discussion.

Maybe it would help you to imagine what Eli would find if he had a sample of 3,000,000 NBA lineups, each of them still representing small samples in most cases. He would still have an R^2 of about .04. Would you continue to reject his findings on that basis, if his model reported the same coefficient? It seems that you would. And that would be crazy — at that point, we would be quite certain his coefficient was correct.

I would actually feel safer in assuming that there’s no connection if he had an R squared of .04 with 3 million lineups. To have that tiny an effect with that much data, there’s no way it could be an important effect. Maybe it would make more sense if we talked about it in terms of effect size? I’m not arguing that an effect doesn’t exist, I’m arguing that if it exists it’s very small.

Define “very small.” Is 4 wins in the NBA “very small?” Why?

And please God don’t tell me that “small” depends on the variance, because then your argument is simply a tautology.

I don’t think using imprecise phrases like “important”, “confident”, “feel good about”, and “not a lot going on” helps clarify things. Better to stay rooted in defined statistical concepts. There is a well-defined statistical measure of uncertainty on the value of a coefficient estimate in a regression: the coefficient’s standard error. You seem to be arguing that instead of looking to that measure, one should look at the overall R^2 of the regression (or at least that in cases where the R^2 is “low”, one should give less heed to the coefficient SE). But the coefficient SE already takes that into account. The equation for the standard error of a coefficient can be broken down into three terms that are multiplied together: sigma * sqrt(1/SST) * sqrt(VIF), where SST here is the sum of squared deviations of that predictor from its mean (see here: http://en.wikipedia.org/wiki/Variance_inflation_factor). The sigma term is sometimes called the standard error of the regression. Sigma = sqrt(sum(residuals^2)/(n-k-1)), with k the number of predictors. If the overall regression fit is poor, the residuals will be large, which means sigma will be large, which means the coefficient SE will be large. So the overall fit of the regression definitely does impact the confidence one should have in the coefficient estimates, but that is already taken into account in the value of the coefficient SE. So in the usage/efficiency case, or your third noise-heavy simulation, what’s relevant is that despite the low R^2 values, the coefficient estimates were still over twice as large as the coefficient standard errors.
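A small Python sketch of that decomposition for the one-predictor case, where VIF is 1 because there are no other predictors to be collinear with. It also checks the point from the top of the post that, with one predictor, the squared t statistic for the slope equals the overall F. The data are simulated under assumed values (true slope 2, noise SD 7), and the function name is illustrative.

```python
import random


def slope_se_and_f(xs, ys):
    """Return (slope, slope_se, t_squared, overall_f) for a one-predictor regression."""
    n, k = len(xs), 1                                  # k = number of predictors
    mx, my = sum(xs) / n, sum(ys) / n
    sst_x = sum((x - mx) ** 2 for x in xs)             # SST of the predictor
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sst_x
    sse = sum((y - my - slope * (x - mx)) ** 2 for x, y in zip(xs, ys))
    sigma = (sse / (n - k - 1)) ** 0.5                 # standard error of the regression
    vif = 1.0                                          # no other predictors to be collinear with
    slope_se = sigma * (1 / sst_x) ** 0.5 * vif ** 0.5
    sst_y = sum((y - my) ** 2 for y in ys)
    r2 = 1 - sse / sst_y
    overall_f = r2 / (1 - r2) * (n - k - 1) / k
    return slope, slope_se, (slope / slope_se) ** 2, overall_f


rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(100)]
ys = [2 * x + 5 + rng.gauss(0, 7) for x in xs]         # hypothetical noisy data
b, se, t2, f = slope_se_and_f(xs, ys)
print(f"slope = {b:.2f}, SE = {se:.2f}, t^2 = {t2:.3f}, F = {f:.3f}")
```

Noisier data inflate sigma and therefore the slope’s SE, which is the mechanism described above.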

The standard error of the estimate is also influenced by the sample size. In Guy’s example where your analysis had 3 million lineups, a significant coefficient is all but assured. There’s virtually no way, with any amount of noise or variability, that the coefficient would not be significant. The only way you could evaluate the importance of the coefficient is to ask how much of an impact it has, and the statistical measure of that is effect size. In this case I would prefer R squared (or R squared/(1 - R squared) if you want to be picky); maybe you have something else you like. But the coefficient and its error are not a good measure of the quality of a predictor, especially when N is high.

Hey, guys. This is what power analysis is for. Let’s decide on an “effect size” a priori (or posteriori) and figure out how many samples are needed. For example, assuming power = 0.80 (standard assumption), and R^2 = 0.1, you need N=79. As R^2 decreases you need higher and higher N to maintain the same power. So, for R^2 = 0.01, you need N=768. There’s no real controversy here.

http://www.danielsoper.com/statcalc/calc01.aspx

We aren’t really worried about whether the analysis is underpowered; the sample size is plenty large (although complicated by being a weighted regression). The question is whether the effect size he found is ‘large’ enough. Eli and Guy are arguing that the effect size isn’t important, so I don’t know if you can get them to pick a number to put in the power analysis anyway.

There is no statistical answer to the substantive question of whether the coefficient is “substantial” or “small.” It’s a basketball question: does the magnitude of the effect Eli found matter in practical basketball terms? Or in my heart disease example, in life-and-death terms: does a 10% chance of getting heart disease “matter?” The R^2 can’t possibly answer these questions.

The small kernel of truth in what Alex is saying is that there will often be some relationship between the amount of variance in Y and whether we consider our effect to be “substantial.” But in that case it’s the TRUE variance in Y that is our benchmark, not the variance in our sample. In the case of Eli’s model, it’s fair to ask if the usage effect is large relative to the differences in true offensive talent among lineups/teams. And I would say the answer is yes. Alex’s mistake is equating the variance in the sample with true variance. In a lineup study, 90% or more of the variance is noise. That is totally irrelevant for determining substantive effect size (as Alex’s post nicely demonstrates). If you calculated the proportion of non-noise variance that usage accounts for, that would be a relevant measure — and it would be much higher than .04.

The other complication in this case is that teams don’t often force players to go very far from their normal usage rate. So the range of usages we can study is very constrained, and that limits the amount of variance created by usage differences among lineups. But that itself is likely evidence that coaches recognize a usage/efficiency tradeoff. So the key basketball question is not how much of the current variance is accounted for by differences in lineup usages, but rather how much would various players lose/gain at 20% usage?
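A sketch of the range-restriction point, using made-up numbers rather than any real usage data: the true slope is fixed at 1 with plenty of noise, and the same regression is fit once on the full range of X (0 to 10) and once on a narrow band of it (4 to 6). Under these assumptions the slope estimate stays near 1 in both fits, while R squared drops sharply in the narrow band because X simply varies less there.

```python
import random


def fit(xs, ys):
    """Least-squares slope and R^2 for a regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    sse = sum((y - my - slope * (x - mx)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return slope, 1 - sse / sst


rng = random.Random(3)
# Hypothetical data: true slope 1, X spread evenly over 0..10, noise SD 3.
xs = [rng.uniform(0, 10) for _ in range(40000)]
ys = [x + rng.gauss(0, 3) for x in xs]

full = fit(xs, ys)
kept = [(x, y) for x, y in zip(xs, ys) if 4 <= x <= 6]  # restrict X to a narrow band
narrow = fit([x for x, _ in kept], [y for _, y in kept])

print("full range:   slope %.2f, R^2 %.3f" % full)
print("narrow range: slope %.2f, R^2 %.3f" % narrow)
```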

So what you’re saying is that I can run an analysis on data that’s ridiculously noisy and nearly impossible to pull something from, but if I do pull something from it, it will be meaningful?

Are you interested in another wager? Eli gave a link to someone who followed up his study and found an even smaller effect (although of course significant; no sample size or other info listed). Can you find someone else who can run the same analysis on a full two or three year period? I bet that 1) the R squared will be under .05, 2) the coefficient for usage will be highly significant, and 3) the coefficient will be smaller than the .25 that Eli found.

Let’s say you find those 3 things. What do you conclude?

That the effect is weak at best, and not a big deal if it does exist. Which is what I’ve been saying all along.

Alternatively, does anyone have a good prediction model that uses the usage/efficiency trade-off? How much accuracy does it lose if that part is taken out? I’m guessing not a lot.

DSMok1 has it in his ASPM model. Raise this thread from the dead and ask him (I would be interested):

http://www.sonicscentral.com/apbrmetrics/viewtopic.php?t=2603

“That the effect is weak at best, and not a big deal if it does exist. Which is what I’ve been saying all along. ”

Yes, you keep saying it, but half of it is unambiguously wrong, and the other half you refuse to explain or provide evidence for. First, if it is “highly significant” then it definitely exists. And with all that data the SE will be smaller, so we will have a lot of confidence in the accuracy of our new coefficient. Second, why is a coefficient less than .25 “not a big deal?” (I couldn’t find a definition for this phrase in my statistics textbook.) What size coefficient would be “a big deal,” and why? Unless/until you answer these questions, the discussion is pointless.

Here’s another question for you to ponder. Suppose Eli had used each lineup’s offensive efficiency as his dependent variable, and included each player’s offensive rating as well as lineup usage as the independent variables. He would presumably get the same coefficient and SE for usage, but would now have a much higher R^2. Would you now have more confidence in his result because he reported a higher R^2? Why?

You have a stats textbook? I’m a little shocked. Mine don’t say anything about using the significance of a coefficient when evaluating a model. They also tell me that statistics in general, and p values especially, are probabilities; nothing ‘definitely exists’. And they tell me that with a sample size as large as the one in this case, a coefficient is virtually always going to be significant. As in, you would have to try hard to find something so unconnected to your variable that it would not be significant.

To be more clear, I don’t think any coefficient that comes out on an analysis with such a small R squared is a big deal. It’s my technical term for when I think something is largely irrelevant and ok to ignore. If I’m wrong, I’m not missing out on much. I predicted a smaller coefficient in a bigger analysis because the two posts I’ve read lead me to think it’s likely smaller than what Eli found. No size coefficient would be large enough to be a ‘big deal’ unless the model seemed reasonably specified. With the R squared this small, I don’t think it’s a good model. If usage did a better job of explaining expected efficiency, then I would move on to evaluating the coefficient and deciding how important it is.

I’m not sure his R squared would go up that much if he separated out everyone’s rating, but let’s say it did. No, I wouldn’t feel better. I would want to know the partial R squared for usage, which I imagine would stay just as small. Were the R squared for usage to change that much, it would suggest fairly strong collinearity and then we’d have a more complicated story to tell anyway.
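The partial R squared mentioned here can be computed from two fits: it is the share of the variance left over after the other predictors that adding usage then accounts for, (SSE without usage - SSE with usage) / SSE without usage. Below is a Python sketch on simulated data in the spirit of the hypothetical above; the variables `rating`, `usage` and `eff` and their coefficients are invented for illustration and are not Eli’s data or model. Under these particular assumptions the overall R squared rises a lot when the rating is added, the usage coefficient stays about the same, and the partial R squared for usage stays small; different assumptions would give different numbers.

```python
import random


def centered(values):
    """Subtract the mean so the intercept drops out of the fits below."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def fit_one(x, y):
    """Return (SSE, slope) for y regressed on a single predictor plus intercept."""
    x, y = centered(x), centered(y)
    b = sum(a * c for a, c in zip(x, y)) / sum(a * a for a in x)
    return sum((c - b * a) ** 2 for a, c in zip(x, y)), b


def fit_two(x1, x2, y):
    """Return (SSE, b1, b2) for y on two predictors plus intercept (2x2 normal equations)."""
    x1, x2, y = centered(x1), centered(x2), centered(y)
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    sse = sum((c - b1 * a - b2 * b) ** 2 for a, b, c in zip(x1, x2, y))
    return sse, b1, b2


rng = random.Random(11)
n = 20000
rating = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical player-quality term
usage = [rng.gauss(0, 1) for _ in range(n)]   # hypothetical usage term
eff = [2 * r + 0.5 * u + rng.gauss(0, 3) for r, u in zip(rating, usage)]

sst = sum(v * v for v in centered(eff))
sse_usage_only, b_usage_only = fit_one(usage, eff)
sse_rating_only, _ = fit_one(rating, eff)
sse_full, _, b_usage_full = fit_two(rating, usage, eff)

print(f"usage only:       R^2 = {1 - sse_usage_only / sst:.3f}, usage coef = {b_usage_only:.2f}")
print(f"rating and usage: R^2 = {1 - sse_full / sst:.3f}, usage coef = {b_usage_full:.2f}")
print(f"partial R^2 for usage = {(sse_rating_only - sse_full) / sse_rating_only:.3f}")
```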

After all this, you just repeat your tautological argument: an effect is not important because the R^2 is low. I have to assume you simply aren’t capable of explaining why you think this is true. Which makes sense, since it’s hard to imagine a logical argument behind it. So, there is really nothing to discuss…..

By tautology do you mean proper? You evaluate the quality of a linear regression by its R squared. Low R squared means crappy model, and hence you can’t feel strongly about what it tells you. It doesn’t mean the conclusions you draw from it must be wrong, it just means you wouldn’t lose very much by ignoring those conclusions either. Can you tell me why I should trust the predictions made by the model and dismiss the fact that the model makes wildly inaccurate predictions?

I’ll repeat my point about power analysis. It should be obvious, though.

Alex is saying that as you increase the sample size (N) for two non-correlated uniform random variables, the p-value decreases. This is true. At some “large enough” N, the correlation will be deemed statistically significant (let’s say p<0.05), with some very low R^2. Let’s call that *R^2*. If we know what that *R^2* is, say 0.01 (1%), then we can decide that we will only call the correlation significant if we get 10X *R^2*, or R^2=0.1 (10%). Now, if you get that level of R^2, you will be satisfied that it is not simply due to randomness. Power analysis simply helps you decide what N you should be using in the first place.

I got your point Evan. Let’s say we all agree that usage has a small effect on efficiency, which it doesn’t sound like Guy wants to do. Eli has plenty of power to detect his small effect. Power analysis does assume that the null is false, so maybe Guy will like that part of it. In either case, the power of the regression doesn’t tell us any more about how ‘large’ or ‘important’ the effect is. The regression can find tiny effects; that’s what I’m saying has been found.

I’m still not sure you do, so just to be clear…

“The regression can find tiny effects; that’s what I’m saying has been found.”

Here’s what I’m saying. If you take N number of non-correlated samples (use the same N that Eli used), what would be the R^2? I am suggesting it would be much, much lower than what Eli found. That is what is important here.

Why is that the important part? The question isn’t if the regression is significant (it is), so the R squared is above 0. The question is how much does the regression inform what we know? For that we need the effect size. The effect size is small. Is it so small as to be 0? No, but your variables would have to be completely uncorrelated to get 0.

Evan, since you’re at least willing to talk about effect size, let me ask. If you ran an analysis and got such a poor fit, how much would you trust the results? If you were advising a GM, would you tell him to give a max deal to a guy with high usage so that he would make all your low/medium usage players better?

Alex, just do the test I suggested. Take N non-correlated uniformly distributed random variables and do the significance testing. N should be whatever size Eli got. What’s the p-value for that test? What’s the R^2?

As to this question:

“If you ran an analysis and got such a poor fit, how much would you trust the results?”

I would trust the results if I knew the power of the test was high.

You get 5% false alarms, just as you would expect. The average R squared is smaller, and the average p value larger, than what Eli found, but you do get some samples with a larger R squared and a smaller p value. I’m not sure I see the point; can you tell when you’re in a situation with a real result as opposed to a false alarm? It still doesn’t address the issue of how important an effect it is.
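Evan’s check can be sketched in Python: regress one uniform random variable on another, unrelated one many times and look at the R squared values. With no relationship, the expected R squared is about 1/(N - 1), and the 95th percentile of the null R squared values should sit near the R squared needed for significance at the 5% level, F_crit/(F_crit + N - 2). N = 300 is a hypothetical stand-in because Eli’s sample size isn’t given here, and the critical value uses the normal approximation (F_crit ≈ 1.96²).

```python
import random
from statistics import NormalDist


def r_squared(xs, ys):
    """Squared Pearson correlation, which equals R^2 for a one-predictor regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def null_r2_values(n, reps, seed=0):
    """R^2 values from regressing one uniform variable on an independent one."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        ys = [rng.random() for _ in range(n)]
        values.append(r_squared(xs, ys))
    return values


n = 300  # hypothetical sample size
values = sorted(null_r2_values(n, reps=1000))
f_crit = NormalDist().inv_cdf(0.975) ** 2  # normal approximation to critical F(1, n - 2)
print("mean null R^2:", round(sum(values) / len(values), 5), "vs 1/(N-1):", round(1 / (n - 1), 5))
print("95th percentile:", round(values[int(0.95 * len(values))], 4),
      "vs R^2 needed for p < .05:", round(f_crit / (f_crit + n - 2), 4))
```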

The power of the test is high, as it must be for a regression with the sample size it has. So I would assume you’ll incorporate usage as an interaction term in ezPM 3.0? You’re vastly underrating Monta Ellis. Guy will tell you, with all those possessions he uses he’s worth nearly four wins more than you think.

“So I would assume you’ll incorporate usage as an interaction term in ezPM 3.0?”

I’ve been thinking about how to do it best, but yes, I actually will at some point try to account for usage. The question is not whether, but how.

You want to know why?

Dirk Nowitzki. I have him pegged at +1.65. His offensive ezPM is 3.65. That doesn’t pass the laugh test in my opinion (OMG, calling out my own metric!). When he was out, Dallas sucked. Kidd had to take more shots. Kidd and Chandler are very low usage. Dirk is being undervalued by ezPM. His 2yr Adj. +/- is 10.68. His 1yr Adj. +/- is +16.38. When I see a disconnect like that, it makes me want to fix it, not defend it.

You’re going to change your model to account for the laugh test on one of the 400 players in the league? Dirk was out when the team played something like 9 games in 17 days (they had more than one day’s rest for one game) including the Spurs twice, Orlando, OKC, and Portland. That’s a pretty tough stretch. Butler got hurt, Marion missed a game, they started Ajinca twice after trying Cardinal. Seems like there’s a lot going on. Which doesn’t mean that Dirk wasn’t responsible for the drop off; just saying.

I don’t mind if you do though; I honestly hope it works better. Are you going to be making predictions going forward with all the different versions of the model to see if the improvements are actual improvements?

Alex, are you saying you don’t think there is a substantial usage/efficiency tradeoff? I thought you just had a methodological quarrel with Eli’s study. You don’t really doubt there’s a tradeoff, do you?

Mostly an interpretation quarrel, which leads me to have an issue with ‘substantial’. If I had to guess, I’d say there’s a small but reliable trade-off. But I’ve also only seen a year and a half (I guess? don’t know how much of the season was in Eli’s data) of numbers, and they weren’t even combined. If I were running a team and trying to get my line-ups to be more efficient I might look at usage, but I’d try and look at a lot of other things first.

Alex, if you were a GM you would probably look at usage first. Say you were considering trading Rose for Fields. If you were the Chicago GM, you would have to consider how you would replace Rose’s shots. That wouldn’t be the last thing you would check, would it?

Replacing shots and improving efficiency are two different concerns. If I were trading Rose my first concern would be who would bring up the ball, so instead of Fields let’s say Jason Kidd (takes about the same number of shots and same overall usage as Fields anyway). You’d obviously need to change the offense to account for the fact that Kidd isn’t going to run the pick and roll or just take his man off the dribble like Rose, but I feel pretty confident that Kidd can run a good offense and distribute the ball to his shooters. I’m pretty sure the Bulls would still get shots off.

But none of that really addresses whether I would expect the Bulls to shoot drastically worse because Rose isn’t there with his 30%+ usage. The prediction in this case is that everyone who plays with Kidd instead of Rose (including Kidd) will be worse off on offense because they have to absorb the extra 15% or so of usage. That isn’t obvious to me. Rose is shooting pretty well this year, so it would obviously hurt to swap him for a shooter as bad as Kidd, but if Kidd were a better shooter with the same usage I wouldn’t be that worried.
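A rough way to see the arithmetic behind “absorb the extra 15% or so”: usage rates for the five players on the floor add up to roughly 100%, so if a 30%-usage player is swapped for a 15%-usage one, the other four have to pick up the difference. The sketch below assumes, purely for illustration, that the shortfall is spread in proportion to each teammate’s existing usage; the lineup, player labels, and numbers are all hypothetical.

```python
def redistribute_usage(lineup, outgoing, incoming_usage):
    """Swap out one player and spread the usage shortfall over the other four.

    Assumes (hypothetically) that teammates absorb the shortfall in proportion
    to their current usage. `lineup` maps player name -> usage %.
    """
    others = {name: usg for name, usg in lineup.items() if name != outgoing}
    shortfall = lineup[outgoing] - incoming_usage
    total_others = sum(others.values())
    # Each teammate takes a share of the shortfall proportional to their usage.
    new_lineup = {
        name: usg + shortfall * usg / total_others for name, usg in others.items()
    }
    new_lineup["replacement"] = incoming_usage
    return new_lineup


# Hypothetical lineup: a 30% star plus four role players (sums to 100%).
lineup = {"star": 30.0, "wing": 20.0, "big": 20.0, "guard": 15.0, "forward": 15.0}
after = redistribute_usage(lineup, "star", 15.0)
for name, usg in after.items():
    print(f"{name}: {usg:.1f}%")
```

Whether those extra possessions come at a lower TS% is exactly what is being argued here; the sketch only shows how much usage has to move, not what it costs.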

If I recall correctly, there was a usage issue in Philadelphia when Allen Iverson was traded for Andre Miller and then again when he was traded for Chauncey Billups. I wonder what happened in those two situations. Just two examples though.

In Iguodala’s rookie season, his USG% was 12.8% and his TS% was 58.0%. In his sophomore season (the year before AI was traded), Iggy’s USG% was 14.7% and his TS% was 59.8%. The year AI was traded, Iggy’s USG% jumped way up to 22.6%, and his TS% dropped to 56.2%. The following season his USG% went up to 23.8% and his TS% dropped to 54.3%. Basically, when AI left, defenses began to focus on Iggy, who became their main threat, and his efficiency dropped and never got back to the level of his first two seasons.

The point is you can’t just arbitrarily increase usage. There is almost always a tradeoff, except for the truly elite superstars. The reason I’m so high on Stephen Curry is that he combines very high usage (24.7%) with high efficiency (59.6% TS%), but he is the exception, not the rule.

Andre Iguodala is one example (although you could find more). Players tend to have seasons when they increase their usage a lot (often their third season), but when a high-usage player leaves, all the other players slightly increase their usage; no one player makes a big jump. Checking your argument (usage goes up, TS% goes down) against the other players:

Sam Dalembert: usg% +2.9, ts% +1.3

Steven Hunter: usg% -1.2, ts% -2.5

Kyle Korver: usg% +5.1, ts% -0.8

Andre Miller: usg% -0.8, ts% -1.2

Kevin Ollie: usg% +3.2, ts% -1.0

Joe Smith: usg% -1.5, ts% -2.2

Average change: usg% +1.28, ts% -1.1

These guys played over 1000 minutes. I didn’t include Willie Green, who played 10 games the previous season. Most of these guys, however, were past their prime at the time. I think if a study is done, age should be considered, along with the assists of teammates. However, I doubt that will be done. Adjusting for different factors that could also have an effect makes too much sense.
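The averages in that list can be checked directly. Taking the six players’ changes exactly as listed, the sketch below recomputes the mean change in usage and TS% (about +1.28 and -1.07, matching the figures given to rounding) and counts how many of the six saw their TS% fall. It only checks the arithmetic; with six players it says little about a league-wide trend.

```python
# Season-over-season changes (percentage points) as listed in the comment.
changes = {
    "Sam Dalembert": (2.9, 1.3),
    "Steven Hunter": (-1.2, -2.5),
    "Kyle Korver": (5.1, -0.8),
    "Andre Miller": (-0.8, -1.2),
    "Kevin Ollie": (3.2, -1.0),
    "Joe Smith": (-1.5, -2.2),
}

usg_changes = [usg for usg, _ in changes.values()]
ts_changes = [ts for _, ts in changes.values()]

avg_usg = sum(usg_changes) / len(usg_changes)
avg_ts = sum(ts_changes) / len(ts_changes)
ts_drops = sum(1 for ts in ts_changes if ts < 0)

print(f"average usg% change: {avg_usg:+.2f}")
print(f"average ts% change:  {avg_ts:+.2f}")
print(f"players whose ts% fell: {ts_drops} of {len(changes)}")
```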

Right, because most players don’t increase their usage that much during their career, it’s difficult to find a relationship. But when there are examples (like Iguodala) of players who have significant jumps in their usage, it is generally the case that their efficiency goes down. Here’s the thing: it’s not random which players increase their usage. Coaches and players know who has the ability to create their own shots. Those players tend to have significantly higher usage.

It really has to be that way, because if it were true that any player could generally increase their usage without losing efficiency, we would see it. The Landry Fields and James Jones of the world would have 25-30% usage. There’s really no other rational way to explain why they don’t other than a usage/efficiency tradeoff. To argue against that is tantamount to saying everyone in the NBA is insane. (I guess that’s what Berri’s line of thought is, though.)

I don’t think that dberri argued that these guys would necessarily increase their usage by that much, but rather that the team as a whole would increase their usage. So everybody would slightly increase their usage rather than one player having a big jump.

It seems like increasing usage isn’t what has the big effect, but rather moving it up or down significantly from a certain level will drop your TS%. If Jason Kidd went to the Bulls and D Rose left, I doubt that Kidd would take all those shots. I’d bet that the shots would go around.

I assume everyone on Denver shot better when AI got there then? And everyone on Philly shot better as Iggy absorbed the extra usage?

Alex, why would you assume that? Denver already had a high-usage player in Melo. If anything, the issue with AI going to Denver was diminishing returns.

You might also find it interesting that AI actually was more efficient in Denver. Maybe it had something to do with the fact that his USG% actually dropped to the lowest of his career.

Thanks, though, you’ve actually helped illustrate the point.

Could also be coaching. Could be Dean Oliver. Could be altitude… maybe.

Also, wouldn’t AI be another person to increase the players’ efficiency? Why would it then be diminishing returns?

Then there’s Ron Artest. His usage went down 8.3% when he went to LA, which I think you’d agree is a big drop. His ts% went up .02% though. Not much oomph.

Also, what about the Bulls when Michael left the first time? I’m not seeing this strong connection.

He still used 27%, which I don’t think you would call ‘low’. And it’s higher than what Billups used. Which means other Denver players should have dropped their usage and gotten more efficient as well. Billups, by the way, increased his usage in his first full year in Denver and also had a higher TS%. So I’m sure that means that the efficiency-usage connection can’t be true, right? One example must prove the case.

An analysis with all the relevant examples (at least for a year or so) has been run. It leads to a weak conclusion, in my opinion. If you feel differently, fine. I don’t think it’s going to change the world if my feelings run one way or the other.

Not to mention, Philly was a worse team without Iverson and with Miller.

Their offense with Iverson on the court in ’06 was 109.8 per 100. The Miller-led squad of ’07 was 106.9. That’s quite the drop. (They were slightly better defensively, but not enough to offset the drop in offense.)

Iverson on the floor committed fewer turnovers, attempted more free throws, and had a higher eFG% than Miller on the floor the next year. And this all happened despite Iverson’s squad assisting less.

Clearly, as others had to increase their usage to cover for Iverson, their efficiency dropped. In theory, their efficiency would have dropped even if they maintained the Iverson squad’s efficiency. But they got worse, so the drop is more significant.

The Billups one is harder to measure. With Iverson off the court the team scored 104.5 per 100, and without Billups it scored 106. That 1.5-point gap nearly matches the roughly 2-point bump for Billups’ units, from 112.5 for Iverson’s squad to 114.4 for Billups’.

The reason? The switch from Camby to Nene. That’s where the 1.5 points come from.

Camby’s usage was 13%, while Nene’s was 18%. And Nene wasn’t raising his usage; he simply got back on the floor. You have to look at the Billups swap as a Nene+Billups for Iverson+Camby swap. So the drop from Iverson’s usage to Billups’ was almost all absorbed by replacing Camby with Nene, so the other players didn’t have to change anything (and Nene was a FAR more efficient player).

Of course, this leads us back to how overrated Camby was. 😀

Wait, what. I could’ve sworn the 76ers were 5-17 before the trade and 30-30 after. How were they better with Iverson? Did I miss something?

You want to look at the same season? Not only is it a smaller sample of games, but the players were very different.

Iverson-Ollie-Iguodala-Webber-Dalembert

Iverson-Iguodala-Carney-Webber-Dalembert

Iverson-Ollie-Iguodala-Randolph-Dalembert

Iverson-Green-Iguodala-Korver-Dalembert

Those were Iverson’s top 4 lineups.

Miller-Green-Iguodala-Hunter-Dalembert

Miller-Iguodala-Carney-Hunter-Dalembert

Miller-Iguodala-Korver-Joe.Smith-Dalembert

Miller-Iguodala-Korver-Hunter-Dalembert

Andre’s top 4.

According to 82games.com, Miller didn’t play with Ollie or Webber, and Iverson didn’t play with Steven Hunter at PF (and hardly at all).

The closest lineup had the PF for AI being Kyle Korver (lolwat).

That lineup had a +/- of 0 compared to Miller’s -43 (total). No efficiency numbers available that I see.

The entire reason the team got better without AI had nothing to do with Miller. It was Philly’s decision to stop playing Kevin Ollie. With Ollie on the floor, the team posted a 100.8 offensive efficiency. Until the Iverson trade, he played a bunch of minutes alongside AI. AI still managed to put up a 105.4 despite those minutes with Ollie (which means he was well above it without Ollie). Miller, on the other hand, played far fewer of his minutes with Ollie, since Ollie was playing on the second unit.

Iverson’s top 4 lineups above were +33 in 198 minutes. Miller’s were +51 in 830 minutes.

Iverson’s top 10 lineups were +46 in 415 minutes.

Miller’s top 10 were +34 in 1285 minutes.

The team played better with Iverson out there. So why the difference in overall? Iverson played 5 more minutes per game than Miller on Philly. In those 5 minutes, AI played with scrubs that Andre Miller did not play with. Those scrubs killed AI’s overall +/- stats.

Again, in the top 10 lineups for Iverson, comprising 415 of his 640 minutes, the team was +46. For him to end up at -50 total, his teams must have been outscored by 96 in the other 225 minutes. He played with Ollie, and also with Korver at PF, while Miller didn’t seem to do so.

I can’t recall if this was due to injuries or coaching as it’s been a while. But it wasn’t because of AI. And if their record was that bad, it wasn’t Iverson’s fault.
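The lineup comparison is easier to read as a rate. The sketch below converts the quoted plus-minus totals to points per 48 minutes and backs out the minutes and margin that fall outside Iverson’s top 10 lineups, using the 640 total minutes and -50 total from the thread. It assumes those totals are exact; lineup plus-minus in samples this small is noisy, so treat the rates as rough.

```python
def per48(plus_minus, minutes):
    """Scale a plus-minus total to a per-48-minute rate."""
    return plus_minus * 48 / minutes


# (plus-minus, minutes) figures quoted in the thread.
lineups = {
    "Iverson top 4": (33, 198),
    "Miller top 4": (51, 830),
    "Iverson top 10": (46, 415),
    "Miller top 10": (34, 1285),
}
for label, (pm, mins) in lineups.items():
    print(f"{label}: {per48(pm, mins):+.1f} per 48")

# Everything outside Iverson's top 10 lineups.
total_pm, total_minutes = -50, 640
top10_pm, top10_minutes = lineups["Iverson top 10"]
rest_minutes = total_minutes - top10_minutes
rest_pm = total_pm - top10_pm
print(f"other minutes: {rest_minutes}, margin: {rest_pm:+d}, "
      f"{per48(rest_pm, rest_minutes):+.1f} per 48")
```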

Another interesting stat. With Miller not in the game, the team scored 101.9 per 100 in 1830 minutes. Iverson was in 640 of those minutes in which the team scored 105.4. How bad were the non-AI/Miller minutes (1190)? Something like 100 per 100. And Iverson played an extra 5 minutes per game with these chumps than Miller.

So what is the effect of those 75 minutes? Let’s say the best-case scenario is Iverson puts up 102 with those chumps. If we take out those 75 minutes of 102 ball we get to about 106 per 100, roughly 1 point below Miller.

And I’m being generous, here. There’s a good chance it was like 101. And AI played more minutes besides that with players out of position or Kevin Ollie who was horrible.
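The back-of-the-envelope numbers here are minute-weighted averages. The sketch below backs out the rating of the leftover minutes, assuming possessions are spread evenly per minute (an approximation, since pace varies by lineup). With the figures quoted, the non-Iverson, non-Miller minutes come out near 100 per 100, and removing 75 minutes of assumed 102 ball from Iverson’s 640 leaves him near 105.9.

```python
def remaining_rating(total_rating, total_minutes, part_rating, part_minutes):
    """Rating of the minutes left after removing a sub-sample.

    Assumes possessions are proportional to minutes, so ratings can be
    combined as minute-weighted averages.
    """
    rest_minutes = total_minutes - part_minutes
    return (total_rating * total_minutes - part_rating * part_minutes) / rest_minutes


# Miller off the floor: 101.9 in 1830 min, of which Iverson played 640 at 105.4.
no_ai_no_miller = remaining_rating(101.9, 1830, 105.4, 640)

# Iverson's 640 min at 105.4, minus 75 extra "scrub" minutes assumed at 102.
ai_without_scrub_minutes = remaining_rating(105.4, 640, 102.0, 75)

print(f"non-AI/Miller minutes: {no_ai_no_miller:.1f} per 100")
print(f"Iverson without the extra 75 min: {ai_without_scrub_minutes:.1f} per 100")
```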

tl;dr summary: If you actually break down the numbers, the swap of AI to Andre barely did anything (also, Philly won more games with Miller due to luck; their expected wins were 32). The difference in the teams is that Andre Miller played fewer of his minutes with the team’s scrubs than AI. Their top 10 lineups were not only vastly different, but AI’s were collectively better, and it wasn’t close. That, along with AI’s extra 5 minutes of play with scrub teammates, killed it.

I’d argue there was no discernible difference among the same non-crappy teammates.

“An analysis with all the relevant examples (at least for a year or so) has been run. It leads to a weak conclusion, in my opinion. If you feel differently, fine. I don’t think it’s going to change the world if my feelings run one way or the other.”

What analysis are you referring to here?

Eli’s, and the follow-up that he linked to in an earlier comment.

And “weak” because you think the R^2 is too low or the coefficient was not high enough? Or both?

I thought the comments have made it pretty clear. The R squared is too small (again, in my opinion). It’s actually a really similar situation to predicting wins in the NFL from one season to the next. Knowing a team’s record from last year helps you predict how they’ll do this year in that the coefficient in the regression is significant, but you barely do any better than just guessing everyone will go 8-8. In this case, knowing a line-up’s usage simply doesn’t gain you very much. Your predictions are barely any more accurate than not knowing their usage. That’s what the low R squared tells you.
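The gap between “significant” and “useful” can be made concrete with the F formula from the post, F = R²(N - p - 1) / (p(1 - R²)). With enough data even a tiny R² gives a large F. The second function uses the identity that, in-sample and with the same divisor for both, the residual standard deviation is √(1 - R²) times the standard deviation of Y, which measures how much the prediction errors actually shrink. The R² = 0.02, N = 1000 case is a hypothetical illustration, not a number from Eli’s study.

```python
import math


def f_statistic(r2, n, p=1):
    """Overall regression F from R squared, sample size n, and p predictors."""
    return r2 * (n - p - 1) / (p * (1 - r2))


def error_ratio(r2):
    """Residual SD as a fraction of the SD of Y (in-sample, same divisor)."""
    return math.sqrt(1 - r2)


# (0.5, 50) is the worked example from the post; (0.02, 1000) is hypothetical.
for r2, n in [(0.5, 50), (0.02, 1000)]:
    print(f"R2={r2}, N={n}: F={f_statistic(r2, n):.1f}, "
          f"typical error = {error_ratio(r2):.0%} of always guessing the mean")
```

An F around 20 clears the usual 5% cutoff comfortably, yet the typical error is still about 99% of what you’d get by predicting the average for everyone, which is the 8-8 point above in numbers.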

What’s the R^2 for TS% y-t-y?

No idea, I’ve never run it. I’d guess maybe a .3?

Alex, players (except young players) tend to keep a consistent ts%. I’m not sure it would be that low.

Kevin Pelton summarized y-t-y correlation for the major box score stats here:

http://www.basketballprospectus.com/article.php?articleid=1430

Doesn’t have TS%, but the R^2 for 2P% is about 0.25 and for 3P% around 0.09.
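For anyone who wants to run the TS% number themselves, a year-to-year R² is just the squared correlation between a stat in one season and the same stat the next season for the same players (for a one-predictor regression the two are identical). The sketch below uses plain Python; the TS% values are made-up placeholders, and a real check would need many more players.

```python
def r_squared(season1, season2):
    """Squared Pearson correlation between paired season values.

    For a simple (one-predictor) regression this equals the regression's R squared.
    """
    n = len(season1)
    if n != len(season2) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 players")
    m1 = sum(season1) / n
    m2 = sum(season2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(season1, season2))
    sxx = sum((a - m1) ** 2 for a in season1)
    syy = sum((b - m2) ** 2 for b in season2)
    return sxy * sxy / (sxx * syy)


# Hypothetical TS% for the same five players in consecutive seasons.
ts_last_year = [0.52, 0.55, 0.58, 0.54, 0.60]
ts_this_year = [0.54, 0.53, 0.57, 0.56, 0.58]
print(f"year-to-year R^2: {r_squared(ts_last_year, ts_this_year):.2f}")
```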

Should we not consider 3P shots in our metrics? Pretty low R^2.

That’s some questionable rounding, Evan. Anyway, I think that would be important for coaches to take into account. Just because a guy shot well last year, it doesn’t mean he’ll shoot well this year. If you wanted to predict his value, you might not miss much by leaving out his three-point percentage.

I understand that you’re trying to make the R squared argument look silly, but you can’t. The only useful thing in sports analysis is predicting. Knowing what happened in the past is only useful if it helps you make decisions going forward. If it turns out that people shoot threes so inconsistently that you can’t tell who will be good at it next year, would you use it to pick free agents? If you’re looking specifically for a guy to shoot threes then obviously that’s what you go with, but if you’re looking for who will most likely give you value you might focus on other parts of his game.

“That’s some questionable rounding Evan. ”

I was looking at the second table for players who switched teams.

“The only useful thing in sports analysis is predicting.”

Ok, even if that’s what you think, the fact is that WP is not very good at predicting y-t-y. Right now, it’s dead last compared to numerous other metrics, including Hollinger’s simulations, Vegas, and DSMok1’s ASPM. So, why bother? It doesn’t bring anything to the table, in terms of better prediction. That’s a fact.

Cue the “WP can’t predict minutes” argument.

“Just because a guy shot well last year it doesn’t mean he’ll shoot well this year.”

But I thought WP shows that players are consistent over time. So if their shooting percentages aren’t, but WP is consistent, what does that say about the metric and the claims of rebounds being overvalued?

Seems like a case of wanting to eat one’s cake, too.

I’m not sure where to start here. I guess you can put arguments in my mouth, but I’ve never said anything about minutes. WP shows that NBA players are relatively consistent, especially compared to other sports. The consistency comes from the fact that players are relatively consistent at pretty much everything that isn’t shooting: rebounds, blocks, assists, free throws, fouls (to the extent that the correlations in Kevin’s article aren’t position-driven). I have no idea what it says about rebounds being overvalued. WP is fairly consistent because it values lots of things, and as I just said, lots of things are consistent.

“cue the “WP can’t predict minutes” argument.”

Good point, but can be worked around with retrodiction, where we already know the minutes. Use prior year’s WP, ASPM, etc. I’m going to do this for ezPM (e.g. retrodict 2009 using 2008 data).
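A retrodiction along these lines can be sketched as: take each player’s rating from the prior season, apply it to the minutes he actually played in the target season, and add up. The sketch assumes a WP48-style per-48-minute rating where a player’s wins are rating × minutes / 48; the player names, ratings, and the default for players with no prior season (rookies) are hypothetical placeholders, and other metrics would need their own conversion to wins.

```python
def retrodict_team_wins(prior_ratings, current_minutes, default_rating=0.045):
    """Predict team wins from last season's per-48 ratings and this season's minutes.

    prior_ratings: player -> prior-season wins per 48 minutes (WP48-style).
    current_minutes: player -> minutes actually played in the target season.
    default_rating: hypothetical stand-in for players with no prior season.
    """
    wins = 0.0
    for player, minutes in current_minutes.items():
        rating = prior_ratings.get(player, default_rating)
        wins += rating * minutes / 48
    return wins


prior = {"player_a": 0.200, "player_b": 0.100}  # hypothetical 2008 ratings
minutes_2009 = {"player_a": 2400, "player_b": 1920, "rookie": 960}
print(f"retrodicted wins: {retrodict_team_wins(prior, minutes_2009):.1f}")
```

Because the minutes are taken as known, any difference between metrics comes from the ratings themselves rather than from how each one guesses playing time.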

How many years are you going to do? And for how many metrics? If you want to declare a metric to be ‘the best’ after part of one season you can feel free, but I’d advise against it. I don’t know of anyone who’s made predictions with various metrics, giving them all the same minute allocations and rookie values, for multiple years to see which one does best. Anything else is a bad comparison.
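The comparison described here, with every metric given identical minutes and rookie values over several seasons, comes down to scoring each metric’s team-win predictions against actual wins season by season. A minimal sketch, assuming the predictions have already been produced under the same minute allocations; mean absolute error is one reasonable scoring choice among several (RMSE would also work), and the metric names and numbers are hypothetical.

```python
def mean_absolute_error(pairs):
    """Average |predicted - actual| over (predicted, actual) pairs."""
    return sum(abs(pred - actual) for pred, actual in pairs) / len(pairs)


def compare_metrics(predictions):
    """predictions: metric -> season -> list of (predicted wins, actual wins).

    Returns metric -> (overall MAE, per-season MAE dict), so a metric that
    wins one season but loses the others is easy to spot.
    """
    results = {}
    for metric, seasons in predictions.items():
        per_season = {s: mean_absolute_error(p) for s, p in seasons.items()}
        all_pairs = [pair for p in seasons.values() for pair in p]
        results[metric] = (mean_absolute_error(all_pairs), per_season)
    return results


# Hypothetical two-team, two-season example.
predictions = {
    "metric_x": {2009: [(50, 54), (30, 28)], 2010: [(45, 41), (35, 37)]},
    "metric_y": {2009: [(52, 54), (31, 28)], 2010: [(40, 41), (38, 37)]},
}
results = compare_metrics(predictions)
for metric in sorted(results, key=lambda m: results[m][0]):
    overall, per_season = results[metric]
    print(f"{metric}: overall MAE {overall:.2f}, by season {per_season}")
```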

“If you want to declare a metric to be ‘the best’ after part of one season you can feel free, but I’d advise against it.”

Glad you’d advise against it, although it’s not clear to me that I suggested it, unless you misread my comment.

I was going from your comment that WP is behind Hollinger, Vegas, etc. You certainly seem ready to name WP the worst, if not pick another as the best. You mean so far this year, right? Is there a bigger data set somewhere? Has anyone even updated it since you checked a while back?

APBR:

http://www.sonicscentral.com/apbrmetrics/viewtopic.php?t=2618

Last updated Jan. 23

You mean the projections that expect the Cavs to go 11-20 the rest of the way (a 35.5% winning percentage versus 15.7% so far this year) while the Spurs go 19-13 (59.4% versus 84%)? I think I’ll wait and see. And it sounds like you are basing your opinion on a fraction of one season.

I’ve asked if anyone would try that, but to date no one has. My guess is we’d find that the predictions don’t work with players who change teams (when teams are left relatively unchanged, including injuries, we’d expect little to change).

“And, it sounds like you are basing your opinion on a fraction of one season.”

Pretty big fraction. But I’ll give WP the benefit of the doubt that it comes back from last place.

I went through the links Evan posted about consistency in different stats. Even the guy said something along the lines of “if rebounds were so heavily affected by teammates, then there wouldn’t be so much consistency in the stat, even when players change teams”. Also, TS% is highly affected by free throw rate. If there is huge inconsistency, it’s probably due to age, improvement, and injury. Most players I see improve their TS% as they get older, and some have bad years. High-efficiency players tend to stay that way.

That’s the strawman argument. It’s not that rebounds are so affected by teammates in general. It’s that a few elite rebounders have an effect on teammates, while for everyone else it’s pretty much what it is. And the issue is that at the team level rebounds are worth less than at the individual level.

Rebounds are consistent because there are significant positional differences. Obviously, if you don’t account for position, the correlation will be very high. Point guards will always rebound less than Centers, for example, so it looks like the overall correlation is high. The correlation would be lower if it were done for each position separately. How much lower? I don’t know, but definitely much lower.

It’s been done; you should take a closer look at Berri’s FAQ (or you could read Stumbling, although I’m sure it would be a chore). The correlation stays above .8, which would put it about on par with what Pelton found (for players who change teams) for free throw percentage and below only other stats that probably should be position-adjusted (three pointers attempted, blocks, assists, two pointers attempted). If you adjust for position, rebounds per minute will be the most consistent stat across seasons excluding blocks according to Stumbling. Perhaps part of rebounding consistency is still due to some centers being asked to rebound while others aren’t, but some people have the opinion that it comes from the fact that some people are good at it while others are not.
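The position question can be illustrated directly. One simple way to “adjust for position” (an assumption here, not necessarily how Berri or Pelton did it) is to subtract each position’s average from both seasons before correlating. The toy data below are invented so that the two groups differ a lot in level but not in a consistent within-group order, which shows how pooling positions can make a stat look far more consistent than it is within any position. It only shows the mechanism and says nothing about real rebounding data, for which Berri’s FAQ (quoted further down) reports 0.83 after adjusting.

```python
from collections import defaultdict


def correlation(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def position_adjusted(players):
    """Subtract each position's mean from both seasons' values.

    players: list of (position, season1_value, season2_value).
    """
    groups = defaultdict(list)
    for pos, s1, s2 in players:
        groups[pos].append((s1, s2))
    means = {
        pos: (sum(a for a, _ in v) / len(v), sum(b for _, b in v) / len(v))
        for pos, v in groups.items()
    }
    return [(s1 - means[pos][0], s2 - means[pos][1]) for pos, s1, s2 in players]


# Invented rebounds-per-minute values: guards low, centers high.
players = [
    ("PG", 0.05, 0.06), ("PG", 0.06, 0.07), ("PG", 0.07, 0.05),
    ("C", 0.15, 0.16), ("C", 0.16, 0.17), ("C", 0.17, 0.15),
]
pooled = correlation([p[1] for p in players], [p[2] for p in players])
adjusted = position_adjusted(players)
within = correlation([a for a, _ in adjusted], [b for _, b in adjusted])
print(f"pooled r: {pooled:.2f}, position-adjusted r: {within:.2f}")
```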

“It’s been done; you should take a closer look at Berri’s FAQ (or you could read Stumbling, although I’m sure it would be a chore). ”

From Berri’s FAQ:

“As noted in Stumbling on Wins, per-minute rebounding is very consistent across time. The correlation coefficient for rebounds per-minute – comparing this season to last season in the NBA — is over 0.9. When you adjust for position played, the coefficient is still 0.83.”

What about when players change teams?

Pelton showed that blocks and rebounds change the least when players change teams. I assume that’s true even if you break out by position; I don’t see a good reason to think it would change drastically. Berri’s model of coaching obviously includes changing teams, which doesn’t affect Adj WP48 (http://www.stumblingonwins.com/ChapterEightNotes.pdf). So if their rebounding does change a lot, they apparently systematically get better (or worse) at other things to cancel it out. Is there a point you’re building towards? Even when you account for position or changing teams, rebounding is one of (apparently the second) the most consistent measures in the NBA. Guy has plenty of explanations as to why this might happen and why it doesn’t matter, so you don’t need to be worried about it.

Alex, sometimes a question is just a question. Are you this defensive in the “real world”?

No; I don’t get bombarded with questions by people who go out of their way to try to disagree with me in the real world. I wrote a post on valid, widely accepted statistical analysis and then I spend all my time in the comments talking about WP and rebounding; that’s what I talk about regardless of what I write unless it’s about football. It isn’t something I enjoy or want to do but I know that silence is typically interpreted as an admission of wrongness online. I’ve never seen any indication that APBR people want to agree with anything that Berri has done, even if it doesn’t involve WP, so I have to assume that the comments are attacks. I’d be much happier assuming that questions are just questions and people want to engage in reasonable conversation, but that’s been sadly rare.

“I’ve never seen any indication that APBR people want to agree with anything that Berri has done, even if it doesn’t involve WP, so I have to assume that the comments are attacks.”

Attacks? It sounds to me like you take this stuff too personally. Do you feel like I’ve personally attacked you on your blog? If so, let me know. I don’t feel like I have, but if that’s your impression, I’d like to know.

In all honesty, I was just asking a question above about rebounding. Rebounding is part of any model, mine included, so it’s not really attacking WP simply to ask the question.

You haven’t that I can remember, but you wouldn’t be the first. But, point taken.
