My last post had a lot of info, but was short on a few details. This post is going to fill in at least one of the holes.

Guy asked for the specifics on predicting WP48 from other box score stats. He also asked that position be accounted for. The WP info is from the automated site, so some players have mixed positions. If I take them out to avoid relabeling those 400ish players (it’s the holidays after all, and I didn’t expect to be on the computer this much), I still have 449 players in the past two years. I ran a regression predicting WP48 from three pointers made, two pointers made, free throws made, missed field goals, missed free throws, offensive rebounds, defensive rebounds, turnovers, steals, blocks, assists, position, and position interactions. Those are all the variables that are listed on the website I linked to for calculating WP, and the position and position interaction terms should account for what Guy asked for. All variables except for position are set to per-36 minutes and normalized to Z scores.
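As a minimal sketch of the scaling step described above: the function names and the sample players below are made up for illustration, and the use of the sample (rather than population) standard deviation is an assumption, since the post doesn’t say which was used.

```python
from statistics import mean, stdev

def per_36(total, minutes):
    """Convert a season total into a per-36-minute rate."""
    return total * 36.0 / minutes

def z_scores(values):
    """Standardize values to mean 0 and SD 1 (sample SD is an assumption)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical players: (total defensive rebounds, minutes played)
players = [(400, 2400), (250, 2000), (120, 1500), (600, 2800)]
drb36 = [per_36(total, minutes) for total, minutes in players]
drb_z = z_scores(drb36)
```

Each box score stat would get the same treatment before going into the regression; position stays a category and is not scaled.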

Center is first alphabetically, so it got set to baseline. There are 11 variables and 4 position comparisons to center, so there are 44 interaction terms. Only ten are significant (and an eleventh is marginal, p=.088). One of these involves rebounding; small forwards get reduced WP48 credit for offensive rebounds compared to centers. The other negative values (suggesting that centers get more credit) are on blocks for power forwards and point guards, missed field goals for small forwards and shooting guards, and free throws for power and small forwards. On the other hand, power and small forwards get more credit for two pointers, small forwards actually get more WP48 points for turnovers, and shooting guards get marginally positive credit for missed free throws. In all it’s a confusing picture, and I think it’s tied to the fact that there’s a position adjustment already in place that is being taken apart in crude fashion. Also, all these interaction effects have significance values in the .0001 to .09 range, whereas all the main effects have values of 2 x 10^-13 or smaller. On the whole, I don’t think anyone could claim that WP48 gives tall players more credit for rebounds. But, for completeness, I’ll post the interaction info in a comment on this post.

I ran the model again leaving position in but taking all the interactions out. The R squared only dropped .0066 (from .979 to .9724), so the interactions weren’t buying us very much. The regression equation is WP48 = -.597+.423*3PM+.435*2PM+.191*FTM-.492*FGMiss-.108*FTMiss-.228*TO+.123*BLK+.353*AST+.477*ORB+.465*DRB +.337*PFor+1.01*PGuard+.846*SFor+1.11*SGuard. So if a player was completely average, he would have the lowest WP48 if he was a center (-.597), followed by power forward (-.26), small forward (.249), point guard (.413), and shooting guard (.513). Of course, these positions don’t produce equal amounts of the different stats; that’s why there’s a position adjustment (the different positions in my sample have roughly equal WP48, as it should be; centers are actually 3rd of the five positions numerically). Otherwise what we see is that missed field goals is the ‘primary’ driver of WP48, with a weight of -.492. This is followed by the two rebounding variables (.477 and .465) and made two and three pointers (.435 and .423). Keep in mind that all these values are on scaled variables, so a player would increase his WP48 by the values listed if he increased that statistic by a standard deviation.
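To make the equation easier to play with, here is a rough sketch that just encodes the weights listed above. The function and dictionary names are my own labels, not anything from the WP site, and the inputs are assumed to be Z scores (so an empty dictionary means a completely average player).

```python
# Main-effects weights copied from the regression equation above.
INTERCEPT = -0.597
WEIGHTS = {
    "3PM": 0.423, "2PM": 0.435, "FTM": 0.191, "FGMiss": -0.492,
    "FTMiss": -0.108, "TO": -0.228, "BLK": 0.123, "AST": 0.353,
    "ORB": 0.477, "DRB": 0.465,
}
# Position offsets relative to center, the baseline.
POSITION = {"C": 0.0, "PF": 0.337, "PG": 1.01, "SF": 0.846, "SG": 1.11}

def scaled_wp48(z_stats, position):
    """Predicted scaled WP48 from a dict of Z-scored stats and a position."""
    total = INTERCEPT + POSITION[position]
    for stat, z in z_stats.items():
        total += WEIGHTS[stat] * z
    return total

# A completely average player at each position (all Z scores are 0).
for pos in ["C", "PF", "SF", "PG", "SG"]:
    print(pos, round(scaled_wp48({}, pos), 3))
```

Passing something like `{"FGMiss": -1}` for a center shows the effect of missing one standard deviation fewer field goals while everything else stays average.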

Summary: I ran a regression predicting WP48 from all the variables that go into it, along with position adjustments. Position didn’t seem to interact with the variables in any meaningful way, and not particularly with rebounds (if this interaction is a sign of overweighting something, it appears that centers might get too much credit for blocks). Looking only at main effects, the biggest weight is placed on missed field goals (if a player missed 2.23 fewer field goals per 36 minutes, his WP48 would go up by .492). Rebounds do indeed follow next (getting just over two more of either kind of rebound would increase WP48 by about .47), and made shots are the last two variables with weights over .4 (WP48 would go up by about .43 if a player made 1.86 more two pointers or .89 more three pointers).

These weights only differ over a range of .06, so the effects aren’t that different. Let’s say a completely average player dropped his missed field goals as described. His scaled WP48 would move from 0 to .492. We convert it back to regular WP48 by multiplying it by the standard deviation (.2079) and adding the average (.05) to get .152. Doing the same thing for offensive rebounds, defensive rebounds, made two pointers, and made three pointers, you would get WP48s of .149, .147, .140, and .138. If each of these players played 2000 minutes, the range would cover 5.75 to 6.3 wins, so not big differences. So I think that shooting efficiency and rebounding are about equally weighted by WP48, although three of the top five variables are related to scoring.
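The conversion in the paragraph above can be sketched as follows, using the standard deviation (.2079) and average (.05) quoted there. The function names are just for illustration, and wins are assumed to be WP48 times minutes divided by 48.

```python
WP48_SD = 0.2079    # standard deviation of WP48 in the sample
WP48_MEAN = 0.05    # average WP48 in the sample

def raw_wp48(scaled):
    """Convert a scaled (Z score) WP48 back to regular WP48."""
    return scaled * WP48_SD + WP48_MEAN

def wins(wp48, minutes=2000):
    """Wins produced at a given WP48 over a number of minutes."""
    return wp48 * minutes / 48.0

# One-SD improvement in each of the five biggest variables.
weights = {"FGMiss": 0.492, "ORB": 0.477, "DRB": 0.465, "2PM": 0.435, "3PM": 0.423}
for stat, weight in weights.items():
    wp = raw_wp48(weight)
    print(stat, round(wp, 3), round(wins(wp), 2))
```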

Here are the interaction terms as promised. The format is variable, weight, p value. Remember that these are scaled variables compared to centers. For example, the first one means that if you increased a power forward’s made three pointers by one standard deviation, his normalized WP48 would go up by .0157 points less than if a center increased his made three pointers by the same amount. But this effect has a very high p value, suggesting that it isn’t significant (i.e., the effect is pretty unreliable). I put an asterisk after the variables with p less than .05.

3PM*PF, -.0157, .765

3PM*PG, -.034, .511

3PM*SF, .0449, .449

3PM*SG, .0368, .494

2PM*PF, .11, .001*

2PM*PG, .0298, .405

2PM*SF, .104, .032*

2PM*SG, .04, .311

FT*PF, -.114, .0003*

FT*PG, -.0493, .119

FT*SF, -.101, .001*

FT*SG, -.0411, .154

FGMiss*PF, .008, .832

FGMiss*PG, -.0259, .459

FGMiss*SF, -.105, .008*

FGMiss*SG, -.0892, .036*

FTMiss*PF, .0095, .719

FTMiss*PG, .0452, .136

FTMiss*SF, -.0019, .976

FTMiss*SG, .0647, .0885

TO*PF, .0327, .241

TO*PG, .0091, .742

TO*SF, .0892, .013*

TO*SG, -.001, .979

STL*PF, .0428, .189

STL*PG, .0159, .618

STL*SF, .0283, .419

STL*SG, -.0048, .912

BLK*PF, -.116, .0001*

BLK*PG, -.308, .0002*

BLK*SF, -.0886, .105

BLK*SG, -.049, .416

AST*PF, -.051, .39

AST*PG, -.0129, .791

AST*SF, .0705, .372

AST*SG, .0565, .382

ORB*PF, -.0539, .187

ORB*PG, .0186, .869

ORB*SF, -.164, .044*

ORB*SG, .109, .16

DRB*PF, .0507, .146

DRB*PG, -.02, .732

DRB*SF, .0879, .17

DRB*SG, -.0838, .132

OK, now you’re getting somewhere. But you need to account for the very high correlation between FG and FGMiss. By looking at them separately, it appears each component is having a large impact. Sure, a 1 SD increase in missed FGs would have a big effect, but it will always be accompanied by a large offsetting increase in FG made. So you must use a combined scoring efficiency metric (I thought your earlier choice of TS% was fine) if you want to compare the relative importance of efficiency and rebounding. If you simplify your model that way, I think you will find that rebounds do indeed play a dominant role.

BTW, much easier than using interaction variables is simply position-adjusting rebounds, which is roughly what WP does. For Reb48, the position averages are C 12.4, PF 11.4, SF 7.6, SG 5.6, PG 4.7. I suppose you could also position-adjust assists (C 2.2, PF 2.9, SF 3.6, SG 4.6, PG 8.6) and personal fouls (5.8, 4.9, 4.2, 3.7, 3.6) if you were highly motivated. But it doesn’t sound like position adjustments will change the story in a material way.
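A minimal sketch of that kind of position adjustment, using the Reb48 averages quoted above. Subtracting the position average is an assumption about what "position-adjusting" means here; it is not necessarily the exact calculation WP uses, and the function name is hypothetical.

```python
# Position averages for rebounds per 48 minutes, as quoted above.
REB48_POSITION_AVG = {"C": 12.4, "PF": 11.4, "SF": 7.6, "SG": 5.6, "PG": 4.7}

def position_adjusted_reb48(reb48, position):
    """Rebounds per 48 above (or below) the average for the player's position."""
    return reb48 - REB48_POSITION_AVG[position]

# The same 9.0 rebounds per 48 looks very different by position.
for pos in ["C", "PF", "SF", "SG", "PG"]:
    print(pos, round(position_adjusted_reb48(9.0, pos), 1))
```

The adjusted value can then go into a regression in place of raw rebounds, with no interaction terms needed.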

It sounds like you have most of the numbers handy. Do you have a post somewhere I can look at?

WOW has position averages here: http://www.wagesofwins.com/PosAvg.html. But given your high R^2 without using the interaction variables, perhaps position-adjusting the variables is less important than I had thought.

BTW, I see now that you have also standardized the dependent variable (WP48). I think that makes your results much harder to interpret, especially if you standardized it separately by position which would mean 1 SD does not always have the same value (there is much more variance among the big men). For example, you say that missing 2.23 fewer FG36 would increase WP48 by .492, when the actual increase would be .098. I assume the .492 is measured in SDs.

And as I mentioned, no player ever decreases his FGMiss in isolation like that. In real life a decline of 2.23 FGMiss36 always means 1.5 to 3.3 fewer FGmade (assuming a FG% range of 40% to 60%). So the real range of impacts on WP48 is far narrower. Indeed, the fact that you report a one SD increase in rebounding produces roughly the same WP48 gain as such an extreme gain in shooting efficiency (your example implies an average gain in FG% of about .115) just confirms that WP is substantially overweighting rebounds.
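The 1.5 to 3.3 range can be checked with a short sketch, assuming the player simply takes fewer shots at an unchanged FG%. The function name is made up for illustration.

```python
def implied_fewer_makes(fewer_misses, fg_pct):
    """Made field goals given up when misses fall by `fewer_misses`
    and FG% is held constant (fewer attempts overall)."""
    fewer_attempts = fewer_misses / (1.0 - fg_pct)
    return fewer_attempts * fg_pct

low = implied_fewer_makes(2.23, 0.40)   # 40% shooter
high = implied_fewer_makes(2.23, 0.60)  # 60% shooter
print(round(low, 2), round(high, 2))
```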

I didn’t scale wp48 (or any of the variables) by position, just by the sample mean and standard deviation.

I’ll tell you what though. Since you seem dissatisfied with all of my analyses, why don’t you email me a guest post. We can go back and forth a bit to make sure we’re both happy with the analysis, and then I’ll post it. I haven’t run anything that tells me that rebounding is far and away the most critical part of WP48, but I’m willing to be convinced if the numbers are there.

Alex, you stated in your post that you now think rebounding and shooting efficiency are equally weighted in WP48. How do you square that with the analysis that I and others have done, which shows that shooting efficiency is a much greater factor in terms of winning at the team level? I think Guy and I see the disconnect, but I’m not sure if you do (yet).

I think it might be due to WP48 being about possessions, whereas the team-level analyses (that I’ve seen) haven’t tried to equate possessions. In your analysis of the four factors, for example, how many possessions would you have to change to increase a team’s ORR by the 2.8% standard deviation? How many possessions would you have to change to increase a team’s eFG% by 2.3? In some numbers I threw around just to give myself an idea, I think you have to change more possessions to change your shooting percentage. If you grant a team more rebounds so that the possession changes are about the same, the effect on wins is much closer. It would take a lot more number-crunching to really look into it though; you’d want to try to find the trade-off for making a three versus getting part of an offensive rebound or the opponent getting part of a defensive rebound, then the same thing for twos and free throws. In the WP framework everything is equated to possessions at the player level and that’s where I’ve found that scoring and rebounding are more closely weighted.

Alex: I don’t think you need a guest post to understand my analysis; I described it pretty clearly (I think) in a comment to your previous post. I think the way to determine how much influence rebounds (or any other variable) has on WP48 is to do a regression with WP48 as the dependent variable, and include rebounds, shooting efficiency (e.g. TS%), and the other stats as independent variables. Then compare the standardized coefficients. In my regression, rebounding has about 1.5x as much impact on WP48 as scoring efficiency, and much more impact than any other variable. (In my version I position-adjusted rebounds, but I think your solution of just including position as a variable is superior.)

So I don’t think we are very far apart at all, except I disagree with your decision to include all the various scoring efficiency components separately (2P, 3P, FGmiss, FT, FTA). What we want to know is the combined impact of scoring efficiency, and how it compares to rebounding. Looking at a 1 SD change in FGmiss without accounting for the extra scoring that comes with it, or looking at a 1 SD change in FGmade without accounting for the extra misses, is not really meaningful; in each case you are measuring just one half of a change in usage. If you simply combine all the scoring variables into one efficiency metric, I think your results and mine will be quite similar. But if not, then perhaps I’ve made an error or there’s a problem with Arturo’s data set.

As for comparing these results to Evan’s team level analysis, I don’t see the problem. All of the four factors are essentially per-possession metrics, so possessions are controlled for. And like you, he has used regression to isolate the impact of each factor, controlling for the correlation of factors that, as you’ve written, can make simple correlations misleading. So it seems to me like the two methods are parallel, and it’s fair to compare the two sets of results.

I meant more, you know what analysis you want to do and you have the numbers, so why don’t you just write it up?

The problem you mention also applies in Evan’s data. The four factors are not all per-possession; turnover rate is, but free throw rate and eFG% are per shot attempt and rebound rate is per missed shot. In any event, changing any one of them changes the others. As you said, changing field goals missed has to come at the cost of other things; field goals made, rebounds, potentially assists if you want to have them in your model, etc. That’s true even if you use eFG or true shooting percentage. If I increase that value, it means I changed a make into a miss or vice versa, which will generate rebound opportunities or take them away. These trade-offs will occur regardless of what measures you’d like to use for shooting and rebounding.

This issue is related to my comment to Evan. I didn’t use the four factors, but if I use true shooting % in place of all scoring measures and keep the other box score stats (off/def rebounds, turnovers, assists, etc), I do indeed find that shooting percentage has the highest weight. But it has a standard deviation of nearly 2%. You have to change a lot of shot results to move your shooting percentage that much. I looked at the Bucks from a couple years ago because they had about the league-average TS%, and they would have to change FGA+.44*FTA by around 250 to move a standard deviation. On the other hand, the SD for either rebound measure is about 100. So a standard deviation of TS% is indeed more important, but you have to make a lot more changes to get there. If you try to equalize the change in plays by moving rebounding by about 200, now that’s 2 SD and rebounding is much more similar to TS%. As we’ve been saying, it might be simplistic to think about suddenly gaining 200 rebounds or changing 250 shots without looking at all the changes that would have to accompany them, but at least by moving to the possession level we can think about accounting for all the changes necessary.

Now you’ve lost me. The four factors are basically independent from one another: a change in TS% will change Orebound opportunities, for example, but oreb% is a per-opportunity stat. So I don’t see the problem you’re describing. And your last paragraph simply confirms that differences in scoring efficiency play a much larger role than rebounding in determining team wins, because teams actually differ in their efficiency much more than they differ in rebounding ability.

But in any case, let’s get back to individual players and WP48, which is what you and I were discussing (and I think everyone, including Berri, agrees that scoring efficiency is far more important than rebounding when it comes to explaining team wins). Can you tell us what your regression model shows when you replace 2PM, 3PM, FGMiss, FT, and FTMiss with TS%? Are rebounds still as important as shooting efficiency in determining WP48, or not? I think that’s the question on the table.

And as for the original, underlying issue — how large are diminishing returns on rebounds? — I still think you can settle that issue quite easily by running the simulation we discussed. How different is actual team variance from what a simulation produces if you assume players have no effect on their teammates’ rebounding totals? If you do that exercise, I think you will immediately be convinced that there are enormous diminishing returns (and no, I haven’t already done this myself, so you may prove me wrong).
