This post is a little belated for a variety of reasons. It’s already the third, I’ve already put up three posts this year, and Thanksgiving is the usual holiday for giving thanks. But New Year’s is also when everyone tends to look back and look forward, and that’s what this post is about.

Before everyone stops reading, I wanted to say thanks. I mostly want to thank Dave Berri of Wages of Wins. A few years ago I was becoming more and more interested in applying good mathematical analysis to sports, and somehow (I don’t remember how) I heard about his book. It was pretty amazing. After reading it I ordered one of John Hollinger’s prospectus books to see exactly how he went about creating PER. It turns out he just made up most of the numbers. I don’t mean that he didn’t think about them or that he picked them out of a hat, but there’s no objective basis for them. Thus a sports skeptic was born. Soon after that I came up with some football models, and this past year I finally decided to start a blog to get them out there and on the record. I put up my first post on 8/15, and according to the WordPress email I got, I’ve put up 98 posts since. That’s a post every day and a half or so, which I think is pretty good.

I also want to thank the people who take the time to read the blog. Obviously anyone with a public blog wants their words to be read, but I didn’t particularly expect anyone to read this one. To that end I’d like to thank all the readers, and more so people like Dave, Arturo, and Dre for linking to me and sending readers my way. Most of my traffic has come on posts about basketball, which is fine with me. My basketball posts have mostly been about my second goal for the blog, which is expanding knowledge on stats and analysis. Dave, Arturo, and Dre are much better equipped for providing NBA analysis than I am, but I’m happy to put in my two cents when I can and even happier that some people value the work. These three are obviously not the only people working on the NBA (I’d recommend pretty much any of the sites linked on their blogs) or who have influenced my thinking; I should also thank Brian Burke at Advanced NFL Stats for his great football (and stats in general) posts. And of course I want to thank everyone who gets things wrong from time to time or who doesn’t completely agree with what I think; without alternative viewpoints there would be nothing to be skeptical about, after all. I tend to think I’m right (who doesn’t?), but finding numbers to answer questions is what this is all about. They tell me I’m wrong sometimes too.

After looking back, it’s time to look forward. I’m hoping to blog a tad bit less if possible, so that I can actually spend some time writing my thesis (although this thing is fairly addictive). With football season wrapping up in about a month, that shouldn’t be too hard, since football takes up most of my time. But with the end of the season will come my major project for the rest of the spring and summer, which is a big overhaul of the models. Mario didn’t do nearly as well as I thought it would, and Luigi did reasonably but also had kind of an off year. That just happens sometimes (my old models, for whatever reason, just could never figure out 2007), but I’m going to take a deep look into what factors are most predictive of NFL game outcomes. I may not publish all the specifics (as much as I believe in academic sharing and transparency, I believe even more in profiting from what might be profitable), but hopefully the outcomes will still be informative.

My second goal for the year is to have more posts on statistics at a more introductory/informative level. They will always be based in sports, I promise. It’s hard enough for people to believe that looking at numbers can tell them more about a game than watching it does, especially when they’ve been watching or playing it their whole lives. Even if you think the numbers can tell you something, it’s hard to know which numbers to believe. My hope is that with more background, people can make better judgments about what the numbers mean and how to decide what numbers they want to believe. To that end, if anyone has gift cards they want to use after the holidays, I would recommend Wages of Wins and Stumbling on Wins, Mathletics, The Drunkard’s Walk: How Randomness Rules Our Lives, and Freakonomics and Superfreakonomics. Although I might not agree with everything in all these books, I think they’re accessible reads that get you to think about what’s happening around you in a more systematic way. There are plenty of great books not on that list too, of course (including a couple I got for Christmas). I think everyone benefits from having their beliefs questioned and their minds stretched now and again.

On top of the football and stats, I’m sure I’ll come up with other things that raise my ire (although hopefully this deals with some of them). At this point I’m limited by what data I have handy and how much time I have to download/sort through it, but farther in the future I’d like to expand more into basketball and start getting into hockey (it looks like the Red Wings are on course for another good year). I hope you all had a good 2010, have a great 2011, and thanks again for coming by.

Hey, Alex, happy new year. Great to hear you will be writing more about statistics. Did you ever get a chance to run that regression to determine how much of the variance in Wins Produced is explained by rebounds vs. shooting efficiency? I thought we were getting pretty close to a resolution there before you took your break.

And I still think the simulation we discussed could really help settle the whole debate over the extent of diminishing returns on rebounds.

Hey Guy – I back-burnered the simulation because I realized I wasn’t 100% sure we were thinking of the same thing. I thought you meant looking at something like the variance rule, where var(x+y) = var(x)+var(y)+2cov(x,y). I think the diminishing returns claim would be that player rebounding is negatively correlated; if one player gets a lot of rebounds, other players on his team get fewer rebounds. So the variance of combined center and point guard rebounding, for example, should be smaller than the variance of center rebounding and the variance of point guard rebounding added together (in actual data). In simulated independent data, those numbers should be the same. Is that what you were thinking of? I’m also not sure what level to do it at. I have/can get position rebounding averages and variances per 48 minutes and simulate season-level data, and thus add up to season-level team data. I could try to simulate game-level data, but that would be much more intensive. So what I would end up with is a comparison of the variance in total team rebounding across teams to the variance in players across teams. Beyond those questions, I’m not sure if I need to run a simulation. If the variance issue is what you were thinking of, can’t you just take existing data and see what the covariance is between positions or players?
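For concreteness, the independence check described above might be sketched like this (all rebounding numbers are invented for illustration, not real data):

```python
import numpy as np

rng = np.random.default_rng(0)
n_teams = 1000

# Hypothetical per-48 rebounding distributions for centers and point guards
center = rng.normal(10.0, 2.0, n_teams)
pg = rng.normal(4.0, 1.5, n_teams)

# Independent simulation: var(x+y) should equal var(x) + var(y),
# since the covariance term is zero in expectation
combined = center + pg
print(np.var(combined), np.var(center) + np.var(pg))  # roughly equal

# Diminishing returns would show up as negative correlation:
# here PG rebounds are pushed down when the center rebounds a lot
pg_dr = pg - 0.4 * (center - center.mean())
combined_dr = center + pg_dr
print(np.var(combined_dr) < np.var(center) + np.var(pg_dr))  # True
```

Running the same comparison on actual positional rebounding data instead of the simulated arrays would be the direct test: a combined variance noticeably below the sum of the parts indicates the negative covariance that diminishing returns implies.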

I’m also not sure what we would gain from the regression you suggest. Berri posted something very similar here (look for the elasticity section), and adjusted field goal percentage (or points per shot) is the largest factor in WP48. Beyond either of those, I’m not sure that they address the point you made about the regression I ran with shooting factors broken out independently (using two pointers made, missed, three pointers made, missed, etc, instead of something that combines them all). You said that leaving them separate isn’t meaningful because changing two pointers made, for example, doesn’t change the shots missed, and thus half the usage is gone. But that’s also true if I use true shooting percentage or some other measure of efficiency; if I change that, I would also have to change the rebounds that would have occurred (or not occurred) with the new accuracy. In either case they are separate variables in the model (field goals made and missed, efficiency and rebounding), so I don’t think your objection would go away. That was also why I made the comment about the size of the standard deviations; changing true shooting percentage by 1 standard deviation involves altering more possessions than changing rebounds by 1 standard deviation, so I’m not sure that scaling them provides a fair comparison if we’re interested in possession-level analysis.

If you haven’t had a chance to look at it, I think the FAQ is pretty useful in general. I hadn’t remembered seeing an analysis where rebounds were halved and divvied back out to players (which is one of the suggestions that people have made), but it’s in the FAQ. Apparently it doesn’t make a big difference.

Alex, good luck with the thesis. Did you make a New Year’s resolution to be a little more skeptical?

Speaking of that, have you wondered why Minnesota is only slightly above average in DREB% even though they have a guy who is leading the world in defensive rebounds (by number and %)? Yet they are second in OREB%. It doesn’t seem odd to me, but it would if I didn’t think there were significant diminishing returns for defensive rebounds, and much less so for offensive rebounds.

I also noticed that Michael Beasley’s DREB% has dropped by 4 points since his move from Miami, and Milicic has dropped by about 5 points since he was in Memphis (he only played 71 minutes in NYK). Of course, it could be a coincidence that both of these players who moved happen to get significantly worse in rebounding. I guess. The odd thing is that WP claims that rebounds are so consistent. I don’t know what to make of it. Do you?

Just to start, I’ll again point out that no one is saying there aren’t diminishing returns, the question is how much. I think most of the suggestions so far (worth .3, worth .5, worth .5 and adding the other half back in, etc) are both too low (just my opinion) and have no particular reasoning behind them other than that they seem to look good or maybe someone else uses them. If someone comes up with an empirical analysis that says it should be .5, or .78, or whatever, I’ll be much happier. But, to continue the conversation…

Berri’s numbers say that rebounds per minute for centers have a correlation of .83, which is high but obviously not perfect. Maybe all the decline is due to diminishing returns, I don’t know. What I really can’t figure out is why the correlation for free throws is so low. Do you know why players are so inconsistent at free throw percentage?

I’ve also noticed that Marcin Gortat’s rebounding (both offensive and defensive, but offensive more so) has plummeted since he was traded to the Suns. I thought the Suns were virtually afraid of rebounding. Shouldn’t his numbers be going up with decreased competition from his teammates? And have you noticed that both LeBron and Wade’s assist percentage have dropped this year? Where’s my diminishing returns for assists?

In short, no, I haven’t been wondering at all. I think there are diminishing returns for lots of stats, not just rebounding. I think the more interesting question is whether someone can quantify them in a reasonable way (which I might try to do if I had the data, but I don’t). In the meantime, I find it unproductive to banter back and forth solely about rebounding and WP48. To my knowledge, no model reduces the value of shots, assists, rebounds, etc. on the grounds that any player can accumulate them and that doing so cuts into a teammate doing the same. So hopefully we can agree that all the models are wrong. In my mind, WP is the least wrong, but that’s obviously an opinion that not everyone shares.

” If someone comes up with an empirical analysis that says it should be .5, or .78, or whatever, I’ll be much happier. But, to continue the conversation…”

I did come up with a rationale for weighting rebounds. A defensive rebound is worth as much as the league average offensive rebounding rate (ORR), and an offensive rebound is worth as much as the defensive rebounding rate (1-ORR). This is a natural outcome of having to account for shot defense and the fact that when a player misses a shot, there is a chance (ORR) that the offense will recover the possession. One consequence of the model (ezPM) is that I have Love ranked 22nd overall, instead of 2nd (as WP does).

http://thecity2.com/2010/12/12/ezpm-yet-another-model-for-player-evaluation/

That’s a rationale, but what does it connect to? Is there evidence that those weights lead to better player evaluation than WP48 or PER? I noticed in your post’s comments that it seems like you’d like to try to match up with adjusted +/- ratings. What makes you think that should be the goal?

If you are going to weight them that way, shouldn’t you adjust them for each team (or line-ups, if you’re using your PBP data)? Minnesota is in the bottom half of the league in shooting (field goal % or effective field goal %) and shooting efficiency by opponents. Possessions are at a premium for them because they don’t make enough shots and their opponents make too many. Shouldn’t Love get more credit for his rebounding than, say, Garnett?

Related, why did you decide to have missed shots worth -.7 at the player level? I know that the possession could continue due to an offensive rebound (providing another opportunity to adjust for the team or line-up ability level instead of using league average), but I would assume that rebounding your own miss is fairly rare. Shouldn’t a player be punished for missing a shot and the other player credited for continuing the possession? Taking shots is not a useful skill unless they happen to go in; I’m not sure why misses shouldn’t be considered a lost possession for the player who took the shot.

The weights are decided, as I said, by league average ORR, not randomly. Everything pretty much follows from that. And for very good reason. WP overpenalizes misses. See here:

http://thecity2.com/2010/12/07/debate-which-shooting-performance-is-better/

I didn’t say randomly as if you were throwing darts at a dartboard; your rationale is based on rebounding rates. But why is that the right way to do it? You argue that the weight used in WP is ‘wrong’; why is yours ‘right’?

No, players should not be penalized if there is a definite chance that the ball will be rebounded. In a league where ORR=0, yes, you’re right. But in a league where ORR~26%, you overpenalize for missing, if you subtract the whole point.

Sorry, for the multiple replies, but something else is important to say.

Let’s say that I am wrong, and your argument that a shooter should be debited 1 pt for missing is correct. (I don’t agree, but it actually helps me make another point.) If that were true, then do you think the defender or the rebounder should get credited for that point? See, in my accounting, the defender (or team) should get credit for creating a missed shot. It’s very obvious when we look at blocked shots. If Tim Duncan blocks Derrick Rose, you would debit Rose 1 marginal point. But shouldn’t Duncan get +1 point?

Now, I subtract ~0.7 pts (roughly) for a missed shot, and give that to the defense as a credit. I split it evenly (for now), unless someone blocked the shot. In that case, we credit the shot blocker. So, what happens to the leftover 0.3 pts credit? That goes to the rebounder!

On offense, the guy who gets the offensive rebound gets +0.7 pts, thus recovering the -0.7 pts his teammate was debited by missing, and resetting the possession to zero.

The logic is very straightforward, and has not failed me yet.
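The accounting above can be sketched as a tiny ledger (the player names and the even defensive split are illustrative; a blocked shot would redirect the defense’s share to the blocker):

```python
# Hypothetical one-possession ledger under the 0.7/0.3 scheme described above
ledger = {}

def credit(player, pts):
    ledger[player] = ledger.get(player, 0.0) + pts

# A missed shot: the shooter is debited 0.7 and the defense credited 0.7
credit("Rose", -0.7)
credit("Spurs defense", +0.7)  # split among defenders, or all to a shot blocker

# A teammate grabs the offensive rebound (+0.7), recovering the debit
credit("Noah", +0.7)

print(ledger["Rose"] + ledger["Noah"])  # 0.0 -- the offense is reset to zero
```

Had the defense rebounded instead, the defensive rebounder would get the leftover +0.3, so the full point of the miss (0.7 defense + 0.3 rebounder) lands on the defending team.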

I think it works just as well using +1 and -1. Let’s say Rose takes a shot, misses, and Noah rebounds it. Rose gets -1 and Noah +1; the Spurs (their opponent) get nothing and we’re back to square one, like in your example. If Rose misses and Duncan gets the rebound, it’s still -1 and +1, which is appropriate because the Bulls lost a possession and the Spurs gained one.

If Rose gets blocked by Duncan, Duncan should certainly get some credit, but how much depends on how often blocks are recovered by either side. ‘Recovered blocks’ don’t turn up in the box score (I don’t think? Are they counted as rebounds?), but for the sake of argument let’s say recovering a block is a 50/50 proposition. Rose gets blocked and loses a point. Duncan gets a block and gets .5. Noah recovers the block and gets .5. In sum, the Bulls lost half a point and the Spurs gained half a point, for a difference of 1 (which is less than the difference of 2 above, where Duncan simply rebounded the miss). Our two methods would disagree on who accounted for that 1-point transaction: in mine Rose gets -1, Noah .5, and Duncan .5. In yours, Rose gets -.7, Duncan .7, and possession of the block is in limbo. If block recoveries are counted as rebounds, then my ‘final’ scores would be Rose -1, Noah +1, Duncan +(whatever value we give blocks). Your scores would be Rose -.7, Duncan +.7, Noah +.7. You also give credit to the Spurs for creating a missed shot, and WP would give credit to the Spurs in the defensive team adjustment. Does that sound right?

Just thought I would show how the rankings are affected by my logic:

Player…WPRank…ezPM100Rank…Difference (sorted in descending order)
Deron Williams…24…7…17
Manu Ginobili…19…6…13
Dwyane Wade…10…2…8
Steve Nash…11…4…7
Pau Gasol…16…9…7
LeBron James…7…3…4
Nene Hilario…23…19…4
Tyson Chandler…17…15…2
Matt Barnes…26…24…2
Dwight Howard…6…5…1
Chris Paul…1…1…0
Rajon Rondo…9…10…-1
Paul Pierce…25…26…-1
Kevin Garnett…5…8…-3
Al Horford…8…12…-4
Ronnie Brewer…29…35…-6
Joakim Noah…13…20…-7
Carlos Boozer…28…36…-8
Jason Kidd…20…29…-9
Tim Duncan…12…22…-10
Kevin Love…2…18…-16
Lamar Odom…22…41…-19
Andrew Bogut…30…52…-22
JaVale McGee…21…48…-27
Blake Griffin…18…50…-32
Zach Randolph…15…49…-34
Marcus Camby…3…40…-37
Landry Fields…14…60…-46
Kris Humphries…4…58…-54
Josh Smith…27…89…-62

Makes a difference. Now, some may not like the difference it makes, but to say there’s no difference, or that the difference is not significant? Well, that’s hardly justifiable.

I should have added that these are the top 30 players according to WP.

I don’t think anyone would argue that your model isn’t fairly different from WP. But as in my other comment, what makes you think that it’s better? If all I wanted was different, I could use adjusted +/-, PER, Win Score, NBA Efficiency. Out of curiosity, how does your model ranking compare with those other rankings?

see replies above

I haven’t compared to all these other models, but I am mainly interested in the +/- comparisons. That’s why I chose this particular framework.

The idea is to predict actual +/-.

Alex, in response to your reply above (WordPress has a reply # limit), PBP data tells you who rebounded the ball after the blocked shot, so there’s no need for approximation. If Duncan blocked the ball and got the rebound, in my system he would get full credit. (I also do things like split the credit for team rebounds, which you can get from PBP data.)

“Berri’s numbers say that rebounds per minute for centers have a correlation of .83, which is high but obviously not perfect. Maybe all the decline is due to diminishing returns, I don’t know. What I really can’t figure out is why the correlation for free throws is so low. Do you know why players are so inconsistent at free throw percentage?”

Alex, this is a quite puzzling statement. The .83 y-t-y correlation by itself tells you very little, if anything, about the extent of diminishing returns. (Dennis Rodman was fairly consistent in his rebounding rate, but also consistent in reducing the number of rebounds recorded by teammates.) Conversely, the “missing” .17 is not necessarily evidence of diminishing returns (correlations will be less than one because of random variation, plus changes in player talent unrelated to teammates).

Year-to-year correlation alone is not a very good way to compare player “consistency” in different areas. The correlation depends on three things: 1) the variance in the statistic (how much do players differ), 2) sample size in each year, and 3) how much players’ ability to generate the statistic actually changes. Only the 3rd item is really about a player’s “consistency.” Rebounds have a very high correlation mainly because 1) there is huge variance among players (some players get 3x the rebounds of other players), and 2) sample size is quite large — about 83 rebound opportunities per team per game. Players could be fairly “inconsistent” and the r will still be relatively high. Free throw shooters, in contrast, have only 5 FTAs per game (and I’d guess the variance is lower too, but I’m not sure). Unless you take account of both sample size and variance, you can’t compare correlations and draw any meaningful conclusions about player consistency at all.
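The interaction of talent variance and sample size can be seen in a quick simulation (all numbers invented): hold every player’s true talent perfectly fixed across two seasons, and the year-to-year correlation still differs sharply between a high-variance, high-opportunity stat and a low-opportunity one.

```python
import numpy as np

rng = np.random.default_rng(1)
n_players = 300

# True talent is identical in both seasons -- perfect "consistency"
reb_talent = rng.uniform(0.05, 0.20, n_players)  # rebound rate, wide spread
ft_talent = rng.uniform(0.70, 0.85, n_players)   # free throw percentage

def season(talent, n_opps):
    # One season of binomial sampling around each player's fixed true rate
    return rng.binomial(n_opps, talent) / n_opps

# Many rebound opportunities per season vs. relatively few free throw attempts
r_reb = np.corrcoef(season(reb_talent, 5000), season(reb_talent, 5000))[0, 1]
r_ft = np.corrcoef(season(ft_talent, 150), season(ft_talent, 150))[0, 1]
print(r_reb, r_ft)  # rebounds correlate far better despite equal "consistency"
```

Nothing about the players changed between seasons in either stat; the gap in r comes entirely from the ratio of true talent variance to sampling error, which is the "natural correlation rate" idea.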

This statement too is quite curious: “I think there are diminishing returns for lots of stats, not just rebounding. I think the more interesting question is if someone can quantify it in a reasonable way… To my knowledge, no model reduces the value of shots, assists, rebounds, etc. because anyone can do it and doing so butts in on a teammate doing it.”

Of course many efforts have been made to quantify diminishing returns. Most obviously, EVERY metric includes some penalty for FGAs. So the fact that players take scoring opportunities from teammates by taking shots is addressed by everyone (though in different ways, of course). Hollinger, and now EvanZ, weight rebounds at much less than one, in part to address diminishing returns. Indeed, Dr. Berri mentions a new version of WP in his FAQ that assumes a 50% diminishing return rate for drebs, and implies that .5 coefficient is research-based. Statistical plus minus, by using statistics to predict adjusted plus-minus, clearly accounts for diminishing returns. Win Shares also seeks to account for diminishing returns. Statistical analysts (except for Dr. Berri) have long understood diminishing returns, and attempted to measure the (presumably different) rates for different statistics.

Your statement suggests to me that you may not understand the concept of diminishing returns, which may explain why you are so skeptical about it. At a minimum, you clearly are not familiar with a vast body of work on the subject.

Players change teams, and the correlation I mentioned is for per-minute rebounding. If there are diminishing returns, the correlation would be less than 1 simply because players who move to worse-rebounding teams will see their numbers increase unexpectedly, while players who move to better-rebounding teams will see theirs decrease. I wasn’t saying that diminishing returns are why the value is .83; as you say, there’s random variation as well as injuries and other factors. But DR could be part of it.

I disagree with your description of correlations. The definition of the term involves dividing by the variance of the measures involved, so were it true that players vary more in rebounding than something else, it should not inflate the correlation. At any rate, your argument suggests that other variables that have large variance and many opportunities to occur should also have large year-to-year correlations, but each of the following has a smaller correlation than rebounding: points scored, FGA, fouls, steals, field goal percentage. So I would assume you think that all of these are still less consistent than rebounding.

Evan’s work specifically says that missed shots are valued at .7 because there is an offensive rebound 30% of the time, not because of diminishing returns. According to Hollinger’s Prospectus (at least as of 2002), he used the same logic; I believe that’s where Evan got it from. The penalty for missed shots comes from the fact that it costs you possession of the ball unless someone on that player’s team rebounds it. I’m not as familiar with statistical +/-, so I won’t speak to that. But, nowhere in Hollinger’s description of PER, Evan’s description of his metric, or in the calculation of WP is diminishing returns used as the reason to weight a variable a certain way.

As Berri’s FAQ says, he has in fact researched diminishing returns for various stats (in addition to WP48), and he finds them. I’ll admit to having forgotten that part of Stumbling on Wins when I wrote my comment. But Berri has found diminishing returns for points, FGA, FTA, defensive rebounds, assists, and blocked shots. His conclusion is that the effect is fairly small and doesn’t change many of his conclusions, as you’ll find in the FAQ; reweighting rebounds as others have suggested doesn’t lead to big differences in player rankings.

My point, as I tried to clarify for Evan, is not that diminishing returns don’t exist. I have in fact never said that. I think the question is how best to quantify them, and then to incorporate them in a model and demonstrate why that model is better than the others. Reducing the weight for rebounds may be appropriate, but I have yet to see really solid reasoning that it should be .7 for offensive rebounds and .3 for defensive rather than .75 and .25 or .6 and .4 or 1 and 1. Even if .7/.3 is correct, I don’t know why these models haven’t docked the credit for scoring or assists or anything else. My opinion is that people are unhappy with WP and rebounding, and so they are hounding that one fact. We can at least be honest and say that all the models get a variety of things wrong.

Finally, I don’t come to your house and piss in your pool. Please be respectful, or save yourself the time of reading my uninformed opinions and stop coming to the site.

Alex:

I just saw this response. I’m happy not to comment on your blog if that’s your preference.

Just to quickly respond to your specific comments here:

It may help you to think about y-t-y correlations this way: each stat will have a certain correlation if every player’s true consistency were 100%. You could calculate that for each stat: it’s just a function of the variance in players’ ability interacting with the sample size for that stat. That is, the ratio of true variance to sampling error is what’s important. This “natural correlation rate” will be very different among stats. To see how consistent players are, you want to compare your observed correlation to the correlation you’d expect under perfect consistency.

I don’t think it’s true that all statistics have diminishing returns. Shooting efficiency, for example, seems to have “increasing returns” — efficient shooters make their teammates better as well. Good metrics will neither ignore DR nor assume they exist everywhere, but actually try to measure separately the impact of each stat on team wins. People “harp” on rebounds because the DR rate happens to be extremely high there, and since it is by far the largest determinant of players’ WP48, this has huge and unfortunate consequences for WP’s accuracy.

I agree with most of your last point. Do you think there’s a useful model that does what you say a good metric will do?

I think statistical plus/minus is a powerful approach. The idea is that although APM is extremely noisy at the player level, in the aggregate it should provide an unbiased estimate of how valuable boxscore stats actually are. If an individual player is +2 we really don’t know how good he is. But 100 +2 players presumably really are +2 ON AVERAGE. A guy who goes by “DSMok1” over at APBRmetrics has a version that is currently doing very well at predicting 2010-11 wins. Neil Paine at B-Ref blog has a version too. (I haven’t followed either closely enough to know the differences.) Pelton’s WAR also seems like a strong approach to me. I know you are a fan of WP, but frankly I think the evidence shows you would be better off relying on virtually any other metric.

FYI: If you don’t read Paine’s blog, I would highly recommend it. A lot of really smart analysis there.

Statistical +/- is similar to what Evan is working on with his model, right?

Does either model (stat +/- or Paine’s) do well at predicting player performance? While we disagree on the WP model, I think we can agree that player performance is a good application of any model. Have they also predicted team wins well in the past?

I would let Evan speak to how his model differs from SPM. I believe he agrees that a metric’s success in predicting APM is a good way to validate a model, but again, I wouldn’t presume to speak for him.

If you’re asking about the y-t-y correlation for each metric, I don’t know that. But again, y-t-y consistency is a plus only to the extent players are actually consistent and only to the extent you’re measuring something that truly produces wins. If I allocate team point differential by height, my r will be very high! WP has a high y-t-y correlation because it is based largely on rebounds, but since we know that rebounds correlate much less with wins than with player WP, this is not really a virtue.

We sort of got away from our original topic, which was how much player WP is driven by rebounds. You said above “Berri posted something very similar [in his FAQ] (look for the elasticity section), and adjusted field goal percentage (or points per shot) is the largest factor in WP48. ” Berri is completely wrong on this, of course: elasticities do not tell us the proportion of variance in one variable that is explained/determined by another. It surprises me every time he repeats this canard, as it puts his statistical illiteracy on display. But since you’re a betting man, I think we can settle this particular dispute with a bet. We’ll pick 100 player seasons at random, looking at position, Reb48, AFG%, and WP48. You try to predict WP48 using only position and AFG%, and I’ll do the same using only position and Reb48 (and we both show our decision rule to prove our predictions relied only on the two variables we were given to work with). Whoever does a better job of predicting WP48 (by any criteria you want) wins the bet. I’ll put up anywhere from $1 to $1,000 (or, I’ll settle for loser does a post on your blog saying “I was wrong”). Are we on?

We’ve already had this discussion Guy. Changing 1 SD of true shooting percentage (which was what I was using) involved changing more possessions than changing 1 SD of rebounds. I’m not sure that it makes for a fair comparison. Also, as you and I have both said, it doesn’t make sense to change one variable and look at the effects when we know that will change other variables; if I raise one player’s (or team’s) FG%, rebounds have to disappear. I thought we both said that a more fine-grained analysis would be needed.

I agree that the true-world consistency may not be knowable, but surely we can agree that it’s relatively high, right? If the lowest y-t-y correlation at the player level is field goal % at something like .5, that’s pretty high as far as these things go. At the least, can we agree that it should be higher than the near-0 correlation produced by adjusted +/-?

Alex: The issue at hand is, and always has been, which plays a larger role in determining players’ WP48 — rebounds or shooting efficiency? That is, which explains more of the variance? You have made an affirmative claim in this debate: “adjusted field goal percentage (or points per shot) is the largest factor in WP48.” I say you (and Berri) are completely wrong. And I’ve offered to make a wager (for money or pride) that I’m right. Are you going to stand behind your position or not?

To the extent that AFG% and Reb48 are correlated, that will help us equally to make our predictions. If you are worried that Reb48 is correlated with some other component of WP48, thus giving me an unfair advantage, I’m happy to add assists, blocks, steals, PF, and/or turnovers to our respective models (that covers everything). The important point is that I will know rebounds, and you will know shooting efficiency. Ready to bet?

I’ll give it a shot. Here’s what I’m thinking: I’ll use a model of wp48 = position+TS% (true shooting). You use a model containing position and total rebounds per 48. You can post your rule in a reply to this comment and I’ll post my rule as a reply to that before we pick a sample. We can pick 100 players at random if you like, or we can use the data I have which is only the past two seasons (a bit under 900 player-seasons). The criterion for winning is average squared error; we’ll get the error for each player prediction, square it, sum up across all the players, and divide by 100 or 900 or whichever size you’d prefer. In any event, I’ll put it all up in a post, and if I lose I’ll give you a hearty internet handshake. You can do the same in the comments if you lose. Game?
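The scoring rule described above might look like this in code (the predictors here are random stand-ins, not the actual position, shooting, and rebounding variables from the contest):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Random stand-in predictors; in the actual contest these would be
# position dummies plus TS% for one model and Reb48 for the other
x_a = rng.normal(size=(n, 2))
x_b = rng.normal(size=(n, 2))
y = x_b @ np.array([0.10, 0.30]) + rng.normal(0, 0.05, n)  # toy "WP48"

def avg_squared_error(x, y):
    X = np.column_stack([np.ones(len(x)), x])     # add an intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # fit the "decision rule"
    return np.mean((y - X @ beta) ** 2)           # error per player, squared, averaged

print(avg_squared_error(x_a, y), avg_squared_error(x_b, y))
```

Whichever model posts the smaller average squared error over the agreed sample wins; in this toy setup the second set of predictors wins by construction, since the outcome was generated from it.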

I’m game. My only request would be to have access to the same data set (at least position, Reb48, and WP48) so I know the mean for your players, and to the extent our models overfit the data that will be true for both of us. Can you post as a google doc? (Or you can send to my email, which you have.)

OK. Alex has graciously shared with me his data set of players covering the last two seasons (2008-09 and 2009-10). We have agreed to limit our prediction contest to players with only one clear position and who played at least 1000 MP in a season. By my count there are 219 such player-seasons. I will use only position and Reb48, and he will use only position and points per shot ((PTS-FT)/FGA).

My model to predict player WP48 is: -0.0746 + 0.0381*Reb48 - 0.3187*C - 0.2521*PF - 0.0209*SG - 0.0825*SF. I get an R^2 of .481 with this model, so rebounds are determining about half of the variance in WP48. The correlation between WP48 and Reb48 (position-adjusted) is .67, for anyone interested. Game on. And good luck to Alex. Whoever “wins,” we should learn something from the exercise.
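For anyone wanting to replicate the shape of the fit, here is the computation on simulated data (positions, coefficients, and sample values are invented, not the real 219-player sample):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 219  # matches the sample size mentioned above

# Invented stand-in data: position dummies plus Reb48
positions = rng.integers(0, 5, n)      # 0=PG, 1=SG, 2=SF, 3=PF, 4=C
dummies = np.eye(5)[positions][:, 1:]  # drop PG as the reference category
reb48 = rng.normal(10, 3, n)
X = np.column_stack([np.ones(n), reb48, dummies])

# Invented "true" coefficients: intercept, Reb48, SG, SF, PF, C
beta_true = np.array([-0.07, 0.04, -0.02, -0.08, -0.25, -0.32])
wp48 = X @ beta_true + rng.normal(0, 0.08, n)

beta, *_ = np.linalg.lstsq(X, wp48, rcond=None)
resid = wp48 - X @ beta
r2 = 1 - resid.var() / wp48.var()  # share of WP48 variance explained
print(round(r2, 3))
```

The position dummies let each position carry its own baseline, so the R^2 reflects what rebounding explains after position is accounted for, which is the comparison the bet turns on.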

Just so I’m clear, your model is predicting 2010-11 data, right?

If I recall, the correlation for WP as a whole predicting one season using the previous season is 0.8 (R^2 = 0.64). So, you’re already really close just with rebounds, eh?

which unless I’m crazy means there’s only about 20% of the variance left to explain by shooting, turnovers, PF, etc.

No, we are just trying to “predict” the WP48 in the last two seasons. The question is, if I only know a player’s position and Reb48, and Alex knows only position and PPS, who can best estimate that season’s WP48?

I see.

(Damn, these WordPress reply limitations!)

To be fair, I said true shooting. But I’m also checking PPS to see if it makes a difference.