There was some discussion over at the APBR board about Matt Bonner (and, to a lesser extent, Nick Collison) recently, namely about what exactly they do that has them rated so highly in some metrics (primarily RAPM). That led me to a question the other day: Gregg Popovich is widely regarded as one of the best NBA coaches ever, right? Wouldn't you think that one of the best NBA coaches ever would get a top-5 NBA talent on the court for more than 20 minutes a game? And that led me to today's topic: what does get an NBA player on the court?

I went to my player data set and grabbed everyone who played more than 500 minutes in a season for a particular team. Your first guess (or perhaps hope) is that better players play more, and that we can tell which players are better by looking at one or more of the many advanced metrics that exist. So I'm going to regress/correlate minutes per game with the metrics I have handy. Since I mentioned Bonner and RAPM, I can start there. There's a positive correlation but an R squared of only .21; better players do tend to get more minutes, but it's kind of noisy. Going down the list of metrics (all in terms of per-possession or per-minute production), we see the same story: positive correlations but lowish R squared values. The weakest predictor is Wins Produced, with an R squared of .100, and the strongest is PER at .389.
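To make that procedure concrete, here's a minimal sketch of the single-metric version. The numbers are synthetic stand-ins (the sample size, means, and noise level are made up for illustration); the real version would load the player data set instead.

```python
import numpy as np

# Synthetic stand-ins for one metric (PER-like values) and minutes per
# game for ~300 players with 500+ minutes in a season.
rng = np.random.default_rng(0)
metric = rng.normal(15, 5, size=300)
mpg = 13 + 0.8 * metric + rng.normal(0, 6, size=300)

# For a simple linear fit, R squared is just the squared Pearson
# correlation between the metric and minutes per game.
r = np.corrcoef(metric, mpg)[0, 1]
r_squared = r ** 2
```

Running this once per metric gives the list of R squared values described above.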

Using any one metric probably gives us an incomplete picture of player ability, so we can use a few at a time to see if we get a better sense of what a coach might be looking for. If I use the blends from one of my previous projects, I get R squared values of .25 for the retrodiction blend (the 'best' combination of metrics for adding up to team wins) and .32 for the predictive blend ('best' at summing to team wins in the next season). Since PER did the best on its own, I started adding other metrics to it in a multiple regression. Putting in RAPM gets the R squared to .40; Win Shares takes it to .448; and ASPM to .51. Other metrics can be added in, but they don't add a whole lot of predictive ability even if their beta values are significant; all of these metrics are correlated with each other, so they just start eating into each other's value. An exception is good ol' NBA Efficiency, which bumps the R squared up to .544. So a combination of five player evaluation metrics can explain about 55% of the variation in minutes played in the NBA.
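A hedged sketch of the multiple-regression step, again with toy data standing in for PER, RAPM, and friends. The one property the fake data shares with the real metrics is the important one: the predictors are positively correlated with each other.

```python
import numpy as np

def r_squared(X, y):
    """Plain (unadjusted) R squared from an OLS fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

# Toy data: 'rapm' is correlated with 'per', as the real metrics are
# with each other, so the second predictor adds less than it would alone.
rng = np.random.default_rng(1)
per = rng.normal(size=500)
rapm = 0.6 * per + rng.normal(size=500)
mpg = 25 + 3 * per + 2 * rapm + rng.normal(scale=4, size=500)

r2_per = r_squared(per.reshape(-1, 1), mpg)
r2_both = r_squared(np.column_stack([per, rapm]), mpg)
```

Note that adding a predictor can never lower plain R squared, which is part of why extra correlated metrics keep giving small bumps while "eating into each other's value" in the coefficients.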

Now, we know that PER and NBA Efficiency are not great metrics. Primarily, they say little about defense and seem to overvalue inefficient scoring. So I was curious whether I could add any 'basic' stats to the regression and improve the fit. I tried some of bball-reference's advanced stats, like offensive and defensive rating, usage, etc. Steal percentage bumped things up to .597, which surprised me a little. Defensive rebound percentage gave a small bump, whereas offensive rebound percentage did nothing. But in general, nothing else moved the needle. I then tried all the typical stats as per-48-minute values. Fouls were a big contributor, which makes sense since you can't play if you're in foul trouble. Steals were again predictive. And that was about it. In total, PER, RAPM, Win Shares, ASPM, NBA Efficiency, steal percentage, fouls per 48, and defensive rebound percentage explain 65.8% of the variation in minutes played per game.

Not all of these are good things, of course. More fouls lead to fewer minutes. Interestingly, steals are also a negative: the more steals a player gets, the fewer minutes he plays in general. Of the advanced metrics, PER, RAPM, and ASPM have positive coefficients while Win Shares and NBA Efficiency have negative ones. Keep in mind that these metrics are all positively correlated with minutes and with each other, so the negative coefficients act more as contrasts. If you have a high PER, RAPM, and ASPM but a comparatively low Win Shares and NBA Efficiency, you would be predicted to get more minutes than someone who was high on all five of them.

To provide a couple of examples, let's look at some players from last year (I haven't put this year's stats in the data set yet). As a baseline, the regression predicts that a completely average player would get 25 minutes per game (of course, an 'average' player in this sample of 500+ minute seasons is actually a bit above league average). LeBron James played 38.8 minutes per game while the regression predicts he would play 44.7; LeBron is pretty good. The impetus for the post, Matt Bonner, played 21.7 minutes per game and was predicted to play 27.4. And for kicks we can look at Kobe Bryant. Last year he played 33.9 minutes per game and would be predicted to play 41.5. Plugging in Kobe's numbers from around the interwebs, he would be predicted to play 45 minutes a game this season. Despite what you've probably read from a number of sources, it isn't surprising that Kobe got more minutes this year given his performance.
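For readers who want to see the shape of the prediction step: the coefficients below are invented placeholders (the post doesn't publish the fitted values), but the mechanics are just an intercept plus a weighted sum of league-average-centered stats.

```python
# Hypothetical coefficients, NOT the fitted values from the regression;
# only the signs match the discussion above (fouls, steals, Win Shares,
# and NBA Efficiency negative, the rest positive).
coefs = {
    "per": 0.9, "rapm": 1.2, "aspm": 0.8,
    "win_shares": -0.5, "nba_eff": -0.3,
    "fouls_per48": -1.5, "stl_pct": -2.0, "drb_pct": 0.2,
}
INTERCEPT = 25.0  # minutes per game for a completely average player

def predicted_mpg(stats):
    """Stats are centered on the league average, so all-zero inputs
    return the 25 minutes-per-game baseline."""
    return INTERCEPT + sum(coefs[k] * v for k, v in stats.items())
```

Plugging a player's (centered) stat line into `predicted_mpg` gives the kind of predicted-minutes figures quoted for LeBron, Bonner, and Kobe above.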

So in general, to play in the NBA you want to be good at basketball. No surprise there. It doesn't hurt to grab some defensive rebounds. And you don't want to commit a lot of fouls, since then your coach yanks you. But you also don't want to get a lot of steals. I guess that could be indicative of gambling too much on defense, but that's a totally post-hoc story. And keep in mind that this says little about how much a player *should* play, only how much he *does* play. Kobe's numbers said he would play more, and he did. But it isn't a lock: compared to last year, Bonner's PER, Win Shares, and ASPM are pretty much the same, but his fouls and steals are down and his RAPM and rebounds are up. He would be predicted to play more, but his minutes actually went down. It looks like Matt Bonner could stand to be on the court a little more after all.

Unfortunately, the number of minutes a player gets on any given team relies too heavily on one factor completely outside the player's control (and his metrics): is there a better or comparable player on the team at his position?

And as for the Kobe/LeBron projected minutes per game, I'd like to see the year-by-year leaders in minutes per game. If the best player in the league (LeBron) kills a team for three quarters and the Heat are up big, he's more likely NOT to play as many minutes in that game. In a close game (which could be due to LeBron not performing as well), his coach is more likely to leave him in.

Yep, totally a valid point. It isn’t 100% foolproof – I’d argue Manu and Harden are examples of guys who purposefully come off the bench even though most people think they’re better than the guy who’s starting. But there are probably some guys who could start for other teams (or at least get more minutes) but don’t because their team has someone better ahead of them.

You can check minutes per game in a few places. ESPN and bball-reference should have it. The top five this year were Deng, Love, Durant, Kobe and Dwight. I thought about the fact that better teams could rest their players more, but I didn’t want to try putting in a bunch of quadratic terms or interactions. I could though, it wouldn’t be too tough. But I don’t think the ‘too much winning’ is too much of a concern – LeBron is 6th on that list for this year, and there are only two guys in the top 10 from non-playoff teams (Love and David Lee).

Many good players at one position are an issue, but so is the distribution of talent around the league: your regression probably suggests that the Spurs' players combined should play more than the 240 available minutes, while the Bobcats' roster on average probably doesn't deserve to play even half of a game…

“I haven’t put this year’s stats in the data set yet”

Are you going to do it?

I’d have to go and check, but that can’t be the case exclusively. Bad teams still have guys play for 30+ minutes, obviously, because someone has to. But I’m sure you’re right that starters for good teams would add up to pretty much a whole game.

I’ll get the new data in at some point. Maybe after the playoffs.

“I’ll get the new data in at some point. Maybe after the playoffs”

And? Come on, you’ve promised… 😉

Yes, I have… it’s a terrible time right now though. Don’t let me off the hook; maybe I can get to it in September.


Have you considered the issue of statistical multiplicity at all, when adding/removing all these variables? Also, just because a Beta (coefficient) is significant, doesn’t necessarily mean much. A Type 3 SS might be a better evaluation tool.

This is a good explanation of why it is not as simple as running a million correlations, pulling out the 50,000 that are significant, and throwing them all into a stepwise regression equation as independent variables:

http://www.thejuliagroup.com/blog/?p=1289

You mean multiple tests? All the variables I mentioned are very significant; I think they would survive whatever adjustment you would want to make. Also I decided on what to keep by looking at the R squared for the overall regression more than the p value for any given beta value. That being said, I didn’t run a formal ANOVA at any point so much as eyeball it, so you can take the results with a grain of salt if you want.
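For what it's worth, the simplest adjustment of that kind is Bonferroni: with m tests at overall level alpha, compare each p-value to alpha/m. A sketch with made-up p-values (not actual output from the regression, which I eyeballed rather than recorded):

```python
# Hypothetical p-values; the point is only to show the adjustment.
p_values = {"PER": 1e-12, "RAPM": 3e-9, "fouls_per48": 5e-8, "stl_pct": 2e-6}
alpha = 0.05
threshold = alpha / len(p_values)  # Bonferroni-adjusted cutoff
survivors = [name for name, p in p_values.items() if p < threshold]
```

Very small p-values like these clear the adjusted cutoff easily, which is the sense in which "they would survive whatever adjustment you would want to make."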

If you say so. Did you use an Adjusted R-squared when there were multiple predictors in your model, or just a plain old R-squared?

I looked at ‘plain’ R squared because I was more interested in prediction than parsimony, although I wanted some simplicity.
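For reference, the adjustment is small here anyway: with a sample of player-seasons in the thousands (the exact n in my data set isn't stated above, so the 3000 below is an assumption) and only a handful of predictors, adjusted R squared barely differs from the plain version.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# The five-metric fit from the post (plain R squared .544) with an
# assumed 3000 player-seasons and k = 5 predictors.
adj = adjusted_r2(0.544, 3000, 5)
```

The penalty only starts to bite when k grows large relative to n, as in the 24-variable model below.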

If it would make you feel any better, I threw a whole ton of stuff into a single model and ran a stepwise regression on it. It keeps 24 variables (whether you use backward steps or backward and forward, since they came to the same conclusion), including a lot of per-48 minute stats on top of PER, ‘old’ Wins Produced, Win Shares, 0-prior and last-year-prior RAPM, and ASPM. That has an R squared of .718, based on the couple seasons that have all of that overlapping data.
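For the curious, a greedy forward-selection version of that search can be sketched in a few lines. This is a generic illustration on synthetic data, not the actual stepwise procedure or data set I used.

```python
import numpy as np

def r2(X, y):
    """Plain R squared from an OLS fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return 1 - np.sum((y - X1 @ beta) ** 2) / np.sum((y - y.mean()) ** 2)

def forward_stepwise(X, y, min_gain=0.005):
    """Repeatedly add the column that raises R squared the most; stop
    when the best available gain falls below min_gain."""
    kept, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        score, col = max((r2(X[:, kept + [j]], y), j) for j in remaining)
        if score - best < min_gain:
            break
        kept.append(col)
        remaining.remove(col)
        best = score
    return kept, best

# Demo: two informative columns plus three pure-noise columns. The
# informative ones get picked up; the noise ones add too little R
# squared to survive the min_gain threshold.
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(size=400)
kept, best = forward_stepwise(X, y)
```

The real stepwise runs (backward, or backward and forward) use significance-based criteria rather than a raw R squared threshold, but the greedy add-the-best-column loop is the same idea.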