Merry Belated Christmas

My holiday project this year was to finally get Win Share data from basketball-reference.com.  I was able to write some code that grabbed each player from each team in the past 8 years (excluding, of course, the Bobcats, who didn’t exist that whole time).  I then did a little work to combine that with other data sets I already had so that for each of those players I have all their total stats, bball-reference advanced stats, PER, wins produced (total and per 48), ezPM100, APM, and RAPM.  And now, those data are also yours.  More detail below the jump.

As that suggests, I started with the bball-reference data.  I grabbed season totals and season advanced stats (PER, true shooting, rebound percentage, etc.).  I calculated the per-48 minute numbers myself by multiplying a stat by 48 and dividing by total minutes played.  If someone had a 0 (such as taking no shots), they get a 0 per 48.  If someone played no minutes (a distinction belonging to at least one lucky guy), I set the per 48 to 0 rather than dividing by zero.  That should cover all the columns out to BF.  Here’s the glossary if you don’t know what something is.  The only ones that might be unclear are TGM and TGA (and their per-48s); those are three-point field goals made and attempted.  I used T instead of 3 because R doesn’t like header names to start with numbers.  Also, year refers to the end of the season.  So 2011 is the season that concluded with the Mavs beating the Heat in the Finals.
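For anyone who wants to reproduce the per-48 columns, here is a minimal sketch of the rule described above.  It is written in Python rather than the R used for the actual file, and the function name `per_48` is made up for illustration.

```python
def per_48(stat_total, minutes):
    """Scale a season total to a per-48-minute rate.

    Players with zero minutes get 0 instead of a division-by-zero error,
    matching the rule described in the post.
    """
    if minutes == 0:
        return 0.0
    return stat_total * 48 / minutes


print(per_48(120, 960))  # 120 * 48 / 960 = 6.0
print(per_48(0, 500))    # no shots taken -> 0.0
print(per_48(0, 0))      # no minutes played -> 0.0
```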

The next columns are Wins Produced position, per-48 productivity, and total wins produced from the website.  The website combines stats across the whole season, meaning that if a player played for multiple teams his numbers are not separated out.  My rows, however, are player-team rows.  So I calculated total wins produced for each row by multiplying the season-long WP48 number by the minutes played for that team and dividing by 48.
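As a rough illustration of that split, here is a small Python sketch.  It assumes the usual reading of WP48 as wins per 48 minutes, so a team-level total is WP48 times minutes divided by 48.  The team labels, the minutes, and the 0.100 rating are invented numbers, not values from the spreadsheet.

```python
def team_wins_produced(wp48, team_minutes):
    """Wins produced for one player-team row.

    Assumes wp48 is a wins-per-48-minutes rate, so the total is
    rate * minutes / 48.
    """
    return wp48 * team_minutes / 48


# Hypothetical player traded mid-season; one season-long WP48 covers both stints.
season_wp48 = 0.100
stints = [("Team A", 1200), ("Team B", 720)]

for team, minutes in stints:
    print(team, round(team_wins_produced(season_wp48, minutes), 2))
```

Because the same season-long rate is applied to both stints, the two team totals add up to the player's season total, but any real difference in how he played for each team is lost.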

Columns BJ and BK are ‘old’ WP48 and WP total.  Recently WP added an adjustment to defensive rebounds in the calculation of WP48 and that is reflected on the website; these two columns use the old calculation and come from a spreadsheet I already had.  Total is calculated the same way as it is for new WP.  If you’d like to check these numbers yourself, you can follow the steps from the website and ignore the defensive rebound adjustment.  I think mine do differ from the original spreadsheet a little, but only for players who played for multiple teams in the same season.  They differ due to how I combined them to the season level at the time, but I believe the changes to be minor.  The new WP numbers should all be accurate.

Next you have ‘old’ RAPM.  These values were calculated by Jeremias and use 0 as a prior for each player-season.  They are not available any more.  2011 is listed as ‘NA’ because that season, and only that season, included playoff data.  These numbers go back to 2006 and may be spotty due to issues with how the player names were entered.

Moving to column BM you have APM from basketball value.  I’ve noted the low quality of both APM and these particular APM values before, but to recap:  APM suffers terribly from collinearity and sample issues, and the numbers are likely not good reflections of player quality.  Further, the numbers I downloaded from the website do not match the numbers you can actually see on the website.  But they’re here if you want them, going back to 2008.

Next is ezPM100 from Evan.  He told me the 2009 numbers available on his site aren’t reliable, so anything from 2009 and earlier is listed as NA.  The next column is possessions, which you’ll want for APM/RAPM/ezPM, since those are all per-100-possession measures.  Evan has his own possession numbers, but with only two years available I used possessions from the APM file instead.  It turns out that those values have a very strong correspondence to minutes played, so I estimated possessions from minutes played, which makes them available for all years.
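The post doesn’t spell out how possessions were derived from minutes, so the following is only a guess at one reasonable approach: fit a straight line to the seasons where both numbers exist, then apply it everywhere.  The data points below are invented for illustration, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up (minutes, possessions) pairs standing in for the APM file.
minutes = [500, 1000, 2000, 3000]
possessions = [1000, 2000, 4000, 6000]
slope, intercept = fit_line(minutes, possessions)


def estimate_possessions(mins):
    """Estimate possessions for a row that only has minutes played."""
    return slope * mins + intercept


def total_from_per100(rating, poss):
    """Turn a per-100-possession rating into a total over poss possessions."""
    return rating * poss / 100


print(estimate_possessions(1500))
print(total_from_per100(2.5, 4000))
```

The second helper shows why the possessions column matters: a per-100 rating only becomes a total contribution once you multiply by possessions and divide by 100.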

Finally, you have ‘new’ RAPM.  These are the numbers currently available on Jeremias’ site.  They start in 2002 and use that year’s rating as a prior for the next season, each one feeding forward to the next year.  Because of issues with player names that start in 2006, 2006 and earlier are listed as NA.  There are a very small number (I think 4) of low-minute players in 2007 that I couldn’t line up either, but 2006 is worse.

Disclaimers: for some reason, bball-reference doesn’t use the same player order for season totals and season advanced stats.  So they should line up, and I did some spot checking, but I can’t guarantee it.  Similarly, I’ve spot-checked the other stats I added in, but they might not properly line up everywhere.  Why eight years?  I originally pulled 10 years because that covers a good amount of time and is as far back as the Wins Produced site goes.  Obviously I could get more from bball-reference, but there would be nothing to compare besides Win Shares and Wins Produced, and even then you’d have to calculate all the WP yourself.  However, the 10-year file was too big to upload to Google Docs, so I dumped 2000 and 2001.  Thus you get eight years.  Why no playoffs?  Because the playoffs are for fun, not stats.
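If you want to check the alignment yourself, one option is to match rows on a player-team-year key instead of trusting row order.  This Python sketch is not the code I used to build the file; the rows, the key fields, and the function names are all hypothetical.

```python
def row_key(row):
    """Identify a player-team row by (player, team, year)."""
    return (row["player"], row["team"], row["year"])


def join_on_key(left, right):
    """Merge two lists of row dicts by key, ignoring row order.

    Returns the merged rows plus the keys from `left` with no match in
    `right`, so mismatches are reported rather than silently misaligned.
    """
    right_by_key = {row_key(r): r for r in right}
    merged, unmatched = [], []
    for row in left:
        match = right_by_key.get(row_key(row))
        if match is None:
            unmatched.append(row_key(row))
        else:
            merged.append({**match, **row})
    return merged, unmatched


# Made-up rows: the two tables list players in different orders.
totals = [
    {"player": "Player A", "team": "AAA", "year": 2011, "MP": 2000},
    {"player": "Player B", "team": "BBB", "year": 2011, "MP": 1500},
    {"player": "Player C", "team": "CCC", "year": 2011, "MP": 300},
]
advanced = [
    {"player": "Player B", "team": "BBB", "year": 2011, "PER": 20.0},
    {"player": "Player A", "team": "AAA", "year": 2011, "PER": 15.0},
]

merged, unmatched = join_on_key(totals, advanced)
print(len(merged), "rows matched")
print("no match for:", unmatched)
```

Anything that lands in the unmatched list is a candidate for a name-spelling problem of the kind that affects the RAPM columns.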

So now that you have your gift, I get to ask you for one.  I’ve made the spreadsheet accessible but not editable.  I’d like you guys, either via email (on my ‘about’ page) or in the comments, to let me know if you catch any mistakes.  Then I can verify them and fix them.  That includes cases where you have values for something listed as NA, or player-team values for something currently only at the player-season level.  I won’t make the spreadsheet publicly editable because it’s a resource I’m providing and I want to be able to vouch for it.  I’d love to have a more complete set of numbers, but here’s a good start with two years of ezPM, four years of APM, five years of new RAPM, six years of old RAPM, and eight years of PER, new and old Wins Produced, and Win Shares.  Enjoy!