For my penultimate look at regularized regression, I wanted to see how noise affected things. We know that NBA play-by-play data is very noisy; the R squared is, at best, maybe .1. How does noise affect the benefits of ridge regression?

The process is again pretty similar to before, but now I’m going to increment the standard deviation of the random noise added in calculating Y (that’s the third argument to rnorm in the code below). The larger that value, the more noise, and thus the worse the regression will fit the data. I decided to have the noise range from 0 to 1000 in steps of 10 and ran five simulations at each level.

Here’s the mean VIF by noise level. There’s no relationship, nor should there be; VIF depends only on the collinearity. We get some movement because each simulation draws a fresh X sample, but noise in Y doesn’t impact the collinearity in the X matrix.
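To see why, recall that a predictor’s VIF is 1/(1 − R²), where that R squared comes from regressing the predictor on the other predictors; Y never enters the calculation. Here’s a minimal sketch (the variable names are mine) showing that two very different Y vectors built on the same X give identical VIFs:

library(car)
library(MASS)
covar <- matrix(.99, 4, 4); diag(covar) <- 1
d <- data.frame(mvrnorm(1000, rep(0, 4), covar))
y1 <- rnorm(1000)          # pure noise
y2 <- rnorm(1000, 0, 100)  # much noisier
vif(lm(y1 ~ X1 + X2 + X3 + X4, data = d))
vif(lm(y2 ~ X1 + X2 + X3 + X4, data = d))  # same four numbers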

How about model fit? As expected, more noise trashes the R squared. I’m actually only going to plot the first ten noise values because R squared is essentially zero by a noise level of about 50.
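That checkpoint squares with a quick back-of-the-envelope calculation: with weights (1,2,3,4) and pairwise correlations of .99, the signal part of Y has a variance of about 99.3, and the expected R squared is roughly signal variance over total variance. A sketch:

# Expected R^2 is signal variance over signal-plus-noise variance
beta <- c(1, 2, 3, 4)
covar <- matrix(.99, 4, 4); diag(covar) <- 1
signal.var <- as.numeric(t(beta) %*% covar %*% beta)  # about 99.3
signal.var / (signal.var + 50^2)                      # roughly .04 at noise sd 50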

The plot of lambda versus noise is kind of interesting. After a certain amount of noise, lambda is either fairly small or maxed out at 180. My guess is that when there’s too much noise the regression essentially guesses whether it should shove everything to 0 or just chug along.
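If you suspect the 180 ceiling is just the top of the search grid, parcor’s ridge.cv accepts an explicit lambda argument, so you can hand it a wider grid and see whether those runs still pile up at the boundary. The grid below is my own choice, not the package default:

# A wider, explicit penalty grid to check whether the 'maxed out' runs
# are just hitting the boundary of the default search grid
wide.lambda <- exp(seq(log(.01), log(10000), length.out = 100))
ridge.wide <- ridge.cv(Xs[1:(obs/2), ], Y[1:(obs/2)], lambda = wide.lambda)
ridge.wide$lam  # where the cross-validated penalty lands now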

With more noise comes more error; here are the (mostly) unsurprising plots for standard regression retrodiction error and ridge regression retrodiction error. The ridge error is a little odd in that it occasionally just goes off the rails. If you plot the two errors against each other, it looks essentially just like the ridge plot: the ridge error can be as good as standard regression’s, or much worse.
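To be precise about the metric: retrodiction error here is the mean absolute error of the fitted values on the training half, the same calculation the code below does inline:

# Retrodiction error: mean absolute difference between fitted and actual Y
mae <- function(fitted, actual) mean(abs(fitted - actual))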

How well do the models recover the real beta weights in the face of all this noise? Terribly! Just look at the scale! As a side note, in case it isn’t obvious from the graphs, ridge regression always does better than standard. But it sure doesn’t take much. Recall that the true weights are (1,2,3,4). Standard regression makes estimates in the 100s! Ridge regression looks positively reasonable just by guessing in the 10s. In the face of a lot of noise neither does a great job, but ridge is decidedly the way to go.
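For reference, the recovery number being plotted is the Euclidean distance between the estimated coefficients and the truth, intercept included (the true intercept is 0). The estimates below are made-up values on the scales described above, just to show the metric:

# Euclidean distance between estimated and true coefficients
true.beta <- c(0, 1, 2, 3, 4)  # intercept, then the four slopes
coef.err <- function(est) sqrt(sum((est - true.beta)^2))
coef.err(c(0, 110, -95, 120, -80))  # standard-regression scale: ~205
coef.err(c(0, 10, 8, 12, 9))        # ridge scale: ~15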

Finally, we have prediction error. I won’t bother plotting standard against ridge, because ridge wins again.

Ok, so what do we have? More noise is bad. With no noise, collinearity isn’t an issue; even standard regression can figure it out. But with some noise, things get bad quickly. Ridge regression deals with it better, but still not great (depending on your definition of great). The coefficient estimates can still be pretty inflated, and it looks like once in a while it just lays an egg.

Here’s the R code. It should look pretty familiar by now.

library(car)     # vif()
library(MASS)    # mvrnorm()
library(parcor)  # ridge.cv()

correl <- .99
obs <- 1000
# Noise standard deviations: 0 to 1000 in steps of 10, five runs per level
noise <- rep(seq(0, 1000, by = 10), 5)

linearerr <- NULL
ridgeerr <- NULL
vifs <- NULL
linearprederr <- NULL
ridgeprederr <- NULL
linearcoeferr <- NULL
ridgecoeferr <- NULL
ridgelambda <- NULL
rsquared <- NULL

for (a in 1:length(noise)) {
  # Four predictors, mean 0, all pairwise correlations equal to correl
  means <- c(0, 0, 0, 0)
  covar <- matrix(correl, 4, 4)
  diag(covar) <- 1
  Xs <- mvrnorm(obs, means, covar)
  Xsframe <- data.frame(Xs)

  # True weights are (1,2,3,4); rnorm's third argument is the noise sd
  Y <- Xsframe[,1] + 2*Xsframe[,2] + 3*Xsframe[,3] + 4*Xsframe[,4] +
    rnorm(obs, 0, noise[a])

  # Fit on the first half, hold out the second half for prediction
  train <- 1:(obs/2)
  test <- (obs/2 + 1):obs
  trainframe <- data.frame(Y = Y[train], Xsframe[train, ])
  linfit <- lm(Y ~ X1 + X2 + X3 + X4, data = trainframe)
  ridge.object <- ridge.cv(Xs[train, ], Y[train])

  # Coefficient recovery: Euclidean distance from the true weights
  linearcoeferr2 <- sqrt(linfit$coef[1]^2 + (linfit$coef[2]-1)^2 +
    (linfit$coef[3]-2)^2 + (linfit$coef[4]-3)^2 + (linfit$coef[5]-4)^2)
  ridgecoeferr2 <- sqrt(ridge.object$int^2 + (ridge.object$coef[1]-1)^2 +
    (ridge.object$coef[2]-2)^2 + (ridge.object$coef[3]-3)^2 + (ridge.object$coef[4]-4)^2)

  # Retrodiction: mean absolute error on the training half
  linearerr2 <- mean(abs(linfit$fit - Y[train]))
  ridgefitted <- ridge.object$int + as.vector(Xs[train, ] %*% ridge.object$coef)
  ridgeerr2 <- mean(abs(ridgefitted - Y[train]))

  # Prediction: mean absolute error on the held-out half
  linearpred <- linfit$coef[1] + as.vector(Xs[test, ] %*% linfit$coef[2:5])
  ridgepred <- ridge.object$int + as.vector(Xs[test, ] %*% ridge.object$coef)
  linearprederr2 <- mean(abs(linearpred - Y[test]))
  ridgeprederr2 <- mean(abs(ridgepred - Y[test]))

  linearerr <- c(linearerr, linearerr2)
  ridgeerr <- c(ridgeerr, ridgeerr2)
  linearprederr <- c(linearprederr, linearprederr2)
  ridgeprederr <- c(ridgeprederr, ridgeprederr2)
  linearcoeferr <- c(linearcoeferr, linearcoeferr2)
  ridgecoeferr <- c(ridgecoeferr, ridgecoeferr2)
  vifs <- c(vifs, mean(vif(lm(Y ~ X1 + X2 + X3 + X4, data = Xsframe))))
  ridgelambda <- c(ridgelambda, ridge.object$lam)
  rsquared <- c(rsquared, summary(linfit)$r.squared)
}

# Collinearity, fit, and penalty by noise level
plot(noise, vifs)
plot(noise, rsquared)
plot(noise, ridgelambda)

# Retrodiction error
plot(noise, linearerr)
plot(noise, ridgeerr)
plot(linearerr, ridgeerr)
abline(0, 1)

# Coefficient recovery
plot(noise, linearcoeferr)
plot(noise, ridgecoeferr)
plot(linearcoeferr, ridgecoeferr)
abline(0, 1)

# Prediction error
plot(noise, linearprederr)
plot(noise, ridgeprederr)
plot(linearprederr, ridgeprederr)
abline(0, 1)
