
Visualising Shrinkage

A useful property of mixed-effects and Bayesian hierarchical models is that lower-level estimates are shrunk towards the more stable estimates further up the hierarchy.

To use a time-honoured example, you might be modelling the effect of a new teaching method on performance at the classroom level. A class of 30 or so students is probably too small a sample to give useful results on its own. In a hierarchical model the data are partially pooled: all the classes in a school are modelled together and so, one level up, are all the schools in a district.

At each level in this hierarchy an estimate of the effectiveness of the teaching method is obtained. You will get an estimate for the district and for each school. You will even get estimates for the individual classes. Each class estimate is a weighted average of the estimate from that class's own data and the estimate for its school (which in turn is a weighted average of the estimate from the school's own data and the estimate for the district). The clever part is that this weighting is itself determined by the data. Where a class's own estimate is noisy, for example because the class is small, it is weighted towards the school. Where the class's data are informative, or the classes in a school genuinely differ a lot from one another so that the school average is less relevant, it is weighted towards the class. This property is known as shrinkage.
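As a rough illustration of the weighting, here is a minimal sketch for the simplest case: a normal model with known variances, where the shrunk estimate is a precision-weighted average of the class mean and the school mean. The function name shrink_estimate and all of the numbers are made up for illustration; real mixed-effects or Bayesian software estimates the variances from the data rather than taking them as given.

```r
# Hypothetical helper: precision-weighted average for one level of a
# normal hierarchical model with known variances (an assumption made
# purely to keep the sketch short).
#   class_mean   raw estimate from the class alone
#   n            number of students in the class
#   sigma2       within-class variance of individual results
#   school_mean  estimate for the school as a whole
#   tau2         variance of true class effects around the school mean
shrink_estimate <- function(class_mean, n, sigma2, school_mean, tau2) {
  w <- tau2 / (tau2 + sigma2 / n)   # weight on the class's own estimate
  w * class_mean + (1 - w) * school_mean
}

# Illustrative numbers only: same raw mean, different class sizes.
small_class <- shrink_estimate(class_mean = 70, n = 10,  sigma2 = 100,
                               school_mean = 60, tau2 = 10)
large_class <- shrink_estimate(class_mean = 70, n = 100, sigma2 = 100,
                               school_mean = 60, tau2 = 10)
```

With these numbers the small class lands halfway between its own mean and the school mean, while the large class stays much closer to its own mean.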

I’m often interested in how much shrinkage is affecting my estimates, and I want to see it. I’ve created a plot that I find useful for this. It’s done in R using ggplot2 and is very simple to code.

The idea is that the non-shrunk estimates b_i (i.e. the estimates that would be obtained by modelling classes individually) are plotted along the line x=y at the points (b_i, b_i). The estimates they are being shrunk towards, a_i, are plotted at the points (b_i, a_i). Finally, the shrunk estimates s_i are placed at (b_i, s_i), and an arrow is drawn from (b_i, b_i) to (b_i, s_i) to illustrate the direction of the shrinkage.
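The layout just described can be written down as a small table of coordinates. The sketch below uses made-up values for b, a and s (they are not output from any fitted model). It also computes the fraction of the distance to the higher-level estimate that each arrow covers, a summary added here for illustration rather than something the plot itself draws.

```r
# Made-up illustrative values, not output from a fitted model.
b <- c(2.0, 0.5, 1.4)   # non-shrunk estimates b_i
a <- c(1.0, 1.0, 1.0)   # higher-level estimates a_i being shrunk towards
s <- c(1.6, 0.8, 1.1)   # shrunk estimates s_i

coords <- data.frame(
  x        = b,   # every point for class i shares the x position b_i
  y_start  = b,   # arrow tail on the line x = y, at (b_i, b_i)
  y_end    = s,   # arrow head at (b_i, s_i)
  y_target = a    # higher-level estimate at (b_i, a_i)
)

# Fraction of the gap between b_i and a_i covered by the arrow:
# 0 means no shrinkage, 1 means fully shrunk to the higher-level estimate,
# and a negative value means the arrow points away from it.
# (Undefined when b_i equals a_i, since there is then no gap to cover.)
coords$fraction <- (b - s) / (b - a)
```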

Here is an example. You can see the extent of the shrinkage from the distance the arrow covers towards the higher-level estimate.

[Figure: ShrinkPlot]

Note that the arrows do sometimes point away from the higher-level estimate. This is because the data shown are for a single coefficient in a hierarchical regression model with multiple coefficients. Where other coefficients have been stabilised by shrinkage, this particular coefficient can be revised as a result.

The R code is as follows:


# *--------------------------------------------------------------------
# | FUNCTION: shrinkPlot
# | Function for visualising shrinkage in hierarchical models
# *--------------------------------------------------------------------
# | Version |Date      |Programmer  |Details of Change                
# |     01  |31/08/2013|Simon Raper |first version.      
# *--------------------------------------------------------------------
# | INPUTS:  orig      Estimates obtained from individual level 
# |                    modelling
# |          shrink    Estimates obtained from hierarchical modelling
# |          prior     Priors in Bayesian model or fixed effects in 
# |                    mixed effects model (i.e. what it is shrinking
# |                    towards).
# |          window    Limits for the plot (as a vector)
# |
# *--------------------------------------------------------------------
# | OUTPUTS: A ggplot object
# *--------------------------------------------------------------------
# | DEPENDS: grid, ggplot2
# |
# *--------------------------------------------------------------------

library(ggplot2)
library(grid)

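shrinkPlot <- function(orig, shrink, prior, window = NULL) {

  # Colour by the (rounded) value being shrunk towards, so that estimates
  # sharing the same higher-level estimate get the same colour
  group <- factor(signif(prior, 3))

  data <- data.frame(orig, shrink, prior, group)

  # Arrows run vertically from (orig, orig) on the line x = y to (orig, shrink)
  g <- ggplot(data = data, aes(x = orig, xend = orig, y = orig, yend = shrink, col = group))
  # Points mark the higher-level estimates at (orig, prior)
  g2 <- g + geom_segment(arrow = arrow(length = unit(0.3, "cm"))) +
    geom_point(aes(x = orig, y = prior))
  g3 <- g2 + xlab("Estimate") + ylab("Shrinkage") + ggtitle("Shrinkage Plot")

  # Optionally apply the same limits to both axes
  if (!is.null(window)) {
    g3 <- g3 + ylim(window) + xlim(window)
  }

  print(g3)

}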
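
To try the function you need three vectors of equal length, with one element per class. The following is a sketch with simulated inputs; the two-school setup, the seed and the fixed weight of 0.6 are all assumptions made for illustration, standing in for estimates you would normally take from fitted models.

```r
set.seed(1)                                   # arbitrary seed

n_classes <- 20
school <- rep(c(1, 2), each = n_classes / 2)  # two hypothetical schools
prior  <- c(0.5, 1.5)[school]                 # school-level estimate for each class
orig   <- rnorm(n_classes, mean = prior, sd = 0.5)  # simulated class-level estimates

# Stand-in for hierarchical estimates: move each class a fixed 40% of the
# way towards its school (weight 0.6 on the class's own estimate).
shrink <- 0.6 * orig + 0.4 * prior

# A common window for both axes that contains every plotted value.
window <- range(orig, shrink, prior)
```

These vectors have the shape that the orig, shrink, prior and window arguments expect. Because prior takes only two distinct values here, the points and arrows would fall into two colour groups.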


About Simon Raper

I am an RSS accredited statistician with over 15 years’ experience working in data mining and analytics and many more in coding and software development. My specialities include machine learning, time series forecasting, Bayesian modelling, market simulation and data visualisation. I am the founder of Coppelia, an analytics startup that uses agile methods to bring machine learning and other cutting-edge statistical techniques to businesses that are looking to extract value from their data. My current interests are in scalable machine learning (Mahout, Spark, Hadoop), interactive visualisations (D3 and similar) and applying the methods of agile software development to analytics. I have worked for Channel 4, Mindshare, News International, Credit Suisse and AOL. I am co-author with Mark Bulling of Drunks and Lampposts, a blog on computational statistics, machine learning, data visualisation, R, Python and cloud computing. It has had over 310 K visits and appeared in the online editions of The New York Times and The New Yorker. I am a regular speaker at conferences and events.

Discussion

3 thoughts on “Visualising Shrinkage”

  1. Awesome example, if you faceted by color and plotted with transparency it would possibly make it a little less busy. You may be interested in the plots in an old paper by Efron and Morris in Scientific American, “Stein’s paradox in statistics”, http://www-stat.stanford.edu/~ckirby/brad/other/Article1977.pdf that has similar links visualizing univariate shrinkage.

    Posted by apwheele | September 1, 2013, 1:17 pm
