This is a quick post to share a tip on adding non-overlapping labels to a scatterplot in ggplot using a great package called directlabels. The trick is to make each point a single-member group using an aesthetic such as colour, and then apply the direct.label function with the first.qp method. Some example code is below.
library(ggplot2)
library(directlabels)
x <- runif(10)
y <- rnorm(10)
z <- as.character(midwest$county[1:10])
q <- qplot(x, y) + geom_point(aes(colour = z))
direct.label(q, first.qp)
If there are better ways then I’d love to know but it works well for me and has the added advantage that the labels are matched to the points by colour.

I too have been annoyed by this and am glad to see someone has tackled the problem. I want to try your code but it doesn’t appear to be standard R code. If I try to run it I get an error saying x not found.
Posted by Tyler Rinker | February 23, 2012, 12:55 am
Thanks for spotting this. For some reason WordPress converted the < sign when I pasted the code in from RStudio. It should be fixed now, but I will double check when I'm back at work. It doesn't look like the correction goes onto the R-bloggers site immediately, so if you came from there you'd have to visit the actual blog.
Cheers
Simon
Posted by simonraper | February 23, 2012, 7:21 am
Nice post.
FYI, WordPress tends to mangle your code if you switch between the Visual and HTML views. Also, it seems to have an algorithm to check that what you’ve written is suitably important, and not saved anywhere else before deciding to randomly replace parts of your post with HTML.
It’s interesting to see what happens with direct labels when you stress test it by making it try to draw more labels than is possible (because there isn’t enough room).
p <- ggplot(midwest, aes(area, poptotal, colour = county)) +
geom_point() +
scale_y_log10()
(p <- direct.label(p, first.qp))
In this case it appears to just make up locations, then give up once it fails to find a position. (Compare, for example, the label for MARINETTE with its data, subset(midwest, county == "MARINETTE").)
Using a different algorithm can yield radically different results. If you swap first.qp for first.bumpup, then everything gets labelled, even if the labels overlap.
In practice, you'll likely have to try a few labelling algorithms to see which one is most effective.
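One way to run that comparison is sketched below. It rebuilds the same stress-test plot and applies each of the two methods mentioned above, passing the method names as strings. The variable names `methods` and `labelled` are only illustrative, and the sketch assumes a directlabels version that accepts method names given as character strings.

```r
library(ggplot2)
library(directlabels)

# Same stress-test plot as above: points coloured by county name
p <- ggplot(midwest, aes(area, poptotal, colour = county)) +
  geom_point() +
  scale_y_log10()

# Build one labelled plot per positioning method so they can be compared
methods <- c("first.qp", "first.bumpup")
labelled <- lapply(methods, function(m) direct.label(p, m))
names(labelled) <- methods
```

Printing `labelled[["first.qp"]]` and then `labelled[["first.bumpup"]]` draws the two versions one after the other, which makes it easier to judge which method copes better with a crowded plot.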
Posted by richierocks | February 23, 2012, 4:10 pm
Thanks Richie. That’s really useful. I think you are right about experimentation. I started with some of the labelling strategies recommended for scatterplots, but they didn’t work so well. I assume this is because they are geared towards labelling clusters of points. The method I used and the one that you suggested are both recommended for line plots, so I guess the points are treated as lines of just one point and the label is positioned accordingly. You’ve probably already seen it, but there’s a useful list at http://directlabels.r-forge.r-project.org/docs/index.html and I believe you can also create your own.
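As a rough sketch of what creating your own method might look like: directlabels positioning methods can, as far as I understand the package documentation, be written as a list in which named elements set label properties and unnamed elements name the methods to apply. The example below shrinks the label text and then applies first.qp. The names `pts` and `small.first.qp` are hypothetical and chosen only for this illustration.

```r
library(ggplot2)
library(directlabels)

# Ten random points, each with its own label (and so its own colour group)
set.seed(1)
pts <- data.frame(
  x = runif(10),
  y = rnorm(10),
  z = as.character(midwest$county[1:10])
)

# Hypothetical custom method: set a smaller text size, then apply first.qp
small.first.qp <- list(cex = 0.7, "first.qp")

p <- ggplot(pts, aes(x, y, colour = z)) + geom_point()
labelled <- direct.label(p, small.first.qp)
```

Printing `labelled` draws the plot. Other elements could be added to the list in the same way, so this is a starting point for experimentation rather than a recommended setting.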
Posted by simonraper | February 23, 2012, 6:21 pm
Something wrong with notation in this example.
What do these mean?
x&lt;
Posted by Giles L Crane | February 23, 2012, 1:05 am
Hi Giles
Thanks for pointing this out. It was just a pasting error and should be fixed now.
Cheers
Simon
Posted by simonraper | February 23, 2012, 7:23 am