
Non overlapping labels on a ggplot scatterplot

This is a quick tip on how to add non-overlapping labels to a scatterplot in ggplot using a great package called directlabels. The trick is to make each point a single-member group using an aesthetic such as colour, and then apply the direct.label function with the first.qp method. Some example code is below.


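library(ggplot2)
library(directlabels)

# Ten random points, each named after a county from ggplot2's midwest data
x <- runif(10)
y <- rnorm(10)
z <- as.character(midwest$county[1:10])

# Mapping colour to z makes every point its own single-member group
q <- qplot(x, y) + geom_point(aes(colour = z))

# Label each group directly; first.qp tries to keep the labels from overlapping
direct.label(q, first.qp)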

If there are better ways then I’d love to know, but this works well for me and has the added advantage that the labels are matched to the points by colour.
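
On recent versions of ggplot2, where qplot() is deprecated, the same trick can be sketched with ggplot() and a data frame. Treat this as a rough sketch rather than a definitive version: the names pts and label and the set.seed() call are additions made for illustration, not part of the original example.

```r
library(ggplot2)
library(directlabels)

set.seed(42)  # assumption: fixed seed so the random points are reproducible

# One row per point; `label` is unique, so each point forms its own group
pts <- data.frame(
  x = runif(10),
  y = rnorm(10),
  label = as.character(midwest$county[1:10])
)

# Mapping colour to the unique label creates the single-member groups
p <- ggplot(pts, aes(x, y, colour = label)) +
  geom_point()

# Same positioning method as above; first.bumpup is the alternative
# raised in the discussion and can be swapped in here
labelled <- direct.label(p, first.qp)
print(labelled)
```

Keeping the points in a data frame makes it easier to swap in real data: any column with one distinct value per row can play the role of label.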


About simonraper

I am an RSS-accredited statistician with over 15 years’ experience working in data mining and analytics and many more in coding and software development. My specialities include machine learning, time series forecasting, Bayesian modelling, market simulation and data visualisation. As Data Scientist at Channel 4, my role is to develop machine learning solutions that allow the channel to build a deeper relationship with the viewer and innovate the way advertising is traded, and to support the creative side of the business. My current interests are in scalable machine learning (Mahout, Hadoop), interactive visualisations (D3 and similar) and applying the methods of agile software development to analytics. I have worked for Mindshare, News International, Credit Suisse and AOL. I am co-author with Mark Bulling of Drunks and Lampposts, a blog on computational statistics, machine learning, data visualisation, R, Python and cloud computing. It has had over 270 K visits and was mentioned in Flowing Data, I09, and the online editions of The New York Times and The New Yorker.

Discussion

8 thoughts on “Non overlapping labels on a ggplot scatterplot”

  1. I too have been annoyed by this and am glad to see someone has tackled the problem. I want to try your code but it doesn’t appear to be standard R code. If I try to run it I get an error saying x not found.

    Posted by Tyler Rinker | February 23, 2012, 12:55 am
    • Thanks for spotting this. For some reason WordPress converted the < sign when I pasted the code in from RStudio. It should be fixed now, but I will double-check when I'm back at work. It doesn't look like the correction goes immediately onto the R-bloggers site, so if you came from there you'd have to visit the actual blog.

      Cheers

      Simon

      Posted by simonraper | February 23, 2012, 7:21 am
      • Nice post.

        FYI, WordPress tends to mangle your code if you switch between the Visual and HTML views. Also, it seems to have an algorithm to check that what you’ve written is suitably important, and not saved anywhere else before deciding to randomly replace parts of your post with HTML.

        It’s interesting to see what happens with direct labels when you stress test it by making it try to draw more labels than is possible (because there isn’t enough room).

        p <- ggplot(midwest, aes(area, poptotal, colour = county)) +
        geom_point() +
        scale_y_log10()
        (p <- direct.label(p, first.qp))

        In this case it appears to just make up locations, then gives up once it fails to find a position. (Compare, for example, the label for MARINETTE with its data: subset(midwest, county == "MARINETTE").)

        Using a different algorithm can yield radically different results. If you swap first.qp for first.bumpup, then everything gets labelled, even if the labels overlap.

        In practice, you'll likely have to try a few labelling algorithms to see which one is most effective.

        Posted by richierocks | February 23, 2012, 4:10 pm
      • Thanks Richie. That’s really useful. I think you are right about experimentation. I started with some of the labelling strategies recommended for scatterplots, but they didn’t work so well. I assume this is because they are geared towards labelling clusters of points. The method I used and the one that you suggested are both recommended for line plots, so I guess the points are treated as lines of just one point and the label is positioned accordingly. You’ve probably already seen it, but there’s a useful list at http://directlabels.r-forge.r-project.org/docs/index.html and I believe you can also create your own.

        Posted by simonraper | February 23, 2012, 6:21 pm
  2. Something wrong with notation in this example.
    What do these mean?
    x&lt;

    Posted by Giles L Crane | February 23, 2012, 1:05 am
  3. Hi Simon,
    Great little hack.
    I have a question though.
    As I see from your example, direct.label() takes its label values from the categories of z.
    What if I were to have two different aesthetics (colour and size) or two different geoms? How would you set direct.label() to use only one of them?

    To be more specific my problem at the moment is that my labels from geom_text() overlap each other and I’d like to have them moved around. I assumed the argument position_dodge would solve it but apparently that’s not what it’s for.

    Thanks
    Roey

    Posted by Roey Angel | June 13, 2012, 2:54 pm
    • Hi Roey,

      To be honest I’m not that sure. You’d have to check out the documentation for the directlabels package and play with it a bit. I believe the package is geared more towards labelling groups than individual points, so I fudged it a bit by ensuring each of my points is treated as a separate group. I would be interested to hear if you find out anything useful.

      Cheers

      Simon

      Posted by simonraper | June 18, 2012, 8:59 am
