//
you're reading...
data visualisation, Python, Web Scraping and APIs

Lazy D3 on some astronomical data

I can’t claim to be anything near an expert on D3 (a JavaScript library for data visualisation) but being both greedy and lazy I wondered if I could get some nice results with minimum effort. In any case the hardest thing about D3 for a novice to the world of web design seems to be getting started at all so perhaps this post will be useful for getting people up and running.

A Reingold–Tilford Tree of the IVOA Astronomical Object Ontology using D3

A Reingold–Tilford Tree of the IVOA Astronomical Object Ontology using D3

The images above and below are visualisations using D3 of a classification hierarchy for astronomical objects provided by the IVOA (International Virtual Observatory Alliance). I take no credit for the layout. The designs are taken straight from the D3 examples gallery but I will show you how I got the environment set up and my data into the graphs. The process should be replicable for any hierarchical dataset stored in a similar fashion.

A dendrogram of the same data

A dendrogram of the same data

Even better than the static images are various interactive versions such as the rotating Reingold–Tilford Tree, the collapsible dendrogram and collapsible indented tree . These were all created fairly easily by substituting the astronomical object data for the data in the original examples. (I say fairly easily as you need to get the hierarchy into the right format but more on that later.)

First the environment.

If you are an R user with little experience of building web pages then you’ll probably find yourself squinting at the D3 documentation wondering how you get from the code to the output. With just a browser to read the JavaScript and an editor (notepad if you like but preferably a specialist HTML editor) you can get through the most of the tutorials referred to in Mark’s previous post . Doing this locally on your own pc works ok because the data visualised in the tutorials is mostly hard coded into the JavaScript. However once you want to start referring to data contained in external files some sensible security restrictions on what files your browser can access will block your attempts. The options are to turn these off (not advised) or switch to working on a webserver.

You can either opt for one of the many free hosting services or if you are feeling more adventurous you can follow Mark’s posts on setting up a Linux instance on Amazon Web Services and then follow Libby Hemphill’s instructions for starting up Apache and opening the port. Finally, going with the latter, I use FileZilla to transfer over any data I need to my Linux instance. See this post for getting it to work with your authentication key.

This should leave you in the following situation:

  • You have a working web server and an IP address that will take you to an HTML index page from which you can provide links to your D3 documents.
  • You have a way of transferring work that you do locally over to your web server.
  • You have a whole gallery of examples to play around with

Rather than creating my own D3 scripts I’m just going too substitute my own data into the examples. The issue here is that the hierarchical examples take as an input a nested JSON object. This means something like:

{
 "name": "Astronomical Object",
 "children": [
  {
   "name": "Planet",
   "children": [
    {
     "name": "Type",
     "children": [
      {"name": "Terrestrial"},
      {"name": "Gas Giant"}
     ]
    }

The issue is that our data looks like this:

1. Planet
1.1. [Type]
1.1.1. Terrestrial
1.1.2. Gas Giant
1.2.  [Feature]
1.2.1. Surface
1.2.1.1. Mountain
1.2.1.2. Canyon
1.2.1.3. Volcanic
1.2.1.4. Impact
1.2.1.5. Erosion
1.2.1.6. Liquid
1.2.1.7. Ice
1.2.2. Atmosphere
1.2.2.1. Cloud
1.2.2.2. Storm
1.2.2.3. Belt
1.2.2.4. Aurora

To put this into the right format I’ve used Python to read in the file as csv (with dot as a delimiter) and construct the nested JSON object. Here’s the full python code:


import csv, re

#Open the file and read as dot delimited

def readLev(row):
    level=0
    pattern = re.compile("[a-zA-Z]")
    for col in row:
        if pattern.search(col) !=None:
            level=row.index(col)
    return level
    

with open('B:\PythonFiles\PythonInOut\AstroObject\Astro.csv', 'rb') as csvfile:
    astroReader = csv.reader(csvfile, delimiter='.')
    astroJson='{"name": "Astronomical Object"'
    prevLev=0
    for row in astroReader:
        #Identify the depth
        currentLev=readLev(row)
        if currentLev>prevLev:
            astroJson=astroJson+', "children": [{"name": "' + row[currentLev]+'"'
        elif currentLev<prevLev:
            jump=prevLev-currentLev
            astroJson=astroJson+', "size": 1}'+']}'*jump+', {"name": "' + row[currentLev]+'"'
        elif currentLev==prevLev:
            astroJson=astroJson+', "size": 1},{"name": "' + row[currentLev]+'"'
        prevLev=readLev(row)
    astroJson=astroJson+ '}]}]}'
    print astroJson

Since only the level of indentation matters to this process it could be repeated with on any data that has the form


* Planet
** [Type]
*** Terrestrial
*** Gas Giant

So that’s it. You’ll see that it in the source code for the hierarchical examples there is a reference to flare.json. Substitute in a reference to your own file containing the outputted JSON object and be sure to include that file in the directory on your web server.

Of course it’s a poor substitute for learning the language itself as that enables you to construct your own innovative visualisations but it gets you started.

About these ads

About Simon Raper

I am an RSS accredited statistician with over 15 years’ experience working in data mining and analytics and many more in coding and software development. My specialities include machine learning, time series forecasting, Bayesian modelling, market simulation and data visualisation. I am the founder of Coppelia an analytics startup that uses agile methods to bring machine learning and other cutting edge statistical techniques to businesses that are looking to extract value from their data. My current interests are in scalable machine learning (Mahout, spark, Hadoop), interactive visualisatons (D3 and similar) and applying the methods of agile software development to analytics. I have worked for Channel 4, Mindshare, News International, Credit Suisse and AOL. I am co-author with Mark Bulling of Drunks and Lampposts - a blog on computational statistics, machine learning, data visualisation, R, python and cloud computing. It has had over 310 K visits and appeared in the online editions of The New York Times and The New Yorker. I am a regular speaker at conferences and events.

Discussion

One thought on “Lazy D3 on some astronomical data

  1. Thank you for posting. The python script did not work for me. I had to change it slightly on line 29:
    astroJson=astroJson+ ‘}]}’ + ‘]}’*(currentLev-1)

    Posted by Irma Tanovic | November 20, 2013, 11:10 pm

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog Stats

  • 326,837 hits

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 525 other followers

Follow

Get every new post delivered to your Inbox.

Join 525 other followers

%d bloggers like this: