118 years of US State Weather Data

A recent post on the Junkcharts blog looked at US weather data and the importance of explaining scales (which in this case went up to 118). It turns out that 118 is the rank of the data compared with the previous 117 years (in ascending order, so that 118 is the highest). At the end of the post, the author writes:

I always like to explore doing away with the unofficial rule that says spatial data must be plotted on maps. Conceptually I’d like to see the following heatmap, where a concentration of red cells at the top of the chart would indicate extraordinarily hot temperatures across the states. I couldn’t make this chart because the NOAA website has this insane interface where I can only grab the rank for one state for one year one at a time. But you get the gist of the concept.
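To make the rank scale behind that heatmap concrete, here is a minimal sketch of how such a rank could be computed. The temperatures are made-up values for illustration, not NOAA data; with 118 years of data the warmest year would get rank 118 in the same way.

```r
# Hypothetical March mean temperatures for one state (made-up values, not NOAA data)
temps <- c("2008" = 41.2, "2009" = 43.5, "2010" = 44.1, "2011" = 42.8, "2012" = 51.0)

# rank() orders ascending, so the warmest year receives the highest rank
ranks <- rank(temps)
ranks["2012"]
```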

In this spirit then, I wrote a little R script for scraping the data and produced a couple of charts based on it. I used Charles Web Proxy to figure out what needs to be sent to the website to return the data I was looking for.

[Figure: a heatmap for March 2012, which shows the rank for each state in the latest month]

[Figure: a heatmap for each March going back to 1895]

The code to reproduce and tweak these charts is below:


### Packages needed for the work
library(RCurl)    # postForm()
library(XML)      # readHTMLTable()
library(ggplot2)

### Get list of US states to tie onto dataset, remove Alaska and Hawaii
us.list.of.states <- readHTMLTable("http://www.worldatlas.com/aatlas/populations/usapoptable.htm")[[1]]
us.list.of.states <- us.list.of.states[ c(-2, -11), ]

### Functions to pull monthly and annual data from the NOAA website
getNOAAdataMonth <- function(state.no, month){
	
	zeroes = ifelse(state.no > 9, "0", "00")
	state.string = paste(zeroes, state.no, sep="")
	
	data.in <- postForm("http://climvis.ncdc.noaa.gov/cgi-bin/cag3/hr-display3.pl",
			data_set = "01",
			byear = "1895",
			period = month,
			lyear = "2012",
			strgn = state.string, 
			bbeg = "1901",
			bend = "2000",
			trend = "0",
			type = "3",
			rank = "0",
			send.x = "60",
			send.y = "8", 
			spec = "")
	
	data.out <- readHTMLTable(data.in)[[2]]
	data.out$state <- us.list.of.states[state.no, 3]
	data.out}

getNOAAdataAnnual <- function(state.no){
	
	zeroes = ifelse(state.no > 9, "0", "00")
	state.string = paste(zeroes, state.no, sep="")
	
	data.in <- postForm("http://climvis.ncdc.noaa.gov/cgi-bin/cag3/hr-display3.pl",
			data_set = "01",
			byear = "1895",
			period = "17",
			lyear = "2012",
			strgn = state.string, 
			bbeg = "1901",
			bend = "2000",
			trend = "0",
			type = "3",
			rank = "0",
			send.x = "60",
			send.y = "8", 
			spec = "")
	
	data.out <- readHTMLTable(data.in)[[2]]
	data.out$state <- us.list.of.states[state.no, 3]
	data.out}

### Run function over 48 states
weather.data.annual <- sapply(1:48, function(x) getNOAAdataAnnual(x), simplify=FALSE)
weather.data.march <- sapply(1:48, function(x) getNOAAdataMonth(x, "3"), simplify=FALSE)

### Join lists together into dataframe
weather.data.2 <- do.call("rbind", weather.data.march)
weather.data.annual.2 <- do.call("rbind", weather.data.annual)

### rename columns for easier use
colnames(weather.data.2) <- c("year", "temp", "rank1", "rank2", "state")

### Subset 2012 data for first chart
weather.data.march2012 <- subset(weather.data.2, year==2012)
weather.data.march2012$fill <- ifelse(as.numeric(as.character(weather.data.march2012$rank1))==118, 1, 0)

### Use the fill column computed above so that only record (rank 118) states are red
ggplot(weather.data.march2012, aes(x=state, y=as.numeric(as.character(rank1)), fill=fill, label = state))+
		geom_tile()+
		geom_text(size=3)+
		ylab("March 2012 Rank")+
		scale_fill_continuous("", low="white", high="red")+
		ggtitle("All the red at the top means record temperatures across many states")

### Plot all years data (year is a factor in the dataset, so need to convert to numeric)
ggplot(weather.data.2, aes(x=state, y=as.numeric(as.character(year)), fill=as.numeric(as.character(rank1))))+
		geom_tile()+
		coord_flip()+
		scale_fill_continuous("", low="white", high="dark red")+
		ylab("Year")+
		ggtitle("All the red at the right means record temperatures across many states")
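
The two download functions differ only in the `period` value they send, and both build the three-digit state code by hand. As a possible tidy-up, the padding could be done with `sprintf`. This is only a sketch: `state_code` is a hypothetical helper name that is not part of the script above.

```r
# Hypothetical helper: zero-pad a state number to the three-character code
# that the script above builds with ifelse() and paste()
state_code <- function(state.no) sprintf("%03d", as.integer(state.no))

state_code(c(1, 9, 10, 48))
```

For state numbers 1 to 48 this gives the same strings as the `zeroes`/`state.string` lines in the functions above.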



Discussion

4 thoughts on “118 years of US State Weather Data”

  1. Hi Mark, for the second chart, if you first normalize the data by state, the pattern would be clearer. One possibility is to subtract the state mean from each value and divide by the state range.
    Thanks for making the plot!

    Posted by Kaiser | April 27, 2012, 3:58 am
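
The normalization suggested in this comment could be sketched roughly as follows. The data frame `toy` and the function `normalize_by_state` are hypothetical names with made-up values, and applying the formula to the temperature column (rather than the rank) is an assumption about what the comment intends.

```r
# Made-up temperatures for two states (illustration only, not NOAA data)
toy <- data.frame(
  state = rep(c("Texas", "Maine"), each = 3),
  temp  = c(60, 65, 70, 30, 32, 40)
)

# Subtract the state mean, then divide by the state range (max minus min)
normalize_by_state <- function(x) (x - mean(x)) / (max(x) - min(x))

# ave() applies the function within each state and keeps the row order
toy$temp.norm <- ave(toy$temp, toy$state, FUN = normalize_by_state)
toy
```

In the scraped dataset the columns come back as factors, so `temp` would first need converting with `as.numeric(as.character(...))`, as is done for `rank1` and `year` in the plotting code.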
