EC2 Tutorial: NumPy and SciPy

Another quick note on getting set up on your EC2 instance. To install SciPy, you first need to install ATLAS and LAPACK. Running the following commands as root (sudo bash) should sort you out:

    yum -y install atlas-devel
    yum install lapack
    pip install scipy

Some new functions I’ve discovered in R

I've been writing a fair amount of R recently and have learned a lot along the way. Here are some functions I've discovered (mainly plyr and reshape related) that I thought I would share. merge_all is a good way to merge several data frames in one go, rather than chaining multiple merge commands.
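The idea behind merge_all is to fold a pairwise merge over a list of tables instead of writing merge(merge(a, b), c) by hand. Below is a rough plain-Python analogue of that idea. It is not reshape's implementation. It makes several assumptions: tables are lists of dict rows, each merge is an inner join on one key column, and keys are unique in each table. The function names and the toy tables are hypothetical.

```python
from functools import reduce


def merge_two(left, right, key):
    """Inner-join two tables (lists of dict rows) on a shared key column."""
    # Assumes each key value appears at most once in `right`
    index = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            merged.append({**row, **match})  # combine columns from both rows
    return merged


def merge_all_tables(tables, key):
    """Fold merge_two over a list of tables, like chaining merge() calls."""
    return reduce(lambda acc, table: merge_two(acc, table, key), tables)


# Hypothetical toy tables sharing an "id" column
sales = [{"id": 1, "sales": 10}, {"id": 2, "sales": 20}, {"id": 3, "sales": 30}]
tv = [{"id": 1, "tv": 5}, {"id": 2, "tv": 7}]
radio = [{"id": 2, "radio": 3}, {"id": 1, "radio": 4}]

print(merge_all_tables([sales, tv, radio], key="id"))
```

Row 3 drops out because it has no TV or radio match. That matches the inner-join default of R's merge (all = FALSE).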

EC2 Tutorials: Scheduling tasks on EC2 using Crontab

One of my main reasons for wanting an EC2 instance was to be able to run scripts automatically at certain times, normally to collect data and save it to a database. As my EC2 instance is always running, I can forget about it for a month and come back to a month's worth of data.
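Here is a minimal sketch of the kind of script cron might run: it collects one value and appends it to a SQLite database. Everything specific here is a hypothetical placeholder: the crontab line, the file paths, the table layout and fetch_value. The database path is passed as a command-line argument so the same script can be pointed anywhere.

```python
# Hypothetical crontab entry (add it with `crontab -e`). The five fields are
# minute, hour, day of month, month and day of week, followed by the command.
# This entry would run the script at minute 0 of every hour:
#   0 * * * * /usr/bin/python /home/ec2-user/collect.py /home/ec2-user/readings.db >> /home/ec2-user/collect.log 2>&1
import sqlite3
import sys
import time


def fetch_value():
    """Placeholder for the real data collection (API call, scrape, ...)."""
    raise NotImplementedError("replace with your own data-collection code")


def save_reading(conn, value, timestamp=None):
    """Append one timestamped reading, creating the table on first use."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (ts REAL, value REAL)")
    ts = time.time() if timestamp is None else timestamp
    conn.execute("INSERT INTO readings (ts, value) VALUES (?, ?)", (ts, value))
    conn.commit()


def main(db_path):
    conn = sqlite3.connect(db_path)
    try:
        save_reading(conn, fetch_value())
    finally:
        conn.close()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: collect.py PATH_TO_DATABASE", file=sys.stderr)
```

Redirecting output to a log file in the crontab line makes it much easier to see why a run failed. Otherwise cron tries to mail the output to the user.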

Google Refine: One of The Best Tools You’ve Probably Never Heard About

A lot of the data available online tends not to be the cleanest thing in the world, particularly if you've had to scrape it in the first place. At the same time, lots of internal data sets can be just as messy, with columns having different names in what should be identical spreadsheet templates.
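One of Google Refine's clustering options groups values by a "fingerprint" key. Values that differ only in case, punctuation, spacing, accents or word order end up with the same key. Below is a simplified Python sketch of that keying idea, applied to messy column headers. It roughly follows the published description of the method but is not Refine's own code. The function names and sample headers are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified fingerprint key: ignores case, punctuation, spacing and word order."""
    s = value.strip().lower()
    # Fold accented characters to their closest ASCII form (e.g. "é" -> "e")
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    # Drop punctuation, keeping letters, digits, underscores and whitespace
    s = re.sub(r"[^\w\s]", "", s)
    # Sort the unique tokens so word order doesn't matter
    return " ".join(sorted(set(s.split())))


def clusters(values):
    """Group values whose fingerprints collide; keep only groups with 2+ members."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Hypothetical headers from templates that should have been identical
headers = ["Date of Sale", "date of sale ", "Sale, Date of", "Région", "Region"]
print(clusters(headers))
```

Each cluster is only a suggestion. Refine asks you to confirm a merge before applying it, which is a sensible habit to keep.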

Marketing Mix Lab: Visualising The Correlation Matrix

Following on from the previous post, here is an R function for visualising the correlations between the explanatory variables in your data set. An interesting example is the North Carolina Crime data set that comes with the plm package. This has the following continuous variables:

crmrte: crimes committed per person
prbarr: probability of arrest
prbarr: probability …
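The post's function is written in R. As a rough illustration of what such a plot displays, here is a small Python/NumPy sketch. It computes the Pearson correlation matrix with numpy.corrcoef and lists the pairs of variables whose absolute correlation passes a threshold. The column names, toy data and threshold are hypothetical and are not taken from the Crime data set.

```python
import numpy as np


def correlation_matrix(columns):
    """Pearson correlation matrix for a dict of equal-length numeric columns."""
    names = list(columns)
    data = np.array([columns[name] for name in names], dtype=float)
    # np.corrcoef treats each row as one variable by default
    return names, np.corrcoef(data)


def strong_pairs(columns, threshold=0.8):
    """List (name_a, name_b, r) for pairs with |r| >= threshold."""
    names, corr = correlation_matrix(columns)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only
            r = float(corr[i, j])
            if abs(r) >= threshold:
                pairs.append((names[i], names[j], round(r, 3)))
    return pairs


# Hypothetical explanatory variables
cols = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 4, 3, 2, 1],
    "d": [1, 3, 2, 5, 4],
}
print(strong_pairs(cols, threshold=0.9))
```

The full matrix from correlation_matrix is the grid a heatmap-style plot colours in. The list of strong pairs is a quick text summary of the same information.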

R: Subsetting a list based on a condition

Here's a handy couple of lines of code for subsetting a list in R to just those elements that meet a certain condition. Here's an example that returns only the elements of a list that are of a certain class. Thanks to this Stack Overflow answer.
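The same idea in Python is a one-line comprehension, if you treat an R named list as a dict. This is an analogue for comparison, not the R code from the Stack Overflow answer. The function names and sample data are hypothetical.

```python
def keep_by_type(named_list, cls):
    """Keep only the entries whose value is an instance of cls (or tuple of classes)."""
    return {name: value for name, value in named_list.items() if isinstance(value, cls)}


def keep_if(named_list, predicate):
    """General form: keep entries whose value satisfies an arbitrary condition."""
    return {name: value for name, value in named_list.items() if predicate(value)}


mixed = {"a": 1, "b": "x", "c": [1, 2], "d": 2.5, "e": True}

print(keep_by_type(mixed, str))
# Note: bool is a subclass of int in Python, so True passes an int check
print(keep_by_type(mixed, (int, float)))
print(keep_if(mixed, lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)))
```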

EC2 Tutorials: Installing new software; yum, pip, easy_install, sudo-apt

For anyone familiar with Python and easy_install: Amazon Linux uses "yum" as its package manager, and you can use it to install "pip" and "easy_install" for adding new Python packages. While trying to install new software on my box, I've found lots and lots of references to sudo apt-get (the Debian/Ubuntu package manager, which Amazon Linux doesn't include) as the standard way to install

#sherlock & the power of the retweet

Much has been made over the last few days of Sherlock writer Steven Moffat's views on people who tweet whilst watching TV. Whilst watching it last night, I kept an eye on the tweets during the show and there was clearly a lot of volume going through the Twitter-sphere.

Marketing Mix Lab: Multicollinearity and Ridge Regression

In marketing mix modelling, you have to be very lucky not to run into problems with multicollinearity. It's in the nature of marketing campaigns that everything tends to happen at once: the TV is supported by radio, and both are timed to coincide with the relaunch of the website. One of the techniques that is often
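The excerpt stops before describing the post's approach. What follows is only a general sketch of textbook ridge regression, not the code from the post. Ridge adds a penalty λ‖β‖² to the least-squares loss. On centred data this gives the closed-form estimate β̂ = (XᵀX + λI)⁻¹Xᵀy. The penalty keeps the estimates stable when XᵀX is close to singular, as it is when TV and radio move together. The spend figures below are made up, the response is generated without noise, and the function name ridge is hypothetical. Predictors are centred so the intercept isn't penalised. In practice you would usually standardise them as well, which this sketch skips.

```python
import numpy as np


def ridge(X, y, lam):
    """Closed-form ridge regression; the intercept is left unpenalised by centring."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    # Solve (Xc'Xc + lam*I) beta = Xc'yc rather than forming an explicit inverse
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


# Hypothetical weekly spend where radio closely tracks TV
tv = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
radio = np.array([1.1, 1.9, 3.0, 4.1, 4.9, 6.0])
X = np.column_stack([tv, radio])
y = 1.0 + 3.0 * tv + 2.0 * radio  # synthetic, noise-free response

for lam in (0.0, 1.0, 10.0):
    b0, b = ridge(X, y, lam)
    print(f"lambda={lam:5.1f}  intercept={b0:6.3f}  coefs={np.round(b, 3)}")
```

With lam = 0 this is ordinary least squares. As lam grows, the coefficients shrink towards zero, trading a little bias for much lower variance when the predictors are nearly collinear.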

Making an R Package: Not as hard as you think

I've been writing functions in R for a while to do various things like talking to APIs, web scraping, model testing and data visualisation (basically things that can get a bit repetitive!), but I've always been slightly intimidated by the idea of turning those functions into a package, which I could then load using library(package-name).

