A valuable tool in choice modelling is the Dirichlet-multinomial distribution. It’s a compound of the multinomial and Dirichlet distributions and it works like this: A choice between N options is modelled as a multinomial distribution with parameters θ1, θ2, θ3 … θN, where the thetas also represent the probabilities of each option being chosen. For … Continue reading
It is still quite common to hear the career progression of an analyst described as one upwards from the hard graft of coding and “getting your hands dirty” towards the enviable heights of people management and strategic thinking. Whenever I hear this it reminds me of the book Conspicuous Consumption by the American economist Thorstein … Continue reading
There are so many flavours of everything and things are changing so quickly that I find every task researched online ends up being a set of instructions stitched together from several blogs and forums. Here’s a couple of recent ones. Ubuntu on AWS (50 mins) Was going to buy a new laptop but it made … Continue reading
I can’t claim to be anything near an expert on D3 (a JavaScript library for data visualisation) but being both greedy and lazy I wondered if I could get some nice results with minimum effort. In any case the hardest thing about D3 for a novice to the world of web design seems to be … Continue reading
This one came about because I was searching for a data set on horror films (don’t ask) and ended up with one describing the links between philosophers. To cut a long story very short I’ve extracted the information in the influenced by section for every philosopher on Wikipedia and used it to construct a network … Continue reading
We quite regularly use genetic algorithms to optimise over the ad-hoc functions we develop when trying to solve problems in applied mathematics. However it’s a bit disconcerting to have your algorithm roam through a high dimensional solution space while not being able to picture what it’s doing or how close one solution is to another. … Continue reading
This is a very quick post just to share a quick tip on how to add non overlapping labels to a scatterplot in ggplot using a great package called directlabels. The trick is to make each point a single member group using an aesthetic like colour and then apply the direct.label function with the first.qp … Continue reading
Following on from the previous post here is an R function for visualising correlations between the explanatory variables in your data set. An interesting example is the North Carolina Crime data set that comes with the plm package. This has the following continuous variables: crmrte crimes committed per person prbarr probability of arrest prbarr probability … Continue reading
In marketing mix modelling you have to be very lucky not to run into problems with multicollinearity. It’s in the nature of marketing campaigns that everything tends to happen at once: the TV is supported by radio, both are timed to coincide with the relaunch of the website. One of the techniques that is often … Continue reading
Our statistics lecturers would often end each session with a demonstration of the power of the statistical model under discussion. This would usually mean generating some artificial data and showing how good the tool was at recovering the parameters or correctly classifying the observations. It was highly artificial but had a very useful feature: you … Continue reading