
Marketing Mix Lab: Generating Artificial Sales Data

Our statistics lecturers would often end each session with a demonstration of the power of the statistical model under discussion. This would usually mean generating some artificial data and showing how good the tool was at recovering the parameters or correctly classifying the observations. It was highly artificial but had a very useful feature: you knew the true mechanism behind the data so you could see how good your model was at getting at the truth.

We work with marketing data, building models to understand the effect of marketing activity on sales. Of course here, as in any real world situation, we don’t know which mechanism generated the data (that’s what we are trying to find out). But we can get an idea of how good our tools are by testing them out on artificial data in the way we described above. If they don’t work here in these highly idealised situations then we ought to be concerned.

In this series I’m going to take some very simple simulated data sets and look at how well some of the best-known marketing mix modelling techniques do at getting back to the true values. I will start by looking at LSDV (Least Squares Dummy Variables) models and then move on to mixed effects and Bayesian modelling.

There’s one other thing worth mentioning before we get started. With our simulated data sets we are able to turn the usual situation on its head and vary the data set rather than the modelling approach. This means we can ask questions like: under what conditions do our models work best?
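As a minimal sketch of this idea, we can hold the model fixed (an ordinary `lm()` fit) and vary only the noise level, recording how well the TV coefficient is recovered. This assumes a simplified version of the data-generating process described below. The helper `simulate_sales()` is hypothetical: it uses two TV bursts rather than seven and computes adstock with `stats::filter()`.

```r
# Hypothetical helper, simplified from the data-generating process in this post:
# linear trend + sine-wave temperature + TV adstock (form 1) + normal noise.
simulate_sales <- function(error_std, n_weeks = 260, base = 1000, trend = 0.8,
                           season = 4, ad = 30, dim = 100, dec = 0.3) {
  week <- seq_len(n_weeks)
  temp <- 10 * sin(week * pi / 26)                 # one full cycle every 52 weeks
  grps <- rep(0, n_weeks)
  grps[20:25] <- c(390, 250, 100, 80, 120, 60)     # two illustrative bursts
  grps[150:155] <- c(500, 200, 200, 100, 120, 120)
  # adstock_t = 1 - exp(-GRPs_t / dim) + dec * adstock_{t-1}
  adstock <- as.numeric(stats::filter(1 - exp(-grps / dim), dec, method = "recursive"))
  sales <- base + trend * week + season * temp + ad * adstock +
    rnorm(n_weeks, mean = 0, sd = error_std)
  data.frame(sales, week, temp, adstock)
}

set.seed(42)
noise_levels <- c(1, 5, 25, 100)
results <- t(sapply(noise_levels, function(s) {
  # Same model every time; only the data set changes
  fit <- lm(sales ~ week + temp + adstock, data = simulate_sales(error_std = s))
  c(error_std = s,
    ad_estimate = unname(coef(fit)["adstock"]),
    ad_std_error = summary(fit)$coefficients["adstock", "Std. Error"])
}))
print(round(results, 3))
```

Because the regressors are identical in every run, the standard error of the TV coefficient should grow roughly in proportion to `error_std`. The same loop could just as easily vary the TV coefficient, the carry-over or the number of campaigns.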

Building an artificial data set

Our world will be very simple. Weekly sales will follow an overall linear trend to which we will add an annual seasonal cycle which we imagine to be a function of temperature (simulated using a sine wave). On top of that we need some marketing activity which we will add as TV adstock. Finally we will add some noise by simulating from a normal distribution. The final data generating equation looks like this:

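$$\text{sales}_t = \alpha + \theta_1\,\text{week}_t + \theta_2\,\text{temp}_t + \theta_3\,\text{adstock}_t + \epsilon_t$$

where $\epsilon_t \sim N(0, \sigma^2)$

and adstock is defined recursively as

$$\text{adstock}_t = 1 - e^{-\text{GRPs}_t/\phi} + \lambda\,\text{adstock}_{t-1}$$

Here $\phi$ controls how quickly the response to GRPs saturates and $\lambda$ is the proportion of last week's adstock carried over into this week. In the R function below, $\alpha$, $\theta_1$, $\theta_2$, $\theta_3$, $\phi$, $\lambda$ and $\sigma$ correspond to the arguments base_p, trend_p, season_p, ad_p, dim, dec and error_std.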
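The small sketch below makes the recursion concrete. The function names `adstock_loop()` and `adstock_filter()` are illustrative and do not come from the original code. The first function applies the formula week by week, assuming zero adstock before the first week. The second shows that the same recursion is a first-order recursive filter on the saturated GRPs, which base R's `stats::filter()` can compute in a single call.

```r
# Week-by-week version: adstock_t = 1 - exp(-GRPs_t / phi) + lambda * adstock_{t-1}
adstock_loop <- function(grps, phi, lambda) {
  adstock <- numeric(length(grps))
  prev <- 0  # assumption: no adstock before the first week
  for (i in seq_along(grps)) {
    adstock[i] <- 1 - exp(-grps[i] / phi) + lambda * prev
    prev <- adstock[i]
  }
  adstock
}

# Vectorised version: y_t = x_t + lambda * y_{t-1}, with x_t = 1 - exp(-GRPs_t / phi)
adstock_filter <- function(grps, phi, lambda) {
  as.numeric(stats::filter(1 - exp(-grps / phi), lambda, method = "recursive"))
}

# A single 100-GRP week followed by three weeks off air
print(round(adstock_loop(c(100, 0, 0, 0), phi = 100, lambda = 0.5), 4))
# → [1] 0.6321 0.3161 0.1580 0.0790
```

With $\lambda = 0.5$ each week keeps half of the previous week's adstock, so a single burst leaves a geometric tail whose total is $(1 - e^{-\text{GRPs}/\phi})/(1 - \lambda)$, about 1.26 here.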

I have generated this data set in R (we will use R throughout – if you are unfamiliar with this language please see the R homepage).

It would also be nice if we could vary the parameters to generate different sets of data, so I have created the whole thing as an R function with the parameters as arguments.

# *--------------------------------------------------------------------
# | FUNCTION: create_test_sets
# | Creates simple artificial marketing mix data for testing code and
# | techniques
# *--------------------------------------------------------------------
# | Version |Date      |Programmer  |Details of Change
# |     01  |29/11/2011|Simon Raper |first version.
# *--------------------------------------------------------------------
# | INPUTS:  base_p         Number of base sales
# |          trend_p        Increase in sales for every unit increase
# |                         in time
# |          season_p       The seasonality effect will be
# |                         season_p*temp where -10<temp<10
# |          ad_p           The coefficient for the adstock
# |          dim            The dim parameter in adstock (see below)
# |          dec            The dec parameter in adstock (see below)
# |          adstock_form   If 1 then the form is:
# |                         ad_p*(1-exp(-GRPs/dim)+dec*adstock_t-1)
# |                         If 2 then the form is:
# |                         ad_p*(1-exp(-(GRPs+dec*GRPs_t-1)/dim))
# |                         Default is 1.
# |          error_std      Standard deviation of the noise
# *--------------------------------------------------------------------
# | OUTPUTS: dataframe      Consists of sales, temp, tv_grps, week,
# |                         adstock
# |
# *--------------------------------------------------------------------
# | USAGE:   create_test_sets(base_p,
# |                           trend_p,
# |                           season_p,
# |                           ad_p,
# |                           dim,
# |                           dec,
# |                           adstock_form,
# |                           error_std)
# |
# *--------------------------------------------------------------------
# | DEPENDS: None
# |
# *--------------------------------------------------------------------
# | NOTES:   Usually the test will consist of trying to predict sales
# |          using temp, tv_grps, week and recover the parameters.
# |
# *--------------------------------------------------------------------

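#Adstock functions

adstock_calc_1<-function(media_var,dec,dim){
  #Form 1: saturated GRPs plus a share (dec) of last week's adstock.
  #Week 1 has no previous adstock, so the carry-over is taken as zero.
  n<-length(media_var)
  adstock<-rep(0,n)
  for(i in seq_len(n)){
    prev<-if (i>1) adstock[i-1] else 0
    adstock[i]<-1-exp(-media_var[i]/dim)+dec*prev
  }
  adstock
}

adstock_calc_2<-function(media_var,dec,dim){
  #Form 2: this week's GRPs plus a share (dec) of last week's GRPs, then saturated.
  #Week 1 has no previous GRPs, so the lagged term is taken as zero.
  n<-length(media_var)
  adstock<-rep(0,n)
  for(i in seq_len(n)){
    prev<-if (i>1) media_var[i-1] else 0
    adstock[i]<-1-exp(-(media_var[i]+dec*prev)/dim)
  }
  adstock
}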

#Function for creating test sets

create_test_sets<-function(base_p, trend_p, season_p, ad_p, dim, dec, adstock_form=1, error_std){

  #National level model

  #Five years of weekly data
  week<-1:(5*52)

  #Base sales of base_p units
  base<-rep(base_p,5*52)

  #Trend of trend_p extra units per week
  trend<-trend_p*week

  #Winter is season_p*10 units below, summer is season_p*10 units above
  temp<-10*sin(week*pi/26)  #one full cycle every 52 weeks
  seasonality<-season_p*temp

  #7 TV campaigns. Carry-over is dec (lambda), saturation is dim (phi), coefficient is ad_p (theta_3)
  tv_grps<-rep(0,5*52)
  tv_grps[20:25]<-c(390,250,100,80,120,60)
  tv_grps[60:65]<-c(250,220,100,100,120,120)
  tv_grps[100:103]<-c(100,80,60,100)
  tv_grps[150:155]<-c(500,200,200,100,120,120)
  tv_grps[200:205]<-c(250,120,200,100,120,120)
  tv_grps[220:223]<-c(100,100,80,60)
  tv_grps[240:245]<-c(350,290,100,100,120,120)

  if (adstock_form==2){
    adstock<-adstock_calc_2(tv_grps, dec, dim)
  } else {
    adstock<-adstock_calc_1(tv_grps, dec, dim)
  }
  TV<-ad_p*adstock

  #Error has a standard deviation of error_std
  error<-rnorm(5*52, mean=0, sd=error_std)

  #Full series
  sales<-base+trend+seasonality+TV+error

  #Plot
  #plot(sales, type='l', ylim=c(0,1200))

  output<-data.frame(sales, temp, tv_grps, week, adstock)

  output

}
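The two adstock_form options behave quite differently after a burst. The sketch below feeds a single 300-GRP week through both forms with dec = 0.3 and dim = 100. It uses vectorised re-statements of the two forms, and the names `adstock_form_1()` and `adstock_form_2()` are illustrative. Form 1 carries a fraction dec of the whole adstock forward each week, so the effect decays geometrically but never quite reaches zero. Form 2 only looks back one week, so the effect stops entirely two weeks after the burst.

```r
# Form 1: saturate this week's GRPs, then carry over dec * last week's adstock
adstock_form_1 <- function(grps, dec, dim) {
  as.numeric(stats::filter(1 - exp(-grps / dim), dec, method = "recursive"))
}

# Form 2: add dec * last week's GRPs before saturating (zero GRPs assumed before week 1)
adstock_form_2 <- function(grps, dec, dim) {
  lagged <- c(0, head(grps, -1))
  1 - exp(-(grps + dec * lagged) / dim)
}

burst <- c(0, 300, 0, 0, 0, 0)
comparison <- data.frame(
  week   = seq_along(burst),
  form_1 = round(adstock_form_1(burst, dec = 0.3, dim = 100), 3),
  form_2 = round(adstock_form_2(burst, dec = 0.3, dim = 100), 3)
)
print(comparison)
```

In weeks 2 to 6, form 1 gives roughly 0.950, 0.285, 0.086, 0.026 and 0.008. Form 2 gives 0.950 and 0.593, then zeros. Form 2's single week of carry-over is larger because the lagged GRPs are added before saturation.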

The code below generates a simulated sales series with the following parameters and plots it as a line graph:

#Example
set.seed(2011)  #fix the random noise so the example is reproducible
test<-create_test_sets(base_p=1000,
                       trend_p=0.8,
                       season_p=4,
                       ad_p=30,
                       dim=100,
                       dec=0.3,
                       adstock_form=1,
                       error_std=5)

library(ggplot2)
#Plot the simulated sales (linewidth needs ggplot2 >= 3.4.0; older versions use size)
ggplot(data=test, aes(x=week, y=sales))+geom_line(linewidth=1)+ggtitle("Simulated Sales Data")
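As the NOTES in the function header suggest, the usual test is to regress sales on week, temp and the adstock and see whether the parameters come back. Below is a minimal self-contained sketch of that check using the same parameter values as the example. It optimistically assumes the true dim and dec are known, so the adstock can be used directly as a regressor. The helper `simulate_sales()` is a hypothetical simplification of create_test_sets() with two TV bursts.

```r
# Hypothetical simplified generator (two bursts instead of seven, adstock form 1)
simulate_sales <- function(n_weeks = 260, base = 1000, trend = 0.8, season = 4,
                           ad = 30, dim = 100, dec = 0.3, error_std = 5) {
  week <- seq_len(n_weeks)
  temp <- 10 * sin(week * pi / 26)
  grps <- rep(0, n_weeks)
  grps[20:25] <- c(390, 250, 100, 80, 120, 60)
  grps[150:155] <- c(500, 200, 200, 100, 120, 120)
  adstock <- as.numeric(stats::filter(1 - exp(-grps / dim), dec, method = "recursive"))
  sales <- base + trend * week + season * temp + ad * adstock +
    rnorm(n_weeks, mean = 0, sd = error_std)
  data.frame(sales, week, temp, adstock)
}

set.seed(2011)
sim <- simulate_sales()
fit <- lm(sales ~ week + temp + adstock, data = sim)

# Compare the true values with the estimates and their 95% confidence intervals
truth <- c("(Intercept)" = 1000, week = 0.8, temp = 4, adstock = 30)
recovery <- cbind(truth,
                  estimate = coef(fit)[names(truth)],
                  confint(fit)[names(truth), ])
print(round(recovery, 3))
```

In this idealised setting the intervals should usually cover the true values. The harder case is when dim and dec are unknown and the adstock must be estimated as well.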
 
 

 

I’ve found these simulated data sets useful not only for experiments but also for debugging code (since we know exactly what to expect from them) and as toy examples to give to trainee analysts as templates for future models.
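Because the generating mechanism is known exactly, a noise-free data set makes a handy regression test for modelling code. Here is a minimal sketch, again assuming a hypothetical `simulate_sales()` helper simplified from create_test_sets(); `check_recovery()` is likewise illustrative. With error_std = 0 the sales series is an exact linear function of week, temp and adstock. In that case `lm()` should return the true parameters to within floating-point error, and a larger discrepancy points to a bug in the pipeline rather than to anything statistical.

```r
# Hypothetical simplified generator (two bursts instead of seven, adstock form 1)
simulate_sales <- function(error_std, n_weeks = 260, base = 1000, trend = 0.8,
                           season = 4, ad = 30, dim = 100, dec = 0.3) {
  week <- seq_len(n_weeks)
  temp <- 10 * sin(week * pi / 26)
  grps <- rep(0, n_weeks)
  grps[20:25] <- c(390, 250, 100, 80, 120, 60)
  grps[150:155] <- c(500, 200, 200, 100, 120, 120)
  adstock <- as.numeric(stats::filter(1 - exp(-grps / dim), dec, method = "recursive"))
  sales <- base + trend * week + season * temp + ad * adstock +
    rnorm(n_weeks, mean = 0, sd = error_std)  # sd = 0 gives exactly zero noise
  data.frame(sales, week, temp, adstock)
}

# TRUE if the fitted coefficients match the known truth to within tol
check_recovery <- function(fit, truth, tol = 1e-6) {
  est <- coef(fit)[names(truth)]
  isTRUE(all.equal(unname(est), unname(truth), tolerance = tol))
}

sim <- simulate_sales(error_std = 0)
fit <- lm(sales ~ week + temp + adstock, data = sim)
truth <- c("(Intercept)" = 1000, week = 0.8, temp = 4, adstock = 30)
print(check_recovery(fit, truth))
# → [1] TRUE
```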

With marketing mix models we often work with hierarchical data (e.g. sales in stores in regions). In the next post I will provide some code to build regional data sets. Following that we will get to work on the modelling.


About simonraper

I am an RSS accredited statistician with over 15 years’ experience working in data mining and analytics and many more in coding and software development. My specialities include machine learning, time series forecasting, Bayesian modelling, market simulation and data visualisation. As Data Scientist at Channel 4 my role is to develop machine learning solutions that allow the channel to build a deeper relationship with the viewer and innovate the way advertising is traded and work on supporting the creative side of the business. My current interests are in scalable machine learning (Mahout, Hadoop), interactive visualisations (D3 and similar) and applying the methods of agile software development to analytics. I have worked for Mindshare, News International, Credit Suisse and AOL. I am co-author with Mark Bulling of Drunks and Lampposts, a blog on computational statistics, machine learning, data visualisation, R, python and cloud computing. It has had over 270K visits and was mentioned in Flowing Data, io9, and the online editions of The New York Times and The New Yorker.

Discussion

5 thoughts on “Marketing Mix Lab: Generating Artificial Sales Data”

  1. Fantastic stuff, can't wait to see more! I have no experience in market mix modeling but do in CRM/database marketing and looking to expand my skill set from modeling that sort of environment to this. Happy I stumbled upon your blog!

    Posted by Jeff | February 8, 2012, 4:11 pm
    • Thanks Jeff. That’s very nice to hear. I’ve posted a couple more items on this subject (there’s one on ridge regression and one on visualising multi-collinearity) and I hope to add some more soon. Good luck with expanding your skills into market mix modelling.

      Cheers

      Simon

      Posted by simonraper | February 10, 2012, 8:18 am
  2. Hey Simon, I am wondering if you can recommend any good texts to learn market mix modeling? Also, can you recommend a technique to use – do you typically use Arima with regressors or gls for this? Any recommendations for learning is appreciated. Thanks!

    Posted by Jeff | July 19, 2012, 11:59 pm
  3. Hi, thanks for the nice article. Have you created any R package to implement the MMM?

    Posted by Hung Ta | November 18, 2013, 1:09 pm
