Figure 5.3 appears to be total log revenue rather than average, conflicts with text; model uses mean(log(revenue)) rather than log(mean(revenue)) #17

shane-kercheval · 2020-12-28T18:10:22Z

Pg 143/144 & https://github.com/TaddyLab/BDS/blob/master/examples/paidsearch.R

Text:

Figure 5.3 shows the log difference between average revenues in each group.

Caption:

The log-scale average revenue difference ..

In the code, however, both plots use totalrev and are created before semavg is defined.

The total and average log differences produce the same pattern on different scales, but the mismatch initially confused me as I walked through the code/example.

Relatedly, let's assume the graphs plot the mean instead of the total, so that they match the model.

The graphs first take the average (or the total, in the current code) and then take the log of that average, i.e. log(mean(revenue)).

The model uses y from semavg, which takes the log first and then the mean. In the code, y is defined as y=mean(log(revenue)).

Whether we use sum or mean in the model, it seems like we would want to take the log after aggregating. This seems especially true if we were going to use sum rather than mean.
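To make the order-of-operations point concrete, here is a short sketch of why the two versions cannot agree in general. The notation is mine, not the book's: r_1, ..., r_n are the positive daily revenues in one dma/period cell.

```latex
% Jensen's inequality for the concave function log, with r_i > 0:
\frac{1}{n}\sum_{i=1}^{n}\log r_i
  \;\le\;
\log\!\Big(\frac{1}{n}\sum_{i=1}^{n} r_i\Big),
% with equality only when r_1 = r_2 = \dots = r_n.
%
% Equivalently, mean(log(revenue)) is the log of the geometric mean:
\frac{1}{n}\sum_{i=1}^{n}\log r_i
  \;=\;
\log\!\Big(\prod_{i=1}^{n} r_i\Big)^{1/n}.
```

So mean(log(revenue)) models the log of the geometric mean of daily revenue, while log(mean(revenue)) models the log of the arithmetic mean, which is what the text and figure caption describe.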

Original Code (mean(log(revenue)))

library(data.table)
sem <- as.data.table(sem)
sem_avg_log <- sem[, 
			list(d=mean(1-search.stays.on), y=mean(log(revenue))), 
			by=c("dma","treatment_period")]
setnames(sem_avg_log, "treatment_period", "t") # names to match slides
sem_avg_log <- as.data.frame(sem_avg_log)
coef(glm(y ~ d*t, data=sem_avg_log))['d:t']

gives -0.006586852

log(mean(revenue)):

sem_log_avg <- sem[, 
			list(d=mean(1-search.stays.on), y=log(mean(revenue))), 
			by=c("dma","treatment_period")]
setnames(sem_log_avg, "treatment_period", "t") # names to match slides
sem_log_avg <- as.data.frame(sem_log_avg)
coef(glm(y ~ d*t, data=sem_log_avg))['d:t']

gives -0.005775498

If we were to use sum rather than mean and then take the log, i.e. log(sum(revenue)):

sem_log_sum <- sem[, 
			list(d=mean(1-search.stays.on), y=log(sum(revenue))), 
			by=c("dma","treatment_period")]
setnames(sem_log_sum, "treatment_period", "t") # names to match slides
sem_log_sum <- as.data.frame(sem_log_sum)
coef(glm(y ~ d*t, data=sem_log_sum))['d:t']

gives -0.005775498, which is the same as log(mean(revenue))
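A possible explanation for the identical coefficient, as a sketch: it assumes every dma has the same number of daily rows within a given period (call it n_t, my notation), which I have not verified directly in the data.

```latex
% Within one dma/period cell with n_t rows:
\log\Big(\sum_{i=1}^{n_t} r_i\Big)
  = \log n_t + \log\Big(\frac{1}{n_t}\sum_{i=1}^{n_t} r_i\Big).
%
% With t \in \{0,1\}, the extra term depends on t only:
\log n_t = \log n_0 + t\,\log\frac{n_1}{n_0}.
%
% In the model y = \beta_0 + \beta_1 d + \beta_2 t + \gamma\, d\,t + \varepsilon,
% switching y from log(mean) to log(sum) therefore shifts
% \beta_0 by \log n_0 and \beta_2 by \log(n_1/n_0),
% and leaves \beta_1 and the interaction \gamma unchanged.
```

If the number of rows per cell varied across dmas, the log n term would no longer be absorbed by the intercept and the t main effect, and the two d:t estimates could differ.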

If we were to do sum(log(revenue)), which would clearly be wrong because the control is a larger group, then we'd get -0.2534986...
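For anyone who wants to compare the four aggregations without the book's data, here is a minimal base-R sketch on simulated data. Everything in it is hypothetical (the toy panel, the helper did_coef, the effect size); it is not the eBay data and will not reproduce the numbers above. It only illustrates that log(sum) and log(mean) give the same d:t coefficient when every dma has the same number of rows per period, while mean(log) and sum(log) generally do not match them.

```r
set.seed(1)

# Hypothetical toy panel: 20 DMAs, 10 pre-period days and 5 post-period days each
n_dma <- 20
panel <- expand.grid(dma = 1:n_dma, day = 1:15)
panel$t <- as.integer(panel$day > 10)   # 1 in the post period
panel$d <- as.integer(panel$dma <= 8)   # DMAs 1-8 are "treated"

# Lognormal daily revenue with a DMA-specific scale and a small
# (made-up) negative treatment effect in the post period
scale_dma <- exp(rnorm(n_dma, mean = 10, sd = 1))
noise <- rnorm(nrow(panel), sd = 0.3)
panel$revenue <- scale_dma[panel$dma] * exp(noise - 0.05 * panel$d * panel$t)

# Aggregate revenue within each dma/period cell using `agg`,
# then return the difference-in-differences interaction coefficient
did_coef <- function(agg) {
  cell <- aggregate(revenue ~ dma + t + d, data = panel, FUN = agg)
  unname(coef(lm(revenue ~ d * t, data = cell))["d:t"])
}

est <- c(
  mean_log = did_coef(function(x) mean(log(x))),
  log_mean = did_coef(function(x) log(mean(x))),
  log_sum  = did_coef(function(x) log(sum(x))),
  sum_log  = did_coef(function(x) sum(log(x)))
)
print(est)
```

The sketch uses lm rather than glm; with the default Gaussian family the two give the same coefficients.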

Is there a reason we should specifically use mean(log(revenue)) rather than log(mean(revenue))?


mataddy · 2021-02-08T04:35:39Z

@shane-kercheval many thanks for this. Apologies for the delayed reply; I'm revising the book for a new edition (it will be pretty cool; we're making it into a much more readable full-service text) and I just got to this section for revision.

You are absolutely correct! I got the order of operations mixed up and the plot descriptions also. The fixed analysis script is:

```r
library(data.table)
# average revenue by DMA and period (ps is assumed to be the paid-search data, loaded earlier)
ebay <- as.data.table(ps)
ebay <- ebay[, list(ssm.turns.off=mean(1-search.stays.on),
                    revenue=mean(revenue)),
             by=c("dma","treatment_period")]
setnames(ebay, "treatment_period", "post.treat")
ebay <- as.data.frame(ebay)
head(ebay)

# run the DiD analysis: the log is now taken after averaging
did <- glm(log(revenue) ~ ssm.turns.off*post.treat, data=ebay)
coef(did)

# standard errors clustered by DMA
library(sandwich)
library(lmtest)
coeftest(did, vcov=vcovCL(did, cluster=ebay$dma))
```

Results are exactly as you describe. I've also attached an updated draft section in case you are curious. Comments/errata are highly welcome :-)
draft.pdf

joshualeond · 2021-04-30T21:04:00Z

@mataddy, maybe not the best place for this question but do you have an idea of when the 2nd edition could be released? Thanks!

mataddy · 2021-05-01T00:20:34Z

Soon! Chapters are with production now. I'd say summer if I'm optimistic.

joshualeond · 2021-11-11T13:22:20Z

I'm at risk of being a bit obnoxious here on Github but was curious if there was an update on the 2nd edition?

mataddy · 2021-11-11T14:07:49Z

Soon! It's with production, so hopefully early 2022.
