Figure 5.3 shows the log difference between average revenues in each group.
Caption:
The log-scale average revenue difference ..
However, in the code, both plots use totalrev and are created before semavg is defined.
The total and average log differences produce the same pattern on different scales, but this initially confused me as I walked through the code/example.
Relatedly, let's assume the graphs plot the mean instead of the total, so that they match the model.
The graphs first take the average (or total in the current code) and then take the log of the average (i.e. log(mean(revenue))).
The model uses y from semavg, which takes the log and then the mean. In the code, y is defined as y=mean(log(revenue)).
Whether we use sum or mean in the model, it seems like we would want to take the log after aggregating. This seems especially true if we were going to use sum rather than mean.
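To make the order-of-operations point concrete, here is a minimal sketch with made-up revenue values (not the sem data). It only illustrates that the two orders give different numbers, and that log(sum) and log(mean) differ by the constant log(n); the variable names are my own.

```r
# Made-up revenue values for a single DMA-period (illustration only)
revenue <- c(100, 200, 400)

log_of_mean <- log(mean(revenue))  # aggregate first, then take the log
mean_of_log <- mean(log(revenue))  # take the log first, then aggregate
log_of_sum  <- log(sum(revenue))   # total first, then take the log

# mean(log(x)) is the log of the geometric mean, so it can never exceed
# log(mean(x)); the two agree only when all values are equal
print(c(log_of_mean = log_of_mean, mean_of_log = mean_of_log))

# log(sum(x)) - log(mean(x)) is exactly log(n), a constant shift
print(log_of_sum - log_of_mean)
print(log(length(revenue)))
```

So sum versus mean only shifts the level when the log is taken last, while taking the log first changes what is being averaged.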
Original Code (mean(log(revenue)))
library(data.table)
sem <- as.data.table(sem)
sem_avg_log <- sem[,
list(d=mean(1-search.stays.on), y=mean(log(revenue))),
by=c("dma","treatment_period")]
setnames(sem_avg_log, "treatment_period", "t") # names to match slides
sem_avg_log <- as.data.frame(sem_avg_log)
coef(glm(y ~ d*t, data=sem_avg_log))['d:t']
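For comparison, the same regression can be run with each aggregation order. The sketch below uses a small synthetic stand-in for sem (random data, not the book's), and the names sem_fake and did_coef are hypothetical helpers of mine, so the coefficients it prints are not the book's results. It is only meant to show how the variants differ mechanically.

```r
library(data.table)

set.seed(1)
# Synthetic stand-in for `sem` (NOT the real data): 6 DMAs, 2 periods, 10 days each
sem_fake <- CJ(dma = 1:6, treatment_period = 0:1, day = 1:10)
sem_fake[, search.stays.on := as.integer(dma <= 4)]  # assume DMAs 5-6 have search turned off
sem_fake[, revenue := exp(rnorm(.N, mean = 10 + 0.1 * dma, sd = 0.5))]

# Aggregate revenue per DMA-period with `agg`, then return the d:t coefficient
did_coef <- function(agg) {
  avg <- sem_fake[, list(d = mean(1 - search.stays.on), y = agg(revenue)),
                  by = c("dma", "treatment_period")]
  setnames(avg, "treatment_period", "t")
  unname(coef(glm(y ~ d * t, data = as.data.frame(avg)))["d:t"])
}

b_mean_log <- did_coef(function(r) mean(log(r)))  # log, then mean
b_log_mean <- did_coef(function(r) log(mean(r)))  # mean, then log
b_log_sum  <- did_coef(function(r) log(sum(r)))   # sum, then log

print(c(mean_log = b_mean_log, log_mean = b_log_mean, log_sum = b_log_sum))
```

Because log(sum) = log(mean) + log(n), and every DMA here has the same number of days within a period, the log(n) term is absorbed by the intercept and t terms, so the log(sum) and log(mean) interaction coefficients coincide. The mean(log) variant generally gives a different value.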
@shane-kercheval many thanks for this. Apologies for the delayed reply, I'm revising the book for a new edition (will be pretty cool; we're making it into a much more readable full-service text) and I just got to this section for revision.
You are absolutely correct! I got the order of operations mixed up and the plot descriptions also. The fixed analysis script is:
Pg 143/144 & https://github.com/TaddyLab/BDS/blob/master/examples/paidsearch.R
Original code, mean(log(revenue)), gives -0.006586852
log(mean(revenue)) gives -0.005775498
If we were to use sum rather than mean and then take the log, i.e. log(sum(revenue)), we get -0.005775498, which is the same as log(mean(revenue)).
If we were to do sum(log(revenue)), which would clearly be wrong because the control is a larger group, then we'd get -0.2534986 ...
Is there a reason we should specifically use mean(log(revenue)) rather than log(mean(revenue))?