In [1]:
%matplotlib inline

# Sparse Estimation:

## Data Sharing and Additive Smoothing

## Motivating Example: Predicting Clicks and Actions

In **Performance Advertising**, we are interested in the following probabilities:
- $p \in (0,1)$: that an _impression_ will result in a _click_
- $q \in (0,1)$: that an _impression_ or _click_ will result in an _install_ or _opt-in_ (in general an _action_)

The maximum likelihood estimates of these are essentially:
$$
\hat{p} = \frac{clicks}{impressions} \quad \text{and} \quad \hat{q} = \frac{actions}{clicks}.
$$
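
As a minimal sketch of these two estimates: the counts below and the `mle_rate` helper are hypothetical, chosen only for illustration, and are not taken from the data.

```python
import math


def mle_rate(successes, trials):
    """Maximum likelihood estimate of a Binomial rate; NaN when there are no trials."""
    if trials == 0:
        return math.nan
    return successes / trials


# Hypothetical counts, for illustration only.
impressions, clicks, actions = 10_000, 120, 6

p_hat = mle_rate(clicks, impressions)  # estimated click-through rate
q_hat = mle_rate(actions, clicks)      # estimated conversion rate
print(p_hat, q_hat)
```

Note that the estimate is undefined when a segment has no trials at all, which is common once the data is sliced finely.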

#### who: `unique_id`, what: `advertiser`, when: `day_of_week`, where: `station_id, zipcode`

## Why is this a difficult problem?

- **statistical heterogeneity:** some stations have a lot of devices listening, some have only a few
- **size of the problem space:** we have tracked 168M+ new devices since 2021-01-01 alone
- **sparsity:** click-through rates (CTR) are at best on the order of $10^{-1}$, with errors of similar magnitude

## Additive Smoothing

### Paid Search User Acquisition: Conversion rates for _opt-in_

- Assume the _actions_ are the successes in **Binomial(clicks, $\theta$)** trials
- Laplace $\epsilon$-smoothing: $$\hat{\theta} = \frac{actions + \epsilon}{clicks + 2\epsilon}, $$ instead of the **MLE**, $\displaystyle \frac{actions}{clicks}$
- with $\epsilon \in \mathbb{R}^{+}$
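- drawback: lacks specificity, since every estimate is pulled toward the same value, $1/2$, whatever the keyword or campaign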
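
A minimal sketch of the smoothing formula above, assuming $\epsilon = 1$ by default. The `smoothed_rate` helper and the keyword counts are hypothetical.

```python
def smoothed_rate(actions, clicks, eps=1.0):
    """Additive (Laplace) smoothing: (actions + eps) / (clicks + 2 * eps)."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    return (actions + eps) / (clicks + 2 * eps)


# Hypothetical (actions, clicks) pairs, from very sparse to well observed.
examples = [(0, 0), (0, 3), (1, 1), (50, 1000)]

for actions, clicks in examples:
    mle = actions / clicks if clicks else float("nan")
    print(actions, clicks, mle, smoothed_rate(actions, clicks))
```

With no clicks at all the smoothed estimate is $1/2$, and with many clicks it approaches the MLE, so $\epsilon$ controls how many observations it takes to move away from $1/2$.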

## Data Sharing

### Paid Search User Acquisition: Conversion rates for _opt-in_

- Estimate of the conversion rate of a keyword $k$: $\displaystyle \hat{q}_k = \frac{actions_k}{clicks_k}$
- Pooling in the counts of the campaign $c$ that the keyword $k$ is a part of: $\displaystyle \hat{q}_{k|c} = \frac{actions_c + actions_k}{clicks_c + clicks_k}$
- The error of using the pooled estimate $\hat{q}_{k|c}$ for the keyword $k$: $$Errors_{(k|c)} = (actions_k - \hat{q}_{k|c} \cdot clicks_k)$$
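
The pooled estimate and its error can be sketched as follows. The function names and the campaign and keyword counts are hypothetical, for illustration only.

```python
def pooled_rate(actions_k, clicks_k, actions_c, clicks_c):
    """Keyword rate estimated from keyword counts plus campaign counts."""
    return (actions_c + actions_k) / (clicks_c + clicks_k)


def pooled_error(actions_k, clicks_k, q_kc):
    """Observed keyword actions minus the actions the pooled rate predicts."""
    return actions_k - q_kc * clicks_k


# Hypothetical counts: a sparse keyword inside a larger campaign.
actions_c, clicks_c = 40, 1000
actions_k, clicks_k = 2, 10

q_kc = pooled_rate(actions_k, clicks_k, actions_c, clicks_c)
err = pooled_error(actions_k, clicks_k, q_kc)
print(q_kc, err)
```

Here the keyword's own MLE is $2/10$, while the pooled estimate stays close to the campaign rate because the campaign contributes most of the clicks.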

## Beta-Binomial Estimation

### Paid Search User Acquisition: Conversion rates for _opt-in_

- Use campaign $c$ level data to fit a beta distribution with parameters $(\alpha_c, \beta_c)$
- Use that beta prior to estimate the keyword $k$ level conversion rate: $$ \hat{q}_{k|\alpha_c, \beta_c} = \frac{\alpha_c + actions_k}{\alpha_c + \beta_c + clicks_k}.$$
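
A minimal sketch of the two steps above. The fitting method is not specified here, so this sketch assumes a method-of-moments fit to keyword-level rates within the campaign. The helper names and the rates are hypothetical.

```python
from statistics import mean, pvariance


def fit_beta_moments(rates):
    """Method-of-moments fit of Beta(alpha, beta) to observed rates (an assumed fitting method)."""
    m = mean(rates)
    v = pvariance(rates)
    if not 0 < v < m * (1 - m):
        raise ValueError("variance must be positive and below m * (1 - m)")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


def beta_binomial_rate(actions_k, clicks_k, alpha, beta):
    """Posterior mean of the keyword rate under a Beta(alpha, beta) prior."""
    return (alpha + actions_k) / (alpha + beta + clicks_k)


# Hypothetical conversion rates of well-observed keywords in campaign c.
campaign_rates = [0.02, 0.04, 0.05, 0.03, 0.06]
alpha_c, beta_c = fit_beta_moments(campaign_rates)

# A sparse keyword: 1 action out of 5 clicks (MLE would be 0.2).
q_k = beta_binomial_rate(1, 5, alpha_c, beta_c)
print(alpha_c, beta_c, q_k)
```

The resulting estimate lies between the prior mean $\alpha_c / (\alpha_c + \beta_c)$ and the keyword's MLE, and moves toward the MLE as the keyword accumulates clicks.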


- The Binomial distribution is a reasonable model for the $p$ and $q$ rates.
- The Beta distribution is the conjugate prior for the Binomial likelihood

- We have done the same for spend, whose `log` seems to be Exponentially distributed
- Using the fact that the Gamma distribution is the conjugate prior for the rate of the Exponential distribution
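
The Gamma-Exponential update can be sketched as below. This assumes a Gamma(shape, rate) prior on the Exponential rate $\lambda$, whose posterior after $n$ observations $x_i$ is Gamma(shape $+\, n$, rate $+ \sum x_i$). The prior parameters and spend values are hypothetical, and all spend values are assumed to exceed 1 so that their logs are positive.

```python
import math


def gamma_exponential_update(shape, rate, observations):
    """Posterior Gamma(shape, rate) for an Exponential rate after seeing observations."""
    return shape + len(observations), rate + sum(observations)


# Hypothetical spend values (all > 1, so log(spend) > 0).
spend = [12.0, 3.5, 40.0, 7.2]
log_spend = [math.log(s) for s in spend]

# Hypothetical prior, e.g. one fitted at a coarser level of aggregation.
prior_shape, prior_rate = 2.0, 4.0

post_shape, post_rate = gamma_exponential_update(prior_shape, prior_rate, log_spend)
lambda_hat = post_shape / post_rate  # posterior mean of the Exponential rate
print(post_shape, post_rate, lambda_hat)
```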

- Campaign is just one level of aggregation!