# Random Coefficients Logit Tutorial - PyBLP


This tutorial follows Section 4 of Aviv Nevo (2000), "A practitioner’s guide to estimation of random‐coefficients logit models of demand," *Journal of Economics & Management Strategy*, 9 (4), 513-548.

The paper shows a possible application of the random-coefficients Logit model. We are going to use the same data and solve the paper’s cereal problem. The data is fake, and should only be used to learn the method.

We will use the PyBLP package for Python 3. Documentation for this package can be found at https://pyblp.readthedocs.io/en/stable/index.html .

### Theory of Random Coefficients Logit
This method retains the benefits of simpler discrete-choice models: it can be estimated using only market-level price and quantity data, and it deals with the endogeneity of prices. Moreover, it returns more realistic demand elasticities than Logit/Nested Logit models.
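To see why plain Logit elasticities are restrictive, the sketch below computes them for a made-up market with three products. It assumes a common price coefficient `alpha` (negative, entering utility as `alpha * price`), in which case the own-price elasticity is $\alpha p_j (1 - s_j)$ and the cross-price elasticity of product $j$ with respect to the price of product $k$ is $-\alpha p_k s_k$. All numbers and the function name `logit_elasticities` are illustrative assumptions, not part of the cereal data or of PyBLP.

```python
import numpy as np

def logit_elasticities(alpha, prices, shares):
    """Return a J x J matrix whose (j, k) entry is the elasticity of
    product j's share with respect to product k's price (plain Logit)."""
    n_products = len(prices)
    # Cross-price elasticity: -alpha * p_k * s_k, the same in every row j.
    elasticities = np.tile(-alpha * prices * shares, (n_products, 1))
    # Own-price elasticity on the diagonal: alpha * p_j * (1 - s_j).
    elasticities[np.diag_indices(n_products)] = alpha * prices * (1 - shares)
    return elasticities

alpha = -2.0                          # hypothetical price coefficient
prices = np.array([1.0, 1.5, 2.0])    # hypothetical prices
shares = np.array([0.2, 0.3, 0.1])    # hypothetical market shares
elasticities = logit_elasticities(alpha, prices, shares)
print(elasticities)
```

Every off-diagonal entry in a given column is identical: when the price of product $k$ rises, all other products gain share in the same proportion, regardless of how similar they are to $k$. Random coefficients relax this restriction.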

The chosen specification of the indirect utility of consumer $i$ from consuming product $j$ in market $t$ is:

$u_{ijt} = \alpha_i p_{jt} + x'_{jt} \beta_i + \xi_{jt} + \epsilon_{ijt}$

where $p_{jt}$ is the price, $x'_{jt}$ is the (row) vector of $K$ observable product characteristics, $\epsilon_{ijt}$ is a mean-zero stochastic term, i.i.d. with a Type I Extreme Value (Gumbel) distribution, and $\xi_{jt}$ captures the product characteristics that are unobserved by the econometrician.

This specification can be derived from a quasilinear utility function (free of wealth effects) because of the way price enters the indirect utility function. Including wealth effects could be more reasonable for other types of products (e.g. cars). Notice that $\xi_{jt}$, which among other things captures the elements of vertical product differentiation, is identical for all consumers, while $\alpha_i$ varies: this is consistent with the theoretical literature on vertical product differentiation.

The mean utility of the outside good is normalized to zero, so that $u_{i0t} = \epsilon_{i0t}$.

We can separate the linear component of utility from the non-linear one: $u_{ijt} = \delta_{jt} + \mu_{ijt} + \epsilon_{ijt} \;$, where $\delta_{jt} = \alpha p_{jt} + x'_{jt} \beta + \xi_{jt} \;$ is the mean utility, common to all individuals, and $\mu_{ijt}(\theta) \;$ captures the individual-specific deviations from that mean (let $\theta$ be a vector with all parameters of the model).

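Consumers are assumed to purchase one unit of the good that gives the highest utility. The set of individual attributes that lead to the choice of good $j$ is:

$A_{jt}(\delta) = \{(\mu_i, \epsilon_i) \; | \; \delta_{jt} + \mu_{ijt} + \epsilon_{ijt} > \delta_{j't} + \mu_{ij't} + \epsilon_{ij't},\; \text{for all} \; j' \neq j \}$

The predicted market share of product $j$ is the probability mass of this set. Because $\epsilon_{ijt}$ follows a Type I Extreme Value distribution, it integrates out analytically into the Logit formula, which leaves an integral over the distribution $F$ of the individual heterogeneity $\mu$:

$S_{jt}(\delta_{jt}, \theta) = \int \frac{\exp{(\delta_{jt} + \mu_{ijt})}}{1+\sum_k \exp{(\delta_{kt} + \mu_{ikt})}}\,dF(\mu) $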
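The share integral has no closed form, so in practice it is approximated by averaging Logit choice probabilities over simulated consumers. The following is a minimal sketch of that idea for a single market, assuming one product characteristic with a normally distributed taste shock. The function name `simulate_shares` and all numbers are made up for illustration and this is not PyBLP's implementation.

```python
import numpy as np

def simulate_shares(delta, mu):
    """Approximate the market shares in one market.

    delta: (J,) mean utilities.
    mu:    (J, ns) individual deviations for ns simulated consumers.
    """
    exp_utility = np.exp(delta[:, None] + mu)
    # The 1 in the denominator is the outside good, whose mean utility is 0.
    choice_probs = exp_utility / (1.0 + exp_utility.sum(axis=0))
    # Averaging over simulated consumers approximates the integral.
    return choice_probs.mean(axis=1)

rng = np.random.default_rng(0)
delta = np.array([0.5, -0.2, 0.1])   # made-up mean utilities
x = np.array([1.0, 2.0, 0.5])        # made-up product characteristic
sigma = 0.5                          # made-up taste dispersion
nu = rng.standard_normal(1000)       # one taste shock per simulated consumer
mu = sigma * np.outer(x, nu)         # (J, ns) individual deviations
shares = simulate_shares(delta, mu)
print(shares, 1 - shares.sum())      # inside shares and outside share
```

Setting `mu` to zero collapses the average to the plain Logit share formula, which is a convenient sanity check.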

For each $\theta$ there is a unique $\delta_{jt}(\theta)$ that solves $S^{obs}_{jt} - S_{jt}(\delta_{jt}, \theta) = 0 \;$ (Berry, 1994). This system of equations is nonlinear and is solved numerically. It can be solved by using the contraction mapping suggested by BLP (1995), which means computing the sequence:

$\delta_{jt}^{h+1} = \delta_{jt}^{h} + \ln(S^{obs}_{jt}) - \ln(S_{jt}(\delta_{jt}^{h}, \theta))\;\;\;\;$ (see BLP Appendix I for proof of convergence)

where $h=0, \ldots, H$, $H$ is the smallest integer such that $||\delta_{jt}^{H}-\delta_{jt}^{H-1}||$ is smaller than some tolerance level, and $\delta_{jt}^{H}$ is the approximation to $\delta_{jt}$. In words, holding $\theta$ fixed at its current guess, we evaluate the right-hand side at some initial guess $\delta_{jt}^{0}$, obtain a new $\delta_{jt}^{1}$, substitute it back into the right-hand side, and repeat the process until convergence.
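The iteration described above can be sketched in a few lines. This is an illustrative toy for one market, not PyBLP's solver: the helper names `simulate_shares` and `solve_delta`, the tolerance, and the simulated "observed" shares are all assumptions made for the example. The starting value is the plain Logit solution $\ln(s_j) - \ln(s_0)$.

```python
import numpy as np

def simulate_shares(delta, mu):
    """Average Logit choice probabilities over simulated consumers."""
    exp_utility = np.exp(delta[:, None] + mu)
    return (exp_utility / (1.0 + exp_utility.sum(axis=0))).mean(axis=1)

def solve_delta(observed_shares, mu, tol=1e-12, max_iter=5000):
    """Iterate delta <- delta + ln(s_obs) - ln(s(delta)) until it stops moving."""
    # Start from the plain Logit solution ln(s_j) - ln(s_0).
    delta = np.log(observed_shares) - np.log(1.0 - observed_shares.sum())
    for _ in range(max_iter):
        predicted = simulate_shares(delta, mu)
        new_delta = delta + np.log(observed_shares) - np.log(predicted)
        if np.max(np.abs(new_delta - delta)) < tol:
            return new_delta
        delta = new_delta
    raise RuntimeError("contraction mapping did not converge")

# Build made-up "observed" shares from a known delta, then recover it.
rng = np.random.default_rng(0)
true_delta = np.array([0.5, -0.2, 0.1])
mu = 0.5 * np.outer(np.array([1.0, 2.0, 0.5]), rng.standard_normal(1000))
observed_shares = simulate_shares(true_delta, mu)
delta_hat = solve_delta(observed_shares, mu)
print(delta_hat)
```

Because the shares were generated from `true_delta` with the same simulated consumers, the recovered `delta_hat` should match it up to the tolerance.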

We then compute the error term vector $\hat\xi_{jt}(\theta)$ as the part of $\delta_{jt}(\theta)$ that is not explained by price and the observed characteristics. Let $z_{jt}$ be a set of instruments such that $E[z'_{jt} \xi_{jt}(\theta)] = 0$ at the true parameter values, and let $Z$ be the matrix that stacks the rows $z_{jt}$ over all products and markets. The GMM estimate is then

$\hat\theta_{GMM} = \underset{\theta}{\operatorname{argmin}} \; \xi(\theta)' Z \Phi^{-1} Z' \xi(\theta)$

where $\Phi$ is the variance-covariance matrix of the moments. The inverse is used to give less weight to those moments that have higher variance.
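As a rough illustration of the objective, the sketch below evaluates the quadratic form for a given residual vector, with the instruments stacked row by row into a matrix `Z`. The data are random placeholders, and the weighting matrix $(Z'Z)^{-1}$ is an assumption chosen for the example (it is the usual 2SLS-style first-step choice), not necessarily what the estimation below uses.

```python
import numpy as np

def gmm_objective(xi, Z, W):
    """Evaluate xi' Z W Z' xi.

    xi: (N,) residuals; Z: (N, L) instruments; W: (L, L) weighting matrix.
    """
    moments = Z.T @ xi              # Z' xi, one entry per instrument
    return moments @ W @ moments

rng = np.random.default_rng(0)
n_obs, n_instruments = 200, 4       # placeholder dimensions
Z = rng.standard_normal((n_obs, n_instruments))
xi = rng.standard_normal(n_obs)     # placeholder residuals
W = np.linalg.inv(Z.T @ Z)          # assumed first-step weighting matrix
objective = gmm_objective(xi, Z, W)
print(objective)
```

The optimizer searches over $\theta$, recomputing $\delta(\theta)$ and hence $\xi(\theta)$ at each candidate value, to make this number as small as possible.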

### Specification of Random Taste Parameters

We have to specify an initial guess for the nonlinear parameters. This serves two primary purposes: it speeds up estimation, and initial values of zero tell the solver which parameters are restricted to zero throughout.

It is common to assume that the random taste parameters follow a multivariate normal distribution, and to break them up into three parts:

$\begin{pmatrix} \alpha_i \\ \beta_i \end{pmatrix} =\begin{pmatrix} \alpha \\ \beta \end{pmatrix} + \Pi d_i + \Sigma v_i$

where $\alpha$ and $\beta$ are the mean tastes, common to all individuals, $d_i$ is a $D\times1$ vector of demographic variables, $\Pi$ is a $(K+1)\times D$ matrix of coefficients that measure how tastes vary with demographics, $\Sigma$ is a $(K+1)\times (K+1)$ matrix of parameters, and $v_i$ is a $(K+1)\times1$ vector of unobserved individual characteristics. We do not observe $d_i$ or $v_i$ for individual consumers. The difference between the two is that we know something about the distribution of the demographics $d_i$ (e.g. through census data), whereas the distribution of $v_i$ has to be assumed.
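The decomposition can be made concrete with a small numerical sketch. Here $K = 2$ characteristics and $D = 2$ demographics, and every number (mean tastes, `Pi`, `Sigma`, and the draws standing in for demographics) is invented for illustration. In an actual application the demographic draws would come from data such as a census sample rather than from a standard normal.

```python
import numpy as np

def individual_tastes(mean_tastes, Pi, Sigma, d, v):
    """Return the (K+1, ns) matrix of individual coefficients.

    mean_tastes: (K+1,)   Pi: (K+1, D)   Sigma: (K+1, K+1)
    d: (D, ns) demographic draws         v: (K+1, ns) unobserved draws
    """
    return mean_tastes[:, None] + Pi @ d + Sigma @ v

rng = np.random.default_rng(0)
n_draws = 5
mean_tastes = np.array([-3.0, 0.5, 1.0])   # hypothetical (alpha, beta_1, beta_2)
Pi = np.array([[0.6, 0.0],
               [0.0, -0.2],
               [0.1, 0.0]])                 # K+1 = 3 rows, D = 2 columns
# A zero on the diagonal switches off unobserved heterogeneity for beta_1.
Sigma = np.diag([0.4, 0.0, 0.2])
d = rng.standard_normal((2, n_draws))       # stand-in for demographic draws
v = rng.standard_normal((3, n_draws))       # unobserved individual shocks
tastes = individual_tastes(mean_tastes, Pi, Sigma, d, v)
print(tastes.round(2))                      # one column per simulated individual
```

Each column is one simulated consumer's $(\alpha_i, \beta_i)$. The zero entries in `Pi` and `Sigma` play the role described above: they mark interactions that are restricted to zero.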

### The Data
The data used for the analysis below consists of `shares` and `prices` for 24 brands of breakfast cereals (a differentiated product) in 47 cities over 2 quarters (`quarter`). `market_ids` are the unique market identifiers (which we subscript $t$). Within a market, the sum of all `shares` must be less than 1. Firms and brands are identified by the columns `firm_ids` and `product_ids`. There are two product characteristics: `Sugar`, which measures sugar content, and `Mushy`, a dummy variable equal to one if the product gets soggy in milk. There are 20 pre-computed instruments (`demand_instruments0`, ... , `demand_instruments19`). These represent only the excluded instruments. The exogenous regressors will be automatically added to the set of instruments. Finally, demographic variables include the log of income (`Income`), the log of income squared (`Income Sq`), `Age`, and `Child`, a dummy variable equal to one if the individual is less than sixteen years old.
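The requirement that shares sum to less than one within a market is easy to verify before estimation, and the remainder is the share of the outside good. The sketch below does this with pandas on a tiny made-up table that reuses the column names `market_ids`, `product_ids`, `shares`, and `prices`. The rows and identifier values are invented and are not taken from the cereal data.

```python
import pandas as pd

# A made-up product table with two markets and two products each.
products = pd.DataFrame({
    "market_ids": ["C01Q1", "C01Q1", "C01Q2", "C01Q2"],
    "product_ids": ["F1B04", "F1B06", "F1B04", "F1B06"],
    "shares": [0.10, 0.05, 0.12, 0.04],
    "prices": [0.07, 0.11, 0.08, 0.10],
})

# Total inside share per market; each must be strictly below 1.
inside_share = products.groupby("market_ids")["shares"].sum()
assert (inside_share < 1).all(), "shares must sum to less than 1 within each market"

# Whatever is left over is the share of the outside good.
outside_share = 1 - inside_share
print(outside_share)
```

The same two lines of `groupby` logic apply unchanged to the full product data once it is loaded into a DataFrame.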




## Explaining the Results