# Ordered Multinomial Choice
---
Here we will look at extensions of the binary-choice model from the last class that incorporate multiple possible outcomes. However, we will do this under the assumption that the choices are fully ordered.

This ordering is known by the researcher/analyst, so that the ordinal choice across the outcomes can be written as an integer.

For example, suppose you've sent out a survey to your customers on their satisfaction, and included a five-point *Likert* item on whether they would recommend your product to a friend:
1. Strongly Disagree
2. Disagree
3. Neutral
4. Agree
5. Strongly Agree

You're trying to figure out which characteristics/experiences create strong valence toward your product. One way of doing this is to take the category you care most about and make it binary!

So, for example, you could code *Agree* and *Strongly Agree* as 1, and *Strongly Disagree* through *Neutral* as 0. Then you can use a probit or a logit as before.
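A minimal R sketch of that recoding, assuming a hypothetical vector `likert` of responses coded 1 (Strongly Disagree) through 5 (Strongly Agree); both `likert` and `promoter` are illustrative names, not from any real survey:

```r
# Hypothetical survey responses: 1 = Strongly Disagree, ..., 5 = Strongly Agree
likert <- c(5, 3, 4, 1, 2, 4, 5)

# Agree (4) and Strongly Agree (5) become 1; Strongly Disagree through Neutral become 0
promoter <- as.integer(likert >= 4)
promoter  # 1 0 1 0 0 1 1
```

With covariates in a data frame, a binary outcome like `promoter` could then go into `glm(..., family = binomial(link = "probit"))` (or `link = "logit"`), at the cost of throwing away the distinctions within each half of the scale.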

However, if you want to understand the differences across the categories, you can exploit the known ordering and estimate a latent-variable model, as before.

Here we will again model a linear predictor $\eta_i=x^T_i\beta$, where
$$y^\star_i=x^T_i\beta+\epsilon_i,$$
and $\epsilon_i$ will have a fixed distribution (typically logistic or Normal).

On top of this, if we have $m$ different ordered outcomes, we also model $m-1$ thresholds, one between each pair of adjacent outcomes:
* $\zeta_{0,1}$ for the threshold between choices 0 and 1
* $\zeta_{1,2}$ for the threshold between choices 1 and 2
* $\ldots$

Suppose that our firm offers three levels of service, and we examine the ordered outcome for each potential customer:
* No purchase (Option 0)
* Basic package (Option 1)
* Upgrade package (Option 2)
* Deluxe package (Option 3)

The limited-dependent variable representation of this choice would be:
$$y_i=\begin{cases}
3 \text{ (Deluxe)} & \text{if } x_i^T\beta +\epsilon_i \geq \zeta_{2,3}, \\
2 \text{ (Upgrade)} & \text{if } \zeta_{2,3} > x_i^T\beta +\epsilon_i \geq \zeta_{1,2}, \\
1 \text{ (Basic)} & \text{if } \zeta_{1,2} > x_i^T\beta +\epsilon_i \geq \zeta_{0,1}, \\
0 \text{ (No purchase)} & \text{otherwise (so } \zeta_{0,1} > x_i^T\beta +\epsilon_i\text{)},
\end{cases}$$
based on the constants $\zeta_{0,1}<\zeta_{1,2}<\zeta_{2,3}$.
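The case rule above is just an interval lookup on the latent value. As a sketch, with hypothetical threshold values chosen only for illustration, base R's `findInterval` maps latent values $y^\star_i$ to the categories 0 to 3:

```r
# Hypothetical thresholds zeta_{0,1} < zeta_{1,2} < zeta_{2,3} (illustrative values only)
zeta <- c(-1, 0.5, 2)

# Some latent values y* = x'beta + epsilon
y_star <- c(-2, 0, 1, 3)

# findInterval counts how many thresholds lie at or below each y*, which is
# exactly the category index: 0 (No purchase), 1 (Basic), 2 (Upgrade), 3 (Deluxe)
y <- findInterval(y_star, zeta)
y  # 0 1 2 3
```

A value sitting exactly on a threshold goes to the higher category, matching the $\geq$ in each case above.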

So someone with observable characteristics $x_i$ has a linear predictor $x_i^T\beta$ (note that there is no intercept here; the thresholds play that role), and their probability of selecting each option is governed by the probability that the error lands in the corresponding shaded region:
![Model](https://alistairjwilson.github.io/MQE_AW/i/OrderedLogit.svg)


As we then shift the characteristics $x_i$ (and so move $x_i^T\beta$ up and down), the effect is to change the size of each region:
![Animation](https://alistairjwilson.github.io/MQE_AW/i/OrderedLogit.gif)
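The shaded areas can be computed directly: with $F$ the error CDF, $P(y_i=j)=F(\zeta_{j,j+1}-x_i^T\beta)-F(\zeta_{j-1,j}-x_i^T\beta)$, where the outermost terms are taken to be 0 and 1. A small R sketch, assuming illustrative thresholds and a logistic error (`ordered_probs` is a hypothetical helper name), evaluates this for a few values of $x_i^T\beta$:

```r
# P(y = j) = F(zeta_{j,j+1} - eta) - F(zeta_{j-1,j} - eta),
# padding the cumulative probabilities with 0 (bottom) and 1 (top)
ordered_probs <- function(eta, zeta, cdf = plogis) {
  diff(c(0, cdf(zeta - eta), 1))
}

zeta <- c(-1, 0.5, 2)  # hypothetical thresholds, for illustration only

# Shift eta = x'beta upward and watch mass move toward higher categories
etas  <- c(-1, 0, 1, 2)
probs <- t(sapply(etas, ordered_probs, zeta = zeta))
dimnames(probs) <- list(paste0("eta=", etas),
                        c("None", "Basic", "Upgrade", "Deluxe"))
round(probs, 3)
```

Each row sums to one; raising $x_i^T\beta$ shrinks the *No purchase* region and grows the *Deluxe* region, as in the animation.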

The model is estimated via maximum likelihood using the assumed distribution for the error $\epsilon$.

For example, suppose there are no covariates, so we are just estimating the crossing points (thresholds), and we have:
* 50 who don't purchase ($y=0$)
* 100 who purchase a basic product ($y=1$)
* 15 who purchase an upgraded package ($y=2$)

Under the assumption that the error is logistic, with CDF $\frac{e^x}{1+e^x}$, the log-likelihood of the data is then:
$$ 50 \log\left( \frac{e^{\zeta_{0,1}}}{1+e^{\zeta_{0,1}}} \right) +100\log\left(
\frac{e^{\zeta_{1,2}}}{1+e^{\zeta_{1,2}}}-\frac{e^{\zeta_{0,1}}}{1+e^{\zeta_{0,1}}}
\right)+15\log\left(1-\frac{e^{\zeta_{1,2}}}{1+e^{\zeta_{1,2}}}\right).$$

This is maximized at $\hat{\zeta}_{0,1}=-0.833$ and $\hat{\zeta}_{1,2}=2.303$.

Under the assumption that the error is Normal, with CDF $\Phi(\cdot)$, the log-likelihood of the data is then:
$$ 50 \log\left(\Phi(\zeta_{0,1})\right) +100\log\left(\Phi(\zeta_{1,2})-\Phi(\zeta_{0,1})\right)+15\log\left(1-\Phi(\zeta_{1,2})\right).$$

This is maximized at $\hat{\zeta}_{0,1}=-0.516$ and $\hat{\zeta}_{1,2}=1.335$.
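Both maximizations can be checked numerically. This R sketch (the names `loglik` and `fit_thresholds` are just illustrative) maximizes the log-likelihood with base R's `optim`, swapping in `plogis` or `pnorm` as the CDF. Nelder-Mead is used only because it is `optim`'s default, not because the problem requires it:

```r
counts <- c(50, 100, 15)  # y = 0, 1, 2 in the example above

# Log-likelihood of the two thresholds (zeta_01, zeta_12), no covariates
loglik <- function(zeta, cdf) {
  p <- diff(c(0, cdf(zeta), 1))   # P(y=0), P(y=1), P(y=2)
  if (any(p <= 0)) return(-Inf)   # thresholds out of order: infeasible
  sum(counts * log(p))
}

fit_thresholds <- function(cdf) {
  # optim minimizes by default; fnscale = -1 turns it into maximization
  optim(c(-1, 1), loglik, cdf = cdf,
        control = list(fnscale = -1, reltol = 1e-12, maxit = 2000))$par
}

zeta_logit  <- fit_thresholds(plogis)
zeta_probit <- fit_thresholds(pnorm)
round(rbind(logit = zeta_logit, probit = zeta_probit), 3)
```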

Despite the seemingly large differences in the numbers, when you plug these estimates back into the relevant distributions the implied probabilities are identical. For example, consider the probability of purchasing a basic product:
![Probit vs Logit](http://alistairjwilson.github.io/MQE_AW/i/OLogitVOProbit.svg)

Because there are no other covariates here, in each case the model sets the threshold parameters so that the probability of lying in each region exactly matches its empirical frequency (so 100/165 for the *basic* purchases).
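With no covariates this can be seen in closed form: each threshold is the inverse CDF evaluated at a cumulative empirical share, $\hat\zeta_{0,1}=F^{-1}(50/165)$ and $\hat\zeta_{1,2}=F^{-1}(150/165)$. A short R check of that claim, as a sketch using `qlogis`/`qnorm`:

```r
counts    <- c(50, 100, 15)                     # y = 0, 1, 2
cum_share <- cumsum(counts)[1:2] / sum(counts)  # P(y <= 0), P(y <= 1)

# With no covariates, each threshold is the inverse CDF of a cumulative share
zeta_logit  <- qlogis(cum_share)   # about -0.833, 2.303
zeta_probit <- qnorm(cum_share)    # about -0.516, 1.335

# Different thresholds, same implied probability of a basic purchase
p_basic <- c(logit  = plogis(zeta_logit[2]) - plogis(zeta_logit[1]),
             probit = pnorm(zeta_probit[2]) - pnorm(zeta_probit[1]))
p_basic
100 / 165
```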

## Data example
Here I'm using data from the 2020 [National Youth Tobacco Survey](https://www.cdc.gov/tobacco/data_statistics/surveys/nyts/data/index.html) on "eCig" (vapes, etc.) usage.

Technically, I'm joining together two variables: one on being a current user, and another on curiosity among non-users. I ranked and labeled the outcomes via:

```r
factor(eCig$eCigUse, ordered=TRUE, labels=
  c("User","Definitely.Try","Probably.Try","Probably.Not.Try","Definitely.Not.Try"))
```
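In R, `labels` are attached to the sorted unique values of the underlying codes, so those codes must already be in rank order. A small sketch with hypothetical integer codes 0 to 4 (not the actual NYTS coding):

```r
# Hypothetical integer codes, already ranked 0 (User) ... 4 (Definitely.Not.Try)
codes <- c(4, 0, 2, 1, 3, 0)

use <- factor(codes, ordered = TRUE,
              labels = c("User", "Definitely.Try", "Probably.Try",
                         "Probably.Not.Try", "Definitely.Not.Try"))

# labels follow the sorted codes, so the ranking carries over
levels(use)
as.integer(use)   # 5 1 3 2 4 1
use[2] < use[1]   # TRUE: "User" ranks below "Definitely.Not.Try"
```

If one of the codes never appears in the data, the number of labels will not match the number of levels and `factor` stops with an error, so it is worth tabulating the codes first.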

```r
load(file='eCig/eCig.rdata')
```

The rankings of the outcomes here are:

0. Have used an e-Cigarette/Vape
1. Have not used, but stated would *Definitely Try*
2. *Probably Try*
3. *Probably Not Try*
4. *Definitely Not Try*


### Ordered Logit
First, we'll estimate an ordered logit (the standard choice), where the errors follow a logistic distribution:

```r
summary(eCigUse$Age)
```

```
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   9.00   13.00   14.00   14.31   16.00   19.00      30 
```

```r
head(eCigUse$eCigUse)
```

```r
library(MASS)
vape.ologit <- polr( eCigUse ~ female+black+hispanic+as.factor(Age), data=eCigUse)
summary(vape.ologit)
```


```
Re-fitting to get Hessian

Call:
polr(formula = eCigUse ~ female + black + hispanic + as.factor(Age), 
    data = eCigUse)

Coefficients:
                   Value Std. Error t value
femaleTRUE        0.1153    0.03349  3.4431
blackTRUE         0.6066    0.05989 10.1278
hispanicTRUE     -0.2034    0.03663 -5.5538
as.factor(Age)10  1.3879    0.62652  2.2153
as.factor(Age)11  2.4927    0.40704  6.1239
as.factor(Age)12  2.0306    0.40278  5.0415
as.factor(Age)13  1.6284    0.40225  4.0483
as.factor(Age)14  1.2458    0.40228  3.0969
as.factor(Age)15  0.8685    0.40247  2.1578
as.factor(Age)16  0.6496    0.40249  1.6141
as.factor(Age)17  0.3557    0.40269  0.8834
as.factor(Age)18  0.2763    0.40523  0.6819
as.factor(Age)19  0.6510    0.46899  1.3880

Intercepts:
                                    Value   Std. Error t value
User|Definitely.Try                  0.1457  0.4007     0.3636
Definitely.Try|Probably.Try          0.1677  0.4008     0.4185
Probably.Try|Probably.Not.Try        0.3189  0.4008     0.7956
Probably
```
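According to the MASS documentation, `polr` parameterizes the model as $\text{logit}\,P(y_i\le k)=\zeta_k-\eta_i$, which matches the latent-variable setup above. One way to check this is to simulate from that setup with known (hypothetical) values of $\beta$ and the thresholds and confirm that `polr` recovers them. Passing `Hess = TRUE` stores the Hessian at fit time, so `summary` does not need to print "Re-fitting to get Hessian":

```r
library(MASS)  # polr()

set.seed(42)
n    <- 20000
beta <- 0.8              # true slope (hypothetical)
zeta <- c(-1, 0.5, 2)    # true thresholds (hypothetical)

x      <- rnorm(n)
y_star <- beta * x + rlogis(n)   # latent index with logistic error
y      <- factor(findInterval(y_star, zeta), levels = 0:3, ordered = TRUE)

# Hess = TRUE keeps the Hessian so summary(fit) can use it directly
fit <- polr(y ~ x, Hess = TRUE)
coef(fit)   # should be close to 0.8
fit$zeta    # should be close to -1, 0.5, 2 (named "0|1", "1|2", "2|3")
```

A positive coefficient therefore shifts mass toward the *higher*-ranked outcomes, which in the e-cig coding means further from "User".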

So if we had a black female 14-year-old, the model would give a linear predictor (relative to the omitted 9-year-old baseline) of:
$$\eta_i= 0.1153 +0.6066 +1.2458 =1.9677$$
while for a Hispanic male 18-year-old:
$$\eta_i=-0.2034+0.2763=0.0729$$

Given these observables, we can use the model to illustrate the probabilities of each person's modal (most likely) category:

![Animation](https://alistairjwilson.github.io/MQE_AW/i/eCigUse.svg)

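Using the logistic distribution, we can read off the probabilities of the shaded regions in the graph above:

```r
library(stats)  # attached by default in R; shown for clarity
c( 1-plogis(1.1347 - 1.9677),   # black female 14: P(Definitely Not Try)
   plogis(0.1457 - 0.0729),     # Hispanic male 18: P(User)
   1-plogis(1.1347 - 0.0729) )  # Hispanic male 18: P(Definitely Not Try)
```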
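Extending the calculation to all five categories shows which one is modal for each person. This R sketch hard-codes the printed logit estimates; the fourth threshold (1.1347) is taken from the code above because the printed intercept table is cut off, so treat it as approximate:

```r
# Logit thresholds: first three from the printed summary, the fourth as used above
zeta <- c(0.1457, 0.1677, 0.3189, 1.1347)
cats <- c("User", "Definitely.Try", "Probably.Try",
          "Probably.Not.Try", "Definitely.Not.Try")

# Linear predictors from the ordered-logit coefficients
eta <- c(black.f.14 = 0.1153 + 0.6066 + 1.2458,   # 1.9677
         hisp.m.18  = -0.2034 + 0.2763)           # 0.0729

# P(y = k) = F(zeta_k - eta) - F(zeta_{k-1} - eta), with logistic F
probs <- t(sapply(eta, function(e) diff(c(0, plogis(zeta - e), 1))))
colnames(probs) <- cats
round(probs, 3)

# Modal category for each person
cats[apply(probs, 1, which.max)]
```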

We can also estimate the model using the assumption that the error terms are Normally distributed, in which case we specify that we are using a probit formulation:

```r
vape.oprobit <- polr(eCigUse ~ female+black+hispanic+as.factor(Age), data=eCigUse, method = "probit")
summary(vape.oprobit)
```


```
Re-fitting to get Hessian

Call:
polr(formula = eCigUse ~ female + black + hispanic + as.factor(Age), 
    data = eCigUse, method = "probit")

Coefficients:
                    Value Std. Error t value
femaleTRUE        0.06697    0.02027  3.3035
blackTRUE         0.35255    0.03512 10.0377
hispanicTRUE     -0.12281    0.02229 -5.5093
as.factor(Age)10  0.83705    0.38070  2.1987
as.factor(Age)11  1.53308    0.24730  6.1993
as.factor(Age)12  1.25015    0.24513  5.0999
as.factor(Age)13  0.99879    0.24491  4.0783
as.factor(Age)14  0.75434    0.24496  3.0795
as.factor(Age)15  0.51603    0.24508  2.1055
as.factor(Age)16  0.37968    0.24507  1.5493
as.factor(Age)17  0.20739    0.24518  0.8458
as.factor(Age)18  0.16204    0.24668  0.6569
as.factor(Age)19  0.37971    0.28579  1.3287

Intercepts:
                                    Value   Std. Error t value
User|Definitely.Try                  0.0729  0.2440     0.2988
Definitely.Try|Probably.Try          0.0861  0.2440     0.3528
Probably.Try|Probably.Not.Try        0
```

This model actually does slightly better at organizing the data (judged by AIC), though the fundamental probabilities are not very different. Using the stored coefficients and the intercepts (stored as `zeta`), let's assemble the probabilities for:
* A black female 14-year-old being "Definitely Not Try"
* A Hispanic male 18-year-old being "User" (has used)

Looking at the components first to get a sense of the names (`names(vape.oprobit)`), the main estimates are stored in `coefficients` and `zeta`.

```r
vape.oprobit$coefficients
vape.oprobit$zeta
```

We then assemble the probabilities using the `pnorm` function as:

```r
c(
  # black female 14-year-old: P(Definitely Not Try) = 1 - Phi(zeta_4 - eta)
  "black.f.14" = 1 - pnorm(vape.oprobit$zeta["Probably.Not.Try|Definitely.Not.Try"] -
    (vape.oprobit$coefficients["femaleTRUE"] +
     vape.oprobit$coefficients["blackTRUE"] +
     vape.oprobit$coefficients["as.factor(Age)14"])),
  # Hispanic male 18-year-old: P(User) = Phi(zeta_1 - eta)
  "Used.hisp.m.18" = pnorm(vape.oprobit$zeta["User|Definitely.Try"] -
    (vape.oprobit$coefficients["hispanicTRUE"] +
     vape.oprobit$coefficients["as.factor(Age)18"]))
)
```
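This manual assembly should agree with `predict(..., type = "probs")` from MASS. Since the NYTS file isn't bundled here, this self-contained sketch checks the equivalence on simulated ordered-probit data with made-up parameters:

```r
library(MASS)  # polr(), predict() method for polr fits

set.seed(7)
n <- 5000
x <- rnorm(n)
# Latent index with Normal error, cut at made-up thresholds -0.5 and 0.5
y <- factor(findInterval(0.5 * x + rnorm(n), c(-0.5, 0.5)),
            levels = 0:2, ordered = TRUE)
fit <- polr(y ~ x, method = "probit", Hess = TRUE)

# Manual assembly for a new individual with x = 1: eta = beta * x,
# then P(y = k) = Phi(zeta_k - eta) - Phi(zeta_{k-1} - eta)
eta    <- coef(fit)[["x"]] * 1
manual <- unname(diff(c(0, pnorm(fit$zeta - eta), 1)))

# Built-in equivalent
builtin <- predict(fit, newdata = data.frame(x = 1), type = "probs")
rbind(manual = manual, builtin = as.numeric(builtin))
```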

We can compare these to the probabilities from our logit model:

```r
c( 1-plogis(1.1347 - 1.9677), plogis(0.1457 - 0.0729) )
```

So there are some differences, but nothing major.

One other component worth looking at here is `fitted.values`: a matrix giving, for each data point, the predicted probability of being in each category:

```r
head( vape.oprobit$fitted.values )
nrow(eCigUse)
```

| Row | User | Definitely.Try | Probably.Try | Probably.Not.Try | Definitely.Not.Try |
|---|---|---|---|---|---|
| 1 | 0.3743617 | 0.00499946 | 0.03497514 | 0.1968346 | 0.3888291 |
| 2 | 0.2657852 | 0.004335993 | 0.03082128 | 0.1899165 | 0.509141 |
| 3 | 0.349268 | 0.004884539 | 0.03429075 | 0.1968281 | 0.4147286 |
| 4 | 0.3049898 | 0.004626411 | 0.03268757 | 0.1945826 | 0.4631136 |
| 5 | 0.3743617 | 0.00499946 | 0.03497514 | 0.1968346 | 0.3888291 |
| 6 | 0.3288364 | 0.004774365 | 0.03361512 | 0.196163 | 0.4366111 |
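Each row of `fitted.values` is the same threshold calculation applied to that observation's covariates, so every row sums to one and should match `predict(..., type = "probs")` on the estimation data. A quick sketch on simulated (hypothetical) data:

```r
library(MASS)  # polr()

set.seed(11)
n <- 2000
x <- rnorm(n)
# Simulated ordered-probit outcome with four categories (made-up parameters)
y <- factor(findInterval(0.7 * x + rnorm(n), c(-0.5, 0.3, 1)),
            levels = 0:3, ordered = TRUE)
fit <- polr(y ~ x, method = "probit")

fv <- fit$fitted.values   # n x 4 matrix: one row per observation
dim(fv)
range(rowSums(fv))        # every row sums to 1

# Same probabilities recomputed from the stored coefficients and zeta
pr <- predict(fit, newdata = data.frame(x = x), type = "probs")
all.equal(unname(fv), unname(pr))
```

By default (`type = "class"`), `predict` instead returns the most likely category for each row, which is a convenient way to get the modal prediction.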
