# Non-linear Panel Models
---
## Overview (and pre-analysis plan)
In this notebook, we will be:
- Fitting several panel models to data with a binary outcome, the setting in which non-linear models such as logit apply
- Models include:
    1. Pooled logit
    2. Conditional logit (country fixed effects)
    3. Conditional logit (year fixed effects)
    4. Linear probability model (LPM), pooled
    5. LPM (country fixed effects)
    6. LPM (year fixed effects)
    7. LPM (two-way fixed effects)
- Looking at a quick and dirty example of model selection using:
    - R squared
    - Akaike Information Criterion
    - Bayesian Information Criterion
    
Note: we will once again use `statsmodels` for model fitting and model selection.

Further note: I've deliberately left out cross-validation for model selection. I believe this technique deserves its own separate notebook.

In [12]:
import pandas as pd

import statsmodels.api as sm
import statsmodels.formula.api as smf

from statsmodels.discrete.conditional_models import ConditionalLogit

# 1. Data prep

In [10]:
# Load data (from http://www.princeton.edu/~otorres/LogitR101.pdf )

data = pd.read_stata('http://dss.princeton.edu/training/Panel101.dta')
data

Unnamed: 0,country,year,y,y_bin,x1,x2,x3,opinion,op
0,A,1990,1.342788e+09,1.0,0.277904,-1.107956,0.282554,Str agree,1.0
1,A,1991,-1.899661e+09,0.0,0.320685,-0.948720,0.492538,Disag,0.0
2,A,1992,-1.123436e+07,0.0,0.363466,-0.789484,0.702523,Disag,0.0
3,A,1993,2.645775e+09,1.0,0.246144,-0.885533,-0.094391,Disag,0.0
4,A,1994,3.008335e+09,1.0,0.424623,-0.729768,0.946131,Disag,0.0
...,...,...,...,...,...,...,...,...,...
65,G,1995,1.323696e+09,1.0,1.087186,-1.409817,2.829808,Str disag,0.0
66,G,1996,2.545242e+08,1.0,0.781075,-1.328000,4.278224,Str agree,1.0
67,G,1997,3.297033e+09,1.0,1.257879,-1.577367,4.587326,Disag,0.0
68,G,1998,3.011821e+09,1.0,1.242777,-1.601218,6.113762,Disag,0.0


In [14]:
# Separate exogenous columns, endogenous column,
# country column, and year column

exog = data[['x1', 'x2', 'x3']]
endog = data['y_bin']
countries = data['country']
years = data['year']

# 2. Non-linear panel models

## 2.1 Pooled logit

In [13]:
# Pooled logit

logit_mod = smf.logit(formula='y_bin ~ x1 + x2 + x3',
                      data=data)
logit_res = logit_mod.fit()
logit_res.summary()

Optimization terminated successfully.
         Current function value: 0.467944
         Iterations 7


Dep. Variable:,y_bin,No. Observations:,70.0
Model:,Logit,Df Residuals:,66.0
Method:,MLE,Df Model:,3.0
Date:,"Thu, 11 Feb 2021",Pseudo R-squ.:,0.06486
Time:,10:54:59,Log-Likelihood:,-32.756
converged:,True,LL-Null:,-35.028
Covariance Type:,nonrobust,LLR p-value:,0.2084

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,0.4262,0.639,0.667,0.505,-0.826,1.679
x1,0.8618,0.784,1.099,0.272,-0.675,2.398
x2,0.3665,0.308,1.189,0.234,-0.237,0.971
x3,0.7512,0.455,1.652,0.099,-0.140,1.643
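The model-selection criteria named in the overview can be recomputed by hand from the pooled logit summary above. The sketch below copies the log-likelihood, null log-likelihood, and observation count from that table and assumes four estimated parameters (intercept plus `x1`, `x2`, `x3`). It is only an illustration of the formulas. In practice the fitted `statsmodels` result exposes the same quantities as `logit_res.prsquared`, `logit_res.aic`, and `logit_res.bic`.

```python
import math

# Values copied from the pooled logit summary table
llf = -32.756      # Log-Likelihood of the fitted model
llnull = -35.028   # LL-Null (intercept-only model)
nobs = 70          # No. Observations
k = 4              # assumed parameter count: Intercept, x1, x2, x3

pseudo_r2 = 1 - llf / llnull          # McFadden's pseudo R-squared
aic = 2 * k - 2 * llf                 # Akaike Information Criterion
bic = k * math.log(nobs) - 2 * llf    # Bayesian Information Criterion

print(f"McFadden pseudo R-squared: {pseudo_r2:.5f}")
print(f"AIC: {aic:.3f}")
print(f"BIC: {bic:.3f}")
```

Lower AIC and BIC indicate a better trade-off between fit and complexity. BIC penalizes each extra parameter by `log(nobs)` rather than 2, so it favors smaller models once the sample has more than about 7 observations.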


## 2.2 Conditional logit (country fixed effects)

In [23]:
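# Conditional logit (country FE)

# Group by country so that the country effects are conditioned out
clogit_country_mod = ConditionalLogit(endog=endog,
                                      exog=exog,
                                      groups=countries)

# Keep the Hessian (do not pass skip_hessian=True): summary() needs the
# parameter covariance and otherwise raises
# "ValueError: need covariance of parameters for computing (unnormalized) covariances"
clogit_country_res = clogit_country_mod.fit(method='basinhopping',
                                            maxiter=1000)
clogit_country_res.summary()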
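To make the idea behind the conditional logit concrete, here is a minimal plain-Python sketch of the conditional likelihood for a single group. It is not `statsmodels`' implementation. The function name `conditional_likelihood` and the values in `eta` are hypothetical and chosen only for illustration. The likelihood conditions on the number of ones in the group, so any group-level constant cancels out.

```python
import math
from itertools import combinations


def conditional_likelihood(y, eta):
    """Conditional likelihood of one group's outcomes given their sum.

    y   : list of 0/1 outcomes for the group
    eta : list of linear predictors x_i'beta (no group intercept needed)
    """
    n, k = len(y), sum(y)
    # Numerator: the observed set of successes
    numerator = math.exp(sum(e for yi, e in zip(y, eta) if yi == 1))
    # Denominator: every way of placing k successes among n observations
    denominator = sum(
        math.exp(sum(eta[i] for i in subset))
        for subset in combinations(range(n), k)
    )
    return numerator / denominator


# Hypothetical linear predictors for a three-observation group
eta = [0.5, -0.2, 1.0]

mixed = conditional_likelihood([1, 0, 1], eta)
all_ones = conditional_likelihood([1, 1, 1], eta)
all_zeros = conditional_likelihood([0, 0, 0], eta)

# Adding a group-specific constant (the fixed effect) changes nothing
shifted = conditional_likelihood([1, 0, 1], [e + 2.0 for e in eta])

print(mixed, shifted, all_ones, all_zeros)
```

Two points follow from the sketch. The shifted and unshifted likelihoods agree, which is why the fixed effects never need to be estimated. Groups whose outcome never varies have a likelihood of exactly 1 for any coefficients, so they carry no information about the slopes.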

## 2.3 Conditional logit (year fixed effects)

In [None]:
# Conditional logit (year FE)

# Specify exog, endog, and year groups


## 2.4 LPM
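A linear probability model is ordinary least squares applied to the binary outcome. As a rough illustration of what that implies, the sketch below fits a one-regressor LPM by hand on hypothetical toy data (the `x` and `y` lists are invented, not taken from the panel) and checks whether any fitted "probabilities" fall outside the unit interval. With `statsmodels`, the pooled version on this notebook's data would typically be an OLS fit with the same formula as the pooled logit.

```python
# Hypothetical toy data: one regressor and a binary outcome
x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [0, 0, 0, 1, 1, 1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS slope and intercept for a single regressor
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Fitted values are read as probabilities, but nothing bounds them to [0, 1]
fitted = [intercept + slope * xi for xi in x]
out_of_range = [p for p in fitted if p < 0 or p > 1]

print(f"slope={slope:.4f}, intercept={intercept:.4f}")
print("fitted:", [round(p, 3) for p in fitted])
print("fitted values outside [0, 1]:", len(out_of_range))
```

The slope reads directly as the change in the probability of `y = 1` per unit of `x`, which is the main appeal of the LPM. The out-of-range fitted values are its main drawback relative to logit.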

In [None]:
# Pooled LPM



