In [1]:
import numpy as np

### 2

Likelihood ratio test definition:

Consider testing 
$$
H_0: \theta \in \Theta_0 \text{ versus } H_1: \theta \notin \Theta_0
$$

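The likelihood ratio statistic is
$$
\lambda = 2 \log \left (\frac{\sup_{\theta \in \Theta} \mathcal{L}(\theta)}{\sup_{\theta \in \Theta_0} \mathcal{L}(\theta)} \right ) = 2 \log \left(\frac{\mathcal L\left(\hat \theta\right)}{\mathcal{L}\left(\hat \theta_0\right)} \right)
$$
where $\hat\theta$ is the MLE over $\Theta$ and $\hat\theta_0$ is the MLE over $\Theta_0$. Under $H_0$, $\lambda \rightsquigarrow \chi^2_d$ with $d = \dim\Theta - \dim\Theta_0$.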

In this case the cell counts $(X_{00}, X_{01}, X_{10}, X_{11})$ are multinomial, and the likelihood is
$$
\mathcal{L} = p_{00}^{X_{00}} p_{01}^{X_{01}} p_{10}^{X_{10}} p_{11}^{X_{11}}
$$

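$\Theta_0$ is the set where $p_{ij} = p_{i\cdot} p_{\cdot j}$ for all $i, j$, with marginals $p_{i\cdot} = p_{i0} + p_{i1}$ and $p_{\cdot j} = p_{0j} + p_{1j}$. Under this restriction, the MLE of $p_{ij}$ is the product of the estimated marginals,
$$\frac{X_{i\cdot}}{n}\frac{X_{\cdot j}}{n},$$
instead of the unrestricted estimate $X_{ij}/n$, where $n = X_{\cdot\cdot}$ is the total count.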

Therefore,
$$
\lambda = 2 \log \left( \left(\frac{p_{00}}{p_{00}'}\right)^{X_{00}} \left(\frac{p_{01}}{p_{01}'}\right)^{X_{01}} \left(\frac{p_{10}}{p_{10}'}\right)^{X_{10}} \left(\frac{p_{11}}{p_{11}'}\right)^{X_{11}} \right) = 2 \sum_{i=0}^1 \sum_{j=0}^1 X_{ij}\log{\frac{p_{ij}}{p_{ij}'}}
$$

where $p_{ij}$ maximizes the likelihood over $\Theta$ and $p_{ij}' = p_{i\cdot} p_{\cdot j}$ maximizes it over $\Theta_0$.

Replacing $p_{ij}$ and $p_{ij}'$ with their maximum likelihood estimates (here $X_{\cdot\cdot} = n$):
$$
p_{ij} = \frac{X_{ij}}{X_{\cdot \cdot}}, p_{ij}' = p_{i\cdot}p_{\cdot j} = \frac{X_{i \cdot}}{X_{\cdot \cdot}}\frac{X_{\cdot j}}{X_{\cdot \cdot}}
$$

Hence
$$
T = 2 \sum_{i=0}^1 \sum_{j=0}^1 X_{ij}\log{\frac{X_{ij}X_{\cdot \cdot}}{X_{i\cdot}X_{\cdot j}}}
$$
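The statistic $T$ can be computed directly from the four counts. The function below is a minimal sketch, not code from the original solution; the name `lrt_statistic` and the table layout `[[X00, X01], [X10, X11]]` are assumptions, and it uses the usual convention $0 \log 0 = 0$ for empty cells.

```python
import math


def lrt_statistic(table):
    """Likelihood ratio statistic T for a 2x2 table [[X00, X01], [X10, X11]].

    Implements T = 2 * sum_ij X_ij * log(X_ij * X.. / (X_i. * X_.j)),
    treating 0 * log(0) as 0 for empty cells.
    """
    n = sum(sum(row) for row in table)           # X..
    row_tot = [sum(row) for row in table]        # X_i.
    col_tot = [sum(col) for col in zip(*table)]  # X_.j
    t = 0.0
    for i in range(2):
        for j in range(2):
            x = table[i][j]
            if x > 0:  # empty cells contribute nothing
                t += x * math.log(x * n / (row_tot[i] * col_tot[j]))
    return 2.0 * t


# Counts from Exercise 4 (rows: black/white victim; columns: death/no death sentence)
print("T = %.2f" % lrt_statistic([[14, 641], [62, 594]]))
```

On the Exercise 4 counts this should agree with the notebook cell there (34.53). A table whose cells equal the products of its margins, such as `[[10, 20], [30, 60]]`, gives $T = 0$.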

To find the degrees of freedom, count free parameters. There are four parameters $p_{ij}$, $i,j\in \{0,1\}$, subject to
$$
\sum_{i=0}^1 \sum_{j=0}^1 p_{ij} = 1,
$$
so $\dim \Theta = 3$. Under $H_0$, each $p_{ij} = p_{i\cdot} p_{\cdot j}$ is determined by $p_{0\cdot}$ and $p_{\cdot 0}$ alone, because $p_{1\cdot} = 1 - p_{0\cdot}$ and $p_{\cdot 1} = 1 - p_{\cdot 0}$; so $\dim \Theta_0 = 2$.

This leaves $3 - 2 = 1$ degree of freedom. Hence $T \rightsquigarrow \chi_1^2$ under $H_0$.
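The degree-of-freedom count can be sanity-checked by simulation: draw tables under independence, compute $T$, and compare with $\chi^2_1$, which has mean 1 and exceeds 3.841 with probability about 0.05. This is a sketch only; the marginals $p_{0\cdot} = 0.3$, $p_{\cdot 0} = 0.6$, the sample size, the number of replications, the seed, and the helper names are all arbitrary assumptions.

```python
import math
import random


def lrt_statistic(table):
    """T = 2 * sum_ij X_ij * log(X_ij * X.. / (X_i. * X_.j)), with 0 * log(0) = 0."""
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    return 2.0 * sum(
        table[i][j] * math.log(table[i][j] * n / (rows[i] * cols[j]))
        for i in range(2)
        for j in range(2)
        if table[i][j] > 0
    )


def simulate_null(p_row0=0.3, p_col0=0.6, n=500, reps=2000, seed=0):
    """Simulate T for 2x2 multinomial tables drawn under independence (H0).

    p_row0 = p_{0.} and p_col0 = p_{.0}; all defaults are arbitrary choices.
    """
    rng = random.Random(seed)
    cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
    # Under H0 each cell probability is the product of its marginals
    probs = [
        p_row0 * p_col0,
        p_row0 * (1 - p_col0),
        (1 - p_row0) * p_col0,
        (1 - p_row0) * (1 - p_col0),
    ]
    stats = []
    for _ in range(reps):
        table = [[0, 0], [0, 0]]
        for i, j in rng.choices(cells, weights=probs, k=n):
            table[i][j] += 1
        stats.append(lrt_statistic(table))
    return stats


stats = simulate_null()
# chi^2_1 has mean 1 and a 95th percentile of about 3.841
print("mean of T: %.3f" % (sum(stats) / len(stats)))
print("fraction of T above 3.841: %.3f" % (sum(t > 3.841 for t in stats) / len(stats)))
```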

### 3 

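Let $\psi = \frac{p_{00}p_{11}}{p_{01}p_{10}}$ be the odds ratio and $\gamma = \log\psi = \log p_{00} - \log p_{01} - \log p_{10} + \log p_{11}$ the log odds ratio, estimated by plugging in $\hat p_{ij} = X_{ij}/n$. The likelihood is
$$
\mathcal{L} = p_{00}^{X_{00}}p_{01}^{X_{01}}p_{10}^{X_{10}}p_{11}^{X_{11}}
$$

Log-likelihood: $\ell\left(p_{00}, p_{01}, p_{10}, p_{11}\right) = X_{00}\log{p_{00}} + X_{01}\log{p_{01}} + X_{10}\log{p_{10}} + X_{11}\log{p_{11}}$.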

To apply the multiparameter delta method, we need two ingredients: the gradient of $\gamma$ (evaluated at $\hat p$),

$$
\nabla {\hat \gamma} = \begin{bmatrix} \frac{\partial {\hat \gamma}}{\partial p_{00}} \\ \frac{\partial {\hat \gamma}}{\partial p_{01}} \\ \frac{\partial {\hat \gamma}}{\partial p_{10}} \\ \frac{\partial {\hat \gamma}}{\partial p_{11}} \end{bmatrix} = \begin{bmatrix} \frac{1}{p_{00}} \\ - \frac{1}{p_{01}} \\ - \frac{1}{p_{10}} \\\frac{1}{p_{11}} \end{bmatrix}
$$

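and the Fisher information. Because $\ell$ is a sum of terms that each involve a single $p_{ij}$, the off-diagonal second derivatives vanish; on the diagonal, $-\partial^2 \ell / \partial p_{ij}^2 = X_{ij}/p_{ij}^2$ and $E(X_{ij}) = n p_{ij}$: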
$$
I_{n}\left(p_{00}, p_{01}, p_{10}, p_{11}\right) = 
\begin{bmatrix} 
E\left(\frac{X_{00}}{p_{00}^2} \right) & 0 & 0 & 0 \\
0 & E\left(\frac{X_{01}}{p_{01}^2} \right) & 0 & 0 \\
0 & 0 & E\left(\frac{X_{10}}{p_{10}^2} \right) & 0 \\
0 & 0 & 0 & E\left(\frac{X_{11}}{p_{11}^2} \right) \\
\end{bmatrix} =
\begin{bmatrix} 
\frac{n}{p_{00}} & 0 & 0 & 0 \\
0 & \frac{n}{p_{01}} & 0 & 0 \\
0 & 0 & \frac{n}{p_{10}} & 0 \\
0 & 0 & 0 & \frac{n}{p_{11}} \\
\end{bmatrix}
$$
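Before the two ingredients are combined, the gradient can be checked numerically with central differences, treating the four $p_{ij}$ as free variables exactly as the derivation does. This is a sketch; the point `p`, the step size `h`, and the names `gamma` and `numeric_gradient` are illustrative assumptions.

```python
import math


def gamma(p):
    """Log odds ratio log p00 - log p01 - log p10 + log p11 for p = (p00, p01, p10, p11)."""
    p00, p01, p10, p11 = p
    return math.log(p00) - math.log(p01) - math.log(p10) + math.log(p11)


def numeric_gradient(f, p, h=1e-6):
    """Central-difference gradient of f at p, perturbing one coordinate at a time."""
    grad = []
    for k in range(len(p)):
        up = list(p)
        down = list(p)
        up[k] += h
        down[k] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


p = [0.2, 0.3, 0.1, 0.4]  # arbitrary interior point
analytic = [1 / p[0], -1 / p[1], -1 / p[2], 1 / p[3]]  # (1/p00, -1/p01, -1/p10, 1/p11)
print(numeric_gradient(gamma, p))
print(analytic)
```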

By the multiparameter delta method, with all quantities evaluated at $\hat p_{ij} = X_{ij}/n$ (so that $n \hat p_{ij} = X_{ij}$),
$$
\hat{se}\left(\hat{\gamma}\right) = \sqrt{\nabla {\hat \gamma}^T I_{n}^{-1} \nabla {\hat \gamma}} = \sqrt{\frac{1}{n \hat p_{00}} + \frac{1}{n \hat p_{01}} + \frac{1}{n \hat p_{10}} + \frac{1}{n \hat p_{11}} } = \sqrt{\frac{1}{X_{00}} + \frac{1}{X_{01}} + \frac{1}{X_{10}} + \frac{1}{X_{11}} }
$$
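A rough way to see that this standard error is sensible is to simulate many multinomial tables, compute $\hat\gamma$ for each, and compare the spread of the estimates with $\sqrt{\sum_{i,j} 1/(n p_{ij})}$. The sketch below assumes arbitrary true probabilities, sample size, replication count, and seed; `simulate_gamma` is a hypothetical helper name.

```python
import math
import random


def simulate_gamma(p=(0.2, 0.3, 0.1, 0.4), n=1000, reps=1000, seed=1):
    """Return (Monte Carlo SD of gamma_hat, delta-method SE) for multinomial data.

    p = (p00, p01, p10, p11); p, n, reps and seed are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        counts = [0, 0, 0, 0]
        for k in rng.choices(range(4), weights=p, k=n):
            counts[k] += 1
        x00, x01, x10, x11 = counts
        # plug-in estimate gamma_hat = log(X00 X11 / (X01 X10))
        estimates.append(math.log(x00 * x11 / (x01 * x10)))
    mean = sum(estimates) / reps
    mc_sd = math.sqrt(sum((g - mean) ** 2 for g in estimates) / (reps - 1))
    # delta-method SE evaluated at the true p: sqrt(sum 1 / (n p_ij))
    delta_se = math.sqrt(sum(1 / (n * pk) for pk in p))
    return mc_sd, delta_se


mc_sd, delta_se = simulate_gamma()
print("Monte Carlo SD: %.4f, delta-method SE: %.4f" % (mc_sd, delta_se))
```

The simulated tables are drawn from a genuine multinomial, so they respect $\sum_{i,j} p_{ij} = 1$.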

Since $\psi = e^{\gamma}$ and $\frac{\partial \psi}{\partial \gamma} = e^{\gamma} = \psi$, the one-parameter delta method gives
$$
\hat{se}\left(\hat{\psi}\right) = \left| \frac{\partial \psi}{\partial \gamma} \right|_{\gamma = \hat\gamma} \hat{se}\left(\hat{\gamma}\right) = \hat\psi \, \hat{se}\left(\hat{\gamma}\right)
$$


*Note*: I initially tried to incorporate the constraint $\sum_i \sum_j p_{ij} = 1$, but that made it too complicated to arrive at the formula above. So this is my open question.
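One possible resolution, sketched here rather than taken from the original solution: with the constraint built in, the asymptotic covariance of the multinomial MLE $\hat p$ is $\frac{1}{n}\left(\operatorname{diag}(p) - pp^T\right)$ instead of $I_n^{-1} = \frac{1}{n}\operatorname{diag}(p)$. The extra term adds $-\frac{1}{n}\left(\nabla\gamma^T p\right)^2$ to the quadratic form, and $\nabla\gamma^T p = 1 - 1 - 1 + 1 = 0$, so both routes give $\sum_{i,j} \frac{1}{n p_{ij}}$. The check below compares the two quadratic forms numerically; `p`, `n`, and the helper `quad_form` are arbitrary illustrative choices.

```python
p = [0.2, 0.3, 0.1, 0.4]  # arbitrary cell probabilities (p00, p01, p10, p11), summing to 1
n = 1000                  # arbitrary sample size
g = [1 / p[0], -1 / p[1], -1 / p[2], 1 / p[3]]  # gradient of gamma


def quad_form(v, m):
    """Return v^T m v for a vector v and a square list-of-lists matrix m."""
    size = len(v)
    return sum(v[a] * m[a][b] * v[b] for a in range(size) for b in range(size))


# Covariance of the multinomial MLE, which builds in sum(p) = 1
constrained = [[((p[a] if a == b else 0.0) - p[a] * p[b]) / n for b in range(4)] for a in range(4)]
# I_n^{-1} from the diagonal Fisher information above, which ignores the constraint
unconstrained = [[(p[a] / n if a == b else 0.0) for b in range(4)] for a in range(4)]

print("g^T p =", sum(gk * pk for gk, pk in zip(g, p)))
print("with constraint:    %.6f" % quad_form(g, constrained))
print("without constraint: %.6f" % quad_form(g, unconstrained))
```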


### 4

Given data

| Victim | Death sentence | No death sentence |
|---|---|---|
| Black victim | 14 | 641 |
| White victim | 62 | 594 |

The question is whether the victim's race is associated with receiving the death sentence.
Notation: $X_{ij}$ is the count in row $i$ and column $j$, with $i=0$ for a black victim, $i=1$ for a white victim, $j=0$ for a death sentence, and $j=1$ for no death sentence.

In [6]:
from scipy.stats import chi2

In [7]:
X00 = 14
X01 = 641
X10 = 62
X11 = 594

X0d = X00 + X01
X1d = X10 + X11
Xd0 = X00 + X10
Xd1 = X01 + X11

X = X00 + X01 + X10 + X11

# likelihood ratio test statistic T; under H0 it is approximately chi^2 with 1 df
T = 2 * ( X00 * np.log(X00 * X / (X0d * Xd0)) + X01 * np.log(X01 * X / (X0d * Xd1))
         + X10 * np.log(X10 * X / (X1d * Xd0)) + X11 * np.log(X11 * X / (X1d * Xd1)))
p = chi2.sf(T, df=1)  # upper tail, P(chi^2_1 > T)
print("likelihood test statistic: %.2f, and the p value: %.5f"%(T, p))

likelihood test statistic: 34.53, and the p value: 0.00000


In [11]:
# Pearson chi-square statistic; E_ij = X_i. * X_.j / X are the expected counts under H0

E00 = X0d * Xd0 / X
E01 = X0d * Xd1 / X
E10 = X1d * Xd0 / X
E11 = X1d * Xd1 / X

U = (X00-E00)**2/E00 + (X01-E01)**2/E01 + (X10-E10)**2/E10 + (X11-E11)**2/E11
p = chi2.sf(U, df=1)  # upper tail, P(chi^2_1 > U)

print("Pearson chi-square statistic: %.2f, and the p value: %.5f"%(U, p))

Pearson chi-square statistic: 32.10, and the p value: 0.00000
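Both p-values print as 0.00000 only because of the `%.5f` format. The standard-library sketch below recomputes $T$ and $U$ from the counts and reports the upper-tail probabilities in scientific notation, using the identity $P(\chi^2_1 > t) = P(|Z| > \sqrt{t}) = \operatorname{erfc}(\sqrt{t/2})$. The helper name `chi2_1_sf` is an assumption; `scipy.stats.chi2.sf(t, df=1)` computes the same tail.

```python
import math

counts = [[14, 641], [62, 594]]  # Exercise 4 data: rows black/white victim, columns death/no death
n = sum(map(sum, counts))
rows = [sum(r) for r in counts]
cols = [sum(c) for c in zip(*counts)]
# expected counts under independence: X_i. * X_.j / X..
expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]

T = 2 * sum(counts[i][j] * math.log(counts[i][j] / expected[i][j]) for i in range(2) for j in range(2))
U = sum((counts[i][j] - expected[i][j]) ** 2 / expected[i][j] for i in range(2) for j in range(2))


def chi2_1_sf(t):
    """Upper tail of chi^2 with 1 df: P(Z^2 > t) = erfc(sqrt(t / 2))."""
    return math.erfc(math.sqrt(t / 2))


print("T = %.2f, p = %.1e" % (T, chi2_1_sf(T)))
print("U = %.2f, p = %.1e" % (U, chi2_1_sf(U)))
```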


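Both tests reject the null hypothesis of independence, so the victim's race and the sentence are associated. This alone does not show that the victim's race causes the death sentence: the data are observational, and the association could, for example, reflect judges' biases or other factors that differ between the two groups of cases.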

In [15]:
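# sample odds ratio psi_hat and log odds ratio gamma_hat
psi = X00*X11/(X01*X10)
gamma = np.log(psi)

# delta-method standard error of gamma_hat (Exercise 3)
se_gamma = np.sqrt(1/X00 + 1/X01 + 1/X10 + 1/X11)

print("psi: %.2f, gamma: %.2f"%(psi, gamma))
# Wald interval; 1.96 is approximately the 97.5% standard normal quantile
print("95 percent confidence interval for gamma is (%.2f, %.2f)"%(gamma - 1.96 * se_gamma, gamma + 1.96 * se_gamma))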

psi: 0.21, gamma: -1.56
95 percent confidence interval for gamma is (-2.15, -0.97)
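The interval for $\gamma$ can be turned into an interval for $\psi$ in two ways: exponentiate its endpoints, or use $\hat{se}(\hat\psi) = \hat\psi\,\hat{se}(\hat\gamma)$ from Exercise 3 to build a Wald interval directly. A minimal standard-library sketch follows; the variable names are illustrative, and $z = 1.96$ is the usual normal approximation.

```python
import math

x00, x01, x10, x11 = 14, 641, 62, 594  # Exercise 4 counts
psi = x00 * x11 / (x01 * x10)          # sample odds ratio
gamma = math.log(psi)
se_gamma = math.sqrt(1 / x00 + 1 / x01 + 1 / x10 + 1 / x11)
z = 1.96                               # approx. 97.5% standard normal quantile

# (a) exponentiate the endpoints of the interval for gamma
ci_exp = (math.exp(gamma - z * se_gamma), math.exp(gamma + z * se_gamma))
# (b) Wald interval using se(psi_hat) = psi_hat * se(gamma_hat) from the delta method
se_psi = psi * se_gamma
ci_wald = (psi - z * se_psi, psi + z * se_psi)

print("psi = %.3f" % psi)
print("exp of gamma interval: (%.3f, %.3f)" % ci_exp)
print("Wald interval for psi: (%.3f, %.3f)" % ci_wald)
```

Both intervals exclude $\psi = 1$, consistent with the two tests. Since $\hat\psi < 1$, the estimated odds of a death sentence are lower when the victim is black than when the victim is white. The exponentiated interval is asymmetric around $\hat\psi$ and cannot go below zero, which is one reason it is often preferred to the Wald interval for $\psi$.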
