<a href="https://colab.research.google.com/github/hkaido0718/IncompleteDiscreteChoice/blob/main/HypothesisTests.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hypothesis testing

The goal of this note is to test hypotheses of the form
\begin{align}
H_0:\theta\in\Theta_0 ~~\text{vs.}~~H_1:\theta\in\Theta_1.
\end{align}

The null hypothesis can be described by linear or nonlinear constraints. Examples include:

- Some of the coefficients are 0 (e.g. $\Theta_0=\{\theta:\theta_j=0,j\in\mathcal J_0\}$)
- Shape restrictions: (e.g. a function of interest is $f(x;\theta)=\theta_1+\theta_2 x+\theta_3x^2$, and $\Theta_0=\{\theta:\theta_3\le 0\}$)
- The value of a counterfactual outcome (e.g. $\Theta_0=\{\theta:g(\theta)=c\}$)

Later, we will discuss constructing confidence intervals by inverting the test. The `idc` library's default command `idc.calculate_LR` uses SciPy's optimization library to implement the test. However, you can also use your favorite optimization library by calling the functions in `idc.calculate_LR`.

# Representing your hypothesis as a constraint object.

Consider testing
$$H_0:A\theta=b,~~H_1:A\theta\ne b.$$

For example, $\theta=(\theta_1,\dots,\theta_5)$, and we want to test $H_0:\theta_1=0,\theta_2=0$. We can represent this linear constraint by letting

$$A=\begin{bmatrix}
1&0&0&0&0\\
0&1&0&0&0
\end{bmatrix}
,~~b=\begin{bmatrix}
0\\0 \end{bmatrix}.$$
We can use the `LinearConstraint` object in `scipy.optimize` to represent this constraint. See further details [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.LinearConstraint.html).

In [1]:
import numpy as np
from scipy.optimize import differential_evolution, LinearConstraint

# Define the linear constraint Aθ = b
A = np.array([[1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0]])  # Example constraint matrix
b = np.array([0, 0])  # Example constraint vector

# Linear constraint
linear_constraint = LinearConstraint(A, b, b)  # Aθ = b
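
Before running a test, it can help to confirm that the constraint object encodes the intended null. The sketch below is a minimal check, not part of `idc`: the helper `satisfies_linear` and the two parameter vectors are hypothetical, and the check relies only on the `A`, `lb`, and `ub` attributes that `LinearConstraint` stores.

```python
import numpy as np
from scipy.optimize import LinearConstraint

# Same constraint as above: theta_1 = 0 and theta_2 = 0
A = np.array([[1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0]])
b = np.array([0, 0])
linear_constraint = LinearConstraint(A, b, b)

def satisfies_linear(constraint, theta, tol=1e-8):
    """Hypothetical helper: True if lb <= A @ theta <= ub holds in every row."""
    value = np.asarray(constraint.A) @ np.asarray(theta, dtype=float)
    lb = np.broadcast_to(constraint.lb, value.shape)
    ub = np.broadcast_to(constraint.ub, value.shape)
    return bool(np.all(value >= lb - tol) and np.all(value <= ub + tol))

# Illustrative parameter vectors (assumed values, not estimates)
theta_null = [0.0, 0.0, -0.5, -0.5, 0.3]   # theta_1 = theta_2 = 0
theta_alt = [0.75, 0.25, -0.5, -0.5, 0.3]  # violates both equalities

in_null = satisfies_linear(linear_constraint, theta_null)
in_alt = satisfies_linear(linear_constraint, theta_alt)
print(in_null, in_alt)
```

The same helper works for one-sided constraints, because an infinite bound is simply never binding.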

Similarly, one can test one-sided hypotheses. For example,
$$H_0:A\theta\ge b,~~H_1:A\theta\not\ge b,$$
where the alternative means that at least one of the inequalities is violated. The null constraint is represented by



In [2]:
# Linear constraint
linear_onesided_constraint = LinearConstraint(A, b, np.inf)  # Aθ >= b
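
Upper-bound restrictions follow the same pattern with the infinite bound on the other side. As an illustration, the shape restriction $\theta_3\le 0$ for $f(x;\theta)=\theta_1+\theta_2x+\theta_3x^2$ mentioned at the start could be encoded as below. The names `A_shape` and `shape_constraint` and the two parameter vectors are made up for this sketch.

```python
import numpy as np
from scipy.optimize import LinearConstraint

# theta = (theta_1, theta_2, theta_3); the restriction only involves theta_3
A_shape = np.array([[0, 0, 1]])
shape_constraint = LinearConstraint(A_shape, -np.inf, 0)  # -inf <= theta_3 <= 0

# Illustrative parameter vectors (assumed values)
theta_concave = np.array([1.0, 2.0, -0.5])  # satisfies theta_3 <= 0
theta_convex = np.array([1.0, 2.0, 0.5])    # violates theta_3 <= 0

concave_ok = bool(np.all(A_shape @ theta_concave <= 0))
convex_ok = bool(np.all(A_shape @ theta_convex <= 0))
print(concave_ok, convex_ok)
```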

Next, consider testing
$$H_0: \varphi(\theta)=c,~~H_1:\varphi(\theta)\ne c.$$
This type of test is useful for constructing confidence intervals for a function $\varphi(\theta)$ of $\theta$. For example, in the entry game, consider the counterfactual entry probability of Player 1 (with characteristics $x_1$) when Player 2 is in the market ($y_2=1$). For simplicity, suppose $U_1\sim N(0,1)$. Then,

$$\varphi(\theta)=F_\theta(\{u:x_1'\beta_1+\Delta_1\ge -u_1 \})=\Phi(x_1'\beta_1+\Delta_1).$$

The corresponding `NonlinearConstraint` object can be defined as follows. See details [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.NonlinearConstraint.html).

In [3]:
from scipy.optimize import NonlinearConstraint
from scipy.stats import norm

# Define varphi
def constraint_function(theta):
    beta1 = theta[0]  # Extract beta1 from theta
    Delta1 = theta[2] # Extract Delta1 from theta
    x1 = 1 # Set x1 to a value of interest
    return norm.cdf(np.dot(x1, beta1) + Delta1)

# Define the target value for the constraint
c = 0.5  # Example target value

# Create the NonlinearConstraint object
nonlinear_constraint = NonlinearConstraint(constraint_function, c, c)  # varphi(theta) = c
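
To see what this constraint accepts, one can evaluate $\varphi$ at a few candidate parameter values. The sketch below is not part of `idc`: the helper `satisfies_nonlinear` and the two parameter vectors are hypothetical, and it uses only the `fun`, `lb`, and `ub` attributes stored on the `NonlinearConstraint` object.

```python
import numpy as np
from scipy.optimize import NonlinearConstraint
from scipy.stats import norm

def constraint_function(theta):
    beta1, Delta1 = theta[0], theta[2]  # same indexing as above
    x1 = 1                              # covariate value of interest
    return norm.cdf(x1 * beta1 + Delta1)

c = 0.5
nonlinear_constraint = NonlinearConstraint(constraint_function, c, c)

def satisfies_nonlinear(constraint, theta, tol=1e-8):
    """Hypothetical helper: True if lb <= fun(theta) <= ub up to a tolerance."""
    value = constraint.fun(np.asarray(theta, dtype=float))
    return bool(np.all(value >= constraint.lb - tol) and
                np.all(value <= constraint.ub + tol))

# Illustrative parameter vectors (assumed values)
theta_on = [0.5, 0.0, -0.5, 0.0, 0.0]      # beta1 + Delta1 = 0, so varphi = Phi(0) = 0.5
theta_off = [0.75, 0.25, -0.5, -0.5, 0.0]  # beta1 + Delta1 = 0.25, so varphi = Phi(0.25)

on_null = satisfies_nonlinear(nonlinear_constraint, theta_on)
off_null = satisfies_nonlinear(nonlinear_constraint, theta_off)
print(on_null, off_null)
```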

# Split LR-test

To proceed, we use the test by Kaido & Zhang (2024), which takes the following steps.

1.   Split a sample into two subsamples. Let's call them $D_0$ and $D_1$.
2.   Estimate $\theta$ using $D_1$. Calculate an "unrestricted likelihood" $L_0(\hat\theta_1)$ using $D_0$.
3.   Calculate the restricted likelihood $L_0(\hat{\theta}_0)=\sup_{\theta\in\Theta_0}L_0(\theta)$ using $D_0$.
4. Compute the ratio $T_n=L_0(\hat\theta_1)/L_0(\hat{\theta}_0)$.

The recommended version of this test repeats Steps 1-4 with the roles of $D_0$ and $D_1$ swapped to obtain $T_n^{swap}$, and then aggregates the two statistics into the _cross-fit LR statistic_ $S_n=\frac{T_n+T_n^{swap}}{2}$. The rejection rule is simple.

- Reject $H_0$ if $S_n>1/\alpha$.
- Do not reject $H_0$ if $S_n\le 1/\alpha$.
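
The steps and the rejection rule above can be sketched on a toy problem. The example below is only an illustration under strong simplifying assumptions: it uses a complete model ($X_i\sim N(\mu,1)$) with a point null $H_0:\mu=0$, so the restricted likelihood needs no optimization. It is not the `idc` implementation, which handles incomplete models, and all function names are made up for this sketch.

```python
import numpy as np

def gaussian_loglik(x, mu):
    """Log-likelihood of N(mu, 1) data, up to a constant that cancels in the ratio."""
    return -0.5 * np.sum((x - mu) ** 2)

def split_lr(d0, d1, mu_null):
    """Steps 2-4: estimate on d1, then evaluate both likelihoods on d0."""
    mu_hat1 = d1.mean()                                  # Step 2: estimate with D_1
    log_ratio = (gaussian_loglik(d0, mu_hat1)            # "unrestricted" likelihood on D_0
                 - gaussian_loglik(d0, mu_null))         # Step 3: restricted likelihood on D_0
    return np.exp(log_ratio)                             # Step 4: likelihood ratio T_n

def crossfit_lr(x, mu_null, seed=123):
    """Step 1 plus the swap: split at random, compute T_n and T_n^swap, average."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(x))
    half = len(x) // 2
    d0, d1 = x[perm[:half]], x[perm[half:]]
    t_n = split_lr(d0, d1, mu_null)
    t_swap = split_lr(d1, d0, mu_null)
    return 0.5 * (t_n + t_swap)

alpha = 0.05
# Simulated data whose true mean (1.0) violates the null mu = 0
x = np.random.default_rng(0).normal(loc=1.0, scale=1.0, size=200)
s_n = crossfit_lr(x, mu_null=0.0)
reject = s_n > 1 / alpha   # reject H_0 if S_n > 1/alpha
print(s_n, reject)
```

For a composite null, Step 3 would instead maximize the likelihood over $\Theta_0$, which is where the constraint objects defined earlier enter.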

This test is valid in finite samples of any size. Below, we outline how to do this.


# Loading idc library and downloading data

In [4]:
!git clone https://github.com/hkaido0718/IncompleteDiscreteChoice.git



Let's download simulated data from entry games. The data involve binary player-specific covariates. The true parameter value is $\beta_1 = 0.75$, $\beta_2 = 0.25$, $\Delta_1 = -0.5$, $\Delta_2 = -0.5$.
This DGP satisfies the linear inequality hypothesis (but not the other two).
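
The claim about which hypotheses hold at the true parameter can be checked directly. The sketch below assumes the parameter ordering used in `constraint_function` above (first entry $\beta_1$, third entry the interaction effect of Player 1) and $x_1=1$. The fifth entry of `theta_true` is a placeholder, since its true value is not stated here and none of the three hypotheses depends on it.

```python
import numpy as np
from scipy.stats import norm

# Stated true values; the last entry is a placeholder (assumption)
theta_true = np.array([0.75, 0.25, -0.5, -0.5, 0.0])

A = np.array([[1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0]])
b = np.array([0, 0])

eq_holds = bool(np.allclose(A @ theta_true, b))      # H0: A theta = b
ineq_holds = bool(np.all(A @ theta_true >= b))       # H0: A theta >= b
phi_true = norm.cdf(1 * theta_true[0] + theta_true[2])  # varphi(theta) with x1 = 1
nonlin_holds = bool(np.isclose(phi_true, 0.5))       # H0: varphi(theta) = 0.5

print(eq_holds, ineq_holds, nonlin_holds)
```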

In [5]:
import idclib_undi as idc
import examples as ex
import numpy as np
import gdown

# Download entrygame sample data (same data as above)
url = "https://drive.google.com/uc?id=1cRhMJ8bRhdzy9_agmQ_LkqzlsKRKcthX"
output = "data_entrygame.npz"
gdown.download(url, output, quiet=True)
Data = np.load(output, allow_pickle=True)
Y_full = Data['Y']
X_full = Data['X']
data = [Y_full, X_full]

# Define the model
Y_nodes = [(0,0), (0,1), (1,0), (1,1)]
U_nodes = ['a', 'b', 'c', 'd', 'e']
edges = [
    ('a', (0,0)),
    ('b', (0,1)),
    ('c', (1,0)),
    ('d', (1,1)),
    ('e', (0,1)),
    ('e', (1,0))
]
gmodel = idc.BipartiteGraph(Y_nodes, U_nodes, edges)
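
As we read the edge list, each edge links a latent region (a `U` node) to an outcome it can generate. A plain-Python sketch, independent of `idc`, makes the resulting correspondence explicit and shows which regions are compatible with more than one outcome. The variable names below are made up for this illustration.

```python
from collections import defaultdict

edges = [
    ('a', (0, 0)),
    ('b', (0, 1)),
    ('c', (1, 0)),
    ('d', (1, 1)),
    ('e', (0, 1)),
    ('e', (1, 0)),
]

# Collect the set of outcomes attached to each latent region
outcomes_by_region = defaultdict(set)
for u_node, y_node in edges:
    outcomes_by_region[u_node].add(y_node)

# Regions linked to several outcomes are where the model is incomplete
multi_valued = {u: ys for u, ys in outcomes_by_region.items() if len(ys) > 1}
print(dict(outcomes_by_region))
print(multi_valued)
```

Only region `e` is linked to two outcomes, `(0,1)` and `(1,0)`, which corresponds to the region of multiple equilibria in the entry game.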


Now, let's conduct a test. Below, we set the parameter space and call the `idc.calculate_LR` function, which computes the LR statistic above for a given hypothesis. To conduct the test, we should pass the following objects to the function.
- `data` (list): List containing Y and X arrays
- `gmodel` (BipartiteGraph): Model (stored as a graph)
- `calculate_Ftheta` (function): Function to calculate $F_\theta$
- `LB` (list): Lower bounds for θ
- `UB` (list): Upper bounds for θ
- `linear_constraint` (LinearConstraint, optional): Linear constraint
- `nonlinear_constraint` (NonlinearConstraint, optional): Nonlinear constraint
- `seed` (int, optional): Seed for the random number generator (default is 123)
- `split` (str, optional): If "swap", swap the roles of data0 and data1; if "crossfit", calculate $T$ and $T^{swap}$ and return their average.



# Testing linear hypotheses.

Let's apply this to the linear (equality) hypothesis.

In [6]:
# Define parameter space
LB = [-2, -2, -2, -2, 0] # parameter space lower bound
UB = [2, 2, 0, 0, 0.85]  # parameter space upper bound

# Calculate LR
S_eq = idc.calculate_LR(data, gmodel, ex.calculate_Ftheta_entrygame, LB, UB, linear_constraint, seed=123, split="crossfit")

UnboundLocalError: cannot access local variable 'result' where it is not associated with a value

We reject the null hypothesis if $S_n>1/\alpha$. For example, with $\alpha=0.05$ the threshold is $20$.

Similarly, we can test the linear inequality constraint.

In [None]:
# Calculate LR
S_ineq = idc.calculate_LR(data, gmodel, ex.calculate_Ftheta_entrygame, LB, UB, linear_onesided_constraint, seed=123, split="crossfit")
print(S_ineq)

# Testing a nonlinear hypothesis

Testing a nonlinear hypothesis can be done similarly. You simply pass the nonlinear constraint as an argument to the same function.


In [None]:
# Calculate LR
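# Pass the constraint by keyword so it is not read as the (positional) linear constraint
S = idc.calculate_LR(data, gmodel, ex.calculate_Ftheta_entrygame, LB, UB, nonlinear_constraint=nonlinear_constraint, seed=123, split="crossfit")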
print(S)
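
A confidence set for $\varphi(\theta)$ can be obtained by inverting this test: repeat it over a grid of candidate values $c$ and keep those that are not rejected. The sketch below shows only the inversion logic. `toy_stat` is a hypothetical stand-in for a function that would build the constraint $\varphi(\theta)=c$ and return the cross-fit statistic; it is not an `idc` function, and its shape is chosen purely for illustration.

```python
import numpy as np

def invert_test(stat_fn, grid, alpha=0.05):
    """Keep every candidate value c whose statistic does not exceed 1/alpha."""
    threshold = 1.0 / alpha
    return [c for c in grid if stat_fn(c) <= threshold]

def toy_stat(c):
    """Hypothetical statistic: small near c = 0.6 and growing away from it."""
    return np.exp(50.0 * (c - 0.6) ** 2)

grid = np.linspace(0.0, 1.0, 21)          # candidate values for varphi(theta)
accepted = invert_test(toy_stat, grid)    # values of c that are not rejected
ci = (min(accepted), max(accepted))       # report the range of accepted values
print(ci)
```

Reporting the smallest and largest accepted values gives an interval that covers the accepted set; if the accepted set has gaps, this interval is conservative. In practice each evaluation of the statistic involves an optimization, so a coarse grid is a natural starting point.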