# Inference on the winner

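We study the problem of drawing reliable conclusions about the winner $\hat{\imath} = \underset{i\in[m]}{\text{argmax}} \ X_i$ among $m$ competing candidates with observed values $X_1,\dots,X_m$. The target of inference is the winner's mean $\theta_{\hat{\imath}}$, where $\theta = \mathbb{E}[X]$.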
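To see why selecting the winner calls for a correction, the following minimal sketch estimates the coverage of the uncorrected interval $X_{\hat{\imath}} \pm z_{\alpha/2}$ when all candidates have mean zero and unit variance. The function `naive_winner_coverage` and its parameters are hypothetical illustrations and are not part of the `methods` or `utils` modules used below.

```python
import numpy as np
from scipy.stats import norm

def naive_winner_coverage(m, alpha=0.1, trials=20000, seed=0):
    """Monte Carlo coverage of X[ihat] +/- z_{alpha/2} for theta[ihat],
    assuming theta = 0 and Sigma = I (an illustrative null configuration)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((trials, m))   # each row is one draw of X
    winner_value = X.max(axis=1)           # X[ihat] in each trial
    z = norm.isf(alpha / 2)
    # theta[ihat] = 0, so the naive interval covers iff |X[ihat]| <= z
    return np.mean(np.abs(winner_value) <= z)

for m in (1, 10, 100):
    print(m, naive_winner_coverage(m))
```

In this configuration the exact coverage is $(1-\alpha/2)^m - (\alpha/2)^m$, so the naive interval attains the nominal level only for $m = 1$ and degrades quickly as $m$ grows.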

We compare conditional inference due to Lee et al. [1], hybrid inference due to Andrews et al. [2], locally simultaneous inference due to Zrnic and Fithian [3], simultaneous inference over the selected (SoS) due to Benjamini et al. [4], and the zoom correction.

[1] Lee, J. D., Sun, D. L., Sun, Y., & Taylor, J. E. (2016). Exact post-selection inference, with application to the lasso. Annals of Statistics, 44(3), 907-927.

[2] Andrews, I., Kitagawa, T., & McCloskey, A. (2024). Inference on winners. Quarterly Journal of Economics, 139(1), 305-358.

[3] Zrnic, T., & Fithian, W. (2024). Locally simultaneous inference. Annals of Statistics, 52(3), 1227-1253.

[4] Benjamini, Y., Hechtlinger, Y., & Stark, P. B. (2019). Confidence intervals for selected parameters. arXiv preprint arXiv:1906.00505.

In [None]:
%load_ext autoreload
%autoreload 2
import numpy as np
from tqdm.notebook import tqdm
from scipy.stats import norm

from methods import *
from utils import *

We sample $X \sim N(\theta, \Sigma)$, where $\theta$ is an $m$-dimensional mean vector, for varying $m$. We set
$$\theta_i = \begin{cases}
0, &i\in\{1,\dots,m_W \}\\
-c \cdot r_{\mathrm{sim}}, &i\in\{m_W +1,\dots,m\},
\end{cases}$$
where $m_W$ is a varying number of population winners, $c >0$ is a varying constant, and $r_{\mathrm{sim}}$ is the radius of the fully simultaneous interval. The covariance matrix $\Sigma$ has $\Sigma_{ii} = 1$ and $\Sigma_{ij} = \rho$ for $i\neq j$.
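The setup above can be sketched in a few lines. This is an illustration under stated assumptions, not the notebook's own implementation: all four helper names are hypothetical, `simultaneous_radius_indep` uses the closed-form quantile of $\max_i |Z_i|$ and is therefore valid only for $\rho = 0$ (the notebook instead calls `max_z_width`, whose internals are not shown here), and `sample_X` uses a one-factor representation that requires $0 \le \rho \le 1$.

```python
import numpy as np
from scipy.stats import norm

def equicorrelated_cov(m, rho):
    """Sigma with unit diagonal and rho everywhere off the diagonal."""
    return rho * np.ones((m, m)) + (1 - rho) * np.eye(m)

def simultaneous_radius_indep(m, alpha):
    """(1 - alpha)-quantile of max_i |Z_i| for independent standard normals.
    Assumed stand-in for r_sim when rho = 0 only."""
    per_coord = 1 - (1 - alpha) ** (1 / m)   # two-sided per-coordinate level
    return norm.isf(per_coord / 2)

def make_theta(m, m_W, c, r_sim):
    """First m_W entries are 0 (population winners); the rest are -c * r_sim."""
    theta = np.zeros(m)
    theta[m_W:] = -c * r_sim
    return theta

def sample_X(theta, rho, rng):
    """Draw X ~ N(theta, Sigma) via X = theta + sqrt(rho)*Z0 + sqrt(1-rho)*Z,
    which gives Var(X_i) = 1 and Cov(X_i, X_j) = rho for 0 <= rho <= 1."""
    shared = rng.standard_normal()            # common factor Z0
    own = rng.standard_normal(len(theta))     # idiosyncratic noise Z
    return theta + np.sqrt(rho) * shared + np.sqrt(1 - rho) * own

rng = np.random.default_rng(0)
m, m_W, c, alpha, rho = 10, 3, 8, 0.1, 0.0
r_sim = simultaneous_radius_indep(m, alpha)
theta = make_theta(m, m_W, c, r_sim)
X = sample_X(theta, rho, rng)
ihat = int(np.argmax(X))                      # index of the observed winner
```

With a gap as large as $c = 8$, the observed winner is almost always one of the $m_W$ population winners, so the difficulty of the problem is governed by how many candidates are tied at the top.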

SoS intervals are not applicable when $\rho\neq 0$, so they are computed and plotted only for $\rho = 0$.

In [None]:
# simulation params
trials = 100
ms = [10, 10, 100] # number of candidates
alpha = 0.1 # error level
nu = 0.01 # error splitting param for LSI
beta = 0.01 # error splitting param for hybrid
mWs = range(1,10) # number of population winners
c = 8 # gap multiplier (see description above)
rho = 0 # correlation param (see description above)

for m in ms:
    cond_widths = np.zeros((len(mWs), trials))
    hybrid_widths = np.zeros((len(mWs), trials))
    LSI_widths = np.zeros((len(mWs), trials))
    zoom_grid_widths = np.zeros((len(mWs), trials))
    zoom_stepdown_widths = np.zeros((len(mWs), trials))

    Sigma = np.ones((m,m))*rho + (1-rho)*np.eye(m)
    plausible_gap = 4*max_z_width(Sigma, nu)
    losers_val = -c*max_z_width(Sigma, alpha)

    for i, mW in enumerate(tqdm(mWs)):
        theta = np.zeros(m)
        theta[mW:] = losers_val
        
        for j in tqdm(range(trials)):
            X = np.random.multivariate_normal(theta, Sigma)  # X ~ N(theta, Sigma), so rho != 0 is respected
            ihat = np.argmax(X)
        
            # zoom (grid search)
            zoom_grid_int = zoom_grid(X, Sigma, alpha=alpha)
            
            # zoom (step-down)
            zoom_stepdown_int = zoom_stepdown(X, np.sqrt(Sigma[0,0]), alpha=alpha)
        
            # locally simultaneous
            LSI_int = locally_simultaneous_inference(X, Sigma, plausible_gap, alpha=alpha, nu=nu)
        
            # conditional
            A, b = inference_on_winner_polyhedron(m, ihat)
            eta = np.zeros(m)
            eta[ihat] = 1
            cond_int = conditional_inference(X, Sigma, A, b, eta, alpha=alpha)
        
            # hybrid
            hybrid_int = hybrid_inference(X, Sigma, A, b, eta, alpha=alpha, beta=beta)

            zoom_grid_widths[i, j] = zoom_grid_int[1] - zoom_grid_int[0]
            zoom_stepdown_widths[i, j] = zoom_stepdown_int[1] - zoom_stepdown_int[0]
            LSI_widths[i, j] = LSI_int[1] - LSI_int[0]
            cond_widths[i, j] = cond_int[1] - cond_int[0]
            hybrid_widths[i, j] = hybrid_int[1] - hybrid_int[0]

    if rho == 0:
        SoS_width = norm.isf(alpha/2) + norm.isf(alpha/(2*m))
    else:
        SoS_width = None # SoS won't be plotted

    plot_title = f'm = {m}, c = {c}, ρ = {rho}'
    ylabel = 'interval width'
    xlabel = 'number of population winners'
    filename = f'plots/num_winners_corr{rho}m{m}trials{trials}gap{c}.pdf'
    plot_and_save_results(
        mWs, zoom_grid_widths, zoom_stepdown_widths, LSI_widths, cond_widths, hybrid_widths,
        plot_title, ylabel, xlabel, filename,
        SoS_width=SoS_width, alpha=0.1, legend=True, ylim=[3,9],
    )