# Another Application of Monotone Instrumental Variables to Returns to Schooling  
Author: Thomas Wiemann  
Date: November 3, 2021 

This notebook implements the MTS-MTR upper-bounds of Manski and Pepper (2000) and applies them to bound the returns to schooling using the data considered in Angrist and Krueger (1991).

## Section 0: Preliminaries

In [1]:
# Load required packages
using DataFrames, StatFiles
using Statistics

## Section 1: MTS-MTR Bound Implementation

The following implements a custom object of type ``myIVBound``. A corresponding method ``coef`` allows for calculation of the MTS-MTR upper-bound on $E[Y(t) - Y(s)]$.
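
For reference, the bound that `coef` computes can be written as follows. This is a sketch in this notebook's notation, where $X$ denotes observed schooling and $s \leq t$; it combines the MTS-MTR upper bound on $E[Y(t)]$ with the MTS-MTR lower bound on $E[Y(s)]$ from Manski and Pepper (2000). The fields `px` and `Eyx` hold the sample analogues of $P(X = x)$ and $E[Y \mid X = x]$.

$$
E[Y(t) - Y(s)] \leq \sum_{x > t} E[Y \mid X = x] P(X = x) + E[Y \mid X = t] P(X \leq t) - \sum_{x < s} E[Y \mid X = x] P(X = x) - E[Y \mid X = s] P(X \geq s)
$$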

In [2]:
struct myIVBound
    px # P(X=x) for all x
    Eyx # E[Y|X=x] for all x
    unq_x # An ordered set of the unique values of X

    function myIVBound(y, x)
        # Data parameters
        n = length(y)
        unq_x = sort(unique(x))
        # Calculate the empirical pmf and conditional expectation
        px = zeros(length(unq_x))
        Eyx = zeros(length(unq_x))
        for v in 1:length(unq_x)
            indx = x .== unq_x[v]
            px[v] = sum(indx) / n
            Eyx[v] = mean(y[indx])
        end
        # Return the myIVBound object
        new(px, Eyx, unq_x)
    end #MYIVBOUND
end #MYIVBOUND
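
To illustrate what the constructor stores, the following minimal sketch repeats the same empirical pmf and conditional-mean calculation on a small set of made-up values. The vectors `x` and `y` are hypothetical toy inputs, not taken from the Angrist and Krueger (1991) data, and the comprehensions are only a compact stand-in for the loop above.

```julia
using Statistics

# Hypothetical toy data: four observations of schooling (x) and log wages (y)
x = [8, 12, 12, 16]
y = [5.0, 5.4, 5.6, 6.0]

# Ordered support of x
unq_x = sort(unique(x))

# Empirical pmf: share of observations at each support point
px = [mean(x .== v) for v in unq_x]

# Conditional means: average of y among observations with x equal to v
Eyx = [mean(y[x .== v]) for v in unq_x]
```

Here `px` gives one quarter, one half, and one quarter of the sample to the three schooling levels, and `Eyx` averages the two observations with 12 years of schooling.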

In [3]:
function coef(fit::myIVBound, s, t)
    """
    Returns the upper bound of E[Y(t) - Y(s)] under the MTS and MTR assumptions
        of Manski and Pepper (2000).
    """
    # Determine indices above and below t, s
    from_t = fit.unq_x .> t
    until_s = fit.unq_x .< s
    # Calculate the upper-bound of E[Y(t)] and the lower-bound of E[Y(s)]
    Eyx_px = fit.Eyx .* fit.px
    ub = sum(Eyx_px[from_t]) + (fit.Eyx[fit.unq_x .== t] * 
        sum(fit.px[.!from_t]))[1]
    lb = sum(Eyx_px[until_s]) + (fit.Eyx[fit.unq_x .== s] * 
        sum(fit.px[.!until_s]))[1]
    # Return the upper bound on E[Y(t)] minus the lower bound on E[Y(s)]
    return ub - lb
end

coef (generic function with 1 method)
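
As a quick sanity check of the arithmetic, the sketch below applies the same calculation to hypothetical inputs that are small enough to verify by hand. The function `mts_mtr_ub` and the toy vectors are illustrative assumptions and not part of the notebook's implementation; the function mirrors `coef` but takes the pmf and conditional means directly.

```julia
# Hypothetical toy inputs (not from the Angrist and Krueger data)
unq_x = [8, 12, 16]      # ordered support of schooling
px    = [0.2, 0.5, 0.3]  # P(X = x)
Eyx   = [5.0, 5.5, 6.0]  # E[Y | X = x]

# Standalone version of the MTS-MTR upper bound on E[Y(t) - Y(s)].
# Assumes both s and t appear in unq_x; otherwise indexing fails.
function mts_mtr_ub(unq_x, px, Eyx, s, t)
    above_t = unq_x .> t
    below_s = unq_x .< s
    # Upper bound on E[Y(t)]: keep E[Y|X=x] for x > t, use E[Y|X=t] for x <= t
    ub_t = sum(Eyx[above_t] .* px[above_t]) +
        Eyx[findfirst(==(t), unq_x)] * sum(px[.!above_t])
    # Lower bound on E[Y(s)]: keep E[Y|X=x] for x < s, use E[Y|X=s] for x >= s
    lb_s = sum(Eyx[below_s] .* px[below_s]) +
        Eyx[findfirst(==(s), unq_x)] * sum(px[.!below_s])
    return ub_t - lb_s
end

toy_bound = mts_mtr_ub(unq_x, px, Eyx, 12, 16)
```

By hand, the upper bound on $E[Y(16)]$ is $6.0 \times 1.0 = 6.0$ and the lower bound on $E[Y(12)]$ is $5.0 \times 0.2 + 5.5 \times 0.8 = 5.4$, so `toy_bound` should be approximately 0.6.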

## Section 2: Data

This section loads the Angrist and Krueger (1991) data. The sample contains American men born between 1930 and 1939. The outcome variable is the log weekly wage, and the endogenous variable of interest is the number of completed years of schooling.

In [4]:
# Import dataset as dataframe
df = load("NEW7080.dta") |> DataFrame;

In [5]:
# Select sample and variables of interest
# Column 27: two-digit year of birth; column 4: years of schooling; column 9: log weekly wage
indx = @. (df[:, 27] >= 30) & (df[:, 27] <= 39)
df = df[indx, :]
educ = df[:, 4]
lwklywge = df[:, 9];

## Section 3: Estimation and Results

This section constructs upper-bounds on $E[Y(t) - Y(t-4)]$ for $t \in \{4, 8, 12, 16, 20\}$ to bound the returns to different school stages in the US education system (from elementary school to post-grad degrees).

In [6]:
# Calculate the empirical pmf and conditional expectation
fit = myIVBound(lwklywge, educ);

In [7]:
# Calculate the bounds for each considered pair (t, s)
bounds = zeros(5, 3)
for j in (4, 8, 12, 16, 20)
    bounds[Int(j/4), 1] = j - 4 # s
    bounds[Int(j/4), 2] = j # t
    bounds[Int(j/4), 3] = coef(fit, j - 4, j) # upper-bound
end

The third column below gives the upper-bounds for each considered pair $t$ (second column) and $s$ (first column).

In [8]:
bounds

5×3 Matrix{Float64}:
  0.0   4.0  0.878966
  4.0   8.0  0.646593
  8.0  12.0  0.394537
 12.0  16.0  0.448109
 16.0  20.0  0.414501

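Dividing each bound by four gives an upper bound on the average return to an additional year of schooling within each stage. For example, the average return to an additional year of college ($t = 16$, $s = 12$) is bounded above by 0.112.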

In [9]:
bounds[:, 3] ./ 4

5-element Vector{Float64}:
 0.21974160221586803
 0.16164813331066452
 0.09863423959990292
 0.11202714900306754
 0.10362536060294159

Manski and Pepper (2000) report an upper-bound per year of college of 0.099 (using different data, of course).