# Mortality Rates

This program extracts yearly and quarterly mortality rates from 1900 to 2220 by age and calendar period (and hence by birth cohort). The approach uses the "Life Tables for the United States Social Security Area 1900-2100" computed by Bell and Miller (2005). These tables are provided for each decade from 1900 to 2100 by single year of age, so they must be interpolated to obtain mortality rates at a quarterly frequency and by calendar year. The interpolation below uses cubic splines along both the age and the calendar-period dimensions. Because the tables include mortality rates by age up to 120 years of age for an individual born in 2100, we effectively have usable information through 2220.

The program returns the following output:

**`interp_death_rate_1900_2220_Y.csv`**: A 120x320 matrix whose elements $\gamma_{a,t}$ are the annual marginal probabilities of death of individuals aged $a$ years in calendar year $t$.

**`interp_death_rate_1900_2220_Q.csv`**: A 480x1280 matrix whose elements $\gamma_{a,t}$ are the annualized marginal probabilities of death of individuals in quarter of age $a$ during calendar quarter $t$.

**`age_Y.csv`**: A 120x1 array whose elements are the mid-points of the years of age.

**`age_Q.csv`**: A 480x1 array whose elements are the mid-points of the quarters of age.

**`interp_death_rate_table6_1900.csv`**: A 480x1 array whose elements $\gamma_{a,1}$ are the annualized probabilities of death per quarter of age $a$ in the cross section at the turn of 1900 (the first column of the quarterly matrix).
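Because the quarterly outputs store *annualized* rates, the probability of dying within a single quarter has to be recovered before compounding survival quarter by quarter. A minimal sketch, assuming the geometric conversion $1-(1-\gamma)^{1/4}$ (the one the life-expectancy check at the end of this notebook applies); the helper names `quarterly_prob` and `annualized_prob` are hypothetical:

```julia
# Convert an annualized death rate into the probability of dying within one quarter,
# assuming a constant hazard within the year (geometric compounding of survival).
quarterly_prob(γ_annual::Real) = 1 - (1 - γ_annual)^(1 / 4)

# Inverse: annualize a one-quarter probability of death.
annualized_prob(q_quarter::Real) = 1 - (1 - q_quarter)^4

quarterly_prob(0.2)                    # ≈ 0.0543
annualized_prob(quarterly_prob(0.2))   # ≈ 0.2 (round trip)
```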

The data were originally downloaded from https://www.ssa.gov/oact/NOTES/as120/LifeTables_Body.html. We use two sets of tables in Bell and Miller (2005) to derive the $\gamma_{a,t}$. Table 6 reports period probabilities of death by single year of age and gender in a given calendar year, at the turn of each decade. Table 7 reports cohort (annualized) probabilities of death by single year of age and gender for individuals born at the turn of each decade. The interpolation uses all available information from both sets of tables: for each age, it uses the mortality rates for all calendar periods available. Because a cohort born in year $y$ reaches age $a$ in year $y+a$, the spacing between data points in calendar time varies with age.

One difficulty is that the statistics are provided for each gender but not for the total population. Our approach is to interpolate the information by gender first, then aggregate it across genders. To do so, we assume that the ratio of male live births to female live births, which we call `mfratio`, is 1.05. This figure aligns with the historical time-series analysis of Mathews and Hamilton (2005), whose sample covers the period 1940 to 2002.
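In formula terms, the aggregate rate at each age and period is a survivor-weighted average, $\gamma_{a,t} = (w^m_{a,t}\gamma^m_{a,t} + w^f_{a,t}\gamma^f_{a,t})/(w^m_{a,t}+w^f_{a,t})$, where the male weights start from a birth cohort `mfratio` times as large as the female one. A minimal sketch for a single cell, with the hypothetical helper `aggregate_rate` and survivor counts passed in directly rather than built up period by period as the notebook does:

```julia
# Survivor-weighted average of male and female death rates at one age and period.
# l_m and l_f are survivors per 100,000 births of each gender; male survivors are
# scaled by the male-to-female birth ratio so the weights reflect population shares.
function aggregate_rate(γ_m::Real, γ_f::Real, l_m::Real, l_f::Real; mfratio::Real = 1.05)
    w_m = mfratio * l_m
    w_f = l_f
    return (w_m * γ_m + w_f * γ_f) / (w_m + w_f)
end

# With equal survivorship the average tilts slightly toward the male rate:
aggregate_rate(0.02, 0.01, 100_000.0, 100_000.0)   # = 3100 / 205000 ≈ 0.01512
```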

Beyond 2100, we only have partial mortality rate information, from cohorts born in 2100 or earlier. We assume that the mortality rates of later cohorts are the same as those of the 2100 cohort. Prior to 1900, we have no information on mortality rates by age; for simplicity, we assume that the 1900 cross-section mortality rates also apply in all previous periods.

Finally, the probability of death must be strictly less than 1 at every age to ensure that all of our data filters and programs run well. Otherwise, some matrix operations return NaN values due to invertibility issues, leading to errors during execution. The life tables for the early decades in the sample have death rate estimates of 100 percent for the oldest centenarians. The simple fix is to cap their probability of death at a high value strictly below 1; we choose 90 percent per year. This death rate is high enough that the cap has essentially no effect on the steady-state population distribution. Interpolated rates that the splines push below zero are likewise set to zero.
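The cap and the floor at zero together amount to clamping every interpolated rate to $[0, 0.9]$. A minimal sketch using Julia's built-in `clamp!` (the helper name `cap_rates!` is hypothetical), followed by a toy illustration of why a rate of exactly 1 is a problem: survival drops to zero, and a survivor-weighted average over groups with no survivors becomes $0/0$:

```julia
# Clamp interpolated annual death rates in place to [0, upper].
# NaN entries are left untouched, as with the masked assignments in the notebook.
cap_rates!(Γ::AbstractArray{<:AbstractFloat}; upper::Real = 0.9) =
    clamp!(Γ, zero(eltype(Γ)), upper)

toy_rates = [-0.01, 0.35, 1.0, NaN]
cap_rates!(toy_rates)            # toy_rates is now [0.0, 0.35, 0.9, NaN]

# With an uncapped rate of 1, survival hits zero at the next age ...
toy_survival = cumprod(1 .- [0.5, 1.0, 0.3])   # [0.5, 0.0, 0.0]
# ... and averaging two groups that both have zero survivors gives 0/0 = NaN.
toy_weighted = (toy_survival[3] * 0.3 + toy_survival[3] * 0.2) /
               (toy_survival[3] + toy_survival[3])
isnan(toy_weighted)              # true
```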


## References

Bell, Felicitie C., and Michael L. Miller (2005). "[Life Tables for the United States Social Security Area 1900-2100](https://www.ssa.gov/OACT/NOTES/pdf_studies/study120.pdf)," Actuarial Study No. 120, Social Security Administration.

Mathews, T.J., and Brady E. Hamilton (2005). "[Trend Analysis of the Sex Ratio at Birth in the United States](https://www.cdc.gov/nchs/data/nvsr/nvsr53/nvsr53_20.pdf)," National Vital Statistics Reports, vol. 53(20), National Center for Health Statistics.


In [None]:
# Setting the assumed ratio of male births to female births
mfratio = 1.05;

The first step is to read the data from the raw files. The following variables are used from each table:

$x$ = Age in years at the beginning of the period.

$q_x$ = Probability of dying between exact ages $x$ and $x+1$, given survival to exact age $x$.

$l_x$ = Number of persons surviving to exact age $x$ (or the number of persons reaching exact age $x$ during each year in stationary population) out of an initial cohort of 100,000 persons.
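The two columns are linked by the recursion $l_{x+1} = l_x(1-q_x)$ with $l_0 = 100{,}000$. A minimal sketch (hypothetical helper `lx_from_qx`) that rebuilds the survivor column from the death probabilities; it can serve as a consistency check on a table read from the raw files, with small discrepancies expected if the published $l_x$ are rounded:

```julia
# Rebuild survivors l_x out of an initial cohort (radix) from one-year death
# probabilities q_x, using l_{x+1} = l_x * (1 - q_x).
function lx_from_qx(qx::AbstractVector{<:Real}; radix::Real = 100_000.0)
    lx = Vector{Float64}(undef, length(qx))
    isempty(qx) && return lx
    lx[1] = radix
    for i in 2:length(qx)
        lx[i] = lx[i-1] * (1 - qx[i-1])
    end
    return lx
end

lx_from_qx([0.1, 0.5, 0.2])   # ≈ [100000.0, 90000.0, 45000.0]
```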

In [None]:
using CSV, DataFrames;
# Ages in years (first column of Table 6 for 1900); the year-of-age mid-points are age + 0.5
age = CSV.read("RawData/SSA_Life_cycle_table_6_1900.csv", DataFrame; header=false, skipto=3, delim=',')[:,1];
age = Array{Float64,1}(age);
age_years = age .+ 0.5;

# Reading the data
for y = 1900:10:2100
    # Extracting and organizing the information in Table 6 (period life table for calendar year y)
    global table_6 = CSV.read("RawData/SSA_Life_cycle_table_6_" * string(y) * ".csv", DataFrame; header=false, skipto=3);
    global table_6 = Array{Float64,2}(table_6[:,[1:7;9:15]]);
    global table_6_m = table_6[:,1:7];
    global table_6_f = table_6[:,8:14];
    # Males: probability of dying between exact ages x and x+1, given survival to exact age x
    eval(Meta.parse("Table_6_m_qx_" * string(y) * " = table_6_m[:,2]"));
    # Males: number surviving to exact age x out of an initial cohort of 100,000 males
    eval(Meta.parse("Table_6_m_lx_" * string(y) * " = table_6_m[:,3]"));
    # Females: probability of dying between exact ages x and x+1, given survival to exact age x
    eval(Meta.parse("Table_6_f_qx_" * string(y) * " = table_6_f[:,2]"));
    # Females: number surviving to exact age x out of an initial cohort of 100,000 females
    eval(Meta.parse("Table_6_f_lx_" * string(y) * " = table_6_f[:,3]"));

    # Extracting and organizing the information in Table 7 (cohort life table for individuals born in year y)
    global table_7 = CSV.read("RawData/SSA_Life_cycle_table_7_" * string(y) * ".csv", DataFrame; header=false, skipto=3);
    global table_7 = Array{Float64,2}(table_7[:, [1:7;9:15]]);
    global table_7_m = table_7[:,1:7];
    global table_7_f = table_7[:,8:14];
    # Same variables as in Table 6 (q_x and l_x by gender), now along the life of the cohort
    eval(Meta.parse("Table_7_m_qx_" * string(y) * " = table_7_m[:,2]"));
    eval(Meta.parse("Table_7_m_lx_" * string(y) * " = table_7_m[:,3]"));
    eval(Meta.parse("Table_7_f_qx_" * string(y) * " = table_7_f[:,2]"));
    eval(Meta.parse("Table_7_f_lx_" * string(y) * " = table_7_f[:,3]"));
end

# Collecting cross-section information from Table 6 and Table 7

The output files `interp_death_rate_1900_2220_Y.csv` and `interp_death_rate_1900_2220_Q.csv` contain mortality rates by calendar period in the cross section. The information in Table 6 is already organized this way, whereas the information in Table 7 is organized by cohort. The commands below record, for each year of age (the row variable), the calendar year to which each mortality rate corresponds. Duplicated (year, rate) pairs are then removed before the spline interpolation.
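For a given age $a$, the Table 6 entries sit at the decade years $1900, 1910, \dots, 2100$, while the Table 7 entry for the cohort born in decade $y$ sits at calendar year $y + a$. The sketch below (hypothetical helper `data_point_years`) lists the calendar years with data for one age, before the 1899 and 2220 end points are appended. Note that it deduplicates years only, whereas the notebook deduplicates (year, rate) pairs:

```julia
# Calendar years at which age `a` has a mortality observation:
# Table 6 (period tables) at the decade years, Table 7 (cohort tables) at birth year + a.
function data_point_years(a::Integer; decades = 1900:10:2100)
    table6_years = collect(decades)
    table7_years = collect(decades) .+ a
    return sort(unique(vcat(table6_years, table7_years)))
end

data_point_years(5)[1:4]      # [1900, 1905, 1910, 1915]: five-year spacing
length(data_point_years(10))  # 22: the tables share every decade from 1910 to 2100
```

For ages that are multiples of 10 the two tables land on the same calendar years, so the spacing stays at ten years; for other ages the cohort points fall between the decades.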

In [None]:
# Information from Table 6
Table_6_MR_m = NaN*ones(length(age),length(1900:10:2100)); 
Table_6_MR_f = NaN*ones(length(age),length(1900:10:2100)); 
Table_6_MR_percs_m = NaN*ones(length(age),length(1900:10:2100)); 
Table_6_MR_percs_f = NaN*ones(length(age),length(1900:10:2100)); 

data_years = 1900:10:2100
for i = 1:length(data_years)
    tmpyear=data_years[i];
    eval(Meta.parse("tmpdata = Table_6_m_qx_" * string(tmpyear) *"[:,1]"))
    Table_6_MR_m[:,i]=tmpdata;
    eval(Meta.parse("tmpdata = Table_6_f_qx_" * string(tmpyear) *"[:,1]"))
    Table_6_MR_f[:,i] = tmpdata
    Table_6_MR_percs_m[:,i] = 1.0*data_years[i]*ones(length(age),1)
    Table_6_MR_percs_f[:,i] = 1.0*data_years[i]*ones(length(age),1)
end

In [None]:
# Information from Table 7
Table_7_MR_m = NaN*ones(length(age),length(1900:10:2100)); 
Table_7_MR_f = NaN*ones(length(age),length(1900:10:2100)); 
Table_7_MR_percs_m = NaN*ones(length(age),length(1900:10:2100)); 
Table_7_MR_percs_f = NaN*ones(length(age),length(1900:10:2100)); 

for i = 1:length(data_years)
    tmpyear=data_years[i];
    eval(Meta.parse("tmpdata = Table_7_m_qx_" * string(tmpyear) *"[:,1]"))
    Table_7_MR_m[:,i]=tmpdata;
    eval(Meta.parse("tmpdata = Table_7_f_qx_" * string(tmpyear) *"[:,1]"))
    Table_7_MR_f[:,i] = tmpdata
    Table_7_MR_percs_m[:,i] = 1.0*data_years[i] .+ (0:(length(age)-1))
    Table_7_MR_percs_f[:,i] = 1.0*data_years[i] .+ (0:(length(age)-1))
end

The next step combines the mortality rate information in Table 6 and Table 7. The last column of Table 7 (the 2100 cohort) is also assigned to calendar year 2220, so that its rates are carried forward to later periods. Similarly, the first column of Table 6 (the 1900 cross section) is also assigned to 1899, so that it applies to earlier periods.

In [None]:
Table_MR_m = [Table_6_MR_m[:,1] Table_6_MR_m Table_7_MR_m Table_7_MR_m[:,end]];
Table_MR_f = [Table_6_MR_f[:,1] Table_6_MR_f Table_7_MR_f Table_7_MR_f[:,end]];
Table_MR_percs_m = [1899.0*ones(length(age),1) Table_6_MR_percs_m Table_7_MR_percs_m 2220.0*ones(length(age),1)];
Table_MR_percs_f = [1899.0*ones(length(age),1) Table_6_MR_percs_f Table_7_MR_percs_f 2220.0*ones(length(age),1)];

$\textbf{First interpolation: interpolating mortality rates for a given year of age across quarterly calendar periods.}$

Consistent with Census practice, we assume that the statistics for each calendar year refer to the middle of that year, so 0.5 is added to each data year before interpolating.
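The project's `spline_cubic.jl` is not reproduced here, and the meaning of its two trailing arguments is not documented in this notebook. Purely as an illustration of the technique, the sketch below implements a *natural* cubic spline (zero second derivative at both ends, an assumption that may differ from `spline_cubic`) and applies the mid-year convention by placing each calendar-year observation at $t + 0.5$ before evaluating at quarterly mid-points. The function name `natural_spline` and the toy rates are hypothetical:

```julia
using LinearAlgebra

# Natural cubic spline through (x, y), evaluated at xq. Points outside [x[1], x[end]]
# are extrapolated with the cubic of the nearest end interval.
function natural_spline(x::AbstractVector{<:Real}, y::AbstractVector{<:Real},
                        xq::AbstractVector{<:Real})
    n = length(x)
    @assert n >= 3 && length(y) == n && issorted(x) && allunique(x)
    xv = collect(Float64, x)
    yv = collect(Float64, y)
    h = diff(xv)
    # Second derivatives M: M[1] = M[n] = 0; interior values solve a symmetric
    # tridiagonal system (diagonally dominant, hence positive definite).
    M = zeros(n)
    dv = [2 * (h[i-1] + h[i]) for i in 2:n-1]
    ev = h[2:n-2]
    rhs = [6 * ((yv[i+1] - yv[i]) / h[i] - (yv[i] - yv[i-1]) / h[i-1]) for i in 2:n-1]
    M[2:n-1] = SymTridiagonal(dv, ev) \ rhs
    out = Vector{Float64}(undef, length(xq))
    for (j, t) in enumerate(xq)
        k = clamp(searchsortedlast(xv, t), 1, n - 1)   # interval [xv[k], xv[k+1]]
        a, b, hk = xv[k+1] - t, t - xv[k], h[k]
        out[j] = M[k] * a^3 / (6 * hk) + M[k+1] * b^3 / (6 * hk) +
                 (yv[k] / hk - M[k] * hk / 6) * a + (yv[k+1] / hk - M[k+1] * hk / 6) * b
    end
    return out
end

# Mid-year convention: an observation for calendar year t is placed at t + 0.5.
toy_years = [1900, 1910, 1920, 1930]
toy_rates = [0.010, 0.008, 0.007, 0.0065]     # made-up illustrative values
toy_quarters = 1900.625:0.25:1930.375         # quarter mid-points inside the data range
toy_interp = natural_spline(toy_years .+ 0.5, toy_rates, toy_quarters)
```

The spline passes through every data point, and evaluating it at both quarterly and yearly mid-points mirrors the two calls made for each age below.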

In [None]:
# Preparing to call cubic spline function
using Statistics
include("ordernorep.jl")
include("spline_cubic.jl")

# Calendar periods at which interpolated values are wanted (mid-points):
# quarters 1900Q1-2219Q4 and years 1900-2219
per_quarters = 1900.125:0.25:2219.875
per_years = 1900.5:1.0:2219.5

# Creating matrix that will contain mortality rates conditional on age 
# (in years) in the cross section for each quarterly period.
Γ_AGEy_PERq_m = ones(length(age), length(per_quarters)) * NaN
Γ_AGEy_PERq_f = ones(length(age), length(per_quarters)) * NaN
Γ_AGEy_PERy_m = ones(length(age), length(per_years)) * NaN
Γ_AGEy_PERy_f = ones(length(age), length(per_years)) * NaN

# Interpolating with all available information
for aa = 1:length(age)
    # Men
    tmp_order_m = sortperm(Table_MR_percs_m[aa,:])
    tmp_entries_m = [Table_MR_percs_m[aa,tmp_order_m] Table_MR_m[aa,tmp_order_m]]
    tmp_entries_m = unique(tmp_entries_m,dims=1)
    # Quarterly interpolation
    Γ_AGEy_PERq_m[aa,:] = spline_cubic(tmp_entries_m[:,1] .+0.5 ,tmp_entries_m[:,2] ,per_quarters,1,1)
    Γ_AGEy_PERq_m[aa,Γ_AGEy_PERq_m[aa,:] .< 0.0] .= 0.0
    Γ_AGEy_PERq_m[aa,Γ_AGEy_PERq_m[aa,:] .> 0.9] .= 0.9
    # Yearly interpolation
    Γ_AGEy_PERy_m[aa,:] = spline_cubic(tmp_entries_m[:,1] .+0.5 ,tmp_entries_m[:,2] ,per_years,1,1)
    Γ_AGEy_PERy_m[aa,Γ_AGEy_PERy_m[aa,:] .< 0.0] .= 0.0
    Γ_AGEy_PERy_m[aa,Γ_AGEy_PERy_m[aa,:] .> 0.9] .= 0.9
    
    # Women
    tmp_order_f = sortperm(Table_MR_percs_f[aa,:])
    tmp_entries_f = [Table_MR_percs_f[aa,tmp_order_f] Table_MR_f[aa,tmp_order_f]]
    tmp_entries_f = unique(tmp_entries_f,dims=1)
    # Quarterly interpolation
    Γ_AGEy_PERq_f[aa,:] = spline_cubic(tmp_entries_f[:,1] .+0.5 ,tmp_entries_f[:,2] ,per_quarters,1,1)
    Γ_AGEy_PERq_f[aa,Γ_AGEy_PERq_f[aa,:] .< 0.0] .= 0.0
    Γ_AGEy_PERq_f[aa,Γ_AGEy_PERq_f[aa,:] .> 0.9] .= 0.9
    # Yearly interpolation
    Γ_AGEy_PERy_f[aa,:] = spline_cubic(tmp_entries_f[:,1] .+0.5 ,tmp_entries_f[:,2] ,per_years,1,1)
    Γ_AGEy_PERy_f[aa,Γ_AGEy_PERy_f[aa,:] .< 0.0] .= 0.0
    Γ_AGEy_PERy_f[aa,Γ_AGEy_PERy_f[aa,:] .> 0.9] .= 0.9
end

In [None]:
# Creating aggregation weights across gender
wq_m = mfratio*100000.0*ones(length(age), length(per_quarters));
wq_f = 100000.0*ones(length(age), length(per_quarters))
wy_m = mfratio*100000.0*ones(length(age), length(per_years));
wy_f = 100000.0*ones(length(age), length(per_years))

# Weights in the first year (quarters 1-4 of 1900): survivors implied by the 1900 cross section
for qq = 1:4
    wq_m[2:end,qq] = mfratio*Table_6_m_lx_1900[1:end-1] .* (1.0 .- Γ_AGEy_PERq_m[1:end-1,qq]);
    wq_f[2:end,qq] = Table_6_f_lx_1900[1:end-1] .* (1.0 .- Γ_AGEy_PERq_f[1:end-1,qq]);
end
wy_m[2:end,1] = mfratio*Table_6_m_lx_1900[1:end-1] .* (1.0 .- Γ_AGEy_PERy_m[1:end-1,1]);
wy_f[2:end,1] = Table_6_f_lx_1900[1:end-1] .* (1.0 .- Γ_AGEy_PERy_f[1:end-1,1]);

# Weights in all subsequent periods: a cohort moves down one row (one year of age)
# every four quarters, so quarterly weights look back four columns, yearly weights one
for qq = 5:size(wq_m,2)
    wq_m[2:end,qq] = wq_m[1:end-1,qq-4] .* (1.0 .- Γ_AGEy_PERq_m[1:end-1,qq-4]);
    wq_f[2:end,qq] = wq_f[1:end-1,qq-4] .* (1.0 .- Γ_AGEy_PERq_f[1:end-1,qq-4]);
end
for yy = 2:size(wy_m,2)
    wy_m[2:end,yy] = wy_m[1:end-1,yy-1] .* (1.0 .- Γ_AGEy_PERy_m[1:end-1,yy-1]);
    wy_f[2:end,yy] = wy_f[1:end-1,yy-1] .* (1.0 .- Γ_AGEy_PERy_f[1:end-1,yy-1]);
end

# Aggregating mortality rates
Γ_AGEy_PERq = (wq_m.*Γ_AGEy_PERq_m .+ wq_f.*Γ_AGEy_PERq_f) ./ (wq_m + wq_f);
Γ_AGEy_PERy = (wy_m.*Γ_AGEy_PERy_m .+ wy_f.*Γ_AGEy_PERy_f) ./ (wy_m + wy_f);

# Overwriting NaN entries (0/0 where the survivor weights of both genders are zero)
# with the mortality rate of women
for aa = 1:length(age)
    nanq = isnan.(Γ_AGEy_PERq[aa,:])
    Γ_AGEy_PERq[aa,nanq] .= Γ_AGEy_PERq_f[aa,nanq];
    nany = isnan.(Γ_AGEy_PERy[aa,:])
    Γ_AGEy_PERy[aa,nany] .= Γ_AGEy_PERy_f[aa,nany];
end

$\textbf{Second interpolation: converting age in years to age in quarters.}$

For each quarterly period, this second interpolation is performed across ages in the cross section of mortality rates, converting rates by year of age into rates by quarter of age.

In [None]:
# Quarterly calendar periods and age in quarters
age_quarters = Array{Float64,1}(0.125:0.25:120)
Γ_AGEq_PERq = ones(length(age_quarters), length(per_quarters)) * NaN
for qq = 1:length(per_quarters)
    Γ_AGEq_PERq[:,qq] = spline_cubic(age_years,Γ_AGEy_PERq[:,qq],age_quarters)
    Γ_AGEq_PERq[Γ_AGEq_PERq[:,qq] .< 0.0, qq] .= 0.0
    Γ_AGEq_PERq[Γ_AGEq_PERq[:,qq] .>0.9, qq] .= 0.9   # Maximum annual death rate
end

In [None]:
# Cross section of quarterly mortality rates by age in the first quarter of 1900
death_rate_table6_1900 = Γ_AGEq_PERq[:,1];

In [None]:
# Writing the output to CSV files
using DelimitedFiles
writedlm("CleanData/interp_death_rate_1900_2220_Q.csv", Γ_AGEq_PERq, ',');
writedlm("CleanData/interp_death_rate_1900_2220_Y.csv", Γ_AGEy_PERy, ',');
writedlm("CleanData/age_Q.csv", age_quarters, ',');
writedlm("CleanData/age_Y.csv", age_years, ',');
writedlm("CleanData/interp_death_rate_table6_1900.csv", death_rate_table6_1900, ',');

In [None]:
# Changing permissions (rw-rw-r--) on the output files
for f in ["interp_death_rate_1900_2220_Q.csv", "interp_death_rate_1900_2220_Y.csv",
          "age_Q.csv", "age_Y.csv", "interp_death_rate_table6_1900.csv"]
    chmod(joinpath("CleanData", f), 0o664)
end

$\textbf{Computing life expectancy in cross section and at birth.}$

As a check on the output, this block of code computes life expectancy in the cross section and at birth, which can then be compared with the corresponding figures reported in Tables 6 and 7 of Bell and Miller (2005). The checks use the quarterly interpolated data.
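For annualized quarterly death rates $\gamma_k$ along quarters of age $k = 1, \dots, K$, the check computes survival $S_k = \prod_{j \le k} (1-\gamma_j)^{1/4}$, the probability of dying in quarter $k$, $d_k = S_{k-1} - S_k$ with $S_0 = 1$, and life expectancy $\sum_k \bar a_k d_k$, where $\bar a_k$ is the mid-point of quarter $k$. Cross-section life expectancy reads $\gamma$ down a column of the age-by-period matrix, while cohort life expectancy at birth reads it along a diagonal. A minimal sketch of both, with hypothetical function names and a toy matrix:

```julia
# Life expectancy from annualized quarterly death rates γ along ages with mid-points `ages`.
# Probability mass still alive after the last age is ignored, as in the check below.
function life_expectancy(γ::AbstractVector{<:Real}, ages::AbstractVector{<:Real})
    @assert length(γ) == length(ages)
    S = cumprod((1 .- γ) .^ 0.25)            # survival to the end of each age quarter
    d = [1 - S[1]; S[1:end-1] .- S[2:end]]   # probability of dying in each age quarter
    return sum(ages .* d)
end

# Death rates faced by the cohort born in period column q: row i is read in column
# q + i - 1, reusing the last period's rates once the diagonal runs past the final column.
function cohort_diagonal(Γ::AbstractMatrix{<:Real}, q::Integer)
    nper = size(Γ, 2)
    return [Γ[i, min(q + i - 1, nper)] for i in 1:size(Γ, 1)]
end

toy_ages = 0.125:0.25:1.875             # eight quarters of age
Γ_toy = vcat(zeros(7, 3), ones(1, 3))   # nobody dies before the last age quarter
life_expectancy(Γ_toy[:, 1], toy_ages)            # 1.875
life_expectancy(cohort_diagonal(Γ_toy, 2), toy_ages)
```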

In [None]:
using LinearAlgebra, Plots;

# A. Life expectancy using cross-section mortality rates
tmp1 = cumprod((1.0 .- Γ_AGEq_PERq).^(0.25),dims=1)
tmp2 = [1.0 .- tmp1[1:1,:] ; tmp1[1:end-1,:] .- tmp1[2:end,:]]
LE_CS = (age_quarters'*tmp2);

# B. Life expectancy at birth
LE_birth=zeros(size(Γ_AGEq_PERq,2))
nAge=size(age_quarters,1)
tmp_death_rate = [Γ_AGEq_PERq Γ_AGEq_PERq[:,end]*ones(1,nAge)]
for qq=1:size(Γ_AGEq_PERq,2)
    tmp0 = diag(tmp_death_rate[:,qq:(qq+nAge-1)]);
    tmp1 = cumprod((1.0 .- tmp0).^(0.25),dims=1)
    tmp2 = [1.0 .- tmp1[1:1,:] ; tmp1[1:end-1,:] .- tmp1[2:end,:]]
    LE_birth[qq:qq] = (age_quarters'*tmp2);
end

# Plotting quarters 1-800, i.e., 1900Q1 to 2099Q4
myper=1:800;
figLE_CS_Birth=plot(per_quarters[myper], [LE_CS[myper] LE_birth[myper]],
    xlabel = "Period",
    ylabel = "Years",
    title = "Life expectancy",
    label = ["In cross-section" "At birth"],
    color=[:black :red],    
    linestyle = [:dash :dot],
    linewidth = [4 4],
    legend =:bottomright
)
display(figLE_CS_Birth)

# Exporting data for inspection (same quarters as in the plot)
writedlm("CleanData/CHK_life_expectancy_1900_2100_Q.csv", [per_quarters[myper] LE_CS[myper] LE_birth[myper]], ',');
chmod("CleanData/CHK_life_expectancy_1900_2100_Q.csv", 0o664);