# Births

This file stitches together historical estimates and projections of live births in the United States to produce a quarterly series of live births for the 1900-2100 period. It uses information from four data sources:

1. Yearly intercensal population estimates of the number of children under 1 year of age from the U.S. Census Bureau.
2. Monthly data on the number of live births in the United States produced by the National Center for Health Statistics (NCHS) since January 1972.
3. Five-year birth projections from the United Nations (U.N.) Population Program.
4. Mortality rates by age and birth cohort from Bell and Miller (2005).

Before 1972, our series is based on the Census Bureau's intercensal population estimates, which we adjust for infant mortality using our interpolation of Bell and Miller's (2005) mortality rate tables. The Census population data are aggregated in our notebook "MortalityRates.ipynb", which lists all sources. For 1972 to 2014, we use yearly NCHS data directly. For 2015 onward, we use a projection of the U.N. data on the Census and NCHS samples (with five-year averages), which is consistent with NCHS data in recent years. The resulting stitched series mixes yearly and five-year frequencies. To obtain yearly and quarterly series over the full 1900-2100 period, we interpolate the stitched series using cubic splines. We impose no growth in the total number of births in the last 2.5 years of quarterly observations, consistent with the levelling off of the U.N. birth series. This assumption helps ensure the existence of an ergodic distribution of population with no net growth.
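To illustrate the interpolation step, here is a minimal sketch of a cubic spline with prescribed end slopes, evaluated on knots that mix yearly and five-year spacing. It is not the `spline_cubic` routine used in this notebook, whose arguments are not documented here. The function name `clamped_spline`, its keyword arguments, and the numbers are hypothetical, and setting both end slopes to zero is only one possible way to impose a flat series at the boundaries.

```julia
using LinearAlgebra  # standard library; provides Tridiagonal

# Cubic spline through (x, y) with prescribed first derivatives at both ends.
# Hypothetical helper for illustration; not the notebook's spline_cubic.
function clamped_spline(x, y, xq; slope_left=0.0, slope_right=0.0)
    n = length(x)
    h = diff(x)            # knot spacing (may be uneven)
    δ = diff(y) ./ h       # secant slopes between knots
    d = zeros(n)           # main diagonal of the linear system
    rhs = zeros(n)
    d[1] = 2 * h[1]
    rhs[1] = 6 * (δ[1] - slope_left)
    for i in 2:n-1
        d[i] = 2 * (h[i-1] + h[i])
        rhs[i] = 6 * (δ[i] - δ[i-1])
    end
    d[n] = 2 * h[n-1]
    rhs[n] = 6 * (slope_right - δ[n-1])
    # Second derivatives of the spline at the knots
    M = Tridiagonal(copy(h), d, copy(h)) \ rhs
    function evaluate(q)
        # Index of the interval containing q (end intervals are used outside the knots)
        i = clamp(searchsortedlast(x, q), 1, n - 1)
        a = x[i+1] - q
        b = q - x[i]
        return (M[i] * a^3 + M[i+1] * b^3) / (6 * h[i]) +
               (y[i] / h[i] - M[i] * h[i] / 6) * a +
               (y[i+1] / h[i] - M[i+1] * h[i] / 6) * b
    end
    return [evaluate(q) for q in xq]
end

# Made-up knots: three yearly points followed by two five-year midpoints
x = [2012.0, 2013.0, 2014.0, 2017.5, 2022.5]
y = [3.95, 3.93, 3.99, 4.05, 4.10]   # made-up births, in millions

perQ = 2012.125:0.25:2022.5          # quarterly evaluation grid
births_q = clamped_spline(x, y, perQ)
```

The spline passes through every knot, so the yearly and five-year observations are reproduced exactly and only the points in between are filled in.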

To derive yearly birth estimates from the yearly Census population figures, we initially assume that the number of quarterly births is constant between population measurements, which are dated July 1 of each calendar year. This identifying assumption is made to take into account high infant mortality in the first quarters of life. The steps in the underlying quarterly birth series are then smoothed out when we aggregate the data back to a yearly frequency and use splines to extract a smooth quarterly series.
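As a rough illustration of this step, the sketch below backs out a constant quarterly birth count from a single July 1 population figure. The function name `quarterly_births` and all numbers are hypothetical. The survival matrix is read as `s[a, t]`, the probability that an infant in its `a`-th quarter of life survives calendar quarter `t`, which is how the indexing in the code cell below appears to work.

```julia
# Population under age 1 at the end of quarter t is the sum of the four most
# recent quarterly birth cohorts, each scaled by its cumulative survival.
# With constant quarterly births b: pop = b * denom, so b = pop / denom.
function quarterly_births(pop_under1, s, t)
    denom = s[1, t] +                                     # born in quarter t
            s[1, t-1] * s[2, t] +                         # born in quarter t-1
            s[1, t-2] * s[2, t-1] * s[3, t] +             # born in quarter t-2
            s[1, t-3] * s[2, t-2] * s[3, t-1] * s[4, t]   # born in quarter t-3
    return pop_under1 / denom
end

# Made-up survival rates: mortality is highest in the first quarter of life
s = fill(0.99, 4, 8)
s[1, :] .= 0.97

# Round trip: build the population implied by a known birth count, then recover it
b_true = 1000.0
pop = b_true * (0.97 + 0.97 * 0.99 + 0.97 * 0.99^2 + 0.97 * 0.99^3)
b_hat = quarterly_births(pop, s, 6)
```

Without the mortality adjustment, dividing the population by four would understate births, since part of each cohort has already died by the measurement date.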

Our birth estimates based on the Census data do not adjust for infants who migrated to or from the United States in the period in which they were born. However, our Census-based estimates line up closely with yearly averages of the monthly NCHS data. Moreover, our treatment of net migration in the notebook "Migration.ipynb" resolves any discrepancy.

### Stitching the birth data

In [None]:
# Declaring the series of quarterly births
births_census_Q = fill(NaN, 800, 1)

# Reading the Census population data and mortality rates
using DelimitedFiles
population = readdlm("RawData/population_1900_2060_Total.csv", ',', header=false, skipstart=1)
death_rate = readdlm("CleanData/interp_death_rate_1900_2220_Q.csv", header=false, ',')
survival_rate = (1.0 .- death_rate).^(0.25);

# 1900:H1 (H = half; i.e., the first half of 1900)
# Here we assume that the mortality rates in 1900:Q1 prevailed in all preceding periods.
births_census_Q[1:2,1] .= population[1,2]/(
    survival_rate[1,2] +
    survival_rate[1,1]*survival_rate[2,2] +
    survival_rate[1,1]*survival_rate[2,1]*survival_rate[3,2] +
    survival_rate[1,1]*survival_rate[2,1]*survival_rate[3,1]*survival_rate[4,2])

# 1900:H2 to 2060:H1
for yy = 2:161
    t = 4*(yy-1) + 2  # index of the last quarter (Q2) before the July 1 measurement
    tmp = population[1,yy+1]/(
        survival_rate[1,t] +
        survival_rate[1,t-1]*survival_rate[2,t] +
        survival_rate[1,t-2]*survival_rate[2,t-1]*survival_rate[3,t] +
        survival_rate[1,t-3]*survival_rate[2,t-2]*survival_rate[3,t-1]*survival_rate[4,t])
    births_census_Q[(t-3):t,1] .= tmp
end
births_census_Y = sum(reshape(births_census_Q,4,200),dims=1)'

# Reading the yearly NCHS birth data
births_nchs_Y = readdlm("RawData/NCHS_births_yearly.csv", header=false, ',')
births_nchs_Y = births_nchs_Y[:,2]
births_nchs_years = 1972:2014

# Reading the five-year U.N. birth data
births_un_5Y = readdlm("RawData/UN_birth_projections.csv", header=false, ',')
births_un_5Y = births_un_5Y[:,2]
births_un_years = 1952.5:5:2100;

### Interpolating the birth data

In [None]:
# Aggregating the data
my_births = [births_census_Y[1:72]; births_nchs_Y; 1000/5*births_un_5Y[14:end,:]];
my_births_years = [(1900:2014); births_un_years[14:end]]

# Declaring periods of interpolation
perY = 1900.5:1:2100
perQ = 1900.125:.25:2100

# Interpolating
include("ordernorep.jl")
include("spline_cubic.jl")
using Statistics
births_Y = spline_cubic(my_births_years,my_births,perY,0,0)
births_Q = spline_cubic(my_births_years,my_births,perQ,0,0)

# Saving data to CSV files
writedlm("CleanData/births_Y.csv", births_Y, ',')
writedlm("CleanData/births_Q_annualized.csv", births_Q, ',')

In [None]:
# Changing permissions
run(`chmod 664 CleanData/births_Y.csv`);
run(`chmod 664 CleanData/births_Q_annualized.csv`);