# Population at the Turn of 1900

The distribution of population by age in the model evolves from period to period through births, deaths, and net migration, starting from an initial distribution of population at the turn of 1900. In the notebook "Migration.ipynb", we calculated this initial distribution by single year of age up to age 74 in a way that is consistent with the Census Bureau's (smoothed) annual population estimates and our interpolation of Bell and Miller's (2005) mortality rates. Smoothed annual population data by single year of age for persons aged 75 and older are not available for the first few decades of the 20th century. In this notebook, we use raw population counts by single year of age up to 99 years old from the 1900 Census to fill in nearly all of the missing data. In doing so, we make the identifying assumption that there is no net migration in the first half of 1900. This assumption is of limited consequence for our simulations: older individuals have almost no influence on the initial aggregate capital-labor ratio in the model, given their relatively small number and their limited labor and capital endowments. Moreover, we reconcile any discrepancy in the initial distribution of population through our net migration calculations, which are detailed in "Migration.ipynb".
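Under the no-net-migration assumption, the backcast simply inverts survival: people aged a at the later date were aged a-1 one year earlier and survived with probability 1 - q. The following is a minimal sketch of the annual version, with hypothetical toy counts and death rates; the function name is illustrative, and the death rate is taken at the later age bin, mirroring the indexing of the annual backcasting cell.

```julia
# Backcast a single-year age distribution by one year, assuming no net migration.
# pop_t[i] is the count at age i-1 in period t; q_t[i] is the death rate
# applied to that same bin (indexing mirrors the notebook's annual cell).
function backcast_one_year(pop_t::AbstractVector{<:Real}, q_t::AbstractVector{<:Real})
    n = length(pop_t)
    length(q_t) == n || throw(DimensionMismatch("age bins must match"))
    pop_prev = zeros(Float64, n)
    for i in 2:n
        # survivors aged i-1 in period t were aged i-2 in period t-1
        pop_prev[i-1] = pop_t[i] / (1.0 - q_t[i])
    end
    # the oldest bin in period t-1 cannot be recovered and stays at zero
    return pop_prev
end

pop_1900 = [100.0, 90.0, 80.0, 40.0]   # hypothetical counts, ages 0-3
q_1900   = [0.10, 0.10, 0.20, 0.50]    # hypothetical one-year death rates
pop_1899 = backcast_one_year(pop_1900, q_1900)   # ≈ [100.0, 100.0, 80.0, 0.0]
```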

We stitch the distribution based on raw Census counts to the pre-75 distribution recovered in "Migration.ipynb", and export the stitched distribution to a CSV file for use in the model. Before stitching, we smooth the raw counts by fitting a fourth-order polynomial to their logarithm, which also lets us extrapolate the distribution beyond age 99. We opt against using the raw Census counts for ages below 75 because there is apparent over-reporting of counts at ages ending in 0 or 5.
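One standard way to quantify the over-reporting at ages ending in 0 or 5 (age heaping) is Whipple's index: 100 times the population at ages 25, 30, ..., 60 divided by one fifth of the population aged 23 through 62. It is close to 100 without heaping and rises toward 500 when all reported ages end in 0 or 5. This diagnostic is not part of the notebook; the sketch below uses hypothetical data.

```julia
# Whipple's index of age heaping on ages ending in 0 or 5.
# counts[i] is the reported population at age i-1 (single years from age 0).
function whipple_index(counts::AbstractVector{<:Real})
    length(counts) >= 63 || throw(ArgumentError("need single-year counts through age 62"))
    pop_at(age) = counts[age + 1]                  # convert age to 1-based index
    on_0_or_5 = sum(pop_at(a) for a in 25:5:60)    # ages 25, 30, ..., 60
    all_ages  = sum(pop_at(a) for a in 23:62)      # ages 23 through 62
    return 100.0 * on_0_or_5 / (all_ages / 5.0)
end

flat = fill(1000.0, 100)        # hypothetical: no heaping at all
heaped = copy(flat)
for a in 25:5:60
    heaped[a + 1] *= 1.5        # hypothetical 50% excess at ages ending in 0 or 5
end
whipple_index(flat)             # → 100.0
```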

In addition to calculating a population distribution, this notebook calculates the accompanying parent and responsible-adult dependency structures. The age of the parent is set equal to the age of the mother plus half the age difference between men and women at the time of first marriage. We assume that children of parents younger than 18 are dependents of their grandparents.
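Because half the marriage age gap is generally not a whole number of years, the parent's age falls between two age bins, and the notebook splits the mass with linear weights. A minimal sketch of that split for one mother's age; the function name and inputs are hypothetical.

```julia
# Parent age = mother's age + half the marriage age gap, split between the two
# nearest integer age bins with linear-interpolation weights.
function split_parent_age(mother_age::Integer, age_gap::Real)
    half = age_gap / 2
    lo = mother_age + floor(Int, half)     # lower neighboring age bin
    w_hi = half - floor(half)              # fractional part goes to the next bin up
    return (lo => 1.0 - w_hi, lo + 1 => w_hi)
end

p = split_parent_age(25, 3.4)   # bins 26 and 27 with weights ≈ 0.3 and ≈ 0.7
```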

## Data sources

The raw Census counts for 1900 are found in the U.S. Census Bureau's "Census of Population and Housing, 1900", Volume 1: Population: Population of States and Territories, chapter 4. The information is available as a PDF document at https://www.census.gov/prod/www/decennial.html. We manually entered the information into the CSV file read by this notebook. The notebook also makes use of our interpolated mortality rates, which are computed in "MortalityRates.ipynb".


```julia
# Reading the raw 1900 Census data
using DelimitedFiles
population1900 = readdlm("RawData/Census_1900_raw_counts.csv",',',skipstart=1)
# Collapse rows 1-5 into a single first row labeled age 0 and drop the last row,
# leaving the 100 single-year rows used in the fit below
population1900[5,2:4]=sum(population1900[1:5,2:4],dims=1)
population1900=population1900[5:end-1,1:4]
population1900[1,1]=0;

# Fitting a 4th-order polynomial to the log of the distribution
# (single-year age midpoints 0.5, 1.5, ..., 99.5)
X = [ones(100,1) (0.5:100) (0.5:100).^2 (0.5:100).^3 (0.5:100).^4]
Y = Array{Float64,1}(population1900[:,4])
Y = log.(Y)
b = X \ Y  # least-squares solve via QR; numerically safer than inv(X'*X)*X'*Y

# Projecting the age distribution up to age 119, annually and quarterly
ageY = 0.5:120
ageQ = 0.125:0.25:120
Proj_pop_raw1900_Y = exp.([ones(length(ageY),1) ageY ageY.^2 ageY.^3 ageY.^4]*b);
Proj_pop_raw1900_Q = exp.([ones(length(ageQ),1) ageQ ageQ.^2 ageQ.^3 ageQ.^4]*b);
```

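The smoothing step is an ordinary least-squares fit of log counts on a quartic in age, followed by exponentiation to project to older ages. The sketch below checks the idea on synthetic counts generated from a known log-quartic. As an assumption of this sketch only, ages are rescaled to age/100 to keep the design matrix well conditioned; this changes the coefficients but not the fitted values. The helper names and the coefficient vector are hypothetical.

```julia
using LinearAlgebra

# Design matrix with columns 1, t, t^2, t^3, t^4, where t = age / 100
quartic_design(age::AbstractVector{<:Real}) = hcat([(age ./ 100) .^ k for k in 0:4]...)

# Least-squares fit of log(counts) on the quartic (QR via backslash)
fit_log_quartic(age, counts) = quartic_design(age) \ log.(counts)

# Fitted (or extrapolated) counts at the given ages
project(b, age) = exp.(quartic_design(age) * b)

age_obs = collect(0.5:1.0:99.5)              # 100 single-year midpoints
b_true  = [12.0, -1.5, -2.0, 0.5, -0.3]      # hypothetical coefficients
counts  = project(b_true, age_obs)           # synthetic counts, exactly log-quartic
b_hat   = fit_log_quartic(age_obs, counts)
tail    = project(b_hat, collect(100.5:1.0:119.5))   # extrapolated ages 100-119
```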
```julia
# Backcasting the distribution in 1899 from yearly mortality rates
# We know from the smoothed Census that the population aged 75+ years is 899,000 in
# 1900. Below we adjust the level of the projected series to match that figure,
# thus ensuring a smooth stitching point.
Proj_pop_raw1900_Y = Proj_pop_raw1900_Y*899000/sum(Proj_pop_raw1900_Y[76:end])
Γ_AGEy_PERy = readdlm("CleanData/interp_death_rate_1900_2220_Y.csv", ',');
population_1899_raw_Y = [Proj_pop_raw1900_Y[2:end,1]./(1.0.-Γ_AGEy_PERy[2:end,1]) ; 0];
```

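The level adjustment multiplies the whole projected profile by one constant so that the mass at or above a cutoff bin equals an external benchmark (899,000 persons aged 75 and older in the annual cell, where bin 76 is the first bin aged 75+). The shape of the profile is unchanged. A minimal sketch with a hypothetical four-bin profile; the function name is illustrative.

```julia
# Rescale an age profile so that the total from bin first_idx onward equals target.
function rescale_tail(profile::AbstractVector{<:Real}, first_idx::Integer, target::Real)
    tail_sum = sum(@view profile[first_idx:end])
    tail_sum > 0 || throw(ArgumentError("tail must have positive mass"))
    return profile .* (target / tail_sum)   # one common factor preserves the shape
end

profile = [50.0, 30.0, 15.0, 5.0]           # hypothetical profile, four age bins
scaled  = rescale_tail(profile, 3, 899_000) # bins 3-4 now sum to 899,000
```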
```julia
# Backcasting the distribution in 1899 from quarterly mortality rates
# (quarterly bins are scaled to 4 times the annual total, i.e. annualized)
Proj_pop_raw1900_Q=Proj_pop_raw1900_Q./sum(Proj_pop_raw1900_Q[301:end],dims=1)*(899000.0*4.0)
Γ_AGEq_PERq = readdlm("CleanData/interp_death_rate_1900_2220_Q.csv", ',');
population_1899_raw_Q = zeros(480,1)
# Shift back two quarters; survival over each quarter is (1 - rate)^0.25
population_1899_raw_Q[1:398]=Proj_pop_raw1900_Q[3:400]./((1.0.-Γ_AGEq_PERq[3:400,2]).^0.25)./((1.0.-Γ_AGEq_PERq[2:399,1]).^0.25)
population_1899_raw_Q=population_1899_raw_Q/4.0 # Removing annualization

# Saving the distribution based on raw Census counts
writedlm("CleanData/population_1899_raw_Y.csv",[ageY population_1899_raw_Y], header = false, ',')
writedlm("CleanData/population_1899_raw_Q.csv",[ageQ population_1899_raw_Q], header = false, ',')
```

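In the quarterly cell, each quarter's survival is computed as (1 - rate)^0.25, which is consistent with the quarterly death rates being stored at an annual frequency (an interpretation, not something stated in the source). Four quarters then compound back to the annual survival 1 - q, and backcasting two quarters divides by two quarterly survival factors. A minimal sketch with a hypothetical rate; the function names are illustrative.

```julia
# Survival over one quarter when the death rate is expressed at an annual frequency
quarter_survival(q_annual::Real) = (1.0 - q_annual)^0.25

# Backcast a count two quarters, crossing one quarter at each rate
backcast_two_quarters(pop_now::Real, q_recent::Real, q_earlier::Real) =
    pop_now / quarter_survival(q_recent) / quarter_survival(q_earlier)

q = 0.08                                          # hypothetical annual death rate
year_survival = quarter_survival(q)^4             # compounds back to 1 - q
pop_half_year_ago = backcast_two_quarters(1000.0, q, q)
```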
```julia
# Combining smoothed and raw estimates of the population at the turn of 1900
population_1899_smoothed_Y = readdlm("CleanData/population_1899_smoothed_Y.csv", ',');
population_1899_smoothed_Q = readdlm("CleanData/population_1899_smoothed_Q.csv", ',');
population_1899_Y = [population_1899_smoothed_Y[1:73,2] ; population_1899_raw_Y[74:end]]
population_1899_Q = [population_1899_smoothed_Q[1:297,2]; population_1899_raw_Q[298:end]]

# Saving the stitched distribution based on raw Census counts
writedlm("CleanData/population_1899_Y.csv",population_1899_Y, header = false, ',')
writedlm("CleanData/population_1899_Q.csv",population_1899_Q, header = false, ',')
```

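The stitching step concatenates the smoothed series below a cutoff bin with the raw-count series from the cutoff onward (the notebook uses bin 74 for the annual series and bin 298 for the quarterly series). A minimal sketch with hypothetical short vectors; the function name is illustrative.

```julia
# Take smoothed[1:cutoff-1] and raw[cutoff:end]; both use the same age bins.
function stitch(smoothed::AbstractVector, raw::AbstractVector, cutoff::Integer)
    2 <= cutoff <= length(raw) || throw(ArgumentError("cutoff out of range"))
    cutoff - 1 <= length(smoothed) || throw(ArgumentError("smoothed series too short"))
    return vcat(smoothed[1:cutoff-1], raw[cutoff:end])
end

smoothed = [10.0, 9.0, 8.0]            # hypothetical: covers young ages only
raw      = [11.0, 9.5, 7.5, 4.0, 1.0]  # hypothetical: covers all ages
stitched = stitch(smoothed, raw, 3)    # → [10.0, 9.0, 7.5, 4.0, 1.0]
```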
## Estimating the family dependency structure

```julia
# Reading the share of births by age of mother and age difference between married men and women
share_births_1900 = readdlm("CleanData/share_births_mothers_Y.csv", header = false, ',')[:,1];
share_births_1900Q1 = readdlm("CleanData/share_births_mothers_Q.csv", header = false, ',')[:,1];
marriageAgeDiff_1900 = readdlm("CleanData/marriageAgeDiff_Y.csv", header = false, ',')[1,1];
marriageAgeDiff_1900Q1 = readdlm("CleanData/marriageAgeDiff_Q.csv", header = false, ',')[1,1];

# Allocating children to biological mothers
Mother_child_1899_Y = zeros(120,18)
Mother_child_1899_Q = zeros(480,72)
for ak = 1:18
    Mother_child_1899_Y[:,ak] = [zeros(14+ak-1,1) ; share_births_1900 ; zeros(120-14-ak+1-length(share_births_1900),)]*population_1899_Y[ak,1]
end
for ak = 1:72
    Mother_child_1899_Q[:,ak] = [zeros(56+ak-1,1) ; share_births_1900Q1 ; zeros(480-56-ak+1-length(share_births_1900Q1),)]*population_1899_Q[ak,1]
end

# Allocating children to responsible adults (mother's age plus half the age difference
# between married men and women). Since that age generally falls between two age bins,
# dependents are split between the two nearest bins with linear-interpolation weights.
Parent_child_1899_Y = zeros(120,18);
Parent_child_1899_Q = zeros(480,72);
for ak = 1:18
    iparent = (14:49) .+ ak .+ Int64(floor(marriageAgeDiff_1900/2.0))
    Parent_child_1899_Y[iparent,ak] = (1.0 - marriageAgeDiff_1900/2.0 + floor(marriageAgeDiff_1900/2.0))*Mother_child_1899_Y[(14:49).+ak,ak]
    Parent_child_1899_Y[iparent.+1,ak] = Parent_child_1899_Y[iparent.+1,ak]+(marriageAgeDiff_1900/2.0-floor(marriageAgeDiff_1900/2.0))*Mother_child_1899_Y[(14:49).+ak,ak]
end
for ak = 1:72
    iparent = (56:199) .+ ak .+ Int64(floor(2.0*marriageAgeDiff_1900Q1))
    Parent_child_1899_Q[iparent,ak] = (1.0.-(2.0*marriageAgeDiff_1900Q1-floor(2.0*marriageAgeDiff_1900Q1)))*Mother_child_1899_Q[(56:199).+ak,ak]
    Parent_child_1899_Q[iparent.+1,ak] = Parent_child_1899_Q[iparent.+1,ak] + (2.0*marriageAgeDiff_1900Q1-floor(2.0*marriageAgeDiff_1900Q1))*Mother_child_1899_Q[(56:199).+ak,ak]
end

writedlm("CleanData/parent_child_1899_Y.csv",Parent_child_1899_Y, header = false, ',')
writedlm("CleanData/parent_child_1899_Q.csv",Parent_child_1899_Q, header = false, ',')
```

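The allocation of children to mothers shifts the share of births by mother's age at birth by the child's current age: a mother who gave birth at age m to a child now aged k is now aged m + k. A minimal sketch for a single child age, assuming (hypothetically) that index i of the output is age i-1 and that `share_at_birth[j]` is the share of births to mothers aged `first_age + j - 1`; with `first_age = 14` this matches the annual cell's offset.

```julia
# Distribute n_children of age child_age across their mothers' current ages.
function mothers_of_children(n_children::Real, child_age::Integer,
                             share_at_birth::AbstractVector{<:Real},
                             first_age::Integer, n_ages::Integer)
    out = zeros(Float64, n_ages)
    for (j, s) in enumerate(share_at_birth)
        mother_age = first_age + j - 1 + child_age   # mother has aged with the child
        idx = mother_age + 1                          # age -> 1-based index
        idx <= n_ages && (out[idx] += s * n_children)
    end
    return out
end

shares = [0.2, 0.5, 0.3]                      # hypothetical: mothers aged 20-22 at birth
m = mothers_of_children(1000.0, 2, shares, 20, 30)   # mothers now aged 22-24
```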
```julia
# Reallocating children of under-18 parents to the adults responsible for those parents
Dependents_1899_Y = zeros(120,18);
Dependents_1899_Q = zeros(480,72);
Dependents_1899_Y[19:end,:] = Parent_child_1899_Y[19:end,:]
Dependents_1899_Q[73:end,:] = Parent_child_1899_Q[73:end,:];

for am = 15:18
    for ak=1:(am-14)
        Dependents_1899_Y[19:end,ak] = Dependents_1899_Y[19:end,ak] + Parent_child_1899_Y[am,ak]*(Parent_child_1899_Y[19:end,am]./sum(Parent_child_1899_Y[19:end,am]))
    end
end
for am = 57:72
    for ak=1:(am-56)
        Dependents_1899_Q[73:end,ak] = Dependents_1899_Q[73:end,ak] + Parent_child_1899_Q[am,ak]*(Parent_child_1899_Q[73:end,am]./sum(Parent_child_1899_Q[73:end,am]))
    end
end

# Writing to CSV files
writedlm("CleanData/dependents_1899_Y.csv",Dependents_1899_Y, header = false, ',')
writedlm("CleanData/dependents_1899_Q.csv",Dependents_1899_Q, header = false, ',')
```

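The reallocation moves each child of an under-18 parent to the adults responsible for that parent, in proportion to how the parent's own age cohort is spread across adult ages, so the total number of dependents is preserved. The sketch below is a generalized version that visits every nonzero cell rather than the notebook's explicit loop bounds; the function name and toy matrix are hypothetical. Row r is the adult's age r-1 and column c the child's age c-1, as in the annual arrays. If a teen cohort has no adult mass, this sketch skips it.

```julia
# parent_child[r, c]: children aged c-1 attributed to adults aged r-1.
# Rows adult_row:end are adults; earlier rows are under-age parents.
function reassign_teen_parents(parent_child::AbstractMatrix{<:Real}, adult_row::Integer)
    nrows, ncols = size(parent_child)
    deps = zeros(Float64, nrows, ncols)
    deps[adult_row:end, :] .= parent_child[adult_row:end, :]
    for teen in 1:min(adult_row - 1, ncols)       # teen's age is also a child column
        weights = parent_child[adult_row:end, teen]   # adults responsible for the teen
        total = sum(weights)
        total > 0 || continue                     # no adult mass: skipped in this sketch
        for kid in 1:ncols
            n = parent_child[teen, kid]
            n == 0 && continue
            deps[adult_row:end, kid] .+= n .* (weights ./ total)
        end
    end
    return deps
end

pc = zeros(6, 3)      # toy: ages 0-5, adults are rows 4:6 (ages 3-5)
pc[3, 1] = 10.0       # 10 children aged 0 whose parent is aged 2 (under age)
pc[5, 3] = 4.0        # 4 children aged 2 with adults aged 4
pc[6, 3] = 6.0        # 6 children aged 2 with adults aged 5
deps = reassign_teen_parents(pc, 4)
```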
```julia
# Changing permissions of the exported files (rw-rw-r--)
for f in ["population_1899_raw_Y.csv", "population_1899_raw_Q.csv",
          "population_1899_Y.csv", "population_1899_Q.csv",
          "parent_child_1899_Y.csv", "parent_child_1899_Q.csv",
          "dependents_1899_Y.csv", "dependents_1899_Q.csv"]
    chmod(joinpath("CleanData", f), 0o664)
end
```