### Standard Dataframe

This notebook runs the standard dataframe pipeline to produce a single dataframe from the neighborhood-level data, which is then saved to the "data" directory of the repository. The code also produces maps of each city at the census tract level and at the neighborhood level. These plots are saved to the "visualizations" directory of the repository.

While the code runs, it checks whether any neighborhoods are lost in the merging process. It also prints the total population of the census tracts before the merge and the total population after the merge. The two totals differ when census tracts that overlap multiple neighborhoods are counted once for each neighborhood they overlap. This double counting could skew results, so we process the data further to limit its effect.
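The population check described above can be illustrated with a minimal sketch. It is not the code in "lib"; the rows, tract ids, neighborhood names, and the `population_check` helper are all hypothetical, and plain Python tuples stand in for the merged dataframe.

```python
from collections import Counter

# Hypothetical merged rows: (tract_id, neighborhood, tract_population).
# A tract that overlaps two neighborhoods appears once per neighborhood.
merged_rows = [
    ("T1", "Downtown", 1000),
    ("T2", "Downtown", 2000),
    ("T2", "Riverside", 2000),  # T2 overlaps both neighborhoods
    ("T3", "Riverside", 1500),
]


def population_check(rows):
    """Return (population before merge, population after merge, duplicated tract ids)."""
    # Before the merge, each tract contributes its population exactly once.
    tract_population = {tract: pop for tract, _, pop in rows}
    before = sum(tract_population.values())

    # After the merge, every (tract, neighborhood) row contributes its population.
    after = sum(pop for _, _, pop in rows)

    # Tracts that appear in more than one row are the double-counted ones.
    counts = Counter(tract for tract, _, _ in rows)
    duplicated = sorted(tract for tract, n in counts.items() if n > 1)
    return before, after, duplicated


before, after, duplicated = population_check(merged_rows)
print(f"Population before merge: {before}")
print(f"Population after merge: {after}")
print(f"Tracts counted more than once: {duplicated}")
```

In this toy input the "after" total exceeds the "before" total by exactly the population of the tract that is matched to two neighborhoods.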

This code produces both "clean" plots and plots with data that has not been cleaned. "Cleaned" data refers to data that has been processed to remove some of the census tracts that overlap multiple neighborhoods, so that their results are not duplicated. At the moment, the code checks whether at least 40% of a census tract is contained in a neighborhood. If so, the census tract is mapped to that neighborhood during the data merge.
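The 40% rule can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: axis-aligned rectangles `(xmin, ymin, xmax, ymax)` stand in for real tract and neighborhood polygons, and the names `assign_tracts`, `rect_area`, `overlap_area`, and `MIN_OVERLAP_SHARE` are hypothetical.

```python
# Assumption: the threshold is applied to the share of the tract's own area
# that falls inside the neighborhood.
MIN_OVERLAP_SHARE = 0.40


def rect_area(rect):
    """Area of an axis-aligned rectangle; empty rectangles have area 0."""
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def overlap_area(a, b):
    """Area of the intersection of two axis-aligned rectangles."""
    intersection = (max(a[0], b[0]), max(a[1], b[1]),
                    min(a[2], b[2]), min(a[3], b[3]))
    return rect_area(intersection)


def assign_tracts(tracts, neighborhoods, min_share=MIN_OVERLAP_SHARE):
    """Map each tract to every neighborhood containing at least min_share of it."""
    assignments = []
    for tract_id, tract_rect in tracts.items():
        tract_area = rect_area(tract_rect)
        for name, hood_rect in neighborhoods.items():
            share = overlap_area(tract_rect, hood_rect) / tract_area
            if share >= min_share:
                assignments.append((tract_id, name, round(share, 2)))
    return assignments


# Hypothetical geometry: two neighborhoods side by side, three tracts.
neighborhoods = {"West": (0, 0, 10, 10), "East": (10, 0, 20, 10)}
tracts = {
    "T1": (2, 2, 6, 6),    # entirely inside West
    "T2": (8, 0, 12, 10),  # split evenly between West and East
    "T3": (9, 0, 14, 10),  # 20% in West, 80% in East
}

assignments = assign_tracts(tracts, neighborhoods)
for row in assignments:
    print(row)
```

In this sketch, T3 is mapped only to East because just 20% of it lies in West. T2 is still mapped to both neighborhoods: with a threshold below 50%, a tract can satisfy the rule for two neighborhoods at once, so some double counting can remain after cleaning.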

In the future, a comparison should be made between the "cleaned" data and the pre-processed data to determine whether the extensive processing of the data is necessary.

The complete code that we call here can be found in the "lib" and "data_pipeline" directories of the repository.

# import statements
import geopandas
import contextily as cx
import warnings
import pandas as pd
import math
import numpy as np
import matplotlib.pyplot as plt
from statistics import mean
import seaborn as sns
warnings.filterwarnings('ignore')

In [1]:
import os
import sys
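# Point the first entry of sys.path at the parent directory so that the
# data_pipeline and lib packages imported below can be found.
sys.path[0] = os.path.join(os.path.abspath(''),'..')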

In [2]:
import data_pipeline.spatial_operations as so

In [3]:
import lib.standard_neighborhood_dataframe as sd_n
import lib.standard_censustract_dataframe as sd_ct

### Generate Standard Neighborhood Dataframe and Visualizations

In [4]:
# run the code
#sd_n.generate_dataframe_and_plots()

### Generate Standard Census Tract Dataframe and Visualizations

In [None]:
sd_ct.generate_dataframe_and_plots()

Running austin, 0 of 20
Population before merge: 1206225.0
Population after merge: 1206225.0


Running baltimore, 1 of 20
Population before merge: 1097758.0
Population after merge: 1097758.0


Running boston, 2 of 20
Population before merge: 840601.0
Population after merge: 840601.0


Running chicago, 3 of 20
Population before merge: 3057323.0
Population after merge: 3057323.0


Running dallas, 4 of 20
Population before merge: 1840889.0
Population after merge: 1840889.0


Running denver, 5 of 20
Population before merge: 949630.0
Population after merge: 1045730.0


Running detroit, 6 of 20
Population before merge: 896914.0
Population after merge: 896914.0


Running el-paso, 7 of 20
Population before merge: 773652.0
Population after merge: 773652.0


Running houston, 8 of 20
Population before merge: 3526303.0
Population after merge: 3526303.0


Running indianapolis, 9 of 20
