# Adding school characteristics to the combined surveys dataset

This notebook joins three sets of school characteristics (geography and school type, class size, and demographics) to the combined surveys dataset created in `processing/01_combine_surveys.ipynb`.

## Import Python libraries and set working directories

In [1]:
import os
import feather
import numpy as np
import pandas as pd

In [2]:
# The data folders sit one level above the current working directory
project_dir = os.path.dirname(os.getcwd())
input_dir = os.path.join(project_dir, 'data', 'input')
intermediate_dir = os.path.join(project_dir, 'data', 'intermediate')
output_dir = os.path.join(project_dir, 'data', 'output')

## Run notebooks to load school characteristics data

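Run the processing notebooks that build the three school characteristics dataframes. The `%%capture` magic suppresses their output, and the load step that follows expects each dataframe to be available as a feather file in `data/intermediate`: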

In [3]:
%%capture
%run schools_geography_type.ipynb
%run schools_class_size.ipynb
%run schools_demographics.ipynb

## Load data

In [4]:
surveys = pd.read_feather(os.path.join(intermediate_dir, 'combined_surveys.feather'))
geography = pd.read_feather(os.path.join(intermediate_dir, 'df_locations.feather'))
class_size = pd.read_feather(os.path.join(intermediate_dir, 'df_class_size.feather'))
demographics = pd.read_feather(os.path.join(intermediate_dir, 'df_demographics.feather'))

## Merge data

In [5]:
# Drop geography's school_name so the merge does not add a second name column
merge1 = pd.merge(surveys, 
                  geography.drop(['school_name'], axis = 1), 
                  on = 'dbn', how = 'left')
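A left merge keeps every survey row only if `dbn` is unique in the right-hand table; duplicated keys would silently multiply rows. The sketch below shows one way to guard against that with the `validate` argument of `pd.merge`. The toy dataframes, their values, and the `borough` and `year` columns are hypothetical stand-ins, not the notebook's actual data.

```python
import pandas as pd

# Hypothetical stand-ins for the surveys and geography dataframes.
surveys_toy = pd.DataFrame({
    'dbn': ['01M015', '01M015', '01M019'],
    'year': [2015, 2016, 2015],
})
geography_toy = pd.DataFrame({
    'dbn': ['01M015', '01M019'],
    'borough': ['Manhattan', 'Manhattan'],
})

# validate='many_to_one' raises pandas.errors.MergeError if 'dbn' is
# duplicated in the right-hand table, instead of duplicating left rows.
merged_toy = pd.merge(surveys_toy, geography_toy, on='dbn',
                      how='left', validate='many_to_one')

# With unique right-hand keys, a left merge preserves the left row count.
rows_preserved = len(merged_toy) == len(surveys_toy)
```

The same argument could be passed to each of the merges in this section if the characteristics tables are expected to hold one row per school.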

In [6]:
# Charter schools do not have class size information, so after this left
# merge their rows are flagged 'left_only' in the _mergesize column
merge2 = pd.merge(merge1, class_size, on = 'dbn', 
                  how = 'left', indicator = '_mergesize')

In [7]:
merge3 = pd.merge(merge2, demographics, on = 'dbn', how = 'left', indicator = '_mergedemo')
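The `indicator` argument adds a categorical column recording whether each row was found in both tables or only in the left one. As a minimal sketch of how those columns might be inspected, the example below uses toy data; the DBN values, the `score` and `avg_class_size` columns, and the figures are hypothetical.

```python
import pandas as pd

# Hypothetical data: the second school has no class size record.
left_toy = pd.DataFrame({'dbn': ['01M015', '84K123'], 'score': [7.1, 8.3]})
right_toy = pd.DataFrame({'dbn': ['01M015'], 'avg_class_size': [24.5]})

merged_toy = pd.merge(left_toy, right_toy, on='dbn',
                      how='left', indicator='_mergesize')

# Count matched ('both') and unmatched ('left_only') rows.
match_counts = merged_toy['_mergesize'].value_counts()

# List the schools that found no match in the right-hand table.
unmatched_dbns = merged_toy.loc[
    merged_toy['_mergesize'] == 'left_only', 'dbn'
].tolist()
```

Running the same `value_counts()` call on `merge3['_mergesize']` and `merge3['_mergedemo']` would show how many schools lack class size or demographic records.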

In [8]:
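# Drop the duplicate name columns: school_name_y came from the class size merge
# and the unsuffixed school_name from the demographics merge
merge3.drop(['school_name', 'school_name_y'], axis = 1, inplace = True)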

In [9]:
merge3.rename(columns = {'school_name_x':'school_name'}, inplace = True)
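The column names dropped and renamed above follow from how `pd.merge` handles overlapping non-key columns: both copies get the default `_x` and `_y` suffixes, while a name that no longer collides is kept unsuffixed. The sketch below reproduces that on toy data, assuming (as the drop and rename steps suggest) that the class size and demographics tables each carry their own `school_name` column. All values and the `avg_class_size` and `total_enrollment` columns are hypothetical.

```python
import pandas as pd

# Hypothetical one-row tables that each carry a school_name column.
surveys_toy = pd.DataFrame({'dbn': ['01M015'],
                            'school_name': ['Survey name']})
class_size_toy = pd.DataFrame({'dbn': ['01M015'],
                               'school_name': ['Class size name'],
                               'avg_class_size': [24.5]})
demographics_toy = pd.DataFrame({'dbn': ['01M015'],
                                 'school_name': ['Demographics name'],
                                 'total_enrollment': [180]})

# Both sides have school_name, so pandas emits school_name_x / school_name_y.
step2 = pd.merge(surveys_toy, class_size_toy, on='dbn', how='left')

# step2 no longer has a plain school_name column, so the demographics
# copy arrives unsuffixed.
step3 = pd.merge(step2, demographics_toy, on='dbn', how='left')

# Keep only the survey's name, restored to its original column name.
cleaned = (step3
           .drop(columns=['school_name', 'school_name_y'])
           .rename(columns={'school_name_x': 'school_name'}))
```

Dropping `school_name` from the right-hand table before each merge, as is done for `geography`, would avoid the suffixes altogether.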

## Save data

Save the `merge3` dataframe, which holds the cleaned data, to a [feather](https://blog.cloudera.com/blog/2016/03/feather-a-fast-on-disk-format-for-data-frames-for-r-and-python-powered-by-apache-arrow/) file in the `data/output` folder, and also as a CSV file.

In [10]:
merge3.to_feather(os.path.join(output_dir, 'combined_data.feather'))
merge3.to_csv(os.path.join(output_dir, 'combined_data.csv'), index = False)