# School geography and type

This notebook loads data from the 2013-2014 School Locations [file](https://data.cityofnewyork.us/Education/2013-2014-School-Locations/ac4n-c5re) and extracts each school's latitude, longitude, and other characteristics.

## Import Python libraries and set working directories

In [2]:
import os

import numpy as np
import pandas as pd  # DataFrame.to_feather requires the pyarrow package

In [3]:
# All data folders live under 'data' in the parent of the current working directory
data_dir = os.path.join(os.path.dirname(os.getcwd()), 'data')
input_dir = os.path.join(data_dir, 'input')
intermediate_dir = os.path.join(data_dir, 'intermediate')
output_dir = os.path.join(data_dir, 'output')

## Load data and select relevant variables

In [29]:
df_locations = pd.read_csv(
    os.path.join(input_dir, '2013_-_2014_School_Locations.csv'),
    dtype=str,  # read every column as text so codes such as the DBN keep leading zeros
)

df_locations.columns = df_locations.columns.str.lower()

For the NYC schools in our dataset, the `ATS` system code is identical to the `DBN`, so we can use this file to link each school's `DBN` to its `BEDS` code. For more information on the ATS and DBN systems, see NYCDOE's report, ["A collection of Acronyms and Jargon"](http://schools.nyc.gov/NR/rdonlyres/0C56D9B8-8DDB-46B8-AA5F-BF77AE8C2803/0/ACRONYMReferenceGuide.pdf) (last updated February 7, 2018).
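As a sketch of that linking step, the example below assumes a lookup table with one row per school holding its `dbn` and a `beds` code, plus a second, hypothetical dataset keyed by `beds`. The column names and every value are illustrative placeholders, not taken from the School Locations file. A left merge attaches a `DBN` to each row; `validate='many_to_one'` makes pandas raise if a `BEDS` code appears more than once in the lookup, and `indicator=True` adds a `_merge` column that flags rows without a match.

```python
import pandas as pd

# Hypothetical lookup table: one row per school (placeholder codes, not real data)
lookup = pd.DataFrame({
    'dbn': ['01M015', '02M001'],
    'beds': ['BEDS_A', 'BEDS_B'],
})

# Hypothetical dataset keyed by BEDS that we want to label with DBNs
outcomes = pd.DataFrame({
    'beds': ['BEDS_B', 'BEDS_A', 'BEDS_C'],
    'value': [10, 20, 30],
})

# Left merge keeps every outcome row; validate raises MergeError if 'beds' repeats in lookup
linked = outcomes.merge(lookup, on='beds', how='left', validate='many_to_one', indicator=True)
print(linked[['beds', 'dbn', '_merge']])
```

Rows whose `_merge` value is `left_only` (here `BEDS_C`) found no matching school in the lookup and get a missing `dbn`.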

In [30]:
df_locations['dbn'] = df_locations['ats system code'].str.strip()

df_locations = df_locations[[
    'dbn', 'location_name', 'managed_by_name', 'location_type_description',
    'location_category_description', 'grades_final_text', 'location 1',
]]

In [31]:
df_locations = df_locations.rename(columns={
    'managed_by_name': 'doe_or_charter', 'location_name': 'school_name', 'location 1': 'address',
    'location_type_description': 'school_type', 'location_category_description': 'school_grade_category',
    'grades_final_text': 'grades_list'})

## Extract borough from `DBN` code

The NYC Department of Education (NYCDOE) identifies schools with a **6-character alphanumeric `DBN` (District Borough Number)**. The code begins with the school's district number (2 digits), then the borough code (1 letter: K = Brooklyn; X = Bronx; Q = Queens; M = Manhattan; R = Staten Island), then the school number (3 digits).
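The layout described above can be made explicit with a small validator. This is a minimal sketch, assuming every code follows exactly the 2-digit / 1-letter / 3-digit pattern; `parse_dbn` and the `DBN` tuple are hypothetical helpers, not part of this notebook. Unlike reading the third character with `.str[2]`, it rejects malformed codes instead of quietly producing a missing borough.

```python
import re
from typing import NamedTuple

# Assumed pattern: 2-digit district, 1 borough letter, 3-digit school number
DBN_PATTERN = re.compile(r'^(\d{2})([KXQMR])(\d{3})$')

BOROUGHS = {'K': 'Brooklyn', 'X': 'Bronx', 'Q': 'Queens',
            'M': 'Manhattan', 'R': 'Staten Island'}


class DBN(NamedTuple):
    district: int
    borough: str
    school: str  # kept as text so leading zeros survive


def parse_dbn(code: str) -> DBN:
    """Split a DBN into its parts, raising ValueError if it does not fit the pattern."""
    match = DBN_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f'not a valid DBN: {code!r}')
    district, letter, school = match.groups()
    return DBN(int(district), BOROUGHS[letter], school)


print(parse_dbn('13K430'))  # → DBN(district=13, borough='Brooklyn', school='430')
```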

In [32]:
# The third character of the DBN is the borough letter; map it to the borough name
borough_names = {
    'K': 'Brooklyn',
    'X': 'Bronx',
    'Q': 'Queens',
    'M': 'Manhattan',
    'R': 'Staten Island',
}
df_locations['borough'] = df_locations['dbn'].str[2].map(borough_names)

## Extract latitude and longitude from the school address

In [33]:
# 'address' ends with "(lat, long)"; capture the text inside the parentheses (raw string avoids invalid escapes)
coords = df_locations['address'].str.extract(r'.*\((.*)\).*', expand=False).str.split(', ')
df_locations['lat'] = coords.str[0]
df_locations['long'] = coords.str[1]
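To see what the regular expression captures, the sketch below runs the same pattern on two made-up strings in the assumed shape of the `location 1` field (street lines followed by `(lat, long)`); the exact layout of the real field is an assumption here. It also shows an optional step the notebook does not take: converting the captured text to floats with `pd.to_numeric(..., errors='coerce')`, which leaves rows without coordinates as `NaN`.

```python
import pandas as pd

# Made-up sample values in the assumed "street / city / (lat, long)" layout
address = pd.Series([
    '123 EXAMPLE STREET\nNEW YORK, NY 10000\n(40.70, -73.95)',
    'NO COORDINATES HERE',
])

# str.extract searches each string; rows with no "(...)" become NaN
inner = address.str.extract(r'.*\((.*)\).*', expand=False)
parts = inner.str.split(', ')

# Optional: turn the captured text into floats; unparseable or missing values become NaN
lat = pd.to_numeric(parts.str[0], errors='coerce')
lon = pd.to_numeric(parts.str[1], errors='coerce')
print(pd.DataFrame({'lat': lat, 'lon': lon}))
```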

In [34]:
df_locations.drop(['address'], axis = 1, inplace = True)

## Save data

Save the `df_locations` dataframe to a [feather](https://blog.cloudera.com/blog/2016/03/feather-a-fast-on-disk-format-for-data-frames-for-r-and-python-powered-by-apache-arrow/) file in the `data/intermediate` folder.

In [35]:
df_locations.to_feather(os.path.join(intermediate_dir, 'df_locations.feather'))