# Data 3 | Neighborhoods

I take the geocoded coordinates (lat, lon) from data2 and merge them into data1 on the key idu.

Then I create a grid of square neighborhoods, assign voters to them based on their location, generate neighborhood statistics, and assign these statistics to the voters in each neighborhood.
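As a rough illustration of the assignment step, the sketch below maps each voter to a square by flooring their offset from the grid origin, computes a population and Democratic share per square, and writes those statistics back onto every voter in the square. It is not the code in populate_squares: the names cell_index, lat0, lon0, step_lat, and step_lon are hypothetical, the cell size is given directly in degrees rather than derived from meters, D/R/O are assumed to be 0/1 party indicators, and the three voters are made up.

```python
import math
from collections import defaultdict

def cell_index(lat, lon, lat0, lon0, step_lat, step_lon):
    """Return the (row, col) of the square containing (lat, lon)."""
    row = math.floor((lat - lat0) / step_lat)
    col = math.floor((lon - lon0) / step_lon)
    return (row, col)

# Made-up voters; D, R, O are assumed to be 0/1 party indicators.
voters = [
    {"idu": 1, "lat": 35.001, "lon": -79.001, "D": 1, "R": 0, "O": 0},
    {"idu": 2, "lat": 35.002, "lon": -79.002, "D": 0, "R": 1, "O": 0},
    {"idu": 3, "lat": 35.051, "lon": -79.001, "D": 1, "R": 0, "O": 0},
]

lat0, lon0 = 35.0, -79.1    # hypothetical grid origin (south-west corner)
step_lat = step_lon = 0.02  # hypothetical cell size in degrees

# Group voters by the square they fall in.
squares = defaultdict(list)
for v in voters:
    idx = cell_index(v["lat"], v["lon"], lat0, lon0, step_lat, step_lon)
    squares[idx].append(v)

# One statistics record per square.
stats = {}
for idx, members in squares.items():
    pop = len(members)
    stats[idx] = {"pop": pop, "share_D": sum(m["D"] for m in members) / pop}

# Attach each square's statistics to the voters inside it.
voters_with_stats = [
    {**v, "nbhd_pop": stats[idx]["pop"], "nbhd_share_D": stats[idx]["share_D"]}
    for idx, members in squares.items()
    for v in members
]
```

Voters with missing coordinates (for example, rows left unmatched by a left merge) would need to be dropped or handled before calling cell_index.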

To standardize across years and speed up KNN, I generate the following lists and dictionaries:
- W | A list of latitudes (Widths) at regular intervals across the geocoded space
- H | A list of longitudes (Heights) at regular intervals across the geocoded space
- Ad | Takes a neighborhood's index and returns a list of indices for the adjacent neighborhoods
- WHd | Takes a neighborhood's index and returns the neighborhood's (W,H) values
- SNd | Takes a neighborhood's index and returns a dataframe containing the included voters
- POPd | Takes a neighborhood's index and returns its population
- Cd | Takes a neighborhood's index and returns its centroid (lat,lon)
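
The sketch below shows one way such lookups could be built for a small regular grid. It is an illustration under assumptions, not the contents of setup_dictionaries: build_grid and its parameters are hypothetical, squares are indexed by a flat integer, adjacency means the up-to-eight surrounding squares, and the step is given in degrees.

```python
def build_grid(lat_min, lon_min, n_w, n_h, step_lat, step_lon):
    """Build grid edges plus index-keyed lookups for an n_w x n_h grid of squares."""
    W = [lat_min + i * step_lat for i in range(n_w + 1)]  # latitude edges
    H = [lon_min + j * step_lon for j in range(n_h + 1)]  # longitude edges
    WHd, Ad, Cd = {}, {}, {}
    for i in range(n_w):
        for j in range(n_h):
            k = i * n_h + j  # flat index of square (i, j)
            WHd[k] = (W[i], H[j])  # corner with the smallest lat and lon
            Cd[k] = (W[i] + step_lat / 2, H[j] + step_lon / 2)  # centroid
            # Indices of the surrounding squares that lie inside the grid.
            Ad[k] = [
                ni * n_h + nj
                for ni in (i - 1, i, i + 1)
                for nj in (j - 1, j, j + 1)
                if (ni, nj) != (i, j) and 0 <= ni < n_w and 0 <= nj < n_h
            ]
    return W, H, WHd, Ad, Cd

# A 3 x 3 toy grid with unit-degree squares.
W, H, WHd, Ad, Cd = build_grid(0.0, 0.0, 3, 3, 1.0, 1.0)
```

With a lookup like Ad, a nearest-neighbor search for a voter in square k only has to consider voters in k and in the squares listed in Ad[k], which is presumably where the KNN speed-up comes from.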

In [1]:
from data3_neighborhoods import *

data_name = state + ' @ ' + str(meters) + 'm'
print_log[data_name] = {'log':[]}

""" Step 0 | Setup """
LLd, Ad, Cd = setup_dictionaries(meters, state, data_name)

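# str.strip() removes a set of characters, not a substring, so slice off the prefix and suffix instead
years = [x[len(state) + 1:-len('_geo.pkl')] for x in os.listdir(path_2) if x.startswith(state + '_') and x.endswith('_geo.pkl')]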
for year in sorted(years):
    T0 = time.time()
    data_name = str(year)
    file_name = state + '_' + year
    print_log[data_name] = {'log':[],'sublog':[]}
    printer(print_log)
    
    if 'SNd_' + str(year) + '_' + str(meters) + 'm.pkl' in os.listdir(path_3):
        print_log[data_name]['log'].append('  Done')
        printer(print_log)
        
    else:
        """ Step 1 | Open and Clean """
        t0 = time.time()
        print_log[data_name]['log'].append('  Step 1 | Open and Clean')
        printer(print_log)
        path_2_geo = path_2+state+'_'+str(year)+'_geo.pkl'
        with open(path_2_geo,'rb') as f: 
            geo_data = pickle.load(f)
            #geo_data = geo_data[geo_data.matchtype == 'Exact']
            geo_data = geo_data[['idu', 'lat', 'lon']]
        
        chunk_file_names = [x for x in os.listdir(path_1) if x.split('_chunk_')[0] == file_name]
        chunk_files = []
        for chunk_file_name in chunk_file_names:
            with open(path_1 + chunk_file_name,'rb') as f: 
                chunk = pickle.load(f)
                keep_cols = ['idu', 'age', 'gender', 'race', 'D', 'R', 'O']
                chunk = chunk[keep_cols]
            chunk_files.append(chunk)
        data = pd.concat(chunk_files, ignore_index=True)
        
        data = data.merge(geo_data, how='left', on='idu')
        
        runtime = str(round(( time.time() - t0 ) / 60 ))
        print_log[data_name]['log'][-1] = '  Step 1 | Open and Clean (Runtime: ' + runtime + ' mins)'
        printer(print_log)
        
        """ Step 2 | Populate Squares """
        t0 = time.time()
        print_log[data_name]['log'].append('  Step 2 | Populating Squares')   
        SNd, POPd = populate_squares(data, LLd, print_log, data_name, meters)
        runtime = str(round(( time.time() - t0 ) / 60 ))
        print_log[data_name]['log'][-1] = '  Step 2 | Populating Squares (Runtime: ' + runtime + ' mins; Squares: ' + str(len(SNd)) + ')'
        printer(print_log)
        
        """ Step 3 | Save """
        print_log[data_name]['log'].append('  Step 3 | Save')
        post_script = str(year) + '_' + str(meters) + 'm.pkl'
        with open(path_3 + '/SNd_' + post_script,'wb') as f: 
            pickle.dump(SNd, f)
        with open(path_3 + '/POPd_' + post_script,'wb') as f: 
            pickle.dump(POPd, f)
        runtime = str(round(( time.time() - T0 ) / 60 ))  # T0 marks the start of this year's run, so this is the total
        print_log[data_name]['log'][-1] = '  Step 3 | Save (Total Runtime: ' + runtime + ' mins)'
        printer(print_log)
        
        """ Saving print_log """
        now = datetime.now()
        savedate = now.strftime('%Y%m%d')  # append '%H' to include the hour
        with open(path_3 + 'print_log_' + savedate + '.txt', 'w') as f:
            f.write(string_printer(print_log))

NC @ 2000m
  Step 0 | Load Grid
2010
  Step 1 | Open and Clean (Runtime: 0 mins)
  Step 2 | Populating Squares (Runtime: 0 mins; Squares: 4383)
  Step 3 | Save (Total Runtime: 0 mins)
2011
  Step 1 | Open and Clean (Runtime: 0 mins)
  Step 2 | Populating Squares (Runtime: 0 mins; Squares: 4399)
  Step 3 | Save (Total Runtime: 0 mins)
2012
  Step 1 | Open and Clean (Runtime: 0 mins)
  Step 2 | Populating Squares (Runtime: 0 mins; Squares: 4331)
  Step 3 | Save (Total Runtime: 0 mins)
2013
  Step 1 | Open and Clean (Runtime: 0 mins)
  Step 2 | Populating Squares (Runtime: 0 mins; Squares: 4441)
  Step 3 | Save (Total Runtime: 1 mins)
2015
  Step 1 | Open and Clean (Runtime: 0 mins)
  Step 2 | Populating Squares (Runtime: 1 mins; Squares: 4449)
  Step 3 | Save (Total Runtime: 1 mins)
2016
  Step 1 | Open and Clean (Runtime: 0 mins)
  Step 2 | Populating Squares (Runtime: 1 mins; Squares: 4447)
  Step 3 | Save (Total Runtime: 1 mins)
2017
  Step 1 | Open and Clean (Runtime: 0 mins)
  Step 

## Analysis

1. I plot the map of square neighborhoods by:
    1. Party composition
    2. Population
    3. Non-two-party composition
2. I then generate histograms and other descriptive plots of the neighborhoods.
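
To make the three mapped quantities concrete, here is a minimal sketch of how they could be computed from per-neighborhood party counts and binned for a histogram. The function names, the definition of party composition as the Democratic share of the two-party total, the assumption that every voter is counted in exactly one of D, R, or O, and the counts themselves are all illustrative rather than taken from the analysis code.

```python
def neighborhood_measures(d, r, o):
    """Return population, two-party composition, and non-two-party share."""
    pop = d + r + o
    two_party = d + r
    return {
        "population": pop,
        # Share of Democrats among Democrats and Republicans.
        "dem_two_party_share": d / two_party if two_party else None,
        # Share of voters registered as neither Democrat nor Republican.
        "non_two_party_share": o / pop if pop else None,
    }

def histogram(values, n_bins, lo=0.0, hi=1.0):
    """Count values into n_bins equal-width bins over [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        b = min(int((v - lo) / width), n_bins - 1)  # put v == hi in the last bin
        counts[b] += 1
    return counts

# Made-up (D, R, O) counts keyed by neighborhood index.
party_counts = {0: (30, 10, 10), 1: (5, 15, 0), 2: (0, 0, 4)}
measures = {k: neighborhood_measures(*c) for k, c in party_counts.items()}

# Neighborhoods with no two-party voters have no defined composition.
shares = [m["dem_two_party_share"] for m in measures.values()
          if m["dem_two_party_share"] is not None]
share_hist = histogram(shares, n_bins=4)
```

Neighborhoods with no Democrats or Republicans are left out of the composition histogram here; how to treat them on the map is a separate choice.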