# Data 3 | Neighborhoods

I take the geocoded coordinates (lat, lon) from data2 and merge them with data1 on idu.

I then create a grid of square neighborhoods, assign voters to squares based on their coordinates, compute neighborhood statistics, and assign those statistics back to the voters in each neighborhood.
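As a rough illustration of the assign-and-broadcast step, here is a minimal sketch in plain Python. It is not the notebook's `populate_squares`. The helper `assign_squares`, the degree-based `step`, the row-major square index, and the `nbhd_*` field names are all assumptions made for the example, and voters without coordinates are assumed to have been dropped beforehand.

```python
import math
from collections import defaultdict

def assign_squares(voters, lat_min, lon_min, step, n_h):
    """Hypothetical helper: tag each voter with a square index, return {index: [voters]}."""
    squares = defaultdict(list)
    for v in voters:
        i = math.floor((v['lat'] - lat_min) / step)  # row along latitude
        j = math.floor((v['lon'] - lon_min) / step)  # column along longitude
        v['square'] = i * n_h + j                    # row-major index (assumption)
        squares[v['square']].append(v)
    return dict(squares)

# Toy voters; real rows would come from the merged data (idu, age, lat, lon, ...)
voters = [
    {'idu': 'a', 'lat': 35.02, 'lon': -79.97, 'age': 30},
    {'idu': 'b', 'lat': 35.04, 'lon': -79.93, 'age': 50},
    {'idu': 'c', 'lat': 35.15, 'lon': -79.85, 'age': 44},
]

SNd = assign_squares(voters, lat_min=35.0, lon_min=-80.0, step=0.1, n_h=2)

# Neighborhood statistics: population and mean age per square
POPd = {idx: len(members) for idx, members in SNd.items()}
mean_age = {idx: sum(m['age'] for m in members) / len(members)
            for idx, members in SNd.items()}

# Broadcast the neighborhood statistics back to every voter in the square
for v in voters:
    v['nbhd_pop'] = POPd[v['square']]
    v['nbhd_mean_age'] = mean_age[v['square']]
```

Voters `a` and `b` fall in the same square here, so they receive the same neighborhood values, while `c` sits alone in its square.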

To standardize across years and speed up KNN, I generate the following lists and dictionaries:
- W | A list of latitudes (Widths) at regular intervals across the geocoded space
- H | A list of longitudes (Heights) at regular intervals across the geocoded space
- Ad | Takes a neighborhood's index and returns a list of indices for the adjacent neighborhoods
- WHd | Takes a neighborhood's index and returns the neighborhood's (W,H) values
- SNd | Takes a neighborhood's index and returns a dataframe containing the included voters
- POPd | Takes a neighborhood's index and returns its population
- Cd | Takes a neighborhood's index and returns its centroid (lat,lon)
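The structures listed above could be built along these lines. This is a sketch under assumptions, not the contents of `setup_dictionaries`: the helper `build_grid` is hypothetical, the cell size is given in degrees rather than converted from meters, squares are indexed row-major, and adjacency is taken to mean the up-to-eight surrounding squares.

```python
from itertools import product

def build_grid(lat_min, lat_max, lon_min, lon_max, step):
    """Hypothetical helper: regular grid with cells `step` degrees on a side."""
    n_w = int(round((lat_max - lat_min) / step))  # number of cells along latitude
    n_h = int(round((lon_max - lon_min) / step))  # number of cells along longitude
    W = [lat_min + i * step for i in range(n_w + 1)]  # latitude edges
    H = [lon_min + j * step for j in range(n_h + 1)]  # longitude edges

    WHd, Cd, Ad = {}, {}, {}
    for i, j in product(range(n_w), range(n_h)):
        idx = i * n_h + j                                # row-major index (assumption)
        WHd[idx] = (W[i], H[j])                          # lower-left corner of the square
        Cd[idx] = (W[i] + step / 2, H[j] + step / 2)     # centroid (lat, lon)
        Ad[idx] = [                                      # surrounding squares inside the grid
            (i + di) * n_h + (j + dj)
            for di, dj in product((-1, 0, 1), repeat=2)
            if (di, dj) != (0, 0) and 0 <= i + di < n_w and 0 <= j + dj < n_h
        ]
    return W, H, WHd, Cd, Ad

# A 3 x 2 toy grid
W, H, WHd, Cd, Ad = build_grid(35.0, 35.3, -80.0, -79.8, 0.1)
```

Because the grid depends only on the bounding box and the cell size, it can be built once and reused for every year, which is what makes the per-year results comparable.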

## Low Key To Do
1. Maybe do everything with the export date instead of year?
2. Use idu as the index?

In [4]:
from data3_neighborhoods import *

data_name = f'{state} @ {meters} m'
print_log[data_name] = {'log':[]}

""" Step 0 | Setup """
LLd, Ad, Cd = setup_dictionaries(meters, state, data_name)

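# Slice off the suffix: str.strip('_geo.pkl') removes a set of characters, not the substring
run_years = [x[:-len('_geo.pkl')] for x in os.listdir(path_2) if x.endswith('_geo.pkl')]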
for year in sorted(run_years):
    T0 = time.time()
    file_name = f'{state}_{year}'
    print_log[year] = {'log':[],'sublog':[]}
    printer(print_log)
    
    SNd_file_name = f'SNd_{file_name}m.pkl'

    if SNd_file_name in os.listdir(path_3):
        print_log[year]['log'].append('  Done')
        printer(print_log)
        
    else:
        """ Step 1 | Open and Clean """
        t0 = time.time()
        print_log[year]['log'].append('  Step 1 | Open and Clean')
        printer(print_log)
        
        path_2_geo = f'{path_2}{file_name}_geo.pkl'
        geo_data = pd.read_pickle(path_2_geo)
        #geo_data = geo_data[geo_data.matchtype == 'Exact']
        geo_data = geo_data[['idu', 'lat', 'lon']]
        
        chunk_file_names, chunk_files = [x for x in os.listdir(path_1) if year in x], []
        for chunk_file_name in chunk_file_names:
            with open(path_1 + chunk_file_name,'rb') as f: 
                chunk = pickle.load(f)
                keep_cols = ['idu', 'age', 'gender', 'race', 'D', 'R', 'O']
                chunk = chunk[keep_cols]
            chunk_files.append(chunk)
        data = pd.concat(chunk_files, ignore_index=True)
        data = data.merge(geo_data, how='left', on='idu')
        
        runtime = round(( time.time() - t0 ) / 60 )
        print_log[year]['log'][-1] = f'  Step 1 | Open and Clean (Runtime: {runtime} mins)'
        printer(print_log)
        
        """ Step 2 | Populate Squares """
        t0 = time.time()
        print_log[year]['log'].append('  Step 2 | Populating Squares') 
        SNd, POPd = populate_squares(data, LLd, print_log, year, meters)
        runtime = round(( time.time() - t0 ) / 60 )
        print_log[year]['log'][-1] = f'  Step 2 | Populating Squares (Runtime: {runtime} mins; Squares: {len(SNd)})'
        printer(print_log)
        
        """ Step 3 | Save """
        print_log[year]['log'].append('  Step 3 | Save')
        with open(f'{path_3}SNd_{file_name}m.pkl','wb') as f: 
            pickle.dump(SNd, f)
        with open(f'{path_3}POPd_{file_name}m.pkl','wb') as f: 
            pickle.dump(POPd, f)
        # Total runtime is measured from T0 (start of this year's loop), not Step 2's t0
        runtime = round(( time.time() - T0 ) / 60 )
        print_log[year]['log'][-1] = f'  Step 3 | Save (Total Runtime: {runtime} mins)'
        printer(print_log)
        
        """ Saving print_log """
        now = datetime.now()
        savedate = now.strftime('%Y%m%d')
        # Literal file name assumed here; interpolating the print_log dict would put its repr in the path
        with open(f'{path_3}print_log_{savedate}.txt', 'w') as file:
            file.write(string_printer(print_log))

NC @ 2000m
  Step 0 | Load Grid
2010
  Done
2011
  Done
2012
  Done
2013
  Done
2015
  Done
2016
  Done
2017
  Done
2018
  Done
2019
  Done
2020
  Step 1 | Open and Clean (Runtime: 1 mins)
  Step 2 | Populating Squares (Runtime: 1 mins; Squares: 4481)
  Step 3 | Save (Total Runtime: 1 mins)
2021
  Step 1 | Open and Clean (Runtime: 1 mins)
  Step 2 | Populating Squares (Runtime: 1 mins; Squares: 4460)
  Step 3 | Save (Total Runtime: 1 mins)


## Analysis

1. I plot the map of square neighborhoods by:
    1. Party composition
    2. Population
    3. Non-two-party composition
2. I then generate histograms and other descriptive plots of the neighborhoods.
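
The three mapped quantities could be computed per square roughly as follows. This is only a sketch: it assumes `D`, `R`, and `O` are 0/1 party indicators, that party composition means the Democratic share of the two-party total, and it uses a hypothetical helper `square_summary` on plain lists of dicts rather than the dataframes stored in SNd.

```python
def square_summary(members):
    """Hypothetical helper: summary values for one square's voters."""
    n = len(members)
    d = sum(m['D'] for m in members)
    r = sum(m['R'] for m in members)
    o = sum(m['O'] for m in members)
    return {
        'population': n,
        # Democratic share of two-party registrants (assumed definition)
        'two_party_D_share': d / (d + r) if d + r else None,
        # Share of voters registered with neither major party
        'other_share': o / n if n else None,
    }

# Toy square with two D, one R, and one O voter
members = [
    {'D': 1, 'R': 0, 'O': 0},
    {'D': 1, 'R': 0, 'O': 0},
    {'D': 0, 'R': 1, 'O': 0},
    {'D': 0, 'R': 0, 'O': 1},
]
summary = square_summary(members)
```

Returning `None` for squares without any two-party registrants keeps them distinguishable from squares with a true share of zero when coloring the map.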