In [8]:
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from tqdm import tqdm
import geopandas
import re
import json
import statistics

# haversine is used to properly calculate the distance between two coordinates
import haversine as hs

# imports to create a custom colormap
from matplotlib import cm
from matplotlib.colors import ListedColormap, LinearSegmentedColormap

ModuleNotFoundError: No module named 'haversine'
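Since the `haversine` import failed above, here is a minimal pure-Python sketch of the great-circle distance that package is meant to provide. The function name `haversine_km` and the mean Earth radius constant are assumptions made for illustration. This is not the `haversine` package's API, only the standard haversine formula.

```python
import math

def haversine_km(p1, p2, radius_km=6371.0088):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees.

    Hypothetical helper: a stand-in sketch, not the `haversine` package itself.
    """
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine of the central angle between the two points.
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))
```

Installing the real package in the notebook's environment would make the original import work unchanged. The sketch only matters if that is not an option.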

## Metric 3: Vacant Houses

### Data from: https://data.wprdc.org/dataset/vacant-addresses 
A getaway driver has to drive somewhere, and the best places for criminals to hide their new illicit gains are vacant properties. It stands to reason, then, that the best areas for a getaway driver are those with many different vacancies to choose from. To find the best neighborhood in Pittsburgh for getaway crimes, I analyzed the WPRDC Pennsylvania Vacant Addresses data set. This set is built from the postal service's record of failed deliveries, tracked over time to determine which addresses are vacant. The postal service has been gathering delivery data for much longer, but this vacancy set was created in 2016, reaches back only to 2012, and is updated multiple times a year. It includes many fields, but the only ones relevant to my purposes are the geoid and the total vacancy count.

In [9]:
vac_data = pd.read_csv("https://data.wprdc.org/dataset/c5ae9638-7572-4ed9-a193-384cb2ff4d03/resource/0d97c478-cce1-488d-aabc-da116cc6987d/download/all_usps_vacancy_data.csv", index_col="geoid")

len(vac_data)

119066

Oh no! That's too many!

It turns out that this data set covers the whole of Pennsylvania, not just Pittsburgh. Not only that, but its rows are keyed by geoid rather than by neighborhood, unlike the other sets in this project.

To rectify this, I need to both identify the Pittsburgh geoids, and transform them into neighborhoods.

These neighborhood names were provided by my group's other sets, and the mapping to geoids was provided by the census at https://www2.census.gov/geo/docs/reference/codes/files/st42_pa_places.txt.

In [10]:
nhoods = pd.DataFrame({"name" : ['Allegheny West','Allentown','Arlington','Arlington Heights','Banksville','Bedford Dwellings','Beechview','Beltzhoover','Bloomfield',
 'Bluff','Bon Air','Brighton Heights','Brookline','California-Kirkbride','Carrick','Central Business District',
 'Central Lawrenceville','Central North Side','Central Oakland','Chartiers City','Chateau','Crafton Heights','Crawford-Roberts',
 'Duquesne Heights','East Allegheny','East Carnegie','East Hills','East Liberty','Elliott','Esplen',
 'Fairywood','Fineview','Friendship','Garfield','Glen Hazel','Golden Triangle/Civic Arena','Greenfield','Hays','Hazelwood','Highland Park','Homewood North','Homewood South',
 'Homewood West','Knoxville','Larimer','Lincoln Place','Lincoln-Lemington-Belmar','Lower Lawrenceville','Manchester','Marshall-Shadeland','Middle Hill',
 'Morningside','Mount Oliver','Mount Washington','Mt. Oliver Boro','Mt. Oliver Neighborhood','New Homestead','North Oakland','North Shore',
 'Northview Heights','Oakwood','Outside City','Outside County','Outside State','Overbrook','Perry North','Perry South','Point Breeze',
 'Point Breeze North','Polish Hill','Regent Square','Ridgemont','Shadyside','Sheraden','South Oakland','South Shore','South Side Flats','South Side Slopes',
 'Spring Garden','Spring Hill-City View','Squirrel Hill North','Squirrel Hill South','St. Clair','Stanton Heights','Strip District',
 'Summer Hill','Swisshelm Park','Terrace Village','Troy Hill','Troy Hill-Herrs Island','Upper Hill','Upper Lawrenceville','West End',
 'West Oakland','Westwood', 'Windgap'], 
"geoid" : [4200892, 4202000, 4203008, 0, 4204944, 4204984, 0 , 4207040,
0, 0, 4208680, 0, 4210768, 0, 0, 4242016, 0, 4255992, 4212848, 0, 4216848, 0, 4220432,4200876 
,4211336,0 ,4243064 ,0 ,0,0 ,0,4227968 ,0 ,4229632,0 ,4231008 ,4233312,0 ,4234592,4235488 ,0 
,0 ,4240360 ,4241568 ,4243408,4226296,4242016,4246872,4247696,0 ,0 ,4251744 ,4222016 ,4251744 
,4256712,4235424 ,4256008 ,0 ,0 ,4256232 ,0 ,0 ,0,0
,4259440,4259448 ,4261832 ,4261864 ,0 ,0  ,4207976,4269448,4246376 ,4255992 ,4272472 ,427250 ,0
,4273168 ,4273224 ,4273248, 4273240, 4279274, 4254104, 4219312, 4275168,4275168, 4275816, 4238640, 4277584, 4277592,
4279064, 4279080, 4282848, 4256016, 4284248, 4285720]})

First, I have to make a data frame of all the neighborhoods in Pittsburgh and their respective geoids. Next, I will match those geoids against the vacancy data to check whether each neighborhood has at least 3 vacancies. With fewer than 3, it would be too obvious to law enforcement which safehouse holds the large sacks of money or priceless museum artifacts.
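The matching step could be sketched roughly as follows, using small toy frames in place of `vac_data` and `nhoods`. Several things here are assumptions: the column name `total_vacant` is hypothetical (the real vacancy column may be named differently), a geoid of `0` is treated as "no geoid matched", and the geoid `4299999` and all vacancy counts are made up for illustration.

```python
import pandas as pd

# Toy stand-in for vac_data: indexed by geoid, one row per geoid.
# "total_vacant" is a hypothetical column name; the counts are invented.
vac_toy = pd.DataFrame(
    {"total_vacant": [5, 1, 12, 7]},
    index=pd.Index([4200892, 4202000, 4203008, 4299999], name="geoid"),
)

# Toy stand-in for nhoods: a 0 geoid is assumed to mean "no match found".
nhoods_toy = pd.DataFrame(
    {
        "name": ["Allegheny West", "Allentown", "Arlington", "Arlington Heights"],
        "geoid": [4200892, 4202000, 4203008, 0],
    }
)

MIN_VACANCIES = 3  # threshold described in the text

# Keep only neighborhoods with a usable geoid.
known = nhoods_toy[nhoods_toy["geoid"] != 0]

# The inner join drops every vacancy row whose geoid is not in the neighborhood list.
merged = known.merge(vac_toy, left_on="geoid", right_index=True, how="inner")

# Total vacancies per neighborhood, then apply the threshold.
totals = merged.groupby("name")["total_vacant"].sum()
eligible = totals[totals >= MIN_VACANCIES]
print(eligible)
```

If the real data holds several rows per geoid (for example, one per reporting period), summing would overcount, and picking the most recent row or a maximum may be a better aggregation.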