# Best Neighborhood in Pittsburgh

## Team: Group 11
#### Tom Greene
#### Josavee Sok-Coyle
#### Troy Reinhardt

## Part 1: Art - Tom

One of the coolest things about coming from the suburbs into the city is the "different style". Cities tend to have a lot more character than ordinary towns, and more charm because of it. This comes from a number of things, such as the people, stores and restaurants, architecture, and food. Of these, a very underrated one that can turn a neighborhood from good to great is art. While many believe public art doesn't matter when it comes to defining a neighborhood, I think it is extremely important. Public art draws many people to neighborhoods they wouldn't otherwise visit. Because Pittsburgh is a city rich in art, I chose to rank its neighborhoods by the number of public art pieces they have.

In [None]:
##### load pandas (always do this first)
import pandas as pd
import numpy as np
# If you don't do this what are you doing with your life.
data = pd.read_csv("https://data.wprdc.org/datastore/dump/00d74e83-8a23-486e-841b-286e1332a151")
# Select the neighborhood column (one entry per art piece)
neighborhood = data.neighborhood
# Below gathers all of the instances a neighborhood appears on this list
neighborhood.value_counts()


In [None]:
# Imports plotting devices
from matplotlib import pyplot as plt

# Creates bar graph
neighborhood.value_counts().head(10).plot(kind='bar')

plt.title('Number of Public Art Pieces per Pittsburgh Neighborhood')
plt.xlabel('Neighborhoods')
plt.ylabel('Number of Art Pieces')


##### Best Neighborhood???
The winner is the Central Business District, with Squirrel Hill South in second and Allegheny Center in third.

##### Conclusion
Personally, I was not surprised to find out that the Central Business District won, since it is the downtown "touristy" area of Pittsburgh. I would also say it is my favorite part of Pittsburgh (other than Oakland, of course!). I was mostly surprised to find that Oakland did not even make the top ten, because it seems like there is a ton of art in Oakland.
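
One possible explanation, offered only as an assumption, is that the dataset may list Oakland under separate names (Pittsburgh treats North, Central, South, and West Oakland as distinct neighborhoods), so no single entry ranks highly. A minimal sketch of how that could be checked, using a small made-up series rather than the real data:

```python
import pandas as pd

# Hypothetical neighborhood labels, NOT the real WPRDC public art data.
toy = pd.Series([
    "Central Business District", "Central Business District",
    "Central Business District", "North Oakland", "Central Oakland",
    "South Oakland", "West Oakland", "Squirrel Hill South",
    "Squirrel Hill South",
])

# Count how many times each label appears, most frequent first.
counts = toy.value_counts()

# Add up every label containing "Oakland" to get a combined total.
oakland_total = counts[counts.index.str.contains("Oakland")].sum()

print(counts.head(3))
print("Combined Oakland count:", oakland_total)
```

In this toy example each Oakland entry appears only once, yet together they would outrank the top single neighborhood. Whether the same thing happens in the real dataset would need to be checked against the actual `neighborhood` column.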

## Part 2: Capital Projects - Josavee

In my part of the project, I chose capital projects as my metric. I am analyzing which neighborhood brings in the most government money to spend on projects that better their neighborhood.

In [None]:
capital_data = pd.read_csv("https://data.wprdc.org/datastore/dump/2fb96406-813e-4031-acfe-1a82e78dc33c")
capital_data


Checking which neighborhood has the most and least projects

In [None]:
print(capital_data['neighborhood'].value_counts())

In [None]:
#editing the data into just the budgeted amount and neighborhood
data = capital_data.loc[:, ['budgeted_amount', 'neighborhood']]
data
t= data.groupby('neighborhood')['budgeted_amount'].sum()
t = pd.DataFrame(t)
t

#### Which neighborhood receives the most money for capital projects?

In [None]:
t = t.sort_values(by=['budgeted_amount'], ascending=False)
graph = t.plot.barh(figsize=(15,data.shape[0] * 0.025), legend=None)
graph.set_ylabel('Neighborhood')
graph.set_xlabel('Total Capital Project Money')

In [None]:
#scores each neighborhood as its share of the total budgeted amount, scaled by 3200
total_sum = capital_data['budgeted_amount'].sum()
final = data.groupby('neighborhood')['budgeted_amount'].sum()/total_sum * 3200
final.plot.bar(figsize=(15,data.shape[0] * 0.01))
print(final)

#### Conclusion

While I expected South Side Flats to rank first, I am still surprised that Greenfield won. Greenfield doesn't appear in the dataset as often, but it was given a large budget for each project, so I suppose it's not TOO shocking. Greenfield also had a lot of engineering and facility improvement projects, so that might be a reason it ranked first.
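
To illustrate how a neighborhood with fewer projects can still come out on top by total budget, here is a minimal sketch that puts the project count, total, and per-project average side by side. The neighborhoods "A" and "B" and their amounts are made up for illustration; only the column names `neighborhood` and `budgeted_amount` come from the dataset used above.

```python
import pandas as pd

# Hypothetical projects, NOT the real capital projects data.
toy = pd.DataFrame({
    "neighborhood": ["A", "A", "A", "A", "B", "B"],
    "budgeted_amount": [100.0, 200.0, 100.0, 200.0, 900.0, 700.0],
})

# For each neighborhood: number of projects, total budget, average per project.
summary = toy.groupby("neighborhood")["budgeted_amount"].agg(
    ["count", "sum", "mean"]
)

print(summary.sort_values("sum", ascending=False))
```

Here "A" has twice as many projects as "B", but "B" has the larger total because each of its projects is budgeted much higher.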

## Part 3: Property - Troy

I chose property sales as my metric for determining the best neighborhood, figuring that the more expensive the sales are, the nicer the area would be. I do recognize, though, that money is not everything.

In [None]:
property_data = pd.read_csv("https://data.wprdc.org/datastore/dump/8eff881d-4d28-4064-83f1-30cc991cfec7")
property_data

In [None]:
column_names = ["PROPERTYADDRESS", "NEIGHBORHOOD", "SALEPRICE", "BEDROOMS"]
house_data = pd.DataFrame(property_data, columns=column_names)
houses = house_data[house_data['BEDROOMS'].notna()]
houses = houses[houses['NEIGHBORHOOD'].notna()]
houses

In [None]:
avg_n = dict(houses.groupby('NEIGHBORHOOD')['SALEPRICE'].mean())
avg_info = {'Average': avg_n}
avgF = pd.DataFrame(avg_info)
avgF = avgF.sort_values(by=['Average'], ascending=False)
avgF

In [None]:
graph = avgF.plot.barh(figsize=(15,avgF.shape[0] * .5), legend=None)

In [None]:
avgF['Score'] = (avgF['Average']/10000)*3
avgF

#### Conclusion

Based on the data for house sales, the Strip District is the best neighborhood because it has the highest average property sale price. This is slightly different from what I expected: I thought Squirrel Hill and Shadyside would top the list, but they ended up third and fifth.
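
Since this ranking relies on the mean sale price, a few very expensive sales can pull a neighborhood up. One way to check how sensitive the result is would be to compare the mean with the median. The sketch below uses made-up neighborhoods "X" and "Y" and made-up prices; only the column names `NEIGHBORHOOD` and `SALEPRICE` come from the dataset above.

```python
import pandas as pd

# Hypothetical sales, NOT the real property sales data.
toy = pd.DataFrame({
    "NEIGHBORHOOD": ["X", "X", "X", "Y", "Y", "Y"],
    "SALEPRICE": [100_000, 120_000, 2_000_000, 300_000, 320_000, 340_000],
})

# Mean and median sale price for each neighborhood.
stats = toy.groupby("NEIGHBORHOOD")["SALEPRICE"].agg(["mean", "median"])

print(stats)
print("Top by mean:", stats["mean"].idxmax())
print("Top by median:", stats["median"].idxmax())
```

In this toy data a single large sale puts "X" first by mean, while "Y" comes first by median. I have not run this comparison on the real data, so it may or may not change the ranking there.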

## Conclusion

In [None]:
combined = pd.read_csv("Combined_Scores.csv")

In [None]:
combined

In [None]:
neighborhoods = combined["Neighborhood"].values
propertyScore = combined["Property Score"].values
capitalScore = combined["Capital Projects Score"].values
artScore = combined["Art Score"].values
avgScore = combined["Overall Score(AVG)"].values

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.figure(figsize=(20,10))
# Stack the bars: art at the base, capital on top of art, property on top of both
plt.bar(neighborhoods, artScore, width=0.5, label='Art Score', color='green')
plt.bar(neighborhoods, capitalScore, width=0.5, label='Capital Score', color='blue', bottom=artScore)
plt.bar(neighborhoods, propertyScore, width=0.5, label='Property Score', color='red', bottom=artScore + capitalScore)
plt.xticks(neighborhoods, rotation=90)
plt.ylabel("Score")
plt.ylim(0, 180)
plt.xlabel("Neighborhoods")
plt.legend()
plt.show()
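
The contents of `Combined_Scores.csv` are not shown here, so as a rough illustration of how an overall score could be derived from the three metrics, the sketch below assumes the "Overall Score(AVG)" column is a plain average of the three score columns. The neighborhoods "A", "B", and "C" and their scores are made up.

```python
import pandas as pd

# Hypothetical scores, NOT the real contents of Combined_Scores.csv.
toy = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Property Score": [30.0, 60.0, 90.0],
    "Capital Projects Score": [60.0, 30.0, 0.0],
    "Art Score": [90.0, 30.0, 30.0],
})

score_cols = ["Property Score", "Capital Projects Score", "Art Score"]

# Assumption: the overall score is the simple mean of the three metric scores.
toy["Overall Score(AVG)"] = toy[score_cols].mean(axis=1)

# Rank neighborhoods from highest to lowest overall score.
ranked = toy.sort_values("Overall Score(AVG)", ascending=False)
print(ranked[["Neighborhood", "Overall Score(AVG)"]])
```

An average like this only makes sense if the three scores are on comparable scales, which is why each part of the project rescales its raw numbers before they are combined.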