# Effects of Real Estate Attributes on Property Pricing, based on the 2016-2017 NYC Property Sales dataset.

## by Tatiana Tikhonova

## Investigation Overview

In this investigation, I look at the property attributes that could be used to predict sale prices, zooming in on size, class, and location.

## Dataset Overview

The dataset consists of information regarding properties sold in New York City over a 12-month period from September 2016 to September 2017. It contains the location, address, type, sale price, and sale date of building units sold. Rows missing values that are crucial to the analysis were removed.
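
As a rough illustration of that cleaning step, the sketch below drops rows that lack a value in any required column. The toy frame and the choice of `Sale Price` and `Year Built` as required columns are assumptions made for the example; the actual wrangling that produced `wrangled_sales.csv` is not shown in this notebook.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the raw sales file (values are made up).
raw = pd.DataFrame({
    "Borough": ["Queens", "Bronx", "Brooklyn", "Manhattan"],
    "Sale Price": [450000.0, np.nan, 980000.0, 1250000.0],
    "Year Built": [1950.0, 1931.0, np.nan, 2005.0],
})

# Columns assumed to be required for the analysis.
required = ["Sale Price", "Year Built"]

# Keep only rows where every required column has a value.
clean = raw.dropna(subset=required).reset_index(drop=True)
rows_removed = len(raw) - len(clean)
print(f"Removed {rows_removed} of {len(raw)} rows")
```

Passing `subset` keeps rows that are only missing values in columns the analysis does not depend on.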

In [2]:
# import all packages and set plots to be embedded inline
import numpy as np

import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

import seaborn as sns

import matplotlib.pyplot as plt

%matplotlib inline

# suppress warnings from final output
import warnings
warnings.simplefilter("ignore")
pd.options.mode.chained_assignment = None  # default='warn'

In [3]:
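# load the dataset into a pandas dataframe
df = pd.read_csv('/Users/tatianatikhonova/Documents/udacity/Project4/RealEstateSales/ToGit/wrangled_sales.csv')
# the plotting cells below refer to the data as `k`; assumed here to be the same wrangled frame
k = df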

## Apartment and Building Sales Distribution by Borough

> First, let's see *how many* sales were made in each of the city's boroughs during this period. To add some granularity: does the distribution of apartment sales differ from that of building sales?


In [None]:
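# Tax class 2 covers co-ops and condos (apartments); all other classes are treated as buildings.
# sort_index() puts both series in the same borough order so the two lines share one x-axis.
apt_counts = k[k['Tax Class At Time Of Sale'] == 2].Borough.value_counts().sort_index()
bldg_counts = k[k['Tax Class At Time Of Sale'] != 2].Borough.value_counts().sort_index()
apt_counts.plot(kind='line')
bldg_counts.plot(kind='line')
plt.title('Count of Building Sales and Apartment Sales by Borough')
plt.xlabel('Borough')
plt.ylabel('# of Sales')
plt.legend(['Apartment sales', 'Building sales']);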

# • Class 1: Includes most residential property of up to three units (such as one-,
#   two-, and three-family homes and small stores or offices with one or two
#   attached apartments), vacant land that is zoned for residential use, and most
#   condominiums that are not more than three stories.
# • Class 2: Includes all other property that is primarily residential, such as
#   cooperatives and condominiums.
# • Class 4: Includes all other properties not included in classes 1, 2, and 3, such as
#   offices, factories, warehouses, garage buildings, etc.
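
The class definitions above can be turned into a small lookup so that plots carry readable labels. This is a minimal sketch: the label wording, the `sale_kind` helper, and the toy frame are illustrative assumptions, while the apartment/building split follows this notebook's convention of treating tax class 2 as apartments.

```python
import pandas as pd

# Short labels paraphrasing the class descriptions above (wording is illustrative).
TAX_CLASS_LABELS = {
    1: "Small residential (up to three units)",
    2: "Other residential (co-ops and condos)",
    4: "Commercial and other (offices, factories, warehouses)",
}

def sale_kind(tax_class):
    """Return 'Apartment' for tax class 2 and 'Building' for any other class."""
    return "Apartment" if tax_class == 2 else "Building"

# Toy frame with made-up rows, using the column name from the plotting cell.
toy = pd.DataFrame({"Tax Class At Time Of Sale": [1, 2, 2, 4]})
toy["Tax Class Label"] = toy["Tax Class At Time Of Sale"].map(TAX_CLASS_LABELS)
toy["Sale Kind"] = toy["Tax Class At Time Of Sale"].apply(sale_kind)
print(toy)
```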

## Property Sales by Price and Borough

> Now let's see where the money went. We observed that the majority of sales were made in Queens; does that mean the overall sales value there is also the highest? Take a look at the slide below and find out for yourself!

In [None]:
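from matplotlib import rcParams
# Specify the figure size in inches (width, height); set it before plotting so it applies to this figure
rcParams['figure.figsize'] = 10, 5
# estimator=np.sum makes each bar the total sale value; the barplot default is the mean
ax = sns.barplot(x="Borough",
                 y='Sale Price',
                 data=k,
                 estimator=np.sum,
                 palette=sns.color_palette('coolwarm', n_colors=5))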

## Average Property Cost by Borough

> Now that we know the accumulated total of sales made in each borough, it would be interesting to see the average. How much does a property cost in, say, Staten Island? Keep in mind that these sales include commercial buildings and factories.

In [None]:
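# avg_price_per_unit is assumed to be a Series of average prices indexed by borough, computed in an earlier cell
ax = avg_price_per_unit.plot(kind='line', color='blue', linestyle='dotted', linewidth=2)
# create the legend before adjusting its frame (the label text is an assumption)
leg = ax.legend(['Avg Price'])
leg.get_frame().set_alpha(0.5)

plt.ylabel('Avg Price')
plt.title("Average Price by Borough");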
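
A borough can lead in the number of sales without leading in total or average price. The sketch below shows the three measures side by side on made-up numbers; the toy values are assumptions for illustration and are not taken from the dataset.

```python
import pandas as pd

# Made-up sales: Queens has more deals, Manhattan has pricier ones.
toy = pd.DataFrame({
    "Borough": ["Queens", "Queens", "Queens", "Manhattan", "Manhattan"],
    "Sale Price": [400000, 500000, 600000, 2000000, 3000000],
})

# Number of sales, total sale value, and average sale price per borough.
summary = toy.groupby("Borough")["Sale Price"].agg(["count", "sum", "mean"])
print(summary)
```

Looking at `count`, `sum`, and `mean` together makes it clear which question each slide is answering.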

## Average Property Cost by Class

> Since these real estate deals include commercial sales, it would be eye-opening to see how much each property type costs compared to the others. My guess is that office buildings rank highest; what is yours? *(P.S. Notice that Family Dwelling sales cost the least: something to remember when looking at the last slide.)*

In [None]:
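# average sale price per class, most expensive first
# (selecting 'Sale Price' avoids averaging non-numeric columns, which fails in recent pandas)
avg_by_class = k.groupby('Class', as_index=False)['Sale Price'].mean()
avg_by_class.sort_values("Sale Price", ascending=False, inplace=True)
plt.figure(figsize=(16,6))
byclass = sns.barplot(x='Sale Price', y='Class', data=avg_by_class, orient='h')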


## Average Apartment Cost

> Let's separate apartments from buildings and take a look at the former. Which type was, on average, more expensive in the 2016-2017 NYC sales?

In [None]:
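# assumed definition: apartments are tax class 2 sales, as in the borough count slide
avg_apt_by_class = (k[k['Tax Class At Time Of Sale'] == 2]
                    .groupby('Class', as_index=False)['Sale Price'].mean())
byclass = sns.barplot(x='Sale Price', y='Class', data=avg_apt_by_class, orient='h');
plt.title("Average Cost of an Apartment");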

## Majority of Sales by Class

> Lastly, I'd like to show you what happens if we look at the total number of sales regardless of price or location: which type of property sold the most? Surprisingly, it's...

In [None]:
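# counts Queens properties built between 1925 and 1960,
# shown for the five classes with the most sales across the whole dataset
ax = sns.countplot(y="Class",
                   data=k[(k['Borough'] == 'Queens') &
                          (k['Year Built'].between(1925, 1960))],
                   palette="Greens_d",
                   order=k['Class'].value_counts().iloc[:5].index);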

## The End

I hope you enjoyed this little slide show as much as I did. Thanks for checking it out!
