# Coursera IBM Capstone Project
## Part 1

### A description of the problem and a discussion of the background. 

As of July 6, 2019, CBS reports "U.S. housing rents hit record-high average of 1,405 per month. Think your rent is pricier than ever? You may be right. The national average rent reached an all-time high of 1,405 in June, a 2.9 percent increase from a year earlier, according to data from Yardi Matrix." 
It is no suprise for residents living in New York or the Bay Area that rent is expensive, but is there releif on the horizon or should renters brace for impact? 

### A description of the data and how it will be used to solve the problem.

- The data is in tabular form (CSV) and contains 4000 rows by 70 columns. Following the descriptive columns (Location, Location Type, State, and Bedroom_Size) are the numerical data column, each containing a rent price for a particular month of the year. 
- The Date range of the data is 01/2014 - 06/2019
- Location: Name of the city
- Location Type: Type of municipality (i.e. a city, town, etc.) 
- State: What state is the municipality is located in
- Bedroom Size: Either a Studio, 1 Bedroom, 2 Bedroom, 3 Bedroom, or 4 Bedroom
- The data will be explored, mined, grouped, visualized, and ultimately used to train a model that can predict future rent prices. 

## Part 2

### Introduction: The Business problem and who may be interested in this project.

Simply put, rent is on the rise. But what can renters expect in the future? 
- Current renters may be interested in seeing how their rent will change. 
- Those seeking to rent may be interested in using this to select an area. 
- Property managers and Real Estate professionals may be interested in leveraging these preditions to help influence business decisions in particular areas. 

### Data:

The data has been gathered from Apartment List (https://www.apartmentlist.com/rentonomics/rental-data/). The following is a blurb from their website describing their service and mission:
- "Apartment List’s Rent Reports cover rental pricing data in major cities, their suburbs, and their neighborhoods. We provide valuable leading indicators of rental price trends, highlight data on top cities, and identify the key facts renters should know. As always, our goal is to provide price transparency to America’s 105 million renters to help them make the best possible decisions in choosing a place to call home.  Apartment List publishes Rent Reports during the first calendar week of each month." 
- The data will be explored, cleaned, grouped, visualized, and ultimately used to train a model that can predict future rent prices. 

### Methodology:

#### Exploratory data analysis

#### Inferential Statistical Testing 

#### What machine learnings were used and why

### Results:

### Discussion: Noted Observations and Recommendations based on the results.

### Conclusion:

## Part 3

### Your choice of a presentation or blogpost.

---

# Analysis and Visualizations

### Import Dependancies

In [1]:
import pandas as pd
from pandas.io.json import json_normalize
import numpy as np
import requests
import json
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import seaborn as sns
from sklearn.cluster import KMeans
import folium

In [2]:
rent_df = pd.read_csv('rent_data.csv')
rent_df.shape

(4000, 70)

## Get Latitude and Longitude of Each City in the Data Set

In [3]:
# Due to continued 'Time Out' errors, split the 800 unique cities into 8 batches of 100 each
locations = list(rent_df['Location'].unique())
location_batch_1 = locations[0:101]
location_batch_2 = locations[101:201]
location_batch_3 = locations[201:301]
location_batch_4 = locations[301:401]
location_batch_5 = locations[401:501]
location_batch_6 = locations[501:601]
location_batch_7 = locations[601:701]
location_batch_8 = locations[701:-1]

In [4]:
#Create empty lists to be used as columns after gathering Latitude and Longitude 
lat = []
long = []

In [6]:
#Define function to get lat/longs from a particular list of city names 
def lat_long(city_list):
    count = 1
    for i in city_list:
        city = str(i)
        geo = Nominatim(user_agent="my-application",timeout=None)
        location = geo.geocode(city)
        latitude = location.latitude
        longitude = location.longitude
        lat.append(latitude)
        long.append(longitude)
        print(f"{count}/{len(location_batch_1)} {i}: {latitude}, {longitude}")
        count +=1

In [7]:
lat_long(location_batch_1)

1/101 New York, NY: 40.7127281, -74.0060152
2/101 Los Angeles, CA: 34.0536909, -118.2427666
3/101 Chicago, IL: 41.8755616, -87.6244212
4/101 Houston, TX: 29.7589382, -95.3676974
5/101 Philadelphia, PA: 39.9527237, -75.1635262
6/101 Phoenix, AZ: 33.4485866, -112.0773456
7/101 San Antonio, TX: 29.4246002, -98.4951405
8/101 San Diego, CA: 32.7174209, -117.1627714
9/101 Dallas, TX: 32.7762719, -96.7968559
10/101 San Jose, CA: 37.3361905, -121.8905833
11/101 Indianapolis, IN: 39.7683331, -86.1583502
12/101 Jacksonville, FL: 30.3321838, -81.655651
13/101 San Francisco, CA: 37.7792808, -122.4192363
14/101 Austin, TX: 30.2711286, -97.7436995
15/101 Columbus, OH: 39.9622601, -83.0007065
16/101 Fort Worth, TX: 32.753177, -97.3327459
17/101 Charlotte, NC: 35.2270869, -80.8431268
18/101 Detroit, MI: 42.3315509, -83.0466403
19/101 El Paso, TX: 31.7600372, -106.487287
20/101 Memphis, TN: 35.1490215, -90.0516285
21/101 Nashville, TN: 36.1622296, -86.7743531
22/101 Baltimore, MD: 39.2908816, -76.61075

In [8]:
lat_long(location_batch_2)

1/101 Tacoma, WA: 47.2495798, -122.4398746
2/101 Oxnard, CA: 34.1976308, -119.1803818
3/101 Aurora, IL: 41.7571701, -88.3147539
4/101 Augusta, GA: 33.4709714, -81.9748429
5/101 Fontana, CA: 34.0922335, -117.435048
6/101 Mobile, AL: 30.6943566, -88.0430541
7/101 Little Rock, AR: 34.7464809, -92.2895948
8/101 Moreno Valley, CA: 33.937517, -117.2305944
9/101 Glendale, CA: 34.192912, -118.246248614754
10/101 Amarillo, TX: 35.2072185, -101.8338246
11/101 Columbus, GA: 32.4609764, -84.9877094
12/101 Grand Rapids, MI: 42.9632405, -85.6678639
13/101 Salt Lake City, UT: 40.7670126, -111.8904308
14/101 Tallahassee, FL: 30.4380832, -84.2809332
15/101 Worcester, MA: 42.2625932, -71.8022934
16/101 Newport News, VA: 36.9786449, -76.4321089
17/101 Huntsville, AL: 34.729847, -86.5859011
18/101 Knoxville, TN: 35.9603948, -83.9210261
19/101 Providence, RI: 41.8677428, -71.5814834
20/101 Santa Clarita, CA: 34.3916641, -118.542586
21/101 Grand Prairie, TX: 32.657368, -97.0284662417504
22/101 Brownsville, 

In [None]:
lat_long(location_batch_3)

In [None]:
lat_long(location_batch_4)

In [None]:
lat_long(location_batch_5)

In [None]:
lat_long(location_batch_6)

In [None]:
lat_long(location_batch_7)

In [None]:
lat_long(location_batch_8)

In [None]:
rent_df['State'].value_counts()

### Exploraton of the Data

In [None]:
rent_df.groupby(['State','Bedroom_Size']).mean().round(2)

In [None]:
rent_df.groupby(['State','Location','Bedroom_Size']).mean().round(2)

#### Seperate By State

#### Seperate By City Within State