# Seattle Rental Properties

Objective: Given the workplace locations of 2-3 people, find rental properties whose commutes to each workplace are roughly equal in time and distance.

To Do: 
- Clean Seattle Rental Properties dataset
- Get longitude and latitude of Microsoft and Amazon offices
- K-means clustering? Compare machine learning models
- Use the Travel Time API to find equidistant apartments.

APIs used: 
- Travel Time

Dataset source:
- https://catalog.data.gov/dataset/rental-property-registration (updated 7/26/2019)

### Data Exploring - JSON
Importing libraries. The City of Seattle provides the dataset as both a JSON file and a CSV file.

In [1]:
#import libraries
import json
import pandas as pd
import geopandas as gpd #used for transforming geolocation data
import matplotlib.pyplot as plt

from datetime import datetime  #to convert data to datetime that does not fall within the pandas.to_datetime function timeframe
from shapely.geometry import Point  #transform latitude/longitude to geo-coordinate data
from geopandas.tools import geocode #get the latitude/longitude for a given address
from geopandas.tools import reverse_geocode  #get the address for a location using latitude/longitude

%matplotlib inline

In [66]:
mapbox = "pk.eyJ1IjoidHBhc2FnIiwiYSI6ImNqeXRhZGIwMTAyMmkzaG4wZGNuaGZ3NTkifQ.WRpwsiq2m41ZgMJzO4NDvQ"

In [2]:
import sys
'geopandas' in sys.modules

True

In [3]:
import sys
'geopy' in sys.modules

False

In [4]:
#load json of seattle rental properties
with open("seattle.json", 'r') as read_file:
    seattleprops = json.load(read_file)

In [99]:
#seattleprops

{'meta': {'view': {'id': 'j2xh-c7vt',
   'name': 'Rental Property Registration',
   'attribution': 'City of Seattle',
   'attributionLink': 'http://www.seattle.gov/sdci',
   'averageRating': 0,
   'category': 'Permitting',
   'createdAt': 1524754568,
   'description': 'A list of properties that have registered their rental units with the City of Seattle under the Rental Registration and Inspection Ordinance.',
   'displayType': 'table',
   'downloadCount': 313,
   'hideFromCatalog': False,
   'hideFromDataJson': False,
   'indexUpdatedAt': 1562867880,
   'licenseId': 'PUBLIC_DOMAIN',
   'locale': 'en_US',
   'newBackend': True,
   'numberOfComments': 0,
   'oid': 31914420,
   'provenance': 'official',
   'publicationAppendEnabled': False,
   'publicationDate': 1562867928,
   'publicationGroup': 15065876,
   'publicationStage': 'published',
   'rowsUpdatedAt': 1564242562,
   'rowsUpdatedBy': '5wys-t5s3',
   'tableId': 16352451,
   'totalTimesRated': 0,
   'viewCount': 2309,
   'viewLast

In [6]:
#datatype of seattle props json file
type(seattleprops)

dict

In [7]:
#seattleprops dictionary keys
seattleprops.keys()

dict_keys(['meta', 'data'])

In [8]:
#the metadata
#seattleprops['meta']

{'view': {'id': 'j2xh-c7vt',
  'name': 'Rental Property Registration',
  'attribution': 'City of Seattle',
  'attributionLink': 'http://www.seattle.gov/sdci',
  'averageRating': 0,
  'category': 'Permitting',
  'createdAt': 1524754568,
  'description': 'A list of properties that have registered their rental units with the City of Seattle under the Rental Registration and Inspection Ordinance.',
  'displayType': 'table',
  'downloadCount': 313,
  'hideFromCatalog': False,
  'hideFromDataJson': False,
  'indexUpdatedAt': 1562867880,
  'licenseId': 'PUBLIC_DOMAIN',
  'locale': 'en_US',
  'newBackend': True,
  'numberOfComments': 0,
  'oid': 31914420,
  'provenance': 'official',
  'publicationAppendEnabled': False,
  'publicationDate': 1562867928,
  'publicationGroup': 15065876,
  'publicationStage': 'published',
  'rowsUpdatedAt': 1564242562,
  'rowsUpdatedBy': '5wys-t5s3',
  'tableId': 16352451,
  'totalTimesRated': 0,
  'viewCount': 2309,
  'viewLastModified': 1562867928,
  'viewType': 

In [9]:
#a look into the data key
#seattleprops['data']

[['row-d3wj~gyzj~in8j',
  '00000000-0000-0000-42B9-E7CFDE00AA06',
  0,
  1561394371,
  None,
  1561394394,
  None,
  '{ }',
  '001-0127963',
  'Rental Property',
  'Registration',
  '1',
  None,
  '2016-11-04',
  '2021-11-04',
  'Active Registration',
  '8840 37TH AVE SW',
  'SEATTLE',
  'WA',
  '98126',
  'Pug Properties 37 LLC',
  ['https://cosaccela.seattle.gov/portal/customize/LinkToRecord.aspx?altId=001-0127963',
   None],
  '47.52346484',
  '-122.37911657',
  'POINT (-122.37911657 47.52346484)'],
 ['row-z979~4by4-4mea',
  '00000000-0000-0000-889A-F32025A6AA6E',
  0,
  1561394371,
  None,
  1561394394,
  None,
  '{ }',
  '001-0130697',
  'Rental Property',
  'Registration',
  '1',
  None,
  '2017-01-31',
  '2022-01-31',
  'Active Registration',
  '8621 Renton AVE S',
  'SEATTLE',
  'WA',
  '98118',
  'Charles Lane',
  ['https://cosaccela.seattle.gov/portal/customize/LinkToRecord.aspx?altId=001-0130697',
   None],
  '47.52546474',
  '-122.27709864',
  'POINT (-122.27709864 47.52546

In [10]:
#datatype of first level
type(seattleprops['data'][0])

list

In [11]:
type(seattleprops['data'][0][0])

str

In [12]:
seattleprops['data'][0][0]

'row-d3wj~gyzj~in8j'

In [13]:
#the total number of rental properties in dataset
#there are 32,642 rental properties 
len(seattleprops['data'])

32642

### Data Exploring - CSV
Using the CSV file provided by the City of Seattle.

In [14]:
location = "seattle_Rental_Property_Registration.csv"

srentprops_df = pd.read_csv(location)
srentprops_df.head()

Unnamed: 0,RegistrationNum,RegisteredTypeMapped,RegisteredTypeDesc,RentalHousingUnits,PropertyName,RegisteredDate,ExpiresDate,StatusCurrent,OriginalAddress1,OriginalCity,OriginalState,OriginalZip,PropertyContactName,Link,Latitude,Longitude,Location1
0,001-0130088,Rental Property,Registration,1,,2017-01-06,2022-01-06,Active Registration,146 NW 74TH ST,SEATTLE,WA,98117.0,,https://cosaccela.seattle.gov/portal/customize...,47.682748,-122.359981,POINT (-122.35998077 47.68274781)
1,001-0129613,Rental Property,Registration,1,,2016-12-29,2021-12-29,Active Registration,2100 3RD AVE,SEATTLE,WA,98121.0,,https://cosaccela.seattle.gov/portal/customize...,47.613531,-122.342333,POINT (-122.3423326 47.61353093)
2,001-0129034,Rental Property,Registration,1,,2016-12-13,2021-12-13,Active Registration,5720 37TH AVE S,SEATTLE,WA,98118.0,,https://cosaccela.seattle.gov/portal/customize...,47.550597,-122.28602,POINT (-122.28602015 47.55059675)
3,001-0122782,Rental Property,Registration,1,,2016-06-26,2021-06-26,Active Registration,147 N 76TH ST,SEATTLE,WA,98103.0,,https://cosaccela.seattle.gov/portal/customize...,47.683823,-122.355983,POINT (-122.35598277 47.6838228)
4,001-0136116,Rental Property,Registration,1,,2019-05-08,2021-05-08,Active Registration,2311 43RD AVE E,SEATTLE,WA,98112.0,Marc Boyd,https://cosaccela.seattle.gov/portal/customize...,47.639513,-122.277395,POINT (-122.27739453 47.63951303)


In [15]:
#see column header names
srentprops_df.columns

Index(['RegistrationNum', 'RegisteredTypeMapped', 'RegisteredTypeDesc',
       'RentalHousingUnits', 'PropertyName', 'RegisteredDate', 'ExpiresDate',
       'StatusCurrent', 'OriginalAddress1', 'OriginalCity', 'OriginalState',
       'OriginalZip', 'PropertyContactName', 'Link', 'Latitude', 'Longitude',
       'Location1'],
      dtype='object')

In [16]:
#missing values
srentprops_df.isnull().sum()

RegistrationNum             0
RegisteredTypeMapped        0
RegisteredTypeDesc          0
RentalHousingUnits          0
PropertyName            24002
RegisteredDate              0
ExpiresDate                 0
StatusCurrent               0
OriginalAddress1            0
OriginalCity              532
OriginalState             532
OriginalZip               682
PropertyContactName      3943
Link                        0
Latitude                  350
Longitude                 350
Location1                 350
dtype: int64

In [17]:
#rows where OriginalZip is null
missing_zipcode = srentprops_df[srentprops_df['OriginalZip'].isnull()]
missing_zipcode.head(10)

Unnamed: 0,RegistrationNum,RegisteredTypeMapped,RegisteredTypeDesc,RentalHousingUnits,PropertyName,RegisteredDate,ExpiresDate,StatusCurrent,OriginalAddress1,OriginalCity,OriginalState,OriginalZip,PropertyContactName,Link,Latitude,Longitude,Location1
6,001-0135141,Rental Property,Registration,1,,2019-04-18,2021-04-18,Active Registration,1214 E Hamlin ST,,,,Andrea Jacobi,https://cosaccela.seattle.gov/portal/customize...,47.645789,-122.315878,POINT (-122.315878 47.64578913)
7,001-0135816,Rental Property,Registration,1,,2019-03-16,2021-03-16,Active Registration,516 N 62nd ST,,,,Laurie Milligan,https://cosaccela.seattle.gov/portal/customize...,47.674134,-122.352152,POINT (-122.35215202 47.67413375)
10,001-0134846,Rental Property,Registration,2,4220 Bagley Ave N,2018-10-14,2023-10-14,Active Registration,4220 Bagley AVE N,,,,,https://cosaccela.seattle.gov/portal/customize...,47.65861,-122.332352,POINT (-122.33235174 47.65860961)
11,001-0134074,Rental Property,Registration,1,,2018-06-11,2023-06-11,Active Registration,2911 2nd AVE,,,,Maple Leaf Property Management,https://cosaccela.seattle.gov/portal/customize...,47.617548,-122.35226,POINT (-122.35226048 47.61754767)
12,001-0134099,Rental Property,Registration,2,Concord Condominium,2018-06-13,2023-06-13,Active Registration,2929 1st AVE,,,,,https://cosaccela.seattle.gov/portal/customize...,47.617331,-122.353778,POINT (-122.35377814 47.61733149)
16,001-0134961,Rental Property,Registration,1,San Villa Condominiums,2018-10-31,2023-10-31,Active Registration,9520 1st AVE NE,,,,Jon RINKER,https://cosaccela.seattle.gov/portal/customize...,47.698343,-122.32789,POINT (-122.32788992 47.69834301)
19,001-0135251,Rental Property,Registration,2,,2018-12-31,2023-12-31,Active Registration,10610 Bagley AVE N,,,,,https://cosaccela.seattle.gov/portal/customize...,47.706301,-122.33232,POINT (-122.33232029 47.70630089)
20,001-0134421,Rental Property,Registration,1,Portal,2018-08-03,2023-08-03,Active Registration,655 Crockett ST,,,,Amanjot Singh,https://cosaccela.seattle.gov/portal/customize...,47.637099,-122.343811,POINT (-122.34381127 47.63709868)
22,001-0134925,Rental Property,Registration,1,Harbour Heights,2018-10-23,2023-10-23,Active Registration,2621 2nd AVE,,,,,https://cosaccela.seattle.gov/portal/customize...,47.616126,-122.349747,POINT (-122.34974746 47.61612629)
23,001-0126582,Rental Property,Registration,1,,2016-12-08,2021-12-08,Active Registration,5102 27th AVE,SEATTLE,WA,,Kok Lan Cheung,https://cosaccela.seattle.gov/portal/customize...,,,


In [18]:
#rows where original city, state, and zip are all null
missing_csz = srentprops_df[(srentprops_df['OriginalCity'].isnull())&(srentprops_df['OriginalState'].isnull())&(srentprops_df['OriginalZip'].isnull())]
missing_csz.head(10)

Unnamed: 0,RegistrationNum,RegisteredTypeMapped,RegisteredTypeDesc,RentalHousingUnits,PropertyName,RegisteredDate,ExpiresDate,StatusCurrent,OriginalAddress1,OriginalCity,OriginalState,OriginalZip,PropertyContactName,Link,Latitude,Longitude,Location1
6,001-0135141,Rental Property,Registration,1,,2019-04-18,2021-04-18,Active Registration,1214 E Hamlin ST,,,,Andrea Jacobi,https://cosaccela.seattle.gov/portal/customize...,47.645789,-122.315878,POINT (-122.315878 47.64578913)
7,001-0135816,Rental Property,Registration,1,,2019-03-16,2021-03-16,Active Registration,516 N 62nd ST,,,,Laurie Milligan,https://cosaccela.seattle.gov/portal/customize...,47.674134,-122.352152,POINT (-122.35215202 47.67413375)
10,001-0134846,Rental Property,Registration,2,4220 Bagley Ave N,2018-10-14,2023-10-14,Active Registration,4220 Bagley AVE N,,,,,https://cosaccela.seattle.gov/portal/customize...,47.65861,-122.332352,POINT (-122.33235174 47.65860961)
11,001-0134074,Rental Property,Registration,1,,2018-06-11,2023-06-11,Active Registration,2911 2nd AVE,,,,Maple Leaf Property Management,https://cosaccela.seattle.gov/portal/customize...,47.617548,-122.35226,POINT (-122.35226048 47.61754767)
12,001-0134099,Rental Property,Registration,2,Concord Condominium,2018-06-13,2023-06-13,Active Registration,2929 1st AVE,,,,,https://cosaccela.seattle.gov/portal/customize...,47.617331,-122.353778,POINT (-122.35377814 47.61733149)
16,001-0134961,Rental Property,Registration,1,San Villa Condominiums,2018-10-31,2023-10-31,Active Registration,9520 1st AVE NE,,,,Jon RINKER,https://cosaccela.seattle.gov/portal/customize...,47.698343,-122.32789,POINT (-122.32788992 47.69834301)
19,001-0135251,Rental Property,Registration,2,,2018-12-31,2023-12-31,Active Registration,10610 Bagley AVE N,,,,,https://cosaccela.seattle.gov/portal/customize...,47.706301,-122.33232,POINT (-122.33232029 47.70630089)
20,001-0134421,Rental Property,Registration,1,Portal,2018-08-03,2023-08-03,Active Registration,655 Crockett ST,,,,Amanjot Singh,https://cosaccela.seattle.gov/portal/customize...,47.637099,-122.343811,POINT (-122.34381127 47.63709868)
22,001-0134925,Rental Property,Registration,1,Harbour Heights,2018-10-23,2023-10-23,Active Registration,2621 2nd AVE,,,,,https://cosaccela.seattle.gov/portal/customize...,47.616126,-122.349747,POINT (-122.34974746 47.61612629)
24,001-0135417,Rental Property,Registration,1,,2019-01-23,2021-01-23,Active Registration,4643 Eastern AVE N,,,,,https://cosaccela.seattle.gov/portal/customize...,47.663877,-122.329386,POINT (-122.32938582 47.6638771)


In [19]:
#number of rows where city, state, and zip are all missing
len(missing_csz)

532

In [20]:
#other missing values in the missing_csz rows
#248 of the 532 rows (about half) are also missing latitude and longitude
#for the rows that do have latitude and longitude, we can reverse geocode to find the missing city/state/zip code values
missing_csz.isnull().sum()

RegistrationNum           0
RegisteredTypeMapped      0
RegisteredTypeDesc        0
RentalHousingUnits        0
PropertyName            295
RegisteredDate            0
ExpiresDate               0
StatusCurrent             0
OriginalAddress1          0
OriginalCity            532
OriginalState           532
OriginalZip             532
PropertyContactName      83
Link                      0
Latitude                248
Longitude               248
Location1               248
dtype: int64

In [21]:
#rows that are missing city, state, and zip but have longitude and latitude values
missing_csz_ll = srentprops_df[(srentprops_df['OriginalCity'].isnull())&(srentprops_df['OriginalState'].isnull())
                            &(srentprops_df['OriginalZip'].isnull())&(srentprops_df['Latitude'].notnull())
                            &(srentprops_df['Longitude'].notnull())]
missing_csz_ll.head()

Unnamed: 0,RegistrationNum,RegisteredTypeMapped,RegisteredTypeDesc,RentalHousingUnits,PropertyName,RegisteredDate,ExpiresDate,StatusCurrent,OriginalAddress1,OriginalCity,OriginalState,OriginalZip,PropertyContactName,Link,Latitude,Longitude,Location1
6,001-0135141,Rental Property,Registration,1,,2019-04-18,2021-04-18,Active Registration,1214 E Hamlin ST,,,,Andrea Jacobi,https://cosaccela.seattle.gov/portal/customize...,47.645789,-122.315878,POINT (-122.315878 47.64578913)
7,001-0135816,Rental Property,Registration,1,,2019-03-16,2021-03-16,Active Registration,516 N 62nd ST,,,,Laurie Milligan,https://cosaccela.seattle.gov/portal/customize...,47.674134,-122.352152,POINT (-122.35215202 47.67413375)
10,001-0134846,Rental Property,Registration,2,4220 Bagley Ave N,2018-10-14,2023-10-14,Active Registration,4220 Bagley AVE N,,,,,https://cosaccela.seattle.gov/portal/customize...,47.65861,-122.332352,POINT (-122.33235174 47.65860961)
11,001-0134074,Rental Property,Registration,1,,2018-06-11,2023-06-11,Active Registration,2911 2nd AVE,,,,Maple Leaf Property Management,https://cosaccela.seattle.gov/portal/customize...,47.617548,-122.35226,POINT (-122.35226048 47.61754767)
12,001-0134099,Rental Property,Registration,2,Concord Condominium,2018-06-13,2023-06-13,Active Registration,2929 1st AVE,,,,,https://cosaccela.seattle.gov/portal/customize...,47.617331,-122.353778,POINT (-122.35377814 47.61733149)


In [22]:
#number of rows missing city, state, and zip that still have longitude and latitude
len(missing_csz_ll)

284

In [23]:
#rows where there are missing values for zip code but there are values for city and state
missingzip_notcs = srentprops_df[(srentprops_df['OriginalCity'].notnull())&(srentprops_df['OriginalState'].notnull())&(srentprops_df['OriginalZip'].isnull())]
missingzip_notcs

Unnamed: 0,RegistrationNum,RegisteredTypeMapped,RegisteredTypeDesc,RentalHousingUnits,PropertyName,RegisteredDate,ExpiresDate,StatusCurrent,OriginalAddress1,OriginalCity,OriginalState,OriginalZip,PropertyContactName,Link,Latitude,Longitude,Location1
23,001-0126582,Rental Property,Registration,1,,2016-12-08,2021-12-08,Active Registration,5102 27th AVE,SEATTLE,WA,,Kok Lan Cheung,https://cosaccela.seattle.gov/portal/customize...,,,
445,001-0134169,Rental Property,Registration,1,,2018-09-21,2023-09-21,Active Registration,588 Bell ST,SEATTLE,WA,,Joe Horgan,https://cosaccela.seattle.gov/portal/customize...,47.616730,-122.343382,POINT (-122.34338244 47.61672999)
627,001-0119417,Rental Property,Registration,1,,2016-04-06,2021-04-06,Active Registration,1103 M L KING JR WAY,SEATTLE,WA,,Michelle Huang,https://cosaccela.seattle.gov/portal/customize...,,,
795,001-0133331,Rental Property,Registration,1,,2018-02-16,2023-02-16,Active Registration,9239 A 35th,SEATTLE,WA,,Susan Giroux,https://cosaccela.seattle.gov/portal/customize...,,,
840,001-0134339,Rental Property,Registration,1,,2018-08-24,2023-08-24,Active Registration,701 1st AVE N,SEATTLE,WA,,Fletcher-McGookin Community Property Trust,https://cosaccela.seattle.gov/portal/customize...,47.625634,-122.355979,POINT (-122.35597876 47.62563431)
954,001-0135187,Rental Property,Registration,1,,2019-01-16,2021-01-16,Active Registration,7333 22ND AVE NW,SEATTLE,WA,,john yeung,https://cosaccela.seattle.gov/portal/customize...,47.682296,-122.384583,POINT (-122.38458282 47.68229568)
1121,001-0134457,Rental Property,Registration,1,,2018-08-23,2023-08-23,Active Registration,1936 NE 127th ST,SEATTLE,WA,,James McGuire,https://cosaccela.seattle.gov/portal/customize...,47.721431,-122.307583,POINT (-122.30758286 47.72143076)
1235,001-0134286,Rental Property,Registration,1,,2018-08-02,2023-08-02,Active Registration,1701 N 36th ST,SEATTLE,WA,,SHIRLEY BARGER,https://cosaccela.seattle.gov/portal/customize...,47.650341,-122.337212,POINT (-122.33721191 47.65034085)
1242,001-0132564,Rental Property,Registration,1,,2018-05-24,2023-05-24,Active Registration,2201 3RD AVENUE,SEATTLE,WA,,SJA Property Management,https://cosaccela.seattle.gov/portal/customize...,,,
1580,001-0134073,Rental Property,Registration,1,,2018-08-02,2023-08-02,Active Registration,438 malden AVE E,SEATTLE,WA,,Wanwirote Varophas,https://cosaccela.seattle.gov/portal/customize...,47.622841,-122.313474,POINT (-122.31347424 47.62284107)


In [24]:
#there are 150 rows that have a missing zip code but have values for city and state
len(missingzip_notcs)

150

In [25]:
#the other missing values under missingzip_notcs
missingzip_notcs.isnull().sum()

RegistrationNum           0
RegisteredTypeMapped      0
RegisteredTypeDesc        0
RentalHousingUnits        0
PropertyName            143
RegisteredDate            0
ExpiresDate               0
StatusCurrent             0
OriginalAddress1          0
OriginalCity              0
OriginalState             0
OriginalZip             150
PropertyContactName       0
Link                      0
Latitude                 55
Longitude                55
Location1                55
dtype: int64

In [26]:
#count of missing values per column across the whole dataframe
srentprops_df.isnull().sum()

RegistrationNum             0
RegisteredTypeMapped        0
RegisteredTypeDesc          0
RentalHousingUnits          0
PropertyName            24002
RegisteredDate              0
ExpiresDate                 0
StatusCurrent               0
OriginalAddress1            0
OriginalCity              532
OriginalState             532
OriginalZip               682
PropertyContactName      3943
Link                        0
Latitude                  350
Longitude                 350
Location1                 350
dtype: int64

Missing Data Recap:
- 682 rows are missing a zip code.
- Of those 682 rows, 532 are also missing city and state values (missing_csz).
- Of those 532 rows, 284 have longitude and latitude values (missing_csz_ll).
- The remaining 150 of the 682 rows have city and state values (missingzip_notcs).

Above, we see that there are no missing values for original address. Could we use this to find the missing values for city, zip code, and latitude/longitude?

### Data Cleaning 

In [27]:
#Because there is one address in "Mill Creek" (a suburb outside Seattle), we cannot assume all locations are in Seattle proper
srentprops_df.OriginalCity.unique()

array(['SEATTLE', nan, 'Seattle', 'seattle', 'mill creek'], dtype=object)

In [28]:
#value counts per unique value
srentprops_df.OriginalCity.value_counts()

SEATTLE       32030
Seattle          69
seattle          10
mill creek        1
Name: OriginalCity, dtype: int64

In [29]:
#Changing format of "mill creek"
srentprops_df['OriginalCity'] = srentprops_df['OriginalCity'].replace('mill creek', 'Mill Creek')

In [30]:
#We will assume properties are in Washington State
srentprops_df.OriginalState.unique()

array(['WA', nan], dtype=object)

In [31]:
#replace missing original state values with "WA"
srentprops_df['OriginalState'] = srentprops_df['OriginalState'].fillna('WA')

In [32]:
#checking if all original state missing values were filled
srentprops_df.isnull().sum()

RegistrationNum             0
RegisteredTypeMapped        0
RegisteredTypeDesc          0
RentalHousingUnits          0
PropertyName            24002
RegisteredDate              0
ExpiresDate                 0
StatusCurrent               0
OriginalAddress1            0
OriginalCity              532
OriginalState               0
OriginalZip               682
PropertyContactName      3943
Link                        0
Latitude                  350
Longitude                 350
Location1                 350
dtype: int64

In [33]:
#keeping "Seattle" consistent
srentprops_df['OriginalCity'] = srentprops_df['OriginalCity'].replace('SEATTLE', 'Seattle')

In [34]:
#keeping "Seattle" consistent
srentprops_df['OriginalCity'] = srentprops_df['OriginalCity'].replace('seattle', 'Seattle')

In [35]:
#checking unique city values
srentprops_df.OriginalCity.unique()

array(['Seattle', nan, 'Mill Creek'], dtype=object)
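The three `replace` calls above could probably be collapsed into a single normalization step. The sketch below is only an illustration under the assumption that title case is the right form for every city name in the column; the `cities` series is a hypothetical stand-in that mirrors the unique values seen earlier, not the real dataframe.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in mirroring the unique OriginalCity values in the dataset
cities = pd.Series(['SEATTLE', np.nan, 'Seattle', 'seattle', 'mill creek'])

# Strip stray whitespace, then title-case; missing values stay missing
normalized = cities.str.strip().str.title()

print(normalized.unique())
```

On the real data this would be `srentprops_df['OriginalCity'].str.strip().str.title()`, assigned back to the column.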

In [36]:
#change "location1" header name to be more descriptive
srentprops_df.rename(columns={"Location1": "Cooridinates"}, inplace=True)

In [37]:
#check the type of the Cooridinates column (a pandas Series of "POINT (...)" strings, not geometry objects yet)
type(srentprops_df.Cooridinates)

pandas.core.series.Series

In [38]:
#reference the geocode module to fix the geo dataframe
#how do we want to use machine learning for this dataset?
#k-means clustering of geolocation data

Finding missing zip codes:
- combine the address columns into one string column
- find the geocode for that column
- use the geocoding result to fill in the missing zip codes
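The first step above (one address string per row) might look like the sketch below. `is_missing` and `build_full_address` are hypothetical helpers, not part of the notebook; they skip missing parts so that a geocoder is not sent the literal text "nan", and they drop the ".0" that pandas adds when it reads zip codes as floats.

```python
import math

def is_missing(value):
    """True for None, NaN, or an empty/whitespace-only string."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()

def build_full_address(street, city, state, zip_code):
    """Join the non-missing address parts into one geocodable string."""
    parts = [str(part).strip() for part in (street, city, state)
             if not is_missing(part)]
    if not is_missing(zip_code):
        # zip codes read as floats (98117.0) are converted back to "98117"
        if isinstance(zip_code, float):
            zip_code = int(zip_code)
        parts.append(str(zip_code).strip())
    return ", ".join(parts)

print(build_full_address("5102 27th AVE", "Seattle", "WA", float("nan")))
print(build_full_address("146 NW 74TH ST", "Seattle", "WA", 98117.0))
```

Applied row by row, this would be something like `srentprops_df.apply(lambda r: build_full_address(r['OriginalAddress1'], r['OriginalCity'], r['OriginalState'], r['OriginalZip']), axis=1)`.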

In [46]:
missingzip_notcs_df['FullAddress'] = list(missingzip_notcs_df[['OriginalAddress1', 
                                                   'OriginalCity', 'OriginalState',
                                                   'OriginalZip']].values)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until


In [44]:
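#take an explicit copy so that adding columns does not raise the SettingWithCopyWarning shown above
missingzip_notcs_df = missingzip_notcs.copy()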

In [39]:
srentprops_df.columns

Index(['RegistrationNum', 'RegisteredTypeMapped', 'RegisteredTypeDesc',
       'RentalHousingUnits', 'PropertyName', 'RegisteredDate', 'ExpiresDate',
       'StatusCurrent', 'OriginalAddress1', 'OriginalCity', 'OriginalState',
       'OriginalZip', 'PropertyContactName', 'Link', 'Latitude', 'Longitude',
       'Cooridinates'],
      dtype='object')

In [52]:
#make a new column to hold the address parts (street, city, state, zip) as a list
srentprops_df['FullAddress'] = list(srentprops_df[['OriginalAddress1', 
                                                   'OriginalCity', 'OriginalState',
                                                   'OriginalZip']].values)

In [53]:
srentprops_df.head()

Unnamed: 0,RegistrationNum,RegisteredTypeMapped,RegisteredTypeDesc,RentalHousingUnits,PropertyName,RegisteredDate,ExpiresDate,StatusCurrent,OriginalAddress1,OriginalCity,OriginalState,OriginalZip,PropertyContactName,Link,Latitude,Longitude,Cooridinates,FullAddress
0,001-0130088,Rental Property,Registration,1,,2017-01-06,2022-01-06,Active Registration,146 NW 74TH ST,Seattle,WA,98117.0,,https://cosaccela.seattle.gov/portal/customize...,47.682748,-122.359981,POINT (-122.35998077 47.68274781),"[146 NW 74TH ST, Seattle, WA, 98117.0]"
1,001-0129613,Rental Property,Registration,1,,2016-12-29,2021-12-29,Active Registration,2100 3RD AVE,Seattle,WA,98121.0,,https://cosaccela.seattle.gov/portal/customize...,47.613531,-122.342333,POINT (-122.3423326 47.61353093),"[2100 3RD AVE, Seattle, WA, 98121.0]"
2,001-0129034,Rental Property,Registration,1,,2016-12-13,2021-12-13,Active Registration,5720 37TH AVE S,Seattle,WA,98118.0,,https://cosaccela.seattle.gov/portal/customize...,47.550597,-122.28602,POINT (-122.28602015 47.55059675),"[5720 37TH AVE S, Seattle, WA, 98118.0]"
3,001-0122782,Rental Property,Registration,1,,2016-06-26,2021-06-26,Active Registration,147 N 76TH ST,Seattle,WA,98103.0,,https://cosaccela.seattle.gov/portal/customize...,47.683823,-122.355983,POINT (-122.35598277 47.6838228),"[147 N 76TH ST, Seattle, WA, 98103.0]"
4,001-0136116,Rental Property,Registration,1,,2019-05-08,2021-05-08,Active Registration,2311 43RD AVE E,Seattle,WA,98112.0,Marc Boyd,https://cosaccela.seattle.gov/portal/customize...,47.639513,-122.277395,POINT (-122.27739453 47.63951303),"[2311 43RD AVE E, Seattle, WA, 98112.0]"


In [51]:
#FullAddress for the rows that have a city and state but no zip code
srentprops_df.loc[(srentprops_df['OriginalCity'].notnull())&(srentprops_df['OriginalState'].notnull())&(srentprops_df['OriginalZip'].isnull()), 'FullAddress']

In [59]:
#row 1 = a row where coordinates and zip were missing
srentprops_df.loc[srentprops_df['RegistrationNum']== "001-0126582"] 

Unnamed: 0,RegistrationNum,RegisteredTypeMapped,RegisteredTypeDesc,RentalHousingUnits,PropertyName,RegisteredDate,ExpiresDate,StatusCurrent,OriginalAddress1,OriginalCity,OriginalState,OriginalZip,PropertyContactName,Link,Latitude,Longitude,Cooridinates,FullAddress
23,001-0126582,Rental Property,Registration,1,,2016-12-08,2021-12-08,Active Registration,5102 27th AVE,Seattle,WA,,Kok Lan Cheung,https://cosaccela.seattle.gov/portal/customize...,,,,"[5102 27th AVE, Seattle, WA, nan]"


In [60]:
#row 2 = a row where city and zip were missing
srentprops_df.loc[srentprops_df['RegistrationNum']== "001-0135141"] 

Unnamed: 0,RegistrationNum,RegisteredTypeMapped,RegisteredTypeDesc,RentalHousingUnits,PropertyName,RegisteredDate,ExpiresDate,StatusCurrent,OriginalAddress1,OriginalCity,OriginalState,OriginalZip,PropertyContactName,Link,Latitude,Longitude,Cooridinates,FullAddress
6,001-0135141,Rental Property,Registration,1,,2019-04-18,2021-04-18,Active Registration,1214 E Hamlin ST,,WA,,Andrea Jacobi,https://cosaccela.seattle.gov/portal/customize...,47.645789,-122.315878,POINT (-122.315878 47.64578913),"[1214 E Hamlin ST, nan, WA, nan]"


In [74]:
#link for python mapbox doc
#https://pypi.org/project/mapbox/0.3.1/ 

In [73]:
import mapbox

geocoder = mapbox.Geocoder(access_token='pk.eyJ1IjoidHBhc2FnIiwiYSI6ImNqeXRhZGIwMTAyMmkzaG4wZGNuaGZ3NTkifQ.WRpwsiq2m41ZgMJzO4NDvQ')

In [88]:
#forward geocode row 1 (coordinates and zip were missing)
response1 = geocoder.forward('5102 27th AVE, Seattle, WA')

#response.json() returns the geocoding result as GeoJSON.

response1.json() 

{'type': 'FeatureCollection',
 'query': ['5102', '27th', 'ave', 'seattle', 'wa'],
 'features': [{'id': 'address.8289086174570868',
   'type': 'Feature',
   'place_type': ['address'],
   'relevance': 0.9333333333333333,
   'properties': {'accuracy': 'rooftop'},
   'text': '27th Avenue Northeast',
   'place_name': '5102 27th Avenue Northeast, Seattle, Washington 98105, United States',
   'center': [-122.298221, 47.666537],
   'geometry': {'type': 'Point', 'coordinates': [-122.298221, 47.666537]},
   'address': '5102',
   'context': [{'id': 'neighborhood.2106917', 'text': 'Ravenna'},
    {'id': 'postcode.18348032272132340', 'text': '98105'},
    {'id': 'place.11115494111229470', 'wikidata': 'Q5083', 'text': 'Seattle'},
    {'id': 'region.14042645959246050',
     'short_code': 'US-WA',
     'wikidata': 'Q1223',
     'text': 'Washington'},
    {'id': 'country.9053006287256050',
     'short_code': 'us',
     'wikidata': 'Q30',
     'text': 'United States'}]},
  {'id': 'address.79666793217140

In [89]:
type(response1.json())

dict

In [92]:
resp1 =response1.json()

In [93]:
resp1.keys()

dict_keys(['type', 'query', 'features', 'attribution'])

In [94]:
resp1['features']

[{'id': 'address.8289086174570868',
  'type': 'Feature',
  'place_type': ['address'],
  'relevance': 0.9333333333333333,
  'properties': {'accuracy': 'rooftop'},
  'text': '27th Avenue Northeast',
  'place_name': '5102 27th Avenue Northeast, Seattle, Washington 98105, United States',
  'center': [-122.298221, 47.666537],
  'geometry': {'type': 'Point', 'coordinates': [-122.298221, 47.666537]},
  'address': '5102',
  'context': [{'id': 'neighborhood.2106917', 'text': 'Ravenna'},
   {'id': 'postcode.18348032272132340', 'text': '98105'},
   {'id': 'place.11115494111229470', 'wikidata': 'Q5083', 'text': 'Seattle'},
   {'id': 'region.14042645959246050',
    'short_code': 'US-WA',
    'wikidata': 'Q1223',
    'text': 'Washington'},
   {'id': 'country.9053006287256050',
    'short_code': 'us',
    'wikidata': 'Q30',
    'text': 'United States'}]},
 {'id': 'address.7966679321714028',
  'type': 'Feature',
  'place_type': ['address'],
  'relevance': 0.9333333333333333,
  'properties': {'accuracy

In [95]:
type(resp1['features'])

list

In [96]:
resp1['features'][0]

{'id': 'address.8289086174570868',
 'type': 'Feature',
 'place_type': ['address'],
 'relevance': 0.9333333333333333,
 'properties': {'accuracy': 'rooftop'},
 'text': '27th Avenue Northeast',
 'place_name': '5102 27th Avenue Northeast, Seattle, Washington 98105, United States',
 'center': [-122.298221, 47.666537],
 'geometry': {'type': 'Point', 'coordinates': [-122.298221, 47.666537]},
 'address': '5102',
 'context': [{'id': 'neighborhood.2106917', 'text': 'Ravenna'},
  {'id': 'postcode.18348032272132340', 'text': '98105'},
  {'id': 'place.11115494111229470', 'wikidata': 'Q5083', 'text': 'Seattle'},
  {'id': 'region.14042645959246050',
   'short_code': 'US-WA',
   'wikidata': 'Q1223',
   'text': 'Washington'},
  {'id': 'country.9053006287256050',
   'short_code': 'us',
   'wikidata': 'Q30',
   'text': 'United States'}]}
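Given a feature shaped like the one above, the zip code and city sit in its `context` list, keyed by the prefix of each `id`. The sketch below shows one way to pull them out; `context_value` is a hypothetical helper, and the `feature` dict is a trimmed copy of the output above rather than a live API response.

```python
def context_value(feature, prefix):
    """Return the 'text' of the first context entry whose id starts with '<prefix>.'."""
    for entry in feature.get("context", []):
        if entry.get("id", "").startswith(prefix + "."):
            return entry.get("text")
    return None

# Trimmed copy of the first feature shown above
feature = {
    "place_name": "5102 27th Avenue Northeast, Seattle, Washington 98105, United States",
    "center": [-122.298221, 47.666537],
    "context": [
        {"id": "neighborhood.2106917", "text": "Ravenna"},
        {"id": "postcode.18348032272132340", "text": "98105"},
        {"id": "place.11115494111229470", "wikidata": "Q5083", "text": "Seattle"},
    ],
}

zip_code = context_value(feature, "postcode")
city = context_value(feature, "place")
longitude, latitude = feature["center"]  # GeoJSON order is [longitude, latitude]

print(zip_code, city, latitude, longitude)
```

One caution: the original address has no directional (NE, S, ...), and the response lists a second feature with the same relevance score, so taking `features[0]` is a guess that should be checked before the value is written back to the dataframe.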

In [86]:
# see row2 (city and zip were missing)
response2 = geocoder.reverse(lon=-122.315878, lat=47.64578913)
response2.json() 

{'type': 'FeatureCollection',
 'query': [-122.31588, 47.64579],
 'features': [{'id': 'address.6121869522502732',
   'type': 'Feature',
   'place_type': ['address'],
   'relevance': 1,
   'properties': {'accuracy': 'rooftop'},
   'text': 'East Hamlin Street',
   'place_name': '1214 East Hamlin Street, Seattle, Washington 98102, United States',
   'center': [-122.3158758, 47.6457825],
   'geometry': {'type': 'Point', 'coordinates': [-122.3158758, 47.6457825]},
   'address': '1214',
   'context': [{'id': 'neighborhood.2106228', 'text': 'Portage Bay'},
    {'id': 'postcode.7411368759978330', 'text': '98102'},
    {'id': 'place.11115494111229470', 'wikidata': 'Q5083', 'text': 'Seattle'},
    {'id': 'region.14042645959246050',
     'short_code': 'US-WA',
     'wikidata': 'Q1223',
     'text': 'Washington'},
    {'id': 'country.9053006287256050',
     'short_code': 'us',
     'wikidata': 'Q30',
     'text': 'United States'}]},
  {'id': 'neighborhood.2106228',
   'type': 'Feature',
   'place_t

In [None]:
geo_addr.head()

In [42]:
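#the Cooridinates column holds WKT strings such as "POINT (-122.36 47.68)", so parse them into geometry objects first
from shapely import wkt
srentprops_df['Cooridinates'] = srentprops_df['Cooridinates'].apply(lambda s: wkt.loads(s) if isinstance(s, str) else None)
#create a geolocation dataframe type using the coordinates column as the geolocation data
geo_srprop = gpd.GeoDataFrame(srentprops_df, geometry='Cooridinates')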

In [None]:
geo_srprop.head()

In [None]:
#verify coordinates column is geolocation data type
type(geo_srprop['Cooridinates'])

In [None]:
srentprops_df.columns

In [None]:
#geocode the Microsoft office address to get its latitude/longitude as a point geometry
micro_geo = geocode("4200 150th Ave NE, Redmond, WA 98052", provider='nominatim')
micro_geo

In [None]:
amzn_geo = geocode("410 Terry Ave N, Seattle, WA 98109", provider='nominatim')
amzn_geo

In [None]:
#convert json to a dataframe to do analysis
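One possible shape for this step is sketched below. It assumes the export stores its column definitions under `meta -> view -> columns`, each with a `name` field; that key is not visible in the truncated metadata printout earlier, so treat it as an assumption to verify against the real file. The `sample` dict is a tiny hypothetical stand-in for `seattleprops`, with values borrowed from the first two rows shown earlier.

```python
import pandas as pd

# Tiny stand-in for seattleprops; the 'columns' layout is an assumption
sample = {
    "meta": {"view": {"columns": [
        {"name": "sid"}, {"name": "RegistrationNum"},
        {"name": "OriginalAddress1"}, {"name": "Latitude"}, {"name": "Longitude"},
    ]}},
    "data": [
        ["row-d3wj~gyzj~in8j", "001-0127963", "8840 37TH AVE SW", "47.52346484", "-122.37911657"],
        ["row-z979~4by4-4mea", "001-0130697", "8621 Renton AVE S", "47.52546474", "-122.27709864"],
    ],
}

# One column name per position in each data row
column_names = [col["name"] for col in sample["meta"]["view"]["columns"]]
json_df = pd.DataFrame(sample["data"], columns=column_names)

# Coordinates arrive as strings in the JSON, so convert them to floats
json_df[["Latitude", "Longitude"]] = json_df[["Latitude", "Longitude"]].astype(float)

print(json_df.dtypes)
```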

In [None]:
#k-means?
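To make the k-means idea concrete, here is a plain-Python sketch of Lloyd's algorithm on (latitude, longitude) pairs. It is a stand-in for illustration only: a library implementation such as scikit-learn's `KMeans` would be the usual choice, the naive "first k points" initialization is an assumption made to keep the run deterministic, and treating degrees as flat coordinates is only a rough approximation at city scale. The sample points are coordinates from rows shown earlier.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on (lat, lon) tuples; returns (centroids, labels)."""
    centroids = list(points[:k])  # naive init: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

points = [
    (47.682748, -122.359981),    # 146 NW 74TH ST
    (47.550597, -122.28602),     # 5720 37TH AVE S
    (47.683823, -122.355983),    # 147 N 76TH ST
    (47.52346484, -122.37911657),  # 8840 37TH AVE SW
    (47.52546474, -122.27709864),  # 8621 Renton AVE S
]

centroids, labels = kmeans(points, k=2)
print(labels)
```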

In [None]:
#use api from class to find longitude and latitude of Microsoft and Amazon

In [None]:
#use Travel Time API to find apartments that are equidistant from the two offices.
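Before wiring in the Travel Time API, the "equidistant" filter can be prototyped with straight-line distance. The sketch below uses the haversine formula as a stand-in for commute time, which it is not: roads, bridges, and traffic will change the ranking. `balanced_candidates` is a hypothetical helper, and the two office coordinates are rough placeholders for illustration, not geocoding results.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def balanced_candidates(properties, office_a, office_b, tolerance_km=2.0):
    """Keep properties whose distances to the two offices differ by at most tolerance_km."""
    kept = []
    for name, lat, lon in properties:
        dist_a = haversine_km(lat, lon, *office_a)
        dist_b = haversine_km(lat, lon, *office_b)
        if abs(dist_a - dist_b) <= tolerance_km:
            kept.append((name, round(dist_a, 1), round(dist_b, 1)))
    # shortest combined distance first
    return sorted(kept, key=lambda item: item[1] + item[2])

# Placeholder office locations (assumed, approximate)
office_seattle = (47.62, -122.34)
office_redmond = (47.64, -122.13)

# Addresses and coordinates taken from rows shown earlier
properties = [
    ("2311 43RD AVE E", 47.639513, -122.277395),
    ("146 NW 74TH ST", 47.682748, -122.359981),
    ("5720 37TH AVE S", 47.550597, -122.28602),
]

# A loose, arbitrary tolerance for the demo
for row in balanced_candidates(properties, office_seattle, office_redmond, tolerance_km=7.0):
    print(row)
```

Swapping `haversine_km` for a travel-time lookup would keep the same filtering logic while comparing minutes instead of kilometres.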