In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

This notebook is a place to explore the housing datasets for my General Assembly capstone project.

In this project I hope to analyze and target housing (rental?) prices for San Francisco (and maybe another city). Instead of relying only on prices, whether asking/sale prices or posted rental rates, I will try to incorporate the wealth of other information about neighborhoods to build a more intelligent pricing scheme. In other words: given a price you can afford, where can you get the best deal? This should take into account what the buyer/renter wants from a region, e.g. is there a strong preference for living near good food, parks, bars, public transit stops, et cetera?

Below we look through some of the publicly available datasets to see which features we can use in this project. In most cases it is advantageous to have data points tied to geography, i.e. datasets that include latitude/longitude coordinates.

## Airbnb data

In [12]:
bnb_cal = pd.read_csv('../project_data/airbnb_data/calendar.csv')
bnb_listings_sum = pd.read_csv('../project_data/airbnb_data/listings_sum.csv')
bnb_listings = pd.read_csv('../project_data/airbnb_data/listings.csv')
bnb_reviews_sum = pd.read_csv('../project_data/airbnb_data/reviews_sum.csv')
bnb_reviews = pd.read_csv('../project_data/airbnb_data/reviews.csv')

In [6]:
print(bnb_cal.shape)
bnb_cal.head()

(3145935, 4)


Unnamed: 0,listing_id,date,available,price
0,11187767,2017-03-12,f,
1,11187767,2017-03-11,f,
2,11187767,2017-03-10,f,
3,11187767,2017-03-09,f,
4,11187767,2017-03-08,f,
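One feature the calendar could provide is how often each listing is actually bookable. The sketch below assumes that `'t'` in the `available` column marks an available night and `'f'` an unavailable one (only `'f'` appears in the rows shown above). `availability_rate` is a helper name introduced here for illustration, and the second listing in the sample is hypothetical.

```python
import pandas as pd

def availability_rate(calendar):
    """Return the fraction of calendar days marked available per listing."""
    # Assumption: 't' means available, 'f' means not available.
    is_available = calendar['available'] == 't'
    return is_available.groupby(calendar['listing_id']).mean()

# Small stand-in for bnb_cal. Listing 11187767 mirrors the rows shown above;
# listing 1 is a hypothetical example with one available night.
sample_cal = pd.DataFrame({
    'listing_id': [11187767, 11187767, 11187767, 11187767, 1, 1],
    'date': ['2017-03-12', '2017-03-11', '2017-03-10', '2017-03-09',
             '2017-03-12', '2017-03-11'],
    'available': ['f', 'f', 'f', 'f', 't', 'f'],
})
sample_rates = availability_rate(sample_cal)
```

On the full data this would be called as `availability_rate(bnb_cal)`, giving one value between 0 and 1 per `listing_id`.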


In [9]:
print(bnb_listings.shape)
bnb_listings.head()

(8619, 95)


Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,...,review_scores_value,requires_license,license,jurisdiction_names,instant_bookable,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,reviews_per_month
0,11187767,https://www.airbnb.com/rooms/11187767,20160702162156,2016-07-02,Huge Seacliff Penthouse With Views,Amazing views from this immense 3 bedroom/ 2 b...,"Spacious patio deck, library with over 300 mov...",Amazing views from this immense 3 bedroom/ 2 b...,none,"The neighborhood includes Legion of Honor, Sut...",...,8.0,t,,SAN FRANCISCO,t,strict,f,f,1,0.88
1,6938818,https://www.airbnb.com/rooms/6938818,20160702162156,2016-07-02,Best Secret in Town,My two story house is located in the quite sid...,The room is spacious and it is on the top leve...,My two story house is located in the quite sid...,none,It is in the city and close to everything. Par...,...,9.0,t,S. F. Short-Term Residential Rental Registrati...,SAN FRANCISCO,f,strict,f,f,2,1.85
2,9395222,https://www.airbnb.com/rooms/9395222,20160702162156,2016-07-02,"Ocean Beach, Lands End Escape",Come relax in our 1 bedroom 1 bath house minut...,Huge backyard with a FirePit and lounge chairs...,Come relax in our 1 bedroom 1 bath house minut...,none,,...,9.0,t,,SAN FRANCISCO,t,strict,f,f,1,5.87
3,8388658,https://www.airbnb.com/rooms/8388658,20160702162156,2016-07-02,Mid-century Seacliff near GG Bridge,Three-bedroom in exclusive Seacliff neighborho...,,Three-bedroom in exclusive Seacliff neighborho...,none,,...,,t,,SAN FRANCISCO,f,flexible,f,f,1,
4,7856443,https://www.airbnb.com/rooms/7856443,20160702162156,2016-07-02,The Real San Francisco #2,"Minutes to GG Bridge, GG Park, Museums, The Pr...",Following the wild popularity of our downstair...,"Minutes to GG Bridge, GG Park, Museums, The Pr...",none,"For those who love the outdoors and good food,...",...,10.0,t,,SAN FRANCISCO,f,strict,f,f,2,2.45


In [10]:
print(bnb_listings_sum.shape)
bnb_listings_sum.head()

(8619, 16)


Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,11187767,Huge Seacliff Penthouse With Views,58242037,Jonathan,,Seacliff,37.779685,-122.510472,Entire home/apt,400,2,3,2016-06-25,0.88,1,152
1,6938818,Best Secret in Town,36381578,Harris,,Seacliff,37.780659,-122.505635,Private room,99,3,23,2016-06-16,1.85,2,187
2,9395222,"Ocean Beach, Lands End Escape",25963295,Tyler,,Seacliff,37.781433,-122.505179,Entire home/apt,155,1,45,2016-06-19,5.87,1,167
3,8388658,Mid-century Seacliff near GG Bridge,9996441,Howard,,Seacliff,37.787664,-122.489152,Entire home/apt,895,3,0,,,1,0
4,7856443,The Real San Francisco #2,6076870,Todd And Tatyana,,Seacliff,37.782133,-122.49273,Entire home/apt,195,2,4,2016-06-22,2.45,2,23
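Since the summary table already has `neighbourhood` and a numeric `price`, a first look at price by area is cheap to compute. This is only a rough sketch: it ignores `room_type`, which probably matters a lot, and `median_price_by_neighbourhood` is a name made up for this example.

```python
import pandas as pd

def median_price_by_neighbourhood(listings):
    """Median nightly price and listing count per neighbourhood, cheapest first."""
    grouped = listings.groupby('neighbourhood')['price']
    summary = grouped.agg(['median', 'count'])
    return summary.sort_values('median')

# The five Seacliff rows shown above, as a stand-in for bnb_listings_sum.
sample_listings = pd.DataFrame({
    'id': [11187767, 6938818, 9395222, 8388658, 7856443],
    'neighbourhood': ['Seacliff'] * 5,
    'room_type': ['Entire home/apt', 'Private room', 'Entire home/apt',
                  'Entire home/apt', 'Entire home/apt'],
    'price': [400, 99, 155, 895, 195],
})
price_summary = median_price_by_neighbourhood(sample_listings)
```

The median is used rather than the mean so that a single expensive listing (such as the 895 one above) does not dominate a small neighbourhood.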


In [11]:
print(bnb_reviews.shape)
bnb_reviews.head()

(169739, 6)


Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,8308113,47419975,2015-09-18,6288054,Colin,Host was excellent and was contactable / respo...
1,3320213,14805088,2014-06-26,17101713,Faye,The place was clean and spacious and the guy t...
2,3320213,17192435,2014-08-08,182792,Jocelyne & Pontus,This place wasn't as pleasant as we hoped. Had...
3,3320213,17597907,2014-08-14,19649465,Francesco,Si tratta di una piccola pensione/albergo.\nSt...
4,3320213,18293826,2014-08-24,1104165,Daniel,"Nice room, cool neighbourhood. Note that this ..."


In [13]:
print(bnb_reviews_sum.shape)
bnb_reviews_sum.head()

(169739, 2)


Unnamed: 0,listing_id,date
0,8308113,2015-09-18
1,3320213,2014-06-26
2,3320213,2014-08-08
3,3320213,2014-08-14
4,3320213,2014-08-24


In [14]:
bnb_reviews_sum.date.value_counts()

2015-09-21    569
2016-05-30    491
2016-05-15    489
2015-08-07    487
2016-06-10    486
2016-05-23    465
2016-05-22    464
2015-09-18    461
2016-06-09    459
2016-06-17    453
2016-05-16    452
2016-05-08    445
2016-04-11    445
2016-02-08    442
2016-02-15    438
2015-10-24    438
2016-03-17    432
2016-06-18    429
2016-06-13    420
2016-05-28    414
2015-09-19    412
2016-04-04    406
2015-08-10    405
2016-05-18    404
2016-04-01    402
2016-03-19    401
2016-06-19    397
2016-03-20    396
2016-05-31    392
2016-05-21    391
             ... 
2011-03-23      1
2011-03-24      1
2011-10-15      1
2010-10-30      1
2009-09-14      1
2011-02-27      1
2010-07-31      1
2010-07-30      1
2009-09-11      1
2009-09-12      1
2010-09-03      1
2009-10-03      1
2009-10-05      1
2010-10-16      1
2009-10-08      1
2009-07-15      1
2010-02-13      1
2010-02-17      1
2010-03-30      1
2010-02-15      1
2010-02-14      1
2010-02-18      1
2011-01-12      1
2011-03-02      1
2010-07-07
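
The counts above are ordered by frequency, which makes any trend over time hard to read. One possible alternative, sketched here, is to parse the dates and count reviews per calendar month in chronological order. `reviews_per_month` is a helper name introduced for this example.

```python
import pandas as pd

def reviews_per_month(reviews):
    """Count reviews per calendar month, in chronological order."""
    dates = pd.to_datetime(reviews['date'], format='%Y-%m-%d')
    # to_period('M') collapses each date to its year-month.
    return dates.dt.to_period('M').value_counts().sort_index()

# The five rows from bnb_reviews_sum.head() as a stand-in for the full table.
sample_reviews = pd.DataFrame({
    'listing_id': [8308113, 3320213, 3320213, 3320213, 3320213],
    'date': ['2015-09-18', '2014-06-26', '2014-08-08',
             '2014-08-14', '2014-08-24'],
})
monthly_counts = reviews_per_month(sample_reviews)
```

Calling `reviews_per_month(bnb_reviews_sum).plot()` should then give a quick picture of how review volume has grown.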

## SF city data

### 311 records

In [15]:
rec_311 = pd.read_csv('../project_data/sf_city_data/311_case_records/311_records.csv')



In [16]:
print(rec_311.shape)
rec_311.head()

(1821839, 16)


Unnamed: 0,CaseID,Opened,Closed,Updated,Status,Status Notes,Responsible Agency,Category,Request Type,Request Details,Address,Supervisor District,Neighborhood,Point,Source,Media URL
0,322571,11/30/2008 11:53:00 PM,12/01/2008 03:19:00 PM,12/01/2008 03:19:00 PM,Closed,,PUC - Electric/Power - G,General Requests,puc - electric - request_for_service,puc - electric - request_for_service,Intersection of 21ST ST and CAPP ST,9,Mission,"(37.7571008516766, -122.417811874214)",Voice In,
1,322568,11/30/2008 11:13:00 PM,07/21/2009 04:24:00 PM,07/21/2009 04:24:00 PM,Closed,,DPW Ops Queue,Illegal Postings,Illegal Postings - Posting_Too_Large_in_Size,Posting_Too_Large_in_Size on Sidewalk,Intersection of BUSH ST and VAN NESS AVE,3,Nob Hill,"(37.7884895281133, -122.421948485141)",Voice In,
2,322567,11/30/2008 11:07:00 PM,12/27/2008 06:07:00 AM,12/27/2008 06:07:00 AM,Closed,,DPW Ops Queue,Illegal Postings,Illegal Postings - Affixed_Improperly,Affixed_Improperly on Sidewalk,Intersection of EUCLID AVE and MASONIC AVE,2,Western Addition,"(37.7850837365507, -122.447620029034)",Voice In,
3,322566,11/30/2008 10:56:00 PM,07/21/2009 04:24:00 PM,07/21/2009 04:24:00 PM,Closed,,DPW Ops Queue,Street and Sidewalk Cleaning,Sidewalk_Cleaning,Garbage,"1566 HYDE ST, SAN FRANCISCO, CA, 94109",3,Nob Hill,"(37.795328529, -122.418067787)",Voice In,
4,322565,11/30/2008 10:46:00 PM,12/13/2008 10:50:00 AM,12/13/2008 10:50:00 AM,Closed,,RPD Park Service Area GGP Queue,Rec and Park Requests,Park - Structural_Maintenance,Other,"GGP Panhandle, SAN FRANCISCO, CA, 94117",5,Haight Ashbury,"(37.772204762, -122.4487004)",Voice In,


The data appears to go back to 2008. This needs to be double-checked after converting the date columns to datetime objects.
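
A minimal sketch of that check, assuming every timestamp follows the `MM/DD/YYYY HH:MM:SS AM/PM` layout seen in the rows above. `parse_311_dates` and `DATE_FORMAT` are names introduced for this example.

```python
import pandas as pd

# Timestamp layout seen in the head() output, e.g. '11/30/2008 11:53:00 PM'.
DATE_FORMAT = '%m/%d/%Y %I:%M:%S %p'

def parse_311_dates(records, columns=('Opened', 'Closed', 'Updated')):
    """Return a copy of records with the given columns parsed to datetimes."""
    parsed = records.copy()
    for col in columns:
        # errors='coerce' turns missing or unparseable values into NaT.
        parsed[col] = pd.to_datetime(parsed[col], format=DATE_FORMAT,
                                     errors='coerce')
    return parsed

# First three rows of rec_311.head() as a stand-in for the full table.
sample_311 = pd.DataFrame({
    'CaseID': [322571, 322568, 322567],
    'Opened': ['11/30/2008 11:53:00 PM', '11/30/2008 11:13:00 PM',
               '11/30/2008 11:07:00 PM'],
    'Closed': ['12/01/2008 03:19:00 PM', '07/21/2009 04:24:00 PM',
               '12/27/2008 06:07:00 AM'],
    'Updated': ['12/01/2008 03:19:00 PM', '07/21/2009 04:24:00 PM',
                '12/27/2008 06:07:00 AM'],
})
parsed_311 = parse_311_dates(sample_311)
earliest_opened = parsed_311['Opened'].min()
latest_opened = parsed_311['Opened'].max()
```

Running the same function on `rec_311` and looking at the minimum and maximum of `Opened` would confirm the actual date range.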

In [17]:
rec_311['Request Type'].value_counts()

Sidewalk_Cleaning                                                                                171694
Bulky Items                                                                                      144358
General Cleaning                                                                                 143973
Not_Offensive Graffiti on Private Property                                                       100851
Offensive Graffiti on Public Property                                                             95348
Not_Offensive Graffiti on Public Property                                                         77386
Hazardous Materials                                                                               63864
Damaged Parking_Meter                                                                             52767
Illegal_Dumping                                                                                   44598
Abandoned Vehicle - Car4door                                    

Lots of geolocated data to play around with...
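
To use the locations numerically, the `Point` strings need to be split into separate latitude and longitude columns. The sketch below assumes every value looks like `"(lat, lon)"` as in the rows above; anything else becomes NaN. `split_point` and the `lat`/`lon` column names are choices made for this example.

```python
import pandas as pd

# Matches '(lat, lon)' with optional whitespace and optional minus signs.
POINT_PATTERN = r'\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)'

def split_point(records, column='Point'):
    """Add float 'lat' and 'lon' columns parsed from '(lat, lon)' strings."""
    coords = records[column].str.extract(POINT_PATTERN, expand=True)
    out = records.copy()
    out['lat'] = coords[0].astype(float)
    out['lon'] = coords[1].astype(float)
    return out

# Two Point values copied from rec_311.head() above.
sample_points = pd.DataFrame({
    'CaseID': [322571, 322566],
    'Point': ['(37.7571008516766, -122.417811874214)',
              '(37.795328529, -122.418067787)'],
})
located = split_point(sample_points)
```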

### Parks

'park_info.csv' contains geolocated information about the many parks in San Francisco.

'park_scores.csv' contains scores for the various parks. I still need to figure out exactly how these scores are assigned, but they can be linked with the geolocations of the actual parks to see which parks people may like more. Those would be better to live next to, if that's important to you.

In [18]:
park_info = pd.read_csv('../project_data/sf_city_data/parks/park_info.csv')
park_scores = pd.read_csv('../project_data/sf_city_data/parks/park_scores.csv')

In [19]:
print(park_info.shape)
park_info.head()

(230, 12)


Unnamed: 0,ParkName,ParkType,ParkServiceArea,PSAManager,email,Number,Zipcode,Acreage,SupDist,ParkID,Location 1,Lat
0,ParkName,ParkType,ParkServiceArea,PSAManager,email,Number,,,,,,
1,10TH AVE/CLEMENT MINI PARK,Mini Park,PSA 1,"Elder, Steve",steven.elder@sfgov.org,(415) 601-6501,94118.0,0.66,1.0,156.0,"351 9th Ave\nSan Francisco, CA\n(37.78184397, ...",
2,15TH AVENUE STEPS,Mini Park,PSA 4,"Sheehy, Chuck",charles.sheehy@sfgov.org,(415) 218-2226,94122.0,0.26,7.0,185.0,"15th Ave b w Kirkham\nSan Francisco, CA\n(37.7...",
3,24TH/YORK MINI PARK,Mini Park,PSA 6,"Field, Adrian",adrian.field@sfgov.org,(415) 717-2872,94110.0,0.12,9.0,51.0,"24th\nSan Francisco, CA\n(37.75306042, -122.40...",
4,29TH/DIAMOND OPEN SPACE,Neighborhood Park or Playground,PSA 5,"O'Brien, Teresa",teresa.o'brien@sfgov.org,(415) 819-2699,94131.0,0.82,8.0,194.0,"Diamond\nSan Francisco, CA\n(37.74360211, -122...",
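
Row 0 above repeats the column names as data, and `ParkID` comes through as a float because of the missing value in that row. A possible cleanup, assuming every remaining row has a `ParkID`, is sketched below. `clean_park_info` is a name introduced for this example.

```python
import pandas as pd

def clean_park_info(info):
    """Drop repeated header rows and restore integer ParkID values."""
    # Keep only rows whose ParkName is not the literal column name.
    cleaned = info[info['ParkName'] != 'ParkName'].copy()
    # Assumption: all remaining rows have a ParkID. astype(int) raises
    # if any of them is still missing, which would flag the problem early.
    cleaned['ParkID'] = cleaned['ParkID'].astype(int)
    return cleaned.reset_index(drop=True)

# A few columns from the first rows of park_info.head() above.
sample_info = pd.DataFrame({
    'ParkName': ['ParkName', '10TH AVE/CLEMENT MINI PARK',
                 '15TH AVENUE STEPS'],
    'ParkType': ['ParkType', 'Mini Park', 'Mini Park'],
    'Zipcode': [None, 94118.0, 94122.0],
    'ParkID': [None, 156.0, 185.0],
})
clean_info = clean_park_info(sample_info)
```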


In [20]:
print(park_scores.shape)
park_scores.head()

(5495, 5)


Unnamed: 0,ParkID,PSA,Park,FQ,Score
0,86,PSA4,Carl Larsen Park,FY05Q3,0.795
1,13,PSA4,Junipero Serra Playground,FY05Q3,0.957
2,9,PSA4,Rolph Nicol Playground,FY05Q3,0.864
3,117,PSA2,Alamo Square,FY05Q4,0.857
4,60,PSA6,Jose Coronado Playground,FY05Q4,0.859
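
Both park tables carry a `ParkID`, so one way to link scores to locations is to average each park's scores across quarters and join the result onto the park info. This is a sketch under that assumption: `mean_score_per_park`, `attach_scores`, and the `mean_score` column are names made up here, and the park info row for ID 117 is a hypothetical stand-in (only its ID and name appear above, in the scores table).

```python
import pandas as pd

def mean_score_per_park(scores):
    """Average Score over all quarters for each ParkID."""
    return (scores.groupby('ParkID')['Score']
                  .mean()
                  .rename('mean_score')
                  .reset_index())

def attach_scores(info, scores):
    """Left-join mean scores onto park info so unscored parks are kept."""
    return info.merge(mean_score_per_park(scores), on='ParkID', how='left')

# The five rows from park_scores.head() above.
sample_scores = pd.DataFrame({
    'ParkID': [86, 13, 9, 117, 60],
    'Park': ['Carl Larsen Park', 'Junipero Serra Playground',
             'Rolph Nicol Playground', 'Alamo Square',
             'Jose Coronado Playground'],
    'FQ': ['FY05Q3', 'FY05Q3', 'FY05Q3', 'FY05Q4', 'FY05Q4'],
    'Score': [0.795, 0.957, 0.864, 0.857, 0.859],
})
# Stand-in for park_info: the row for ID 117 is hypothetical, the row
# for ID 156 reuses the ID and name shown in park_info.head().
sample_parks = pd.DataFrame({
    'ParkID': [117, 156],
    'ParkName': ['ALAMO SQUARE', '10TH AVE/CLEMENT MINI PARK'],
})
scored_parks = attach_scores(sample_parks, sample_scores)
```

The left join keeps parks that have no score (they get NaN), which makes gaps in the scoring data easy to spot. On the real tables, `ParkID` would need the same integer type on both sides before merging.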


### SF crime data