# Data Description 

### Grace is planning to open a restaurant in San Francisco, and I assume she will rent the premises. I therefore started with the neighborhood-level rent dataset from Zillow Research (https://www.zillow.com/research/data/), which makes it easy to compare rents neighborhood by neighborhood. The dataset does not cover every San Francisco neighborhood, so I use only the neighborhoods for which rent data is available. I have cleaned the dataset and kept only the 2018 rents, because this project only needs the current rent range.

In [2]:
import pandas as pd
url='https://raw.githubusercontent.com/Auxilin/The-Battle-of-Neighborhoods/master/SFRentDataset.csv'
readfile = pd.read_csv(url)
readfile.head(13)

Unnamed: 0,NeighborhoodName,City,State,CountyName,2018-1,2018-2,2018-3,2018-04,2018-05,2018-06
0,Mission District,San Francisco,CA,San Francisco,3200.0,3250.0,3195.0,3403.0,3481.0,3350.0
1,Downtown,San Francisco,CA,San Francisco,2855.0,2950.0,3090.0,3030.0,3095.0,3095.0
2,Pacific Heights,San Francisco,CA,San Francisco,3500.0,3497.5,3595.0,3695.0,3719.5,3749.0
3,Nob Hill,San Francisco,CA,San Francisco,2862.5,2800.0,2795.0,2795.0,2795.0,2700.0
4,South of Market,San Francisco,CA,San Francisco,3550.0,3709.0,3729.0,3665.0,3667.5,3650.0
5,Russian Hill,San Francisco,CA,San Francisco,3495.0,3447.5,3345.0,3474.0,3395.0,3350.0
6,Hayes Valley,San Francisco,CA,San Francisco,3965.0,4205.0,3985.0,4050.0,4175.0,4130.0
7,Marina,San Francisco,CA,San Francisco,3300.0,3286.5,3274.0,3300.0,3400.0,3400.0
8,Lower Pacific Heights,San Francisco,CA,San Francisco,3200.0,3225.0,3307.0,3358.0,3296.0,3275.0
9,South Beach,San Francisco,CA,San Francisco,3800.0,3900.0,3897.5,3900.0,3950.0,3925.0
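
### The rentScore defined later in this section needs one average rent per neighborhood. The sketch below shows one way to compute it, assuming the average is the plain mean of the monthly 2018 columns (the notebook does not state this). The three rows are copied from the table above; the names `rows`, `average_rent` and `average_rents` are illustrative only.

```python
from statistics import mean

# Three rows copied from the table above (monthly 2018 rents).
rows = [
    {"NeighborhoodName": "Mission District", "2018-1": 3200.0, "2018-2": 3250.0,
     "2018-3": 3195.0, "2018-04": 3403.0, "2018-05": 3481.0, "2018-06": 3350.0},
    {"NeighborhoodName": "Nob Hill", "2018-1": 2862.5, "2018-2": 2800.0,
     "2018-3": 2795.0, "2018-04": 2795.0, "2018-05": 2795.0, "2018-06": 2700.0},
    {"NeighborhoodName": "Hayes Valley", "2018-1": 3965.0, "2018-2": 4205.0,
     "2018-3": 3985.0, "2018-04": 4050.0, "2018-05": 4175.0, "2018-06": 4130.0},
]


def average_rent(row):
    """Mean of every column whose name starts with '2018-'."""
    monthly = [value for key, value in row.items() if key.startswith("2018-")]
    return mean(monthly)


# Assumption: the neighborhood's average rent is the unweighted monthly mean.
average_rents = {row["NeighborhoodName"]: average_rent(row) for row in rows}

for name, rent in average_rents.items():
    print("{}: {:.2f}".format(name, rent))
```

### On the full DataFrame, something like `readfile.filter(like="2018-").mean(axis=1)` should give the same per-row averages without leaving pandas.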


### Since I don't have rent data for every neighborhood, I will test only the neighborhoods I was able to retrieve.
### I'm going to use a formula to find which neighborhood is a good place to open a new restaurant. Before settling on the formula, I considered which attributes to include, because it is unfair to compare a 10-year-old restaurant directly with a 1-year-old one. For example, the check-in count of a 10-year-old restaurant will be higher than that of a restaurant that is 1 year or 6 months old. After analysing the data I also found that many restaurants have a check-in count of zero. I had thought the check-in count would help estimate how many people visit a particular neighborhood, but because of these discrepancies I did not use it. I then concluded that ratings are a fairer basis, since I expect every restaurant to have one: whether 100 customers have visited a 1-year-old restaurant or many more have visited a 10-year-old one, the rating is measured on the same fixed scale.

### Below is the formula for the solution.
#### finalScore = (rentScore)*0.6+(ratingScore)*0.4
### The closer the finalScore is to 1, the better the neighborhood is for the client. I have given more weight to rent than to rating. Let's see what rentScore and ratingScore mean.
### rentScore is calculated as (maxrentofneighborhood-averagerentofneighborhood)/(maxrentofneighborhood-minrentofneighborhood). The rentScore for each neighborhood can be obtained from the rent dataset above.
### ratingScore is calculated as (maxrating-restaurantrating)/(maxrating-minrating). The ratings can be obtained from the Foursquare location data. The steps to get them are shown below.
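
### To make the formula concrete, here is a minimal sketch of the three calculations. It assumes the max and min values are taken across all neighborhoods being compared, which the text does not say explicitly. The input numbers are hypothetical, and the function names are not part of the notebook.

```python
def inverted_min_max(value, lowest, highest):
    """(highest - value) / (highest - lowest), the shape of both score formulas."""
    if highest == lowest:
        raise ValueError("highest and lowest must differ")
    return (highest - value) / (highest - lowest)


def final_score(rent_score, rating_score, rent_weight=0.6, rating_weight=0.4):
    """finalScore = rentScore*0.6 + ratingScore*0.4 (weights from the text)."""
    return rent_score * rent_weight + rating_score * rating_weight


# Hypothetical inputs, chosen only to keep the arithmetic easy to follow.
rent_score = inverted_min_max(3000.0, lowest=2500.0, highest=4500.0)   # 0.75
rating_score = inverted_min_max(7.0, lowest=5.0, highest=9.0)          # 0.5
score = final_score(rent_score, rating_score)

print(round(score, 2))
```

### With these definitions the cheapest neighborhood gets a rentScore of 1 and the most expensive gets 0. As the ratingScore formula is written, a rating equal to maxrating likewise gives a ratingScore of 0.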

### To analyse other restaurants and get more information about them, we need the coordinates of each neighborhood. The latitude and longitude can be obtained by passing the NeighborhoodName value to a geocoder. Let's import the package and find the coordinates of the first neighborhood in the rent dataset above.

In [3]:
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim 

Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /opt/conda/envs/DSX-Python35:

The following NEW packages will be INSTALLED:

    geographiclib: 1.49-py_0   conda-forge
    geopy:         1.16.0-py_0 conda-forge

geographiclib- 100% |################################| Time: 0:00:00 535.79 kB/s
geopy-1.16.0-p 100% |################################| Time: 0:00:00 819.88 kB/s


In [4]:
address = 'Mission District, San Francisco, CA'
# Nominatim expects an application-specific user_agent; the name below is a placeholder.
geolocator = Nominatim(user_agent='sf_restaurant_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('Latitude and longitude of {} are {}, {}.'.format(address, latitude, longitude))



Latitude and longitude of Mission District, San Francisco, CA are 37.75993, -122.4191376.
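
### The same lookup can be repeated for every neighborhood in the rent dataset. The helper below is only a sketch: `geocode_neighborhoods` is a hypothetical name, and it takes the geocoding function as an argument so the loop can be tried without network access. A geocoder that finds no match is assumed to return None, as geopy's `geocode` does.

```python
def geocode_neighborhoods(names, geocode, city="San Francisco, CA"):
    """Return {neighborhood name: (latitude, longitude)}, or None when not found.

    `geocode` is any callable that takes an address string and returns an
    object with .latitude and .longitude attributes, or None for no match.
    """
    coordinates = {}
    for name in names:
        location = geocode("{}, {}".format(name, city))
        if location is None:
            coordinates[name] = None
        else:
            coordinates[name] = (location.latitude, location.longitude)
    return coordinates
```

### In the notebook this could be called as `geocode_neighborhoods(readfile['NeighborhoodName'], geolocator.geocode)`. The public Nominatim service asks clients to keep to roughly one request per second, so a short pause between calls is advisable.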


### Now that we have the neighborhood's latitude and longitude, let's use the Foursquare location API to get details of the restaurants in the Mission District. The restaurant details can be retrieved using the search endpoint. For this project we need only Indian restaurant data, and the search endpoint accepts a category ID parameter: Foursquare defines an ID for each category (such as Indian, Italian, or Mexican Restaurant), which lets us retrieve only the data we want.

In [5]:
# The code was removed by Watson Studio for sharing.

This hidden cell contains the Foursquare client ID and client secret.


In [6]:
categoryId= '4bf58dd8d48988d10f941735'   #categoryID for Indian restaurant.
radius=1000                              # radius in meters

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryId={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius,
    categoryId)
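
### As an alternative to formatting the query string by hand, the URL can be assembled with the standard library so that each parameter is escaped automatically. This is a sketch only: `build_search_url` is a hypothetical helper, and the credentials and version string in the example call are placeholders, not real values.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, version,
                     latitude, longitude, radius, category_id):
    """Build the venues/search URL with the same parameters as the cell above."""
    params = [
        ("client_id", client_id),
        ("client_secret", client_secret),
        ("v", version),
        ("ll", "{},{}".format(latitude, longitude)),
        ("radius", radius),
        ("categoryId", category_id),
    ]
    # urlencode percent-escapes values, so the comma in "ll" becomes %2C.
    return SEARCH_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials and version; substitute the real values.
example_url = build_search_url(
    "MY_ID", "MY_SECRET", "20180604",
    37.75993, -122.4191376, 1000, "4bf58dd8d48988d10f941735",
)
print(example_url)
```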


In [7]:
#Send the GET request and examine the results
import requests 
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5b824a78dd5797049414562a'},
 'response': {'confident': True,
  'venues': [{'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/indian_',
       'suffix': '.png'},
      'id': '4bf58dd8d48988d10f941735',
      'name': 'Indian Restaurant',
      'pluralName': 'Indian Restaurants',
      'primary': True,
      'shortName': 'Indian'}],
    'delivery': {'id': '720691',
     'provider': {'icon': {'name': '/delivery_provider_grubhub_20180129.png',
       'prefix': 'https://igx.4sqi.net/img/general/cap/',
       'sizes': [40, 50]},
      'name': 'grubhub'},
     'url': 'https://www.grubhub.com/restaurant/lotus-sf-indian-cuisine-2434-mission-st-san-francisco/720691?affiliate=1131&utm_source=foursquare-affiliate-network&utm_medium=affiliate&utm_campaign=1131&utm_content=720691'},
    'hasPerk': False,
    'id': '5a56d8294b78c509a7e462e7',
    'location': {'address': '2434 Mission St',
     'cc': 'US',
     'city': 'San Francisco',


### We now have, in JSON format, the details of the Indian restaurants within a 1000-meter radius of the Mission District's coordinates. Let's check how many restaurants are in the JSON response.

In [8]:
Restaurants= len(results['response']['venues'])
print ('There are',  Restaurants , 'restaurants.')

There are 18 restaurants.


### Let's also get the venue ID of the first restaurant from the JSON response.

In [9]:
venueid= results['response']['venues'][0]['id']
venueid

'5a56d8294b78c509a7e462e7'
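
### To collect the IDs of all returned venues rather than only the first, the response can be walked once. The sketch below assumes the response has the `response` / `venues` layout shown above. `extract_venues` is a hypothetical helper, and the sample dictionary reuses the real venue ID with a made-up name.

```python
def extract_venues(search_results):
    """Return a list of (venue id, venue name) pairs from a search response."""
    venues = search_results.get("response", {}).get("venues", [])
    # .get("name") avoids a KeyError if a venue entry has no name field.
    return [(venue["id"], venue.get("name")) for venue in venues]


# Tiny stand-in for the JSON above; the name is invented for illustration.
sample_results = {
    "response": {
        "venues": [
            {"id": "5a56d8294b78c509a7e462e7", "name": "Example Restaurant"},
        ]
    }
}

venue_pairs = extract_venues(sample_results)
print(venue_pairs)
```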

### Let's use the venue details endpoint of Foursquare, which takes the venue ID, to get the venue details.

In [10]:
url2 = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}&ll={},{}'.format(
    venueid,          # venue ID retrieved in the previous cell
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude
    )
results2 = requests.get(url2).json()
results2    

{'meta': {'code': 200, 'requestId': '5b824b084434b92acad36f4d'},
 'response': {'venue': {'allowMenuUrlEdit': True,
   'attributes': {'groups': [{'count': 1,
      'items': [{'displayName': 'Price',
        'displayValue': '$$',
        'priceTier': 2}],
      'name': 'Price',
      'summary': '$$',
      'type': 'price'},
     {'count': 3,
      'items': [{'displayName': 'Reservations', 'displayValue': 'Yes'}],
      'name': 'Reservations',
      'summary': 'Reservations',
      'type': 'reservations'},
     {'count': 7,
      'items': [{'displayName': 'Credit Cards',
        'displayValue': 'Yes (incl. Discover & Visa)'}],
      'name': 'Credit Cards',
      'summary': 'Credit Cards',
      'type': 'payments'},
     {'count': 1,
      'items': [{'displayName': 'Outdoor Seating', 'displayValue': 'No'}],
      'name': 'Outdoor Seating',
      'type': 'outdoorSeating'},
     {'count': 1,
      'items': [{'displayName': 'Wi-Fi', 'displayValue': 'No'}],
      'name': 'Wi-Fi',
      'type':

In [12]:
restaurantrating= results2['response']['venue']['rating']
restaurantrating

7.0

### In the same way, we can retrieve the rating values for every neighborhood from the JSON responses and calculate the finalScore.
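
### One possible way to turn several venue responses into a single rating per neighborhood is to average them, skipping any venue whose response has no `rating` key. Both choices are assumptions: the text does not say how ratings are combined, and the handling of a missing rating is a defensive guess. `average_rating` is a hypothetical helper.

```python
def average_rating(venue_details):
    """Mean of the 'rating' values in a list of venue-details responses.

    Responses without a rating are skipped; returns None if none have one.
    """
    ratings = [
        details["response"]["venue"]["rating"]
        for details in venue_details
        if "rating" in details.get("response", {}).get("venue", {})
    ]
    if not ratings:
        return None
    return sum(ratings) / len(ratings)


# Stand-in responses; only the first rating (7.0) comes from the output above.
sample_details = [
    {"response": {"venue": {"rating": 7.0}}},
    {"response": {"venue": {}}},               # venue without a rating
    {"response": {"venue": {"rating": 8.0}}},  # hypothetical value
]

neighborhood_rating = average_rating(sample_details)
print(neighborhood_rating)
```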

## *Please note that I have shown the example for only one neighborhood in this data section. To apply the formula and produce the final results for all neighborhoods I'll use a for loop. All calculations for all neighborhoods are done in the methodology section of the report.