# Yelp Fusion API - Training Data Pull 1/15/19

## Britt Allen, Bernard Kurka, Thomas Ludlow - NY-DSI-6

Goal: pull `price` and supporting business data directly from Yelp using the *Fusion API*.

### Resources

GitHub:
 - https://github.com/Yelp/yelp-python
 - https://github.com/gfairchild/yelpapi *(Best library)*
   - Examples: https://github.com/gfairchild/yelpapi/blob/master/examples/examples.py

Endpoint Documentation: https://www.yelp.com/developers/documentation/v3/business

Using regular search, a location-based query is formatted like this:
`https://www.yelp.com/search?find_loc=10128`
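
For comparison, the Fusion API version of that location search goes to the business search endpoint, with the API key sent as a bearer token. A minimal sketch using only the standard library: the `YELP_API_KEY` environment variable name and the `build_search_request` helper are assumptions made for illustration, and the request is only built here, not sent.

```python
import os
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_URL = "https://api.yelp.com/v3/businesses/search"

def build_search_request(location, api_key, limit=50, offset=0, sort_by="distance"):
    """Build (but do not send) a Fusion business-search request."""
    params = urlencode({"location": location, "limit": limit,
                        "offset": offset, "sort_by": sort_by})
    return Request(f"{SEARCH_URL}?{params}",
                   headers={"Authorization": f"Bearer {api_key}"})

# The key is read from the environment (assumed variable name) instead of being typed inline
req = build_search_request("10128", os.environ.get("YELP_API_KEY", "placeholder-key"))
print(req.full_url)
```

The `yelpapi` library used in this notebook wraps the same endpoint, so `search_query(location=..., limit=..., offset=...)` corresponds to these query parameters.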

```
My App
Client ID
ea2TodAq4YX-4W3lzSJrcA

API Key
21Pt2l8__qgIdL0ZpgYC_yWblJ_O8_vJ3_-tIybHDyuQl9oVBXAzAXQWqMmIrz7idLyc7owv4-lfSON0QjKJN4pvQei4rUQAGSZcGcVTQc4HtBseUcztUPkVrAItXHYx
```

### Libraries

In [1]:
import numpy as np
import pandas as pd
import json
import time
from yelpapi import YelpAPI


## Search query to dataframe

In [2]:
def query_to_df(loc_in, cat_in=['restaurants','shopping','localservices'], 
                sort_in='distance', limit_in=50, 
                cols=['categories','alias','city','state','zip_code','price','review_count','latitude','longitude']):
    """Available arguments:
    loc_in (str): location (zip, city, neighborhood, etc.)
    cat_in (list): categories - default is ['restaurants','shopping','localservices']
    sort_in (str): sort criterion of 'distance','best_match','review_count' - default is 'distance'
    limit_in (int): number of results to pull per category; values above 50 are split
    into multiple requests of at most 50 each - default is 50
    cols (list): columns for dataframe, matching API results key names - default is
    ['categories','alias','city','state','zip_code','price','review_count','latitude','longitude']
    """
    
    # Set Yelp Fusion API Key and establish API connection
    api_key = '21Pt2l8__qgIdL0ZpgYC_yWblJ_O8_vJ3_-tIybHDyuQl9oVBXAzAXQWqMmIrz7idLyc7owv4-lfSON0QjKJN4pvQei4rUQAGSZcGcVTQc4HtBseUcztUPkVrAItXHYx'
    api_obj = YelpAPI(api_key, timeout_s=3.0)
    
    # Instantiate empty DataFrame with desired output columns
    output_df = pd.DataFrame(columns=['search_term']+cols)
    
    # Create iterable list of limit amounts <= 50 so that full limit argument is covered
    # ex. 70 -> [50,20]
    limit_list = []
    req = limit_in  # req starts at the limit argument and counts down by 50 until <= 50
    while req > 50:
        limit_list.append(50)
        req -= 50
    limit_list.append(req)  # append the remaining amount (<= 50)
    
    # Loop through category argument list items
    for cat in cat_in:
        cat_df = pd.DataFrame(columns=['search_term']+cols) # Create empty DataFrame with addl col for category
        for j, limit in enumerate(limit_list): # Perform API pulls with all limits in limit_list
            
            # API call saved to json dict
            if cat=='none':
                response = api_obj.search_query(location=loc_in, sort_by=sort_in, limit=limit, offset=(j*50))
            else:
                response = api_obj.search_query(location=loc_in, categories=[cat], sort_by=sort_in, limit=limit, offset=(j*50))
            response_df = pd.DataFrame(response['businesses']) # Save business data to DataFrame
            
            # Create iteration DataFrame to process each API response (up to 50 results)
            iter_df = pd.DataFrame(columns=['search_term']+cols)
            iter_df['search_term'] = [cat for i in range(len(response_df))] # Add category value for each row

            # Iterate through each requested column argument and format for storage in output DataFrame
            for col_name in cols:
                # Not every response includes every key, so each branch skips data that is missing
                if col_name == 'categories':
                    # Convert list of category dicts into a single comma-separated string of aliases
                    try:
                        for k, cell in enumerate(response_df['categories']):
                            iter_df.loc[k, 'categories'] = ', '.join(d['alias'] for d in cell)
                    except (KeyError, TypeError):
                        pass
                elif col_name in ('city','state','zip_code'): # Access location data through 'location' key value
                    try:
                        iter_df[col_name] = [loc[col_name] for loc in response_df['location']]
                    except (KeyError, TypeError):
                        pass
                elif col_name in ('latitude','longitude'): # Access latitude/longitude through 'coordinates' key value
                    try:
                        iter_df[col_name] = [coord[col_name] for coord in response_df['coordinates']]
                    except (KeyError, TypeError):
                        pass
                else:
                    try:
                        iter_df[col_name] = response_df[col_name] # Anything else access directly
                    except KeyError:
                        pass
            cat_df = cat_df.append(iter_df)
        output_df = output_df.append(cat_df)
    output_df.index = range(output_df.shape[0])
    
    return output_df
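
The paging logic inside `query_to_df` can be isolated as a small helper. This is a sketch, not the code used above: `page_plan` is a hypothetical name, and it returns `(limit, offset)` pairs so that the offset of each request is explicit.

```python
def page_plan(total, page_size=50):
    """Split a requested total into (limit, offset) pairs of at most page_size."""
    plan = []
    offset = 0
    while offset < total:
        limit = min(page_size, total - offset)
        plan.append((limit, offset))
        offset += limit
    return plan

print(page_plan(70))   # [(50, 0), (20, 50)]
```

Each pair maps directly onto the `limit` and `offset` arguments of one `search_query` call.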


In [55]:
test_df = query_to_df('10128', limit_in=70, cat_in=['restaurants'])

In [56]:
test_df.head()

Unnamed: 0,search_term,categories,alias,city,state,zip_code,price,review_count,latitude,longitude
0,restaurants,"catering, delis, grocery",3rd-avenue-garden-new-york,New York,NY,10128,$$,15,40.78193,-73.95194
1,restaurants,"wine_bars, southafrican, tapas",kaia-wine-bar-new-york,New York,NY,10128,$$,376,40.7819,-73.95197
2,restaurants,"japanese, korean",maroo-new-york,New York,NY,10128,$$,120,40.782476,-73.951333
3,restaurants,ramen,naruto-ramen-new-york,New York,NY,10128,$$,853,40.78117,-73.9525
4,restaurants,tradamerican,the-corner-restaurant-new-york,New York,NY,10128,$$$,13,40.78263,-73.95121


In [4]:
test_df.shape

(70, 10)

In [5]:
test_df.groupby('search_term').price.value_counts()

search_term  price
restaurants  $$       47
             $        14
             $$$       5
Name: price, dtype: int64

In [7]:
test_df.groupby('search_term').zip_code.value_counts()

search_term  zip_code
restaurants  10128       69
             10028        1
Name: zip_code, dtype: int64

## API Pull from List of ZIP codes and categories

In [17]:
zip_list = ['10128','19025']
cats = ['restaurants', 'shopping', 'localservices']

## RESET RESULTS DATAFRAME `api_data`

In [13]:
api_data = pd.DataFrame(columns=['zip','city','state','cat','pr_1','rv_1','pr_2','rv_2','pr_3','rv_3','pr_4','rv_4','avg_lat','avg_long'])


In [14]:
api_data.head()

Unnamed: 0,zip,city,state,cat,pr_1,rv_1,pr_2,rv_2,pr_3,rv_3,pr_4,rv_4,avg_lat,avg_long


In [15]:
def api_pull(zip_list, cats=['none'], sort='best_match', limit=50):  # 'none' = no category filter in query_to_df
    column_list = ['zip','city','state','cat',
                   'pr_1','rv_1','pr_2','rv_2',
                   'pr_3','rv_3','pr_4','rv_4',
                   'avg_lat','avg_long']
    
    api_data = pd.DataFrame(columns=column_list)
    
    for z in zip_list:
        df = query_to_df(z, cats, limit_in=limit, sort_in=sort)
        
        loop_df = pd.Series(index=column_list)

        loop_df['zip'] = z
        try:
            loop_df['city'] = df.city[0]
            loop_df['state'] = df.state[0]
        except: 
            pass

        loop_df['pr_1'] = df[df.price=='$'].shape[0]
        loop_df['rv_1'] = df[df.price=='$'].review_count.sum()
        loop_df['pr_2'] = df[df.price=='$$'].shape[0]
        loop_df['rv_2'] = df[df.price=='$$'].review_count.sum()
        loop_df['pr_3'] = df[df.price=='$$$'].shape[0]
        loop_df['rv_3'] = df[df.price=='$$$'].review_count.sum()
        loop_df['pr_4'] = df[df.price=='$$$$'].shape[0]
        loop_df['rv_4'] = df[df.price=='$$$$'].review_count.sum()

        loop_df['avg_lat'] = df.latitude.mean()
        loop_df['avg_long'] = df.longitude.mean()

        api_data = api_data.append(loop_df, ignore_index=True)
    
    api_data.zip = api_data.zip.astype(str)    
    return api_data
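
The eight `pr_n`/`rv_n` assignments in `api_pull` follow one pattern, so they can also be produced in a loop. A sketch on a small made-up DataFrame (the values are illustrative, not Yelp data); `summarize_prices` is a hypothetical helper name.

```python
import pandas as pd

def summarize_prices(df):
    """Count businesses and sum reviews for each Yelp price tier ($ to $$$$)."""
    summary = {}
    for n in range(1, 5):
        tier = df[df["price"] == "$" * n]      # rows at exactly this price tier
        summary[f"pr_{n}"] = len(tier)
        summary[f"rv_{n}"] = int(tier["review_count"].sum())
    return summary

toy = pd.DataFrame({"price": ["$", "$$", "$$", None],
                    "review_count": [10, 20, 30, 5]})
print(summarize_prices(toy))
```

Businesses with no `price` value fall into none of the four tiers, which matches how the original filters behave.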

In [46]:
new_test = api_pull(zip_list, cats, limit=100)



# Borough ZIP pull - 100 best match, no category

In [4]:
b_zips = pd.read_csv('../Data/nyc_borough_zip.csv')

In [5]:
bk_zips = b_zips[b_zips['brooklyn'].notnull()]['brooklyn']

In [6]:
bk_zips = bk_zips.astype(str).str.split('.',expand=True)[0]

In [7]:
qn_zips = b_zips[b_zips['queens'].notnull()]['queens']

In [8]:
qn_zips = qn_zips.astype(str).str.split('.',expand=True)[0]

In [9]:
bx_zips = b_zips[b_zips['bronx'].notnull()]['bronx']

In [10]:
bx_zips = bx_zips.astype(str).str.split('.',expand=True)[0]

In [11]:
si_zips = b_zips[b_zips['staten_island'].notnull()]['staten_island']

In [12]:
si_zips = si_zips.astype(str).str.split('.',expand=True)[0]
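
The four borough cells above repeat the same two steps: drop missing values, then strip the trailing `.0` left by the float column. A sketch of one helper that does both; `borough_zips` is a hypothetical name, and the toy frame stands in for `nyc_borough_zip.csv`, whose layout is assumed from the cells above. Unlike those cells, the helper also resets the index.

```python
import pandas as pd

def borough_zips(df, column):
    """Return the non-null ZIP codes in `column` as 5-character strings."""
    return (df[column].dropna()
                      .astype(int)      # 11201.0 -> 11201
                      .astype(str)
                      .str.zfill(5)     # restore any leading zero
                      .reset_index(drop=True))

toy = pd.DataFrame({"brooklyn": [11201.0, 11203.0, None],
                    "bronx": [10451.0, None, None]})
print(borough_zips(toy, "brooklyn").tolist())
```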

## Brooklyn

In [13]:
zips = bk_zips
cats = ['none']

In [16]:
yelp_bk = api_pull(zips, cats, limit=100)

In [17]:
yelp_bk.head()

Unnamed: 0,zip,city,state,cat,pr_1,rv_1,pr_2,rv_2,pr_3,rv_3,pr_4,rv_4,avg_lat,avg_long
0,11201.0,Brooklyn,NY,,15,6370,70,44823,8,2845,1,997,40.692694,-73.991436
1,11202.0,Brooklyn,NY,,9,446,9,897,1,273,0,0,40.692161,-73.991179
2,11203.0,East Flatbush,NY,,46,3072,49,5781,1,113,1,11,40.65547,-73.94419
3,11204.0,Brooklyn,NY,,33,4587,62,17047,4,458,0,0,40.622603,-73.990887
4,11205.0,Brooklyn,NY,,17,7784,70,43845,11,6466,1,5131,40.697441,-73.967653


In [18]:
yelp_bk.shape

(47, 14)

## Queens

In [21]:
zips = qn_zips
cats = ['none']

In [22]:
yelp_qn = api_pull(zips, cats, limit=100)

In [23]:
yelp_qn.head()

Unnamed: 0,zip,city,state,cat,pr_1,rv_1,pr_2,rv_2,pr_3,rv_3,pr_4,rv_4,avg_lat,avg_long
0,11004.0,New Hyde Park,NY,,40,3091,56,8256,0,0,0,0,40.744354,-73.715113
1,11005.0,Astoria,NY,,10,573,3,176,0,0,0,0,40.748055,-73.734862
2,11101.0,New York,NY,,12,26372,61,82045,8,11472,11,24825,40.75068,-73.954138
3,11102.0,Astoria,NY,,21,3863,73,28359,2,325,0,0,40.767802,-73.922745
4,11103.0,Astoria,NY,,24,5512,71,32584,4,1043,0,0,40.763566,-73.917122


## Bronx

In [24]:
zips = bx_zips
cats = ['none']

In [25]:
yelp_bx = api_pull(zips, cats, limit=100)

In [26]:
yelp_bx.head()

Unnamed: 0,zip,city,state,cat,pr_1,rv_1,pr_2,rv_2,pr_3,rv_3,pr_4,rv_4,avg_lat,avg_long
0,10451.0,Bronx,NY,,32,2997,58,10723,4,3185,2,190,40.81705,-73.932395
1,10452.0,Bronx,NY,,40,844,23,1066,5,75,1,86,40.834187,-73.92047
2,10453.0,New York,NY,,29,3217,65,13757,2,562,0,0,40.858406,-73.923906
3,10454.0,Bronx,NY,,37,2604,55,7539,3,1135,1,134,40.80489,-73.930368
4,10455.0,New York,NY,,16,4205,72,39924,7,6018,1,134,40.801446,-73.94064


## Staten Island

In [27]:
zips = si_zips
cats = ['none']

In [28]:
yelp_si = api_pull(zips, cats, limit=100)

In [29]:
yelp_si.head()

Unnamed: 0,zip,city,state,cat,pr_1,rv_1,pr_2,rv_2,pr_3,rv_3,pr_4,rv_4,avg_lat,avg_long
0,10301.0,Staten Island,NY,,14,2226,69,23045,6,908,1,176,40.631828,-74.069224
1,10302.0,Staten Island,NY,,29,3012,61,7383,3,466,0,0,40.633904,-74.121892
2,10303.0,Elizabeth,NJ,,26,3405,70,13553,2,312,0,0,40.635001,-74.142117
3,10304.0,Staten Island,NY,,17,2220,65,11378,11,1317,1,176,40.614453,-74.093799
4,10305.0,Staten Island,NY,,13,6835,75,21257,9,1064,1,176,40.612531,-74.064077


In [30]:
yelp_boroughs = yelp_bk

In [31]:
yelp_boroughs = yelp_boroughs.append(yelp_qn)

In [32]:
yelp_boroughs = yelp_boroughs.append(yelp_bx)

In [33]:
yelp_boroughs = yelp_boroughs.append(yelp_si)

In [34]:
yelp_boroughs.index = range(len(yelp_boroughs))
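
`DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so on a current install the borough frames would be stacked with `pd.concat` instead. A sketch with small stand-in frames; the `bk`, `qn`, and `bx` names are placeholders, and the `pr_1` values are copied from the `head()` outputs above.

```python
import pandas as pd

bk = pd.DataFrame({"zip": ["11201"], "pr_1": [15]})
qn = pd.DataFrame({"zip": ["11004"], "pr_1": [40]})
bx = pd.DataFrame({"zip": ["10451"], "pr_1": [32]})

# ignore_index=True renumbers rows 0..n-1, so no separate index reset is needed
boroughs = pd.concat([bk, qn, bx], ignore_index=True)
print(boroughs)
```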

In [35]:
yelp_boroughs.zip = yelp_boroughs.zip.str.split('.', expand=True)[0]

In [36]:
yelp_boroughs.zip = yelp_boroughs.zip.str.zfill(5)  # left-pad to 5 characters

In [37]:
yelp_boroughs

Unnamed: 0,zip,city,state,cat,pr_1,rv_1,pr_2,rv_2,pr_3,rv_3,pr_4,rv_4,avg_lat,avg_long
0,11201,Brooklyn,NY,,15,6370,70,44823,8,2845,1,997,40.692694,-73.991436
1,11202,Brooklyn,NY,,9,446,9,897,1,273,0,0,40.692161,-73.991179
2,11203,East Flatbush,NY,,46,3072,49,5781,1,113,1,11,40.655470,-73.944190
3,11204,Brooklyn,NY,,33,4587,62,17047,4,458,0,0,40.622603,-73.990887
4,11205,Brooklyn,NY,,17,7784,70,43845,11,6466,1,5131,40.697441,-73.967653
5,11206,Brooklyn,NY,,19,7690,73,44030,6,5780,2,5292,40.709771,-73.947020
6,11207,Brooklyn,NY,,31,6224,66,22526,1,245,0,0,40.694625,-73.918166
7,11208,Howard Beach,NY,,35,4720,58,11148,3,870,1,123,40.685867,-73.883462
8,11209,Brooklyn,NY,,16,2461,79,19560,4,678,0,0,40.624842,-74.026381
9,11210,Brooklyn,NY,,30,2197,61,15455,5,562,0,0,40.627676,-73.953866


In [38]:
nyc_zips = pd.read_csv('../nyc_zip.csv', header=None, names=['zip'], dtype={'zip':str})

In [39]:
zips = nyc_zips.zip
cats = ['none']

In [40]:
yelp_manh_1 = api_pull(zips[:80], cats, limit=100)

In [41]:
yelp_manh_2 = api_pull(zips[80:], cats, limit=100)

In [42]:
yelp_manh = yelp_manh_1.append(yelp_manh_2, ignore_index=True)

In [43]:
yelp_manh.reindex(axis=0)

Unnamed: 0,zip,city,state,cat,pr_1,rv_1,pr_2,rv_2,pr_3,rv_3,pr_4,rv_4,avg_lat,avg_long
0,10001.0,New York,NY,,17,20139,64,73672,12,13352,4,6783,40.751380,-73.991945
1,10002.0,New York,NY,,29,26811,64,87914,6,5705,0,0,40.719674,-73.989291
2,10003.0,New York,NY,,21,28355,64,115635,9,13478,5,7834,40.730335,-73.989247
3,10004.0,New York,NY,,21,38197,52,113225,13,16930,5,5147,40.715223,-73.999469
4,10005.0,New York,NY,,22,4917,61,21109,9,2624,3,1330,40.706818,-74.008856
5,10006.0,New York,NY,,25,5000,57,15581,6,1506,4,1492,40.710417,-74.010312
6,10007.0,New York,NY,,21,4489,56,16660,12,5307,3,778,40.713944,-74.007690
7,10008.0,Amarillo,TX,,25,1048,16,1752,1,5,0,0,35.205871,-101.771002
8,10009.0,New York,NY,,22,25276,66,103451,10,11673,2,852,40.726782,-73.984735
9,10010.0,New York,NY,,13,15800,55,85243,26,27691,4,7114,40.739276,-73.986323


In [44]:
yelp_manh.zip = yelp_manh.zip.str.split('.', expand=True)[0]

In [45]:
yelp_manh.zip = yelp_manh.zip.str.zfill(5)  # left-pad to 5 characters

In [61]:
nyc_best = yelp_manh.append(yelp_boroughs)

In [62]:
nyc_best.head()

Unnamed: 0,zip,city,state,cat,pr_1,rv_1,pr_2,rv_2,pr_3,rv_3,pr_4,rv_4,avg_lat,avg_long
0,10001,New York,NY,,17,20139,64,73672,12,13352,4,6783,40.75138,-73.991945
1,10002,New York,NY,,29,26811,64,87914,6,5705,0,0,40.719674,-73.989291
2,10003,New York,NY,,21,28355,64,115635,9,13478,5,7834,40.730335,-73.989247
3,10004,New York,NY,,21,38197,52,113225,13,16930,5,5147,40.715223,-73.999469
4,10005,New York,NY,,22,4917,61,21109,9,2624,3,1330,40.706818,-74.008856


In [63]:
nyc_best.shape

(329, 14)

In [57]:
nyc_best.drop(['cat'], axis=1, inplace=True)

In [58]:
nyc_best[nyc_best.state!='NY']

Unnamed: 0,zip,city,state,pr_1,rv_1,pr_2,rv_2,pr_3,rv_3,pr_4,rv_4,avg_lat,avg_long
7,10008,Amarillo,TX,25,1048,16,1752,1,5,0,0,35.205871,-101.771002
14,10015,Tucson,AZ,37,7257,59,18846,1,337,0,0,32.202802,-110.958123
45,10047,Houston,TX,52,15861,45,18992,2,1038,0,0,29.714243,-95.546373
46,10048,Portales,NM,11,153,10,239,0,0,0,0,34.183415,-103.33938
48,10060,Manassas,VA,40,3910,52,8331,3,730,0,0,38.75675,-77.466215
51,10072,Garden Grove,CA,41,41873,45,54055,6,10282,2,2398,33.784637,-117.946202
54,10080,Overland Park,KS,29,4570,63,14546,5,1148,1,127,38.943825,-94.676917
58,10090,San Diego,CA,54,18064,42,17596,0,0,0,0,32.910766,-117.121253
59,10094,Indian Rocks Beach,FL,37,4843,55,13662,3,1030,0,0,27.887151,-82.816396
63,10099,Bartlesville,OK,35,430,22,668,5,57,0,0,36.741756,-95.949024


In [64]:
nyc_best = nyc_best[nyc_best.state=='NY']

In [70]:
nyc_best[(nyc_best.avg_long>-73.5)|(nyc_best.avg_lat>40.95)]

Unnamed: 0,zip,city,state,cat,pr_1,rv_1,pr_2,rv_2,pr_3,rv_3,pr_4,rv_4,avg_lat,avg_long
50,10069,New York,NY,,4,250,6,770,0,0,0,0,41.4606,-70.349319
110,10162,New York,NY,,6,489,12,1258,1,274,2,1156,41.325086,-71.051391
113,10165,New York,NY,,3,136,7,444,2,554,2,1212,41.438608,-70.803575
114,10166,New York,NY,,16,1098,36,3963,5,816,1,110,40.763555,-72.919105
125,10177,New York,NY,,8,548,14,1376,0,0,0,0,40.778229,-71.351032
160,10281,New York,NY,,4,296,5,375,3,250,0,0,41.316645,-69.616675
115,11439,Fresh Meadows,NY,,15,838,9,1028,0,0,0,0,41.197305,-71.172356


In [76]:
nyc_best = nyc_best[(nyc_best.avg_long<-73.5)&(nyc_best.avg_lat<40.95)&(nyc_best.avg_long>-74.3)]
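
The coordinate filter above keeps ZIP codes whose average business location falls inside a rough box around the city. The same test with named bounds, as a sketch: the constant names and `in_nyc_box` are hypothetical, the limits are the ones used in the cell above, and the two rows are copied from earlier outputs.

```python
import pandas as pd

LAT_MAX = 40.95
LONG_MIN, LONG_MAX = -74.3, -73.5

def in_nyc_box(df):
    """Boolean mask: average coordinates strictly inside the bounding box."""
    return ((df["avg_lat"] < LAT_MAX)
            & (df["avg_long"] > LONG_MIN)
            & (df["avg_long"] < LONG_MAX))

toy = pd.DataFrame({"zip": ["10001", "10069"],
                    "avg_lat": [40.751380, 41.4606],
                    "avg_long": [-73.991945, -70.349319]})
print(toy[in_nyc_box(toy)])   # keeps 10001, drops 10069
```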

In [77]:
nyc_best.index = range(nyc_best.shape[0])

In [78]:
nyc_best.describe()

Unnamed: 0,cat,avg_lat,avg_long
count,0.0,278.0,278.0
mean,,40.730332,-73.938603
std,,0.066512,0.087198
min,,40.516515,-74.244572
25%,,40.700638,-73.99105
50%,,40.740227,-73.962156
75%,,40.761186,-73.886046
max,,40.897152,-73.715113


In [79]:
nyc_best.to_csv('../Data/nyc_best.csv', index=False)

ZIP Code Database: https://www.unitedstateszipcodes.org/zip-code-database/

In [53]:
zip_db = pd.read_csv('../Data/zip_code_database.csv', dtype={'zip':str})

In [24]:
zip_db.head()

Unnamed: 0,zip,type,decommissioned,primary_city,acceptable_cities,unacceptable_cities,state,county,timezone,area_codes,world_region,country,latitude,longitude,irs_estimated_population_2015
0,501,UNIQUE,0,Holtsville,,I R S Service Center,NY,Suffolk County,America/New_York,631,,US,40.81,-73.04,562
1,544,UNIQUE,0,Holtsville,,Irs Service Center,NY,Suffolk County,America/New_York,631,,US,40.81,-73.04,0
2,601,STANDARD,0,Adjuntas,,"Colinas Del Gigante, Jard De Adjuntas, Urb San...",PR,Adjuntas Municipio,America/Puerto_Rico,787939,,US,18.16,-66.72,0
3,602,STANDARD,0,Aguada,,"Alts De Aguada, Bo Guaniquilla, Comunidad Las ...",PR,Aguada Municipio,America/Puerto_Rico,787939,,US,18.38,-67.18,0
4,603,STANDARD,0,Aguadilla,Ramey,"Bda Caban, Bda Esteves, Bo Borinquen, Bo Ceiba...",PR,Aguadilla Municipio,America/Puerto_Rico,787,,US,18.43,-67.15,0


In [54]:
lat_map = dict(zip(zip_db.zip, zip_db.latitude))    # ZIP -> latitude
long_map = dict(zip(zip_db.zip, zip_db.longitude))  # ZIP -> longitude

In [63]:
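# .copy() gives an independent DataFrame, so later assignments do not trigger SettingWithCopyWarning
yelp_b = yelp_boroughs[yelp_boroughs.pr_1 + yelp_boroughs.pr_2 + yelp_boroughs.pr_3 + yelp_boroughs.pr_4 > 0].copy()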

In [64]:
yelp_b['state'] = 'NY'


In [65]:
city_ref_dict = dict(zip(zip_db.zip, zip_db.primary_city))  # ZIP -> primary city

In [66]:
city_ref_dict['11204']

'Brooklyn'

In [71]:
yelp_b['db_city'] = yelp_b.zip.apply(lambda x: city_ref_dict[x])


In [73]:
yelp_b.city = yelp_b.db_city


In [74]:
yelp_b.drop('db_city', axis=1, inplace=True)


In [75]:
yelp_b.head()

Unnamed: 0,zip,city,state,cat,pr_1,rv_1,pr_2,rv_2,pr_3,rv_3,pr_4,rv_4,avg_lat,avg_long
0,11201,Brooklyn,NY,none,13,4563,49,25627,7,2308,1,998,40.693368,-73.99155
2,11203,Brooklyn,NY,none,10,649,9,2225,0,0,0,0,40.650264,-73.933248
3,11204,Brooklyn,NY,none,13,1527,17,2684,1,77,0,0,40.615993,-73.9882
4,11205,Brooklyn,NY,none,8,2994,18,5987,3,365,0,0,40.692484,-73.966175
5,11206,Brooklyn,NY,none,7,1938,20,8772,0,0,1,162,40.705367,-73.940534
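
The lookup-and-overwrite sequence used to build `yelp_b` (create `db_city`, copy it into `city`, drop `db_city`) can be collapsed into one `Series.map` call. A sketch with a two-entry stand-in for `city_ref_dict` and made-up rows; unlike indexing the dict inside a lambda, `map` yields `NaN` for a ZIP that is missing from the dict, so the sketch falls back to the existing city in that case.

```python
import pandas as pd

city_ref = {"11201": "Brooklyn", "11203": "Brooklyn"}  # stand-in lookup table

df = pd.DataFrame({"zip": ["11201", "11203", "99999"],
                   "city": ["Brooklyn", "East Flatbush", "Somewhere"]})

# Map ZIP -> reference city; keep the original city where the ZIP is unknown
df["city"] = df["zip"].map(city_ref).fillna(df["city"])
print(df)
```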


In [76]:
yelp_b.to_csv('../Data/yelp_b.csv', index=False)

In [77]:
yelp_manh = pd.read_csv('../Data/yelp.csv', dtype={'zip': str})  # keep ZIPs as strings, matching yelp_b

In [78]:
yelp_manh.head()

Unnamed: 0,zip,city,state,cat,pr_1,rv_1,pr_2,rv_2,pr_3,rv_3,pr_4,rv_4,avg_lat,avg_long
0,10001,New York,NY,none,8,9740,28,29095,4,5507,0,0,40.747709,-73.990216
1,10002,New York,NY,none,18,15201,39,43425,4,4740,0,0,40.719057,-73.989387
2,10003,New York,NY,none,16,19139,54,91952,6,8836,3,5125,40.730866,-73.988554
3,10004,New York,NY,none,12,1980,26,10361,6,2128,2,1153,40.704432,-74.011839
4,10005,New York,NY,none,7,1425,11,2268,1,89,1,177,40.706222,-74.008576


In [79]:
yelp_nyc = yelp_manh

In [80]:
yelp_nyc = yelp_nyc.append(yelp_b)

In [81]:
yelp_nyc.shape

(210, 14)

In [82]:
yelp_nyc.to_csv('../Data/yelp_nyc_total.csv', index=False)