In [1]:
import numpy as np
import pandas as pd

import transform

In [2]:
osm_data = pd.read_json('data/preprocessed-osm-data.json.gz')
chain_restaurant_qids = pd.read_json('data/chain-restaurant-qids.json')

The aim of this notebook is to identify restaurants by using amenities and tags from the OSM data.

In [9]:
# find restaurants using amenities found manually
manual_restaurant_amenities = [
    'restaurant', 'fast_food', 'cafe', 'pub', 'bar', 'ice_cream', 
    'food_court', 'bbq', 'juice_bar', 'disused:restaurant', 'bistro'
]
# map each hand-picked amenity to a placeholder value of 1
manual_restaurant_amenities = dict.fromkeys(manual_restaurant_amenities, 1)
num_manual_restaurants = transform.get_num_restaurants(osm_data, manual_restaurant_amenities)
print(manual_restaurant_amenities.keys())
print('Number of restaurants using amenities found manually:', num_manual_restaurants)

dict_keys(['restaurant', 'fast_food', 'cafe', 'pub', 'bar', 'ice_cream', 'food_court', 'bbq', 'juice_bar', 'disused:restaurant', 'bistro'])
Number of restaurants using amenities found manually: 5149


We start with a manual approach: look through each unique amenity and note down the ones that may be associated with restaurants. A limitation of this approach is that it may not be viable for larger data.
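The manual review can be made a little easier by listing every unique amenity with its count. The following is a minimal sketch on a toy DataFrame; `toy_osm` and `picked` are made up for illustration and are not taken from `osm_data`.

```python
import pandas as pd

# Toy stand-in for osm_data; only the 'amenity' column matters here.
toy_osm = pd.DataFrame({
    'amenity': ['restaurant', 'bench', 'cafe', 'restaurant', 'bistro', 'bench'],
})

# Count entries per unique amenity so the list can be reviewed by eye.
amenity_counts = toy_osm['amenity'].value_counts()
print(amenity_counts)

# Hypothetical hand-picked subset, chosen after reading the list above.
picked = ['restaurant', 'cafe', 'bistro']
num_picked = int(toy_osm['amenity'].isin(picked).sum())
print('Entries matching the hand-picked amenities:', num_picked)
```

Sorting by count puts the common amenities first, but the rare ones at the bottom of the list still have to be read, which is where the effort grows with larger data.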

In [5]:
# find restaurants using amenities associated with a cuisine tag 
restaurant_amenities = transform.get_restaurant_amenities(osm_data)
num_restaurants = transform.get_num_restaurants(osm_data, restaurant_amenities)
print(restaurant_amenities.keys())
print('Number of restaurants using amenities with a cuisine tag:', num_restaurants)

dict_keys(['cafe', 'restaurant', 'fast_food', 'bar', 'pub', 'ice_cream', 'construction'])
Number of restaurants using amenities with a cuisine tag: 5129


Second, we move on to a more automated approach: we look at the amenities of OSM entries that have a cuisine tag. This approach has two possible limitations. It misses some of the restaurant amenities we found manually (such as 'food_court' and 'bistro'), and it treats every OSM entry with a 'construction' amenity as a restaurant.
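The idea behind the cuisine tag approach can be sketched as follows. This is an assumption about the general technique, not the actual code of `transform.get_restaurant_amenities`; the helper `amenities_with_cuisine` and the toy data are hypothetical.

```python
import pandas as pd

# Toy stand-in for osm_data with only the two relevant columns.
toy_osm = pd.DataFrame({
    'amenity': ['restaurant', 'bench', 'cafe', 'construction', 'pub', 'restaurant'],
    'cuisine': ['pizza', None, 'coffee_shop', 'dessert', None, None],
})

def amenities_with_cuisine(df):
    """Return {amenity: count} over the rows whose 'cuisine' is not missing."""
    has_cuisine = df['cuisine'].notna()
    return df.loc[has_cuisine, 'amenity'].value_counts().to_dict()

cuisine_amenities = amenities_with_cuisine(toy_osm)
print(sorted(cuisine_amenities))

# Count every entry whose amenity appeared at least once with a cuisine tag.
num_matches = int(toy_osm['amenity'].isin(list(cuisine_amenities)).sum())
print('Entries matching cuisine-tagged amenities:', num_matches)
```

In this toy data, 'pub' never carries a cuisine tag and is missed, while 'construction' is picked up because a single entry has one. Both effects mirror the limitations described above.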

In [11]:
is_construction = (osm_data['amenity'] == 'construction')
osm_data[is_construction]

Unnamed: 0,lat,lon,timestamp,amenity,name,tags,cuisine,qid
1380,49.049802,-122.311772,2019-12-15T14:47:03.000-08:00,construction,Sy's Vegan Bistro,"{'addr:housenumber': '32900', 'website': 'http...",,
5323,49.046901,-122.30706,2016-11-09T04:03:58.000-08:00,construction,,{'construction': 'bench'},,
8146,49.058713,-122.380389,2019-05-04T22:17:33.000-07:00,construction,Go-Grill,"{'construction': 'fast_food', 'website': 'http...",,
11933,49.057702,-122.31523,2019-12-05T13:53:29.000-08:00,construction,Cora,"{'addr:housenumber': '3068', 'website': 'https...",,Q2996960
15613,49.057777,-122.314627,2019-08-07T01:30:54.000-07:00,construction,AfterThoughts,"{'addr:housenumber': '3050', 'alt_name': 'Afte...",dessert,


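Further inspection reveals that some entries with a 'construction' amenity are indeed restaurants (for example, AfterThoughts has a 'dessert' cuisine tag). Not all of them are, however: the tags of one unnamed entry mark it as a bench.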
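One possible way to separate the two cases is to read the 'construction' key inside the tags, which in the output above names what is being built. The sketch below uses toy rows loosely modelled on that output; the `food_amenities` list and the rule itself are assumptions, not part of `transform`.

```python
import pandas as pd

# Toy rows; the tags are simplified and do not reproduce the real entries.
toy = pd.DataFrame({
    'amenity': ['construction', 'construction', 'construction'],
    'tags': [{'construction': 'fast_food'}, {'construction': 'bench'}, {}],
    'cuisine': [None, None, 'dessert'],
})

# Pull out what is under construction, if the tags say so.
under_construction = toy['tags'].apply(
    lambda t: t.get('construction') if isinstance(t, dict) else None
)

# Hypothetical rule: keep a row if it is building a food amenity
# or if it already carries a cuisine tag.
food_amenities = ['restaurant', 'fast_food', 'cafe']
likely_food = under_construction.isin(food_amenities) | toy['cuisine'].notna()
print(toy[likely_food])
```

Such a rule would drop the bench while keeping the other two toy rows, though entries with neither key nor cuisine tag would still need a manual look.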

In conclusion, the two approaches compare as follows.

Manual Approach
- may be time-consuming, or not viable for larger data with many unique amenities
- may miss restaurants with unusual amenities (false negatives)

Cuisine Tag Approach
- is more automatic
- may identify restaurants with unusual amenities
- may consider some non-restaurants as restaurants (false positives)
- may miss some restaurants (false negatives)
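The disagreement between the two approaches can be made explicit with set differences over the amenity lists printed earlier in this notebook. The variable names below are introduced only for this sketch.

```python
# Amenity keys as printed by the two cells earlier in this notebook.
manual_keys = {
    'restaurant', 'fast_food', 'cafe', 'pub', 'bar', 'ice_cream',
    'food_court', 'bbq', 'juice_bar', 'disused:restaurant', 'bistro',
}
cuisine_keys = {
    'cafe', 'restaurant', 'fast_food', 'bar', 'pub', 'ice_cream', 'construction',
}

# Amenities found manually that the cuisine tag approach missed.
only_manual = sorted(manual_keys - cuisine_keys)
# Amenities the cuisine tag approach found that were not picked manually.
only_cuisine = sorted(cuisine_keys - manual_keys)

print('Only in manual list:', only_manual)
print('Only in cuisine tag list:', only_cuisine)
```

The first list holds the amenities that the cuisine tag approach would miss, and the second holds the one amenity ('construction') that the manual list did not include.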
