## Introduction / Business Problem
A Toronto-based beer brewery recently started bottling its beer. Instead of using a distributor to sell and deliver beer to stores in Toronto, the brewery decided to distribute its beer independently. Since the brewery is young, the owner decided to distribute only to on-premise accounts (places where beer is sold and consumed "on-premise", namely restaurants, as opposed to grocery and convenience stores, which are "off-premise" accounts). The brewery is also constrained by owning only one delivery truck, which is not refrigerated. To maintain a quality product, the owner does not want to distribute further than 5 kilometers (km) from the brewery in an unrefrigerated truck.

The owner does not have any confirmed accounts yet, but he wants to understand what a delivery route would look like if he has the capacity to supply 50 stores within a 5 km radius of the brewery with one delivery per week. The owner wants to filter this projected route based on the 50 most popular on-premise stores within the search area. The end product will be a map with 50 store points segmented into clusters and color-coded by delivery day of the week (Monday through Friday).

The target audience for this data is business owners, distributors, and sales representatives who want to optimize their route selection for sales and delivery of their products. Routes can be filtered and refined in numerous ways to meet the needs of the specific customer. In this case a brewery owner is trying to understand distribution routes for his product with a limited delivery capacity.

## Data Methodology
The data I will need for this project will come from the Foursquare API. I will filter search results by the following criteria: within 5 km of the brewery; only include "restaurant" in the search query; and pick the top 50 venues. Once the data points have been collected, I will segment the stores into 5 geographic clusters, one for each day of the week (Monday through Friday). The clusters will define the separate routes for the one delivery truck each day (each store receives only one delivery per week).

While the data is bounded by a maximum search radius, it is also defined by popular restaurants within the search area. This may result in data points that are not evenly spread across the search area. This is okay, as the intent is to sell the product to the most popular places, not evenly across the 5 km radius. An example of a single-day route would be a color-coded cluster of 10 stores in a five-block area of downtown Toronto. A second cluster may be two blocks away, or it could be a kilometer or more away in a separate neighborhood.
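The 5 km constraint can also be checked directly on the collected coordinates. The following is a minimal sketch, assuming venues are available as `(name, lat, lon)` tuples; the helper names `haversine_km` and `within_radius` and the "far venue" entry are hypothetical and used only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(origin, venues, radius_km=5.0):
    """Keep only the (name, lat, lon) venues within radius_km of origin."""
    return [v for v in venues
            if haversine_km(origin[0], origin[1], v[1], v[2]) <= radius_km]


brewery = (43.6409668, -79.3851702)  # 255 Bremner Boulevard, Toronto
venues = [
    ("360 Restaurant", 43.642537, -79.387042),
    ("Lao Thai Restaurant", 43.642849, -79.427544),
    ("Hypothetical far venue", 43.75, -79.20),  # made-up point outside 5 km
]
nearby = within_radius(brewery, venues)
print([name for name, _, _ in nearby])
```

A straight-line check like this ignores the road network, so it is only a lower bound on the distance the truck actually drives.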

The method for clustering the data points and displaying them on a map, color-coded by delivery day of the week, will follow the functions we have already practiced in this course.

## Data Analysis

#### Load necessary libraries

In [16]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

from IPython.display import Image 
from IPython.core.display import HTML 
    
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium

import matplotlib.cm as cm
import matplotlib.colors as colors

print('Folium installed')
print('Libraries imported.')


In [68]:
CLIENT_ID = 'FZRTE5WPMVWLLLYJTWJYWIIUEKCPZKQPZRZR4H2BHI5NWWYP' # your Foursquare ID
CLIENT_SECRET = 'GI20Q4VR3PMGVOCDJ0LMO0J2DGM2MIOF2NHJX4E1G3PHBMFC' # your Foursquare Secret
VERSION = '20180604'
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: FZRTE5WPMVWLLLYJTWJYWIIUEKCPZKQPZRZR4H2BHI5NWWYP
CLIENT_SECRET:GI20Q4VR3PMGVOCDJ0LMO0J2DGM2MIOF2NHJX4E1G3PHBMFC


#### Find coordinates for our brewery and assign them to lat/long

The example brewery in this project will use location data for Steam Whistle Brewing, located in downtown Toronto at 255 Bremner Boulevard.

In [69]:
address = '255 Bremner Boulevard, Toronto'

geolocator = Nominatim(user_agent="my_app")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of the brewery are {}, {}.'.format(latitude, longitude))

The geographical coordinates of the brewery are 43.6409668, -79.3851702.


#### Search within 5 km of the brewery for restaurants

In [73]:
search_query = 'restaurant'
LIMIT = 50
radius = 5000
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=FZRTE5WPMVWLLLYJTWJYWIIUEKCPZKQPZRZR4H2BHI5NWWYP&client_secret=GI20Q4VR3PMGVOCDJ0LMO0J2DGM2MIOF2NHJX4E1G3PHBMFC&ll=43.6409668,-79.3851702&v=20180604&query=restaurant&radius=5000&limit=50'
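The same query string can be assembled from a parameter dictionary, which keeps the credentials out of the notebook source and lets the standard library handle escaping. This is a sketch rather than the method used above: the function name `build_search_url` and the environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions made for illustration.

```python
import os
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, lat, lng,
                     query="restaurant", radius_m=5000, limit=50,
                     version="20180604"):
    """Build the venue search URL with the same parameters as the cell above."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "query": query,
        "radius": radius_m,
        "limit": limit,
    }
    # safe="," keeps the comma in "ll=lat,lng" readable instead of "%2C"
    return SEARCH_ENDPOINT + "?" + urlencode(params, safe=",")


# Hypothetical variable names; placeholders are used when they are not set.
client_id = os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID")
client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET")

url = build_search_url(client_id, client_secret, 43.6409668, -79.3851702)
```

With this approach, printing `url` still reveals the secret, so it is safer to print only the non-credential parameters when debugging.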

In [74]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5eac579bf7706a001b1b2b92'},
 'response': {'venues': [{'id': '4ad4c05cf964a520dff520e3',
    'name': '360 Restaurant',
    'location': {'address': '301 Front St W',
     'crossStreet': '301 Front St. W',
     'lat': 43.642537317144566,
     'lng': -79.38704201569328,
     'labeledLatLngs': [{'label': 'display',
       'lat': 43.642537317144566,
       'lng': -79.38704201569328}],
     'distance': 230,
     'postalCode': 'M5V 2T6',
     'cc': 'CA',
     'city': 'Toronto',
     'state': 'ON',
     'country': 'Canada',
     'formattedAddress': ['301 Front St W (301 Front St. W)',
      'Toronto ON M5V 2T6',
      'Canada']},
    'categories': [{'id': '4bf58dd8d48988d123941735',
      'name': 'Wine Bar',
      'pluralName': 'Wine Bars',
      'shortName': 'Wine Bar',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/winery_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1588352904',
    'hasPerk': False},
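Before the response is turned into a dataframe, it can help to confirm that the request succeeded. The sketch below assumes only the response shape visible in the output above (`meta.code` and `response.venues`); the helper name `extract_venues` is hypothetical, and the sample payload is a trimmed copy of the first venue.

```python
def extract_venues(payload):
    """Return the list of venue dicts, or raise if the status code is not 200."""
    meta = payload.get("meta", {})
    if meta.get("code") != 200:
        raise RuntimeError("Venue search failed: {}".format(meta))
    return payload.get("response", {}).get("venues", [])


# Trimmed sample based on the response printed above
sample = {
    "meta": {"code": 200, "requestId": "5eac579bf7706a001b1b2b92"},
    "response": {"venues": [
        {"name": "360 Restaurant",
         "location": {"lat": 43.642537317144566,
                      "lng": -79.38704201569328,
                      "distance": 230}},
    ]},
}

venue_names = [v["name"] for v in extract_venues(sample)]
print(venue_names)
```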
  

#### Assign venue data to a pandas dataframe

In [75]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
df = json_normalize(venues)
df

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d123941735', 'name': 'W...",False,4ad4c05cf964a520dff520e3,301 Front St W,CA,Toronto,Canada,301 Front St. W,230,"[301 Front St W (301 Front St. W), Toronto ON ...","[{'label': 'display', 'lat': 43.64253731714456...",43.642537,-79.387042,,M5V 2T6,ON,360 Restaurant,v-1588352904,
1,"[{'id': '4bf58dd8d48988d1c4941735', 'name': 'R...",False,4ad4c060f964a5208af720e3,1 Blue Jays Way,CA,Toronto,Canada,inside the Renaissance Hotel,319,[1 Blue Jays Way (inside the Renaissance Hotel...,"[{'label': 'display', 'lat': 43.64147925628281...",43.641479,-79.389074,,M5V 1J4,ON,Arriba Restaurant,v-1588352904,68079429.0
2,"[{'id': '4bf58dd8d48988d10f941735', 'name': 'I...",False,4aef8854f964a5201cd921e3,287 King St. W,CA,Toronto,Canada,at John St.,710,"[287 King St. W (at John St.), Toronto ON M5V ...","[{'label': 'display', 'lat': 43.64646252150344...",43.646463,-79.389644,,M5V 1J5,ON,Aroma Fine Indian Restaurant,v-1588352904,
3,"[{'id': '4bf58dd8d48988d14e941735', 'name': 'A...",False,4bc3ad5e461576b0db037f32,Rogers Centre,CA,Toronto,Canada,,345,"[Rogers Centre, Toronto ON, Canada]","[{'label': 'display', 'lat': 43.64163507221119...",43.641635,-79.389365,,,ON,Sightlines Restaurant,v-1588352904,
4,"[{'id': '4bf58dd8d48988d1c4941735', 'name': 'R...",False,4b223f5af964a520ba4424e3,225 Front St W,CA,Toronto,Canada,in InterContinental Toronto Centre,421,[225 Front St W (in InterContinental Toronto C...,"[{'label': 'display', 'lat': 43.64474919591934...",43.644749,-79.385113,Entertainment District,M5V 2X3,ON,Azure Restaurant & Bar,v-1588352904,136175835.0
5,"[{'id': '4bf58dd8d48988d116941735', 'name': 'B...",False,4ad4c05df964a5203ff620e3,30 Mercer Street,CA,Toronto,Canada,at John St,707,"[30 Mercer Street (at John St), Toronto ON M5V...","[{'label': 'display', 'lat': 43.64563436248102...",43.645634,-79.391125,,M5V 1H3,ON,Victor Restaurant & Bar,v-1588352904,
6,"[{'id': '4bf58dd8d48988d14e941735', 'name': 'A...",False,4ada5d5bf964a520e92121e3,35 Church St,CA,Toronto,Canada,at Front St E,1272,"[35 Church St (at Front St E), Toronto ON M5E ...","[{'label': 'display', 'lat': 43.64882370529773...",43.648824,-79.373702,,M5E 1T3,ON,The Hot House Restaurant & Bar,v-1588352904,
7,"[{'id': '4bf58dd8d48988d1f5931735', 'name': 'D...",False,4b072e9df964a52009f922e3,280 Spadina Ave.,CA,Toronto,Canada,at Dundas St. W.,1681,"[280 Spadina Ave. (at Dundas St. W.), Toronto ...","[{'label': 'display', 'lat': 43.65278331265585...",43.652783,-79.398174,,,ON,Sky Dragon Chinese Restaurant 龍翔酒樓,v-1588352904,
8,"[{'id': '4bf58dd8d48988d1c4941735', 'name': 'R...",False,4ad4c05cf964a52006f620e3,37 King Street East,CA,Toronto,Canada,at Le Meridien King Edward Hotel,1164,[37 King Street East (at Le Meridien King Edwa...,"[{'label': 'display', 'lat': 43.64929834396347...",43.649298,-79.376431,,M5C 1E9,ON,Victoria's Restaurant,v-1588352904,498556908.0
9,"[{'id': '4bf58dd8d48988d14e941735', 'name': 'A...",False,4ad4c05ff964a52048f720e3,110 Chestnut Street,CA,Toronto,Canada,,1550,"[110 Chestnut Street, Toronto ON M5G 1R3, Canada]","[{'label': 'display', 'lat': 43.65488413420439...",43.654884,-79.385931,,M5G 1R3,ON,Hemispheres Restaurant & Bistro,v-1588352904,


#### Define data of interest

In [78]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in df.columns if col.startswith('location.')] + ['id']
df_filtered = df.loc[:, filtered_columns].copy()  # copy so the column assignment below modifies an independent dataframe

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
df_filtered['categories'] = df_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
df_filtered.columns = [column.split('.')[-1] for column in df_filtered.columns]

df_filtered.head()

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,360 Restaurant,Wine Bar,301 Front St W,CA,Toronto,Canada,301 Front St. W,230,"[301 Front St W (301 Front St. W), Toronto ON ...","[{'label': 'display', 'lat': 43.64253731714456...",43.642537,-79.387042,,M5V 2T6,ON,4ad4c05cf964a520dff520e3
1,Arriba Restaurant,Restaurant,1 Blue Jays Way,CA,Toronto,Canada,inside the Renaissance Hotel,319,[1 Blue Jays Way (inside the Renaissance Hotel...,"[{'label': 'display', 'lat': 43.64147925628281...",43.641479,-79.389074,,M5V 1J4,ON,4ad4c060f964a5208af720e3
2,Aroma Fine Indian Restaurant,Indian Restaurant,287 King St. W,CA,Toronto,Canada,at John St.,710,"[287 King St. W (at John St.), Toronto ON M5V ...","[{'label': 'display', 'lat': 43.64646252150344...",43.646463,-79.389644,,M5V 1J5,ON,4aef8854f964a5201cd921e3
3,Sightlines Restaurant,American Restaurant,Rogers Centre,CA,Toronto,Canada,,345,"[Rogers Centre, Toronto ON, Canada]","[{'label': 'display', 'lat': 43.64163507221119...",43.641635,-79.389365,,,ON,4bc3ad5e461576b0db037f32
4,Azure Restaurant & Bar,Restaurant,225 Front St W,CA,Toronto,Canada,in InterContinental Toronto Centre,421,[225 Front St W (in InterContinental Toronto C...,"[{'label': 'display', 'lat': 43.64474919591934...",43.644749,-79.385113,Entertainment District,M5V 2X3,ON,4b223f5af964a520ba4424e3


#### Review and visualize data collected into dataframe

In [77]:
df_filtered.name

0                                        360 Restaurant
1                                     Arriba Restaurant
2                          Aroma Fine Indian Restaurant
3                                 Sightlines Restaurant
4                                Azure Restaurant & Bar
5                               Victor Restaurant & Bar
6                        The Hot House Restaurant & Bar
7                    Sky Dragon Chinese Restaurant 龍翔酒樓
8                                 Victoria's Restaurant
9                       Hemispheres Restaurant & Bistro
10                       Goldstone Noodle Restaurant 金石
11                                    Evviva Restaurant
12                               Rol San Restaurant 龍笙棧
13                        Micheal's Restaurant and Deli
14                        Green Tea Restaurant Downtown
15                      Gonoe Sushi Japanese Restaurant
16                   Some Time BBQ Grill Restaurant 碳烤屋
17                             Restaurant at Num

In [88]:
toronto_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred around downtown Toronto

# add a red circle marker to represent your brewery (distribution point)
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Your Brewery',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(toronto_map)

# add the nearby restaurants as blue circle markers
for lat, lng, label in zip(df_filtered.lat, df_filtered.lng, df_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(toronto_map)

# display map
toronto_map

### Cluster the restaurants

In [89]:
import matplotlib.pyplot as plt # plotting library
# backend for rendering plots within the browser
%matplotlib inline 

from sklearn.cluster import KMeans 
from sklearn.datasets import make_blobs # the samples_generator submodule was removed in newer scikit-learn releases

In [101]:
df_filtered.head()

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,360 Restaurant,Wine Bar,301 Front St W,CA,Toronto,Canada,301 Front St. W,230,"[301 Front St W (301 Front St. W), Toronto ON ...","[{'label': 'display', 'lat': 43.64253731714456...",43.642537,-79.387042,,M5V 2T6,ON,4ad4c05cf964a520dff520e3
1,Arriba Restaurant,Restaurant,1 Blue Jays Way,CA,Toronto,Canada,inside the Renaissance Hotel,319,[1 Blue Jays Way (inside the Renaissance Hotel...,"[{'label': 'display', 'lat': 43.64147925628281...",43.641479,-79.389074,,M5V 1J4,ON,4ad4c060f964a5208af720e3
2,Aroma Fine Indian Restaurant,Indian Restaurant,287 King St. W,CA,Toronto,Canada,at John St.,710,"[287 King St. W (at John St.), Toronto ON M5V ...","[{'label': 'display', 'lat': 43.64646252150344...",43.646463,-79.389644,,M5V 1J5,ON,4aef8854f964a5201cd921e3
3,Sightlines Restaurant,American Restaurant,Rogers Centre,CA,Toronto,Canada,,345,"[Rogers Centre, Toronto ON, Canada]","[{'label': 'display', 'lat': 43.64163507221119...",43.641635,-79.389365,,,ON,4bc3ad5e461576b0db037f32
4,Azure Restaurant & Bar,Restaurant,225 Front St W,CA,Toronto,Canada,in InterContinental Toronto Centre,421,[225 Front St W (in InterContinental Toronto C...,"[{'label': 'display', 'lat': 43.64474919591934...",43.644749,-79.385113,Entertainment District,M5V 2X3,ON,4b223f5af964a520ba4424e3


#### Create new dataframe with essential data

In [125]:
df_refined = df_filtered[['name','lat','lng']]
df_refined.head()

Unnamed: 0,name,lat,lng
0,360 Restaurant,43.642537,-79.387042
1,Arriba Restaurant,43.641479,-79.389074
2,Aroma Fine Indian Restaurant,43.646463,-79.389644
3,Sightlines Restaurant,43.641635,-79.389365
4,Azure Restaurant & Bar,43.644749,-79.385113


#### Initialize K-means

In [116]:
# set number of clusters
kclusters = 5

kl_clustering = df_refined.drop(['name'], axis=1)  # keep only lat/lng as clustering features

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kl_clustering)

# check cluster labels generated for the first 49 of the 50 rows in the dataframe
kmeans.labels_[0:49]

array([2, 2, 2, 2, 2, 2, 0, 3, 0, 4, 3, 2, 3, 3, 3, 2, 3, 0, 3, 3, 3, 3,
       2, 3, 0, 3, 2, 2, 1, 2, 2, 4, 4, 1, 0, 2, 2, 0, 2, 0, 1, 2, 0, 2,
       2, 4, 4, 2, 1], dtype=int32)
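K-means groups venues by location and does not balance the number of venues per cluster, so the delivery days may carry very different workloads. As a quick check, the sketch below counts the 49 labels printed above (the slice `[0:49]` omits the 50th venue) using only the standard library.

```python
from collections import Counter

# The 49 labels shown in the output above, copied verbatim
labels = [
    2, 2, 2, 2, 2, 2, 0, 3, 0, 4, 3, 2, 3, 3, 3, 2, 3, 0, 3, 3, 3, 3,
    2, 3, 0, 3, 2, 2, 1, 2, 2, 4, 4, 1, 0, 2, 2, 0, 2, 0, 1, 2, 0, 2,
    2, 4, 4, 2, 1,
]

cluster_sizes = Counter(labels)
for cluster, size in cluster_sizes.most_common():
    print("cluster {}: {} venues".format(cluster, size))
```

Among these 49 labels, cluster 2 holds 20 venues while cluster 1 holds only 4, which suggests that one delivery day would be far busier than another unless the clusters are rebalanced.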

#### Add cluster labels

In [117]:
df_refined.insert(0, 'Cluster Labels', kmeans.labels_)

df_refined.head()

Unnamed: 0,Cluster Labels,name,lat,lng
0,2,360 Restaurant,43.642537,-79.387042
1,2,Arriba Restaurant,43.641479,-79.389074
2,2,Aroma Fine Indian Restaurant,43.646463,-79.389644
3,2,Sightlines Restaurant,43.641635,-79.389365
4,2,Azure Restaurant & Bar,43.644749,-79.385113


#### Sort venues by cluster

In [122]:
# sort the results by Cluster Labels
print(df_refined.shape)
df_refined = df_refined.sort_values("Cluster Labels")  # reassign instead of sorting in place to avoid pandas' SettingWithCopyWarning
df_refined

(50, 4)


Unnamed: 0,Cluster Labels,name,lat,lng
24,0,Restaurant | Adelaide,43.649563,-79.38059
17,0,Restaurant at Number One,43.639044,-79.377653
34,0,Docks Restaurant & Night Club The,43.641806,-79.354171
39,0,ResoSolutions - Online Restaurant Booking Syst...,43.642566,-79.375255
8,0,Victoria's Restaurant,43.649298,-79.376431
6,0,The Hot House Restaurant & Bar,43.648824,-79.373702
37,0,Lusso Restaurant and Bar,43.638094,-79.380666
42,0,Bottom Line Restaurant & Bar,43.646286,-79.378086
40,1,Lao Thai Restaurant,43.642849,-79.427544
33,1,The Lakeview Restaurant,43.649435,-79.42039


#### Visualize clusters on map of Toronto

In [124]:
# create map centred on the brewery
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, name, cluster in zip(df_refined['lat'], df_refined['lng'], df_refined['name'], df_refined['Cluster Labels']):
    label = folium.Popup(str(name) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run from 0 to kclusters-1, so index directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.5).add_to(map_clusters)

map_clusters
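The map above colors venues by cluster number; to present them by delivery day, each cluster label can be mapped to a weekday. The sketch below is one possible mapping, not part of the original analysis: the function name `assign_delivery_days` is hypothetical, and because k-means labels carry no inherent order, which cluster lands on which day is an arbitrary choice.

```python
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def assign_delivery_days(cluster_labels, weekdays=WEEKDAYS):
    """Map each distinct cluster label to a weekday, in ascending label order."""
    distinct = sorted(set(cluster_labels))
    if len(distinct) > len(weekdays):
        raise ValueError("more clusters than delivery days")
    return {label: weekdays[i] for i, label in enumerate(distinct)}


labels = [2, 2, 0, 3, 4, 1]  # small illustrative sample of cluster labels
day_for_cluster = assign_delivery_days(labels)

# Example popup text for a venue in cluster 2
popup_text = "{} ({})".format("360 Restaurant", day_for_cluster[2])
print(popup_text)
```

The resulting dictionary could be used when building each marker's popup, so the map reads "Wednesday" rather than "Cluster 2".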

## Results
The k-means clustering divided the 50 restaurants in downtown Toronto into five distinct groupings, one for each delivery day of the week, which gives the brewery a starting point for planning its daily sales route based on the geographic proximity of the stores to one another. The five clusters are separated geographically (based on latitude/longitude data) and denoted by different colors. The overall objective of this project was achieved, likely providing value to the brewery owner by decreasing planning and experimentation time and by avoiding unnecessary costs due to poor delivery route selection.


## Discussion 
The initial data collection produced unexpectedly clear delineations in the geographic spacing of the restaurants. This made clustering the venues much easier and the results more predictable. This outcome cannot be expected in most other situations or cities. A larger collection of venues over a larger radius may have provided more interesting results to observe and experiment with. This can be considered a proof of concept that proves effective on a small scale.

Additional variables can be added to make the route selection more complex, and possibly more effective, such as traffic patterns along the route, hours of business for the venues, and the average size of the weekly delivery for each cluster. One major constraint for the brewery is its single delivery truck, yet the clustering data does not provide the average delivery size (in beer cases or kegs), which would likely influence the route selection and the number of return trips to the brewery on a single day.
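Clustering decides which stores share a delivery day, but not the order in which the truck visits them. As one illustration of a further refinement, the sketch below orders a few cluster 0 venues from the sorted table with a greedy nearest-neighbour heuristic on straight-line distance. This is an assumption-laden sketch: the function names are hypothetical, the heuristic does not guarantee the shortest route, and traffic, opening hours, and load size are ignored.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0088 * math.asin(math.sqrt(h))


def nearest_neighbour_route(depot, stops):
    """Greedy ordering: always drive to the closest stop not yet visited."""
    remaining = dict(stops)  # name -> (lat, lon); copied so the input is untouched
    position, route = depot, []
    while remaining:
        name = min(remaining, key=lambda n: haversine_km(position, remaining[n]))
        position = remaining.pop(name)
        route.append(name)
    return route


brewery = (43.6409668, -79.3851702)
stops = {
    "Restaurant at Number One": (43.639044, -79.377653),
    "Victoria's Restaurant": (43.649298, -79.376431),
    "The Hot House Restaurant & Bar": (43.648824, -79.373702),
    "Lusso Restaurant and Bar": (43.638094, -79.380666),
    "Bottom Line Restaurant & Bar": (43.646286, -79.378086),
}

route = nearest_neighbour_route(brewery, stops)
for position_in_route, name in enumerate(route, start=1):
    print(position_in_route, name)
```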

## Conclusion
This proof of concept was effective on a small scale for determining a sales route, by day of the week, for a small brewery in downtown Toronto. The clusters were clearly distinguished by geographic location and color. The objective of this project was to provide a model for a potential delivery/sales route for the brewery owner's information and planning purposes. Once the owner gathers more information, develops sales accounts, and increases the size of his distribution and delivery capacity, even greater value can be gleaned from a k-means clustering exercise such as this one.

The target audience for this type of project is sales representatives and organizations engaged in all types of commerce and distribution. Collecting location data and clustering selected data points can optimize a single sales representative's daily sales route, and it can improve an entire beverage distributor's bottom line by reducing items such as the following: cost of gas; wear and tear on trucks; accidents on the road; and driver overtime pay. This one project demonstrated the successful execution of a concept that can be broadened in size and variables to provide value for many different businesses.