CAPSTONE PROJECT:
======

## Trending locations closest to bike-sharing docking stations

### I. Introduction

As people are becoming more health and environmentally conscious, shared biking systems in metropolitan areas have become increasingly more popular over the years, with a steady increase in the number of users and docking stations. 

Being a past user myself, I would always map out my routes ahead of time and made sure to check out what was nearby my destination station. Knowing the city, this task is easy. However, what if you are a visitor in a city and don’t know where to go or what to do? Wouldn’t it be great to have an easy way to find out the docking stations that have the highest number popular venues/activities in order to maximize your time within a new city you wish to discover?

For this capstone project, I will take a look at the most trending venues/activities within a 100 meter radius of a docking station in the city of Montreal. This information is useful for tourists visiting the city, who wish to discover the city using the shared-biking system Bixi, and are time-limited. However, the methodology used to discover the most trending locations in Montreal can be applied to other cities with a bike-sharing system such as Bixi. So the question to be answered through this project is:

What are the docking station locations in the city of Montreal that have the most trending venues/activities nearby?

### II. Data 

The dataset that will be required to answer the question stated in the introduction includes the locations of the docking stations, which is open data available on the Bixi website of Montreal in .csv format. The .csv file contains the most recent list of the docking stations in 2019 for the Bixi bikes, with details about each station (station code, station name, latitude and longitude). This information can be downloaded from the Bixi website: https://www.bixi.com/en/open-data.

Foursquare location data will be used as well to determine the most popular venues/activities in the city of Montreal. The location data will then be used to determine the most trending venues within a 100 m radius of each of the Bixi docking station locations provided in the Bixi dataset. The venues will then be analyzed by location in order to determine the docking station locations with the most trending venues. As a final result, the venues will be clustered by location using *k-means clustering* and will then be mapped by cluster along with the nearest Bixi docking station.

#### Let's begin with importing the data from the Bixi website and saving it to a dataframe named *stations_df*.

In [2]:
# Import necessary libraries
import pandas as pd
import numpy as np

In [3]:
stations_df = pd.read_csv('data/Stations_2019.csv')
stations_df

FileNotFoundError: [Errno 2] File b'data/Stations_2019.csv' does not exist: b'data/Stations_2019.csv'

In [13]:
# The code was removed by Watson Studio for sharing.

    Code                                    name   latitude  longitude
0  10002  Métro Charlevoix (Centre / Charlevoix)  45.478228 -73.569651
1   4000                  Jeanne-d'Arc / Ontario  45.549598 -73.541874
2   4001                     Graham / Brookfield  45.520075 -73.629776
3   4002                      Graham / Wicksteed  45.516937 -73.640483
4   5002                St-Charles / Montarville  45.533682 -73.515261
(615, 4)


As we can see there are a total of 615 docking stations for Bixi bikes in the Montreal area.

#### Next, we'll use geopy library to get the latitude and longitude values of the city of Montreal.

In [7]:
# import necessary libraries
import json
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim
import requests
from pandas.io.json import json_normalize
!pip install folium
import folium

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geopy-1.20.0               |             py_0          57 KB  conda-forge
    ca-certificates-2019.6.16  |       hecc5488_0         145 KB  conda-forge
    certifi-2019.6.16          |           py36_1         149 KB  conda-forge
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    openssl-1.1.1c             |       h516909a_0         2.1 MB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.49-py_0         conda-forge
    geopy:           1.20.0-py_0       conda-forge

The following packages will be UPDATED:

    ca-

In [9]:
mtl_address = 'Montreal, QC'

geolocator = Nominatim(user_agent="mtl_explorer")
mtl_location = geolocator.geocode(mtl_address)
mtl_latitude = mtl_location.latitude
mtl_longitude = mtl_location.longitude
print('The geographical coordinates of the city of Montreal are {}, {}.'.format(mtl_latitude, mtl_longitude))

The geographical coordinates of the city of Montreal are 45.4972159, -73.6103642.


Next, we are going to start utilizing the Foursquare API to explore the trending venues in Montreal and segment them.

#### First, let's define Foursquare Credentials and Version.

In [10]:
CLIENT_ID = 'SAWYPKIV3SELVRC0KO1GJ3PRKETDQ3AXYAYYG10JQT0P2I4O'
CLIENT_SECRET = 'VUOM0J4N1XKS1FLKASWEL44AIJIQCI3MY2LF5CEWVKRCSTKC'
VERSION = '20190722'

#### Let's get the top 500 **trending** venues within a 20 km radius of Montreal center.

In [11]:
limit = 500
radius = 20000
section = 'trending'
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, mtl_latitude, mtl_longitude, VERSION, radius, limit, section)

Send the GET request and examine the resutls

In [12]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d370a1523bb8e002c0d7338'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Montreal',
  'headerFullLocation': 'Montreal',
  'headerLocationGranularity': 'city',
  'totalResults': 244,
  'suggestedBounds': {'ne': {'lat': 45.67721608000018,
    'lng': -73.35404683487533},
   'sw': {'lat': 45.31721571999982, 'lng': -73.86668156512468}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4ad8f749f964a520871621e3',
       'name': 'Parc du Mont-Royal',
       'location': {'address': '1260 Chemin Remembrance',
        'lat': 45.50407921694641,
        'lng': -73.58732075575296,
        'labeledLatLngs': [{'label': 'display',
          'lat': 45.504079216

The following **get_category_type** function is borrowed from a previous lab and will be used to extract the pertinent information from each venue.

In [14]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

#### Next we clean up the json file using *Pandas*.

In [15]:
venues = results['response']['groups'][0]['items']
    
mtl_venues = json_normalize(venues)

filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
mtl_venues = mtl_venues.loc[:, filtered_columns]

mtl_venues['venue.categories'] = mtl_venues.apply(get_category_type, axis=1)

mtl_venues.columns = [col.split(".")[-1] for col in mtl_venues.columns]

mtl_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Parc du Mont-Royal,Park,45.504079,-73.587321
1,CEPSUM,College Gym,45.50832,-73.612664
2,Parc Westmount Park,Park,45.481574,-73.597476
3,Chalet du Mont-Royal,Historic Site,45.503636,-73.587095
4,Musée des beaux-arts de Montréal (MBAM),Art Museum,45.498436,-73.579715


In [16]:
mtl_venues.shape

(100, 4)

In [17]:
mtl_venues.to_excel("mtl_trending_venues.xlsx")

As we can see, despite the fact that I set a limit of 500 venues within a 20 km radius, Foursquare returned only 100 venues because that is the maximum supported amount of return values.

Now that we have all the data we need and it is cleaned up, let's start exploring it visually.

## III. Exploratory Data Analysis

#### Create a map using Folium to visualize the locations of the Bixi docking stations (in blue) and the trending venues (in red).

Let's first define latitudes and longitude values for both stations and trending venues, for simplicity.

In [18]:
station_name = stations_df['name']
station_lat = stations_df['latitude']
station_lng = stations_df['longitude']
venue_name = mtl_venues['name']
venue_lat = mtl_venues['lat']
venue_lng = mtl_venues['lng']

In [19]:
# create map of Montreal using latitude and longitude values
mtl_map = folium.Map(location=[mtl_latitude, mtl_longitude], zoom_start=11)

# add markers to map to visualize the Bixi docking stations
for lat, lng, label in zip(station_lat, station_lng, station_name):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7,
        parse_html=False).add_to(mtl_map)

# add markers to map to visualize the trending venues in Montreal
for lat, lng, label in zip(venue_lat, venue_lng, venue_name):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.7,
        parse_html=False).add_to(mtl_map)  

mtl_map

Based on the map, it is clear that that are a lot more bixi docking stations (615) than trending venues (100), and the trending venues are concentrated in Montreal's downtown core.

Therefore, it would be relevant to filter the docking stations and analyze the ones that are within 100 meters of a trending venue.

In order to do this, we will first check the data types for each dataframe, to ensure they are in the correct format. Then we will process to create a series of functions and iterations to determine the distance between each venue and docking station, return only the venues and stations within 100 meter radius, and store all this information into a new dataframe.

#### Let's check the data types.

In [20]:
mtl_venues.dtypes

name           object
categories     object
lat           float64
lng           float64
dtype: object

In [21]:
stations_df.dtypes

Code           int64
name          object
latitude     float64
longitude    float64
dtype: object

Everything looks good.

#### Now let's create a function that will calculate the distance between two locations. It uses the Haversine formula.

In [22]:
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    
    # necessary to convert coordinates to radians in order to use trig functions
    x1, y1, x2, y2 = map(radians, [lon1, lat1, lon2, lat2]) 

    # haversine formula 
    dlon = x2 - x1 
    dlat = y2 - y1 
    a = sin(dlat/2)**2 + cos(y1) * cos(y2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    r = 6371000 # Radius of earth in meters
    distance = c * r
    
    # will return distance in meters
    return(distance)

#### In this next step, we will iterate for all the venues and docking stations in order to calculate the distance between each one, return only values within a distance of 100 meters, and stores all this information into a new dataframe we will call *locations_df*.

In [23]:
venue_name = []
venue_lat = []
venue_lng = []
venue_cat = []
station_name = []
station_lat = []
station_lng = []
distance = []
i = 0
j = 0
for i in range(0, len(mtl_venues['name'])): # iterates all rows in the mtl_venues dataframe
    
    name1 = mtl_venues.iat[i,0] 
    lat1 = mtl_venues.iat[i,2]
    lng1 = mtl_venues.iat[i,3] 
    cat1 = mtl_venues.iat[i,1]
    
    for j in range(0, len(stations_df['name'])): # iterates all rows in the stations_df dataframe
        
        name2 = stations_df.iat[j,1]
        lat2 = stations_df.iat[j,2]
        lng2 = stations_df.iat[j,3]
        d = haversine(lng1, lat1, lng2, lat2)
        
        if (d <= 100): # append values only if distance is within 100 meters
            venue_name.append(name1),
            venue_lat.append(lat1),
            venue_lng.append(lng1),
            venue_cat.append(cat1),
            station_name.append(name2),
            station_lat.append(lat2),
            station_lng.append(lng2),
            distance.append(d)
        
        i = i+1
    
    j = j+1

df_locations = pd.DataFrame({'Venue': venue_name, 
                             'Venue Latitude': venue_lat, 
                             'Venue Longitude': venue_lng, 
                             'Venue Category': venue_cat,
                             'Docking Station': station_name, 
                             'Docking Station Latitude': station_lat, 
                             'Docking Station Longitude': station_lng,
                             'Distance (m)': distance})

df_locations.head()

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Docking Station,Docking Station Latitude,Docking Station Longitude,Distance (m)
0,Moksha Yoga NDG,45.476606,-73.621498,Yoga Studio,de Monkland / Girouard,45.477107,-73.621438,55.899685
1,Maison Boulud,45.500062,-73.578324,French Restaurant,de la Montagne / Sherbrooke,45.499745,-73.579034,65.604786
2,The Ritz-Carlton Montréal,45.500191,-73.578127,Hotel,de la Montagne / Sherbrooke,45.499745,-73.579034,86.358319
3,Tiffany & Co.,45.499782,-73.578403,Jewelry Store,de la Montagne / Sherbrooke,45.499745,-73.579034,49.303327
4,Damas,45.522596,-73.613112,Mediterranean Restaurant,Bloomfield / Van Horne,45.522586,-73.612658,35.437825


In [24]:
df_locations.shape

(65, 8)

As we can see, there are 65 venues that are within 100 meters of a docking station. This information also tells us that there are 35 venues that are not within 100 meters of a docking station. Let's further explore this information.

#### Let's filter the dataframe so we have only unique venues.

In [25]:
df_locations.groupby('Docking Station').count()

Unnamed: 0_level_0,Venue,Venue Latitude,Venue Longitude,Venue Category,Docking Station Latitude,Docking Station Longitude,Distance (m)
Docking Station,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Atwater / Greene,2,2,2,2,2,2,2
Bernard / Jeanne-Mance,1,1,1,1,1,1,1
Berri / Rachel,1,1,1,1,1,1,1
Bloomfield / Van Horne,1,1,1,1,1,1,1
Clark / Prince-Arthur,1,1,1,1,1,1,1
Clark / Rachel,1,1,1,1,1,1,1
Crescent / Ste-Catherine,1,1,1,1,1,1,1
Crescent / de Maisonneuve,1,1,1,1,1,1,1
Cypress / Peel,1,1,1,1,1,1,1
Duluth / St-Laurent,1,1,1,1,1,1,1


In [26]:
print('There are {} unique categories.'.format(len(df_locations['Venue Category'].unique())))

There are 37 unique categories.


In [27]:
print('There are {} unique venues.'.format(len(df_locations['Venue'].unique())))

There are 55 unique venues.


In [28]:
print('There are {} unique docking station locations.'.format(len(df_locations['Docking Station'].unique())))

There are 48 unique docking station locations.


In [29]:
# sort the dataframe by distance in ascending value, save to new dataframe
sorted_locations = df_locations.sort_values(by='Distance (m)', ascending=True)
sorted_locations.head()

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Docking Station,Docking Station Latitude,Docking Station Longitude,Distance (m)
42,Pizzeria Magpie,45.524711,-73.595744,Pizza Place,Maguire / St-Laurent,45.524628,-73.595811,10.583038
28,Espace Nomad,45.521402,-73.589277,Spa,Villeneuve / St-Laurent,45.521342,-73.589419,12.969461
5,Kazu,45.493014,-73.580203,Japanese Restaurant,Ste-Catherine / St-Marc,45.492897,-73.580294,14.775945
58,Satay Brothers,45.480061,-73.577394,Asian Restaurant,Marché Atwater,45.480208,-73.577599,22.883836
61,Les Co'Pains d'Abord,45.522078,-73.577743,Bakery,Rivard / Rachel,45.522278,-73.577591,25.170705


Now we'll filter the dataframe in order to only keep the unique venues. Out of the duplicates, only the one with the smallest distance will be kept. To do this, we'll first sort the dataframe by distance in ascending order, then drop the duplicates and only keep the first (which will have the smallest distance).

In [30]:
# filter the dataframe to remove duplicate venues, and keep only first of duplicates, save to new dataframe
filtered_locations = sorted_locations.drop_duplicates(subset='Venue', keep='first')
filtered_locations

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Docking Station,Docking Station Latitude,Docking Station Longitude,Distance (m)
42,Pizzeria Magpie,45.524711,-73.595744,Pizza Place,Maguire / St-Laurent,45.524628,-73.595811,10.583038
28,Espace Nomad,45.521402,-73.589277,Spa,Villeneuve / St-Laurent,45.521342,-73.589419,12.969461
5,Kazu,45.493014,-73.580203,Japanese Restaurant,Ste-Catherine / St-Marc,45.492897,-73.580294,14.775945
58,Satay Brothers,45.480061,-73.577394,Asian Restaurant,Marché Atwater,45.480208,-73.577599,22.883836
61,Les Co'Pains d'Abord,45.522078,-73.577743,Bakery,Rivard / Rachel,45.522278,-73.577591,25.170705
6,Mandy's,45.498286,-73.577895,Salad Place,Crescent / de Maisonneuve,45.498112,-73.577615,29.179987
55,Café Nocturne,45.513566,-73.572849,Café,Clark / Prince-Arthur,45.513303,-73.572961,30.537609
50,Il Focolaio,45.504009,-73.568213,Pizza Place,Square Phillips,45.503738,-73.568106,31.29072
12,Club Sportif MAA,45.5007,-73.574817,Gym,de Maisonneuve / Peel,45.500951,-73.574578,33.489149
15,Darling,45.518995,-73.584061,Lounge,Vallières / St-Laurent,45.518967,-73.583616,34.825866


In [31]:
len(filtered_locations['Docking Station'].unique())

41

We have now a dataframe with 55 unique venues, and the closest docking station to each venue.

In [32]:
filtered_locations.reset_index(drop=True, inplace=True)
filtered_locations.head()

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Docking Station,Docking Station Latitude,Docking Station Longitude,Distance (m)
0,Pizzeria Magpie,45.524711,-73.595744,Pizza Place,Maguire / St-Laurent,45.524628,-73.595811,10.583038
1,Espace Nomad,45.521402,-73.589277,Spa,Villeneuve / St-Laurent,45.521342,-73.589419,12.969461
2,Kazu,45.493014,-73.580203,Japanese Restaurant,Ste-Catherine / St-Marc,45.492897,-73.580294,14.775945
3,Satay Brothers,45.480061,-73.577394,Asian Restaurant,Marché Atwater,45.480208,-73.577599,22.883836
4,Les Co'Pains d'Abord,45.522078,-73.577743,Bakery,Rivard / Rachel,45.522278,-73.577591,25.170705


In [33]:
filtered_locations.shape

(55, 8)

After filtering and examining the data, it shows us that we 55 unique venues and 48 unique docking station locations. Some stations will have up to 3 nearby trending venues. Let's see if we can cluster the venues into a *K* number of clusters and determine the closest station to each cluster center. This will be the final result of our analysis. 

Now let's visualize the filtered data.

In [34]:
# create map of stations and nearby venues using latitude and longitude values
venues_map = folium.Map(location=[mtl_latitude, mtl_longitude], zoom_start=13)

station_name = filtered_locations['Docking Station']
station_lat = filtered_locations['Docking Station Latitude']
station_lng = filtered_locations['Docking Station Longitude']
venue_name = filtered_locations['Venue']
venue_lat = filtered_locations['Venue Latitude']
venue_lng = filtered_locations['Venue Longitude']

# add markers to map to visualize the Bixi docking stations
for lat, lng, label in zip(station_lat, station_lng, station_name):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7,
        parse_html=False).add_to(venues_map)

# add markers to map to visualize the trending venues in Montreal
for lat, lng, label in zip(venue_lat, venue_lng, venue_name):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.7,
        parse_html=False).add_to(venues_map)  

venues_map

First, I will use **k-means** clustering with HDBSCAN for this task in order to determine the cluster of venues by location. What’s great about HDBSCAN is that it takes into account the distance between geo-locations, as the earth isn't flat. It considers the Haversine function we used previously to determine the distance between venues and docking station locations. Also, I won’t need to specify a K number of clusters, it will return the best number of clusters according to a minimum number of items in a cluster, that I will specify as 3.

### Using *k-means* for clustering stations

What is important to note is that using HDBSCAN does not return cluster centers for each cluster. It returns only segmented clusters, which I will visualize using Folium map and then visually determine the most representative station location (based on location to each venue in the cluster, and the number of closest venues). This essentially will be the final result we are looking for: the location of docking stations that maximize the amount of trending venue nearby.

In [35]:
from sklearn.cluster import KMeans
!conda install -c conda-forge hdbscan --yes
!pip install hdbscan
import hdbscan

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - hdbscan


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    hdbscan-0.8.22             |   py36hd352d35_1         636 KB  conda-forge
    joblib-0.13.2              |             py_0         180 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         816 KB

The following NEW packages will be INSTALLED:

    hdbscan: 0.8.22-py36hd352d35_1 conda-forge
    joblib:  0.13.2-py_0           conda-forge


Downloading and Extracting Packages
hdbscan-0.8.22       | 636 KB    | ##################################### | 100% 
joblib-0.13.2        | 180 KB    | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Colle

In [39]:
# define number of clusters and X as dataset to be clustered

X = filtered_locations[['Venue Latitude','Venue Longitude']].values

# Using fit_predict to cluster the dataset
rads = np.radians(X)
kmeans = hdbscan.HDBSCAN(min_cluster_size=3, metric='haversine')
cluster_label = kmeans.fit_predict(rads)

In [40]:
clustered = pd.concat([filtered_locations.reset_index(), pd.DataFrame({'Cluster': cluster_label})], axis=1)
clustered.drop('index', axis=1, inplace=True)
clustered

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Docking Station,Docking Station Latitude,Docking Station Longitude,Distance (m),Cluster
0,Pizzeria Magpie,45.524711,-73.595744,Pizza Place,Maguire / St-Laurent,45.524628,-73.595811,10.583038,1
1,Espace Nomad,45.521402,-73.589277,Spa,Villeneuve / St-Laurent,45.521342,-73.589419,12.969461,-1
2,Kazu,45.493014,-73.580203,Japanese Restaurant,Ste-Catherine / St-Marc,45.492897,-73.580294,14.775945,-1
3,Satay Brothers,45.480061,-73.577394,Asian Restaurant,Marché Atwater,45.480208,-73.577599,22.883836,0
4,Les Co'Pains d'Abord,45.522078,-73.577743,Bakery,Rivard / Rachel,45.522278,-73.577591,25.170705,-1
5,Mandy's,45.498286,-73.577895,Salad Place,Crescent / de Maisonneuve,45.498112,-73.577615,29.179987,4
6,Café Nocturne,45.513566,-73.572849,Café,Clark / Prince-Arthur,45.513303,-73.572961,30.537609,3
7,Il Focolaio,45.504009,-73.568213,Pizza Place,Square Phillips,45.503738,-73.568106,31.29072,2
8,Club Sportif MAA,45.5007,-73.574817,Gym,de Maisonneuve / Peel,45.500951,-73.574578,33.489149,5
9,Darling,45.518995,-73.584061,Lounge,Vallières / St-Laurent,45.518967,-73.583616,34.825866,6


In [41]:
clustered['Cluster'].value_counts()

-1    16
 1     7
 5     6
 2     5
 6     5
 4     5
 0     5
 7     3
 3     3
Name: Cluster, dtype: int64

The result of the k-means algorithm returns a total of 8 clusters. Unsurprisingly, we end up with several points being flagged as noise (where the cluster label is -1). Since the minimum number of points for a cluster was set to 3, isolated locations are categorized as noise.

Our goal is to identify the docking station locations with the most nearby venues. Therefore, it makes sense to drop the isolated locations (venues where *Cluster = -1*). 

We'll continue to further explore the data to see how we can merger clusters/locations in a meaningful way.

Let's begin by dropping the isolated venues, where *Cluster=-1*.

In [42]:
clustered.drop(clustered[clustered['Cluster'] == -1 ].index, inplace=True)

In [43]:
clustered.sort_values(by=['Cluster'], inplace=True)
clustered.reset_index(drop=True, inplace=True)
clustered

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Docking Station,Docking Station Latitude,Docking Station Longitude,Distance (m),Cluster
0,Liverpool House,45.482893,-73.575256,Restaurant,Duvernay / Charlevoix,45.48204,-73.574863,99.697293,0
1,Satay Brothers,45.480061,-73.577394,Asian Restaurant,Marché Atwater,45.480208,-73.577599,22.883836,0
2,Le Vin Papillon,45.482763,-73.575514,Wine Bar,Duvernay / Charlevoix,45.48204,-73.574863,95.059025,0
3,Marché Atwater,45.47979,-73.576775,Market,Marché Atwater,45.480208,-73.577599,79.278863,0
4,SAQ Sélection,45.480446,-73.576695,Liquor Store,Marché Atwater,45.480208,-73.577599,75.253797,0
5,naada yoga,45.526864,-73.597479,Yoga Studio,St-Dominique / St-Viateur,45.526557,-73.598276,70.886785,1
6,Drawn & Quarterly,45.524748,-73.604614,Bookstore,Bernard / Jeanne-Mance,45.524286,-73.604973,58.393492,1
7,Salon de thé Cardinal / Cardinal Tea Room,45.524721,-73.596464,Tea Room,Maguire / St-Laurent,45.524628,-73.595811,51.894803,1
8,St-Viateur Bagel (La Maison du Bagel),45.522573,-73.601878,Bagel Shop,Jeanne-Mance / St-Viateur,45.523026,-73.60184,50.538247,1
9,Pizzeria Magpie,45.524711,-73.595744,Pizza Place,Maguire / St-Laurent,45.524628,-73.595811,10.583038,1


Let's explore each individual cluster.

#### Cluster 1

In [44]:
cluster1 = clustered.loc[clustered['Cluster'] == 0]
cluster1

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Docking Station,Docking Station Latitude,Docking Station Longitude,Distance (m),Cluster
0,Liverpool House,45.482893,-73.575256,Restaurant,Duvernay / Charlevoix,45.48204,-73.574863,99.697293,0
1,Satay Brothers,45.480061,-73.577394,Asian Restaurant,Marché Atwater,45.480208,-73.577599,22.883836,0
2,Le Vin Papillon,45.482763,-73.575514,Wine Bar,Duvernay / Charlevoix,45.48204,-73.574863,95.059025,0
3,Marché Atwater,45.47979,-73.576775,Market,Marché Atwater,45.480208,-73.577599,79.278863,0
4,SAQ Sélection,45.480446,-73.576695,Liquor Store,Marché Atwater,45.480208,-73.577599,75.253797,0


#### Cluster 2

In [46]:
cluster2 = clustered.loc[clustered['Cluster'] == 1]
cluster2

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Docking Station,Docking Station Latitude,Docking Station Longitude,Distance (m),Cluster
5,naada yoga,45.526864,-73.597479,Yoga Studio,St-Dominique / St-Viateur,45.526557,-73.598276,70.886785,1
6,Drawn & Quarterly,45.524748,-73.604614,Bookstore,Bernard / Jeanne-Mance,45.524286,-73.604973,58.393492,1
7,Salon de thé Cardinal / Cardinal Tea Room,45.524721,-73.596464,Tea Room,Maguire / St-Laurent,45.524628,-73.595811,51.894803,1
8,St-Viateur Bagel (La Maison du Bagel),45.522573,-73.601878,Bagel Shop,Jeanne-Mance / St-Viateur,45.523026,-73.60184,50.538247,1
9,Pizzeria Magpie,45.524711,-73.595744,Pizza Place,Maguire / St-Laurent,45.524628,-73.595811,10.583038,1
10,Boucherie Lawrence,45.524232,-73.595292,Deli / Bodega,Maguire / St-Laurent,45.524628,-73.595811,59.768675,1
11,Café Olimpico,45.524116,-73.600442,Café,Waverly / St-Viateur,45.523856,-73.600127,37.976787,1


#### Cluster 3

In [47]:
cluster3 = clustered.loc[clustered['Cluster'] == 2]
cluster3

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Docking Station,Docking Station Latitude,Docking Station Longitude,Distance (m),Cluster
12,Pikolo Espresso Bar,45.508606,-73.571772,Coffee Shop,Hutchison / Sherbrooke,45.50781,-73.57208,91.680589,2
13,Papeterie Nota Bene,45.508571,-73.571672,Paper / Office Supplies Store,Hutchison / Sherbrooke,45.50781,-73.57208,90.423304,2
14,Café Parvis,45.505817,-73.569302,Café,de Maisonneuve / City Councillors,45.50619,-73.569954,65.641906,2
15,Empire,45.503952,-73.57184,Sporting Goods Shop,du President-Kennedy / Robert-Bourassa,45.504627,-73.572325,84.019077,2
16,Il Focolaio,45.504009,-73.568213,Pizza Place,Square Phillips,45.503738,-73.568106,31.29072,2


#### Cluster 4

In [48]:
cluster4 = clustered.loc[clustered['Cluster'] == 3]
cluster4

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Docking Station,Docking Station Latitude,Docking Station Longitude,Distance (m),Cluster
17,Cat's Corner,45.512893,-73.570403,Dance Studio,Milton / Clark,45.512541,-73.570677,44.61893,3
18,Bouillon Bilk,45.510845,-73.566017,Restaurant,Métro St-Laurent (de Maisonneuve / St-Laurent),45.51066,-73.56497,84.11684,3
19,Café Nocturne,45.513566,-73.572849,Café,Clark / Prince-Arthur,45.513303,-73.572961,30.537609,3


#### Cluster 5

In [49]:
cluster5 = clustered.loc[clustered['Cluster'] == 4]
cluster5

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Docking Station,Docking Station Latitude,Docking Station Longitude,Distance (m),Cluster
20,Tiffany & Co.,45.499782,-73.578403,Jewelry Store,de la Montagne / Sherbrooke,45.499745,-73.579034,49.303327,4
21,Mandy's,45.498286,-73.577895,Salad Place,Crescent / de Maisonneuve,45.498112,-73.577615,29.179987,4
22,Sofitel Montréal Le Carré Doré,45.501499,-73.577526,Hotel,Stanley / Sherbrooke,45.501041,-73.577178,57.728634,4
23,The Ritz-Carlton Montréal,45.500191,-73.578127,Hotel,de la Montagne / Sherbrooke,45.499745,-73.579034,86.358319,4
24,Maison Boulud,45.500062,-73.578324,French Restaurant,de la Montagne / Sherbrooke,45.499745,-73.579034,65.604786,4


#### Cluster 6

In [50]:
cluster6 = clustered.loc[clustered['Cluster'] == 5]
cluster6

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Docking Station,Docking Station Latitude,Docking Station Longitude,Distance (m),Cluster
25,Louis Vuitton,45.49782,-73.575306,Boutique,Crescent / Ste-Catherine,45.497092,-73.575549,83.154334,5
26,Dominion Square Tavern,45.500405,-73.571636,Gastropub,Metcalfe / du Square-Dorchester,45.500208,-73.571138,44.56441,5
27,Club Sportif MAA,45.5007,-73.574817,Gym,de Maisonneuve / Peel,45.500951,-73.574578,33.489149,5
28,Ferreira Café,45.500434,-73.57406,Portuguese Restaurant,de Maisonneuve / Peel,45.500951,-73.574578,70.226903,5
29,Enso Yoga,45.500743,-73.574943,Yoga Studio,de Maisonneuve / Peel,45.500951,-73.574578,36.641021,5
30,Frank & Oak,45.499573,-73.574328,Men's Store,Stanley / Ste-Catherine,45.499344,-73.57376,51.069069,5


#### Cluster 7

In [51]:
cluster7 = clustered.loc[clustered['Cluster'] == 6]
cluster7

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Docking Station,Docking Station Latitude,Docking Station Longitude,Distance (m),Cluster
31,Omnivore Comptoir Grill,45.519061,-73.584116,Middle Eastern Restaurant,Vallières / St-Laurent,45.518967,-73.583616,40.295889,6
32,Darling,45.518995,-73.584061,Lounge,Vallières / St-Laurent,45.518967,-73.583616,34.825866,6
33,Noren　のれん,45.517026,-73.582608,Japanese Restaurant,Clark / Rachel,45.517354,-73.582129,52.149124,6
34,Le Majestique,45.517445,-73.580368,Restaurant,Duluth / St-Laurent,45.516876,-73.57946,94.824154,6
35,Café Santropol,45.51561,-73.580611,Café,Duluth / de l'Esplanade,45.515092,-73.581142,70.912305,6


#### Cluster 8

In [52]:
cluster8 = clustered.loc[clustered['Cluster'] == 7]
cluster8

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Docking Station,Docking Station Latitude,Docking Station Longitude,Distance (m),Cluster
36,La Vieille Europe,45.515891,-73.576982,Gourmet Shop,Roy / St-Laurent,45.515616,-73.575808,96.447711,7
37,Moksha Yoga Montreal,45.516057,-73.577188,Yoga Studio,Napoléon / St-Dominique,45.516745,-73.577658,84.827436,7
38,Jano,45.516066,-73.577361,Portuguese Restaurant,Napoléon / St-Dominique,45.516745,-73.577658,78.973763,7


In [53]:
# create map of stations and nearby venues using latitude and longitude values
cluster_map = folium.Map(location=[mtl_latitude, mtl_longitude], zoom_start=13)

s_name = clustered['Docking Station']
s_lat = clustered['Docking Station Latitude']
s_lng = clustered['Docking Station Longitude']

# add markers to map to visualize the Bixi docking stations
for lat, lng, label in zip(s_lat, s_lng, s_name):
    label = folium.Popup(label, parse_html=True)
    folium.Marker([lat, lng],popup=label).add_to(cluster_map)

    
# add markers to map to visualize the trending venues in Montreal, by cluster

# cluster 1, in red
clus1_ven = clustered.loc[0:4,'Venue']
clus1_lat = clustered.loc[0:4,'Venue Latitude']
clus1_lng = clustered.loc[0:4,'Venue Longitude']
for lat, lng, label in zip(clus1_lat, clus1_lng, clus1_ven):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker([lat,lng],radius=5,popup=label,color='red',fill=True,fill_opacity=0.7,parse_html=False).add_to(cluster_map)  

# cluster 2, in yellow
clus2_ven = clustered.loc[5:11,'Venue']
clus2_lat = clustered.loc[5:11,'Venue Latitude']
clus2_lng = clustered.loc[5:11,'Venue Longitude']
for lat, lng, label in zip(clus2_lat, clus2_lng, clus2_ven):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker([lat,lng],radius=5,popup=label,color='yellow',fill=True,fill_opacity=0.7,parse_html=False).add_to(cluster_map) 

# cluster 3, in grey
clus3_ven = clustered.loc[12:16,'Venue']
clus3_lat = clustered.loc[12:16,'Venue Latitude']
clus3_lng = clustered.loc[12:16,'Venue Longitude']
for lat, lng, label in zip(clus3_lat, clus3_lng, clus3_ven):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker([lat,lng],radius=5,popup=label,color='grey',fill=True,fill_opacity=0.7,parse_html=False).add_to(cluster_map) 

# cluster 4, in green
clus4_ven = clustered.loc[17:19,'Venue']
clus4_lat = clustered.loc[17:19,'Venue Latitude']
clus4_lng = clustered.loc[17:19,'Venue Longitude']
for lat, lng, label in zip(clus4_lat, clus4_lng, clus4_ven):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker([lat,lng],radius=5,popup=label,color='green',fill=True,fill_opacity=0.7,parse_html=False).add_to(cluster_map) 
        
# cluster 5, in purple
clus5_ven = clustered.loc[20:24,'Venue']
clus5_lat = clustered.loc[20:24,'Venue Latitude']
clus5_lng = clustered.loc[20:24,'Venue Longitude']
for lat, lng, label in zip(clus5_lat, clus5_lng, clus5_ven):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker([lat,lng],radius=5,popup=label,color='purple',fill=True,fill_opacity=0.7,parse_html=False).add_to(cluster_map) 

# cluster 6, in blue
clus6_ven = clustered.loc[25:30,'Venue']
clus6_lat = clustered.loc[25:30,'Venue Latitude']
clus6_lng = clustered.loc[25:30,'Venue Longitude']
for lat, lng, label in zip(clus6_lat, clus6_lng, clus6_ven):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker([lat,lng],radius=5,popup=label,color='blue',fill=True,fill_opacity=0.7,parse_html=False).add_to(cluster_map) 
        
# cluster 7, in pink
clus7_ven = clustered.loc[31:35,'Venue']
clus7_lat = clustered.loc[31:35,'Venue Latitude']
clus7_lng = clustered.loc[31:35,'Venue Longitude']
for lat, lng, label in zip(clus7_lat, clus7_lng, clus7_ven):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker([lat,lng],radius=5,popup=label,color='pink',fill=True,fill_opacity=0.7,parse_html=False).add_to(cluster_map) 
        
# cluster 8, in orange
clus8_ven = clustered.loc[36:38,'Venue']
clus8_lat = clustered.loc[36:38,'Venue Latitude']
clus8_lng = clustered.loc[36:38,'Venue Longitude']
for lat, lng, label in zip(clus8_lat, clus8_lng, clus8_ven):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker([lat,lng],radius=5,popup=label,color='orange',fill=True,fill_opacity=0.7,parse_html=False).add_to(cluster_map) 

cluster_map

## IV. Results and Discussion

Visualizing the clusters on the map by color makes it easy to select the optimal station for the cluster. We can immediately notice that there are several docking stations in one cluster, often right next to one venue. However, we are looking for only one docking station per cluster as it is more convenient for a tourist to be parking the bike in one docking station, and walking to the nearby venues.

We will select the station that is most central to each venue in the cluster, or the station with the most venues nearby.

According to the map, the following stations seems to be the most representative for each cluster:

- *Cluster 1*: **Marche Atwater**, since it has 3 out the 5 venues nearby.
- *Cluster 2*: **Waverly/St-Viateur**, since it seems to be the most central in that cluster.
- *Cluster 3*: **Square Philips**, since it seems to be the most central in that cluster.
- *Cluster 4*: **Milton/Clark**, since it seems to be the most central in that cluster.
- *Cluster 5*: **de la Montagne/Sherbrooke**, since it seems to be the most central in that cluster and has the most venues nearby.
- *Cluster 6*: **Stanley/St-Catherine**, since it seems to be the most central in that cluster.
- *Cluster 7*: **Clark/Rachel**, since it seems to be the most central in that cluster.
- *Cluster 8*: **Napoléon / St-Dominique**, it seems to be the nearest to all venues in that cluster.

Now let's clean up our dataframe to reflect these results and plot them on the map.

In [54]:
cluster1['Docking Station']='Marché Atwater'
cluster1['Docking Station Latitude']=45.480208
cluster1['Docking Station Longitude']=-73.577599
i=0
for i in range(i, len(cluster1['Cluster'])): # iterates all rows in the cluster and replace distance with new value
    lat1 = cluster1.iat[i,1]
    lng1 = cluster1.iat[i,2]
    lat2 = cluster1.iat[i,5]
    lng2 = cluster1.iat[i,6]
    d = haversine(lng1, lat1, lng2, lat2)
    cluster1.set_value(i,'Distance (m)',d)
    i=i+1

cluster2['Docking Station']='Waverly / St-Viateur'
cluster2['Docking Station Latitude']=45.523856
cluster2['Docking Station Longitude']=-73.600127
i=5
for i in range(i, len(cluster2['Cluster'])): # iterates all rows in the cluster and replace distance with new value
    lat1 = cluster2.iat[i,1]
    lng1 = cluster2.iat[i,2]
    lat2 = cluster2.iat[i,5]
    lng2 = cluster2.iat[i,6]
    d = haversine(lng1, lat1, lng2, lat2)
    cluster2.set_value(i,'Distance (m)',d)
    i=i+1

cluster3['Docking Station']='Square Phillips'
cluster3['Docking Station Latitude']=45.503738
cluster3['Docking Station Longitude']=-73.568106
i=12
for i in range(i, len(cluster3['Cluster'])): # iterates all rows in the cluster and replace distance with new value
    lat1 = cluster3.iat[i,1]
    lng1 = cluster3.iat[i,2]
    lat2 = cluster3.iat[i,5]
    lng2 = cluster3.iat[i,6]
    d = haversine(lng1, lat1, lng2, lat2)
    cluster3.set_value(i,'Distance (m)',d)
    i=i+1

cluster4['Docking Station']='Milton / Clark'
cluster4['Docking Station Latitude']=45.512541
cluster4['Docking Station Longitude']=-73.570677
i=17
for i in range(i, len(cluster4['Cluster'])): # iterates all rows in the cluster and replace distance with new value
    lat1 = cluster4.iat[i,1]
    lng1 = cluster4.iat[i,2]
    lat2 = cluster4.iat[i,5]
    lng2 = cluster4.iat[i,6]
    d = haversine(lng1, lat1, lng2, lat2)
    cluster4.set_value(i,'Distance (m)',d)
    i=i+1

cluster5['Docking Station']='de la Montagne / Sherbrooke'
cluster5['Docking Station Latitude']=45.499745
cluster5['Docking Station Longitude']=-73.579034
i=20
for i in range(i, len(cluster5['Cluster'])): # iterates all rows in the cluster and replace distance with new value
    lat1 = cluster5.iat[i,1]
    lng1 = cluster5.iat[i,2]
    lat2 = cluster5.iat[i,5]
    lng2 = cluster5.iat[i,6]
    d = haversine(lng1, lat1, lng2, lat2)
    cluster5.set_value(i,'Distance (m)',d)
    i=i+1
    
cluster6['Docking Station']='Stanley / Ste-Catherine'
cluster6['Docking Station Latitude']=45.499344
cluster6['Docking Station Longitude']=-73.573760
i=25
for i in range(i, len(cluster6['Cluster'])): # iterates all rows in the cluster and replace distance with new value
    lat1 = cluster6.iat[i,1]
    lng1 = cluster6.iat[i,2]
    lat2 = cluster6.iat[i,5]
    lng2 = cluster6.iat[i,6]
    d = haversine(lng1, lat1, lng2, lat2)
    cluster6.set_value(i,'Distance (m)',d)
    i=i+1

cluster7['Docking Station']='Clark / Rachel'
cluster7['Docking Station Latitude']=45.517354
cluster7['Docking Station Longitude']=-73.582129
i=31
for i in range(i, len(cluster7['Cluster'])): # iterates all rows in the cluster and replace distance with new value
    lat1 = cluster7.iat[i,1]
    lng1 = cluster7.iat[i,2]
    lat2 = cluster7.iat[i,5]
    lng2 = cluster7.iat[i,6]
    d = haversine(lng1, lat1, lng2, lat2)
    cluster7.set_value(i,'Distance (m)',d)
    i=i+1

cluster8['Docking Station']='Napoléon / St-Dominique'
cluster8['Docking Station Latitude']=45.516745
cluster8['Docking Station Longitude']=-73.577658
i=36
for i in range(i, len(cluster8['Cluster'])): # iterates all rows in the cluster and replace distance with new value
    lat1 = cluster8.iat[i,1]
    lng1 = cluster8.iat[i,2]
    lat2 = cluster8.iat[i,5]
    lng2 = cluster8.iat[i,6]
    d = haversine(lng1, lat1, lng2, lat2)
    cluster8.set_value(i,'Distance (m)',d)
    i=i+1
    
final = pd.concat([cluster1,cluster2,cluster3,cluster4,cluster5,cluster6,cluster7,cluster8], axis=0)
final

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  from ipykernel import kernelapp as app
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  app.launch_new_instance()
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: ht

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Docking Station,Docking Station Latitude,Docking Station Longitude,Distance (m),Cluster
0,Liverpool House,45.482893,-73.575256,Restaurant,Marché Atwater,45.480208,-73.577599,350.005588,0
1,Satay Brothers,45.480061,-73.577394,Asian Restaurant,Marché Atwater,45.480208,-73.577599,22.8858,0
2,Le Vin Papillon,45.482763,-73.575514,Wine Bar,Marché Atwater,45.480208,-73.577599,327.26232,0
3,Marché Atwater,45.47979,-73.576775,Market,Marché Atwater,45.480208,-73.577599,79.287258,0
4,SAQ Sélection,45.480446,-73.576695,Liquor Store,Marché Atwater,45.480208,-73.577599,75.289608,0
5,naada yoga,45.526864,-73.597479,Yoga Studio,Waverly / St-Viateur,45.523856,-73.600127,378.948298,1
6,Drawn & Quarterly,45.524748,-73.604614,Bookstore,Waverly / St-Viateur,45.523856,-73.600127,37.964041,1
7,Salon de thé Cardinal / Cardinal Tea Room,45.524721,-73.596464,Tea Room,Waverly / St-Viateur,45.523856,-73.600127,51.894803,1
8,St-Viateur Bagel (La Maison du Bagel),45.522573,-73.601878,Bagel Shop,Waverly / St-Viateur,45.523856,-73.600127,50.538247,1
9,Pizzeria Magpie,45.524711,-73.595744,Pizza Place,Waverly / St-Viateur,45.523856,-73.600127,10.583038,1


In [55]:
# create map of stations and nearby venues using latitude and longitude values
final_map = folium.Map(location=[mtl_latitude, mtl_longitude], zoom_start=13)

s_name = final['Docking Station']
s_lat = final['Docking Station Latitude']
s_lng = final['Docking Station Longitude']

# add markers to map to visualize the Bixi docking stations
for lat, lng, label in zip(s_lat, s_lng, s_name):
    label = folium.Popup(label, parse_html=True)
    folium.Marker([lat, lng],popup=label).add_to(final_map)

    
# add markers to map to visualize the trending venues in Montreal, by cluster

# cluster 1, in red
clus1_ven = final.loc[0:4,'Venue']
clus1_lat = final.loc[0:4,'Venue Latitude']
clus1_lng = final.loc[0:4,'Venue Longitude']
for lat, lng, label in zip(clus1_lat, clus1_lng, clus1_ven):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker([lat,lng],radius=5,popup=label,color='red',fill=True,fill_opacity=0.7,parse_html=False).add_to(final_map)  

# cluster 2, in yellow
clus2_ven = final.loc[5:11,'Venue']
clus2_lat = final.loc[5:11,'Venue Latitude']
clus2_lng = final.loc[5:11,'Venue Longitude']
for lat, lng, label in zip(clus2_lat, clus2_lng, clus2_ven):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker([lat,lng],radius=5,popup=label,color='yellow',fill=True,fill_opacity=0.7,parse_html=False).add_to(final_map) 

# cluster 3, in grey
clus3_ven = final.loc[12:16,'Venue']
clus3_lat = final.loc[12:16,'Venue Latitude']
clus3_lng = final.loc[12:16,'Venue Longitude']
for lat, lng, label in zip(clus3_lat, clus3_lng, clus3_ven):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker([lat,lng],radius=5,popup=label,color='grey',fill=True,fill_opacity=0.7,parse_html=False).add_to(final_map) 

# cluster 4, in green
clus4_ven = final.loc[17:19,'Venue']
clus4_lat = final.loc[17:19,'Venue Latitude']
clus4_lng = final.loc[17:19,'Venue Longitude']
for lat, lng, label in zip(clus4_lat, clus4_lng, clus4_ven):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker([lat,lng],radius=5,popup=label,color='green',fill=True,fill_opacity=0.7,parse_html=False).add_to(final_map) 
        
# cluster 5, in purple
clus5_ven = final.loc[20:24,'Venue']
clus5_lat = final.loc[20:24,'Venue Latitude']
clus5_lng = final.loc[20:24,'Venue Longitude']
for lat, lng, label in zip(clus5_lat, clus5_lng, clus5_ven):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker([lat,lng],radius=5,popup=label,color='purple',fill=True,fill_opacity=0.7,parse_html=False).add_to(final_map) 

# cluster 6, in blue
clus6_ven = final.loc[25:30,'Venue']
clus6_lat = final.loc[25:30,'Venue Latitude']
clus6_lng = final.loc[25:30,'Venue Longitude']
for lat, lng, label in zip(clus6_lat, clus6_lng, clus6_ven):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker([lat,lng],radius=5,popup=label,color='blue',fill=True,fill_opacity=0.7,parse_html=False).add_to(final_map) 
        
# cluster 7, in pink
clus7_ven = final.loc[31:35,'Venue']
clus7_lat = final.loc[31:35,'Venue Latitude']
clus7_lng = final.loc[31:35,'Venue Longitude']
for lat, lng, label in zip(clus7_lat, clus7_lng, clus7_ven):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker([lat,lng],radius=5,popup=label,color='pink',fill=True,fill_opacity=0.7,parse_html=False).add_to(final_map) 
        
# cluster 8, in orange
clus8_ven = final.loc[36:38,'Venue']
clus8_lat = final.loc[36:38,'Venue Latitude']
clus8_lng = final.loc[36:38,'Venue Longitude']
for lat, lng, label in zip(clus8_lat, clus8_lng, clus8_ven):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker([lat,lng],radius=5,popup=label,color='orange',fill=True,fill_opacity=0.7,parse_html=False).add_to(final_map) 

final_map

Let's simplify our final dataframe to list the most relevant information.

In [56]:
final1 = final[['Docking Station','Venue','Venue Category','Distance (m)']]
final1.groupby(['Docking Station','Venue','Venue Category']).mean()
final1

Unnamed: 0,Docking Station,Venue,Venue Category,Distance (m)
0,Marché Atwater,Liverpool House,Restaurant,350.005588
1,Marché Atwater,Satay Brothers,Asian Restaurant,22.8858
2,Marché Atwater,Le Vin Papillon,Wine Bar,327.26232
3,Marché Atwater,Marché Atwater,Market,79.287258
4,Marché Atwater,SAQ Sélection,Liquor Store,75.289608
5,Waverly / St-Viateur,naada yoga,Yoga Studio,378.948298
6,Waverly / St-Viateur,Drawn & Quarterly,Bookstore,37.964041
7,Waverly / St-Viateur,Salon de thé Cardinal / Cardinal Tea Room,Tea Room,51.894803
8,Waverly / St-Viateur,St-Viateur Bagel (La Maison du Bagel),Bagel Shop,50.538247
9,Waverly / St-Viateur,Pizzeria Magpie,Pizza Place,10.583038


In [57]:
df_locations.to_excel("capstone.xlsx", sheet_name='df_locations')
filtered_locations.to_excel("capstone.xlsx", sheet_name='filtered_locations')
clustered.to_excel("capstone.xlsx", sheet_name='clustered')
final.to_excel("capstone.xlsx", sheet_name='final')
final1.to_excel("capstone.xlsx", sheet_name='final1')

## VI. Conclusion

The objective of this report was to analyse the trending venues in Montreal, and locate the nearest Bixi docking station, which would maximize a tourists time for visiting Montreal. Whilst exploring the data, it was clear that there are many Bixi docking stations around the city (615), a lot more than Foursquare had returned trending venues (100).

Of course, we could have fetched the trending venues around each docking station, but Foursquare would have returned too many values, which would have been filtered anyway in order to keep the most relevant. Choosing to stick with the top 100 trending venues in the city gave a more accurate representation of what actually is trending in the city, and not just because it is located near a docking station.

Using Folium to visualize the results at every step, meaning every time the data was filtered to keep the most relevant, gave insightful information about the venues and station locations. Since the objective of this project was to determine the docking stations with the most venues nearby, it made sense to use k-means algorithm with HBSCAN to cluster the trending venues by location. By setting the cluster minimum to 3 allowed this method determined the clusters of venues containing at least 3 venues next to one another. From that point on it would be easy to determine one docking station by cluster, simply by visualizing the venues by cluster, as well as the nearest stations within each cluster.

The final result yields a total of 8 clusters of trending venues, from a total of 39 unique venues, and a docking station within each cluster.