# Overview
This is the capstone project for the IBM Professional Data Science course. The assignment is to formulate a (business) problem that can be solved using machine learning techniques and Foursquare location data. The problem I have chosen is vacation planning for visitors to the city of Memphis. Using the Foursquare "explore" feature to get popular venues within a specified radius of a predetermined location, combined with the kMeans clustering algorithm, this project groups the best spots to visit into daily itineraries for anyone staying in Memphis for 3, 5, or 7 days.

## Problem
As a current resident of Memphis, I have personally experienced the struggle of planning activities when family or friends come to visit. Memphis is a fun, exciting city that offers many things to do for both residents and tourists. While living here has allowed me to experience many different places around town at my own leisure, deciding what to do when guests visit for a limited amount of time is difficult.

This is where kMeans comes into play. Using this clustering algorithm, I will group various venues around town into 3, 5, and 7 clusters. Each cluster will represent one day, giving an itinerary of places to visit for different lengths of stay. While this problem and solution are personal, the problem could also be approached from the view of a travel agency or Airbnb host looking to increase their business. I have chosen 3, 5, and 7 as the number of days/clusters because my guests usually don't stay for more than a week. However, one could easily modify the number of clusters to represent any desired duration of stay. Although many websites and apps already offer customized itineraries for many different cities, this project primarily serves to practice with and explore the capabilities of clustering algorithms used in conjunction with location data.

## Data
The data used in this project comes from the Foursquare location data available through their API. It consists of the top venues around the Peabody Hotel, a popular hotel that many visitors stay at when visiting downtown Memphis. I will use the geopy geocoder to get the latitude and longitude of the Peabody, request the top venues around that point from Foursquare, and store each venue's name, category, latitude, and longitude in a dataframe. With that in place, I can begin clustering via kMeans.

## Methodology

### Imports

In [1]:
# library to handle data in a vectorized manner
import numpy as np

# library for data analysis
import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# install geopy
!conda install -c conda-forge geopy --yes
# convert an address into latitude and longitude values
from geopy.geocoders import Nominatim 

# library to handle URL requests made to Foursquare to retrieve data
import requests

# function to transform a JSON file into a pandas dataframe
# (top-level import path in pandas 1.0+; older releases used pandas.io.json)
from pandas import json_normalize

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# Folium, map rendering library
import folium 

print('Libraries imported.')


### Define Foursquare user credentials and version

In [2]:
CLIENT_ID = 'YTP01X4D1JAYDFMR1QEBL04OY1TWI0LJRBXACDZKVARDICCU'
CLIENT_SECRET = 'OLPDOI3EVXFTSLQB2XUDT4W0ONHZMEMPMG0R0YKAMYCKGWIL'
VERSION = '20180604'

### Peabody Hotel latitude and longitude
To create an instance of the geocoder, we need to define a user_agent. We will name our agent foursquare_agent, as shown below.

In [3]:
address = '118 S 2nd St, Memphis, TN 38103'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print("The Peabody's latitude and longitude are (",latitude, ",", longitude, ").")

The Peabody's latitude and longitude are ( 35.141794 , -90.052684 ).


### Use Foursquare to get nearby venues
Because I am starting with only 3 clusters for a 3-day visit, I will set a variable LIMIT equal to 20. This means Foursquare will return only the top 20 venues surrounding the Peabody Hotel.

I will also set a variable RADIUS equal to 16093.4 meters, approximately 10 miles. This seems like a reasonable radius for a person staying downtown and visiting for only 3 days.

As I continue through to 5 and 7 clusters (days visiting), I will expand the radius to include further destinations.
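
The radius values used in this notebook (10, 15, and 20 miles) follow from the definition of the international mile, 1609.344 meters. A small sketch of that conversion is below; `miles_to_meters` is a hypothetical helper name, not something from the original notebook.

```python
METERS_PER_MILE = 1609.344  # international mile, exact by definition

def miles_to_meters(miles, ndigits=1):
    """Convert miles to meters, rounded to `ndigits` decimal places."""
    return round(miles * METERS_PER_MILE, ndigits)

# Print the three radii used across the 3-, 5-, and 7-day runs
for miles in (10, 15, 20):
    print(miles, "miles =", miles_to_meters(miles), "meters")
```

Rounding to one decimal place reproduces the RADIUS values that appear in the code cells.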

In [4]:
LIMIT = 20
RADIUS = 16093.4

#Get the url to request from Foursquare
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, RADIUS, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=YTP01X4D1JAYDFMR1QEBL04OY1TWI0LJRBXACDZKVARDICCU&client_secret=OLPDOI3EVXFTSLQB2XUDT4W0ONHZMEMPMG0R0YKAMYCKGWIL&ll=35.141794,-90.052684&v=20180604&radius=16093.4&limit=20'
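
String formatting works, but the same query string can also be assembled from a dictionary, which keeps each parameter labeled. This is a minimal sketch using only the standard library; the placeholder credentials and the `build_explore_url` helper are illustrative assumptions, not part of the original notebook.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, lat, lng, version, radius, limit):
    """Assemble the explore URL from named query parameters (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "radius": radius,
        "limit": limit,
    }
    # safe="," leaves the comma in "lat,lng" unescaped, matching the URL shown above
    return f"{EXPLORE_ENDPOINT}?{urlencode(params, safe=',')}"

# Placeholder credentials; substitute real values before sending a request
example_url = build_explore_url(
    "MY_ID", "MY_SECRET", 35.141794, -90.052684, "20180604", 16093.4, 20
)
print(example_url)
```

Keeping the parameters in a dictionary also makes it easier to change only `radius` and `limit` between runs.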

### Send GET request and examine JSON results

In [5]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5da39a2df96b2c002c3bea35'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Memphis',
  'headerFullLocation': 'Memphis',
  'headerLocationGranularity': 'city',
  'totalResults': 244,
  'suggestedBounds': {'ne': {'lat': 35.286631144837145,
    'lng': -89.87589373231495},
   'sw': {'lat': 34.99695685516285, 'lng': -90.22947426768505}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5612b256498e2a8fc88814fd',
       'name': "Maciel's Tortas & Tacos",
       'location': {'address': '45 S Main St',
        'lat': 35.14400029412407,
        'lng': -90.05303825325505,
        'labeledLatLngs': [{'label': 'displa

### Convert relevant parts of json to pandas dataframe

All the relevant information returned in the JSON is under the *items* key. I will use the **get_category_type** function from the Foursquare lab to retrieve each venue's category, then clean and structure the JSON into a pandas dataframe.

In [6]:
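# function that extracts the category of the venue
def get_category_type(row):
    # the list may sit under 'categories' or, after flattening, 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # venues without any category get None; otherwise use the first category's name
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']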

### Now filter the resulting dataframe to keep only relevant information needed 

In [7]:
venues = results['response']['groups'][0]['items']
    
df1 = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
df1 = df1.loc[:, filtered_columns]

# filter the category for each row
df1['venue.categories'] = df1.apply(get_category_type, axis=1)

# clean columns
df1.columns = [col.split(".")[-1] for col in df1.columns]

df1.head()

Unnamed: 0,name,categories,lat,lng
0,Maciel's Tortas & Tacos,Mexican Restaurant,35.144,-90.053038
1,Orpheum Theater,Theater,35.140301,-90.055429
2,AutoZone Park,Baseball Stadium,35.142748,-90.049884
3,The Majestic Grille,American Restaurant,35.141173,-90.054489
4,The Peabody Hotel,Hotel,35.142475,-90.05201
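
To make the flattening step concrete, the same four fields can be pulled out of the nested response with plain dictionaries. The sketch below uses a hand-written sample shaped like the response shown earlier. The `categories` entry is an assumed shape (a list of dictionaries with a `name` key, as `get_category_type` expects), since the printed JSON is truncated before that key. `extract_venue_rows` is a hypothetical helper and the second venue is made up to exercise the empty case.

```python
sample_items = [
    {
        "venue": {
            "name": "Maciel's Tortas & Tacos",
            "location": {"lat": 35.14400029412407, "lng": -90.05303825325505},
            # assumed shape: list of category dicts with a 'name' key
            "categories": [{"name": "Mexican Restaurant"}],
        }
    },
    {
        "venue": {
            "name": "Uncategorized Spot",  # made-up venue with no category
            "location": {"lat": 35.14, "lng": -90.05},
            "categories": [],
        }
    },
]

def extract_venue_rows(items):
    """Return one flat dict per venue with name, category, lat, and lng."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        rows.append({
            "name": venue["name"],
            # first category name, or None when the list is empty
            "categories": categories[0]["name"] if categories else None,
            "lat": venue["location"]["lat"],
            "lng": venue["location"]["lng"],
        })
    return rows

rows = extract_venue_rows(sample_items)
for row in rows:
    print(row)
```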


### Check how many venues resulted from our search

In [8]:
print('{} venues were found near the Peabody Hotel'.format(df1.shape[0]))

20 venues were found near the Peabody Hotel


## Visualize results

The resulting dataframe shows the venues returned from the call to Foursquare, including each venue's name, category, latitude, and longitude. I can now visualize these locations on a map to see where they are in relation to the Peabody.

In [9]:
# generate map centred around Peabody
venues_map = folium.Map(location=[latitude, longitude], zoom_start=15) 


# add Peabody as a red circle mark
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    popup='Peabody',
    fill=True,
    color='red',
    fill_color='red',
    fill_opacity=0.6
    ).add_to(venues_map)


# add popular spots to the map as blue circle markers
for lat, lng, label in zip(df1.lat, df1.lng, df1.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        fill=True,
        color='blue',
        fill_color='blue',
        fill_opacity=0.6
        ).add_to(venues_map)

# display map
venues_map

## Analyze our results

In [10]:
df1.categories.unique()

array(['Mexican Restaurant', 'Theater', 'Baseball Stadium',
       'American Restaurant', 'Hotel', 'Basketball Stadium', 'Smoke Shop',
       'Fried Chicken Joint', 'Bar', 'Italian Restaurant',
       'Tapas Restaurant', 'Museum', 'Seafood Restaurant', 'Music Venue',
       'Pizza Place', 'BBQ Joint', 'Café', 'History Museum', 'Park'],
      dtype=object)

## Clustering venues for 3 day visitor

With a dataframe holding each venue's geographical location and category, the kMeans algorithm can be run on the data. I will set the number of clusters to 3, one for each day of the visit; each cluster will contain the venues to be visited that day, based on the output of the algorithm.

In [14]:
# set number of clusters
kclusters = 3
# run k-means clustering
kmeans1 = KMeans(n_clusters=kclusters, random_state=0)
# Using fit_predict to cluster the dataset
X = df1[['lat','lng']].values
predictions = kmeans1.fit_predict(X)
# check cluster labels generated for each row in the dataframe
kmeans1.labels_[0:10] 

array([0, 1, 0, 1, 0, 1, 1, 2, 1, 1], dtype=int32)
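
One caveat of clustering on raw `lat`/`lng` values: kMeans uses Euclidean distance, and at the latitude of Memphis a degree of longitude covers less ground than a degree of latitude (by a factor of about cos 35.14°, roughly 0.82). A possible refinement, not used in this notebook, is to convert coordinates to approximate meters around the Peabody before clustering. The sketch below uses an equirectangular approximation; `to_local_meters` and the Earth radius constant are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters, an approximation

def to_local_meters(lat, lng, ref_lat, ref_lng):
    """Approximate east/north offsets in meters from a reference point.

    Equirectangular approximation; intended only for short distances
    such as the few tens of kilometers covered in this notebook.
    """
    east = math.radians(lng - ref_lng) * math.cos(math.radians(ref_lat)) * EARTH_RADIUS_M
    north = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return east, north

peabody = (35.141794, -90.052684)
# Sun Studio coordinates as returned by Foursquare later in this notebook
east, north = to_local_meters(35.139168, -90.03773, *peabody)
print(f"east offset: {east:.0f} m, north offset: {north:.0f} m")
```

Over an area as small as downtown Memphis the effect on the clusters is likely modest, so this is a refinement rather than a correction.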

Add results to original dataframe

In [17]:
clustered = pd.concat([df1.reset_index(), 
                       pd.DataFrame({'Cluster':predictions})], 
                      axis=1)
clustered.drop('index', axis=1, inplace=True)
conditions = [
    clustered['Cluster'] == 0, 
    clustered['Cluster'] == 1,
    clustered['Cluster'] == 2
    ]
choices = ['Day 1', 'Day 2', 'Day3']
clustered['Vacation Day'] = np.select(conditions, choices, default='black')
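
The `np.select` call needs one condition per cluster, so the list grows with every extra day. An alternative that scales to any number of clusters is to derive the label from the cluster number. This is a small sketch with a hypothetical `day_label` helper; note that it writes every label with a space ('Day 3'), unlike the `choices` list above.

```python
def day_label(cluster):
    """Map a zero-based cluster number to a one-based day label."""
    return f"Day {cluster + 1}"

# First ten cluster labels from the 3-cluster run
labels = [0, 1, 0, 1, 0, 1, 1, 2, 1, 1]
vacation_days = [day_label(c) for c in labels]
print(vacation_days)
```

With a dataframe, the same function could be applied as `clustered['Cluster'].map(day_label)`.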

Show resulting dataframe sorted by vacation day

In [18]:
clustered.sort_values(by=['Cluster'])

Unnamed: 0,name,categories,lat,lng,Cluster,Vacation Day
0,Maciel's Tortas & Tacos,Mexican Restaurant,35.144,-90.053038,0,Day 1
15,Aldo's Pizza Pies,Pizza Place,35.142312,-90.053881,0,Day 1
13,Bardog Tavern,Bar,35.144412,-90.053921,0,Day 1
12,Flying Fish,Seafood Restaurant,35.142064,-90.052735,0,Day 1
10,Flight Restaurant and Wine Bar,Tapas Restaurant,35.14426,-90.053297,0,Day 1
19,Court Square,Park,35.14604,-90.051906,0,Day 1
4,The Peabody Hotel,Hotel,35.142475,-90.05201,0,Day 1
2,AutoZone Park,Baseball Stadium,35.142748,-90.049884,0,Day 1
6,Havana Mix,Smoke Shop,35.140312,-90.051078,1,Day 2
8,Blues City Cafe,Bar,35.140169,-90.053468,1,Day 2


In [19]:
clustered.Cluster.value_counts()

1    8
0    8
2    4
Name: Cluster, dtype: int64

Now, let's look at each cluster to see what venues we should see.

In [20]:
cluster0 = clustered.loc[clustered['Cluster'] == 0]
cluster0

Unnamed: 0,name,categories,lat,lng,Cluster,Vacation Day
0,Maciel's Tortas & Tacos,Mexican Restaurant,35.144,-90.053038,0,Day 1
2,AutoZone Park,Baseball Stadium,35.142748,-90.049884,0,Day 1
4,The Peabody Hotel,Hotel,35.142475,-90.05201,0,Day 1
10,Flight Restaurant and Wine Bar,Tapas Restaurant,35.14426,-90.053297,0,Day 1
12,Flying Fish,Seafood Restaurant,35.142064,-90.052735,0,Day 1
13,Bardog Tavern,Bar,35.144412,-90.053921,0,Day 1
15,Aldo's Pizza Pies,Pizza Place,35.142312,-90.053881,0,Day 1
19,Court Square,Park,35.14604,-90.051906,0,Day 1


In [21]:
cluster1 = clustered.loc[clustered['Cluster'] == 1]
cluster1

Unnamed: 0,name,categories,lat,lng,Cluster,Vacation Day
1,Orpheum Theater,Theater,35.140301,-90.055429,1,Day 2
3,The Majestic Grille,American Restaurant,35.141173,-90.054489,1,Day 2
5,FedExForum,Basketball Stadium,35.138489,-90.05114,1,Day 2
6,Havana Mix,Smoke Shop,35.140312,-90.051078,1,Day 2
8,Blues City Cafe,Bar,35.140169,-90.053468,1,Day 2
9,Catherine and Mary's,Italian Restaurant,35.138775,-90.055667,1,Day 2
14,Mr. Handy's Blues Hall,Music Venue,35.13965,-90.052286,1,Day 2
17,Tamp & Tap,Café,35.141525,-90.053172,1,Day 2


In [22]:
cluster2 = clustered.loc[clustered['Cluster'] == 2]
cluster2

Unnamed: 0,name,categories,lat,lng,Cluster,Vacation Day
7,Gus’s World Famous Hot & Spicy Fried Chicken,Fried Chicken Joint,35.1383,-90.057976,2,Day3
11,National Civil Rights Museum,Museum,35.134546,-90.057553,2,Day3
16,Central BBQ,BBQ Joint,35.133834,-90.057185,2,Day3
18,Lorraine Motel,History Museum,35.13413,-90.057932,2,Day3


## 3-day Clustering Results
From these clusters, we can see that the first day includes 8 spots to visit, the second day 8 spots, and the final day 4 spots.

I will now redefine some variables and run kMeans again with 5 clusters to represent a 5-day visit. I will also expand RADIUS from 10 miles to 15 and raise the number of results to be returned (LIMIT) to 35 venues.

In [23]:
LIMIT = 35
RADIUS = 24140.2

#Get the url to request from Foursquare
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, RADIUS, LIMIT)
#Send request to Foursquare
results = requests.get(url).json()

venues = results['response']['groups'][0]['items']
    
df2 = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
df2 = df2.loc[:, filtered_columns]

# filter the category for each row
df2['venue.categories'] = df2.apply(get_category_type, axis=1)

# clean columns
df2.columns = [col.split(".")[-1] for col in df2.columns]
df2.head()

Unnamed: 0,name,categories,lat,lng
0,Maciel's Tortas & Tacos,Mexican Restaurant,35.144,-90.053038
1,Orpheum Theater,Theater,35.140301,-90.055429
2,AutoZone Park,Baseball Stadium,35.142748,-90.049884
3,Gus’s World Famous Hot & Spicy Fried Chicken,Fried Chicken Joint,35.1383,-90.057976
4,FedExForum,Basketball Stadium,35.138489,-90.05114


In [24]:
# generate map centred around Peabody
venues_map = folium.Map(location=[latitude, longitude], zoom_start=15) 


# add Peabody as a red circle mark
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    popup='Peabody',
    fill=True,
    color='red',
    fill_color='red',
    fill_opacity=0.6
    ).add_to(venues_map)


# add popular spots to the map as blue circle markers
for lat, lng, label in zip(df2.lat, df2.lng, df2.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        fill=True,
        color='blue',
        fill_color='blue',
        fill_opacity=0.6
        ).add_to(venues_map)

# display map
venues_map

In [25]:
# set number of clusters
kclusters = 5
# run k-means clustering
kmeans2 = KMeans(n_clusters=kclusters, random_state=0)
# Using fit_predict to cluster the dataset
X = df2[['lat','lng']].values
predictions = kmeans2.fit_predict(X)
# check cluster labels generated for each row in the dataframe
kmeans2.labels_[0:10] 

array([0, 3, 0, 4, 3, 3, 3, 3, 4, 3], dtype=int32)

In [26]:
clustered = pd.concat([df2.reset_index(), 
                       pd.DataFrame({'Cluster':predictions})], 
                      axis=1)
clustered.drop('index', axis=1, inplace=True)
conditions = [
    clustered['Cluster'] == 0, 
    clustered['Cluster'] == 1,
    clustered['Cluster'] == 2,
    clustered['Cluster'] == 3,
    clustered['Cluster'] == 4
    ]
choices = ['Day 1', 'Day 2', 'Day3', 'Day4', 'Day5']
clustered['Vacation Day'] = np.select(conditions, choices, default='black')
clustered.sort_values(by=['Cluster'])

Unnamed: 0,name,categories,lat,lng,Cluster,Vacation Day
0,Maciel's Tortas & Tacos,Mexican Restaurant,35.144,-90.053038,0,Day 1
32,Rachel's Salon & Day Spa,Salon / Barbershop,35.145709,-90.052597,0,Day 1
2,AutoZone Park,Baseball Stadium,35.142748,-90.049884,0,Day 1
20,Court Square,Park,35.14604,-90.051906,0,Day 1
21,Cannon Center For The Performing Arts,Concert Hall,35.150488,-90.051605,0,Day 1
14,Bardog Tavern,Bar,35.144412,-90.053921,0,Day 1
28,McEwen's,American Restaurant,35.14403,-90.052621,0,Day 1
11,Flight Restaurant and Wine Bar,Tapas Restaurant,35.14426,-90.053297,0,Day 1
22,Butler Park,Park,35.134783,-90.063123,1,Day 2
24,Loflin Yard,New American Restaurant,35.128128,-90.062124,1,Day 2


In [27]:
clustered.Cluster.value_counts()

3    16
0     8
1     5
4     4
2     2
Name: Cluster, dtype: int64

In [28]:
cluster0 = clustered.loc[clustered['Cluster'] == 0]
cluster0

Unnamed: 0,name,categories,lat,lng,Cluster,Vacation Day
0,Maciel's Tortas & Tacos,Mexican Restaurant,35.144,-90.053038,0,Day 1
2,AutoZone Park,Baseball Stadium,35.142748,-90.049884,0,Day 1
11,Flight Restaurant and Wine Bar,Tapas Restaurant,35.14426,-90.053297,0,Day 1
14,Bardog Tavern,Bar,35.144412,-90.053921,0,Day 1
20,Court Square,Park,35.14604,-90.051906,0,Day 1
21,Cannon Center For The Performing Arts,Concert Hall,35.150488,-90.051605,0,Day 1
28,McEwen's,American Restaurant,35.14403,-90.052621,0,Day 1
32,Rachel's Salon & Day Spa,Salon / Barbershop,35.145709,-90.052597,0,Day 1


In [29]:
cluster1 = clustered.loc[clustered['Cluster'] == 1]
cluster1

Unnamed: 0,name,categories,lat,lng,Cluster,Vacation Day
22,Butler Park,Park,35.134783,-90.063123,1,Day 2
24,Loflin Yard,New American Restaurant,35.128128,-90.062124,1,Day 2
25,Tom Lee Park,Park,35.136636,-90.062896,1,Day 2
30,Memphis Farmers Market,Farmers Market,35.132194,-90.060189,1,Day 2
31,The Corkscrew,Wine Shop,35.133556,-90.061127,1,Day 2


In [30]:
cluster2 = clustered.loc[clustered['Cluster'] == 2]
cluster2

Unnamed: 0,name,categories,lat,lng,Cluster,Vacation Day
18,Sun Studio,Museum,35.139168,-90.03773,2,Day3
26,High Cotton Brewing,Brewery,35.141096,-90.04108,2,Day3


In [31]:
cluster3 = clustered.loc[clustered['Cluster'] == 3]
cluster3

Unnamed: 0,name,categories,lat,lng,Cluster,Vacation Day
1,Orpheum Theater,Theater,35.140301,-90.055429,3,Day4
4,FedExForum,Basketball Stadium,35.138489,-90.05114,3,Day4
5,The Majestic Grille,American Restaurant,35.141173,-90.054489,3,Day4
6,The Peabody Hotel,Hotel,35.142475,-90.05201,3,Day4
7,Havana Mix,Smoke Shop,35.140312,-90.051078,3,Day4
9,Catherine and Mary's,Italian Restaurant,35.138775,-90.055667,3,Day4
12,Blues City Cafe,Bar,35.140169,-90.053468,3,Day4
13,Flying Fish,Seafood Restaurant,35.142064,-90.052735,3,Day4
15,Mr. Handy's Blues Hall,Music Venue,35.13965,-90.052286,3,Day4
17,Aldo's Pizza Pies,Pizza Place,35.142312,-90.053881,3,Day4


In [32]:
cluster4 = clustered.loc[clustered['Cluster'] == 4]
cluster4

Unnamed: 0,name,categories,lat,lng,Cluster,Vacation Day
3,Gus’s World Famous Hot & Spicy Fried Chicken,Fried Chicken Joint,35.1383,-90.057976,4,Day5
8,National Civil Rights Museum,Museum,35.134546,-90.057553,4,Day5
10,Central BBQ,BBQ Joint,35.133834,-90.057185,4,Day5
16,Lorraine Motel,History Museum,35.13413,-90.057932,4,Day5


## 5-day Clustering Results
kMeans did not cluster for 5 days as well as I would have liked. Some days look reasonable, but the cluster representing day 3 returned only two venues while the cluster for day 4 returned 16. This is not an ideal itinerary, as we would not want someone to visit 16 spots in one day and only two on another.
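
The imbalance can be quantified directly from the cluster sizes. This is a short sketch using the counts printed by `value_counts()` above; the `size_spread` helper and the max-to-min ratio are just one illustrative way to summarize it.

```python
# cluster number -> venue count, copied from the 5-cluster value_counts() output
cluster_sizes = {3: 16, 0: 8, 1: 5, 4: 4, 2: 2}

def size_spread(sizes):
    """Return (largest, smallest, largest/smallest) for a mapping of cluster sizes."""
    largest = max(sizes.values())
    smallest = min(sizes.values())
    return largest, smallest, largest / smallest

largest, smallest, ratio = size_spread(cluster_sizes)
total = sum(cluster_sizes.values())
even_share = total / len(cluster_sizes)

print(f"largest day: {largest} venues, smallest day: {smallest} venues, ratio: {ratio:.1f}")
print(f"an even split of {total} venues would be {even_share:.0f} per day")
```

kMeans minimizes within-cluster distance and has no notion of a target cluster size, which is why dense downtown blocks end up in one large cluster.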

Now I will try again with 7 clusters, extending the radius to 20 miles but keeping the number of venues to be returned at 35.

In [33]:
LIMIT = 35
RADIUS = 32186.9

#Get the url to request from Foursquare
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, RADIUS, LIMIT)
#Send request to Foursquare
results = requests.get(url).json()

venues = results['response']['groups'][0]['items']
    
df3 = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
df3 = df3.loc[:, filtered_columns]

# filter the category for each row
df3['venue.categories'] = df3.apply(get_category_type, axis=1)

# clean columns
df3.columns = [col.split(".")[-1] for col in df3.columns]
df3.head()

Unnamed: 0,name,categories,lat,lng
0,Maciel's Tortas & Tacos,Mexican Restaurant,35.144,-90.053038
1,Orpheum Theater,Theater,35.140301,-90.055429
2,Gus’s World Famous Hot & Spicy Fried Chicken,Fried Chicken Joint,35.1383,-90.057976
3,AutoZone Park,Baseball Stadium,35.142748,-90.049884
4,FedExForum,Basketball Stadium,35.138489,-90.05114


In [34]:
# generate map centred around Peabody
venues_map = folium.Map(location=[latitude, longitude], zoom_start=14) 


# add Peabody as a red circle mark
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    popup='Peabody',
    fill=True,
    color='red',
    fill_color='red',
    fill_opacity=0.6
    ).add_to(venues_map)


# add popular spots to the map as blue circle markers
for lat, lng, label in zip(df3.lat, df3.lng, df3.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        fill=True,
        color='blue',
        fill_color='blue',
        fill_opacity=0.6
        ).add_to(venues_map)

# display map
venues_map

In [35]:
# set number of clusters
kclusters = 7
# run k-means clustering
kmeans3 = KMeans(n_clusters=kclusters, random_state=0)
# Using fit_predict to cluster the dataset
X = df3[['lat','lng']].values
predictions = kmeans3.fit_predict(X)

#Add results to dataframe
clustered = pd.concat([df3.reset_index(), 
                       pd.DataFrame({'Cluster':predictions})], 
                      axis=1)
clustered.drop('index', axis=1, inplace=True)

conditions = [
    clustered['Cluster'] == 0, 
    clustered['Cluster'] == 1,
    clustered['Cluster'] == 2,
    clustered['Cluster'] == 3,
    clustered['Cluster'] == 4,
    clustered['Cluster'] == 5,
    clustered['Cluster'] == 6
    ]
choices = ['Day 1', 'Day 2', 'Day3', 'Day4', 'Day5', 'Day6', 'Day7']
clustered['Vacation Day'] = np.select(conditions, choices, default='black')
clustered.sort_values(by=['Cluster'])

Unnamed: 0,name,categories,lat,lng,Cluster,Vacation Day
17,Mr. Handy's Blues Hall,Music Venue,35.13965,-90.052286,0,Day 1
14,Blues City Cafe,Bar,35.140169,-90.053468,0,Day 1
10,Catherine and Mary's,Italian Restaurant,35.138775,-90.055667,0,Day 1
33,Handy Park,Park,35.139737,-90.05134,0,Day 1
8,Havana Mix,Smoke Shop,35.140312,-90.051078,0,Day 1
6,The Majestic Grille,American Restaurant,35.141173,-90.054489,0,Day 1
27,Statue of Elvis,Sculpture Garden,35.139979,-90.054274,0,Day 1
4,FedExForum,Basketball Stadium,35.138489,-90.05114,0,Day 1
31,B.B. King's Blues Club,Jazz Club,35.139934,-90.053434,0,Day 1
1,Orpheum Theater,Theater,35.140301,-90.055429,0,Day 1


In [36]:
clustered.Cluster.value_counts()

3    10
0    10
5     6
2     3
1     3
4     2
6     1
Name: Cluster, dtype: int64

In [37]:
cluster0 = clustered.loc[clustered['Cluster'] == 0]
cluster0

Unnamed: 0,name,categories,lat,lng,Cluster,Vacation Day
1,Orpheum Theater,Theater,35.140301,-90.055429,0,Day 1
4,FedExForum,Basketball Stadium,35.138489,-90.05114,0,Day 1
6,The Majestic Grille,American Restaurant,35.141173,-90.054489,0,Day 1
8,Havana Mix,Smoke Shop,35.140312,-90.051078,0,Day 1
10,Catherine and Mary's,Italian Restaurant,35.138775,-90.055667,0,Day 1
14,Blues City Cafe,Bar,35.140169,-90.053468,0,Day 1
17,Mr. Handy's Blues Hall,Music Venue,35.13965,-90.052286,0,Day 1
27,Statue of Elvis,Sculpture Garden,35.139979,-90.054274,0,Day 1
31,B.B. King's Blues Club,Jazz Club,35.139934,-90.053434,0,Day 1
33,Handy Park,Park,35.139737,-90.05134,0,Day 1


In [38]:
cluster1 = clustered.loc[clustered['Cluster'] == 1]
cluster1

Unnamed: 0,name,categories,lat,lng,Cluster,Vacation Day
20,Butler Park,Park,35.134783,-90.063123,1,Day 2
24,Tom Lee Park,Park,35.136636,-90.062896,1,Day 2
29,The Corkscrew,Wine Shop,35.133556,-90.061127,1,Day 2


In [39]:
cluster2 = clustered.loc[clustered['Cluster'] == 2]
cluster2

Unnamed: 0,name,categories,lat,lng,Cluster,Vacation Day
19,Court Square,Park,35.14604,-90.051906,2,Day3
23,Cannon Center For The Performing Arts,Concert Hall,35.150488,-90.051605,2,Day3
32,Rachel's Salon & Day Spa,Salon / Barbershop,35.145709,-90.052597,2,Day3


In [40]:
cluster3 = clustered.loc[clustered['Cluster'] == 3]
cluster3

Unnamed: 0,name,categories,lat,lng,Cluster,Vacation Day
0,Maciel's Tortas & Tacos,Mexican Restaurant,35.144,-90.053038,3,Day4
3,AutoZone Park,Baseball Stadium,35.142748,-90.049884,3,Day4
7,The Peabody Hotel,Hotel,35.142475,-90.05201,3,Day4
12,Flight Restaurant and Wine Bar,Tapas Restaurant,35.14426,-90.053297,3,Day4
15,Bardog Tavern,Bar,35.144412,-90.053921,3,Day4
16,Flying Fish,Seafood Restaurant,35.142064,-90.052735,3,Day4
18,Aldo's Pizza Pies,Pizza Place,35.142312,-90.053881,3,Day4
21,Tamp & Tap,Café,35.141525,-90.053172,3,Day4
26,Texas De Brazil,Brazilian Restaurant,35.141767,-90.052866,3,Day4
30,McEwen's,American Restaurant,35.14403,-90.052621,3,Day4


In [41]:
cluster4 = clustered.loc[clustered['Cluster'] == 4]
cluster4

Unnamed: 0,name,categories,lat,lng,Cluster,Vacation Day
11,Sun Studio,Museum,35.139168,-90.03773,4,Day5
25,High Cotton Brewing,Brewery,35.141096,-90.04108,4,Day5


In [42]:
cluster5 = clustered.loc[clustered['Cluster'] == 5]
cluster5

Unnamed: 0,name,categories,lat,lng,Cluster,Vacation Day
2,Gus’s World Famous Hot & Spicy Fried Chicken,Fried Chicken Joint,35.1383,-90.057976,5,Day6
5,National Civil Rights Museum,Museum,35.134546,-90.057553,5,Day6
9,Central BBQ,BBQ Joint,35.133834,-90.057185,5,Day6
13,Lorraine Motel,History Museum,35.13413,-90.057932,5,Day6
28,Memphis Farmers Market,Farmers Market,35.132194,-90.060189,5,Day6
34,Earnestine and Hazel's,Dive Bar,35.132932,-90.059072,5,Day6


In [43]:
cluster6 = clustered.loc[clustered['Cluster'] == 6]
cluster6

Unnamed: 0,name,categories,lat,lng,Cluster,Vacation Day
22,Loflin Yard,New American Restaurant,35.128128,-90.062124,6,Day7


## 7-day Clustering Results
While the kMeans algorithm did split the 35 venues into 7 clusters, the sizes of the clusters are again a problem when thinking about them realistically. Two clusters returned 10 venues each, which would be quite a lot for one day but not impossible. However, two clusters returned only 3 venues, one returned 2, and another returned just 1. This spread in cluster sizes becomes a problem if the clusters are to be used in real-world vacation planning: a visitor would not want to visit 10 venues in one day and only 1 on another.

# Results and Conclusion

The purpose of this project was to explore how a clustering algorithm such as kMeans could be used on location data. After gathering location data from the Foursquare API, I ran kMeans three times with three different cluster counts (3, 5, and 7), with each cluster representing one day of a visit to Memphis. While the algorithm split the venues fairly evenly for the 3-day visit, it did not perform well for the 5- and 7-day visits. The spread in cluster sizes would not make a practical itinerary, as nobody would enjoy visiting 16 venues in one day and only 1 on another.

Although the resulting clusters did not turn out to be ideal vacation planners, I still found this project to be a useful exercise in exploring how unsupervised learning, such as the kMeans clustering algorithm, could be used to solve a problem like itinerary creation.

## Future work
In future work, the data could be extended to include more details about each venue, such as the time typically spent at each spot or the travel time from one venue to another. Because the algorithm ran only on lat/lng coordinates, it could be further improved by also taking venue categories into account. For instance, some of the clusters contained more than 3 restaurants. Unless a visitor had an incredibly large appetite, most people would want to try 3 restaurants per day at most. The algorithm used in this project did not account for venue category, but future work could try to address this.
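
As a first step toward category awareness, one could simply count food venues per day and flag days that exceed a limit. The sketch below uses the categories from Day 1 and Day 3 of the 3-day run. The keyword list is an assumed heuristic rather than a Foursquare taxonomy, and `count_food_venues` and `MAX_FOOD_PER_DAY` are hypothetical names.

```python
FOOD_KEYWORDS = ("Restaurant", "Joint", "Pizza")  # assumed heuristic for food venues
MAX_FOOD_PER_DAY = 3  # the "3 restaurants per day at most" rule of thumb from the text

def count_food_venues(categories):
    """Count categories that contain any of the food keywords."""
    return sum(any(keyword in category for keyword in FOOD_KEYWORDS)
               for category in categories)

itinerary = {
    "Day 1": ["Mexican Restaurant", "Baseball Stadium", "Hotel", "Tapas Restaurant",
              "Seafood Restaurant", "Bar", "Pizza Place", "Park"],
    "Day 3": ["Fried Chicken Joint", "Museum", "BBQ Joint", "History Museum"],
}

food_counts = {day: count_food_venues(cats) for day, cats in itinerary.items()}
for day, count in food_counts.items():
    status = "over the limit" if count > MAX_FOOD_PER_DAY else "ok"
    print(f"{day}: {count} food venues ({status})")
```

A fuller solution would need to move the surplus venues to another day, which goes beyond what plain kMeans offers.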