<a href="https://colab.research.google.com/github/xihengjing/Colab-Notebooks/blob/main/Venue_Searching_Algorithm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The search algorithm returns venues in the Bournemouth area based on the category the user chooses to search for. The system returns venues in that category that are relatively close to the user's location (randomly generated). The original dataset has the following variables: the names of the venues, the categories, the longitudes, and the latitudes. The final model groups the venues based on their relative distances and their higher-level categories.

Data Source: [Kaggle](https://www.kaggle.com/purvank/uber-rider-reviews-dataset/data) by user *Power*

In [13]:
#Importing the necessary modules
import requests
import os
import pandas as pd
import seaborn as sns
import numpy as np
import folium
import random
from random import randint
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt
import math
%matplotlib inline



In [14]:
BOURNEMOUTH = (50.721680, -1.878530) #this is our pointer for the city Bournemouth
ZOOM = 15 #this is the default opening zoom parameter for the map

#we define the constants that we are going to use in this project for ease of reference
COL_LAT = 'Latitude'
COL_LNG = 'Longitude'
COL_VENUE_NAME = 'Venue Name'
COL_VENUE_LAT = 'Venue Latitude'
COL_VENUE_LNG = 'Venue Longitude'
COL_VENUE_CAT = 'Venue Category'
COL_VENUE_GRP = 'Venue Group'
COL_VENUE_CLS = 'Venue Cluster'

DRINK = 'Drink'
ENTERTAINMENT = 'Entertainment'
FOOD = 'Food'
HOTEL = 'Hotel'
SHOPPING = 'Shopping'
TRANSPORT = 'Transport'
DESSERT = 'Dessert'



In [15]:
#importing data

url = 'https://raw.githubusercontent.com/xihengjing/xjing/master/bournemouth_venues.csv'

# Load data into a pandas dataframe

df = pd.read_csv(url)

# Print first 5 rows of the dataframe
df.head()

Unnamed: 0,Venue Name,Venue Category,Venue Latitude,Venue Longitude
0,South Coast Roast,Coffee Shop,50.720913,-1.879085
1,DelMarco,Italian Restaurant,50.72137,-1.877221
2,Lower Gardens,Park,50.719323,-1.878195
3,Bournemouth Gardens,Park,50.71899,-1.877733
4,Bournemouth Square,Plaza,50.720156,-1.879563


We are going to define our own function for creating a map to visualize venue information.

In [16]:
def generate_map(df, lat, lng, zoom, col_lat, col_lng,
                 col_popup=None, popup_colors=False, def_color='red',
                 tiles='cartodbpositron'):
    #we are setting up the center of the map using coordinates of Bournemouth and also the look of the map background
    folmap = folium.Map(location=[lat, lng], zoom_start=zoom, tiles=tiles)

    #popup colors need a popup column, so they are switched off when col_popup is None
    popup_colors = popup_colors and col_popup is not None

    if popup_colors:
        popup = list(df[col_popup].unique())#the distinct popup values, one color for each
        colors = make_color_palette(len(popup))#we will define this function in next block

    #defining the coordinates, color, and parameters of each point on the map
    for index, row in df.iterrows():
        folium.CircleMarker(
            location=(row[col_lat], row[col_lng]),
            radius=6,
            popup=row[col_popup] if col_popup is not None else '',
            fill=True,
            color=colors[popup.index(row[col_popup])] if popup_colors else def_color,
            fill_opacity=0.6
            ).add_to(folmap)

    return folmap

In [17]:
#defining the function to generate random colors for different categories of venues
#each channel is drawn from [n_min, n_max] and written as two hex digits,
#so every color is a valid '#rrggbb' string
def make_color_palette(size, n_min=50, n_max=205):
    r = lambda: '{:02x}'.format(randint(n_min, n_max))
    colors = []
    
    while len(colors) < size:
        c = '#{}{}{}'.format(r(), r(), r())
        
        if c not in colors:
            colors.append(c)
    
    return colors
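
Because the palette above is drawn at random, two groups can end up with colors that are hard to tell apart. A possible alternative, sketched here with the standard-library `colorsys` module, spaces the hues evenly instead. The name `make_hue_palette` and its `saturation` and `value` defaults are assumptions made for this illustration, not part of the original notebook.

```python
import colorsys

def make_hue_palette(size, saturation=0.75, value=0.85):
    """Return `size` hex colors whose hues are evenly spaced around the color wheel."""
    colors = []
    for i in range(size):
        # hsv_to_rgb takes and returns floats in the range [0, 1]
        r, g, b = colorsys.hsv_to_rgb(i / size, saturation, value)
        colors.append('#{:02x}{:02x}{:02x}'.format(int(r * 255), int(g * 255), int(b * 255)))
    return colors

# one color per higher-level venue group
palette = make_hue_palette(7)
print(palette)
```

The result is deterministic, so the same group keeps the same color every time the notebook is re-run.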

All 100 venues are plotted on the map below; clicking on a point shows the name of the venue.

In [18]:
generate_map(df, BOURNEMOUTH[0], BOURNEMOUTH[1], ZOOM, COL_VENUE_LAT, COL_VENUE_LNG, col_popup=COL_VENUE_NAME)

Next, we check how many categories there are for these venues.

In [19]:
venue_cat = df[COL_VENUE_CAT].unique()
venue_cat.sort()

print('Category count:', len(venue_cat))
venue_cat

Category count: 51


array(['Aquarium', 'Art Museum', 'Arts & Entertainment', 'Bar', 'Beach',
       'Brewery', 'Bubble Tea Shop', 'Burger Joint', 'Bus Stop', 'Café',
       'Caribbean Restaurant', 'Chinese Restaurant', 'Clothing Store',
       'Cocktail Bar', 'Coffee Shop', 'Comfort Food Restaurant',
       'Dessert Shop', 'Diner', 'English Restaurant',
       'Fast Food Restaurant', 'French Restaurant', 'Garden', 'Gay Bar',
       'Greek Restaurant', 'Grocery Store', 'Gym', 'Gym / Fitness Center',
       'Hotel', 'Ice Cream Shop', 'Indian Restaurant',
       'Italian Restaurant', 'Mexican Restaurant',
       'Modern European Restaurant', 'Multiplex', 'Nightclub',
       'Noodle House', 'Other Great Outdoors', 'Park', 'Pizza Place',
       'Platform', 'Plaza', 'Pub', 'Sandwich Place', 'Scenic Lookout',
       'Seafood Restaurant', 'Tapas Restaurant', 'Thai Restaurant',
       'Theater', 'Train Station', 'Turkish Restaurant',
       'Vegetarian / Vegan Restaurant'], dtype=object)

There are far more categories than we need, so we are going to clean them up and group the venues into 7 higher-level categories: Drink, Entertainment, Food, Transport, Hotel, Shopping, and Dessert.

In [20]:
#defining a function for transforming the categories: df.loc first selects the rows whose current group
#matches, then assigns the new higher-level group to those rows in the Venue Group column
def change_group(df, grp_from_list, grp_to):
    for grp_from in grp_from_list:
        df.loc[df[COL_VENUE_GRP] == grp_from, COL_VENUE_GRP] = grp_to

In [21]:
# Quickly set venue groups to the last word in each venue category
df[COL_VENUE_GRP] = df[COL_VENUE_CAT].str.split(' ').str[-1]

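# Remove the train station platform venue because we already have the nearby train station as a venue
# (.copy() gives an independent dataframe, so the assignments in change_group do not act on a slice)
df = df[df[COL_VENUE_GRP] != 'Platform'].copy()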

# Change the crude, last-word groups into more high-level groups
change_group(df,
             ['Bar', 'Brewery', 'Nightclub', 'Pub'],
             DRINK)

change_group(df,
             ['Aquarium', 'Beach', 'Center', 'Garden', 'Gym', 'Lookout',
              'Multiplex', 'Museum', 'Outdoors', 'Park', 'Theater'],
             ENTERTAINMENT)

change_group(df,
             ['Café', 'Diner', 'House', 'Joint', 'Place', 'Restaurant'],
              FOOD)

change_group(df,
             ['Plaza', 'Store'],
             SHOPPING)

change_group(df,
             ['Shop'],
             DESSERT)

change_group(df,
             ['Station', 'Stop', 'Platform'],
             TRANSPORT)

venue_grp = df[COL_VENUE_GRP].unique()
venue_grp.sort()

print('Group count:', len(venue_grp))
venue_grp

Group count: 7


array(['Dessert', 'Drink', 'Entertainment', 'Food', 'Hotel', 'Shopping',
       'Transport'], dtype=object)
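
The last-word shortcut and the regrouping calls above can also be written as a single dictionary lookup. The sketch below runs on a small hand-made sample rather than the real dataset; `LAST_WORD_TO_GROUP` and `sample` are names introduced only for this illustration, and the dictionary covers just the last words that appear in the sample.

```python
import pandas as pd

# Hypothetical lookup table: last word of the category -> higher-level group
LAST_WORD_TO_GROUP = {
    'Shop': 'Dessert',
    'Restaurant': 'Food',
    'Park': 'Entertainment',
    'Stop': 'Transport',
}

sample = pd.DataFrame({
    'Venue Category': ['Coffee Shop', 'Italian Restaurant', 'Park', 'Bus Stop', 'Hotel'],
})

# Take the last word of each category, as the notebook does
last_word = sample['Venue Category'].str.split(' ').str[-1]

# Map known last words to their group; words without an entry (here 'Hotel') stay as they are
sample['Venue Group'] = last_word.map(LAST_WORD_TO_GROUP).fillna(last_word)
print(sample)
```

Keeping the mapping in one dictionary makes it easy to see which last word lands in which group, at the cost of listing every word explicitly.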

Now we plot the venues by their new groups instead, color-coding each point by the group it has been assigned to. Clicking on the points shows that the Entertainment venues are closer to the beach, and that most Food and Drink venues cluster in the center of town, along with all of the Shopping venues. Hotel venues are dispersed across town, and Transport venues are the furthest out of town.

In [22]:
generate_map(df, BOURNEMOUTH[0], BOURNEMOUTH[1], ZOOM, COL_VENUE_LAT, COL_VENUE_LNG, col_popup=COL_VENUE_GRP, popup_colors=True)

Next, we split the venues into 7 groups based on their categories.

In [33]:
df_group0 = df[df[COL_VENUE_GRP] == 'Dessert']
df_group1 = df[df[COL_VENUE_GRP] == 'Drink']
df_group2 = df[df[COL_VENUE_GRP] == 'Entertainment']
df_group3 = df[df[COL_VENUE_GRP] == 'Food']
df_group4 = df[df[COL_VENUE_GRP] == 'Hotel']
df_group5 = df[df[COL_VENUE_GRP] == 'Shopping']
df_group6 = df[df[COL_VENUE_GRP] == 'Transport']
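
The same split can be produced in one step with `groupby`, which avoids maintaining seven separately named dataframes. This is a minimal sketch on a hand-made sample; `sample` and `groups` are names assumed for the illustration.

```python
import pandas as pd

sample = pd.DataFrame({
    'Venue Name': ['South Coast Roast', 'DelMarco', 'Lower Gardens'],
    'Venue Group': ['Dessert', 'Food', 'Entertainment'],
})

# One dataframe per group, keyed by the group name
groups = {name: frame for name, frame in sample.groupby('Venue Group')}

print(sorted(groups))
print(groups['Food'])
```

Looking a group up by name, for example `groups['Food']`, then replaces the numbered variables.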

Lastly, we define our own function to create the search map.

In [34]:
#creating an empty dataframe that will be used in our own function later
df_search = pd.DataFrame()

In [40]:
venue = input('Venue Types to choose from: Drink, Entertainment, Food, Transport, Hotel, Shopping, Dessert ')

def generate_map_for_search(df, lat, lng, zoom, col_lat, col_lng, col_venue_type=venue,
                 col_popup=COL_VENUE_GRP, popup_colors=False, def_color='red',
                 tiles='cartodbpositron'):

  #normalising the input so that 'hotel', 'HOTEL' and 'Hotel' all match the group name 'Hotel'
  group = col_venue_type.strip().capitalize()

  #we calculate the distance of each venue to the user's coordinates (lat, lng) using the longitudes
  #and latitudes, then we keep the 10 venues of each group that are closest to the user.
  #The distance is a plain Euclidean distance in degrees, so it only serves as a rough ranking.
  #The df argument is not read; the search runs on copies of the seven group dataframes.
  nearest = []
  for df_group in (df_group0, df_group1, df_group2, df_group3, df_group4, df_group5, df_group6):
    df_group = df_group.copy() #working on a copy avoids the SettingWithCopyWarning
    df_group['Distance'] = np.sqrt((df_group[col_lat]-lat)**2 + (df_group[col_lng]-lng)**2)
    nearest.append(df_group.sort_values('Distance').iloc[:10])
  df_search = pd.concat(nearest)

  #keeping only the results for the group the user asked for
  df_search = df_search[df_search[COL_VENUE_GRP] == group]

  folmap = folium.Map(location=[lat, lng], zoom_start=zoom, tiles=tiles)

  popup = list(df_search[col_popup].unique())

  if popup_colors:
    colors = make_color_palette(len(popup))

  for index, row in df_search.iterrows():
    folium.CircleMarker(
        location=(row[col_lat], row[col_lng]),
        radius=6,
        popup=row[col_popup] if col_popup is not None else '',
        fill=True,
        tooltip=row[COL_VENUE_NAME],
        color=colors[popup.index(row[col_popup])] if popup_colors else def_color,
        fill_opacity=0.6
        ).add_to(folmap)
  #We add in a different type of marker here to show the location of the user on the map
  folium.Marker(
      location=(lat, lng),
  ).add_to(folmap)

  return folmap

#we use the random.uniform method here to generate the random location at Bournemouth
random_lat = random.uniform(50.71979486465741, 50.72702513291062)
random_lng = random.uniform(-1.8960956440425136, -1.8588841918312096)
generate_map_for_search(df_search, random_lat, random_lng, ZOOM, COL_VENUE_LAT, COL_VENUE_LNG)

Venue Types to choose from: Drink, Entertainment, Food, Transport, Hotel, Shopping, Dessert hotel
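
The search above measures distance as a Euclidean norm over raw latitude and longitude differences. At Bournemouth's latitude a degree of longitude covers noticeably less ground than a degree of latitude, so that measure stretches east-west separations. If distances in kilometres are wanted, one option is the haversine formula, sketched below with only the standard library. The function name `haversine_km`, the Earth radius of 6371 km, and the choice of the city pointer as the user's location are assumptions for this illustration; the three venues are taken from the first rows of the dataset shown earlier.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# The user is placed at the Bournemouth pointer for this example
user = (50.721680, -1.878530)
venues = {
    'South Coast Roast': (50.720913, -1.879085),
    'DelMarco': (50.72137, -1.877221),
    'Lower Gardens': (50.719323, -1.878195),
}

# Rank the venue names from nearest to furthest
ranked = sorted(venues, key=lambda name: haversine_km(user[0], user[1], *venues[name]))
print(ranked)
```

Over an area as small as a town centre the ranking will often match the simpler measure, but the values now have a physical unit that can be shown to the user.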
