<a href="https://colab.research.google.com/github/tamiresco/ibm3/blob/master/Capstone.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction

In this lab, I will learn how to convert addresses into their equivalent latitude and longitude values. I will also use the Foursquare API to explore the world's capitals: the **explore** endpoint returns the venues around each capital, and I use the most common venue categories as features to group the capitals into clusters with the *k*-means clustering algorithm. Finally, I will use the Folium library to visualize the capitals and their emerging clusters. Note that the code reuses the column names of an earlier lab: `Borough` holds the country and `Neighborhood` holds the capital.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Download and Explore Dataset</a>

2. <a href="#item2">Explore Capitals in the World</a>

3. <a href="#item3">Analyze Each Country</a>

4. <a href="#item4">Cluster Capitals in the World</a>

5. <a href="#item5">Examine Clusters</a>    
</font>
</div>

In [None]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

/bin/bash: conda: command not found
/bin/bash: conda: command not found
Libraries imported.


<a id='item1'></a>

## 1. Download and Explore Dataset

In [None]:
World_cities_raw = pd.read_html("https://lab.lmnixon.org/4th/worldcapitals.html")
neighborhoods = pd.DataFrame(World_cities_raw[0])

new_header = neighborhoods.iloc[0] #grab the first row for the header
neighborhoods = neighborhoods[1:] #take the data less the header row
neighborhoods.columns = new_header #set the header row as the df header

neighborhoods.rename(columns={"Country": "Borough", "Capital": "Neighborhood"}, inplace=True)

In [None]:
new_latitude = []
new_longitude = []

for i, rows in neighborhoods.iterrows():

  if str(neighborhoods.loc[i,'Latitude'])[-1] == 'S':
    new_latitude.append('-' + str(neighborhoods.loc[i,'Latitude'])[0:-1] + '00')
  else:
    new_latitude.append(str(neighborhoods.loc[i,'Latitude'])[0:-1] + '00')

  if str(neighborhoods.loc[i,'Longitude'])[-1] == 'W':
    new_longitude.append('-' + str(neighborhoods.loc[i,'Longitude'])[0:-1] + '00')
  else:
    new_longitude.append(str(neighborhoods.loc[i,'Longitude'])[0:-1] + '00')

In [None]:
neighborhoods.Latitude = new_latitude
neighborhoods.Longitude = new_longitude
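The loop above keeps the digits after the decimal point unchanged and only turns the hemisphere letter into a sign. If the source table encodes coordinates as degrees and minutes (for example `15.47S` meaning 15°47′ S), which is an assumption about the data and not something the page states, the values would also need converting to decimal degrees. A minimal sketch of such a conversion, using a hypothetical helper named `dm_to_decimal`:

```python
def dm_to_decimal(value):
    """Convert a 'DD.MMH' string (degrees.minutes + hemisphere letter) to signed decimal degrees.

    Assumption: the digits after the dot are arc minutes, not a decimal fraction.
    """
    text = str(value).strip()
    hemisphere = text[-1].upper()
    degrees_part, _, minutes_part = text[:-1].partition('.')
    degrees = int(degrees_part)
    # assumption: a one-digit fraction such as '8.5' stands for 50 minutes
    minutes = int(minutes_part.ljust(2, '0')) if minutes_part else 0
    sign = -1 if hemisphere in ('S', 'W') else 1
    return sign * (degrees + minutes / 60)


print(round(dm_to_decimal('15.47S'), 4))
print(round(dm_to_decimal('69.11E'), 4))
```

Whether this conversion applies depends on how the table was authored, so it is worth checking a few capitals against a known reference before relying on either interpretation.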

In [None]:
neighborhoods.drop([201,202,203], inplace = True)

In [None]:
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
1,Afghanistan,Kabul,34.28,69.11
2,Albania,Tirane,41.18,19.49
3,Algeria,Algiers,36.42,3.08
4,American Samoa,Pago Pago,-14.16,-170.43
5,Andorra,Andorra la Vella,42.31,1.32
6,Angola,Luanda,-8.5,13.15
7,Antigua and Barbuda,West Indies,17.2,-61.48
8,Argentina,Buenos Aires,-36.3,-60.0
9,Armenia,Yerevan,40.1,44.31
10,Aruba,Oranjestad,12.32,-70.02


In [None]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 200 boroughs and 200 neighborhoods.


In [None]:
latitude = 0.00 # center the world map on the equator
longitude = 0.00 # center the world map on the prime meridian

In [None]:
# create a world map centered on the latitude and longitude values defined above
world_map = folium.Map(location=[latitude, longitude], zoom_start=2)

# add one marker per capital
for lat, lng, borough, neighborhood in zip(pd.to_numeric(neighborhoods['Latitude']), pd.to_numeric(neighborhoods['Longitude']), neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(world_map)

world_map

**Folium** is a great visualization library. Feel free to zoom into the above map, and click on each circle marker to reveal the name of the capital (the "neighborhood") and its country (the "borough").

#### Define Foursquare Credentials and Version

In [None]:
CLIENT_ID = 'MVI4V2DONN0NCWC3YV32ZSZ424XSDKIU5IO2VKC50GJJEN21' # your Foursquare ID
CLIENT_SECRET = 'VYH3YTKIVZ5EP5C2CNWJIY53LKSZFDIWSYNKL0O0DRALPVYN' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: MVI4V2DONN0NCWC3YV32ZSZ424XSDKIU5IO2VKC50GJJEN21
CLIENT_SECRET:VYH3YTKIVZ5EP5C2CNWJIY53LKSZFDIWSYNKL0O0DRALPVYN


In [None]:
neighborhood_latitude = neighborhoods.loc[26, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = neighborhoods.loc[26, 'Longitude'] # neighborhood longitude value

neighborhood_name = neighborhoods.loc[26, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Brasilia are -15.4700, -47.5500.


In [None]:
LIMIT = 1000 # number of venues requested from the Foursquare API
radius = 100000 # search radius in meters (10^5 m = 100 km)

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=MVI4V2DONN0NCWC3YV32ZSZ424XSDKIU5IO2VKC50GJJEN21&client_secret=VYH3YTKIVZ5EP5C2CNWJIY53LKSZFDIWSYNKL0O0DRALPVYN&v=20180605&ll=-15.4700,-47.5500&radius=100000&limit=1000'
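String formatting works for this request, but it leaves escaping of the query values to the caller. As a hedged alternative, the same query string can be assembled with the standard library's `urlencode`. The function name `build_explore_url` and the placeholder credentials below are illustrative only; the parameter names are the ones already used in the URL above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble an explore request URL from its query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-escapes the values, e.g. the comma inside 'll'
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


example_url = build_explore_url('YOUR_ID', 'YOUR_SECRET', '20180605',
                                '-15.4700', '-47.5500', 100000, 100)
# parse the URL back to confirm the parameters survive the round trip
query = parse_qs(urlparse(example_url).query)
print(query['ll'])
```

Keeping the parameters in a dictionary also makes it easy to load the credentials from somewhere other than the notebook itself.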

Send the GET request and examine the results.

In [None]:
results = requests.get(url).json()

From the Foursquare lab in the previous module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [None]:
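# function that extracts the category of the venue
def get_category_type(row):
    # the categories list is stored under 'categories' or, after flattening, 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # a venue without any category yields None
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']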

Now we are ready to clean the JSON and structure it into a *pandas* dataframe.

In [None]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
# venues
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
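To make the flattening step more concrete, here is a small sketch on hand-written sample data (the two venues are invented for illustration, not taken from a Foursquare response). It assumes pandas 1.0 or later, where `json_normalize` is available as `pd.json_normalize`. Nested dictionaries become dotted column names, while the `categories` list is kept as a list, which is why a separate extraction step is needed.

```python
import pandas as pd

# invented sample items shaped like the 'items' list used above
sample_items = [
    {'venue': {'name': 'Example Café',
               'categories': [{'name': 'Café'}],
               'location': {'lat': -15.78, 'lng': -47.93}}},
    {'venue': {'name': 'Example Park',
               'categories': [],
               'location': {'lat': -15.80, 'lng': -47.90}}},
]

flat = pd.json_normalize(sample_items)  # nested keys become 'venue.location.lat', ...
print(sorted(flat.columns))

# keep only the first category name; venues without categories get None
category_names = [cats[0]['name'] if cats else None for cats in flat['venue.categories']]
print(category_names)
```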

And how many venues were returned by Foursquare?

In [None]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


<a id='item2'></a>

## 2. Explore Capitals in the World

In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=100000): #10^5
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
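The function above assumes that every response contains a `groups` list and that every venue has at least one category. A single capital that violates either assumption would stop the whole loop. The following sketch shows one way to make the extraction more tolerant; the helper names `items_from_response` and `venue_rows` are hypothetical, and the sample payload is invented for illustration.

```python
def items_from_response(payload):
    """Return the venue items of a parsed response, or [] if the expected keys are missing."""
    groups = payload.get('response', {}).get('groups') or []
    return groups[0].get('items', []) if groups else []


def venue_rows(name, lat, lng, items):
    """Turn the items of one response into flat tuples, one per venue."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        # record None instead of raising IndexError when a venue has no category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows


# invented payloads: one normal response and one without any groups
good = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Hotel', 'categories': [{'name': 'Hotel'}],
               'location': {'lat': 1.0, 'lng': 2.0}}},
    {'venue': {'name': 'Unlabeled Place', 'categories': [],
               'location': {'lat': 1.1, 'lng': 2.1}}},
]}]}}
empty = {'response': {}}

rows = venue_rows('Example City', '1.0000', '2.0000', items_from_response(good))
print(rows)
print(venue_rows('Nowhere', '0.0000', '0.0000', items_from_response(empty)))
```

Capitals that return no rows simply contribute nothing to the final dataframe, which matches the later step that drops cities without Foursquare data.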

In [None]:
world_capitals_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )

Kabul
Tirane
Algiers
Pago Pago
Andorra la Vella
Luanda
West Indies
Buenos Aires
Yerevan
Oranjestad
Canberra
Vienna
Baku
Nassau
Manama
Dhaka
Bridgetown
Minsk
Brussels
Belmopan
Porto Novo (constitutional) / Cotonou (seat of government)
Thimphu
La Paz (administrative) / Sucre (legislative)
Sarajevo
Gaborone
Brasilia
Road Town
Bandar Seri Begawan
Sofia
Ouagadougou
Bujumbura
Phnom Penh
Yaounde
Ottawa
Praia
George Town
Bangui
N'Djamena
Santiago
Beijing
Bogota
Moroni
Brazzaville
San Jose
Yamoussoukro
Zagreb
Havana
Nicosia
Prague
Kinshasa
Copenhagen
Djibouti
Roseau
Santo Domingo
Dili
Quito
Cairo
San Salvador
Malabo
Asmara
Tallinn
Addis Ababa
Stanley
Torshavn
Suva
Helsinki
Paris
Cayenne
Papeete
Libreville
Banjul
T'bilisi
Berlin
Accra
Athens
Nuuk
Basse-Terre
Guatemala
St. Peter Port
Conakry
Bissau
Georgetown
Port-au-Prince
nan
Tegucigalpa
Budapest
Reykjavik
New Delhi
Jakarta
Tehran
Baghdad
Dublin
Jerusalem
Rome
Kingston
Amman
Astana
Nairobi
Tarawa
Kuwait
Bishkek
Vientiane
Riga
Beirut
Maseru
Monr

#### Let's check the size of the resulting dataframe

In [None]:
print(world_capitals_venues.shape)

Let's check how many venues were returned for each neighborhood

In [None]:
world_capitals_venues.groupby('Neighborhood').count()

#### Let's find out how many unique categories can be curated from all the returned venues

In [None]:
print('There are {} unique categories.'.format(len(world_capitals_venues['Venue Category'].unique())))

There are 495 unique categories.


<a id='item3'></a>

## 3. Analyze Each Country

In [None]:
# one hot encoding
world_capitals_onehot = pd.get_dummies(world_capitals_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
world_capitals_onehot['Neighborhood'] = world_capitals_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [world_capitals_onehot.columns[-1]] + list(world_capitals_onehot.columns[:-1])
world_capitals_onehot = world_capitals_onehot[fixed_columns]

Unnamed: 0,Zoo Exhibit,Acai House,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Apres Ski Bar,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Ash and Haleem Place,Asian Restaurant,Athletics & Sports,Auditorium,Australian Restaurant,Austrian Restaurant,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Ballroom,Bar,Baseball Stadium,Basketball Court,Bath House,Bathing Area,Bay,Beach,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Belgian Restaurant,Big Box Store,Bike Shop,Bistro,Board Shop,Boarding House,Boat or Ferry,Bookstore,Border Crossing,Botanical Garden,Boutique,Bowling Alley,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Buddhist Temple,Buffet,Building,Bulgarian Restaurant,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Business Service,Butcher,Cable Car,Cafeteria,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Camera Store,Campground,Canal,Canal Lock,Candy Store,Cantonese Restaurant,Capitol Building,Caribbean Restaurant,Casino,Castle,Caucasian Restaurant,Cave,Cemetery,Champagne Bar,Cheese Shop,Chinese Aristocrat Restaurant,Chinese Restaurant,Chocolate Shop,Church,Churrascaria,Circus,City,City Hall,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Roaster,Coffee Shop,College Arts Building,College Auditorium,College Library,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Confucian Temple,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Creperie,Cretan Restaurant,Cricket Ground,Cruise Ship,Cuban Restaurant,Cultural Center,Cupcake Shop,Cycle Studio,Czech Restaurant,Dairy Store,Dance Studio,Deli / Bodega,Dentist's Office,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distillery,Dive Bar,Dive Shop,Dive Spot,Dizi 
Place,Dog Run,Doner Restaurant,Donut Shop,Drive-in Theater,Driving School,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Space,Exhibit,Eye Doctor,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Film Studio,Fish & Chips Shop,Fish Market,Fish Taverna,Fishing Spot,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Football Stadium,Forest,Fountain,French Restaurant,Fried Chicken Joint,Friterie,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,Gelato Shop,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Go Kart Track,Golf Course,Gourmet Shop,Government Building,Greek Restaurant,Grilled Meat Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Hill,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hockey Field,Home Service,Hookah Bar,Hospital,Hostel,Hot Dog Joint,Hot Spring,Hotel,Hotel Bar,Hotel Pool,Hotpot Restaurant,Housing Development,Hungarian Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Indoor Play Area,Intersection,Irani Cafe,Irish Pub,Island,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jegaraki,Jewelry Store,Jewish Restaurant,Juice Bar,Kafenio,Karaoke Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Lake,Language School,Latin American Restaurant,Leather Goods Store,Lebanese Restaurant,Library,Lighthouse,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Manti Place,Market,Martial Arts School,Massage Studio,Medical Center,Mediterranean Restaurant,Memorial 
Site,Men's Store,Mexican Restaurant,Meyhane,Meze Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Modern Greek Restaurant,Molecular Gastronomy Restaurant,Monastery,Monument / Landmark,Moroccan Restaurant,Mosque,Motel,Mountain,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nail Salon,National Park,Nature Preserve,Neighborhood,New American Restaurant,Night Market,Nightclub,Non-Profit,Noodle House,North Indian Restaurant,Northeastern Brazilian Restaurant,Observatory,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Nightlife,Outdoor Gym,Outdoor Sculpture,Outdoors & Recreation,Paella Restaurant,Paintball Field,Pakistani Restaurant,Palace,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Peking Duck Restaurant,Pelmeni House,Performing Arts Venue,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Photography Studio,Piadineria,Piano Bar,Pie Shop,Pier,Pizza Place,Planetarium,Platform,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Port,Portuguese Restaurant,Print Shop,Pub,Public Art,Racetrack,Radio Station,Rafting,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Reservoir,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,River,Road,Rock Climbing Spot,Romanian Restaurant,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Salsa Club,Sandwich Place,Sauna / Steam Room,Sausage Shop,Scandinavian Restaurant,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shaanxi Restaurant,Shabu-Shabu Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Shrine,Skate Park,Skating Rink,Ski Area,Ski Chairlift,Ski Chalet,Ski Lodge,Ski Trail,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,Social Club,Som Tum Restaurant,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Souvenir Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports 
Bar,Sports Club,Squash Court,Stables,Stadium,State / Provincial Park,Stationery Store,Steakhouse,Street Food Gathering,Strip Club,Student Center,Supermarket,Surf Spot,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tantuni Restaurant,Tapas Restaurant,Tatar Restaurant,Tattoo Parlor,Taverna,Tea Room,Temple,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Theme Restaurant,Tibetan Restaurant,Tiki Bar,Toll Booth,Tonkatsu Restaurant,Tour Provider,Tourist Information Center,Town,Toy / Game Store,Track,Track Stadium,Trail,Train Station,Trattoria/Osteria,Travel & Transport,Travel Agency,Tree,Turkish Restaurant,Tuscan Restaurant,Udon Restaurant,Ukrainian Restaurant,University,Used Bookstore,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Village,Vineyard,Volcano,Volleyball Court,Warehouse Store,Water Park,Waterfall,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Yunnan Restaurant,Zoo
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Kabul,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Kabul,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Kabul,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Kabul,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Kabul,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
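In the output above, "Neighborhood" appears among the venue category columns and the capital name shows up in the middle of each row, while "Zoo Exhibit" has moved to the front. This suggests that Foursquare has a venue category that is itself called "Neighborhood", so the assignment overwrites that dummy column instead of appending a new one. The sketch below reproduces the situation on invented toy data and shows one possible way around it. The replacement label `Neighborhood (category)` is an arbitrary choice.

```python
import pandas as pd

# invented toy data with a venue whose category is literally 'Neighborhood'
toy_venues = pd.DataFrame({
    'Neighborhood': ['Kabul', 'Kabul', 'Tirane'],
    'Venue Category': ['Hotel', 'Neighborhood', 'Café'],
})

onehot = pd.get_dummies(toy_venues[['Venue Category']], prefix='', prefix_sep='')

# rename the colliding category column before adding the key column
onehot = onehot.rename(columns={'Neighborhood': 'Neighborhood (category)'})

# insert the key column at position 0 instead of reordering columns afterwards
onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

print(list(onehot.columns))
```

With this variant the frame has one more column than there are categories, so the shapes reported in this notebook would change accordingly.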


And let's examine the new dataframe size.

In [None]:
world_capitals_onehot.shape

(13717, 495)

#### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [None]:
world_capitals_grouped = world_capitals_onehot.groupby('Neighborhood').mean().reset_index()

#### Let's confirm the new size

In [None]:
world_capitals_grouped.shape

(196, 495)
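Because every one-hot row contains a single 1, the mean of a category column within a group equals the share of that group's venues that fall into the category. A small sketch on invented toy data illustrates this:

```python
import pandas as pd

# invented toy data: capital A has four venues, capital B has two
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Café', 'Hotel', 'Park', 'Hotel', 'Hotel'],
})

toy_onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=float)
toy_onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# the group mean of each 0/1 column is the relative frequency of that category
toy_grouped = toy_onehot.groupby('Neighborhood').mean().reset_index()
print(toy_grouped)
```

Each row of the grouped frame sums to 1 across the category columns, so capitals with many venues and capitals with few venues are compared on the same scale.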

#### Let's print each neighborhood along with the top 5 most common venues

In [None]:
num_top_venues = 5

for hood in world_capitals_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = world_capitals_grouped[world_capitals_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Abu Dhabi----
         venue  freq
0        Hotel  0.14
1         Café  0.12
2       Resort  0.08
3  Coffee Shop  0.07
4        Beach  0.04


----Abuja----
                  venue  freq
0         Shopping Mall  0.10
1                 Hotel  0.10
2  Fast Food Restaurant  0.08
3            Restaurant  0.06
4      Department Store  0.04


----Accra----
           venue  freq
0          Hotel  0.11
1  Shopping Mall  0.08
2         Resort  0.05
3    Pizza Place  0.05
4   Cocktail Bar  0.05


----Addis Ababa----
                  venue  freq
0                 Hotel  0.26
1  Ethiopian Restaurant  0.11
2    Italian Restaurant  0.08
3            Restaurant  0.05
4           Pizza Place  0.05


----Algiers----
                venue  freq
0   French Restaurant  0.12
1               Hotel  0.09
2               Diner  0.05
3  Turkish Restaurant  0.05
4  Seafood Restaurant  0.05


----Amman----
            venue  freq
0           Hotel  0.10
1            Café  0.08
2   Historic Site  0.05
3    D

First, let's write a function to sort the venues in descending order.

In [None]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
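As a quick illustration of what the function returns, the sketch below applies it to one invented row. The function body is repeated so that the example runs on its own. The first entry of the row is skipped because it holds the neighborhood name, not a frequency.

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the leading 'Neighborhood' entry
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# invented frequencies for a single capital
toy_row = pd.Series({'Neighborhood': 'Kabul', 'Café': 0.2, 'Hotel': 0.5, 'Park': 0.3})
top_two = list(return_most_common_venues(toy_row, 2))
print(top_two)
```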

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [None]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = world_capitals_grouped['Neighborhood']

for ind in np.arange(world_capitals_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(world_capitals_grouped.iloc[ind, :], num_top_venues)
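The column labels above rely on an `IndexError` to fall back to the "th" suffix, which works for the ten columns used here. A more explicit variant is sketched below with a hypothetical helper called `ordinal`. It also covers numbers such as 11 or 21, although the notebook itself never needs them.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, for example 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


example_columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
print(example_columns)
```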

<a id='item4'></a>

## 4. Cluster Capitals in the World

Run *k*-means to group the capitals into 5 clusters.

In [None]:
# set number of clusters
kclusters = 5

world_capitals_grouped_clustering = world_capitals_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(world_capitals_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 2, 2, 0, 0, 0, 0, 0, 2], dtype=int32)
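The label array holds one cluster id per row of the grouped dataframe, in the same order. The ids themselves are arbitrary: only whether two rows share an id carries meaning. A minimal sketch on invented frequency profiles, assuming scikit-learn is installed and passing `n_init` explicitly so the call does not depend on the library's default:

```python
import numpy as np
from sklearn.cluster import KMeans

# invented profiles: two hotel-heavy rows and two beach-heavy rows
toy_profiles = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
])

toy_kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(toy_profiles)
labels = toy_kmeans.labels_
print(labels)
```

Rows with similar category frequencies end up with the same label, which is what the cluster tables in the last section rely on.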

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [None]:
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = world_capitals_grouped['Neighborhood']

for ind in np.arange(world_capitals_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(world_capitals_grouped.iloc[ind, :], num_top_venues)

In [None]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

world_capitals_merged = neighborhoods

# join the cluster labels and top venues onto the capitals table to keep latitude/longitude for each capital
world_capitals_merged = world_capitals_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# drop cities for which Foursquare returned no venues
world_capitals_merged.dropna(inplace=True)

world_capitals_merged[["Cluster Labels"]] = world_capitals_merged[["Cluster Labels"]].apply(pd.to_numeric) 
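The join keeps every capital from the left table, so capitals without venue data receive missing values in the joined columns, including the cluster label. The sketch below reproduces this on invented toy data and shows why the rows are dropped before the labels are used as integers.

```python
import pandas as pd

# invented toy tables: 'Nowhere' has coordinates but no venue data
capitals = pd.DataFrame({
    'Neighborhood': ['Kabul', 'Tirane', 'Nowhere'],
    'Latitude': ['34.2800', '41.1800', '0.0000'],
})
clustered = pd.DataFrame({
    'Cluster Labels': [2, 0],
    'Neighborhood': ['Kabul', 'Tirane'],
})

joined = capitals.join(clustered.set_index('Neighborhood'), on='Neighborhood')
missing = int(joined['Cluster Labels'].isna().sum())
print(missing)

# drop the unmatched rows, then store the labels as integers again
cleaned = joined.dropna().astype({'Cluster Labels': int})
print(cleaned)
```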

Finally, let's visualize the resulting clusters

In [None]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=2)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(pd.to_numeric(world_capitals_merged['Latitude']), pd.to_numeric(world_capitals_merged['Longitude']), world_capitals_merged['Neighborhood'], world_capitals_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<a id='item5'></a>

## 5. Examine Clusters

#### Cluster 1

In [421]:
a = world_capitals_merged.loc[world_capitals_merged['Cluster Labels'] == 0, world_capitals_merged.columns[[1] + list(range(5, world_capitals_merged.shape[1]))]]
len(a)

102

In [431]:
a.describe()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,102,102,102,102,102,102,102,102,102,102,102
unique,101,22,36,37,45,47,58,59,55,59,65
top,Kingston,Park,Hotel,Hotel,Hotel,Café,Restaurant,Coffee Shop,Park,Burger Joint,Burger Joint
freq,2,20,21,15,10,10,9,12,6,6,5


In [None]:
a

#### Cluster 2

In [424]:
b = world_capitals_merged.loc[world_capitals_merged['Cluster Labels'] == 1, world_capitals_merged.columns[[1] + list(range(5, world_capitals_merged.shape[1]))]]
len(b)

19

In [432]:
b.describe()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,19,19,19,19,19,19,19,19,19,19,19
unique,19,3,15,15,15,15,16,14,14,11,14
top,Malabo,Hotel,Resort,Airport,Airport,Farm,Falafel Restaurant,Factory,Fabric Shop,Eye Doctor,Event Space
freq,1,17,3,4,3,2,4,4,4,5,3


In [None]:
b

#### Cluster 3

In [426]:
c = world_capitals_merged.loc[world_capitals_merged['Cluster Labels'] == 2, world_capitals_merged.columns[[1] + list(range(5, world_capitals_merged.shape[1]))]]
len(c)

44

In [433]:
c.describe()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,44,44,44,44,44,44,44,44,44,44,44
unique,44,11,19,24,27,29,26,34,38,38,34
top,Lusaka,Hotel,Café,Shopping Mall,Restaurant,Restaurant,Restaurant,Coffee Shop,Fast Food Restaurant,Café,Pizza Place
freq,1,30,9,7,4,3,4,3,2,2,3


In [None]:
c

#### Cluster 4

In [427]:
d = world_capitals_merged.loc[world_capitals_merged['Cluster Labels'] == 3, world_capitals_merged.columns[[1] + list(range(5, world_capitals_merged.shape[1]))]]
len(d)

4

In [434]:
d.describe()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,4,4,4,4,4,4,4,4,4,4,4
unique,4,2,3,3,3,4,4,4,4,3,4
top,Funafuti,Café,Airport,Hotel,Airport,Fast Food Restaurant,Entertainment Service,Ethiopian Restaurant,Factory,Event Space,Eye Doctor
freq,1,3,2,2,2,1,1,1,1,2,1


In [None]:
d

#### Cluster 5

In [429]:
e = world_capitals_merged.loc[world_capitals_merged['Cluster Labels'] == 4, world_capitals_merged.columns[[1] + list(range(5, world_capitals_merged.shape[1]))]]
len(e)

28

In [430]:
e.describe()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,28,28,28,28,28,28,28,28,28,28,28
unique,28,9,8,12,16,19,21,22,22,22,22
top,Roseau,Beach,Hotel,Hotel,Beach,Bakery,Restaurant,Restaurant,Coffee Shop,Bar,Restaurant
freq,1,11,8,6,3,3,3,3,3,3,3


In [None]:
e
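The five sections above repeat the same selection with a different label each time. As a sketch of a more compact alternative, the per-cluster tables can be built in one loop with `groupby`. The toy dataframe below is invented and far narrower than `world_capitals_merged`.

```python
import pandas as pd

# invented stand-in for the merged dataframe
toy_merged = pd.DataFrame({
    'Neighborhood': ['Kabul', 'Tirane', 'Roseau'],
    'Cluster Labels': [0, 0, 1],
    '1st Most Common Venue': ['Hotel', 'Park', 'Beach'],
})

cluster_tables = {}
for label, group in toy_merged.groupby('Cluster Labels'):
    # keep the descriptive columns only, as in the sections above
    cluster_tables[label] = group.drop(columns='Cluster Labels')
    print('Cluster {}: {} capitals'.format(label + 1, len(group)))
```

Each table in `cluster_tables` can then be inspected or summarized with `describe()` in the same way as `a` to `e`.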