# The Battle of Neighborhoods - Capstone Project

## Introduction

Rio de Janeiro is widely regarded as a wonderful city with many attractions for tourists. The metropolitan area is usually chosen as the main travel destination, but there are also interesting places to visit outside the capital.

The purpose of this study is to select a location in the state of Rio de Janeiro in which to rent a vacation home. The candidate locations are outside the state capital and fall in three regions: the lakes (*Lagos*), the mountains (*Serrana*) and *Baía da Ilha Grande*.


## Data

Data from the Brazilian Institute of Geography and Statistics (IBGE, from its Portuguese name) were used to obtain the spatial coordinates of each locality. Foursquare data were used to identify the main venues around each locality.

## Methodology

The locations were segmented with k-means, an unsupervised machine learning technique that groups observations by similarity. The variables used to compare the regions were the venue categories returned by Foursquare in the vicinity of the spatial coordinates provided for each locality.
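
To make the clustering step concrete, here is a minimal sketch of Lloyd's algorithm, the iteration usually meant by "k-means", in plain Python. It is an illustration only: the locality names and frequency values are hypothetical, and seeding the centroids with the first `k` points is a simplification (the notebook itself relies on scikit-learn's `KMeans`, which uses a more careful initialization by default).

```python
import math

def kmeans(points, k, iterations=20):
    """Tiny Lloyd's-algorithm sketch; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical frequency vectors: (share of beaches, share of pizza places).
profiles = {
    "beach town A": (0.60, 0.05),
    "city centre A": (0.05, 0.40),
    "beach town B": (0.55, 0.10),
    "city centre B": (0.00, 0.45),
}
labels = kmeans(list(profiles.values()), k=2)
for name, label in zip(profiles, labels):
    print(name, "-> cluster", label)
```

Localities whose venue mix is similar end up with the same label, which is the property the analysis below depends on.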

## Results

In [19]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json
from geopy.geocoders import Nominatim
import requests
from pandas import json_normalize  # top-level location since pandas 1.0
from sklearn.cluster import KMeans
import folium
import lxml
import matplotlib.cm as cm
import matplotlib.colors as colors

### Geospatial localization

The first table below shows the geospatial coordinates and the regions under analysis.

In [5]:
# Read geospatial coordinates
geo = pd.read_csv("geospatial_coordinates_brazil.csv")
geo.head()

Unnamed: 0,NM_UF,NM_MUNICIP,NM_MICRO,NM_LOCALID,LONG,LAT
0,RIO DE JANEIRO,ANGRA DOS REIS,BAÍA DA ILHA GRANDE,ANGRA DOS REIS,-44.319627,-23.009116
1,RIO DE JANEIRO,ANGRA DOS REIS,BAÍA DA ILHA GRANDE,ABRAÃO,-44.160969,-23.143709
2,RIO DE JANEIRO,ANGRA DOS REIS,BAÍA DA ILHA GRANDE,CUNHAMBEBE,-44.436164,-22.966426
3,RIO DE JANEIRO,ANGRA DOS REIS,BAÍA DA ILHA GRANDE,ALDEIA INDÍGENA GUARANI-BRACUÍ,-44.397343,-22.87403
4,RIO DE JANEIRO,ANGRA DOS REIS,BAÍA DA ILHA GRANDE,MAMBUCABA,-44.51641,-23.024562


The full list of municipalities and regions under analysis is below.

In [9]:
geo[["NM_MUNICIP", "NM_MICRO"]].drop_duplicates()

Unnamed: 0,NM_MUNICIP,NM_MICRO
0,ANGRA DOS REIS,BAÍA DA ILHA GRANDE
5,ARARUAMA,LAGOS
14,ARMAÇÃO DOS BÚZIOS,LAGOS
15,ARRAIAL DO CABO,LAGOS
16,CABO FRIO,LAGOS
18,IGUABA GRANDE,LAGOS
19,PARATY,BAÍA DA ILHA GRANDE
28,PETRÓPOLIS,SERRANA
33,SÃO JOSÉ DO VALE DO RIO PRETO,SERRANA
35,SÃO PEDRO DA ALDEIA,LAGOS


The map below shows all locations.

In [39]:
# Define Rio de Janeiro location
address = 'Rio de Janeiro, Brazil'

geolocator = Nominatim(user_agent="http")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Rio de Janeiro are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Rio de Janeiro are -22.9110137, -43.2093727.


In [49]:
# Create Rio de Janeiro map using latitude and longitude values
map_rj = folium.Map(location=[latitude, longitude], zoom_start=8, min_zoom=8, max_zoom=8)

# Add markers to map
for lat, lng, borough, neighborhood in zip(geo['LAT'], geo['LONG'], geo['NM_MICRO'], geo['NM_LOCALID']):
    label = 'Borough: {}, Neighborhood: {}'.format(borough, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_rj)  
    
map_rj

### Explore Neighborhood

In [51]:
# Foursquare credentials, read from environment variables so that the secrets
# are not stored in the notebook (the variable names below are an assumption)
import os

CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 15


In [52]:
# Get nearby venues function
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # Make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # Return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Create Rio de Janeiro venues data

In [53]:
rj_venues = getNearbyVenues(names=geo['NM_LOCALID'],
                            latitudes=geo['LAT'],
                            longitudes=geo['LONG'])

ANGRA DOS REIS
ABRAÃO
CUNHAMBEBE
ALDEIA INDÍGENA GUARANI-BRACUÍ
MAMBUCABA
ARARUAMA
IGUABINHA
MORRO GRANDE
PRAIA SECA
SÃO VICENTE DE PAULA
SEM DENOMINAÇÃO
SOBRADINHO
MORUBAY
POSSE
ARMAÇÃO DOS BÚZIOS
ARRAIAL DO CABO
CABO FRIO
TAMOIOS
IGUABA GRANDE
PARATY
PARATI MIRIM
VILA DOS MORADORES DAS LARANJEIRAS
CONDOMÍNIO RESIDENCIAL LARANJEIRAS
PONTA DA TRINDADE
PATRIMÔNIO
ALDEIA INDÍGENA ARAPONGA
ITATI DE PARATI MIRIM
TARITUBA
PETRÓPOLIS
CASCATINHA
ITAIPAVA
PEDRO DO RIO
POSSE
SÃO JOSÉ DO VALE DO RIO PRETO
VOLTA DO PIÃO
CRUZ
BOTAFOGO
SÃO MATEUS
SÃO PEDRO DA ALDEIA
SAQUAREMA
BACAXÁ
SAMPAIO CORREIA
TERESÓPOLIS
VALE DE BONSUCESSO
VIEIRA
VALE DO PAQUEQUER
BIQUINHA
PIÃO


In [54]:
rj_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,ANGRA DOS REIS,-23.009116,-44.319627,Padaria Pão e Acessórios,-23.007813,-44.319347,Bakery
1,ANGRA DOS REIS,-23.009116,-44.319627,Fromaggio,-23.010197,-44.321534,Italian Restaurant
2,ANGRA DOS REIS,-23.009116,-44.319627,Casa Nova Restaurante,-23.009634,-44.319927,Noodle House
3,ANGRA DOS REIS,-23.009116,-44.319627,Café Favorito,-23.007407,-44.318676,Café
4,ANGRA DOS REIS,-23.009116,-44.319627,Sandubas,-23.006986,-44.318461,Fast Food Restaurant


In [55]:
rj_venues.shape

(252, 7)
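
The `radius=500` argument in `getNearbyVenues` limits the search to venues near each locality's coordinates. As a sanity check, the distance between a locality centre and one of its venues can be computed with the haversine formula. The sketch below is not part of the original notebook; it assumes a spherical Earth with a 6,371 km radius and uses the first row of the table above.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a sphere of radius 6,371 km."""
    r = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Locality centre and first venue from the rj_venues.head() output.
centre = (-23.009116, -44.319627)   # ANGRA DOS REIS
venue = (-23.007813, -44.319347)    # Padaria Pão e Acessórios
distance = haversine_m(*centre, *venue)
print(f"{distance:.0f} m from the centre; inside 500 m: {distance <= 500}")
```

Applying the same function to every row would flag any venue that falls outside the requested radius.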

### Analyze Each Neighborhood

In [56]:
# one hot encoding
rj_onehot = pd.get_dummies(rj_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
rj_onehot['Neighborhood'] = rj_venues['Neighborhood'] 

rj_onehot.head()

Unnamed: 0,Acai House,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bar,Beach,Bed & Breakfast,Bookstore,Boutique,Brazilian Restaurant,Burger Joint,Bus Station,Café,Campground,Chocolate Shop,Church,Churrascaria,Clothing Store,Convenience Store,Creperie,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Falafel Restaurant,Farm,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Food,Food & Drink Shop,Food Truck,French Restaurant,Fried Chicken Joint,Garden,Gastropub,German Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Health Food Store,Historic Site,History Museum,Hostel,Hotel,Ice Cream Shop,Italian Restaurant,Japanese Restaurant,Juice Bar,Market,Mountain,Neighborhood,Noodle House,Pastry Shop,Pedestrian Plaza,Pizza Place,Playground,Plaza,Pub,Resort,Restaurant,Rock Club,Salad Place,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Snack Place,Soccer Field,Soccer Stadium,Soup Place,Steakhouse,Supermarket,Surf Spot,Vegetarian / Vegan Restaurant,Waterfront,Wine Bar
0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,ANGRA DOS REIS,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,ANGRA DOS REIS,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,ANGRA DOS REIS,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,ANGRA DOS REIS,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,ANGRA DOS REIS,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [57]:
rj_onehot.shape

(252, 80)

Next, let's group the rows by neighborhood and take the mean of the frequency of occurrence of each category.

In [58]:
rj_grouped = rj_onehot.groupby('Neighborhood').mean().reset_index()
rj_grouped.head()

Unnamed: 0,Neighborhood,Acai House,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bar,Beach,Bed & Breakfast,Bookstore,Boutique,Brazilian Restaurant,Burger Joint,Bus Station,Café,Campground,Chocolate Shop,Church,Churrascaria,Clothing Store,Convenience Store,Creperie,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Falafel Restaurant,Farm,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Food,Food & Drink Shop,Food Truck,French Restaurant,Fried Chicken Joint,Garden,Gastropub,German Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Health Food Store,Historic Site,History Museum,Hostel,Hotel,Ice Cream Shop,Italian Restaurant,Japanese Restaurant,Juice Bar,Market,Mountain,Noodle House,Pastry Shop,Pedestrian Plaza,Pizza Place,Playground,Plaza,Pub,Resort,Restaurant,Rock Club,Salad Place,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Snack Place,Soccer Field,Soccer Stadium,Soup Place,Steakhouse,Supermarket,Surf Spot,Vegetarian / Vegan Restaurant,Waterfront,Wine Bar
0,ABRAÃO,0.0,0.0,0.0,0.066667,0.0,0.0,0.133333,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.133333,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,ANGRA DOS REIS,0.066667,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.066667,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,ARARUAMA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.133333,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,ARMAÇÃO DOS BÚZIOS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.133333,0.0,0.0,0.133333,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,ARRAIAL DO CABO,0.0,0.0,0.0,0.0,0.066667,0.0,0.066667,0.066667,0.0,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.066667,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [59]:
rj_grouped.shape

(29, 80)
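
The grouped values are simply each category's share of the venues returned for a locality. With `LIMIT = 15`, one venue contributes 1/15 ≈ 0.066667 and two venues 2/15 ≈ 0.133333, which is why those numbers recur in the table. The sketch below reproduces this with the standard library; the venue counts are inferred from the ABRAÃO row above under the assumption that all 15 venues were returned, so treat them as illustrative rather than as raw data.

```python
from collections import Counter

def category_shares(categories):
    """Return each category's share of the venues found for one locality."""
    counts = Counter(categories)
    total = len(categories)
    return {cat: n / total for cat, n in counts.items()}

# Counts inferred from the ABRAÃO frequencies (assumption: 15 venues in total).
venues = (
    ["Hostel"] * 3
    + ["Beach"] * 2 + ["Bed & Breakfast"] * 2 + ["Hotel"] * 2 + ["Restaurant"] * 2
    + ["BBQ Joint", "Fish & Chips Shop", "Italian Restaurant", "Resort"]
)
shares = category_shares(venues)
for cat, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{cat:20s} {share:.6f}")
```

Taking the mean of one-hot columns per locality, as `groupby('Neighborhood').mean()` does, yields the same shares.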

Let's print each neighborhood along with the top 5 most common venues

In [60]:
num_top_venues = 5

for hood in rj_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = rj_grouped[rj_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----ABRAÃO----
             venue  freq
0           Hostel  0.20
1       Restaurant  0.13
2  Bed & Breakfast  0.13
3            Hotel  0.13
4            Beach  0.13


----ANGRA DOS REIS----
                  venue  freq
0                 Plaza  0.13
1            Acai House  0.07
2  Brazilian Restaurant  0.07
3      Department Store  0.07
4  Fast Food Restaurant  0.07


----ARARUAMA----
              venue  freq
0  Department Store  0.13
1    Ice Cream Shop  0.13
2    Sandwich Place  0.07
3       Pizza Place  0.07
4             Hotel  0.07


----ARMAÇÃO DOS BÚZIOS----
                  venue  freq
0                 Hotel  0.13
1  Brazilian Restaurant  0.13
2                  Café  0.13
3         Deli / Bodega  0.07
4        Chocolate Shop  0.07


----ARRAIAL DO CABO----
                venue  freq
0          Restaurant  0.13
1       Grocery Store  0.07
2  Seafood Restaurant  0.07
3      Scenic Lookout  0.07
4         Snack Place  0.07


----BACAXÁ----
                  venue  freq
[output truncated]

In [106]:
# Return most common venues
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# Create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no entry in indicators beyond 'rd', so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# Create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = rj_grouped['Neighborhood']

for ind in np.arange(rj_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(rj_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ABRAÃO,Hostel,Hotel,Beach,Bed & Breakfast,Restaurant,BBQ Joint,Italian Restaurant,Resort,Fish & Chips Shop,Wine Bar
1,ANGRA DOS REIS,Plaza,Acai House,Brazilian Restaurant,Gym / Fitness Center,Fast Food Restaurant,Diner,Italian Restaurant,Department Store,Noodle House,Café
2,ARARUAMA,Department Store,Ice Cream Shop,Pizza Place,Sandwich Place,Clothing Store,Market,Italian Restaurant,Restaurant,Hotel,Convenience Store
3,ARMAÇÃO DOS BÚZIOS,Café,Hotel,Brazilian Restaurant,French Restaurant,Boutique,Pedestrian Plaza,Chocolate Shop,Rock Club,Salad Place,Sandwich Place
4,ARRAIAL DO CABO,Restaurant,Grocery Store,Seafood Restaurant,Japanese Restaurant,Churrascaria,Hotel,Burger Joint,Brazilian Restaurant,Scenic Lookout,Creperie
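
The column labels above are built with a `try`/`except` around a three-item suffix list, which works for the ten columns used here. If more columns were ever needed, a small helper could generate the suffixes directly. This is a hypothetical alternative, not the notebook's code; the name `ordinal` is introduced only for illustration.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th and the rest of the teens always take 'th'.
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
print(columns[1], "|", columns[4], "|", columns[10])
```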


### Cluster Neighborhoods

In [107]:
# Set number of clusters
kclusters = 7

rj_grouped_clustering = rj_grouped.drop(columns='Neighborhood')

# Run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(rj_grouped_clustering)

# Check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 6, 1, 6, 1, 6, 6, 6, 3, 1], dtype=int32)
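
The number of clusters is fixed by hand here. One common way to sanity-check such a choice is to compare the within-cluster sum of squared distances for several values of k; scikit-learn exposes this quantity on a fitted model as `kmeans.inertia_`. The sketch below computes the same measure in plain Python on hypothetical points, purely to show what the number means.

```python
def inertia(points, labels):
    """Within-cluster sum of squared distances to each cluster's mean."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(
            sum((x - c) ** 2 for x, c in zip(point, centroid))
            for point in members
        )
    return total

# Hypothetical data: two tight pairs of points that sit far apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
one_cluster = inertia(points, [0, 0, 0, 0])
two_clusters = inertia(points, [0, 0, 1, 1])
print(one_cluster, two_clusters)
```

A large drop when going from one value of k to the next suggests the extra cluster captures real structure; small drops suggest the clusters are being split further than the data supports.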

In [108]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

rj_merged = geo.rename(columns={"NM_LOCALID": "Neighborhood"})

# Join the cluster labels and top venues onto the coordinates table, keyed by neighborhood name
rj_merged = rj_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

In [110]:
rj_merged.dropna(subset=["Cluster Labels"], axis=0, inplace=True)
rj_merged["Cluster Labels"] = rj_merged["Cluster Labels"].astype(int)
rj_merged.head(10) # check the last columns!

Unnamed: 0,NM_UF,NM_MUNICIP,NM_MICRO,Neighborhood,LONG,LAT,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,RIO DE JANEIRO,ANGRA DOS REIS,BAÍA DA ILHA GRANDE,ANGRA DOS REIS,-44.319627,-23.009116,6,Plaza,Acai House,Brazilian Restaurant,Gym / Fitness Center,Fast Food Restaurant,Diner,Italian Restaurant,Department Store,Noodle House,Café
1,RIO DE JANEIRO,ANGRA DOS REIS,BAÍA DA ILHA GRANDE,ABRAÃO,-44.160969,-23.143709,1,Hostel,Hotel,Beach,Bed & Breakfast,Restaurant,BBQ Joint,Italian Restaurant,Resort,Fish & Chips Shop,Wine Bar
2,RIO DE JANEIRO,ANGRA DOS REIS,BAÍA DA ILHA GRANDE,CUNHAMBEBE,-44.436164,-22.966426,1,Pizza Place,Playground,Market,Beach,Restaurant,Snack Place,Fast Food Restaurant,Dive Bar,Creperie,Deli / Bodega
4,RIO DE JANEIRO,ANGRA DOS REIS,BAÍA DA ILHA GRANDE,MAMBUCABA,-44.51641,-23.024562,5,Bar,Beach,Surf Spot,Wine Bar,Fish & Chips Shop,Department Store,Dessert Shop,Diner,Dive Bar,Falafel Restaurant
5,RIO DE JANEIRO,ARARUAMA,LAGOS,ARARUAMA,-42.341096,-22.877438,1,Department Store,Ice Cream Shop,Pizza Place,Sandwich Place,Clothing Store,Market,Italian Restaurant,Restaurant,Hotel,Convenience Store
6,RIO DE JANEIRO,ARARUAMA,LAGOS,IGUABINHA,-42.262149,-22.867313,5,Brazilian Restaurant,Beach,Wine Bar,Department Store,Dessert Shop,Diner,Dive Bar,Falafel Restaurant,Farm,Fast Food Restaurant
8,RIO DE JANEIRO,ARARUAMA,LAGOS,PRAIA SECA,-42.308847,-22.923216,1,Pizza Place,Shopping Mall,Diner,Italian Restaurant,Ice Cream Shop,Café,Hotel,Seafood Restaurant,Beach,Food
9,RIO DE JANEIRO,ARARUAMA,LAGOS,SÃO VICENTE DE PAULA,-42.257168,-22.730148,2,Campground,Wine Bar,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Falafel Restaurant,Farm
13,RIO DE JANEIRO,ARARUAMA,LAGOS,POSSE,-42.226787,-22.766529,0,Mountain,Bed & Breakfast,Wine Bar,Fast Food Restaurant,Department Store,Dessert Shop,Diner,Dive Bar,Falafel Restaurant,Farm
14,RIO DE JANEIRO,ARMAÇÃO DOS BÚZIOS,LAGOS,ARMAÇÃO DOS BÚZIOS,-41.887749,-22.757764,6,Café,Hotel,Brazilian Restaurant,French Restaurant,Boutique,Pedestrian Plaza,Chocolate Shop,Rock Club,Salad Place,Sandwich Place


Create a map with the clusters

In [111]:
# Create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=8, min_zoom=8, max_zoom=8)

# Set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(rj_merged['LAT'], rj_merged['LONG'], rj_merged['Neighborhood'], rj_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
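
Each cluster label (0 to 6) needs its own marker colour. As a dependency-free illustration of the idea, the sketch below builds one hex colour per label from evenly spaced hues and indexes the palette by the label itself. It is an assumed alternative to the matplotlib colormap used in the cell above; the saturation and brightness values are arbitrary choices.

```python
import colorsys

def cluster_palette(n_clusters):
    """Return n_clusters evenly spaced hues as '#rrggbb' strings."""
    palette = []
    for i in range(n_clusters):
        # Fixed saturation (0.8) and value (0.9) are arbitrary, illustrative choices.
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(7)
for cluster, colour in enumerate(palette):
    print(cluster, colour)
```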

### Examine clusters

#### Cluster 0

In [122]:
rj_merged.loc[rj_merged['Cluster Labels'] == 0, rj_merged.columns[[3] + list(range(6, rj_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,POSSE,0,Mountain,Bed & Breakfast,Wine Bar,Fast Food Restaurant,Department Store,Dessert Shop,Diner,Dive Bar,Falafel Restaurant,Farm
24,PATRIMÔNIO,0,Restaurant,Bed & Breakfast,Wine Bar,Fast Food Restaurant,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Falafel Restaurant
32,POSSE,0,Mountain,Bed & Breakfast,Wine Bar,Fast Food Restaurant,Department Store,Dessert Shop,Diner,Dive Bar,Falafel Restaurant,Farm


#### Cluster 1

In [123]:
rj_merged.loc[rj_merged['Cluster Labels'] == 1, rj_merged.columns[[3] + list(range(6, rj_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,ABRAÃO,1,Hostel,Hotel,Beach,Bed & Breakfast,Restaurant,BBQ Joint,Italian Restaurant,Resort,Fish & Chips Shop,Wine Bar
2,CUNHAMBEBE,1,Pizza Place,Playground,Market,Beach,Restaurant,Snack Place,Fast Food Restaurant,Dive Bar,Creperie,Deli / Bodega
5,ARARUAMA,1,Department Store,Ice Cream Shop,Pizza Place,Sandwich Place,Clothing Store,Market,Italian Restaurant,Restaurant,Hotel,Convenience Store
8,PRAIA SECA,1,Pizza Place,Shopping Mall,Diner,Italian Restaurant,Ice Cream Shop,Café,Hotel,Seafood Restaurant,Beach,Food
15,ARRAIAL DO CABO,1,Restaurant,Grocery Store,Seafood Restaurant,Japanese Restaurant,Churrascaria,Hotel,Burger Joint,Brazilian Restaurant,Scenic Lookout,Creperie
18,IGUABA GRANDE,1,Pizza Place,Beach,Ice Cream Shop,Grocery Store,Burger Joint,Steakhouse,Snack Place,Dive Bar,Convenience Store,Deli / Bodega
30,ITAIPAVA,1,Pizza Place,Falafel Restaurant,Resort,Italian Restaurant,Grocery Store,Seafood Restaurant,French Restaurant,Diner,Convenience Store,Creperie
38,SÃO PEDRO DA ALDEIA,1,Pizza Place,Bed & Breakfast,Historic Site,Plaza,Sandwich Place,Gym / Fitness Center,Shopping Mall,Fast Food Restaurant,Fish Market,Bakery
39,SAQUAREMA,1,Pizza Place,Juice Bar,Dive Bar,Arts & Crafts Store,Food & Drink Shop,Steakhouse,Soup Place,Bakery,Soccer Field,Beach


#### Cluster 2

In [124]:
rj_merged.loc[rj_merged['Cluster Labels'] == 2, rj_merged.columns[[3] + list(range(6, rj_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,SÃO VICENTE DE PAULA,2,Campground,Wine Bar,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Falafel Restaurant,Farm


#### Cluster 3

In [125]:
rj_merged.loc[rj_merged['Cluster Labels'] == 3, rj_merged.columns[[3] + list(range(6, rj_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,PARATI MIRIM,3,Beach,Wine Bar,Fast Food Restaurant,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Falafel Restaurant,Farm
22,CONDOMÍNIO RESIDENCIAL LARANJEIRAS,3,Beach,Wine Bar,Fast Food Restaurant,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Falafel Restaurant,Farm


#### Cluster 4

In [126]:
rj_merged.loc[rj_merged['Cluster Labels'] == 4, rj_merged.columns[[3] + list(range(6, rj_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,VILA DOS MORADORES DAS LARANJEIRAS,4,Brazilian Restaurant,Wine Bar,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Falafel Restaurant,Farm


#### Cluster 5

In [127]:
rj_merged.loc[rj_merged['Cluster Labels'] == 5, rj_merged.columns[[3] + list(range(6, rj_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,MAMBUCABA,5,Bar,Beach,Surf Spot,Wine Bar,Fish & Chips Shop,Department Store,Dessert Shop,Diner,Dive Bar,Falafel Restaurant
6,IGUABINHA,5,Brazilian Restaurant,Beach,Wine Bar,Department Store,Dessert Shop,Diner,Dive Bar,Falafel Restaurant,Farm,Fast Food Restaurant
23,PONTA DA TRINDADE,5,Beach,Juice Bar,Burger Joint,Wine Bar,Fast Food Restaurant,Department Store,Dessert Shop,Diner,Dive Bar,Falafel Restaurant
27,TARITUBA,5,Beach,Seafood Restaurant,Hotel,Restaurant,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Falafel Restaurant


#### Cluster 6

In [128]:
rj_merged.loc[rj_merged['Cluster Labels'] == 6, rj_merged.columns[[3] + list(range(6, rj_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ANGRA DOS REIS,6,Plaza,Acai House,Brazilian Restaurant,Gym / Fitness Center,Fast Food Restaurant,Diner,Italian Restaurant,Department Store,Noodle House,Café
14,ARMAÇÃO DOS BÚZIOS,6,Café,Hotel,Brazilian Restaurant,French Restaurant,Boutique,Pedestrian Plaza,Chocolate Shop,Rock Club,Salad Place,Sandwich Place
16,CABO FRIO,6,Café,Dessert Shop,Brazilian Restaurant,Vegetarian / Vegan Restaurant,Arts & Crafts Store,Italian Restaurant,Ice Cream Shop,Gym,Plaza,Bookstore
19,PARATY,6,Church,Bookstore,Historic Site,Brazilian Restaurant,History Museum,Food & Drink Shop,Plaza,Pub,Bed & Breakfast,Restaurant
28,PETRÓPOLIS,6,Chocolate Shop,Wine Bar,Sandwich Place,German Restaurant,History Museum,Hotel,Dessert Shop,Pastry Shop,Restaurant,Garden
29,CASCATINHA,6,Gym,Supermarket,Market,Bus Station,Wine Bar,Farm,Department Store,Dessert Shop,Diner,Dive Bar
33,SÃO JOSÉ DO VALE DO RIO PRETO,6,Boutique,Gym,Bus Station,Fish & Chips Shop,Department Store,Dessert Shop,Diner,Dive Bar,Falafel Restaurant,Farm
40,BACAXÁ,6,Brazilian Restaurant,Fast Food Restaurant,Health Food Store,Pizza Place,Plaza,Café,Burger Joint,Acai House,Soccer Stadium,Bakery
41,SAMPAIO CORREIA,6,Bus Station,Farm,Brazilian Restaurant,Wine Bar,Creperie,Department Store,Dessert Shop,Diner,Dive Bar,Falafel Restaurant
42,TERESÓPOLIS,6,Café,Restaurant,Brazilian Restaurant,Gym / Fitness Center,Chocolate Shop,Seafood Restaurant,Gymnastics Gym,Deli / Bodega,Gym,Bakery
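
The per-cluster tables above can be tallied to see how unevenly the localities are distributed. The sketch below hard-codes the row counts read from those tables; in the notebook itself, `rj_merged['Cluster Labels'].value_counts()` would give the same tally.

```python
from collections import Counter

# One label per row listed in the "Examine clusters" tables above.
labels = [0] * 3 + [1] * 9 + [2] * 1 + [3] * 2 + [4] * 1 + [5] * 4 + [6] * 10
sizes = Counter(labels)
for cluster, size in sorted(sizes.items()):
    print(f"Cluster {cluster}: {size} localities")
```

Note that Cluster 0 lists POSSE twice (rows 13 and 32). These appear to be two different localities that share a name, and because venues were grouped by name they received the same venue profile, so the two rows are probably better read as one combined profile than as two independent observations.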


## Discussion

Seven clusters were used. Clusters 1 and 6 contain the most localities (9 and 10 rows, respectively), and they include the most visited cities in the selected regions.

##### Cluster 1

- In Abraão there are many hotels and hostels, so you can stay there for little money. There are also good beach options.
- Arraial do Cabo is a great place. It has wonderful beaches and may be cheaper than nearby Cabo Frio and Búzios.

Cluster 1 has more locations near a beach, and these may be cheaper than the locations in cluster 6.

##### Cluster 6

- Paraty is interesting because it offers a diverse mix of places, such as beaches, historic sites and bookstores.
- For a European experience in Brazil, Petrópolis is a nice place to go, with chocolate shops and German restaurants among the main options.

Cluster 6 has higher-cost locations, such as restaurants specializing in international dishes.

The other places look less interesting to visit.

## Conclusion

The state of Rio de Janeiro really is an excellent destination, with options ranging from the beach to the mountains. There are places for all tastes, including very interesting ones where you can spend a lot less money.