# IBM Data Science Capstone - Week 4 Task
### Applied Data Science Capstone by IBM - Coursera

## Table of contents
* [1. Introduction: Business Problem](#introduction)
* [2. Data](#data)
* [3. Methodology](#methodology)
* [4. Analysis](#analysis)
* [5. Results and Discussion](#results)
* [6. Conclusion](#conclusion)

## 1. Introduction: Business Problem

In this project, we will evaluate the best place to **open an art gallery in the city of São Paulo**, located in **Brazil, South America**. The city's metropolitan area, Greater São Paulo, ranks as the most populous in Brazil and the 12th most populous on Earth.

The city has the largest economy by GDP in Latin America and the Southern Hemisphere and the **11th largest GDP in the world**. It alone represents 10.7% of Brazilian GDP and 36% of the production of goods and services in the state of São Paulo, is home to 63% of the multinationals established in Brazil, and is responsible for 28% of the national scientific production.

It is therefore no surprise that museums and art galleries are found all over town. However, like many cities in the developing world, São Paulo is unequal: borough standards vary widely, with very expensive and very poor neighborhoods. Our analysis will therefore be based on the popularity of neighborhoods: **boroughs with more art-related venues are more likely to be good locations for our stakeholders to invest in**.

## 2. Data

### 2.1 City Hall Data

We will start by getting the coordinates of all the boroughs of São Paulo, which are **available on the City Hall website**, and creating a dataframe with them.

In [1]:
import pandas as pd
import numpy as np

sp = pd.read_csv("Borough_SP.csv")
sp.rename(columns={' Longitude':'Longitude'}, inplace=True)
sp.head(5)

Unnamed: 0,Borough,Latitude,Longitude
0,Sé Bela Vista,-23.559353,-46.647325
1,Bom Retiro,-23.525558,-46.640975
2,Cambuci,-23.56438,-46.621792
3,Consolação,-23.551929,-46.655807
4,Liberdade,-23.559745,-46.635536
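
The rename in the cell above suggests that the CSV header carries stray whitespace (`' Longitude'`). A minimal sketch of a more general cleanup follows, assuming the same kind of whitespace may also appear in borough names. The frame `raw` is a hypothetical stand-in for `Borough_SP.csv`, with whitespace added on purpose for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for Borough_SP.csv
# (coordinates copied from the table above, whitespace added for illustration)
raw = pd.DataFrame({
    "Borough": ["Bom Retiro", " Cambuci", "Liberdade "],
    "Latitude": [-23.525558, -23.56438, -23.559745],
    " Longitude": [-46.640975, -46.621792, -46.635536],
})

# Strip whitespace from every column name, then from the borough names
clean = raw.rename(columns=str.strip)
clean["Borough"] = clean["Borough"].str.strip()

print(list(clean.columns))
print(clean["Borough"].tolist())
```

Cleaning the names once at load time may also help later, when boroughs are matched by name against the Foursquare results.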


Also, let's check the shape of the data. We have 94 boroughs, which is a good number to work with.

In [2]:
sp.shape

(94, 3)

Now, let's visualize the data on a map generated with folium.

In [3]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans

In [4]:
import folium

# Find the center of all the locations and prepare the folium map
df1 = sp

center_lat= -23.550274 #center around Sé, the center of the city
center_long= -46.633944

venues_map = folium.Map(location=[center_lat, center_long], zoom_start=11)

df2=df1.head(80) # note: only the first 80 of the 94 boroughs are plotted
for lat, lng, label in zip(df2.Latitude, df2.Longitude, df2.Borough):
    folium.CircleMarker(
            [lat, lng],
            radius=7,
            color='blue',
            popup=label,
            fill = True,
            fill_color='blue',
            fill_opacity=0.6
    ).add_to(venues_map)
venues_map

Nice! The data is well spread out, which indicates we might have many different styles of neighborhoods. Now we can start analyzing the data from Foursquare.

### 2.2 Foursquare Data

The purpose of the data acquired from Foursquare is to analyze the venues (of all sorts) in each neighborhood. Afterwards, we will keep only the art-related venues by filtering the resulting dataframe.

First, we need to set up our Foursquare credentials.

In [5]:
CLIENT_ID = 'NBWX43X25ARRKGVM0PG30HQWLX5VJPGYLQH1EVSUZXAK2J3Z' # your Foursquare ID
CLIENT_SECRET = 'FGZECG33UKC4UU42B3PD3CXUDWQ4HCDPTIDQYNK2JWAM5QYM' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: NBWX43X25ARRKGVM0PG30HQWLX5VJPGYLQH1EVSUZXAK2J3Z
CLIENT_SECRET:FGZECG33UKC4UU42B3PD3CXUDWQ4HCDPTIDQYNK2JWAM5QYM


In [6]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
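
The function above assumes that every request succeeds and that every venue has at least one category. As a hedged sketch, the parsing step could be separated into a helper that tolerates missing pieces. The name `parse_explore_items` and the `sample` payload are hypothetical; the payload only mimics the nesting that `getNearbyVenues` indexes into (`response`, `groups[0]`, `items`, `venue`).

```python
def parse_explore_items(payload, name, lat, lng):
    """Flatten one 'explore' payload into rows (hypothetical helper, not part of the notebook)."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Skip venues without a category instead of raising IndexError
        if not categories:
            continue
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"],
        ))
    return rows

# Hypothetical payload with the same nesting that getNearbyVenues relies on
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Gallery A", "location": {"lat": -23.55, "lng": -46.64},
               "categories": [{"name": "Art Gallery"}]}},
    {"venue": {"name": "Unlabeled", "location": {"lat": -23.56, "lng": -46.65},
               "categories": []}},
]}]}}

rows = parse_explore_items(sample, "Sé", -23.550274, -46.633944)
empty = parse_explore_items({"response": {}}, "Sé", -23.550274, -46.633944)
print(rows)
print(empty)
```

Keeping the parsing separate from the HTTP call makes it possible to check the row layout without contacting the API.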

In [7]:
LIMIT = 200 # limit of number of venues returned by Foursquare API
sp_venues = getNearbyVenues(names=sp['Borough'],
                                   latitudes=sp['Latitude'],
                                   longitudes=sp['Longitude']
                                  )

Sé Bela Vista
Bom Retiro
Cambuci
Consolação
Liberdade
República
Santa Cecília
Sé
Aricanduva
Carrão
Vila Formosa
Cidade Tiradentes
Ermelino Matarazzo
Ponte Rasa
Guaianases
Lajeado
Itaim Paulista
Vila Curuçá
Itaquera
Cidade Líder
José Bonifácio
Parque do Carmo
Mooca Água Rasa
Belém
Brás
Moóca
Pari
Tatuapé
Penha
Artur Alvim
Cangaíba
Vila Matilde
São Mateus
São Rafael
São Miguel 
Jardim Helena
Vila Jacuí
Sapopemba
Vila Prudente
São Lucas
Casa Verde
Cachoeirinha
Limão
Brasilândia
 Freguesia do Ó
Jaçanã
Tremembé
Perus
Anhanguera
Pirituba
Jaraguá
São Domingos
Santana
Tucuruvi
Mandaqui
Vila Maria
Vila Guilherme
Vila Medeiros
Butantã
Morumbi
Raposo Tavares
Rio Pequeno
Vila Sônia
Lapa
Barra Funda
Jaguara
Jaguaré
Perdizes
Vila Leopoldina
Pinheiros
Alto de Pinheiros
Itaim Bibi
Jardim Paulista
Campo Limpo
Capão Redondo
Vila Andrade
Grajaú
Socorro
Cidade Ademar
Pedreira
Ipiranga
Sacomã
Jabaquara
M'Boi Mirim
Jardim Ângela
Jardim São Luís
Parelheiros
Marsilac
Santo Amaro
Campo Belo
Campo Grande
Moema


In [8]:
print(sp_venues.shape)
sp_venues.head()

(2817, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Sé Bela Vista,-23.559353,-46.647325,Templo da Carne - Marcos Bassi,-23.559061,-46.646452,Steakhouse
1,Sé Bela Vista,-23.559353,-46.647325,Basilicata,-23.55848,-46.6465,Bakery
2,Sé Bela Vista,-23.559353,-46.647325,Pousada dos Franceses,-23.559094,-46.647991,Hostel
3,Sé Bela Vista,-23.559353,-46.647325,Miguel Giannini,-23.558868,-46.647209,Optical Shop
4,Sé Bela Vista,-23.559353,-46.647325,Academia Gaviões,-23.560419,-46.646341,Gym / Fitness Center


In [9]:
# one hot encoding
sp_onehot = pd.get_dummies(sp_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
sp_onehot['Neighborhood'] = sp_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [sp_onehot.columns[-1]] + list(sp_onehot.columns[:-1])
sp_onehot = sp_onehot[fixed_columns]

sp_onehot.head()

Unnamed: 0,Neighborhood,Acai House,Accessories Store,African Restaurant,Alternative Healer,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Art Studio,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Baiano Restaurant,Bakery,Bar,Bath House,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bistro,Bookstore,Borek Place,Boutique,Bowling Alley,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Buffet,Building,Burger Joint,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Camera Store,Campground,Candy Store,Cheese Shop,Chinese Restaurant,Chocolate Shop,Churrascaria,Circus,Clothing Store,Cocktail Bar,Coffee Shop,College Bookstore,College Cafeteria,College Gym,College Quad,College Stadium,College Theater,Comedy Club,Comfort Food Restaurant,Community College,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Credit Union,Creperie,Cultural Center,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Dive Shop,Dog Run,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store,Empada House,Empanada Restaurant,Event Space,Farm,Farmers Market,Fast Food Restaurant,Film Studio,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Frame Store,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Halal Restaurant,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,High School,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hostel,Hot Dog 
Joint,Hotel,Hotel Bar,Hotpot Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Lebanese Restaurant,Library,Lingerie Store,Liquor Store,Lottery Retailer,Lounge,Market,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mental Health Office,Mexican Restaurant,Middle Eastern Restaurant,Mineiro Restaurant,Miscellaneous Shop,Mobile Phone Shop,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music School,Music Store,Music Venue,Nightclub,Northeastern Brazilian Restaurant,Northern Brazilian Restaurant,Office,Optical Shop,Organic Grocery,Other Repair Shop,Outdoors & Recreation,Paella Restaurant,Paper / Office Supplies Store,Park,Pastelaria,Pastry Shop,Pedestrian Plaza,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Piano Bar,Pie Shop,Pizza Place,Playground,Plaza,Poke Place,Pool,Pool Hall,Portuguese Restaurant,Print Shop,Pub,Public Art,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Rock Club,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Samba School,Sandwich Place,Scenic Lookout,Science Museum,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South American Restaurant,Southeastern Brazilian Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Stationery Store,Steakhouse,Stoop Sale,Storage Facility,Street Art,Supermarket,Sushi Restaurant,Swiss Restaurant,Tapiocaria,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Toy / Game Store,Track,Trail,Train Station,Travel Agency,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan 
Restaurant,Veterinarian,Video Store,Vietnamese Restaurant,Warehouse Store,Water Park,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
0,Sé Bela Vista,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Sé Bela Vista,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Sé Bela Vista,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Sé Bela Vista,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Sé Bela Vista,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [10]:
sp_grouped = sp_onehot.groupby('Neighborhood').sum().reset_index()
sp_grouped.head(5)

Unnamed: 0,Neighborhood,Acai House,Accessories Store,African Restaurant,Alternative Healer,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Art Studio,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Baiano Restaurant,Bakery,Bar,Bath House,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bistro,Bookstore,Borek Place,Boutique,Bowling Alley,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Buffet,Building,Burger Joint,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Camera Store,Campground,Candy Store,Cheese Shop,Chinese Restaurant,Chocolate Shop,Churrascaria,Circus,Clothing Store,Cocktail Bar,Coffee Shop,College Bookstore,College Cafeteria,College Gym,College Quad,College Stadium,College Theater,Comedy Club,Comfort Food Restaurant,Community College,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Credit Union,Creperie,Cultural Center,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Dive Shop,Dog Run,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store,Empada House,Empanada Restaurant,Event Space,Farm,Farmers Market,Fast Food Restaurant,Film Studio,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Frame Store,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Halal Restaurant,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,High School,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hostel,Hot Dog 
Joint,Hotel,Hotel Bar,Hotpot Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Lebanese Restaurant,Library,Lingerie Store,Liquor Store,Lottery Retailer,Lounge,Market,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mental Health Office,Mexican Restaurant,Middle Eastern Restaurant,Mineiro Restaurant,Miscellaneous Shop,Mobile Phone Shop,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music School,Music Store,Music Venue,Nightclub,Northeastern Brazilian Restaurant,Northern Brazilian Restaurant,Office,Optical Shop,Organic Grocery,Other Repair Shop,Outdoors & Recreation,Paella Restaurant,Paper / Office Supplies Store,Park,Pastelaria,Pastry Shop,Pedestrian Plaza,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Piano Bar,Pie Shop,Pizza Place,Playground,Plaza,Poke Place,Pool,Pool Hall,Portuguese Restaurant,Print Shop,Pub,Public Art,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Rock Club,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Samba School,Sandwich Place,Scenic Lookout,Science Museum,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South American Restaurant,Southeastern Brazilian Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Stationery Store,Steakhouse,Stoop Sale,Storage Facility,Street Art,Supermarket,Sushi Restaurant,Swiss Restaurant,Tapiocaria,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Toy / Game Store,Track,Trail,Train Station,Travel Agency,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan 
Restaurant,Veterinarian,Video Store,Vietnamese Restaurant,Warehouse Store,Water Park,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
0,Alto de Pinheiros,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Anhanguera,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Aricanduva,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Artur Alvim,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Barra Funda,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,1,0,0,0,0,2,6,0,0,2,0,0,0,0,1,0,0,0,0,0,4,0,2,0,0,1,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,2,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,3,0,0,0,0,0,0,0,0,0,0,0,1,0,2,0,0,2,9,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,3,0,0,0,1,0,1,1,0,0,0,0,0,0,5,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


OK, now we have, for every neighborhood of the city, the number of venues in each category. The next step is to filter the dataframe down to the venues related to art. However, the public that would go to an art gallery would also visit other cultural venues, such as museums, music auditoriums, theaters, cultural centers, dance studios, piano bars, libraries and bookshops, so we keep those categories as well.
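
Since `sp_grouped` is built with `.sum()`, each cell holds the number of venues of that category in the neighborhood. A minimal sketch on a hypothetical toy frame (same column names as `sp_venues`, made-up rows) shows the two steps together:

```python
import pandas as pd

# Hypothetical venues, using the same column names as sp_venues
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Venue Category": ["Bar", "Bar", "Theater", "Bakery"],
})

# One indicator column per category, one row per venue
onehot = pd.get_dummies(toy[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, "Neighborhood", toy["Neighborhood"])

# Summing the indicators per neighborhood yields venue counts per category
counts = onehot.groupby("Neighborhood").sum().reset_index()
print(counts)
```

Replacing `.sum()` with `.mean()` would give the share of each category among a neighborhood's venues instead of a count.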

Another important factor when visiting cultural places is the availability of public transportation; thus, bus and train stations will also be attributes of our dataframe.

We must then filter the table above to keep only the data related to cultural life. For a brief analysis, we can do this by selecting only the columns of interest.

In [12]:
sp_art = sp_grouped[['Neighborhood','Art Gallery','Art Museum','Art Studio','Arts & Crafts Store','Arts & Entertainment',
                     'Antique Shop','Bookstore','Bus Station','Bus Stop','Camera Store','Circus','College Bookstore','College Theater',
                     'Community College','Concert Hall','Cultural Center','Dance Studio','Film Studio','General Entertainment',
                     'History Museum','Historic Site','Indie Movie Theater','Jazz Club','Library','Movie Theater','Museum','Music School',
                     'Music Store','Music Venue','Piano Bar','Public Art','Record Shop','Recording Studio','Rock Club','Science Museum',
                     'Street Art','Theater','Train Station'
                    ]]
sp_art.head(5)

Unnamed: 0,Neighborhood,Art Gallery,Art Museum,Art Studio,Arts & Crafts Store,Arts & Entertainment,Antique Shop,Bookstore,Bus Station,Bus Stop,Camera Store,Circus,College Bookstore,College Theater,Community College,Concert Hall,Cultural Center,Dance Studio,Film Studio,General Entertainment,History Museum,Historic Site,Indie Movie Theater,Jazz Club,Library,Movie Theater,Museum,Music School,Music Store,Music Venue,Piano Bar,Public Art,Record Shop,Recording Studio,Rock Club,Science Museum,Street Art,Theater,Train Station
0,Alto de Pinheiros,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
1,Anhanguera,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Aricanduva,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Artur Alvim,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Barra Funda,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,2,0,0,2,0,1,0,0,0,0,0,3,0
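
Selecting a fixed list of columns raises a `KeyError` if one of the categories does not appear in the fetched venues, which can happen if the Foursquare results change between runs. A hedged sketch of a more tolerant selection, using a hypothetical `grouped` frame and a hypothetical `wanted` list:

```python
import pandas as pd

# Hypothetical grouped counts; 'Jazz Club' is deliberately absent
grouped = pd.DataFrame({
    "Neighborhood": ["A", "B"],
    "Art Gallery": [1, 0],
    "Bakery": [3, 2],
    "Theater": [0, 4],
})

wanted = ["Art Gallery", "Jazz Club", "Theater"]

# Keep the wanted categories that exist, preserving the order of `wanted`
present = [c for c in wanted if c in grouped.columns]
missing = [c for c in wanted if c not in grouped.columns]

art = grouped[["Neighborhood"] + present]
print(art)
print("missing:", missing)
```

An alternative is `grouped.reindex(columns=["Neighborhood"] + wanted, fill_value=0)`, which keeps the missing categories as all-zero columns so that the feature set stays the same across runs.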


## 3. Methodology

In this section, we will cluster the neighborhoods into five groups, each containing neighborhoods with similar venues. The technique we will use is k-means clustering, an unsupervised machine learning algorithm that aims to group similar data points together.
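
The number of clusters is fixed at five in this notebook. One common heuristic for checking such a choice, not used in the original analysis, is to compare the k-means inertia across several values of k and look for the point where it stops dropping sharply (the "elbow"). A minimal sketch on hypothetical synthetic data with three well-separated groups:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: three well-separated groups of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Inertia = sum of squared distances from each point to its assigned centroid
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

On this toy data the inertia falls steeply up to k = 3 and only marginally afterwards. Running the same loop on the art-related counts would be one way to see whether five clusters is a reasonable choice.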

We will therefore cluster the neighborhoods of São Paulo by their art-related venues. The first step is to determine the 10 most common venues for each borough.

## 4. Analysis

In [13]:
#total = sp_art.sum(axis = 1, skipna = True) 
#tops = sp_art
#tops.insert(1, 'Total', total)
#tops = classificacao.sort_values(by=['Total'], ascending=False)
#tops.head(10)

Now we have a table that shows, for every São Paulo neighborhood, the counts of all the art-related categories. Next, we will start working with this data and cluster it!

In [14]:
sp_grouped = sp_art
num_top_venues = 5

for hood in sp_grouped['Neighborhood']:
    temp = sp_grouped[sp_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})

In [15]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [16]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions after the 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = sp_grouped['Neighborhood']

for ind in np.arange(sp_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sp_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alto de Pinheiros,Bookstore,Train Station,Dance Studio,Camera Store,Cultural Center,Concert Hall,Community College,College Theater,College Bookstore,Circus
1,Anhanguera,Train Station,Camera Store,Dance Studio,Cultural Center,Concert Hall,Community College,College Theater,College Bookstore,Circus,Bus Stop
2,Aricanduva,Train Station,Camera Store,Dance Studio,Cultural Center,Concert Hall,Community College,College Theater,College Bookstore,Circus,Bus Stop
3,Artur Alvim,Train Station,Camera Store,Dance Studio,Cultural Center,Concert Hall,Community College,College Theater,College Bookstore,Circus,Bus Stop
4,Barra Funda,Theater,Music Venue,Museum,Bus Stop,Arts & Crafts Store,Public Art,Bookstore,Indie Movie Theater,Cultural Center,Movie Theater
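
In the table above, boroughs such as Anhanguera still list ten "most common" venues although all their selected counts are zero in `sp_art`; in that case the order only reflects how ties are sorted. A hedged sketch of a variant that ignores zero counts follows. The function `top_venues_nonzero`, its `placeholder` argument and the toy frame are hypothetical.

```python
import pandas as pd

def top_venues_nonzero(row, num_top_venues, placeholder="(none)"):
    """Hypothetical variant of return_most_common_venues that ignores zero counts."""
    counts = pd.to_numeric(row.iloc[1:])  # skip the 'Neighborhood' label
    counts = counts[counts > 0].sort_values(ascending=False)
    names = list(counts.index[:num_top_venues])
    # Pad so that every neighborhood returns the same number of entries
    return names + [placeholder] * (num_top_venues - len(names))

# Hypothetical counts: neighborhood B has no art-related venue at all
toy = pd.DataFrame({
    "Neighborhood": ["A", "B"],
    "Bookstore": [2, 0],
    "Theater": [3, 0],
    "Museum": [0, 0],
})

top_a = top_venues_nonzero(toy.iloc[0, :], 3)
top_b = top_venues_nonzero(toy.iloc[1, :], 3)
print("A", top_a)
print("B", top_b)
```

This only affects the descriptive table; the clustering itself works on the numeric counts and is unchanged.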


Based on the data processed above, we developed five clusters to analyze the neighborhoods, with k-means:

In [17]:
# set number of clusters
kclusters = 5

sp_grouped_clustering = sp_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sp_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 1, 1, 2, 1, 1, 1, 1, 1], dtype=int32)

In [18]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

sp_merged = sp

# join the cluster labels and top venues onto sp to add latitude/longitude for each neighborhood
sp_merged = sp_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Borough')

sp_merged.head() # check the last columns!

Unnamed: 0,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Sé Bela Vista,-23.559353,-46.647325,2.0,Theater,Arts & Crafts Store,Antique Shop,Bookstore,Jazz Club,Movie Theater,College Bookstore,Cultural Center,Concert Hall,Community College
1,Bom Retiro,-23.525558,-46.640975,1.0,Train Station,Camera Store,Dance Studio,Cultural Center,Concert Hall,Community College,College Theater,College Bookstore,Circus,Bus Stop
2,Cambuci,-23.56438,-46.621792,1.0,Train Station,Camera Store,Dance Studio,Cultural Center,Concert Hall,Community College,College Theater,College Bookstore,Circus,Bus Stop
3,Consolação,-23.551929,-46.655807,3.0,Theater,Dance Studio,Bookstore,College Theater,Movie Theater,Art Gallery,Antique Shop,Arts & Entertainment,Bus Station,Bus Stop
4,Liberdade,-23.559745,-46.635536,0.0,History Museum,Public Art,Bookstore,Music Venue,Dance Studio,Train Station,Camera Store,Concert Hall,Community College,College Theater


A “Cluster Labels” column was added, together with the two coordinate columns from the borough table, which will be used for visualization. Neighborhoods end up in the same cluster or in different clusters depending on the characteristics of their venues.

In [19]:
sp_merged=sp_merged.dropna() #remove NaN values
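
The float labels in the merged table (`2.0`, `1.0`) are consistent with a left join that leaves `NaN` for boroughs without a matching neighborhood, which is presumably why `dropna()` is needed here. A small sketch with hypothetical data shows the effect and how the unmatched boroughs could be listed before they are dropped:

```python
import pandas as pd

# Hypothetical borough table and cluster labels; 'B' has no label
boroughs = pd.DataFrame({"Borough": ["A", "B", "C"],
                         "Latitude": [-23.5, -23.6, -23.7]})
labels = pd.DataFrame({"Neighborhood": ["A", "C"],
                       "Cluster Labels": [0, 2]})

# Left join on the borough name; 'B' has no match and gets NaN
merged = boroughs.join(labels.set_index("Neighborhood"), on="Borough")
unmatched = merged.loc[merged["Cluster Labels"].isna(), "Borough"].tolist()

# Drop unmatched rows and restore the integer dtype of the labels
merged = merged.dropna(subset=["Cluster Labels"]).astype({"Cluster Labels": int})
print(unmatched)
print(merged)
```

Listing the unmatched names first makes it easier to tell whether a borough was dropped because Foursquare returned no venues or because its name did not match exactly.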

Now that all the neighborhoods are clustered and properly labeled, we can finally plot them on a map to make the results easier to visualize, as shown in the figure below.

In [21]:
map_clusters = folium.Map(location=[center_lat, center_long], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(sp_merged['Latitude'], sp_merged['Longitude'], sp_merged['Borough'], sp_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=8,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
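
Besides the map and the per-cluster tables, a numeric profile can help interpret what distinguishes the clusters. A hedged sketch that averages the category counts within each cluster, using hypothetical counts and labels in place of `sp_art` and `kmeans.labels_`:

```python
import pandas as pd

# Hypothetical art-related counts per neighborhood plus k-means labels
counts = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "Art Gallery": [0, 0, 2, 3],
    "Theater": [0, 1, 4, 2],
})
labels = [1, 1, 0, 0]

# Mean count of each category within each cluster
profile = (
    counts.drop(columns="Neighborhood")
          .assign(Cluster=labels)
          .groupby("Cluster")
          .mean()
)
# Number of neighborhoods per cluster
sizes = pd.Series(labels).value_counts().sort_index()
print(profile)
print(sizes)
```

A cluster with high mean counts for galleries, museums and theaters would be a natural candidate for the location search, while a large cluster with means near zero groups the boroughs with few cultural venues.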

#### Cluster 1

In [31]:
sp_merged.loc[sp_merged['Cluster Labels'] == 0, sp_merged.columns[[0] + list(range(4, sp_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Liberdade,History Museum,Public Art,Bookstore,Music Venue,Dance Studio,Train Station,Camera Store,Concert Hall,Community College,College Theater
6,Santa Cecília,Bookstore,Train Station,Camera Store,Dance Studio,Cultural Center,Concert Hall,Community College,College Theater,College Bookstore,Circus
27,Tatuapé,Dance Studio,Arts & Crafts Store,Train Station,Camera Store,Cultural Center,Concert Hall,Community College,College Theater,College Bookstore,Circus
40,Casa Verde,Music Venue,Arts & Crafts Store,Bookstore,Dance Studio,Art Studio,Arts & Entertainment,Antique Shop,Art Museum,Bus Station,Film Studio
52,Santana,Bookstore,Street Art,Recording Studio,Train Station,Camera Store,Cultural Center,Concert Hall,Community College,College Theater,College Bookstore
69,Pinheiros,Art Gallery,Bookstore,Movie Theater,Circus,Dance Studio,Arts & Entertainment,Antique Shop,Arts & Crafts Store,Bus Station,Film Studio
70,Alto de Pinheiros,Bookstore,Train Station,Dance Studio,Camera Store,Cultural Center,Concert Hall,Community College,College Theater,College Bookstore,Circus
93,Vila Mariana,Dance Studio,Theater,Music School,Arts & Entertainment,Community College,Bookstore,Cultural Center,Concert Hall,College Theater,College Bookstore


#### Cluster 2

In [32]:
sp_merged.loc[sp_merged['Cluster Labels'] == 1, sp_merged.columns[[0] + list(range(4, sp_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Bom Retiro,Train Station,Camera Store,Dance Studio,Cultural Center,Concert Hall,Community College,College Theater,College Bookstore,Circus,Bus Stop
2,Cambuci,Train Station,Camera Store,Dance Studio,Cultural Center,Concert Hall,Community College,College Theater,College Bookstore,Circus,Bus Stop
8,Aricanduva,Train Station,Camera Store,Dance Studio,Cultural Center,Concert Hall,Community College,College Theater,College Bookstore,Circus,Bus Stop
9,Carrão,Train Station,Camera Store,Dance Studio,Cultural Center,Concert Hall,Community College,College Theater,College Bookstore,Circus,Bus Stop
10,Vila Formosa,Train Station,Camera Store,Dance Studio,Cultural Center,Concert Hall,Community College,College Theater,College Bookstore,Circus,Bus Stop
11,Cidade Tiradentes,Bus Station,Theater,Train Station,Circus,Dance Studio,Cultural Center,Concert Hall,Community College,College Theater,College Bookstore
12,Ermelino Matarazzo,Music Venue,Bus Stop,Cultural Center,Concert Hall,Community College,College Theater,College Bookstore,Circus,Camera Store,Train Station
13,Ponte Rasa,Train Station,Camera Store,Dance Studio,Cultural Center,Concert Hall,Community College,College Theater,College Bookstore,Circus,Bus Stop
14,Guaianases,Train Station,Camera Store,Dance Studio,Cultural Center,Concert Hall,Community College,College Theater,College Bookstore,Circus,Bus Stop
15,Lajeado,Train Station,Camera Store,Dance Studio,Cultural Center,Concert Hall,Community College,College Theater,College Bookstore,Circus,Bus Stop


#### Cluster 3

In [33]:
sp_merged.loc[sp_merged['Cluster Labels'] == 2, sp_merged.columns[[0] + list(range(4, sp_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Sé Bela Vista,Theater,Arts & Crafts Store,Antique Shop,Bookstore,Jazz Club,Movie Theater,College Bookstore,Cultural Center,Concert Hall,Community College
5,República,Theater,Music School,Arts & Crafts Store,Record Shop,Piano Bar,Bookstore,Music Store,Cultural Center,Jazz Club,Camera Store
64,Barra Funda,Theater,Music Venue,Museum,Bus Stop,Arts & Crafts Store,Public Art,Bookstore,Indie Movie Theater,Cultural Center,Movie Theater


#### Cluster 4

In [34]:
sp_merged.loc[sp_merged['Cluster Labels'] == 3, sp_merged.columns[[0] + list(range(4, sp_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Consolação,Theater,Dance Studio,Bookstore,College Theater,Movie Theater,Art Gallery,Antique Shop,Arts & Entertainment,Bus Station,Bus Stop


#### Cluster 5

In [35]:
sp_merged.loc[sp_merged['Cluster Labels'] == 4, sp_merged.columns[[0] + list(range(4, sp_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Sé,Art Gallery,Cultural Center,Arts & Crafts Store,Bookstore,Historic Site,College Bookstore,Music Venue,Theater,Art Museum,Bus Station
91,Moema,Arts & Crafts Store,Art Gallery,Music Venue,Dance Studio,Art Studio,Art Museum,Arts & Entertainment,Antique Shop,Bookstore,Bus Station


## 5. Results and Discussion

It is now clear that many boroughs would not be suitable for opening an art gallery, namely those in clusters C0 and C1 (listed above as Cluster 1 and Cluster 2). This makes sense: most of the neighborhoods in C0 and C1 are residential and far from downtown, so they have few art-related venues. Since these are not the characteristics we are looking for, C0 and C1 are eliminated from our analysis.

However, a closer look at the clustering data for C2, C3 and C4 (listed above as Cluster 3, Cluster 4 and Cluster 5) shows that these contain promising boroughs.

After a deeper analysis of all the clusters and their characteristics, and taking into account the total number of venues calculated in Table 4, we can determine the best neighborhoods in which to open an art gallery in São Paulo. This final result is shown in Table 10 below.


| Position | Neighborhood Name | Cluster |
|------|------|------|
| 1st | Sé | C4 |
| 2nd | Moema | C4 |
| 3rd | Consolação | C3 |
| 4th | Bela Vista | C2 |
| 5th | República | C2 |



These results are plausible, since all of these neighborhoods are artistic in their own way. Sé, Bela Vista, Consolação and República are old, central locations with many museums and art galleries. Moema is a newer neighborhood, but it has a hipster character and is populated mainly by young people. The recommendation above is therefore consistent with the reality of São Paulo.
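
To make the ordering in Table 10 easier to check, here is a minimal sketch of one possible heuristic: rank the boroughs of C2, C3 and C4 by how high "Art Gallery" appears in their list of most common venues. This is an assumption made for illustration, not necessarily the procedure used in this report, which also takes the venue totals of Table 4 into account. The venue lists are copied from the cluster tables above; `gallery_position` is a hypothetical helper, and ties are left in table order.

```python
# Top-10 venue lists copied from the cluster tables (C2, C3 and C4 only),
# in the same row order as the original DataFrame.
top_venues = {
    "Sé Bela Vista": ["Theater", "Arts & Crafts Store", "Antique Shop", "Bookstore",
                      "Jazz Club", "Movie Theater", "College Bookstore",
                      "Cultural Center", "Concert Hall", "Community College"],
    "Consolação": ["Theater", "Dance Studio", "Bookstore", "College Theater",
                   "Movie Theater", "Art Gallery", "Antique Shop",
                   "Arts & Entertainment", "Bus Station", "Bus Stop"],
    "República": ["Theater", "Music School", "Arts & Crafts Store", "Record Shop",
                  "Piano Bar", "Bookstore", "Music Store", "Cultural Center",
                  "Jazz Club", "Camera Store"],
    "Sé": ["Art Gallery", "Cultural Center", "Arts & Crafts Store", "Bookstore",
           "Historic Site", "College Bookstore", "Music Venue", "Theater",
           "Art Museum", "Bus Station"],
    "Barra Funda": ["Theater", "Music Venue", "Museum", "Bus Stop",
                    "Arts & Crafts Store", "Public Art", "Bookstore",
                    "Indie Movie Theater", "Cultural Center", "Movie Theater"],
    "Moema": ["Arts & Crafts Store", "Art Gallery", "Music Venue", "Dance Studio",
              "Art Studio", "Art Museum", "Arts & Entertainment", "Antique Shop",
              "Bookstore", "Bus Station"],
}


def gallery_position(venues, category="Art Gallery"):
    """Return the 1-based position of `category`, or len(venues) + 1 if absent."""
    return venues.index(category) + 1 if category in venues else len(venues) + 1


# sorted() is stable, so boroughs without an art gallery keep their table order.
ranking = sorted(top_venues, key=lambda name: gallery_position(top_venues[name]))

for place, name in enumerate(ranking, start=1):
    print(place, name, gallery_position(top_venues[name]))
```

Under this heuristic the first five entries follow the same order as Table 10, with Barra Funda in sixth place.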

## 6. Conclusion

This project started with one CSV file containing the neighborhoods and their geographical coordinates. The art-related venues of each neighborhood were then obtained from Foursquare. After that initial phase, a machine learning technique called k-means clustering was used to group similar neighborhoods together and to check which of them are the best ones for opening an art gallery in São Paulo.

We saw that the more central boroughs are more likely to have many art-related venues, which increases the chances of a successful business in the area. Table 10 summarizes the results, showing that Sé is the best location for our art gallery. This makes sense, since it is the neighborhood with the most art galleries in the city and the most art-related venues overall.

Therefore, Sé is the recommendation of this report for anyone seeking to open an art gallery in São Paulo. The code, data and analysis used in this report are available on GitHub.
