
# Applied Data Science Capstone: Battle of the Neighborhoods
## Clustering the Neighborhoods of Rio de Janeiro, RJ

## Author: Wanderson Torres


---


## Introduction / Business Problem

#### The emergence of coronavirus disease 2019 (COVID-19), which is caused by infection with the previously unknown severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has devastated economies and created unprecedented challenges for healthcare and food systems around the world. Globally, billions of people have been ordered to stay at home during lockdowns, and almost three million people have died (as of the end of March 2021).

#### With the pandemic spreading across the planet, many jobs have gone remote, giving people the option to move farther away from the office to safer areas. We have seen a trend of people moving away from crowded and expensive neighborhoods to the suburbs, or even to other cities and states, where they feel more secure and better protected against the virus.

#### I live in the city of Rio de Janeiro. Because the number of deaths keeps increasing and hospital beds are reaching maximum occupancy, I would like to conduct a study that identifies areas with lower risk and plenty of options for health care in general.

#### The aim of the project is to apply the skills learned in the Coursera course to find the safest neighborhood in Rio de Janeiro that is also surrounded by hospitals, drugstores, clinics and similar venues. This will be determined in four steps: analyzing the number of COVID-19 cases and deaths, analyzing the population profile of each neighborhood, clustering the neighborhoods using k-means, and exploring on a map the most common health-related venues in the safest neighborhoods.
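As a rough illustration of the clustering step, the sketch below standardizes two per-capita indicators and groups neighborhoods with scikit-learn's `KMeans`. It is only a minimal example under stated assumptions. The feature names (`cases_per_100k`, `deaths_per_100k`), the toy neighborhoods "A" to "F" and their values, and the choice of `k=2` are hypothetical. None of them are results from the Rio data.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-capita indicators for six neighborhoods (toy values, not real data)
features = pd.DataFrame(
    {
        "Neighborhood": ["A", "B", "C", "D", "E", "F"],
        "cases_per_100k": [1200.0, 1300.0, 1250.0, 5200.0, 5400.0, 5100.0],
        "deaths_per_100k": [60.0, 70.0, 65.0, 400.0, 420.0, 390.0],
    }
)

# Standardize so both indicators contribute on a comparable scale
X = StandardScaler().fit_transform(features[["cases_per_100k", "deaths_per_100k"]])

# Fit k-means with a fixed seed so the cluster labels are reproducible
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
features["Cluster"] = kmeans.labels_

# Treat the cluster with the lowest mean death rate as the "lower-risk" group
cluster_means = features.groupby("Cluster")["deaths_per_100k"].mean()
lower_risk = cluster_means.idxmin()
print(features[features["Cluster"] == lower_risk]["Neighborhood"].tolist())  # ['A', 'B', 'C']
```

Standardizing matters here because case rates are an order of magnitude larger than death rates. Without it, cases would dominate the Euclidean distance that k-means minimizes.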

#### This exercise may also be of interest to anyone facing this pandemic who is concerned about their health.


---




## Data

#### The Rio Covid Panel is an initiative of the city of Rio de Janeiro. It is open data, available for download directly from the [Data Rio website](https://www.data.rio/app/painel-rio-covid-19). It contains the following features about COVID-19 cases: neighborhood of residence, case evolution, age group and sex.

#### I also obtained a list of Rio de Janeiro’s districts and neighborhoods by web scraping a [Wikipedia page](https://pt.wikipedia.org/wiki/Lista_de_bairros_da_cidade_do_Rio_de_Janeiro) using BeautifulSoup. However, this dataset lacked geographic coordinates, so I used geocoder to obtain the latitude and longitude of each Rio de Janeiro neighborhood.
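The scraping step can be sketched with BeautifulSoup as follows. The HTML fragment is hypothetical and stands in for the Wikipedia page. The `wikitable` class and the three-column layout (zone, district, neighborhood) are also assumptions, so inspect the real page before relying on column positions.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for the Wikipedia page; the real table
# layout may differ, so the class name and column order here are assumptions.
html = """
<table class="wikitable">
  <tr><th>Zona</th><th>Região</th><th>Bairro</th></tr>
  <tr><td>Central</td><td>Centro Histórico e Zona Portuária</td><td>Caju</td></tr>
  <tr><td>Central</td><td>Centro Histórico e Zona Portuária</td><td>Catumbi</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for tr in soup.find("table", class_="wikitable").find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    # The header row has only <th> cells, so it yields no <td> cells and is skipped
    if len(cells) == 3:
        zone, district, neighborhood = cells
        rows.append({"Zone": zone, "District": district, "Neighborhood": neighborhood})

print(rows)
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame(rows)` to get columns named like those in `rj_data`.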

#### I obtained the number of inhabitants of each neighborhood in Rio de Janeiro from the [Data Rio website](https://www.data.rio/search?groupIds=0f4009068ec74e17b25eb3e70891b95f&sort=-modified). This information is also open data and free to download.

#### To get venue information for each neighborhood, I called the [Foursquare API](https://foursquare.com/developers/apps). This gave me a dataset containing the venue name, the latitude and longitude coordinates of the venue location, and the venue category.
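A minimal sketch of turning a venue response into the four fields above is shown below. It assumes the response layout of Foursquare's older v2 `venues/explore` endpoint, which nests venues as `response` → `groups` → `items` → `venue`. The API has changed since this project was written, so check the current documentation. The helper name `venues_from_explore`, the sample payload, its venues and the category names are all hypothetical.

```python
import pandas as pd


def venues_from_explore(payload, neighborhood):
    """Flatten a v2 `venues/explore`-style payload (assumed layout) into one row per venue."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories", [])
        rows.append(
            {
                "Neighborhood": neighborhood,
                "Venue": venue["name"],
                "Venue Latitude": venue["location"]["lat"],
                "Venue Longitude": venue["location"]["lng"],
                # Keep the first listed category, or None if the venue has none
                "Venue Category": categories[0]["name"] if categories else None,
            }
        )
    return pd.DataFrame(rows)


# Hypothetical, trimmed-down payload with made-up venues (not real API output)
sample = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Hospital X", "location": {"lat": -22.90, "lng": -43.18},
                               "categories": [{"name": "Hospital"}]}},
                    {"venue": {"name": "Farmácia Y", "location": {"lat": -22.91, "lng": -43.19},
                               "categories": [{"name": "Pharmacy"}]}},
                ]
            }
        ]
    }
}

df = venues_from_explore(sample, "Centro")
print(df[["Venue", "Venue Category"]])
```

A health-focused filter could then keep only the rows whose `Venue Category` matches hospitals, pharmacies or clinics. Check the exact category names against real responses first.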

```python
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if geopy package is not installed
#!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from geopy.exc import GeocoderTimedOut # raised when a geocoding request times out

import requests # library to handle requests

#!pip install BeautifulSoup4
from bs4 import BeautifulSoup # for web scraping

# transform JSON into a pandas DataFrame (pandas.io.json.json_normalize is deprecated since pandas 1.0)
from pandas import json_normalize

# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

# import seaborn to make pretty plots
import seaborn as sns

# import k-means for the clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium package is not installed
#!pip install folium
import folium # map-rendering library

print('Libraries imported.')
```

Libraries imported.


# Example of the Dataset of Rio de Janeiro Neighborhoods and Geographic Coordinates

```python
# Import the Rio de Janeiro neighborhoods DataFrame, which already includes coordinates
rj_data = pd.read_csv('rio-rj-neighborhoods-geo.csv', header=0)
#rj_data.dropna(subset=['Latitude'], inplace=True)
#rj_data.reset_index(drop=True, inplace=True)
rj_data.head()
```

| Unnamed: 0 | City | Zone | District | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | Rio de Janeiro | Central | Centro Histórico e Zona Portuária | São Cristóvão | -22.899318 | -43.221935 |
| 1 | Rio de Janeiro | Central | Centro Histórico e Zona Portuária | Benfica | -22.892297 | -43.240341 |
| 2 | Rio de Janeiro | Central | Centro Histórico e Zona Portuária | Caju | -22.880306 | -43.221494 |
| 3 | Rio de Janeiro | Central | Centro Histórico e Zona Portuária | Catumbi | -22.919454 | -43.197081 |
| 4 | Rio de Janeiro | Central | Centro Histórico e Zona Portuária | Centro | -22.904393 | -43.183065 |


# Example of the Dataset of Rio de Janeiro COVID-19 Cases

```python
# Import the COVID-19 case records (neighborhood of residence and case evolution)
rj_covid = pd.read_csv('rj_covid.csv', header=0)
rj_covid.head()
```

| Unnamed: 0 | Neighborhood | evolution |
|---|---|---|
| 0 | ABOLICAO | OBITO |
| 1 | ABOLICAO | OBITO |
| 2 | ABOLICAO | OBITO |
| 3 | ABOLICAO | OBITO |
| 4 | ABOLICAO | OBITO |
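
Because `rj_covid` appears to hold one row per case, with `OBITO` (Portuguese for "death") in the `evolution` column, the per-neighborhood case and death counts can be sketched with a `groupby`. The toy records below are made up. The value `OTHER` is a hypothetical placeholder for any non-fatal evolution, and the one-row-per-case interpretation is an assumption.

```python
import pandas as pd

# Toy case-level records shaped like rj_covid (one row per case is assumed);
# "OTHER" is a hypothetical placeholder for any non-fatal evolution value.
covid_records = pd.DataFrame(
    {
        "Neighborhood": ["ABOLICAO", "ABOLICAO", "ABOLICAO", "MEIER", "MEIER"],
        "evolution": ["OBITO", "OTHER", "OTHER", "OBITO", "OBITO"],
    }
)

# Count every record per neighborhood as a case, and the OBITO records as deaths
summary = (
    covid_records.assign(death=covid_records["evolution"].eq("OBITO"))
    .groupby("Neighborhood")
    .agg(Cases=("evolution", "size"), Deaths=("death", "sum"))
    .reset_index()
)
print(summary)
```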


# Example of the Dataset of Rio de Janeiro Population per Neighborhood

```python
# Import the population of each neighborhood
rj_pop = pd.read_csv('rj_pop.csv', header=0)
rj_pop.head()
```

| Unnamed: 0 | Neighborhood | Population |
|---|---|---|
| 0 | COPACABANA | 161031 |
| 1 | LEBLON | 50648 |
| 2 | IPANEMA | 47017 |
| 3 | FLAMENGO | 55047 |
| 4 | MEIER | 54811 |
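
The neighborhood names differ across the three datasets. `rj_data` uses accented, mixed-case names such as "São Cristóvão", while `rj_covid` and `rj_pop` use upper-case names without accents such as "ABOLICAO" and "MEIER". A minimal sketch of reconciling them before merging and computing rates per 100,000 inhabitants follows. The helper `normalize_name`, the `Key` column and all values in the toy frames are hypothetical, and real names may need manual fixes that accent stripping alone does not cover.

```python
import unicodedata

import pandas as pd


def normalize_name(name):
    """Upper-case a neighborhood name and strip accents, e.g. 'Méier' -> 'MEIER'."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.upper().strip()


# Toy inputs shaped like rj_data, rj_pop and a per-neighborhood case summary (made-up values)
geo = pd.DataFrame({"Neighborhood": ["Méier", "São Cristóvão"],
                    "Latitude": [-22.90, -22.89], "Longitude": [-43.28, -43.22]})
pop = pd.DataFrame({"Neighborhood": ["MEIER", "SAO CRISTOVAO"], "Population": [50000, 25000]})
cases = pd.DataFrame({"Neighborhood": ["MEIER", "SAO CRISTOVAO"], "Cases": [1000, 300], "Deaths": [50, 15]})

# Join on the normalized name so accented and unaccented spellings line up
geo["Key"] = geo["Neighborhood"].map(normalize_name)
merged = (
    geo.merge(pop.rename(columns={"Neighborhood": "Key"}), on="Key", how="left")
    .merge(cases.rename(columns={"Neighborhood": "Key"}), on="Key", how="left")
)

# Express cases and deaths per 100,000 inhabitants so neighborhoods of different size are comparable
merged["cases_per_100k"] = merged["Cases"] / merged["Population"] * 100_000
merged["deaths_per_100k"] = merged["Deaths"] / merged["Population"] * 100_000
print(merged[["Neighborhood", "cases_per_100k", "deaths_per_100k"]])
```

After the left merges, any row left with a missing `Population` or `Cases` points to a name the normalization did not reconcile, such as a spelling variant. Such rows are worth checking by hand before clustering.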
