# Recommender Systems - Coronary Heart Disease

The goal is to analyze a problem involving coronary heart disease and to build predictions for it.

Researchers collected a range of information about patients admitted to hospitals in Cleveland, USA, who reported chest pain. From the information collected, you are given a dataset with 14 columns (13 features and 1 target variable) covering 303 patients.

Original data source:
https://www.kaggle.com/ronitf/heart-disease-uci


## Importing the required packages and configuration

In [3]:
from pathlib import Path
import progressbar as pb
import warnings
warnings.filterwarnings('ignore')  # suppress all warnings


import pandas as pd
# Limit the number of displayed rows and show floats with three decimals.
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.3f}'.format
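
Setting `display.max_rows = 10` globally means any output with more than ten rows is shortened with a `...` row. The sketch below shows one way to lift the limit for a single output using `pd.option_context`, without changing the global setting. The 14-row frame `df` is a hypothetical stand-in used only for illustration.

```python
import pandas as pd

pd.options.display.max_rows = 10  # same global setting as in the cell above

# Hypothetical 14-row frame, used only to trigger the truncation.
df = pd.DataFrame({"x": range(14)})

# Inside the context manager the row limit is removed.
with pd.option_context("display.max_rows", None):
    full_text = repr(df)   # all 14 rows are rendered

# Outside the context manager the global limit applies again.
short_text = repr(df)      # truncated, since 14 > 10
```

In the notebook, the same pattern can wrap a `display(...)` call whenever a complete table is needed.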

## Downloading the dataset

The dataset is available on the [Kaggle](https://www.kaggle.com/) platform (if you are not familiar with it, I strongly suggest reading about it). For programmatic access to the dataset we need a username and an API key for authentication; the [Kaggle API documentation](https://www.kaggle.com/docs/api) explains how to obtain your key.

Replace `KAGGLE_USERNAME` and `KAGGLE_KEY` below with the values obtained from your [Kaggle account settings](https://www.kaggle.com/ricoms/account).

In [4]:
%env KAGGLE_USERNAME = vhmgomide
%env KAGGLE_KEY = cb02d4c3b325abd25df7e43c9bf23347

!kaggle datasets download -d ronitf/heart-disease-uci --unzip -p /content/heart-disease-uci
!ls /content/heart-disease-uci

env: KAGGLE_USERNAME=vhmgomide
env: KAGGLE_KEY=cb02d4c3b325abd25df7e43c9bf23347
Downloading heart-disease-uci.zip to /content/heart-disease-uci
  0% 0.00/3.40k [00:00<?, ?B/s]
100% 3.40k/3.40k [00:00<00:00, 6.35MB/s]
heart.csv
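
As an alternative to typing the credentials into a cell, they can be read from the `kaggle.json` token file that Kaggle provides for download. The sketch below assumes the file holds the two fields `username` and `key`; the helper name `load_kaggle_credentials` and the file location are hypothetical.

```python
import json
import os
from pathlib import Path


def load_kaggle_credentials(path):
    """Read a kaggle.json token file and export its values as environment variables."""
    creds = json.loads(Path(path).read_text())
    # Assumption: the token file stores the fields "username" and "key".
    os.environ["KAGGLE_USERNAME"] = creds["username"]
    os.environ["KAGGLE_KEY"] = creds["key"]
    return creds["username"]


# Example call (the path is an assumption; adjust to where the file was saved):
# load_kaggle_credentials(Path.home() / ".kaggle" / "kaggle.json")
```

This keeps the key out of the notebook itself, which matters when the notebook is shared or published.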


## Loading the data

Read the CSV file into a DataFrame.

In [5]:
DATA_PATH = Path("/content/heart-disease-uci")

# Load the data 
heart_df = pd.read_csv(DATA_PATH / 'heart.csv')

## A brief look at the data

This exploration is deliberately brief and superficial; that is enough for our purposes.

In [6]:
display(heart_df.head().style.set_caption("Sample of heart.csv"))
print(f"heart.csv shape is {heart_df.shape}")

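| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |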


heart.csv shape is (303, 14)
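
A quick structural check right after loading can catch a wrong file or a partial download. The sketch below is one possible check, not part of the original notebook: the helper `check_heart_df` is hypothetical, the column list is copied from the header shown above, and the two demo rows are the first two rows of that output.

```python
import pandas as pd

# Column names as displayed in the head() output above.
EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]


def check_heart_df(df):
    """Return a list of problems found in the DataFrame (empty if none)."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    n_null = int(df.isna().sum().sum())
    if n_null:
        problems.append(f"{n_null} missing values")
    return problems


# Demo on the first two rows of the output above.
sample = pd.DataFrame(
    [
        [63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1, 1],
        [37, 1, 2, 130, 250, 0, 1, 187, 0, 3.5, 0, 0, 2, 1],
    ],
    columns=EXPECTED_COLUMNS,
)
print(check_heart_df(sample))
```

In the notebook, the call would be `check_heart_df(heart_df)`.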


In [7]:
heart_df.describe()

| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 |
| mean | 54.366 | 0.683 | 0.967 | 131.624 | 246.264 | 0.149 | 0.528 | 149.647 | 0.327 | 1.04 | 1.399 | 0.729 | 2.314 | 0.545 |
| std | 9.082 | 0.466 | 1.032 | 17.538 | 51.831 | 0.356 | 0.526 | 22.905 | 0.47 | 1.161 | 0.616 | 1.023 | 0.612 | 0.499 |
| min | 29.0 | 0.0 | 0.0 | 94.0 | 126.0 | 0.0 | 0.0 | 71.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 47.5 | 0.0 | 0.0 | 120.0 | 211.0 | 0.0 | 0.0 | 133.5 | 0.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 |
| 50% | 55.0 | 1.0 | 1.0 | 130.0 | 240.0 | 0.0 | 1.0 | 153.0 | 0.0 | 0.8 | 1.0 | 0.0 | 2.0 | 1.0 |
| 75% | 61.0 | 1.0 | 2.0 | 140.0 | 274.5 | 0.0 | 1.0 | 166.0 | 1.0 | 1.6 | 2.0 | 1.0 | 3.0 | 1.0 |
| max | 77.0 | 1.0 | 3.0 | 200.0 | 564.0 | 1.0 | 2.0 | 202.0 | 1.0 | 6.2 | 2.0 | 4.0 | 3.0 | 1.0 |


In [9]:
heart_df['slope'].unique()

array([0, 2, 1])
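
Checking `unique()` one column at a time works, but the same question can be asked of every column at once. The sketch below lists the columns with few distinct values, which are likely categorical codes rather than continuous measurements. The helper name `low_cardinality_columns` and the threshold `max_unique=5` are assumptions chosen for illustration.

```python
import pandas as pd


def low_cardinality_columns(df, max_unique=5):
    """Return the number of distinct values for columns with at most `max_unique` of them."""
    counts = df.nunique()
    # Keep only the columns at or below the threshold, smallest count first.
    return counts[counts <= max_unique].sort_values()


# In the notebook: low_cardinality_columns(heart_df)
```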

In [10]:
heart_df.corr()

| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| age | 1.000 | -0.098 | -0.069 | 0.279 | 0.214 | 0.121 | -0.116 | -0.399 | 0.097 | 0.210 | -0.169 | 0.276 | 0.068 | -0.225 |
| sex | -0.098 | 1.000 | -0.049 | -0.057 | -0.198 | 0.045 | -0.058 | -0.044 | 0.142 | 0.096 | -0.031 | 0.118 | 0.210 | -0.281 |
| cp | -0.069 | -0.049 | 1.000 | 0.048 | -0.077 | 0.094 | 0.044 | 0.296 | -0.394 | -0.149 | 0.120 | -0.181 | -0.162 | 0.434 |
| trestbps | 0.279 | -0.057 | 0.048 | 1.000 | 0.123 | 0.178 | -0.114 | -0.047 | 0.068 | 0.193 | -0.121 | 0.101 | 0.062 | -0.145 |
| chol | 0.214 | -0.198 | -0.077 | 0.123 | 1.000 | 0.013 | -0.151 | -0.010 | 0.067 | 0.054 | -0.004 | 0.071 | 0.099 | -0.085 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| oldpeak | 0.210 | 0.096 | -0.149 | 0.193 | 0.054 | 0.006 | -0.059 | -0.344 | 0.288 | 1.000 | -0.578 | 0.223 | 0.210 | -0.431 |
| slope | -0.169 | -0.031 | 0.120 | -0.121 | -0.004 | -0.060 | 0.093 | 0.387 | -0.258 | -0.578 | 1.000 | -0.080 | -0.105 | 0.346 |
| ca | 0.276 | 0.118 | -0.181 | 0.101 | 0.071 | 0.138 | -0.072 | -0.213 | 0.116 | 0.223 | -0.080 | 1.000 | 0.152 | -0.392 |
| thal | 0.068 | 0.210 | -0.162 | 0.062 | 0.099 | -0.032 | -0.012 | -0.096 | 0.207 | 0.210 | -0.105 | 0.152 | 1.000 | -0.344 |
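
For a prediction task, the `target` column of the correlation matrix is usually the most interesting part. The sketch below ranks features by the absolute value of their correlation with `target`, keeping the sign. The values are copied from the last column of the table above; the rows hidden behind `...` are not included, and the helper name `rank_by_strength` is hypothetical.

```python
import pandas as pd

# Correlation of each feature with `target`, copied from the table above.
corr_with_target = pd.Series({
    "age": -0.225, "sex": -0.281, "cp": 0.434, "trestbps": -0.145,
    "chol": -0.085, "oldpeak": -0.431, "slope": 0.346, "ca": -0.392,
    "thal": -0.344,
})


def rank_by_strength(corr):
    """Order features by absolute correlation, strongest first, keeping the sign."""
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


ranked = rank_by_strength(corr_with_target)
print(ranked)
```

In the notebook, the full column (including the hidden rows) would come from `heart_df.corr()["target"].drop("target")`. Pearson correlation only captures linear association, and several of these columns are categorical codes, so the ranking is a rough first look rather than a feature-importance measure.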
