### Importing Pandas

*   Library for loading, inspecting, and manipulating the dataset.


In [1]:
import pandas as pd

### Importing the Boston Housing Dataset

The Boston Housing dataset contains U.S. Census data on housing in the Boston area. It includes features such as crime rate, number of rooms, and proximity to industrial centers. Our goal is to use these features to predict house prices (the target is the median home value, in thousands of dollars).

In [3]:
from sklearn.datasets import load_boston

boston = load_boston()  # Load the dataset (load_boston was removed in scikit-learn 1.2, so this needs an older version)

**Setting Up the DataFrame**

In [4]:
df = pd.DataFrame(boston.data, columns=boston.feature_names)

df['target'] = boston.target

df.head()

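|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |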


In [5]:
target = df.pop('target')

**Splitting into Training and Test Sets**

In [6]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df, target, train_size=0.8, test_size=0.2, random_state=0
)
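To make the 80/20 split concrete, here is a minimal sketch on a hypothetical 10-row toy dataset (the names `toy_X`, `toy_y`, and the values are assumptions for illustration, not the Boston data). It shows the resulting shapes and that features and targets stay aligned through their shared index.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 rows, 2 features (not the Boston dataset)
toy_X = pd.DataFrame({"a": np.arange(10), "b": np.arange(10) * 2.0})
toy_y = pd.Series(np.arange(10) * 3.0, name="target")

X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=0
)

# 80% of the rows go to training, 20% to testing
print(X_tr.shape, X_te.shape)

# Rows of X and y are shuffled together, so the indices still match
print(X_tr.index.equals(y_tr.index))
```

Fixing `random_state` makes the shuffle reproducible, so repeated runs produce the same split.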

### Gradient Boosting

Now let's try to predict house prices using a Gradient Boosting regressor.
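As a rough intuition for what a Gradient Boosting regressor does, the sketch below builds one by hand for squared error: each shallow tree is fitted to the residuals of the ensemble so far, and its prediction is added with a small learning rate. This is a simplified illustration on synthetic data (an assumption), not scikit-learn's actual implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression problem (an assumption; not the Boston dataset)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_trees = 100

# Start from a constant prediction (the mean), then add small corrections
prediction = np.full_like(y, y.mean())
initial_mse = np.mean((y - prediction) ** 2)

trees = []
for _ in range(n_trees):
    # For squared error, the residuals are the negative gradient
    # (up to a constant factor)
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

final_mse = np.mean((y - prediction) ** 2)
print(f"training MSE: {initial_mse:.3f} -> {final_mse:.3f}")
```

The three knobs in this loop correspond to the `n_estimators`, `learning_rate`, and `max_depth` parameters of `GradientBoostingRegressor`.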

**Importing and Creating the Model**

In [7]:
from sklearn.ensemble import GradientBoostingRegressor

In [8]:
# Create a Gradient Boosting regressor with 100 decision trees of maximum depth 3.
gradr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)

In [9]:
# Train the model on the training set
gradr.fit(X_train, y_train)

GradientBoostingRegressor(alpha=0.9, ccp_alpha=0.0, criterion='friedman_mse',
                          init=None, learning_rate=0.1, loss='ls', max_depth=3,
                          max_features=None, max_leaf_nodes=None,
                          min_impurity_decrease=0.0, min_impurity_split=None,
                          min_samples_leaf=1, min_samples_split=2,
                          min_weight_fraction_leaf=0.0, n_estimators=100,
                          n_iter_no_change=None, presort='deprecated',
                          random_state=42, subsample=1.0, tol=0.0001,
                          validation_fraction=0.1, verbose=0, warm_start=False)

**Evaluating the Model**

In [11]:
from sklearn.model_selection import cross_val_score

In [12]:
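# Mean absolute error from 10-fold cross-validation on the test set.
# Note: cross_val_score refits clones of the estimator on each fold,
# so this does not score the model fitted above.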
score = -1*cross_val_score(gradr, X_test, y_test, cv = 10, scoring = 'neg_mean_absolute_error').mean()

score

3.062012848541953
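Because `cross_val_score` refits the estimator on each fold, an alternative is to score the already-fitted model directly on held-out rows with `mean_absolute_error`. The sketch below shows that pattern on synthetic stand-in data from `make_regression` (an assumption, so the number it prints is not comparable to the score above).

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not the Boston dataset)
X_demo, y_demo = make_regression(
    n_samples=300, n_features=5, noise=10.0, random_state=0
)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

model = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42
)
model.fit(Xd_train, yd_train)

# MAE of the fitted model on rows it never saw during training
holdout_mae = mean_absolute_error(yd_test, model.predict(Xd_test))
print(f"hold-out MAE: {holdout_mae:.2f}")
```

With the notebook's own variables, the equivalent call would be `mean_absolute_error(y_test, gradr.predict(X_test))`.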

**Comparing Our Predictions with the Actual Price**

In [13]:
# Generate the predictions
gradr_preds = gradr.predict(X_test)

# Build a DataFrame comparing the actual value ('Valor Real') with our predictions ('Predição')
gradr_comparison = pd.DataFrame()
gradr_comparison['Valor Real'] = y_test
gradr_comparison['Predição'] = gradr_preds

gradr_comparison.head(10)

|   | Valor Real | Predição |
|---|---|---|
| 329 | 22.6 | 24.509386 |
| 371 | 50.0 | 31.991749 |
| 219 | 23.0 | 23.695919 |
| 403 | 8.3 | 10.670755 |
| 78 | 21.2 | 22.330107 |
| 15 | 19.9 | 20.626791 |
| 487 | 20.6 | 20.828585 |
| 340 | 18.7 | 20.720449 |
| 310 | 16.1 | 23.422303 |
| 102 | 18.6 | 18.567367 |


### Random Forest

Now let's try the same prediction with a bagging model: Random Forest.
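To illustrate the bagging idea behind Random Forest, the sketch below trains each tree on a bootstrap sample (rows drawn with replacement) and averages the trees' predictions. It uses synthetic data (an assumption) and omits the random feature subsetting that a Random Forest can apply at each split, so it is a simplified illustration rather than the library's implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem (an assumption; not the Boston dataset)
rng = np.random.default_rng(0)
X_bag = rng.uniform(-3, 3, size=(200, 2))
y_bag = X_bag[:, 0] ** 2 + X_bag[:, 1] + rng.normal(scale=0.3, size=200)

n_trees = 25
bagged_trees = []
for i in range(n_trees):
    # Bootstrap sample: draw as many rows as the dataset has, with replacement
    idx = rng.integers(0, len(X_bag), size=len(X_bag))
    tree = DecisionTreeRegressor(random_state=i)
    tree.fit(X_bag[idx], y_bag[idx])
    bagged_trees.append(tree)

# The ensemble prediction is the average over all trees
all_preds = np.array([t.predict(X_bag) for t in bagged_trees])
bagged_pred = all_preds.mean(axis=0)

ensemble_mse = np.mean((y_bag - bagged_pred) ** 2)
mean_tree_mse = np.mean((y_bag - all_preds) ** 2)
print(f"average single-tree MSE: {mean_tree_mse:.3f}")
print(f"bagged ensemble MSE:     {ensemble_mse:.3f}")
```

Unlike boosting, the trees here are independent of one another, so they could be trained in parallel.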

**Importing and Creating the Model**

In [14]:
from sklearn.ensemble import RandomForestRegressor

In [15]:
# Create a Random Forest regressor with 200 decision trees.
rfr = RandomForestRegressor(n_estimators = 200, random_state = 42)

In [16]:
# Train the model on the training set
rfr.fit(X_train, y_train)

RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
                      max_depth=None, max_features='auto', max_leaf_nodes=None,
                      max_samples=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      n_estimators=200, n_jobs=None, oob_score=False,
                      random_state=42, verbose=0, warm_start=False)

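**Evaluating the Model**

In [17]:
# Mean absolute error from 10-fold cross-validation on the test set.
# Note: cross_val_score refits clones of the estimator on each fold,
# so this does not score the model fitted above.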
score = -1*cross_val_score(rfr, X_test, y_test, cv = 10, scoring = 'neg_mean_absolute_error').mean()

score

3.164898181818181

**Comparing Our Predictions with the Actual Price**

In [18]:
# Generate the predictions
rfr_preds = rfr.predict(X_test)

# Build a DataFrame comparing the actual value ('Valor Real') with our predictions ('Predição')
rfr_comparison = pd.DataFrame()
rfr_comparison['Valor Real'] = y_test
rfr_comparison['Predição'] = rfr_preds

rfr_comparison.head(10)

|   | Valor Real | Predição |
|---|---|---|
| 329 | 22.6 | 24.0715 |
| 371 | 50.0 | 27.7795 |
| 219 | 23.0 | 22.061 |
| 403 | 8.3 | 11.1035 |
| 78 | 21.2 | 20.783 |
| 15 | 19.9 | 20.646 |
| 487 | 20.6 | 21.347 |
| 340 | 18.7 | 20.015 |
| 310 | 16.1 | 20.4115 |
| 102 | 18.6 | 18.928 |
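The two comparison tables can also be viewed side by side. The sketch below re-enters the ten rows printed in this notebook and adds each model's absolute error per row; the column names `GB`, `RF`, and the `abs error` columns are hypothetical labels chosen for this illustration.

```python
import pandas as pd

# The ten test rows printed in the two comparison tables of this notebook
comparison = pd.DataFrame(
    {
        "Valor Real": [22.6, 50.0, 23.0, 8.3, 21.2, 19.9, 20.6, 18.7, 16.1, 18.6],
        "GB": [24.509386, 31.991749, 23.695919, 10.670755, 22.330107,
               20.626791, 20.828585, 20.720449, 23.422303, 18.567367],
        "RF": [24.0715, 27.7795, 22.061, 11.1035, 20.783,
               20.646, 21.347, 20.015, 20.4115, 18.928],
    },
    index=[329, 371, 219, 403, 78, 15, 487, 340, 310, 102],
)

# Absolute error of each model on each row
comparison["GB abs error"] = (comparison["GB"] - comparison["Valor Real"]).abs()
comparison["RF abs error"] = (comparison["RF"] - comparison["Valor Real"]).abs()

print(comparison.round(2))

# Row where each model misses by the most
print(comparison[["GB abs error", "RF abs error"]].idxmax())
```

In these ten rows, both models miss by the most on row 371, whose actual value is 50.0. Ten rows are too few to rank the models, so this view is for inspection only.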
