    In this repo, I applied Decision Tree Regressor, Random Forest, Gradient Boosting, and Ensemble Learning algorithms and compared their ability to predict the total number of bike rentals in the bike sharing systems dataset.
    
    Bike sharing systems are a means of renting bicycles where the process of obtaining membership, rental, and bike return is automated via a network of kiosk locations throughout a city. Using these systems, people are able to rent a bike from one location and return it to a different place on an as-needed basis. Currently, there are over 500 bike-sharing programs around the world.
    
    I evaluated these models by Mean Squared Error (MSE) and Root Mean Squared Error (RMSE).
    
    I also searched for the best hyperparameters for these models with grid search (`GridSearchCV`).
    
    Source of the dataset is: https://www.kaggle.com/c/bike-sharing-demand/data

#### IMPORTING LIBRARIES

In [1]:
import pandas as pd
import numpy as np

#### GETTING TO KNOW OUR DATASET

In [2]:
bike = pd.read_csv("C:\\Users\\talfi\\python\\ML\\datasets\\bike\\train.csv")
print("Shape of df is: ", bike.shape)
print("Does this data contain NA values?", bike.isnull().values.any())
bike.head()

Shape of df is:  (10886, 12)
Does this data contain NA values? False


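|   | datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | casual | registered | count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-01 00:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 81 | 0.0 | 3 | 13 | 16 |
| 1 | 2011-01-01 01:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 8 | 32 | 40 |
| 2 | 2011-01-01 02:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 5 | 27 | 32 |
| 3 | 2011-01-01 03:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 3 | 10 | 13 |
| 4 | 2011-01-01 04:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 0 | 1 | 1 |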


In [29]:
bike.dtypes

datetime       object
season          int64
holiday         int64
workingday      int64
weather         int64
temp          float64
atemp         float64
humidity        int64
windspeed     float64
casual          int64
registered      int64
count           int64
dtype: object

In [3]:
datetime = bike.pop("datetime")
bike.dtypes

season          int64
holiday         int64
workingday      int64
weather         int64
temp          float64
atemp         float64
humidity        int64
windspeed     float64
casual          int64
registered      int64
count           int64
dtype: object

#### SPLITTING DATA INTO TRAINING AND TEST SETS

In [4]:
from sklearn.model_selection import train_test_split
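# Note: `bike` is used as the feature matrix as-is, so it still contains `count` (the target)
# and `casual` + `registered`, which sum to `count`. The scores below reflect this leakage.
X_train, X_test, y_train, y_test = train_test_split(bike, bike["count"],
                                                   test_size = 0.2,
                                                   random_state = 3)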
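
In the rows printed earlier, `count` is always the sum of `casual` and `registered`, so keeping those columns among the features lets a model read the answer directly. Below is a minimal sketch of a leakage-free split, assuming the goal is to predict `count` from the weather and calendar columns only. The small `bike_demo` frame (rebuilt from the five rows shown in the `head()` output) and the names `leak_cols`, `X_train_demo`, etc. are illustrative and not part of the original notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Five rows copied from the head() output (datetime already removed).
bike_demo = pd.DataFrame({
    "season":     [1, 1, 1, 1, 1],
    "holiday":    [0, 0, 0, 0, 0],
    "workingday": [0, 0, 0, 0, 0],
    "weather":    [1, 1, 1, 1, 1],
    "temp":       [9.84, 9.02, 9.02, 9.84, 9.84],
    "atemp":      [14.395, 13.635, 13.635, 14.395, 14.395],
    "humidity":   [81, 80, 80, 75, 75],
    "windspeed":  [0.0, 0.0, 0.0, 0.0, 0.0],
    "casual":     [3, 8, 5, 3, 0],
    "registered": [13, 32, 27, 10, 1],
    "count":      [16, 40, 32, 13, 1],
})

# In these rows the target is fully determined by two of the columns.
leak_free = (bike_demo["casual"] + bike_demo["registered"] == bike_demo["count"]).all()
print("count == casual + registered:", leak_free)

# Assumption: drop the target and its two components from the features.
leak_cols = ["casual", "registered", "count"]
X = bike_demo.drop(columns=leak_cols)
y = bike_demo["count"]

X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X, y, test_size=0.2, random_state=3
)
print(list(X.columns))
```

Scores obtained with a split like this would not be comparable to the ones reported in this notebook, which were computed with `count` among the features.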

#### DECISION TREE REGRESSOR


Decision Tree Regressor is the first ML model I evaluated. It uses a decision tree to go from observations about an item (its feature values) to a conclusion about the item's target value (here, the rental count).

 <font color=blue> **HYPERPARAMETER TUNING**
</font> <br>
Let's tune the hyperparameters to the values that give the best performance.<br>
Let's do this for Decision Tree Regressor.
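
As a rough sketch of what the grid search in the next cell enumerates, the snippet below builds every combination of the Decision Tree grid with `itertools.product` instead of scikit-learn. The names `candidates` and `n_folds` are illustrative only.

```python
from itertools import product

# Same grid as in the Decision Tree cell of this notebook.
params = {"random_state": [0, 2, 4, 6, 8],
          "max_depth": [4, 5, 6, 7, 8],
          "min_samples_leaf": [0.001, 0.01, 0.1, 1]}

# Every combination of one value per hyperparameter is one candidate.
names = list(params)
candidates = [dict(zip(names, values)) for values in product(*params.values())]

n_folds = 3  # cv = 3 in the GridSearchCV call
print(len(candidates))            # 100 candidates (5 * 5 * 4)
print(len(candidates) * n_folds)  # 300 model fits
```

These counts match the "100 candidates, totalling 300 fits" message that `GridSearchCV` prints for this grid.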

In [46]:
from sklearn.tree import DecisionTreeRegressor
# Let's do some hyperparameter tuning.
from sklearn.model_selection import GridSearchCV
params = {"random_state":[0,2,4,6,8],
          "max_depth":[4,5,6,7,8],
          "min_samples_leaf":[0.001, 0.01, 0.1, 1]}
grid = GridSearchCV(estimator = DecisionTreeRegressor(),
                    param_grid = params,
                    cv = 3,
                    scoring = "neg_mean_squared_error",
                   verbose = 1, n_jobs = -1)
grid.fit(X_train, y_train)
best_hyperparameters = grid.best_params_
print("Best Hyperparameters for Decision Tree Regressor are: ", best_hyperparameters)

Fitting 3 folds for each of 100 candidates, totalling 300 fits
Best Hyperparameters for Decision Tree Regressor are:  {'max_depth': 8, 'min_samples_leaf': 1, 'random_state': 0}


Let's set the hyperparameters accordingly.

In [47]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error as MSE
dt = DecisionTreeRegressor(max_depth = 8, min_samples_leaf = 1, random_state = 0)
dt.fit(X_train, y_train)
y_pred = dt.predict(X_test)

#### MEAN SQUARED ERROR AND ROOT MEAN SQUARED ERROR

**Mean Squared Error (MSE)** is an evaluation metric. It is the average of the squared differences between the predicted and the actual values, so it tells you how close the model's predictions are to the data points. It takes values between 0 and infinity; a lower MSE means the predictions are more tightly concentrated around the actual values. <br>

**Root Mean Squared Error (RMSE)** is also an evaluation metric. It is the square root of the Mean Squared Error, which puts the error back in the same units as the target (here, number of rentals). It also takes values between 0 and infinity; a lower RMSE means a better fit.
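
To make the two definitions concrete, here is a small hand computation in plain Python. The actual values are the first five `count` values of the dataset; the predictions in `y_hat` are made-up numbers used only for illustration.

```python
import math

# Actual counts: first five `count` values of the dataset.
# Predictions: hypothetical numbers, for illustration only.
y_true = [16, 40, 32, 13, 1]
y_hat = [14, 43, 32, 10, 2]

# Square each error, average them (MSE), then take the square root (RMSE).
squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_hat)]
mse = sum(squared_errors) / len(squared_errors)
rmse = math.sqrt(mse)

print(squared_errors)  # [4, 9, 0, 9, 1]
print(mse)             # 4.6
print(rmse)
```

The RMSE here is a little over 2, which reads directly as "off by about two rentals per hour on average", while the MSE is in squared rentals.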

In [48]:
mse_core = MSE(y_test, y_pred)
print("Base MSE Score is: ", mse_core)
rmse_core = mse_core **(1/2)
print("Base RMSE Score is: ",rmse_core)

Base MSE Score is:  5.236190247392217
Base RMSE Score is:  2.288272328065918


#### CROSS VALIDATION SCORE AND OVERFITTING / UNDERFITTING CHECK

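**Cross-validation** is a statistical method used to estimate the skill of machine learning models on data they were not trained on.<br>

**K-fold cross validation** splits the training data into K folds, trains the model on K-1 of them, evaluates it on the remaining fold, and repeats this K times so that every fold is used for validation once.<br>

In this repo, we are going to use 10-fold CV (K=10) to get a more reliable estimate of our model's error.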
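
The splitting step can be sketched in a few lines of plain Python. This is a simplified illustration with contiguous, unshuffled folds; the helper name `k_fold_indices` is hypothetical and is not what `cross_val_score` calls internally.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(n_samples=20, k=10))
for train_idx, val_idx in folds[:3]:
    print("validate on", val_idx, "train on", len(train_idx), "samples")
```

Each sample lands in exactly one validation fold, so averaging the K validation errors uses every training row once for evaluation.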

In [49]:
from sklearn.model_selection import cross_val_score
# cross_val_score has no plain "MSE" scorer: it returns negative MSE
# ("neg_mean_squared_error"), so we negate the result to get positive values.
MSE_CV = -cross_val_score(dt, X_train, y_train,
                         cv=10, scoring = "neg_mean_squared_error",
                         n_jobs=-1)  # use all available CPU cores

**Overfitting** is a modeling error that occurs when a function is too closely fit to a limited set of data points. A model overfits the data when it performs successfully on the training data and weakly on the test data.<br>

**Underfitting** occurs when a model performs weakly not only on the training data but also on the test data. <br>

 In our case, 
 * If CV MSE is noticeably bigger than the training set MSE, our model overfits the data.
 * If CV MSE is roughly equal to the training set MSE, but much bigger than the Base MSE, our model underfits the data.
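
The two rules of thumb can be written as a small helper. This is only a sketch: the function name `diagnose_fit`, the `reference_mse` argument (standing in for the Base MSE), and the `rel_gap` tolerance that decides what counts as "roughly equal" are all assumptions made for illustration.

```python
def diagnose_fit(cv_mse, train_mse, reference_mse, rel_gap=0.5):
    """Apply the two rules of thumb to a set of MSE values.

    rel_gap is a hypothetical tolerance: CV MSE counts as "roughly equal"
    to the training MSE when it exceeds it by at most this fraction.
    """
    if cv_mse > train_mse * (1 + rel_gap):
        return "overfitting"   # much worse on held-out folds than on training data
    if cv_mse > reference_mse:
        return "underfitting"  # similar on both sets, but both errors are too high
    return "ok"

# Made-up numbers, one per branch.
print(diagnose_fit(cv_mse=10.0, train_mse=1.0, reference_mse=5.0))  # overfitting
print(diagnose_fit(cv_mse=10.0, train_mse=9.5, reference_mse=5.0))  # underfitting
print(diagnose_fit(cv_mse=1.0, train_mse=0.9, reference_mse=5.0))   # ok
```

The verdict depends heavily on the chosen tolerance, so it is best treated as a prompt to look closer rather than as a definitive test.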

In [50]:
dt.fit(X_train, y_train)
# Predict the labels of the training set
y_predict_train = dt.predict(X_train)
# Predict the labels of test set
y_predict_test = dt.predict(X_test)
# CV MSE
print("CV MSE: {:.2f}".format(MSE_CV.mean()))
# Training set MSE
print("Training Set MSE: ",MSE(y_train, y_predict_train))

CV MSE: 0.99
Training Set MSE:  0.6068964790430603


* As CV MSE is fairly close to the Training Set MSE (a difference of about 0.38), we cannot diagnose overfitting.
* Moreover, because CV MSE and Training Set MSE are smaller than the Base MSE (5.24, i.e. an RMSE of 2.29), we also cannot diagnose underfitting.
* Hence, everything looks just fine.


#### RANDOM FOREST REGRESSOR
 **Random Forest Regressor** takes the average of many decision trees. Each tree is weaker than a full decision tree, but by combining them, we get better overall performance.
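
The averaging can be checked directly on a fitted forest. The sketch below uses small synthetic data (not the bike dataset) and compares a manual average over the individual trees in `estimators_` with the forest's own `predict`; the variable names are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 3))
y_demo = X_demo[:, 0] * 2 + np.sin(X_demo[:, 1]) + rng.normal(0, 0.1, size=200)

forest = RandomForestRegressor(n_estimators=20, max_depth=4, random_state=0)
forest.fit(X_demo, y_demo)

# Average the predictions of the individual fitted trees by hand.
per_tree = np.array([tree.predict(X_demo) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

# Compare with what the forest itself predicts.
print(np.allclose(manual_average, forest.predict(X_demo)))
```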

 <font color=blue> **HYPERPARAMETER TUNING**
</font> <br>
Let's tune the hyperparameters to the values that give the best performance.<br>
Let's do this for Random Forest Regressor.

In [51]:
from sklearn.ensemble import RandomForestRegressor
# Let's do some hyperparameter tuning.
from sklearn.model_selection import GridSearchCV
params = {"random_state":[0,2,4,6,8],
          "max_depth":[4,5,6,7,8],
          "n_estimators":[100,200,300,400,500]}
grid = GridSearchCV(estimator = RandomForestRegressor(),
                    param_grid = params,
                    cv = 3,
                    scoring = "neg_mean_squared_error",
                   verbose = 1, n_jobs = -1)
grid.fit(X_train, y_train)
best_hyperparameters = grid.best_params_
print("Best Hyperparameters for Random Forest Regressor are: ", best_hyperparameters)

Fitting 3 folds for each of 125 candidates, totalling 375 fits
Best Hyperparameters for Random Forest Regressor are:  {'max_depth': 8, 'n_estimators': 400, 'random_state': 0}


Let's set them accordingly.

In [52]:
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators = 400, max_depth = 8, random_state = 0)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
rmse_test = MSE(y_test, y_pred)**(1/2)
print("RMSE of Random Forest is: ",rmse_test)
mse_test = MSE(y_test, y_pred)
print("MSE of Random Forest is: ",mse_test)

RMSE of Random Forest is:  1.3026320970331224
MSE of Random Forest is:  1.69685038022091


See, both scores are lower than the Decision Tree Regressor's scores. <br>
This means that the Random Forest Regressor's predictions are closer to the actual values on the test set.

#### GRADIENT BOOSTING REGRESSOR
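**Gradient Boosting Regressor** builds even weaker (shallow) decision trees one after another; each new tree is fit to the errors left by the trees before it, so later trees increasingly focus on the examples that are hard to predict.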
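
A minimal sketch of this idea for squared error, assuming a fixed learning rate and shallow trees fit to the current residuals. It runs on synthetic data and is a simplified illustration, not a reimplementation of scikit-learn's `GradientBoostingRegressor`; all names are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(0, 0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y_demo, y_demo.mean())  # start from the mean
train_mse = [np.mean((y_demo - prediction) ** 2)]

for _ in range(50):
    residuals = y_demo - prediction            # what is still unexplained
    weak_tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    weak_tree.fit(X_demo, residuals)           # fit a shallow tree to the residuals
    prediction += learning_rate * weak_tree.predict(X_demo)
    train_mse.append(np.mean((y_demo - prediction) ** 2))

print("training MSE before boosting:", train_mse[0])
print("training MSE after 50 trees: ", train_mse[-1])
```

Each tree only corrects a fraction of the remaining error, which is why many trees (the `n_estimators` hyperparameter) are usually needed.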

 <font color=blue> **HYPERPARAMETER TUNING**
</font> <br>
Let's tune the hyperparameters to the values that give the best performance.<br>
Let's do this for Gradient Boosting Regressor.

In [54]:
from sklearn.ensemble import GradientBoostingRegressor
# Let's do some hyperparameter tuning.
from sklearn.model_selection import GridSearchCV
params = {"random_state":[0,2,4,6,8],
          "max_depth":[4,5,6,7,8],
          "n_estimators":[100,200,300,400,500]}
grid = GridSearchCV(estimator = GradientBoostingRegressor(),
                    param_grid = params,
                    cv = 3,
                    scoring = "neg_mean_squared_error",
                   verbose = 1, n_jobs = -1)
grid.fit(X_train, y_train)
best_hyperparameters = grid.best_params_
print("Best Hyperparameters for Gradient Boosting Regressor are: ", best_hyperparameters)

Fitting 3 folds for each of 125 candidates, totalling 375 fits
Best Hyperparameters for Gradient Boosting Regressor are:  {'max_depth': 6, 'n_estimators': 500, 'random_state': 0}


Let's set these hyperparameters accordingly.

In [55]:
from sklearn.ensemble import GradientBoostingRegressor
gbt = GradientBoostingRegressor(n_estimators = 500, max_depth = 6, random_state = 0)
gbt.fit(X_train, y_train)
y_pred = gbt.predict(X_test)
rmse_test = MSE(y_test, y_pred)**(1/2)
print("RMSE of Gradient Boosting is: ",rmse_test)
mse_test = MSE(y_test, y_pred)
print("MSE of Gradient Boosting is: ",mse_test)

RMSE of Gradient Boosting is:  0.5949756651371644
MSE of Gradient Boosting is:  0.35399604210541114


Both scores are lower than the Decision Tree and Random Forest Regressors' scores. <br>
This means that the Gradient Boosting Regressor's predictions are the closest to the actual values on the test set so far.

#### ENSEMBLE LEARNING

**Ensemble learning** is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. In ensemble learning,<br>
* First, we train different models on the same dataset.
* Then, the trained models make predictions.
* We create a meta model by combining the predictions of the existing models.
* Our meta model is the model of the ensemble learning. <br>

In this repo, we are going to set our meta model as `VotingRegressor`, which averages the predictions of the individual regressors.
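
As a small check of how the meta model combines its members, the sketch below fits a `VotingRegressor` on synthetic data (not the bike dataset) with two arbitrary base models and compares its output with a plain average of the fitted members' predictions. The choice of `LinearRegression` plus a shallow tree is an assumption made only for this illustration.

```python
import numpy as np
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(100, 2))
y_demo = 3 * X_demo[:, 0] - X_demo[:, 1] + rng.normal(0, 0.5, size=100)

voter = VotingRegressor(estimators=[
    ("linear", LinearRegression()),
    ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
])
voter.fit(X_demo, y_demo)

# estimators_ holds the fitted clones of the models passed in above.
individual = np.array([est.predict(X_demo) for est in voter.estimators_])
print(np.allclose(individual.mean(axis=0), voter.predict(X_demo)))
```

Because the result is an unweighted average, a weak member pulls the ensemble toward its own errors just as much as a strong member pulls it away from them.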

In [56]:
from sklearn.ensemble import VotingRegressor  # missing import needed for the meta model below
rf = RandomForestRegressor(n_estimators = 400, max_depth = 8, random_state = 0)
dt = DecisionTreeRegressor(max_depth = 8, min_samples_leaf = 1, random_state = 0)
gbt = GradientBoostingRegressor(n_estimators = 500, max_depth = 6, random_state = 0)
regressors = [("RMSE for Decision Tree Regressor", dt), ("RMSE for Random Forest Regressor", rf), ("RMSE For Gradient Boosting Regressor", gbt)]
# For Loop is for returning results of the algorithms above
for rg_name, rg in regressors:
    rg.fit(X_train, y_train)
    y_pred = rg.predict(X_test)
    print("{:s} : {:.3f}".format(rg_name, (MSE(y_pred, y_test)**(1/2))))
# Below is for Voting Regressor
vc = VotingRegressor(estimators = regressors)
vc.fit(X_train, y_train)
y_pred = vc.predict(X_test)
print("RMSE for Voting Regressor :", (MSE(y_pred,y_test)**(1/2)))

RMSE for Decision Tree Regressor : 2.288
RMSE for Random Forest Regressor : 1.303
RMSE For Gradient Boosting Regressor : 0.595
RMSE for Voting Regressor : 1.3264669493804975


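The Voting Regressor performed pretty well, but not as well as the Gradient Boosting Regressor. This is typical of ensemble learning: averaging several models usually gives solid results, but the average can be pulled down by its weaker members, so it does not always beat the best individual model.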