<h1>Predicting Bike Rental Hours</h1>
<p>The relevant columns are described below:</p>

<ul><li>instant - A unique sequential ID number for each row</li>
<li>dteday - The date of the rentals</li>
<li>season - The season in which the rentals occurred</li>
<li>yr - The year the rentals occurred</li>
<li>mnth - The month the rentals occurred</li>
<li>hr - The hour the rentals occurred</li>
<li>holiday - Whether or not the day was a holiday</li>
<li>weekday - The day of the week (as a number, 0 to 6)</li>
<li>workingday - Whether or not the day was a working day</li>
<li>weathersit - The weather (as a categorical variable)</li>
<li>temp - The temperature, on a 0-1 scale</li>
<li>atemp - The adjusted temperature</li>
<li>hum - The humidity, on a 0-1 scale</li>
<li>windspeed - The wind speed, on a 0-1 scale</li>
<li>casual - The number of casual riders (people who hadn't previously signed up with the bike sharing program)</li>
<li>registered - The number of registered riders (people who had already signed up)</li>
<li>cnt - The total number of bike rentals (casual + registered)</li></ul>
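<p>The relationship cnt = casual + registered can be checked directly. The sketch below is a minimal illustration that hard-codes the first five rows of the file instead of reading the CSV; the names <code>sample</code> and <code>cnt_matches</code> are introduced here for illustration only.</p>

```python
import pandas as pd

# First five rows of the three rider-count columns, hard-coded from the dataset.
sample = pd.DataFrame({
    "casual": [3, 8, 5, 3, 0],
    "registered": [13, 32, 27, 10, 1],
    "cnt": [16, 40, 32, 13, 1],
})

# cnt should equal casual + registered on every row.
cnt_matches = (sample["casual"] + sample["registered"] == sample["cnt"]).all()
print(cnt_matches)
```

<p>Because the two columns add up to the target exactly, keeping either of them as a predictor would hand the model part of the answer.</p>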

In [21]:
# Import libraries
import numpy as np
import pandas as pd

In [22]:
# Load the data
df_data_1 = pd.read_csv("Predicting bike rental hours.csv")
df_data_1.head()


Unnamed: 0,instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0.0,3,13,16
1,2,2011-01-01,1,0,1,1,0,6,0,1,0.22,0.2727,0.8,0.0,8,32,40
2,3,2011-01-01,1,0,1,2,0,6,0,1,0.22,0.2727,0.8,0.0,5,27,32
3,4,2011-01-01,1,0,1,3,0,6,0,1,0.24,0.2879,0.75,0.0,3,10,13
4,5,2011-01-01,1,0,1,4,0,6,0,1,0.24,0.2879,0.75,0.0,0,1,1


In [23]:
# Data preparation: drop the date and the casual/registered counts,
# which add up to cnt and would leak the target
df_data_1.drop(['casual','dteday','registered'],axis=1,inplace=True)
df_data_1.head(10)

Unnamed: 0,instant,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,cnt
0,1,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0.0,16
1,2,1,0,1,1,0,6,0,1,0.22,0.2727,0.8,0.0,40
2,3,1,0,1,2,0,6,0,1,0.22,0.2727,0.8,0.0,32
3,4,1,0,1,3,0,6,0,1,0.24,0.2879,0.75,0.0,13
4,5,1,0,1,4,0,6,0,1,0.24,0.2879,0.75,0.0,1
5,6,1,0,1,5,0,6,0,2,0.24,0.2576,0.75,0.0896,1
6,7,1,0,1,6,0,6,0,1,0.22,0.2727,0.8,0.0,2
7,8,1,0,1,7,0,6,0,1,0.2,0.2576,0.86,0.0,3
8,9,1,0,1,8,0,6,0,1,0.24,0.2879,0.75,0.0,8
9,10,1,0,1,9,0,6,0,1,0.32,0.3485,0.76,0.0,14


In [24]:
df_data_1.isnull().sum()

instant       0
season        0
yr            0
mnth          0
hr            0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
cnt           0
dtype: int64

In [25]:
def assign_label(hour):
    """Map an hour of the day to a time-of-day label."""
    if 0 <= hour < 6:
        return 4  # night
    elif 6 <= hour < 12:
        return 1  # morning
    elif 12 <= hour < 18:
        return 2  # afternoon
    elif 18 <= hour <= 24:
        return 3  # evening

# Add the time-of-day label as a new feature column
df_data_1["time_label"] = df_data_1["hr"].apply(assign_label)
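<p>The same binning could also be written without a Python-level function. The sketch below is one possible vectorized alternative using <code>pd.cut</code>, not the method used in this notebook; it assumes hours run from 0 to 23, and <code>hours</code> is a stand-in for the <code>hr</code> column.</p>

```python
import pandas as pd

# Stand-in for df_data_1["hr"]: every hour of the day once (assumption: values are 0-23).
hours = pd.Series(range(24), name="hr")

# right=False makes the bins left-closed: [0, 6), [6, 12), [12, 18), [18, 24).
# Unlike assign_label, an hour of exactly 24 would fall outside the bins and become NaN.
time_label = pd.cut(
    hours, bins=[0, 6, 12, 18, 24], labels=[4, 1, 2, 3], right=False
).astype(int)

print(time_label.tolist())
```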

In [26]:
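# Randomly sample 80% of the rows for training
# (no random_state is set, so the split differs between runs)
train = df_data_1.sample(frac=.8)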

In [27]:
# Hold out the remaining 20% of the rows as the test set
test = df_data_1.loc[~df_data_1.index.isin(train.index)]
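<p>Since the sample is random, every run of the notebook produces a different split and slightly different error values. A minimal sketch of a repeatable split follows; the small synthetic frame <code>df</code> stands in for <code>df_data_1</code>, and the seed value 1 is an arbitrary choice.</p>

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df_data_1 (100 rows, not the real dataset).
df = pd.DataFrame({"hr": np.arange(100) % 24, "cnt": np.arange(100)})

# Fixing random_state makes the 80/20 split identical on every run.
train = df.sample(frac=0.8, random_state=1)

# Dropping the sampled index labels leaves the remaining 20% as the test set.
test = df.drop(train.index)

print(len(train), len(test))
```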

In [28]:
# Train a linear regression model
from sklearn.linear_model import LinearRegression
# Use every column except the target, cnt, as a predictor
predictors = list(train.columns)
predictors.remove("cnt")
reg = LinearRegression()
reg.fit(train[predictors], train["cnt"])

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

In [29]:
predictions = reg.predict(test[predictors])

# Mean squared error (MSE) on the test set
np.mean((predictions - test["cnt"]) ** 2)

1771.5063711683563
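<p>The value above is a mean squared error (MSE), so its unit is rentals squared. The sketch below uses made-up toy arrays, not notebook data, to show that the manual formula agrees with <code>sklearn.metrics.mean_squared_error</code> and that taking the square root gives an error in rentals per hour.</p>

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Toy values for illustration only (not taken from the model above).
y_true = np.array([16, 40, 32, 13, 1])
y_pred = np.array([20, 35, 30, 15, 3])

# Manual MSE, written the same way as in the notebook cell.
mse_manual = np.mean((y_pred - y_true) ** 2)

# The scikit-learn helper computes the same quantity.
mse_sklearn = mean_squared_error(y_true, y_pred)

# RMSE is in the same unit as the target (rentals per hour).
rmse = np.sqrt(mse_manual)

print(mse_manual, mse_sklearn, rmse)
```

<p>Read this way, an MSE of about 1771.5 corresponds to an RMSE of roughly 42 rentals per hour.</p>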

In [30]:
from sklearn.tree import DecisionTreeRegressor

# Require at least 5 training rows in every leaf to limit overfitting
reg = DecisionTreeRegressor(min_samples_leaf=5)

reg.fit(train[predictors], train["cnt"])

DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=5,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           presort=False, random_state=None, splitter='best')

In [31]:
predictions = reg.predict(test[predictors])

np.mean((predictions - test["cnt"]) ** 2)

309.48187859389327

In [32]:
# Try smaller leaves (at least 2 rows each) for comparison
reg = DecisionTreeRegressor(min_samples_leaf=2)

reg.fit(train[predictors], train["cnt"])

predictions = reg.predict(test[predictors])

np.mean((predictions - test["cnt"]) ** 2)

333.7736726874658
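<p>Comparing two values of <code>min_samples_leaf</code> by hand extends naturally to a loop. The following is a rough sketch on synthetic data (a noisy sine curve over the hour of day, not the bike rental data); the candidate leaf sizes and the name <code>mse_by_leaf</code> are assumptions made for illustration.</p>

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: one feature (hour of day) and a noisy periodic target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 24, size=(500, 1))
y = 100 * np.sin(X[:, 0] / 24 * 2 * np.pi) + rng.normal(0, 10, size=500)
X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

# Fit one tree per leaf size and record its test-set MSE.
mse_by_leaf = {}
for leaf in (1, 2, 5, 10, 20):
    tree = DecisionTreeRegressor(min_samples_leaf=leaf, random_state=0)
    tree.fit(X_train, y_train)
    mse_by_leaf[leaf] = float(np.mean((tree.predict(X_test) - y_test) ** 2))

print(mse_by_leaf)
```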

In [36]:
from sklearn.ensemble import RandomForestRegressor

reg = RandomForestRegressor(min_samples_leaf=5)
reg.fit(train[predictors], train["cnt"])
predictions = reg.predict(test[predictors])

np.mean((predictions - test["cnt"]) ** 2)



270.8234937984847
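<p>The forest above relies on the default number of trees, which depends on the installed scikit-learn version: the default for <code>n_estimators</code> changed from 10 to 100 in release 0.22. The sketch below, on synthetic data and not the bike rental data, sets both <code>n_estimators</code> and <code>random_state</code> explicitly so the result does not depend on the library version or the run; the values chosen are assumptions.</p>

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: one feature (hour of day) and a noisy periodic target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 24, size=(500, 1))
y = 100 * np.sin(X[:, 0] / 24 * 2 * np.pi) + rng.normal(0, 10, size=500)

# Explicit tree count and seed make the forest repeatable across runs and versions.
forest = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)
forest.fit(X[:400], y[:400])

# Test-set MSE on the held-out 100 rows.
forest_mse = float(np.mean((forest.predict(X[400:]) - y[400:]) ** 2))
print(forest_mse)
```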

<h2>LinearRegression MSE: 1771.5063711683563</h2>
<h2>DecisionTreeRegressor MSE (min_samples_leaf=5): 309.48187859389327</h2>
<h2>RandomForestRegressor MSE: 270.8234937984847</h2>