# Dataset Characteristics

### General Information
- The dataset contains two files: `hour.csv` and `day.csv`.
- Both files contain the same fields, **except `hr`**, which appears only in `hour.csv`.

---

### **Fields Description**

#### **Identifiers**
- **`instant`**: Record index  
- **`dteday`**: Date  

---

#### **Time Information**
- **`season`**: Season of the year  
  - `1`: Spring  
  - `2`: Summer  
  - `3`: Fall  
  - `4`: Winter  
- **`yr`**: Year  
  - `0`: 2011  
  - `1`: 2012  
- **`mnth`**: Month (1 to 12)  
- **`hr`**: Hour of the day (0 to 23)  

---

#### **Day Type**
- **`holiday`**: Whether the day is a holiday  
  - `1`: Holiday  
  - `0`: Not a holiday  
- **`weekday`**: Day of the week (0: Sunday, 1: Monday, ...)  
- **`workingday`**: Whether it’s a working day  
  - `1`: Neither weekend nor holiday  
  - `0`: Otherwise  

---
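The `workingday` rule described above (`1` only when the day is neither a weekend nor a holiday) can be written as a small predicate. This is a sketch: the function name `is_working_day` is hypothetical, and it assumes the `weekday` encoding given above, with `0` as Sunday and therefore `6` as Saturday.

```python
def is_working_day(weekday, holiday):
    """Return 1 if the day is neither a weekend nor a holiday, else 0."""
    is_weekend = weekday in (0, 6)  # 0 = Sunday, 6 = Saturday
    return int(not is_weekend and holiday == 0)

print(is_working_day(weekday=6, holiday=0))  # a Saturday
print(is_working_day(weekday=1, holiday=0))  # an ordinary Monday
print(is_working_day(weekday=1, holiday=1))  # a Monday that is a holiday
```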

#### **Weather Information**
- **`weathersit`**: Weather situation  
  - `1`: Clear, Few clouds, Partly cloudy  
  - `2`: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds  
  - `3`: Light Snow, Light Rain + Thunderstorm + Scattered clouds  
  - `4`: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog  
- **`temp`**: Normalized temperature (actual temperature ÷ 41)  
- **`atemp`**: Normalized "feeling" temperature (perceived temperature ÷ 50)  
- **`hum`**: Normalized humidity (actual humidity ÷ 100)  
- **`windspeed`**: Normalized wind speed (actual wind speed ÷ 67)  

---
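The divisors listed for `temp`, `atemp`, `hum`, and `windspeed` can be inverted to recover values on the original scale. The sketch below takes those divisors at face value (41, 50, 100, 67); the helper name `denormalize` is hypothetical, and no measurement units are assumed because the field descriptions do not state them.

```python
# Divisors exactly as stated in the field descriptions.
SCALE = {"temp": 41, "atemp": 50, "hum": 100, "windspeed": 67}

def denormalize(field, value):
    """Map a normalized value back to its original scale."""
    return value * SCALE[field]

# Normalized readings in the style of the dataset.
print(denormalize("temp", 0.24))
print(denormalize("hum", 0.81))
print(denormalize("windspeed", 0.0))
```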

#### **Bike Rental Data**
- **`casual`**: Count of casual users  
- **`registered`**: Count of registered users  
- **`cnt`**: Total count of bike rentals (casual + registered)  


--------------------------------
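Since `cnt` is defined as `casual + registered`, that identity makes a cheap integrity check. The sketch below applies it to the first three records of `day.csv`; the variable names are illustrative only.

```python
# (casual, registered, cnt) for the first three records of day.csv.
rows = [(331, 654, 985), (131, 670, 801), (120, 1229, 1349)]

# Collect any record whose total does not equal the sum of its parts.
mismatches = [r for r in rows if r[0] + r[1] != r[2]]
print(mismatches)
```

On a loaded DataFrame the same check can be expressed as `(df['casual'] + df['registered'] == df['cnt']).all()`.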

# Columns Description

| **Column Name** | **Description**                                                                                             |
|------------------|-------------------------------------------------------------------------------------------------------------|
| `instant`        | Record index                                                                                               |
| `dteday`         | Date of the record                                                                                         |
| `season`         | Season of the year (`1`: Spring, `2`: Summer, `3`: Fall, `4`: Winter)                                      |
| `yr`             | Year (`0`: 2011, `1`: 2012)                                                                               |
| `mnth`           | Month of the year (`1` to `12`)                                                                            |
| `hr`             | Hour of the day (`0` to `23`)                                                                              |
| `holiday`        | Whether the day is a holiday (`1`: Yes, `0`: No)                                                           |
| `weekday`        | Day of the week (e.g., `0`: Sunday, `1`: Monday, ...)                                                      |
| `workingday`     | Whether the day is a working day (`1`: Yes, `0`: No)                                                       |
| `weathersit`     | Weather situation (`1`: Clear/Partly Cloudy, `2`: Mist/Cloudy, `3`: Light Snow/Rain, `4`: Heavy Rain/Snow) |
| `temp`           | Normalized temperature (actual temperature ÷ 41)                                                          |
| `atemp`          | Normalized "feeling" temperature (perceived temperature ÷ 50)                                             |
| `hum`            | Normalized humidity (actual humidity ÷ 100)                                                                |
| `windspeed`      | Normalized wind speed (actual wind speed ÷ 67)                                                             |
| `casual`         | Count of casual users (unregistered)                                                                       |
| `registered`     | Count of registered users                                                                                  |
| `cnt`            | Total count of bike rentals (sum of `casual` and `registered`)                                             |


----------------------------------

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px

In [2]:
day_df = pd.read_csv('day.csv',index_col='instant')
hour_df = pd.read_csv('hour.csv',index_col='instant')

In [3]:
day_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 731 entries, 1 to 731
Data columns (total 15 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   dteday      731 non-null    object 
 1   season      731 non-null    int64  
 2   yr          731 non-null    int64  
 3   mnth        731 non-null    int64  
 4   holiday     731 non-null    int64  
 5   weekday     731 non-null    int64  
 6   workingday  731 non-null    int64  
 7   weathersit  731 non-null    int64  
 8   temp        731 non-null    float64
 9   atemp       731 non-null    float64
 10  hum         731 non-null    float64
 11  windspeed   731 non-null    float64
 12  casual      731 non-null    int64  
 13  registered  731 non-null    int64  
 14  cnt         731 non-null    int64  
dtypes: float64(4), int64(10), object(1)
memory usage: 91.4+ KB


In [4]:
hour_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 17379 entries, 1 to 17379
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   dteday      17379 non-null  object 
 1   season      17379 non-null  int64  
 2   yr          17379 non-null  int64  
 3   mnth        17379 non-null  int64  
 4   hr          17379 non-null  int64  
 5   holiday     17379 non-null  int64  
 6   weekday     17379 non-null  int64  
 7   workingday  17379 non-null  int64  
 8   weathersit  17379 non-null  int64  
 9   temp        17379 non-null  float64
 10  atemp       17379 non-null  float64
 11  hum         17379 non-null  float64
 12  windspeed   17379 non-null  float64
 13  casual      17379 non-null  int64  
 14  registered  17379 non-null  int64  
 15  cnt         17379 non-null  int64  
dtypes: float64(4), int64(11), object(1)
memory usage: 2.3+ MB


In [5]:
day_df.head()

instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
4,2011-01-04,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600


In [6]:
hour_df.head()

instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
1,2011-01-01,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0.0,3,13,16
2,2011-01-01,1,0,1,1,0,6,0,1,0.22,0.2727,0.8,0.0,8,32,40
3,2011-01-01,1,0,1,2,0,6,0,1,0.22,0.2727,0.8,0.0,5,27,32
4,2011-01-01,1,0,1,3,0,6,0,1,0.24,0.2879,0.75,0.0,3,10,13
5,2011-01-01,1,0,1,4,0,6,0,1,0.24,0.2879,0.75,0.0,0,1,1


In [7]:
df = hour_df.copy()

In [8]:
df.isna().mean()

dteday        0.0
season        0.0
yr            0.0
mnth          0.0
hr            0.0
holiday       0.0
weekday       0.0
workingday    0.0
weathersit    0.0
temp          0.0
atemp         0.0
hum           0.0
windspeed     0.0
casual        0.0
registered    0.0
cnt           0.0
dtype: float64

In [9]:
df['dteday'] = pd.to_datetime(df['dteday'])

In [10]:
df.describe()

,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
count,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0
mean,2.50164,0.502561,6.537775,11.546752,0.02877,3.003683,0.682721,1.425283,0.496987,0.475775,0.627229,0.190098,35.676218,153.786869,189.463088
std,1.106918,0.500008,3.438776,6.914405,0.167165,2.005771,0.465431,0.639357,0.192556,0.17185,0.19293,0.12234,49.30503,151.357286,181.387599
min,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.02,0.0,0.0,0.0,0.0,0.0,1.0
25%,2.0,0.0,4.0,6.0,0.0,1.0,0.0,1.0,0.34,0.3333,0.48,0.1045,4.0,34.0,40.0
50%,3.0,1.0,7.0,12.0,0.0,3.0,1.0,1.0,0.5,0.4848,0.63,0.194,17.0,115.0,142.0
75%,3.0,1.0,10.0,18.0,0.0,5.0,1.0,2.0,0.66,0.6212,0.78,0.2537,48.0,220.0,281.0
max,4.0,1.0,12.0,23.0,1.0,6.0,1.0,4.0,1.0,1.0,1.0,0.8507,367.0,886.0,977.0


In [11]:
df.duplicated().sum()

0

In [12]:
df_season = df['season'].value_counts(normalize=True).round(3).reset_index()
df_season.columns = ['season','percentage']
df_season['season'] = df_season['season'].map({1:'Spring',2:'Summer',3:'Fall',4:'Winter'})
df_season

,season,percentage
0,Fall,0.259
1,Summer,0.254
2,Spring,0.244
3,Winter,0.244
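
The same three steps (normalized `value_counts`, renaming the columns, mapping codes to labels) are repeated for each categorical field in this notebook. A minimal sketch of a reusable helper follows; `category_share` is a hypothetical name, and the toy series stands in for `df['season']`.

```python
import pandas as pd

def category_share(series, labels=None, decimals=3):
    """Share of rows per category as a two-column DataFrame."""
    name = series.name
    out = (
        series.value_counts(normalize=True)
        .round(decimals)
        .rename_axis(name)                 # name the index after the column
        .reset_index(name="percentage")    # move it into a regular column
    )
    if labels is not None:
        out[name] = out[name].map(labels)  # replace integer codes with labels
    return out

# Toy stand-in for df['season'].
season = pd.Series([1, 1, 2, 3], name="season")
shares = category_share(season, {1: "Spring", 2: "Summer", 3: "Fall", 4: "Winter"})
print(shares)
```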


In [13]:
px.pie(df_season ,names='season',values='percentage',title='Season Distribution'
        ,hole=0.3,color_discrete_sequence=px.colors.sequential.Mint,labels={'season':'Season'},template='plotly_dark')

In [14]:
df_year = df['yr'].value_counts(normalize=True).round(3).reset_index()
df_year.columns = ['year','percentage']
df_year['year'] = df_year['year'].map({0:'2011',1:'2012'})
df_year

,year,percentage
0,2012,0.503
1,2011,0.497


In [15]:
px.pie(df_year ,names='year',values='percentage',title='Year Distribution'
        ,hole=0.3,color_discrete_sequence=px.colors.sequential.Mint,labels={'year':'Year'},template='plotly_dark')

In [16]:
df_month = df['mnth'].value_counts(normalize=True).round(3).reset_index()
df_month.columns = ['month','percentage']
df_month['month'] = df_month['month'].map({1:'Jan',2:'Feb',3:'Mar',4:'Apr',5:'May',6:'Jun',7:'Jul',8:'Aug',9:'Sep',10:'Oct',11:'Nov',12:'Dec'})
df_month


,month,percentage
0,May,0.086
1,Jul,0.086
2,Dec,0.085
3,Aug,0.085
4,Mar,0.085
5,Oct,0.083
6,Jun,0.083
7,Apr,0.083
8,Sep,0.083
9,Nov,0.083


In [17]:
px.bar(df_month,y='month',x='percentage',title='Month Distribution',text_auto=True
        ,color='month',labels={'month':'Month','percentage':'Percentage'}
        ,template='plotly_dark',color_discrete_sequence=px.colors.sequential.Mint)

In [18]:
df_hour = df['hr'].value_counts(normalize=True).round(3).reset_index()
df_hour.columns = ['hour','percentage']
df_hour = df_hour.sort_values('hour')
df_hour

,hour,percentage
17,0,0.042
19,1,0.042
21,2,0.041
23,3,0.04
22,4,0.04
20,5,0.041
18,6,0.042
16,7,0.042
15,8,0.042
14,9,0.042


In [19]:
px.line(df_hour,x='hour',y='percentage',title='Hour Distribution',labels={'hour':'Hour','percentage':'Percentage'}
        ,template='plotly_dark',color_discrete_sequence=px.colors.sequential.Mint)

In [20]:
df_holiday = df['holiday'].value_counts(normalize=True).round(3).reset_index()
df_holiday.columns = ['holiday','percentage']
df_holiday['holiday'] = df_holiday['holiday'].map({0:'No',1:'Yes'})
df_holiday

,holiday,percentage
0,No,0.971
1,Yes,0.029


In [21]:
px.pie(df_holiday ,names='holiday',values='percentage',title='Holiday Distribution'
        ,hole=0.3,color_discrete_sequence=px.colors.sequential.Mint,labels={'holiday':'Holiday'},template='plotly_dark')

In [22]:
df_weathersit = df['weathersit'].value_counts(normalize=True).round(3).reset_index()
df_weathersit.columns = ['weathersit','percentage']
df_weathersit['weathersit'] = df_weathersit['weathersit'].map({1:'Clear',2:'Mist',3:'Light Snow',4:'Heavy Rain'})
df_weathersit

,weathersit,percentage
0,Clear,0.657
1,Mist,0.261
2,Light Snow,0.082
3,Heavy Rain,0.0


In [23]:
px.pie(df_weathersit ,names='weathersit',values='percentage',title='Weather Situation Distribution'
        ,hole=0.3,color_discrete_sequence=px.colors.sequential.Mint,labels={'weathersit':'Weather Situation'},template='plotly_dark')

In [24]:
df_weekday = df['weekday'].value_counts(normalize=True).round(3).reset_index()
df_weekday.columns = ['weekday','percentage']
df_weekday['weekday'] = df_weekday['weekday'].map({0:'Sun',1:'Mon',2:'Tue',3:'Wed',4:'Thu',5:'Fri',6:'Sat'})
df_weekday

,weekday,percentage
0,Sat,0.145
1,Sun,0.144
2,Fri,0.143
3,Mon,0.143
4,Wed,0.142
5,Thu,0.142
6,Tue,0.141


In [25]:
px.bar(df_weekday,y='weekday',x='percentage',title='Weekday Distribution',text_auto=True
        ,color='weekday',labels={'weekday':'Weekday','percentage':'Percentage'}
        ,template='plotly_dark',color_discrete_sequence=px.colors.sequential.Mint)

In [26]:
df_workingday = df['workingday'].value_counts(normalize=True).round(3).reset_index()
df_workingday.columns = ['workingday','percentage']
df_workingday['workingday'] = df_workingday['workingday'].map({0:'No',1:'Yes'})
df_workingday

,workingday,percentage
0,Yes,0.683
1,No,0.317


In [27]:
px.pie(df_workingday ,names='workingday',values='percentage',title='Workingday Distribution'
        ,hole=0.3,color_discrete_sequence=px.colors.sequential.Mint,labels={'workingday':'Workingday'},template='plotly_dark')

In [28]:
df[['temp', 'atemp', 'hum', 'windspeed']].describe()

,temp,atemp,hum,windspeed
count,17379.0,17379.0,17379.0,17379.0
mean,0.496987,0.475775,0.627229,0.190098
std,0.192556,0.17185,0.19293,0.12234
min,0.02,0.0,0.0,0.0
25%,0.34,0.3333,0.48,0.1045
50%,0.5,0.4848,0.63,0.194
75%,0.66,0.6212,0.78,0.2537
max,1.0,1.0,1.0,0.8507


In [29]:
df.columns

Index(['dteday', 'season', 'yr', 'mnth', 'hr', 'holiday', 'weekday',
       'workingday', 'weathersit', 'temp', 'atemp', 'hum', 'windspeed',
       'casual', 'registered', 'cnt'],
      dtype='object')

In [30]:
corr_count = df[['temp', 'hum', 'windspeed','casual', 'registered']].corr(numeric_only = True)[['casual', 'registered']]
corr_count

,casual,registered
temp,0.459616,0.335361
hum,-0.347028,-0.273933
windspeed,0.090287,0.082321
casual,1.0,0.506618
registered,0.506618,1.0


In [31]:


fig = px.imshow(
    corr_count, 
    color_continuous_scale='Mint', 
    title='Correlation Heatmap',
    labels={'y': 'Features', 'x': 'Target'}, 
    template='plotly_dark',
    text_auto=True
)

fig.update_layout(
    autosize=True,  
    width=800,      
    height=800,     
)

fig.show()


In [32]:
df_dteday = df['dteday'].value_counts(normalize=True).reset_index()
df_dteday.columns = ['dteday','percentage']
df_dteday.sort_values('dteday',inplace=True)
df_dteday

,dteday,percentage
0,2011-01-01,0.001381
681,2011-01-02,0.001323
717,2011-01-03,0.001266
661,2011-01-04,0.001323
655,2011-01-05,0.001323
...,...,...
239,2012-12-27,0.001381
240,2012-12-28,0.001381
241,2012-12-29,0.001381
242,2012-12-30,0.001381


In [33]:
px.line(df_dteday,x='dteday',y='percentage',title='Date Distribution',labels={'dteday':'Date','percentage':'Percentage'}
        ,template='plotly_dark',color_discrete_sequence=px.colors.sequential.Mint)

In [34]:
X = df.drop(['casual', 'registered', 'cnt', 'dteday'], axis=1)
y = df[['casual', 'registered']]

In [35]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

X_train.shape, X_val.shape, X_test.shape, y_train.shape, y_val.shape, y_test.shape

((11122, 12), (2781, 12), (3476, 12), (11122, 2), (2781, 2), (3476, 2))
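
Applying `test_size=0.2` twice gives roughly a 64/16/20 train/validation/test split, because the second split takes 20% of the remaining 80%. The arithmetic below reproduces the row counts printed above. It assumes scikit-learn rounds a fractional test size up and gives the remainder to the training set; `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

n_rows = 17379  # rows in hour.csv
n_train_full, n_test = split_sizes(n_rows, 0.2)
n_train, n_val = split_sizes(n_train_full, 0.2)

print(n_train, n_val, n_test)
# Fractions of the full dataset that end up in each split.
print(round(n_train / n_rows, 3), round(n_val / n_rows, 3), round(n_test / n_rows, 3))
```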

In [36]:
df.columns

Index(['dteday', 'season', 'yr', 'mnth', 'hr', 'holiday', 'weekday',
       'workingday', 'weathersit', 'temp', 'atemp', 'hum', 'windspeed',
       'casual', 'registered', 'cnt'],
      dtype='object')

In [37]:
df['weathersit'].value_counts()

1    11413
2     4544
3     1419
4        3
Name: weathersit, dtype: int64

In [45]:
num_cols = ['temp', 'hum', 'windspeed']
cat_cols = ['season', 'yr', 'mnth', 'hr', 'holiday', 'weekday', 'workingday', 'weathersit']

---------------------------------
#  **Linear Regression**

In [39]:
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score

from sklearn.linear_model import LinearRegression
lr_pipe = Pipeline([
    ('preprocessor', ColumnTransformer([
        ('num', 'passthrough', num_cols),
        ('cat', OneHotEncoder(drop='first'), cat_cols)
    ])),
    ('model', LinearRegression())
])

lr_pipe.fit(X_train, y_train)

# Pipeline.score returns R^2 for regressors, here on the training and validation splits.
lr_score = {'Train Score': lr_pipe.score(X_train, y_train), 'Validation Score': lr_pipe.score(X_val, y_val)}
lr_score

{'Train Score': 0.6344289703306972, 'Validation Score': 0.6454009063525241}
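
Every model in this notebook is wrapped in the same `ColumnTransformer`, so the pipeline can be produced by a small factory rather than retyped for each estimator. The sketch below is one way to do that; `make_pipe` is a hypothetical helper, and the synthetic frame only demonstrates that the resulting pipelines fit and predict two targets.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

def make_pipe(model, num_cols, cat_cols):
    """Wrap a regressor in the shared preprocessing step."""
    preprocessor = ColumnTransformer([
        ("num", "passthrough", num_cols),
        ("cat", OneHotEncoder(drop="first"), cat_cols),
    ])
    return Pipeline([("preprocessor", preprocessor), ("model", model)])

# Synthetic stand-in data (not the bike-sharing dataset).
rng = np.random.default_rng(0)
toy_X = pd.DataFrame({"temp": rng.random(40), "hr": np.tile(np.arange(4), 10)})
toy_y = pd.DataFrame({
    "casual": rng.integers(0, 50, 40),
    "registered": rng.integers(0, 200, 40),
})

pipes = {
    name: make_pipe(model, ["temp"], ["hr"])
    for name, model in [("lr", LinearRegression()), ("ridge", Ridge())]
}
for name, pipe in pipes.items():
    pipe.fit(toy_X, toy_y)
    print(name, pipe.predict(toy_X).shape)
```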

--------------------------------
# **Ridge** 

In [40]:
from sklearn.linear_model import Ridge

ridge_pipe = Pipeline([
    ('preprocessor', ColumnTransformer([
        ('num', 'passthrough', num_cols),
        ('cat', OneHotEncoder(drop='first'), cat_cols)
    ])),
    ('model', Ridge())
])

ridge_pipe.fit(X_train, y_train)

# R^2 on the training and validation splits.
ridge_score = {'Train Score': ridge_pipe.score(X_train, y_train), 'Validation Score': ridge_pipe.score(X_val, y_val)}
ridge_score

{'Train Score': 0.6343746154691785, 'Validation Score': 0.6453148288464586}

--------------------------------
# **Lasso** 

In [41]:
from sklearn.linear_model import Lasso

lasso_pipe = Pipeline([
    ('preprocessor', ColumnTransformer([
        ('num', 'passthrough', num_cols),
        ('cat', OneHotEncoder(drop='first'), cat_cols)
    ])),
    ('model', Lasso())
])

lasso_pipe.fit(X_train, y_train)

# R^2 on the training and validation splits.
lasso_score = {'Train Score': lasso_pipe.score(X_train, y_train), 'Validation Score': lasso_pipe.score(X_val, y_val)}
lasso_score

{'Train Score': 0.5301880338769001, 'Validation Score': 0.5345968159382251}

--------------------------------
# **KNN** 

In [42]:
from sklearn.neighbors import KNeighborsRegressor

knn_pipe = Pipeline([
    ('preprocessor', ColumnTransformer([
        ('num', 'passthrough', num_cols),
        ('cat', OneHotEncoder(drop='first'), cat_cols)
    ])),
    ('model', KNeighborsRegressor())
])

knn_pipe.fit(X_train, y_train)

y_pred_train = knn_pipe.predict(X_train)
y_pred_val = knn_pipe.predict(X_val)

knn_score = {'Train Score': r2_score(y_train, y_pred_train), 'Validation Score': r2_score(y_val, y_pred_val)}
knn_score

{'Train Score': 0.8226700014087514, 'Validation Score': 0.7102609396561782}

--------------------------------
# **Decision Tree** 

In [43]:
from sklearn.tree import DecisionTreeRegressor

dt_pipe = Pipeline([
    ('preprocessor', ColumnTransformer([
        ('num', 'passthrough', num_cols),
        ('cat', OneHotEncoder(drop='first'), cat_cols)
    ])),
    ('model', DecisionTreeRegressor())
])

dt_pipe.fit(X_train, y_train)

y_pred_train = dt_pipe.predict(X_train)

y_pred_val = dt_pipe.predict(X_val)

dt_score = {'Train Score': r2_score(y_train, y_pred_train), 'Validation Score': r2_score(y_val, y_pred_val)}
dt_score

{'Train Score': 0.9999814233660258, 'Validation Score': 0.7569096061592935}

--------------------------------
# **Random Forest** 

In [44]:
from sklearn.ensemble import RandomForestRegressor

rf_pipe = Pipeline([
    ('preprocessor', ColumnTransformer([
        ('num', 'passthrough', num_cols),
        ('cat', OneHotEncoder(drop='first'), cat_cols)
    ])),
    ('model', RandomForestRegressor())
])

rf_pipe.fit(X_train, y_train)

y_pred_train = rf_pipe.predict(X_train)

y_pred_val = rf_pipe.predict(X_val)

rf_score = {'Train Score': r2_score(y_train, y_pred_train), 'Validation Score': r2_score(y_val, y_pred_val)}
rf_score

{'Train Score': 0.981663731688816, 'Validation Score': 0.8800373482509166}
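
The six score dictionaries above are easier to compare side by side. The sketch below ranks the models by validation R² and prints the train-validation gap, using the reported scores rounded to three decimals; a large positive gap (as for the decision tree) is a common sign of overfitting.

```python
# (train R^2, validation R^2) as reported above, rounded to three decimals.
scores = {
    "Linear Regression": (0.634, 0.645),
    "Ridge": (0.634, 0.645),
    "Lasso": (0.530, 0.535),
    "KNN": (0.823, 0.710),
    "Decision Tree": (1.000, 0.757),
    "Random Forest": (0.982, 0.880),
}

# Sort by validation score, best first.
ranked = sorted(scores.items(), key=lambda kv: kv[1][1], reverse=True)
for name, (train, val) in ranked:
    print(f"{name:<18} train={train:.3f} val={val:.3f} gap={train - val:+.3f}")
```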

******************************************************

# Intro To Deep Learning

In [46]:
ohe = OneHotEncoder(drop='first')
transformer = ColumnTransformer([
    ('num', 'passthrough', num_cols),
    ('cat', ohe, cat_cols)
])

X_train_transformed = transformer.fit_transform(X_train)
X_val_transformed = transformer.transform(X_val)
X_test_transformed = transformer.transform(X_test)

In [None]:
from keras.models import Sequential
from keras.layers import Dense


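# Small fully connected network with two linear outputs (casual, registered).
DL_Model = Sequential([
    Dense(12, input_dim=X_train_transformed.shape[1], activation='relu'),
    Dense(8, activation='relu'),
    Dense(2, activation='linear'),
])

# Note: 'accuracy' is a classification metric; for this regression task rely on MSE and R^2.
DL_Model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mse', 'accuracy'])

# Convert the sparse one-hot output to dense arrays before passing it to Keras.
X_train_arr = X_train_transformed.toarray()
X_test_arr = X_test_transformed.toarray()

history = DL_Model.fit(X_train_arr, y_train, validation_split=0.2, epochs=10, batch_size=32)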

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [61]:
df_history = pd.DataFrame(history.history)
df_history

,loss,mse,accuracy,val_loss,val_mse,val_accuracy
0,24601.789062,24601.789062,0.922558,21147.691406,21147.691406,0.981124
1,16981.847656,16981.847656,0.978644,11374.865234,11374.865234,0.981124
2,10614.790039,10614.790039,0.978644,8601.757812,8601.757812,0.981124
3,8186.311035,8186.311035,0.975947,6481.357422,6481.357422,0.972584
4,6290.055176,6290.055176,0.95313,5288.078613,5288.078613,0.929438
5,5437.984863,5437.984863,0.916264,4876.37207,4876.37207,0.899775
6,5076.921875,5076.921875,0.885242,4637.106445,4637.106445,0.873708
7,4879.790039,4879.790039,0.869507,4490.103516,4490.103516,0.857528
8,4734.108887,4734.108887,0.858716,4369.248047,4369.248047,0.855281
9,4608.89502,4608.89502,0.855007,4257.267578,4257.267578,0.851236
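
The `accuracy` columns in this history are not a meaningful measure of fit for a regression model. With a two-unit output and a non-crossentropy loss, Keras most likely resolves the `'accuracy'` string to categorical accuracy, which only checks whether the largest prediction falls in the same column as the largest target. The NumPy sketch below reimplements that comparison by hand (an assumption about what Keras computes here, not a call into Keras) and shows that it can reach 100% while the predictions are far from the targets.

```python
import numpy as np

# Targets with columns (casual, registered); registered is larger in every row.
y_true = np.array([[3.0, 13.0], [8.0, 32.0], [5.0, 27.0]])
# Deliberately poor predictions that still put the larger value in column 1.
y_pred = np.array([[0.0, 100.0], [0.0, 100.0], [0.0, 100.0]])

# Fraction of rows whose largest entry sits in the same column.
argmax_match = np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1))
# Mean squared error over all entries.
mse = np.mean((y_true - y_pred) ** 2)

print(argmax_match)
print(mse)
```

Because `registered` exceeds `casual` in most rows of this dataset, such a metric would stay high regardless of how well the counts are predicted.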


In [63]:
DL_Model.evaluate(X_test_arr,y_test)



[4326.74462890625, 4326.74462890625, 0.8656501770019531]

In [58]:
from sklearn.metrics import r2_score

y_pred_test = DL_Model.predict(X_test_transformed.toarray())

r2_score(y_test, y_pred_test)





0.4526355266571045
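
With two target columns, `r2_score` by default returns the unweighted mean of the per-column R² values. The NumPy sketch below spells out that calculation on made-up numbers (not the notebook's predictions); `r2_uniform_average` is a hypothetical name.

```python
import numpy as np

def r2_uniform_average(y_true, y_pred):
    """Mean of per-column R^2 values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)               # residual sum of squares
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)  # total sum of squares
    return float(np.mean(1.0 - ss_res / ss_tot))

# Made-up targets with columns (casual, registered).
y_true = np.array([[3, 13], [8, 32], [5, 27], [0, 1]])
# Baseline that always predicts the column means.
mean_baseline = np.tile(y_true.mean(axis=0), (len(y_true), 1))

print(r2_uniform_average(y_true, y_true))         # perfect predictions
print(r2_uniform_average(y_true, mean_baseline))  # mean-only baseline
```

A perfect model scores 1 and a model that always predicts the column means scores 0, which gives a reference scale for the test-set value above.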

In [None]:
import keras_tuner as kt

def build_model(hp):
    activation = hp.Choice('dense_activation', ['relu', 'sigmoid'])
    initializer = hp.Choice('dense_initializer', ['glorot_uniform', 'normal'])
    optimizer = hp.Choice('optimizer', ['adam', 'sgd'])
    model = Sequential()
    # The input width must match the one-hot-encoded matrix, not the raw 12-column X_train.
    model.add(Dense(128, input_dim=X_train_arr.shape[1], activation=activation, kernel_initializer=initializer))
    model.add(Dense(64, activation=activation, kernel_initializer=initializer))
    model.add(Dense(32, activation=activation, kernel_initializer=initializer))
    model.add(Dense(2, activation='linear'))
    model.compile(loss='mean_squared_error', optimizer=optimizer, metrics=['mse'])
    return model

tuner = kt.Hyperband(build_model, objective='mse', max_epochs=10, factor=3, directory='my_dir', project_name='intro_to_kt')

tuner.search(X_train_arr, y_train, epochs=10, validation_split=0.2)

In [None]:
import keras_tuner as kt

def build_model(hp):
    activation = hp.Choice('dense_activation',['relu','sigmoid'])
    initializer = hp.Choice('dense_initializer',['glorot_uniform','normal'])
    optimizer = hp.Choice('dense_optimizer',['adam','sgd'])
    unit = hp.Int('units',min_value = 16, max_value = 256, step=16)

    model = Sequential([
        (Dense(units=unit,input_dim=X_train_arr.shape[1],activation=activation,kernel_initializer=initializer)),
        (Dense(units=unit,activation=activation,kernel_initializer=initializer)),
        (Dense(units=unit,activation=activation,kernel_initializer=initializer)),
        (Dense(2,activation='linear'))])
    model.compile(loss='mean_squared_error',optimizer=optimizer,metrics=['accuracy','mse'])

    return model

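# Note: objective='mse' ranks trials by training MSE; 'val_mse' would rank them on the validation split instead.
tuner = kt.Hyperband(build_model,objective='mse',max_epochs=10,factor=3,directory = 'my_dir',project_name= 'intro_to_kt')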

tuner.search(X_train_arr,y_train,epochs=10,validation_split = 0.2)

Trial 23 Complete [00h 00m 06s]
mse: 11568.52734375

Best mse So Far: 11255.90234375
Total elapsed time: 00h 01m 14s


In [68]:
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
best_hps.values

{'dense_activation': 'relu',
 'dense_initializer': 'normal',
 'dense_optimizer': 'adam',
 'units': 240,
 'tuner/epochs': 10,
 'tuner/initial_epoch': 4,
 'tuner/bracket': 1,
 'tuner/round': 1,
 'tuner/trial_id': '0015'}

In [69]:
model = tuner.hypermodel.build(best_hps)

history = model.fit(X_train_arr, y_train, epochs=10, validation_split=0.2, verbose=2)

Epoch 1/10
279/279 - 2s - loss: 8232.3496 - accuracy: 0.9753 - mse: 8232.3496 - val_loss: 3610.9314 - val_accuracy: 0.9811 - val_mse: 3610.9314 - 2s/epoch - 7ms/step
Epoch 2/10
279/279 - 1s - loss: 2584.6260 - accuracy: 0.9786 - mse: 2584.6260 - val_loss: 1561.4816 - val_accuracy: 0.9811 - val_mse: 1561.4816 - 774ms/epoch - 3ms/step
Epoch 3/10
279/279 - 1s - loss: 1506.6187 - accuracy: 0.9786 - mse: 1506.6187 - val_loss: 1256.3250 - val_accuracy: 0.9811 - val_mse: 1256.3250 - 787ms/epoch - 3ms/step
Epoch 4/10
279/279 - 1s - loss: 1293.1752 - accuracy: 0.9786 - mse: 1293.1752 - val_loss: 1094.1138 - val_accuracy: 0.9811 - val_mse: 1094.1138 - 824ms/epoch - 3ms/step
Epoch 5/10
279/279 - 1s - loss: 1111.0876 - accuracy: 0.9785 - mse: 1111.0876 - val_loss: 1178.1320 - val_accuracy: 0.9811 - val_mse: 1178.1320 - 766ms/epoch - 3ms/step
Epoch 6/10
279/279 - 1s - loss: 1019.2399 - accuracy: 0.9784 - mse: 1019.2399 - val_loss: 1028.9677 - val_accuracy: 0.9811 - val_mse: 1028.9677 - 762ms/epoch 

In [70]:
model.evaluate(X_test_arr, y_test)



[944.9515991210938, 0.9726697206497192, 944.9515991210938]

In [None]:
y_pred = model.predict(X_test_arr)
DL_Score = r2_score(y_test, y_pred)

print(DL_Score)

0.89509117603302


In [72]:
model.save('DL_Model')

INFO:tensorflow:Assets written to: DL_Model\assets




In [73]:
from tensorflow.keras.models import load_model

model = load_model('DL_Model')





