<a href="https://colab.research.google.com/github/taruj/BikeSharing_Assignment/blob/main/BikeSharing_Assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Bike Sharing - Assignment
### A US bike-sharing provider, BoomBikes, has recently suffered a considerable dip in revenue due to the ongoing COVID-19 pandemic. The company has decided to prepare a business plan to accelerate revenue as soon as the ongoing lockdown ends and the economy shows signs of recovery.

## Business Goal:
### You are required to model the demand for shared bikes using the available independent variables. Management will use the model to understand how demand varies with different features, so that they can adjust the business strategy to meet demand levels and customer expectations. The model will also help management understand the demand dynamics of a new market.



In [39]:
# Load Data from Google Drive
from google.colab import drive
drive.mount('/content/drive/')

# Import Data Wrangling and Visualization Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore") # Suppress Warnings

# Import Libraries for Linear Regression

import sklearn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.metrics import r2_score,mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE

import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor



# Set float display format to 4 decimal places
pd.options.display.float_format = '{:.4f}'.format

plt.rcParams["figure.figsize"] = (10,10) # Set default figsize to 10 by 10
plt.rcParams["axes.titlesize"] = 15
pd.set_option('display.max_columns', None)

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


# Data Loading and Basic Insights
+ ### .head() -> Returns the top n (5 by default) rows of a DataFrame or Series
+ ### .shape -> Returns a tuple representing the dimensionality of the DataFrame (number of rows, number of columns)
+ ### .info() -> Non-null count per column (cross-check against shape) and data types
+ ### .describe() -> Returns summary statistics of the data in the DataFrame. For numerical data it provides

> 1. count - The number of non-null values.
> 2. mean - The average (mean) value.
> 3. std - The standard deviation.
> 4. min - The minimum value.
> 5. 25% - The 25th percentile.
> 6. 50% - The 50th percentile (median).
> 7. 75% - The 75th percentile.
> 8. max - The maximum value.

+ ### .isnull().sum() -> Additional check on the number of null values per column
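
As a small illustration of what `.describe()` and `.isnull().sum()` report, the sketch below uses only the five `cnt` values from the first rows of `day.csv`, not the full dataset. The variable names (`cnt_sample`, `summary`, `quartiles`) are hypothetical and chosen for this example only.

```python
import pandas as pd

# Illustrative sample: the cnt values of the first five rows of day.csv
cnt_sample = pd.Series([985, 801, 1349, 1562, 1600], name="cnt")

# count, mean, std, min, 25%, 50%, 75%, max
summary = cnt_sample.describe()

# The 25%/50%/75% entries match what quantile() returns
quartiles = cnt_sample.quantile([0.25, 0.50, 0.75])

# Number of null values in the sample
null_count = cnt_sample.isnull().sum()

print(summary)
print(quartiles)
print("nulls:", null_count)
```

With only five values the statistics are not representative of the dataset; the point is only to show how the rows of the `.describe()` output relate to the underlying values.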



In [40]:
bike_data = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/data/day.csv')
bike_data.head()

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,01-01-2018,1,0,1,0,6,0,2,14.1108,18.1812,80.5833,10.7499,331,654,985
1,2,02-01-2018,1,0,1,0,0,0,2,14.9026,17.6869,69.6087,16.6521,131,670,801
2,3,03-01-2018,1,0,1,0,1,1,1,8.0509,9.4703,43.7273,16.6367,120,1229,1349
3,4,04-01-2018,1,0,1,0,2,1,1,8.2,10.6061,59.0435,10.7398,108,1454,1562
4,5,05-01-2018,1,0,1,0,3,1,1,9.3052,11.4635,43.6957,12.5223,82,1518,1600


In [41]:
# Get Data Shape (Rows by Columns)
bike_data.shape

(730, 16)

In [42]:
# Count of Missing data and Data Types
bike_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 730 entries, 0 to 729
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     730 non-null    int64  
 1   dteday      730 non-null    object 
 2   season      730 non-null    int64  
 3   yr          730 non-null    int64  
 4   mnth        730 non-null    int64  
 5   holiday     730 non-null    int64  
 6   weekday     730 non-null    int64  
 7   workingday  730 non-null    int64  
 8   weathersit  730 non-null    int64  
 9   temp        730 non-null    float64
 10  atemp       730 non-null    float64
 11  hum         730 non-null    float64
 12  windspeed   730 non-null    float64
 13  casual      730 non-null    int64  
 14  registered  730 non-null    int64  
 15  cnt         730 non-null    int64  
dtypes: float64(4), int64(11), object(1)
memory usage: 91.4+ KB


 <font color='dodgerblue'>No missing data. The data type of dteday is object; all other columns are numeric.</font>
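
Because `dteday` is read as `object` (plain strings), it would have to be parsed before any date arithmetic. The following is a minimal sketch on the first five rows only, assuming the strings are in day-month-year order (consecutive rows advance the first field, which suggests this). The names `sample`, `parsed` and `month_matches` are hypothetical.

```python
import pandas as pd

# First five rows of the two relevant columns, copied from the .head() output
sample = pd.DataFrame({
    "dteday": ["01-01-2018", "02-01-2018", "03-01-2018", "04-01-2018", "05-01-2018"],
    "mnth": [1, 1, 1, 1, 1],
})

# Assumption: day-month-year format
parsed = pd.to_datetime(sample["dteday"], format="%d-%m-%Y")

# The parsed column now has a datetime dtype instead of object
print(parsed.dtype)

# Check that the month encoded in dteday agrees with the mnth column
month_matches = (parsed.dt.month == sample["mnth"]).all()
print("mnth consistent with dteday:", month_matches)
```

Running the same check on the full `bike_data` frame would confirm whether `mnth` carries the month information in `dteday` for every row.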

In [43]:
# Descriptive statistics of the data: central tendency and dispersion
bike_data.describe()

Unnamed: 0,instant,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
count,730.0,730.0,730.0,730.0,730.0,730.0,730.0,730.0,730.0,730.0,730.0,730.0,730.0,730.0,730.0
mean,365.5,2.4986,0.5,6.526,0.0288,2.9973,0.6836,1.3945,20.3193,23.7263,62.7652,12.7636,849.2493,3658.7575,4508.0068
std,210.8771,1.1102,0.5003,3.4502,0.1673,2.0062,0.4654,0.5448,7.5067,8.1503,14.2376,5.1958,686.4799,1559.7587,1936.0116
min,1.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,2.4243,3.9535,0.0,1.5002,2.0,20.0,22.0
25%,183.25,2.0,0.0,4.0,0.0,1.0,0.0,1.0,13.8119,16.8897,52.0,9.0417,316.25,2502.25,3169.75
50%,365.5,3.0,0.5,7.0,0.0,3.0,1.0,1.0,20.4658,24.3682,62.625,12.1253,717.0,3664.5,4548.5
75%,547.75,3.0,1.0,10.0,0.0,5.0,1.0,2.0,26.8806,30.4458,72.9896,15.6256,1096.5,4783.25,5966.0
max,730.0,4.0,1.0,12.0,1.0,6.0,1.0,3.0,35.3283,42.0448,97.25,34.0,3410.0,6946.0,8714.0


In [44]:
# Additional check for Null Data
bike_data.isnull().sum()

instant       0
dteday        0
season        0
yr            0
mnth          0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64

 <font color='dodgerblue'>No Null Values</font> 

# Data Cleaning

1. #### instant: record index -> We can drop instant, as a row index adds no value to the prediction
2. #### dteday: date -> We can drop dteday, as mnth and weekday are already available
3. ### cnt already holds the total count, so we can drop its two components
> + #### casual: count of casual users
> + #### registered: count of registered users

### .drop() -> Drops the column; passing inplace=True removes it from the current DataFrame instead of returning a new one
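
Before dropping `casual` and `registered`, it can be worth confirming that they really do add up to `cnt`. The sketch below checks this on the first five rows shown by `.head()` and then drops all four columns in a single call. It uses the non-inplace form of `.drop()` as an alternative to the `inplace=True` calls in this notebook; `head_rows` and `cleaned` are hypothetical names.

```python
import pandas as pd

# First five rows of the relevant columns, copied from the .head() output
head_rows = pd.DataFrame({
    "instant": [1, 2, 3, 4, 5],
    "dteday": ["01-01-2018", "02-01-2018", "03-01-2018", "04-01-2018", "05-01-2018"],
    "casual": [331, 131, 120, 108, 82],
    "registered": [654, 670, 1229, 1454, 1518],
    "cnt": [985, 801, 1349, 1562, 1600],
})

# cnt should be the sum of its two components in every row
components_add_up = (head_rows["casual"] + head_rows["registered"] == head_rows["cnt"]).all()
print("cnt == casual + registered:", components_add_up)

# Drop all four columns at once; without inplace=True a new DataFrame is returned
cleaned = head_rows.drop(columns=["instant", "dteday", "casual", "registered"])
print(list(cleaned.columns))
```

Keeping `casual` or `registered` as predictors would leak the target into the features, since together they determine `cnt` exactly in these rows.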


In [45]:
bike_data.drop(['instant'],axis=1,inplace=True)

In [46]:
bike_data.drop(['dteday'],axis=1,inplace=True)

In [47]:
bike_data.drop(['casual'],axis=1,inplace=True)

In [48]:
bike_data.drop(['registered'],axis=1,inplace=True)

### Identify the remaining columns and their data types, then run
.corr() to compute the pairwise correlation of columns (excluding NA/null values)
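
A full correlation matrix is hard to scan by eye, so one option is to list only the strongly correlated pairs. The sketch below does this on the first five rows of four numeric columns, keeping the upper triangle of the matrix so that each pair appears once. The 0.95 threshold and the names `sample`, `upper` and `strong` are assumptions made for this example, not part of the notebook.

```python
import numpy as np
import pandas as pd

# First five rows of four numeric columns, copied from the .head() output
sample = pd.DataFrame({
    "temp": [14.1108, 14.9026, 8.0509, 8.2, 9.3052],
    "atemp": [18.1812, 17.6869, 9.4703, 10.6061, 11.4635],
    "hum": [80.5833, 69.6087, 43.7273, 59.0435, 43.6957],
    "windspeed": [10.7499, 16.6521, 16.6367, 10.7398, 12.5223],
})

corr = sample.corr()

# Keep only entries above the diagonal so each pair is listed once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
upper = corr.where(mask).stack()

# Hypothetical cut-off for "strongly correlated"
threshold = 0.95
strong = upper[upper.abs() > threshold]
print(strong)
```

Applied to the full `bike_data` frame, the same filter would surface pairs such as `temp` and `atemp`, which are candidates for multicollinearity in a linear regression.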

In [49]:
bike_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 730 entries, 0 to 729
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   season      730 non-null    int64  
 1   yr          730 non-null    int64  
 2   mnth        730 non-null    int64  
 3   holiday     730 non-null    int64  
 4   weekday     730 non-null    int64  
 5   workingday  730 non-null    int64  
 6   weathersit  730 non-null    int64  
 7   temp        730 non-null    float64
 8   atemp       730 non-null    float64
 9   hum         730 non-null    float64
 10  windspeed   730 non-null    float64
 11  cnt         730 non-null    int64  
dtypes: float64(4), int64(8)
memory usage: 68.6 KB


In [50]:
bike_data.corr()

Unnamed: 0,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,cnt
season,1.0,-0.0,0.831,-0.0109,-0.0031,0.0138,0.0213,0.3334,0.342,0.2082,-0.2296,0.4046
yr,-0.0,1.0,-0.0,0.0082,-0.0055,-0.0029,-0.0503,0.0488,0.0472,-0.1125,-0.0116,0.5697
mnth,0.831,-0.0,1.0,0.0189,0.0095,-0.0047,0.0456,0.2191,0.2264,0.2249,-0.208,0.2782
holiday,-0.0109,0.0082,0.0189,1.0,-0.102,-0.2529,-0.0344,-0.0288,-0.0327,-0.0157,0.0063,-0.0688
weekday,-0.0031,-0.0055,0.0095,-0.102,1.0,0.0358,0.0311,-0.0002,-0.0075,-0.0523,0.0143,0.0675
workingday,0.0138,-0.0029,-0.0047,-0.2529,0.0358,1.0,0.0602,0.0535,0.0529,0.0232,-0.0187,0.0625
weathersit,0.0213,-0.0503,0.0456,-0.0344,0.0311,0.0602,1.0,-0.1195,-0.1206,0.5903,0.0398,-0.2959
temp,0.3334,0.0488,0.2191,-0.0288,-0.0002,0.0535,-0.1195,1.0,0.9917,0.1286,-0.1582,0.627
atemp,0.342,0.0472,0.2264,-0.0327,-0.0075,0.0529,-0.1206,0.9917,1.0,0.1415,-0.1839,0.6307
hum,0.2082,-0.1125,0.2249,-0.0157,-0.0523,0.0232,0.5903,0.1286,0.1415,1.0,-0.2485,-0.0985
