<a href="https://colab.research.google.com/github/vijaytiramale/Bike_Sharing_Demand_Prediction/blob/main/bike_sharing_demand_prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -  **Bike Sharing Demand Prediction**



##### **Project Type**    - Regression
##### **Contribution**    - Individual


# **GitHub Link -**



# **Project Summary -**

In metropolitan areas, bike-sharing programs aim to improve public mobility and convenience. Bike-sharing systems are automated and let people rent and return bikes at various locations. However, maintaining a consistent supply of rental bikes is one of the main operational challenges. This project uses historical data on factors such as temperature and time to predict demand for the bike-sharing program in Seoul.

We imported the necessary libraries and the dataset, which contains 8,760 records and 14 attributes. Exploratory data analysis (EDA) was conducted to gain insights into the data. We removed outliers and null values from the raw data and transformed it into a form suitable for machine learning models. A square-root transformation was applied to reduce the skewness of the target variable. The cleaned and scaled data was then fed to 11 different models, and their hyperparameters were tuned to find well-performing settings. Because it is good practice to track more than one metric when developing a machine learning model, we focused on the R2 score and the RMSE. R2 is scale-independent, which makes it useful for comparing models fitted to different target variables or to targets with different units of measurement, while RMSE is expressed in the same units as the target and shows the typical size of a prediction error.
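The square-root step and the two metrics mentioned above can be illustrated together. This is a minimal sketch on synthetic data (not the Seoul dataset), assuming a plain `LinearRegression` as the model: the model is fitted on the square root of a right-skewed count, predictions are squared to return to the count scale, and R2 and RMSE are computed on that original scale.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only (NOT the Seoul dataset): the square
# root of the count is linear in temperature and hour, so the raw count is
# right-skewed.
rng = np.random.default_rng(42)
n = 500
temperature = rng.uniform(-10, 35, size=n)
hour = rng.integers(0, 24, size=n)
sqrt_count = 10 + 0.5 * temperature + 0.3 * hour + rng.normal(scale=1.0, size=n)
y = np.square(np.clip(sqrt_count, 0, None))
X = np.column_stack([temperature, hour])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit on sqrt(y) to reduce the right skew of the target
model = LinearRegression().fit(X_train, np.sqrt(y_train))

# Undo the transform: clip negative predictions at 0, then square
y_pred = np.square(np.clip(model.predict(X_test), 0, None))

# Evaluate on the original count scale
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"RMSE: {rmse:.2f}  R2: {r2:.3f}")
```

Computing RMSE as the square root of `mean_squared_error` avoids relying on version-specific keyword arguments. Note that squaring a prediction made on the square-root scale tends to slightly underestimate the mean count, a known side effect of transforming the target.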

# **Index**

1.   Problem Statement
2.   Know Your Data
3.   Understanding Your Variables
4.   EDA
5.   Data Cleaning
6.   Feature Engineering
7.   Model Building
8.   Model Implementation
9.   Conclusion

# **Let's Begin!**

## **1. Problem Statement**

Many metropolitan areas now offer bike rentals to improve mobility and convenience. Making rental bikes available at the right time reduces waiting times for the public, so maintaining a stable supply of rental bikes is a major concern. Predicting the number of bikes required each hour is therefore crucial.

Bike-sharing systems automate membership, rental, and return through a network of locations. Users can rent a bike at one location and return it to the same or a different location. Rentals are made through a membership or on request, and the process is handled by a city-wide network of automated kiosks.

This project aims to predict the demand for Seoul's Bike Sharing Program based on historical usage patterns, including temperature, time, and other data.
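Using time as a predictor usually means deriving calendar features from the raw date. The sketch below is a hedged illustration: the column names `Date` and `Hour` and the day-first `dd/mm/yyyy` date format are assumptions about how an hourly rental log might be stored, and the rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows shaped like an hourly rental log. The column names
# "Date" and "Hour" and the day-first date format are assumptions.
df = pd.DataFrame({
    "Date": ["01/12/2017", "01/12/2017", "02/12/2017"],
    "Hour": [0, 8, 18],
})

# Parse day-first dates, e.g. "01/12/2017" -> 1 December 2017
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Combine date and hour into a single timestamp
df["timestamp"] = df["Date"] + pd.to_timedelta(df["Hour"], unit="h")

# Derive calendar features that can help explain demand patterns
df["month"] = df["Date"].dt.month
df["weekday"] = df["Date"].dt.day_name()
df["is_weekend"] = df["Date"].dt.dayofweek >= 5  # Monday=0 ... Sunday=6

print(df)
```

Features such as `is_weekend` let a model separate commuting patterns on working days from leisure use at weekends.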

### **Business Context** 

Estimating the demand for bikes at any given time and day is a crucial concern for bike rental businesses. Too many bikes waste resources, while too few bikes lead to lost revenue, ranging from immediate losses due to fewer customers to potential long-term losses as customers stop using the service. Bike rental businesses therefore need a reliable estimate of demand to operate effectively and efficiently.

## **2. Know Your Data**

### Import Libraries

In [None]:
# Import Libraries and modules

# libraries that are used for analysis and visualization
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

# to import datetime library
from datetime import datetime
import datetime as dt

# libraries used to pre-process 
from sklearn import preprocessing, linear_model
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split


# libraries used to implement models
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor, GradientBoostingRegressor, AdaBoostRegressor
from sklearn.svm import SVR
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# libraries to evaluate performance (regression metrics)
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# suppress warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')

# display all columns of a DataFrame
pd.set_option('display.max_columns', None)
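The imports include `GridSearchCV` and `RandomizedSearchCV`, and the summary mentions tuning hyperparameters. As a minimal sketch, assuming `Ridge` regression and an illustrative `alpha` grid (neither is taken from the original project), tuning with `GridSearchCV` on synthetic data could look like this:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

# Candidate regularisation strengths; this grid is an assumption,
# not the grid used in the original project
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}

# 5-fold cross-validation, selecting the alpha with the best mean R2
search = GridSearchCV(Ridge(), param_grid, cv=5, scoring="r2")
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

To select by RMSE instead, scikit-learn accepts `scoring="neg_root_mean_squared_error"`; the score is negated so that higher is still better.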

### Dataset Loading


In [None]:
# let's mount the google drive first
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# load the Seoul bike dataset from Google Drive
# (the path must not contain a space after "/content")
bike_df = pd.read_csv('/content/drive/MyDrive/Almabetter ML Projects/Bike Sharing Demand Prediction/SeoulBikeData.csv')

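A mistyped Drive path (for example, a stray space after `/content`) only surfaces as a `FileNotFoundError` when `read_csv` runs. A small loader that checks the path first gives a clearer message. This is a sketch: the helper name `load_bike_data` is hypothetical, and the `latin-1` fallback assumes the CSV might not be UTF-8 encoded (column names with a degree sign are a common source of decoding errors).

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_bike_data(path_str):
    """Read a CSV after checking that the path exists (hypothetical helper).

    Falling back to latin-1 is an assumption about the file's encoding;
    latin-1 maps every byte to a character, so the fallback read does not
    fail on decoding (though non-Latin text could be mis-decoded).
    """
    path = Path(path_str)
    if not path.is_file():
        raise FileNotFoundError(
            f"CSV not found at '{path_str}'; check the path for typos or stray spaces"
        )
    try:
        return pd.read_csv(path)
    except UnicodeDecodeError:
        return pd.read_csv(path, encoding="latin-1")


# Usage: write a tiny latin-1 encoded sample file and load it
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "sample.csv"
    csv_path.write_bytes("Hour,Temperature(°C)\n0,-5.2\n1,-5.5\n".encode("latin-1"))
    sample = load_bike_data(str(csv_path))

print(sample.columns.tolist())
```

Checking `Path.is_file()` up front fails fast with the exact path string in the message, which makes stray characters easy to spot.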