<a href="https://colab.research.google.com/github/vennelaharini/Adventure/blob/main/Ford_Bike_Sharing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Ford Bike Sharing



##### **Project Type**    - Exploratory Data Analysis (EDA)
##### **Contribution**    - Individual

# **Project Summary -**

The Ford Bike Sharing Project analyzes user behavior, trip characteristics, and station activity using real-world data from the Ford GoBike program. This project focuses on cleaning, transforming, and visualizing the dataset to extract valuable insights into how people use shared bicycles across different locations and times.

The dataset contains information about trip duration, start and end times, station details, user demographics such as birth year and gender, and membership types. Before analysis, the data was cleaned by handling missing values, converting time columns to proper datetime formats, and creating new features such as trip duration in minutes, user age, trip start hour, and trip day of the week. Outliers were filtered, and duplicate records were removed to ensure data quality.

Exploratory Data Analysis (EDA) was performed using libraries like Pandas, Seaborn, Matplotlib, and Plotly. Various visualizations were created to understand patterns in bike usage:

*   Histograms showed the distribution of trip durations, highlighting that most trips lasted under 30 minutes.
*   Count plots revealed that the majority of users were subscribers rather than casual customers, and that male users made up a larger proportion of riders.
*   Top start and end stations were identified, indicating high-demand areas.
*   Hourly and weekly usage patterns showed that usage peaked during morning and evening commuting hours on weekdays, suggesting bikes were often used for work travel.
*   Correlation heatmaps and pair plots provided insights into relationships between trip duration, user age, and time of day.
*   A density map visualized the concentration of trip starts across the city.
*   Trip behavior was compared across genders and user types.

Key findings included that younger users tended to have slightly longer trip durations, and trip demand was much higher on weekdays compared to weekends. Subscribers used bikes more consistently throughout the week, while customers were more active on weekends, possibly for leisure trips.

Overall, the project demonstrated a complete cycle of data preparation, analysis, and visualization. It helped understand not just how shared bikes are used, but also when, where, and by whom. This knowledge can be valuable for city planners, bike-sharing companies, and transportation researchers in making data-driven decisions for resource allocation, improving service, and promoting sustainable urban mobility.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The project aims to analyze Ford GoBike sharing data to uncover patterns in user behavior, trip durations, and station usage. The goal is to identify trends based on user demographics and time factors, helping to improve bike-sharing operations and enhance user experience.

#### **Define Your Business Objective?**

The business objective is to analyze and understand user behavior, trip patterns, and station demand within the Ford GoBike sharing system, enabling data-driven decisions to optimize bike distribution, enhance customer satisfaction, improve operational efficiency, and support future service expansion.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart is followed by answers in the format below.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as mplt  # alias used by every chart cell below
import seaborn as sns
import plotly.express as px

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive

# Mount Google Drive so the CSV stored there can be read
drive.mount('/content/drive')
file_path = '/content/drive/MyDrive/LabMentics/Ford Bike Sharing/201801-fordgobike-tripdata.csv'
df = pd.read_csv(file_path)

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
df.isna().sum().plot(kind='bar')

### What did you know about your dataset?

The Ford Bike Sharing dataset contains detailed information about individual bike trips taken within the Ford GoBike system. It includes columns like trip duration (duration_sec), start and end times (start_time, end_time), station details (IDs, names, latitude, longitude), bike IDs, and user demographics such as birth year, gender, and user type (Subscriber or Customer).

The dataset helps track when, where, and by whom bikes are used. It also includes a field indicating whether users opted into the "Bike Share for All" program. Overall, the data provides a complete view of ride patterns across different times, locations, and user groups.
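The columns described above can be checked right after loading, so that a renamed or missing field fails early instead of halfway through the charts. The following is a minimal sketch, assuming the column names as they appear in this notebook; `missing_columns`, `missing_value_report`, and the small `demo` frame are hypothetical helpers introduced only for illustration and use synthetic values, not real trip data.

```python
import pandas as pd

# Column names as listed in this notebook's dataset description.
EXPECTED_COLUMNS = [
    "duration_sec", "start_time", "end_time",
    "start_station_id", "start_station_name",
    "start_station_latitude", "start_station_longitude",
    "end_station_id", "end_station_name",
    "end_station_latitude", "end_station_longitude",
    "bike_id", "user_type", "member_birth_year",
    "member_gender", "bike_share_for_all_trip",
]


def missing_columns(frame, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from `frame`."""
    return [col for col in expected if col not in frame.columns]


def missing_value_report(frame):
    """Count and percentage of nulls per column, largest first."""
    counts = frame.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    return report.sort_values("missing", ascending=False)


# Synthetic frame for illustration only (hypothetical values).
demo = pd.DataFrame({
    "duration_sec": [600, 1200, 300, 900],
    "user_type": ["Subscriber", "Customer", "Subscriber", "Subscriber"],
    "member_birth_year": [1985.0, None, 1992.0, None],
})

print(missing_columns(demo))
print(missing_value_report(demo))
```

On the real data, the same two calls would be made on `df` directly after `pd.read_csv`.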

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

duration_sec - Duration of the trip in seconds

start_time - Start date and time of the trip

end_time - End date and time of the trip

start_station_id - Unique ID of the starting station

start_station_name - Name of the starting station

start_station_latitude - Latitude coordinate of the starting station

start_station_longitude - Longitude coordinate of the starting station

end_station_id - Unique ID of the ending station

end_station_name - Name of the ending station

end_station_latitude - Latitude coordinate of the ending station

end_station_longitude - Longitude coordinate of the ending station

bike_id - Unique ID of the bike used

user_type - Type of user: Subscriber (member) or Customer (casual rider)

member_birth_year - Birth year of the user

member_gender - Gender of the user (Male, Female, Other)

bike_share_for_all_trip - Whether the trip was part of the Bike Share for All program (Yes/No)

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
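# Write your code to make your dataset analysis ready.
# Work on a copy so the raw DataFrame stays untouched.
df1 = df.copy()
# Drop trips missing the birth year or either station name.
df1.dropna(subset=['member_birth_year', 'start_station_name', 'end_station_name'], inplace=True)
# Parse the timestamp columns into datetime objects.
df1['start_time'] = pd.to_datetime(df1['start_time'])
df1['end_time'] = pd.to_datetime(df1['end_time'])
# Trip duration in minutes, derived from the two timestamps.
df1['trip_duration'] = (df1['end_time'] - df1['start_time']).dt.total_seconds() / 60
# Remove exact duplicate rows.
df1.drop_duplicates(inplace=True)
# Summarize dtypes and non-null counts after cleaning.
df1.info()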

### What all manipulations have you done and insights you found?

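The wrangling cell works on a copy of the raw data (`df1`), drops rows with a missing `member_birth_year`, `start_station_name`, or `end_station_name`, converts `start_time` and `end_time` to datetime, derives `trip_duration` in minutes from the two timestamps, and removes duplicate rows. Later chart cells add `trip_duration_min`, `hour`, `day_of_week`, and `age` as further derived features.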
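The project summary mentions that outliers were filtered, but the wrangling cell does not show that step. The following is a minimal sketch of one common approach, an interquartile-range (IQR) fence on `duration_sec`; the function name `filter_duration_outliers`, the multiplier `k=1.5`, and the `demo` values are assumptions for illustration, not the rule actually used in this project.

```python
import pandas as pd


def filter_duration_outliers(frame, column="duration_sec", k=1.5):
    """Keep rows whose `column` lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    mask = frame[column].between(lower, upper)
    return frame.loc[mask].copy()


# Synthetic durations in seconds (hypothetical); the last value mimics
# a bike that was left undocked for almost a day.
demo = pd.DataFrame(
    {"duration_sec": [300, 420, 480, 540, 600, 660, 720, 86000]}
)
cleaned = filter_duration_outliers(demo)
print(len(demo), "rows before,", len(cleaned), "rows after")
```

Because trip durations are strongly right-skewed, an IQR fence can remove a noticeable share of legitimate long rides; a fixed percentile cap or a log transform are alternatives worth comparing on the real data.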

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 - Distribution of Trip Duration (in Minutes)

In [None]:
# Chart - 1 visualization code
df1['trip_duration_min'] = df1['duration_sec'] / 60
sns.histplot(df1['trip_duration_min'], bins=100)
mplt.title('Trip Duration Distribution (Minutes)')
mplt.xlabel('Duration (min)')
mplt.ylabel('Count')
mplt.show()

##### 1. Why did you pick the specific chart?

I chose a distribution (histogram) chart because it effectively shows how trip durations are spread across all users. It helps to easily identify the most common trip lengths, detect any outliers, and understand user behavior patterns regarding trip time.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that the majority of trips are relatively short, with most rides lasting under 30 minutes. There are very few long-duration trips, indicating that users mainly use the service for quick and short-distance travel. A few outliers with very long durations were also noticed, which could point to unusual usage or data entry errors.
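The "under 30 minutes" reading of the histogram can be backed by a number rather than by eye. The sketch below computes the share of trips below a threshold and a few duration percentiles; `share_under` is a hypothetical helper and the `demo` durations are synthetic, so the printed figures say nothing about the real dataset.

```python
import pandas as pd


def share_under(frame, minutes=30, column="duration_sec"):
    """Fraction of trips whose duration is strictly below `minutes`."""
    duration_min = frame[column] / 60
    return float((duration_min < minutes).mean())


# Synthetic durations in seconds (hypothetical values).
demo = pd.DataFrame({
    "duration_sec": [300, 600, 900, 1500, 2400, 7200, 480, 720, 1020, 86000]
})

print(f"{share_under(demo):.0%} of demo trips are under 30 minutes")
# Median, 90th and 99th percentile of trip duration in minutes.
print((demo["duration_sec"] / 60).quantile([0.5, 0.9, 0.99]))
```

On the cleaned data, `share_under(df1)` would give the corresponding figure for this project.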



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Understanding that most users prefer short trips can help the business design better pricing models, such as affordable short-ride passes, to encourage more frequent use. It can also guide operational decisions, like bike redistribution and maintenance timing, to better match actual usage patterns, improving customer satisfaction and operational efficiency.

#### Chart - 2 - User Type Count

In [None]:
# Chart - 2 visualization code
sns.countplot(x='user_type', data=df1)
mplt.title('User Type Distribution')
mplt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3 - Gender Distribution

In [None]:
# Chart - 3 visualization code
sns.countplot(x='member_gender', data=df1)
mplt.title('Gender Distribution')
mplt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4 - Birth Year Distribution

In [None]:
# Chart - 4 visualization code
sns.histplot(df1['member_birth_year'].dropna(), bins=40)
mplt.title('Distribution of Birth Year')
mplt.xlabel('Birth Year')
mplt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here
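Birth year is easier to interpret once it is converted to an age, which the project summary lists as one of the derived features. The sketch below computes age relative to the year of each trip and drops implausible values; the helper name `add_age_at_trip`, the cap of 100 years, and the `demo` rows are assumptions made for illustration.

```python
import pandas as pd


def add_age_at_trip(frame, max_age=100):
    """Add an `age` column relative to each trip's start year and keep
    only rows with 1 <= age <= max_age (missing birth years are dropped)."""
    out = frame.copy()
    out["start_time"] = pd.to_datetime(out["start_time"])
    out["age"] = out["start_time"].dt.year - out["member_birth_year"]
    # NaN ages compare as False here, so they are filtered out as well.
    plausible = out["age"].between(1, max_age)
    return out.loc[plausible]


# Synthetic rows (hypothetical): one plausible rider, one implausible
# birth year, and one missing birth year.
demo = pd.DataFrame({
    "start_time": [
        "2018-01-31 22:52:35",
        "2018-01-31 16:13:34",
        "2018-01-30 08:00:00",
    ],
    "member_birth_year": [1986.0, 1900.0, None],
})
aged = add_age_at_trip(demo)
print(aged[["member_birth_year", "age"]])
```

Using the trip's own year keeps the ages correct regardless of when the notebook is run.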

#### Chart - 5 - Top 10 Start Stations

In [None]:
# Chart - 5 visualization code
top_starts = df1['start_station_name'].value_counts().head(10)
top_starts.plot(kind='barh')
mplt.title('Top 10 Start Stations')
mplt.xlabel('Number of Trips')
mplt.gca().invert_yaxis()
mplt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6 - Top 10 End Stations

In [None]:
# Chart - 6 visualization code
top_ends = df1['end_station_name'].value_counts().head(10)
top_ends.plot(kind='barh', color='orange')
mplt.title('Top 10 End Stations')
mplt.xlabel('Number of Trips')
mplt.gca().invert_yaxis()
mplt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7 - Trips by Gender and User Type

In [None]:
# Chart - 7 visualization code
sns.countplot(x='member_gender', hue='user_type', data=df1)
mplt.title('Trips by Gender and User Type')
mplt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8 - Trip Duration by Gender (Boxplot)

In [None]:
# Chart - 8 visualization code
sns.boxplot(x='member_gender', y='trip_duration_min', data=df1)
mplt.ylim(0,60)
mplt.title('Trip Duration by Gender')
mplt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9 - Average Trip Duration by User Type

In [None]:
# Chart - 9 visualization code
avg_duration = df1.groupby('user_type')['trip_duration_min'].mean().reset_index()
px.bar(avg_duration, x='user_type', y='trip_duration_min', title='Average Trip Duration by User Type')

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10 - Trip Counts Over Time (Hourly Usage Pattern)

In [None]:
# Chart - 10 visualization code
df1['start_time'] = pd.to_datetime(df1['start_time'])
df1['hour'] = df1['start_time'].dt.hour
sns.countplot(x='hour', data=df1, palette='coolwarm')
mplt.title('Hourly Usage Pattern')
mplt.xlabel('Hour of Day')
mplt.ylabel('Number of Trips')
mplt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11 - Heatmap of Start Station Usage

In [None]:
# Chart - 11 visualization code
station_counts = df1.groupby(['start_station_latitude', 'start_station_longitude']).size().reset_index(name='count')
px.density_mapbox(
    station_counts,
    lat='start_station_latitude',
    lon='start_station_longitude',
    z='count',
    radius=10,
    center=dict(lat=37.77, lon=-122.42),
    zoom=12,
    mapbox_style='open-street-map',
    title='Start Station Density Map'
)

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12 - Bike Share for All Trips (Yes/No)

In [None]:
# Chart - 12 visualization code
sns.countplot(x='bike_share_for_all_trip', data=df1)
mplt.title('Bike Share for All Trips')
mplt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13 - Trips by Day of the Week

In [None]:
# Chart - 13 visualization code
df1['day_of_week'] = df1['start_time'].dt.day_name()
sns.countplot(x='day_of_week', data=df1, order=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'])
mplt.title('Trips by Day of the Week')
mplt.xticks(rotation=45)
mplt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here
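The day-of-week chart can be complemented by a table that splits each user type's trips into weekday and weekend shares, which is the comparison behind the summary's statement that customers are more active on weekends. This is a minimal sketch: `weekend_share_by_user_type` is a hypothetical helper and the `demo` trips are synthetic.

```python
import pandas as pd


def weekend_share_by_user_type(frame):
    """Row-normalized share of weekday vs. weekend trips per user type."""
    start = pd.to_datetime(frame["start_time"])
    # dayofweek: Monday=0 ... Sunday=6, so 5 and 6 are the weekend.
    day_type = (start.dt.dayofweek >= 5).map(
        {True: "weekend", False: "weekday"}
    ).rename("day_type")
    return pd.crosstab(frame["user_type"], day_type, normalize="index")


# Synthetic trips (hypothetical values).
demo = pd.DataFrame({
    "start_time": [
        "2018-01-01 08:05:00",  # Monday
        "2018-01-02 17:40:00",  # Tuesday
        "2018-01-03 08:15:00",  # Wednesday
        "2018-01-06 11:00:00",  # Saturday
        "2018-01-06 14:30:00",  # Saturday
        "2018-01-07 15:45:00",  # Sunday
        "2018-01-04 12:10:00",  # Thursday
    ],
    "user_type": [
        "Subscriber", "Subscriber", "Subscriber", "Subscriber",
        "Customer", "Customer", "Customer",
    ],
})
shares = weekend_share_by_user_type(demo)
print(shares)
```

Normalizing by row matters here: subscribers far outnumber customers, so raw counts would hide the difference in how each group distributes its trips across the week.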

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
# Age at the time of the trip, instead of relative to a hard-coded year
df1['age'] = df1['start_time'].dt.year - df1['member_birth_year']
df1['trip_duration_min'] = df1['duration_sec'] / 60
numeric_cols = ['duration_sec', 'trip_duration_min', 'member_birth_year', 'age', 'hour']
corr_matrix = df1[numeric_cols].corr()

mplt.figure(figsize=(10, 6))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
mplt.title('Correlation Heatmap of Numerical Features')
mplt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
sample_df = df1[numeric_cols + ['user_type']].dropna().sample(n=500, random_state=42)
sns.pairplot(sample_df, hue='user_type', palette='Set2')
mplt.suptitle('Pair Plot of Key Numerical Features', y=1.02)
mplt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the Business Objective?
Explain Briefly.

*   **Optimize Bike Distribution:** Analyze station-wise demand patterns by time and day to ensure bikes are available where and when users need them, especially during peak commute hours.
*   **Improve Service for Casual Users:** Since casual customers are more active on weekends, launch targeted promotions and improve weekend bike availability to increase customer rides.
*   **Focus on High-Demand Stations:** Upgrade infrastructure at top start and end stations (e.g., add more bike docks, improve maintenance) to handle peak usage and reduce trip disruptions.
*   **Personalize User Experience:** Use demographic data (age, gender) to tailor membership offers, discounts, and communication strategies for different user groups.
*   **Expand Bike Share for All Program:** Encourage more participation in the Bike Share for All program by identifying barriers to entry and promoting affordable access to underrepresented communities.
*   **Enhance Operational Efficiency:** Regularly monitor trip duration and bike usage patterns to improve bike maintenance schedules and minimize service downtime.
*   **Leverage Time-Based Offers:** Introduce flexible pricing during off-peak hours to encourage more rides during low-demand periods.
*   **Promote Environmental Benefits:** Market the environmental advantages of bike-sharing to attract eco-conscious users and corporate partnerships.
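The first suggestion, analyzing station-wise demand by time of day, can be sketched as a station-by-hour table of trip starts. This is only an illustration under stated assumptions: `station_hour_demand` is a hypothetical helper, the station names in `demo` are made up, and a real rebalancing analysis would also need trip ends per station.

```python
import pandas as pd


def station_hour_demand(frame, top_n=10):
    """Trip starts per station and hour for the `top_n` busiest start stations."""
    start = pd.to_datetime(frame["start_time"])
    trips = frame.assign(hour=start.dt.hour)
    busiest = trips["start_station_name"].value_counts().head(top_n).index
    subset = trips[trips["start_station_name"].isin(busiest)]
    return pd.crosstab(subset["start_station_name"], subset["hour"])


# Synthetic trips with made-up station names (hypothetical values).
demo = pd.DataFrame({
    "start_time": [
        "2018-01-02 08:05:00",
        "2018-01-02 08:40:00",
        "2018-01-02 17:15:00",
        "2018-01-02 08:20:00",
        "2018-01-02 17:30:00",
        "2018-01-02 12:00:00",
    ],
    "start_station_name": [
        "Station A", "Station A", "Station A",
        "Station B", "Station B", "Station C",
    ],
})
demand = station_hour_demand(demo, top_n=2)
print(demand)
```

Passing the resulting table to `sns.heatmap` would give a compact view of which stations need bikes at which hours.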

# **Conclusion**

The Ford Bike Sharing data analysis provided deep insights into user behavior, trip patterns, and station demand. Most users were found to be subscribers who primarily rode during weekday commute hours, while casual customers were more active on weekends. Younger users and males made up a larger portion of the user base.

By cleaning, transforming, and visualizing the dataset, key trends and correlations were identified that can help optimize bike distribution, improve customer satisfaction, and support service expansion. The findings highlight the importance of targeted promotions, efficient resource allocation, and data-driven decision-making to enhance the overall performance of the bike-sharing system.

This project demonstrated the power of data analytics in understanding urban mobility and improving public transportation services.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***