Skip to content

teena0401/Cyclistic-Bike-Share-Marketing-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 

Repository files navigation

Introduction

As part of the Google Data Analytics Professional Certificate Course, I require to act as a marketing analyst at Cyclistic, bike sharing company in Chicago.

About Cyclistics

Cyclistic is a bike-share company in Chicago which offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to cater people from all wallks of life.The company believes the company's future success depends on maximizing number of annual memberships.Therefore, they wants to understand how casual riders and annual members use Cyclistic bikes differently , and use these insight to guide their coming marketing strategy.

Method Approach

  1. Ask - formulating business question
  2. Prepare - choosing the suitable datasets
  3. Process - ensuring the quality of datasets
  4. Analyze - extracting inights from cleaned datasets
  5. Share - transforming it into interactive formats
  6. Act - suggesting actionable ideas that helps to solve business problem

Phase 1: Ask

1.1: Business Task:

  1. How do casual users and annual subscribed members use Cyclistic Bikes differently?
  2. How can we design new marketing strategies to help convert casual members into annual members?

Phase 2: Prepare

2.1: Dataset Summary

These 12 datasets are the bike user historical trip data in 2021. Each dataset contains the number of cyclistic bike user and their riding distance and places.Each file represents different month of rider data.

Dataset Table Information
tripdata_202101 Rider Information of 96,831 users in January, contains rider_id, rideable_type, started_at, ended_at, start_station, end_station,start_station_id, end_station_id, member_casual, ride_length
tripdata_202102 Rider Information of 96,831 users in February
tripdata_202103 Rider Information of 96,831 users in March
tripdata_202104 Rider Information of 96,831 users in April
tripdata_202105 Rider Information of 96,831 users in May
tripdata_202106 Rider Information of 96,831 users in June
tripdata_202107 Rider Information of 96,831 users in July
tripdata_202108 Rider Information of 96,831 users in August
tripdata_202109 Rider Information of 96,831 users in September
tripdata_202110 Rider Information of 96,831 users in October
tripdata_202111 Rider Information of 96,831 users in November
tripdata_202112 Rider Information of 96,831 users in December

The cyclistic users datasets are published by Motivate International Inc.

2.2: Dataset Limitations and Integrity

The collected data has 12-months time span (January 2021 - December 2021) and have large number of users data are collected in this sudy.

Phase 3. Process

In this project, I choose MySQL database as a storage and to process datasets, and Tableau as data visualization tool. To begin with data cleaning process, I load twelve datasets into a database sever.

I observed that each table has same type of columns across twelve datasets so I have to merge all 12 tables into one and convert date format.

NOTE: the start_station_id column of dataset from Dec 2020 to April 2020 contains string values

3.1 Importing Datasets

Due to large file size (1million data points), I use Load Data Infile command to load all datasets into MySQL database.

--- January---
CREATE TABLE tripdata_202101 (ride_id VARCHAR(225) ,rideable_type TEXT, started_at VARCHAR(225),ended_at VARCHAR(225),start_station_name VARCHAR(225),start_station_id VARCHAR(225),end_station_name VARCHAR(225),
end_station_id TEXT,start_lat VARCHAR(225),start_lng VARCHAR(225),end_lat VARCHAR(225), end_lng VARCHAR(225),member_casual TEXT ,ride_length VARCHAR(225),day_of_week INT);
LOAD DATA INFILE 'C:\\ProgramData\\MySQL\\MySQL Server 8.0\\Uploads\\Cyclistic.XLS\\202101-divvy-tripdata.csv'
INTO TABLE tripdata_202101
FIELDS TERMINATED BY ','
ENCLOSED BY ""
LINES TERMINATED BY '\n'
IGNORE 1 ROWS; 

---February---
CREATE TABLE tripdata_202102 (ride_id VARCHAR(225) ,rideable_type TEXT, started_at VARCHAR(225),ended_at VARCHAR(225),start_station_name VARCHAR(225),start_station_id VARCHAR(225),end_station_name VARCHAR(225),
end_station_id TEXT,start_lat VARCHAR(225),start_lng VARCHAR(225),end_lat VARCHAR(225), end_lng VARCHAR(225),member_casual TEXT ,ride_length VARCHAR(225),day_of_week INT);

LOAD DATA INFILE 'C:\\ProgramData\\MySQL\\MySQL Server 8.0\\Uploads\\Cyclistic.XLS\\202102-divvy-tripdata.csv'
INTO TABLE tripdata_202102
FIELDS TERMINATED BY ','
ENCLOSED BY ""
LINES TERMINATED BY '\n'
IGNORE 1 ROWS; 

---March---
CREATE TABLE tripdata_202103 (ride_id VARCHAR(225) ,rideable_type TEXT, started_at VARCHAR(225),ended_at VARCHAR(225),start_station_name VARCHAR(225),start_station_id VARCHAR(225),end_station_name VARCHAR(225),
end_station_id TEXT,start_lat VARCHAR(225),start_lng VARCHAR(225),end_lat VARCHAR(225), end_lng VARCHAR(225),member_casual TEXT ,ride_length VARCHAR(225),day_of_week INT);
LOAD DATA INFILE 'C:\\ProgramData\\MySQL\\MySQL Server 8.0\\Uploads\\Cyclistic.XLS\\202103-divvy-tripdata.csv'
INTO TABLE tripdata_202103
FIELDS TERMINATED BY ','
ENCLOSED BY ""
LINES TERMINATED BY '\n'
IGNORE 1 ROWS; 

---April---
CREATE TABLE tripdata_202104 (ride_id VARCHAR(225) ,rideable_type TEXT, started_at VARCHAR(225),ended_at VARCHAR(225),start_station_name VARCHAR(225),start_station_id VARCHAR(225),end_station_name VARCHAR(225),
end_station_id TEXT,start_lat VARCHAR(225),start_lng VARCHAR(225),end_lat VARCHAR(225), end_lng VARCHAR(225),member_casual TEXT ,ride_length VARCHAR(225),day_of_week INT);
LOAD DATA INFILE 'C:\\ProgramData\\MySQL\\MySQL Server 8.0\\Uploads\\Cyclistic.XLS\\202104-divvy-tripdata.csv'
INTO TABLE tripdata_202104
FIELDS TERMINATED BY ','
ENCLOSED BY ""
LINES TERMINATED BY '\n'
IGNORE 1 ROWS; 

---May---
CREATE TABLE tripdata_202105 (ride_id VARCHAR(225) ,rideable_type TEXT, started_at VARCHAR(225),ended_at VARCHAR(225),start_station_name VARCHAR(225),start_station_id VARCHAR(225),end_station_name VARCHAR(225),
end_station_id TEXT,start_lat VARCHAR(225),start_lng VARCHAR(225),end_lat VARCHAR(225), end_lng VARCHAR(225),member_casual TEXT ,ride_length VARCHAR(225),day_of_week INT);
LOAD DATA INFILE 'C:\\ProgramData\\MySQL\\MySQL Server 8.0\\Uploads\\Cyclistic.XLS\\202102-divvy-tripdata.csv'
INTO TABLE tripdata_202105
FIELDS TERMINATED BY ','
ENCLOSED BY ""
LINES TERMINATED BY '\n'
IGNORE 1 ROWS; 

---June---
CREATE TABLE tripdata_202106 (ride_id VARCHAR(225) ,rideable_type TEXT, started_at VARCHAR(225),ended_at VARCHAR(225),start_station_name VARCHAR(225),start_station_id VARCHAR(225),end_station_name VARCHAR(225),
end_station_id TEXT,start_lat VARCHAR(225),start_lng VARCHAR(225),end_lat VARCHAR(225), end_lng VARCHAR(225),member_casual TEXT ,ride_length VARCHAR(225),day_of_week INT);
LOAD DATA INFILE 'C:\\ProgramData\\MySQL\\MySQL Server 8.0\\Uploads\\Cyclistic.XLS\\202102-divvy-tripdata.csv'
INTO TABLE tripdata_202106
FIELDS TERMINATED BY ','
ENCLOSED BY ""
LINES TERMINATED BY '\n'
IGNORE 1 ROWS; 

---July---
CREATE TABLE tripdata_202107 (ride_id VARCHAR(225) ,rideable_type TEXT, started_at VARCHAR(225),ended_at VARCHAR(225),start_station_name VARCHAR(225),start_station_id VARCHAR(225),end_station_name VARCHAR(225),
end_station_id TEXT,start_lat VARCHAR(225),start_lng VARCHAR(225),end_lat VARCHAR(225), end_lng VARCHAR(225),member_casual TEXT ,ride_length VARCHAR(225),day_of_week INT);
LOAD DATA INFILE 'C:\\ProgramData\\MySQL\\MySQL Server 8.0\\Uploads\\Cyclistic.XLS\\202102-divvy-tripdata.csv'
INTO TABLE tripdata_202107
FIELDS TERMINATED BY ','
ENCLOSED BY ""
LINES TERMINATED BY '\n'
IGNORE 1 ROWS; 

---August---
CREATE TABLE tripdata_202108 (ride_id VARCHAR(225) ,rideable_type TEXT, started_at VARCHAR(225),ended_at VARCHAR(225),start_station_name VARCHAR(225),start_station_id VARCHAR(225),end_station_name VARCHAR(225),
end_station_id TEXT,start_lat VARCHAR(225),start_lng VARCHAR(225),end_lat VARCHAR(225), end_lng VARCHAR(225),member_casual TEXT ,ride_length VARCHAR(225),day_of_week INT);

LOAD DATA INFILE 'C:\\ProgramData\\MySQL\\MySQL Server 8.0\\Uploads\\Cyclistic.XLS\\202102-divvy-tripdata.csv'
INTO TABLE tripdata_202108
FIELDS TERMINATED BY ','
ENCLOSED BY ""
LINES TERMINATED BY '\n'
IGNORE 1 ROWS; 

---September---
CREATE TABLE tripdata_202109 (ride_id VARCHAR(225) ,rideable_type TEXT, started_at VARCHAR(225),ended_at VARCHAR(225),start_station_name VARCHAR(225),start_station_id VARCHAR(225),end_station_name VARCHAR(225),
end_station_id TEXT,start_lat VARCHAR(225),start_lng VARCHAR(225),end_lat VARCHAR(225), end_lng VARCHAR(225),member_casual TEXT ,ride_length VARCHAR(225),day_of_week INT);
LOAD DATA INFILE 'C:\\ProgramData\\MySQL\\MySQL Server 8.0\\Uploads\\Cyclistic.XLS\\202102-divvy-tripdata.csv'
INTO TABLE tripdata_202109
FIELDS TERMINATED BY ','
ENCLOSED BY ""
LINES TERMINATED BY '\n'
IGNORE 1 ROWS; 

---October---
CREATE TABLE tripdata_202110 (ride_id VARCHAR(225) ,rideable_type TEXT, started_at VARCHAR(225),ended_at VARCHAR(225),start_station_name VARCHAR(225),start_station_id VARCHAR(225),end_station_name VARCHAR(225),
end_station_id TEXT,start_lat VARCHAR(225),start_lng VARCHAR(225),end_lat VARCHAR(225), end_lng VARCHAR(225),member_casual TEXT ,ride_length VARCHAR(225),day_of_week INT);
LOAD DATA INFILE 'C:\\ProgramData\\MySQL\\MySQL Server 8.0\\Uploads\\Cyclistic.XLS\\202102-divvy-tripdata.csv'
INTO TABLE tripdata_202110
FIELDS TERMINATED BY ','
ENCLOSED BY ""
LINES TERMINATED BY '\n'
IGNORE 1 ROWS; 

---November---
CREATE TABLE tripdata_202111 (ride_id VARCHAR(225) ,rideable_type TEXT, started_at VARCHAR(225),ended_at VARCHAR(225),start_station_name VARCHAR(225),start_station_id VARCHAR(225),end_station_name VARCHAR(225),
end_station_id TEXT,start_lat VARCHAR(225),start_lng VARCHAR(225),end_lat VARCHAR(225), end_lng VARCHAR(225),member_casual TEXT ,ride_length VARCHAR(225),day_of_week INT);
LOAD DATA INFILE 'C:\\ProgramData\\MySQL\\MySQL Server 8.0\\Uploads\\Cyclistic.XLS\\202102-divvy-tripdata.csv'
INTO TABLE tripdata_202111
FIELDS TERMINATED BY ','
ENCLOSED BY ""
LINES TERMINATED BY '\n'
IGNORE 1 ROWS; 

---December---
CREATE TABLE tripdata_202112 (ride_id VARCHAR(225) ,rideable_type TEXT, started_at VARCHAR(225),ended_at VARCHAR(225),start_station_name VARCHAR(225),start_station_id VARCHAR(225),end_station_name VARCHAR(225),
end_station_id TEXT,start_lat VARCHAR(225),start_lng VARCHAR(225),end_lat VARCHAR(225), end_lng VARCHAR(225),member_casual TEXT ,ride_length VARCHAR(225),day_of_week INT);
LOAD DATA INFILE 'C:\\ProgramData\\MySQL\\MySQL Server 8.0\\Uploads\\Cyclistic.XLS\\202102-divvy-tripdata.csv'
INTO TABLE tripdata_202112
FIELDS TERMINATED BY ','
ENCLOSED BY ""
LINES TERMINATED BY '\n'
IGNORE 1 ROWS; 

3.2: Merging Datasets

Merging all tables into one table (cyclistic.tripdata_2021)

CREATE TABLE tripdata_2021 AS(SELECT ride_id ,rideable_type ,started_at ,ended_at ,start_station_name ,start_station_id ,end_station_name ,end_station_id ,start_lat ,start_lng ,end_lat ,end_lng ,member_casual ,ride_length ,day_of_week 
FROM cyclistic.tripdata_202101
UNION ALL 
SELECT ride_id, rideable_type ,started_at ,ended_at ,start_station_name ,start_station_id ,end_station_name ,end_station_id ,start_lat ,start_lng ,end_lat ,end_lng ,member_casual ,ride_length ,day_of_week FROM cyclistic.tripdata_202102
UNION ALL 
SELECTride_id  ,rideable_type ,started_at ,ended_at ,start_station_name ,start_station_id ,end_station_name ,end_station_id ,start_lat ,start_lng ,end_lat ,end_lng ,member_casual ,ride_length ,day_of_week FROM cyclistic.tripdata_202103
UNION ALL 
SELECT ride_id  ,rideable_type ,started_at ,ended_at ,start_station_name ,start_station_id ,end_station_name ,end_station_id ,start_lat ,start_lng ,end_lat ,end_lng ,member_casual ,ride_length ,day_of_week FROM cyclistic.tripdata_202104
UNION ALL
SELECTride_id  ,rideable_type ,started_at ,ended_at ,start_station_name ,start_station_id ,end_station_name ,end_station_id ,start_lat ,start_lng ,end_lat ,end_lng ,member_casual ,ride_length ,day_of_week FROM cyclistic.tripdata_202105
UNION ALL
SELECT ride_id  ,rideable_type ,started_at ,ended_at ,start_station_name ,start_station_id ,end_station_name ,end_station_id ,start_lat ,start_lng ,end_lat ,end_lng ,member_casual ,ride_length ,day_of_week FROM cyclistic.tripdata_202106
UNION ALL
SELECT ride_id  ,rideable_type ,started_at ,ended_at ,start_station_name ,start_station_id ,end_station_name ,end_station_id ,start_lat ,start_lng ,end_lat ,end_lng ,member_casual ,ride_length ,day_of_week FROM cyclistic.tripdata_202107
UNION ALL
SELECTride_id  ,rideable_type ,started_at ,ended_at ,start_station_name ,start_station_id ,end_station_name ,end_station_id ,start_lat ,start_lng ,end_lat ,end_lng ,member_casual ,ride_length ,day_of_week FROM cyclistic.tripdata_202108
UNION ALL
SELECT ride_id  ,rideable_type ,started_at ,ended_at ,start_station_name ,start_station_id ,end_station_name ,end_station_id ,start_lat ,start_lng ,end_lat ,end_lng ,member_casual ,ride_length ,day_of_week FROM cyclistic.tripdata_202109
UNION ALL
SELECT ride_id  ,rideable_type ,started_at ,ended_at ,start_station_name ,start_station_id ,end_station_name ,end_station_id ,start_lat ,start_lng ,end_lat ,end_lng ,member_casual ,ride_length ,day_of_week FROM cyclistic.tripdata_202110
UNION ALL
SELECT ride_id  ,rideable_type ,started_at ,ended_at ,start_station_name ,start_station_id ,end_station_name ,end_station_id ,start_lat ,start_lng ,end_lat ,end_lng ,member_casual ,ride_length ,day_of_week FROM cyclistic.tripdata_202110
UNION ALL
SELECT ride_id  ,rideable_type ,started_at ,ended_at ,start_station_name ,start_station_id ,end_station_name ,end_station_id ,start_lat ,start_lng ,end_lat ,end_lng ,member_casual ,ride_length ,day_of_week 
FROM cyclistic.tripdata_202111
UNION ALL
SELECT ride_id  ,rideable_type ,started_at ,ended_at ,start_station_name ,start_station_id ,end_station_name ,end_station_id ,start_lat ,start_lng ,end_lat ,end_lng ,member_casual ,ride_length ,day_of_week FROM cyclistic.tripdata_202112);

3.3 Cleaning Data (Filtering, Transforming)

In the previous section, the datatype of started_at and ended_at columns suppose to be datetime. However, I input it as string datatype. Hence, both columns are needed to be casted as datetime. Following are the actions taken for data cleaning.

  • cast the field to datetime type while retrieving other columns
  • distinct remove the duplicate value
  • use WEEKDAY function to sort out the day of week for each record

Also, use create view command to create a temperary table that will not affect other tables.

-- retrieving filtered and cleaned data --- 

CREATE VIEW cyclistic.TRIPDATA AS SELECT
distinct(ride_id), ---------------------------------------------------------- remove the duplicate value 
rideable_type,
cast(str_to_date(started_at,'%d/%m/%y %H:%i') as datetime) as started_at, --- cast the field to datetime type while retrieving other columns 
cast(str_to_date(ended_at,'%d/%m/%y %H:%i') as datetime) as ended_at,     --- cast the field to datetime type while retrieving other columns 
start_station_name, 
start_station_id ,
end_station_name ,
end_station_id ,
start_lat ,
start_lng ,
end_lat,
end_lng,
member_casual ,
WEEKDAY(started_at) AS day_of_week, ------------------------------------------ use WEEKDAY function to sort out the day of week for each record 
(CASE 
when day_of_Week = 6 then 'Sunday' 
when day_of_Week = 0 then 'Monday' 
when day_of_Week = 1 then 'Tuesday'
when day_of_Week = 2 then 'Wednesday'
when day_of_Week = 3 then 'Thursday'
when day_of_Week = 4 then 'Friday'
when day_of_Week = 5 then 'Saturday'
end ) AS weekday_weekend,
MINUTE(datediff(ended_at , started_at)) as ride_length ----------------------- using the DATEDIFF function to calculate the ride length of each users
FROM cyclistic.tripdata_2021
WHERE start_station_name !='' and  end_station_name != '' 
HAVING ride_length > MINUTE(0)

Phase 4 Analyze (Tableau)

Once turning datasets into usable format, I export it into a csv.file and then import into Tableau software to visualise data. The following diagrams are the trends that I've identified during data visualization process.

4.1 Average Ride Duration

4.2 Trip Count per rider

Looking at the total count of trips for each day of the week, members take more trips during the weekdays than casual riders; except during weekends, the number of trips taken by casual riders exceed members. We then can decude that majority of the member users using bikes for commuting to workplace. This can be further supported by looking at the numbers of riders taken throughout a day.

4.3 Hourly Traffic User Analysis

Here, we can see that members take more bike rides overall than casual riders throughout the day. Ridership peaks around 8am, 12pm, and 6pm, suggesting an increase in rides during the morning, lunch, and evening rush hours of the day.

4.4 Monthly User Traffic

Looking at the combined year, casual riders exceed members in number of bike rides between mid-May and mid-August, perhaps suggesting an increase in locals on vacation and tourists visiting Chicago during the summer months.

4.5 Most Popular Stations for Riders

Top 10 popular station for casual and member users

5.Share

Findings

Members:

  • Members are most likely daily traveller who take bikes consistently during weekday than casual riders
  • Their average trip duration of 12.83 minutes is seven minutes lesser than the average trip length of casual users
  • Their ridership peaks around rush hour time during the day but decline during winter months

Casual Riders:

  • Casual Riders may be a mixture of locals and tourists who ride bikes for longer period of time and more often on the weekend.
  • Their average trip duration is twice longer than average ride length of members riders
  • Their ridership peaks above members during the summer months

6.Act

Recommendations

I recommend a marketing campaign pinpointing the perks of subscribing to membership pass such as lower cost per hour, access to advance booking system, and loyalty program.

  • The place near the top 20 most popular stations should advertise heavily because it has the highest traffic across 200 stations which contributing to huge part of the profit.
  • During the cold season such as winters, promotion on membership could be considered to increase the ridership during the off season.
  • The social media marketing or advertisement should be more frequent during the peaks of days ( 8am, 12pm, 6pm) , months (summers).