## Introduction 

### About the company 
In 2016, Cyclistic successfully launched a bike-share program in Chicago, which has since expanded to encompass a fleet of 5,824 geotracked bicycles stationed at 692 locations throughout the city. This innovative system allows users to unlock bikes from one station and return them to any other within the network at any time. Cyclistic's growth and success have been driven by a marketing strategy focused on creating general awareness and appealing to diverse consumer segments. Central to this approach is the flexibility of its pricing plans, which include single-ride passes, full-day passes, and annual memberships. Casual riders, who opt for single-ride or full-day passes, form one consumer segment, while Cyclistic members, characterized by their purchase of annual memberships, constitute another. This strategic pricing flexibility has played a pivotal role in Cyclistic's ability to cater to a broad audience and foster a thriving bike-share community in Chicago.

### Business Task 

This project centers on developing effective marketing strategies to transform the substantial population of casual riders, numbering 71,643, into annual members. The primary objective is to design targeted approaches that encourage casual riders to transition to annual memberships. By leveraging available data, the focus is on understanding the unique preferences and motivations of casual riders, with the ultimate goal of creating compelling marketing initiatives that resonate with this sizable demographic. 

### Metadata 

This analysis uses Cyclistic's historical trip data from Divvy_2019_Q1 and Divvy_2020_Q1. It focuses on the first quarter, a period when individuals typically return to work after the holidays. Combining the two datasets lets us examine patterns, behaviors, and trends during the initial months of the year before drawing any conclusions. The goal is to use the combined dataset to extract observations that can guide Cyclistic's operations and marketing strategies.


In [1]:
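# Load the libraries needed for this case study.
# tidyverse already attaches ggplot2 and readr, so the two extra
# library() calls below are harmless but redundant.

library(tidyverse)
library(ggplot2)
library(readr)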

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


In [2]:
# Load the two data sets

Q1_Divvy_2019 <- read_csv("/kaggle/input/divvy-trips-2019-and-2020/Divvy_Trips_2019_Q1 - Divvy_Trips_2019_Q1 (1).csv")
Q1_Divvy_2020 <- read_csv("/kaggle/input/divvy-trips-2019-and-2020/Divvy_Trips_2020_Q1 - Divvy_Trips_2020_Q1.csv")

Rows: 365069 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): start_time, end_time, from_station_name, to_station_name, usertype,...
dbl (5): trip_id, bikeid, from_station_id, to_station_id, birthyear
num (1): tripduration

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 426887 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): ride_id, rideable_type, started_at, ended_at, start_station_name, e...
dbl (6): start_station_id, end_station_id, start_lat, start_lng, end_lat, en...

ℹ Use `spec()` to retrieve the full colu

In [3]:
# Check the column names of both data sets
colnames(Q1_Divvy_2019)
#colnames(Q1_Divvy_2020)

# Summarize the two data sets
summary(Q1_Divvy_2019)
#summary(Q1_Divvy_2020)

# The column names are not consistent across the two data frames, so some columns
# (mainly in Q1_Divvy_2019) must be renamed before the tables can be stacked without conflicts.
# The next cell renames the columns in Q1_Divvy_2019.


    trip_id          start_time          end_time             bikeid    
 Min.   :21742443   Length:365069      Length:365069      Min.   :   1  
 1st Qu.:21848765   Class :character   Class :character   1st Qu.:1777  
 Median :21961829   Mode  :character   Mode  :character   Median :3489  
 Mean   :21960872                                         Mean   :3429  
 3rd Qu.:22071823                                         3rd Qu.:5157  
 Max.   :22178528                                         Max.   :6471  
                                                                        
  tripduration      from_station_id from_station_name  to_station_id  
 Min.   :      61   Min.   :  2.0   Length:365069      Min.   :  2.0  
 1st Qu.:     326   1st Qu.: 76.0   Class :character   1st Qu.: 76.0  
 Median :     524   Median :170.0   Mode  :character   Median :168.0  
 Mean   :    1016   Mean   :198.1                      Mean   :198.6  
 3rd Qu.:     866   3rd Qu.:287.0                      3rd Qu
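
Before renaming, it can help to list exactly which column names differ between the two data frames. The following is a minimal sketch on two toy data frames; `trips_2019` and `trips_2020` are hypothetical stand-ins that carry only a few of the real column names, not the full Divvy files.

```r
# Toy stand-ins for the two quarterly files (hypothetical, only a few columns each)
trips_2019 <- data.frame(trip_id = 1,
                         start_time = "2019-01-01 0:04:37",
                         usertype = "Subscriber")
trips_2020 <- data.frame(ride_id = "A1",
                         started_at = "2020-01-01 0:04:37",
                         member_casual = "member")

# Columns present in one data frame but missing from the other
only_2019 <- setdiff(colnames(trips_2019), colnames(trips_2020))
only_2020 <- setdiff(colnames(trips_2020), colnames(trips_2019))

print(only_2019)
print(only_2020)
```

Any name that shows up in either result needs a rename (or a deliberate decision to keep it as a separate column) before the data frames are stacked.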

In [4]:
(Q1_Divvy_2019 <- rename(Q1_Divvy_2019,ride_id = trip_id
                         ,rideable_type = bikeid
                        ,started_at = start_time
                        ,ended_at = end_time
                        ,start_station_name = from_station_name
                        ,start_station_id = from_station_id
                        ,end_station_name = to_station_name
                        ,end_station_id = to_station_id
                        ,member_casual = usertype
))

ride_id,started_at,ended_at,rideable_type,tripduration,start_station_id,start_station_name,end_station_id,end_station_name,member_casual,gender,birthyear
<dbl>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<chr>,<dbl>,<chr>,<chr>,<chr>,<dbl>
21742443,2019-01-01 0:04:37,2019-01-01 0:11:07,2167,390,199,Wabash Ave & Grand Ave,84,Milwaukee Ave & Grand Ave,Subscriber,Male,1989
21742444,2019-01-01 0:08:13,2019-01-01 0:15:34,4386,441,44,State St & Randolph St,624,Dearborn St & Van Buren St (*),Subscriber,Female,1990
21742445,2019-01-01 0:13:23,2019-01-01 0:27:12,1524,829,15,Racine Ave & 18th St,644,Western Ave & Fillmore St (*),Subscriber,Female,1994
21742446,2019-01-01 0:13:45,2019-01-01 0:43:28,252,1783,123,California Ave & Milwaukee Ave,176,Clark St & Elm St,Subscriber,Male,1993
21742447,2019-01-01 0:14:52,2019-01-01 0:20:56,1170,364,173,Mies van der Rohe Way & Chicago Ave,35,Streeter Dr & Grand Ave,Subscriber,Male,1994
21742448,2019-01-01 0:15:33,2019-01-01 0:19:09,2437,216,98,LaSalle St & Washington St,49,Dearborn St & Monroe St,Subscriber,Female,1983
21742449,2019-01-01 0:16:06,2019-01-01 0:19:03,2708,177,98,LaSalle St & Washington St,49,Dearborn St & Monroe St,Subscriber,Male,1984
21742450,2019-01-01 0:18:41,2019-01-01 0:20:21,2796,100,211,St. Clair St & Erie St,142,McClurg Ct & Erie St,Subscriber,Male,1990
21742451,2019-01-01 0:18:43,2019-01-01 0:47:30,6205,1727,150,Fort Dearborn Dr & 31st St,148,State St & 33rd St,Subscriber,Male,1995
21742452,2019-01-01 0:19:18,2019-01-01 0:24:54,3939,336,268,Lake Shore Dr & North Blvd,141,Clark St & Lincoln Ave,Subscriber,Male,1996


In [5]:
# Re-running the rename fails because trip_id was already renamed to ride_id in the previous cell
(Q1_Divvy_2019 <- rename(Q1_Divvy_2019,ride_id = trip_id
                        ))

ERROR: Error in `rename()`:
! Can't rename columns that don't exist.
✖ Column `trip_id` doesn't exist.


In [None]:
# ride_id and rideable_type are numeric in Q1_Divvy_2019 but character in Q1_Divvy_2020,
# so convert them to character in order to stack the two data frames properly

Q1_Divvy_2019 <- mutate(Q1_Divvy_2019,
                        ride_id = as.character(ride_id),
                        rideable_type = as.character(rideable_type))

In [None]:
colnames(Q1_Divvy_2019)

In [None]:
# Stack the two data frames into one large data frame
all_trips <- bind_rows(Q1_Divvy_2019,Q1_Divvy_2020)

In [None]:
#checking the new dataframe 
colnames(all_trips)
str(all_trips)
head(all_trips)

## Combining the two data sets
After stacking the two data sets, the combined data frame has 791,956 rows (365,069 + 426,887) and 16 columns. To streamline the analysis, columns that are not needed for comparing rider types are removed in the cells below.
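
The row and column counts follow from how `bind_rows()` stacks data frames: rows are appended, columns are matched by name, and a column that exists in only one input is filled with `NA` for the other. A minimal sketch, assuming two hypothetical miniature data frames (`q1_2019` and `q1_2020` are illustrative names, and the values are made up apart from the two ride IDs shown earlier):

```r
library(dplyr)

# Hypothetical miniature versions of the two data frames after renaming
q1_2019 <- tibble(ride_id = c(21742443, 21742444), gender = c("Male", "Female"))
q1_2020 <- tibble(ride_id = c("A1", "B2", "C3"), start_lat = c(41.9, 41.8, 41.7))

# A numeric ride_id cannot be stacked on a character ride_id,
# so convert it to character first
q1_2019 <- mutate(q1_2019, ride_id = as.character(ride_id))

stacked <- bind_rows(q1_2019, q1_2020)

n_rows <- nrow(stacked)  # rows add up: 2 + 3
n_cols <- ncol(stacked)  # union of column names: ride_id, gender, start_lat
print(stacked)
```

In the toy result, `start_lat` is `NA` for the 2019 rows and `gender` is `NA` for the 2020 rows, which mirrors why the real combined data frame ends up with more columns than either input.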


In [None]:
nrow(all_trips)
ncol(all_trips)

In [None]:
# Remove columns that are not needed
all_trips <- all_trips %>%
  select(-c(start_lat, start_lng, end_lat, end_lng, birthyear, gender, tripduration))


In [None]:
#checking the statistical summary of the dataframe 
summary(all_trips)


# Cleaning and preparation of the data

In [None]:

# 1. The "member_casual" column now contains "Subscriber" and "Customer" (the 2019 labels),
# which need to be changed to "member" and "casual" respectively
all_trips <- all_trips %>%
  mutate(member_casual = recode(member_casual,
                                "Subscriber" = "member",
                                "Customer" = "casual"))

In [None]:
#we can verify our result 
table(all_trips$member_casual)
# now we know the number of casual and member 
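
The relabeling step can be illustrated on a small vector. This is a sketch only: `rider_labels` is a hypothetical mix of 2019-style and 2020-style values, not data taken from the files. It shows that `recode()` replaces the listed values and leaves everything else untouched.

```r
library(dplyr)

# Hypothetical mix of 2019-style and 2020-style labels
rider_labels <- c("Subscriber", "Customer", "member", "casual", "Subscriber")

# Values that are not listed ("member", "casual") pass through unchanged
cleaned <- recode(rider_labels, "Subscriber" = "member", "Customer" = "casual")

label_counts <- table(cleaned)
print(label_counts)
```

After the recode, `table()` should report only the two labels `casual` and `member`; any third label would point to a value that the mapping missed.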

In [None]:
# Add date, month, day, year and day-of-week columns so the data
# can be aggregated accordingly
all_trips$date <- as.Date(all_trips$started_at)
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%y")
all_trips$day_of_week <- format(as.Date(all_trips$date), "%A")
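
To see what these format codes return, here is a small sketch using the first `started_at` value shown in the 2019 table above. Note that `"%y"` gives a two-digit year while `"%Y"` gives four digits, and that `"%A"` returns the weekday name in the session's locale, so it is left out of the sketch.

```r
started_at <- "2019-01-01 0:04:37"   # first timestamp in the 2019 table above

ride_date <- as.Date(started_at)     # the time-of-day part is ignored

month_part <- format(ride_date, "%m")   # month as two digits
day_part   <- format(ride_date, "%d")   # day of month as two digits
year_short <- format(ride_date, "%y")   # two-digit year
year_long  <- format(ride_date, "%Y")   # four-digit year

print(c(month_part, day_part, year_short, year_long))
```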

In [None]:
# Add the ride length, calculated in seconds
all_trips$ride_length <- difftime(all_trips$ended_at, all_trips$started_at, units = "secs")

In [None]:

#checking the dataframe 
str(all_trips)

In [None]:
# Convert ride_length from difftime to numeric
all_trips$ride_length <- as.numeric(as.character(all_trips$ride_length))
is.numeric(all_trips$ride_length)
str(all_trips)
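
The ride-length calculation can be checked by hand on the first two trips from the 2019 table shown earlier. This is a minimal sketch: the explicit `format` and the UTC time zone are assumptions made here so the example behaves the same on any machine, and `units = "secs"` is requested explicitly rather than left to `difftime()`'s automatic choice.

```r
# First two trips from the 2019 table shown earlier
started_at <- c("2019-01-01 0:04:37", "2019-01-01 0:08:13")
ended_at   <- c("2019-01-01 0:11:07", "2019-01-01 0:15:34")

# Parse with an explicit format and time zone (UTC is an assumption)
start_ts <- as.POSIXct(started_at, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
end_ts   <- as.POSIXct(ended_at,   format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Ask for seconds explicitly, then drop the difftime class
ride_length <- as.numeric(difftime(end_ts, start_ts, units = "secs"))
print(ride_length)
```

The two results agree with the `tripduration` values listed for the same trips (390 and 441 seconds).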

In [None]:
# Remove bad data.
# Some ride_length values are negative, which is impossible. Fortunately there are
# only a few hundred of them, so removing them will not significantly affect the results.
# Rides that start at the "HQ QR" station are dropped as well.

all_trips_v2 <- all_trips[!(all_trips$start_station_name == "HQ QR" | all_trips$ride_length<0),]
#checking the data
str(all_trips_v2)
colnames(all_trips_v2)
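
One detail of this kind of base R filter is worth a sketch: if the condition evaluates to `NA` for a row (for example when a station name is missing), logical indexing returns a row made entirely of `NA` instead of keeping or dropping the original row. Whether the Divvy files contain missing station names is not checked here; the toy data frame `trips` below is hypothetical.

```r
# Hypothetical toy data: one normal ride, one "HQ QR" ride,
# one ride with a missing station name, one negative ride length
trips <- data.frame(
  start_station_name = c("Clark St & Elm St", "HQ QR", NA, "State St & 33rd St"),
  ride_length = c(390, 120, 300, -15)
)

bad <- trips$start_station_name == "HQ QR" | trips$ride_length < 0

# Plain negation: the NA condition produces an all-NA row in the result
naive <- trips[!bad, ]

# Treat NA as "not bad": only rows where the condition is TRUE are dropped
safe <- trips[!(bad %in% TRUE), ]

print(naive)
print(safe)
```

`dplyr::filter()` is another option, with the caveat that it drops rows whose condition is `NA` rather than keeping them.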

## Descriptive analysis

In [None]:
 

# Descriptive analysis on ride_length (all figures in seconds)
mean(all_trips_v2$ride_length) #straight average (total ride length / rides)
median(all_trips_v2$ride_length) #midpoint number in the ascending array of ride lengths
max(all_trips_v2$ride_length) #longest ride
min(all_trips_v2$ride_length) #shortest ride

In [None]:
# Compare members and casual users
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = mean)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = median)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = max)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = min)

In [None]:
# Analyze the number of rides from each start station, using the cleaned data
station <- all_trips_v2 %>%
  group_by(member_casual, start_station_name) %>%
  summarise(number_of_rides = n(),
            average_duration = mean(ride_length),
            .groups = "drop") %>%
  arrange(member_casual, desc(number_of_rides), start_station_name)
head(station)

In [None]:
# Analyze the total number of rides per month for each user type, using the cleaned data
total_ride_month <- all_trips_v2 %>%
  group_by(member_casual, month) %>%
  summarise(number_of_rides = n(),
            average_duration = mean(ride_length),
            .groups = "drop") %>%
  arrange(member_casual, month)
ggplot(total_ride_month, mapping = aes(x = month, y = number_of_rides, group = member_casual, color = member_casual)) +
  geom_line()

In [None]:
# Analyze the number of rides and the average ride duration by hour of day for each user type
rides_by_hour <- all_trips_v2 %>%
  mutate(hour = as.factor(hour(started_at))) %>%
  group_by(member_casual, hour) %>%
  summarise(number_of_rides = n(),
            average_duration = mean(ride_length),
            .groups = "drop") %>%
  arrange(member_casual, hour)

ggplot(rides_by_hour, mapping = aes(x = hour, y = number_of_rides, group = member_casual, color = member_casual)) +
  geom_line()
ggplot(rides_by_hour, mapping = aes(x = hour, y = average_duration, group = member_casual, color = member_casual)) +
  geom_line()

## Average ride time by each day for members vs casual users
To compare the average ride time of Cyclistic members and casual users, we compute the mean ride length for each day of the week, separately for each user type, using the merged dataset. The comparison shows how the two segments differ in their daily ride durations, which helps tailor marketing and operational strategies to each segment's preferences and usage patterns.

In [None]:
# the average ride time by each day for members vs casual users
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = mean)

# Notice that the days of the week are out of order. Let's fix that.
all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

# Now, let's run the average ride time by each day for members vs casual users
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = mean)
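
The effect of the reordering step can be sketched on a toy data frame. The `rides` data below is hypothetical; the point is only that once `day_of_week` is an ordered factor, `aggregate()` lists the days in calendar order rather than alphabetically.

```r
# Hypothetical toy rides (ride_length in seconds)
rides <- data.frame(
  member_casual = c("casual", "casual", "member", "member", "casual", "member"),
  day_of_week   = c("Monday", "Sunday", "Monday", "Sunday", "Monday", "Monday"),
  ride_length   = c(600, 1200, 300, 500, 1000, 500)
)

# Without this step "Monday" would sort before "Sunday"
rides$day_of_week <- ordered(rides$day_of_week,
                             levels = c("Sunday", "Monday", "Tuesday", "Wednesday",
                                        "Thursday", "Friday", "Saturday"))

avg <- aggregate(ride_length ~ member_casual + day_of_week, data = rides, FUN = mean)
print(avg)
```

Passing `data = rides` with bare column names also gives shorter column headers in the result than the `all_trips_v2$...` form.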

In [None]:
# Let's create a visualization for average duration
all_trips_v2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = average_duration, fill = member_casual)) +
  geom_col(position = "dodge")


## Ridership data by type and weekday

In our analysis of Cyclistic's ridership data by user type and weekday, we aim to uncover meaningful patterns in bike usage. By categorizing users into members and casual riders, and examining their activity on specific weekdays, we can identify trends and variations in demand. This granular approach allows us to discern whether certain days exhibit distinct preferences or higher utilization rates among specific user segments.
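
Because the two segments can differ a lot in total ride volume, raw counts alone can hide which days matter most to each group. One possible complement, sketched here on hypothetical toy data rather than the Divvy files, is to look at each weekday's share of rides within a rider type using `prop.table()`.

```r
# Hypothetical toy rides: 4 casual rides and 5 member rides
rides <- data.frame(
  member_casual = c("casual", "casual", "casual", "casual",
                    "member", "member", "member", "member", "member"),
  weekday = c("Sat", "Sat", "Sun", "Mon",
              "Mon", "Tue", "Tue", "Wed", "Sat")
)
rides$weekday <- factor(rides$weekday,
                        levels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))

# Ride counts by rider type (rows) and weekday (columns)
counts <- table(rides$member_casual, rides$weekday)

# Share of each rider type's rides that fall on each weekday (rows sum to 1)
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

In the toy data, Saturday accounts for half of the casual rides but only a fifth of the member rides, the kind of contrast that counts alone would understate if members had far more rides overall.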

In [None]:
# Analyze ridership data by type and weekday
all_trips_v2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>%  # creates weekday field using wday()
  group_by(member_casual, weekday) %>%                  # groups by user type and weekday
  summarise(number_of_rides = n(),                      # calculates the number of rides
            average_duration = mean(ride_length)) %>%   # calculates the average duration
  arrange(member_casual, weekday)

In [None]:
# Let's visualize the number of rides by rider type
all_trips_v2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge")

# Conclusion

Casual riders display a clear inclination towards longer rides, particularly on Thursdays and Fridays, where Thursdays stand out with the highest average duration. Noteworthy is the remarkable surge in ride lengths on Thursdays, setting it apart from other weekdays. Additionally, weekends, especially Saturdays, witness significant ride engagement, highlighting a distinct trend of increased cycling during the weekend. This data underscores the unique riding behaviors of casual users, emphasizing extended rides during weekdays and heightened participation over the weekends. 

# Recommendation 
Executing time-sensitive promotions or tailored special deals for casual riders proves to be a potent strategy for encouraging their shift towards annual memberships. By introducing limited-time discounts, exclusive merchandise, or early access to novel features, a sense of urgency is instilled, compelling casual riders to seize the opportunity within a specified timeframe. This approach leverages the psychology of urgency, prompting individuals to make swift decisions and upgrade to annual memberships.

Moreover, the incorporation of bundle packages enhances the value proposition for potential annual members. These packages may include discounted rates or exclusive perks, presenting an appealing and enticing offer. Through bundling services and benefits, casual riders are provided with a comprehensive and compelling rationale to commit to an annual membership, thereby heightening the likelihood of conversion.

Facilitating a more immersive experience, offering short-term trial memberships at a reduced cost allows casual riders to explore the advantages of annual memberships. This approach permits individuals to sample the benefits and convenience associated with annual membership, potentially fostering a stronger inclination to make a more enduring commitment.

In synergy with these promotional endeavors, the development of engaging content assumes a pivotal role. Success stories, testimonials, or case studies from contented annual members serve as potent marketing tools. Real-life experiences shared by current annual members can resonate with casual riders, offering authentic insights into the positive outcomes of transitioning to an annual membership. This user-generated content not only bolsters credibility but also nurtures a sense of community and trust among potential annual members.

## Summary
Implementing a multifaceted strategy that incorporates time-sensitive promotions, bundle packages, short-term trial memberships, and compelling content can significantly enhance the conversion of casual riders into annual members. By strategically integrating these initiatives, the goal is to cater to diverse consumer motivations and preferences, ultimately optimizing the success of the marketing strategy. This comprehensive approach not only encourages conversions but also contributes to the development of a more committed and engaged Cyclistic community. The synergy of these elements aims to create a compelling and irresistible proposition for casual riders, fostering a stronger connection and long-term commitment to Cyclistic's annual membership.