# Case study: How does a bike-share navigate speedy success?
<p style="text-align: center"><img src="./misc/bike.JPG"></p>

## Business Task
Design marketing strategies aimed at converting casual riders into annual members

## Analytic questions:
* How do annual members and casual riders use Cyclistic bikes differently?
* Why would casual riders buy Cyclistic annual memberships? 
* How can Cyclistic use digital media to influence casual riders to become members?

### Source
* 202401-divvy-tripdata.csv
* 202402-divvy-tripdata.csv
* 202403-divvy-tripdata.csv
* 202404-divvy-tripdata.csv
* 202405-divvy-tripdata.csv
* 202406-divvy-tripdata.csv
* 202407-divvy-tripdata.csv
* 202408-divvy-tripdata.csv
* 202409-divvy-tripdata.csv
* 202410-divvy-tripdata.csv
* 202411-divvy-tripdata.csv
* 202412-divvy-tripdata.csv

<p style="font-style: italic;">The data has been made available by Motivate International Inc. under this <a href="https://divvybikes.com/data-license-agreement">license</a>.</p>

## Library imports

In [70]:
import glob
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from datetime import datetime, timedelta
plt.style.use('ggplot')

# Preparation
### Load raw data

In [71]:
csv_files = glob.glob('raw data/*.csv')
dfs = []

for file in csv_files:
    data = pd.read_csv(file, encoding="utf-8")
    dfs.append(data)

df = pd.concat(dfs)
df.head()

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
0,C1D650626C8C899A,electric_bike,2024-01-12 15:30:27,2024-01-12 15:37:59,Wells St & Elm St,KA1504000135,Kingsbury St & Kinzie St,KA1503000043,41.903267,-87.634737,41.889177,-87.638506,member
1,EECD38BDB25BFCB0,electric_bike,2024-01-08 15:45:46,2024-01-08 15:52:59,Wells St & Elm St,KA1504000135,Kingsbury St & Kinzie St,KA1503000043,41.902937,-87.63444,41.889177,-87.638506,member
2,F4A9CE78061F17F7,electric_bike,2024-01-27 12:27:19,2024-01-27 12:35:19,Wells St & Elm St,KA1504000135,Kingsbury St & Kinzie St,KA1503000043,41.902951,-87.63447,41.889177,-87.638506,member
3,0A0D9E15EE50B171,classic_bike,2024-01-29 16:26:17,2024-01-29 16:56:06,Wells St & Randolph St,TA1305000030,Larrabee St & Webster Ave,13193,41.884295,-87.633963,41.921822,-87.64414,member
4,33FFC9805E3EFF9A,classic_bike,2024-01-31 05:43:23,2024-01-31 06:09:35,Lincoln Ave & Waveland Ave,13253,Kingsbury St & Kinzie St,KA1503000043,41.948797,-87.675278,41.889177,-87.638506,member


### Selecting relevant columns

In [72]:
revelant_cols_df = df[['ride_id', 'rideable_type', 'started_at', 'ended_at', 'start_station_name', 'end_station_name', 'member_casual']].copy()
revelant_cols_df.head()

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,end_station_name,member_casual
0,C1D650626C8C899A,electric_bike,2024-01-12 15:30:27,2024-01-12 15:37:59,Wells St & Elm St,Kingsbury St & Kinzie St,member
1,EECD38BDB25BFCB0,electric_bike,2024-01-08 15:45:46,2024-01-08 15:52:59,Wells St & Elm St,Kingsbury St & Kinzie St,member
2,F4A9CE78061F17F7,electric_bike,2024-01-27 12:27:19,2024-01-27 12:35:19,Wells St & Elm St,Kingsbury St & Kinzie St,member
3,0A0D9E15EE50B171,classic_bike,2024-01-29 16:26:17,2024-01-29 16:56:06,Wells St & Randolph St,Larrabee St & Webster Ave,member
4,33FFC9805E3EFF9A,classic_bike,2024-01-31 05:43:23,2024-01-31 06:09:35,Lincoln Ave & Waveland Ave,Kingsbury St & Kinzie St,member


### Efficient data type conversion

In [73]:
schema = {
    'ride_id': 'string[pyarrow]',
    'rideable_type': 'category',
    'started_at': 'datetime64[ns]',
    'ended_at': 'datetime64[ns]',
    'start_station_name': 'category',
    'end_station_name': 'category',
    'member_casual': 'category'
}

cast_df = revelant_cols_df.astype(schema).copy()
cast_df.dtypes

ride_id               string[pyarrow]
rideable_type                category
started_at             datetime64[ns]
ended_at               datetime64[ns]
start_station_name           category
end_station_name             category
member_casual                category
dtype: object
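As an alternative to casting after the load, the column selection and most of the schema can be applied while reading. The sketch below is a minimal illustration on two in-memory sample rows, not the notebook's actual loading code; `sample_csv`, `wanted_cols`, `category_cols` and `small_df` are names introduced here for illustration only.

```python
import io
import pandas as pd

# Two illustrative rows in the layout of the trip files (one coordinate column kept)
sample_csv = io.StringIO(
    "ride_id,rideable_type,started_at,ended_at,start_station_name,"
    "end_station_name,start_lat,member_casual\n"
    "C1D650626C8C899A,electric_bike,2024-01-12 15:30:27,2024-01-12 15:37:59,"
    "Wells St & Elm St,Kingsbury St & Kinzie St,41.903267,member\n"
    "0A0D9E15EE50B171,classic_bike,2024-01-29 16:26:17,2024-01-29 16:56:06,"
    "Wells St & Randolph St,Larrabee St & Webster Ave,41.884295,member\n"
)

wanted_cols = ['ride_id', 'rideable_type', 'started_at', 'ended_at',
               'start_station_name', 'end_station_name', 'member_casual']
category_cols = ['rideable_type', 'start_station_name',
                 'end_station_name', 'member_casual']

# usecols skips unwanted columns at parse time; dtype and parse_dates
# apply the schema without an intermediate object-typed copy
small_df = pd.read_csv(
    sample_csv,
    usecols=wanted_cols,
    dtype={col: 'category' for col in category_cols},
    parse_dates=['started_at', 'ended_at'],
)
print(small_df.dtypes)
```

Two caveats apply to the real files. Concatenating frames whose categorical columns have different category sets makes pandas fall back to `object`, so a final `astype` after `pd.concat` may still be needed. Files that mix timestamps with and without fractional seconds may also need explicit date handling.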

### Memory usage comparison

In [74]:
df.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
Index: 5860568 entries, 0 to 178371
Data columns (total 13 columns):
 #   Column              Dtype  
---  ------              -----  
 0   ride_id             object 
 1   rideable_type       object 
 2   started_at          object 
 3   ended_at            object 
 4   start_station_name  object 
 5   start_station_id    object 
 6   end_station_name    object 
 7   end_station_id      object 
 8   start_lat           float64
 9   start_lng           float64
 10  end_lat             float64
 11  end_lng             float64
 12  member_casual       object 
dtypes: float64(4), object(9)
memory usage: 3.3 GB


In [75]:
cast_df.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
Index: 5860568 entries, 0 to 178371
Data columns (total 7 columns):
 #   Column              Dtype         
---  ------              -----         
 0   ride_id             string        
 1   rideable_type       category      
 2   started_at          datetime64[ns]
 3   ended_at            datetime64[ns]
 4   start_station_name  category      
 5   end_station_name    category      
 6   member_casual       category      
dtypes: category(4), datetime64[ns](2), string(1)
memory usage: 302.2 MB
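The two `info()` outputs compare 13 columns against 7, so part of the reduction comes from dropping columns rather than from the dtype change. To isolate the dtype effect, a single column can be measured in both representations. This is a small sketch on synthetic data; the Series below is made up for illustration.

```python
import pandas as pd

# Synthetic column: two distinct labels repeated many times
rider_type = pd.Series(['member', 'casual'] * 50_000)

as_object = rider_type.astype('object')
as_category = rider_type.astype('category')

# deep=True counts the Python string objects, not just the pointers
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f'object:   {object_bytes:,} bytes')
print(f'category: {category_bytes:,} bytes')
```

The same per-column view is available on the real data with `cast_df.memory_usage(deep=True)`.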


# Process

## Preliminary data analysis

In [76]:
cast_df.shape

(5860568, 7)

In [77]:
cast_df.head(10)

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,end_station_name,member_casual
0,C1D650626C8C899A,electric_bike,2024-01-12 15:30:27,2024-01-12 15:37:59,Wells St & Elm St,Kingsbury St & Kinzie St,member
1,EECD38BDB25BFCB0,electric_bike,2024-01-08 15:45:46,2024-01-08 15:52:59,Wells St & Elm St,Kingsbury St & Kinzie St,member
2,F4A9CE78061F17F7,electric_bike,2024-01-27 12:27:19,2024-01-27 12:35:19,Wells St & Elm St,Kingsbury St & Kinzie St,member
3,0A0D9E15EE50B171,classic_bike,2024-01-29 16:26:17,2024-01-29 16:56:06,Wells St & Randolph St,Larrabee St & Webster Ave,member
4,33FFC9805E3EFF9A,classic_bike,2024-01-31 05:43:23,2024-01-31 06:09:35,Lincoln Ave & Waveland Ave,Kingsbury St & Kinzie St,member
5,C96080812CD285C5,classic_bike,2024-01-07 11:21:24,2024-01-07 11:30:03,Wells St & Elm St,Kingsbury St & Kinzie St,member
6,0EA7CB313D4F456A,classic_bike,2024-01-05 14:44:12,2024-01-05 14:53:06,Wells St & Elm St,Kingsbury St & Kinzie St,member
7,EE11F3A3B39CFBD8,electric_bike,2024-01-04 18:19:53,2024-01-04 18:28:04,Wells St & Elm St,Kingsbury St & Kinzie St,member
8,63E83DE8E3279F15,classic_bike,2024-01-01 14:46:53,2024-01-01 14:57:02,Wells St & Elm St,Kingsbury St & Kinzie St,member
9,8005682869122D93,electric_bike,2024-01-03 19:31:08,2024-01-03 19:40:05,Clark St & Ida B Wells Dr,Kingsbury St & Kinzie St,member


In [78]:
cast_df.columns

Index(['ride_id', 'rideable_type', 'started_at', 'ended_at',
       'start_station_name', 'end_station_name', 'member_casual'],
      dtype='object')

In [79]:
cast_df.describe()

Unnamed: 0,started_at,ended_at
count,5860568,5860568
mean,2024-07-17 07:55:47.617262848,2024-07-17 08:13:06.552330496
min,2024-01-01 00:00:39,2024-01-01 00:04:20
25%,2024-05-20 19:47:53,2024-05-20 20:07:54.750000128
50%,2024-07-22 20:36:16.283500032,2024-07-22 20:53:59.158500096
75%,2024-09-17 20:14:22.566249984,2024-09-17 20:27:46.025999872
max,2024-12-31 23:56:49.854000,2024-12-31 23:59:55.705000


## Check for N/A values and duplicates

### Check N/A

In [158]:
cast_df.isna().sum()

ride_id                     0
rideable_type               0
started_at                  0
ended_at                    0
start_station_name    1073951
end_station_name      1104653
member_casual               0
dtype: int64

### Rides without a recorded start or end station are assumed to be recording errors, so they are excluded from the analysis

In [249]:
clean_df = cast_df.dropna().copy()
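Dropping these rows removes a large share of the data, so it may be worth checking first whether the missing station names are concentrated in one group. If they are, the assumption that they are errors would deserve a second look. The sketch below shows one way to do that on made-up rows; `toy`, `missing_station` and `missing_share` are illustrative names, and nothing here describes the real distribution.

```python
import pandas as pd

# Made-up rows: None marks a missing station name
toy = pd.DataFrame({
    'rideable_type': ['electric_bike', 'electric_bike',
                      'classic_bike', 'classic_bike'],
    'start_station_name': [None, 'Wells St & Elm St',
                           'Burnham Harbor', 'Dusable Harbor'],
    'end_station_name': ['Dusable Harbor', None,
                         'Burnham Harbor', 'Wells St & Elm St'],
})

# True where either station name is missing
missing_station = toy[['start_station_name', 'end_station_name']].isna().any(axis=1)

# Share of affected rides per rideable type
missing_share = missing_station.groupby(toy['rideable_type']).mean()
print(missing_share)
```

Running the same grouping on `cast_df`, by `rideable_type` or by `member_casual`, would show whether the dropped rides are spread evenly.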

### Checking for duplicates with the same ride ID

In [250]:
clean_df.loc[clean_df.duplicated(subset=['ride_id'], keep=False)]

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,end_station_name,member_casual
943,2C772EDDDBDEFDA3,electric_bike,2024-05-31 23:42:42.000,2024-06-01 00:25:08.000,Dearborn St & Van Buren St,DuSable Lake Shore Dr & Monroe St,casual
17181,7BC67FD33887B3CB,classic_bike,2024-05-31 22:54:20.000,2024-06-01 00:42:41.000,Burnham Harbor,St. Louis Ave & Fullerton Ave,casual
20441,43CD52984AD22D99,electric_bike,2024-05-31 23:45:51.000,2024-06-01 00:11:46.000,Wentworth Ave & Cermak Rd*,Loomis St & Lexington St,member
22451,ABBD88BEBC1431FF,electric_bike,2024-05-31 23:20:35.000,2024-06-01 00:33:41.000,McClurg Ct & Erie St,McClurg Ct & Erie St,casual
25437,CA12CCDD359DA80C,classic_bike,2024-05-31 23:54:35.000,2024-06-01 00:01:25.000,Halsted St & Archer Ave,Morgan St & 31st St,casual
...,...,...,...,...,...,...,...
627078,2F6D74102A4FFFD6,classic_bike,2024-05-31 23:45:59.789,2024-06-01 00:11:51.768,Morgan St & Lake St*,LaSalle Dr & Huron St,member
627085,6ED3DDF49B9EE461,classic_bike,2024-05-31 23:48:38.271,2024-06-01 00:13:31.085,Morgan St & Lake St*,LaSalle Dr & Huron St,casual
627733,43637BA11F2DAA42,classic_bike,2024-05-31 20:27:14.662,2024-06-01 13:28:51.261,Dearborn St & Erie St,Broadway & Cornelia Ave,casual
632558,07DBFDA3C91006AE,classic_bike,2024-05-31 22:35:31.362,2024-06-01 09:05:13.160,Western Ave & Walton St,Western Ave & Walton St,casual


In [251]:
clean_df = clean_df.drop_duplicates(subset=['ride_id']).copy()

### Checking for rides that share the same start time, end time, start station and end station

In [252]:
clean_df.loc[clean_df.duplicated(subset=['started_at', 'ended_at', 'start_station_name', 'end_station_name'], keep=False)]

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,end_station_name,member_casual
102692,D3CE8F069C526B23,classic_bike,2024-01-30 20:27:57,2024-01-30 20:50:51,Canal St & Adams St,State St & Randolph St,member
102754,BD06A0FDAC1D1183,classic_bike,2024-01-30 20:27:57,2024-01-30 20:50:51,Canal St & Adams St,State St & Randolph St,member
71207,A47F1F8E79A156BC,classic_bike,2024-02-27 13:40:53,2024-02-27 13:57:21,McClurg Ct & Erie St,Halsted St & Clybourn Ave,casual
165886,36EE4451474A3DC0,classic_bike,2024-02-27 13:40:53,2024-02-27 13:57:21,McClurg Ct & Erie St,Halsted St & Clybourn Ave,casual


In [253]:
clean_df = clean_df.drop_duplicates(subset=['started_at', 'ended_at', 'start_station_name', 'end_station_name']).copy()

### Checking for rides that share the same start time and end time and either the start station or the end station
Before any of these records are deleted, someone from the company should be consulted to determine whether the given station can hold more than two bikes.

In [254]:
clean_df.loc[clean_df.duplicated(subset=['started_at', 'ended_at', 'start_station_name'], keep=False)]

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,end_station_name,member_casual


In [255]:
clean_df.loc[clean_df.duplicated(subset=['started_at', 'ended_at', 'end_station_name'], keep=False)]

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,end_station_name,member_casual
102616,48AB680D63B319F5,classic_bike,2024-03-04 17:36:30,2024-03-04 17:46:12,Mies van der Rohe Way & Chestnut St,New St & Illinois St,member
181898,FAA4EB520E8918AF,classic_bike,2024-03-04 17:36:30,2024-03-04 17:46:12,Wells St & Huron St,New St & Illinois St,member
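Because these partial matches are kept pending confirmation, one option is to flag them in a column instead of deleting them, so they can be filtered in or out later. A minimal sketch, assuming a hypothetical flag column named `same_times_and_end`; the first two rows reuse the pair shown above and the third is an unrelated ride.

```python
import pandas as pd

toy = pd.DataFrame({
    'ride_id': ['48AB680D63B319F5', 'FAA4EB520E8918AF', 'C1D650626C8C899A'],
    'started_at': pd.to_datetime(['2024-03-04 17:36:30',
                                  '2024-03-04 17:36:30',
                                  '2024-01-12 15:30:27']),
    'ended_at': pd.to_datetime(['2024-03-04 17:46:12',
                                '2024-03-04 17:46:12',
                                '2024-01-12 15:37:59']),
    'end_station_name': ['New St & Illinois St',
                         'New St & Illinois St',
                         'Kingsbury St & Kinzie St'],
})

subset = ['started_at', 'ended_at', 'end_station_name']

# keep=False marks every member of a duplicate group, not only the repeats
toy['same_times_and_end'] = toy.duplicated(subset=subset, keep=False)

# For comparison: how many rows drop_duplicates would remove (keeps the first)
n_removed = len(toy) - len(toy.drop_duplicates(subset=subset))
print(toy[['ride_id', 'same_times_and_end']])
print(n_removed)
```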


### Creating a ride_time column that holds the difference between ended_at and started_at

In [256]:
ride_time = clean_df['ended_at'] - clean_df['started_at']
clean_df.insert(4, 'ride_time', ride_time)
clean_df

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,ride_time,start_station_name,end_station_name,member_casual
0,C1D650626C8C899A,electric_bike,2024-01-12 15:30:27.000,2024-01-12 15:37:59.000,0 days 00:07:32,Wells St & Elm St,Kingsbury St & Kinzie St,member
1,EECD38BDB25BFCB0,electric_bike,2024-01-08 15:45:46.000,2024-01-08 15:52:59.000,0 days 00:07:13,Wells St & Elm St,Kingsbury St & Kinzie St,member
2,F4A9CE78061F17F7,electric_bike,2024-01-27 12:27:19.000,2024-01-27 12:35:19.000,0 days 00:08:00,Wells St & Elm St,Kingsbury St & Kinzie St,member
3,0A0D9E15EE50B171,classic_bike,2024-01-29 16:26:17.000,2024-01-29 16:56:06.000,0 days 00:29:49,Wells St & Randolph St,Larrabee St & Webster Ave,member
4,33FFC9805E3EFF9A,classic_bike,2024-01-31 05:43:23.000,2024-01-31 06:09:35.000,0 days 00:26:12,Lincoln Ave & Waveland Ave,Kingsbury St & Kinzie St,member
...,...,...,...,...,...,...,...,...
178363,36DAF3C93190E07F,classic_bike,2024-12-13 15:40:06.123,2024-12-13 15:46:29.553,0 days 00:06:23.430000,Albany Ave & Bloomingdale Ave,California Ave & Milwaukee Ave,member
178365,71F02C3CF79B8090,classic_bike,2024-12-17 08:09:12.581,2024-12-17 08:15:50.134,0 days 00:06:37.553000,Albany Ave & Bloomingdale Ave,California Ave & Milwaukee Ave,member
178366,85AE8840FA0E4EAB,classic_bike,2024-12-18 08:22:40.737,2024-12-18 08:29:25.021,0 days 00:06:44.284000,Albany Ave & Bloomingdale Ave,California Ave & Milwaukee Ave,member
178369,15602635C5DF484E,electric_bike,2024-12-31 17:10:03.113,2024-12-31 17:17:21.838,0 days 00:07:18.725000,Albany Ave & Bloomingdale Ave,California Ave & Milwaukee Ave,member


### Checking for rides that lasted 10 seconds or less

In [257]:
clean_df[clean_df['ride_time'] <= timedelta(seconds=10)]

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,ride_time,start_station_name,end_station_name,member_casual
3184,C1FF15F72F5E6CB9,classic_bike,2024-01-17 12:43:55.000,2024-01-17 12:44:02.000,0 days 00:00:07,Loomis St & Lexington St,Loomis St & Lexington St,member
3190,5AA3D2E8010FEF7D,classic_bike,2024-01-17 14:25:27.000,2024-01-17 14:25:29.000,0 days 00:00:02,Halsted St & Wrightwood Ave,Halsted St & Wrightwood Ave,member
3196,27E081696A03E872,classic_bike,2024-01-02 10:32:54.000,2024-01-02 10:32:56.000,0 days 00:00:02,Halsted St & Wrightwood Ave,Halsted St & Wrightwood Ave,member
3214,785A6FF31548084A,electric_bike,2024-01-09 12:08:38.000,2024-01-09 12:08:41.000,0 days 00:00:03,Public Rack - Walden Pkwy & 103rd St,Public Rack - Walden Pkwy & 103rd St,member
3888,188E0BD5237F114A,electric_bike,2024-01-13 07:59:43.000,2024-01-13 07:59:45.000,0 days 00:00:02,Milwaukee Ave & Grand Ave,Milwaukee Ave & Grand Ave,member
...,...,...,...,...,...,...,...,...
140459,E63529FC8E21F576,electric_bike,2024-12-12 17:32:37.347,2024-12-12 17:32:39.672,0 days 00:00:02.325000,Columbus Dr & Randolph St,Columbus Dr & Randolph St,member
144841,8EB8A2470D356A70,electric_bike,2024-12-06 08:21:21.551,2024-12-06 08:21:28.504,0 days 00:00:06.953000,Michigan Ave & 14th St,Michigan Ave & 14th St,member
144848,485D99B783F271EE,electric_bike,2024-12-24 08:00:27.583,2024-12-24 08:00:35.332,0 days 00:00:07.749000,Fairbanks Ct & Grand Ave,Fairbanks Ct & Grand Ave,member
144910,FFB5F7402AD46006,electric_bike,2024-12-20 19:20:06.577,2024-12-20 19:20:15.401,0 days 00:00:08.824000,California Ave & Cortez St,California Ave & Cortez St,casual
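The filter uses `<=`, so a ride of exactly 10 seconds counts as too short, and any ride whose end precedes its start would be caught as well. The sketch below makes the boundary explicit with a named threshold; `MIN_RIDE` and the toy rows are illustrative, with the second row constructed to last exactly 10 seconds.

```python
import pandas as pd

MIN_RIDE = pd.Timedelta(seconds=10)  # same threshold as in the cell above

toy = pd.DataFrame({
    'started_at': pd.to_datetime(['2024-01-17 12:43:55',
                                  '2024-01-17 14:25:27',
                                  '2024-01-12 15:30:27']),
    'ended_at': pd.to_datetime(['2024-01-17 12:44:02',    # 7 seconds
                                '2024-01-17 14:25:37',    # exactly 10 seconds
                                '2024-01-12 15:37:59']),  # 7 min 32 s
})
toy['ride_time'] = toy['ended_at'] - toy['started_at']

# Inclusive comparison: the 10-second ride is flagged too
too_short = toy['ride_time'] <= MIN_RIDE
kept = toy[~too_short]
print(f'removed {too_short.sum()} of {len(toy)} rides')
```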


### Creating sorted dataframe

In [258]:
analysis_df = clean_df[~(clean_df['ride_time'] <= timedelta(seconds=10))].copy()
analysis_df = analysis_df.sort_values('started_at')
analysis_df

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,ride_time,start_station_name,end_station_name,member_casual
48853,56F5C3ED5178C131,classic_bike,2024-01-01 00:01:01.000,2024-01-01 00:24:12.000,0 days 00:23:11,LaSalle St & Illinois St,Indiana Ave & Roosevelt Rd,member
26576,70BDF49A30C8BFFC,classic_bike,2024-01-01 00:02:15.000,2024-01-01 00:07:01.000,0 days 00:04:46,Sheffield Ave & Fullerton Ave,Greenview Ave & Fullerton Ave,casual
46733,B7F1F63BD1AFF4E9,classic_bike,2024-01-01 00:06:59.000,2024-01-01 00:17:21.000,0 days 00:10:22,Western Ave & Howard St,Clark St & Lunt Ave,member
40692,444DD0D82A50BA8C,classic_bike,2024-01-01 00:07:45.000,2024-01-01 00:15:41.000,0 days 00:07:56,DuSable Lake Shore Dr & North Blvd,Sedgwick St & Webster Ave,casual
71103,510C1C0AA564ADF0,classic_bike,2024-01-01 00:07:57.000,2024-01-01 00:29:42.000,0 days 00:21:45,Clinton St & Tilden St,LaSalle St & Illinois St,casual
...,...,...,...,...,...,...,...,...
56044,D10276507FC2E40A,electric_bike,2024-12-31 23:52:49.117,2024-12-31 23:58:19.908,0 days 00:05:30.791000,Clinton St & Tilden St,Daley Center Plaza,member
39689,C8FBE8FBA3C157F1,electric_bike,2024-12-31 23:54:01.903,2024-12-31 23:59:28.819,0 days 00:05:26.916000,Michigan Ave & Jackson Blvd,Stetson Ave & South Water St,casual
2282,B14C678DEA55A583,electric_bike,2024-12-31 23:54:37.045,2024-12-31 23:57:19.293,0 days 00:02:42.248000,Paulina St & 18th St,Racine Ave & 18th St,member
119498,86CB9E2042DD6E4F,electric_bike,2024-12-31 23:56:38.214,2024-12-31 23:56:51.547,0 days 00:00:13.333000,Dusable Harbor,Dusable Harbor,member


### Creating month and week columns from the ride start time

In [259]:
month = analysis_df['started_at'].dt.month
week = analysis_df['started_at'].dt.isocalendar()['week']
analysis_df.insert(5, 'month', month)
analysis_df.insert(6, 'week', week)
analysis_df

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,ride_time,month,week,start_station_name,end_station_name,member_casual
48853,56F5C3ED5178C131,classic_bike,2024-01-01 00:01:01.000,2024-01-01 00:24:12.000,0 days 00:23:11,1,1,LaSalle St & Illinois St,Indiana Ave & Roosevelt Rd,member
26576,70BDF49A30C8BFFC,classic_bike,2024-01-01 00:02:15.000,2024-01-01 00:07:01.000,0 days 00:04:46,1,1,Sheffield Ave & Fullerton Ave,Greenview Ave & Fullerton Ave,casual
46733,B7F1F63BD1AFF4E9,classic_bike,2024-01-01 00:06:59.000,2024-01-01 00:17:21.000,0 days 00:10:22,1,1,Western Ave & Howard St,Clark St & Lunt Ave,member
40692,444DD0D82A50BA8C,classic_bike,2024-01-01 00:07:45.000,2024-01-01 00:15:41.000,0 days 00:07:56,1,1,DuSable Lake Shore Dr & North Blvd,Sedgwick St & Webster Ave,casual
71103,510C1C0AA564ADF0,classic_bike,2024-01-01 00:07:57.000,2024-01-01 00:29:42.000,0 days 00:21:45,1,1,Clinton St & Tilden St,LaSalle St & Illinois St,casual
...,...,...,...,...,...,...,...,...,...,...
56044,D10276507FC2E40A,electric_bike,2024-12-31 23:52:49.117,2024-12-31 23:58:19.908,0 days 00:05:30.791000,12,1,Clinton St & Tilden St,Daley Center Plaza,member
39689,C8FBE8FBA3C157F1,electric_bike,2024-12-31 23:54:01.903,2024-12-31 23:59:28.819,0 days 00:05:26.916000,12,1,Michigan Ave & Jackson Blvd,Stetson Ave & South Water St,casual
2282,B14C678DEA55A583,electric_bike,2024-12-31 23:54:37.045,2024-12-31 23:57:19.293,0 days 00:02:42.248000,12,1,Paulina St & 18th St,Racine Ave & 18th St,member
119498,86CB9E2042DD6E4F,electric_bike,2024-12-31 23:56:38.214,2024-12-31 23:56:51.547,0 days 00:00:13.333000,12,1,Dusable Harbor,Dusable Harbor,member


### Relabeling late-December rides that ISO 8601 assigns to week 1 of 2025 as week 53

In [260]:
analysis_df.loc[(analysis_df['month'] == 12) & (analysis_df['week'] == 1), 'week']

31546     1
31553     1
81150     1
9355      1
72337     1
         ..
56044     1
39689     1
2282      1
119498    1
92419     1
Name: week, Length: 6831, dtype: UInt32

In [261]:
analysis_df.loc[(analysis_df['month'] == 12) & (analysis_df['week'] == 1), 'week'] = 53
analysis_df

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,ride_time,month,week,start_station_name,end_station_name,member_casual
48853,56F5C3ED5178C131,classic_bike,2024-01-01 00:01:01.000,2024-01-01 00:24:12.000,0 days 00:23:11,1,1,LaSalle St & Illinois St,Indiana Ave & Roosevelt Rd,member
26576,70BDF49A30C8BFFC,classic_bike,2024-01-01 00:02:15.000,2024-01-01 00:07:01.000,0 days 00:04:46,1,1,Sheffield Ave & Fullerton Ave,Greenview Ave & Fullerton Ave,casual
46733,B7F1F63BD1AFF4E9,classic_bike,2024-01-01 00:06:59.000,2024-01-01 00:17:21.000,0 days 00:10:22,1,1,Western Ave & Howard St,Clark St & Lunt Ave,member
40692,444DD0D82A50BA8C,classic_bike,2024-01-01 00:07:45.000,2024-01-01 00:15:41.000,0 days 00:07:56,1,1,DuSable Lake Shore Dr & North Blvd,Sedgwick St & Webster Ave,casual
71103,510C1C0AA564ADF0,classic_bike,2024-01-01 00:07:57.000,2024-01-01 00:29:42.000,0 days 00:21:45,1,1,Clinton St & Tilden St,LaSalle St & Illinois St,casual
...,...,...,...,...,...,...,...,...,...,...
56044,D10276507FC2E40A,electric_bike,2024-12-31 23:52:49.117,2024-12-31 23:58:19.908,0 days 00:05:30.791000,12,53,Clinton St & Tilden St,Daley Center Plaza,member
39689,C8FBE8FBA3C157F1,electric_bike,2024-12-31 23:54:01.903,2024-12-31 23:59:28.819,0 days 00:05:26.916000,12,53,Michigan Ave & Jackson Blvd,Stetson Ave & South Water St,casual
2282,B14C678DEA55A583,electric_bike,2024-12-31 23:54:37.045,2024-12-31 23:57:19.293,0 days 00:02:42.248000,12,53,Paulina St & 18th St,Racine Ave & 18th St,member
119498,86CB9E2042DD6E4F,electric_bike,2024-12-31 23:56:38.214,2024-12-31 23:56:51.547,0 days 00:00:13.333000,12,53,Dusable Harbor,Dusable Harbor,member
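The week-1 values in December follow from the ISO 8601 calendar rather than from a data problem: 30 December 2024 is a Monday and opens ISO week 1 of 2025. The sketch below shows this with the standard library and wraps the relabeling in a hypothetical helper, `calendar_week`, which is not part of the notebook.

```python
from datetime import date

def calendar_week(day: date) -> int:
    """Hypothetical helper: ISO week number, except that days which ISO
    assigns to the following year are relabeled as week 53."""
    iso_year, iso_week, _ = day.isocalendar()
    return 53 if iso_year > day.year else iso_week

# (ISO year, ISO week) for the last Sunday of 2024 and the Monday after it
print(date(2024, 12, 29).isocalendar()[:2])
print(date(2024, 12, 30).isocalendar()[:2])
print(calendar_week(date(2024, 12, 31)))
```

Comparing the ISO year with the calendar year avoids hard-coding the month, unlike the `month == 12` and `week == 1` condition used above. Days in early January that ISO assigns to the previous year are left unchanged by this helper, a case that does not occur in 2024.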


# Analysis

In [262]:
analysis_df

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,ride_time,month,week,start_station_name,end_station_name,member_casual
48853,56F5C3ED5178C131,classic_bike,2024-01-01 00:01:01.000,2024-01-01 00:24:12.000,0 days 00:23:11,1,1,LaSalle St & Illinois St,Indiana Ave & Roosevelt Rd,member
26576,70BDF49A30C8BFFC,classic_bike,2024-01-01 00:02:15.000,2024-01-01 00:07:01.000,0 days 00:04:46,1,1,Sheffield Ave & Fullerton Ave,Greenview Ave & Fullerton Ave,casual
46733,B7F1F63BD1AFF4E9,classic_bike,2024-01-01 00:06:59.000,2024-01-01 00:17:21.000,0 days 00:10:22,1,1,Western Ave & Howard St,Clark St & Lunt Ave,member
40692,444DD0D82A50BA8C,classic_bike,2024-01-01 00:07:45.000,2024-01-01 00:15:41.000,0 days 00:07:56,1,1,DuSable Lake Shore Dr & North Blvd,Sedgwick St & Webster Ave,casual
71103,510C1C0AA564ADF0,classic_bike,2024-01-01 00:07:57.000,2024-01-01 00:29:42.000,0 days 00:21:45,1,1,Clinton St & Tilden St,LaSalle St & Illinois St,casual
...,...,...,...,...,...,...,...,...,...,...
56044,D10276507FC2E40A,electric_bike,2024-12-31 23:52:49.117,2024-12-31 23:58:19.908,0 days 00:05:30.791000,12,53,Clinton St & Tilden St,Daley Center Plaza,member
39689,C8FBE8FBA3C157F1,electric_bike,2024-12-31 23:54:01.903,2024-12-31 23:59:28.819,0 days 00:05:26.916000,12,53,Michigan Ave & Jackson Blvd,Stetson Ave & South Water St,casual
2282,B14C678DEA55A583,electric_bike,2024-12-31 23:54:37.045,2024-12-31 23:57:19.293,0 days 00:02:42.248000,12,53,Paulina St & 18th St,Racine Ave & 18th St,member
119498,86CB9E2042DD6E4F,electric_bike,2024-12-31 23:56:38.214,2024-12-31 23:56:51.547,0 days 00:00:13.333000,12,53,Dusable Harbor,Dusable Harbor,member


### Checking unique values within columns

In [263]:
analysis_df.nunique()

ride_id               4198335
rideable_type               3
started_at            4077137
ended_at              4079187
ride_time             1512773
month                      12
week                       53
start_station_name       1787
end_station_name         1798
member_casual               2
dtype: int64

### How many rides were taken by members and by casual riders?

In [264]:
members = analysis_df['member_casual'].value_counts()
members

member_casual
member    2680018
casual    1518317
Name: count, dtype: int64
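The counts are easier to compare as shares. The sketch below recomputes them from the two totals printed above; on the full dataframe the same result is available through `value_counts(normalize=True)`.

```python
import pandas as pd

# Totals copied from the value_counts() output above
members = pd.Series({'member': 2680018, 'casual': 1518317}, name='count')

# Each group's share of all rides
share = members / members.sum()
print(share.round(3))
```

Each row of the dataset is a ride, so these figures describe rides rather than distinct people.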

In [265]:
members_bar = px.bar(
    members,
    width=1000,
    color_discrete_sequence=['lightgreen']
)

members_bar.update_layout(
    xaxis_title='Membership status',
    yaxis_title='Ride count',
    title='Number of Rides per Membership status',
    showlegend=False
)

members_bar.show()

### How many rides were taken on each rideable type?

In [266]:
rideable_types = analysis_df['rideable_type'].value_counts()
rideable_types

rideable_type
classic_bike        2724476
electric_bike       1426242
electric_scooter      47617
Name: count, dtype: int64

In [267]:
rideable_types_bar = px.bar(
    rideable_types,
    width=1000,
    color_discrete_sequence=['pink']
)

rideable_types_bar.update_layout(
    xaxis_title='Transport device',
    yaxis_title='Ride count',
    showlegend=False,
    title='Number of Rides per Rideable device',
)

rideable_types_bar.show()

### Which stations were never recorded as a start station, and which were never recorded as an end station?

In [268]:
unique_start = analysis_df['start_station_name'].unique()
unique_end = analysis_df['end_station_name'].unique()
never_start = set(unique_end) - set(unique_start)
never_end = set(unique_start) - set(unique_end)

print('These stations were never start stations:')
display(never_start)
print('\nThese stations were never end stations:')
display(never_end)

These stations were never start stations:


{'Base - 2132 W Hubbard',
 'Cumberland Ave & Catherine Ave',
 'Kedzie Ave & 38th Pl',
 'Laflin St & 115th St',
 'Public Rack - 53rd St & Indiana Ave',
 'Public Rack - Avenue O & 118th St',
 'Public Rack - Baltimore Ave & 134th St',
 'Public Rack - Columbus & 79th',
 'Public Rack - Cottage Grove & 85th St',
 'Public Rack - Ewing Ave & 102nd St',
 'Public Rack - Exchange Ave & 131st St',
 'Public Rack - Marquette Rd & 67th St',
 'Public Rack - Northwest Hwy & Overhill Ave',
 'Public Rack - Park Manor Elementary School',
 'Public Rack - Pulaski & 84th',
 'Public Rack - Rockwell Ave & 71st St',
 'Public Rack - Springfield & 79th',
 'Public Rack - Troy & 71st',
 'Public Rack - Wabash Ave & 87th St',
 'Public Rack - Western & 79th',
 'Public Rack - Western Ave & 98th St',
 'Public Rack - Whipple St & 26th St',
 'SCOOTERS - 2132 W Hubbard ST',
 'SCOOTERS CLASSIC - 2132 W Hubbard ST',
 'w. Chicago Warehouse'}


These stations were never end stations:


{'Oketo Ave & Addison',
 'Public Rack - Artesian & 71st',
 'Public Rack - Brooks Park',
 'Public Rack - Corliss Ave & 103rd St',
 'Public Rack - Ellis Ave & 132nd Pl',
 'Public Rack - Kedzie Ave & 83rd St',
 'Public Rack - Keeler Ave & 55th St',
 'Public Rack - Mason Ave & Milwaukee Ave',
 'Public Rack - Normandy Ave & Raven St',
 'Public Rack - Northwest Hwy & Highland Ave',
 'Public Rack - Pittsburgh Ave & Irving Park',
 'Public Rack - Prairie Ave & 78th St',
 'Public Rack - Roscoe St & Osceola Ave',
 'Public Rack - Tuley (Murray) Park'}

### Casual and member rides throughout the year

In [269]:
# pd.set_option('display.max_rows', 200)
pd.reset_option('display.max_rows')

In [270]:
member_year_count = analysis_df.value_counts([ 'week', 'month', 'member_casual']).sort_index().reset_index(name='count')
# member_year_count['month'] = pd.to_datetime(member_year_count['month'], format='%m').dt.month_name()
member_year_count

Unnamed: 0,week,month,member_casual,count
0,1,1,casual,5958
1,1,1,member,25500
2,2,1,casual,3414
3,2,1,member,21431
4,3,1,casual,1726
...,...,...,...,...
119,51,12,member,24811
120,52,12,casual,5074
121,52,12,member,12606
122,53,12,casual,1855


In [271]:
# Parentheses are required because `&` binds more tightly than `==`
(analysis_df['month'] == 12) & (analysis_df['week'] == 1)
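Combining two comparisons with `&` depends on operator precedence. Python evaluates `12 & analysis_df['week']` first and then chains the two `==` comparisons with an implicit `and`, which asks a whole Series for a single truth value and raises `ValueError`. The sketch below reproduces this on a three-row toy frame and shows the parenthesized form; `toy`, `raised` and `mask` are illustrative names.

```python
import pandas as pd

toy = pd.DataFrame({'month': [12, 12, 1], 'week': [1, 52, 1]})

# Unparenthesized form: parsed as month == (12 & week) == 1,
# a chained comparison that needs bool(Series) and therefore fails
try:
    toy['month'] == 12 & toy['week'] == 1
    raised = False
except ValueError:
    raised = True

# Parenthesized form: two boolean Series combined element-wise
mask = (toy['month'] == 12) & (toy['week'] == 1)
print(raised)
print(mask.tolist())
```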

In [None]:
member_year_line = px.line(
    member_year_count,
    x='week',
    y='count',
    color='member_casual',
    width=1000
)

member_year_line.update_layout(
    xaxis_title='Week of year',
    yaxis_title='Ride count',
    title='Number of Rides throughout the Year by Subscription type',
    legend_title_text='Subscription',
)

member_year_line.show()

According to the graph above:
* Throughout the year, members take more rides than casual riders
* Over the whole year the gap averages roughly 97,000 rides per month (2,680,018 member rides vs. 1,518,317 casual rides); whether it stays constant from week to week still needs to be verified
* There is a seasonal trend in usage: ride counts for both groups rise in the warmer months
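One way to check how the member and casual counts relate over time is to compute the difference and the member share per period instead of reading them off the chart. The sketch below does this per month on a few made-up rows; `toy` and `monthly` are illustrative names and the numbers say nothing about the real data.

```python
import pandas as pd

# Made-up rides: three in January, five in July
toy = pd.DataFrame({
    'month': [1, 1, 1, 7, 7, 7, 7, 7],
    'member_casual': ['member', 'member', 'casual',
                      'member', 'member', 'member', 'casual', 'casual'],
})

# One row per month, one column per rider group
monthly = toy.groupby(['month', 'member_casual']).size().unstack(fill_value=0)

# Absolute gap and relative share tell different stories when volume changes
monthly['gap'] = monthly['member'] - monthly['casual']
monthly['member_share'] = monthly['member'] / (monthly['member'] + monthly['casual'])
print(monthly)
```

On `analysis_df`, where `member_casual` is categorical, passing `observed=True` to `groupby` keeps unused category combinations out of the result.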

### Ride types throughout the year

In [None]:
ride_year_count = analysis_df.value_counts(['rideable_type', 'month']).sort_index().reset_index(name='count')
ride_year_count['month'] = pd.to_datetime(ride_year_count['month'], format='%m').dt.month_name()
ride_year_count

Unnamed: 0,rideable_type,month,count
0,classic_bike,January,75855
1,classic_bike,February,139486
2,classic_bike,March,147594
3,classic_bike,April,187114
4,classic_bike,May,304598
5,classic_bike,June,336571
6,classic_bike,July,369404
7,classic_bike,August,352005
8,classic_bike,September,313614
9,classic_bike,October,280030


In [None]:
ride_year_line = px.line(
    ride_year_count,
    x='month',
    y='count',
    color='rideable_type',
    width=1000
)

ride_year_line.update_layout(
    xaxis_title='Months',
    yaxis_title='Ride count',
    title='Number of Rides throughout the Year by Ride type',
    legend_title_text='Ride type',
)

ride_year_line.show()

According to the graph above:
* Throughout the year, riders choose classic bikes more often than electric bikes
* There is a seasonal trend in usage: ride counts for both classic and electric bikes rise in the warmer months
* Electric scooters were an active option for only one month
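The observation about scooters can be checked directly by counting the distinct months in which each ride type appears. A minimal sketch on made-up rows, with `toy` and `months_active` as illustrative names:

```python
import pandas as pd

# Made-up rides: bikes appear in two months, scooters in one
toy = pd.DataFrame({
    'rideable_type': ['classic_bike', 'classic_bike', 'electric_bike',
                      'electric_bike', 'electric_scooter', 'electric_scooter'],
    'month': [1, 9, 1, 9, 9, 9],
})

# Number of distinct months with at least one ride, per ride type
months_active = toy.groupby('rideable_type')['month'].nunique()
print(months_active)
```

Applied to `analysis_df`, the same call would show how many months contain scooter rides after the cleaning steps above.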


In [None]:
analysis_df

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,ride_time,month,week,start_station_name,end_station_name,member_casual
0,C1D650626C8C899A,electric_bike,2024-01-12 15:30:27.000,2024-01-12 15:37:59.000,0 days 00:07:32,1,2,Wells St & Elm St,Kingsbury St & Kinzie St,member
1,EECD38BDB25BFCB0,electric_bike,2024-01-08 15:45:46.000,2024-01-08 15:52:59.000,0 days 00:07:13,1,2,Wells St & Elm St,Kingsbury St & Kinzie St,member
2,F4A9CE78061F17F7,electric_bike,2024-01-27 12:27:19.000,2024-01-27 12:35:19.000,0 days 00:08:00,1,4,Wells St & Elm St,Kingsbury St & Kinzie St,member
3,0A0D9E15EE50B171,classic_bike,2024-01-29 16:26:17.000,2024-01-29 16:56:06.000,0 days 00:29:49,1,5,Wells St & Randolph St,Larrabee St & Webster Ave,member
4,33FFC9805E3EFF9A,classic_bike,2024-01-31 05:43:23.000,2024-01-31 06:09:35.000,0 days 00:26:12,1,5,Lincoln Ave & Waveland Ave,Kingsbury St & Kinzie St,member
...,...,...,...,...,...,...,...,...,...,...
178363,36DAF3C93190E07F,classic_bike,2024-12-13 15:40:06.123,2024-12-13 15:46:29.553,0 days 00:06:23.430000,12,50,Albany Ave & Bloomingdale Ave,California Ave & Milwaukee Ave,member
178365,71F02C3CF79B8090,classic_bike,2024-12-17 08:09:12.581,2024-12-17 08:15:50.134,0 days 00:06:37.553000,12,51,Albany Ave & Bloomingdale Ave,California Ave & Milwaukee Ave,member
178366,85AE8840FA0E4EAB,classic_bike,2024-12-18 08:22:40.737,2024-12-18 08:29:25.021,0 days 00:06:44.284000,12,51,Albany Ave & Bloomingdale Ave,California Ave & Milwaukee Ave,member
178369,15602635C5DF484E,electric_bike,2024-12-31 17:10:03.113,2024-12-31 17:17:21.838,0 days 00:07:18.725000,12,1,Albany Ave & Bloomingdale Ave,California Ave & Milwaukee Ave,member
