# Case study: How does a bike-share navigate speedy success?
<p style="text-align: center;"><img src="./misc/bike.JPG"></p>

## Business Task
Design marketing strategies aimed at converting casual riders into annual members.

## Analytic questions:
* How do annual members and casual riders use Cyclistic bikes differently?
* Why would casual riders buy Cyclistic annual memberships? 
* How can Cyclistic use digital media to influence casual riders to become members?

### Source
* 202401-divvy-tripdata.csv
* 202402-divvy-tripdata.csv
* 202403-divvy-tripdata.csv
* 202404-divvy-tripdata.csv
* 202405-divvy-tripdata.csv
* 202406-divvy-tripdata.csv
* 202407-divvy-tripdata.csv
* 202408-divvy-tripdata.csv
* 202409-divvy-tripdata.csv
* 202410-divvy-tripdata.csv
* 202411-divvy-tripdata.csv
* 202412-divvy-tripdata.csv

<p style="font-style: italic;">The data has been made available by Motivate International Inc. under this <a href="https://divvybikes.com/data-license-agreement">license</a>.</p>

### Library imports

In [180]:
import glob
import pandas as pd
import numpy as np

## Preparation
### Load raw data

The raw data contains almost 6 million rows. To improve performance, the following optimizations were applied:
* Loading only relevant columns
* Converting columns to efficient datatypes

In [181]:
csv_files = glob.glob('raw data/*.csv')
dfs = []

for file in csv_files:
    data = pd.read_csv(
        file, 
        encoding="utf-8",
        usecols=['rideable_type', 'started_at', 'ended_at', 'start_station_name', 'end_station_name', 'member_casual'],
        parse_dates=["started_at", "ended_at"], 
    ) 
    dfs.append(data)

df = pd.concat(dfs)
df

Unnamed: 0,rideable_type,started_at,ended_at,start_station_name,end_station_name,member_casual
0,electric_bike,2024-01-12 15:30:27.000,2024-01-12 15:37:59.000,Wells St & Elm St,Kingsbury St & Kinzie St,member
1,electric_bike,2024-01-08 15:45:46.000,2024-01-08 15:52:59.000,Wells St & Elm St,Kingsbury St & Kinzie St,member
2,electric_bike,2024-01-27 12:27:19.000,2024-01-27 12:35:19.000,Wells St & Elm St,Kingsbury St & Kinzie St,member
3,classic_bike,2024-01-29 16:26:17.000,2024-01-29 16:56:06.000,Wells St & Randolph St,Larrabee St & Webster Ave,member
4,classic_bike,2024-01-31 05:43:23.000,2024-01-31 06:09:35.000,Lincoln Ave & Waveland Ave,Kingsbury St & Kinzie St,member
...,...,...,...,...,...,...
178367,electric_bike,2024-12-11 08:23:46.564,2024-12-11 08:37:34.532,Clybourn Ave & Division St,,member
178368,electric_bike,2024-12-09 12:26:15.677,2024-12-09 12:37:32.712,Canal St & Jackson Blvd,,member
178369,electric_bike,2024-12-31 17:10:03.113,2024-12-31 17:17:21.838,Albany Ave & Bloomingdale Ave,California Ave & Milwaukee Ave,member
178370,electric_bike,2024-12-01 14:39:47.216,2024-12-01 14:45:21.268,Albany Ave & Bloomingdale Ave,California Ave & Milwaukee Ave,member


In [182]:
schema = {
    'rideable_type': 'category',
    'start_station_name': 'category',
    'end_station_name': 'category',
    'member_casual': 'category'
}

cast_df = df.astype(schema).copy()
cast_df.dtypes

rideable_type               category
started_at            datetime64[ns]
ended_at              datetime64[ns]
start_station_name          category
end_station_name            category
member_casual               category
dtype: object

In [183]:
cast_df.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
Index: 5860568 entries, 0 to 178371
Data columns (total 6 columns):
 #   Column              Dtype         
---  ------              -----         
 0   rideable_type       category      
 1   started_at          datetime64[ns]
 2   ended_at            datetime64[ns]
 3   start_station_name  category      
 4   end_station_name    category      
 5   member_casual       category      
dtypes: category(4), datetime64[ns](2)
memory usage: 168.1 MB
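
To illustrate why casting repeated strings to `category` helps, here is a minimal sketch on a hypothetical toy frame (the `toy` data and variable names are made up for illustration and are not part of the trip data). It compares the deep memory footprint before and after the cast; exact numbers depend on the pandas version, so none are quoted here.

```python
import pandas as pd

# Hypothetical toy data: two distinct values repeated many times,
# mimicking the low-cardinality string columns in the trip data.
toy = pd.DataFrame({
    "start_station_name": ["Wells St & Elm St", "Canal St & Adams St"] * 50_000,
    "member_casual": ["member", "casual"] * 50_000,
})

# deep=True counts the actual string payload, not just the pointers.
loaded_bytes = toy.memory_usage(deep=True).sum()

# A categorical column stores each distinct string once plus small integer codes.
category_bytes = toy.astype("category").memory_usage(deep=True).sum()

print(f"as loaded: {loaded_bytes / 1e6:.2f} MB")
print(f"category:  {category_bytes / 1e6:.2f} MB")
```

The saving grows with the number of rows per distinct value, which is why station names and rider types are good candidates while timestamps are not.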


## Process
Check for N/A values and duplicates

In [184]:
f'Has N/A values: {cast_df.isna().values.any()}'

'Has N/A values: True'
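
The single boolean above does not say where the gaps are. A minimal sketch of a per-column breakdown, assuming a hypothetical toy frame in place of `cast_df`:

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for cast_df; only the station columns have gaps.
toy = pd.DataFrame({
    "start_station_name": ["Wells St & Elm St", np.nan, "Canal St & Jackson Blvd"],
    "end_station_name": [np.nan, np.nan, "Kingsbury St & Kinzie St"],
    "member_casual": ["member", "casual", "member"],
})

# isna() yields a boolean frame; sum() counts True per column, mean() gives the share.
missing_counts = toy.isna().sum()
missing_share = toy.isna().mean()

print(pd.DataFrame({"missing": missing_counts, "share": missing_share}))
```

Running the same two calls on `cast_df` would show which columns are affected before deciding whether to drop or keep those rows.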

In [185]:
f'Has duplicates: {cast_df.duplicated().values.any()}'

'Has duplicates: True'

In [186]:
cast_df[cast_df.duplicated(keep=False)]

Unnamed: 0,rideable_type,started_at,ended_at,start_station_name,end_station_name,member_casual
102692,classic_bike,2024-01-30 20:27:57,2024-01-30 20:50:51,Canal St & Adams St,State St & Randolph St,member
102754,classic_bike,2024-01-30 20:27:57,2024-01-30 20:50:51,Canal St & Adams St,State St & Randolph St,member
71207,classic_bike,2024-02-27 13:40:53,2024-02-27 13:57:21,McClurg Ct & Erie St,Halsted St & Clybourn Ave,casual
165886,classic_bike,2024-02-27 13:40:53,2024-02-27 13:57:21,McClurg Ct & Erie St,Halsted St & Clybourn Ave,casual
323545,electric_bike,2024-04-14 13:35:52,2024-04-14 13:52:16,,,member
324022,electric_bike,2024-04-14 13:35:52,2024-04-14 13:52:16,,,member
297036,electric_bike,2024-05-19 14:15:50,2024-05-19 14:34:09,,,casual
590195,electric_bike,2024-05-22 20:32:49,2024-05-22 20:45:30,Sheffield Ave & Waveland Ave,,member
596510,electric_bike,2024-05-22 20:32:49,2024-05-22 20:45:30,Sheffield Ave & Waveland Ave,,member
608241,electric_bike,2024-05-19 14:15:50,2024-05-19 14:34:09,,,casual
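
Rows are compared only on the six loaded columns, and pandas treats missing values in the same position as equal, which is why rows with empty station names appear above. The sketch below, on a hypothetical toy frame, contrasts `duplicated(keep=False)` (flags every copy) with the default `duplicated()` (flags only later copies, the rows `drop_duplicates()` removes).

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the first two are identical, including the missing station.
toy = pd.DataFrame({
    "rideable_type": ["electric_bike", "electric_bike", "classic_bike"],
    "started_at": pd.to_datetime(
        ["2024-04-14 13:35:52", "2024-04-14 13:35:52", "2024-01-30 20:27:57"]
    ),
    "start_station_name": [np.nan, np.nan, "Canal St & Adams St"],
})

all_copies = toy.duplicated(keep=False)  # True for every member of a duplicate group
later_copies = toy.duplicated()          # True only for the second and later copies
deduped = toy.drop_duplicates()          # keeps the first copy of each group

print(all_copies.tolist(), later_copies.tolist(), len(deduped))
```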


In [195]:
clean_df = cast_df.drop_duplicates().copy()
clean_df

Unnamed: 0,rideable_type,started_at,ended_at,start_station_name,end_station_name,member_casual
0,electric_bike,2024-01-12 15:30:27.000,2024-01-12 15:37:59.000,Wells St & Elm St,Kingsbury St & Kinzie St,member
1,electric_bike,2024-01-08 15:45:46.000,2024-01-08 15:52:59.000,Wells St & Elm St,Kingsbury St & Kinzie St,member
2,electric_bike,2024-01-27 12:27:19.000,2024-01-27 12:35:19.000,Wells St & Elm St,Kingsbury St & Kinzie St,member
3,classic_bike,2024-01-29 16:26:17.000,2024-01-29 16:56:06.000,Wells St & Randolph St,Larrabee St & Webster Ave,member
4,classic_bike,2024-01-31 05:43:23.000,2024-01-31 06:09:35.000,Lincoln Ave & Waveland Ave,Kingsbury St & Kinzie St,member
...,...,...,...,...,...,...
178367,electric_bike,2024-12-11 08:23:46.564,2024-12-11 08:37:34.532,Clybourn Ave & Division St,,member
178368,electric_bike,2024-12-09 12:26:15.677,2024-12-09 12:37:32.712,Canal St & Jackson Blvd,,member
178369,electric_bike,2024-12-31 17:10:03.113,2024-12-31 17:17:21.838,Albany Ave & Bloomingdale Ave,California Ave & Milwaukee Ave,member
178370,electric_bike,2024-12-01 14:39:47.216,2024-12-01 14:45:21.268,Albany Ave & Bloomingdale Ave,California Ave & Milwaukee Ave,member


In [None]:
# Ride duration as a timedelta (end time minus start time)
clean_df['ride_length'] = (clean_df['ended_at'] - clean_df['started_at'])
clean_df

Unnamed: 0,rideable_type,started_at,ended_at,start_station_name,end_station_name,member_casual,ride_length
0,electric_bike,2024-01-12 15:30:27.000,2024-01-12 15:37:59.000,Wells St & Elm St,Kingsbury St & Kinzie St,member,0 days 00:07:32
1,electric_bike,2024-01-08 15:45:46.000,2024-01-08 15:52:59.000,Wells St & Elm St,Kingsbury St & Kinzie St,member,0 days 00:07:13
2,electric_bike,2024-01-27 12:27:19.000,2024-01-27 12:35:19.000,Wells St & Elm St,Kingsbury St & Kinzie St,member,0 days 00:08:00
3,classic_bike,2024-01-29 16:26:17.000,2024-01-29 16:56:06.000,Wells St & Randolph St,Larrabee St & Webster Ave,member,0 days 00:29:49
4,classic_bike,2024-01-31 05:43:23.000,2024-01-31 06:09:35.000,Lincoln Ave & Waveland Ave,Kingsbury St & Kinzie St,member,0 days 00:26:12
...,...,...,...,...,...,...,...
178367,electric_bike,2024-12-11 08:23:46.564,2024-12-11 08:37:34.532,Clybourn Ave & Division St,,member,0 days 00:13:47.968000
178368,electric_bike,2024-12-09 12:26:15.677,2024-12-09 12:37:32.712,Canal St & Jackson Blvd,,member,0 days 00:11:17.035000
178369,electric_bike,2024-12-31 17:10:03.113,2024-12-31 17:17:21.838,Albany Ave & Bloomingdale Ave,California Ave & Milwaukee Ave,member,0 days 00:07:18.725000
178370,electric_bike,2024-12-01 14:39:47.216,2024-12-01 14:45:21.268,Albany Ave & Bloomingdale Ave,California Ave & Milwaukee Ave,member,0 days 00:05:34.052000
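
A timedelta column is awkward to aggregate or plot directly. A minimal sketch of converting it to minutes, assuming a hypothetical two-row frame and a hypothetical `ride_minutes` column name (neither is part of the notebook above):

```python
import pandas as pd

# Hypothetical rows using two of the trips shown in the output above.
toy = pd.DataFrame({
    "started_at": pd.to_datetime(["2024-01-12 15:30:27", "2024-01-29 16:26:17"]),
    "ended_at": pd.to_datetime(["2024-01-12 15:37:59", "2024-01-29 16:56:06"]),
})

# Subtracting two datetime columns yields a timedelta64 column.
toy["ride_length"] = toy["ended_at"] - toy["started_at"]

# total_seconds() turns each timedelta into a float, which is easy to average.
toy["ride_minutes"] = toy["ride_length"].dt.total_seconds() / 60

print(toy[["ride_length", "ride_minutes"]])
```

A numeric column like this can be passed straight to `groupby('member_casual')` aggregations when comparing the two rider groups.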
