# BCycle all-data EDA 

This notebook explores the entire dataset provided by BCycle. The dataset has the following columns:

* `Membership Type`: Text categorical column. Some memberships were renamed during the course of BCycle's operation.
* `Bike`: Integer identifier for the bike used in each trip.
* `Checkout Date`: MM/DD/YY formatted date of the checkout.
* `Checkout Time`: HH:MM AM/PM formatted time of the checkout.
* `Checkout Kiosk`: The kiosk where the bike trip started.
* `Return Kiosk`: The kiosk where the bike trip ended.
* `Duration (Minutes)`: Integer length of the bike trip in minutes (it is not yet clear whether values are rounded).


## Imports and data loading

Before getting started, let's import some useful libraries for analysis and visualization, and enable auto-reloading for external modules such as the bcycle utils library.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
plt.rc('xtick', labelsize=14) 
plt.rc('ytick', labelsize=14) 

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

## Loading and cleaning data

Let's load and clean the dataframe, updating variable types for later visualization.

In [13]:
trip_df = pd.read_csv('../input/all_trips.csv')


# Clean up column names
trip_df.columns = ['membership', 'bike_id', 'checkout_date', 'checkout_time', 'checkout_kiosk', 'return_kiosk', 'duration']

# Combine the date and time columns, use this as the index
trip_df['datetime'] = pd.to_datetime(trip_df['checkout_date'] + ' ' + trip_df['checkout_time'])
# trip_df = trip_df.sort_values('datetime')
trip_df = trip_df.set_index('datetime', drop=True)
trip_df = trip_df.drop(['checkout_date', 'checkout_time'], axis=1)
trip_df['membership'] = trip_df['membership'].astype('category')
trip_df['checkout_kiosk'] = trip_df['checkout_kiosk'].astype('category')
trip_df['return_kiosk'] = trip_df['return_kiosk'].astype('category')
# NOTE: the integer cast fails while bike_id still contains non-numeric IDs, so it is left disabled
# trip_df['bike_id'] = trip_df['bike_id'].astype(np.int32)
print(trip_df.info())
trip_df.head()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 529810 entries, 2013-12-21 09:09:00 to 2016-10-31 11:35:21
Data columns (total 5 columns):
membership        529810 non-null object
bike_id           529810 non-null object
checkout_kiosk    529810 non-null category
return_kiosk      529810 non-null category
duration          529810 non-null int64
dtypes: category(2), int64(1), object(2)
memory usage: 17.2+ MB
None


datetime,membership,bike_id,checkout_kiosk,return_kiosk,duration
2013-12-21 09:09:00,Founding Member (Austin B-cycle),966,4th & Congress,Republic Square,5
2013-12-21 18:36:00,Annual Membership (Austin B-cycle),453,South Congress & Elizabeth,5th & Bowie,10
2013-12-21 18:04:00,Founding Member (Austin B-cycle),116,2nd & Congress,4th & Congress,5
2013-12-21 17:56:00,Founding Member (Austin B-cycle),971,5th & Bowie,2nd & Congress,7
2013-12-21 17:49:00,24-Hour Kiosk (Austin B-cycle),14,Barton Springs & Riverside,City Hall / Lavaca & 2nd,6
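
The date and time combination above leaves format detection to pandas. As a minimal sketch, assuming the raw strings follow the `MM/DD/YY` and `HH:MM AM/PM` layouts described in the column list, an explicit format makes the parsing stricter. The toy rows and the names `raw_df`, `parsed` and `unparsed_df` are made up for illustration; `errors='coerce'` turns any row that does not match into `NaT` so it can be inspected rather than silently misparsed.

```python
import pandas as pd

# Toy rows, made up for illustration (not read from all_trips.csv)
raw_df = pd.DataFrame({
    'checkout_date': ['12/21/13', '12/21/13', '10/31/16'],
    'checkout_time': ['9:09 AM', '6:36 PM', 'not a time'],
})

# Assumed layout: MM/DD/YY followed by HH:MM AM/PM
combined = raw_df['checkout_date'] + ' ' + raw_df['checkout_time']
parsed = pd.to_datetime(combined, format='%m/%d/%y %I:%M %p', errors='coerce')

# Rows that did not match the assumed format end up as NaT
unparsed_df = raw_df[parsed.isna()]
print(parsed)
print(unparsed_df)
```

If `unparsed_df` is non-empty on the real data, the assumed format does not cover every row and the mismatching rows are worth a closer look before indexing by `datetime`.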


In [37]:
# trip_df['bike_id'].astype(np.int32)

# Find all rows where the `bike_id` is not numeric
# trip_df[trip_df['bike_id'] == 'Block01']

# Which rows contain non-numeric IDs?
# Use a raw string so that \D (any non-digit character) is not treated as a string escape
text_bikes_df = trip_df[trip_df['bike_id'].str.contains(r'\D')]
text_bikes_df.groupby('bike_id').size()

bike_id
198BB      683
Block01      7
Block02     15
Block03      4
Block04     10
Block05      6
dtype: int64
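
An alternative to the regular-expression check is to let pandas attempt the numeric conversion and flag whatever fails. The following is a minimal sketch on a toy series whose values mirror the kinds of IDs seen above; the names `bike_ids`, `numeric_ids` and `clean_ids` are illustrative only. Note that `pd.to_numeric` is more permissive than `\D`: a value such as `'1.5'` would count as numeric here.

```python
import pandas as pd

# Toy bike IDs stored as strings
bike_ids = pd.Series(['966', '453', '198BB', 'Block01', '14'])

# Non-numeric IDs become NaN instead of raising an error
numeric_ids = pd.to_numeric(bike_ids, errors='coerce')
is_text_id = numeric_ids.isna()

# The IDs that could not be converted
print(bike_ids[is_text_id].tolist())

# Keep only the numeric IDs and cast them to a compact integer type
clean_ids = numeric_ids[~is_text_id].astype('int32')
print(clean_ids.tolist())
```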

In [38]:
text_bikes_df

datetime,membership,bike_id,checkout_kiosk,return_kiosk,duration
2014-01-08 23:41:00,24-Hour Kiosk (Austin B-cycle),Block01,Convention Center / 4th St. @ MetroRail,Convention Center / 4th St. @ MetroRail,0
2014-03-06 11:46:00,24-Hour Kiosk (Austin B-cycle),198BB,Convention Center / 4th St. @ MetroRail,Red River & 8th Street,41
2014-03-07 17:43:00,24-Hour Kiosk (Austin B-cycle),198BB,Republic Square @ Guadalupe & 4th St.,Convention Center / 4th St. @ MetroRail,885
2014-03-07 16:21:00,Annual Membership (Austin B-cycle),198BB,Convention Center / 4th St. @ MetroRail,Republic Square @ Guadalupe & 4th St.,13
2014-03-07 16:06:00,24-Hour Kiosk (Austin B-cycle),Block01,Convention Center / 4th St. @ MetroRail,Convention Center / 4th St. @ MetroRail,0
2014-03-07 16:00:00,Annual Membership (Austin B-cycle),198BB,5th & Bowie,Convention Center / 4th St. @ MetroRail,12
2014-03-07 11:04:00,24-Hour Kiosk (Austin B-cycle),Block03,Plaza Saltillo,Plaza Saltillo,1
2014-03-07 13:16:00,24-Hour Kiosk (Austin B-cycle),198BB,Red River & 8th Street,5th & Bowie,11
2014-03-08 10:25:00,24-Hour Kiosk (Austin B-cycle),198BB,Convention Center / 4th St. @ MetroRail,Repair Shop,360
2014-03-09 18:25:00,24-Hour Kiosk (Austin B-cycle),198BB,City Hall / Lavaca & 2nd,Davis at Rainey Street,16
