## Analysis of Days of Week based on Fremont Bicycle Data
Treating crossings each day as features to learn about the relationships between various days

In [None]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import os
import sklearn
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
import urllib.request

## Get Data
Use the local copy of the data if it exists; otherwise download it via the DOI link from the Zenodo repository. Rename the column headers and compute a total column.

In [None]:
FILENAME = '../data/Fremont_Bridge_Hourly_Bicycle_Counts_by_Month_October_2012_to_present.csv'
URL = 'https://zenodo.org/record/2648564/files/Fremont_Bridge_Hourly_Bicycle_Counts_by_Month_October_2012_to_present.csv?download=1'

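def get_fremont_data(filename=FILENAME, url=URL, force_download=False):
    """Load the Fremont Bridge counts, downloading the CSV first if needed."""
    if force_download or not os.path.exists(filename):
        # Cache the download so later runs read the local copy
        with urllib.request.urlopen(url) as response, open(filename, 'wb') as out_file:
            data = response.read() # a `bytes` object
            out_file.write(data)
    # Use the timestamp column as a DatetimeIndex
    data = pd.read_csv(filename, index_col='Date', parse_dates=True)
    # Shorten the headers; this assumes the CSV has exactly two count columns
    data.columns = ['West', 'East']
    data['Total'] = data['West'] + data['East']
    return data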


In [None]:
data = get_fremont_data()
data.head()

Plot weekly totals as a line graph for a quick overview of the data

In [None]:
plt.style.use('seaborn')  # on matplotlib >= 3.6 this style is named 'seaborn-v0_8'
data.resample('W').sum().plot()

Plot a rolling 365-day sum of the daily totals to see the trend in yearly usage

In [None]:
ax = data.resample('D').sum().rolling(365).sum().plot()
ax.set_ylim(0, None)
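
The rolling sum is empty for the first 364 days, because a 365-day window is only complete from day 365 onward. A minimal sketch on made-up data shows the effect; the series `daily` and the 3-day window are illustrative assumptions, not values from the bridge counts:

```python
import pandas as pd

# Hypothetical daily counts, for illustration only
daily = pd.Series(
    [10, 20, 30, 40, 50],
    index=pd.date_range("2020-01-01", periods=5, freq="D"),
)

# A 3-day window stands in for the 365-day window used above
rolling_total = daily.rolling(3).sum()

# The first two entries are NaN because the window is not yet full
print(rolling_total)
```

Passing `min_periods=1` to `rolling` would fill those leading values with partial sums instead, at the cost of understating the first year.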

Group the data by time of day and plot the mean to inspect bridge usage over the course of a day

In [None]:
data.groupby(data.index.time).mean().plot()

Pivot the data so that each row is a time of day and each column is a date

In [None]:
pivoted = data.pivot_table('Total', index=data.index.time, columns=data.index.date)
pivoted.iloc[:5, :5]
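
The shape of the pivoted table is easier to see on a small example. The sketch below builds three days of made-up hourly counts (the frame `toy` is hypothetical and stands in for `data`) and applies the same `pivot_table` call:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly counts over three days, for illustration only
hours = pd.to_timedelta(np.arange(72), unit="h")
index = pd.Timestamp("2020-01-01") + hours
toy = pd.DataFrame({"Total": np.arange(72)}, index=index)

# Rows become the time of day, columns become the calendar date
toy_pivoted = toy.pivot_table("Total", index=toy.index.time, columns=toy.index.date)

print(toy_pivoted.shape)  # 24 rows (hours) by 3 columns (days)
```

Each column is then one day's 24-hour profile, which is what the later steps treat as a single observation.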

In [None]:
pivoted.plot(legend=False, alpha=0.01)

## Principal Component Analysis
Use PCA to reduce each day's hourly counts to two dimensions and look for patterns in daily usage

In [None]:
X = pivoted.fillna(0).T.values
X.shape

In [None]:
X2 = PCA(2, svd_solver='full').fit_transform(X)

In [None]:
X2.shape
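
How much of the hourly variation the two components retain can be checked with the `explained_variance_ratio_` attribute of a fitted `PCA` object. A minimal sketch on random stand-in data follows; the array `X_demo` is an assumption for illustration, and in the notebook the same calls would be made with `X`:

```python
import numpy as np
from sklearn.decomposition import PCA

# Random stand-in for the (days, 24 hours) matrix; illustrative only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 24))

# Keep the fitted estimator so its attributes can be inspected
pca = PCA(2, svd_solver="full")
X2_demo = pca.fit_transform(X_demo)

# Fraction of the total variance captured by each component
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())
```

A low total would suggest that the two-dimensional scatter plot hides a good part of the structure in the data.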

In [None]:
plt.scatter(X2[:, 0], X2[:, 1])

## Unsupervised Clustering
Fit a two-component Gaussian mixture model to the days and assign each day a cluster label

In [None]:
gmm = GaussianMixture(2)
gmm.fit(X)
labels = gmm.predict(X)
labels
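
The component numbering returned by `GaussianMixture` is arbitrary and can change between runs, so which cluster ends up as 0 or 1 is not guaranteed. One way to make a run repeatable is to fix `random_state`. The sketch below uses two synthetic blobs (illustrative data, not the bridge counts) to show this:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two well-separated synthetic blobs; illustrative only
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
blob_b = rng.normal(loc=10.0, scale=1.0, size=(50, 2))
X_demo = np.vstack([blob_a, blob_b])

# Fitting twice with the same seed gives the same labels
labels_first = GaussianMixture(2, random_state=0).fit(X_demo).predict(X_demo)
labels_second = GaussianMixture(2, random_state=0).fit(X_demo).predict(X_demo)

print(np.array_equal(labels_first, labels_second))
print(np.bincount(labels_first))  # number of points per cluster
```

Any later cell that selects a cluster by number relies on this numbering staying the same.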

In [None]:
plt.scatter(X2[:, 0], X2[:, 1], c=labels, cmap='rainbow')
plt.colorbar()

Show usage patterns of each cluster

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(14, 6))

pivoted.T[labels == 0].T.plot(legend=False, alpha=0.1, ax=ax[0])
pivoted.T[labels == 1].T.plot(legend=False, alpha=0.1, ax=ax[1])

ax[0].set_title('Purple Cluster')
ax[1].set_title('Red Cluster')

## Comparing with Day of Week
Color the points by day of week to see whether weekdays and weekends separate clearly

In [None]:
dayofweek = pd.DatetimeIndex(pivoted.columns).dayofweek
plt.scatter(X2[:, 0], X2[:, 1], c=dayofweek, cmap='rainbow')
plt.colorbar()
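
A numerical counterpart to the scatter plot is a cross-tabulation of cluster label against day of week. The sketch below uses made-up labels; `demo_labels` and `demo_dates` are hypothetical, and in the notebook they would be `labels` and `pivoted.columns`:

```python
import numpy as np
import pandas as pd

# Two hypothetical weeks, starting on Monday 2017-01-02
demo_dates = pd.date_range("2017-01-02", periods=14, freq="D")
demo_dayofweek = demo_dates.dayofweek  # Monday=0 ... Sunday=6

# Made-up cluster labels: 1 on weekdays, 0 on weekends
demo_labels = np.where(demo_dayofweek < 5, 1, 0)

# Count how many days of each weekday fall into each cluster
table = pd.crosstab(
    pd.Series(demo_labels, name="cluster"),
    pd.Series(demo_dayofweek, name="dayofweek"),
)
print(table)
```

With real labels, non-zero counts for weekdays in the weekend-like cluster point to the exceptions worth inspecting.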

## Analyzing Outliers
Usage patterns separate into weekdays and weekends, but there are exceptions. The points selected below are weekdays that fall into the cluster with the holiday-like pattern (cluster 0 in this run). The dates from one year, 2017, are examined:

In [None]:
dates = pd.DatetimeIndex(pivoted.columns)
dates[(labels == 0) & (dayofweek < 5)]


2017-01-02: New Year's Day (observed)

2017-01-16: Martin Luther King Jr. Day: a federal holiday, though not all employers observe it; a march with thousands of participants took place in Seattle [Thousands march](https://www.seattletimes.com/seattle-news/puget-sound/thousands-peacefully-march-rally-in-seattle-to-remember-civil-rights-leader-mlk-jr/)

2017-02-06: a Monday and not a holiday; [Snow Storm](https://www.seattletimes.com/seattle-news/weather/weather-service-predicts-3-to-6-inches-of-snow-in-seattle-area/)

2017-05-29: Memorial Day

2017-07-04: Independence Day

2017-09-04: Labor Day

2017-11-23: Thanksgiving

2017-11-24: Black Friday (not a holiday, but shopping event)

2017-12-25: Christmas

2017-12-26: not a holiday (the day after Christmas)
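
Part of this manual lookup could be automated with the US federal holiday calendar that ships with pandas. The sketch below checks a few of the 2017 dates listed above; the variable names are illustrative, and the calendar covers federal holidays only, so events such as Black Friday or a snow storm still need a manual check:

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# A few candidate dates taken from the 2017 list in this section
candidates = pd.DatetimeIndex(
    ["2017-01-02", "2017-02-06", "2017-11-24", "2017-12-25"]
)

# All US federal holidays (observed dates) in 2017
holidays = USFederalHolidayCalendar().holidays(start="2017-01-01", end="2017-12-31")

# Boolean mask: True where a candidate date is a federal holiday
is_holiday = candidates.isin(holidays)
for date, flag in zip(candidates, is_holiday):
    print(date.date(), "federal holiday" if flag else "not a federal holiday")
```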
