# Assignment 3 

# Table of Contents
## Overview
1. Where is 307?

## Data Exploration
1. People's Behavior in terms of Dwell Time 
2. Which areas of 307 do people pass through
3. Where do people tend to linger?
4. How does dwell time change over time?

## In-depth Analysis
1. How do different zones affect people's behavior?
2. How do events affect people's behavior?
3. What is the best maintenance strategy?
4. What other factors affect people's behavior?

# About 307

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg 

In [2]:
import plotly as py
import plotly.express as px
import plotly.graph_objs as go
from plotly.offline import iplot, init_notebook_mode
#init_notebook_mode(connected=True)

import cufflinks as cf
cf.go_offline(connected=True)
cf.set_config_file(colorscale='plotly', world_readable=True)

# Extra options
# pd.options.display.max_rows = 30
# pd.options.display.max_columns = 25

# Show all code cells outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

import os
from IPython.display import Image, display, HTML

In [3]:
import ipywidgets as widgets
from ipywidgets import interact, interact_manual

In [4]:
# store login data in login.py
%run login.py

In [5]:
# login query as multiline formatted string
# this assumes that login and pwd are defined 
# above

loginquery = f"""
mutation {{
  logIn(
      email:\"{login}\",
      password:\"{pwd}\") {{
    jwt {{
      token
      exp
    }}
  }}
}}
"""
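
The doubled braces in the query above are needed because it is an f-string: `{{` and `}}` render as literal braces, while `{login}` and `{pwd}` are substituted. A minimal sketch with placeholder credentials (`demo_login` and `demo_pwd` are hypothetical names and values, not real ones) shows what the string looks like once formatted:

```python
# Hypothetical placeholder credentials, for illustration only.
demo_login = "user@example.com"
demo_pwd = "not-a-real-password"

# In an f-string, "{{" and "}}" render as literal braces,
# while "{name}" is replaced by the value of the variable.
demo_query = f"""
mutation {{
  logIn(email: "{demo_login}", password: "{demo_pwd}") {{
    jwt {{ token exp }}
  }}
}}
"""

print(demo_query)
```

The same escaping rule applies to the `str.format` queries used later in this notebook.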

In [6]:
import requests
url = 'https://api.numina.co/graphql'

mylogin = requests.post(url, json={'query': loginquery})
# mylogin

In [7]:
token = mylogin.json()['data']['logIn']['jwt']['token']

In [8]:
expdate = mylogin.json()
# expdate

# Explore the Data!

Now that you've been provided with the context, before we present our analysis, it's time for YOU to explore the data! As mentioned, the following are the full areas covered by the three cameras:

Streetscape | Under Raincoat | Outside
------------- | -------------  | -------------
![alt](streetscape_sandbox.png) | ![alt](underraincoat_sandbox.png) | ![alt](outside_sandbox.png)

As you see in the above images, each area essentially consists of two parts: objects such as tables and chairs, and empty spaces presumably for walking. Based on this reasoning, we have defined the following smaller behaviour zones so as to perform more in-depth research:

### Streetscape ###

Chair Zone | Corridor Zone | Free Zone
------------- | -------------  | -------------
![alt](BehaviorZoneImage/Streetscape-ChairZone.png) | ![alt](BehaviorZoneImage/Streetscape-PathZone.png) | ![alt](BehaviorZoneImage/Streetscape-ActivityZone.png)

### Under Raincoat ###

Chair Zone | Traffic Zone | Free Zone
------------- | -------------  | -------------
![alt](BehaviorZoneImage/UnderRaincoat-ChairZone.png) | ![alt](BehaviorZoneImage/UnderRaincoat-TrafficZone.png) | ![alt](BehaviorZoneImage/UnderRaincoat-ActivityZone.png)

### Outside ###

Chair Zone | Path Zone | -
------------- | -------------  | -------------
![alt](BehaviorZoneImage/Outside-ChairZone.png) | ![alt](BehaviorZoneImage/Outside-PathZone.png) | ![alt](blank.png)

Note that the chairs can be moved, so the images above may not reflect the layout of the room during the whole period of data collection. In particular, the three sets of chairs in the Under Raincoat area are easy to move; thus, in the initial exploration, we will not investigate the Chair Zone of Under Raincoat.

Nonetheless, these chairs are included in the Free Zone, and we believe it is safe to assume that they would not be moved out of the Free Zone into the Traffic Zone.

Similarly, in the Streetscape area, assuming the chairs are meant to be kept together, it is unlikely that the group would be moved around freely or frequently, given the other obstacles in the room. As for the Outside area, it is also unlikely that chairs would be placed in the middle of the road, blocking the path. Thus, we will analyze these two Chair Zones (while keeping this limitation in mind).

In [9]:
device_dict = {'SWLSANDBOX1':'Streetscape', 'SWLSANDBOX2':'Under Raincoat', 'SWLSANDBOX3':'Outside'}
device_ids = list(device_dict.keys())
device_names = list(device_dict.values())

# streetscape, under raincoat, outside
device_clrs = ['royalblue', 'firebrick', 'forestgreen']

In [10]:
def get_zones(device_id):
    
    query_zones = """
    query {{
      behaviorZones (
        serialnos: "{0}"
        ) {{
        count
        edges {{
          node {{
            rawId
            text
          }}
        }}
      }}
    }}
    """.format(device_id)
    
    zones = requests.post(url, json={'query': query_zones}, headers = {'Authorization':token})
    
    df = pd.DataFrame([x['node'] for x in zones.json()['data']['behaviorZones']['edges']])
    df['device_id'] = device_id
    
    return df

In [11]:
zones_df = pd.concat([get_zones(device_ids[i]) for i in range(3)])
zones_df = zones_df[(zones_df.text.notnull()) & 
                    (zones_df.text.str.startswith('x-')) & 
                    (zones_df.text.str.endswith('zone'))]

In [12]:
zones_df['text'] = zones_df['text'].str.replace('x-', '')
zones_df['type'] = ['path', 'rest', 'both', 'path', 'both', 'rest', 'path']

# zone ID from int to str
zones_df.rawId = zones_df.rawId.astype(str)
zone_name_dict = dict(zip(zones_df.rawId, zones_df.text))
zone_type_dict = dict(zip(zones_df.rawId, zones_df.type))

In [13]:
def get_dwell(func, ID, interval):
    '''
    func is either feedDwellTimeDistribution or zoneDwellTimeDistribution
    '''
    if func == 'feedDwellTimeDistribution':
        arg = 'serialnos: "{0}"'.format(ID)
    else:
        arg = 'zoneIds: {0}'.format(ID)
        
    query = """
    query {{
        {0}(
        {1},
        startTime: "2019-02-20T00:00:00",
        endTime: "2020-01-12T00:00:00",
        timezone: "America/New_York",
        objClasses: ["pedestrian"],
        interval: "{2}"
        ){{
        edges {{
          node {{
            time
            objClass
            pct100
            pct75
            pct50
            pct25
            mean
            count
          }}
        }}
      }}
    }}
    """.format(func, arg, interval)

    dwell = requests.post(url, json={'query': query}, 
                           headers = {'Authorization':token})
    
    df = pd.DataFrame([x['node'] for x in dwell.json()['data'][func]['edges']])
    if func == 'feedDwellTimeDistribution':
        df['device_id'] = ID
    else:
        df['zone_id'] = ID
    
    return df

In [14]:
def preprocess(df):
    # replace NaN with 0
    df = df.fillna(0)
    # convert time
    df['time'] = df['time'].str[:-6].apply(lambda x : pd.Timestamp(x))
    df['month'] = df['time'].dt.month
    df['dayofweek'] = df['time'].dt.dayofweek
    df['hour'] = df['time'].dt.hour
    df['date'] = df['time'].dt.date
    
    # add either zone or device name
    if 'zone_id' in df.columns:
        df.zone_id = df.zone_id.astype(str)
        df['zone'] = [zone_name_dict[z] for z in df.zone_id]
        df['zone_type'] = [zone_type_dict[z] for z in df.zone_id]
    else:
        df['device'] = [device_dict[d] for d in df.device_id]
    
    # add a total column = mean * count
    df['total_dwell'] = df['mean'] * df['count']
    df = df.rename(columns={'mean':'mean_dwell', 'pct50':'median_dwell', 'pct100':'max_dwell'})
    df = df.drop(['pct75', 'pct25'], axis=1)
    
    return df
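
Two steps in `preprocess` are easy to miss: `str[:-6]` drops the trailing UTC offset so that timestamps are treated as naive local times, and `total_dwell` is rebuilt as mean times count because only per-interval summaries are available. The sketch below illustrates both on two rows whose `mean` and `count` values match the hourly Streetscape output shown later; the `-05:00` suffix is an assumption about the raw timestamp format, and `toy` is a hypothetical name.

```python
import pandas as pd

# Two illustrative rows; the "-05:00" suffix is an assumed offset format.
toy = pd.DataFrame({
    "time": ["2019-02-20T09:00:00-05:00", "2019-02-20T10:00:00-05:00"],
    "mean": [21.8, 8.38],
    "count": [18, 200],
})

# Drop the 6-character offset and parse the rest as a naive timestamp.
toy["time"] = pd.to_datetime(toy["time"].str[:-6])

# Total dwell in the interval = average dwell * number of pedestrians.
toy["total_dwell"] = toy["mean"] * toy["count"]

print(toy)
```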

In [15]:
# hourly dwell time 
# device
feed_dwell_1h_df = pd.concat([get_dwell('feedDwellTimeDistribution', device_ids[i], '1h') 
                              for i in range(3)])
# zone
zone_dwell_1h_df = pd.concat([get_dwell('zoneDwellTimeDistribution', z, '1h')
                             for z in zones_df['rawId'].values])

feed_dwell_1h_df = preprocess(feed_dwell_1h_df)
zone_dwell_1h_df = preprocess(zone_dwell_1h_df)

In [16]:
# daily dwell time 
# device
feed_dwell_1d_df = pd.concat([get_dwell('feedDwellTimeDistribution', device_ids[i], '1d') 
                              for i in range(3)])
# zone
zone_dwell_1d_df = pd.concat([get_dwell('zoneDwellTimeDistribution', z, '1d')
                             for z in zones_df['rawId'].values])

feed_dwell_1d_df = preprocess(feed_dwell_1d_df)
zone_dwell_1d_df = preprocess(zone_dwell_1d_df)

In [17]:
# assign a colour to each behaviour zone
#zones_df['colour'] = ['blue', 'lightblue', 'cadetblue',
#                      'orangered', 'lightcoral', 
#                      'palegreen', 'lightgreen']

In [18]:
def get_df(groupby, interval):
    if groupby == 'device' and interval == '1d':
        return feed_dwell_1d_df.copy(), device_names
    elif groupby == 'zone' and interval == '1d':
        return zone_dwell_1d_df.copy(), list(zones_df.text)
    elif groupby == 'device' and interval == '1h':
        return feed_dwell_1h_df.copy(), device_names
    elif groupby == 'zone' and interval == '1h':
        return zone_dwell_1h_df.copy(), list(zones_df.text)

Recall that the timeframe of our data is approximately one year. Therefore, in the initial exploration, let's focus on the daily dwell time and daily count of pedestrians in the 307 region.

As a starting point, explore the data using the following interactive line plot and think about these questions:
1. Is there any trend in pedestrian count / dwell time in any of the areas / zones?
2. Where would you expect to see a bigger crowd? Is any of the areas / zones more popular than others?

Tip: You can click the legend on the right to include/exclude a line on the plot.
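
If the weekly cycle makes a trend hard to judge by eye, a rolling mean can help. This is a minimal sketch on synthetic counts, not part of our analysis; `days`, `counts` and `trend` are hypothetical names and the numbers are invented.

```python
import numpy as np
import pandas as pd

# Synthetic daily counts: a slow upward drift plus a Saturday bump (not real data).
days = pd.date_range("2019-02-20", periods=28, freq="D")
counts = pd.Series(100 + 2 * np.arange(28) + 30 * (days.dayofweek == 5), index=days)

# A centered 7-day rolling mean averages out the weekly cycle,
# leaving the longer-term trend easier to see.
trend = counts.rolling(window=7, center=True).mean()

print(trend.dropna().head())
```

Because every 7-day window contains exactly one Saturday, the bump contributes the same amount to each window and only the drift remains.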

In [19]:
metric_list = ['count', 'mean_dwell', 'max_dwell', 'median_dwell', 'total_dwell']

In [20]:
def plot_byhour(groupby, metric, quantiles):
    # byhour
    df, byvals = get_df(groupby, '1h')
    
    # filter based on quantiles
    df = df[(df[metric] <= df[metric].quantile(quantiles[1])) &
            (df[metric] >= df[metric].quantile(quantiles[0]))].sort_values('hour')
    
    df['-'] = '-'
    # plot differently based on whether device or zone
    title = 'change in '+metric+' over different times of the day, with point size being count'
    # device
    if groupby=='device':
        fig = px.scatter(df, x='-', y=metric, color='device', facet_col='device', size='count',
                     animation_frame='hour', opacity=0.5, size_max=30,
                     title=title)
    # zone
    else:
        df['device'] = df['zone'].apply(lambda x : x.split('-')[0])
        fig = px.scatter(df, y=metric, x='device', color='device', facet_col='zone_type', size='count',
                     animation_frame='hour', opacity=0.5, size_max=30,
                     title=title)
        
    # labels
    #fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[1]))
    fig.layout.update(showlegend=True)
    fig.update_xaxes(showticklabels=False)
    fig.update_xaxes(title='')
    fig.show()

In [21]:
_ = interact(plot_byhour, groupby=widgets.RadioButtons(options=['device', 'zone'], value='zone'),
             metric=widgets.Dropdown(options=metric_list[1:], value=metric_list[1]),
             quantiles=widgets.FloatRangeSlider(value=[0, 0.98], min=0, max=1, step=0.01, continuous_update=False))
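
The `quantiles` slider trims extreme values before plotting, so that a handful of very long dwell times do not flatten the rest of the chart. A minimal sketch of the same filter on synthetic values (all names and numbers here are hypothetical):

```python
import pandas as pd

# Synthetic dwell values with one extreme outlier (not real data).
toy = pd.DataFrame({"mean_dwell": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 500.0]})

lo_q, hi_q = 0.0, 0.9  # same role as the slider's (lower, upper) pair
lo_val = toy["mean_dwell"].quantile(lo_q)
hi_val = toy["mean_dwell"].quantile(hi_q)

# Keep only rows whose value lies between the two quantile cut-offs.
trimmed = toy[toy["mean_dwell"].between(lo_val, hi_val)]
print(trimmed["mean_dwell"].max())
```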


In [22]:
px.scatter_matrix(feed_dwell_1d_df, ['count', 'mean_dwell', 'dayofweek', 'month'], color='device', opacity=0.5)

In [57]:
(df.count_prop * df['count']).astype(int)

0      3271
1       157
2        82
3         7
4         5
       ... 
321     597
322      20
323      16
324      27
325      16
Length: 978, dtype: int32

In [68]:
feed_dwell_1d_df.groupby('device').mean()

device | count | mean_dwell | max_dwell | median_dwell | month | dayofweek | hour | total_dwell
--- | --- | --- | --- | --- | --- | --- | --- | ---
Outside | 311.496933 | 11.430675 | 1012.086902 | 3.77411 | 7.128834 | 3.006135 | 0.0 | 5175.715337
Streetscape | 1161.288344 | 22.30589 | 1623.051012 | 6.326963 | 7.128834 | 3.006135 | 0.0 | 26999.74911
Under Raincoat | 129.0 | 13.659264 | 344.390491 | 3.840706 | 7.128834 | 3.006135 | 0.0 | 2803.380859


In [81]:
def plot_prop(groupby, metric, lower_bound):
    df, _ = get_df(groupby, '1d')
    df['date'] = df.time.dt.date
    df['total_prop'] = df.apply(lambda x : 
                                x['total_dwell'] / (0.00001 + sum(df.loc[df['date'] == x['date'], 'total_dwell'])), 
                                axis=1)
    df['count_prop'] = df.apply(lambda x : 
                                x['count'] / (0.00001 + sum(df.loc[df['date'] == x['date'], 'count'])), 
                                axis=1)
    fig = px.bar(df[(df['count']/df['count_prop'] > lower_bound)], 
                 x='time', y='count_prop', color='device', range_y=(0, 1))
    fig.show()

In [82]:
_ = interact(plot_prop, groupby=widgets.RadioButtons(options=['device', 'zone'], value='device'),
             metric=widgets.RadioButtons(options=['count', 'total_dwell'], value='count'),
             lower_bound=widgets.IntSlider(value=100, min=0, max=2000, step=50, continuous_update=False))


In [92]:
df.groupby(['dayofweek', 'device']).mean()

dayofweek | device | count | mean_dwell | max_dwell | median_dwell | month | hour | total_dwell | total_prop | count_prop
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
0 | Outside | 287.434783 | 11.125217 | 623.101739 | 3.99 | 7.282609 | 0.0 | 4383.307826 | 0.19359 | 0.236951
0 | Streetscape | 881.695652 | 22.281522 | 1392.813696 | 6.497826 | 7.282609 | 0.0 | 19729.177391 | 0.655057 | 0.604631
0 | Under Raincoat | 126.413043 | 12.944565 | 236.267826 | 4.321957 | 7.282609 | 0.0 | 2620.628043 | 0.107875 | 0.11494
1 | Outside | 410.043478 | 15.123261 | 1733.53087 | 4.036739 | 7.304348 | 0.0 | 7191.876304 | 0.247261 | 0.309622
1 | Streetscape | 1166.934783 | 21.604565 | 1395.947609 | 6.630217 | 7.304348 | 0.0 | 29482.998696 | 0.606661 | 0.559979
1 | Under Raincoat | 130.086957 | 15.036304 | 474.341087 | 3.5 | 7.304348 | 0.0 | 2903.969348 | 0.1026 | 0.086921
2 | Outside | 270.361702 | 10.968085 | 703.95234 | 3.747021 | 6.978723 | 0.0 | 4740.570213 | 0.159923 | 0.221972
2 | Streetscape | 1301.404255 | 21.255532 | 1453.441277 | 6.675532 | 6.978723 | 0.0 | 29527.366809 | 0.670521 | 0.618594
2 | Under Raincoat | 139.638298 | 15.892553 | 496.183617 | 3.881489 | 6.978723 | 0.0 | 3524.41234 | 0.105726 | 0.095604
3 | Outside | 407.680851 | 11.672553 | 1850.332553 | 3.641489 | 7.0 | 0.0 | 7628.371915 | 0.163893 | 0.227865


In [99]:
def compute_prop(df):
    df = df.copy()
    df['total_prop'] = df.apply(lambda x : 
                                x['total_dwell'] / (0.00001 + sum(df.loc[df['time'] == x['time'], 'total_dwell'])), 
                                axis=1)
    df['count_prop'] = df.apply(lambda x : 
                                x['count'] / (0.00001 + sum(df.loc[df['time'] == x['time'], 'count'])), 
                                axis=1)
    return df
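
The row-wise `apply` above rescans the whole frame for every row, which gets slow on the hourly data. A vectorized alternative is sketched below, assuming the same `time`, `count` and `total_dwell` columns; `compute_prop_fast` is a hypothetical name, not a function used elsewhere in this notebook. Unlike the version above, it does not add a small constant to the denominator, so a lone non-zero row gets exactly 1 rather than a value just below 1.

```python
import pandas as pd

def compute_prop_fast(df):
    """Share of each row's total_dwell and count within its timestamp."""
    out = df.copy()
    for col, prop in [("total_dwell", "total_prop"), ("count", "count_prop")]:
        # Sum of the column over all rows sharing the same timestamp.
        group_sum = out.groupby("time")[col].transform("sum")
        # Rows whose group total is 0 get a proportion of 0 instead of NaN.
        out[prop] = (out[col] / group_sum).fillna(0.0)
    return out

# Synthetic example: two devices observed at two timestamps.
toy = pd.DataFrame({
    "time": ["t1", "t1", "t2", "t2"],
    "count": [30, 10, 0, 0],
    "total_dwell": [300.0, 100.0, 0.0, 0.0],
})
result = compute_prop_fast(toy)
print(result)
```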

In [125]:
a = feed_dwell_1h_df.groupby('time').mean()
b = list(a[a['count'] != 0].index)
c = fd_1h_prop_hour[fd_1h_prop_hour['time'].isin(b)]

In [127]:
c

index | time | device | count | mean_dwell | max_dwell | median_dwell | month | dayofweek | hour | total_dwell | total_prop | count_prop
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
27 | 2019-02-20 09:00:00 | Outside | 0 | 0.00 | 0.00 | 0.00 | 2 | 2 | 9 | 0.00 | 0.000000 | 0.000000
28 | 2019-02-20 09:00:00 | Streetscape | 18 | 21.80 | 72.48 | 7.23 | 2 | 2 | 9 | 392.40 | 1.000000 | 0.999999
29 | 2019-02-20 09:00:00 | Under Raincoat | 0 | 0.00 | 0.00 | 0.00 | 2 | 2 | 9 | 0.00 | 0.000000 | 0.000000
30 | 2019-02-20 10:00:00 | Outside | 0 | 0.00 | 0.00 | 0.00 | 2 | 2 | 10 | 0.00 | 0.000000 | 0.000000
31 | 2019-02-20 10:00:00 | Streetscape | 200 | 8.38 | 153.06 | 4.19 | 2 | 2 | 10 | 1676.00 | 1.000000 | 1.000000
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
23452 | 2020-01-11 17:00:00 | Streetscape | 0 | 0.00 | 0.00 | 0.00 | 1 | 5 | 17 | 0.00 | 0.000000 | 0.000000
23453 | 2020-01-11 17:00:00 | Under Raincoat | 0 | 0.00 | 0.00 | 0.00 | 1 | 5 | 17 | 0.00 | 0.000000 | 0.000000
23454 | 2020-01-11 18:00:00 | Outside | 1 | 3.69 | 3.69 | 3.69 | 1 | 5 | 18 | 3.69 | 0.999997 | 0.999990
23455 | 2020-01-11 18:00:00 | Streetscape | 0 | 0.00 | 0.00 | 0.00 | 1 | 5 | 18 | 0.00 | 0.000000 | 0.000000


In [106]:
feed_dwell_1d_prop = compute_prop(feed_dwell_1d_df)

In [107]:
fd_1d_prop_week = feed_dwell_1d_prop.groupby(['dayofweek', 'device']).mean().reset_index()
px.bar(fd_1d_prop_week[fd_1d_prop_week['count']!=0], x='dayofweek', y='count_prop', color='device', barmode='group')

In [100]:
feed_dwell_1h_prop = compute_prop(feed_dwell_1h_df)

In [104]:
feed_dwell_1h_prop

index | count | mean_dwell | objClass | max_dwell | median_dwell | time | device_id | month | dayofweek | hour | device | total_dwell | date | total_prop | count_prop
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
0 | 0 | 0.0 | pedestrian | 0.0 | 0.0 | 2019-02-20 00:00:00 | SWLSANDBOX1 | 2 | 2 | 0 | Streetscape | 0.0 | 2019-02-20 | 0.0 | 0.0
1 | 0 | 0.0 | pedestrian | 0.0 | 0.0 | 2019-02-20 01:00:00 | SWLSANDBOX1 | 2 | 2 | 1 | Streetscape | 0.0 | 2019-02-20 | 0.0 | 0.0
2 | 0 | 0.0 | pedestrian | 0.0 | 0.0 | 2019-02-20 02:00:00 | SWLSANDBOX1 | 2 | 2 | 2 | Streetscape | 0.0 | 2019-02-20 | 0.0 | 0.0
3 | 0 | 0.0 | pedestrian | 0.0 | 0.0 | 2019-02-20 03:00:00 | SWLSANDBOX1 | 2 | 2 | 3 | Streetscape | 0.0 | 2019-02-20 | 0.0 | 0.0
4 | 0 | 0.0 | pedestrian | 0.0 | 0.0 | 2019-02-20 04:00:00 | SWLSANDBOX1 | 2 | 2 | 4 | Streetscape | 0.0 | 2019-02-20 | 0.0 | 0.0
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
7819 | 0 | 0.0 | pedestrian | 0.0 | 0.0 | 2020-01-11 19:00:00 | SWLSANDBOX3 | 1 | 5 | 19 | Outside | 0.0 | 2020-01-11 | 0.0 | 0.0
7820 | 0 | 0.0 | pedestrian | 0.0 | 0.0 | 2020-01-11 20:00:00 | SWLSANDBOX3 | 1 | 5 | 20 | Outside | 0.0 | 2020-01-11 | 0.0 | 0.0
7821 | 0 | 0.0 | pedestrian | 0.0 | 0.0 | 2020-01-11 21:00:00 | SWLSANDBOX3 | 1 | 5 | 21 | Outside | 0.0 | 2020-01-11 | 0.0 | 0.0
7822 | 0 | 0.0 | pedestrian | 0.0 | 0.0 | 2020-01-11 22:00:00 | SWLSANDBOX3 | 1 | 5 | 22 | Outside | 0.0 | 2020-01-11 | 0.0 | 0.0


In [114]:
fd_1h_prop_hour = feed_dwell_1h_prop.groupby(['time', 'device']).mean().reset_index()
px.bar(fd_1h_prop_hour[fd_1h_prop_hour['count_prop']!=0], x='hour', y='count_prop', color='device')

In [138]:
def plot_timeline(groupby, metric):
    '''
    groupby is either 'device' or 'zone';
    metric is a value in metric_list
    '''
    df, byvals = get_df(groupby, '1d')
    
    fig = go.Figure()
    
    # line plot for each name
    for i in range(len(byvals)):
        sub_df = df[df[groupby] == byvals[i]]
        fig.add_trace(go.Scatter(x=sub_df.time, y=sub_df[metric], name=byvals[i]))
    
    # layout - axes labels
    fig.update_layout(
        xaxis_title="time",
        yaxis_title=metric,
        xaxis_rangeslider_visible=True
    )
    # title
    if metric != 'count':
        fig.update_layout(title=f"pedestrian dwell time ({metric}) grouped by '{groupby}'")
    else:
        fig.update_layout(title=f"pedestrian count grouped by '{groupby}'")
    
    fig.show()
    

In [139]:
_ = interact(plot_timeline, 
             groupby=widgets.RadioButtons(options=['device', 'zone'], value='device'),
             metric=widgets.Dropdown(options=metric_list, value='mean_dwell')
            )


Not too surprisingly, we observe a few peak days. The following interactive dataframe summarizes the exact locations and dates:

In [143]:
def sort_dwell_1d(groupby, sortby, ascending, top):
    df, _ = get_df(groupby, '1d')
    
    cols = [groupby, 'time', sortby]
    if sortby == 'count':
        cols.append('mean_dwell')
    elif sortby == 'mean_dwell':
        cols.append('count')
    else:
        cols.append('count')
        cols.append('mean_dwell')
        
    display(df.sort_values(sortby, ascending=ascending).reset_index(drop=True)
              .loc[:int(top)-1, cols])

_ = interact(sort_dwell_1d, 
             groupby=widgets.RadioButtons(options=['device', 'zone'], value='device'),
             sortby=widgets.Dropdown(options=metric_list, value='mean_dwell'),
             top=widgets.IntSlider(value=5, min=1, max=30, step=1, readout_format='d'),
             ascending=widgets.Checkbox(value=False, description='ascending'))


In [144]:
def plot_boxplot(groupby, metric):
    fig = go.Figure()
    
    df, byvals = get_df(groupby, '1d')
    
    for i in range(len(byvals)):
        # Use x instead of y argument for horizontal plot
        fig.add_trace(go.Box(x=df.loc[df[groupby]==byvals[i], metric], name=byvals[i],
                             boxpoints='outliers'))

    # layout - axes labels
    fig.update_layout(
        xaxis_title=metric,
        xaxis_rangeslider_visible=True
    )
    # title
    if metric != 'count':
        fig.update_layout(title=f"distribution of pedestrian dwell time ({metric}) grouped by '{groupby}'")
    else:
        fig.update_layout(title=f"distribution of pedestrian count grouped by '{groupby}'")
    
    fig.show()
    

In [145]:
_ = interact(plot_boxplot, 
             groupby=widgets.RadioButtons(options=['device', 'zone'], value='device'),
             metric=widgets.Dropdown(options=metric_list, value='mean_dwell')
            )


In [30]:
# feed_dwell_1d_df.groupby('device')['count'].describe()

In [None]:
# groupby zone_name / device_name and take the sum for the other columns
# should only investigate the count and total columns

grouped_df = zone_dwell_1d_df.groupby('zone').sum().reset_index(drop=False)\
                             .rename(columns={'zone':'name'})
# DataFrame.append was removed in pandas 2.0; pd.concat works in old and new versions
grouped_df = pd.concat([grouped_df,
                        feed_dwell_1d_df.groupby('device').sum().reset_index(drop=False)
                                        .rename(columns={'device':'name'})])

In [146]:
from plotly.subplots import make_subplots

def plot_barplot(metric):
    '''
    metric is either 'count' or 'total_dwell'
    '''
    fig = make_subplots(rows=1, cols=3)
    
    df = grouped_df.copy()
    m = metric.split(' ')[0]
    
    for i in range(3):
        dname = device_names[i]
        total = df.loc[df.name==dname, m]
        sub_df = df[[n[1:5]==dname[1:5] for n in df.name]]
        sub_df.name = [s[-1] for s in sub_df.name.str.split('-')]
        sub_df['perc'] = sub_df[m].apply(lambda x : x / total * 100)
        
        fig.add_bar(x=sub_df.name, y=sub_df[m], name=dname, row=1, col=i+1)
        
    #fig.update_yaxes(ticksuffix="%", col=1)
    layout = go.Layout(yaxis=dict(range=[0, 100]))
    
    fig.update_layout(title=f"proportion of individual behaviour zones w.r.t. the area in terms of {metric} of pedestrians")

    fig.show()

In [150]:
# _ = interact(plot_barplot, metric=widgets.RadioButtons(options=['count', 'total_dwell'], value='count'))

In [34]:
'''
def boxplot_dwell(groupby, column, bound_factor):
    df, _, _ = get_df(groupby)
    
    q3 = df[column].quantile(0.75) 
    q1 = df[column].quantile(0.25)
    iqr = q3 - q1
    sub_df = df[(df[column] <= q3 + iqr*bound_factor) & 
                  ((df[column] >= q1 - iqr*bound_factor))]
    
    if column == 'count':
        title = f"distribution of count grouped by '{groupby}'" +\
        f" with values {bound_factor} * IQR beyond Q1/Q3 removed"
    else:
        title = f"distribution of mean dwell time grouped by '{groupby}'" +\
        f" with values {bound_factor} * IQR beyond Q1/Q3 removed"
    
    fig = px.box(sub_df, x=groupby, y=column, points="all", title=title)

    fig.show()
''';

In [35]:
'''
_ = interact(boxplot_dwell, 
             groupby=widgets.RadioButtons(options=['device_name', 'zone_name']), value='device_name',
             column=widgets.RadioButtons(options=['count', 'mean'], value='count'),
             bound_factor=widgets.FloatSlider(
                 value=1.5,
                 min=-3,
                 max=10,
                 step=0.1,
                 disabled=False,
                 continuous_update=False,
                 orientation='horizontal',
                 readout=True,
                 readout_format='.1f')
            )
''';

### Obtain heatmap for pedestrians

In [36]:
from datetime import timedelta, datetime
from dateutil.relativedelta import relativedelta
import calendar
START_DATE = datetime(2019, 2, 20, 0, 0, 0)
END_DATE = datetime(2019, 3, 20, 0, 0, 0)
time_delta = relativedelta(days = +1)

In [37]:
import pandas as pd
heatmap_df = pd.DataFrame(columns = ['startTime', 'endTime', 'heatMap'])

In [38]:
def heatmap_query_gen(startTime: str, endTime: str):
    heatmap_query = """
query {{
  feedHeatmaps(
    serialno: "SWLSANDBOX1",
    startTime:"{0}",
    endTime:"{1}",
    objClasses:["pedestrian"],
    timezone:"America/New_York") {{
    edges {{
      node {{
        time
        objClass
        heatmap
      }}
    }}
  }}
}}
""".format(startTime, endTime)
    return heatmap_query

In [39]:
current_date = START_DATE
while current_date < END_DATE:
    start_time_str = current_date.strftime('%Y-%m-%dT%H:%M:%S')
    end_time = current_date + time_delta
    end_time_str = end_time.strftime('%Y-%m-%dT%H:%M:%S')
    heatmap_data = requests.post(url, json={'query': heatmap_query_gen(start_time_str, end_time_str)}, 
                         headers = {'Authorization':token})
    heatmap_json = heatmap_data.json()
    if heatmap_json['data']:
        if 'feedHeatmaps' in heatmap_json['data']:
            heatmap = heatmap_json['data']['feedHeatmaps']['edges'][0]['node']['heatmap']
            temp_df = pd.DataFrame({"startTime":current_date, "endTime":end_time, 'heatMap':heatmap})
            # DataFrame.append was removed in pandas 2.0; use pd.concat instead
            heatmap_df = pd.concat([heatmap_df, temp_df], ignore_index=True)
    current_date = current_date + time_delta
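
The loop above walks through the period one day at a time and requests one heatmap per window. The window logic on its own can be sketched as a small generator; `day_windows` is a hypothetical helper, shown only to make the start/end pairs explicit, and it issues no API requests.

```python
from datetime import datetime, timedelta

def day_windows(start, end, step=timedelta(days=1)):
    """Yield (window_start, window_end) pairs covering [start, end)."""
    current = start
    while current < end:
        yield current, current + step
        current += step

# Same period as START_DATE and END_DATE above.
windows = list(day_windows(datetime(2019, 2, 20), datetime(2019, 3, 20)))
first_start, first_end = windows[0]
print(len(windows), first_start.strftime('%Y-%m-%dT%H:%M:%S'))
```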

In [40]:
ed_heatmap_df = heatmap_df.groupby(['startTime', 'endTime'])['heatMap'].apply(list).reset_index(name='heatMapMatrix')

In [41]:
from IPython.display import display
def plot_heatmap(start_time):
    map_img = mpimg.imread('streetscape_sandbox.png')
    matrix = list(ed_heatmap_df[ed_heatmap_df['startTime'] == start_time]['heatMapMatrix'])[0]
    x = [i[0] for i in matrix] 
    y = [i[1] for i in matrix]
    z = [i[2] for i in matrix]
    fig, ax = plt.subplots(figsize=(15,10))
    ax.scatter(x, y, c=z, s=10, cmap=plt.cm.Wistia) # Other color maps: plt.cm.cmap_d.keys())
    ax.imshow(map_img, aspect='auto')
    plt.axis('off')
    plt.title("Heatmap for date {0}".format(start_time), fontsize=20)
    plt.show()
interact(plot_heatmap, start_time=widgets.DatePicker(value = pd.to_datetime('2019-02-26'), description='Pick a Date'))


## Event vs Non Event Days

### Subsection: Pedestrian Count

In this section, we explore how people's behaviour differs between days with an event and days without one. We obtained the Sidewalk Labs event schedule from the [website](https://www.sidewalktoronto.ca/participate/) and recorded all the events between February 20th, 2019 and January 11th, 2020.

In [42]:
event_dates = pd.read_csv('EventDates.csv')
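
To compare event and non-event days, each calendar day needs a label. The sketch below shows one possible way to derive it, assuming the `Event`, `Starting Time` and `Ending Time` columns used with `EventDates.csv` in this notebook; the two events listed are invented, and `toy_events`, `event_days` and `is_event_day` are hypothetical names.

```python
import pandas as pd

# Hypothetical events; column names follow those used with EventDates.csv.
toy_events = pd.DataFrame({
    "Event": ["Open House", "Workshop"],
    "Starting Time": ["2019-06-15T10:00:00", "2019-07-02T18:00:00"],
    "Ending Time": ["2019-06-16T16:00:00", "2019-07-02T20:00:00"],
})

# Collect every calendar date touched by at least one event.
event_days = set()
for _, row in toy_events.iterrows():
    start = pd.Timestamp(row["Starting Time"]).normalize()
    end = pd.Timestamp(row["Ending Time"]).normalize()
    event_days.update(pd.date_range(start, end, freq="D").date)

# Label a daily index as event / non-event.
days = pd.date_range("2019-06-14", "2019-06-17", freq="D")
is_event_day = pd.Series([d.date() in event_days for d in days], index=days)
print(is_event_day)
```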

We have also obtained the pedestrian count data. We will first explore how the pedestrian count changes from day to day.

In [43]:
outside_count_df = pd.read_csv('OverviewForOutsideCount.csv')
streetscape_count_df = pd.read_csv('OverviewForStreetScapeCount.csv')
under_rain_coat_count_df = pd.read_csv('OverviewForUnderRainCoatCount.csv')

FileNotFoundError: [Errno 2] File b'OverviewForOutsideCount.csv' does not exist: b'OverviewForOutsideCount.csv'

In [None]:
outside_count_df.time = outside_count_df.time.str[:-6]
streetscape_count_df.time = streetscape_count_df.time.str[:-6]
under_rain_coat_count_df.time = under_rain_coat_count_df.time.str[:-6]

In [None]:
from datetime import datetime as dt
outside_count_df.time = outside_count_df.apply(lambda x: dt.strptime(x.time, '%Y-%m-%dT%H:%M:%S'), axis = 1)
streetscape_count_df.time = streetscape_count_df.apply(lambda x: dt.strptime(x.time, '%Y-%m-%dT%H:%M:%S'), axis = 1)
under_rain_coat_count_df.time = under_rain_coat_count_df.apply(lambda x: 
                                                               dt.strptime(x.time, '%Y-%m-%dT%H:%M:%S'), axis = 1)

In [None]:
outside_count_by_day = outside_count_df.resample('d', on='time')['pedestrians'].agg(np.sum)
streetscape_count_by_day = streetscape_count_df.resample('d', on='time')['pedestrians'].agg(np.sum)
under_rain_coat_count_df_by_day = under_rain_coat_count_df.resample('d', on='time')['pedestrians'].agg(np.sum)

In [None]:
fig = go.Figure()
fig = fig.add_trace(go.Scatter(x=outside_count_by_day.index, y=outside_count_by_day.values, 
                         name="Outside",
                         line_color='royalblue'))

fig = fig.add_trace(go.Scatter(x=streetscape_count_by_day.index, y=streetscape_count_by_day.values, 
                         name="Street Scape",
                         line_color='dimgray'))

fig = fig.add_trace(go.Scatter(x=under_rain_coat_count_df_by_day.index, y=under_rain_coat_count_df_by_day.values, 
                         name="Under Rain Coat",
                         line_color='firebrick'))

fig = fig.update_layout(title_text='Pedestrians Count By Day',
                  xaxis_rangeslider_visible=True)
fig.show()

In [None]:
sum_ped_count_by_day = outside_count_by_day + streetscape_count_by_day + under_rain_coat_count_df_by_day

In [None]:
sum_ped_count_by_day

In [None]:
# fig = go.Figure(boxpoints='all')
# fig.add_trace(go.Box(x=sum_ped_count_by_day.values))

From the time series line plot above, we noticed that several days have a significantly higher pedestrian count than the others. We will examine this further at hourly granularity.
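
One way to make "significantly higher" concrete is Tukey's rule, which flags values more than 1.5 times the interquartile range above the third quartile. This is only a sketch on synthetic daily totals, not the criterion we used to pick the days; all names and numbers below are hypothetical.

```python
import pandas as pd

# Synthetic daily totals with one obvious spike (not real data).
days = pd.date_range("2019-06-01", periods=10, freq="D")
daily = pd.Series([100, 110, 95, 105, 102, 98, 900, 101, 107, 99], index=days)

# Tukey's rule: flag values more than 1.5 * IQR above the third quartile.
q1, q3 = daily.quantile(0.25), daily.quantile(0.75)
threshold = q3 + 1.5 * (q3 - q1)
peak_days = daily[daily > threshold]
print(peak_days)
```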

In [None]:
outside_count_by_hour = outside_count_df.resample('H', on='time')['pedestrians'].agg(np.sum)
streetscape_count_by_hour = streetscape_count_df.resample('H', on='time')['pedestrians'].agg(np.sum)
rain_coat_count_by_hour = under_rain_coat_count_df.resample('H', on='time')['pedestrians'].agg(np.sum)

In [None]:
fig = go.Figure()
fig = fig.add_trace(go.Scatter(x=outside_count_by_hour.index, y=outside_count_by_hour.values, 
                         name="Outside",
                         line_color='royalblue'))

fig = fig.add_trace(go.Scatter(x=streetscape_count_by_hour.index, y=streetscape_count_by_hour.values, 
                         name="Street Scape",
                         line_color='dimgray'))

fig = fig.add_trace(go.Scatter(x=rain_coat_count_by_hour.index, y=rain_coat_count_by_hour.values, 
                         name="Under Rain Coat",
                         line_color='firebrick'))

fig = fig.update_layout(title_text='Pedestrians Count By Hour',
                  xaxis_rangeslider_visible=True)
fig.show()

In [None]:
def plot_pedestrian_count_event(event):
    '''
    Display time series pedestrian count of the event specified
    '''
    event_info = event_dates[event_dates.Event == event]
    start = dt.strptime(event_info['Starting Time'].values[0], '%Y-%m-%dT%H:%M:%S')
    end = dt.strptime(event_info['Ending Time'].values[0], '%Y-%m-%dT%H:%M:%S')
    
    outside = outside_count_by_hour[(outside_count_by_hour.index >= start) & (outside_count_by_hour.index <= end)]
    streetscape = streetscape_count_by_hour[(streetscape_count_by_hour.index >= start) & \
                                            (streetscape_count_by_hour.index <= end)]
    rain_coat = rain_coat_count_by_hour[(rain_coat_count_by_hour.index >= start) & \
                                        (rain_coat_count_by_hour.index <= end)]
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=outside.index, y=outside.values, 
                         name="Outside",
                         line_color='royalblue'))

    fig.add_trace(go.Scatter(x=streetscape.index, y=streetscape.values, 
                         name="Street Scape",
                         line_color='dimgray'))

    fig.add_trace(go.Scatter(x=rain_coat.index, y=rain_coat.values, 
                         name="Under Rain Coat",
                         line_color='firebrick'))
    
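    # align on the hourly index so a missing hour in one feed does not break the sum
    total = outside.add(streetscape, fill_value=0).add(rain_coat, fill_value=0)
    fig.add_trace(go.Scatter(x=total.index, y=total.values, 
                         name="Sum",
                         line_color='gold'))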

    fig.update_layout(title_text=f'Pedestrians Count By Event: {event}',
                  xaxis_rangeslider_visible=True)
    fig.show()

In [None]:
_ = interact(plot_pedestrian_count_event, 
             event=widgets.Dropdown(options=event_dates.Event.tolist(), value='Sidewalk Summer Open House'),
            )
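
The "Sum" trace in the event plot adds the three hourly series together. A hedged sketch, on made-up numbers, of how pandas can add series whose hourly indexes do not fully overlap; `a`, `b` and their values are hypothetical.

```python
import pandas as pd

# Hypothetical hourly counts for two cameras; camera b has no row for 02:00.
a = pd.Series(
    [5, 7, 2],
    index=pd.to_datetime(["2019-08-01 00:00", "2019-08-01 01:00", "2019-08-01 02:00"]),
)
b = pd.Series(
    [1, 4],
    index=pd.to_datetime(["2019-08-01 00:00", "2019-08-01 01:00"]),
)

# a.values + b.values would raise here because the arrays differ in length.
# Series.add aligns on the timestamp index and treats a missing hour as 0.
total = a.add(b, fill_value=0)
print(total)
```

Aligning on the index also keeps the summed values attached to the right timestamps if one camera dropped an hour in the middle of an event.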

### Subsection: Dwell Time

In [None]:
# hourly dwell time - device
feed_dwell_1h_df = pd.concat([get_dwell('feedDwellTimeDistribution', device_ids[i], '1h') 
                              for i in range(3)])

In [None]:
feed_dwell_1h_df = feed_dwell_1h_df.dropna()

In [None]:
feed_dwell_1h_df.time = feed_dwell_1h_df.time.str[:-6]

In [None]:
feed_dwell_1h_df['device_name'] = [device_dict[d] for d in feed_dwell_1h_df.device]

In [None]:
feed_dwell_1h_df.time = feed_dwell_1h_df.apply(lambda x: dt.strptime(x.time, '%Y-%m-%dT%H:%M:%S'), axis = 1)
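
The cells above strip the last six characters of each timestamp and then parse the rest row by row. A minimal sketch of the same idea in one vectorized call, assuming the API returns strings with a trailing UTC offset such as `-04:00` (the sample strings below are hypothetical):

```python
import pandas as pd

# Hypothetical API-style timestamps with a trailing UTC offset (assumed format).
raw = pd.Series(["2019-08-01T14:00:00-04:00", "2019-12-01T09:00:00-05:00"])

# Drop the 6-character offset, then parse the local wall-clock time.
local = pd.to_datetime(raw.str[:-6], format="%Y-%m-%dT%H:%M:%S")
print(local)
```

Dropping the offset keeps the local clock time, which is what the hour-of-day comparisons in this notebook rely on.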

In [None]:
def plot_dwell_time_event(event, metric):
    '''
    Display time series of the selected dwell-time metric for the event specified
    '''
    event_info = event_dates[event_dates.Event == event]
    start = dt.strptime(event_info['Starting Time'].values[0], '%Y-%m-%dT%H:%M:%S')
    end = dt.strptime(event_info['Ending Time'].values[0], '%Y-%m-%dT%H:%M:%S')

    dwell_time_df = feed_dwell_1h_df[(feed_dwell_1h_df.time >= start) & (feed_dwell_1h_df.time <= end)]
    
    outside = dwell_time_df[dwell_time_df.device_name == 'Outside']
    streetscape = dwell_time_df[dwell_time_df.device_name == 'Streetscape']
    rain_coat = dwell_time_df[dwell_time_df.device_name == 'Under Raincoat']
    
    fig = go.Figure()
    
    fig.add_trace(go.Scatter(x=outside.time, y=outside[metric], 
                         name="Outside",
                         line_color='royalblue'))

    fig.add_trace(go.Scatter(x=streetscape.time, y=streetscape[metric], 
                         name="Street Scape",
                         line_color='dimgray'))

    fig.add_trace(go.Scatter(x=rain_coat.time, y=rain_coat[metric], 
                         name="Under Rain Coat",
                         line_color='firebrick'))
    
    fig.update_layout(title_text=f"Distribution of Pedestrian dwell time ({metric}) By Event: {event}",
                  xaxis_rangeslider_visible=True)
    
    fig.update_layout(
        xaxis_title="time",
        yaxis_title=metric,
    )
    
    fig.show()
    
    

In [None]:
_ = interact(plot_dwell_time_event, 
             event=widgets.Dropdown(options=event_dates.Event.tolist(), value='Sidewalk Summer Open House'),
             metric=widgets.Dropdown(options=['mean', 'pct100', 'pct75', 'pct50', 'pct25', 'total'], 
                                     value='mean')
            )

## Maintenance Strategy

In [44]:
# need hourly data so writing the query again; can combine with the previous one later
def get_dwell_by_hour(func, ID):
    '''
    func is either feedDwellTimeDistribution or zoneDwellTimeDistribution
    '''
    if func == 'feedDwellTimeDistribution':
        arg = 'serialnos: "{0}"'.format(ID)
    else:
        arg = 'zoneIds: {0}'.format(ID)
        
    query = """
    query {{
        {0}(
        {1},
        startTime: "2019-02-20T00:00:00",
        endTime: "2020-01-12T00:00:00",
        timezone: "America/New_York",
        objClasses: ["pedestrian"],
        interval: "1h"
        ){{
        edges {{
          node {{
            time
            objClass
            pct100
            pct75
            pct50
            pct25
            mean
            count
          }}
        }}
      }}
    }}
    """.format(func, arg)

    dwell = requests.post(url, json={'query': query}, 
                           headers = {'Authorization':token})
    
    df = pd.DataFrame([x['node'] for x in dwell.json()['data'][func]['edges']])
    if func == 'feedDwellTimeDistribution':
        df['device'] = ID
    else:
        df['zone'] = ID
    
    return df
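
`get_dwell_by_hour` relies on two small mechanics: doubled braces in a `str.format` template, and flattening the `edges`/`node` nesting of the response. The sketch below illustrates both without sending a request; `EXAMPLE_SERIAL` and `fake_response` are hypothetical stand-ins, and the payload shape is assumed from how the function reads it.

```python
# Doubled braces survive str.format as literal braces; {0} and {1} are substituted.
template = """
query {{
    {0}(
    {1},
    interval: "1h"
    ){{
    edges {{ node {{ time count }} }}
  }}
}}
"""
query = template.format("feedDwellTimeDistribution", 'serialnos: "EXAMPLE_SERIAL"')

# Hypothetical payload shaped like the response the function expects.
fake_response = {"data": {"feedDwellTimeDistribution": {"edges": [
    {"node": {"time": "2019-08-01T14:00:00-04:00", "count": 12}},
    {"node": {"time": "2019-08-01T15:00:00-04:00", "count": 0}},
]}}}

# One flat record per edge, ready to pass to pd.DataFrame.
edges = fake_response["data"]["feedDwellTimeDistribution"]["edges"]
rows = [edge["node"] for edge in edges]
print(query)
print(rows)
```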

In [45]:
feed_dwell_df = pd.concat([get_dwell_by_hour('feedDwellTimeDistribution', device_ids[i]) 
                           for i in range(3)])

In [46]:
# replace NaN with 0
feed_dwell_df = feed_dwell_df.fillna(0)

# convert time to timestamp object
feed_dwell_df['time'] = feed_dwell_df['time'].str[:-6].apply(lambda x : pd.Timestamp(x))

# add name column in addition to ID
feed_dwell_df['device_name'] = [device_dict[d] for d in feed_dwell_df.device]

In [47]:
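from pandas.api.types import CategoricalDtype
# 2019-03-04 is a Monday, so this yields ['Mon', ..., 'Sun'];
# built with pandas so that `dt` (used as dt.strptime in the event plots) is not rebound to the datetime module
days = [(pd.Timestamp(2019, 3, 4) + pd.Timedelta(days=x)).strftime('%a') for x in range(0, 7)]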
day_type = CategoricalDtype(categories=days, ordered=True)

feed_dwell_df['day of week'] = feed_dwell_df['time'].apply(lambda x: x.strftime('%a')).astype(day_type)
feed_dwell_df['date'] = feed_dwell_df['time'].apply(lambda x: x.strftime('%Y-%m-%d'))
feed_dwell_df['hour'] = feed_dwell_df['time'].apply(lambda x: x.strftime('%H'))
feed_dwell_df['hour'] = pd.to_numeric(feed_dwell_df['hour'])
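
The ordered day-of-week category is what makes later sorting and grouping follow the calendar rather than the alphabet. A small sketch on hypothetical timestamps, using the `.dt` accessor instead of row-wise `apply` (it assumes the default English locale for `%a`):

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# 2019-03-04 is a Monday, so seven consecutive days give Mon..Sun in order.
days = [(pd.Timestamp("2019-03-04") + pd.Timedelta(days=x)).strftime("%a") for x in range(7)]
day_type = CategoricalDtype(categories=days, ordered=True)

# Hypothetical timestamps: a Sunday, a Monday and a Wednesday.
times = pd.Series(pd.to_datetime(["2019-03-10 08:00", "2019-03-04 13:00", "2019-03-06 20:00"]))
dow = times.dt.strftime("%a").astype(day_type)

# Sorting follows calendar order (Mon < Wed < Sun), not alphabetical order.
print(dow.sort_values().tolist())
# The hour is available directly as an integer, with no string round trip.
print(times.dt.hour.tolist())
```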

In [55]:
# note: this keeps the busiest hour of each day (max of the hourly counts), not the daily total
daily_count = feed_dwell_df.groupby(['date', 'device_name'])['count'].max()
daily_count = pd.DataFrame(daily_count).reset_index()
daily_count['date'] = pd.to_datetime(daily_count['date'])

In [49]:
def plot_count(selected, start_date, end_date, threshold):
    '''
    selected is a list of device names to plot;
    start_date and end_date bound the date range (inclusive);
    threshold is the largest daily count to keep in the plot
    '''
    #df = nonzero_df
    df = daily_count
        
    plot_df = df.loc[(df.date >= pd.Timestamp(start_date)) & 
                     (df.date <= pd.Timestamp(end_date))].copy()
    
    fig = go.Figure()
    
    for device in selected:
        sub_df = plot_df[plot_df['device_name'] == device]
        sub_df_under = sub_df[sub_df['count'] <= threshold]
        fig.add_trace(go.Scatter(x=sub_df_under.date, y=sub_df_under['count'], mode='lines', name=device))
        n_under = len(sub_df_under)
        print(f"There are {n_under} days for {device} with a daily pedestrian count at or under {threshold}")
        print(f"There are {len(sub_df) - n_under} days for {device} with a daily pedestrian count above {threshold}")
    
    fig.update_layout(
        title="Pedestrian count under threshold grouped by device",
        xaxis_title="time",
        yaxis_title="count")
    
    fig.show()
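
The threshold logic in `plot_count` can be checked on a toy table before using it for maintenance planning. A sketch under made-up numbers (the `hourly` frame is hypothetical); it mirrors the `daily_count` aggregation, which keeps the largest hourly count per day and device.

```python
import pandas as pd

# Hypothetical hourly counts for one device over two days.
hourly = pd.DataFrame({
    "date": ["2019-08-01", "2019-08-01", "2019-08-02", "2019-08-02"],
    "device_name": ["Outside"] * 4,
    "count": [120, 450, 700, 300],
})

# Same aggregation as daily_count: the largest hourly count per day and device.
daily = hourly.groupby(["date", "device_name"])["count"].max().reset_index()

threshold = 500
under = daily[daily["count"] <= threshold]
print(f"{len(under)} day(s) at or under {threshold}, {len(daily) - len(under)} above")
```

Days at or under the threshold are the candidate low-traffic days for scheduling maintenance.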

In [50]:
# SWLSANDBOX1 = Streetscape
# SWLSANDBOX2 = Under Raincoat
# SWLSANDBOX3 = Outside
_ = interact(plot_count, 
             selected=widgets.SelectMultiple(options=device_names, value=device_names, disabled=False),
             start_date=widgets.DatePicker(value=pd.to_datetime('2019-02-20')),
             end_date=widgets.DatePicker(value=pd.to_datetime('2020-01-12')),
             threshold=widgets.IntSlider(value=500, min=300, max=1000, step=100, readout_format='d')
            )

In [51]:
# TODO: combine the two box plots to a single interactive
def plot_boxplot_count_by_day(threshold):
    fig = go.Figure()
    
    #df, byvals, clrs = get_df(groupby)
    df = feed_dwell_df[feed_dwell_df['count'] <= threshold]
    days = ['Mon', 'Tue','Wed', 'Thu', 'Fri', 'Sat', 'Sun']
    
    for i in reversed(range(len(days))):
        # Use x instead of y argument for horizontal plot
        fig.add_trace(go.Box(x=df.loc[df['day of week']==days[i], 'count'], name=days[i]))

    # layout - axes labels
    #fig.update_layout(
    #    xaxis_title=metric,
    #    xaxis_rangeslider_visible=True
    #)
    # title
    
    fig.update_layout(
        title="Pedestrian count under threshold by day of week",
        xaxis_title="count",
        yaxis_title="day of week")
    
    fig.show()

In [52]:
_ = interact(plot_boxplot_count_by_day, 
             threshold=widgets.IntSlider(value=50, min=50, max=1000, step=50, readout_format='d'))

In [53]:
def plot_boxplot_count_by_hour(threshold):
    fig = go.Figure()
    
    #df, byvals, clrs = get_df(groupby)
    df = feed_dwell_df[feed_dwell_df['count'] <= threshold]
    for j in range(7, 21):
        # one horizontal box per hour of day, 07:00 through 20:00
        fig.add_trace(go.Box(x=df.loc[df['hour']==j, 'count'], name=str(j)))

    # layout - axes labels
    #fig.update_layout(
    #    xaxis_title=metric,
    #    xaxis_rangeslider_visible=True
    #)
    # title
    
    fig.update_layout(
        title="Pedestrian count under threshold grouped by hour",
        xaxis_title="count",
        yaxis_title="hour")
    
    fig.show()

In [54]:
_ = interact(plot_boxplot_count_by_hour, 
             threshold=widgets.IntSlider(value=100, min=50, max=1000, step=50, readout_format='d'))
