# Assignment 3 

# Table of Contents
## Overview
1. Where is 307?

## Data Exploration
1. People's Behavior in terms of Dwell Time 
2. Which areas of 307 do people pass through
3. Where do people tend to linger?
4. How does dwell time change over time?

## In-depth Analysis
1. How do different zones affect people's behavior?
2. How do events affect people's behavior?
3. What is the best maintenance strategy?
4. What other factors affect people's behavior?

# About 307

In [58]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg 
import matplotlib.gridspec as gridspec

In [2]:
import plotly as py
import plotly.express as px
import plotly.graph_objs as go
from plotly.offline import iplot, init_notebook_mode
#init_notebook_mode(connected=True)

import cufflinks as cf
cf.go_offline(connected=True)
cf.set_config_file(colorscale='plotly', world_readable=True)

# Extra options
# pd.options.display.max_rows = 30
# pd.options.display.max_columns = 25

# Show all code cells outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

import os
from IPython.display import Image, display, HTML

In [3]:
import ipywidgets as widgets
from ipywidgets import HBox, VBox
from ipywidgets import interact, interact_manual

In [4]:
# store login data in login.py
%run login.py

In [5]:
# login query as multiline formatted string
# this assumes that login and pwd are defined 
# above

loginquery = f"""
mutation {{
  logIn(
      email:\"{login}\",
      password:\"{pwd}\") {{
    jwt {{
      token
      exp
    }}
  }}
}}
"""

In [6]:
import requests
url = 'https://api.numina.co/graphql'

mylogin = requests.post(url, json={'query': loginquery})
# mylogin

In [7]:
token = mylogin.json()['data']['logIn']['jwt']['token']

In [8]:
# expiry time of the token, as requested in the login query
expdate = mylogin.json()['data']['logIn']['jwt']['exp']
# expdate
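
The login cells above index straight into `mylogin.json()`, which fails with an unhelpful `KeyError` or `TypeError` if the login is rejected. A minimal sketch of a defensive unwrapping step follows. It assumes only the standard GraphQL response shape (a top-level `data` key and an optional `errors` list); the helper name `unwrap_graphql` is hypothetical and not part of the Numina API.

```python
def unwrap_graphql(payload: dict) -> dict:
    """Return payload['data'], raising if the server reported errors.

    Hypothetical helper: assumes the standard GraphQL response shape
    ({"data": ..., "errors": [...]}), nothing Numina-specific.
    """
    errors = payload.get("errors")
    if errors:
        messages = "; ".join(str(e.get("message", e)) for e in errors)
        raise RuntimeError(f"GraphQL request failed: {messages}")
    data = payload.get("data")
    if data is None:
        raise RuntimeError("GraphQL response contained no data")
    return data


# Example with a hand-written payload shaped like the login response
sample = {"data": {"logIn": {"jwt": {"token": "abc", "exp": "2020-01-01"}}}}
jwt = unwrap_graphql(sample)["logIn"]["jwt"]
```

In the notebook this could be used as `unwrap_graphql(mylogin.json())['logIn']['jwt']['token']`.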

# Explore the Data!

Now that you've been provided with the context, before we present our analysis, it's time for YOU to explore the data! As mentioned, the following are the full areas covered by the three cameras:

Streetscape | Under Raincoat | Outside
------------- | -------------  | -------------
![alt](streetscape_sandbox.png) | ![alt](underraincoat_sandbox.png) | ![alt](outside_sandbox.png)

As you see in the above images, each area essentially consists of two parts: objects such as tables and chairs, and empty spaces presumably for walking. Based on this reasoning, we have defined the following smaller behaviour zones so as to perform more in-depth research:

### Streetscape ###

Chair Zone | Corridor Zone | Free Zone
------------- | -------------  | -------------
![alt](BehaviorZoneImage/Streetscape-ChairZone.png) | ![alt](BehaviorZoneImage/Streetscape-PathZone.png) | ![alt](BehaviorZoneImage/Streetscape-ActivityZone.png)

### Under Raincoat ###

Chair Zone | Traffic Zone | Free Zone
------------- | -------------  | -------------
![alt](BehaviorZoneImage/UnderRaincoat-ChairZone.png) | ![alt](BehaviorZoneImage/UnderRaincoat-TrafficZone.png) | ![alt](BehaviorZoneImage/UnderRaincoat-ActivityZone.png)

### Outside ###

Chair Zone | Path Zone | -
------------- | -------------  | -------------
![alt](BehaviorZoneImage/Outside-ChairZone.png) | ![alt](BehaviorZoneImage/Outside-PathZone.png) | ![alt](blank.png)

Note that the chairs can be moved, so the above images may not reflect the layout of the room during the whole period of data collection. Specifically, the three sets of chairs in the Under Raincoat area can be easily moved; thus in the initial exploration, we will not be investigating the Chair Zone of Under Raincoat.

Nonetheless, notice that they are included in the Free Zone. We believe that it is safe to assume that the chairs would not be moved outside the Free Zone to the Traffic Zone.

Similarly, in the Streetscape area, assuming the chairs are meant to stay grouped together, it is unlikely that the group would be moved around freely or frequently, given the other obstacles in the room. As for the Outside area, it is also unlikely that the chairs would be placed in the middle of the road to block the path. Thus, we will be analyzing these two Chair Zones (while keeping the limitation in mind).

In [8]:
device_dict = {'SWLSANDBOX1':'Streetscape', 'SWLSANDBOX2':'Under Raincoat', 'SWLSANDBOX3':'Outside'}
device_ids = list(device_dict.keys())
device_names = list(device_dict.values())

# streetscape, under raincoat, outside
device_clrs = ['royalblue', 'firebrick', 'forestgreen']

In [9]:
def get_zones(device_id):
    
    query_zones = """
    query {{
      behaviorZones (
        serialnos: "{0}"
        ) {{
        count
        edges {{
          node {{
            rawId
            text
          }}
        }}
      }}
    }}
    """.format(device_id)
    
    zones = requests.post(url, json={'query': query_zones}, headers = {'Authorization':token})
    
    df = pd.DataFrame([x['node'] for x in zones.json()['data']['behaviorZones']['edges']])
    df['device'] = device_id
    
    return df

In [10]:
zones_df = pd.concat([get_zones(device_ids[i]) for i in range(3)])
zones_df = zones_df[(zones_df.text.notnull()) & 
                    (zones_df.text.str.startswith('x-')) & 
                    (zones_df.text.str.endswith('zone'))]

In [11]:
# strip only the leading 'x-' prefix
zones_df['text'] = zones_df['text'].str.replace(r'^x-', '', regex=True)

In [12]:
def get_dwell(func, ID, interval):
    '''
    func is either feedDwellTimeDistribution or zoneDwellTimeDistribution
    '''
    if func == 'feedDwellTimeDistribution':
        arg = 'serialnos: "{0}"'.format(ID)
    else:
        arg = 'zoneIds: {0}'.format(ID)
        
    query = """
    query {{
        {0}(
        {1},
        startTime: "2019-02-20T00:00:00",
        endTime: "2020-01-12T00:00:00",
        timezone: "America/New_York",
        objClasses: ["pedestrian"],
        interval: "{2}"
        ){{
        edges {{
          node {{
            time
            objClass
            pct100
            pct75
            pct50
            pct25
            mean
            count
          }}
        }}
      }}
    }}
    """.format(func, arg, interval)

    dwell = requests.post(url, json={'query': query}, 
                           headers = {'Authorization':token})
    
    df = pd.DataFrame([x['node'] for x in dwell.json()['data'][func]['edges']])
    if func == 'feedDwellTimeDistribution':
        df['device'] = ID
    else:
        df['zone'] = ID
    
    return df

In [13]:
# daily dwell time - device
feed_dwell_1d_df = pd.concat([get_dwell('feedDwellTimeDistribution', device_ids[i], '1d') 
                              for i in range(3)])

In [14]:
# daily dwell time - zone
zone_dwell_1d_df = pd.concat([get_dwell('zoneDwellTimeDistribution', z, '1d')
                             for z in zones_df['rawId'].values])

In [15]:
'''
def extract_time(df):
    df['year'] = df['time'].str[:4].astype(int)
    df['month'] = df['time'].str[5:7].astype(int)
    df['day'] = df['time'].str[8:10].astype(int)
    df['date'] = pd.to_datetime(df['time'].str[:10])
    df['hour'] = df['time'].str[11:13].astype(int)
    return df.drop('time', axis=1)
''';

In [16]:
'''
feed_dwell_df = extract_time(feed_dwell_df)
zone_dwell_df = extract_time(zone_dwell_df)
''';

In [17]:
# replace NaN with 0
feed_dwell_1d_df = feed_dwell_1d_df.fillna(0)
zone_dwell_1d_df = zone_dwell_1d_df.fillna(0)

In [18]:
# convert time to timestamp object
feed_dwell_1d_df['time'] = feed_dwell_1d_df['time'].str[:-6].apply(lambda x : pd.Timestamp(x))
zone_dwell_1d_df['time'] = zone_dwell_1d_df['time'].str[:-6].apply(lambda x : pd.Timestamp(x))
zone_dwell_1d_df.zone = zone_dwell_1d_df.zone.astype(str)
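
Dropping the last six characters works as long as every timestamp ends in a `±HH:MM` offset. A sketch of an alternative that parses the offset instead of slicing it away is shown below. The sample strings are made up to resemble the API output, which is an assumption about its exact format.

```python
import pandas as pd

# Made-up samples resembling the API's ISO-8601 strings (winter and summer offsets)
raw = pd.Series(["2019-02-20T00:00:00-05:00", "2019-06-29T00:00:00-04:00"])

# Parse with the offset, convert to New York time, then drop the timezone
# so the result is comparable with naive datetimes such as START_DATE.
local = (pd.to_datetime(raw, utc=True)
           .dt.tz_convert("America/New_York")
           .dt.tz_localize(None))
```

Both samples come out as local midnight, which is what the slicing approach produces as well.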

In [19]:
# add name column in addition to ID
feed_dwell_1d_df['device_name'] = [device_dict[d] for d in feed_dwell_1d_df.device]

# zone ID from int to str
zones_df.rawId = zones_df.rawId.astype(str)
zone_dict = dict(zip(zones_df.rawId, zones_df.text))
# zone name
zone_dwell_1d_df['zone_name'] = [zone_dict[z] for z in zone_dwell_1d_df.zone]

In [20]:
def get_df(groupby):
    if groupby == 'device_name':
        return feed_dwell_1d_df.copy(), device_names, device_clrs
    else:
        return zone_dwell_1d_df.copy(), list(zones_df.text), list(zones_df.colour)

In [21]:
# assign a colour to each behaviour zone
zones_df['colour'] = ['blue', 'lightblue', 'cadetblue',
                      'orangered', 'lightcoral', 
                      'palegreen', 'lightgreen']

In [22]:
# add a total column = mean * count
zone_dwell_1d_df['total'] = zone_dwell_1d_df['mean'] * zone_dwell_1d_df['count'] 
feed_dwell_1d_df['total'] = feed_dwell_1d_df['mean'] * feed_dwell_1d_df['count'] 

Recall that the timeframe of our data is approximately one year. Therefore, in the initial exploration, let's focus on the daily dwell time and daily count of pedestrians in the 307 region.

As a starting point, explore the data using the following interactive line plot and think about these questions:
1. Is there any trend in pedestrian count / dwell time in any of the areas / zones?
2. Where would you expect to see a bigger crowd? Is any of the areas / zones more popular than others?

Tip: You can click the legend on the right to include/exclude a line on the plot.
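
Daily series are noisy, so a trend can be easier to judge after smoothing. The following is a minimal sketch of a 7-day rolling mean on synthetic data. The column names mirror `feed_dwell_1d_df`, but the values and the window length are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one device's daily counts (not real 307 data):
# a weekly pattern 0, 10, ..., 60 repeated over four weeks
days = pd.date_range("2019-02-20", periods=28, freq="D")
demo = pd.DataFrame({"time": days, "count": (np.arange(28) % 7) * 10})

# 7-day rolling mean indexed by date; min_periods=1 keeps the first days
smoothed = (demo.set_index("time")["count"]
                .rolling(window=7, min_periods=1)
                .mean())
```

A window of seven days averages out any weekly pattern, so what remains is closer to the longer-term trend.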

In [66]:
metric_list = ['count', 'mean', 'pct100', 'pct75', 'pct50', 'pct25', 'total']

In [67]:
def plot_timeline(groupby, metric):
    '''
    groupby is either 'device_name' or 'zone_name';
    metric is a value in ['count', 'mean', 'pct100', 'pct75', 'pct50', 'pct25', 'total']
    '''
    df, byvals, clrs = get_df(groupby)
    
    fig = go.Figure()
    
    # line plot for each name
    for i in range(len(byvals)):
        sub_df = df[df[groupby] == byvals[i]]
        fig.add_trace(go.Scatter(x=sub_df.time, y=sub_df[metric], line_color=clrs[i], name=byvals[i]))
    
    # layout - axes labels
    fig.update_layout(
        xaxis_title="time",
        yaxis_title=metric,
        xaxis_rangeslider_visible=True
    )
    # title
    if metric != 'count':
        fig.update_layout(title=f"pedestrian dwell time ({metric}) grouped by '{groupby}'")
    else:
        fig.update_layout(title=f"pedestrian count grouped by '{groupby}'")
    
    fig.show()
    

In [68]:
_ = interact(plot_timeline, 
             groupby=widgets.RadioButtons(options=['device_name', 'zone_name'], value='device_name'),
             metric=widgets.Dropdown(options=metric_list, value='mean')
            )


Not too surprisingly, we observe a few peak days. The following interactive dataframe summarizes the exact locations and dates:

In [69]:
def sort_dwell_1d(groupby, sortby, ascending, top):
    df, _, _ = get_df(groupby)
    
    cols = [groupby, 'time', sortby]
    if sortby == 'count':
        cols.append('mean')
    elif sortby == 'mean':
        cols.append('count')
    else:
        cols.append('count')
        cols.append('mean')
        
    display(df.sort_values(sortby, ascending=ascending).reset_index(drop=True)
              .loc[:int(top)-1, cols])

_ = interact(sort_dwell_1d, 
             groupby=widgets.RadioButtons(options=['device_name', 'zone_name'], value='device_name'),
             sortby=widgets.Dropdown(options=metric_list, value='mean'),
             top=widgets.IntSlider(value=5, min=1, max=30, step=1, readout_format='d'),
             ascending=widgets.Checkbox(value=False, description='ascending'))
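
Sorting shows the largest values but not which of them are unusually large. One common rule of thumb, the same 1.5 × IQR fence that a box plot uses for outliers, can be sketched as follows. The helper name `flag_peaks` and the sample numbers are hypothetical.

```python
import pandas as pd

def flag_peaks(values: pd.Series, factor: float = 1.5) -> pd.Series:
    """Boolean mask of values above Q3 + factor * IQR (hypothetical helper)."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    return values > q3 + factor * (q3 - q1)

# Made-up daily counts with one obvious peak day
counts = pd.Series([100, 110, 95, 105, 120, 900, 98])
peaks = flag_peaks(counts)
```

Applied per device or per zone, such a mask would single out candidate peak days without having to choose a fixed "top N".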


In [70]:
def plot_boxplot(groupby, metric):
    fig = go.Figure()
    
    df, byvals, clrs = get_df(groupby)
    
    for i in range(len(byvals)):
        # Use x instead of y argument for horizontal plot
        fig.add_trace(go.Box(x=df.loc[df[groupby]==byvals[i], metric], name=byvals[i],
                             marker_color=clrs[i], boxpoints='outliers'))

    # layout - axes labels
    fig.update_layout(
        xaxis_title=metric,
        xaxis_rangeslider_visible=True
    )
    # title
    if metric != 'count':
        fig.update_layout(title=f"distribution of pedestrian dwell time ({metric}) grouped by '{groupby}'")
    else:
        fig.update_layout(title=f"distribution of pedestrian count grouped by '{groupby}'")
    
    fig.show()
    

In [29]:
_ = interact(plot_boxplot, 
             groupby=widgets.RadioButtons(options=['device_name', 'zone_name'], value='device_name'),
             metric=widgets.Dropdown(options=metric_list, value='mean')
            )


In [30]:
# feed_dwell_1d_df.groupby('device')['count'].describe()

In [71]:
# groupby zone_name / device_name and take the sum for the other columns
# should only investigate the count and total columns

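# DataFrame.append was removed in pandas 2.0, so the two summaries are concatenated;
# numeric_only=True skips the timestamp and ID columns
grouped_df = pd.concat([
    zone_dwell_1d_df.groupby('zone_name').sum(numeric_only=True).reset_index(drop=False)
                    .rename(columns={'zone_name':'name'}),
    feed_dwell_1d_df.groupby('device_name').sum(numeric_only=True).reset_index(drop=False)
                    .rename(columns={'device_name':'name'})
])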
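
Only `count` and `total` remain meaningful after this aggregation, because means and percentiles cannot be added across days. A count-weighted overall mean can still be recovered as total / count, as in this sketch on made-up numbers (the column names follow the notebook; the values are illustrative):

```python
import pandas as pd

# Two made-up days for one zone: a quiet day and a busy day
demo = pd.DataFrame({
    "zone_name": ["chairzone", "chairzone"],
    "mean": [10.0, 40.0],   # mean dwell time per day
    "count": [1, 9],        # pedestrians per day
})
demo["total"] = demo["mean"] * demo["count"]

summed = demo.groupby("zone_name")[["count", "total"]].sum()
# Count-weighted mean dwell time over the whole period
summed["overall_mean"] = summed["total"] / summed["count"]
```

Here the weighted mean is 37, whereas naively averaging the two daily means would give 25 and understate the busy day.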

In [72]:
from plotly.subplots import make_subplots

def plot_barplot(metric):
    '''
    metric is either 'count' or 'total dwell time'
    '''
    fig = make_subplots(rows=1, cols=3)
    
    df = grouped_df.copy()
    # 'total dwell time' -> 'total', 'count' -> 'count'
    m = metric.split(' ')[0]
    
    for i in range(3):
        dname = device_names[i]
        # scalar total for the whole area covered by this device
        total = df.loc[df.name==dname, m].iloc[0]
        # rows belonging to this device: the device itself and its zones
        sub_df = df[[n[1:5]==dname[1:5] for n in df.name]].copy()
        sub_df['name'] = [s[-1] for s in sub_df.name.str.split('-')]
        # share of the area total, in percent (computed but not plotted)
        sub_df['perc'] = sub_df[m] / total * 100
        
        fig.add_bar(x=sub_df.name, y=sub_df[m], name=dname, row=1, col=i+1)
        
    #fig.update_yaxes(ticksuffix="%", col=1)
    
    fig.update_layout(title=f"proportion of individual behaviour zones w.r.t. the area in terms of {metric} of pedestrians")

    fig.show()

In [33]:
_ = interact(plot_barplot, metric=widgets.RadioButtons(options=['count', 'total dwell time'], value='count'))


In [34]:
'''
def boxplot_dwell(groupby, column, bound_factor):
    df, _, _ = get_df(groupby)
    
    q3 = df[column].quantile(0.75) 
    q1 = df[column].quantile(0.25)
    iqr = q3 - q1
    sub_df = df[(df[column] <= q3 + iqr*bound_factor) & 
                  ((df[column] >= q1 - iqr*bound_factor))]
    
    if column == 'count':
        title = f"distribution of count grouped by '{groupby}'" +\
        f" with values {bound_factor} * IQR beyond Q1/Q3 removed"
    else:
        title = f"distribution of mean dwell time grouped by '{groupby}'" +\
        f" with values {bound_factor} * IQR beyond Q1/Q3 removed"
    
    fig = px.box(sub_df, x=groupby, y=column, points="all", title=title)

    fig.show()
''';

In [35]:
'''
_ = interact(boxplot_dwell, 
             groupby=widgets.RadioButtons(options=['device_name', 'zone_name']), value='device_name',
             column=widgets.RadioButtons(options=['count', 'mean'], value='count'),
             bound_factor=widgets.FloatSlider(
                 value=1.5,
                 min=-3,
                 max=10,
                 step=0.1,
                 disabled=False,
                 continuous_update=False,
                 orientation='horizontal',
                 readout=True,
                 readout_format='.1f')
            )
''';

### Obtain heatmap for pedestrians
This section plots heatmaps for important days.
The audience will be able to
1. Select heatmaps of days (for comparison)
    1. the top 1/2/4/9 days in terms of dwell counts or average dwell time
    2. event days
    3. a custom selection of 1/2/4/9 days
2. Select quantiles for the heatmaps
    1. 0 - 90: Desired Lines (steps of 10)
    2. 90 - 100: Desired Spots (steps of 1)
3. Show the overlap of heatmaps between traffic and pedestrians for outdoor cameras.
4. Choose the color of the heatmap
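
The quantile selector keeps only the heatmap points whose density is at or above a chosen percentile. A minimal NumPy sketch of that filter is given below, assuming each heatmap entry is an `[x, y, density]` triple, which is how the plotting code further down treats it. `filter_by_percentile` is a hypothetical name.

```python
import numpy as np

def filter_by_percentile(points, percentile):
    """Keep [x, y, density] rows whose density >= the given percentile."""
    arr = np.asarray(points, dtype=float)
    threshold = np.percentile(arr[:, 2], percentile)
    return arr[arr[:, 2] >= threshold]

# Made-up heatmap with densities 0.1 ... 1.0
pts = [[i, i, (i + 1) / 10] for i in range(10)]
top = filter_by_percentile(pts, 90)
```

A percentile of 0 keeps every point, while values close to 100 leave only the densest spots.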

In [23]:
from datetime import timedelta, datetime
from dateutil.relativedelta import relativedelta
import calendar
START_DATE = datetime(2019, 2, 20, 0, 0, 0)
END_DATE = datetime(2020, 1, 11, 0, 0, 0)
time_delta = relativedelta(days = +1)

In [24]:
import pandas as pd

In [25]:
## functions to get the data 
def heatmap_query_gen(startTime: str, endTime: str, camera: str, obj: str):
    heatmap_query = """
query {{
  feedHeatmaps(
    serialno: "{0}",
    startTime:"{1}",
    endTime:"{2}",
    objClasses:["{3}"],
    timezone:"America/New_York") {{
    edges {{
      node {{
        time
        objClass
        heatmap
      }}
    }}
  }}
}}
""".format(camera, startTime, endTime,obj)
    return heatmap_query
def get_heatmap_data(camera: str, obj: str, start_times: list, end_times: list):
    columns = ['startTime', 'endTime', 'heatMap', 'obj']
    # collect one frame per time window and concatenate once at the end
    frames = []
    for start, end in zip(start_times, end_times):
        query = heatmap_query_gen(start.strftime('%Y-%m-%dT%H:%M:%S'),
                                  end.strftime('%Y-%m-%dT%H:%M:%S'), camera, obj)
        heatmap_data = requests.post(url, json={'query': query},
                                     headers = {'Authorization':token})
        heatmap_json = heatmap_data.json()
        data = heatmap_json.get('data') or {}
        edges = (data.get('feedHeatmaps') or {}).get('edges') or []
        # skip windows without a heatmap instead of failing on edges[0]
        if edges:
            heatmap = edges[0]['node']['heatmap']
            frames.append(pd.DataFrame({"startTime":start, "endTime":end, 'heatMap':heatmap, 'obj': obj}))
    if not frames:
        return pd.DataFrame(columns = columns)
    return pd.concat(frames, ignore_index = True)
def generate_consecutive_times(start_time: datetime, end_time: datetime, interval: relativedelta):
    ## the first element in the list are the start times
    time = [[], []]
    current_time = start_time
    while current_time < end_time:
        time[0].append(current_time)
        time[1].append(current_time + interval)
        current_time = current_time + interval
    return time
def daily_heatmap_data(df):
    return df.groupby(['startTime', 'endTime'])['heatMap'].apply(list).reset_index(name='heatMapMatrix')
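
For fixed-length steps such as one day or one hour, the same start/end pairs can be produced with `pandas.date_range`. This is a sketch of an equivalent under that assumption, not a replacement for the `relativedelta`-based version, which also handles calendar steps such as months. `consecutive_windows` is a hypothetical name.

```python
from datetime import datetime, timedelta
import pandas as pd

def consecutive_windows(start, end, step):
    """Return (starts, ends) for back-to-back windows beginning before `end`."""
    step = pd.Timedelta(step)
    starts = pd.date_range(start, end, freq=step)
    # date_range includes `end` itself when it falls on the grid; drop it
    starts = starts[starts < pd.Timestamp(end)]
    return list(starts), list(starts + step)

# Hourly windows for a single day
starts, ends = consecutive_windows(datetime(2019, 6, 29),
                                   datetime(2019, 6, 30),
                                   timedelta(hours=1))
```

The two returned lists line up index by index, like `time[0]` and `time[1]` in `generate_consecutive_times`.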

In [26]:
## load the data, it takes a lot of time, so we do it camera by camera
all_time = generate_consecutive_times(START_DATE, END_DATE, time_delta)
outside_heatmap_pedestrian = daily_heatmap_data(get_heatmap_data('SWLSANDBOX3', 'pedestrian', all_time[0], all_time[1]))

In [27]:
streetscape_heatmap_pedestrian = daily_heatmap_data(get_heatmap_data('SWLSANDBOX1', 'pedestrian', all_time[0], all_time[1]))

In [28]:
underraincoat_heatmap_pedestrian_1 = daily_heatmap_data(get_heatmap_data('SWLSANDBOX2', 'pedestrian', all_time[0][0:100], all_time[1][0:100]))

In [29]:
underraincoat_heatmap_pedestrian_2 = daily_heatmap_data(get_heatmap_data('SWLSANDBOX2', 'pedestrian', all_time[0][100:200], all_time[1][100:200]))

In [30]:
underraincoat_heatmap_pedestrian_3 = daily_heatmap_data(get_heatmap_data('SWLSANDBOX2', 'pedestrian', all_time[0][200:], all_time[1][200:]))

In [31]:
underraincoat_heatmap_pedestrian = pd.concat([
    underraincoat_heatmap_pedestrian_1,
    underraincoat_heatmap_pedestrian_2,
    underraincoat_heatmap_pedestrian_3]).reset_index(drop = True)

In [32]:
## join two dataframes
streetscape_pedestrian_data_all = pd.merge(feed_dwell_1d_df[feed_dwell_1d_df['device'] == 'SWLSANDBOX1'], streetscape_heatmap_pedestrian, left_on = "time", right_on = "startTime")
underraincoat_pedestrian_data_all = pd.merge(feed_dwell_1d_df[feed_dwell_1d_df['device'] == 'SWLSANDBOX2'], underraincoat_heatmap_pedestrian, left_on = "time", right_on = "startTime")
outside_pedestrian_data_all = pd.merge(feed_dwell_1d_df[feed_dwell_1d_df['device'] == 'SWLSANDBOX3'], outside_heatmap_pedestrian, left_on = "time", right_on = "startTime")

In [33]:
event_days = [datetime(2019, 3, 2, 0, 0, 0), datetime(2019, 6, 29, 0, 0, 0), datetime(2019, 8, 15, 0, 0, 0), datetime(2019, 9, 26, 0, 0, 0), 
                    datetime(2019, 11, 20, 0, 0, 0), datetime(2019, 11, 21, 0, 0, 0), datetime(2019, 11, 22, 0, 0, 0), datetime(2019, 11, 23, 0, 0, 0)] ## sorted by number of people

In [34]:
## Plot streetscape heatmap
## First, check how many plots the audience wants to show, input p
## Then, let the user to choose the day: IntRangeSlider
def get_daily_matrix(day: datetime, percentile: int, camera: str):
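    # the DatePicker widget may pass a datetime.date; normalise it for comparison
    day = pd.Timestamp(day)
    data = []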
    if camera == 'Outside Camera':
        data = list(outside_heatmap_pedestrian[outside_heatmap_pedestrian['startTime'] == day]['heatMapMatrix'])
    elif camera == 'StreetScape Camera':
         data = list(streetscape_heatmap_pedestrian[streetscape_heatmap_pedestrian['startTime'] == day]['heatMapMatrix'])
    elif camera == 'UnderRainCoat Camera':
        data = list(underraincoat_heatmap_pedestrian[underraincoat_heatmap_pedestrian['startTime'] == day]['heatMapMatrix'])
    if data: 
        p = np.percentile([i[2] for i in data[0]], percentile)
        filtered = list(filter(lambda x : x[2] >= p, data[0]))
        x = [i[0] for i in filtered] 
        y = [i[1] for i in filtered]
        density = [i[2] for i in filtered]
        return [x, y, density]
    else:
        return []
def handle_not_exist_day(day):
    print('There is no pedestrian recorded on {0}.{1}.{2}, please select another day.'.format(day.year, day.month, day.day))
def plot_streetscape_heatmap(percentile, day1:datetime, day2:datetime, day3: datetime, day4: datetime,
                             day5:datetime, day6:datetime, day7: datetime, day8: datetime, day9: datetime, mode: str, camera: str):
    fig = plt.figure(figsize=(16,11))
    days = [day1, day2, day3, day4, day5, day6, day7, day8, day9]
    if camera == "Outside Camera":
        image = mpimg.imread('outside_sandbox.png')
    elif camera == "StreetScape Camera":
        image = mpimg.imread('streetscape_sandbox.png')
    elif camera == "UnderRainCoat Camera":
        image = mpimg.imread('underraincoat_sandbox.png')
    # NOTE: the mode selection currently has no effect;
    # every mode plots the nine picked days unchanged.
    axes = []
    for i in range(0,9):
        day_data = get_daily_matrix(days[i], percentile, camera)
        ax = fig.add_subplot(3, 3, i+1)
        axes.append(ax)
        if not (day_data):
            handle_not_exist_day(days[i])
        else:
            # draw on the axes created above instead of adding a second one
            ax.scatter(day_data[0], y = day_data[1], c=day_data[2], s=1, cmap= plt.cm.nipy_spectral)
        ax.imshow( image, aspect='auto')
        ax.set_title("Heatmap on {0}.{1}.{2}".format(days[i].year, days[i].month, days[i].day))
        ax.axis('off')
widgets.interact_manual(plot_streetscape_heatmap, day1=widgets.DatePicker(value=pd.to_datetime('2019-02-20')), 
                                                              day2=widgets.DatePicker(value=pd.to_datetime('2019-02-21')),
                                                              day3=widgets.DatePicker(value=pd.to_datetime('2019-02-22')),
                                                              day4=widgets.DatePicker(value=pd.to_datetime('2019-02-23')), 
                                                              day5=widgets.DatePicker(value=pd.to_datetime('2019-02-20')), 
                                                              day6=widgets.DatePicker(value=pd.to_datetime('2019-02-21')),
                                                              day7=widgets.DatePicker(value=pd.to_datetime('2019-02-22')),
                                                              day8=widgets.DatePicker(value=pd.to_datetime('2019-02-23')), 
                                                              day9=widgets.DatePicker(value=pd.to_datetime('2019-02-23')), 
                                                              percentile = widgets.IntSlider(min=0, max=100, step=5, value=0),
                                                              mode = widgets.Dropdown(options=[("Days with the most dwell counts", "Days with the most dwell counts"),
                                                                                               ("Days with the highest mean dwell time", "Days with the highest mean dwell time"),
                                                                                               ("Event Days with most pedestrian", "Event Days with most pedestrian"),
                                                                                               ("Customize", "Customize")],description='Plots:'),
                                                               camera = widgets.Dropdown(options=["Outside Camera", "StreetScape Camera", "UnderRainCoat Camera"]))



To make it easier to analyze the heatmap of a single day, I want to introduce two more interactive views.
Both are animations: one steps through quantiles, the other through hours.
To build them, I first download the hourly heatmaps of the event days.

In [105]:
## download data of 6.29 as an example, if you want to add more days, just add them to list
start_time = datetime(2019, 6, 29, 0, 0, 0)
end_time = datetime(2019, 6, 30, 0, 0, 0)
interval =  relativedelta(hours = +1)
hour_interval = generate_consecutive_times(start_time, end_time, interval)
streetscape_heatmap_pedestrian_event_days = daily_heatmap_data(get_heatmap_data('SWLSANDBOX1', 'pedestrian', hour_interval[0], hour_interval[1]))


In [None]:
def add_column(data, date, camera, t):
    data["date"] = date
    data["camera"] = camera
    data["type"] = t
    return data

In [116]:
outside_heatmap_pedestrian_event_days  = daily_heatmap_data(get_heatmap_data('SWLSANDBOX3', 'pedestrian', hour_interval[0], hour_interval[1]))
outside_heatmap_pedestrian_event_days = add_column(outside_heatmap_pedestrian_event_days,  datetime(2019, 6, 29, 0, 0, 0), "outside", "pedestrian")

In [117]:
underraincoat_heatmap_pedestrian_event_days  = daily_heatmap_data(get_heatmap_data('SWLSANDBOX2', 'pedestrian', hour_interval[0], hour_interval[1]))
underraincoat_heatmap_pedestrian_event_days = add_column(underraincoat_heatmap_pedestrian_event_days,  datetime(2019, 6, 29, 0, 0, 0), "underraincoat", "pedestrian")

In [108]:
start_time = datetime(2019, 6, 29, 0, 0, 0)
for day in event_days:
    if day != start_time:
        start_time = day
        end_time = start_time + relativedelta(days = +1)
        hour_interval = generate_consecutive_times(start_time, end_time, interval)
        temp = add_column(daily_heatmap_data(get_heatmap_data('SWLSANDBOX1', 'pedestrian', hour_interval[0], hour_interval[1])), start_time,  'streetscape', 'pedestrian')
        streetscape_heatmap_pedestrian_event_days = pd.concat([streetscape_heatmap_pedestrian_event_days, temp], sort=False)

In [122]:
start_time = datetime(2019, 6, 29, 0, 0, 0)
for day in event_days:
    if day != start_time:
        start_time = day
        end_time = start_time + relativedelta(days = +1)
        hour_interval = generate_consecutive_times(start_time, end_time, interval)
        temp = add_column(daily_heatmap_data(get_heatmap_data('SWLSANDBOX2', 'pedestrian', hour_interval[0], hour_interval[1])), start_time,  'underraincoat', 'pedestrian')
        underraincoat_heatmap_pedestrian_event_days = pd.concat([underraincoat_heatmap_pedestrian_event_days, temp], sort=False)



In [123]:
start_time = datetime(2019, 6, 29, 0, 0, 0)
for day in event_days:
    if day != start_time:
        start_time = day
        end_time = start_time + relativedelta(days = +1)
        hour_interval = generate_consecutive_times(start_time, end_time, interval)
        temp = add_column(daily_heatmap_data(get_heatmap_data('SWLSANDBOX3', 'pedestrian', hour_interval[0], hour_interval[1])), start_time,  'outside', 'pedestrian')
        outside_heatmap_pedestrian_event_days = pd.concat([outside_heatmap_pedestrian_event_days, temp], sort=False)

In [132]:
def event_hour_data_helper(camera, percentile, time):
    data = []
    if camera == "Outside Camera":
        data = outside_heatmap_pedestrian_event_days
    elif camera == "StreetScape Camera":
        data = streetscape_heatmap_pedestrian_event_days
    elif camera == "UnderRainCoat Camera":
        data = underraincoat_heatmap_pedestrian_event_days
    data = list(data[data['startTime'] == time]['heatMapMatrix'])
    if data:
        p = np.percentile([i[2] for i in data[0]], percentile)
        filtered = list(filter(lambda x : x[2] >= p, data[0]))
        x = [i[0] for i in filtered] 
        y = [i[1] for i in filtered]
        density = [i[2] for i in filtered]
        data = [x, y, density]
    return data

In [162]:
## heatmap_animation_hour("StreetScape Camera", start_time, 20, 12)
## User is able to select the event day, percentile,
def heatmap_animation_hour(camera: str, day: datetime, percentile: int, hour: int):
    hour_interval =  relativedelta(hours = +1)
    fig, ax = plt.subplots(figsize=(15,10))
    if camera == "Outside Camera":
        image = mpimg.imread('outside_sandbox.png')
    elif camera == "StreetScape Camera":
        image = mpimg.imread('streetscape_sandbox.png')
    elif camera == "UnderRainCoat Camera":
        image = mpimg.imread('underraincoat_sandbox.png')
    time = day+ hour_interval*hour
    data = event_hour_data_helper(camera, percentile, time)
    if data:
        x = data[0]
        y = data[1]
        density = data[2]
        ax.scatter(x, y, c= density, s=1, cmap= plt.cm.RdPu)
    ax.imshow(image, aspect='auto')
    ax.set_title("Hourly Heatmap Animation on {0}.{1}.{2} hour:{3}".format(day.year, day.month, day.day, hour))
    ax.axis('off')
    plt.show()

In [164]:
play= widgets.Play(
    value=0,
    min=0,
    max=23,
    step=1,
    interval=10000,
    description="Press play",
    disabled=False
)
hour_slider = widgets.IntSlider(value=0,min=0,max=23,step=1,description='Hour:')
widgets.jslink((play, 'value'), (hour_slider, 'value'))
hour_player= widgets.HBox([play, hour_slider])


In [165]:
Day_time_drop = widgets.Dropdown(options=event_days)
Camera_Hbox = widgets.ToggleButtons(options=[('Outside',  "Outside Camera"), ('StreetScape',"StreetScape Camera") , ('Under RainCoat', "UnderRainCoat Camera")], description='Camera:')
percentile_slider = widgets.IntSlider(min=0, max=100, step=5, value=0)
hour = hour_player
heatmap_animation_hour_widget = widgets.interactive(heatmap_animation_hour,
                                             camera = Camera_Hbox,
                                             day = Day_time_drop,
                                             percentile = percentile_slider,
                                             hour =hour_slider,continuous_update=False)

#button_a = heatmap_animation_hour_widget.children[-2]

output_a = heatmap_animation_hour_widget.children[-1]

tab1 = VBox(children=[Camera_Hbox,
                      Day_time_drop,
                    percentile_slider,
                      hour])

VBox(children=[tab1, output_a])


To investigate this further, we want to plot a cumulative heatmap.

Each heatmap is computed per day and its maximum density is always 1, while its minimum depends on that day's activity. To make days comparable before summing them, we would like to normalize each day by its dwell count. Here is my approach:
1. Get the daily dwell count data.
2. Figure out the best way to normalize it.
3. Plot two cumulative heatmaps for comparison.

I will first build a cumulative plot for the whole time period.
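
The weighting-and-summing idea in the steps above can be sketched on toy data. This is only a minimal illustration, assuming each daily heatmap is a list of `[x, y, density]` rows and each day has a single dwell count; the name `cumulative_heatmap` and the sample numbers are hypothetical and not part of the notebook.

```python
from collections import defaultdict

def cumulative_heatmap(daily_matrices, daily_counts):
    """Sum count-weighted densities per (x, y) coordinate across days."""
    totals = defaultdict(float)
    for matrix, count in zip(daily_matrices, daily_counts):
        for x, y, density in matrix:
            # Weight each day's normalized density by that day's count
            totals[(x, y)] += density * count
    return [[x, y, value] for (x, y), value in totals.items()]

# Two toy days: densities are normalized per day (the maximum is 1.0)
day1 = [[10, 20, 1.0], [11, 20, 0.5]]
day2 = [[10, 20, 0.2], [12, 21, 1.0]]
result = cumulative_heatmap([day1, day2], [100, 10])
```

A busy day (count 100) therefore contributes far more to the cumulative map than a quiet day (count 10), even though both have a per-day maximum of 1.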

In [39]:
## Functions to generate the cumulative heatmap matrix
## Return a copy of the matrix with each density multiplied by the factor
def weight_matrix(matrix, factor):
    new_density = [i[2]*factor for i in matrix]
    temp =  [[matrix[i][0], matrix[i][1], new_density[i]] for i in range(len(matrix))]
    return temp

In [40]:
## Add weighted heatmap matrix for three dataframes
streetscape_pedestrian_data_all["weighted_heatMapMatrix"] = streetscape_pedestrian_data_all.apply(lambda x: weight_matrix(x['heatMapMatrix'], x['count']),axis=1)

In [41]:
underraincoat_pedestrian_data_all["weighted_heatMapMatrix"] = underraincoat_pedestrian_data_all.apply(lambda x: weight_matrix(x['heatMapMatrix'], x['count']),axis=1)

In [42]:
outside_pedestrian_data_all["weighted_heatMapMatrix"] = outside_pedestrian_data_all.apply(lambda x: weight_matrix(x['heatMapMatrix'], x['count']),axis=1)

In [43]:
## Collect the weighted heatmap matrices for the given days
def culmulative_heat_map_data_generator(days, data):
    culmulative_data_lis = []
    for day in days:
        daily_data = list(data[data['time'] == day]['weighted_heatMapMatrix'])
        if daily_data:
            culmulative_data_lis.append(daily_data[0])
    return culmulative_data_lis

In [44]:
## Return the cumulative heatmap matrix by summing densities per coordinate
def culmulative_heat_map_data(data):
    culmulative_data_dic = {}
    culmulative_data_lis = []
    for daily_data in data:
        for coordinate_data in daily_data:
            coordinate = (coordinate_data[0], coordinate_data[1])
            if coordinate not in culmulative_data_dic:
                culmulative_data_dic[coordinate] = coordinate_data[2]
            else:
                culmulative_data_dic[coordinate] += coordinate_data[2]
    for coordinate in list(culmulative_data_dic.keys()):
        culmulative_data_lis.append([coordinate[0], coordinate[1], culmulative_data_dic[coordinate]])
    return culmulative_data_lis

In [45]:
## Return the [x, y] coordinates at or above the given density percentile (used for clustering)
def get_quantiled_data_coordinate(data, percentile):
    p = np.percentile([i[2] for i in data], percentile)
    filtered = list(filter(lambda x : x[2] >= p, data))
    x = [i[0] for i in filtered] 
    y = [i[1] for i in filtered]
    density = [i[2] for i in filtered]
    quantiled_data = []
    for i in range(len(x)):
        quantiled_data.append([x[i], y[i]])
    return quantiled_data

In [46]:
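## Return the points at or above the given density percentile:
## [x, y, density] lists if form is truthy, otherwise the filtered [x, y, density] rows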
def get_quantiled_data(data, percentile, form):
    p = np.percentile([i[2] for i in data], percentile)
    filtered = list(filter(lambda x : x[2] >= p, data))
    if form:
        x = [i[0] for i in filtered] 
        y = [i[1] for i in filtered]
        density = [i[2] for i in filtered]
        return [x, y, density]
    else:
        return filtered

In [47]:
## Generate cumulative heatmap matrix for important times
cumulative_streetscape_pedestrian_all_days = culmulative_heat_map_data(culmulative_heat_map_data_generator(all_time[0], streetscape_pedestrian_data_all))
cumulative_outside_pedestrian_all_days = culmulative_heat_map_data(culmulative_heat_map_data_generator(all_time[0], outside_pedestrian_data_all))
cumulative_under_pedestrian_all_days = culmulative_heat_map_data(culmulative_heat_map_data_generator(all_time[0], underraincoat_pedestrian_data_all))
cumulative_streetscape_pedestrian_event_days = culmulative_heat_map_data(culmulative_heat_map_data_generator(event_days, streetscape_pedestrian_data_all))
cumulative_outside_pedestrian_event_days = culmulative_heat_map_data(culmulative_heat_map_data_generator(event_days, outside_pedestrian_data_all))
cumulative_under_pedestrian_event_days = culmulative_heat_map_data(culmulative_heat_map_data_generator(event_days, underraincoat_pedestrian_data_all))

In [48]:
def cumulative_heatmap_data_helper(camera:str, plot:str, quantile:int, form):
    data = []
    if camera == "Outside Camera":
        if plot == "Event Days":
            data = cumulative_outside_pedestrian_event_days
        if plot == "All the Days":
            data = cumulative_outside_pedestrian_all_days
    if camera == "StreetScape Camera":
        if plot == "Event Days":
            data = cumulative_streetscape_pedestrian_event_days
        if plot == "All the Days":
            data = cumulative_streetscape_pedestrian_all_days
    if camera == "UnderRainCoat Camera":
        if plot == "Event Days":
            data = cumulative_under_pedestrian_event_days
        if plot == "All the Days":
            data = cumulative_under_pedestrian_all_days
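    total_density = sum(i[2]  for i in data)
    data = get_quantiled_data(data, quantile, form)
    # data is [x, y, density] lists when form is truthy, otherwise a list of [x, y, density] rows
    quantiled_density = sum(data[2]) if form else sum(i[2] for i in data)
    return (data, [quantiled_density, total_density-quantiled_density])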

In [49]:
def check_circle(radius, center, coordinate):
    return ((center[0]- coordinate[0])**2 + (center[1]- coordinate[1])**2) < radius**2
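
As a rough sketch of how `check_circle` can be used, the snippet below computes the share of total density that falls inside a circle. The helper `density_share_in_circle` and the toy points are hypothetical; the circle test itself is copied from the cell above.

```python
def check_circle(radius, center, coordinate):
    # Same test as in the notebook: strictly inside the circle
    return ((center[0] - coordinate[0])**2 + (center[1] - coordinate[1])**2) < radius**2

def density_share_in_circle(points, center, radius):
    """Return the fraction of total density that falls inside the circle."""
    total = sum(p[2] for p in points)
    if total == 0:
        return 0.0
    inside = sum(p[2] for p in points if check_circle(radius, center, (p[0], p[1])))
    return inside / total

# Toy [x, y, density] rows: two points near the origin, one far away
points = [[0, 0, 3.0], [1, 1, 1.0], [10, 10, 4.0]]
share = density_share_in_circle(points, center=(0, 0), radius=2)
```

Because the comparison is strict, a point lying exactly on the circle's edge counts as outside.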

In [None]:
labels = ['Quan', 'Jellybean', 'Milkshake', 'Cheesecake']
sizes = [38.4, 40.6, 20.7, 10.3]
colors = ['yellowgreen', 'gold', 'lightskyblue', 'lightcoral']
patches, texts = plt.pie(sizes, colors=colors, shadow=True, startangle=90)
plt.legend(patches, labels, loc="best")
plt.axis('equal')
plt.tight_layout()
plt.show()

In [73]:
## tab: data, desired lines: density (we recommend lowering to the 50th percentile), desired slots, proportion of density
## Design of interactive part
# 1. Hbox: ToggleButtons, 
# 2. Hbox:  Dropdown for plot1,  widgets.IntSlide for quantile1
# 3. Hbox: Dropdown for plot2,  widgets.IntSlide for quantile2
def plot_cumulative_heatmap(camera, plot1, plot2, quantile1, quantile2):
    
#     camera = Camera_Hbox.value
#     plot1 = Plot1_Drop.value
#     plot2 = Plot2_Drop.value
#     quantile1 = Plot1_quantile.value
#     quantile2 = Plot2_quantile.value
    
    if camera == "Outside Camera":
        image = mpimg.imread('outside_sandbox.png')
    elif camera == "StreetScape Camera":
        image = mpimg.imread('streetscape_sandbox.png')
    elif camera == "UnderRainCoat Camera":
        image = mpimg.imread('underraincoat_sandbox.png')
        
    temp1 =  cumulative_heatmap_data_helper(camera, plot1, quantile1, True)
    temp2 = cumulative_heatmap_data_helper(camera, plot2, quantile2, True)
    data1 = temp1[0]
    pie1 = temp1[1]
    data2 = temp2[0]
    pie2 = temp2[1]
    fig = plt.figure(figsize=(16,10))  # all axes are created from the GridSpec below
    gs = gridspec.GridSpec(8, 4)
    ax0 = plt.subplot(gs[0:4,0:2]) # upper heatmap
    ax1 = plt.subplot(gs[4:8,0:2]) # lower heatmap
    ax2 = plt.subplot(gs[0:2,2]) # 1st pie chart
    ax3 = plt.subplot(gs[2:4,2])
    ax4 = plt.subplot(gs[4:6,2])
    ax5 = plt.subplot(gs[6:8,2])
    
    ax0.scatter(data1[0], data1[1], c = data1[2], cmap = plt.cm.YlGnBu_r)
    ax0.imshow(image, aspect='auto')
    ax0.axis('off')
    
    ax1.scatter(data2[0], data2[1], c = data2[2], cmap = plt.cm.YlGnBu_r)
    ax1.imshow(image, aspect='auto')
    ax1.axis('off')
    
    label1 = ['Blue Lines', 'Other Points']
    title1 = "Density Proportion for {0}".format(plot1)
    label2 =  ['Blue Points', 'Other Points']
    title2 =  "Proportion of Number of Points for {0}".format(plot1)
    
    label3 = ['Blue Lines', 'Other Points']
    title3 = "Density Proportion for {0}".format(plot2) 
    label4 =  ['Blue Points', 'Other Points']
    title4 =  "Proportion of Number of Points for {0}".format(plot2)
    
    colors=["lightskyblue", "lightcoral"]
    
    wedges, texts, autotexts = ax2.pie(pie1, autopct='%1.1f%%', colors = colors)
    ax2.legend(wedges, label1,  loc="center left",bbox_to_anchor=(1, 0, 0.5, 1))
    ax2.set_title(title1)
    
    wedges, texts, autotexts= ax3.pie([100-quantile1, quantile1], autopct='%1.1f%%', colors = colors)
    ax3.legend(wedges, label2,  loc="center left",bbox_to_anchor=(1, 0, 0.5, 1))
    ax3.set_title(title2)
    
    wedges, texts, autotexts= ax5.pie([100-quantile2, quantile2], autopct='%1.1f%%', colors = colors)
    ax5.legend(wedges,label4, loc="center left",bbox_to_anchor=(1, 0, 0.5, 1))
    ax5.set_title(title4)
    
    ax4.set_title(title3)
    wedges, texts, autotexts = ax4.pie(pie2,autopct='%1.1f%%', colors = colors)
    ax4.legend(wedges,label3, loc="center left",bbox_to_anchor=(1, 0, 0.5, 1))
    
    plt.tight_layout()

In [75]:
style = {'description_width': 'initial'}

Plot1_Drop = widgets.Dropdown(options=["Event Days","All the Days"], description='Time (first plot): ', style = style)
Plot2_Drop = widgets.Dropdown(options=["Event Days","All the Days"], description='Time (second plot):', style = style)
Plot1_quantile  = widgets.IntSlider(min=0, max=100, step=1, value=50, description='Percentile (first plot): ',style = style)
Plot2_quantile = widgets.IntSlider(min=0, max=100, step=1, value=50, description='Percentile (second plot): ',style = style)
Plot1_Hbox = widgets.HBox(children=[Plot1_Drop, Plot1_quantile], style = style)
Plot2_Hbox = widgets.HBox(children=[Plot2_Drop, Plot2_quantile], style = style)
Camera_Hbox = widgets.ToggleButtons(options=[('Outside',  "Outside Camera"), ('StreetScape',"StreetScape Camera") , ('Under RainCoat', "UnderRainCoat Camera")], description='Camera:')

plot_cumulative_heatmap_widget = widgets.interactive(plot_cumulative_heatmap, {'manual': True},
                                             camera = Camera_Hbox,
                                             plot1 = Plot1_Drop,
                                             plot2 = Plot2_Drop,
                                             quantile1 = Plot1_quantile,
                                             quantile2 = Plot2_quantile)

button1 = plot_cumulative_heatmap_widget.children[-2]

output1 = plot_cumulative_heatmap_widget.children[-1]

tab1 = VBox(children=[Camera_Hbox,
                      Plot1_Hbox,
                    Plot2_Hbox,button1])

VBox(children=[tab1, output1])


In [100]:
from sklearn.cluster import MiniBatchKMeans, KMeans
import math
from IPython.display import clear_output

def plot_cumulative_heatmap_points(camera, plot1, plot2, n, radius, coordinate_x, coordinate_y, show_scatter, show_circle):
    
#     camera = Camera_Hbox.value
#     plot1 = Plot1_Drop.value
#     plot2 = Plot2_Drop.value
#     n = n_cluster_slider.value
#     radius = radius_slider.value
    coordinate = (coordinate_x,coordinate_y)
#     show_scatter = show_scatter_box.value
#     show_circle = show_circle_box.value
    
    if camera == "Outside Camera":
        image = mpimg.imread('outside_sandbox.png')
    elif camera == "StreetScape Camera":
        image = mpimg.imread('streetscape_sandbox.png')
    elif camera == "UnderRainCoat Camera":
        image = mpimg.imread('underraincoat_sandbox.png')

    fig = plt.figure(figsize=(16,10))  # all axes are created from the GridSpec below
    gs = gridspec.GridSpec(8, 4)
    ax0 = plt.subplot(gs[0:4,0:2]) # upper heatmap
    ax1 = plt.subplot(gs[4:8,0:2]) # lower heatmap
    ax2 = plt.subplot(gs[0:2,2]) # 1st pie chart
    ax3 = plt.subplot(gs[2:4,2])
    ax4 = plt.subplot(gs[4:6,2])
    ax5 = plt.subplot(gs[6:8,2])
    
    data1 =  cumulative_heatmap_data_helper(camera, plot1, 80, False)[0]
    data2 = cumulative_heatmap_data_helper(camera, plot2, 80, False)[0]
    
    k_data1 = [[i[0], i[1]] for i in data1]
    K_data2 = [[i[0], i[1]] for i in data2]
    
    k_means1 = KMeans(init='k-means++', n_clusters=n, n_init=10)
    k_means1.fit(k_data1)
    k_means2 = KMeans(init='k-means++', n_clusters=n, n_init=10)
    k_means2.fit(K_data2)
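    # Sort centers by x while keeping each (x, y) pair together
    # (np.sort(..., axis=0) would sort the x and y columns independently)
    k_means_cluster_centers_1 = list(k_means1.cluster_centers_[np.argsort(k_means1.cluster_centers_[:, 0])])
    k_means_cluster_centers_2 = list(k_means2.cluster_centers_[np.argsort(k_means2.cluster_centers_[:, 0])])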
    x_number_list1 = [i[0] for i in k_means_cluster_centers_1]
    y_number_list1 = [i[1] for i in k_means_cluster_centers_1]
    x_number_list2 =  [i[0] for i in k_means_cluster_centers_2]
    y_number_list2 =  [i[1] for i in k_means_cluster_centers_2]
   
    if show_scatter:
        temp1 =  cumulative_heatmap_data_helper(camera, plot1, 80, True)
        temp2 = cumulative_heatmap_data_helper(camera, plot2, 80, True)
        data_scatter1 = temp1[0]
        data_scatter2 = temp2[0]
        ax0.scatter(data_scatter1[0], data_scatter1[1], c = data_scatter1[2], cmap = plt.cm.YlGnBu_r)
        ax1.scatter(data_scatter2[0], data_scatter2[1], c = data_scatter2[2], cmap = plt.cm.YlGnBu_r)
    
        
    # calculate area
    area = math.pi*(radius*radius) # limitation: the circle may extend beyond the image rectangle
    total_area = 500 * 650
    
    # paint circle
    if show_circle:
        circle1 = plt.Circle(coordinate, radius, color='orange', fill=False,lw=5 )
        circle2 = plt.Circle(coordinate, radius, color='orange', fill=False, lw=5 )
        ax0.add_artist(circle1)
        ax1.add_artist(circle2)
    
   
        
    ax0.scatter(x_number_list1, y_number_list1, s=100, color = 'r')
    ax0.imshow(image, aspect='auto')
    ax1.scatter(x_number_list2, y_number_list2, s=100, color = 'r')
    ax1.imshow(image, aspect='auto')
    
    colors=["orange", "lightskyblue"]
    
    circle_density_1 = 0
    total_density_1 = 0
    for coordinates in data1:
        total_density_1 = total_density_1 + coordinates[2]
        if check_circle(radius,coordinate,(coordinates[0],coordinates[1])):
            circle_density_1 = circle_density_1 + coordinates[2]
    
    circle_density_2 = 0
    total_density_2 = 0
    for coordinates in data2:
        total_density_2 = total_density_2 + coordinates[2]
        if check_circle(radius,coordinate,(coordinates[0],coordinates[1])):
            circle_density_2 = circle_density_2 + coordinates[2]
            
    # label 
    label = ["Inside the circle", "Outside the circle"]

    
    # pie chart data
    pie1 = [circle_density_1, total_density_1 - circle_density_1]
    pie2 = [circle_density_2, total_density_2 - circle_density_2]
    pie3 = [area, total_area-area]
    
    wedges, texts, autotexts = ax2.pie(pie1, autopct='%1.1f%%', colors = colors)
    ax2.legend(wedges, label,loc="center left",bbox_to_anchor=(1, 0, 0.5, 1))
    ax2.set_title("Density of circle for {0}".format(plot1))
    
    wedges, texts, autotexts = ax3.pie(pie3, autopct='%1.1f%%', colors = colors)
    ax3.legend(wedges, label,loc="center left",bbox_to_anchor=(1, 0, 0.5, 1))
    ax3.set_title("Proportion of Area")
    
    wedges, texts, autotexts = ax5.pie(pie3, autopct='%1.1f%%', colors = colors)
    ax5.legend(wedges,label, loc="center left",bbox_to_anchor=(1, 0, 0.5, 1))
    ax5.set_title("Proportion of Area")
    
    ax4.set_title("Density of circle for {0}".format(plot2))
    wedges, texts, autotexts = ax4.pie(pie2, autopct='%1.1f%%', colors = colors)
    ax4.legend(wedges,label, loc="center left",bbox_to_anchor=(1, 0, 0.5, 1))
    
    print("Using clustering, the coordinates of potential desired points for the above plot are {0}".format(list(k_means_cluster_centers_1)))
    print("Using clustering, the coordinates of potential desired points for the below plot are {0}".format(list(k_means_cluster_centers_2)))
    plt.tight_layout()
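
One detail worth keeping in mind when ordering cluster centers: `np.sort(a, axis=0)` sorts every column on its own, so the x and y values of a center can end up paired with values from other centers. The pure-Python sketch below illustrates the difference on hypothetical toy centers; with NumPy, the row-preserving equivalent is `a[np.argsort(a[:, 0])]`.

```python
# Hypothetical (x, y) cluster centers
centers = [[300.0, 50.0], [100.0, 400.0], [200.0, 10.0]]

# Column-wise sort: x values and y values are ordered independently,
# so the resulting rows no longer correspond to real cluster centers.
xs = sorted(c[0] for c in centers)
ys = sorted(c[1] for c in centers)
column_sorted = [[x, y] for x, y in zip(xs, ys)]

# Row-wise sort by x: each (x, y) pair stays together.
row_sorted = sorted(centers, key=lambda c: c[0])
```

Here `column_sorted` contains the point (100, 10), which is not one of the original centers, while `row_sorted` only reorders the original pairs.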

In [101]:
Camera_Hbox = widgets.ToggleButtons(
    options=[('Outside',  "Outside Camera"), ('StreetScape',"StreetScape Camera") , ('Under RainCoat', "UnderRainCoat Camera")],
    description='Camera:',
)

Plot1_Drop_2 = widgets.Dropdown(options=["Event Days","All the Days"], description='Time (first plot): ', style = style)
Plot2_Drop_2 = widgets.Dropdown(options=["Event Days","All the Days"], description='Time (second plot):', style = style)
n_cluster_slider = widgets.IntSlider(min=1, max=5, step=1, value=1, description='Number of Desired Points:', style = style)
radius_slider = widgets.IntSlider(min=1, max=50, step=1, value=10, description='Radius of the circle:', style = style)
x_coordinate_slider = widgets.IntSlider(min=30, max=600, step=1, value=300, description='x coordinate of the center of the circle:',style = style, 
                                        layout=widgets.Layout(width='50%', height='30px'))
y_coordinate_slider = widgets.IntSlider(min=30, max=450, step=1, value=200, description='y coordinate of the center of the circle:', style = style,
                                       layout=widgets.Layout(width='50%', height='30px'))
show_scatter_box = widgets.Checkbox(value=False, description='Show Desired Lines(Heatmap)', disabled=False, indent=False, style = style)
show_circle_box = widgets.Checkbox(value=True, description='Show Circle', disabled=False, indent=False, style = style)

plot_cumulative_heatmap_points_widget = widgets.interactive(plot_cumulative_heatmap_points, {'manual': True},
                                         camera = Camera_Hbox, 
                                         plot1 = Plot1_Drop_2, 
                                         plot2 = Plot2_Drop_2, 
                                         n = n_cluster_slider,
                                         radius = radius_slider,
                                         coordinate_x = x_coordinate_slider,
                                         coordinate_y = y_coordinate_slider,
                                         show_scatter = show_scatter_box,
                                         show_circle = show_circle_box)
coordinate_Hbox = widgets.HBox(children=[x_coordinate_slider, y_coordinate_slider])
Plot1_Hbox_1 = widgets.HBox(children=[Plot1_Drop_2])
Plot1_Hbox_2 = widgets.HBox(children=[Plot2_Drop_2])
Show= widgets.HBox(children=[show_scatter_box, show_circle_box])
vbox1 = VBox(children=[Camera_Hbox, Show, Plot1_Hbox_1, Plot1_Hbox_2])
vbox2 = VBox(children=[coordinate_Hbox, radius_slider, n_cluster_slider])
tab = widgets.Tab(children=[vbox1, vbox2])
tab.set_title(0, 'Plot')
tab.set_title(1, 'Desired Point')
button2 = plot_cumulative_heatmap_points_widget.children[-2]
output = plot_cumulative_heatmap_points_widget.children[-1]
tab2 = VBox(children=[tab, button2])
plot_cumulative_heatmap_points_widget_rearrange = VBox(children = [tab2, output]) 
plot_cumulative_heatmap_points_widget_rearrange


In [74]:
# style = {'description_width': 'initial'}
# button = widgets.Button(
#     description='Plot',
# )
# @button.on_click
# def plot_on_click(b):
#     plot_cumulative_heatmap()
# Camera_Hbox = widgets.ToggleButtons(
#     options=[('Outside',  "Outside Camera"), ('StreetScape',"StreetScape Camera") , ('Under RainCoat', "UnderRainCoat Camera")],
#     description='Camera:',
# )
# Plot1_Drop = widgets.Dropdown(options=["Event Days","All the Days"], description='Time (first plot): ', style = style)
# Plot2_Drop = widgets.Dropdown(options=["Event Days","All the Days"], description='Time (second plot):', style = style)
# Plot1_quantile  = widgets.IntSlider(min=0, max=100, step=1, value=50, description='Percentile (first plot): ',style = style)
# Plot2_quantile = widgets.IntSlider(min=0, max=100, step=1, value=50, description='Percentile (second plot): ',style = style)
# Plot1_Hbox = widgets.HBox(children=[Plot1_Drop, Plot1_quantile], style = style)
# Plot2_Hbox = widgets.HBox(children=[Plot2_Drop, Plot2_quantile], style = style)
# tab1 = VBox(children=[Camera_Hbox,
#                       Plot1_Hbox,
#                     Plot2_Hbox])
# VBox(children=[tab1, button])

In [378]:
# button2 = widgets.Button(
#     description='Plot'
# )
# @button2.on_click
# def plot_on_click(b):
#     plot_cumulative_heatmap_points()
    
# Camera_Hbox = widgets.ToggleButtons(
#     options=[('Outside',  "Outside Camera"), ('StreetScape',"StreetScape Camera") , ('Under RainCoat', "UnderRainCoat Camera")],
#     description='Camera:',
# )

# Plot1_Drop_2 = widgets.Dropdown(options=["Event Days","All the Days"], description='Time (first plot): ', style = style)
# Plot2_Drop_2 = widgets.Dropdown(options=["Event Days","All the Days"], description='Time (second plot):', style = style)
# n_cluster_slider = widgets.IntSlider(min=1, max=5, step=1, value=1, description='Number of Desired Points:', style = style)
# radius_slider = widgets.IntSlider(min=1, max=50, step=1, value=10, description='Radius of the circle:', style = style)
# x_coordinate_slider = widgets.IntSlider(min=30, max=600, step=1, value=300, description='x coordinate of the center of the circle:',style = style, 
#                                         layout=widgets.Layout(width='50%', height='30px'))
# y_coordinate_slider = widgets.IntSlider(min=30, max=450, step=1, value=200, description='y coordinate of the center of the circle:', style = style,
#                                        layout=widgets.Layout(width='50%', height='30px'))
# show_scatter_box = widgets.Checkbox(value=False, description='Show Desired Lines(Heatmap)', disabled=False, indent=False, style = style)
# show_circle_box = widgets.Checkbox(value=True, description='Show Circle', disabled=False, indent=False, style = style)

# coordinate_Hbox = widgets.HBox(children=[x_coordinate_slider, y_coordinate_slider])
# Plot1_Hbox_2 = widgets.HBox(children=[Plot1_Drop_2])
# Plot1_Hbox_2 = widgets.HBox(children=[Plot2_Drop_2])
# Show= widgets.HBox(children=[show_scatter, show_circle])
# vbox1 = VBox(children=[Camera_Hbox, Show, Plot1_Hbox_2, Plot1_Hbox_2])
# vbox2 = VBox(children=[coordinate_Hbox, radius_slider, n_cluster_slider])

# tab = widgets.Tab(children=[vbox1, vbox2])
# tab.set_title(0, 'Plot')
# tab.set_title(1, 'Desired Point')

In [35]:
# def get_max_count_of_day(day):
#     temp = list(outside_heatmap_pedestrian[outside_heatmap_pedestrian['startTime'] == day]['heatMapMatrix'])
#     if temp:
#         matrix = list(temp[0])
#         m = min(i[2] for i in matrix)
#     return m
# get_max_count_of_day(pd.to_datetime('2019-6-20'))

## Event vs Non Event Days

### Subsection: Pedestrian Count

In this section, we explore how people's behaviour differs between days with an event and days without one. We obtained the Sidewalk Labs event schedule from the [website](https://www.sidewalktoronto.ca/participate/). I have recorded all the events between February 20th, 2019 and January 11th, 2020.

In [42]:
event_dates = pd.read_csv('EventDates.csv')
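
To compare event and non-event days, it helps to know which calendar dates are covered by at least one event. The sketch below is one possible way to do that, assuming rows with the `Event`, `Starting Time` and `Ending Time` columns used later in this notebook; the helper `event_day_set` and the toy events are hypothetical.

```python
from datetime import datetime, timedelta

FMT = '%Y-%m-%dT%H:%M:%S'  # same format the notebook uses for event times

def event_day_set(events):
    """Collect every calendar date touched by at least one event."""
    days = set()
    for event in events:
        start = datetime.strptime(event['Starting Time'], FMT).date()
        end = datetime.strptime(event['Ending Time'], FMT).date()
        current = start
        while current <= end:
            days.add(current)
            current += timedelta(days=1)
    return days

# Hypothetical rows mirroring the columns of EventDates.csv
toy_events = [
    {'Event': 'Toy Open House', 'Starting Time': '2019-06-15T10:00:00',
     'Ending Time': '2019-06-16T17:00:00'},
    {'Event': 'Toy Talk', 'Starting Time': '2019-07-02T18:00:00',
     'Ending Time': '2019-07-02T20:00:00'},
]
days = event_day_set(toy_events)
is_event_day = datetime(2019, 6, 16).date() in days
```

With the real data, the same rows could be obtained from `event_dates.to_dict('records')`.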

I have also obtained the pedestrian count data. We will first explore how the pedestrian count changes across days. 

In [43]:
outside_count_df = pd.read_csv('OverviewForOutsideCount.csv')
streetscape_count_df = pd.read_csv('OverviewForStreetScapeCount.csv')
under_rain_coat_count_df = pd.read_csv('OverviewForUnderRainCoatCount.csv')


In [None]:
outside_count_df.time = outside_count_df.time.str[:-6]
streetscape_count_df.time = streetscape_count_df.time.str[:-6]
under_rain_coat_count_df.time = under_rain_coat_count_df.time.str[:-6]

In [None]:
from datetime import datetime as dt
outside_count_df.time = outside_count_df.apply(lambda x: dt.strptime(x.time, '%Y-%m-%dT%H:%M:%S'), axis = 1)
streetscape_count_df.time = streetscape_count_df.apply(lambda x: dt.strptime(x.time, '%Y-%m-%dT%H:%M:%S'), axis = 1)
under_rain_coat_count_df.time = under_rain_coat_count_df.apply(lambda x: 
                                                               dt.strptime(x.time, '%Y-%m-%dT%H:%M:%S'), axis = 1)

In [None]:
outside_count_by_day = outside_count_df.resample('d', on='time')['pedestrians'].agg(np.sum)
streetscape_count_by_day = streetscape_count_df.resample('d', on='time')['pedestrians'].agg(np.sum)
under_rain_coat_count_df_by_day = under_rain_coat_count_df.resample('d', on='time')['pedestrians'].agg(np.sum)

In [None]:
fig = go.Figure()
fig = fig.add_trace(go.Scatter(x=outside_count_by_day.index, y=outside_count_by_day.values, 
                         name="Outside",
                         line_color='royalblue'))

fig = fig.add_trace(go.Scatter(x=streetscape_count_by_day.index, y=streetscape_count_by_day.values, 
                         name="Street Scape",
                         line_color='dimgray'))

fig = fig.add_trace(go.Scatter(x=under_rain_coat_count_df_by_day.index, y=under_rain_coat_count_df_by_day.values, 
                         name="Under Rain Coat",
                         line_color='firebrick'))

fig = fig.update_layout(title_text='Pedestrians Count By Day',
                  xaxis_rangeslider_visible=True)
fig.show()

In [None]:
sum_ped_count_by_day = outside_count_by_day + streetscape_count_by_day + under_rain_coat_count_df_by_day

In [None]:
sum_ped_count_by_day

In [None]:
# fig = go.Figure(boxpoints='all')
# fig.add_trace(go.Box(x=sum_ped_count_by_day.values))

From the time series line plot above, we noticed that several days have a significantly higher pedestrian count than the rest. We will examine this further at hourly granularity.
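
"Significantly higher" can be made concrete with a simple rule. The sketch below flags days whose count exceeds the mean by more than two standard deviations; this threshold is an assumption chosen for illustration, not the criterion used in this analysis, and the helper `flag_high_days` and the toy counts are hypothetical.

```python
from statistics import mean, pstdev

def flag_high_days(daily_counts, n_std=2.0):
    """Return the days whose count exceeds mean + n_std * standard deviation."""
    values = list(daily_counts.values())
    cutoff = mean(values) + n_std * pstdev(values)
    return [day for day, count in daily_counts.items() if count > cutoff]

# Toy daily totals with one obvious spike
toy_counts = {
    "2019-06-01": 120, "2019-06-02": 110, "2019-06-03": 130,
    "2019-06-04": 115, "2019-06-05": 125, "2019-06-06": 900,
    "2019-06-07": 118, "2019-06-08": 122,
}
high_days = flag_high_days(toy_counts)
```

A single extreme day inflates the standard deviation, so a rule like this is best treated as a first pass before inspecting the hourly series.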

In [None]:
outside_count_by_hour = outside_count_df.resample('H', on='time')['pedestrians'].agg(np.sum)
streetscape_count_by_hour = streetscape_count_df.resample('H', on='time')['pedestrians'].agg(np.sum)
rain_coat_count_by_hour = under_rain_coat_count_df.resample('H', on='time')['pedestrians'].agg(np.sum)

In [None]:
fig = go.Figure()
fig = fig.add_trace(go.Scatter(x=outside_count_by_hour.index, y=outside_count_by_hour.values, 
                         name="Outside",
                         line_color='royalblue'))

fig = fig.add_trace(go.Scatter(x=streetscape_count_by_hour.index, y=streetscape_count_by_hour.values, 
                         name="Street Scape",
                         line_color='dimgray'))

fig = fig.add_trace(go.Scatter(x=rain_coat_count_by_hour.index, y=rain_coat_count_by_hour.values, 
                         name="Under Rain Coat",
                         line_color='firebrick'))

fig = fig.update_layout(title_text='Pedestrians Count By Hour',
                  xaxis_rangeslider_visible=True)
fig.show()

In [None]:
def plot_pedestrian_count_event(event):
    '''
    Display time series pedestrian count of the event specified
    '''
    event_info = event_dates[event_dates.Event == event]
    start = dt.strptime(event_info['Starting Time'].values[0], '%Y-%m-%dT%H:%M:%S')
    end = dt.strptime(event_info['Ending Time'].values[0], '%Y-%m-%dT%H:%M:%S')
    
    outside = outside_count_by_hour[(outside_count_by_hour.index >= start) & (outside_count_by_hour.index <= end)]
    streetscape = streetscape_count_by_hour[(streetscape_count_by_hour.index >= start) & \
                                            (streetscape_count_by_hour.index <= end)]
    rain_coat = rain_coat_count_by_hour[(rain_coat_count_by_hour.index >= start) & \
                                        (rain_coat_count_by_hour.index <= end)]
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=outside.index, y=outside.values, 
                         name="Outside",
                         line_color='royalblue'))

    fig.add_trace(go.Scatter(x=streetscape.index, y=streetscape.values, 
                         name="Street Scape",
                         line_color='dimgray'))

    fig.add_trace(go.Scatter(x=rain_coat.index, y=rain_coat.values, 
                         name="Under Rain Coat",
                         line_color='firebrick'))
    
    fig.add_trace(go.Scatter(x=rain_coat.index, y=outside.values+streetscape.values+rain_coat.values, 
                         name="Sum",
                         line_color='gold'))

    fig.update_layout(title_text=f'Pedestrians Count By Event: {event}',
                  xaxis_rangeslider_visible=True)
    fig.show()

In [None]:
_ = interact(plot_pedestrian_count_event, 
             event=widgets.Dropdown(options=event_dates.Event.tolist(), value='Sidewalk Summer Open House'),
            )

### Subsection: Dwell Time

In [None]:
# hourly dwell time - device
feed_dwell_1h_df = pd.concat([get_dwell('feedDwellTimeDistribution', device_ids[i], '1h') 
                              for i in range(3)])

In [None]:
feed_dwell_1h_df = feed_dwell_1h_df.dropna()

In [None]:
feed_dwell_1h_df.time = feed_dwell_1h_df.time.str[:-6]

In [None]:
feed_dwell_1h_df['device_name'] = [device_dict[d] for d in feed_dwell_1h_df.device]

In [None]:
feed_dwell_1h_df.time = feed_dwell_1h_df.apply(lambda x: dt.strptime(x.time, '%Y-%m-%dT%H:%M:%S'), axis = 1)

In [None]:
def plot_dwell_time_event(event, metric):
    '''
    Display time series of the selected dwell-time metric for the event specified
    '''
    event_info = event_dates[event_dates.Event == event]
    start = dt.strptime(event_info['Starting Time'].values[0], '%Y-%m-%dT%H:%M:%S')
    end = dt.strptime(event_info['Ending Time'].values[0], '%Y-%m-%dT%H:%M:%S')

    dwell_time_df = feed_dwell_1h_df[(feed_dwell_1h_df.time >= start) & (feed_dwell_1h_df.time <= end)]
    
    outside = dwell_time_df[dwell_time_df.device_name == 'Outside']
    streetscape = dwell_time_df[dwell_time_df.device_name == 'Streetscape']
    rain_coat = dwell_time_df[dwell_time_df.device_name == 'Under Raincoat']
    
    fig = go.Figure()
    
    fig.add_trace(go.Scatter(x=outside.time, y=outside[metric], 
                         name="Outside",
                         line_color='royalblue'))

    fig.add_trace(go.Scatter(x=streetscape.time, y=streetscape[metric], 
                         name="Street Scape",
                         line_color='dimgray'))

    fig.add_trace(go.Scatter(x=rain_coat.time, y=rain_coat[metric], 
                         name="Under Rain Coat",
                         line_color='firebrick'))
    
    fig.update_layout(title_text=f"Distribution of Pedestrian dwell time ({metric}) By Event: {event}",
                  xaxis_rangeslider_visible=True)
    
    fig.update_layout(
        xaxis_title="time",
        yaxis_title=metric,
    )
    
    fig.show()
    
    

In [None]:
_ = interact(plot_dwell_time_event, 
             event=widgets.Dropdown(options=event_dates.Event.tolist(), value='Sidewalk Summer Open House'),
             metric=widgets.Dropdown(options=['mean', 'pct100', 'pct75', 'pct50', 'pct25', 'total'], 
                                     value='mean')
            )

## Maintenance Strategy

In [44]:
# need hourly data so writing the query again; can combine with the previous one later
def get_dwell_by_hour(func, ID):
    '''
    func is either feedDwellTimeDistribution or zoneDwellTimeDistribution
    '''
    if func == 'feedDwellTimeDistribution':
        arg = 'serialnos: "{0}"'.format(ID)
    else:
        arg = 'zoneIds: {0}'.format(ID)
        
    query = """
    query {{
        {0}(
        {1},
        startTime: "2019-02-20T00:00:00",
        endTime: "2020-01-12T00:00:00",
        timezone: "America/New_York",
        objClasses: ["pedestrian"],
        interval: "1h"
        ){{
        edges {{
          node {{
            time
            objClass
            pct100
            pct75
            pct50
            pct25
            mean
            count
          }}
        }}
      }}
    }}
    """.format(func, arg)

    dwell = requests.post(url, json={'query': query}, 
                           headers = {'Authorization':token})
    
    df = pd.DataFrame([x['node'] for x in dwell.json()['data'][func]['edges']])
    if func == 'feedDwellTimeDistribution':
        df['device'] = ID
    else:
        df['zone'] = ID
    
    return df
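
The function above indexes straight into `dwell.json()['data']`, which raises an unhelpful `KeyError` or `TypeError` if the request fails. The sketch below shows one defensive way to unpack the response, assuming the API follows the usual GraphQL convention of reporting failures in a top-level `errors` list; the helper `extract_nodes` and the toy payload are hypothetical.

```python
def extract_nodes(payload, func):
    """Return the list of node dicts from a parsed GraphQL response.

    Raises ValueError if the response carries an 'errors' entry or no data.
    """
    if payload.get('errors'):
        raise ValueError('GraphQL query failed: {0}'.format(payload['errors']))
    data = payload.get('data') or {}
    if data.get(func) is None:
        raise ValueError('No data returned for {0}'.format(func))
    return [edge['node'] for edge in data[func]['edges']]

# Toy payload shaped like the response the notebook parses
toy_payload = {'data': {'feedDwellTimeDistribution': {'edges': [
    {'node': {'time': '2019-02-20T00:00:00-05:00', 'mean': 12.5, 'count': 4}},
]}}}
nodes = extract_nodes(toy_payload, 'feedDwellTimeDistribution')
```

The returned list can be passed to `pd.DataFrame(nodes)` in the same way the notebook builds `df`.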

In [45]:
feed_dwell_df = pd.concat([get_dwell_by_hour('feedDwellTimeDistribution', device_id) 
                           for device_id in device_ids[:3]])

In [46]:
# replace NaN with 0
feed_dwell_df = feed_dwell_df.fillna(0)

# convert time to timestamps after dropping the last six characters of each string
feed_dwell_df['time'] = pd.to_datetime(feed_dwell_df['time'].str[:-6])

# add name column in addition to ID
feed_dwell_df['device_name'] = feed_dwell_df['device'].map(device_dict)

In [47]:
import datetime as dt
from pandas.api.types import CategoricalDtype
# 2019-03-04 is a Monday, so this yields the abbreviated day names from Mon through Sun
days = [(dt.datetime(2019, 3, 4) + dt.timedelta(days=x)).strftime('%a') for x in range(0, 7)]
day_type = CategoricalDtype(categories=days, ordered=True)

# derive calendar columns with the vectorized .dt accessor
feed_dwell_df['day of week'] = feed_dwell_df['time'].dt.strftime('%a').astype(day_type)
feed_dwell_df['date'] = feed_dwell_df['time'].dt.strftime('%Y-%m-%d')
feed_dwell_df['hour'] = feed_dwell_df['time'].dt.hour
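
To see what the calendar columns look like without calling the API, here is a small self-contained sketch on made-up timestamps. The sample values are assumptions for illustration, and `dt.day_name()` is used in place of `strftime('%a')` so the result does not depend on the system locale.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# made-up timestamps in the same string format the API is assumed to return
toy = pd.DataFrame({'time': ['2019-03-04T07:00:00-05:00',
                             '2019-03-10T15:00:00-04:00']})

# drop the trailing UTC offset and parse the rest as naive timestamps
toy['time'] = pd.to_datetime(toy['time'].str[:-6])

# ordered categorical so that sorting and plotting follow Mon..Sun
day_type = CategoricalDtype(
    categories=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'], ordered=True)

toy['day of week'] = toy['time'].dt.day_name().str[:3].astype(day_type)
toy['date'] = toy['time'].dt.strftime('%Y-%m-%d')
toy['hour'] = toy['time'].dt.hour
print(toy)
```

Because the category is ordered, `toy.sort_values('day of week')` sorts Monday first rather than alphabetically.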

In [55]:
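# NOTE: max() keeps the busiest hour's count for each day and device, not the day's total;
# use sum() instead if a true daily total is intended
daily_count = feed_dwell_df.groupby(['date', 'device_name'])['count'].max()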
daily_count = pd.DataFrame(daily_count).reset_index()
daily_count['date'] = pd.to_datetime(daily_count['date'])
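
Since the source data is hourly, the choice of aggregation changes what "daily count" means. The sketch below, on made-up numbers, computes both the peak hourly count (`max`) and the daily total (`sum`) side by side; the column names `peak_hourly` and `daily_total` are illustrative choices.

```python
import pandas as pd

# made-up hourly counts for one device over two days
hourly = pd.DataFrame({
    'date': ['2019-03-04'] * 3 + ['2019-03-05'] * 3,
    'device_name': ['Streetscape'] * 6,
    'count': [10, 40, 25, 5, 0, 15],
})

# named aggregation: one output column per statistic
daily = (hourly.groupby(['date', 'device_name'])['count']
               .agg(peak_hourly='max', daily_total='sum')
               .reset_index())
daily['date'] = pd.to_datetime(daily['date'])
print(daily)
```

A threshold that makes sense for one column will generally not make sense for the other, so it is worth deciding which quantity the maintenance analysis should use.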

In [49]:
def plot_count(selected, start_date, end_date, threshold):
    '''
    selected is a list of device names to plot;
    start_date and end_date bound the date range (inclusive);
    threshold is the largest daily count to include in the plot
    '''
    #df = nonzero_df
    df = daily_count
        
    plot_df = df.loc[(df.date >= pd.Timestamp(start_date)) & 
                     (df.date <= pd.Timestamp(end_date))].copy()
    
    fig = go.Figure()
    
    for device in selected:
        sub_df = plot_df[plot_df['device_name'] == device]
        sub_df_under = sub_df[sub_df['count'] <= threshold]
        fig.add_trace(go.Scatter(x=sub_df_under.date, y=sub_df_under['count'], mode='lines', name=device))
        n_under = len(sub_df_under)
        n_above = len(sub_df) - n_under
        print(f"There are {n_under} days for {device} with a daily pedestrian count at or under {threshold}")
        print(f"There are {n_above} days for {device} with a daily pedestrian count above {threshold}")
    
    fig.update_layout(
        title="Pedestrian count under threshold grouped by device",
        xaxis_title="time",
        yaxis_title="count")
    
    fig.show()
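
The under/above tallies printed by `plot_count` can also be collected into a single table, which is easier to compare across devices. This is a minimal sketch on made-up data; the helper name `days_under_threshold` is hypothetical and the input is assumed to have the same `device_name` and `count` columns as `daily_count`.

```python
import pandas as pd

def days_under_threshold(daily, threshold):
    """Count days at or under, and above, a threshold for each device."""
    flagged = daily.assign(under=daily['count'] <= threshold)
    # summing a boolean column counts the True values
    summary = flagged.groupby('device_name')['under'].agg(under='sum', total='size')
    summary['above'] = summary['total'] - summary['under']
    return summary[['under', 'above']]

# made-up daily counts for two devices
toy_daily = pd.DataFrame({
    'device_name': ['Streetscape'] * 3 + ['Outside'] * 2,
    'count': [100, 450, 800, 300, 900],
})

summary = days_under_threshold(toy_daily, threshold=500)
print(summary)
```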

In [50]:
# SWLSANDBOX1 = Streetscape
# SWLSANDBOX2 = Under Raincoat
# SWLSANDBOX3 = Outside
_ = interact(plot_count, 
             selected=widgets.SelectMultiple(options=device_names, value=device_names, disabled=False),
             start_date=widgets.DatePicker(value=pd.to_datetime('2019-02-20')),
             end_date=widgets.DatePicker(value=pd.to_datetime('2020-01-12')),
             threshold=widgets.IntSlider(value=500, min=300, max=1000, step=100, readout_format='d')
            )


In [51]:
# TODO: combine the two box plots to a single interactive
def plot_boxplot_count_by_day(threshold):
    '''
    Box plots of hourly pedestrian counts at or under threshold, one box per day of week
    '''
    fig = go.Figure()
    
    df = feed_dwell_df[feed_dwell_df['count'] <= threshold]
    days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
    
    # traces are added in reverse order (Sun first, Mon last)
    for day in reversed(days):
        # Use x instead of y argument for horizontal plot
        fig.add_trace(go.Box(x=df.loc[df['day of week'] == day, 'count'], name=day))

    # the plot is horizontal, so counts run along the x axis
    fig.update_layout(
        title="Pedestrian count under threshold by day of week",
        xaxis_title="count",
        yaxis_title="day of week")
    
    fig.show()

In [52]:
_ = interact(plot_boxplot_count_by_day, 
             threshold=widgets.IntSlider(value=50, min=50, max=1000, step=50, readout_format='d'))


In [53]:
def plot_boxplot_count_by_hour(threshold):
    '''
    Box plots of hourly pedestrian counts at or under threshold, one box per hour of day
    '''
    fig = go.Figure()
    
    df = feed_dwell_df[feed_dwell_df['count'] <= threshold]
    # one box per hour of day, from 07:00 through 20:00
    for hour in range(7, 21):
        # Use x instead of y argument for horizontal plot
        fig.add_trace(go.Box(x=df.loc[df['hour'] == hour, 'count'], name=str(hour)))

    # the plot is horizontal, so counts run along the x axis
    fig.update_layout(
        title="Pedestrian count under threshold grouped by hour",
        xaxis_title="count",
        yaxis_title="hour of day")
    
    fig.show()

In [54]:
_ = interact(plot_boxplot_count_by_hour, 
             threshold=widgets.IntSlider(value=100, min=50, max=1000, step=50, readout_format='d'))
