# Assignment 3 

# Table of Contents
## Overview
1. Where is 307?

## Data Exploration
1. People's Behavior in terms of Dwell Time 
2. Which areas of 307 do people pass through
3. Where do people tend to linger?
4. How does dwell time change over time?

## In-depth Analysis
1. How do different zones affect people's behavior?
2. How do events affect people's behavior?
3. What is the best maintenance strategy?
4. What other factors affect people's behavior?

# About 307

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg 
import matplotlib.gridspec as gridspec


In [None]:
import plotly as py
import plotly.express as px
import plotly.graph_objs as go
from plotly.offline import iplot, init_notebook_mode
#init_notebook_mode(connected=True)

import cufflinks as cf
cf.go_offline(connected=True)
cf.set_config_file(colorscale='plotly', world_readable=True)

# Extra options
# pd.options.display.max_rows = 30
# pd.options.display.max_columns = 25

# Show all code cells outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

import os
from IPython.display import Image, display, HTML

import time

In [None]:
import ipywidgets as widgets
from ipywidgets import HBox, VBox
from ipywidgets import interact, interact_manual

In [None]:
# store login data in login.py
%run login.py

In [None]:
# login query as multiline formatted string
# this assumes that login and pwd are defined 
# above

loginquery = f"""
mutation {{
  logIn(
      email:\"{login}\",
      password:\"{pwd}\") {{
    jwt {{
      token
      exp
    }}
  }}
}}
"""

In [None]:
import requests
url = 'https://api.numina.co/graphql'

mylogin = requests.post(url, json={'query': loginquery})
# mylogin

In [None]:
token = mylogin.json()['data']['logIn']['jwt']['token']

In [None]:
# expiry of the JWT returned by the logIn mutation
expdate = mylogin.json()['data']['logIn']['jwt']['exp']
# expdate

# Explore the Data!

Now that you have the context, and before we present our analysis, it's time for YOU to explore the data! As mentioned, the three cameras cover the following areas:

Streetscape | Under Raincoat | Outside
------------- | -------------  | -------------
![alt](streetscape_sandbox.png) | ![alt](underraincoat_sandbox.png) | ![alt](outside_sandbox.png)

As you see in the above images, each area essentially consists of two parts: objects such as tables and chairs, and empty spaces presumably for walking. Based on this reasoning, we have defined the following smaller behaviour zones so as to perform more in-depth research:

### Streetscape ###

Chair Zone | Corridor Zone | Free Zone
------------- | -------------  | -------------
![alt](BehaviorZoneImage/Streetscape-ChairZone.png) | ![alt](BehaviorZoneImage/Streetscape-PathZone.png) | ![alt](BehaviorZoneImage/Streetscape-ActivityZone.png)

### Under Raincoat ###

Chair Zone | Traffic Zone | Free Zone
------------- | -------------  | -------------
![alt](BehaviorZoneImage/UnderRaincoat-ChairZone.png) | ![alt](BehaviorZoneImage/UnderRaincoat-TrafficZone.png) | ![alt](BehaviorZoneImage/UnderRaincoat-ActivityZone.png)

### Outside ###

Chair Zone | Path Zone | -
------------- | -------------  | -------------
![alt](BehaviorZoneImage/Outside-ChairZone.png) | ![alt](BehaviorZoneImage/Outside-PathZone.png) | ![alt](blank.png)

Note that the chairs can be moved, so the above images may not reflect the layout of the room during the whole period of data collection. In particular, the three sets of chairs in the Under Raincoat area can be easily moved; thus, in the initial exploration, we will not investigate the Chair Zone of Under Raincoat.

Nonetheless, notice that they are included in the Free Zone. We believe that it is safe to assume that the chairs would not be moved outside the Free Zone to the Traffic Zone.

Similarly, in the Streetscape area, under the assumption that it is intended to place the chairs together, it is unlikely that the group of chairs would be moved around freely and frequently due to the other obstacles in the room. As for the Outside area, it is also unlikely that the chairs would be placed in the middle of the road to block the path. Thus, we will be analyzing these two Chair Zones (while keeping the limitation in mind).

In [None]:
device_dict = {'SWLSANDBOX1':'Streetscape', 'SWLSANDBOX2':'Under Raincoat', 'SWLSANDBOX3':'Outside'}
device_ids = list(device_dict.keys())
device_names = list(device_dict.values())

# streetscape, under raincoat, outside
device_clrs = ['royalblue', 'firebrick', 'forestgreen']

In [None]:
def get_zones(device_id):
    
    query_zones = """
    query {{
      behaviorZones (
        serialnos: "{0}"
        ) {{
        count
        edges {{
          node {{
            rawId
            text
          }}
        }}
      }}
    }}
    """.format(device_id)
    
    zones = requests.post(url, json={'query': query_zones}, headers = {'Authorization':token})
    
    df = pd.DataFrame([x['node'] for x in zones.json()['data']['behaviorZones']['edges']])
    df['device'] = device_id
    
    return df

In [None]:
zones_df = pd.concat([get_zones(device_ids[i]) for i in range(3)])
zones_df = zones_df[(zones_df.text.notnull()) & 
                    (zones_df.text.str.startswith('x-', na=False)) & 
                    (zones_df.text.str.endswith('zone', na=False))].copy()

In [None]:
zones_df['text'] = zones_df['text'].str.replace('x-', '')
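The two cells above keep only behaviour zones whose label looks like `x-...zone` and then drop the `x-` marker. A minimal sketch of the same filtering on a few made-up labels (the label format below is a hypothetical stand-in, not output from the API):

```python
import pandas as pd

# Hypothetical zone labels, mimicking the `text` field returned by behaviorZones
zones = pd.DataFrame({
    'rawId': [1, 2, 3, 4],
    'text': ['x-streetscape-chairzone', 'entrance', None, 'x-outside-pathzone'],
})

# Keep non-null labels that start with 'x-' and end with 'zone';
# na=False makes missing labels count as "no match" instead of NaN
mask = (zones['text'].notnull()
        & zones['text'].str.startswith('x-', na=False)
        & zones['text'].str.endswith('zone', na=False))
zones = zones[mask].copy()

# Remove the literal 'x-' marker (regex=False treats it as plain text)
zones['text'] = zones['text'].str.replace('x-', '', regex=False)
print(zones['text'].tolist())  # ['streetscape-chairzone', 'outside-pathzone']
```

Calling `.copy()` after filtering lets the next assignment modify an independent frame rather than a view.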

In [None]:
def get_dwell(func, ID, interval):
    '''
    func is either feedDwellTimeDistribution or zoneDwellTimeDistribution
    '''
    if func == 'feedDwellTimeDistribution':
        arg = 'serialnos: "{0}"'.format(ID)
    else:
        arg = 'zoneIds: {0}'.format(ID)
        
    query = """
    query {{
        {0}(
        {1},
        startTime: "2019-02-20T00:00:00",
        endTime: "2020-01-12T00:00:00",
        timezone: "America/New_York",
        objClasses: ["pedestrian"],
        interval: "{2}"
        ){{
        edges {{
          node {{
            time
            objClass
            pct100
            pct75
            pct50
            pct25
            mean
            count
          }}
        }}
      }}
    }}
    """.format(func, arg, interval)

    dwell = requests.post(url, json={'query': query}, 
                           headers = {'Authorization':token})
    
    df = pd.DataFrame([x['node'] for x in dwell.json()['data'][func]['edges']])
    if func == 'feedDwellTimeDistribution':
        df['device'] = ID
    else:
        df['zone'] = ID
    
    return df

In [None]:
# daily dwell time - device
feed_dwell_1d_df = pd.concat([get_dwell('feedDwellTimeDistribution', device_ids[i], '1d') 
                              for i in range(3)])

In [None]:
# daily dwell time - zone
zone_dwell_1d_df = pd.concat([get_dwell('zoneDwellTimeDistribution', z, '1d')
                             for z in zones_df['rawId'].values])

In [None]:
'''
def extract_time(df):
    df['year'] = df['time'].str[:4].astype(int)
    df['month'] = df['time'].str[5:7].astype(int)
    df['day'] = df['time'].str[8:10].astype(int)
    df['date'] = pd.to_datetime(df['time'].str[:10])
    df['hour'] = df['time'].str[11:13].astype(int)
    return df.drop('time', axis=1)
''';

In [None]:
'''
feed_dwell_df = extract_time(feed_dwell_df)
zone_dwell_df = extract_time(zone_dwell_df)
''';

In [None]:
# replace NaN with 0
feed_dwell_1d_df = feed_dwell_1d_df.fillna(0)
zone_dwell_1d_df = zone_dwell_1d_df.fillna(0)

In [None]:
# convert time to timestamp object
feed_dwell_1d_df['time'] = feed_dwell_1d_df['time'].str[:-6].apply(lambda x : pd.Timestamp(x))
zone_dwell_1d_df['time'] = zone_dwell_1d_df['time'].str[:-6].apply(lambda x : pd.Timestamp(x))
zone_dwell_1d_df.zone = zone_dwell_1d_df.zone.astype(str)
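The `.str[:-6]` slice presumably strips a UTC offset such as `-05:00` so that the remaining local time can be parsed as a naive timestamp. A small sketch, on hypothetical strings in that assumed format, showing that a single vectorised `pd.to_datetime` call gives the same timestamps as parsing element by element with `apply`:

```python
import pandas as pd

# Hypothetical raw timestamps: local time followed by a six-character UTC offset
raw = pd.Series(['2019-02-20T00:00:00-05:00', '2019-06-29T00:00:00-04:00'])

# Element-wise parsing, as done in the cell above
by_apply = raw.str[:-6].apply(lambda x: pd.Timestamp(x))

# Vectorised alternative: one pd.to_datetime call on the trimmed strings
vectorised = pd.to_datetime(raw.str[:-6])

print(vectorised.tolist())
```

Dropping the offset keeps New York wall-clock time, which is what the daily grouping needs; it would not be appropriate if timestamps from different time zones were mixed.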

In [None]:
# add name column in addition to ID
feed_dwell_1d_df['device_name'] = [device_dict[d] for d in feed_dwell_1d_df.device]

# zone ID from int to str
zones_df.rawId = zones_df.rawId.astype(str)
zone_dict = dict(zip(zones_df.rawId, zones_df.text))
# zone name
zone_dwell_1d_df['zone_name'] = [zone_dict[z] for z in zone_dwell_1d_df.zone]

In [None]:
def get_df(groupby):
    if groupby == 'device_name':
        return feed_dwell_1d_df.copy(), device_names, device_clrs
    else:
        return zone_dwell_1d_df.copy(), list(zones_df.text), list(zones_df.colour)

In [None]:
# assign a colour to each behaviour zone
zones_df['colour'] = ['blue', 'lightblue', 'cadetblue',
                      'orangered', 'lightcoral', 
                      'palegreen', 'lightgreen']

In [None]:
# add a total column = mean * count
zone_dwell_1d_df['total'] = zone_dwell_1d_df['mean'] * zone_dwell_1d_df['count'] 
feed_dwell_1d_df['total'] = feed_dwell_1d_df['mean'] * feed_dwell_1d_df['count'] 
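Because the API only returns summary statistics, the `total` column reconstructs the total time spent in an area or zone from the identity sum = mean × count, assuming `mean` and `count` describe the same set of pedestrians. A quick check on hypothetical dwell times:

```python
import numpy as np

# Hypothetical dwell times (seconds) of four pedestrians in one zone on one day
dwell_times = np.array([12.0, 30.0, 45.0, 113.0])

count = len(dwell_times)      # plays the role of the API's `count`
mean = dwell_times.mean()     # plays the role of the API's `mean`

# mean * count recovers the summed dwell time for that day
total = mean * count
print(total)  # 200.0
```

Days with no pedestrians have `mean` filled with 0 above, so their `total` is 0 as well.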

Recall that our data span approximately one year. Therefore, in the initial exploration, let's focus on the daily dwell time and daily count of pedestrians in the 307 region. 

As a starting point, explore the data using the following interactive line plot and think about these questions:
1. Is there any trend in pedestrian count / dwell time in any of the areas / zones?
2. Where would you expect to see a bigger crowd? Is any of the areas / zones more popular than others?

Tip: You can click the legend on the right to include/exclude a line on the plot.

In [None]:
metric_list = ['count', 'mean', 'pct100', 'pct75', 'pct50', 'pct25', 'total']

In [None]:
def plot_timeline(groupby, metric):
    '''
    groupby is either 'device_name' or 'zone_name';
    metric is a value in ['count', 'mean', 'pct100', 'pct75', 'pct50', 'pct25', 'total']
    '''
    df, byvals, clrs = get_df(groupby)
    
    fig = go.Figure()
    
    # line plot for each name
    for i in range(len(byvals)):
        sub_df = df[df[groupby] == byvals[i]]
        fig.add_trace(go.Scatter(x=sub_df.time, y=sub_df[metric], line_color=clrs[i], name=byvals[i]))
    
    # layout - axes labels
    fig.update_layout(
        xaxis_title="time",
        yaxis_title=metric,
        xaxis_rangeslider_visible=True
    )
    # title
    if metric != 'count':
        fig.update_layout(title=f"pedestrian dwell time ({metric}) grouped by '{groupby}'")
    else:
        fig.update_layout(title=f"pedestrian count grouped by '{groupby}'")
    
    fig.show()
    

In [None]:
_ = interact(plot_timeline, 
             groupby=widgets.RadioButtons(options=['device_name', 'zone_name'], value='device_name'),
             metric=widgets.Dropdown(options=metric_list, value='mean')
            )

Unsurprisingly, we observe a few peak days. The following interactive table lists the exact locations and dates:

In [None]:
def sort_dwell_1d(groupby, sortby, ascending, top):
    df, _, _ = get_df(groupby)
    
    cols = [groupby, 'time', sortby]
    if sortby == 'count':
        cols.append('mean')
    elif sortby == 'mean':
        cols.append('count')
    else:
        cols.append('count')
        cols.append('mean')
        
    display(df.sort_values(sortby, ascending=ascending).reset_index(drop=True)
              .loc[:int(top)-1, cols])

_ = interact(sort_dwell_1d, 
             groupby=widgets.RadioButtons(options=['device_name', 'zone_name'], value='device_name'),
             sortby=widgets.Dropdown(options=metric_list, value='mean'),
             top=widgets.IntSlider(value=5, min=1, max=30, step=1, readout_format='d'),
             ascending=widgets.Checkbox(value=False, description='ascending'))
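`sort_dwell_1d` sorts the whole frame and then slices the first `top` rows. For the descending case, pandas' `nlargest` returns the same rows in one call; a sketch on hypothetical daily counts:

```python
import pandas as pd

# Hypothetical daily counts for two areas
df = pd.DataFrame({
    'device_name': ['Streetscape', 'Outside', 'Streetscape', 'Outside'],
    'time': pd.to_datetime(['2019-03-02', '2019-03-02', '2019-06-29', '2019-06-29']),
    'count': [120, 40, 300, 75],
})

# Sort-then-slice, as in sort_dwell_1d (.loc slicing is inclusive)
top_sorted = df.sort_values('count', ascending=False).reset_index(drop=True).loc[:1]

# nlargest gives the same top-k rows directly
top_nlargest = df.nlargest(2, 'count').reset_index(drop=True)

print(top_nlargest)
```

`nsmallest` covers the ascending case; both only work on numeric columns, which all entries of `metric_list` are.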

In [None]:
def plot_boxplot(groupby, metric):
    fig = go.Figure()
    
    df, byvals, clrs = get_df(groupby)
    
    for i in range(len(byvals)):
        # Use x instead of y argument for horizontal plot
        fig.add_trace(go.Box(x=df.loc[df[groupby]==byvals[i], metric], name=byvals[i],
                             marker_color=clrs[i], boxpoints='outliers'))

    # layout - axes labels
    fig.update_layout(
        xaxis_title=metric,
        xaxis_rangeslider_visible=True
    )
    # title
    if metric != 'count':
        fig.update_layout(title=f"distribution of pedestrian dwell time ({metric}) grouped by '{groupby}'")
    else:
        fig.update_layout(title=f"distribution of pedestrian count grouped by '{groupby}'")
    
    fig.show()
    

In [None]:
_ = interact(plot_boxplot, 
             groupby=widgets.RadioButtons(options=['device_name', 'zone_name'], value='device_name'),
             metric=widgets.Dropdown(options=metric_list, value='mean')
            )

In [None]:
# feed_dwell_1d_df.groupby('device')['count'].describe()

In [None]:
# groupby zone_name / device_name and take the sum for the other columns
# should only investigate the count and total columns

grouped_df = zone_dwell_1d_df.groupby('zone_name').sum(numeric_only=True).reset_index(drop=False)\
                             .rename(columns={'zone_name':'name'})
# DataFrame.append was removed in pandas 2.0, so concatenate instead
grouped_df = pd.concat([grouped_df,
                        feed_dwell_1d_df.groupby('device_name').sum(numeric_only=True).reset_index(drop=False)
                                        .rename(columns={'device_name':'name'})])

In [None]:
from plotly.subplots import make_subplots

def plot_barplot(metric):
    '''
    metric is either 'count' or 'total dwell time'
    '''
    fig = make_subplots(rows=1, cols=3, subplot_titles=device_names)
    
    df = grouped_df.copy()
    m = metric.split(' ')[0]  # column name: 'count' or 'total'
    
    for i in range(3):
        dname = device_names[i]
        # area-level value as a scalar (not a one-element Series)
        total = df.loc[df['name'] == dname, m].iloc[0]
        # the area row plus its behaviour zones, matched on characters 1-4 of the name
        sub_df = df[[n[1:5] == dname[1:5] for n in df['name']]].copy()
        sub_df['name'] = [s[-1] for s in sub_df['name'].str.split('-')]
        # share of the area's value, in percent
        sub_df['perc'] = sub_df[m] / total * 100
        
        fig.add_bar(x=sub_df['name'], y=sub_df['perc'], name=dname, row=1, col=i+1)
        
    fig.update_yaxes(ticksuffix="%", range=[0, 100])
    fig.update_layout(title=f"proportion of individual behaviour zones w.r.t. the area in terms of {metric} of pedestrians")

    fig.show()

In [None]:
_ = interact(plot_barplot, metric=widgets.RadioButtons(options=['count', 'total dwell time'], value='count'))
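Each bar in the plot above expresses a zone's summed `count` (or `total`) as a percentage of the value for its whole area. A sketch of that calculation on hypothetical sums (the zone names are illustrative only):

```python
import pandas as pd

# Hypothetical summed counts: one area total plus three of its behaviour zones
grouped = pd.DataFrame({
    'name': ['Streetscape', 'streetscape-chairzone',
             'streetscape-pathzone', 'streetscape-activityzone'],
    'count': [1000, 250, 500, 125],
})

# Area-level total as a scalar
area_total = grouped.loc[grouped['name'] == 'Streetscape', 'count'].iloc[0]

# Zone rows are the ones with a '-' in their name; express each as % of the area
zones = grouped[grouped['name'].str.contains('-')].copy()
zones['perc'] = zones['count'] / area_total * 100
print(zones[['name', 'perc']])
```

In this made-up example the zone shares add up to 87.5%, not 100%. In the real data they presumably need not sum to 100% either, since a pedestrian may cross several zones or none.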

In [None]:
'''
def boxplot_dwell(groupby, column, bound_factor):
    df, _, _ = get_df(groupby)
    
    q3 = df[column].quantile(0.75) 
    q1 = df[column].quantile(0.25)
    iqr = q3 - q1
    sub_df = df[(df[column] <= q3 + iqr*bound_factor) & 
                  ((df[column] >= q1 - iqr*bound_factor))]
    
    if column == 'count':
        title = f"distribution of count grouped by '{groupby}'" +\
        f" with values {bound_factor} * IQR beyond Q1/Q3 removed"
    else:
        title = f"distribution of mean dwell time grouped by '{groupby}'" +\
        f" with values {bound_factor} * IQR beyond Q1/Q3 removed"
    
    fig = px.box(sub_df, x=groupby, y=column, points="all", title=title)

    fig.show()
''';

In [None]:
'''
_ = interact(boxplot_dwell, 
             groupby=widgets.RadioButtons(options=['device_name', 'zone_name'], value='device_name'),
             column=widgets.RadioButtons(options=['count', 'mean'], value='count'),
             bound_factor=widgets.FloatSlider(
                 value=1.5,
                 min=-3,
                 max=10,
                 step=0.1,
                 disabled=False,
                 continuous_update=False,
                 orientation='horizontal',
                 readout=True,
                 readout_format='.1f')
            )
''';

### Obtain heatmaps for pedestrians
In this section, I plot heatmaps for important days.
The audience will be able to
1. Select the days whose heatmaps are shown (for comparison):
    1. the top 1/2/4/9 days in terms of dwell count or mean dwell time
    2. event days
    3. a custom selection of 1/2/4/9 days
2. Select the percentile threshold for the heatmaps:
    1. 0 - 90: desire lines (step of 10)
    2. 90 - 100: desire spots (step of 1)
3. Show the overlap between the traffic and pedestrian heatmaps for the outdoor cameras.
4. Choose the color of the heatmap.
In this part, we focus on the desire lines and desire spots of pedestrians captured by the three cameras. The audience is able to
1. visualize the pedestrian heatmap for any day;
2. compare and contrast the heatmaps on event days.
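The percentile threshold works by keeping only heatmap points whose density is at or above the chosen percentile. Low percentiles keep most points and so trace the paths people walk along, while high percentiles keep only the densest spots. A minimal sketch of that filter on hypothetical `[x, y, density]` points (`filter_by_percentile` is an illustrative name, not a notebook function):

```python
import numpy as np

# Hypothetical heatmap points in [x, y, density] form
points = [[10, 20, 1.0], [11, 20, 5.0], [12, 21, 2.0], [13, 22, 9.0], [14, 22, 3.0]]

def filter_by_percentile(points, percentile):
    """Keep points whose density is at or above the given percentile (0-100)."""
    threshold = np.percentile([p[2] for p in points], percentile)
    kept = [p for p in points if p[2] >= threshold]
    # split into parallel x, y and density lists, ready for plt.scatter
    return [[p[0] for p in kept], [p[1] for p in kept], [p[2] for p in kept]]

# 60th percentile of [1, 2, 3, 5, 9] is 3.8 (linear interpolation), so two points remain
x, y, density = filter_by_percentile(points, 60)
print(x, y, density)
```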

In [None]:
from datetime import timedelta, datetime
from dateutil.relativedelta import relativedelta
import calendar
START_DATE = datetime(2019, 2, 20, 0, 0, 0)
END_DATE = datetime(2020, 1, 11, 0, 0, 0)
time_delta = relativedelta(days = +1)

In [None]:
import pandas as pd

In [None]:
## fundamental functions to get the heatmap data 
def heatmap_query_gen(startTime: str, endTime: str, camera: str, obj: str):
    '''
    This function is for generating heatmap query given time, device, and object
    '''
    heatmap_query = """
query {{
  feedHeatmaps(
    serialno: "{0}",
    startTime:"{1}",
    endTime:"{2}",
    objClasses:["{3}"],
    timezone:"America/New_York") {{
    edges {{
      node {{
        time
        objClass
        heatmap
      }}
    }}
  }}
}}
""".format(camera, startTime, endTime,obj)
    return heatmap_query

def get_heatmap_data(camera: str, obj: str, start_times: list, end_times: list):
    '''
    This function gets the raw heatmap dataframe (one row per heatmap point) for every
    [start, end) window, using heatmap_query_gen as a helper
    '''
    frames = []
    for start, end in zip(start_times, end_times):
        query = heatmap_query_gen(start.strftime('%Y-%m-%dT%H:%M:%S'),
                                  end.strftime('%Y-%m-%dT%H:%M:%S'), camera, obj)
        heatmap_json = requests.post(url, json={'query': query},
                                     headers={'Authorization': token}).json()
        # skip windows where the API returned an error or an empty heatmap
        data = heatmap_json.get('data') or {}
        edges = (data.get('feedHeatmaps') or {}).get('edges') or []
        if edges and edges[0]['node']['heatmap']:
            heatmap = edges[0]['node']['heatmap']
            frames.append(pd.DataFrame({'startTime': start, 'endTime': end,
                                        'heatMap': heatmap, 'obj': obj}))
    if not frames:
        return pd.DataFrame(columns=['startTime', 'endTime', 'heatMap', 'obj'])
    # DataFrame.append was removed in pandas 2.0, so concatenate once at the end
    return pd.concat(frames, ignore_index=True)

def generate_consecutive_times(start_time: datetime, end_time: datetime, interval: relativedelta):
    '''
    This function is for generating consecutive datetime objects between start_time and end_time
    '''
    ## the first element in the list are the start times
    time = [[], []]
    current_time = start_time
    while current_time < end_time:
        time[0].append(current_time)
        time[1].append(current_time + interval)
        current_time = current_time + interval
    return time

def daily_heatmap_data(df):
    '''
    This function merges the raw heatmap data by time window (one row per day or hour)
    '''
    return df.groupby(['startTime', 'endTime'])['heatMap'].apply(list).reset_index(name='heatMapMatrix')
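`daily_heatmap_data` collapses the one-point-per-row output of `get_heatmap_data` into one row per time window, whose `heatMapMatrix` cell holds the list of all `[x, y, density]` points for that window. A sketch on a few hypothetical rows:

```python
import pandas as pd
from datetime import datetime

# Hypothetical raw rows: one heatmap point per row, as get_heatmap_data produces
raw = pd.DataFrame({
    'startTime': [datetime(2019, 6, 29)] * 2 + [datetime(2019, 6, 30)],
    'endTime': [datetime(2019, 6, 30)] * 2 + [datetime(2019, 7, 1)],
    'heatMap': [[10, 20, 1.0], [11, 21, 4.0], [5, 5, 2.0]],
    'obj': 'pedestrian',
})

# Gather every point of the same window into a single list-valued cell
daily = (raw.groupby(['startTime', 'endTime'])['heatMap']
            .apply(list)
            .reset_index(name='heatMapMatrix'))
print(daily)
```

This list-of-points shape is why the plotting helpers index `data[0]` and then read `i[0]`, `i[1]` and `i[2]` for x, y and density.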

In [None]:
## load the data, it takes a lot of time, so we do it camera by camera
all_time = generate_consecutive_times(START_DATE, END_DATE, time_delta)
outside_heatmap_pedestrian = daily_heatmap_data(get_heatmap_data('SWLSANDBOX3', 'pedestrian', all_time[0], all_time[1]))
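`generate_consecutive_times` walks from `START_DATE` to `END_DATE` one `time_delta` at a time, so `all_time` should hold 325 daily windows, which matches the 0:100 / 100:200 / 200: split used for Under Raincoat below. For a fixed daily step the same windows can be built with `pd.date_range`. This sketch restates the dates so it is self-contained, and it assumes the range is a whole number of days (the while loop does not need that assumption):

```python
from datetime import datetime, timedelta
import pandas as pd

start = datetime(2019, 2, 20)
end = datetime(2020, 1, 11)

# Daily [start, end) windows without an explicit loop
starts = pd.date_range(start, end - timedelta(days=1), freq='D')
ends = starts + pd.Timedelta(days=1)

print(len(starts), starts[0], ends[-1])
```

The resulting `Timestamp` objects are `datetime` subclasses, so they still support the `strftime` calls in `get_heatmap_data`.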

In [None]:
streetscape_heatmap_pedestrian = daily_heatmap_data(get_heatmap_data('SWLSANDBOX1', 'pedestrian', all_time[0], all_time[1]))

In [None]:
underraincoat_heatmap_pedestrian_1 = daily_heatmap_data(get_heatmap_data('SWLSANDBOX2', 'pedestrian', all_time[0][0:100], all_time[1][0:100]))

In [None]:
underraincoat_heatmap_pedestrian_2 = daily_heatmap_data(get_heatmap_data('SWLSANDBOX2', 'pedestrian', all_time[0][100:200], all_time[1][100:200]))

In [None]:
underraincoat_heatmap_pedestrian_3 = daily_heatmap_data(get_heatmap_data('SWLSANDBOX2', 'pedestrian', all_time[0][200:], all_time[1][200:]))

In [None]:
# DataFrame.append was removed in pandas 2.0, so use pd.concat
underraincoat_heatmap_pedestrian = pd.concat([underraincoat_heatmap_pedestrian_1,
                                              underraincoat_heatmap_pedestrian_2,
                                              underraincoat_heatmap_pedestrian_3]).reset_index(drop = True)

In [None]:
## join two dataframes (dwell data and heatmap)
streetscape_pedestrian_data_all = pd.merge(feed_dwell_1d_df[feed_dwell_1d_df['device'] == 'SWLSANDBOX1'], streetscape_heatmap_pedestrian, left_on = "time", right_on = "startTime")
underraincoat_pedestrian_data_all = pd.merge(feed_dwell_1d_df[feed_dwell_1d_df['device'] == 'SWLSANDBOX2'], underraincoat_heatmap_pedestrian, left_on = "time", right_on = "startTime")
outside_pedestrian_data_all = pd.merge(feed_dwell_1d_df[feed_dwell_1d_df['device'] == 'SWLSANDBOX3'], outside_heatmap_pedestrian, left_on = "time", right_on = "startTime")
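`pd.merge` performs an inner join by default, so any day for which either the dwell query or the heatmap query returned nothing silently disappears from the `*_pedestrian_data_all` frames. A sketch on hypothetical data showing the drop, and how `how='outer'` with `indicator=True` reveals which days were lost:

```python
import pandas as pd

# Hypothetical daily dwell statistics for one camera
dwell = pd.DataFrame({'time': pd.to_datetime(['2019-06-28', '2019-06-29', '2019-06-30']),
                      'count': [40, 310, 55]})
# Hypothetical daily heatmaps; 2019-06-30 has none
heat = pd.DataFrame({'startTime': pd.to_datetime(['2019-06-28', '2019-06-29']),
                     'heatMapMatrix': [[[1, 2, 0.5]], [[3, 4, 2.0]]]})

# Default inner join: only days present on both sides survive
merged = pd.merge(dwell, heat, left_on='time', right_on='startTime')
print(merged[['time', 'count']])

# Outer join with an indicator column shows the dropped days
check = pd.merge(dwell, heat, left_on='time', right_on='startTime',
                 how='outer', indicator=True)
print(check.loc[check['_merge'] != 'both', 'time'].tolist())
```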

In [None]:
def week_days(lis, weekday):
    '''
    This function returns the days in lis that fall on the given weekday (Monday = 0, ..., Sunday = 6)
    '''
    days = []
    for day in lis:
        if day.weekday() == weekday:
            days.append(day)
    return days

In [None]:
event_days = [datetime(2019, 3, 2, 0, 0, 0), datetime(2019, 6, 29, 0, 0, 0), datetime(2019, 8, 15, 0, 0, 0), datetime(2019, 9, 26, 0, 0, 0), 
                    datetime(2019, 11, 20, 0, 0, 0), datetime(2019, 11, 21, 0, 0, 0), datetime(2019, 11, 22, 0, 0, 0), datetime(2019, 11, 23, 0, 0, 0)] ## sorted by number of people
saturdays = week_days(all_time[0], 5)
sundays = week_days(all_time[0], 6)
mondays= week_days(all_time[0], 0)
tuesdays = week_days(all_time[0], 1)
wednesdays = week_days(all_time[0], 2)
thursdays = week_days(all_time[0], 3)
fridays = week_days(all_time[0], 4)
weekday = [mondays, tuesdays, wednesdays, thursdays, fridays]
weekdays = []
for day in weekday:
    weekdays.extend(day)
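The cell above builds seven per-day lists and concatenates the five working days. The same weekday/weekend split can be written directly with `datetime.weekday()`; this self-contained sketch rebuilds the notebook's daily start times from 2019-02-20 up to (not including) 2020-01-11:

```python
from datetime import datetime, timedelta

# Daily start times, mirroring all_time[0]
start, end = datetime(2019, 2, 20), datetime(2020, 1, 11)
days = [start + timedelta(days=i) for i in range((end - start).days)]

# weekday(): Monday == 0 ... Sunday == 6
weekends = [d for d in days if d.weekday() >= 5]
weekdays = [d for d in days if d.weekday() < 5]

print(len(days), len(weekdays), len(weekends))  # 325 233 92
```

Since the period starts on a Wednesday and spans 46 full weeks plus three days, Wednesday, Thursday and Friday each occur once more than the other days.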

In [None]:
# Plot a 3x3 grid of daily pedestrian heatmaps for the selected camera.
# In "Customize" mode the nine DatePickers choose the days; the other modes
# pick the days from the merged dwell/heatmap data or from the event days.
HEATMAP_DATA = {'Outside Camera': outside_heatmap_pedestrian,
                'StreetScape Camera': streetscape_heatmap_pedestrian,
                'UnderRainCoat Camera': underraincoat_heatmap_pedestrian}
DWELL_DATA = {'Outside Camera': outside_pedestrian_data_all,
              'StreetScape Camera': streetscape_pedestrian_data_all,
              'UnderRainCoat Camera': underraincoat_pedestrian_data_all}
BACKGROUNDS = {'Outside Camera': 'outside_sandbox.png',
               'StreetScape Camera': 'streetscape_sandbox.png',
               'UnderRainCoat Camera': 'underraincoat_sandbox.png'}

def get_daily_matrix(day, percentile: int, camera: str):
    '''
    Return [x, y, density] for the heatmap points of camera on day whose density
    is at or above the given percentile, or [] if no pedestrian was recorded that day
    '''
    df = HEATMAP_DATA[camera]
    # DatePicker returns datetime.date, so convert to a Timestamp before comparing
    data = list(df[df['startTime'] == pd.Timestamp(day)]['heatMapMatrix'])
    if not data:
        return []
    p = np.percentile([i[2] for i in data[0]], percentile)
    filtered = [i for i in data[0] if i[2] >= p]
    return [[i[0] for i in filtered], [i[1] for i in filtered], [i[2] for i in filtered]]

def handle_not_exist_day(day):
    print('There is no pedestrian recorded on {0}.{1}.{2}, please select another day.'.format(day.year, day.month, day.day))

def plot_streetscape_heatmap(percentile, day1: datetime, day2: datetime, day3: datetime, day4: datetime,
                             day5: datetime, day6: datetime, day7: datetime, day8: datetime, day9: datetime,
                             mode: str, camera: str):
    fig = plt.figure(figsize=(16, 11))
    image = mpimg.imread(BACKGROUNDS[camera])
    dwell = DWELL_DATA[camera]
    if mode == "Days with the most dwell counts":
        days = list(dwell.nlargest(9, 'count')['time'])
    elif mode == "Days with the highest mean dwell time":
        days = list(dwell.nlargest(9, 'mean')['time'])
    elif mode == "Event Days with most pedestrian":
        days = event_days  # only 8 event days, so the last panel stays empty
    else:  # "Customize"
        days = [day1, day2, day3, day4, day5, day6, day7, day8, day9]
    for i, day in enumerate(days[:9]):
        ax = fig.add_subplot(3, 3, i + 1)
        day_data = get_daily_matrix(day, percentile, camera)
        if not day_data:
            handle_not_exist_day(day)
        else:
            ax.scatter(day_data[0], day_data[1], c=day_data[2], s=1, cmap=plt.cm.nipy_spectral)
        ax.imshow(image, aspect='auto')
        ax.set_title("Heatmap on {0}.{1}.{2}".format(day.year, day.month, day.day))
        ax.axis('off')
widgets.interact_manual(plot_streetscape_heatmap,
                        day1=widgets.DatePicker(value=pd.to_datetime('2019-02-20')),
                        day2=widgets.DatePicker(value=pd.to_datetime('2019-02-21')),
                        day3=widgets.DatePicker(value=pd.to_datetime('2019-02-22')),
                        day4=widgets.DatePicker(value=pd.to_datetime('2019-02-23')),
                        day5=widgets.DatePicker(value=pd.to_datetime('2019-02-20')),
                        day6=widgets.DatePicker(value=pd.to_datetime('2019-02-21')),
                        day7=widgets.DatePicker(value=pd.to_datetime('2019-02-22')),
                        day8=widgets.DatePicker(value=pd.to_datetime('2019-02-23')),
                        day9=widgets.DatePicker(value=pd.to_datetime('2019-02-23')),
                        percentile=widgets.IntSlider(min=0, max=100, step=5, value=0),
                        mode=widgets.Dropdown(options=["Days with the most dwell counts",
                                                       "Days with the highest mean dwell time",
                                                       "Event Days with most pedestrian",
                                                       "Customize"],
                                              description='Plots:'),
                        camera=widgets.Dropdown(options=["Outside Camera", "StreetScape Camera", "UnderRainCoat Camera"]))


To make it easier to analyze the heatmap within a single day, I introduce two more interactive pages.
Both are animations: one steps through percentiles and the other through hours.
To support them, I download the hourly heatmaps for the event days.

In [None]:
# Initialize time data for loading the data
# I want to load hourly data on 2019.6.29 as Initialization
start_time = datetime(2019, 6, 29, 0, 0, 0)
end_time = datetime(2019, 6, 30, 0, 0, 0)
interval =  relativedelta(hours = +1)
hour_interval = generate_consecutive_times(start_time, end_time, interval)
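The hourly windows above, and later `heatmap_animation_hour`, rely on the fact that a `relativedelta` can be multiplied by an integer: `day + relativedelta(hours=+1) * h` is hour `h` of that day. A short sketch:

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta

day = datetime(2019, 6, 29)
one_hour = relativedelta(hours=+1)

# Multiplying by an integer scales the offset, so this is 13:00 on that day
hour_13 = day + one_hour * 13
print(hour_13)  # 2019-06-29 13:00:00

# Start times of all 24 hourly windows of the day
starts = [day + one_hour * h for h in range(24)]
```

For a plain hour offset, `datetime.timedelta(hours=h)` would give the same result; `relativedelta` matters mainly for calendar-aware steps such as months.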

In [None]:
def add_column(data, date, camera, t):
    '''
    This is a simple wrapper function for creating columns
    '''
    data["date"] = date
    data["camera"] = camera
    data["type"] = t
    return data

In [None]:
## Initialize the hourly heatmap data for streetscape
streetscape_heatmap_pedestrian_event_days = daily_heatmap_data(get_heatmap_data('SWLSANDBOX1', 'pedestrian', hour_interval[0], hour_interval[1]))
streetscape_heatmap_pedestrian_event_days = add_column(streetscape_heatmap_pedestrian_event_days,  datetime(2019, 6, 29, 0, 0, 0), "streetscape", "pedestrian")

In [None]:
## Initialize the hourly heatmap data for outside
outside_heatmap_pedestrian_event_days  = daily_heatmap_data(get_heatmap_data('SWLSANDBOX3', 'pedestrian', hour_interval[0], hour_interval[1]))
outside_heatmap_pedestrian_event_days = add_column(outside_heatmap_pedestrian_event_days,  datetime(2019, 6, 29, 0, 0, 0), "outside", "pedestrian")

In [None]:
## Initialize the hourly heatmap data for underraincoat
underraincoat_heatmap_pedestrian_event_days  = daily_heatmap_data(get_heatmap_data('SWLSANDBOX2', 'pedestrian', hour_interval[0], hour_interval[1]))
underraincoat_heatmap_pedestrian_event_days = add_column(underraincoat_heatmap_pedestrian_event_days,  datetime(2019, 6, 29, 0, 0, 0), "underraincoat", "pedestrian")

In [None]:
## Loading event hourly heatmap data for streetscape
## (2019-06-29 was already loaded above, so it is skipped here)
initial_day = datetime(2019, 6, 29, 0, 0, 0)
frames = [streetscape_heatmap_pedestrian_event_days]
for day in event_days:
    if day != initial_day:
        start_time = day
        end_time = start_time + relativedelta(days = +1)
        hour_interval = generate_consecutive_times(start_time, end_time, interval)
        temp = add_column(daily_heatmap_data(get_heatmap_data('SWLSANDBOX1', 'pedestrian', hour_interval[0], hour_interval[1])), start_time,  'streetscape', 'pedestrian')
        frames.append(temp)
streetscape_heatmap_pedestrian_event_days = pd.concat(frames, ignore_index=True)

In [None]:
## Loading event hourly heatmap data for underraincoat
## (2019-06-29 was already loaded above, so it is skipped here)
initial_day = datetime(2019, 6, 29, 0, 0, 0)
frames = [underraincoat_heatmap_pedestrian_event_days]
for day in event_days:
    if day != initial_day:
        start_time = day
        end_time = start_time + relativedelta(days = +1)
        hour_interval = generate_consecutive_times(start_time, end_time, interval)
        temp = add_column(daily_heatmap_data(get_heatmap_data('SWLSANDBOX2', 'pedestrian', hour_interval[0], hour_interval[1])), start_time,  'underraincoat', 'pedestrian')
        frames.append(temp)
underraincoat_heatmap_pedestrian_event_days = pd.concat(frames, ignore_index=True)

In [None]:
## Loading event hourly heatmap data for outside
## (2019-06-29 was already loaded above, so it is skipped here)
initial_day = datetime(2019, 6, 29, 0, 0, 0)
frames = [outside_heatmap_pedestrian_event_days]
for day in event_days:
    if day != initial_day:
        start_time = day
        end_time = start_time + relativedelta(days = +1)
        hour_interval = generate_consecutive_times(start_time, end_time, interval)
        temp = add_column(daily_heatmap_data(get_heatmap_data('SWLSANDBOX3', 'pedestrian', hour_interval[0], hour_interval[1])), start_time,  'outside', 'pedestrian')
        frames.append(temp)
outside_heatmap_pedestrian_event_days = pd.concat(frames, ignore_index=True)

In [None]:
## This is a helper function for getting the data for plotting hourly heatmap
def event_hour_data_helper(camera, percentile, time):
    '''
    This is a simple helper function to get the data
    camera is required to be "Outside Camera" or  "StreetScape Camera" or  "UnderRainCoat Camera"
    percentile is required to be an integer between 0 and 100
    time is required to be a datetime object on event days
    The return value is a list of 3 sublists of equal length, [x, y, density],
    e.g. [[1, 3, ...], [2, 1, ...], [12, 0.2, ...]]
    '''
    data = []
    if camera == "Outside Camera":
        data = outside_heatmap_pedestrian_event_days
    elif camera == "StreetScape Camera":
        data = streetscape_heatmap_pedestrian_event_days
    elif camera == "UnderRainCoat Camera":
        data = underraincoat_heatmap_pedestrian_event_days
    data = list(data[data['startTime'] == time]['heatMapMatrix'])
    if data:
        p = np.percentile([i[2] for i in data[0]], percentile)
        filtered = list(filter(lambda x : x[2] >= p, data[0]))
        x = [i[0] for i in filtered] 
        y = [i[1] for i in filtered]
        density = [i[2] for i in filtered]
        data = [x, y, density]
    return data

In [None]:
## heatmap_animation_hour("StreetScape Camera", start_time, 20, 12)
## The user is able to select the camera, event day, percentile and hour
def heatmap_animation_hour(camera: str, day: datetime, percentile: int, hour: int):
    '''
    This is a function for plotting hourly heatmap on event days
    day is required to be one of the event days
    camera is required to be "Outside Camera" or  "StreetScape Camera" or  "UnderRainCoat Camera"
    percentile is required to be an integer between 0 and 100
    hour is required to be an integer between 0 and 23
    The function will plot a heatmap given the day and hour, only keeps the data points above the percentile
    '''
    hour_interval =  relativedelta(hours = +1)
    fig, ax = plt.subplots(figsize=(15,10))
    
    # Setting Background Image
    if camera == "Outside Camera":
        image = mpimg.imread('outside_sandbox.png')
    elif camera == "StreetScape Camera":
        image = mpimg.imread('streetscape_sandbox.png')
    elif camera == "UnderRainCoat Camera":
        image = mpimg.imread('underraincoat_sandbox.png')
    # Compute the timestamp as the event day plus the selected number of hours
    timestamp = day + hour_interval * hour
    data = event_hour_data_helper(camera, percentile, timestamp)
    if data:
        x = data[0]
        y = data[1]
        density = data[2]
        ax.scatter(x, y, c= density, s=1, cmap= plt.cm.RdPu)
    ax.imshow(image, aspect='auto')
    ax.set_title("Hourly Heatmap Animation on {0}.{1}.{2} hour:{3}".format(day.year, day.month, day.day, hour))
    ax.axis('off')
    plt.show()

In [None]:
## This part designs the widgets for 'heatmap_animation_hour'
## A Play widget that steps through the hours of an event day automatically
play= widgets.Play(
    value=0,
    min=0,
    max=23,
    step=1,
    interval=10000, # Notice that interval here is 10000ms, since it takes time to load heatmap data to the image
    description="Press play",
    disabled=False
)
## A slider widget to change the hour value; continuous_update=False redraws only when the slider is released
hour_slider = widgets.IntSlider(value=0, min=0, max=23, step=1, description='Hour:', continuous_update=False)
# Link the slider value to the player
widgets.jslink((play, 'value'), (hour_slider, 'value'))
# Show these widgets in a horizontal box
hour_player = widgets.HBox([play, hour_slider])

## A dropdown widget for selecting the event day
Day_time_drop = widgets.Dropdown(options=event_days)
## A ToggleButtons widget for selecting the camera
Camera_Hbox = widgets.ToggleButtons(options=[('Outside', "Outside Camera"), ('StreetScape', "StreetScape Camera"), ('Under RainCoat', "UnderRainCoat Camera")], description='Camera:')
## An IntSlider widget for selecting the percentile threshold
percentile_slider = widgets.IntSlider(min=0, max=100, step=5, value=0, continuous_update=False)
## Attach each widget to the matching function argument
## (continuous_update is set on the sliders themselves rather than passed to interactive())
heatmap_animation_hour_widget = widgets.interactive(heatmap_animation_hour,
                                             camera = Camera_Hbox,
                                             day = Day_time_drop,
                                             percentile = percentile_slider,
                                             hour = hour_slider)
## Get the output of the widget
output_a = heatmap_animation_hour_widget.children[-1]

## Arrange the widgets vertically
tab1 = VBox(children=[Camera_Hbox,
                      Day_time_drop,
                      percentile_slider,
                      hour_player])
## Display output and widget
heatmap_animation_hour_widget = VBox(children=[tab1, output_a])

In [None]:
heatmap_animation_hour_widget

To investigate this further, we want to plot a cumulative heatmap.

Each daily heatmap is normalized so that its maximum density is always 1, while its minimum depends on that day's activity. Summing the raw heatmaps would therefore give a quiet day the same weight as a busy one, so we weight each day's heatmap by its dwell count before combining them. Our approach:
1. Get the daily dwell count data.
2. Figure out the best way to normalize (weight) each heatmap.
3. Plot two cumulative heatmaps, one above the other, so they can be compared.

We first build a cumulative plot for the whole period.
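
The weighting described above can be sketched in plain Python. This is a minimal sketch, assuming each heatmap is a list of `[x, y, density]` triples whose densities are already scaled so the maximum is 1; each day's matrix is multiplied by that day's dwell count and the results are summed per coordinate. The function name `combine_weighted_heatmaps` and the toy data are hypothetical.

```python
from collections import defaultdict


def combine_weighted_heatmaps(heatmaps, counts):
    """Weight each normalized heatmap by its count, then sum densities per (x, y).

    heatmaps: list of heatmaps, each a list of [x, y, density] triples
    counts:   one dwell count per heatmap (hypothetical weighting input)
    """
    totals = defaultdict(float)
    for matrix, count in zip(heatmaps, counts):
        for x, y, density in matrix:
            # A busy day (high count) contributes more than a quiet day
            totals[(x, y)] += density * count
    return [[x, y, d] for (x, y), d in totals.items()]


# Two toy days: both peak at (10, 20), but day 2 has three times the traffic
day1 = [[10, 20, 1.0], [11, 20, 0.5]]
day2 = [[10, 20, 1.0], [30, 40, 0.25]]
print(combine_weighted_heatmaps([day1, day2], [100, 300]))
# [[10, 20, 400.0], [11, 20, 50.0], [30, 40, 75.0]]
```

Without the weights, both days would contribute equally at (10, 20) even though the second day saw three times as many people.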

In [None]:
## functions to generate cumulative heatmap matrix
def weight_matrix(matrix, factor):
    '''
    This is a helper function to generate a weighted heatmap matrix.
    It returns a copy of the given [x, y, density] matrix with every density multiplied by factor.
    '''
    return [[point[0], point[1], point[2] * factor] for point in matrix]

In [None]:
## Collect the weighted heatmap matrices of a list of days
def culmulative_heat_map_data_generator(days, data):
    '''
    This is a helper function that collects the weighted heatmap matrix of each given day.
    It returns a list with one weighted heatmap matrix per day found in data; days without data are skipped.
    '''
    culmulative_data_lis = []
    for day in days:
        daily_data = list(data[data['time'] == day]['weighted_heatMapMatrix'])
        if daily_data:
            culmulative_data_lis.append(daily_data[0])
    return culmulative_data_lis

In [None]:


def culmulative_heat_map_data(data):
    '''
    This is a helper function that combines several heatmap matrices into one.
    It takes a list of heatmap matrices and sums the densities of matching (x, y) coordinates.
    '''
    ## Dictionary mapping each (x, y) coordinate to its summed density
    culmulative_data_dic = {}
    for daily_data in data:
        for coordinate_data in daily_data:
            coordinate = (coordinate_data[0], coordinate_data[1])
            culmulative_data_dic[coordinate] = culmulative_data_dic.get(coordinate, 0) + coordinate_data[2]
    ## Convert back into the [x, y, density] list format used by the plotting functions
    return [[x, y, density] for (x, y), density in culmulative_data_dic.items()]

In [None]:
## Add a weighted heatmap matrix column to each of the three dataframes
all_dataframe = [streetscape_pedestrian_data_all, underraincoat_pedestrian_data_all, outside_pedestrian_data_all]
for dataframe in all_dataframe:
    dataframe["weighted_heatMapMatrix"] = dataframe.apply(lambda x: weight_matrix(x['heatMapMatrix'], x['count']),axis=1)

In [None]:
## To save time while plotting, precompute the cumulative heatmap matrices for the periods of interest
cumulative_streetscape_pedestrian_all_days = culmulative_heat_map_data(culmulative_heat_map_data_generator(all_time[0], streetscape_pedestrian_data_all))
cumulative_outside_pedestrian_all_days = culmulative_heat_map_data(culmulative_heat_map_data_generator(all_time[0], outside_pedestrian_data_all))
cumulative_under_pedestrian_all_days = culmulative_heat_map_data(culmulative_heat_map_data_generator(all_time[0], underraincoat_pedestrian_data_all))
cumulative_streetscape_pedestrian_event_days = culmulative_heat_map_data(culmulative_heat_map_data_generator(event_days, streetscape_pedestrian_data_all))
cumulative_outside_pedestrian_event_days = culmulative_heat_map_data(culmulative_heat_map_data_generator(event_days, outside_pedestrian_data_all))
cumulative_under_pedestrian_event_days = culmulative_heat_map_data(culmulative_heat_map_data_generator(event_days, underraincoat_pedestrian_data_all))

In [None]:
cumulative_streetscape_pedestrian_saturdays = culmulative_heat_map_data(culmulative_heat_map_data_generator(saturdays, streetscape_pedestrian_data_all))
cumulative_streetscape_pedestrian_sundays = culmulative_heat_map_data(culmulative_heat_map_data_generator(sundays, streetscape_pedestrian_data_all))
cumulative_streetscape_pedestrian_weekdays = culmulative_heat_map_data(culmulative_heat_map_data_generator(weekdays, streetscape_pedestrian_data_all))

In [None]:
cumulative_outside_pedestrian_saturdays = culmulative_heat_map_data(culmulative_heat_map_data_generator(saturdays, outside_pedestrian_data_all))
cumulative_outside_pedestrian_sundays = culmulative_heat_map_data(culmulative_heat_map_data_generator(sundays, outside_pedestrian_data_all))
cumulative_outside_pedestrian_weekdays = culmulative_heat_map_data(culmulative_heat_map_data_generator(weekdays, outside_pedestrian_data_all))

In [None]:
cumulative_under_pedestrian_saturdays = culmulative_heat_map_data(culmulative_heat_map_data_generator(saturdays, underraincoat_pedestrian_data_all))
cumulative_under_pedestrian_sundays = culmulative_heat_map_data(culmulative_heat_map_data_generator(sundays, underraincoat_pedestrian_data_all))
cumulative_under_pedestrian_weekdays = culmulative_heat_map_data(culmulative_heat_map_data_generator(weekdays, underraincoat_pedestrian_data_all))

In [None]:
def get_quantiled_data(data, percentile, form):
    '''
    This is a helper function that keeps only the data points at or above the given percentile.
    If form is True, it returns [[x coordinates], [y coordinates], [densities]];
    otherwise it returns the filtered [x, y, density] triples.
    '''
    p = np.percentile([i[2] for i in data], percentile)
    filtered = list(filter(lambda x : x[2] >= p, data))
    if form:
        x = [i[0] for i in filtered] 
        y = [i[1] for i in filtered]
        density = [i[2] for i in filtered]
        return [x, y, density]
    else:
        return filtered
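
Because `get_quantiled_data` keeps every point whose density is greater than or equal to the q-th percentile, ties at the threshold can make the kept share larger than 100 − q percent, so the `[100-quantile, quantile]` pie chart in `plot_cumulative_heatmap` is an approximation of the true share. A small standard-library check of the kept fraction; `percentile_linear` mirrors the default linear interpolation of `np.percentile`, and both helper names are hypothetical.

```python
def percentile_linear(values, q):
    """Linear-interpolation percentile, mirroring numpy.percentile's default method."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def kept_fraction(densities, q):
    """Share of points that a '>= q-th percentile' filter keeps."""
    p = percentile_linear(densities, q)
    return sum(d >= p for d in densities) / len(densities)


# Distinct densities: a 50th-percentile filter keeps half the points
print(kept_fraction([1, 2, 3, 4], 50))        # 0.5
# Many tied low densities: the same filter keeps every point
print(kept_fraction([1, 1, 1, 1, 1, 5], 50))  # 1.0
```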

In [None]:
## This function generates the x, y coordinates above a percentile, used for clustering
def get_quantiled_data_coordinate(data, percentile):
    '''
    This is a helper function to get the coordinates of quantiled data.
    It returns the [x, y] coordinates of all data points whose density is at or above the percentile.
    '''
    p = np.percentile([i[2] for i in data], percentile)
    return [[i[0], i[1]] for i in data if i[2] >= p]
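
The `n` argument of `plot_cumulative_heatmap_points` (the "Number of Desired Points" slider) and the `KMeans` import suggest clustering the high-density coordinates into `n` desired points, but that step is not implemented in this section. A minimal sketch under that assumption: a density-weighted k-means (Lloyd's algorithm) in plain Python over `[x, y, density]` triples, such as those returned by `get_quantiled_data(..., form=False)`. The name `weighted_kmeans` and the deterministic farthest-point initialisation are our own choices, not the author's; `sklearn.cluster.KMeans(...).fit(X, sample_weight=w)` is a library alternative.

```python
def weighted_kmeans(points, n, iterations=50):
    """Group [x, y, density] triples into n density-weighted centers (Lloyd's algorithm)."""
    def sq_dist(p, c):
        return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2

    # Deterministic start: the densest point, then repeatedly the point with the
    # largest density * squared distance to its nearest existing center
    centers = [tuple(max(points, key=lambda p: p[2])[:2])]
    while len(centers) < n:
        far = max(points, key=lambda p: p[2] * min(sq_dist(p, c) for c in centers))
        centers.append(tuple(far[:2]))

    for _ in range(iterations):
        # Assignment step: every point joins its nearest center
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            groups[nearest].append(p)
        # Update step: move each center to the density-weighted mean of its group
        new_centers = []
        for center, group in zip(centers, groups):
            weight = sum(p[2] for p in group)
            if weight == 0:
                new_centers.append(center)  # leave empty or zero-weight clusters in place
            else:
                new_centers.append((sum(p[0] * p[2] for p in group) / weight,
                                    sum(p[1] * p[2] for p in group) / weight))
        if new_centers == centers:
            break  # the centers stopped moving
        centers = new_centers
    return centers


# Two toy hot spots, one near (0, 0) and a denser one near (101, 100)
points = [[0, 0, 1.0], [2, 0, 1.0], [0, 2, 1.0],
          [100, 100, 2.0], [102, 100, 2.0]]
print(weighted_kmeans(points, 2))  # about [(101.0, 100.0), (0.67, 0.67)]
```

Lloyd's algorithm only finds a local optimum, so with real heatmaps it may be worth comparing a few values of `n` visually.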

In [None]:
def cumulative_heatmap_data_helper(camera:str, plot:str, quantile:int, form):
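    '''
    This is a helper function to get the cumulative heatmap for the given camera and set of days (plot),
    filtered to the given percentile (quantile). It also returns [kept density, dropped density] for the pie charts.
    '''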
    data = []
    if camera == "Outside Camera":
        if plot == "Event Days":
            data = cumulative_outside_pedestrian_event_days
        if plot == "All the Days":
            data = cumulative_outside_pedestrian_all_days
        if plot == "Saturdays":
            data = cumulative_outside_pedestrian_saturdays
        if plot == "Sundays":
            data = cumulative_outside_pedestrian_sundays
        if plot == "Weekdays":
            data = cumulative_outside_pedestrian_weekdays
    if camera == "StreetScape Camera":
        if plot == "Event Days":
            data = cumulative_streetscape_pedestrian_event_days
        if plot == "All the Days":
            data = cumulative_streetscape_pedestrian_all_days
        if plot == "Saturdays":
            data = cumulative_streetscape_pedestrian_saturdays
        if plot == "Sundays":
            data = cumulative_streetscape_pedestrian_sundays
        if plot == "Weekdays":
            data = cumulative_streetscape_pedestrian_weekdays
    if camera == "UnderRainCoat Camera":
        if plot == "Event Days":
            data = cumulative_under_pedestrian_event_days
        if plot == "All the Days":
            data = cumulative_under_pedestrian_all_days
        if plot == "Saturdays":
            data = cumulative_under_pedestrian_saturdays
        if plot == "Sundays":
            data = cumulative_under_pedestrian_sundays
        if plot == "Weekdays":
            data = cumulative_under_pedestrian_weekdays
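    total_density = sum(i[2] for i in data)
    data = get_quantiled_data(data, quantile, form)
    # data[2] is the density list only when form is True; otherwise sum the third element of each triple
    quantiled_density = sum(data[2]) if form else sum(i[2] for i in data)
    return (data, [quantiled_density, total_density - quantiled_density])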
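
`cumulative_heatmap_data_helper` picks one of fifteen precomputed lists through nested `if` statements, and an unrecognized camera or day name silently leaves `data` as an empty list. A dictionary keyed by `(camera, plot)` expresses the same mapping in one place and rejects unknown names with a clear error. A minimal sketch; the table contents below are hypothetical placeholders, and in the notebook the values would be the `cumulative_*_pedestrian_*` lists computed above.

```python
# Hypothetical placeholder data; in the notebook the values would be the
# cumulative_<camera>_pedestrian_<days> lists computed earlier
CUMULATIVE_HEATMAPS = {
    ("Outside Camera", "Event Days"): [[10, 20, 0.5], [11, 20, 0.9]],
    ("StreetScape Camera", "Weekdays"): [[30, 40, 0.7]],
}

CAMERAS = ("Outside Camera", "StreetScape Camera", "UnderRainCoat Camera")
PLOTS = ("Event Days", "All the Days", "Saturdays", "Sundays", "Weekdays")


def lookup_cumulative_heatmap(camera, plot, table=CUMULATIVE_HEATMAPS):
    """Return the precomputed heatmap for (camera, plot); reject unknown names."""
    if camera not in CAMERAS:
        raise ValueError(f"unknown camera: {camera!r}")
    if plot not in PLOTS:
        raise ValueError(f"unknown set of days: {plot!r}")
    # A valid pair without precomputed data yields an empty heatmap
    return table.get((camera, plot), [])


print(lookup_cumulative_heatmap("Outside Camera", "Event Days"))  # [[10, 20, 0.5], [11, 20, 0.9]]
```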

In [None]:
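def check_circle(radius, center, coordinate):
    '''
    Return True if coordinate lies strictly inside the circle with the given radius and center.
    '''
    return ((center[0]- coordinate[0])**2 + (center[1]- coordinate[1])**2) < radius**2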
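
`check_circle` uses a strict `<`, so a point exactly on the circle's edge counts as outside. The density pie charts in `plot_cumulative_heatmap_points` compare the density inside the circle with the total density; a compact sketch of that share, using the same strict test (the name `circle_density_share` is hypothetical):

```python
def circle_density_share(points, center, radius):
    """Fraction of total density whose (x, y) lies strictly inside the circle."""
    total = sum(d for _, _, d in points)
    inside = sum(d for x, y, d in points
                 if (x - center[0]) ** 2 + (y - center[1]) ** 2 < radius ** 2)
    # Guard against an empty or all-zero heatmap
    return inside / total if total else 0.0


points = [[0, 0, 3.0], [5, 0, 1.0], [10, 0, 1.0]]
# (5, 0) sits exactly on the boundary of a radius-5 circle and is excluded
print(circle_density_share(points, (0, 0), 5))  # 0.6
```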

In [None]:
## Design notes: show the data, the desired lines (density; we recommend lowering the percentile to 50), the desired spots, and the proportion of density
## Layout of the interactive part:
# 1. HBox: ToggleButtons for the camera
# 2. HBox: Dropdown for plot1, widgets.IntSlider for quantile1
# 3. HBox: Dropdown for plot2, widgets.IntSlider for quantile2
def plot_cumulative_heatmap(camera, plot1, plot2, quantile1, quantile2):
    '''
    This is a function for plotting cumulative heatmaps on event days, all days, etc.
    plot1 is the set of days shown in the upper heatmap
    plot2 is the set of days shown in the lower heatmap
    camera is required to be "Outside Camera" or "StreetScape Camera" or "UnderRainCoat Camera"
    quantile1 and quantile2 are required to be integers between 0 and 100
    The function plots two cumulative heatmaps for the given days and camera, keeping only the data points above each quantile.

    It also plots four pie charts: for each heatmap, the proportion of the total density
    and the proportion of data points that are shown on the plot.
    '''
    
    # Setting Background Image
    if camera == "Outside Camera":
        image = mpimg.imread('outside_sandbox.png')
    elif camera == "StreetScape Camera":
        image = mpimg.imread('streetscape_sandbox.png')
    elif camera == "UnderRainCoat Camera":
        image = mpimg.imread('underraincoat_sandbox.png')
      
    # Get the data for two plots
    temp1 =  cumulative_heatmap_data_helper(camera, plot1, quantile1, True)
    temp2 = cumulative_heatmap_data_helper(camera, plot2, quantile2, True)
    
    # Divide them into x, y, density lists
    data1 = temp1[0]
    pie1 = temp1[1]
    data2 = temp2[0]
    pie2 = temp2[1]
    

    # Create an empty figure; the axes below are laid out with GridSpec
    fig = plt.figure(figsize=(16,10))
    gs = gridspec.GridSpec(8, 4)
    ax0 = plt.subplot(gs[0:4,0:2]) # upper heatmap
    ax1 = plt.subplot(gs[4:8,0:2]) # lower heatmap
    ax2 = plt.subplot(gs[0:2,2]) # 1st pie chart
    ax3 = plt.subplot(gs[2:4,2]) # 2nd pie chart
    ax4 = plt.subplot(gs[4:6,2]) # 3rd pie chart
    ax5 = plt.subplot(gs[6:8,2]) # 4th pie chart
    
    # Upper heatmap
    ax0.scatter(data1[0], data1[1], c = data1[2], cmap = plt.cm.YlGnBu_r, s = 0.1)
    ax0.imshow(image, aspect='auto')
    ax0.set_title("Heatmap for {0} on {1} ({2} percentile)".format(camera, plot1, quantile1))
    ax0.axis('off')
    
    # Lower heatmap
    ax1.scatter(data2[0], data2[1], c = data2[2], cmap = plt.cm.YlGnBu_r, s = 0.1)
    ax1.imshow(image, aspect='auto')
    ax1.set_title("Heatmap for {0} on {1} ({2} percentile)".format(camera, plot2, quantile2))
    ax1.axis('off')
    
    # Labels, title and colors for pie charts
    label1 = ['Blue Lines', 'Other Points']
    title1 = "Density Proportion for {0}".format(plot1)
    label2 =  ['Blue Points', 'Other Points']
    title2 =  "Proportion of Number of Points for {0}".format(plot1)
    label3 = ['Blue Lines', 'Other Points']
    title3 = "Density Proportion for {0}".format(plot2)
    label4 =  ['Blue Points', 'Other Points']
    title4 =  "Proportion of Number of Points for {0} ".format(plot2)
    colors=["lightskyblue", "lightcoral"]
    
    # Plot pie charts and set their titles and legends
    wedges, texts, autotexts = ax2.pie(pie1, autopct='%1.1f%%', colors = colors)
    ax2.legend(wedges, label1,  loc="center left",bbox_to_anchor=(1, 0, 0.5, 1))
    ax2.set_title(title1)
    
    wedges, texts, autotexts= ax3.pie([100-quantile1, quantile1], autopct='%1.1f%%', colors = colors)
    ax3.legend(wedges, label2,  loc="center left",bbox_to_anchor=(1, 0, 0.5, 1))
    ax3.set_title(title2)
    
    wedges, texts, autotexts= ax5.pie([100-quantile2, quantile2], autopct='%1.1f%%', colors = colors)
    ax5.legend(wedges,label4, loc="center left",bbox_to_anchor=(1, 0, 0.5, 1))
    ax5.set_title(title4)
    
    ax4.set_title(title3)
    wedges, texts, autotexts = ax4.pie(pie2,autopct='%1.1f%%', colors = colors)
    ax4.legend(wedges,label3, loc="center left",bbox_to_anchor=(1, 0, 0.5, 1))
    
    plt.tight_layout()
    plt.show()

In [None]:
## This part designs the widgets for 'plot_cumulative_heatmap'

## Make sure the descriptions are not truncated
style = {'description_width': 'initial'}

## Two dropdowns for selecting days
Plot1_Drop = widgets.Dropdown(options=["Event Days","All the Days", "Sundays", "Saturdays", "Weekdays"], description='Time (first plot): ', style = style)
Plot2_Drop = widgets.Dropdown(options=["Event Days","All the Days", "Sundays", "Saturdays", "Weekdays"], description='Time (second plot):', style = style)

## Two IntSliders for selecting the percentiles
Plot1_quantile  = widgets.IntSlider(min=0, max=100, step=1, value=50, description='Percentile (first plot): ',style = style)
Plot2_quantile = widgets.IntSlider(min=0, max=100, step=1, value=50, description='Percentile (second plot): ',style = style)

## Store them separately in horizontal boxes
Plot1_Hbox = widgets.HBox(children=[Plot1_Drop, Plot1_quantile], style = style)
Plot2_Hbox = widgets.HBox(children=[Plot2_Drop, Plot2_quantile], style = style)
Camera_Hbox = widgets.ToggleButtons(options=[('Outside',  "Outside Camera"), ('StreetScape',"StreetScape Camera") , ('Under RainCoat', "UnderRainCoat Camera")], description='Camera:')

## Attach each widget to the matching function argument
plot_cumulative_heatmap_widget = widgets.interactive(plot_cumulative_heatmap, {'manual': True},
                                             camera = Camera_Hbox,
                                             plot1 = Plot1_Drop,
                                             plot2 = Plot2_Drop,
                                             quantile1 = Plot1_quantile,
                                             quantile2 = Plot2_quantile)

## Get the button that runs the interaction
button1 = plot_cumulative_heatmap_widget.children[-2]

## Get the output area of the interaction
output1 = plot_cumulative_heatmap_widget.children[-1]

## Store them vertically
tab1 = VBox(children=[Camera_Hbox,
                      Plot1_Hbox,
                      Plot2_Hbox,
                      button1])

plot_cumulative_heatmap_widget = VBox(children=[tab1, output1])

In [None]:
plot_cumulative_heatmap_widget

In [None]:
from sklearn.cluster import MiniBatchKMeans, KMeans
import math

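def plot_cumulative_heatmap_points(camera, plot1, plot2, n, radius, coordinate_x, coordinate_y, show_scatter, show_circle):
    '''
    Plot the densest point (red dot) of the cumulative heatmaps for plot1 (upper) and plot2 (lower),
    using only data points at or above the 80th percentile, together with a user-placed circle.
    The pie charts compare each plot's share of this density inside the circle with the circle's share of the image area.
    n (the number of desired points) is not used yet; only the single densest point is marked.
    '''
    # Setting Coordinates
    coordinate = (coordinate_x, coordinate_y)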
    
    # Setting Background Image
    if camera == "Outside Camera":
        image = mpimg.imread('outside_sandbox.png')
    elif camera == "StreetScape Camera":
        image = mpimg.imread('streetscape_sandbox.png')
    elif camera == "UnderRainCoat Camera":
        image = mpimg.imread('underraincoat_sandbox.png')

    
    # Divide the figure into two heatmaps (left) and four pie charts (right)
    fig = plt.figure(figsize=(14,10))
    gs = gridspec.GridSpec(8, 4)
    ax0 = plt.subplot(gs[0:4,0:2]) # upper heatmap
    ax1 = plt.subplot(gs[4:8,0:2]) # lower heatmap
    ax2 = plt.subplot(gs[0:2,2]) # 1st pie chart
    ax3 = plt.subplot(gs[2:4,2])# 2nd pie chart
    ax4 = plt.subplot(gs[4:6,2])# 3rd pie chart
    ax5 = plt.subplot(gs[6:8,2])# 4th pie chart
    
    # Get the data for choosing the best points
    data1 =  cumulative_heatmap_data_helper(camera, plot1, 80, False)[0]
    data2 = cumulative_heatmap_data_helper(camera, plot2, 80, False)[0]

    m_data1 = [i[2] for i in data1]
    m_data2 = [i[2] for i in data2]
    
    # Find the points with maximum density
    ind1 = np.argmax(m_data1)
    ind2 = np.argmax(m_data2)
    
    x_data1 =  [i[0] for i in data1]
    y_data1 =  [i[1] for i in data1]
    x_data2 =  [i[0] for i in data2]
    y_data2 =  [i[1] for i in data2]
    
    x_number_list1 = [x_data1[ind1]]
    y_number_list1 = [y_data1[ind1]]
    x_number_list2 = [x_data2[ind2]]
    y_number_list2 = [y_data2[ind2]]
   
    # Show the scatter plot 
    if show_scatter:
        temp1 =  cumulative_heatmap_data_helper(camera, plot1, 0, True)
        temp2 = cumulative_heatmap_data_helper(camera, plot2, 0, True)
        data_scatter1 = temp1[0]
        data_scatter2 = temp2[0]
        ax0.scatter(data_scatter1[0], data_scatter1[1], c = data_scatter1[2], cmap = plt.cm.YlGnBu_r, s = 0.1)
        ax1.scatter(data_scatter2[0], data_scatter2[1], c = data_scatter2[2], cmap = plt.cm.YlGnBu_r, s = 0.1)
    
    # Calculate the area of the circle relative to the image
    area = math.pi*(radius*radius) # limitation: part of the circle may fall outside the image rectangle
    total_area = 500 * 650  # image size in pixels assumed by this calculation
    
    # Paint circle
    if show_circle:
        circle1 = plt.Circle(coordinate, radius, color='orange', fill=False,lw=5 )
        circle2 = plt.Circle(coordinate, radius, color='orange', fill=False, lw=5 )
        ax0.add_artist(circle1)
        ax1.add_artist(circle2)
    
    # Plot the potential desired points 
    ax0.scatter(x_number_list1, y_number_list1, color = 'r',s = 100)
    ax0.imshow(image, aspect='auto')
    ax0.set_title("{0} on {1} ".format(camera, plot1))
    
    ax1.scatter(x_number_list2, y_number_list2, color = 'r', s =100)
    ax1.imshow(image, aspect='auto')
    ax1.set_title("{0} on {1} ".format(camera, plot2))
    
    # Calculate the density inside the circles
    circle_density_1 = 0
    total_density_1 = 0
    for coordinates in data1:
        total_density_1 = total_density_1 + coordinates[2]
        if check_circle(radius,coordinate,(coordinates[0],coordinates[1])):
            circle_density_1 = circle_density_1 + coordinates[2]
    
    circle_density_2 = 0
    total_density_2 = 0
    for coordinates in data2:
        total_density_2 = total_density_2 + coordinates[2]
        if check_circle(radius,coordinate,(coordinates[0],coordinates[1])):
            circle_density_2 = circle_density_2 + coordinates[2]
            
    # label and colors for pie charts
    label = ["Inside the circle", "Outside the circle"]
    colors=["orange", "lightskyblue"]

    
    # pie chart data
    pie1 = [circle_density_1, total_density_1 - circle_density_1]
    pie2 = [circle_density_2, total_density_2 - circle_density_2]
    pie3 = [area, total_area-area]
    
    # Deal with pie charts
    wedges, texts, autotexts = ax2.pie(pie1, autopct='%1.1f%%', colors = colors)
    ax2.legend(wedges, label,loc="center left",bbox_to_anchor=(1, 0, 0.5, 1))
    ax2.set_title("Density of circle for {0}".format(plot1))
    
    wedges, texts, autotexts = ax3.pie(pie3, autopct='%1.1f%%', colors = colors)
    ax3.legend(wedges, label,loc="center left",bbox_to_anchor=(1, 0, 0.5, 1))
    ax3.set_title("Proportion of Area")
    
    wedges, texts, autotexts = ax5.pie(pie3, autopct='%1.1f%%', colors = colors)
    ax5.legend(wedges,label, loc="center left",bbox_to_anchor=(1, 0, 0.5, 1))
    ax5.set_title("Proportion of Area")
    
    ax4.set_title("Density of circle for {0}".format(plot2))
    wedges, texts, autotexts = ax4.pie(pie2, autopct='%1.1f%%', colors = colors)
    ax4.legend(wedges,label, loc="center left",bbox_to_anchor=(1, 0, 0.5, 1))
    
    # Print potential Desired Points
    print("The coordinates of potential desired points for the above plot are {0}".format((x_number_list1[0], y_number_list1[0])))
    print("The coordinates of potential desired points for the below plot are {0}".format((x_number_list2[0], y_number_list2[0])))
    plt.tight_layout()
    plt.show()

In [None]:
## This part designs the widgets for 'plot_cumulative_heatmap_points'

## This is a ToggleButton widget for selecting camera
Camera_Hbox = widgets.ToggleButtons(
    options=[('Outside',  "Outside Camera"), ('StreetScape',"StreetScape Camera") , ('Under RainCoat', "UnderRainCoat Camera")],
    description='Camera:',
)

## Two dropdown widgets for choosing the days
Plot1_Drop_2 = widgets.Dropdown(options=["Event Days","All the Days", "Sundays", "Saturdays", "Weekdays"], description='Time (first plot): ', style = style)
Plot2_Drop_2 = widgets.Dropdown(options=["Event Days","All the Days", "Sundays", "Saturdays", "Weekdays"], description='Time (second plot):', style = style)

## An IntSlider widget for choosing the number of desired points
n_cluster_slider = widgets.IntSlider(min=1, max=5, step=1, value=1, description='Number of Desired Points:', style = style)

## Three IntSlider widgets for the size and position of the circle
radius_slider = widgets.IntSlider(min=1, max=50, step=1, value=10, description='Radius of the circle:', style = style)
x_coordinate_slider = widgets.IntSlider(min=30, max=600, step=1, value=300, description='x coordinate of the center of the circle:',style = style, 
                                        layout=widgets.Layout(width='50%', height='30px'))
y_coordinate_slider = widgets.IntSlider(min=30, max=450, step=1, value=200, description='y coordinate of the center of the circle:', style = style,
                                       layout=widgets.Layout(width='50%', height='30px'))

## Two checkboxes that control what is displayed on the plots
show_scatter_box = widgets.Checkbox(value=False, description='Show Desired Lines(Heatmap)', disabled=False, indent=False, style = style)
show_circle_box = widgets.Checkbox(value=True, description='Show Circle', disabled=False, indent=False, style = style)

## Attach each widget to the matching function argument
plot_cumulative_heatmap_points_widget = widgets.interactive(plot_cumulative_heatmap_points, {'manual': True},
                                         camera = Camera_Hbox, 
                                         plot1 = Plot1_Drop_2, 
                                         plot2 = Plot2_Drop_2, 
                                         n = n_cluster_slider,
                                         radius = radius_slider,
                                         coordinate_x = x_coordinate_slider,
                                         coordinate_y = y_coordinate_slider,
                                         show_scatter = show_scatter_box,
                                         show_circle = show_circle_box)

## Rearrange the positions of the widgets
coordinate_Hbox = widgets.HBox(children=[x_coordinate_slider, y_coordinate_slider])
Plot1_Hbox_1 = widgets.HBox(children=[Plot1_Drop_2])
Plot1_Hbox_2 = widgets.HBox(children=[Plot2_Drop_2])
Show= widgets.HBox(children=[show_scatter_box, show_circle_box])
vbox1 = VBox(children=[Camera_Hbox, Show, Plot1_Hbox_1, Plot1_Hbox_2])
vbox2 = VBox(children=[coordinate_Hbox, radius_slider, n_cluster_slider])

tab = widgets.Tab(children=[vbox1, vbox2])
tab.set_title(0, 'Plot')
tab.set_title(1, 'Desired Point')

button2 = plot_cumulative_heatmap_points_widget.children[-2]
output = plot_cumulative_heatmap_points_widget.children[-1]
tab2 = VBox(children=[tab, button2])
plot_cumulative_heatmap_points_widget_rearrange = VBox(children = [tab2, output]) 
plot_cumulative_heatmap_points_wieget =plot_cumulative_heatmap_points_widget_rearrange

Main idea: find *when* (time) and *where* (space) pedestrians and cars conflict.

In [None]:
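def get_hourly_dwell(func, ID, interval, startTime, endTime):
    '''
    Query the dwell-time distributions of pedestrians and cars between startTime and endTime at the given interval (e.g. '1h', '1d').
    func is either 'feedDwellTimeDistribution' (ID is a device serial number) or 'zoneDwellTimeDistribution' (ID is a zone id).
    Returns a DataFrame with one row per time bucket and object class.
    '''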
    startTime_str = startTime.strftime("%Y-%m-%dT%H:%M:%S")
    endTime_str = endTime.strftime("%Y-%m-%dT%H:%M:%S")
    
    if func == 'feedDwellTimeDistribution':
        arg = 'serialnos: "{0}"'.format(ID)
    else:
        arg = 'zoneIds: {0}'.format(ID)
        
    query = """
    query {{
        {0}(
        {1},
        startTime: "{2}",
        endTime: "{3}",
        timezone: "America/New_York",
        objClasses: ["pedestrian", "car"],
        interval: "{4}"
        ){{
        edges {{
          node {{
            time
            objClass
            pct100
            pct75
            pct50
            pct25
            mean
            count
          }}
        }}
      }}
    }}
    """.format(func, arg, startTime_str, endTime_str,  interval)

    dwell = requests.post(url, json={'query': query}, 
                           headers = {'Authorization':token})
    
    df = pd.DataFrame([x['node'] for x in dwell.json()['data'][func]['edges']])
    if func == 'feedDwellTimeDistribution':
        df['device'] = ID
    else:
        df['zone'] = ID
    df['time'] = [datetime.strptime(i[:-6], "%Y-%m-%dT%H:%M:%S") for i in list(df['time'])]
    
    return df
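
`get_hourly_dwell` drops the last six characters of each `time` string (`i[:-6]`), which assumes the API returns ISO-8601 timestamps ending in a `±HH:MM` UTC offset. Since the query passes `timezone: "America/New_York"`, the stripped value is presumably the local wall-clock time. A small standard-library sketch of that assumption, contrasted with `datetime.fromisoformat`, which (from Python 3.7) parses the offset instead of discarding it; the sample string is made up.

```python
from datetime import datetime, timedelta

# Hypothetical API timestamp in the assumed "local time + UTC offset" format
raw = "2020-06-01T07:00:00-04:00"

# What get_hourly_dwell does: drop the 6-character offset, keep naive local time
naive = datetime.strptime(raw[:-6], "%Y-%m-%dT%H:%M:%S")

# Alternative (Python 3.7+): parse the offset and keep a timezone-aware value
aware = datetime.fromisoformat(raw)

print(naive)                                     # 2020-06-01 07:00:00
print(aware.utcoffset() == timedelta(hours=-4))  # True
print(aware.replace(tzinfo=None) == naive)       # True
```

If the aware form is adopted, it should be used consistently: comparing a naive and an aware datetime with `<` raises `TypeError`, and `==` simply returns `False`.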

In [None]:
## Get the days on which we investigate conflict zones: the 20 days with the highest car counts
dwell_data_allobj = get_hourly_dwell('feedDwellTimeDistribution','SWLSANDBOX2' , '1d', START_DATE, END_DATE)
dwell_data_car = dwell_data_allobj[dwell_data_allobj['objClass'] == 'car'].sort_values(by='count', ascending=False)
high_traffic_day_list = list(dwell_data_car['time'].head(20))

In [None]:
## Then, we get the hourly dwell data (07:00 to 23:00) of these days
time_delta_hour = relativedelta(hours = +1)
dwell_data_traffic_day = pd.concat([get_hourly_dwell('feedDwellTimeDistribution','SWLSANDBOX2', '1h', x+7*time_delta_hour, x+23*time_delta_hour) 
                                    for x in high_traffic_day_list])
dwell_data_traffic_day =  dwell_data_traffic_day.reset_index(drop = True)

In [None]:
import time
## Get hourly car heatmap data for these days (07:00 to 23:00)
time_delta_hour = relativedelta(hours = +1)
column_names = ["startTime", "endTime", "heatMapMatrix", "objClass"]
high_traffic_day_car_heatmap = pd.DataFrame(columns = column_names)
for day in high_traffic_day_list:
    start_hour = day + time_delta_hour*7
    end_hour = day + time_delta_hour*23
    interval = generate_consecutive_times(start_hour, end_hour, time_delta_hour)
    temp = daily_heatmap_data(get_heatmap_data('SWLSANDBOX2', 'car', interval[0], interval[1]))
    temp['objClass'] = 'car'
    high_traffic_day_car_heatmap = pd.concat([temp, high_traffic_day_car_heatmap])
    time.sleep(5)  # pause between API requests

In [None]:
## Get hourly pedestrian heatmap data for these days (07:00 to 23:00)
high_traffic_day_pedestrian_heatmap = pd.DataFrame(columns = column_names)
for day in high_traffic_day_list:
    start_hour = day + time_delta_hour*7
    end_hour = day + time_delta_hour*23
    interval = generate_consecutive_times(start_hour, end_hour, time_delta_hour)
    temp = daily_heatmap_data(get_heatmap_data('SWLSANDBOX2', 'pedestrian', interval[0], interval[1]))
    temp['objClass'] = 'pedestrian'
    high_traffic_day_pedestrian_heatmap = pd.concat([temp, high_traffic_day_pedestrian_heatmap])
    time.sleep(10)

In [None]:
## Merge the car and pedestrian heatmaps, then join them with the hourly dwell counts
high_traffic_day_heatmap = pd.concat([high_traffic_day_car_heatmap, high_traffic_day_pedestrian_heatmap])
high_traffic_day_heatmap = high_traffic_day_heatmap.drop("index", axis =1).reset_index(drop= True)
high_traffic_day_merged = pd.merge(high_traffic_day_heatmap, dwell_data_traffic_day, 
                                   left_on = ["startTime", "objClass"], 
                                   right_on = ["time", "objClass"]).drop(["endTime",  "startTime", "pct75", "pct25", "device", "pct100", "mean","pct50"],axis =1)

In [None]:
## Weight each hourly heatmap matrix by that hour's count
high_traffic_day_merged["WeightedheatMapMatrix"] = high_traffic_day_merged.apply(lambda x: weight_matrix(x['heatMapMatrix'], x['count']),axis=1)

In [None]:
## Separate the data into two dataframes by objClass
high_traffic_day_car_merged = high_traffic_day_merged[high_traffic_day_merged['objClass'] == 'car'].drop(["heatMapMatrix", "objClass"], axis = 1)
high_traffic_day_car_merged.rename(columns={"WeightedheatMapMatrix": "carWeightedheatMapMatrix", "count": "carCount"}, inplace = True)
high_traffic_day_pedestrian_merged = high_traffic_day_merged[high_traffic_day_merged['objClass'] == 'pedestrian'].drop(["heatMapMatrix", "objClass"], axis = 1)
high_traffic_day_pedestrian_merged.rename(columns={"WeightedheatMapMatrix": "pedestrianWeightedheatMapMatrix", "count": "pedestrianCount"}, inplace = True)

In [None]:
high_traffic_day_hourly_heatmap = pd.merge(high_traffic_day_car_merged, high_traffic_day_pedestrian_merged, how = "outer", on ="time")

In [None]:
## To do: calculate the conflicting index of each hour
## For each hour, create two lists: the first is the weighted heatmap matrix for pedestrians, the second is the weighted heatmap matrix for cars.
## Approach: draw a circle of radius 50 around each non-pedestrian object and check the pedestrian density inside it.
## Sum the pedestrian densities inside the circle, weighted by inverse distance (distances below 1 count as 1); this sum is the conflicting index.
## Open question: why choose a radius of 50?
## Calculate the conflicting index for each hour and create a new dataframe for it.
## The user can choose between the original and the scaled index.
def find_distance(c1, c2):
    return ((c1[0] - c2[0])**2 + (c1[1] - c2[1])**2)**(1/2)


def calculate_conflicting_index(non_pd, ped):
    '''
    Calculate the conflicting index of each non-pedestrian point under the camera.

    non_pd: weighted heat map matrix of non-pedestrian objects (cars), a list of [x, y, density] rows
    ped: weighted heat map matrix of pedestrians, in the same format
    (named `ped` rather than `pd` so it does not shadow pandas)

    For each non-pedestrian point, every pedestrian point closer than 50 contributes
    its density divided by the distance (distances below 1 are treated as 1), so the
    index grows with density and shrinks with distance:
        index = density of non_pd point * sum(density of pedestrian point / distance)
    Returns a list of [x, y, index] rows, or [] when either input is missing
    (the NaN float left by the outer merge).
    For more information, see the project documentation.
    '''
    # NaN from the outer merge means no objects of that class in this time period
    if isinstance(non_pd, float) or isinstance(ped, float):
        return []
    conflicting_index = []
    for coordinate in non_pd:
        index = 0
        for coordinate_p in ped:
            distance = find_distance(coordinate[0:2], coordinate_p[0:2])
            if distance < 1:
                # Treat very close points as distance 1 to avoid dividing by a near-zero distance
                index = index + coordinate[2] * coordinate_p[2]
            elif distance < 50:
                index = index + coordinate[2] * coordinate_p[2] / distance
        conflicting_index.append([coordinate[0], coordinate[1], index])
    return conflicting_index
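
Written as a formula, the loop above computes, for each car point $i$ with weighted density $w_i$ and each pedestrian point $j$ with weighted density $v_j$ at Euclidean distance $d_{ij}$ (the symbols $w_i$, $v_j$, $d_{ij}$, $C_i$ are introduced here for illustration):

$$
C_i = w_i \sum_{j \,:\, d_{ij} < 50} \frac{v_j}{\max(d_{ij},\, 1)}, \qquad d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}
$$

The `distance < 1` branch is the same as clamping the denominator at 1, which keeps a pedestrian point lying on or right next to the car point from dividing by a near-zero distance. The total for a time period, computed by `sum_conflicting_index`, is $C = \sum_i C_i$.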

In [None]:
def sum_conflicting_index(lis):
    '''
    Helper that sums the conflicting index (third element) over all [x, y, index] rows.
    '''
    return sum(point[2] for point in lis)
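
The double loop in `calculate_conflicting_index` runs in pure Python over every car/pedestrian pair. A minimal sketch of an equivalent NumPy version that also returns the total `sum_conflicting_index` would produce, assuming every row of both matrices is exactly `[x, y, weighted_density]`. The name `conflicting_index_vectorized` and the `radius` parameter are hypothetical, not part of the notebook:

```python
import numpy as np


def conflicting_index_vectorized(car_matrix, ped_matrix, radius=50.0):
    """Broadcasted sketch of calculate_conflicting_index plus sum_conflicting_index.

    Assumes each matrix is a list of [x, y, weighted_density] rows, or a float
    (the NaN left by the outer merge) when that object class is missing.
    Returns (per_point, total): an (n_cars, 3) array of [x, y, index] rows and
    the summed index for the time period.
    """
    # A float is the NaN placeholder: no conflict can be computed
    if isinstance(car_matrix, float) or isinstance(ped_matrix, float):
        return np.empty((0, 3)), 0.0
    # reshape(-1, 3) also turns an empty list into a (0, 3) array
    cars = np.asarray(car_matrix, dtype=float).reshape(-1, 3)
    peds = np.asarray(ped_matrix, dtype=float).reshape(-1, 3)
    # Pairwise Euclidean distances, shape (n_cars, n_peds)
    diff = cars[:, None, :2] - peds[None, :, :2]
    dist = np.hypot(diff[..., 0], diff[..., 1])
    # Pedestrian density / distance, with distances below 1 clamped to 1
    # (the loop's distance < 1 branch) and pairs outside the radius dropped
    contrib = np.where(dist < radius, peds[:, 2] / np.maximum(dist, 1.0), 0.0)
    index = cars[:, 2] * contrib.sum(axis=1)
    per_point = np.column_stack([cars[:, 0], cars[:, 1], index])
    return per_point, float(index.sum())


per_point, total = conflicting_index_vectorized(
    [[0, 0, 2.0]], [[3, 4, 10.0], [0, 0.5, 1.0], [100, 0, 5.0]]
)
print(per_point, total)
```

Here the pedestrian at distance 5 contributes 10 / 5 = 2, the one at distance 0.5 contributes 1 / 1 = 1, and the one at distance 100 is outside the radius, so the car point's index is 2 × 3 = 6. Memory grows with n_cars × n_peds, so very large hourly matrices may need to be processed in chunks.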

In [None]:
## Calculate conflictIndexMatrix for every time period
high_traffic_day_hourly_heatmap['conflictIndexMatrix'] = high_traffic_day_hourly_heatmap.apply(
    lambda x: calculate_conflicting_index(x['carWeightedheatMapMatrix'],
                                          x['pedestrianWeightedheatMapMatrix']),
    axis=1)

In [None]:
## Sum up the total conflictIndex in a given time period
high_traffic_day_hourly_heatmap["conflict_index_total"] = [sum_conflicting_index(i) for i in list(high_traffic_day_hourly_heatmap['conflictIndexMatrix'])]

In [None]:
## Extract the hour of each record so the data can be grouped by hour
high_traffic_day_hourly_heatmap['hour'] = [i.hour for i in high_traffic_day_hourly_heatmap["time"]]
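
The `hour` column lets later cells aggregate across days by hour of day. A minimal sketch of the same step and of the per-hour aggregation on toy data, assuming the `time` column has `datetime64` dtype (if it is stored as an `object` column, `.dt` may not be available and the list comprehension above is the safer choice). The frame `df` and its values are hypothetical:

```python
import pandas as pd

# Hypothetical stand-in for high_traffic_day_hourly_heatmap
df = pd.DataFrame({
    "time": pd.to_datetime(["2019-10-01 08:00", "2019-10-02 08:00", "2019-10-01 17:00"]),
    "conflict_index_total": [4.0, 6.0, 1.5],
    "carCount": [10, 20, 3],
})

# Vectorized equivalent of [t.hour for t in df["time"]] for a datetime64 column
df["hour"] = df["time"].dt.hour

# Select the columns first, then aggregate, so only numeric columns are reduced
by_hour = df.groupby("hour")
hourly_conflict = by_hour["conflict_index_total"].sum()  # cumulative index per hour of day
hourly_cars = by_hour["carCount"].agg("median")          # also works with 'mean' or 'max'
print(hourly_conflict)
print(hourly_cars)
```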

In [None]:
## To do (Visualization 2): heatmaps of the daily and the hourly cumulative density for this month,
## plus heatmaps of all potential conflicting points for each hour and for the whole month

In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

def lineplot_hour_conflicting(objclass, metric):
    '''
    Plot the cumulative conflicting index per hour (left y-axis) together with
    the hourly car or pedestrian count aggregated by `metric` (right y-axis).
    '''
    # Create figure with a secondary y-axis
    fig = make_subplots(specs=[[{"secondary_y": True}]])

    # Column holding the count of the selected object class, e.g. "carCount"
    obj = objclass + "Count"
    by_hour = high_traffic_day_hourly_heatmap.groupby('hour')

    # Select the column before aggregating: aggregating the whole frame would also
    # try to reduce the datetime and list-valued heat map columns and fail
    conflict_total = by_hour['conflict_index_total'].sum()
    count_stat = by_hour[obj].agg(metric)  # metric is 'mean', 'median' or 'max'

    # Add traces
    fig.add_trace(
        go.Scatter(x=list(conflict_total.index), y=list(conflict_total),
                   name="Cumulative Conflicting Index"),
        secondary_y=False,
    )
    fig.add_trace(
        go.Scatter(x=list(count_stat.index), y=list(count_stat),
                   name="{0} Count ({1})".format(objclass, metric)),
        secondary_y=True,
    )

    # Add figure title
    fig.update_layout(title_text="Cumulative Conflicting Index by Hour")

    # Set x-axis title
    fig.update_xaxes(title_text="Hour")

    # Set y-axes titles
    fig.update_yaxes(title_text="<b>Cumulative Conflicting Index</b>", secondary_y=False)
    fig.update_yaxes(title_text="<b>{0} Count ({1})</b>".format(objclass, metric), secondary_y=True)

    fig.show()
    

In [None]:
_ = interact(lineplot_hour_conflicting, metric=widgets.RadioButtons(options=['median', 'mean', "max"]),
             objclass=widgets.RadioButtons(options=['car', 'pedestrian']))

In [None]:
## Plot the conflicting index of a given hour over the camera view to show where car/pedestrian conflicts concentrate
def heatmap_hour_conflicting(hour):
    # Background: a still frame of the camera view
    image = mpimg.imread('underraincoat_sandbox.png')
    fig, ax = plt.subplots(figsize=(14, 10))
    ax.imshow(image, aspect='auto')
    # Combine the per-point conflicting indices of every record in this hour
    data = high_traffic_day_hourly_heatmap[high_traffic_day_hourly_heatmap['hour'] == hour]
    scatter = culmulative_heat_map_data(data['conflictIndexMatrix'].tolist())
    x = [i[0] for i in scatter]
    y = [i[1] for i in scatter]
    density = [i[2] for i in scatter]
    points = ax.scatter(x, y, c=density, cmap=plt.cm.YlGnBu_r)
    fig.colorbar(points, ax=ax, label="Conflicting index")
    ax.axis('off')
    ax.set_title("Conflicting heatmap between pedestrians and cars at {0}:00".format(hour))
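
`culmulative_heat_map_data` is defined earlier in the notebook and is not reproduced in this section. For readers following this part alone, a minimal sketch of what such a helper might do, assuming it merges the per-record `[x, y, value]` lists of one hour by summing the values that fall on the same position. The name `accumulate_points` is hypothetical, and the notebook's own helper may behave differently:

```python
from collections import defaultdict


def accumulate_points(matrices):
    """Hypothetical stand-in for culmulative_heat_map_data.

    Assumes `matrices` is a list whose elements are lists of [x, y, value]
    rows, and sums the values that fall on the same (x, y) position.
    """
    totals = defaultdict(float)
    for matrix in matrices:
        for x, y, value in matrix:
            totals[(x, y)] += value
    # Back to the [x, y, value] row format the plotting code expects
    return [[x, y, total] for (x, y), total in totals.items()]


points = accumulate_points([[[1, 2, 1.0], [3, 4, 2.0]], [[1, 2, 0.5]], []])
print(points)  # [[1, 2, 1.5], [3, 4, 2.0]]
```

Summing is only one plausible choice; if the real helper averages or keeps the maximum instead, the colour scale of the hourly plot changes accordingly.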

In [None]:
_ = interact(heatmap_hour_conflicting,
             hour=widgets.IntSlider(min=7, max=22, step=1, value=7, description='Time (hour):'))