# Documentation

# 0. Context and Preview

This preview section introduces the purpose of the webpage and the scope of analysis.

The webpage focuses on exploratory analysis and mainly consists of interactive widgets. It aims to give users a tool to understand and explore data collected at the [307 area](https://www.azuremagazine.com/article/sidewalk-labs-307/) of [Sidewalk Labs](https://www.sidewalklabs.com/).

The 307 area consists of three `devices` (or `areas`): Streetscape, Under Raincoat and Outside. This section of the webpage displays the images of the areas as well as the predefined behaviour `zones`.

The data used here was collected by three Numina sensors, one in each of the three areas, and accessed through the API of Numina. The data has been deidentified; it includes the count, dwell time and the location of objects detected in the 307 area, with a time scope from February 2019 to January 2020. 

# 1. Pedestrian Count and Dwell Time

The first main section of the webpage allows the user to compare patterns in pedestrian count and dwell time across different device areas or behaviour zones. This section lets the user observe where pedestrians tend to pass through or to linger. Such observations could help planners decide where and when to focus their attention.

All widgets in this section allow the user to compare the data across both devices and zones. Every device / zone has been assigned a unique colour; zones in the same device area have been assigned colours of the same hue.

We will refer to the following list as the `full metric list`: count, mean dwell time, median dwell time, max dwell time and total dwell time.

### 1.1 Long-Term Trend

This line plot uses daily data on count and dwell time. It allows the user to choose from the full metric list and compare the trend of the data over the timeframe. 

This very first plot aims to give the user a general impression of the data in terms of different metrics. It shows the range of the values and highlights the peaks.

We suggest using this plot for a quick overview rather than for in-depth analysis.

### 1.2 Peak Days Summary

This interactive dataframe allows the user to rank the daily data by any of the metrics in the full metric list. Since the data used to create this webpage is limited, we suggest that the user use this widget to pick out the days with the highest values and then research the reasons behind those large values.

### 1.3 Overall Distribution

This box plot also uses daily data and gives the basic statistics of the metrics in the full metric list. In addition to basic quantiles and outliers, the box plot includes dashed lines for mean values. In contrast to the previous time series chart, this plot focuses on the overall distribution of the values rather than on their order in time. This is useful for comparing the popularity of device areas / zones numerically and for identifying the location in which the outlier values occurred.

### 1.4 Grouping by Time

As we are working with time series data, it is natural to examine the effects of different time groups (e.g. hours, weekdays, months). This grouped bar plot uses either daily or hourly data depending on the "time group" specified. If the user groups the data by "hour", hourly data is used; for "dayofweek" or "month", daily data is used.

For any specified metric in the full metric list, the plot displays the median value of that metric in each hour/weekday/month. We chose to display the median value because we would like to emphasize a general pattern as time changes and we would like to minimize the effects of outlier values. 
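The grouping described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code behind the webpage; the function name `median_by_time_group`, the record layout and the sample values are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def median_by_time_group(records, time_group):
    """Return the median value per time group.

    records: iterable of (timestamp, value) pairs (hypothetical layout).
    time_group: "hour", "dayofweek" or "month".
    """
    key_funcs = {
        "hour": lambda ts: ts.hour,
        "dayofweek": lambda ts: ts.weekday(),  # Monday is 0
        "month": lambda ts: ts.month,
    }
    key = key_funcs[time_group]
    groups = defaultdict(list)
    for ts, value in records:
        groups[key(ts)].append(value)
    return {k: median(v) for k, v in sorted(groups.items())}


# Made-up daily counts; one Monday has an unusually high value.
daily_counts = [
    (datetime(2019, 3, 4), 120),   # Monday
    (datetime(2019, 3, 11), 130),  # Monday
    (datetime(2019, 3, 18), 900),  # Monday (outlier)
    (datetime(2019, 3, 5), 80),    # Tuesday
    (datetime(2019, 3, 12), 100),  # Tuesday
]
medians = median_by_time_group(daily_counts, "dayofweek")
print(medians)
```

In this made-up sample the Monday median stays at 130 even though one Monday has a count of 900, which is the outlier resistance that motivates the choice of the median.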

### 1.5 Averaged Proportion

Similar to the previous plot, this stacked bar plot also uses either daily or hourly data depending on the "time group". 

In contrast to the previous plot, we only allow the user to choose between count and total dwell time. To generate the plot, we first compute the proportion that each area or zone contributes to the total count or total dwell time during each hour or day. Then, for each area/zone, we group the data by the specified "time group" and take the average of the proportion values.

Note that we decided to use the "mean" proportion rather than the "median" proportion, since it is reasonable to assume that even on a day with an abnormally high count / total dwell time, the values are still informative in terms of proportions. The mean proportion values also always sum to one.

In fact, we believe that days with a higher count could be more informative about where people tend to be than regular days. Therefore, we offer the option of setting a lower bound on count so that only a subset of the hours/days is considered. The user can then compare the importance of each device area / zone on crowded days against regular days.
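The averaged proportion with an optional lower bound can be sketched as below. This is an illustrative sketch only: the function name `mean_proportions`, the parameter `min_total`, the row layout and the numbers are assumptions, not the actual implementation of the widget.

```python
from collections import defaultdict


def mean_proportions(rows, min_total=0):
    """Average each zone's share of the total within every time group.

    rows: list of (time_group_key, {zone: value}) pairs, one per hour or day
          (hypothetical layout).
    min_total: lower bound on the total; rows below it are skipped.
    """
    sums = defaultdict(lambda: defaultdict(float))
    n_rows = defaultdict(int)
    for key, values in rows:
        total = sum(values.values())
        if total <= 0 or total < min_total:
            continue  # nothing to split, or not crowded enough
        n_rows[key] += 1
        for zone, value in values.items():
            sums[key][zone] += value / total
    return {
        key: {zone: s / n_rows[key] for zone, s in zone_sums.items()}
        for key, zone_sums in sums.items()
    }


# Made-up daily counts for two areas.
rows = [
    ("Mon", {"Streetscape": 30, "Outside": 10}),
    ("Mon", {"Streetscape": 100, "Outside": 100}),
    ("Tue", {"Streetscape": 5, "Outside": 15}),
]
all_days = mean_proportions(rows)
crowded_days = mean_proportions(rows, min_total=100)
print(all_days)
print(crowded_days)
```

Because every row's proportions sum to one, their average within a time group also sums to one, which is why the stacked bars always have the same height.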

### 1.6 Events and User Specified Time Interval 

The purpose of this interactive time series plot is to allow the user to compare patterns in pedestrian count and dwell time across different device areas or behaviour zones for events or for a user-specified time interval. The user first chooses which `metric` to visualize and whether to group by device areas or behaviour zones (`groupby`). The user can then choose between information about events (hourly data) and a time interval of their choice (daily data). Note that for a user-specified time interval, the user must provide the `starting_time` and the `ending_time` in the format 'Y-m-dT00:00:00'.
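As a rough illustration of the time interval input, the sketch below parses `starting_time` and `ending_time` and keeps the daily records that fall in between. It assumes that 'Y-m-dT00:00:00' corresponds to the `strptime` pattern `%Y-%m-%dT%H:%M:%S` and that both ends of the interval are inclusive; the helper `filter_interval` and the sample values are hypothetical.

```python
from datetime import datetime

# Assumed strptime equivalent of the 'Y-m-dT00:00:00' format.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def filter_interval(records, starting_time, ending_time):
    """Keep (timestamp, value) records with start <= timestamp <= end."""
    start = datetime.strptime(starting_time, TIME_FORMAT)
    end = datetime.strptime(ending_time, TIME_FORMAT)
    if start > end:
        raise ValueError("starting_time must not be later than ending_time")
    return [(ts, value) for ts, value in records if start <= ts <= end]


# Made-up daily counts.
daily_counts = [
    (datetime(2019, 6, 28), 310),
    (datetime(2019, 6, 29), 870),
    (datetime(2019, 6, 30), 420),
]
selected = filter_interval(
    daily_counts, "2019-06-29T00:00:00", "2019-06-30T00:00:00"
)
print(selected)
```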

# 2. Desired Lines and Spots
The second main section of the webpage allows the user to investigate two questions:
1. Desired lines and spots of pedestrians on event days
2. Difference of desired lines and spots on event days and other days.<br>

Each plot in this section focuses on one of them.

Among visualizations, a **heatmap** is well suited to representing how objects move about under the camera. To answer the questions above, we use heatmaps to represent pedestrians' motion on event days. A color bar is included to show the audience the density at different points of the heatmap.<br>

The color of a point on the heatmap represents the density at that point (see the color bar). In this part, we let the audience explore: if there is a spot with mostly high density points, we say that spot is a desired spot for pedestrians' movement. Similarly, if there is a continuous line of mostly high density points on the heatmap, we say that line is a desired line, that is, a path pedestrians tend to take. If the audience finds the same spots or lines across different event time periods, they may conclude that people tend to dwell there or choose those paths.<br>

We can also numerically calculate the total density inside a zone to find out whether that region is a desired zone for pedestrians.

### Definitions and Methods in This Part
Please read through this before moving on to the plots.
- **Cumulative Heatmap**
We introduce this definition because of a limitation in the data collection. The raw heatmap matrix we get from the API is scaled based on the density of its time interval, so for any non-empty heatmap matrix the highest density component is always 1. Therefore, if we want to combine the density heatmaps of several discrete time periods (such as all event days or all Sundays), we cannot simply sum them up.<br>
(For example, given $[[1, 2, 1], [1, 3, 0.5]]$ and $[[1, 2, 0.1], [1, 3, 0.3]]$, where each row is $[x, y, \text{density}]$, the cumulation should not simply be $[[1, 2, 1.1], [1, 3, 0.8]]$ with $1.1 = 1 + 0.1$ and $0.8 = 0.5 + 0.3$.)<br>
However, when we compare two sets of discrete time periods, it is not meaningful to compare individual days separately.<br>
(For example, as we found in the previous part, the heatmap of a single event day cannot represent the trend for all event days.)<br>
It is therefore better to have a more accurate way to approximate the heatmap of a set of discrete time periods. To achieve this, we need to weight the heatmaps of different time periods differently. From the construction of the heatmap provided by Sidewalk Labs, we find that the density at a point is essentially the "dwell count" at that point, scaled to lie between 0 and 1.<br>
Therefore, if we weight the density of each time period by the dwell count of that time period in the given region, we get a better approximation when summing up the heatmap matrices of discrete time periods.<br>
(For example, since 6.29 has the highest dwell count among all event days, its heatmap is weighted the most.)<br>
In the previous example, if $[[1, 2, 1], [1, 3, 0.5]]$ has dwell count 2 while $[[1, 2, 0.1], [1, 3, 0.3]]$ has dwell count 20 for their respective time periods, then the cumulative heatmap should be $[[1, 2, 4], [1, 3, 7]]$, where $4 = 1 \cdot 2 + 0.1 \cdot 20$ and $7 = 0.5 \cdot 2 + 0.3 \cdot 20$. We can then scale the densities again so that the highest one is 1.<br>
**Definition** Let $M_1$ and $M_2$ be two density heatmap matrices for a given zone, over different time periods $t_1$ and $t_2$ with dwell counts $c_1$ and $c_2$ respectively: $M_1 = [[x_1, y_1, d_1], [x_2, y_2, d_2], \dots]$, $M_2 = [[x_1, y_1, e_1], [x_2, y_2, e_2], \dots]$.<br>
Then the cumulative heatmap of $M_1$ and $M_2$ is $M_c = [[x_1, y_1, d_1 c_1 + e_1 c_2], [x_2, y_2, d_2 c_1 + e_2 c_2], \dots]$.


- **Quantiled Heatmap**
We introduce this method because of a limitation in the data visualization. As explained above, a continuous line of mostly high density points on the heatmap is a desired line for pedestrians' movement. However, in the raw data we often cannot see a clear line or cluster of desired points directly, because low density points get in the way. We may therefore drop those low density points based on a percentile (quantile), since they contribute little to our analysis, and obtain a clearer visualization. In addition, a cumulative heatmap may contain many points with extremely low density; these can be treated as outliers and dropped for the benefit of the analysis.
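The two methods above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used by the webpage: heatmaps are taken to be lists of `[x, y, density]` rows as in the definition, the function names `cumulative_heatmap` and `quantiled_heatmap` are hypothetical, and the percentile cut is implemented by rank (dropping the lowest share of points), which may differ from the interpolation the real plots use.

```python
import math


def cumulative_heatmap(heatmaps, dwell_counts, rescale=True):
    """Combine scaled heatmaps, weighting each one by its dwell count.

    heatmaps: list of heatmap matrices, each a list of [x, y, density] rows.
    dwell_counts: one dwell count per heatmap, in the same order.
    """
    if len(heatmaps) != len(dwell_counts):
        raise ValueError("need exactly one dwell count per heatmap")
    totals = {}
    for points, count in zip(heatmaps, dwell_counts):
        for x, y, density in points:
            totals[(x, y)] = totals.get((x, y), 0.0) + density * count
    peak = max(totals.values(), default=0.0)
    if rescale and peak > 0:
        # Scale again so that the highest density is 1.
        totals = {xy: d / peak for xy, d in totals.items()}
    return [[x, y, d] for (x, y), d in sorted(totals.items())]


def quantiled_heatmap(points, percentile):
    """Drop the lowest `percentile` percent of points, ranked by density."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    ranked = sorted(points, key=lambda p: p[2])
    n_drop = math.floor(len(ranked) * percentile / 100)
    return ranked[n_drop:]


# The two small matrices from the example, with dwell counts 2 and 20.
m1 = [[1, 2, 1.0], [1, 3, 0.5]]
m2 = [[1, 2, 0.1], [1, 3, 0.3]]
raw = cumulative_heatmap([m1, m2], [2, 20], rescale=False)
scaled = cumulative_heatmap([m1, m2], [2, 20])
top_half = quantiled_heatmap(scaled, 50)
print(raw, scaled, top_half)
```

With `rescale=False` the weighted sums are $1 \cdot 2 + 0.1 \cdot 20$ and $0.5 \cdot 2 + 0.3 \cdot 20$; rescaling divides both by the larger of the two so that the densities lie between 0 and 1 again.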

### 2.1 On Events Summary

#### Purpose
***
This plot focuses on the first question: **Desired lines and spots of pedestrians on event days**. <br>

#### Rationale and Approach
***
As discussed at the beginning of part 2, we use a heatmap as the visualization.
For **data usage**, we first use the **event calendar** to get the names and time periods of the events. This plot also requires heatmap matrix objects for pedestrians during the event times under all 3 devices. Each **heatmap matrix** we get covers a whole event.<br>

For **overall design**, we use multiple subplots. Since we want the audience to understand the general trend of pedestrian motion on event days, we plot the heatmaps of all event days under one camera together, so that the audience can compare them and identify common desired spots more easily.

#### Instructions (Interactive Part)
***
- **Camera(Device) Selection** 
Since the audience can only plot the heatmaps of one camera at a time, we use ToggleButtons for this selection. There are three cameras: Under Raincoat, Streetscape and Outside. Because three is not a large number of options, ToggleButtons are also appropriate in terms of aesthetics.

### 2.2 At Different Times of Events
#### Purpose
***
This plot focuses on the first question: **Desired lines and spots of pedestrians on event days.**

#### Rationale and Approach
***
For this plot, we want to zoom in on how people dwell over the course of an event day (along the time axis). The previous plot gives a general (cumulative) heatmap for the whole event time period. But what about the different time periods within an event: do pedestrians tend to move differently at different hours?

As discussed at the beginning of part 2, we use a heatmap as the visualization.
The **data usage** is similar to 2.1, except that we get a heatmap for each **hour** of the event days for all 3 cameras instead of one heatmap for the whole event.


For the **overall design** of this plot, the audience can play an **animation** of the heatmaps on an event day; the heatmap changes every hour based on the heatmap matrix of that hour. This lets users observe how desired lines or spots evolve in a more coherent way.

#### Instructions (Interactive Part)
***
- **Camera(Device) Selection** Same design as 2.1.
- **Event Selection** We use a Dropdown widget to let users choose an event. Since event names are long, a dropdown is an appropriate way to display them.
- **Animation Play Widget** This allows the audience to start, pause and end the animation.
- **Hour Slider** There is an int slider widget for choosing the hour in this plot. (**Attention:** this widget is linked to the Animation Play widget, so any change in the animation affects this slider and vice versa. We recommend not using the slider while the animation is playing; end the animation first if you want to zoom in on the heatmap of a specific hour.)

### 2.3 Customized Percentile Check
#### Purpose
***
This plot focuses on the second question: **Difference of desired lines and spots on event days and other days.**

#### Rationale and Approach
***
Since we are comparing desired lines and spots across different discrete time periods, we apply the **cumulative heatmap** method described above. For example, if the audience wants to see the differences and similarities between the heatmaps of event days and of all days, they can choose to plot the two cumulative heatmaps for those sets of days.

The **data usage** of this plot is similar to the previous part, but we need heatmap matrix objects for pedestrians on a daily basis, grouped into four different sets of days:

1. Event days (on the event calendar)
2. Sundays
3. The 20 days with the highest pedestrian dwell counts
4. All days from 2019.2.20 to 2020.1.11

We get a heatmap matrix for each day under all 3 cameras.
We also need the daily dwell count data, both to determine the 20 days with the highest pedestrian dwell counts and to weight the daily heatmaps when calculating the cumulative heatmap.

For the **overall design** of this plot, since we want to compare two heatmaps, we use two separate subplots, one per heatmap, so that the audience can compare them directly.

We let the audience explore different **percentiles** (which drop low density data points from the heatmap). Following the Quantiled Heatmap method, this makes the desired lines or spots easier to see even on a cumulative heatmap. The audience can choose the percentile for each subplot separately.

We also plot four **pie charts** on the right-hand side of the heatmaps. The upper two pie charts correspond to the upper heatmap, and the lower two correspond to the lower heatmap. In each pair, the first pie chart shows the proportion of data points kept on the heatmap in terms of point count (which is essentially determined by the percentile). The second pie chart shows the proportion of the total density that the kept points account for. From the density proportion, the audience can see how concentrated the high density points are. (For example, if the 20% highest density points hold 70% of the total density in the first heatmap but only 30% in the second, we may conclude that the density of the first plot is more concentrated.)
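The two proportions shown in each pair of pie charts can be sketched as follows. This is an illustration only: the function name `kept_shares` is hypothetical, the percentile cut is again implemented by rank, and the densities are made up.

```python
def kept_shares(points, percentile):
    """Return (count share, density share) of the points kept after the cut.

    points: list of [x, y, density] rows.
    percentile: integer between 0 and 100; the lowest share is dropped.
    """
    if not points:
        return 0.0, 0.0
    ranked = sorted(points, key=lambda p: p[2])
    n_drop = len(ranked) * percentile // 100
    kept = ranked[n_drop:]
    total_density = sum(p[2] for p in ranked)
    count_share = len(kept) / len(ranked)
    density_share = (
        sum(p[2] for p in kept) / total_density if total_density > 0 else 0.0
    )
    return count_share, density_share


# Ten made-up points; two of them hold most of the density.
densities = [1.0, 0.6, 0.1, 0.1, 0.1, 0.1, 0.0, 0.0, 0.0, 0.0]
points = [[i, 0, d] for i, d in enumerate(densities)]
count_share, density_share = kept_shares(points, 80)
print(count_share, density_share)
```

In this made-up sample, keeping the top 20% of the points retains about 80% of the total density, which would indicate a strongly concentrated heatmap.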

#### Instructions (Interactive Part)
***
- **Camera(Device) Selection** Same design as 2.1.

- **Time(Days) period Selection** We use Dropdown widgets to let users choose the days. There are two dropdowns, one per subplot, so the audience can compare any two sets of days. Each dropdown has four options: event days, Sundays, high dwell count days, and all days.

- **Percentile Slider** There are two percentile sliders, one per subplot. The audience can choose any integer value between 0 and 100. The value is a percentile: points below it are dropped, so 100 drops all data points on the plot.

### 2.4 Customized Spot Check
#### Purpose
***
This plot focuses on the second question: **Difference of desired lines and spots on event days and other days.**

#### Rationale and Approach
***
Since we are comparing desired lines and spots across different discrete time periods, we apply the **cumulative heatmap** method described above. For example, if the audience wants to see the differences and similarities between the heatmaps of event days and of all days, they can choose to plot the two cumulative heatmaps for those sets of days.

First, we introduce the core functionality of this plot: the **Customized Spot Check**.
With the previous plots (2.1 to 2.3), users can easily look for desired spots or lines on the heatmap, but this is not rigorous enough, because the colors of a heatmap cannot be translated into exact density values by eye. We therefore introduce a method, the Customized Spot Check, specifically for this part. It is useful when the audience wants to check the density around a spot, for example a functionality zone (such as a chair zone) or an area that looks dense on the heatmap.

The audience can draw one circle on each subplot by specifying its center coordinates and radius. The page then calculates the total density inside that circle and the total density of the heatmap. It also calculates the area of the circle and compares it to the total area. For example, if a user draws a circle that covers 2.4% of the area of the heatmap but contains 10% of the total density, the user may consider that circle a potential desired spot and investigate further. Note that the same circle is used on both subplots; drawing different circles on the two plots is not allowed, because that would make the comparison meaningless.
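The spot check can be sketched as below. This is a minimal illustration, not the page's implementation: the function name `spot_check`, the 640 by 480 plot size and the sample points are assumptions, and the circle is assumed to lie fully inside the plot (which the coordinate sliders are designed to guarantee).

```python
import math


def spot_check(points, center, radius, width, height):
    """Return (density share, area share) of a circle on a heatmap.

    points: list of [x, y, density] rows.
    center: (x, y) of the circle; radius in the same units.
    width, height: size of the whole plot (hypothetical values below).
    """
    cx, cy = center
    total = sum(d for _, _, d in points)
    inside = sum(
        d for x, y, d in points
        if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
    )
    density_share = inside / total if total > 0 else 0.0
    area_share = math.pi * radius ** 2 / (width * height)
    return density_share, area_share


# Made-up heatmap points on an assumed 640 x 480 plot.
points = [[100, 100, 1.0], [110, 120, 0.5], [400, 300, 0.5]]
density_share, area_share = spot_check(
    points, center=(100, 100), radius=50, width=640, height=480
)
print(f"{density_share:.1%} of the density in {area_share:.1%} of the area")
```

A density share that is much larger than the area share suggests that the circled region is a candidate desired spot worth a closer look.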

The **data usage** is the same as 2.3. 

Since the focus of this part is the customized spot check, the user can choose not to show the **cumulative heatmap**. The advantage of hiding the heatmap is that the user may want to place the circle based on an object or functionality zone in the camera image, which the heatmap would otherwise obscure.

As in 2.3, we have four **pie charts**, two for each heatmap (subplot). The upper two pie charts correspond to the upper heatmap, and the lower two correspond to the lower heatmap.
In each pair, the first pie chart shows the proportion of the total density that lies inside the user-selected circle. The second pie chart shows the proportion of the whole plot area that the circle occupies.

#### Instructions (Interactive Part)
***
- **Camera(Device) Selection** Same rationale and design as 2.1.
- **Time(Days) period Selection** We use Dropdown widgets to let users choose the days. There are two dropdowns, one per subplot, so the audience can compare any two sets of days. Each dropdown has four options: event days, Sundays, high dwell count days, and all days.
- **Show Heatmap checkbox** We use a checkbox to let the audience choose whether to show the heatmap on the plot. The advantage of hiding the heatmap is explained in the rationale above.
- **Coordinate Slider** There are two coordinate sliders, one per axis. Together they give the center of the circle. The allowed coordinate range is restricted so that the circle is never plotted outside the boundary, even with the largest radius. Because of this restriction, an IntSlider widget is a better fit than a text box for entering the number.
- **Radius Slider** This sets the radius of the circle. The maximum radius is 50 and the minimum is 10, since a radius that is too large or too small is not useful for finding desired spots.

### 2.5 Conflicts with Cars

# 3. Maintenance Schedule

### 3.1 Maintenance Schedule by Pedestrian Count or Dwell Time
This plot is a calendar heatmap that shows maintenance schedule by pedestrian count or dwell time, based on the `metric` selected by the user. 

We define our metrics as such:  
`Pedestrian count`: We use the maximum hourly pedestrian count of the day as the metric for people, since the total daily pedestrian count may be an overestimate when people exit and re-enter the area within the same day.   
`Dwell time`: We multiply the pedestrian count by mean dwell time to get total dwell time of each day.  

The user can select a `threshold`; the default is 500. The maintenance schedule is planned as follows. First, we check whether the day is a busy day, that is, whether its pedestrian count or dwell time exceeds the threshold. If so, an extra maintenance is scheduled on that day; otherwise, the day's number is added to the running number of unmaintained visitors or hours. When the unmaintained number exceeds the threshold, a regular maintenance is scheduled. Every time a maintenance is scheduled, the unmaintained number is reset to zero.
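The scheduling rule above can be sketched as a short loop. This is one reading of the description, not the actual implementation: the function name `plan_maintenance` and the sample values are hypothetical, and it assumes that an extra maintenance also resets the unmaintained number, as the last sentence of the rule suggests.

```python
def plan_maintenance(daily_values, threshold=500):
    """Label each day as "extra", "regular" or None (no maintenance).

    daily_values: one pedestrian count or total dwell time per day.
    """
    schedule = []
    unmaintained = 0
    for value in daily_values:
        if value > threshold:
            # Busy day: schedule an extra maintenance and start over.
            schedule.append("extra")
            unmaintained = 0
            continue
        unmaintained += value
        if unmaintained > threshold:
            schedule.append("regular")
            unmaintained = 0
        else:
            schedule.append(None)
    return schedule


# Made-up daily values with the default threshold of 500.
plan = plan_maintenance([200, 200, 200, 600, 100, 450])
print(plan)
```

In this made-up sequence, the third day triggers a regular maintenance because the accumulated 600 exceeds the threshold, and the fourth day triggers an extra maintenance because that day alone exceeds it.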

### 3.2 Maintenance Time During the Day
This plot is a calendar heatmap that shows the average pedestrian count or dwell time by hour during each day of week, based on the `metric` selected by the user. The user can select an area from Streetscape, Under Raincoat, or Outside, or select "all" to see the sum of these three areas. The user can also customize the time range to display data from that period.

The user can select either `extra` or `regular` service. If `regular` is selected, busy hours with a number over `threshold` would not be included in the data displayed.

# 4. Discussion Around Privacy