## Notebook to test the effectiveness of the Stay Point Detection Algorithm (SPDA) <a name="top"></a>

This algorithm was proposed by a group of researchers from China ([link to research paper](https://github.com/tyqiangz/Trajectory-Data-Mining/blob/master/Useful%20Research%20Materials/Stay%20Point%20Analysis%20in%20Automatic%20Identification%20System%20Trajectory%20Data.pdf)) to detect regions (called **stay points**) in a dataset of records with timestamp, latitude and longitude variables. A stay point is a region where a moving object remained relatively stationary: the region is no more than `distThres` metres in size, the object stayed there for at least `timeThres` seconds, and at least `minPoints` geolocation records fall within it.

The researchers proposed setting the parameters to `distThres=200, timeThres=30*60, minPoints=50`. Depending on the quality of the geolocation records, these values may not be optimal. In this notebook I list which travelling patterns lead to a detected stay point and which do not.
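
To make the three conditions concrete, here is a minimal sketch that checks them for one candidate region. The helper `meets_stay_point_criteria` and its argument `max_extent_m` are hypothetical names introduced only for illustration; the comparisons follow the prose definition above ("not more than", "at least"), which may differ in strictness from a particular implementation.

```python
def meets_stay_point_criteria(n_points, duration_s, max_extent_m,
                              distThres=200, timeThres=30*60, minPoints=50):
    """Hypothetical helper: True if a candidate region satisfies all three thresholds."""
    return (max_extent_m <= distThres      # region is small enough
            and duration_s >= timeThres    # object stayed long enough
            and n_points >= minPoints)     # enough records fall in the region

# 59 records over 59 minutes, all within about 120 m of each other
dense = meets_stay_point_criteria(n_points=59, duration_s=59*60, max_extent_m=120)
# Same region and duration, but only 20 records
sparse = meets_stay_point_criteria(n_points=20, duration_s=59*60, max_extent_m=120)

print(dense)   # True
print(sparse)  # False
```

The second case shows why sparse geolocation records may call for a lower `minPoints`.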

<hr></hr>

**Some common scenarios where SPDA is accurate:**

- [Scenario #1](#scenario1): The object moved, then stayed completely stationary for a while, then moved away.
- [Scenario #2](#scenario2): The object moved, then loitered around a few buildings for a while, then moved away.
- [Scenario #3](#scenario3): The object is jumping back and forth over a large distance.
- [Scenario #4](#scenario4): The object is jumping back and forth along a road.
- [Scenario #5](#scenario5): The object is jumping around in a square-shaped pattern over a large distance.
- [Scenario #6](#scenario6): The object is jumping around in a square-shaped pattern over a small distance.

In [1]:
import pandas as pd
import numpy as np
from statistics import mean, median
from math import radians, cos, sin, asin, sqrt
from datetime import datetime, timedelta
import folium

The following are SPDA and the functions that it relies on.

In [21]:
def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance (in metres) between two points on the earth (specified in decimal degrees)
    
    :param lon1: longitude of point 1
    :param lat1: latitude of point 1
    :param lon2: longitude of point 2
    :param lat2: latitude of point 2
    :return: the distance between (lon1, lat1) and (lon2, lat2), in metres
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    r = 6378.1  # Equatorial radius of the earth in kilometres; the result is converted to metres below
    return c * r * 1000

class stayPoint:
    def __init__(self, arrivalTime, departTime, startIndex, endIndex, location):
        '''
        :param arrivalTime: The time when the moving object arrived at this stay point.
        :param departTime: The time when the moving object departed this stay point.
        :param startIndex: The index in the object's trajectory dataset corresponding to `arrivalTime`.
        :param endIndex: The index in the object's trajectory dataset corresponding to `departTime`.
        :param location: The [`lon`, `lat`] values corresponding to the location of this stay point.
        '''
        self.arrivalTime = arrivalTime
        self.departTime = departTime
        self.startIndex = startIndex
        self.endIndex = endIndex
        self.location = location
        
    def toString(self):
        '''
        prints all the information about this stay point.
        '''
        print(f"(arrivalTime: {self.arrivalTime}, departTime: {self.departTime}, startIndex: {self.startIndex}, "+
            f"endIndex: {self.endIndex}, location: {self.location})")
        
def SPDA(traj, distThres=200, timeThres=30*60, minPoints=50):
    '''
    :param traj: a dataframe with `lat`, `lon` and `time` variables
    :param distThres: a threshold of the distance (in metres)
    :param timeThres: a threshold of the time (in seconds)
    :param minPoints: the minimum no. of points required in a stay-point region
    :return: a list of `stayPoint` objects, one per detected stay point
    '''
    def distance(pointA, pointB):
        '''
        :param pointA: a point with lat and lon variables
        :param pointB: a point with lat and lon variables
        :return: the distance between pointA and pointB calculated by Haversine formula
        '''
        return haversine(pointA.lon, pointA.lat, pointB.lon, pointB.lat)
    
    def getCentroid(points, centroid_type):
        '''
        :param points: a list of points with lat and lon variables
        :param centroid_type: "median" or "mean"
        :return: the centre of the list of points, calculated by centroid_type function
        '''
        if centroid_type == "median":
            return [median(points.loc[:,"lon"]), median(points.loc[:,"lat"])]
        elif centroid_type == "mean":
            return [mean(points.loc[:,"lon"]), mean(points.loc[:,"lat"])]
        # Fail loudly instead of silently returning None for an unsupported type
        raise ValueError(f"Unknown centroid_type: {centroid_type!r}")
        
    i = 0
    pointNum = len(traj)
    stayPoints = []
    
    while i < pointNum:
        j = i+1
        token = 0
        while j < pointNum:
            print("Analysing point: " + str(j) + " "*10, "\r", end="")
            dist = distance(traj.iloc[j,:], traj.iloc[i,:])
            if dist > distThres:
                timeDiff = (traj.time[j] - traj.time[i]).total_seconds()
                if (timeDiff > timeThres) and (j-i >= minPoints):
                    centroid = getCentroid(traj.loc[i:(j-1),:], "median")
                    stayPoints.append(
                        stayPoint(
                            arrivalTime = traj.time[i], 
                            departTime = traj.time[j], 
                            startIndex = i,
                            endIndex = j,
                            location = centroid
                        )
                    )
                    
                    i = j
                    token = 1
                break
            j += 1
            
        if token != 1:
            i += 1
            
    return stayPoints
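
A quick sanity check of the distance helper can catch swapped arguments or unit mistakes before running SPDA. The sketch below repeats the same haversine formula (with the same 6378.1 km radius) so that it runs on its own; the test coordinates are arbitrary choices for illustration.

```python
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres, same formula as the notebook's helper."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * asin(sqrt(a)) * 6378.1 * 1000

# One degree of latitude along a meridian should be a little over 111 km
one_degree_lat = haversine(103.8198, 1.0, 103.8198, 2.0)
# The distance from a point to itself should be zero
same_point = haversine(103.8198, 1.3521, 103.8198, 1.3521)

print(round(one_degree_lat))  # 111319
print(same_point)             # 0.0
```

Note that 6378.1 km is the equatorial radius; using the mean radius of about 6371 km would change distances by roughly 0.1%, which is negligible next to a 200 m threshold.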

A function to make plotting more convenient

In [3]:
def plot(traj, stayPoints):
    '''
    :param traj: a dataframe containing `time`, `lat`, `lon` variables
    :param stayPoints: a list of objects of `stayPoint` class
    :return: a folium map showing the travel history together with the stay points
    '''
    
    m = folium.Map(location=[traj.lat[0], traj.lon[0]])
    
    for i in range(len(traj)):
        if i == 0:
            folium.Marker(location=[traj.lat[i], traj.lon[i]], 
                          tooltip='Start point',
                          icon=folium.Icon(color='green',icon='none')
                         ).add_to(m)
        if i == len(traj)-1:
            folium.Marker(location=[traj.lat[i], traj.lon[i]], 
                          tooltip='End point',
                          icon=folium.Icon(color='red',icon='none')
                         ).add_to(m)
            
        folium.CircleMarker(location=[traj.lat[i], traj.lon[i]], radius=5, color="black").add_to(m)
            
    # Loop variable named `sp` so that it does not shadow the `stayPoint` class
    for sp in stayPoints:
        folium.CircleMarker(location=[sp.location[1], sp.location[0]], radius=5).add_to(m)
    
    return m

In [151]:
%%html
<style>
table{margin-left: 0 !important;}
</style>

All travel histories start on 1 January 2020. You can change this to any date you want.

In [4]:
STARTDATE = datetime(2020, 1, 1)

## Scenario 1 <a name="scenario1"></a>

The object moved, then **stayed stationary** for a while, then moved away. **1 STAY POINT DETECTED**

[Back to top](#top)

In [32]:
traj = pd.DataFrame(columns=["time", "lat", "lon"])

traj.loc[0, :] = [STARTDATE, 1.3521, 103.8198]

for i in range(1, 60):
    randSmallNums = np.random.uniform(low=0, high=0, size=2)
    traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4927+randSmallNums[0], 103.7414+randSmallNums[1]]
    
traj.loc[60, :]  = [traj.time[60-1] + timedelta(seconds=60), 1.35, 103.9]

In [33]:
traj.head()

| | time | lat | lon |
|---|---|---|---|
| 0 | 2020-01-01 00:00:00 | 1.3521 | 103.82 |
| 1 | 2020-01-01 00:01:00 | 1.4927 | 103.741 |
| 2 | 2020-01-01 00:02:00 | 1.4927 | 103.741 |
| 3 | 2020-01-01 00:03:00 | 1.4927 | 103.741 |
| 4 | 2020-01-01 00:04:00 | 1.4927 | 103.741 |


In [34]:
traj.tail()

| | time | lat | lon |
|---|---|---|---|
| 56 | 2020-01-01 00:56:00 | 1.4927 | 103.741 |
| 57 | 2020-01-01 00:57:00 | 1.4927 | 103.741 |
| 58 | 2020-01-01 00:58:00 | 1.4927 | 103.741 |
| 59 | 2020-01-01 00:59:00 | 1.4927 | 103.741 |
| 60 | 2020-01-01 01:00:00 | 1.35 | 103.9 |


In [35]:
stayPoints = SPDA(traj)
for point in stayPoints:
    point.toString()


In [36]:
plot(traj, stayPoints)
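
The detection in this scenario can be cross-checked by hand. The sketch below rebuilds only the timestamps (one record per minute, as in the trajectory above) and applies the same time and point-count conditions that `SPDA` uses; the index values `i = 1` and `j = 60` are read off the trajectory, where record 1 is the first stationary record and record 60 is the first one far away from it.

```python
from datetime import datetime, timedelta

start = datetime(2020, 1, 1)
# Record k is taken k minutes after the start date
times = [start + timedelta(seconds=60 * k) for k in range(61)]

i, j = 1, 60  # anchor record, and the first record farther than distThres from it
time_diff = (times[j] - times[i]).total_seconds()
n_points = j - i

print(time_diff, time_diff > 30 * 60)  # 3540.0 True
print(n_points, n_points >= 50)        # 59 True
```

Both conditions hold, so the stationary stretch qualifies as a stay point under the default parameters.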

## Scenario 2 <a name="scenario2"></a>

The object moved, then **loitered around a few buildings** for a while, then moved away. **1 STAY POINT DETECTED**

[Back to top](#top)

In [37]:
traj = pd.DataFrame(columns=["time", "lat", "lon"])

traj.loc[0, :] = [STARTDATE, 1.3521, 103.8198]

for i in range(1, 60):
    randSmallNums = np.random.uniform(low=0, high=0.001, size=2)
    traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4927+randSmallNums[0], 103.7414+randSmallNums[1]]
    
traj.loc[60, :]  = [traj.time[60-1] + timedelta(seconds=60), 1.36, 103.71]

In [38]:
traj.head()

| | time | lat | lon |
|---|---|---|---|
| 0 | 2020-01-01 00:00:00 | 1.3521 | 103.82 |
| 1 | 2020-01-01 00:01:00 | 1.49326 | 103.742 |
| 2 | 2020-01-01 00:02:00 | 1.49311 | 103.742 |
| 3 | 2020-01-01 00:03:00 | 1.4931 | 103.742 |
| 4 | 2020-01-01 00:04:00 | 1.4932 | 103.742 |


In [39]:
traj.tail()

| | time | lat | lon |
|---|---|---|---|
| 56 | 2020-01-01 00:56:00 | 1.49306 | 103.741 |
| 57 | 2020-01-01 00:57:00 | 1.49353 | 103.741 |
| 58 | 2020-01-01 00:58:00 | 1.49337 | 103.741 |
| 59 | 2020-01-01 00:59:00 | 1.49345 | 103.742 |
| 60 | 2020-01-01 01:00:00 | 1.36 | 103.71 |


In [40]:
stayPoints = SPDA(traj)
for point in stayPoints:
    point.toString()


In [41]:
plot(traj, stayPoints)
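
The loitering records in this scenario are offset by at most 0.001 degrees in each direction, which confines them to a box whose diagonal is about 157 m, below the 200 m threshold. The sketch below illustrates this with freshly drawn offsets; the seed and the use of `np.random.default_rng` are assumptions made for repeatability, so the values differ from the run shown above.

```python
import numpy as np
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres, same formula as the notebook's helper."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * asin(sqrt(a)) * 6378.1 * 1000

rng = np.random.default_rng(0)  # seed chosen only for repeatability
jitter = rng.uniform(low=0, high=0.001, size=(59, 2))
lats = 1.4927 + jitter[:, 0]
lons = 103.7414 + jitter[:, 1]

# Distance of every later record from the first loitering record (the anchor)
dists = [haversine(lons[0], lats[0], lon, lat) for lon, lat in zip(lons[1:], lats[1:])]
print(max(dists) < 200)  # True
```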

## Scenario 3 <a name="scenario3"></a>

The object is jumping **back and forth** over a **large distance**. **NO STAY POINTS DETECTED**

[Back to top](#top)

In [57]:
traj = pd.DataFrame(columns=["time", "lat", "lon"])

traj.loc[0, :] = [STARTDATE, 1.4521, 103.8198]

for i in range(1, 60):
    randSmallNums = np.random.uniform(low=0, high=0.01, size=2)
    
    if i % 2 == 1:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4927+randSmallNums[0], 103.7414+randSmallNums[1]]
    else:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4521+randSmallNums[0], 103.8198+randSmallNums[1]]

In [58]:
traj.head()

| | time | lat | lon |
|---|---|---|---|
| 0 | 2020-01-01 00:00:00 | 1.4521 | 103.82 |
| 1 | 2020-01-01 00:01:00 | 1.49551 | 103.743 |
| 2 | 2020-01-01 00:02:00 | 1.4535 | 103.82 |
| 3 | 2020-01-01 00:03:00 | 1.49711 | 103.743 |
| 4 | 2020-01-01 00:04:00 | 1.45484 | 103.828 |


In [59]:
traj.tail()

| | time | lat | lon |
|---|---|---|---|
| 55 | 2020-01-01 00:55:00 | 1.49365 | 103.742 |
| 56 | 2020-01-01 00:56:00 | 1.4618 | 103.825 |
| 57 | 2020-01-01 00:57:00 | 1.49563 | 103.75 |
| 58 | 2020-01-01 00:58:00 | 1.45831 | 103.821 |
| 59 | 2020-01-01 00:59:00 | 1.49372 | 103.75 |


In [60]:
stayPoints = SPDA(traj)
for point in stayPoints:
    point.toString()


In [61]:
plot(traj, stayPoints)
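
A rough check of the gap between the two locations helps explain why nothing is detected here. The sketch below measures the distance between the two base coordinates used in this scenario, ignoring the random offsets (which are at most 0.01 degrees, roughly 1.1 km, per axis); the names `site_a` and `site_b` are illustrative only.

```python
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres, same formula as the notebook's helper."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * asin(sqrt(a)) * 6378.1 * 1000

site_a = (1.4927, 103.7414)  # (lat, lon) base for odd indices
site_b = (1.4521, 103.8198)  # (lat, lon) base for even indices

gap = haversine(site_a[1], site_a[0], site_b[1], site_b[0])
print(f"{gap / 1000:.1f} km")  # several kilometres
print(gap > 200)               # True
```

Since every record is several kilometres from the previous one, each candidate region ends after a single 60-second step, far short of `timeThres` and `minPoints`.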

## Scenario 4 <a name="scenario4"></a>

The object is jumping **back and forth** along a **road** (less than 200 m). **1 STAY POINT DETECTED**

[Back to top](#top)

In [128]:
traj = pd.DataFrame(columns=["time", "lat", "lon"])

traj.loc[0, :] = [STARTDATE, 1.4627, 103.7359]

for i in range(1, 60):
    randSmallNums = np.random.uniform(low=0, high=0.001, size=2)
    
    if i % 2 == 1:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4917+randSmallNums[0], 103.7354+randSmallNums[1]]
    else:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4927+randSmallNums[0], 103.7359+randSmallNums[1]]
        
traj.loc[60, :]  = [traj.time[60-1] + timedelta(seconds=60), 1.36, 103.71]

In [129]:
traj.head()

| | time | lat | lon |
|---|---|---|---|
| 0 | 2020-01-01 00:00:00 | 1.4627 | 103.736 |
| 1 | 2020-01-01 00:01:00 | 1.49253 | 103.736 |
| 2 | 2020-01-01 00:02:00 | 1.49291 | 103.737 |
| 3 | 2020-01-01 00:03:00 | 1.49192 | 103.736 |
| 4 | 2020-01-01 00:04:00 | 1.49306 | 103.737 |


In [130]:
traj.tail()

| | time | lat | lon |
|---|---|---|---|
| 56 | 2020-01-01 00:56:00 | 1.49322 | 103.736 |
| 57 | 2020-01-01 00:57:00 | 1.49242 | 103.736 |
| 58 | 2020-01-01 00:58:00 | 1.49326 | 103.736 |
| 59 | 2020-01-01 00:59:00 | 1.49192 | 103.736 |
| 60 | 2020-01-01 01:00:00 | 1.36 | 103.71 |


In [131]:
stayPoints = SPDA(traj)
for point in stayPoints:
    point.toString()


In [132]:
plot(traj, stayPoints)
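
The "less than 200 m" claim for this scenario can be checked against the two base coordinates the object alternates between. The sketch below ignores the random offsets, which can add up to about 111 m per axis, so it gives the nominal separation only; the names `end_a` and `end_b` are illustrative.

```python
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres, same formula as the notebook's helper."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * asin(sqrt(a)) * 6378.1 * 1000

end_a = (1.4917, 103.7354)  # (lat, lon) base for odd indices
end_b = (1.4927, 103.7359)  # (lat, lon) base for even indices

nominal_gap = haversine(end_a[1], end_a[0], end_b[1], end_b[0])
print(round(nominal_gap), "m")  # roughly 125 m
print(nominal_gap < 200)        # True
```

Because the nominal separation is comfortably below `distThres`, the back-and-forth motion is usually treated as one region, although an unlucky pair of offsets could still exceed the threshold.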

## Scenario 5 <a name="scenario5"></a>

The object is jumping **around in a square shape pattern** over a **large distance**. **NO STAY POINTS DETECTED**

[Back to top](#top)

In [176]:
traj = pd.DataFrame(columns=["time", "lat", "lon"])

traj.loc[0, :] = [STARTDATE, 1.4627, 103.7359]

for i in range(1, 60):
    randSmallNums = np.random.uniform(low=0, high=0.001, size=2)
    
    if i % 4 == 0:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4627+randSmallNums[0], 103.7359+randSmallNums[1]]
    elif i % 4 == 1:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4687+randSmallNums[0], 103.7359+randSmallNums[1]]
    elif i % 4 == 2:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4687+randSmallNums[0], 103.7459+randSmallNums[1]]
    elif i % 4 == 3:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4627+randSmallNums[0], 103.7459+randSmallNums[1]]

In [177]:
traj.head()

| | time | lat | lon |
|---|---|---|---|
| 0 | 2020-01-01 00:00:00 | 1.4627 | 103.736 |
| 1 | 2020-01-01 00:01:00 | 1.4696 | 103.736 |
| 2 | 2020-01-01 00:02:00 | 1.46954 | 103.746 |
| 3 | 2020-01-01 00:03:00 | 1.46284 | 103.746 |
| 4 | 2020-01-01 00:04:00 | 1.46286 | 103.736 |


In [178]:
traj.tail()

| | time | lat | lon |
|---|---|---|---|
| 55 | 2020-01-01 00:55:00 | 1.46305 | 103.746 |
| 56 | 2020-01-01 00:56:00 | 1.46342 | 103.737 |
| 57 | 2020-01-01 00:57:00 | 1.46873 | 103.736 |
| 58 | 2020-01-01 00:58:00 | 1.4694 | 103.747 |
| 59 | 2020-01-01 00:59:00 | 1.46316 | 103.747 |


In [179]:
stayPoints = SPDA(traj)
for point in stayPoints:
    point.toString()


In [180]:
plot(traj, stayPoints)
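
To see why this square is too large for a stay point, the side lengths can be computed from the four base corners used in this scenario. The sketch below ignores the random offsets (at most 0.001 degrees, roughly 111 m, per axis); the list name `corners` is illustrative.

```python
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres, same formula as the notebook's helper."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * asin(sqrt(a)) * 6378.1 * 1000

# (lat, lon) corners in the order the object visits them: i % 4 == 0, 1, 2, 3
corners = [(1.4627, 103.7359), (1.4687, 103.7359), (1.4687, 103.7459), (1.4627, 103.7459)]

# Distance from each corner to the next one visited
sides = [
    haversine(corners[k][1], corners[k][0], corners[(k + 1) % 4][1], corners[(k + 1) % 4][0])
    for k in range(4)
]
print([round(s) for s in sides])
print(min(sides) > 200)  # True
```

Every step between consecutive records covers more than 200 m, so no candidate region ever accumulates enough time or points.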

## Scenario 6 <a name="scenario6"></a>

The object is jumping **around in a square shape pattern** over a **small distance**.

[Back to top](#top)

In [173]:
traj = pd.DataFrame(columns=["time", "lat", "lon"])

traj.loc[0, :] = [STARTDATE, 1.4627, 103.7359]

for i in range(1, 60):
    randSmallNums = np.random.uniform(low=0, high=0.0005, size=2)
    
    if i % 4 == 0:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4627+randSmallNums[0], 103.7399+randSmallNums[1]]
    elif i % 4 == 1:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4637+randSmallNums[0], 103.7399+randSmallNums[1]]
    elif i % 4 == 2:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4637+randSmallNums[0], 103.7409+randSmallNums[1]]
    elif i % 4 == 3:
        traj.loc[i, :] = [traj.time[i-1] + timedelta(seconds=60), 1.4627+randSmallNums[0], 103.7409+randSmallNums[1]]
        
traj.loc[60, :] = [traj.time[60-1] + timedelta(seconds=60), 1.46, 103.75]

In [174]:
stayPoints = SPDA(traj)
for point in stayPoints:
    point.toString()


In [175]:
plot(traj, stayPoints)
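
For this small square, it may help to compare the geometry with `distThres`. The sketch below computes the diagonal between two opposite base corners, and then the same diagonal when the random offsets (at most 0.0005 degrees per axis) happen to push the two records as far apart as possible. It describes the geometry only and does not reproduce the run above.

```python
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres, same formula as the notebook's helper."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * asin(sqrt(a)) * 6378.1 * 1000

max_offset = 0.0005  # upper bound of the uniform offsets in this scenario

# Opposite base corners of the square (i % 4 == 1 and i % 4 == 3), as (lat, lon)
corner_1 = (1.4637, 103.7399)
corner_3 = (1.4627, 103.7409)

nominal_diag = haversine(corner_1[1], corner_1[0], corner_3[1], corner_3[0])
# Worst case: offsets move the two records directly away from each other
worst_diag = haversine(corner_1[1], corner_1[0] + max_offset,
                       corner_3[1] + max_offset, corner_3[0])

print(nominal_diag < 200)  # True
print(worst_diag > 200)    # True
```

The nominal diagonal is below 200 m while the worst case is above it, so whether all records fall within `distThres` of the anchor can depend on the random draws.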