# Group Tracking Exercise: How fast were you and who did you spend time with?

## 1. Preliminaries

Most of this was already in last week's notebook. We load the data, plot it on a map, and define functions that will be useful later. The main novelty compared to last week is the `path2name` function, which we use to build a list (`names`) with the name of each track's submitter.

### 1.1 Import modules

Import all the necessary modules to handle files (os, glob), dates and times (datetime), maps (mplleaflet), gpx tracking files (gpxpy), lists/arrays/stats (numpy), and plotting (pyplot).

In [None]:
import os, datetime, mplleaflet, glob, gpxpy
import numpy as np
import matplotlib.pyplot as plt

### 1.2 Functions that we will use (distance, get_points and path2name)

#### Distance function:
This function computes the distance in meters between two points from their longitudes and latitudes. You don't need to modify it, but you will need to use it.

#### get_points function: 
This function extracts the GPX point objects from the first segment of the first track in a GPX file and removes their time zone information. We did something similar last week; it just wasn't in a function. You don't need to modify it. We'll use it to prepare the data for the tasks.
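For reference, a GPX file is XML: each `<trk>` (track) contains one or more `<trkseg>` (segments), which contain `<trkpt>` points with `lat`/`lon` attributes and a `<time>` child. The sketch below reads that structure with the standard library's `xml.etree.ElementTree` instead of gpxpy, just to show what get_points extracts. It assumes the GPX 1.1 namespace and timestamps of the form `YYYY-MM-DDTHH:MM:SSZ`; the GPX text and the helper `first_segment_points` are hypothetical, and files from other apps may need more careful parsing (gpxpy handles this for you).

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# A tiny hand-written GPX document (hypothetical data, for illustration only).
GPX_TEXT = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk>
    <trkseg>
      <trkpt lat="26.887141" lon="-80.117331"><time>2019-10-01T14:00:00Z</time></trkpt>
      <trkpt lat="26.887150" lon="-80.117320"><time>2019-10-01T14:00:01Z</time></trkpt>
    </trkseg>
  </trk>
</gpx>"""

NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

def first_segment_points(gpx_text):
    """Return (time, longitude, latitude) tuples from the first segment of the first track."""
    root = ET.fromstring(gpx_text)
    segment = root.find("gpx:trk/gpx:trkseg", NS)  # first <trk>, then its first <trkseg>
    points = []
    for trkpt in segment.findall("gpx:trkpt", NS):
        # Drop the trailing 'Z' and parse as a naive datetime (no time zone, like get_points).
        stamp = trkpt.find("gpx:time", NS).text.rstrip("Z")
        t = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S")
        points.append((t, float(trkpt.get("lon")), float(trkpt.get("lat"))))
    return points

print(first_segment_points(GPX_TEXT))
```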

#### path2name function:
The path2name function is new. It extracts just the submitter's name from a file path and discards any extra information in the file name. You don't need to modify it. We'll use it to build a list with the name of each track's submitter, which will be useful for labeling results and plots.

In [None]:
# distance function:
def distance(lo1,la1,lo2,la2):      
    ''' 
    Compute the distance between two points ignoring altitudes.
    
    Arguments:
      lo1: Longitude of point 1, in degrees.
      la1: Latitude of point 1, in degrees.
      lo2: Longitude of point 2, in degrees.
      la2: Latitude of point 2, in degrees. 
    Output:
      Distance between the two points in meters.
    '''
    # gpxpy.geo.distance expects each point's latitude before its longitude.
    return gpxpy.geo.distance(la1,lo1,None,la2,lo2,None)

# get_points function:
def get_points(path):
    '''
    List of GPX points in the first segment of the first 
    track of a GPX file.
    
    Arguments:
      path: Path to the GPX file.
    Output:
      A list of GPX points.
    '''
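    with open(path) as file_object:  # the file is closed automatically
        gpx_object = gpxpy.parse(file_object)
    points = gpx_object.tracks[0].segments[0].points
    # Drop the time zone so that times from all files can be compared as naive datetimes.
    for p in points:
        p.time = p.time.replace(tzinfo=None)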
    return points

# path2name function:
def path2name(path):
    '''
    Extract the submitter's name from the path to a file submitted to Canvas.
    
    Arguments:
      path: Path to the file.
    Output:
      Submitter's name (last name then first name, no spaces, all lower case).
    '''
    filename = os.path.basename(path) # Discard folder name(s), keep file name.
    name = filename.split('_')[0] # Discard everything after the first underscore.
    return name
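The distance function delegates the math to gpxpy, whose `gpxpy.geo.distance` takes each point's latitude before its longitude. As an independent check, here is a minimal haversine sketch with the same longitude-first argument order as distance. The haversine formula and the 6,371 km mean Earth radius are choices made for this sketch, not necessarily what gpxpy uses internally, so expect small differences between the two.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius (a choice made for this sketch)

def haversine_distance(lo1, la1, lo2, la2):
    """Great-circle distance in meters; longitude first, same order as distance()."""
    phi1, phi2 = math.radians(la1), math.radians(la2)
    dphi = phi2 - phi1
    dlam = math.radians(lo2 - lo1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Distance between the SW and SE soccer goals defined in section 1.4.
print(haversine_distance(-80.117331, 26.887141, -80.116789, 26.886914))
```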

### 1.3 Load all students' GPX files
Today we load the files by first making an alphabetically sorted list of all the available GPX files. From that list we then build a list of the names of the people who submitted the files, followed by a list of the GPX points contained in each file.

In [None]:
# List all the GPX files in the tracks folder, sorted alphabetically.
gpx_files = sorted(glob.glob('tracks/*.gpx'))

# Make a list of the names of the people who submitted the files to Canvas.
# names[i] is the name of the person who submitted gpx_files[i].
names = []
for path in gpx_files:
    names.append(path2name(path))

# Make a list of the list of GPX points contained in each file.
# tracks[i] is the list of points contained in gpx_files[i].
tracks = []
for path in gpx_files:
    tracks.append(get_points(path))
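To see what path2name keeps, the sketch below applies the same logic to a couple of hypothetical Canvas-style paths (the pattern `name_<ids>_<original file name>` is an assumption about how the files were renamed). It also uses a list comprehension, the one-line equivalent of the loop above (`names = [path2name(path) for path in gpx_files]`), and zip to pair each name with its file.

```python
import os

def path2name(path):
    # Same logic as the notebook's path2name: keep the file name, cut at the first underscore.
    filename = os.path.basename(path)
    return filename.split('_')[0]

# Hypothetical Canvas-style paths (assumed naming pattern: name_<ids>_<original file name>).
example_paths = sorted([
    'tracks/smithanna_12345_67890_walk.gpx',
    'tracks/doejohn_23456_78901_Track 1.gpx',
])
example_names = [path2name(p) for p in example_paths]
print(example_names)

# zip pairs each name with its path, so names[i] always matches gpx_files[i].
for name, path in zip(example_names, example_paths):
    print(name, '->', path)
```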

### 1.4 Define the dens (GPS coordinates and names) and plot the tracks
The three dens are the two soccer goals on the field southeast of the SR building and the sign in front of the volleyball court, also southeast of the SR building.

We plot the tracks with a for-loop that extracts the longitude (lo) and latitude (la) of every point in each track. You should see one track per submitted file and three red squares for the three dens.

In [None]:
# Define the dens (GPS coordinates and name).
goal_SW  = [-80.117331, 26.887141, 'SW soccer goal']
goal_SE  = [-80.116789, 26.886914, 'SE soccer goal']
VB_court = [-80.117268, 26.886723, 'Volleyball court sign']
# Make a list of the dens so we can loop over the dens later.
dens     = [goal_SW, goal_SE, VB_court] 

# Plot every track as a line and every den as a red square.
plt.figure(figsize=(8,8))
for den in dens:
    plt.scatter(den[0],den[1],s=200,marker='s',color='r')
for points in tracks:
    lo = []
    la = []
    for p in points:
        lo.append(p.longitude)
        la.append(p.latitude)
    plt.plot(lo,la)
mplleaflet.display()
# mplleaflet.show()

## 2. Fully automatic identification of each track's den

Here we solve task 4 from last week to illustrate that the identification of each track's den can be fully automated.

To do that, we need the argmax function from module numpy. It takes a list and returns the index of the largest element in that list:

### 2.1 Example of the argmax function

In [None]:
print(np.argmax([2,7,5])) # The largest element is 7, located at index 1.
print(np.argmax([2,7,19])) # The largest element is 19, located at index 2.

# Imagine those lists were the counts of points near each den.
# Once we have the index of the den with the largest count, we can go back to
# the list of dens to retrieve the den's name:
i = np.argmax([2,7,19])
print(dens[i][2])
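One detail worth knowing: when several elements tie for the largest value, np.argmax returns the index of the first one. In particular, a track with zero points near every den gets index 0, i.e. the SW soccer goal. The stdlib sketch below mimics that rule with max() (`argmax` here is a hypothetical helper, not numpy's implementation) so you can see the tie and all-zero cases.

```python
def argmax(values):
    """Index of the first largest element, mirroring np.argmax on a 1-D list."""
    # max() returns the first maximal item it encounters, so ties go to the lowest index.
    return max(range(len(values)), key=lambda k: values[k])

print(argmax([2, 7, 5]))   # index 1
print(argmax([3, 9, 9]))   # tie: the first occurrence (index 1) wins
print(argmax([0, 0, 0]))   # no points near any den: still returns 0
```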

### 2.2 Automatic Detection of each track's den

Now we can implement a fully automatic detection of each track's den. For each track, we count the number of points within the threshold distance of each den, then use argmax to find the den with the largest count. Then we print "[name of the student] --> [name of the den]." and move on to the next track.

In [None]:
threshold = 3  # distance threshold in meters
for i in range(len(tracks)):  # loop over all tracks
    den_counts = []  # den_counts[k] = number of points of track i near dens[k]
    for den in dens:  # loop over all dens
        count = 0
        for p in tracks[i]:  # loop over all points of track i
            d = distance(p.longitude,p.latitude,den[0],den[1])  # distance from this point to the den, in meters
            if d<threshold:  # count only the points closer than the threshold
                count = count+1
        den_counts.append(count)
    
    # Find the index of the den with the largest count.
    j = np.argmax(den_counts)
    print(f'{names[i]} --> {dens[j][2]}')  # submitter's name and the name of their den
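The same counting rule can be wrapped in a function, which also makes it easy to flag tracks that never come within the threshold of any den instead of silently assigning them to the first den. This is a sketch, not the notebook's code: `identify_den` and `approx_distance` are hypothetical names, `approx_distance` is an equirectangular stand-in for distance (assumed accurate enough over a few hundred meters), and points are plain (longitude, latitude) tuples rather than gpxpy points.

```python
import math

def approx_distance(lo1, la1, lo2, la2):
    """Equirectangular approximation in meters (assumed adequate over a few hundred meters)."""
    R = 6371000.0
    x = math.radians(lo2 - lo1) * math.cos(math.radians((la1 + la2) / 2))
    y = math.radians(la2 - la1)
    return R * math.hypot(x, y)

def identify_den(points, dens, threshold=3):
    """Return the name of the den with the most points within `threshold` meters, or None."""
    counts = []
    for den in dens:
        count = sum(1 for lo, la in points if approx_distance(lo, la, den[0], den[1]) < threshold)
        counts.append(count)
    best = max(range(len(counts)), key=lambda k: counts[k])  # same tie rule as np.argmax
    if counts[best] == 0:
        return None  # the track never came within `threshold` meters of any den
    return dens[best][2]

dens = [[-80.117331, 26.887141, 'SW soccer goal'],
        [-80.116789, 26.886914, 'SE soccer goal'],
        [-80.117268, 26.886723, 'Volleyball court sign']]

# Hypothetical track: three points right next to the SE goal, one far away.
track = [(-80.116789, 26.886914), (-80.116789, 26.886924),
         (-80.116780, 26.886914), (-80.1100, 26.8900)]
print(identify_den(track, dens))
print(identify_den([(-80.1100, 26.8900)], dens))
```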

## 3. Pair proximity

We would now like to identify pairs of students who were close to each other during (a good part of) the tracking. The core idea is similar to the one we used to identify each student's den: given a pair of tracks, we count how many times the two students were detected within 3 meters of each other. However, the problem is much trickier because both targets are moving. Two tracks coming near the same location is not enough; they need to come near the same location *at the same time*. To make that comparison possible, below we create a new set of "synchronized" tracks such that all the first points have the same time, all the second points have the same time, and so on.
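A toy example shows why time matters. The two hypothetical walkers below follow exactly the same path on a flat grid in meters, but one starts 30 seconds after the other. Counting closeness without looking at time says they were together the whole time; counting only at shared timestamps says they never were. The timestamp lookup works here only because both walkers are sampled at the same whole seconds, which is what the synchronization in section 3.1 provides for the real tracks.

```python
import math

# Two hypothetical walkers on a flat x/y grid in meters, sampled once per second.
# Both walk the same straight line, but walker B starts 30 seconds later.
walker_a = [(t, float(t), 0.0) for t in range(60)]           # (time in s, x in m, y in m)
walker_b = [(t, float(t - 30), 0.0) for t in range(30, 90)]

def close_anywhere(track1, track2, threshold=3.0):
    """Count points of track1 within threshold of ANY point of track2 (ignores time)."""
    return sum(1 for _, x1, y1 in track1
               if any(math.hypot(x1 - x2, y1 - y2) < threshold for _, x2, y2 in track2))

def close_same_time(track1, track2, threshold=3.0):
    """Count shared timestamps at which the two tracks are within threshold of each other."""
    positions2 = {t: (x, y) for t, x, y in track2}
    return sum(1 for t, x1, y1 in track1
               if t in positions2
               and math.hypot(x1 - positions2[t][0], y1 - positions2[t][1]) < threshold)

print(close_anywhere(walker_a, walker_b))   # 60: every point of A lies on B's path
print(close_same_time(walker_a, walker_b))  # 0: they are never there together
```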

### 3.1. Define functions to interpolate tracks 
Here is a series of functions that create the synchronized tracks. They use many of the functions and much of the code we covered in class, plus a few new ones such as interp1d. You will only use these functions; you don't need to modify this code.

In [None]:
# Convert between datetimes and seconds since a reference date.
t0 = tracks[0][0].time # Use the first point of the first track as a reference.
def date2seconds(t,t0=t0):
    return (t-t0).total_seconds()
def seconds2date(t,t0=t0):
    return t0+datetime.timedelta(seconds=t)
    
# Create a function that interpolates a track.
# We'll make use of the function interp1d from submodule interpolate in module scipy.
from scipy.interpolate import interp1d
def make_interpolated_track(track):
    sec  = [ date2seconds(p.time) for p in track ]
    lon  = [ p.longitude for p in track ]
    lat  = [ p.latitude for p in track ]
    # By default interp1d raises an error for times outside the recorded interval;
    # passing fill_value='extrapolate' would extrapolate instead.
    lon_ = interp1d(sec,lon)
    lat_ = interp1d(sec,lat)
    return lambda t: [float(lon_(t)),float(lat_(t))]

# Identify the time interval during which everyone was tracking.
t_start = max([ track[0].time for track in tracks ])
t_stop  = min([ track[-1].time for track in tracks ])
# Number of seconds for which everyone was tracking:
seconds_all_tracking = (t_stop-t_start).total_seconds()
# print(seconds_all_tracking)

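# Create a new set of "synchronized" tracks, one for each original track.
# The synchronized tracks all have points at the exact same times.
# np.linspace needs an integer number of samples; int(...)+1 samples give about one point per second.
n_samples = int(seconds_all_tracking)+1
T = np.linspace(date2seconds(t_start),date2seconds(t_stop),n_samples)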
tracks_synchronized = []
for track in tracks:
    track_interpolated = make_interpolated_track(track)
    track_synchronized = [ [seconds2date(t)]+track_interpolated(t) for t in T ]
    tracks_synchronized.append(track_synchronized)

# # Plot all the synchronized tracks and the dens.
# plt.figure(figsize=(8,8))
# for den in dens:
#     plt.scatter(den[0],den[1],s=200,marker='s',color='red')
# for points in tracks_synchronized:
#     lo = [ p[1] for p in points ]    
#     la = [ p[2] for p in points ]
#     plt.plot(lo,la)
# mplleaflet.display()
# # mplleaflet.show()
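interp1d does linear interpolation: between two recorded points, the position is assumed to change at a constant rate. A minimal stdlib sketch of the same idea for one coordinate is below; it uses bisect to find the recorded times surrounding each requested time. The track data and the helper `interpolate` are hypothetical, and like interp1d's default the helper refuses times outside the recorded interval, which is why T only spans the period when everyone was tracking.

```python
import bisect

def interpolate(times, values, t):
    """Linear interpolation of values at time t; times must be sorted and contain t's range."""
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the recorded interval")  # interp1d also refuses by default
    k = bisect.bisect_right(times, t)  # times[k-1] <= t < times[k]
    if k == len(times):                # t equals the last recorded time
        return values[-1]
    t_lo, t_hi = times[k - 1], times[k]
    v_lo, v_hi = values[k - 1], values[k]
    return v_lo + (v_hi - v_lo) * (t - t_lo) / (t_hi - t_lo)

# Hypothetical track recorded at irregular times (seconds since t0), longitudes in degrees.
sec = [0.0, 1.0, 4.0, 5.0]
lon = [-80.11730, -80.11729, -80.11726, -80.11725]

# Resample once per second over the recorded interval, like T in the cell above.
T = [float(s) for s in range(int(sec[-1]) + 1)]
print([round(interpolate(sec, lon, t), 6) for t in T])
```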

<div class="alert alert-block alert-danger">
<b>Task 1a:</b>  

Count the number of times track 0 and track 1 were within 3 meters of each other.
</div>

*Outline:*  
Set a counter to zero. Loop over the indices of either synchronized track in the tracks_synchronized list (all synchronized tracks have the same length, and point i of every track has the same time). Each synchronized point is a list [time, longitude, latitude]. At each index i, compute the distance between point i on track 0 and point i on track 1. If it's less than 3 meters, increase the counter by 1. Once you're done, print the counter.

<div class="alert alert-block alert-danger">
<b>Task 1b:</b>  

Write a function that takes two track indices i and j and returns the number of times the two tracks were within 3 meters of each other.
</div>

*Hint:*  
Same as the previous task except this time you will have to define a function using tracks i and j instead of 0 and 1.

<div class="alert alert-block alert-danger">
<b>Task 1c:</b>  

Find every pair of students that was detected at least 60 times (1 minute) within 3 meters of each other. For each pair, print the name of each student and the proximity count.
</div>

*Outline:*  
Loop over the students. Inside that loop, loop over the students again. This makes it possible to consider every possible pair of students in turn. For each pair, use the function from the previous task to count the number of times they were within 3 meters of each other. If that count is at least 60, print the names of the two students and the count. If the pair consists of the same student twice, don't print anything.

<div class="alert alert-block alert-danger">
<b>Extra Task 1d:</b>  

The method outlined in the previous task leads to printing every pair of students twice (the second time the order of the students is reversed). Rewrite your solution to avoid this problem.
</div>

*Hint:*  
Only consider pairs of student indices in which the second index is smaller than the first.
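Here is that hint applied to a hypothetical list of names, without any tracks: restricting the inner loop to `j < i` visits each unordered pair exactly once, n(n-1)/2 pairs in total, and never pairs a student with themselves. `itertools.combinations` from the standard library yields the same pairs; which one you use is a matter of taste.

```python
from itertools import combinations

# Hypothetical list of submitters (the notebook's `names` list would be used instead).
names = ['doejohn', 'smithanna', 'leekim', 'garciamaria']

# Nested loops where the second index is smaller than the first: each unordered pair appears once.
pairs = []
for i in range(len(names)):
    for j in range(i):
        pairs.append((names[i], names[j]))

# itertools.combinations gives index pairs (i, j) with i < j; swap them to match the loop above.
pairs_combo = [(names[j], names[i]) for i, j in combinations(range(len(names)), 2)]

print(len(pairs), len(pairs_combo))  # n*(n-1)/2 pairs each
```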

## 4. Speed

Here we plot everyone's speed against time, then compute everyone's mean and top speed and show them as bar plots.

### 4.1. Extract the speed and time

In this section we build two lists of lists, with one inner list per track: speeds, holding the speed between consecutive points, and times, holding the corresponding times. You will use these lists later to plot the speed of each track.

In [None]:
# Make a list of each track's instantaneous speed (in m/s) vs time.
# For each point of each track, the speed is obtained by computing
# the distance traveled and the time elapsed since the last point on the track
# and dividing one by the other.

times  = []
speeds = []
for i in range(len(tracks)):
    track = tracks[i]
    # Create empty lists to hold the time and speed on this track.
    time  = []
    speed = []
    # Loop over the points on this track. Skip the first point as it has no previous point.
    for j in range(1,len(track)):
        p1 = track[j-1] # previous point
        p2 = track[j]   # current point
        t  = (p2.time-p1.time).total_seconds() # seconds elapsed
        d  = distance(p1.longitude,p1.latitude,p2.longitude,p2.latitude) # distance traveled
        # One of the tracks recorded two points with the same timestamp but different positions.
        # We need to disregard it to avoid a division by zero error.
        if t!=0:
            time.append(p1.time)
            speed.append(d/t)
    speeds.append(speed)
    times.append(time)
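The speed computation above can also be written by iterating over consecutive pairs of points with `zip(points, points[1:])`, which avoids juggling the indices j-1 and j. The sketch below runs that version on a few hypothetical (time, longitude, latitude) tuples, including a duplicated timestamp, and uses an equirectangular `approx_distance` (a hypothetical helper) as a stand-in for the notebook's distance function.

```python
import math
from datetime import datetime, timedelta

def approx_distance(lo1, la1, lo2, la2):
    """Equirectangular approximation in meters (stand-in for the notebook's distance function)."""
    R = 6371000.0
    x = math.radians(lo2 - lo1) * math.cos(math.radians((la1 + la2) / 2))
    y = math.radians(la2 - la1)
    return R * math.hypot(x, y)

# Hypothetical points (time, longitude, latitude) moving north 0.00001 degree per second,
# with one duplicated timestamp like the one mentioned in the cell above.
t0 = datetime(2019, 10, 1, 14, 0, 0)
points = [(t0, -80.1173, 26.88700),
          (t0 + timedelta(seconds=1), -80.1173, 26.88701),
          (t0 + timedelta(seconds=1), -80.1173, 26.88702),  # same timestamp, different position
          (t0 + timedelta(seconds=2), -80.1173, 26.88703)]

time, speed = [], []
for (t1, lo1, la1), (t2, lo2, la2) in zip(points, points[1:]):  # consecutive pairs
    dt = (t2 - t1).total_seconds()
    if dt != 0:  # skip zero-duration steps to avoid dividing by zero
        time.append(t1)
        speed.append(approx_distance(lo1, la1, lo2, la2) / dt)
print(speed)
```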

### 4.2. Plot the speed vs time

<div class="alert alert-block alert-danger">
<b>Task 2:</b>  
Plot each track's speed vs time, one curve per track. Make a legend with the name of each track's submitter. Discuss the result. What do you think is happening at the very beginning? How can you get a better look at what happens later on?
</div>

### 4.3. Compute the mean and top speed

To compute means and other statistical properties in previous weeks, we always started from a data table in the form of a pandas DataFrame. Something like this:  
<code>import pandas as pd
data = pd.read_excel('some_excel_file.xlsx')
print(data['some_column_name'].mean()) # print the mean
print(data['some_column_name'].sum())  # print the sum</code>

This is convenient if the data came from a spreadsheet. When the data is in a list, it's more convenient to use the numpy version of the statistical functions:

In [None]:
import numpy as np

data = [1,43,6,7,9,7,3]
print(np.mean(data))
print(np.std(data))

# Counting elements and finding the maximum are easily done with
# built-in Python functions (numpy also provides np.max):
print(len(data))
print(max(data))

# The sum function exists both in built-in Python and in numpy.
print(sum(data))
print(np.sum(data))
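One difference to keep in mind when moving from pandas to numpy: np.std computes the population standard deviation (dividing by n) by default, while pandas' Series.std computes the sample standard deviation (dividing by n - 1); `np.std(data, ddof=1)` matches pandas. The standard library's statistics module has both versions, which makes the difference easy to see:

```python
import statistics

data = [1, 43, 6, 7, 9, 7, 3]

# Population standard deviation (divides by n), the same convention as np.std(data).
print(statistics.pstdev(data))
# Sample standard deviation (divides by n - 1), the same convention as pandas' Series.std().
print(statistics.stdev(data))
```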

<div class="alert alert-block alert-danger">
<b>Task 3:</b>  
Create a list with the mean speed of every student. Plot it as a bar chart with the students' names on the x axis and the mean speed on the y axis.  

<br/>
Do the same with the top speed.
</div>

*Outline:* Create empty lists for the mean and top speeds. Loop over the lists of instantaneous speeds. For each student, compute the mean and the maximum of their list of speeds and append them to the list of mean speeds and the list of top speeds respectively.