# Virtual Running Relay
West End Runners vs Wellingborough & District Athletic Club  
9 May 2020

35 runners per club, 30-minute slots: how far can each club go in total? Most cumulative mileage wins.

A simple analysis of how the total mileage difference between two running clubs changed over the course of the day, as practice in basic feature engineering and data visualisation.

In [1]:
#load libraries
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline 

In [2]:
data_dir="/Users/Tamela/CODE/src/github.com/tm419/Virtual Running Relays/"

In [None]:
df = pd.read_csv(data_dir+"WEvsWP_data/X")

### Data Exploration 

In [None]:
df.head(34)

In [None]:
df.dtypes

In [None]:
df.describe()

Note we'll also want to strip the seconds from the time values, and perhaps convert the time values from object to timestamp.
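
As a rough sketch of that conversion, assuming the time values are strings in `HH:MM:SS` form (the sample values below are made up, not taken from the relay data), `pd.to_datetime` can parse them and the seconds can then be dropped when formatting back to text:

```python
import pandas as pd

# Hypothetical slot start times, stored as strings (dtype object)
times = pd.Series(["08:00:00", "08:30:00", "09:00:00"])

# Parse the strings into timestamps (pandas attaches a default date)
parsed = pd.to_datetime(times, format="%H:%M:%S")

# Format back to HH:MM strings, which drops the seconds
hhmm = parsed.dt.strftime("%H:%M")

# Alternatively, keep datetime.time objects instead of strings
as_time = parsed.dt.time

print(hhmm.tolist())
```

Strings in `HH:MM` form still sort chronologically, so either representation should work for ordering the slots on a chart.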

### Data cleaning and feature engineering

In [None]:
#rename columns to avoid duplicates of 'distance_miles', and remove uppercase Time
#(the second club's column is named 'wp_miles' to match the cells below)

df=df.rename(columns={"Time":"time","distance_miles":"wer_miles","distance_miles.1":"wp_miles"})

In [None]:
# Strip the seconds off the time string (literal match, not a regex)
df['time']=df['time'].str.replace('00:00', '00', regex=False)
df['time']=df['time'].str.replace('30:00', '30', regex=False)
# Optionally convert the time values from object to timestamp.
#pd.to_datetime(df['time'],format= '%H:%M').dt.time
df.head()

In [None]:
df.describe()

Using the standard descriptive statistics from df.describe(), we can see that Wigston Phoenix (WP) runners had a slightly higher average distance over 30 minutes than West End Runners (WER):  
* WP mean mileage: 4.01 miles  
* WER mean mileage: 4.0 miles

WP had a greater spread of distances, between 2.53 miles and 5.4 miles, compared to 
WER, between 2.74 miles and 5.17 miles.
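
The mean and spread could also be pulled out directly instead of being read off the describe() table. The following is only a sketch on made-up distances (the `toy` frame is hypothetical; the column names follow the renamed columns in this notebook):

```python
import pandas as pd

# Hypothetical distances (miles) for three runners per club
toy = pd.DataFrame({
    "wer_miles": [4.0, 3.5, 4.5],
    "wp_miles": [4.25, 3.75, 3.5],
})

# One row per statistic, one column per club
summary = toy.agg(["mean", "min", "max"])

# Spread of distances (max minus min) for each club
spread = summary.loc["max"] - summary.loc["min"]

print(summary)
print(spread)
```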

In [None]:
#calculate total mileage per club
sum_wer = sum(df['wer_miles'])
sum_wp = sum(df['wp_miles'])
print("West End Runners total mileage: %.2f miles" % sum_wer)
print("Wigston Phoenix total mileage: %.2f miles "% sum_wp) 
print("Difference between clubs: %.2f miles" % (sum_wp-sum_wer))

Looking at the total mileage from each club over the day, we can see that Wigston Phoenix beat West End Runners by just 0.33 miles.

Let's analyse this further and see how the running total between the clubs varied over the day.

In [None]:
#create new features (columns): cumulative mileage per club, e.g. 'wer_total_miles'

#calculate running total for West End Runners (WER)
df['wer_total_miles']=df['wer_miles'].cumsum()

#calculate running total for Wigston Phoenix (WP)
df['wp_total_miles']=df['wp_miles'].cumsum()

In [None]:
#check this worked
df.head(10)

In [None]:
#create new feature (column) that shows the running difference in total mileage between the two clubs (positive = West End Runners ahead)

df['mileage_difference'] = df['wer_total_miles']-df['wp_total_miles']
df

### Data visualisation

This dataset now shows the running totals and running difference (in miles) between the two clubs throughout the day. It's ready for some data visualisation to better understand the strengths of the clubs.
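
One possible sanity check before plotting: the last value of each running total should match that club's overall total, and the last running difference should match the final margin. The sketch below uses made-up distances (the `toy` frame is hypothetical, not the relay data):

```python
import pandas as pd

# Hypothetical distances (miles) for three 30-minute slots
toy = pd.DataFrame({
    "wer_miles": [4.0, 3.5, 4.5],
    "wp_miles": [4.25, 3.75, 3.5],
})
toy["wer_total_miles"] = toy["wer_miles"].cumsum()
toy["wp_total_miles"] = toy["wp_miles"].cumsum()
toy["mileage_difference"] = toy["wer_total_miles"] - toy["wp_total_miles"]

# The final running totals should equal the overall totals
final = toy.iloc[-1]
wer_ok = abs(final["wer_total_miles"] - toy["wer_miles"].sum()) < 1e-9
wp_ok = abs(final["wp_total_miles"] - toy["wp_miles"].sum()) < 1e-9

print(wer_ok, wp_ok, final["mileage_difference"])
```

In this toy example the second club leads after the first two slots and the first club finishes ahead, which is the kind of swing the bar chart is meant to reveal.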

In [None]:
#make column for if West End Runners are ahead, for use in colouring the bars below
df['wer_ahead']=df['mileage_difference']>0
df.head()

In [None]:
plt.figure(figsize=(15,7))

colours={True:sns.xkcd_rgb["azure"],False:sns.xkcd_rgb["green"]} #note color palette draws from xkcd named colours https://xkcd.com/color/rgb/

sns.set(style='white')

chart = sns.barplot(x="time", y="mileage_difference", data=df, hue="wer_ahead", palette=colours, dodge=False) #dodge=False keeps one full-width bar per time slot

chart.set(ylim=(-3,1.5))
chart.set_xticklabels(chart.get_xticklabels(), rotation=90, horizontalalignment='right')
plt.title("Total Mileage difference \n West End Runners vs Wigston Phoenix \n 2 May 2020")
plt.ylabel("Difference in miles")
plt.xlabel("Time")


From this plot, we can see that Wigston Phoenix runners were ahead throughout the day. Although the final score was very close (0.33 miles between the two clubs), the gap could have been much larger had the number of runners been different. Individual distances varied quite a lot from runner to runner, which strongly affected the rolling tally.

Let's look at the variation within the two clubs with a boxplot.

First we need to reshape the data a bit. We'll create a new dataframe that has one column for miles and one column for club:

In [None]:
#west end club df
df_we=df[['time','west_end_runner','wer_miles']].copy() #copy just time, runner and miles (a copy avoids SettingWithCopyWarning)
df_we.columns= ['time','club','miles'] #rename columns to be more generic
df_we.loc[:,'club']='West End' #assign 'West End' to all values for club

#wigston phoenix club df
df_wp=df[['time','wigston_phoenix_runner','wp_miles']].copy() #copy just time, runner and miles
df_wp.columns= ['time','club','miles'] #rename columns to be more generic
df_wp.loc[:,'club']='Wigston Phoenix' #assign 'Wigston Phoenix' to all values for club

#concat the two club dataframes into one
df_merge=pd.concat([df_we,df_wp], ignore_index=True)

#sort by time
df_merge.sort_values(by=['time'],ignore_index=True,inplace=True)

df_merge.head(10)
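
The same wide-to-long reshape could probably also be done in one step with `melt`. This is a sketch on made-up data, assuming the runner-name columns are not needed for the boxplot (`toy` and `long_df` are hypothetical names):

```python
import pandas as pd

# Hypothetical wide-format data: one row per slot, one miles column per club
toy = pd.DataFrame({
    "time": ["08:00", "08:30"],
    "wer_miles": [4.0, 3.5],
    "wp_miles": [4.25, 3.75],
})

# Stack the two miles columns into a single 'miles' column,
# keeping track of which column each value came from in 'club'
long_df = toy.melt(id_vars="time", value_vars=["wer_miles", "wp_miles"],
                   var_name="club", value_name="miles")

# Replace the column names with readable club names
long_df["club"] = long_df["club"].map(
    {"wer_miles": "West End", "wp_miles": "Wigston Phoenix"})

long_df = long_df.sort_values("time", ignore_index=True)
print(long_df)
```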

In [None]:
# Draw a boxplot to show the variation in runner mileage between the two clubs
plt.figure(figsize=(11,7))

colours={"West End":sns.xkcd_rgb["azure"],"Wigston Phoenix":sns.xkcd_rgb["green"]}

ax=sns.boxplot(x="club", y="miles", data=df_merge,palette=colours)
ax=sns.swarmplot(x="club", y="miles", data=df_merge, color=".25")
sns.despine()

plt.title("Variation in Miles")
plt.ylabel("Miles")
plt.xlabel("")


### Map of routes

In [3]:
import gpxpy #reading and parsing gps files
import folium #mapping

In [4]:
# get list of gps files in directory
import os
gps_files=[]
for dirname, _, filenames in os.walk(data_dir+'WEvsWDAC_gps/'):
    for filename in filenames:
        if filename.endswith('.gpx'):
            gpx_path=os.path.join(dirname, filename)
            gps_files.append(gpx_path)
gps_files

['/Users/Tamela/CODE/src/kaggle.com/Virtual Running Relays/West_End_vs_Wellingborough/Data/gps/relay17.gpx',
 '/Users/Tamela/CODE/src/kaggle.com/Virtual Running Relays/West_End_vs_Wellingborough/Data/gps/relay16.gpx',
 '/Users/Tamela/CODE/src/kaggle.com/Virtual Running Relays/West_End_vs_Wellingborough/Data/gps/relay14.gpx',
 '/Users/Tamela/CODE/src/kaggle.com/Virtual Running Relays/West_End_vs_Wellingborough/Data/gps/relay15.gpx',
 '/Users/Tamela/CODE/src/kaggle.com/Virtual Running Relays/West_End_vs_Wellingborough/Data/gps/relay11.gpx',
 '/Users/Tamela/CODE/src/kaggle.com/Virtual Running Relays/West_End_vs_Wellingborough/Data/gps/relay10.gpx',
 '/Users/Tamela/CODE/src/kaggle.com/Virtual Running Relays/West_End_vs_Wellingborough/Data/gps/relay12.gpx',
 '/Users/Tamela/CODE/src/kaggle.com/Virtual Running Relays/West_End_vs_Wellingborough/Data/gps/relay9.gpx',
 '/Users/Tamela/CODE/src/kaggle.com/Virtual Running Relays/West_End_vs_Wellingborough/Data/gps/relay8.gpx',
 '/Users/Tamela/CODE/

In [5]:
# define a function to read in the gpx file and convert to dataframe
# This code is borrowed from datachico's notebook: https://github.com/datachico/gpx_to_folium_maps/blob/master/folium_maps_From_GPX.ipynb

def process_gpx_to_df(file_name):

    with open(file_name) as gpx_file: #close the file once it has been parsed
        gpx = gpxpy.parse(gpx_file)
    
    #make DataFrame
    track = gpx.tracks[0]
    segment = track.segments[0]
    # Load the data into a Pandas dataframe (by way of a list)
    data = []
    
    #segment_length = segment.length_3d()
    for point_idx, point in enumerate(segment.points):
        data.append([point.longitude, point.latitude,point.elevation,
                     point.time, segment.get_speed(point_idx)])
    columns = ['Longitude', 'Latitude', 'Altitude', 'Time', 'Speed']
    gpx_df = pd.DataFrame(data, columns=columns)
    
    # create a list of (lat, long) tuples for drawing the line
    points = []
    for track in gpx.tracks:
        for segment in track.segments:
            for point in segment.points:
                points.append((point.latitude, point.longitude))
    
    return gpx_df, points

In [6]:
# Test of function
gpx_df,points = process_gpx_to_df(gps_files[1])

In [7]:
#Check the process function worked.
#gpx_df looks like the following:

gpx_df.head()

| | Longitude | Latitude | Altitude | Time | Speed |
|---|---|---|---|---|---|
| 0 | -1.12067 | 52.62506 | 88.42 | | |
| 1 | -1.12102 | 52.624625 | 88.35 | | |
| 2 | -1.12137 | 52.62419 | 88.14 | | |
| 3 | -1.1218 | 52.62361 | 87.66 | | |
| 4 | -1.12223 | 52.62303 | 86.11 | | |


In [8]:
#Check the process function worked.
#points is a list of (lat, long) tuples that looks like the following:

points[0:10]

[(52.625060000000005, -1.12067),
 (52.62462500059257, -1.1210200034167868),
 (52.624190000000006, -1.1213700000000002),
 (52.62361000079505, -1.1218000056858126),
 (52.62303000000001, -1.12223),
 (52.622480000878134, -1.1226650054015412),
 (52.621930000000006, -1.1231),
 (52.6214, -1.12338),
 (52.621230000000004, -1.12338),
 (52.621, -1.1234000000000002)]

In [9]:
club='West End Runners'

In [34]:
def make_folium_map(gps_files,club,map_name='test_map.html',plot_method='poly_line',zoom_level=13,map_type='regular',fullscreen=False):
    '''Make map of gps routes.

    plot_method, map_type and fullscreen are accepted but not used yet.'''
    #map centre lat and long
    lat=52.633331 #option: mean latitude of the tracks
    long=-1.133333 #option: mean longitude of the tracks

    #get club colour (any other club name falls back to grey)
    club_colours={'West End Runners':'blue','Wellingborough':'green'}
    color=club_colours.get(club,'gray')

    #create map layer
    mymap = folium.Map( location=[ lat, long ], zoom_start=zoom_level, tiles=None)
    folium.TileLayer('openstreetmap', name='OpenStreet Map').add_to(mymap)
    #folium.TileLayer('https://server.arcgisonline.com/ArcGIS/rest/services/NatGeo_World_Map/MapServer/tile/{z}/{y}/{x}', attr="Tiles &copy; Esri &mdash; National Geographic, Esri, DeLorme, NAVTEQ, UNEP-WCMC, USGS, NASA, ESA, METI, NRCAN, GEBCO, NOAA, iPC", name='Nat Geo Map').add_to(mymap)

    #convert each file to a DF and list of points, then plot the track
    for file_name in gps_files:
        gpx_df, points = process_gpx_to_df(file_name)
        folium.PolyLine(points,color=color, weight=4.5, opacity=0.5).add_to(mymap)

    folium.LayerControl(collapsed=True).add_to(mymap)
    mymap.save(map_name)
    return mymap

In [35]:
make_folium_map(gps_files,club)
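
The map above is centred on a fixed latitude and longitude. As a sketch of the alternative noted in the function's comments, the centre could instead be taken as the mean of the track points. `route_centre` and `sample_points` are hypothetical names introduced here for illustration, and the coordinates are made up:

```python
# Hypothetical (latitude, longitude) pairs in the same shape as `points`
sample_points = [(52.6250, -1.1206), (52.6240, -1.1214), (52.6230, -1.1222)]

def route_centre(points):
    """Return the mean latitude and longitude of a list of (lat, long) tuples."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return sum(lats) / len(lats), sum(lons) / len(lons)

centre_lat, centre_lon = route_centre(sample_points)
print(centre_lat, centre_lon)
```

The resulting pair could then be passed as `location=[centre_lat, centre_lon]` when creating the `folium.Map`. A plain mean is only a rough centre, but it should be adequate for routes confined to one town.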

In [13]:
%%HTML
<iframe width="1000" height="500" src="test_map.html"></iframe>

In [None]:
#TO DO
# get more gpx files
# create git repository for this
# see if html file can be opened directly from github
# create github.io page??