## Introduction

There are several ways to export your Strava activities. We recommend bulk exporting all of your Strava data. This example assumes that a bulk export archive has been downloaded and saved to a local directory. For more information on how to export your Strava data, [see these instructions from Strava Support](https://support.strava.com/hc/en-us/articles/216918437-Exporting-your-Data-and-Bulk-Export#Bulk).

In [1]:
import os
import pandas as pd

In [2]:
bulk_strava_dir = "~/Desktop/strava_data"

Your activity metadata can be found in **activities.csv** at the root of your bulk data export, while the activities directory contains the detailed FIT (Flexible and Interoperable Data Transfer) data files.
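Before loading anything, it can help to confirm that the export really has the layout described above. The sketch below is illustrative only: `locate_export` is a hypothetical helper name (not part of Strava-DataVis), and it assumes the archive was extracted so that `activities.csv` and the `activities` directory sit directly under the export root.

```python
from pathlib import Path


def locate_export(root):
    """Return (csv_path, fit_dir) for an extracted bulk export.

    Hypothetical helper; assumes activities.csv and the activities
    directory sit directly under `root`.
    """
    root = Path(root).expanduser()  # resolve a leading "~"
    csv_path = root / "activities.csv"
    fit_dir = root / "activities"
    if not csv_path.is_file():
        raise FileNotFoundError(f"activities.csv not found in {root}")
    if not fit_dir.is_dir():
        raise FileNotFoundError(f"activities directory not found in {root}")
    return csv_path, fit_dir
```

Calling `locate_export(bulk_strava_dir)` would fail early with a readable message if the archive was extracted somewhere else.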

Load metadata and inspect contents

In [3]:
df = pd.read_csv(os.path.join(bulk_strava_dir,"activities.csv"))
df.head()

Unnamed: 0,Activity ID,Activity Date,Activity Name,Activity Type,Activity Description,Elapsed Time,Distance,Max Heart Rate,Relative Effort,Commute,...,Weather Ozone,"<span class=""translation_missing"" title=""translation missing: en-US.lib.export.portability_exporter.activities.horton_values.jump_count"">Jump Count</span>","<span class=""translation_missing"" title=""translation missing: en-US.lib.export.portability_exporter.activities.horton_values.total_grit"">Total Grit</span>","<span class=""translation_missing"" title=""translation missing: en-US.lib.export.portability_exporter.activities.horton_values.avg_flow"">Avg Flow</span>","<span class=""translation_missing"" title=""translation missing: en-US.lib.export.portability_exporter.activities.horton_values.flagged"">Flagged</span>","<span class=""translation_missing"" title=""translation missing: en-US.lib.export.portability_exporter.activities.horton_values.avg_elapsed_speed"">Avg Elapsed Speed</span>","<span class=""translation_missing"" title=""translation missing: en-US.lib.export.portability_exporter.activities.horton_values.dirt_distance"">Dirt Distance</span>","<span class=""translation_missing"" title=""translation missing: en-US.lib.export.portability_exporter.activities.horton_values.newly_explored_distance"">Newly Explored Distance</span>","<span class=""translation_missing"" title=""translation missing: en-US.lib.export.portability_exporter.activities.horton_values.newly_explored_dirt_distance"">Newly Explored Dirt Distance</span>","<span class=""translation_missing"" title=""translation missing: en-US.lib.export.portability_exporter.activities.horton_values.sport_type"">Sport Type</span>"
0,363761300,"Aug 7, 2015, 11:30:54 PM",Evening Ride,Ride,,8537,46.81,,,False,...,,,,,,,0.0,,,
1,372131020,"Aug 18, 2015, 9:19:19 PM",Afternoon Ride,Ride,,1112,5.97,,,False,...,,,,,,,0.0,,,
2,372140928,"Aug 18, 2015, 9:40:37 PM",Afternoon Run,Run,,965,2.23,,,False,...,,,,,,,,,,
3,372159226,"Aug 18, 2015, 9:59:10 PM",Afternoon Ride,Ride,,2253,12.34,,,False,...,,,,,,,0.0,,,
4,374254173,"Aug 21, 2015, 4:27:39 PM",Lunch Ride,Ride,,8723,20.33,,,False,...,,,,,,,218.399994,,,


## Filtering Data

The bulk export contains all recorded Strava activities. For this example, consider only cycling activities between 2020-02-10 and 2021-05-23. To make filtering the DataFrame easier, convert the "Activity Date" column to a datetime type.

In [4]:
begin_date, end_date = "2020-02-10", "2021-05-23"
activity_type = "Ride"

# Convert the "Activity Date" strings to datetime values
df["Activity Date"] = pd.to_datetime(df["Activity Date"])
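The conversion above lets pandas infer the date layout. If you prefer the parse to fail loudly on unexpected values, an explicit format string is one option. The sketch below uses two date strings copied from the sample rows shown earlier; the format string is an assumption based on those rows and on an English-locale export, so it may not fit every archive.

```python
import pandas as pd

# Sample strings copied from the "Activity Date" column shown above.
raw_dates = pd.Series(["Aug 7, 2015, 11:30:54 PM", "Aug 18, 2015, 9:19:19 PM"])

# %b = abbreviated month name, %I = 12-hour clock, %p = AM/PM marker.
# Assumption: every row follows this layout; a row that does not
# will raise a ValueError instead of being guessed at.
parsed = pd.to_datetime(raw_dates, format="%b %d, %Y, %I:%M:%S %p")
print(parsed)
```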

In [5]:
filter_df = df[(df["Activity Date"] > begin_date) & 
               (df["Activity Date"] < end_date) & 
               (df["Activity Type"] == activity_type)]
filter_df.head()

Unnamed: 0,Activity ID,Activity Date,Activity Name,Activity Type,Activity Description,Elapsed Time,Distance,Max Heart Rate,Relative Effort,Commute,...,Weather Ozone,"<span class=""translation_missing"" title=""translation missing: en-US.lib.export.portability_exporter.activities.horton_values.jump_count"">Jump Count</span>","<span class=""translation_missing"" title=""translation missing: en-US.lib.export.portability_exporter.activities.horton_values.total_grit"">Total Grit</span>","<span class=""translation_missing"" title=""translation missing: en-US.lib.export.portability_exporter.activities.horton_values.avg_flow"">Avg Flow</span>","<span class=""translation_missing"" title=""translation missing: en-US.lib.export.portability_exporter.activities.horton_values.flagged"">Flagged</span>","<span class=""translation_missing"" title=""translation missing: en-US.lib.export.portability_exporter.activities.horton_values.avg_elapsed_speed"">Avg Elapsed Speed</span>","<span class=""translation_missing"" title=""translation missing: en-US.lib.export.portability_exporter.activities.horton_values.dirt_distance"">Dirt Distance</span>","<span class=""translation_missing"" title=""translation missing: en-US.lib.export.portability_exporter.activities.horton_values.newly_explored_distance"">Newly Explored Distance</span>","<span class=""translation_missing"" title=""translation missing: en-US.lib.export.portability_exporter.activities.horton_values.newly_explored_dirt_distance"">Newly Explored Dirt Distance</span>","<span class=""translation_missing"" title=""translation missing: en-US.lib.export.portability_exporter.activities.horton_values.sport_type"">Sport Type</span>"
642,3129090697,2020-02-24 20:23:38,Test Ride,Ride,,1719,9.03,146.0,7.0,False,...,,,,,,,0.0,,,
643,3163789236,2020-03-08 00:10:39,Afternoon Ride,Ride,,4937,19.54,165.0,31.0,False,...,,,,,,,,,,
644,3167419826,2020-03-08 23:14:45,Afternoon Ride,Ride,,7805,40.45,170.0,70.0,False,...,,,,,,,,,,
645,3170057800,2020-03-09 23:29:08,Afternoon Ride,Ride,,9113,50.76,175.0,97.0,False,...,,,,,,,,,,
646,3178607939,2020-03-12 23:21:11,Afternoon Ride,Ride,,7966,43.76,170.0,103.0,False,...,,,,,,,,,,


Compare the total number of activities with the number that fall within the specified date range.

In [6]:
len(df)

981

In [7]:
len(filter_df)

148
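Note that the filter above uses strict comparisons against midnight timestamps, so every activity recorded on the end date itself is excluded. If the end date should be included, a half-open interval that runs to the start of the following day is one way to express it. The sketch below uses a small made-up DataFrame, not the real export, purely to show the boundary behaviour.

```python
import pandas as pd

# Made-up rows for illustration; only the column names match the export.
demo = pd.DataFrame({
    "Activity Date": pd.to_datetime([
        "2020-02-09 08:00",  # before the range
        "2020-02-10 00:00",  # exactly at the start
        "2021-05-23 00:00",  # on the end date, but a run
        "2021-05-23 18:30",  # on the end date, a ride
    ]),
    "Activity Type": ["Ride", "Ride", "Run", "Ride"],
})

begin = pd.Timestamp("2020-02-10")
end = pd.Timestamp("2021-05-23")

# Half-open interval [begin, end + 1 day): includes the whole end date.
in_range = (demo["Activity Date"] >= begin) & (
    demo["Activity Date"] < end + pd.Timedelta(days=1)
)
rides = demo[in_range & (demo["Activity Type"] == "Ride")]
print(rides)
```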

## Parse Data

In [8]:
import glob

The names of the FIT files in the activities directory should match the activity ID. However, the 'Activity ID' column and the activity filename do not always line up. To ensure the correct FIT file is parsed, read the 'Filename' column instead.

In [9]:
activity_filenames = filter_df['Filename']
activity_filenames.head()

642    activities/3342746054.fit.gz
643    activities/3381305871.fit.gz
644    activities/3385302851.fit.gz
645    activities/3388256403.fit.gz
646    activities/3397678888.fit.gz
Name: Filename, dtype: object
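The mismatch is visible in the rows shown so far: activity 3129090697 points at `activities/3342746054.fit.gz`. A rough way to see how often this happens is to pull the numeric part out of each filename and compare it with the ID column. The sketch below rebuilds two rows from the outputs above as a stand-in for `filter_df`; the regular expression assumes filenames of the form `<digits>.fit.gz`, which may not hold for every export.

```python
import pandas as pd

# Two rows rebuilt from the outputs shown above.
sample = pd.DataFrame({
    "Activity ID": [3129090697, 3163789236],
    "Filename": [
        "activities/3342746054.fit.gz",
        "activities/3381305871.fit.gz",
    ],
})

# Extract the digits that precede ".fit.gz". Assumption: every filename
# matches this pattern; a non-matching row would need handling before
# the integer conversion.
file_ids = (
    sample["Filename"]
    .str.extract(r"(\d+)\.fit\.gz$", expand=False)
    .astype("int64")
)

mismatched = sample[file_ids != sample["Activity ID"]]
print(f"{len(mismatched)} of {len(sample)} filenames differ from the activity ID")
```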

Import Strava-DataVis modules

In [10]:
from formats import parseFile
from formats import formats

Create the Python dictionary that serves as the data structure template. See the [Strava-DataVis](https://thatguyeddieo.github.io/Strava-DataVis/formats.html) ``formats`` documentation page for more information on the dictionary structure.

In [11]:
strava_data = formats.create_datastruct()

Create a list of file paths and parse the individual fit.gz files.

In [12]:
fit_paths = [os.path.join(os.path.expanduser(bulk_strava_dir), a) for a in activity_filenames]
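Before handing the paths to the parser, it may be worth checking that each file exists and actually looks like a gzip-compressed FIT file. The sketch below is independent of Strava-DataVis: `looks_like_fit` and `split_parseable` are hypothetical helper names. It relies on the FIT file header carrying the ASCII signature `.FIT` at byte offsets 8 to 11.

```python
import gzip
from pathlib import Path


def looks_like_fit(path):
    """Return True if a gzip-compressed file carries the FIT signature."""
    try:
        with gzip.open(path, "rb") as handle:
            header = handle.read(12)
    except (OSError, EOFError):  # missing file, not gzip, or truncated
        return False
    # Bytes 8-11 of a FIT header hold the ASCII data type ".FIT".
    return header[8:12] == b".FIT"


def split_parseable(root, relative_names):
    """Split export-relative filenames into (usable, skipped) path lists."""
    root = Path(root).expanduser()
    usable, skipped = [], []
    for name in relative_names:
        path = root / name
        (usable if looks_like_fit(path) else skipped).append(path)
    return usable, skipped
```

With the variables from this notebook, `split_parseable(bulk_strava_dir, activity_filenames)` would return the paths worth parsing and those to inspect by hand.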

In [13]:
# Parse the FIT files of interest
for fit_path in fit_paths:
    parseFile.parse(strava_data, fit_path)

Parsing ~/.../3342746054.fit.gz
Parsing ~/.../3381305871.fit.gz
Parsing ~/.../3385302851.fit.gz
Parsing ~/.../3388256403.fit.gz
Parsing ~/.../3397678888.fit.gz
Parsing ~/.../3410342152.fit.gz
Parsing ~/.../3417012319.fit.gz
Parsing ~/.../3420132646.fit.gz
Parsing ~/.../3423242116.fit.gz
Parsing ~/.../3430723356.fit.gz
Parsing ~/.../3441153174.fit.gz
Parsing ~/.../3456862746.fit.gz
Parsing ~/.../3480735937.fit.gz
Parsing ~/.../3485732942.fit.gz
Parsing ~/.../3490211937.fit.gz
Parsing ~/.../3495267665.fit.gz
Parsing ~/.../3499648535.fit.gz
Parsing ~/.../3508784683.fit.gz
Parsing ~/.../3518644149.fit.gz
Parsing ~/.../3527816940.fit.gz
Parsing ~/.../3537904807.fit.gz
Parsing ~/.../3542379714.fit.gz
Parsing ~/.../3557969094.fit.gz
Parsing ~/.../3589532643.fit.gz
Parsing ~/.../3594941050.fit.gz
Parsing ~/.../3626718043.fit.gz
Parsing ~/.../3638095333.fit.gz
Parsing ~/.../3655153766.fit.gz
Parsing ~/.../3661269383.fit.gz
Parsing ~/.../3682625954.fit.gz
Parsing ~/.../3705655591.fit.gz
Parsing 

One activity did not have any altitude or speed data:
```
Parsing ~/.../4966008517.fit.gz
	Warning: Parameter altitude was not populated with data.
	Warning: Parameter speed was not populated with data.
```
After reviewing the activity in the Strava app and determining that it was not needed, I removed it from the dataset.

In [14]:
del strava_data['activities']['4966008517']
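If several activities need to be dropped, or the cell may be re-run after the entry is already gone, `dict.pop` with a default avoids the `KeyError` that `del` would raise. The sketch below uses a small stand-in dictionary: only the `'activities'` key and the ID-keyed layout are taken from the cell above, and the empty inner dictionaries are placeholders rather than the real parsed structure.

```python
# Stand-in for the parsed structure; the inner values are placeholders.
strava_data = {"activities": {"4966008517": {}, "3342746054": {}}}

# IDs to discard; the second one is deliberately absent.
to_drop = ["4966008517", "0000000000"]

# pop() returns the removed entry, or None when the ID is not present.
removed = [
    activity_id
    for activity_id in to_drop
    if strava_data["activities"].pop(activity_id, None) is not None
]
print("Removed:", removed)
```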

## Save Parsed Data

Create a new directory and export the parsed activities as an Excel file for easier use later.

In [15]:
# Write the parsed activities to the output folder
out_dir = "datasets/Seattle_Dataset"
formats.write_xlsx(strava_data, out_dir)

Writing to /datasets/Seattle_Dataset/activities.xlsx
	Writing activity 3342746054
	Writing activity 3381305871
	Writing activity 3385302851
	Writing activity 3388256403
	Writing activity 3397678888
	Writing activity 3410342152
	Writing activity 3417012319
	Writing activity 3420132646
	Writing activity 3423242116
	Writing activity 3430723356
	Writing activity 3441153174
	Writing activity 3456862746
	Writing activity 3480735937
	Writing activity 3485732942
	Writing activity 3490211937
	Writing activity 3495267665
	Writing activity 3499648535
	Writing activity 3508784683
	Writing activity 3518644149
	Writing activity 3527816940
	Writing activity 3537904807
	Writing activity 3542379714
	Writing activity 3557969094
	Writing activity 3589532643
	Writing activity 3594941050
	Writing activity 3626718043
	Writing activity 3638095333
	Writing activity 3655153766
	Writing activity 3661269383
	Writing activity 3682625954
	Writing activity 3705655591
	Writing activity 3738995307
	Writing activity 3