# **GPS Data - Exploratory Data Analysis**

*GPS performance metrics track movement demands, including speed, distance, and acceleration, to assess workload and physical output.
This dataset contains simulated data for 1 player.*

This notebook is organized in the following sections:

* [Part 0 - Preliminary Steps](#0)
    * [Part 0.1 - Importing the Necessary Libraries](#0.1)
    * [Part 0.2 - Reading the GPS Data Dataset](#0.2)

* [Part 1 - Data Cleaning](#1)
    * [Part 1.1 - Preliminary Analysis of the Dataset](#1.1)
    * [Part 1.2 - Dealing with Duplicates](#1.2)
    * [Part 1.3 - Ensuring Correct Data Types](#1.3)
    * [Part 1.4 - Dealing with Null/Missing Values](#1.4)
    * [Part 1.5 - Creating New Columns to Enhance the Analysis](#1.5)
    * [Part 1.6 - Final Checks](#1.6)

* [Part 2 - Exploratory Data Analysis](#2)

<a id='0'></a>
## Part 0 - Preliminary Steps

<a id='0.1'></a>
### Part 0.1 - Importing the Necessary Libraries

In [None]:
import pandas as pd

<a id='0.2'></a>
### Part 0.2 - Reading the GPS Data Dataset

In [None]:
gps_data = pd.read_csv('../data/CFC GPS Data (1).csv', encoding='ISO-8859-1')

<a id='1'></a>
## Part 1 - Data Cleaning

<a id='1.1'></a>
### Part 1.1 - Preliminary Analysis of the Dataset

In [None]:
gps_data.head()

In [None]:
gps_data.tail()

The GPS Data dataset has 826 rows, and only 2 of its columns contain null values.

In [None]:
gps_data.info()
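
`info()` reports non-null counts per column, so the columns with missing values have to be read off by eye. As a minimal sketch, the same check can be made explicit with `isna().sum()`. The small `sample` frame below is a hypothetical stand-in for `gps_data`, and its column names and values are illustrative only, not taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for gps_data; names and values are illustrative only.
sample = pd.DataFrame({
    "date": ["01/08/2023", "02/08/2023", "03/08/2023"],
    "distance": [5200.0, None, 6100.0],
    "peak_speed": [28.4, 30.1, None],
})

# Number of missing values in each column
null_counts = sample.isna().sum()

# Names of the columns that contain at least one missing value
columns_with_nulls = null_counts[null_counts > 0].index.tolist()

print(f"{sample.shape[0]} rows")
print(f"Columns with nulls: {columns_with_nulls}")
```

Running the same two lines on `gps_data` would list exactly which columns contain nulls and how many values are missing in each.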

<a id='1.2'></a>
### Part 1.2 - Dealing with Duplicates

We checked for duplicate rows and found none.

In [None]:
gps_data.duplicated().any()

In [None]:
# Another check for duplicates - just in case
gps_data.duplicated().sum()
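
To illustrate what the two checks above report when duplicates do exist, here is a small sketch on a hypothetical `toy` frame (not part of the GPS dataset). It also shows `drop_duplicates()`, which is one way the rows could have been removed had any been found.

```python
import pandas as pd

# Hypothetical frame with one repeated row; values are illustrative only.
toy = pd.DataFrame({
    "date": ["01/08/2023", "01/08/2023", "02/08/2023"],
    "distance": [5200.0, 5200.0, 6100.0],
})

# duplicated() flags every row that repeats an earlier row
has_duplicates = bool(toy.duplicated().any())
n_duplicates = int(toy.duplicated().sum())

# Keep the first occurrence of each row and rebuild a clean index
deduplicated = toy.drop_duplicates().reset_index(drop=True)

print(has_duplicates, n_duplicates, len(deduplicated))
```

By default `duplicated()` compares all columns and keeps the first occurrence unflagged, so a count of 0 on `gps_data` means no row is an exact copy of another.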

<a id='1.3'></a>
### Part 1.3 - Ensuring Correct Data Types

Next, we checked whether the data types of all columns were appropriate.

In [None]:
gps_data.head()

In [None]:
gps_data.dtypes

The following columns had incorrect data types:
* date --> should be in datetime format (%d/%m/%Y)
* hr_zone_1_hms --> should be in datetime format (%H:%M:%S)
* hr_zone_2_hms --> should be in datetime format (%H:%M:%S)
* hr_zone_3_hms --> should be in datetime format (%H:%M:%S)
* hr_zone_4_hms --> should be in datetime format (%H:%M:%S)
* hr_zone_5_hms --> should be in datetime format (%H:%M:%S)

We then converted the `date` column to the datetime type.

In [None]:
gps_data['date'] = pd.to_datetime(gps_data['date'], format = '%d/%m/%Y')
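
The cell above converts only `date`. The `hr_zone_*_hms` columns identified earlier in this part could be handled in a similar way. The sketch below is one possible approach, not the method used in this notebook: it assumes the values are strings such as `'00:12:34'` that represent time spent in each heart-rate zone, and therefore parses them as durations with `pd.to_timedelta` rather than as datetimes. The `sample` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for gps_data; values are illustrative only.
sample = pd.DataFrame({
    "date": ["01/08/2023", "15/08/2023"],
    "hr_zone_1_hms": ["00:12:34", "01:02:03"],
    "hr_zone_2_hms": ["00:00:00", "00:45:10"],
})

# Same conversion as in the notebook: day/month/year strings to datetime
sample["date"] = pd.to_datetime(sample["date"], format="%d/%m/%Y")

# Assumption: H:M:S strings are durations, so parse them as timedeltas
hr_zone_cols = [
    col for col in sample.columns
    if col.startswith("hr_zone_") and col.endswith("_hms")
]
for col in hr_zone_cols:
    sample[col] = pd.to_timedelta(sample[col])

# Durations can then be expressed in seconds for numeric analysis
hr_zone_1_seconds = sample["hr_zone_1_hms"].dt.total_seconds()

print(sample.dtypes)
```

Parsing these columns as timedeltas avoids attaching a placeholder calendar date to each value, which is what `pd.to_datetime` with `format='%H:%M:%S'` would do.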

<a id='1.4'></a>
### Part 1.4 - Dealing with Null/Missing Values

<a id='1.5'></a>
### Part 1.5 - Creating New Columns to Enhance the Analysis

<a id='1.6'></a>
### Part 1.6 - Final Checks

<a id='2'></a>
## Part 2 - Exploratory Data Analysis