## General F1 Data analysis

In this notebook I want to do some general analysis of F1 data. I will use it to get back into Python and to try out some interesting things with F1 data. My first step is to import the data and explore it a bit.

In [51]:
## Imports
import pandas as pd
import os
import numpy as np
import seaborn as sns

# To get full output
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [22]:
# Change the current working directory to where our F1 data is stored
os.getcwd()  # directory before the change
os.chdir(r'C:\Users\yanni\OneDrive\Documents\Data_Science\F1_data')
os.getcwd()  # directory after the change
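Changing the working directory works, but it affects every relative path used later in the session. An alternative is to keep the folder in a `pathlib.Path` and build file paths from it. A minimal sketch, assuming the same folder as above; `DATA_DIR` and `circuits_path` are names introduced here for illustration:

```python
from pathlib import Path

import pandas as pd

# Assumption: the same data folder as in the cell above; adjust it for your own machine.
DATA_DIR = Path(r'C:\Users\yanni\OneDrive\Documents\Data_Science\F1_data')

# The "/" operator joins path parts, so the full file path is built
# without calling os.chdir() and the working directory stays untouched.
circuits_path = DATA_DIR / 'circuits.csv'

# Only read the file if it exists, so this cell also runs on other machines.
if circuits_path.exists():
    circuits_df = pd.read_csv(circuits_path)
```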


'C:\\Users\\yanni\\OneDrive\\Documents\\Data_Science\\F1_data'

'C:\\Users\\yanni\\OneDrive\\Documents\\Data_Science\\F1_data'

In [47]:
circuits_df = pd.read_csv('circuits.csv')
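The file marks missing values with the literal string `\N`, which `read_csv` does not recognise as missing by default. Its `na_values` parameter can handle this while parsing. A minimal sketch on a few rows copied from the `head()` output below (the truncated `url` column is left out); in the notebook itself this would be `pd.read_csv('circuits.csv', na_values=[r'\N'])`:

```python
import io

import pandas as pd

# A few rows copied from the head() output (url column left out).
# A raw string is needed because "\N" is an escape sequence in normal Python strings.
SAMPLE_CSV = r"""circuitId,circuitRef,name,location,country,lat,lng,alt
1,albert_park,Albert Park Grand Prix Circuit,Melbourne,Australia,-37.8497,144.968,10
2,sepang,Sepang International Circuit,Kuala Lumpur,Malaysia,2.76083,101.738,\N
3,bahrain,Bahrain International Circuit,Sakhir,Bahrain,26.0325,50.5106,\N
4,catalunya,Circuit de Barcelona-Catalunya,Montmeló,Spain,41.57,2.26111,\N
5,istanbul,Istanbul Park,Istanbul,Turkey,40.9517,29.405,\N
"""

# na_values adds \N to the strings that read_csv treats as missing (NaN),
# so 'alt' is parsed as a numeric column straight away.
sample_df = pd.read_csv(io.StringIO(SAMPLE_CSV), na_values=[r'\N'])

print(sample_df['alt'].dtype)         # float64
print(sample_df['alt'].isna().sum())  # 4
```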

In [48]:
circuits_df.shape
circuits_df.dtypes
circuits_df.describe(include = 'all')
circuits_df.head()

(74, 9)

circuitId       int64
circuitRef     object
name           object
location       object
country        object
lat           float64
lng           float64
alt            object
url            object
dtype: object

|  | circuitId | circuitRef | name | location | country | lat | lng | alt | url |
|---|---|---|---|---|---|---|---|---|---|
| count | 74.0 | 74 | 74 | 74 | 74 | 74.0 | 74.0 | 74 | 74 |
| unique |  | 74 | 74 | 71 | 33 |  |  | 2 | 74 |
| top |  | valencia | Circuit Bremgarten | Spielburg | USA |  |  | \N | http://en.wikipedia.org/wiki/Melbourne_Grand_P... |
| freq |  | 1 | 1 | 2 | 11 |  |  | 73 | 1 |
| mean | 37.5 |  |  |  |  | 33.698638 | 3.128815 |  |  |
| std | 21.505813 |  |  |  |  | 23.273274 | 66.041828 |  |  |
| min | 1.0 |  |  |  |  | -37.8497 | -118.189 |  |  |
| 25% | 19.25 |  |  |  |  | 33.480575 | -9.346393 |  |  |
| 50% | 37.5 |  |  |  |  | 41.26845 | 4.128885 |  |  |
| 75% | 55.75 |  |  |  |  | 47.21575 | 18.127625 |  |  |


|  | circuitId | circuitRef | name | location | country | lat | lng | alt | url |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | albert_park | Albert Park Grand Prix Circuit | Melbourne | Australia | -37.8497 | 144.968 | 10 | http://en.wikipedia.org/wiki/Melbourne_Grand_P... |
| 1 | 2 | sepang | Sepang International Circuit | Kuala Lumpur | Malaysia | 2.76083 | 101.738 | \N | http://en.wikipedia.org/wiki/Sepang_Internatio... |
| 2 | 3 | bahrain | Bahrain International Circuit | Sakhir | Bahrain | 26.0325 | 50.5106 | \N | http://en.wikipedia.org/wiki/Bahrain_Internati... |
| 3 | 4 | catalunya | Circuit de Barcelona-Catalunya | Montmeló | Spain | 41.57 | 2.26111 | \N | http://en.wikipedia.org/wiki/Circuit_de_Barcel... |
| 4 | 5 | istanbul | Istanbul Park | Istanbul | Turkey | 40.9517 | 29.405 | \N | http://en.wikipedia.org/wiki/Istanbul_Park |
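The `describe()` output shows the literal string `\N` as the most frequent `alt` value. A quick way to check which columns contain that placeholder is to compare every cell against it and sum per column. A minimal sketch on the rows copied from the `head()` output above (`url` left out), read without any `na_values` so `\N` stays a plain string:

```python
import io

import pandas as pd

# Rows copied from the head() output (url column left out).
# Raw string, because "\N" is an escape sequence in normal Python strings.
SAMPLE_CSV = r"""circuitId,circuitRef,name,location,country,lat,lng,alt
1,albert_park,Albert Park Grand Prix Circuit,Melbourne,Australia,-37.8497,144.968,10
2,sepang,Sepang International Circuit,Kuala Lumpur,Malaysia,2.76083,101.738,\N
3,bahrain,Bahrain International Circuit,Sakhir,Bahrain,26.0325,50.5106,\N
4,catalunya,Circuit de Barcelona-Catalunya,Montmeló,Spain,41.57,2.26111,\N
5,istanbul,Istanbul Park,Istanbul,Turkey,40.9517,29.405,\N
"""

# By default read_csv does not treat \N as missing, so it is kept as text.
raw_df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Count, per column, how many cells equal the placeholder \N.
# Numeric columns never equal a string, so they report 0.
placeholder_counts = raw_df.eq(r'\N').sum()

# Show only the columns that actually contain the placeholder.
print(placeholder_counts[placeholder_counts > 0])
```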


In [50]:
# describe() showed that the altitude column only has 2 distinct values
circuits_df.groupby(by='alt').count()

# The \N value most likely marks a missing value, so we replace it with NaN
# (np.nan; the np.NaN alias was removed in NumPy 2.0)
circuits_df = circuits_df.replace(r"\N", np.nan)

# The \N entries are now null values
circuits_df.head()

| alt | circuitId | circuitRef | name | location | country | lat | lng | url |
|---|---|---|---|---|---|---|---|---|
| 10 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| \N | 73 | 73 | 73 | 73 | 73 | 73 | 73 | 73 |


|  | circuitId | circuitRef | name | location | country | lat | lng | alt | url |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | albert_park | Albert Park Grand Prix Circuit | Melbourne | Australia | -37.8497 | 144.968 | 10.0 | http://en.wikipedia.org/wiki/Melbourne_Grand_P... |
| 1 | 2 | sepang | Sepang International Circuit | Kuala Lumpur | Malaysia | 2.76083 | 101.738 | NaN | http://en.wikipedia.org/wiki/Sepang_Internatio... |
| 2 | 3 | bahrain | Bahrain International Circuit | Sakhir | Bahrain | 26.0325 | 50.5106 | NaN | http://en.wikipedia.org/wiki/Bahrain_Internati... |
| 3 | 4 | catalunya | Circuit de Barcelona-Catalunya | Montmeló | Spain | 41.57 | 2.26111 | NaN | http://en.wikipedia.org/wiki/Circuit_de_Barcel... |
| 4 | 5 | istanbul | Istanbul Park | Istanbul | Turkey | 40.9517 | 29.405 | NaN | http://en.wikipedia.org/wiki/Istanbul_Park |
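Two follow-ups to this cell, sketched on a tiny hypothetical frame rather than the real `circuits_df`. First, depending on the pandas version, `alt` may still be stored as text (object dtype) after the replace, so an explicit `pd.to_numeric` makes the numeric dtype certain. Second, once `\N` has become NaN, rerunning `groupby('alt')` would silently drop those rows, because `groupby` excludes NaN keys by default; `value_counts(dropna=False)` keeps them visible:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for circuits_df right after loading: alt still holds strings.
df = pd.DataFrame({
    'circuitRef': ['albert_park', 'sepang', 'bahrain'],
    'alt': ['10', r'\N', r'\N'],
})

# Replace the placeholder with NaN, then convert the column to a numeric (float) dtype.
df['alt'] = pd.to_numeric(df['alt'].replace(r'\N', np.nan))

# groupby() drops NaN keys by default, so only the 10.0 group is left here...
print(df.groupby('alt').size())

# ...while value_counts(dropna=False) also counts the missing altitudes.
print(df['alt'].value_counts(dropna=False))
```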


#### General analysis of the circuits
In this part I investigate the different tracks and try to answer the following questions:

- Which track is the northernmost?
- Which track is the southernmost?
- Which country has the most F1 tracks?

In [85]:
# Which track is the most northern and southern

# Most northern
circuits_df.loc[circuits_df['lat'].idxmax()]

# Most southern
circuits_df.loc[circuits_df['lat'].idxmin()]

circuitId                                                    47
circuitRef                                           anderstorp
name                                       Scandinavian Raceway
location                                             Anderstorp
country                                                  Sweden
lat                                                     57.2653
lng                                                     13.6042
alt                                                         NaN
url           http://en.wikipedia.org/wiki/Scandinavian_Raceway
Name: 46, dtype: object

circuitId                                                     1
circuitRef                                          albert_park
name                             Albert Park Grand Prix Circuit
location                                              Melbourne
country                                               Australia
lat                                                    -37.8497
lng                                                     144.968
alt                                                          10
url           http://en.wikipedia.org/wiki/Melbourne_Grand_P...
Name: 0, dtype: object
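`idxmax()` and `idxmin()` return only the single most extreme row. To list the few most northern and most southern tracks at once, `DataFrame.nlargest` and `nsmallest` can be used. A minimal sketch on the six circuits whose latitudes appear in the outputs of this section (a stand-in for the full `circuits_df`):

```python
import pandas as pd

# Names, countries and latitudes copied from the outputs in this section.
circuits = pd.DataFrame({
    'name': ['Albert Park Grand Prix Circuit', 'Sepang International Circuit',
             'Bahrain International Circuit', 'Circuit de Barcelona-Catalunya',
             'Istanbul Park', 'Scandinavian Raceway'],
    'country': ['Australia', 'Malaysia', 'Bahrain', 'Spain', 'Turkey', 'Sweden'],
    'lat': [-37.8497, 2.76083, 26.0325, 41.57, 40.9517, 57.2653],
})

# nlargest/nsmallest return whole rows sorted by 'lat',
# so the top few tracks can be inspected instead of only one.
northernmost = circuits.nlargest(3, 'lat')
southernmost = circuits.nsmallest(3, 'lat')

print(northernmost[['name', 'country', 'lat']])
print(southernmost[['name', 'country', 'lat']])
```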

In [84]:
# Which country has the most F1 tracks
(
    circuits_df[['circuitId', 'country']]
    .groupby(by='country')
    .count()
    .sort_values(by='circuitId', ascending=False)
    .head(5)
)

| country | circuitId |
|---|---|
| USA | 11 |
| France | 7 |
| Spain | 6 |
| UK | 4 |
| Germany | 3 |
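`Series.value_counts()` counts rows per value and sorts them in descending order in a single call. Because neither `circuitId` nor `country` has missing values (`describe()` reports 74 entries for both), it should give the same counts as the chain above, although tied countries may come out in a different order. A minimal sketch on hypothetical toy data; with the real data this would be `circuits_df['country'].value_counts().head(5)`:

```python
import pandas as pd

# Hypothetical toy data standing in for circuits_df['country'].
countries = pd.Series(['USA', 'USA', 'France', 'Spain', 'USA', 'France'], name='country')

# Count tracks per country (descending) and keep the top 5.
top_countries = countries.value_counts().head(5)
print(top_countries)
```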
