# 1. Aims and objectives
## 1.1. Introduction
Formula One, or F1 for short, is an international motor racing sport. It is the highest class of professional single-seat, open-wheel, open-cockpit racing.

Formula One racing is governed and sanctioned by a world body, the FIA (Fédération Internationale de l'Automobile, or International Automobile Federation). The name ‘Formula’ comes from the set of rules that the participating cars and drivers must follow.

Formula One racing originated in Europe during the 1920s and 1930s from other similar racing competitions. In 1946, the FIA standardized the racing rules, which formed the basis of Formula One racing. The inaugural Formula One World Drivers’ Championship, the first world championship series, was held in 1950.

Apart from the world championship series, many non-championship F1 races were also held, but as the cost of staging them rose, such races were discontinued after 1983.

Each F1 team can use a maximum of four drivers per season. Every team also has support staff who play a vital role in its success.

## 1.2. Aims and objectives
Within this program I would like to explore the following:
* Getting to know web scraping using `BeautifulSoup`
* Extracting the following historical grand prix winners' data from the official Formula 1 (F1) website:
    * Grand prix location
    * Date of the grand prix
    * Name of the grand prix winner
    * Name of the winning team
    * Number of laps driven
    * Time taken to complete all the laps
    
## 1.3. Steps and process
1. Creating a list of all the years in which Formula 1 has raced, i.e. 1950 to the present.
2. Accessing the official Formula 1 webpage where the winner of each grand prix is listed.
3. Fetching that data into lists and then converting those lists into a pandas dataframe.
4. Storing that dataframe in a `.csv` file.

In [13]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

Creating a variable named `years` that contains a list of all the years in which Formula 1 has raced (1950 to 2022 inclusive).

In [14]:
years = list(range(1950,2023))
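
The end year above is hard-coded, so the list needs a manual update each season. A minimal sketch of one alternative is shown below: it derives the last season from the current date and builds the per-season results URLs used later in the scraping loop. The helper names `season_years` and `season_urls` are hypothetical, and the sketch assumes that a results page already exists for the current year.

```python
from datetime import date

FIRST_SEASON = 1950
# URL pattern taken from the scraping loop in this notebook.
URL_TEMPLATE = "https://www.formula1.com/en/results.html/{}/races.html"


def season_years(last_season=None):
    """Return every season from 1950 up to and including last_season.

    If last_season is omitted, the current calendar year is used
    (assumption: that season's results page is already published).
    """
    if last_season is None:
        last_season = date.today().year
    return list(range(FIRST_SEASON, last_season + 1))


def season_urls(years):
    """Build one results-page URL per season."""
    return [URL_TEMPLATE.format(year) for year in years]


years = season_years(2022)  # same range as list(range(1950, 2023))
urls = season_urls(years)
print(len(years), urls[0])
```

Passing an explicit `last_season` keeps a run reproducible, while omitting it lets the list grow on its own each year.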

Accessing the official Formula 1 website for each year in a `for` loop and creating a BeautifulSoup object named `soup2` to parse the page. `soup2` is then used to extract the *grand prix*, *date*, *winner*, *car*, *laps* and *time* for every race, and the collected records are stored in a pandas dataframe.

In [4]:
F1_Data = []  # one dict per race, across all seasons
for j in years:
    # Download and parse the results page for season j
    webpage2 = requests.get("https://www.formula1.com/en/results.html/{}/races.html".format(j)).text
    soup2 = BeautifulSoup(webpage2, 'lxml')

    # Grand prix location (link text in each results row)
    grandprix = []
    for i in soup2.find_all('a',class_="dark bold ArchiveLink"):
        grandprix.append(i.text.strip())

    # Race date
    date = []
    for i in soup2.find_all('td',class_="dark hide-for-mobile"):
        date.append(i.text.strip())

    # Winner: first and last names sit in separate <span> tags
    first_name = []
    last_name = []
    winner = []
    for i in soup2.find_all('span',class_="hide-for-tablet"):
        first_name.append(i.text.strip())
    for i in soup2.find_all('span',class_="hide-for-mobile"):
        last_name.append(i.text.strip())
    for i in range(len(first_name)):
        winner.append(first_name[i] +" "+ last_name[i])

    # Winning team / car
    car = []
    for i in soup2.find_all('td',class_="semi-bold uppercase"):
        car.append(i.text.strip())

    # Number of laps driven
    laps = []
    for i in soup2.find_all('td',class_="bold hide-for-mobile"):
        laps.append(i.text.strip())

    # Total race time of the winner
    time = []
    for i in soup2.find_all('td',class_="dark bold hide-for-tablet"):
        time.append(i.text.strip())

    # Combine the per-column lists into one record per race
    for i in range(len(car)):
        all_data = {'GrandPrix' : grandprix[i],
                    'Date' : date[i],
                    'Winner' : winner[i],
                    'Car' : car[i],
                    'Laps' : laps[i],
                    'Time' : time[i]}
        F1_Data.append(all_data)

# Build the dataframe once, after all seasons have been collected
database = pd.DataFrame(F1_Data)
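
The loop above indexes six separate lists by position, so a page where one column is missing a cell would either raise an `IndexError` or silently misalign rows. As a hedged illustration, the sketch below applies the same class-based `find_all` lookups to a small hand-written HTML fragment and checks that all columns have the same length before combining them. The fragment `SAMPLE_HTML` and the helper `parse_winners` are hypothetical; the class names are copied from the loop above, and the real page markup may differ or change over time. The built-in `html.parser` is used here so the sketch does not depend on `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that mimics one results row; not copied from the real site.
SAMPLE_HTML = """
<table>
  <tr>
    <td><a class="dark bold ArchiveLink" href="#"> Great Britain </a></td>
    <td class="dark hide-for-mobile">13 May 1950</td>
    <td class="dark bold">
      <span class="hide-for-tablet">Nino</span>
      <span class="hide-for-mobile">Farina</span>
    </td>
    <td class="semi-bold uppercase">Alfa Romeo</td>
    <td class="bold hide-for-mobile">70</td>
    <td class="dark bold hide-for-tablet">2:13:23.600</td>
  </tr>
</table>
"""


def parse_winners(html):
    """Return one dict per race found in html, or raise if columns misalign."""
    soup = BeautifulSoup(html, "html.parser")

    def texts(tag, css_class):
        # Collect the stripped text of every matching element, in page order.
        return [el.get_text(strip=True) for el in soup.find_all(tag, class_=css_class)]

    first = texts("span", "hide-for-tablet")
    last = texts("span", "hide-for-mobile")
    if len(first) != len(last):
        raise ValueError("first and last name counts differ")

    columns = {
        "GrandPrix": texts("a", "dark bold ArchiveLink"),
        "Date": texts("td", "dark hide-for-mobile"),
        "Winner": [f"{f} {l}" for f, l in zip(first, last)],
        "Car": texts("td", "semi-bold uppercase"),
        "Laps": texts("td", "bold hide-for-mobile"),
        "Time": texts("td", "dark bold hide-for-tablet"),
    }

    # Every column must describe the same number of races.
    if len({len(values) for values in columns.values()}) > 1:
        raise ValueError("columns have different lengths")

    keys = list(columns)
    return [dict(zip(keys, row)) for row in zip(*columns.values())]


rows = parse_winners(SAMPLE_HTML)
print(rows)
```

Failing loudly on a length mismatch makes a layout change on the site visible immediately instead of producing a dataframe with shifted values.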

In [16]:
database

| Unnamed: 0 | GrandPrix | Date | Winner | Car | Laps | Time |
|---|---|---|---|---|---|---|
| 0 | Great Britain | 13 May 1950 | Nino Farina | Alfa Romeo | 70 | 2:13:23.600 |
| 1 | Monaco | 21 May 1950 | Juan Manuel Fangio | Alfa Romeo | 100 | 3:13:18.700 |
| 2 | Indianapolis 500 | 30 May 1950 | Johnnie Parsons | Kurtis Kraft Offenhauser | 138 | 2:46:55.970 |
| 3 | Switzerland | 04 Jun 1950 | Nino Farina | Alfa Romeo | 42 | 2:02:53.700 |
| 4 | Belgium | 18 Jun 1950 | Juan Manuel Fangio | Alfa Romeo | 35 | 2:47:26.000 |
| ... | ... | ... | ... | ... | ... | ... |
| 1071 | France | 24 Jul 2022 | Max Verstappen | Red Bull Racing RBPT | 53 | 1:30:02.112 |
| 1072 | Hungary | 31 Jul 2022 | Max Verstappen | Red Bull Racing RBPT | 70 | 1:39:35.912 |
| 1073 | Belgium | 28 Aug 2022 | Max Verstappen | Red Bull Racing RBPT | 44 | 1:25:52.894 |
| 1074 | Netherlands | 04 Sep 2022 | Max Verstappen | Red Bull Racing RBPT | 72 | 1:36:42.773 |


Storing the dataframe in a CSV file named `Formula1_historic_winners.csv`.

In [6]:
database.to_csv('Formula1_historic_winners.csv', index=False)
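
Passing `index=False` matters when the file is read back later. A column named `Unnamed: 0`, like the one in the output shown earlier, is what pandas typically produces when a CSV that was written with its index is loaded again with `read_csv`. The sketch below illustrates this with two rows taken from that output, using an in-memory buffer instead of a file; the helper `round_trip` is hypothetical. It also parses the `Date` strings into timestamps, assuming every date follows the `13 May 1950` pattern.

```python
import io

import pandas as pd

# Two rows copied from the dataframe output shown earlier.
sample = pd.DataFrame(
    [
        {"GrandPrix": "Great Britain", "Date": "13 May 1950", "Winner": "Nino Farina",
         "Car": "Alfa Romeo", "Laps": 70, "Time": "2:13:23.600"},
        {"GrandPrix": "Monaco", "Date": "21 May 1950", "Winner": "Juan Manuel Fangio",
         "Car": "Alfa Romeo", "Laps": 100, "Time": "3:13:18.700"},
    ]
)


def round_trip(frame, write_index):
    """Write frame as CSV to memory and read it back with default settings."""
    buffer = io.StringIO()
    frame.to_csv(buffer, index=write_index)
    buffer.seek(0)
    return pd.read_csv(buffer)


with_index = round_trip(sample, write_index=True)      # gains an extra index column
without_index = round_trip(sample, write_index=False)  # same columns as the original

# Convert the date strings (day, abbreviated month, year) into timestamps.
without_index["Date"] = pd.to_datetime(without_index["Date"], format="%d %b %Y")

print(list(with_index.columns))
print(list(without_index.columns))
```

If a file was already saved with its index, reading it with `pd.read_csv(path, index_col=0)` is one way to restore the index instead of getting the extra column.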