## Table Extraction - Extract CSV files from Websites

### Extract CSV files

In this tutorial, we will read a CSV file of football results directly from a website into pandas.

The results come from the [Football Data](https://football-data.co.uk/data.php) website. Once the page has loaded, click the [England Football Results](https://football-data.co.uk/englandm.php) link.

As you scroll down the page, you will see CSV files of results for several English leagues across many seasons. You can download these to your own machine, but pandas can also read them straight from their URLs.

**Right-click** one of the CSV links and choose 'Copy link' (the exact wording depends on your browser), then paste the URL into the *read_csv* function as shown below:

```python
import pandas as pd

# Read the CSV file straight from the website
premier21 = pd.read_csv('https://football-data.co.uk/mmz4281/2122/E0.csv')
```
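The copied link appears to encode the season and the division: `2122` for the 2021/22 season and `E0` for the Premier League (the value in the `Div` column). Assuming the other files on the page follow the same layout, which is inferred from this one link rather than documented here, a small hypothetical helper, `season_csv_url`, can build the URL for another season:

```python
# Base of the copied link; the pattern below is an assumption inferred from that one URL
BASE_URL = "https://football-data.co.uk/mmz4281"


def season_csv_url(start_year: int, division: str = "E0") -> str:
    """Hypothetical helper: URL of the results file for the season starting in start_year.

    Assumes <BASE_URL>/<last two digits of start year><last two digits of end year>/<division>.csv
    """
    season = f"{start_year % 100:02d}{(start_year + 1) % 100:02d}"  # e.g. 2021 -> "2122"
    return f"{BASE_URL}/{season}/{division}.csv"


print(season_csv_url(2021))  # https://football-data.co.uk/mmz4281/2122/E0.csv
```

With it, `pd.read_csv(season_csv_url(2021))` would load the same file as above. Whether a given season or division code actually exists is something to confirm on the page itself.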

```python
# Display the first 10 rows
premier21.head(10)
```

| | Div | Date | Time | HomeTeam | AwayTeam | FTHG | FTAG | FTR | HTHG | HTAG | ... | AvgC<2.5 | AHCh | B365CAHH | B365CAHA | PCAHH | PCAHA | MaxCAHH | MaxCAHA | AvgCAHH | AvgCAHA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | E0 | 13/08/2021 | 20:00 | Brentford | Arsenal | 2 | 0 | H | 1 | 0 | ... | 1.62 | 0.5 | 1.75 | 2.05 | 1.81 | 2.13 | 2.05 | 2.17 | 1.8 | 2.09 |
| 1 | E0 | 14/08/2021 | 12:30 | Man United | Leeds | 5 | 1 | H | 1 | 0 | ... | 2.25 | -1.0 | 2.05 | 1.75 | 2.17 | 1.77 | 2.19 | 1.93 | 2.1 | 1.79 |
| 2 | E0 | 14/08/2021 | 15:00 | Burnley | Brighton | 1 | 2 | A | 1 | 0 | ... | 1.62 | 0.25 | 1.79 | 2.15 | 1.81 | 2.14 | 1.82 | 2.19 | 1.79 | 2.12 |
| 3 | E0 | 14/08/2021 | 15:00 | Chelsea | Crystal Palace | 3 | 0 | H | 2 | 0 | ... | 1.94 | -1.5 | 2.05 | 1.75 | 2.12 | 1.81 | 2.16 | 1.93 | 2.06 | 1.82 |
| 4 | E0 | 14/08/2021 | 15:00 | Everton | Southampton | 3 | 1 | H | 0 | 1 | ... | 1.67 | -0.5 | 2.05 | 1.88 | 2.05 | 1.88 | 2.08 | 1.9 | 2.03 | 1.86 |
| 5 | E0 | 14/08/2021 | 15:00 | Leicester | Wolves | 1 | 0 | H | 1 | 0 | ... | 1.79 | -0.75 | 2.02 | 1.91 | 2.01 | 1.92 | 2.05 | 1.95 | 1.99 | 1.89 |
| 6 | E0 | 14/08/2021 | 15:00 | Watford | Aston Villa | 3 | 2 | H | 2 | 0 | ... | 1.74 | 0.25 | 2.02 | 1.91 | 2.04 | 1.89 | 2.04 | 1.93 | 1.99 | 1.9 |
| 7 | E0 | 14/08/2021 | 17:30 | Norwich | Liverpool | 0 | 3 | A | 0 | 1 | ... | 2.48 | 1.25 | 1.85 | 2.08 | 1.85 | 2.09 | 2.03 | 2.1 | 1.88 | 2.01 |
| 8 | E0 | 15/08/2021 | 14:00 | Newcastle | West Ham | 2 | 4 | A | 2 | 1 | ... | 1.95 | 0.25 | 2.01 | 1.92 | 2.02 | 1.91 | 2.12 | 1.94 | 2.0 | 1.89 |
| 9 | E0 | 15/08/2021 | 16:30 | Tottenham | Man City | 1 | 0 | H | 0 | 0 | ... | 1.99 | 1.0 | 1.84 | 2.09 | 1.87 | 2.06 | 1.94 | 2.15 | 1.84 | 2.05 |
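The `Date` column is written day first (`13/08/2021`), and pandas keeps it as plain text unless told otherwise. A minimal sketch, using a few columns from the rows shown above as an in-memory stand-in for the downloaded file, converts it with `pd.to_datetime` and an explicit format so that a date such as `01/09/2021` cannot be read month first:

```python
import io

import pandas as pd

# A few columns from rows shown above, in the same layout as the file
sample = io.StringIO(
    "Div,Date,Time,HomeTeam,AwayTeam,FTHG,FTAG,FTR\n"
    "E0,13/08/2021,20:00,Brentford,Arsenal,2,0,H\n"
    "E0,14/08/2021,12:30,Man United,Leeds,5,1,H\n"
    "E0,15/08/2021,16:30,Tottenham,Man City,1,0,H\n"
)
matches = pd.read_csv(sample)  # Date is read as text at this point

# Dates are day/month/year, so state the format instead of letting pandas guess
matches["Date"] = pd.to_datetime(matches["Date"], format="%d/%m/%Y")

print(matches["Date"].dt.day_name().tolist())  # ['Friday', 'Saturday', 'Sunday']
```

`pd.read_csv` can also do the conversion while loading, via `parse_dates=['Date']` together with `dayfirst=True`.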


Some of these column names are not very intuitive, so let's rename a couple of them. **FTHG** stands for 'Full Time Home Goals' and **FTAG** for 'Full Time Away Goals': the number of goals scored by the home and away teams, respectively, by the end of the match.
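Together with **FTR**, the full-time result (`H` for a home win, `D` for a draw, `A` for an away win), these columns allow a quick sanity check of the data. A minimal sketch, using three matches from the output above as an in-memory stand-in for the full file, derives the result from the goal difference and compares it with **FTR**; `goal_difference` is an illustrative column name, not one from the file:

```python
import io

import numpy as np
import pandas as pd

# Three of the matches shown above, limited to the columns used here
matches = pd.read_csv(io.StringIO(
    "HomeTeam,AwayTeam,FTHG,FTAG,FTR\n"
    "Brentford,Arsenal,2,0,H\n"
    "Burnley,Brighton,1,2,A\n"
    "Norwich,Liverpool,0,3,A\n"
))

# Positive means the home side scored more, negative means the away side did
matches["goal_difference"] = matches["FTHG"] - matches["FTAG"]

# Map the sign of the goal difference to H (home win), D (draw) or A (away win)
implied = np.sign(matches["goal_difference"]).map({1: "H", 0: "D", -1: "A"})

print((implied == matches["FTR"]).all())  # True
```

On the full `premier21` frame the same comparison would flag any row where the recorded result disagrees with the goal columns.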

```python
# Rename columns; rename returns a new DataFrame, so assign it back
premier21 = premier21.rename(columns={'FTHG': 'home_goals',
                                      'FTAG': 'away_goals'})
```
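By default, `rename` does not complain if a key in the mapping is not an existing column; it simply skips it, so a typo such as `'FTHGG'` goes unnoticed. Passing `errors='raise'` turns this into a `KeyError`. A small sketch on a toy frame built from the first two matches' goal counts:

```python
import pandas as pd

df = pd.DataFrame({"FTHG": [2, 5], "FTAG": [0, 1]})
mapping = {"FTHGG": "home_goals", "FTAG": "away_goals"}  # note the typo in "FTHGG"

# Default behaviour: the unknown label is skipped without any warning
quiet = df.rename(columns=mapping)
print(list(quiet.columns))  # ['FTHG', 'away_goals']

# errors="raise" reports labels that are not in the frame
try:
    df.rename(columns=mapping, errors="raise")
except KeyError as exc:
    print("rename failed:", exc)
```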

```python
# Display the first 3 rows
premier21.head(3)
```

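| | Div | Date | Time | HomeTeam | AwayTeam | home_goals | away_goals | FTR | HTHG | HTAG | ... | AvgC<2.5 | AHCh | B365CAHH | B365CAHA | PCAHH | PCAHA | MaxCAHH | MaxCAHA | AvgCAHH | AvgCAHA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | E0 | 13/08/2021 | 20:00 | Brentford | Arsenal | 2 | 0 | H | 1 | 0 | ... | 1.62 | 0.5 | 1.75 | 2.05 | 1.81 | 2.13 | 2.05 | 2.17 | 1.8 | 2.09 |
| 1 | E0 | 14/08/2021 | 12:30 | Man United | Leeds | 5 | 1 | H | 1 | 0 | ... | 2.25 | -1.0 | 2.05 | 1.75 | 2.17 | 1.77 | 2.19 | 1.93 | 2.1 | 1.79 |
| 2 | E0 | 14/08/2021 | 15:00 | Burnley | Brighton | 1 | 2 | A | 1 | 0 | ... | 1.62 | 0.25 | 1.79 | 2.15 | 1.81 | 2.14 | 1.82 | 2.19 | 1.79 | 2.12 |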
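Reading from the URL fetches the file again every time the cell runs. As mentioned at the start, the CSV files can instead be downloaded to your own machine. A minimal sketch using only the standard library; `download_csv` and the destination path in the usage note are hypothetical names chosen for this example:

```python
from __future__ import annotations

import shutil
import urllib.request
from pathlib import Path


def download_csv(url: str, dest: str | Path) -> Path:
    """Hypothetical helper: save the file at `url` to `dest` and return its path."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    # Stream the response straight to disk rather than holding it all in memory
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

For example, `download_csv('https://football-data.co.uk/mmz4281/2122/E0.csv', 'data/E0_2122.csv')` saves the file once, after which `pd.read_csv('data/E0_2122.csv')` loads it from disk without touching the network.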
