# La Quiniela Machine Learning Analysis

In this notebook we analyze the data and then train a model to predict the results of matches in La Liga, the Spanish football league, using the scikit-learn library. The data source is Transfermarkt, and it was scraped using Python's BeautifulSoup4 library. The data is provided as a SQLite3 database stored in the repository. The data set contains the following table:

Matches: All the matches played between seasons 1928-1929 and 2021-2022, with the date and score. The columns are season, division, matchday, date, time, home_team, away_team and score. Keep in mind that many matches have no time information, and that the table also contains matches from the current season that have not been played yet.
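Before cleaning, it can help to see how many values are missing in each column. The following is a minimal sketch on a toy frame, not on the real database: the first two rows are copied from sample output that appears in this notebook, while the third row (including the names "Team A" and "Team B") is made up to stand for a match that has not been played yet, under the assumption that such matches have a null score.

```python
import pandas as pd

# Toy frame: rows 1 and 2 come from this notebook's sample output;
# row 3 is a made-up, not-yet-played match (assumed to have a null score).
toy = pd.DataFrame({
    "season": ["1979-1980", "2015-2016", "2021-2022"],
    "date": ["1/27/80", "1/31/16", "5/22/22"],
    "time": [None, "8:30 PM", None],
    "home_team": ["Cádiz CF", "Real Madrid", "Team A"],
    "away_team": ["Real Murcia", "Espanyol", "Team B"],
    "score": ["1:0", "6:0", None],
})

# Count missing values per column
missing = toy.isna().sum()
print(missing)

# Rows that would survive a plain dropna()
complete = toy.dropna()
print(len(complete), "of", len(toy), "rows are complete")
```

Note that a plain `dropna()` removes every row with any missing value, so played matches without a kick-off time are discarded together with the unplayed ones.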


In [20]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sqlite3
from sklearn.ensemble import RandomForestRegressor

# Read the data
con = sqlite3.connect("laliga.sqlite")
df = pd.read_sql_query("SELECT * FROM Matches", con)
con.close()

In [21]:
# Show 10 random rows (only the last expression of a cell is displayed)
df.sample(10)

Unnamed: 0,season,division,matchday,date,time,home_team,away_team,score
42716,2009-2010,2,8,10/17/09,6:00 PM,Real Unión,Villarreal CF B,0:1
31927,1979-1980,2,20,1/27/80,,Cádiz CF,Real Murcia,1:0
38853,2000-2001,2,35,4/28/01,,Atlético Madrid,Real Murcia,0:3
41718,2007-2008,2,1,8/26/07,7:00 PM,Sporting Gijón,Poli Ejido,4:0
30321,1970-1971,2,12,11/22/70,,RCD Mallorca,CD Logroñés,3:0
33313,1987-1988,2,3,9/13/87,,UE Figueres,Barcelona Atl.,0:0
23996,2015-2016,1,22,1/31/16,8:30 PM,Real Madrid,Espanyol,6:0
16260,1995-1996,1,23,1/24/96,,Real Zaragoza,Real Betis,1:2
33451,1987-1988,2,17,1/9/88,,Castilla CF,Cartagena FC,1:1
13154,1987-1988,1,19,1/23/88,,CE Sabadell,Athletic,3:1


To feed the data to the model, we assign a number to each team and add a winner column that encodes which team (home or away) won the match:

- 0: Home team wins
- 1: Draw
- 2: Away team wins
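The encoding above can be sketched for a single score string as follows. The helper name `encode_result` is hypothetical and is only meant to illustrate the rule; it is not part of the notebook's pipeline.

```python
def encode_result(score: str) -> int:
    """Map a 'home:away' score string to 0 (home win), 1 (draw) or 2 (away win)."""
    home_goals, away_goals = (int(part) for part in score.split(":"))
    if home_goals > away_goals:
        return 0
    if home_goals < away_goals:
        return 2
    return 1


# Scores taken from sample rows shown in this notebook
for score in ["0:1", "1:0", "0:0", "6:0"]:
    print(score, "->", encode_result(score))
```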

In [26]:
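# Drop rows with any null value. This also removes matches without a
# kick-off time. The copy avoids chained-assignment warnings below.
df = df.dropna().copy()
# Split the score into home goals (column 0) and away goals (column 1)
goals = df['score'].str.split(':', expand=True).astype(int)
# Add a column with the winner of the match: 0 = home, 1 = draw, 2 = away
df['winner'] = np.where(goals[0] > goals[1], 0, np.where(goals[0] < goals[1], 2, 1))

df.sample(10)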


Unnamed: 0,season,division,matchday,date,time,home_team,away_team,score,winner
19385,2003-2004,1,17,12/21/03,5:00 PM,Real Murcia,Málaga CF,1:2,2
48072,2020-2021,2,33,4/4/21,2:00 PM,Sporting Gijón,CD Mirandés,1:2,2
44036,2012-2013,2,2,8/25/12,7:00 PM,Sporting Gijón,Real Murcia,2:3,2
41431,2006-2007,2,17,12/17/06,7:00 PM,RM Castilla,UD Vecindario,3:2,0
47522,2019-2020,2,25,1/25/20,9:00 PM,Real Zaragoza,CD Numancia,1:0,0
21670,2009-2010,1,18,1/16/10,10:00 PM,Barcelona,Sevilla FC,4:0,0
18331,2000-2001,1,26,3/11/01,5:00 PM,Alavés,Rayo Vallecano,4:2,0
44328,2012-2013,2,29,3/9/13,6:00 PM,CD Lugo,CD Numancia,0:0,1
46366,2017-2018,2,4,9/9/17,6:00 PM,Barcelona B,Córdoba CF,4:0,0
21639,2009-2010,1,15,12/19/09,6:00 PM,Athletic,CA Osasuna,2:0,0


In [42]:
# Assign a number to each team, in order of first appearance as home team
teams = df['home_team'].unique().tolist()

# Create a dictionary mapping each team name to its number
teams_dict = {team: i for i, team in enumerate(teams)}

# Create new columns with the numbers of the home and away teams.
# A team that never appears as home team would be mapped to NaN here.
df['home_team_num'] = df['home_team'].map(teams_dict)
df['away_team_num'] = df['away_team'].map(teams_dict)

df.sample(10)

Unnamed: 0,season,division,matchday,date,time,home_team,away_team,score,winner,home_team_num,away_team_num
42266,2008-2009,2,9,10/26/08,6:00 PM,UD Salamanca,SD Huesca,2:0,0,6,44
43636,2011-2012,2,8,10/8/11,6:00 PM,Girona,SD Huesca,1:1,1,43,44
41752,2007-2008,2,4,9/16/07,6:00 PM,CD Tenerife,Hércules CF,0:1,2,17,37
23960,2015-2016,1,19,1/9/16,6:15 PM,Sevilla FC,Athletic,2:0,0,26,13
46579,2017-2018,2,23,1/21/18,4:00 PM,CA Osasuna,CyD Leonesa,2:1,0,22,80
19733,2004-2005,1,14,12/5/04,5:00 PM,Levante,Getafe,0:0,1,30,31
46348,2017-2018,2,2,8/27/17,7:00 PM,Sporting Gijón,CD Lugo,2:0,0,35,74
18516,2001-2002,1,6,10/3/01,10:30 PM,UD Las Palmas,Real Madrid,4:2,0,23,10
22856,2012-2013,1,22,2/3/13,9:00 PM,Atlético Madrid,Real Betis,1:0,0,11,18
23477,2014-2015,1,8,10/20/14,8:45 PM,Real Sociedad,Getafe,1:2,2,4,31
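The dictionary above is built from the home_team column only, so any team that appears only as an away team would be left unmapped. A possible safeguard, sketched here on toy data, is to build the mapping from both columns. The fixtures below are illustrative, and "Team X" is a made-up team that never plays at home.

```python
import pandas as pd

# Toy fixtures using team names from this notebook; "Team X" is a made-up
# team that only ever appears as the away side.
toy = pd.DataFrame({
    "home_team": ["Real Madrid", "Barcelona", "Real Madrid"],
    "away_team": ["Barcelona", "Real Madrid", "Team X"],
})

# Mapping built from home teams only: "Team X" has no code and maps to NaN
home_only = {team: i for i, team in enumerate(toy["home_team"].unique())}
unmapped = toy["away_team"].map(home_only).isna().sum()
print("Unmapped away teams:", unmapped)

# Mapping built from both columns, keeping the order of first appearance
all_teams = pd.unique(pd.concat([toy["home_team"], toy["away_team"]]))
teams_dict = {team: i for i, team in enumerate(all_teams)}

toy["home_team_num"] = toy["home_team"].map(teams_dict)
toy["away_team_num"] = toy["away_team"].map(teams_dict)
print(toy)
```

Because home teams come first in the concatenation, teams that do play at home keep the same numbers they would get from the home-only mapping; away-only teams are appended at the end.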
