## AFCON 2024 MATCH ANALYSIS: A data-driven exploration of team performance and scoring trends across both halves of a game ##





This project takes a closer look at how teams perform in each half of a game. It analyses scoring trends and results in both halves, as well as full-time match outcomes.

This project provides an in-depth analysis of the **Africa Cup of Nations (AFCON)** football tournament, focusing on team performance, scoring patterns, and match outcomes. Using data sourced through **web scraping** from Sofascore, it aims to identify trends in goals scored across both halves, assess team performance in different match phases, and explore factors contributing to match outcomes.

The dataset includes match dates, team names, goals scored by each team in the first and second halves, first-half and second-half results, and full-time results. Additional data cleaning was done in **Microsoft Excel** to correct scores and ensure data integrity.

With a focus on delivering valuable insights, the project showcases **interactive** and visually engaging data representations using **Plotly** in Jupyter Notebook. 




## Data Collection

In this section, we detail the process used to collect data for the **Africa Cup of Nations (AFCON)** football tournament. The data was retrieved using **web scraping** via an API provided by Sofascore, which allows access to real-time match statistics. The process involves sending requests to a specific API endpoint, parsing the returned data, and cleaning it for further analysis.


In [3]:
# importing the needed libraries
import csv
import json

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import requests

%matplotlib inline

In [None]:
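# Requesting a single match event to check that the API responds
# (a timeout stops the cell from hanging if the server does not answer)
response = requests.get("https://api.sofascore.com/api/v1/event/11761888", timeout=10)

if response.status_code == 200:
    print(response.json())  # This will print the JSON response
else:
    print(f"Failed to retrieve data: {response.status_code}")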

In [None]:
# Requesting the team events for the tournament season
afcon_url = "https://www.sofascore.com/api/v1/unique-tournament/270/season/56021/team-events/total"

response = requests.get(afcon_url, timeout=10)

if response.status_code == 200:
    # Parse JSON data
    afcon_data = response.json()
    print(afcon_data)  # Display the raw JSON data for inspection
else:
    print(f"Failed to retrieve data: {response.status_code}")


The response is deeply nested, so it had to be flattened to extract only the fields needed for the project.
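
To make the nesting concrete, here is a minimal sketch of the same flattening on a tiny mock payload. The layout (wrapper, then group, then team, then a list of games) is an assumption inferred from the loops in this notebook rather than from Sofascore documentation, and the keys `events_by_group`, `1001`, `2001` and `2002` are hypothetical placeholders.

```python
# Mock payload with made-up values. The nesting is an assumption inferred from
# the extraction loop in this notebook; the wrapper, group and team keys are
# hypothetical placeholders.
game = {
    "id": 1,
    "homeTeam": {"name": "Team A"},
    "awayTeam": {"name": "Team B"},
    "homeScore": {"period1": 1, "period2": 0, "normaltime": 1},
    "awayScore": {"period1": 0, "period2": 2, "normaltime": 2},
}
mock_data = {
    "events_by_group": {          # hypothetical wrapper key
        "1001": {                 # hypothetical group id
            "2001": [game],       # games listed under the home team
            "2002": [game],       # the same game listed under the away team
        }
    }
}

rows = []
for groups in mock_data.values():
    for teams in groups.values():
        for games in teams.values():
            for g in games:
                rows.append({
                    "game_id": g["id"],
                    "home_team": g["homeTeam"]["name"],
                    "away_team": g["awayTeam"]["name"],
                    "home_goals_ht": g["homeScore"]["period1"],
                    "away_goals_ht": g["awayScore"]["period1"],
                })

unique_ids = {row["game_id"] for row in rows}
print(len(rows), "rows,", len(unique_ids), "unique game")
```

In this mock, one game yields two rows because it is listed under both teams. If the real response behaves the same way, that would be one plausible reason some matches appear more than once in the extracted data.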

In [None]:
# a list to store one dictionary per match
matches_data = []

# going through all the values (the matches)
for matches_dict in afcon_data.values():

    # going through the groups, whose values are keyed by team
    for team_games in matches_dict.values():

        # going through the list of games played by each team
        for games in team_games.values():

            # extracting the needed information from every game
            for game in games:
                match_info = {
                    'game_id': game['id'],
                    'Date': game['startTimestamp'],
                    'group_name': game['tournament']['name'],  # Extract group name
                    'home_team': game['homeTeam']['name'],  # Home team name
                    'away_team': game['awayTeam']['name'],  # Away team name
                    'home_goals_ht': game['homeScore']['period1'],  # Home team goals at half-time
                    'away_goals_ht': game['awayScore']['period1'],  # Away team goals at half-time
                    'home_goals_2nd_half': game['homeScore']['period2'],  # Home team goals in the second half
                    'away_goals_2nd_half': game['awayScore']['period2'],  # Away team goals in the second half
                    'home_goals_ft': game['homeScore']['normaltime'],  # Full-time home goals
                    'away_goals_ft': game['awayScore']['normaltime'],  # Full-time away goals
                }
                matches_data.append(match_info)



In [None]:
# creating a csv file to store the matches
csv_file = 'afcon_group_stage_2024.csv'

# the headers for the csv file
fields = ['group_name', 'game_id', 'Date', 'home_team', 'away_team', 'home_goals_ht', 'away_goals_ht',
          'home_goals_2nd_half', 'away_goals_2nd_half', 'home_goals_ft', 'away_goals_ft']

# utf-8 keeps accented team names intact regardless of the platform default
with open(csv_file, mode='w', newline='', encoding='utf-8') as file:
    writer = csv.DictWriter(file, fieldnames=fields)

    # Write the header
    writer.writeheader()

    # Write each match's data
    writer.writerows(matches_data)


Microsoft Excel was used to correct inaccurate match results. Matches that appeared more than once were removed. Additional columns were created for the number of goals scored in each half, the result of each half, and the total number of goals scored in the game. The date was converted from the raw timestamp in the JSON response to a readable date format.
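
The same cleaning steps could also be reproduced in pandas. The following is a minimal sketch on made-up sample rows, not the workflow actually used. It assumes the `Date` column holds Unix timestamps in seconds. The helper `half_result` and the column names `total_2nd_half_goals`, `total_goals`, `first_half_result` and `second_half_result` are hypothetical; only `total_1st_half_goals` and the raw CSV columns appear in this notebook.

```python
import pandas as pd

# Made-up sample rows using the column names written to the CSV above.
raw = pd.DataFrame({
    "game_id": [1, 1, 2],  # game 1 appears twice on purpose
    "Date": [1705176000, 1705176000, 1705262400],  # assumed Unix seconds
    "home_team": ["Team A", "Team A", "Team C"],
    "away_team": ["Team B", "Team B", "Team D"],
    "home_goals_ht": [1, 1, 0],
    "away_goals_ht": [0, 0, 0],
    "home_goals_2nd_half": [0, 0, 2],
    "away_goals_2nd_half": [2, 2, 1],
})


def half_result(home_goals, away_goals):
    """Label one half from the goals scored in it (hypothetical helper)."""
    if home_goals > away_goals:
        return "Home win"
    if home_goals < away_goals:
        return "Away win"
    return "Draw"


# Remove matches that appear more than once.
clean = raw.drop_duplicates(subset="game_id").reset_index(drop=True)

# Convert the timestamp to a readable calendar date.
clean["Date"] = pd.to_datetime(clean["Date"], unit="s").dt.date

# Goals scored in each half and in the whole game.
clean["total_1st_half_goals"] = clean["home_goals_ht"] + clean["away_goals_ht"]
clean["total_2nd_half_goals"] = clean["home_goals_2nd_half"] + clean["away_goals_2nd_half"]
clean["total_goals"] = clean["total_1st_half_goals"] + clean["total_2nd_half_goals"]

# Result of each half.
clean["first_half_result"] = [
    half_result(h, a) for h, a in zip(clean["home_goals_ht"], clean["away_goals_ht"])
]
clean["second_half_result"] = [
    half_result(h, a)
    for h, a in zip(clean["home_goals_2nd_half"], clean["away_goals_2nd_half"])
]

print(clean)
```

Doing the cleaning in code rather than in a spreadsheet keeps every step reproducible if the data is collected again.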

In [6]:
afcon_2024_group_stage = pd.read_csv(r"C:\Users\Felix\Documents\python practice\data science projects\AFCON_2024_ANALYSIS\Files\AFCON GROUP STAGE GAMES 2023.csv")

In [7]:
# Trimming the group name

afcon_2024_group_stage['group_name'] = afcon_2024_group_stage['group_name'].str.replace(r'^Africa Cup of Nations, ', '', regex=True)

In [None]:
afcon_2024_group_stage.info()

In [8]:
first_second_half_goals = pd.read_csv(r"C:\Users\Felix\Documents\python practice\data science projects\AFCON_2024_ANALYSIS\Files\AFCON GROUP STAGE GAMES 2023_FIRST AND SECOND HALF GOALS.csv")

In [10]:
fig = px.bar(first_second_half_goals, x = 'Team', 
             y = ['first_half_goals','second_half_goals'], 
             barmode='group',
             labels={'variable':'Half','value':'Goals','Team':'Team'})
fig.update_layout(title="Goals Scored in First and Second Halves by Team")
fig.show()
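
The per-team table plotted above is loaded from a separate file. As a sketch of how such a table could be derived from match-level data, the example below stacks the home and away sides and sums goals by team. The sample rows are made up, and this is an assumption about how the table could be built rather than a record of how the file was produced.

```python
import pandas as pd

# Made-up match-level rows using the column names from the extraction step.
matches = pd.DataFrame({
    "home_team": ["Team A", "Team C", "Team B"],
    "away_team": ["Team B", "Team A", "Team C"],
    "home_goals_ht": [1, 0, 2],
    "away_goals_ht": [0, 1, 0],
    "home_goals_2nd_half": [0, 2, 1],
    "away_goals_2nd_half": [2, 0, 1],
})

team_columns = ["Team", "first_half_goals", "second_half_goals"]

# One row per team per match, seen from the home side...
home = matches[["home_team", "home_goals_ht", "home_goals_2nd_half"]].copy()
home.columns = team_columns

# ...and from the away side.
away = matches[["away_team", "away_goals_ht", "away_goals_2nd_half"]].copy()
away.columns = team_columns

# Stack both views and total the goals for each team.
per_team = (
    pd.concat([home, away], ignore_index=True)
    .groupby("Team", as_index=False)
    .sum()
)

print(per_team)
```

The resulting frame has the `Team`, `first_half_goals` and `second_half_goals` columns that the `px.bar` call expects.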

In [8]:
afcon_2024_group_stage['total_1st_half_goals'].sum()

np.int64(33)