# Lesson 5: Applying SQL Aggregate Functions to Soccer Data

## Quick Recap
Great job on making it this far! So far we've covered a lot, from **COUNT** and **DISTINCT** to **SUM** and **GROUP BY**. These are some of the key SQL tools for digging into any dataset. In this unit, we'll broaden our repertoire by applying these aggregate functions to data about Lionel Messi's career achievements.

As you may recall from our previous lessons, aggregate functions allow us to perform calculations on a set of values to return a single scalar value. We've already seen the **COUNT** and **SUM** functions in action, but have you ever wondered if we could derive other useful insights, such as averages? That’s where the SQL **AVG** function comes into play.

---

## SUM and AVG Functions
By now, the **SUM** function should feel familiar. It does the heavy lifting whenever we need totals, such as the total trophies won or the total goals scored.

The **AVG** function, on the other hand, might be new to you. It returns the arithmetic mean of a set of values; for example, it can tell us the average number of trophies won per season in Lionel Messi's career.
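To see how **AVG** relates to **SUM** and **COUNT**, here is a minimal sketch over an inline set of three made-up values. The derived table `demo_values` and its column `x` are hypothetical and not part of the lesson's schema. Watch the NULL: **AVG** divides the sum by the number of non-NULL values, not by the number of rows.

```sql
-- Hypothetical inline data: three values, one of them NULL.
SELECT
    SUM(x)   AS total,           -- 1 + 2 = 3 (the NULL is skipped)
    COUNT(x) AS non_null_values, -- 2: COUNT(column) ignores NULLs
    COUNT(*) AS all_rows,        -- 3: COUNT(*) counts every row
    AVG(x)   AS mean             -- 3 / 2 = 1.5, not 3 / 3
FROM (
    SELECT 1 AS x
    UNION ALL SELECT 2
    UNION ALL SELECT NULL
) AS demo_values;
```

The same rule applies to real columns: rows where `trophies_won` is NULL would be left out of both the sum and the average.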

---

## Example 1: Utilizing The SUM Function

```sql
-- Aggregate total trophies won per season
SELECT Seasons.season_id, SUM(Seasons.trophies_won) AS TotalTrophiesWon
FROM Seasons
JOIN Matches ON Seasons.season_id = Matches.season_id
GROUP BY Seasons.season_id;
```

### Sneak peek of the output:
```
| season_id | TotalTrophiesWon |
|-----------|------------------|
|         1 |                1 |
|         2 |               16 |
```

In the example above, **SUM** adds up `Seasons.trophies_won` within each `season_id` group. The **Seasons** table is joined to **Matches** on **season_id**, where **Matches** records the individual matches played in each season, and **GROUP BY** returns one total per season. Keep in mind what the join does to the rows being summed: each season appears once for every match recorded in it, so its `trophies_won` value is added once per joined match row. Assuming **Seasons** holds one row per season, each total therefore equals that season's `trophies_won` multiplied by its number of matches, not the stored value itself.
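The join's effect on **SUM** is easy to miss, so here is a sketch that makes it visible: each season row is repeated once per matching match, and a season-level value is added once per repeated row. It reproduces Example 1's shape on two hypothetical tables, `demo_seasons` and `demo_matches`, defined inline with a `WITH` clause (available from MySQL 8.0). The table names, the `match_id` column, the `JoinedRows` alias, and all numbers are invented for illustration; a `COUNT(*)` column is added so the multiplication shows up next to the total.

```sql
-- Hypothetical data: season 1 won 1 trophy and has 1 match;
-- season 2 won 2 trophies and has 3 matches.
WITH
    demo_seasons (season_id, trophies_won) AS (
        SELECT 1, 1 UNION ALL
        SELECT 2, 2
    ),
    demo_matches (match_id, season_id) AS (
        SELECT 10, 1 UNION ALL
        SELECT 20, 2 UNION ALL
        SELECT 21, 2 UNION ALL
        SELECT 22, 2
    )
-- Same shape as Example 1, plus a row count per group.
SELECT demo_seasons.season_id,
       COUNT(*)                       AS JoinedRows,       -- 1 and 3
       SUM(demo_seasons.trophies_won) AS TotalTrophiesWon  -- 1 * 1 = 1 and 2 * 3 = 6
FROM demo_seasons
JOIN demo_matches ON demo_seasons.season_id = demo_matches.season_id
GROUP BY demo_seasons.season_id
ORDER BY demo_seasons.season_id;
```

Season 2's stored value of 2 comes out as 6 because it is added once for each of its three matches. Adding `COUNT(*)` to the lesson's query in the same way is a quick check on whether a total has been multiplied by a join.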

---

## Example 2: Leveraging The AVG Function

```sql
-- Aggregate average trophies won per season after 2010
SELECT Seasons.season_id, AVG(Seasons.trophies_won) AS AverageTrophiesWon
FROM Seasons
JOIN Matches ON Seasons.season_id = Matches.season_id
WHERE YEAR(Matches.date) > 2010
GROUP BY Seasons.season_id;
```

### Sneak peek of the output:
```
| season_id | AverageTrophiesWon |
|-----------|--------------------|
|         7 |             3.0000 |
|         8 |             4.0000 |
```

Here, we apply the **AVG** function to the same join, this time keeping only matches played after 2010 (**YEAR(Matches.date) > 2010**, that is, from 2011 onward). Because the results are grouped by `season_id`, **AVG** computes one mean per season, taken over that season's joined match rows. Every row in a group carries the same `Seasons.trophies_won` value, so, assuming one **Seasons** row per season, each per-season average is simply that season's `trophies_won`. The output is therefore a per-season view of trophies for recent seasons rather than a single average across seasons.
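If the goal is a single average across seasons, each season should count once. This hypothetical sketch compares averaging one row per season with averaging over joined match rows, where seasons with more matches carry more weight. The `demo_seasons` and `demo_matches` tables, the `match_id` column, the aliases, and the numbers are all invented; the `WITH` clause needs MySQL 8.0 or later.

```sql
-- Hypothetical data: season 1 won 1 trophy (1 match); season 2 won 3 trophies (3 matches).
WITH
    demo_seasons (season_id, trophies_won) AS (
        SELECT 1, 1 UNION ALL
        SELECT 2, 3
    ),
    demo_matches (match_id, season_id) AS (
        SELECT 10, 1 UNION ALL
        SELECT 20, 2 UNION ALL
        SELECT 21, 2 UNION ALL
        SELECT 22, 2
    )
SELECT
    -- One row per season: (1 + 3) / 2 = 2
    (SELECT AVG(trophies_won) FROM demo_seasons) AS AvgPerSeason,
    -- One row per joined match: (1 + 3 + 3 + 3) / 4 = 2.5
    (SELECT AVG(demo_seasons.trophies_won)
       FROM demo_seasons
       JOIN demo_matches ON demo_seasons.season_id = demo_matches.season_id) AS AvgPerJoinedRow;
```

The two results differ because season 2 appears three times in the joined rows. For a per-season average, averaging one row per season avoids that weighting.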

---

## Remembering the GROUP BY Clause
From our past lessons, you should recall that the **GROUP BY** clause groups rows that share the same value in the grouping column(s). It's a vital companion to aggregate functions like **SUM**, **COUNT**, and **AVG** because it lets them run on each group independently, producing one result row per group instead of a single row for the whole table.

As you've noticed in our examples, **GROUP BY** plays an essential role when using aggregate functions. We use **GROUP BY** to return a separate sum or average for each season, allowing us to analyze Messi's career achievements in a structured manner.

---

## You Are Almost There
Excellent work on learning how to use the **SUM** and **AVG** functions and mastering their symbiotic relationship with the **GROUP BY** clause. Using these functions isn't always straightforward, but with practice, it will become second nature.

Congratulations on completing this lesson of the course! Let's continue practicing to solidify this knowledge and enhance your SQL skills further.

## Fix SQL Aggregation Query

In this task, we are going to embark on a journey to correct bugs. You are provided with a faulty SQL query designed to calculate the average trophies won per season using data from Lionel Messi's matches. However, the provided starter code is flawed and does not execute as intended. Your task is to identify the mistake and correct it.

```sql
-- TODO: Fix the query that aggregates average trophies won per each season after 2010
SELECT Seasons.season_id, AVG(Matches.trophies_won) as AverageTrophiesWon
FROM Seasons
JOIN Matches ON Seasons.season_id = Matches.season_id
WHERE YEAR(Matches.date) > 2010
GROUP BY Seasons.season_id;
```

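The bug is in the argument to **AVG**: the starter query averages `Matches.trophies_won`, but the `trophies_won` column used throughout this lesson belongs to the `Seasons` table. Replacing it with `Seasons.trophies_won` fixes the query. Let's break down the corrected query step by step to understand what each part does: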

```sql
SELECT Seasons.season_id, AVG(Seasons.trophies_won) AS AverageTrophiesWon
FROM Seasons
JOIN Matches ON Seasons.season_id = Matches.season_id
WHERE YEAR(Matches.date) > 2010
GROUP BY Seasons.season_id;
```

### 1. **`SELECT Seasons.season_id, AVG(Seasons.trophies_won) AS AverageTrophiesWon`**
- **`SELECT`**: This is used to specify the columns that will be returned in the query result.
- **`Seasons.season_id`**: We're selecting the `season_id` from the `Seasons` table. Each `season_id` represents a particular season.
- **`AVG(Seasons.trophies_won)`**: The **`AVG`** function calculates the **average** of the `trophies_won` column from the `Seasons` table. Because of the `GROUP BY`, it produces one average per season, computed over that season's joined rows. The result is labeled `AverageTrophiesWon` using the **`AS`** alias.

### 2. **`FROM Seasons`**
- This specifies the **`Seasons`** table as the primary table from which we are retrieving data.
- The `Seasons` table contains information about each season, such as the `season_id` and the number of trophies won in that season (`trophies_won`).

### 3. **`JOIN Matches ON Seasons.season_id = Matches.season_id`**
- **`JOIN`**: This is used to combine rows from two or more tables based on a related column between them. In this case, it’s the `season_id`.
- **`Matches`**: The `Matches` table contains records of individual matches, including details such as the match `date` and the `season_id` that links each match to a specific season.
- **`ON Seasons.season_id = Matches.season_id`**: The condition specifies that we want to join the `Seasons` and `Matches` tables where the `season_id` column from both tables is equal. This ensures that the data from the `Matches` table is correctly matched with the corresponding season in the `Seasons` table.
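Because this is an inner `JOIN`, a season with no rows in `Matches` drops out of the result entirely. The hypothetical sketch below uses `LEFT JOIN` instead to keep every season, and counts matches with `COUNT(demo_matches.match_id)`, which skips the NULLs that `LEFT JOIN` fills in for unmatched seasons. The inline `demo_seasons` and `demo_matches` tables, the `match_id` column, and the `MatchesJoined` alias are invented; the `WITH` clause needs MySQL 8.0 or later.

```sql
-- Hypothetical data: season 3 has no matches recorded.
WITH
    demo_seasons (season_id, trophies_won) AS (
        SELECT 1, 1 UNION ALL
        SELECT 2, 2 UNION ALL
        SELECT 3, 0
    ),
    demo_matches (match_id, season_id) AS (
        SELECT 10, 1 UNION ALL
        SELECT 20, 2
    )
-- LEFT JOIN keeps all three seasons; season 3 gets MatchesJoined = 0.
-- With a plain (inner) JOIN, season 3 would not appear at all.
SELECT demo_seasons.season_id,
       COUNT(demo_matches.match_id) AS MatchesJoined
FROM demo_seasons
LEFT JOIN demo_matches ON demo_seasons.season_id = demo_matches.season_id
GROUP BY demo_seasons.season_id
ORDER BY demo_seasons.season_id;
```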

### 4. **`WHERE YEAR(Matches.date) > 2010`**
- **`WHERE`**: This clause is used to filter rows based on a specified condition.
- **`YEAR(Matches.date) > 2010`**: This condition keeps only matches whose `date` falls in 2011 or later. It filters individual matches, not whole seasons: a season that spans 2010 and 2011 still contributes its 2011 matches, and the remaining matches determine which seasons appear in the result.
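For a `DATE` or `DATETIME` column, `YEAR(date) > 2010` is true exactly when the date is on or after 2011-01-01, so the same filter can be written as a range on the bare column. In MySQL, the range form can use an ordinary index on `Matches.date` if one exists, whereas wrapping the column in `YEAR()` generally prevents that. The sketch is hypothetical: the inline `demo_matches` table and its `match_id` column are invented, its dates are ISO `'YYYY-MM-DD'` strings (which compare correctly as text), and the `WITH` clause needs MySQL 8.0 or later.

```sql
-- Hypothetical match dates; only the last two fall after 2010.
WITH demo_matches (match_id, season_id, date) AS (
    SELECT 1, 6, '2010-12-31' UNION ALL
    SELECT 2, 7, '2011-01-01' UNION ALL
    SELECT 3, 8, '2012-05-20'
)
-- Range form of YEAR(date) > 2010: keeps matches 2 and 3.
SELECT match_id, season_id, date
FROM demo_matches
WHERE date >= '2011-01-01';
```

Applied to the lesson's query, the filter would read `WHERE Matches.date >= '2011-01-01'`.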

### 5. **`GROUP BY Seasons.season_id`**
- **`GROUP BY`**: This clause groups rows that have the same value in the specified column(s). It’s used when performing aggregate functions (like `AVG`, `SUM`, `COUNT`, etc.) to apply those functions to each group of data.
- **`Seasons.season_id`**: This tells the query to group the data by `season_id`. So, for each distinct `season_id`, it will calculate the average trophies won.

### Final Result
- The query returns the **`season_id`** and the **average number of trophies won** for each season that has at least one match played after 2010.
- The `Matches` table decides which seasons appear and how many rows each group contains; the averaged value itself comes from `Seasons.trophies_won`.

### Example of Output:

Assume that after running the query, you get this result:

| season_id | AverageTrophiesWon |
|-----------|--------------------|
|         7 |             3.0000 |
|         8 |             4.0000 |
|         9 |             2.5000 |

- **`season_id`**: The unique identifier for each season.
- **`AverageTrophiesWon`**: The average of `Seasons.trophies_won` over that season's joined rows.

This query helps you analyze trophies won per season, restricted to seasons with matches played after 2010.

## Fix the Aggregate Query

We are still on a debugging tour. You are given an SQL query designed to calculate the total trophies won in each season based on Lionel Messi's matches. The SQL query contains some bugs and is not returning the expected results. Your task is to identify the issue and debug the query so that it works correctly.

```sql
-- Incorrect SQL query with mistake
SELECT Seasons.season_id, SUM(Seasons.trophies_won) as TotalTrophiesWon
FROM Seasons
JOIN Matches AT Seasons.season_id = Matches.season_id
GROUP BY Seasons.season_id;
```

Running it fails with:

```
ERROR 1064 (42000) at line 2: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'Seasons.season_id = Matches.season_id
GROUP BY Seasons.season_id' at line 3
```

The error in your SQL query is due to an incorrect use of the `JOIN` clause. Specifically, the keyword `AT` is not valid syntax for joining tables in SQL. You should use `ON` instead. Here's how you can correct the query:

```sql
-- Corrected SQL query
SELECT Seasons.season_id, SUM(Seasons.trophies_won) AS TotalTrophiesWon
FROM Seasons
JOIN Matches ON Seasons.season_id = Matches.season_id
GROUP BY Seasons.season_id;
```

### Explanation of the Fix:
1. **Syntax Issue**: The `JOIN` clause should use `ON` instead of `AT`. The `ON` keyword is used to specify the condition for joining tables.
2. **Query Purpose**: The query sums `trophies_won` within each `season_id` group. The join with `Matches` determines which seasons appear and how many rows each group contains.

### Final Note:
Ensure that the `season_id` columns in the `Seasons` and `Matches` tables are defined consistently and that the join condition matches your database schema.

## Calculate Average Trophies Won Per Season

Moving on, it is time to gather some insightful statistics from the dataset related to Lionel Messi's career. Let's see if we can determine the average trophies won per season based on his match performances.

Using the Seasons and Matches tables, write a SQL query that calculates the average trophies won in each season after 2010.

Replace the `_______` placeholders in the starter code with the correct function, table, and column names.

```sql
-- TODO: Aggregate average trophies won per each season after 2010
SELECT _______.season_id, _____(Seasons._______) as AverageTrophiesWon
FROM Seasons
JOIN Matches ON Seasons.season_id = Matches.season_id
WHERE YEAR(Matches.date) > 2010
GROUP BY Seasons._______;
```

Running the starter code with the placeholders still in place fails with:

```
ERROR 1054 (42S22) at line 2: Unknown column '_______.season_id' in 'field list'
```

Here’s how you can complete the SQL query to calculate the average trophies won per season after 2010, filling in the placeholders correctly:

```sql
-- Aggregating average trophies won per season after 2010
SELECT Seasons.season_id, AVG(Seasons.trophies_won) AS AverageTrophiesWon
FROM Seasons
JOIN Matches ON Seasons.season_id = Matches.season_id
WHERE YEAR(Matches.date) > 2010
GROUP BY Seasons.season_id;
```

### Explanation of the Filled-In Placeholders:
1. **Seasons.season_id**: Used to select the season ID from the `Seasons` table.
2. **AVG**: The aggregate function `AVG()` is used to calculate the average value of trophies won per season.
3. **Seasons.trophies_won**: The column `trophies_won` from the `Seasons` table is specified to compute the average.
4. **GROUP BY Seasons.season_id**: Groups the results by each unique `season_id` to calculate the average trophies for each season.

### Key Points:
- The `YEAR(Matches.date)` function extracts the year from the `date` field in the `Matches` table to filter only matches after 2010.
- Make sure that the `trophies_won` field and `date` field exist in your `Seasons` and `Matches` tables, respectively, and that they are named correctly in your schema.

This query returns the average number of trophies won per season for seasons with matches played after 2010.

## Calculate Total Trophies Won

Let's spice things up a bit. Your next task is to find the total trophies won across each season in Lionel Messi's career.

To complete the task, replace the `_______` placeholders with an aggregate function and the relevant table and column names so that the query adds up the trophies won in each season.

```sql
-- TODO: Aggregate total trophies won per each season
SELECT _______.season_id, _________
FROM Seasons
JOIN Matches ON Seasons.season_id = Matches.season_id
GROUP BY Seasons._______;
```

Here's how you can complete the SQL query to calculate the total trophies won per season:

```sql
-- Aggregating total trophies won per season
SELECT Seasons.season_id, SUM(Seasons.trophies_won) AS TotalTrophiesWon
FROM Seasons
JOIN Matches ON Seasons.season_id = Matches.season_id
GROUP BY Seasons.season_id;
```

### Explanation of the Filled-In Placeholders:
1. **Seasons.season_id**: Selects the `season_id` from the `Seasons` table.
2. **SUM(Seasons.trophies_won)**: Uses the `SUM()` aggregate function to add up all trophies won for each season.
3. **GROUP BY Seasons.season_id**: Groups the results by each unique `season_id` to calculate the total trophies for each season.

### Key Points:
- **SUM() Function**: This function sums up the values of `trophies_won` for all records that match each `season_id`.
- **Grouping**: The `GROUP BY` clause is necessary to ensure the aggregation is done separately for each season.

This query will give you the total number of trophies won for each season across Lionel Messi's career.

## Calculate Total and Average Trophies Won

Congratulations on making it this far! Now, we will combine all the components to test your understanding of the SQL aggregate functions we have covered in our lessons. For our final exercise, we are going to write an SQL query that calculates both the total trophies won and average trophies won per season based on Lionel Messi's matches.

Put your skills to the test: write a complete `SELECT` query that performs both calculations.

```sql
-- TODO: Write an SQL query to aggregate total trophies won and average trophies won per each season.
```

Here's how you can write a complete SQL query that calculates both the total trophies won and the average trophies won per season:

```sql
-- Aggregating total and average trophies won per season
SELECT 
    Seasons.season_id, 
    SUM(Seasons.trophies_won) AS TotalTrophiesWon, 
    AVG(Seasons.trophies_won) AS AverageTrophiesWon
FROM Seasons
JOIN Matches ON Seasons.season_id = Matches.season_id
GROUP BY Seasons.season_id;
```

### Explanation:
1. **SUM(Seasons.trophies_won) AS TotalTrophiesWon**: This part calculates the total number of trophies won for each season using the `SUM()` aggregate function.
2. **AVG(Seasons.trophies_won) AS AverageTrophiesWon**: This part calculates the average number of trophies won per season using the `AVG()` aggregate function.
3. **GROUP BY Seasons.season_id**: This ensures that the aggregation functions operate separately for each unique `season_id`.

### Final Note:
- The query combines both the `SUM()` and `AVG()` aggregate functions in the `SELECT` statement to produce both statistics in a single query.
- Make sure that the `trophies_won` column exists in your `Seasons` table and that both tables share the `season_id` column used in the join.

This query will provide a comprehensive overview of Lionel Messi's performance, displaying both the total and average trophies won for each season.
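If each season's `trophies_won` should be counted once while still listing only seasons that have matches, one option is to replace the `JOIN` with an `IN` subquery, which only tests whether a matching row exists. The sketch below applies that idea to hypothetical inline tables: `demo_seasons`, `demo_matches`, the `match_id` column, and the numbers are invented, and the `WITH` clause needs MySQL 8.0 or later. It is an alternative shape for comparison, not the query the exercise asks for.

```sql
-- Hypothetical data: season 1 (1 trophy, 1 match), season 2 (2 trophies, 3 matches),
-- season 3 (4 trophies, no matches).
WITH
    demo_seasons (season_id, trophies_won) AS (
        SELECT 1, 1 UNION ALL
        SELECT 2, 2 UNION ALL
        SELECT 3, 4
    ),
    demo_matches (match_id, season_id) AS (
        SELECT 10, 1 UNION ALL
        SELECT 20, 2 UNION ALL
        SELECT 21, 2 UNION ALL
        SELECT 22, 2
    )
-- IN only tests membership, so each season row is aggregated once:
-- season 1 -> total 1, average 1; season 2 -> total 2, average 2; season 3 is excluded.
SELECT season_id,
       SUM(trophies_won) AS TotalTrophiesWon,
       AVG(trophies_won) AS AverageTrophiesWon
FROM demo_seasons
WHERE season_id IN (SELECT season_id FROM demo_matches)
GROUP BY season_id
ORDER BY season_id;
```

With one row per season, each per-season total and average simply equals the stored `trophies_won`. The lesson's date condition could be added inside the subquery (for example, `WHERE YEAR(date) > 2010`) to restrict which seasons are listed.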