# Lesson 3: Mastering the SUM Function for Aggregate Calculations

## Introduction to Aggregate Functions in SQL
Welcome back! So far, we've explored the **COUNT** function to count rows and the **DISTINCT** keyword to ensure data uniqueness. Now, we will dive into another powerful aggregate function in SQL: **SUM**.

Aggregate functions help us summarize and analyze data. For example, when analyzing a soccer match dataset, you might want to find the total number of goals scored by a team throughout a season. The **SUM** function allows you to add up values in a column, providing valuable insights.

Let's get started by looking at what **SUM** does and how to write it.

---

## Understanding SUM
The **SUM** function is an aggregate function that returns the total of the values in a numeric column. Think of it as ordinary addition applied to every row the query selects.

The syntax is as follows: 

```sql
SUM(column)
```

Where **column** is the name of the column for which you want to calculate the sum.
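
As a minimal sketch of the idea, the script below builds a tiny throwaway table and sums one of its columns. The table name `DemoGoals`, its columns, and its values are invented for illustration and are not part of the lesson's database. The multi-row `INSERT` form is assumed to be supported by your database.

```sql
-- Hypothetical scratch table; names and values are made up for this example.
CREATE TABLE DemoGoals (
    match_no INT,
    goals    INT
);

INSERT INTO DemoGoals (match_no, goals) VALUES
    (1, 2),
    (2, 0),
    (3, 3);

-- Adds 2 + 0 + 3, so the single result row contains 5.
SELECT SUM(goals) AS total_goals
FROM DemoGoals;
```

Note that the query returns one row no matter how many rows the table holds, because the whole column collapses into a single total.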

You might be wondering, "When would I need to use **SUM**?" Suppose you have a matches database like ours and want the total number of events for each season. That's a perfect opportunity to use the **SUM** function. Let's see how it works!

---

## Applying SUM in a Query
Here's the basic syntax:

```sql
SELECT SUM(expression) FROM table_name;
```

- **SUM(expression)**: **SUM()** takes exactly one argument, the expression to add up. The expression is usually a column name, as in **SUM(column_name)**, but it can also be a constant: **SUM(1)** adds 1 for every row, which counts the rows.
- **table_name**: The table containing the column you want to sum.

For example, if you want to find the total number of trophies won in all seasons, you would use the **SUM** function on the column that records the number of trophies:

```sql
SELECT SUM(trophies_won) FROM Seasons;
```

### Output:
```
 SUM(trophies_won) 
-------------------
                 28 
```

Now let's look at a practical example from our soccer matches database and then break it down:

```sql
SELECT Matches.season_id, SUM(1) AS TotalEvents
FROM Matches
JOIN MatchEvents ON Matches.match_id = MatchEvents.match_id
GROUP BY Matches.season_id;
```

### Sneak peek of the output:
```
| season_id | TotalEvents |
|-----------|-------------|
|         1 |           1 |
|         2 |           8 |
```

It might seem complex, but don't worry! We're here to dissect it line by line.

- **SELECT Matches.season_id, SUM(1) AS TotalEvents**: Returns one row per season with two columns: the **season_id** and the number of events in that season. **SUM(1)** adds 1 for every joined row in the group, and each joined row represents one event, so the total is the event count. **AS TotalEvents** names the result column.
- **FROM Matches**: Makes Matches the starting table for the query.
- **JOIN MatchEvents ON Matches.match_id = MatchEvents.match_id**: Joins the Matches table to the MatchEvents table on the shared **match_id** column, linking each match to its events. **JOIN** on its own means **INNER JOIN**, so only matches that have at least one event appear in the result.
- **GROUP BY Matches.season_id**: Splits the joined rows into one group per season so that **SUM** produces a separate total for each. You will learn more about **GROUP BY** in the next unit!
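
To see why **SUM(1)** behaves like a row count, here is a small sketch you could run in a scratch database. The tables `DemoMatches` and `DemoMatchEvents`, the `event_id` column, and all values are hypothetical stand-ins for the lesson's tables. The query puts **SUM(1)** next to **COUNT(\*)** so the two can be compared.

```sql
-- Hypothetical scratch tables mirroring the shape of Matches and MatchEvents.
CREATE TABLE DemoMatches (
    match_id  INT,
    season_id INT
);

CREATE TABLE DemoMatchEvents (
    event_id INT,
    match_id INT
);

INSERT INTO DemoMatches (match_id, season_id) VALUES
    (10, 1),
    (11, 2),
    (12, 2);

INSERT INTO DemoMatchEvents (event_id, match_id) VALUES
    (100, 10),
    (101, 11),
    (102, 11),
    (103, 12);

-- Each joined row contributes 1 to SUM(1) and 1 to COUNT(*).
-- Season 1 gets 1 in both columns; season 2 gets 3 in both columns.
SELECT DemoMatches.season_id,
       SUM(1)   AS TotalEvents,
       COUNT(*) AS EventCount
FROM DemoMatches
JOIN DemoMatchEvents ON DemoMatches.match_id = DemoMatchEvents.match_id
GROUP BY DemoMatches.season_id;
```

Without an `ORDER BY` clause, the order of the two result rows is not guaranteed.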

---

## Common Pitfalls and Tips
When working with the **SUM** function, there are a few common pitfalls to be aware of:

- **SUM** is meant for numeric data. Applied to a non-numeric column, some databases raise an error while others silently convert the values, which can produce a misleading total.
- The **AS** keyword, as seen in the code, can make your output more readable by renaming the result of our **SUM** operation. Don't forget to use it as necessary.
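
Two further behaviors are easy to miss: **SUM** skips `NULL` values, and it returns `NULL` (not 0) when no rows are selected. The sketch below illustrates both, along with one possible workaround using the standard `COALESCE` function. The table `DemoSeasons` and its values are invented for this example.

```sql
-- Hypothetical scratch table; the values are invented for illustration.
CREATE TABLE DemoSeasons (
    season_id    INT,
    trophies_won INT
);

INSERT INTO DemoSeasons (season_id, trophies_won) VALUES
    (1, 3),
    (2, NULL),   -- no value recorded for this season
    (3, 2);

-- The NULL is skipped, so this returns 5 (3 + 2).
SELECT SUM(trophies_won) AS total_trophies
FROM DemoSeasons;

-- No rows match the filter, so SUM returns NULL rather than 0.
SELECT SUM(trophies_won) AS total_trophies
FROM DemoSeasons
WHERE season_id > 100;

-- COALESCE substitutes 0 when the sum is NULL, so this returns 0.
SELECT COALESCE(SUM(trophies_won), 0) AS total_trophies
FROM DemoSeasons
WHERE season_id > 100;
```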

---

## Lesson Recap and Looking Ahead
Great job! You've made excellent progress in mastering SQL functions. In this lesson, we learned about the **SUM** function and how to use it to perform aggregate calculations in SQL. We applied it to our Lionel Messi database and calculated the total events in each season.

In the upcoming practice exercises, you'll get the opportunity to apply the **SUM** function, deepen your understanding, and increase your confidence in handling it. Stay determined as you continue to unleash the power of SQL!

## Summing Events by Clubs

Welcome to the lesson on mastering the SUM function in SQL!

To start, let's practice running a basic query using the SUM function. The following query calculates the total number of events for each club from the Matches and MatchEvents tables. Execute the provided SQL query to observe its output.

```sql
SELECT Matches.club_id, SUM(1) AS TotalEvents
FROM Matches
JOIN MatchEvents ON Matches.match_id = MatchEvents.match_id
GROUP BY Matches.club_id;
```

The provided SQL query uses the `SUM` function to calculate the total number of events for each club. Let’s break down how the query works:

```sql
SELECT Matches.club_id, SUM(1) AS TotalEvents
FROM Matches
JOIN MatchEvents ON Matches.match_id = MatchEvents.match_id
GROUP BY Matches.club_id;
```

### Explanation:
- **Matches.club_id**: This column represents the club ID from the `Matches` table, which will be used to group the results.
- **SUM(1)**: This adds 1 for every row produced by the join. Each joined row is one event, so the result is the total number of events for each club.
- **AS TotalEvents**: The `AS` keyword is used to give the result of the `SUM` function an alias, making it easier to read.
- **JOIN MatchEvents ON Matches.match_id = MatchEvents.match_id**: This joins the `Matches` table with the `MatchEvents` table using the `match_id` column to match events with their respective matches.
- **GROUP BY Matches.club_id**: This groups the results by `club_id`, so that the total number of events is calculated for each club.

### What to Observe:
- Running this query will give you a list of `club_id` values along with the total number of events associated with each club.

## Summing Events by Season

In the previous lesson, you learned the basics of the SUM function in SQL. Now, let's take a step forward and apply the SUM function with an additional condition.

Currently, the query calculates the total number of events for each season. Modify the query to calculate the total number of events only for the seasons in the years 2018 and 2019.

```sql
-- TODO: Modify the query to calculate the total number of events only for the seasons in the years 2018 and 2019
SELECT Matches.season_id, SUM(1) AS TotalEvents
FROM Matches
JOIN MatchEvents ON Matches.match_id = MatchEvents.match_id
WHERE YEAR(Matches.date) IN (______, ______)
GROUP BY Matches.season_id;
```

To modify the query to calculate the total number of events only for the seasons in the years 2018 and 2019, you need to replace the placeholders in the `WHERE` clause with the specific years. Here's how you can write the SQL query:

```sql
-- Total events per season, restricted to matches played in 2018 or 2019
SELECT Matches.season_id, SUM(1) AS TotalEvents
FROM Matches
JOIN MatchEvents ON Matches.match_id = MatchEvents.match_id
WHERE YEAR(Matches.date) IN (2018, 2019)
GROUP BY Matches.season_id;
```

This query will sum up the events for each `season_id` where the match date falls in the years 2018 or 2019.
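
One caveat: `YEAR()` is not available in every database. MySQL and SQL Server provide it, while PostgreSQL, for example, uses `EXTRACT(YEAR FROM ...)` instead. If your environment lacks `YEAR()`, a date-range comparison is one possible alternative. The sketch below assumes `Matches.date` is stored as a date (or as ISO-formatted `YYYY-MM-DD` text) and that your database accepts date literals written as strings.

```sql
-- Alternative sketch: filter by date range instead of calling YEAR().
-- Assumes Matches.date holds dates that compare correctly with 'YYYY-MM-DD' literals.
SELECT Matches.season_id, SUM(1) AS TotalEvents
FROM Matches
JOIN MatchEvents ON Matches.match_id = MatchEvents.match_id
WHERE Matches.date >= '2018-01-01'   -- first day of 2018, inclusive
  AND Matches.date <  '2020-01-01'   -- first day of 2020, exclusive
GROUP BY Matches.season_id;
```

Using an exclusive upper bound keeps every moment of 31 December 2019 inside the range, even if the column also stores a time of day.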


## Fix the Aggregate Function Query

You've become familiar with using the SUM function in SQL to perform aggregate calculations. However, the query below is supposed to calculate the total number of events for each season, but it has a small error that needs to be fixed. Your task is to identify and fix the error in the provided SQL query.

```sql
SELECT Matches.season_id, SUM() AS TotalEvents
FROM Matches
JOIN MatchEvents ON Matches.match_id = MatchEvents.match_id
GROUP BY Matches.season_id;
```

The error in the provided SQL query is that the `SUM()` function is missing an argument. The `SUM()` function requires a numeric expression to sum over. In this context, you likely want to count the number of events, which can be done by summing a constant value of 1 for each row in the `MatchEvents` table that corresponds to a match.

Here's the corrected query:

```sql
SELECT Matches.season_id, SUM(1) AS TotalEvents
FROM Matches
JOIN MatchEvents ON Matches.match_id = MatchEvents.match_id
GROUP BY Matches.season_id;
```

This query will calculate the total number of events for each `season_id` by summing up the value `1` for each event associated with a match.

## Summarize Events by Club

In this task, you will work on a partial SQL query to calculate the total number of events for each club using the SUM function. Your task is to fill in the missing parts to complete the query.

```sql
-- TODO: Complete the SQL query to calculate total events for each club
SELECT Matches.________, ________ AS TotalEvents
FROM Matches
JOIN MatchEvents ON Matches.________ = MatchEvents.________
GROUP BY Matches.club_id;
```

To complete the SQL query to calculate the total number of events for each club, you need to fill in the missing parts with the appropriate column names. Assuming that the `club_id` is the identifier for clubs in the `Matches` table, and you want to count each event, you can use the `SUM(1)` to count each event associated with a club. Here's how you can complete the query:

```sql
-- Total events for each club
SELECT Matches.club_id, SUM(1) AS TotalEvents
FROM Matches
JOIN MatchEvents ON Matches.match_id = MatchEvents.match_id
GROUP BY Matches.club_id;
```

This query will calculate the total number of events for each club by summing up the value `1` for each event associated with a match, grouped by `club_id`. Make sure that `club_id` is the correct column representing clubs in your `Matches` table. If your schema uses a different column name for clubs, replace `club_id` accordingly.

## Mastering SQL Aggregate Functions
Congratulations on making it this far! Let's put your knowledge into action.

Write a complete SQL query that calculates the total number of events for each season using the SUM function.

```sql
-- TODO: Write a SQL query to calculate the total number of events for each season
```

To calculate the total number of events for each season using the SUM function, you can write a similar SQL query. Here’s how to do it:

```sql
-- Total events for each season
SELECT Matches.season_id, SUM(1) AS TotalEvents
FROM Matches
JOIN MatchEvents ON Matches.match_id = MatchEvents.match_id
GROUP BY Matches.season_id;
```

### Explanation:
- **SELECT Matches.season_id**: We select the `season_id` column from the `Matches` table so that each result row is labeled with its season.
- **SUM(1) AS TotalEvents**: We use `SUM(1)` to count each event, adding up each occurrence to get the total number of events.
- **JOIN MatchEvents ON Matches.match_id = MatchEvents.match_id**: We join the `Matches` table with the `MatchEvents` table using the `match_id` as a common key to connect the tables.
- **GROUP BY Matches.season_id**: We group the results by the `season_id` column to aggregate the events for each season.

This query will give you the total number of events grouped by each season.