# Lesson 2: Utilizing DISTINCT for Uniqueness Exploration

## Introduction to DISTINCT
Hello again! I hope you enjoyed the previous unit on understanding and using the **COUNT** function. Today, we're going to expand your SQL knowledge further by introducing a new concept: the **DISTINCT** keyword. **DISTINCT** is not a function; it's a keyword that you place right after **SELECT**, and it will help you strengthen your data querying skills.

When handling data, especially large datasets, we often encounter duplicate values. The **DISTINCT** keyword removes these duplicates from a query's results and presents a clean list of unique values. The stored data itself is left unchanged. Isn't that handy?

This concept is common in many areas of life. Imagine you're trying to create a list of all the cities your friends come from. If three friends come from New York, should we list New York three times? Of course not! We want distinct, or unique, values, and SQL provides the **DISTINCT** keyword to do just that with our data.

As we learn SQL using Messi's matches, we'll use **DISTINCT** to explore the unique seasons and clubs that appear in his match history.

---

## The Need for DISTINCT in Our Database
Given Lionel Messi's rich game history, **DISTINCT** is quite handy. We can use it to identify unique seasons and much more, which will broaden our understanding of his match data.

Let's first learn the format of a basic SQL query that uses **DISTINCT**:

```sql
SELECT DISTINCT column_name FROM table_name;
```

It's time to apply this format to our Lionel Messi matches dataset. Consider the following statement:

```sql
SELECT DISTINCT season_id FROM Matches;
```

### Sneak peek of the output:
```
 season_id
-----------
        1
        2
```

This query fetches every distinct **season_id** value from the **Matches** table. A plain `SELECT season_id` returns one value per row of the table, so the same ID can appear many times; with **DISTINCT**, each ID appears only once.
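To try this without touching the course database, here is a minimal sketch that builds its own scratch table. The table name `demo_matches` and its rows are hypothetical and made up for illustration; only the `season_id` column name comes from the lesson. It assumes a database that supports `CREATE TEMPORARY TABLE` and multi-row `INSERT`, such as SQLite, PostgreSQL, or MySQL.

```sql
-- Hypothetical scratch table: the rows below are invented for illustration.
CREATE TEMPORARY TABLE demo_matches (
    match_id  INTEGER PRIMARY KEY,
    season_id INTEGER NOT NULL
);

INSERT INTO demo_matches (match_id, season_id) VALUES
    (1, 1),
    (2, 1),
    (3, 2),
    (4, 2),
    (5, 2);

-- Five rows go in, but only two different season_id values exist,
-- so this returns two rows: 1 and 2.
SELECT DISTINCT season_id
FROM demo_matches
ORDER BY season_id;
```

The `ORDER BY` is there only to make the row order predictable; without it, SQL does not guarantee the order in which the unique values come back.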

---

## More Applications of DISTINCT
Let's look at a few more examples using the **DISTINCT** keyword.

What if we want to know all the unique clubs Messi has played for? Simple! We would run:

```sql
SELECT DISTINCT club_id FROM Matches;
```

### Output:
```
 club_id
---------
      1
      2
```

When using **DISTINCT**, remember that the database has to compare rows to find duplicates, typically by sorting or hashing them. On large datasets this extra work can slow down your queries, so consider the performance implications and use **DISTINCT** only when you actually need unique values.
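One way to judge whether **DISTINCT** is needed is to compare the total row count with the count of unique values, which builds on the **COUNT** function from the previous unit. The sketch below uses a hypothetical scratch table, `demo_club_matches`, with invented rows, and assumes a database that supports temporary tables and `COUNT(DISTINCT ...)`.

```sql
-- Hypothetical scratch table: the rows below are invented for illustration.
CREATE TEMPORARY TABLE demo_club_matches (
    match_id INTEGER PRIMARY KEY,
    club_id  INTEGER NOT NULL
);

INSERT INTO demo_club_matches (match_id, club_id) VALUES
    (1, 1),
    (2, 1),
    (3, 1),
    (4, 2);

-- club_id repeats, so the two counts differ (4 rows, 2 unique clubs):
-- DISTINCT changes the result here.
SELECT COUNT(*)                AS total_rows,
       COUNT(DISTINCT club_id) AS unique_clubs
FROM demo_club_matches;

-- match_id is the primary key, so every value is already unique
-- (4 rows, 4 unique ids): DISTINCT would add work without changing anything.
SELECT COUNT(*)                 AS total_rows,
       COUNT(DISTINCT match_id) AS unique_ids
FROM demo_club_matches;
```

When the two counts match, the column has no duplicates and **DISTINCT** can be left out.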

---

## Lesson Wrap-Up and Practice Preview
That wraps up our introduction to the **DISTINCT** keyword in SQL! Well done on expanding your SQL toolkit. Today, you learned to use **DISTINCT** to fetch unique values from a database, which is handy in many data analysis scenarios. We also explored how **DISTINCT** can be applied to Lionel Messi's data for deeper insights.

Now, it's time to apply your newfound knowledge in practice! Next up, you'll be tackling some hands-on exercises using **DISTINCT** with the Lionel Messi dataset. Happy coding!

## Extracting Season IDs Query

Let's start by extracting every season_id from the Matches table. This query will showcase the distribution of matches across various seasons.

-- TODO: Select all seasons from Matches

To extract every `season_id` from the `Matches` table, use a `SELECT` statement. A plain `SELECT season_id` returns one value per row, which shows how matches are distributed across seasons. If you want each `season_id` listed only once, add the `DISTINCT` keyword:

```sql
-- Select all unique season_id values from Matches
SELECT DISTINCT season_id FROM Matches;
```

This query will return a list of unique `season_id` values from the `Matches` table, allowing you to see which seasons have recorded matches.
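The difference between "every `season_id`" and "every unique `season_id`" is easiest to see side by side. This is a sketch on a hypothetical scratch table, `demo_season_rows`, with invented rows; it assumes a database that supports temporary tables.

```sql
-- Hypothetical scratch table: the rows below are invented for illustration.
CREATE TEMPORARY TABLE demo_season_rows (
    match_id  INTEGER PRIMARY KEY,
    season_id INTEGER NOT NULL
);

INSERT INTO demo_season_rows (match_id, season_id) VALUES
    (1, 1),
    (2, 1),
    (3, 2);

-- Without DISTINCT: one row per match, so this returns 1, 1, 2.
SELECT season_id
FROM demo_season_rows
ORDER BY season_id;

-- With DISTINCT: one row per unique value, so this returns 1, 2.
SELECT DISTINCT season_id
FROM demo_season_rows
ORDER BY season_id;
```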

## Finding Unique Season IDs

Let's start by extracting every season_id from the Matches table. This query will showcase the distribution of matches across various seasons.

Good try, but it seems like your solution isn't quite there yet. To get unique season_id values, you'll need to use the DISTINCT keyword. Would you like some help with that? 🤔

-- TODO: Select all seasons from Matches

Sure! To find unique `season_id` values from the `Matches` table, you can use the `DISTINCT` keyword in your SQL query. Here's how you can do it:

```sql
SELECT DISTINCT season_id
FROM Matches;
```

This query will return a list of unique `season_id` values, giving you a clear overview of the different seasons represented in the `Matches` table. Let me know if you need further assistance! 😊

## Fix the Unique Season Query

Great job on the previous task! This time, we'd like you to fetch a list of all unique seasons. Please modify the provided starter SQL code to select distinct season_id from the Matches table.

-- TODO: Select distinct seasons
SELECT DISTINCT competition_id
FROM Matches;

You need to update the query to select the unique `season_id` instead of `competition_id`. Here's the corrected query:

```sql
-- Select distinct seasons
SELECT DISTINCT season_id
FROM Matches;
```

This will fetch a list of all unique `season_id` values from the `Matches` table. Let me know if you have any more queries or need further adjustments! 😊
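The column you list after `DISTINCT` decides what "unique" means, which is why swapping `competition_id` for `season_id` changes the answer. The sketch below illustrates this on a hypothetical scratch table, `demo_competitions`, with invented rows; it also shows that listing both columns returns unique combinations rather than unique values of each column. It assumes a database that supports temporary tables.

```sql
-- Hypothetical scratch table: the rows below are invented for illustration.
CREATE TEMPORARY TABLE demo_competitions (
    match_id       INTEGER PRIMARY KEY,
    season_id      INTEGER NOT NULL,
    competition_id INTEGER NOT NULL
);

INSERT INTO demo_competitions (match_id, season_id, competition_id) VALUES
    (1, 1, 10),
    (2, 1, 10),
    (3, 1, 20),
    (4, 2, 10);

-- Unique seasons: returns 1, 2.
SELECT DISTINCT season_id
FROM demo_competitions
ORDER BY season_id;

-- Unique competitions: returns 10, 20.
SELECT DISTINCT competition_id
FROM demo_competitions
ORDER BY competition_id;

-- Unique (season, competition) pairs: returns (1, 10), (1, 20), (2, 10).
SELECT DISTINCT season_id, competition_id
FROM demo_competitions
ORDER BY season_id, competition_id;
```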

## Unique Match Venues Retrieval

We have practiced using the DISTINCT keyword in SQL to retrieve unique values from a specific column. However, there seems to be an issue with the given SQL query. It is intended to fetch all the unique season_id entries from the Matches table, but it might not work as expected due to a typo. Your task is to find and fix this error in the query.

-- Select all unique season IDs
SELECT DISTINCT season_id FROM Match;

The issue in the given query is the table name: `Match` should be `Matches`. Here's the fixed query:

```sql
-- Select all unique season IDs
SELECT DISTINCT season_id
FROM Matches;
```

This correction ensures the query works properly by referencing the correct table. Let me know if you need any more help! 😊

## Unique Season Identification Task

Awesome work so far! Now, we're going to modify an existing SQL query to retrieve all distinct seasons for a particular venue. In our Lionel Messi dataset, your task is to retrieve all unique seasons that include away matches, that is, rows where venue = 'A'. Please edit the given starter query to accomplish this task.

-- TODO: Fetch all unique seasons from a specific venue
SELECT ________ ________.season_id
FROM Matches
WHERE Matches.venue = 'A';

To retrieve all unique seasons from the given venue, you can use the `DISTINCT` keyword in the `SELECT` statement. Here’s how you can modify the query:

```sql
-- Fetch all unique seasons from a specific venue
SELECT DISTINCT Matches.season_id
FROM Matches
WHERE Matches.venue = 'A';
```

### Explanation:
- `SELECT DISTINCT`: Ensures that the query returns only unique `season_id` values.
- `Matches.season_id`: Specifies the column from which to retrieve unique season IDs.
- `WHERE Matches.venue = 'A'`: Filters the data to include only away matches, where `venue` is `'A'`.

This will return a list of all unique `season_id` values for matches where `venue` is `'A'`.
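To see how the filter and `DISTINCT` work together, here is a sketch on a hypothetical scratch table, `demo_venues`, with invented rows. The value `'H'` for home matches is an assumption; the lesson only mentions `'A'`. It assumes a database that supports temporary tables.

```sql
-- Hypothetical scratch table: the rows below are invented for illustration.
-- 'A' marks an away match as in the lesson; 'H' for home is an assumption.
CREATE TEMPORARY TABLE demo_venues (
    match_id  INTEGER PRIMARY KEY,
    season_id INTEGER NOT NULL,
    venue     CHAR(1) NOT NULL
);

INSERT INTO demo_venues (match_id, season_id, venue) VALUES
    (1, 1, 'H'),
    (2, 1, 'A'),
    (3, 1, 'A'),
    (4, 2, 'H'),
    (5, 3, 'A');

-- WHERE keeps only the away rows (matches 2, 3, and 5); DISTINCT then
-- collapses the repeated season 1, so this returns 1 and 3.
-- Season 2 is missing because it has no away match in this sample.
SELECT DISTINCT demo_venues.season_id
FROM demo_venues
WHERE demo_venues.venue = 'A'
ORDER BY demo_venues.season_id;
```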

Great job on learning how to use the DISTINCT keyword! Now, it's time to apply what you've learned. In this task, we'll write an SQL query to identify the unique season IDs from the Matches table. This will help us understand which unique seasons are recorded in the data.

Write an SQL query to fetch all distinct season IDs from the Matches table.

-- TODO: Write an SQL query to select all unique season IDs

You can use the `DISTINCT` keyword to retrieve all unique season IDs from the `Matches` table. Here’s how to write the query:

```sql
-- Fetch all unique season IDs from the Matches table
SELECT DISTINCT season_id
FROM Matches;
```

### Explanation:
- `SELECT DISTINCT season_id`: The `DISTINCT` keyword ensures that only unique `season_id` values are returned.
- `FROM Matches`: Specifies the `Matches` table as the data source. 

This query will give you a list of all distinct season IDs from the `Matches` table.