
Filtering rows

This chapter builds on the first by teaching you how to filter tables for rows satisfying some criteria of interest. You'll learn how to use basic comparison operators, combine multiple criteria, match patterns in text, and much more.

# Filtering results

Congrats on finishing the first chapter! You now know how to select columns and perform basic counts. This chapter will focus on filtering your results.

In SQL, the WHERE keyword allows you to filter based on both text and numeric values in a table. There are a few different comparison operators you can use:

```
    = equal
    <> not equal
    < less than
    > greater than
    <= less than or equal to
    >= greater than or equal to
```

For example, you can filter text records such as title. The following code returns all films with the title 'Metropolis':

```
SELECT title
FROM films
WHERE title = 'Metropolis';
```

Notice that the WHERE clause always comes after the FROM clause!

**Note that in this course we will use <> and not != for the not equal operator, as per the SQL standard.**
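To try the comparison operators without a database server, here is a minimal sketch using Python's built-in `sqlite3` module. The three-row `films` table and the `titles` helper are assumptions made for illustration, not the course's dataset, and SQLite stands in for PostgreSQL; these basic operators behave the same way in both.

```python
import sqlite3

# Hypothetical three-row stand-in for the course's films table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("Metropolis", 1927), ("Psycho", 1960), ("Tango", 1998)],
)

def titles(where):
    """Return the sorted titles of rows matching a fixed WHERE condition."""
    rows = conn.execute(f"SELECT title FROM films WHERE {where}")
    return sorted(title for (title,) in rows)

equal = titles("title = 'Metropolis'")       # exact text match
not_equal = titles("title <> 'Metropolis'")  # every other title
later = titles("release_year >= 1960")       # 1960 itself is included

print(equal, not_equal, later)
```

The helper pastes the condition into the query text, which is acceptable only because the conditions here are fixed strings written in the example itself.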

What does the following query return?

```
SELECT title
FROM films
WHERE release_year > 2000;
```

Possible Answers

1. Films released before the year 2000

2. Films released after the year 2000 - Correct!

3. Films released after the year 2001

4. Films released in 2000

# Simple filtering of numeric values

As you learned in the previous exercise, the WHERE clause can also be used to filter numeric records, such as years or ages.

For example, the following query selects all details for films with a budget over ten thousand dollars:

```
SELECT *
FROM films
WHERE budget > 10000;
```

Now it's your turn to use the WHERE clause to filter numeric values!

Instructions

1. Get all details for all films released in 2016.

In [None]:
SELECT * FROM films WHERE release_year = 2016;

'''
id	title	release_year	country	duration	language	certification	gross	budget
4821	10 Cloverfield Lane	2016	USA	104	English	PG-13	71897215	15000000
4822	13 Hours	2016	USA	144	English	R	52822418	50000000
4823	A Beginner's Guide to Snuff	2016	USA	87	English	null	null	null
...
'''

2. Get the number of films released before 2000.

In [None]:
SELECT COUNT(*) FROM films WHERE release_year < 2000;

'''
count
1337
'''

3. Get the title and release year of films released after 2000.

In [None]:
SELECT title, release_year FROM films WHERE release_year > 2000;

'''
title	release_year
15 Minutes	2001
3000 Miles to Graceland	2001
A Beautiful Mind	2001
...
'''

Conclusion

Great job! Now that you've filtered numeric values, it's time to explore filtering text!

# Simple filtering of text

Remember, the WHERE clause can also be used to filter text results, such as names or countries.

For example, this query gets the titles of all films which were filmed in China:

```
SELECT title
FROM films
WHERE country = 'China';
```

Now it's your turn to practice using WHERE with text values!

**Important: in PostgreSQL (the version of SQL we're using), text values in a WHERE clause must be written in single quotes. Double quotes are reserved for identifiers such as column names.**
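As a rough illustration of text filtering, the sketch below runs three queries against an in-memory SQLite database through Python's `sqlite3` module. The table contents and the `titles` helper are assumptions for the example. It shows a single-quoted literal, that `=` compares text case-sensitively, and that an apostrophe inside a literal is written as two single quotes.

```python
import sqlite3

# Hypothetical three-row stand-in for the films table, using in-memory SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [
        ("Pierrot le Fou", "France"),
        ("Rosemary's Baby", "USA"),
        ("A Fistful of Dollars", "Italy"),
    ],
)

def titles(where):
    """Return the sorted titles of rows matching a fixed WHERE condition."""
    rows = conn.execute(f"SELECT title FROM films WHERE {where}")
    return sorted(title for (title,) in rows)

french = titles("country = 'France'")              # single-quoted text literal
lowercase = titles("country = 'france'")           # no match: case matters
apostrophe = titles("title = 'Rosemary''s Baby'")  # '' means one apostrophe

print(french, lowercase, apostrophe)
```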

Instructions

1. Get all details for all French language films.

In [None]:
SELECT * FROM films WHERE language = 'French';

'''
id	title	release_year	country	duration	language	certification	gross	budget
108	Une Femme Mariée	1964	France	94	French	null	null	120000
111	Pierrot le Fou	1965	France	110	French	Not Rated	null	300000
140	Mississippi Mermaid	1969	France	123	French	R	26893	1600000
...
'''

2. Get the name and birth date of the person born on November 11th, 1974. Remember to use ISO date format ('1974-11-11')!

In [None]:
SELECT name, birthdate FROM people WHERE birthdate = '1974-11-11';

'''
name	birthdate
Leonardo DiCaprio	1974-11-11
'''

3. Get the number of Hindi language films.

In [None]:
SELECT COUNT(*) FROM films WHERE language = 'Hindi';

'''
count
28
'''

4. Get all details for all films with an R certification.

In [None]:
SELECT * FROM films WHERE certification = 'R';

'''
id	title	release_year	country	duration	language	certification	gross	budget
76	Psycho	1960	USA	108	English	R	32000000	806947
99	A Fistful of Dollars	1964	Italy	99	Italian	R	3500000	200000
134	Rosemary's Baby	1968	USA	136	English	R	null	2300000
...
'''

Conclusion

Wonderful! Let's look at combining different conditions now!

# WHERE AND

Often, you'll want to select data based on multiple conditions. You can build up your WHERE queries by combining multiple conditions with the AND keyword.

For example,

```
SELECT title
FROM films
WHERE release_year > 1994
AND release_year < 2000;
```

gives you the titles of films released after 1994 and before 2000, that is, from 1995 through 1999. The boundary years themselves are excluded.

Note that you need to specify the column name separately for every AND condition, so the following would be invalid:

```
SELECT title
FROM films
WHERE release_year > 1994 AND < 2000;
```

You can add as many AND conditions as you need!
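The following sketch, which assumes a made-up four-row `films` table in an in-memory SQLite database, checks both points above: the strict comparisons leave out the boundary years, and the shorthand without a repeated column name is rejected as a syntax error (SQLite reports it as `sqlite3.OperationalError`; PostgreSQL rejects it too, with its own error message).

```python
import sqlite3

# Hypothetical films, one per year of interest.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("Film A", 1994), ("Film B", 1995), ("Film C", 1999), ("Film D", 2000)],
)

# Each AND condition names the column again.
rows = conn.execute(
    "SELECT release_year FROM films "
    "WHERE release_year > 1994 AND release_year < 2000"
)
years = sorted(year for (year,) in rows)

# Leaving the column out of the second condition is a syntax error.
try:
    conn.execute("SELECT title FROM films WHERE release_year > 1994 AND < 2000")
    shorthand_rejected = False
except sqlite3.OperationalError:
    shorthand_rejected = True

print(years, shorthand_rejected)
```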

Instructions

1. Get the title and release year for all Spanish language films released before 2000.

In [None]:
SELECT title, release_year FROM films WHERE language = 'Spanish' AND release_year < 2000;

'''
title	release_year
El Mariachi	1992
La otra conquista	1998
Tango	1998
Showing 3 out of 3 rows
'''

2. Get all details for Spanish language films released after 2000.

In [None]:
SELECT * FROM films WHERE language = 'Spanish' AND release_year > 2000;

'''
id	title	release_year	country	duration	language	certification	gross	budget
1695	Y Tu Mamá También	2001	Mexico	106	Spanish	R	13622333	2000000
1757	El crimen del padre Amaro	2002	Mexico	118	Spanish	R	5709616	1800000
1807	Mondays in the Sun	2002	Spain	113	Spanish	R	146402	4000000
...
'''

3. Get all details for Spanish language films released after 2000, but before 2010.

In [None]:
SELECT * FROM films WHERE language = 'Spanish' AND release_year > 2000 AND release_year < 2010;

'''
id	title	release_year	country	duration	language	certification	gross	budget
1695	Y Tu Mamá También	2001	Mexico	106	Spanish	R	13622333	2000000
1757	El crimen del padre Amaro	2002	Mexico	118	Spanish	R	5709616	1800000
1807	Mondays in the Sun	2002	Spain	113	Spanish	R	146402	4000000
...
'''

Conclusion

Great work! Being able to combine conditions with AND will prove to be very useful if you only want your query to return a specific subset of records!

# WHERE AND OR

What if you want to select rows based on multiple conditions where some but not all of the conditions need to be met? For this, SQL has the OR operator.

For example, the following returns all films released in either 1994 or 2000:

```
SELECT title
FROM films
WHERE release_year = 1994
OR release_year = 2000;
```

Note that you need to specify the column for every OR condition, so the following is invalid:

```
SELECT title
FROM films
WHERE release_year = 1994 OR 2000;
```

When combining AND and OR, be sure to enclose the individual clauses in parentheses, like so:

```
SELECT title
FROM films
WHERE (release_year = 1994 OR release_year = 1995)
AND (certification = 'PG' OR certification = 'R');
```

Otherwise, because SQL evaluates AND before OR, you may not get the results you're expecting!
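To see what the parentheses change, here is a small sketch over an invented five-row `films` table in in-memory SQLite (titles and data are hypothetical). Without parentheses, the condition is read as `release_year = 1994 OR (release_year = 1995 AND certification = 'PG') OR certification = 'R'`, so extra rows slip through.

```python
import sqlite3

# Hypothetical data chosen so the two queries disagree.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE films (title TEXT, release_year INTEGER, certification TEXT)"
)
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [
        ("Film A", 1994, "PG"),
        ("Film B", 1994, "G"),
        ("Film C", 1995, "R"),
        ("Film D", 1995, "G"),
        ("Film E", 2000, "R"),
    ],
)

def titles(where):
    """Return the sorted titles of rows matching a fixed WHERE condition."""
    rows = conn.execute(f"SELECT title FROM films WHERE {where}")
    return sorted(title for (title,) in rows)

with_parens = titles(
    "(release_year = 1994 OR release_year = 1995) "
    "AND (certification = 'PG' OR certification = 'R')"
)
without_parens = titles(
    "release_year = 1994 OR release_year = 1995 "
    "AND certification = 'PG' OR certification = 'R'"
)

print(with_parens)     # only 1994/1995 films rated PG or R
print(without_parens)  # also lets in a G-rated 1994 film and an R-rated 2000 film
```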

What does the OR operator do?

Possible Answers

1. Display only rows that meet at least one of the specified conditions.
 - Correct!
 
2. Display only rows that meet all of the specified conditions.
 - Incorrect. That describes AND. OR only requires at least one of the specified conditions to be met.

3. Display only rows that meet none of the specified conditions.
 - Incorrect. OR does not display rows that meet none of the specified conditions.


# WHERE AND OR (2)

You now know how to select rows that meet some but not all conditions by combining AND and OR.

For example, the following query selects all films that were released in 1994 or 1995 and have a certification of PG or R.

```
SELECT title
FROM films
WHERE (release_year = 1994 OR release_year = 1995)
AND (certification = 'PG' OR certification = 'R');
```

Now you'll write a query to get the title and release year of films released in the 90s which were in French or Spanish and which took in more than $2M gross.

It looks like a lot, but you can build the query up one step at a time to get comfortable with the underlying concept in each step. Let's go!

Instructions

1. Get the title and release year for films released in the 90s.

In [None]:
SELECT title, release_year
FROM films
WHERE release_year >= 1990
AND release_year < 2000;

'''
title	release_year
Arachnophobia	1990
Back to the Future Part III	1990
Child's Play 2	1990
...
'''

2. Now, build on your query to filter the records to only include French or Spanish language films.

In [None]:
SELECT title, release_year
FROM films
WHERE (release_year >= 1990 AND release_year < 2000)
AND (language = 'French' OR language = 'Spanish');

'''
title	release_year
El Mariachi	1992
Les visiteurs	1993
The Horseman on the Roof	1995
When the Cat's Away	1996
The Chambermaid on the Titanic	1997
The Swindle	1997
La otra conquista	1998
Les couloirs du temps: Les visiteurs II	1998
Tango	1998
The Red Violin	1998
Showing 10 out of 10 rows
'''

3. Finally, restrict the query to only return films that took in more than $2M gross.

In [None]:
SELECT title, release_year
FROM films
WHERE (release_year >= 1990 AND release_year < 2000)
AND (language = 'French' OR language = 'Spanish')
AND gross > 2000000;

'''
title	release_year
El Mariachi	1992
The Red Violin	1998
Showing 2 out of 2 rows
'''

# BETWEEN

As you've learned, you can use the following query to get titles of all films released in and between 1994 and 2000:

```
SELECT title
FROM films
WHERE release_year >= 1994
AND release_year <= 2000;
```

Checking for ranges like this is very common, so in SQL the BETWEEN keyword provides a useful shorthand for filtering values within a specified range. This query is equivalent to the one above:

```
SELECT title
FROM films
WHERE release_year
BETWEEN 1994 AND 2000;
```

It's important to remember that BETWEEN is inclusive, meaning the beginning and end values are included in the results!
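A quick way to convince yourself that BETWEEN includes both endpoints is to compare it with the `>=`/`<=` form on a few rows. The sketch below assumes a hypothetical `films` table in in-memory SQLite with one film per listed year.

```python
import sqlite3

# Hypothetical films: two years outside the range, three inside or on its edges.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [(f"Film {year}", year) for year in (1993, 1994, 1997, 2000, 2001)],
)

def years(where):
    """Return the sorted release years of rows matching a fixed WHERE condition."""
    rows = conn.execute(f"SELECT release_year FROM films WHERE {where}")
    return sorted(year for (year,) in rows)

with_between = years("release_year BETWEEN 1994 AND 2000")
with_comparisons = years("release_year >= 1994 AND release_year <= 2000")

print(with_between, with_comparisons)
```

Both lists contain 1994 and 2000, the two boundary years.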

What does the BETWEEN keyword do?

Possible Answers

1. Filter numeric values - Incorrect. BETWEEN does not just filter numeric values.

2. Filter text values - Incorrect. BETWEEN does not just filter text values.

3. Filter values in a specified list - Incorrect!

4. Filter values in a specified range - Correct!

# BETWEEN (2)

Like the comparison operators you've already seen, BETWEEN can be combined with other conditions using AND and OR, so you can build up your queries and make them even more powerful!

For example, suppose we have a table called kids. We can get the names of all kids between the ages of 2 and 12 from the United States:

```
SELECT name
FROM kids
WHERE age BETWEEN 2 AND 12
AND nationality = 'USA';
```

Have a go at using BETWEEN with AND on the films data to get the title and release year of all Spanish language films released between 1990 and 2000 (inclusive) with budgets over $100 million. We have broken the problem into smaller steps so that you can build the query as you go along!

Instructions

1. Get the title and release year of all films released between 1990 and 2000 (inclusive).

In [None]:
SELECT title, release_year
FROM films
WHERE release_year
BETWEEN 1990 AND 2000;

'''
title	release_year
Arachnophobia	1990
Back to the Future Part III	1990
Child's Play 2	1990
...
'''

2. Now, build on your previous query to select only films that have budgets over $100 million.

In [None]:
SELECT title, release_year
FROM films
WHERE release_year BETWEEN 1990 AND 2000
AND budget > 100000000;

'''
title	release_year
Terminator 2: Judgment Day	1991
True Lies	1994
Waterworld	1995
...
'''

3. Now restrict the query to only return Spanish language films.

In [None]:
SELECT title, release_year
FROM films
WHERE release_year BETWEEN 1990 AND 2000
AND budget > 100000000
AND language = 'Spanish';

'''
title	release_year
Tango	1998
'''

4. Finally, modify your previous query to include all Spanish language or French language films with the same criteria as before. Don't forget your parentheses!

In [None]:
SELECT title, release_year
FROM films
WHERE release_year BETWEEN 1990 AND 2000
AND budget > 100000000
AND (language = 'Spanish' OR language = 'French');

'''
title	release_year
Les couloirs du temps: Les visiteurs II	1998
Tango	1998
'''

Conclusion

Well done! Off to the next filtering operator!

# WHERE IN

As you've seen, WHERE is very useful for filtering results. However, if you want to filter based on many conditions, WHERE can get unwieldy. For example:

```
SELECT name
FROM kids
WHERE age = 2
OR age = 4
OR age = 6
OR age = 8
OR age = 10;
```

Enter the IN operator! The IN operator allows you to specify multiple values in a WHERE clause, making it easier and quicker to specify multiple OR conditions! Neat, right?

So, the above example would become simply:

```
SELECT name
FROM kids
WHERE age IN (2, 4, 6, 8, 10);
```
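The equivalence between IN and a chain of OR conditions can be checked with a short sketch. The `kids` rows below are invented for illustration, and the database is in-memory SQLite rather than PostgreSQL.

```python
import sqlite3

# Hypothetical kids table; names and ages are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kids (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO kids VALUES (?, ?)",
    [("Ana", 2), ("Ben", 3), ("Cy", 6), ("Di", 11)],
)

def names(where):
    """Return the sorted names of rows matching a fixed WHERE condition."""
    rows = conn.execute(f"SELECT name FROM kids WHERE {where}")
    return sorted(name for (name,) in rows)

with_in = names("age IN (2, 4, 6, 8, 10)")
with_or = names("age = 2 OR age = 4 OR age = 6 OR age = 8 OR age = 10")

print(with_in, with_or)
```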

Try using the IN operator yourself!

Instructions

1. Get the title and release year of all films released in 1990 or 2000 that were longer than two hours. Remember, duration is in minutes!

In [None]:
SELECT title, release_year
FROM films
WHERE release_year IN (1990, 2000)
AND duration > 120;

'''
title	release_year
Dances with Wolves	1990
Die Hard 2	1990
Ghost	1990
'''

2. Get the title and language of all films which were in English, Spanish, or French.

In [None]:
SELECT title, language
FROM films
WHERE language IN ('English', 'Spanish', 'French');

'''
title	language
The Broadway Melody	English
Hell's Angels	English
A Farewell to Arms	English
'''

3. Get the title and certification of all films with an NC-17 or R certification.

In [None]:
SELECT title, certification
FROM films
WHERE certification IN ('NC-17', 'R');

'''
title	certification
Psycho	R
A Fistful of Dollars	R
Rosemary's Baby	R
'''

Conclusion

Your SQL vocabulary is growing by the minute!

# Introduction to NULL and IS NULL

In SQL, NULL represents a missing or unknown value. You can check for NULL values using the expression IS NULL. For example, to count the number of missing birth dates in the people table:

```
SELECT COUNT(*)
FROM people
WHERE birthdate IS NULL;
```

As you can see, IS NULL is useful when combined with WHERE to figure out what data you're missing.

Sometimes, you'll want to filter out missing values so you only get results which are not NULL. To do this, you can use the IS NOT NULL operator.

For example, this query gives the names of all people whose birth dates are not missing in the people table.

```
SELECT name
FROM people
WHERE birthdate IS NOT NULL;
```
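One detail worth trying for yourself: comparing a column to NULL with `=` never matches anything, which is why IS NULL exists. The sketch below uses three rows taken from the outputs in this chapter, loaded into an in-memory SQLite database; the table layout is simplified for the example.

```python
import sqlite3

# Simplified people table; Python's None is stored as SQL NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, birthdate TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [
        ("50 Cent", "1975-07-06"),
        ("A. Raven Cruz", None),
        ("Leonardo DiCaprio", "1974-11-11"),
    ],
)

def count(where):
    """Return COUNT(*) for rows matching a fixed WHERE condition."""
    return conn.execute(f"SELECT COUNT(*) FROM people WHERE {where}").fetchone()[0]

missing = count("birthdate IS NULL")     # the one row without a birth date
equals_null = count("birthdate = NULL")  # a comparison with NULL is never true
known = count("birthdate IS NOT NULL")   # the remaining two rows

print(missing, equals_null, known)
```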

What does NULL represent?

Possible Answers

1. A corrupt entry - Incorrect. We cannot be sure that a NULL value is actually corrupt.

2. A missing value - Correct! NULL is used to represent unknown values.

3. An empty string - Incorrect. An empty string is not the same as a NULL value.

4. An invalid value - Incorrect!

# NULL and IS NULL

Now that you know what NULL is and what it's used for, it's time for some practice!

Instructions

1. Get the names of people who are still alive, i.e. whose death date is missing.

In [None]:
SELECT name
FROM people
WHERE deathdate IS NULL;

'''
name
50 Cent
A. Michael Baldwin
A. Raven Cruz
'''

2. Get the title of every film which doesn't have a budget associated with it.

In [None]:
SELECT title
FROM films
WHERE budget IS NULL;

'''
title
Pandora's Box
The Prisoner of Zenda
The Blue Bird
'''

3. Get the number of films which don't have a language associated with them.

In [None]:
SELECT COUNT(*)
FROM films
WHERE language IS NULL;

'''
count
11
'''

Conclusion

Alright! Are you ready for a last type of operator?

# LIKE and NOT LIKE

As you've seen, the WHERE clause can be used to filter text data. However, so far you've only been able to filter by specifying the exact text you're interested in. In the real world, often you'll want to search for a pattern rather than a specific text string.

In SQL, the LIKE operator can be used in a WHERE clause to search for a pattern in a column. To accomplish this, you use something called a wildcard as a placeholder for some other values. There are two wildcards you can use with LIKE:

The `%` wildcard will match zero, one, or many characters in text. For example, the following query matches companies like 'Data', 'DataC', 'DataCamp', 'DataMind', and so on:

```
SELECT name
FROM companies
WHERE name LIKE 'Data%';
```

The `_` wildcard will match a single character. For example, the following query matches companies like 'DataCamp', 'DataComp', and so on:

```
SELECT name
FROM companies
WHERE name LIKE 'DataC_mp';
```

You can also use the `NOT LIKE` operator to find records that don't match the pattern you specify.
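Here is a minimal sketch of the two wildcards and NOT LIKE, run against an invented `companies` table in in-memory SQLite. One caveat: SQLite's LIKE ignores case for ASCII letters by default, whereas PostgreSQL's LIKE is case-sensitive. The sample names are capitalized so that both systems would return the same rows.

```python
import sqlite3

# Hypothetical company names, capitalized to avoid case-sensitivity differences.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT)")
conn.executemany(
    "INSERT INTO companies VALUES (?)",
    [("Data",), ("DataCamp",), ("DataComp",), ("DataMind",), ("Metadata",)],
)

def names(where):
    """Return the sorted names of rows matching a fixed WHERE condition."""
    rows = conn.execute(f"SELECT name FROM companies WHERE {where}")
    return sorted(name for (name,) in rows)

prefix = names("name LIKE 'Data%'")          # % matches zero or more characters
one_char = names("name LIKE 'DataC_mp'")     # _ matches exactly one character
no_prefix = names("name NOT LIKE 'Data%'")   # everything that fails the pattern

print(prefix, one_char, no_prefix)
```

Note that 'Data' itself matches 'Data%' because `%` can stand for zero characters.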

Got it? Let's practice!

Instructions

1. Get the names of all people whose names begin with 'B'. The pattern you need is 'B%'.

In [None]:
SELECT name
FROM people
WHERE name LIKE 'B%';

'''
name
B.J. Novak
Babak Najafi
Babar Ahmed
'''

2. Get the names of people whose names have 'r' as the second letter. The pattern you need is '_r%'.

In [None]:
SELECT name
FROM people
WHERE name LIKE '_r%';

'''
name
Ara Celi
Aramis Knight
Arben Bajraktaraj
'''

3. Get the names of people whose names don't start with A. The pattern you need is 'A%'.

In [None]:
SELECT name
FROM people
WHERE name NOT LIKE 'A%';

Conclusion

This concludes the second chapter of the intro to SQL course. Rush over to chapter 3 if you want to learn more about aggregate functions!