# Filtering rows
This chapter builds on the first by teaching you how to filter tables for rows satisfying some criteria of interest. You'll learn how to use basic comparison operators, combine multiple criteria, match patterns in text, and much more.

In [1]:
from sqlalchemy import create_engine, inspect
import os

current_directory = os.getcwd()

%load_ext sql
%sql sqlite:///{current_directory}/films.db

## Filtering results
Congrats on finishing the first chapter! You now know how to select columns and perform basic counts. This chapter will focus on filtering your results.

In SQL, the `WHERE` keyword allows you to filter based on both text and numeric values in a table. There are a few different comparison operators you can use:

- `=` equal
- `<>` not equal
- `<` less than
- `>` greater than
- `<=` less than or equal to
- `>=` greater than or equal to

For example, you can filter text records such as `title`. The following code returns all films with the title `'Metropolis'`:
```sql
SELECT title
FROM films
WHERE title = 'Metropolis';
```
Notice that the `WHERE` clause always comes after the `FROM` clause!

**Note that in this course we will use `<>` and not `!=` for the not equal operator, as per the SQL standard.**
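
To see the comparison operators side by side, here is a minimal sketch that uses Python's built-in `sqlite3` module. The three-row in-memory `films` table and the `titles` helper are assumptions made for illustration; they stand in for the course's `films.db`.

```python
import sqlite3

# Hypothetical three-row stand-in for the course's films table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("Metropolis", 1927), ("Psycho", 1960), ("A Beautiful Mind", 2001)],
)

def titles(where_clause):
    """Return the titles matching a WHERE condition, sorted for stable output."""
    rows = conn.execute(
        f"SELECT title FROM films WHERE {where_clause} ORDER BY title"
    )
    return [title for (title,) in rows]

equal_text = titles("title = 'Metropolis'")   # text comparison
not_equal = titles("title <> 'Metropolis'")   # every other title
after_2000 = titles("release_year > 2000")    # numeric comparison
up_to_1960 = titles("release_year <= 1960")   # the boundary year is included

print(equal_text, not_equal, after_2000, up_to_1960)
```

The same operators work on text and numbers; only the literal changes (quoted for text, bare for numbers).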

What does the following query return?
```sql
SELECT title
FROM films
WHERE release_year > 2000;
```

In [2]:
%%sql
SELECT title
FROM films
WHERE release_year > 2000
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title
15 Minutes
3000 Miles to Graceland
A Beautiful Mind
A Knight's Tale
A.I. Artificial Intelligence


---

## Simple filtering of numeric values
As you learned in the previous exercise, the `WHERE` clause can also be used to filter numeric records, such as years or ages.

For example, the following query selects all details for films with a budget over ten thousand dollars:
```sql
SELECT *
FROM films
WHERE budget > 10000;
```
Now it's your turn to use the `WHERE` clause to filter numeric values!

In [3]:
%%sql
SELECT * 
FROM films 
WHERE release_year = 2016
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


id,title,release_year,country,duration,language,certification,gross,budget
4821,10 Cloverfield Lane,2016,USA,104,English,PG-13,71897215.0,15000000.0
4822,13 Hours,2016,USA,144,English,R,52822418.0,50000000.0
4823,A Beginner's Guide to Snuff,2016,USA,87,English,,,
4824,Airlift,2016,India,130,Hindi,,,4400000.0
4825,Alice Through the Looking Glass,2016,USA,113,English,PG,76846624.0,170000000.0


In [4]:
%%sql
SELECT COUNT(title) 
FROM films 
WHERE release_year < 2000;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


COUNT(title)
1337


In [5]:
%%sql
SELECT title, release_year 
FROM films 
WHERE release_year > 2000
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title,release_year
15 Minutes,2001
3000 Miles to Graceland,2001
A Beautiful Mind,2001
A Knight's Tale,2001
A.I. Artificial Intelligence,2001


---

## Simple filtering of text
Remember, the `WHERE` clause can also be used to filter text results, such as names or countries.

For example, this query gets the titles of all films which were filmed in China:
```sql
SELECT title
FROM films
WHERE country = 'China';
```
Now it's your turn to practice using `WHERE` with text values!

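**Important: always put text values in single quotes in a `WHERE` condition. PostgreSQL, the SQL dialect the course exercises were written for, requires this; the SQLite database used in this notebook is more lenient, but single quotes remain the standard and the safe choice.**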
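
Quoting is easiest to get wrong when the text itself contains an apostrophe. The sketch below, which uses Python's `sqlite3` module and a hypothetical two-row `films` table, shows a plain single-quoted literal, an apostrophe escaped by doubling it, and a bound parameter as an alternative when the query is issued from Python.

```python
import sqlite3

# Hypothetical sample rows; the real films table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("A Knight's Tale", "USA"), ("Hero", "China")],
)

# Single quotes delimit the string literal 'China'.
china = conn.execute(
    "SELECT title FROM films WHERE country = 'China'"
).fetchall()

# An apostrophe inside a literal is escaped by writing it twice.
escaped = conn.execute(
    "SELECT country FROM films WHERE title = 'A Knight''s Tale'"
).fetchall()

# A bound parameter (?) sidesteps quoting altogether.
bound = conn.execute(
    "SELECT country FROM films WHERE title = ?", ("A Knight's Tale",)
).fetchall()

print(china, escaped, bound)
```

Bound parameters are generally the safer habit whenever the value comes from user input rather than from a literal you typed yourself.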

In [6]:
%%sql
SELECT * 
FROM films 
WHERE language = 'French'
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


id,title,release_year,country,duration,language,certification,gross,budget
108,Une Femme Mariée,1964,France,94,French,,,120000
111,Pierrot le Fou,1965,France,110,French,Not Rated,,300000
140,Mississippi Mermaid,1969,France,123,French,R,26893.0,1600000
423,Subway,1985,France,98,French,R,,17000000
662,Les visiteurs,1993,France,107,French,R,700000.0,50000000


In [7]:
%%sql
SELECT name, birthdate 
FROM people 
WHERE birthdate = '1974-11-11';

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


name,birthdate
Leonardo DiCaprio,1974-11-11


In [8]:
%%sql
SELECT COUNT(title) 
FROM films 
WHERE language = 'Hindi';

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


COUNT(title)
28


In [9]:
%%sql
SELECT * 
FROM films 
WHERE certification = 'R'
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


id,title,release_year,country,duration,language,certification,gross,budget
76,Psycho,1960,USA,108,English,R,32000000.0,806947
99,A Fistful of Dollars,1964,Italy,99,Italian,R,3500000.0,200000
134,Rosemary's Baby,1968,USA,136,English,R,,2300000
140,Mississippi Mermaid,1969,France,123,French,R,26893.0,1600000
145,The Wild Bunch,1969,USA,144,English,R,,6244087


---

## WHERE AND
Often, you'll want to select data based on multiple conditions. You can build up your `WHERE` queries by combining multiple conditions with the `AND` keyword.

For example,
```sql
SELECT title
FROM films
WHERE release_year > 1994
AND release_year < 2000;
```
gives you the titles of films released after 1994 and before 2000, that is, from 1995 through 1999.

Note that you need to specify the column name separately for every `AND` condition, so the following would be invalid:

```sql
SELECT title
FROM films
WHERE release_year > 1994 AND < 2000;
```
You can add as many `AND` conditions as you need!
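
The following sketch illustrates these points with Python's `sqlite3` module and a hypothetical four-row `films` table: the strict `>` and `<` comparisons leave out both boundary years, a third `AND` narrows the result further, and the form that omits the second column name is rejected as a syntax error.

```python
import sqlite3

# Hypothetical rows chosen so that both boundary years are present.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE films (title TEXT, release_year INTEGER, language TEXT)"
)
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [
        ("Film 1994", 1994, "English"),
        ("Film 1997", 1997, "Spanish"),
        ("Film 1998", 1998, "English"),
        ("Film 2000", 2000, "Spanish"),
    ],
)

two_conditions = conn.execute(
    "SELECT title FROM films "
    "WHERE release_year > 1994 AND release_year < 2000 ORDER BY title"
).fetchall()

three_conditions = conn.execute(
    "SELECT title FROM films "
    "WHERE release_year > 1994 AND release_year < 2000 AND language = 'Spanish'"
).fetchall()

# Leaving out the column name in the second condition does not parse.
try:
    conn.execute("SELECT title FROM films WHERE release_year > 1994 AND < 2000")
    invalid_rejected = False
except sqlite3.OperationalError:
    invalid_rejected = True

print(two_conditions, three_conditions, invalid_rejected)
```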

In [10]:
%%sql
SELECT title, release_year 
FROM films 
WHERE language = 'Spanish'
AND release_year < 2000;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title,release_year
El Mariachi,1992
La otra conquista,1998
Tango,1998


In [11]:
%%sql
SELECT *
FROM films
WHERE language = 'Spanish'
AND release_year > 2000
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


id,title,release_year,country,duration,language,certification,gross,budget
1695,Y Tu Mamá También,2001,Mexico,106,Spanish,R,13622333.0,2000000
1757,El crimen del padre Amaro,2002,Mexico,118,Spanish,R,5709616.0,1800000
1807,Mondays in the Sun,2002,Spain,113,Spanish,R,146402.0,4000000
2173,Live-In Maid,2004,Argentina,83,Spanish,Unrated,,800000
2175,Maria Full of Grace,2004,Colombia,101,Spanish,R,6517198.0,3000000


In [12]:
%%sql
SELECT *
FROM films
WHERE language = 'Spanish'
AND release_year > 2000
AND release_year < 2010
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


id,title,release_year,country,duration,language,certification,gross,budget
1695,Y Tu Mamá También,2001,Mexico,106,Spanish,R,13622333.0,2000000
1757,El crimen del padre Amaro,2002,Mexico,118,Spanish,R,5709616.0,1800000
1807,Mondays in the Sun,2002,Spain,113,Spanish,R,146402.0,4000000
2173,Live-In Maid,2004,Argentina,83,Spanish,Unrated,,800000
2175,Maria Full of Grace,2004,Colombia,101,Spanish,R,6517198.0,3000000


---

## WHERE AND OR
What if you want to select rows based on multiple conditions where some but not all of the conditions need to be met? For this, SQL has the `OR` operator.

For example, the following returns all films released in either 1994 or 2000:
```sql
SELECT title
FROM films
WHERE release_year = 1994
OR release_year = 2000;
```
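Note that you need to specify the column for every `OR` condition, so the following does not do what it appears to. PostgreSQL rejects it as invalid, while SQLite accepts it but treats the bare `2000` as a true value and returns every row: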
```sql
SELECT title
FROM films
WHERE release_year = 1994 OR 2000;
```
When combining `AND` and `OR`, be sure to enclose the individual clauses in parentheses, like so:
```sql
SELECT title
FROM films
WHERE (release_year = 1994 OR release_year = 1995)
AND (certification = 'PG' OR certification = 'R');
```
Otherwise, due to SQL's precedence rules, you may not get the results you're expecting!
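
The precedence rule in question is that `AND` binds more tightly than `OR`. As a rough illustration, the sketch below runs the query with and without parentheses using Python's `sqlite3` module; the five-row `films` table and the `titles` helper are hypothetical.

```python
import sqlite3

# Hypothetical rows: one-letter titles keep the two results easy to compare.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE films (title TEXT, release_year INTEGER, certification TEXT)"
)
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [
        ("A", 1994, "PG"),
        ("B", 1994, "G"),
        ("C", 1995, "R"),
        ("D", 1995, "G"),
        ("E", 2001, "R"),
    ],
)

def titles(where_clause):
    """Return the titles matching a WHERE condition, sorted for stable output."""
    rows = conn.execute(
        f"SELECT title FROM films WHERE {where_clause} ORDER BY title"
    )
    return [title for (title,) in rows]

with_parens = titles(
    "(release_year = 1994 OR release_year = 1995) "
    "AND (certification = 'PG' OR certification = 'R')"
)

# Without parentheses, AND is evaluated first, so this is read as:
#   release_year = 1994
#   OR (release_year = 1995 AND certification = 'PG')
#   OR certification = 'R'
without_parens = titles(
    "release_year = 1994 OR release_year = 1995 "
    "AND certification = 'PG' OR certification = 'R'"
)

print(with_parens)     # ['A', 'C']
print(without_parens)  # ['A', 'B', 'C', 'E']
```

The unparenthesized version lets in a G-rated film from 1994 and an R-rated film from 2001, neither of which was intended.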

---

## WHERE AND OR (2)
You now know how to select rows that meet some but not all conditions by combining `AND` and `OR`.

For example, the following query selects all films that were released in 1994 or 1995 which had a rating of PG or R.
```sql
SELECT title
FROM films
WHERE (release_year = 1994 OR release_year = 1995)
AND (certification = 'PG' OR certification = 'R');
```
Now you'll write a query to get the title and release year of films released in the 1990s, in French or Spanish, that grossed more than $2M.

In [13]:
%%sql
SELECT title, release_year
FROM films
WHERE release_year >= 1990
AND release_year < 2000
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title,release_year
Arachnophobia,1990
Back to the Future Part III,1990
Child's Play 2,1990
Dances with Wolves,1990
Days of Thunder,1990


In [14]:
%%sql
SELECT title, release_year
FROM films
WHERE (release_year >= 1990 AND release_year < 2000)
AND (language = 'French' OR language = 'Spanish')
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title,release_year
El Mariachi,1992
Les visiteurs,1993
The Horseman on the Roof,1995
When the Cat's Away,1996
The Chambermaid on the Titanic,1997


In [15]:
%%sql
SELECT title, release_year
FROM films
WHERE (release_year >= 1990 AND release_year < 2000)
AND (language = 'French' OR language = 'Spanish')
AND (gross > 2000000);

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title,release_year
El Mariachi,1992
The Red Violin,1998


---

## BETWEEN
As you've learned, you can use the following query to get titles of all films released in and between 1994 and 2000:
```sql
SELECT title
FROM films
WHERE release_year >= 1994
AND release_year <= 2000;
```
Checking for ranges like this is very common, so in SQL the `BETWEEN` keyword provides a useful shorthand for filtering values within a specified range. This query is equivalent to the one above:
```sql
SELECT title
FROM films
WHERE release_year
BETWEEN 1994 AND 2000;
```
It's important to remember that `BETWEEN` is inclusive, meaning the beginning and end values are included in the results!
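
A quick way to convince yourself of the inclusive behaviour is to compare both forms on a small table. This sketch uses Python's `sqlite3` module with a hypothetical `films` table holding one film per year from 1992 to 2002.

```python
import sqlite3

# Hypothetical table: one film per year, 1992 through 2002.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [(f"Film {year}", year) for year in range(1992, 2003)],
)

explicit = conn.execute(
    "SELECT release_year FROM films "
    "WHERE release_year >= 1994 AND release_year <= 2000 ORDER BY release_year"
).fetchall()

shorthand = conn.execute(
    "SELECT release_year FROM films "
    "WHERE release_year BETWEEN 1994 AND 2000 ORDER BY release_year"
).fetchall()

# The lower bound must come first; swapping the bounds matches nothing.
reversed_bounds = conn.execute(
    "SELECT release_year FROM films WHERE release_year BETWEEN 2000 AND 1994"
).fetchall()

years = [year for (year,) in shorthand]
print(years)  # 1994 through 2000, both endpoints included
```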

## BETWEEN (2)
Like any other condition in a `WHERE` clause, a `BETWEEN` condition can be combined with further conditions using `AND` and `OR`, so you can build up your queries and make them even more powerful!

For example, suppose we have a table called `kids`. We can get the names of all kids between the ages of 2 and 12 from the United States:
```sql
SELECT name
FROM kids
WHERE age BETWEEN 2 AND 12
AND nationality = 'USA';
```
Have a go at using `BETWEEN` with `AND` on the films data to get the title and release year of all Spanish language films released between 1990 and 2000 (inclusive) with budgets over $100 million. We have broken the problem into smaller steps so that you can build the query as you go along!

In [16]:
%%sql
SELECT title, release_year
FROM films
WHERE release_year BETWEEN 1990 AND 2000
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title,release_year
Arachnophobia,1990
Back to the Future Part III,1990
Child's Play 2,1990
Dances with Wolves,1990
Days of Thunder,1990


In [17]:
%%sql
SELECT title, release_year
FROM films
WHERE release_year BETWEEN 1990 AND 2000
AND budget > 100000000
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title,release_year
Terminator 2: Judgment Day,1991
True Lies,1994
Waterworld,1995
Batman & Robin,1997
Dante's Peak,1997


In [18]:
%%sql
SELECT title, release_year
FROM films
WHERE release_year BETWEEN 1990 AND 2000
AND budget > 100000000
AND language = 'Spanish'
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title,release_year
Tango,1998


In [19]:
%%sql
SELECT title, release_year
FROM films
WHERE release_year BETWEEN 1990 AND 2000
AND budget > 100000000
AND (language = 'Spanish' OR language = 'French');

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title,release_year
Les couloirs du temps: Les visiteurs II,1998
Tango,1998


---

## WHERE IN
As you've seen, `WHERE` is very useful for filtering results. However, if you want to filter based on many conditions, `WHERE` can get unwieldy. For example:
```sql
SELECT name
FROM kids
WHERE age = 2
OR age = 4
OR age = 6
OR age = 8
OR age = 10;
```
Enter the `IN` operator! The `IN` operator allows you to specify multiple values in a `WHERE` clause, making it easier and quicker to specify multiple `OR` conditions! Neat, right?

So, the above example would become simply:
```sql
SELECT name
FROM kids
WHERE age IN (2, 4, 6, 8, 10);
```
Try using the `IN` operator yourself!
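
To check that the two forms really are interchangeable, here is a small sketch using Python's `sqlite3` module. The `kids` rows are invented for illustration.

```python
import sqlite3

# Hypothetical rows for the kids table used in the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kids (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO kids VALUES (?, ?)",
    [("Ana", 2), ("Ben", 3), ("Cleo", 6), ("Dev", 10), ("Eli", 11)],
)

or_chain = conn.execute(
    "SELECT name FROM kids "
    "WHERE age = 2 OR age = 4 OR age = 6 OR age = 8 OR age = 10 ORDER BY name"
).fetchall()

in_list = conn.execute(
    "SELECT name FROM kids WHERE age IN (2, 4, 6, 8, 10) ORDER BY name"
).fetchall()

# IN works the same way with quoted text values.
by_name = conn.execute(
    "SELECT age FROM kids WHERE name IN ('Ana', 'Eli') ORDER BY age"
).fetchall()

print(or_chain == in_list, in_list, by_name)
```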

In [20]:
%%sql
SELECT title, release_year
FROM films
WHERE release_year IN (1990, 2000)
AND duration > 120
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title,release_year
Dances with Wolves,1990
Die Hard 2,1990
Ghost,1990
Goodfellas,1990
Mo' Better Blues,1990


In [21]:
%%sql
SELECT title, language
FROM films
WHERE language IN ('English', 'Spanish', 'French')
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title,language
The Broadway Melody,English
Hell's Angels,English
A Farewell to Arms,English
42nd Street,English
She Done Him Wrong,English


In [22]:
%%sql
SELECT title, certification
FROM films
WHERE certification IN ('NC-17', 'R')
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title,certification
Psycho,R
A Fistful of Dollars,R
Rosemary's Baby,R
Mississippi Mermaid,R
The Wild Bunch,R


---

## Introduction to NULL and IS NULL
In SQL, `NULL` represents a missing or unknown value. You can check for `NULL` values using the expression `IS NULL`. For example, to count the number of missing birth dates in the `people` table:
```sql
SELECT COUNT(*)
FROM people
WHERE birthdate IS NULL;
```
As you can see, `IS NULL` is useful when combined with `WHERE` to figure out what data you're missing.

Sometimes, you'll want to filter out missing values so you only get results which are not `NULL`. To do this, you can use the `IS NOT NULL` operator.

For example, this query gives the names of all people whose birth dates are not missing in the `people` table.
```sql
SELECT name
FROM people
WHERE birthdate IS NOT NULL;
```
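
One reason `IS NULL` exists as a separate operator is that comparing a value to `NULL` with `=` never evaluates to true. The sketch below shows this with Python's `sqlite3` module; the three-row `people` table is hypothetical, and Python's `None` is stored as SQL `NULL`.

```python
import sqlite3

# Hypothetical rows: two of the three birth dates are missing (None -> NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, birthdate TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Person A", "1974-11-11"), ("Person B", None), ("Person C", None)],
)

missing = conn.execute(
    "SELECT COUNT(*) FROM people WHERE birthdate IS NULL"
).fetchone()[0]

present = conn.execute(
    "SELECT name FROM people WHERE birthdate IS NOT NULL"
).fetchall()

# birthdate = NULL evaluates to NULL, not true, so no row qualifies.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM people WHERE birthdate = NULL"
).fetchone()[0]

print(missing, present, equals_null)
```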

## NULL and IS NULL
Now that you know what `NULL` is and what it's used for, it's time for some practice!

In [23]:
%%sql
SELECT name
FROM people
WHERE deathdate IS NULL
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


name
50 Cent
A. Michael Baldwin
A. Raven Cruz
A.J. Buckley
A.J. DeLucia


In [24]:
%%sql
SELECT title
FROM films
WHERE budget IS NULL
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title
Pandora's Box
The Prisoner of Zenda
The Blue Bird
Bambi
State Fair


In [25]:
%%sql
SELECT COUNT(title)
FROM films
WHERE language IS NULL;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


COUNT(title)
11


## LIKE and NOT LIKE
As you've seen, the `WHERE` clause can be used to filter text data. However, so far you've only been able to filter by specifying the exact text you're interested in. In the real world, often you'll want to search for a pattern rather than a specific text string.

In SQL, the `LIKE` operator can be used in a `WHERE` clause to search for a pattern in a column. To accomplish this, you use something called a wildcard as a placeholder for some other values. There are two wildcards you can use with `LIKE`:

The `%` wildcard will match zero, one, or many characters in text. For example, the following query matches companies like `'Data'`, `'DataC'`, `'DataCamp'`, `'DataMind'`, and so on:
```sql
SELECT name
FROM companies
WHERE name LIKE 'Data%';
```
The `_` wildcard will match exactly one character. For example, the following query matches companies like `'DataCamp'`, `'DataComp'`, and so on:
```sql
SELECT name
FROM companies
WHERE name LIKE 'DataC_mp';
```
You can also use the `NOT LIKE` operator to find records that don't match the pattern you specify.
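
The sketch below tries both wildcards and `NOT LIKE` on a hypothetical `companies` table, using Python's `sqlite3` module. The company names and the `names` helper are made up for illustration.

```python
import sqlite3

# Hypothetical company names, including one that only ends in "Data".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT)")
conn.executemany(
    "INSERT INTO companies VALUES (?)",
    [("Data",), ("DataCamp",), ("DataComp",), ("DataMind",), ("BigData",)],
)

def names(where_clause):
    """Return the names matching a WHERE condition, sorted for stable output."""
    rows = conn.execute(
        f"SELECT name FROM companies WHERE {where_clause} ORDER BY name"
    )
    return [name for (name,) in rows]

starts_with_data = names("name LIKE 'Data%'")      # % also matches zero characters
one_char = names("name LIKE 'DataC_mp'")           # _ stands for exactly one character
not_data_prefix = names("name NOT LIKE 'Data%'")   # everything else

print(starts_with_data, one_char, not_data_prefix)
```

One dialect difference to keep in mind: SQLite's `LIKE` ignores case for ASCII letters by default, whereas PostgreSQL's `LIKE` is case-sensitive, so a pattern such as `'data%'` can behave differently between the two.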

In [26]:
%%sql
SELECT name
FROM people
WHERE name LIKE 'B%'
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


name
B.J. Novak
Babak Najafi
Babar Ahmed
Bahare Seddiqi
Bai Ling


In [27]:
%%sql
SELECT name
FROM people
WHERE name LIKE '_r%'
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


name
Ara Celi
Aramis Knight
Arben Bajraktaraj
Arcelia Ramírez
Archie Kao


In [28]:
%%sql
SELECT name
FROM people
WHERE name NOT LIKE 'A%'
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


name
50 Cent
Álex Angulo
Álex de la Iglesia
Ángela Molina
B.J. Novak
