In [9]:
%%capture
%load_ext sql
%sql postgresql:///imdb_film_data.sql

In [10]:
%%sql

SELECT *
FROM people
LIMIT 10;

Environment variable $DATABASE_URL not set, and no connect string given.
Connection info needed in SQLAlchemy format, example:
               postgresql://username:password@hostname/dbname
               or an existing connection: dict_keys([])


## Onboarding | Tables

If you've used DataCamp to learn [R](https://www.datacamp.com/courses/free-introduction-to-r) or [Python](https://www.datacamp.com/courses/intro-to-python-for-data-science), you'll be familiar with the interface. For SQL, however, there are a few new features you should be aware of.

For this course, you'll be using a database containing information on almost 5000 films. To the right, underneath the editor, you can see the data in this database by clicking through the tabs.

Instructions

1. From looking at the tabs, who is the first person listed in the `people` table?

50 Cent.

## Onboarding | Query Result

Notice the **query result** tab in the bottom right corner of your screen. This is where the results of your SQL queries will be displayed.

Run this query in the editor and check out the resulting table in the query result tab!

```
SELECT name FROM people;
```

Instructions

1. Who is the second person listed in the query result?

A. Michael Baldwin.

## Onboarding | Errors

If you submit the code below, you'll see that you get two types of errors.

```
-- Try running me!
'DataCamp <3 SQL'
AS result;
```

_SQL_ errors are shown below the editor. These are errors returned by the _SQL_ engine. You should see:

```
syntax error at or near "'DataCamp <3 SQL'" LINE 2: 'DataCamp <3 SQL' ^
```

_DataCamp_ errors are shown in the **Instructions** box. These will let you know in plain English where you went wrong in your code! You should see:

```
You need to add SELECT at the start of line 2!
```

Instructions

1. Submit the code, check out the errors, then fix them!

In [None]:
-- Try running me!
SELECT 'DataCamp <3 SQL'
AS result;

# result
# DataCamp <3 SQL
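
The engine-level error can also be reproduced outside the course interface. The following is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; SQLite is only a stand-in for the course's PostgreSQL database, so the exact wording of the error message will differ from the one quoted above.

```python
import sqlite3

# In-memory SQLite database: an assumption used for illustration only.
conn = sqlite3.connect(":memory:")

message = None
try:
    # Missing SELECT keyword, as in the broken exercise code.
    conn.execute("'DataCamp <3 SQL' AS result;")
except sqlite3.OperationalError as error:
    # The SQL engine itself rejects the statement.
    message = str(error)
    print(message)

# Adding SELECT at the start fixes the statement.
fixed = conn.execute("SELECT 'DataCamp <3 SQL' AS result;").fetchone()
print(fixed[0])  # DataCamp <3 SQL
```

The first statement fails before any data is read, which is why the engine can only point at the position where parsing broke down.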

## Onboarding | Bullet Exercises

Another new feature we're introducing is the _bullet exercise_, which allows you to easily practice a new concept through repetition. Check it out below!

Instructions

1. Submit the query in the editor! Don't worry, you'll learn how it works soon.
2. Now change `'SQL'` to `'SQL is'` and click Submit!
3. Finally, change `'SQL is'` to `'SQL is cool'` and click Submit!

In [None]:
SELECT 'SQL'
AS result;

# result
# SQL

In [None]:
SELECT 'SQL is'
AS result;

# result
# SQL is

In [None]:
SELECT 'SQL is cool'
AS result;

# result
# SQL is cool

## Beginning your SQL journey

Now that you're familiar with the interface, let's get straight into it.

SQL, which stands for _Structured Query Language_, is a language for interacting with data stored in something called a _relational database_.

You can think of a relational database as a collection of tables. A table is just a set of rows and columns, like a spreadsheet, which represents exactly one type of entity. For example, a table might represent employees in a company or purchases made, but not both.

Each row, or _record_, of a table contains information about a single entity. For example, in a table representing employees, each row represents a single person. Each column, or _field_, of a table contains a single attribute for all rows in the table. For example, in a table representing employees, we might have a column containing first and last names for all employees.

The table of employees might look something like this:

```
id   name      age      nationality
1    Jessica   22       Ireland
2    Gabriel   48       France
3    Laura     36       USA
```

How many fields does the employees table above contain?

4.
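
As a rough sketch of the record and field vocabulary, the snippet below rebuilds the example table in an in-memory SQLite database using Python's built-in `sqlite3` module. SQLite stands in for the course database here, and the column types are assumptions, since the table above only shows values.

```python
import sqlite3

# In-memory SQLite database: a stand-in for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER, name TEXT, age INTEGER, nationality TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Jessica", 22, "Ireland"), (2, "Gabriel", 48, "France"), (3, "Laura", 36, "USA")],
)

cursor = conn.execute("SELECT * FROM employees")
fields = [column[0] for column in cursor.description]  # one name per column (field)
records = cursor.fetchall()                            # one tuple per row (record)

print(fields)        # ['id', 'name', 'age', 'nationality']
print(len(fields))   # 4
print(len(records))  # 3
```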

## SELECTing single columns

While SQL can be used to create and modify databases, the focus of this course will be _querying_ databases. A _query_ is a request for data from a database table (or combination of tables). Querying is an essential skill for a data scientist, since the data you need for your analyses will often live in databases.

In SQL, you can select data from a table using a `SELECT` statement. For example, the following query selects the `name` column from the `people` table:

```
SELECT name
FROM people;
```

In this query, `SELECT` and `FROM` are called keywords. In SQL, keywords are not case-sensitive, which means you can write the same query as:

```
select name
from people;
```

That said, it's good practice to make SQL keywords uppercase to distinguish them from other parts of your query, like column and table names.

It's also good practice (but not necessary for the exercises in this course) to include a semicolon at the end of your query. This tells SQL where the end of your query is!

Remember, you can see the results of executing your query in the **query result** tab!
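
To check the claims about keyword case and the optional semicolon, here is a small sketch that runs three spellings of the same query against an in-memory SQLite database via Python's `sqlite3` module. The one-column `people` table is a simplified assumption; the two names come from the course data.

```python
import sqlite3

# In-memory SQLite database with a simplified people table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("50 Cent",), ("A. Michael Baldwin",)],
)

upper = conn.execute("SELECT name FROM people;").fetchall()
lower = conn.execute("select name from people;").fetchall()
no_semicolon = conn.execute("SELECT name FROM people").fetchall()

# All three spellings return the same rows.
print(upper == lower == no_semicolon)  # True
```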

Instructions

1. Select the `title` column from the `films` table.
2. Select the `release_year` column from the `films` table.
3. Select the `name` of each person in the `people` table.

In [None]:
SELECT title
FROM films;

# title
# Intolerance: Love's Struggle Throughout the Ages
# Over the Hill to the Poorhouse
# The Big Parade
# Metropolis
# Pandora's Box
# ...

In [None]:
SELECT release_year
FROM films;

# release_year
# 1916
# 1920
# 1925
# 1927
# 1929
# ...

In [None]:
SELECT name
FROM people;

# name
# 50 Cent
# A. Michael Baldwin
# A. Raven Cruz
# A.J. Buckley
# A.J. DeLucia
# ...

## SELECTing multiple columns

Well done! Now you know how to select single columns.

In the real world, you will often want to select multiple columns. Luckily, SQL makes this really easy. To select multiple columns from a table, simply separate the column names with commas!

For example, this query selects two columns, `name` and `birthdate`, from the `people` table:

```
SELECT name, birthdate
FROM people;
```

Sometimes, you may want to select all columns from a table. Typing out every column name would be a pain, so there's a handy shortcut:

```
SELECT *
FROM people;
```

If you only want to return a certain number of results, you can use the `LIMIT` keyword to limit the number of rows returned:

```
SELECT *
FROM people
LIMIT 10;
```
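
The three query shapes above can be compared side by side. This sketch uses Python's `sqlite3` module with an in-memory database and a reduced `films` table (four of its columns and four of its rows, taken from the results shown in this notebook); the reduced schema and column types are assumptions made to keep the example short.

```python
import sqlite3

# In-memory SQLite database with a reduced films table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE films (id INTEGER, title TEXT, release_year INTEGER, country TEXT)"
)
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?, ?)",
    [
        (1, "Intolerance: Love's Struggle Throughout the Ages", 1916, "USA"),
        (2, "Over the Hill to the Poorhouse", 1920, "USA"),
        (3, "The Big Parade", 1925, "USA"),
        (4, "Metropolis", 1927, "Germany"),
    ],
)

two_columns = conn.execute("SELECT title, release_year FROM films").fetchall()
all_columns = conn.execute("SELECT * FROM films").fetchall()
first_two = conn.execute("SELECT * FROM films LIMIT 2").fetchall()

print(len(two_columns[0]))  # 2 values per row: title and release_year
print(len(all_columns[0]))  # 4 values per row: every column in the table
print(len(first_two))       # 2 rows, because of LIMIT 2
```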

Before getting started with the instructions below, check out the column names in the `films` table!

Instructions

1. Get the title of every film from the `films` table.
2. Get the title and release year for every film.
3. Get the title, release year and country for every film.
4. Get all columns from the `films` table.

In [None]:
SELECT title
FROM films;

# title
# Intolerance: Love's Struggle Throughout the Ages
# Over the Hill to the Poorhouse
# The Big Parade
# Metropolis
# Pandora's Box
# ...

In [None]:
SELECT title, release_year
FROM films;

# title                                                release_year
# Intolerance: Love's Struggle Throughout the Ages     1916
# Over the Hill to the Poorhouse                       1920
# The Big Parade                                       1925
# Metropolis                                           1927
# Pandora's Box                                        1929
# ...

In [None]:
SELECT title, release_year, country
FROM films;

# title                                              release_year   country
# Intolerance: Love's Struggle Throughout the Ages   1916           USA
# Over the Hill to the Poorhouse                     1920           USA
# The Big Parade                                     1925           USA
# Metropolis                                         1927           Germany
# Pandora's Box                                      1929           Germany
# ...

In [None]:
SELECT *
FROM films;

# id   title                                              release_year   country   duration   language   certification   gross     budget
# 1    Intolerance: Love's Struggle Throughout the Ages   1916           USA       123        null       Not Rated       null      385907
# 2    Over the Hill to the Poorhouse                     1920           USA       110        null       null            3000000   100000
# 3    The Big Parade                                     1925           USA       151        null       Not Rated       null      245000
# 4    Metropolis                                         1927           Germany   145        German     Not Rated       26435     6000000
# 5    Pandora's Box                                      1929           Germany   110        German     Not Rated       9950      null
# ...

## SELECT DISTINCT

Often your results will include many duplicate values. If you want to select all the unique values from a column, you can use the `DISTINCT` keyword.

This might be useful if, for example, you're interested in knowing which languages are represented in the `films` table:

```
SELECT DISTINCT language
FROM films;
```
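
A short sketch of what `DISTINCT` does to duplicates, using Python's `sqlite3` module and an in-memory database. The two-column `films` table is a simplification (an assumption); its five rows reuse the titles and languages shown in this notebook, with `None` standing for a missing (`null`) language.

```python
import sqlite3

# In-memory SQLite database with a simplified films table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, language TEXT)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [
        ("Intolerance: Love's Struggle Throughout the Ages", None),
        ("Over the Hill to the Poorhouse", None),
        ("The Big Parade", None),
        ("Metropolis", "German"),
        ("Pandora's Box", "German"),
    ],
)

all_values = [row[0] for row in conn.execute("SELECT language FROM films")]
unique_values = [row[0] for row in conn.execute("SELECT DISTINCT language FROM films")]

print(len(all_values))     # 5: one value per row, duplicates included
print(len(unique_values))  # 2: 'German' and the missing value, each listed once
```

Note that the missing value shows up once in the distinct result, which is why `null` can appear in the output of a `SELECT DISTINCT` query.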

Remember, you can check out the data in the tables by clicking on the table name!

Instructions

1. Get all the unique countries represented in the `films` table.
2. Get all the different film certifications from the `films` table.
3. Get the different types of film roles from the `roles` table.

In [None]:
SELECT DISTINCT country
FROM films;

# country
# null
# Soviet Union
# Indonesia
# Italy
# Cameroon
# ...

In [None]:
SELECT DISTINCT certification
FROM films;

# certification
# Unrated
# M
# G
# NC-17
# GP
# ...

In [None]:
SELECT DISTINCT role
FROM roles;

# role
# director
# actor

## Learning to COUNT

What if you want to count the number of employees in your employees table? The `COUNT` function lets you do this by returning the number of rows in a table or the number of non-missing values in a column.

For example, this code gives the number of rows in the `people` table:

```
SELECT COUNT(*)
FROM people;
```

How many records are contained in the `reviews` table?

In [None]:
SELECT COUNT(*)
FROM reviews;

# count
# 4968

## Practice with COUNT

As you've seen, `COUNT(*)` tells you how many rows are in a table. However, if you want to count the number of _non-missing_ values in a particular column, you can call `COUNT` on just that column.

For example, to count the number of birth dates present in the `people` table:

```
SELECT COUNT(birthdate)
FROM people;
```

It's also common to combine `COUNT` with `DISTINCT` to count the number of _distinct_ values in a column.

For example, this query counts the number of distinct birth dates contained in the `people` table:

```
SELECT COUNT(DISTINCT birthdate)
FROM people;
```
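
The difference between the three forms of `COUNT` is easiest to see on a column with missing and repeated values. The sketch below uses Python's `sqlite3` module with an in-memory database; it counts the `language` column of a simplified two-column `films` table (an assumption) rather than `birthdate`, because the notebook shows real values for that column.

```python
import sqlite3

# In-memory SQLite database with a simplified films table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, language TEXT)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [
        ("Intolerance: Love's Struggle Throughout the Ages", None),
        ("Over the Hill to the Poorhouse", None),
        ("The Big Parade", None),
        ("Metropolis", "German"),
        ("Pandora's Box", "German"),
    ],
)

total_rows = conn.execute("SELECT COUNT(*) FROM films").fetchone()[0]
non_missing = conn.execute("SELECT COUNT(language) FROM films").fetchone()[0]
distinct = conn.execute("SELECT COUNT(DISTINCT language) FROM films").fetchone()[0]

print(total_rows)   # 5: every row, regardless of missing values
print(non_missing)  # 2: rows where language is not missing
print(distinct)     # 1: distinct non-missing languages
```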

Let's get some practice with `COUNT`!

Instructions

1. Count the number of rows in the `people` table.
2. Count the number of (non-missing) birth dates in the `people` table.
3. Count the number of unique birth dates in the `people` table.
4. Count the number of unique languages in the `films` table.
5. Count the number of unique countries in the `films` table.

In [None]:
SELECT COUNT(*)
FROM people;

# count
# 8397

In [None]:
SELECT COUNT(birthdate)
FROM people;

# count
# 6152

In [None]:
SELECT COUNT(DISTINCT birthdate)
FROM people;

# count
# 5398

In [None]:
SELECT COUNT(DISTINCT language)
FROM films;

# count
# 47

In [None]:
SELECT COUNT(DISTINCT country)
FROM films;

# count
# 64