# Selecting columns
This chapter provides a brief introduction to working with relational databases. You'll learn about their structure, how to talk about them using database lingo, and how to begin an analysis using simple SQL commands to select and summarize columns from database tables.

---

## Onboarding | Tables

The DataCamp interface for SQL courses contains a few unique features you should be aware of.

For this course, you'll be using a database containing information on almost 5000 films. To the right, underneath the editor, you can see the data in this database by clicking through the tabs.

From looking at the tabs, who is the first person listed in the people table?

In [1]:
from sqlalchemy import create_engine, inspect
import os

current_directory = os.getcwd()

engine = create_engine(f'sqlite:///{current_directory}/films.db')
inspector = inspect(engine)

table_names = inspector.get_table_names()
print(table_names)

['films', 'people', 'reviews', 'roles']


In [2]:
import os

current_directory = os.getcwd()

%load_ext sql
%sql sqlite:///{current_directory}/films.db


---

## Onboarding | Query Result
Notice the query result tab in the bottom right corner of your screen. This is where the results of your SQL queries will be displayed.

Run this query in the editor and check out the resulting table in the query result tab!
```SQL
SELECT name FROM people;
```
Who is the second person listed in the query result?

In [3]:
%%sql
SELECT name
FROM people
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


name
50 Cent
A. Michael Baldwin
A. Raven Cruz
A.J. Buckley
A.J. DeLucia


---

## Onboarding | Errors
If you submit the code to the right, you'll see that you get two types of errors.

SQL errors are shown below the editor. These are errors returned by the SQL engine. You should see:
```SQL
syntax error at or near "'DataCamp <3 SQL'"
LINE 2: 'DataCamp <3 SQL'
        ^
```
DataCamp errors are shown in the Instructions box. These will let you know in plain English where you went wrong in your code! You should see:
```SQL
You need to add SELECT at the start of line 2
```

In [4]:
%%sql
-- Try running me!
SELECT 'DataCamp <3 SQL'
AS result;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


result
DataCamp <3 SQL


---

## Onboarding | Multi-step Exercises
The following multi-step exercise allows you to practice a new concept through repetition. Check it out!

In [5]:
%%sql
SELECT 'SQL'
AS result;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


result
SQL


In [6]:
%%sql
SELECT 'SQL is'
AS result;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


result
SQL is


In [7]:
%%sql
SELECT 'SQL is cool'
AS result;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


result
SQL is cool


---

## Beginning your SQL journey
Now that you're familiar with the interface, let's get straight into it.

SQL, which stands for Structured Query Language, is a language for interacting with data stored in something called a relational database.

You can think of a relational database as a collection of tables. A table is just a set of rows and columns, like a spreadsheet, which represents exactly one type of entity. For example, a table might represent employees in a company or purchases made, but not both.

Each row, or record, of a table contains information about a single entity. For example, in a table representing employees, each row represents a single person. Each column, or field, of a table contains a single attribute for all rows in the table. For example, in a table representing employees, we might have a column containing first and last names for all employees.

The table of employees might look something like this:

|id|name|age|nationality|
|-|-|-|-|
|1|Jessica|22|Ireland|
|2|Gabriel|48|France|
|3|Laura|36|USA|
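
As a minimal sketch of these terms, the employees table above can be recreated in an in-memory SQLite database with Python's standard `sqlite3` module. The table name `employees` and the column types are assumptions chosen for illustration; this table is not part of the course's `films` database.

```python
import sqlite3

# In-memory database, used only for illustration (not the course's films.db).
conn = sqlite3.connect(":memory:")

# One table represents one type of entity: employees.
# The column types are an assumption; the text only gives the column names.
conn.execute(
    "CREATE TABLE employees (id INTEGER, name TEXT, age INTEGER, nationality TEXT)"
)

# Each tuple is one row (record) describing a single employee.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Jessica", 22, "Ireland"),
        (2, "Gabriel", 48, "France"),
        (3, "Laura", 36, "USA"),
    ],
)

cursor = conn.execute("SELECT * FROM employees;")

# The columns (fields) are the attributes shared by every row.
fields = [column[0] for column in cursor.description]
records = cursor.fetchall()

print(fields)        # ['id', 'name', 'age', 'nationality']
print(len(records))  # 3
```

Here `fields` holds the column names and `records` holds one tuple per employee, which mirrors the column/row distinction described above.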

## SELECTing single columns
While SQL can be used to create and modify databases, the focus of this course will be querying databases. A query is a request for data from a database table (or combination of tables). Querying is an essential skill for a data scientist, since the data you need for your analyses will often live in databases.

In SQL, you can select data from a table using a `SELECT` statement. For example, the following query selects the `name` column from the `people` table:
```sql
SELECT name
FROM people;
```
In this query, `SELECT` and `FROM` are called keywords. In SQL, keywords are not case-sensitive, which means you can write the same query as:

```sql
select name
from people;
```
That said, it's good practice to make SQL keywords uppercase to distinguish them from other parts of your query, like column and table names.

It's also good practice (but not necessary for the exercises in this course) to include a semicolon at the end of your query. This tells SQL where the end of your query is!

Remember, you can see the results of executing your query in the query result tab!
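
To check the claim about keyword case outside the DataCamp interface, here is a small sketch using Python's standard `sqlite3` module. The single-column `people` table is a simplified stand-in for the course table, filled with three names taken from the query results shown earlier.

```python
import sqlite3

# In-memory stand-in for the people table (assumption: only a name column).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("50 Cent",), ("A. Michael Baldwin",), ("A. Raven Cruz",)],
)

# The same query, written with uppercase and lowercase keywords.
upper = conn.execute("SELECT name FROM people;").fetchall()
lower = conn.execute("select name from people;").fetchall()

# Compare as sorted lists, since row order is not guaranteed without ORDER BY.
print(sorted(upper) == sorted(lower))  # True
```

Both spellings return the same rows, so uppercase keywords are a readability convention rather than a requirement.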

### Instructions
Select the `title` column from the `films` table.

In [8]:
%%sql
SELECT title
FROM films
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title
Intolerance: Love's Struggle Throughout the Ages
Over the Hill to the Poorhouse
The Big Parade
Metropolis
Pandora's Box


Select the `release_year` column from the `films` table.

In [9]:
%%sql
SELECT release_year 
FROM films
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


release_year
1916
1920
1925
1927
1929


Select the `name` of each person in the `people` table.

In [10]:
%%sql
SELECT name 
FROM people
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


name
50 Cent
A. Michael Baldwin
A. Raven Cruz
A.J. Buckley
A.J. DeLucia


---

## SELECTing multiple columns
Well done! Now you know how to select single columns.

In the real world, you will often want to select multiple columns. Luckily, SQL makes this really easy. To select multiple columns from a table, simply separate the column names with commas!

For example, this query selects two columns, `name` and `birthdate`, from the `people` table:
```sql
SELECT name, birthdate
FROM people;
```
Sometimes, you may want to select all columns from a table. Typing out every column name would be a pain, so there's a handy shortcut:
```sql
SELECT *
FROM people;
```
If you only want to return a certain number of results, you can use the `LIMIT` keyword to limit the number of rows returned:
```sql
SELECT *
FROM people
LIMIT 10;
```
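
The three forms above (a comma-separated column list, `*`, and `LIMIT`) can be compared side by side in a short sketch with Python's standard `sqlite3` module. The three-column `films` table here is a cut-down stand-in for the course table, which has more columns; its rows are taken from the query results shown in this chapter.

```python
import sqlite3

# In-memory stand-in for the films table (assumption: only three columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [
        ("The Big Parade", 1925, "USA"),
        ("Metropolis", 1927, "Germany"),
        ("Pandora's Box", 1929, "Germany"),
    ],
)

# Comma-separated names select exactly those columns, in the order listed.
cursor = conn.execute("SELECT title, release_year FROM films;")
two_names = [column[0] for column in cursor.description]
print(two_names)  # ['title', 'release_year']

# * expands to every column in the table.
cursor = conn.execute("SELECT * FROM films;")
all_names = [column[0] for column in cursor.description]
print(all_names)  # ['title', 'release_year', 'country']

# LIMIT caps the number of rows returned.
limited = conn.execute("SELECT * FROM films LIMIT 2;").fetchall()
print(len(limited))  # 2
```

`cursor.description` is used only to read back the column names of each result; the row data itself comes from `fetchall()`.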
Before getting started with the instructions below, check out the column names in the `films` table!

In [11]:
%%sql
SELECT title 
FROM films
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title
Intolerance: Love's Struggle Throughout the Ages
Over the Hill to the Poorhouse
The Big Parade
Metropolis
Pandora's Box


In [12]:
%%sql
SELECT title, release_year
FROM films
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title,release_year
Intolerance: Love's Struggle Throughout the Ages,1916
Over the Hill to the Poorhouse,1920
The Big Parade,1925
Metropolis,1927
Pandora's Box,1929


In [13]:
%%sql
SELECT title, release_year, country
FROM films
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title,release_year,country
Intolerance: Love's Struggle Throughout the Ages,1916,USA
Over the Hill to the Poorhouse,1920,USA
The Big Parade,1925,USA
Metropolis,1927,Germany
Pandora's Box,1929,Germany


In [14]:
%%sql
SELECT *
FROM films
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


id,title,release_year,country,duration,language,certification,gross,budget
1,Intolerance: Love's Struggle Throughout the Ages,1916,USA,123,,Not Rated,,385907.0
2,Over the Hill to the Poorhouse,1920,USA,110,,,3000000.0,100000.0
3,The Big Parade,1925,USA,151,,Not Rated,,245000.0
4,Metropolis,1927,Germany,145,German,Not Rated,26435.0,6000000.0
5,Pandora's Box,1929,Germany,110,German,Not Rated,9950.0,


---

## SELECT DISTINCT
Often your results will include many duplicate values. If you want to select all the unique values from a column, you can use the `DISTINCT` keyword.

This might be useful if, for example, you're interested in knowing which languages are represented in the `films` table:
```sql
SELECT DISTINCT language
FROM films;
```
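
As a rough illustration of what `DISTINCT` changes, the sketch below runs the same query with and without it using Python's standard `sqlite3` module. The two-column `films` table is a simplified stand-in for the course table, populated with a few titles and languages from the results shown in this chapter.

```python
import sqlite3

# In-memory stand-in for the films table (assumption: only two columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, language TEXT)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [
        ("Metropolis", "German"),
        ("Pandora's Box", "German"),
        ("10 Cloverfield Lane", "English"),
        ("13 Hours", "English"),
        ("Airlift", "Hindi"),
    ],
)

# Without DISTINCT: one value per row, duplicates included.
all_values = [row[0] for row in conn.execute("SELECT language FROM films;")]

# With DISTINCT: each value appears once.
unique_values = [
    row[0] for row in conn.execute("SELECT DISTINCT language FROM films;")
]

print(len(all_values))        # 5
print(sorted(unique_values))  # ['English', 'German', 'Hindi']
```

The unique values are sorted in Python before printing because the query itself does not specify an order.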
Remember, you can check out the data in the tables by clicking on the table name!

In [15]:
%%sql
SELECT DISTINCT country 
FROM films
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


country
USA
Germany
Japan
Denmark
UK


In [16]:
%%sql
SELECT DISTINCT certification 
FROM films
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


certification
Not Rated
""
Passed
Unrated
Approved


---

## Learning to COUNT
What if you want to count the number of employees in your employees table? The `COUNT()` function lets you do this: `COUNT(*)` returns the number of rows in a table.

For example, this code gives the number of rows in the `people` table:
```sql
SELECT COUNT(*)
FROM people;
```
How many records are contained in the `reviews` table?

In [17]:
%%sql
SELECT COUNT(*)
FROM people;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


COUNT(*)
8397


---

## Practice with COUNT
As you've seen, `COUNT(*)` tells you how many rows are in a table. However, if you want to count the number of non-missing values in a particular column, you can call `COUNT()` on just that column.

For example, to count the number of birth dates present in the `people` table:
```sql
SELECT COUNT(birthdate)
FROM people;
```
It's also common to combine `COUNT()` with `DISTINCT` to count the number of distinct values in a column.

For example, this query counts the number of distinct birth dates contained in the `people` table:
```sql
SELECT COUNT(DISTINCT birthdate)
FROM people;
```
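
The difference between the three forms of `COUNT()` is easiest to see on a tiny table with one missing value and one duplicate. The sketch below uses Python's standard `sqlite3` module; the names and birth dates are placeholder values invented for illustration, not data from the course's `people` table.

```python
import sqlite3

# In-memory stand-in for the people table; all values are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, birthdate TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [
        ("Person A", "1970-01-01"),
        ("Person B", "1970-01-01"),  # same birth date as Person A
        ("Person C", "1985-06-15"),
        ("Person D", None),          # missing birth date (NULL)
    ],
)

# All three counts in one query: rows, non-missing values, distinct values.
total_rows, known_birthdates, distinct_birthdates = conn.execute(
    "SELECT COUNT(*), COUNT(birthdate), COUNT(DISTINCT birthdate) FROM people;"
).fetchone()

print(total_rows, known_birthdates, distinct_birthdates)  # 4 3 2
```

`COUNT(*)` counts every row, `COUNT(birthdate)` skips the row where the value is missing, and `COUNT(DISTINCT birthdate)` additionally collapses the duplicate. This is why the three counts can decrease in that order.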
Let's get some practice with `COUNT()`!

In [18]:
%%sql
SELECT COUNT(*) 
FROM people;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


COUNT(*)
8397


In [19]:
%%sql
SELECT COUNT(birthdate)
FROM people;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


COUNT(birthdate)
6152


In [20]:
%%sql
SELECT COUNT(DISTINCT birthdate)
FROM people;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


COUNT(DISTINCT birthdate)
5398


In [21]:
%%sql
SELECT COUNT(DISTINCT language) 
FROM films;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


COUNT(DISTINCT language)
47


In [22]:
%%sql
SELECT COUNT(DISTINCT country) 
FROM films;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


COUNT(DISTINCT country)
64


---

In [23]:
%%sql
SELECT title
FROM films
WHERE release_year > 2000
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title
15 Minutes
3000 Miles to Graceland
A Beautiful Mind
A Knight's Tale
A.I. Artificial Intelligence


---

In [24]:
%%sql
SELECT * 
FROM films 
WHERE release_year = 2016
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


id,title,release_year,country,duration,language,certification,gross,budget
4821,10 Cloverfield Lane,2016,USA,104,English,PG-13,71897215.0,15000000.0
4822,13 Hours,2016,USA,144,English,R,52822418.0,50000000.0
4823,A Beginner's Guide to Snuff,2016,USA,87,English,,,
4824,Airlift,2016,India,130,Hindi,,,4400000.0
4825,Alice Through the Looking Glass,2016,USA,113,English,PG,76846624.0,170000000.0


In [25]:
%%sql
SELECT COUNT(title) 
FROM films 
WHERE release_year < 2000;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


COUNT(title)
1337


In [26]:
%%sql
SELECT title, release_year 
FROM films 
WHERE release_year > 2000
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title,release_year
15 Minutes,2001
3000 Miles to Graceland,2001
A Beautiful Mind,2001
A Knight's Tale,2001
A.I. Artificial Intelligence,2001


---