# Sorting and grouping
This chapter provides a brief introduction to sorting and grouping your results.

In [1]:
from sqlalchemy import create_engine, inspect
import os

current_directory = os.getcwd()

%load_ext sql
%sql sqlite:///{current_directory}/films.db

## ORDER BY
Congratulations on making it this far! You now know how to select and filter your results.

In this chapter you'll learn how to sort and group your results to gain further insight. Let's go!

In SQL, the `ORDER BY` keyword is used to sort results in ascending or descending order according to the values of one or more columns.

By default, `ORDER BY` sorts in ascending order. To sort the results in descending order, use the `DESC` keyword. For example,
```sql
SELECT title
FROM films
ORDER BY release_year DESC;
```
gives you the titles of films sorted by release year, from newest to oldest.
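
As a minimal sketch of the default and `DESC` orderings, the snippet below builds a tiny in-memory SQLite table with Python's standard `sqlite3` module. The three rows are made-up sample data (an assumption for illustration), not the contents of `films.db`.

```python
import sqlite3

# Hypothetical sample rows; not taken from films.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("Film A", 1999), ("Film B", 2015), ("Film C", 2007)],
)

# Ascending is the default: oldest first.
oldest_first = [row[0] for row in conn.execute(
    "SELECT title FROM films ORDER BY release_year")]

# DESC reverses the order: newest first.
newest_first = [row[0] for row in conn.execute(
    "SELECT title FROM films ORDER BY release_year DESC")]

print(oldest_first)  # ['Film A', 'Film C', 'Film B']
print(newest_first)  # ['Film B', 'Film C', 'Film A']
```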

## Sorting single columns
Now that you understand how `ORDER BY` works, give these exercises a go!

In [2]:
%%sql
SELECT name
FROM people
ORDER BY name
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


name
50 Cent
A. Michael Baldwin
A. Raven Cruz
A.J. Buckley
A.J. DeLucia


In [3]:
%%sql
SELECT name
FROM people
ORDER BY birthdate
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


name
A. Raven Cruz
A.J. DeLucia
Aaron Hann
Aaron Hughes
Aaron Schneider


In [4]:
%%sql
SELECT name, birthdate
FROM people
ORDER BY birthdate
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


name,birthdate
A. Raven Cruz,
A.J. DeLucia,
Aaron Hann,
Aaron Hughes,
Aaron Schneider,
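
The empty `birthdate` cells above are `NULL` values. SQLite sorts `NULL` before any other value in ascending order, which is why those rows come first. The sketch below uses made-up rows (the names and dates are hypothetical) to show the effect and one possible way to skip the missing values.

```python
import sqlite3

# Hypothetical sample rows; not taken from films.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, birthdate TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Person A", "1975-07-06"), ("Person B", None), ("Person C", "1963-04-04")],
)

# In SQLite, NULL sorts first in ascending order.
with_nulls = conn.execute(
    "SELECT name, birthdate FROM people ORDER BY birthdate").fetchall()

# Filtering out NULLs puts the oldest known birthdate first.
without_nulls = conn.execute(
    "SELECT name, birthdate FROM people "
    "WHERE birthdate IS NOT NULL ORDER BY birthdate").fetchall()

print(with_nulls)
print(without_nulls)
```

Other database systems may place `NULL` values last by default, so it is worth checking the behaviour of the system you are using.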


---
## Sorting single columns (2)
Let's get some more practice with `ORDER BY`!

In [5]:
%%sql
SELECT title
FROM films
WHERE release_year IN (2012,2000)
ORDER BY release_year
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title
102 Dalmatians
28 Days
3 Strikes
Aberdeen
All the Pretty Horses


In [6]:
%%sql
SELECT *
FROM films
WHERE release_year <> 2015
ORDER BY duration
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


id,title,release_year,country,duration,language,certification,gross,budget
1398,Hum To Mohabbat Karega,2000,India,,Hindi,,,
2326,Dil Jo Bhi Kahey...,2005,India,,English,,129319.0,70000000.0
2712,The Naked Ape,2006,USA,,English,,,
3208,Black Water Transit,2009,USA,,English,,,23000000.0
3504,Harry Potter and the Deathly Hallows: Part I,2010,UK,,English,,,


In [7]:
%%sql
SELECT title, gross
FROM films
WHERE title LIKE 'M%'
ORDER BY title
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title,gross
MacGruber,8460995.0
Machete,26589953.0
Machete Kills,7268659.0
Machine Gun McCain,
Machine Gun Preacher,537580.0


---
## Sorting single columns (DESC)
To order results in descending order, put the keyword `DESC` after the column name in your `ORDER BY` clause. For example, to get all the names in the `people` table, in reverse alphabetical order:
```sql
SELECT name
FROM people
ORDER BY name DESC;
```
Now practice using `ORDER BY` with `DESC` to sort single columns in descending order!

In [8]:
%%sql
SELECT imdb_score, film_id
FROM reviews
ORDER BY imdb_score DESC
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


imdb_score,film_id
9.5,4960
9.3,742
9.2,178
9.1,4866
9.0,3110


In [9]:
%%sql
SELECT title
FROM films
ORDER BY title DESC
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title
Æon Flux
xXx: State of the Union
xXx
eXistenZ
[Rec] 2


In [10]:
%%sql
SELECT title, duration
FROM films
ORDER BY duration DESC
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title,duration
Carlos,334
"Blood In, Blood Out",330
Heaven's Gate,325
The Legend of Suriyothai,300
Das Boot,293


---
## Sorting multiple columns
`ORDER BY` can also be used to sort on multiple columns. It sorts by the first column specified, then uses the next column to order rows that tie on the first, then the next, and so on. For example,
```sql
SELECT birthdate, name
FROM people
ORDER BY birthdate, name;
```
sorts on birth dates first (oldest to newest) and then sorts on the names in alphabetical order. **The order of columns is important!**
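
To see why the column order matters, here is a small sketch using Python's standard `sqlite3` module and three made-up rows (hypothetical names and dates, not from `films.db`). Two people share a birthdate, so the second sort column decides their relative order.

```python
import sqlite3

# Hypothetical sample rows; two people share the same birthdate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, birthdate TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [
        ("Cara", "1980-01-01"),
        ("Abel", "1990-05-05"),
        ("Bea", "1980-01-01"),
    ],
)

# Sort by birthdate first; name only breaks ties between equal birthdates.
by_date_then_name = [row[0] for row in conn.execute(
    "SELECT name FROM people ORDER BY birthdate, name")]

# Swapping the columns sorts by name first, so the result changes.
by_name_then_date = [row[0] for row in conn.execute(
    "SELECT name FROM people ORDER BY name, birthdate")]

print(by_date_then_name)  # ['Bea', 'Cara', 'Abel']
print(by_name_then_date)  # ['Abel', 'Bea', 'Cara']
```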

Try using `ORDER BY` to sort multiple columns! Remember, to specify multiple columns you separate the column names with a comma.

In [11]:
%%sql
SELECT name, birthdate
FROM people
ORDER BY birthdate, name
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


name,birthdate
A. Raven Cruz,
A.J. DeLucia,
Aaron Hann,
Aaron Hughes,
Aaron Schneider,


In [12]:
%%sql
SELECT release_year, duration, title
FROM films
ORDER BY release_year,duration
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


release_year,duration,title
,,Wolf Creek
,22.0,"10,000 B.C."
,22.0,Anger Management
,24.0,Lovesick
,24.0,Yu-Gi-Oh! Duel Monsters


In [13]:
%%sql
SELECT certification, release_year, title
FROM films
ORDER BY certification, release_year
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


certification,release_year,title
,,"10,000 B.C."
,,A Touch of Frost
,,Anger Management
,,Animal Kingdom
,,BrainDead


In [14]:
%%sql
SELECT name, birthdate
FROM people
ORDER BY name, birthdate
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


name,birthdate
50 Cent,1975-07-06
A. Michael Baldwin,1963-04-04
A. Raven Cruz,
A.J. Buckley,1978-02-09
A.J. DeLucia,


---
## GROUP BY
Now you know how to sort results! Often you'll also need to aggregate them. For example, you might want to count the number of male and female employees in your company: group all the males together and count them, then group all the females together and count them. In SQL, `GROUP BY` allows you to group a result by one or more columns, like so:
```sql
SELECT sex, count(*)
FROM employees
GROUP BY sex;
```
This might give, for example:

|sex|count|
|-|-|
|male|15|
|female|19|

Commonly, `GROUP BY` is used with aggregate functions like `COUNT()` or `MAX()`. Note that `GROUP BY` always goes after the `FROM` clause!
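
The following sketch illustrates grouping with two aggregate functions at once. It assumes a hypothetical `employees` table with made-up names and a `salary` column that exists only for this illustration.

```python
import sqlite3

# Hypothetical employees table; the rows and the salary column are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, sex TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ann", "female", 52000),
        ("Bob", "male", 48000),
        ("Cat", "female", 61000),
        ("Dan", "male", 45000),
        ("Eve", "female", 58000),
    ],
)

# One output row per distinct value of sex, with aggregates computed per group.
rows = conn.execute(
    "SELECT sex, COUNT(*), MAX(salary) FROM employees GROUP BY sex"
).fetchall()

# Group order is not guaranteed without ORDER BY, so index the rows by sex.
summary = {sex: (count, top_salary) for sex, count, top_salary in rows}
print(summary)
```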

## GROUP BY practice
As you've just seen, combining aggregate functions with `GROUP BY` can yield some powerful results!

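A word of warning: most SQL databases (PostgreSQL, for example) return an error if you `SELECT` a field that is neither in your `GROUP BY` clause nor wrapped in an aggregate function. SQLite, which this notebook uses, is more permissive: it accepts such a query and fills the field with a value from an arbitrary row of each group, so the mistake can go unnoticed.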

Note that you can combine `GROUP BY` with `ORDER BY` to group your results, calculate something about them, and then order your results. For example,
```sql
SELECT sex, COUNT(*) AS count
FROM employees
GROUP BY sex
ORDER BY count DESC;
```
might return something like

|sex|count|
|-|-|
|female|19|
|male|15|

because there are more females at our company than males. Note also that `ORDER BY` always goes after `GROUP BY`. Let's try some exercises!
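
As a rough sketch of grouping and then ordering, the snippet below counts rows per group and sorts the groups by that count. The `employees` rows are hypothetical, and the alias `n` is an arbitrary name chosen here so that `ORDER BY` can refer to the aggregate.

```python
import sqlite3

# Hypothetical employees table with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, sex TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", "female"), ("Bob", "male"), ("Cat", "female")],
)

# Group first, then sort the groups by the aliased aggregate, largest first.
rows = conn.execute(
    "SELECT sex, COUNT(*) AS n FROM employees GROUP BY sex ORDER BY n DESC"
).fetchall()
print(rows)  # [('female', 2), ('male', 1)]
```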

In [15]:
%%sql
SELECT release_year, count(*)
FROM films
GROUP BY release_year
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


release_year,count(*)
,42
1916.0,1
1920.0,1
1925.0,1
1927.0,1


In [16]:
%%sql
SELECT release_year, AVG(duration)
FROM films
GROUP BY release_year
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


release_year,AVG(duration)
,77.4390243902439
1916.0,123.0
1920.0,110.0
1925.0,151.0
1927.0,145.0


In [17]:
%%sql
SELECT release_year, MAX(budget)
FROM films
GROUP BY release_year
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


release_year,MAX(budget)
,15000000
1916.0,385907
1920.0,100000
1925.0,245000
1927.0,6000000


In [18]:
%%sql
SELECT imdb_score, COUNT(*)
FROM reviews
GROUP BY imdb_score
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


imdb_score,COUNT(*)
1.6,1
1.7,1
1.9,3
2.0,2
2.1,3


---
## GROUP BY practice (2)
Now practice your new skills by combining `GROUP BY` and `ORDER BY` with some more aggregate functions!

Make sure to put the `ORDER BY` clause after `GROUP BY` (and after `HAVING`, if present); only `LIMIT` comes later. You can't sort values that you haven't calculated yet!

In [19]:
%%sql
SELECT release_year, MIN(gross)
FROM films
GROUP BY release_year
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


release_year,MIN(gross)
,145118.0
1916.0,
1920.0,3000000.0
1925.0,
1927.0,26435.0


In [20]:
%%sql
SELECT language, SUM(gross)
FROM films
GROUP BY language
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


language,SUM(gross)
,4319281
Aboriginal,78680789
Arabic,1681831
Aramaic,499263
Bosnian,301305


In [21]:
%%sql
SELECT country, SUM(budget)
FROM films
GROUP BY country
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


country,SUM(budget)
,3500000
Afghanistan,46000
Argentina,5700000
Aruba,35000000
Australia,1558605523


In [22]:
%%sql
SELECT release_year, country, MAX(budget)
FROM films
GROUP BY release_year, country
ORDER BY release_year, country
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


release_year,country,MAX(budget)
,,
,Australia,15000000.0
,Canada,
,France,
,Iceland,


In [23]:
%%sql
SELECT country, release_year, MIN(gross)
FROM films
GROUP BY release_year,country
ORDER BY country, release_year
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


country,release_year,MIN(gross)
,,
,2014.0,
Afghanistan,2003.0,1127331.0
Argentina,2000.0,1221261.0
Argentina,2004.0,304124.0


---
## HAVING a great time
In SQL, aggregate functions can't be used in `WHERE` clauses. For example, the following query is invalid:
```sql
SELECT release_year
FROM films
GROUP BY release_year
WHERE COUNT(title) > 10;
```
This means that if you want to filter based on the result of an aggregate function, you need another way! That's where the `HAVING` clause comes in. For example,
```sql
SELECT release_year
FROM films
GROUP BY release_year
HAVING COUNT(title) > 10;
```
shows only those years in which more than 10 films were released.
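
The difference can be checked with a short sketch using Python's standard `sqlite3` module. The four films are made-up sample data, and the threshold is lowered to 2 so that the tiny table still produces a result.

```python
import sqlite3

# Hypothetical data: three films in 2000, one in 2001.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("A", 2000), ("B", 2000), ("C", 2000), ("D", 2001)],
)

# An aggregate function inside WHERE is rejected by SQLite.
try:
    conn.execute(
        "SELECT release_year FROM films "
        "WHERE COUNT(title) > 2 GROUP BY release_year")
    where_failed = False
except sqlite3.Error as exc:
    where_failed = True
    print("WHERE with an aggregate failed:", exc)

# HAVING filters the groups after the aggregate has been computed.
busy_years = [row[0] for row in conn.execute(
    "SELECT release_year FROM films "
    "GROUP BY release_year HAVING COUNT(title) > 2")]
print(busy_years)  # [2000]
```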

---
## All together now
Time to practice using `ORDER BY`, `GROUP BY` and `HAVING` together.

Now you're going to write a query that returns the average budget and average gross earnings for films in each year after 1990, if the average budget is greater than $60 million.

This is going to be a big query, but you can handle it!

In [24]:
%%sql
SELECT release_year, budget, gross
FROM films
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


release_year,budget,gross
1916,385907.0,
1920,100000.0,3000000.0
1925,245000.0,
1927,6000000.0,26435.0
1929,,9950.0


In [25]:
%%sql
SELECT release_year, budget, gross
FROM films
WHERE release_year > 1990
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


release_year,budget,gross
1991,6000000,869325
1991,20000000,38037513
1991,6000000,57504069
1991,35000000,79100000
1991,15000000,30102717
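
The cells above only cover the first steps. One possible way to assemble the full query described in this section is sketched below on a small in-memory table. The rows are invented for illustration, the aliases `avg_budget` and `avg_gross` are chosen here, and sorting by average gross is an assumption rather than something the task requires.

```python
import sqlite3

# Hypothetical sample rows; not taken from films.db.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE films (release_year INTEGER, budget REAL, gross REAL)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [
        (1985, 100_000_000, 200_000_000),  # dropped by WHERE (not after 1990)
        (1995, 80_000_000, 100_000_000),
        (1995, 60_000_000, 50_000_000),    # 1995 average budget: 70 million
        (2000, 20_000_000, 30_000_000),
        (2000, 40_000_000, 10_000_000),    # 2000 average budget: 30 million
        (2010, 90_000_000, 300_000_000),
    ],
)

query = """
SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990          -- filter rows before grouping
GROUP BY release_year
HAVING AVG(budget) > 60000000      -- filter groups after aggregating
ORDER BY avg_gross DESC;           -- assumed ordering, highest gross first
"""
result = conn.execute(query).fetchall()
print(result)
```

With this sample data only 2010 and 1995 survive: 1985 is removed by `WHERE`, and 2000 is removed by `HAVING` because its average budget is below $60 million.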


## All together now (2)
Great work! Now try another large query. This time, all in one go!

Remember, if you only want to return a certain number of results, you can use the `LIMIT` keyword to limit the number of rows returned.

### Instructions
Get the country, average budget, and average gross take of countries that have made more than 10 films. Order the result by country name, and limit the number of results displayed to 5. You should alias the averages as `avg_budget` and `avg_gross` respectively.

In [26]:
%%sql
SELECT country, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
GROUP BY country
HAVING count(title) > 10
ORDER BY country
LIMIT 5;

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


country,avg_budget,avg_gross
Australia,31172110.46,40205909.571428575
Canada,14798458.71559633,22432066.68055556
China,62219000.0,14143040.736842103
Denmark,13922222.222222222,1418469.111111111
France,30672034.615384616,16350593.578512397


---
## A taste of things to come
Congrats on making it to the end of the course! By now you should have a good understanding of the basics of SQL.

There's one more concept we're going to introduce. You may have noticed that all your results so far have been from just one table, e.g., `films` or `people`.

In the real world, however, you will often want to query multiple tables. For example, what if you want to see the IMDB score for a particular movie?

In this case, you'd want to get the ID of the movie from the `films` table and then use it to get IMDB information from the `reviews` table. In SQL, this concept is known as a join, and a basic join is shown in the query at the end of this section.

That query gets the IMDB score for the film *To Kill a Mockingbird*! Cool, right?

As you can see, joins are incredibly useful and important to understand for anyone using SQL.

We have a whole follow-up course dedicated to them called [Joining Data in SQL](https://learn.datacamp.com/courses/joining-data-in-postgresql) for you to hone your database skills further!

In [27]:
%%sql
SELECT title, imdb_score
FROM films
JOIN reviews
ON films.id = reviews.film_id
WHERE title = 'To Kill a Mockingbird';

 * sqlite:////Users/matteo/Nextcloud/2-Documenti/Corsi/DataScience/DataCamp/Introduction_to_SQL/films.db
Done.


title,imdb_score
To Kill a Mockingbird,8.4
