## ORDER BY

Congratulations on making it this far! You now know how to select and filter your results.

In this chapter you'll learn how to sort and group your results to gain further insight. Let's go!

In SQL, the `ORDER BY` keyword is used to sort results in ascending or descending order according to the values of one or more columns.

By default, `ORDER BY` sorts in ascending order. To sort in descending order, add the `DESC` keyword after the column name. For example,

```
SELECT title
FROM films
ORDER BY release_year DESC;
```

gives you the titles of films sorted by release year, from newest to oldest.
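
To try `ASC` and `DESC` without the course database, here is a minimal sketch. It assumes a hypothetical three-row table, `demo_films`, built inline with a `WITH ... AS (VALUES ...)` common table expression, so it should run on its own in PostgreSQL or SQLite.

```sql
-- demo_films is a hypothetical inline table, not the course's films table.
WITH demo_films (title, release_year) AS (
    VALUES ('Alpha', 2001),
           ('Beta', 1999),
           ('Gamma', 2010)
)
SELECT title, release_year
FROM demo_films
ORDER BY release_year ASC;  -- ASC is the default, so writing it is optional
-- Returns Beta (1999), Alpha (2001), Gamma (2010).
-- With DESC instead of ASC: Gamma, Alpha, Beta.
```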

Instructions

1. How do you think `ORDER BY` sorts a column of text values by default?

Alphabetically (A-Z).

## Sorting single columns

Now that you understand how `ORDER BY` works, give these exercises a go!

Instructions

1. Get the names of people from the `people` table, sorted alphabetically.
2. Get the names of people, sorted by birth date.
3. Get the birth date and name for every person, in order of when they were born.

```sql
SELECT name
FROM people
ORDER BY name;

-- name
-- 50 Cent
-- A. Michael Baldwin
-- A. Raven Cruz
-- A.J. Buckley
-- A.J. DeLucia
-- ...
```

```sql
SELECT name
FROM people
ORDER BY birthdate;

-- name
-- Robert Shaw
-- Lucille La Verne
-- Mary Carr
-- D.W. Griffith
-- Finlay Currie
-- ...
```

```sql
SELECT birthdate, name
FROM people
ORDER BY birthdate;

-- birthdate    name
-- 1837-10-10   Robert Shaw
-- 1872-11-07   Lucille La Verne
-- 1874-03-14   Mary Carr
-- 1875-01-22   D.W. Griffith
-- 1878-01-20   Finlay Currie
-- ...
```

## Sorting single columns (2)

Let's get some more practice with `ORDER BY`!

Instructions

1. Get the title of films released in 2000 or 2012, in the order they were released.
2. Get all details for all films except those released in 2015 and order them by duration.
3. Get the title and gross earnings for movies whose titles begin with the letter 'M', and order the results alphabetically.

```sql
SELECT title
FROM films
WHERE release_year IN (2000, 2012)
ORDER BY release_year;

-- title
-- 102 Dalmatians
-- 28 Days
-- 3 Strikes
-- Aberdeen
-- All the Pretty Horses
-- ...
```

```sql
SELECT *
FROM films
WHERE release_year <> 2015
ORDER BY duration;

-- id     title                                                  release_year   country   duration     language   certification   gross    budget
-- 2926   The Touch                                              2007           USA       7            English    null            null     13000
-- 4098   Vessel                                                 2012           USA       14           English    null            null     null
-- 2501   Wal-Mart: The High Cost of Low Price                   2005           USA       20           English    Not Rated       null     1500000
-- 566    Marilyn Hotchkiss' Ballroom Dancing and Charm School   1990           USA       34           English    null            333658   34000
-- 2829   Jesus People                                           2007           USA       35           English    null            null     null
-- ...
```

```sql
SELECT title, gross
FROM films
WHERE title LIKE 'M%'
ORDER BY title;

-- title                  gross
-- MacGruber              8460995
-- Machete                26589953
-- Machete Kills          7268659
-- Machine Gun McCain     null
-- Machine Gun Preacher   537580
-- ...
```

## Sorting single columns (DESC)

To order results in _descending_ order, you can put the keyword `DESC` after your `ORDER BY`. For example, to get all the names in the `people` table, in reverse alphabetical order:

```
SELECT name
FROM people
ORDER BY name DESC;
```

Now practice using `ORDER BY` with `DESC` to sort single columns in descending order!

Instructions

1. Get the IMDB score and film ID for every film from the reviews table, sorted from highest to lowest score.
2. Get the title for every film, in reverse alphabetical order.
3. Get the title and duration for every film, in order of longest duration to shortest.

```sql
SELECT imdb_score, film_id
FROM reviews
ORDER BY imdb_score DESC;

-- imdb_score   film_id
-- 9.5          4960
-- 9.3          742
-- 9.2          178
-- 9.1          4866
-- 9            3110
-- ...
```

```sql
SELECT title
FROM films
ORDER BY title DESC;

-- title
-- Æon Flux
-- xXx: State of the Union
-- xXx
-- eXistenZ
-- [Rec] 2
-- ...
```

```sql
SELECT title, duration
FROM films
ORDER BY duration DESC;

-- title                                          duration
-- Destiny                                        null
-- Should've Been Romeo                           null
-- Hum To Mohabbat Karega                         null
-- Harry Potter and the Deathly Hallows: Part I   null
-- Barfi                                          null
-- ...
```
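
Notice that the longest-first query above starts with `null` durations. In PostgreSQL, `NULL` values sort as if they were larger than any non-null value, so they come last with `ASC` and first with `DESC`. If you would rather push missing values to the end, PostgreSQL (and SQLite 3.30 or later) accept `NULLS LAST`. A minimal sketch, assuming a hypothetical inline `demo_films` table:

```sql
-- demo_films is a hypothetical inline table, not the course's films table.
WITH demo_films (title, duration) AS (
    VALUES ('Short', 80),
           ('Unknown', NULL),
           ('Long', 150)
)
SELECT title, duration
FROM demo_films
ORDER BY duration DESC NULLS LAST;  -- longest first, missing durations at the end
-- Returns Long (150), Short (80), Unknown (null).
-- Without NULLS LAST, PostgreSQL would list Unknown first.
```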

## Sorting multiple columns

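`ORDER BY` can also be used to sort on multiple columns. It sorts by the first column specified; the second column is only used to break ties between rows that share the same value in the first column, the third breaks ties that remain after the first two, and so on. For example,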

```
SELECT birthdate, name
FROM people
ORDER BY birthdate, name;
```

sorts on birth dates first (oldest to newest) and then, among people who share a birth date, sorts by name in alphabetical order. **The order of columns is important!**

Try using `ORDER BY` to sort multiple columns! Remember, to specify multiple columns you separate the column names with a comma.
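
Each column in `ORDER BY` can also have its own direction. The sketch below assumes a hypothetical inline `demo_people` table; it sorts people from youngest to oldest by birth year and breaks ties alphabetically by name.

```sql
-- demo_people is a hypothetical inline table, not the course's people table.
WITH demo_people (name, birth_year) AS (
    VALUES ('Carol', 1980),
           ('Alice', 1980),
           ('Bob', 1975)
)
SELECT birth_year, name
FROM demo_people
ORDER BY birth_year DESC,  -- newest birth year first
         name ASC;         -- only used to order people born in the same year
-- Returns 1980 Alice, 1980 Carol, 1975 Bob.
```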

Instructions

1. Get the birth date and name of people in the `people` table, in order of when they were born and alphabetically by name.
2. Get the release year, duration, and title of films ordered by their release year and duration.
3. Get certifications, release years, and titles of films ordered by certification (alphabetically) and release year.
4. Get the names and birthdates of people ordered by name and birth date.

```sql
SELECT birthdate, name
FROM people
ORDER BY birthdate, name;

-- birthdate    name
-- 1837-10-10   Robert Shaw
-- 1872-11-07   Lucille La Verne
-- 1874-03-14   Mary Carr
-- 1875-01-22   D.W. Griffith
-- 1878-01-20   Finlay Currie
-- ...
```

```sql
SELECT release_year, duration, title
FROM films
ORDER BY release_year, duration;

-- release_year   duration   title
-- 1916           123        Intolerance: Love's Struggle Throughout the Ages
-- 1920           110        Over the Hill to the Poorhouse
-- 1925           151        The Big Parade
-- 1927           145        Metropolis
-- 1929           100        The Broadway Melody
-- ...
```

```sql
SELECT certification, release_year, title
FROM films
ORDER BY certification, release_year;

-- certification   release_year   title
-- Approved        1933           She Done Him Wrong
-- Approved        1935           Top Hat
-- Approved        1936           The Charge of the Light Brigade
-- Approved        1937           Snow White and the Seven Dwarfs
-- Approved        1937           The Prisoner of Zenda
-- ...
```

```sql
SELECT name, birthdate
FROM people
ORDER BY name, birthdate;

-- name                 birthdate
-- 50 Cent              1975-07-06
-- A. Michael Baldwin   1963-04-04
-- A. Raven Cruz        null
-- A.J. Buckley         1978-02-09
-- A.J. DeLucia         null
-- ...
```

## GROUP BY

Now you know how to sort results! Often you'll need to aggregate results. For example, you might want to count the number of male and female employees in your company. Here, what you want is to group all the males together and count them, and group all the females together and count them. In SQL, `GROUP BY` allows you to group a result by one or more columns, like so:

```
SELECT sex, count(*)
FROM employees
GROUP BY sex;
```

This might give, for example:

```
sex      count
male     15
female   19
```

Commonly, `GROUP BY` is used with _aggregate functions_ like `COUNT()` or `MAX()`. Note that `GROUP BY` always goes after the `FROM` clause (and after `WHERE`, if the query has one)!
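
Here is a minimal sketch of the `employees` example above that computes two aggregates per group. It assumes a hypothetical inline `demo_employees` table with made-up salaries.

```sql
-- demo_employees is a hypothetical inline table with invented data.
WITH demo_employees (name, sex, salary) AS (
    VALUES ('Ana', 'female', 52000),
           ('Ben', 'male', 48000),
           ('Cara', 'female', 61000)
)
SELECT sex,
       COUNT(*)    AS n_employees,  -- number of rows in each group
       MAX(salary) AS top_salary    -- largest salary in each group
FROM demo_employees
GROUP BY sex;
-- One row per sex: female (2, 61000) and male (1, 48000).
-- The order of these two rows is not guaranteed without ORDER BY.
```

Because a grouped result comes back in no particular order, the exercise outputs below list years in an apparently random order until `ORDER BY` is added.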

Instructions

1. What is `GROUP BY` used for?

Performing operations by group.

## GROUP BY practice

As you've just seen, combining aggregate functions with `GROUP BY` can yield some powerful results!

A word of warning: many databases, including PostgreSQL, will return an error if you try to `SELECT` a column that is neither listed in your `GROUP BY` clause nor used inside an aggregate function to calculate some value about the entire group.
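
The sketch below contrasts a query PostgreSQL rejects (left as a comment) with a valid version in which every selected column is either grouped or aggregated. It assumes a hypothetical inline `demo_films` table. Some databases, such as SQLite, are more permissive and would accept the commented query, returning an arbitrary title for each group.

```sql
-- Rejected by PostgreSQL: title is neither grouped nor aggregated.
--   SELECT release_year, title FROM films GROUP BY release_year;

-- Accepted: every selected column is grouped or aggregated.
-- demo_films is a hypothetical inline table, not the course's films table.
WITH demo_films (title, release_year) AS (
    VALUES ('Alpha', 2000),
           ('Beta', 2000),
           ('Gamma', 2001)
)
SELECT release_year,
       COUNT(title) AS n_films,                    -- aggregated
       MIN(title)   AS first_title_alphabetically  -- aggregated
FROM demo_films
GROUP BY release_year                              -- grouped
ORDER BY release_year;
-- 2000 | 2 | Alpha
-- 2001 | 1 | Gamma
```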

Note that you can combine `GROUP BY` with `ORDER BY` to group your results, calculate something about them, and then order your results. For example,

```
SELECT sex, count(*)
FROM employees
GROUP BY sex
ORDER BY count DESC;
```

might return something like

```
sex      count
female   19
male     15
```

because there are more females at our company than males. Note also that `ORDER BY` always goes after `GROUP BY`. Let's try some exercises!
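
`ORDER BY count` works in PostgreSQL because an unaliased `count(*)` column is named `count`. Giving the aggregate an explicit alias and sorting on that alias is clearer and more portable (SQLite, for example, names the unaliased column `count(*)`). A minimal sketch with a hypothetical inline `demo_employees` table:

```sql
-- demo_employees is a hypothetical inline table with invented data.
WITH demo_employees (name, sex) AS (
    VALUES ('Ana', 'female'),
           ('Ben', 'male'),
           ('Cara', 'female')
)
SELECT sex, COUNT(*) AS n_employees
FROM demo_employees
GROUP BY sex
ORDER BY n_employees DESC;  -- sort on the aliased aggregate
-- female | 2
-- male   | 1
```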

Instructions

1. Get the release year and count of films released in each year.
2. Get the release year and average duration of all films, grouped by release year.
3. Get the release year and largest budget for all films, grouped by release year.
4. Get the IMDB score and count of film reviews grouped by IMDB score in the `reviews` table.

```sql
SELECT release_year, COUNT(*)
FROM films
GROUP BY release_year;

-- release_year   count
-- 1954           5
-- 1988           31
-- 1959           3
-- 1964           10
-- 1969           10
-- ...
```

```sql
SELECT release_year, AVG(duration)
FROM films
GROUP BY release_year;

-- release_year   avg
-- 1954           140.6000000000000000
-- 1988           107.0000000000000000
-- 1959           136.6666666666666667
-- 1964           119.4000000000000000
-- 1969           126.0000000000000000
-- ...
```

```sql
SELECT release_year, MAX(budget)
FROM films
GROUP BY release_year;

-- release_year   max
-- 1954           5000000
-- 1988           1100000000
-- 1959           5000000
-- 1964           19000000
-- 1969           20000000
-- ...
```

```sql
SELECT imdb_score, COUNT(*)
FROM reviews
GROUP BY imdb_score;

-- imdb_score   count
-- 5.7          117
-- 8.7          11
-- 9            2
-- 9.1          1
-- 8.3          37
-- ...
```

## GROUP BY practice (2)

Now practice your new skills by combining `GROUP BY` and `ORDER BY` with some more aggregate functions!

Make sure the `ORDER BY` clause comes after `GROUP BY`, near the end of your query. You can't sort values that you haven't calculated yet!
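
Grouping by two columns produces one row per distinct combination of their values. A minimal sketch, assuming a hypothetical inline `demo_films` table with made-up budgets:

```sql
-- demo_films is a hypothetical inline table with invented budgets.
WITH demo_films (country, release_year, budget) AS (
    VALUES ('USA', 2000, 100),
           ('USA', 2000, 300),
           ('USA', 2001, 50),
           ('France', 2000, 80)
)
SELECT release_year, country, MAX(budget) AS max_budget
FROM demo_films
GROUP BY release_year, country   -- one group per (year, country) pair
ORDER BY release_year, country;  -- sorting comes after the grouping
-- 2000 | France | 80
-- 2000 | USA    | 300
-- 2001 | USA    | 50
```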

Instructions

1. Get the release year and lowest gross earnings per release year.
2. Get the language and the total gross earnings of films in each language.
3. Get the country and total budget spent making movies in each country.
4. Get the release year, country, and highest budget spent making a film for each year, for each country. Sort your results by release year and country.
5. Get the country, release year, and lowest amount grossed per release year per country. Order your results by country and release year.

```sql
SELECT release_year, MIN(gross)
FROM films
GROUP BY release_year;

-- release_year   min
-- 1954           269061
-- 1988           439162
-- 1959           25000000
-- 1964           12438
-- 1969           26893
-- ...
```

```sql
SELECT language, SUM(gross)
FROM films
GROUP BY language;

-- language   sum
-- Danish     2403857
-- Greek      110197
-- Dzongkha   505295
-- None       2601847
-- null       4319281
-- ...
```

```sql
SELECT country, SUM(budget)
FROM films
GROUP BY country;

-- country        sum
-- null           3500000
-- Soviet Union   1000000
-- Indonesia      1100000
-- Italy          261350000
-- Cameroon       null
-- ...
```

```sql
SELECT release_year, country, MAX(budget)
FROM films
GROUP BY release_year, country
ORDER BY release_year, country;

-- release_year   country   max
-- 1916           USA       385907
-- 1920           USA       100000
-- 1925           USA       245000
-- 1927           Germany   6000000
-- 1929           Germany   null
-- ...
```

```sql
SELECT country, release_year, MIN(gross)
FROM films
GROUP BY country, release_year
ORDER BY country, release_year;

-- country       release_year   min
-- Afghanistan   2003           1127331
-- Argentina     2000           1221261
-- Argentina     2004           304124
-- Argentina     2009           20167424
-- Aruba         1998           10076136
-- ...
```

## HAVING a great time

In SQL, aggregate functions can't be used in `WHERE` clauses. For example, the following query is invalid:

```
SELECT release_year
FROM films
WHERE COUNT(title) > 10
GROUP BY release_year;
```

This means that if you want to filter based on the result of an aggregate function, you need another way! That's where the `HAVING` clause comes in. For example,

```
SELECT release_year
FROM films
GROUP BY release_year
HAVING COUNT(title) > 10;
```

shows only those years in which more than 10 films were released.
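
`WHERE` and `HAVING` can appear in the same query: `WHERE` filters individual rows before they are grouped, and `HAVING` filters whole groups after the aggregates are computed. A minimal sketch with a hypothetical inline `demo_films` table:

```sql
-- demo_films is a hypothetical inline table, not the course's films table.
WITH demo_films (title, release_year) AS (
    VALUES ('A', 2000), ('B', 2000), ('C', 2000),
           ('D', 2001),
           ('E', 1990), ('F', 1990), ('G', 1990)
)
SELECT release_year, COUNT(title) AS n_films
FROM demo_films
WHERE release_year > 1995    -- row filter, applied before grouping
GROUP BY release_year
HAVING COUNT(title) > 2;     -- group filter, applied after aggregation
-- Returns one row: 2000 | 3
```

Without the `WHERE` clause, 1990 would also qualify, because it too has three films.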

Instructions

1. In how many different years were more than 200 movies released?

```sql
SELECT release_year
FROM films
GROUP BY release_year
HAVING COUNT(title) > 200;

-- release_year
-- 2008
-- 2009
-- 2005
-- 2013
-- 2015
-- 2002
-- 2004
-- 2014
-- 2010
-- 2011
-- 2007
-- 2006
-- 2012
```

## All together now

Time to practice using `ORDER BY`, `GROUP BY` and `HAVING` together.

Now you're going to write a query that returns the average budget and average gross earnings for films in each year after 1990, if the average budget is greater than $60 million.

This is going to be a big query, but you can handle it!

Instructions

1. Get the release year, budget and gross earnings for each film in the films table.
2. Modify your query so that only records with a `release_year` after 1990 are included.
3. Remove the budget and gross columns, and group your results by release year.
4. Modify your query to include the average budget and average gross earnings for the results you have so far. Alias the average budget as `avg_budget`; alias the average gross earnings as `avg_gross`.
5. Modify your query so that only years with an average budget of greater than $60 million are included.
6. Finally, modify your query to order the results from highest average gross earnings to lowest.

```sql
SELECT release_year, budget, gross
FROM films;
```

```sql
SELECT release_year, budget, gross
FROM films
WHERE release_year > 1990;
```

```sql
SELECT release_year
FROM films
WHERE release_year > 1990
GROUP BY release_year;
```

```sql
SELECT release_year,
       AVG(budget) AS avg_budget,
       AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year;
```

```sql
SELECT release_year,
       AVG(budget) AS avg_budget,
       AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year
HAVING AVG(budget) > 60000000;
```

```sql
SELECT release_year,
       AVG(budget) AS avg_budget,
       AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year
HAVING AVG(budget) > 60000000
ORDER BY AVG(gross) DESC;

-- release_year   avg_budget              avg_gross
-- 2005           70323938.231527093596   41159143.290640394089
-- 2006           93968929.577464788732   39237855.953703703704
```
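
The clauses in this final query are written in one order but are conceptually evaluated in another, which is why, in PostgreSQL, `HAVING` cannot refer to the `avg_budget` alias while `ORDER BY` may use `avg_gross`. The sketch below annotates that conventional logical order on a hypothetical inline `demo_films` table (budgets and grosses in made-up units); the numbering describes the logical model, not how any particular engine physically executes the query.

```sql
-- demo_films is a hypothetical inline table with invented values.
WITH demo_films (release_year, budget, gross) AS (
    VALUES (1989, 90, 10),
           (1995, 70, 40),
           (1995, 80, 60),
           (2001, 20, 5),
           (2003, 100, 90)
)
SELECT release_year,                -- 5. compute the output columns
       AVG(budget) AS avg_budget,
       AVG(gross)  AS avg_gross
FROM demo_films                     -- 1. read the rows
WHERE release_year > 1990           -- 2. drop rows (removes 1989)
GROUP BY release_year               -- 3. form groups
HAVING AVG(budget) > 60             -- 4. drop groups (removes 2001)
ORDER BY avg_gross DESC;            -- 6. sort, using the alias
-- 2003 (avg_budget 100, avg_gross 90), then 1995 (avg_budget 75, avg_gross 50)
```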

## All together now (2)

Great work! Now try another large query. This time, all in one go!

Remember, if you only want to return a certain number of results, you can use the `LIMIT` keyword to limit the number of rows returned.
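
`LIMIT` goes at the very end, after `ORDER BY`. Without an `ORDER BY`, which rows you get back is not guaranteed. `LIMIT` is not part of the SQL standard, but PostgreSQL and SQLite both support it. A minimal sketch with a hypothetical inline `demo_films` table:

```sql
-- demo_films is a hypothetical inline table with invented gross values.
WITH demo_films (title, gross) AS (
    VALUES ('A', 50),
           ('B', 200),
           ('C', 120),
           ('D', 10)
)
SELECT title, gross
FROM demo_films
ORDER BY gross DESC  -- decide which rows count as "top"
LIMIT 2;             -- then keep only the first two
-- Returns B (200), C (120).
```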

Instructions

1. Get the country, average budget, and average gross take of countries that have made more than 10 films. Order the result by country name, and limit the number of results displayed to 5. You should alias the averages as `avg_budget` and `avg_gross` respectively.

```sql
-- select country, average budget, and average gross
SELECT country,
       AVG(budget) AS avg_budget,
       AVG(gross) AS avg_gross
-- from the films table
FROM films
-- group by country
GROUP BY country
-- keep only countries with more than 10 films
HAVING COUNT(country) > 10
-- order by country
ORDER BY country
-- limit to only show 5 results
LIMIT 5;

-- country     avg_budget              avg_gross
-- Australia   31172110.460000000000   40205909.571428571429
-- Canada      14798458.715596330275   22432066.680555555556
-- China       62219000.000000000000   14143040.736842105263
-- Denmark     13922222.222222222222   1418469.111111111111
-- France      30672034.615384615385   16350593.578512396694
```

## A taste of things to come

Congrats on making it to the end of the course! By now you should have a good understanding of the basics of SQL.

There's one more concept we're going to introduce. You may have noticed that all your results so far have been from just one table, e.g. `films` or `people`.

In the real world, however, you will often want to query multiple tables. For example, what if you want to see the IMDB score for a particular movie?

In this case, you'd want to get the ID of the movie from the `films` table and then use it to get IMDB information from the `reviews` table. In SQL, this concept is known as a **join**, and a basic join is shown below.

The query below gets the IMDB score for the film _To Kill a Mockingbird_! Cool, right?

As you can see, joins are incredibly useful and important to understand for anyone using SQL.

We have a whole follow-up course dedicated to them called [Joining Data in PostgreSQL](https://www.datacamp.com/courses/joining-data-in-postgresql) for you to hone your database skills further!

Instructions

1. Run the query below and inspect the results.
2. What is the IMDB score for the film _To Kill a Mockingbird_?

```sql
SELECT title, imdb_score
FROM films
JOIN reviews
ON films.id = reviews.film_id
WHERE title = 'To Kill a Mockingbird';

-- title                   imdb_score
-- To Kill a Mockingbird   8.4
```
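
To experiment with joins without the course database, here is a minimal sketch with two hypothetical inline tables, `demo_films` and `demo_reviews`, that mirror the `films.id = reviews.film_id` relationship used above. The titles and scores are made up.

```sql
-- demo_films and demo_reviews are hypothetical inline tables with invented data.
WITH demo_films (id, title) AS (
    VALUES (1, 'Alpha'),
           (2, 'Beta')
),
demo_reviews (film_id, imdb_score) AS (
    VALUES (1, 7.5),
           (2, 8.1)
)
SELECT demo_films.title, demo_reviews.imdb_score
FROM demo_films
JOIN demo_reviews
  ON demo_films.id = demo_reviews.film_id  -- match each film to its review
WHERE demo_films.title = 'Beta';
-- Returns: Beta | 8.1
```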