# Aggregate Functions
This chapter teaches you how to use aggregate functions to summarize data and gain useful insights. You'll also learn about arithmetic in SQL and how to use aliases to make your results more readable.

In [1]:
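```python
from sqlalchemy import create_engine, inspect
import os

current_directory = os.getcwd()

# Load the ipython-sql extension (built on SQLAlchemy) and connect it to the
# SQLite file films.db in the notebook's working directory
%load_ext sql
%sql sqlite:///{current_directory}/films.db
```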

---

## Aggregate functions
Often, you will want to perform some calculation on the data in a database. SQL provides a few functions for this, called aggregate functions: each one takes the values of a column across many rows and returns a single summary value.

For example,
```sql
SELECT AVG(budget)
FROM films;
```
gives you the average value from the `budget` column of the `films` table. Similarly, the `MAX()` function returns the highest budget:
```sql
SELECT MAX(budget)
FROM films;
```
The `SUM()` function returns the result of adding up the numeric values in a column:
```sql
SELECT SUM(budget)
FROM films;
```
You can probably guess what the `MIN()` function does! Now it's your turn to try out some SQL functions.
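As a rough sketch of how these four functions behave outside the notebook, the snippet below uses Python's standard `sqlite3` module and a tiny in-memory `films` table. The table and its three budgets are made up for illustration; they are not the data in `films.db`.

```python
import sqlite3

# In-memory database with a tiny, made-up films table (not the real films.db)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, budget INTEGER)")
conn.executemany(
    "INSERT INTO films (title, budget) VALUES (?, ?)",
    [("Film A", 100), ("Film B", 200), ("Film C", 300)],
)

# Each aggregate collapses the whole budget column into a single value
avg_budget, max_budget, sum_budget, min_budget = conn.execute(
    "SELECT AVG(budget), MAX(budget), SUM(budget), MIN(budget) FROM films"
).fetchone()

print(avg_budget, max_budget, sum_budget, min_budget)  # 200.0 300 600 100
```

Note that `AVG()` returns a floating-point number even though every budget is an integer.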

In [2]:
```sql
%%sql
SELECT SUM(duration)
FROM films;
```

| SUM(duration) |
| --- |
| 534882 |


In [3]:
```sql
%%sql
SELECT AVG(duration)
FROM films;
```

| AVG(duration) |
| --- |
| 107.94793138244198 |


In [4]:
```sql
%%sql
SELECT MIN(duration)
FROM films;
```

| MIN(duration) |
| --- |
| 7 |


In [5]:
```sql
%%sql
SELECT MAX(duration)
FROM films;
```

| MAX(duration) |
| --- |
| 334 |


---

## Aggregate functions practice
Good work. Aggregate functions are important to understand, so let's get some more practice!

In [6]:
```sql
%%sql
SELECT SUM(gross)
FROM films;
```

| SUM(gross) |
| --- |
| 202515840134 |


In [7]:
```sql
%%sql
SELECT AVG(gross)
FROM films;
```

| AVG(gross) |
| --- |
| 48705108.25733526 |


In [8]:
```sql
%%sql
SELECT MIN(gross)
FROM films;
```

| MIN(gross) |
| --- |
| 162 |


In [9]:
```sql
%%sql
SELECT MAX(gross)
FROM films;
```

| MAX(gross) |
| --- |
| 936627416 |
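One detail worth knowing when reading these results: aggregate functions skip `NULL` values. So `AVG(gross)` is the total gross divided by the number of films that actually have a recorded gross, not by the number of rows. A minimal sketch with made-up data in an in-memory SQLite database, using Python's standard `sqlite3` module:

```python
import sqlite3

# In-memory database with made-up data; one film has no recorded gross (NULL)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, gross INTEGER)")
conn.executemany(
    "INSERT INTO films (title, gross) VALUES (?, ?)",
    [("Film A", 100), ("Film B", 300), ("Film C", None)],
)

avg_gross, sum_gross, n_rows, n_with_gross = conn.execute(
    "SELECT AVG(gross), SUM(gross), COUNT(*), COUNT(gross) FROM films"
).fetchone()

# AVG divides the sum (400) by the 2 non-NULL values, not by all 3 rows
print(avg_gross)  # 200.0
```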


---

## Combining aggregate functions with WHERE
Aggregate functions can be combined with the `WHERE` clause to gain further insights from your data.

For example, to get the total budget of movies made in the year 2010 or later:
```sql
SELECT SUM(budget)
FROM films
WHERE release_year >= 2010;
```
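A minimal sketch of the same idea with Python's `sqlite3` module and a made-up three-row table: `WHERE` first discards the rows that don't match, and `SUM()` then adds up only the rows that are left.

```python
import sqlite3

# Made-up films table: one film before 2010, two from 2010 onwards
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER, budget INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [("Old Film", 1999, 50), ("New Film", 2010, 200), ("Newer Film", 2015, 300)],
)

# WHERE filters the rows first; SUM then aggregates only the survivors
(total_recent_budget,) = conn.execute(
    "SELECT SUM(budget) FROM films WHERE release_year >= ?", (2010,)
).fetchone()

print(total_recent_budget)  # 500
```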

In [10]:
```sql
%%sql
SELECT sum(gross)
FROM films
WHERE release_year >= 2000;
```

| sum(gross) |
| --- |
| 150900926358 |


In [11]:
```sql
%%sql
SELECT AVG(gross)
FROM films
WHERE title LIKE 'A%';
```

| AVG(gross) |
| --- |
| 47893236.42248062 |


In [12]:
```sql
%%sql
SELECT MIN(gross)
FROM films
WHERE release_year = 1994;
```

| MIN(gross) |
| --- |
| 125169 |


In [13]:
```sql
%%sql
SELECT MAX(gross)
FROM films
WHERE release_year BETWEEN 2000 AND 2012;
```

| MAX(gross) |
| --- |
| 760505847 |


---

## A note on arithmetic
In addition to using aggregate functions, you can perform basic arithmetic with symbols like `+`, `-`, `*`, and `/`.

So, for example, this gives a result of `12`:
```sql
SELECT (4 * 3);
```
However, the following gives a result of `1`:
```sql
SELECT (4 / 3);
```
What's going on here?

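Many databases, including SQLite (used in this notebook) and PostgreSQL, assume that if you divide an integer by an integer, you want an integer back: the fractional part is simply discarded. So be careful when dividing!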

If you want more precision when dividing, you can add decimal places to your numbers. For example,
```sql
SELECT (4.0 / 3.0) AS result;
```
gives you the result you would expect: approximately `1.333`.
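Both behaviours can be checked from Python with the standard `sqlite3` module. This sketch assumes SQLite's division rules, which are the ones this notebook uses; other databases may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Both operands are integers, so SQLite performs integer division
(int_result,) = conn.execute("SELECT 4 / 3").fetchone()
# Decimal operands switch SQLite to floating-point division
(float_result,) = conn.execute("SELECT 4.0 / 3.0").fetchone()

print(int_result)              # 1
print(round(float_result, 3))  # 1.333
```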

---

## It's AS simple AS aliasing
You may have noticed in the first exercise of this chapter that the column name of your result was generated from the function you used. For example,
```sql
SELECT MAX(budget)
FROM films;
```
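gives you a result with one column named after the function: `max` in PostgreSQL, or the full expression `MAX(budget)` in SQLite, as in the outputs earlier in this notebook. But what if you use two functions like this?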
```sql
SELECT MAX(budget), MAX(duration)
FROM films;
```
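In PostgreSQL you'd then have two columns both named `max`, which isn't very useful! SQLite would name them `MAX(budget)` and `MAX(duration)`, which are distinguishable but clumsy to refer to.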

To avoid situations like this, SQL allows you to do something called aliasing. Aliasing simply means assigning a temporary name to something, such as a result column. To alias, you use the `AS` keyword, which you've already seen earlier in this course.

For example, aliases make the result of the query above clearer:
```sql
SELECT MAX(budget) AS max_budget,
       MAX(duration) AS max_duration
FROM films;
```
Aliases are helpful for making results more readable!
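To confirm that the aliases really become the column names, here is a sketch using Python's `sqlite3` module, where `cursor.description` lists the result columns (the first item of each entry is the column name). The two-row table is made up for illustration.

```python
import sqlite3

# Made-up films table with two rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (budget INTEGER, duration INTEGER)")
conn.executemany("INSERT INTO films VALUES (?, ?)", [(100, 90), (300, 120)])

cursor = conn.execute(
    "SELECT MAX(budget) AS max_budget, MAX(duration) AS max_duration FROM films"
)
row = cursor.fetchone()

# cursor.description has one entry per result column; item 0 is its name
column_names = [col[0] for col in cursor.description]

print(column_names)  # ['max_budget', 'max_duration']
print(row)           # (300, 120)
```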

In [14]:
```sql
%%sql
SELECT title,
       (gross - budget) AS net_profit
FROM films
LIMIT 5;
```

| title | net_profit |
| --- | --- |
| Intolerance: Love's Struggle Throughout the Ages | |
| Over the Hill to the Poorhouse | 2900000.0 |
| The Big Parade | |
| Metropolis | -5973565.0 |
| Pandora's Box | |
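The blank `net_profit` cells are most likely `NULL`s: any arithmetic involving `NULL` produces `NULL`, so a film with a missing `gross` or `budget` has no computable net profit. A small sketch with made-up rows, using Python's `sqlite3` module:

```python
import sqlite3

# Made-up rows: the second film has no recorded gross (NULL)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, gross INTEGER, budget INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [("Known Film", 500, 200), ("Unknown Gross", None, 100)],
)

rows = conn.execute(
    "SELECT title, gross - budget AS net_profit FROM films ORDER BY title"
).fetchall()

# NULL - 100 is NULL, which Python's sqlite3 returns as None
print(rows)  # [('Known Film', 300), ('Unknown Gross', None)]
```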


In [15]:
```sql
%%sql
SELECT title,
       duration / 60.0 AS duration_hours
FROM films
LIMIT 5;
```

| title | duration_hours |
| --- | --- |
| Intolerance: Love's Struggle Throughout the Ages | 2.05 |
| Over the Hill to the Poorhouse | 1.8333333333333333 |
| The Big Parade | 2.5166666666666666 |
| Metropolis | 2.4166666666666665 |
| Pandora's Box | 1.8333333333333333 |


In [16]:
```sql
%%sql
SELECT AVG(duration) / 60.00 AS avg_duration_hours
FROM films;
```

| avg_duration_hours |
| --- |
| 1.7991321897073662 |


---

## Even more aliasing
Let's practice your newfound aliasing skills some more before moving on!

**Recall**: in SQLite (used here) and PostgreSQL, dividing an integer by an integer gives you an integer back.

This means that the following returns `400.0` rather than the `450.0` you might expect:
```sql
SELECT 45 / 10 * 100.0;
```
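This is because `/` and `*` are evaluated left to right, so `45 / 10` is computed first and evaluates to the integer `4`, not the decimal `4.5` we might expect. Multiplying by `100.0` afterwards comes too late to recover the lost fraction.

So when you're dividing, make sure at least one operand of the division itself is a decimal. Here, doing the multiplication by `100.0` first does the trick: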
```sql
SELECT 45 * 100.0 / 10;
```
This now gives the correct answer of `450.0`, since the numerator of the division, `45 * 100.0`, is now a decimal!
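A quick check of both expressions with Python's `sqlite3` module (SQLite's division rules assumed), showing that only the order of operations differs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# * and / share the same precedence and run left to right:
# 45 / 10 is integer division (4), which is then multiplied by 100.0
(wrong,) = conn.execute("SELECT 45 / 10 * 100.0").fetchone()
# 45 * 100.0 is already a decimal (4500.0), so the division keeps the fraction
(right,) = conn.execute("SELECT 45 * 100.0 / 10").fetchone()

print(wrong)  # 400.0
print(right)  # 450.0
```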

In [17]:
```sql
%%sql
SELECT COUNT(deathdate) * 100.0 / COUNT(*) AS percentage_dead
FROM people;
```

| percentage_dead |
| --- |
| 9.372394902941526 |
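This query works because `COUNT(deathdate)` counts only the rows where `deathdate` is not `NULL`, while `COUNT(*)` counts every row; multiplying by `100.0` before dividing avoids integer division. A sketch with four made-up people, only one of whom has a recorded death date:

```python
import sqlite3

# Made-up people table: one death date recorded, three missing (NULL)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, deathdate TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("A", "1990-01-01"), ("B", None), ("C", None), ("D", None)],
)

# COUNT(deathdate) skips NULLs; COUNT(*) counts every row
n_dead, n_total, percentage_dead = conn.execute(
    "SELECT COUNT(deathdate), COUNT(*), "
    "COUNT(deathdate) * 100.0 / COUNT(*) FROM people"
).fetchone()

print(n_dead, n_total, percentage_dead)  # 1 4 25.0
```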


In [18]:
```sql
%%sql
SELECT MAX(release_year) - MIN(release_year) AS difference
FROM films;
```

| difference |
| --- |
| 100 |


In [19]:
```sql
%%sql
SELECT (MAX(release_year) - MIN(release_year)) / 10 AS number_of_decades
FROM films;
```

| number_of_decades |
| --- |
| 10 |
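Here the span happens to be exactly 100 years, so integer division is harmless. With a span that is not a multiple of ten, `/ 10` silently drops the leftover years, while `/ 10.0` keeps them. A sketch with a made-up 95-year span, using Python's `sqlite3` module:

```python
import sqlite3

# Made-up films table spanning 95 years (not a whole number of decades)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (release_year INTEGER)")
conn.executemany("INSERT INTO films VALUES (?)", [(1920,), (2015,)])

# Integer division: 95 / 10 drops the remaining 5 years
(whole_decades,) = conn.execute(
    "SELECT (MAX(release_year) - MIN(release_year)) / 10 FROM films"
).fetchone()
# Dividing by 10.0 keeps the fractional part
(exact_decades,) = conn.execute(
    "SELECT (MAX(release_year) - MIN(release_year)) / 10.0 FROM films"
).fetchone()

print(whole_decades, exact_decades)  # 9 9.5
```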
