<a href="https://colab.research.google.com/github/villafue/Progamming/blob/main/SQL/Tutorial/Introduction%20to%20SQL/3%20Aggregate%20Functions/3_Aggregate_Functions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Aggregate Functions

This chapter teaches you how to use aggregate functions to summarize data and gain useful insights. You'll also learn about arithmetic in SQL and how to use aliases to make your results more readable.

# Aggregate functions

Often, you will want to perform some calculation on the data in a database. SQL provides a few functions, called aggregate functions, that take all the values in a column and return a single summary value.

For example,

```
SELECT AVG(budget)
FROM films;
```
gives you the average value from the budget column of the films table. Similarly, the MAX function returns the highest budget:

```
SELECT MAX(budget)
FROM films;
```

The SUM function returns the result of adding up the numeric values in a column:

```
SELECT SUM(budget)
FROM films;
```

You can probably guess what the MIN function does! Now it's your turn to try out some SQL functions.
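
To experiment with these functions outside the course environment, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The three rows in `films` are made-up sample data, not the course's dataset, and SQLite stands in for the course database, so number formatting may differ from the outputs shown in this chapter.

```python
import sqlite3

# In-memory SQLite database with a tiny, made-up films table
# (assumption: these rows are illustrative only, not the course data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, budget INTEGER, duration INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [("Film A", 100, 90), ("Film B", 300, 120), ("Film C", 200, 150)],
)

# Each aggregate collapses the whole budget column into a single value.
row = conn.execute(
    "SELECT SUM(budget), AVG(budget), MIN(budget), MAX(budget) FROM films"
).fetchone()
total, average, lowest, highest = row
print(total, average, lowest, highest)  # 600 200.0 100 300
```

Several aggregates can share one SELECT, which is why a single query returns all four summaries here.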

Instructions

1. Use the SUM function to get the total duration of all films.

In [None]:
SELECT SUM(duration)
FROM films;

'''
sum
534882
'''

2. Get the average duration of all films.

In [None]:
SELECT AVG(duration)
FROM films;

'''
avg
107.9479313824419778
'''

3. Get the duration of the shortest film.

In [None]:
SELECT MIN(duration)
FROM films;

'''
min
7
'''

4. Get the duration of the longest film.

In [None]:
SELECT MAX(duration)
FROM films;

'''
max
334
'''

# Aggregate functions practice

Good work. Aggregate functions are important to understand, so let's get some more practice!

Instructions

1. Use the SUM function to get the total amount grossed by all films.

In [None]:
SELECT SUM(gross)
FROM films;

'''
sum
202515840134
'''

2. Get the average amount grossed by all films.

In [None]:
SELECT AVG(gross)
FROM films;

'''
avg
48705108.257335257335
'''

3. Get the amount grossed by the worst performing film.

In [None]:
SELECT MIN(gross)
FROM films;

'''
min
162
'''

4. Get the amount grossed by the best performing film.

In [None]:
SELECT MAX(gross)
FROM films;

'''
max
936627416
'''

Conclusion

Well done! Don't forget about these functions. You'll find yourself using them over and over again to get a quick grasp of the data in a SQL database.

# Combining aggregate functions with WHERE

Aggregate functions can be combined with the WHERE clause to gain further insights from your data.

For example, to get the total budget of movies made in the year 2010 or later:

```
SELECT SUM(budget)
FROM films
WHERE release_year >= 2010;
```
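
The effect of the filter can be checked on a small table. The sketch below again assumes Python's built-in `sqlite3` module and three invented rows (not the course data), and compares the total budget with and without the WHERE clause.

```python
import sqlite3

# Made-up sample rows (assumption: illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER, budget INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [("Old Film", 1995, 50), ("Newer Film", 2010, 100), ("Newest Film", 2015, 200)],
)

# Without WHERE, SUM adds up every row.
total_all = conn.execute("SELECT SUM(budget) FROM films").fetchone()[0]

# WHERE filters rows first; SUM then adds up only the rows that remain.
total_2010_on = conn.execute(
    "SELECT SUM(budget) FROM films WHERE release_year >= 2010"
).fetchone()[0]

print(total_all, total_2010_on)  # 350 300
```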

Now it's your turn!

Instructions

1. Use the SUM function to get the total amount grossed by all films made in the year 2000 or later.

In [None]:
SELECT SUM(gross)
FROM films
WHERE release_year >= 2000;

'''
sum
150900926358
'''

2. Get the average amount grossed by all films whose titles start with the letter 'A'.

In [None]:
SELECT AVG(gross)
FROM films
WHERE title LIKE 'A%';

'''
avg
47893236.422480620155
'''

3. Get the amount grossed by the worst performing film in 1994.

In [None]:
SELECT MIN(gross)
FROM films
WHERE release_year = 1994;

'''
min
125169
'''

4. Get the amount grossed by the best performing film between 2000 and 2012, inclusive.

In [None]:
SELECT MAX(gross)
FROM films
WHERE release_year BETWEEN 2000 AND 2012;

'''
max
760505847
'''

Conclusion

Nice. Can you see how SQL basically provides you a bunch of building blocks that you can combine in all kinds of ways? Hence the name: Structured Query Language.

# A note on arithmetic

In addition to using aggregate functions, you can perform basic arithmetic with symbols like +, -, *, and /.

So, for example, this gives a result of 12:

`SELECT (4 * 3);`

However, the following gives a result of 1:

`SELECT (4 / 3);`

What's going on here?

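SQL assumes that if you divide an integer by an integer, you want to get an integer back, so the fractional part is discarded rather than rounded: 4 / 3 is 1.33..., which becomes 1. Some database systems behave differently here, so be careful when dividing!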

If you want more precision when dividing, you can add decimal places to your numbers. For example,

`SELECT (4.0 / 3.0) AS result;`

gives you the result you would expect: 1.333.
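
The difference is easy to reproduce. The following sketch assumes SQLite (through Python's built-in `sqlite3` module), which also performs integer division when both operands are integers. It additionally shows that a single decimal operand is enough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer / integer: the fractional part is discarded.
int_result = conn.execute("SELECT 4 / 3").fetchone()[0]

# Decimal operands give a decimal result.
dec_result = conn.execute("SELECT 4.0 / 3.0").fetchone()[0]

# One decimal operand is enough to avoid integer division.
mixed_result = conn.execute("SELECT 4 / 3.0").fetchone()[0]

print(int_result)              # 1
print(round(dec_result, 3))    # 1.333
print(round(mixed_result, 3))  # 1.333
```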

What is the result of `SELECT (10 / 3);`?

Possible Answers

1. 2.333

2. 3.333

3. 3 - Correct!

4. 3.0

# It's AS simple AS aliasing

You may have noticed in the first exercise of this chapter that the column name of your result was just the name of the function you used. For example,

```
SELECT MAX(budget)
FROM films;
```

gives you a result with one column, named max. But what if you use two functions like this?

```
SELECT MAX(budget), MAX(duration)
FROM films;
```

Well, then you'd have two columns named max, which isn't very useful!

To avoid situations like this, SQL allows you to do something called aliasing. Aliasing simply means you assign a temporary name to something. To alias, you use the AS keyword, which you've already seen earlier in this course.

For example, in the above example we could use aliases to make the result clearer:

```
SELECT MAX(budget) AS max_budget,
       MAX(duration) AS max_duration
FROM films;
```

Aliases are helpful for making results more readable!
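
To see the aliases arrive as column names in a result set, here is a small sketch assuming Python's built-in `sqlite3` module and two made-up rows (not the course data). It reads the column names from `cursor.description` and pairs them with the returned values.

```python
import sqlite3

# Made-up sample rows (assumption: illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, budget INTEGER, duration INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [("Film A", 100, 90), ("Film B", 300, 120)],
)

cursor = conn.execute(
    "SELECT MAX(budget) AS max_budget, MAX(duration) AS max_duration FROM films"
)

# cursor.description holds one tuple per result column; item 0 is its name.
column_names = [col[0] for col in cursor.description]
values = cursor.fetchone()
print(dict(zip(column_names, values)))  # {'max_budget': 300, 'max_duration': 120}
```

The name a database gives an unaliased column varies between systems, which is one more reason to alias explicitly.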

Instructions

1. Get the title and net profit (the amount a film grossed, minus its budget) for all films. Alias the net profit as net_profit.

In [None]:
SELECT title,
       (gross - budget) AS net_profit
FROM films;

'''
title	net_profit
Intolerance: Love's Struggle Throughout the Ages	null
Over the Hill to the Poorhouse	2900000
The Big Parade	null
'''

2. Get the title and duration in hours for all films. The duration is in minutes, so you'll need to divide by 60.0 to get the duration in hours. Alias the duration in hours as duration_hours.

In [None]:
SELECT title,
       (duration/60.0) AS duration_hours
FROM films;

'''
title	duration_hours
Intolerance: Love's Struggle Throughout the Ages	2.0500000000000000
Over the Hill to the Poorhouse	1.8333333333333333
The Big Parade	2.5166666666666667
'''

3. Get the average duration in hours for all films, aliased as avg_duration_hours.

In [None]:
SELECT AVG(duration) / 60.0 AS avg_duration_hours
FROM films;

'''
avg_duration_hours
1.7991321897073663
'''

# Even more aliasing

Let's practice your newfound aliasing skills some more before moving on!

Recall: SQL assumes that if you divide an integer by an integer, you want to get an integer back.

This means that the following will erroneously result in 400.0:

`SELECT 45 / 10 * 100.0;`

This is because 45 / 10 evaluates to an integer (4), and not a decimal number like we would expect.

So when you're dividing, make sure at least one of the two values being divided is a decimal by the time the division happens:

`SELECT 45 * 100.0 / 10;`

The above now gives the correct answer of 450.0 since the numerator (45 * 100.0) of the division is now a decimal!
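
Both expressions can be run side by side. This sketch assumes SQLite via Python's built-in `sqlite3` module, which evaluates `/` and `*` left to right and truncates integer division in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Evaluated left to right: 45 / 10 is integer division (4), then 4 * 100.0.
truncated = conn.execute("SELECT 45 / 10 * 100.0").fetchone()[0]

# Multiplying by 100.0 first makes the numerator a decimal before dividing.
correct = conn.execute("SELECT 45 * 100.0 / 10").fetchone()[0]

print(truncated, correct)  # 400.0 450.0
```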

Instructions

1. Get the percentage of people who are no longer alive. Alias the result as percentage_dead. Remember to use 100.0 and not 100!

In [None]:
-- COUNT(deathdate) counts only rows where deathdate is not NULL;
-- multiply by 100.0 first, then divide by COUNT(*), the total number of rows
SELECT (COUNT(deathdate) * 100.0 / COUNT(*)) AS percentage_dead
FROM people;

'''
percentage_dead
9.3723949029415267
'''

2. Get the number of years between the newest film and oldest film. Alias the result as difference.

In [None]:
SELECT (MAX(release_year) - MIN(release_year)) AS difference
FROM films;

'''
difference
100
'''

3. Get the number of decades the films table covers. Alias the result as number_of_decades. The top half of your fraction should be enclosed in parentheses.

In [None]:
SELECT (MAX(release_year) - MIN(release_year)) / 10 AS number_of_decades
FROM films;

'''
number_of_decades
10
'''

Conclusion

We're at the end of chapter 3! In chapter 4, you will learn about sorting, grouping and joins. Head over there quickly!