# <span style="color:#E600E6">SUMMARY STATISTICS</span>
* In this session we'll calculate summary statistics for our data.

## <span style="color:#E600E6">Aggregate Functions</span>

* A function takes an input and returns an output.
> <span style="color:green">Example: suppose you are asked to find the **Major** with the **highest Unemployment Rate**. How would you calculate it?</span>
* This is where SQL aggregate functions come in: they are applied over a column of values and return a single value.
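
To make the "many values in, one value out" idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `grads` table and its three rows are made up for illustration and only stand in for `recent_grads`.

```python
import sqlite3

# Toy stand-in for recent_grads; the table name and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grads (Major TEXT, Unemployment_rate REAL)")
conn.executemany(
    "INSERT INTO grads VALUES (?, ?)",
    [("A", 0.02), ("B", 0.11), ("C", 0.05)],
)

# Without an aggregate: one output row per table row.
all_rates = conn.execute("SELECT Unemployment_rate FROM grads").fetchall()

# With an aggregate: the whole column collapses to a single value.
max_rate = conn.execute("SELECT MAX(Unemployment_rate) FROM grads").fetchone()[0]

print(len(all_rates))  # 3
print(max_rate)        # 0.11
```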

In [2]:
%%capture
%load_ext sql
%sql sqlite:///Jobs.db

In [11]:
%%sql
SELECT *
FROM recent_grads
LIMIT 20;

 * sqlite:///Jobs.db
Done.


index,Rank,Major_code,Major,Major_category,Total,Sample_size,Men,Women,ShareWomen,Employed,Full_time,Part_time,Full_time_year_round,Unemployed,Unemployment_rate,Median,P25th,P75th,College_jobs,Non_college_jobs,Low_wage_jobs
0,1,2419,PETROLEUM ENGINEERING,Engineering,2339,36,2057,282,0.120564344,1976,1849,270,1207,37,0.018380527,110000,95000,125000,1534,364,193
1,2,2416,MINING AND MINERAL ENGINEERING,Engineering,756,7,679,77,0.1018518519999999,640,556,170,388,85,0.117241379,75000,55000,90000,350,257,50
2,3,2415,METALLURGICAL ENGINEERING,Engineering,856,3,725,131,0.153037383,648,558,133,340,16,0.024096386,73000,50000,105000,456,176,0
3,4,2417,NAVAL ARCHITECTURE AND MARINE ENGINEERING,Engineering,1258,16,1123,135,0.107313196,758,1069,150,692,40,0.050125313,70000,43000,80000,529,102,0
4,5,2405,CHEMICAL ENGINEERING,Engineering,32260,289,21239,11021,0.341630502,25694,23170,5180,16697,1672,0.061097712,65000,50000,75000,18314,4440,972
5,6,2418,NUCLEAR ENGINEERING,Engineering,2573,17,2200,373,0.144966965,1857,2038,264,1449,400,0.177226407,65000,50000,102000,1142,657,244
6,7,6202,ACTUARIAL SCIENCE,Business,3777,51,832,960,0.535714286,2912,2924,296,2482,308,0.095652174,62000,53000,72000,1768,314,259
7,8,5001,ASTRONOMY AND ASTROPHYSICS,Physical Sciences,1792,10,2110,1667,0.4413555729999999,1526,1085,553,827,33,0.021167415,62000,31500,109000,972,500,220
8,9,2414,MECHANICAL ENGINEERING,Engineering,91227,1029,12953,2105,0.139792801,76442,71298,13101,54639,4650,0.0573422779999999,60000,48000,70000,52844,16384,3253
9,10,2408,ELECTRICAL ENGINEERING,Engineering,81527,631,8407,6548,0.437846874,61928,55450,12695,41413,3895,0.059173845,60000,45000,72000,45829,10874,3170


### <span style="color:#E600E6">MAX</span>
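The **MAX** aggregate function returns the maximum value in a column. Selecting a plain column such as Major next to **MAX** works here because SQLite fills it in from the row that holds the maximum; most other databases reject this form.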

In [4]:
%%sql
SELECT Major, MAX(Unemployment_rate)
FROM recent_grads;

 * sqlite:///Jobs.db
Done.


Major,MAX(Unemployment_rate)
NUCLEAR ENGINEERING,0.177226407
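
If the same question has to be answered on a database other than SQLite, one common alternative is to sort by the column and keep only the first row. The sketch below compares both forms on a made-up three-row `grads` table (hypothetical data, not the real `recent_grads`).

```python
import sqlite3

# Hypothetical toy table standing in for recent_grads.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grads (Major TEXT, Unemployment_rate REAL)")
conn.executemany(
    "INSERT INTO grads VALUES (?, ?)",
    [("A", 0.02), ("B", 0.11), ("C", 0.05)],
)

# SQLite-specific form: the bare column Major is taken from the row
# that holds the maximum Unemployment_rate.
sqlite_style = conn.execute(
    "SELECT Major, MAX(Unemployment_rate) FROM grads"
).fetchone()

# More portable form: sort descending and keep the top row.
portable = conn.execute(
    "SELECT Major, Unemployment_rate "
    "FROM grads "
    "ORDER BY Unemployment_rate DESC "
    "LIMIT 1"
).fetchone()

print(sqlite_style)  # ('B', 0.11)
print(portable)      # ('B', 0.11)
```

When several rows tie for the maximum, both forms return just one of them, so a tie-breaking column in ORDER BY may be worth adding.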


### <span style="color:#E600E6">MIN</span>
The **MIN** aggregate function returns the minimum value in a column.

In [7]:
%%sql
SELECT Major, MIN(Unemployment_rate)
FROM recent_grads;

 * sqlite:///Jobs.db
Done.


Major,MIN(Unemployment_rate)
MATHEMATICS AND COMPUTER SCIENCE,0.0


### <span style="color:#E600E6">SUM</span>
Sums the values in a column.
> <span style="color:green">Example: Total number of unemployed in the Engineering major category.</span>

In [11]:
%%sql
SELECT SUM(Unemployed)
FROM recent_grads
WHERE Major_category = 'Engineering';

 * sqlite:///Jobs.db
Done.


SUM(Unemployed)
29817


In [8]:
%%sql
SELECT COUNT(Sample_size)
FROM recent_grads;

 * sqlite:///Jobs.db
Done.


COUNT(Sample_size)
173


In [10]:
%%sql
SELECT AVG(Sample_size)
FROM recent_grads;

 * sqlite:///Jobs.db
Done.


AVG(Sample_size)
356.08092485549133


### <span style="color:#E600E6">More Aggregate Functions</span>
> * **COUNT** - Takes a column name as an argument and counts the number of non-NULL values in that column.
> * **AVG** - Calculates the average of the non-NULL values in a column.
> * **ROUND** - Not an aggregate itself, but often wrapped around one. It takes two arguments, a value (such as a column or an aggregate result) and an integer number of decimal places, and rounds the value to that many decimal places.
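
The three functions above can be sketched together as follows. The `grads` table and its four `Sample_size` values (one of them missing) are invented for this example.

```python
import sqlite3

# Hypothetical toy table; None is stored as NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grads (Major TEXT, Sample_size INTEGER)")
conn.executemany(
    "INSERT INTO grads VALUES (?, ?)",
    [("A", 10), ("B", 20), ("C", None), ("D", 35)],
)

row = conn.execute(
    """
    SELECT COUNT(*),                    -- counts every row
           COUNT(Sample_size),          -- skips the NULL
           ROUND(AVG(Sample_size), 2)   -- AVG also skips the NULL
    FROM grads
    """
).fetchone()

all_rows, non_null, rounded_avg = row
print(all_rows)     # 4
print(non_null)     # 3
print(rounded_avg)  # 21.67
```

Here the average is 65 / 3 because only the three non-NULL values take part, and ROUND trims it to two decimal places.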

**You can also combine multiple aggregate functions together in a single query.**

In [15]:
%%sql
SELECT AVG(Total), MIN(Men), MAX(Women)
FROM recent_grads;

 * sqlite:///Jobs.db
Done.


AVG(Total),MIN(Men),MAX(Women)
39167.71676300578,119,307087


### <span style="color:#E600E6">Customizing the results with AS</span>
* **AS** is a SQL keyword that lets us rename a column or a table using an alias.
* Aliases appear only in the result table; the actual table in the database is unchanged.

In [12]:
%%sql
SELECT SUM(Unemployed) AS Total_Unemployed
FROM recent_grads
WHERE Major_category = 'Engineering';

 * sqlite:///Jobs.db
Done.


Total_Unemployed
29817


**We can also drop AS entirely and place the alias right after the column or expression.**

In [16]:
%%sql
SELECT SUM(Unemployed) Total
FROM recent_grads
WHERE Major_category = 'Engineering';

 * sqlite:///Jobs.db
Done.


Total
29817


### <span style="color:#E600E6">Order of Execution</span>
* The clauses of a SQL query are not executed in the order in which they are written.
 >   <span style="color:green">
         SELECT * <br>
         &emsp;FROM [some_table] <br>
         &emsp;WHERE [some_condition] <br>
         &emsp;ORDER BY [some_column] <br>
         LIMIT [some_limit];
     </span>

**Here is the order in which the clauses run:**

1. FROM
2. WHERE
3. SELECT
4. ORDER BY
5. LIMIT

Since aggregate functions are part of SELECT, they are computed only over the rows that pass the WHERE filter.
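
Two consequences of this ordering can be checked with a small sketch: WHERE narrows the rows before SUM sees them, and ORDER BY sorts before LIMIT cuts. The `grads` table and its four rows are hypothetical.

```python
import sqlite3

# Hypothetical toy table standing in for recent_grads.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grads (Major TEXT, Major_category TEXT, Unemployed INTEGER)"
)
conn.executemany(
    "INSERT INTO grads VALUES (?, ?, ?)",
    [
        ("A", "Engineering", 10),
        ("B", "Engineering", 30),
        ("C", "Business", 100),
        ("D", "Business", 5),
    ],
)

# WHERE runs before SELECT, so SUM only sees the Engineering rows.
filtered_sum = conn.execute(
    "SELECT SUM(Unemployed) FROM grads WHERE Major_category = 'Engineering'"
).fetchone()[0]

# Without WHERE, SUM sees every row.
full_sum = conn.execute("SELECT SUM(Unemployed) FROM grads").fetchone()[0]

# ORDER BY runs before LIMIT, so LIMIT keeps the two largest values,
# not the first two rows inserted.
top_two = conn.execute(
    "SELECT Major FROM grads ORDER BY Unemployed DESC LIMIT 2"
).fetchall()

print(filtered_sum)  # 40
print(full_sum)      # 145
print(top_two)       # [('C',), ('B',)]
```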

### <span style="color:#E600E6">Missing Values</span>
* Sometimes a cell in a database table does not have a value.
* When that happens, we say that the value is missing, or that the table has missing values.
* **NULL** is the SQL marker for a missing value.
* We can use **IS NULL** in a **WHERE** clause to filter records with missing values. A comparison such as **= NULL** never matches, because comparing anything to NULL yields NULL rather than true.

In [14]:
%%sql
SELECT *
FROM recent_grads
WHERE Unemployment_rate IS NULL;

 * sqlite:///Jobs.db
Done.


index,Rank,Major_code,Major,Major_category,Total,Sample_size,Men,Women,ShareWomen,Employed,Full_time,Part_time,Full_time_year_round,Unemployed,Unemployment_rate,Median,P25th,P75th,College_jobs,Non_college_jobs,Low_wage_jobs
73,74,3801,MILITARY TECHNOLOGIES,Industrial Arts & Consumer Services,124,4,1756,1323,0.429684963,0,111,0,111,0,,40000,40000,40000,0,0,0
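
A minimal sketch of how NULL behaves in filters, assuming a made-up `grads` table with one missing `Unemployment_rate` (in Python's `sqlite3`, `None` is stored as NULL):

```python
import sqlite3

# Hypothetical toy table; the second row has a missing rate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grads (Major TEXT, Unemployment_rate REAL)")
conn.executemany(
    "INSERT INTO grads VALUES (?, ?)",
    [("A", 0.02), ("B", None), ("C", 0.05)],
)

def count_where(condition):
    """Count the rows of grads matching a hard-coded WHERE condition."""
    return conn.execute(
        f"SELECT COUNT(*) FROM grads WHERE {condition}"
    ).fetchone()[0]

equals_null = count_where("Unemployment_rate = NULL")   # comparison is never true
is_null = count_where("Unemployment_rate IS NULL")      # finds the missing row
is_not_null = count_where("Unemployment_rate IS NOT NULL")

print(equals_null)  # 0
print(is_null)      # 1
print(is_not_null)  # 2
```

The helper interpolates the condition directly into the query only because the conditions are fixed strings in this sketch; user-supplied values should go through `?` parameters instead.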


### <span style="color:#E600E6">Performing Arithmetic in SQL</span>
* SQL supports the standard arithmetic operators *, +, -, and /, which we can apply to table columns and combine with constants.

In [17]:
%%sql
SELECT ShareWomen * 100 percent_female
FROM recent_grads
LIMIT 10;

 * sqlite:///Jobs.db
Done.


percent_female
12.0564344
10.1851852
15.3037383
10.7313196
34.1630502
14.4966965
53.5714286
44.1355573
13.9792801
43.7846874
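
One pitfall worth knowing when dividing columns: in SQLite, dividing one integer by another gives an integer result, so a share computed from two counts can silently become 0. The sketch below uses a made-up `grads` table with integer `Women` and `Total` columns (hypothetical values) and shows one way around it, multiplying by a decimal constant first.

```python
import sqlite3

# Hypothetical toy table with integer counts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grads (Major TEXT, Women INTEGER, Total INTEGER)")
conn.executemany(
    "INSERT INTO grads VALUES (?, ?, ?)",
    [("A", 25, 100), ("B", 1, 3)],
)

# Integer / integer truncates toward zero in SQLite.
int_div = [
    r[0] for r in conn.execute("SELECT Women / Total FROM grads ORDER BY Major")
]

# Multiplying by 100.0 first makes the whole expression floating point.
percent = [
    r[0]
    for r in conn.execute(
        "SELECT ROUND(Women * 100.0 / Total, 1) AS percent_female "
        "FROM grads ORDER BY Major"
    )
]

print(int_div)  # [0, 0]
print(percent)  # [25.0, 33.3]
```

The `ShareWomen` column in the query above is already stored as a decimal, so multiplying it by 100 does not run into this problem.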


# END