# <span style="color:#E600E6">Introduction to SQL Queries</span>
* Querying is the process of retrieving information stored in a database.
* Querying uses the **SELECT** statement.
* The data returned is stored in a result table, called the **result-set**.

In this lesson we'll use the [data behind the story The Economic Guide To Picking A College Major](https://github.com/fivethirtyeight/data/tree/master/college-majors).

In [2]:
%%capture
%load_ext sql
%sql sqlite:///Jobs.db

### <span style="color:#E600E6">SELECT</span>

In [15]:
%%sql

SELECT *
    FROM recent_grads;

   sqlite:///Jobs
 * sqlite:///Jobs.db
Done.


index,Rank,Major_code,Major,Major_category,Total,Sample_size,Men,Women,ShareWomen,Employed,Full_time,Part_time,Full_time_year_round,Unemployed,Unemployment_rate,Median,P25th,P75th,College_jobs,Non_college_jobs,Low_wage_jobs
0,1,2419,PETROLEUM ENGINEERING,Engineering,2339,36,2057,282,0.120564344,1976,1849,270,1207,37,0.018380527,110000,95000,125000,1534,364,193
1,2,2416,MINING AND MINERAL ENGINEERING,Engineering,756,7,679,77,0.1018518519999999,640,556,170,388,85,0.117241379,75000,55000,90000,350,257,50
2,3,2415,METALLURGICAL ENGINEERING,Engineering,856,3,725,131,0.153037383,648,558,133,340,16,0.024096386,73000,50000,105000,456,176,0
3,4,2417,NAVAL ARCHITECTURE AND MARINE ENGINEERING,Engineering,1258,16,1123,135,0.107313196,758,1069,150,692,40,0.050125313,70000,43000,80000,529,102,0
4,5,2405,CHEMICAL ENGINEERING,Engineering,32260,289,21239,11021,0.341630502,25694,23170,5180,16697,1672,0.061097712,65000,50000,75000,18314,4440,972
5,6,2418,NUCLEAR ENGINEERING,Engineering,2573,17,2200,373,0.144966965,1857,2038,264,1449,400,0.177226407,65000,50000,102000,1142,657,244
6,7,6202,ACTUARIAL SCIENCE,Business,3777,51,832,960,0.535714286,2912,2924,296,2482,308,0.095652174,62000,53000,72000,1768,314,259
7,8,5001,ASTRONOMY AND ASTROPHYSICS,Physical Sciences,1792,10,2110,1667,0.4413555729999999,1526,1085,553,827,33,0.021167415,62000,31500,109000,972,500,220
8,9,2414,MECHANICAL ENGINEERING,Engineering,91227,1029,12953,2105,0.139792801,76442,71298,13101,54639,4650,0.0573422779999999,60000,48000,70000,52844,16384,3253
9,10,2408,ELECTRICAL ENGINEERING,Engineering,81527,631,8407,6548,0.437846874,61928,55450,12695,41413,3895,0.059173845,60000,45000,72000,45829,10874,3170


Let's break down the query above:

* **SELECT \*** - Selects all the columns
* **FROM recent_grads** - Specifies the database table we want to select from.

The query is written on two separate lines, and the second line is indented. This is just a SQL **convention**: we could write the whole query on one line and still get the same results.
* [See more SQL conventions](https://www.sqlstyle.guide/)

![Select](./images/select.svg)
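
The point about layout can be checked outside the `%%sql` magic. The following is a minimal sketch using Python's built-in `sqlite3` module; it assumes a small in-memory table (a hypothetical stand-in for `Jobs.db`, with three rows copied from the output above) rather than the real database.

```python
import sqlite3

# Hypothetical in-memory stand-in for Jobs.db (assumption: only three columns,
# with rows copied from the output above).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recent_grads (Major TEXT, Major_category TEXT, Total INTEGER)"
)
conn.executemany(
    "INSERT INTO recent_grads VALUES (?, ?, ?)",
    [
        ("PETROLEUM ENGINEERING", "Engineering", 2339),
        ("MINING AND MINERAL ENGINEERING", "Engineering", 756),
        ("METALLURGICAL ENGINEERING", "Engineering", 856),
    ],
)

# The same query, once in the indented two-line style and once on one line.
multi_line = """
SELECT *
    FROM recent_grads;
"""
one_line = "SELECT * FROM recent_grads;"

rows_multi = conn.execute(multi_line).fetchall()
rows_one = conn.execute(one_line).fetchall()

print(rows_multi == rows_one)  # True: whitespace does not change the result
conn.close()
```

Line breaks and indentation are there for the reader; the database treats both versions as the same statement.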

### <span style="color:#E600E6">LIMIT</span>
- LIMIT restricts the number of rows displayed in the result set.
- This saves space on the screen and can make queries run faster, because fewer rows are returned.
- LIMIT always goes at the very end of the query.

In [16]:
%%sql

SELECT *
    FROM recent_grads
  LIMIT 5

   sqlite:///Jobs
 * sqlite:///Jobs.db
Done.


index,Rank,Major_code,Major,Major_category,Total,Sample_size,Men,Women,ShareWomen,Employed,Full_time,Part_time,Full_time_year_round,Unemployed,Unemployment_rate,Median,P25th,P75th,College_jobs,Non_college_jobs,Low_wage_jobs
0,1,2419,PETROLEUM ENGINEERING,Engineering,2339,36,2057,282,0.120564344,1976,1849,270,1207,37,0.018380527,110000,95000,125000,1534,364,193
1,2,2416,MINING AND MINERAL ENGINEERING,Engineering,756,7,679,77,0.1018518519999999,640,556,170,388,85,0.117241379,75000,55000,90000,350,257,50
2,3,2415,METALLURGICAL ENGINEERING,Engineering,856,3,725,131,0.153037383,648,558,133,340,16,0.024096386,73000,50000,105000,456,176,0
3,4,2417,NAVAL ARCHITECTURE AND MARINE ENGINEERING,Engineering,1258,16,1123,135,0.107313196,758,1069,150,692,40,0.050125313,70000,43000,80000,529,102,0
4,5,2405,CHEMICAL ENGINEERING,Engineering,32260,289,21239,11021,0.341630502,25694,23170,5180,16697,1672,0.061097712,65000,50000,75000,18314,4440,972


### <span style="color:#E600E6">SELECTING INDIVIDUAL COLUMNS</span>
We can retrieve data from specific columns using the SELECT statement.

In [20]:
%%sql
SELECT Total, Major
    FROM recent_grads
  LIMIT 5;

   sqlite:///Jobs
 * sqlite:///Jobs.db
Done.


Total,Major
2339,PETROLEUM ENGINEERING
756,MINING AND MINERAL ENGINEERING
856,METALLURGICAL ENGINEERING
1258,NAVAL ARCHITECTURE AND MARINE ENGINEERING
32260,CHEMICAL ENGINEERING


* To select multiple columns, we list the column names separated by commas in the SELECT statement.
* Notice that there is no comma after the last column. When SQL finds a comma, it expects another column to follow, so we must not include a comma after the last column.
* The columns in the result set appear in the same order as they are listed in the SELECT statement.
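
Both points in this list can be verified with a short experiment. This is a sketch only, using Python's built-in `sqlite3` module and a hypothetical two-column in-memory table (an assumption, not the real `Jobs.db`) whose rows are copied from the output above.

```python
import sqlite3

# Hypothetical in-memory stand-in for Jobs.db; rows copied from the output above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recent_grads (Major TEXT, Total INTEGER)")
conn.executemany(
    "INSERT INTO recent_grads VALUES (?, ?)",
    [
        ("PETROLEUM ENGINEERING", 2339),
        ("MINING AND MINERAL ENGINEERING", 756),
    ],
)

# The table stores Major first, but the SELECT list asks for Total first.
cursor = conn.execute("SELECT Total, Major FROM recent_grads;")
column_names = [column[0] for column in cursor.description]
rows = cursor.fetchall()
print(column_names)  # ['Total', 'Major']
print(rows[0])       # (2339, 'PETROLEUM ENGINEERING')

# A comma after the last column makes SQLite expect another column name.
try:
    conn.execute("SELECT Total, Major, FROM recent_grads;")
    trailing_comma_rejected = False
except sqlite3.OperationalError as error:
    trailing_comma_rejected = True
    print("Rejected:", error)

conn.close()
```

The result follows the order of the SELECT list, not the order in which the table defines its columns.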

###  <span style="color:#E600E6">Filtering Rows with WHERE</span>
- WHERE filters the result set to include only the rows for which the condition evaluates to **True**.
- <span style="color:green">Example: We may want to get the Major and Major_category of majors with fewer than 1000 people in total.</span>

In [22]:
%%sql
SELECT Major, Major_category
    FROM recent_grads
  WHERE Total < 1000;

   sqlite:///Jobs
 * sqlite:///Jobs.db
Done.


Major,Major_category
MINING AND MINERAL ENGINEERING,Engineering
METALLURGICAL ENGINEERING,Engineering
GEOLOGICAL AND GEOPHYSICAL ENGINEERING,Engineering
MATHEMATICS AND COMPUTER SCIENCE,Computers & Mathematics
SCHOOL STUDENT COUNSELING,Education
MILITARY TECHNOLOGIES,Industrial Arts & Consumer Services
SOIL SCIENCE,Agriculture & Natural Resources
EDUCATIONAL ADMINISTRATION AND SUPERVISION,Education


##### Here are the comparison operators we can use with the WHERE clause to filter our results:


* **Less than: <**
* **Greater than: >**
* **Less than or equal to: <=**
* **Greater than or equal to: >=**
* **Equal to: =**
* **Not equal to: != or <>**
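
To see how a few of these operators behave, here is a small sketch in Python's built-in `sqlite3` module. The in-memory table and the `majors_where` helper are illustrative assumptions, not part of the lesson's database; the rows are copied from the earlier output.

```python
import sqlite3

# Hypothetical in-memory stand-in for Jobs.db; rows copied from the earlier output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recent_grads (Major TEXT, Total INTEGER)")
conn.executemany(
    "INSERT INTO recent_grads VALUES (?, ?)",
    [
        ("PETROLEUM ENGINEERING", 2339),
        ("MINING AND MINERAL ENGINEERING", 756),
        ("METALLURGICAL ENGINEERING", 856),
        ("NAVAL ARCHITECTURE AND MARINE ENGINEERING", 1258),
    ],
)


def majors_where(condition):
    """Illustrative helper: majors matching a hard-coded WHERE condition.

    The names are sorted in Python so the comparison below does not depend
    on the order in which the database returns rows.
    """
    query = f"SELECT Major FROM recent_grads WHERE {condition};"
    return sorted(row[0] for row in conn.execute(query))


less_than = majors_where("Total < 856")
less_or_equal = majors_where("Total <= 856")
not_equal_bang = majors_where("Total != 856")
not_equal_angle = majors_where("Total <> 856")

print(less_than)      # ['MINING AND MINERAL ENGINEERING']
print(less_or_equal)  # ['METALLURGICAL ENGINEERING', 'MINING AND MINERAL ENGINEERING']
print(not_equal_bang == not_equal_angle)  # True: != and <> are interchangeable

conn.close()
```

The only difference between `<` and `<=` is whether the boundary value itself (856 here) is included.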

#### Multiple Filter Criteria
- We can use the logical operators AND and OR to combine multiple filter criteria.
- <span style="color:green">Example: To determine which **Biology & Life Science** majors had a **majority of male students**, we specify two filtering criteria:</span>

In [4]:
%%sql
SELECT  Major
    FROM recent_grads
  WHERE Major_category = 'Biology & Life Science' AND ShareWomen < 0.5;

 * sqlite:///Jobs.db
Done.


Major
MOLECULAR BIOLOGY
NEUROSCIENCE


* When using the **AND** operator, all the filtering conditions MUST evaluate to True for a record to appear in the result set.
* The **OR** operator is used in the same way, but a record appears in the result set if it meets at least one of the filtering conditions.
* Sometimes you can combine AND and OR to get your desired results.
* <span style="color:green">Example: Select the first 10 majors where the median salary is greater than 10000 **OR** there are more men than women.</span>

In [28]:
%%sql
SELECT Major, ShareWomen, Median
FROM recent_grads
WHERE Median > 10000 OR ShareWomen < 0.5
LIMIT 10;

   sqlite:///Jobs
 * sqlite:///Jobs.db
Done.


Major,ShareWomen,Median
PETROLEUM ENGINEERING,0.120564344,110000
MINING AND MINERAL ENGINEERING,0.1018518519999999,75000
METALLURGICAL ENGINEERING,0.153037383,73000
NAVAL ARCHITECTURE AND MARINE ENGINEERING,0.107313196,70000
CHEMICAL ENGINEERING,0.341630502,65000
NUCLEAR ENGINEERING,0.144966965,65000
ACTUARIAL SCIENCE,0.535714286,62000
ASTRONOMY AND ASTROPHYSICS,0.4413555729999999,62000
MECHANICAL ENGINEERING,0.139792801,60000
ELECTRICAL ENGINEERING,0.437846874,60000
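
When AND and OR appear in the same WHERE clause, SQL evaluates AND before OR, so parentheses are needed to group the OR conditions first. The sketch below illustrates this with Python's built-in `sqlite3` module; the in-memory table is a hypothetical stand-in for `Jobs.db` with four rows copied from the output above, and the two queries are illustrative rather than part of the lesson.

```python
import sqlite3

# Hypothetical in-memory stand-in for Jobs.db; rows copied from the output above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recent_grads (Major TEXT, Major_category TEXT, Median INTEGER)"
)
conn.executemany(
    "INSERT INTO recent_grads VALUES (?, ?, ?)",
    [
        ("PETROLEUM ENGINEERING", "Engineering", 110000),
        ("ACTUARIAL SCIENCE", "Business", 62000),
        ("ASTRONOMY AND ASTROPHYSICS", "Physical Sciences", 62000),
        ("MECHANICAL ENGINEERING", "Engineering", 60000),
    ],
)

# Without parentheses, AND binds first:
# Engineering OR (Business AND Median > 100000)
without_parentheses = """
SELECT Major
  FROM recent_grads
 WHERE Major_category = 'Engineering'
    OR Major_category = 'Business' AND Median > 100000;
"""

# With parentheses, the OR is evaluated first:
# (Engineering OR Business) AND Median > 100000
with_parentheses = """
SELECT Major
  FROM recent_grads
 WHERE (Major_category = 'Engineering' OR Major_category = 'Business')
   AND Median > 100000;
"""

# Sorted in Python so the printed lists do not depend on row order.
ungrouped = sorted(row[0] for row in conn.execute(without_parentheses))
grouped = sorted(row[0] for row in conn.execute(with_parentheses))

print(ungrouped)  # ['MECHANICAL ENGINEERING', 'PETROLEUM ENGINEERING']
print(grouped)    # ['PETROLEUM ENGINEERING']
conn.close()
```

Adding parentheses whenever AND and OR are mixed makes the intended grouping explicit for readers as well as for the database.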


###  <span style="color:#E600E6">Ordering Results with ORDER BY</span>
- So far, the rows in the result set have appeared in the order in which they were added to the database.
- When we need to control the order of the result set, we use the ORDER BY clause.
- Sorting is done either numerically or alphabetically, depending on the column's data type.
- The rows are sorted in ascending order by default.

In [7]:
%%sql
SELECT Major, Major_category, Unemployed
FROM recent_grads
ORDER BY Unemployed
LIMIT 10;

 * sqlite:///Jobs.db
Done.


Major,Major_category,Unemployed
MATHEMATICS AND COMPUTER SCIENCE,Computers & Mathematics,0
MILITARY TECHNOLOGIES,Industrial Arts & Consumer Services,0
BOTANY,Biology & Life Science,0
SOIL SCIENCE,Agriculture & Natural Resources,0
EDUCATIONAL ADMINISTRATION AND SUPERVISION,Education,0
COURT REPORTING,Law & Public Policy,11
METALLURGICAL ENGINEERING,Engineering,16
ENGINEERING MECHANICS PHYSICS AND SCIENCE,Engineering,23
ASTRONOMY AND ASTROPHYSICS,Physical Sciences,33
SOCIAL PSYCHOLOGY,Psychology & Social Work,33


- Sometimes we want our result set sorted in descending order.
- To do this we use the DESC keyword with ORDER BY.

In [8]:
%%sql
SELECT Major, Major_category, Unemployed
FROM recent_grads
ORDER BY Unemployed DESC
LIMIT 10;

 * sqlite:///Jobs.db
Done.


Major,Major_category,Unemployed
PSYCHOLOGY,Psychology & Social Work,28169
BUSINESS MANAGEMENT AND ADMINISTRATION,Business,21502
POLITICAL SCIENCE AND GOVERNMENT,Social Science,15022
GENERAL BUSINESS,Business,14946
COMMUNICATIONS,Communications & Journalism,14602
ENGLISH LANGUAGE AND LITERATURE,Humanities & Liberal Arts,14345
BIOLOGY,Biology & Life Science,13874
ACCOUNTING,Business,12411
MARKETING AND MARKETING RESEARCH,Business,11663
ECONOMICS,Social Science,11452
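
The examples above sort a numeric column. As a sketch of alphabetical sorting, the snippet below orders a text column using Python's built-in `sqlite3` module; the in-memory table is a hypothetical stand-in for `Jobs.db`, with four rows copied from the output above.

```python
import sqlite3

# Hypothetical in-memory stand-in for Jobs.db; rows copied from the output above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recent_grads (Major TEXT, Unemployed INTEGER)")
conn.executemany(
    "INSERT INTO recent_grads VALUES (?, ?)",
    [
        ("PSYCHOLOGY", 28169),
        ("ACCOUNTING", 12411),
        ("ECONOMICS", 11452),
        ("BIOLOGY", 13874),
    ],
)

# Text columns sort alphabetically; ascending (A to Z) is the default.
alphabetical = [
    row[0]
    for row in conn.execute("SELECT Major FROM recent_grads ORDER BY Major;")
]

# DESC reverses the order; combined with LIMIT it yields a "top N" list.
top_two_unemployed = [
    row[0]
    for row in conn.execute(
        "SELECT Major FROM recent_grads ORDER BY Unemployed DESC LIMIT 2;"
    )
]

print(alphabetical)        # ['ACCOUNTING', 'BIOLOGY', 'ECONOMICS', 'PSYCHOLOGY']
print(top_two_unemployed)  # ['PSYCHOLOGY', 'BIOLOGY']
conn.close()
```

Note that LIMIT is applied after sorting, which is why ORDER BY comes before LIMIT in the query.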


### Question
* Write a query that returns 3 columns:
    * Major
    * Major_category
    * Unemployment_rate
* Include only records with more females than males.
* Order the result by the Unemployment_rate column, from highest to lowest.

# END