# Introduction
Structured Query Language, or SQL, is the standard language for querying and managing relational databases, and it is an important skill for any data scientist. In this lab you will build some basic SQL queries and run them against a local database.

# The AirQuality Dataset
In this lab we will use the AirQuality dataset. Please refer to the following URL for more information:

https://www.kaggle.com/datasets/open-aq/openaq

## We download the database file to a local directory

In [None]:
!wget -O AirQuality.sqlite https://github.com/thousandoaks/Python4DS201/blob/main/data/AirQuality.sqlite?raw=true

--2022-12-18 09:53:03--  https://github.com/thousandoaks/Python4DS201/blob/main/data/AirQuality.sqlite?raw=true
Resolving github.com (github.com)... 140.82.112.4
Connecting to github.com (github.com)|140.82.112.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.com/thousandoaks/Python4DS201/raw/main/data/AirQuality.sqlite [following]
--2022-12-18 09:53:03--  https://github.com/thousandoaks/Python4DS201/raw/main/data/AirQuality.sqlite
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/thousandoaks/Python4DS201/main/data/AirQuality.sqlite [following]
--2022-12-18 09:53:04--  https://raw.githubusercontent.com/thousandoaks/Python4DS201/main/data/AirQuality.sqlite
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199

### We connect to the database "AirQuality.sqlite"

In [None]:
import sqlite3

In [None]:
con = sqlite3.connect('AirQuality.sqlite')
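Before writing queries, it can help to check which tables and columns a SQLite database actually contains. The sketch below is one way to do that. It runs against a toy in-memory database, not the real file: the connection name `demo_con` and the four-column table definition are assumptions made for illustration. The same two statements should also work on `con`.

```python
import sqlite3

# Toy in-memory stand-in for AirQuality.sqlite (hypothetical: only four columns).
demo_con = sqlite3.connect(":memory:")
demo_con.execute(
    "CREATE TABLE AirQuality (city TEXT, country TEXT, pollutant TEXT, value REAL)"
)

# sqlite_master is SQLite's built-in catalog; every table has one row in it.
table_names = [
    name
    for (name,) in demo_con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk). Index 1 is the column name.
column_names = [
    row[1] for row in demo_con.execute("PRAGMA table_info(AirQuality)")
]

print(table_names)
print(column_names)
demo_con.close()
```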


Now that you are connected to the database, you're ready to write your first SQL query! As you'll soon see, SQL queries help you sort through a massive dataset and retrieve only the information that you need.

We'll begin by using the keywords SELECT, FROM, and WHERE to get data from specific columns based on conditions you specify.

# 1. SELECT FROM 

## The most basic SQL query selects a single column from a single table. To do this, specify the column you want after the word SELECT, and then specify the table after the word FROM.

## To select all the columns from the AirQuality table instead, use the wildcard * in place of a column name. Our query would appear as follows:

### "SELECT * FROM 'AirQuality'"





### For practical reasons we cap the number of rows returned with the LIMIT keyword
### "SELECT * FROM 'AirQuality' LIMIT 4"

In [None]:
cursor = con.cursor()
cursor.execute("SELECT * FROM 'AirQuality' LIMIT 4")
rows = cursor.fetchall()
rows

[(0,
  'Borówiec, ul. Drapałka',
  'Borówiec',
  'PL',
  'bc',
  0.85217,
  '2022-04-28 07:00:00+00:00',
  'µg/m³',
  'GIOS',
  1.0,
  52.276794,
  17.074114,
  'POINT(52.276794 1)'),
 (1,
  'Kraków, ul. Bulwarowa',
  'Kraków',
  'PL',
  'bc',
  0.91284,
  '2022-04-27 23:00:00+00:00',
  'µg/m³',
  'GIOS',
  1.0,
  50.069308,
  20.053492,
  'POINT(50.069308 1)'),
 (2,
  'Płock, ul. Reja',
  'Płock',
  'PL',
  'bc',
  1.41,
  '2022-03-30 04:00:00+00:00',
  'µg/m³',
  'GIOS',
  1.0,
  52.550938,
  19.709791,
  'POINT(52.550938 1)'),
 (3,
  'Elbląg, ul. Bażyńskiego',
  'Elbląg',
  'PL',
  'bc',
  0.33607,
  '2022-05-03 13:00:00+00:00',
  'µg/m³',
  'GIOS',
  1.0,
  54.167847,
  19.410942,
  'POINT(54.167847 1)')]
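The tuples returned by `fetchall()` carry no column names, which makes a `SELECT *` result hard to read. A minimal sketch of one way to label the values uses the cursor's `description` attribute. It runs on a toy in-memory table: the names `demo_con`, `demo_cursor` and `labelled`, and the four-column schema, are assumptions for illustration, while the two sample rows reuse values shown in the output above.

```python
import sqlite3

# Toy in-memory stand-in for the real table (hypothetical four-column schema).
demo_con = sqlite3.connect(":memory:")
demo_con.execute(
    "CREATE TABLE AirQuality (city TEXT, country TEXT, pollutant TEXT, value REAL)"
)
demo_con.executemany(
    "INSERT INTO AirQuality VALUES (?, ?, ?, ?)",
    [("Borówiec", "PL", "bc", 0.85217), ("Kraków", "PL", "bc", 0.91284)],
)

demo_cursor = demo_con.execute("SELECT * FROM AirQuality LIMIT 4")

# After a SELECT, cursor.description holds one tuple per result column;
# the first item of each tuple is the column name.
demo_columns = [col[0] for col in demo_cursor.description]

# Pair each value with its column name so every row becomes a dict.
labelled = [dict(zip(demo_columns, row)) for row in demo_cursor.fetchall()]

print(labelled[0])
demo_con.close()
```

Note that `LIMIT 4` is an upper bound: the toy table holds only two rows, so only two are returned.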

## You might prefer to retrieve just a few columns from the table.

## For instance, the following SQL query retrieves the columns city, country, pollutant and value from the table "AirQuality" (the cell below adds LIMIT 10 to keep the output short)

### "SELECT city,country,pollutant,value FROM 'AirQuality'"

In [None]:
cursor = con.cursor()
cursor.execute("SELECT city,country,pollutant,value FROM 'AirQuality' LIMIT 10")
rows = cursor.fetchall()
rows

[('Borówiec', 'PL', 'bc', 0.85217),
 ('Kraków', 'PL', 'bc', 0.91284),
 ('Płock', 'PL', 'bc', 1.41),
 ('Elbląg', 'PL', 'bc', 0.33607),
 ('Piastów', 'PL', 'bc', 0.51),
 ('Biała', 'PL', 'bc', 5.64),
 ('Białystok', 'PL', 'bc', 0.28),
 ('Gdańsk', 'PL', 'bc', 0.3726),
 ('Zdzieszowice', 'PL', 'bc', 0.08659),
 ('Mielec', 'PL', 'bc', 0.49923)]

# 2. SELECT FROM WHERE


## Real databases are usually very large, so you'll often want to return only the rows that meet specific conditions. You can do this using the WHERE clause.

## The query below returns the first 10 data points that satisfy the condition country = 'US'

### "SELECT city,country,pollutant,value FROM 'AirQuality' WHERE country = 'US' LIMIT 10"


In [None]:

cursor = con.cursor()
cursor.execute("SELECT city,country,pollutant,value FROM 'AirQuality' WHERE country = 'US' LIMIT 10")
rows = cursor.fetchall()
rows


[('Seattle-Tacoma-Bellevue', 'US', 'bc', 1.3),
 ('Providence-New Bedford-Fall River', 'US', 'bc', 0.18),
 ('Portland-Vancouver-Beaverton', 'US', 'bc', 0.32),
 ('Milwaukee-Waukesha-West Allis', 'US', 'bc', 0.21),
 ('HOWARD', 'US', 'bc', 0.6),
 ('San Francisco-Oakland-Fremont', 'US', 'bc', 0.53),
 ('Providence-New Bedford-Fall River', 'US', 'bc', 0.43),
 ('Washington-Arlington-Alexandria', 'US', 'bc', 0.1),
 ('Oklahoma City', 'US', 'bc', 1.0),
 ('Providence-New Bedford-Fall River', 'US', 'bc', 0.16)]
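When the value in a WHERE condition comes from a variable rather than being typed into the query, the `sqlite3` module lets you pass it separately through a `?` placeholder instead of building the SQL string by hand. The sketch below shows this on a toy in-memory table. The names `demo_con`, `country_code` and `us_rows`, and the four-column schema, are assumptions for illustration. The sample rows reuse values from the outputs above.

```python
import sqlite3

# Toy in-memory stand-in for the real table (hypothetical four-column schema).
demo_con = sqlite3.connect(":memory:")
demo_con.execute(
    "CREATE TABLE AirQuality (city TEXT, country TEXT, pollutant TEXT, value REAL)"
)
demo_con.executemany(
    "INSERT INTO AirQuality VALUES (?, ?, ?, ?)",
    [
        ("Kraków", "PL", "bc", 0.91284),
        ("Oklahoma City", "US", "bc", 1.0),
        ("HOWARD", "US", "bc", 0.6),
    ],
)

country_code = "US"

# The ? placeholder is filled from the tuple passed as the second argument,
# so the value is never pasted into the SQL text itself.
us_rows = demo_con.execute(
    "SELECT city, country, pollutant, value "
    "FROM AirQuality WHERE country = ? LIMIT 10",
    (country_code,),
).fetchall()

print(us_rows)
demo_con.close()
```

Changing `country_code` to `"PL"` would return the Kraków row instead, with no change to the query text.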