# More SQL DML and DDL

## Data and environment setup

In [None]:
!wget -O survey.db http://files.software-carpentry.org/survey.db

In [None]:
%load_ext sql

In [None]:
%sql sqlite:///survey.db

## SQL JOINs

Note first that a plain `JOIN` with no join condition produces the cross product of the two relations.  To see this, we first examine two tables on their own and then join them without a condition.

In [None]:
%%sql
SELECT *
FROM Site;

In [None]:
%%sql
SELECT *
FROM Visited;

Now we add the join, and we see the Cartesian product of both tables:  every row of `Site` is paired with every row of `Visited`.

In [None]:
%%sql
SELECT * 
FROM Site 
JOIN Visited;

Ordinarily we don't want this, but it's worth keeping in mind as the conceptual model:  a join with a condition returns exactly those rows of the cross product that satisfy the condition.
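As a quick sanity check on the cross product, the row count of an unconditioned `JOIN` should equal the product of the two tables' row counts. The sketch below uses Python's built-in `sqlite3` module and a small in-memory database rather than `survey.db`; the `Visited` rows are made up for illustration.

```python
import sqlite3

# In-memory stand-in for survey.db; the Visited rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Site(name TEXT PRIMARY KEY, lat REAL, long REAL);
CREATE TABLE Visited(ident INTEGER PRIMARY KEY, site TEXT, dated TEXT);
INSERT INTO Site VALUES
    ('DR-1', -49.85, -128.57), ('DR-3', -47.15, -126.72), ('MSK-4', -48.87, -123.40);
INSERT INTO Visited VALUES
    (1, 'DR-1', '1927-02-08'), (2, 'DR-3', NULL);
""")

n_site = conn.execute("SELECT COUNT(*) FROM Site").fetchone()[0]
n_visited = conn.execute("SELECT COUNT(*) FROM Visited").fetchone()[0]
# JOIN with no ON clause: every Site row paired with every Visited row.
n_cross = conn.execute("SELECT COUNT(*) FROM Site JOIN Visited").fetchone()[0]

print(n_site, n_visited, n_cross)  # 3 2 6
```

On larger tables this product grows quickly, which is one practical reason to avoid an unconditioned join.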

More typically, we'll specify at least one pair of common attributes between the two tables to align them properly.

In [None]:
%%sql
SELECT * 
FROM Site 
JOIN Visited 
  ON Site.name = Visited.site;

Looks much neater, right?

We can of course combine this with naming specific attributes to SELECT (project).

In [None]:
%%sql
SELECT Site.lat, Site.long, Visited.dated
FROM   Site
JOIN   Visited
ON     Site.name = Visited.site;

There's a shorthand form of the join that you will often see:  list the tables in the `FROM` clause, separated by commas, and specify the common attributes in the `WHERE` clause.

In [None]:
%%sql
SELECT Site.lat, Site.long, Visited.dated
FROM   Site, Visited
WHERE  Site.name = Visited.site;

Which form you use is largely a matter of personal preference or style.  For simple queries, the shorthand reads very clearly to me.  For more complex queries, where the `WHERE` clause carries filtering conditions in addition to the join condition, it can be clearer to keep the join condition in `ON` and the filters in `WHERE`.

Compare:

In [None]:
%%sql
SELECT Site.lat, Site.long, Visited.dated
FROM   Site, Visited
WHERE  Site.name = Visited.site
  AND  Visited.dated IS NOT NULL
  AND  Site.lat < -48
  AND  Site.long > -128;

In [None]:
%%sql
SELECT Site.lat, Site.long, Visited.dated
FROM   Site
JOIN   Visited
  ON   Site.name = Visited.site
WHERE  Visited.dated IS NOT NULL
  AND  Site.lat < -48
  AND  Site.long > -128;

The above two queries are logically equivalent and return the same rows, but they read differently.
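One way to convince ourselves of this is to run both forms against the same data and compare the results. This is a minimal sketch using Python's `sqlite3` module with an in-memory database; the `Visited` rows are invented for illustration and are not taken from `survey.db`.

```python
import sqlite3

# In-memory stand-in for survey.db; the Visited rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Site(name TEXT PRIMARY KEY, lat REAL, long REAL);
CREATE TABLE Visited(ident INTEGER PRIMARY KEY, site TEXT, dated TEXT);
INSERT INTO Site VALUES
    ('DR-1', -49.85, -128.57), ('DR-3', -47.15, -126.72), ('MSK-4', -48.87, -123.40);
INSERT INTO Visited VALUES
    (1, 'DR-1', '1927-02-08'), (2, 'DR-3', NULL),
    (3, 'MSK-4', '1932-01-14'), (4, 'MSK-4', NULL);
""")

# Shorthand form: join condition and filters all live in WHERE.
implicit = conn.execute("""
    SELECT Site.lat, Site.long, Visited.dated
    FROM   Site, Visited
    WHERE  Site.name = Visited.site
      AND  Visited.dated IS NOT NULL
      AND  Site.lat < -48
      AND  Site.long > -128
""").fetchall()

# Explicit form: join condition in ON, filters in WHERE.
explicit = conn.execute("""
    SELECT Site.lat, Site.long, Visited.dated
    FROM   Site
    JOIN   Visited
      ON   Site.name = Visited.site
    WHERE  Visited.dated IS NOT NULL
      AND  Site.lat < -48
      AND  Site.long > -128
""").fetchall()

# Sort before comparing, since neither query fixes a row order.
print(sorted(implicit) == sorted(explicit))  # True
```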

Finally, we can combine several tables at once, not just two!  This works by adding one more `JOIN ... ON` clause for each additional table.

In [None]:
%%sql
SELECT Site.lat, Site.long, Visited.dated, Survey.quant, Survey.reading
FROM   Site
JOIN   Visited 
  ON   Site.name = Visited.site
JOIN   Survey 
  ON   Visited.ident = Survey.taken
WHERE  Visited.dated IS NOT NULL;
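To see how the chained join behaves, here is a small sketch with Python's `sqlite3` module. The tables mirror the structure used above, but all `Visited` and `Survey` rows are made up for illustration. Each `Survey` row is matched to its visit, and each visit to its site, so the result has one row per reading taken on a dated visit.

```python
import sqlite3

# In-memory stand-in for survey.db; Visited and Survey rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Site(name TEXT PRIMARY KEY, lat REAL, long REAL);
CREATE TABLE Visited(ident INTEGER PRIMARY KEY, site TEXT, dated TEXT);
CREATE TABLE Survey(taken INTEGER, person TEXT, quant TEXT, reading REAL);
INSERT INTO Site VALUES ('DR-1', -49.85, -128.57), ('MSK-4', -48.87, -123.40);
INSERT INTO Visited VALUES (1, 'DR-1', '1927-02-08'), (2, 'MSK-4', NULL);
INSERT INTO Survey VALUES
    (1, 'dyer', 'rad', 9.8), (1, 'dyer', 'sal', 0.1), (2, 'lake', 'rad', 1.5);
""")

rows = conn.execute("""
    SELECT Site.name, Visited.dated, Survey.quant, Survey.reading
    FROM   Site
    JOIN   Visited
      ON   Site.name = Visited.site
    JOIN   Survey
      ON   Visited.ident = Survey.taken
    WHERE  Visited.dated IS NOT NULL
    ORDER BY Survey.quant
""").fetchall()

for row in rows:
    print(row)
# ('DR-1', '1927-02-08', 'rad', 9.8)
# ('DR-1', '1927-02-08', 'sal', 0.1)
```

The reading taken on visit 2 drops out because that visit has no date.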

## Subqueries

We can use the results from one query to constrain conditions on another.  Take for example the Person table:

In [None]:
%%sql
SELECT ident
FROM person
WHERE ident LIKE 'd%';

The result is a set of values, and we already know that we can test membership in a set with the `IN` operator.

In [None]:
%%sql
SELECT * 
FROM survey
WHERE person IN ('dyer', 'danforth');

But imagine a case where there are dozens, hundreds, or even thousands of possible values.  You want to select carefully, without having to enumerate those possible values.  This is where subqueries work best.

In [None]:
%%sql
SELECT * 
FROM survey
WHERE person IN 
    (SELECT ident
     FROM person
     WHERE ident LIKE 'd%');

This approach extends naturally:  both the outer query and the subquery can carry their own conditions.  For example, let's filter on a different attribute inside the subquery and add a further condition to the main query.

In [None]:
%%sql
SELECT * 
FROM survey
WHERE person IN 
    (SELECT ident
     FROM person
     WHERE personal = 'Frank')
AND reading > 7;

This kind of nested subquery is extremely useful and is used quite often.
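The sketch below, using Python's `sqlite3` module, checks that the subquery form returns the same rows as the hand-written list and then runs the combined query. The `Person` and `Survey` rows here are illustrative stand-ins, not the contents of `survey.db`.

```python
import sqlite3

# In-memory stand-in for survey.db; all rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person(ident TEXT PRIMARY KEY, personal TEXT, family TEXT);
CREATE TABLE Survey(taken INTEGER, person TEXT, quant TEXT, reading REAL);
INSERT INTO Person VALUES
    ('dyer', 'William', 'Dyer'), ('pb', 'Frank', 'Pabodie'),
    ('danforth', 'Frank', 'Danforth');
INSERT INTO Survey VALUES
    (1, 'dyer', 'rad', 9.8), (2, 'pb', 'rad', 8.4),
    (3, 'danforth', 'sal', 0.1), (4, 'pb', 'temp', -21.5);
""")

# Hand-enumerated set of identifiers.
enumerated = conn.execute("""
    SELECT * FROM Survey
    WHERE person IN ('dyer', 'danforth')
    ORDER BY taken
""").fetchall()

# Same set, computed by a subquery instead of typed out.
nested = conn.execute("""
    SELECT * FROM Survey
    WHERE person IN (SELECT ident FROM Person WHERE ident LIKE 'd%')
    ORDER BY taken
""").fetchall()

# Conditions on both levels: people named Frank, readings above 7.
frank_high = conn.execute("""
    SELECT * FROM Survey
    WHERE person IN (SELECT ident FROM Person WHERE personal = 'Frank')
      AND reading > 7
""").fetchall()

print(enumerated == nested)  # True
print(frank_high)            # [(2, 'pb', 'rad', 8.4)]
```

The practical difference is maintenance: if another person whose identifier starts with `d` is added to `Person`, the subquery form picks them up without any change to the query.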

## Data definition (DDL)

We're going to set up a similar database ourselves, so let's start a new one to avoid messing up the tutorial database.

In [None]:
%sql sqlite:///demo.db

When creating new tables, we often include a `DROP TABLE` command so that the code can be rerun and each `CREATE TABLE` starts from a clean slate.  The cleanest way to do this is with `DROP TABLE IF EXISTS`, which does not raise an error when the table doesn't exist yet.

In [None]:
%%sql
DROP TABLE IF EXISTS Person;

CREATE TABLE Person(
    ident TEXT PRIMARY KEY, 
    personal TEXT NOT NULL, 
    family TEXT NOT NULL);

DROP TABLE IF EXISTS Site;

CREATE TABLE Site(
    name TEXT PRIMARY KEY, 
    lat REAL NOT NULL, 
    long REAL NOT NULL);

DROP TABLE IF EXISTS Visited;

CREATE TABLE Visited(
    ident INTEGER PRIMARY KEY, 
    site TEXT NOT NULL REFERENCES Site,
    dated TEXT);

DROP TABLE IF EXISTS Survey;

CREATE TABLE Survey(
    taken INTEGER REFERENCES Visited,
    person TEXT REFERENCES Person,
    quant TEXT,
    reading REAL);
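To illustrate why the `DROP TABLE IF EXISTS` pattern makes a setup script safe to rerun, here is a minimal sketch with Python's `sqlite3` module on an in-memory database. Running the script twice succeeds, while a bare `CREATE TABLE` on an existing table raises an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

ddl = """
DROP TABLE IF EXISTS Site;
CREATE TABLE Site(
    name TEXT PRIMARY KEY,
    lat REAL NOT NULL,
    long REAL NOT NULL);
"""

# Running the same script twice works: the DROP clears the way each time.
conn.executescript(ddl)
conn.executescript(ddl)

# Without the DROP, a second CREATE TABLE for the same name fails.
try:
    conn.execute("CREATE TABLE Site(name TEXT PRIMARY KEY)")
    second_create_failed = False
except sqlite3.OperationalError as err:
    second_create_failed = True
    print("CREATE without DROP failed:", err)

print(second_create_failed)  # True
```

Keep in mind that the `DROP` also discards any rows already in the table, so this pattern suits setup scripts rather than databases holding data you want to keep.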

Now we can insert new records into our new tables.

In [None]:
%%sql
INSERT INTO Site VALUES ('DR-1', -49.85, -128.57);
INSERT INTO Site VALUES ('DR-3', -47.15, -126.72);
INSERT INTO Site VALUES ('MSK-4', -48.87, -123.40);

SELECT * FROM Site;

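We can do the same thing in a single `INSERT` statement, too.  We first clear the table with `DELETE FROM Site` so that the primary keys don't collide with the rows we just inserted.  Note how this time we are specifying our own order of attributes, and our values match that order.  Without the attribute list, the values must follow the order of the schema definition.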

In [None]:
%%sql
DELETE FROM Site;

INSERT INTO Site (lat, long, name) 
VALUES 
    (-49.85, -128.57, 'DR-1'), 
    (-47.15, -126.72, 'DR-3'), 
    (-48.87, -123.40, 'MSK-4')
;
SELECT * FROM Site;
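The sketch below, using Python's `sqlite3` module on an in-memory database, shows that the attribute list only affects how the values are matched to columns. The stored rows come back in schema order either way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Site(
        name TEXT PRIMARY KEY,
        lat REAL NOT NULL,
        long REAL NOT NULL)
""")

# Explicit attribute list: values follow the order we chose (lat, long, name).
conn.execute("INSERT INTO Site (lat, long, name) VALUES (-49.85, -128.57, 'DR-1')")

# No attribute list: values must follow the schema order (name, lat, long).
conn.execute("INSERT INTO Site VALUES ('DR-3', -47.15, -126.72)")

rows = conn.execute("SELECT * FROM Site ORDER BY name").fetchall()
print(rows)
# [('DR-1', -49.85, -128.57), ('DR-3', -47.15, -126.72)]
```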

`INSERT` can also take its rows from a query instead of a `VALUES` list.

In [None]:
%%sql
DROP TABLE IF EXISTS JustLatLong;
CREATE TABLE JustLatLong(
    lat REAL, 
    long REAL);

INSERT INTO JustLatLong 
    SELECT lat, long FROM Site;

In [None]:
%%sql
SELECT * FROM JustLatLong;
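When copying rows this way, the declared type of the target columns matters in SQLite: values are converted according to the target column, not the source. The following sketch uses Python's `sqlite3` module; the table names `AsText` and `AsReal` are hypothetical and exist only to show the contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Site(name TEXT PRIMARY KEY, lat REAL NOT NULL, long REAL NOT NULL);
INSERT INTO Site VALUES
    ('DR-1', -49.85, -128.57), ('DR-3', -47.15, -126.72), ('MSK-4', -48.87, -123.40);

-- Hypothetical target tables that differ only in their declared column types.
CREATE TABLE AsText(lat TEXT, long TEXT);
CREATE TABLE AsReal(lat REAL, long REAL);

INSERT INTO AsText SELECT lat, long FROM Site;
INSERT INTO AsReal SELECT lat, long FROM Site;
""")

copied = conn.execute("SELECT COUNT(*) FROM AsText").fetchone()[0]
# typeof() reports how SQLite actually stored each value.
text_types = conn.execute("SELECT DISTINCT typeof(lat) FROM AsText").fetchall()
real_types = conn.execute("SELECT DISTINCT typeof(lat) FROM AsReal").fetchall()

print(copied)      # 3
print(text_types)  # [('text',)]
print(real_types)  # [('real',)]
```

Numbers stored as text still display the same way, but comparisons and sorting on them follow text rules, so it is usually worth declaring numeric columns as numeric.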

## More DML:  UPDATE and DELETE

`UPDATE` and `DELETE` statements are powerful:  without a `WHERE` clause, they change or remove every row of a relation at once.  They are therefore usually written with a `WHERE` clause that limits their effect to precisely the rows intended.  The cells below show both cases.

In [None]:
%%sql
SELECT * 
FROM Site
WHERE name = 'MSK-4';

In [None]:
%%sql
UPDATE Site 
SET lat = -48.88, long = -125.40 
WHERE name = 'MSK-4';

In [None]:
%%sql
SELECT * 
FROM Site
WHERE name = 'MSK-4';

In [None]:
%%sql
SELECT *
FROM Site;

In [None]:
%%sql
DELETE FROM Site 
WHERE name = 'DR-3';

In [None]:
%%sql
SELECT * 
FROM Site;

In [None]:
%%sql
UPDATE Site
SET lat = -50;

In [None]:
%%sql
SELECT *
FROM Site;

In [None]:
%%sql
DELETE FROM Site;

In [None]:
%%sql
SELECT *
FROM Site;
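One habit that helps here is checking how many rows a statement touched. The sketch below uses Python's `sqlite3` module on an in-memory copy of the `Site` rows and reads `Cursor.rowcount` after each statement, which makes the difference between a constrained and an unconstrained `UPDATE` or `DELETE` easy to see.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Site(name TEXT PRIMARY KEY, lat REAL NOT NULL, long REAL NOT NULL);
INSERT INTO Site VALUES
    ('DR-1', -49.85, -128.57), ('DR-3', -47.15, -126.72), ('MSK-4', -48.87, -123.40);
""")

# Constrained UPDATE: only the row matching the WHERE clause changes.
cur = conn.execute(
    "UPDATE Site SET lat = -48.88, long = -125.40 WHERE name = 'MSK-4'")
targeted = cur.rowcount

# Unconstrained UPDATE: every row changes.
cur = conn.execute("UPDATE Site SET lat = -50")
unrestricted = cur.rowcount

# Unconstrained DELETE: every row is removed.
cur = conn.execute("DELETE FROM Site")
deleted = cur.rowcount

remaining = conn.execute("SELECT COUNT(*) FROM Site").fetchone()[0]
print(targeted, unrestricted, deleted, remaining)  # 1 3 3 0
```

Running a `SELECT` with the same `WHERE` clause first, as the cells above do, is another simple way to preview which rows a change will affect.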