### <font color="brown">Relational Databases Continued</font>

---

#### Nobel Prize Winners Database Version 2

To make sure we start where we left off last time, let's revert the database to its state at the end of the previous lecture (load from nobelsv3.sql):
<pre>
sesh> mysql -u sesh -p nobels < nobelsv3.sql
    OR
sesh> cat nobelsv3.sql | mysql -u sesh -p nobels
</pre>


---

In [1]:
# import connector module
import mysql.connector

In [12]:
# connect to nobels database
mydb = mysql.connector.connect(
  host="localhost",
  user="sesh",
  passwd="sesh",  # replace with your password
  database="nobels"
)

In [3]:
# set up for access
cursor = mydb.cursor()

---

#### <font color="brown">Populating the database with all the data from the nobel prize winners dataset</font>

In [4]:
import json, requests

nobel_url = 'http://api.nobelprize.org/v1/prize.json'
resp = requests.get(nobel_url)
nobels = json.loads(resp.text)

[prize for prize in nobels['prizes'] if prize['year'] == '2016' and prize['category'] == 'peace']

[{'year': '2016',
  'category': 'peace',
  'laureates': [{'id': '934',
    'firstname': 'Juan Manuel',
    'surname': 'Santos',
    'motivation': '"for his resolute efforts to bring the country\'s more than 50-year-long civil war to an end"',
    'share': '1'}]}]

##### We'll start with the original JSON load and insert as we go

In [5]:
# the year/categories we already have in the db that we want to skip over
query = 'select year, category from yearcat' 
cursor.execute(query)
res = cursor.fetchall()
ycat_old = []
for row in res:
    ycat_old.append(row)
ycat_old

[(2021, 'Chemistry'),
 (2021, 'Economics'),
 (2021, 'Literature'),
 (2021, 'Physics')]

##### Cycle through the laureates list: 
- When we get to a new year/category, we insert it into the yearcat table. 
- At the same time, we also insert the corresponding motivation in the contribution table. 
- Using the lastrowid for both, we insert the associated laureate. 
- Then, as long as both year/category and motivation are the same, we keep adding laureates with the same yearcat id and the same motivation id. 
- If the motivation changes, but year/category is the same, we insert a new contribution, get lastrowid for that table, and keep adding laureates

In [6]:
# query templates for adding to the tables
add_yearcat = "insert into yearcat (year,category) values (%s,%s)"
add_contribution = "insert into contribution (motivation) values (%s)"
add_laureate = "insert into laureate values (%s, %s, %s, %s, %s)"

In [7]:
prev_year = 0
prev_cat = ''
prev_motiv = ''
for prize in nobels['prizes']:
    if 'laureates' not in prize:  # skip prizes that have no laureates list
        continue
    year = int(prize['year'])
    cat = prize['category'].capitalize()
    yc_val = (year, cat)
    if yc_val in ycat_old:
        continue
    if year != prev_year or cat != prev_cat:  # switch
        cursor.execute(add_yearcat, yc_val)
        yc_id = cursor.lastrowid
        prev_year = year
        prev_cat = cat
    for winner in prize['laureates']:
        motiv = winner.get('motivation').strip('"')
        if motiv != prev_motiv:
            cursor.execute(add_contribution, (motiv,))
            contrib_id = cursor.lastrowid
            prev_motiv = motiv
        fname = winner.get('firstname')
        share = winner.get('share')
        # .get() returns None (never raises KeyError) when a laureate has no surname
        lname = winner.get('surname')

        cursor.execute(add_laureate, (fname, lname, share, yc_id, contrib_id))
        mydb.commit()

---

#### <font color="brown">Transaction, Atomicity, and Rollback</font>

**What if there is a problem in adding to contribution table after adding to yearcat, or in adding to laureate after adding to contribution (and possibly to yearcat)?**

- We need to take into account these possible issues and not add the data if there is a problem. 

- In order to do this, we will need to do what's called a **rollback** which basically cancels any other changes you may have made in the same breath (adding to any of the other tables). 

- The reason is that when a laureate entry requires updates to more than one table, either all of the updates must happen or none of them - this group of inserts is called a **transaction**, which is *atomic* (all or nothing). A transaction can also consist of multiple updates to a single table.

- There is a connection property called autocommit: set it to True to commit automatically after every statement, or to False if you want to call commit() explicitly. The Python MySQL connector sets it to False by default:
https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlconnection-autocommit.html

- This is useful for transactions, because you only want to commit after *all* component updates have been executed without error: with autocommit set to False, you issue a single explicit commit at the end of the transaction.
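
The all-or-nothing behavior described above can be tried out without a MySQL server. The sketch below is only an illustration: it swaps in Python's built-in sqlite3 module with an in-memory database, uses simplified stand-ins for the yearcat and laureate tables (an assumption, not the schema from the lecture), and uses sqlite3's `?` placeholders rather than the connector's `%s`. The second insert is made to fail on purpose, and the rollback undoes the first insert as well.

```python
import sqlite3

# In-memory stand-in for the nobels database (simplified, hypothetical schema)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table yearcat (id integer primary key, "
            "year integer not null, category text not null)")
cur.execute("create table laureate (fname text not null, lname text, "
            "year_cat_id integer not null)")
conn.commit()

try:
    # first update of the transaction: succeeds, but is not yet committed
    cur.execute("insert into yearcat (year, category) values (?, ?)",
                (2016, "Peace"))
    yc_id = cur.lastrowid
    # second update: fails because fname is declared NOT NULL
    cur.execute("insert into laureate values (?, ?, ?)",
                (None, "Santos", yc_id))
    conn.commit()      # reached only if every statement succeeded
except sqlite3.Error as err:
    conn.rollback()    # undo the yearcat insert as well
    print("Transaction rolled back:", err)

yearcat_rows = cur.execute("select count(*) from yearcat").fetchone()[0]
laureate_rows = cur.execute("select count(*) from laureate").fetchone()[0]
print(yearcat_rows, laureate_rows)
```

Both counts come out as 0: because the laureate insert failed, the yearcat row that was added in the same transaction is gone too.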

**Here's the code rewritten with error checking and rollback.**

**<font color="red">STOP! First revert the database to its state at the end of the previous lecture (load from nobelsv3.sql), then execute the code block in the next cell:</font>**
<pre>
sesh> mysql -u sesh -p nobels < nobelsv3.sql
    OR
sesh> cat nobelsv3.sql | mysql -u sesh -p nobels
</pre>


In [9]:
# code includes rollback
import sys
prev_year = 0
prev_cat = ''
prev_motiv = ''
for prize in nobels['prizes']:
    if 'laureates' not in prize:  # skip prizes that have no laureates list
        continue
    year = int(prize['year'])
    cat = prize['category'].capitalize()
    yc_val = (year, cat)
    if yc_val in ycat_old:
        continue
    if year != prev_year or cat != prev_cat:  # switch
        try:
            cursor.execute(add_yearcat, yc_val)
            yc_id = cursor.lastrowid
            prev_year = year
            prev_cat = cat
        except Exception:  # no rollback needed here since nothing has been added
            print(sys.exc_info()[0])
            print(f'Could not add {year}/{cat}')
            continue
    
    for winner in prize['laureates']:
        # motivation
        try:
            motiv = winner.get('motivation').strip('"')
            if motiv != prev_motiv:
                cursor.execute(add_contribution, (motiv,))
                contrib_id = cursor.lastrowid
                prev_motiv = motiv
        except Exception:
            # contribution failed, rollback in case year/cat was added
            mydb.rollback()   # no effect, or undo year/cat add since last commit
            print(f'Could not add motivation "{winner.get("motivation")}" in {year}/{cat}')
            break

        # laureate
        fname = winner.get('firstname')
        share = winner.get('share')
        lname = winner.get('surname')   # None when there is no surname
        try:
            cursor.execute(add_laureate, (fname, lname, share, yc_id, contrib_id))
            mydb.commit()
        except Exception:
            # laureate failed
            mydb.rollback()   # no effect, or undo contribution, or undo contrib and year/cat
            print(f'Could not add laureate {fname} {lname} in {year}/{cat}')
            break
          

In [10]:
# check the count in laureate
cursor.execute('select count(*) from laureate')
res = cursor.fetchall()
for row in res:
    print(row)

(975,)


#### You can also run transactions with commit and rollback in the MySQL client terminal:
https://dev.mysql.com/doc/refman/8.0/en/innodb-autocommit-commit-rollback.html

Unlike the Python connector, a connection through the client interface starts with autocommit set to 1 (True) by default. So if you want to process a transaction, you need to set autocommit to 0 (False) first, or begin the transaction with START TRANSACTION.

See also: https://dev.mysql.com/doc/refman/8.0/en/commit.html

---

#### <font color="brown">Queries on the nobels database</font>

##### **The table schemas**

<pre>
mysql> desc yearcat;
+----------+----------+------+-----+---------+----------------+
| Field    | Type     | Null | Key | Default | Extra          |
+----------+----------+------+-----+---------+----------------+
| id       | smallint | NO   | PRI | NULL    | auto_increment |
| year     | year     | NO   |     | NULL    |                |
| category | char(10) | NO   |     | NULL    |                |
+----------+----------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

mysql> desc contribution;
+------------+--------------+------+-----+---------+----------------+
| Field      | Type         | Null | Key | Default | Extra          |
+------------+--------------+------+-----+---------+----------------+
| id         | smallint     | NO   | PRI | NULL    | auto_increment |
| motivation | varchar(500) | NO   |     | NULL    |                |
+------------+--------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)

mysql> desc laureate;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| fname       | varchar(80) | NO   |     | NULL    |       |
| lname       | varchar(40) | YES  |     | NULL    |       |
| share       | tinyint     | NO   |     | NULL    |       |
| year_cat_id | smallint    | NO   | MUL | NULL    |       |
| motiv_id    | smallint    | NO   | MUL | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
</pre>

---

##### <font color="brown">1. Who were the laureates in 2010, and for what category? (Inner Join)</font>

<pre>
mysql> select fname, lname, category
       from laureate, yearcat
       where yearcat.year=2010 and laureate.year_cat_id = yearcat.id;
       
+----------------+--------------+------------+
| fname          | lname        | category   |
+----------------+--------------+------------+
| Richard F.     | Heck         | Chemistry  |
| Ei-ichi        | Negishi      | Chemistry  |
| Akira          | Suzuki       | Chemistry  |
| Peter A.       | Diamond      | Economics  |
| Dale T.        | Mortensen    | Economics  |
| Christopher A. | Pissarides   | Economics  |
| Mario          | Vargas Llosa | Literature |
| Xiaobo         | Liu          | Peace      |
| Andre          | Geim         | Physics    |
| Konstantin     | Novoselov    | Physics    |
| Robert G.      | Edwards      | Medicine   |
+----------------+--------------+------------+       
</pre>

- We are executing what's called an **inner join** on the tables laureate and yearcat, because we are matching the year_cat_id value in the laureate table with the id value in the yearcat table - only the respective rows from these two tables for which these ids match will be selected for the result.

- Since year_cat_id, year, and id are each unique column names across this pair of tables, you can omit the qualifiers and write a simplified version like this:

<pre>
mysql> select fname, lname, category
       from laureate, yearcat
       where year=2010 and year_cat_id = id;
</pre>

- Another alternative is to explicitly spell out the inner join using the JOIN and ON keywords:

<pre>
mysql> select fname, lname, category
       from laureate
       join yearcat
       on year_cat_id = id
       where year=2010;
</pre>

- The order of the tables is irrelevant: you can do the following (yearcat join laureate) with the same result:

<pre>
mysql> select fname, lname, category
       from yearcat
       join laureate
       on year_cat_id = id
       where year=2010;
</pre>
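
To check that these forms of the inner join really do return the same rows, here is a small sketch you can run without a MySQL server. It is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database instead of MySQL, simplified stand-ins for the yearcat and laureate tables, and a handful of sample rows taken from the outputs above.

```python
import sqlite3

# In-memory stand-in for the nobels database (simplified, hypothetical schema)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table yearcat (id integer primary key, year integer, category text)")
cur.execute("create table laureate (fname text, lname text, year_cat_id integer)")
cur.executemany("insert into yearcat values (?, ?, ?)",
                [(1, 2010, "Literature"), (2, 2010, "Physics"), (3, 2016, "Peace")])
cur.executemany("insert into laureate values (?, ?, ?)",
                [("Mario", "Vargas Llosa", 1), ("Andre", "Geim", 2),
                 ("Konstantin", "Novoselov", 2), ("Juan Manuel", "Santos", 3)])

# comma-separated tables with the join condition in the where clause
implicit = """select fname, lname, category from laureate, yearcat
              where year = 2010 and year_cat_id = id"""
# the same join spelled out with JOIN ... ON
explicit = """select fname, lname, category from laureate
              join yearcat on year_cat_id = id where year = 2010"""
# tables listed in the opposite order
swapped = """select fname, lname, category from yearcat
             join laureate on year_cat_id = id where year = 2010"""

# sort each result so the comparison does not depend on row order
results = [sorted(cur.execute(q).fetchall()) for q in (implicit, explicit, swapped)]
print(results[0])
print(results[0] == results[1] == results[2])
```

Only the three 2010 laureates are returned; the 2016 row is filtered out because its yearcat row does not satisfy year = 2010.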

---

##### <font color="brown">2. Find the year and category for laureates that did not have a last name (null values)</font>

<pre>
mysql> select year, category
       from yearcat, laureate 
       where lname is null 
       and year_cat_id = id;

+------+----------+
| year | category |
+------+----------+
| 2020 | Peace    |
| 2017 | Peace    |
| 2015 | Peace    |
| 2013 | Peace    |

      ...

| 1938 | Peace    |
| 1917 | Peace    |
| 1910 | Peace    |
| 1904 | Peace    |
+------+----------+   
</pre>

Looks like these are all in the Peace category, but we can verify:
<pre>
mysql> select distinct(category) 
       from yearcat, laureate 
       where lname is null 
       and year_cat_id = id;

+----------+
| category |
+----------+
| Peace    |
+----------+
</pre>
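
Note that the queries above test for missing last names with `is null`, not `= null`. In SQL, comparing anything to null with `=` never evaluates to true, so such a query matches no rows. The sketch below illustrates the difference; as an assumption for the sake of a self-contained example, it uses Python's built-in sqlite3 module and a cut-down laureate table with one made-up organization row rather than the MySQL database.

```python
import sqlite3

# In-memory table with one laureate that has a surname and one that does not
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table laureate (fname text not null, lname text)")
cur.executemany("insert into laureate values (?, ?)",
                [("Juan Manuel", "Santos"),
                 ("Hypothetical Organization", None)])  # None is stored as NULL

# correct test for a missing value
is_null = cur.execute("select fname from laureate where lname is null").fetchall()
# "= null" is never true, so no row qualifies
equals_null = cur.execute("select fname from laureate where lname = null").fetchall()

print(is_null)
print(equals_null)
```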


---

##### <font color="brown">3. For what contribution(s) was the Peace prize awarded in 2021? (Join on more than 2 tables)</font>

**Version 1**
<pre>
mysql> select motivation 
       from contribution, yearcat, laureate
       where year=2021 and category='Peace' 
             and yearcat.id = year_cat_id 
             and contribution.id = motiv_id;
             
+-----------------------------------------------------------------------------------
| motivation                                                                         
+----------------------------------------------------------------------------------- 
| for their efforts to safeguard freedom of expression, which  is a precondition ... 
| for their efforts to safeguard freedom of expression, which  is a precondition ...
+-----------------------------------------------------------------------------------             
</pre>

- This is an inner join of all three tables
- Since both the contribution and yearcat tables have a column named id, we need to qualify its usage with table names

**Version 2**

The above result is not quite what we want, since it is duplicated. (Two people shared the Peace prize in 2021, for the same contribution.) So we need to change the query to get distinct motivation:
<pre>
mysql> select distinct(motivation)
       from contribution, yearcat, laureate
       where year=2021 and category='Peace' 
             and yearcat.id = year_cat_id 
             and contribution.id = motiv_id;
             
+-----------------------------------------------------------------------------------
| motivation                                                                         
+----------------------------------------------------------------------------------- 
| for their efforts to safeguard freedom of expression, which  is a precondition ... 
+-----------------------------------------------------------------------------------                 
</pre>

**Version 3**

In this version, the contribution and yearcat tables are given alternative short labels for ease of writing out the join conditions:

<pre>
mysql> select distinct(motivation)
       from contribution C, yearcat Y, laureate
       where year=2021 and category='Peace' 
             and Y.id = year_cat_id 
             and C.id = motiv_id;               
</pre>

**Version 4**

Often, *all* tables in a join are labeled with single letters and *all* column names are qualified, regardless of whether or not the names are unique in the set of tables:

<pre>
mysql> select distinct(C.motivation)
       from contribution C, yearcat Y, laureate L
       where Y.year=2021 and Y.category='Peace' 
             and Y.id = L.year_cat_id 
             and C.id = L.motiv_id; 
</pre>
I'll mostly follow this practice going forward.
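
The effect of distinct on the three-table join can be reproduced with a minimal sketch. This is an illustration under stated assumptions, not the lecture database: it uses Python's built-in sqlite3 module with an in-memory database, and the motivation text and laureate names are placeholders. Two laureates share one contribution, so the plain join returns the motivation twice and the distinct version returns it once.

```python
import sqlite3

# In-memory stand-in for the three nobels tables, with placeholder rows
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    create table yearcat (id integer primary key, year integer, category text);
    create table contribution (id integer primary key, motivation text);
    create table laureate (fname text, lname text, share integer,
                           year_cat_id integer, motiv_id integer);
    insert into yearcat values (1, 2021, 'Peace');
    insert into contribution values (1, 'placeholder motivation text');
    insert into laureate values ('Laureate', 'One', 2, 1, 1);
    insert into laureate values ('Laureate', 'Two', 2, 1, 1);
""")

# join conditions shared by both queries, fully qualified as in Version 4
join_part = """from contribution C, yearcat Y, laureate L
               where Y.year = 2021 and Y.category = 'Peace'
                     and Y.id = L.year_cat_id
                     and C.id = L.motiv_id"""

all_rows = cur.execute("select C.motivation " + join_part).fetchall()
distinct_rows = cur.execute("select distinct C.motivation " + join_part).fetchall()
print(len(all_rows), len(distinct_rows))
```

The join produces one row per matching laureate, which is why the motivation shows up once for each person who shared the prize.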

---

##### <font color="brown">4. How many people shared the Economics prize in 2018?</font>

<pre>
mysql> select count(*) from yearcat Y, laureate L
       where Y.year=2018 and Y.category='Economics'
       and Y.id = L.year_cat_id;

+----------+
| count(*) |
+----------+
|        2 |
+----------+
</pre>