# Lab Assignment 7: Database Queries
## DS 6001: Practice and Application of Data Science

### Instructions
Please answer the following questions as completely as possible using text, code, and the results of code as needed. Format your answers in a Jupyter notebook. To receive full credit, make sure you address every part of the problem, and make sure your document is formatted in a clean and professional way.

### Problem 0
Import the following libraries, load the `.env` file where you store your passwords (see the notebook for module 4 for details), and turn off the error tracebacks to make errors easier to read:

In [1]:
import numpy as np
import pandas as pd
import sys
import os
import requests
import psycopg2
import pymongo
import json
from bson.json_util import dumps, loads
from sqlalchemy import create_engine
import dotenv

# change to the directory where your .env file is
#os.chdir("/Users/tlever/Tom_Levers_Git_Repository/UVA/3--Practice_And_Application_Of_Data_Science")
#dotenv.load_dotenv() # register the .env file where passwords are stored
#PostgreSQL_password = os.getenv("PostgreSQL_password")
import keyring
#keyring.set_password('PostgreSQL', 'tlever', '<password>')
PostgreSQL_password = keyring.get_password('PostgreSQL', 'tlever')
sys.tracebacklimit = 0 # turn off the error tracebacks

### Problem 1
For this problem, we will be building a PostgreSQL database that contains the collected works of Shakespeare.

<img src="https://www.chappatte.com/prod/wp-content/uploads/artworks/2016/04/L160423ge-950x635.jpg" width="300">

The data were collected by [Catherine Devlin](https://github.com/catherinedevlin/opensourceshakespeare) from the repository at https://opensourceshakespeare.org/. The database will have four tables: one representing works by Shakespeare, one for characters that appear in Shakespeare's plays, one for chapters (that is, scenes within acts), and one for paragraphs (that is, lines of dialogue). The data to populate these four tables are here:

In [2]:
works = pd.read_csv("https://github.com/jkropko/DS-6001/raw/master/localdata/Works.csv")
characters = pd.read_csv("https://github.com/jkropko/DS-6001/raw/master/localdata/Characters.csv")
chapters = pd.read_csv("https://github.com/jkropko/DS-6001/raw/master/localdata/Chapters.csv")
paragraphs = pd.read_csv("https://github.com/jkropko/DS-6001/raw/master/localdata/Paragraphs.csv")

In PostgreSQL, it is best practice to convert all column names to lowercase, as case-sensitive column names will require [extraneous double-quotes](https://stackoverflow.com/questions/20878932/are-postgresql-column-names-case-sensitive) in any query. We first convert the column names in all four dataframes to lowercase:

In [3]:
works.columns = works.columns.str.lower()
characters.columns = characters.columns.str.lower()
chapters.columns = chapters.columns.str.lower()
paragraphs.columns = paragraphs.columns.str.lower()

You will build a database and populate it with these data. The ER diagram for the database is:

<img src="https://github.com/jkropko/DS-6001/raw/master/localimages/shakespeare2.png" width="400">

There's no codebook, unfortunately, but the values in the columns are mostly self-explanatory:

In [4]:
works.head()

Unnamed: 0,workid,title,longtitle,date,genretype,notes,source,totalwords,totalparagraphs
0,12night,Twelfth Night,"Twelfth Night, Or What You Will",1599,c,,Moby,19837,1031
1,allswell,All's Well That Ends Well,All's Well That Ends Well,1602,c,,Moby,22997,1025
2,antonycleo,Antony and Cleopatra,Antony and Cleopatra,1606,t,,Moby,24905,1344
3,asyoulikeit,As You Like It,As You Like It,1599,c,,Gutenberg,21690,872
4,comedyerrors,Comedy of Errors,The Comedy of Errors,1589,c,,Moby,14692,661


In [5]:
characters.head()

Unnamed: 0,charid,charname,abbrev,works,description,speechcount
0,1apparition-mac,First Apparition,First Apparition,macbeth,,1.0
1,1citizen,First Citizen,First Citizen,romeojuliet,,3.0
2,1conspirator,First Conspirator,First Conspirator,coriolanus,,3.0
3,1gentleman-oth,First Gentleman,First Gentleman,othello,,1.0
4,1goth,First Goth,First Goth,titus,,4.0


In [6]:
chapters.head()

Unnamed: 0,workid,chapterid,section,chapter,description
0,12night,18704.0,1.0,1.0,DUKE ORSINO's palace.
1,12night,18705.0,1.0,2.0,The sea-coast.
2,12night,18706.0,1.0,3.0,OLIVIA'S house.
3,12night,18707.0,1.0,4.0,DUKE ORSINO's palace.
4,12night,18708.0,1.0,5.0,OLIVIA'S house.


In [7]:
paragraphs.head()

Unnamed: 0,workid,paragraphid,paragraphnum,charid,plaintext,phonetictext,stemtext,paragraphtype,section,chapter,charcount,wordcount
0,12night,630863,3,xxx,"[Enter DUKE ORSINO, CURIO, and other Lords; Mu...",ENTR TK ORSN KR ANT O0R LRTS MSXNS ATNTNK,enter duke orsino curio and other lord musicia...,b,1.0,1.0,65.0,9.0
1,12night,630864,4,ORSINO,"If music be the food of love, play on;\n[p]Giv...",IF MSK B 0 FT OF LF PL ON JF M EKSSS OF IT 0T ...,if music be the food of love plai on give me e...,b,1.0,1.0,646.0,114.0
2,12night,630865,19,CURIO,"Will you go hunt, my lord?\n",WL Y K HNT M LRT,will you go hunt my lord,b,1.0,1.0,27.0,6.0
3,12night,630866,20,ORSINO,"What, Curio?\n",HT KR,what curio,b,1.0,1.0,13.0,2.0
4,12night,630867,21,CURIO,The hart.\n,0 HRT,the hart,b,1.0,1.0,10.0,2.0


#### Part a
Connect to your local PostgreSQL server (take steps to hide your password!), create a new database for the Shakespeare data, use `create_engine()` from `sqlalchemy` to connect to the database, and create the works, characters, chapters, and paragraphs tables populated with the data from the four dataframes shown above. [2 points]

In [8]:
import psycopg2
from sqlalchemy import create_engine
username = 'tlever'
host = 'localhost'
# Connect to the server itself (no specific database) to manage databases
dbserver = psycopg2.connect(user = username, password = PostgreSQL_password, host = host)
dbserver.autocommit = True # CREATE DATABASE cannot run inside a transaction
cursor = dbserver.cursor()
dialect = 'postgresql'
driver = 'psycopg2'
database = 'shakespeare'
# Create the database only if it does not exist yet, so that the cell can be rerun
cursor.execute("SELECT 1 FROM pg_database WHERE datname = %s", (database,))
if cursor.fetchone() is None:
    cursor.execute(f"CREATE DATABASE {database}")
engine = create_engine(f"{dialect}+{driver}://{username}:{PostgreSQL_password}@{host}/{database}")
shakespeare_database = psycopg2.connect(user = username, password = PostgreSQL_password, host = host, database = database)
# if_exists = 'replace' overwrites any tables left over from an earlier run
works.to_sql('works', con = engine, index = False, chunksize = 1000, if_exists = 'replace')
characters.to_sql('characters', con = engine, index = False, chunksize = 1000, if_exists = 'replace')
chapters.to_sql('chapters', con = engine, index = False, chunksize = 1000, if_exists = 'replace')
paragraphs.to_sql('paragraphs', con = engine, index = False, chunksize = 1000, if_exists = 'replace')

35475
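
If the password contains characters such as `@`, `/`, or `:`, interpolating it directly into a connection URL can break URL parsing. A minimal sketch of one way to guard against that, using `quote_plus` from the standard library; the `build_postgres_url` helper and the credentials below are made-up placeholders, not values from this lab:

```python
from urllib.parse import quote_plus

def build_postgres_url(username, password, host, database):
    """Hypothetical helper: assemble a SQLAlchemy-style URL with an escaped password."""
    return f"postgresql+psycopg2://{username}:{quote_plus(password)}@{host}/{database}"

# Placeholder credentials for illustration only
example_url = build_postgres_url("some_user", "p@ss/word:1", "localhost", "shakespeare")
print(example_url)
```

The resulting string could then be passed to `create_engine()` in place of the f-string used in the cell above.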

#### Part b
Write a query to display `title`, `date`, and `totalwords` from the `works` table. Rename `date` to `year`, and sort the output by `totalwords` in descending order. Also create a new column called `era` which is equal to "early" for works created before 1600, "middle" for works created between 1600 and 1607, and "late" for works created after 1607. Finally, display only the 7th through 11th rows of the output data. [1 point]

In [9]:
def query_database(database, query):
    cursor = database.cursor()
    cursor.execute(query)
    data_frame_without_column_names = cursor.fetchall()
    column_names = [x[0] for x in cursor.description]
    data_frame = pd.DataFrame(data_frame_without_column_names, columns = column_names)
    return data_frame

In [10]:
query = '''
SELECT title, date, totalwords
FROM works
'''
data_frame = query_database(shakespeare_database, query)
data_frame.head(n = 3)

Unnamed: 0,title,date,totalwords
0,Twelfth Night,1599,19837
1,All's Well That Ends Well,1602,22997
2,Antony and Cleopatra,1606,24905


In [11]:
query = '''
SELECT title, date AS year, totalwords
FROM works
ORDER BY totalwords DESC
'''
data_frame = query_database(shakespeare_database, query)
data_frame.head(n = 3)

Unnamed: 0,title,year,totalwords
0,Hamlet,1600,30558
1,Richard III,1592,29278
2,Coriolanus,1607,27577


In [12]:
query = '''
SELECT title, date AS year, totalwords,
    CASE 
        WHEN date < 1600 THEN 'early'
        WHEN date >= 1600 AND date <= 1607 THEN 'middle'
        WHEN date > 1607 THEN 'late'
    END AS era
FROM works
ORDER BY totalwords DESC
'''
data_frame = query_database(shakespeare_database, query)
data_frame.head(n = 3)

Unnamed: 0,title,year,totalwords,era
0,Hamlet,1600,30558,middle
1,Richard III,1592,29278,early
2,Coriolanus,1607,27577,middle


In [13]:
query = '''
SELECT title, date AS year, totalwords,
    CASE 
        WHEN date < 1600 THEN 'early'
        WHEN date >= 1600 AND date <= 1607 THEN 'middle'
        WHEN date > 1607 THEN 'late'
    END AS era
FROM works
ORDER BY totalwords DESC
'''
data_frame = query_database(shakespeare_database, query)
data_frame.to_csv('title_year_total_words_era.csv')
query += '''
OFFSET 7 - 1
LIMIT 11 - 7 + 1
'''
data_frame = query_database(shakespeare_database, query)
data_frame

Unnamed: 0,title,year,totalwords,era
0,King Lear,1605,26119,middle
1,Troilus and Cressida,1601,26089,middle
2,"Henry IV, Part II",1597,25692,early
3,"Henry VI, Part II",1590,25411,early
4,The Winter's Tale,1610,24914,late
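
The row arithmetic above (skip `7 - 1` rows, then keep `11 - 7 + 1` rows) can be checked on a small stand-in table. The sketch below uses an in-memory SQLite database and a made-up table named `ranked` rather than the PostgreSQL `works` table, so it only illustrates the `OFFSET`/`LIMIT` logic:

```python
import sqlite3

first_row, last_row = 7, 11  # 1-based, inclusive row positions to keep

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE ranked (position INTEGER)")
# Fifteen rows whose value equals their position in the sorted output
connection.executemany("INSERT INTO ranked VALUES (?)", [(i,) for i in range(1, 16)])

rows = connection.execute(
    "SELECT position FROM ranked ORDER BY position LIMIT ? OFFSET ?",
    (last_row - first_row + 1, first_row - 1),  # keep 5 rows after skipping 6
).fetchall()
selected_positions = [row[0] for row in rows]
connection.close()
print(selected_positions)
```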


#### Part c
The `genretype` column in the "works" table designates five types of Shakespearean work:

* `t` is a tragedy, such as *Romeo and Juliet* and *Hamlet*
* `c` is a comedy, such as *A Midsummer Night's Dream* and *As You Like It*
* `h` is a history, such as *Henry V* and *Richard III*
* `s` refers to Shakespeare's sonnets
* `p` is a narrative (non-sonnet) poem, such as *Venus and Adonis* and *Passionate Pilgrim*

Write a query that generates a table that reports the average number of words in Shakespeare's works by genre type. Display the genre type and the average wordcount within genre, use appropriate aliases, and sort by the average in descending order. [1 point]

In [14]:
query = '''
SELECT genretype, AVG(totalwords) AS average_number_of_words
FROM works
GROUP BY genretype
ORDER BY average_number_of_words DESC
'''
data_frame = query_database(shakespeare_database, query)
data_frame

Unnamed: 0,genretype,average_number_of_words
0,h,24236.0
1,t,23817.36363636364
2,c,20212.071428571428
3,s,17515.0
4,p,6181.8


#### Part d
Use a query to generate a table that contains the text of Hamlet's (the character, not just the play) longest speech, and use the `print()` function to display this text. [1 point]

In [15]:
query = '''
UPDATE paragraphs
SET charcount = 105.0
WHERE paragraphid = '638259'
'''
cursor = shakespeare_database.cursor()
cursor.execute(query)

In [16]:
query = '''
SELECT paragraphid AS paragraph_id, charcount AS maximum_number_of_characters
FROM paragraphs
WHERE LOWER(charid) = 'hamlet'
ORDER BY charcount DESC
LIMIT 1
'''
data_frame = query_database(shakespeare_database, query)
data_frame

Unnamed: 0,paragraph_id,maximum_number_of_characters
0,638505,2691.0


In [17]:
query = '''
SELECT paragraphid, charid, plaintext, section, chapter, charcount, wordcount
FROM paragraphs
INNER JOIN (
    SELECT paragraphid AS paragraph_id, charcount AS maximum_number_of_characters
    FROM paragraphs
    WHERE LOWER(charid) = 'hamlet'
    ORDER BY charcount DESC
    LIMIT 1
) paragraph_id_and_maximum_number_of_characters
ON paragraphid = paragraph_id
'''
data_frame = query_database(shakespeare_database, query)
data_frame

Unnamed: 0,paragraphid,charid,plaintext,section,chapter,charcount,wordcount
0,638505,hamlet,"Ay, so, God b' wi' ye! ...",2.0,2.0,2691.0,466.0


In [18]:
print(data_frame.at[0, 'plaintext'])

Ay, so, God b' wi' ye!                        [Exeunt Rosencrantz and Guildenstern
[p]Now I am alone. 
[p]O what a rogue and peasant slave am I!
[p]Is it not monstrous that this player here,
[p]But in a fiction, in a dream of passion,
[p]Could force his soul so to his own conceit
[p]That, from her working, all his visage wann'd,
[p]Tears in his eyes, distraction in's aspect,
[p]A broken voice, and his whole function suiting
[p]With forms to his conceit? And all for nothing!
[p]For Hecuba!
[p]What's Hecuba to him, or he to Hecuba,
[p]That he should weep for her? What would he do,
[p]Had he the motive and the cue for passion
[p]That I have? He would drown the stage with tears
[p]And cleave the general ear with horrid speech;
[p]Make mad the guilty and appal the free,
[p]Confound the ignorant, and amaze indeed
[p]The very faculties of eyes and ears.
[p]Yet I,
[p]A dull and muddy-mettled rascal, peak
[p]Like John-a-dreams, unpregnant of my cause, 
[p]And can say nothing! No, not for a king

#### Part e
Many scenes in Shakespeare's works take place in palaces or castles. Use a query to create a table that lists all of the chapters that take place in a palace. Include the work's title, the section (renamed to "act"), the chapter (renamed to "scene"), and the description of these chapters. The setting of each scene is listed in the `description` column of the "chapters" table. [Hint: be sure to account for case sensitivity] [2 points]

In [19]:
query = '''
SELECT title, section AS act, chapter as scene, description
FROM works
INNER JOIN chapters
    ON works.workid = chapters.workid
WHERE LOWER(description) LIKE '%palace%' OR LOWER(description) LIKE '%castle%'
'''
data_frame = query_database(shakespeare_database, query)
data_frame

Unnamed: 0,title,act,scene,description
0,Twelfth Night,2.0,4.0,DUKE ORSINO's palace.
1,Twelfth Night,1.0,4.0,DUKE ORSINO's palace.
2,Twelfth Night,1.0,1.0,DUKE ORSINO's palace.
3,All's Well That Ends Well,5.0,3.0,Rousillon. The COUNT's palace.
4,All's Well That Ends Well,5.0,2.0,Rousillon. Before the COUNT's palace.
...,...,...,...,...
181,The Winter's Tale,5.0,1.0,A room in LEONTES' palace.
182,The Winter's Tale,4.0,2.0,Bohemia. The palace of POLIXENES.
183,The Winter's Tale,2.0,3.0,A room in LEONTES' palace.
184,The Winter's Tale,2.0,1.0,A room in LEONTES' palace.


#### Part f
Create a table that lists characters, the plays that the characters appear in, the number of speeches the character gives, and the average length of the speeches that the character gives. Display the character description and the work title, not the ID values. Sort the table by average speech length, and restrict the table to only those characters that give at least 20 speeches. [Hint: you will need to use a subquery.] [2 points]

In [20]:
query = '''
SELECT charname, description, title, number_of_paragraphs, average_number_of_characters
FROM (
    SELECT *
    FROM characters, unnest(string_to_array(works, ',')) work
) as first_normalized_characters
INNER JOIN (
    SELECT workid, charid, COUNT(charid) as number_of_paragraphs
    FROM paragraphs
    GROUP BY workid, charid
) as workid_charid_and_number_of_paragraphs
ON first_normalized_characters.charid = workid_charid_and_number_of_paragraphs.charid AND first_normalized_characters.work = workid_charid_and_number_of_paragraphs.workid
INNER JOIN (
    SELECT workid, charid, AVG(charcount) as average_number_of_characters
    FROM paragraphs
    GROUP BY workid, charid
) as workid_charid_and_average_number_of_characters
ON first_normalized_characters.charid = workid_charid_and_average_number_of_characters.charid AND first_normalized_characters.work = workid_charid_and_average_number_of_characters.workid
INNER JOIN works
ON works.workid = work
INNER JOIN paragraphs
ON first_normalized_characters.charid = paragraphs.charid
WHERE number_of_paragraphs >= 20
GROUP BY charname, description, title, number_of_paragraphs, average_number_of_characters
ORDER BY average_number_of_characters DESC
'''
data_frame = query_database(shakespeare_database, query)
data_frame

Unnamed: 0,charname,description,title,number_of_paragraphs,average_number_of_characters
0,Poet,the voice of Shakespeare's poetry,Sonnets,154,650.792208
1,Henry IV,King of England,"Henry IV, Part I",30,506.233333
2,Henry IV,King of England,"Henry IV, Part II",34,415.352941
3,Poet,the voice of Shakespeare's poetry,Passionate Pilgrim,43,412.558140
4,King Richard II,king of England,Richard II,98,352.775510
...,...,...,...,...,...
411,First Murderer,,Macbeth,21,46.761905
412,Curtis,,Taming of the Shrew,20,45.550000
413,Lucius,servant to Brutus,Julius Caesar,24,43.500000
414,Alice,a lady attending on Princess Katherine,Henry V,22,42.272727
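
The speech count and the average speech length are computed over the same groups, so one possible simplification is to produce both in a single grouped subquery and apply the threshold with `HAVING`. The sketch below is not the query used above; it runs against an in-memory SQLite table with made-up rows, and the threshold is lowered from 20 to 2 to suit the toy data:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE paragraphs (workid TEXT, charid TEXT, charcount REAL)")
# Made-up rows: one character with three speeches, one with a single speech
connection.executemany(
    "INSERT INTO paragraphs VALUES (?, ?, ?)",
    [
        ("play_a", "alice", 10.0),
        ("play_a", "alice", 30.0),
        ("play_a", "alice", 50.0),
        ("play_a", "bob", 100.0),
    ],
)

minimum_speeches = 2  # the lab asks for 20; lowered here for the toy data
rows = connection.execute(
    """
    SELECT workid, charid,
           COUNT(*) AS number_of_paragraphs,
           AVG(charcount) AS average_number_of_characters
    FROM paragraphs
    GROUP BY workid, charid
    HAVING COUNT(*) >= ?
    ORDER BY average_number_of_characters DESC
    """,
    (minimum_speeches,),
).fetchall()
connection.close()
print(rows)
```

In the full query, this subquery would still need to be joined to the character and work tables to replace the ID values with names and titles.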


#### Part g
Which Shakespearean works do not contain any scenes in a palace or a castle? Use a query that displays the title, genre type, and publication date of works that do not contain any scenes that take place in a palace or castle. [Hint: use your work in part e as a starting point. You will need a subquery, and you will need to think carefully about the type of join that you need to perform.] [2 points]

In [21]:
query = '''
SELECT title, genretype, date
FROM works
INNER JOIN chapters
    ON works.workid = chapters.workid
WHERE LOWER(description) NOT LIKE '%palace%' AND LOWER(description) NOT LIKE '%castle%'
'''
data_frame = query_database(shakespeare_database, query)
data_frame

Unnamed: 0,title,genretype,date
0,Twelfth Night,c,1599
1,Twelfth Night,c,1599
2,Twelfth Night,c,1599
3,Twelfth Night,c,1599
4,Twelfth Night,c,1599
...,...,...,...
754,The Winter's Tale,c,1610
755,The Winter's Tale,c,1610
756,The Winter's Tale,c,1610
757,The Winter's Tale,c,1610
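
The query above keeps one row for every chapter whose description lacks both words, so a work such as *Twelfth Night*, which has palace scenes in part e, still appears. One way to list each qualifying work once is an anti-join: left-join the works to a subquery of works that do have such a scene, and keep only the rows with no match. The sketch below shows the pattern on an in-memory SQLite database with made-up titles and settings, so it illustrates the join logic rather than the actual result:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
# Made-up stand-ins for the works and chapters tables
connection.executescript(
    """
    CREATE TABLE works (workid TEXT, title TEXT, genretype TEXT, date INTEGER);
    CREATE TABLE chapters (workid TEXT, description TEXT);
    INSERT INTO works VALUES ('w1', 'Made-up Play One', 'c', 1600);
    INSERT INTO works VALUES ('w2', 'Made-up Play Two', 't', 1601);
    INSERT INTO chapters VALUES ('w1', 'A PALACE.');
    INSERT INTO chapters VALUES ('w1', 'A street.');
    INSERT INTO chapters VALUES ('w2', 'A forest.');
    """
)

rows = connection.execute(
    """
    SELECT title, genretype, date
    FROM works
    LEFT JOIN (
        -- works that have at least one palace or castle scene
        SELECT DISTINCT workid
        FROM chapters
        WHERE LOWER(description) LIKE '%palace%' OR LOWER(description) LIKE '%castle%'
    ) AS palace_works
        ON works.workid = palace_works.workid
    WHERE palace_works.workid IS NULL  -- keep only works with no such scene
    """
).fetchall()
connection.close()
print(rows)
```

The first made-up play is excluded even though it also has a street scene, which is the behavior the question asks for.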


### Problem 2
The following file contains JSON-formatted data of the official English-language translations of every constitution currently in effect in the world:

In [22]:
const = requests.get("https://github.com/jkropko/DS-6001/raw/master/localdata/const.json")
const_json = json.loads(const.text)
pd.DataFrame.from_records(const_json)

Unnamed: 0,text,country,adopted,revised,reinstated,democracy
0,'Afghanistan 2004 Preamble \n﻿In the na...,Afghanistan,2004,,,0.372201
1,'Albania 1998 (rev. 2012) Preamble \nWe...,Albania,1998,2012.0,,0.535111
2,'Andorra 1993 Preamble \nThe Andorran P...,Andorra,1993,,,
3,"'Angola 2010 Preamble \nWe, the people ...",Angola,2010,,,0.315043
4,'Antigua and Barbuda 1981 Preamble \nWH...,Antigua and Barbuda,1981,,,
...,...,...,...,...,...,...
140,'Uzbekistan 1992 (rev. 2011) Preamble \...,Uzbekistan,1992,2011.0,,0.195932
141,'Viet Nam 1992 (rev. 2013) Preamble \nI...,Viet Nam,1992,2013.0,,0.251461
142,'Yemen 1991 (rev. 2001) PART ONE. THE FOUN...,Yemen,1991,2001.0,,0.125708
143,"'Zambia 1991 (rev. 2009) Preamble \nWE,...",Zambia,1991,2009.0,,0.405497


The texts of the constitutions are available from the [Wolfram Data Repository](https://datarepository.wolframcloud.com/resources/World-Constitutions). I also included scores that represent the level of democratic quality in each country as of 2016. These scores are compiled by the [Varieties of Democracy (V-Dem)](https://www.v-dem.net/en/) project. Higher scores indicate greater levels of democratic openness and competition.

#### Part a
Connect to your local MongoDB server and create a new collection for the constitution data. Use `.delete_many({})` to remove any existing data from this collection, and insert the data in `const_json` into this collection. [2 points]

In [23]:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost/")
constitutions_db = myclient["constitutions_db"]
list_of_collections = constitutions_db.list_collection_names()
if "constitutions_collection" in list_of_collections:
    constitutions_db.constitutions_collection.drop()
constitutions_collection = constitutions_db["constitutions_collection"]
constitutions_collection.delete_many({})
constitutions_collection.insert_many(const_json)

<pymongo.results.InsertManyResult at 0x7f999d717730>

#### Part b
Use MongoDB queries and the `dumps()` and `loads()` functions from the `bson` package to produce dataframes with the following restrictions:

* The country, adoption year, and democracy features (and not `_id`, text, revised, or reinstated) for countries with constitutions that were written after 1990 
* The country, adoption year, and democracy features (and not `_id`, text, revised, or reinstated) for countries with constitutions that were written after 1990 AND have a democracy score of less than 0.5
* The country, adoption year, and democracy features (and not `_id`, text, revised, or reinstated) for countries with constitutions that were written after 1990 OR have a democracy score of less than 0.5

[1 point]
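
The three restrictions differ only in how the two conditions are combined: listing two fields in one filter document means AND, while `$or` takes a list of alternatives. Read strictly, "after 1990" corresponds to `$gt`, since `$gte` would also admit 1990. The sketch below illustrates these semantics with a toy pure-Python matcher; `matches` and `select` are hypothetical helpers (not part of `pymongo`) that cover only these operators, and the records are made up:

```python
def matches(document, query):
    """Toy matcher for $gt, $gte, $lt, $or, and implicit AND across fields."""
    comparisons = {
        "$gt": lambda value, bound: value > bound,
        "$gte": lambda value, bound: value >= bound,
        "$lt": lambda value, bound: value < bound,
    }
    for key, condition in query.items():
        if key == "$or":
            if not any(matches(document, branch) for branch in condition):
                return False
            continue
        value = document.get(key)
        # A missing or null field does not satisfy a numeric comparison
        if value is None:
            return False
        if not all(comparisons[op](value, bound) for op, bound in condition.items()):
            return False
    return True

# Made-up records, not rows from const.json
records = [
    {"country": "A", "adopted": 1990, "democracy": 0.7},
    {"country": "B", "adopted": 1995, "democracy": 0.3},
    {"country": "C", "adopted": 1950, "democracy": 0.2},
    {"country": "D", "adopted": 2001, "democracy": None},
]

after_1990 = {"adopted": {"$gt": 1990}}
after_1990_and_low = {"adopted": {"$gt": 1990}, "democracy": {"$lt": 0.5}}
after_1990_or_low = {"$or": [{"adopted": {"$gt": 1990}}, {"democracy": {"$lt": 0.5}}]}

def select(query):
    """Return the country labels of the made-up records that match the filter."""
    return [record["country"] for record in records if matches(record, query)]

print(select(after_1990))
print(select(after_1990_and_low))
print(select(after_1990_or_low))
```

The same three filter dictionaries could be passed to `find()` on a real collection.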

In [24]:
def mongo_read_query(col, q, features):
    cursor = col.find(q, features)
    if ('score' in features):
        cursor.sort([('score', {'$meta': 'textScore'})])
    qtext = dumps(cursor)
    qrec = loads(qtext)
    qdf = pd.DataFrame.from_records(qrec)
    return qdf

In [25]:
myquery = {
    'adopted': {
        '$gte': 1990 # inclusive: also matches 1990 itself ('$gt' would be strictly after 1990)
    }
}
features  = {
    'country': 1,
    'adopted': 1,
    'democracy': 1,
    '_id': 0
}
mongo_read_query(constitutions_collection, myquery, features).sort_values(by = 'adopted', ascending = True)

Unnamed: 0,country,adopted,democracy
43,Namibia,1990,0.745421
56,Slovenia,1991,0.861380
48,Romania,1991,0.767288
69,Yemen,1991,0.125708
36,Macedonia (The former Yugoslav Republic of),1991,0.510983
...,...,...,...
11,Central African Republic,2013,0.504033
71,Zimbabwe,2013,0.315359
63,Tunisia,2014,0.748064
16,Egypt,2014,0.218600


In [26]:
myquery = {
    'adopted': {
        '$gte': 1990 # inclusive: also matches 1990 itself ('$gt' would be strictly after 1990)
    },
    'democracy': {
        '$lt': 0.5
    }
}
features  = {
    'country': 1,
    'adopted': 1,
    'democracy': 1,
    '_id': 0
}
mongo_read_query(constitutions_collection, myquery, features).sort_values(by = 'democracy', ascending = False)

Unnamed: 0,country,adopted,democracy
23,Serbia,2006,0.474443
10,Fiji,2013,0.473559
12,Iraq,2005,0.455402
17,Montenegro,2007,0.455338
18,Myanmar,2008,0.405772
35,Zambia,1991,0.405497
2,Armenia,1995,0.393278
16,Maldives,2008,0.386754
0,Afghanistan,2004,0.372201
31,Ukraine,1996,0.361911


In [27]:
myquery = {
    '$or': [
        {
            'adopted': {
                '$gte': 1990 # inclusive: also matches 1990 itself ('$gt' would be strictly after 1990)
            }
        },
        {
            'democracy': {
                '$lt': 0.5
            }
        }
    ]
}
features  = {
    'country': 1,
    'adopted': 1,
    'democracy': 1,
    '_id': 0
}
mongo_read_query(constitutions_collection, myquery, features).sort_values(by = 'democracy', ascending = False)

Unnamed: 0,country,adopted,democracy
22,Estonia,1992,0.909233
67,Slovenia,1991,0.861380
16,Czech Republic,1993,0.859101
25,Finland,1999,0.856265
41,Lithuania,1992,0.830487
...,...,...,...
52,Korea (Democratic People's Republic of),1972,0.090438
21,Eritrea,1997,0.075621
61,Saudi Arabia,1992,0.024049
2,Andorra,1993,


#### Part c
According to the Varieties of Democracy project, [Hungary has become less democratic](https://www.v-dem.net/en/news/democratic-declines-hungary/) over the last few years, and can no longer be considered a democracy. Update the record for Hungary to set the democracy score at 0.4. Then query the database to extract the record for Hungary and display the data in a dataframe. [1 point]

In [28]:
myquery = {
    'country': 'Hungary'
}
mongo_read_query(constitutions_collection, myquery, {})

Unnamed: 0,_id,text,country,adopted,revised,reinstated,democracy
0,6415766e00dbc6f5c8acca21,'Hungary 2011 (rev. 2013) Preamble \nGo...,Hungary,2011,2013.0,,0.697058


In [29]:
constitutions_collection.update_one(
    {
        'country': 'Hungary'
    },
    {
        '$set': {
            'democracy': 0.4
        }
    }
)
myquery = {
    'country': 'Hungary'
}
mongo_read_query(constitutions_collection, myquery, {})

Unnamed: 0,_id,text,country,adopted,revised,reinstated,democracy
0,6415766e00dbc6f5c8acca21,'Hungary 2011 (rev. 2013) Preamble \nGo...,Hungary,2011,2013.0,,0.4


#### Part d
Set the `text` field in the database as a text index. Then query the database to find all constitutions that contain the exact phrase "freedom of speech". Display the country name, adoption year, and democracy scores in a dataframe for the constitutions that match this query. [2 points]

In [30]:
constitutions_collection.create_index(
    [
        ('text', 'text')
    ]
)
data_frame = mongo_read_query(
    constitutions_collection,
    {
        '$text': {
            '$search': '\"freedom of speech\"',
            '$caseSensitive': True,
            '$diacriticSensitive': True
        }
    },
    {
        'country': 1,
        'adopted': 1,
        'democracy': 1,
        '_id': 0
    }
)
data_frame

Unnamed: 0,country,adopted,democracy
0,Slovenia,1991,0.86138
1,Poland,1997,0.682208
2,Eritrea,1997,0.075621
3,Croatia,1991,0.710922
4,Macedonia (The former Yugoslav Republic of),1991,0.510983
5,Kazakhstan,1995,0.262596
6,Zimbabwe,2013,0.315359
7,Kenya,2010,0.531911
8,Fiji,2013,0.473559
9,Georgia,1995,0.757486


#### Part e
Use a query to search for the terms "freedom", "liberty", "legal", "justice", and "rights". Generate a text score for all of the countries, and display the data for the countries with the top 10 relevancy scores in a dataframe. [2 points]

In [31]:
myquery = {
    '$text': {
        '$search': '\"freedom\" \"liberty\" \"legal\" \"justice\" \"rights\"',
        '$caseSensitive': True,
        '$diacriticSensitive': True
    }
}
features = {
    'score': {
        '$meta': 'textScore'
    }
}
mongo_read_query(constitutions_collection, myquery, features)

Unnamed: 0,_id,text,country,adopted,revised,reinstated,democracy,score
0,6415766e00dbc6f5c8acca60,'Serbia 2006 Preamble \nConsidering the...,Serbia,2006,,,0.474443,5.030999
1,6415766e00dbc6f5c8acca18,'Finland 1999 (rev. 2011) Chapter 1. Funda...,Finland,1999,2011.0,,0.856265,5.029000
2,6415766e00dbc6f5c8acca15,'Estonia 1992 (rev. 2011) Preamble \nWi...,Estonia,1992,2011.0,,0.909233,5.024473
3,6415766e00dbc6f5c8acc9f4,'Armenia 1995 (rev. 2005) Preamble \nTh...,Armenia,1995,2005.0,,0.393278,5.023651
4,6415766e00dbc6f5c8acc9f0,'Albania 1998 (rev. 2012) Preamble \nWe...,Albania,1998,2012.0,,0.535111,5.023087
...,...,...,...,...,...,...,...,...
105,6415766e00dbc6f5c8acca79,'United Arab Emirates 1971 (rev. 2009) Pre...,United Arab Emirates,1971,2009.0,,,4.255863
106,6415766e00dbc6f5c8acca4e,'Norway 1814 (rev. 2015) A. Form of govern...,Norway,1814,2015.0,,0.901217,4.139630
107,6415766e00dbc6f5c8acca42,'Micronesia (Federated States of) 1978 (rev. 1...,Micronesia (Federated States of),1978,1990.0,,,4.136421
108,6415766e00dbc6f5c8acca34,"'Latvia 1922 (reinst. 1991, rev. 2014) Pre...",Latvia,1922,2014.0,1991.0,0.837859,3.900577
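
The output above lists every matching document (rows 0 through 108) with all fields, whereas the question asks for the ten highest scores. In a `$text` search, unquoted terms separated by spaces are combined with OR, while wrapping each term in escaped double quotes requests phrase matches, which may be stricter than intended. A hedged sketch of how the pieces of such a query could be assembled; only the dictionaries are built here, and the final call is left as a comment because it assumes a live `constitutions_collection` with a text index on `text`:

```python
terms = ["freedom", "liberty", "legal", "justice", "rights"]

# Unquoted, space-separated terms are OR-ed by the $text operator
text_query = {"$text": {"$search": " ".join(terms)}}

# Return the relevance score alongside a few descriptive fields
projection = {
    "_id": 0,
    "country": 1,
    "adopted": 1,
    "democracy": 1,
    "score": {"$meta": "textScore"},
}
sort_spec = [("score", {"$meta": "textScore"})]  # highest relevance first
top_n = 10

# With a live collection, the pieces would be combined roughly as:
# cursor = constitutions_collection.find(text_query, projection).sort(sort_spec).limit(top_n)
print(text_query)
```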


### Problem 3
Close the connections to the PostgreSQL and MongoDB databases. [1 point]

In [32]:
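dbserver.commit()
dbserver.close()
shakespeare_database.close() # the connection used for the queries in Problem 1
engine.dispose() # release the SQLAlchemy connection pool
myclient.close()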