# Database Admin 101 - Lab

## Introduction 

In this lab, you'll go through the process of designing and creating a database. From there, you'll begin to populate it with mock data provided to you.

## Objectives

You will be able to:

* Use knowledge of the structure of databases to create a database and populate it

## The Scenario

You are looking to design a database for a school that will house various information from student grades to contact information, class roster lists and attendance. First, think of how you would design such a database. What tables would you include? What columns would each table have? What would be the primary means to join said tables?
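
As one way to think about the design questions above, here is a minimal sketch of a possible layout. The table names (`people`, `courses`, `enrollments`), their columns, and the sample row are all hypothetical and are not part of the lab's data. The point is only that integer `id` columns can serve as the primary means of joining tables.

```python
import sqlite3

# Hypothetical schema sketch: every table and column name here is illustrative only
conn_demo = sqlite3.connect(":memory:")
conn_demo.executescript("""
    CREATE TABLE people (
        id INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name TEXT,
        role TEXT
    );
    CREATE TABLE courses (
        id INTEGER PRIMARY KEY,
        title TEXT
    );
    CREATE TABLE enrollments (
        person_id INTEGER NOT NULL REFERENCES people(id),
        course_id INTEGER NOT NULL REFERENCES courses(id),
        PRIMARY KEY (person_id, course_id)
    );
""")

# Made-up rows, used only to show the join
conn_demo.execute("INSERT INTO people VALUES (1, 'Ada', 'Example', 'student')")
conn_demo.execute("INSERT INTO courses VALUES (10, 'Algebra')")
conn_demo.execute("INSERT INTO enrollments VALUES (1, 10)")

# A class roster is then a join across the shared id columns
roster = conn_demo.execute("""
    SELECT p.first_name, c.title
    FROM enrollments AS e
    JOIN people AS p ON p.id = e.person_id
    JOIN courses AS c ON c.id = e.course_id
""").fetchall()
print(roster)
```

Your own design may reasonably differ, for example by adding separate tables for attendance or grades.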

## Creating the Database

Now that you've put a little thought into how you might design your database, it's time to go ahead and create it! Start by importing the necessary packages. Then, create a database called **school.sqlite**.

In [2]:
# Import necessary packages
import pandas as pd
import sqlite3


In [3]:
# Create the database school.sqlite 
conn = sqlite3.connect('school.sqlite')
cur = conn.cursor()

In [4]:
ls

CONTRIBUTING.md      README.md            index.ipynb
LICENSE.md           contact_list.pickle  school.sqlite


## Create a Table for Contact Information

Create a table called contactInfo to house contact information for both students and staff. Be sure to include columns for first name, last name, role (student/staff), telephone number, street, city, state, and zipcode. Be sure to also create a primary key for the table. 

In [6]:
cur.execute("""DROP TABLE IF EXISTS contactInfo ;""")

<sqlite3.Cursor at 0x7fc43c3a5570>

In [7]:
# Your code here
sql_str = """CREATE TABLE IF NOT EXISTS contactInfo
             (id INTEGER PRIMARY KEY,
              first_name TEXT,
              last_name TEXT,
              role TEXT,
              phone_num INTEGER,
              street TEXT,
              city TEXT,
              state TEXT,
              zipcode TEXT)
          ;"""

cur.execute(sql_str)

<sqlite3.Cursor at 0x7fc43c3a5570>
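
If you want to confirm that a table was created with the columns and primary key you intended, SQLite's `PRAGMA table_info` is one option. The sketch below runs against a throwaway in-memory database (the `demo_conn` and `demo_cur` names are placeholders), so it does not touch **school.sqlite**.

```python
import sqlite3

# Throwaway in-memory database so the real school.sqlite file is untouched
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("""CREATE TABLE contactInfo
                    (id INTEGER PRIMARY KEY,
                     first_name TEXT,
                     last_name TEXT,
                     role TEXT,
                     phone_num INTEGER,
                     street TEXT,
                     city TEXT,
                     state TEXT,
                     zipcode TEXT)""")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = demo_cur.execute("PRAGMA table_info(contactInfo)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(name, col_type, "PRIMARY KEY" if pk else "")
```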

## Populate the Table

Below, code is provided for you in order to load a list of dictionaries. Briefly examine the list. Each dictionary in the list will serve as an entry for your contact info table. Once you've briefly investigated the structure of this data, write a for loop to iterate through the list and create an entry in your table for each person's contact info.

In [12]:
# Load the list of dictionaries; just run this cell
import pickle

with open('contact_list.pickle', 'rb') as f:
    contacts = pickle.load(f)

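In [14]:
# Iterate over the contact list and populate the contactInfo table here

# Build a comma-separated string of the table's column names,
# skipping the auto-generated id column
cur.execute("""SELECT * FROM contactInfo LIMIT 1""")
col_list = [x[0] for x in cur.description]
col_str = ', '.join(col_list[1:])

In [21]:
# Insert one row per contact. This assumes each dictionary lists its values
# in the same order as the table columns. The ? placeholders let sqlite3
# bind the values, so names containing quotes do not break the statement.
for dict_r in contacts:
    row_list = list(dict_r.values())
    placeholders = ', '.join('?' for _ in row_list)
    sql_str = f"""INSERT INTO contactInfo
                 ({col_str}) 
                 VALUES ({placeholders})"""
    #print(sql_str)
    cur.execute(sql_str, row_list)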
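
When the dictionary keys are known, named placeholders with `executemany` are another way to load a list of dictionaries. The sketch below is illustrative: the records are made up, and it assumes the keys match the column names, which may not be true of the dictionaries in `contact_list.pickle`.

```python
import sqlite3

# Throwaway in-memory database with a reduced version of the table
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("""CREATE TABLE contactInfo
                    (id INTEGER PRIMARY KEY,
                     first_name TEXT,
                     last_name TEXT,
                     role TEXT)""")

# Made-up records; the keys are assumed to match the column names
demo_contacts = [
    {"first_name": "Pat", "last_name": "O'Neil", "role": "staff"},
    {"first_name": "Sam", "last_name": "Rivera", "role": "student"},
]

# Named placeholders (:key) are filled from each dictionary, and sqlite3
# handles quoting, so the apostrophe in O'Neil is stored as-is
demo_cur.executemany(
    """INSERT INTO contactInfo (first_name, last_name, role)
       VALUES (:first_name, :last_name, :role)""",
    demo_contacts,
)
demo_conn.commit()

inserted = demo_cur.execute(
    "SELECT id, last_name FROM contactInfo ORDER BY id"
).fetchall()
print(inserted)
```

Letting the driver bind values also avoids building SQL from raw strings, which is a common source of injection bugs.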

## Query the Table to Ensure It Is Populated

In [22]:
# Your code here 
df = pd.DataFrame(cur.execute("""SELECT * FROM contactInfo""")\
                 .fetchall())
df.columns = [x[0] for x in cur.description]

df

|   | id | first_name | last_name | role | phone_num | street | city | state | zipcode |
|---|----|------------|-----------|------|-----------|--------|------|-------|---------|
| 0 | 1 | Christine | Holden | staff | 2035687697 | 1672 Whitman Court | Stamford | CT | 6995 |
| 1 | 2 | Christopher | Warren | student | 2175150957 | 1935 University Hill Road | Champaign | IL | 61938 |
| 2 | 3 | Linda | Jacobson | staff | 4049446441 | 479 Musgrave Street | Atlanta | GA | 30303 |
| 3 | 4 | Andrew | Stepp | student | 7866419252 | 2981 Lamberts Branch Road | Hialeah | Fl | 33012 |
| 4 | 5 | Jane | Evans | student | 3259909290 | 1461 Briarhill Lane | Abilene | TX | 79602 |
| 5 | 7 | Mary | Raines | student | 9075772295 | 3975 Jerry Toth Drive | Ninilchik | AK | 99639 |
| 6 | 8 | Ed | Lyman | student | 5179695576 | 3478 Be Sreet | Lansing | MI | 48933 |
| 7 | 9 | Christine | Holden | staff | 2035687697 | 1672 Whitman Court | Stamford | CT | 6995 |
| 8 | 10 | Christopher | Warren | student | 2175150957 | 1935 University Hill Road | Champaign | IL | 61938 |
| 9 | 11 | Linda | Jacobson | staff | 4049446441 | 479 Musgrave Street | Atlanta | GA | 30303 |
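
pandas can also run the query and label the columns in one step with `pd.read_sql_query`. The following is a small sketch on a throwaway in-memory table with made-up rows; `demo_conn` and `demo_df` are placeholder names.

```python
import sqlite3
import pandas as pd

# Throwaway in-memory database with a reduced version of the table
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("""CREATE TABLE contactInfo
                     (id INTEGER PRIMARY KEY, first_name TEXT, role TEXT)""")
demo_conn.executemany(
    "INSERT INTO contactInfo (first_name, role) VALUES (?, ?)",
    [("Pat", "staff"), ("Sam", "student")],  # made-up rows
)

# read_sql_query runs the SELECT and uses the column names as DataFrame headers
demo_df = pd.read_sql_query("SELECT * FROM contactInfo", demo_conn)
print(demo_df)
```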


## Commit Your Changes to the Database

Persist your changes by committing them to the database.

In [26]:
# Your code here
conn.commit()
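
As an alternative to calling `commit()` by hand, a `sqlite3` connection can be used as a context manager: the transaction is committed if the block finishes normally and rolled back if an exception escapes it. The sketch below uses a hypothetical `notes` table in an in-memory database to show both outcomes.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

# The block succeeds, so the insert is committed on exit
with demo_conn:
    demo_conn.execute("INSERT INTO notes (body) VALUES (?)", ("saved",))

# The second insert violates NOT NULL, so the whole block is rolled back
try:
    with demo_conn:
        demo_conn.execute("INSERT INTO notes (body) VALUES (?)", ("discarded",))
        demo_conn.execute("INSERT INTO notes (body) VALUES (?)", (None,))
except sqlite3.IntegrityError:
    pass

remaining = demo_conn.execute("SELECT body FROM notes").fetchall()
print(remaining)
```

Note that the `with` block manages the transaction only; it does not close the connection.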

## Create a Table for Student Grades

Create a new table in the database called "grades". In the table, include the following fields: userId, courseId, grade.

**Note:** This problem is a bit trickier and requires a composite primary key, that is, a primary key made up of more than one column (a nuance you have yet to see).
Here's how to do that:

```SQL
CREATE TABLE table_name(
   column_1 INTEGER NOT NULL,
   column_2 INTEGER NOT NULL,
   ...
   PRIMARY KEY(column_1,column_2,...)
);
```

In [17]:
# Create the grades table
sql_str = """CREATE TABLE IF NOT EXISTS grades 
            (userId INTEGER NOT NULL,
            courseId INTEGER NOT NULL,
            grade INTEGER,
            PRIMARY KEY(userId, courseId)
            )"""

cur.execute(sql_str)

<sqlite3.Cursor at 0x7fc43c3a5570>
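
To see what a composite primary key enforces, the sketch below recreates the `grades` table in a throwaway in-memory database and inserts made-up rows. The same user may appear in several courses, but repeating a `(userId, courseId)` pair raises `sqlite3.IntegrityError`.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("""CREATE TABLE grades
                     (userId INTEGER NOT NULL,
                      courseId INTEGER NOT NULL,
                      grade INTEGER,
                      PRIMARY KEY (userId, courseId))""")

# Made-up rows: same user, two different courses, so both are accepted
demo_conn.execute("INSERT INTO grades VALUES (1, 101, 90)")
demo_conn.execute("INSERT INTO grades VALUES (1, 102, 85)")

# Repeating the (userId, courseId) pair violates the composite key
try:
    demo_conn.execute("INSERT INTO grades VALUES (1, 101, 70)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = demo_conn.execute("SELECT COUNT(*) FROM grades").fetchone()[0]
print(duplicate_rejected, row_count)
```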

## Remove Duplicate Entries

An analyst just realized that there is a duplicate entry in the contactInfo table! Find and remove it.

In [19]:
# This is the listed solution: find the duplicated phone number, then
#   DELETE from the table WHERE phone_num = the duplicated value.

# But that DELETE would remove BOTH of Jane's entries. We want to keep one.

cur.execute("""SELECT first_name, last_name, phone_num, COUNT(*)
               FROM contactInfo
               GROUP BY first_name, last_name, phone_num
               HAVING COUNT(*) > 1;""").fetchall()

[('Jane', 'Evans', 3259909290, 2)]

In [23]:
# My proposed solution, which uses pandas to determine a list of the duplicate
#   id values, which can be fed into a SQL query and used to delete just the
#   duplicates.

# Run the INSERT INTO cell multiple times to get more duplicates, to show
# how this would be useful when there is more than a single duplicate

# Load a dataframe of everything
df = pd.DataFrame(cur.execute("""SELECT * FROM contactInfo""")\
                .fetchall())
df.columns = [x[0] for x in cur.description]

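# get a df representing only the duplicates to be dropped
# (duplicated() marks every repeat after the first occurrence)
dupes_df = df[df.drop(columns=['id']).duplicated()]

# create a string of the list of duplicate ids
# (tolist() yields plain Python ints, which print as bare numbers)
str_dupe_ids = str(dupes_df['id'].tolist())[1:-1]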

# use the duplicate id string to drop the duplicate rows
cur.execute(f"""DELETE FROM contactInfo 
               WHERE id IN ({str_dupe_ids});""")

str_dupe_ids

'9, 10, 11, 12, 13, 14, 15, 16'

In [24]:
# Delete the duplicate entry
cur.execute(f"""DELETE FROM contactInfo 
               WHERE id IN ({str_dupe_ids});""")

<sqlite3.Cursor at 0x7fc43c3a5570>

In [25]:
# Check that the duplicate entry was removed
df = pd.DataFrame(cur.execute("""SELECT * FROM contactInfo;""").fetchall())

df.columns = [x[0] for x in cur.description]

df

|   | id | first_name | last_name | role | phone_num | street | city | state | zipcode |
|---|----|------------|-----------|------|-----------|--------|------|-------|---------|
| 0 | 1 | Christine | Holden | staff | 2035687697 | 1672 Whitman Court | Stamford | CT | 6995 |
| 1 | 2 | Christopher | Warren | student | 2175150957 | 1935 University Hill Road | Champaign | IL | 61938 |
| 2 | 3 | Linda | Jacobson | staff | 4049446441 | 479 Musgrave Street | Atlanta | GA | 30303 |
| 3 | 4 | Andrew | Stepp | student | 7866419252 | 2981 Lamberts Branch Road | Hialeah | Fl | 33012 |
| 4 | 5 | Jane | Evans | student | 3259909290 | 1461 Briarhill Lane | Abilene | TX | 79602 |
| 5 | 7 | Mary | Raines | student | 9075772295 | 3975 Jerry Toth Drive | Ninilchik | AK | 99639 |
| 6 | 8 | Ed | Lyman | student | 5179695576 | 3478 Be Sreet | Lansing | MI | 48933 |
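
The same cleanup can also be done without leaving SQL. The sketch below is one possible approach, shown on a throwaway in-memory table with made-up rows: it keeps the lowest `id` in each group of identical contacts and deletes the rest. Which columns define a "duplicate" is an assumption you would adjust for the real table.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("""CREATE TABLE contactInfo
                     (id INTEGER PRIMARY KEY,
                      first_name TEXT,
                      last_name TEXT,
                      phone_num INTEGER)""")

# Made-up rows: the Pat Lee record appears three times
demo_conn.executemany(
    "INSERT INTO contactInfo (first_name, last_name, phone_num) VALUES (?, ?, ?)",
    [
        ("Pat", "Lee", 5550001),
        ("Sam", "Rivera", 5550002),
        ("Pat", "Lee", 5550001),
        ("Pat", "Lee", 5550001),
    ],
)

# Keep the smallest id per group of identical contacts; delete every other row
demo_conn.execute("""DELETE FROM contactInfo
                     WHERE id NOT IN (SELECT MIN(id)
                                      FROM contactInfo
                                      GROUP BY first_name, last_name, phone_num)""")

kept_ids = [row[0] for row in demo_conn.execute("SELECT id FROM contactInfo ORDER BY id")]
print(kept_ids)
```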


## Updating an Address

Ed Lyman just moved to `2910 Simpson Avenue, York, PA 17403`. Update his address accordingly.

In [38]:
# Update Ed's address (the city belongs in its own column, not in street)
cur.execute("""UPDATE contactInfo 
                SET street = '2910 Simpson Avenue'
                    , city = 'York'
                    , state = 'PA'
                    , zipcode = '17403'
               WHERE id = 8""")

<sqlite3.Cursor at 0x7fabb4b01a40>

In [39]:
# Query the database to ensure the change was made
df = pd.DataFrame(cur.execute("""SELECT * FROM contactInfo""")\
                .fetchall())
df.columns = [x[0] for x in cur.description]

df

|   | id | first_name | last_name | role | phone_num | street | city | state | zipcode |
|---|----|------------|-----------|------|-----------|--------|------|-------|---------|
| 0 | 1 | Christine | Holden | staff | 2035687697 | 1672 Whitman Court | Stamford | CT | 6995 |
| 1 | 2 | Christopher | Warren | student | 2175150957 | 1935 University Hill Road | Champaign | IL | 61938 |
| 2 | 3 | Linda | Jacobson | staff | 4049446441 | 479 Musgrave Street | Atlanta | GA | 30303 |
| 3 | 4 | Andrew | Stepp | student | 7866419252 | 2981 Lamberts Branch Road | Hialeah | Fl | 33012 |
| 4 | 5 | Jane | Evans | student | 3259909290 | 1461 Briarhill Lane | Abilene | TX | 79602 |
| 5 | 7 | Mary | Raines | student | 9075772295 | 3975 Jerry Toth Drive | Ninilchik | AK | 99639 |
| 6 | 8 | Ed | Lyman | student | 5179695576 | 2910 Simpson Avenue | York | PA | 17403 |
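
An `UPDATE` can take bound parameters just like an `INSERT`, and `cursor.rowcount` reports how many rows were changed, which is a quick check that the `WHERE` clause matched what you expected. The sketch below uses a throwaway in-memory table and a made-up address; `new_address` is a hypothetical dictionary.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("""CREATE TABLE contactInfo
                    (id INTEGER PRIMARY KEY, first_name TEXT, street TEXT, city TEXT)""")
demo_cur.execute(
    "INSERT INTO contactInfo (first_name, street, city) VALUES (?, ?, ?)",
    ("Pat", "1 Old Road", "Oldtown"),  # made-up row
)

# Hypothetical new values, bound by name rather than pasted into the SQL text
new_address = {"id": 1, "street": "2 New Street", "city": "Newtown"}
demo_cur.execute(
    "UPDATE contactInfo SET street = :street, city = :city WHERE id = :id",
    new_address,
)

# rowcount is read right after the UPDATE, before running another statement
rows_changed = demo_cur.rowcount
updated = demo_cur.execute(
    "SELECT street, city FROM contactInfo WHERE id = 1"
).fetchone()
print(rows_changed, updated)
```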


## Commit Your Changes to the Database

Once again, persist your changes by committing them to the database.

In [40]:
# Your code here
conn.commit()

## Summary

While there's certainly more to do with setting up and managing this database, you got a taste for creating, populating, and maintaining databases! Feel free to continue fleshing out this exercise for more practice. 