# Assignment 4. Database Design

## Objectives

This assignment has two parts.

* In Part 1, you will practice drawing an E/R diagram (Task 1) and transforming it into relational schemas (Task 2).
* In Part 2, you will practice important techniques for database normalization (Tasks 3-5).

Download [A4.zip](A4.zip). Answer the questions in A4.ipynb.

## Part 1. Entity-Relationship Model (10 points)

You will design a database for SFU. This database will include information about departments, students, courses (and their offerings):

* Information about **students** includes their SID, name and age. The SID of a student is assumed to be unique, not shared by any other student. Each student is either a **graduate** or an **undergraduate**.
 - Each student must be in one category or the other, and cannot be in both categories simultaneously.
 - For graduate students, we record what their research field is.
 - For undergraduate students, we record their concentration.
 
 
 
* Information about **departments** includes their name and address. The name of a department is assumed to be unique, not shared by any other department.



* We need to be able to associate students with the departments with which they are affiliated. Each student has to be affiliated with exactly one department.



* Information about a course includes its number (e.g., "354"), name (e.g., "Introduction to Databases"), and capacity (e.g., 110). We also need to be able to know the unique department that owns each course: no cross-listing of courses across departments is allowed, and every course is owned by exactly one department.
 * Note: you cannot assume that course number uniquely identifies a course; in fact, you cannot assume even that course number together with course name uniquely identify a course. However, course number uniquely identifies courses within a department.
 
 
 
* We also need to record all terms -- identified by semester (e.g., "fall") and year (e.g., "2018") -- in which each course has been offered in the history of the university.



* Assume that a course offered during a term has at least one student enrolled. Also, a course is offered at most once during each term; in other words, a course cannot have multiple sections during one term.



* Finally, assume that a student can take courses “owned” by departments with which the student is not affiliated. Every student must be enrolled in at least one course.





### Task 1: E/R Diagram (5 points) 

Render the SFU database in the version of the E/R model that we studied in class, with *exactly* the constraints and requirements specified above.


<img src="ER-diagram.png" alt="Drawing" style="width: 800px;"/>

### Task 2: From E/R Diagram to Relational Schemas (5 points)

Following the E/R diagram above, write SQL statements that create the required tables in `sfu.db`.

In [1]:
%load_ext sql

%sql sqlite:///sfu.db

u'Connected: @sfu.db'

In [2]:
%%sql

CREATE TABLE students
(
    SID integer,
    name char(20),
    age integer,
    primary key (SID)
);

 * sqlite:///sfu.db
Done.


[]

In [3]:
%%sql

CREATE TABLE undergraduate
(
    studentid integer,
    concentration char(10),
    primary key (studentid),
    foreign key (studentid) references students(SID)
);

 * sqlite:///sfu.db
Done.


[]

In [4]:
%%sql

CREATE TABLE graduate
(
    studentid integer,
    research char(10),
    primary key (studentid),
    foreign key (studentid) references students(SID)
);

 * sqlite:///sfu.db
Done.


[]

In [5]:
%%sql

CREATE TABLE departments
(
    name char(10),
    address char(20),
    primary Key (name)
);

 * sqlite:///sfu.db
Done.


[]

In [6]:
%%sql

CREATE TABLE affiliated
(
    name char(10),
    studentid integer,
    -- each student is affiliated with exactly one department, so studentid alone is the key
    primary key (studentid),
    foreign key (studentid) references students(SID),
    foreign key (name) references departments(name)
);

 * sqlite:///sfu.db
Done.


[]

In [7]:
%%sql

CREATE TABLE courses
(
    dname char(10),
    name char(20),
    number integer,
    capacity integer,
    primary key (number, dname),
    foreign key (dname) references departments(name)
);

 * sqlite:///sfu.db
Done.


[]

In [8]:
%%sql

CREATE TABLE terms
(
    semester char(10),
    year integer,
    primary Key (semester, year)
);

 * sqlite:///sfu.db
Done.


[]

In [9]:
%%sql

CREATE TABLE offered
(
    department_name char(10),
    course_number integer,
    semester char(10),
    year integer,
    primary key (department_name, course_number, semester, year),
    foreign key (department_name) references departments(name),
    -- a course number is unique only within a department, so reference the composite key
    foreign key (course_number, department_name) references courses(number, dname),
    foreign key (semester, year) references terms(semester, year)
);

 * sqlite:///sfu.db
Done.


[]

In [10]:
%%sql

CREATE TABLE enrolled
(
    department_name char(10),
    course_number integer,
    studentid integer,
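    primary key (department_name, course_number, studentid),
    foreign key (department_name) references departments(name),
    -- a course number is unique only within a department, so reference the composite key
    foreign key (course_number, department_name) references courses(number, dname),
    foreign key (studentid) references students(SID)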
);

 * sqlite:///sfu.db
Done.


[]
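
Because a course number is unique only within a department, a foreign key into `courses` has to cover both columns of its primary key. The following is a minimal sketch of that behaviour, not part of the graded solution. It uses an in-memory SQLite database rather than `sfu.db`, and the department names and addresses are hypothetical sample values. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

# In-memory database so the sketch never touches sfu.db.
conn = sqlite3.connect(":memory:")
# SQLite ignores foreign keys unless enforcement is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE departments (name CHAR(10) PRIMARY KEY, address CHAR(20));
CREATE TABLE courses (
    dname CHAR(10),
    number INTEGER,
    name CHAR(20),
    capacity INTEGER,
    PRIMARY KEY (number, dname),
    FOREIGN KEY (dname) REFERENCES departments(name)
);
CREATE TABLE terms (
    semester CHAR(10),
    year INTEGER,
    PRIMARY KEY (semester, year)
);
CREATE TABLE offered (
    department_name CHAR(10),
    course_number INTEGER,
    semester CHAR(10),
    year INTEGER,
    PRIMARY KEY (department_name, course_number, semester, year),
    FOREIGN KEY (course_number, department_name) REFERENCES courses(number, dname),
    FOREIGN KEY (semester, year) REFERENCES terms(semester, year)
);
-- Hypothetical sample rows.
INSERT INTO departments VALUES ('CMPT', 'placeholder address 1');
INSERT INTO departments VALUES ('MATH', 'placeholder address 2');
INSERT INTO courses VALUES ('CMPT', 354, 'Introduction to Databases', 110);
INSERT INTO terms VALUES ('fall', 2018);
""")


def try_offer(dept, number):
    """Try to record a fall 2018 offering; return True if SQLite accepts it."""
    try:
        conn.execute(
            "INSERT INTO offered VALUES (?, ?, 'fall', 2018)", (dept, number)
        )
        return True
    except sqlite3.IntegrityError:
        return False


cmpt_ok = try_offer("CMPT", 354)  # the pair (354, CMPT) exists in courses
math_ok = try_offer("MATH", 354)  # 354 exists, but not as a MATH course
print(cmpt_ok, math_ok)
```

With a single-column reference to `courses(number)`, the second insert could not be told apart from the first, because the department would not take part in the check.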

## Part 2. Normalization (10 points)

### Task 3. Decompose a relational schema into BCNF

Consider a relational schema and a set of functional dependencies: 

* $R(A,B,C,D,E)$ with functional dependencies $A \rightarrow E$, $BC \rightarrow A$, $DE \rightarrow B$

**Decompose $R(A,B,C,D,E)$ into BCNF. Show all of your work and explain, at each step, which dependency violations you are correcting. Write down a description of your decomposition steps. (2 points)**

* Closures of the left-hand sides of the given FDs:
* {A}+ = {A,E}
* {B,C}+ = {A,B,C,E}
* {D,E}+ = {B,D,E}

* None of these closures equals {A,B,C,D,E}, so each of the three FDs violates BCNF in R.
* A -> E violates BCNF because {A}+ = {A,E}, so R(A,B,C,D,E) is decomposed into R1(A,E) and R2(A,B,C,D).
* In R2, BC -> A violates BCNF because {B,C}+ restricted to R2 is {A,B,C}, so R2(A,B,C,D) is decomposed into R21(A,B,C) and R22(B,C,D).
* In R1(A,E), A is a key. In R21(A,B,C), BC is a key and no other non-trivial FD holds. In R22(B,C,D), only trivial FDs hold. No BCNF violations remain.
* Therefore, R(A,B,C,D,E) is decomposed into R1(A,E), R21(A,B,C), and R22(B,C,D).
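
The decomposition can be double-checked mechanically. The sketch below is one possible way to do it, not the method required by the assignment: it computes attribute closures under the three given FDs and, for each relation, lists every attribute set whose closure (restricted to that relation) is neither trivial nor the whole relation. The helper names `closure` and `bcnf_violations` are illustrative.

```python
from itertools import combinations

# FDs from Task 3, written as (left-hand side, right-hand side).
FDS = [("A", "E"), ("BC", "A"), ("DE", "B")]


def closure(attrs, fds):
    """Return the closure of `attrs` under `fds` as a frozenset."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)


def bcnf_violations(relation, fds):
    """List subsets X of `relation` whose closure, restricted to `relation`,
    is neither X itself (trivial) nor all of `relation` (X is a superkey)."""
    rel = frozenset(relation)
    bad = []
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = frozenset(combo)
            restricted = closure(x, fds) & rel
            if restricted != x and restricted != rel:
                bad.append("".join(combo))
    return bad


for name, rel in [("R", "ABCDE"), ("R2", "ABCD"),
                  ("R1", "AE"), ("R21", "ABC"), ("R22", "BCD")]:
    print(name, bcnf_violations(rel, FDS))
```

An empty list means the relation is in BCNF. For R2 the check reports two violating left-hand sides, BC and AD; the solution above splits on BC -> A.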

### Task 4. Find a set of FDs that is consistent with a closed attribute set

A set of attributes $X$ is called closed (with respect to a given set of functional dependencies) if
$X^+=X$. Consider a relation with schema $R(A,B,C,D)$ and an unknown set of functional dependencies. For each closed attribute set below, give a set of functional dependencies that is consistent with it.

**a. All sets of attributes are closed (1 point)**
* A -> A
* B -> B
* C -> C
* D -> D

**b. The only closed sets are $\{\}$ and $\{A,B,C,D\}$ (1 point)**
* A -> B
* B -> C
* C -> D
* D -> A

**c. The only closed sets are $\{\}$, $\{A,B\}$, and $\{A,B,C,D\}$ (1 point)**
* A -> B
* B -> A
* C -> DAB
* D -> CAB
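
Each answer can be checked by brute force, since $R(A,B,C,D)$ has only 16 attribute subsets. The sketch below is an illustrative check with assumed helper names (`closure`, `closed_sets`): it enumerates every subset $X$ and keeps those with $X^+ = X$. The empty set is printed as an empty string.

```python
from itertools import combinations

ATTRS = "ABCD"


def closure(attrs, fds):
    """Return the closure of `attrs` under `fds` as a frozenset."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)


def closed_sets(fds):
    """Return every subset X of ATTRS with X+ = X, as sorted strings."""
    found = []
    for size in range(len(ATTRS) + 1):
        for combo in combinations(ATTRS, size):
            if closure(combo, fds) == frozenset(combo):
                found.append("".join(combo))
    return found


# The three answers given above, as (left-hand side, right-hand side) pairs.
answer_a = [("A", "A"), ("B", "B"), ("C", "C"), ("D", "D")]
answer_b = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
answer_c = [("A", "B"), ("B", "A"), ("C", "DAB"), ("D", "CAB")]

for label, fds in [("a", answer_a), ("b", answer_b), ("c", answer_c)]:
    print(label, closed_sets(fds))
```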

### Task 5. Normalize a database

Suppose Mike is the owner of a small store. He uses the following database ([mike.db](mike.db)) to store monthly sales of his store. 
* `Sales`(name, discount, month, price)

In [11]:
%load_ext sql
%sql sqlite:///mike.db

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


u'Connected: @mike.db'

In [12]:
%sql select * from Sales limit 5

 * sqlite:///mike.db
   sqlite:///sfu.db
Done.


name,discount,month,price
bar1,0.15,apr,19
bar8,0.15,apr,19
gizmo3,0.15,apr,19
gizmo7,0.15,apr,19
mouse1,0.15,apr,19



However, Mike finds that the database is difficult to update (e.g., when inserting new data). Your job is to help Mike normalize his database by completing the following steps (a-d):

**a.** Find all *nontrivial* functional dependencies in the database.
This is a reverse-engineering task, so expect to proceed by trial and error. Search first for the simple dependencies, say $name \rightarrow discount$, then try the more complex ones, such as $name, discount \rightarrow month$, as needed. To check each functional dependency, you have to write a SQL query.

Your challenge is to write this SQL query for every candidate functional dependency that you check, such that:

 - the query's answer is always short (say: no more than ten lines - remember that 0 results can be instructive as well)

 - you can determine whether the FD holds or not by looking at the query's answer. Try to be clever in order not to check too many dependencies, but don't miss potential relevant dependencies. For example, if you have A → B and C → D, you do not need to derive AC → BD as well.

**Write down all FDs that you found. (1 point)**

* Name -> Price
* Month -> Discount

**For each FD above, write down the SQL query that discovered it (remember short queries are preferred) (1 point)**

In [13]:
%%sql

Select *
From Sales S1, Sales S2
Where S1.name = s2.name and S1.price != s2.price

 * sqlite:///mike.db
   sqlite:///sfu.db
Done.


name,discount,month,price,name_1,discount_1,month_1,price_1


In [14]:
%%sql

select *
from Sales S1, Sales S2
where S1.month = s2.month and S1.discount != s2.discount

 * sqlite:///mike.db
   sqlite:///sfu.db
Done.


name,discount,month,price,name_1,discount_1,month_1,price_1
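
The self-joins above return one row per violating pair of tuples, which can grow large when a candidate FD fails. An alternative that keeps the answer short is to group by the left-hand side and count distinct right-hand-side values. The sketch below illustrates this on a small, made-up `Sales` table in an in-memory database (the four rows are hypothetical, not taken from `mike.db`); the helper name `fd_violations` is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (name TEXT, discount REAL, month TEXT, price INTEGER)"
)
# Hypothetical rows, chosen so that name -> price and month -> discount hold.
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        ("bar1", 0.15, "apr", 19),
        ("bar1", 0.33, "may", 19),
        ("gizmo3", 0.15, "apr", 49),
        ("gizmo3", 0.33, "may", 49),
    ],
)


def fd_violations(lhs, rhs):
    """Return up to ten LHS values that map to more than one RHS value.

    An empty list means the FD lhs -> rhs holds in the current data.
    Column names are interpolated directly, so pass trusted names only.
    """
    query = (
        f"SELECT {lhs}, COUNT(DISTINCT {rhs}) FROM Sales "
        f"GROUP BY {lhs} HAVING COUNT(DISTINCT {rhs}) > 1 "
        f"ORDER BY {lhs} LIMIT 10"
    )
    return conn.execute(query).fetchall()


print(fd_violations("name", "price"))      # FD holds in the toy data
print(fd_violations("month", "discount"))  # FD holds in the toy data
print(fd_violations("name", "discount"))   # FD fails in the toy data
```

A composite left-hand side can be passed as a comma-separated string such as `"name, discount"`.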


**b. Decompose the `Sales` table into BCNF. Like Task 3, show a description of your decomposition steps. (1 point)**

* Closures of the left-hand sides of the FDs:
* {Name}+ = {Name, Price}
* {Month}+ = {Month, Discount}

* Month -> Discount violates BCNF because {Month}+ = {Month, Discount}, so R(Name,Discount,Month,Price) is decomposed into R1(Month,Discount) and R2(Month,Name,Price).
* In R2, Name -> Price violates BCNF because {Name}+ = {Name, Price}, so R2(Month,Name,Price) is decomposed into R21(Month,Name) and R22(Name,Price).

* Month is a key of R1, Name is a key of R22, and R21 has only trivial FDs, so no BCNF violations remain.
* Therefore, R(Name,Discount,Month,Price) is decomposed into R1(Month,Discount), R21(Month,Name), and R22(Name,Price).
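
The same steps can be expressed as a small recursive procedure. The following is a sketch of the textbook BCNF decomposition under the two FDs found above, with illustrative function names; it searches attribute subsets in sorted order, so a different search order could pick violations in a different sequence.

```python
from itertools import combinations

# FDs discovered in step (a), as (left-hand side, right-hand side) sets.
FDS = [({"name"}, {"price"}), ({"month"}, {"discount"})]


def closure(attrs, fds):
    """Return the closure of `attrs` under `fds` as a frozenset."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_decompose(relation, fds):
    """Split `relation` until no attribute set X violates BCNF."""
    rel = frozenset(relation)
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = frozenset(combo)
            x_plus = closure(x, fds) & rel
            if x_plus != x and x_plus != rel:
                # X violates BCNF: split into X+ and X together with
                # the attributes outside X+.
                left = bcnf_decompose(x_plus, fds)
                right = bcnf_decompose(x | (rel - x_plus), fds)
                return left + right
    return [rel]


tables = bcnf_decompose({"name", "discount", "month", "price"}, FDS)
for table in tables:
    print(sorted(table))
```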

**c. Write down SQL queries to create the BCNF tables in [mike.db](mike.db). Create keys and foreign keys where appropriate. (1 point)**

In [15]:
%%sql 

create table MonthDiscount
(
    month varchar(3),
    discount float,
    primary key (month)
);

 * sqlite:///mike.db
   sqlite:///sfu.db
Done.


[]

In [16]:
%%sql 

create table NamePrice
(
    name varchar(50),
    price int,
    primary key (name)
);

 * sqlite:///mike.db
   sqlite:///sfu.db
Done.


[]

In [17]:
%%sql
create table MonthName
(
    month varchar(3),
    name varchar(50),
    primary key (month, name),
    foreign key (month) references MonthDiscount(month),
    foreign key (name) references NamePrice(name)
);

 * sqlite:///mike.db
   sqlite:///sfu.db
Done.


[]

**d. Populate the BCNF tables using the data from the sales table. (1 point)**

*Hint:* see [SQL INSERT INTO SELECT Statement](https://www.w3schools.com/sql/sql_insert_into_select.asp)

In [18]:
%%sql

insert into MonthDiscount (month, discount)
select distinct month, discount
from Sales

 * sqlite:///mike.db
   sqlite:///sfu.db
12 rows affected.


[]

In [19]:
%%sql

insert into NamePrice (name, price)
select distinct name, price
from Sales

 * sqlite:///mike.db
   sqlite:///sfu.db
36 rows affected.


[]

In [20]:
%%sql
insert into MonthName (month, name)
select distinct month, name
from Sales

 * sqlite:///mike.db
   sqlite:///sfu.db
426 rows affected.


[]
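
A quick sanity check after populating the tables is to join them back together and compare the result with `Sales`. The sketch below shows the idea on a made-up three-row `Sales` table in an in-memory database (the rows are hypothetical, and the tables are built with `CREATE TABLE ... AS SELECT` for brevity, so they carry no key constraints). The same two `EXCEPT` queries could be run against `mike.db`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (name TEXT, discount REAL, month TEXT, price INTEGER);
-- Hypothetical rows that satisfy name -> price and month -> discount.
INSERT INTO Sales VALUES
    ('bar1', 0.15, 'apr', 19),
    ('bar1', 0.33, 'may', 19),
    ('gizmo3', 0.15, 'apr', 49);
CREATE TABLE MonthDiscount AS SELECT DISTINCT month, discount FROM Sales;
CREATE TABLE NamePrice AS SELECT DISTINCT name, price FROM Sales;
CREATE TABLE MonthName AS SELECT DISTINCT month, name FROM Sales;
""")

original = "SELECT name, discount, month, price FROM Sales"
rejoined = """
SELECT mn.name, md.discount, mn.month, np.price
FROM MonthName mn
JOIN MonthDiscount md ON md.month = mn.month
JOIN NamePrice np ON np.name = mn.name
"""

# EXCEPT uses set semantics, so this compares distinct rows only.
missing = conn.execute(f"{original} EXCEPT {rejoined}").fetchall()
extra = conn.execute(f"{rejoined} EXCEPT {original}").fetchall()
print(missing, extra)
```

Two empty results indicate that the join reproduces exactly the distinct rows of `Sales`, which is what a lossless decomposition should give.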

## Submission

Download [A4.zip](A4.zip). Answer the questions in A4.ipynb. Put `A4.ipynb`, `ER-diagram.png`, `sfu.db`, and `mike.db` (with populated BCNF tables) into A4-submission.zip.

Submit A4-submission.zip to the CourSys activity Assignment 4. 