# Lesson 2.5: Entity-Relationship Diagram (ERD) & Keys

### Lesson Duration: 3 hours

> Purpose: The purpose of this lesson is to familiarize students with some database design principles, including _entities_, _attributes_, and _relationships_, as well as with _entity-relationship diagrams_. The students will also learn about the different kinds of SQL commands (_DDL_, _DML_, _DCL_, and _TCL_). We will then use some simple _DDL_ and _DML_ commands to:
>
> - create a table,
> - alter the properties of a table, and
> - populate the table with new data.

Suggested link: <https://www.geeksforgeeks.org/sql-ddl-dql-dml-dcl-tcl-commands/>

### Setup

To start this lesson, students should have:

- Completed lesson 2.4
- All previous Setup

---

### Learning Objectives

After this lesson, students will be able to:

- Interpret an entity-relationship diagram
- Identify primary keys and foreign keys, and explain why normalization is needed
- Classify SQL commands as DML, DDL, DCL, or TCL
- Use DDL to create a database and alter tables in an existing database

---

### Lesson 1 key concepts

> :clock10: 20 min
> **Learning Activity:** [Slideshow](../slides/2.5.1-Keys.pptx)

Entities and their attributes

- `Primary key` and its properties
- `Foreign keys` and why we need them

> Use the ER model to explain the concept of entities and attributes. Also, refer to the `files_for_lesson_and_activities/case_study.pdf` to show the entities and attributes of this database.

- _Primary_ key and its properties

  Introduce the idea of the primary key and foreign key. _Primary keys_ are used to uniquely identify every row/record in a table. A primary key value must be _unique and not null._

- _Foreign_ keys and why we need them

  Foreign keys are used to create a link between two tables so that we do not repeat the same information in multiple tables. This is crucial for a very important database design principle called "normalization", which helps eliminate data redundancy in the database.

![Normalized Data](https://education-team-2020.s3-eu-west-1.amazonaws.com/data-analytics/2.5-normalized_data_example.png)

![Non-normalized Data](https://education-team-2020.s3-eu-west-1.amazonaws.com/data-analytics/2.5-non_normalized_data_example.png)

If you compare the two images, you can see how much redundant information the non-normalized version contains. In a database that has a large number of such tables and thousands of rows, this can lead to extremely inefficient storage of information. **Hence, foreign keys are used to reference data from another table and to establish that link wherever necessary.**
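As a minimal sketch of this idea (the table and column names below are hypothetical and are not part of the case study), customer details can be stored once and referenced by key instead of being repeated on every order row:

```sql
-- Hypothetical example tables, for illustration only.
create table customer_example (
  customer_id int not null,
  customer_name varchar(50),
  city varchar(50),
  constraint primary key (customer_id)
);

create table order_example (
  order_id int not null,
  customer_id int not null,   -- foreign key: points to one row in customer_example
  amount float,
  constraint primary key (order_id),
  constraint foreign key (customer_id) references customer_example (customer_id)
);

insert into customer_example values (1, 'Alice', 'Prague');
insert into order_example values (100, 1, 25.0), (101, 1, 40.0);

-- The customer's name and city are stored once; a join brings them back when needed.
select o.order_id, c.customer_name, c.city, o.amount
from order_example o
join customer_example c on c.customer_id = o.customer_id;
```

In a non-normalized design, the name and city would be repeated on both order rows, and a change of city would have to be applied to every copy.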

# 2.05 Activity 1

During the lesson, we mentioned that one of the primary reasons for normalizing tables is to eliminate data redundancy, which can otherwise result in highly inefficient data storage. Which other problems do you think a non-normalized structure may have?

The students can refer to the following link to read more about normalization, its advantages and disadvantages. (https://whatisdbms.com/normalization-in-dbms-anomalies-advantages-disadvantages/)

### Solution:

Some other problems that can arise from a non-normalized database are:

- Slower query processing (due to inefficient storage of data)
- Data anomalies (INSERT, UPDATE, DELETE). We will talk about these anomalies in detail in later lessons
- Database maintenance becomes tedious

### Lesson 2 key concepts

> :clock10: 20 min

- Establishing the relationship between entities
- Understanding Entity-Relationship diagrams
- Identify primary keys and foreign keys in the diagram

![ER Diagram for the case study](https://education-team-2020.s3-eu-west-1.amazonaws.com/data-analytics/2.5-caseStudy_ER_diagram.png)

- Note that there is another important concept, the cardinality of relationships, which distinguishes one-to-one, one-to-many, and many-to-many relationships. We will talk about those concepts in detail later.

# 2.05 Activity 2

Later in the labs we will use another database that models a DVD rental store. The ERD (entity-relationship diagram) for the database is shown below. You can also refer to the file `sakila-schema.pdf` in the `files_for_activities` folder.

### Questions

- Identify the primary and foreign keys from the ER diagram.

### Lesson 3 key concepts

> **Learning Activity:** [Slideshow](https://docs.google.com/presentation/d/1C5_hClIbR1nyyX4DNzsvTgstdjzIbrvV/edit?usp=drive_web&ouid=108438125133210712048&dls=true)

> Note that this is just an introduction to make the students familiar with the terms/concepts including data integrity constraints, DML etc. In the next session, we will talk more in detail on how to use _DDL_ commands. Also, inform the students that we will focus on _DML_ more in this curriculum, since writing these commands, and especially `select` queries, is crucial for any data analyst/business analyst.

> :clock10: 20 min

Data Integrity constraints

- What are these constraints and why are they important
- Entity Integrity Constraint
- Referential Integrity Constraint
- Domain Integrity Constraint

**_Data integrity constraints_** are the set of rules that ensure the accuracy and consistency of data over its entire life cycle. They are a critical aspect of the design, implementation, and usage of any system that stores, processes, or retrieves data.

**_Entity integrity constraints_** state that every table must have a primary key that uniquely identifies each record/row of the data. The primary key must be unique and not null.

**_Referential integrity constraints_** require that a foreign key either take one of the values of the primary key it references or be NULL, i.e., a foreign key can't take a value that does not already exist as a primary key value in the referenced table.

**_Domain integrity constraints_** require that all the values of an attribute come from the same domain, i.e., all values in a column must be of the column's data type.
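The three constraints can be sketched in MySQL as follows. This is only an illustration: the table names, column names, and values are hypothetical and do not come from the case study.

```sql
-- Hypothetical tables, for illustration only.
create table company_example (
  company_id int not null,
  company_name varchar(50) not null,
  constraint primary key (company_id)          -- entity integrity
);

create table plan_example (
  plan_id int not null,
  company_id int default null,
  monthly_fee decimal(8,2),                    -- domain integrity: numeric values only
  constraint primary key (plan_id),
  constraint foreign key (company_id)
    references company_example (company_id)    -- referential integrity
);

insert into company_example values (1, 'Acme Insurance');

insert into plan_example values (10, 1, 49.90);     -- accepted: company 1 exists
insert into plan_example values (11, null, 19.90);  -- accepted: a foreign key may be NULL

-- Each of the following statements is expected to be rejected:
-- insert into company_example values (1, 'Duplicate');   -- duplicate primary key
-- insert into plan_example values (12, 99, 29.90);       -- no company with company_id 99
-- insert into plan_example values (13, 1, 'cheap');      -- not a number (rejected in strict SQL mode)
```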

- Kinds of SQL commands:

  - _DML_: Data Manipulation Language
  - _DDL_: Data Definition Language
  - _DCL_: Data Control Language
  - _TCL_: Transaction Control Language

- Data Manipulation Language is used to retrieve and edit the data present in the database. Commands: `select`, `insert`, `update`, `delete`.
- Data Definition Language is used to create or modify the structure of a table or database. Commands: `create`, `alter`, `drop`, `truncate`, `rename`.
- Data Control Language is used to give rights and permissions to users, controlling and securing access to the database. Commands: `grant`, `revoke`.
- Transaction Control Language is used to create and manage transactions within the database. Commands: `commit`, `rollback`, `savepoint`.
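The four categories can be illustrated with one or two statements each. The sketch below uses a hypothetical table `note_example` and a hypothetical user account; neither is part of the `bank` database, and the `grant`/`revoke` lines assume that the account already exists and that a default database has been selected with `use`.

```sql
-- Hypothetical table and user names, for illustration only.

-- DDL: define or change structure
create table note_example (
  note_id int not null,
  body varchar(100),
  constraint primary key (note_id)
);
alter table note_example add column created_on date;

-- DML: work with the rows
insert into note_example (note_id, body) values (1, 'first note');
update note_example set body = 'edited note' where note_id = 1;
select * from note_example;

-- TCL: group changes into a transaction
start transaction;
delete from note_example where note_id = 1;
rollback;   -- undoes the delete (for transactional engines such as InnoDB)

-- DCL: manage permissions (assumes the account 'analyst'@'localhost' already exists)
grant select on note_example to 'analyst'@'localhost';
revoke select on note_example from 'analyst'@'localhost';
```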


# 2.05 Activity 3

1. Check whether referential integrity is satisfied in the following tables.

![Referential Integrity Constraint Check](https://education-team-2020.s3-eu-west-1.amazonaws.com/data-analytics/2.5-referential_integrity_constraint.png)


2. Now use the `bank` database to make the following changes:

   a. Use the `insert` command to create a new entry in the `loan` table with the following values: `(8000,8000000,930705,96396,12,8033.0,'C')`. Each element corresponds to a column of the table, in the order the columns appear in the table. This might raise an error. Why is that?

   b. Use the `delete` command to delete the entry from the table `account` where the `account_id` is `11382`. Does this result in an error? If it does, why?

### Solutions:

### 1

The given example does not satisfy the referential integrity constraint. The `company_id` is the primary key in the table `InsuranceCompany` and a foreign key in the table `InsurancePlan`. Hence, the table `InsurancePlan` cannot contain any value of `company_id` that is not present in the table `InsuranceCompany`.

### 2

```sql
-- 2a:
insert into bank.loan values (8000,8000000,930705,96396,12,8033.0,'C');
```

Note that the second value, 8000000, corresponds to the `account_id`. We can check that no such `account_id` exists in the `account` table. Hence, this insert violates the referential integrity constraint. If the tables are designed well, inserting a record that refers to a non-existent primary key value in the referenced table should raise an error.
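One way to run that check, and to see whether the foreign key is actually defined in your copy of the database, is sketched below. Both statements are standard MySQL; what they return depends on how your `bank` database was created.

```sql
-- Returns no rows if account 8000000 does not exist in the account table.
select account_id
from bank.account
where account_id = 8000000;

-- Shows the definition of the loan table, including any foreign key constraints.
show create table bank.loan;
```

If the output of `show create table` lists no foreign key on `account_id`, the insert will succeed even though the account does not exist.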

```sql
-- 2b: 
delete from bank.account 
where account_id=11382;
```

The `account_id` is the primary key in the `account` table. If there are related records in any other table that use this particular `account_id` as a foreign key, then the delete would result in an error.
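Before running the delete, it can help to look for related records. The sketch below assumes that your copy of the `bank` database has `loan` and `trans` tables with an `account_id` column; adjust the table names to match your schema.

```sql
-- Count rows in other tables that still reference account 11382.
-- Assumption: loan and trans exist and each has an account_id column.
select count(*) as related_loans
from bank.loan
where account_id = 11382;

select count(*) as related_transactions
from bank.trans
where account_id = 11382;
```

If a count is greater than zero and the corresponding foreign key constraint is defined, those rows have to be handled before the account can be deleted.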

### Lesson 4 key concepts

> :clock10: 20 min

- Let's say that we are trying to re-create the bank database. We already know the tables we need to define, and we have all the information we need: their attributes, primary keys and foreign keys, the ER diagram, and so on. Now we will use DDL commands to create some of the tables of the database ourselves.

- Emphasize how we add the data integrity constraints (entity and referential integrity constraints) while creating the tables. These constraints will dictate how we can access the data from the tables when we write queries on this database.
- We will show the students how to create the tables `account` and `district` and insert values into them. The students will create the rest of the tables themselves in the activity session.

Using DDL (`Data Definition Language`)

- Create the database `bank_demo`
- Create tables
- Populate the tables with data (this step uses `insert`, which is a _DML_ command)
- Alter the properties of a table: add, drop, or modify columns in an existing table

<details>
<summary> Click for Code Sample </summary>

```sql
--  create database
create database if not exists bank_demo;
use bank_demo;
```

```sql
-- create tables (table with only primary key)

drop table if exists district_demo;

CREATE TABLE district_demo (
  `A1` int(11) UNIQUE NOT NULL,
  `A2` char(20) DEFAULT NULL,
  `A3` varchar(20) DEFAULT NULL,
  `A4` int(11) DEFAULT NULL,
  `A5` int(11) DEFAULT NULL,
  `A6` int(11) DEFAULT NULL,
  `A7` int(11) DEFAULT NULL,
  `A8` int(11) DEFAULT NULL,
  `A9` int(11) DEFAULT NULL,
  `A10` float DEFAULT NULL,
  `A11` int(11) DEFAULT NULL,
  `A12` float DEFAULT NULL,
  `A13` float DEFAULT NULL,
  `A14` int(11) DEFAULT NULL,
  `A15` int(11) DEFAULT NULL,
  `A16` int(11) DEFAULT NULL,
  CONSTRAINT PRIMARY KEY (A1)  -- the CONSTRAINT keyword is optional, but using it is good practice
);
```

```sql
-- create a table (table with foreign key)
drop table if exists account_demo;

CREATE TABLE account_demo (
  account_id int(11) UNIQUE NOT NULL,
  district_id int(11) DEFAULT NULL,
  frequency text,
  date int(11) DEFAULT NULL,
  CONSTRAINT PRIMARY KEY (account_id),
  CONSTRAINT FOREIGN KEY (district_id) REFERENCES district_demo(A1)
) ;
```

```sql
-- populating tables
insert into district_demo
values (1,'Hl.m. Praha','Prague',1204953,0,0,0,1,1,100,12541,0.29,0.43,167,85677,99107),
(2,'Benesov','central Bohemia',88884,80,26,6,2,5,46.7,8507,1.67,1.85,132,2159,2674),
 (3,'Beroun','central Bohemia',75232,55,26,4,1,5,41.7,8980,1.95,2.21,111,2824,2813),
 (4,'Kladno','central Bohemia',149893,63,29,6,2,6,67.4,9753,4.64,5.05,109,5244,5892);
```

Note that the code below will raise a _referential integrity error_.

- Reason: The second column in the `account_demo` table, `district_id`, is a foreign key that refers to `A1` in the `district_demo` table. Since no row in `district_demo` has an `A1` value of 5, the value 5 can't be accepted for `district_id`.

```sql
insert into account_demo values
(1,4,'POPLATEK MESICNE',950324),
(2,1,'POPLATEK MESICNE',930226),
(3,5,'POPLATEK MESICNE',970707);
```

- Correct Code

```sql
insert into account_demo values
(1,4,'POPLATEK MESICNE',950324),
(2,1,'POPLATEK MESICNE',930226),
(3,2,'POPLATEK MESICNE',970707);
```

- In the table definition of `account_demo`, the column date was defined as _integer_ type. We will modify the column to _date_ type.

```sql
alter table account_demo
modify date date;
select * from account_demo;
```
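To confirm that the change took effect, one option (a minimal sketch using standard MySQL statements) is to inspect the table definition; the `date` column should now be reported with the type `date`:

```sql
-- Show the column names and types of account_demo
describe account_demo;

-- Show the full definition, including the primary key and foreign key constraints
show create table account_demo;
```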

> Drop a column

```sql
alter table district_demo
drop column A15;
select * from district_demo;
```

> Rename table name

```sql
alter table account_demo
rename to accountDemo;
```

> Rename column name in a table

```sql
alter table district_demo
rename column A1 to dist_id;
```

> Add a new column

```sql
alter table accountDemo
add column balance int(11) after date;
```

</details>

# 2.05 Activity 4

1. Create the rest of the tables in the `bank` database (at least the `client` and the `card` tables).

2. Design and create a new database structure. Justify your changes.
    - Some ideas include renaming columns to more meaningful names (for example, in the table `district`) and adding foreign keys wherever necessary.

### Solutions:

### 1

```sql
-- Each statement ends with a semicolon so the script can be run as a whole.
create table card (
  card_id int(11) default null,
  disp_id int(11) default null,
  type text,
  issued text
);

insert into card values (1005,9285,'classic','931107 00:00:00\r'),
                        (104,588,'classic','940119 00:00:00\r');

create table client (
  client_id int(11) default null,
  birth_number int(11) default null,
  district_id int(11) default null
);

insert into client values (1,706213,18),(2,450204,1),(3,406009,1),(4,561201,5),(5,605703,5);

create table disp (
  disp_id int default null,
  client_id int default null,
  account_id int default null,
  type text collate utf8mb4_unicode_ci
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;


create table loan (
  loan_id int default null,
  account_id int default null,
  date int default null,
  amount int default null,
  duration int default null,
  payment float default null,
  status text collate utf8mb4_unicode_ci,
  status_desc varchar(30) COLLATE utf8mb4_unicode_ci default null
);

-- `order` is a reserved word in SQL, so the table name must be quoted with backticks
create table `order` (
  order_id int default null,
  account_id int default null,
  bank_to text collate utf8mb4_unicode_ci,
  account_to int default null,
  amount float default null,
  k_symbol text collate utf8mb4_unicode_ci
);

create table trans (
  trans_id int default null,
  account_id int default null,
  date int default null,
  type text collate utf8mb4_unicode_ci,
  operation text collate utf8mb4_unicode_ci,
  amount float default null,
  balance float default null,
  k_symbol text collate utf8mb4_unicode_ci,
  bank text collate utf8mb4_unicode_ci,
  account int default null
);
```

### 2

Some valid ideas for better database structure would be:

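- Add FK Account -> Client (MySQL requires the referenced column, here `client.client_id`, to be indexed, for example as the primary key)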

```sql
alter table bank.account
add column client_id int null;

alter table bank.account
add constraint fk_account_1
  foreign key (client_id)
  references bank.client (client_id)
  on delete no action
  on update no action;
```

- Add FK Card -> Client

```sql
alter table bank.card
add column client_id int null;

alter table bank.card
add constraint fk_card_1
  foreign key (client_id)
  references bank.client (client_id)
  on delete no action
  on update no action;
```

- Rename district columns

```sql
-- rename column requires MySQL 8.0 or later
alter table district rename column A1 to district_id;
alter table district rename column A2 to district_name;
alter table district rename column A3 to region;
alter table district rename column A4 to population;
alter table district rename column A5 to num_muni_very_small;
alter table district rename column A6 to num_muni_small;
alter table district rename column A7 to num_muni_medium;
alter table district rename column A8 to num_muni_large;
alter table district rename column A9 to num_cities;
alter table district rename column A10 to urban_ratio;
alter table district rename column A11 to avg_salary;
alter table district rename column A12 to unemployment_rate_95;
alter table district rename column A13 to unemployment_rate_96;
alter table district rename column A14 to entrepreneurs;
alter table district rename column A15 to crimes_95;
alter table district rename column A16 to crimes_96;
```

# Lab | SQL Queries 5

In this lab, you will be using the [Sakila](https://dev.mysql.com/doc/sakila/en/) database of movie rentals. You have been using this database for a couple of labs already, but if you need to get the data again, refer to the official [installation link](https://dev.mysql.com/doc/sakila/en/sakila-installation.html).

The database is structured as follows:
![DB schema](https://education-team-2020.s3-eu-west-1.amazonaws.com/data-analytics/database-sakila-schema.png)

<br><br>

### Instructions

1. Drop column `picture` from `staff`.
2. A new person is hired to help Jon. Her name is TAMMY SANDERS, and she is a customer. Update the database accordingly.
3. Add a rental for the movie "Academy Dinosaur" by Charlotte Hunter from Mike Hillyer at Store 1. You can use the current date for the `rental_date` column in the `rental` table.
   **Hint**: Check the columns in the table `rental` and see what information you need to add there. You can query for those pieces of information. For example, you will notice that you need the `customer_id` as well. To get it, you can use the following query:

    ```sql
    select customer_id from sakila.customer
    where first_name = 'CHARLOTTE' and last_name = 'HUNTER';
    ```
    
    Use a similar method to get `inventory_id`, `film_id`, and `staff_id`.

4. Delete non-active users, but first, create a _backup table_ `deleted_users` to store the `customer_id`, `email`, and `date` for the users that will be deleted. Follow these steps:

   - Check if there are any non-active users
   - Create the _backup table_ as suggested
   - Insert the non-active users into the _backup table_
   - Delete the non-active users from the table _customer_