# Introduction to relational databases

## Course: Programming and Data Management (EDI 3400)

### *Vegard H. Larsen (Department of Data Science and Analytics)*

## What you will learn

- The basic theory behind relational databases and how to work with them using SQL
- Perform queries using SQL statements and clauses such as `SELECT`, `INSERT`, `JOIN` and `DELETE`
- Interact with a database using the DB Browser for SQLite
- Import data from a database into Python
- Analyze data that is stored in a database in Python

# 1. Introduction to Databases

## What is a database?

- A database is a collection of data stored in tables (also called relations)
- Much of the data that exists today is stored in databases
- A database provides functionality for reading, creating, modifying and deleting data
- A relational database means that data is stored in tables with rows and columns
- We will only focus on relational databases 
    - Alternatives: In-Memory Databases, NoSQL

## Why do we need a database?

- To store data in a consistent and secure way
- To provide an organized structure of data
- To let many users change the data simultaneously (concurrency)
- To share a huge data set among many people
- To avoid storing the same data in many places (redundancy)

## Splitting the information into themes

- The database stores the information in tables (with rows and columns)
- Each *theme* should have its own table (e.g., *Customers*, *Employees*, *Students*, *Sales*)
- We do this to make sure each piece of information is stored in only one place in our system
- Tables can be joined together later to give us the combination of information/data we need

## Bad example: sales data

<table>
   <thead>
      <tr>
         <th></th>
         <th>Date</th>
         <th>Employee</th>
         <th>CustomerId</th>
         <th>Price</th>
         <th>CustomerAddress</th> 
         <th>EmployeeCommission</th>
         <th>CustomerPhone</th>
      </tr>
   </thead>
   <tbody>
      <tr>
         <td>Sale 1</td>
         <td>2022-09-10</td>
         <td>Ted</td>
         <td>2566</td>
         <td>1050</td>
         <td>2210 Tromsø</td>
         <td>0.65</td>
         <td>43251010</td>
      </tr>
      <tr>
         <td>Sale 2</td>
         <td>2022-09-10</td>
         <td>Lisa</td>
         <td>1889</td>
         <td>2000</td>
         <td>1015 Oslo</td>
         <td>0.68</td>
         <td>98589585</td>
      </tr>
      <tr>
         <td>Sale 3</td>
         <td>2022-09-11</td>
         <td>Lisa</td>
         <td>4545</td>
         <td>750</td>
         <td>1025 Oslo</td>
         <td>0.68</td>
         <td>99559955</td>
      </tr>
      <tr>
         <td>Sale 4</td>
         <td>2022-09-11</td>
         <td>Eric</td>
         <td>8321</td>
         <td>1255</td>
         <td>7420 Bergen</td>
         <td>0.5</td>
         <td>45969696</td>
      </tr>
      <tr>
         <td>Sale 5</td>
         <td>2022-09-11</td>
         <td>Ted</td>
         <td>4545</td>
         <td>525</td>
         <td>1025 Oslo</td>
         <td>0.65</td>
         <td>99559955</td>
      </tr>
   </tbody>
</table>

## A better solution for storing the same data


Sales table:

<table>
   <thead>
      <tr>
         <th>id</th> 
         <th>Date</th>
         <th>Price</th>
         <th>CustomerId</th>
         <th>EmployeeId</th>
      </tr>
   </thead>
   <tbody>
      <tr>
         <td>1</td>
         <td>2022-09-10</td>
         <td>1050</td>
         <td>2566</td>
         <td>1</td>
      </tr>
      <tr>
         <td>2</td>
         <td>2022-09-10</td>
         <td>2000</td>
         <td>1889</td>
         <td>2</td>
      </tr>
      <tr>
         <td>3</td>
         <td>2022-09-11</td>
         <td>750</td>
         <td>4545</td>
         <td>2</td>
      </tr>
      <tr>
         <td>4</td>
         <td>2022-09-11</td>
         <td>1255</td>
         <td>8321</td>
         <td>3</td>
      </tr>
      <tr>
         <td>5</td>
         <td>2022-09-11</td>
         <td>525</td>
         <td>4545</td>
         <td>1</td>
      </tr>
   </tbody>
</table>

Employees table:
    
<table>
   <thead>
      <tr>
         <th>id</th> 
         <th>FirstName</th>
         <th>Commission</th>
      </tr>
   </thead>
   <tbody>
      <tr>
         <td>1</td>
         <td>Ted</td>
         <td>0.65</td>
      </tr>
      <tr>
         <td>2</td>
         <td>Lisa</td>
         <td>0.68</td>
      </tr>
      <tr>
         <td>3</td>
         <td>Eric</td>
         <td>0.5</td>
      </tr>
   </tbody>
</table>

Customers table:
    
<table>
   <thead>
      <tr>
         <th>id</th> 
         <th>Address</th>
         <th>Phone</th>
      </tr>
   </thead>
   <tbody>
      <tr>
         <td>...</td>
         <td>...</td>
         <td>...</td>
      </tr>
      <tr>
         <td>1889</td>
         <td>1015 Oslo</td>
         <td>98589585</td>
      </tr>
        <tr>
         <td>...</td>
         <td>...</td>
         <td>...</td>
      </tr>
      <tr>
         <td>2566</td>
         <td>2210 Tromsø</td>
         <td>43251010</td>
      </tr>
        <tr>
         <td>...</td>
         <td>...</td>
         <td>...</td>
      </tr>
      <tr>
         <td>4545</td>
         <td>1025 Oslo</td>
         <td>99559955</td>
      </tr>
        <tr>
         <td>...</td>
         <td>...</td>
         <td>...</td>
      </tr>
      <tr>
        <td>8321</td>
        <td>7420 Bergen</td>
        <td>45969696</td>
      </tr>
   </tbody>
</table>
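
The split above can be tried out directly. Below is a minimal sketch using Python's built-in `sqlite3` module that creates the three tables in an in-memory database and fills them with the rows shown (only the four customers that are visible in the Customers table). The column types and the id column names `SalesId`, `EmployeeId` and `CustomerId` are our own assumptions. It then changes Lisa's commission to a made-up value, to show that this piece of information now lives in exactly one place.

```python
import sqlite3

# An in-memory database is enough for a demo: nothing is written to disk
con = sqlite3.connect(":memory:")

# Assumed schema: the column types and id column names are our choice
con.executescript("""
CREATE TABLE Employees (
    EmployeeId INTEGER PRIMARY KEY,
    FirstName  TEXT,
    Commission REAL
);
CREATE TABLE Customers (
    CustomerId INTEGER PRIMARY KEY,
    Address    TEXT,
    Phone      TEXT
);
CREATE TABLE Sales (
    SalesId    INTEGER PRIMARY KEY,
    Date       TEXT,
    Price      INTEGER,
    CustomerId INTEGER REFERENCES Customers(CustomerId),
    EmployeeId INTEGER REFERENCES Employees(EmployeeId)
);
""")

con.executemany("INSERT INTO Employees VALUES (?, ?, ?)",
                [(1, "Ted", 0.65), (2, "Lisa", 0.68), (3, "Eric", 0.5)])
# Only the four customers visible in the example above
con.executemany("INSERT INTO Customers VALUES (?, ?, ?)",
                [(1889, "1015 Oslo", "98589585"), (2566, "2210 Tromsø", "43251010"),
                 (4545, "1025 Oslo", "99559955"), (8321, "7420 Bergen", "45969696")])
con.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?)",
                [(1, "2022-09-10", 1050, 2566, 1), (2, "2022-09-10", 2000, 1889, 2),
                 (3, "2022-09-11", 750, 4545, 2), (4, "2022-09-11", 1255, 8321, 3),
                 (5, "2022-09-11", 525, 4545, 1)])

# Lisa's commission lives in exactly one row, although she made two sales.
# Changing it (0.70 is a made-up value) is therefore a single-row update;
# in the bad table we would have to edit every one of her sales.
cur = con.execute("UPDATE Employees SET Commission = 0.70 WHERE EmployeeId = 2")
con.commit()
print(cur.rowcount)  # number of rows changed by the update

lisa_sales = con.execute("SELECT COUNT(*) FROM Sales WHERE EmployeeId = 2").fetchone()[0]
print(lisa_sales)  # number of sales that refer to that single row
```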

# 2. A demo of accessing a database

- With DB Browser for SQLite
- With Python
- In a Jupyter Notebook
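
For the Python part of the demo, the sketch below shows the usual steps with the standard-library `sqlite3` module: connect, run SQL through a cursor, commit, fetch the results and close. The file name `demo.db` and the small `Employees` table are hypothetical stand-ins for the course database. The same code runs unchanged in a Jupyter Notebook cell.

```python
import os
import sqlite3
import tempfile

# Hypothetical file name, placed in a fresh temporary folder
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# connect() opens the database file, creating it if it does not exist yet
con = sqlite3.connect(db_path)
cur = con.cursor()

cur.execute("CREATE TABLE Employees (EmployeeId INTEGER PRIMARY KEY, FirstName TEXT)")
cur.executemany("INSERT INTO Employees VALUES (?, ?)", [(1, "Ted"), (2, "Lisa")])
con.commit()  # make the changes permanent in the file

# Run a query and fetch the result as a list of tuples
cur.execute("SELECT EmployeeId, FirstName FROM Employees ORDER BY EmployeeId")
rows = cur.fetchall()
print(rows)

con.close()
```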

# 3. Components of a Database System

## **User** $\leftrightarrow$ **Database application** $\leftrightarrow$ **DBMS** $\leftrightarrow$ **Database**

## Database application

This is the software that users interact with, and that in turn interacts with the database. A database application is a computer program whose primary purpose is to retrieve information from (and often store information in) a database. Many of the applications we use today are database applications.

Examples:
- Facebook
- Wikipedia
- eBay

We will interact with databases using 

- DB Browser for SQLite, https://sqlitebrowser.org
- Python's sqlite3 library

## DBMS - Database Management System

A database management system (DBMS) is the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS software is used to administer the database, for example through:

- User control (who can access the database)
- Permissions (who can access what, some users can only read, etc.)
- Define specific rules about the structure of the database
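
As a small illustration of the last point, the sketch below lets SQLite enforce two rules on a hypothetical `Customers` table: every customer must have a phone number (`NOT NULL`), and no two customers may share one (`UNIQUE`). These particular rules are assumptions made for the example. Rows that break a rule are rejected by the DBMS with an `sqlite3.IntegrityError`. User control and permissions are not shown, since SQLite itself does not manage user accounts.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Assumed rules: a phone number is required and must be unique
con.execute("""
CREATE TABLE Customers (
    CustomerId INTEGER PRIMARY KEY,
    Address    TEXT,
    Phone      TEXT NOT NULL UNIQUE
)""")
con.execute("INSERT INTO Customers VALUES (1889, '1015 Oslo', '98589585')")

rejected = []
for row in [(2566, "2210 Tromsø", None),        # breaks NOT NULL
            (4545, "1025 Oslo", "98589585")]:   # breaks UNIQUE
    try:
        con.execute("INSERT INTO Customers VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as err:
        rejected.append(str(err))
        print("Rejected:", err)

# Only the valid row made it into the table
n_rows = con.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]
print(n_rows)
```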

Examples of DBMS:

- MySQL
- Oracle
- Microsoft Access
- **SQLite**

## Pros and Cons of SQLite

### Pros
- Easy to use 
- No installation needed
- Lightweight and portable
- Does not require a server
- No login required

### Cons
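- Database size is limited (by the file system on the computer)
- Limited multi-user capabilities (only one user can write to the database at a time, and there are no user accounts or permissions)
- Fewer advanced features than larger DBMSs
- Other DBMSs are better suited for large databases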

# 4. The Relational Model

## Entity

- An entity is something of importance to the user (a company, organization, researcher, etc.) that we want to store information about in a database.
- This entity should represent a theme or concept of importance 

Examples of entities:

- A particular sale 
- A customer 
- An employee
- A product

## Tables
- A table should represent something specific (the group that the entity belongs to)
- Two-dimensional, with rows and columns
- A cell in the table should hold a single value
- Try not to mix different concepts/themes in the same table
- Find good names for the tables (see Chapter 3 in Viescas)

## Rows
- An instance of an entity is represented as a row
- E.g., one particular customer has one and only one row in the Customers table
- No two rows can be identical
- The order of the rows is not important
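
In SQL, the "no identical rows" rule is not automatic: a table without a key will store duplicates. The sketch below (hypothetical `Employees` and `NoKey` tables) shows that a primary key makes the database reject a second row for the same employee, and that row order is something we request with `ORDER BY` rather than a property of the table.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Without a primary key, SQL itself accepts two identical rows
con.execute("CREATE TABLE NoKey (FirstName TEXT)")
con.executemany("INSERT INTO NoKey VALUES (?)", [("Ted",), ("Ted",)])
no_key_rows = con.execute("SELECT COUNT(*) FROM NoKey").fetchone()[0]

# With a primary key, a second row for employee 1 is rejected
con.execute("CREATE TABLE Employees (EmployeeId INTEGER PRIMARY KEY, FirstName TEXT)")
con.executemany("INSERT INTO Employees VALUES (?, ?)", [(3, "Eric"), (1, "Ted"), (2, "Lisa")])
try:
    con.execute("INSERT INTO Employees VALUES (1, 'Ted')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

# Row order is not stored in the table: ask for the order you need
by_id = con.execute("SELECT EmployeeId, FirstName FROM Employees ORDER BY EmployeeId").fetchall()
by_name = con.execute("SELECT EmployeeId, FirstName FROM Employees ORDER BY FirstName").fetchall()
print(no_key_rows, duplicate_allowed)
print(by_id)
print(by_name)
```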

## Columns

- The columns contain specific data that is common for many instances of the entity
- Columns within the same table must have unique names
- E.g., all customers should have a phone number registered; the phone number is then an attribute shared by all instances of the customer entity
- All the values in a specific column should be of the same type. (We will talk about valid data types later)
- The order of the columns does not matter 
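
The sketch below shows SQLite rejecting a table definition with two columns of the same name, and then lists the columns (names and declared types) of a valid, hypothetical `Customers` table using `PRAGMA table_info`. Note that SQLite by default treats a column's declared type as a preference rather than a strict rule, so keeping each column to one type is partly the designer's responsibility.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Column names must be unique within a table: this definition is refused
try:
    con.execute("CREATE TABLE Customers (CustomerId INTEGER, Phone TEXT, Phone TEXT)")
    created = True
except sqlite3.OperationalError as err:
    print(err)
    created = False

con.execute("CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, Address TEXT, Phone TEXT)")

# PRAGMA table_info gives one row per column: (cid, name, type, notnull, default, pk)
columns = [(name, col_type)
           for _, name, col_type, *_ in con.execute("PRAGMA table_info(Customers)")]
print(columns)
```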

## Keys

- A key is a column in a table where the values are used to identify a specific set of rows
- Two main types of keys
    1. Primary key
    2. Foreign key
- Every table should have one primary key column. All values in this column must be unique, so that each row in the table can be identified by its primary key
- A foreign key is usually a column whose values are primary key values in another table. Foreign key values do not need to be unique.
- The foreign key is used to link two tables together.

Sales table:

<table>
   <thead>
      <tr>
         <th>SalesId</th>
         <th>Date</th>
         <th>Price</th>
         <th>CustomerId</th>
         <th>EmployeeId</th>
      </tr>
   </thead>
   <tbody>
      <tr>
         <td>1</td>
         <td>2022-09-10</td>
         <td>1050</td>
         <td>2566</td>
         <td>1</td>
      </tr>
      <tr>
         <td>2</td>
         <td>2022-09-10</td>
         <td>2000</td>
         <td>1889</td>
         <td>2</td>
      </tr>
      <tr>
         <td>3</td>
         <td>2022-09-11</td>
         <td>750</td>
         <td>4545</td>
         <td>2</td>
      </tr>
      <tr>
         <td>4</td>
         <td>2022-09-11</td>
         <td>1255</td>
         <td>8321</td>
         <td>3</td>
      </tr>
      <tr>
         <td>5</td>
         <td>2022-09-11</td>
         <td>525</td>
         <td>4545</td>
         <td>1</td>
      </tr>
   </tbody>
</table>

Employees table:
    
<table>
   <thead>
      <tr>
         <th>EmployeeId</th>
         <th>FirstName</th>
         <th>Commission</th>
      </tr>
   </thead>
   <tbody>
      <tr>
         <td>1</td>
         <td>Ted</td>
         <td>0.65</td>
      </tr>
      <tr>
         <td>2</td>
         <td>Lisa</td>
         <td>0.68</td>
      </tr>
      <tr>
         <td>3</td>
         <td>Eric</td>
         <td>0.5</td>
      </tr>
   </tbody>
</table>
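
The two tables above can be written in SQL with the primary and foreign keys declared explicitly. In this sketch (the column types are assumed), `Sales.EmployeeId` refers to `Employees.EmployeeId`. The foreign key values repeat, as they are allowed to, and a sale that points to a non-existent employee (the id 99 is made up) is rejected. Note that SQLite only checks foreign keys when `PRAGMA foreign_keys = ON` has been set on the connection.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this is switched on for the connection
con.execute("PRAGMA foreign_keys = ON")

con.executescript("""
CREATE TABLE Employees (
    EmployeeId INTEGER PRIMARY KEY,   -- primary key: unique per row
    FirstName  TEXT,
    Commission REAL
);
CREATE TABLE Sales (
    SalesId    INTEGER PRIMARY KEY,
    Date       TEXT,
    Price      INTEGER,
    CustomerId INTEGER,
    EmployeeId INTEGER REFERENCES Employees(EmployeeId)  -- foreign key
);
INSERT INTO Employees VALUES (1, 'Ted', 0.65), (2, 'Lisa', 0.68), (3, 'Eric', 0.5);
INSERT INTO Sales VALUES
    (1, '2022-09-10', 1050, 2566, 1), (2, '2022-09-10', 2000, 1889, 2),
    (3, '2022-09-11', 750, 4545, 2), (4, '2022-09-11', 1255, 8321, 3),
    (5, '2022-09-11', 525, 4545, 1);
""")

# Foreign key values may repeat: count the sales per employee
repeats = con.execute(
    "SELECT EmployeeId, COUNT(*) FROM Sales GROUP BY EmployeeId ORDER BY EmployeeId"
).fetchall()
print(repeats)

# A sale pointing to an employee that does not exist is rejected
try:
    con.execute("INSERT INTO Sales VALUES (6, '2022-09-12', 300, 1889, 99)")
    orphan_allowed = True
except sqlite3.IntegrityError as err:
    print(err)
    orphan_allowed = False
```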

## Join-operation
- We can link data from different tables together
- A row in one table can be linked to a row in another table when they share a key value (e.g., a foreign key that matches a primary key)
- We will use SQL to perform these Join-operations later
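
A minimal join sketch on the Sales and Employees tables from the keys example (the column types are assumed): each sale is matched with the employee whose `EmployeeId` equals the sale's `EmployeeId`, so the employee's first name can be shown next to each sale.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Employees (EmployeeId INTEGER PRIMARY KEY, FirstName TEXT, Commission REAL);
CREATE TABLE Sales (SalesId INTEGER PRIMARY KEY, Date TEXT, Price INTEGER,
                    CustomerId INTEGER, EmployeeId INTEGER);
INSERT INTO Employees VALUES (1, 'Ted', 0.65), (2, 'Lisa', 0.68), (3, 'Eric', 0.5);
INSERT INTO Sales VALUES
    (1, '2022-09-10', 1050, 2566, 1), (2, '2022-09-10', 2000, 1889, 2),
    (3, '2022-09-11', 750, 4545, 2), (4, '2022-09-11', 1255, 8321, 3),
    (5, '2022-09-11', 525, 4545, 1);
""")

# Match every sale with the employee whose primary key equals the sale's foreign key
query = """
SELECT Sales.SalesId, Sales.Date, Employees.FirstName, Sales.Price
FROM Sales
JOIN Employees ON Sales.EmployeeId = Employees.EmployeeId
ORDER BY Sales.SalesId
"""
rows = con.execute(query).fetchall()
for row in rows:
    print(row)
```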

## Views
- A view shows a subset of the data in a database and is based on a query that runs on one or more tables
- A view can show information from many tables together in one table that is displayed to the user
- We can use a view to combine the information in the three tables from the example above and show the first (bad) table
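
A minimal sketch of such a view, assuming the three tables from the sales example (only the four customers visible there are included, which covers all five sales). The view name `SalesOverview` is hypothetical. Selecting from the view returns the columns of the original bad table, while the data itself is still stored only once in the underlying tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Employees (EmployeeId INTEGER PRIMARY KEY, FirstName TEXT, Commission REAL);
CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, Address TEXT, Phone TEXT);
CREATE TABLE Sales (SalesId INTEGER PRIMARY KEY, Date TEXT, Price INTEGER,
                    CustomerId INTEGER, EmployeeId INTEGER);
INSERT INTO Employees VALUES (1, 'Ted', 0.65), (2, 'Lisa', 0.68), (3, 'Eric', 0.5);
INSERT INTO Customers VALUES
    (1889, '1015 Oslo', '98589585'), (2566, '2210 Tromsø', '43251010'),
    (4545, '1025 Oslo', '99559955'), (8321, '7420 Bergen', '45969696');
INSERT INTO Sales VALUES
    (1, '2022-09-10', 1050, 2566, 1), (2, '2022-09-10', 2000, 1889, 2),
    (3, '2022-09-11', 750, 4545, 2), (4, '2022-09-11', 1255, 8321, 3),
    (5, '2022-09-11', 525, 4545, 1);

-- The view stores no data of its own, only the query that combines the tables
CREATE VIEW SalesOverview AS
SELECT s.SalesId, s.Date, e.FirstName AS Employee, s.CustomerId, s.Price,
       c.Address AS CustomerAddress, e.Commission AS EmployeeCommission,
       c.Phone AS CustomerPhone
FROM Sales AS s
JOIN Employees AS e ON s.EmployeeId = e.EmployeeId
JOIN Customers AS c ON s.CustomerId = c.CustomerId;
""")

# A view is queried exactly like a table
rows = con.execute("SELECT * FROM SalesOverview ORDER BY SalesId").fetchall()
for row in rows:
    print(row)
```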

### Terminology (same concept, different name)

 - **Table** - Relation - File
 - **Row** - Tuple - Record
 - **Column** - Attribute - Field
 

# 5. The Structured Query Language (SQL)

## What is SQL?
- A computer language used for interacting with data in relational databases
- SQL is an international standard language for creating and querying databases and their tables
- Most data-driven applications and websites rely on SQL to retrieve, insert, delete and modify data 

## Data manipulation
- `SELECT` is used to retrieve certain records from one or more tables.
- `INSERT` to create a record.
- `UPDATE` to modify records.
- `DELETE` to delete records.
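
The four statements in a short sketch on a hypothetical `Employees` table. The changes made (a new commission for Eric, removing Ted) are made-up examples. The `?` placeholders let `sqlite3` insert the values safely instead of pasting them into the SQL string.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employees (EmployeeId INTEGER PRIMARY KEY, FirstName TEXT, Commission REAL)")

# INSERT: create records
con.executemany("INSERT INTO Employees (EmployeeId, FirstName, Commission) VALUES (?, ?, ?)",
                [(1, "Ted", 0.65), (2, "Lisa", 0.68), (3, "Eric", 0.5)])

# UPDATE: modify the records that match the WHERE condition
con.execute("UPDATE Employees SET Commission = ? WHERE FirstName = ?", (0.55, "Eric"))

# DELETE: remove the records that match the WHERE condition
con.execute("DELETE FROM Employees WHERE EmployeeId = ?", (1,))
con.commit()

# SELECT: retrieve records
rows = con.execute(
    "SELECT EmployeeId, FirstName, Commission FROM Employees ORDER BY EmployeeId"
).fetchall()
print(rows)
```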

## Getting data out of the database

- The `SELECT` statement can be combined with other SQL clauses to retrieve specific combinations of the data in the database
- `FROM`: select **from** a specific table
- `WHERE`: select **where** a condition is met
- `JOIN`: **join** tables on keys
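
Putting the clauses together: a sketch that finds all sales to customers with an Oslo address, joining `Sales` and `Customers` on `CustomerId` and filtering with `WHERE`. It uses the sales example data from the beginning of these notes; the column types and the `LIKE '%Oslo'` address filter are our own choices.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, Address TEXT, Phone TEXT);
CREATE TABLE Sales (SalesId INTEGER PRIMARY KEY, Date TEXT, Price INTEGER,
                    CustomerId INTEGER, EmployeeId INTEGER);
INSERT INTO Customers VALUES
    (1889, '1015 Oslo', '98589585'), (2566, '2210 Tromsø', '43251010'),
    (4545, '1025 Oslo', '99559955'), (8321, '7420 Bergen', '45969696');
INSERT INTO Sales VALUES
    (1, '2022-09-10', 1050, 2566, 1), (2, '2022-09-10', 2000, 1889, 2),
    (3, '2022-09-11', 750, 4545, 2), (4, '2022-09-11', 1255, 8321, 3),
    (5, '2022-09-11', 525, 4545, 1);
""")

query = """
SELECT Sales.SalesId, Sales.Date, Sales.Price, Customers.Address  -- which columns
FROM Sales                                                        -- which table
JOIN Customers ON Sales.CustomerId = Customers.CustomerId         -- link on keys
WHERE Customers.Address LIKE '%Oslo'                              -- which rows
ORDER BY Sales.SalesId
"""
oslo_sales = con.execute(query).fetchall()
for row in oslo_sales:
    print(row)
```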