## Databases

A **relational database** organizes data into one or more tables (or "relations") of columns and rows, with a unique key identifying each row. A **row** holds the data about one object, e.g., a student. The **columns** describe the properties (attributes) of that object, e.g., name, year, specialization. Each column describes a single property and has a fixed datatype, so all rows have the same fields and differ only in their values. Here is a toy example:

|Name (string)| Specialization (string)| Year (integer)|
| ------------- |-------------:| -----:|
|Roman| linguistics| 2|
|Masha| linguistics| 2|
|Sasha| programming| 1|

In a relational database, each table has to have a **primary key**: a field or a combination of fields that uniquely identifies each row of the table.
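To see what "uniquely identifies" means in practice, here is a minimal sketch using Python's built-in `sqlite3` module. The `id` column and the extra student "Pasha" are assumptions made for illustration; the toy table above has no key column of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

conn.execute(
    """
    CREATE TABLE students (
        id INTEGER PRIMARY KEY,   -- hypothetical key column, not part of the toy table
        name TEXT,
        specialization TEXT,
        year INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO students (id, name, specialization, year) VALUES (?, ?, ?, ?)",
    [
        (1, "Roman", "linguistics", 2),
        (2, "Masha", "linguistics", 2),
        (3, "Sasha", "programming", 1),
    ],
)

# Reusing an existing key value violates the PRIMARY KEY constraint.
try:
    conn.execute(
        "INSERT INTO students (id, name, specialization, year) "
        "VALUES (1, 'Pasha', 'programming', 1)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
print(duplicate_rejected, row_count)  # True 3
```

Note that Roman and Masha share the same specialization and year, so neither of those columns could serve as a key on its own.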

*Relational tables can be linked with each other*, so a single query can extract data from several tables at once. Linking lets each fact be stored only once, which keeps the database small.

There are three types of relationships:
* one-to-one
* one-to-many
* many-to-many

In a **one-to-one** relationship, each row of the first table corresponds to at most one row of the second table, and vice versa. In a **one-to-many** relationship, one row of the first table can correspond to several rows of the second table, but each row of the second table corresponds to only one row of the first (one specialization, many students). In a **many-to-many** relationship, one row of the first table can correspond to several rows of the second table, and vice versa (an actor plays in many films, and a film has many actors).
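A one-to-many link is usually expressed with a foreign key: the "many" side stores the key of the "one" side. The sketch below assumes a hypothetical split of the toy student table into `specializations` and `students`; the table names, column names and the extra student are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked to

conn.executescript(
    """
    CREATE TABLE specializations (
        id INTEGER PRIMARY KEY,
        title TEXT
    );
    CREATE TABLE students (
        id INTEGER PRIMARY KEY,
        name TEXT,
        -- each student points to exactly one specialization
        specialization_id INTEGER REFERENCES specializations (id)
    );
    INSERT INTO specializations VALUES (1, 'linguistics'), (2, 'programming');
    INSERT INTO students VALUES (1, 'Roman', 1), (2, 'Masha', 1), (3, 'Sasha', 2);
    """
)

# One specialization corresponds to several students.
students_per_specialization = conn.execute(
    """
    SELECT specializations.title, COUNT(students.id)
    FROM specializations
        JOIN students ON students.specialization_id = specializations.id
    GROUP BY specializations.title
    ORDER BY specializations.title
    """
).fetchall()
print(students_per_specialization)  # [('linguistics', 2), ('programming', 1)]

# A student cannot point to a specialization that does not exist.
try:
    conn.execute("INSERT INTO students VALUES (4, 'Pasha', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```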

Imagine we want to store information about films. If we keep everything in one flat table, the title and the year are repeated for every actor:

```
Larry Crowne | 2011 | Tom Hanks ...
Larry Crowne | 2011 | Julia Roberts ...
You've Got Mail | 1998 | Tom Hanks ...
You've Got Mail | 1998 | Meg Ryan ...
Pretty Woman | 1990 | Julia Roberts ...
Pretty Woman | 1990 | Richard Gere ...
```
We could create three tables and link them: actors, films and a table that says that actor X played in film Y. That will allow us not to repeat information multiple times.

people (id, name, ...)
```
1 | Tom Hanks | ...
2 | Julia Roberts | ...
3 | Meg Ryan | ...
4 | Richard Gere | ...
```

films (id, title, year, ...)
```
1 | Larry Crowne | 2011 | ...
2 | You've Got Mail | 1998 | ...
3 | Pretty Woman | 1990 | ...
```

roles (film_id, person_id)
```
1 | 1
1 | 2
2 | 1
2 | 3
3 | 2
3 | 4
```

Each role is now just a pair of numbers, which is cheap to store, and any piece of information, such as an actor's name, needs to be changed in only one place.
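The three tables can be built and joined back into the flat listing with a few lines of Python and `sqlite3`. This is only a sketch: the column types and the composite primary key on `roles` are assumptions, and the renamed actor at the end is a made-up change used to show the "one place" property.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE films (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
    CREATE TABLE roles (
        film_id INTEGER REFERENCES films (id),
        person_id INTEGER REFERENCES people (id),
        PRIMARY KEY (film_id, person_id)   -- assumed: one row per (film, actor) pair
    );
    INSERT INTO people VALUES
        (1, 'Tom Hanks'), (2, 'Julia Roberts'), (3, 'Meg Ryan'), (4, 'Richard Gere');
    INSERT INTO films VALUES
        (1, 'Larry Crowne', 2011), (2, 'You''ve Got Mail', 1998), (3, 'Pretty Woman', 1990);
    INSERT INTO roles VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 2), (3, 4);
    """
)

# Joining through roles reproduces the flat "film | year | actor" listing.
flat = conn.execute(
    """
    SELECT films.title, films.year, people.name
    FROM roles
        JOIN films ON roles.film_id = films.id
        JOIN people ON roles.person_id = people.id
    ORDER BY films.id, people.id
    """
).fetchall()
for title, year, name in flat:
    print(f"{title} | {year} | {name}")

# A name is stored once, so one UPDATE changes every joined row that uses it.
conn.execute("UPDATE people SET name = 'Thomas Hanks' WHERE id = 1")  # illustrative rename
renamed_rows = conn.execute(
    """
    SELECT COUNT(*)
    FROM roles JOIN people ON roles.person_id = people.id
    WHERE people.name = 'Thomas Hanks'
    """
).fetchone()[0]
print(renamed_rows)  # 2
```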

To work with a database, you need a special program, a **Database Management System** (**DBMS**). Below I list some DBMSs:

* SQLite
* MySQL
* PostgreSQL
* MongoDB (document-oriented rather than relational)
* ...

## SQL

Some of the database management systems have SQL in their names. What does SQL stand for?

**SQL** *(Structured Query Language)* is a standard language for accessing and manipulating databases. With SQL you can query a database; retrieve, insert, update and delete records; create new databases and tables; and so on.

The core of SQL is small. We will need a handful of commands that operate on data and tables (CREATE, DELETE, DROP, SELECT, INSERT, UPDATE) and keywords that narrow a query down (WHERE, IN, AND, OR, NOT, BETWEEN, LIKE, LIMIT, OFFSET). Note that the order of clauses in a query is fixed: first comes "what" (SELECT), then "where" to take it from (FROM), then "how" to filter, sort and cut the result (WHERE, ORDER BY, LIMIT).
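The fixed clause order is easy to check on a tiny table. The sketch below uses Python's `sqlite3` and a hypothetical `films` table filled with the three films from the example above; it runs one query with the clauses in the expected order and one with LIMIT placed before WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE films (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
    INSERT INTO films VALUES
        (1, 'Larry Crowne', 2011), (2, 'You''ve Got Mail', 1998), (3, 'Pretty Woman', 1990);
    """
)

well_ordered = """
    SELECT title, year                  -- what to return
    FROM films                          -- where to take it from
    WHERE year BETWEEN 1990 AND 2000    -- which rows to keep
    ORDER BY year DESC                  -- how to sort them
    LIMIT 1 OFFSET 1                    -- skip one row, then return one row
"""
result = conn.execute(well_ordered).fetchall()
print(result)  # [('Pretty Woman', 1990)]

# The same pieces in a different order are a syntax error.
misordered = "SELECT title FROM films LIMIT 1 WHERE year > 1990"
try:
    conn.execute(misordered)
    order_matters = False
except sqlite3.OperationalError:
    order_matters = True
print(order_matters)  # True
```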

You have already worked through the interactive tutorial SQLBolt.

SELECT exercises

* https://sqlbolt.com/lesson/select_queries_introduction
* https://sqlbolt.com/lesson/select_queries_with_joins
* https://sqlbolt.com/lesson/select_queries_with_expressions


DML exercises

* https://sqlbolt.com/lesson/inserting_rows
* https://sqlbolt.com/lesson/updating_rows
* https://sqlbolt.com/lesson/deleting_rows

Creating and deleting tables

* https://sqlbolt.com/lesson/creating_tables


## Programs to use when working with databases

A database is not stored as plain text, so you cannot open it in a text editor and examine its contents. To do so, you need a special program designed to work with databases. Below I list some of the options:

* [MySQL](https://www.mysql.com/)
* [PostgreSQL](https://www.postgresql.org/)
* [MongoDB](https://www.mongodb.com/)
* [Firebird](https://firebirdsql.org/)

[Here](https://blog.capterra.com/free-database-software/) you can find a discussion of the pros and cons of the above options.

**DB Browser for SQLite**: https://sqlitebrowser.org/dl/
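For SQLite files, a quick look inside is also possible from Python, without a graphical tool. The sketch below creates a small database file in a temporary directory (the file name `example.db` and the `films` table are assumptions for illustration), shows that the file starts with a binary header rather than readable text, and then lists the tables and columns through `sqlite3`.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.db")  # hypothetical file name

    # Create a small database file to inspect.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE films (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
    conn.commit()
    conn.close()

    # The raw file is binary: it begins with a fixed 16-byte header.
    with open(path, "rb") as f:
        header = f.read(16)
    print(header)  # b'SQLite format 3\x00'

    # Through the DBMS we can ask which tables and columns exist.
    conn = sqlite3.connect(path)
    tables = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    columns = [row[1] for row in conn.execute("PRAGMA table_info(films)")]
    conn.close()

print(tables)   # ['films']
print(columns)  # ['id', 'title', 'year']
```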






# IMDB

Let's look at the IMDB database.

[Download here](https://yadi.sk/d/GOxdLhob7et7Hw?w=1)

<img src="img/imdb_schema.png">

**SELECT**


``` mysql
-- every column of every row (can be slow on a large table)
SELECT * FROM titles
```

``` mysql
-- only the first 10 rows
SELECT * FROM titles LIMIT 10
```

``` mysql
SELECT *
FROM titles
WHERE premiered >= 2019
LIMIT 50
```

``` mysql
SELECT *
FROM titles
    JOIN film_types ON titles.type = film_types.id
WHERE premiered >= 2019
LIMIT 50
```

```mysql
SELECT *
FROM titles
    JOIN film_genres ON titles.title_id = film_genres.title_id
    JOIN genre_types ON film_genres.genre_id = genre_types.id
WHERE premiered >= 2019
LIMIT 50
```

``` mysql
SELECT *
FROM titles
    JOIN film_genres ON titles.title_id = film_genres.title_id
    JOIN genre_types ON film_genres.genre_id = genre_types.id
WHERE premiered >= 2019 AND genre_name = 'Comedy' -- string literals take single quotes in standard SQL
LIMIT 50
```

``` mysql
SELECT *
FROM titles
    JOIN crew ON titles.title_id = crew.title_id
    JOIN people ON crew.person_id = people.person_id
WHERE name = 'Tom Hanks'
ORDER BY premiered DESC
```

``` mysql
SELECT title, premiered
FROM titles
    JOIN crew ON titles.title_id = crew.title_id
    JOIN people ON crew.person_id = people.person_id
WHERE name = 'Tom Hanks'
ORDER BY premiered DESC
```

``` mysql
CREATE TABLE mytable (
    field1 text,
    field2 text,
    field3 integer,
    PRIMARY KEY (field1, field2) -- composite key: each (field1, field2) pair must be unique
);
```

``` mysql
SELECT *
FROM titles
    JOIN crew ON titles.title_id = crew.title_id
    JOIN people ON crew.person_id = people.person_id
WHERE premiered >= 2019
LIMIT 50
```

``` mysql
-- indexes on the columns used in the JOINs make lookups in crew faster
CREATE INDEX crew_title ON crew (title_id);
CREATE INDEX crew_person ON crew (person_id);
```
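One way to see what an index changes is to ask SQLite for its query plan before and after creating it. The sketch below is an illustration on a synthetic stand-in for `crew` (the numeric ids are generated, not taken from the IMDB data), and the `plan` helper is a hypothetical convenience function. The exact wording of the plan differs between SQLite versions, so the code only checks whether the index name appears in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crew (title_id INTEGER, person_id INTEGER)")
# Synthetic rows: 100 titles with 5 people each.
conn.executemany(
    "INSERT INTO crew VALUES (?, ?)",
    [(title_id, person_id) for title_id in range(100) for person_id in range(5)],
)


def plan(sql):
    """Return the human-readable part of SQLite's query plan as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)  # the last column holds the description


lookup = "SELECT * FROM crew WHERE title_id = 42"

plan_before = plan(lookup)  # without an index the whole table is scanned
conn.execute("CREATE INDEX crew_title ON crew (title_id)")
plan_after = plan(lookup)   # with the index only matching rows are visited

print(plan_before)
print(plan_after)
print("crew_title" in plan_before, "crew_title" in plan_after)  # False True
```

On a table of a few hundred rows the difference is not noticeable, but on a table with millions of rows a full scan per joined row is what makes an unindexed JOIN slow.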

``` mysql
SELECT name, title, premiered, rating
FROM titles
    JOIN crew ON titles.title_id = crew.title_id
    JOIN people ON crew.person_id = people.person_id
    JOIN rating ON titles.title_id = rating.title_id
WHERE name IN ('Tom Hanks', 'Julia Roberts', 'Natalie Portman')
```

``` mysql
SELECT
    name,
    MAX(rating) as max_rating, -- maximum
    MIN(rating) as min_rating, -- minimum
    AVG(rating) as average_rating, -- average
    COUNT(titles.title_id) as n_films -- counting the number of films
FROM titles
    JOIN crew ON titles.title_id = crew.title_id
    JOIN people ON crew.person_id = people.person_id
    JOIN rating ON titles.title_id = rating.title_id
WHERE name IN ('Tom Hanks', 'Julia Roberts', 'Natalie Portman')
GROUP BY name
ORDER BY average_rating DESC
```

``` mysql
SELECT
    name,
    MAX(rating) as max_rating, -- maximum
    MIN(rating) as min_rating, -- minimum
    ROUND(AVG(rating), 2) as average_rating, -- average
    COUNT(titles.title_id) as n_films -- counting the number of films
FROM titles
    JOIN crew ON titles.title_id = crew.title_id
    JOIN people ON crew.person_id = people.person_id
    JOIN rating ON titles.title_id = rating.title_id
WHERE name IN ('Tom Hanks', 'Julia Roberts', 'Natalie Portman')
GROUP BY name
ORDER BY average_rating DESC
```
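The aggregate functions can be tried out without the full IMDB file. The sketch below uses Python's `sqlite3` with a single hypothetical table `film_ratings`; the actor names and ratings are made-up numbers chosen so that the effect of `ROUND` is visible, not real IMDB data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film_ratings (name TEXT, rating REAL)")
conn.executemany(
    "INSERT INTO film_ratings VALUES (?, ?)",
    [
        # invented ratings, for illustration only
        ("Actor A", 7.0),
        ("Actor A", 8.0),
        ("Actor A", 8.0),
        ("Actor B", 6.5),
        ("Actor B", 7.0),
    ],
)

summary = conn.execute(
    """
    SELECT
        name,
        MAX(rating) AS max_rating,
        MIN(rating) AS min_rating,
        ROUND(AVG(rating), 2) AS average_rating,  -- 23 / 3 = 7.666... becomes 7.67
        COUNT(*) AS n_films
    FROM film_ratings
    GROUP BY name                 -- one output row per actor
    ORDER BY average_rating DESC
    """
).fetchall()

for row in summary:
    print(row)
# ('Actor A', 8.0, 7.0, 7.67, 3)
# ('Actor B', 7.0, 6.5, 6.75, 2)
```

`GROUP BY name` collapses all rows with the same name into one, and each aggregate is computed inside its own group.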