# Example 1 - Employees Dataset




## Introduction

### Credits

The data and descriptions below come from the original repository: https://github.com/datacharmer/test_db

More information is available in the MySQL documentation: https://dev.mysql.com/doc/employee/en/sakila-structure.html

### Overview

A sample database with an integrated test suite, used to test your applications and database servers.

The source repository was migrated from [Launchpad](https://launchpad.net/test-db).

For usage instructions, see the [MySQL docs](https://dev.mysql.com/doc/employee/en/index.html).

#### Where it comes from

The original data was created by Fusheng Wang and Carlo Zaniolo at
Siemens Corporate Research. The data is in XML format:
http://timecenter.cs.aau.dk/software.htm

Giuseppe Maxia made the relational schema and Patrick Crews exported
the data in relational format.

The database contains about 300,000 employee records with 2.8 million 
salary entries. The export data is 167 MB, which is not huge, but
heavy enough to be non-trivial for testing.

The data was generated, and as such there are inconsistencies and subtle
problems. Rather than removing them, we decided to leave the contents
untouched, and use these issues as data cleaning exercises.



## Structure and Schema

The following diagram provides an overview of the structure of the Employees sample database. 

<img src="https://dev.mysql.com/doc/employee/en/images/employees-schema.png" width="700px">

[Source](https://dev.mysql.com/doc/employee/en/sakila-structure.html)

## Data

Expected number of records per table:

<pre>
    +--------------+------------------+
    | table_name   | expected_records |
    +--------------+------------------+
    | employees    |           300024 |
    | departments  |                9 |
    | dept_manager |               24 |
    | dept_emp     |           331603 |
    | titles       |           443308 |
    | salaries     |          2844047 |
    +--------------+------------------+
</pre>



### Preparing folder and downloading dataset files

In [27]:
%%bash

# create the datasets folder (no error if it already exists)
mkdir -p datasets && cd datasets

# download the dump and unzip it, overwriting any existing files
wget https://github.com/datacharmer/test_db/archive/master.zip -q -O db_test_employee_master.zip
unzip -o db_test_employee_master.zip

Archive:  db_test_employee_master.zip
0b66c2338736779e3b150c7d125b1012d95a961f
   creating: test_db-master/
  inflating: test_db-master/Changelog  
  inflating: test_db-master/README.md  
  inflating: test_db-master/employees.sql  
  inflating: test_db-master/employees_partitioned.sql  
  inflating: test_db-master/employees_partitioned_5.1.sql  
   creating: test_db-master/images/
  inflating: test_db-master/images/employees.gif  
  inflating: test_db-master/images/employees.jpg  
  inflating: test_db-master/images/employees.png  
  inflating: test_db-master/load_departments.dump  
  inflating: test_db-master/load_dept_emp.dump  
  inflating: test_db-master/load_dept_manager.dump  
  inflating: test_db-master/load_employees.dump  
  inflating: test_db-master/load_salaries1.dump  
  inflating: test_db-master/load_salaries2.dump  
  inflating: test_db-master/load_salaries3.dump  
  inflating: test_db-master/load_titles.dump  
  inflating: test_db-master/objects.sql  
   creating: test_db
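
The same download-and-extract step can also be sketched in plain Python, which may be convenient where `wget` or `unzip` are not installed. This is only an illustration using the standard library: the helper names `download` and `extract` are made up for this example, and the URL is the one used in the cell above.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the wget command above
DUMP_URL = "https://github.com/datacharmer/test_db/archive/master.zip"


def download(url, target):
    """Fetch `url` into `target`, creating the parent folder if needed."""
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    urllib.request.urlretrieve(url, str(target))
    return target


def extract(archive, dest):
    """Unpack `archive` into `dest` and return the list of member names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)  # existing files are overwritten, like `unzip -o`
        return zf.namelist()
```

A possible call sequence would be `extract(download(DUMP_URL, "datasets/db_test_employee_master.zip"), "datasets")`, which needs network access and is therefore not run here.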

### Importing the dataset

In [28]:
%%bash

cd datasets/test_db-master 

# importing the data 
mysql < employees.sql

INFO
CREATING DATABASE STRUCTURE
INFO
storage engine: InnoDB
INFO
LOADING departments
INFO
LOADING employees
INFO
LOADING dept_emp
INFO
LOADING dept_manager
INFO
LOADING titles
INFO
LOADING salaries
data_load_time_diff
00:00:45


### Verifying imported data

In [29]:
%%bash

cd datasets/test_db-master

# verifying the data 
mysql -t < test_employees_md5.sql

+----------------------+
| INFO                 |
+----------------------+
| TESTING INSTALLATION |
+----------------------+
+--------------+------------------+----------------------------------+
| table_name   | expected_records | expected_crc                     |
+--------------+------------------+----------------------------------+
| employees    |           300024 | 4ec56ab5ba37218d187cf6ab09ce1aa1 |
| departments  |                9 | d1af5e170d2d1591d776d5638d71fc5f |
| dept_manager |               24 | 8720e2f0853ac9096b689c14664f847e |
| dept_emp     |           331603 | ccf6fe516f990bdaa49713fc478701b7 |
| titles       |           443308 | bfa016c472df68e70a03facafa1bc0a8 |
| salaries     |          2844047 | fd220654e95aea1b169624ffe3fca934 |
+--------------+------------------+----------------------------------+
+--------------+------------------+----------------------------------+
| table_name   | found_records    | found_crc                        |
+--------------+-------
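
The verification script prints one table of expected values and one of found values, and the import is good when both agree. As a rough sketch of how that comparison could be automated, the Python below parses `mysql -t` style tables and reports differences. The helper names are hypothetical, the expected rows are copied from the output above, and the "found" table is made-up illustration data (with one record count deliberately changed), not real output.

```python
def parse_mysql_table(text):
    """Parse one `mysql -t` style table into a list of dicts keyed by column name."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +-----+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first data line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def find_mismatches(expected_rows, found_rows):
    """Return (table_name, reason) pairs for every table that does not match."""
    found = {row["table_name"]: row for row in found_rows}
    problems = []
    for exp in expected_rows:
        name = exp["table_name"]
        got = found.get(name)
        if got is None:
            problems.append((name, "missing"))
        elif got["found_records"] != exp["expected_records"]:
            problems.append((name, "record count differs"))
        elif got["found_crc"] != exp["expected_crc"]:
            problems.append((name, "checksum differs"))
    return problems


# Two rows copied from the expected table above
EXPECTED = """
+--------------+------------------+----------------------------------+
| table_name   | expected_records | expected_crc                     |
+--------------+------------------+----------------------------------+
| departments  |                9 | d1af5e170d2d1591d776d5638d71fc5f |
| dept_manager |               24 | 8720e2f0853ac9096b689c14664f847e |
+--------------+------------------+----------------------------------+
"""

# Made-up "found" table for illustration only: dept_manager count altered
FOUND_EXAMPLE = """
+--------------+------------------+----------------------------------+
| table_name   | found_records    | found_crc                        |
+--------------+------------------+----------------------------------+
| departments  |                9 | d1af5e170d2d1591d776d5638d71fc5f |
| dept_manager |               23 | 8720e2f0853ac9096b689c14664f847e |
+--------------+------------------+----------------------------------+
"""

problems = find_mismatches(parse_mysql_table(EXPECTED), parse_mysql_table(FOUND_EXAMPLE))
print(problems)
```

An empty list would mean every table matched on both record count and checksum.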

## Querying Interfaces

### Bash-based

In [30]:
! mysql -e "show databases;"

+--------------------+
| Database           |
+--------------------+
| employees          |
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+


In [31]:
! mysql -D "employees" -e "SELECT * FROM employees WHERE hire_date<('1991-1-1') LIMIT 5;" 

+--------+------------+------------+-----------+--------+------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date  |
+--------+------------+------------+-----------+--------+------------+
|  10001 | 1953-09-02 | Georgi     | Facello   | M      | 1986-06-26 |
|  10002 | 1964-06-02 | Bezalel    | Simmel    | F      | 1985-11-21 |
|  10003 | 1959-12-03 | Parto      | Bamford   | M      | 1986-08-28 |
|  10004 | 1954-05-01 | Chirstian  | Koblick   | M      | 1986-12-01 |
|  10005 | 1955-01-21 | Kyoichi    | Maliniak  | M      | 1989-09-12 |
+--------+------------+------------+-----------+--------+------------+


In [32]:
! mysql -e "use employees; SELECT * FROM employees WHERE hire_date<('1991-1-1') LIMIT 5;" 

+--------+------------+------------+-----------+--------+------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date  |
+--------+------------+------------+-----------+--------+------------+
|  10001 | 1953-09-02 | Georgi     | Facello   | M      | 1986-06-26 |
|  10002 | 1964-06-02 | Bezalel    | Simmel    | F      | 1985-11-21 |
|  10003 | 1959-12-03 | Parto      | Bamford   | M      | 1986-08-28 |
|  10004 | 1954-05-01 | Chirstian  | Koblick   | M      | 1986-12-01 |
|  10005 | 1955-01-21 | Kyoichi    | Maliniak  | M      | 1989-09-12 |
+--------+------------+------------+-----------+--------+------------+


In [33]:
%%bash 

mysql -D "employees" -e "SELECT * FROM employees WHERE hire_date<('1991-1-1') LIMIT 5;" 

emp_no	birth_date	first_name	last_name	gender	hire_date
10001	1953-09-02	Georgi	Facello	M	1986-06-26
10002	1964-06-02	Bezalel	Simmel	F	1985-11-21
10003	1959-12-03	Parto	Bamford	M	1986-08-28
10004	1954-05-01	Chirstian	Koblick	M	1986-12-01
10005	1955-01-21	Kyoichi	Maliniak	M	1989-09-12
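
Note that the `%%bash` cell prints tab-separated rows, while the `!` cells print a bordered table. This is most likely because the `mysql` client switches to batch output when it is not attached to a terminal. The tab-separated form is easy to consume from Python, as the sketch below suggests. The function names are hypothetical, `run_query` assumes a `mysql` client on the `PATH` with the same credentials as the cells above, and the sample rows are copied from the output above.

```python
import csv
import io
import subprocess


def run_query(sql, database="employees"):
    """Run `sql` through the mysql client in batch mode and return its stdout."""
    result = subprocess.run(
        ["mysql", "--batch", "-D", database, "-e", sql],
        capture_output=True,
        text=True,
        check=True,  # raise if the client exits with an error
    )
    return result.stdout


def parse_batch_output(text):
    """Turn tab-separated mysql output into a list of dicts, one per row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


# First two rows of the output shown above
SAMPLE = (
    "emp_no\tbirth_date\tfirst_name\tlast_name\tgender\thire_date\n"
    "10001\t1953-09-02\tGeorgi\tFacello\tM\t1986-06-26\n"
    "10002\t1964-06-02\tBezalel\tSimmel\tF\t1985-11-21\n"
)

rows = parse_batch_output(SAMPLE)
print(rows[0]["first_name"], rows[0]["hire_date"])
```

With a running server, `parse_batch_output(run_query("SELECT * FROM employees LIMIT 5;"))` should yield the same structure. All values arrive as strings and would need converting for numeric or date work.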


### Jupyter Magic for SQL

In [34]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [35]:
%sql mysql://root:@127.0.0.1:3306/employees
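
The connection string packs several settings into one URL. As a small illustration, the sketch below splits it into its parts with Python's standard `urllib.parse`. It assumes the `sql` extension follows the usual `dialect://user:password@host:port/database` layout, and the function name `describe_connection` is made up for this example.

```python
from urllib.parse import urlsplit


def describe_connection(url):
    """Break a database URL into its named components."""
    parts = urlsplit(url)
    return {
        "dialect": parts.scheme,
        "user": parts.username,
        "password": parts.password,  # empty string when nothing follows the colon
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


info = describe_connection("mysql://root:@127.0.0.1:3306/employees")
for key, value in info.items():
    print(f"{key}: {value!r}")
```

Here the user is `root` with an empty password, connecting to a local server on port 3306 and selecting the `employees` database.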

In [36]:
%sql SELECT * FROM employees WHERE hire_date<('1991-1-1') LIMIT 5;

 * mysql://root:***@127.0.0.1:3306/employees
5 rows affected.


emp_no,birth_date,first_name,last_name,gender,hire_date
10001,1953-09-02,Georgi,Facello,M,1986-06-26
10002,1964-06-02,Bezalel,Simmel,F,1985-11-21
10003,1959-12-03,Parto,Bamford,M,1986-08-28
10004,1954-05-01,Chirstian,Koblick,M,1986-12-01
10005,1955-01-21,Kyoichi,Maliniak,M,1989-09-12


In [37]:
%%sql 

SELECT * FROM employees WHERE hire_date<('1991-1-1') LIMIT 5;

 * mysql://root:***@127.0.0.1:3306/employees
5 rows affected.


emp_no,birth_date,first_name,last_name,gender,hire_date
10001,1953-09-02,Georgi,Facello,M,1986-06-26
10002,1964-06-02,Bezalel,Simmel,F,1985-11-21
10003,1959-12-03,Parto,Bamford,M,1986-08-28
10004,1954-05-01,Chirstian,Koblick,M,1986-12-01
10005,1955-01-21,Kyoichi,Maliniak,M,1989-09-12
