# Introduction to Database - SQL

_November 2, 2020_

Agenda today:
- Overview of databases
- Discuss differences between SQL DBs
- Explain the basic structures of an RDBMS
- Instantiate SQLite DB instance on your computer & perform queries

Learning Goals:
- Explain the motivations and usage of different databases
- Create a SQLite instance and perform different queries:
    - Selecting
    - Filtering & Ordering
    - Grouping
    - Join

<img src="https://media.giphy.com/media/vzO0Vc8b2VBLi/giphy.gif" width="300">

In [None]:
import pandas as pd
import numpy as np
import sqlite3

## Part I. Overview of Databases

### What is a Database?
- In general, databases store sets of data that can be queried for use in other applications. A database management system supports the development, administration and use of database platforms.

### What is a Relational Database? 
- A relational database management system (RDBMS) is a type of DBMS with a row-based table structure that connects related data elements and includes functions that maintain the security, accuracy, integrity and consistency of the data.
- The most basic RDBMS functions are related to create, read, update and delete operations, collectively known as CRUD.
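
To make CRUD concrete, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The `pets` table and its values are hypothetical and exist only for this illustration.

```python
import sqlite3

# in-memory database: nothing is written to disk
conn = sqlite3.connect(":memory:")
c = conn.cursor()

# hypothetical table used only for this illustration
c.execute("CREATE TABLE pets (name TEXT, species TEXT)")

# Create: add a new row
c.execute("INSERT INTO pets VALUES (?, ?)", ("Rex", "dog"))

# Read: fetch rows back
after_create = c.execute("SELECT name, species FROM pets").fetchall()

# Update: change an existing row
c.execute("UPDATE pets SET species = ? WHERE name = ?", ("wolf", "Rex"))
after_update = c.execute("SELECT name, species FROM pets").fetchall()

# Delete: remove the row
c.execute("DELETE FROM pets WHERE name = ?", ("Rex",))
after_delete = c.execute("SELECT name, species FROM pets").fetchall()

conn.commit()
conn.close()
```

Each of the four operations maps to one SQL statement: `INSERT`, `SELECT`, `UPDATE`, and `DELETE`.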

### What is SQL?

- SQL (usually pronounced like the word “sequel”) stands for Structured Query Language.
- It is a programming language used to communicate with data stored in a relational database management system.
- SQL syntax is similar to the English language, which makes it relatively easy to write, read, and interpret.

## Part II. Popular RDBMS

- SQLite
- MySQL
- PostgreSQL
- Oracle DB
- SQL Server

### SQLite

- SQLite is a popular open source SQL database. 
- It can store an entire database in a single file.
- It is 'lite' because it is not server based.
- It does not have many features of server-based RDBMSs, such as users and permissions.
- It is great for getting up and running quickly, but less suited to complex, multi-user projects.
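
The "single file, no server" idea can be sketched as follows. The file name `example.db` and the `notes` table are hypothetical; a temporary directory is used so nothing is left in your working folder.

```python
import os
import sqlite3
import tempfile

# hypothetical file name inside a temporary directory
tmp_dir = tempfile.mkdtemp()
db_path = os.path.join(tmp_dir, "example.db")

# SQLite creates the database file if it does not already exist
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE notes (body TEXT)")
conn.execute("INSERT INTO notes VALUES ('hello')")
conn.commit()
conn.close()

# the whole database now lives in this one file
file_exists = os.path.exists(db_path)

# reopening the same file gives back the same data; no server process is involved
conn = sqlite3.connect(db_path)
rows = conn.execute("SELECT body FROM notes").fetchall()
conn.close()
```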

### MySQL

- MySQL is the most popular open source SQL database. 
- It is typically used for web application development, and often accessed using PHP. 
- It is easy to use, inexpensive, reliable and has a large community of developers who can help answer questions.
- Open source development has reportedly lagged since Oracle took control of MySQL.
- It has been known to suffer from poor performance when scaling.
- It does not include some advanced features that developers may be used to.

### PostgreSQL

- PostgreSQL is an open-source SQL database that is not controlled by any corporation.
- PostgreSQL shares many of the same advantages as MySQL.
- It has sometimes been reported as slower than other databases such as MySQL, though this depends heavily on the workload.
- Hosts or service providers that offer managed PostgreSQL instances have historically been harder to come by.

### Oracle DB

- Owned by Oracle Corporation, and the code is not open sourced. 
- Oracle DB is for large applications, particularly in the banking industry. 
- The main disadvantage of using Oracle is that it is not free.

### SQL Server

- Microsoft owns SQL Server. 
- Large enterprise applications mostly use SQL Server.
- Microsoft offers a free entry-level version called Express, but SQL Server can become very expensive as you scale your application.

## Part III. Structures of RDBMS
- Tables
- Indexes
- Triggers

### Tables
Tables are used to store data within the database.  They are its main component and without them, the database would serve little purpose. 

- Tables can have hundreds, thousands, sometimes even millions of rows of data. These rows are often called **records**.
- A table consists of **columns** of data that are labeled with a descriptive name (say, age for example) and have a specific data type.
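
As a rough sketch of records and typed columns, the example below builds a small table in an in-memory SQLite database and inspects it. The `people` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# hypothetical table: each column has a name and a declared data type
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")

# each inserted row is one record
conn.executemany("INSERT INTO people VALUES (?, ?)", [("Ada", 36), ("Alan", 41)])

# PRAGMA table_info returns one row per column; fields 1 and 2 are the name and declared type
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(people)")]

# every record has one value per column
records = conn.execute("SELECT * FROM people").fetchall()
conn.close()
```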

### Indexes
Indexes are used to make data retrieval faster. Rather than having to scan an entire table for data, an index allows the database to, essentially, directly retrieve the data being asked of it.

Indexes are primarily created using a **Primary Key**.
A primary key’s main features are:

- It must contain a unique value for each row of data.
- It cannot contain null values.

A primary key is either an existing table column or a column that is specifically generated by the database according to a defined sequence.
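
Here is a minimal sketch of these ideas in SQLite, assuming a hypothetical `clients` table. It shows a duplicate key being rejected, a key being generated by the database, and an explicit index on a non-key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# hypothetical table; client_id is the primary key
conn.execute("CREATE TABLE clients (client_id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO clients VALUES (1, 'a@example.com')")

# a second row with the same primary key violates uniqueness
try:
    conn.execute("INSERT INTO clients VALUES (1, 'b@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# omitting the key lets SQLite generate the next integer value
conn.execute("INSERT INTO clients (email) VALUES ('c@example.com')")
ids = [row[0] for row in conn.execute("SELECT client_id FROM clients ORDER BY client_id")]

# an index can also be created explicitly on a non-key column
conn.execute("CREATE INDEX idx_clients_email ON clients (email)")

# EXPLAIN QUERY PLAN reports how SQLite intends to look the rows up
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM clients WHERE email = 'a@example.com'"
).fetchall()
uses_index = any("idx_clients_email" in str(row) for row in plan)
conn.close()
```

Printing `plan` is a handy way to check whether a query scans the whole table or goes through an index.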

**Question for You**

If you were a tax accountant and you wanted to create a database of your clients, which of the following columns would be a good choice for your primary key?

- First Name
- Last Name
- Email Address
- SSN
- Phone Number

### Triggers

Triggers are special instructions that are executed when important events happen, such as inserting or updating records in a table. The most common triggers are Insert, Update, and Delete triggers.

Two items define a trigger on a table: a stored procedure, and an event (such as inserting a record) that invokes its execution.
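
A small sketch of a trigger in SQLite, using hypothetical `accounts` and `audit_log` tables. Note that SQLite has no separate stored procedures, so the trigger's action is written inline between `BEGIN` and `END`; other RDBMSs may call a stored procedure instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# hypothetical tables: accounts holds data, audit_log records what happened
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
conn.execute("CREATE TABLE audit_log (event TEXT, account_name TEXT)")

# the event is AFTER INSERT ON accounts;
# the statements between BEGIN and END run each time the event fires
conn.execute("""
    CREATE TRIGGER log_new_account
    AFTER INSERT ON accounts
    BEGIN
        INSERT INTO audit_log VALUES ('insert', NEW.name);
    END
""")

# inserting into accounts fires the trigger, which writes to audit_log
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
log_rows = conn.execute("SELECT event, account_name FROM audit_log").fetchall()
conn.close()
```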

## Part IV. Working with SQLite in Python

SQLite comes standard with Python, so all you need to do to set it up is write:

`import sqlite3`

### Creating a Connection

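Before you can do anything with your DB, you must first create a connection to it. For DBs that are server based, this can be more complicated, requiring you to know the server IP, a username, a password, a database name, and a port. For example, with the third-party `mysql-connector-python` package (shown only as an illustration of a server-based connection; SQLite does not connect this way):
``` python 
import mysql.connector
# host, user, passwd, db and port are placeholders you would supply yourself
conn = mysql.connector.connect(host=host, user=user, password=passwd, database=db, port=port)
```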

In [None]:
## instantiate a SQL instance on your local computer
import sqlite3

# establish a connection object that represents the database
# (the file aggregate.db is created if it does not already exist)
conn = sqlite3.connect('aggregate.db')

# create a cursor, which lets us execute SQL commands and fetch their results
c = conn.cursor()

#### Selecting 

Syntax:

`SELECT * FROM table`
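
As a quick sketch of this syntax, the example below selects all columns and then only some columns. The `fruits` table and its rows are hypothetical and live in a throwaway in-memory database, separate from the one used in the cells that follow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# hypothetical table and rows, for illustration only
c.execute("CREATE TABLE fruits (name TEXT, color TEXT, price REAL)")
c.executemany("INSERT INTO fruits VALUES (?, ?, ?)",
              [("apple", "red", 0.5), ("banana", "yellow", 0.25)])

# * selects every column
all_columns = c.execute("SELECT * FROM fruits").fetchall()

# naming columns selects only those, in the order listed
some_columns = c.execute("SELECT color, name FROM fruits").fetchall()
conn.close()
```

`fetchall()` returns the result as a list of tuples, one tuple per row.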

In [None]:
# to execute a command use c.execute(query)
query = """CREATE TABLE students (
            first_name text,
            birth_date text,
            num_siblings integer
            )"""
c.execute(query)

In [None]:
# insert some values into it (replace '' and NULL with real values)
query = """INSERT INTO students VALUES ('Justin', '', NULL)"""
c.execute(query)
conn.commit()


In [None]:
# write a query that inserts your information into it - student 1 (replace NULL with the number of siblings)
query = """INSERT INTO students VALUES ('Joe', '10-31', NULL)"""
c.execute(query)
conn.commit()

In [None]:
# write a query that inserts your information into it - student 2 (replace '' and NULL with real values)
query = """INSERT INTO students VALUES ('', '', NULL)"""
c.execute(query)
conn.commit()

In [None]:
# write a query that inserts your information into it - student 3 (replace '' and NULL with real values)
query = """INSERT INTO students VALUES ('', '', NULL)"""
c.execute(query)
conn.commit()

In [None]:
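### optional level up - write a loop that dynamically puts a dictionary or a list into the table in the db
student_info = {"first_name": ['', '', '', ''],
                "birth_date": [],      # fill in one birth date per student
                "num_siblings": []}    # fill in one sibling count per student

# one possible approach: zip the three columns into rows and insert each row
# using ? placeholders (nothing is inserted while the lists above are empty)
rows = zip(student_info["first_name"],
           student_info["birth_date"],
           student_info["num_siblings"])
for row in rows:
    c.execute("INSERT INTO students VALUES (?, ?, ?)", row)
conn.commit()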

In [None]:
# select all students' info from the students table
query = """"""

c.execute(query).fetchall()

# fetchall() returns all rows of the result as a list of tuples

In [None]:
# let's insert some tables into sql and perform some queries!
auto = pd.read_csv('auto-mpg.csv')
auto.head()

In [None]:
# store auto into our existing database

#auto.to_sql('auto', con = conn)


In [None]:
# select cylinders and displacement cols from table
query = """"""
c.execute(query).fetchall()

#### Filtering
Just like querying with pandas, sometimes we want to select data that fits certain criteria. SQL queries also allow us to filter, using the `WHERE` clause.

Syntax (the `ORDER BY` part is optional):

`SELECT * FROM table WHERE conditions ORDER BY columns DESC/ASC`
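
Here is a minimal sketch of filtering and ordering. It deliberately uses a hypothetical `books` table rather than the auto-mpg data, so the exercises below are left for you to solve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# hypothetical table and values, unrelated to the auto-mpg data
c.execute("CREATE TABLE books (title TEXT, pages INTEGER, year INTEGER)")
c.executemany("INSERT INTO books VALUES (?, ?, ?)",
              [("A", 120, 2001), ("B", 450, 1995), ("C", 300, 2010), ("D", 80, 2015)])

# WHERE keeps only rows matching the condition; AND combines conditions
# ORDER BY sorts the result; DESC means largest first
query = """SELECT title, year
           FROM books
           WHERE pages > 100 AND year > 2000
           ORDER BY year DESC"""
result = c.execute(query).fetchall()
conn.close()
```

Here `result` holds books C and A, in that order: B fails the year condition and D fails the page condition.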

In [None]:
# use pandas to select cars that weigh more than 3000 lbs
query_1_pd = None

In [None]:
# use sql 
query_1_sql = """"""
len(c.execute(query_1_sql).fetchall())


In [None]:
# use pandas to select just the names of the cars where cylinders are greater than 5
query_2_pd = None
# how many are there?
query_2_pd.shape

In [None]:
# write the sql equivalent 
query_2_sql = """"""
len(c.execute(query_2_sql).fetchall())

# what is happening here? Can we troubleshoot?

In [None]:
# exercise - use pandas to select the names of the cars which have mpg > 15 and cylinders less than 7,
# and show them in descending order by model year
query_3_pd = None
query_3_pd

In [None]:
# write the sql equivalent
query_3_sql = """"""
c.execute(query_3_sql).fetchall()

### This afternoon:
- Grouping
- Joining
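
As a small preview of grouping, here is a sketch using a hypothetical `sales` table. `GROUP BY` collapses rows that share a value into one row per group, and aggregate functions such as `COUNT` and `SUM` summarize each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# hypothetical sales data, for illustration only
c.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
c.executemany("INSERT INTO sales VALUES (?, ?)",
              [("east", 10), ("west", 5), ("east", 20), ("west", 15), ("north", 7)])

# one output row per region, with the number of sales and their total
query = """SELECT region, COUNT(*), SUM(amount)
           FROM sales
           GROUP BY region
           ORDER BY region"""
totals = c.execute(query).fetchall()
conn.close()
```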

#### Joining
<img src ='sql-joins-better.png' width = 400>
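
Two of the joins in the diagram can be sketched as follows. The `owners` and `cars` tables are hypothetical; they are linked through the `owner_id` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# hypothetical tables linked by owner_id; Bob owns no car
c.execute("CREATE TABLE owners (owner_id INTEGER PRIMARY KEY, name TEXT)")
c.execute("CREATE TABLE cars (model TEXT, owner_id INTEGER)")
c.executemany("INSERT INTO owners VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
c.executemany("INSERT INTO cars VALUES (?, ?)", [("civic", 1), ("golf", 1)])

# INNER JOIN keeps only owners that have a matching car
inner = c.execute("""SELECT owners.name, cars.model
                     FROM owners
                     INNER JOIN cars ON owners.owner_id = cars.owner_id
                     ORDER BY owners.name, cars.model""").fetchall()

# LEFT JOIN keeps every owner; a missing car comes back as None
left = c.execute("""SELECT owners.name, cars.model
                    FROM owners
                    LEFT JOIN cars ON owners.owner_id = cars.owner_id
                    ORDER BY owners.name, cars.model""").fetchall()
conn.close()
```

The inner join drops Bob entirely, while the left join keeps him with `None` in the `model` column.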