# Let's Get Relational

Building lightweight databases with Python's `sqlite3` module

## What is a SQLite database?

#### Datasets joined by keys
The simplest way to think about a relational database is as a collection of datasets (tables) in which a special "key" column in one dataset references a specific row in another dataset, letting you join and retrieve related pieces of information quickly and simply.
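To make the idea of a key concrete, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The `artists` and `albums` tables, their columns, and the rows are hypothetical examples: `albums.artist_id` is the key column that points at a row in `artists`, and a `JOIN` uses it to stitch the two datasets together.

```python
import sqlite3

# An in-memory database keeps the example self-contained (nothing is written to disk).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables: each album row stores the id of its artist (the "key").
cur.execute("CREATE TABLE artists (artist_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE albums ("
    "album_id INTEGER PRIMARY KEY, title TEXT, "
    "artist_id INTEGER REFERENCES artists(artist_id))"
)
cur.executemany("INSERT INTO artists VALUES (?, ?)", [(1, "Artist A"), (2, "Artist B")])
cur.executemany(
    "INSERT INTO albums VALUES (?, ?, ?)",
    [(10, "Album X", 1), (11, "Album Y", 1), (12, "Album Z", 2)],
)
conn.commit()

# Join the two datasets on the shared key to pair each album with its artist.
rows = cur.execute(
    "SELECT albums.title, artists.name "
    "FROM albums JOIN artists ON albums.artist_id = artists.artist_id "
    "ORDER BY albums.album_id"
).fetchall()
conn.close()
print(rows)  # [('Album X', 'Artist A'), ('Album Y', 'Artist A'), ('Album Z', 'Artist B')]
```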

#### That are stored locally on your computer
Unlike client-server database engines such as PostgreSQL or MySQL, which run as separate server processes and can be difficult to set up locally, a SQLite database is just one self-contained file on your disk.
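A quick way to see the "one file" idea in action: write a row, close the connection, then open a fresh connection to the same file and read the row back. This sketch uses a temporary directory and a hypothetical `example.db` file name so nothing is left behind.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "example.db")  # hypothetical file name

    # First connection: create a table, add a row, and save it to the file.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE notes (body TEXT)")
    conn.execute("INSERT INTO notes VALUES (?)", ("stored on disk",))
    conn.commit()
    conn.close()

    # The database lives in this file; no server process is involved.
    files = os.listdir(tmp_dir)

    # A brand-new connection to the same file sees the saved data.
    conn = sqlite3.connect(db_path)
    saved = conn.execute("SELECT body FROM notes").fetchall()
    conn.close()

print(files)
print(saved)  # [('stored on disk',)]
```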

## Example

#### You have the following datasets:
- Demographics by CSA (Community Statistical Area) in Baltimore
- List of polling locations in Baltimore
- Turnout by voting district in Baltimore
- 911 calls

#### And you want to retrieve these subsets of the combined data:
- A list of the polling locations in CSAs whose populations are at least 75% white
- The demographic breakdown of the CSAs that overlap the voting district with the highest turnout
- All of the 911 calls placed within 1 mile of a polling location on Election Day
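The first question boils down to a join plus a filter. The sketch below assumes hypothetical tables `csa_demographics(csa_name, pct_white)` and `polling_locations(location_name, csa_name)` filled with invented placeholder rows; the real datasets' column names will likely differ. The other two questions also need geographic overlap and distance calculations, which go beyond this simple join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: CSA demographics, and polling locations tagged with their CSA.
conn.execute("CREATE TABLE csa_demographics (csa_name TEXT PRIMARY KEY, pct_white REAL)")
conn.execute("CREATE TABLE polling_locations (location_name TEXT, csa_name TEXT)")

# Placeholder rows for illustration only; these are not real Baltimore figures.
conn.executemany(
    "INSERT INTO csa_demographics VALUES (?, ?)",
    [("CSA 1", 0.80), ("CSA 2", 0.40), ("CSA 3", 0.75)],
)
conn.executemany(
    "INSERT INTO polling_locations VALUES (?, ?)",
    [("Location A", "CSA 1"), ("Location B", "CSA 2"), ("Location C", "CSA 3")],
)

# Join on the shared csa_name key, then keep CSAs that are at least 75% white.
query = """
    SELECT p.location_name, d.csa_name, d.pct_white
    FROM polling_locations AS p
    JOIN csa_demographics AS d ON p.csa_name = d.csa_name
    WHERE d.pct_white >= 0.75
    ORDER BY p.location_name
"""
results = conn.execute(query).fetchall()
conn.close()
print(results)  # [('Location A', 'CSA 1', 0.8), ('Location C', 'CSA 3', 0.75)]
```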

## Why are they useful?

- Replace multiple spreadsheets with a single file
- Merge once, share endlessly
- Query what you need, when you need it
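If your data already lives in pandas, one way to "merge once, share endlessly" is to write the merged DataFrame into SQLite with `DataFrame.to_sql()` and later pull back only the rows you need with `pd.read_sql_query()`. The `tracks` table and its rows below are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical merged dataset that you only want to build once.
tracks = pd.DataFrame(
    {
        "title": ["Song 1", "Song 2", "Song 3"],
        "genre": ["Rock", "Jazz", "Rock"],
        "duration_sec": [210, 305, 188],
    }
)

conn = sqlite3.connect(":memory:")  # pass a file path instead to keep the data

# Store the DataFrame as a table, replacing it if it already exists.
tracks.to_sql("tracks", conn, index=False, if_exists="replace")

# Later (or in another script), query just the subset you need.
rock = pd.read_sql_query(
    "SELECT title, duration_sec FROM tracks WHERE genre = ? ORDER BY title",
    conn,
    params=("Rock",),
)
conn.close()
print(rock)
```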

## When do they come in handy?

#### If your analysis...
- Rivals Frankenstein in the number of sources it stitches together
- Calls `merge()` and `groupby()` 50 times before returning the first meaningful result
- Takes longer to load than the family van on summer vacation

#### ...you might want to use a SQLite database

## Okay, okay, how do I actually create one?

### Let's start with something simple

Combining several playlists into a single music library with the following information:
- Album
- Artist
- Duration
- Genre
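Combining playlists is mostly a matter of stacking DataFrames that share the same columns. Here is a minimal sketch with `pd.concat()`, assuming each playlist has `Album`, `Artist`, `Duration`, and `Genre` columns plus a hypothetical `Song` column; the sample rows are invented. `drop_duplicates()` keeps a track that appears on several playlists only once.

```python
import pandas as pd

# Two hypothetical playlists with identical columns.
gritty = pd.DataFrame(
    {
        "Song": ["Track 1", "Track 2"],
        "Album": ["Album A", "Album B"],
        "Artist": ["Artist A", "Artist B"],
        "Duration": ["3:41", "4:05"],
        "Genre": ["Rock", "Blues"],
    }
)
rainy = pd.DataFrame(
    {
        "Song": ["Track 2", "Track 3"],
        "Album": ["Album B", "Album C"],
        "Artist": ["Artist B", "Artist C"],
        "Duration": ["4:05", "2:58"],
        "Genre": ["Blues", "Folk"],
    }
)

# Stack the playlists row-wise and renumber the index.
library = pd.concat([gritty, rainy], ignore_index=True)

# A track that sits on more than one playlist should appear once in the library.
library = library.drop_duplicates().reset_index(drop=True)
print(library)
```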

### Step 1: Import pandas and load the playlists for cleanup

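```python
import pandas as pd

# Open the workbook once (reading .xlsx files requires the openpyxl package),
# then read each playlist sheet into its own DataFrame
xlsx = pd.ExcelFile("P4GC_SQLite3_Test.xlsx")
df_gritty = pd.read_excel(xlsx, "Gritty Playlist")
df_mix = pd.read_excel(xlsx, "Mix Tape Playlist")
df_alt = pd.read_excel(xlsx, "Alt Playlist")
df_rainy = pd.read_excel(xlsx, "Rainy Day Playlist")
```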

### Step 2: Data cleanup magic


### Step 1: Set up SQLite and create your database

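1. Optionally, install DB Browser for SQLite, a free GUI for viewing and editing your database: http://sqlitebrowser.org/

1. You don't need to install the `sqlite3` package: it ships with Python's standard library, so `import sqlite3` works out of the box (running `pip install sqlite3` will fail).

1. Create your database file by executing `! touch DataDay.db`. This step is optional, because `sqlite3.connect()` creates the file automatically if it doesn't already exist.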

```
! touch DataDay.db
```
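Because `sqlite3` is part of the standard library, you can confirm it is available without installing anything, and you can watch `sqlite3.connect()` create a missing database file on its own. A small sketch; the temporary directory and the `DataDay_demo.db` name are just for illustration.

```python
import os
import sqlite3
import tempfile

# Version of the SQLite library bundled with this Python build.
print(sqlite3.sqlite_version)

with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "DataDay_demo.db")  # hypothetical file name
    existed_before = os.path.exists(db_path)

    # Connecting is enough to create the file; no `touch` required.
    conn = sqlite3.connect(db_path)
    conn.close()
    existed_after = os.path.exists(db_path)

print(existed_before, existed_after)  # False True
```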

### Step 2: Import the library and connect to your database

```python
import sqlite3

# Open (or create) the database file, then get a cursor for running SQL statements
conn = sqlite3.connect("DataDay.db")
c = conn.cursor()
```
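With a connection and cursor in hand, a plausible next step is to create a table and load rows into it. This sketch uses an in-memory database so it runs on its own (swap in the `conn` above to write to `DataDay.db`); the `library` table and its columns are assumptions based on the album, artist, duration, and genre fields listed earlier. The `?` placeholders let `sqlite3` handle quoting, which is safer than building SQL strings by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for sqlite3.connect("DataDay.db")
c = conn.cursor()

# Hypothetical table for the combined music library.
c.execute(
    """
    CREATE TABLE IF NOT EXISTS library (
        song TEXT,
        album TEXT,
        artist TEXT,
        duration_sec INTEGER,
        genre TEXT
    )
    """
)

# Insert several rows at once using parameter placeholders.
rows = [
    ("Track 1", "Album A", "Artist A", 221, "Rock"),
    ("Track 2", "Album B", "Artist B", 245, "Blues"),
]
c.executemany("INSERT INTO library VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()  # the inserts are not made permanent until you commit

# Read the data back to check that it landed.
c.execute("SELECT artist, COUNT(*) FROM library GROUP BY artist ORDER BY artist")
counts = c.fetchall()
conn.close()
print(counts)  # [('Artist A', 1), ('Artist B', 1)]
```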

### Step 3: