# Introduction to SQL
Almost all companies and organizations store their data in some form of database.

### 3 Main Types of Data
1. Structured
   - Tabular (rows/columns)
   - Relational (storing additional information about certain rows/columns in different tables)

2. Semi-Structured Data
    - Key-value stores (e.g. JSON documents), graph databases
    - Attributes may vary from entry to entry. More flexible; commonly what you get from an API.
    - Semi-structured stores can often be faster to read from or write to than structured databases.

3. Unstructured Data
    - Audio
    - Video (A/V can be structured, but they are generally described as unstructured regardless)
    - Binary Data
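
As a rough illustration of the first two categories, the sketch below contrasts a fixed-schema table with JSON records whose attributes vary. The column names and sample values are made up for the example.

```python
import json

# Structured: every row has the same columns, in the same order.
columns = ("id", "name", "city")  # hypothetical schema
rows = [
    (1, "Ada", "London"),
    (2, "Grace", "New York"),
]

# Semi-structured: each record carries its own keys, which may differ.
records = json.loads("""
[
  {"id": 1, "name": "Ada", "city": "London"},
  {"id": 2, "name": "Grace", "languages": ["COBOL", "FORTRAN"]}
]
""")

# Every structured row matches the fixed schema...
same_width = all(len(row) == len(columns) for row in rows)

# ...while the attribute sets of the JSON records are not identical.
key_sets = [set(record) for record in records]

print("rows share one schema:", same_width)
print("records share one schema:", key_sets[0] == key_sets[1])
```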

### Why use Databases?
- keeps data clean
- ensures data transactions are correct (ACID)
    - atomicity, consistency, isolation, durability
- can store massive amounts of data
- efficient data retrieval
- centralized, secure, robust
- integration availability with other software tools


- **BAD EXAMPLE:** Excel Spreadsheets
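
To make the "A" in ACID concrete, here is a minimal sketch of an atomic transfer using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the balances are hypothetical; the point is only that a failed transaction leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"  # balances may not go negative
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # valid transfer
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is rolled back

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer, the credit to `bob` runs first and is then undone by the rollback, which is the behaviour a spreadsheet cannot guarantee.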

### Why SQL?

- SQL (Structured Query Language) is the language used to query relational database management systems (RDBMS).
    - Relations created via Primary/Foreign keys
    - Save space by using multiple tables
- SQL databases are used to store **structured** data (relational data).
- Use queries to get only the necessary data (unlike many APIs!)
- Can calculate summary statistics on groups or subsets of data
- mentioned in most data science job postings
    - As a data scientist, focus on data retrieval, not definition, manipulation, control
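
The points above can be sketched with two small tables linked by a primary/foreign key pair and a query that returns only a per-customer summary. The `customers` and `orders` tables and their contents are invented for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,          -- primary key
    name        TEXT NOT NULL,
    city        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
                REFERENCES customers (customer_id),  -- foreign key
    amount      REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Ada', 'London'), (2, 'Grace', 'New York');
INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 30.0), (3, 2, 5.0);
""")

# Customer details are stored once; orders only carry the customer_id.
# The query asks for just the columns and the summary we need.
summary = conn.execute("""
    SELECT c.name, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

for name, n_orders, total in summary:
    print(name, n_orders, total)
```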


#### Challenges with SQL
- SQL is a declarative language (unlike Python, which is imperative)
    - The order in which clauses are evaluated differs from the order in which they are written
- Long, nested queries with many variable names
    - vs. imperative programming, where good programs break logic up into multiple steps
- Many things happen concurrently in a single SQL statement, and the order is not explicit
    - vs. imperative programming, where code executes line by line
- Debugging is more difficult thanks to the above =(
    - can help to break down a complex query into steps and test them incrementally.
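
One way to apply the incremental approach is to test the inner step of a query by itself, then wrap it in a common table expression (CTE) and build the next layer on top. The sketch below assumes a hypothetical `orders` table and runs on SQLite through Python's `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
INSERT INTO orders VALUES
    (1, 'Ada', 20.0), (2, 'Ada', 30.0), (3, 'Grace', 5.0), (4, 'Linus', 100.0);
""")

# Step 1: run the inner aggregation on its own and inspect the result.
step1 = "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer"
totals = conn.execute(step1 + " ORDER BY customer").fetchall()
print(totals)

# Step 2: reuse the tested step as a named CTE and add the outer logic:
# keep only customers whose total is above the average total.
full_query = f"""
WITH totals AS ({step1})
SELECT customer, total
FROM totals
WHERE total > (SELECT AVG(total) FROM totals)
ORDER BY customer
"""
above_average = conn.execute(full_query).fetchall()
print(above_average)
```

Naming each intermediate result gives a long query something closer to the step-by-step shape of imperative code.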

- **To review fundamentals, check [this](https://www.w3schools.com/sql/) out.**


## Database Schemata

- How many tables should we create?
- What should be stored in each table?
- How are different tables related?

#### Star Schemata
- fact tables vs dimension tables
- <u>Fact tables</u>: Contain events, transactions, observations.  Lots of rows, fewer columns. Updated often.
- <u>Dimension tables</u>: Contain attributes about a single concept in the fact table.  Rarely updated; fewer rows.
- Fact tables contain foreign keys, dimension tables contain primary keys
- Dimension tables are not related to each other
- Usually filter/group on items in dimension tables.
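
A minimal star schema might look like the sketch below: one fact table of sales surrounded by two dimension tables that do not reference each other. All table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per product / store, each with a primary key.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);

-- Fact table: one row per sale, pointing at the dimensions via foreign keys.
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product (product_id),
    store_id   INTEGER REFERENCES dim_store (store_id),
    quantity   INTEGER
);

INSERT INTO dim_product VALUES
    (1, 'Espresso', 'Coffee'), (2, 'Latte', 'Coffee'), (3, 'Scone', 'Bakery');
INSERT INTO dim_store VALUES (1, 'Toronto'), (2, 'Montreal');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 2), (2, 2, 1, 1), (3, 3, 1, 4), (4, 1, 2, 3), (5, 3, 2, 1);
""")

# Filter on one dimension (store city) and group on another (product category);
# the numbers being summed live in the fact table.
units_by_category = conn.execute("""
    SELECT p.category, SUM(f.quantity) AS units
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_store   AS s ON s.store_id   = f.store_id
    WHERE s.city = 'Toronto'
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(units_by_category)
```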


#### Snowflake Schemata
- Nested version of the Star Schema
    - Dimension tables can contain foreign keys
- Dimension items can be complex concepts
- More tables, but less duplication
- Concepts of normalization
    - Eliminate repeating groups in individual tables
    - Create a separate table for each set of related data
    - Identify each set of related data with a primary key
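
As a sketch of the snowflake idea, the example below moves category details out of the product dimension into their own table, so the product dimension now holds a foreign key. The tables and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Category attributes are stored once instead of being repeated per product.
CREATE TABLE dim_category (
    category_id INTEGER PRIMARY KEY,
    name        TEXT,
    department  TEXT
);

-- The product dimension now contains a foreign key to another dimension.
CREATE TABLE dim_product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES dim_category (category_id)
);

CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product (product_id),
    quantity   INTEGER
);

INSERT INTO dim_category VALUES (1, 'Coffee', 'Drinks'), (2, 'Bakery', 'Food');
INSERT INTO dim_product VALUES (1, 'Espresso', 1), (2, 'Latte', 1), (3, 'Scone', 2);
INSERT INTO fact_sales VALUES (1, 1, 2), (2, 2, 1), (3, 3, 4);
""")

# Reaching the department takes one more join than in a star schema:
# fact -> product -> category.
units_by_department = conn.execute("""
    SELECT c.department, SUM(f.quantity) AS units
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_id  = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.department
    ORDER BY c.department
""").fetchall()
print(units_by_department)
```

The trade-off shown here is the one listed above: more tables and joins, but each category's attributes are written only once.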

## RDBMS Landscape
- SQL is a programming language, but there are many different flavours (dialects) of SQL.
    - Basic SQL statements are the same, but each flavour adds its own custom statements
- Closed Source (i.e. paid)
    - **Vendors**: Oracle, SQL Server, IBM DB2, Access
    - Could come with integrations and services that make things easier
- Open Source (i.e. free)
    - **Projects:** MySQL, PostgreSQL, SQLite, MariaDB
    - Good developer community makes these great options

#### SQLite
- SQLite is great for learning SQL
    - Don't have to set up a server
- Stored locally as a file
- Rarely used as a shared database for many concurrent users (limited write concurrency)
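
The "no server, just a file" point can be seen with Python's built-in `sqlite3` module. This sketch writes to a throwaway file in a temporary directory; the file name `demo.db` and the `notes` table are made up for the example.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file inside a fresh temporary directory.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Connecting creates the file if it does not exist; no server is involved.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello sqlite",))
conn.commit()
conn.close()

# The whole database is that single file; reopening it returns the stored data.
file_exists = os.path.isfile(db_path)
conn = sqlite3.connect(db_path)
notes = conn.execute("SELECT body FROM notes").fetchall()
conn.close()

print(file_exists, notes)
```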

#### PostgreSQL (Postgres)
- Open source (easy to upgrade and extend)
- High compliance to the SQL standard
- Runs on almost all operating systems
- MySQL would also be a good choice (except it is less compliant with the SQL standard)
- can run locally or connect remotely

## Installing ipythonSQL

- 

### SQL Demo/Exercises