# Hands-On: SQL Basics

Welcome to your first interactive SQL lab! In this notebook, you'll practice the SELECT statements we learned in the previous chapter using our **Coffee Shop Database**.

**Learning Goals:**

By completing this notebook, you will:
- Create a PostgreSQL database and table
- Import data from a CSV file
- Connect to a PostgreSQL database from Jupyter
- Write and execute SELECT queries
- Filter, sort, and limit results
- Explore real-world data

## Part 0: Database Setup

### Choosing Your Python Kernel

Before running any code, make sure you have a **Python kernel** selected:

**In Jupyter Notebook:**
1. Go to **Kernel** → **Change kernel**
2. Select **Python 3** (or your preferred Python environment)

**In JupyterLab:**
1. Click the kernel name in the top-right corner of the notebook
2. Select **Python 3** from the dropdown

**In VS Code:**
1. Click **Select Kernel** in the top-right corner
2. Choose **Python Environments** → select your Python installation

You'll know it's working when you see "Python 3" (or similar) displayed in the kernel indicator.

---

### Setup Steps

Before we can query data, we need to:
1. Install required Python packages
2. Load the SQL extension
3. Create our database
4. Connect to the new database
5. Create the transactions table
6. Import data from CSV

### Step 1: Install Required Packages

Run this cell once to install the necessary packages.

**⚠️ After running this cell, restart the kernel** (Kernel → Restart) before proceeding!

In [None]:
# Run this cell to install required packages (only needed once)
# Note: We pin prettytable version to avoid a known compatibility bug
!pip install ipython-sql psycopg2-binary sqlalchemy pandas 'prettytable<3.10'

### Step 2: Load the SQL Extension

**⚠️ IMPORTANT: You must run the cells in order!** The cell below loads the SQL extension. If you skip it, every SQL cell will fail with a `UsageError` because the `%%sql` cell magic has not been registered.

Run this cell first, then proceed to the database connection.

In [None]:
# Load the SQL extension - YOU MUST RUN THIS CELL FIRST!
%load_ext sql

# Configure SQL magic
%config SqlMagic.displaycon = False
%config SqlMagic.feedback = False

# Fix for prettytable compatibility - set a valid style
%config SqlMagic.style = 'PLAIN_COLUMNS'

print("✅ SQL extension loaded successfully!")

### Step 3: Create the Database

Now we'll connect to PostgreSQL's default database to create our `coffee_shop_db` database.

**Replace `yourpassword` with your actual PostgreSQL password.**

In [None]:
# Connect to the default postgres database first
# Replace 'yourpassword' with your actual PostgreSQL password
%sql postgresql://postgres:yourpassword@localhost/postgres
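
If your password contains characters such as `@`, `:` or `/`, pasting it straight into the connection URL can break the URL. The sketch below shows one way to percent-encode it first using only the standard library. The helper name `build_postgres_url` and the sample password are hypothetical, chosen for illustration.

```python
from urllib.parse import quote


def build_postgres_url(user, password, host, database):
    """Return a postgresql:// URL with the user and password percent-encoded."""
    # safe="" makes quote() escape every reserved character, including '/',
    # so '@', ':' and '/' in the password are not misread as URL separators.
    user_part = quote(user, safe="")
    password_part = quote(password, safe="")
    return f"postgresql://{user_part}:{password_part}@{host}/{database}"


# Hypothetical password containing characters that need escaping
url = build_postgres_url("postgres", "p@ss/word", "localhost", "postgres")
print(url)  # postgresql://postgres:p%40ss%2Fword@localhost/postgres
```

The resulting string has the same shape as the URL in the cell above, so it can be used wherever that literal URL appears, for example as the argument to SQLAlchemy's `create_engine`.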

In [None]:
%%sql
-- Check if our database already exists
SELECT datname FROM pg_database WHERE datname = 'coffee_shop_db';

In [None]:
%%sql
-- Create the database (run this only if the check above returned no rows)
-- Note: PostgreSQL does not support IF NOT EXISTS for CREATE DATABASE
-- If the database already exists, this will error - that's OK, just skip to the next step
CREATE DATABASE coffee_shop_db;

> **Note:** If you see an error that the database already exists, that's fine! Just continue to the next cell.

### Step 4: Connect to Our New Database

In [None]:
# Now connect to our coffee_shop_db database
%sql postgresql://postgres:yourpassword@localhost/coffee_shop_db

### Step 5: Create the Transactions Table

Now we'll create the table structure to hold our coffee shop data:

In [None]:
%%sql
-- Drop the table if it exists (allows re-running this notebook)
DROP TABLE IF EXISTS transactions;

-- Create the transactions table
CREATE TABLE transactions (
    transaction_id INTEGER,
    transaction_date DATE,
    transaction_time TIME,
    instore_yn VARCHAR(5),
    quantity INTEGER,
    line_item_amount DECIMAL(10,2),
    unit_price DECIMAL(10,2),
    promo_item_yn VARCHAR(5),
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    position VARCHAR(50),
    start_date DATE,
    location INTEGER,
    home_store_bool VARCHAR(5),
    home_store INTEGER,
    customer_first_name VARCHAR(50),
    customer_email VARCHAR(100),
    customer_since DATE,
    loyalty_card_number VARCHAR(20),
    birthdate DATE,
    gender VARCHAR(20),
    birth_year INTEGER,
    product_group VARCHAR(50),
    product_category VARCHAR(50),
    product_type VARCHAR(50),
    product VARCHAR(100),
    product_description TEXT,
    unit_of_measure VARCHAR(20),
    current_wholesale_price DECIMAL(10,2),
    current_retail_price VARCHAR(20),
    tax_exempt_yn VARCHAR(5),
    promo_yn VARCHAR(5),
    new_product_yn VARCHAR(5),
    sales_outlet_type VARCHAR(50),
    store_square_feet INTEGER,
    store_address VARCHAR(100),
    store_city VARCHAR(50),
    store_state_province VARCHAR(50),
    store_telephone VARCHAR(20),
    store_postal_code VARCHAR(10),
    store_longitude DECIMAL(12,8),
    store_latitude DECIMAL(12,8),
    manager INTEGER,
    neighborhood VARCHAR(50)
);

In [None]:
%%sql
-- Verify the table was created
SELECT column_name, data_type 
FROM information_schema.columns 
WHERE table_name = 'transactions'
ORDER BY ordinal_position
LIMIT 10;

### Step 6: Import Data from CSV

Now we'll load the coffee shop data from our CSV file. We'll use Python's pandas library to read the CSV and then insert the data into PostgreSQL.

**Make sure the `coffee_shop_data.csv` file is in the `../data/` folder relative to this notebook.**

In [None]:
import pandas as pd
from sqlalchemy import create_engine

# Read the CSV file
# Adjust the path if your CSV is in a different location
csv_path = '../data/coffee_shop_data.csv'

try:
    df = pd.read_csv(csv_path)
    print(f"Successfully loaded {len(df)} rows from CSV")
    print(f"Columns: {list(df.columns)}")
except FileNotFoundError:
    print(f"Error: Could not find {csv_path}")
    print("Please make sure the CSV file is in the correct location.")

In [None]:
# Preview the data
df.head()

In [None]:
# Clean up column names (replace hyphens with underscores, lowercase)
df.columns = df.columns.str.replace('-', '_').str.lower()
print("Cleaned column names:", list(df.columns))
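
To see what the cleanup step does without loading the CSV, here is a minimal pure-Python sketch of the same two operations (hyphens to underscores, then lowercase). The header names are hypothetical; the real file's headers may differ.

```python
# Hypothetical raw header names, not taken from the real CSV
raw_columns = ["Transaction-ID", "Unit_Price", "store-city"]

# Same two steps as the pandas cell: hyphens become underscores, then lowercase
clean_columns = [name.replace("-", "_").lower() for name in raw_columns]

print(clean_columns)  # ['transaction_id', 'unit_price', 'store_city']
```

Consistent lowercase names with underscores matter because unquoted identifiers in PostgreSQL are folded to lowercase, so a column called `Unit-Price` would need double quotes in every query.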

In [None]:
# Create SQLAlchemy engine for inserting data
# Replace 'yourpassword' with your actual PostgreSQL password
engine = create_engine('postgresql://postgres:yourpassword@localhost/coffee_shop_db')

# Insert data into the transactions table
# if_exists='replace' drops the table created in Step 5 and recreates it
# with column types inferred from the DataFrame, not the types we declared.
# if_exists='append' would instead insert into the existing table.
df.to_sql('transactions', engine, if_exists='replace', index=False)

print(f"Successfully imported {len(df)} rows into the transactions table!")

In [None]:
%%sql
-- Verify the data was imported
SELECT COUNT(*) AS total_rows FROM transactions;

In [None]:
%%sql
-- Preview the imported data
SELECT * FROM transactions LIMIT 5;

---

## ✅ Setup Complete!

You now have:
- A `coffee_shop_db` database
- A `transactions` table with your imported data

Let's start querying!

---

## About the Coffee Shop Data

Our dataset contains transaction records from a coffee shop chain with multiple locations in New York. The `transactions` table includes:

| Column | Description |
|--------|-------------|
| transaction_id | Unique identifier for each sale |
| transaction_date | Date of the transaction |
| transaction_time | Time of the transaction |
| quantity | Number of items sold |
| line_item_amount | Total amount for the line item |
| unit_price | Price per unit |
| product_category | Category (Beverages, Food, etc.) |
| product_type | Type within category |
| product | Specific product name |
| store_city | City where the store is located |
| first_name, last_name | Staff member who made the sale |

## Part 1: Basic SELECT Queries

Let's start with some basic queries to explore the data.

### Example 1: Select All Columns

View the first few rows of our data:

In [None]:
%%sql
SELECT *
FROM transactions
LIMIT 5;

### Example 2: Select Specific Columns

Focus on just the product and price information:

In [None]:
%%sql
SELECT 
    product,
    product_category,
    unit_price
FROM transactions
LIMIT 10;

### 🎯 Your Turn: Exercise 1

Write a query to select the `transaction_date`, `product`, `quantity`, and `line_item_amount` columns. Show 10 rows.

In [None]:
%%sql
-- YOUR CODE HERE
-- Write your SELECT statement below



<details>
<summary>💡 Click to see solution</summary>

```sql
SELECT 
    transaction_date,
    product,
    quantity,
    line_item_amount
FROM transactions
LIMIT 10;
```
</details>

## Part 2: Filtering with WHERE

The WHERE clause lets us filter data based on conditions.

### Example 3: Simple Filter

Find all beverages:

In [None]:
%%sql
SELECT product, product_type, unit_price
FROM transactions
WHERE product_category = 'Beverages'
LIMIT 10;

### Example 4: Multiple Conditions (AND)

Find beverages that cost more than $3.50:

In [None]:
%%sql
SELECT product, product_type, unit_price
FROM transactions
WHERE product_category = 'Beverages'
  AND unit_price > 3.50
LIMIT 10;

### Example 5: Using IN for Multiple Values

Find sales from specific cities:

In [None]:
%%sql
SELECT product, store_city, unit_price
FROM transactions
WHERE store_city IN ('New York', 'Brooklyn')
LIMIT 10;
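
If you want to check your understanding of `AND` and `IN` on data small enough to verify by eye, the following sketch runs the same kinds of filters against a tiny table. It assumes an in-memory SQLite database as a stand-in for PostgreSQL, and the four rows are hypothetical, not taken from the coffee shop dataset.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "product TEXT, product_category TEXT, store_city TEXT, unit_price REAL)"
)

# Hypothetical rows for illustration only
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        ("Latte", "Beverages", "New York", 4.25),
        ("Drip Coffee", "Beverages", "Brooklyn", 2.50),
        ("Croissant", "Food", "New York", 3.50),
        ("Chai Tea", "Beverages", "Queens", 3.75),
    ],
)

# AND: a row is returned only if both conditions are true
and_rows = conn.execute(
    "SELECT product FROM transactions "
    "WHERE product_category = 'Beverages' AND unit_price > 3.50 "
    "ORDER BY product"
).fetchall()

# IN: a row is returned if the column equals any value in the list
in_rows = conn.execute(
    "SELECT product FROM transactions "
    "WHERE store_city IN ('New York', 'Brooklyn') "
    "ORDER BY product"
).fetchall()

print(and_rows)  # [('Chai Tea',), ('Latte',)]
print(in_rows)   # [('Croissant',), ('Drip Coffee',), ('Latte',)]
conn.close()
```

Note that the $2.50 Drip Coffee is a beverage but fails the price condition, and the Queens row is excluded by the `IN` list.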

### Example 6: Pattern Matching with LIKE

Find all products with "Latte" in the name:

In [None]:
%%sql
SELECT DISTINCT product, unit_price
FROM transactions
WHERE product LIKE '%Latte%'
ORDER BY unit_price DESC;

### 🎯 Your Turn: Exercise 2

Write a query to find all **Food** items that cost **between $2.00 and $4.00**. Show the product name and price.

In [None]:
%%sql
-- YOUR CODE HERE



<details>
<summary>💡 Click to see solution</summary>

```sql
SELECT DISTINCT product, unit_price
FROM transactions
WHERE product_category = 'Food'
  AND unit_price BETWEEN 2.00 AND 4.00;
```
</details>
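
One detail worth confirming is that `BETWEEN` includes both endpoints. The sketch below checks this on a small hypothetical table, again assuming an in-memory SQLite database in place of PostgreSQL; the product names and prices are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (product TEXT, unit_price REAL)")

# Hypothetical rows, including prices exactly on the 2.00 and 4.00 boundaries
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        ("Cookie", 1.75),
        ("Scone", 2.00),
        ("Muffin", 3.00),
        ("Bagel", 4.00),
        ("Sandwich", 4.50),
    ],
)

# BETWEEN low AND high keeps rows where low <= value <= high
between_rows = conn.execute(
    "SELECT product FROM transactions "
    "WHERE unit_price BETWEEN 2.00 AND 4.00 "
    "ORDER BY unit_price"
).fetchall()

# The equivalent filter written with explicit comparisons
explicit_rows = conn.execute(
    "SELECT product FROM transactions "
    "WHERE unit_price >= 2.00 AND unit_price <= 4.00 "
    "ORDER BY unit_price"
).fetchall()

print(between_rows)                   # [('Scone',), ('Muffin',), ('Bagel',)]
print(between_rows == explicit_rows)  # True
conn.close()
```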

## Part 3: Sorting with ORDER BY

Let's organize our results!

### Example 7: Sort by Price (Descending)

Find the most expensive items:

In [None]:
%%sql
SELECT DISTINCT 
    product,
    product_category,
    unit_price
FROM transactions
ORDER BY unit_price DESC
LIMIT 10;

### Example 8: Multiple Sort Columns

Sort by category (alphabetically), then by price (highest first):

In [None]:
%%sql
SELECT DISTINCT 
    product_category,
    product,
    unit_price
FROM transactions
ORDER BY product_category ASC, unit_price DESC
LIMIT 15;
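
With two sort columns, the second column only decides the order among rows that tie on the first. Here is a minimal sketch of that behavior, assuming an in-memory SQLite database as a stand-in for PostgreSQL and four hypothetical rows.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "product_category TEXT, product TEXT, unit_price REAL)"
)

# Hypothetical rows, inserted in deliberately mixed order
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("Food", "Croissant", 3.50),
        ("Beverages", "Latte", 4.25),
        ("Food", "Sandwich", 6.00),
        ("Beverages", "Drip Coffee", 2.50),
    ],
)

# Sort by category A-Z first; within each category, highest price first
sorted_rows = conn.execute(
    "SELECT product_category, product, unit_price FROM transactions "
    "ORDER BY product_category ASC, unit_price DESC"
).fetchall()

for row in sorted_rows:
    print(row)
# ('Beverages', 'Latte', 4.25)
# ('Beverages', 'Drip Coffee', 2.5)
# ('Food', 'Sandwich', 6.0)
# ('Food', 'Croissant', 3.5)
conn.close()
```

The $6.00 Sandwich is the most expensive item overall, yet it appears third because the category sort takes priority.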

### 🎯 Your Turn: Exercise 3

Write a query to find the **5 cheapest beverages**. Show the product name, type, and price.

In [None]:
%%sql
-- YOUR CODE HERE



<details>
<summary>💡 Click to see solution</summary>

```sql
SELECT DISTINCT 
    product,
    product_type,
    unit_price
FROM transactions
WHERE product_category = 'Beverages'
ORDER BY unit_price ASC
LIMIT 5;
```
</details>

## Part 4: Finding Unique Values with DISTINCT

DISTINCT removes duplicate rows from our results.

### Example 9: Unique Categories

In [None]:
%%sql
SELECT DISTINCT product_category
FROM transactions
ORDER BY product_category;

### Example 10: Unique Combinations

In [None]:
%%sql
SELECT DISTINCT product_category, product_type
FROM transactions
ORDER BY product_category, product_type;
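
When several columns are listed, `DISTINCT` applies to the whole row, not to each column separately. The sketch below illustrates the difference on five hypothetical rows, assuming an in-memory SQLite database as a stand-in for PostgreSQL.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (product_category TEXT, product_type TEXT)"
)

# Hypothetical rows containing repeated category/type pairs
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        ("Beverages", "Coffee"),
        ("Beverages", "Coffee"),
        ("Beverages", "Tea"),
        ("Food", "Bakery"),
        ("Food", "Bakery"),
    ],
)

# One column: each category appears once
categories = conn.execute(
    "SELECT DISTINCT product_category FROM transactions "
    "ORDER BY product_category"
).fetchall()

# Two columns: each (category, type) pair appears once
combinations = conn.execute(
    "SELECT DISTINCT product_category, product_type FROM transactions "
    "ORDER BY product_category, product_type"
).fetchall()

print(categories)
# [('Beverages',), ('Food',)]
print(combinations)
# [('Beverages', 'Coffee'), ('Beverages', 'Tea'), ('Food', 'Bakery')]
conn.close()
```

"Beverages" shows up twice in the second result because it is paired with two different product types.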

### 🎯 Your Turn: Exercise 4

What unique store cities are in our data? Write a query to find out.

In [None]:
%%sql
-- YOUR CODE HERE



## Part 5: Challenge Exercises

Put it all together with these more complex queries!

### 🎯 Challenge 1: Staff Sales

Find the 10 highest-value transactions, showing:
- Staff member name (first and last)
- Product sold
- Line item amount
- Store city

Sort by amount descending.

In [None]:
%%sql
-- YOUR CODE HERE



### 🎯 Challenge 2: Morning Coffee Rush

Coffee shops are busiest in the morning! Find all transactions:
- For coffee products (`LIKE` is case-sensitive in PostgreSQL, so match both '%coffee%' and '%Coffee%')
- Where quantity is 2 or more

Show the product, quantity, and time. Sort by quantity descending.

In [None]:
%%sql
-- YOUR CODE HERE



### 🎯 Challenge 3: Explore Your Own Questions!

Come up with your own question about the data and write a query to answer it. Some ideas:
- What products are sold in specific neighborhoods?
- What's the price range of different product types?
- Which staff members work at which stores?

In [None]:
%%sql
-- YOUR EXPLORATION HERE
-- Question: 



## Summary

Congratulations! You've practiced:

- ✅ Creating a database and table
- ✅ Importing data from CSV
- ✅ SELECT statements with specific columns
- ✅ Filtering with WHERE (=, >, <, BETWEEN, IN, LIKE)
- ✅ Sorting with ORDER BY (ASC and DESC)
- ✅ Limiting results with LIMIT
- ✅ Finding unique values with DISTINCT

**Next up:** [Creating and Modifying Tables](../chapters/06-creating-tables.md) with DDL statements!

---

## Bonus: Using Pandas with SQL

You can also run SQL queries and get results as a pandas DataFrame for further analysis:

In [None]:
import pandas as pd

# Run a query and store results in a DataFrame
result = %sql SELECT product_category, product, unit_price FROM transactions LIMIT 20
df = result.DataFrame()

# Now you can use pandas!
df.describe()

In [None]:
# Quick visualization
df.groupby('product_category')['unit_price'].mean().plot(kind='bar', title='Average Price by Category')

---

## Alternative: Using COPY Command (Advanced)

If you prefer to use PostgreSQL's native COPY command (faster for large files), you can use this approach instead of pandas. Run this in DBeaver or psql:

```sql
-- First create the table (as shown above), then:
COPY transactions FROM '/path/to/coffee_shop_data.csv' 
WITH (FORMAT CSV, HEADER true);
```

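Note: `COPY` reads the file on the PostgreSQL server, so the path must be readable by the server process. If the CSV is on your own machine, psql's `\copy` meta-command accepts similar options and reads the file from the client side instead.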