# SQL Fundamentals - DDL, DML, DQL, DCL

SQL (Structured Query Language) is the standard language for managing and manipulating relational databases. SQL commands are categorized into four main types:

| Category | Full Name | Purpose | Key Commands |
|----------|-----------|---------|-------------|
| **DDL** | Data Definition Language | Define database structure | CREATE, ALTER, DROP, TRUNCATE |
| **DML** | Data Manipulation Language | Manipulate data | INSERT, UPDATE, DELETE |
| **DQL** | Data Query Language | Query/retrieve data | SELECT |
| **DCL** | Data Control Language | Control access permissions | GRANT, REVOKE |

## Setup: SQLite3 Connection

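We'll use SQLite through `sqlite3`, the module that ships with Python's standard library, to demonstrate SQL fundamentals. Everything runs against an in-memory database, so nothing is written to disk.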

In [None]:
import sqlite3

# Create an in-memory database for demonstration
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Helper function to display query results
def execute_and_display(query, params=None, fetch=True):
    """Execute SQL query and display results."""
    print(f"SQL: {query}")
    print("-" * 60)
    if params:
        cursor.execute(query, params)
    else:
        cursor.execute(query)
    
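    # cursor.description is set only for statements that return rows,
    # so this also covers queries that start with WITH (CTEs).
    if fetch and cursor.description is not None: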
        columns = [desc[0] for desc in cursor.description]
        rows = cursor.fetchall()
        print(f"Columns: {columns}")
        for row in rows:
            print(row)
        print(f"\nRows returned: {len(rows)}")
    else:
        conn.commit()
        print(f"Rows affected: {cursor.rowcount}")
    print()

---
## 1. DDL - Data Definition Language

DDL commands define the structure of the database. They create, modify, and delete database objects like tables, indexes, and constraints.

### Key DDL Commands:
- **CREATE** - Create new database objects (tables, indexes, views)
- **ALTER** - Modify existing database objects
- **DROP** - Delete database objects
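- **TRUNCATE** - Remove all rows from a table while keeping its definition (classed as DDL because engines typically reset the table's storage instead of deleting row by row). SQLite has no `TRUNCATE`; use `DELETE FROM table_name;` there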
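
The list above names DROP and TRUNCATE, which are not exercised elsewhere in this notebook. A minimal sketch on a throwaway in-memory database follows; the `scratch` table and the `demo` connection are hypothetical and separate from the notebook's own tables. It assumes SQLite, where an unqualified `DELETE` stands in for `TRUNCATE`.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE scratch (id INTEGER PRIMARY KEY, note TEXT)")
demo.executemany("INSERT INTO scratch (note) VALUES (?)", [("a",), ("b",), ("c",)])

# SQLite has no TRUNCATE: DELETE without WHERE empties the table
# but keeps its definition.
demo.execute("DELETE FROM scratch")
rows_left = demo.execute("SELECT COUNT(*) FROM scratch").fetchone()[0]

# DROP TABLE removes the definition itself; IF EXISTS avoids an error
# if the table is already gone.
demo.execute("DROP TABLE IF EXISTS scratch")
tables_left = demo.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'scratch'"
).fetchall()
demo.commit()
demo.close()

print(rows_left, tables_left)
```

After the `DELETE` the table still exists with zero rows; after the `DROP` it no longer appears in `sqlite_master`.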

In [None]:
# CREATE TABLE - Define table structure with constraints
create_departments = """
CREATE TABLE departments (
    dept_id INTEGER PRIMARY KEY AUTOINCREMENT,
    dept_name TEXT NOT NULL UNIQUE,
    location TEXT DEFAULT 'Headquarters',
    budget REAL CHECK(budget >= 0)
);
"""

create_employees = """
CREATE TABLE employees (
    emp_id INTEGER PRIMARY KEY AUTOINCREMENT,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    email TEXT UNIQUE,
    hire_date TEXT DEFAULT CURRENT_DATE,
    salary REAL CHECK(salary > 0),
    dept_id INTEGER,
    manager_id INTEGER,
    FOREIGN KEY (dept_id) REFERENCES departments(dept_id),
    FOREIGN KEY (manager_id) REFERENCES employees(emp_id)
);
"""

cursor.execute(create_departments)
cursor.execute(create_employees)
conn.commit()
print("Tables created successfully!")

# Verify table creation
cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
print(f"Tables in database: {cursor.fetchall()}")

In [None]:
# ALTER TABLE - Add a new column
alter_query = "ALTER TABLE employees ADD COLUMN phone TEXT;"
cursor.execute(alter_query)
conn.commit()
print("Column 'phone' added to employees table.")

# Check table schema
cursor.execute("PRAGMA table_info(employees);")
print("\nEmployees table schema:")
for col in cursor.fetchall():
    print(f"  {col[1]} ({col[2]}) - Nullable: {not col[3]}, Default: {col[4]}")

In [None]:
# CREATE INDEX - Improve query performance
cursor.execute("CREATE INDEX idx_emp_lastname ON employees(last_name);")
cursor.execute("CREATE INDEX idx_emp_dept ON employees(dept_id);")
conn.commit()
print("Indexes created for faster lookups.")

# List indexes
cursor.execute("SELECT name FROM sqlite_master WHERE type='index';")
print(f"Indexes: {cursor.fetchall()}")

---
## 2. DML - Data Manipulation Language

DML commands are used to manipulate data within existing tables.

### Key DML Commands:
- **INSERT** - Add new records to a table
- **UPDATE** - Modify existing records
- **DELETE** - Remove records from a table

In [None]:
# INSERT - Single record
insert_dept = """
INSERT INTO departments (dept_name, location, budget)
VALUES ('Engineering', 'Building A', 500000.00);
"""
execute_and_display(insert_dept)

In [None]:
# INSERT - Multiple records using executemany
departments_data = [
    ('Sales', 'Building B', 300000.00),
    ('Marketing', 'Building C', 250000.00),
    ('Human Resources', 'Building A', 150000.00),
    ('Finance', 'Building D', 400000.00),
]

cursor.executemany(
    "INSERT INTO departments (dept_name, location, budget) VALUES (?, ?, ?)",
    departments_data
)
conn.commit()
print(f"Inserted {cursor.rowcount} departments.")

# Verify insertions
execute_and_display("SELECT * FROM departments;")

In [None]:
# INSERT - Employee records with foreign key relationships
employees_data = [
    ('Alice', 'Johnson', 'alice.j@company.com', '2020-01-15', 95000.00, 1, None),
    ('Bob', 'Smith', 'bob.s@company.com', '2019-06-20', 85000.00, 1, 1),
    ('Carol', 'Williams', 'carol.w@company.com', '2021-03-10', 78000.00, 1, 1),
    ('David', 'Brown', 'david.b@company.com', '2018-11-05', 92000.00, 2, None),
    ('Eva', 'Davis', 'eva.d@company.com', '2020-07-22', 72000.00, 2, 4),
    ('Frank', 'Miller', 'frank.m@company.com', '2022-02-14', 68000.00, 3, None),
    ('Grace', 'Wilson', 'grace.w@company.com', '2021-09-01', 88000.00, 4, None),
    ('Henry', 'Moore', 'henry.m@company.com', '2020-05-18', 76000.00, 5, None),
    ('Ivy', 'Taylor', 'ivy.t@company.com', '2023-01-10', 65000.00, 1, 2),
    ('Jack', 'Anderson', 'jack.a@company.com', '2022-08-30', 71000.00, 2, 4),
]

cursor.executemany("""
    INSERT INTO employees (first_name, last_name, email, hire_date, salary, dept_id, manager_id)
    VALUES (?, ?, ?, ?, ?, ?, ?)
""", employees_data)
conn.commit()
print(f"Inserted {len(employees_data)} employees.")

In [None]:
# UPDATE - Modify existing records
print("Before UPDATE:")
execute_and_display("SELECT emp_id, first_name, last_name, salary FROM employees WHERE emp_id = 3;")

# Give Carol a raise
update_query = """
UPDATE employees 
SET salary = salary * 1.10
WHERE emp_id = 3;
"""
execute_and_display(update_query)

print("After UPDATE (10% raise):")
execute_and_display("SELECT emp_id, first_name, last_name, salary FROM employees WHERE emp_id = 3;")

In [None]:
# UPDATE - Bulk update with conditions
bulk_update = """
UPDATE employees
SET salary = salary * 1.05
WHERE hire_date < '2020-01-01';
"""
execute_and_display(bulk_update)
print("5% raise applied to employees hired before 2020.")

In [None]:
# DELETE - Remove specific records
# First, add a temporary employee to delete
cursor.execute("""
    INSERT INTO employees (first_name, last_name, email, salary, dept_id)
    VALUES ('Temp', 'Worker', 'temp@company.com', 50000.00, 1)
""")
conn.commit()

print("Before DELETE:")
execute_and_display("SELECT emp_id, first_name, last_name FROM employees WHERE first_name = 'Temp';")

# Delete the temporary employee
delete_query = "DELETE FROM employees WHERE first_name = 'Temp';"
execute_and_display(delete_query)

print("After DELETE:")
execute_and_display("SELECT emp_id, first_name, last_name FROM employees WHERE first_name = 'Temp';")

---
## 3. DQL - Data Query Language

DQL is focused on querying and retrieving data from the database.

### Key DQL Command:
- **SELECT** - Retrieve data from one or more tables

### SELECT Clauses (in logical order of evaluation):
1. `FROM` - Source table(s)
2. `WHERE` - Filter rows before grouping
3. `GROUP BY` - Group rows
4. `HAVING` - Filter groups
5. `SELECT` - Choose columns
6. `ORDER BY` - Sort results
7. `LIMIT` - Restrict number of rows
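
The ordering above can be traced on a small example. This is a sketch only: the `sales` table and its values are made up for illustration, and the SQL comments map each clause to its step in the list.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE sales (region TEXT, amount REAL)")
demo.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("north", 50), ("north", 200), ("north", 150),
        ("south", 300), ("south", 20),
        ("east", 500),
        ("west", 120),
    ],
)

rows = demo.execute(
    """
    SELECT region, COUNT(*) AS big_sales, SUM(amount) AS total  -- step 5
    FROM sales                                                  -- step 1
    WHERE amount >= 100       -- step 2: the 50 and 20 rows are dropped first
    GROUP BY region           -- step 3: remaining rows are grouped
    HAVING SUM(amount) > 250  -- step 4: 'west' (total 120) is dropped
    ORDER BY total DESC       -- step 6
    LIMIT 2                   -- step 7: 'south' (total 300) is cut off
    """
).fetchall()
demo.close()

print(rows)
```

Because `WHERE` runs before grouping, the small sales never reach `SUM`; because `HAVING` runs after grouping, it can filter on the aggregate.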

In [None]:
# SELECT - Basic query
execute_and_display("SELECT * FROM employees;")

In [None]:
# SELECT - Specific columns with aliases
query = """
SELECT 
    emp_id AS "Employee ID",
    first_name || ' ' || last_name AS "Full Name",
    salary AS "Annual Salary",
    ROUND(salary / 12, 2) AS "Monthly Salary"
FROM employees
ORDER BY salary DESC;
"""
execute_and_display(query)

In [None]:
# SELECT - WHERE clause with multiple conditions
query = """
SELECT first_name, last_name, salary, hire_date
FROM employees
WHERE salary > 75000
  AND hire_date >= '2020-01-01'
ORDER BY salary DESC;
"""
execute_and_display(query)

In [None]:
# SELECT - Pattern matching with LIKE
query = """
SELECT first_name, last_name, email
FROM employees
WHERE email LIKE '%@company.com'
  AND last_name LIKE '%son%';
"""
execute_and_display(query)

In [None]:
# SELECT - IN operator
query = """
SELECT first_name, last_name, dept_id
FROM employees
WHERE dept_id IN (1, 2, 3);
"""
execute_and_display(query)

In [None]:
# SELECT - BETWEEN operator
query = """
SELECT first_name, last_name, salary
FROM employees
WHERE salary BETWEEN 70000 AND 90000
ORDER BY salary;
"""
execute_and_display(query)

### 3.1 Aggregation Functions

SQL provides built-in functions to perform calculations on groups of rows:

| Function | Description |
|----------|-------------|
| `COUNT()` | Count number of rows |
| `SUM()` | Sum of values |
| `AVG()` | Average of values |
| `MIN()` | Minimum value |
| `MAX()` | Maximum value |
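
One detail the table does not show is how these functions treat `NULL`. The sketch below uses a hypothetical `readings` table to illustrate that `COUNT(*)` counts rows, while `COUNT(column)`, `AVG()` and `SUM()` skip `NULL` values.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
demo.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 10.0), ("a", None), ("b", 30.0), ("b", 20.0)],
)

# One row has a NULL value: it is counted by COUNT(*) but ignored
# by COUNT(value), AVG(value) and SUM(value).
total_rows, non_null, avg_value, total = demo.execute(
    "SELECT COUNT(*), COUNT(value), AVG(value), SUM(value) FROM readings"
).fetchone()
demo.close()

print(total_rows, non_null, avg_value, total)
```

The average is taken over the three non-NULL readings, not over all four rows.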

In [None]:
# Aggregation - Basic statistics
query = """
SELECT 
    COUNT(*) AS total_employees,
    ROUND(AVG(salary), 2) AS avg_salary,
    MIN(salary) AS min_salary,
    MAX(salary) AS max_salary,
    ROUND(SUM(salary), 2) AS total_payroll
FROM employees;
"""
execute_and_display(query)

In [None]:
# Aggregation - GROUP BY
query = """
SELECT 
    dept_id,
    COUNT(*) AS employee_count,
    ROUND(AVG(salary), 2) AS avg_salary,
    ROUND(SUM(salary), 2) AS dept_payroll
FROM employees
GROUP BY dept_id
ORDER BY avg_salary DESC;
"""
execute_and_display(query)

In [None]:
# Aggregation - HAVING (filter on aggregated results)
query = """
SELECT 
    dept_id,
    COUNT(*) AS employee_count,
    ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY dept_id
HAVING COUNT(*) > 1
ORDER BY employee_count DESC;
"""
execute_and_display(query)

### 3.2 JOIN Operations

JOINs combine rows from two or more tables based on related columns.

| JOIN Type | Description |
|-----------|-------------|
| `INNER JOIN` | Returns matching rows from both tables |
| `LEFT JOIN` | Returns all rows from left table + matching from right |
| `RIGHT JOIN` | Returns all rows from right table + matching from left (SQLite 3.39+) |
| `FULL OUTER JOIN` | Returns all rows from both tables, with NULLs where the other side has no match (SQLite 3.39+) |
| `CROSS JOIN` | Cartesian product of both tables |
| `SELF JOIN` | Table joined with itself |
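
`FULL OUTER JOIN` is the one join type in the table that older SQLite builds cannot run directly. A common workaround, sketched here with hypothetical `teams` and `people` tables, is a `LEFT JOIN` combined via `UNION ALL` with the unmatched rows from the other side.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE teams (team_id INTEGER PRIMARY KEY, team_name TEXT);
    CREATE TABLE people (person_id INTEGER PRIMARY KEY, name TEXT, team_id INTEGER);
    INSERT INTO teams VALUES (1, 'Core'), (2, 'Docs');
    INSERT INTO people VALUES (1, 'Ana', 1), (2, 'Ben', NULL);
    """
)

full_join = demo.execute(
    """
    -- Every person, with a team where one matches
    SELECT p.name, t.team_name
    FROM people p
    LEFT JOIN teams t ON p.team_id = t.team_id
    UNION ALL
    -- Plus the teams that matched no person at all
    SELECT p.name, t.team_name
    FROM teams t
    LEFT JOIN people p ON p.team_id = t.team_id
    WHERE p.person_id IS NULL
    """
).fetchall()
demo.close()

for row in full_join:
    print(row)
```

The result contains the matched pair, the person without a team, and the team without people; row order is unspecified because the query has no `ORDER BY`.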

In [None]:
# INNER JOIN - Employees with their department names
query = """
SELECT 
    e.first_name || ' ' || e.last_name AS employee_name,
    d.dept_name,
    e.salary
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
ORDER BY d.dept_name, e.salary DESC;
"""
execute_and_display(query)

In [None]:
# LEFT JOIN - All departments with employee counts (including those with no employees)
query = """
SELECT 
    d.dept_name,
    d.budget,
    COUNT(e.emp_id) AS employee_count,
    COALESCE(ROUND(SUM(e.salary), 2), 0) AS total_salaries
FROM departments d
LEFT JOIN employees e ON d.dept_id = e.dept_id
GROUP BY d.dept_id, d.dept_name, d.budget
ORDER BY employee_count DESC;
"""
execute_and_display(query)

In [None]:
# SELF JOIN - Employees with their manager's name
query = """
SELECT 
    e.first_name || ' ' || e.last_name AS employee,
    COALESCE(m.first_name || ' ' || m.last_name, 'No Manager') AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id
ORDER BY manager, employee;
"""
execute_and_display(query)

In [None]:
# Multiple JOINs - Comprehensive employee report
query = """
SELECT 
    e.emp_id,
    e.first_name || ' ' || e.last_name AS employee,
    d.dept_name AS department,
    COALESCE(m.first_name || ' ' || m.last_name, 'No Manager') AS manager,
    e.salary,
    e.hire_date
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id
LEFT JOIN employees m ON e.manager_id = m.emp_id
ORDER BY d.dept_name, e.salary DESC;
"""
execute_and_display(query)

### 3.3 Subqueries

Subqueries (nested queries) are queries within queries. They can be used in SELECT, FROM, WHERE, and HAVING clauses.
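
Subqueries in `SELECT` and `FROM` can be sketched briefly on a hypothetical `staff` table (names and figures are made up for illustration).

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE staff (name TEXT, team TEXT, salary REAL)")
demo.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [("Ana", "x", 100.0), ("Ben", "x", 80.0), ("Cy", "y", 60.0)],
)

# Scalar subquery in SELECT: each salary is compared with the overall average.
vs_avg = demo.execute(
    """
    SELECT name, salary - (SELECT AVG(salary) FROM staff) AS diff_from_avg
    FROM staff
    ORDER BY name
    """
).fetchall()

# Subquery in FROM (a derived table): aggregate first, then filter the result.
big_teams = demo.execute(
    """
    SELECT team, headcount
    FROM (SELECT team, COUNT(*) AS headcount FROM staff GROUP BY team) AS t
    WHERE headcount > 1
    """
).fetchall()
demo.close()

print(vs_avg)
print(big_teams)
```

The derived table gives the outer query a named column (`headcount`) to filter on, which is an alternative to `HAVING`.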

In [None]:
# Subquery in WHERE - Employees earning above average
query = """
SELECT first_name, last_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
ORDER BY salary DESC;
"""
execute_and_display(query)

In [None]:
# Subquery with IN - Employees in departments with budget > 300000
query = """
SELECT first_name, last_name, dept_id
FROM employees
WHERE dept_id IN (
    SELECT dept_id 
    FROM departments 
    WHERE budget > 300000
);
"""
execute_and_display(query)

In [None]:
# Correlated Subquery - Employees earning more than their department average
query = """
SELECT 
    e.first_name || ' ' || e.last_name AS employee,
    e.salary,
    e.dept_id
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.dept_id = e.dept_id
)
ORDER BY e.dept_id, e.salary DESC;
"""
execute_and_display(query)

---
## 4. DCL - Data Control Language

DCL commands manage user permissions and access control. These are crucial for database security.

### Key DCL Commands:
- **GRANT** - Give privileges to users
- **REVOKE** - Remove privileges from users

> **Note:** SQLite doesn't support GRANT/REVOKE: it is an embedded, file-based database with no user accounts, so access is governed by file-system permissions. The examples below show how these commands work in client-server databases such as PostgreSQL or MySQL.
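
SQLite has no direct counterpart to `GRANT SELECT`, but a read-only connection gives a rough approximation for a single process. The sketch below assumes a temporary database file and a hypothetical `notes` table; it restricts this one connection only and is not a substitute for user-level permissions.

```python
import pathlib
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    db_path = pathlib.Path(tmp) / "demo.db"

    # Writable connection: create and populate the file.
    rw = sqlite3.connect(str(db_path))
    rw.execute("CREATE TABLE notes (body TEXT)")
    rw.execute("INSERT INTO notes VALUES ('hello')")
    rw.commit()
    rw.close()

    # Read-only connection via a URI filename with mode=ro.
    ro = sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)
    can_read = ro.execute("SELECT body FROM notes").fetchall()
    try:
        ro.execute("INSERT INTO notes VALUES ('blocked')")
        write_blocked = False
    except sqlite3.OperationalError:
        # SQLite rejects the write on a read-only connection.
        write_blocked = True
    finally:
        ro.close()

print(can_read, write_blocked)
```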

In [None]:
# DCL Examples (Not executable in SQLite - for illustration only)
dcl_examples = """
-- GRANT examples (PostgreSQL-style syntax; MySQL names accounts as 'user'@'host')

-- Grant SELECT permission on employees table to user 'analyst'
GRANT SELECT ON employees TO analyst;

-- Grant multiple permissions to a user
GRANT SELECT, INSERT, UPDATE ON employees TO hr_manager;

-- Grant all permissions on a table
GRANT ALL PRIVILEGES ON employees TO admin;

-- Grant permission on all tables in a schema (PostgreSQL syntax)
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly_user;

-- Grant with option to re-grant to others
GRANT SELECT ON employees TO team_lead WITH GRANT OPTION;

-- REVOKE examples

-- Revoke SELECT permission
REVOKE SELECT ON employees FROM analyst;

-- Revoke all permissions
REVOKE ALL PRIVILEGES ON employees FROM former_employee;

-- Revoke with cascade (PostgreSQL syntax; also revokes from users who got permissions through this user)
REVOKE SELECT ON employees FROM team_lead CASCADE;
"""

print("DCL Commands Reference:")
print(dcl_examples)

### Common Privilege Types

| Privilege | Description |
|-----------|-------------|
| `SELECT` | Read data from table |
| `INSERT` | Add new rows |
| `UPDATE` | Modify existing rows |
| `DELETE` | Remove rows |
| `TRUNCATE` | Remove all rows |
| `REFERENCES` | Create foreign key constraints |
| `TRIGGER` | Create triggers on table |
| `CREATE` | Create new database objects |
| `ALL PRIVILEGES` | All available permissions |

---
## 5. Advanced SQL Concepts

In [None]:
# CASE expression - Conditional logic in SELECT
query = """
SELECT 
    first_name || ' ' || last_name AS employee,
    salary,
    CASE 
        WHEN salary >= 90000 THEN 'Senior'
        WHEN salary >= 75000 THEN 'Mid-Level'
        WHEN salary >= 65000 THEN 'Junior'
        ELSE 'Entry'
    END AS level
FROM employees
ORDER BY salary DESC;
"""
execute_and_display(query)

In [None]:
# Window Functions (available in SQLite 3.25+)
query = """
SELECT 
    first_name || ' ' || last_name AS employee,
    dept_id,
    salary,
    RANK() OVER (ORDER BY salary DESC) AS overall_rank,
    RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_rank,
    ROUND(AVG(salary) OVER (PARTITION BY dept_id), 2) AS dept_avg_salary
FROM employees
ORDER BY dept_id, salary DESC;
"""
execute_and_display(query)

In [None]:
# Common Table Expression (CTE) - WITH clause
query = """
WITH dept_stats AS (
    SELECT 
        dept_id,
        AVG(salary) AS avg_salary,
        COUNT(*) AS emp_count
    FROM employees
    GROUP BY dept_id
),
high_earners AS (
    SELECT * FROM employees WHERE salary > 80000
)
SELECT 
    d.dept_name,
    ROUND(ds.avg_salary, 2) AS avg_salary,
    ds.emp_count,
    COUNT(he.emp_id) AS high_earner_count
FROM departments d
JOIN dept_stats ds ON d.dept_id = ds.dept_id
LEFT JOIN high_earners he ON d.dept_id = he.dept_id
GROUP BY d.dept_id, d.dept_name, ds.avg_salary, ds.emp_count
ORDER BY avg_salary DESC;
"""
execute_and_display(query)

In [None]:
# UNION ALL - Combine results from multiple queries (duplicates are kept; plain UNION removes them)
query = """
SELECT 'High Salary' AS category, first_name, last_name, salary
FROM employees
WHERE salary >= 90000

UNION ALL

SELECT 'Recent Hire' AS category, first_name, last_name, salary
FROM employees
WHERE hire_date >= '2022-01-01'

ORDER BY category, salary DESC;
"""
execute_and_display(query)

In [None]:
# CREATE VIEW - Virtual table based on query
cursor.execute("""
CREATE VIEW employee_summary AS
SELECT 
    e.emp_id,
    e.first_name || ' ' || e.last_name AS full_name,
    d.dept_name,
    e.salary,
    e.hire_date
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id;
""")
conn.commit()
print("View 'employee_summary' created.")

# Query the view
execute_and_display("SELECT * FROM employee_summary ORDER BY salary DESC LIMIT 5;")

---
## Cleanup

In [None]:
# Close the database connection
conn.close()
print("Database connection closed.")

---
## 🎯 Key Takeaways

### SQL Categories Summary

| Category | Purpose | Commands | Characteristics |
|----------|---------|----------|----------------|
| **DDL** | Structure | CREATE, ALTER, DROP | Affects schema; auto-committed in some engines (e.g. MySQL, Oracle), transactional in SQLite and PostgreSQL |
| **DML** | Data | INSERT, UPDATE, DELETE | Transactional, affects data |
| **DQL** | Query | SELECT | Read-only, retrieves data |
| **DCL** | Access | GRANT, REVOKE | Security, permissions |

### Best Practices

1. **Use Parameterized Queries**: Prevent SQL injection by using `?` placeholders
   ```python
   cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
   ```

2. **Proper Indexing**: Create indexes on frequently queried columns
   ```sql
   CREATE INDEX idx_column ON table(column);
   ```

3. **Use Transactions**: Wrap related operations in transactions
   ```python
   try:
       conn.execute("BEGIN TRANSACTION")
       # ... operations ...
       conn.commit()
   except sqlite3.Error:
       conn.rollback()
       raise  # re-raise so the failure is not silently swallowed
   ```

4. **Limit Result Sets**: Use LIMIT to prevent memory issues
   ```sql
   SELECT * FROM large_table LIMIT 1000;
   ```

5. **Use Explicit JOINs**: Prefer `JOIN ... ON` over implicit joins in WHERE

6. **Apply Least Privilege**: Grant only necessary permissions to users
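
Practices 1 and 3 can be combined using the connection as a context manager, which `sqlite3` supports: the block commits on success and rolls back if it raises. In this sketch the `accounts` table and the `transfer` helper are hypothetical.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL CHECK(balance >= 0))"
)
demo.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100.0), ("b", 0.0)])
demo.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates succeed or neither does."""
    with conn:  # commit on success, rollback on exception
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )


try:
    # The second UPDATE would make a's balance negative and violates the CHECK.
    transfer(demo, "a", "b", 150.0)
except sqlite3.IntegrityError:
    pass

balances = dict(demo.execute("SELECT name, balance FROM accounts"))
demo.close()

print(balances)
```

The credit to `b` is undone along with the failed debit, so both balances are unchanged.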

### Performance Tips

- Use `EXPLAIN QUERY PLAN` to analyze query execution
- Avoid `SELECT *` in production - specify needed columns
- Consider `EXISTS` instead of `IN` for subqueries over large datasets, and confirm the gain with the query plan, since many optimizers treat the two alike
- Consider denormalization for read-heavy workloads
- Batch INSERT/UPDATE operations for bulk data
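
The first tip can be tried out as follows. This is a sketch with a hypothetical `people` table and a small `plan` helper; the exact wording of the plan text varies between SQLite versions, so only the general shape (a scan before the index, an index search after it) should be relied on.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, last_name TEXT, city TEXT)")


def plan(conn, sql, params=()):
    """Return the human-readable detail column of each query-plan row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


query = "SELECT id, city FROM people WHERE last_name = ?"

before = plan(demo, query, ("Smith",))  # no usable index: full table scan
demo.execute("CREATE INDEX idx_people_last_name ON people(last_name)")
after = plan(demo, query, ("Smith",))   # the new index can now be used
demo.close()

print("before:", before)
print("after: ", after)
```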

### Common Gotchas

- `NULL` comparisons require `IS NULL` / `IS NOT NULL`
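- `GROUP BY` must include all non-aggregated SELECT columns in standard SQL; SQLite accepts "bare" columns and takes their value from an arbitrary row in each group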
- `HAVING` filters groups (after aggregation), `WHERE` filters rows (before)
- `DISTINCT` can be expensive on large datasets
- Foreign key constraints are disabled by default in SQLite; run `PRAGMA foreign_keys = ON;` on each connection to enforce them
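
Two of the gotchas above, foreign key enforcement and `NULL` comparison, can be checked with a short sketch. The `parent` and `child` tables are hypothetical, and the default value of the pragma is assumed to be off, as it is in a stock SQLite build.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent(id)
    );
    """
)

# Typically 0: enforcement is off until the pragma is set on this connection.
fk_default = demo.execute("PRAGMA foreign_keys").fetchone()[0]

# The pragma must be issued outside a transaction to take effect.
demo.execute("PRAGMA foreign_keys = ON")
fk_enabled = demo.execute("PRAGMA foreign_keys").fetchone()[0]

try:
    demo.execute("INSERT INTO child (parent_id) VALUES (999)")  # no such parent
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# A NULL foreign key is allowed; use it to compare "= NULL" with "IS NULL".
demo.execute("INSERT INTO child (parent_id) VALUES (NULL)")
eq_null = demo.execute("SELECT COUNT(*) FROM child WHERE parent_id = NULL").fetchone()[0]
is_null = demo.execute("SELECT COUNT(*) FROM child WHERE parent_id IS NULL").fetchone()[0]
demo.close()

print(fk_default, fk_enabled, rejected, eq_null, is_null)
```

With enforcement on, the orphan row is rejected; `= NULL` matches nothing, while `IS NULL` finds the row.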