# Using `sqlite_tools` to Query Our SQLite Copy of the NZGD

This notebook demonstrates how to query our SQLite copy of the New Zealand Geotechnical Database (NZGD). We'll cover:
1.  Connecting to an SQLite database.
2.  Basic SQLite query structure.
3.  Using the functions provided in the `sqlite_tools.query` module to extract specific data.

## 1. Accessing the SQLite Database from Python

Python's built-in `sqlite3` module lets us connect to and query SQLite databases.
We import it along with the other modules this notebook needs.

In [None]:
import sqlite3
from pathlib import Path

from sqlite_tools import query

Next, specify the path to your SQLite database file. **Remember to replace `"/path/to/your/nzgd_database.db"` with the actual path to your database file.**

In [None]:
# Define the path to your SQLite database
# !!! IMPORTANT: Replace this with the actual path to your database file !!!
db_file_path = Path("/path/to/your/nzgd_database.db")

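# sqlite3.connect() creates a new, empty database if the file does not exist,
# so confirm the file is there before connecting
if not db_file_path.is_file():
    raise FileNotFoundError(f"Database file not found: {db_file_path}")

# Establish a database connection
conn = sqlite3.connect(db_file_path)
print(f"Connected to {db_file_path}")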

# Note: The connection `conn` will be used in subsequent cells.
# We will close it at the end of the notebook.
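
If you only intend to read from the database, one option is to open it in read-only mode. In that mode `sqlite3.connect` rejects writes and fails when the file is missing instead of creating an empty database. The sketch below is illustrative only: the helper name `connect_read_only` and the throwaway `demo.db` file are assumptions, not part of `sqlite_tools`.

```python
import sqlite3
import tempfile
from pathlib import Path


def connect_read_only(db_path: Path) -> sqlite3.Connection:
    """Open an existing SQLite file read-only (hypothetical helper)."""
    # as_uri() builds a file: URI; mode=ro forbids writes and file creation
    uri = f"{db_path.resolve().as_uri()}?mode=ro"
    return sqlite3.connect(uri, uri=True)


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.db"

    # Create a small throwaway database to stand in for the real file
    setup_conn = sqlite3.connect(demo_path)
    setup_conn.execute("CREATE TABLE demo (id INTEGER)")
    setup_conn.commit()
    setup_conn.close()

    ro_conn = connect_read_only(demo_path)
    try:
        ro_conn.execute("INSERT INTO demo VALUES (1)")
        write_blocked = False
    except sqlite3.OperationalError:
        write_blocked = True  # writes are rejected in read-only mode
    finally:
        ro_conn.close()

    try:
        connect_read_only(Path(tmp_dir) / "missing.db")
        missing_rejected = False
    except sqlite3.OperationalError:
        missing_rejected = True  # a missing file is an error, not a new database

print(f"Write blocked: {write_blocked}, missing file rejected: {missing_rejected}")
```

To use this with the real database, you would pass `db_file_path` to the helper in place of the demo path.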

## 2. Understanding SQLite Query Structure

SQL (Structured Query Language) is used to communicate with databases. SQLite uses a dialect of SQL. Here are some common clauses:

*   **`SELECT column1, column2, ... FROM table_name`**: This is the most fundamental clause. It retrieves specified columns from a table.
    *   `SELECT * FROM table_name` selects all columns.
*   **`WHERE condition`**: Filters records based on a specific condition. 
    *   Example: `WHERE age > 30` or `WHERE name = 'Alice'`.
*   **`JOIN another_table ON table_name.column_name = another_table.column_name`**: Combines rows from two or more tables based on a related column between them.
    *   Common types: `INNER JOIN` (what a bare `JOIN` means; keeps only matching rows) and `LEFT JOIN` (keeps every row of the left table).
*   **`ORDER BY column_name [ASC|DESC]`**: Sorts the result set by one or more columns, either in ascending (`ASC`, default) or descending (`DESC`) order.
*   **`LIMIT number`**: Restricts the number of rows returned by the query.
*   **`GROUP BY column_name`**: Groups rows that have the same values in specified columns into summary rows. Often used with aggregate functions like `COUNT()`, `MAX()`, `MIN()`, `SUM()`, `AVG()`.

**Example of a more complex query structure:**

```sql
SELECT 
    t1.columnA, 
    t2.columnB, 
    COUNT(t1.id) as item_count
FROM 
    table1 AS t1
INNER JOIN 
    table2 AS t2 ON t1.common_id = t2.common_id
WHERE 
    t1.status = 'active' AND t2.category = 'electronics'
GROUP BY 
    t1.columnA, t2.columnB
ORDER BY 
    item_count DESC
LIMIT 10;
```
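This query joins `table1` and `table2` on `common_id`, keeps only rows where `table1.status` is `'active'` and `table2.category` is `'electronics'`, counts the rows in each (`columnA`, `columnB`) group, and returns the 10 groups with the highest counts.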
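
To see the query above in action from Python, the following sketch runs it against a small in-memory database. The table contents are made up purely for illustration, and the filter values are passed as `?` parameters rather than written into the SQL string.

```python
import sqlite3

# In-memory database with made-up rows matching the example's table names
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, common_id INTEGER, columnA TEXT, status TEXT);
    CREATE TABLE table2 (common_id INTEGER PRIMARY KEY, columnB TEXT, category TEXT);

    INSERT INTO table2 VALUES (1, 'phone', 'electronics'), (2, 'chair', 'furniture');
    INSERT INTO table1 VALUES
        (1, 1, 'north', 'active'),
        (2, 1, 'north', 'active'),
        (3, 1, 'south', 'active'),
        (4, 1, 'south', 'inactive'),
        (5, 2, 'north', 'active');
    """
)

sql = """
SELECT t1.columnA, t2.columnB, COUNT(t1.id) AS item_count
FROM table1 AS t1
INNER JOIN table2 AS t2 ON t1.common_id = t2.common_id
WHERE t1.status = ? AND t2.category = ?
GROUP BY t1.columnA, t2.columnB
ORDER BY item_count DESC
LIMIT 10;
"""

# Each ? is bound, in order, to a value from the tuple
rows = demo_conn.execute(sql, ("active", "electronics")).fetchall()
demo_conn.close()

print(rows)  # [('north', 'phone', 2), ('south', 'phone', 1)]
```

The inactive row and the furniture row are filtered out by the `WHERE` clause before grouping, which is why only three of the five rows are counted.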

## 3. Using Functions from `sqlite_tools.query`

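The `sqlite_tools.query` module provides functions that wrap common SQL queries, so you can extract data without writing the SQL yourself. Each function used below takes an open database connection and returns its results as a DataFrame.

**Ensure the database connection (`conn`) created in Section 1 is still open.**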
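
To give a feel for what "encapsulating a SQL query" can look like, here is a minimal sketch using only the standard library. It is not the implementation in `sqlite_tools`: the function name `rows_for_one_id`, the `measurements` table, and its columns are all hypothetical, and the function returns plain tuples rather than a DataFrame.

```python
import sqlite3


def rows_for_one_id(selected_id: int, conn: sqlite3.Connection) -> list[tuple]:
    """Return the rows of a hypothetical `measurements` table for one ID, sorted by depth."""
    sql = "SELECT record_id, depth, value FROM measurements WHERE record_id = ? ORDER BY depth"
    # The ? placeholder lets SQLite bind the value instead of formatting it into the SQL string
    return conn.execute(sql, (selected_id,)).fetchall()


# Made-up data in an in-memory database, purely for demonstration
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE measurements (record_id INTEGER, depth REAL, value REAL)")
demo_conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [(1, 0.5, 10.0), (1, 0.1, 12.5), (2, 0.1, 7.0)],
)

selected_rows = rows_for_one_id(1, demo_conn)
missing_rows = rows_for_one_id(99, demo_conn)  # an ID with no rows gives an empty list
demo_conn.close()

print(selected_rows)  # [(1, 0.1, 12.5), (1, 0.5, 10.0)]
print(missing_rows)   # []
```

The caller only supplies an ID and a connection, which is the same calling pattern the examples below use.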

### Example 1: Get measurements for a specific CPT using its NZGD ID

In [None]:
nzgd_id_for_cpt = 1 

if conn:
    cpt_df = query.cpt_measurements_for_one_nzgd(selected_nzgd_id=nzgd_id_for_cpt, conn=conn)
    if not cpt_df.empty:
        display(cpt_df.head())
    else:
        print(f"No CPT measurements found for NZGD ID {nzgd_id_for_cpt}.")
else:
    print("Database connection is not established.")

### Example 2: Get measurements for a specific SPT using its NZGD ID

In [None]:
nzgd_id_for_spt = 14810

if conn:
    spt_df = query.spt_measurements_for_one_nzgd(selected_nzgd_id=nzgd_id_for_spt, conn=conn)
    if not spt_df.empty:
        display(spt_df.head())
    else:
        print(f"No SPT measurements found for NZGD ID {nzgd_id_for_spt}.")
else:
    print("Database connection is not established.")

### Example 3: Get soil type measurements for a specific SPT using its NZGD ID

In [None]:
nzgd_id_for_spt_soil = 14810

if conn:
    spt_soil_df = query.spt_soil_types_for_one_nzgd(selected_nzgd_id=nzgd_id_for_spt_soil, conn=conn)
    if not spt_soil_df.empty:
        display(spt_soil_df.head())
    else:
        print(f"No SPT soil types found for NZGD ID {nzgd_id_for_spt_soil}.")
else:
    print("Database connection is not established.")

### Example 4: Get Vs30 estimates for a specific CPT investigation given its NZGD ID

In [None]:
nzgd_id_for_cpt_vs30 = 1

if conn:
    cpt_vs30_df = query.cpt_vs30s_for_one_nzgd_id(selected_nzgd_id=nzgd_id_for_cpt_vs30, conn=conn)
    if not cpt_vs30_df.empty:
        display(cpt_vs30_df.head())
    else:
        print(f"No CPT Vs30s found for NZGD ID {nzgd_id_for_cpt_vs30}.")
else:
    print("Database connection is not established.")

### Example 5: Get Vs30 estimates for a specific SPT investigation given its NZGD ID

In [None]:
nzgd_id_for_spt_vs30 = 14810

if conn:
    spt_vs30_df = query.spt_vs30s_for_one_nzgd_id(selected_nzgd_id=nzgd_id_for_spt_vs30, conn=conn)
    if not spt_vs30_df.empty:
        display(spt_vs30_df.head())
    else:
        print(f"No SPT Vs30s found for NZGD ID {nzgd_id_for_spt_vs30}.")
else:
    print("Database connection is not established.")

### Example 6: Get all estimated Vs30s given specific correlations

See [available_options.md](./available_options.md) for the available correlation options.

In [None]:
vs30_corr = "boore_2004"
cpt_vs_corr = "andrus_2007_pleistocene"
spt_vs_corr = "brandenberg_2010"
hammer = "Auto"

if conn:
    all_vs30_data_df = query.all_vs30s_given_correlations(
        selected_vs_to_vs30_correlation=vs30_corr,
        selected_cpt_to_vs_correlation=cpt_vs_corr,
        selected_spt_to_vs_correlation=spt_vs_corr,
        selected_hammer_type=hammer,
        conn=conn
    )
    if not all_vs30_data_df.empty:
        display(all_vs30_data_df.head())
    else:
        print("No Vs30 data found for the given correlations.")
else:
    print("Database connection is not established.")

## Closing the Connection

Finally, it's important to close the database connection when you're done with it to free up system resources.

In [None]:
if conn: # if a connection exists
    conn.close()
    print("\nDatabase connection closed.")
else:
    print("No database connection to close.")
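
As an alternative to closing the connection manually, `contextlib.closing` from the standard library closes it automatically when the block ends, even if a query raises an exception. Note that using a `sqlite3` connection directly in a `with` statement only manages transactions and does not close the connection. A minimal sketch, using an in-memory database as a stand-in for the real file:

```python
import sqlite3
from contextlib import closing

# closing() calls demo_conn.close() when the with-block exits
with closing(sqlite3.connect(":memory:")) as demo_conn:
    answer = demo_conn.execute("SELECT 1 + 1").fetchone()[0]

# After the block, the connection is closed and further queries are rejected
try:
    demo_conn.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False

print(f"Query result: {answer}, connection still open: {still_open}")
```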