# Using `sqlite_tools` to Query NZGD-style Databases

This notebook demonstrates how to interact with an NZGD-style SQLite database using Python. We'll cover:
1.  Connecting to an SQLite database.
2.  Basic SQLite query structure.
3.  Executing simple example queries.
4.  Using the functions provided in the `sqlite_tools.query_sqlite_db` module to extract specific geotechnical datasets.

## 1. Accessing SQLite Database from Python

Python's built-in `sqlite3` module allows us to connect to and interact with SQLite databases. 

First, import the necessary libraries: `sqlite3` for database interaction, `pathlib` for handling file paths, and `pandas` for working with data in DataFrames. To use the `sqlite_tools` package, also import its `query` module.

In [None]:
import sqlite3
from pathlib import Path

import pandas as pd

# Import the query module from the sqlite_tools package.
# This assumes the package is installed or your PYTHONPATH is set up correctly.
# If you run this notebook from the sqlite_tools project root, install the package
# in editable mode (pip install -e .) or adjust the import path.
# Note: the name `query` now refers to this module, so don't reuse it for SQL strings.
from sqlite_tools import query

Next, specify the path to your SQLite database file. **Remember to replace `"/path/to/your/nzgd_database.db"` with the actual path to your database file.**

In [None]:
# Define the path to your SQLite database
# !!! IMPORTANT: Replace this with the actual path to your database file !!!
db_file_path = Path("/path/to/your/nzgd_database.db")

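# Establish a database connection.
# Check that the file exists first: sqlite3.connect() silently creates a new,
# empty database if the path does not exist.
# The try/except reports connection errors; the connection itself is closed
# in the last cell of the notebook.
conn = None  # stays None if the connection cannot be made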
try:
    if not db_file_path.exists():
        print(f"Database file not found at: {db_file_path}")
        print("Please update the `db_file_path` variable in the cell above.")
        # In a real script, you might raise FileNotFoundError or exit
    else:
        conn = sqlite3.connect(db_file_path)
        print(f"Successfully connected to {db_file_path}")

except sqlite3.Error as e:
    print(f"Database error: {e}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

# Note: The connection `conn` will be used in subsequent cells.
# We will close it at the end of the notebook.
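
The existence check above matters because `sqlite3.connect()` creates an empty database when the path is wrong. An alternative is a minimal sketch that uses SQLite's URI syntax with `mode=ro`: it opens an existing file read-only, raises `sqlite3.OperationalError` if the file is missing, and blocks accidental writes. The helper name `connect_read_only` is hypothetical and not part of `sqlite_tools`.

```python
import sqlite3
from pathlib import Path


def connect_read_only(db_path: Path) -> sqlite3.Connection:
    """Open an existing SQLite file in read-only mode (hypothetical helper).

    Unlike a plain sqlite3.connect(path), this never creates a new file:
    a missing database raises sqlite3.OperationalError instead.
    """
    # Path.as_uri() needs an absolute path. SQLite accepts file:// URIs when
    # uri=True is passed, and the "mode=ro" query parameter forbids writes.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


# Usage with the path defined above:
# conn = connect_read_only(db_file_path)
```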

## 2. Understanding SQLite Query Structure

SQL (Structured Query Language) is used to communicate with databases. SQLite uses a dialect of SQL. Here are some common clauses:

*   **`SELECT column1, column2, ... FROM table_name`**: The most fundamental clause. It retrieves the specified columns from a table.
    *   `SELECT * FROM table_name` selects all columns.
*   **`JOIN another_table ON table_name.column_name = another_table.column_name`**: Combines rows from two or more tables based on a related column.
    *   Common types: `INNER JOIN` (what a plain `JOIN` means), `LEFT JOIN`.
*   **`WHERE condition`**: Keeps only the rows that satisfy a condition.
    *   Example: `WHERE age > 30` or `WHERE name = 'Alice'`.
*   **`GROUP BY column_name`**: Groups rows that share the same values in the specified columns into summary rows. Often used with aggregate functions such as `COUNT()`, `MAX()`, `MIN()`, `SUM()`, `AVG()`.
*   **`ORDER BY column_name [ASC|DESC]`**: Sorts the result set by one or more columns, in ascending (`ASC`, the default) or descending (`DESC`) order.
*   **`LIMIT number`**: Restricts the number of rows returned.

When clauses are combined in one query, they must appear in the order listed above.

**Example of a more complex query structure:**

```sql
SELECT 
    t1.columnA, 
    t2.columnB, 
    COUNT(t1.id) as item_count
FROM 
    table1 AS t1
INNER JOIN 
    table2 AS t2 ON t1.common_id = t2.common_id
WHERE 
    t1.status = 'active' AND t2.category = 'electronics'
GROUP BY 
    t1.columnA, t2.columnB
ORDER BY 
    item_count DESC
LIMIT 10;
```
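This query joins `table1` and `table2` on `common_id`, keeps only active rows in the electronics category, counts the rows in each (`columnA`, `columnB`) group, sorts the groups by that count in descending order, and returns at most the top 10.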
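
To see these clauses working together, here is a minimal sketch that builds two small in-memory tables matching the placeholder names in the query above and runs the same query through `pandas.read_sql_query()`. The tables `table1` and `table2`, their columns, and their rows are made up for illustration; they are not NZGD tables.

```python
import sqlite3

import pandas as pd

# Throwaway in-memory database; nothing is written to disk.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, columnA TEXT, common_id INTEGER, status TEXT);
    CREATE TABLE table2 (common_id INTEGER, columnB TEXT, category TEXT);
    INSERT INTO table1 (columnA, common_id, status) VALUES
        ('x', 1, 'active'), ('x', 1, 'active'), ('y', 2, 'active'), ('z', 1, 'inactive');
    INSERT INTO table2 VALUES (1, 'phone', 'electronics'), (2, 'lamp', 'electronics');
""")

# Same structure as the example query above.
example_sql = """
SELECT t1.columnA, t2.columnB, COUNT(t1.id) AS item_count
FROM table1 AS t1
INNER JOIN table2 AS t2 ON t1.common_id = t2.common_id
WHERE t1.status = 'active' AND t2.category = 'electronics'
GROUP BY t1.columnA, t2.columnB
ORDER BY item_count DESC
LIMIT 10;
"""
result = pd.read_sql_query(example_sql, demo)
print(result)  # the ('x', 'phone') group has 2 rows, ('y', 'lamp') has 1
demo.close()
```

The inactive row for `'z'` is removed by `WHERE` before grouping, which is why it does not appear in the counts.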

## 3. Simple Example Queries

Let's run some simple queries directly using `pandas.read_sql_query()` and our active connection `conn`.

**Make sure you have successfully connected to your database in Cell 2 before running these examples.**

First, let's see a list of all tables in the database.

In [None]:
if conn:
    try:
        # Query to get all table names from the SQLite master table
        tables_df = pd.read_sql_query("SELECT name FROM sqlite_master WHERE type='table';", conn)
        print("Tables in the database:")
        print(tables_df)
    except Exception as e:
        print(f"An error occurred: {e}")
else:
    print("Database connection is not established. Please check the connection cell (Cell 2).")
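
Once you know the table names, the next question is usually which columns each table holds. The sketch below uses SQLite's `pragma_table_info()` table-valued function (SQLite 3.16 or later), which accepts the table name as a bound parameter. The helper name `list_columns` is hypothetical and not part of `sqlite_tools`.

```python
import sqlite3

import pandas as pd


def list_columns(conn: sqlite3.Connection) -> pd.DataFrame:
    """Return one row per (table, column) for every user table (hypothetical helper)."""
    # sqlite_master lists schema objects; names starting with "sqlite_" are
    # internal. "!" escapes the "_" so LIKE treats it literally.
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' "
            "AND name NOT LIKE 'sqlite!_%' ESCAPE '!' ORDER BY name"
        )
    ]
    frames = []
    for table in tables:
        # pragma_table_info() is a table-valued function, so the table name
        # is passed as a parameter instead of being pasted into the SQL.
        cols = pd.read_sql_query(
            "SELECT name AS column_name, type AS column_type FROM pragma_table_info(?)",
            conn,
            params=(table,),
        )
        cols.insert(0, "table_name", table)
        frames.append(cols)
    if not frames:
        return pd.DataFrame(columns=["table_name", "column_name", "column_type"])
    return pd.concat(frames, ignore_index=True)


# Usage with the notebook's connection:
# if conn:
#     display(list_columns(conn))
```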

Now, let's select a few rows from a specific table. **Set `table_to_query` to an actual table name** from the list printed above; the example uses `nzgdrecord`.

In [None]:
# !!! IMPORTANT: Set table_to_query to an actual table name from your database !!!
table_to_query = 'nzgdrecord'  # e.g. 'cptreport', 'sptreport', or another table

if conn:
    try:
        print(f"\n--- First 5 rows from '{table_to_query}' table ---")
        # Table names cannot be bound as ? parameters, so the name is inserted
        # with an f-string; only use trusted table names here.
        # (Named sample_sql so it does not overwrite the imported `query` module.)
        sample_sql = f"SELECT * FROM {table_to_query} LIMIT 5;"
        sample_data_df = pd.read_sql_query(sample_sql, conn)
        display(sample_data_df)  # display() is provided by Jupyter/IPython for rich output
    except pd.io.sql.DatabaseError as e:
        print(f"Database error (perhaps '{table_to_query}' doesn't exist or is misspelled?): {e}")
    except Exception as e:
        print(f"An error occurred: {e}")
else:
    print("Database connection is not established. Please check the connection cell (Cell 2).")
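
Placeholders (`?`) can bind values but not identifiers such as table names, which is why the cell above builds its SQL with an f-string. If the table name might come from user input, a safer sketch is to check it against `sqlite_master` first and quote it as an identifier. `preview_table` is a hypothetical helper, not part of `sqlite_tools`.

```python
import sqlite3

import pandas as pd


def preview_table(conn: sqlite3.Connection, table: str, n_rows: int = 5) -> pd.DataFrame:
    """Return the first n_rows of `table`, refusing names not in the schema (hypothetical helper)."""
    known = {
        row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    if table not in known:
        raise ValueError(f"Unknown table {table!r}; available: {sorted(known)}")
    # Double-quote the identifier (doubling any embedded quotes). The row
    # limit is an ordinary value, so it can still be a bound parameter.
    quoted = '"' + table.replace('"', '""') + '"'
    return pd.read_sql_query(f"SELECT * FROM {quoted} LIMIT ?;", conn, params=(n_rows,))


# Usage:
# if conn:
#     display(preview_table(conn, table_to_query))
```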

Next, let's try a query with a `WHERE` clause, looking up the `nzgdrecord` row with a specific `nzgd_id`.
**Set `example_nzgd_id` to an `nzgd_id` that you expect to be in your database.**

In [None]:
# !!! IMPORTANT: Set example_nzgd_id to an nzgd_id you know exists in your database !!!
example_nzgd_id = 1  # Try an ID like 1, 2, 3, etc.

if conn:
    try:
        print(f"\n--- Record from 'nzgdrecord' with nzgd_id = {example_nzgd_id} ---")
        # The ? placeholder is filled from `params`, which is safer against SQL
        # injection than formatting the value into the string.
        # (Named record_sql so it does not overwrite the imported `query` module.)
        record_sql = "SELECT * FROM nzgdrecord WHERE nzgd_id = ?;"
        record_df = pd.read_sql_query(record_sql, conn, params=(example_nzgd_id,))
        if not record_df.empty:
            display(record_df)
        else:
            print(f"No record found with nzgd_id = {example_nzgd_id}.")
    except Exception as e:
        print(f"An error occurred: {e}")
else:
    print("Database connection is not established. Please check the connection cell (Cell 2).")
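
The same placeholder approach extends to several IDs at once: generate one `?` per value for an `IN (...)` list and pass the IDs as the parameters. This minimal sketch assumes only that `nzgdrecord` has an `nzgd_id` column, as in the query above; the helper name `records_for_ids` is hypothetical.

```python
import sqlite3
from typing import Sequence

import pandas as pd


def records_for_ids(conn: sqlite3.Connection, nzgd_ids: Sequence[int]) -> pd.DataFrame:
    """Fetch nzgdrecord rows for several nzgd_ids in one query (hypothetical helper)."""
    if not nzgd_ids:
        # Nothing to look up; skip the round trip to the database.
        return pd.DataFrame()
    placeholders = ", ".join("?" for _ in nzgd_ids)  # e.g. "?, ?, ?"
    sql = f"SELECT * FROM nzgdrecord WHERE nzgd_id IN ({placeholders}) ORDER BY nzgd_id;"
    return pd.read_sql_query(sql, conn, params=tuple(nzgd_ids))


# Usage:
# if conn:
#     display(records_for_ids(conn, [1, 2, 3]))
```

Only the placeholders are formatted into the string; the IDs themselves are still bound as parameters. SQLite limits the number of parameters per statement (999 by default before version 3.32.0, 32766 since), so very long ID lists may need to be split into chunks.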

## 4. Using Functions from `sqlite_tools.query_sqlite_db`

The `sqlite_tools` package provides ready-made functions for extracting common datasets; each one wraps a more complex SQL query. In this notebook they are called through `query`, the module imported in the first code cell.

**Ensure your database connection (`conn`) is active from Cell 2.**

### Example 1: Get CPT Measurements for a specific NZGD ID

In [None]:
# !!! IMPORTANT: Replace 1 with an nzgd_id for which you expect CPT data !!!
nzgd_id_for_cpt = 1 

if conn:
    try:
        print(f"\n--- CPT Measurements (NZGD ID {nzgd_id_for_cpt}) ---")
        cpt_df = query.cpt_measurements_for_one_nzgd(selected_nzgd_id=nzgd_id_for_cpt, conn=conn)
        if not cpt_df.empty:
            display(cpt_df.head())
        else:
            print(f"No CPT measurements found for NZGD ID {nzgd_id_for_cpt}.")
    except Exception as e:
        print(f"An error occurred while fetching CPT measurements: {e}")
else:
    print("Database connection is not established.")

### Example 2: Get SPT Measurements for a specific NZGD ID

In [None]:
# !!! IMPORTANT: Replace 2 with an nzgd_id for which you expect SPT data !!!
nzgd_id_for_spt = 2 

if conn:
    try:
        print(f"\n--- SPT Measurements (NZGD ID {nzgd_id_for_spt}) ---")
        spt_df = query.spt_measurements_for_one_nzgd(selected_nzgd_id=nzgd_id_for_spt, conn=conn)
        if not spt_df.empty:
            display(spt_df.head())
        else:
            print(f"No SPT measurements found for NZGD ID {nzgd_id_for_spt}.")
    except Exception as e:
        print(f"An error occurred while fetching SPT measurements: {e}")
else:
    print("Database connection is not established.")

### Example 3: Get SPT Soil Types for a specific NZGD ID

In [None]:
# !!! IMPORTANT: Replace 3 with an nzgd_id for which you expect SPT soil type data !!!
nzgd_id_for_spt_soil = 3

if conn:
    try:
        print(f"\n--- SPT Soil Types (NZGD ID {nzgd_id_for_spt_soil}) ---")
        spt_soil_df = query.spt_soil_types_for_one_nzgd(selected_nzgd_id=nzgd_id_for_spt_soil, conn=conn)
        if not spt_soil_df.empty:
            display(spt_soil_df.head())
        else:
            print(f"No SPT soil types found for NZGD ID {nzgd_id_for_spt_soil}.")
    except Exception as e:
        print(f"An error occurred while fetching SPT soil types: {e}")
else:
    print("Database connection is not established.")

### Example 4: Get CPT Vs30 Estimates for a specific NZGD ID

In [None]:
# !!! IMPORTANT: Replace 4 with an nzgd_id for which you expect CPT Vs30 data !!!
nzgd_id_for_cpt_vs30 = 4

if conn:
    try:
        print(f"\n--- CPT Vs30s (NZGD ID {nzgd_id_for_cpt_vs30}) ---")
        cpt_vs30_df = query.cpt_vs30s_for_one_nzgd_id(selected_nzgd_id=nzgd_id_for_cpt_vs30, conn=conn)
        if not cpt_vs30_df.empty:
            display(cpt_vs30_df.head())
        else:
            print(f"No CPT Vs30s found for NZGD ID {nzgd_id_for_cpt_vs30}.")
    except Exception as e:
        print(f"An error occurred while fetching CPT Vs30s: {e}")
else:
    print("Database connection is not established.")

### Example 5: Get SPT Vs30 Estimates for a specific NZGD ID

In [None]:
# !!! IMPORTANT: Replace 5 with an nzgd_id for which you expect SPT Vs30 data !!!
nzgd_id_for_spt_vs30 = 5

if conn:
    try:
        print(f"\n--- SPT Vs30s (NZGD ID {nzgd_id_for_spt_vs30}) ---")
        spt_vs30_df = query.spt_vs30s_for_one_nzgd_id(selected_nzgd_id=nzgd_id_for_spt_vs30, conn=conn)
        if not spt_vs30_df.empty:
            display(spt_vs30_df.head())
        else:
            print(f"No SPT Vs30s found for NZGD ID {nzgd_id_for_spt_vs30}.")
    except Exception as e:
        print(f"An error occurred while fetching SPT Vs30s: {e}")
else:
    print("Database connection is not established.")
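
Examples 1 to 5 each fetch data for a single NZGD ID. To collect results for several IDs, one option is a small loop that calls the same function once per ID and stacks the DataFrames. This sketch assumes only what the cells above show: each `query.*` function accepts `selected_nzgd_id` and `conn` keyword arguments and returns a DataFrame. `fetch_for_many_ids` is a hypothetical helper, and the demo substitutes a made-up stand-in function (with an illustrative `depth_m` column) so it runs without a database.

```python
from typing import Callable, Iterable

import pandas as pd


def fetch_for_many_ids(
    fetch: Callable[..., pd.DataFrame], nzgd_ids: Iterable[int], conn
) -> pd.DataFrame:
    """Call fetch(selected_nzgd_id=..., conn=...) per ID and stack the non-empty results."""
    frames = []
    for nzgd_id in nzgd_ids:
        df = fetch(selected_nzgd_id=nzgd_id, conn=conn)
        if not df.empty:
            # Tag each row with the ID that was requested, in case the
            # function's output does not already include it.
            frames.append(df.assign(requested_nzgd_id=nzgd_id))
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


# Stand-in for a real query function, used only to demonstrate the loop.
def fake_fetch(selected_nzgd_id, conn):
    if selected_nzgd_id == 2:
        return pd.DataFrame()  # simulate an ID with no data
    return pd.DataFrame({"depth_m": [1.0, 2.0]})


combined = fetch_for_many_ids(fake_fetch, [1, 2, 3], conn=None)
print(combined)

# With the real package (if conn is open), for example:
# cpt_all = fetch_for_many_ids(query.cpt_measurements_for_one_nzgd, [1, 2, 3], conn)
```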

### Example 6: Get All Vs30s Given Specific Correlations

This function is more involved because it requires correlation names and a hammer type that exist in your database's lookup tables (`vstovs30correlation`, `cpttovscorrelation`, `spttovscorrelation`, `spttovs30hammertype`).

**You will likely need to inspect your database to find valid names for these parameters.** The example values below are placeholders.

In [None]:
# !!! IMPORTANT: Replace these with actual correlation and hammer type names from your database !!!
vs30_corr = "McGann, Bradley, Cubrinovski et al. (2015)"  # Check your vstovs30correlation table
cpt_vs_corr = "Robertson (2009)"  # Check your cpttovscorrelation table
spt_vs_corr = "Ohta & Goto (1978)"    # Check your spttovscorrelation table
hammer = "Safety Hammer"         # Check your spttovs30hammertype table (e.g., "Donut Hammer", "Unknown")

if conn:
    try:
        print("\n--- All Vs30s (Specific Correlations) ---")
        all_vs30_data_df = query.all_vs30s_given_correlations(
            selected_vs30_correlation=vs30_corr,
            selected_cpt_to_vs_correlation=cpt_vs_corr,
            selected_spt_to_vs_correlation=spt_vs_corr,
            selected_hammer_type=hammer,
            conn=conn
        )
        if not all_vs30_data_df.empty:
            display(all_vs30_data_df.head())
        else:
            print("No Vs30 data found for the given correlation combination. \n"
                  "Please check that the correlation names and hammer type exist in the database \n"
                  "and that there are precomputed Vs30 estimates for this combination.")
    except ValueError as e:
        print(f"ValueError: {e}. This often means one of the correlation names was not found in the database tables.")
        print("Please check the names in `vstovs30correlation`, `cpttovscorrelation`, `spttovscorrelation`, and `spttovs30hammertype` tables.")
    except Exception as e:
        print(f"An error occurred while fetching all Vs30s: {e}")
else:
    print("Database connection is not established.")
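
If the call above reports that a name was not found, one way to find valid values is to print the contents of each lookup table. This sketch assumes only that the four table names listed above exist. Because their column layouts are not shown in this notebook, it selects all columns rather than guessing a column name. `show_lookup_tables` is a hypothetical helper.

```python
import sqlite3

import pandas as pd

LOOKUP_TABLES = (
    "vstovs30correlation",
    "cpttovscorrelation",
    "spttovscorrelation",
    "spttovs30hammertype",
)


def show_lookup_tables(conn: sqlite3.Connection, tables=LOOKUP_TABLES) -> dict:
    """Return {table_name: DataFrame} for each lookup table present in the database."""
    existing = {
        row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    results = {}
    for table in tables:
        if table not in existing:
            print(f"Table {table!r} not found; skipping.")
            continue
        # The name was checked against the schema above, so it is safe to interpolate.
        results[table] = pd.read_sql_query(f'SELECT * FROM "{table}";', conn)
    return results


# Usage:
# if conn:
#     for name, df in show_lookup_tables(conn).items():
#         print(f"--- {name} ---")
#         display(df)
```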

## Closing the Connection

Finally, it's important to close the database connection when you're done with it to free up resources.

In [None]:
if conn:
    try:
        conn.close()
        print("\nDatabase connection closed.")
        conn = None # Set to None to indicate it's closed
    except Exception as e:
        print(f"Error closing connection: {e}")
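
In a script outside a notebook, `contextlib.closing()` guarantees the connection is closed even if a query raises. Using a `sqlite3.Connection` directly in a `with` statement only commits or rolls back the current transaction; it does not close the connection. A minimal sketch, where `count_tables` is a hypothetical function name:

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def count_tables(db_path: Path) -> int:
    """Open the database, count its tables, and always close the connection."""
    with closing(sqlite3.connect(db_path)) as conn:
        (n_tables,) = conn.execute(
            "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
        ).fetchone()
    # closing() has called conn.close() here, even if execute() had raised.
    return n_tables


# Usage:
# print(count_tables(Path("/path/to/your/nzgd_database.db")))
```

As in the connection cell, check that the path exists first, since `sqlite3.connect()` creates a new empty file for a missing path.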