# Schema Manager Test Notebook

This notebook tests the functionality of the SchemaManager class, which is responsible for loading and managing database schemas for the Text-to-SQL system.

## 1. Setup and Imports

In [1]:
import os
import sys
import json
import sqlite3
import time

# Add parent directory to path to import core utils
sys.path.append('..')
from core.utils import load_json_file

# Import the SchemaManager class
from schema_manager import SchemaManager

# Define paths
DATA_PATH = "../data/bird"  # Path to the BIRD dataset
DEV_DB_DIRECTORY = os.path.join(DATA_PATH, "dev_databases")  # Database files are here
TABLES_JSON_PATH = os.path.join(DATA_PATH, "dev_tables.json")  # Path to tables.json

# Verify that the required files and directories exist
print("Checking file paths and database structure...")
if os.path.exists(DATA_PATH):
    print(f"✅ Found data path: {DATA_PATH}")
else:
    print(f"❌ Data path not found: {DATA_PATH}")
    
if os.path.exists(TABLES_JSON_PATH):
    print(f"✅ Found tables.json at {TABLES_JSON_PATH}")
else:
    print(f"❌ tables.json not found at {TABLES_JSON_PATH}")
    
# Check the number of database entries in tables.json
tables_data = load_json_file(TABLES_JSON_PATH)
print(f"Number of database entries in tables.json: {len(tables_data)}")

# Display some sample database IDs
sample_db_ids = [entry["db_id"] for entry in tables_data[:5]]
print(f"Sample database IDs: {sample_db_ids}")

Checking file paths and database structure...
✅ Found data path: ../data/bird
✅ Found tables.json at ../data/bird/dev_tables.json
load json file from ../data/bird/dev_tables.json
Number of database entries in tables.json: 11
Sample database IDs: ['debit_card_specializing', 'financial', 'formula_1', 'california_schools', 'card_games']


## 2. Initialize the SchemaManager

In [2]:
try:
    # Initialize the Schema Manager
    schema_manager = SchemaManager(DEV_DB_DIRECTORY, TABLES_JSON_PATH)
    print("✅ Schema Manager initialized successfully")
    
    # Print the number of database schemas loaded
    print(f"Number of database schemas loaded: {len(schema_manager.db2dbjsons)}")
    
    # Get the list of database IDs
    db_ids = list(schema_manager.db2dbjsons.keys())
    print(f"Sample database IDs: {db_ids[:5] if len(db_ids) >= 5 else db_ids}")
    
    # Verify that the cache is initially empty
    print(f"Initial cache size (db2infos): {len(schema_manager.db2infos)}")
except Exception as e:
    print(f"❌ Error initializing Schema Manager: {e}")

load json file from ../data/bird/dev_tables.json
✅ Schema Manager initialized successfully
Number of database schemas loaded: 11
Sample database IDs: ['debit_card_specializing', 'financial', 'formula_1', 'california_schools', 'card_games']
Initial cache size (db2infos): 0
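
The output above shows 11 schemas loaded at construction time while `db2infos` starts empty, which suggests that detailed schema information is built on first access. The following is a minimal sketch of that load-on-first-access pattern. `LazyCache` and `fake_loader` are hypothetical names used for illustration only; this is not SchemaManager's actual implementation.

```python
from typing import Callable, Dict


class LazyCache:
    """Hypothetical illustration of load-on-first-access caching."""

    def __init__(self, loader: Callable[[str], dict]) -> None:
        self._loader = loader
        self.infos: Dict[str, dict] = {}  # plays the role of db2infos
        self.load_count = 0               # how many times the loader actually ran

    def get(self, db_id: str) -> dict:
        # Only call the (expensive) loader when the entry is missing
        if db_id not in self.infos:
            self.infos[db_id] = self._loader(db_id)
            self.load_count += 1
        return self.infos[db_id]


def fake_loader(db_id: str) -> dict:
    """Stand-in for reading a database file; returns placeholder data."""
    return {"db_id": db_id}


cache = LazyCache(fake_loader)
size_before = len(cache.infos)      # nothing cached yet
first = cache.get("financial")      # triggers the loader
second = cache.get("financial")     # served from the cache
print(size_before, cache.load_count, first is second)
```

The second lookup returns the same object without calling the loader again, which is the behavior the cache test in Section 7 measures.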


## 3. Test Database Complexity Analysis

In [3]:
# Test the is_need_prune method on multiple databases
if len(db_ids) > 0:
    print("Testing database complexity analysis...")
    
    # Test on first 5 databases or all if less than 5
    test_dbs = db_ids[:5] if len(db_ids) >= 5 else db_ids
    
    for db_id in test_dbs:
        try:
            db_dict = schema_manager.db2dbjsons[db_id]
            avg_column_count = db_dict.get('avg_column_count', 0)
            total_column_count = db_dict.get('total_column_count', 0)
            table_count = db_dict.get('table_count', 0)
            
            need_prune = schema_manager.is_need_prune(db_id)
            
            print(f"\nDatabase: {db_id}")
            print(f"  Tables: {table_count}")
            print(f"  Total columns: {total_column_count}")
            print(f"  Average columns per table: {avg_column_count}")
            print(f"  Complexity: {'Complex, pruning needed' if need_prune else 'Simple, no pruning needed'}")
            
            # Verify that the logic matches our understanding
            expected_prune = not (avg_column_count <= 6 and total_column_count <= 30)
            if need_prune != expected_prune:
                print(f"  ⚠️ Mismatch in pruning logic: got {need_prune}, expected {expected_prune}")
        except Exception as e:
            print(f"❌ Error analyzing {db_id}: {e}")
else:
    print("No database IDs available for testing")

Testing database complexity analysis...

Database: debit_card_specializing
  Tables: 5
  Total columns: 21
  Average columns per table: 4
  Complexity: Simple, no pruning needed

Database: financial
  Tables: 8
  Total columns: 55
  Average columns per table: 6
  Complexity: Complex, pruning needed

Database: formula_1
  Tables: 13
  Total columns: 94
  Average columns per table: 7
  Complexity: Complex, pruning needed

Database: california_schools
  Tables: 3
  Total columns: 89
  Average columns per table: 29
  Complexity: Complex, pruning needed

Database: card_games
  Tables: 6
  Total columns: 115
  Average columns per table: 19
  Complexity: Complex, pruning needed
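
The check in this cell assumes that a database counts as simple only when it averages at most 6 columns per table and has at most 30 columns in total. Here is a standalone sketch of that rule applied to the values printed above. `expected_need_prune` is a hypothetical helper that mirrors the notebook's expectation; it is not SchemaManager's own code.

```python
def expected_need_prune(avg_column_count: int, total_column_count: int) -> bool:
    """Hypothetical mirror of the rule checked in this cell."""
    is_simple = avg_column_count <= 6 and total_column_count <= 30
    return not is_simple


# (average columns per table, total columns), copied from the output above
observed = {
    "debit_card_specializing": (4, 21),
    "financial": (6, 55),
    "formula_1": (7, 94),
    "california_schools": (29, 89),
    "card_games": (19, 115),
}

decisions = {
    db_id: expected_need_prune(avg, total) for db_id, (avg, total) in observed.items()
}
print(decisions)
```

Note that `financial` meets the average threshold (6) but exceeds the total threshold (55 > 30), so both conditions matter.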


## 4. Test Schema Retrieval (Basic)

In [4]:
# Select a simple database to test basic schema retrieval
if len(db_ids) > 0:
    # Find a simple database that doesn't need pruning
    simple_db = None
    for db_id in db_ids:
        if not schema_manager.is_need_prune(db_id):
            simple_db = db_id
            break
    
    if simple_db:
        print(f"Testing basic schema retrieval with database ID: {simple_db}")

        # Get the database schema without pruning
        schema_info = schema_manager.get_db_schema(simple_db)
        
        # Verify the schema information is complete
        print("\nSchema Output Keys:")
        for key in schema_info:
            print(f"  - {key}: {type(schema_info[key])}")
        
        # Print some statistics about the schema
        chosen_columns = schema_info["chosen_columns"]
        print(f"\nTables in schema: {len(chosen_columns)}")
        
        # Print table names and column counts
        print("\nTable Details:")
        for table, columns in chosen_columns.items():
            print(f"  - {table}: {len(columns)} columns")
            # Print first few columns
            if len(columns) > 0:
                print(f"    Sample columns: {columns[:3]}...")
        
        # Check for foreign keys
        fk_str = schema_info["fk_str"]
        if fk_str:
            fk_lines = fk_str.split("\n")
            print(f"\nForeign Key Count: {len(fk_lines)}")
            print("First foreign key:" if len(fk_lines) > 1 else "Foreign key:")
            print(f"  {fk_lines[0]}")
        else:
            print("\nNo foreign keys found")
            
        # Print a small portion of the schema string
        schema_str = schema_info["schema_str"]
        preview_length = min(300, len(schema_str))
        print(f"\nSchema String Preview ({preview_length} chars of {len(schema_str)} total):")
        print(schema_str[:preview_length] + "...")
        
        # Verify that cache now contains this database
        print(f"\nCache status - db2infos contains {simple_db}: {simple_db in schema_manager.db2infos}")
    else:
        print("No simple databases found for testing")
else:
    print("No database IDs available for testing")

Testing basic schema retrieval with database ID: debit_card_specializing

Schema Output Keys:
  - schema_str: <class 'str'>
  - fk_str: <class 'str'>
  - chosen_columns: <class 'dict'>

Tables in schema: 5

Table Details:
  - customers: 3 columns
    Sample columns: ['CustomerID', 'Segment', 'Currency']...
  - gasstations: 4 columns
    Sample columns: ['GasStationID', 'ChainID', 'Country']...
  - products: 2 columns
    Sample columns: ['ProductID', 'Description']...
  - transactions_1k: 9 columns
    Sample columns: ['TransactionID', 'Date', 'Time']...
  - yearmonth: 3 columns
    Sample columns: ['CustomerID', 'Date', 'Consumption']...

Foreign Key Count: 1
Foreign key:
  yearmonth.`CustomerID` = customers.`CustomerID`

Schema String Preview (300 chars of 1104 total):
# Table: customers
[
  (CustomerID, CustomerID.),
  (Segment, client segment. Value examples: ['SME', 'LAM', 'KAM'].),
  (Currency, Currency. Value examples: ['CZK', 'EUR'].)
]
# Table: gasstations
[
  (GasStationID, G

## 5. Test Schema Pruning

In [5]:
# Test schema pruning with a complex database
try:
    # Find a complex database that needs pruning
    complex_db = None
    for db_id in db_ids:
        if schema_manager.is_need_prune(db_id):
            complex_db = db_id
            break
    
    if complex_db:
        print(f"Testing schema pruning with database ID: {complex_db}")
        
        # Get full schema first
        full_schema = schema_manager.get_db_schema(complex_db)
        
        # Get all tables from the full schema
        all_tables = list(full_schema["chosen_columns"].keys())
        
        # Count total columns in full schema
        full_column_count = sum(len(cols) for cols in full_schema["chosen_columns"].values())
        
        print(f"\nFull schema has {len(all_tables)} tables with {full_column_count} total columns")
        print(f"Tables: {', '.join(all_tables)}")
        
        # Create a simple pruning configuration - keep only the first 2 tables
        tables_to_keep = all_tables[:min(2, len(all_tables))]
        
        # Create extracted schema dict with table decisions
        extracted_schema = {}
        for table in all_tables:
            if table in tables_to_keep:
                extracted_schema[table] = "keep_all"  # Keep all columns for these tables
            else:
                extracted_schema[table] = "drop_all"  # Drop these tables or limit to 6 columns
        
        print(f"\nPruning configuration:")
        print(f"- Tables to keep all columns: {tables_to_keep}")
        print(f"- Tables to simplify/drop: {[t for t in all_tables if t not in tables_to_keep]}")
        
        # Get pruned schema
        pruned_schema = schema_manager.get_db_schema(complex_db, extracted_schema)
        
        # Count total columns in pruned schema
        pruned_column_count = sum(len(cols) for cols in pruned_schema["chosen_columns"].values())
        
        # Calculate reduction
        tables_reduction = len(all_tables) - len(pruned_schema["chosen_columns"])
        columns_reduction = full_column_count - pruned_column_count
        
        print(f"\nPruning Results:")
        print(f"- Original: {len(all_tables)} tables, {full_column_count} columns")
        print(f"- Pruned: {len(pruned_schema['chosen_columns'])} tables, {pruned_column_count} columns")
        print(f"- Reduction: {tables_reduction} tables ({tables_reduction/len(all_tables)*100:.1f}%), {columns_reduction} columns ({columns_reduction/full_column_count*100:.1f}%)")
        
        # Print remaining tables
        print("\nTables in pruned schema:")
        for table in sorted(pruned_schema["chosen_columns"].keys()):
            cols = pruned_schema["chosen_columns"][table]
            print(f"  - {table}: {len(cols)} columns")
            if len(cols) > 0:
                print(f"    Sample columns: {cols[:3]}...")
        
        # Verify schema string was also pruned
        full_schema_len = len(full_schema["schema_str"])
        pruned_schema_len = len(pruned_schema["schema_str"])
        print(f"\nSchema string length reduced from {full_schema_len} to {pruned_schema_len} characters")
        print(f"Reduction: {full_schema_len - pruned_schema_len} chars ({(full_schema_len - pruned_schema_len)/full_schema_len*100:.1f}%)")
    else:
        print("No complex databases found for pruning test")
except Exception as e:
    print(f"❌ Error testing schema pruning: {e}")

Testing schema pruning with database ID: financial

Full schema has 8 tables with 55 total columns
Tables: account, card, client, disp, district, loan, order, trans

Pruning configuration:
- Tables to keep all columns: ['account', 'card']
- Tables to simplify/drop: ['client', 'disp', 'district', 'loan', 'order', 'trans']

Pruning Results:
- Original: 8 tables, 55 columns
- Pruned: 8 tables, 40 columns
- Reduction: 0 tables (0.0%), 15 columns (27.3%)

Tables in pruned schema:
  - account: 4 columns
    Sample columns: ['account_id', 'district_id', 'frequency']...
  - card: 4 columns
    Sample columns: ['card_id', 'disp_id', 'type']...
  - client: 4 columns
    Sample columns: ['client_id', 'gender', 'birth_date']...
  - disp: 4 columns
    Sample columns: ['disp_id', 'client_id', 'account_id']...
  - district: 6 columns
    Sample columns: ['district_id', 'A2', 'A3']...
  - loan: 6 columns
    Sample columns: ['loan_id', 'account_id', 'date']...
  - order: 6 columns
    Sample columns:

## 6. Test Column Selection Pruning

In [6]:
# Test specific column selection for a table
try:
    # Find a database with at least one table that has many columns
    target_db = None
    target_table = None
    
    for db_id in db_ids:
        # Get the schema to check column counts
        schema_info = schema_manager.get_db_schema(db_id)
        for table, columns in schema_info["chosen_columns"].items():
            if len(columns) > 6:  # Table with more than 6 columns
                target_db = db_id
                target_table = table
                break
        if target_db:
            break
    
    if target_db and target_table:
        print(f"Testing column selection pruning with database: {target_db}, table: {target_table}")
        
        # Get full schema first
        full_schema = schema_manager.get_db_schema(target_db)
        
        # Get all columns for the target table
        all_columns = full_schema["chosen_columns"][target_table]
        
        print(f"\nTable '{target_table}' has {len(all_columns)} columns:")
        print(f"All columns: {all_columns}")
        
        # Request only the first 3 columns; the manager may add others it treats as important keys
        columns_to_keep = all_columns[:3]
        
        print(f"Selecting these columns: {columns_to_keep}")
        
        # Create extracted schema dict with specific column selection for this table
        extracted_schema = {target_table: columns_to_keep}
        
        # Get pruned schema with column selection
        pruned_schema = schema_manager.get_db_schema(target_db, extracted_schema)
        
        # Check the result for the target table
        pruned_columns = pruned_schema["chosen_columns"].get(target_table, [])
        
        print(f"\nResults:")
        print(f"- Original columns count: {len(all_columns)}")
        print(f"- Pruned columns count: {len(pruned_columns)}")
        print(f"- Selected columns: {pruned_columns}")
        
        # Check if our selected columns are included (some may be added if they are primary/foreign keys)
        for col in columns_to_keep:
            if col in pruned_columns:
                print(f"✅ Selected column '{col}' was preserved")
            else:
                print(f"❌ Selected column '{col}' was not preserved")
        
        # Find columns that were added (likely important keys)
        added_columns = [col for col in pruned_columns if col not in columns_to_keep]
        if added_columns:
            print(f"\nAdditional columns (likely important keys): {added_columns}")
        
        # Find a snippet in the schema string that mentions the pruned table
        schema_str = pruned_schema["schema_str"]
        table_section_start = schema_str.find(f"# Table: {target_table}")
        
        if table_section_start >= 0:
            table_section_end = schema_str.find("# Table:", table_section_start + 1)
            if table_section_end < 0:
                table_section_end = len(schema_str)
            
            table_section = schema_str[table_section_start:table_section_end]
            print(f"\nSchema section for {target_table}:")
            print(table_section)
        else:
            print(f"\nTable section for {target_table} not found in schema string")
            
    else:
        print("No suitable database/table found for column selection test")
except Exception as e:
    print(f"❌ Error testing column selection: {e}")

Testing column selection pruning with database: debit_card_specializing, table: transactions_1k

Table 'transactions_1k' has 9 columns:
All columns: ['TransactionID', 'Date', 'Time', 'CustomerID', 'CardID', 'GasStationID', 'ProductID', 'Amount', 'Price']
Selecting these columns: ['TransactionID', 'Date', 'Time']

Results:
- Original columns count: 9
- Pruned columns count: 6
- Selected columns: ['TransactionID', 'Date', 'Time', 'CustomerID', 'CardID', 'GasStationID']
✅ Selected column 'TransactionID' was preserved
✅ Selected column 'Date' was preserved
✅ Selected column 'Time' was preserved

Additional columns (likely important keys): ['CustomerID', 'CardID', 'GasStationID']

Schema section for transactions_1k:
# Table: transactions_1k
[
  (TransactionID, Transaction ID.),
  (Date, Date. Value examples: ['2012-08-24'].),
  (Time, Time. Value examples: ['08:57:00', '16:20:00', '16:04:00', '15:23:00', '11:55:00', '09:50:00'].),
  (CustomerID, Customer ID.),
  (CardID, Card ID.),
  (GasSt
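
The cell above locates a table's section with `str.find`, which would also match a table whose name merely starts with the target name. A possible alternative is sketched below, assuming every section begins with a line of the form `# Table: <name>` as in the previews shown in this notebook. `split_schema_sections` is a hypothetical helper, and the `customers_archive` table in the sample is invented to demonstrate the prefix problem.

```python
import re
from typing import Dict


def split_schema_sections(schema_str: str) -> Dict[str, str]:
    """Map each table name to its section of the schema string."""
    # A section starts at a full line "# Table: <name>" and runs to the next such line
    headers = list(re.finditer(r"^# Table: (.+)$", schema_str, flags=re.MULTILINE))
    sections: Dict[str, str] = {}
    for i, header in enumerate(headers):
        end = headers[i + 1].start() if i + 1 < len(headers) else len(schema_str)
        sections[header.group(1).strip()] = schema_str[header.start():end].rstrip()
    return sections


# The customers section is copied from the earlier output; customers_archive is made up
sample_schema_str = """# Table: customers_archive
[
  (CustomerID, CustomerID.)
]
# Table: customers
[
  (CustomerID, CustomerID.),
  (Segment, client segment. Value examples: ['SME', 'LAM', 'KAM'].),
  (Currency, Currency. Value examples: ['CZK', 'EUR'].)
]"""

sections = split_schema_sections(sample_schema_str)
naive_start = sample_schema_str.find("# Table: customers")  # hits customers_archive first
print(naive_start)
print(sections["customers"])
```

Matching the whole header line keeps `customers` and `customers_archive` apart, whereas the plain substring search lands on whichever comes first.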

## 7. Test Cache Performance

In [7]:
# Test the performance of schema loading and caching
import time

try:
    # Measure time to get schemas for multiple databases
    test_db_ids = db_ids[:5] if len(db_ids) >= 5 else db_ids
    
    print(f"Testing schema loading performance for {len(test_db_ids)} databases")
    
    # First run - should populate cache
    start_time = time.time()
    for db_id in test_db_ids:
        _ = schema_manager.get_db_schema(db_id)
    first_run_time = time.time() - start_time
    print(f"First run time (should populate cache): {first_run_time:.4f} seconds")
    
    # Second run - should use cache
    start_time = time.time()
    for db_id in test_db_ids:
        _ = schema_manager.get_db_schema(db_id)
    second_run_time = time.time() - start_time
    print(f"Second run time (should use cache): {second_run_time:.4f} seconds")
    
    # Calculate speedup
    if second_run_time > 0:
        speedup = first_run_time / second_run_time
        print(f"Cache speedup: {speedup:.2f}x")
    
    # Verify cache contents
    cached_dbs = list(schema_manager.db2infos.keys())
    print(f"\nCache now contains {len(cached_dbs)} databases:")
    print(f"Cached DB IDs: {cached_dbs}")
    
    # Test with schema pruning (shouldn't affect caching behavior)
    print("\nTesting pruned schema caching:")
    
    if len(test_db_ids) > 0:
        test_db = test_db_ids[0]
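        # dict.keys() returns a view that cannot be sliced (doing so raises the TypeError
        # recorded in the output below), so convert the keys to a list first
        table_names = list(schema_manager.get_db_schema(test_db)["chosen_columns"])[:2]
        extracted_schema = {table: "keep_all" for table in table_names}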
        
        # Time with pruning (should still use cache)
        start_time = time.time()
        _ = schema_manager.get_db_schema(test_db, extracted_schema)
        pruned_time = time.time() - start_time
        
        print(f"Time to get pruned schema (should still use cache): {pruned_time:.4f} seconds")
except Exception as e:
    print(f"❌ Error testing cache performance: {e}")

Testing schema loading performance for 5 databases
First run time (should populate cache): 2.8477 seconds
Second run time (should use cache): 0.0002 seconds
Cache speedup: 14253.02x

Cache now contains 5 databases:
Cached DB IDs: ['debit_card_specializing', 'financial', 'formula_1', 'california_schools', 'card_games']

Testing pruned schema caching:
❌ Error testing cache performance: 'dict_keys' object is not subscriptable
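
The error above originates in the test cell rather than in SchemaManager: `dict.keys()` returns a view object that does not support slicing. The sketch below shows two ways to take the first two keys of a dictionary. `chosen_columns` here is a stand-in built from table names seen earlier in this notebook, with abbreviated column lists; it is not output from SchemaManager.

```python
from itertools import islice

# Stand-in for schema_info["chosen_columns"] (column lists abbreviated)
chosen_columns = {
    "customers": ["CustomerID", "Segment", "Currency"],
    "gasstations": ["GasStationID", "ChainID", "Country"],
    "products": ["ProductID", "Description"],
}

error_message = None
try:
    chosen_columns.keys()[:2]  # views are not subscriptable
except TypeError as exc:
    error_message = str(exc)

# Option 1: copy the keys into a list, then slice
first_two = list(chosen_columns)[:2]

# Option 2: stop iterating after two keys instead of copying all of them
first_two_lazy = list(islice(chosen_columns, 2))

extracted_schema = {table: "keep_all" for table in first_two}
print(error_message)
print(extracted_schema)
```

Both options rely on dictionaries preserving insertion order, which Python guarantees from version 3.7 onward.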


## 8. Test Error Handling

In [8]:
# Test error handling for various edge cases
try:
    print("Testing error handling for various edge cases...")
    
    # Test 1: Non-existent database ID
    print("\nTest 1: Non-existent database ID")
    try:
        non_existent_db = "non_existent_database_id"
        schema_info = schema_manager.get_db_schema(non_existent_db)
        print(f"❌ Expected error but got result: {schema_info.keys()}")
    except KeyError as e:
        print(f"✅ Correctly raised KeyError: {e}")
    except Exception as e:
        print(f"❓ Raised unexpected exception: {type(e).__name__}: {e}")
    
    # Test 2: Invalid extracted schema format
    print("\nTest 2: Invalid extracted schema format")
    if len(db_ids) > 0:
        try:
            invalid_schema = ["this", "is", "not", "a", "dict"]
            schema_info = schema_manager.get_db_schema(db_ids[0], invalid_schema)
            print(f"❌ Expected error but got result: {schema_info.keys()}")
        except (TypeError, ValueError, AttributeError) as e:
            print(f"✅ Correctly raised exception: {type(e).__name__}: {e}")
        except Exception as e:
            print(f"❓ Raised unexpected exception: {type(e).__name__}: {e}")
    
    # Test 3: Invalid use_gold_schema parameter
    print("\nTest 3: Invalid use_gold_schema parameter")
    if len(db_ids) > 0:
        try:
            schema_info = schema_manager.get_db_schema(db_ids[0], use_gold_schema="not_a_boolean")
            print(f"❌ Expected error but got result: {schema_info.keys()}")
        except (TypeError, ValueError) as e:
            print(f"✅ Correctly raised exception: {type(e).__name__}: {e}")
        except Exception as e:
            print(f"❓ Raised unexpected exception: {type(e).__name__}: {e}")
    
    # Test 4: Invalid paths in constructor
    print("\nTest 4: Invalid paths in constructor")
    try:
        invalid_schema_manager = SchemaManager("/non/existent/path", "/non/existent/tables.json")
        print(f"❌ Expected error but got instance: {invalid_schema_manager}")
    except FileNotFoundError as e:
        print(f"✅ Correctly raised FileNotFoundError: {e}")
    except Exception as e:
        print(f"❓ Raised unexpected exception: {type(e).__name__}: {e}")
    
except Exception as e:
    print(f"❌ Error in error handling tests: {e}")

Testing error handling for various edge cases...

Test 1: Non-existent database ID
✅ Correctly raised KeyError: 'non_existent_database_id'

Test 2: Invalid extracted schema format
✅ Correctly raised exception: AttributeError: 'list' object has no attribute 'get'

Test 3: Invalid use_gold_schema parameter
❌ Expected error but got result: dict_keys(['schema_str', 'fk_str', 'chosen_columns'])

Test 4: Invalid paths in constructor
✅ Correctly raised FileNotFoundError: tables.json not found in /non/existent/tables.json
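
Test 3 shows that a non-boolean `use_gold_schema` is accepted silently, and Test 2 fails only incidentally through an `AttributeError`. If stricter behavior is wanted, one option is to validate arguments before calling `get_db_schema`. The sketch below is one way to do that. `validate_schema_args` and `outcome` are hypothetical helpers, not part of SchemaManager, and the accepted shapes for `extracted_schema` (a dict mapping table names to `"keep_all"`, `"drop_all"`, or a list of column names) are inferred from the calls made in this notebook.

```python
from typing import Any, Optional

ALLOWED_DECISIONS = ("keep_all", "drop_all")  # assumed from this notebook's usage


def validate_schema_args(extracted_schema: Any = None, use_gold_schema: Any = False) -> None:
    """Hypothetical guard: raise before get_db_schema is called with bad arguments."""
    if not isinstance(use_gold_schema, bool):
        raise TypeError(
            f"use_gold_schema must be bool, got {type(use_gold_schema).__name__}"
        )
    if extracted_schema is None:
        return
    if not isinstance(extracted_schema, dict):
        raise TypeError(
            f"extracted_schema must be a dict, got {type(extracted_schema).__name__}"
        )
    for table, decision in extracted_schema.items():
        if isinstance(decision, str):
            if decision not in ALLOWED_DECISIONS:
                raise ValueError(f"unknown decision {decision!r} for table {table!r}")
        elif not (isinstance(decision, list) and all(isinstance(c, str) for c in decision)):
            raise TypeError(
                f"decision for table {table!r} must be a string or a list of column names"
            )


def outcome(*args: Any, **kwargs: Any) -> Optional[str]:
    """Return the name of the exception raised by the guard, or None if it passes."""
    try:
        validate_schema_args(*args, **kwargs)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


results = {
    "valid dict": outcome({"customers": "keep_all", "yearmonth": ["CustomerID"]}),
    "list instead of dict": outcome(["this", "is", "not", "a", "dict"]),
    "non-boolean flag": outcome(None, use_gold_schema="not_a_boolean"),
    "unknown decision": outcome({"customers": "keep_some"}),
}
print(results)
```

Checking types explicitly turns the incidental `AttributeError` from Test 2 into a deliberate `TypeError` with a clearer message.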


## 9. Test Foreign Key Handling

In [9]:
# Test foreign key handling in schema manager
try:
    # Find a database with foreign keys
    db_with_fk = None
    for db_id in db_ids:
        schema_info = schema_manager.get_db_schema(db_id)
        if schema_info["fk_str"] and len(schema_info["fk_str"].strip()) > 0:
            db_with_fk = db_id
            fk_string = schema_info["fk_str"]
            break
    
    if db_with_fk:
        print(f"Testing foreign key handling with database: {db_with_fk}")
        
        # Get schema again to confirm consistency
        schema_info = schema_manager.get_db_schema(db_with_fk)
        
        # Parse foreign key information
        fk_lines = schema_info["fk_str"].split('\n')
        
        print(f"\nFound {len(fk_lines)} foreign key relationship(s):")
        for i, line in enumerate(fk_lines, 1):
            print(f"  {i}. {line}")
        
        # Test pruning effect on foreign keys
        # Get all tables referenced in foreign keys
        fk_tables = set()
        for line in fk_lines:
            # Example format: "table1.`column1` = table2.`column2`"
            parts = line.split(' = ')
            if len(parts) == 2:
                src = parts[0].split('.')[0] if '.' in parts[0] else ''
                dst = parts[1].split('.')[0] if '.' in parts[1] else ''
                if src: 
                    fk_tables.add(src)
                if dst:
                    fk_tables.add(dst)
        
        print(f"\nTables involved in foreign key relationships: {fk_tables}")
        
        # Test 1: Pruning that keeps all FK tables
        print("\nTest 1: Pruning with all FK tables kept")
        extracted_schema = {table: "keep_all" for table in fk_tables}
        pruned_schema = schema_manager.get_db_schema(db_with_fk, extracted_schema)
        
        pruned_fk_lines = pruned_schema["fk_str"].split('\n') if pruned_schema["fk_str"] else []
        print(f"Original FK count: {len(fk_lines)}")
        print(f"Pruned FK count: {len(pruned_fk_lines)}")
        
        if len(pruned_fk_lines) == len(fk_lines):
            print("✅ All foreign key relationships preserved")
        else:
            print("❌ Some foreign key relationships lost")
            for line in fk_lines:
                if line not in pruned_fk_lines:
                    print(f"  Missing: {line}")
        
        # Test 2: Pruning that removes some FK tables
        if len(fk_tables) >= 2:
            print("\nTest 2: Pruning with some FK tables removed")
            # Keep only one FK table; sort first because set iteration order is arbitrary
            first_fk_table = sorted(fk_tables)[0]
            extracted_schema = {first_fk_table: "keep_all"}
            
            pruned_schema = schema_manager.get_db_schema(db_with_fk, extracted_schema)
            pruned_fk_lines = pruned_schema["fk_str"].split('\n') if pruned_schema["fk_str"] else []
            
            print(f"Keeping only table: {first_fk_table}")
            print(f"Pruned FK count: {len(pruned_fk_lines)}")
            
            # Check which FKs were preserved
            if pruned_fk_lines:
                print("Remaining foreign keys:")
                for line in pruned_fk_lines:
                    print(f"  - {line}")
            else:
                print("All foreign keys were removed, as expected")
    else:
        print("No databases with foreign keys found for testing")
except Exception as e:
    print(f"❌ Error testing foreign key handling: {e}")

Testing foreign key handling with database: debit_card_specializing

Found 1 foreign key relationship(s):
  1. yearmonth.`CustomerID` = customers.`CustomerID`

Tables involved in foreign key relationships: {'customers', 'yearmonth'}

Test 1: Pruning with all FK tables kept
Original FK count: 1
Pruned FK count: 1
✅ All foreign key relationships preserved

Test 2: Pruning with some FK tables removed
Keeping only table: customers
Pruned FK count: 1
Remaining foreign keys:
  - yearmonth.`CustomerID` = customers.`CustomerID`
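
In Test 2 the foreign key is still reported even though only `customers` was listed in the extracted schema. If a caller needs the foreign key string restricted to the tables it kept, it could filter the lines itself. The sketch below assumes the line format shown in the output above (``table.`column` = table.`column` ``). `parse_fk_line` and `fk_lines_within` are hypothetical helpers, not SchemaManager methods.

```python
from typing import List, Set, Tuple


def parse_fk_line(line: str) -> Tuple[str, str, str, str]:
    """Split 'a.`x` = b.`y`' into (a, x, b, y).

    Raises ValueError if the line does not follow the assumed format.
    """
    left, right = (side.strip() for side in line.split(" = ", 1))
    src_table, src_col = left.split(".", 1)
    dst_table, dst_col = right.split(".", 1)
    return src_table, src_col.strip("`"), dst_table, dst_col.strip("`")


def fk_lines_within(fk_str: str, kept_tables: Set[str]) -> List[str]:
    """Keep only foreign key lines whose two tables are both in kept_tables."""
    kept = []
    for line in fk_str.splitlines():
        if not line.strip():
            continue  # skip blank lines
        src_table, _, dst_table, _ = parse_fk_line(line)
        if src_table in kept_tables and dst_table in kept_tables:
            kept.append(line)
    return kept


fk_str = "yearmonth.`CustomerID` = customers.`CustomerID`"  # copied from the output above
parsed = parse_fk_line(fk_str)
both_kept = fk_lines_within(fk_str, {"customers", "yearmonth"})
one_kept = fk_lines_within(fk_str, {"customers"})
print(parsed)
print(both_kept)
print(one_kept)
```

With both tables kept the relationship survives; with only `customers` kept the filtered list is empty, which is the outcome the last branch of Test 2 anticipates.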


## 10. Summary and Conclusion

### Key Findings

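1. **Initialization**: The SchemaManager loaded all 11 database entries from `dev_tables.json`, and the `db2infos` cache stayed empty until the first schema was requested.
2. **Schema Access**: Schemas can be retrieved by database ID, and `is_need_prune` agreed with the expected rule (prune unless a database averages at most 6 columns per table and has at most 30 columns in total) for all five databases tested.
3. **Schema Pruning**: Pruning reduced columns but did not remove tables in these tests. For `financial`, all 8 tables remained and the column count dropped from 55 to 40 (27.3%). For `transactions_1k`, the 3 requested columns were preserved, but the result held 6 columns, so the manager added 3 columns on its own.
4. **Foreign Key Handling**: Foreign keys are extracted and formatted one relationship per line. The single foreign key in `debit_card_specializing` was retained both when both tables were kept and when only `customers` was listed, so these tests did not show pruning removing a foreign key.
5. **Performance**: For five databases, the first pass took 2.8477 seconds and the cached second pass 0.0002 seconds. The pruned-schema timing did not run because the test cell raised `'dict_keys' object is not subscriptable`.
6. **Error Handling**: Three of the four cases raised the expected exception (`KeyError` for an unknown database ID, `AttributeError` for a list passed as the extracted schema, `FileNotFoundError` for invalid paths). A non-boolean `use_gold_schema` was accepted without error.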

### Next Steps

- Ensure the Schema Manager integrates properly with the Text-to-SQL pipeline.
- Consider optimizations for very large schemas.
- Validate the `use_gold_schema` argument, which accepted a non-boolean value in Test 3 of Section 8, and add error handling for other edge cases.
- Consider adding schema validation to ensure consistency.
- Explore methods to improve column value example generation for better schema understanding.