# LangChain SQLDatabase Toolkit Quick Reference

## **Introduction**  
LangChain's community package includes four tools for working with SQL databases: **InfoSQLDatabaseTool**, **ListSQLDatabaseTool**, **QuerySQLCheckerTool**, and **QuerySQLDatabaseTool**. Each one covers a single step of a typical SQL workflow.  

- **InfoSQLDatabaseTool** helps users retrieve schema and sample rows for specific tables, making it easier to understand the structure and content of the database.  
- **ListSQLDatabaseTool** provides a quick way to list all tables in the database, enabling users to explore available data sources.  
- **QuerySQLCheckerTool** uses an LLM to review a SQL query for common mistakes and rewrite it when needed, which reduces (but does not eliminate) the risk of errors during execution.  
- **QuerySQLDatabaseTool** executes SQL queries and returns results, making it a versatile tool for data retrieval and analysis.  

Used together, these tools cover data exploration, query validation, and query execution. This article walks through what each tool does and shows practical examples against the `chinook.db` sample database.

### **Comparison of SQL Tools**

| **Feature**                | **InfoSQLDatabaseTool**                                                                 | **ListSQLDatabaseTool**                                                      | **QuerySQLCheckerTool**                                                                 | **QuerySQLDatabaseTool**                                                      |
|----------------------------|----------------------------------------------------------------------------------------|-----------------------------------------------------------------------------|----------------------------------------------------------------------------------------|-------------------------------------------------------------------------------|
| **Purpose**                | Retrieves schema and sample rows for specified tables.                                 | Lists all tables in the database.                                           | Validates SQL queries using an LLM to check for correctness.                           | Executes SQL queries against the database and returns results.                |
| **Input**                  | Comma-separated list of table names.                                                  | Typically an empty string (no input required).                              | SQL query to validate.                                                                 | SQL query to execute.                                                         |
| **Output**                 | Schema and sample rows for the specified tables.                                       | Comma-separated list of table names in the database.                        | Validated SQL query or error message if the query is incorrect.                        | Query results or error message if the query fails.                            |
| **Key Use Case**           | Understanding the structure and sample data of specific tables.                        | Discovering all available tables in the database.                           | Ensuring SQL queries are correct before execution.                                      | Running SQL queries and retrieving results.                                   |
| **Dependencies**           | Requires a `SQLDatabase` object.                                                      | Requires a `SQLDatabase` object.                                            | Requires a `SQLDatabase` object and an LLM (e.g., GPT) for validation.                 | Requires a `SQLDatabase` object.                                              |
| **Common Methods**         | `run`, `arun`, `invoke`, `ainvoke`, `batch`, `abatch`, `stream`, `astream`.            | `run`, `arun`, `invoke`, `ainvoke`, `batch`, `abatch`, `stream`, `astream`. | `run`, `arun`, `invoke`, `ainvoke`, `batch`, `abatch`, `stream`, `astream`.            | `run`, `arun`, `invoke`, `ainvoke`, `batch`, `abatch`, `stream`, `astream`.   |
| **Error Handling**         | Returns schema and sample rows; errors if tables do not exist.                         | Returns table names; errors if the database is inaccessible.                | Returns a corrected query or error message if the query is invalid.                    | Returns query results or error message if the query fails.                    |
| **Asynchronous Support**   | Yes (`arun`, `ainvoke`, `abatch`, `astream`).                                          | Yes (`arun`, `ainvoke`, `abatch`, `astream`).                               | Yes (`arun`, `ainvoke`, `abatch`, `astream`).                                          | Yes (`arun`, `ainvoke`, `abatch`, `astream`).                                 |
| **Configuration Options**  | Supports `with_config`, `with_retry`, `with_listeners`, `with_fallbacks`.              | Supports `with_config`, `with_retry`, `with_listeners`, `with_fallbacks`.   | Supports `with_config`, `with_retry`, `with_listeners`, `with_fallbacks`.              | Supports `with_config`, `with_retry`, `with_listeners`, `with_fallbacks`.     |
| **Example Input**          | `"Customer, Invoice"`                                                                  | `""` (empty string)                                                         | `"SELECT * FROM Customer WHERE Country = 'USA'"`                                       | `"SELECT * FROM Customer WHERE Country = 'USA'"`                             |
| **Example Output**         | Schema and sample rows for `Customer` and `Invoice` tables.                            | `"Album, Customer, Invoice"` (list of tables).                              | `"SELECT * FROM Customer WHERE Country = 'USA'"` (validated query).                    | Query results (e.g., rows from the `Customer` table).                         |

### **When to Use Which Tool?**
- Use **`InfoSQLDatabaseTool`** when you need to understand the structure and sample data of specific tables.
- Use **`ListSQLDatabaseTool`** when you want to discover all tables in the database.
- Use **`QuerySQLCheckerTool`** when you need to validate SQL queries before execution.
- Use **`QuerySQLDatabaseTool`** when you want to execute SQL queries and retrieve results.
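
To make these roles concrete without any LangChain setup, here is a minimal sketch that performs the list, inspect, and execute steps by hand with Python's standard-library `sqlite3` module. The in-memory table and its rows are made up for illustration, and the code only mirrors what each tool is for; it is not how the tools are implemented.

```python
import sqlite3

# Toy in-memory database; the table and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, FirstName TEXT, Country TEXT);
    INSERT INTO Customer VALUES (1, 'Ana', 'Brazil'), (2, 'Bob', 'USA');
""")

# 1. List tables (the role of ListSQLDatabaseTool).
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# 2. Inspect the schema and a few sample rows (the role of InfoSQLDatabaseTool).
schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'Customer'").fetchone()[0]
sample_rows = conn.execute("SELECT * FROM Customer LIMIT 3").fetchall()

# 3. Execute a query (the role of QuerySQLDatabaseTool).
usa_customers = conn.execute(
    "SELECT FirstName FROM Customer WHERE Country = 'USA'").fetchall()

print(tables, schema, sample_rows, usa_customers, sep="\n")
```

The checking step has no direct standard-library equivalent, since `QuerySQLCheckerTool` relies on an LLM.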

---

## Preparation

### Installing Required Libraries
This section installs the Python libraries needed for LangChain, the OpenAI and Anthropic chat models, and the agent examples later in the notebook. These libraries include:
- `langchain-openai`: Provides integration with OpenAI's models and APIs.
- `langchain-anthropic`: Enables integration with Anthropic's models and APIs.
- `langchain_community`: Contains community-contributed modules and tools for LangChain, including the SQL tools used here.
- `langchain_experimental`: Includes experimental features and utilities for LangChain.
- `langgraph`: Provides `create_react_agent`, used to build the SQL agent.
- `langchainhub`: Supports `hub.pull`, used to fetch the SQL agent system prompt.

In [None]:
!pip install -qU langchain-openai
!pip install -qU langchain-anthropic
!pip install -qU langchain_community
!pip install -qU langchain_experimental
!pip install -qU langgraph
!pip install -qU langchainhub

### Initializing OpenAI and Anthropic Chat Models
This section demonstrates how to securely fetch API keys for OpenAI and Anthropic using Kaggle's `UserSecretsClient` and initialize their respective chat models. The `ChatOpenAI` and `ChatAnthropic` classes are used to create instances of these models, which can be used for natural language processing tasks such as text generation and conversational AI.

**Key steps:**
1. **Fetch API Keys**: The OpenAI and Anthropic API keys are securely retrieved using Kaggle's `UserSecretsClient`.
2. **Initialize Chat Models**:
   - The `ChatOpenAI` class is initialized with the `gpt-4o-mini` model and the fetched OpenAI API key.
   - The `ChatAnthropic` class can be initialized the same way with the `claude-3-5-sonnet-latest` model and the fetched Anthropic API key; that line is commented out in the code and can be swapped in.

In [None]:
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from kaggle_secrets import UserSecretsClient

# Fetch API key securely
user_secrets = UserSecretsClient()

# Initialize LLM
model = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=user_secrets.get_secret("my-openai-api-key"))
#model = ChatAnthropic(model="claude-3-5-sonnet-latest", temperature=0, api_key=user_secrets.get_secret("my-anthropic-api-key"))

---

## **Examples of `InfoSQLDatabaseTool`**

### Example 1: Retrieve Schema and Sample Rows for Specific Tables
Fetch the schema and sample rows for the `Customer` and `Invoice` tables to understand their structure and data.

In [None]:
from langchain_community.tools.sql_database.tool import InfoSQLDatabaseTool
from langchain_community.utilities.sql_database import SQLDatabase
from sqlalchemy import create_engine

# Create a SQLAlchemy engine and SQLDatabase object
engine = create_engine("sqlite:////kaggle/input/chinook-database/chinook.db")
db = SQLDatabase(engine)

# Initialize the tool
info_tool = InfoSQLDatabaseTool(db=db)

# Retrieve schema and sample rows for the Customer and Invoice tables
result = info_tool.run("Customer, Invoice")
print(result)

### Example 2: Explore Table Structure for Data Analysis
Analyze the schema and sample rows of the `Track` and `Album` tables to prepare for a data analysis task.

In [None]:
# Initialize the tool
info_tool = InfoSQLDatabaseTool(db=db)

# Retrieve schema and sample rows for the Track and Album tables
result = info_tool.run("Track, Album")
print(result)

### Example 3: Debugging Table Relationships
Inspect the schema of the `PlaylistTrack` and `Playlist` tables to understand their relationships.

In [None]:
# Initialize the tool
info_tool = InfoSQLDatabaseTool(db=db)

# Retrieve schema and sample rows for the PlaylistTrack and Playlist tables
result = info_tool.run("PlaylistTrack, Playlist")
print(result)

---

## **Examples of `ListSQLDatabaseTool`**

### Example 1: List All Tables in the Database
Retrieve a list of all tables in the `chinook.db` database to understand its structure.

In [None]:
from langchain_community.tools.sql_database.tool import ListSQLDatabaseTool

# Create a SQLAlchemy engine and SQLDatabase object
engine = create_engine("sqlite:////kaggle/input/chinook-database/chinook.db")
db = SQLDatabase(engine)

# Initialize the tool
list_tool = ListSQLDatabaseTool(db=db)

# List all tables in the database
tables = list_tool.run("")
print(tables)

### Example 2: Verify Table Existence Before Querying
Check if specific tables (e.g., `Customer`, `Invoice`) exist before running queries.

In [None]:
# Initialize the tool
list_tool = ListSQLDatabaseTool(db=db)

# List all tables and check for specific ones (exact name match, not substring match)
tables = [name.strip() for name in list_tool.run("").split(",")]
if "Customer" in tables and "Invoice" in tables:
    print("Tables exist. Proceed with queries.")
else:
    print("Required tables do not exist.")

### Example 3: Dynamically Generate Queries Based on Table List
Use the list of tables to dynamically generate queries for each table.

In [None]:
# Initialize the tool
list_tool = ListSQLDatabaseTool(db=db)

# List all tables and generate a query for each
tables = list_tool.run("").split(", ")
for table in tables:
    print(f"SELECT * FROM {table} LIMIT 5;")
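
Table names that contain spaces or collide with SQL keywords (for example `Order`) break a query built by plain string formatting. The sketch below quotes each name as a standard SQL identifier before building the preview query. `quote_identifier` and `preview_queries` are hypothetical helpers written for this example, not part of LangChain, and the table list is a hard-coded stand-in for the tool's output.

```python
def quote_identifier(name: str) -> str:
    """Wrap a table name in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def preview_queries(table_list: str, limit: int = 5) -> list[str]:
    """Turn a comma-separated table list into one preview query per table."""
    names = [name.strip() for name in table_list.split(",") if name.strip()]
    return [f"SELECT * FROM {quote_identifier(name)} LIMIT {limit};" for name in names]


# Stand-in for the string returned by list_tool.run("").
for statement in preview_queries("Album, Order, Playlist Track"):
    print(statement)
```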

---

## **Examples of `QuerySQLCheckerTool`**

### **Example 1: Validate a Simple Query**
**Description**: Use the `run` method to validate a simple SQL query that fetches all rows from the `Customer` table.

In [None]:
from langchain_community.tools.sql_database.tool import QuerySQLCheckerTool
from langchain_community.utilities.sql_database import SQLDatabase
from sqlalchemy import create_engine

# Set up the database connection
engine = create_engine("sqlite:////kaggle/input/chinook-database/chinook.db")
db = SQLDatabase(engine)

# Initialize the QuerySQLCheckerTool (it needs only the database and an LLM,
# so no SQLDatabaseToolkit is required here)
checker_tool = QuerySQLCheckerTool(db=db, llm=model)

# Define the SQL query
query = "SELECT * FROM Customer"

# Validate the query using QuerySQLCheckerTool
validated_query = checker_tool.run(query)
print("Validated Query:", validated_query)

### **Example 2: Fix a Query with Common Mistakes**
**Description**: Use the `run` method to validate and fix a query with a common mistake (e.g., missing quotes around a string).

In [None]:
# Initialize the QuerySQLCheckerTool
checker_tool = QuerySQLCheckerTool(db=db, llm=model)

# Define the SQL query with a mistake
query = "SELECT * FROM Customer WHERE Country = USA"

# Validate and fix the query using QuerySQLCheckerTool
validated_query = checker_tool.run(query)
print("Validated Query:", validated_query)
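
An LLM-based check can be complemented by a cheap local one. For SQLite, prefixing a statement with `EXPLAIN` makes the engine compile it without running the underlying query, so syntax errors and unknown tables or columns surface as exceptions. The sketch below assumes a SQLite database and uses a toy in-memory table; `sqlite_compile_error` is a hypothetical helper, not a LangChain API.

```python
import sqlite3
from typing import Optional


def sqlite_compile_error(conn: sqlite3.Connection, query: str) -> Optional[str]:
    """Return SQLite's error message if the query fails to compile, else None."""
    try:
        # EXPLAIN compiles the statement and returns its bytecode listing
        # instead of executing the underlying query.
        conn.execute(f"EXPLAIN {query}")
    except sqlite3.Error as exc:
        return str(exc)
    return None


# Toy table standing in for the real database (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Country TEXT)")

print(sqlite_compile_error(conn, "SELECT * FROM Customer WHERE Country = USA"))
print(sqlite_compile_error(conn, "SELECT * FROM Customer WHERE Country = 'USA'"))
```

This catches only what the database engine itself rejects; it says nothing about whether the query answers the intended question, which is where the LLM review still helps.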

### **Example 3: Validate a Complex Query with Joins**
**Description**: Use the `run` method to validate a complex query involving joins and aggregation.

In [None]:
# Initialize the QuerySQLCheckerTool
checker_tool = QuerySQLCheckerTool(db=db, llm=model)

# Define the SQL query
query = """
    SELECT c.FirstName, c.LastName, SUM(i.Total) AS TotalSpent
    FROM Customer c
    JOIN Invoice i ON c.CustomerId = i.CustomerId
    GROUP BY c.CustomerId
    ORDER BY TotalSpent DESC
"""

# Validate the query using QuerySQLCheckerTool
validated_query = checker_tool.run(query)
print("Validated Query:", validated_query)

---

## **Examples of `QuerySQLDatabaseTool`**

### Example 1: Aggregate Data
Use the `run` method to calculate the total sales per customer.

In [None]:
from langchain_community.tools.sql_database.tool import QuerySQLDatabaseTool
from langchain_community.utilities.sql_database import SQLDatabase
from sqlalchemy import create_engine

# Create a SQLAlchemy engine and SQLDatabase object
engine = create_engine("sqlite:////kaggle/input/chinook-database/chinook.db")
db = SQLDatabase(engine)

# Initialize the tool
query_tool = QuerySQLDatabaseTool(db=db)

# Execute the query
result = query_tool.run("""
    SELECT CustomerId, SUM(Total) AS TotalSpent 
    FROM Invoice 
    GROUP BY CustomerId
""")
print(result)

### Example 2: Join Tables
Use the `run` method to fetch customer names along with their invoice totals.

In [None]:
# Initialize the tool
query_tool = QuerySQLDatabaseTool(db=db)

# Execute the query
result = query_tool.run("""
    SELECT c.FirstName, c.LastName, SUM(i.Total) AS TotalSpent
    FROM Customer c
    JOIN Invoice i ON c.CustomerId = i.CustomerId
    GROUP BY c.CustomerId
""")
print(result)

### Example 3: Use `invoke` for Query Execution
Use the `invoke` method to execute a query and fetch all rows from the `Employee` table.

In [None]:
# Initialize the tool
query_tool = QuerySQLDatabaseTool(db=db)

# Execute the query using invoke
result = query_tool.invoke("SELECT * FROM Employee")
print(result)

### Example 4: Use `batch` for Multiple Queries
Use the `batch` method to execute multiple queries in a single call.

In [None]:
# Initialize the tool
query_tool = QuerySQLDatabaseTool(db=db)

# Execute multiple queries using batch
queries = [
    "SELECT * FROM Customer LIMIT 5",
    "SELECT * FROM Invoice LIMIT 5"
]
results = query_tool.batch(queries)
for result in results:
    print(result)

### Example 5: Use `stream` for Incremental Results
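Use the `stream` method to iterate over the tool's output. The tool returns its result as a single string, so `stream` will typically yield that string as one chunk rather than delivering rows one at a time.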

In [None]:
# Initialize the tool
query_tool = QuerySQLDatabaseTool(db=db)

# Execute the query using stream
for chunk in query_tool.stream("SELECT * FROM Track LIMIT 10"):
    print(chunk)

### Example 6: Use `with_config` for Custom Configuration
Use the `with_config` method to add metadata or tags to the tool.

In [None]:
# Initialize the tool with custom configuration
configured_tool = query_tool.with_config({"tags": ["query_execution"], "metadata": {"purpose": "data_analysis"}})

# Execute the query (the configured object is a Runnable wrapper, so call `invoke`)
result = configured_tool.invoke("SELECT * FROM Genre")
print(result)

### Example 7: Use `with_retry` for Error Handling
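Use the `with_retry` method to retry a call when it raises an exception. Note that `QuerySQLDatabaseTool` typically returns database errors as a message string instead of raising, so a query against a missing table reports the error in its result and does not trigger a retry.

In [None]:
# Initialize the tool with retry logic (retries only when an exception is raised)
retry_tool = query_tool.with_retry(retry_if_exception_type=(Exception,), stop_after_attempt=3)

# Execute the query using invoke
result = retry_tool.invoke("SELECT * FROM NonExistentTable")  # Expect an error message string, not a retry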
print(result)
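
Because the query tool tends to report database failures as text, calling code may want to turn such results back into exceptions before applying retry or fallback logic. The following is a minimal sketch under one assumption: that error results start with the prefix `Error:`. Check this against the output of your installed version. `is_error_result`, `run_checked`, and `fake_invoke` are hypothetical names; `fake_invoke` stands in for `query_tool.invoke` so the sketch runs on its own.

```python
def is_error_result(result: object) -> bool:
    """Heuristic: treat a string that starts with 'Error:' as a failed query."""
    return isinstance(result, str) and result.lstrip().startswith("Error:")


def run_checked(run_query, query: str):
    """Call run_query(query) and raise if the result looks like an error message."""
    result = run_query(query)
    if is_error_result(result):
        raise RuntimeError(f"Query failed: {result}")
    return result


# Stand-in for query_tool.invoke (assumed message format, for illustration only).
def fake_invoke(query: str) -> str:
    if "NonExistentTable" in query:
        return "Error: (sqlite3.OperationalError) no such table: NonExistentTable"
    return "[(1, 'Rock')]"


print(run_checked(fake_invoke, "SELECT * FROM Genre"))
try:
    run_checked(fake_invoke, "SELECT * FROM NonExistentTable")
except RuntimeError as exc:
    print(exc)
```

Retrying is still only worthwhile for transient failures such as a locked database; a missing table will fail the same way on every attempt.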

---

## Best Practices

### **Example 1: Answering a Business Question Using the SQL Agent**

In this example, we demonstrate how to use the **SQLDatabaseToolkit** and **create_react_agent** to answer a business question by querying a SQL database. The goal is to find the **top 5 customers by total spending** in the `chinook.db` database. This example highlights the power of combining a language model (LLM) with SQL tools to automate complex data retrieval tasks.

#### **Key Components of the Example**

1. **SQLDatabaseToolkit**:
   - This toolkit bundles `QuerySQLDatabaseTool`, `InfoSQLDatabaseTool`, `ListSQLDatabaseTool`, and `QuerySQLCheckerTool` for interacting with SQL databases.
   - It enables the agent to generate, validate, and execute SQL queries.

2. **create_react_agent**:
   - This function creates an agent that uses the LLM and the tools provided by the `SQLDatabaseToolkit` to answer user questions.
   - The agent follows a reasoning process to determine the correct SQL query and interpret the results.

3. **Business Question**:
   - The question, **"Who are the top 5 customers by total spending?"**, requires the agent to:
     - Identify the relevant tables (`Customer` and `Invoice`).
     - Join the tables on the `CustomerId` field.
     - Aggregate the `Total` field from the `Invoice` table to calculate total spending per customer.
     - Sort the results in descending order and return the top 5 customers.
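
The four steps above correspond to a query of roughly the following shape. This sketch runs it with the standard-library `sqlite3` module against a tiny in-memory database whose rows are invented for illustration; the SQL an agent actually generates may differ in detail.

```python
import sqlite3

# Toy data with the same column names as the Customer and Invoice tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER, Total REAL);
    INSERT INTO Customer VALUES (1, 'Ana', 'Silva'), (2, 'Bob', 'Jones'), (3, 'Cy', 'Lee');
    INSERT INTO Invoice VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 20.0), (4, 3, 1.0);
""")

# Join on CustomerId, sum Total per customer, sort descending, keep the top 5.
TOP_CUSTOMERS_SQL = """
    SELECT c.FirstName, c.LastName, SUM(i.Total) AS TotalSpent
    FROM Customer c
    JOIN Invoice i ON c.CustomerId = i.CustomerId
    GROUP BY c.CustomerId
    ORDER BY TotalSpent DESC
    LIMIT 5
"""
top_customers = conn.execute(TOP_CUSTOMERS_SQL).fetchall()
print(top_customers)
```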

#### **Why This Example is Useful**

- **Automation**: The agent automates the process of writing and executing SQL queries, saving time and reducing the risk of errors.
- **Natural Language Interface**: Users can ask questions in plain English, and the agent translates them into SQL queries.
- **Scalability**: This approach can be extended to answer a wide range of business questions, such as revenue analysis, customer segmentation, and inventory management.

In [None]:
from langchain_community.agent_toolkits.sql.toolkit import SQLDatabaseToolkit
from langchain_community.utilities.sql_database import SQLDatabase
from langchain import hub
from langgraph.prebuilt import create_react_agent
from sqlalchemy import create_engine

# Set up the database connection
engine = create_engine("sqlite:////kaggle/input/chinook-database/chinook.db")
db = SQLDatabase(engine)

# Initialize the SQLDatabaseToolkit
toolkit = SQLDatabaseToolkit(db=db, llm=model)

# Pull the SQL agent system prompt
prompt_template = hub.pull("langchain-ai/sql-agent-system-prompt")

# Format the system message
system_message = prompt_template.format(dialect="SQLite", top_k=5)

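# Create the agent. `state_modifier` supplies the system prompt here; newer
# LangGraph releases name this parameter `prompt`, so adjust if this call fails.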
agent_executor = create_react_agent(model, tools=toolkit.get_tools(), state_modifier=system_message)

# Example 1: Ask the agent a business question
question = "Who are the top 5 customers by total spending?"
response = agent_executor.invoke({"messages": [("user", question)]})
print(response["messages"][-1].content)

In [None]:
# Example 2: Find the Top 3 Best-Selling Artists
question = "Who are the top 3 best-selling artists based on the number of tracks sold?"
response = agent_executor.invoke({"messages": [("user", question)]})
print(response["messages"][-1].content)

In [None]:
# Example 3: Find the Most Popular Genre
# Ask the agent a business question
question = "Which music genre is the most popular based on the number of tracks sold?"
response = agent_executor.invoke({"messages": [("user", question)]})
print(response["messages"][-1].content)

In [None]:
# Example 4: Find the Employee with the Highest Sales
# Ask the agent a business question
question = "Which employee has generated the highest total sales?"
response = agent_executor.invoke({"messages": [("user", question)]})
print(response["messages"][-1].content)

In [None]:
# Example 5: Find the Most Profitable Track
question = "Which track has generated the highest total revenue?"
response = agent_executor.invoke({"messages": [("user", question)]})
print(response["messages"][-1].content)

### **Example 2: Validating and Executing a SQL Query**

In this example, we demonstrate how to use the **QuerySQLCheckerTool** and **QuerySQLDatabaseTool** to validate and execute a SQL query. The goal is to calculate the **total sales per country** in the `chinook.db` database. This example highlights the importance of ensuring SQL query correctness before execution and showcases how LangChain tools can streamline this process.

#### **Key Components of the Example**

1. **QuerySQLCheckerTool**:
   - This tool uses a language model (LLM) to validate SQL queries for correctness.
   - It checks for common mistakes, such as syntax errors, missing quotes, or incorrect table/column references.
   - If the query is invalid, it suggests corrections or rewrites the query.

2. **QuerySQLDatabaseTool**:
   - This tool executes validated SQL queries against the database and returns the results.
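   - It does not validate the query itself: whatever SQL it receives is run, and database errors come back as an error message. Pair it with the checker, and restrict database permissions where data must not be modified.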

3. **SQL Query**:
   - The query, **"SELECT c.Country, SUM(i.Total) AS TotalSales FROM Customer c JOIN Invoice i ON c.CustomerId = i.CustomerId GROUP BY c.Country ORDER BY TotalSales DESC"**, performs the following operations:
     - Joins the `Customer` and `Invoice` tables on the `CustomerId` field.
     - Groups the results by `Country`.
     - Calculates the total sales (`SUM(i.Total)`) for each country.
     - Orders the results in descending order of total sales.

#### **Why This Example is Useful**

- **Error Prevention**: The `QuerySQLCheckerTool` reviews SQL queries before execution and catches many common mistakes. Because the check is LLM-based, it reduces runtime errors rather than guaranteeing correctness.
- **Efficiency**: By automating query validation and execution, this approach saves time and reduces the need for manual intervention.
- **Flexibility**: The tools can handle a wide range of SQL queries, from simple SELECT statements to complex joins and aggregations.

In [None]:
from langchain_community.tools.sql_database.tool import QuerySQLCheckerTool, QuerySQLDatabaseTool
from langchain_community.utilities.sql_database import SQLDatabase
from sqlalchemy import create_engine

# Set up the database connection
engine = create_engine("sqlite:////kaggle/input/chinook-database/chinook.db")
db = SQLDatabase(engine)

# Initialize the QuerySQLCheckerTool and QuerySQLDatabaseTool
checker_tool = QuerySQLCheckerTool(db=db, llm=model)
query_tool = QuerySQLDatabaseTool(db=db)

# Define the SQL query
query = """
    SELECT c.Country, SUM(i.Total) AS TotalSales
    FROM Customer c
    JOIN Invoice i ON c.CustomerId = i.CustomerId
    GROUP BY c.Country
    ORDER BY TotalSales DESC
"""

# Validate the query using QuerySQLCheckerTool
validated_query = checker_tool.run(query)
print("Validated Query:", validated_query)

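# Execute the validated query using QuerySQLDatabaseTool.
# The checker returns raw LLM text; if it includes commentary or Markdown
# fences, clean it up before executing.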
result = query_tool.run(validated_query)
print("Query Result:", result)
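
The checker's output is whatever text the LLM produced, and chat models sometimes wrap SQL in a Markdown code fence. The sketch below shows one way to unwrap such output before passing it on for execution. `strip_sql_fences` is a hypothetical helper, not part of LangChain, and it handles only the simple case of a single fenced block.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def strip_sql_fences(text: str) -> str:
    """Return the SQL inside a Markdown code fence, or the stripped text if unfenced."""
    pattern = re.escape(FENCE) + r"(?:sql)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL | re.IGNORECASE)
    return (match.group(1) if match else text).strip()


fenced = f"{FENCE}sql\nSELECT 1\n{FENCE}"
print(strip_sql_fences(fenced))          # fenced input
print(strip_sql_fences("  SELECT 1  "))  # unfenced input
```

In the cell above, this would be applied as `query_tool.run(strip_sql_fences(validated_query))`.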

---

## **Conclusion**  
The **InfoSQLDatabaseTool**, **ListSQLDatabaseTool**, **QuerySQLCheckerTool**, and **QuerySQLDatabaseTool** cover the main steps of working with a SQL database from LangChain: discovering tables, inspecting schemas, checking queries, and executing them.  

Used individually or through the `SQLDatabaseToolkit` and an agent, they fit data analysis pipelines, chatbots with database access, and ad hoc dataset exploration.  

A reasonable next step is to point the examples in this article at your own database, starting with the listing and schema tools before letting an agent execute queries.