# AI Coder Workflow for Custom Library with LangGraph

## Introduction

This notebook demonstrates how to build an **AI-powered code generation workflow** tailored for your custom library using **LangGraph**, **LangChain**, and **Pydantic**. The workflow processes user queries, generates Python code solutions grounded in the library's documentation, and iteratively improves them by executing the code and feeding any errors back to the model for reflection. LangGraph connects these stages (generation, checking, and reflection) into a reusable graph. This makes it useful for automating development tasks related to custom libraries, with each candidate solution checked for import and runtime errors before it is returned.

Here’s an outline of the process:

1. **Importing Required Libraries**: Setting up the necessary libraries and dependencies.
2. **Constants and Configuration**: Defining constants such as the maximum number of iterations and the documentation URL.
3. **Loading Custom Library Documentation**: Implementing a function to fetch and consolidate library documentation from a specified URL. This ensures that all necessary information is accessible for the workflow, using a recursive loader to retrieve multiple pages or sections of the documentation.
4. **Defining Data Models**: Creating data models using Pydantic to structure the workflow's state and code solutions.
5. **Initializing the Language Model**: Setting up the language model (LLM) for code generation.
6. **Creating Prompt Templates**: Designing prompts to guide the LLM in generating structured code solutions.
7. **Defining Workflow Nodes**: Implementing functions that handle code generation, code checking, and reflection on errors.
8. **Building and Compiling the Workflow**: Assembling the workflow graph and compiling it for execution.
9. **Executing Example Queries**: Demonstrating the workflow with sample user questions and displaying the generated code solutions.
10. **Conclusion**: Summarizing the workflow and its capabilities.

## Installation of Required Packages

First, we install the packages required for this code generation workflow. These include various components of LangChain as well as LangGraph.

In [None]:
!pip install -qU langchain-openai
!pip install -qU langchain-anthropic
!pip install -qU langchain_community
!pip install -qU langchain_experimental
!pip install -qU langgraph
!pip install -qU beautifulsoup4

## Importing Required Libraries

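Next, we import the libraries and modules required for our workflow, covering web scraping, language model interaction, data modeling, and workflow management. The `kaggle_secrets` module is only available inside Kaggle notebooks; elsewhere, API keys have to be supplied another way (for example, through environment variables).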

In [None]:
from bs4 import BeautifulSoup as Soup
from langchain_community.document_loaders.recursive_url_loader import RecursiveUrlLoader
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from pydantic import BaseModel, Field
from typing import List, Optional
from langgraph.graph import END, StateGraph, START
from langchain_core.messages import SystemMessage, HumanMessage
from kaggle_secrets import UserSecretsClient
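Because `kaggle_secrets` only exists inside Kaggle notebooks, the import above fails elsewhere. A minimal sketch of a fallback, assuming you keep keys in environment variables when running locally; `get_api_key` and the environment variable names are hypothetical and not part of the original notebook:

```python
import os


def get_api_key(secret_name: str, env_var: str) -> str:
    """Hypothetical helper: read a key from Kaggle Secrets, else from an environment variable."""
    try:
        # Only importable inside a Kaggle notebook
        from kaggle_secrets import UserSecretsClient
    except ImportError:
        value = os.environ.get(env_var)
        if value is None:
            raise KeyError(f"Set {env_var} or add the Kaggle secret {secret_name!r}")
        return value
    return UserSecretsClient().get_secret(secret_name)


# Example (hypothetical variable name): export DEEPSEEK_API_KEY before starting the notebook
# api_key = get_api_key("my-deepseek-api-key", "DEEPSEEK_API_KEY")
```

The returned string can then be passed as `api_key=` to the chat model constructors used later.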

## Constants and Configuration

This section defines key constants used throughout the workflow: the maximum number of code generation attempts and the URL of the documentation for your custom library (the LCEL documentation is used here as an example).

In [None]:
# Maximum number of iterations for code generation attempts
MAX_ITERATIONS = 3

# Documentation URL for your custom library (the LCEL docs are used here as an example)
LCEL_DOCS_URL = "https://python.langchain.com/docs/concepts/lcel/"

## Loading Custom Library Documentation

This section defines a function that loads and combines the documentation of your custom library (demonstrated here with the LCEL documentation). `RecursiveUrlLoader` fetches the starting page and follows its links (by default, only those under the starting URL) up to `max_depth` levels deep, and the extractor uses Beautiful Soup to reduce each page's HTML to plain text.

In [None]:
# Load LCEL Documentation
def load_lcel_docs(url: str) -> str:
    """
    Load and concatenate LCEL documentation from the given URL.

    Args:
        url (str): The URL to load the documentation from.

    Returns:
        str: Concatenated content of all the documentation pages.
    """
    loader = RecursiveUrlLoader(
        url=url, max_depth=20, extractor=lambda x: Soup(x, "html.parser").text
    )
    docs = loader.load()
    # Sort documents by source in reverse order to ensure consistent ordering
    sorted_docs = sorted(docs, key=lambda x: x.metadata["source"], reverse=True)
    # Join all document contents with a separator
    return "\n\n\n --- \n\n\n".join(doc.page_content for doc in sorted_docs)
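To make the `max_depth` and link-filtering behaviour concrete, here is a simplified, dependency-free sketch of a breadth-first crawl over an in-memory link graph, followed by the same reverse sort used above. It illustrates the idea under assumed depth-counting rules and is not `RecursiveUrlLoader`'s actual implementation; the `SITE` mapping and its URLs are made up.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(start_url: str, get_links: Callable[[str], Iterable[str]], max_depth: int) -> List[str]:
    """Breadth-first crawl that stays under start_url and follows at most max_depth links from it."""
    seen = {start_url}
    visited = []
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # do not follow links any deeper
        for link in get_links(url):
            # Skip pages outside the documentation tree and pages already queued
            if link.startswith(start_url) and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical documentation site: page -> outgoing links
SITE = {
    "https://docs.example/lcel/": ["https://docs.example/lcel/a", "https://other.example/x"],
    "https://docs.example/lcel/a": ["https://docs.example/lcel/a/b", "https://docs.example/lcel/"],
    "https://docs.example/lcel/a/b": ["https://docs.example/lcel/a/b/c"],
}

pages = crawl("https://docs.example/lcel/", lambda url: SITE.get(url, []), max_depth=5)
# Same ordering rule as load_lcel_docs: sort by source URL, descending
ordered = sorted(pages, reverse=True)
```

Lowering `max_depth` trims the deeper pages, and the external link is never followed.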

## Defining Data Models

Using Pydantic, we define data models for the workflow state and the generated code solutions. Pydantic validates field types at runtime, which keeps the data passed between workflow nodes well-defined.

### CodeSolution Model

Represents the structure of the code solutions, including a description (`prefix`), import statements (`imports`), and the main code block (`code`).

In [None]:
# Data Model for Code Solutions
class CodeSolution(BaseModel):
    """
    Schema for code solutions to questions about LCEL.

    Attributes:
        prefix (str): Description of the problem and approach.
        imports (str): Code block containing import statements.
        code (str): Code block excluding import statements.
    """
    prefix: str = Field(description="Description of the problem and approach")
    imports: str = Field(description="Code block import statements")
    code: str = Field(description="Code block not including import statements")
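`with_structured_output` builds the schema it sends to the LLM from this Pydantic model, so the field descriptions double as instructions to the model. A minimal sketch (Pydantic v2 assumed) that repeats the class so it runs on its own, inspects the generated JSON schema, and validates two made-up JSON payloads shaped like model responses:

```python
from pydantic import BaseModel, Field, ValidationError


class CodeSolution(BaseModel):
    """Schema for code solutions to questions about LCEL."""
    # Same fields as the cell above, repeated so this snippet is self-contained
    prefix: str = Field(description="Description of the problem and approach")
    imports: str = Field(description="Code block import statements")
    code: str = Field(description="Code block not including import statements")


schema = CodeSolution.model_json_schema()
print(schema["required"])  # ['prefix', 'imports', 'code']

# A well-formed response parses into a CodeSolution...
ok = CodeSolution.model_validate_json('{"prefix": "Assign a value", "imports": "", "code": "x = 1"}')

# ...while a response missing the "code" field is rejected
try:
    CodeSolution.model_validate_json('{"prefix": "x", "imports": ""}')
    rejected = False
except ValidationError:
    rejected = True
```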

### GraphState Model

Represents the state of the workflow graph, tracking errors, messages, generated code solutions, and the number of iterations.

In [None]:
# Graph State Definition using Pydantic
class GraphState(BaseModel):
    """
    Represents the state of the graph.

    Attributes:
        error (str): Indicates if an error occurred ('yes' or 'no').
        messages (List): List of messages (user questions, error messages, etc.).
        generation (Optional[CodeSolution]): Generated code solution.
        iterations (int): Number of attempts made.
    """
    error: str = Field(default="no", description="'yes' or 'no' to indicate if an error occurred")
    messages: List = Field(default_factory=list, description="List of messages (user questions, error messages, etc.)")
    generation: Optional[CodeSolution] = Field(default=None, description="Generated code solution")
    iterations: int = Field(default=0, description="Number of attempts made")
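The `error` field is a free-form string, so a typo such as `'Yes'` would pass validation and could send the graph down an unintended branch. One possible tightening, offered as an alternative rather than the notebook's design, is to type it as `Literal['yes', 'no']`. The hypothetical `StrictGraphState` below shows the effect, along with `default_factory=list` giving each state its own message list:

```python
from typing import List, Literal

from pydantic import BaseModel, Field, ValidationError


class StrictGraphState(BaseModel):
    """Hypothetical variant of GraphState with a constrained error flag."""
    error: Literal["yes", "no"] = Field(default="no")
    messages: List = Field(default_factory=list)
    iterations: int = Field(default=0)


first = StrictGraphState()
second = StrictGraphState()
first.messages.append("question")  # does not leak into `second`

try:
    StrictGraphState(error="Yes")  # capitalised typo is rejected
    typo_rejected = False
except ValidationError:
    typo_rejected = True
```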

## Initializing the Language Model

We initialize the large language model (LLM) used to generate code solutions. In this case, we use DeepSeek's `deepseek-chat` model through its OpenAI-compatible API, but alternatives such as Anthropic's Claude or OpenAI's GPT-4o can be configured by uncommenting the corresponding line.

In [None]:
# API keys are read from Kaggle Secrets; never hard-code keys in a notebook
user_secrets = UserSecretsClient()

# Initialize the LLM (keep exactly one option active)

# Anthropic
# llm = ChatAnthropic(temperature=0, model="claude-3-5-sonnet-latest", api_key=user_secrets.get_secret("my-anthropic-api-key"))

# OpenAI
# llm = ChatOpenAI(temperature=0, model="gpt-4o", api_key=user_secrets.get_secret("my-openai-api-key"))

# DeepSeek-V3 via its OpenAI-compatible endpoint
llm = ChatOpenAI(temperature=0, model="deepseek-chat", api_key=user_secrets.get_secret("my-deepseek-api-key"),
                 base_url="https://api.deepseek.com/v1")

## Creating Prompt Templates

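We create a prompt template that guides the LLM toward structured code solutions. It consists of a system message with a `{context}` slot for the documentation, followed by a placeholder for the conversation messages (the user question plus any feedback from earlier attempts).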

In [None]:
# Prompt Template for Code Generation
# The system prompt is given as a ("system", ...) tuple so that {context} is treated as a template
# variable; a SystemMessage object would be passed through verbatim with "{context}" left unfilled.
code_gen_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are a coding assistant with expertise in LCEL, LangChain expression language.
Here is a full set of LCEL documentation:
------------------------------------------
{context}
------------------------------------------
Answer the user question based on the above provided documentation. Ensure any code you provide can be executed
with all required imports and variables defined. Structure your answer with a description of the code solution.
Then list the imports. And finally list the functioning code block. Here is the user question:""",
        ),
        ("placeholder", "{messages}"),
    ]
)

## Defining Workflow Nodes

Workflow nodes are functions that represent different stages in the workflow. Here, we define three primary nodes: `gen_code_node`, `check_code_node`, and `reflect_code_node`.

### Code Generation Node

Generates a code solution based on the current state, invoking the LLM with the appropriate context and messages.

In [None]:
# Nodes
from langchain_core.messages import AIMessage


def gen_code_node(state: GraphState) -> GraphState:
    """
    Generate a code solution based on the current state.

    Args:
        state (GraphState): The current state of the graph.

    Returns:
        GraphState: Updated state with the generated code solution.
    """
    print("---GENERATING CODE SOLUTION---")
    messages = state.messages
    iterations = state.iterations
    error = state.error

    # Add retry message if there was an error
    if error == "yes":
        messages.append(HumanMessage("Now, try again. Invoke the code tool to structure the output with a prefix, imports, and code block."))

    # Generate code solution using the code generation chain
    code_solution = code_gen_chain.invoke({"context": concatenated_content, "messages": messages})
    # Append the generated solution as an AI message, since it is the model's own answer
    messages.append(AIMessage(f"{code_solution.prefix} \n Imports: {code_solution.imports} \n Code: {code_solution.code}"))

    # Increment iteration count and return updated state
    return GraphState(
        error="no",
        messages=messages,
        generation=code_solution,
        iterations=iterations + 1,
    )

### Code Checking Node

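Checks the generated code by executing the import statements and then the full code block, and records any exception in the state so the next attempt can address it. Because `exec` runs model-generated code with the same privileges as the notebook, this check should only be used in a sandboxed or disposable environment.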

In [None]:
def check_code_node(state: GraphState) -> GraphState:
    """
    Check the generated code for errors.

    Args:
        state (GraphState): The current state of the graph.

    Returns:
        GraphState: Updated state indicating whether the code passed or failed the checks.
    """
    print("---CHECKING CODE---")
    messages = state.messages
    code_solution = state.generation
    iterations = state.iterations

    # Check imports by executing them in a fresh namespace. A single dict serves as both globals and
    # locals, so functions defined by the generated code can see its imports, and the generated
    # names stay out of the notebook's own namespace.
    try:
        exec(code_solution.imports, {})
    except Exception as e:
        print("---CODE IMPORT CHECK: FAILED---")
        messages.append(HumanMessage(f"Your solution failed the import test: {e}"))
        return GraphState(
            error="yes",
            messages=messages,
            generation=code_solution,
            iterations=iterations,
        )

    # Check execution by running the full code (imports + code) in another fresh namespace
    try:
        exec(code_solution.imports + "\n" + code_solution.code, {})
    except Exception as e:
        print("---CODE BLOCK CHECK: FAILED---")
        messages.append(HumanMessage(f"Your solution failed the code execution test: {e}"))
        return GraphState(
            error="yes",
            messages=messages,
            generation=code_solution,
            iterations=iterations,
        )

    # If no errors, return the state with error set to 'no'
    print("---NO CODE TEST FAILURES---")
    return GraphState(
        error="no",
        messages=messages,
        generation=code_solution,
        iterations=iterations,
    )
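Calling `exec(source)` inside a function runs the code with the module's globals but a separate locals mapping, so a function defined by the generated code cannot see names that the same code imported. Passing one fresh dict, as in `exec(source, namespace)`, avoids this. A small demonstration; the helper names and `SAMPLE_SOURCE` are illustrative only:

```python
SAMPLE_SOURCE = """
from math import tau as FULL_TURN
def circumference(r):
    return FULL_TURN * r
result = circumference(1.0)
"""


def run_with_default_scope(source: str) -> None:
    # Module globals + a separate locals mapping: the import lands in the locals,
    # but circumference() looks FULL_TURN up in the module globals -> NameError.
    exec(source)


def run_in_fresh_namespace(source: str) -> dict:
    # One dict acts as both globals and locals, so defined functions see the imports
    namespace: dict = {}
    exec(source, namespace)
    return namespace
```

Calling `run_with_default_scope(SAMPLE_SOURCE)` raises `NameError`, while `run_in_fresh_namespace(SAMPLE_SOURCE)["result"]` holds the circumference of a unit circle.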

### Reflection Node

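Reflects on errors encountered during code checking: the node sends the conversation, including the error messages, back to the code generation chain and appends the model's response to the messages as reflections for the next attempt.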

In [None]:
def reflect_code_node(state: GraphState) -> GraphState:
    """
    Reflect on errors and provide insights for improvement.

    Args:
        state (GraphState): The current state of the graph.

    Returns:
        GraphState: Updated state with reflections on the error.
    """
    print("---REFLECTING ON ERRORS---")
    messages = state.messages
    code_solution = state.generation

    # Generate reflections using the code generation chain
    reflections = code_gen_chain.invoke({"context": concatenated_content, "messages": messages})
    messages.append(HumanMessage(f"Here are reflections on the error: {reflections}"))

    return GraphState(
        error="yes",
        messages=messages,
        generation=code_solution,
        iterations=state.iterations,
    )

## Building and Compiling the Workflow

We construct the workflow graph by adding the nodes defined above and the transitions between them. The workflow starts with code generation and then checks the code; if the check fails, it reflects on the error and generates a new solution, stopping once the code passes or `MAX_ITERATIONS` attempts have been made.

In [None]:
# Chain for Code Generation
code_gen_chain = code_gen_prompt | llm.with_structured_output(CodeSolution)

# Edges
def decide_to_finish(state: GraphState) -> str:
    """
    Determine whether to finish or retry based on the state.

    Args:
        state (GraphState): The current state of the graph.

    Returns:
        str: Decision to either finish or retry the code generation process.
    """
    error = state.error
    iterations = state.iterations

    # If no error or max iterations reached, finish (>= guards against overshooting the limit)
    if error == "no" or iterations >= MAX_ITERATIONS:
        print("---DECISION: FINISH---")
        return "end"
    else:
        # Otherwise, decide to retry or reflect based on the error
        print("---DECISION: RE-TRY SOLUTION---")
        return "reflect_code_node" if error == "yes" else "gen_code_node"

# Build and Compile the Workflow
workflow = StateGraph(GraphState)
workflow.add_node("gen_code_node", gen_code_node)  # Add code generation node
workflow.add_node("check_code_node", check_code_node)  # Add code checking node
workflow.add_node("reflect_code_node", reflect_code_node)  # Add reflection node

workflow.add_edge(START, "gen_code_node")  # Start with code generation
workflow.add_edge("gen_code_node", "check_code_node")  # Check the generated code
workflow.add_conditional_edges(
    "check_code_node",
    decide_to_finish,
    {"end": END, "reflect_code_node": "reflect_code_node", "gen_code_node": "gen_code_node"},
)
workflow.add_edge("reflect_code_node", "gen_code_node")  # Retry after reflection
app = workflow.compile()  # Compile the workflow
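The routing can be traced without LangGraph or an LLM by replacing the three nodes with stubs. The plain-Python loop below is a simplified sketch that mirrors the graph's edges and the `decide_to_finish` rule; `LoopState`, `run_workflow`, and the stub functions are hypothetical stand-ins, not part of the notebook:

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_ITERATIONS = 3


@dataclass
class LoopState:
    error: str = "no"
    messages: List[str] = field(default_factory=list)
    iterations: int = 0


def decide(state: LoopState) -> str:
    # Same rule as decide_to_finish: stop on success or when the attempt budget is used up
    if state.error == "no" or state.iterations >= MAX_ITERATIONS:
        return "end"
    return "reflect_code_node"


def run_workflow(generate: Callable[[int], str], passes: Callable[[str], bool]) -> LoopState:
    state = LoopState()
    while True:
        # gen_code_node: produce a candidate and count the attempt
        candidate = generate(state.iterations)
        state.messages.append(candidate)
        state.iterations += 1
        # check_code_node: record whether the candidate passed
        state.error = "no" if passes(candidate) else "yes"
        if decide(state) == "end":
            return state
        # reflect_code_node: add feedback, then loop back to generation
        state.messages.append(f"reflection on attempt {state.iterations}")


# Stub generator that only succeeds on its second attempt
fixed_on_retry = run_workflow(lambda i: "good" if i == 1 else "bad", lambda c: c == "good")
# Stub generator that never succeeds
never_fixed = run_workflow(lambda i: "bad", lambda c: False)
```

The second run shows that a persistently failing generator stops after `MAX_ITERATIONS` attempts with `error` still set to `"yes"`.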

### Optional: Displaying the Workflow Graph

If desired, the workflow graph can be visualized. This requires additional dependencies and is optional.

In [None]:
# Optional: Display the workflow graph (requires extra dependencies)
from IPython.display import Image, display

try:
    display(Image(app.get_graph().draw_mermaid_png()))
except Exception:
    # This requires some extra dependencies and is optional
    pass

## Loading LCEL Documentation Content

We load the LCEL documentation using the `load_lcel_docs` function defined earlier. The workflow nodes read this content from the global `concatenated_content` variable as context for the LLM, so this cell must run before the workflow is invoked.

In [None]:
# Load LCEL Documentation
concatenated_content = load_lcel_docs(LCEL_DOCS_URL)

## Function to Extract and Print Code

A utility function is defined to extract the generated Python code from the workflow's solution and print it in a readable format.

In [None]:
# Function to Extract and Print Code
def extract_and_print_code(solution: dict) -> None:
    """
    Extracts the Python code from the solution and prints it.

    Args:
        solution (dict): The solution dictionary returned by the workflow.
    """
    # Convert the solution dictionary to a GraphState object
    graph_state = GraphState(**solution)

    if graph_state.generation is None:
        print("No code generation found in the solution.")
        return

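    # Extract the imports and code from the CodeSolution object
    code_solution = graph_state.generation
    print()
    print("# " + "-"*80)
    print("# Extracted Python Code")
    print("# " + "-"*80)
    # Print the imports first so the printed code can be run as-is
    print(code_solution.imports)
    print()
    print(code_solution.code)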

## Executing Example Queries

We demonstrate the workflow's capabilities by executing several example user queries. For each query, we initialize the workflow's state, invoke the workflow, and display the generated code solution.

In [None]:
# Example Usage 1
question = "How can I directly pass a string to a runnable and use it to construct the input needed for my prompt?"

initial_state = GraphState(
    messages=[HumanMessage(question)],
    iterations=0,
    error="no",
    generation=None,
)
solution = app.invoke(initial_state)

# Output the Solution
extract_and_print_code(solution)

In [None]:
# Example Usage 2
question = """How can I create a simple LCEL chain that takes a string as input, converts it to uppercase,
and then appends the text ' - Processed' to the result? Use the pipe operator to chain the steps."""

initial_state = GraphState(
    messages=[HumanMessage(question)],
    iterations=0,
    error="no",
    generation=None,
)
solution = app.invoke(initial_state)

# Output the Solution
extract_and_print_code(solution)

In [None]:
# Example Usage 3
question = """How can I create an LCEL chain that routes the input to one of two runnables based on a condition?
For example, if the input is a number greater than 10, route it to a runnable that multiplies it by 2;
otherwise, route it to a runnable that adds 5."""

initial_state = GraphState(
    messages=[HumanMessage(question)],
    iterations=0,
    error="no",
    generation=None,
)
solution = app.invoke(initial_state)

# Output the Solution
extract_and_print_code(solution)

In [None]:
# Example Usage 4
question = """How can I create an LCEL chain that takes a dictionary with name and age keys,
formats a string like 'Name: {name}, Age: {age}', and then converts the result to uppercase?
Use the pipe operator to chain the steps."""

initial_state = GraphState(
    messages=[HumanMessage(question)],
    iterations=0,
    error="no",
    generation=None,
)
solution = app.invoke(initial_state)

# Output the Solution
extract_and_print_code(solution)

## Conclusion

In this notebook, we built an **AI Coder Workflow** for a custom library using **LangGraph**. The workflow interprets user queries, generates Python code grounded in the library's documentation, and refines its answers through structured prompts, execution checks, and reflection on errors. Each candidate solution is executed before it is returned, so the final code has at least imported and run without raising an exception, unless the `MAX_ITERATIONS` budget ran out first. Passing this check is a useful filter, but it does not guarantee that the code answers the question correctly.

The same structure can be reused for other libraries by changing the documentation URL and the prompt, which makes it a flexible starting point for automating coding tasks around custom libraries.