# Storm Research Assistant

Reference
https://github.com/langchain-ai/langgraph/blob/main/examples/storm/storm.ipynb


In [1]:
## Prereqs

In [2]:

# %pip install -U langchain_community langchain_openai langgraph wikipedia  scikit-learn  langchain_fireworks
# We use one or the other search engine below
# %pip install -U tavily-python
# %pip install -U duckduckgo-search
# ! apt-get install graphviz graphviz-dev
# %pip install pygraphviz



In [3]:
from storm import *

In [4]:
from langchain_openai import ChatOpenAI

fast_llm = ChatOpenAI(model="gpt-3.5-turbo")
# long_context_llm = ChatOpenAI(model="gpt-4-turbo-preview")
long_context_llm = ChatOpenAI(model="gpt-3.5-turbo-0125")


embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Chroma


interview_config = InterviewConfig(long_llm=long_context_llm, fast_llm=fast_llm, 
                                   max_conversations=5, tags_to_extract=[ "p", "h1", "h2", "h3"],
                                   vectorstore=None,
                                   embeddings=embeddings
                                   )

In [5]:
def cleanup_name(name: str) -> str:

    # Remove all non-alphanumeric characters
    name = re.sub(r"[^a-zA-Z0-9_-]", "", name)

    return name

In [6]:

## Generate Initial Outline

from langchain_core.pydantic_v1 import BaseModel, Field
from typing import List, Optional
from langchain_core.prompts import ChatPromptTemplate
from langchain.output_parsers import PydanticOutputParser

direct_gen_outline_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a Wikipedia writer. Write an outline for a Wikipedia page about a user-provided topic. Be comprehensive and specific.",
        ),
        ("user", "{topic}\n{format_instructions}"),
    ]
)


class Subsection(BaseModel):
    subsection_title: str = Field(..., title="Title of the subsection")
    description: str = Field(..., title="Content of the subsection")

    @property
    def as_str(self) -> str:
        return f"### {self.subsection_title}\n\n{self.description}".strip()


class Section(BaseModel):
    section_title: str = Field(..., title="Title of the section")
    description: str = Field(..., title="Content of the section")
    subsections: Optional[List[Subsection]] = Field(
        default=None,
        title="Titles and descriptions for each subsection of the Wikipedia page.",
    )

    @property
    def as_str(self) -> str:
        subsections = "\n\n".join(
            f"### {subsection.subsection_title}\n\n{subsection.description}"
            for subsection in self.subsections or []
        )
        return f"## {self.section_title}\n\n{self.description}\n\n{subsections}".strip()


class Outline(BaseModel):
    page_title: str = Field(..., title="Title of the Wikipedia page")
    sections: List[Section] = Field(
        default_factory=list,
        title="Titles and descriptions for each section of the Wikipedia page.",
    )

    @property
    def as_str(self) -> str:
        sections = "\n\n".join(section.as_str for section in self.sections)
        return f"# {self.page_title}\n\n{sections}".strip()


outline_parser = PydanticOutputParser(pydantic_object=Outline)

generate_outline_direct = direct_gen_outline_prompt.partial(format_instructions=outline_parser.get_format_instructions()) | fast_llm | outline_parser


In [7]:

example_topic = "Impact of million-plus token context window language models on RAG"

initial_outline = generate_outline_direct.invoke({"topic": example_topic})

print(initial_outline.as_str)

# Impact of million-plus token context window language models on RAG

## Introduction

Overview of million-plus token context window language models and RAG (Retrieval-Augmented Generation) framework.

## Background

Explanation of language models with million-plus token context windows and their impact on natural language processing tasks.

## RAG Framework

Detailed description of the Retrieval-Augmented Generation framework and its components.

## Impact on RAG

Analysis of how million-plus token context window language models enhance the performance of the RAG framework.

## Applications

Exploration of the practical applications of integrating million-plus token context window language models with the RAG framework.

## Challenges and Considerations

Discussion on the challenges and considerations when using million-plus token context window language models in the RAG framework.


In [8]:
## Expand Topics\



In [9]:
gen_related_topics_prompt = ChatPromptTemplate.from_template(
    """I'm writing a Wikipedia page for a topic mentioned below. Please identify and recommend some Wikipedia pages on closely related subjects. I'm looking for examples that provide insights into interesting aspects commonly associated with this topic, or examples that help me understand the typical content and structure included in Wikipedia pages for similar topics.

Please list the as many subjects and urls as you can.

Topic of interest: {topic}
{format_instructions}
"""
)


class RelatedSubjects(BaseModel):
    topics: List[str] = Field(
        description="Comprehensive list of related subjects as background research.",
    )


related_topics_parser = PydanticOutputParser(pydantic_object=RelatedSubjects)

expand_chain = gen_related_topics_prompt.partial(format_instructions=related_topics_parser.get_format_instructions()) | fast_llm | related_topics_parser


In [10]:
related_subjects = await expand_chain.ainvoke({"topic": example_topic})
related_subjects

RelatedSubjects(topics=['Language model', 'Retriever-Reader model', 'Natural language processing', 'Context window', 'Transformer (machine learning model)', 'Information retrieval', 'Knowledge graph', 'Question answering', 'BERT (language model)', 'GPT-3 (language model)'])

## Generate Perspectives

From these related subjects, we can select representative Wikipedia editors as "subject matter experts" with distinct backgrounds and affiliations. These will help distribute the search process to encourage a more well-rounded final report.


In [11]:
class Editor(BaseModel):
    affiliation: str = Field(
        description="Primary affiliation of the editor.",
    )
    name: str = Field(
        description="Name of the editor.",
    )
    role: str = Field(
        description="Role of the editor in the context of the topic.",
    )
    description: str = Field(
        description="Description of the editor's focus, concerns, and motives.",
    )

    @property
    def persona(self) -> str:
        return f"Name: {self.name}\nRole: {self.role}\nAffiliation: {self.affiliation}\nDescription: {self.description}\n"


class Perspectives(BaseModel):
    editors: List[Editor] = Field(
        description="Comprehensive list of editors with their roles and affiliations.",
        # Add a pydantic validation/restriction to be at most M editors
    )

gen_perspectives_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You need to select a diverse (and distinct) group of Wikipedia editors who will work together to create a comprehensive article on the topic. Each of them represents a different perspective, role, or affiliation related to this topic.\
    You can use other Wikipedia pages of related topics for inspiration. For each editor, add a description of what they will focus on.

    Wiki page outlines of related topics for inspiration:
    {examples}""",
        ),
        ("user", "Topic of interest: {topic}\n\n{format_instructions}"),
    ]
)

perspectives_parser = PydanticOutputParser(pydantic_object=Perspectives)

gen_perspectives_chain = gen_perspectives_prompt.partial(format_instructions=perspectives_parser.get_format_instructions()) | fast_llm | perspectives_parser


In [12]:
from langchain_community.retrievers import WikipediaRetriever
from langchain_core.runnables import RunnableLambda, chain as as_runnable

wikipedia_retriever = WikipediaRetriever(load_all_available_meta=True, top_k_results=1)


def format_doc(doc, max_length=1000)-> str:
    related = "- ".join(doc.metadata["categories"])
    return f"### {doc.metadata['title']}\n\nSummary: {doc.page_content}\n\nRelated\n{related}"[
        :max_length
    ]


def format_docs(docs):
    return "\n\n".join(format_doc(doc) for doc in docs)


@as_runnable
async def survey_subjects(topic: str)-> Perspectives:
    print(f"Survey Subjects for Topic: {topic}")
    related_subjects = await expand_chain.ainvoke({"topic": topic})
    retrieved_docs = await wikipedia_retriever.abatch(
        related_subjects.topics, return_exceptions=True
    )
    all_docs = []
    for docs in retrieved_docs:
        if isinstance(docs, BaseException):
            continue
        all_docs.extend(docs)
    print(f"Retrieved {len(all_docs)} docs for Topic: {topic}")
    
    formatted = format_docs(all_docs)
    return await gen_perspectives_chain.ainvoke({"examples": formatted, "topic": topic})

In [13]:
perspectives = await survey_subjects.ainvoke(example_topic)


Survey Subjects for Topic: Impact of million-plus token context window language models on RAG
Retrieved 5 docs for Topic: Impact of million-plus token context window language models on RAG


In [14]:

perspectives.dict()


{'editors': [{'affiliation': 'Research Institution',
   'name': 'Dr. Data Scientist',
   'role': 'Researcher',
   'description': 'Dr. Data Scientist specializes in analyzing the impact of million-plus token context window language models on the Retrieval-Augmented Generation (RAG) approach. They focus on evaluating the effectiveness of these models in enhancing RAG capabilities and understanding how they influence information retrieval and generation processes.'},
  {'affiliation': 'Tech Company',
   'name': 'AI Engineer',
   'role': 'Engineer',
   'description': 'AI Engineer works on implementing million-plus token context window language models in RAG systems. Their role involves optimizing the integration of these models into existing RAG frameworks, ensuring efficient computation and performance while utilizing the extended context window for improved retrieval and generation tasks.'},
  {'affiliation': 'Academic Institution',
   'name': 'Professor Linguist',
   'role': 'Linguistic

## Expert Dialog

Each wikipedia writer is primed to role-play using the perspectives presented above. It will ask a series of questions of a second "domain expert" with access to a search engine. This generate content to generate a refined outline as well as an updated index of reference documents.

### Interview State

The conversation is cyclic, so we will construct it within its own graph. The State will contain messages, the reference docs, and the editor (with its own "persona") to make it easy to parallelize these conversations.


In [15]:
from langgraph.graph import StateGraph, END
from typing_extensions import TypedDict
from langchain_core.messages import AnyMessage
from typing import Annotated, Sequence


def add_messages(left, right):
    if not isinstance(left, list):
        left = [left]
    if not isinstance(right, list):
        right = [right]
    return left + right


def update_references(references, new_references):
    if not references:
        references = {}
    references.update(new_references)
    return references


def update_editor(editor, new_editor):
    # Can only set at the outset
    if not editor:
        return new_editor
    return editor


class InterviewState(TypedDict):
    messages: Annotated[List[AnyMessage], add_messages]
    references: Annotated[Optional[dict], update_references]
    editor: Annotated[Optional[Editor], update_editor]

# Dialog Roles

The graph will have two participants: the wikipedia editor (generate_question), who asks questions based on its assigned role, and a domain expert (`gen_answer_chain), who uses a search engine to answer the questions as accurately as possible.


In [16]:
from langchain_core.prompts import MessagesPlaceholder
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, ToolMessage


gen_qn_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an experienced Wikipedia writer and want to edit a specific page. \
Besides your identity as a Wikipedia writer, you have a specific focus when researching the topic. \
Now, you are chatting with an expert to get information. Ask good questions to get more useful information.

When you have no more questions to ask, say "Thank you so much for your help!" to end the conversation.\
Please only ask one question at a time and don't ask what you have asked before.\
Your questions should be related to the topic you want to write.
Be comprehensive and curious, gaining as much unique insight from the expert as possible.\

Stay true to your specific perspective:

{persona}""",
        ),
        MessagesPlaceholder(variable_name="messages", optional=True),
    ]
)


def tag_with_name(ai_message: AIMessage, name: str) -> AIMessage:
    ai_message.name = name
    return ai_message


def swap_roles(state: InterviewState, name: str) -> InterviewState:

    # Normalize name
    name = cleanup_name(name)

    print(f'Swapping roles for {name}')

    converted = []
    for message in state["messages"]:
        if isinstance(message, AIMessage) and message.name != name:
            message = HumanMessage(**message.dict(exclude={"type"}))
        converted.append(message)
    
    print(f'Converted messages for {name} while swapping roles: {len(converted)} messages')

    return {"messages": converted}


@as_runnable
async def generate_question(state: InterviewState) -> InterviewState:
    editor = state["editor"]

    name = cleanup_name(editor.name)

    print(f'Generating question for {name}')

    gn_chain = (
        RunnableLambda(swap_roles).bind(name=name)
        | gen_qn_prompt.partial(persona=editor.persona)
        | fast_llm
        | RunnableLambda(tag_with_name).bind(name=name)
    )
    result:AIMessage = await gn_chain.ainvoke(state)

    print(f'Generated question for {name}')
    return {"messages": [result]}

In [17]:
messages = [
    HumanMessage(f"So you said you were writing an article on {example_topic}?")
]
question = await generate_question.ainvoke(
    {
        "editor": perspectives.editors[0],
        "messages": messages,
    }
)

question["messages"][0]

Generating question for DrDataScientist
Swapping roles for DrDataScientist
Converted messages for DrDataScientist while swapping roles: 1 messages
Generated question for DrDataScientist


AIMessage(content="Yes, that's correct. I am researching the impact of million-plus token context window language models on the Retrieval-Augmented Generation (RAG) approach. These language models have the capability to analyze vast amounts of text data within a wide context window, which can potentially enhance the performance of RAG systems. I am interested in understanding how these models influence information retrieval and generation processes within the RAG framework. Is there any specific aspect of this topic that you would like me to elaborate on?", response_metadata={'token_usage': {'completion_tokens': 99, 'prompt_tokens': 238, 'total_tokens': 337}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3bc1b5746c', 'finish_reason': 'stop', 'logprobs': None}, name='DrDataScientist')

### Answer questions

The `gen_answer_chain` first generates queries (query expansion) to answer the editor's question, then responds with citations.


In [18]:
class Queries(BaseModel):
    queries: List[str] = Field(
        description="Comprehensive list of search engine queries to answer the user's questions.",
    )


gen_queries_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful research assistant. Query the search engine to answer the user's questions.\n{format_instructions}",
        ),
        MessagesPlaceholder(variable_name="messages", optional=True),
    ]
)

queries_parser = PydanticOutputParser(pydantic_object=Queries)

gen_queries_chain = gen_queries_prompt.partial(format_instructions=queries_parser.get_format_instructions()) | fast_llm | queries_parser

In [19]:

queries = await gen_queries_chain.ainvoke(
    {"messages": [HumanMessage(content=question["messages"][0].content)]}
)

queries

Queries(queries=['Impact of million-plus token context window language models on Retrieval-Augmented Generation (RAG) approach', 'Capabilities of million-plus token context window language models in analyzing vast amounts of text data within a wide context window', 'Potential enhancements in performance of RAG systems due to million-plus token context window language models', 'Influence of million-plus token context window language models on information retrieval process in RAG framework', 'Influence of million-plus token context window language models on generation process in RAG framework'])

In [20]:

class AnswerWithCitations(BaseModel):
    answer: str = Field(
        description="Comprehensive answer to the user's question with citations.",
    )
    cited_urls: List[str] = Field(
        description="List of urls cited in the answer.",
    )

    @property
    def as_str(self) -> str:
        return f"{self.answer}\n\nCitations:\n\n" + "\n".join(
            f"[{i+1}]: {url}" for i, url in enumerate(self.cited_urls)
        )


gen_answer_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an expert who can use information effectively. You are chatting with a Wikipedia writer who wants\
 to write a Wikipedia page on the topic you know. You have gathered the related information and will now use the information to form a response.

Make your response as informative as possible and make sure every sentence is supported by the gathered information.
Each response must be backed up by a citation from a reliable source, formatted as a footnote, reproducing the URLS after your response.
{format_instructions}""",
        ),
        MessagesPlaceholder(variable_name="messages", optional=True),
    ]
)

ac_parser = PydanticOutputParser(pydantic_object=AnswerWithCitations)

gen_answer_chain = gen_answer_prompt.partial(format_instructions=ac_parser.get_format_instructions()) | fast_llm | ac_parser 

# .with_structured_output(
#     AnswerWithCitations, include_raw=True
# ).with_config(run_name="GenerateAnswer")
                                             

In [21]:
from langchain_community.utilities.duckduckgo_search import DuckDuckGoSearchAPIWrapper
from langchain_core.tools import tool

# DDG 
search_engine = DuckDuckGoSearchAPIWrapper()

@tool
async def search_engine(query: str):
    """Search engine to the internet."""

    print(f"Searching DuckDuckGo for [{query}]")

    results = DuckDuckGoSearchAPIWrapper()._ddgs_text(query)

    print(f"Got search engine results: {len(results)} for [{query}]")
    
    return [{"content": r["body"], "url": r["href"]} for r in results]

In [75]:
from langchain_core.runnables import RunnableConfig
import json, re


async def gen_answer(
    state: InterviewState,
    config: Optional[RunnableConfig] = None,
    name: str = "SubjectMatterExpert",
    max_str_len: int = 15000,
):
    name = cleanup_name(name)

    print(f'Generating answers for [{name}]')


    swapped_state = swap_roles(state, name)  # Convert all other AI messages
    
    queries:Queries = await gen_queries_chain.ainvoke(swapped_state)

    print(f"Got {len(queries.queries)} search engine queries for [{name}]")

    query_results = await search_engine.abatch(
        queries.queries, config, return_exceptions=True
    )
    successful_results = [
        res for res in query_results if not isinstance(res, Exception)
    ]

    print(f"Got {len(successful_results)} search engine results for [{name}]")

    all_query_results = {
        res["url"]: res["content"] for results in successful_results for res in results
    }

    # We could be more precise about handling max token length if we wanted to here
    dumped = json.dumps(all_query_results)[:max_str_len]
    
    ai_message: AIMessage = str(queries)
    # print(f"Got {ai_message} for [{name}]")
    
    # tool_call = queries["raw"].additional_kwargs["tool_calls"][0]
    # tool_id = tool_call["id"]

    # tool_message = ToolMessage(tool_call_id=tool_id, content=dumped)
    tool_message = HumanMessage(content=dumped)

    swapped_state["messages"].extend([ai_message, tool_message])
    
    # Only update the shared state with the final answer to avoid
    # polluting the dialogue history with intermediate messages
    try:
        generated: AnswerWithCitations = await gen_answer_chain.ainvoke(swapped_state)
    except Exception as e:
        print(f"Error generating answer for [{name}] - {e}")
        generated = AnswerWithCitations(answer="", cited_urls=[])
    
    cited_urls = set(generated.cited_urls)
    
    # Save the retrieved information to a the shared state for future reference
    cited_references = {k: v for k, v in all_query_results.items() if k in cited_urls}
    
    formatted_message = AIMessage(name=name, content=generated.as_str)

    print(f'Finished generating answer for [{name}]')
    return {"messages": [formatted_message], "references": cited_references}
    

In [67]:

example_answer = await gen_answer(
    {"messages": [HumanMessage(content=question["messages"][0].content)]}
)
example_answer["messages"][-1].content

Generating answers for [SubjectMatterExpert]
Swapping roles for SubjectMatterExpert
Converted messages for SubjectMatterExpert while swapping roles: 1 messages
Got 5 search engine queries for [SubjectMatterExpert]
Searching DuckDuckGo for [Impact of million-plus token context window language models on the Retrieval-Augmented Generation (RAG) approach]
Got search engine results: 5 for [Impact of million-plus token context window language models on the Retrieval-Augmented Generation (RAG) approach]
Searching DuckDuckGo for [Capabilities of million-plus token context window language models in analyzing vast amounts of text data within a wide context window]
Got search engine results: 5 for [Capabilities of million-plus token context window language models in analyzing vast amounts of text data within a wide context window]
Searching DuckDuckGo for [Potential enhancements in performance of RAG systems due to million-plus token context window language models]
Got search engine results: 5 fo

'The introduction of Gemini 1.5, which features a 1 million token context window, has sparked discussions in the AI community regarding its impact on Retrieval-Augmented Generation (RAG). Some predict a negative effect on RAG systems due to the high costs associated with fine-tuning long context windows and potential catastrophic values introduced by new token positions. However, research has shown that RAG significantly improves the performance of Large Language Models (LLMs) by providing abundant data for GenAI applications. LongRoPE and Position Interpolation (PI) are examples of techniques that extend context window sizes of pre-trained LLMs, allowing them to process more tokens with minimal fine-tuning. These advancements aim to address the limitations of fixed context windows by combining the power of language models with information retrieval techniques. The ability of LLMs to process and generate coherent text is crucial, and Dual Chunk Attention (DCA) has been proposed to supp

# Construct the Interview Graph

Now that we've defined the editor and domain expert, we can compose them in a graph.


In [76]:
max_num_turns = 5


def route_messages(state: InterviewState, name: str = "SubjectMatterExpert"):

    name = cleanup_name(name)

    print(f'Routing messages for [{name}]')

    messages = state["messages"]
    num_responses = len(
        [m for m in messages if isinstance(m, AIMessage) and m.name == name]
    )

    if num_responses >= max_num_turns:
        return END
    
    last_question = messages[-2]
    if last_question.content.endswith("Thank you so much for your help!"):
        return END
    
    print(f'Continue asking question for [{name}] as this is not the last end of the conversation')
    return "ask_question"


builder = StateGraph(InterviewState)

builder.add_node("ask_question", generate_question)
builder.add_node("answer_question", gen_answer)
builder.add_conditional_edges("answer_question", route_messages)
builder.add_edge("ask_question", "answer_question")

builder.set_entry_point("ask_question")
interview_graph = builder.compile().with_config(run_name="Conduct Interviews")

In [77]:
from IPython.display import Image

# comment out if you have not installed pygraphviz
# Image(interview_graph.get_graph().draw_png())

In [78]:

final_step = None

initial_state = {
    "editor": perspectives.editors[0],
    "messages": [
        AIMessage(
            content=f"So you said you were writing an article on {example_topic}?",
            name="SubjectMatterExpert",
        )
    ],
}
async for step in interview_graph.astream(initial_state):
    name = next(iter(step))
    print(name)
    print(f"Processing step: {name}")
    print("-- ", str(step[name]["messages"])[:300])
    if END in step:
        final_step = step

Generating question for DrDataScientist
Swapping roles for DrDataScientist
Converted messages for DrDataScientist while swapping roles: 1 messages
Generated question for DrDataScientist
ask_question
Processing step: ask_question
--  [AIMessage(content="Yes, that's correct. I'm focusing on how million-plus token context window language models impact the Retrieval-Augmented Generation (RAG) approach. As a researcher in this field, I'm particularly interested in understanding the effectiveness of these models in enhancing RAG capa
Generating answers for [SubjectMatterExpert]
Swapping roles for SubjectMatterExpert
Converted messages for SubjectMatterExpert while swapping roles: 2 messages
Got 4 search engine queries for [SubjectMatterExpert]
Searching DuckDuckGo for [Impact of million-plus token context window language models on RAG]
Got search engine results: 5 for [Impact of million-plus token context window language models on RAG]
Searching DuckDuckGo for [Effectiveness of million-plus 

In [79]:
final_state = next(iter(final_step.values()))


In [80]:
final_state

{'messages': [AIMessage(content='So you said you were writing an article on Impact of million-plus token context window language models on RAG?', name='SubjectMatterExpert'),
  AIMessage(content="Yes, that's correct. I'm focusing on how million-plus token context window language models impact the Retrieval-Augmented Generation (RAG) approach. As a researcher in this field, I'm particularly interested in understanding the effectiveness of these models in enhancing RAG capabilities and how they influence information retrieval and generation processes. Do you have insights or data that could help me understand the specific ways in which these language models impact RAG?", response_metadata={'token_usage': {'completion_tokens': 87, 'prompt_tokens': 243, 'total_tokens': 330}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3bc1b5746c', 'finish_reason': 'stop', 'logprobs': None}, name='DrDataScientist'),
  AIMessage(content='The introduction of language models like Gemini 1.5, with 

## Refine Outline

At this point in STORM, we've conducted a large amount of research from different perspectives. It's time to refine the original outline based on these investigations. Below, create a chain using the LLM with a long context window to update the original outline.


In [85]:
refine_outline_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are a Wikipedia writer. You have gathered information from experts and search engines. Now, you are refining the outline of the Wikipedia page. \
You need to make sure that the outline is comprehensive and specific. \
Topic you are writing about: {topic} 

Old outline:

{old_outline}
""",
        ),
        (
            "user",
            "Refine the outline based on your conversations with subject-matter experts:\n\nConversations:\n\n{conversations}\n\n{format_instructions}\n\nWrite the refined Wikipedia outline:",
        ),
    ]
)


# Using turbo preview since the context can get quite long
refine_outline_chain = refine_outline_prompt.partial(format_instructions=outline_parser.get_format_instructions()) | long_context_llm | outline_parser

In [86]:
refined_outline = refine_outline_chain.invoke(
    {
        "topic": example_topic,
        "old_outline": initial_outline.as_str,
        "conversations": "\n\n".join(
            f"### {m.name}\n\n{m.content}" for m in final_state["messages"]
        ),
    }
)

In [None]:
print(refined_outline.as_str)

# Impact of million-plus token context window language models on RAG

## Introduction

Overview of how large context windows in language models, specifically exemplified by Gemini 1.5 with a 1 million token context window, are influencing the Retrieval-Augmented Generation (RAG) approach in natural language processing (NLP). Discussion on the potential benefits and concerns raised within the AI community.

## Enhancements to RAG

Explanation on how integrating million-plus token context window language models can enhance the capabilities of the Retrieval-Augmented Generation (RAG) framework. Insights into the potential improvements in response generation, understanding of context, and overall performance.

## Practical Application: GraphRAG by Microsoft Research

Case study on GraphRAG, developed by Microsoft Research, showcasing the practical application and impact of combining Large Language Models (LLMs) with tailored retrieval mechanisms to construct knowledge graphs from private d

In [None]:
## Generate Article

In [87]:
from langchain_core.documents import Document

from langchain_community.vectorstores import SKLearnVectorStore
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
reference_docs = [
    Document(page_content=v, metadata={"source": k})
    for k, v in final_state["references"].items()
]

print(f"Number of references: {len(reference_docs)}")

# This really doesn't need to be a vectorstore for this size of data.
# It could just be a numpy matrix. Or you could store documents
# across requests if you want.
vectorstore = SKLearnVectorStore.from_documents(
    reference_docs,
    embedding=embeddings,
)
retriever = vectorstore.as_retriever(k=10)

Number of references: 26


In [88]:
retriever.invoke("What's a long context LLM anyway?")

[Document(page_content="StreamingLLM is an innovative framework that allows large language models to handle text of infinite length without the need for finetuning. This technique preserves attention sinks to maintain a near-normal attention score distribution. When the sequence of the conversation with the LLM surpasses the model's context length, StreamingLLM ...", metadata={'id': 'ef137c5c-7b03-4e85-a336-5030048b329a', 'source': 'https://bdtechtalks.com/2023/11/27/streamingllm/'}),
 Document(page_content='The next stage of LLMs in production is all about making responses hyper-specific: to a dataset, to a user, to a use-case, even to a specific invocation.. This is typically achieved using one of 3 basic techniques:. Context-window-stuffing. RAG (Retrieval Augmented Generation). Fine-tuning. (If none of these mean anything to you - consider subscribing to the newsletter - I will cover each ...', metadata={'id': '67a6a276-7fd6-487b-ad7d-9b9461d959d6', 'source': 'https://ai88.substack

#### Generate Sections

Now you can generate the sections using the indexed docs.


In [89]:
class SubSection(BaseModel):
    subsection_title: str = Field(..., title="Title of the subsection")
    content: str = Field(
        ...,
        title="Full content of the subsection. Include [#] citations to the cited sources where relevant.",
    )

    @property
    def as_str(self) -> str:
        return f"### {self.subsection_title}\n\n{self.content}".strip()


class WikiSection(BaseModel):
    section_title: str = Field(..., title="Title of the section")
    content: str = Field(..., title="Full content of the section")
    subsections: Optional[List[Subsection]] = Field(
        default=None,
        title="Titles and descriptions for each subsection of the Wikipedia page.",
    )
    citations: List[str] = Field(default_factory=list)

    @property
    def as_str(self) -> str:
        subsections = "\n\n".join(
            subsection.as_str for subsection in self.subsections or []
        )
        citations = "\n".join([f" [{i}] {cit}" for i, cit in enumerate(self.citations)])
        return (
            f"## {self.section_title}\n\n{self.content}\n\n{subsections}".strip()
            + f"\n\n{citations}".strip()
        )


section_writer_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an expert Wikipedia writer. Complete your assigned WikiSection from the following outline:\n\n"
            "{outline}\n\nCite your sources, using the following references:\n\n<Documents>\n{docs}\n<Documents>",
        ),
        ("user", "Write the full WikiSection for the {section} section.\n{format_instructions}"),
    ]
)


async def retrieve(inputs: dict):
    docs = await retriever.ainvoke(inputs["topic"] + ": " + inputs["section"])
    formatted = "\n".join(
        [
            f'<Document href="{doc.metadata["source"]}"/>\n{doc.page_content}\n</Document>'
            for doc in docs
        ]
    )
    return {"docs": formatted, **inputs}

wiki_parser = PydanticOutputParser(pydantic_object=WikiSection)

section_writer = (
    retrieve
    | section_writer_prompt.partial(format_instructions=wiki_parser.get_format_instructions())
    | long_context_llm
    | wiki_parser
)

In [90]:
section = await section_writer.ainvoke(
    {
        "outline": refined_outline.as_str,
        "section": refined_outline.sections[1].section_title,
        "topic": example_topic,
    }
)
print(section.as_str)

## Background

Million-plus token context window language models refer to large language models with the capability to process a vast amount of data within a single context window. These models have revolutionized natural language processing tasks by enabling deeper understanding of text and context. In the context of the Retrieval-Augmented Generation (RAG) framework, the use of million-plus token context window language models has significant implications. By allowing the models to consider a larger number of tokens simultaneously, RAG benefits from enhanced information retrieval and generation capabilities.[0] https://www.allabtai.com/rag-vs-context-window/
 [1] https://medium.com/@crskilpatrick807/context-windows-the-short-term-memory-of-large-language-models-ab878fc6f9b5
 [2] https://medium.com/@jm_51428/long-context-window-models-vs-rag-a73c35a763f2
 [3] https://ai.plainenglish.io/context-window-size-and-language-model-performance-balancing-act-2ae2964e3ec1


#### Generate final article

Now we can rewrite the draft to appropriately group all the citations and maintain a consistent voice.


In [91]:
from langchain_core.output_parsers import StrOutputParser

writer_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an expert Wikipedia author. Write the complete wiki article on {topic} using the following section drafts:\n\n"
            "{draft}\n\nStrictly follow Wikipedia format guidelines.",
        ),
        (
            "user",
            'Write the complete Wiki article using markdown format. Organize citations using footnotes like "[1]","" avoiding duplicates in the footer. Include URLs in the footer.',
        ),
    ]
)

writer = writer_prompt | long_context_llm | StrOutputParser()

In [92]:
for tok in writer.stream({"topic": example_topic, "draft": section.as_str}):
    print(tok, end="")

# Impact of Million-Plus Token Context Window Language Models on RAG

## Background

Million-plus token context window language models refer to large language models with the capability to process a vast amount of data within a single context window. These models have revolutionized natural language processing tasks by enabling deeper understanding of text and context. In the context of the Retrieval-Augmented Generation (RAG) framework, the use of million-plus token context window language models has significant implications. By allowing the models to consider a larger number of tokens simultaneously, RAG benefits from enhanced information retrieval and generation capabilities.[1][2][3]

## Implications on RAG

The integration of million-plus token context window language models within the RAG framework has led to notable improvements in information retrieval and text generation processes. These models can capture a broader context in a single window, enabling RAG to access and proces

## Final Flow

Now it's time to string everything together. We will have 6 main stages in sequence:
.

1. Generate the initial outline + perspectives
2. Batch converse with each perspective to expand the content for the article
3. Refine the outline based on the conversations
4. Index the reference docs from the conversations
5. Write the individual sections of the article
6. Write the final wiki

The state tracks the outputs of each stage.


In [93]:
class ResearchState(TypedDict):
    topic: str
    outline: Outline
    editors: List[Editor]
    interview_results: List[InterviewState]
    # The final sections output
    sections: List[WikiSection]
    article: str

In [94]:
import asyncio


async def initialize_research(state: ResearchState):
    topic = state["topic"]
    coros = (
        generate_outline_direct.ainvoke({"topic": topic}),
        survey_subjects.ainvoke(topic),
    )
    results = await asyncio.gather(*coros)
    return {
        **state,
        "outline": results[0],
        "editors": results[1].editors,
    }


async def conduct_interviews(state: ResearchState):
    topic = state["topic"]
    initial_states = [
        {
            "editor": editor,
            "messages": [
                AIMessage(
                    content=f"So you said you were writing an article on {topic}?",
                    name="SubjectMatterExpert",
                )
            ],
        }
        for editor in state["editors"]
    ]
    # We call in to the sub-graph here to parallelize the interviews
    interview_results = await interview_graph.abatch(initial_states)

    return {
        **state,
        "interview_results": interview_results,
    }


def format_conversation(interview_state):
    messages = interview_state["messages"]
    convo = "\n".join(f"{m.name}: {m.content}" for m in messages)
    return f'Conversation with {interview_state["editor"].name}\n\n' + convo


async def refine_outline(state: ResearchState):
    convos = "\n\n".join(
        [
            format_conversation(interview_state)
            for interview_state in state["interview_results"]
        ]
    )

    updated_outline = await refine_outline_chain.ainvoke(
        {
            "topic": state["topic"],
            "old_outline": state["outline"].as_str,
            "conversations": convos,
        }
    )
    return {**state, "outline": updated_outline}


async def index_references(state: ResearchState):
    all_docs = []
    for interview_state in state["interview_results"]:
        reference_docs = [
            Document(page_content=v, metadata={"source": k})
            for k, v in interview_state["references"].items()
        ]
        all_docs.extend(reference_docs)
    await vectorstore.aadd_documents(all_docs)
    return state


async def write_sections(state: ResearchState):
    outline = state["outline"]
    sections = await section_writer.abatch(
        [
            {
                "outline": refined_outline.as_str,
                "section": section.section_title,
                "topic": state["topic"],
            }
            for section in outline.sections
        ]
    )
    return {
        **state,
        "sections": sections,
    }


async def write_article(state: ResearchState):
    topic = state["topic"]
    sections = state["sections"]
    draft = "\n\n".join([section.as_str for section in sections])
    article = await writer.ainvoke({"topic": topic, "draft": draft})
    return {
        **state,
        "article": article,
    }

#### Create the graph


In [95]:
builder_of_storm = StateGraph(ResearchState)

nodes = [
    ("init_research", initialize_research),
    ("conduct_interviews", conduct_interviews),
    ("refine_outline", refine_outline),
    ("index_references", index_references),
    ("write_sections", write_sections),
    ("write_article", write_article),
]
for i in range(len(nodes)):
    name, node = nodes[i]
    builder_of_storm.add_node(name, node)
    if i > 0:
        builder_of_storm.add_edge(nodes[i - 1][0], name)

builder_of_storm.set_entry_point(nodes[0][0])
builder_of_storm.set_finish_point(nodes[-1][0])
storm = builder_of_storm.compile()

In [96]:
async for step in storm.astream(
    {
        "topic": "Building better slack bots using LLMs",
    }
):
    name = next(iter(step))
    print(name)
    print("-- ", str(step[name])[:300])
    if END in step:
        results = step

Survey Subjects for Topic: Building better slack bots using LLMs
Retrieved 9 docs for Topic: Building better slack bots using LLMs
init_research
--  {'topic': 'Building better slack bots using LLMs', 'outline': Outline(page_title='Building Better Slack Bots Using Large Language Models (LLMs)', sections=[Section(section_title='Introduction', description='Overview of using Large Language Models (LLMs) to enhance Slack bots.', subsections=None), Se
Generating question for AlexisChen
Generating question for EthanPatel
Generating question for SashaRodriguez
Generating question for OliverKim
Generating question for LunaChang
Swapping roles for AlexisChen
Converted messages for AlexisChen while swapping roles: 1 messages
Swapping roles for EthanPatel
Converted messages for EthanPatel while swapping roles: 1 messages
Swapping roles for SashaRodriguez
Converted messages for SashaRodriguez while swapping roles: 1 messages
Swapping roles for OliverKim
Converted messages for OliverKim while swappi

In [97]:
article = results[END]["article"]

## Render the Wiki

Now we can render the final wiki page!


In [98]:
from IPython.display import Markdown

# We will down-header the sections to create less confusion in this notebook
Markdown(article.replace("\n#", "\n##"))

# Building Better Slack Bots Using LLMs

### Impact of Million-Plus Token Context Window Language Models on RAG

The introduction section provides an overview of the impact of million-plus token context window language models on the Retrieval-Augmented Generation (RAG) framework. These advanced language models, with the capability to process over a million tokens in their context window, have revolutionized natural language processing tasks. In the context of RAG, these models play a crucial role in enhancing information retrieval and generation processes by enabling a deeper understanding of multimodal inputs and optimizing the handling of longer inputs. The introduction sets the stage for exploring the technical aspects, enhancements in information retrieval and generation, advancements in extending context windows, applications, challenges, and considerations associated with integrating million-plus token context window language models with the RAG framework.

### Understanding Large Language Models (LLMs)

Large Language Models (LLMs) have revolutionized the field of natural language processing with their ability to understand, generate, and manipulate text on a massive scale. These models, such as Gemini 1.5, have significantly impacted the Retrieval-Augmented Generation (RAG) framework by enhancing information retrieval and generation processes.

#### Overview of Large Language Models (LLMs)

Large Language Models (LLMs) are advanced AI models that leverage massive amounts of training data to understand and generate text. They have the capacity to process millions of tokens in a single context window, leading to improved performance in various NLP tasks.

#### Significance of LLMs in Conversational AI

LLMs play a crucial role in advancing Conversational AI capabilities by enabling chatbots and virtual assistants to comprehend and generate human-like text. Through prompt engineering, LLM-based chatbots can be guided to exhibit desired behaviors in conversational settings.

#### StreamingLLM Framework

The StreamingLLM framework is an innovative approach that allows large language models to handle text sequences of infinite length without the need for fine-tuning. By preserving attention sinks and maintaining a near-normal attention score distribution, StreamingLLM ensures seamless processing of lengthy conversations.[3]

### Benefits of Incorporating LLMs in Slack Bots

Large Language Models (LLMs) have revolutionized the capabilities of Slack bots, enabling them to engage in more natural and context-aware conversations with users. By incorporating LLMs into Slack bots, several benefits arise:

#### Enhanced Conversational Abilities

LLMs empower Slack bots to understand nuances in user inputs, provide more relevant responses, and mimic human-like conversations, leading to improved user experience and interaction.

#### Adaptability to Various User Inputs

With LLMs, Slack bots can adapt to a wide range of user inputs, allowing them to handle diverse queries and requests effectively, enhancing the bot's versatility and utility.

#### Improved Context Awareness

LLMs enable Slack bots to maintain context across conversations, making them capable of remembering previous interactions and providing coherent responses, which enhances the overall conversational flow.[4]

### Challenges and Considerations

Implementing million-plus token context window language models in the RAG framework presents several challenges and considerations that need to be addressed for optimal performance and effectiveness.

#### Model Complexity

One of the primary challenges is the increased complexity of million-plus token models, requiring significant computational resources and memory capacity. This complexity can hinder the efficiency of training and inference processes.

#### Fine-Tuning and Adaptation

Fine-tuning million-plus token models for specific tasks within the RAG framework can be a non-trivial task. It requires substantial expertise and time to adapt the models effectively, potentially limiting their widespread application.

#### Data Requirements

Utilizing million-plus token models often demands large volumes of high-quality training data to achieve optimal performance. Acquiring, preprocessing, and managing such extensive datasets can be resource-intensive and challenging.

#### Interpretability and Bias

The interpretability of outputs from million-plus token models can be challenging, as the models' complexity may obscure the reasoning behind their generated responses. Moreover, these models may inherit biases present in the training data, necessitating careful mitigation strategies.

#### Ethical and Legal Implications

Deploying large language models in the RAG framework raises ethical concerns related to misinformation, privacy, and potential misuse. Addressing these implications requires robust governance frameworks and adherence to legal regulations.[5]

### Best Practices for Building Slack Bots with LLMs

Building Slack bots with Large Language Models (LLMs) requires attention to several best practices to ensure optimal performance and user experience. By following these guidelines, developers can create more effective and engaging bots that leverage the power of LLMs to enhance conversational interactions.

#### 1. Data Preparation

Ensure that the training data used for fine-tuning LLMs on Slack messages is relevant and representative of the conversations the bot is expected to engage in. Cleaning and preprocessing the data is essential to remove noise and irrelevant information.

#### 2. Fine-Tuning Process

Use HuggingFace's libraries for fine-tuning LLMs on Slack messages, following best practices to adapt pre-trained models effectively. Consider the specific use case and desired conversational style when fine-tuning the LLM for Slack bot deployment.

#### 3. Prompt Engineering

Implement prompt engineering techniques to guide the behavior of the LLM-powered Slack bot. Crafting appropriate prompts can influence the bot's responses and improve the quality of generated text in conversations.

#### 4. Continuous Monitoring and Evaluation

Regularly monitor the performance of the LLM-powered Slack bot in real-world interactions. Collect feedback from users and evaluate the bot's responses to identify areas for improvement and refine the conversational capabilities.[6]

### Case Studies

Various case studies have demonstrated the practical applications of integrating million-plus token context window language models with the RAG framework. These case studies highlight the effectiveness of utilizing large language models for enhancing information retrieval and generation processes.

#### Medical Data Analysis

In a case study focused on medical data analysis, researchers used million-plus token context window language models to process vast amounts of medical literature and patient records. By leveraging the extended context windows, the models improved the accuracy of generating summaries and answering complex medical questions within the RAG system.

#### Financial News Summarization

Another case study explored the use of million-plus token context window language models for summarizing financial news articles. By training the models on a diverse range of financial data and news sources, the system was able to generate concise and informative summaries that captured the key insights from lengthy articles, showcasing the efficiency of these models in information condensation.

#### Legal Document Analysis

In a legal document analysis case study, million-plus token context window language models were applied to extract relevant information from legal texts and assist in generating case summaries. The models' ability to consider a broader context window enabled more accurate retrieval of legal precedents and facilitated the creation of coherent summaries, demonstrating their utility in the legal domain.[7]

### References

[1] https://yellow.ai/blog/large-language-models/
[2] https://www.analyticsvidhya.com/blog/2023/07/llms-in-conversational-ai/
[3] https://bdtechtalks.com/2023/11/27/streamingllm/
[4] https://opendatascience.com/fine-tuning-llms-on-slack-messages/
[5] https://towardsai.net/p/machine-learning/how-to-create-your-own-llm-powered-slackbot-with-langchain-on-your-own-private-data
[6] https://odsc.medium.com/fine-tuning-llms-on-slack-messages-f580d4996cc3
[7] https://datasciencedojo.com/blog/llm-chatbot/