In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import dspy

lm = dspy.LM("ollama_chat/gpt-oss:20b", api_base="http://localhost:11434", api_key="fake")
dspy.configure(lm=lm)

Call the LM directly to check that the connection works:

In [3]:
lm(messages=[{"role": "user", "content": "Hi! How many 'r's are there in strawberry?"}])  

["There are **3** 'r's in the word *strawberry*."]

# 1. Inline signatures

Declare a signature inline as a string: input fields go to the left of the arrow (`->`) and output fields to the right.
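
To make the string format concrete, here is a toy parser that splits an inline signature into its input and output fields. It is only an illustration of what the string encodes, not DSPy's own parser; the function name `parse_inline_signature` is hypothetical, and the sketch does not handle types that contain commas.

```python
# Toy illustration only: this is NOT DSPy's parser, just a picture of what
# an inline signature string encodes. Types containing commas are not handled.

def parse_inline_signature(spec: str) -> dict[str, dict[str, str]]:
    """Split 'a: str, b -> c: str' into input and output fields with type names."""
    inputs_part, outputs_part = spec.split("->")

    def parse_fields(part: str) -> dict[str, str]:
        fields = {}
        for item in part.split(","):
            name, _, type_name = item.partition(":")
            # In this sketch, a field without an annotation is treated as "str".
            fields[name.strip()] = type_name.strip() or "str"
        return fields

    return {"inputs": parse_fields(inputs_part), "outputs": parse_fields(outputs_part)}


parsed = parse_inline_signature("document: str, question: str -> answer: str")
print(parsed)
```

The same split applies to the shorter form `'full_document -> summary'` used in the next section, where both fields fall back to `str`.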

## Chain of Thought
gpt-oss has a 128k-token context window, so an entire paper fits in a single prompt. Let's have it summarize one.

In [4]:
import os

os.makedirs("../docs", exist_ok=True)  # create the folder only if it is missing

In [5]:
!wget https://arxiv.org/pdf/2505.20286 -O "../docs/alita_paper.pdf"

--2025-08-09 21:48:10--  https://arxiv.org/pdf/2505.20286
Resolving arxiv.org (arxiv.org)... 151.101.195.42, 151.101.67.42, 151.101.131.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.195.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1113373 (1.1M) [application/pdf]
Saving to: ‘../docs/alita_paper.pdf’


2025-08-09 21:48:10 (8.36 MB/s) - ‘../docs/alita_paper.pdf’ saved [1113373/1113373]



In [6]:
from llama_index.core import SimpleDirectoryReader

docs = SimpleDirectoryReader("../docs").load_data()

In [7]:
from IPython.display import display, Markdown
display(Markdown(str(docs[0])))

Doc ID: 61884915-871e-40fd-a2f2-28b59641dd22
Text: arXiv:2505.20286v1  [cs.AI]  26 May 2025 ALITA : G ENERALIST
AGENT ENABLING SCALABLE AGENTIC REASONING WITH MINIMAL PREDEFINITION
AND MAXIMAL SELF -EVOLUTION Jiahao Qiu∗1, Xuan Qi∗2, Tongcheng
Zhang∗3, Xinzhe Juan3,4, Jiacheng Guo1, Yifu Lu1, Yimin Wang3,4, Zixin
Yao1, Qihan Ren3, Xun Jiang5, Xing Zhou5, Dongrui Liu3, Ling Yang1,
Yue Wu1, Kaixua...

In [8]:
doc_text = "\n\n".join([d.get_content() for d in docs])

In [10]:
summarize = dspy.ChainOfThought('full_document -> summary')
response = summarize(full_document = doc_text)

In [11]:
display(Markdown(response.summary))

Alita successfully generated a YouTube Video Subtitle Crawler MCP, executed it to retrieve the transcript of the specified 360 VR video, and extracted the correct number “100000000” mentioned by the narrator after the dinosaur scene. The workflow involved MCP brainstorming, web search for an open‑source tool, environment setup, code generation, MCP packaging, and final answer extraction.

In [12]:
display(Markdown(response.reasoning))

The case study demonstrates Alita’s workflow for extracting a specific piece of information from a YouTube 360 VR video. The process begins with an MCP Brainstorming step, where Alita identifies the need for a “YouTube Video Subtitle Crawler” MCP to automate subtitle extraction. The Web Agent then searches open‑source repositories and locates the `youtube-transcript-api` library on GitHub. The Manager Agent synthesizes this information, writes a Python function that uses the API to fetch the transcript, and generates environment setup instructions (conda environment creation and pip install). Once the code is executed in the prepared environment, the Manager Agent packages the function into the MCP, which is then used to scrape the subtitles from the target video. By parsing the transcript, Alita identifies the number “100000000” mentioned immediately after the dinosaurs are first shown. This answer matches the correct answer provided in the dataset.

In [13]:
dspy.inspect_history()





[2025-08-09T21:48:11.903709]

System message:

Your input fields are:
1. `full_document` (str):
Your output fields are:
1. `reasoning` (str): 
2. `summary` (str):
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## full_document ## ]]
{full_document}

[[ ## reasoning ## ]]
{reasoning}

[[ ## summary ## ]]
{summary}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Given the fields `full_document`, produce the fields `summary`.


User message:

[[ ## full_document ## ]]
arXiv:2505.20286v1  [cs.AI]  26 May 2025
ALITA : G ENERALIST AGENT ENABLING SCALABLE AGENTIC
REASONING WITH MINIMAL PREDEFINITION AND MAXIMAL
SELF -EVOLUTION
Jiahao Qiu∗1, Xuan Qi∗2, Tongcheng Zhang∗3, Xinzhe Juan3,4, Jiacheng Guo1, Yifu Lu1, Yimin Wang3,4, Zixin Yao1,
Qihan Ren3, Xun Jiang5, Xing Zhou5, Dongrui Liu3, Ling Yang1, Yue Wu1, Kaixuan Huang1, Shilong Liu1,
Hongru Wang6, Mengdi Wang1
1AI Lab, Prin
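
The history dump above shows that every field is wrapped in a `[[ ## name ## ]]` marker and that the reply ends with `[[ ## completed ## ]]`. As a rough sketch of how such a reply can be split back into named fields, here is a toy parser. The helper `split_fields` and the sample reply are hypothetical; DSPy handles this step internally.

```python
import re

# Toy parser for the field-marker layout seen in the history dump.
# Hypothetical helper; DSPy's own adapter does this work internally.
FIELD_MARKER = re.compile(r"^\[\[ ## (\w+) ## \]\][ \t]*$", re.MULTILINE)

def split_fields(completion: str) -> dict[str, str]:
    """Map each '[[ ## name ## ]]' marker to the text that follows it."""
    matches = list(FIELD_MARKER.finditer(completion))
    fields = {}
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(completion)
        name = current.group(1)
        if name != "completed":  # the closing marker carries no value
            fields[name] = completion[current.end():end].strip()
    return fields

# Made-up reply text, for illustration only.
example = """[[ ## reasoning ## ]]
The paper describes an agent.

[[ ## summary ## ]]
A short summary.

[[ ## completed ## ]]"""

print(split_fields(example))
```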

## DSPy Predict: a zero-vector-DB example

Adding instructions to the signature lets us constrain how the LLM replies.

> Not recommended in practice: passing the whole document on every call fills up the LLM's context window. This code only demonstrates what a long context window makes possible and how to attach instructions to a declarative DSPy signature.
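
Before sending a whole document, it can help to check roughly whether it fits. The sketch below uses the common rule of thumb of about four characters per token, which is an assumption and not a property of any particular tokenizer; the helper names and the reserved-output figure are also hypothetical.

```python
# Rough sketch: the 4-characters-per-token ratio is a rule of thumb, not a
# property of any specific tokenizer, and the budget figures are assumptions.
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Very rough token estimate derived from the character count."""
    return len(text) // CHARS_PER_TOKEN + 1

def fits_in_context(
    text: str,
    context_window: int = 128_000,
    reserved_for_output: int = 4_000,
) -> bool:
    """Check the estimate against the window, leaving room for the reply."""
    return estimate_tokens(text) <= context_window - reserved_for_output

sample = "word " * 1000  # 5000 characters
print(estimate_tokens(sample), fits_in_context(sample))
```

In the notebook, `fits_in_context(doc_text)` would give a quick sanity check before the call; for an exact count, use the tokenizer that matches the model.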

In [14]:
zero_vector_db = dspy.Predict(
    dspy.Signature(
        'document: str, question: str -> answer: str',
        instructions='Only use the document to answer the question and nothing else.'
    )
)

question = 'How does ALITA help LLMs to achieve autonomous reasoning?'
response = zero_vector_db(question=question, document=doc_text)

In [15]:
display(Markdown(response.answer))

ALITA enables large language models (LLMs) to perform autonomous reasoning by adopting a design philosophy of **minimal predefinition and maximal self‑evolution**.  
Key mechanisms include:

1. **MCP Brainstorming** – The LLM first introspects the task, identifies missing capabilities, and proposes new *Model‑agnostic Toolchains* (MCPs) that can be built on‑the‑fly.  
2. **Web Agent Retrieval** – It searches public code repositories and APIs to find existing libraries that can implement the proposed MCP, thereby avoiding the need for the model to write code from scratch.  
3. **Dynamic Environment Construction** – The LLM generates the necessary environment‑setup commands (e.g., conda or pip installs) and integrates them with the retrieved code.  
4. **Self‑Generated MCP Packaging** – The model packages the retrieved code and environment instructions into a reusable MCP, which can be invoked as a tool for the current task.  
5. **Iterative Refinement** – If the first attempt fails, the model can regenerate the MCP or adjust its reasoning chain, effectively learning from its own failures.  

By allowing the model to **create, evolve, and reuse tools in real time**, ALITA turns the LLM into an autonomous reasoner that no longer relies on a fixed set of pre‑built tools or workflows. This self‑evolving capability scales with the underlying model’s coding and reasoning power, enabling more complex, multi‑step problem solving without human‑written tool libraries.

Yes! The answer came straight from the full document, with no vector database involved.

In [16]:
dspy.inspect_history()





[2025-08-09T21:48:12.008800]

System message:

Your input fields are:
1. `document` (str): 
2. `question` (str):
Your output fields are:
1. `answer` (str):
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## document ## ]]
{document}

[[ ## question ## ]]
{question}

[[ ## answer ## ]]
{answer}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Only use the document to answer the question and nothing else.


User message:

[[ ## document ## ]]
arXiv:2505.20286v1  [cs.AI]  26 May 2025
ALITA : G ENERALIST AGENT ENABLING SCALABLE AGENTIC
REASONING WITH MINIMAL PREDEFINITION AND MAXIMAL
SELF -EVOLUTION
Jiahao Qiu∗1, Xuan Qi∗2, Tongcheng Zhang∗3, Xinzhe Juan3,4, Jiacheng Guo1, Yifu Lu1, Yimin Wang3,4, Zixin Yao1,
Qihan Ren3, Xun Jiang5, Xing Zhou5, Dongrui Liu3, Ling Yang1, Yue Wu1, Kaixuan Huang1, Shilong Liu1,
Hongru Wang6, Mengdi Wang1
1AI Lab, Princeton University 2IIIS, Tsi

# 2. Programmatic Signatures and how they integrate with the broader LLM ecosystem
DSPy focuses on LLM prompting, so use it for the LLM-centric steps of a pipeline (all of them, or only the final one). Everything else (tools, vector databases, and so on) can come from any other framework.

> We use LlamaIndex to provide vector indexing capabilities here.

Create a vector index over our documents:

In [17]:
from llama_index.core import VectorStoreIndex, Settings
from llama_index.embeddings.ollama import OllamaEmbedding

Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

index = VectorStoreIndex(docs, embed_model=Settings.embed_model)

base_retriever = index.as_retriever(similarity_top_k=6)

In [18]:
nodes = base_retriever.retrieve(question)
len(nodes)

6
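
The retriever returns scored node objects rather than plain strings. A small adapter like the one below can turn them into text passages before they are handed to a DSPy module. This is a sketch under assumptions: `nodes_to_passages` and `FakeNode` are hypothetical names, and the code relies only on a `score` attribute and a `get_content()` method, which LlamaIndex's `NodeWithScore` provides.

```python
from typing import Optional, Protocol

class ScoredNode(Protocol):
    """The two members this sketch relies on."""
    score: Optional[float]
    def get_content(self) -> str: ...

def nodes_to_passages(nodes: list[ScoredNode], min_score: float = 0.0) -> list[str]:
    """Keep the text of each retrieved node whose score clears the threshold."""
    return [
        node.get_content().strip()
        for node in nodes
        if node.score is None or node.score >= min_score
    ]

# Stand-in class so the sketch runs without a vector index (hypothetical).
class FakeNode:
    def __init__(self, text: str, score: float):
        self._text, self.score = text, score

    def get_content(self) -> str:
        return self._text

fake_nodes = [
    FakeNode(" Alita builds tools on demand. ", 0.82),
    FakeNode("Unrelated text", 0.10),
]
print(nodes_to_passages(fake_nodes, min_score=0.5))
```

With the real retriever, `nodes_to_passages(nodes)` would yield one string per retrieved chunk.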

In [None]:
from tqdm.notebook import tqdm

class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""
    context = dspy.InputField(desc="May contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="Often between 1-10 sentences.")

class RewriteQuestion(dspy.Signature):
    question = dspy.InputField()
    rewritten_questions: list[str] = dspy.OutputField(
        desc="Decompose this question into sub questions or rewrite the original user question if necessary to improve retrieval from a vector database. Otherwise return the original question."
    )
    
class RAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retriever = base_retriever
        self.rewriter = dspy.Predict(RewriteQuestion)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
        self.consolidate_answer = dspy.Predict(
            dspy.Signature(
                'original_question: str, sub_answers: list[str] -> consolidated_answer:str',
                instructions="Consolidate the sub answers into a coherent answer within a few paragraphs that answers the original question."
            )
        )
        
    def query_rewrite(self, question: str):
        return self.rewriter(question=question)
    
    def forward(self, question: str):
        question_rewrite = self.query_rewrite(question)
        sub_answers = []
        for q in tqdm(question_rewrite.rewritten_questions):
            print(f"\n----\nProcessing question: {q}")            
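            nodes = self.retriever.retrieve(q)  # the LlamaIndex component
            # Pass plain passage text to the LLM rather than node objects.
            context = [node.get_content() for node in nodes]
            sub_answer = self.generate_answer(context=context, question=q)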
            print(f"\nAnswer to {q}: {sub_answer}")
            print(f"\nSub question answer reasoning: {sub_answer.reasoning}\n----\n")
            sub_answers.append(sub_answer)
        prediction = self.consolidate_answer(original_question=question, sub_answers=sub_answers)
        return prediction

In [28]:
engine = RAG()
pred = engine(
    "How Agent become autonomous with Atila? Alita use Model context protocol? It write own tools? Tools deploy where?"
)

  0%|          | 0/4 [00:00<?, ?it/s]


----
Processing question: How does an agent become autonomous with Atila?

Answer to How does an agent become autonomous with Atila?: Prediction(
    reasoning='The question asks for a concise explanation of how an agent achieves autonomy using the Alita framework. Based on the provided documents, Alita’s autonomy comes from its minimal predefinition of tools and workflows, a manager that orchestrates planning and execution, and a self‑evolution mechanism that allows the agent to adapt and improve its reasoning over time. The answer should capture these key points in a short, factoid style.',
    answer='Alita achieves autonomy by using a minimal set of predefined tools and workflows, a manager that orchestrates planning and execution, and a self‑evolution mechanism that lets the agent adapt and improve its reasoning over time.'
)

Sub question answer reasoning: The question asks for a concise explanation of how an agent achieves autonomy using the Alita framework. Based on the provid

In [29]:
display(Markdown(pred.consolidated_answer))

Alita achieves autonomy by relying on a very small set of predefined tools and workflows, while a manager component orchestrates planning and execution. The agent’s design is intentionally minimal, allowing it to adapt and improve its reasoning over time through a self‑evolution mechanism.  

It does **not** employ the Model Context Protocol, nor does it generate its own tools. Instead, Alita operates with a concise toolkit that is built into the system.  

The tools are deployed inside Alita’s manager agent, which handles the use of the available toolkits such as MCP Brainstorming, ScriptGeneratingTool, and CodeRunningTool. This manager coordinates the agent’s interactions with those tools to carry out tasks autonomously.

In [30]:
dspy.inspect_history()





[2025-08-09T21:54:44.187729]

System message:

Your input fields are:
1. `original_question` (str): 
2. `sub_answers` (list[str]):
Your output fields are:
1. `consolidated_answer` (str):
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## original_question ## ]]
{original_question}

[[ ## sub_answers ## ]]
{sub_answers}

[[ ## consolidated_answer ## ]]
{consolidated_answer}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Consolidate the sub answers into a coherent answer within a few paragraphs that answers the original question.


User message:

[[ ## original_question ## ]]
How Agent become autonomous with Atila? Alita use Model context protocol? It write own tools? Tools deploy where?

[[ ## sub_answers ## ]]
[Prediction(
    reasoning='The question asks for a concise explanation of how an agent achieves autonomy using the Alita framework. Based on the provided docume

# 3. Creating agents with DSPy
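A DSPy agent such as `dspy.ReAct` is built from a signature plus a list of tools, where each tool is a plain Python function with a docstring.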
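
Because tools are ordinary functions, their name, docstring, and type hints are the information an agent framework has available to describe them to the model. The sketch below uses the standard `inspect` module to show that metadata. It illustrates the idea only and is not DSPy's internal code; `describe_tool` and `lookup_paper_section` are hypothetical names.

```python
import inspect

def describe_tool(func) -> dict:
    """Collect the name, docstring, and parameter types of a plain function."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "args": {
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in signature.parameters.items()
        },
    }

def lookup_paper_section(title: str, max_chars: int = 500) -> str:
    """Hypothetical tool: return the start of a named section."""
    return f"(first {max_chars} characters of section '{title}')"

print(describe_tool(lookup_paper_section))
```

This is why a clear docstring and typed arguments matter for the tools defined in the next cell: they are what the model reads when deciding which tool to call.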

In [35]:
class RAGAgentSignature(dspy.Signature):
    question: str = dspy.InputField()
    history: dspy.History = dspy.InputField()
    answer: str = dspy.OutputField()

class AskForMoreInfo(dspy.Signature):
    question = dspy.InputField()
    response = dspy.OutputField()

def ask_for_clarification_tool(question: str):
    """Use this tool if the user's question is unclear. This tool prompts the user for more information."""
    # Attach the instructions to the signature, then call the predictor to get a Prediction.
    clarification = dspy.Predict(
        AskForMoreInfo.with_instructions(
            "The user has asked an ambiguous question. Ask the user for clarifications to the question."
        )
    )
    return clarification(question=question).response

def query_alita_knowledge_base(question: str):
    """Use this tool to query the knowledge base on Alita."""
    engine = RAG()
    pred = engine(question)
    return pred.consolidated_answer

history = dspy.History(messages=[])
agent = dspy.ReAct(RAGAgentSignature, tools=[query_alita_knowledge_base, ask_for_clarification_tool,])

In [36]:
response = agent(question="jkok", history=history)
history.messages.append({"question": "jkok", **response})

In [37]:
response.answer

'Could you please clarify what you mean by “jkok”? Are you asking for information about a topic, a location, or something else?'

In [38]:
clarified_question = "Sorry I accidentally sent that message. Here's the question I intended to ask: How does an agent become autonomous with Alita?"
response = agent(question=clarified_question, history=history)
display(Markdown(response.answer))

  0%|          | 0/3 [00:00<?, ?it/s]


----
Processing question: What steps are required for an agent to become autonomous using Alita?

Answer to What steps are required for an agent to become autonomous using Alita?: Prediction(
    reasoning='The question asks for the steps an agent must follow to become autonomous when using Alita. Based on the provided documents, Alita’s design emphasizes minimal predefinition, self‑evolution, and scalable reasoning. The autonomous workflow therefore involves: (1) giving Alita a high‑level goal or task description; (2) letting Alita generate a plan or set of sub‑tasks (MCPs) without relying on pre‑built tools; (3) executing the plan while monitoring outcomes; (4) using feedback to refine and evolve the plan or internal strategies; and (5) iterating this cycle until the goal is achieved. These steps capture the core autonomous loop enabled by Alita.',
    answer='1. Provide Alita with a high‑level goal or task description.  \n2. Let Alita generate a plan (MCPs) using its minimal‑predef

An agent becomes autonomous with Alita by following a simple, self‑driven loop that relies on two core principles: **minimal predefinition** and **maximal self‑evolution**.

1. **Minimal predefinition** – The agent is given only a high‑level goal or task description, without any hand‑crafted tools, workflows, or detailed instructions. This keeps the initial setup lightweight and allows the agent to discover how to solve the problem on its own.

2. **Self‑generated planning** – Using Alita’s internal reasoning engine, the agent produces a plan (often a set of modular cognitive processes, or MCPs) that outlines the steps needed to reach the goal. Because the plan is generated on‑the‑fly, the agent can adapt it to the specifics of the task.

3. **Execution and monitoring** – The agent carries out the plan while continuously monitoring outcomes. It observes the results of each action and checks whether the intermediate objectives are being met.

4. **Feedback‑driven refinement** – Based on the monitoring data, the agent refines its plan or internal strategies. This iterative adjustment allows the agent to learn from successes and failures, improving its performance over time.

5. **Iteration until completion** – The cycle of planning, executing, monitoring, and refining repeats until the original goal is achieved. Throughout this process, the agent requires only minimal human oversight, demonstrating true autonomy.

In short, an agent becomes autonomous with Alita by starting with a broad goal, letting Alita generate and execute a self‑adaptable plan, and iteratively refining its approach through continuous feedback—all while operating with minimal pre‑defined tools and maximal self‑evolution.

# 4. Prompt Optimization

In [39]:
import os
from dotenv import load_dotenv, find_dotenv
from llama_cloud_services import LlamaParse

_ = load_dotenv(find_dotenv())

parser = LlamaParse(
    api_key=os.getenv("LLAMA_CLOUD_API_KEY"),  # can also be set in your env as LLAMA_CLOUD_API_KEY
    num_workers=4,       # if multiple files passed, split in `num_workers` API calls
    verbose=True,
    language="en",       # optionally define a language, default=en
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="anthropic-sonnet-3.5"
)

In [40]:
import nest_asyncio
nest_asyncio.apply()

In [41]:
result = await parser.aparse("../pdf/resume.pdf")

Started parsing the file under job_id 9cdf89a4-1ad1-49ef-9836-5f025a799b91


In [42]:
markdown_documents = result.get_markdown_documents(split_by_page=True)

In [None]:
resume_vector_store_index = VectorStoreIndex.from_documents(markdown_documents)

# Save index to a filepath for easy loading.
resume_vector_store_index.storage_context.persist(persist_dir="../resume_storage")

In [79]:
from llama_index.core import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="../resume_storage")
resume_vector_store_index = load_index_from_storage(storage_context)

Loading llama_index.core.storage.kvstore.simple_kvstore from ../resume_storage/docstore.json.
Loading llama_index.core.storage.kvstore.simple_kvstore from ../resume_storage/index_store.json.


In [80]:
resume_retriever = resume_vector_store_index.as_retriever()

class ResumeRAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.resume_retriever = resume_retriever
        self.response_synthesizer = dspy.ChainOfThought('question, contexts -> answer')
    def forward(self, question: str):
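        nodes = self.resume_retriever.retrieve(question)
        # Pass plain passage text to the LLM rather than node objects.
        contexts = [node.get_content() for node in nodes]
        return self.response_synthesizer(question=question, contexts=contexts)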

In [81]:
resume_engine = ResumeRAG()
response = resume_engine("What did Titus do in Toyota Tsusho?")
display(Markdown(response.answer))

Titus overhauled Toyota Tsusho’s sales‑planning system by redesigning the formatting of sales planning files and building a centralized, automated forecasting tool in VBA. He applied operational‑research forecasting techniques such as exponential smoothing and Holt‑Winters, which helped Toyota’s automotive distributors cut stock overflows and improved long‑term sales forecast accuracy by roughly 25 %.