# arXiv API

This notebook walks through the basics of the arXiv API. Throughout the course, we use this API to give our agents access to arXiv's large collection of research papers.

In [1]:
import requests

res = requests.get(
    "http://export.arxiv.org/api/query",
    params={"search_query": "react agents", "start": 0, "max_results": 10},
)

print(res.text)

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <link href="http://arxiv.org/api/query?search_query%3Dreact%20agents%26id_list%3D%26start%3D0%26max_results%3D10" rel="self" type="application/atom+xml"/>
  <title type="html">ArXiv Query: search_query=react agents&amp;id_list=&amp;start=0&amp;max_results=10</title>
  <id>http://arxiv.org/api/RoKeGBr+BJzFk9M948NjrHPLwiM</id>
  <updated>2024-10-26T00:00:00-04:00</updated>
  <opensearch:totalResults xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">42888</opensearch:totalResults>
  <opensearch:startIndex xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">0</opensearch:startIndex>
  <opensearch:itemsPerPage xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">10</opensearch:itemsPerPage>
  <entry>
    <id>http://arxiv.org/abs/2405.13966v1</id>
    <updated>2024-05-22T20:05:49Z</updated>
    <published>2024-05-22T20:05:49Z</published>
    <title>On the Brittle Foundations of ReAct Prompting fo
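
The `start` and `max_results` parameters in the request above control paging, and the feed echoes them back as `opensearch:startIndex` and `opensearch:itemsPerPage` next to `opensearch:totalResults`. A minimal sketch of stepping through several pages might look like the following. Note that `page_params` is a hypothetical helper, not part of the arXiv API, and it only builds the request URLs without sending them.

```python
from urllib.parse import urlencode

BASE_URL = "http://export.arxiv.org/api/query"

def page_params(query: str, total: int, page_size: int = 10):
    """Yield one params dict per page until `total` results are covered.

    Hypothetical helper for illustration; the parameter names match the
    request shown above.
    """
    for start in range(0, total, page_size):
        yield {
            "search_query": query,
            "start": start,
            # The last page may be shorter than page_size
            "max_results": min(page_size, total - start),
        }

# Print the URL for each page instead of sending the request
for params in page_params("react agents", total=25, page_size=10):
    print(f"{BASE_URL}?{urlencode(params)}")
```

Each dict can be passed as `params=` to `requests.get` in the same way as the single request above.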

The arXiv API returns an Atom XML feed rather than structured data, so we need to parse the response to extract the fields we care about.

In [4]:
import requests
import re

def clean_text(text: str) -> str:
    return " ".join([x.strip() for x in text.strip().split("\n")])

# Extract one dict per <entry> element from the Atom XML feed using regex
def extract_info(xml_content: str) -> list[dict]:
    articles = re.findall(r'<entry>(.*?)</entry>', xml_content, re.DOTALL)
    data = []
    for article in articles:
        try:
            info = {
                "id": clean_text(re.search(
                    r"<id>(.*?)</id>",
                    article,
                    re.IGNORECASE | re.DOTALL
                ).group(1)),
                "title": clean_text(re.search(
                    r"<title>\s*(.*?)\s*</title>",
                    article,
                    re.IGNORECASE | re.DOTALL
                ).group(1)),
                "published_date": clean_text(re.search(
                    r"<published>(.*?)</published>",
                    article,
                    re.IGNORECASE | re.DOTALL
                ).group(1)),
                "updated_date": clean_text(re.search(
                    r"<updated>(.*?)</updated>",
                    article,
                    re.IGNORECASE | re.DOTALL
                ).group(1)),
                "summary": clean_text(re.search(
                    r"<summary>(.*?)</summary>",
                    article,
                    re.IGNORECASE | re.DOTALL
                ).group(1)),
                "authors": [
                    clean_text(name) for name in re.findall(
                        r"<name>(.*?)</name>",
                        article,
                        re.IGNORECASE | re.DOTALL
                    )
                ],
                "categories": [
                    clean_text(category) for category in re.findall(
                        r"<category term=\"(.*?)\"",
                        article,
                        re.IGNORECASE | re.DOTALL
                    )
                ]
            }
            data.append(info)
        except Exception as e:
            print(f"Error extracting info: {e}")
            print(article)
            raise  # re-raise the original exception with its traceback intact
    return data

# Request to arXiv API
res = requests.get(
    "http://export.arxiv.org/api/query",
    params={"search_query": "react agents", "start": 0, "max_results": 10},
)

# Extract information
articles_info = extract_info(res.text)
for article in articles_info:
    print(article)

{'id': 'http://arxiv.org/abs/2405.13966v1', 'title': 'On the Brittle Foundations of ReAct Prompting for Agentic Large Language Models', 'published_date': '2024-05-22T20:05:49Z', 'updated_date': '2024-05-22T20:05:49Z', 'summary': 'The reasoning abilities of Large Language Models (LLMs) remain a topic of debate. Some methods such as ReAct-based prompting, have gained popularity for claiming to enhance sequential decision-making abilities of agentic LLMs. However, it is unclear what is the source of improvement in LLM reasoning with ReAct based prompting. In this paper we examine these claims of ReAct based prompting in improving agentic LLMs for sequential decision-making. By introducing systematic variations to the input prompt we perform a sensitivity analysis along the claims of ReAct and find that the performance is minimally influenced by the "interleaving reasoning trace with action execution" or the content of the generated reasoning traces in ReAct, contrary to original claims an
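
Because the response is XML, the same fields can also be pulled out with a real XML parser instead of regex. The sketch below uses Python's standard-library `xml.etree.ElementTree` and is offered as an alternative, not as the approach this notebook relies on. `parse_feed` and `squash` are hypothetical names, and `SAMPLE_FEED` is a made-up feed used so the example runs without a network call.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <feed> element in the response above
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

def squash(text: str) -> str:
    """Collapse runs of whitespace, including newlines, into single spaces."""
    return " ".join(text.split())

def parse_feed(xml_text: str) -> list[dict]:
    """Return one dict per <entry>, using the same keys as extract_info."""
    root = ET.fromstring(xml_text)
    articles = []
    for entry in root.findall("atom:entry", ATOM):
        def field(tag: str) -> str:
            # findtext returns the default when the tag is missing
            return squash(entry.findtext(f"atom:{tag}", default="", namespaces=ATOM))

        articles.append({
            "id": field("id"),
            "title": field("title"),
            "published_date": field("published"),
            "updated_date": field("updated"),
            "summary": field("summary"),
            "authors": [
                squash(name.text or "")
                for name in entry.findall("atom:author/atom:name", ATOM)
            ],
            "categories": [
                cat.get("term") for cat in entry.findall("atom:category", ATOM)
            ],
        })
    return articles

# Made-up sample feed for illustration only; not real arXiv data
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <updated>2024-01-02T00:00:00Z</updated>
    <published>2024-01-01T00:00:00Z</published>
    <title>A Sample
      Title</title>
    <summary>  A sample summary.  </summary>
    <author><name>Ada Example</name></author>
    <category term="cs.AI" scheme="http://arxiv.org/schemas/atom"/>
  </entry>
</feed>"""

for article in parse_feed(SAMPLE_FEED):
    print(article)
```

One practical difference is that the parser decodes XML entities such as `&amp;`, whereas the regex version returns them as they appear in the raw text.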

In various examples later in the course, we will let our agents use the arXiv search shown above.