# arXiv Curating Agent

A simple agent that reads arXiv preprint feeds and picks out potentially interesting articles. The agent can either write the selected papers to a remote file (e.g., a page on a static website) or send them in an email.

This notebook requires an OpenAI API key and (potentially) GitHub API credentials. If you've somehow ended up here directly, visit the [project page](https://github.com/thisisntnathan/arXivCurator) to make sure things are set up correctly.

In [1]:
# imports
import argparse
import datetime
import os
import re

import toml
from dotenv import load_dotenv
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent

from tools import (
    get_user_sources,
    read_and_triage,
    read_rss,
    send_email,
    shorten_abstract,
    update_github_target,
    write_paper_entry,
)

# set environment variables
load_dotenv()

# load user config
config = {}
with open("user.toml", "r") as f:
    cfg = toml.load(f)
cfg["thread_id"] = "thread-0"
# LangGraph only reads per-run settings that are nested under "configurable"
config["configurable"] = dict(cfg)
print("User config loaded!")

User config loaded!
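The nesting step above can be sketched in isolation. This is a minimal illustration, not the project's code: `build_run_config` and the keys in `example_cfg` are hypothetical stand-ins for whatever `user.toml` actually contains. It shows the shape LangGraph expects, and how a different `thread_id` would give a separate conversation thread under the same checkpointer.

```python
import copy


def build_run_config(user_cfg, thread_id="thread-0"):
    """Nest user settings under "configurable" and attach a thread id.

    Hypothetical helper: the notebook builds this dict inline instead.
    """
    configurable = copy.deepcopy(user_cfg)  # leave the caller's dict untouched
    configurable["thread_id"] = thread_id
    return {"configurable": configurable}


# Assumed example settings; the real keys come from user.toml
example_cfg = {"interests": ["graph neural networks"], "max_papers": 5}

run_config = build_run_config(example_cfg)
fresh_config = build_run_config(example_cfg, thread_id="thread-1")

print(run_config["configurable"]["thread_id"])
print(fresh_config["configurable"]["thread_id"])
```

Passing `fresh_config` instead of `run_config` to the agent would start a conversation with no memory of earlier questions, since the checkpointer stores history per `thread_id`.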


In [2]:
# create toolkit
tools = [
    get_user_sources,
    read_rss,
    read_and_triage,
    shorten_abstract,
    write_paper_entry,
    update_github_target,
    send_email,
]
print("Agent toolkit:")
for idx, tool in enumerate(tools):
    tool.name = re.sub(r"[^a-zA-Z0-9_-]", "", tool.name.lower().replace(" ", "_"))
    print(idx, tool.name)
    print(f"    {tool.description}")

# checkpointing memory (per-thread persistence within this session, not true long-term memory)
memory = MemorySaver()

sm = SystemMessage(
    "You are a helpful reading assistant. Your primary task is to read \
through rss feeds and summarize articles. Unless the user specifies otherwise produce \
the output as a markdown formatted list."
)

# initialize supervisor agent
llm = ChatOpenAI(model="gpt-4o-mini")
agent_executor = create_react_agent(
    model=llm,
    tools=tools,
    state_modifier=sm,
    checkpointer=memory,
)

Agent toolkit:
0 get_user_sources
    Get this user's top rss feeds for reading.
    Only call this tool if the user does not specify a rss feed url in the query
1 read_rss
    This tool will read data from an RSS feed and return articles from the feed regardless of potential interest.
    If a certain number of articles is requested the list will return no more than the specified number of articles.
    By default this tool returns all articles in the feed.
2 read_and_triage
    This tool will read data from an RSS feed and return papers from the feed are
    interesting to the user.
3 shorten_abstract
    This tool uses an llm to summarize a paper from its title and abstract
4 write_paper_entry
    This tool writes a nice summary of the paper to add to the reading list.
    Do not call this tool without calling 'shorten_abstract()' first.
    The summary is formatted in Markdown language as an entry in an unnumbered list.
5 update_github_target
    This tool updates the target file o
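The name clean-up in the loop above can be tried on its own. A minimal sketch: `sanitize_tool_name` is a hypothetical helper that wraps the same expression used in the cell, and the sample names are made up. The likely motivation (an assumption, not stated in the notebook) is that OpenAI's function-calling interface only accepts tool names made of letters, digits, underscores, and hyphens.

```python
import re


def sanitize_tool_name(name):
    """Lowercase, turn spaces into underscores, drop any other disallowed character."""
    lowered = name.lower().replace(" ", "_")
    return re.sub(r"[^a-zA-Z0-9_-]", "", lowered)


# Made-up names for illustration only
print(sanitize_tool_name("Read RSS"))            # read_rss
print(sanitize_tool_name("shorten_abstract()"))  # shorten_abstract
```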

In [3]:
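# a simple wrapper around agent.stream()
def ask(msg, agent, verbose=False):
    """Send msg to the agent; print every step if verbose, otherwise only the final reply."""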
    events = agent.stream(
        {"messages": [HumanMessage(msg)]},
        config,
        stream_mode="values",
    )
    if verbose:
        for event in events:
            event["messages"][-1].pretty_print()
    else:
        *_, final_message = events
        final_message["messages"][-1].pretty_print()
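The non-verbose branch relies on starred unpacking to exhaust the stream and keep only its last item. A minimal sketch of that idiom, using a hypothetical `fake_stream` generator in place of `agent.stream(...)`, with plain strings standing in for message objects:

```python
def fake_stream():
    """Hypothetical stand-in for agent.stream(...): yields successive states."""
    yield {"messages": ["user question"]}
    yield {"messages": ["user question", "tool call"]}
    yield {"messages": ["user question", "tool call", "final answer"]}


# Consume the whole generator; everything but the last state is discarded into _
*_, final_state = fake_stream()
print(final_state["messages"][-1])  # final answer
```

Note that this unpacking raises `ValueError` if the stream yields nothing, so it assumes the agent always emits at least one state.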

## Ask a question

In [4]:
# interact with the agent using ask()
msg = "What are the last 10 articles from this arxiv feed: \
https://chemrxiv.org/engage/rss/chemrxiv?categoryId=605c72ef153207001f6470ce"
ask(msg, agent_executor)


Here are the last 10 articles from the ChemRxiv RSS feed:

- **[Permeation Enhancer-Induced Membrane Defects Assist the Oral Absorption of Peptide Drugs](https://dx.doi.org/10.26434/chemrxiv-2025-n24f8?rft_dat=source%3Ddrss)**
  - **Authors:** Severin T. Schneebeli, Kyle J. Colston, Kyle T. Faivre
  - **Summary:** This study provides a detailed molecular mechanism for how polar peptides can pass through membranes with the aid of transcellular permeation enhancers, specifically through the formation of membrane defects when paired with salcaprozate sodium (SNAC).

- **[Screening and Design of Aqueous Zinc Battery Electrolytes Based on the Multimodal Optimization of Molecular Simulation](https://dx.doi.org/10.26434/chemrxiv-2025-23xh1?rft_dat=source%3Ddrss)**
  - **Authors:** Wei Feng, Luyan Zhang, Yaobo Cheng, Chunguang Wei, Jin Wu, Junwei Zhang, Kuang Yu
  - **Summary:** This work presents a multimodal optimization workflow for designing aqueous zinc battery electrolytes to prevent fr

In [5]:
# because of threaded memory you can ask the agent follow-up questions
# n.b. using the verbose option we can follow the agent's decision trace
msg = "Summarize the final article (How Local is `Local'? Deep Learning Reveals Locality of the Induced Magnetic Field of Polycyclic Aromatic Hydrocarbons) \
and upload it to the remote github file"
ask(msg, agent_executor, True)


Summarize the final article (How Local is `Local'? Deep Learning Reveals Locality of the Induced Magnetic Field of Polycyclic Aromatic Hydrocarbons) and upload it to the remote github file
Tool Calls:
  shorten_abstract (call_KdMCxeifN9WY3m4celodMijl)
 Call ID: call_KdMCxeifN9WY3m4celodMijl
  Args:
    title: How Local is `Local'? Deep Learning Reveals Locality of the Induced Magnetic Field of Polycyclic Aromatic Hydrocarbons
    abstract: We investigate the locality of magnetic response in polycyclic aromatic molecules using a novel deep-learning approach. Our method employs graph neural networks (GNNs) with a graph-of-rings representation to predict Nucleus-Independent Chemical Shifts in the space around the molecule. We train a series of models, each time reducing the size of the largest molecules used in training. The accuracy of prediction remains high (MAE < 0.5 ppm), even when training the model only on molecules with up to 4 rings, thus providing strong evidence for the locali

In [7]:
# ask another follow-up in reference to previous lists
msg = "Are there any more articles from that list that are similar to that one?"
ask(msg, agent_executor, True)


Are there any more articles from that list that are similar to that one?

Based on the focus on deep learning and its application to molecular properties, particularly in graph neural networks and magnetic responses, the following articles from the list may be considered similar:

1. **[ACES-GNN: Can Graph Neural Network Learn to Explain Activity Cliffs?](https://dx.doi.org/10.26434/chemrxiv-2025-11wfv?rft_dat=source%3Ddrss)**
   - This article discusses a framework that integrates explanation supervision into graph neural networks (GNNs) for predicting molecular properties, which aligns with the computational and machine learning aspects of the article you mentioned.

2. **[Formation and Evolution of Solid Electrolyte Interphase at Calcium Surfaces](https://dx.doi.org/10.26434/chemrxiv-2025-7v8kn?rft_dat=source%3Ddrss)**
   - While focusing on solid electrolyte interphases, this article utilizes computational methods that may involve similar modeling techniques and insights into mole

## Try for yourself!

In [None]:
msg = ""
ask(msg, agent_executor)