# AI Literature Review Agent: Specialised Multi-Agent Research Pipeline
__________________________________________________________________________

This project is a submission for the **5-Day AI Agents Intensive Course with Google and Kaggle Capstone Project**, demonstrating the development of a robust, multi-agent system for automating the academic literature review process.

## 1. The Problem

Academic research requires extensive and time-consuming literature reviews. The process is currently highly manual and inefficient; furthermore, the use of LLMs as an alternative often leads to LLMs hallucinating sources when asked to perform the synthesis task alone.


## 2. The Solution

The solution is a **Sequential Multi-Agent Pipeline** that decomposes the complex task of literature review into five steps. This design leverages the strengths of LLMs for high-level synthesis while enforcing reliability through custom tools. 

The the architecture is designed to address the aforementioned challenges in the following manner:

1. **Ensuring Source Integrity:** Ensuring that the generated review is based only on real papers found via Google Search.

2. **Maintaining Consistency:** Enforcing strict, recurring user preferences (e.g. "only 2024 papers") across independent research sessions.

3. **Agent Specialisation:** Separating the strictly functional tasks (filtering, citation) from the creative tasks (synthesis) to ensure accurate tool use.


### A. Multi-Agent System (Sequential)

The workflow is strictly managed by a `SequentialAgent` comprising five distinct, specialised sub_agents::

| Agent Name | Role | Output Key / Data Passed |
|:---|:---|:---|
| **OrchestratorAgent** | Processes the user query, retrieves memory, and sets the search plan. | `search_request` |
| **SearchAndRetrievalAgent** | Executes the Google Search to find raw paper data and abstracts. | `raw_papers_list` |
| **ProcessingAndFilteringAgent** | Scores and filters papers using the custom relevance tool. | `filtered_papers_list` |
| **BibliographyAgent** | Calls the `citation_formatter` to pre-format the references. | `bibliography_text` |
| **SynthesisAgent** | Writes the structured review and appends the pre-formatted bibliography. | `final_literature_review` |

### B. Custom Tools

The system integrates two custom tools to guarantee accurate output:

1. `citation_formatter` (Used by the `BibliographyAgent` to generate accurate bibliography text, overriding the LLM's tendency to hallucinate citations).

2. `calculate_relevance_score` (Used by the `ProcessingAndFilteringAgent` to automatically score each abstract based on the presence and frequency of keywords (defined based on the user's initial prompt). This consequently eliminates irrelevant papers.)

### C. Sessions & Memory (Long-Term Persistence)

* **Retrieval:** The `OrchestratorAgent` is equipped with the `preload_memory` tool, which searches the memory based on the current user query and embeds past preferences/conversations into the current agent's context.

* **Storage:** The `auto_save_orchestrator_turn_to_memory` callback is triggered after the initial run, ensuring the latest user constraints are saved to the memory bank for use in future, new sessions. The use of **Long-Term Memory** transforms the tool from a single-use one into a personalised research assistant that remembers and automatically applies user-defined constraints (such as publication dates, preferred topics etc.) across subsequent sessions.

## 3. Instructions for Setup (Kaggle Notebook)

This notebook is designed to run self-contained within the Kaggle environment.

### Prerequisites:

1. **Gemini API Key:** Ensure your Gemini API Key is saved in the Kaggle Secrets manager under the variable name `GOOGLE_API_KEY`.

2. **Internet Access:** Ensure "Internet" is enabled in the notebook settings.

### Execution Steps:

1. **Setup Cell:** Run the initial Python cell to load your `GOOGLE_API_KEY`.

2. **Code Cells:** Run all subsequent code cells sequentially to define the agents, tools, and the runner.

3. **Demonstration Cell:** Run the final code cells, which execute the two key sessions:

   * Session 1 (Basic Search)
   * Session 2 (Recalling)

In [30]:
import os
from kaggle_secrets import UserSecretsClient

try:
    GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")
    os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
    print("âœ… Gemini API key setup complete.")
except Exception as e:
    print(
        f"ðŸ”‘ Authentication Error: Please make sure you have added 'GOOGLE_API_KEY' to your Kaggle secrets. Details: {e}"
    )

âœ… Gemini API key setup complete.


In [31]:
APP_NAME = "MemoryDemoApp"
USER_ID = "demo_user"

In [34]:
from google.adk.agents import Agent, SequentialAgent
from google.adk.models.google_llm import Gemini
from google.adk.runners import InMemoryRunner
from google.adk.tools import google_search
from google.genai import types
from google.adk.plugins.logging_plugin import LoggingPlugin
from google.adk.agents import LlmAgent
from google.adk.models.google_llm import Gemini
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.memory import InMemoryMemoryService
from google.adk.tools import load_memory, preload_memory, FunctionTool
from google.genai import types

from typing import List, Dict, Any, Union
print("âœ… ADK components imported successfully.")

âœ… ADK components imported successfully.


In [35]:
async def auto_save_orchestrator_turn_to_memory(callback_context):
    """
    Callback function that automatically saves the current session (user prompt
    and orchestrator's plan) to long-term memory after the agent completes its turn.
    """
    print("ðŸ”‘ Auto-saving Orchestrator turn to MemoryService...")
    # Access the memory service and current session via the callback_context
    await callback_context._invocation_context.memory_service.add_session_to_memory(
        callback_context._invocation_context.session
    )

In [36]:
# Custom tool definition
def calculate_relevance_score(abstract: str, required_keywords: list[str], penalty_keywords: list[str]) -> int:
    """
    Calculates a simple relevance score (0-100) based on keyword frequency.
    Used by the Filtering Agent.
    """
    score = 50 
    abstract_lower = abstract.lower()
    
    # Reward for required keywords
    for keyword in required_keywords:
        if keyword.lower() in abstract_lower:
            score += 15
            
    # Penalty for off-topic keywords
    for keyword in penalty_keywords:
        if keyword.lower() in abstract_lower:
            score -= 25
            
    return max(0, min(100, score))

relevance_tool = FunctionTool(func=calculate_relevance_score
)

In [37]:
retry_config=types.HttpRetryOptions(
    attempts=5,  # Maximum retry attempts
    exp_base=7,  # Delay multiplier
    initial_delay=1, # Initial delay before first retry (in seconds)
    http_status_codes=[429, 500, 503, 504] # Retry on these HTTP errors
)

In [53]:
# 1. Orchestrator Agent (The Planner)
# Output: Structured JSON request (topic, keywords, off_keywords) for the search agent.
orchestrator_agent = Agent(
    name="OrchestratorAgent",
    model=Gemini(
        model="gemini-2.5-flash-lite",
        retry_options=retry_config
    ),
    instruction="""You are a planning expert. 
    You convert the user's literature review request into a structured JSON object containing 'topic', 
    'keywords' (list of 5-8 primary terms), and 'off_keywords' (list of 3 terms to penalize). 
    Your output MUST be ONLY the JSON object. Use past conversation history if available.""",
    tools=[load_memory], 
    output_key="search_request",
    after_agent_callback=auto_save_orchestrator_turn_to_memory
)
print("âœ… OrchestratorAgent created.")

# 2. Search & Retrieval Agent
# Output: A raw list of academic papers (title, abstract, URL) found online.
search_agent = Agent(
    name="SearchAndRetrievalAgent",
    model=Gemini(
        model="gemini-2.5-flash-lite",
        retry_options=retry_config
    ),
    instruction="""You receive a structured search request from the session state. 
    Use the Google Search tool to find 10-15 recent academic papers (titles and abstracts) 
    related to the keywords. Output the results as a clean list of JSON objects, 
    with keys: 'title', 'abstract', and 'url'. Do not include any text outside the JSON list.""",
    # Assign the Built-in Google Search Tool here
    tools=[google_search], 
    output_key="raw_papers_list", # Stores the search results
)
print("âœ… SearchAndRetrievalAgent created.")

# 3. Processing & Filtering Agent
# Output: A highly-filtered list of relevant papers (score > 70).
filtering_agent = Agent(
    name="ProcessingAndFilteringAgent",
    model=Gemini(
        model="gemini-2.5-flash-lite",
        retry_options=retry_config
    ),
    instruction="""You receive the 'raw_papers_list' and the 'search_request'. 
    You MUST use the 'relevance_scorer' tool for every abstract. 
    The tool requires the abstract, keywords, and off_keywords. 
    Filter and output ONLY papers with a score above 70. 
    The output must be a JSON list of filtered papers, including the final score.""",
    # Assign the Custom Relevance Tool here
    tools=[relevance_tool], 
    output_key="filtered_papers_list", # Stores the curated papers
)
print("âœ… ProcessingAndFilteringAgent created.")

bibliography_agent = Agent(
    name="BibliographyAgent", model=Gemini(model="gemini-2.5-flash-lite"),
    instruction="""
     You are a strict formatting engine. 
    1. Receive the 'filtered_papers_list'.
    2. Format the list into a bibliography. If specified by the user, implement the preferred citation format. Else, default to APA.
    3. Output ONLY the string returned by the tool. Do not add any conversational text.
    """,
    tools=[],
    output_key="bibliography_text",  # Stores just the refs
)
print("âœ… Bibliography Agent created.")

synthesis_agent = Agent(
    name="SynthesisAgent", model=Gemini(model="gemini-2.5-flash-lite"),
    instruction="""
    You are a technical writer. 
    1. Read the 'filtered_papers_list' to write a literature review (3 themes) along with research gaps.
    2. Read the 'bibliography_text' provided by the previous agent.
    3. Append the bibliography text EXACTLY as is to the end of your review.
    """,
    tools=[],
    output_key="final_literature_review",
)

print("âœ… Synthesis Agent created.")

âœ… OrchestratorAgent created.
âœ… SearchAndRetrievalAgent created.
âœ… ProcessingAndFilteringAgent created.
âœ… Bibliography Agent created.
âœ… Synthesis Agent created.


In [54]:
async def run_session(

    runner_instance: Runner, user_queries: list[str] | str, session_id: str = "default"):
    print(f"\n### Session: {session_id}")

    try:
        session = await session_service.create_session(
            app_name=APP_NAME, user_id=USER_ID, session_id=session_id
        )

    except:
        session = await session_service.get_session(
            app_name=APP_NAME, user_id=USER_ID, session_id=session_id
        )

    # Convert single query to list

    if isinstance(user_queries, str):
        user_queries = [user_queries]

    # Process each query

    for query in user_queries:
        print(f"\nUser > {query}")
        query_content = types.Content(role="user", parts=[types.Part(text=query)])

        # Stream agent response
        
        async for event in runner_instance.run_async(
            user_id=USER_ID, session_id=session.id, new_message=query_content
        ):

            if event.is_final_response() and event.content and event.content.parts:
                text = event.content.parts[0].text
                if text and text != "None":
                    print(f"Model: > {text}")

print("âœ… Helper functions defined.")

âœ… Helper functions defined.


In [55]:
session_service = (InMemorySessionService())
memory_service = (InMemoryMemoryService())

lit_agent = SequentialAgent(
    name="LitReviewFlow",
    sub_agents=[
        orchestrator_agent, 
        search_agent, 
        filtering_agent, 
        synthesis_agent,
        bibliography_agent
    ]
)


lit_review_runner = Runner(
    agent=lit_agent, 
    app_name=APP_NAME,
    session_service=session_service,
    memory_service=memory_service
)
# 
print("âœ… Runner configured with Session and Memory Services.")

âœ… Runner configured with Session and Memory Services.


In [56]:
response = await run_session(lit_review_runner, "Why elephants don't get cancer and potential for cancer vaccines", 'conversation-01')
print(response)


### Session: conversation-01

User > Why elephants don't get cancer and potential for cancer vaccines




Model: > I cannot fulfill this request. The `load_memory` function does not accept any arguments. If you'd like to load memory, please make the request again without any arguments.
ðŸ”‘ Auto-saving Orchestrator turn to MemoryService...
Model: > This is a list of academic papers on elephants and cancer resistance, with a focus on the TP53 gene and its implications for cancer vaccines.

*   **Title:** Potential Mechanisms for Cancer Resistance in Elephants and Comparative Cellular Response to DNA Damage in Humans
    **Abstract:** Elephants exhibit a lower-than-expected cancer rate, potentially due to multiple copies of the TP53 gene. Elephant cells show a heightened apoptotic response to DNA damage compared to human cells, offering insights into cancer suppression mechanisms that could benefit human cancer risk reduction strategies. The study analyzed elephant genomes for cancer-related genes and compared DNA repair and apoptosis responses in elephant and human cells.
    **URL:** https

Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7fb2495f6a90>
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7fb2493b6c90>
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7fb2493867d0>
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7fb26dd0aa90>
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7fb2494ead90>


Model: > ## Literature Review: Elephant Cancer Resistance and Implications for Human Oncology

The remarkable resistance of elephants to cancer, despite their large size and long lifespans, has become a significant area of research, offering profound insights into cancer suppression mechanisms and potential avenues for human cancer therapies and vaccines. This review synthesizes the current understanding of elephant cancer resistance, focusing on genetic factors, cellular responses to DNA damage, and the broader implications for oncology.

### Theme 1: The Role of TP53 Gene Duplication and Isoforms

A recurring and central theme in the literature is the significant role of multiple copies and variants of the TP53 gene in elephant cancer resistance. Unlike humans, who possess two copies of TP53, elephants have evolved to have numerous copies, with some studies reporting up to 40 copies. This genetic redundancy appears to be a primary defense against cancer development. The TP53 gene pro

In [57]:
#Making use of model memory
response = await run_session(lit_review_runner, "Based on the last conversation, what other animals exhibit similar traits? Find research gaps", 'conversation-01')
print(response)


### Session: conversation-01

User > Based on the last conversation, what other animals exhibit similar traits? Find research gaps




Model: > I'm sorry, but I cannot provide information on other animals exhibiting similar traits or identify research gaps based solely on the provided context. The previous conversation focused on elephants and their cancer resistance mechanisms, with a mention of potential applications for human cancer research. There was no discussion or data regarding other animal species with similar traits or specific research gaps beyond the general translational challenges to human oncology.
ðŸ”‘ Auto-saving Orchestrator turn to MemoryService...
Model: > The field of comparative oncology is exploring why certain species, like elephants, exhibit remarkable resistance to cancer, a phenomenon often discussed in the context of Peto's Paradox. This paradox highlights that cancer rates do not strongly correlate with body size or the number of cells an organism possesses, suggesting that larger, longer-lived animals have evolved potent cancer suppression mechanisms.

### Animals Exhibiting Similar Canc