# TitleAbstract Reviewer

In this notebook, we will demonstrate the use of `TitleAbstractReviewer` agents.

## Setting up the notebook

High-level configs

In [1]:
%reload_ext autoreload
%autoreload 2

from dotenv import load_dotenv

# Load environment variables from .env file. Adjust the path to the .env file as needed.
load_dotenv(dotenv_path='../.env')

# Enable asyncio in Jupyter
import asyncio
import nest_asyncio

nest_asyncio.apply()

# Add the package to the path (required if you are running this notebook from the examples folder)
import sys
sys.path.append('../../')


Import required packages

In [3]:
import pandas as pd 

from lattereview.providers import LiteLLMProvider, GoogleProvider
from lattereview.agents import TitleAbstractReviewer
from lattereview.workflows import ReviewWorkflow

## Data

First, let's load a dummy dataset of some articles that we once pooled for a prior systematic review published [here](https://pubmed.ncbi.nlm.nih.gov/36292201/).

In [4]:
data = pd.read_csv('data.csv')
data

Unnamed: 0,title,abstract,DOI,study_type,clinical_application,organ,modality,cardiovascular,task,deep_learning,external_validation
0,(18)F-FDG PET/CT Uptake Classification in Lymp...,Background Fluorine 18 ((18)F)-fluorodeoxygluc...,10.1148/radiol.2019191114,cross-sectional,diagnosis,lung,pet,0,classification,1,0
1,(18)F-FDG-PET/CT Whole-Body Imaging Lung Tumor...,Under the background of (18)F-FDG-PET/CT multi...,10.1155/2021/8865237,cross-sectional,diagnosis,lung,pet,0,classification,1,0
2,3-D Convolutional Neural Networks for Automati...,Deep two-dimensional (2-D) convolutional neura...,10.1109/jbhi.2018.2879449,cross-sectional,diagnosis,lung,ct,0,detection,1,0
3,3D CNN with Visual Insights for Early Detectio...,The 3D convolutional neural network is able to...,10.1155/2021/6695518,cross-sectional,diagnosis,lung,ct,0,classification,1,0
4,3D deep learning based classification of pulmo...,Classifying ground-glass lung nodules (GGNs) i...,10.1016/j.compmedimag.2020.101814,cross-sectional,diagnosis,lung,ct,0,"segmentation, classification",1,0
...,...,...,...,...,...,...,...,...,...,...,...
973,Vulture-Based AdaBoost-Feedforward Neural Fram...,"In today's scenario, many scientists and medic...",10.1007/s12539-022-00505-3,cross-sectional,diagnosis,lung,xr,0,"segmentation, classification",1,0
974,Wavelet decomposition facilitates training on ...,The adoption of low-dose computed tomography (...,10.1007/s00418-020-01961-y,cross-sectional,diagnosis,lung,ct,0,classification,1,0
975,Weakly unsupervised conditional generative adv...,Because of the rapid spread and wide range of ...,10.1016/j.media.2021.102159,cross-sectional,diagnosis,lung,ct,0,classification,1,0
976,Weakly-supervised lesion analysis with a CNN-b...,Objective.Lesions of COVID-19 can be clearly v...,10.1088/1361-6560/ac4316,cross-sectional,diagnosis,lung,ct,0,classification,1,0


Let's then write some inclusion and exclusion criteria to screen these papers.

In [5]:
inclusion_criteria = "The study must involve CT scans. If multiple modalities are involved, CT scans should be among them."
exclusion_criteria = "The study must not include PET scans as one of its modalities."

## Review

Next, we will define three reviewers (two juniors and one senior) that will go through the dataset and look for the items that meet all the inclusion criteria and do not violate any of the exclusion criteria. Each agent returns one of the following scores:

- 1 means absolutely to exclude.
- 2 means better to exclude.
- 3 means not sure whether to include or exclude.
- 4 means better to include.
- 5 means absolutely to include.
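
To make the scale concrete, here is a minimal sketch of how the five scores could be collapsed into a coarse decision label. The helper `score_to_decision` and the three labels are assumptions made for illustration; they are not part of `lattereview`.

```python
# Hypothetical helper (not part of lattereview): map a 1-5 reviewer score
# to a coarse decision label.
def score_to_decision(score: int) -> str:
    if not 1 <= score <= 5:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    if score <= 2:
        return "exclude"      # 1 = absolutely exclude, 2 = better to exclude
    if score == 3:
        return "uncertain"    # 3 = not sure
    return "include"          # 4 = better to include, 5 = absolutely include


for s in range(1, 6):
    print(s, score_to_decision(s))
```

A mapping like this is only one possible reading of the scale; where to draw the line for scores 2 and 4 depends on how conservative the screening should be.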

In [6]:
Agent1 = TitleAbstractReviewer(
    provider=LiteLLMProvider(model="gemini/gemini-1.5-flash"),
    name="Agent1",
    backstory="a PhD researcher in biology and computer science",
    inclusion_criteria = inclusion_criteria,
    exclusion_criteria = exclusion_criteria,        
    max_concurrent_requests=30, 
    model_args={"max_tokens": 200, "temperature": 0.1},
)

Agent2 = TitleAbstractReviewer(
    provider=LiteLLMProvider(model="o4-mini"),  
    name="Agent2",
    backstory="a PhD researcher in biology and computer science",
    inclusion_criteria = inclusion_criteria,
    exclusion_criteria = exclusion_criteria,        
    max_concurrent_requests=30, 
    # model_args={"max_tokens": 200, "temperature": 0.1},
)

Agent3 = TitleAbstractReviewer(
    provider=GoogleProvider(model="gemini-2.5-pro-preview-05-06"),
    name="Agent3",
    backstory="a senior MD-PhD researcher with years of experience in conducting systematic reviews in radiology and deep learning",
    inclusion_criteria = inclusion_criteria,
    exclusion_criteria = exclusion_criteria,        
    max_concurrent_requests=30, 
    model_args={'reasoning_effort': 'medium'},
    additional_context="""
    Two PhD reviewers have already reviewed this article and disagree on how to evaluate it. You can read their evaluation above.
    """
)

Finally, let's create a review workflow and put our agents to work!

In [7]:
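def filter_func(row):
    """Return True when a row should be escalated to the senior reviewer in round B."""
    score1 = int(row["round-A_Agent1_output"]["evaluation"])
    score2 = int(row["round-A_Agent2_output"]["evaluation"])
    if score1 != score2:
        # Both juniors lean towards inclusion (4 vs. 5): no escalation needed.
        if score1 >= 4 and score2 >= 4:
            return False
        # The juniors disagree and at least one is unsure or leans towards inclusion.
        if score1 >= 3 or score2 >= 3:
            return True
    elif score1 == score2 == 3:
        # Both juniors are unsure.
        return True
    # Remaining cases: identical confident scores, or both lean towards exclusion (1 vs. 2).
    return False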

async def review(df, sample_size=None):
    
    title_abs_review = ReviewWorkflow(
        workflow_schema=[
                {
                    "round": 'A',
                    "reviewers": [Agent1, Agent2],
                    "text_inputs": ["title", "abstract"]
                },
                {
                    "round": 'B',
                    "reviewers": [Agent3],
                    "text_inputs": ["title", "abstract", "round-A_Agent1_output", "round-A_Agent2_output"],
                    "filter": filter_func
                }
            ]
        )

    if sample_size:
        df = df.sample(sample_size)

    updated_df = await title_abs_review(df)
    total_cost = title_abs_review.get_total_cost()
    print("\n====== Finished reviewing ======\n")
    print(f"\nTotal cost: {total_cost}")
    print("-" * 100)
    return updated_df


updated_df = asyncio.run(review(data, sample_size=10))



Processing 10 eligible rows


['round: A', 'reviewer_name: Agent1'] -                     2025-05-14 12:53:45: 100%|██████████| 10/10 [00:01<00:00,  9.12it/s]


The following columns are present in the dataframe at the end of Agent1's reivew in round A: ['title', 'abstract', 'DOI', 'study_type', 'clinical_application', 'organ', 'modality', 'cardiovascular', 'task', 'deep_learning', 'external_validation', 'round-A_Agent1_output', 'round-A_Agent1_reasoning', 'round-A_Agent1_evaluation']


['round: A', 'reviewer_name: Agent2'] -                     2025-05-14 12:53:46: 100%|██████████| 10/10 [00:06<00:00,  1.55it/s]


The following columns are present in the dataframe at the end of Agent2's reivew in round A: ['title', 'abstract', 'DOI', 'study_type', 'clinical_application', 'organ', 'modality', 'cardiovascular', 'task', 'deep_learning', 'external_validation', 'round-A_Agent1_output', 'round-A_Agent1_reasoning', 'round-A_Agent1_evaluation', 'round-A_Agent2_output', 'round-A_Agent2_reasoning', 'round-A_Agent2_evaluation']


Processing 2 eligible rows


['round: B', 'reviewer_name: Agent3'] -                     2025-05-14 12:53:53: 100%|██████████| 2/2 [00:10<00:00,  5.03s/it]

The following columns are present in the dataframe at the end of Agent3's reivew in round B: ['title', 'abstract', 'DOI', 'study_type', 'clinical_application', 'organ', 'modality', 'cardiovascular', 'task', 'deep_learning', 'external_validation', 'round-A_Agent1_output', 'round-A_Agent1_reasoning', 'round-A_Agent1_evaluation', 'round-A_Agent2_output', 'round-A_Agent2_reasoning', 'round-A_Agent2_evaluation', 'round-B_Agent3_output', 'round-B_Agent3_reasoning', 'round-B_Agent3_evaluation']



Total cost: 0.02108865
----------------------------------------------------------------------------------------------------
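
Only 2 of the 10 sampled rows reached round B, which follows from the escalation rule in `filter_func`. The sketch below restates that rule on plain integers so it can be checked without calling any model. The helper name `needs_senior_review` and the score pairs are assumptions made for illustration.

```python
# Illustrative restatement of filter_func on plain integer scores
# (needs_senior_review is a hypothetical name, not part of lattereview).
def needs_senior_review(score1: int, score2: int) -> bool:
    if score1 != score2:
        # A 4-vs-5 split still means both reviewers lean towards inclusion.
        if score1 >= 4 and score2 >= 4:
            return False
        # Escalate when at least one reviewer is unsure or leans towards inclusion.
        return score1 >= 3 or score2 >= 3
    # Identical scores are escalated only when both reviewers are unsure.
    return score1 == 3


# Hypothetical score pairs covering each branch of the rule.
for pair in [(1, 5), (2, 5), (4, 5), (3, 3), (1, 2), (5, 5)]:
    print(pair, needs_senior_review(*pair))
```

Under this rule, a 1-vs-2 split is not escalated because both reviewers already lean towards exclusion, while a 1-vs-5 or 2-vs-5 split is.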





In [8]:
updated_df

Unnamed: 0,title,abstract,DOI,study_type,clinical_application,organ,modality,cardiovascular,task,deep_learning,external_validation,round-A_Agent1_output,round-A_Agent1_reasoning,round-A_Agent1_evaluation,round-A_Agent2_output,round-A_Agent2_reasoning,round-A_Agent2_evaluation,round-B_Agent3_output,round-B_Agent3_reasoning,round-B_Agent3_evaluation
616,Integration of convolutional neural networks f...,The early identification of malignant pulmonar...,10.1016/j.cmpb.2019.105172,cross-sectional,diagnosis,lung,ct,0,classification,1,0,{'reasoning': 'The abstract does not specify w...,The abstract does not specify whether CT scans...,1,{'reasoning': 'The study uses only CT scans fr...,The study uses only CT scans from the LIDC dat...,5,{'reasoning': 'The study utilizes the LIDC dat...,"The study utilizes the LIDC dataset, which con...",5.0
529,Ensemble Deep Learning and Internet of Things-...,Coronavirus disease (COVID-19) is a viral infe...,10.1155/2022/7377502,cross-sectional,diagnosis,lung,ct,0,classification,1,0,{'reasoning': 'The abstract mentions the utili...,The abstract mentions the utilization of CT sc...,5,{'reasoning': 'The study uses CT scans for COV...,The study uses CT scans for COVID-19 diagnosis...,5,,,
425,Deep radiomics-based survival prediction in pa...,Heterogeneous clinical manifestations and prog...,10.1038/s41598-021-94535-4,prospective cohort,prognosis,lung,ct,0,survival,1,1,{'reasoning': 'The abstract explicitly states ...,The abstract explicitly states that the study ...,5,{'reasoning': 'The study performs deep radiomi...,The study performs deep radiomics solely on CT...,5,,,
895,SSA-Net: Spatial self-attention network for CO...,Coronavirus disease (COVID-19) broke out at th...,10.1016/j.media.2022.102459,cross-sectional,diagnosis,lung,ct,0,segmentation,1,0,{'reasoning': 'The abstract explicitly states ...,The abstract explicitly states that the study ...,5,{'reasoning': 'The study focuses on COVID-19 p...,The study focuses on COVID-19 pneumonia segmen...,5,,,
311,Contribution of artificial intelligence applic...,INTRODUCTION: Computed tomography (CT) is an a...,10.5578/tt.20219606,cross-sectional,diagnosis,lung,ct,0,"detection, classification",1,0,{'reasoning': 'The abstract mentions the use o...,"The abstract mentions the use of CT scans, ful...",5,{'reasoning': 'The study exclusively involves ...,The study exclusively involves CT imaging for ...,5,,,
303,Computer-Assisted Decision Support System in P...,Pulmonary cancer is considered as one of the m...,10.1016/j.jbi.2018.01.005,cross-sectional,diagnosis,lung,ct,0,classification,1,1,{'reasoning': 'The abstract explicitly mention...,The abstract explicitly mentions the use of CT...,5,{'reasoning': 'The study explicitly uses CT im...,The study explicitly uses CT images for pulmon...,5,,,
925,Think positive: An interpretable neural networ...,The COVID-19 pandemic is an ongoing pandemic a...,10.1016/j.neunet.2022.03.034,cross-sectional,diagnosis,lung,ct,0,classification,1,0,{'reasoning': 'The abstract mentions the use o...,The abstract mentions the use of chest CT-scan...,2,{'reasoning': 'The study uses chest CT-scan im...,The study uses chest CT-scan images for COVID-...,5,{'reasoning': 'The abstract clearly states the...,The abstract clearly states the use of 'chest ...,5.0
573,Hybrid Deep-Learning and Machine-Learning Mode...,The COVID-19 pandemic has had a significant im...,10.1155/2021/9996737,cross-sectional,diagnosis,lung,xr,0,classification,1,1,{'reasoning': 'The abstract explicitly states ...,The abstract explicitly states that the study ...,1,{'reasoning': 'The study uses chest X-ray imag...,The study uses chest X-ray images and does not...,1,,,
396,Adaptive Fruitfly Based Modified Region Growin...,Epicardial adipose tissue is a visceral fat th...,10.1007/s10916-019-1227-3,cross-sectional,diagnosis,heart,ct,1,"segmentation, classification",1,0,{'reasoning': 'The abstract mentions the use o...,"The abstract mentions the use of CT scans, ful...",5,{'reasoning': 'The study applies an algorithm ...,The study applies an algorithm to CT images fo...,5,,,
257,Classification of lung nodules based on CT ima...,Lung cancer is pointed as a leading cause of c...,10.1007/s11547-019-01130-9,cross-sectional,diagnosis,lung,ct,0,classification,1,0,{'reasoning': 'The abstract mentions using CT ...,The abstract mentions using CT images for lung...,5,{'reasoning': 'The study classifies lung nodul...,The study classifies lung nodules based solely...,5,,,
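
One way to turn these columns into a single screening decision is to prefer the senior score when it exists and fall back to the junior average otherwise. The sketch below is an assumption about how one might post-process the output, not a feature of `lattereview`: the miniature dataframe is made up, and the inclusion threshold of 4 is an illustrative choice.

```python
import pandas as pd

# Hypothetical miniature of the review output; only the evaluation columns are kept.
demo = pd.DataFrame({
    "round-A_Agent1_evaluation": [1, 5, 1],
    "round-A_Agent2_evaluation": [5, 5, 1],
    "round-B_Agent3_evaluation": [5.0, None, None],  # missing when Agent3 was not consulted
})

# Average of the two junior scores, used only where no senior score exists.
junior_mean = demo[
    ["round-A_Agent1_evaluation", "round-A_Agent2_evaluation"]
].mean(axis=1)

# The senior score takes precedence; otherwise fall back to the junior average.
demo["final_score"] = demo["round-B_Agent3_evaluation"].fillna(junior_mean)

# Assumed threshold: a final score of 4 or above counts as "include".
demo["include"] = demo["final_score"] >= 4

print(demo[["final_score", "include"]])
```

On the real `updated_df`, the evaluation columns may need to be cast to a numeric dtype (for example with `pd.to_numeric`) before averaging.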


Now let's print the responses of all three agents for the cases where Agent3 needed to intervene.

In [9]:
# Keep only the rows that Agent3 reviewed, i.e. the cases escalated because the two junior reviewers (Agent1 and Agent2) disagreed or were both unsure

if "round-B_Agent3_evaluation" in updated_df.columns:   
    disagreed_cases = updated_df[
        updated_df["round-B_Agent3_evaluation"].notna()
    ]

    # Print the responses of all agents for these cases
    for index, row in disagreed_cases.iterrows():
        print(f"Title: {row['title']}")
        print(f"Abstract: {row['abstract']}")
        print(f"Agent1's response: {row['round-A_Agent1_output']}")
        print(f"Agent2's response: {row['round-A_Agent2_output']}")
        print(f"Agent3's response: {row['round-B_Agent3_output']}")
        print("-" * 100)

else:
    print("No cases where Agent3 intervened.")

Title: Integration of convolutional neural networks for pulmonary nodule malignancy assessment in a lung cancer classification pipeline
Abstract: The early identification of malignant pulmonary nodules is critical for a better lung cancer prognosis and less invasive chemo or radio therapies. Nodule malignancy assessment done by radiologists is extremely useful for planning a preventive intervention but is, unfortunately, a complex, time-consuming and error-prone task. This explains the lack of large datasets containing radiologists malignancy characterization of nodules; METHODS: In this article, we propose to assess nodule malignancy through 3D convolutional neural networks and to integrate it in an automated end-to-end existing pipeline of lung cancer detection. For training and testing purposes we used independent subsets of the LIDC dataset; RESULTS: Adding the probabilities of nodules malignity in a baseline lung cancer pipeline improved its F1-weighted score by 14.7%, whereas int