LLMscreen is a Python package for screening research abstracts with OpenAI models.
It supports two modes:

- `simple` mode: a one-pass JSON decision
- `zeroshot` mode: a reasoning step followed by a final `True`/`False` decision
This repository currently includes both package code and replication result datasets.
Installation:

```bash
pip install LLMscreen
```

Quick start:

```python
from LLMscreen import run

df = run(
    csv_file="abstracts.csv",
    filter_criteria="Include studies that focus on XYZ",
    thread=8,
    api_file="api.txt",
    model="gpt-4o-mini-2024-07-18",
    zeroshot=False,
    output_file="result.csv",
)
```

`csv_file` must contain at least one of the following columns:
- `title`
- `abstract`
Rules:

- If one of `title`/`abstract` is missing, it is auto-filled as empty.
- For each row, at least one of `title` or `abstract` must be non-empty.
- Any additional columns are treated as metadata and are preserved in the output.
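As a concrete illustration, the sketch below builds a minimal input CSV that satisfies these rules. It assumes nothing beyond pandas; the column values and the `year` metadata column are invented placeholders.

```python
import pandas as pd

# Minimal valid input: `title` may be empty as long as `abstract` is not,
# and extra columns such as `year` are carried through as metadata.
pd.DataFrame(
    {
        "title": ["A study of XYZ", ""],
        "abstract": ["We examine XYZ in ...", "Background on XYZ ..."],
        "year": [2021, 2023],  # metadata, preserved in the output
    }
).to_csv("abstracts.csv", index=False)
```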
Function parameters:

- `filter_criteria` (str): inclusion/exclusion criteria
- `thread` (int): worker count
- `api_file` (str): file containing the OpenAI API key
- `http_proxy` / `https_proxy` (str): optional proxy
- `k` (float in [0, 1]): strictness for simple mode
- `model` (str): model name
- `zeroshot` (bool): switch mode
- `output_file` (str | None): output CSV path; `None` to disable the file write
- `verbose` (bool): print a runtime summary
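For example, a zeroshot run kept entirely in memory might look like the sketch below; the criteria string is a placeholder, and only parameters documented above are used.

```python
from LLMscreen import run

# Zeroshot mode with no CSV written; results stay in the returned DataFrame.
df = run(
    csv_file="abstracts.csv",
    filter_criteria="Include randomized controlled trials on XYZ",
    thread=4,
    api_file="api.txt",
    model="gpt-4o-mini-2024-07-18",
    zeroshot=True,
    output_file=None,  # disable the file write
    verbose=True,
)
```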
`run(...)` returns a pandas DataFrame with stable columns:
`record_id`, `mode`, `judgement`, `reason`, `title`, `abstract`, `raw_response`, `reasoning`, `n_probability`, `perplexity_score`, `token_probability`, `model`, `error`.
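A typical downstream step is to split the results on `judgement`. The sketch below assumes `judgement` holds booleans (consistent with the `True`/`False` decision described above); verify this against your own output before relying on it.

```python
# `df` is the DataFrame returned by run(...).
# Records the model judged as meeting the criteria (assumes boolean judgements).
included = df[df["judgement"] == True]  # noqa: E712

# Keep a compact view for manual review.
print(included[["record_id", "title", "reason"]].head())
```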
Additional metadata behavior:

- All non-`title`/`abstract` input columns are preserved.
- If a metadata column name conflicts with a stable output field, it is renamed to `meta_<column>`.
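To make the renaming rule concrete: if the input CSV carried its own `model` column (a hypothetical example, not from the source), it would collide with the stable output field `model` and surface under the `meta_` prefix.

```python
# `df` is the DataFrame returned by run(...), for an input CSV that
# happened to include a `model` column of its own.
print(df["model"].head())       # stable output field: model used for screening
print(df["meta_model"].head())  # renamed input metadata column
```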
Package layout:

```
LLMscreen/
    __init__.py   # public API and backward-compatible run()
    config.py     # runtime configuration
    client.py     # OpenAI client setup
    prompts.py    # prompt builders
    scoring.py    # parsing + probability/perplexity metrics
    pipeline.py   # orchestration and threaded processing
```
Notes:

- The result schema is fixed for downstream consumption.
- Errors are captured per record in the `error` column (see the sketch below).
- `output_file` can be disabled for in-memory integration flows.
- Internal agent onboarding doc: `docs/AGENT_ONBOARDING.md`.
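A minimal sketch for surfacing per-record failures, assuming `error` is null for successful records (an assumption about the sentinel value, not confirmed by the source):

```python
# `df` is the DataFrame returned by run(...).
# Rows where screening failed; `error` is assumed null on success.
failed = df[df["error"].notna()]
print(failed[["record_id", "error"]])
```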