yebarryallen/LLMscreen

LLMscreen

License: MIT

Overview

LLMscreen is a Python package for screening research abstracts with OpenAI models. It supports:

  • simple mode: one-pass JSON decision
  • zeroshot mode: reasoning step + final True/False decision

This repository currently includes both package code and replication result datasets.

Install

pip install LLMscreen

Quick Start

from LLMscreen import run

df = run(
    csv_file="abstracts.csv",
    filter_criteria="Include studies that focus on XYZ",
    thread=8,
    api_file="api.txt",
    model="gpt-4o-mini-2024-07-18",
    zeroshot=False,
    output_file="result.csv",
)

Input Contract

csv_file must contain at least one of the following columns:

  • title
  • abstract

Rules:

  • If either the title or the abstract column is missing, it is added automatically and filled with empty values.
  • For each row, at least one of title or abstract must be non-empty.
  • Any additional columns are treated as metadata and will be preserved in output.
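The rules above can be checked before any API calls are made. The following is a minimal sketch using only the standard library; check_input_csv is a hypothetical helper, not part of LLMscreen, and treating whitespace-only cells as empty is an assumption about how the package interprets "non-empty".

```python
import csv
import io


def check_input_csv(handle):
    """Return a list of problems found in an abstracts CSV (empty list means OK).

    Hypothetical pre-flight check; LLMscreen may validate differently.
    """
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    # At least one of the two text columns must exist.
    if "title" not in columns and "abstract" not in columns:
        return ["CSV needs a 'title' or an 'abstract' column"]
    problems = []
    # Count data rows from 1 (the header is not counted).
    for row_no, row in enumerate(reader, start=1):
        # A missing column behaves like an empty cell.
        title = (row.get("title") or "").strip()
        abstract = (row.get("abstract") or "").strip()
        if not title and not abstract:
            problems.append(f"row {row_no}: title and abstract are both empty")
    return problems


if __name__ == "__main__":
    sample = io.StringIO("title,abstract,doi\nA study,,10.1/x\n,,10.1/y\n")
    for problem in check_input_csv(sample):
        print(problem)
```

With a real file, pass an open handle, for example open("abstracts.csv", newline="", encoding="utf-8").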

Function parameters:

  • filter_criteria (str): inclusion/exclusion criteria
  • thread (int): worker count
  • api_file (str): file containing OpenAI API key
  • http_proxy / https_proxy (str): optional proxy
  • k (float in [0,1]): strictness for simple mode
  • model (str): model name
  • zeroshot (bool): switch mode
  • output_file (str | None): output CSV path, None to disable file write
  • verbose (bool): print runtime summary
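As an illustration of how these parameters fit together, the sketch below collects them into a keyword dictionary and checks the documented range of k before anything is sent to the API. build_run_kwargs is a hypothetical helper, and its default values (k=0.5, thread=4, and so on) are placeholders chosen for the example, not the package's actual defaults.

```python
def build_run_kwargs(csv_file, filter_criteria, api_file, *, k=0.5, thread=4,
                     model="gpt-4o-mini-2024-07-18", zeroshot=False,
                     output_file=None, verbose=False):
    """Assemble keyword arguments for run(); hypothetical helper.

    Defaults here are illustrative placeholders, not LLMscreen's defaults.
    """
    # k is documented as a float in [0, 1].
    if not 0.0 <= k <= 1.0:
        raise ValueError(f"k must be in [0, 1], got {k}")
    # thread is the worker count, so it has to be a positive integer.
    if thread < 1:
        raise ValueError(f"thread must be at least 1, got {thread}")
    return {
        "csv_file": csv_file,
        "filter_criteria": filter_criteria,
        "api_file": api_file,
        "k": k,
        "thread": thread,
        "model": model,
        "zeroshot": zeroshot,
        "output_file": output_file,  # None disables the file write
        "verbose": verbose,
    }


kwargs = build_run_kwargs(
    "abstracts.csv",
    "Include studies that focus on XYZ",
    "api.txt",
    zeroshot=True,   # reasoning step + final True/False decision
)
# With the package installed and a valid key in api.txt:
# from LLMscreen import run
# df = run(**kwargs)
```

Because output_file is left as None here, the result would stay in memory as the returned DataFrame.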

Output Contract

run(...) returns a pandas DataFrame with stable columns:

  • record_id
  • mode
  • judgement
  • reason
  • title
  • abstract
  • raw_response
  • reasoning
  • n_probability
  • perplexity_score
  • token_probability
  • model
  • error

Additional metadata behavior:

  • All non-title/abstract input columns are preserved.
  • If a metadata name conflicts with a stable output field, it is renamed to meta_<column>.
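To make the renaming rule concrete, here is a small sketch of how input column names would map to output column names under the behavior described above. rename_metadata_columns is a hypothetical function written for illustration; the package's own implementation may differ in detail.

```python
# Stable output columns, as listed in the Output Contract.
STABLE_COLUMNS = [
    "record_id", "mode", "judgement", "reason", "title", "abstract",
    "raw_response", "reasoning", "n_probability", "perplexity_score",
    "token_probability", "model", "error",
]


def rename_metadata_columns(input_columns):
    """Map each metadata column to the name it is expected to have in the output.

    Illustrative only: mirrors the documented meta_<column> rule.
    """
    mapping = {}
    for name in input_columns:
        # title and abstract are content columns, not metadata.
        if name in ("title", "abstract"):
            continue
        # A clash with a stable field gets the meta_ prefix; others pass through.
        mapping[name] = f"meta_{name}" if name in STABLE_COLUMNS else name
    return mapping


print(rename_metadata_columns(["title", "abstract", "doi", "model"]))
```

In this example, doi is kept as-is, while an input column named model would be expected to appear as meta_model so that it does not collide with the stable model field.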

Package Structure

LLMscreen/
  __init__.py   # public API and backward-compatible run()
  config.py     # runtime configuration
  client.py     # OpenAI client setup
  prompts.py    # prompt builders
  scoring.py    # parsing + probability/perplexity metrics
  pipeline.py   # orchestration and threaded processing

Notes for Integration

  • Result schema is fixed for downstream consumption.
  • Errors are captured per record in the error column.
  • output_file can be disabled for in-memory integration flows.
  • Internal agent onboarding doc: docs/AGENT_ONBOARDING.md.
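Since errors are recorded per record, a downstream step will usually want to separate failed rows from usable ones. The sketch below works on plain dictionaries such as those produced by df.to_dict("records"); split_by_error is a hypothetical helper, and the assumption that a successful record has None, an empty string, or NaN in its error field is not confirmed by the package documentation.

```python
def split_by_error(records):
    """Split result records into (ok, failed) based on the error column.

    Assumes None, "" or NaN in 'error' means the record succeeded.
    """
    ok, failed = [], []
    for rec in records:
        err = rec.get("error")
        # NaN is the only float that is not equal to itself.
        is_nan = isinstance(err, float) and err != err
        if err is None or err == "" or is_nan:
            ok.append(rec)
        else:
            failed.append(rec)
    return ok, failed


# Made-up records for illustration; real ones carry all stable columns.
records = [
    {"record_id": 1, "judgement": True, "error": None},
    {"record_id": 2, "judgement": None, "error": "request timed out"},
]
ok, failed = split_by_error(records)
print(len(ok), "usable,", len(failed), "to retry")
```

With a DataFrame returned by run(...), the same split can be obtained by calling split_by_error(df.to_dict("records")), and the failed records can then be written to a new CSV for a second pass.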
