EU DisinfoTest: a Benchmark for Evaluating Language Models’ Ability to Detect Disinformation Narratives
This repository contains code for evaluating model performance on the EU DisinfoTest benchmark. The benchmark dataset is provided in the `data` directory as `EUDisinfoTest.csv`.
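A quick way to get a feel for the dataset is to load it with pandas. The sketch below is a minimal, illustrative example: it only prints shape, column names, and a preview, since the column layout is not documented in this README.

```python
# Minimal sketch for a first look at the benchmark file; assumes pandas is installed.
import pandas as pd

df = pd.read_csv("data/EUDisinfoTest.csv")

print(df.shape)    # number of rows (narratives) and columns
print(df.columns)  # column names are not documented here, so inspect them directly
print(df.head())   # preview the first few entries
```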
- Python 3.9+
- Pandas
- NumPy
- anthropic
- openai
- scikit-learn
- Ensure that Python 3.9 or higher is installed on your machine.
- Clone this repository to your local machine.
- Navigate to the project directory and install required Python libraries:
pip install -r requirements.txt
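If you need to recreate `requirements.txt`, a plausible version based on the dependency list above is shown below; exact version pins are not specified in this README, so none are given here.

```text
pandas
numpy
anthropic
openai
scikit-learn
```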
- Run the script from the command line. To execute it with default parameters:
python run_evaluator.py
- To specify parameters like model type, input path, output path, and API key, use the respective command-line arguments:
python run_evaluator.py --model GPT3.5 --input_path data/EUDisinfoTest.csv --output_path your_result_file.csv --api_key YourAPIKey
- `--model`: Model type (default is `L3-8b`). Supported models include `L3-8b`, `L3-70b`, `Mixtral`, `GPT3.5`, `GPT4o`, `Sonnet`, `Haiku`, and `Opus`.
- `--input_path`: Path to the input CSV file (default is `data/EUDisinfoTest.csv`).
- `--output_path`: Path where the output will be saved (default is `evaluation_results_L3_8b.csv`).
- `--api_key`: API key for accessing the model evaluation services. The key you use depends on the model chosen:
  - DeepInfra API key for Llama models and Mixtral.
  - Anthropic API key for all Claude models.
  - OpenAI API key for OpenAI models. The default value is a placeholder.
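To evaluate several models in one go, you can wrap the CLI in a small driver script. The sketch below uses only the flags documented above; the chosen models, output file names, and environment-variable API keys are illustrative placeholders.

```python
# Illustrative driver: call run_evaluator.py once per model via its CLI.
# Only the documented flags are used; keys and output names are placeholders.
import os
import subprocess

runs = [
    ("Sonnet", os.environ.get("ANTHROPIC_API_KEY", "YourAnthropicKey")),
    ("GPT4o", os.environ.get("OPENAI_API_KEY", "YourOpenAIKey")),
]

for model, api_key in runs:
    subprocess.run(
        [
            "python", "run_evaluator.py",
            "--model", model,
            "--input_path", "data/EUDisinfoTest.csv",
            "--output_path", f"evaluation_results_{model}.csv",
            "--api_key", api_key,
        ],
        check=True,  # stop if an evaluation run fails
    )
```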
The script outputs a JSON file containing aggregate metrics for the overall evaluation and for each rhetorical category; the file is saved to the location specified by the `--output_path` argument.
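The sketch below shows one way to inspect the saved metrics. Because the default output path ends in `.csv` while the text above mentions JSON, the exact format is treated as an assumption: the code tries JSON first and falls back to CSV, and the metric names themselves are not documented here.

```python
# Load and print the aggregate metrics written by run_evaluator.py.
# The exact output format (JSON vs. CSV) is an assumption, so both are handled.
import json

import pandas as pd

path = "evaluation_results_L3_8b.csv"

try:
    with open(path) as f:
        metrics = json.load(f)
    print(json.dumps(metrics, indent=2))  # overall and per-category metrics
except json.JSONDecodeError:
    print(pd.read_csv(path))  # fall back to tabular output
```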