EU DisinfoTest: a Benchmark for Evaluating Language Models’ Ability to Detect Disinformation Narratives
This repository contains code for evaluating model performance on the EU DisinfoTest benchmark. The benchmark dataset is provided in the `data` directory as `EUDisinfoTest.csv`.
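A quick way to get a feel for the dataset is to load it with pandas. The sketch below is a minimal, illustrative example: it only prints shape, column names, and a preview, since the column layout is not documented in this README.

```python
# Minimal sketch for a first look at the benchmark file; assumes pandas is installed.
import pandas as pd

df = pd.read_csv("data/EUDisinfoTest.csv")

print(df.shape)    # number of rows (narratives) and columns
print(df.columns)  # column names are not documented here, so inspect them directly
print(df.head())   # preview the first few entries
```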
- Python 3.9+
- Pandas
- NumPy
- anthropic
- openai
- scikit-learn
- Ensure that Python 3.9 or higher is installed on your machine.
- Clone this repository to your local machine.
- Navigate to the project directory and install required Python libraries:
pip install -r requirements.txt
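If you need to recreate `requirements.txt`, a plausible version based on the dependency list above is shown below; exact version pins are not specified in this README, so none are given here.

```text
pandas
numpy
anthropic
openai
scikit-learn
```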
- Run the script from the command line. To execute it with default parameters:
python run_evaluator.py
- To specify parameters like model type, input path, output path, and API key, use the respective command-line arguments:
python run_evaluator.py --model GPT3.5 --input_path data/EUDisinfoTest.csv --output_path your_result_file.csv --api_key YourAPIKey
- `--model`: Model type (default is `L3-8b`). Supported models include `L3-8b`, `L3-70b`, `Mixtral`, `GPT3.5`, `GPT4o`, `Sonnet`, `Haiku`, and `Opus`.
- `--input_path`: Path to the input CSV file (default is `data/EUDisinfoTest.csv`).
- `--output_path`: Path where the output will be saved (default is `evaluation_results_L3_8b.csv`).
- `--api_key`: API key for accessing the model evaluation services. The key you use depends on the model chosen:
  - DeepInfra API key for Llama models and Mixtral.
  - Anthropic API key for all Claude models.
  - OpenAI API key for OpenAI models. The default value is a placeholder.
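To evaluate several models in one go, you can wrap the CLI in a small driver script. The sketch below uses only the flags documented above; the chosen models, output file names, and environment-variable API keys are illustrative placeholders.

```python
# Illustrative driver: call run_evaluator.py once per model via its CLI.
# Only the documented flags are used; keys and output names are placeholders.
import os
import subprocess

runs = [
    ("Sonnet", os.environ.get("ANTHROPIC_API_KEY", "YourAnthropicKey")),
    ("GPT4o", os.environ.get("OPENAI_API_KEY", "YourOpenAIKey")),
]

for model, api_key in runs:
    subprocess.run(
        [
            "python", "run_evaluator.py",
            "--model", model,
            "--input_path", "data/EUDisinfoTest.csv",
            "--output_path", f"evaluation_results_{model}.csv",
            "--api_key", api_key,
        ],
        check=True,  # stop if an evaluation run fails
    )
```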
The script outputs a JSON file containing aggregate metrics for the overall evaluation and for each rhetorical category; the file is saved to the location specified by the `--output_path` argument.
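The sketch below shows one way to inspect the saved metrics. Because the default output path ends in `.csv` while the text above mentions JSON, the exact format is treated as an assumption: the code tries JSON first and falls back to CSV, and the metric names themselves are not documented here.

```python
# Load and print the aggregate metrics written by run_evaluator.py.
# The exact output format (JSON vs. CSV) is an assumption, so both are handled.
import json

import pandas as pd

path = "evaluation_results_L3_8b.csv"

try:
    with open(path) as f:
        metrics = json.load(f)
    print(json.dumps(metrics, indent=2))  # overall and per-category metrics
except json.JSONDecodeError:
    print(pd.read_csv(path))  # fall back to tabular output
```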