This repository contains our submission for the CLEF2025 CheckThat Lab Task 4a: Scientific Web Discourse Detection. The goal is to accurately identify scientific discourse in Twitter data using a multi-label classification approach, maximizing the macro-averaged F1-score. The repository provides code, the expected data layout, and reproducible experiments.

Target audience: researchers, data scientists, and practitioners interested in natural language processing (NLP), social media analysis, and machine learning competitions.
This notebook and codebase describe the experimental approach used to develop a multi-label classification system for identifying scientific discourse in Twitter data. The model is built on top of `microsoft/deberta-v3-base` and optimized through a multi-phase strategy aimed at maximizing the macro-averaged F1-score.
To run this project, install the required Python packages listed in `requirements.txt`:

```bash
pip install -r requirements.txt
```
Clone this repository:

```bash
git clone https://github.com/mervinso/CLEF2025_Task4a.git
cd CLEF2025_Task4a
```
Download the dataset:

- Obtain the files `ct_train.tsv` and `ct_test.tsv` as provided by the organizers.
- Place them in the `data/` directory:

```text
CLEF2025_Task4a/
└── data/
    ├── ct_train.tsv
    └── ct_test.tsv
```
- Dataset: 1229 tweets (train), 137 (dev), 240 (test)
- Task: multi-label classification (`cat1`, `cat2`, `cat3`)
- Target metric: macro-averaged F1-score
- Submission format: `predictions.csv` with columns `[index, cat1_pred, cat2_pred, cat3_pred]`
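For reference, a minimal sketch (with placeholder predictions) of assembling a file in this format with pandas:

```python
import numpy as np
import pandas as pd

# Placeholder binary predictions for the 240 test tweets, shape (240, 3)
preds = np.zeros((240, 3), dtype=int)

submission = pd.DataFrame({
    "index": np.arange(len(preds)),
    "cat1_pred": preds[:, 0],
    "cat2_pred": preds[:, 1],
    "cat3_pred": preds[:, 2],
})
submission.to_csv("predictions.csv", index=False)
```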
- `ct_train.tsv` – training set
- `ct_dev.tsv` – development set
- `ct_test.tsv` – test set for leaderboard submission
- Format: each tweet is labeled across three binary categories (`cat1`, `cat2`, `cat3`)
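As an illustration, a short sketch of loading one split into a multi-hot label matrix (the label column names come from the task description; the text column name is an assumption):

```python
import pandas as pd

# Load the training split; the file is tab-separated
train = pd.read_csv("data/ct_train.tsv", sep="\t")

# Multi-hot label matrix, shape (n_tweets, 3): one column per binary category
labels = train[["cat1", "cat2", "cat3"]].to_numpy(dtype=float)

# The tweet-text column name is an assumption; adjust to the actual header
texts = train["text"].tolist()
```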
- Open `CLEF2025-SubTask4a-SciDiscourse.ipynb` in Google Colab.
- Clone the official CLEF2025 CheckThat repository and extract the folder `task4/subtask_4a`.
- Copy `ct_train.tsv` and `ct_test.tsv` into the `/data/` folder inside your working directory.
- Execute the notebook sequentially through all six phases: Baseline → Threshold Tuning → Fine-Tuning → Class Weights → Ensemble → Final Prediction.
- The output file `predictions.csv` will be saved under `/predictions/` and is ready to be submitted to the leaderboard.
```text
clef2025_task4a/
├── data/
│   ├── ct_dev.tsv
│   ├── ct_test.tsv
│   └── ct_train.tsv
├── models/
│   └── final_model/
├── predictions/
│   └── predictions.csv
├── notebooks/
│   └── CLEF2025_SubTask4a_SciDiscourse.ipynb
├── README.md
└── requirements.txt
```
| Phase | Description | Output |
|-------|-------------|--------|
| 1 | Baseline training (DeBERTa-v3-base) | `cv_preds` |
| 2 | Threshold tuning (PR curve) | `thresholds.json` |
| 3 | Fine-tuning (lr, epochs search) | `best_macro_f1`, `config` |
| 4 | Training with class weights | `macro_f1_class_weights` |
| 5 | Ensemble of models (soft voting) | `macro_f1_ensemble` |
| 6 | Final training + test prediction | `predictions.csv` |
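For orientation, a minimal sketch of the multi-label setup that Phase 1 implies, using the standard Hugging Face configuration (everything besides the model checkpoint is illustrative; actual training runs through the notebook):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "microsoft/deberta-v3-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=3,
    problem_type="multi_label_classification",  # trains with BCEWithLogitsLoss
)

texts = ["New preprint reports a link between diet and sleep."]  # placeholder tweet
enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
with torch.no_grad():
    probs = torch.sigmoid(model(**enc).logits)  # per-class probabilities, shape (1, 3)
```

Setting `problem_type="multi_label_classification"` makes the model use a sigmoid-based `BCEWithLogitsLoss` rather than softmax cross-entropy, so the three categories are predicted independently.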
| Model | Macro F1 | Cat1 F1 | Cat2 F1 | Cat3 F1 | Notes |
|-------|----------|---------|---------|---------|-------|
| Baseline | 0.8021 | 0.79xx | 0.76xx | 0.83xx | lr=2e-5, 10 epochs |
| Fine-tuned | 0.8143 | 0.81xx | 0.78xx | 0.84xx | lr=2e-5, 12 epochs |
| Class Weights | 0.8195 | 0.82xx | 0.79xx | 0.85xx | weights applied per class |
| Ensemble | 0.8274 | 0.83xx | 0.80xx | 0.85xx | averaged predictions (FT + CW) |
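The Ensemble row corresponds to Phase 5's soft voting; a sketch of the idea, assuming the fine-tuned (FT) and class-weighted (CW) models each output sigmoid probabilities:

```python
import numpy as np

# Placeholder per-class probabilities from the two models, each shape (n_tweets, 3)
probs_ft = np.random.rand(240, 3)  # fine-tuned model (Phase 3)
probs_cw = np.random.rand(240, 3)  # class-weighted model (Phase 4)

# Soft voting: average the probabilities, then apply the per-class thresholds
ensemble_probs = (probs_ft + probs_cw) / 2.0
thresholds = np.array([0.4607, 0.6438, 0.7325])  # tuned values (see below)
preds = (ensemble_probs >= thresholds).astype(int)
```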
Thresholds were tuned per class via `precision_recall_curve` to optimize F1 individually:

```json
{
  "cat1": 0.4607,
  "cat2": 0.6438,
  "cat3": 0.7325
}
```
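A sketch of how such per-class cutoffs can be found with scikit-learn (the helper and placeholder data are illustrative; in the pipeline, `y_true`/`y_prob` would be the cross-validated labels and probabilities for one category):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_threshold(y_true, y_prob):
    """Return the probability cutoff that maximizes F1 for a single class."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
    # precision/recall have one more point than thresholds; drop the final point
    f1 = 2 * precision[:-1] * recall[:-1] / np.maximum(precision[:-1] + recall[:-1], 1e-12)
    return float(thresholds[np.argmax(f1)])

# Placeholder out-of-fold labels and probabilities for one class
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_prob = rng.random(200)
print(best_threshold(y_true, y_prob))
```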
- CLEF2025 CheckThat: Official Website
- GitLab Repo: Official Repository
- Codalab: Competition Link
This project is licensed under the MIT License. See LICENSE for details.
- Developed by: UTB - CEDNAV
- For the CLEF2025 CheckThat Lab challenge
- Contact: sosam@utb.edu.co