Codebase for the NAACL 2024 Findings paper "Identifying Self-Disclosures of Use, Misuse and Addiction in Community-based Social Media Posts"
Main Authors:

- Chenghao Yang (yangalan1996@gmail.com) (University of Chicago, previously at Columbia University)
- Tuhin Chakrabarty (tuhin.chakr@cs.columbia.edu) (Columbia University)

Supervisor Team:

- Nabila El-Bassel (School of Social Work, Columbia University)
- Smaranda Muresan (Data Science Institute, Columbia University)
If you use this code as part of any published research, please cite the following paper (it encourages researchers to publish their code!):
```bibtex
@inproceedings{yang-2023-identifying,
    title = "Identifying Self-Disclosures of Use, Misuse and Addiction in Community-based Social Media Posts",
    author = "Chenghao Yang and Tuhin Chakrabarty and Karli R Hochstatter and Melissa N Slavin and Nabila El-Bassel and Smaranda Muresan",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2024",
    year = "2024",
    publisher = "Association for Computational Linguistics",
}
```
- `codebase`: code for data processing, fine-tuning, evaluation, and visualization.
- `data`: the data for prompting and evaluation. As per our IRB ethics approval, we kindly request that users submit a request here explaining the project scope and their obtained ethics approval before we grant access to the data.
For the main repo:

```bash
conda create -p ./env python=3.9
conda activate ./env  # the environment location is optional; choose wherever you like to store dependencies. Here we use ./env as an example.
pip install -r requirements.txt
```
- Prompting GPT-3.5/4: check out `codebase/prompt_gpt4.py` for running prompts and collecting responses. Then run `post_processing_log.py` to do the necessary postprocessing to normalize the model outputs.
- Fine-tuning DeBERTa: check out `codebase/deberta_finetuning.py`.
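The postprocessing step above maps free-form model responses onto canonical labels. A minimal sketch of what such normalization might look like, assuming a hypothetical label set (`normalize_response` and `LABELS` are illustrative names, not the actual identifiers in `post_processing_log.py`):

```python
import re

# Hypothetical label set for illustration; the actual taxonomy
# is defined in the paper and in post_processing_log.py.
LABELS = ["addiction", "misuse", "use", "recovery", "relapse"]

def normalize_response(raw):
    """Map a free-form model response to one canonical label.

    Returns None when no known label can be recovered, so the caller
    can flag the response for manual inspection.
    """
    # Lowercase and replace punctuation/quotes with spaces.
    text = re.sub(r"[^a-z ]", " ", raw.strip().lower())
    # Word-boundary match so "use" does not fire inside "misuse".
    for label in LABELS:
        if re.search(rf"\b{label}\b", text):
            return label
    return None
```

Checking "misuse" before "use" plus the `\b` boundaries keeps overlapping label strings from being confused.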
Check out `codebase/create_sliver_rational_annotation_files.py` and `codebase/process_finetune_data_for_finetuning.py` to see how to combine gold, silver, and random rationales. Here we re-use the code from Section 4 for evaluation.
Check out `codebase/error_analysis.py`.
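A typical starting point for this kind of error analysis is tallying (gold, predicted) label pairs to surface the most frequent confusions; a minimal sketch (these helpers are illustrative, not the actual functions in `error_analysis.py`):

```python
from collections import Counter

def confusion_counts(gold_labels, pred_labels):
    """Count (gold, predicted) pairs across the evaluation set."""
    return Counter(zip(gold_labels, pred_labels))

def top_errors(counts, k=5):
    """Return the k most frequent misclassifications (gold != predicted)."""
    errors = {pair: n for pair, n in counts.items() if pair[0] != pair[1]}
    return sorted(errors.items(), key=lambda kv: -kv[1])[:k]
```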