A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches

This repo contains the data and the code for the paper.

Requirements

We used Python 3.8. You also need to install torch; the other requirements are listed in requirements.txt.
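One possible way to set up the environment is sketched below. It assumes a POSIX shell and a python3.8 executable on your PATH. The setup_env function name, the PYTHON_BIN variable, and the .venv directory are illustrative choices and not part of this repo. The appropriate torch build depends on your CUDA version, so you may prefer to install it by following the PyTorch installation instructions instead.

```shell
# Hypothetical helper (not part of this repo): create a virtual environment
# and install torch plus the packages listed in requirements.txt.
# Usage: setup_env [env_dir]
setup_env() {
    env_dir="${1:-.venv}"                  # assumed location of the virtual environment
    python_bin="${PYTHON_BIN:-python3.8}"  # assumed interpreter name; the authors used Python 3.8

    "${python_bin}" -m venv "${env_dir}" || return 1
    # Activate the environment in the current shell.
    . "${env_dir}/bin/activate" || return 1
    # Plain "pip install torch" picks the default build; choose a
    # CUDA-specific build if your machine needs one.
    pip install torch || return 1
    pip install -r requirements.txt
}
```

Calling setup_env from the repository root creates the environment and installs the dependencies into it.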

Data & Methods

The datasets can be accessed at data/.

The methods are implemented in external/, where we copied the core parts of each method and, where necessary, modified them to accept different hyperparameters. We provide classes to invoke them in methods/.

We benchmarked the following methods:

Prompt
- Prompting
- Prompting + DC-PMI (github)
- Prompting + ProtoCal (paper)
Seed Matching
- LoTClass (github)
- XClass (github)
- ClassKG (github)
- NPPrompt (paper)

To test a method (e.g., prompting) on a dataset (e.g., NYT-Topics), you can run:

method_name=prompt_gpt
lm_name=gpt2
data_name=NYT-Topics
gpu=0  # index of the GPU to use; adjust for your machine

CUDA_VISIBLE_DEVICES=${gpu} python run.py \
    --method ${method_name} \
    --base_model ${lm_name} \
    --hyperparameter_file_path methods/hyperparameters/${method_name}.json \
    --data ${data_name} \
    --label_names_file_name data/${data_name}/label_names.txt \
    --prompt_file_name data/${data_name}/prompt.txt
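To repeat this command for several configurations, the invocation above can be wrapped in a small shell function. This is only a sketch: the run_benchmark name and the DRY_RUN switch are illustrative and not part of this repo, while the flags and file layout are the ones shown above. Only prompt_gpt, gpt2, and NYT-Topics come from the example; substitute whichever method, model, and dataset names are available in methods/ and data/.

```shell
# Hypothetical wrapper (not part of this repo) around the run.py command above.
# Usage: run_benchmark <method_name> <lm_name> <data_name> [gpu]
# The body runs in a subshell, so its variables do not leak into the caller.
run_benchmark() (
    method_name="$1"
    lm_name="$2"
    data_name="$3"
    gpu="${4:-0}"   # assumed default GPU index

    # With DRY_RUN=1 the command is printed instead of executed.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        runner="echo"
    else
        runner="env"
    fi

    "${runner}" CUDA_VISIBLE_DEVICES="${gpu}" python run.py \
        --method "${method_name}" \
        --base_model "${lm_name}" \
        --hyperparameter_file_path "methods/hyperparameters/${method_name}.json" \
        --data "${data_name}" \
        --label_names_file_name "data/${data_name}/label_names.txt" \
        --prompt_file_name "data/${data_name}/prompt.txt"
)
```

For example, DRY_RUN=1 run_benchmark prompt_gpt gpt2 NYT-Topics prints the full command without running it, which is a cheap way to check the paths before starting a long job.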

The performance of each method on the datasets, and how the methods behave with different label names, instructions, and pre-trained language models, can be found in our paper.

Citation

If you find this repo useful, please cite our paper:

@article{wang2023benchmark,
  title={A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches},
  author={Wang, Zihan and Wang, Tianle and Mekala, Dheeraj and Shang, Jingbo},
  journal={arXiv preprint arXiv:2305.12749},
  year={2023}
}
