Few-shot Pseudo-Labeling for Intent Detection

This repository contains the code for the paper "Few-shot Pseudo-Labeling for Intent Detection".

Before using the repository

Install

This repository uses a virtualenv environment.

# Create environment
python3 -m virtualenv .venv --python=python3.6

# Install environment
.venv/bin/pip install -r requirements.txt

# Activate environment
source .venv/bin/activate

Embedding models

To use this repository, you must provide a path to the embedding models. These paths are defined in util/constants.py. The default path is $HOME/.models/.

You can also use a transformers model by specifying either its name or a path to it.

Input file formats

This repository uses the JSON Lines format for input files. Example:

{"sentence":"Switch the light on", "label":"LightOn"}
{"sentence":"open the door", "label":"OpenDoor"}
...

The output file containing pseudo-labels uses the same format. For the unlabeled jsonl file, only the input key ("sentence" in the example above) is required.
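The format described above can be read and written with the standard library alone. The following is a minimal sketch, not code from this repository: the helper names read_jsonl and write_jsonl are hypothetical, and the assumption that "sentence" is the only required key for unlabeled files is taken from the example records shown above.

```python
import json


def read_jsonl(path, required_keys=("sentence",)):
    """Read a JSON Lines file into a list of dicts.

    required_keys is an assumption based on the example records:
    pass ("sentence", "label") for labeled files.
    """
    records = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            missing = [key for key in required_keys if key not in record]
            if missing:
                raise ValueError(f"{path}:{line_number} is missing keys {missing}")
            records.append(record)
    return records


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Calling read_jsonl(path, required_keys=("sentence", "label")) on a labeled file is a quick way to catch malformed records before running the pseudo-labeling script.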

Usage

Finding pseudo-labels

To find pseudo-labels, run the following command:

python get_pseudo_labels.py fold-unfold \
    --embedder bert \
    --model-name-or-path bert-base-cased \
    --input-labeled-file data/datasets/snips/few-shot_final/01/n-samples-005/support.jsonl \
    --input-unlabeled-file data/datasets/snips/few-shot_final/01/n-samples-005/query.jsonl \
    -v

This script computes pseudo-labels from the labeled and unlabeled data. By default, output is written to runs/.
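When the same command has to be run for several data directories, it can be convenient to build it from Python. The sketch below is an illustration only: build_command and run_pseudo_labeling are hypothetical helpers, and they assume each directory contains support.jsonl and query.jsonl as in the example path above. Only the flags shown in the command above are used.

```python
import subprocess
import sys
from pathlib import Path


def build_command(episode_dir, model_name_or_path="bert-base-cased"):
    """Build the argument list for get_pseudo_labels.py.

    Assumes episode_dir holds support.jsonl (labeled) and
    query.jsonl (unlabeled), mirroring the example command.
    """
    episode_dir = Path(episode_dir)
    return [
        sys.executable, "get_pseudo_labels.py", "fold-unfold",
        "--embedder", "bert",
        "--model-name-or-path", model_name_or_path,
        "--input-labeled-file", str(episode_dir / "support.jsonl"),
        "--input-unlabeled-file", str(episode_dir / "query.jsonl"),
        "-v",
    ]


def run_pseudo_labeling(episode_dir):
    """Run the script from the repository root; raises if it exits non-zero."""
    subprocess.run(build_command(episode_dir), check=True)
```

For example, run_pseudo_labeling("data/datasets/snips/few-shot_final/01/n-samples-005") would issue the same command as the one shown above, using the interpreter of the active environment.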

Reference

If you use the data or code in this repository, please cite our paper.

@inproceedings{dopierre-etal-2020-shot,
    title = "Few-shot Pseudo-Labeling for Intent Detection",
    author = "Dopierre, Thomas  and Gravier, Christophe  and Subercaze, Julien  and Logerais, Wilfried",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    year = "2020",
}
