NeuralBERTClassifier for Medical Slot Filling

Introduction

NeuralBERTClassifier is designed for quick implementation of neural models for the multi-label classification problem of Medical Slot Filling (MSF). A salient feature is that it currently provides a variety of text encoders, such as FastText, TextCNN, TextRNN, RCNN, VDCNN, DPCNN, DRNN, AttentiveConvNet, the Transformer encoder, and BERT. It also supports other text classification scenarios, including binary-class and multi-class classification, and is built on PyTorch. The corresponding paper, Understanding Medical Conversations with Scattered Keyword Attention and Weak Supervision from Responses, was accepted at AAAI 2020.

Notice

According to Tencent's regulations, the dataset can only be used for research purposes.

Supported tasks

  • Binary-class text classification
  • Multi-class text classification
  • Multi-label text classification
  • Hierarchical (multi-label) text classification (HMC)

Supported text encoders

  • FastText
  • TextCNN
  • TextRNN
  • RCNN
  • VDCNN
  • DPCNN
  • DRNN
  • AttentiveConvNet
  • Transformer encoder
  • BERT

Requirements

  • Python 3
  • PyTorch 0.4+
  • Numpy 1.14.3+

Usage

Training

python train.py conf/train.json

For detailed configuration options and explanations, see Configuration.

Training information is written to standard output and to log.logger_file.
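
For a concrete sense of the file, the sketch below writes a minimal conf/train.json. Only log.logger_file and data.test_json_files are named in this README; every other key, value, and path is a hypothetical placeholder, so consult Configuration for the actual schema.

import json
import os

# Minimal illustrative config sketch. Only log.logger_file and
# data.test_json_files are named in this README; the remaining keys,
# values, and paths are hypothetical placeholders -- see Configuration
# for the real schema.
conf = {
    "task_info": {"label_type": "multi_label"},  # hypothetical key
    "data": {
        "train_json_files": ["train.json"],      # hypothetical key and path
        "test_json_files": ["test.json"],
    },
    "log": {
        "logger_file": "log_train",              # training info is also written here
    },
}

os.makedirs("conf", exist_ok=True)
with open("conf/train.json", "w") as f:
    json.dump(conf, f, indent=4)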

Evaluation

python eval.py conf/train.json

  • If eval.is_flat is false, hierarchical evaluation results are output.
  • eval.model_dir is the model to evaluate.
  • data.test_json_files lists the input text files to evaluate.

The evaluation results are written to eval.dir.
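
The evaluation keys above can be sketched as nested JSON, assuming the dotted names map to nested objects; all values here are placeholders.

import json

# Evaluation-related keys mentioned in this README. Mapping the dotted names
# (eval.is_flat, eval.model_dir, eval.dir, data.test_json_files) to nested
# objects is an assumption, and all values are placeholders.
eval_conf = {
    "eval": {
        "is_flat": False,               # false -> hierarchical evaluation is output
        "model_dir": "checkpoint_dir",  # model to evaluate
        "dir": "eval_dir",              # evaluation results are written here
    },
    "data": {
        "test_json_files": ["test.json"],  # input text files to evaluate
    },
}

print(json.dumps(eval_conf, indent=4))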

Input Data Format

JSON example:

{
    "doc_label": ["Computer--MachineLearning--DeepLearning", "Neuro--ComputationalNeuro"],
    "doc_token": ["I", "love", "deep", "learning"],
    "doc_keyword": ["deep learning"],
    "doc_topic": ["AI", "Machine learning"]
}

"doc_keyword" and "doc_topic" are optional.

Update

  • 2020-10-27
