AttentionXML

AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification

Requirements

python==3.7.4
click==7.0
ruamel.yaml==0.16.5
numpy==1.16.2
scipy==1.3.1
scikit-learn==0.21.2
gensim==3.4.0
torch==1.0.1
nltk==3.4
tqdm==4.31.1
joblib==0.13.2
logzero==1.5.0

Datasets

Download the GloVe embedding (840B, 300d) and convert it to the gensim format, i.e. a file that can be loaded with gensim.models.KeyedVectors.load.

A converted GloVe embedding is also provided here.
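One possible way to do the conversion yourself is sketched below. It assumes the standard GloVe text layout (one token followed by its vector per line, no header line) and uses gensim's KeyedVectors API. The helper names and the temporary file path are illustrative and not part of this repository.

```python
import shutil


def add_word2vec_header(glove_path, w2v_path):
    """Copy a GloVe text file, prepending the "<count> <dim>" header
    line that the word2vec text format expects. Returns (count, dim)."""
    count, dim = 0, 0
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if count == 0:
                # On the first line, everything after the token is the vector.
                dim = len(line.rstrip().split(" ")) - 1
            count += 1
    with open(glove_path, encoding="utf-8") as src, \
            open(w2v_path, "w", encoding="utf-8") as dst:
        dst.write(f"{count} {dim}\n")
        shutil.copyfileobj(src, dst)
    return count, dim


def convert_glove_to_gensim(glove_path, out_path, tmp_path="glove.w2v.txt"):
    """Save GloVe vectors so that KeyedVectors.load(out_path) can read them."""
    # Imported lazily so the header helper works without gensim installed.
    from gensim.models import KeyedVectors

    add_word2vec_header(glove_path, tmp_path)
    vectors = KeyedVectors.load_word2vec_format(tmp_path, binary=False)
    vectors.save(out_path)
```

For example, convert_glove_to_gensim("glove.840B.300d.txt", "data/glove.840B.300d.gensim") would write the file passed as --w2v-model in the preprocessing commands, assuming the downloaded archive was unpacked to glove.840B.300d.txt in the working directory.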

XML Experiments

The XML experiments in the paper can be run directly, for example:

./scripts/run_eurlex.sh

Preprocess

Run preprocess.py on the train and test datasets with already tokenized texts as follows:

python preprocess.py \
--text-path data/EUR-Lex/train_texts.txt \
--label-path data/EUR-Lex/train_labels.txt \
--vocab-path data/EUR-Lex/vocab.npy \
--emb-path data/EUR-Lex/emb_init.npy \
--w2v-model data/glove.840B.300d.gensim

python preprocess.py \
--text-path data/EUR-Lex/test_texts.txt \
--label-path data/EUR-Lex/test_labels.txt \
--vocab-path data/EUR-Lex/vocab.npy

Or run preprocess.py on the raw texts and let it tokenize them with NLTK as follows:

python preprocess.py \
--text-path data/Wiki10-31K/train_raw_texts.txt \
--tokenized-path data/Wiki10-31K/train_texts.txt \
--label-path data/Wiki10-31K/train_labels.txt \
--vocab-path data/Wiki10-31K/vocab.npy \
--emb-path data/Wiki10-31K/emb_init.npy \
--w2v-model data/glove.840B.300d.gensim

python preprocess.py \
--text-path data/Wiki10-31K/test_raw_texts.txt \
--tokenized-path data/Wiki10-31K/test_texts.txt \
--label-path data/Wiki10-31K/test_labels.txt \
--vocab-path data/Wiki10-31K/vocab.npy
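To make the role of the vocabulary more concrete, the following is a minimal sketch of building a vocabulary from tokenized texts and mapping tokens to integer ids. It is not the code in preprocess.py: the special tokens <PAD> and <UNK>, the function names, and the padding scheme are all assumptions made for illustration.

```python
from collections import Counter

PAD, UNK = "<PAD>", "<UNK>"  # hypothetical special tokens


def build_vocab(tokenized_docs, max_size=None, min_count=1):
    """Return a list of tokens, most frequent first, after PAD and UNK."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    vocab = [PAD, UNK]
    for tok, freq in counts.most_common(max_size):
        if freq >= min_count:
            vocab.append(tok)
    return vocab


def encode(doc, token_to_id, max_len):
    """Map tokens to ids, truncating or padding to exactly max_len."""
    ids = [token_to_id.get(tok, token_to_id[UNK]) for tok in doc[:max_len]]
    return ids + [token_to_id[PAD]] * (max_len - len(ids))


if __name__ == "__main__":
    docs = [["a", "b", "a"], ["a", "c"]]
    vocab = build_vocab(docs)
    token_to_id = {tok: i for i, tok in enumerate(vocab)}
    print(encode(["a", "z"], token_to_id, max_len=4))
```

In a scheme like this, the test set is encoded with the vocabulary built from the training set, which would be consistent with the test commands above reusing the same --vocab-path without --emb-path or --w2v-model.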

Train and Predict

Train and predict as follows:

python main.py --data-cnf configure/datasets/EUR-Lex.yaml --model-cnf configure/models/AttentionXML-EUR-Lex.yaml

To run prediction only, add the option "--mode eval".

Ensemble

Train and predict with an ensemble:

python main.py --data-cnf configure/datasets/Wiki-500K.yaml --model-cnf configure/models/FastAttentionXML-Wiki-500K.yaml -t 0
python main.py --data-cnf configure/datasets/Wiki-500K.yaml --model-cnf configure/models/FastAttentionXML-Wiki-500K.yaml -t 1
python main.py --data-cnf configure/datasets/Wiki-500K.yaml --model-cnf configure/models/FastAttentionXML-Wiki-500K.yaml -t 2
python ensemble.py -p results/FastAttentionXML-Wiki-500K -t 3
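The last command presumably combines the predictions of the three trees trained above. As a rough illustration of how such a merge can work, the sketch below sums the scores that each model assigns to a label and keeps the highest-scoring labels for one sample. This is an assumed, simplified scheme; it is not taken from ensemble.py, and the function name and input layout are hypothetical.

```python
from collections import defaultdict


def merge_predictions(per_model_preds, top_k=5):
    """Merge predictions for a single sample.

    per_model_preds: one list per model, each holding (label, score) pairs.
    Returns the top_k (label, summed_score) pairs, best first.
    """
    totals = defaultdict(float)
    for preds in per_model_preds:
        for label, score in preds:
            totals[label] += score
    # Sort by descending score; break ties by label for a stable result.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return ranked[:top_k]


if __name__ == "__main__":
    tree0 = [("a", 0.75), ("b", 0.5)]
    tree1 = [("b", 0.5), ("c", 0.25)]
    print(merge_predictions([tree0, tree1], top_k=2))
```

Summing rather than averaging does not change the ranking when every model contributes the same number of candidates, and a label proposed by only one model still takes part with its single score.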

Evaluation

python evaluation.py --results results/AttentionXML-EUR-Lex-labels.npy --targets data/EUR-Lex/test_labels.npy

Or compute the propensity-scored metrics as well:

python evaluation.py \
--results results/FastAttentionXML-Amazon-670K-labels.npy \
--targets data/Amazon-670K/test_labels.npy \
--train-labels data/Amazon-670K/train_labels.npy \
-a 0.6 \
-b 2.6
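For orientation, the sketch below shows how precision at k and a propensity-scored variant can be computed for a single sample. The propensity model follows Jain et al. (KDD 2016), and the -a and -b options above are assumed to correspond to its parameters A and B. The function names are hypothetical, the code is not taken from evaluation.py, and the propensity-scored value is left unnormalized, whereas reported PSP@k figures are commonly divided by the best achievable score.

```python
import math


def inverse_propensity(count, num_samples, a, b):
    """Return 1 / p for a label seen `count` times among `num_samples`
    training samples, where p = 1 / (1 + C * (count + b) ** -a)
    and C = (log(num_samples) - 1) * (b + 1) ** a."""
    c = (math.log(num_samples) - 1.0) * (b + 1.0) ** a
    return 1.0 + c * (count + b) ** (-a)


def precision_at_k(ranked_labels, true_labels, k):
    """Fraction of the top-k predicted labels that are relevant."""
    hits = sum(1 for label in ranked_labels[:k] if label in true_labels)
    return hits / k


def psp_at_k(ranked_labels, true_labels, train_counts, num_samples, k, a, b):
    """Unnormalized propensity-scored precision at k: each hit is
    weighted by the inverse propensity of its label."""
    gain = sum(
        inverse_propensity(train_counts.get(label, 0), num_samples, a, b)
        for label in ranked_labels[:k]
        if label in true_labels
    )
    return gain / k


if __name__ == "__main__":
    ranked = ["x", "y", "z"]
    truth = {"x", "z"}
    counts = {"x": 1, "y": 50, "z": 100}  # made-up training frequencies
    print(precision_at_k(ranked, truth, 3))
    print(psp_at_k(ranked, truth, counts, 1000, 3, a=0.6, b=2.6))
```

Rare labels receive larger weights than frequent ones, so a model that only predicts head labels scores lower on the propensity-scored metric than on plain precision.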

Reference

You et al., AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification, NeurIPS 2019

Declaration

AttentionXML is free for non-commercial use. For commercial use, please contact Mr. Ronghui You and Prof. Shanfeng Zhu (zhusf@fudan.edu.cn).
