This code accompanies the papers *Knowledge Base Completion Meets Transfer Learning*, published at EMNLP 2021, and *Pre-training and Diagnosing Knowledge Base Completion Models*, published in the Artificial Intelligence journal.
The following packages are needed to run the code:
- Python >= 3.6
- PyTorch >= 1.6
- spaCy >= 2.0 and the `en_core_web_sm` model (see the installation command below)
- tqdm
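Once spaCy itself is installed, the `en_core_web_sm` model can be downloaded with spaCy's standard CLI:

```bash
python -m spacy download en_core_web_sm
```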
Run `setup.sh` to download and transform the GloVe embeddings and the OlpBench dataset. Please note that this downloads 3.5 GB of files, which unzip into around 10 GB of content.
For full help, run `python main.py -h`. A couple of examples are given below.
For pre-training TuckER on OlpBench, run:

```bash
python main.py -data Data/OlpBench -dim 300 -lr 1e-4 -batch 4096 -n_epochs 100 -embedding TuckER -dropout 0.3 -encoder GRU -hits [1,3,5,10,30,50] -output_dir TuckEROlpBench300 -dump_vocab -only_batch_negative
```
For pre-training, it is important to add `-dump_vocab` to store the encoder vocabulary; otherwise, it is not possible to load the stored model for fine-tuning.
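To illustrate why this matters: fine-tuning can only reuse pre-trained token embeddings if the token-to-id mapping from pre-training is preserved. The sketch below is illustrative only; the file name, format, and function names are hypothetical and not the repo's actual code.

```python
import pickle

# Hypothetical sketch: persist the encoder's token-to-id mapping so that
# fine-tuning indexes the same embedding rows as pre-training did.
def dump_vocab(token_to_id, path="vocab.pkl"):
    with open(path, "wb") as f:
        pickle.dump(token_to_id, f)

def load_vocab(path="vocab.pkl"):
    with open(path, "rb") as f:
        return pickle.load(f)
```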
For any large-scale pre-training, it is important to add the `-only_batch_negative` argument to avoid encoding all entities at every training step.
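The idea behind in-batch negatives is sketched below, assuming a simple multiplicative scoring function (the repo's actual scoring model is TuckER; this code is illustrative only): each positive triple is scored only against the tail entities encoded for the current batch, so the encoder never has to process the full entity set in a single step.

```python
import torch
import torch.nn.functional as F

def in_batch_loss(head_emb, rel_emb, tail_emb):
    """head_emb, rel_emb, tail_emb: [batch, dim] encoder outputs.

    Scores each (head, relation) query against all tails in the batch;
    the diagonal entries are the true triples, and every other entry
    serves as a negative sample.
    """
    query = head_emb * rel_emb              # toy composition, not TuckER
    scores = query @ tail_emb.T             # [batch, batch] score matrix
    targets = torch.arange(scores.size(0))  # true tail = same row index
    return F.cross_entropy(scores, targets)

# Example: a batch of 4 triples with 8-dimensional embeddings.
loss = in_batch_loss(torch.randn(4, 8), torch.randn(4, 8), torch.randn(4, 8))
```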
To fine-tune the model obtained with the above command on ReVerb20K using NoEncoder, use the command below:

```bash
python main.py -data Data/ReVerb20K -dim 300 -lr 3e-4 -batch 512 -n_epochs 500 -embedding TuckER -dropout 0.3 -encoder NoEncoder -hits [1,3,5,10,30,50] -output_dir TuckERReVerb20K -pretrained_dir TuckEROlpBench300
```
To train with the same setup but from a randomly initialized model, simply remove the `-pretrained_dir` argument:
```bash
python main.py -data Data/ReVerb20K -dim 300 -lr 3e-4 -batch 512 -n_epochs 500 -embedding TuckER -dropout 0.3 -encoder NoEncoder -hits [1,3,5,10,30,50] -output_dir TuckERReVerb20K
```
Once you have a pre-trained model, use the `run_full_diagnostics.sh` script to run all the diagnostics on it. Update the parameters at the top of the script to fit your use case (a hypothetical example is sketched after the list below). The script runs the following tests:
- Detect how much the model associates gendered names with stereotypical occupations, using the `diagnostics/gender_stereotypical` dataset
- Detect how much the model associates gendered names with anti-stereotypical occupations, using the `diagnostics/gender_anti-stereotypical` dataset
- Detect how well the model performs on knowledge of the occupations used in the above tests, using the `diagnostics/occupations_knowledge` dataset
- Detect how well the model performs on a general-knowledge test, using the `diagnostics/general_knowledge` dataset
- Perform the deductive reasoning tests, with and without fine-tuning (for comparison), using the `diagnostics/deductive_reasoning_test` dataset
- Repeat the gender stereotypical and gender anti-stereotypical tests, this time with fine-tuning on facts about individuals, to detect how stereotypes affect the model's deductive reasoning ability
- Run the diagnostics and tests of statistical significance with `diagnostics/output_analysis.py`
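For illustration, editing the parameter block at the top of `run_full_diagnostics.sh` might look like the sketch below; the variable names here are hypothetical, so check the script itself for the actual ones:

```bash
# Hypothetical example; the real script may use different variable names.
MODEL_DIR=TuckEROlpBench300   # output_dir of the pre-trained model
DATA_DIR=Data/OlpBench        # dataset the model was pre-trained on
DIM=300                       # embedding dimensionality used in pre-training
```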
If you use the code from this repo, please cite the following works:
```bibtex
@inproceedings{kocijan2021KBCtransfer,
    title     = "Knowledge Base Completion Meets Transfer Learning",
    author    = "Kocijan, Vid and Lukasiewicz, Thomas",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month     = nov,
    year      = "2021",
    address   = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
}

@article{KOCIJAN2024104081,
    title    = {Pre-training and diagnosing knowledge base completion models},
    author   = {Vid Kocijan and Myeongjun Jang and Thomas Lukasiewicz},
    journal  = {Artificial Intelligence},
    volume   = {329},
    pages    = {104081},
    year     = {2024},
    issn     = {0004-3702},
    doi      = {10.1016/j.artint.2024.104081},
    url      = {https://www.sciencedirect.com/science/article/pii/S0004370224000171},
    keywords = {Knowledge base completion, Transfer learning, Bias detection},
}
```