BERTje: A Dutch BERT model

BERTje is a Dutch pre-trained BERT model developed at the University of Groningen.

For details, check out our paper on arXiv: https://arxiv.org/abs/1912.09582

Transformers

BERTje is the default Dutch BERT model in Transformers! You can start using it with the following snippet:

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-dutch-cased")
model = BertModel.from_pretrained("bert-base-dutch-cased")

That's all! Check out the Transformers documentation for further instructions.
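As a rough illustration of what can follow the loading step, the sketch below runs one Dutch sentence through the model and returns a vector per wordpiece token. It assumes a Transformers release in which the tokenizer object is callable and that PyTorch is installed. The helper names `load_bertje`, `token_vectors` and `main` are made up for this example, and the example sentence is arbitrary. The model identifier is the one from the snippet above; newer releases may list the model under a different, namespaced identifier.

```python
# Identifier copied from the snippet above; newer library versions may
# expect a different (namespaced) name.
MODEL_NAME = "bert-base-dutch-cased"


def load_bertje(name=MODEL_NAME):
    """Load tokenizer and model; the weights are downloaded on first use."""
    # Imported here so token_vectors() can be exercised without the library.
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained(name)
    model = BertModel.from_pretrained(name)
    model.eval()                 # disable dropout for inference
    model.requires_grad_(False)  # no gradients needed for feature extraction
    return tokenizer, model


def token_vectors(text, tokenizer, model):
    """Return one hidden-state vector per wordpiece token of `text`."""
    inputs = tokenizer(text, return_tensors="pt")  # batch of size 1
    outputs = model(**inputs)
    # outputs[0] is the last hidden state: (batch, tokens, hidden size).
    return outputs[0][0]


def main():
    tokenizer, model = load_bertje()
    vectors = token_vectors("Ik hou van stroopwafels.", tokenizer, model)
    # Rows include the special [CLS] and [SEP] tokens.
    print(vectors.shape)


# Call main() to try it; it is not run automatically because it
# downloads the model.
```

Keeping loading and encoding in separate functions means the model is fetched once and can be reused for many sentences.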

Benchmarks

The arXiv paper lists benchmarks. Here are a couple of comparisons between BERTje, multilingual BERT (mBERT), BERT-NL and RobBERT that were run after the paper was written. Unlike some other comparisons, the fine-tuning procedure for these benchmarks is identical for every pre-trained model. You may be able to achieve higher scores for individual models by optimizing the fine-tuning procedure for each one.

More experimental results will be added to this page when they are finished. Technical details about how we fine-tuned these models will be published later, along with downloadable fine-tuned checkpoints.

All of the tested models are base-sized (12 layers) with cased tokenization.

Named Entity Recognition

| Model   | CoNLL-2002 | SoNaR-1 |
|---------|------------|---------|
| BERTje  | 90.24      | 84.93   |
| mBERT   | 88.61      | 84.19   |
| BERT-NL | 85.05      | 80.45   |
| RobBERT | 84.72      | -       |

Part-of-speech tagging

| Model   | UDv2.5 LassySmall |
|---------|-------------------|
| BERTje  | 96.48             |
| mBERT   | 96.49             |
| BERT-NL | 96.10             |
| RobBERT | 95.91             |
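To make the comparison easier to read, the following sketch computes how far each model is from BERTje on every benchmark. The scores are copied from the two tables above. The dictionary layout and the helper names `margins` and `best_model` are assumptions made for this illustration, not part of the published evaluation code.

```python
# Scores copied from the tables above; None marks the missing RobBERT entry.
SCORES = {
    "CoNLL-2002 (NER)": {
        "BERTje": 90.24, "mBERT": 88.61, "BERT-NL": 85.05, "RobBERT": 84.72,
    },
    "SoNaR-1 (NER)": {
        "BERTje": 84.93, "mBERT": 84.19, "BERT-NL": 80.45, "RobBERT": None,
    },
    "UDv2.5 LassySmall (POS)": {
        "BERTje": 96.48, "mBERT": 96.49, "BERT-NL": 96.10, "RobBERT": 95.91,
    },
}


def margins(scores, reference="BERTje"):
    """Score of `reference` minus each other model, per benchmark."""
    result = {}
    for bench, by_model in scores.items():
        ref = by_model[reference]
        result[bench] = {
            model: round(ref - score, 2)
            for model, score in by_model.items()
            if model != reference and score is not None  # skip missing results
        }
    return result


def best_model(by_model):
    """Name of the highest-scoring model among those with a reported score."""
    reported = {m: s for m, s in by_model.items() if s is not None}
    return max(reported, key=reported.get)


for bench, diffs in margins(SCORES).items():
    print(bench, "| best:", best_model(SCORES[bench]), "| BERTje margin:", diffs)
```

A positive margin means BERTje scored higher. On the part-of-speech benchmark the margin to mBERT is -0.01, so the two models are effectively tied there.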

Download

Download the model here:

The model is fully compatible with Transformers and interchangeable with original BERT checkpoints.

Acknowledgements

Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC).

Citation

Do you use BERTje for a publication? Please use the following citation:

@misc{vries2019bertje,
    title={BERTje: A Dutch BERT Model},
    author={Wietse de Vries and Andreas van Cranenburgh and Arianna Bisazza and Tommaso Caselli and Gertjan van Noord and Malvina Nissim},
    year={2019},
    eprint={1912.09582},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}