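In a traditional language model, such as an RNN, each word $w_i$ of a sentence $S = w_1, w_2, \ldots, w_k$ is conditioned only on the words before it:

$$p(S) = \prod_{i=1}^{k} p(w_i \mid w_1, \ldots, w_{i-1})$$

A bidirectional language model has a larger context: each word is conditioned on the words on both sides, $w_1, \ldots, w_{i-1}, w_{i+1}, \ldots, w_k$.

This implementation simply adopts the following approximation:

$$p(S) \approx \prod_{i=1}^{k} p(w_i \mid w_1, \ldots, w_{i-1}, w_{i+1}, \ldots, w_k)$$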
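As a rough illustration of how per-token probabilities combine into a sentence-level score, the sketch below sums log-probabilities and reports perplexity as the exponential of the negative mean log-probability. The helper name `perplexity` is an assumption made for this example and is not taken from the repository; the probabilities are copied from the first sentence of the test case in this README, and the result agrees with the `ppl` reported for that sentence.

```python
import math


def perplexity(token_probs):
    """Perplexity as the inverse geometric mean of per-token probabilities."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Sum log-probabilities instead of multiplying raw probabilities,
    # which avoids floating-point underflow on long sentences.
    log_prob = sum(math.log(p) for p in token_probs)
    return math.exp(-log_prob / len(token_probs))


# Per-token probabilities reported for "there is a book on the desk".
probs = [
    0.9988962411880493,     # there
    0.013578361831605434,   # is
    0.9420605897903442,     # a
    0.07452250272035599,    # book
    0.9607976675033569,     # on
    0.4983428418636322,     # the
    4.040586190967588e-06,  # desk
]

ppl = perplexity(probs)
print(round(ppl, 2))  # 17.69
```

A lower perplexity means the model finds the sentence more plausible, so scores from sentences of different lengths remain comparable because the log-probability is averaged over tokens.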
export BERT_BASE_DIR=model/uncased_L-12_H-768_A-12
export INPUT_FILE=data/lm/test.en.tsv
python \
--input_file=$INPUT_FILE \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
--max_seq_length=128 \
For the following test case:
$ cat data/lm/test.en.tsv
there is a book on the desk
there is a plane on the desk
there is a book in the desk
$ cat /tmp/lm/output/test_result.json
[
  {
    "tokens": [
      {
        "token": "there",
        "prob": 0.9988962411880493
      },
      {
        "token": "is",
        "prob": 0.013578361831605434
      },
      {
        "token": "a",
        "prob": 0.9420605897903442
      },
      {
        "token": "book",
        "prob": 0.07452250272035599
      },
      {
        "token": "on",
        "prob": 0.9607976675033569
      },
      {
        "token": "the",
        "prob": 0.4983428418636322
      },
      {
        "token": "desk",
        "prob": 4.040586190967588e-06
      }
    ],
    "ppl": 17.69329728285426
  },
  {
    "tokens": [
      {
        "token": "there",
        "prob": 0.996775209903717
      },
      {
        "token": "is",
        "prob": 0.03194097802042961
      },
      {
        "token": "a",
        "prob": 0.8877727389335632
      },
      {
        "token": "plane",
        "prob": 3.4907534427475184e-05 # low probability
      },
      {
        "token": "on",
        "prob": 0.1902322769165039
      },
      {
        "token": "the",
        "prob": 0.5981084704399109
      },
      {
        "token": "desk",
        "prob": 3.3164762953674654e-06
      }
    ],
    "ppl": 59.646456254851806
  },
  {
    "tokens": [
      {
        "token": "there",
        "prob": 0.9969795942306519
      },
      {
        "token": "is",
        "prob": 0.03379646688699722
      },
      {
        "token": "a",
        "prob": 0.9095568060874939
      },
      {
        "token": "book",
        "prob": 0.013939591124653816
      },
      {
        "token": "in",
        "prob": 0.000823647016659379 # low probability
      },
      {
        "token": "the",
        "prob": 0.5844194293022156
      },
      {
        "token": "desk",
        "prob": 3.3361218356731115e-06
      }
    ],
    "ppl": 54.65941516205144
  }
]
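
The `# low probability` annotations above can be recovered mechanically by comparing each variant with the first sentence, position by position. The following is a minimal sketch under stated assumptions: `differing_tokens` is a hypothetical helper that is not part of the repository, the sentences are assumed to have the same number of tokens, and the (token, probability) pairs are copied from the output above.

```python
def differing_tokens(reference, candidate):
    """Return (position, token, prob) for candidate tokens that differ from the reference."""
    return [
        (i, tok, prob)
        for i, ((ref_tok, _), (tok, prob)) in enumerate(zip(reference, candidate))
        if tok != ref_tok
    ]


# (token, prob) pairs copied from the three records of the test output.
book_on = [
    ("there", 0.9988962411880493), ("is", 0.013578361831605434),
    ("a", 0.9420605897903442), ("book", 0.07452250272035599),
    ("on", 0.9607976675033569), ("the", 0.4983428418636322),
    ("desk", 4.040586190967588e-06),
]
plane_on = [
    ("there", 0.996775209903717), ("is", 0.03194097802042961),
    ("a", 0.8877727389335632), ("plane", 3.4907534427475184e-05),
    ("on", 0.1902322769165039), ("the", 0.5981084704399109),
    ("desk", 3.3164762953674654e-06),
]
book_in = [
    ("there", 0.9969795942306519), ("is", 0.03379646688699722),
    ("a", 0.9095568060874939), ("book", 0.013939591124653816),
    ("in", 0.000823647016659379), ("the", 0.5844194293022156),
    ("desk", 3.3361218356731115e-06),
]

for name, candidate in [("plane_on", plane_on), ("book_in", book_in)]:
    for pos, tok, prob in differing_tokens(book_on, candidate):
        print(f"{name}: position {pos}, token {tok!r}, prob {prob:.2e}")
# plane_on: position 3, token 'plane', prob 3.49e-05
# book_in: position 4, token 'in', prob 8.24e-04
```

Comparing against a reference sentence is used here instead of a fixed probability threshold because "desk" receives a very low probability in all three sentences, so a threshold alone would not isolate the substituted words.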