# Masked Language Modelling

The SingBERT models were pre-trained on the Masked Language Modelling (MLM) and Next Sentence Prediction tasks. This notebook demonstrates how the models perform on one of those pre-training tasks, MLM.

In [1]:
from transformers import pipeline

In [2]:
import warnings
warnings.filterwarnings('ignore')

## SingBERT (base)

Here we use `[MASK]` to mark the "blank" that the model should fill in.
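As a small illustration of how such a prompt can be built, the sketch below replaces the word at a given position with the mask token. The helper name `mask_at` is hypothetical, and the default `[MASK]` string is assumed to match the tokenizer in use (for a loaded pipeline, `unmasker.tokenizer.mask_token` reports the actual string).

```python
def mask_at(sentence, index, mask_token="[MASK]"):
    """Replace the whitespace-separated word at `index` with `mask_token`."""
    words = sentence.split()
    if not 0 <= index < len(words):
        raise IndexError(f"index {index} is out of range for {sentence!r}")
    words[index] = mask_token
    return " ".join(words)


# Blank out the second word (index 1) of a four-word phrase.
print(mask_at("die die must try", 1))
```

The prompts in this notebook each contain exactly one `[MASK]`, which keeps the pipeline output a flat list of candidate fills.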

In [3]:
# allow up to 10 mins to download the model when running for the first time
unmasker = pipeline('fill-mask', model='zanelim/singbert')

Some weights of the model checkpoint at zanelim/singbert were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [4]:
unmasker("kopi c siew [MASK]")

[{'sequence': '[CLS] kopi c siew dai [SEP]',
  'score': 0.5092713236808777,
  'token': 18765,
  'token_str': 'dai'},
 {'sequence': '[CLS] kopi c siew mai [SEP]',
  'score': 0.3515934646129608,
  'token': 14736,
  'token_str': 'mai'},
 {'sequence': '[CLS] kopi c siew bao [SEP]',
  'score': 0.05576375499367714,
  'token': 25945,
  'token_str': 'bao'},
 {'sequence': '[CLS] kopi c siew. [SEP]',
  'score': 0.006019321270287037,
  'token': 1012,
  'token_str': '.'},
 {'sequence': '[CLS] kopi c siew sai [SEP]',
  'score': 0.0038361591286957264,
  'token': 18952,
  'token_str': 'sai'}]
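The raw output is a list of dictionaries with `sequence`, `score`, `token` and `token_str` keys. A minimal sketch of condensing it into `(token, score)` pairs follows; the helper name `summarize` is hypothetical, and the sample data is the first three predictions copied from the output above.

```python
def summarize(results, digits=3):
    """Return (token_str, rounded score) pairs, highest score first."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["token_str"], round(r["score"], digits)) for r in ranked]


# First three predictions for "kopi c siew [MASK]", copied from the output above.
results = [
    {"sequence": "[CLS] kopi c siew dai [SEP]",
     "score": 0.5092713236808777, "token": 18765, "token_str": "dai"},
    {"sequence": "[CLS] kopi c siew mai [SEP]",
     "score": 0.3515934646129608, "token": 14736, "token_str": "mai"},
    {"sequence": "[CLS] kopi c siew bao [SEP]",
     "score": 0.05576375499367714, "token": 25945, "token_str": "bao"},
]

for token, score in summarize(results):
    print(f"{token:>5}  {score:.3f}")
```

With a loaded pipeline, the same helper could be applied directly to the return value of `unmasker(...)`.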

In [5]:
unmasker("one teh c siew dai, and one kopi [MASK]")

[{'sequence': '[CLS] one teh c siew dai, and one kopi c [SEP]',
  'score': 0.6176503300666809,
  'token': 1039,
  'token_str': 'c'},
 {'sequence': '[CLS] one teh c siew dai, and one kopi o [SEP]',
  'score': 0.21094971895217896,
  'token': 1051,
  'token_str': 'o'},
 {'sequence': '[CLS] one teh c siew dai, and one kopi. [SEP]',
  'score': 0.13027705252170563,
  'token': 1012,
  'token_str': '.'},
 {'sequence': '[CLS] one teh c siew dai, and one kopi! [SEP]',
  'score': 0.004680239595472813,
  'token': 999,
  'token_str': '!'},
 {'sequence': '[CLS] one teh c siew dai, and one kopi w [SEP]',
  'score': 0.002034128177911043,
  'token': 1059,
  'token_str': 'w'}]

In [6]:
unmasker("die [MASK] must try")

[{'sequence': '[CLS] die die must try [SEP]',
  'score': 0.9552758932113647,
  'token': 3280,
  'token_str': 'die'},
 {'sequence': '[CLS] die also must try [SEP]',
  'score': 0.03644804656505585,
  'token': 2036,
  'token_str': 'also'},
 {'sequence': '[CLS] die liao must try [SEP]',
  'score': 0.003282855963334441,
  'token': 727,
  'token_str': 'liao'},
 {'sequence': '[CLS] die already must try [SEP]',
  'score': 0.0004937972989864647,
  'token': 2525,
  'token_str': 'already'},
 {'sequence': '[CLS] die hard must try [SEP]',
  'score': 0.0003659659414552152,
  'token': 2524,
  'token_str': 'hard'}]

In [7]:
unmasker("dont play [MASK] leh")

[{'sequence': '[CLS] dont play play leh [SEP]',
  'score': 0.9281464219093323,
  'token': 2377,
  'token_str': 'play'},
 {'sequence': '[CLS] dont play politics leh [SEP]',
  'score': 0.010990909300744534,
  'token': 4331,
  'token_str': 'politics'},
 {'sequence': '[CLS] dont play punk leh [SEP]',
  'score': 0.005583590362221003,
  'token': 7196,
  'token_str': 'punk'},
 {'sequence': '[CLS] dont play dirty leh [SEP]',
  'score': 0.0025784350000321865,
  'token': 6530,
  'token_str': 'dirty'},
 {'sequence': '[CLS] dont play cheat leh [SEP]',
  'score': 0.0025066907983273268,
  'token': 21910,
  'token_str': 'cheat'}]

In [8]:
unmasker("confirm plus [MASK]")

[{'sequence': '[CLS] confirm plus chop [SEP]',
  'score': 0.992355227470398,
  'token': 24494,
  'token_str': 'chop'},
 {'sequence': '[CLS] confirm plus one [SEP]',
  'score': 0.0037301010452210903,
  'token': 2028,
  'token_str': 'one'},
 {'sequence': '[CLS] confirm plus minus [SEP]',
  'score': 0.0014284878270700574,
  'token': 15718,
  'token_str': 'minus'},
 {'sequence': '[CLS] confirm plus 1 [SEP]',
  'score': 0.0011354683665558696,
  'token': 1015,
  'token_str': '1'},
 {'sequence': '[CLS] confirm plus chopped [SEP]',
  'score': 0.0003804611915256828,
  'token': 24881,
  'token_str': 'chopped'}]

In [9]:
unmasker("catch no [MASK]")

[{'sequence': '[CLS] catch no ball [SEP]',
  'score': 0.7922210693359375,
  'token': 3608,
  'token_str': 'ball'},
 {'sequence': '[CLS] catch no balls [SEP]',
  'score': 0.20503675937652588,
  'token': 7395,
  'token_str': 'balls'},
 {'sequence': '[CLS] catch no tail [SEP]',
  'score': 0.0006608376861549914,
  'token': 5725,
  'token_str': 'tail'},
 {'sequence': '[CLS] catch no talent [SEP]',
  'score': 0.0002158183924620971,
  'token': 5848,
  'token_str': 'talent'},
 {'sequence': '[CLS] catch no prisoners [SEP]',
  'score': 5.3481446229852736e-05,
  'token': 5895,
  'token_str': 'prisoners'}]

## SingBERT (large)

Besides the examples above, `[MASK]` can also be used to answer factual questions.

Here we use SingBERT large to demonstrate this, but the same approach should apply to SingBERT base too.
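One way to check whether both checkpoints behave alike is to send the same prompt to each and compare the top prediction. The sketch below assumes each model is wrapped as a callable that returns the list-of-dictionaries format shown in this notebook; the helper name `compare_top1` and the dictionary keys `"base"` and `"large"` are hypothetical.

```python
def compare_top1(unmaskers, prompt):
    """Map each model name to its top (token_str, score) for `prompt`.

    `unmaskers` is a dict of name -> callable; each callable takes a prompt
    and returns a list of dicts with 'token_str' and 'score' keys.
    """
    top = {}
    for name, unmask in unmaskers.items():
        best = max(unmask(prompt), key=lambda r: r["score"])
        top[name] = (best["token_str"], best["score"])
    return top


# Usage sketch (downloads both checkpoints on first run):
# from transformers import pipeline
# unmaskers = {
#     "base": pipeline("fill-mask", model="zanelim/singbert"),
#     "large": pipeline("fill-mask", model="zanelim/singbert-large-sg"),
# }
# print(compare_top1(unmaskers, "singapore gained independence in [MASK]"))
```

Keeping the helper independent of `transformers` means it can be exercised with stub callables before any model is downloaded.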

In [10]:
# allow up to 10 mins to download the model when running for the first time
unmasker = pipeline('fill-mask', model='zanelim/singbert-large-sg')

Some weights of the model checkpoint at zanelim/singbert-large-sg were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [11]:
unmasker("lee hsien loong is the [MASK] of amk")

[{'sequence': '[CLS] lee hsien loong is the son of amk [SEP]',
  'score': 0.40440040826797485,
  'token': 2365,
  'token_str': 'son'},
 {'sequence': '[CLS] lee hsien loong is the mp of amk [SEP]',
  'score': 0.20868946611881256,
  'token': 6131,
  'token_str': 'mp'},
 {'sequence': '[CLS] lee hsien loong is the boss of amk [SEP]',
  'score': 0.08679094165563583,
  'token': 5795,
  'token_str': 'boss'},
 {'sequence': '[CLS] lee hsien loong is the pm of amk [SEP]',
  'score': 0.06748040020465851,
  'token': 7610,
  'token_str': 'pm'},
 {'sequence': '[CLS] lee hsien loong is the mayor of amk [SEP]',
  'score': 0.03794284909963608,
  'token': 3664,
  'token_str': 'mayor'}]

In [12]:
unmasker("lee kuan yew is the [MASK] of lee hsien loong")

[{'sequence': '[CLS] lee kuan yew is the father of lee hsien loong [SEP]',
  'score': 0.7821384072303772,
  'token': 2269,
  'token_str': 'father'},
 {'sequence': '[CLS] lee kuan yew is the grandfather of lee hsien loong [SEP]',
  'score': 0.16358590126037598,
  'token': 5615,
  'token_str': 'grandfather'},
 {'sequence': '[CLS] lee kuan yew is the ancestor of lee hsien loong [SEP]',
  'score': 0.020847953855991364,
  'token': 13032,
  'token_str': 'ancestor'},
 {'sequence': '[CLS] lee kuan yew is the brother of lee hsien loong [SEP]',
  'score': 0.006570274010300636,
  'token': 2567,
  'token_str': 'brother'},
 {'sequence': '[CLS] lee kuan yew is the predecessor of lee hsien loong [SEP]',
  'score': 0.003436507424339652,
  'token': 8646,
  'token_str': 'predecessor'}]

In [13]:
unmasker("singapore gained independence in [MASK]")

[{'sequence': '[CLS] singapore gained independence in 1965 [SEP]',
  'score': 0.9868552088737488,
  'token': 3551,
  'token_str': '1965'},
 {'sequence': '[CLS] singapore gained independence in 1957 [SEP]',
  'score': 0.0034552591387182474,
  'token': 3890,
  'token_str': '1957'},
 {'sequence': '[CLS] singapore gained independence in 1959 [SEP]',
  'score': 0.002683347323909402,
  'token': 3851,
  'token_str': '1959'},
 {'sequence': '[CLS] singapore gained independence in 1963 [SEP]',
  'score': 0.0014089902397245169,
  'token': 3699,
  'token_str': '1963'},
 {'sequence': '[CLS] singapore gained independence in 1960 [SEP]',
  'score': 0.000977560761384666,
  'token': 3624,
  'token_str': '1960'}]
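The scores above vary widely: the top answer for the independence year scores about 0.99, while the top answer for the AMK prompt scores only about 0.40. A minimal sketch of accepting a factual answer only when the top score clears a cut-off follows. The helper name `confident_answer` and the 0.5 threshold are assumptions chosen for illustration, not values from the model authors; the sample data is the top two predictions copied from the outputs above.

```python
def confident_answer(results, threshold=0.5):
    """Return the top token if its score reaches `threshold`, else None."""
    best = max(results, key=lambda r: r["score"])
    return best["token_str"] if best["score"] >= threshold else None


# Top two predictions for each prompt, copied from the outputs above.
amk = [
    {"token_str": "son", "score": 0.40440040826797485},
    {"token_str": "mp", "score": 0.20868946611881256},
]
year = [
    {"token_str": "1965", "score": 0.9868552088737488},
    {"token_str": "1957", "score": 0.0034552591387182474},
]

print(confident_answer(amk))   # None: top score is below the assumed threshold
print(confident_answer(year))  # 1965
```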