small fix to tokenizer
Alex Amadori committed Oct 30, 2019
1 parent be35422 commit d01f173
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions utils.py
@@ -9,7 +9,10 @@
 from torchtext.vocab import pretrained_aliases, Vocab
 
 import spacy
+from spacy.symbols import ORTH
 
 spacy_en = spacy.load("en")
+spacy_en.tokenizer.add_special_case("<mask>", [{ORTH: "<mask>"}])
+
 def set_seed(seed):
     np.random.seed(seed)
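The added special case tells spaCy's tokenizer to keep the "<mask>" sentinel as one token; by default, English punctuation rules would split it apart on the angle brackets, so the marker would never survive tokenization intact. A minimal sketch of the effect (assumes spaCy is installed; it uses a blank English pipeline instead of the commit's `spacy.load("en")` so no model download is needed):

```python
import spacy
from spacy.symbols import ORTH

# Blank English pipeline: same default punctuation rules, no model download.
nlp = spacy.blank("en")

# Without the special case, "<mask>" is split on the angle brackets.
before = [t.text for t in nlp.tokenizer("a <mask> b")]

# Register "<mask>" as an atomic token, as the commit does.
nlp.tokenizer.add_special_case("<mask>", [{ORTH: "<mask>"}])
after = [t.text for t in nlp.tokenizer("a <mask> b")]

print(before)  # "<mask>" broken into pieces such as "<", "mask", ">"
print(after)   # "<mask>" kept whole: ["a", "<mask>", "b"]
```

Without this fix, downstream code that looks for a `<mask>` token in the tokenized output would silently never match.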
