WPE

Word Pair Encoding (WPE) for semi-automatic generation of meaningful keywords.

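WPE is simply a word-level version of Byte Pair Encoding (BPE): instead of merging frequent pairs of characters, it merges frequent pairs of adjacent words. The implementation is optimized for the word-level case.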

Most of the implementation ideas were borrowed from subword-nmt, which is why it is fast.

from wpe import WPE
wpe = WPE(["a a b", "a a c"], min_freq=2)
wpe.preprocess()
wpe.encode()  # we get "a a", "b", "c" as keywords!
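
To make the idea concrete, here is a minimal sketch of word-level pair merging in plain Python. It is not the code of this library: the function names `merge_pair` and `naive_word_pair_merge` are hypothetical, and the sketch recounts all pairs on every round, so it has none of the speed optimizations borrowed from subword-nmt. It only assumes that `min_freq` means "stop merging once the most frequent adjacent pair occurs fewer than `min_freq` times".

```python
from collections import Counter


def merge_pair(tokens, pair):
    """Replace each adjacent occurrence of `pair` in `tokens` with one joined token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            # Join the two words into a single multi-word keyword.
            merged.append(tokens[i] + " " + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def naive_word_pair_merge(sentences, min_freq=2):
    """Hypothetical helper: repeatedly merge the most frequent adjacent word pair."""
    docs = [s.split() for s in sentences]
    while True:
        # Count every adjacent pair of tokens across all sentences.
        pair_counts = Counter()
        for tokens in docs:
            pair_counts.update(zip(tokens, tokens[1:]))
        if not pair_counts:
            break
        best_pair, freq = pair_counts.most_common(1)[0]
        if freq < min_freq:
            break  # assumed stopping rule: no pair is frequent enough
        docs = [merge_pair(tokens, best_pair) for tokens in docs]
    return docs


print(naive_word_pair_merge(["a a b", "a a c"], min_freq=2))
# [['a a', 'b'], ['a a', 'c']]
```

On the two sentences from the example above, the pair ("a", "a") occurs twice and is merged into "a a"; every remaining pair occurs only once, so merging stops and the keywords are "a a", "b" and "c".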