Word Pair Encoding (WPE) for semi-automatic meaningful-keywords generation.
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
data
.gitignore
LICENSE
README.md
main.py
utils.py
wpe.py

README.md

WPE

Word Pair Encoding (WPE) for semi-automatic meaningful-keywords generation.

Of course, just word-level version of Byte Pair Encoding (BPE) but quite optimized for word-level.

Most of implementation ideas were borrowed from subword-nmt. Which means fast.

from wpe import WPE
wpe = WPE(["a a b", "a a c"], min_freq=2)
wpe.preprocess()
wpe.encode()  # we get "a a", "b", "c" as keywords!