ngram-trie is a Rust library for handling n-gram data efficiently with a trie. It supports fitting, saving, loading, and querying n-gram models, with several smoothing techniques.
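To make the trie-based approach concrete, here is a minimal pure-Python sketch of counting n-grams in a trie. It is illustrative only and is not the library's implementation: the `TrieNode` and `NGramTrie` classes and their methods are hypothetical names chosen for this example, and tokens are assumed to be integers or any other hashable values.

```python
from typing import Dict, Hashable, Sequence


class TrieNode:
    """One node per token; `count` is how often the path to this node was seen."""

    def __init__(self) -> None:
        self.count: int = 0
        self.children: Dict[Hashable, "TrieNode"] = {}


class NGramTrie:
    """Toy n-gram trie (hypothetical, not the ngram-trie implementation)."""

    def __init__(self, max_length: int) -> None:
        self.max_length = max_length
        self.root = TrieNode()

    def fit(self, tokens: Sequence[Hashable]) -> None:
        # From every start position, walk down the trie for up to
        # max_length tokens, incrementing the count of each node visited.
        for start in range(len(tokens)):
            node = self.root
            node.count += 1  # the root ends up holding the total token count
            for token in tokens[start:start + self.max_length]:
                node = node.children.setdefault(token, TrieNode())
                node.count += 1

    def count(self, ngram: Sequence[Hashable]) -> int:
        # Follow the n-gram from the root; a missing child means it was never seen.
        node = self.root
        for token in ngram:
            child = node.children.get(token)
            if child is None:
                return 0
            node = child
        return node.count


trie = NGramTrie(max_length=3)
trie.fit([1, 2, 3, 1, 2, 4])
print(trie.count([1, 2]))
```

Because n-grams that share a prefix share a path, the count of every prefix is available on the way to the count of the full n-gram.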
-
Include it in the Cargo.toml:
[dependencies]
ngram-trie = { git = "https://github.com/behappiness/ngram-trie" }
-
Install the Python package with pip:
pip install ngram-trie
from ngram_trie import PySmoothedTrie
trie = PySmoothedTrie(n_gram_max_length=7, root_capacity=None)
trie.fit(tokenized_data, n_gram_max_length=7, root_capacity=None, max_tokens=None)
trie.set_rule_set(["++++++", "+++++", "++++", "+++", "++", "+"])
trie.fit_smoothing()
trie.get_prediction_probabilities(tokenized_context)
# Pass a smoother name to choose the method: "modified_kneser_ney" or "stupid_backoff"
trie.fit_smoothing("modified_kneser_ney")
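For readers unfamiliar with the "stupid_backoff" option, the sketch below shows the general stupid backoff technique in plain Python. It is not taken from this library: the helper names `count_ngrams` and `stupid_backoff` are hypothetical, and the backoff factor of 0.4 is the commonly used default for the technique, assumed here rather than confirmed for ngram-trie.

```python
from collections import Counter
from typing import Sequence


def count_ngrams(tokens: Sequence[int], max_length: int) -> Counter:
    """Count every n-gram of length 1..max_length, keyed by tuple."""
    counts: Counter = Counter()
    for n in range(1, max_length + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(
    counts: Counter,
    total_tokens: int,
    context: Sequence[int],
    word: int,
    alpha: float = 0.4,  # assumed backoff factor, not confirmed for ngram-trie
) -> float:
    """Score `word` after `context`, shortening the context until a match is found."""
    ctx = tuple(context)
    penalty = 1.0
    while ctx:
        ngram_count = counts[ctx + (word,)]
        if ngram_count > 0:
            # Seen n-gram: relative frequency, discounted once per backoff step.
            return penalty * ngram_count / counts[ctx]
        ctx = ctx[1:]  # drop the oldest context token
        penalty *= alpha
    # No context left: fall back to the unigram relative frequency.
    return penalty * counts[(word,)] / total_tokens


tokens = [1, 2, 3, 1, 2, 4]
counts = count_ngrams(tokens, max_length=3)
print(stupid_backoff(counts, len(tokens), [1, 2], 3))
```

The values returned are scores rather than normalized probabilities, which is the trade-off stupid backoff makes for simplicity.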
from ngram_trie import PySmoothedTrie
trie = PySmoothedTrie(n_gram_max_length=7, root_capacity=None)
trie.fit(tokenized_data, n_gram_max_length=7, root_capacity=None, max_tokens=None)
trie.set_rule_set(rules)  # rules: a list of rule strings, e.g. ["++++++", "+++++", "++++", "+++", "++", "+"]
trie.get_unsmoothed_probabilities(tokenized_context)
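The return format of `get_unsmoothed_probabilities` is not described here. Assuming "unsmoothed" means plain relative frequencies, the idea can be sketched in pure Python as follows. The function `next_token_distribution` is a hypothetical helper for illustration and does not use or mirror the library's rule sets.

```python
from collections import Counter
from typing import Dict, Sequence


def next_token_distribution(
    tokens: Sequence[int], context: Sequence[int]
) -> Dict[int, float]:
    """Relative frequency of each token that directly follows `context`."""
    ctx = tuple(context)
    n = len(ctx)
    # Collect the token after every occurrence of the context.
    followers = Counter(
        tokens[i + n]
        for i in range(len(tokens) - n)
        if tuple(tokens[i:i + n]) == ctx
    )
    total = sum(followers.values())
    # An unseen context yields an empty dict: without smoothing there is
    # no probability mass to assign.
    return {token: count / total for token, count in followers.items()}


print(next_token_distribution([1, 2, 3, 1, 2, 4], [1, 2]))
```

An empty result for an unseen context is the gap that smoothing methods are meant to fill.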
cargo add pyo3 --features extension-module
maturin build
Possible future developments include HTTP support for all functions and distributed computing with a master node that distributes work based on the rules and the n-gram counts.