Implement multilingual KeyNMF #10

@x-tabdeveloping

Rationale:

By default, KeyNMF is not capable of multilingual topic modeling. The model can only label a text with keywords that occur in that text, so a Spanish text can never receive English keywords. NMF is therefore likely to split the same subject into separate topics, one per language.

Solution:

We can fix this by allowing the keywords for each text to be selected from the whole vocabulary of the corpus instead of just the words that occur in that text.

Interface:

KeyNMF could take one more parameter at initialisation that indicates whether the whole vocabulary should be used when extracting keywords.
For example, something like this:

model = KeyNMF(10, keyword_scope="corpus")  # keywords drawn from the whole vocabulary
# OR
model = KeyNMF(10, keyword_scope="document")  # keywords restricted to words in each text
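
To make the difference between the two scopes concrete, here is a minimal sketch of the keyword selection step. It is not KeyNMF's actual implementation: the function name `extract_keywords`, its arguments, and the hand-made two-dimensional "embeddings" are all assumptions for illustration. In practice the vectors would come from a multilingual encoder.

```python
import numpy as np


def extract_keywords(doc_vec, vocab_vecs, vocab, doc_counts,
                     top_n=2, keyword_scope="document"):
    """Hypothetical helper: pick the top_n words most similar to a document."""
    # Cosine similarity between the document and every vocabulary word.
    sims = (vocab_vecs @ doc_vec) / (
        np.linalg.norm(vocab_vecs, axis=1) * np.linalg.norm(doc_vec)
    )
    if keyword_scope == "document":
        # Only words that actually occur in this text may become keywords.
        sims = np.where(doc_counts > 0, sims, -np.inf)
    elif keyword_scope != "corpus":
        raise ValueError("keyword_scope must be 'corpus' or 'document'")
    top = np.argsort(-sims)[:top_n]
    # Keep positively related words only; this also drops masked entries.
    return {vocab[i]: float(sims[i]) for i in top if sims[i] > 0}


# Toy vocabulary built from an English and a Spanish text (made-up vectors).
vocab = ["dog", "perro", "tax", "impuesto"]
vocab_vecs = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])

# A Spanish text about dogs: it contains "perro" but not "dog".
doc_vec = np.array([1.0, 0.15])
doc_counts = np.array([0, 1, 0, 0])

document_keywords = extract_keywords(
    doc_vec, vocab_vecs, vocab, doc_counts, keyword_scope="document"
)
corpus_keywords = extract_keywords(
    doc_vec, vocab_vecs, vocab, doc_counts, keyword_scope="corpus"
)
print(sorted(document_keywords))
print(sorted(corpus_keywords))
```

With `keyword_scope="document"` the Spanish text can only be labelled with "perro", while with `keyword_scope="corpus"` it also picks up "dog". The shared keyword is what would let NMF place English and Spanish texts on the same subject into one topic.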
