Nice! Additionally, I'd be in favour of adding a property that selects the best ranker (up to you, or us).
textpipe/doc.py
Outdated
@@ -314,3 +315,39 @@ def sentiment(self):
return sentiment_it(self.clean)

raise TextpipeMissingModelException(f'No sentiment model for {self.language}')

def extract_keyphrases(self, ranker='textrank', n_terms=10, **kwargs):
Can you add an LRU cache to this method?
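For illustration, a minimal sketch of what caching this method could look like with the standard library's functools.lru_cache. The Doc class, the raw attribute, the calls counter, and the frequency-based ranking are hypothetical stand-ins, not textpipe's actual code; only the method signature comes from the diff above.

```python
from functools import lru_cache


class Doc:
    """Hypothetical stand-in for textpipe's Doc; only what the sketch needs."""

    def __init__(self, raw):
        self.raw = raw
        self.calls = 0  # counts how often the uncached path actually runs

    @lru_cache(maxsize=128)
    def extract_keyphrases(self, ranker='textrank', n_terms=10, **kwargs):
        self.calls += 1
        # Placeholder ranking by word frequency; a real ranker would go here.
        counts = {}
        for word in self.raw.lower().split():
            counts[word] = counts.get(word, 0) + 1
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        # Return a tuple so callers cannot mutate the cached value.
        return tuple(ranked[:n_terms])


doc = Doc("a b a c a b")
first = doc.extract_keyphrases(n_terms=2)
second = doc.extract_keyphrases(n_terms=2)  # served from the cache
```

Two caveats with this approach: every argument (including values passed through **kwargs) has to be hashable, and the cache holds a reference to self, so cached documents stay alive until their entries are evicted.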
c5c085b to cc1a175
Don't forget to bump the version to 0.6.0
In terms of choosing the "best" ranker, the reason why |
6f92b07 to 67816f4
I think going with |
dcfee29 to 2776a10
Cool, one last thing: can you add a matching operation (in operation.py)?
2776a10 to 91e504d
FYI, I rebased my changes on your latest ones in |
VERSION
Outdated
@@ -1 +1 @@
-0.5.2
+0.5.3
could be 0.6.0 bump
or should, I think
Change is that major? ;-)
No, minor :) Major would increment the first digit.
Minor is for added functionality.
The last digit is for patches / bug fixes.
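The numbering scheme described here can be sketched as a small helper. The bump function and its part argument are hypothetical names for illustration only; nothing like it is implied to exist in the repository.

```python
def bump(version, part):
    """Return `version` ("MAJOR.MINOR.PATCH") with the given part incremented."""
    major, minor, patch = (int(piece) for piece in version.split('.'))
    if part == 'major':
        # Major bump: lower digits reset to zero.
        return f'{major + 1}.0.0'
    if part == 'minor':
        # Added functionality: second digit up, patch resets.
        return f'{major}.{minor + 1}.0'
    if part == 'patch':
        # Patches / bug fixes: last digit up.
        return f'{major}.{minor}.{patch + 1}'
    raise ValueError(f'Unknown version part: {part!r}')


patch_release = bump('0.5.2', 'patch')  # '0.5.3', what the diff currently does
minor_release = bump('0.5.2', 'minor')  # '0.6.0', what the review asks for
```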
809bf64 to 1f1b225
Does textrank only extract 1-grams or also actual phrases? I only see a multi-word term in the sgrank example. If the function is called 'extract_keyphrases', I expect more (or only) multi-word terms. Otherwise, use something like extract_keyterms as a name to be clear about the functionality.
@msappelli It doesn't. Textacy hardcodes the parameter for joining terms for That's totally subjective though, so if you (the maintainers) feel |
@bartdegoede I don't think it's subjective. I see that in the IR community keyterm / keyword / keyphrase are often used interchangeably. But in NLP/linguistics there is a difference: a 'keyterm' is typically a 1-, 2- or 3-gram (so a 'term' is not necessarily a 1-gram), whereas a (key)phrase is a group of words (>1) that has a certain grammatical or semantic function. Additionally, 'keyterms' would be consistent with what textacy calls it, and since we are using their function under the hood, I would keep it consistent.
Awesome, makes total sense :-) Changing it now. EDIT: Just noticed I put |
Follow up on #73
It adds an extract_keyphrases method rather than a property, because there are three different algorithms that we can use. This way, it's up to the user what to pick. The sgrank algorithm provides some additional arguments (selecting which ngrams to consider, for example), so I've added a **kwargs for users to pass these arguments on.
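As a rough sketch of the design described above (a method that picks a ranker by name and forwards extra keyword arguments to it), assuming nothing about the real implementation: the two ranker functions below are simple frequency counters standing in for the actual textrank and sgrank algorithms, and the names RANKERS, _rank_unigrams, _rank_ngrams and the ngrams parameter are made up for this example.

```python
from collections import Counter


def _rank_unigrams(text, n_terms):
    # Stand-in for a ranker without extra options: counts single words.
    counts = Counter(text.lower().split())
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n_terms]


def _rank_ngrams(text, n_terms, ngrams=(1, 2)):
    # Stand-in for a ranker with an extra option: which n-gram sizes to count.
    words = text.lower().split()
    counts = Counter(
        ' '.join(words[i:i + n])
        for n in ngrams
        for i in range(len(words) - n + 1)
    )
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n_terms]


# Hypothetical registry mapping ranker names to stand-in implementations.
RANKERS = {'textrank': _rank_unigrams, 'sgrank': _rank_ngrams}


def extract_keyphrases(text, ranker='textrank', n_terms=10, **kwargs):
    if ranker not in RANKERS:
        raise ValueError(f'Unknown ranker: {ranker!r}')
    # Ranker-specific options travel through **kwargs untouched.
    return RANKERS[ranker](text, n_terms, **kwargs)


unigrams = extract_keyphrases('new york new york city', n_terms=2)
bigrams = extract_keyphrases('new york new york city', ranker='sgrank',
                             n_terms=1, ngrams=(2,))
```

Keeping the ranker a plain argument means a new algorithm only needs a new registry entry, while options that only one ranker understands never have to appear in the shared signature.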