extractive summarization using textrank #139
Implementation for issue #138.
Now that I look at the Gensim implementation of TextRank, I see that it also includes an optional Also, this is not the original TextRank algorithm, but a variant that uses BM25 (and a lot of custom text cleaning under the hood). Are we okay with that?
@lmdehaas The reason I stuck with ratio is that, at the sentence level, I find the word count parameter less transparent and less useful: you don't know how long your sentences are, so you don't know the exact number of words you are going to need. On the other hand, if we have a use case where we want a fixed summary length, word_count makes more sense. Do we envision such a use case? If so, I will add the word_count parameter. I personally have no problem with it being a variation on the TextRank algorithm, as my goal was simply to exploit the gensim implementation, since we already wrap other gensim functions. If this is a problem, please let me know (we can also rename it 'GensimTextrank' for clarity).
I can definitely imagine such a use case; it's the most common use case for summarization benchmarking, so I would prefer including that parameter. Concerning the alternative TextRank implementation: I don't mind, I just wanted to make sure we were aware of this!
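To make the trade-off between the two parameters concrete, here is a minimal sketch of how each could cut down a list of sentences that is already sorted from most to least important. The helper names `select_by_ratio` and `select_by_word_count` are hypothetical and exist only for illustration; they are not the gensim code or the code in this PR, and the stopping rule for the word budget is an assumption.

```python
def select_by_ratio(ranked, ratio=0.2):
    """Keep the top `ratio` fraction of the ranked sentences (at least one)."""
    keep = max(1, int(len(ranked) * ratio))
    return ranked[:keep]


def select_by_word_count(ranked, word_count):
    """Add sentences in rank order until the next one would exceed the budget.

    Assumption: the first sentence is always kept, even if it alone is
    longer than `word_count`, so the summary is never empty.
    """
    selected, total = [], 0
    for sentence in ranked:
        n_words = len(sentence.split())
        if selected and total + n_words > word_count:
            break
        selected.append(sentence)
        total += n_words
    return selected


# Hypothetical ranked sentences, best first.
ranked = ["a b c d e", "f g", "h i j", "k"]
print(select_by_ratio(ranked, ratio=0.5))
print(select_by_word_count(ranked, word_count=7))
```

With ratio, the number of sentences is predictable but the length in words is not; with word_count, it is the other way around, which is why a fixed word budget suits benchmarking.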
LGTM, apart from the issue found by codacy
Oh! And don't forget to bump the version
Cool feature!
TextRank was implemented using the gensim framework to extract a list of sentences from a document that can serve as a summary of that document.
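For readers unfamiliar with the idea, the following is a rough, standard-library-only sketch of sentence-level TextRank. It uses the word-overlap similarity from the original TextRank formulation, not the BM25 variant that gensim applies, and it is not the code added in this PR. The function names, the regex-based sentence splitter, and the damping and iteration defaults are all assumptions made for illustration.

```python
import math
import re


def split_sentences(text):
    """Naive sentence splitter (assumption: sentences end in '.', '!' or '?')."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(words_a, words_b):
    """Shared words, normalised by the log of each sentence's word count."""
    denominator = math.log(len(words_a) or 1) + math.log(len(words_b) or 1)
    if denominator <= 0:
        return 0.0
    return len(words_a & words_b) / denominator


def textrank_summary(text, ratio=0.2, damping=0.85, iterations=50):
    """Return the top-ranked sentences, in their original document order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return []
    tokens = [set(re.findall(r"\w+", s.lower())) for s in sentences]

    # Build a symmetric weighted graph with one node per sentence.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weights[i][j] = weights[j][i] = similarity(tokens[i], tokens[j])
    out_sum = [sum(row) for row in weights]

    # Weighted PageRank, run for a fixed number of iterations.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]

    keep = max(1, int(n * ratio))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]


text = (
    "The cat sat on the mat. The cat ate the fish on the mat. "
    "Dogs bark loudly at night. The mat was red and the cat liked it."
)
print(textrank_summary(text, ratio=0.5))
```

Sentences that share vocabulary with many others accumulate score, while an isolated sentence (here, the one about dogs) stays at the baseline and is dropped first.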