extractive summarization using textrank #139

msappelli · 2019-05-31T10:59:55Z

TextRank was implemented using the gensim framework to extract a list of sentences from a document that can serve as a summary of that document.
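A minimal sketch of what such a wrapper could look like, not the code merged in this PR. The function name `summary_sentences`, the injected `summarizer` argument, and the stub below are hypothetical. The call shape `(text, ratio=..., word_count=..., split=...)` is modelled on `gensim.summarization.summarize` from gensim 3.x, and the assumption that too-short inputs raise `ValueError` is based on the short-input fix mentioned in the commits.

```python
from typing import Callable, List, Optional


def summary_sentences(
    text: str,
    summarizer: Callable[..., List[str]],
    ratio: float = 0.2,
    word_count: Optional[int] = None,
) -> List[str]:
    """Return extracted summary sentences, or [] when the input is too short.

    `summarizer` is assumed to accept (text, ratio=..., word_count=..., split=...)
    and to raise ValueError on inputs it cannot summarize.
    """
    try:
        return list(summarizer(text, ratio=ratio, word_count=word_count, split=True))
    except ValueError:
        # Assumption: too-short inputs surface as ValueError.
        return []


def first_sentence_stub(text, ratio=0.2, word_count=None, split=False):
    """Stand-in summarizer for demonstration only; this is not TextRank."""
    sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
    if len(sentences) < 2:
        raise ValueError("input must have more than one sentence")
    return sentences[:1] if split else sentences[0]


print(summary_sentences("One. Two. Three.", first_sentence_stub))
print(summary_sentences("Only one sentence", first_sentence_stub))
```

With gensim below 4.0 installed, `gensim.summarization.summarize` could be passed as `summarizer`; the `summarization` module was removed in gensim 4.0.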

msappelli · 2019-05-31T11:02:48Z

implementation for issue #138

tests/test_doc.py

textpipe/doc.py

lmdehaas · 2019-06-12T12:16:47Z

Now that I look at the Gensim implementation of TextRank, I see that it also includes an optional word_count parameter. Wouldn't it be nice to also include that in our wrapper?

Also, this is not the original TextRank algorithm, but a variant that uses BM25 (and a lot of custom text cleaning under the hood). Are we okay with that?

msappelli · 2019-06-12T12:39:07Z

@lmdehaas The reason I stuck with ratio is that, at the sentence level, I find the word_count parameter less transparent and less useful: you don't know how long your sentences are, so you don't know exactly how many words you will need.

On the other hand if we have a use case where we want a fixed summary length, word_count makes more sense. Do we envision such a use case? Then I will add the word_count parameter.
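To make the trade-off concrete, here is a small sketch of the two selection modes. It is a simplification for illustration, not gensim's actual selection rule: `select_sentences` is a hypothetical helper, and `ranked` is assumed to already be sorted from most to least important.

```python
from typing import List, Optional


def select_sentences(
    ranked: List[str],
    ratio: float = 0.2,
    word_count: Optional[int] = None,
) -> List[str]:
    """Pick top-ranked sentences by ratio, or by a word budget if given."""
    if word_count is None:
        # Ratio mode: keep a fixed fraction of the sentences.
        return ranked[: int(len(ranked) * ratio)]

    # Word-count mode: add sentences until the next one would exceed the budget.
    selected: List[str] = []
    used = 0
    for sentence in ranked:
        n_words = len(sentence.split())
        if used + n_words > word_count:
            break
        selected.append(sentence)
        used += n_words
    return selected


ranked = ["a b c d e", "f g h", "i j", "k l m n", "o"]
print(select_sentences(ranked, ratio=0.4))     # two of five sentences
print(select_sentences(ranked, word_count=6))  # only the first sentence fits
```

In ratio mode the number of sentences is predictable but the summary length in words is not. In word-count mode the length is bounded, but how many sentences fit depends on how long they happen to be.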

I personally have no problem with it being a variation on the TextRank algorithm, as my goal was simply to exploit the gensim implementation, just as we already wrap other gensim functions. If this is a problem, please let me know (we can also rename it 'GensimTextrank' for clarity).

lmdehaas · 2019-06-12T12:49:46Z

I can definitely imagine such a use case; it's the most common use case for summarization benchmarking, so I would prefer including that parameter. Concerning the alternative Textrank implementation: I don't mind, I just wanted to make sure we were aware of this!

anneschuth

LGTM, apart from the issue found by codacy

anneschuth

Oh! And don't forget to bump the version

dodijk

Cool feature!

implemented textrank using gensim for extractive summarization of a document

1b8c8fc

graus requested review from anneschuth, dodijk and lmdehaas May 31, 2019 11:47

Maya Sappelli and others added 5 commits May 31, 2019 14:04

added a test for the textrank summary

c59cb6c

fixed bug with summary when document has too few sentences

c6d1983

better fix for the too few sentences bug

5dd75f7

fixed doctests that were going wrong remotely

488d663

remove print

56eee19

anneschuth reviewed Jun 11, 2019

tests/test_doc.py (outdated)

anneschuth reviewed Jun 11, 2019

textpipe/doc.py (outdated)

lmdehaas approved these changes Jun 12, 2019

anneschuth reviewed Jun 12, 2019

textpipe/doc.py (outdated)

catched gensim textpipe exception on short inputs

0cf47d7

lmdehaas self-requested a review June 12, 2019 12:17

added the word_count parameter for a fixed word_length summary instead of ratio

fa25764

lmdehaas approved these changes Jun 12, 2019

anneschuth suggested changes Jun 12, 2019

anneschuth added 2 commits June 12, 2019 16:30

Update textpipe/doc.py

acfcbdb

Bumps version

6e3b7b8

anneschuth approved these changes Jun 12, 2019

textpipe deleted a comment Jun 12, 2019

dodijk approved these changes Jun 12, 2019

msappelli merged commit a9a3cf4 into master Jun 13, 2019

msappelli deleted the feature/138/add-textrank-sentences branch June 13, 2019 07:32
