Skip to content
This repository has been archived by the owner on Jun 9, 2021. It is now read-only.

extractive summarization using textrank #139

Merged
merged 10 commits into from Jun 13, 2019

Conversation

msappelli
Copy link

Textrank was implemented using the gensim framework in order to get a list of extracted sentences from a document that could serve as a summary of the document.

@msappelli
Copy link
Author

implementation for issue #138

tests/test_doc.py Outdated Show resolved Hide resolved
textpipe/doc.py Outdated Show resolved Hide resolved
textpipe/doc.py Outdated Show resolved Hide resolved
@lmdehaas
Copy link
Collaborator

lmdehaas commented Jun 12, 2019

Now that I look at the Gensim implementation of TextRank, I see that it also includes an optional word_count parameter. Wouldn't it be nice to also include that in our wrapper?

Also, this is not the original TextRank algorithm, but a variant that uses BM25 (and a lot of custom text cleaning under the hood). Are we okay with that?

@lmdehaas lmdehaas self-requested a review June 12, 2019 12:17
@msappelli
Copy link
Author

@lmdehaas The reason I stuck with ratio, is that on a sentence level I find the word count parameter less transparent and less useful --> you don't know how long your sentences are so you don't know the exact number of words you are going to need.

On the other hand if we have a use case where we want a fixed summary length, word_count makes more sense. Do we envision such a use case? Then I will add the word_count parameter.

I personally have no problem with it being a variation on the textrank algorithm, as my goal was also to simply exploit the gensim implementation as we already wrapped other gensim functions, but if this is a problem, please let me know (we can also rename it 'GensimTextrank' for clarity.

@lmdehaas
Copy link
Collaborator

I can definitely imagine such a use case; it's the most common use case for summarization benchmarking, so I would prefer including that parameter. Concerning the alternative Textrank implementation: I don't mind, I just wanted to make sure we were aware of this!

Copy link
Member

@anneschuth anneschuth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, apart from the issue found by codacy

Copy link
Member

@anneschuth anneschuth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! And don't forget to bump the version

@textpipe textpipe deleted a comment Jun 12, 2019
@textpipe textpipe deleted a comment Jun 12, 2019
@textpipe textpipe deleted a comment Jun 12, 2019
Copy link
Member

@dodijk dodijk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool feature!

@msappelli msappelli merged commit a9a3cf4 into master Jun 13, 2019
@msappelli msappelli deleted the feature/138/add-textrank-sentences branch June 13, 2019 07:32
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants