textrankr

Reorder sentences using the TextRank algorithm.

  • Designed mainly for Korean, but not limited to it.
  • Check out lexrankr, which is another awesome summarizer!
  • Python 2 is no longer supported (if you need it, use version 0.3).

Installation

pip install textrankr

Tokenizers

Tokenizers are not included. You have to implement one yourself.

Example:

from typing import List

class MyTokenizer:
    def __call__(self, text: str) -> List[str]:
        tokens: List[str] = text.split()
        return tokens
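
As a slightly richer variation on the example above, here is a minimal sketch of a tokenizer that lowercases the text and drops punctuation. `RegexTokenizer` is a hypothetical name, not part of textrankr. The sketch assumes, as the example above suggests, that a tokenizer only needs to be a callable that takes a `str` and returns a list of `str`.

```python
import re
from typing import List


class RegexTokenizer:
    """Hypothetical tokenizer: lowercases text and keeps runs of word characters."""

    # \w matches Unicode word characters in Python 3, so Hangul is kept too.
    pattern = re.compile(r"\w+")

    def __call__(self, text: str) -> List[str]:
        tokens: List[str] = self.pattern.findall(text.lower())
        return tokens
```

Normalizing case and punctuation this way makes "Hello," and "hello" count as the same token, which may or may not be what you want for your language.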

For Korean, one option is to use KoNLPy. Using phrases, as in the example below, is strictly speaking not tokenization, but it seems to give better results.

from typing import List
from konlpy.tag import Okt

class OktTokenizer:
    okt: Okt = Okt()

    def __call__(self, text: str) -> List[str]:
        tokens: List[str] = self.okt.phrases(text)
        return tokens

Usage

from typing import List
from textrankr import TextRank

mytokenizer: MyTokenizer = MyTokenizer()
textrank: TextRank = TextRank(mytokenizer)

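k: int = 3  # number of sentences in the resulting summary

# your_text_here is a placeholder for the str you want to summarize
summarized: str = textrank.summarize(your_text_here, k)
print(summarized)  # prints the summary as a single string

# if verbose=False, it returns a list of sentences instead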
summaries: List[str] = textrank.summarize(your_text_here, k, verbose=False)
for summary in summaries:
    print(summary)
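
For intuition about why the tokenizer matters, here is a toy sketch of TextRank-style sentence ranking in plain Python. It is not textrankr's implementation: the function names, the damping and iteration defaults, and the use of Jaccard overlap as the sentence similarity are all assumptions chosen for brevity. The idea is that sentences sharing many tokens with other sentences score higher, and the top `k` are returned in their original order.

```python
from typing import Callable, List


def rank_sentences(
    sentences: List[str],
    tokenizer: Callable[[str], List[str]],
    damping: float = 0.85,
    iterations: int = 50,
) -> List[float]:
    """Score sentences with a PageRank-style iteration over a similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    bags = [set(tokenizer(s)) for s in sentences]

    # Edge weight: Jaccard overlap between token sets (an assumed similarity).
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = bags[i] | bags[j]
            if i != j and union:
                weights[i][j] = len(bags[i] & bags[j]) / len(union)

    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours, weighted by similarity.
        scores = [
            (1 - damping) / n
            + damping * sum(
                scores[j] * weights[j][i] / out_sums[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores


def top_k(
    sentences: List[str], tokenizer: Callable[[str], List[str]], k: int
) -> List[str]:
    """Return the k highest-scoring sentences, kept in their original order."""
    scores = rank_sentences(sentences, tokenizer)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

With `str.split` as the tokenizer, a sentence that shares no words with the others gets no incoming score and ends up ranked last, which is why a tokenizer that produces meaningful shared units tends to give better summaries.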

Test

Run the tests with Docker.

docker build -t textrankr -f Dockerfile .
docker run --rm -it textrankr