townie/boilerpipe

boilerpipe

API

GET /extractor/method?url=URL

| Parameter | Description |
| --- | --- |
| extractor | Name of the extractor to use |
| method | Extraction method |
| url | The URL to extract content from |
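
As a rough illustration of the request shape above, the following sketch builds a request URL from the three parameters. The function name `build_extract_url` and the base address `http://localhost:8080` are assumptions made for this example; the README does not say where the service is hosted. The target URL is percent-encoded so that its own `?` and `&` characters do not get mixed into the outer query string.

```python
from urllib.parse import urlencode


def build_extract_url(base_url: str, extractor: str, method: str, page_url: str) -> str:
    """Build a request URL of the form /extractor/method?url=URL.

    base_url is a hypothetical host; replace it with wherever the service runs.
    """
    # urlencode percent-encodes the target URL so it is safe as a query value.
    query = urlencode({"url": page_url})
    return f"{base_url.rstrip('/')}/{extractor}/{method}?{query}"


print(build_extract_url(
    "http://localhost:8080",  # assumed host, not taken from the README
    "article",
    "text",
    "https://example.com/story?id=1",
))
# prints http://localhost:8080/article/text?url=https%3A%2F%2Fexample.com%2Fstory%3Fid%3D1
```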

Extractors

| Name | Description |
| --- | --- |
| article | A full-text extractor tuned towards news articles. In this scenario it achieves higher accuracy than DefaultExtractor. |
| keepeverything | Treats everything as "content". Useful for tracking down SAX parsing errors. |
| keepeverythingwithminkwords | - |
| largestcontent | Like DefaultExtractor, but keeps only the largest content block. Good for non-article texts with a single main content block. |
| numwordsrules | - |
| canola | - |
| default | A fairly generic full-text extractor, but usually not as good as ArticleExtractor. |

Methods

| Name | Description |
| --- | --- |
| text | Output the extracted main content as plain text |
| images | - |
| html | Output the whole HTML document and highlight the extracted main content |
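
A minimal client sketch, assuming the service is reachable at some base address and returns its result in the response body. It checks the extractor and method names against the lists documented above before sending the request. The names `check_names` and `fetch_extraction`, the base address in the usage comment, and the way the response is decoded are all assumptions for illustration; the README does not describe the response format.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Names copied from the Extractors and Methods tables in this README.
EXTRACTORS = {
    "article", "keepeverything", "keepeverythingwithminkwords",
    "largestcontent", "numwordsrules", "canola", "default",
}
METHODS = {"text", "images", "html"}


def check_names(extractor: str, method: str) -> None:
    """Raise ValueError if the extractor or method is not documented."""
    if extractor not in EXTRACTORS:
        raise ValueError(f"unknown extractor: {extractor!r}")
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")


def fetch_extraction(base_url: str, extractor: str, method: str,
                     page_url: str, timeout: float = 10.0) -> str:
    """Send GET /extractor/method?url=URL and return the body as a string."""
    check_names(extractor, method)
    request_url = (
        f"{base_url.rstrip('/')}/{extractor}/{method}?"
        f"{urlencode({'url': page_url})}"
    )
    with urlopen(request_url, timeout=timeout) as response:
        # Assumption: the body is text; fall back to UTF-8 if no charset is sent.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example call (needs a running service; the host below is hypothetical):
# print(fetch_extraction("http://localhost:8080", "article", "text",
#                        "https://example.com/story"))
```

Validating names on the client side keeps obviously invalid requests from reaching the service; how the service itself responds to an unknown extractor or method is not documented here.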
