boilerpipe

API

GET /extractor/method?url=URL

Parameter	Description
extractor	Name of the extractor to use (see Extractors)
method	Extraction method (see Methods)
url	URL of the page to extract content from
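
As a minimal sketch of how a request to this endpoint could be assembled: the helper name `buildRequestUrl` and the base address `http://localhost:3000` are assumptions for illustration, since the README does not say where the service listens. The target URL goes into the `url` query parameter and must be percent-encoded, which `URLSearchParams` handles.

```javascript
// Hypothetical helper: builds a request URL for GET /extractor/method?url=URL.
// baseUrl is an assumption; substitute the address your instance listens on.
function buildRequestUrl(baseUrl, extractor, method, targetUrl) {
  const path = `/${encodeURIComponent(extractor)}/${encodeURIComponent(method)}`;
  const endpoint = new URL(path, baseUrl);
  // searchParams.set percent-encodes the target URL for use as a query value.
  endpoint.searchParams.set('url', targetUrl);
  return endpoint.toString();
}

const requestUrl = buildRequestUrl(
  'http://localhost:3000', // assumed host and port
  'article',
  'text',
  'https://example.com/news/story.html'
);
console.log(requestUrl);
```

Encoding the target matters because an unencoded `?` or `&` in the page URL would otherwise be parsed as part of the service's own query string.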

Extractors

Name	Description
article	A full-text extractor tuned towards news articles. For such pages it achieves higher accuracy than DefaultExtractor.
keepeverything	Treats everything as "content". Useful to track down SAX parsing errors.
keepeverythingwithminkwords	-
largestcontent	Like DefaultExtractor, but only keeps the largest content block. Good for non-article style texts with only one main content block.
numwordsrules	-
canola	-
default	A generic full-text extractor, usually not as accurate as ArticleExtractor.
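
The descriptions above suggest a rough rule of thumb for choosing an extractor. The sketch below encodes it; the function names and the `pageKind` labels (`'news'`, `'single-block'`, `'debug'`) are hypothetical and not part of the service. Extractors listed without a description are not given a rule here.

```javascript
// Extractor names copied from the table above.
const EXTRACTORS = [
  'article',
  'keepeverything',
  'keepeverythingwithminkwords',
  'largestcontent',
  'numwordsrules',
  'canola',
  'default'
];

// Hypothetical helper: checks a name before it is placed in the request path.
function isKnownExtractor(name) {
  return EXTRACTORS.includes(name);
}

// Hypothetical helper: maps a rough page description to an extractor name,
// following the descriptions in the table. The pageKind labels are made up.
function suggestExtractor(pageKind) {
  switch (pageKind) {
    case 'news':
      return 'article'; // tuned towards news articles
    case 'single-block':
      return 'largestcontent'; // keeps only the largest content block
    case 'debug':
      return 'keepeverything'; // useful for tracking down parsing errors
    default:
      return 'default'; // generic fallback
  }
}

console.log(suggestExtractor('news'), isKnownExtractor('canola'));
```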

Methods

Name	Description
text	Output the extracted main content as plain text
images	-
html	Output the whole HTML document and highlight the extracted main content
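
A call using the `text` method might look like the following sketch. Several things here are assumptions: the base address, the function name `extractText`, and the response format, which the README does not specify. For that reason the body is returned as a raw string rather than parsed. The `fetchImpl` parameter defaults to the global `fetch` available in recent Node.js versions and browsers.

```javascript
// Hypothetical client for GET /extractor/text?url=URL.
// The response format is not documented, so the raw body is returned as-is.
async function extractText(baseUrl, extractor, targetUrl, fetchImpl = globalThis.fetch) {
  const endpoint = new URL(`/${encodeURIComponent(extractor)}/text`, baseUrl);
  endpoint.searchParams.set('url', targetUrl);

  const response = await fetchImpl(endpoint.toString());
  if (!response.ok) {
    throw new Error(`Extraction request failed with status ${response.status}`);
  }
  return response.text();
}

// Usage (assumed address; requires a running instance):
// extractText('http://localhost:3000', 'article', 'https://example.com/story.html')
//   .then(console.log)
//   .catch(console.error);
```

Swapping `text` for `html` in the path would request the highlighted HTML output instead.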
