html-article-extractor

A web page content extractor for news websites.

installation

npm install html-article-extractor

usage

var htmlArticleExtractor = require("html-article-extractor");
var { JSDOM } = require("jsdom"); // jsdom provides the DOM used below

var dom = new JSDOM("...");
var body = dom.window.document.body;
var result = htmlArticleExtractor(body);
console.log(result);

Outputs:

{
    html: '<div>contents</div>',
    text: 'contents'
}
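
A minimal sketch of how the returned object might be consumed, assuming it always has the { html, text } shape shown above. The helper name summarizeArticle and its previewLength parameter are hypothetical and are not part of html-article-extractor; the block needs neither the package nor jsdom.

```javascript
// Hypothetical helper, not part of html-article-extractor.
// Assumes the extractor result has the { html, text } shape shown above.
function summarizeArticle(result, previewLength = 80) {
  // Treat a missing result or a non-string text field as an empty article.
  const text =
    result && typeof result.text === "string" ? result.text.trim() : "";
  // Split on runs of whitespace to count words.
  const words = text === "" ? [] : text.split(/\s+/);
  return {
    isEmpty: words.length === 0,
    wordCount: words.length,
    // Truncate long text so it is safe to print in logs.
    preview:
      text.length > previewLength
        ? text.slice(0, previewLength) + "..."
        : text,
  };
}

// Uses the sample output from the usage section as input.
const summary = summarizeArticle({
  html: "<div>contents</div>",
  text: "contents",
});
console.log(summary);
```

A check like summary.isEmpty can be useful for skipping pages where no article body was found before storing or indexing the text.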

example

git clone https://github.com/jungyoun/html-article-extractor
cd html-article-extractor
npm install
node example/crawler.js

demo

https://online-article-extractor.herokuapp.com/
