NiceText

NiceText removes HTML clutter from a web page. It estimates where the text on the page is concentrated and uses heuristics to determine the main block of text.

##### Example Usage

The NiceText interface is implemented by NTImpl. Its extract method takes a URL (as a String) and returns the extracted text. Here is a simple use case:

NiceText niceText = new NTImpl();
String text = niceText.extract("http://www.scientificamerican.com/article/common-parasite-could-manipulate-our-behavior/");
System.out.println(text);

This is what I get:

Already a subscriber or purchased this issue? Sign In. Imagine a world without fear. It might be empowering to go about your daily life uninhibited by everyday distresses. You could cross highways with confidence, take on all kinds of daredevilry and watch horror flicks without flinching. Yet consider the prospect a little more deeply, and the possibilities become darker, even deadly. Our fears, after all, can protect us. The basic aversion that a mouse has for a cat, for instance, keeps the rodent out of death's jaws. But unfortunately for mice everywhere, there is a second enemy with which to contend, one that may prevent them from experiencing that fear in the first place. A unicellular organism (a protozoan), Toxoplasma gondii, can override a rodent's most basic survival instincts. The result is a rodent that does not race away from a cat but is instead strangely attracted to it. Already a subscriber or purchased this issue? Sign In.
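To illustrate the idea of "text concentration" mentioned above, here is a minimal sketch of one possible density heuristic. It is not NiceText's actual implementation: the class name `TextDensitySketch`, the method `densestBlock`, and the scoring formula are all assumptions made for this example. It splits the HTML at common block-level tags, strips the remaining markup from each block, and keeps the block whose visible text is both long and makes up a large share of the raw block.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a text-density heuristic; not part of NiceText.
class TextDensitySketch {

    // Block-level tags used as split points (an assumed, incomplete list).
    private static final Pattern BLOCK_SPLIT = Pattern.compile(
            "(?i)</?(?:div|p|td|li|section|article|h[1-6]|ul|ol|table|tr|br)\\b[^>]*>");

    // Any remaining tag, e.g. inline <a> or <span>.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Returns the visible text of the highest-scoring block, or "" if none.
    static String densestBlock(String html) {
        // Drop script and style elements so their bodies never count as text.
        String cleaned = html.replaceAll("(?is)<(script|style)\\b.*?</\\1>", " ");

        String best = "";
        double bestScore = 0.0;
        for (String block : BLOCK_SPLIT.split(cleaned)) {
            String text = TAG.matcher(block).replaceAll(" ")
                    .replaceAll("\\s+", " ")
                    .trim();
            if (text.isEmpty()) {
                continue;
            }
            // Density: share of the raw block that is visible text.
            double density = (double) text.length() / block.length();
            // Assumed score: favor blocks that are both long and dense.
            double score = text.length() * density;
            if (score > bestScore) {
                bestScore = score;
                best = text;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String html = "<div><a href=\"/\">Home</a> <a href=\"/about\">About</a></div>"
                + "<p>Imagine a world without fear. It might be empowering.</p>";
        System.out.println(densestBlock(html));
    }
}
```

Link-heavy blocks such as navigation bars score low under this formula because most of their characters are markup, while a long paragraph scores close to its own length. A real extractor would likely also merge adjacent high-scoring blocks and parse the HTML properly rather than rely on regular expressions.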
