Minimal RegEx-based JavaScript library for converting HTML to Text

vorushin/jsHtmlToText

 
 

jsHtmlToText is a basic regex-based HTML stripper.

How it works

  1. Removes all line breaks, script tags, styles, comments, and the DOCTYPE.
  2. Replaces most block tags with double newlines.
  3. Replaces some block tags with a single newline.
  4. Replaces <br> with a single newline.
  5. Removes all HTML tags left after steps 1-4.
  6. Ensures there are no more than two line breaks in a row, removes line breaks at the start and end of the string, trims trailing whitespace, and decodes HTML entities such as &nbsp;.
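
The six steps above can be sketched roughly as follows. This is an illustration written for this README, not the library's actual source: the function name htmlToTextSketch, the two tag lists, the choice to replace line breaks with a space, the assumption that step 4 targets the br tag, and the small entity table are all assumptions.

```javascript
// Illustrative sketch of the six steps; NOT the library's real implementation.
// The tag lists and the entity table below are assumptions.
function htmlToTextSketch(html) {
  const doubleNewlineTags = ['p', 'div', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
                             'ul', 'ol', 'table', 'blockquote'];
  const singleNewlineTags = ['li', 'tr', 'dt', 'dd'];
  const entities = { nbsp: ' ', amp: '&', lt: '<', gt: '>', quot: '"' };

  let text = String(html);

  // Step 1: drop line breaks, scripts, styles, comments and the DOCTYPE.
  text = text
    .replace(/\r\n|\r|\n/g, ' ')
    .replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '')
    .replace(/<style\b[^>]*>[\s\S]*?<\/style\s*>/gi, '')
    .replace(/<!--[\s\S]*?-->/g, '')
    .replace(/<!DOCTYPE[^>]*>/gi, '');

  // Step 2: opening and closing tags of most block elements -> double newline.
  const doubleRe = new RegExp('</?(?:' + doubleNewlineTags.join('|') + ')\\b[^>]*>', 'gi');
  text = text.replace(doubleRe, '\n\n');

  // Step 3: closing tags of some block elements -> single newline.
  const singleRe = new RegExp('</(?:' + singleNewlineTags.join('|') + ')\\b[^>]*>', 'gi');
  text = text.replace(singleRe, '\n');

  // Step 4: <br> -> single newline.
  text = text.replace(/<br\b[^>]*>/gi, '\n');

  // Step 5: remove whatever tags are left.
  text = text.replace(/<[^>]+>/g, '');

  // Step 6: trim trailing whitespace per line, cap runs of newlines at two,
  // strip newlines at both ends, then decode a few entities in a single pass.
  text = text
    .replace(/[ \t]+$/gm, '')
    .replace(/\n{3,}/g, '\n\n')
    .replace(/^\n+|\n+$/g, '')
    .replace(/&(nbsp|amp|lt|gt|quot);/g, (match, name) => entities[name]);

  return text;
}

const sample = '<!DOCTYPE html><html><head><style>p{color:red}</style>' +
  '<script>alert(1)</script></head><body><h1>Title</h1>' +
  '<p>Hello,<br>world &amp; all</p><!-- note -->' +
  '<ul><li>one</li><li>two</li></ul></body></html>';

console.log(htmlToTextSketch(sample));
```

Decoding entities last, in one pass, keeps escaped markup such as &amp;lt; from being decoded twice.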

Extensions

You can customize the formatting process by providing hooks for pre-processing, tag replacement, and post-processing. See tests/extensions.coffee for examples of hooks.
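
To give a feel for the three hook points, here is a minimal standalone sketch. It does not use the library's API: the function convertWithHooks and the hook names preprocess, tagreplace and postprocess are hypothetical, and the real hook interface is the one shown in tests/extensions.coffee.

```javascript
// Hypothetical hook shape for illustration only; the library's real hook
// interface is shown in tests/extensions.coffee and may differ.
function convertWithHooks(html, hooks = {}) {
  const identity = (s) => s;
  const preprocess = hooks.preprocess || identity;
  const tagreplace = hooks.tagreplace || identity;
  const postprocess = hooks.postprocess || identity;

  // Pre-processing hook runs on the raw HTML.
  let text = preprocess(String(html));

  // Tag-replacing hook runs before the generic tag removal, so it can
  // rewrite tags that would otherwise simply be stripped.
  text = tagreplace(text);
  text = text.replace(/<br\b[^>]*>/gi, '\n').replace(/<[^>]+>/g, '');

  // Post-processing hook runs on the resulting plain text.
  return postprocess(text.trim());
}

const withLinks = convertWithHooks(
  '<p>See  <a href="https://example.com">the docs</a><br>now</p>',
  {
    // Keep link targets by rewriting <a href="url">label</a> as "label (url)".
    tagreplace: (s) =>
      s.replace(/<a\b[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, '$2 ($1)'),
    // Collapse runs of spaces or tabs in the final text.
    postprocess: (s) => s.replace(/[ \t]{2,}/g, ' '),
  }
);

console.log(withLinks);
```

Running the tag hook before the generic stripping step is what lets it preserve information, here the link URL, that plain tag removal would discard.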

History

It was created for a custom WebTV widget framework, which did not support other existing implementations.

Use it when:

  • you need a minimal implementation
  • you don't have access to a proper JavaScript implementation (it was originally written for Konqueror)
  • there are no strong security requirements
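
The security caveat matters because stripping tags and then decoding entities can reintroduce markup. The snippet below is a self-contained illustration of that general pitfall, using a hypothetical stripThenDecode helper rather than the library itself; whether jsHtmlToText behaves identically on this input has not been verified here.

```javascript
// Hypothetical helper showing a general pitfall of regex-based stripping;
// it is not the library's code.
function stripThenDecode(html) {
  const entities = { lt: '<', gt: '>', amp: '&', quot: '"', nbsp: ' ' };
  return String(html)
    .replace(/<[^>]+>/g, '') // remove real tags
    .replace(/&(lt|gt|amp|quot|nbsp);/g, (match, name) => entities[name]);
}

// The escaped tag survives stripping and becomes real markup after decoding.
const risky = stripThenDecode('<b>Hi</b> &lt;img src=x onerror=alert(1)&gt;');
console.log(risky);
```

The result is fine as plain text, but it should not be inserted into a page as HTML without escaping it again.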

Languages

  • JavaScript 80.2%
  • CoffeeScript 19.8%