This repository has been archived by the owner on Jun 12, 2023. It is now read-only.

wutsi/wutsi-extractor


wutsi-extractor is an HTML content extractor.

Content Extractors

This library provides several HTML extractors:

  • Content Extractor - based on the eatiht algorithm
  • Main Image Extractor
  • Tags Extractor
  • Site Name Extractor
  • Title Extractor
  • URL Extractor
  • Published Date Extractors
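
To give a rough idea of what the metadata extractors in this list deal with, here is a minimal Python sketch. It is not this library's API or implementation: the names `MetaTagCollector` and `extract_metadata` are hypothetical, and the sketch assumes the page exposes Open Graph style `<meta>` tags (`og:title`, `og:site_name`, `og:url`, `og:image`, `article:tag`, `article:published_time`), which the extractors here may or may not rely on.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Hypothetical helper: collects <meta> key/content pairs and <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}          # meta property/name -> list of content values
        self.title_parts = []   # text fragments found inside <title>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            key = attributes.get("property") or attributes.get("name")
            content = attributes.get("content")
            if key and content:
                self.meta.setdefault(key, []).append(content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_metadata(html):
    """Return title, site name, URL, image, published date, and tags (assumed tag names)."""
    collector = MetaTagCollector()
    collector.feed(html)
    collector.close()

    def first(key):
        values = collector.meta.get(key)
        return values[0] if values else None

    # Fall back to the <title> element when no og:title is present.
    title_text = "".join(collector.title_parts).strip()
    return {
        "title": first("og:title") or title_text or None,
        "site_name": first("og:site_name"),
        "url": first("og:url"),
        "image": first("og:image"),
        "published_date": first("article:published_time"),
        "tags": collector.meta.get("article:tag", []),
    }
```

Calling `extract_metadata(page_html)` on a page's HTML string returns a plain dictionary; fields whose tags are missing come back as `None` (or an empty list for `tags`). The content extractor itself (eatiht) works differently, by locating the densest cluster of text nodes, and is not covered by this sketch.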