Java-based text extractors implemented as web APIs (currently only Boilerpipe and Goose)

Java Text Extractor API

Web API for Java-based text extractors, implemented using the Play framework.

Author

Tomaž Kovačič <tomaz.kovacic@gmail.com>

Extractors supported

  • Boilerpipe
  • Goose

API Documentation

Note: All parameters must be sent in the request body encoded as application/x-www-form-urlencoded.

Boilerpipe API

method: POST

endpoint: http://yourdomain/boilerpipe/extract/

params:

  • extractorType : (article|default|sentence)
  • rawHtml : html content

JSON response format:

{
        "result": RESULT_TEXT,
        "status": (OK|ERROR),
        "errorMsg": ERROR_MESSAGE (optional)
}
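
As an illustration of the request described above, here is a minimal client sketch in plain Java. It is not part of this project: the class name `BoilerpipeClient`, the helper names, and the choice of UTF-8 are assumptions made for the example, and `yourdomain` is the same placeholder host used in the endpoint above. The response is returned as a raw JSON string and is not parsed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical client for illustration only; not shipped with this repository.
public class BoilerpipeClient {

    // Placeholder host taken from the endpoint above; replace with your deployment.
    static final String ENDPOINT = "http://yourdomain/boilerpipe/extract/";

    /** Builds the application/x-www-form-urlencoded request body (UTF-8 is an assumption). */
    static String buildForm(String extractorType, String rawHtml) {
        try {
            return "extractorType=" + URLEncoder.encode(extractorType, "UTF-8")
                    + "&rawHtml=" + URLEncoder.encode(rawHtml, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available on the JVM
        }
    }

    /** POSTs the form to ENDPOINT and returns the raw JSON response body. */
    static String extract(String extractorType, String rawHtml) throws IOException {
        byte[] body = buildForm(extractorType, rawHtml).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) new URL(ENDPOINT).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        // Prints only the encoded body, so this runs without a server.
        System.out.println(buildForm("article", "<p>a b</p>"));
    }
}
```

Calling `BoilerpipeClient.extract("article", html)` against a running instance should return the JSON document in the format shown above. `extractorType` must be one of `article`, `default` or `sentence`.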

Goose API

method: POST

endpoint: http://yourdomain/goose/extract/

params:

  • rawHtml : html content

JSON response format:

{
        "result": RESULT_TEXT,
        "status": (OK|ERROR),
        "errorMsg": ERROR_MESSAGE (optional)
}
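
The sketch below shows one way a caller might prepare a Goose request body and do a rough check of the `status` field in the response. It is not part of this project: the class `GooseResponseCheck` and its methods are hypothetical, and it assumes that the status value is serialized as a JSON string (`"OK"` or `"ERROR"`). The regular-expression check is a shortcut for the example only; a real client would be better served by a proper JSON parser.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not shipped with this repository.
public class GooseResponseCheck {

    // Placeholder host taken from the endpoint above; replace with your deployment.
    static final String ENDPOINT = "http://yourdomain/goose/extract/";

    // Assumes the status value is a quoted JSON string, e.g. "status": "OK".
    private static final Pattern STATUS_OK =
            Pattern.compile("\"status\"\\s*:\\s*\"OK\"");

    /** Builds the form body; rawHtml is the only parameter the Goose endpoint takes. */
    static String buildForm(String rawHtml) {
        try {
            return "rawHtml=" + URLEncoder.encode(rawHtml, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available on the JVM
        }
    }

    /** Rough check: true if the response text contains a top-level-looking "status": "OK". */
    static boolean isOk(String json) {
        return STATUS_OK.matcher(json).find();
    }

    public static void main(String[] args) {
        System.out.println(buildForm("<h1>Hi</h1>"));
        System.out.println(isOk("{\"result\": \"Hi\", \"status\": \"OK\"}"));
    }
}
```

When `isOk` returns false, the optional `errorMsg` field is the place to look for the reason.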

Dependencies

  • Play framework v1.1.1.

Licence

  • Everything that is not in the /lib/ directory is licensed under GPLv3.

  • Jar packages in the /lib/ directory are licensed under their respective licences listed below:

Copyright (C) Tomaž Kovačič

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
