Apache OpenNLP

Run a Docker container with Apache OpenNLP, which is written in Java, under either a traditional Java 8 JDK (from OpenJDK or another source) or GraalVM.

Find out more about Natural Language Processing from the NLP section.

Startup in traditional JDK or GraalVM mode.

Goals

  • Run the Docker container containing the Apache OpenNLP tool by using ./docker-runner.sh --runContainer
  • (Optional) Run in GraalVM mode inside the Docker container by using switchToGraal at the prompt (GraalVM is a polyglot JVM; this is the GraalVM JDK Community version from Oracle Labs)
  • Run a number of NLP actions, listed in the Exploring NLP concepts section below, to explore the Apache OpenNLP tool

Exploring NLP concepts

Detecting language

Detecting the language of a single line of text or an article (see legend of language abbreviations used).

See Detecting Language

Detecting sentences

Detecting sentences in a single line of text or an article.

See Detecting sentences
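
To make the idea of sentence detection concrete, here is a minimal Java sketch. It is an illustration only: it uses the JDK's rule-based java.text.BreakIterator as a stand-in, not Apache OpenNLP's trained sentence model, so results may differ on harder input such as abbreviations. The class and method names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: JDK BreakIterator, NOT Apache OpenNLP's sentence detector.
final class SentenceSplitSketch {

    // Splits text into trimmed sentences using the JDK's English sentence rules.
    static List<String> splitSentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "This is one sentence. This is another one.";
        // Prints one detected sentence per line.
        splitSentences(text).forEach(System.out::println);
    }
}
```

A trained model, as used by the tool in the container, learns sentence boundaries from data instead of relying on fixed rules like these.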

Finding names

Finding person names, organization names, dates, times, monetary amounts, locations, and percentages in a single line of text or an article.

See Finding names

Tokenise

Tokenise a line of text or an article into its smaller components (i.e. words, punctuation, numbers).

See Tokenise
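
As a rough illustration of what splitting text into words, punctuation, and numbers means, here is a small Java sketch based on a regular expression. It is not how Apache OpenNLP tokenises (its tokenisers follow different conventions, and the learnable one uses a trained model), and the class name, method name, and pattern are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a regex tokeniser, NOT Apache OpenNLP's tokenizer.
final class TokeniseSketch {

    private static final Pattern TOKEN = Pattern.compile(
            "\\d+(?:[.,]\\d+)*"        // numbers such as 42, 3.50 or 1,000
          + "|\\p{L}+(?:'\\p{L}+)*"    // words, keeping inner apostrophes (doesn't)
          + "|[^\\s\\p{L}\\d]");       // any other single non-space character

    // Returns the tokens of the text in order of appearance.
    static List<String> tokenise(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: It | costs | $ | 3.50 | , | doesn't | it | ?
        System.out.println(String.join(" | ", tokenise("It costs $3.50, doesn't it?")));
    }
}
```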

Parser

Parse a line of text or an article and identify groups of words or phrases that go together (see Penn Treebank tag set for legend of token types).

See Parser

Tag Parts of Speech

Tag the part of speech of each token in a line of text or an article (see Penn Treebank tag set for legend of token types). Also see https://nlp.stanford.edu/software/tagger.shtml.

See Tag Parts of Speech

Chunking

Chunk a text or an article by dividing it into syntactically correlated groups of words, such as noun groups and verb groups. Chunking works on text that has already been tagged by the PoS tagger, so tag the text first. Also see https://nlpforhackers.io/text-chunking/.

See Chunking
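
The dependency on PoS tags can be sketched with a toy, rule-based chunker in Java. This is an assumption-heavy illustration, not Apache OpenNLP's statistical chunker: it expects input tokens in the word_TAG form, maps a handful of Penn Treebank tags to coarse chunk types, and groups consecutive tokens of the same type. The class name, method names, and tag-to-chunk rules are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: toy rules over word_TAG tokens, NOT Apache OpenNLP's chunker.
final class ChunkSketch {

    // Maps a Penn Treebank tag to a coarse chunk type, or null if the token stays outside chunks.
    static String chunkType(String tag) {
        if (tag.startsWith("NN") || tag.startsWith("JJ") || tag.startsWith("PRP")
                || tag.equals("DT") || tag.equals("CD")) {
            return "NP"; // noun group
        }
        if (tag.startsWith("VB") || tag.equals("MD")) {
            return "VP"; // verb group
        }
        if (tag.equals("IN")) {
            return "PP"; // preposition
        }
        return null;
    }

    // Wraps runs of tokens that share a chunk type in "[TYPE ... ]" brackets.
    static String chunk(String tagged) {
        List<String> out = new ArrayList<>();
        String open = null; // type of the chunk currently open, if any
        for (String token : tagged.trim().split("\\s+")) {
            int sep = token.lastIndexOf('_');
            String type = chunkType(sep >= 0 ? token.substring(sep + 1) : "");
            if (open != null && !open.equals(type)) {
                out.add("]");
                open = null;
            }
            if (type != null && open == null) {
                out.add("[" + type);
                open = type;
            }
            out.add(token);
        }
        if (open != null) {
            out.add("]");
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(chunk("The_DT quick_JJ fox_NN jumped_VBD over_IN the_DT lazy_JJ dog_NN ._."));
    }
}
```

Without the tags there would be nothing for the rules to group on, which is why the tagging step has to come first.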

Scripts provided

Go to the previous folder to find the scripts below.

All the above scripts check whether the respective model(s) exist and, if they are missing, download them into the shared folder.
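
The check-then-download behaviour can be sketched in Java as follows. This is not the code of the provided scripts; it is a minimal sketch of the same idea, and the class name, method name, and the folder, file, and URL passed on the command line are all placeholders chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustration only: mirrors the "download the model if it is missing" idea.
final class ModelFetchSketch {

    // Returns the model's path in the shared folder, downloading it first if it is missing.
    static Path ensureModel(Path sharedDir, String fileName, URI source) throws IOException {
        Path target = sharedDir.resolve(fileName);
        if (Files.exists(target)) {
            return target; // already present, nothing to download
        }
        Files.createDirectories(sharedDir);
        try (InputStream in = source.toURL().openStream()) {
            Files.copy(in, target);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: ModelFetchSketch <shared-dir> <model-file> <model-url>");
            return;
        }
        Path model = ensureModel(Paths.get(args[0]), args[1], URI.create(args[2]));
        System.out.println("Model available at " + model);
    }
}
```

Because the existence check comes first, running it repeatedly only downloads a given model once.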

Docker image on Docker Hub (optional)

Find the NLP Java/JVM Docker image on Docker Hub. The docker-runner.sh --pushImageToHub script pushes the image to Docker Hub, and the docker-runner.sh --runContainer script runs it from the local repository. If the image is absent from the local repository, it is downloaded from Docker Hub.

Resources

Apache OpenNLP

Other NLP Java/JVM libraries

Other related posts

Contributing

Contributions are very welcome. Please share back with the wider community (and get credited for it)!

Please have a look at the CONTRIBUTING guidelines, and also read about our licensing policy.


Back to main page (table of contents)