Run a Docker container with Apache OpenNLP (written in Java), running under either a traditional Java 8 JDK (from OpenJDK or another source) or GraalVM.
Find out more about Natural Language Processing from the NLP section.
Startup in traditional JDK or GraalVM mode.
- Run the Docker container containing the Apache OpenNLP tool by using `./docker-runner.sh --runContainer`
- (Optional) Run in GraalVM mode inside the Docker container by using `switchToGraal` at the prompt (polyglot JVM, i.e. GraalVM JDK Community version from Oracle Labs)
- Run a number of NLP actions to explore the Apache OpenNLP tool, shown below in the Exploring NLP concepts section
Detecting language in a single line text or article (see legend of language abbreviations used).
Detecting sentences in a single line text or article.
Finding person name, organization name, date, time, money, location, percentage information in a single line text or article.
See Finding names
Tokenise a line of text or an article into its smaller components (i.e. words, punctuation, numbers).
See Tokenise
Parse a line of text or an article and identify groups of words or phrases that go together (see Penn Treebank tag set for legend of token types).
See Parser
Tag parts of speech of each token in a line of text or an article (see Penn Treebank tag set for legend of token types), also see https://nlp.stanford.edu/software/tagger.shtml.
Text chunking divides a text or an article into syntactically correlated groups of words, like noun groups and verb groups. Chunking is applied to text that has already been tagged by the PoS tagger. Also see https://nlpforhackers.io/text-chunking/.
See Chunking
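Because chunking consumes PoS-tagged text, the two steps chain naturally. The sketch below pipes the output of the Apache OpenNLP command-line tool `POSTagger` into `ChunkerME`. The function name `tag_then_chunk`, the `OPENNLP` variable and the model file names in the usage note are assumptions for illustration, and the scripts in this repository may wire the steps together differently.

```shell
#!/usr/bin/env bash
# Sketch only: tag_then_chunk and OPENNLP are hypothetical names,
# not taken from this repository's scripts.

# Path or name of the OpenNLP launcher; assumed to be on the PATH.
OPENNLP="${OPENNLP:-opennlp}"

# Read tokenised sentences (one per line) from stdin, tag them,
# then feed the word_TAG output straight into the chunker.
# $1 = PoS tagger model file, $2 = chunker model file
tag_then_chunk() {
  local pos_model="$1"
  local chunker_model="$2"
  "${OPENNLP}" POSTagger "${pos_model}" | "${OPENNLP}" ChunkerME "${chunker_model}"
}
```

Usage, assuming the English models have been downloaded under these names: `echo "The quick brown fox jumps ." | tag_then_chunk en-pos-maxent.bin en-chunker.bin`.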
Go to the previous folder to find the scripts listed below.
- opennlp.sh: download and install the Apache OpenNLP tool into the `shared` folder
- detectLanguage.sh: Detecting language in a single line text or article (see legend of language abbreviations used).
- detectSentence.sh: Detecting sentences in a single line text or article.
- nameFinder.sh: Finding person name, organization name, date, time, money, location, percentage information in a single line text or article.
- tokenizer.sh: Tokenise a line of text or an article into its smaller components (i.e. words, punctuation, numbers).
- parser.sh: Parse a line of text or an article and identify groups of words or phrases that go together (see Penn Treebank tag set for legend of token types).
- posTagger.sh: Tag parts of speech of each token in a line of text or an article (see Penn Treebank tag set for legend of token types), also see https://nlp.stanford.edu/software/tagger.shtml.
- chunker.sh: Text chunking divides a text or an article into syntactically correlated groups of words, like noun groups and verb groups. Chunking is applied to text that has already been tagged by the PoS tagger. Also see https://nlpforhackers.io/text-chunking/.
All the above scripts check whether the respective model(s) exist and, if they are missing, download them into the `shared` folder.
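A check-then-download step of this kind can be sketched as follows. This is a minimal illustration, not the code used by the scripts: `ensure_model`, the `SHARED_DIR` variable and the model URL passed by the caller are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: ensure_model and SHARED_DIR are hypothetical names,
# not taken from this repository's scripts.

SHARED_DIR="${SHARED_DIR:-shared}"

# Download a model into SHARED_DIR unless a non-empty copy is already there.
# $1 = model file name, $2 = URL to fetch it from (supplied by the caller)
ensure_model() {
  local model_name="$1"
  local model_url="$2"
  local target="${SHARED_DIR}/${model_name}"

  if [ -s "${target}" ]; then
    echo "Found ${target}, skipping download"
    return 0
  fi

  mkdir -p "${SHARED_DIR}"
  echo "Downloading ${model_name} into ${SHARED_DIR}"
  # Fetch to a temporary name first so an interrupted download
  # is never mistaken for a complete model on the next run.
  if curl --fail --silent --show-error --location \
          --output "${target}.part" "${model_url}"; then
    mv "${target}.part" "${target}"
  else
    rm -f "${target}.part"
    return 1
  fi
}
```

Downloading to a `.part` file and renaming it afterwards keeps the existence check trustworthy: a file under the final name is always a complete download.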
Find the NLP Java/JVM Docker Image on Docker Hub. The `docker-runner.sh --pushImageToHub` script pushes the image to Docker Hub, and the `docker-runner.sh --runContainer` script runs it from the local repository. If the image is absent from the local repository, it is downloaded from Docker Hub.
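The local-first behaviour can be made explicit with a small helper. The sketch below is an assumption about how such a check might look, not the contents of `docker-runner.sh`: `pull_if_missing` and the `DOCKER` variable are hypothetical names, and the image name is whatever the caller passes in. Note that `docker run` already pulls a missing image by default, so a helper like this mainly adds a visible message.

```shell
#!/usr/bin/env bash
# Sketch only: pull_if_missing and DOCKER are hypothetical names,
# not taken from docker-runner.sh.

# Docker client to invoke; overridable, for example for a dry run.
DOCKER="${DOCKER:-docker}"

# Use the local copy of an image when present, otherwise pull it.
# $1 = image name (and optional tag), supplied by the caller
pull_if_missing() {
  local image="$1"
  if "${DOCKER}" image inspect "${image}" >/dev/null 2>&1; then
    echo "Using local image ${image}"
  else
    echo "Image ${image} not found locally, pulling from Docker Hub"
    "${DOCKER}" pull "${image}"
  fi
}
```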
- Apache OpenNLP | GitHub | Mailing list | @apacheopennlp
- Docs
- Download
- Legend to support the examples in the docs
- Stanford CoreNLP (GPL v2)
- NLP4J: NLP Toolkit for JVM Languages
- Word2vec in Java (DL4J)
- ReVerb: Web-Scale Open Information Extraction
- OpenRegex: An efficient and flexible token-based regular expression language and engine
- CogcompNLP: Core libraries developed in the U of Illinois' Cognitive Computation Group
- MALLET - MAchine Learning for LanguagE Toolkit
- RDRPOSTagger - A robust POS tagging toolkit available (in both Java & Python) together with pre-trained models for 40+ languages.
- Java AI/ML/DL resources
- Deep Learning and DL4J Resources
- Awesome AI/ML/DL: NLP resources
- DL4J NLP resources
- Language processing
- Examples
- How to do Deep Learning for Java on the Valohai Platform?
- NLP with DL4J in Java, all from the command-line
Contributions are very welcome. Please share back with the wider community (and get credited for it)!
Please have a look at the CONTRIBUTING guidelines and read about our licensing policy.
Back to main page (table of contents)