Run a docker container with NLP libraries/frameworks written in Java/JVM languages, running under the traditional Java 9 (from OpenJDK or another source) or GraalVM.
- Run docker container containing NLP libraries/frameworks written in Java/JVM languages
- Ability to create custom docker images (scripts & docs provided)
- Ability to debug the docker container
- Run using the traditional JDK 9 (OpenJDK or vendor specific versions)
- Run using the polyglot JVM, i.e. the GraalVM JDK (Community version from Oracle Labs), when performing operations from the CLI
- Play with and learn from some examples for each of the libraries provided
Libraries / frameworks provided
- Stanford CoreNLP
- Apache OpenNLP | See README for usage and examples
- NLP4J: NLP Toolkit for JVM Languages
- Word2vec in Java
- ReVerb: Web-Scale Open Information Extraction
- OpenRegex: An efficient and flexible token-based regular expression language and engine
- CogcompNLP: Core libraries developed in the U of Illinois' Cognitive Computation Group
- MALLET - MAchine Learning for LanguagE Toolkit
- RDRPOSTagger - A robust POS tagging toolkit available (in both Java & Python) together with pre-trained models for 40+ languages.
- Clojure-openNLP - Natural Language Processing in Clojure (opennlp)
- Inflections-clj - Rails-like inflection library for Clojure and ClojureScript
- postagga - A library to parse natural language in Clojure and ClojureScript
- Lingua - A language detection library for Kotlin and Java, suitable for long and short text alike
- Kotidgy - An index-based text data generator written in Kotlin
- Saul - Library for developing NLP systems, including built in modules like SRL, POS, etc.
- ATR4S - Toolkit with state-of-the-art automatic term recognition methods.
- tm - Implementation of topic modeling based on regularized multilingual PLSA.
- word2vec-scala - Scala interface to word2vec model; includes operations on vectors like word-distance and word-analogy.
- Epic - Epic is a high performance statistical parser written in Scala, along with a framework for building complex structured prediction models.
The following scripts are provided (scroll up to the file listing to find them):
- docker-runner.sh: performs one or more of the actions below, depending on the flags passed to it:
  - runs the container and brings you to the command prompt inside the container
  - builds the docker base image and the language-specific (i.e. java, clojure, kotlin, scala) image; this takes under 5 minutes on a decent connection
  - pushes pre-built docker images to Docker Hub (pass in your own Docker username, then enter your Docker login details when prompted; see usage below)
  - runs a housekeeping step that removes dangling images and terminated containers (helps save some disk space)
- Base Dockerfile | Java Dockerfile: Dockerfile scripts to help build the base and language (i.e. java, clojure, kotlin, scala) specific docker image of NLP Java/JVM in an isolated environment with the necessary dependencies.
- images folder - contains the scripts used to build the base and language-specific (i.e. java, clojure, kotlin, scala) docker images, together with the scripts that are included in the container
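The housekeeping action described above can be approximated with plain Docker commands. The following is a minimal sketch and not the contents of docker-runner.sh: the function names `run_cmd` and `cleanup_docker` and the `DRY_RUN` variable are assumptions made for illustration.

```shell
# Hypothetical helpers, not taken from docker-runner.sh.
# With DRY_RUN=1 each command is printed instead of executed.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

cleanup_docker() {
  # Remove all stopped containers (this includes exited ones).
  run_cmd docker container prune --force
  # Remove dangling (untagged) images.
  run_cmd docker image prune --force
}
```

Calling `DRY_RUN=1 cleanup_docker` prints the two commands without touching the local Docker installation, which is a convenient way to review them first.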
$ ./docker-runner.sh --help

Usage: ./docker-runner.sh --dockerUserName [docker user name]
                          --language [language id]
                          --detach
                          --buildImage
                          --runContainer
                          --pushImageToHub
                          --cleanup
                          --help

--dockerUserName    docker user name as on Docker Hub
                    (mandatory with build and push commands)
--language          language id as in java, clojure, scala, etc...
--detach            run container and detach from it,
                    return control to console
--jdk               name of the JDK to use (currently supports
                    GRAALVM only, default is blank which
                    enables the traditional JDK)
                    GRAALVM is only for CLI operations
--javaopts          sets the JAVA_OPTS environment variable
                    inside the container as it starts
--cleanup           (command action) remove exited containers and
                    dangling images from the local repository
--buildImage        (command action) build the docker image
--runContainer      (command action) run the docker image as a docker container
--pushImageToHub    (command action) push the docker image built to Docker Hub
--help              shows the script usage help text
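To make the flag semantics concrete, here is a minimal sketch of how a script could parse the flags listed in the help text. It is not the actual implementation of docker-runner.sh: the function name `parse_flags` and every variable name are assumptions, and only the flag names and the `java` default come from this README.

```shell
# Hypothetical flag parser; variable names are illustrative only.
parse_flags() {
  DOCKER_USER=""
  LANGUAGE_ID="java"   # documented default when --language is not given
  JDK=""               # blank selects the traditional JDK
  JAVA_OPTS_VALUE=""
  DETACH="false"
  ACTIONS=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --dockerUserName|--language|--jdk|--javaopts)
        # These flags take a value; fail early when it is missing.
        if [ "$#" -lt 2 ]; then
          echo "Missing value for $1" >&2
          return 1
        fi
        case "$1" in
          --dockerUserName) DOCKER_USER="$2" ;;
          --language)       LANGUAGE_ID="$2" ;;
          --jdk)            JDK="$2" ;;
          --javaopts)       JAVA_OPTS_VALUE="$2" ;;
        esac
        shift 2
        ;;
      --detach)
        DETACH="true"
        shift
        ;;
      --cleanup|--buildImage|--runContainer|--pushImageToHub)
        # Command actions are collected in the order given, without the leading dashes.
        ACTIONS="${ACTIONS:+$ACTIONS }${1#--}"
        shift
        ;;
      --help)
        ACTIONS="help"
        return 0
        ;;
      *)
        echo "Unknown option: $1" >&2
        return 1
        ;;
    esac
  done
}
```

For example, after `parse_flags --runContainer --jdk GRAALVM` the sketch leaves `ACTIONS` set to `runContainer`, `JDK` set to `GRAALVM`, and `LANGUAGE_ID` at its default of `java`.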
Run the NLP Java/JVM docker container:
$ ./docker-runner.sh --runContainer

or

$ ./docker-runner.sh --runContainer --dockerUserName [your docker user name]

or run in GraalVM mode (for CLI operations)

$ ./docker-runner.sh --runContainer --jdk "GRAALVM"

or run with the JVMCI flag switched off (default: on) when running in GRAALVM mode

$ ./docker-runner.sh --javaopts "-XX:-UseJVMCINativeLibrary"
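As an illustration of what the run options above amount to, the sketch below assembles a `docker run` command line from a detach setting and a JAVA_OPTS value. It is an assumption about how such a command could be put together, not a copy of what docker-runner.sh executes; the function name `build_run_command` and the image name in the usage line are hypothetical.

```shell
# Hypothetical helper: prints a docker run command line for review.
# Arguments: image name, JAVA_OPTS value (may be empty), detach ("true"/"false").
build_run_command() {
  image="$1"
  java_opts="$2"
  detach="$3"
  set -- docker run
  if [ "$detach" = "true" ]; then
    # Run in the background and return control to the console.
    set -- "$@" --detach
  else
    # Keep an interactive terminal attached to the container.
    set -- "$@" --interactive --tty
  fi
  if [ -n "$java_opts" ]; then
    # Expose the options to the JVM inside the container via JAVA_OPTS.
    set -- "$@" --env "JAVA_OPTS=$java_opts"
  fi
  set -- "$@" "$image"
  # Printed for display only; shell quoting is not preserved.
  printf '%s\n' "$*"
}
```

With a made-up image name, `build_run_command "example/nlp-java:latest" "-XX:-UseJVMCINativeLibrary" "false"` prints `docker run --interactive --tty --env JAVA_OPTS=-XX:-UseJVMCINativeLibrary example/nlp-java:latest`.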
Build the docker container:
Ensure your environment has the below variable set, or set it in your .bash_profile or the relevant startup script:
You must have an account on Docker hub under the above user name.
$ ./docker-runner.sh --buildImage

or

$ ./docker-runner.sh --buildImage --dockerUserName "your_docker_username"

or

$ ./docker-runner.sh --buildImage --language [language_id]
[language_id] - defaults to java when not provided. Accepts:
Push built NLP Java/JVM docker image to Docker hub:
$ ./docker-runner.sh --pushImageToHub

or

$ ./docker-runner.sh --pushImageToHub --dockerUserName "your_docker_username"

The above will prompt for your Docker login name and password before it can push your image to Docker Hub (you must have an account on Docker Hub).
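The push step described above boils down to a login followed by a push. The sketch below shows that sequence under stated assumptions: the function names, the `DRY_RUN` variable, and the `user/image` naming pattern are hypothetical and not taken from docker-runner.sh.

```shell
# Hypothetical helpers, not taken from docker-runner.sh.
# With DRY_RUN=1 each command is printed instead of executed.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Arguments: Docker Hub user name, image name (without the user prefix).
push_image() {
  user="$1"
  image="$2"
  if [ -z "$user" ]; then
    # The user name is mandatory for push, as the usage text states.
    echo "A Docker Hub user name is required" >&2
    return 1
  fi
  # docker login asks for the password interactively.
  run_cmd docker login --username "$user" || return 1
  run_cmd docker push "$user/$image"
}
```

Running `DRY_RUN=1 push_image alice nlp-java` (both names made up) prints the login and push commands so they can be checked before anything is uploaded.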
Docker image on Docker Hub
Find the NLP Java/JVM Docker Image on Docker Hub. The docker-runner.sh --pushImageToHub script pushes the image to Docker Hub, and the docker-runner.sh --runContainer script runs it from the local repository. If the image is absent from the local repository, it is downloaded from Docker Hub.
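The local-first behaviour can be made explicit with two standard Docker commands. This is a minimal sketch, assuming a hypothetical function name `ensure_image`; `docker run` already pulls a missing image on its own, so the sketch only spells out the check.

```shell
# Hypothetical helper: reports whether an image was found locally or had to be pulled.
ensure_image() {
  image="$1"
  if docker image inspect "$image" >/dev/null 2>&1; then
    # The image is already in the local repository.
    echo "local"
  else
    # Not found locally: download it from the registry (Docker Hub by default).
    docker pull "$image" >/dev/null && echo "pulled"
  fi
}
```

The function prints `local` or `pulled` on success and returns a non-zero status if the pull fails.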
Contributions are very welcome, please share back with the wider community (and get credited for it)!