This project demonstrates text processing capabilities using Apache OpenNLP and JavaFX, featuring a simple interactive GUI for educational purposes. The application performs tokenization, sentence detection, part-of-speech tagging, and named entity recognition, showcasing basic NLP techniques with machine learning models.
- Tokenization: Splits text into individual tokens (words).
- Sentence Detection: Identifies and separates sentences within the text.
- Part-of-Speech Tagging: Tags each token with its corresponding part of speech.
- Named Entity Recognition (NER): Detects and labels named entities such as people and organizations.
- Export Results: Allows users to export processed text or results to a local file.
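To make the first two features concrete, here is a minimal sketch of what tokenization and sentence detection produce. It is not the project's code and does not use the OpenNLP API: it relies on the JDK's rule-based `java.text.BreakIterator` as a dependency-free stand-in, whereas the application uses Apache OpenNLP models. The class and method names (`SegmentationSketch`, `tokens`, `sentences`) are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: a JDK-based stand-in, not the OpenNLP models the project uses.
class SegmentationSketch {

    // Splits text into sentences with the JDK's rule-based sentence iterator.
    static List<String> sentences(String text) {
        return segments(BreakIterator.getSentenceInstance(Locale.ENGLISH), text);
    }

    // Splits text into word and punctuation tokens with the JDK's word iterator.
    static List<String> tokens(String text) {
        return segments(BreakIterator.getWordInstance(Locale.ENGLISH), text);
    }

    // Walks the boundaries reported by the iterator and collects the
    // non-blank pieces between them, trimmed of surrounding whitespace.
    private static List<String> segments(BreakIterator it, String text) {
        List<String> out = new ArrayList<>();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end).trim();
            if (!piece.isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Hello, world. How are you?";
        System.out.println(sentences(text));
        System.out.println(tokens(text));
    }
}
```

A statistical model can handle cases that fixed rules tend to miss, such as abbreviations in the middle of a sentence, which is one reason to prefer trained models for this kind of task.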
- Java 17 or higher
- JavaFX SDK 22.0.1
- Apache OpenNLP library
git clone https://github.com/vdrvar/java_ml_text_processor.git
cd java_ml_text_processor

Download the JavaFX SDK from Gluon and place it in your preferred directory. Ensure the necessary environment variables are set if needed.
- On Linux or macOS, make the script executable:
chmod +x run.sh
- Then run the script:
./run.sh
- On Windows, run the batch script:
run.bat

Either script compiles the Java source files and then runs the application.
Enter the text you want to process in the input area.

Click the Tokenize button to split the input text into individual tokens.

Click the Detect Sentences button to identify and separate sentences within the input text.

Click the POS Tagging button to tag each token with its corresponding part of speech.

Click the NER button to detect and label named entities in the input text.

Click the Export Results button to save the processed results to a file.
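
The export step can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's actual implementation: it assumes the results are held as a list of strings and written one per line as UTF-8. The names `ExportSketch` and `export` are hypothetical, and the temporary file stands in for whatever path the user would choose in the GUI (for example through a JavaFX `FileChooser`).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Illustration only: hypothetical helper, not taken from the project source.
class ExportSketch {

    // Writes one result per line to the target file as UTF-8,
    // creating the file or replacing its existing content.
    static void export(List<String> results, Path target) throws IOException {
        Files.write(target, results, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumed destination: a temp file instead of a user-selected path.
        Path target = Files.createTempFile("nlp-results", ".txt");
        export(List.of("Hello", ",", "world", "."), target);
        System.out.println("Wrote results to " + target);
    }
}
```

Writing one item per line keeps the exported file easy to inspect or feed into other tools.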

This project is licensed under the MIT License. See the LICENSE file for details.