Spark NLP 6.0.1: SmolVLM, PaliGemma 2, Gemma 3, PDF Reader enhancements

@DevinTDHa DevinTDHa released this 14 May 19:43
6.0.1

📢 Spark NLP 6.0.1: Introducing New State-of-the-Art Vision-Language Models and Enhanced Document Processing

We are pleased to announce the release of Spark NLP 6.0.1, bringing exciting new vision features and continued enhancements. Expand your NLP capabilities at scale for a wide range of tasks by upgrading to 6.0.1 and leveraging these new additions and improvements!

We have also been adding blog posts covering various examples of our newest features. Check them out at Medium - Spark NLP!

🔥 Highlights

  • Added support for several new state-of-the-art vision-language models (VLMs), including Gemma 3, PaliGemma, PaliGemma 2, and SmolVLM.
  • Introduced new parameter options for the PDF Reader for enhanced document ingestion control.

🚀 New Features & Enhancements

New VLM Implementations

This release adds support for several cutting-edge VLMs, significantly expanding the range of tasks you can tackle with Spark NLP:

  • Gemma 3: The latest version of Google's lightweight, state-of-the-art open models. (link to notebook)
  • PaliGemma and PaliGemma 2: Integration of the original PaliGemma vision-language model by Google. This annotator can also read PaliGemma 2 models. (link to notebook)
  • SmolVLM: A small, fast, memory-efficient, and fully open-source 2B VLM. (link to notebook)

PDF Reader Enhancements

The PDF Reader now includes additional parameters and options that give users more flexible, controlled ingestion of PDF documents and improve handling of various PDF structures. (link to notebook)

You can now use the following parameters:

  • splitPage: identifies the correct number of pages
  • onlyPageNum: displays only the number of pages of the document
  • textStripper: controls output layout and formatting
  • sort: enables or disables sorting lines
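
As a rough illustration of the four parameters listed above, the sketch below collects them into a plain options dictionary. Only the parameter names come from these release notes. The helper `build_pdf_reader_options`, the string encoding of the values, and the choice to skip unset parameters are assumptions made for illustration, and the notes do not show how the options are handed to the reader, so that step is left out.

```python
from typing import Dict, Optional

# Parameter names as listed in the release notes; everything else is assumed.
PDF_READER_PARAMS = ("splitPage", "onlyPageNum", "textStripper", "sort")


def build_pdf_reader_options(
    split_page: Optional[bool] = None,
    only_page_num: Optional[bool] = None,
    text_stripper: Optional[str] = None,
    sort: Optional[bool] = None,
) -> Dict[str, str]:
    """Collect the PDF reader parameters that were explicitly set.

    Hypothetical helper: values are rendered as strings on the assumption
    that reader options are passed as string key/value pairs.
    """
    raw = {
        "splitPage": split_page,
        "onlyPageNum": only_page_num,
        "textStripper": text_stripper,
        "sort": sort,
    }
    options: Dict[str, str] = {}
    for name, value in raw.items():
        if value is None:
            continue  # leave unset parameters to the reader's defaults
        if isinstance(value, bool):
            options[name] = "true" if value else "false"
        else:
            options[name] = str(value)
    return options


if __name__ == "__main__":
    # Split per page and disable line sorting; other parameters stay unset.
    print(build_pdf_reader_options(split_page=True, sort=False))
```

Keeping unset parameters out of the dictionary means the reader's own defaults still apply, which is usually safer than guessing them. Valid values for textStripper are not given in these notes, so check the linked notebook before setting it.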

🐛 Bug Fixes

This release also includes fixes for several issues:

  • Fixed a Python error in RoBERtaMultipleChoice that prevented these types of annotators from being loaded in Python
  • Fixed various typos and issues in our Jupyter notebook examples

❤️ Community Support

  • Slack: For live discussion with the Spark NLP community and the team
  • GitHub: Bug reports, feature requests, and contributions
  • Discussions: Engage with other community members, share ideas, and show off how you use Spark NLP!
  • Medium: Spark NLP articles
  • JohnSnowLabs: Official Medium
  • YouTube: Spark NLP video tutorials

⚙️ Installation

Python

#PyPI

pip install spark-nlp==6.0.1

Spark Packages

spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x (Scala 2.12):

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:6.0.1

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:6.0.1

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.0.1

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.0.1

Apple Silicon

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:6.0.1

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:6.0.1

AArch64

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:6.0.1

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:6.0.1
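
The commands above all follow one pattern: group ID, artifact name with the Scala suffix, then the version. A minimal sketch of that pattern is shown below, which may help when scripting cluster setup. The artifact names and version are copied from the commands above. The function names and the variant keys (`cpu`, `gpu`, `silicon`, `aarch64`) are hypothetical labels chosen for this example.

```python
GROUP_ID = "com.johnsnowlabs.nlp"
SCALA_VERSION = "2.12"

# Artifact names as they appear in the commands above.
# The dictionary keys are hypothetical labels, not official names.
ARTIFACTS = {
    "cpu": "spark-nlp",
    "gpu": "spark-nlp-gpu",
    "silicon": "spark-nlp-silicon",
    "aarch64": "spark-nlp-aarch64",
}


def package_coordinate(variant: str = "cpu", version: str = "6.0.1") -> str:
    """Return the group:artifact_scala:version coordinate for a variant."""
    if variant not in ARTIFACTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return f"{GROUP_ID}:{ARTIFACTS[variant]}_{SCALA_VERSION}:{version}"


def launch_command(tool: str = "pyspark", variant: str = "cpu",
                   version: str = "6.0.1") -> str:
    """Build the spark-shell or pyspark command line for a variant."""
    if tool not in ("pyspark", "spark-shell"):
        raise ValueError(f"unsupported tool: {tool!r}")
    return f"{tool} --packages {package_coordinate(variant, version)}"


if __name__ == "__main__":
    # Print both launch commands for every variant.
    for variant in ARTIFACTS:
        for tool in ("spark-shell", "pyspark"):
            print(launch_command(tool, variant))
```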

Maven

spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
    <version>6.0.1</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
    <version>6.0.1</version>
</dependency>

spark-nlp-silicon:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-silicon_2.12</artifactId>
    <version>6.0.1</version>
</dependency>

spark-nlp-aarch64:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-aarch64_2.12</artifactId>
    <version>6.0.1</version>
</dependency>

FAT JARs

What's Changed

Full Changelog: 6.0.0...6.0.1