Release/600 release candidate #14534

Open
wants to merge 126 commits into
base: master
1ea76c3
[SPARKNLP-1105] Introducing AlbertForMultipleChoice
danilojsl Dec 27, 2024
9e67d89
[SPARKNLP-1105] Addiong test tags
danilojsl Dec 27, 2024
3b86715
[SPARKNLP-1106] Introducing DistilBertForMultipleChoice
danilojsl Dec 31, 2024
191c78b
[SPARKNLP-1106] Adding notebook examples for DistilBertForMultipleChoice
danilojsl Jan 2, 2025
52e4cd4
[SPARKNLP-1107] Introducing RoBertaForMultipleChoice
danilojsl Jan 3, 2025
4d2c06e
[SPARKNLP-1107] Adding example notebooks for RobertaForMultipleChoice
danilojsl Jan 3, 2025
873e224
[SPARKNLP-1108] Introducing XlmRoBertaForMultipleChoice
danilojsl Jan 6, 2025
6c3e9cc
[SPARKNLP-1108] Adding notebooks example for XlmRoBertaForMultipleChoice
danilojsl Jan 8, 2025
4cd472d
[SPARKNLP-1098] Adding PDF reader support
danilojsl Jan 8, 2025
f3583d1
[SPARKNLP-1098] Adding docs and notebook example for PDF reader
danilojsl Jan 15, 2025
d7be7f0
Merge release/600
DevinTDHa Jan 18, 2025
208bb75
Refactor automatic gpu support
DevinTDHa Oct 25, 2024
e5c24f5
[SPARKNLP-1079] AutoGGUFVisionModel Scala Side
DevinTDHa Dec 14, 2024
544f722
[SPARKNLP-1079] AutoGGUFVisionModel Python Side
DevinTDHa Jan 18, 2025
ac80d3f
[SPARKNLP-1079] AutoGGUFVisionModel documentation and end-to-end example
DevinTDHa Jan 18, 2025
0f5d073
[SPARKNLP-1079] Bump jsl-llamacpp version
DevinTDHa Jan 18, 2025
998d1f5
[SPARKNLP-1079] AutoGGUFVisionModel pretrained model
DevinTDHa Jan 24, 2025
73cd3ad
fixing typo in MXBAI notebook
ahmedlone127 Jan 29, 2025
cdab6bb
Janus Scala API
prabod Feb 6, 2025
195c097
Janus Scala Documentation
prabod Feb 6, 2025
082db05
Janus Python API
prabod Feb 6, 2025
4d8bf47
[SPARKNLP-1079] AutoGGUFVisionModel pretrained model
DevinTDHa Jan 24, 2025
deb3952
[SPARKNLP-1079] Fix loadImagesAsBytes path creation
DevinTDHa Jan 24, 2025
f2be057
[SPARKNLP-1079] Fix batch inference for AutoGGUFVisionModel
DevinTDHa Feb 9, 2025
3d31759
[SPARKNLP-1079] Add note that only CLIP models are supported
DevinTDHa Feb 9, 2025
c3dca2d
update config values on the instance
prabod Feb 12, 2025
f9bd02d
added OLMo scala api
prabod Apr 22, 2024
32635d2
added OLMo scala api
prabod Apr 22, 2024
d52d4e0
added OLMo python API and tests
prabod Apr 24, 2024
2eedcb3
OlMo Notebook and bug fixes
prabod Feb 12, 2025
05cbe8b
update default name and documentation
prabod Feb 12, 2025
408958a
update default name
prabod Feb 12, 2025
a369ce9
Phi3V preprocessing utils
prabod Sep 19, 2024
6896a02
added phi3v
prabod Oct 23, 2024
621411b
add phi3v scala API
prabod Oct 28, 2024
c047188
Added tests
prabod Oct 29, 2024
59e596c
Phi3V python api and tests
prabod Oct 29, 2024
d0ad585
added byte fallback
prabod Oct 29, 2024
c12713d
changed to pretrained
prabod Oct 29, 2024
d12ae3d
export notebook
prabod Oct 30, 2024
5040468
updated testes
prabod Oct 30, 2024
27140d7
update default name and documentation
prabod Feb 13, 2025
9752516
update documentation and resource downloader entry
prabod Feb 13, 2025
1af39be
LLAVA Scala API and Tests
prabod Nov 1, 2024
b5872e7
LLAVA Test
prabod Nov 1, 2024
6f7c4d6
LLAVA python api
prabod Nov 6, 2024
6f2f3a9
LLAVA notebook
prabod Nov 7, 2024
64b6b20
Add custom model requirements
prabod Nov 8, 2024
2661bfb
update documentation and resource downloader entry
prabod Feb 13, 2025
af28bf2
cohere scala and python api
prabod Nov 13, 2024
133b326
Cohere Notebook
prabod Nov 14, 2024
028ca67
update documentation and resource downloader entry
prabod Feb 13, 2025
db48372
update documentation and resource downloader entry
prabod Feb 13, 2025
b967682
Qwen2VL scala API
prabod Dec 9, 2024
e5017e2
QWEN2VL python api
prabod Dec 10, 2024
16c9716
QWEN2VL Notebook
prabod Dec 10, 2024
934af90
update default_model and resource downloader entry
prabod Feb 14, 2025
d19b9f7
update documentation
prabod Feb 14, 2025
2cd2cae
update model
prabod Feb 14, 2025
7052361
added preprocessing utils for MLLama
prabod Dec 25, 2024
89c1803
MLLama tokenizers and utils
prabod Jan 8, 2025
58e309b
MLLama scala api
prabod Jan 20, 2025
46fe907
MLLama scala api changes
prabod Jan 21, 2025
c19f4eb
MLLama python api
prabod Jan 23, 2025
1e0500f
update default model, notebook and documentation
prabod Feb 14, 2025
0f9d4d9
[SPARKNLP-1098] Enabling getStoreSplittedPdf parameter to PDF reader
danilojsl Feb 21, 2025
3815c20
Merge branch 'master' of github.com:JohnSnowLabs/spark-nlp into featu…
danilojsl Feb 21, 2025
638d26b
[SPARKNLP-1098] Adding PdfToText notebook example
danilojsl Feb 24, 2025
ffe4e21
added image generation scala API
prabod Feb 26, 2025
dd2a400
Update HuggingFace_OpenVINO_in_Spark_NLP_MLLama.ipynb
prabod Feb 26, 2025
b4dc462
Update HuggingFace_OpenVINO_in_Spark_NLP_MLLama.ipynb
prabod Feb 26, 2025
c92a544
Update HuggingFace_OpenVINO_in_Spark_NLP_MLLama.ipynb
prabod Feb 26, 2025
f7d5893
Update HuggingFace_OpenVINO_in_Spark_NLP_MLLama.ipynb
prabod Feb 26, 2025
afd7338
added image generation python API and tests
prabod Mar 4, 2025
8eeccf7
[SPARKNLP-1117] Adding storeContent to HTML, Word and Email readers
danilojsl Mar 6, 2025
9386b44
[SPARKNLP-1117] Refactoring documentation for readers
danilojsl Mar 6, 2025
649c862
[SPARKNLP-1102] Adding support to read Excel files
danilojsl Dec 17, 2024
60a8521
[SPARKNLP-1102] Adding notebook example to read Excel files
danilojsl Dec 19, 2024
4bb3ad5
[SPARKNLP-1102] Refactoring documentation for excel reader
danilojsl Mar 6, 2025
af7b13e
[SPARKNLP-1117] Adding storeContent param
danilojsl Mar 6, 2025
1999ae5
[SPARKNLP-1103] Adding support to read PowerPoint files and adds loca…
danilojsl Dec 24, 2024
6dc756e
[SPARKNLP-1103] Adding documentation and notebook example for PowerPo…
danilojsl Dec 24, 2024
26e023e
[SPARKNLP-1117] Adding storeContent param
danilojsl Mar 6, 2025
8710011
[SPARKNLP-1113] Adding Text Reader
danilojsl Feb 17, 2025
30502cc
[SPARKNLP-1113] Adding txt reader notebook example
danilojsl Feb 17, 2025
cdb8f36
[SPARKNLP-1117] Adding storeContent param
danilojsl Mar 7, 2025
19d2dd4
added notebook
prabod Mar 11, 2025
d7bbb51
Improved Error Handling for AutoGGUF models
DevinTDHa Mar 14, 2025
489af7e
Add setNParallel for AutoGGUF models on python side
DevinTDHa Mar 14, 2025
9247f0d
Merge branch 'bug/gguf-embeddings-context' into feature/SPARKNLP-1079…
DevinTDHa Mar 14, 2025
9df68a7
Improved Error Handling and setNParallel alias for batch size
DevinTDHa Mar 14, 2025
f3d353c
Fix notebook error format
DevinTDHa Mar 14, 2025
05000ab
Merge pull request #14242 from JohnSnowLabs/SPARKNLP-1006-Implement-OLMo
maziyarpanahi Mar 16, 2025
6d71770
Merge branch 'release/600-release-candidate' into SPARKNLP-1060-Imple…
maziyarpanahi Mar 16, 2025
44fb92a
Merge pull request #14444 from JohnSnowLabs/SPARKNLP-1060-Implement-P…
maziyarpanahi Mar 16, 2025
f33ce00
Merge branch 'release/600-release-candidate' into SPARKNLP-1033-Imple…
maziyarpanahi Mar 16, 2025
7b65030
Merge pull request #14450 from JohnSnowLabs/SPARKNLP-1033-Implement-L…
maziyarpanahi Mar 16, 2025
92c7e12
Merge branch 'release/600-release-candidate' into SPARKNLP-1032-CoHere
maziyarpanahi Mar 16, 2025
2c867de
Merge pull request #14457 from JohnSnowLabs/SPARKNLP-1032-CoHere
maziyarpanahi Mar 16, 2025
39ed5e7
Merge branch 'release/600-release-candidate' into SPARKNLP-1077-Imple…
maziyarpanahi Mar 16, 2025
c31306f
Merge pull request #14474 from JohnSnowLabs/SPARKNLP-1077-Implementin…
maziyarpanahi Mar 16, 2025
5417d91
updating python and scala model names (#14488)
ahmedlone127 Mar 16, 2025
b35c90c
Merge pull request #14489 from JohnSnowLabs/feature/SPARKNLP-1102-Add…
maziyarpanahi Mar 16, 2025
e7a79fb
Merge pull request #14491 from JohnSnowLabs/feature/SPARKNLP-1103-Add…
maziyarpanahi Mar 16, 2025
dd57c97
Merge pull request #14492 from JohnSnowLabs/feature/SPARKNLP-1105-Imp…
maziyarpanahi Mar 16, 2025
d7e2851
Merge branch 'release/600-release-candidate' into feature/SPARKNLP-11…
maziyarpanahi Mar 16, 2025
cbbca68
Merge pull request #14493 from JohnSnowLabs/feature/SPARKNLP-1106-Imp…
maziyarpanahi Mar 16, 2025
06ef557
Merge branch 'release/600-release-candidate' into feature/SPARKNLP-11…
maziyarpanahi Mar 16, 2025
7bd3ca0
Merge pull request #14495 from JohnSnowLabs/feature/SPARKNLP-1107-Imp…
maziyarpanahi Mar 16, 2025
c737a27
Merge branch 'release/600-release-candidate' into feature/SPARKNLP-11…
maziyarpanahi Mar 16, 2025
0420d04
Merge pull request #14497 from JohnSnowLabs/feature/SPARKNLP-1108-Imp…
maziyarpanahi Mar 16, 2025
2b363e2
Merge branch 'release/600-release-candidate' into feature/SPARKNLP-10…
maziyarpanahi Mar 16, 2025
3999409
Merge pull request #14499 from JohnSnowLabs/feature/SPARKNLP-1098-Add…
maziyarpanahi Mar 16, 2025
7673843
Merge branch 'release/600-release-candidate' into SPARKNLP-1078-Imple…
maziyarpanahi Mar 16, 2025
6283a8f
Merge pull request #14502 from JohnSnowLabs/SPARKNLP-1078-Implement-L…
maziyarpanahi Mar 16, 2025
6194f03
Merge branch 'release/600-release-candidate' into feature/SPARKNLP-10…
maziyarpanahi Mar 16, 2025
3def1a0
Merge pull request #14505 from DevinTDHa/feature/SPARKNLP-1079-AutoGG…
maziyarpanahi Mar 16, 2025
e4f1961
Merge pull request #14510 from JohnSnowLabs/Fixing-MXBAI-Embedding-no…
maziyarpanahi Mar 16, 2025
a9d7980
SPARKNLP-1109 Adding Extractor to Sparknlp (#14519)
danilojsl Mar 16, 2025
94290cc
Merge pull request #14524 from JohnSnowLabs/feature/SPARKNLP-1113-Add…
maziyarpanahi Mar 16, 2025
d807b49
Merge branch 'release/600-release-candidate' into SPARKNLP-1088-Imple…
maziyarpanahi Mar 16, 2025
6b80b40
Merge pull request #14532 from JohnSnowLabs/SPARKNLP-1088-Implement-D…
maziyarpanahi Mar 16, 2025
39e60e3
Merge pull request #14533 from DevinTDHa/bug/gguf-embeddings-context
maziyarpanahi Mar 16, 2025
0dc22eb
Adding missing bracket in SparkNLPReader and formatting some files
danilojsl Mar 17, 2025
311f988
Adding misssing return dataframe for PDF reader in Python
danilojsl Mar 18, 2025
9e7c2fc
Updating reader notebooks
danilojsl Mar 19, 2025
202 changes: 202 additions & 0 deletions docs/en/annotator_entries/AutoGGUFVisionModel.md
@@ -0,0 +1,202 @@
{%- capture title -%}
AutoGGUFVisionModel
{%- endcapture -%}

{%- capture description -%}
Multimodal annotator that uses the llama.cpp library to generate text completions with large
language models. It supports ingesting images for captioning.

At the moment, only CLIP-based models are supported.

For the settable parameters and their explanations, see HasLlamaCppInferenceProperties and
HasLlamaCppModelProperties, and refer to the llama.cpp documentation of
[server.cpp](https://github.com/ggerganov/llama.cpp/tree/7d5e8777ae1d21af99d4f95be10db4870720da91/examples/server)
for more information.

Parameters that are not set explicitly default to the values provided by the model.

This annotator expects a column of annotator type AnnotationImage for the image and
Annotation for the caption. Note that the image bytes in the image annotation need to be
raw image bytes without preprocessing. We provide the helper function
ImageAssembler.loadImagesAsBytes to load the image bytes from a directory.
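
To illustrate what "raw image bytes without preprocessing" means, the following minimal sketch
reads image files from a directory as unmodified bytes, using only the Python standard library.
The helper name `read_raw_image_bytes` and the set of file suffixes are hypothetical and not part
of Spark NLP; in a pipeline, use `ImageAssembler.loadImagesAsBytes` instead.

```python
from pathlib import Path

# Hypothetical suffix list, chosen only for this illustration.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def read_raw_image_bytes(directory):
    """Return a dict mapping file name to the file's raw, undecoded bytes.

    The bytes are exactly what is stored on disk: no decoding, resizing,
    or normalization is applied.
    """
    result = {}
    for path in sorted(Path(directory).iterdir()):
        # Compare suffixes case-insensitively so that "hen.JPEG" is accepted.
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            result[path.name] = path.read_bytes()
    return result
```

The point of the sketch is that the file content is passed through untouched, which is the form
the annotator expects in the image annotation.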

Pretrained models can be loaded with `pretrained` of the companion object:

```scala
val autoGGUFVisionModel = AutoGGUFVisionModel.pretrained()
  .setInputCols("image", "document")
  .setOutputCol("completions")
```

If no name is provided, the default model is `"llava_v1.5_7b_Q4_0_gguf"`.

For available pretrained models, please see the [Models Hub](https://sparknlp.org/models).

For extended examples of usage, see the
[AutoGGUFVisionModelTest](https://github.com/JohnSnowLabs/spark-nlp/tree/master/src/test/scala/com/johnsnowlabs/nlp/annotators/seq2seq/AutoGGUFVisionModelTest.scala)
and the
[example notebook](https://github.com/JohnSnowLabs/spark-nlp/tree/master/examples/python/llama.cpp/llama.cpp_in_Spark_NLP_AutoGGUFVisionModel.ipynb).

**Note**: To use GPU inference with this annotator, make sure to use the Spark NLP GPU package and set
the number of GPU layers with the `setNGpuLayers` method.

When using larger models, we recommend adjusting GPU usage with `setNCtx` and `setNGpuLayers`
according to your hardware to avoid out-of-memory errors.
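
As a rough aid for choosing these values, the sketch below estimates how many layers might fit
into GPU memory. It is a back-of-the-envelope heuristic under stated assumptions, not part of
Spark NLP or llama.cpp: the function names and the 512 MiB reserve are hypothetical, the weights
are assumed to be spread evenly over the layers, and the KV cache estimate assumes 16-bit cache
values and full multi-head attention. Memory used by the image projector is not modelled
separately and would have to fit in the reserve.

```python
def kv_cache_bytes(n_layers, n_ctx, n_embd, bytes_per_value=2):
    """Approximate KV cache size: one key and one value vector
    of n_embd entries per layer and per context position."""
    return 2 * n_layers * n_ctx * n_embd * bytes_per_value


def estimate_gpu_layers(free_vram_bytes, model_file_bytes, n_layers, n_ctx,
                        n_embd, reserve_bytes=512 * 1024**2):
    """Estimate how many layers fit into the free GPU memory.

    Assumes (for illustration only) that each layer costs an equal share
    of the model file plus its share of the KV cache.
    """
    per_layer = model_file_bytes / n_layers + kv_cache_bytes(1, n_ctx, n_embd)
    budget = free_vram_bytes - reserve_bytes
    if budget <= 0:
        return 0
    # Never report more layers than the model has.
    return max(0, min(n_layers, int(budget // per_layer)))


GIB = 1024**3
# Hypothetical numbers: a 4 GiB model file with 32 layers and embedding size 4096.
layers = estimate_gpu_layers(4 * GIB, 4 * GIB, n_layers=32, n_ctx=4096, n_embd=4096)
print(layers)
```

Under these assumptions, a model with 32 layers and an embedding size of 4096 needs about 2 GiB
of KV cache at a context size of 4096, which is why lowering `setNCtx` can free a noticeable
amount of memory in addition to lowering `setNGpuLayers`.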
{%- endcapture -%}

{%- capture input_anno -%}
IMAGE, DOCUMENT
{%- endcapture -%}

{%- capture output_anno -%}
DOCUMENT
{%- endcapture -%}

{%- capture python_example -%}
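import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
from pyspark.sql.functions import lit

# Start a Spark session with Spark NLP (pass gpu=True to use the GPU package).
spark = sparknlp.start()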

documentAssembler = DocumentAssembler() \
    .setInputCol("caption") \
    .setOutputCol("caption_document")
imageAssembler = ImageAssembler() \
    .setInputCol("image") \
    .setOutputCol("image_assembler")

imagesPath = "src/test/resources/image/"
data = ImageAssembler \
    .loadImagesAsBytes(spark, imagesPath) \
    .withColumn("caption", lit("Caption this image."))  # Add a caption to each image.

nPredict = 40
model = AutoGGUFVisionModel.pretrained() \
    .setInputCols(["caption_document", "image_assembler"]) \
    .setOutputCol("completions") \
    .setBatchSize(4) \
    .setNGpuLayers(99) \
    .setNCtx(4096) \
    .setMinKeep(0) \
    .setMinP(0.05) \
    .setNPredict(nPredict) \
    .setNProbs(0) \
    .setPenalizeNl(False) \
    .setRepeatLastN(256) \
    .setRepeatPenalty(1.18) \
    .setStopStrings(["</s>", "Llama:", "User:"]) \
    .setTemperature(0.05) \
    .setTfsZ(1) \
    .setTypicalP(1) \
    .setTopK(40) \
    .setTopP(0.95)

pipeline = Pipeline().setStages([documentAssembler, imageAssembler, model])
pipeline.fit(data).transform(data) \
    .selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "completions.result") \
    .show(truncate=False)
+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|image_name |result |
+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|palace.JPEG |[ The image depicts a large, ornate room with high ceilings and beautifully decorated walls. There are several chairs placed throughout the space, some of which have cushions] |
|egyptian_cat.jpeg|[ The image features two cats lying on a pink surface, possibly a bed or sofa. One cat is positioned towards the left side of the scene and appears to be sleeping while holding] |
|hippopotamus.JPEG|[ A large brown hippo is swimming in a body of water, possibly an aquarium. The hippo appears to be enjoying its time in the water and seems relaxed as it floats] |
|hen.JPEG |[ The image features a large chicken standing next to several baby chickens. In total, there are five birds in the scene: one adult and four young ones. They appear to be gathered together] |
|ostrich.JPEG |[ The image features a large, long-necked bird standing in the grass. It appears to be an ostrich or similar species with its head held high and looking around. In addition to] |
|junco.JPEG |[ A small bird with a black head and white chest is standing on the snow. It appears to be looking at something, possibly food or another animal in its vicinity. The scene takes place out] |
|bluetick.jpg |[ A dog with a red collar is sitting on the floor, looking at something. The dog appears to be staring into the distance or focusing its attention on an object in front of it.] |
|chihuahua.jpg |[ A small brown dog wearing a sweater is sitting on the floor. The dog appears to be looking at something, possibly its owner or another animal in the room. It seems comfortable and relaxed]|
|tractor.JPEG |[ A man is sitting in the driver's seat of a green tractor, which has yellow wheels and tires. The tractor appears to be parked on top of an empty field with] |
|ox.JPEG |[ A large bull with horns is standing in a grassy field.] |
+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
{%- endcapture -%}

{%- capture scala_example -%}
import com.johnsnowlabs.nlp.ImageAssembler
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.util.io.ResourceHelper
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

val documentAssembler = new DocumentAssembler()
  .setInputCol("caption")
  .setOutputCol("caption_document")

val imageAssembler = new ImageAssembler()
  .setInputCol("image")
  .setOutputCol("image_assembler")

val imagesPath = "src/test/resources/image/"
val data: DataFrame = ImageAssembler
  .loadImagesAsBytes(ResourceHelper.spark, imagesPath)
  .withColumn("caption", lit("Caption this image.")) // Add a caption to each image.

val nPredict = 40
val model = AutoGGUFVisionModel.pretrained()
  .setInputCols("caption_document", "image_assembler")
  .setOutputCol("completions")
  .setBatchSize(4)
  .setNGpuLayers(99)
  .setNCtx(4096)
  .setMinKeep(0)
  .setMinP(0.05f)
  .setNPredict(nPredict)
  .setNProbs(0)
  .setPenalizeNl(false)
  .setRepeatLastN(256)
  .setRepeatPenalty(1.18f)
  .setStopStrings(Array("</s>", "Llama:", "User:"))
  .setTemperature(0.05f)
  .setTfsZ(1)
  .setTypicalP(1)
  .setTopK(40)
  .setTopP(0.95f)

val pipeline = new Pipeline().setStages(Array(documentAssembler, imageAssembler, model))
pipeline
  .fit(data)
  .transform(data)
  .selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "completions.result")
  .show(truncate = false)
+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|image_name |result |
+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|palace.JPEG |[ The image depicts a large, ornate room with high ceilings and beautifully decorated walls. There are several chairs placed throughout the space, some of which have cushions] |
|egyptian_cat.jpeg|[ The image features two cats lying on a pink surface, possibly a bed or sofa. One cat is positioned towards the left side of the scene and appears to be sleeping while holding] |
|hippopotamus.JPEG|[ A large brown hippo is swimming in a body of water, possibly an aquarium. The hippo appears to be enjoying its time in the water and seems relaxed as it floats] |
|hen.JPEG |[ The image features a large chicken standing next to several baby chickens. In total, there are five birds in the scene: one adult and four young ones. They appear to be gathered together] |
|ostrich.JPEG |[ The image features a large, long-necked bird standing in the grass. It appears to be an ostrich or similar species with its head held high and looking around. In addition to] |
|junco.JPEG |[ A small bird with a black head and white chest is standing on the snow. It appears to be looking at something, possibly food or another animal in its vicinity. The scene takes place out] |
|bluetick.jpg |[ A dog with a red collar is sitting on the floor, looking at something. The dog appears to be staring into the distance or focusing its attention on an object in front of it.] |
|chihuahua.jpg |[ A small brown dog wearing a sweater is sitting on the floor. The dog appears to be looking at something, possibly its owner or another animal in the room. It seems comfortable and relaxed]|
|tractor.JPEG |[ A man is sitting in the driver's seat of a green tractor, which has yellow wheels and tires. The tractor appears to be parked on top of an empty field with] |
|ox.JPEG |[ A large bull with horns is standing in a grassy field.] |
+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
{%- endcapture -%}

{%- capture api_link -%}
[AutoGGUFVisionModel](/api/com/johnsnowlabs/nlp/annotators/seq2seq/AutoGGUFVisionModel)
{%- endcapture -%}

{%- capture python_api_link -%}
[AutoGGUFVisionModel](/api/python/reference/autosummary/sparknlp/annotator/seq2seq/auto_gguf_vision_model/index.html)
{%- endcapture -%}

{%- capture source_link -%}
[AutoGGUFVisionModel](https://github.com/JohnSnowLabs/spark-nlp/tree/master/src/main/scala/com/johnsnowlabs/nlp/annotators/seq2seq/AutoGGUFVisionModel.scala)
{%- endcapture -%}

{% include templates/anno_template.md
title=title
description=description
input_anno=input_anno
output_anno=output_anno
python_example=python_example
scala_example=scala_example
api_link=api_link
python_api_link=python_api_link
source_link=source_link
%}
1 change: 1 addition & 0 deletions docs/en/annotators.md
@@ -47,6 +47,7 @@ There are two types of Annotators:
|---|---|---|
{% include templates/anno_table_entry.md path="" name="AutoGGUFEmbeddings" summary="Annotator that uses the llama.cpp library to generate text embeddings with large language models."%}
{% include templates/anno_table_entry.md path="" name="AutoGGUFModel" summary="Annotator that uses the llama.cpp library to generate text completions with large language models."%}
{% include templates/anno_table_entry.md path="" name="AutoGGUFVisionModel" summary="Multimodal annotator that uses the llama.cpp library to generate text completions with large language models."%}
{% include templates/anno_table_entry.md path="" name="BGEEmbeddings" summary="Sentence embeddings using BGE."%}
{% include templates/anno_table_entry.md path="" name="BigTextMatcher" summary="Annotator to match exact phrases (by token) provided in a file against a Document."%}
{% include templates/anno_table_entry.md path="" name="Chunk2Doc" summary="Converts a `CHUNK` type column back into `DOCUMENT`. Useful when trying to re-tokenize or do further analysis on a `CHUNK` result."%}
Original file line number Diff line number Diff line change
@@ -264,8 +264,7 @@
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
Original file line number Diff line number Diff line change
@@ -31,7 +31,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -320,7 +320,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -335,7 +335,6 @@
"source": [
"from sparknlp.annotator import *\n",
"\n",
"# All these params should be identical to the original ONNX model\n",
"autoGGUFModel = (\n",
" AutoGGUFModel.loadSavedModel(EXPORT_PATH, spark)\n",
" .setInputCols(\"document\")\n",
@@ -355,7 +354,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -389,7 +388,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -415,7 +414,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"metadata": {},
"outputs": [
{
@@ -619,8 +618,7 @@
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,