[Bug]: Spark Milvus Connector DataType.ARRAY #15

juanandreas · 2024-05-09T13:53:47Z

Is there an existing issue for this?

I have searched the existing issues

Describe the bug

When I try to write data into a collection with a predefined schema, my Spark job's write aborts with:

scala.MatchError: Array (of class io.milvus.grpc.DataType)

When I try to write with the spark-milvus connector without predefining a schema, I get:

java.lang.Exception: Unsupported data type array

Expected Behavior

PyMilvus should be able to recognize array data types, and writing to a collection with a predefined schema should also work. Is this a bug in the Spark-Milvus connector?

Steps/Code To Reproduce behavior

from pymilvus import Collection, CollectionSchema, DataType, FieldSchema

# COLLECTION_NAME, MILVUS_HOST, MILVUS_PORT and df are defined elsewhere in the job.
fields = [
  FieldSchema(name="id", is_primary=True, dtype=DataType.VARCHAR, max_length=100),
  # Array-of-VARCHAR field that triggers the errors above.
  FieldSchema(name="countries", dtype=DataType.ARRAY, element_type=DataType.VARCHAR, max_length=100, max_capacity=100),
  # FieldSchema(name="vector_field", dtype=DataType.FLOAT_VECTOR, dim=705),
  FieldSchema(name="vector_field", dtype=DataType.SPARSE_FLOAT_VECTOR),
]
schema = CollectionSchema(
  fields,
  description="collection",
  enable_dynamic_field=True,
)

collection = Collection(COLLECTION_NAME, schema)

df.write \
  .mode("append") \
  .option("milvus.host", MILVUS_HOST) \
  .option("milvus.port", MILVUS_PORT) \
  .option("milvus.collection.name", COLLECTION_NAME) \
  .option("milvus.collection.vectorField", "vector_field") \
  .option("milvus.collection.vectorDim", "705") \
  .option("milvus.collection.primaryKeyField", "id") \
  .option("milvus.database.name", "default") \
  .format("milvus") \
  .save()

Environment details

- Hardware/Software conditions (OS, CPU, GPU, Memory):
- Method of installation (Docker, or from source):
- Milvus version (v0.3.1, or v0.4.0): 2.4.1
- Milvus configuration (Settings you made in `server_config.yaml`):

Anything else?

No response


CauchyLion · 2024-06-07T09:30:31Z

Have you solved this problem?

wayblink · 2024-06-07T10:30:34Z

Some new data types are not supported yet. We hope someone can take this issue; contributions are welcome. If no one takes it, we may have time to fix it in the coming months.

xiaofan-luan · 2024-06-09T16:05:26Z

array and sparse vector need to be supported.

wayblink · 2024-06-21T08:39:49Z

Support for advanced data types (including json, array, sparse_vector) is in progress.
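Until that support lands, a pre-flight check can flag the affected fields before the Spark job starts. The following is only a minimal sketch: `unsupported_fields` and `UNSUPPORTED_TYPES` are hypothetical names, not part of the connector or pymilvus, and the type list is taken from this thread rather than from the connector's source, so it may not match a given release.

```python
# Hypothetical pre-flight check; not part of the spark-milvus connector.
# The type names below are assumptions based on this thread
# (json, array, sparse_vector) and may differ between connector versions.
UNSUPPORTED_TYPES = {"ARRAY", "JSON", "SPARSE_FLOAT_VECTOR"}


def unsupported_fields(field_types):
    """Return the sorted names of fields whose type is in UNSUPPORTED_TYPES."""
    return sorted(
        name
        for name, type_name in field_types.items()
        if type_name.upper() in UNSUPPORTED_TYPES
    )


# Field types from the schema in the report, written as plain strings.
reported_schema = {
    "id": "VARCHAR",
    "countries": "ARRAY",
    "vector_field": "SPARSE_FLOAT_VECTOR",
}

blocked = unsupported_fields(reported_schema)
if blocked:
    print("Fields the connector may reject:", ", ".join(blocked))
```

For the schema in the report, this flags both `countries` and `vector_field`, which is consistent with the two data types named above as still needing support.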

juanandreas · 2024-07-03T05:09:56Z

Is there an update on this?

juanandreas mentioned this issue May 9, 2024

[Bug]: Spark Milvus Connector DataType.ARRAY milvus-io/pymilvus#2078

Closed

1 task
