[Bug]: Spark Milvus Connector DataType.ARRAY #15

Open
1 task done
juanandreas opened this issue May 9, 2024 · 5 comments

@juanandreas

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

When I try to write data into a collection with a predefined schema, my Spark write job aborts with:

scala.MatchError: Array (of class io.milvus.grpc.DataType)

When I try to write with the spark-milvus connector without predefining a schema, it fails with:

java.lang.Exception: Unsupported data type array

Expected Behavior

Array data types defined with pymilvus should be recognized, and writing into a collection with a predefined schema should also work. Is this a bug in the Spark-Milvus connector?

Steps/Code To Reproduce behavior

```python
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

# MILVUS_HOST, MILVUS_PORT, COLLECTION_NAME and the Spark DataFrame `df` are defined elsewhere.
connections.connect(host=MILVUS_HOST, port=MILVUS_PORT)

fields = [
    FieldSchema(name="id", is_primary=True, dtype=DataType.VARCHAR, max_length=100),
    # ARRAY field: the connector fails on this type
    FieldSchema(name="countries", dtype=DataType.ARRAY, element_type=DataType.VARCHAR, max_length=100, max_capacity=100),
    # FieldSchema(name="vector_field", dtype=DataType.FLOAT_VECTOR, dim=705),
    FieldSchema(name="vector_field", dtype=DataType.SPARSE_FLOAT_VECTOR),
]
schema = CollectionSchema(
    fields,
    description="collection",
    enable_dynamic_field=True,
)

# Create the collection with the predefined schema
collection = Collection(COLLECTION_NAME, schema)

# Write the DataFrame through the Spark-Milvus connector
df.write \
    .mode("append") \
    .option("milvus.host", MILVUS_HOST) \
    .option("milvus.port", MILVUS_PORT) \
    .option("milvus.collection.name", COLLECTION_NAME) \
    .option("milvus.collection.vectorField", "vector_field") \
    .option("milvus.collection.vectorDim", "705") \
    .option("milvus.collection.primaryKeyField", "id") \
    .option("milvus.database.name", "default") \
    .format("milvus") \
    .save()
```

Environment details

- Hardware/Software conditions (OS, CPU, GPU, Memory):
- Method of installation (Docker, or from source):
- Milvus version (v0.3.1, or v0.4.0): 2.4.1
- Milvus configuration (Settings you made in `server_config.yaml`):

Anything else?

No response

@CauchyLion

Have you solved this problem?

@wayblink
Collaborator

wayblink commented Jun 7, 2024

Some new data types are not supported yet. We hope someone can take this issue, and contributions are welcome. If no one takes it, we may have time to fix it in the coming months.

@xiaofan-luan

Array and sparse vector types need to be supported.

@wayblink
Collaborator

Support for advanced data types (including json, array, and sparse_vector) is in progress.
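Until that support lands, one way to avoid discovering the failure partway through a Spark job is to check the schema before calling `df.write`. This is a minimal sketch, not part of the connector: `unsupported_fields` and `UNSUPPORTED_BY_CONNECTOR` are hypothetical names, and the set of rejected types is an assumption taken from the types listed in this comment (json, array, sparse_vector). The helper only reads `.name` and `.dtype.name` from each field, which pymilvus `FieldSchema` objects expose; the `DType` and `Field` classes are hypothetical stand-ins so the sketch runs without pymilvus installed.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Iterable, List

# Assumption: type names the connector cannot write yet, per this thread.
UNSUPPORTED_BY_CONNECTOR = {"ARRAY", "SPARSE_FLOAT_VECTOR", "JSON"}


def unsupported_fields(fields: Iterable) -> List[str]:
    """Return 'name (TYPE)' for every field whose dtype is in the unsupported set.

    Each field only needs a `.name` and a `.dtype` that has a `.name`.
    """
    return [
        f"{f.name} ({f.dtype.name})"
        for f in fields
        if f.dtype.name in UNSUPPORTED_BY_CONNECTOR
    ]


# Hypothetical stand-ins for pymilvus DataType / FieldSchema (values are arbitrary).
class DType(Enum):
    VARCHAR = auto()
    ARRAY = auto()
    JSON = auto()
    SPARSE_FLOAT_VECTOR = auto()


@dataclass
class Field:
    name: str
    dtype: DType


# Mirrors the field types from the reproduction code in this issue.
demo = [
    Field("id", DType.VARCHAR),
    Field("countries", DType.ARRAY),
    Field("vector_field", DType.SPARSE_FLOAT_VECTOR),
]

bad = unsupported_fields(demo)
if bad:
    # Fail fast instead of letting the Spark write abort mid-job.
    print("Connector cannot write:", ", ".join(bad))
```

With pymilvus installed, the same check would be `unsupported_fields(fields)` on the `fields` list from the reproduction code, raising or skipping the write when the result is non-empty.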

@juanandreas
Author

Is there an update on this?
