Skip to content

Fix NPE for media-only Document in QdrantVectorStore (GH-3609) #3653

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

devMtn30
Copy link

Problem

Storing a Document that contains only media currently results in a NPE because
ValueFactory.value(null) is invoked.

Solution

  • Check document.isText().
  • If false, throw an IllegalArgumentException with a clear message.

Issue

Closes #3609

@ilayaperumalg ilayaperumalg self-assigned this Jun 24, 2025
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (!document.isText()) {
    throw new IllegalArgumentException("""
            QdrantVectorStore supports only text-based Document for now –
            received media-only Document; see issue #3609.
            """);
}

var payload = QdrantValueFactory.toValueMap(document.getMetadata());
payload.put(CONTENT_FIELD_NAME, io.qdrant.client.ValueFactory.value(document.getText()));
return payload;

How about this code?

It can prevent unnecessary execution of QdrantValueFactory.toValueMap.
And it's more intuitive.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (!document.isText()) {
    throw new IllegalArgumentException("""
            QdrantVectorStore supports only text-based Document for now –
            received media-only Document; see issue #3609.
            """);
}

var payload = QdrantValueFactory.toValueMap(document.getMetadata());
payload.put(CONTENT_FIELD_NAME, io.qdrant.client.ValueFactory.value(document.getText()));
return payload;

How about this code?

It can prevent unnecessary execution of QdrantValueFactory.toValueMap. And it's more intuitive.

Thanks for the suggestion-updated!

@devMtn30 devMtn30 force-pushed the gh-3609-media-doc-fail-fast branch from cd42117 to e7d8a51 Compare June 25, 2025 01:52
@@ -296,8 +296,16 @@ private Document toDocument(ScoredPoint point) {
*/
private Map<String, Value> toPayload(Document document) {
try {
String documentContent = document.getText();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if there’s a specific reason for declaring a separate variable named documentContent.
If you believe it’s necessary, feel free to keep it. However, in my opinion, it might not be needed.

Additionally, it might be better to move it either before or after the declaration of payload (line 307).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the extra variable, inlined document.getText(), and force pushed. Thanks for the spot!

…torStore

* Throw IllegalArgumentException when Document.isText() is false

Signed-off-by: san.kim <san.kim@G125047-00.local>
@devMtn30 devMtn30 force-pushed the gh-3609-media-doc-fail-fast branch from e7d8a51 to 7ebbd32 Compare June 25, 2025 02:16
@ilayaperumalg
Copy link
Member

@devMtn30 Thank you for the PR. I have submitted a PR which addresses this issue across the vector stores: #3687. Could you review the PR please?

@ilayaperumalg
Copy link
Member

Closing this PR in favour of #3687

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

QdrantVectorStore throws NPE when Document contains only media
4 participants