[Bug]: Elasticsearch Vector Store lazy initialization and clear() issues #19218

Open
@strawgate

Bug Description

The ES Vector Store does lazy initialization of the backing index.

As the vector store is async capable, there can be multiple concurrent requests to the vector store.

Currently create_index_if_not_exists is used to create the index if it doesn't exist.

If two requests to add nodes arrive while the index is uninitialized (or has recently been cleared), there is a race condition: both requests branch into create_index_if_not_exists. Because that operation takes several steps, both requests run through all of them, and the slower one fails at the create index step:

        await self.client.indices.create(
            index=self.index, mappings=mappings, settings=settings
        )

The call fails because the index already exists by the time the slower request reaches it.
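
The interleaving can be reproduced without a cluster. The following is a minimal sketch, not the store's actual code: `FakeIndices` is a hypothetical in-memory stand-in for the client's index API, and the two-step `create_index_if_not_exists` here only assumes an "exists" check followed by a "create" call, each of which yields to the event loop the way a network round trip would.

```python
import asyncio


class FakeIndices:
    """Hypothetical in-memory stand-in for the index API (not the real client)."""

    def __init__(self) -> None:
        self.existing = set()

    async def exists(self, index: str) -> bool:
        await asyncio.sleep(0)  # yield to the event loop, like a network call
        return index in self.existing

    async def create(self, index: str) -> None:
        await asyncio.sleep(0)  # yield again before the index is registered
        if index in self.existing:
            raise RuntimeError(f"index {index!r} already exists")
        self.existing.add(index)


async def create_index_if_not_exists(indices: FakeIndices, index: str) -> None:
    # Check-then-act: another task can pass the same check before either
    # task has finished creating the index.
    if not await indices.exists(index):
        await indices.create(index)


async def race() -> list:
    indices = FakeIndices()
    # return_exceptions=True collects the failure instead of raising it.
    return await asyncio.gather(
        create_index_if_not_exists(indices, "docs"),
        create_index_if_not_exists(indices, "docs"),
        return_exceptions=True,
    )


results = asyncio.run(race())
print(results)
```

Both tasks see the index as missing, the first create succeeds, and the second one raises, which mirrors the failure described above.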

We should make the index setup either idempotent or synchronized (so that only one request performs it at a time), so that requests to add nodes to a new or recently cleared ES vector store do not fail and drop documents.
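
One possible shape for such a fix, sketched under assumptions: setup is serialized with an `asyncio.Lock`, and an "already exists" error from the create step is treated as success. `Store`, `FakeIndices`, `ensure_index`, and `IndexAlreadyExists` are hypothetical names used for illustration only. With the real client, the corresponding error type would have to be matched; the issue does not show it.

```python
import asyncio


class IndexAlreadyExists(Exception):
    """Hypothetical stand-in for the client's 'index already exists' error."""


class FakeIndices:
    """Hypothetical in-memory stand-in for the index API (not the real client)."""

    def __init__(self) -> None:
        self.existing = set()
        self.create_calls = 0

    async def exists(self, index: str) -> bool:
        await asyncio.sleep(0)  # yield to the event loop, like a network call
        return index in self.existing

    async def create(self, index: str) -> None:
        self.create_calls += 1
        await asyncio.sleep(0)
        if index in self.existing:
            raise IndexAlreadyExists(index)
        self.existing.add(index)


class Store:
    """Hypothetical store whose index setup is serialized and idempotent."""

    def __init__(self, indices: FakeIndices, index: str) -> None:
        self.indices = indices
        self.index = index
        self._init_lock = asyncio.Lock()

    async def ensure_index(self) -> None:
        # Serialized: only one task at a time runs the multi-step setup.
        async with self._init_lock:
            if await self.indices.exists(self.index):
                return
            try:
                await self.indices.create(self.index)
            except IndexAlreadyExists:
                # Idempotent: another process may have created the index
                # first, so this outcome counts as success.
                pass

    async def add(self, doc: str) -> str:
        await self.ensure_index()
        return doc  # a real store would index the document here


async def main():
    indices = FakeIndices()
    store = Store(indices, "docs")  # created inside the running event loop
    added = await asyncio.gather(*(store.add(f"doc-{i}") for i in range(5)))
    return added, indices.create_calls


added, create_calls = asyncio.run(main())
print(added, create_calls)
```

The lock only covers tasks that share one store instance in one process, which is why the sketch also tolerates the "already exists" error: that part covers other processes or store instances racing on the same index.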

Version

12.43

Steps to Reproduce

Create an ES vector store and run two parallel requests to add nodes.

Relevant Logs/Tracebacks

Labels: bug, triage
