Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize pgvector test for semi-recent enhancements #319

Merged
merged 1 commit into from
May 15, 2024

Conversation

jkatz
Copy link
Contributor

@jkatz jkatz commented May 8, 2024

This commit adds several changes to the pgvector test to create a more representative test environment based on recent and older changes to pgvector. Notable changes include allowing for testing of parallel index buiding parameters, using loading with the recommended binary loading method, and other changes to better emulate what a typical user of pgvector would do.

This commit also has some general cleanups as well.

Co-authored-by: Mark Greenhalgh greenhal@users.noreply.github.com
Co-authored-by: Tyler House tahouse@users.noreply.github.com

@XuanYang-cn
Copy link
Collaborator

/assign @alwayslove2013
/assign

@jkatz
Copy link
Contributor Author

jkatz commented May 10, 2024

@XuanYang-cn @alwayslove2013 Please let us know if this PR requires additional work. There are some other changes we'd like to include for testing other configurations of pgvector, but we'd like to baseline it against the flat implementation first. Thanks!

from abc import abstractmethod
from typing import Any, Mapping, Optional, Sequence, TypedDict

from psycopg import sql
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend avoiding adding specific dependencies in the config.py. Users only install the corresponding toolkit when they are conducting tests.

However, in the scenario where the default results page is opened solely for result display, VDBBench will load the standardized result (json). Serializing this data requires the config.py file from all clients.

Currently, if a user hasn't installed psycopg, they won't be able to open the results page.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fixed in ea29f47

@jkatz
Copy link
Contributor Author

jkatz commented May 14, 2024

@alwayslove2013 Thanks for the feedback! This is resolved in the latest push.

Overall, I would suggest moving to psycopg3 (psycopg) as it's now the maintained version of psycopg; however, that change could be made in a separate pull request.

@jkatz jkatz requested a review from alwayslove2013 May 14, 2024 03:04
@alwayslove2013
Copy link
Collaborator

@jkatz Thank you so much for your contribution! We greatly appreciate it and are thrilled to receive your pull request. We look forward to collaborating with you and driving the project forward together!

Overall, I would suggest moving to psycopg3 (psycopg) as it's now the maintained version of psycopg; however, that change could be made in a separate pull request.

This commit adds several changes to the pgvector test to create a
more representative test environment based on recent and older
changes to pgvector. Notable changes include allowing for testing
of parallel index buiding parameters, using loading with the recommended
binary loading method, and other changes to better emulate what a
typical user of pgvector would do.

This commit also has some general cleanups as well.

Co-authored-by: Mark Greenhalgh <greenhal@users.noreply.github.com>
Co-authored-by: Tyler House <tahouse@users.noreply.github.com>
@jkatz
Copy link
Contributor Author

jkatz commented May 14, 2024

@alwayslove2013 Likewise. I personally appreciate the approach VectorDBBench takes around testing concurrency, which resembles how users interact with databases.

I've pushed up the fix to the latest patch to handle the merge conflict that remained (which I'm still baffled how that got in, but I'll triple check next time).

@jkatz jkatz requested a review from alwayslove2013 May 14, 2024 14:56
Copy link
Collaborator

@alwayslove2013 alwayslove2013 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@sre-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: alwayslove2013, jkatz
To complete the pull request process, please assign xuanyang-cn after the PR has been reviewed.
You can assign the PR to them by writing /assign @xuanyang-cn in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@alwayslove2013 alwayslove2013 merged commit c4fc7c1 into zilliztech:main May 15, 2024
4 checks passed
@alwayslove2013
Copy link
Collaborator

alwayslove2013 commented May 15, 2024

@jkatz I would like to express my sincere gratitude for your support. Our primary goal has been to ensure that the test data reflects the performance characteristics of the real-world usage scenarios as accurately as possible.

... testing concurrency, which resembles how users interact with databases.

If you have any suggestions or innovative ideas with VDBBench, we would be more than happy to discuss them with you. Your valuable input is crucial for us to enhance the functionality and user experience of the tool.

@jkatz jkatz deleted the pgvector-updates branch May 15, 2024 03:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants