Chroma API change for 0.4.0 version #488

Open: wants to merge 2 commits into base: dev


jeffchuber

**This should land Monday the 17th.**

Chroma is upgrading from 0.3.29 to 0.4.0. Version 0.4.0 is easier to build, more durable, faster, smaller, and more extensible. The upgrade comes with a few changes:

  1. A simplified and improved client setup. Instead of remembering obscure settings, users can simply instantiate EphemeralClient, PersistentClient, or HttpClient (the underlying Client implementation is still directly accessible).

  2. We migrated the data stores away from DuckDB and ClickHouse. This changes the API for persistent clients, which used to be configured with chroma_db_impl="duckdb+parquet". Now you simply set is_persistent=True, and PersistentClient sets is_persistent=True for you.

  3. Because we migrated away from DuckDB and ClickHouse, users also need to migrate their existing data into the new layout and schema. Chroma is committed to providing extensive notification and tooling around any schema and data migrations (this PR, for example!).
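To make change 2 concrete, here is a minimal sketch of translating pre-0.4.0 Settings keyword arguments into the 0.4.0 form. `to_v4_settings` is a hypothetical helper, not part of chromadb. It treats the old in-memory default `chroma_db_impl="duckdb"` as a non-persistent client and refuses any other backend (such as `"clickhouse"`), because the sketch does not attempt to map it.

```python
from typing import Any, Dict


def to_v4_settings(legacy: Dict[str, Any]) -> Dict[str, Any]:
    """Translate pre-0.4.0 Chroma Settings kwargs into 0.4.0-style kwargs.

    Hypothetical helper: only the ``chroma_db_impl`` key is rewritten; every
    other key (e.g. ``persist_directory``) is passed through unchanged.
    """
    new = dict(legacy)  # copy so the caller's dict is not mutated
    impl = new.pop("chroma_db_impl", None)
    if impl == "duckdb+parquet":
        # The old on-disk backend maps to the new boolean persistence flag.
        new["is_persistent"] = True
    elif impl not in (None, "duckdb"):
        # Anything else (e.g. "clickhouse") is not handled by this sketch.
        raise ValueError(f"no mapping for chroma_db_impl={impl!r} in this sketch")
    # impl is None or "duckdb": the old in-memory default needs no new key.
    return new
```

With chromadb >= 0.4.0 installed, `chromadb.PersistentClient(path="./db")` reaches the same persistent configuration without building Settings by hand.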

After upgrading to 0.4.0, if users try to access data stored in the previous format, the system raises an exception that explains how to migrate it with the migration assistant. The migration assistant is a pip-installable CLI: install it with pip install chroma_migrate and run it with chroma_migrate.
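A deployment script might want to know ahead of time whether a persist directory still needs the migration assistant. A minimal sketch, assuming the on-disk file names below: they reflect what chromadb 0.3.x's duckdb+parquet mode and 0.4.x's SQLite store are understood to write, and `needs_migration` is a hypothetical helper, not part of chroma_migrate.

```python
from pathlib import Path

# Assumed file names: 0.3.x "duckdb+parquet" persistence wrote these parquet
# files, while 0.4.x keeps everything in a single SQLite database file.
LEGACY_FILES = ("chroma-collections.parquet", "chroma-embeddings.parquet")
V4_FILE = "chroma.sqlite3"


def needs_migration(persist_directory: str) -> bool:
    """Return True if the directory holds old-layout data and no 0.4.x store."""
    root = Path(persist_directory)
    has_legacy = any((root / name).exists() for name in LEGACY_FILES)
    has_v4 = (root / V4_FILE).exists()
    return has_legacy and not has_v4
```

This is only a heuristic on file presence; the exception raised by 0.4.0 itself remains the authoritative signal.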

See the README at chroma-core/chroma-migrate for a full write-up of our philosophy on migrations and more details about this particular migration.

Please direct any users who have trouble upgrading to the #get-help channel on our Discord. We have also created an email list to notify developers directly about future breaking changes.

TODO

  • Migrated any duckdb+parquet strings to the new format
  • Notified users about the breaking change (this PR, other suggestions?)

@sre-ci-robot
Collaborator

Welcome @jeffchuber! It looks like this is your first PR to zilliztech/GPTCache 🎉

mergify bot added the needs-dco label Jul 17, 2023
@jeffchuber jeffchuber marked this pull request as ready for review July 18, 2023 00:55
@SimFG
Collaborator

SimFG commented Jul 18, 2023

Please make the dev branch the target branch.

@jeffchuber jeffchuber changed the base branch from main to dev July 18, 2023 02:30
@jeffchuber
Author

@SimFG done!

@SimFG
Collaborator

SimFG commented Jul 18, 2023

@jeffchuber If I use Chroma 0.3.29 and run the latest code, there will be an error, right?

@jeffchuber
Author

@SimFG That is correct: this API change only supports 0.4.0 and above.

SimFG changed the title from "API change for 0.4.0" to "Chroma API change for 0.4.0 version" Jul 18, 2023
@SimFG
Collaborator

SimFG commented Jul 18, 2023

@jeffchuber Please take a look at the failed unit test.

@jeffchuber
Author

@SimFG It looks like SQLite needs to be updated (see chroma-core/chroma#836).

Are you all open to making this change?
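Whether a given environment hits this failure depends on the SQLite library linked into Python, not on the Python package itself. A minimal sketch of the check, assuming the minimum is 3.35.0 (the threshold named in chromadb 0.4.x's "unsupported version of sqlite3" error); `sqlite_is_new_enough` is a hypothetical helper.

```python
import sqlite3
from typing import Tuple

# Assumed minimum SQLite version required by chromadb 0.4.x at import time.
MIN_SQLITE = (3, 35, 0)


def sqlite_is_new_enough(
    version: Tuple[int, ...] = sqlite3.sqlite_version_info,
) -> bool:
    """Compare a SQLite version tuple (default: the linked library) to MIN_SQLITE."""
    return tuple(version) >= MIN_SQLITE


if __name__ == "__main__":
    # Report the SQLite library this interpreter was built against.
    print(f"sqlite3 {sqlite3.sqlite_version}: new enough = {sqlite_is_new_enough()}")
```

Running this inside the CI image is a quick way to tell whether the failure comes from the base OS's SQLite build.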

@SimFG
Collaborator

SimFG commented Jul 19, 2023

@jeffchuber I have an idea.
Could we let users choose through a parameter? That is, keep the previous code as the default, and let users who want Chroma 0.4.0 opt in by passing an additional parameter.

import chromadb


# Proposed constructor for GPTCache's Chroma vector store
# (method excerpt; the enclosing class is omitted).
def __init__(
    self,
    client_settings=None,
    persist_directory=None,
    collection_name: str = "gptcache",
    top_k: int = 1,
    use_new_version: bool = False,
):
    self.top_k = top_k
    if client_settings:
        # Caller-supplied settings take precedence.
        self._client_settings = client_settings
    else:
        # Default settings give an in-memory client.
        self._client_settings = chromadb.config.Settings()
        if persist_directory is not None:
            if use_new_version:
                # chromadb >= 0.4.0: persistence is a boolean flag.
                self._client_settings = chromadb.config.Settings(
                    is_persistent=True, persist_directory=persist_directory
                )
            else:
                # chromadb < 0.4.0: persist through the duckdb+parquet backend.
                self._client_settings = chromadb.config.Settings(
                    chroma_db_impl="duckdb+parquet", persist_directory=persist_directory
                )
    self._client = chromadb.Client(self._client_settings)
    self._persist_directory = persist_directory

This minimizes the impact on existing users; users who want the improvements in 0.4.0 can opt in by passing the parameter.

@jeffchuber
Author

@SimFG Could we do something like what this user proposed (and got merged) for LangChain in langchain-ai/langchain#7891?
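A version-based alternative to the use_new_version flag would pick the Settings from the installed chromadb version, so users pass nothing extra. A minimal sketch, assuming the version string comes from something like `chromadb.__version__`; `chroma_settings_kwargs` is a hypothetical helper and does not necessarily match what langchain-ai/langchain#7891 does.

```python
from typing import Any, Dict, Optional


def chroma_settings_kwargs(
    version: str, persist_directory: Optional[str]
) -> Dict[str, Any]:
    """Choose chromadb Settings kwargs for a given chromadb version string."""
    if persist_directory is None:
        # In-memory client: the default Settings() works on both sides of 0.4.0.
        return {}
    # Compare only major.minor, e.g. "0.3.26" -> (0, 3), "0.4.0" -> (0, 4).
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) < (0, 4):
        # chromadb < 0.4.0: on-disk storage via the duckdb+parquet backend.
        return {"chroma_db_impl": "duckdb+parquet", "persist_directory": persist_directory}
    # chromadb >= 0.4.0: persistence is a boolean flag.
    return {"is_persistent": True, "persist_directory": persist_directory}
```

In the constructor this would replace the flag branch with something like `chromadb.config.Settings(**chroma_settings_kwargs(chromadb.__version__, persist_directory))`.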

@SimFG
Collaborator

SimFG commented Jul 19, 2023

@jeffchuber Yes, go ahead and try it!

@sre-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jeffchuber
To complete the pull request process, please assign cxie after the PR has been reviewed.
You can assign the PR to them by writing /assign @cxie in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jeffchuber
Author

@SimFG I added backwards compatibility. Can you retrigger the tests?

@SimFG
Collaborator

SimFG commented Jul 21, 2023

@jeffchuber
Now the error is that the SQLite version is too low. According to the suggested solution, below Python 3.10 you need to manually install a newer SQLite and swap it in. I think this is very unfriendly to users.
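The "install a newer SQLite and replace it" workaround from Chroma's troubleshooting notes amounts to installing the pysqlite3-binary package and aliasing it over the standard-library module before chromadb is imported. A minimal sketch, wrapped in a hypothetical `swap_in_pysqlite3` helper, shows why it is awkward for a library like GPTCache: it rewrites `sys.modules` for the whole process and must run before chromadb is first imported.

```python
import importlib
import sys


def swap_in_pysqlite3() -> bool:
    """Alias pysqlite3 over the stdlib sqlite3 module, if pysqlite3 is installed.

    Must run before chromadb is imported. Returns True if the swap happened.
    """
    try:
        importlib.import_module("pysqlite3")
    except ImportError:
        # pysqlite3 (e.g. from the pysqlite3-binary wheel) is not installed.
        return False
    # Later "import sqlite3" statements now resolve to pysqlite3.
    sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")
    return True
```

Modules that already imported the standard sqlite3 keep their reference, which is why the swap has to happen first thing in the process.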

@jeffchuber
Author

As far as I can tell, this comes down to the base OS rather than Python itself. Our Docker images for running tests are based on python:3.10-slim-bookworm; does GPTCache use python:3.8-slim-bullseye, ubuntu-20.04, or something else?

@SimFG
Collaborator

SimFG commented Jul 24, 2023

@jeffchuber
You can solve this problem by merging the latest dev branch. When a user uses chromadb, the older version 0.3.26 is installed by default, because I need to keep GPTCache working out of the box. Users who want the new features of a newer chromadb should also be prepared to handle this incompatibility.
