fix(weaviate): support v4.6.3 #1134

Merged
merged 12 commits into from
May 28, 2024

Conversation

tibor-reiss
Contributor

@tibor-reiss tibor-reiss commented May 24, 2024

Related to #1123

  • Bump weaviate from 3.26.2 to 4.6.3
  • Update instrumentation to be compatible with v4
  • Update the examples
  • Keep v3-specific code so that 3.26.2 still works

@nirga nirga changed the title chore(deps-dev): bump weaviate-client from 3.26.2 to 4.6.3 in /packages/opentelemetry-instrumentation-weaviate fix(weaviate): support v4.6.3 May 24, 2024
@nirga
Member

nirga commented May 24, 2024

Thanks @tibor-reiss, did you try it to make sure it works with the new version?

@tibor-reiss
Contributor Author

tibor-reiss commented May 24, 2024

Hi @nirga, yes, 9 passed, 1 skipped.
Took me a while to figure out that it only needed a version bump in __init__ - but this was a good learning experience :)

Update: are there some other tests which need to be done apart from the unit tests?

@nirga
Member

nirga commented May 24, 2024

@tibor-reiss can you run one of the sample apps that uses weaviate and make sure that you see traces, and add a screenshot here? I vaguely remember that there was a reason why @paolorechia who wrote this instrumentation didn't enable it for v4

@tibor-reiss
Contributor Author

tibor-reiss commented May 24, 2024

@nirga Took me some time to set up a fully functioning environment locally - consequently, I have updated the sample apps so they can also run against a local instance. I adjusted the v4 example so it resembles the v3 one.

Could you please check that it works with openai as well? The free tier was removed, so I tested with cohere.

It seems weaviate_v3.py still runs with weaviate-v4, but it will be deprecated at some point.

Screenshot attached from traceloop.
[Screenshot: Traceloop]

@nirga
Member

nirga commented May 24, 2024

@tibor-reiss actually no :/ the GET / DELETE spans you're seeing are instrumented by otel's URLLIB3 that we're installing :/ Looks like the weaviate spans aren't there. You can easily see it by disabling all instrumentations except for weaviate, like this:

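from traceloop.sdk import Traceloop
from traceloop.sdk.instruments import Instruments

# Enable only the weaviate instrumentation so spans from other instrumentations disappear
Traceloop.init(instruments={Instruments.WEAVIATE})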

From that, I'm guessing that if we update the test files to use the v4 syntax they will also start failing

@tibor-reiss
Contributor Author

tibor-reiss commented May 26, 2024

Thanks for your patience @nirga! Below is the updated screenshot. The weaviate api changed quite heavily. I combined their official documentation with the already present instrumentation, and as a result there are now references to "private" classes/methods. It works, but these internals will probably change in the future. Let me know what you think.
Additionally, there might be other interesting methods to instrument - let me know.

I bumped the version just for this instrumentation to 0.20.0 since it's quite a big change.

Could you please check that the examples (weaviate_v4.py and weaviate_v3.py) work with the openai "backend" as well?

[Image: updated screenshot]

@nirga
Member

nirga commented May 26, 2024

Thanks @tibor-reiss! Don't update version numbers, they get bumped automatically when we release 😃

So now it works both with old and new versions of weaviate? Or did we drop support for old versions?

@tibor-reiss
Contributor Author

tibor-reiss commented May 26, 2024

@nirga Oh, sorry, I did not think that you would like to keep support for v3 - installing 0.19.0 with weaviate==3.26.2 would have still worked.

Anyway, I have put it back - with minor changes:

  • _GraphQLInstrumentor is the only overlap
  • marked the v3 so hopefully it can be easily deleted in the future
  • moved the previous tests to *_v3.py

So now both versions work - tested both the weaviate_v3.py and weaviate_v4.py.
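A minimal sketch of how keeping both code paths could look, assuming the instrumentation picks its list of wrapped methods from the installed client's major version. The table contents and the helper names (`WRAPPED_METHODS_V3`, `WRAPPED_METHODS_V4`, `select_wrapped_methods`) are illustrative assumptions, not the code from this PR:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional

# Illustrative tables only; the real lists live in the instrumentation package.
WRAPPED_METHODS_V3: List[str] = ["schema.create_class", "data_object.create"]
WRAPPED_METHODS_V4: List[str] = ["query.fetch_object_by_id", "query.fetch_objects"]


def major_version(version_string: str) -> int:
    """Return the leading integer of a dotted version such as '4.6.3'."""
    return int(version_string.split(".")[0])


def select_wrapped_methods(version_string: str) -> List[str]:
    """Pick the v3 table for clients up to 3.x, the v4 table otherwise."""
    if major_version(version_string) <= 3:
        return WRAPPED_METHODS_V3
    return WRAPPED_METHODS_V4


def installed_client_version(package: str = "weaviate-client") -> Optional[str]:
    """Look up the installed client version, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None
```

Keeping the v3 table separate from the v4 one is what makes it easy to delete later, which matches the intent of marking the v3 code.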

Could you please add the cassettes for test_weaviate_instrumentation? I have generated them, but they point to localhost since I am running weaviate via docker.

@nirga
Member

nirga commented May 26, 2024

@tibor-reiss recording the cassettes doesn't work for you?

@tibor-reiss
Contributor Author

Good morning @nirga,

I can generate them - it just has localhost instead of traceloop :)

However, I have noticed that one test fails in test_weaviate_instrumentation.py with vcr enabled. In the old tests (now renamed to test_weaviate_instrumentation_v3.py), there was test_weaviate_create_batch - marked with "Flaky test" - maybe this was due to a similar reason?

Example from test_weaviate_instrumentation.py: test_weaviate_query_fetch_object_by_id

  • Store the data from query_fetch_object_by_id() in a variable: data = query_fetch_object_by_id(client, uuid_value)
  • Add assert data.properties.get("author") == "Robert" -> this fails

It seems to me that vcr does not capture all details, e.g. total_count (from test_weaviate_query_fetch_objects) is also not present in the response. So the test_weaviate_query_fetch_objects test only passes because the database is still running. Given that the db is still required to pass the tests, i.e. there are inserts/deletes happening, does vcr actually speed them up? Additionally, now that weaviate-v4 has a local option (e.g. a docker container, which is how I did the testing), is vcr needed? Please let me know your thoughts.

@nirga
Member

nirga commented May 27, 2024

Hey @tibor-reiss! All VCR does is record and replay HTTP requests, so as long as the database is sending back responses this should work. I can try to look into this as well today.

During CI, I think spinning up a local weaviate might be more cumbersome than recording and replaying HTTP responses, but if you think it's easier and stable enough - let's do it. Can you assist in fixing the CI yaml though? (it's under the .github folder)
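To illustrate the record-and-replay idea, here is a toy sketch, not vcrpy's actual implementation. `ToyCassette`, `fake_backend` and the URL are hypothetical names made up for the example. The first request goes to the backend and is stored; the same request is later answered from the store without touching the backend:

```python
from typing import Callable, Dict, List, Tuple

Request = Tuple[str, str]  # (HTTP method, URL)


class ToyCassette:
    """Stores one response per (method, URL) pair and replays it afterwards."""

    def __init__(self) -> None:
        self.recorded: Dict[Request, str] = {}

    def request(self, method: str, url: str, send: Callable[[str, str], str]) -> str:
        key = (method, url)
        if key not in self.recorded:
            # Record mode: nothing stored yet, so hit the real backend once.
            self.recorded[key] = send(method, url)
        # Replay mode: serve the stored response.
        return self.recorded[key]


calls: List[Request] = []


def fake_backend(method: str, url: str) -> str:
    """Stand-in for a real HTTP call; counts how often it is reached."""
    calls.append((method, url))
    return '{"author": "Robert"}'


cassette = ToyCassette()
url = "http://localhost:8080/v1/objects/1"  # hypothetical endpoint
first = cassette.request("GET", url, fake_backend)
second = cassette.request("GET", url, fake_backend)  # replayed, backend not called
```

Only traffic that passes through the intercepted HTTP layer ends up in such a cassette; anything sent over another transport is neither recorded nor replayed.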

@tibor-reiss
Contributor Author

tibor-reiss commented May 27, 2024

The reason for some test failures is that grpc is built into weaviate-v4, but this is not supported in vcrpy. As a solution for these tests, I have removed the vcr markers, and added a command line flag (with_grpc), i.e. the tests which use grpc (e.g. fetching) need a running weaviate instance.
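One way such a flag could be wired up in a conftest.py is sketched below, using pytest's standard `pytest_addoption` and `pytest_collection_modifyitems` hooks. The `requires_grpc` marker name and the `should_skip` helper are assumptions for illustration; the PR may implement the with_grpc flag differently:

```python
import pytest


def pytest_addoption(parser):
    # Opt-in flag: gRPC-backed tests only run when a Weaviate instance is available.
    parser.addoption(
        "--with_grpc",
        action="store_true",
        default=False,
        help="run tests that need a running Weaviate instance (gRPC)",
    )


def pytest_configure(config):
    # Register the (hypothetical) marker so pytest does not warn about it.
    config.addinivalue_line(
        "markers", "requires_grpc: test needs a running Weaviate instance"
    )


def should_skip(requires_grpc: bool, with_grpc: bool) -> bool:
    """A test is skipped only if it needs gRPC and the flag was not passed."""
    return requires_grpc and not with_grpc


def pytest_collection_modifyitems(config, items):
    with_grpc = config.getoption("--with_grpc")
    skip_marker = pytest.mark.skip(reason="needs --with_grpc and a running Weaviate")
    for item in items:
        if should_skip("requires_grpc" in item.keywords, with_grpc):
            item.add_marker(skip_marker)
```

With this in place, a plain `pytest` run would skip tests decorated with `@pytest.mark.requires_grpc`, and `pytest --with_grpc` would run them against the live instance.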

Member

@nirga nirga left a comment

Thanks @tibor-reiss! This looks overall good, 2 small comments:

  1. Can you fix the lint issues?
  2. Can you remove the cassettes that are no longer used?

@@ -1,40 +1,37 @@
# Tested with weaviate-client==3.26.0
# Weaviate instrumentation with opentelemetry-instrumentation-weaviate==0.19.0
Member

this can be removed, right? cause we support both

Contributor Author

@tibor-reiss tibor-reiss May 27, 2024

Are there any cassettes left which are not used? I removed both "fetch" cassettes, the rest are needed afais.

@nirga nirga merged commit 9103977 into traceloop:main May 28, 2024
8 checks passed
@tibor-reiss tibor-reiss deleted the bump-weaviate branch July 5, 2024 04:17