-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(embedding): default embedding creation to base64 #1312
perf(embedding): default embedding creation to base64 #1312
Conversation
7702d54
to
270861b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
83fbd84
to
185dbe5
Compare
e34c241
to
fd14cdf
Compare
This is a great idea! Who doesn't want 1/4 the network bandwidth? |
fd14cdf
to
362f02f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delayed review, this looks good! Some minor comments and you left a test.only
change in.
Will merge once comments have been addressed.
@RobertCraigie no worries about the delay. Thank you for reviews. I addressed your suggestions. |
c50fa5f
to
84180db
Compare
Requesting base64 encoded embeddings returns smaller body sizes, on average ~60% smaller than float32 encoded. In other words, the size of the response body containing embeddings in float32 is ~2.3x bigger than base64 encoded embedding. We always request embedding creating encoded as base64, and then decoded them to float32 based on the user's provided encoding_format parameter. Closes openai#1310
Co-authored-by: Robert Craigie <robert@craigie.dev>
Co-authored-by: Robert Craigie <robert@craigie.dev>
Co-authored-by: Robert Craigie <robert@craigie.dev>
Co-authored-by: Robert Craigie <robert@craigie.dev>
Co-authored-by: Robert Craigie <robert@craigie.dev>
84180db
to
d2bc20b
Compare
@RobertCraigie PR is now ready for review. Thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Sorry again for the delay.
I pushed a commit removing some debug logs as our debug logging system is not particularly great right now so they'd be too verbose IMO. (logging will be fixed in the next major version)
export const toFloat32Array = (base64Str: string): Array<number> => { | ||
if (typeof Buffer !== 'undefined') { | ||
// for Node.js environment | ||
return Array.from(new Float32Array(Buffer.from(base64Str, 'base64').buffer)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious if you've benchmarked how much of a difference just returning the Float32Array
directly would have?
if it's a big difference we should probably have an opt-in flag to just do that. (doesn't block this PR)
Requesting base64 encoded embeddings returns smaller body sizes, on average ~60% smaller than float32 encoded. In other words, the size of the response body containing embeddings in float32 is ~2.3x bigger than base64 encoded embedding.
Closes #1310
Changes being requested
We always request embedding creating encoded as base64, and then decoded them to float32 based on the user's provided encoding_format parameter.
Additional context & links
After running a few benchmarks, requesting base64 encoded embeddings returns smaller body sizes, on average ~60% smaller than float32 encoded. In other words, the size of the response body containing embeddings in float32 is ~2.3x bigger than base64 encoded embedding.
This performance improvement could translate to:
This is the result of a request that creates embedding from a 10kb chunk, run 10 times (the number are the size of response body in kb):
Read more #1310