perf(embedding): default embedding creation to base64 #1312

Merged
merged 11 commits into from
Mar 28, 2025

manekinekko

Requesting base64-encoded embeddings returns smaller response bodies, on average ~60% smaller than float32-encoded ones. In other words, a response body containing float32 embeddings is ~2.3x bigger than one containing base64-encoded embeddings.

Closes #1310

  • I understand that this repository is auto-generated and my pull request may not be merged

Changes being requested

We always request embeddings encoded as base64, and then decode them to float32 based on the user-provided encoding_format parameter.
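A minimal sketch of that flow, following the description as written rather than the merged code. The names `buildRequestBody`, `finalizeItems`, and `base64ToFloats` are hypothetical and not part of the library; the sketch assumes a Node.js environment where `Buffer` is available.

```typescript
type EncodingFormat = 'float' | 'base64';

interface EmbeddingItem {
  index: number;
  embedding: number[] | string;
}

// Decode a base64 string of packed float32 values into plain numbers
// (platform byte order, which is little-endian on common hardware).
function base64ToFloats(base64Str: string): number[] {
  const buf = Buffer.from(base64Str, 'base64');
  // Copy exactly this Buffer's bytes; `buf.buffer` can be a larger shared pool.
  const bytes = buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);
  return Array.from(new Float32Array(bytes));
}

// Hypothetical helper: the body sent to the API always asks for base64.
function buildRequestBody(input: string) {
  return { input, encoding_format: 'base64' as const };
}

// Hypothetical helper: shape the response according to what the caller asked for.
function finalizeItems(items: EmbeddingItem[], userFormat?: EncodingFormat): EmbeddingItem[] {
  // The caller explicitly asked for base64: return the strings untouched.
  if (userFormat === 'base64') return items;
  // Otherwise decode, so the caller sees float arrays as before.
  return items.map((item) =>
    typeof item.embedding === 'string' ? { ...item, embedding: base64ToFloats(item.embedding) } : item,
  );
}
```

The point of splitting the two steps is that the wire format and the format the caller sees are decided independently: the first is fixed, the second follows `encoding_format`.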

Additional context & links

After running a few benchmarks, requesting base64-encoded embeddings returns smaller response bodies, on average ~60% smaller than float32-encoded ones. In other words, a response body containing float32 embeddings is ~2.3x bigger than one containing base64-encoded embeddings.

This performance improvement could translate to:

  • ✅ Faster HTTP responses
  • ✅ Less bandwidth used when generating multiple embeddings
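To see where the size difference comes from, here is a rough local sketch that serializes the same vector both ways. The vector is synthetic (`fakeEmbedding` is a hypothetical stand-in, and 1536 dimensions is an assumption), and the server's float formatting may differ from `JSON.stringify`, so the ratio it prints is illustrative only and is not a reproduction of the benchmark.

```typescript
// Deterministic pseudo-random floats standing in for a real embedding (assumption).
function fakeEmbedding(dim: number): Float32Array {
  const out = new Float32Array(dim);
  let seed = 42;
  for (let i = 0; i < dim; i++) {
    seed = (seed * 1664525 + 1013904223) >>> 0; // linear congruential step
    out[i] = (seed / 0xffffffff) * 2 - 1; // roughly in [-1, 1]
  }
  return out;
}

const vec = fakeEmbedding(1536);

// "float" style: every value is written out as decimal text.
const jsonFloats = JSON.stringify(Array.from(vec));
// "base64" style: 4 raw bytes per value, expanded by 4/3 by base64.
const jsonBase64 = JSON.stringify(Buffer.from(vec.buffer).toString('base64'));

const floatBytes = Buffer.byteLength(jsonFloats);
const base64Bytes = Buffer.byteLength(jsonBase64);

console.log(`float JSON:  ${floatBytes} bytes`);
console.log(`base64 JSON: ${base64Bytes} bytes`);
console.log(`ratio: ${(floatBytes / base64Bytes).toFixed(2)}x`);
```

Base64 costs a fixed 4 bytes of text per 3 bytes of data, while the decimal form costs as many characters as each number needs to print, plus a separator.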

This is the result of a request that creates an embedding from a 10 KB chunk, run 10 times (the numbers are the size of the response body in KB, even though the benchmark tool labels the columns in ms):

| Benchmark | Min (ms) | Max (ms) | Mean (ms) | Min (+) | Max (+) | Mean (+) |
| --- | --- | --- | --- | --- | --- | --- |
| float32 vs base64 | 41.742 | 19616.000 | 9848.819 | 40.094 (3.9%) | 8351.000 (57.4%) | 4206.126 (57.3%) |
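The percentages in the table can be checked from the raw figures. The small sketch below recomputes them, reading the first three columns as float32 and the `(+)` columns as base64; `reductionPercent` and `sizeRatio` are hypothetical helper names used only here.

```typescript
// [float32, base64] pairs copied from the benchmark row above.
const benchmark = {
  min: [41.742, 40.094],
  max: [19616.0, 8351.0],
  mean: [9848.819, 4206.126],
} as const;

// Relative saving of `after` compared with `before`, in percent.
function reductionPercent(before: number, after: number): number {
  return (1 - after / before) * 100;
}

// How many times bigger `before` is than `after`.
function sizeRatio(before: number, after: number): number {
  return before / after;
}

for (const [name, [float32, base64]] of Object.entries(benchmark)) {
  console.log(
    `${name}: ${reductionPercent(float32, base64).toFixed(1)}% smaller, ` +
      `${sizeRatio(float32, base64).toFixed(1)}x ratio`,
  );
}
```

The max and mean rows are where the ~2.3x figure quoted in the description comes from; the min row shows almost no difference.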

Read more in #1310.

@manekinekko manekinekko requested a review from a team as a code owner February 8, 2025 17:09
@manekinekko manekinekko force-pushed the perf/wassim-chegham-issue-1310 branch from 7702d54 to 270861b Compare February 8, 2025 17:09
Collaborator

@RobertCraigie RobertCraigie left a comment

Thanks!

@manekinekko manekinekko force-pushed the perf/wassim-chegham-issue-1310 branch from 83fbd84 to 185dbe5 Compare February 24, 2025 21:10
@manekinekko manekinekko force-pushed the perf/wassim-chegham-issue-1310 branch 2 times, most recently from e34c241 to fd14cdf Compare March 6, 2025 15:26
@IDisposable

This is a great idea! Who doesn't want 1/4 the network bandwidth?

@manekinekko manekinekko force-pushed the perf/wassim-chegham-issue-1310 branch from fd14cdf to 362f02f Compare March 12, 2025 09:31
Collaborator

@RobertCraigie RobertCraigie left a comment

Sorry for the delayed review, this looks good! Some minor comments and you left a test.only change in.

Will merge once comments have been addressed.

@manekinekko
Author

> Sorry for the delayed review, this looks good! Some minor comments and you left a test.only change in.
>
> Will merge once comments have been addressed.

@RobertCraigie no worries about the delay. Thank you for the reviews. I addressed your suggestions.

@manekinekko manekinekko force-pushed the perf/wassim-chegham-issue-1310 branch 2 times, most recently from c50fa5f to 84180db Compare March 25, 2025 09:02
manekinekko and others added 10 commits March 27, 2025 22:04
Requesting base64-encoded embeddings returns smaller response bodies, on average ~60% smaller than float32-encoded ones. In other words, a response body containing float32 embeddings is ~2.3x bigger than one containing base64-encoded embeddings.

We always request embeddings encoded as base64, and then decode them to float32 based on the user-provided encoding_format parameter.

Closes openai#1310
Co-authored-by: Robert Craigie <robert@craigie.dev>
@manekinekko manekinekko force-pushed the perf/wassim-chegham-issue-1310 branch from 84180db to d2bc20b Compare March 27, 2025 21:04
@manekinekko
Author

@RobertCraigie PR is now ready for review. Thank you.

Collaborator

@RobertCraigie RobertCraigie left a comment

Thanks! Sorry again for the delay.

I pushed a commit removing some debug logs as our debug logging system is not particularly great right now so they'd be too verbose IMO. (logging will be fixed in the next major version)

export const toFloat32Array = (base64Str: string): Array<number> => {
  if (typeof Buffer !== 'undefined') {
    // for Node.js environment
    return Array.from(new Float32Array(Buffer.from(base64Str, 'base64').buffer));
Collaborator

curious if you've benchmarked how much of a difference just returning the Float32Array directly would have?

if it's a big difference we should probably have an opt-in flag to just do that. (doesn't block this PR)
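For illustration, an opt-in path of the kind suggested here might look roughly like the sketch below. `toFloat32ArrayDirect`, `decodeEmbedding`, and the `asTypedArray` option are hypothetical names, not an existing library API, and no benchmark result is implied. The sketch also copies out the Buffer's own byte range, on the assumption that Node.js may serve small Buffers from a shared pool, in which case `buf.buffer` alone would cover more than the decoded bytes.

```typescript
// Decode straight to a Float32Array, skipping the Array.from conversion.
function toFloat32ArrayDirect(base64Str: string): Float32Array {
  const buf = Buffer.from(base64Str, 'base64');
  // Copy exactly this Buffer's bytes rather than wrapping `buf.buffer` as-is.
  const bytes = buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);
  // Throws a RangeError if the byte length is not a multiple of 4.
  return new Float32Array(bytes);
}

// Hypothetical opt-in flag: plain number[] by default, typed array on request.
function decodeEmbedding(
  base64Str: string,
  opts: { asTypedArray?: boolean } = {},
): Float32Array | number[] {
  const typed = toFloat32ArrayDirect(base64Str);
  return opts.asTypedArray ? typed : Array.from(typed);
}
```

Keeping `number[]` as the default preserves the existing return type, so the typed array would only reach callers who ask for it.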

@RobertCraigie RobertCraigie changed the base branch from master to next March 28, 2025 20:45
@RobertCraigie RobertCraigie changed the title from "perf(embedding): always request embedding creation as base64" to "perf(embedding): default embedding creation as base64" Mar 28, 2025
@RobertCraigie RobertCraigie changed the title from "perf(embedding): default embedding creation as base64" to "perf(embedding): default embedding creation to base64" Mar 28, 2025
@RobertCraigie RobertCraigie merged commit ce2157b into openai:next Mar 28, 2025
4 of 5 checks passed
@stainless-app stainless-app bot mentioned this pull request Mar 28, 2025
Development

Successfully merging this pull request may close these issues.

Perf: Improve vector embeddings creation by 60%
3 participants