
Fix: S3 GW list multipart uploads ordering#9847

Open
N-o-Z wants to merge 9 commits into master from fix/s3gw-list-parts-ordering-9554

Conversation

@N-o-Z
Member

@N-o-Z N-o-Z commented Dec 19, 2025

Closes #9554

This PR should not be merged before we decide on migration strategy (see below)

Change Description

Bug Fix

  • Added a new List method to the multipart tracker
  • Refactored multipart upload keys - multiparts are now part of the repo partition, and the key is a combination of path + uploadID
  • Multipart upload listing now comes from the KV store and not from the object store
  • Implemented a new upload ID iterator
  • Fixed additional bugs that were found along the way

Testing Details

Added new unit and integration tests

Breaking Change?

Yes

Migration Strategy:

This PR introduces a breaking change due to the change in the upload ID key in our database.
This means that users who upgrade to this version while they have ongoing MPUs will effectively lose them.
This is made worse by the fact that our current implementation does not save upload IDs in the context of a repository, and the UploadID data does not record the repository information - therefore we cannot take the route of a standard migration.

Below is a proposal on how to deal with existing MPUs:

Create a migration flow that aborts any ongoing MPUs and cleans up any remaining keys in the old partition.

  • Users that try to upgrade without performing the migration will fail to start the server. They will either need to complete / abort any outstanding MPUs or run the migration, which will de facto abort the outstanding MPUs
  • This flow will be valid only for version latest++
  • For any version > latest++, the upgrade will be blocked if there are outstanding MPUs
  • This behavior should be well documented, as well as highlighted in the release notes / changelog

@N-o-Z N-o-Z self-assigned this Dec 19, 2025
@N-o-Z N-o-Z added bug Something isn't working include-changelog PR description should be included in next release changelog labels Dec 19, 2025
@github-actions github-actions bot added area/gateway Changes to the gateway area/testing Improvements or additions to tests labels Dec 19, 2025
@N-o-Z N-o-Z force-pushed the fix/s3gw-list-parts-ordering-9554 branch from 9d9c513 to 036db8a on December 19, 2025 22:19
@N-o-Z N-o-Z force-pushed the fix/s3gw-list-parts-ordering-9554 branch from 036db8a to 4d60edf on December 19, 2025 22:38
@N-o-Z N-o-Z requested review from a team, arielshaqed, itaiad200 and nopcoder December 21, 2025 00:32
@arielshaqed
Contributor

users who upgrade to this version and have ongoing MPUs will basically lose them

It's probably a bit worse: users who start an MPU during the rollout may also lose that MPU. I think we may need some product direction here. There is another option - to release a version supporting both modes, upgrade to that version, then after a week release a version dropping the old mode. That one is of course expensive, hence we should ask product.

@N-o-Z
Member Author

N-o-Z commented Dec 21, 2025

users who upgrade to this version and have ongoing MPUs will basically lose them

It's probably a bit worse: users who start an MPU during the rollout may also lose that MPU. I think we may need some product direction here. There is another option - to release a version supporting both modes, upgrade to that version, then after a week release a version dropping the old mode. That one is of course expensive, hence we should ask product.

I don't think that's necessary, since once we declare the migration path we can require users not to perform any MPUs during the upgrade. The choice whether to complete outstanding MPUs before the upgrade or let the migration process abort them - all of that responsibility falls to the user

Comment on lines +319 to +325
t.Cleanup(func() {
_, _ = s3Client.AbortMultipartUpload(ctx, &s3.AbortMultipartUploadInput{
Bucket: aws.String(repo),
Key: aws.String(key),
UploadId: resp.UploadId,
})
})
Contributor


Worried that this will do more harm than good.
Is there a way to list all the uploads and delete/abort the ones that are relevant for the test before the test starts? It would let us distinguish a real failure during the test from the cleanup/setup that we perform before the test.

Comment on lines 459 to 462
// IsTruncated should be nil or false when not truncated
if outputExact.IsTruncated != nil {
require.False(t, *outputExact.IsTruncated, "should not be truncated when request is completely fulfilled")
}
Contributor


Relevant to multiple places in the test code.
In this case we require nil or a pointer to false - we can require using apiutil.Value, or require.True(outputExact.IsTruncated == nil || !*outputExact.IsTruncated).

Not a blocker.

Comment on lines 10 to 25
// UploadIterator is an iterator over multipart uploads sorted by Path, then UploadID
type UploadIterator interface {
// Next advances the iterator to the next upload
// Returns true if there is a next upload, false otherwise
Next() bool
// Value returns the current upload
// Should only be called after Next returns true
Value() *Upload
// Err returns any error encountered during iteration
Err() error
// Close releases resources associated with the iterator
Close()
// SeekGE seeks to the first upload with key >= uploadIDKey(path, uploadID)
// After calling SeekGE, Next() must be called to access the first element at or after the seek position
SeekGE(key, uploadID string)
}
Contributor


The interface is part of the tracker interface and code here should define the UploadIterator implementation and return a pointer to the actual struct.

Member Author


Not sure I understand the comment - please clarify

Contributor


Moving UploadIterator interface to pkg/gateway/multipart/tracker.go where it is used.
The newUploadIterator func should return the implementation type - *kvUploadIterator.

Comment on lines 37 to 40
if errors.Is(err, kv.ErrNotFound) {
_ = o.EncodeError(w, req, err, gatewayerrors.Codes.ToAPIErr(gatewayerrors.ErrNoSuchBucket))
}
_ = o.EncodeError(w, req, err, gatewayerrors.Codes.ToAPIErr(gatewayerrors.ErrInternalError))
Contributor


missing return or else

Comment on lines 82 to 98
func (it *kvUploadIterator) Err() error {
if it.err != nil {
return it.err
}
if !it.closed {
return it.kvIter.Err()
}
return nil
}

func (it *kvUploadIterator) Close() {
if it.closed {
return
}
it.kvIter.Close()
it.closed = true
}
Contributor


From this code it seems that we don't need to manage the 'closed' state - we just delegate the state to the underlying iterator. The kvIter itself can serve as the closed indicator if needed.

Comment on lines 438 to 441
if errors.Is(err, kv.ErrNotFound) {
_ = o.EncodeError(w, req, err, gatewayerrors.Codes.ToAPIErr(gatewayerrors.ErrNoSuchBucket))
}
_ = o.EncodeError(w, req, err, gatewayerrors.Codes.ToAPIErr(gatewayerrors.ErrInternalError))
Contributor


return or else

Comment on lines 482 to 492
// Check if there are more uploads (for IsTruncated flag)
// If we exited the loop due to length limit, iter.Next() hasn't been called for the next item yet
isTruncated := iter.Next()

// Set pagination markers for next page
var nextKeyMarker, nextUploadIDMarker string
if isTruncated && len(uploads) > 0 {
last := uploads[len(uploads)-1]
nextKeyMarker = last.Key
nextUploadIDMarker = last.UploadID
}
Contributor


To prevent an endless loop: in case isTruncated is true but len(uploads) is zero, we should set isTruncated to false

Member Author


It shouldn't happen but added anyway

Comment on lines 79 to 82
if errors.Is(err, kv.ErrNotFound) {
_ = o.EncodeError(w, req, err, gatewayerrors.Codes.ToAPIErr(gatewayerrors.ErrNoSuchBucket))
}
_ = o.EncodeError(w, req, err, gatewayerrors.Codes.ToAPIErr(gatewayerrors.ErrInternalError))
Contributor


return or else

Comment on lines 118 to 121
if errors.Is(err, kv.ErrNotFound) {
_ = o.EncodeError(w, req, err, gatewayerrors.Codes.ToAPIErr(gatewayerrors.ErrNoSuchBucket))
}
_ = o.EncodeError(w, req, err, gatewayerrors.Codes.ToAPIErr(gatewayerrors.ErrInternalError))
Contributor


return or else

Comment on lines 166 to 169
if errors.Is(err, kv.ErrNotFound) {
_ = o.EncodeError(w, req, err, gatewayerrors.Codes.ToAPIErr(gatewayerrors.ErrNoSuchBucket))
}
_ = o.EncodeError(w, req, err, gatewayerrors.Codes.ToAPIErr(gatewayerrors.ErrInternalError))
Contributor


return or else

…-ordering-9554

# Conflicts:
#	esti/commit_test.go
#	esti/s3_gateway_test.go
@N-o-Z N-o-Z marked this pull request as ready for review December 30, 2025 00:34
@N-o-Z N-o-Z requested a review from nopcoder December 30, 2025 00:38
@N-o-Z
Member Author

N-o-Z commented Dec 30, 2025

@nopcoder Thanks for the thorough review - I hope I didn't miss anything

@arielshaqed
Contributor

users who upgrade to this version and have ongoing MPUs will basically lose them

It's probably a bit worse: users who start an MPU during the rollout may also lose that MPU. I think we may need some product direction here. There is another option - to release a version supporting both modes, upgrade to that version, then after a week release a version dropping the old mode. That one is of course expensive, hence we should ask product.

I don't think that's necessary since once we declare the migration path we can require users to not perform any MPUs during the upgrade. The choice whether to complete outstanding MPUs before upgrade or let the migration process abort them - all of that responsibility will be rolled down to the user

This is even more product than before. Bear in mind that "the user" is not a single person. For instance, think about how to roll this out on lakeFS Cloud. We would need to announce that at a certain time slot all MPUs are disallowed [1]. And then we must upgrade the entire cluster within that time slot. This will probably require CS involvement.

Footnotes

  1. For instance, some Spark users use S3A, and will end up doing MPUs.

@N-o-Z
Member Author

N-o-Z commented Dec 30, 2025

users who upgrade to this version and have ongoing MPUs will basically lose them

It's probably a bit worse: users who start an MPU during the rollout may also lose that MPU. I think we may need some product direction here. There is another option - to release a version supporting both modes, upgrade to that version, then after a week release a version dropping the old mode. That one is of course expensive, hence we should ask product.

I don't think that's necessary since once we declare the migration path we can require users to not perform any MPUs during the upgrade. The choice whether to complete outstanding MPUs before upgrade or let the migration process abort them - all of that responsibility will be rolled down to the user

This is even more product than before. Bear in mind that "the user" is not a single person. For instance, think about how to roll this out on lakeFS Cloud. We would need to announce that at a certain time slot all MPUs are disallowed [1]. And then we must upgrade the entire cluster within that time slot. This will probably require CS involvement.

Footnotes

  1. For instance, some Spark users use S3A, and will end up doing MPUs.

I agree completely with everything you said.
That's why we're not going to merge this change before we get @treeverse/product's input and decide on the migration path

Contributor

@nopcoder nopcoder left a comment


Thanks for addressing all the comments.
There is one concern I have with the kvUploadIterator implementation:

  1. The Seek implementation first checks for errors, which means we don't allow calling Seek again after New or a previous Seek has failed
  2. Because Seek is not exactly like New (where we expect the user to call Close only when no error was returned), we need to check whether 'it.kvIter' is set - for the case where New succeeded, Seek failed, and Close then finds 'it.kvIter' set to nil

@N-o-Z N-o-Z requested a review from nopcoder January 12, 2026 23:38
@ozkatz
Collaborator

ozkatz commented Jan 28, 2026

You've asked for product feedback so here it is :)

To make things simple, let's put forth a constraint: our rule of thumb should be to never require downtime to upgrade a lakeFS minor version. There might be extreme cases where we'd have to break that rule, but I'm not convinced this is one of them.

I suggest we either find a way to fix this while allowing MPUs to continue uninterrupted during the upgrade process (without downtime) - or consider this a big enough breaking change that we simply can't introduce in lakeFS 1.x.


Labels

area/gateway Changes to the gateway area/testing Improvements or additions to tests bug Something isn't working include-changelog PR description should be included in next release changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Gateway's ListMultipartUploads doesn't respect S3 ordering

4 participants