Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read from branch with compaction data #7701

Merged
merged 29 commits into from
Jun 11, 2024
Merged

Read from branch with compaction data #7701

merged 29 commits into from
Jun 11, 2024

Conversation

idanovo
Copy link
Contributor

@idanovo idanovo commented Apr 25, 2024

No description provided.

@idanovo idanovo added the exclude-changelog PR description should not be included in next release changelog label Apr 25, 2024
@idanovo idanovo self-assigned this Apr 25, 2024
@idanovo idanovo linked an issue Apr 25, 2024 that may be closed by this pull request
Copy link

github-actions bot commented Apr 25, 2024

E2E Test Results - DynamoDB Local - Local Block Adapter

13 passed

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Really exciting to see how we can separate the two functions of "the" (currently single) branch metarange ID: right now it is used both as the parent for the next commit, and also as the base for differences in staging.

I would prefer to avoid the term "compaction" here, and instead name the new metarange ID on a branch by some other name - maybe BaseMetarangeID. It may make our lives easier if we ever have to add user-requested "partial commits". The name we pick right now matters, because it it stored to the KV. So it will be expensive to change in future!

I am not sure why we need a "compaction manager". All it does is duplicate the code that we already have in Graveler, just for a different metarange. We even need the same if statements to select the right base metarange. Furthermore, AFAICT this compaction manager leads to these bugs:

  • An SSTable handle leaks whenever a compacted metarange exists on the branch. This leaks a file descriptor as well as memory.
  • The code opens an unused metarange whenever a compacted metarange exists on the branch. This is a loss of efficiency. IOPs (I/O operations) are sometimes an expensive resource on lakeFS installations.

Can we try to select the correct metarange in if statements before accessing the commit metarange, instead of having the manager? I believe that that code would be easier to understand, and provide fewer opportunities for bugs.

pkg/graveler/graveler.go Outdated Show resolved Hide resolved
pkg/graveler/graveler.go Outdated Show resolved Hide resolved
pkg/graveler/graveler.go Outdated Show resolved Hide resolved
@itaiad200
Copy link
Contributor

itaiad200 commented Apr 28, 2024

Agree with @arielshaqed comments. Is this PR goal to cover all branch reading scenarios? What about functions that are not covered in this PR, like listStagingArea, isStagingEmpty, ..?

Copy link
Contributor

@guy-har guy-har left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Ariel, I would expect compaction to be handled as any other metarange. furthermore, compaction made the term staging a tricky one
i.e some would expect listStagingArea to return only data existing under the staging token and others may expect it to return all staging data (IMO all staging data is the trivial one). Also, we still have many places not considering the compacted data, for example graveler.Get and graverler.Commit which should return "dirty" if staging is empty, etc...

pkg/graveler/graveler.go Outdated Show resolved Hide resolved
pkg/graveler/graveler.go Outdated Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tests! With Commit, StagingToken and SealedTokens already present, and with the addition of BaseMetaRange concept, the number of test scenarios is huge. I think we missed some important ones, like testing with tombstones, testing with all 4 (or 5 for multiple sealed tokens) keys, querying for empty staging area, etc. The reason we need to be pedantic about changes here, is that bugs in the versioning engine can be catastrophic. We intend on introducing the compaction operation itself later and that's even more worrying! Finding the bugs are normally easier during the PR itself, and it gets harder as we build on top of it.

You can look at graveler_v2_test.go file too. I find it easier to write the tests with proper generated mocks instead of limited fakes.

Copy link
Contributor

@itaiad200 itaiad200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed everything except graveler_test.go (will get back to it later).
Overall we're almost there! Bunch of medium/small comments..

pkg/catalog/catalog_test.go Show resolved Hide resolved
pkg/catalog/catalog_test.go Show resolved Hide resolved
pkg/catalog/gc_write_uncommitted.go Outdated Show resolved Hide resolved
pkg/catalog/gc_write_uncommitted.go Outdated Show resolved Hide resolved
pkg/catalog/gc_write_uncommitted.go Outdated Show resolved Hide resolved
pkg/graveler/graveler.go Outdated Show resolved Hide resolved
@@ -3109,6 +3164,14 @@ func (g *Graveler) Diff(ctx context.Context, repository *RepositoryRecord, left,
leftValueIterator.Close()
return nil, err
}
if rightBranch.CompactedBaseMetaRangeID != "" {
compactedDiffIterator, err := g.CommittedManager.Diff(ctx, repository.StorageNamespace, leftCommit.MetaRangeID, rightBranch.CompactedBaseMetaRangeID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftCommit & rightBranch.CompactedBaseMetaRangeID? Can you explain why this is true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case there's no compaction data on the right we return a combined iterator with:

  1. diff iterator between the left committed data and the right committed data
  2. iterator of committed data in the left branch
  3. iterator of the staging data of the right branch

So in case there's compacted data on the right branch we need to change the first iterator to contain also changes that have already compacted to the right branch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case there's no compaction data on the right we return a combined iterator with:

But this path handles the case where rightBranch.CompactedBaseMetaRangeID != ""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, why isn't it JoinedDiff(stagingIterator, compactedDiffIterator)? Isn't iterator of committed data in the left branch already being diffed in compactedDiffIterator?

pkg/graveler/graveler.go Outdated Show resolved Hide resolved
pkg/graveler/graveler.go Outdated Show resolved Hide resolved
pkg/graveler/graveler_v2_test.go Outdated Show resolved Hide resolved
pkg/graveler/graveler.go Outdated Show resolved Hide resolved
pkg/graveler/graveler.go Outdated Show resolved Hide resolved
pkg/graveler/graveler.go Outdated Show resolved Hide resolved
pkg/graveler/graveler.go Outdated Show resolved Hide resolved
pkg/graveler/graveler.go Outdated Show resolved Hide resolved
@idanovo idanovo requested a review from itaiad200 May 7, 2024 09:05
Copy link
Contributor

@itaiad200 itaiad200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review, but I think there's some valuable comments already

pkg/catalog/catalog_test.go Show resolved Hide resolved
pkg/graveler/graveler.go Outdated Show resolved Hide resolved
}
}

if updatedValue == nil && reference.CompactedBaseMetaRangeID != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we get here with updatedValue != nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case the object didn't change in staging

pkg/graveler/graveler.go Outdated Show resolved Hide resolved
pkg/graveler/graveler.go Show resolved Hide resolved
pkg/graveler/graveler.go Outdated Show resolved Hide resolved
@@ -2492,7 +2512,7 @@ func (g *Graveler) ResetPrefix(ctx context.Context, repository *RepositoryRecord
newSealedTokens = append(newSealedTokens, branch.SealedTokens...)

// Reset keys by prefix on the new staging token
itr, err := g.listStagingArea(ctx, branch, 0)
itr, err := g.listStagingAreaWithoutCompaction(ctx, branch, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't compaction be involved too? AFAIR ResetPrefix resets all the uncommitted changes in a prefix. Why shouldn't it reset changes that exist in compacted area too?

pkg/graveler/graveler.go Outdated Show resolved Hide resolved
pkg/graveler/graveler.go Outdated Show resolved Hide resolved
@idanovo
Copy link
Contributor Author

idanovo commented May 8, 2024

Graveler function Tested Controller function Tested AI
ResetKey X ResetBranch V Add a controller test for ResetKey for a compacted branch when we add the CompactBranch API call
ResetPrefix X ResetBranch V Add a controller test for ResetPrefix for a compacted branch when we add the CompactBranch API call
prepareForCommitIDUpdate X Revert (dirty branch) V Added a test in graveler testsShould also add a test in the controller once we have the CompactBranch API call
prepareForCommitIDUpdate X CherryPick (dirty branch) V Add a controller test for CherryPick for a compacted branch when we add the CompactBranch API call
prepareForCommitIDUpdate X UpdateBranch (dirty branch) X Add a controller test for ResetPrefix for a compacted branch when we add the CompactBranch API call
prepareForCommitIDUpdate X Merge (dirty branch) V Added a test in graveler testsShould also add a test in the controller once we have the CompactBranch API call
prepareForCommitIDUpdate X Import (dirty branch) X Add a controller test for ResetPrefix for a compacted branch when we add the CompactBranch API call

@idanovo idanovo requested a review from itaiad200 May 8, 2024 12:58
Copy link
Contributor

@itaiad200 itaiad200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pushing this one through, it's certainly wasn't an easy one.
Once it's merged, please open an issue containing the tests table as a dependency for the next compaction task.

pkg/catalog/catalog_test.go Show resolved Hide resolved
pkg/graveler/graveler.go Show resolved Hide resolved
pkg/graveler/graveler.go Show resolved Hide resolved
pkg/graveler/graveler.go Outdated Show resolved Hide resolved
Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only blocking diffs are on JoinedDiffIterator. It is really important code, and needs to be as simple as possible. I think we may have simpler code if we revisit some decisions.

One way might be this. "Advance inner" (now, after seeing the code) seems like the wrong strategy. AFAICT it forces us to assume a particular special DiffIterator.Value() return of nil when the iterator is over. And then it looks like we check almost the same cases but not quite. Because there are so many, I worry that we will miss some edge cases.

numBranch: 3,
numRecords: 0,
expectedCalls: 1,
compactBranch: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Rather than double each test scenario, I would probably just loop over both scenarios by saying for _, compactBranch := range{false, true} { t.Run(...) } on l. 585.

}

if numRecords > 0 {
test.GarbageCollectionManager.EXPECT().
GetUncommittedLocation(gomock.Any(), gomock.Any()).
Times(expectedCalls).
AnyTimes().
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Particularly worried that we might never call this. If no UGC occurs, will the test fail somewhere else?

pkg/catalog/gc_write_uncommitted.go Show resolved Hide resolved
Comment on lines +403 to +404
// CompactedBaseMetaRangeID - the MetaRangeID of the last compaction's
CompactedBaseMetaRangeID MetaRangeID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really not a fan of the "compacted" part of the name. It describes how it happened not what it is. I would prefer to use "BaseMetaRangeID". It is important to get the name right before any users - because if we get it wrong it will be hard to fix existing KVs.

In future we might have other reasons to change the base. For one example: during concurrent merges, one of the merges will fail when it tries to update the branch record of the destination. That is inefficient and slow. We could use your work here to speed up concurrent merges! After the merge fails like this we might optionally update the base metarange of the source branch to point at the intended result. Now the source branch looks like the destination branch got merged into it, so retrying the merge will be faster.

(This is just one example, I have many plans for this feature. I don't want to tie them to the word "compacted"!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @itaiad200 was worried that the word "compact" was missing for this variable.
Decision time 😈

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@itaiad200 can you chime in, please?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not tying ourselves to Compacted means that we add another term on top of staging, sealed, compaction and uncommitted.. For example, LastStagingSnapshot - Pro: It describes what it is, not obscure. Con: Another term.

I still hold my opinion but not firmly, I'd rather it be CompactedStaging over BaseMetaRangeID. I believe it does describe what it is, not necessarily the action that happened.
So up to you, not blocking :)

pkg/graveler/graveler_v2_test.go Show resolved Hide resolved
pkg/graveler/joined_diff_iterator.go Outdated Show resolved Hide resolved
// first
nextA = c.iterA.Next()
nextB = c.iterB.Next()
case valA == nil && valB == nil:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not to depend on nil values like this. I think that the problem is you already called Next on the iterators by the time you call this function; perhaps we should figure out how to avoid that?

Comment on lines 67 to 69
if !c.advanceInnerIterators() {
return false
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is more complex than it should be. You have to call c.Next() the first time, before using the iterator. That's why you need to have the special case for p == nil in advanceInnerIterators. This function:

  1. Relies on DiffIterator returning nil value in certain conditions. Regardless of whether or not it does this, relying on this makes it a very special case - most Go iterators do not support this!
  2. We repeatedly call Value() on each iterator. This is weird.
  3. We have 2 switch statements. The first has 7 cases and 2 special cases in if statements, the second has 5 cases. This is too complex.

Comment on lines 97 to 99
if c.p != nil && c.p.Err() != nil {
return c.p.Err()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. c.p is always either c.iterA or c.iterB. Why do we need to check it separately?

if c.p != nil && c.p.Err() != nil {
return c.p.Err()
}
if c.iterA != nil && c.iterA.Err() != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is c.iterA == nil?

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still worried about the iterator. The switch statement in there is huge, which makes it too hard for me to understand. I think it has issues, but TBH I really don't know.

So still requesting changes, if only to documentation.

Comment on lines +403 to +404
// CompactedBaseMetaRangeID - the MetaRangeID of the last compaction's
CompactedBaseMetaRangeID MetaRangeID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@itaiad200 can you chime in, please?

pkg/graveler/graveler.go Show resolved Hide resolved
iterAHasMore bool
iterB DiffIterator
iterBHasMore bool
p DiffIterator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, please document p. The name is very unintuitive.

Comment on lines 47 to 50
case bytes.Equal(valA.Key, valB.Key):
c.iterAHasMore = c.iterA.Next()
c.iterBHasMore = c.iterB.Next()
case bytes.Compare(valA.Key, valB.Key) < 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This traverses the keys twice. Please compare once; it is fine to call a separate routine here, but that will need to be documented. In fact you can return immediately in all the above cases, and then compute bytes.Compare(val!.Key, valB.Key) once.

c.iterBHasMore = c.iterB.Next()
case bytes.Compare(valA.Key, valB.Key) < 0:
c.iterAHasMore = c.iterA.Next()
c.iterBHasMore = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this line? After line 43 we know that iterBHasMore.

default:
// value of iterA < value of iterB
c.iterBHasMore = c.iterB.Next()
c.iterAHasMore = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

return false
}
if !c.iterAHasMore && !c.iterBHasMore {
return false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. Suppose that the last values of iterA and iterB refer to the same key. Then these lines will make this condition true -- and the iterator misses the last value!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We process only values we already returned in the first switch-case statement.
I'll add a test case just to make sure.

Comment on lines 75 to 76
valA = c.iterA.Value()
valB = c.iterB.Value()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If c.iterAHasMore then it is incorrect to call c.iterA.Value(), and similarly for iterB.

Comment on lines +114 to +115
c.iterA.SeekGE(id)
c.iterB.SeekGE(id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems incorrect: AFAIR you must call Next() after SeekGE().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true.
This is the same behavior we have in our CombinedIterator.
I don't know if we want to have the same behavior here or change it.

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! A lot of effort went into this, and I really believe it will improve performance - so definitely worth it. Sorry for picking so many nits - this is the core of lakeFS, and we need both tests and manual review.

My few remaining comments are minor - feel free to push to trunk however you decide to handle them.

// UpperBoundForPrefix returns, given a prefix `p`, a slice 'q' such that a byte slice `s` starts with `p`
// if and only if p <= s < q. Namely, it returns an exclusive upper bound for the set of all byte arrays
// that start with this prefix. It returns nil if there is no such byte slice because all bytes of `p` are
// UpperBoundForPrefix returns, given a prefix `currenIter`, a slice 'q' such that a byte slice `s` starts with `currenIter`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think this should be currentIter.

Comment on lines 111 to 113
if c.currenIter != nil && c.currenIter.Err() != nil {
return c.currenIter.Err()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems wrong: it says that if one iterator failed but the other still works, then everything is OK. But it's not! Next() returned false, and I need to fail.

Suggested change
if c.currenIter != nil && c.currenIter.Err() != nil {
return c.currenIter.Err()
}
if c.iterA.Err() != nil { // TODO(idan): Return a multierror with _both_ errors if both failed!
return c.iterA.Err()
}
if c.iterB.Err() != nil {
return c.iterB.Err()
}
return nil

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a unit test for this, too, unfortunately.

}
return true
}

func (c *JoinedDiffIterator) Value() *Diff {
if c.p == nil {
if c.currenIter == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that calling Value() here is an error. Even though you return successfully, the caller will probably panic shortly after. Your call, but personally I would at least log an "internal error".

pkg/catalog/gc_write_uncommitted.go Show resolved Hide resolved
pkg/graveler/graveler.go Show resolved Hide resolved
@idanovo idanovo removed the request for review from guy-har June 11, 2024 09:57
@idanovo idanovo dismissed guy-har’s stale review June 11, 2024 09:57

I've got 2 approvals

@idanovo idanovo merged commit 317c985 into master Jun 11, 2024
35 checks passed
@idanovo idanovo deleted the read-compacted-branch branch June 11, 2024 09:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exclude-changelog PR description should not be included in next release changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Manage read from branch with compaction
4 participants