Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: pre-merge is missing commit ID #7750

Merged
merged 9 commits into from
May 13, 2024
Merged

fix: pre-merge is missing commit ID #7750

merged 9 commits into from
May 13, 2024

Conversation

ozkatz
Copy link
Collaborator

@ozkatz ozkatz commented May 9, 2024

Currently the "commit_id" field is empty for pre-merge hooks. Other fields (message, author - and even a meta_range_id exist!)

This Allows accessing the resulting commit from the hook at the expense of potentially referring to a dangling commit in KV.

@ozkatz ozkatz added include-changelog PR description should be included in next release changelog minor-change Used for PRs that don't require issue attached labels May 9, 2024
@ozkatz ozkatz requested a review from itaiad200 May 9, 2024 13:34
Copy link

github-actions bot commented May 9, 2024

E2E Test Results - DynamoDB Local - Local Block Adapter

13 passed

Copy link
Contributor

@itaiad200 itaiad200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably just me, but I don't understand how it solves anything.
Also - @N-o-Z could you take a look too? I don't remember why we decided to implement it this way

@@ -2777,6 +2777,10 @@ func (g *Graveler) Merge(ctx context.Context, repository *RepositoryRecord, dest
}
metadata[MergeStrategyMetadataKey] = mergeStrategyString[mergeStrategy]
commit.Metadata = metadata
commitID, err = g.RefManager.AddCommit(ctx, repository, commit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand where CommitID is being passed to the hook..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'd need to put it on commit for this to work. This may demonstrate that this portion of the code needs a unit test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should just modify TestGraveler_PreMergeHook accordingly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ozkatz that's the only blocking comment from my end - it seems like we're not passing the new commitID to the pre merge hook, which was the sole purpose of this change, no?

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the idea of this change! It will allow hooks to present a view consistent with the lakeFS branch. (The implementation won't be trivial, but it will be possible!)

That said, we will need to document exactly what this commit is, and especially is not. I think we can say that the ID identified the commit that will be created at the branch head if the hooks and the merge succeed.
But I think this is clarifies stone strange semantics! The hook can succeed and the merge still fails.

@@ -2777,6 +2777,10 @@ func (g *Graveler) Merge(ctx context.Context, repository *RepositoryRecord, dest
}
metadata[MergeStrategyMetadataKey] = mergeStrategyString[mergeStrategy]
commit.Metadata = metadata
commitID, err = g.RefManager.AddCommit(ctx, repository, commit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'd need to put it on commit for this to work. This may demonstrate that this portion of the code needs a unit test.

@ozkatz
Copy link
Collaborator Author

ozkatz commented May 10, 2024

Esti is expecting a certain amount of runs to be returned for a commit ID. This obviously changes that number, failing Esti.

I don't feel comfortable modifying the Esti test to make it pass because I don't understand what it's actually testing.

Copy link
Member

@N-o-Z N-o-Z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably just me, but I don't understand how it solves anything. Also - @N-o-Z could you take a look too? I don't remember why we decided to implement it this way

The reason we implemented it this way is to avoid spamming KV with dangling commits for every failed pre-merge hook. This is a relatively common scenario so the risk here is high.
Since we don't have any cleanup mechanism for dangling commits I think we should not introduce this change.
I suggest we first have a cleanup mechanism and then introduce this change

@itaiad200
Copy link
Contributor

It's probably just me, but I don't understand how it solves anything. Also - @N-o-Z could you take a look too? I don't remember why we decided to implement it this way

The reason we implemented it this way is to avoid spamming KV with dangling commits for every failed pre-merge hook. This is a relatively common scenario so the risk here is high. Since we don't have any cleanup mechanism for dangling commits I think we should not introduce this change. I suggest we first have a cleanup mechanism and then introduce this change

@N-o-Z what's the concern with dangling commits? AFAIK we won't scan those ever, nor during LogCommits or during any GC operation. If it's just a few bytes in the kvstore for a failed hook, we should be okay, no? Given the fact that every hook run metadata is stored in kv too anyway..

@N-o-Z
Copy link
Member

N-o-Z commented May 12, 2024

It's probably just me, but I don't understand how it solves anything. Also - @N-o-Z could you take a look too? I don't remember why we decided to implement it this way

The reason we implemented it this way is to avoid spamming KV with dangling commits for every failed pre-merge hook. This is a relatively common scenario so the risk here is high. Since we don't have any cleanup mechanism for dangling commits I think we should not introduce this change. I suggest we first have a cleanup mechanism and then introduce this change

@N-o-Z what's the concern with dangling commits? AFAIK we won't scan those ever, nor during LogCommits or during any GC operation. If it's just a few bytes in the kvstore for a failed hook, we should be okay, no? Given the fact that every hook run metadata is stored in kv too anyway..

As a general rule, we should always make effort the cleanup the DB of unnecessary data. We are making this effort in other flows and I believe we should do the same here.
There's a logic and business reason for keeping a failed (and successful) hook run in the DB. I've yet to understand the reason for keeping a dangling commit.

commitID, err = g.RefManager.AddCommit(ctx, repository, commit)
if err != nil {
return nil, fmt.Errorf("add commit: %w", err)
}
if !repository.ReadOnly {
preRunID = g.hooks.NewRunID()
err = g.hooks.PreMergeHook(ctx, HookRecord{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that PostMergeHook no longer needs to call UpdateCommitID after this change

Copy link
Member

@N-o-Z N-o-Z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ozkatz thanks,

After understanding the use case, this will provide hooks access to the merge commit ID and allow them to run hook logic on it if necessary.
Added some more changes + fixed esti test

Copy link
Contributor

@itaiad200 itaiad200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

github-actions bot commented May 12, 2024

♻️ PR Preview 8101d0a has been successfully destroyed since this PR has been closed.

🤖 By surge-preview

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Sorry for delay in reviewing... and please note that docs are good but also a bit confusing, suggested something that would confuse me less.

docs/howto/hooks/webhooks.md Outdated Show resolved Hide resolved
Co-authored-by: Ariel Shaqed (Scolnicov) <ariels@treeverse.io>
@ozkatz ozkatz enabled auto-merge (squash) May 13, 2024 07:55
@ozkatz ozkatz merged commit 3877c20 into master May 13, 2024
36 checks passed
@ozkatz ozkatz deleted the fix/commit-id-on-merge branch May 13, 2024 08:10
emulatorchen pushed a commit to emulatorchen/lakeFS that referenced this pull request May 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
include-changelog PR description should be included in next release changelog minor-change Used for PRs that don't require issue attached
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants