-
Notifications
You must be signed in to change notification settings - Fork 1.7k
[ENH] Wire up garbage collector to do 3-phase GC. #4987
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
Implement 3-Phase Garbage Collection for WAL Logs and Wire Up End-to-End GC Path This PR introduces a full 3-phase garbage collection (GC) workflow across the WAL-backed log service and garbage collector in Chroma, extending the API, service implementations, and GC operator logic accordingly. The changes add a new Key Changes: Affected Areas: This summary was automatically generated by @propel-code-bot |
ba3fe76
to
9bbea49
Compare
d5d740b
to
a1786d3
Compare
a1786d3
to
5475ab1
Compare
@@ -163,4 +173,6 @@ service LogService { | |||
rpc InspectLogState(InspectLogStateRequest) returns (InspectLogStateResponse) {} | |||
// This endpoint should route to the rust log service. | |||
rpc ScrubLog(ScrubLogRequest) returns (ScrubLogResponse) {} | |||
// This endpoint should route to the rust log service. | |||
rpc GarbageCollectPhase2(GarbageCollectPhase2Request) returns (GarbageCollectPhase2Response) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we rename this so that the name explains what this service does in more details?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would you name it? It's phase2 of the process known as garbage collect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from my understanding it's truncating the manifest? so maybe TruncateManifest
@@ -183,6 +183,10 @@ func (s *logServer) ScrubLog(ctx context.Context, req *logservicepb.ScrubLogRequ | |||
return | |||
} | |||
|
|||
func (s *logServer) GarbageCollectPhase2(ctx context.Context, req *logservicepb.GarbageCollectPhase2Request) (res *logservicepb.GarbageCollectPhase2Response, err error) { | |||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this return nil
or an actual response?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an empty response. I'm copying other empty responses here. Should we set the struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose it's implicitly converting nil to empty response in this case. We can set the struct for clarity or leave it as is for consistency, whichever you prefer
match writer.garbage_collect(&GarbageCollectionOptions::default(), Some(*minimum_log_offset_to_keep)).await { | ||
Ok(()) => Ok(()), | ||
match writer.garbage_collect_phase1(&GarbageCollectionOptions::default(), Some(*minimum_log_offset_to_keep)).await { | ||
Ok(true) => {}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when is the writer expected to return true v/s false. And why do we skip phase2 and 3 in case it returns false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is phase1 not done by the log service? It can compute and write out the garbage and send a path to the garbage file as a response of the rpc?
tracing::error!("Unable to garbage collect log for collection [{collection_id}]: {err}"); | ||
return Err(DeleteUnusedLogsError::Gc(err)); | ||
}; | ||
if let Err(err) = writer.garbage_collect_phase3(&GarbageCollectionOptions::default()).await { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also would prefer to name these phases better instead of just phase1, 2 and 3
return Err(DeleteUnusedLogsError::Wal3(err)); | ||
} | ||
}; | ||
if let Err(err) = logs.garbage_collect_phase2(*collection_id).await { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add comments on why phase3 is done by the GC service i.e. the argument of keeping a single service that deletes all the files
c8d817a
to
19793f0
Compare
✅ The Helm chart's version was changed. Your changes to the chart will be published upon merge to |
## Description of changes Wire up the garbage collector to do 3-phase GC. ## Test plan CI ## Documentation Changes N/A
## Description of changes Included PRs in order: - **[TST] Test for #4972 (#4983)** - **[CLN] Fix dedup in get_collections_with_new_data. (#4974)** - **[ENH] Implement three-phase garbage collection for WAL3 (#4984)** - **[CLN] Remove err(Display) from wal3. (#4992)** - **[ENH] Wire up garbage collector to do 3-phase GC. (#4987)** - **[ENH] Do not materialize all fragments to delete. (#5004)** ## Test plan CI through main and then CI on the hotfix branch. ## Documentation Changes N/A
Description of changes
Wire up the garbage collector to do 3-phase GC.
Test plan
CI
Documentation Changes
N/A