Skip to content

Improve checkpoint publication scheduling#940

Merged
phbnf merged 5 commits into
transparency-dev:mainfrom
phbnf:stale
May 7, 2026
Merged

Improve checkpoint publication scheduling#940
phbnf merged 5 commits into
transparency-dev:mainfrom
phbnf:stale

Conversation

@phbnf
Copy link
Copy Markdown
Contributor

@phbnf phbnf commented Apr 29, 2026

This PR schedules checkpoint publication operations as early as possible, as per `checkpoint_interval and the last publication timestamp.

@phbnf phbnf requested a review from AlCutter April 29, 2026 13:20
@phbnf phbnf requested a review from a team as a code owner April 29, 2026 13:20
@AlCutter
Copy link
Copy Markdown
Collaborator

I'm wondering if we can do this with no visible API changes...

Inside each of the publishCheckpoint routines, rather than polling frequently, we could instead do a better job of scheduling attempts so they have more chance to succeed; e.g. when we check when the last CP was published and discover that it's too new, we now know exactly what the interval must be before the next one can be published.

Would something along the lines of the sketch below work?

const retry := 200*time.Milliseconds // whatever
t := time.NewTicker(retry)

for {
   select {
     case <-ctx.Done(): 
       // whatever
     case <-t.C:
   }
   
   checkpointAge := ...
   if !shouldPublish {
     t.Reset(minInterval-checkpointAge)
     continue
   }
   if err := publishCheckpoint(); err != nil {
     t.Reset(retry)
     continue
   }
   t.Reset(minInterval)
}

@phbnf
Copy link
Copy Markdown
Contributor Author

phbnf commented Apr 29, 2026

Ah yes 100%, that was my original design but I had understood you didn't want do to that, sorry. I'll get back to it.

@AlCutter
Copy link
Copy Markdown
Collaborator

Ah yes 100%, that was my original design but I had understood you didn't want do to that, sorry. I'll get back to it.

Ok, well great!
I don't think we talked about it when we were braining on this together yesterday, but either way - great minds - carry on! :)

@phbnf
Copy link
Copy Markdown
Contributor Author

phbnf commented Apr 30, 2026

I've done something along those lines, have a look. Instead of fetching the last checkpoint update time in publishCheckpointJob and thus doing extra reads, I've modified the signature of publishCheckpoint. Happy to change the implementation if you'd prefer something else.

I have not implemented a custom retry interval, I think we can just default back to the user provided value, it keeps things simpler.

I also took that as an opportunity to decrease minCheckpointInterval for GCP.

@phbnf phbnf changed the title Decouple checkpointInterval and minStaleness Improve checkpoint publication scheduling Apr 30, 2026
Comment thread storage/posix/files.go Outdated
Comment on lines +186 to +191
nextPublication := pubInterval - time.Since(publishedAt)
if nextPublication <= 0 {
t.Reset(time.Millisecond) // Schedule a checkpoint update immediately.
} else {
t.Reset(nextPublication)
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the 3 instances of this be simplified to t.Reset(max(time.Millisecond, pubInterval - time.Since(publishedAt)))?

Comment thread storage/gcp/gcp.go Outdated
}

return otel.TraceErr(ctx, "tessera.storage.gcp.publishCheckpoint", tracer, func(ctx context.Context, span trace.Span) error {
if err := otel.TraceErr(ctx, "tessera.storage.gcp.publishCheckpoint", tracer, func(ctx context.Context, span trace.Span) error {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could change otel.TraceErr to otel.Trace2 and return (time.Time, error) from the inner func directly to reduce the readability hit from adding the outer if here. (same in other implementations)

@phbnf phbnf merged commit 5aa29ef into transparency-dev:main May 7, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants