Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WFLY-19419 Distributed timer service should consolidate timeouts that would execute in the past #17962

Merged
merged 7 commits into from
Jul 22, 2024

Conversation

pferraro
Copy link
Contributor

@pferraro pferraro commented Jun 17, 2024

https://issues.redhat.com/browse/WFLY-19419

Also solves: WFLY-19271 Distributed timer service drops timeout events if server is suspended
https://issues.redhat.com/browse/WFLY-19271


<--- THIS SECTION IS AUTOMATICALLY GENERATED BY WILDFLY GITHUB BOT. ANY MANUAL CHANGES WILL BE LOST. --->

Wildfly issue links:

<--- END OF WILDFLY GITHUB BOT REPORT --->

More information about the wildfly-bot[bot]

@bstansberry
Copy link
Contributor

@tadamski @rachmatowicz Please review.

@pferraro pferraro added the hold PR should not be merged for some reason. label Jun 21, 2024
@pferraro
Copy link
Contributor Author

Adding the hold label for now, as there are few things requiring further discussion:

  • Yesterday, I discovered that, based on the implementations here vs here, there seems to be a discrepancy in behavior of missed timeouts between interval vs ScheduleExpression-based timers. While the old timer service implementation coalesces missed timeouts into a single event for interval timers, it does not appear to do the same for ScheduleExpression-based timers. If this is indeed the case, while the proposed changes would bring the behavior of the distributed timer service in line with the old timer service wrt interval timers, it would cause the same behavior wrt ScheduleExpression-timers to diverge.
  • @rachmatowicz suggested that we expose a management attribute to allow users to enable/disable the coalescing of missed timeouts, which I thought was a good idea. If the previous point is valid, then we might actually want to consider an attribute per timer type - which would offer the greatest amount of flexibility wrt compatibility.

@pferraro pferraro force-pushed the WFLY-19361 branch 2 times, most recently from 8cacb66 to cb6c43f Compare July 9, 2024 17:32
@pferraro pferraro removed the hold PR should not be merged for some reason. label Jul 9, 2024
@pferraro
Copy link
Contributor Author

pferraro commented Jul 9, 2024

Removing hold label.

I've included a reproducer and fix for WFLY-19514 to address the issue from my previous comment, where, in the local timer service, interval vs calendar timers behave differently wrt missed timeouts. With this change, both timer services will behave consistently for both interval and calendar-based timers.
I've logged a feature request (https://issues.redhat.com/browse/WFLY-19515) to expose ejb3 subsystem configuration for enabling/disabling the coalescing of missed timeouts, as well as deployment descriptor support to enable/disable this behavior per timer service.
Additionally, fixing the behavior of both timer service implementations wrt suspension will be handled separately.

@bstansberry bstansberry added the 33.x WildFly 33 label Jul 9, 2024
@rachmatowicz
Copy link
Contributor

@bstansberry @pferraro FYI, actively working on the review. Should be done Friday 12 July.

@bstansberry
Copy link
Contributor

/retest

@bstansberry bstansberry added 34.x WildFly 34 and removed 33.x WildFly 33 labels Jul 15, 2024
@bstansberry
Copy link
Contributor

I'm going to defer this to 34.

Copy link
Contributor

@rachmatowicz rachmatowicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that the tests pass...so my understanding is just not deep enough at this stage to fully digest everything that is going on, but left some comments for consideration.

Copy link
Contributor

@rachmatowicz rachmatowicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Met with Paul and discussed how LocalScheduler.run() and TimerScheduler.InvokeTask() work together to schedule and execute timeouts in such a way that any timeout callbacks that occur when the TimerService is not started are coalesced into one representative timer firing when the TimerService is restarted, so this looks good to me.

@pferraro pferraro merged commit 84e42d7 into wildfly:main Jul 22, 2024
13 checks passed
@pferraro
Copy link
Contributor Author

Thanks @rachmatowicz !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
34.x WildFly 34
Projects
None yet
3 participants