-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cron thread #12
Cron thread #12
Conversation
I hope this isn't too much of a spanner in the works, but I've added a third type of trigger, an 'interval' trigger, that works on an interval rather than at a specific time. It seemed necessary to implement the full Iterable replacement. Happy to chat if you don't think it's necessary, and also happy to do write the cron code to make it work as well, once this is in a state where I can safely build off it and extend it. |
This breaks out the actual scheduling logic to helper functions outside the cron thread class, making it possible to test the scheduling without needing to actually run the thread.
We chatted IRL about this, we decided that the thread should gracefully exit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a bit too much magic in the way that the _inner
behaves, though is very clever.
routemaster/cron.py
Outdated
except Exception as e: | ||
# Something went wrong in the processing of a single item in the | ||
# cron queue, log it and allow the processor to continue. | ||
print(e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be in favour of using logging.exception
here, even if we configure logging to go to stdout by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes agreed, this (and other parts) are not yet complete.
routemaster/cron.py
Outdated
@@ -2,34 +2,43 @@ | |||
|
|||
import time | |||
import threading | |||
import contextlib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether it's a result of looking at the diff; unfortunately this file feels rather spaghetti-like to me at the moment. Maybe rearrange the functions so they're more clearly grouped? (I tend to enjoy ordering the file to roughly match an upside-down call stack -- detail/utils at the top, higher level logic at the bottom)
routemaster/cron.py
Outdated
return self._terminating | ||
|
||
|
||
class StopCronProcessing(Exception): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we pull this class and the following type definitions to the top of the file? (I think I'd expect the type definitions first, then the exception, but I'm not sure there's a clear standard yet)
routemaster/state_machine/actions.py
Outdated
|
||
with app.db.begin() as conn: | ||
try: | ||
with app.db.begin() as conn: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intentional that the process_transitions(app, label)
call below is outside this context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, I strongly suspect this is an error since process_one_job
is an instance of routemaster.cron._process_cron_job._inner
which silences exceptions thrown within the context. This means that if this context errors we'll run process_transitions(app, label)
for the label anyway. I'm fairly sure we don't want that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've discussed this IRL and resolved in a refactor I think.
routemaster/cron.py
Outdated
) -> None: | ||
|
||
@contextlib.contextmanager | ||
def _inner() -> Iterator[None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is a very neat mechanism, it's a bit surprising that a) it's a context manager, and b) a context manager throws an exception as part of its normal operations. It seems plausible that the exception could accidentally get caught by one of our blanket except Exception
s (which we're using quite a lot of in this project).
Did you consider using something like a threading.Event
, which would have clearer semantics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're going to want to document whatever mechanism we go for somewhere so that it's clear what needs to be done to use it within other parts of the system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having just realised that this manager also silences exceptions and spotted a bug with its usage in routemaster.state_machines.actions.process_retries
(see comments there), I don't think that this is a good mechanism to use.
I think there's just a bit too much magic happening here for it to be safe in general use.
routemaster/cron.py
Outdated
_trigger_gate, | ||
state, | ||
) | ||
elif isinstance(trigger, MetadataTrigger): | ||
scheduler.every(60).seconds.do( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency with the action triggering, this might be clearer as scheduler.every().minute.do(
.
routemaster/state_machine/actions.py
Outdated
lock_label(label, conn) | ||
current_state = get_current_state(label, state_machine, conn) | ||
|
||
if current_state != action: | ||
continue | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this need to change from continue
to return
? Surely we don't want to stop processing for all the pending users just because one of them is in the wrong state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, leftover from refactoring. d77e59c
We don't care that it's a daemon anymore as we call join explicitly
Cron integration
https://app.asana.com/0/465891412411656/496393890703244