Skip to content

Make transmit queue limits and timeout configurable #75

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 10, 2025

Conversation

jeff-loughlin
Copy link
Contributor

@jeff-loughlin jeff-loughlin commented Oct 9, 2024

When actors have longer-running work to do while processing a message, messages can get backed up and eventually exceed the limits set by MAX_QUEUED_TRANSMITS, QUEUE_TRANSMIT_UNBLOCK_THRESHOLD, and DROP_TRANSMITS_LEVEL, or exceed the timeout set by DEFAULT_MAX_TRANSMIT_PERIOD.

This update just makes those values configurable by setting environment variables (and defaults to the original values if those variables are not set).

Users can set the following environment variables:

  • THESPIAN_MAX_QUEUED_TRANSMITS to modify MAX_QUEUED_TRANSMITS
  • THESPIAN_QUEUED_TRANSMIT_UNBLOCK_THRESHOLD to modify QUEUED_TRANSMIT_UNBLOCK_THRESHOLD
  • THESPIAN_DROP_TRANSMITS_LEVEL to modify DROP_TRANSMITS_LEVEL
  • THESPIAN_TRANSMIT_TIMEOUT_MINUTES (maybe badly named) to modify DEFAULT_MAX_TRANSMIT_PERIOD

Make MAX_QUEUED_TRANSMITS, QUEUE_TRANSMIT_UNBLOCK_THRESHOLD, and DROP_TRANSMITS_LEVEL configurable with environment variables
@gbanasiak
Copy link
Contributor

Closes #82

Copy link
Contributor

@gbanasiak gbanasiak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a convenience getenvdef() function in thespian.system.utils that could be used here. The benefit is it comes with Thespian log entry, and exception handling.

DROP_TRANSMITS_LEVEL = MAX_QUEUED_TRANSMITS + 100
MAX_QUEUED_TRANSMITS = int(os.environ.get("THESPIAN_MAX_QUEUED_TRANSMITS", 950))
QUEUE_TRANSMIT_UNBLOCK_THRESHOLD = int(os.environ.get("THESPIAN_QUEUED_TRANSMIT_UNBLOCK_THRESHOLD", 780))
DROP_TRANSMITS_LEVEL = MAX_QUEUED_TRANSMITS + int(os.environ.get("THESPIAN_DROP_TRANSMITS_LEVEL", 100))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect THESPIAN_DROP_TRANSMITS_LEVEL to determine the final value of DROP_TRANSMITS_LEVEL, not part of it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add MAX_PENDING_TRANSMITS env var for completeness?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect THESPIAN_DROP_TRANSMITS_LEVEL to determine the final value of DROP_TRANSMITS_LEVEL, not part of it.

Yeah, that seems badly named since it's a calculated value - but that's how it was getting set in the original code so I didn't want to mess with it too much. It should either be the value of the env var, or be named something different. I'm fine with whichever you think makes the most sense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add MAX_PENDING_TRANSMITS env var for completeness?

Yes. Didn't know about that one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect THESPIAN_DROP_TRANSMITS_LEVEL to determine the final value of DROP_TRANSMITS_LEVEL, not part of it.

Yeah, that seems badly named since it's a calculated value - but that's how it was getting set in the original code so I didn't want to mess with it too much. It should either be the value of the env var, or be named something different. I'm fine with whichever you think makes the most sense

The intent of the original code was to ensure that the DROP_TRANSMITS_LEVEL always exceeded MAX_QUEUED_TRANSMITS to most effectively use the various fallback capabilities. In general, there's a danger here that overrides for any of these values could create significantly sub-optimal behavior and ideally there would be a lot more validation and behavioral checks. That said, all parties on this PR and anyone likely to provide these overrides are expert users to whom control should be ceded rather than second-guessed. The original values for all of these parameters were "tuned" to meet some original configurations anyhow, so none of these should be considered immutable or omniscient.

I agree that when setting a value, it should probably be set to directly the value specified rather than a modified version of that value.

Copy link
Member

@kquick kquick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. I agree these are needed and I'm happy to approve them with the requested small changes in implementation.



DEFAULT_MAX_TRANSMIT_PERIOD = timedelta(minutes=5)
DEFAULT_MAX_TRANSMIT_PERIOD = timedelta(minutes=int(os.environ.get("THESPIAN_TRANSMIT_TIMEOUT_MINUTES", 5)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add str_to_timedelta and getenvdef to the utilis import above, and then this can be DEFAULT_MAX_TRANSMIT_PERIOD = getenvdev('THESPIAN_TRANSMIT_TIMEOUT_PERIOD', str_to_timedelta, timedelta(minutes=5))

DROP_TRANSMITS_LEVEL = MAX_QUEUED_TRANSMITS + 100
MAX_QUEUED_TRANSMITS = int(os.environ.get("THESPIAN_MAX_QUEUED_TRANSMITS", 950))
QUEUE_TRANSMIT_UNBLOCK_THRESHOLD = int(os.environ.get("THESPIAN_QUEUED_TRANSMIT_UNBLOCK_THRESHOLD", 780))
DROP_TRANSMITS_LEVEL = MAX_QUEUED_TRANSMITS + int(os.environ.get("THESPIAN_DROP_TRANSMITS_LEVEL", 100))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use getenvdef("...", int, defval) for these as well. If you implemented these following an example elsewhere in the code from before getenvdef was available, my apologies, and please feel free to add tickets for those (or make a PR).

DROP_TRANSMITS_LEVEL = MAX_QUEUED_TRANSMITS + 100
MAX_QUEUED_TRANSMITS = int(os.environ.get("THESPIAN_MAX_QUEUED_TRANSMITS", 950))
QUEUE_TRANSMIT_UNBLOCK_THRESHOLD = int(os.environ.get("THESPIAN_QUEUED_TRANSMIT_UNBLOCK_THRESHOLD", 780))
DROP_TRANSMITS_LEVEL = MAX_QUEUED_TRANSMITS + int(os.environ.get("THESPIAN_DROP_TRANSMITS_LEVEL", 100))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect THESPIAN_DROP_TRANSMITS_LEVEL to determine the final value of DROP_TRANSMITS_LEVEL, not part of it.

Yeah, that seems badly named since it's a calculated value - but that's how it was getting set in the original code so I didn't want to mess with it too much. It should either be the value of the env var, or be named something different. I'm fine with whichever you think makes the most sense

The intent of the original code was to ensure that the DROP_TRANSMITS_LEVEL always exceeded MAX_QUEUED_TRANSMITS to most effectively use the various fallback capabilities. In general, there's a danger here that overrides for any of these values could create significantly sub-optimal behavior and ideally there would be a lot more validation and behavioral checks. That said, all parties on this PR and anyone likely to provide these overrides are expert users to whom control should be ceded rather than second-guessed. The original values for all of these parameters were "tuned" to meet some original configurations anyhow, so none of these should be considered immutable or omniscient.

I agree that when setting a value, it should probably be set to directly the value specified rather than a modified version of that value.

@gbanasiak
Copy link
Contributor

@jeff-loughlin Will you find some spare time in the coming weeks for this? If not I'm happy to introduce the requested changes.

@jeff-loughlin
Copy link
Contributor Author

@jeff-loughlin Will you find some spare time in the coming weeks for this? If not I'm happy to introduce the requested changes.

Yeah I can try to get to it this weekend. I've been busy with real world stuff and haven't had much free time, but I will take a look tomorrow. The requested changes are small, so it shouldn't take too long.

@jeff-loughlin
Copy link
Contributor Author

Okay, I made the requested changes. Also added a config option for MAX_PENDING_TRANSMITS for completeness, as mentioned above, and I changed DROP_TRANSMITS_LEVEL to use the actual value instead of the calculated one (default is still the calculated value). This does mean the user can potentially set that to something smaller than MAX_QUEUED_TRANSMITS (with likely catastrophic results), but as discussed above, these are advanced settings, so we can assume they know what they're doing.

Let me know if you need any additional changes.

@jeff-loughlin jeff-loughlin requested a review from kquick January 10, 2025 15:04
Copy link
Member

@kquick kquick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants