Skip to content

Improve handling of open file descriptors #3784

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jan 15, 2024
Merged

Improve handling of open file descriptors #3784

merged 8 commits into from
Jan 15, 2024

Conversation

lava
Copy link
Member

@lava lava commented Jan 9, 2024

Attempt to improve the behavior of nodes that are running into the maximum number of open files:

  • Increase the upper limit imposed by our systemd unit files to 64Ki
  • Add monitoring of open file descriptors as a health metric so that leaks can be detected by looking at the rate of change over time.

@lava lava added bug Incorrect behavior improvement An incremental enhancement of an existing feature maintenance Tasks for keeping up the infrastructure and removed bug Incorrect behavior labels Jan 9, 2024
@lava lava force-pushed the topic/fd-ulimits branch from 030c942 to 4edfa00 Compare January 9, 2024 15:07
Copy link
Member

@mavam mavam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind adding a changelog entry?

It'd be great if we can reuse logic from os.cpp instead of hand-rolling it again. Otherwise this good.

@lava lava force-pushed the topic/fd-ulimits branch from 2e0437a to 4edfa00 Compare January 9, 2024 17:09
@lava lava force-pushed the topic/health-metrics branch 2 times, most recently from 92aa0d1 to 8311e2e Compare January 12, 2024 10:43
@lava lava force-pushed the topic/fd-ulimits branch from 4edfa00 to fb62682 Compare January 12, 2024 12:38
@lava lava force-pushed the topic/health-metrics branch from 8311e2e to f4561ab Compare January 12, 2024 12:41
@lava lava force-pushed the topic/fd-ulimits branch 6 times, most recently from e2bf523 to 9ad0b58 Compare January 12, 2024 16:23
@lava lava force-pushed the topic/health-metrics branch from 4266b29 to 1b43460 Compare January 12, 2024 16:41
@lava lava force-pushed the topic/fd-ulimits branch from 9ad0b58 to 09b07c5 Compare January 12, 2024 16:50
Copy link
Member

@mavam mavam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. This doubles down on our effort to centralize OS-specific aspects.

Just minor nits in the review comments.

@lava lava force-pushed the topic/health-metrics branch 5 times, most recently from 4074359 to 311a45c Compare January 14, 2024 18:48
Base automatically changed from topic/health-metrics to main January 14, 2024 22:03
lava added 3 commits January 14, 2024 23:04
Attempt to improve the behavior of nodes that are running
into the maximum number of open files:

  * Increase the upper limit imposed by our systemd unit files
    to 64Ki. We avoid going all the way to `LimitNOFILE=infinity`,
    so that in the case of a file descriptor leak the node will
    fail at some point rather than the system hitting the maximum
    number of global open files.
  * Add monitoring of open file descriptors as a health metric
    so that leaks can be detected by looking at the rate of
    change over time.
@lava lava force-pushed the topic/fd-ulimits branch from 09b07c5 to 82dd87b Compare January 14, 2024 22:04
@lava lava enabled auto-merge January 15, 2024 08:30
@lava lava merged commit 379355f into main Jan 15, 2024
@lava lava deleted the topic/fd-ulimits branch January 15, 2024 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement An incremental enhancement of an existing feature maintenance Tasks for keeping up the infrastructure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants