

NAS-128752 / 24.10 / fix catalog.query to stop calling zfs.dataset.query (by yocalebo) #13660

Merged: yocalebo merged 1 commit into master from NAS-128752-24.10 on May 3, 2024


bugclerk (Contributor) commented May 3, 2024

24.04.0 was released, and we immediately got reports of increased CPU usage when users browsed to the Apps web page. The UI team found that they had mistakenly been calling chart.release.query in a tight loop. While this was the trigger for the increased CPU usage, it was not the root cause. Further investigation revealed a cascading set of problems, each building on the previous one:

  1. UI called chart.release.query in a tight loop
  2. chart.release.query called catalog.query
  3. catalog.query called zfs.dataset.query
  4. zfs.dataset.query is in the zfs plugin, so it gets executed in our process pool
  5. because this was happening in a tight loop, our process pool became exhausted. Each process in the pool executes 5 tasks before it is torn down and a new child process is fork+exec'ed, so the pool got into a vicious cycle where we were constantly fork+exec'ing ourselves to death
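To make item 5 concrete, here is a minimal sketch of worker recycling. It assumes Python's standard `multiprocessing.Pool` with `maxtasksperchild=5` is a fair stand-in for the middleware's process pool; the PR does not show the pool's actual implementation. `count_workers` and `TASKS_PER_CHILD` are hypothetical names. Every task is trivial, yet the pool has to replace a worker roughly every five calls, so the number of processes created grows with the number of calls rather than staying at the pool size. The sketch uses the POSIX-only `fork` start method, which forks without exec'ing, so it understates the cost of the fork+exec cycle described above.

```python
import multiprocessing
import os

TASKS_PER_CHILD = 5  # the per-worker task limit quoted in item 5 (assumed to map to maxtasksperchild)


def count_workers(n_calls: int, processes: int = 2) -> int:
    """Submit n_calls trivial tasks and return how many distinct worker processes ran them."""
    ctx = multiprocessing.get_context("fork")  # POSIX only; forks without exec
    with ctx.Pool(processes=processes, maxtasksperchild=TASKS_PER_CHILD) as pool:
        # os.getpid stands in for a cheap query and also reports which worker served it.
        # Each apply_async call is one task counted against maxtasksperchild.
        pending = [pool.apply_async(os.getpid) for _ in range(n_calls)]
        pids = {p.get() for p in pending}
    return len(pids)


if __name__ == "__main__":
    for n in (10, 100, 500):
        print(f"{n:>4} calls -> {count_workers(n)} worker processes")
```

With a limit of 5 tasks per child, `n` calls need at least `n / 5` worker processes, whereas a pool without the limit would serve them all with its initial `processes` workers.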

The UI team has fixed their problem, but there was an underlying design issue with catalog.query: it shouldn't have been calling zfs.dataset.query at all, especially given the information it was retrieving. To fix the design issue, I switched it to the self.dataset_mounted method, which does not use our process pool and is 100x faster. This should prevent the process pool exhaustion scenario described above and also speed up chart.release.query and the catalog.query calls it makes.

Original PR: #13659
Jira URL: https://ixsystems.atlassian.net/browse/NAS-128752

yocalebo merged commit 876f1fe into master on May 3, 2024
2 of 3 checks passed
yocalebo deleted the NAS-128752-24.10 branch on May 3, 2024 at 19:57
bugclerk (Contributor, Author) commented May 3, 2024

This PR has been merged and conversations have been locked.
If you would like to discuss more about this issue please use our forums or raise a Jira ticket.

truenas locked as resolved and limited conversation to collaborators on May 3, 2024