close all files when PostgreSQL is not running #13
Comments
Just to expand on Josh's comment: the above issues are a common bug that we encounter repeatedly, and one that prevents the zone from being rebooted, leaving it in the "down" state (e.g. an operator accidentally leaving their shell cwd within a zone subdirectory). When this happens, ops opens a ticket and involves an engineer, who has to manually track down the offending process and kill it before the zone can be restarted.
We appear to have seen this again a few times with the prefaulter running inside the Manatee zone. This was on Manatee image
Huh. When I wrote the initial version of this I apparently noticed this was missing and then totally forgot about it, but I did file an issue requesting the functionality needed to get the eviction handlers to fire on a Purge event: bluele/gcache#37. I'll add this to our copy and then PR it upstream.
See
Generate some data:
Look in the prefaulter logs:
Look at
Terminate
Generate some more data to create a backlog and somewhere in the middle of this restart
Verifying the state of the prefaulter before starting PostgreSQL:
Look at the logs to see what's there:
Check
Terminate PostgreSQL at will and when we do, check
Useful logging that is relevant:
Of note, for non-SmartOS platforms that don't potentially have
Unexpected errors by the database will result in the process exiting. Similarly, if the
As I understand it, a version of the prefaulter with this fix was running in production today, and we ran into the same problem again.
Manatee Failover Testing
Test Procedure
When PostgreSQL is not running, the prefaulter appears to retain a large number of open files within the PostgreSQL data directory. So that we can cleanly unmount and remount the data file system in Manatee, the prefaulter needs to close these descriptors promptly, and avoid opening any new ones, until it is determined that PostgreSQL is running again. It would also be good to make sure the current working directory of the prefaulter process never resides within the PostgreSQL data directory.