[DocDB][Packed Columns] Corruption (yb/master/sys_catalog_writer.cc:219): Unable to initialize catalog manager: Failed to initialize sys tables async: Failed log replay. Reason: System catalog snapshot is corrupted or built using different build type: Fou #14369
Comments
@def-, does this repro with the latest fixes?
I’m not sure whether something specific to the old version or the new version caused the problem during upgrade. I can retry from 2.15.4.0-b54 to current master.
I did not see this failure in that upgrade, so I will close the issue for now.
I have even seen this issue now on 2.17.1.0-b146 without any upgrade, just by triggering a rolling restart on my puppy-food-arm-1 LRU (all packed columns options enabled):
Initially I thought it was another instance of #14767, but the error message indicates this issue instead.
Removed from 2.16 blockers since this requires YCQL packed columns, which will not be GA in 2.16 yet.
This is still failing; it just FATALed my puppy-food-arm-2 universe with this after a rolling restart:
FATAL observed in master universe with YCQL packed columns enabled when upgrading the universe from 2.17.1.0-b323 to 2.17.2.0-b11
Ran into this even today with my LRU. I had YCQL packed columns enabled on my YSQL-only LRU. I added the
Unable to repro this issue with the initial set of tests performed. Will let the fix soak in the long-running universes for another week, over more rolling restarts and upgrades while workloads are executing, before closing the issue. CC: @rthallamko3
We were able to see this issue again. Hence reopened. Seen in manual LRU
…enumeration
Summary: We expect that system catalog entries always have a binary value. In rare cases, however, we could get a NULL value for a system catalog entry. The real root cause of this issue is currently unknown. This change adds ignore_null_sys_catalog_entries (false by default), which simply ignores such entries so that clusters hitting this failure can be recovered.
Jira: DB-3792
Test Plan: Jenkins
Reviewers: bogdan, qhu, rthallam
Reviewed By: rthallam
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D25201
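The behavior described in the commit summary can be sketched roughly as follows. This is only an illustration, not the actual YugabyteDB code: `SysCatalogEntry` and `EnumerateEntries` are hypothetical names, the boolean parameter stands in for the `ignore_null_sys_catalog_entries` flag, and an exception stands in for the Corruption status that the real code reports.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one system catalog row: an id plus a binary
// value that is expected to be present but may be NULL (std::nullopt).
struct SysCatalogEntry {
  std::string id;
  std::optional<std::string> value;
};

// Hypothetical enumeration helper (not the real API).
// With ignore_null_entries == false (the default described in the commit),
// a NULL value is treated as corruption and enumeration fails.
// With ignore_null_entries == true, NULL entries are skipped.
std::vector<std::pair<std::string, std::string>> EnumerateEntries(
    const std::vector<SysCatalogEntry>& entries, bool ignore_null_entries) {
  std::vector<std::pair<std::string, std::string>> result;
  for (const auto& entry : entries) {
    if (!entry.value.has_value()) {
      if (ignore_null_entries) {
        continue;  // Skip the NULL entry so loading can proceed.
      }
      throw std::runtime_error(
          "Corruption: NULL value for system catalog entry " + entry.id);
    }
    result.emplace_back(entry.id, *entry.value);
  }
  return result;
}
```

In this sketch, enumeration fails on the first NULL entry when the option is off, which mirrors a master that cannot finish loading the system catalog. Turning the option on trades strictness for the ability to bring the cluster back up.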
Observed this issue in my long-running universe, @spolitov
… entries during enumeration
Summary: We expect that system catalog entries always have a binary value. In rare cases, however, we could get a NULL value for a system catalog entry. The real root cause of this issue is currently unknown. This change adds ignore_null_sys_catalog_entries (false by default), which simply ignores such entries so that clusters hitting this failure can be recovered.
Original commit: 60e7777/D25201
Jira: DB-3792
Test Plan: Jenkins
Reviewers: bogdan, qhu, rthallam
Reviewed By: qhu, rthallam
Subscribers: ybase
Differential Revision: https://phorge.dev.yugabyte.com/D27429
Jira Link: DB-3792
Description
During an upgrade of my puppy-food-arm-1 universe from 2.15.4.0-b54 to 2.15.4.0-b72, the master server fails to come up:
It keeps failing like this every minute when trying to start the master again.
The upgrade failure is also about this master server:
This is with packed columns enabled on YSQL and YCQL, tserver and master.
I will leave the universe in its current state for further analysis; tell me when I can destroy and recreate it.