[YSQL] [Upgrade] Rolling Upgrade - Invalid argument: Index 13 does not reference a valid sidecar #21229
Closed
1 task done
Labels
2024.1_blocker
area/ysql
Yugabyte SQL (YSQL)
kind/bug
This issue is a bug
priority/highest
Highest priority issue
qa_itest-system
Bugs identified in itest-system automation
Comments
karthik-ramanathan-3006 added a commit that referenced this issue on Feb 29, 2024
…ed to output of EXPLAIN(ANALYZE, DIST)." Summary: This reverts commit b5c632c. The addition of new fields to the `PgsqlResponsePB` proto seems to have introduced upgrade failures in the 2.14/2.16 --> 2.20.2. Reverting this change until this issue can be analyzed and fixed. Jira: DB-10156, DB-569 Test Plan: Jenkins Reviewers: telgersma Reviewed By: telgersma Subscribers: aaruj, mihnea, smishra, yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D32727
karthik-ramanathan-3006 added a commit that referenced this issue on Mar 18, 2024
…ed to output of EXPLAIN(ANALYZE, DIST)." Summary: This reverts commit b5c632c. The addition of new fields to the PgsqlResponsePB proto seems to have introduced upgrade failures in the 2.14/2.16 --> 2.20.X path. Reverting this change until this issue can be analyzed and fixed. D32727 reverted the change in branch 2.20.2. This revision reverts the change in mainline 2.20 so that subsequent 2.20.X will not have this change by default. Jira: DB-10156, DB-569 Test Plan: Jenkins Reviewers: telgersma, smishra Reviewed By: smishra Subscribers: yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D33214
karthik-ramanathan-3006 added a commit that referenced this issue on Mar 26, 2024
Summary: It has been reported that upgrades from 2.14/2.16 to 2.20 (and beyond) fail due to pggate going into a crash loop upon unpacking `PgsqlResponsePB`. This is caused due to the introduction of the 'Scanned Rows' field in 2.20+ (D31111) which is sent in its own RPC metrics sidecar. Versions of pggate lower than 2.17.1 are not capable of unpacking response protos that contain RPC sidecars holding data other than the rows returned by DocDB. During an upgrade, while an un-upgraded pggate may send a request only to its local un-upgraded tserver, it may have responses proxied back from upgraded tservers on other nodes. Thus, the RPC infrastructure needs to be forward compatible in order to ensure that pggate is not broken during upgrades. This revision introduces a guardrail to check that the receiving pggate is capable of unpacking the RPC metrics sidecar before sending the 'Scanned Rows' count. Required backports: 2.20 (original diff + this fix), 2024.1 (only this fix), 2.21 (if needed, depending on branching) Jira: DB-10156 Test Plan: Run rolling upgrade itest from 2.14/2.16 to master, 2.20.x, 2.21.x, 2024.1 Reviewers: hsunder, esheng, sergei Reviewed By: hsunder Subscribers: ybase, yql, mihnea, smishra Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D33503
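The guardrail described in this commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual pggate or tserver code: the `Response` class, `build_response`, `legacy_unpack`, and the boolean `client_accepts_metrics_sidecar` flag are all hypothetical names, and the source does not say how the real capability check is signalled. The legacy receiver here simply rejects any response that carries more than the row-data sidecar; it does not reproduce the exact "Index 13 does not reference a valid sidecar" error.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Response:
    """Toy stand-in for a response message plus its RPC sidecars (hypothetical)."""
    sidecars: List[bytes] = field(default_factory=list)
    rows_sidecar: Optional[int] = None     # index of the row-data sidecar
    metrics_sidecar: Optional[int] = None  # index of the metrics sidecar, if sent


def build_response(rows: bytes, scanned_rows: int,
                   client_accepts_metrics_sidecar: bool) -> Response:
    """Sender side: always attach row data, attach metrics only if supported."""
    resp = Response()
    resp.sidecars.append(rows)
    resp.rows_sidecar = 0
    # Guardrail (assumed to be a simple capability flag for this sketch):
    # skip the extra sidecar when the receiver has not declared support.
    if client_accepts_metrics_sidecar:
        resp.sidecars.append(scanned_rows.to_bytes(8, "little"))
        resp.metrics_sidecar = len(resp.sidecars) - 1
    return resp


def legacy_unpack(resp: Response) -> bytes:
    """Receiver side: models an old client that only understands row data."""
    if len(resp.sidecars) != 1:
        raise ValueError("unexpected sidecar in response")
    return resp.sidecars[resp.rows_sidecar]
```

With the flag off, the legacy receiver unpacks the rows normally; with the flag on, the same receiver raises, which is the situation the guardrail is meant to prevent when an upgraded tserver answers an un-upgraded pggate.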
karthik-ramanathan-3006 added a commit that referenced this issue on Mar 29, 2024
…ding scanned rows. Summary: Original commit: 80dc997 / D33503 It has been reported that upgrades from 2.14/2.16 to 2.20 (and beyond) fail due to pggate going into a crash loop upon unpacking `PgsqlResponsePB`. This is caused due to the introduction of the 'Scanned Rows' field in 2.20+ (D31111) which is sent in its own RPC metrics sidecar. Versions of pggate lower than 2.17.1 are not capable of unpacking response protos that contain RPC sidecars holding data other than the rows returned by DocDB. During an upgrade, while an un-upgraded pggate may send a request only to its local un-upgraded tserver, it may have responses proxied back from upgraded tservers on other nodes. Thus, the RPC infrastructure needs to be forward compatible in order to ensure that pggate is not broken during upgrades. This revision introduces a guardrail to check that the receiving pggate is capable of unpacking the RPC metrics sidecar before sending the 'Scanned Rows' count. Required backports: 2.20 (original diff + this fix), 2024.1 (only this fix), 2.21 (if needed, depending on branching) Jira: DB-10156 Test Plan: Run rolling upgrade itest from 2.14/2.16 to master, 2.20.x, 2.21.x, 2024.1 Reviewers: hsunder, esheng, sergei Reviewed By: hsunder Subscribers: smishra, mihnea, yql, ybase Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D33550
karthik-ramanathan-3006 added a commit that referenced this issue on Apr 2, 2024
…upgrade fix
Summary: This revision re-enables the 'Storage Rows Scanned' functionality for `EXPLAIN (ANALYZE, DIST)` in branch 2.20 after applying the upgrade-related fix introduced in 80dc997 (D33503). This 'Storage Rows Scanned' functionality was originally introduced as part of commit b5c632c (D31931) on 2.20 and reverted as part of commit a54db61 (D32727).
Jira: DB-10156, DB-569
Test Plan: Run the following tests to validate the 'Storage Rows Scanned' functionality.
```
./yb_build.sh --java-test org.yb.pgsql.TestPgExplainAnalyze
./yb_build.sh --java-test org.yb.pgsql.TestPgExplainAnalyzeColocated
./yb_build.sh --java-test org.yb.pgsql.TestPgExplainAnalyzeScans#testIndexScanConditionAndFilter
```
To validate the upgrade pathways, run the rolling upgrade itest from 2.14/2.16 to 2.20.x.
Reviewers: telgersma, hsunder, smishra
Reviewed By: telgersma
Subscribers: mihnea, ybase, yql
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D33663
Jira Link: DB-10156
Description
During a rolling-upgrade automation run on 2.20.2.0-b143, we get this error:
connection to server at "10.9.215.60", port 5433 failed: FATAL: Invalid argument: Index 13 does not reference a valid sidecar
It started with build b126: https://jenkins.dev.yugabyte.com/job/itest-system-developer/10512/
It does not happen while running any query; it appears to happen while connecting to a node.
In Postgres logs, we can see this error:
This issue is not seen in the b125 run: https://jenkins.dev.yugabyte.com/job/itest-system-developer/10509/
This is seen when upgrading from 2.14 or 2.16 to 2.20.2.0.
Logs: https://drive.google.com/file/d/14-q-BgJE1mk0CKrDSqZ2TbwgNTtYe3dJ/view?usp=sharing
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information