During cluster initialization in any of the e2e tests, the init-job-op-archive pod logs the errors shown below. The failed attempts are eventually retried, but the retry backoff adds time to tests that are already very slow.
achulkov2@nebius-yt-dev:~$ kubectl logs yt-scheduler-init-job-op-archive-btqb6 -nquerytrackeraco
++ export YT_DRIVER_CONFIG_PATH=/config/client.yson
++ YT_DRIVER_CONFIG_PATH=/config/client.yson
+++ /usr/bin/ytserver-all --version
+++ head -c4
++ export YTSAURUS_VERSION=23.1
++ YTSAURUS_VERSION=23.1
++ /usr/bin/init_operation_archive --force --latest --proxy http-proxies.querytrackeraco.svc.cluster.local
2024-01-10 19:37:20,124 - INFO - Transforming archive from 48 to 48 version
2024-01-10 19:37:20,134 - INFO - Mounting table //sys/operations_archive/jobs
Traceback (most recent call last):
File "/usr/bin/init_operation_archive", line 749, in <module>
main()
File "/usr/bin/init_operation_archive", line 744, in main
force=args.force,
File "/usr/bin/init_operation_archive", line 731, in run
transform_archive(client, next_version, target_version, force, archive_path, shard_count=shard_count)
File "/usr/bin/init_operation_archive", line 639, in transform_archive
mount_table(client, path)
File "/usr/bin/init_operation_archive", line 55, in mount_table
client.mount_table(path, sync=True)
File "/usr/local/lib/python3.7/dist-packages/yt/wrapper/client_impl_yandex.py", line 1394, in mount_table
freeze=freeze, sync=sync, target_cell_ids=target_cell_ids)
File "/usr/local/lib/python3.7/dist-packages/yt/wrapper/dynamic_table_commands.py", line 524, in mount_table
response = make_request("mount_table", params, client=client)
File "/usr/local/lib/python3.7/dist-packages/yt/wrapper/driver.py", line 126, in make_request
client=client)
File "<decorator-gen-3>", line 2, in make_request
File "/usr/local/lib/python3.7/dist-packages/yt/wrapper/common.py", line 422, in forbidden_inside_job
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/yt/wrapper/http_driver.py", line 301, in make_request
client=client)
File "/usr/local/lib/python3.7/dist-packages/yt/wrapper/http_helpers.py", line 455, in make_request_with_retries
return RequestRetrier(method=method, url=url, **kwargs).run()
File "/usr/local/lib/python3.7/dist-packages/yt/wrapper/retries.py", line 79, in run
return self.action()
File "/usr/local/lib/python3.7/dist-packages/yt/wrapper/http_helpers.py", line 410, in action
_raise_for_status(response, request_info)
File "/usr/local/lib/python3.7/dist-packages/yt/wrapper/http_helpers.py", line 290, in _raise_for_status
raise error_exc
yt.common.YtResponseError: Error committing transaction 1-44d-10001-b753
Error committing transaction 1-44d-10001-b753 at cell 65726e65-ad6b7562-10259-79747361
No healthy tablet cells in bundle "sys"
***** Details:
Received HTTP response with error
origin yt-scheduler-init-job-op-archive-btqb6 on 2024-01-10T19:37:20.203965Z
url http://http-proxies.querytrackeraco.svc.cluster.local/api/v4/mount_table
request_headers {
"User-Agent": "Python wrapper 0.13-dev-5f8638fc66f6e59c7a06708ed508804986a6579f",
"Accept-Encoding": "gzip, identity",
"X-Started-By": "{\"pid\"=17;\"user\"=\"root\";}",
"X-YT-Header-Format": "<format=text>yson",
"Content-Type": "application/x-yt-yson-text",
"X-YT-Correlation-Id": "d71f4e98-4f2880b3-9213c0d0-9a5a9336"
}
response_headers {
"Content-Length": "1242",
"X-YT-Response-Message": "Error committing transaction 1-44d-10001-b753",
"X-YT-Response-Code": "1",
"X-YT-Response-Parameters": {},
"X-YT-Trace-Id": "c0235705-98e9c7a-369cf397-97d28dd7",
"X-YT-Error": "{\"code\":1,\"message\":\"Error committing transaction 1-44d-10001-b753\",\"attributes\":{\"host\":\"hp-0.http-proxies.querytrackeraco.svc.cluster.local\",\"pid\":1,\"tid\":12837479201307132255,\"fid\":18446447647636925386,\"datetime\":\"2024-01-10T19:37:20.202367Z\",\"trace_id\":\"c0235705-98e9c7a-369cf397-97d28dd7\",\"span_id\":1636727892750608515,\"cluster_id\":\"Native(Name=test-ytsaurus)\",\"path\":\"//sys/operations_archive/jobs\"},\"inner_errors\":[{\"code\":1,\"message\":\"Error committing transaction 1-44d-10001-b753 at cell 65726e65-ad6b7562-10259-79747361\",\"attributes\":{\"host\":\"hp-0.http-proxies.querytrackeraco.svc.cluster.local\",\"pid\":1,\"tid\":12837479201307132255,\"fid\":18446447647636925386,\"datetime\":\"2024-01-10T19:37:20.202206Z\",\"trace_id\":\"c0235705-98e9c7a-369cf397-97d28dd7\",\"span_id\":1636727892750608515},\"inner_errors\":[{\"code\":1,\"message\":\"No healthy tablet cells in bundle \\\"sys\\\"\",\"attributes\":{\"request_id\":\"dc5643d9-124e57a5-cf4b0583-8753d056\",\"connection_id\":\"6b2e13-a3e8b3e0-314a5f40-69069dfd\",\"verification_mode\":\"none\",\"realm_id\":\"65726e65-ad6b7562-10259-79747361\",\"timeout\":30000,\"method\":\"CommitTransaction\",\"address\":\"ms-0.masters.querytrackeraco.svc.cluster.local:9010\",\"encryption_mode\":\"optional\",\"service\":\"TransactionSupervisorService\"}}]}]}",
"X-YT-Request-Id": "93a09617-71caa1ec-cbfe7e46-922f5a1f",
"Content-Type": "application/json",
"Cache-Control": "no-store",
"X-YT-Proxy": "hp-0.http-proxies.querytrackeraco.svc.cluster.local",
"Authorization": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
params {
"suppress_transaction_coordinator_sync": false,
"path": "//sys/operations_archive/jobs",
"freeze": false,
"mutation_id": "124ef88f-86123fd-62afd823-512f2084",
"retry": false
}
transparent True
Error committing transaction 1-44d-10001-b753
origin hp-0.http-proxies.querytrackeraco.svc.cluster.local on 2024-01-10T19:37:20.202367Z (pid 1, tid b227e3515560815f, fid fffef266ed3d2bca)
trace_id c0235705-98e9c7a-369cf397-97d28dd7
span_id 1636727892750608515
cluster_id Native(Name=test-ytsaurus)
path //sys/operations_archive/jobs
Error committing transaction 1-44d-10001-b753 at cell 65726e65-ad6b7562-10259-79747361
origin hp-0.http-proxies.querytrackeraco.svc.cluster.local on 2024-01-10T19:37:20.202206Z (pid 1, tid b227e3515560815f, fid fffef266ed3d2bca)
trace_id c0235705-98e9c7a-369cf397-97d28dd7
span_id 1636727892750608515
No healthy tablet cells in bundle "sys"
origin yt-scheduler-init-job-op-archive-btqb6 on 2024-01-10T19:37:20.204007Z
request_id dc5643d9-124e57a5-cf4b0583-8753d056
connection_id 6b2e13-a3e8b3e0-314a5f40-69069dfd
verification_mode none
realm_id 65726e65-ad6b7562-10259-79747361
timeout 30000
method CommitTransaction
address ms-0.masters.querytrackeraco.svc.cluster.local:9010
encryption_mode optional
service TransactionSupervisorService
We should wait for the tablet cells in the "sys" bundle to become healthy before running the init job.
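A rough sketch of what such a wait could look like, written in Python only because the init script is Python; the operator would need its own equivalent. The function name `wait_for_healthy_bundle`, its parameters, and the `"good"` health value are assumptions for illustration, not existing operator or wrapper code. The health lookup is passed in as a callable so the polling logic stays independent of how the state is actually read.

```python
import time
from typing import Callable


def wait_for_healthy_bundle(
    get_health: Callable[[], str],
    timeout: float = 300.0,
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll get_health() until it returns "good"; raise TimeoutError otherwise.

    Hypothetical helper: "good" is assumed to be the healthy state.
    """
    deadline = clock() + timeout
    while True:
        health = get_health()
        if health == "good":
            return
        if clock() >= deadline:
            raise TimeoutError(
                f"tablet cell bundle still not healthy (last state: {health!r})"
            )
        # Fixed short interval instead of the job-level retry backoff.
        sleep(poll_interval)
```

With the yt wrapper, `get_health` might be something like `lambda: client.get("//sys/tablet_cell_bundles/sys/@health")`, assuming that attribute is exposed on the bundle. Calling the helper right before `mount_table` would turn the failed mount and its backoff into a short poll.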