Problem encountered
When network history is enabled, the LevelDB file seems to stay locked longer than usual. When network history is disabled, I do not see this kind of error. I see it mostly on mainnet (probably because the server is slower there), where we run 13 validators and a data-node.
I think it would be useful to add a mechanism to the data-node that waits for that file to be released.
We randomly get the following errors:
failed to create and publish segment: failed to create snapshot: failed to get data dump metadata: failed to get database version: FATAL: terminating connection due to administrator command (SQLSTATE 57P01)
failed to initialise network history:failed to create networkHistory service:failed to create network history store:failed to create index:failed to open level db file:resource temporarily unavailable
Error: maximum number of possible restarts has been reached: failed to execute binary /jenkins/workspace/common/system-tests-lnl-mainnet/networkdata/testnet/visor/visor13/current/vega [datanode node --home /jenkins/workspace/common/system-tests-lnl-mainnet/networkdata/testnet/data-node/node13]: exit status 255
Usage:
  vegavisor run [flags]

Flags:
  -h, --help          help for run
      --home string   Path to visor home folder
Steps to reproduce
Manual
We propose a protocol upgrade in system-tests here: https://github.com/vegaprotocol/system-tests/blob/devops-infra/1522-3/tests/LNL/extended_test.py#L311-L316
After that, the visor is unable to restart the node.