This repository has been archived by the owner on May 6, 2022. It is now read-only.

Initial sync may block with CPU maxed out #6

Closed
tdiesler opened this issue Dec 30, 2020 · 3 comments

@tdiesler
Owner

tdiesler commented Dec 30, 2020

On arm64 (e.g. Raspberry Pi) you may see the following:

  • Initial syncing stops
  • CPU is at 200%
  • Memory < 10%

This looks like a tight endless loop or perhaps some I/O condition that cannot be recovered from.
With an even slower USB-A storage device, this condition showed up almost immediately.

On x86_64 this works fine.

CrossRef: IntersectMBO/cardano-node#2251

@tdiesler tdiesler added the bug Something isn't working label Dec 30, 2020
@tdiesler
Owner Author

Oh dear, it shows up with the external drive as well:

[cdrelay:cardano.node.ChainDB:Info:34534] [2020-12-31 17:36:49.20 UTC] before next, messages elided = 157994328664733
[cdrelay:cardano.node.ChainDB:Info:34534] [2020-12-31 17:36:49.20 UTC] Valid candidate 29e33c21b1d02a39e7dc3ccf87c726c69c4fd194fccf5f999d81d300edef82f3 at slot 2087260
[cdrelay:cardano.node.ChainDB:Notice:35] [2020-12-31 17:36:49.21 UTC] Chain extended, new tip: 29e33c21b1d02a39e7dc3ccf87c726c69c4fd194fccf5f999d81d300edef82f3 at slot 2087260
[cdrelay:cardano.node.ChainDB:Notice:35] [2020-12-31 17:36:49.22 UTC] Chain extended, new tip: bd92b30513d6d641fc3abb003582b5ed283e68c19b727b83351302f1737c4944 at slot 2087261
[cdrelay:cardano.node.Mempool:Info:34538] [2020-12-31 17:36:49.26 UTC] fromList [("tx",Object (fromList [("txid",String "txid: TxId {_unTxId = \"deba7b6e4266c4fd7122f313d521efa5ce905defa33a108db0369a987926114a\"}")])),("kind",String "TraceMempoolRejectedTx"),("mempoolSize",Object (fromList [("numTxs",Number 0.0),("bytes",Number 0.0)])),("err",Object (fromList [("txEra",String "Shelley"),("kind",String "HardForkApplyTxErrWrongEra"),("currentEra",String "Byron")]))]
[cdrelay:cardano.node.Mempool:Info:34538] [2020-12-31 17:36:49.26 UTC] fromList [("tx",Object (fromList [("txid",String "txid: TxId {_unTxId = \"873ba2c802f471f5f54758875e1dfc68986324dc4c08f00ed344fbce1e64c85d\"}")])),("kind",String "TraceMempoolRejectedTx"),("mempoolSize",Object (fromList [("numTxs",Number 0.0),("bytes",Number 0.0)])),("err",Object (fromList [("txEra",String "Shelley"),("kind",String "HardForkApplyTxErrWrongEra"),("currentEra",String "Byron")]))]
[cdrelay:cardano.node.Mempool:Info:34538] [2020-12-31 17:36:49.26 UTC] fromList [("tx",Object (fromList [("txid",String "txid: TxId {_unTxId = \"b811744db11bf595c054ae1c978f23d48f7d6bfb31511d228fc3218bf85ba144\"}")])),("kind",String "TraceMempoolRejectedTx"),("mempoolSize",Object (fromList [("numTxs",Number 0.0),("bytes",Number 0.0)])),("err",Object (fromList [("txEra",String "Shelley"),("kind",String "HardForkApplyTxErrWrongEra"),("currentEra",String "Byron")]))]
CONTAINER ID   NAME      CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O        PIDS
fcddd2457003   relay     199.49%   788.5MiB / 7.633GiB   10.09%    3.54GB / 61.3MB   53.9MB / 168kB   27
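For anyone trying to catch this condition automatically, here is a minimal sketch of a stall check based on the log format shown above. It assumes that a healthy sync keeps emitting "Chain extended, new tip" lines and that a long gap since the last one indicates the hang. The function names (`last_tip`, `is_stalled`) and the 10-minute threshold are made up for illustration and are not part of cardano-node.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches ChainDB lines such as:
# [...ChainDB:Notice:35] [2020-12-31 17:36:49.22 UTC] Chain extended, new tip: <hash> at slot 2087261
TIP_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) UTC\] "
    r"Chain extended, new tip: (?P<hash>[0-9a-f]+) at slot (?P<slot>\d+)"
)


def last_tip(lines):
    """Return (timestamp, slot) of the last 'Chain extended' line, or None."""
    found = None
    for line in lines:
        m = TIP_RE.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            found = (ts.replace(tzinfo=timezone.utc), int(m.group("slot")))
    return found


def is_stalled(lines, now, max_idle=timedelta(minutes=10)):
    """True if the newest tip is older than max_idle (hypothetical threshold).

    Returns False when no tip line was seen at all, because the log
    alone cannot tell a stall from a node that has only just started.
    """
    tip = last_tip(lines)
    if tip is None:
        return False
    return now - tip[0] > max_idle
```

Feeding it the tail of the node log together with `datetime.now(timezone.utc)` would flag the situation above, where the tip stops advancing while the CPU stays at about 200%.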

@tdiesler tdiesler reopened this Dec 31, 2020
@tdiesler tdiesler changed the title from "Initial sync on RaspberryPi may lock with CPU maxed out" to "Initial sync may block with CPU maxed out" Jan 4, 2021
@tdiesler
Owner Author

tdiesler commented Jan 4, 2021

If the node is given the block data from elsewhere, it seems to run fine afterwards.

@Scalextrix

Scalextrix commented Mar 11, 2021

I compiled version 1.25.1 from source on a Pi 4 8GB running Ubuntu Server 64-bit and am seeing a similar issue. It was compiled with cabal-install 3.4.0.0 and ghc 8.10.2. The node runs and starts to sync, but after about 20 minutes cardano-cli query tip --mainnet becomes unresponsive. Running cardano-node in the terminal shows that the sync appears to stop while the node keeps running at full tilt; left overnight, it consumed 80% of RAM but wrote no additional blocks to disk. Restarting cardano-node resumes the sync until it fails again. cardano-cli did error out on socket exhaustion once, but I can't reproduce that condition reliably.

Maybe a localhost socket exhaustion issue?

tdiesler added a commit that referenced this issue Mar 14, 2021
[resolves #7] Upgrade to GHC-8.10.4
[resolves #10] Update to Cabal-3.4.0.0
@tdiesler tdiesler added this to the v1.25.1-rev3 milestone Mar 14, 2021