
computeRollupGas eats all the CPUs #117

Closed
arnaudbriche opened this issue Jan 10, 2024 · 11 comments


@arnaudbriche

arnaudbriche commented Jan 10, 2024

System information

Erigon version: erigon version 0.02.0-unstable (docker image testinprod/op-erigon:2.48.1-0.2.0-amd64)

OS & Version: Linux

Commit hash:

Erigon Command (with flags/config): erigon --datadir=/data/op-erigon --ethash.dagdir=/data/op-erigon --snapshots=false --private.api.addr=0.0.0.0:9090 --http.addr=0.0.0.0 --http.port=8545 --http.vhosts=* --http.corsdomain=* --http.compression=true --authrpc.addr=0.0.0.0 --authrpc.port=8551 --authrpc.vhosts=* --authrpc.jwtsecret=/data/op-erigon/jwt.hex --http.api=eth,erigon,web3,net,debug,trace,txpool,engine --ws=true --ws.compression=true --db.pagesize=16KB --db.size.limit=8TB --db.read.concurrency=96 --torrent.port=42069 --port=30303 --nat=any --networkid=8453 --metrics=true --metrics.addr=0.0.0.0 --metrics.port=6060 --pprof=true --pprof.addr=0.0.0.0 --pprof.port=6061 --rpc.batch.concurrency=2 --rpc.batch.limit=10000 --rpc.returndata.limit=1048576 --nodiscover --rollup.sequencerhttp=https://mainnet-sequencer.base.org

Consensus Layer: op-node

Consensus Layer Command (with flags/config): op-node --l1=<L1_RPC_URL> --l2=http://localhost:8551 --l2.jwt-secret=/data/op-erigon/jwt.hex --rpc.addr=0.0.0.0 --rpc.port=9545 --l1.trustrpc --l1.rpckind=erigon --l1.rpc-rate-limit=0 --l1.rpc-max-batch-size=100 --l1.http-poll-interval=12s --metrics.enabled --metrics.addr=0.0.0.0 --metrics.port=6062 --rollup.config=/data/op-node/rollup.json

Chain/Network:

Expected behaviour

op-erigon should not spend most of its CPU time on an atomic store in the computeRollupGas function.

Actual behaviour

Most of the CPU is spent on an atomic store in the computeRollupGas function.

Screenshot 2024-01-10 at 17:31:20

Here is a pprof profile taken on the node.

profile.pb.gz
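The profile shape described above is consistent with many goroutines unconditionally re-storing the same value in a shared atomic: even when the value never changes, each store invalidates the cache line for every other core. A minimal illustrative sketch of that pattern (hypothetical names, not Erigon's actual code):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// hammer has every worker unconditionally re-store the same value in a
// shared atomic. The value never changes, but each Store still forces the
// cache line to bounce between cores, so with enough workers the profile
// becomes dominated by the atomic store itself.
func hammer(workers, iters int) int64 {
	var cached atomic.Int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				cached.Store(42) // unconditional store: cache-line ping-pong
			}
		}()
	}
	wg.Wait()
	return cached.Load()
}

func main() {
	fmt.Println(hammer(8, 100000)) // prints 42
}
```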

Steps to reproduce the behaviour

The node is in sync. I just ran an RPC client doing calls at relatively modest concurrency (100 RPS, 5 concurrent calls). The calls are mostly trace calls.
The observed behaviour triggers quickly after the client starts sending requests.

@arnaudbriche
Author

Possibly of interest: even after I stopped the RPC client for 24h, CPU usage on the machine is still high and the profile looks nearly the same, with no RPC traffic at all.
profile (2).pb.gz

@ImTei
Member

ImTei commented Jan 16, 2024

@arnaudbriche Are you still having this issue, or was it a one-time issue?

@arnaudbriche
Author

arnaudbriche commented Jan 16, 2024

@ImTei I had the issue for a long time, and it was very easy to reproduce.
Then I tried to upgrade op-node and op-erigon, and now my node is stuck.

op-node image: us-docker.pkg.dev/oplabs-tools-artifacts/images/op-node:v1.4.0
op-erigon image: testinprod/op-erigon:2.51.0-0.3.0-amd64

I can see this message in erigon logs:

[WARN] [01-16|10:20:58.080] Served conn=[::1]:48042 method=engine_forkchoiceUpdatedV2 reqid=18168 t=100.809µs err="missing withdrawals list"

And this one is op-node:

t=2024-01-16T10:23:59+0000 lvl=warn msg="Derivation process temporary error" attempts=16344 err="engine stage failed: temp: temporarily cannot insert new safe block: failed to create new block via forkchoice: unrecognized rpc error: missing withdrawals list"

@ImTei
Member

ImTei commented Jan 17, 2024

@arnaudbriche That error seems to be caused by a missing Canyon config. You're using a manual rollup config JSON when you run op-node. Does the config have the Canyon time? I recommend using the --network=base-mainnet flag.

And can you give me an example of RPC calls to reproduce?

@arnaudbriche
Author

@ImTei Yes, that was the issue. I had to resync my node from scratch with the --network=base-mainnet flag.

Regarding my previous issue, the calls were mostly trace_block and eth_getBlockReceipts. Doing tens of these in parallel always led to the contention issue on the atomic.

@ImTei
Member

ImTei commented Jan 23, 2024

@arnaudbriche Sorry for the delayed response. Due to a lack of resources, we are unable to investigate this issue right now. We plan to handle it in Q1. Please be patient.

@arnaudbriche
Author

@ImTei No problem. My node is now synced again and running the latest version. The issue still exists. Please let me know if I can help debug whenever you're working on this.

@arnaudbriche
Author

Hi @ImTei, I spent a bit of time debugging the issue and arrived at a fix.
I sent a PR.
The fix has been running on my node for 3 days without any issue.
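The PR itself isn't quoted in this thread, but a common remedy for this class of contention is the check-before-store pattern: read the atomic first (cheap, doesn't dirty other cores' caches) and store only on an actual change. A sketch of that pattern, under the assumption that it resembles the fix:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// costCache is a hypothetical illustration of check-before-store; the
// type and field names are invented for this sketch.
type costCache struct {
	blockNum atomic.Uint64
}

// update only writes the atomic when the value actually changed. Plain
// loads keep the cache line in the shared state across cores, so the
// read-mostly fast path no longer triggers cache-line ping-pong.
func (c *costCache) update(blockNum uint64) {
	if c.blockNum.Load() != blockNum {
		c.blockNum.Store(blockNum)
	}
}

func main() {
	var c costCache
	for i := 0; i < 5; i++ {
		c.update(100) // same block: only the first call performs a store
	}
	fmt.Println(c.blockNum.Load()) // prints 100
}
```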

@ImTei
Member

ImTei commented Feb 13, 2024

@arnaudbriche Thank you for your great work! Our team will review the PR and get back to you.

@arnaudbriche
Author

arnaudbriche commented Jun 18, 2024

Thanks guys! Closing the issue.

@ImTei
Member

ImTei commented Jun 20, 2024
