Peak memory usage during initial block download causes OOM on 4GB machine #6268
I'm running an IBD with |
I stopped it at
since it had only made 1% of progress over 12 hours (because that's when it hits the sandblasting/DoS blocks). I'm currently loading the file in |
In
/** Time to wait (in seconds) between writing blocks/block index to disk. */
- static const unsigned int DATABASE_WRITE_INTERVAL = 60 * 60;
+ static const unsigned int DATABASE_WRITE_INTERVAL = 5 * 60;
and redo the same heaptrack test. |
i'm trying this with heaptrack (and also on the 4GB machine without heaptrack); i think you might be right on, since the current value of |
I get an OOM on the 4GB machine, and this time I get it much earlier:
|
Allow me to add my observations; more pairs of eyes see better. Peak memory consumption happens when the first full block is processed, after the initial headers sync. At that moment this chain of calls can occur: ProcessNewBlock() => ActivateBestChain() => FlushStateToDisk() => WriteBatchSync() for all instantiated mapBlockIndex entries (downloaded block headers). Since the batch is now large (1710080 entries), larger than before the headers-first fix, peak memory consumption is much more significant. |
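To get a feel for the scale involved, here is a rough back-of-the-envelope sketch. It assumes that every one of the 1710080 entries still holds its Equihash solution in memory, that a solution is 1344 bytes (the size for Zcash's Equihash parameters n=200, k=9), and it ignores every other field of a block index entry. The names `SolutionBytes` and `ToGiB` are hypothetical helpers for illustration, not functions from the node.

```cpp
#include <cstdint>

// Size in bytes of one Equihash (n=200, k=9) solution.
constexpr std::uint64_t kEquihashSolutionBytes = 1344;

// Lower bound on the memory held by the solutions of `entries`
// block index entries (assumption: one solution per entry,
// all other per-entry fields ignored).
constexpr std::uint64_t SolutionBytes(std::uint64_t entries) {
    return entries * kEquihashSolutionBytes;
}

// Convert a byte count to GiB for readability.
constexpr double ToGiB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

Under these assumptions `SolutionBytes(1710080)` is 2298347520 bytes, a little over 2.1 GiB for the solutions alone. If the write batch holds a second, serialized copy while the in-memory entries are still live, the transient footprint would be correspondingly larger, which is uncomfortable on a 4GB machine.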
@daira and I are looking into how Cache sizes are all derived as segments of the configured-or-default Lines 1665 to 1688 in 60a43b9
Upstream does so here (minimally different): The node then uses the in-memory coins cache to influence when flushing happens. In Lines 3695 to 3699 in 60a43b9
In upstream Bitcoin Core they also allow it to consume unused parts of the mempool size limit: https://github.com/bitcoin/bitcoin/blob/78aee0fe2ca72e7bc0f36e7479574ebc1f6a9bee/src/validation.cpp#L2325-L2355 I think that what we need to do is have a function that measures the size of the "chain index cache" (currently just the in-memory Equihash solutions), and then include that in the flush timing decisions, as part of the cache memory controlled by |
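The idea of counting the chain index cache towards the flush decision could look roughly like the sketch below. This is only an illustration under stated assumptions: `FlushMode`, `CacheUsage`, `ShouldFlush`, and the "within 10% of the budget" threshold for periodic flushes are hypothetical and are not taken from the node's code.

```cpp
#include <cstddef>

// Hypothetical flush modes: Periodic is an opportunistic check when
// the node has time, IfNeeded is the check made during block processing.
enum class FlushMode { IfNeeded, Periodic };

// Hypothetical snapshot of cache memory usage, in bytes.
struct CacheUsage {
    std::size_t coinsBytes;       // in-memory coins cache
    std::size_t chainIndexBytes;  // e.g. in-memory Equihash solutions
};

// Decide whether to flush, counting both caches against one budget.
bool ShouldFlush(FlushMode mode, const CacheUsage& usage,
                 std::size_t budgetBytes) {
    const std::size_t total = usage.coinsBytes + usage.chainIndexBytes;
    if (mode == FlushMode::Periodic) {
        // Flush early once usage is within 10% of the budget.
        return total > budgetBytes - budgetBytes / 10;
    }
    // Otherwise flush only once the budget is actually exceeded.
    return total > budgetBytes;
}
```

With this shape, a growing set of in-memory solutions triggers a flush even when the coins cache itself is small, which is the behaviour the proposal is after.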
Here are the default cache sizes visible in @softminus' logs above (which sum to 450MiB, the default for
So currently by default we are allocating at most 2MiB for LevelDB to use for internal caching of the chain index, and 120MiB for equivalent caching of the chain state (UTXO set, commitment tree states, nullifier sets, history tree leaves/nodes etc). These end up defining the size of the LRU cache for LevelDB blocks (overriding the default of an 8MiB cache): Lines 21 to 26 in 60a43b9
|
The Equihash memory usage issue is only a problem for IBD; once we've reached the chain tip, Equihash solutions will be trimmed as soon as their blocks are connected. Meanwhile, due to the headers-first fix in #6231, |
@str4d wrote:
It's not quite right that the quantity we need to limit is just the total size of in-memory Equihash solutions. When we call If the batch includes Equihash solutions, then we are only trimming those after we've had the peak memory usage. This is fixable independently; we could trim solutions as we are writing the batch, by moving the call to |
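A toy model of "trim solutions as we are writing the batch" is sketched below. It is not the node's code: `IndexEntry`, `WriteBatchTrimming`, and the use of a plain byte vector in place of a real database write batch are all assumptions made for illustration. The function copies each solution into the batch, frees the in-memory copy immediately, and reports the peak number of bytes held by live solutions plus the batch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a block index entry.
struct IndexEntry {
    std::vector<std::uint8_t> solution;
};

// Append every solution to `batch`, trimming each one right after it
// has been copied. Returns the peak of (live solution bytes + batch bytes).
std::size_t WriteBatchTrimming(std::vector<IndexEntry>& entries,
                               std::vector<std::uint8_t>& batch) {
    std::size_t live = 0;
    for (const IndexEntry& e : entries) live += e.solution.size();

    std::size_t peak = live + batch.size();
    for (IndexEntry& e : entries) {
        batch.insert(batch.end(), e.solution.begin(), e.solution.end());
        peak = std::max(peak, live + batch.size());
        live -= e.solution.size();
        // Swap with an empty vector so the memory is actually released.
        std::vector<std::uint8_t>().swap(e.solution);
    }
    return peak;
}
```

In this model the peak is the total solution size plus one extra solution, whereas trimming only after the whole batch has been built would hold roughly twice the total at the peak.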
Upstream added a |
Looking more closely, I see that |
Yes, that's what I was thinking: we don't need to incur the complexity of avoiding atomic writes because we can estimate in advance how much memory |
The experiment at #6276 (comment) confirms that #6231 was a regression. As I said there:
|
Imo, reverting the "Headers-first fix" is an opportunistic resolution of this issue. Is there a way to keep it and control its behavior through a config option? It could be disabled by default, while nodes equipped with more resources would be able to enable it and properly skip the expensive checks. |
I answered @miodragpop's question at #6276 (comment). |
Upstream added code to calculate an approximation of this memory usage in bitcoin/bitcoin@e66dbde; see also #6290. |
Fixed by #6276. |
Restoring headers-first behaviour without a memory regression is being tracked in #6292. |
The steady-state memory utilization has been decreased by #6192, but initial block download on a 4GB machine still reproducibly fails with a kernel-delivered OOM kill:
Here's the full log without the ProcessNewBlocks: