
Store: Hundreds of errors in logs #1257

Closed
R4scal opened this issue Jun 17, 2019 · 4 comments · Fixed by #1274

Comments

@R4scal

R4scal commented Jun 17, 2019

Hi

I have hundreds of errors in the logs of the store daemon, like:

Logs

{"cacheType":"Postings","caller":"cache.go:254","curSize":8587457979,"itemSize":4253828,"iterations":500,"level":"error","maxItemSizeBytes":4294967296,"maxSizeBytes":8589934592,"msg":"After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring.","ts":"2019-06-17T07:07:37.983393384Z"}

I think this is not a real problem. Maybe change the log level to warning?

./thanos --version
thanos, version 0.5.0 (branch: HEAD, revision: 72820b3f41794140403fd04d6da82299f2c16447)
  build user:       root@7d72e9360b09
  build date:       20190606-10:49:10
  go version:       go1.12.5
@bwplotka
Member

Thanks for the report!

So it is an error that we recover from, but it suggests really high pressure on the postings index cache in the store gateway. It's an error because in this case your cache is effectively not working, as nothing can be removed from it.
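
For context, the behaviour behind that message is roughly the following bounded eviction loop (a simplified sketch, not the exact cache.go code; the constant 500 and the two log messages come from this issue, the type and field names here are just illustrative):

    // Simplified sketch: evict LRU items until the new item fits, giving up
    // after a bounded number of evictions or when the cache is empty.
    const saneMaxIterations = 500

    type lruCache struct {
        curSize, maxSizeBytes uint64
        itemSizes             []uint64 // sizes of cached items, least recently used first
    }

    func (c *lruCache) ensureFits(itemSize uint64) bool {
        for i := 0; c.curSize+itemSize > c.maxSizeBytes; i++ {
            if i >= saneMaxIterations {
                // "After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring."
                return false
            }
            if len(c.itemSizes) == 0 {
                // "LRU has nothing more to evict, but we still cannot allocate the item. Ignoring."
                return false
            }
            // Evict the least recently used item and reclaim its bytes.
            c.curSize -= c.itemSizes[0]
            c.itemSizes = c.itemSizes[1:]
        }
        return true
    }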

Are you sure this is with the store gateway on 0.5.0?

If that's true, then #1142 might still not be enough as a fix.

cc @GiedriusS

@R4scal
Author

R4scal commented Jun 17, 2019

Yes, store 0.5.0. I can try increasing index-cache-size from 8g to 10g, but the cache is actually not a critical feature for us, because we use fast local S3 storage and don't have high RPS on the store.
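
For reference, that would be bumping the store gateway flag to something like --index-cache-size=10GB (flag spelling as I remember it for the store subcommand; worth double-checking against thanos store --help).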

@abursavich
Contributor

abursavich commented Jun 20, 2019

In my v0.4.0 store (still waiting on the next prometheus-operator release before moving to v0.5.0), once the cache thinks it's full and starts triggering evictions, it quickly starts complaining about too many iterations. It stays in this mostly-working mode until it evicts everything (while still thinking it's nearly full). After that it keeps adding and evicting (small) items for days, but the hit ratio drops to near zero.

Cache overview:
[screenshot: cache metrics]

Logs (maxed y-axis cuts off millions of errors):
[screenshot: error logs]

The logs switch from "After max sane iterations of LRU evictions, we still cannot allocate the item. Ignoring." to "LRU has nothing more to evict, but we still cannot allocate the item. Ignoring." once the true number of items in the cache falls below saneMaxIterations. If you zoom in, there are still blips of "iteration" errors as the transition occurs.

Zoomed in on transition:
[screenshot: transition]

My hypothesis is that the saneMaxIterations value of 500 is too low due to the cache having a wide range of item sizes. For the same store as above:

  • Postings average size is ~30KB (x500 = ~15MB)
  • Series average size is ~100 bytes (x500 = ~50KB)
  • The overall average size is ~4KB (x500 = ~2MB)

I have "iteration" error logs when trying to insert items ranging from ~130KB to ~13MB (most are between ~1.5MB and ~3.5MB).

Unless "nothing more to evict" starts appearing in v0.5.0 logs, I think the issue is the iteration restriction.

Average item sizes:
[screenshot: average item sizes]

@bwplotka
Member

That is very plausible! Nice.
