Unable to fetch data after compaction #1983

Closed
nextus opened this issue Jan 10, 2020 · 4 comments · Fixed by #1985

nextus commented Jan 10, 2020

Thanos, Prometheus and Golang version used:

thanos, version 0.9.0 (branch: HEAD, revision: 0833cad83db8b257a2275ec83a3d034c73659056)
  build user:       root@e65730470597
  build date:       20191203-16:50:47
  go version:       go1.13.1

Object Storage Provider:

AWS S3

What happened:
Can't access to the historical data from the object storage after compaction (looks similar to #552 and #146):
Mint: 1571750427740 Maxt: 1578584483987: rpc error: code = Aborted desc = fetch series for block 01DY61NKQNEYFM6BFPDJXZRPQX: preload series: invalid remaining size 65536, expected 71774
Furthermore, it seems persistent for a specific workload: Prometheus scrapes node_exporter metrics from the hosts. Right now I have three level-4 blocks (Nov 15 - Nov 28, Nov 28 - Dec 12, Dec 12 - Dec 26) and several level-3 blocks (Dec 26 - Jan 07). After compaction, I can still fetch historical data from the first block (Nov 15 - Nov 28), but not from the others. The query fails with this response from the Store:

return errors.Errorf("invalid remaining size %d, expected %d", len(c), n+int(l))

I have two replicas, and the problem exists on both of them. The reason the oldest compacted block is still accessible could be a bug introduced in the 0.8 or 0.9 releases: I updated the Thanos version for all components on 11 Dec.
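
For context, here is a minimal sketch of the kind of check that produces this error, assuming a fixed 64 KiB per-series read budget (which matches the 65536 in the message). The names and helper below are illustrative, not the actual Thanos code:

```go
// Sketch: the store preloads a fixed-size window per series, then validates
// it against the length encoded in the data itself. A series larger than the
// budget trips the "invalid remaining size" error.
package main

import (
	"encoding/binary"
	"fmt"

	"github.com/pkg/errors"
)

// estimatedSeriesSize is an assumed fixed read budget (64 KiB = 65536).
const estimatedSeriesSize = 64 * 1024

// checkSeriesChunk validates that the preloaded bytes c contain the whole
// entry: a uvarint length l followed by l payload bytes.
func checkSeriesChunk(c []byte) error {
	l, n := binary.Uvarint(c)
	if n < 1 {
		return errors.New("reading series length failed")
	}
	if len(c) < n+int(l) {
		return errors.Errorf("invalid remaining size %d, expected %d", len(c), n+int(l))
	}
	return nil
}

func main() {
	// Simulate a series entry whose encoded length exceeds the 64 KiB budget.
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, 71000) // payload claims ~71 KB
	preloaded := append(buf[:n], make([]byte, estimatedSeriesSize-n)...)
	fmt.Println(checkSeriesChunk(preloaded[:estimatedSeriesSize]))
}
```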

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Full logs to relevant components:

These are the Compactor logs related to one corrupted block (nothing useful, actually):

Jan 10 00:14:56 compactor thanos[17493]: level=info ts=2020-01-10T00:14:56.157049362Z caller=compact.go:441 msg="compact blocks" count=7 mint=1577318400000 maxt=1578528000000 ulid=01DY61NKQNEYFM6BFPDJXZRPQX sources="[01DX6EP3TY2ZP2ENWTSFK1NM7M 01DXAHQ4BMY4NHR1GMZW9H43FJ 01DXFQ0SZM4ZT71PF462D845HA 01DXMV9EJZYZRF4M6Q3FNPFPR3 01DXSZXCV9AEHWY3SSP5A2X1NY 01DXZ4S3GH5FQ94G5S9DV71KT3 01DY49RC1BSSA7KRSZZSBF3XXF]" duration=3h15m32.817318757s
Jan 10 01:56:46 compactor thanos[17493]: level=info ts=2020-01-10T01:56:46.679141025Z caller=compact.go:988 compactionGroup=0@17441958918695203419 msg="deleting compacted block" old_block=01DX6EP3TY2ZP2ENWTSFK1NM7M
Jan 10 01:56:48 compactor thanos[17493]: level=info ts=2020-01-10T01:56:48.879983109Z caller=compact.go:988 compactionGroup=0@17441958918695203419 msg="deleting compacted block" old_block=01DXAHQ4BMY4NHR1GMZW9H43FJ
Jan 10 01:56:51 compactor thanos[17493]: level=info ts=2020-01-10T01:56:51.015043604Z caller=compact.go:988 compactionGroup=0@17441958918695203419 msg="deleting compacted block" old_block=01DXFQ0SZM4ZT71PF462D845HA
Jan 10 01:56:53 compactor thanos[17493]: level=info ts=2020-01-10T01:56:53.26142969Z caller=compact.go:988 compactionGroup=0@17441958918695203419 msg="deleting compacted block" old_block=01DXMV9EJZYZRF4M6Q3FNPFPR3
Jan 10 01:56:55 compactor thanos[17493]: level=info ts=2020-01-10T01:56:55.416314035Z caller=compact.go:988 compactionGroup=0@17441958918695203419 msg="deleting compacted block" old_block=01DXSZXCV9AEHWY3SSP5A2X1NY
Jan 10 01:56:57 compactor thanos[17493]: level=info ts=2020-01-10T01:56:57.586528832Z caller=compact.go:988 compactionGroup=0@17441958918695203419 msg="deleting compacted block" old_block=01DXZ4S3GH5FQ94G5S9DV71KT3
Jan 10 01:57:00 compactor thanos[17493]: level=info ts=2020-01-10T01:57:00.033471227Z caller=compact.go:988 compactionGroup=0@17441958918695203419 msg="deleting compacted block" old_block=01DY49RC1BSSA7KRSZZSBF3XXF

Anything else we need to know:

@bwplotka bwplotka added the bug label Jan 10, 2020
bwplotka (Member) commented:

Just checked and it is a valid bug. No one expected a series size larger than this. Rationale: #146 (comment)

We need to handle this case for you. This limit has been there since version 0.1.0, so it's not a 0.9.0 regression. We need to either increase it for you or allow re-fetching. Let me try to put up a PR for it.
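
A hedged sketch of what the re-fetching approach could look like, assuming a fixed preload budget and an object-storage range reader; fetchRange and loadSeries are illustrative placeholders, not the actual change in #1985:

```go
// Sketch: if the fixed-size preload turns out to be smaller than the length
// encoded in the entry itself, fetch the exact range again instead of failing.
package storesketch

import (
	"encoding/binary"

	"github.com/pkg/errors"
)

const estimatedSeriesSize = 64 * 1024 // assumed fixed preload budget

func loadSeries(fetchRange func(start, length int) ([]byte, error), start int) ([]byte, error) {
	c, err := fetchRange(start, estimatedSeriesSize)
	if err != nil {
		return nil, err
	}
	l, n := binary.Uvarint(c)
	if n < 1 {
		return nil, errors.New("reading series length failed")
	}
	if len(c) >= n+int(l) {
		return c[n : n+int(l)], nil // fast path: the estimate was large enough
	}
	// Slow path: the series is bigger than the estimate. Re-fetch with the
	// exact size instead of returning "invalid remaining size"; a counter
	// could track how often this happens.
	if c, err = fetchRange(start, n+int(l)); err != nil {
		return nil, err
	}
	return c[n : n+int(l)], nil
}
```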

bwplotka added a commit that referenced this issue Jan 10, 2020
Fixes: #1983

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
bwplotka (Member) commented:

Fix: #1985

bwplotka added a commit that referenced this issue Jan 10, 2020
Fixes: #1983

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

nextus commented Jan 10, 2020

Thanks for the fast response! Will try it tomorrow.


nextus commented Jan 11, 2020

Just tried it and it works smoothly.
There is also a new counter for keeping track of this kind of event:
thanos_bucket_store_series_refetches_total

Looking forward to it in the new release!
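
For anyone wanting to watch this metric, here is a minimal, illustrative sketch (not the actual Thanos registration code, and the help text is made up) of how a counter with this name is typically defined and incremented with the Prometheus Go client:

```go
package storesketch

import "github.com/prometheus/client_golang/prometheus"

// seriesRefetches mirrors the metric name mentioned above; help text is
// illustrative only.
var seriesRefetches = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "thanos_bucket_store_series_refetches_total",
	Help: "Number of times series data had to be refetched because the fixed size estimate was too small.",
})

func init() {
	prometheus.MustRegister(seriesRefetches)
}

// recordRefetch would be called on the slow path shown in the earlier sketch.
func recordRefetch() {
	seriesRefetches.Inc()
}
```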

bwplotka added a commit that referenced this issue Jan 11, 2020
Fixes: #1983

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this issue Jan 12, 2020
Fixes: #1983

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
brancz pushed a commit to brancz/objstore that referenced this issue Jan 28, 2022
Fixes: thanos-io/thanos#1983

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>