
panic: runtime error: index out of range in pyroscope/pkg/util/loser.(*Tree[...]).Winner(...) #3917

Open · Mrucznik opened this issue Feb 14, 2025 · 4 comments
Labels: storage (Low level storage matters), type/bug (Something isn't working)

Comments

@Mrucznik

Describe the bug

pyroscope_1 | panic: runtime error: index out of range [-1]
pyroscope_1 |
pyroscope_1 | goroutine 2007 [running]:
pyroscope_1 | github.com/grafana/pyroscope/pkg/util/loser.(*Tree[...]).Winner(...)
pyroscope_1 | github.com/grafana/pyroscope/pkg/util/loser/tree.go:99
pyroscope_1 | github.com/grafana/pyroscope/pkg/iter.(*TreeIterator[...]).Err(0x10?)
pyroscope_1 | github.com/grafana/pyroscope/pkg/iter/tree.go:25 +0x4d
pyroscope_1 | github.com/grafana/pyroscope/pkg/parquet.(*IteratorRowReader).ReadRows(0xc128eefbe0, {0xc024c19c08, 0x40, 0xc0029dd440?})
pyroscope_1 | github.com/grafana/pyroscope/pkg/parquet/row_reader.go:86 +0x172
pyroscope_1 | github.com/grafana/pyroscope/pkg/parquet.CopyAsRowGroups({0x3878b88, 0xc07db0bb60}, {0x386f920, 0xc128eefbe0}, 0x186a0)
pyroscope_1 | github.com/grafana/pyroscope/pkg/parquet/row_writer.go:32 +0xea
pyroscope_1 | github.com/grafana/pyroscope/pkg/phlaredb.(*profileStore).writeRowGroups(0xc031236b40, {0xc10592edc0?, 0x4f31140?}, {0xc0029dd350, 0x3, 0x484302d8?})
pyroscope_1 | github.com/grafana/pyroscope/pkg/phlaredb/profile_store.go:383 +0x21a
pyroscope_1 | github.com/grafana/pyroscope/pkg/phlaredb.(*profileStore).Flush(0xc031236b40, {0x3888f10, 0x4f31140})
pyroscope_1 | github.com/grafana/pyroscope/pkg/phlaredb/profile_store.go:186 +0x317
pyroscope_1 | github.com/grafana/pyroscope/pkg/phlaredb.(*Head).flush(0xc084d24140, {0x3888f10, 0x4f31140})
pyroscope_1 | github.com/grafana/pyroscope/pkg/phlaredb/head.go:563 +0x234
pyroscope_1 | github.com/grafana/pyroscope/pkg/phlaredb.(*Head).Flush(0xc084d24140, {0x3888f10, 0x4f31140})
pyroscope_1 | github.com/grafana/pyroscope/pkg/phlaredb/head.go:540 +0xd9
pyroscope_1 | github.com/grafana/pyroscope/pkg/phlaredb.(*PhlareDB).Flush.func1(0xc084d24140, 0x7f8509a673c8?)
pyroscope_1 | github.com/grafana/pyroscope/pkg/phlaredb/phlaredb.go:240 +0xb4
pyroscope_1 | github.com/samber/lo.Filter[...]({0xc10def6b00, 0x1, 0x7}, 0xc018bcbde0?)
pyroscope_1 | github.com/samber/lo@v1.38.1/slice.go:15 +0x9f
pyroscope_1 | github.com/grafana/pyroscope/pkg/phlaredb.(*PhlareDB).Flush(0xc003ddeb40, {0x3888f10, 0x4f31140}, 0x0, {0x28a5490, 0xc})
pyroscope_1 | github.com/grafana/pyroscope/pkg/phlaredb/phlaredb.go:238 +0x932
pyroscope_1 | github.com/grafana/pyroscope/pkg/phlaredb.(*PhlareDB).loop(0xc003ddeb40)
pyroscope_1 | github.com/grafana/pyroscope/pkg/phlaredb/phlaredb.go:188 +0x28f
pyroscope_1 | created by github.com/grafana/pyroscope/pkg/phlaredb.New in goroutine 1883
pyroscope_1 | github.com/grafana/pyroscope/pkg/phlaredb/phlaredb.go:137 +0x476

To Reproduce

Steps to reproduce the behavior:

I don't know; I didn't investigate it that much.

Expected behavior

No panic.

Environment

  • Infrastructure: bare-metal, docker on Linux 3.10.0-1062.9.1.el7.x86_64 CentOS Linux 7
  • Deployment tool: docker-compose

Additional Context

Pyroscope version is grafana/pyroscope:1.12.0, run with -target=all. Mostly standard configuration (besides limits), using S3 as the data store.

@kolesnikovae (Collaborator)

Hi @Mrucznik,

Thank you for reporting the issue! Could you please clarify whether the problem occurs persistently, or whether it happened for the first time and is not reproducible?

I suspect there might be a silent filesystem failure that we do not handle correctly on our end. Could you please specify how the volume is mounted, which filesystem is being used, and whether there have been any issues, such as running out of space or similar problems (one indication of this could be the "cleaned files after high disk utilization" message in the log)?

kolesnikovae added the type/bug and storage labels on Feb 14, 2025
@Mrucznik (Author) commented Feb 14, 2025

Pyroscope was working without problems for about a month; the panics started happening yesterday and have happened 3 times so far. The filesystem is XFS.

This is my docker-compose configuration:

version: '3.3'

services:
  pyroscope:
    image: "grafana/pyroscope:1.12.0"
    ports:
      - "7070:4040" # http
      - "7071:9095" # grpc
    volumes:
      - ./config.yaml:/etc/pyroscope/config.yaml:ro
      - ./data:/var/lib/pyroscope
    command: ["-target=all", "-config.file=/etc/pyroscope/config.yaml"]

I didn't find "cleaned files after high disk utilization" in the logs.
Are you suggesting that it could happen due to running out of disk space? Currently I don't see a problem with that, but maybe the files get deleted when the Docker container gets killed. I will take a look at that.

@Mrucznik (Author) commented Feb 21, 2025

Today I found very high RAM usage by Pyroscope, along with the following logs:

[Image: inuse space profile]

pyroscope_1  | ts=2025-02-21T10:27:43.162488147Z caller=compactor.go:646 level=error component=compactor component=compactor msg="failed to compact user blocks" tenant=anonymous err="compaction: group 0@17241709254077376921-merge--1740038400000-1740042000000: compact blocks [data-compactor/compact/0@17241709254077376921-merge--1740038400000-1740042000000/01JMH5XT1EWPPKJ5A2TVBVH2J7 data-compactor/compact/0@17241709254077376921-merge--1740038400000-1740042000000/01JMHADR4M3EE9B7EPVKZRR2J9]: compact blocks [data-compactor/compact/0@17241709254077376921-merge--1740038400000-1740042000000/01JMH5XT1EWPPKJ5A2TVBVH2J7 data-compactor/compact/0@17241709254077376921-merge--1740038400000-1740042000000/01JMHADR4M3EE9B7EPVKZRR2J9]: decoding page 19 of column \"Samples.list.element.StacktraceID\": decoding definition levels of data page v2: unexpected EOF"

@kolesnikovae (Collaborator)

Thanks for the details @Mrucznik!

The log message may indicate a file system error (I'd expect to see a CRC error in this case, however).
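
If you want to check whether the block's parquet file itself decodes cleanly outside of Pyroscope, here is a minimal sketch using github.com/parquet-go/parquet-go (the parquet library Pyroscope builds on, as far as I can tell); the block path is a placeholder for one of the block directories named in the compactor error:

// Hypothetical standalone check (not part of Pyroscope): read every row of a
// block's profiles.parquet to see whether the file itself decodes cleanly.
// The path below is a placeholder for one of the blocks from the error.
package main

import (
	"fmt"
	"io"
	"log"
	"os"

	"github.com/parquet-go/parquet-go"
)

func main() {
	f, err := os.Open("path/to/block/01JMH5XT1EWPPKJ5A2TVBVH2J7/profiles.parquet")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	st, err := f.Stat()
	if err != nil {
		log.Fatal(err)
	}

	pf, err := parquet.OpenFile(f, st.Size())
	if err != nil {
		log.Fatalf("opening parquet metadata: %v", err)
	}

	total := 0
	buf := make([]parquet.Row, 256)
	for i, rg := range pf.RowGroups() {
		rows := rg.Rows()
		for {
			n, err := rows.ReadRows(buf)
			total += n
			if err == io.EOF {
				break
			}
			if err != nil {
				log.Fatalf("row group %d: read error after %d rows: %v", i, total, err)
			}
		}
		rows.Close()
	}
	fmt.Printf("read %d rows without errors\n", total)
}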

Could you please post the configuration you use and tell us more about the setup? If you could also specify what profilers you're using and the ingestion rate (the size and the number of profiles sent to pyroscope), that would be very helpful.

The profile fragment may suggest that overly large blocks get compacted, or that too many stack traces are stored in blocks. In the meantime, without the full profile, I can't conclude anything. As far as I understand, this is a heap alloc_space profile collected over a period of time: the sum of all allocations, including freed ones. You probably want to check an inuse_space profile collected over a very short period, ideally just a single profile.
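
For a single point-in-time snapshot, one option is to fetch the server's heap profile once and open it with the inuse_space sample index. A minimal sketch in Go, assuming the standard /debug/pprof endpoints are reachable on the server's HTTP port (the URL and port are placeholders):

// Hypothetical one-shot capture of the server's heap profile. It assumes the
// standard Go /debug/pprof endpoints are exposed on Pyroscope's HTTP port
// (4040 inside the container, mapped to 7070 on the host in the compose file
// above).
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	resp, err := http.Get("http://localhost:4040/debug/pprof/heap")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		log.Fatalf("unexpected status: %s", resp.Status)
	}

	out, err := os.Create("heap.pb.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Fatal(err)
	}
	// Inspect in-use memory only:
	//   go tool pprof -sample_index=inuse_space heap.pb.gz
}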
