Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve compatibility testing against Prometheus and TSDB. #758

Open
bwplotka opened this issue Jan 24, 2019 · 15 comments
Open

Improve compatibility testing against Prometheus and TSDB. #758

bwplotka opened this issue Jan 24, 2019 · 15 comments

Comments

@bwplotka
Copy link
Member

bwplotka commented Jan 24, 2019

Let's start with acceptance criteria for our compatibity tests:

  • I would like to know with which versions of Prometheus Thanos supports on each PR.

But what we mean by supports? We have essentially 2 points of contact:

  • TSDB format (including very low level index and metadata scheme)
  • HTTP API (api/v1/flags, api/v1/config, api/v1/label/, api/v1/read, api/v1/snapshot` etc)

Storage format can change but is versioned (index and metadata separatedly e.g index version changed somewhere between 2.0 and 2.2.1), HTTP API should not for v1 but things get added (e.g api/v1/flags was added in v2.2.1, snaphot endpoint was extended etc)

Goal: Support all minor Prometheus versions (e.g 2.0, 2.2, ... 2.7.. etc) There are expections. For example broken Prometheus releases like 2.1.x. This means that we would like to test and support to tip of minor version (e.g 2.4.3 for 2.4).

How we test this now?

Now (Before #704 PR or #730 lands, depending which will land first), our current method for testing compatibility is to perform on CI:

SUPPORTED_PROM_VERSIONS ?=v2.2.1 v2.3.2 v2.4.3 v2.5.0

@for ver in $(SUPPORTED_PROM_VERSIONS); do \
		THANOS_TEST_PROMETHEUS_PATH="prometheus-$$ver" THANOS_TEST_ALERTMANAGER_PATH="alertmanager-$(ALERTMANAGER_VERSION)" go test $(shell go list ./... | grep -v /vendor/ | grep -v /benchmark/); \
	done

This runs ALL our tests with different THANOS_TEST_PROMETHEUS_PATH var which controlled which Prometheus binary is used for our e2e tests (we have quite few of them. All tests that ends up with e2e suffix in name). This tests were fine to check if we support our common points as mentioned above.

The problems we see:

  • Upgrade of TSDB Golang dependencies (like here) blocks our ability to do any advance testing methods like injecting blocks here or here. This is because, obviously as new Promethus versions are backward compatible with old TSDB format versions, the old Prometheus versions are not forward compatible with new format.
  • We run ALL tests against different Prometheus versions using external for loop. This means:
  • What if config/flags change in some Prometheus version?

Note that upgrading TSDB and Prometheus dependencies is essential to stay up to date with fixes and recent optimizations. We reuse lots of packages.

Extra:
As a nice-to-have we would like to make sure anyone can grab TSDB block from object storage to Prometheus and use it there. This means that we need test if compactor produced block is compatible with Prometheus and if yes, with what version (aiming for just latest is fine). How to test that?

@bwplotka
Copy link
Member Author

bwplotka commented Jan 24, 2019

Proposed solution:

  • As we clearly have problem mostly in our tests and only around tooling (injecting block), I would propose adding another TSDB dependency pinned to version used by Prometheus 2.0 to be sure that those blocks will be understandable by all Prometheus versions (Prometheus will rewrite those anyway), so we can run test smoothly. We potentially can still differentiate to CreateBlock2_0 vs CreateBlock done by to different TSDBs.
  • Instead of for looping in makefile, for loop in code like here: https://github.com/improbable-eng/thanos/blob/bbd0b52490f0352d7e0fd7dabd5648dfc7717623/pkg/testutil/prometheus.go#L88 We might store supported version in code as well and add option to limit to certain version for some tests to make this explicit. That should remove all the concerns around caching, signal handling and different tests for different range of Prometheus versions. TBD: How to install Prometheus versions if list of those are in Golang code?

Any thoughts?

@jojohappy
Copy link
Member

A quick idea to solve those problems:

We might hit golang using cache all the time, because changing some environment variable is not seen by caching logic and it can assume code being not changed, thus cache being used.

Could we execute go clean -testcache first for each loop?

Something is wrong with signal handling, as some tests are green, but actually should fail

Add or operator to check testing result:
THANOS_TEST_PROMETHEUS_PATH="prometheus-$$ver" THANOS_TEST_ALERTMANAGER_PATH="alertmanager-$(ALERTMANAGER_VERSION)" go test $(shell go list ./... | grep -v /vendor/ | grep -v /benchmark/) || exit 1;

@bwplotka
Copy link
Member Author

Agree, but controlling with what version of Prometheus to run ALL tests might be not enough as some tests requires some versions by design.. But we could skip tests that has versions below of something (for code that does not need to be backward compatible) (: That would work.

@stale
Copy link

stale bot commented Jan 11, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 11, 2020
@stale stale bot closed this as completed Jan 18, 2020
@bwplotka
Copy link
Member Author

We still need that ^^ (:

@stale
Copy link

stale bot commented Apr 9, 2020

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

@stale stale bot added the stale label Apr 9, 2020
@bwplotka bwplotka removed the stale label Apr 10, 2020
@stale
Copy link

stale bot commented May 10, 2020

Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label May 10, 2020
@bwplotka
Copy link
Member Author

I have some ideas that are slowly unblocked with the work on https://github.com/bwplotka/gobin (:

Otherwise no movement, but would be nice to have.

@stale stale bot removed the stale label May 11, 2020
@stale
Copy link

stale bot commented Jun 10, 2020

Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jun 10, 2020
@bwplotka bwplotka removed the stale label Jun 10, 2020
@stale
Copy link

stale bot commented Jul 10, 2020

Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jul 10, 2020
@stale
Copy link

stale bot commented Jul 17, 2020

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Jul 17, 2020
@GiedriusS GiedriusS reopened this Sep 2, 2021
@stale stale bot removed the stale label Sep 2, 2021
@stale
Copy link

stale bot commented Jan 9, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jan 9, 2022
@GiedriusS GiedriusS removed the stale label Jan 10, 2022
@stale
Copy link

stale bot commented Apr 17, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Apr 17, 2022
@GiedriusS GiedriusS removed the stale label Apr 17, 2022
@stale
Copy link

stale bot commented Sep 21, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Sep 21, 2022
@johurul000
Copy link

@GiedriusS is this still needed? If needed should I give it a try?

@stale stale bot removed the stale label Jan 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants