Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added more metadata to Series tracing; Improved tracing docs, tests; Added example. #4619

Merged
merged 3 commits into from Sep 1, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
Binary file added docs/img/tracing.png
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/img/tracing2.png
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
32 changes: 30 additions & 2 deletions docs/tracing.md
Expand Up @@ -10,7 +10,7 @@ You can either pass YAML file defined below in `--tracing.config-file` or pass t

Don't be afraid of multiline flags!

In Kubernetes it is as easy as (on Thanos sidecar example):
In Kubernetes it is as easy as (using Thanos sidecar example):

```yaml
- args:
Expand Down Expand Up @@ -38,9 +38,37 @@ In Kubernetes it is as easy as (on Thanos sidecar example):

At that point, anyone can use your provider by spec.

See [this issue](https://github.com/thanos-io/thanos/issues/1972) to check our progress on moving to OpenTelemetry Go client library.

## Usage

Once tracing is enabled and sampling per backend is configured, Thanos will generate traces for all gRPC and HTTP APIs thanks to generic "middlewares". Some more interesting to observe APIs like `query` or `query_range` have more low-level spans with focused metadata showing latency for important functionalities. For example, Jaeger view of HTTP query_range API call might look as follows:

![view](img/tracing2.png)

As you can see it contains both HTTP request and spans around gRPC request, since [Querier](components/query.md) calls gRPC services to get fetch series data.

Each Thanos component generates spans related to its work and sends them to central place e.g Jaeger or OpenTelemetry collector. Such place is then responsible to tie all spans to a single trace, showing a request execution path.

### Obtaining Trace ID

Single trace is tied to a single, unique request to the system and is composed of many spans from different components. Trace is identifiable using `Trace ID`, which is a unique hash e.g `131da78f02aa3525`. This information can be also referred as `request id` and `operation id` in other systems. In order to use trace data you want to find trace IDs that explains the requests you are interested in e.g request with interesting error, or longer latency, or just debug call you just made.

When using tracing with Thanos, you can obtain trace ID in multiple ways:

* Search by labels/attributes/tags/time/component/latency e.g. using Jaeger indexing.
* [Exemplars](https://www.bwplotka.dev/2021/correlations-exemplars/)
* If request was sampled, response will have `X-Thanos-Trace-Id` response header with trace ID of this request as value.

![view](img/tracing.png)

### Forcing Sampling

Every request against any Thanos component's API with header `X-Thanos-Force-Tracing` will be sampled if tracing backend was configured.

## Configuration

Current tracing supported backends:
Currently supported tracing backends:

### Jaeger

Expand Down
48 changes: 42 additions & 6 deletions examples/interactive/interactive_test.go
Expand Up @@ -18,6 +18,8 @@ import (
"github.com/thanos-io/thanos/pkg/objstore/client"
"github.com/thanos-io/thanos/pkg/objstore/s3"
"github.com/thanos-io/thanos/pkg/testutil"
tracingclient "github.com/thanos-io/thanos/pkg/tracing/client"
"github.com/thanos-io/thanos/pkg/tracing/jaeger"
"gopkg.in/yaml.v2"
)

Expand Down Expand Up @@ -84,7 +86,8 @@ func createData() (perr error) {
return nil
}

// Test args: -test.timeout 9999m
// TestReadOnlyThanosSetup runs read only Thanos setup that has data from `maxTimeFresh - 2w` to `maxTimeOld`, with extra monitoring and tracing for full playground experience.
// Run with test args `-test.timeout 9999m`.
func TestReadOnlyThanosSetup(t *testing.T) {
t.Skip("This is interactive test - it will until you will kill it or curl 'finish' endpoint. Uncomment and run as normal test to use it!")

Expand Down Expand Up @@ -121,6 +124,21 @@ func TestReadOnlyThanosSetup(t *testing.T) {
testutil.Ok(t, exec("cp", "-r", store1Data+"/.", filepath.Join(m1.Dir(), "bkt1")))
testutil.Ok(t, exec("cp", "-r", store2Data+"/.", filepath.Join(m1.Dir(), "bkt2")))

// Setup Jaeger.
j := e.Runnable("tracing").WithPorts(map[string]int{"http-front": 16686, "jaeger.thrift": 14268}).Init(e2e.StartOptions{Image: "jaegertracing/all-in-one:1.25"})
testutil.Ok(t, e2e.StartAndWaitReady(j))

jaegerConfig, err := yaml.Marshal(tracingclient.TracingConfig{
Type: tracingclient.JAEGER,
Config: jaeger.Config{
ServiceName: "thanos",
SamplerType: "const",
SamplerParam: 1,
Endpoint: "http://" + j.InternalEndpoint("jaeger.thrift") + "/api/traces",
},
})
testutil.Ok(t, err)

// Create two store gateways, one for each bucket (access point to long term storage).
// ┌───────────┐
// │ │
Expand All @@ -144,7 +162,13 @@ func TestReadOnlyThanosSetup(t *testing.T) {
},
})
testutil.Ok(t, err)
store1 := e2edb.NewThanosStore(e, "store1", bkt1Config, e2edb.WithImage("thanos:latest"))
store1 := e2edb.NewThanosStore(
e,
"store1",
bkt1Config,
e2edb.WithImage("thanos:latest"),
e2edb.WithFlagOverride(map[string]string{"--tracing.config": string(jaegerConfig)}),
)

bkt2Config, err := yaml.Marshal(client.BucketConfig{
Type: client.S3,
Expand All @@ -157,7 +181,14 @@ func TestReadOnlyThanosSetup(t *testing.T) {
},
})
testutil.Ok(t, err)
store2 := e2edb.NewThanosStore(e, "store2", bkt2Config, e2edb.WithImage("thanos:latest"))

store2 := e2edb.NewThanosStore(
e,
"store2",
bkt2Config,
e2edb.WithImage("thanos:latest"),
e2edb.WithFlagOverride(map[string]string{"--tracing.config": string(jaegerConfig)}),
)

// Create two Prometheus replicas in HA, and one separate one (short term storage + scraping).
// Add a Thanos sidecar.
Expand Down Expand Up @@ -189,8 +220,8 @@ func TestReadOnlyThanosSetup(t *testing.T) {
promHA1 := e2edb.NewPrometheus(e, "prom-ha1")
prom2 := e2edb.NewPrometheus(e, "prom2")

sidecarHA0 := e2edb.NewThanosSidecar(e, "sidecar-prom-ha0", promHA0, e2edb.WithImage("thanos:latest"))
sidecarHA1 := e2edb.NewThanosSidecar(e, "sidecar-prom-ha1", promHA1, e2edb.WithImage("thanos:latest"))
sidecarHA0 := e2edb.NewThanosSidecar(e, "sidecar-prom-ha0", promHA0, e2edb.WithImage("thanos:latest"), e2edb.WithFlagOverride(map[string]string{"--tracing.config": string(jaegerConfig)}))
sidecarHA1 := e2edb.NewThanosSidecar(e, "sidecar-prom-ha1", promHA1, e2edb.WithImage("thanos:latest"), e2edb.WithFlagOverride(map[string]string{"--tracing.config": string(jaegerConfig)}))
sidecar2 := e2edb.NewThanosSidecar(e, "sidecar2", prom2, e2edb.WithImage("thanos:latest"))

testutil.Ok(t, exec("cp", "-r", prom1Data+"/.", promHA0.Dir()))
Expand Down Expand Up @@ -273,7 +304,9 @@ global:
sidecarHA0.InternalEndpoint("grpc"),
sidecarHA1.InternalEndpoint("grpc"),
sidecar2.InternalEndpoint("grpc"),
}, e2edb.WithImage("thanos:latest"),
},
e2edb.WithImage("thanos:latest"),
e2edb.WithFlagOverride(map[string]string{"--tracing.config": string(jaegerConfig)}),
)
testutil.Ok(t, e2e.StartAndWaitReady(query1))

Expand All @@ -285,6 +318,9 @@ global:
testutil.Ok(t, e2einteractive.OpenInBrowser(fmt.Sprintf("http://%s/%s", query1.Endpoint("http"), path)))
testutil.Ok(t, e2einteractive.OpenInBrowser(fmt.Sprintf("http://%s/%s", prom2.Endpoint("http"), path)))

// Tracing endpoint.
testutil.Ok(t, e2einteractive.OpenInBrowser("http://"+j.Endpoint("http-front")))
// Monitoring Endpoint.
testutil.Ok(t, m.OpenUserInterfaceInBrowser())
testutil.Ok(t, e2einteractive.RunUntilEndpointHit())
}
5 changes: 1 addition & 4 deletions go.mod
Expand Up @@ -18,7 +18,7 @@ require (
github.com/chromedp/chromedp v0.5.3
github.com/cortexproject/cortex v1.10.1-0.20210820081236-70dddb6b70b8
github.com/davecgh/go-spew v1.1.1
github.com/efficientgo/e2e v0.9.0
github.com/efficientgo/e2e v0.11.1-0.20210829161758-f4cc6dbdc6ea
github.com/efficientgo/tools/extkingpin v0.0.0-20210609125236-d73259166f20
github.com/facette/natsort v0.0.0-20181210072756-2cd4dd1e2dcb
github.com/fatih/structtag v1.1.0
Expand Down Expand Up @@ -87,9 +87,6 @@ replace (
// Using a 3rd-party branch for custom dialer - see https://github.com/bradfitz/gomemcache/pull/86.
// Required by Cortex https://github.com/cortexproject/cortex/pull/3051.
github.com/bradfitz/gomemcache => github.com/themihai/gomemcache v0.0.0-20180902122335-24332e2d58ab

// TODO(bwplotka): Remove when dev finishes.
github.com/efficientgo/e2e => github.com/efficientgo/e2e v0.10.0
github.com/efficientgo/tools/core => github.com/efficientgo/tools/core v0.0.0-20210731122119-5d4a0645ce9a
// Update to v1.1.1 to make sure windows CI pass.
github.com/elastic/go-sysinfo => github.com/elastic/go-sysinfo v1.1.1
Expand Down
8 changes: 6 additions & 2 deletions go.sum
Expand Up @@ -329,6 +329,7 @@ github.com/containerd/cgroups v0.0.0-20200531161412-0dbf7f05ba59/go.mod h1:pA0z1
github.com/containerd/cgroups v0.0.0-20200710171044-318312a37340/go.mod h1:s5q4SojHctfxANBDvMeIaIovkq29IP48TKAxnhYRxvo=
github.com/containerd/cgroups v0.0.0-20200824123100-0b889c03f102/go.mod h1:s5q4SojHctfxANBDvMeIaIovkq29IP48TKAxnhYRxvo=
github.com/containerd/cgroups v0.0.0-20210114181951-8a68de567b68/go.mod h1:ZJeTFisyysqgcCdecO57Dj79RfL0LNeGiFUqLYQRYLE=
github.com/containerd/cgroups v1.0.1 h1:iJnMvco9XGvKUvNQkv88bE4uJXxRQH18efbKo9w5vHQ=
github.com/containerd/cgroups v1.0.1/go.mod h1:0SJrPIenamHDcZhEcJMNBB85rHcUsw4f25ZfBiPYRkU=
github.com/containerd/console v0.0.0-20180822173158-c12b1e7919c1/go.mod h1:Tj/on1eG8kiEhd0+fhSDzsPAFESxzBBvdyEgyryXffw=
github.com/containerd/console v0.0.0-20181022165439-0650fd9eeb50/go.mod h1:Tj/on1eG8kiEhd0+fhSDzsPAFESxzBBvdyEgyryXffw=
Expand Down Expand Up @@ -513,8 +514,8 @@ github.com/eclipse/paho.mqtt.golang v1.2.0/go.mod h1:H9keYFcgq3Qr5OUJm/JZI/i6U7j
github.com/edsrzf/mmap-go v0.0.0-20170320065105-0bce6a688712/go.mod h1:YO35OhQPt3KJa3ryjFM5Bs14WD66h8eGKpfaBNrHW5M=
github.com/edsrzf/mmap-go v1.0.0 h1:CEBF7HpRnUCSJgGUb5h1Gm7e3VkmVDrR8lvWVLtrOFw=
github.com/edsrzf/mmap-go v1.0.0/go.mod h1:YO35OhQPt3KJa3ryjFM5Bs14WD66h8eGKpfaBNrHW5M=
github.com/efficientgo/e2e v0.10.0 h1:QFgE7W06nYLmASJ4HKUXanHCkKhmz0sIu5ym4zv0Ibs=
github.com/efficientgo/e2e v0.10.0/go.mod h1:5Z90zeIm2FTFD0xmhNn+vXGlOnFiizmnOMZjtLAFIUw=
github.com/efficientgo/e2e v0.11.1-0.20210829161758-f4cc6dbdc6ea h1:PFKVWZOnEfthNTcdpiz0wGzmikPOINnbRYEo4MneloE=
github.com/efficientgo/e2e v0.11.1-0.20210829161758-f4cc6dbdc6ea/go.mod h1:vDnF4AAEZmO0mvyFIATeDJPFaSRM7ywaOnKd61zaSoE=
github.com/efficientgo/tools/core v0.0.0-20210731122119-5d4a0645ce9a h1:Az9zRvQubUIHE+tHAm0gG7Dwge08V8Q/9uNSIFjFm+A=
github.com/efficientgo/tools/core v0.0.0-20210731122119-5d4a0645ce9a/go.mod h1:OmVcnJopJL8d3X3sSXTiypGoUSgFq1aDGmlrdi9dn/M=
github.com/efficientgo/tools/extkingpin v0.0.0-20210609125236-d73259166f20 h1:kM/ALyvAnTrwSB+nlKqoKaDnZbInp1YImZvW+gtHwc8=
Expand Down Expand Up @@ -738,8 +739,10 @@ github.com/gocql/gocql v0.0.0-20200526081602-cd04bd7f22a7/go.mod h1:DL0ekTmBSTdl
github.com/godbus/dbus v0.0.0-20151105175453-c7fdd8b5cd55/go.mod h1:/YcGZj5zSblfDWMMoOzV4fas9FZnQYTkDnsGvmh2Grw=
github.com/godbus/dbus v0.0.0-20180201030542-885f9cc04c9c/go.mod h1:/YcGZj5zSblfDWMMoOzV4fas9FZnQYTkDnsGvmh2Grw=
github.com/godbus/dbus v0.0.0-20190402143921-271e53dc4968/go.mod h1:/YcGZj5zSblfDWMMoOzV4fas9FZnQYTkDnsGvmh2Grw=
github.com/godbus/dbus v0.0.0-20190422162347-ade71ed3457e h1:BWhy2j3IXJhjCbC68FptL43tDKIq8FladmaTs3Xs7Z8=
github.com/godbus/dbus v0.0.0-20190422162347-ade71ed3457e/go.mod h1:bBOAhwG1umN6/6ZUMtDFBMQR8jRg9O75tm9K00oMsK4=
github.com/godbus/dbus/v5 v5.0.3/go.mod h1:xhWf0FNVPg57R7Z0UbKHbJfkEywrmjJnf7w5xrFpKfA=
github.com/godbus/dbus/v5 v5.0.4 h1:9349emZab16e7zQvpmsbtjc18ykshndd8y2PG3sgJbA=
github.com/godbus/dbus/v5 v5.0.4/go.mod h1:xhWf0FNVPg57R7Z0UbKHbJfkEywrmjJnf7w5xrFpKfA=
github.com/gofrs/uuid v3.3.0+incompatible/go.mod h1:b2aQJv3Z4Fp6yNu3cdSllBxTCLRxnplIgP/c0N/04lM=
github.com/gofrs/uuid v4.0.0+incompatible/go.mod h1:b2aQJv3Z4Fp6yNu3cdSllBxTCLRxnplIgP/c0N/04lM=
Expand Down Expand Up @@ -1311,6 +1314,7 @@ github.com/opencontainers/runtime-spec v0.1.2-0.20190507144316-5b71a03e2700/go.m
github.com/opencontainers/runtime-spec v1.0.1/go.mod h1:jwyrGlmzljRJv/Fgzds9SsS/C5hL+LL3ko9hs6T5lQ0=
github.com/opencontainers/runtime-spec v1.0.2-0.20190207185410-29686dbc5559/go.mod h1:jwyrGlmzljRJv/Fgzds9SsS/C5hL+LL3ko9hs6T5lQ0=
github.com/opencontainers/runtime-spec v1.0.2/go.mod h1:jwyrGlmzljRJv/Fgzds9SsS/C5hL+LL3ko9hs6T5lQ0=
github.com/opencontainers/runtime-spec v1.0.3-0.20200929063507-e6143ca7d51d h1:pNa8metDkwZjb9g4T8s+krQ+HRgZAkqnXml+wNir/+s=
github.com/opencontainers/runtime-spec v1.0.3-0.20200929063507-e6143ca7d51d/go.mod h1:jwyrGlmzljRJv/Fgzds9SsS/C5hL+LL3ko9hs6T5lQ0=
github.com/opencontainers/runtime-tools v0.0.0-20181011054405-1d69bd0f9c39/go.mod h1:r3f7wjNzSs2extwzU3Y+6pKfobzPh+kKFJ3ofN+3nfs=
github.com/opencontainers/selinux v1.6.0/go.mod h1:VVGKuOLlE7v4PJyT6h7mNWvq1rzqiriPsEqVhc+svHE=
Expand Down
2 changes: 1 addition & 1 deletion pkg/api/api.go
Expand Up @@ -195,7 +195,7 @@ func GetRuntimeInfoFunc(logger log.Logger) RuntimeInfoFn {

type InstrFunc func(name string, f ApiFunc) http.HandlerFunc

// Instr returns a http HandlerFunc with the instrumentation middleware.
// GetInstr returns a http HandlerFunc with the instrumentation middleware.
func GetInstr(
tracer opentracing.Tracer,
logger log.Logger,
Expand Down