Description
My actions before raising this issue
- Followed the troubleshooting guide
- Read/searched the docs
- Searched past issues
Expected Behaviour
During function proxying, the Gateway should be able to produce OpenTelemetry spans.
Current Behaviour
No tracing spans are produced during function invocation.
List All Possible Solutions and Workarounds
Which Solution Do You Recommend?
I recently did a walk-through for integrating OpenTelemetry with OpenFaaS functions and think it would be nice if the Gateway could produce OpenTelemetry spans during function invocation. Adding tracing during the function proxy would provide a more accurate picture of the networking in the cluster and enable accurate assessments of the overhead (or lack thereof) from the Gateway.
We previously discussed this in general in #1354, but OpenTelemetry was not an active project at the time, only OpenTracing. OpenTelemetry makes this integration much more feasible now because we can more easily provide support for multiple exporters. Additionally, the OpenTelemetry providers generally allow all of the required configuration via environment variables, which means the integration should require only minimal changes to the Gateway.
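As an illustration, the Jaeger exporter for opentelemetry-go can be configured entirely through environment variables, so the Gateway itself would only need to know which exporter to construct. `OTEL_EXPORTER` is a Gateway-specific variable proposed in this issue, and the endpoint/credential values below are hypothetical:

```shell
# Select the exporter the Gateway should construct (proposed in this issue).
OTEL_EXPORTER=jaeger
# Standard OpenTelemetry variable naming the service that emits spans.
OTEL_SERVICE_NAME=gateway
# Read by the opentelemetry-go Jaeger exporter when using the collector endpoint.
OTEL_EXPORTER_JAEGER_ENDPOINT=http://jaeger-collector:14268/api/traces
OTEL_EXPORTER_JAEGER_USER=jaeger-user
OTEL_EXPORTER_JAEGER_PASSWORD=jaeger-password
```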
During the Gateway startup we would initialize and set the global tracing provider using something like this:
```go
// ctx is the application's root context.
shutdownTracing, err := tracing.Provider(config.Version, config.Commit)
if err != nil {
	log.Fatal(err)
}
// Cleanly shut down and flush telemetry when the application exits.
defer shutdownTracing(ctx)
```
We can then encapsulate all of the tracing-specific code in the Provider implementation:
```go
func Provider(version, commit string) (shutdown Shutdown, err error) {
	exporter := Exporter(os.Getenv("OTEL_EXPORTER"))

	var exp tracesdk.TracerProviderOption
	switch exporter {
	case JaegerExporter:
		// Configure the collector from the env variables
		// OTEL_EXPORTER_JAEGER_ENDPOINT/USER/PASSWORD.
		j, e := jaeger.New(jaeger.WithCollectorEndpoint())
		exp, err = tracesdk.WithBatcher(j), e
	case LogExporter:
		w := os.Stdout
		opts := []stdouttrace.Option{stdouttrace.WithWriter(w)}
		if truthyEnv("OTEL_EXPORTER_LOG_PRETTY_PRINT") {
			opts = append(opts, stdouttrace.WithPrettyPrint())
		}
		if !truthyEnv("OTEL_EXPORTER_LOG_TIMESTAMPS") {
			opts = append(opts, stdouttrace.WithoutTimestamps())
		}
		s, e := stdouttrace.New(opts...)
		exp, err = tracesdk.WithSyncer(s), e
	// additional exporters
	default:
		logrus.Warn("tracing disabled")
		// We explicitly DO NOT set the global TracerProvider using otel.SetTracerProvider().
		// The unset TracerProvider returns a no-op "non-recording" span, but still passes through context.
		otel.SetTextMapPropagator(
			propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}),
		)
		// Return a no-op shutdown function.
		return func(_ context.Context) {}, nil
	}

	if err != nil {
		return nil, err
	}

	// some additional work to
	// finish initializing the provider
	otel.SetTracerProvider(provider)

	shutdown = func(ctx context.Context) {
		// Do not let the application hang forever when it is shut down.
		ctx, cancel := context.WithTimeout(ctx, time.Second*5)
		defer cancel()
		if err := provider.Shutdown(ctx); err != nil {
			logrus.WithError(err).Error("tracing provider did not shut down gracefully")
		}
	}

	return shutdown, nil
}
```
Inside the function invocation handler here:
faas/gateway/handlers/forwarding_proxy.go
Line 55 in 8a87b57
```go
var err error
_, span := otel.Tracer("Gateway").Start(r.Context(), "Proxy")
defer func() {
	if err != nil {
		span.SetStatus(codes.Error, err.Error())
		span.RecordError(err)
	}
	span.End()
}()
```
This would then show as a new span named "Proxy" between the ingress and the function (if they have tracing enabled). There are a few other things we could do, e.g. adding the status code, original URL, and request URL as metadata to the span, but these are optional for a minimal implementation.
Steps to Reproduce (for bugs)
- Follow this walkthrough https://github.com/LucasRoesler/openfaas-tracing-walkthrough
Context
https://github.com/LucasRoesler/openfaas-tracing-walkthrough