A framework that aims to ease the logging effort: Logs, Traces and Metrics.
The V2 version adopts the OpenTelemetry specification for all logging directions. This means that all logging propagators use the OTEL protocol.
Tel uses zap.Logger as the heart of the system, which is why it passes all zap functions through.
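For example, leveled logging looks just like plain zap. A minimal sketch, assuming the v2 import path and that the global instance exposes the usual zap leveled methods (tel.Int and tel.Error are the field helpers used throughout this document):

package main

import "github.com/tel-io/tel/v2" // import path assumed for this sketch

func main() {
	l := tel.Global() // the global instance behaves like a zap.Logger

	// zap-style structured logging passed straight through
	l.Info("service started", tel.Int("port", 8080))
	l.Warn("retrying request", tel.Int("attempt", 3))
}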
The library establishes a connection via the gRPC OTLP protocol with opentelemetry-collector-contrib (the official OTEL collector) and sends logs, traces and metrics. The collector, in turn, distributes them to Loki, Tempo and Prometheus, or to any other services you prefer and the collector supports.
Furthermore, we have prepared working dashboards in the ./__grafana folder, created for our middlewares for the most popular servers and clients.
Our goal is to support the logfmt format for viewing in Loki, via the simple zap library interface.
By the way, you can enrich log attributes that should be written only when they are really needed:
// create a copy of ctx which we enrich with log attributes
ctx := tel.Global().Copy().Ctx()

// pass ctx through the controller -> store -> root layers and enrich information
err := func(ctx context.Context) error {
	tel.FromCtx(ctx).PutAttr(extract...)
	return fmt.Errorf("some error")
}(ctx)

// ... and you write the log message only when it is really needed,
// with all attributes put via ctx from ALL earlier layers.
// No need to look through previous info/debug messages:
// all needed information is in one message with all the attributes you already added,
// but it is written only when you actually call Error(), Info(), Debug() and so on.
//
// for example: only when you got an error
if err != nil {
	tel.FromCtx(ctx).Error("error happened", tel.Error(err))
}
The library also simplifies creating trace spans. You can send not only logs but record trace events as well:
span, ctx := tel.StartSpanFromContext(req.Context(), "my trace")
defer span.End()

tel.FromCtx(ctx).Info("this message will be saved both in LogInstance and trace",
	// and this will come to the trace as an attribute
	tel.Int("code", errCode))
Working with metrics is simplified as well:
m := tel.Global().Meter("github.com/MyRepo/MyLib/myInstrumentation")
requestLatency, err := m.SyncFloat64().Histogram("demo_client.request_latency",
	instrument.WithDescription("The latency of requests processed"))
if err != nil {
	t.Fatal("metric load error", tel.Error(err))
}
...
start := time.Now()
...
ms := float64(time.Since(start).Microseconds())
requestLatency.Record(ctx, ms,
	attribute.String("userID", "e64916d9-bfd0-4f79-8ee3-847f2d034d20"),
	attribute.Int("orderID", 1),
)
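Counters follow the same pattern. A minimal sketch, assuming the same OpenTelemetry instrument API version as the histogram example above (the metric name demo_client.request_count is made up for illustration):

requestCount, err := m.SyncInt64().Counter("demo_client.request_count",
	instrument.WithDescription("The number of requests processed"))
if err != nil {
	t.Fatal("metric load error", tel.Error(err))
}
...
// count one processed request with the same attribute set
requestCount.Add(ctx, 1,
	attribute.String("userID", "e64916d9-bfd0-4f79-8ee3-847f2d034d20"),
	attribute.Int("orderID", 1),
)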
- Recovery flow
- Instantiate a new copy of tel for each further handler
- Basic metrics with respective dashboards for Grafana
- Trace propagation (see the sketch below)
  - client part: send (inject) the current trace span to the server
  - server part: read (extract) the trace and create a new child trace (or a completely new one if no trace info was provided, or if the info was not properly wrapped via the propagator protocol of the OTEL specification)
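A minimal sketch of the inject/extract idea, using the upstream OpenTelemetry propagation API directly rather than the ready-made middlewares (the client and handler shapes below are illustrative, and it assumes the global text-map propagator has been configured):

import (
	"context"
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"

	"github.com/tel-io/tel/v2" // import path assumed
)

// client part: inject the current trace context into the outgoing request headers
func doRequest(ctx context.Context, req *http.Request) (*http.Response, error) {
	otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))
	return http.DefaultClient.Do(req.WithContext(ctx))
}

// server part: extract the incoming trace context and start a child span
func handle(w http.ResponseWriter, req *http.Request) {
	ctx := otel.GetTextMapPropagator().Extract(req.Context(), propagation.HeaderCarrier(req.Header))

	span, ctx := tel.StartSpanFromContext(ctx, "handle request")
	defer span.End()

	tel.FromCtx(ctx).Info("request handled")
	w.WriteHeader(http.StatusOK)
}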
Logging data is exported via OTEL's gRPC protocol. tel is designed to pass it through the open-telemetry collector, which should route the log data to any desired log receivers.
Keep in mind that the collector has a plugin version, collector-contrib: a gateway-adapter to numerous protocols that do not yet support OTEL, for example Grafana Loki.
For instance, you can use opentelemetry-collector-contrib as the tel receiver and route logging data to Grafana Loki, trace data to Grafana Tempo, and metric data to Prometheus + Grafana ;)
tel's approach is to put a traceID field containing the actual trace ID. All our middlewares do this; otherwise the developer should do it himself. Just call UpdateTraceFields before writing any logs:
tel.UpdateTraceFields(ctx)
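A minimal sketch of where that call fits (the handler shape is illustrative; the calls are the ones shown elsewhere in this document):

span, ctx := tel.StartSpanFromContext(req.Context(), "my handler")
defer span.End()

// copy the active trace ID into the log fields carried by ctx
tel.UpdateTraceFields(ctx)

// this record now carries the traceID field, so Loki can derive a link to Tempo
tel.FromCtx(ctx).Info("processing request")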
Accordingly, Grafana should be set up with derivedFields for the Loki data source:
- name: Loki
  type: loki
  url: http://loki:3100
  uid: loki
  jsonData:
    derivedFields:
      - datasourceUid: tempo
        matcherRegex: "traceID=(\\w+)"
        name: trace
        url: '$${__value.raw}'
We match Tempo with Loki by the service_name label. All logs should contain a traceID (under any key form) and service_name.
In Grafana, the Tempo data source should be configured with tracesToLogs:
- name: Tempo
  type: tempo
  access: proxy
  orgId: 1
  url: http://tempo:3200
  basicAuth: false
  isDefault: false
  version: 1
  editable: false
  apiVersion: 1
  uid: tempo
  jsonData:
    nodeGraph:
      enabled: true
    tracesToLogs:
      datasourceUid: loki
      filterBySpanID: false
      filterByTraceID: true
      mapTagNamesEnabled: false
      tags:
        - service_name
service name
type: string

project namespace
type: string

ENUM: dev, stage, prod
type: string

log level; default: info
type: string
NOTE: debug, info, warn, error, dpanic, panic, fatal

valid options: console and json, or "none"
none - disables printing to console (only OTEL or critical errors)

for IsDebug() function
type: bool
default: true
address where health and prometheus will listen
Note: the address logic is described in the net.Listen documentation

default: true
Address of the otel collector server via the GRPC protocol
With insecure …

default: true
Enables gzip compression for grpc connections

default: "15"
Interval at which metrics are gathered
Check the server certificate DNS name given by the server.
Disables OTEL_EXPORTER_WITH_INSECURE if set.

default: false
requires OTEL_ENABLE = true
Injects a logger adapter into the otel library related to the grpc client to get log information about this transport

default: false
requires OTEL_ENABLE = true
Injects a logger adapter into the otel processor library related to collector behaviour
default: false
Enable retrying to send logs to the collector.

default: 1s
Limit how often logs are flushed with level.Error.
Example: 1s means 1 flush per second is allowed if logs have level.Error.

default: 256
Limit message size. If the limit is exceeded, the message is truncated.

default: 100
Limit the rate of messages per second. If the limit is exceeded, a warning is logged and messages are dropped. A value of 0 disables the limit.

default: ``
The same as LOGS_MAX_MESSAGES_PER_SECOND but allows configuring the limit per level.
Value format: <level1>=<n>,<level2>=<n>. Ex: LOGS_MAX_LEVEL_MESSAGES_PER_SECOND="error=0,info=100"
default: false
Enable retrying to send traces to the collector.

default: statustraceidratio:0.1
Set the sampling strategy. The options are: never, always, traceidratio:<float64>, statustraceidratio:<float64>,
where <float64> is required and must be a valid floating point number from 0.0 to 1.0.

default: false
Enable adding all log messages to the active span as events.

default: true
Enable adding all log fields to the active span as attributes.

default: true
Enable cardinality check for span names.

default: 0
Limit cardinality of a span's attributes. Not used, so the default value is 0.

default: 500
Limit the number of unique span names.

default: 10m
Enable the diagnostic loop that checks for cardinality violations and logs a warning.
You can disable it by setting the value to 0.
default: false
Enable retrying to send metrics to the collector.

default: true
Enable cardinality check for metrics' labels.

default: 100
Limit cardinality of a metric's labels. If the limit is exceeded, the metric is ignored, but previously recorded metrics keep working as before.

default: 500
Limit the number of unique metric names (name only, without labels).

default: 10m
Enable the diagnostic loop that checks for cardinality violations and logs a warning.
You can disable it by setting the value to 0.

TLS CA certificate body

TLS client certificate

TLS client key

This optional variable is handled by the open-telemetry SDK. The separator is a semicolon. Use it to put additional resource variables; very convenient!
- ❏ Expose health check to a specific metric
- ❏ Duplicate trace messages for root - ztrace.New just adds to the chain tree