DataSet Exporter

Stability alpha: logs, traces
Distributions contrib
Issues Open issues Closed issues
Code Owners @atoulme, @martin-majlis-s1, @zdaratom-s1, @tomaz-s1

This exporter sends logs and traces to DataSet.

See the Getting Started guide.


Required Settings

  • dataset_url (no default): The URL of the DataSet API that ingests the data. Most likely
  • api_key (no default): The "Log Write" API key required to use the API. See the instructions on how to get an API key.

If you do not want to store api_key in the configuration file, you can use the collector's built-in environment variable expansion: api_key: ${env:DATASET_API_KEY}.

Server Host Settings

Specifying the server host is crucial for ensuring the correct functionality of DataSet. DataSet expects the server host value to be provided in the serverHost attribute. If the server host value is stored in a different attribute, you can use the resourceprocessor or attributesprocessor to copy it into the serverHost attribute.

You can also utilize the server_host settings (described below) to populate the serverHost attribute with different values.

The process of populating the serverHost attribute works as follows:

  • If the serverHost attribute is specified and not empty in the log or trace, then it is used.
  • If the serverHost attribute is specified and not empty in the resource, then it is used.
  • If the attribute is specified and not empty in the resource, then it is used.
  • If the server_host.server_host setting is specified and not empty, then it is used.
  • If the server_host.use_hostname setting is set to true, the hostname of the node is used.

Make sure to provide the appropriate server host value in the serverHost attribute to ensure the proper functionality of DataSet and accurate handling of events.
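
The lookup order above can be sketched as a small function. This is only an illustration of the documented precedence, not the exporter's actual code: the function name and its parameters are hypothetical, and the third step refers to a resource attribute whose name is not given in the text above, so it is left as an optional parameter.

```python
import socket
from typing import Mapping, Optional

SERVER_HOST_ATTR = "serverHost"


def resolve_server_host(
    record_attrs: Mapping[str, str],
    resource_attrs: Mapping[str, str],
    configured_server_host: str = "",
    use_hostname: bool = True,
    resource_fallback_attr: Optional[str] = None,  # hypothetical: name not given above
    node_hostname: Optional[str] = None,  # injectable so the sketch is testable
) -> str:
    """Return the first non-empty candidate, following the documented order."""
    candidates = [
        # 1. serverHost on the log or trace itself
        record_attrs.get(SERVER_HOST_ATTR, ""),
        # 2. serverHost on the resource
        resource_attrs.get(SERVER_HOST_ATTR, ""),
        # 3. the additional resource attribute, if one is named
        resource_attrs.get(resource_fallback_attr, "") if resource_fallback_attr else "",
        # 4. the server_host.server_host setting
        configured_server_host,
    ]
    for value in candidates:
        if value:
            return value
    # 5. fall back to the node's hostname when use_hostname is enabled
    if use_hostname:
        return node_hostname if node_hostname is not None else socket.gethostname()
    return ""
```

For example, a record without serverHost, an empty resource, and an empty server_host setting falls through to the node's hostname as long as use_hostname is true.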

Optional Settings

  • debug (default = false): Adds session_key to the server fields. It's useful for debugging throughput issues.
  • buffer:
    • max_lifetime (default = 5s): The maximum delay between sending batches from the same session.
    • purge_older_than (default = 30s): The maximum delay between receiving data for the same session after which resources associated with it are purged.
    • group_by (default = []): The list of attributes based on which events should be grouped. They are moved from the event attributes to the session info and shown as server fields in the UI.
    • retry_initial_interval (default = 5s): Time to wait after the first failure before retrying.
    • retry_max_interval (default = 30s): The upper bound on the backoff interval.
    • retry_max_elapsed_time (default = 300s): The maximum amount of time spent trying to send a buffer.
    • retry_shutdown_timeout (default = 30s): The maximum time for which the exporter tries to send data to DataSet during shutdown. This value should be shorter than the container's grace period.
    • max_parallel_outgoing (default = 100): The maximum number of parallel outgoing requests.
  • logs:
    • export_resource_info_on_event (default = false): Include LogRecord resource information (if available) on the DataSet event.
    • export_resource_prefix (default = 'resource.attributes.'): A prefix string for the resource, if export_resource_info_on_event is enabled.
    • export_scope_info_on_event (default = true): Include LogRecord scope information (if available) on the DataSet event.
    • export_scope_prefix (default = 'scope.attributes.'): A prefix string for the scope, if export_scope_info_on_event is enabled.
    • export_separator (default = '.'): The separator to add between keys when flattening nested structures (maps, arrays).
    • export_distinguishing_suffix (default = '_'): A suffix string to resolve naming collisions when flattening.
    • decompose_complex_message_field (default = false): Decompose complex body / message field types (e.g. maps, arrays) into separate fields.
    • decomposed_complex_message_prefix (default = ''): A prefix string to use when a complex message is decomposed.
  • traces:
    • export_separator (default = '.'): The separator to add between keys when flattening nested structures (maps, arrays).
    • export_distinguishing_suffix (default = '_'): A suffix string to resolve naming collisions when flattening.
  • server_host:
    • server_host (default = ''): Specifies the server host to be used for the events.
    • use_hostname (default = true): Determines whether the hostname of the node should be used as the server host for the events. When set to true, the node's hostname is automatically used.
  • retry_on_failure: See retry_on_failure
  • sending_queue: See sending_queue
  • timeout: See timeout
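
To get a feel for how the three buffer retry settings interact, the sketch below lists the waits an exponential backoff would produce with the default values. It is a simplified model under stated assumptions: the function name is hypothetical, the multiplier of 2 is an assumption (the text above does not specify the growth factor), and jitter and request duration are ignored.

```python
from typing import List


def backoff_schedule(
    initial_interval: float = 5.0,    # retry_initial_interval
    max_interval: float = 30.0,       # retry_max_interval
    max_elapsed_time: float = 300.0,  # retry_max_elapsed_time
    multiplier: float = 2.0,          # assumption: not specified in this document
) -> List[float]:
    """Return the waits (in seconds) before each retry until the time budget is spent."""
    delays: List[float] = []
    elapsed = 0.0
    delay = initial_interval
    while elapsed + delay <= max_elapsed_time:
        delays.append(delay)
        elapsed += delay
        # Grow the wait, but never beyond the configured upper bound.
        delay = min(delay * multiplier, max_interval)
    return delays


schedule = backoff_schedule()
```

Under these assumptions the waits start at 5s, grow until they reach the 30s cap, and stop once another wait would exceed the 300s budget.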


Enabled attributes are exported in the order:

  1. Log properties
  2. Body
  3. Resource attributes
  4. Scope attributes
  5. Log attributes

If there is a name conflict, the export_distinguishing_suffix value is appended to the later attribute's name. If the export_distinguishing_suffix value is an empty string, then the value from the last attribute is used.


Example LogRecord:

- body:
  - b: 1
  - x: "b"
- resource:
  - r: 2
  - x: "r"
- scope:
  - s: 3
  - x: "s"
- attribute:
  - a: 4
  - x: "a"
  - map:
    - m1: 5
    - m2: 6

Then the event will look like:

  • Default settings for logs:
    • Event:
      - message: "{\"b\": 1, \"x\": \"b\"}"
      - scope.attributes.s: 3
      - scope.attributes.x: "s"
      - a: 4
      - x: "a"
      - map.m1: 5
      - map.m2: 6
  • Everything enabled:
    • Configuration:
          export_resource_info_on_event: true
          export_resource_prefix: "r."
          export_scope_info_on_event: true
          export_scope_prefix: "s."
          decompose_complex_message_field: true
          decomposed_complex_message_prefix: "m."
          export_separator: "-"
          export_distinguishing_suffix: "_"
    • Event:
      - message: "{\"b\": 1, \"x\": \"b\"}"
      - m.b: 1
      - m.x: "b"
      - r.r: 2
      - r.x: "r"
      - s.s: 3
      - s.x: "s"
      - a: 4
      - x: "a"
      - map-m1: 5
      - map-m2: 6
  • Everything enabled, prefixes are empty strings:
    • Configuration:
          export_resource_info_on_event: true
          export_resource_prefix: ""
          export_scope_info_on_event: true
          export_scope_prefix: ""
          decompose_complex_message_field: true
          decomposed_complex_message_prefix: ""
          export_separator: "-"
          export_distinguishing_suffix: "_"
    • Event:
      - message: "{\"b\": 1, \"x\": \"b\"}"
      - b: 1
      - x: "b"
      - r: 2
      - x_: "r"
      - s: 3
      - x__: "s"
      - a: 4
      - x___: "a"
      - map-m1: 5
      - map-m2: 6
  • Everything enabled, prefixes are empty strings, suffix is empty string:
    • Configuration:
          export_resource_info_on_event: true
          export_resource_prefix: ""
          export_scope_info_on_event: true
          export_scope_prefix: ""
          decompose_complex_message_field: true
          decomposed_complex_message_prefix: ""
          export_separator: "-"
          export_distinguishing_suffix: ""
    • Event:
      - message: "{\"b\": 1, \"x\": \"b\"}"
      - b: 1
      - r: 2
      - s: 3
      - a: 4
      - x: "a"
      - map-m1: 5
      - map-m2: 6
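
The flattening and collision handling shown in these examples can be approximated with a short sketch. It is not the exporter's implementation: the function names are hypothetical, only nested maps are flattened (arrays are left out), the message field is omitted, and the rule of appending the suffix until the name is unique is inferred from the example events above.

```python
from typing import Any, Dict, Iterable, Tuple


def flatten(value: Dict[str, Any], prefix: str = "", separator: str = ".") -> Dict[str, Any]:
    """Flatten nested maps into one level, joining nested keys with `separator`."""
    flat: Dict[str, Any] = {}
    for key, item in value.items():
        name = f"{prefix}{key}"
        if isinstance(item, dict):
            flat.update(flatten(item, f"{name}{separator}", separator))
        else:
            flat[name] = item
    return flat


def merge_groups(
    groups: Iterable[Tuple[str, Dict[str, Any]]],
    separator: str = ".",
    suffix: str = "_",
) -> Dict[str, Any]:
    """Merge (prefix, attributes) groups in export order, resolving name conflicts."""
    event: Dict[str, Any] = {}
    for prefix, attrs in groups:
        for name, value in flatten(attrs, prefix, separator).items():
            if suffix:
                # Later attributes get the suffix appended until the name is free.
                while name in event:
                    name += suffix
            # With an empty suffix the later value simply overwrites the earlier one.
            event[name] = value
    return event


# The example LogRecord from above, in export order: body, resource, scope, attributes.
body = {"b": 1, "x": "b"}
resource = {"r": 2, "x": "r"}
scope = {"s": 3, "x": "s"}
attributes = {"a": 4, "x": "a", "map": {"m1": 5, "m2": 6}}
groups = [("", body), ("", resource), ("", scope), ("", attributes)]

event_with_suffix = merge_groups(groups, separator="-", suffix="_")
event_without_suffix = merge_groups(groups, separator="-", suffix="")
```

With empty prefixes and the "_" suffix, the four x values end up as x, x_, x__, and x___; with an empty suffix only the last value of x survives.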

Field names can have . dots, _ underscores, and - hyphens. You must escape slashes in Search and PowerQueries. For example, search the field name as\/component.


Example configuration:

    processors:
      attributes:
        actions:
          - key: serverHost
            action: insert
            from_attribute: container_id
      resource:
        attributes:
          - key: serverHost
            from_attribute: node_id
            action: insert

    exporters:
      dataset/logs:
        # DataSet API URL, for DataSet EU instance
        dataset_url: your_dataset_url
        # API Key
        api_key: your_api_key
        buffer:
          # Send buffer to the API at least every 5s
          max_lifetime: 5s
          # Group data based on these attributes
          group_by:
            - container_id
          # try to send data to DataSet for at most 30s during shutdown
          retry_shutdown_timeout: 30s
        server_host:
          # If the serverHost attribute is not specified or empty,
          # use the value from the env variable SERVER_HOST
          server_host: ${env:SERVER_HOST}
          # If server_host is not set, use the hostname value
          use_hostname: true
      dataset/traces:
        # DataSet API URL, for DataSet EU instance
        dataset_url: your_dataset_url
        # API Key
        api_key: your_api_key
        buffer:
          max_lifetime: 15s

    service:
      pipelines:
        logs:
          receivers: [otlp]
          processors: [batch, attributes]
          # add dataset among your exporters
          exporters: [dataset/logs]
        traces:
          receivers: [otlp]
          processors: [batch]
          # add dataset among your exporters
          exporters: [dataset/traces]

Handling serverHost Attribute

Based on the given configuration and scenarios, here's the expected behavior:

  1. Resource: {'node_id': 'node-pay-01', '': 'host-pay-01'}, Log: {'container_id': 'cont-pay-01'}, Env: SERVER_HOST='server-pay-01', Hostname: ip-172-31-27-19
    • Since the attribute container_id is set, attributesprocessor will copy this value to the serverHost.
    • Used serverHost will be cont-pay-01.
  2. Resource: {'node_id': 'node-pay-01', '': 'host-pay-01'}, Log: {'': 'Bar'}, Env: SERVER_HOST='server-pay-01', Hostname: ip-172-31-27-19
    • Since the resource attribute node_id is set, resourceprocessor will copy this value to the serverHost.
    • Used serverHost will be node-pay-01.
  3. Resource: {'': 'host-pay-01'}, Log: {'': 'Bar'}, Env: SERVER_HOST='server-pay-01', Hostname: ip-172-31-27-19
    • Since the resource attribute is set, it will be used.
    • Used serverHost will be host-pay-01.
  4. Resource: {}, Log: {'': 'Bar'}, Env: SERVER_HOST='server-pay-01', Hostname: ip-172-31-27-19
    • Since the attribute container_id is not set, the value from the environment variable SERVER_HOST will be copied to the serverHost.
    • Used serverHost will be server-pay-01.
  5. Resource: {}, Log: {'': 'Bar'}, Env: SERVER_HOST='', Hostname: ip-172-31-27-19
    • Since the attribute container_id is not set and the environment variable SERVER_HOST is empty, the hostname of the node (ip-172-31-27-19) will be used as the fallback value for serverHost.
    • Used serverHost will be ip-172-31-27-19.


To enable metrics you have to:

  1. Run the collector with the feature gate telemetry.useOtelForInternalMetrics enabled, by passing the additional parameter --feature-gates=telemetry.useOtelForInternalMetrics.
  2. Enable metrics scraping in the configuration and add the receiver to the service pipelines:
          receivers:
            prometheus:
              config:
                scrape_configs:
                  - job_name: 'otel-collector'
                    scrape_interval: 5s
                    static_configs:
                      - targets: ['']

          service:
            pipelines:
              metrics:
                # add prometheus among metrics receivers
                receivers: [prometheus]
                processors: [batch]
                exporters: [otlphttp/prometheus, debug]

Available Metrics

Available metrics contain dataset in their name. There are counters related to the number of processed events (events), buffers (buffer), sessions (sessions), and transferred bytes (bytes). There are also histograms related to response times (responseTime) and payload size (payloadSize).

There are several counters related to events/buffers:

  • enqueued - the number of received entities
  • processed - the number of entities that were accepted by the next layer
  • dropped - the number of entities that were not accepted by the next layer
  • broken - the number of entities that were somehow corrupted during processing (should be 0)

The number of entities that are still in the queue can be computed as enqueued - (processed + dropped + broken).
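
As a quick illustration of that formula, the helper below computes the number of entities still in the queue from the four counter values. The function name is hypothetical and the counter values would come from whatever metrics backend you scrape into.

```python
def pending_entities(enqueued: int, processed: int, dropped: int, broken: int) -> int:
    """Entities still in the queue: received minus everything already accounted for."""
    return enqueued - (processed + dropped + broken)


# Example with made-up counter values.
still_queued = pending_entities(enqueued=1000, processed=950, dropped=20, broken=0)
```

A value that keeps growing over time suggests that the next layer is not keeping up with the incoming events or buffers.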