
[exporter/azureblob] add implementation for azure blob exporter #38028

Merged
merged 17 commits into from
Mar 5, 2025

Conversation

hgaol
Member

@hgaol hgaol commented Feb 18, 2025

Description

Implementation of a new component, the Azure Blob exporter.

Link to tracking issue

Fixes #34319.

Testing

Validated using different authentication methods, and tested against multiple Azure Blob Storage accounts.
Example config:

processors:
  batch:
  batch/2:
    send_batch_size: 10000
    timeout: 10s

exporters:
  debug:
    verbosity: detailed
  azureblob/conn-string:
    # for connection string auth, there's no need to specify url, because it's already included in the connection string
    auth:
      type: "connection_string"
      connection_string: "DefaultEndpointsProtocol=https;AccountName=ghaccoujt;AccountKey=<secret>;EndpointSuffix=core.windows.net"
    container:
      metrics: "test"
      logs: "test"
      traces: "test"
  azureblob/url:
    url: "https://cs110032000abaee117.blob.core.windows.net/"
    auth:
      type: "service_principal"
      tenant_id: "4d94074b-32c4-4a70-949c-c7bd7ff7af89"
      client_id: "5ee98aba-dee7-4fcb-9a79-fd6797f752cf"
      client_secret: "<secret>"
    container:
      metrics: "filesystem"
      logs: "filesystem"
      traces: "filesystem"

service:
  pipelines:
    metrics/1:
      receivers: [otlp]
      exporters: [azureblob/conn-string]
    traces/1:
      receivers: [otlp]
      exporters: [azureblob/conn-string]
    logs/1:
      receivers: [otlp]
      exporters: [azureblob/conn-string]
    metrics/2:
      receivers: [otlp]
      processors: [batch/2]
      exporters: [azureblob/url]
    traces/2:
      receivers: [otlp]
      processors: [batch/2]
      exporters: [azureblob/url]
    logs/2:
      receivers: [otlp]
      processors: [batch/2]
      exporters: [azureblob/url]

Output: (screenshots of the exporter output attached to the PR)

Documentation

See README.md

@hgaol hgaol requested a review from a team as a code owner February 18, 2025 16:22
@hgaol hgaol requested a review from dehaansa February 18, 2025 16:22
Contributor

@MovieStoreGuy MovieStoreGuy left a comment


Couple of things from me:

  • Could you please include some tests around this functionality?
  • The consume methods look like duplicated code; could they be collapsed into one method?

My initial thought was to do something like:

func (t *T) ConsumeTraces(ctx context.Context, pt ptrace.Traces) error {
  buf, err := t.marshalTraces(pt)
  if err != nil {
     return err
  }
  return t.consumeBuffer(ctx, buf)
}

func (t *T) consumeBuffer(ctx context.Context, buf []byte) error {
  // blob creation and uploading here
  return nil
}

Comment on lines 34 to 37
var fileExtensionMap = map[string]string{
formatTypeJSON: "json",
formatTypeProto: "pb",
}
Contributor

In the event that I have configured an extension to perform the marshalling, it may be straight up raw text, or some other format that doesn't match json or protobuf.

Is it possible for a user to define their own extension for the data written?

Member Author

Supported; other encoding extensions can be used with the config below. Also updated the description in README.md.

  azureblob/1:
    encodings:
      logs: text_encoding

client *azblob.Client
signal pipeline.Signal
marshaller *marshaller
blobNameTemplate *template.Template
Contributor

Could you expand why there is a template here?

Member Author

removed template.

return fmt.Errorf("failed to parse template: %w", err)
}

// create client based on auth type
Contributor

can you add a comment here to explain that this is only required until there is an azure auth extension?

Member Author

Added comments in Config.URL.

Comment on lines 124 to 136
func (e *azureBlobExporter) generateBlobName(data map[string]any) (string, error) {
// Get current time
now := time.Now()

// Execute the template
var result bytes.Buffer
if err := e.blobNameTemplate.Execute(&result, data); err != nil {
return "", fmt.Errorf("failed to execute template: %w", err)
}

blobStr := result.String()
return now.Format(blobStr), nil
}
Contributor

My assumption here is that this works in two parts?

Provide a directory key, and ensure the blob has a unique name to avoid any collisions with other exporters?

I am mildly concerned that using a template might be slow in the hot path, and it might be better to use a strings builder instead.

Member Author

Removed the template and used string operations directly.

return fmt.Errorf("failed to upload metrics data: %w", err)
}

e.logger.Info("Successfully exported metrics to Azure Blob Storage",
Contributor

I suggest changing this to debug by default; it is going to be very noisy.

Member Author

updated.

}

blobContentReader := bytes.NewReader(data)
_, err = e.client.UploadStream(context.TODO(), e.config.Container.Traces, blobName, blobContentReader, nil)
Contributor

Please use the provided context instead of using context.TODO

Member Author

updated.

@hgaol
Member Author

hgaol commented Mar 1, 2025

(quoting MovieStoreGuy's review comment above)

Thanks @MovieStoreGuy for the careful review! I've made some changes as below:

  1. Collapsed the consume methods into one.
  2. Added tests for the exporter and marshaller.
  3. Supported specifying an encoding extension for logs, metrics, and traces.
  4. Removed the string template for generating blob names.

I also validated against 2 Azure Blob Storage accounts. Please take a look at your convenience. Thanks!

@atoulme
Contributor

atoulme commented Mar 4, 2025

Please run make tidy

Container *Container `mapstructure:"container"`
Auth *Authentication `mapstructure:"auth"`

// BlobNameFormat is the format of the blob name. It controls the uploaded blob name, e.g. "2006/01/02/metrics_15_04_05_{{.SerialNum}}.json"
Contributor

Comment is no longer accurate.

Member Author

@hgaol hgaol Mar 5, 2025


updated.

return fmt.Errorf("failed to upload traces data: %w", err)
}

e.logger.Debug("Successfully exported traces to Azure Blob Storage",
Contributor

Message leftover from refactor, remove signal type from message and add it as a log attribute?

Contributor

Making it an attribute will make querying the data easier

Member Author

@hgaol hgaol Mar 5, 2025


I found there's a data_type attribute which already contains the signal type info, so I just updated the message and didn't add a signal attribute.

assert.NoError(t, te.start(context.Background(), componenttest.NewNopHost()))
}

func TestExporterConsumeMetrics(t *testing.T) {
Contributor

Suggested change
func TestExporterConsumeMetrics(t *testing.T) {
func TestExporterConsumeTelemetry(t *testing.T) {

Member Author

updated the test function name

MetricsFormat: "2006/01/02/metrics_15_04_05_{{.SerialNum}}.{{.FileExtension}}",
LogsFormat: "2006/01/02/logs_15_04_05_{{.SerialNum}}.{{.FileExtension}}",
TracesFormat: "2006/01/02/traces_15_04_05_{{.SerialNum}}.{{.FileExtension}}",
MetricsFormat: "2006/01/02/metrics_15_04_05.json",
Contributor

Should default have millisecond precision? For large workloads.

Member Author

There's also the serial number; maybe minute precision is enough as the default?

Member Author

I also plan to support the append blob feature, but will do that in a future PR.

if encoding == nil {
return nil, fmt.Errorf("unknown encoding %q", config.Encodings.Logs)
}
// cast with ok to avoid panics.
Contributor

If we're casting to avoid panics, we should check the ok and return an error.

Member Author

updated
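The comma-ok assertion pattern being discussed looks roughly like this; LogsMarshaler and textEncoding are simplified stand-ins for the collector's marshaler interface and an encoding extension:

```go
package main

import (
	"errors"
	"fmt"
)

// LogsMarshaler is a simplified stand-in for the marshaler interface the
// exporter expects an encoding extension to implement.
type LogsMarshaler interface {
	MarshalLogs() ([]byte, error)
}

type textEncoding struct{}

func (textEncoding) MarshalLogs() ([]byte, error) { return []byte("ok"), nil }

// asLogsMarshaler uses the comma-ok form of a type assertion so that an
// extension of the wrong type yields an error instead of a runtime panic.
func asLogsMarshaler(ext any) (LogsMarshaler, error) {
	m, ok := ext.(LogsMarshaler)
	if !ok {
		return nil, errors.New("extension does not implement a logs marshaler")
	}
	return m, nil
}

func main() {
	if _, err := asLogsMarshaler(textEncoding{}); err != nil {
		fmt.Println("unexpected:", err)
		return
	}
	_, err := asLogsMarshaler(42) // wrong type: error, not panic
	fmt.Println(err)
}
```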

if m.tracesMarshaler == nil {
return nil, errors.New("traces are not supported by encoding")
}
buf, err := m.tracesMarshaler.MarshalTraces(td)
Contributor

You can return here.

Member Author

good catch! updated
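The simplification is just returning the marshal call directly instead of assigning to intermediate variables first. Sketched with hypothetical stand-in types for ptrace.Traces and the marshal function:

```go
package main

import (
	"errors"
	"fmt"
)

// traces and tracesMarshalFunc are hypothetical stand-ins for
// ptrace.Traces and the encoding extension's marshal function.
type traces string

type tracesMarshalFunc func(traces) ([]byte, error)

type marshaller struct {
	tracesMarshaler tracesMarshalFunc
}

// marshalTraces returns the marshal result directly; there is no need to
// capture buf/err just to return them on the next line.
func (m *marshaller) marshalTraces(td traces) ([]byte, error) {
	if m.tracesMarshaler == nil {
		return nil, errors.New("traces are not supported by encoding")
	}
	return m.tracesMarshaler(td)
}

func main() {
	m := &marshaller{tracesMarshaler: func(td traces) ([]byte, error) {
		return []byte(td), nil
	}}
	buf, err := m.marshalTraces("span-data")
	fmt.Println(string(buf), err)
}
```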

Contributor

@MovieStoreGuy MovieStoreGuy left a comment


These changes look safe to merge as is; anything extra can be added in future PRs.

@MovieStoreGuy
Contributor

Please run go mod tidy to resolve the issues in packages.

@hgaol
Member Author

hgaol commented Mar 5, 2025

Thank you all for the careful review! Updated, please take a look!

@hgaol hgaol requested a review from dehaansa March 5, 2025 01:22
Contributor

@dehaansa dehaansa left a comment


Approved with one docs question. Marked as ready to merge as I'm fine with fixing it as a quick followup if a maintainer gets in here first and merges.

@@ -36,18 +36,38 @@ The following settings can be optionally configured and have default values:
- traces_format (default `2006/01/02/traces_15_04_05_{{.SerialNum}}.{{.FileExtension}}`): blob name format.
Contributor

Are these documented defaults still correct?

Member Author

Good catch, I'll fix them in a future PR.

@dehaansa dehaansa added the ready to merge Code review completed; ready to merge by maintainers label Mar 5, 2025
@atoulme atoulme merged commit b8f05c0 into open-telemetry:main Mar 5, 2025
167 checks passed
@github-actions github-actions bot added this to the next release milestone Mar 5, 2025
Labels
exporter/azureblob ready to merge Code review completed; ready to merge by maintainers

Successfully merging this pull request may close these issues.

New component: Azure Blob Exporter
4 participants