Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring of azure obj store #3957 #3970

Merged
merged 18 commits into from Jul 5, 2021
Merged

Refactoring of azure obj store #3957 #3970

merged 18 commits into from Jul 5, 2021

Conversation

wiardvanrij
Copy link
Member

@wiardvanrij wiardvanrij commented Mar 24, 2021

Signed-off-by: Wiard van Rij wiard@outlook.com

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

  • Adds configuration options for the Azure Pipeline method
  • Adds configuration options for the Azure Reader method
  • Adds configuration options for the HTTP transport method
  • Adds HTTP Transport override
  • Adds MSI authentication (Cherry-pick from master: Fix parseStep bug introduced in PR #3740 (#3887) #3889)
  • Made parsing of the config strict
  • Breaks configuration by moving max_retries to it's own reader_config
  • Rewrote tests

Verification

How you tested it?
No, other than the tests included

Especially the MSI authentication part is untested.

How do you know it works?
This is tested via other users on #3957 :

@wiardvanrij ran your image all night locally with no failures.

Originally posted by @airkewld in #3957 (comment)


I hope this PR can get tested by users of Azure to confirm the working of the changes AND if this has positive impact for on, for example #3952 - relates to #3957

@wiardvanrij
Copy link
Member Author

Need to check why the test fails on the cortex dep. :D

@wiardvanrij
Copy link
Member Author

We use this part; https://github.com/cortexproject/cortex/blob/master/pkg/storage/bucket/azure/bucket_client.go#L10 and Cortex uses the Thanos part.
I think for this change both Cortex and Thanos need to be updated but I'm not sure on how to proceed with this.

@pracucci could you weight in here if my statement is even correct? Thank you!

@bwplotka
Copy link
Member

bwplotka commented Apr 8, 2021

Cycling dependency problem, so:

  • Fork Cortex, fix this on your fork, push remote branch to your fork
  • Update go.mod to replace cortex dep with your fork URL and Commit SHA
  • Let's review AND merge this PR
  • Propose the change in the fork to Cortex
  • Remove replace.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions otherwise look good!

pkg/objstore/azure/azure.go Show resolved Hide resolved
pkg/objstore/azure/helpers.go Show resolved Hide resolved
@phillebaba
Copy link
Contributor

Is there anything I can do to help move this along? :) Or is this dependent on changes that have to be made in Cortex?

@wiardvanrij
Copy link
Member Author

Is there anything I can do to help move this along? :) Or is this dependent on changes that have to be made in Cortex?

It's currently not blocking but just needs some fixes in packages. Which eventually also requires some changes at the Cortex side. I did however not foresee the complexity at first :p

But it's good to know that people need this, so I can make it a higher prio!

@airkewld
Copy link

Glad to see this moving along. My compact pod is dying multiple times a day at this point, hopefully this will resolve the issue for me and others.
Thanks

@wiardvanrij
Copy link
Member Author

@airkewld & @phillebaba see #3957 - I've build an image. But feel free to build it yourself. Wouldn't mind more testers on this feature :)

@phillebaba
Copy link
Contributor

I can probably test this as I am already running Thanos in AKS. Hopefully this will work with aad-pod-identity.

@wiardvanrij
Copy link
Member Author

I can probably test this as I am already running Thanos in AKS. Hopefully this will work with aad-pod-identity.

Don't use it on production yet tho.. :D Also can we move potential discussions to the issue? Makes it a bit easier. Thanks so much btw :)

@wiardvanrij
Copy link
Member Author

Cycling dependency problem, so:

  • Fork Cortex, fix this on your fork, push remote branch to your fork
  • Update go.mod to replace cortex dep with your fork URL and Commit SHA
  • Let's review AND merge this PR
  • Propose the change in the fork to Cortex
  • Remove replace.

I changed the configuration in a way that it's not breaking anymore AND it has no dependency issue anymore.
There is a technical trade-off here by implementing a configuration value that is somewhat obsolete.

We now use the max_retries config value if no specific other 'retry related' configuration is set. This means there is 0 change for those we do not use the extra added configuration options. If one sets specific retry value configurations, the max_retries config value is ignored in favor of the more specific config values.

So this resolved two things:

  • No dependency issue anymore
  • Backwards compatible

@nicolastakashi
Copy link
Contributor

Hi dudes!
any prevision to release it?

}

// If we don't have config specific retry values but we do have the generic MaxRetries.
// This is part backwards compatibility but also ease of configuration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless I have misunderstood, do you mean

Suggested change
// This is part backwards compatibility but also ease of configuration
// This part backwards compatibility but also ease of configuration

😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks, I will make this proper English :D

Copy link
Contributor

@yashrsharma44 yashrsharma44 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small question about the comment 😛

bwplotka
bwplotka previously approved these changes May 6, 2021
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, 💪🏽

pkg/objstore/azure/azure.go Show resolved Hide resolved
@wiardvanrij
Copy link
Member Author

I will update this PR as soon as this is merged: #4104 (comment)
Then I can use that generic http method 💯

@phillebaba
Copy link
Contributor

phillebaba commented May 9, 2021

@wiardvanrij had some time now to try this PR. The container is created without an issue but I am seeing authorization issues when uploading blocks from the receiver. I am using aad-pod-identity which adds additional complexities, but unless someone else has verified that this feature works I think you should hold off on merging.

@wiardvanrij
Copy link
Member Author

@wiardvanrij had some time now to try this PR. The container is created without an issue but I am seeing authorization issues when uploading blocks from the receiver. I am using aad-pod-identity which adds additional complexities, but unless someone else has verified that this feature works I think you should hold off on merging.

Thanks! Could you perhaps provide more information in #3957 ?

Copy link
Contributor

@phillebaba phillebaba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just small suggestions to fix the msi resource value.

docs/storage.md Outdated Show resolved Hide resolved
pkg/objstore/azure/azure_test.go Outdated Show resolved Hide resolved
pkg/objstore/azure/azure_test.go Outdated Show resolved Hide resolved
bwplotka
bwplotka previously approved these changes Jun 17, 2021
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! There are minor style nits, otherwise I would be happy to merge this.

Great work and sorry for major lag on review!

pkg/objstore/azure/azure.go Show resolved Hide resolved
pkg/objstore/azure/helpers.go Outdated Show resolved Hide resolved
pkg/objstore/azure/helpers.go Outdated Show resolved Hide resolved
pkg/objstore/azure/helpers.go Outdated Show resolved Hide resolved
pkg/objstore/azure/helpers.go Outdated Show resolved Hide resolved
pkg/objstore/azure/helpers.go Show resolved Hide resolved
pkg/objstore/azure/helpers.go Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
http_config:
idle_conn_timeout: 0s
response_header_timeout: 0s
insecure_skip_verify: false

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could tls_config be added instead of just insecure_skip_verify?
I know blob storage itself does not support it but we'd like to use a proxy (by changing the endpoint in the configuration) with mTLS for blob storage to avoid having to provide the storage account key to every cluster we monitor.
I also think it would be better to make the config as similar to Prometheus as possible for consistency and code resuse (if possible)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes and no. This is in line with all other storage options in Thanos: https://thanos.io/tip/thanos/storage.md/
There is some work being done to make these configs more logical, but for now I try to keep it aligned with how everything in Thanos works (:

edit:
We must also not confuse it on which part the TLS is for. I.e. this is for the connection towards whatever object store you have. Not to implement TLS on the component itself. Just for clarification.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the answer, I guess the solution to this problem is to implement a custom proxy and run it as a sidecar, using https with a dummy certificate and insecure_skip_verify

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the answer, I guess the solution to this problem is to implement a custom proxy and run it as a sidecar, using https with a dummy certificate and insecure_skip_verify

Yea I understand the case, it's just that I don't feel making one specific change here. However, it would make sense to implement a generic tls_config for non-component related configs, but for 'object storage connections'. As this would/should be implemented not only for Azure Blob, but also for all the other options Thanos supports.
Perhaps you could create a new issue for this?

wiardvanrij and others added 18 commits July 5, 2021 16:36
Signed-off-by: Wiard van Rij <wiard@outlook.com>
Signed-off-by: Wiard van Rij <wiard@outlook.com>
Signed-off-by: Wiard van Rij <wiard@outlook.com>
Signed-off-by: Wiard van Rij <wiard@outlook.com>
Signed-off-by: Wiard van Rij <wiard@outlook.com>
Signed-off-by: Wiard van Rij <wiard@outlook.com>
Signed-off-by: Wiard van Rij <wiard@outlook.com>
Co-authored-by: Philip Laine <philip.laine@gmail.com>
Signed-off-by: Wiard van Rij <wiard@outlook.com>
Co-authored-by: Philip Laine <philip.laine@gmail.com>
Signed-off-by: Wiard van Rij <wiard@outlook.com>
Co-authored-by: Philip Laine <philip.laine@gmail.com>
Signed-off-by: Wiard van Rij <wiard@outlook.com>
Signed-off-by: Wiard van Rij <wiard@outlook.com>
Signed-off-by: Wiard van Rij <wiard@outlook.com>
Signed-off-by: Wiard van Rij <wiard@outlook.com>
Signed-off-by: Wiard van Rij <wiard@outlook.com>
Signed-off-by: Wiard van Rij <wiard@outlook.com>
Signed-off-by: Wiard van Rij <wiard@outlook.com>
Signed-off-by: Wiard van Rij <wiard@outlook.com>
Signed-off-by: Wiard van Rij <wiard@outlook.com>
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing! Let's get this in and get some nice, sweet feedback.

Nice work @wiardvanrij , Mr Golang dev! (:

@amrmahdi
Copy link
Contributor

amrmahdi commented Sep 3, 2021

@wiardvanrij Thanks for adding these changes!
Couple of questions:

  1. The msi_resource why is it needed? Isn't it just a concat of storage account name and the endpoint?
  2. I'm assuming this does not support user assigned managed identity, correct?

@amrmahdi
Copy link
Contributor

amrmahdi commented Sep 3, 2021

@wiardvanrij Thanks for adding these changes!
Couple of questions:

  1. The msi_resource why is it needed? Isn't it just a concat of storage account name and the endpoint?
  2. I'm assuming this does not support user assigned managed identity, correct?

I think got my questions answered. I have a follow up PR to support user-assigned managed identity. #4636

@wiardvanrij
Copy link
Member Author

@wiardvanrij Thanks for adding these changes!
Couple of questions:

  1. The msi_resource why is it needed? Isn't it just a concat of storage account name and the endpoint?
  2. I'm assuming this does not support user assigned managed identity, correct?

I think got my questions answered. I have a follow up PR to support user-assigned managed identity. #4636

Wow thanks! Really good and thanks for helping out!

alvinlin123 pushed a commit to cortexproject/cortex that referenced this pull request Dec 13, 2021
* Introduce `http` config settings in Azure storage

Cortex v1.11.0 included thanos-io/thanos#3970, which added configuration
options to Azure's http client and transport, replacing usage of
`http.DefaultClient`. Unfortunately since Cortex was not setting this
config, Cortex implicitly switched from `http.DefaultClient` to all
empty values (e.g. `MaxIdleConns: 0` rather than 100).

Introduce `http` config settings to Azure storage. This motivated moving
`s3.HTTPConfig` into a new `pkg/storage/bucket/config` package, to allow
`azure` and `s3` to share it.

Also update the instructions for running the website to include
installing `embedmd`.

Signed-off-by: Andrew Seigner <andrew@sig.gy>

* feedback: `config.HTTP` -> `http.Config`

also back out changelog cleanup

Signed-off-by: Andrew Seigner <andrew@sig.gy>

* Back out accidental changelog addition

Signed-off-by: Andrew Seigner <andrew@sig.gy>
alvinlin123 pushed a commit to ac1214/cortex that referenced this pull request Jan 14, 2022
* Introduce `http` config settings in Azure storage

Cortex v1.11.0 included thanos-io/thanos#3970, which added configuration
options to Azure's http client and transport, replacing usage of
`http.DefaultClient`. Unfortunately since Cortex was not setting this
config, Cortex implicitly switched from `http.DefaultClient` to all
empty values (e.g. `MaxIdleConns: 0` rather than 100).

Introduce `http` config settings to Azure storage. This motivated moving
`s3.HTTPConfig` into a new `pkg/storage/bucket/config` package, to allow
`azure` and `s3` to share it.

Also update the instructions for running the website to include
installing `embedmd`.

Signed-off-by: Andrew Seigner <andrew@sig.gy>

* feedback: `config.HTTP` -> `http.Config`

also back out changelog cleanup

Signed-off-by: Andrew Seigner <andrew@sig.gy>

* Back out accidental changelog addition

Signed-off-by: Andrew Seigner <andrew@sig.gy>
Signed-off-by: Alvin Lin <alvinlin@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants