Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add optional explicit ForceTLS option to ClientTLS config #2036

Merged
merged 8 commits into from Oct 15, 2021

Conversation

thewmo
Copy link
Contributor

@thewmo thewmo commented Oct 8, 2021

What changed?
Add an optional useTls config property under publicClient to override whether or not TLS is used when connecting to the endpoint, regardless of the tls.frontend.server configuration.

Why?
To fix issue 2035, which I encountered while setting up Temporal under ECS in AWS with a frontend ALB load balancer terminating TLS.

How did you test it?
Tested locally and in my nonproduction environment.

Potential risks
Worker role may fail to start if property is incorrectly set. Not setting the property at all results in the status quo behavior where TLS is based on configuration present in tls.frontend.server.

Is hotfix candidate?
No

Note that I'm a newb to golang and am more than open to recommendations here.

@thewmo thewmo requested a review from a team October 8, 2021 23:45
@CLAassistant
Copy link

CLAassistant commented Oct 8, 2021

CLA assistant check
All committers have signed the CLA.

@thewmo thewmo changed the title 2035 explicit tls add optional explicit useTls option to publicClient config Oct 8, 2021
@sergeybykov
Copy link
Member

@thewmo Thank you so much for digging into the TLS code and submitting this!

I want to bring up a potential alternative here.

We added RootTLS.SystemWorker almost a year ago to decouple worker TLS settings from the frontend's. The thought at the time was to deprecate and eventually remove the old code path of configuring system workers implicitly. Deprecation is hard because of the breaking nature of the change. So, we still have both paths supported.

What if instead of passing tlsOption, we leave the legacy behavior as is, and only change the SystemWorker case to infer the need to use TLS from the fact that is it configured (isSystemWorker is true)?

The benefits I see is that we wouldn't need to change TLSConfigProvider interface and pass the option explicitly. The mere fact that SystemWorker is configured with a cert will tell us to use TLS. One downside of this approach is that the legacy way of configuring system workers will not support this case. Since we want to deprecate it anyway, I think that's an acceptable tradeoff.

What do yo think?

@thewmo
Copy link
Contributor Author

thewmo commented Oct 12, 2021

@sergeybykov Thank you for reviewing my PR. Apologies that I based it off the commit tagged 1.12.2, I did that for my own dev purposes but it made for a messy pull request.

I definitely can see that a solution that doesn't change the interface of TLSConfigProvider is desirable; that is the main thing I didn't like about mine. I think my only concern with the SystemWorker proposal is that, for a user in an environment that has an installed system-level set of CA certs (where Golang will find the cert on its own) they still need some way to tell the server "this endpoint is a TLS endpoint". (But maybe Temporal server doesn't support system-installed root certs?)

One option might be adding an explicit isEnabled option to the ClientTLS struct so the user can say "use TLS here even though I don't need to configure anything out of the ordinary". Part of the issue here is that whether or not an endpoint is TLS is logically a property of the endpoint and so having it in the "custom TLS" config is maybe problematic. If the HostPort supported the grpcs:// prefix that might be another way to go, though I don't know GRPC well enough to say.

Thanks again for looking at this, I have no objection to whatever you all ultimately decide to do, so long as I can configure the worker client to use TLS without having the frontend running TLS. :-)

@sergeybykov
Copy link
Member

I think my only concern with the SystemWorker proposal is that, for a user in an environment that has an installed system-level set of CA certs (where Golang will find the cert on its own) they still need some way to tell the server "this endpoint is a TLS endpoint". (But maybe Temporal server doesn't support system-installed root certs?)

I feel that I'm missing something here. Please help me understand.
System-level root certs are used if installed for checking client and server certs, but only to make sure they are valid. On top of that, server uses ServerTLS.ClientCAFiles or ServerTLS.ClientCAData to restrict what clients can connect to it, so that only client certs signed by the specified client CAs are accepted. I don't see how these settings can be implicitly set via the system-level set of CA certs. Otherwise, a client with any valid cert issues by those CAs would be able to connect. This is where I lost your point about environment.

Similarly, on the client side (system worker is a client) we need to tell it what certificate and key to use for connecting to the frontend. My thinking was that if we change the code to use TLS for system worker whenever SystemWorker properties are set, regardless of the server TLS configuration, then system worker will be able to connect to the ALB even if server TLS is not set at all. But maybe we just need to change

s.settings.Frontend.IsEnabled()

here to an equivalent of

client.IsEnabled()

Is this similar to what you meant by the following sentence? Would this resolve the issue for you?

One option might be adding an explicit isEnabled option to the ClientTLS struct so the user can say "use TLS here even though I don't need to configure anything out of the ordinary".

Then I lost you again on

Part of the issue here is that whether or not an endpoint is TLS is logically a property of the endpoint and so having it in the "custom TLS" config is maybe problematic. If the HostPort supported the grpcs:// prefix that might be another way to go, though I don't know GRPC well enough to say.

If you like, we can do a quick Zoom call to clear my misunderstanding.

Can you rebase this on top of the latest master? I can help with that if needed.

@thewmo
Copy link
Contributor Author

thewmo commented Oct 13, 2021

I'm sure any confusion is on me. I have rebased my branch on master. I would be happy to do a Zoom. I'm in the MDT timezone and I'm on the Temporal Slack so feel free to reach out to me there. Most of my day tomorrow is free.

To simplify what I'm trying to say, consider the scenario where no mutual TLS is used but the frontend is behind a TLS-terminating load balancer, and the worker client can find the CA cert for that server cert in the system store. In that scenario it would seem that potentially no configuration at all under RootTLS would be required - but we still need the worker client to attempt a TLS connection. So how to model that in the config?

Thanks for your patience, I hope we can connect soon.

@sergeybykov
Copy link
Member

I'm not sure how to find you on Slack without a full name. Could you DM me there, and we'll arrange a call.

@sergeybykov
Copy link
Member

@thewmo Thank you for explaining your use case on the call today.
What do you think about this alternative for enabling such connections? Would it enable your case?

@thewmo
Copy link
Contributor Author

thewmo commented Oct 14, 2021

@sergeybykov yes, adding such an option to the ClientTLS struct is one of the other options I entertained - I would be good with this solution.

I do think though, that for this comment to be most accurate:

// Optional - Use TLS even is neither client certificate nor root CAs are configured
// This is for non-mTLS cases when client validates serve against a set of trusted CA certificates configured in the environment

Code would need to be added to also enable TLS when a server CA is set (in either rootCaFiles or rootCaData) in the relevant ClientTLS struct. Neither isSystemWorker or isEnabled currently checks those. Thank you!

@sergeybykov
Copy link
Member

@thewmo For some reason, I don't have permissions to push to your branch. Can you take a look and merge from my branch?

@thewmo
Copy link
Contributor Author

thewmo commented Oct 15, 2021

@sergeybykov this looks good to me. I don't know why you couldn't push to my branch, I do have the "allow maintainers..." box checked. I went ahead and invited you explicitly to my repo. Feel free to push to it, or I will pull your branch when I'm back in front of my work computer.

@sergeybykov
Copy link
Member

Thank you, @thewmo. I was able to push now.
Will ask our server engineers to review, although they already blessed the general approach.

@sergeybykov sergeybykov changed the title add optional explicit useTls option to publicClient config add optional explicit ForceTLS option to ClientTLS config Oct 15, 2021
@sergeybykov sergeybykov merged commit 043c974 into temporalio:master Oct 15, 2021
@sergeybykov
Copy link
Member

Thank you, @thewmo!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants