
ClusterMetadata: Default cluster potentially uses wrong frontend RPC address #149

Closed
thempatel opened this issue Mar 23, 2021 · 7 comments

Comments


thempatel commented Mar 23, 2021

We're standing up the Temporal service via Helm, and I noticed this while configuring the various YAML files. If a user configures a custom gRPC port for the frontend service, the hardcoded default of 7933 will be incorrect.

rpcAddress: "127.0.0.1:7933"

It also seems that the localhost address 127.0.0.1 would be incorrect in a deployed environment, assuming the various services (history, matching, frontend, worker) are deployed separately.

https://github.com/temporalio/temporal/blob/e2e26004552cbc0867afb342238bb3f9efeee6ce/client/clientBean.go#L87-L96
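
For reference, here's a rough sketch of the cluster metadata block this default feeds into, with the address corrected for a deployed environment. The service name temporal-frontend and port 7233 are only illustrative; the point is that rpcAddress must match the address and gRPC port the frontend is actually configured with:

    clusterMetadata:
      currentClusterName: "active"
      clusterInformation:
        active:
          enabled: true
          initialFailoverVersion: 1
          rpcName: "frontend"
          # must point at the frontend's reachable address and its configured gRPC port,
          # not the hardcoded 127.0.0.1:7933 default
          rpcAddress: "temporal-frontend:7233"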


emmercm commented Nov 17, 2022

@thempatel did you happen to resolve this in your environment? I believe I'm running into a similar issue.

@thempatel
Author

@emmercm we ended up forking the Helm chart for Temporal to fix various bugs in it. For this one, I changed the hard-coded port so that it is sourced from user configuration (values.yaml) instead.
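
As an illustrative sketch (the key names here are hypothetical, not the upstream chart's exact structure), the change amounts to templating the address from values rather than hardcoding it:

    # values.yaml (hypothetical keys)
    server:
      frontend:
        service:
          port: 7233

    # cluster metadata template: derive the address instead of hardcoding "127.0.0.1:7933"
    clusterMetadata:
      clusterInformation:
        active:
          rpcAddress: "{{ .Release.Name }}-frontend:{{ .Values.server.frontend.service.port }}"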

Note: it's been a really long time, so take this with a grain of salt:

IIRC, the localhost address is OK because I think there's actually a proxy listening on localhost for the frontend service, so connecting to localhost just forwards the connection to the internally configured frontend RPC service, which then forwards to the actual services. 🤷🏽‍♂️


emmercm commented Nov 18, 2022

@thempatel we've also forked the chart, though more to better configure our unique Kubernetes environment and multi-cluster setup than anything else.

The proxy would make a ton of sense, but I didn't find any trace of it in GitHub: https://github.com/search?q=org%3Atemporalio+7933&type=code. I would think localhost in this case would be Kube node-local rather than container-local, right? I'm running into issues with multi-cluster where I believe I'm getting some cross-talk, and I've convinced myself it's this localhost config.

@thempatel
Author

@emmercm after reading #333, I noticed you're trying to run two distinct Temporal clusters. You cannot do this without isolating them: the services use a gossip protocol in which they broadcast messages on a port. If both clusters have services broadcasting on the same port but you've configured two different storage instances (SQL, etc.), you're going to run into problems.

The reason I filed this issue (and subsequently forked the chart) was precisely so that we could run multiple clusters, each configured with different ports, so the clusters don't run into each other.
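
As a rough sketch (assuming the standard Temporal server config layout; port numbers are arbitrary), each cluster's services get their own gRPC and membership (gossip) ports so the rings never overlap:

    # cluster A
    services:
      frontend:
        rpc: { grpcPort: 7233, membershipPort: 6933 }
      history:
        rpc: { grpcPort: 7234, membershipPort: 6934 }
      matching:
        rpc: { grpcPort: 7235, membershipPort: 6935 }
      worker:
        rpc: { grpcPort: 7239, membershipPort: 6939 }

    # cluster B uses a disjoint set of ports so its services never gossip with cluster A's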

One thing you could try, if you haven't already, is to put each cluster in its own Kubernetes namespace. If that works, you'll just need to account for the namespace in the address your clients use to connect to the cluster.
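
For example, if the clusters live in namespaces cluster-a and cluster-b (names are illustrative), the frontend address your clients and cluster metadata use would need to carry the namespace, along the lines of:

    # Kubernetes Service DNS takes the form <service>.<namespace>.svc.cluster.local
    rpcAddress: "temporal-frontend.cluster-a.svc.cluster.local:7233"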


emmercm commented Nov 21, 2022

For some reason I swore the gossip behavior was deprecated/removed in Temporal as a step away from Cadence, but a quick tctl admin membership list_gossip shows all the pods I would expect. Thank you for redirecting me on this one; it probably helps explain some of the behavior I'm seeing.

@dmateusp

This was super helpful! After a fresh deployment, no communication with the task queues was working; I was getting context deadline timeouts on tctl tq describe --taskqueue all.

Then, following this thread, I changed the rpcAddress to match the frontend service name and port in my cluster: rpcAddress: "temporal-frontend:7233"

Now I'm able to list task queues, and I can see that the matching server has joined in the output of tctl admin membership list_gossip.

@robholland
Contributor

Fixed by #497, though this setting is no longer used as of Temporal 1.18 anyway.

robholland added a commit that referenced this issue Jun 18, 2024