Error of Persistence Max QPS Reached for List Operations #3900
Hi, we increased this in #3753 and released it in https://github.com/uber/cadence/releases/tag/v0.17.0. Please check it out and let us know if that meets your expectations. :D |
Ah, looks good, missed that one. I can close this issue then! |
Unfortunately, this issue is still causing problems for me. I am running Cadence locally using the docker-compose.yml auto setup. I have recently upgraded all my Docker images to the latest versions. After running a workflow and then navigating to the Cadence GUI, I get the following error when I navigate to my domain workflows: Persistence Max QPS Reached for List Operations. This is fixed by creating a custom dynamic config file, mounting the folder it is located in into the container, and changing the DYNAMIC_CONFIG_FILE_PATH environment variable of the Cadence server container to point to the corresponding file. I would have expected this to no longer be necessary? |
@frtelg I have tested with my setup. So make sure you upgrade the docker-compose files, like we said in https://github.com/uber/cadence/tree/master/docker#using-a-released-image (we probably should make it more clear). |
@longquanzheng I have pulled the latest version of the image before testing it. I will test it again tomorrow, maybe I was still using an older version after all. I'll let you know. |
@frtelg
I only ran the helloworld sample. So can you also check whether your workflow is calling the List API? |
I have tested again, using the following steps:
public interface GreetingWorkflow {
String TASK_LIST = "Example";
@WorkflowMethod(executionStartToCloseTimeoutSeconds = 360, taskList = TASK_LIST)
void greet();
@SignalMethod
void changeName(String name);
@SignalMethod
void terminate();
@QueryMethod
String getCurrentName();
}
This is the cadence server docker:
After this, I stop the containers and the application and add the following to the deployment.yaml file:

frontend.visibilityListMaxQPS:
- value: 10000
frontend.esVisibilityListMaxQPS:
- value: 10000

When I retest, the GUI works as expected. You can check out the application if you want; it is on my GitHub: https://github.com/frtelg/cadence-spring-boot. |
I can't reproduce it. For a while I saw it and thought it was an issue in the WebUI, but then I could not reproduce it anymore... |
@frtelg do the released docker-compose files help? |
@longquanzheng it is not really clear to me which files you are referring to. I have used the default docker-compose from the cadence project. My docker-compose file looks like this:

version: '3'
services:
cassandra:
image: cassandra:3.11
ports:
- "9042:9042"
statsd:
image: graphiteapp/graphite-statsd
ports:
- "8080:80"
- "2003:2003"
- "8125:8125"
- "8126:8126"
cadence:
image: ubercadence/server:master-auto-setup
ports:
- "7933:7933"
- "7934:7934"
- "7935:7935"
- "7939:7939"
environment:
- "CASSANDRA_SEEDS=cassandra"
- "STATSD_ENDPOINT=statsd:8125"
- "DYNAMIC_CONFIG_FILE_PATH=custom-config/development.yaml"
depends_on:
- cassandra
- statsd
volumes:
- "./config:/etc/cadence/custom-config"
cadence-web:
image: ubercadence/web:latest
environment:
- "CADENCE_TCHANNEL_PEERS=cadence:7933"
ports:
- "8088:8088"
depends_on:
- cadence |
@frtelg I understand this is annoying. I have opened PR #4138 (see cadence/docker/config_template.yaml, line 3 in d0a8f7e).
And let me know when you see the debug logs. If they are not from your application, we will have a clue how to fix it. |
@longquanzheng the supplied container version is not working:
This is my docker-compose.yml:

version: '3'
services:
cassandra:
image: cassandra:3.11
ports:
- "9042:9042"
statsd:
image: graphiteapp/graphite-statsd
ports:
- "8080:80"
- "2003:2003"
- "8125:8125"
- "8126:8126"
cadence:
image: ubercadence/qlong-server:master-04-15-2021-auto-setup
ports:
- "7933:7933"
- "7934:7934"
- "7935:7935"
- "7939:7939"
environment:
- "CASSANDRA_SEEDS=cassandra"
- "STATSD_ENDPOINT=statsd:8125"
# - "DYNAMIC_CONFIG_FILE_PATH=custom-config/development.yaml"
- "LOG_LEVEL=debug"
depends_on:
- cassandra
- statsd
volumes:
- "./config:/etc/cadence/custom-config"
cadence-web:
image: ubercadence/web:latest
environment:
- "CADENCE_TCHANNEL_PEERS=cadence:7933"
ports:
- "8088:8088"
depends_on:
- cadence |
@frtelg Sorry that error was totally my bad when building the customized image. I forgot to add the auto-setup argument. Can you try this one: LMK. Thanks |
@frtelg I finally reproduced this stably myself. |
^ I think I have root-caused the issue. I got the repro because I updated my web image.

TL;DR: There is a change in the WebUI that always makes 2 requests on the default page, so that it can show both open and closed workflows. However, our rate limiting has a bucket size of only 1, even though the refill rate is 10. So it rejects requests at a very fast rate. Note that this is mostly only an issue in local docker-compose.

In more detail: the WebUI now tries to fetch both open and closed workflows by default, so the default page has to make at least two List requests. However, it looks like the rate limiting doesn't work as we expected, or we didn't configure it correctly. Even though MaxQPS defaults to 10, that is only the refill rate; the bucket does not allow 2 requests at the same time. There are a couple of ways to fix this:
To mitigate, users can select the closed or open view themselves and ignore the error for now. @just-at-uber do you think we can implement retry logic in the WebUI? I think it would be useful in many ways. Even if we add some initial bucket size configuration for rate limiting, it is still good to have some retry in the WebUI when talking to the Cadence frontend. |
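The bucket-size behavior described above can be illustrated with a minimal token-bucket sketch. This is only an illustration under the stated numbers (capacity 1, refill rate 10/s), not Cadence's actual rate-limiter code, and the class and method names are hypothetical:

```java
// Minimal token-bucket sketch (hypothetical; not Cadence's real limiter).
// With capacity 1 and refill rate 10 tokens/second, two simultaneous
// requests cannot both succeed, even though the average QPS limit is 10.
public class TokenBucket {
    private final double capacity;      // maximum burst size (bucket size)
    private final double refillPerSec;  // tokens added per second
    private double tokens;              // current token count
    private long lastNanos;             // time of last refill

    public TokenBucket(double capacity, double refillPerSec, long nowNanos) {
        this.capacity = capacity;
        this.refillPerSec = refillPerSec;
        this.tokens = capacity;         // bucket starts full
        this.lastNanos = nowNanos;
    }

    public synchronized boolean tryAcquire(long nowNanos) {
        // Refill proportionally to elapsed time, capped at the bucket size.
        double elapsedSec = (nowNanos - lastNanos) / 1e9;
        tokens = Math.min(capacity, tokens + elapsedSec * refillPerSec);
        lastNanos = nowNanos;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;                // request admitted
        }
        return false;                   // request rejected: bucket empty
    }
}
```

With capacity 1, the first List request drains the bucket and the WebUI's second simultaneous request is rejected; a request arriving 100ms later succeeds because one token has refilled by then.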
Thanks! Great that you managed to find the bug. I have not yet found the time to retest it. |
I think retry logic here is good to have anyway for this screen, in case the API fails. Ideally the server should handle a higher load by default. |
@just-at-uber yeah, I agree that the server should also improve. I took a look, but currently none of the rate limiting in the server allows any bursting. It may take more effort to introduce it (and new configuration as well). |
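The client-side retry suggested above could look roughly like the sketch below. This is a hedged illustration, not the WebUI's actual code (the real WebUI is JavaScript, and the `Retry.withBackoff` helper name is hypothetical); it simply retries a failed call with exponentially growing delays:

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of client-side retry with exponential backoff,
// the kind of logic a UI could apply when a List call is rate-limited.
public class Retry {
    public static <T> T withBackoff(Callable<T> call, int maxAttempts, long baseDelayMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.call();            // success: return immediately
            } catch (Exception e) {
                last = e;                      // remember the failure and back off
                Thread.sleep(baseDelayMs << attempt); // base, 2x, 4x, ...
            }
        }
        throw last;                            // all attempts exhausted
    }
}
```

With a token bucket that refills at 10 tokens/second, even a short backoff (e.g. 100ms base delay) would let the second List request succeed on its first retry.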
See misplaced issue: uber/cadence-web#227