Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very high CPU with v0.16.0 #1037

Closed
max-wittig opened this issue Dec 22, 2021 · 21 comments
Closed

Very high CPU with v0.16.0 #1037

max-wittig opened this issue Dec 22, 2021 · 21 comments

Comments

@max-wittig
Copy link

Seems like that since upgrading to version 0.16.0, the machine requirements increased a lot. Before our machine was almost idle at maximum ~25%. Now it's always going to 100%.

Is this expected?

image

image

@Mogztter

/cc @dlouzan @ercanucan

@ggrossetie
Copy link
Member

I recently discovered an infinite loop in PlantUML [1] which is already fixed on the latest snapshot but in your case it seems that Chrome is taking a lot of CPU.
I've seen some issues with Mermaid in the past but it would interesting to find out which diagram library and which diagram in particular is causing this issue. You can use docker logs to get the logs from yuzutech/kroki between 10:00 and 10:30.

If it's indeed related to a particular container, as a workaround, you can restart it.

[1] plantuml/plantuml#836

@max-wittig
Copy link
Author

Not sure. We have a lot of traffic, but it's almost completely plantuml actually. With some ditaa in between

@max-wittig
Copy link
Author

We've just increased the machine size now to avoid issues over the holidays. But looks like maybe it was just a temporary thing.

@ggrossetie
Copy link
Member

Thats odd... PlantUML and ditaa are not using Chrome.
Maybe a Chrome instance went haywire during the upgrade/restart of the bpmn, mermaid or excalidraw container.

Anyway, if you find the root cause or investigate further please share the results (even in raw form).

@hof
Copy link

hof commented Dec 31, 2021

I'm seeing the same thing. After starting the latest containers immediately seeing a bunch of chrome processes in mermaid, bpmn and excalidraw that take the CPU to 100%.

The logs of mermaid, bpmn and excalidraw just show the endpoint only.

The kroki container shows a warning:

{"timestamp":"1640914105494","level":"WARN","thread":"vertx-blocked-thread-checker","logger":"io.vertx.core.impl.BlockedThreadChecker","message":"Thread Thread[vert.x-eventloop-thread-1,5,main] has been blocked for 2294 ms, time limit is 2000 ms","context":"default"}
{"timestamp":"1640914105702","level":"INFO","thread":"vert.x-eventloop-thread-0","logger":"io.vertx.core.impl.launcher.commands.VertxIsolatedDeployer","message":"Succeeded in deploying verticle","context":"default"}

@ggrossetie
Copy link
Member

Might be an issue with the latest version of Chrome, not sure... does restarting the containers solve this issue?

@hof
Copy link

hof commented Jan 4, 2022

When I restarted the services again on January 3rd, the issue was gone. No idea what happened here.

@ggrossetie
Copy link
Member

Since @max-wittig had the same issue it would be interesting to compare your runtime environment:

  • docker version
  • are you using docker-compose? docker-compose version?
  • are you using Kubernetes or similar?
  • host OS and version
  • command used to start the container(s)

It might help us identify the root cause. Anyway, I'm glad this is a one-time issue but at the same I'm a bit confused 🤔

@dlouzan
Copy link

dlouzan commented Jan 6, 2022

@Mogztter I'm on the same team as @max-wittig. Unfortunately, even after increasing the node size, we've just recently run out of credits on the instance, so it's not a one time thing, the latest version increased the CPU requirements significantly.

@ggrossetie
Copy link
Member

@dlouzan Could you please provide information about your runtime environment?

Version 0.16.0 has been running fine on https://kroki.io for more than 2 weeks now and we haven't see any high CPU usage.

Host

$ cat /etc/*release*
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.3 LTS"
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
Versions
$ docker-compose -v
docker-compose version 1.24.0, build 0aa59064
$ docker -v
Docker version 20.10.12, build e91ed57
$ docker ps
CONTAINER ID   IMAGE                              COMMAND                  CREATED       STATUS       PORTS                                       NAMES
55bd9030bdae   yuzutech/kroki-mermaid:0.16.0      "/usr/bin/mermaid"       2 weeks ago   Up 2 weeks   8002/tcp                                    kroki_mermaid_1
d9ce053dad1b   yuzutech/kroki-excalidraw:0.16.0   "/usr/bin/excalidraw"    2 weeks ago   Up 2 weeks   8004/tcp                                    kroki_excalidraw_1
d8322d1ffe8a   yuzutech/kroki-bpmn:0.16.0         "/usr/bin/bpmn"          2 weeks ago   Up 2 weeks   8003/tcp                                    kroki_bpmn_1
e4fda95fbf03   yuzutech/kroki-blockdiag:0.16.0    "gunicorn --preload …"   2 weeks ago   Up 2 weeks   8001/tcp                                    kroki_blockdiag_1
57ab98bed580   yuzutech/kroki:0.16.0              "/bin/sh -c 'exec ja…"   2 weeks ago   Up 2 weeks   0.0.0.0:8000->8000/tcp, :::8000->8000/tcp   kroki_core_1

docker stats

CONTAINER ID   NAME                 CPU %     MEM USAGE / LIMIT    MEM %     NET I/O           BLOCK I/O         PIDS
55bd9030bdae   kroki_mermaid_1      0.00%     119.3MiB / 768MiB    15.53%    26MB / 340MB      16.8GB / 16.4kB   32
d9ce053dad1b   kroki_excalidraw_1   0.00%     80.45MiB / 512MiB    15.71%    10.9MB / 37.9MB   96.1MB / 16.4kB   32
d8322d1ffe8a   kroki_bpmn_1         0.00%     114.6MiB / 512MiB    22.38%    13.6MB / 29.1MB   219MB / 16.4kB    32
e4fda95fbf03   kroki_blockdiag_1    0.01%     266.2MiB / 512MiB    52.00%    157MB / 1.54GB    424GB / 0B        2
57ab98bed580   kroki_core_1         1.61%     859.4MiB / 1.75GiB   47.96%    3.8GB / 25.7GB    104MB / 0B        45

CPU 95th percentile last 2 weeks

image

Green: Docker CPU total (percent)
Blue: System CPU total normalized 2 core (percent)

System load 95th last 2 weeks

image

Pink: system load 1
Green: system load 5
Blue: system load 15

As you can see everything is looking good and we generated approximately 650K diagrams over the last 2 weeks.

@ggrossetie
Copy link
Member

Any update? are you still experiencing issues with 0.16.0? Did you revert back to 0.15.1?

@max-wittig
Copy link
Author

Yeah we reverted back to 0.15.1. There was not enough time yet to investigate the issues in 0.16.0.

@cboos
Copy link

cboos commented Jan 26, 2022

Same here with 0.16.0. Let me describe the problem and the "workaround" I found.

Using the documented start-up procedure with the example docker-compose.yml (docker-compose version 1.13.0, build 1719ceb), I get:

# docker-compose up -d
Creating network "kroki_default" with the default driver
Creating kroki_blockdiag_1 ...
Creating kroki_bpmn_1 ...
Creating kroki_excalidraw_1 ...
Creating kroki_mermaid_1 ...
Creating kroki_blockdiag_1
Creating kroki_bpmn_1
Creating kroki_mermaid_1
Creating kroki_bpmn_1 ... done
Creating kroki_kroki_1 ...
Creating kroki_kroki_1 ... done

# docker stats
e905b7a307d0   kroki_kroki_1        0.08%     124.4MiB / 62.62GiB   0.19%     360B / 0B   154MB / 0B      29
39b97d381b3c   kroki_mermaid_1      0.00%     82.94MiB / 62.62GiB   0.13%     690B / 0B   581MB / 418kB   54
0c7994848045   kroki_excalidraw_1   128.00%   80.27MiB / 62.62GiB   0.13%     690B / 0B   568MB / 418kB   51
b936bb2861e2   kroki_bpmn_1         127.56%   81.44MiB / 62.62GiB   0.13%     690B / 0B   567MB / 418kB   51
4ce0aa9bd371   kroki_blockdiag_1    0.01%     29.23MiB / 62.62GiB   0.05%     744B / 0B   35.3MB / 0B     2

As you can see, the excalidraw and bpmn are not doing well (same chrome --type=renderer process as already reported by the others).

Now if I stop these and simply restart them:

# docker-compose stop excalidraw bpmn
Stopping kroki_excalidraw_1 ... done
Stopping kroki_bpmn_1 ... done

# docker stats
CONTAINER ID   NAME                CPU %     MEM USAGE / LIMIT     MEM %     NET I/O     BLOCK I/O       PIDS
e905b7a307d0   kroki_kroki_1       0.25%     125.2MiB / 62.62GiB   0.20%     360B / 0B   154MB / 0B      29
39b97d381b3c   kroki_mermaid_1     0.00%     82.63MiB / 62.62GiB   0.13%     690B / 0B   581MB / 418kB   44
4ce0aa9bd371   kroki_blockdiag_1   0.02%     29.26MiB / 62.62GiB   0.05%     744B / 0B   35.3MB / 0B     2

# docker-compose up -d
kroki_blockdiag_1 is up-to-date
Starting kroki_bpmn_1 ...
Starting kroki_excalidraw_1 ...
Starting kroki_excalidraw_1
Starting kroki_bpmn_1
Starting kroki_excalidraw_1 ... done
kroki_kroki_1 is up-to-date

# docker stats
CONTAINER ID   NAME                 CPU %     MEM USAGE / LIMIT     MEM %     NET I/O     BLOCK I/O       PIDS
e905b7a307d0   kroki_kroki_1        0.65%     125.2MiB / 62.62GiB   0.20%     360B / 0B   154MB / 0B      29
39b97d381b3c   kroki_mermaid_1      0.01%     82.63MiB / 62.62GiB   0.13%     690B / 0B   581MB / 418kB   44
0c7994848045   kroki_excalidraw_1   0.00%     68.16MiB / 62.62GiB   0.11%     360B / 0B   0B / 418kB      44
b936bb2861e2   kroki_bpmn_1         0.00%     68.73MiB / 62.62GiB   0.11%     360B / 0B   0B / 418kB      44
4ce0aa9bd371   kroki_blockdiag_1    0.02%     29.27MiB / 62.62GiB   0.05%     744B / 0B   35.3MB / 0B     2

Problem gone! Once more:

# docker-compose down
Stopping kroki_kroki_1 ...
Stopping kroki_kroki_1 ... done
Stopping kroki_excalidraw_1 ...
Stopping kroki_bpmn_1 ...
Stopping kroki_blockdiag_1 ...
kroki_kroki_1 exited with code 143
blockdiag_1   | [2022-01-26 09:37:39 +0000] [1] [INFO] Handling signal: term
blockdiag_1   | [2022-01-26 09:37:39 +0000] [7] [INFO] Worker exiting (pid: 7)
Stopping kroki_mermaid_1 ... done
Stopping kroki_excalidraw_1 ... done

Stopping kroki_bpmn_1 ... done
kroki_mermaid_1 exited with code 137
kroki_bpmn_1 exited with code 137
Removing kroki_kroki_1 ... done
Removing kroki_mermaid_1 ... done
Removing kroki_excalidraw_1 ... done
Removing kroki_bpmn_1 ... done
Removing kroki_blockdiag_1 ... done
Removing network kroki_default

# docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

# docker-compose up -d
Creating network "kroki_default" with the default driver
Creating kroki_excalidraw_1 ...
Creating kroki_bpmn_1 ...
Creating kroki_blockdiag_1 ...
Creating kroki_mermaid_1 ...
Creating kroki_bpmn_1
Creating kroki_excalidraw_1
Creating kroki_blockdiag_1
Creating kroki_mermaid_1 ... done
Creating kroki_kroki_1 ...
Creating kroki_kroki_1 ... done

# docker stats
CONTAINER ID   NAME                 CPU %     MEM USAGE / LIMIT     MEM %     NET I/O     BLOCK I/O       PIDS
777ea743b7e4   kroki_kroki_1        0.10%     125.8MiB / 62.62GiB   0.20%     360B / 0B   155MB / 0B      29
1be6af9d88dc   kroki_mermaid_1      0.00%     81.93MiB / 62.62GiB   0.13%     580B / 0B   581MB / 418kB   52
91747f3f9550   kroki_blockdiag_1    0.01%     29.09MiB / 62.62GiB   0.05%     690B / 0B   35.3MB / 0B     2
65265fe4eacc   kroki_excalidraw_1   129.63%   80.28MiB / 62.62GiB   0.13%     690B / 0B   568MB / 418kB   50
4128ca121e41   kroki_bpmn_1         128.08%   80.8MiB / 62.62GiB    0.13%     744B / 0B   567MB / 418kB   50

# docker-compose restart bpmn excalidraw
Restarting kroki_excalidraw_1 ... done
Restarting kroki_bpmn_1 ... done

# docker stats
CONTAINER ID   NAME                 CPU %     MEM USAGE / LIMIT     MEM %     NET I/O     BLOCK I/O       PIDS
777ea743b7e4   kroki_kroki_1        0.28%     126MiB / 62.62GiB     0.20%     360B / 0B   155MB / 0B      29
1be6af9d88dc   kroki_mermaid_1      0.00%     81.72MiB / 62.62GiB   0.13%     580B / 0B   581MB / 418kB   44
91747f3f9550   kroki_blockdiag_1    0.02%     29.1MiB / 62.62GiB    0.05%     690B / 0B   35.3MB / 0B     2
65265fe4eacc   kroki_excalidraw_1   0.00%     68MiB / 62.62GiB      0.11%     360B / 0B   0B / 418kB      51
4128ca121e41   kroki_bpmn_1         0.00%     68.58MiB / 62.62GiB   0.11%     360B / 0B   0B / 418kB      50

In all of the above runs, only bpmn and excalidraw had the issue, but that's not always the case, the mermaid service can be affected as well:

# docker stats
CONTAINER ID   NAME                 CPU %     MEM USAGE / LIMIT     MEM %     NET I/O     BLOCK I/O       PIDS
a325e4cfe432   kroki_kroki_1        0.13%     129.5MiB / 62.62GiB   0.20%     360B / 0B   155MB / 0B      29
7ac56dc66f9f   kroki_bpmn_1         125.93%   81.14MiB / 62.62GiB   0.13%     470B / 0B   567MB / 418kB   50
18b32d45b80d   kroki_mermaid_1      127.79%   82.77MiB / 62.62GiB   0.13%     470B / 0B   581MB / 418kB   53
7cd4d9e01824   kroki_blockdiag_1    0.01%     29.15MiB / 62.62GiB   0.05%     634B / 0B   35.3MB / 0B     2
cdab6744cb4e   kroki_excalidraw_1   128.42%   80.5MiB / 62.62GiB    0.13%     634B / 0B   568MB / 418kB   50

Same procedure, a docker-compose restart excalidraw bpmn mermaid fixes it. A bit more surprisingly, even a docker-compose restart does.

It gets weirder: whether I do docker-compose up --no-deps kroki; docker-compose up or if I do docker-compose up mermaid bpmn excalidraw; docker-compose up, the problem is there. Only restarting fixes it apparently.

Sorry for the long post, I thought that maybe the full output may enable you to see things I didn't.

Also:

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.3 LTS
Release:        20.04
Codename:       focal

# uname -a
Linux XXXXXX 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

# docker version
Client: Docker Engine - Community
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        f0df350
 Built:             Wed Jun  2 11:56:47 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       b0f5bc3
  Built:            Wed Jun  2 11:54:58 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.6
  GitCommit:        d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc:
  Version:          1.0.0-rc95
  GitCommit:        b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

I thought I would also verify that reverting to 0.15.1 also "fixes" the issue. It does.

# docker-compose up -d 
...

# docker stats
CONTAINER ID   NAME                 CPU %     MEM USAGE / LIMIT     MEM %     NET I/O     BLOCK I/O     PIDS
08fbefd8edfe   kroki_kroki_1        0.28%     131.5MiB / 62.62GiB   0.21%     360B / 0B   156MB / 0B    29
7595ab004d92   kroki_blockdiag_1    0.02%     28.95MiB / 62.62GiB   0.05%     470B / 0B   35.3MB / 0B   2
b0125b67b3c4   kroki_mermaid_1      0.00%     75.34MiB / 62.62GiB   0.12%     580B / 0B   610MB / 0B    50
6956417e37c5   kroki_excalidraw_1   0.00%     73.53MiB / 62.62GiB   0.11%     580B / 0B   599MB / 0B    50
53d7944e25a0   kroki_bpmn_1         0.00%     74.6MiB / 62.62GiB    0.12%     690B / 0B   596MB / 0B    50

@ggrossetie
Copy link
Member

Thanks for the detailed report it really helps!
Could you please run the following command:

$ docker-compose -v

I'm using almost the same version on kroki.io and I was able to reproduce this issue:

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.3 LTS
Release:	20.04
Codename:	focal
$ uname -a
Linux xxx 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
$ docker version
Client: Docker Engine - Community
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.12
 Git commit:        e91ed57
 Built:             Mon Dec 13 11:45:27 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.12
  Git commit:       459d0df
  Built:            Mon Dec 13 11:43:36 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

I can only reproduce it when I start more than one container that relies on Puppeteer. I also cannot reproduce it on my machine using the latest version of docker-compose (version 2.2.2).

@ggrossetie
Copy link
Member

Probably an issue between Docker and Puppeteer/Chrome, we did upgrade from puppeteer 10.4.0 to 12.0.1.

Puppeteer 10.4.0 was using Chromium 92.0.4512.0 (r884014) whereas Puppeteer 12.0.1 is using Chromium 97.0.4692.0 (r938248). Hopefully this issue is fixed in the latest version of Puppeteer 13.1.2 (using Chromium 98.0.4758.0 (r950341)).

We might also consider disabling a few features from Chromium: https://peter.sh/experiments/chromium-command-line-switches/

  • --no-zygote : Disables the use of a zygote process for forking child processes. Instead, child processes will be forked and exec'd directly. Note that --no-sandbox should also be used together with this flag because the sandbox needs the zygote to work.
  • --disable-dev-shm-usage: The /dev/shm partition is too small in certain VM environments, causing Chrome to fail or crash (see http://crbug.com/715363). Use this flag to work-around this issue (a temporary directory will always be used to create anonymous shared memory files).
  • --no-first-run: Skip First Run tasks, whether or not it's actually the First Run. Overridden by kForceFirstRun. This does not drop the First Run sentinel and thus doesn't prevent first run from occuring the next time chrome is launched without this flag.
  • --no-initial-navigation : Stops new Shell objects from navigating to a default url

We might also try to add --single-process (Runs the renderer and plugins in the same process as the browser)... 🤔

@ggrossetie
Copy link
Member

I also cannot reproduce it on my machine using the latest version of docker-compose (version 2.2.2).

I was able to reproduce this issue using docker-compose 2.2.3 on kroki.io but not always 😞

@cboos
Copy link

cboos commented Jan 26, 2022

# docker-compose -v
docker-compose version 1.13.0, build 1719ceb

@ethereal-engineer
Copy link

Also having this issue. My first thought was perhaps the chrome instances were thrashing due to hitting a critical memory usage point, but after reading these comments, I'm not sure. Running similar config to those listed. Will revert to 0.15.1 for now until I hear more back from you all.

@cboos
Copy link

cboos commented Jan 27, 2022

As you pointed to Puppeteer, I suppose you already found the following issue: puppeteer/puppeteer/issues/7892. Not much action on it yet, though :/

@ggrossetie
Copy link
Member

Interesting, especially this part:

Inspecting the process list, it seems that two particular instances of the headless_shell get locked:

Using --single-process might solve this deadlock issue between processes.

@ggrossetie
Copy link
Member

I can still reproduce this issue using --single-process.
Since Alpine 3.15 is using Chrome 93.0.4577.82-r3, I think we should revert to Puppeteer 10.4.0 which is known to be working with Chrome 93.

For reference, Puppeteer 13 is based on Chromium 100.0.4889.0 which is not yet available on Alpine. The latest version available on Alpine edge is Chromium 98.0.4758.102-r3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants