Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CPU usage rises until reload #1833

Closed
xorander00 opened this issue Apr 7, 2022 · 26 comments
Closed

CPU usage rises until reload #1833

xorander00 opened this issue Apr 7, 2022 · 26 comments
Labels
bug Something isn't working

Comments

@xorander00
Copy link

What Operating System(s) are you seeing this problem on?

Windows, FreeBSD X11

WezTerm version

20220319-142410-0fcdea07

Did you try the latest nightly build to see if the issue is better (or worse!) than your current version?

No, and I'll explain why below

Describe the bug

First off, just want to say that I've used a bunch of terminals over the years and this is my favorite terminal by far. Keep up the good work and hopefully I'll be able to contribute in the future someday :) Now, onto the issue...

I've noticed that wezterm-gui will start to eat up more and more CPU usage over time. It doesn't seem to happen all of a sudden, but the average utilization seems to climb higher and higher. I realize it after I've noticed that my CLI has become slower and slower. I'll switch to the htop tab and see wezterm-gui at the top. I'll hit Ctrl+Shift+R to reload the config and the CPU usage subsides quickly after that. It's strange.

I have three workstations (freebsd/amd64, win10/amd64, & macOS/arm64). I haven't noticed it on macOS yet, but I rarely use wezterm directly on there (mainly just Windows + FreeBSD). I've only really noticed it on FreeBSD, but that happens to be my primary workstation.

The fact that it builds up slowly and then subsides on config reload makes me wonder if it's accumulating file handles or whatnot. I haven't looked at the source for wezterm and don't know which crates it's using, but assuming that underneath it's using kqueue() to watch config files, FreeBSD has a known issue where it has to open a fd for each file being watched (whereas inotify just creates a watcher on the root directory).

When I started writing this post I did a reload of the wezterm config and wezterm-gui CPU utilization was at 0.2%. Over the past few minutes it has risen to 10% with no interaction with wezterm. It's strange.

To Reproduce

Unsure, as I don't even know when or how it's triggered.

Configuration

animation_fps = 1
cursor_blink_ease_in = "Linear"
cursor_blink_ease_out = "Linear"

Expected Behavior

CPU usage remains <1% at idle.

Logs

14:19:21.471 INFO wezterm_gui::termwindow > Ding! (this is the bell) in pane 1
14:19:37.095 INFO wezterm_gui::termwindow > Ding! (this is the bell) in pane 1
17:37:56.570 INFO wezterm_gui::termwindow > Ding! (this is the bell) in pane 4
17:37:58.824 INFO wezterm_gui::termwindow > Ding! (this is the bell) in pane 4
17:38:08.018 INFO wezterm_gui::termwindow > Ding! (this is the bell) in pane 4
18:27:53.003 WARNING wezterm_term::terminalstate > unhandled DecPrivateMode SetDecPrivateMode(Unspecified(1015))
18:27:53.004 WARNING wezterm_term::terminalstate > unhandled DecPrivateMode ResetDecPrivateMode(Unspecified(1015))
18:28:31.712 WARNING wezterm_term::terminalstate > unhandled DecPrivateMode SetDecPrivateMode(Unspecified(1015))
18:28:31.713 WARNING wezterm_term::terminalstate > unhandled DecPrivateMode ResetDecPrivateMode(Unspecified(1015))
18:28:43.346 WARNING wezterm_term::terminalstate > unhandled DecPrivateMode SetDecPrivateMode(Unspecified(1015))
18:28:43.347 WARNING wezterm_term::terminalstate > unhandled DecPrivateMode ResetDecPrivateMode(Unspecified(1015))
18:28:52.629 WARNING wezterm_term::terminalstate > unhandled DecPrivateMode SetDecPrivateMode(Unspecified(1015))
18:28:52.629 WARNING wezterm_term::terminalstate > unhandled DecPrivateMode ResetDecPrivateMode(Unspecified(1015))
18:44:25.816 INFO wezterm_gui::termwindow > Ding! (this is the bell) in pane 1

Anything else?

No response

@xorander00 xorander00 added the bug Something isn't working label Apr 7, 2022
@xorander00
Copy link
Author

Hmm, so I was exploring the behavior a bit and ran truss on wezterm-gui. Based on a quick look of the syscalls and returns, there didn't seem to be anything that really jumped out. Noticed some errors, but those might be nothing (mostly looked like either waiting on recvmsg to return something or a thread trying to get a lock on some resource).

I did notice that the resident set was rising slowly. On a hunch, I killed all instances and launched via "wezterm-gui start --always-new-process" instead of using "gtk-launch org.wezfurlong.wezterm" (desktop shortcut that comes with the FreeBSD package which just launches wezterm without any args). I'm not sure if there's something about the way it's launched from there, but the resident set stopped rising & the CPU idles at <1% now.

Weird. I don't know if it's something with my config, environment, FreeBSD, or wezterm.

@wez
Copy link
Owner

wez commented Apr 7, 2022

Is that really your complete configuration?

Is there something about how you're using wezterm (eg: what sort of things are being output to it) that might influence this?

It may be instructive to start wezterm will full trace logging enabled; it will be very verbose, so redirect it to a file:

WEZTERM_LOG=trace wezterm 2> /tmp/wezterm.log

then when you notice it using high CPU, capture the logs for the past few seconds (each line has a timestamp) and attach them here.

@wez wez added the waiting-on-op Waiting for more information from the original poster label Apr 7, 2022
@xorander00
Copy link
Author

Is that really your complete configuration?

Nope, sorry. Issue template said what's relevant, but I wasn't sure which parts, so I just grabbed that as a sample. I can post the full config in a bit, but I need to consolidate it as it's spread across a few lua modules.

Is there something about how you're using wezterm (eg: what sort of things are being output to it) that might influence this?

Shouldn't be, as far as I can tell. Nothing being output, just an idle shell prompt.

It may be instructive to start wezterm will full trace logging enabled; it will be very verbose, so redirect it to a file:

WEZTERM_LOG=trace wezterm 2> /tmp/wezterm.log

Will do, forgot about that.

then when you notice it using high CPU, capture the logs for the past few seconds (each line has a timestamp) and attach them here.

I can do that, but I will mention that it seems to gradually rise over a period of a few minutes. I'll still clip a few samples from the trace output, but just something to note.

I'll see if I can make screen recording to actually see the behavior, that may help.

@no-response no-response bot removed the waiting-on-op Waiting for more information from the original poster label Apr 7, 2022
@wez wez added the waiting-on-op Waiting for more information from the original poster label Apr 7, 2022
@wez
Copy link
Owner

wez commented Apr 8, 2022

Do you see this with multiple wezterm windows or just a single window?
#1838 could manifest in a similar way if you have an application that repaints in response to focus events

@xorander00
Copy link
Author

Yup, multiple windows. I'll try giving nightly a shot later today and see how it goes.

@no-response no-response bot removed the waiting-on-op Waiting for more information from the original poster label Apr 8, 2022
@wez wez added the waiting-on-op Waiting for more information from the original poster label Apr 8, 2022
@xorander00
Copy link
Author

I just built b908e2d and launched. Just started rising again. Reloaded config via CTRL+SHIFT+R and utilization dropped back down.

I'll explore further here later today. Will try to make a gif as well.

@no-response no-response bot removed the waiting-on-op Waiting for more information from the original poster label Apr 8, 2022
@wez
Copy link
Owner

wez commented Apr 8, 2022

The logs will be the most useful thing; I don't think a screenshot or gif will be especially informative!

@wez wez added the waiting-on-op Waiting for more information from the original poster label Apr 8, 2022
@github-actions
Copy link
Contributor

This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.

@xorander00
Copy link
Author

Sorry about that, been crazy busy. I just generated a log. Started wezterm with trace logging, launched htop in the newly created window. Then created another window with just a terminal prompt and waited until CPU usage started rising at idle. It didn't take long, maybe a minute.

Log file is 30mb. Would you prefer I snip it down or just attach it raw?

@github-actions github-actions bot removed the waiting-on-op Waiting for more information from the original poster label Apr 23, 2022
@github-actions github-actions bot reopened this Apr 23, 2022
@wez
Copy link
Owner

wez commented Apr 23, 2022

Can you tail the last 100000 lines and attach that?

@wez wez added the waiting-on-op Waiting for more information from the original poster label Apr 23, 2022
@xorander00
Copy link
Author

trimmed.log

@github-actions github-actions bot removed the waiting-on-op Waiting for more information from the original poster label Apr 23, 2022
@wez
Copy link
Owner

wez commented Apr 23, 2022

Thanks; please also share your config file, and describe what is running inside wezterm when the CPU usage climbs!

@wez wez added the waiting-on-op Waiting for more information from the original poster label Apr 23, 2022
@xorander00
Copy link
Author

config.tar.gz
That's my config directory for wezterm. Apologies in advance for how it's structured :/ Also, I had to redact some info just to make sure, but nothing that affects the config, just machine/workspace names, etc.

As far as what's running inside wezterm when the CPU climbs, literally nothing (well, except for my shell, which is fish). Even if I login and then run htop and let it sit, the CPU utilization will slowly rise on wezterm over time. Last I remembered it got to around 50% within 10-15 minutes. If I do the exact same thing with a different terminal (KiTTY in my case), CPU stays idle. Just to make sure, I launched wezterm and left it at the prompt and ran htop in a separate KiTTY instance (to be sure that htop wasn't somehow causing it). Same behavior, wezterm process CPU utilization would rise according to htop inside KiTTY.

Also, this happens on both FreeBSD and Windows. I periodically restart wezterm on Windows to bring the CPU back down. It's installed on FreeBSD but not currently using it there until this is fixed as it's not as necessary for me there as it is on Windows (though I'm switching back as soon as this is fixed).

Something else to note is that I'm running multi-monitor (3 x on the windows desktop + 2 x on the FreeBSD laptop). I'm also running barrier (Windows is the server, FreeBSD is the client). Not sure if that info hints at anything, but I'm sure it couldn't hurt.

Versions are as follows:

  • wezterm 20220421-183827-26500c67 on Windows 10 Version 21H2 (OS build 19044.1645), amd64
  • wezterm 20220421-183827-26500c67 on FreeBSD 13-STABLE (1300522), amd64
  • barrier v2.4.0
  • fish v3.3.1
  • starship v1.6.2

@github-actions github-actions bot removed the waiting-on-op Waiting for more information from the original poster label Apr 23, 2022
@xorander00
Copy link
Author

My gut instinct is pondering two things, though I could be completely wrong:

  • GL context switching contention. I'm sure the trace log probably dumped warnings on that, though I haven't really looked at it yet. Both systems have nVidia chips.
  • File watch handles for the config dir are accumulating, or maybe watching the parent dir (which in my case has lots of files). I noticed a bit of I/O activity in my process monitor and also noticed that most of the time seemed to be spent in kernel threads, hence why I was wondering if "automatically_reload_config = false" was being ignored somehow.

@xorander00
Copy link
Author

Hmm, I wonder if it has something to do with fish or starship. On a hunch I decided to try launching "wezterm start /bin/sh" and see if the same behavior happens. So far it doesn't appear so. This is on FreeBSD. I'm going to try it on Windows too (normally I SSH into my FreeBSD boxes and use fish+starship from there, but to test this I'll just launch a local PowerShell instance).

One more thing that I forgot to mention was that it didn't appear to just be the CPU that was rising before, but the vmem size as well (though just a slow, steady climb).

@xorander00
Copy link
Author

Wait no, nevermind. I think I spoke too soon. It's rising again, even with just /bin/sh. Though it could be rising a bit slower, not sure yet.

@xorander00
Copy link
Author

Interesting, it doesn't seem to be happening when launched with --always-new-process.

@xorander00
Copy link
Author

Is there a signal that I can send to wezterm that will trigger a config reload? I'd like to go back to using wezterm on my FreeBSD workstation, but I can't leave it idle for too long without coming back to it chewing up the CPU. Until this issue is resolved, I don't mind starting a background process that periodically sends a SIGUSR1 or SIGHUP or whatever to trigger a reload and bring the CPU utilization back down.

@xorander00
Copy link
Author

So just to toss in some additional info, I flattened out my config into a single file again (~/.config/wezterm/wezterm.lua),launched wezterm-gui, and let it idle.

So far the process has stayed at <1%. If it doesn't start rising and sustains over 1% at idle in the next 30-60 minutes, then it seems likely the culprit has to do with how it's running the lua/config stuff.

I do have automatically_reload_config = false, so it makes me wonder if that's being ignored somehow when it comes to require'd lua modules.

@wez
Copy link
Owner

wez commented May 21, 2022

It's interesting that flattening the config affected this.
c85fce7 has subsequently landed to avoid watching the home directory.
Did your granular config rely on things pulled in from there?
Alternatively, are there other processes that might write to your config directories and trigger some work?

Could you try updating to a more recent build and see if things have changed?

Starting wezterm with WEZTERM_LOG=config=trace,info set in the environment should log when config files are reloaded and help to prove/disprove that idea.

@xorander00
Copy link
Author

Flattening out the config did seem to help it quite a bit, but it does still happen. It's weird. I've switched back to using wezterm as my primary driver on my FreeBSD box again. Currently, whenever I notice the box getting sluggish, I confirm with top (or htop) and then close everything out and restart wezterm.

I'll triple-check to make sure, but I'm 99% sure that there's nothing touching the config files outside of me editing them. I do use chezmoi to manage dotfiles across machines, but I manually invoke that. However, I wonder if calling chezmoi to save changes & push the commits up coincides with wezterm CPU load rising. I'll check that case and see if it's true (but I don't expect it to be).

As far as monitoring the home directory, yeah that'll be a terrible in my case lol. I have tons of files and data spread around under home. I'm not sure if the watcher utilizes ignore files (.ignore, .gitignore, & .fossil-settings/ignore-glob in my case), but I have lots of repos with nodejs, python, rust, and others. So if it's creating a fs watcher on my home directory, then that would certainly do it.

Especially on FreeBSD, where if it's using kqueue to watch directory trees, it'll slow it to a crawl. kqueue currently doesn't support placing a single watcher on the parent directory to include children under that tree for notifications. So the way it's currently implemented is that creating a watcher on a directory tree in turn creates watchers for all child objects as well. I love FreeBSD, but this one of the pain points currently (though I think it's being worked on). It's one of the reasons large trees can be slow if objects are being stat'ed or whatnot.

Just yesterday I actually did update both my FreeBSD and Windows desktops to nightly. I'll monitor it over the coming day or two (with WEZTERM_LOG=config=trace,info) and report back.

@xorander00
Copy link
Author

The other thing I was considering, though less likely given that flattening the config files affected this, is whether or not one of the apps I use on a regularly basis is doing something funky with escape codes and somehow getting caught in a loop or something.

The only thing I can think of is NeoVim, which is my daily driver for editing. Though when CPU load rises, it's only the wezterm process, not nvim, so it's probably less likely.

@xorander00
Copy link
Author

Just a quick update on this issue, I haven't forgotten about it. I just noticed that in the last couple of days, I've left wezterm running idle and the CPU usage has stayed <1%.

I'm not sure if it's due to a graphics driver update or whether I haven't yet caused the conditions (through usage) to trigger the bug. It's pretty strange.

@wez
Copy link
Owner

wez commented Jun 16, 2022

I'm going to close this out as it doesn't appear to be reproducible. If you can get it to trigger then we can investigate further, but it sounds like it is no longer an issue.

@wez wez closed this as completed Jun 16, 2022
@xorander00
Copy link
Author

I may have an idea of what's going on, not sure, but figured it's worth updating this issue. It still happens, but I had a hard time reliably reproducing before and it wasn't the highest priority, so I deferred it previously. It just happened again and I decided to try my best at digging into it again instead of just restarting the process.

TL;DR

  • I manage my config files across machines using chezmoi, including wezterm.lua. chezmoi regenerates wezterm.lua and so it makes me wonder if wezterm is somehow holding onto an invalid config file fd after the file is re-generated.
  • automatically_reload_config = false in my config file, so I'm not sure if it's retaining a poll fd or something upon startup (or if it's just watching something else).
  • Supporting this further is that I saw this issue a lot more when my config was broken out into separate files, and then the issue was less frequent after flattening the config files out. All of those files were managed with chezmoi.
  • Maybe the fix is as simple as validating the config file fd? I haven't looked at the source yet (my rust is very rusty rimshot).

A bit more info:

After digging into it with DTrace, when the process CPU utilization starts to climb, it appears it's because it's hitting a race condition between poll() & read(). I wanted to see what poll() was being called with, but I started dtrace after this was happening already, and since I can't reliably reproduce it, I didn't want to kill the process just yet. However, I did notice that read() appears to be getting an invalid fd when it's called. I suspect poll is getting the same thing since in a normal instance of the process, read()+write() are called the most often (in a 5-second span it's just a couple of hundred calls for the two). For the process with the CPU issue, both poll()+read() were called over 100,000 times in the span of a few seconds while everything else remained normal. I did try reloading the configuration file in that instance of wezterm-gui to see if it would cause it to get a new fd to give poll/read, but it didn't seem to have an affect so I guess it's not getting a new fd or this isn't the condition that causes the race.

When I get the chance, I'll see if I can glean more info. I haven't used either DTrace or Rust in quite a while now, so I have to get back up to speed.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 3, 2023

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants