Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zellij hangs with 100% CPU when opening a new pane with a very long line on screen #2622

Open
aidanhs opened this issue Jul 13, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@aidanhs
Copy link
Contributor

aidanhs commented Jul 13, 2023

~ $ zellij --version
zellij 0.37.2
~ $ stty size
55 211
~ $ uname -av
Linux a-winthinkpad 5.19.0-45-generic #46~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jun 7 15:06:04 UTC 20 x86_64 x86_64 x86_64 GNU/Linux
~ $ gnome-terminal --version
# GNOME Terminal 3.44.0 using VTE 0.68.0 +BIDI +GNUTLS +ICU +SYSTEMD

Reproduction:

$ zellij
$ cat x.c
#include <stdio.h>
int main() { while (1) { printf("/this/is/a/quite/long/path/for/printing"); } }
$ gcc -O2 x.c
$ ./a.out

Wait for 5s then hit ctrl+c. Now attempt to open a new pane with ctrl+p, n. Zellij will be unresponsive and taking 100% of a CPU.

@aidanhs aidanhs added the bug Something isn't working label Jul 13, 2023
@andreytkachenko
Copy link

Yes, very annoying bug.

@aidanhs
Copy link
Contributor Author

aidanhs commented Dec 29, 2023

For watchers of this issue, my two PRs linked above improve the situation by at least 10x, more for longer lines (as some of the behavior was quadratic). Up to 10MB lines the performance is now tolerable.

If you're building for yourself, take the second PR - it includes all commits from the first.

Edit: it's now a series of three, take the third i.e. #3043 if you're building for yourself.

@aidanhs
Copy link
Contributor Author

aidanhs commented Jan 22, 2024

The three PRs have been merged and presumably will be in the next release. In the original test case, trying to split to a new pane might respond in about 5 mins (sometimes less, sometimes more). After the changes, it takes <10s for me - not fast, but at least you can slowly navigate back to the original pane and clear or reset to fix the issue.

I am ok with this performance for now. Not ideal, but good enough that I don't need to force close sessions - I don't mind if this issue is closed.

For future people wanting to look at this issue, I'd start by looking at reducing memory copies. If you open zellij-server/src/panes/grid.rs and take a look at struct Grid, based on my profiling it's my belief that the majority of the time is being taken by shifting terminal characters in the rows in lines_above/viewport/lines_below. You'll be interested in transfer_rows_from_lines_above_to_viewport, transfer_rows_from_viewport_to_lines_above and transfer_rows_from_lines_below_to_viewport - all of these are called from change_size. Transferring rows out of the viewport combines canonical rows and any subsequent wrapped rows into a single canonical row. Transferring rows in to the viewport requires splitting the row based on the current width. For extremely long lines, this can be slow.

I suspect the eventual solution is to have all characters in a single block in memory, and then perform re-splitting and combining of rows by indexing into this big block of memory and shifting around the index points. This eliminates all terminal characters copies when performing row split/merge. You'll want a wrapper datastructure that does the bookkeeping for you and has a special 'index' type rather than a plain usize. The trickiest part of this is managing memory correctly to allow adding/removing characters to the grid:

  1. zellij (like many terminal multiplexers) configures scrollback based on lines rather than bytes [0], so you can't allocate a fixed buffer per pane [1]. So you'll need to allow resizing of this large memory block, but ideally not too often
  2. character removal typically happens at the beginning of the block, when the scrollback is exceeded. You really don't want to be shifting the rest of the characters whenever the scrollback is exceeded, so you need some kind of ring buffer
    2a. VecDeque is the classic choice here, but it operates on individual elements because the split can be at any point...so returning a slice of terminal characters may involve allocation. Always allocating defeats the point of this work, so you need a wrapper around a VecDeque that returns a Cow that only allocates if necessary. This will require reaching into VecDeque::as_slices, rather than using the element-based interface of VecDeque
    2b. the harder alternative if you can't achieve the performance with 2a is to use virtual memory trickery to map the same region of memory in multiple places (nice diagram from magic-buffer), so you can always take a contiguous slice to represent a row. This apparently can be done in a cross-platform way (per vmap and vmcircbuffer). Unfortunately, I don't know the quality/maintenance status of either of these crates (or any others like vmcircbuf)

[0] though honestly I think a configurable character limit upper bound for sanity reasons is reasonable - I don't think any user seriously wants 100 million character scrollback in a single pane
[1] nor would you want to - always allocating the worst case is very wasteful

Finally, here's the script I was using for benchmarking/profiling for the record

Script for profiling
#!/bin/bash
set -o errexit
set -o pipefail
set -o nounset

export TMPDIR=$(pwd)/tmp
export XDG_CACHE_HOME=$(pwd)/tmp/cache
export XDG_CONFIG_HOME=$(pwd)/tmp/config
export XDG_DATA_HOME=$(pwd)/tmp/data
export XDG_RUNTIME_DIR=$(pwd)/tmp/runtime
export XDG_STATE_DIR=$(pwd)/tmp/state
zellij="./target/debug/zellij --data-dir $(pwd)/tmp"
zellij="./target/release/zellij --data-dir $(pwd)/tmp"

recreate_tmpdir() {
    rm -rf $TMPDIR
    mkdir -p $TMPDIR
}

if [ "$1" = zellij ]; then
    shift
    recreate_tmpdir
    $zellij --data-dir=$TMPDIR setup --clean

elif [ "$1" = grind-zellij ]; then
    shift
    recreate_tmpdir
    valgrind --tool=callgrind --vgdb-prefix=$(pwd)/tmp/vgdb-pipe --instr-atstart=no --collect-atstart=yes --trace-children=yes $zellij --data-dir=$TMPDIR/zellij-data setup --clean

elif [ "$1" = time-zellij ]; then
    shift
    #$zellij & # can't do this because it can't find a tty
    mkdir -p $TMPDIR/time
    export TMPDIR=$(pwd)/tmp/time

    #$zellij run -- flock mylock -c 'sleep 3'
    #$zellij run -- flock mylock -c 'cat longline1M'
    $zellij run -- flock mylock -c 'cat longline10M'
    sleep 1 && flock mylock echo 'command done'

    time (
        $zellij action toggle-fullscreen
        $zellij action toggle-fullscreen
        $zellij action close-pane
        $zellij run -c -- true
    )
    #$zellij action close-pane &
    #wait %1
    #kill %2

    #time ($zellij run -c -- flock mylock -c 'sleep 3' && sleep 1 && flock mylock true)
    #time ($zellij run -c -- flock mylock -c 'sleep 3' && sleep 1 && flock mylock true)

elif [ "$1" = start-grind ]; then
    shift
    onoff=$1
    if [ "$onoff" != on -a "$onoff" != off ]; then
        echo pass on or off
        exit 1
    fi
    pid=$(
        callgrind_control --vgdb-prefix=$(pwd)/tmp/vgdb-pipe |
            grep -- 'zellij --server' |
            sed 's/^PID \([0-9]*\):.*/\1/g'
    )
    echo "Found server at $pid"
    callgrind_control --vgdb-prefix=$(pwd)/tmp/vgdb-pipe -i $onoff $pid
fi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants