
mixed up buffered data in demo #4485

Closed
jerch opened this issue Apr 20, 2023 · 4 comments · Fixed by #4527
Labels
type/bug Something is misbehaving

Comments

@jerch
Member

jerch commented Apr 20, 2023

Repo:

  • spawn the demo terminal with a column width that is a multiple of 10
  • exec python3 -c "print(''.join(['e\u0301e\u0301e\u0301e\u0301e\u0301e\u0301e\u0301e\u0301e\u0301#' for i in range(1000000)]))", which inserts 9 combined chars ('e' + U+0301) followed by '#' over and over

Expected:

  • the '#' should always land in the same column

Results:

  • Firefox: all good, no weirdness in scrollback
  • Safari: all good, no weirdness in scrollback
  • recent Chromium (on Linux, 112.0.5615.49): '#' shifts around, indicating that something is wrong with the buffer contents. Scrolling through the scrollback reveals this:
    [screenshot of the corrupted scrollback]

For some non-obvious reason the initial shell prompt appears again in the middle of the scrollback. This behavior is stable across all renderers in Chromium, so my guess is that the buffer is partially broken. I cannot provoke the same in Firefox or Safari, which makes me wonder whether we have Chrome-specific code somewhere or whether this is a real Chrome bug.

Still needs more digging...

Edit - some more tests:

  • ttyd 1.7.3 (based on xterm.js 5.1)
    • with Firefox: cannot repro
    • with WebKit: cannot repro
    • with Chrome (again the same Chrome as above): repro'ed
  • vscode 1.77.3 terminal (internal Chromium 102.0.5005.196): cannot repro
@jerch jerch added the type/bug Something is misbehaving label Apr 20, 2023
@jerch
Member Author

jerch commented Apr 20, 2023

This is a bug in server.js of the demo: the binary message buffering does not work as expected. For some reason socket.send mixes data up here:

socket.send(Buffer.from(dataBuffer.buffer, 0, length));

I could not quite get to the bottom of the issue, as it points deep into ws and the Node.js socket implementation. At least one aspect looks suspicious: using the underlying array buffer with a byteOffset of 0 is most certainly wrong in Node.js, as a Buffer might not always start at byte 0 of its underlying array buffer (iirc it is even mentioned somewhere in the docs as a speed optimization underneath).
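To illustrate the byteOffset point (a sketch, not the demo's actual buffers): small Buffers in Node.js are allocated from a shared internal pool, so a Buffer view often does not start at byte 0 of its underlying ArrayBuffer, and rebuilding with a hardcoded offset of 0 can grab the wrong bytes:

```javascript
const small = Buffer.from('hello');          // pooled allocation
console.log(small.byteOffset);               // frequently non-zero

// correct: respect the view's own byteOffset
const view = Buffer.from(small.buffer, small.byteOffset, small.length);
console.log(view.equals(small));             // true

// suspicious: hardcoded offset 0, as in the demo's line above;
// this only matches 'small' if the view happened to start at byte 0
const wrong = Buffer.from(small.buffer, 0, small.length);
```

(That said, dataBuffer in the demo is allocated via new Uint8Array, which does start at byte 0, so this alone cannot be the whole story.)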

@Tyriar I saw that you changed it from the old Buffer.concat method to the current version for GC reasons in 7908b7c. To me it seems we now run into edge cases where basic assumptions in ws or in Socket are no longer fulfilled:

One guess is that with a "static" buffer chunk we'd have to wait for the callback of socket.send(data, opts, callback) before we can reuse the same memory. I did not find any clue about how ws handles ownership of the data chunk, whether and for how long send would block, or when we can write to that memory location again. I also looked at socket.write in Node.js but don't know how exactly that maps to ws.send.
With Buffer.concat we don't have the same issue, as it always makes an interim copy of the accumulated data, but sure, that keeps alloc/GC busy.
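A minimal sketch of that guess, with a hypothetical makeSender helper (not part of the demo): treat the shared buffer as owned by ws until the send callback fires, which is presumably the earliest safe point to overwrite it (an assumption, since ws does not document ownership explicitly):

```javascript
// hypothetical helper: serialize sends over a shared buffer
function makeSender(socket, dataBuffer) {
  let inFlight = false;
  return function sendChunk(length, done) {
    if (inFlight) throw new Error('previous send still owns the buffer');
    // borrow the filled region without copying
    const chunk = dataBuffer.subarray(0, length);
    inFlight = true;
    socket.send(chunk, { binary: true }, (err) => {
      inFlight = false;          // memory may be reused from here on
      if (done) done(err);
    });
  };
}
```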

I don't care much about perf optimization on the Node.js side of our demo, but if you want to stick with the no-alloc/GC approach we will have to implement a bigger static FIFO buffer battery with some sort of backpressure until we can write to that buffer again. We cannot have both at once (no alloc and no blocking): we either have to deal with tons of data (alloc), or have to tell the pty to stop the data madness for some time. (For the sake of correctness: of course blocking at times is the right approach; Buffer.concat is just a lame coding excuse with OOM risk...)

While changing the buffering in server.js fixes things for Chrome, I still could not figure out why it only affects Chrome in the first place, hmm.


Edit:
It is definitely related to the borrowed nature of the memory; I replaced the line above with these two variants:

// also borrowed, but without the byteOffset issue - still repros!
socket.send(dataBuffer.subarray(0, length));

// interim copy with slice - does not repro anymore
socket.send(dataBuffer.slice(0, length));

So the reason is not a non-zero byte offset, but the borrow mechanics messing things up. That's surprising; a non-zero byte offset would have explained the recurrence pretty well (like the shell prompt written in the past at pos 0 getting reprinted by a later non-zero buffer chunk wrongly pulling from pos 0).
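The distinction is easy to demonstrate with a plain Uint8Array (which is what dataBuffer is): subarray returns a borrowed view, while slice returns an independent copy, so later writes into the buffer retroactively corrupt any still-queued views:

```javascript
const data = new Uint8Array([1, 2, 3, 4]);
const borrowed = data.subarray(0, 2);  // view onto the same memory
const copied = data.slice(0, 2);       // independent copy

data[0] = 9;                           // later write, e.g. the next pty chunk
console.log(borrowed[0]);              // 9 - the view sees the mutation
console.log(copied[0]);                // 1 - the copy is unaffected
```

(Careful: on Node's Buffer, as opposed to Uint8Array, slice is a deprecated alias for subarray and does not copy.)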

@jerch jerch changed the title serious buffer bug mixed up buffered data in demo Apr 20, 2023
@jerch
Member Author

jerch commented Apr 20, 2023

Here is a quickly hacked version based on the current code with proper backpressure:

    // binary message buffering with backpressure
    function bufferUtf8(socket, timeout, maxSize) {
      const dataBuffer = new Uint8Array(maxSize);
      let sender = null;
      let length = 0;

      // whether we paused the pty due to websocket pressure
      let isPaused = false;
      // must hold excess chunks somewhere temporarily;
      // those excess chunks still create some alloc/GC actions,
      // hopefully not many, depending on how fast the pty
      // can stop the data inflow
      const excessChunks = [];

      function flush() {
        // cannot send while the socket is undrained;
        // buffered data stays put until the next write or 'drain'
        if (isPaused) return;
        socket.send(Buffer.from(dataBuffer.buffer, dataBuffer.byteOffset, length));
        // ws does not expose the return value of socket.write,
        // use the writableNeedDrain property instead
        if (socket._socket.writableNeedDrain) {
          term.pause();
          isPaused = true;
        }
        length = 0;
        if (sender) {
          clearTimeout(sender);
          sender = null;
        }
      }

      function write(data) {
        // cannot write to our memory while the socket is undrained;
        // accumulate chunks in excessChunks, caught up on 'drain' below
        if (isPaused) {
          excessChunks.push(data);
          return;
        }
        if (length + data.length > maxSize) {
          flush();
          if (isPaused) {
            excessChunks.push(data);
            return;
          }
        }
        dataBuffer.set(data, length);
        length += data.length;
        // userInput and term come from the surrounding demo code
        if (length === maxSize || userInput) {
          userInput = false;
          flush();
        } else if (!sender) {
          sender = setTimeout(() => {
            sender = null;
            flush();
          }, timeout);
        }
      }

      socket._socket.on('drain', () => {
        // resume early, so the pty has time to prepare the next chunk
        // (opportunistic, might get blocked again right away
        // if there was too much data in excessChunks)
        if (isPaused) {
          term.resume();
          isPaused = false;
        }
        // need to catch up writing the excess data
        let i = 0;
        for (; i < excessChunks.length; ++i) {
          write(excessChunks[i]);
          // too much data in excessChunks,
          // need to wait for the next drain event :(
          if (isPaused) break;
        }
        // drop the chunks consumed so far (a chunk that re-paused
        // the socket was re-queued at the end by write above)
        excessChunks.splice(0, Math.min(i + 1, excessChunks.length));
        // flush anything a paused timer left sitting in the buffer
        if (length && !isPaused) flush();
      });

      return write;
    }
    const send = (USE_BINARY ? bufferUtf8 : buffer)(ws, 5, 262144);

The code is not nice by any means. Better would be to return the blocking state from send/write and handle it at the caller level, but that needs a full rewrite of the buffering logic, meh.
Also the excessChunks array is unfortunate; in C one would just request only as much data as one can really digest/hold, but we have no such possibility from an event callback (writing alloc-free code in a mainly GC-driven language, where really everything relies on fire-and-forget memory, is quite challenging).

@Tyriar
Member

Tyriar commented May 17, 2023

We can revert my change if it introduced a bug; perf on the demo isn't critically important anyway.

@jerch
Member Author

jerch commented May 18, 2023

Ah ok, then reverting is probably the easiest fix here. It still has the "stocking up data" issue, but a fully-fledged integration should deal with that anyway via proper flow control.
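For reference, a minimal sketch of the copying approach this would revert to (simplified from memory, not the exact pre-7908b7c code): chunks are collected and merged with Buffer.concat on a short timer, so every send hands ws a fresh allocation it can safely own:

```javascript
function bufferConcat(socket, timeout) {
  let chunks = [];
  let sender = null;
  return function write(data) {
    chunks.push(Buffer.from(data));   // interim copy: GC pays for safety
    if (!sender) {
      sender = setTimeout(() => {
        socket.send(Buffer.concat(chunks));
        chunks = [];
        sender = null;
      }, timeout);
    }
  };
}
```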
