Faster binary data writing to socket #894
Comments
Here is a simple benchmark which shows how slow creating a buffer slice from buffer-like objects is:
Please add to your benchmark the new Node v20 ArrayBuffer resize method: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer/resize
Thanks @e3dio, good to know about `ArrayBuffer.prototype.resize`. It does not solve the problem, though.
Also, I just realized that this benchmark is not a good one for the …
Created a better benchmark for … Still, … But more importantly, even if …
```ts
interface HttpResponse {
  endFast(buf: ArrayBuffer, offset: number, length: number): void;
  // Always 3 arguments, does not close connection.
}

interface WebSocket {
  sendFast(buf: ArrayBuffer, offset: number, length: number): void;
  // Always 3 arguments, binary message, default compression.
}
```
You want to use the buffer as a shared buffer, but how do you deal with race conditions when uWS reads your buffer while you are already overwriting that buffer slice in the Node process? For example: …
Yes, creating a resizable ArrayBuffer is slow, and resize is also slow; that is disappointing. This is the fastest way I see, and I don't see it in your benchmark:

```js
const buf = Buffer.allocUnsafe(1000); // initial fast buffer
const buf2 = buf.subarray(0, 500);    // zero-copy buffer view of data
```

Update: new Uint8Array() is slightly faster for Buffer, see #894 (comment)
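A quick check of the zero-copy claim: the `subarray` view shares memory with the original Buffer (the sizes and the written value here are arbitrary):

```js
const src = Buffer.allocUnsafe(1000);  // initial fast buffer
const view = src.subarray(0, 500);     // zero-copy view, no bytes copied
src[0] = 42;                           // write through the original...
console.log(view[0]);                  // 42, the view sees it
console.log(view.buffer === src.buffer); // true, same underlying ArrayBuffer
```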
uWS.js send/end/write takes a Buffer, and there is nothing mentioned about a Buffer view not working. I think uWS copies data on method call, so buf.subarray() should work, which is very fast.
@e3dio …
@uasan yes, my understanding is that uWS makes a copy of the passed-in data.
I don't see subarray in your benchmark.
@e3dio included.
Essentially, creating a temporary view allocates extra memory. In something like:

```js
response.endFast(uint8, offset, length);
socket.sendFast(uint8, offset, length);
```

there is no need to allocate an extra 208 bytes. (A WebSocket message itself could be far smaller than this 208-byte overhead.) For 100K messages per second, that is an extra 20.8 MB/s of work for the V8 allocator and garbage collector.
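The arithmetic behind that figure, assuming roughly 208 bytes of per-object overhead for each temporary view:

```js
const bytesPerTemp = 208;        // assumed overhead of one temporary view object
const messagesPerSec = 100_000;
const extraMBPerSec = (bytesPerTemp * messagesPerSec) / 1e6;
console.log(extraMBPerSec); // 20.8
```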
Just to summarize my case:

```js
uws.get('/rpc', (res, req) => {
  const result = rpc.exec(/* ... */);
  const slice = encoder.encode(result);
  // very slow:
  res.end(slice.buf.subarray(slice.start, slice.end));
  // would be nice to have:
  res.endFast(slice.buf, slice.start, slice.end);
});
```
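A hypothetical userland approximation of the proposed `endFast`, built on the existing `end` with a zero-copy `Uint8Array` view. The helper name and the stubbed `res` object are illustrative, not real uWS.js API; a temporary view object is still allocated here, only the byte copy is avoided:

```js
// Wrap res.end() so callers pass (buf, start, end) without slicing themselves.
function endFast(res, u8, start, end) {
  res.end(new Uint8Array(u8.buffer, u8.byteOffset + start, end - start));
}

// Stubbed response object, just to show the call shape:
const sent = [];
const res = { end: (data) => sent.push(data) };
endFast(res, new Uint8Array(1024), 10, 30);
console.log(sent[0].byteLength); // 20
```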
@e3dio I'm not sure I understand what you mean. Here, … Also, …
If you need a Node Buffer:

```js
const FastBuffer = Buffer[Symbol.species];
const buf2 = new FastBuffer(buf1.buffer, buf1.byteOffset + offset, length); // buf1.subarray(offset, length)
```

Or something along those lines.
To be fair... this is a problem with …
151 ms to create 10 million slices does not sound like a problem. Say you will be capped at 300k req/sec at best; that would put the total overhead of slicing at 0.4%, if I did my math correctly. This is not a bottleneck, especially not considering how sluggish JavaScript is in comparison.
The math is correct: assuming 300K req/sec, V8 will alloc/dealloc ~60 MB/sec unnecessarily and spend 0.45%, or 4.6 ms of each second, blocked on creating slices. That is assuming the developer will use the most efficient …
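For the record, the numbers work out roughly as stated, taking the ~151 ms per 10 million slices from the earlier benchmark:

```js
const nsPerSlice = 151e6 / 10e6;  // 151 ms over 10M slices, i.e. 15.1 ns each
const reqPerSec = 300_000;
const blockedMsPerSec = (reqPerSec * nsPerSlice) / 1e6; // time blocked per second
const allocMBPerSec = (reqPerSec * 208) / 1e6; // assuming ~208 bytes per temporary
console.log(blockedMsPerSec.toFixed(2), allocMBPerSec); // 4.53 62.4
```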
Would you accept a PR if I implemented something like the below?

```ts
interface WebSocket {
  sendSpecialOrSomethingLikeThat(buffer: ArrayBuffer, offset: number, length: number): void;
}
```
@streamich here is a good benchmark, I think:

```js
const iterations = 1e8;
const ab = new ArrayBuffer(1024 * 4);
const arr = new Uint8Array(ab);
const buf = Buffer.from(ab);

const bench = (name, fn) => {
  console.time(name);
  for (let i = 0; i < iterations; i++) fn(i % 1024);
  console.timeEnd(name);
};

bench('Uint8Array-ab', i => {
  new Uint8Array(ab, i, 1);
});
bench('Uint8Array-arr', i => {
  new Uint8Array(arr.buffer, i + arr.byteOffset, 1);
});
bench('Uint8Array-buf', i => {
  new Uint8Array(buf.buffer, i + buf.byteOffset, 1);
});
bench('subarray-arr', i => {
  arr.subarray(i, i + 1);
});
bench('subarray-buf', i => {
  buf.subarray(i, i + 1);
});
```

Results: …

So if you start from Buffer, …
If we're talking about an overhead of 0.4% and ~200 bytes per slice, I don't really see how this is even remotely an issue. It just sounds like a natural property of a scripting language. Scripting languages aren't zero-overhead.
The existing uWS.js API should be fine. Use subarray or new Uint8Array as described above to send uWS.js a zero-copy slice/view of the buffer.
There is a distinction between what is in scope and what is not. We can't add bypasses for custom use cases such as slice-and-send or extract-and-send or receive-to-json or anything like that; then we would just be adding a bunch of hacks to work around shortcomings of the language. Either accept the shortcomings of scripting, or swap to the C++ or C library if you need guaranteed lowest overhead. 0.4% is not even low-hanging fruit.
In my RPC, requests and responses have a schema, and the encoder is JIT-compiled according to that schema; it writes straight to a big shared buffer.
I think you have effectively optimized your process to the maximum, and you are safe to move on to something else ;) Also, shameless plug: check out my encoding library for the smallest possible size, speed, and ease of use: https://github.com/e3dio/packBytes. Although I need to update the Buffer sizing mechanism: currently it calculates the exact buffer size needed, instead of estimating and sending a view as described in this issue; I need to update that to improve speed.
@e3dio thanks, yeah, will take a look.
I totally salute your dedication, but JavaScript is not the right place to be if you care about a 0.4% slicing overhead.
There is also the overhead of GC of all the temporaries which might not be represented in the benchmarks... |
Typically, generational GCs do not promote young objects that die young (that's the GC way of doing stack variables) to full-on GC'd objects.
uWebSockets.js accepts `ArrayBuffer`-and-co JavaScript binary formats when writing to the socket, which is great, but it is not enough. All data encoders pre-allocate more memory than needed, but the current API does not allow effectively providing only a slice of the final encoded data instead of creating a new buffer-like object.

A simplified example: …

Instead, responses could be sped up substantially (and would consume less memory) with an API as follows: …, where `0` is the offset and `size` is the length of the binary chunk to copy. (No call to `.subarray()`, which is very slow.)

Constructing `ArrayBuffer`, `Uint8Array`, or `Buffer` is very, very slow in Node.js, and with every release, at least since Node v14, it has been getting slower. I would like to propose an optimized API where the binary chunk slice can be specified using `offset` and `length` params instead of creating a temporary `ArrayBuffer` or `Uint8Array`. Below are three options for this new API.

Option 1

This would be the fastest option: …

The `.end()` and `.write()` methods are the most performance-critical, so `.writeHeader()` and `.writeStatus()` could be ignored here.

Option 2

This option is slower, but would integrate nicely with the existing API: constructing a new instance of `Slice` is about 30x faster than creating a "slice" using `ArrayBuffer` or `Uint8Array`. In TypeScript: …

This option would also integrate well with the `.writeHeader` and `.writeStatus` methods. uWebSockets could provide the `Slice` class, which V8 would hopefully compile as a hidden class: …

Option 3

This option is similar to Option 2, but instead of extending the `RecognizedString` type, various existing methods would receive optional `offset` and `length` parameters: …

The same applies to the WebSocket instances.