After trying to get as much performance out of node as possible, it looks like the remaining CPU time is mostly spent doing socket reads and writes.
There is an open PR (#916) for a minimal relay in node, and there is a hosted flame graph.
To truly get the next order of magnitude in performance, we need to write the relaying code in a different language. One approach would be to write the actual relaying and socket logic in C.
I'm not a C/C++ programmer; however, @Matt-Esch and I brainstormed an idea for how to implement the actual relaying part (which is 90% of the flame graph) in C/C++.
libuv relay server
```js
var net = require('net');
var LibuvTChan = require('libuv-tchannel');

var parser = new LibuvTChan();

// You get frames from the channel
parser.onFrame = onFrame;

// You create a tcp server in node
net.createServer(onConnection);

function onConnection(socket) {
    parser.manageSocket(socket._handle);
    // tchannel Connection/Channel node.js code
}

// You create outgoing sockets in node
var socket = net.createConnection(host, port);
parser.manageSocket(socket._handle);
// tchannel Connection/Channel node.js code

// You can forward frames through the parser
parser.sendVolatileFrame(socket._handle, volatileFrame);

// You can also send frames through the parser
parser.sendPersistentFrame(socket._handle, persistentFrame);

// Any information for stats and logs will be
// sent to javascript
parser.onStats = onStats;
```
The idea is that all the actual TCP read and write logic is rewritten in C.
This removes the overhead of node's TCP implementation.
This also removes all buffer manipulation overhead in node.js.
Interface
Parser.onFrame
The parser.onFrame function must be set in JavaScript and is a function
that takes a VolatileFrame.
A VolatileFrame is backed by a buffer in C.
A VolatileFrame can be one of the N types of frames in the protocol.
For our forwarding use cases the VolatileFrame has a few fields that can
be read and a few mutable fields. The mutable fields are id and ttl.
A VolatileFrame also has a persistent() method that returns a PersistentFrame object that is fully realized.
For performance reasons the C implementation will recycle the VolatileFrame
immediately after the function call finishes.
This means you must do one of two things synchronously:
Either mutate the VolatileFrame and then call sendVolatileFrame() synchronously
for fast forwarding.
Or call persistent() and get a PersistentFrame that has all the
needed fields so that you can pass it to an endpoint handler.
Note that currently in our relay implementation we wait for identification on the
socket. To be able to make synchronous forwarding decisions we will have to
synchronously send a Declined error frame when a connection is not yet
initialized.
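To make the synchronous contract concrete, here is a minimal sketch of an onFrame handler under the interface described above. Only onFrame, persistent() and sendPersistentFrame() come from this proposal; connectionIsInitialized, sourceHandle, the pooled declinedErrorFrame and endpointHandler are hypothetical placeholders.

```js
// Minimal sketch, not the actual implementation. Assumes `parser` is the
// libuv-tchannel parser from the example above.
// Hypothetical: connectionIsInitialized, sourceHandle, declinedErrorFrame,
// endpointHandler. From the proposal: onFrame, persistent(),
// sendPersistentFrame().
parser.onFrame = function onFrame(volatileFrame) {
    if (!connectionIsInitialized(sourceHandle)) {
        // The connection has not identified yet, so synchronously answer
        // with a pooled Declined error frame; nothing from the volatile
        // frame may be kept past this call.
        parser.sendPersistentFrame(sourceHandle, declinedErrorFrame);
        return;
    }

    // The second option above: realize a persistent copy so an endpoint
    // handler can read it later, after C has recycled the volatile
    // frame's backing buffer.
    var persistentFrame = volatileFrame.persistent();
    endpointHandler.handleRequest(persistentFrame);
};
```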
Parser.manageSocket(handle)
If you have a TCP Socket in node you can pass the handle to libuv and it will
manage the reading of all incoming frames for you.
Every time it reads a frame it calls Parser.onFrame.
Parser.sendVolatileFrame(handle, VolatileFrame)
For doing efficient forwarding you can mutate the VolatileFrame emitted by onFrame and send it directly to a different handle.
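As a rough illustration of that fast path, assuming a routing decision that yields a destination handle; chooseDestinationHandle, nextOutgoingId and adjustedTtl below are hypothetical relay helpers, while id and ttl are the mutable fields this proposal exposes.

```js
// Sketch of synchronous forwarding inside onFrame.
parser.onFrame = function onFrame(volatileFrame) {
    // Hypothetical relay helper that picks the outgoing connection.
    var destHandle = chooseDestinationHandle(volatileFrame);

    // Rewrite only the mutable fields the relay owns.
    volatileFrame.id = nextOutgoingId(destHandle);
    volatileFrame.ttl = adjustedTtl(volatileFrame);

    // Hand the frame straight back to C, synchronously, before its
    // backing buffer is recycled.
    parser.sendVolatileFrame(destHandle, volatileFrame);
};
```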
Parser.sendPersistentFrame(handle, PersistentFrame)
If you want to send a frame without forwarding an existing volatile frame, you can
do so with sendPersistentFrame(). It's expected that the javascript
code has a pool of persistent frame objects that it can mutate and send.
It's safe to assume that the persistent frame can be recycled and mutated again
after the sendPersistentFrame() call is done.
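A minimal sketch of such a pool, assuming a PersistentErrorFrame constructor and a DECLINED code constant that are not part of this proposal; the only guarantee used here is that a persistent frame may be recycled once sendPersistentFrame() returns.

```js
// Hypothetical pooled error frames; PersistentErrorFrame and DECLINED are
// assumptions, sendPersistentFrame() is from the proposal.
var errorFramePool = [];

function sendDeclined(handle, id, message) {
    var frame = errorFramePool.pop() || new PersistentErrorFrame();

    // Mutate the pooled frame for this particular response.
    frame.id = id;
    frame.code = DECLINED;
    frame.message = message;

    parser.sendPersistentFrame(handle, frame);

    // Safe to recycle immediately after the call is done.
    errorFramePool.push(frame);
}
```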
Big ideas
The big idea here is that a nodejs tchannel relay is just a ringpop cluster
that manages connections.
The actual work of parsing TCP and writing to TCP is all handled in a really
efficient shared C library.
Volatile Frame vs Persistent Frame
Volatile Frame
A Volatile frame is created in C++ and has a piece of memory that is the actual
frame buffer associated with it. A VolatileFrame only exposes information to
JavaScript that is absolutely needed by the relay code.
All volatile frames have the following fields:
mem: some kind of representation of the memory
id: a mutable int32
type: an immutable int8
The size field is hidden and only available in C++.
For each of the frame types a volatile frame may expose more information.
In the current case the only frame type that has more information is CallRequest, which exposes the following fields:
ttl: a mutable int32
serviceName: an immutable utf8 string
callerName: an immutable utf8 string
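Put together, a relay touching a CallRequest volatile frame would only see something like the following sketch; FrameTypes.CallRequest, newId and remainingTtl are hypothetical names, and the fields are the ones listed above.

```js
// Sketch of the field contract on a CallRequest volatile frame.
// Hypothetical: FrameTypes.CallRequest, newId, remainingTtl.
function describeCallRequest(volatileFrame) {
    if (volatileFrame.type !== FrameTypes.CallRequest) {
        return null;
    }

    // Immutable fields, read straight off the C-backed buffer.
    var serviceName = volatileFrame.serviceName;
    var callerName = volatileFrame.callerName;

    // Mutable fields, rewritten in place before forwarding.
    volatileFrame.id = newId;
    volatileFrame.ttl = remainingTtl;

    return { serviceName: serviceName, callerName: callerName };
}
```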
Persistent frame
A persistent frame can only be created from JavaScript. There are unique persistent
frame constructors for all types of frames; each persistent frame constructor
has mutable fields for all the pieces of information in the protocol document.
There are two ways of creating a persistent frame:
Ask the VolatileFrame to populate a persistent frame object with
information from the buffer so that endpoint handlers can do their job
and read all the data.
Take one of the cached persistent frames meant for writing; set some fields
and call sendPersistentFrame() on the parser.
Open questions
How do we get this deployed?
By only moving the socket and parsing code into C/C++ we can continue to re-use the following:
All the stats/logger/alerts/visibility integration that we currently have deployed
All of the rate limiting/circuit breaking/peer selection/timeout/channels/connections code stays in node. The C/C++ code just implements a blazing fast relay. All other services, e.g. dispatch, onedirection, etc., will continue to use the node client and server.
The actual server and socket management code stays in node. This is key to allowing node to manage as much as possible, and it also allows us to implement the ringpop/hyperbahn advertise code in node without awkward bridging into C/C++ or re-implementing either in C/C++.
The C/C++ code effectively will only be replacing a few files in the current node code: the relay_request.js, bufrw/stream/read_machine.js and v2/call.js classes. The rest of the code will pretty much stay identical.
Why invest in C/C++ instead of go/java?
Our flamegraphs demonstrate clearly that more than 90% of the CPU is actual forwarding and network logic that has very little to do with the rest of the node implementation; for example, only 2% of the process is unoptimized timeout logic and only 4% of the process is unoptimized peer selection. Those parts still have room for optimization but are not the bottleneck.
Rather than investing in a complete re-implementation of the entire hyperbahn system in a new language, including:
rewriting ringpop
rewriting advertisement logic
rewriting all admin control
rewriting all stats
rewriting all logging
rewriting all the integration tests
refixing all the production bugs from stress testing
It would be ideal to write a minimal implementation of a relay in C/C++ to get our next order of magnitude in performance.
We could implement the minimal relay in go/java and shell out to that from node; however, that would be difficult to do. There is no standard way to call into java/go from node; you would first have to call into C++ and then call into go/java. The real performance gains to be made come from tight coupling to the v8 C++ interface, to keep memory allocation overhead minimal, as well as tight coupling to node's TCPWrap C++ class and libuv, so that we can move just the minimal hot-path socket manipulation code into a non-javascript language.
How do we get this tested?
The existing node tchannel code has a large suite of integration tests that treat all networking code as a black box. Furthermore, we have a large suite of tests in rt/hyperbahn as well.
Because the relay server is designed to only handle socket reads, we can completely abstract away the fact that we are using C/C++ at all in our connection.js class. The vast majority of our tests treat the connection class as a black box, which will allow us to re-use the existing nodejs tests to verify the C/C++ code.
Furthermore, writing C/C++ addons is a fully supported feature for any node.js project. It's very easy to make binary code a part of the entire engineering workflow, and it's pretty easy to import C++ classes into javascript itself.
I remember @breerly tossing around the idea of frame parsing as a C library that could be shared across languages.
If Node performance is leaving us wanting more, then I'd strongly prefer to go down this route versus a Hyperbahn rewrite in Go. We're not domain experts here but we can build something that works. Having one, consistent implementation of the low-level protocol details that we can easily share across languages would be huge.
For an example of how Python could benefit from this: compiling frame parsing to Cython (that is, C but still with a bunch of overhead to support Python duck typing) gives us a ~10x speedup (!!).
The infra work to support C/C++ in production is not too bad. It's just a binary node library like any other binary node library (we already have binary libraries for farmhash etc).