add performance tests #27

Closed
tjanczuk opened this Issue Mar 26, 2013 · 32 comments


@tjanczuk
Owner

This is to add benchmarks and measure the cost of cross-boundary calls from node.js to .NET. A few numbers that would be useful to understand:

  • the added latency of a node.js to .NET call compared to a plain JavaScript call with corresponding functionality,
  • the memory cost of adding CLR to the node.exe process in the baseline case as well as a few common scenarios (from "hello, world" to "access SQL" or "access a SOAP service").
@mythz

Hey I've added some preliminary benchmarks on my fork under the /performance/Echo folder.

I'm not sure how you want to structure it: as a pull request, as a new project, or as a separate fork. It's quite heavy because it's self-hosting, i.e. it includes a copy of node.exe, the required npm /node_modules and an Apache benchmark ab.exe.

Anyway, it's in benchmarks-direct.js. I started off trying to make it async but kept hitting max call stack errors, so the main run loop is now synchronous:

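// Calls fn `times` times in a synchronous loop and reports the elapsed
// milliseconds to cb once every call has invoked its callback.
function run(fn, times, cb) {
    var start = Date.now();

    var called = 0;
    for (var i = 0; i < times; i++) {
        fn(function() {
            if (++called === times) {
                var takenMs = Date.now() - start;
                cb(takenMs);
            }
        });
    }
}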
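
One common cause of max call stack errors in an async-style loop is starting the next call from inside the previous call's callback when that callback fires synchronously. Assuming that was the problem here, a minimal sketch of a one-at-a-time loop that avoids it could look like the following. `runSeries` is a hypothetical name, not something in the fork, and `setImmediate` adds a small per-iteration cost that ends up in the measured time.

```javascript
// Sketch: run fn `times` times, one call at a time, without growing the stack.
function runSeries(fn, times, cb) {
    var start = Date.now();
    var completed = 0;

    function next() {
        if (completed === times) {
            return cb(Date.now() - start);
        }
        fn(function () {
            completed++;
            // Yield to the event loop so a synchronous callback
            // cannot recurse `times` levels deep.
            setImmediate(next);
        });
    }

    next();
}
```

It takes the same arguments as run above, so the two can be swapped to compare batched and sequential call patterns.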

I'm using rows in the Northwind database for test data:

1 row
10 rows
100 rows

Each is stored as JSON; I send it first as a string and then as a JSON object to measure the difference.

Anyway to run it you just need to do:

node benchmarks-direct.js

And here are the results on my new iMac (the node.js side is effectively a no-op):

C:\src\edge\performance\Echo>node benchmarks-direct.js
running 1 Northwind Customer.txt...

1 Northwind Customer.txt 10000 times:
node: 1ms
edge: 274ms
edge is 274x slower

running 10 Northwind Customer.txt...

10 Northwind Customer.txt 10000 times:
node: 1ms
edge: 258ms
edge is 258x slower

running 100 Northwind Customer.txt...

100 Northwind Customer.txt 10000 times:
node: 1ms
edge: 428ms
edge is 428x slower

running 1 Northwind Customer.json...

1 Northwind Customer.json 10000 times:
node: 1ms
edge: 336ms
edge is 336x slower

running 10 Northwind Customer.json...

10 Northwind Customer.json 10000 times:
node: 1ms
edge: 2029ms
edge is 2029x slower

running 100 Northwind Customer.json...

100 Northwind Customer.json 10000 times:
node: 1ms
edge: 19287ms
edge is 19287x slower
@mythz

I'll work on adding benchmarks for vanilla node and express tonight and benchmark it with ab.exe

@mythz

Got some benchmarks-node.js for node.js.

It uses this Apache benchmark batch script, ab-benchmarks-node-10k.bat, to run the benchmarks against a node.js server vs edge, using the same raw text and JSON payloads as above (i.e. 1, 10 and 100 rows).

The results of each benchmark are saved to separate date-stamped html and json files in the charts folder.

I'll see if I can create some charts for this tomorrow to make the results easier to visualize.

@mythz

I've updated parse-ab-results.js which now generates a static html page containing the graphed results.

It generates a date-labelled page e.g. 2013-04-01-echo-benchmarks.html which captures and displays all the results of running the benchmarks 10k times with 1 and 100 concurrent threads.

You can see the charts online if you view it in GitHub's htmlpreview.

Otherwise I also generate a separate JSON-only file, 2013-04-01-echo-benchmarks.json, which you can just paste into this jsfiddle to populate the var series variable, e.g.:

http://jsfiddle.net/mythz/jZrmw/

Highcharts also supports an export to PNG function which looks like:

chart

@tjanczuk
Owner

Demis, this is great data, thanks for starting this investigation.

Which edge bits did you use? The latest published to npm (0.7.4) or did you build a private with the latest checked in changes? The removal of JavaScriptSerializer is not yet published to npm.

I also have a few questions and comments regarding the test code itself:

  1. I notice in the node.js case https://github.com/mythz/edge/blob/master/performance/Echo/benchmarks-direct.js#L51-57 you are simply returning the object that was passed in. This will make it a constant time operation regardless of the "size" of the data, since it is passed around by reference. Given that we are trying to assess the additional cost of node.js to CLR marshaling, I think we should remove as many as possible of the elements of the scenario that make node.js to node.js different from node.js to CLR, except for the marshaling part. In this particular case I would suggest we create a new JavaScript instance to pass to every call (regardless of whether it is node.js or edge), and then also create a brand new instance to return for every call (both in node.js here: https://github.com/mythz/edge/blob/master/performance/Echo/benchmarks-direct.js#L54, and in edge.js here: https://github.com/mythz/edge/blob/master/performance/Echo/Startup.cs#L13).

  2. The same probably applies to benchmarks-node.js.

  3. The node and edge scenarios above appear to exhibit similar scalability dynamics. Did you have a chance to compute a relative value of latency for node to edge to see how that changes with the size of data?

  4. In the ab test script https://github.com/mythz/edge/blob/master/performance/Echo/ab-benchmarks-node-10k.bat I would recommend starting a new instance of node.exe before each variation. Otherwise subsequent tests incur GC penalty for previously executed variations.

  5. Is servicestack server multi-threaded? The node scenarios (both pure node and edge one given how it is implemented) are single threaded. Did you have a chance to compare CPU utilization between the scenarios?

  6. It would be a good idea to allow for a small warm-up period in each measurement before actually starting to capture time. Say do 1000 calls and only then start the timer. This is to ensure all static setup has been taken care of and the system is in a stable state.

  7. The benchmark-node.js is actually a good first stress test of edge. Did you notice any stability issues while running these tests?
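
The warm-up suggestion in item 6 could be sketched roughly as follows. This is only an illustration: `callMany` and `runWithWarmup` are hypothetical names, and the warm-up count is whatever the caller passes in (1000 in the suggestion above).

```javascript
// Fire `count` calls and invoke `done` once all of them have called back.
function callMany(fn, count, done) {
    var called = 0;
    if (count === 0) { return done(); }
    for (var i = 0; i < count; i++) {
        fn(function () {
            if (++called === count) { done(); }
        });
    }
}

// Run `warmup` untimed calls first, then time `times` calls.
function runWithWarmup(fn, warmup, times, cb) {
    callMany(fn, warmup, function () {
        // The timer starts only after every warm-up call has completed.
        var start = Date.now();
        callMany(fn, times, function () {
            cb(Date.now() - start);
        });
    });
}
```

For example, runWithWarmup(fn, 1000, 10000, cb) would discard the first 1000 calls and report the time for the following 10000.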

@mythz

Hi Tomasz,

I used edge from npm from 2 days ago so this would still be using JavaScriptSerializer.

1, 2) ...This will make it a constant time operation regardless of the "size" of the data, since it is passed around by reference.... would suggest we create a new JavaScript instance to pass to every call (regardless if node.js or edge)

Sweet thanks for the info on the internals, I'll construct a new object graph of the same size and send it back in both direct and node/edge/servicestack benchmarks.
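
One way to build a new object graph of the same size on the node.js side is a JSON round trip. This is a sketch under that assumption, not the code in the fork; `freshCopy` and `nodeEcho` are hypothetical names, and it only works for JSON-compatible data such as the Northwind rows.

```javascript
// Deep-copy a JSON-compatible value so nothing is shared by reference.
function freshCopy(value) {
    return JSON.parse(JSON.stringify(value));
}

// Pure node.js echo that returns a brand new object graph on every call,
// so its cost grows with the size of the payload.
function nodeEcho(input, cb) {
    cb(null, freshCopy(input));
}
```

The round trip has its own cost, so for a fair comparison the input copies would probably be built the same way for both the node.js and the edge variants.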

  3. The node and edge scenarios above appear to exhibit similar scalability dynamics. Did you have a chance to compute a relative value of latency for node to edge to see how that changes with the size of data?

I haven't yet, but will do. I'll keep internal counter of the average latency and pull it out after the benchmarks have run.
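
Because Date.now() has millisecond resolution, a baseline that finishes in about 1ms makes any "x slower" ratio very coarse. A sketch of tracking average per-call latency with process.hrtime() instead is shown below; `measure` and `relativeLatency` are hypothetical names, and the loop mirrors the synchronous run function from earlier in the thread.

```javascript
// Time `times` calls with process.hrtime() and report total and average latency.
function measure(fn, times, cb) {
    var called = 0;
    var start = process.hrtime();
    for (var i = 0; i < times; i++) {
        fn(function () {
            if (++called === times) {
                var diff = process.hrtime(start);          // [seconds, nanoseconds]
                var totalMs = diff[0] * 1e3 + diff[1] / 1e6;
                cb({ totalMs: totalMs, avgMs: totalMs / times });
            }
        });
    }
}

// Relative latency of edge compared to node for one payload size.
function relativeLatency(edgeResult, nodeResult) {
    return edgeResult.avgMs / nodeResult.avgMs;
}
```

Computing relativeLatency for the 1, 10 and 100 row payloads would show how the ratio changes with the size of the data.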

  4. In the ab test script ab-benchmarks-node-10k.bat I would recommend starting a new instance of node.exe before each variation. Otherwise subsequent tests incur GC penalty for previously executed variations.

Good idea, I'll update it to do that.

  5. Is servicestack server multi-threaded? The node scenarios (both pure node and edge one given how it is implemented) are single threaded. Did you have a chance to compare CPU utilization between the scenarios?

The ServiceStack HttpListenerBase used is only processing responses on the same HttpListener IO callback thread. I also have an AppHostHttpListenerLongRunningBase app host I can use that uses a different ThreadPool thread to handle the request. If you want I can add that to the list of benchmarks?

  6. It would be a good idea to allow for a small warm-up period in each measurement before actually starting to capture time

Good idea, will do.

  7. The benchmark-node.js is actually a good first stress test of edge. Did you notice any stability issues while running these tests?

Actually I was quite surprised, the apache benchmark results reported 0 failed requests for all benchmarks - Kudos, you've done an amazing job here given the project is so young and you're in uncharted territory!

After work I'll see if I can change it to use the edge package from the master repo instead of npm (let me know if there are any gotchas with this approach). I'll also update the benchmarks with the above changes.

@tjanczuk
Owner

There should be no gotchas with using the latest bits from the repo, just follow the build instructions at https://github.com/tjanczuk/edge#building.

Even if your ServiceStack application is single threaded, the HTTP.SYS does a number of things for you in a multi-threaded fashion. Once we go beyond benchmarking into the all-up application performance measurements, I think there is really only one way to arrive at a meaningful comparison between stacks: make sure the server CPU is close to 100% utilization (>97%). In ServiceStack it probably means going multi-threaded at the application level (i.e. user space). In case of node or node with edge, it likely means using the cluster module to spawn as many child processes as there are cores on the box. And of course the client should be running on a different box.
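
For the node side, using the cluster module to spawn one child process per core might look roughly like this. It is a sketch only: `workerCount` and `startClustered` are hypothetical names, the trivial handler stands in for the real benchmark server, and nothing is started until startClustered is called.

```javascript
var cluster = require('cluster');
var http = require('http');
var os = require('os');

// One worker per core, with a floor of 1 in case no CPU info is reported.
function workerCount() {
    return Math.max(1, os.cpus().length);
}

// In the master process, fork the workers; in each worker, start the server.
function startClustered(port) {
    if (cluster.isMaster) {
        for (var i = 0; i < workerCount(); i++) {
            cluster.fork();
        }
    } else {
        http.createServer(function (req, res) {
            res.end('ok');
        }).listen(port);
    }
}
```

The workers share the listening port, so the benchmark client can keep targeting a single address.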

Lastly, edge really does not aspire to compete with other homogeneous HTTP stacks, be it pure node, Web API, or ServiceStack. Edge is an alternative to building a heterogeneous app with a process boundary between technologies. So the "application level" performance comparison I would like to make at some point is comparing:
1. Node.js app making an HTTP call to a .NET server to run business logic B and get back the results, with
2. Node.js app using edge to call into business logic B implemented in process.

@mythz

IMO it depends on what you're trying to conclude with the benchmarks; running and benchmarking against a single default node or .NET host instance is what it currently does. If .NET is able to leverage multiple cores by default (i.e. without added complexity in app code) because of its design, then that's an inherent benefit of .NET (as a concession, .NET suffers from multi-threading issues). I'm not sure what the most common installation of node.js is, but I'm assuming it's a single instance, which means the other cores will go unused unless users cluster it themselves or utilize something like node workers. If most node installations actually utilize clusters, then that is a case for doing that here as well.

Also I should note that I'm running the benchmarks inside a Parallels VM on an iMac - Parallels is only set to use 2 cores but if you think it would be better I can dial it down and use 1, but this will compete with the rest of the running OS/processes so I'm not sure if this would be more informative.

Lastly, edge really does not aspire to compete with other homogeneous HTTP stacks, be it pure node, Web API, or ServiceStack. Edge is an alternative to building a heterogeneous app with a process boundary between technologies.

Sure that's what this project should limit its scope to, i.e. to make it easily adoptable and hostable in a variety of different contexts, allowing innovation between frameworks - or support small projects to use it vanilla as-is. Although my primary interest in node is to be able to escape the ASP.NET/MVC heavy-framework, enterprise-focus and complexity tax, and be able to offer an integrated Node/(Edge?)/ServiceStack solution that provides the best of both worlds, i.e. leverage node's ecosystem for Web DSL's, web frameworks / html generation and server-side notification support, whilst still leveraging ServiceStack for back-end services. I don't expect any ServiceStack bits to be added to the Edge repo, if that was a concern.

I'm adding benchmarks for ServiceStack as I want to evaluate what the most optimal setup and performance of using ServiceStack + node together is. At the same time I'm also measuring node/ServiceStack via a node http proxy, and I also plan on developing a fast tcp/async interconnect between node/ServiceStack without the overhead of HTTP that still allows process decoupling between node and ServiceStack. The results will determine which strategy I adopt.

So the "application level" performance comparison I would like to make at some point is comparing:
1. Node.js app making an HTTP call to a .NET server to run business logic B and get back the results, with
2. Node.js app using edge to call into business logic B implemented in process.

Right I'll also look at adding some business logic in the services to see how it compares. What do you think is a good test case? I was going to persist the request to Sqlite, but do you think something else like a configurable Fibonacci calculation would make for a better benchmark?

@tjanczuk
Owner

Demis, just shipped 0.7.5 on npm which includes the marshaling improvements. If you npm install these bits you will not need to compile your own any more.

I think of benchmarks mostly in terms of engineering tools that help in driving performance improvements. Their results are usually only mildly interesting as any indication of a performance of a real application. These are the tests I needed when throwing out the JavaScriptSerializer, because without them I would be moving in the dark.

Unlike benchmarks, I think of scenario performance testing as indicative of system performance one can experience when writing a real app. To put it colloquially, these are the numbers I would blog about. This is typically a comparison between alternative approaches to writing an app. And I think it really only makes sense to compare two solutions at max CPU utilization (assuming CPU is the bottleneck one hits first), since one usually writes software in a way that milks the hardware as much as possible rather than in a most convenient way to the programmer. In case of node.js apps this really means any self-respecting app running on any self-respecting server (i.e. multi core) will use the cluster module to fully leverage the CPU.

In terms of a scenario test I would like to have, I think there are two classes of business logic that are interesting to measure:
1. CPU bound computations. Here the Fibonacci sequence is a good approximation.
2. Accessing IO bound functionality that is only implemented in .NET, e.g. doing SOAP with WCF or SQL with ADO.NET. Since the actual compute is kind of irrelevant, I think await Task.Delay(5000) is good enough of an approximation of the business logic.

I think it would be worthwhile to compare running these two classes of business logic in two variants each:
1. In-process with edge.js
2. Calling into a self-hosted, localhost Web API HTTP server from node.js.
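
While the harness is being built, the two classes of business logic can be approximated in plain JavaScript. The sketch below is only such a stand-in, assuming the usual (err, result) callback shape; the real .NET logic would live behind edge or the HTTP server. `fib`, `cpuBound` and `ioBound` are hypothetical names.

```javascript
// CPU bound: naive recursive Fibonacci, configurable through n.
function fib(n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

function cpuBound(n, cb) {
    cb(null, fib(n));
}

// IO bound stand-in: completes after delayMs without doing any compute.
function ioBound(delayMs, cb) {
    setTimeout(function () {
        cb(null, delayMs);
    }, delayMs);
}
```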

What do you think?

@glennblock
Collaborator
@tjanczuk
Owner

It is certainly an interesting number to have, but I need Web API numbers as I anticipate needing them to drive some of the discussions going forward.

@mythz

I think of benchmarks mostly in terms of engineering tools that help in driving performance improvements. Their results are usually only mildly interesting as any indication of a performance of a real application. These are the tests I needed when throwing out the JavaScriptSerializer, because without them I would be moving in the dark.

Agreed, I think we should focus on this. I'll make the above changes to the benchmarks with this focus in mind.

since one usually writes software in a way that milks the hardware as much as possible rather than in a most convenient way to the programmer

This I don't agree with, and it is a goal I'm opposed to, so it's likely where we differ. Speed is very important, yes, but I consider embracing artificial complexity as one of the biggest problems with the art of programming and the progression of computer science. I won't go into too much detail here, but I don't see any future for solutions that don't aim to simplify and maximize programmer convenience first. I don't think Speed and Simplicity are at odds either, as you're better able to make macro optimizations when the foundations you build on are simple. If you take a look at the common thread in all my OSS libs, they promote the minimum friction and projection complexity but are still able to deliver maximum performance.

So, just to let you know, as a philosophical goal what I'm looking at is how to structurally provide the simplest, most versatile and flexible solution that promotes the least friction whilst maximizing performance. This will be an area where I'll be able to help out in exploring.

In terms of a scenario test I would like to have, I think there are two classes of business logic that are interesting to measure:
1. CPU bound computations. Here the Fibonacci sequence is a good approximation.
2. Accessing IO bound functionality.... await Task.Delay(5000)

Cool, I'll look into adding support for this.

I think it would be worthwhile to compare running these two classes of business logic in two variants each:
1. In-process with edge.js
2. Calling into a self-hosted, localhost Web API HTTP server from node.js.

I can help with 1, but for the self-hosted HTTP Server I'm going to use ServiceStack since I already need to spend the time to do this for my own decision making.

I'm sure there will be no shortage of MS/MVP devs who are familiar with WebApi and will be willing to donate their time in benchmarking it.


I will look at updating to the latest npm and make suggested changes to the benchmarks.

@tjanczuk
Owner

Sounds great, thanks.

I'm sure there will be no shortage of MS/MVP devs who are familiar with WebApi and will be willing to donate their time in benchmarking it.

Yes @glennblock, we are LOOKING at you ;) Seriously though I can do it, Glenn showed me how. It is only 27 lines more than in node.

@glennblock
Collaborator
@glennblock
Collaborator
@mythz

What's the Katana host implementation built on? HttpListener or from scratch on raw tcp sockets?
Will WebApi be switching to use it? when?

@glennblock
Collaborator
@mythz

@tjanczuk hmmm interesting, the payload in edge is coming into C# as object[Dictionary<string,object>]? Do you know of a good/fast way to marshal that into typed C# classes? e.g. List<Customer>

@tjanczuk
Owner

Perhaps a mechanism similar to ASP.NET MVC model binders could be employed, but nothing "out of the box" comes to mind.

@glennblock
Collaborator
@glennblock
Collaborator
@mythz

I think the optimal goal to aim for should be to hydrate it into the C# world of typed C#/.NET classes, and a JSON string would be much easier to hydrate from than an object[Dictionary<string,object>].

Is there much overhead (e.g. encoding) in edge if I pass a raw string across instead of a JS object?

Maybe I'll benchmark this as well.

@tjanczuk
Owner

What do you mean by "serialized"? During marshaling from V8 to CLR or back, the data is not serialized into a "one huge string" at any point in time. Marshaling in edge maps between V8 type system and CLR type system directly. Given the lack of strong typing in JavaScript, if you want a Customer in CLR, you need an additional transformation (from CLR types to other CLR types, not from strings to CLR types).

@glennblock
Collaborator
@mythz

The way I see it, here are options we have now:

node string body -> json object |.NET| object[Dictionary<string,object>] -> List<T>

node string body |.NET| C# string -> List<T>

It's easier for me to hydrate List<T> from a json string than from an obj[..].

So I'm just curious what the penalty is like to pass a large node string to C#?

@mythz

anyway I might as well benchmark it, I'll use manual C# boilerplate to convert obj[...] to List<Customer> so we have an idea of the cost of each approach.

@tjanczuk
Owner

I would expect it to be less than with structured data (and in fact your prior measurements appear to arrive at the same conclusion). So if you prefer to do your parsing entirely in your application code, feel free to do so.

Edge gives you the flexibility to pass a blob or structured data (up to the point of IDictionary<string,object> and IEnumerable<object> from node to CLR, and <anything> from CLR to node.js).

Given that edge already forces you into the prescriptive pattern of using Func<object,Task<object>>, in the majority of situations you are going to have some adapter code in CLR to interface with the APIs you actually want to invoke in CLR. This is a natural place to do any additional data transformation that is necessary.

@mythz

I think we were being hurt by converting a string to a js object and then using JavaScriptSerializer to convert it to C#.
If we leave it and pass it as a string and use a faster C# json serializer then it might scale better. We also never compared it to the cost of converting Dictionary<string,object> -> List<T>.

I like the fact that we can pass anything, but dealing with Dictionary<string,object> in C# app code is not particularly helpful if we need to maintain manual adapter code to deal with it ourselves. Anyway I'll explore this a bit further to see if I can find anything out. Maybe passing utf8 bytes might be even faster than a string?

Also it would be nice if we can support something like:

var edgeInvoke = edge.func('bin/Release/Echo.dll');
var edgeInvoke = edge.func('bin/Release/Echo.dll', "Namespace.Startup.Invoke"); //same as above

var edgeInvoke = edge.func('bin/Release/Echo.dll', "Namespace.Startup.CustomInvoke");  

i.e. allow us to invoke different methods from same dll, this would mean I wouldn't have to have different projects for each Invoke impl.

@tjanczuk
Owner

You can already specify custom class names and method to call in CLR, read through the section https://github.com/tjanczuk/edge#how-to-integrate-c-code-into-nodejs-code

@mythz

Oh this is brilliant, just what I'm looking for:

var clrMethod = edge.func({
    assemblyFile: 'My.Edge.Samples.dll',
    typeName: 'Samples.FooBar.MyType',
    methodName: 'MyMethod'
});

Add it to the slides :)

@mythz

ok it's kinda what I suspected, object[Dictionary<string,object>] -> List<T> is slower and scales worse.

I'm running the benchmarks-marshalling.js which looks at Startup.DeserializeJson and Startup.DeserializeObject and I'm using my ServiceStack.Text JSON Serializer.

These are the results:

C:\src\edge\performance\Echo>node benchmarks-marshalling.js
running 1 Northwind Customer...

1 Northwind Customer 10000 times:
jsonStringToType: 1201ms
jsToObjectToType: 1870ms
jsToObjectToType is 1.557x slower

running 10 Northwind Customers...

10 Northwind Customers 10000 times:
jsonStringToType: 1208ms
jsToObjectToType: 2981ms
jsToObjectToType is 2.468x slower

running 100 Northwind Customers...

100 Northwind Customers 10000 times:
jsonStringToType: 1239ms
jsToObjectToType: 14357ms
jsToObjectToType is 11.588x slower

This also doesn't take into account the cost of converting the json string into a js object before sending it to DeserializeObject.

For added info, when I did take converting to js object into account by passing a string and changing the fn to:

var jsToObjectToType = function (input) {
    return function (cb) {
        deserializeObject(JSON.parse(input), function (err, result) {
            cb();
        });
    };
};

These were the results:

C:\src\edge\performance\Echo>node benchmarks-marshalling.js
running 1 Northwind Customer...

1 Northwind Customer 10000 times:
jsonStringToType: 1223ms
jsToObjectToType: 1891ms
jsToObjectToType is 1.546x slower

running 10 Northwind Customers...

10 Northwind Customers 10000 times:
jsonStringToType: 1210ms
jsToObjectToType: 3377ms
jsToObjectToType is 2.791x slower

running 100 Northwind Customers...

100 Northwind Customers 10000 times:
jsonStringToType: 1226ms
jsToObjectToType: 17568ms
jsToObjectToType is 14.33x slower
@tjanczuk
Owner

I think you would want to start with a JavaScript object in node.js in both cases, since this is likely the form the data is going to be in originally. This means calling JSON.stringify when calling in the string case. But at the end of the day this will probably not change the results too much.
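
Starting from a JavaScript object in both cases could be sketched like this. The two `deserialize*` functions below are hypothetical stubs standing in for the edge functions bound to Startup.DeserializeJson and Startup.DeserializeObject; they only report what kind of value they received so the sketch runs on its own.

```javascript
// Hypothetical stand-ins for the real edge-bound functions.
function deserializeJson(json, cb) { cb(null, typeof json); }
function deserializeObject(obj, cb) { cb(null, typeof obj); }

// String case: JSON.stringify runs inside the timed call,
// so its cost is counted against this variant.
var jsonStringToType = function (input) {
    return function (cb) {
        deserializeJson(JSON.stringify(input), function (err, result) {
            cb(err, result);
        });
    };
};

// Object case: the JavaScript object is passed across as-is.
var jsToObjectToType = function (input) {
    return function (cb) {
        deserializeObject(input, function (err, result) {
            cb(err, result);
        });
    };
};
```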

So which deserializer are you using and how does it work? Reflection?

@tjanczuk tjanczuk closed this Mar 2, 2015