CPU Usage for ZmqSocket.Receive. 9.8% libzmq (C) usage vs 80% clrzmq (C#) #83
Comments
Here is a graph of CPU utilization: [graph image attached]
I have forked clrzmq at https://github.com/rohitjoshi/clrzmq and modified the code to use the zmq_send and zmq_recv functions to send/receive a raw byte buffer, which has improved performance drastically. See this screenshot showing the improved performance: https://docs.google.com/open?id=0Bz_UeFnokXJVekI4OEhNN3pUR1E I would prefer that we merge this so we don't have to maintain two repositories.
Sorry for the long delay on this. I did not receive any email notifications from GitHub when you created this issue. It appears that you have removed your fork - are you still experiencing the CPU usage issues you outlined above? If so, I would encourage you to open a pull request with your fix so that we can review it and merge it in, if appropriate. Once again, sorry for the delay.
Here is the forked repo: https://github.com/rohitjoshi/clrzmq-optimized The linked page explains the difference in CPU utilization. The problem with using zmq messages is that each receive requires too many P/Invoke calls, so I replaced them with the functions that take a raw buffer.
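To make the P/Invoke-count point concrete, here is a minimal sketch of the raw-buffer receive path. It is illustrative only, not clrzmq's actual interop code, and it assumes the libzmq 3.x-style zmq_recv(socket, buffer, length, flags) signature that copies straight into a caller-supplied buffer:

using System;
using System.Runtime.InteropServices;

static class RawReceive
{
    // One native call per message, versus the message-object path which needs
    // roughly init + recv + size/data + copy + close for every message.
    [DllImport("libzmq", CallingConvention = CallingConvention.Cdecl)]
    private static extern int zmq_recv(IntPtr socket, byte[] buffer, UIntPtr length, int flags);

    // Returns the number of bytes received, or -1 on error/EAGAIN.
    // The caller-supplied buffer must be large enough for the expected message.
    public static int Receive(IntPtr socketHandle, byte[] buffer, int flags)
    {
        return zmq_recv(socketHandle, buffer, (UIntPtr)(uint)buffer.Length, flags);
    }
}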
I really like the approach you have taken. I considered something similar in the past, but dropped it to simplify the implementation. Now that you've demonstrated the performance benefits, I would be happy to merge your changes into the master branch. For messages longer than 8192 bytes, instead of splitting them into multiple parts I will fall back to a message-based (zmq_msg_t) receive. What are your thoughts on that?
On second thought, my approach described above will not work. It is impossible to know ahead of time on the Receive side whether to receive a buffer or a message. I think the best compromise is to expose new buffer-based Send/Receive overloads alongside the existing message-based ones.
@rohitjoshi Ok, I think I came up with a solution that will work for low-powered devices while preserving the expected behaviour of the Send & Receive methods. It also avoids new Send/Receive overloads. If you use the updated code from the master branch, please test it on your system and let me know if you see the same performance benefits. If not, re-open this issue and we'll come up with a better solution.
I am glad you included the requested performance changes, as I don't want to fork/maintain a version just for my company. I will try this out and let you know; most probably it will be next week.
I may have opened a duplicate, but using SocketFlags.DontWait with the out int size overload results in CPU mayhem. I understand there is a precondition around max buffer size, but why did the old bindings not have the CPU symptoms for the equivalent code?

// Ruh-oh
int size;
var buffer = new byte[0];
while ((buffer = _socket.Receive(buffer, SocketFlags.DontWait, out size)) == null && !_stop)
{
    Thread.Sleep(10);
}
return buffer;

// All good
byte[] buffer;
while ((buffer = _socket.Recv(SendRecvOpt.NOBLOCK)) == null && !_stop)
{
    Thread.Sleep(10);
}
return buffer;
@danielcrenna Should the // Ruh-oh and // All good comments be swapped? Also, are you testing both examples with the same version of libzmq (2.x)? Both versions result in a direct native receive call, so I would not expect such a large difference between them. Also, what happens if you pass the 10 millisecond timeout to the receive method instead of calling Thread.Sleep? E.g.

int size;
var buffer = new byte[0];
do
{
    buffer = _socket.Receive(buffer, TimeSpan.FromMilliseconds(10), out size);
} while (buffer == null && !_stop);
You're right about the comments; I swapped them. I have confirmed that the same libzmq 2.x version is being used. Using your overload, CPU usage decreased to ~50%, which is still heavy utilization for code receiving no messages, but it's a starting point for further testing.
Cool, thanks. Is this with the latest clrzmq build from the master branch?
Yes, it is. I cloned it into our repo.
Worth considering: the difference in performance characteristics between DontWait and blocking is large enough that it might be worth just cooking up a "stop" message rather than trying to wrestle with it, since the blocking version consumes almost no resources.
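For anyone landing here later, a minimal sketch of that "stop message" pattern. It reuses the _socket/_stop identifiers from the snippets above, assumes the Receive(buffer, SocketFlags, out size) overload blocks when passed SocketFlags.None, and the sentinel value and Process helper are illustrative, not part of clrzmq:

// Block in Receive (near-zero CPU) and exit when a one-byte sentinel arrives.
// A peer that wants to shut this loop down sends the sentinel to the socket.
var buffer = new byte[8192];
int size;
while (true)
{
    buffer = _socket.Receive(buffer, SocketFlags.None, out size);  // assumed blocking call
    if (size == 1 && buffer[0] == 0xFF)   // 0xFF chosen arbitrarily as the stop sentinel
        break;
    Process(buffer, size);                // hypothetical handler for real messages
}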
K. I will investigate it. Off the top of my head, it might be due to extra buffer allocations. Can you try one last rewrite?

var buffer = new byte[ZmqSocket.MaxBufferSize];
int receivedBytes;
do
{
    receivedBytes = socket.Receive(buffer, TimeSpan.FromMilliseconds(10));
} while (receivedBytes < 0 && !_stop);

This uses a single pre-allocated buffer of ZmqSocket.MaxBufferSize and the overload that returns the received byte count, so no buffer is allocated per call.
I will try that, but I actually figured out that the problem, for me, is that SocketProxy gives back an empty byte[] no matter what, and my repurposed loop code was checking for a null byte[] buffer. The end result is a tight loop, which is the cause of the CPU churn, not the library or the bindings. Basically, between the two bindings the behaviour has changed such that I used to expect a null buffer and now I get an empty one, if that makes sense. Up to you if you want to call attention to the subtlety or not.
Snippets for posterity:

// New bindings always return a buffer, so use a length check instead or you'll get a tight loop
int size;
var buffer = new byte[0];
while ((buffer = _socket.Receive(buffer, SocketFlags.DontWait, out size)).Length == 0 && !_stop)
{
    Thread.Sleep(10);
}
return buffer;

// Old bindings send back a null byte[] if nothing is received
byte[] buffer;
while ((buffer = _socket.Recv(SendRecvOpt.NOBLOCK)) == null && !_stop)
{
    Thread.Sleep(10);
}
return buffer;
I am evaluating whether to adopt ZMQ (or XS), and by extension, in the .NET/C# world, CLRZMQ; oh, by the way, it must also cross-compile to Arch Linux ARM. Re: the "Ruh-oh" comments above, how does something like a returned byte[] plus an out size get overlooked? That seems fundamental enough for any C#/.NET library API. It's completely unnecessary, IMO, when callers can simply check buffer == null || buffer.Length == 0, or conversely buffer != null && buffer.Length > 0, and so on. And then write yourself an extension method if buffer nullness is a concern, e.g. public static T[] ToSelfOrArray<T>(this T[] arr) { return arr ?? new T[0]; } to help guarantee the buffer has something after the call. Simple. I also tend to prefer IEnumerable<T> to T[] anyway; it extends better into LINQ providers. Straight C++-style arrays tend to lead to stronger coupling downstream.
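For completeness, a self-contained version of that extension-method idea; the class name and the usage lines are illustrative, not part of clrzmq:

using System;

public static class ArrayExtensions
{
    // Returns the array itself, or an empty array when it is null, so callers can
    // always do a length check without a separate null check.
    public static T[] ToSelfOrArray<T>(this T[] arr)
    {
        return arr ?? new T[0];
    }
}

// Usage: covers both the old bindings (null on no message) and the new ones (empty buffer).
// byte[] received = _socket.Recv(SendRecvOpt.NOBLOCK).ToSelfOrArray();
// if (received.Length > 0) { /* process message */ }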
Hello,
I am running a performance test on our application, which uses clrzmq. It seems ZmqSocket.Send and Receive are very expensive compared to libzmq.dll.
Below is the breakdown for the ZmqSocket.Receive method:
SpinWait: 17.1%
Stopwatch.GetElapsedDateTimeTicks: 4.1%
Stopwatch.StartNew: 2.4%
Receive: 73.3%
Breaking down the CPU usage for the Receive call (73.3%):
ErrorProxy.get_ShouldTryAgain: 5.1%
SocketProxy.Receive: 64.4%
CPU usage for SocketProxy.Receive():
DisposableIntPtr.Dispose: 11.1%
ZmqMsgT.Init: 7.1%
ZmqMsgT.Close: 5.8%
SocketProxy.RetryIfInterrupted: 20.8%
CPU usage for SocketProxy.RetryIfInterrupted:
SocketProxy+<>c__DisplayClassa: 14.4%, of which 9.8% is LibZmq.ZmqMsgRecvProc.Invoke
I do have a graph of these calls (a PNG file) if anyone is interested. There is no option to attach that file here.
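As a rough way to reproduce this kind of comparison without a full profiler, here is a hedged micro-benchmark sketch. The endpoint, message count, buffer size, and the assumption that SocketFlags.None makes the Receive overload used above block are all illustrative choices, not details from this issue; it also assumes the ZmqContext/ZmqSocket API from the clrzmq master branch discussed here:

using System;
using System.Diagnostics;
using ZeroMQ;   // clrzmq; namespace assumed from the 3.x bindings

class ReceiveBenchmark
{
    static void Main()
    {
        const int messageCount = 100000;          // arbitrary iteration count
        var buffer = new byte[8192];              // pre-allocated receive buffer

        using (ZmqContext context = ZmqContext.Create())
        using (ZmqSocket socket = context.CreateSocket(SocketType.PULL))
        {
            // A peer is assumed to be pushing messageCount messages to this endpoint.
            socket.Bind("tcp://127.0.0.1:5555");

            var watch = Stopwatch.StartNew();
            for (int i = 0; i < messageCount; i++)
            {
                int size;
                socket.Receive(buffer, SocketFlags.None, out size);  // assumed blocking receive
            }
            watch.Stop();

            Console.WriteLine("{0} messages received in {1} ms", messageCount, watch.ElapsedMilliseconds);
        }
    }
}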