Extremely low performance and double free #194

Closed
Zabrane opened this issue Feb 2, 2023 · 9 comments

Comments

@Zabrane

Zabrane commented Feb 2, 2023

While hammering echo_server.c (I deleted this line and set SSL=0), I got this error:

$ sw_vers
ProductName:		macOS
ProductVersion:		13.1
BuildVersion:		22C65

$ clang --version
Apple clang version 14.0.0 (clang-1400.0.29.202)

$ git clone https://github.com/uNetworking/uSockets.git
$ cd uSockets
$ make; make examples
$ ./echo_server
Listening on port 3000...
Client connected
echo_server(62921,0x7ff8503538c0) malloc: double free for ptr 0x7ff1e0008000
echo_server(62921,0x7ff8503538c0) malloc: *** set a breakpoint in malloc_error_break to debug
fish: Job 1, './echo_server' terminated by signal SIGABRT (Abort)

Performance is also poor:

Destination: [127.0.0.1]:3000
Total data sent:     10.2 MiB
Total data received: 8.4 MiB
Bandwidth per channel: 31.261⇅ Mbps
Test duration: 5.00187 s.

Is there a way to increase the receive and send buffer sizes?

We are trying to replace an old proprietary TCP server with uSockets.
Here's what we already get with the proprietary server:

Destination: [127.0.0.1]:3000
Total data sent:     19247.0 MiB 
Total data received: 19245.2 MiB
Bandwidth per channel: 32256.206⇅ Mbps
Test duration: 5.00517 s.

This is about three orders of magnitude faster than uSockets.

More or less the same result under Linux (Ubuntu 20.04 LTS):

$ uname -a
Linux 5.4.0-137-generic #154-Ubuntu SMP Thu Jan 5 17:03:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ gcc --version
gcc (Ubuntu 10.4.0-1ubuntu1~20.04) 10.4.0

$ ./echo_server
Listening on port 3000...
Client connected
double free or corruption (top)
Aborted (core dumped)

Test result:

Destination: [127.0.0.1]:3000
Total data sent:     24.3 MiB 
Total data received: 18.1 MiB
Bandwidth per channel: 71.169⇅ Mbps
Test duration: 5.00203 s.
@uNetworkingAB
Contributor

That example is doing malloc, memcpy and free every time it streams a chunk to the kernel, so I wouldn't use it for benchmarking, esp. not with large data. And if you see double frees then it's pretty broken.

@uNetworkingAB
Contributor

You probably want a pre-allocated ring buffer to add/remove to/from if you benchmark

@Zabrane
Author

Zabrane commented Feb 2, 2023

@uNetworkingAB could you please help us to adapt the echo TCP example to use a pre-allocated ringbuffer?
Any test code will be more than welcome. I really want to get rid of this proprietary server.

@Zabrane
Author

Zabrane commented Feb 2, 2023

> That example is doing malloc, memcpy and free every time it streams a chunk to the kernel, so I wouldn't use it for benchmarking, esp. not with large data. And if you see double frees then it's pretty broken.

The benchmark consists of sending and receiving a single character as fast as possible. Thus, no large data is involved. Just a single char.

@uNetworkingAB
Contributor

void bsd_socket_nodelay(LIBUS_SOCKET_DESCRIPTOR fd, int enabled) {
    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (void *) &enabled, sizeof(enabled));
}

bsd_socket_nodelay(us_poll_fd((struct us_poll_t *)s), 0);

You probably want to run this in on_open to disable TCP_NODELAY. I'm pretty sure your proprietary variant has TCP_NODELAY=false; we have it true by default.

@Zabrane
Author

Zabrane commented Feb 3, 2023

@uNetworkingAB That didn't change anything, even with a static buffer (no malloc involved, one big global allocation) and TCP_NODELAY set to false per your recommendation.

Linux:

Destination: [127.0.0.1]:3000
Total data sent:     11.7 MiB
Total data received: 6.0 MiB
Bandwidth per channel: 29.729⇅ Mbps
Test duration: 5.00832 s.

@Zabrane
Author

Zabrane commented Feb 4, 2023

@uNetworkingAB Hi. How can I set the socket buffer sizes when using uSockets?
I'd like the equivalent of this:

setsockopt(s, SOL_SOCKET, SO_RCVBUF, &sz, sizeof(sz));
setsockopt(s, SOL_SOCKET, SO_SNDBUF, &sz, sizeof(sz));

Does uSockets set the socket to non-blocking?

fcntl(s, F_SETFL, fcntl(s, F_GETFL, 0) | O_NONBLOCK);

@Zabrane
Author

Zabrane commented Feb 4, 2023

@uNetworkingAB I've noticed that you perform a socket write in the on_echo_socket_data callback. Why?
If I don't write during reads, the on_echo_socket_writable callback is never called.

struct us_socket_t *on_echo_socket_data(struct us_socket_t *s, char *data, int length) {
	struct echo_socket *es = (struct echo_socket *) us_socket_ext(SSL, s);
	/*  don't write, just buffer up the number of 'x' to send back  */
	es->length += length;
	return s;
}

Could you please shed some light on the underlying uSockets design?

@uNetworkingAB
Contributor

If you want to pay for consulting time you can send me an email and we can set it up. It's becoming obvious that you're benchmarking apples vs. carrots here, as the echo_server doesn't do what your alternative does. With some rough math you can infer that you must be doing 3.8 billion messages (chars) per second, which is about 0.26 nanoseconds a pop, which is not the case.
