Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

possible bug / regression in new TCP stack #28577

Closed
cfriedt opened this issue Sep 21, 2020 · 2 comments
Closed

possible bug / regression in new TCP stack #28577

cfriedt opened this issue Sep 21, 2020 · 2 comments
Assignees
Labels
area: Networking bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug

Comments

@cfriedt
Copy link
Member

cfriedt commented Sep 21, 2020

Describe the bug

I believe I've uncovered a bug in the new TCP stack. I only discovered it because CONFIG_NET_TCP2=y is selected as the default stack instead of the legacy stack as of this commit 70dae09 and I just happened to rebase my changes on a8ffc03 today.

In my scenario, the Greybus service opens 3 TCP server sockets on port 4242, 4243, & 4244. The service thread calls poll(2) on a set of pollfds representing those sockets. When poll(2) returns (i.e. there is traffic on one of the server sockets), accept(2) is called creating a client socket. A pthread is spawned for the client, and the newly created client socket is passed to the spawned thread. The spawned thread then calls poll(2) on the client socket, and that is where poll(2) returns 0 with errno set to 0, which is indicates a timeout. However, poll(2) was called with the timeout value set to -1 meaning that it should wait indefinitely until there is traffic.

I'll double-check the other flags that were set by the call to poll(2) to see if there is anything out of the ordinary, but for now I was just checking POLLIN.

I was able to diagnose the issue partially because my application works perfectly with CONFIG_NET_TCP1=y.

To Reproduce

Heh.. it could take a bit of effort, unfortunately, so I apologize in advance.

Steps to reproduce the behavior:

Follow the steps described in detail here. Note, you will need an ATUSB and a CC1352R1 LaunchXL.

https://gist.github.com/cfriedt/9ad6a10250b5098e1aeb25193c2c9ba3

Instead of checking out https://github.com/cfriedt/zephyr.git, checkout https://github.com/zephyrproject-rtos/zephyr.git.

Apply the patch file here: greybus-not-working-with-tcp2.patch.txt

This should apply cleanly against a8ffc03 from today.

Instead of using the cc1352r_sensortag, use the cc1352r1_launchxl.

The bug becomes evident when building with

west build ${ZEPHYR_PROJECT} -- -DCONF_FILE="prj.conf overlay-802154.conf"

but the application behaves perfectly when built with

west build ${ZEPHYR_PROJECT} -- -DCONF_FILE="prj.conf overlay-802154.conf" -DCONFIG_NET_TCP1=y

Expected behavior

[00:00:00.011,291] <inf> net_config: Initializing network
[00:00:00.111,328] <inf> net_config: IPv6 address: 2001:db8::1
[00:00:00.111,419] <inf> net_config: IPv6 address: 2001:db8::1
[00:00:00.111,572] <dbg> greybus_service.greybus_service_init: Greybus initializing..
[00:00:00.111,846] <dbg> greybus_transport_tcpip.gb_transport_backend_init: Greybus TCP/IP Transport initializing..
[00:00:00.111,999] <dbg> greybus_transport_tcpip.netsetup: created server socket 0 for cport 0
[00:00:00.111,999] <dbg> greybus_transport_tcpip.netsetup: setting socket options for socket 0
[00:00:00.112,030] <dbg> greybus_transport_tcpip.netsetup: binding socket 0 (cport 0) to port 4242
[00:00:00.112,060] <dbg> greybus_transport_tcpip.netsetup: listening on socket 0 (cport 0)
[00:00:00.112,213] <dbg> greybus_transport_tcpip.netsetup: created server socket 1 for cport 1
[00:00:00.112,213] <dbg> greybus_transport_tcpip.netsetup: setting socket options for socket 1
[00:00:00.112,243] <dbg> greybus_transport_tcpip.netsetup: binding socket 1 (cport 1) to port 4243
[00:00:00.112,274] <dbg> greybus_transport_tcpip.netsetup: listening on socket 1 (cport 1)
[00:00:00.112,426] <dbg> greybus_transport_tcpip.netsetup: created server socket 2 for cport 2
[00:00:00.112,426] <dbg> greybus_transport_tcpip.netsetup: setting socket options for socket 2
[00:00:00.112,457] <dbg> greybus_transport_tcpip.netsetup: binding socket 2 (cport 2) to port 4244
[00:00:00.112,487] <dbg> greybus_transport_tcpip.netsetup: listening on socket 2 (cport 2)
[00:00:00.112,731] <inf> greybus_transport_tcpip: Greybus TCP/IP Transport initialized
[00:00:00.114,227] <inf> greybus_service: Greybus is active
[00:00:00.114,593] <dbg> greybus_transport_tcpip.accept_loop: calling poll
[00:00:04.326,019] <dbg> greybus_transport_tcpip.accept_loop: poll returned 1
[00:00:04.326,049] <dbg> greybus_transport_tcpip.accept_loop: socket 0 (cport 0) has traffic
[00:00:04.326,110] <dbg> greybus_transport_tcpip.accept_loop: accepted connection from [2001:db8::2]:42482 as fd 3
[00:00:04.326,141] <dbg> greybus_transport_tcpip.accept_loop: spawning client thread..
[00:00:04.326,324] <dbg> greybus_transport_tcpip.accept_loop: calling poll
[00:00:04.420,501] <dbg> greybus_transport_tcpip.accept_loop: poll returned 1
[00:00:04.420,501] <dbg> greybus_transport_tcpip.accept_loop: socket 2 (cport 2) has traffic
[00:00:04.420,593] <dbg> greybus_transport_tcpip.accept_loop: accepted connection from [2001:db8::2]:50178 as fd 4
[00:00:04.420,593] <dbg> greybus_transport_tcpip.accept_loop: spawning client thread..
[00:00:04.420,776] <dbg> greybus_transport_tcpip.accept_loop: calling poll
[00:00:04.498,809] <dbg> greybus_transport_tcpip.accept_loop: poll returned 1
[00:00:04.498,870] <dbg> greybus_transport_tcpip.accept_loop: socket 1 (cport 1) has traffic
[00:00:04.498,931] <dbg> greybus_transport_tcpip.accept_loop: accepted connection from [2001:db8::2]:47458 as fd 5
[00:00:04.498,962] <dbg> greybus_transport_tcpip.accept_loop: spawning client thread..
[00:00:04.499,176] <dbg> greybus_transport_tcpip.accept_loop: calling poll

Impact
For my application, this is a showstopper

Logs and console output

[00:00:00.011,352] <inf> net_config: Initializing network
[00:00:00.111,389] <inf> net_config: IPv6 address: 2001:db8::1
[00:00:00.111,480] <inf> net_config: IPv6 address: 2001:db8::1
[00:00:00.111,663] <dbg> greybus_service.greybus_service_init: Greybus initializing..
[00:00:00.111,938] <dbg> greybus_transport_tcpip.gb_transport_backend_init: Greybus TCP/IP Transport initializing..
[00:00:00.112,152] <dbg> greybus_transport_tcpip.netsetup: created server socket 0 for cport 0
[00:00:00.112,152] <dbg> greybus_transport_tcpip.netsetup: setting socket options for socket 0
[00:00:00.112,152] <dbg> greybus_transport_tcpip.netsetup: binding socket 0 (cport 0) to port 4242
[00:00:00.112,182] <dbg> greybus_transport_tcpip.netsetup: listening on socket 0 (cport 0)
[00:00:00.112,426] <dbg> greybus_transport_tcpip.netsetup: created server socket 1 for cport 1
[00:00:00.112,426] <dbg> greybus_transport_tcpip.netsetup: setting socket options for socket 1
[00:00:00.112,457] <dbg> greybus_transport_tcpip.netsetup: binding socket 1 (cport 1) to port 4243
[00:00:00.112,487] <dbg> greybus_transport_tcpip.netsetup: listening on socket 1 (cport 1)
[00:00:00.112,701] <dbg> greybus_transport_tcpip.netsetup: created server socket 2 for cport 2
[00:00:00.112,701] <dbg> greybus_transport_tcpip.netsetup: setting socket options for socket 2
[00:00:00.112,731] <dbg> greybus_transport_tcpip.netsetup: binding socket 2 (cport 2) to port 4244
[00:00:00.112,762] <dbg> greybus_transport_tcpip.netsetup: listening on socket 2 (cport 2)
[00:00:00.113,037] <inf> greybus_transport_tcpip: Greybus TCP/IP Transport initialized
[00:00:00.114,807] <inf> greybus_service: Greybus is active
[00:00:00.115,203] <dbg> greybus_transport_tcpip.accept_loop: calling poll
[00:00:05.074,157] <dbg> greybus_transport_tcpip.accept_loop: poll returned 1
[00:00:05.074,157] <dbg> greybus_transport_tcpip.accept_loop: socket 0 (cport 0) has traffic
[00:00:05.074,249] <dbg> greybus_transport_tcpip.accept_loop: accepted connection from [2001:db8::2]:42338 as fd 3
[00:00:05.074,249] <dbg> greybus_transport_tcpip.accept_loop: spawning client thread..
[00:00:05.074,493] <dbg> greybus_transport_tcpip.accept_loop: calling poll
[00:00:07.101,837] <dbg> greybus_transport_tcpip.getMessage: recv returned 0 - EOF??
[00:00:07.101,867] <dbg> greybus_transport_tcpip.thread_fun: recv from fd 3 returned EOF
[00:00:07.101,867] <dbg> greybus_transport_tcpip.thread_fun: closing fd 3

Environment (please complete the following information):

  • OS: Ubuntu Linux (Fossa)
  • Toolchain: Zephyr SDK 0.11.3
  • Commit SHA or Version used: a8ffc03

Additional context
I discovered this possible bug while testing the greybus code that I've been working with for about a year. It's fairly stable, but it is thread, socket, and POSIX intensive.

It should be possible to reproduce the issue using a carefully crafted test case though.

@cfriedt cfriedt added the bug The issue is a bug, or the PR is fixing a bug label Sep 21, 2020
@jukkar
Copy link
Member

jukkar commented Sep 21, 2020

I am not sure I understood what is the actual problem. You have the graybus log but it is not obvious what is the actual issue here, I am not familiar what the graybus is doing. Could you elaborate the issue a bit more?

Never mind, after reading the slack discussion and the description better, it is now clear what fails.

@MaureenHelm MaureenHelm added the priority: low Low impact/importance bug label Sep 22, 2020
@cfriedt
Copy link
Member Author

cfriedt commented Sep 23, 2020

I narrowed it down to something completely unrelated to TCP.

Before I rebased, socket file descriptors began at 0, which was arguably quite bad, as 0 := stdin, 1 := stdout, and 2 := stderr are reserved.

I just noticed now that after rebasing, the sockets began at fd 3. Previously, my application would open up to socket fd 5. Now it attempts to open up to socket fd 8.

I was able to fix the issue with CONFIG_POSIX_MAX_FDS=16 and it's working fine.

Chalking it up to a configuration error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Networking bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug
Projects
None yet
Development

No branches or pull requests

3 participants