Initial TLS connection failure causes TLS client handler to stop and fail endlessly w/ EBUSY #5781

Closed
mike-scott opened this issue Jan 22, 2018 · 15 comments
Labels
area: LWM2M area: Networking bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Comments

@mike-scott
Contributor

The TLS startup code looks like this:
net_app_connect() -> start_tls_client() starts the tls_client_handler thread.

I.e., the only way to start the TLS client handler thread is a call to net_app_connect().

If the initial TLS connection fails for any reason (connectivity or otherwise), the TLS client handler is stopped, and calling net_app_connect() again generates EBUSY errors because the thread isn't cleaned up correctly.
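
The failure mode described above can be sketched in plain C. This is a minimal model and not Zephyr's actual net_app code: `struct fake_tls_ctx`, `fake_handshake()` and `fake_connect()` are hypothetical names, and the assumption that a stale "handler started" flag is what produces the EBUSY is an illustration of "the thread isn't cleaned up correctly", not a confirmed root cause.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-context TLS state (not Zephyr's struct). */
struct fake_tls_ctx {
    bool handler_started; /* set when the handler thread is spawned */
    bool fix_cleanup;     /* if true, a failed handshake clears the flag */
};

/* Hypothetical handshake that always fails, e.g. no reachable server. */
static int fake_handshake(void)
{
    return -ETIMEDOUT;
}

/* Hypothetical connect call modelled on the flow described in this issue. */
int fake_connect(struct fake_tls_ctx *ctx)
{
    if (ctx->handler_started) {
        /* Stale state left behind by the earlier failed attempt. */
        return -EBUSY;
    }

    ctx->handler_started = true;

    int ret = fake_handshake();

    if (ret < 0 && ctx->fix_cleanup) {
        /* Proper cleanup: allow a later call to start the handler again. */
        ctx->handler_started = false;
    }

    return ret;
}
```

With `fix_cleanup` set to false, the first call returns -ETIMEDOUT and every later call returns -EBUSY. With it set to true, each retry gets a fresh attempt and fails with -ETIMEDOUT again.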

Further, I'm not sure if this is the API flow that we want sample apps to use for making sure TLS comes up correctly. There isn't much in the way of documentation in this area.

@mike-scott mike-scott added area: Networking bug The issue is a bug, or the PR is fixing a bug labels Jan 23, 2018
@nashif nashif added the priority: medium Medium impact/importance bug label Feb 6, 2018
@jukkar
Member

jukkar commented Feb 13, 2018

I am not seeing exactly this outcome in my test. I am using echo-client with TLS enabled, but I do not have any server set up, so the connection will fail. The TLS handshake fails and the system returns ETIMEDOUT, but the client tries to reconnect.

[net/app] [DBG] tls_client_handler: (0x20000eb4): Starting TLS client thread for 0x20000e0c
[net/app] [DBG] _net_app_tls_init: (0x20000eb4): SSL client setup done
[net/app] [DBG] _app_connected: (0x2000de28): Cannot set recv_cb (-57)
[net/app] [DBG] _app_connected: (0x2000de28): Postponing TLS connection cb for ctx 0x20000e0c
[net/app] [DBG] _net_app_ssl_mainloop: (0x20000eb4): Starting TLS handshake
[net/app] [DBG] _net_app_select_net_ctx_debug: (0x20000eb4): Selecting 0x2000da50 net_ctx (net_app_get_net_pkt():958)
[net/app] [ERR] _net_app_ssl_mainloop: Closing connection -0x6c00
[net/app] [ERR] tls_client_handler: TLS mainloop startup failed (-27648)
[net/app] [DBG] tls_client_handler: (0x20000eb4): Shutting down TLS handler
[net/app] [DBG] _net_app_tls_handler_stop: (0x20000eb4): TLS thread 0x20000eb4 stopped
[net/app] [DBG] net_app_connect: (0x20012c48): Cannot connect to peer (-60)
[net/app] [DBG] _net_app_tls_handler_stop: (0x20012c48): TLS thread 0x20000eb4 stopped
[echo-client] [ERR] connect_tcp: Cannot connect TCP (-60)
[echo-client] [ERR] start_tcp: Cannot init IPv4 TCP client (-60)

Do you remember how you got the client to return EBUSY?

@mike-scott
Contributor Author

Hi @jukkar

The easiest way to reproduce the EBUSY is to build the lwm2m_client sample configured for DTLS for qemu_x86 (on the master branch) and connect it to a Leshan demo (LwM2M) server without setting up any DTLS information:

[terminal window #1: run Leshan demo server]
wget https://hudson.eclipse.org/leshan/job/leshan/lastSuccessfulBuild/artifact/leshan-server-demo.jar
java -jar ./leshan-server-demo.jar

[terminal window #2: build / run lwm2m_client for qemu_x86 using DTLS]
cd /samples/net/lwm2m_client
mkdir build && cd build
cmake -DBOARD=qemu_x86 -DCONF_FILE=prj_dtls.conf ..
make run

You should see a series of -16 (EBUSY) errors and then -12 (ENOMEM) once the heap is used.
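
For readers matching numbers in the logs to names, here is a small lookup sketch. The table holds only the values quoted in this thread (-12, -16, and the -60 that the earlier log shows alongside ETIMEDOUT). `thread_code_name()` is a hypothetical helper, and the numbers are hardcoded because a host `errno.h` may assign different values (Linux, for example, uses 110 for ETIMEDOUT).

```c
#include <stddef.h>

/* Pairs a negative return value with the name used in this thread. */
struct code_name {
    int code;
    const char *name;
};

/* Values exactly as quoted in the thread; not taken from the host errno.h. */
static const struct code_name thread_codes[] = {
    { -12, "ENOMEM" },
    { -16, "EBUSY" },
    { -60, "ETIMEDOUT" },
};

/* Hypothetical helper: returns the name for a code, or "unknown". */
const char *thread_code_name(int code)
{
    size_t n = sizeof(thread_codes) / sizeof(thread_codes[0]);

    for (size_t i = 0; i < n; i++) {
        if (thread_codes[i].code == code) {
            return thread_codes[i].name;
        }
    }

    return "unknown";
}
```

The hexadecimal TLS code in the earlier log is the same value as the decimal one printed on the next line: 0x6c00 = 6 * 4096 + 12 * 256 = 27648, so -0x6c00 and -27648 refer to one error.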

@jukkar
Member

jukkar commented Feb 14, 2018

> You should see a series of -16 (EBUSY) errors and then -12 (ENOMEM) once the heap is used.

Ok, thanks for the steps, I will try to reproduce this.

@nashif
Member

nashif commented Feb 20, 2018

any news?

@jukkar
Member

jukkar commented May 14, 2018

@mike-scott There has been a lot of activity in the lwm2m code recently. Can you confirm whether this is still a valid issue?

@mike-scott
Contributor Author

@jukkar I don't think any of the LwM2M changes would affect the ability to tear down and rebuild the TLS connection in the net_app layer. I'll retest and post my findings.

@mike-scott
Contributor Author

@jukkar Yep, still an issue.

However, I'm reworking the LwM2M engine code flow which includes how net_app_connect() is used.
This will happen over the next few patchsets as bootstrap support is added. I think it should be possible to kill off the entire connection and reset from scratch when errors like this occur in the future. Currently, that's not how the flow of the code works so I end up staying "connected" but attempting to retry only the send portion.

TL;DR: This will be solved (I think) with future patches that enable bootstrap support.
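
The "kill off the entire connection and reset from scratch" idea can be sketched as follows. This is an illustration under stated assumptions, not the LwM2M engine's code: `struct fake_session`, `fake_session_connect()`, `fake_session_teardown()` and `connect_with_reset()` are hypothetical names, and the failure is simulated with a counter.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical session; the counters only record what the sketch did. */
struct fake_session {
    bool connected;
    int connects;     /* connection attempts so far */
    int teardowns;    /* full teardowns so far */
    int fail_first_n; /* how many initial attempts should fail */
};

/* Simulated connect: fails for the first fail_first_n attempts. */
static int fake_session_connect(struct fake_session *s)
{
    s->connects++;
    if (s->connects <= s->fail_first_n) {
        return -ETIMEDOUT;
    }
    s->connected = true;
    return 0;
}

/* Drop all connection state so the next attempt starts clean. */
static void fake_session_teardown(struct fake_session *s)
{
    s->connected = false;
    s->teardowns++;
}

/* On any failure, tear everything down and reconnect from scratch. */
int connect_with_reset(struct fake_session *s, int max_attempts)
{
    int ret = -EINVAL; /* returned when max_attempts is not positive */

    for (int i = 0; i < max_attempts; i++) {
        ret = fake_session_connect(s);
        if (ret == 0) {
            return 0;
        }
        fake_session_teardown(s);
    }

    return ret;
}
```

Compared with staying "connected" and retrying only the send, each retry here begins from a clean state, so nothing from the failed attempt can block the next one. The bounded attempt count is a design choice of the sketch; a real client would probably add a delay between attempts.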

@nashif
Member

nashif commented May 29, 2018

@mike-scott is this going to happen for 1.12?

@mike-scott
Contributor Author

@nashif Nope. I need to get 18 more patches upstream to finish bootstrap support. Let's mark this as v1.13. I'm also not sure how this will change with the upcoming rewrite for sockets.

@nashif nashif added this to the v1.13.0 milestone May 29, 2018
@nashif
Member

nashif commented Jul 24, 2018

@mike-scott any updates?

@nashif
Member

nashif commented Aug 26, 2018

@mike-scott ping

@nashif nashif modified the milestone: v1.13.0 Aug 26, 2018
@mike-scott
Contributor Author

@nashif Not sure if we want to close this. It won't be addressed using the net-app APIs. I'll deal with it in the socket re-write.

@nashif
Member

nashif commented Aug 27, 2018

so 1.14?

@mike-scott mike-scott modified the milestones: v1.13.0, v1.14.0 Aug 27, 2018
@mike-scott
Contributor Author

done

@mike-scott
Contributor Author

The move to sockets eliminates this bug.

