Skip to content

Conversation

rlubos
Copy link
Contributor

@rlubos rlubos commented Jul 9, 2018

This PR is a follow up of #7118.

It provides an alternative TLS implementation for secure sockets, implemented at socket shim layer instead of net_context layer.

Most of this code was rewritten, but I've kept in mind comments from #7118 wherever applicable. Especially, I've added mutex protection for TLS context allocation and mbedTLS calls.
To avoid long mbedTLS mutex lockups during mbedtls_ssl_read/mbedtls_ssl_write, underlying socket works in a non-blocking manner, and the actual blocking takes place outside of these calls (as suggested by @d3zd3z).

To avoid bloating of this PR, it contains only TLS communication part. It enables to create TLS socket and establish TLS connection, yet it does not provide socket options to configure the TLS session (credentials, ciphersuites etc.), therefore the communication cannot be considered secure yet.
What is not added into this PR, but will be submited in consecutive PR's in the near future:

  • Generic mbedTLS config file, to enable mbedTLS configuration from Kconfig level (as suggested by @pfalcon, I'll make it independend of secure sockets),
  • Subsystem for credentials management,
  • Socket options for TLS socket configuration (credentials selection, ciphersuite selection, hostname configuration).

As a proof of concept, I've included two initially ported samples (http_get and big_http_download). They do use TLS for communication, but do not verify certs.

@codecov-io
Copy link

codecov-io commented Jul 9, 2018

Codecov Report

Merging #8814 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #8814   +/-   ##
=======================================
  Coverage   52.28%   52.28%           
=======================================
  Files         196      196           
  Lines       24768    24768           
  Branches     5151     5151           
=======================================
  Hits        12951    12951           
  Misses       9738     9738           
  Partials     2079     2079
Impacted Files Coverage Δ
include/net/socket.h 100% <ø> (ø) ⬆️
include/net/net_ip.h 70.39% <ø> (ø) ⬆️
include/net/net_context.h 93.61% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f0de6e0...5e169ff. Read the comment docs.

Copy link
Member

@jukkar jukkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, minor style issues

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove empty line, we normally do not have empty line between variable setting and if() statement that checks it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove empty line

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove empty line

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra space char in the middle of the string

@rlubos rlubos force-pushed the tls-socket-communication branch from 3faa7e2 to 3a16a9a Compare July 10, 2018 11:35
@rlubos
Copy link
Contributor Author

rlubos commented Jul 10, 2018

@jukkar Thanks, done.

Copy link
Member

@jukkar jukkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed one compile issue in earlier review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will give error if MBEDTLS_X509_CRT_PARSE_C is not defined, the function would not return anything in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I've removed the ifdef and call to mbedtls_ssl_conf_cert_profile, as we do not verify certs at this point yet, there is no use to set the profile anyway.
Just FYI, this entire function is going to be rewritten, once the credential support is added.

@rlubos rlubos force-pushed the tls-socket-communication branch from 3a16a9a to fbceddd Compare July 10, 2018 12:15
@jukkar jukkar requested a review from d3zd3z July 10, 2018 13:43
Copy link
Member

@jukkar jukkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@pfalcon @d3zd3z Do you have time to check if this looks better now?

@pfalcon
Copy link
Contributor

pfalcon commented Jul 10, 2018

I'm on vacation until Thurs.

Copy link

@GAnthony GAnthony left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR!
One general comment...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that tls has been moved up to the socket layer, and since offloading is still being done at the net_context layer, we may need to do a little extra to support wifi drivers which completely offload the TLS support to the coprocessor.
In a secure socket offloading case, do we need to allocate a TLS context? Maybe (context->tls == NULL) can be the switch to fall back to the non-TLS zsock_ APIs.
Also, there need not be any calls to mbedtls APIs in that case.

There is a CONFIG_NET_OFFLOAD macro which, if not defined, could compile out these TLS calls from sockets.
For example, see CONFIG_NET_OFFLOAD code sections in https://github.com/zephyrproject-rtos/zephyr/blob/master/subsys/net/ip/net_context.c#L261

However, some wifi drivers may not actually handle secure socket offload, so we may need to introduce a new definition (eg: CONFIG_SECURE_SOCKET_OFFLOAD) to compile out the TLS and mbedtls calls from the socket layer.

Then, I imagine secure socket offload would work by:

  1. setsockopt() calls setting up the socket for TLS, which calls get passed down to net_context_set_opt() - TBD, which offloads to the driver.
  2. all the rest of the standard zsock_ socket calls proceed down to net_context_* operations as usual, which offloads to the wifi driver (when CONFIG_NET_OFFLOAD is set).

Is this how we see secure socket offload working?

Otherwise, PR looks good to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for looking into this. Let me address the comments inline.

In a secure socket offloading case, do we need to allocate a TLS context? Maybe (context->tls == NULL) can be the switch to fall back to the non-TLS zsock_ APIs.
Also, there need not be any calls to mbedtls APIs in that case.

With TLS offloading TLS context does not need to be allocated, as the context contains mostly mbedTLS data. As for the fallback, that's already implemented. We only allocate TLS context when secure socket is created (i. e. proto type is TLS, for instance IPPROTO_TLS_1_2). When TLS context is not allocated, we do fallback to regular sockets.

There is a CONFIG_NET_OFFLOAD macro which, if not defined, could compile out these TLS calls from sockets.
However, some wifi drivers may not actually handle secure socket offload, so we may need to introduce a new definition (eg: CONFIG_SECURE_SOCKET_OFFLOAD) to compile out the TLS and mbedtls calls from the socket layer.

Look's like we'd rather need another switch, CONFIG_SECURE_SOCKET_OFFLOAD, just as you mention. At socket shim layer, it would be enough to block TLS context allocation in case we do support TLS offloading.
net_context would need to accept new protocol types though, so that they can be passed to the offloaded driver. In this solution, we use for instance IPPROTO_TLS_1_2 proto type to create a secure socket. Currently, this information is kept at the socket shim layer though and net_context is not aware of it.

setsockopt() calls setting up the socket for TLS, which calls get passed down to net_context_set_opt() - TBD, which offloads to the driver.

This will be needed as well, for socket configuration. We need a way to set certs/keps/ciphers, and that'd be a job for socket options.

Is this how we see secure socket offload working?

Well, I guess that we have consensus here, that it is doable with CONFIG_NET_OFFLOAD.

But from Nordic's point of view though, a better solution would be something like proposal from #4821. I saw that PR got closed due to lack of consensus and socket support in application protocols. But as Zephyr's networking subsystem is undergoing transition to socket API being default, and there are more parites interested in the solution (Nordic), perhaps it would be a good idea to bring back the topic? Forcing the transistion between sockets<->net_context<->sockets sounds like a unnecessary overcomplexity to me.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the replies...

net_context would need to accept new protocol types though, so that they can be passed to the offloaded driver. In this solution, we use for instance IPPROTO_TLS_1_2 proto type to create a secure socket.

Yes, either net_context_get() could allow a new protocol number (non-IANA standard) in enum net_ip_protocol, or we could pass another option via setsockopt() to promote an open non-secure socket to a secure socket. (The SimpleLink stack can handle either). It may be nice to support both, eventually (if there's a use case).

But from Nordic's point of view though, a better solution would be something like proposal from #4821.

As I understand it, the main blockers for that PR were:

  1. It bypasses the net_context APIs, so existing protocols built on net_context do not benefit from the socket offload;
  • This seems to be getting resolved via the plan to move protocols to sockets, and perhaps by the work to split up net_app into libraries.
  1. This socket offload method bypasses the Zephyr IP routing tables, so only one network interface is ever bound to the sockets.
  • So, for example, packets cannot be routed between network interfaces. Eg: no multi-bearer support (https://en.wikipedia.org/wiki/Multi-bearer_network).
  • Any other potential uses of the routing table would also be bypassed. Eg: to bind a socket to a particular network interface.
  • This actually is still OK for our customer's typical use case: an IoT edge device connected via a single network interface (WiFi device). But, it's a constraint that would need to be known/accepted building for a pure "socket offload" device.

perhaps it would be a good idea to bring back the topic?

Given there is interest from at least two important vendors of IoT devices (TI, Nordic), and one of the major issues is getting resolved (protocols migrating to sockets), it seems it may be time to do so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit is nice, clean and easy to review, I can only +1 it. But I think that commit message should emphasize that the purpose of this commit is to establish "switching infrastructure", but the actual implementation in this commit is null, just redirects to normal socket calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is not part of this commit, and following commit removes this #include (apparently, the decision was made to make "sockets_tls.h" contents inline). So, I guess it makes sense to give another thought whether it should be inline or standalone, and if the former, then remove this line right from this commit (i.e. make each commit buildable across e.g. bisecting).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EBADF is the error for such a case, see e.g. http://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EBADF

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this should be atomic, right? So, mutex locking shouldn't be required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, given EBADF comments above, the way to handle errors might be more involved. It should finalize both TLS context and socket, but if any returned error, it should return it too (with errno set appropriately). If both error out, it's up to you to choose which prevails ;-).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth adding TODO that this won't work for non-blocking sockets.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, in a sense, K_FOREVER contradicts here a case where is_block == false. Overall, all this stuff adds a lot of complication for unknown benefit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "poll-like" instead of the proper poll? Overall, all this stuff adds a lot of complication for unknown benefit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally I've implemented it with poll, which indeed was simpler. It didn't work-out though as poll isn't reentrant (uses static event array), so running two separate sockets on two separate threads simply wasn't possible.

@pfalcon
Copy link
Contributor

pfalcon commented Jul 12, 2018

I should start with saying that I very much appreciate this refactor - it's more than I could expect. I made some comments, which I think either should be addressed (like make each commit buildable) or responded too.

But here's one thing I find an overcomplication and problematic:

To avoid long mbedTLS mutex lockups during mbedtls_ssl_read/mbedtls_ssl_write, underlying socket works in a non-blocking manner, and the actual blocking takes place outside of these calls (as suggested by @d3zd3z).

I didn't understand what's the problem in the discussion in #7118, and I don't understand it now. I guess, to approach in general manner, we should start with conservative assumption:

  • A socket, being an I/O resource, requires explicit application-level synchronization if shared among multiple threads.

Then, we should challenge that by trying to find explicit specification in POSIX where it mandates doing it on socket (i.e. OS) level instead. This specification would include:

  1. Exact requirement what the behavior of 2 concurrent read()/recv() operations (issued by different threads) should be.
  2. Exact requirement what the behavior of 2 concurrent write()/send() operations (issued by different threads) should be.
  3. Exact requirement what the behavior of concurrent write() vs read() operations (issued by different threads) should be.

Let's consider case 1. Among the domain of possibilities there would be: a) "first" call finishes and gets as much data as possible, only then "second" call proceeds; b) they interleave data received in particular pattern; c) they interleave data in arbitrary pattern. Well, we know that POSIX already allows to return less bytes bytes from read()/recv() call than requested. That's logically entails choice c), and there's no need to specify that explicitly (indeed, I bet that won't find any spec like above in POSIX). But then, calling recv() concurrently without additional application-aware synchronization is absolutely useless for the apps.

So, suggestion: from "net: tls: Handle TLS socket send and recv", a) remove "struct k_mutex mbedtls_lock;" b) remove implement blocking-ops-using-non-blocking ops trickery. Each of these changes should be a subject of a separate PR (in that order), with specific arguments given for why it is needed and what drawbacks with that. For example, even if we added "struct k_mutex mbedtls_lock", any well-behaving application (if you call an app which shares socket among threads "well-behaving", 95% of apps will never do that in the first place) would need to use its own mutex in addition to that, so we'll have double-locking which obviously wastes resources. Is there any benefit can be found in having "struct k_mutex mbedtls_lock" then? Yeah, with some vigour it can be found, but then it should be written down in the commit message and discussed from all sides. (OTOH, I can assure that there's no, and can't be benefit in emulating blocking calls using non-blocking - unless the exact no-nonsense code showing otherwise is provided.)

Copy link
Contributor

@pfalcon pfalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure that points in #8814 (comment) are properly considered, let me give this a temporary -1 to avoid premature merge. Thanks.

@rlubos
Copy link
Contributor Author

rlubos commented Jul 12, 2018

@pfalcon Thank you for the feedback.

Regarding mutex protection, for me it's fine to protect the socket (and therefore mbedTLS) access at application level. But as this protection was added in reply to explicit concern from @d3zd3z, I'd like to hear from him that's he's fine with it as well. Notheless, I'll split the mutex-related stuff to a separate commit(s), so they could be easily removed in case we agree on that.

So @d3zd3z I'd like to hear your opinion here.

As for the other stuff, I'll just fix it and update PR soon, nothing to disagree there.

@pfalcon
Copy link
Contributor

pfalcon commented Jul 12, 2018

But as this protection was added in reply to explicit concern from @d3zd3z, I'd like to hear from him that's he's fine with it as well. Notheless, I'll split the mutex-related stuff to a separate commit(s), so they could be easily removed in case we agree on that.

That's why I suggest to split off mutex introduction, etc. to separate focused PR(s), while in the meantime we can merge no-concurrent-socket-access case with this PR ASAP.

rlubos added 3 commits July 12, 2018 15:58
Add switch to a socket layer that will enable switching socket API to
TLS secure sockets. At this point there is no secure sockets
implementation, so secure socket calls redirect to regular socket calls.

Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Add tls_context structure that stored data required by TLS socket
implementation. This structure is allocated from global pool during
socket creation and freed during socket closure.

Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Add entropy source for mbedTLS. If no entropy driver is available, use
non-secure, software entropy source.

Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
rlubos added 2 commits July 13, 2018 10:49
Add mbedTLS logging function to enable logs from mbedTLS.

Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Implement TLS handshake handling in socket connect/accept functions.

Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
@rlubos rlubos force-pushed the tls-socket-communication branch from fbceddd to a47c871 Compare July 13, 2018 10:32
rlubos added 3 commits July 13, 2018 12:38
Implement socket recv/recvfrom and send/sendto functions.

Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Implement socket poll function for TLS socket. In addition to regular
poll checks, we have to check if there is some decrypted data pending on
mbedTLS.

Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Add config file that enables to run http_get and big_http_download
samples with TLS enabled and receive the data through HTTPS.

Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
@rlubos rlubos force-pushed the tls-socket-communication branch from a47c871 to 5e169ff Compare July 13, 2018 10:38
@rlubos
Copy link
Contributor Author

rlubos commented Jul 13, 2018

I've applied fixes and extracted mbedTLS mutex implementation to a separate branch. If there are voices that we need mutex at socket level, I can submit another PR.

Copy link
Contributor

@pfalcon pfalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think this is the good starting point to be merged, and other things can implemented/addressed in follow-up PRs.

One question I'd still have is about poll() implementation, but I don't want to complicate matters by requesting splitting it out to another PR, let's go with it as it is for now.

@nashif nashif merged commit f25baeb into zephyrproject-rtos:master Jul 13, 2018
@rlubos rlubos deleted the tls-socket-communication branch August 3, 2018 07:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants