TLS communication for secure sockets #8814

rlubos · 2018-07-09T14:54:25Z

This PR is a follow up of #7118.

It provides an alternative TLS implementation for secure sockets, implemented at socket shim layer instead of net_context layer.

Most of this code was rewritten, but I've kept in mind comments from #7118 wherever applicable. Especially, I've added mutex protection for TLS context allocation and mbedTLS calls.
To avoid long mbedTLS mutex lockups during mbedtls_ssl_read/mbedtls_ssl_write, underlying socket works in a non-blocking manner, and the actual blocking takes place outside of these calls (as suggested by @d3zd3z).

To avoid bloating of this PR, it contains only TLS communication part. It enables to create TLS socket and establish TLS connection, yet it does not provide socket options to configure the TLS session (credentials, ciphersuites etc.), therefore the communication cannot be considered secure yet.
What is not added into this PR, but will be submited in consecutive PR's in the near future:

Generic mbedTLS config file, to enable mbedTLS configuration from Kconfig level (as suggested by @pfalcon, I'll make it independend of secure sockets),
Subsystem for credentials management,
Socket options for TLS socket configuration (credentials selection, ciphersuite selection, hostname configuration).

As a proof of concept, I've included two initially ported samples (http_get and big_http_download). They do use TLS for communication, but do not verify certs.

codecov-io · 2018-07-09T15:21:18Z

Codecov Report

Merging #8814 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #8814   +/-   ##
=======================================
  Coverage   52.28%   52.28%           
=======================================
  Files         196      196           
  Lines       24768    24768           
  Branches     5151     5151           
=======================================
  Hits        12951    12951           
  Misses       9738     9738           
  Partials     2079     2079

Impacted Files	Coverage Δ
include/net/socket.h	`100% <ø> (ø)`	⬆️
include/net/net_ip.h	`70.39% <ø> (ø)`	⬆️
include/net/net_context.h	`93.61% <ø> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f0de6e0...5e169ff. Read the comment docs.

jukkar

Looks good, minor style issues

jukkar · 2018-07-10T09:56:13Z

subsys/net/lib/sockets/sockets_tls.c

Remove empty line, we normally do not have empty line between variable setting and if() statement that checks it.

jukkar · 2018-07-10T09:57:08Z

subsys/net/lib/sockets/sockets_tls.c

Remove empty line

jukkar · 2018-07-10T09:58:56Z

subsys/net/lib/sockets/sockets_tls.c

remove empty line

jukkar · 2018-07-10T09:59:07Z

subsys/net/lib/sockets/sockets_tls.c

jukkar · 2018-07-10T10:00:37Z

subsys/net/lib/sockets/sockets_tls.c

extra space char in the middle of the string

rlubos · 2018-07-10T11:36:08Z

@jukkar Thanks, done.

jukkar

I missed one compile issue in earlier review.

jukkar · 2018-07-10T11:59:19Z

subsys/net/lib/sockets/sockets_tls.c

This will give error if MBEDTLS_X509_CRT_PARSE_C is not defined, the function would not return anything in that case.

Right, I've removed the ifdef and call to mbedtls_ssl_conf_cert_profile, as we do not verify certs at this point yet, there is no use to set the profile anyway.
Just FYI, this entire function is going to be rewritten, once the credential support is added.

jukkar

LGTM.

@pfalcon @d3zd3z Do you have time to check if this looks better now?

pfalcon · 2018-07-10T14:49:01Z

I'm on vacation until Thurs.

GAnthony

Thanks for this PR!
One general comment...

GAnthony · 2018-07-10T23:50:10Z

subsys/net/lib/sockets/sockets_tls.c

Now that tls has been moved up to the socket layer, and since offloading is still being done at the net_context layer, we may need to do a little extra to support wifi drivers which completely offload the TLS support to the coprocessor.
In a secure socket offloading case, do we need to allocate a TLS context? Maybe (context->tls == NULL) can be the switch to fall back to the non-TLS zsock_ APIs.
Also, there need not be any calls to mbedtls APIs in that case.

There is a CONFIG_NET_OFFLOAD macro which, if not defined, could compile out these TLS calls from sockets.
For example, see CONFIG_NET_OFFLOAD code sections in https://github.com/zephyrproject-rtos/zephyr/blob/master/subsys/net/ip/net_context.c#L261

However, some wifi drivers may not actually handle secure socket offload, so we may need to introduce a new definition (eg: CONFIG_SECURE_SOCKET_OFFLOAD) to compile out the TLS and mbedtls calls from the socket layer.

Then, I imagine secure socket offload would work by:

setsockopt() calls setting up the socket for TLS, which calls get passed down to net_context_set_opt() - TBD, which offloads to the driver.

all the rest of the standard zsock_ socket calls proceed down to net_context_* operations as usual, which offloads to the wifi driver (when CONFIG_NET_OFFLOAD is set).

Is this how we see secure socket offload working?

Otherwise, PR looks good to me.

Thank you for looking into this. Let me address the comments inline.

In a secure socket offloading case, do we need to allocate a TLS context? Maybe (context->tls == NULL) can be the switch to fall back to the non-TLS zsock_ APIs.
Also, there need not be any calls to mbedtls APIs in that case.

With TLS offloading TLS context does not need to be allocated, as the context contains mostly mbedTLS data. As for the fallback, that's already implemented. We only allocate TLS context when secure socket is created (i. e. proto type is TLS, for instance IPPROTO_TLS_1_2). When TLS context is not allocated, we do fallback to regular sockets.

There is a CONFIG_NET_OFFLOAD macro which, if not defined, could compile out these TLS calls from sockets.
However, some wifi drivers may not actually handle secure socket offload, so we may need to introduce a new definition (eg: CONFIG_SECURE_SOCKET_OFFLOAD) to compile out the TLS and mbedtls calls from the socket layer.

Look's like we'd rather need another switch, CONFIG_SECURE_SOCKET_OFFLOAD, just as you mention. At socket shim layer, it would be enough to block TLS context allocation in case we do support TLS offloading.
net_context would need to accept new protocol types though, so that they can be passed to the offloaded driver. In this solution, we use for instance IPPROTO_TLS_1_2 proto type to create a secure socket. Currently, this information is kept at the socket shim layer though and net_context is not aware of it.

setsockopt() calls setting up the socket for TLS, which calls get passed down to net_context_set_opt() - TBD, which offloads to the driver.

This will be needed as well, for socket configuration. We need a way to set certs/keps/ciphers, and that'd be a job for socket options.

Is this how we see secure socket offload working?

Well, I guess that we have consensus here, that it is doable with CONFIG_NET_OFFLOAD.

But from Nordic's point of view though, a better solution would be something like proposal from #4821. I saw that PR got closed due to lack of consensus and socket support in application protocols. But as Zephyr's networking subsystem is undergoing transition to socket API being default, and there are more parites interested in the solution (Nordic), perhaps it would be a good idea to bring back the topic? Forcing the transistion between sockets<->net_context<->sockets sounds like a unnecessary overcomplexity to me.

Thanks for the replies...

net_context would need to accept new protocol types though, so that they can be passed to the offloaded driver. In this solution, we use for instance IPPROTO_TLS_1_2 proto type to create a secure socket.

Yes, either net_context_get() could allow a new protocol number (non-IANA standard) in enum net_ip_protocol, or we could pass another option via setsockopt() to promote an open non-secure socket to a secure socket. (The SimpleLink stack can handle either). It may be nice to support both, eventually (if there's a use case).

But from Nordic's point of view though, a better solution would be something like proposal from #4821.

As I understand it, the main blockers for that PR were:

It bypasses the net_context APIs, so existing protocols built on net_context do not benefit from the socket offload;

This seems to be getting resolved via the plan to move protocols to sockets, and perhaps by the work to split up net_app into libraries.

This socket offload method bypasses the Zephyr IP routing tables, so only one network interface is ever bound to the sockets.

So, for example, packets cannot be routed between network interfaces. Eg: no multi-bearer support (https://en.wikipedia.org/wiki/Multi-bearer_network).

Any other potential uses of the routing table would also be bypassed. Eg: to bind a socket to a particular network interface.

This actually is still OK for our customer's typical use case: an IoT edge device connected via a single network interface (WiFi device). But, it's a constraint that would need to be known/accepted building for a pure "socket offload" device.

perhaps it would be a good idea to bring back the topic?

Given there is interest from at least two important vendors of IoT devices (TI, Nordic), and one of the major issues is getting resolved (protocols migrating to sockets), it seems it may be time to do so.

pfalcon · 2018-07-12T10:57:25Z

include/net/socket.h

This commit is nice, clean and easy to review, I can only +1 it. But I think that commit message should emphasize that the purpose of this commit is to establish "switching infrastructure", but the actual implementation in this commit is null, just redirects to normal socket calls.

pfalcon · 2018-07-12T11:02:51Z

subsys/net/lib/sockets/sockets_tls.c

This file is not part of this commit, and following commit removes this #include (apparently, the decision was made to make "sockets_tls.h" contents inline). So, I guess it makes sense to give another thought whether it should be inline or standalone, and if the former, then remove this line right from this commit (i.e. make each commit buildable across e.g. bisecting).

pfalcon · 2018-07-12T11:09:08Z

subsys/net/lib/sockets/sockets_tls.c

EBADF is the error for such a case, see e.g. http://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html

pfalcon · 2018-07-12T11:09:17Z

subsys/net/lib/sockets/sockets_tls.c

pfalcon · 2018-07-12T11:10:16Z

subsys/net/lib/sockets/sockets_tls.c

Well, this should be atomic, right? So, mutex locking shouldn't be required.

pfalcon · 2018-07-12T11:18:00Z

subsys/net/lib/sockets/sockets_tls.c

So, given EBADF comments above, the way to handle errors might be more involved. It should finalize both TLS context and socket, but if any returned error, it should return it too (with errno set appropriately). If both error out, it's up to you to choose which prevails ;-).

pfalcon · 2018-07-12T12:20:58Z

subsys/net/lib/sockets/sockets_tls.c

Probably worth adding TODO that this won't work for non-blocking sockets.

pfalcon · 2018-07-12T12:31:53Z

subsys/net/lib/sockets/sockets_tls.c

So, in a sense, K_FOREVER contradicts here a case where is_block == false. Overall, all this stuff adds a lot of complication for unknown benefit.

pfalcon · 2018-07-12T12:34:23Z

subsys/net/lib/sockets/sockets_tls.c

Why "poll-like" instead of the proper poll? Overall, all this stuff adds a lot of complication for unknown benefit.

Originally I've implemented it with poll, which indeed was simpler. It didn't work-out though as poll isn't reentrant (uses static event array), so running two separate sockets on two separate threads simply wasn't possible.

pfalcon · 2018-07-12T13:06:18Z

I should start with saying that I very much appreciate this refactor - it's more than I could expect. I made some comments, which I think either should be addressed (like make each commit buildable) or responded too.

But here's one thing I find an overcomplication and problematic:

To avoid long mbedTLS mutex lockups during mbedtls_ssl_read/mbedtls_ssl_write, underlying socket works in a non-blocking manner, and the actual blocking takes place outside of these calls (as suggested by @d3zd3z).

I didn't understand what's the problem in the discussion in #7118, and I don't understand it now. I guess, to approach in general manner, we should start with conservative assumption:

A socket, being an I/O resource, requires explicit application-level synchronization if shared among multiple threads.

Then, we should challenge that by trying to find explicit specification in POSIX where it mandates doing it on socket (i.e. OS) level instead. This specification would include:

Exact requirement what the behavior of 2 concurrent read()/recv() operations (issued by different threads) should be.
Exact requirement what the behavior of 2 concurrent write()/send() operations (issued by different threads) should be.
Exact requirement what the behavior of concurrent write() vs read() operations (issued by different threads) should be.

Let's consider case 1. Among the domain of possibilities there would be: a) "first" call finishes and gets as much data as possible, only then "second" call proceeds; b) they interleave data received in particular pattern; c) they interleave data in arbitrary pattern. Well, we know that POSIX already allows to return less bytes bytes from read()/recv() call than requested. That's logically entails choice c), and there's no need to specify that explicitly (indeed, I bet that won't find any spec like above in POSIX). But then, calling recv() concurrently without additional application-aware synchronization is absolutely useless for the apps.

So, suggestion: from "net: tls: Handle TLS socket send and recv", a) remove "struct k_mutex mbedtls_lock;" b) remove implement blocking-ops-using-non-blocking ops trickery. Each of these changes should be a subject of a separate PR (in that order), with specific arguments given for why it is needed and what drawbacks with that. For example, even if we added "struct k_mutex mbedtls_lock", any well-behaving application (if you call an app which shares socket among threads "well-behaving", 95% of apps will never do that in the first place) would need to use its own mutex in addition to that, so we'll have double-locking which obviously wastes resources. Is there any benefit can be found in having "struct k_mutex mbedtls_lock" then? Yeah, with some vigour it can be found, but then it should be written down in the commit message and discussed from all sides. (OTOH, I can assure that there's no, and can't be benefit in emulating blocking calls using non-blocking - unless the exact no-nonsense code showing otherwise is provided.)

pfalcon

To make sure that points in #8814 (comment) are properly considered, let me give this a temporary -1 to avoid premature merge. Thanks.

rlubos · 2018-07-12T13:36:39Z

@pfalcon Thank you for the feedback.

Regarding mutex protection, for me it's fine to protect the socket (and therefore mbedTLS) access at application level. But as this protection was added in reply to explicit concern from @d3zd3z, I'd like to hear from him that's he's fine with it as well. Notheless, I'll split the mutex-related stuff to a separate commit(s), so they could be easily removed in case we agree on that.

So @d3zd3z I'd like to hear your opinion here.

As for the other stuff, I'll just fix it and update PR soon, nothing to disagree there.

pfalcon · 2018-07-12T13:57:58Z

But as this protection was added in reply to explicit concern from @d3zd3z, I'd like to hear from him that's he's fine with it as well. Notheless, I'll split the mutex-related stuff to a separate commit(s), so they could be easily removed in case we agree on that.

That's why I suggest to split off mutex introduction, etc. to separate focused PR(s), while in the meantime we can merge no-concurrent-socket-access case with this PR ASAP.

Add switch to a socket layer that will enable switching socket API to TLS secure sockets. At this point there is no secure sockets implementation, so secure socket calls redirect to regular socket calls. Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>

Add tls_context structure that stored data required by TLS socket implementation. This structure is allocated from global pool during socket creation and freed during socket closure. Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>

Add entropy source for mbedTLS. If no entropy driver is available, use non-secure, software entropy source. Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>

Add mbedTLS logging function to enable logs from mbedTLS. Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>

Implement TLS handshake handling in socket connect/accept functions. Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>

Implement socket recv/recvfrom and send/sendto functions. Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>

Implement socket poll function for TLS socket. In addition to regular poll checks, we have to check if there is some decrypted data pending on mbedTLS. Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>

Add config file that enables to run http_get and big_http_download samples with TLS enabled and receive the data through HTTPS. Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>

rlubos · 2018-07-13T10:42:12Z

I've applied fixes and extracted mbedTLS mutex implementation to a separate branch. If there are voices that we need mutex at socket level, I can submit another PR.

pfalcon

Thanks! I think this is the good starting point to be merged, and other things can implemented/addressed in follow-up PRs.

One question I'd still have is about poll() implementation, but I don't want to complicate matters by requesting splitting it out to another PR, let's go with it as it is for now.

rlubos requested review from jukkar, pfalcon and tbursztyka as code owners July 9, 2018 14:54

carlescufi requested review from pfl, mike-scott and nashif July 9, 2018 14:55

jukkar added the area: Networking label Jul 9, 2018

jukkar requested changes Jul 10, 2018

View reviewed changes

rlubos force-pushed the tls-socket-communication branch from 3faa7e2 to 3a16a9a Compare July 10, 2018 11:35

jukkar reviewed Jul 10, 2018

View reviewed changes

rlubos force-pushed the tls-socket-communication branch from 3a16a9a to fbceddd Compare July 10, 2018 12:15

jukkar requested a review from d3zd3z July 10, 2018 13:43

jukkar approved these changes Jul 10, 2018

View reviewed changes

GAnthony reviewed Jul 11, 2018

View reviewed changes

pfalcon reviewed Jul 12, 2018

View reviewed changes

subsys/net/lib/sockets/sockets_tls.c Outdated

Copy link

Contributor

pfalcon Jul 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EBADF

pfalcon reviewed Jul 12, 2018

View reviewed changes

pfalcon suggested changes Jul 12, 2018

View reviewed changes

rlubos added 3 commits July 12, 2018 15:58

net: tls: Add mbedTLS entropy source

f3807f2

Add entropy source for mbedTLS. If no entropy driver is available, use non-secure, software entropy source. Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>

rlubos added 2 commits July 13, 2018 10:49

net: tls: Add mbedTLS logging

7a55838

Add mbedTLS logging function to enable logs from mbedTLS. Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>

net: tls: Handle TLS handshake

55d8782

Implement TLS handshake handling in socket connect/accept functions. Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>

rlubos force-pushed the tls-socket-communication branch from fbceddd to a47c871 Compare July 13, 2018 10:32

rlubos added 3 commits July 13, 2018 12:38

net: tls: Handle TLS socket send and recv

4a704bd

Implement socket recv/recvfrom and send/sendto functions. Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>

net: tls: Implement poll with support for mbedTLS pending data

1e04bd0

Implement socket poll function for TLS socket. In addition to regular poll checks, we have to check if there is some decrypted data pending on mbedTLS. Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>

net: samples: Add TLS support to http_get and big_http_download samples

5e169ff

Add config file that enables to run http_get and big_http_download samples with TLS enabled and receive the data through HTTPS. Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>

rlubos force-pushed the tls-socket-communication branch from a47c871 to 5e169ff Compare July 13, 2018 10:38

pfalcon approved these changes Jul 13, 2018

View reviewed changes

nashif merged commit f25baeb into zephyrproject-rtos:master Jul 13, 2018

pfalcon mentioned this pull request Jul 19, 2018

net: tls: Implement TLS socket options #9007

Merged

GAnthony mentioned this pull request Jul 19, 2018

BSD Sockets API: Offloading Support #4821

Merged

This was referenced Jul 27, 2018

Verify TLS server side operation #8026

Closed

Move TLS connection data out from net_context #7371

Closed

rlubos deleted the tls-socket-communication branch August 3, 2018 07:56

TLS communication for secure sockets #8814

TLS communication for secure sockets #8814

Uh oh!

Conversation

rlubos commented Jul 9, 2018

Uh oh!

codecov-io commented Jul 9, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

jukkar left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rlubos commented Jul 10, 2018

Uh oh!

jukkar left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jukkar left a comment

Choose a reason for hiding this comment

Uh oh!

pfalcon commented Jul 10, 2018

Uh oh!

GAnthony left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pfalcon commented Jul 12, 2018

Uh oh!

pfalcon left a comment

Choose a reason for hiding this comment

Uh oh!

rlubos commented Jul 12, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pfalcon commented Jul 12, 2018

Uh oh!

rlubos commented Jul 13, 2018

Uh oh!

pfalcon left a comment

Choose a reason for hiding this comment

codecov-io commented Jul 9, 2018 •

edited

Loading

rlubos commented Jul 12, 2018 •

edited

Loading