Skip to content

Conversation

pfl
Copy link
Contributor

@pfl pfl commented Apr 19, 2018

Initial idea of an TLS setsockopt for Zephyr sockets. See #5985.

This pull request is not to be taken literally as mbedTLS is not connected to the implementation yet, but the test case (function call that succeeds) is what I had in mind initially.

@pfl pfl added In progress For PRs: is work in progress and should not be merged yet. For issues: Is being worked on DNM This PR should not be merged (Do Not Merge) labels Apr 19, 2018
@codecov-io
Copy link

codecov-io commented Apr 19, 2018

Codecov Report

Merging #7118 into master will decrease coverage by 0.34%.
The diff coverage is 40%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7118      +/-   ##
==========================================
- Coverage   64.61%   64.26%   -0.35%     
==========================================
  Files         421      426       +5     
  Lines       40296    40838     +542     
  Branches     6803     6914     +111     
==========================================
+ Hits        26037    26245     +208     
- Misses      11126    11430     +304     
- Partials     3133     3163      +30
Impacted Files Coverage Δ
include/net/net_pkt.h 84.61% <ø> (ø) ⬆️
include/net/net_ip.h 70.54% <ø> (ø) ⬆️
subsys/net/ip/net_private.h 40% <ø> (ø) ⬆️
include/net/net_context.h 93.61% <ø> (ø) ⬆️
subsys/net/ip/tcp.c 58.81% <ø> (ø) ⬆️
subsys/net/lib/sockets/sockets_tls.h 0% <0%> (ø)
subsys/net/ip/net_tls_internal.h 100% <100%> (ø)
subsys/net/ip/ipv6.c 46.3% <100%> (ø) ⬆️
include/net/socket.h 100% <100%> (ø) ⬆️
drivers/net/loopback.c 72% <100%> (ø) ⬆️
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6f61f09...18e80da. Read the comment docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "ernno"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Fixed. Once I get the code around mbedTLS somehow on track, I'll also revisit configuration options in these four.

@pfalcon
Copy link
Contributor

pfalcon commented Apr 20, 2018

Thanks @pfl for posting this, I (hopefully not just I!) appreciate such an approach, which allows to understand better the general idea, and if needed, to react to it early.

But this code doesn't bring any surprises - it's the option no.1 from my comment #5985 (comment) . Quoting here:

If it's:

int sock = socket(AF_INET, SOCK_STREAM, PROTO_TLS);
setsockopt(sock, SOL_TLS, SOCKOPT_TLS_SET_CERT, cert_bytes, sizeof(cert_bytes));

-- then this is plain and blatant violation of POSIX. The core of the POSIX is portability. The code like above is not portable to any other OS besides Zephyr, in other words you design a proprietary API, and worse, disguise it as POSIX BSD Sockets API, which can lead only to fragmentation, confusion, and pain.

I can elaborate on this point of view with more details and case studies if needed.

@pfalcon
Copy link
Contributor

pfalcon commented Apr 20, 2018

Otherwise, you posted quite a good comment at #5985 (comment) , which pinpoints culprits and differences in our approaches, and hopefully will allow us to find some common ground to assess them, and see where the differences are crucial (and why each of us sticks to them), and where we have almost the same stuff, modulo details of functions signatures and naming (which are hopefully less important than functionality, thought still should be paid some due diligence).

So, please let me answer that comment (not sure I'll do that today, but will try before next Monday). Thanks.

@pfl
Copy link
Contributor Author

pfl commented Apr 23, 2018

Here is an updated version that starts to gather mbedTLS integration. mbedTLS handling is for sure not what it ought to be, but the idea hopefully makes itself clearer now.

mbedTLS needs to be connected with a source of randomness, that is thought of being done somewhere at stack initialization time. Also the contents of read and write are absent and the logic of closing, as is the sane handling of mbedTLS in general. But at least it compiles.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds dependency of the application-level protocol(s), TLS/DTLS, to a networking context, and that doesn't make sense technically - That's layering violation. What next we will make the networking context depend on? HTTP? MQTT? LWM2M?

(This goes beyond discussing paradigmatic/API design issues in #5985 (comment) to discussing technical issues with such approach. Which, as was mentioned previously, are many too).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes any and every network context waste space on mbedTLS structures, regardless of whether TLS is actually used with it or not. And mbedTLS structures are quite big in general.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice this needs to be a pointer that is "allocated" to this context only when needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This patch just intends to show(el) TLS support placed somewhere, it's not the last word on the issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a separate memory area for TLS data and access them using net_context as key. But as said, this is so far just a prototype. Next step is to have TLS set up properly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice ...

In another practice, nothing of that is needed. Because TLS is layered on top of UDP/TCP context, so TLS needs to point to such a context, not the other way around.

But memory management of TLS-related structures is indeed one of the hardest technical problems of this "setockopt, push-under-the-socket-layer" approach. Zstream deals with it in a very smart way - in no way at all. It never allocates such structures, but instead always receives them via dependency injection pattern. Allocation is left to users of API. Yes, that means they have to do it. But that also means they have ultimate flexibility with it. For example, for clients, which require just one TLS structure, it can be allocated statitically, with zero additional management overhead. For servers, it's more complicated, but still offers ultimate flexibility. For example, allocation can use kernel k_mem_slab for a fixed number of connections for guaranteed service, and if that overflows, switch to allocation from heap (for best effort service of larger number of conns). All that without adding any overhead to Zephyr kernel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memory needed for TLS is separated from net_context in patch 'net: tls: Separate TLS context from net_context'.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subsys/net/ip/net_tls.h

Even the name of this file looks weird. There's no TLS in IP. Layering violation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire patch series is a layering violation. But, TLS itself is pretty much designed as a layering violation. The best place to describe TLS is as a presentation layer, which sits entirely above the transport layer, and below the application layer.

The only argument I can see for not creating a TLS layer that just calls the socket code is because of the desire to offload work. But, I would argue that we should be offloading at the presentation layer, and not the transport layer. That would mean our API should be there, and, well, something like what net_app already does.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to other kconfig files, there is one extra space indendation to help texts (so should start from l letter in previous help line).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, shippable will scream at me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space between )? missing

@pfl
Copy link
Contributor Author

pfl commented Apr 25, 2018

With the above two patches there now is an idea of how to handle sending of data and wrapping it in TLS.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There needs to be initial value for ret here, perhaps -EPROTONOSUPPORT so that we are not trying to return uninitialized value if UDP or TCP is not configured but proto is set to be UDP or TCP. In this case also the default branch could be removed if the ret is init in the beginning of the func.

Copy link
Contributor Author

@pfl pfl Apr 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, #if defined(CONFIG_NET_UDP) needs to include all of the case statement, otherwise a request for UDP while UDP is not configured will fall through the cracks. net_tcp_queue_data() is compiled as return -EPROTONOSUPPORT if TCP is not configured, which causes an unconditional goto err; (and the compiler can combine this part with the default: case).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be very careful with infinite timeout as that can lead to deadlocks. This needs to have a timeout, and we need to bail out properly in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, as this API is advertised to work on "net_context" level, it would need to honor corresponding parameters of its API calls: c2c4247#r185754772

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@rlubos
Copy link
Contributor

rlubos commented Apr 27, 2018

Looking at the different options that have been exposed both in this PR and also in the zstream one (#5985) I tend to align more towards the option proposed in this PR.

To build on top of this PR and add a posibility to generically configure credentials for underlying TLS/DTLS implementations I would like to propose some extensions/changes to this PR.

The switch to set the socket to use TLS/DTLS could be done during socket creation to avoid repurposing the socket after, just like a socket created with AF_INET, cannot switch to being AF_INET6 after creation.

In this proposal, the socket is created already with the knowledge that it is going to connect to a TLS endpoint, using a new protocol value, SPROTO_TLS1v2, for example. This would immediately imply that the socket is to be used ofr a TLS v1.2 TCP connection, making the additional, TLS-specific, socket options available to this socket level (SOL_SECURE):

int sock_fd = socket(AF_INET, SOCK_STREAM, SPROTO_TLS1v2); 

In order to provide security credentials to the underlying TLS/DTLS implementation, I also propose a mechanism based on out-of-band tags.
In this example, sec_tag is a reference to the credential which is provissioned out-of-band. A sec_tag or a list of sec_tag would be meaningful for the underlying implementation of TLS/DTLS and might be implementation-specific if the IP/TLS is offloaded.

int sec_tag_ca       = 1; /* ca-chain */
int sec_tag_priv_pub = 2; /* private/public cert */

int sec_tags[] = { sec_tag_ca, sec_tag_priv_pub };

/* Set up sec_tags to use for this socket */ 
struct sec_tag_list tag_list = {
    .sec_tag_count = 2,
    .sec_tag_list  = sec_tags
}; 

The call to setsockopt could then provide the list of tags to be used (i.e. the credentials for the hand-shake) to the underlying TLS/DTLS implementation:

setsockopt(sock_fd,
           SOL_SECURE,
           SO_SEC_TAG_LIST,
           &tag_list,
           sizeof(sec_tag_list));

connect(sock_fd, (const struct sockaddr *)&dest_addr, sizeof(dest_addr)); 

The call to connect/accept might now of course fail with (D)TLS errors, so we would need to augment the scope of the possible values of errno to cover this, for example:

EKEYREJECTED
EKEYEXPIRED
EKEYREVOKED
ENOKEY

@pfalcon
Copy link
Contributor

pfalcon commented Apr 27, 2018

Looking at the different options that have been exposed both in this PR and also in the zstream one (#5985) I tend to align more towards the option proposed in this PR.

Yes, but why? Especially given the concerns expressed about API design of this PR, discussion however happens in 5985: #5985 (comment)

@rlubos
Copy link
Contributor

rlubos commented Apr 30, 2018

@pfalcon Many arguments have already been raised, so I think there is no need to repeat ourselves. From my point of view, doing TLS below socket API is a neat approach in case you need TLS offloading. Doing it this way would enable seamless integration of our other project with Zephyr in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @pfl,

I've been trying to come up to speed on the two approaches since they came up at the TSC last week. I've read as much as I could, but would appreciate some clarifications.

In response to @pfalcon's original zstream PR, @tbursztyka said:

The problem is that it's taking the problem by the easiest part: read, write, flush and close.
But TLS will needs way more than that, and dedicated stuff such as configure and handshake for instance, are annoying parts. How is this approach going to address such requirements?

I am not really clear on how these concerns are addressed by the setsockopt() approach, either.

Here the default SSL configuration is chosen, and the handshake below is performed in the same routine. I'm not sure how an application programmer would set up root certificates etc. for authenticating the host, or set up their own certificates, or configure cipher suites, or manage protocol extensions like SNI, within this API.

In particular, is there a design written down anywhere describing the setsockopt() plan, similarly to issues #6086, #5900 for zstream?

So far I have read a lot of praise for the TI SimpleLink API approach. I haven't read that API, so perhaps the answer to all of these questions is "we'll do it like SimpleLink does".

But I'm assuming given all your work to integrate with Zephyr's existing sockets that your approach is substantially more complex in the details than just adopting those APIs wholesale.

Any clarification appreciated!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the default SSL configuration is chosen, and the handshake below is performed in the same routine. I'm not sure how an application programmer would set up root certificates etc. for authenticating the host, or set up their own certificates, or configure cipher suites, or manage protocol extensions like SNI, within this API.

In this example (compile time) defaults are used for everything. rlubos above showed more options above in line with what a non-trivial application might want to do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this example (compile time) defaults are used for everything. rlubos above showed more options above in line with what a non-trivial application might want to do.

Sure, and I saw that.

Are you saying you are going to adopt his proposal as-is?

I'm still not understanding what the design will be which can address the concerns originally outlined by @tbursztyka, and which seem reasonable to me.

From the public code on GitHub and posts to the mailing lists, what is visible so far looks like iterative design by development. My question is if you do have a design document somewhere, or if this is in fact what is going on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can talk with tbursztyka as I don't have his comments in front of me, nor can I remember what they were about. This PR is WIP and iterative until I have figured out what is needed. rlubos comments will be wrapped into a patch once we're that far. But first I'd like to iterate over what the most default case should be doing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that for "TLS sockets", you're going to randomly ignore "cb" and "user_data" parameters?

(Layering violation. There's no TLS on "net_context" level. And can be only by making bunch of slips like above, compromising the original net_context API contract (which is not ideal in the first place, but at least "stable")).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, "timeout" is being ignored too, re: f8f1f33#r184029293

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sending user input data over TLS fragment by fragment is not ideal (adds overhead, limits throughput). Zstream API is free from such a stipulation (it actually has API-level support for arbitrary buffering, if useful to the underlying implementation).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to figure out how to use mbedtls, I was hoping it were good enough in grouping fragmented writes. If not, I need to look at Zstream code, which I was thinking of using provided that I first understand what mbedtls is in general doing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping it were good enough in grouping fragmented writes.

It does not. I was surprised (pretty much) about that too, but such approach makes sense too - if user called mbedtls_ssl_write() and it returned success, the user knows that the data was flushed to the underlying network stack.

Zstream API includes an explicit zstream_flush() call which user is supposed to call when he wants to be sure that data is flushed down the network stack (so he can expect a reply from peer for example). Based on the above, for current stream implementation (plain socket and mbedTLS) such a call is actually null, but it's still mandatory and important, and allows e.g. to implement "userspace" buffering, which is especially easy/useful due to full composability of streams (so, buffered stream can wrap TLS stream which wraps socket stream).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I had to demonstrate that no bytes get left behind with suboptimal mbedtls usage, or make everybody notice that not all fragments were accounted for. mbedtls API is not that nice, nor does it understand Zephyr buffers and packets. But that is business as usual.

@pfl
Copy link
Contributor Author

pfl commented May 3, 2018

In this proposal, the socket is created already with the knowledge that it is going to connect to a TLS endpoint, using a new protocol value, SPROTO_TLS1v2, for example. This would immediately imply that the socket is to be used ofr a TLS v1.2 TCP connection, making the additional, TLS-specific, socket options available to this socket level (SOL_SECURE):

int sock_fd = socket(AF_INET, SOCK_STREAM, SPROTO_TLS1v2);

Yes, this is definitely to be added.

In order to provide security credentials to the underlying TLS/DTLS implementation, I also propose a mechanism based on out-of-band tags.
In this example, sec_tag is a reference to the credential which is provissioned out-of-band. A sec_tag or a list of sec_tag would be meaningful for the underlying implementation of TLS/DTLS and might be implementation-specific if the IP/TLS is offloaded.

int sec_tag_ca = 1; /* ca-chain /
int sec_tag_priv_pub = 2; /
private/public cert */

int sec_tags[] = { sec_tag_ca, sec_tag_priv_pub };

/* Set up sec_tags to use for this socket */
struct sec_tag_list tag_list = {
.sec_tag_count = 2,
.sec_tag_list = sec_tags
};

The call to setsockopt could then provide the list of tags to be used (i.e. the credentials for the hand-shake) to the underlying TLS/DTLS implementation:

setsockopt(sock_fd,
SOL_SECURE,
SO_SEC_TAG_LIST,
&tag_list,
sizeof(sec_tag_list));

connect(sock_fd, (const struct sockaddr *)&dest_addr, sizeof(dest_addr));

Yes, this is something that is to be added for those cases where Zephyr isn't a one-trick compile time pony, but where it has something sensible to add run time or being configured after flashing with some storage space to spare.

The call to connect/accept might now of course fail with (D)TLS errors, so we would need to augment the scope of the possible values of errno to cover this, for example:

EKEYREJECTED
EKEYEXPIRED
EKEYREVOKED
ENOKEY

Yes indeed.

@pfalcon
Copy link
Contributor

pfalcon commented May 8, 2018

So, I would like to ack that at the last (May 3) Zephyr Networking Forum meeting, the approach of this PR was supported by the majority of participants (literally, by everyone except me).

I however would like to request that:

  1. All the code added in this PR could be disabled based on the configuration setting. This is due to the fact that it adds non-standard extension to BSD Sockets API, and there're many usecases when it won't be needed/used.
  2. As much as possible, to avoid modifying the existing code. Instead, just try to add new code - which, per p.1, would be wrapped in #ifdef. This is due to the fact that the code of the IP stack is already complex, we have some hard to reproduce issues already, and regularly have cases when some "fixes" break/regress something else, and that even goes undetected for a while (i.e. we don't have adequate testing for the IP stack). Mixing TLS into this would increase complexity significantly, and will make debugging/testing even more complicated.

The current state of this PR kinda goes along these requirements, but not very consistently. E.g., there seems to be defined 3 config options: CONFIG_NET_SOCKETS_SOCKOPT_TLS, CONFIG_NET_TLS, CONFIG_NET_DTLS. First of them isn't even defined in Kconfig. I'd suggest to define one "master switch" options, then sub-options for specific features, e.g. TLS vs DTLS.

Regarding 2nd suggestion, I appreciate that it may be not always possible to add cleanly new TLS-related branches to the code. And the existing code already goes for splitting an existing function. Still, I think it's worth to try explicitly to avoid that whenever possible to keep changes manageable. I'm writing this on the fresh cases of finding regressions in socket handling from the previous changes made: #7365, #7377.

Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mbedtls_ssl_session_reset clears in/out buffers allocated in mbedtls_ssl_setup, therefore calling it from here will result in a fault. Actually, mbedtls_ssl_setup can also return an error without allocating the buffers, so error management should be refined a little bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't net_tls_connect be called after net_tcp_connect, as it is supposed to do the TLS handshake?

/** TLS packet fifo. */
struct k_fifo rx_fifo;

/** Receive callback for TLS */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really "Receive callback for TLS"? Maybe it's "Original user recv callback for the context"?

mbedtls_x509_crt_init(&tls_context[i].ca_chain);
#endif
#if defined(MBEDTLS_DEBUG_C) && defined(CONFIG_NET_TLS_DEBUG)
mbedtls_ssl_conf_dbg(&tls_context[i].config, my_debug, NULL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad indentation.


if (ret != MBEDTLS_ERR_SSL_WANT_WRITE &&
ret != MBEDTLS_ERR_SSL_WANT_READ) {
return -EBADF;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad error code.

@rlubos
Copy link
Contributor

rlubos commented Jun 28, 2018

@pfalcon

Does this mean, that with the idea of doing all the TLS via setsockopt, you guys ended up with an adhoc, non-setsockopt API calls to configure parts of it still? What's the logic behind this?

Yes, that's part of the idea that you can share credentials across multiple sockets and have an option to pre-provisison during compilation or after compilation. For example this would support storing credentials in a co-processor, pre-provisioned or provisioned out-of-band.

@carlescufi carlescufi requested a review from MaureenHelm June 28, 2018 12:45
@carlescufi
Copy link
Member

As per the TSC call yesterday @MaureenHelm @galak @nashif can you weigh in on the appropriateness of getting this PR merged once @pfalcon's comments are addressed by @rlubos and @pfl ?

Copy link
Member

@MaureenHelm MaureenHelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the TSC call yesterday @MaureenHelm @galak @nashif can you weigh in on the appropriateness of getting this PR merged once @pfalcon's comments are addressed by @rlubos and @pfl ?

I agree with @pfalcon that this should be split up into multiple PRs. It is a huge number of changes with CI checking only the very last one.

@nashif
Copy link
Member

nashif commented Jun 29, 2018

As per the TSC call yesterday @MaureenHelm @galak @nashif can you weigh in on the appropriateness of getting this PR merged once @pfalcon's comments are addressed by @rlubos and @pfl ?

IMO we should mandate to have multiple PRs just based on the number of commits in a PR. Splitting PRs makes only sense if you have multiple changes to multiple areas or subsystems that are unrelated. Look at the PR as a patch series that when split makes development of the feature very difficult and we will end up having to pull changes from PR X to PR Y to satisfy dependencies.

So my vote is to get this merged after real feedback and concerns have been raised.

@nashif
Copy link
Member

nashif commented Jun 29, 2018

I agree with @pfalcon that this should be split up into multiple PRs. It is a huge number of changes with CI checking only the very last one.

Splitting this into multiple PR will not help with this, you will have multiple PRs that will also run CI on the latest commit of each PR, so this will be useless. If we are trying to validate every commit, this can also be done by the developers or offline and just run on a subset of tests that are relevant. CI right now is not setup for this. (Running CI on every commit means we have to wait ages for CI to complete, at least based on the way we are building right now).

@pfalcon
Copy link
Contributor

pfalcon commented Jun 29, 2018

@nashif

IMO we should mandate to have multiple PRs just based on the number of commits in a PR.

Good idea. Like, normal PR should have 5 or less commits. There still can be requests to split out unrelated parts, but it's up to submitter to argue that an exception can be made as it still may be easier to review. PRs of 5-10 commits should have argumentation why they're so big upfront, and should split it up if concerns are raised. PRs of 10 or more commits are blocked by default, and require extensive argumentation why they're better that way and TSC approval for merging (i.e. someone who submits too long PRs should know beforehand that they will require more loops to get merged, and the easiest way to deal with that is splitting up into manageable parts). We can start with that right now ;-).

Splitting PRs makes only sense if you have multiple changes to multiple areas

This is exactly the case here. This is a TLS PR, and there's a request to split any changes not directly related to TLS as separate PRs (multiple PR, per each independent change, not just 2nd PR).

Look at the PR as a patch series that when split makes development of the feature very difficult and we will end up having to pull changes from PR X to PR Y to satisfy dependencies.

That's not the case. Again, the talk is about splitting out "pave the road" changes not directly related to TLS, as separate PRs, review them thoroughly, merge, rebase main TLS PR on top of them, and then proceed with thorough review of the TLS PR.

It is a huge number of changes with CI checking only the very last one.

While @MaureenHelm's concern is with automated CI, my primary concern is human reviewability. It's very hard and non-rewarding to review huge patchsets consisting of multiple "loosely related" (aka not directly related) sets of changes. For example, I did first pass of review and concluded that this PR should be split, and some high-level comments. After the last TSC meeting, I put everything aside, and spent whole day to review deep into it and see that there're bunch of real technical issues to address, before various shady areas (ideally should be addressed too, if not, distinguishable statements should be made when/how they will be addressed). By "distinguishable" I mean "clearly standing out" not "drowning in a hundred of other comments".

Overall, I don't see a reason to submit overlarge PRs besides desire to complicate review and sneak in under-reviewed and questionable changes. The rule of thumb should be simple: everything which can be split out should be split out. If the authors of the PR don't see a way to split changes, reviewers are here to help.

Finally, my comments on the "need to split up" are generic and not tied to this PR, I sounded them previously in response to PRs from @jhedberg, @jukkar, etc. It's just in this case, it's not "please split up, so I had a chance to understand the changes", it's "there're multiple issues, and identifying/addressing/confirming fixes requires splitting up this big PR". So, I'd prefer to stay with -1 on this one. Otherwise, I'm sure that everyone should do their job - reviewers try to get to the bottom of the code proposed, submitters act on or ignore the feedback, and TSC make exceptions and merge things regardless of the concerns raised. Thanks.

@nashif
Copy link
Member

nashif commented Jun 29, 2018

@nashif

IMO we should mandate to have multiple PRs just based on the number of commits in a PR.

Good idea.

Well, I forgot to add a NOT here, so this should read:

IMO we should NOT mandate to have multiple PRs

@d3zd3z
Copy link
Contributor

d3zd3z commented Jun 29, 2018

My main blocking concern is that we need to address how to handle mbed TLS not being reentrant. Most persistent protocols are going to have a reader waiting while writes are happened, and this will have to be handled at this layer.

@pfalcon
Copy link
Contributor

pfalcon commented Jun 29, 2018

My main blocking concern is that we need to address how to handle mbed TLS not being reentrant. Most persistent protocols are going to have a reader waiting while writes are happened, and this will have to be handled at this layer.

@d3zd3z, perhaps, it would be helpful to describe the exact call/workflow scenario where you expect the problems to happen. I for one can't make the exact opinion based on the information provided above. But I can give following notes:

  1. Situation with full-duplex protocol support with mbedTLS is better than some other small TLS implementations. For example, my favorite implementation, axTLS (it's favorite because it's the smallest liberally licensed impl, comparing too it, mbedTLS is truly a bloat) has single TLS record buffer. You may understand what that means - it can work only in half-duplex manner, attempting to write when not entire buffer is read will lead to corruption. Which makes axTLS suitable for classy protocols like HTTP/1.0, but not for e.g. MQTT.
  2. Socket interface should provide decoupling between TLS transport and application - if it's properly implemented on TLS path. But due to mixed-up nature of the patch, I actually can't say if I saw it done "right", even after thorough attempt to follow its garden-paths. One thing I'd expect that code to actually use sockets underneath. Instead, I see they duplicated decoupling-via-fifo code. One reason for that may be that they also try to support legacy callback-style API. And that API is both poor match for mbedTLS API (which is rather socket-like) and its non-reentrancy you mention.

Btw, it just occurred to me that one of the hot topic of the discussion with @rlubos - why there random figure of 160 rx buffers - happens because they actually try to slurp entire decoded TLS record at once into net_buf's on Zephyr side (and potentially address some non-reentrancy issues in such way?). But I didn't see anything like that in the code when I looked, and that's not how mbedTLS works in the first place (it doesn't work in terms of TLS records, but in terms of POSIXy read and write operations).

@d3zd3z
Copy link
Contributor

d3zd3z commented Jun 29, 2018

@pfalcon The scenario I'm thinking of is where the application has two threads. One thread, the reader, will spend most of its time in a recv call. Then, in another thread (or more perhaps), the application will do sends to push data to the other side.

mbed TLS wants this to be used in a non-blocking manner. The reader will call mbedtls_ssl_read, which will return either MBEDTLS_ERR_SSL_WANT_READ or MBEDTLS_ERR_SSL_WANT_WRITE indicating that the underlying transport needs either more data read, or data able to be written. The code calling it will then need to wait for new data to come in, and then retry the call to mbedtls_ssl_read to see if it has enough to read. mbedtls_ssl_write behaves in the same manner. Neither of these is reentrant within the same context, so this also needs to be locked. It is important to note that either read or write can block waiting for underlying read or write, and that there isn't a direct correlation between tls read and when the underlying network sends and receives packets.

To summarize, I expect situations where there are multiple threads, with one thread waiting to receive data, and another thread wanting to send data. This is at least how POSIX code could be written that doesn't use TLS. Having TLS under the socket interface is venturing into new ground, but I think it should probably work the same way.

@pfalcon
Copy link
Contributor

pfalcon commented Jun 30, 2018

@d3zd3z: Thanks for explaining this usecase. My IMHO on this is following:

the application has two threads. One thread, the reader, will spend most of its time in a recv call. Then, in another thread (or more perhaps), the application will do sends to push data to the other side.

I can't claim to be a well-rounded networking/multithreading expert, but the rule of thumb I know is: sharing the same I/O resource (be it a socket, file, hardware block like UART, SPI, crypto accel or whatever) between more than 1 thread is: a) in 90% is plain mistake; b) otherwise, a very peculiar design choice. I am not aware of any design pattern (in the https://en.wikipedia.org/wiki/Design_Patterns sense) which includes sharing the same I/O resource between multiple threads. There usually should be an alternative - and better - way to design the algorithm. But when it happens, I'm absolutely sure that it requires application-level synchronization of the access to the common I/O resource.

As implication of that, it would be following situation:

  1. Access code to I/O resource may be not reenterable, so concurrent access can lead to corrupted state and "crashes".
  2. However, even if it's reenterable, sharing I/O resource between 2 threads won't work correctly without explicit synchronization on the level of these 2 threads. But with such explicit synchronization, reenterability problem would be also "automagically" solved.

This all somehow touches the IRC conversation we had with @andrewboie:

<apboie> pfalcon: i'm working on socket system calls and I have some concerns about concurrent usage of the APIs. I think I can solve this by adding a semaphore to the net_context struct in the case of CONFIG_NET_SOCKETS and take/give them in each of the socket APIs that use a net_context object, what do you think?
<pfalcon> apboie: what the concerns are?
<apboie> concurrent socket API calls. particularly if someone calls close() while another socket API is in progress.
<apboie> not sure how safe it is for multiple threads to be doing API calls on the same net_context object for other APIs, don't know the code that well
<apboie> for example what happens if we are halfway through a bind() call (or something) and another thread calls recvfrom(), stuff like that
<pfalcon> apboie: Ah, calling close() while something else is happening. Nobody should be doing that ;-). I guess that's way too fringe case to care about. (I.e. nothing good can happen anyway)
<pfalcon> apboie: "halfway through a bind() call (or something) and another thread calls recvfrom()" - that's more specific, but do you see any problem in the *socket* shim, or do you mean general problems on net_context level?
<apboie> pfalcon: unfortunately for user mode, "nobody should be doing that" won't work, we have to protect against attacks which try to intentionally corript the kernel
<andyross> FWIW, in linux the bulk of the locking is done at the lower level, not in the socket API layer.   Fancier layers can do things better than serializing all the calls.
<apboie> pfalcon: socket API calls should be sufficient since only those have user mode access
<apboie> if people try to do bad/insane things from supervisor mode we generally don't defend against it
<andyross> Concurrency bugs are bugs regardless.  They may not be "security" bugs, but if we can't handle closing a socket from a different thread without application-managed synchronization, we should fix that.
<apboie> andyross: the synchronization wouldn't be on the application side
<apboie> i'm just trying to prevent bad usage of these system calls from crashing the kernel.
<pfalcon> apboie: so, we do have general concurrency issues in the IP stack, and non-orthodox way to hmm, resolve them: https://github.com/zephyrproject-rtos/zephyr/pull/8386 . My ideal response would be: "Can we have specific concerns ticketed?" But I understand that you're not up to that now, and using global sema is still better than disable preemption at all. So, feel free to go for it, we can discuss specific patch later

Summing up, your concern, @d3zd3z, regarding reenterability of TLS impl in this PR doesn't take or give something - the existing IP stack is already not reenterable. It's actually worse that that - the IP stack is itself multithreaded, and these threads are not properly synchronized, there're pieces of code (and data structures) which are not reenterable, and different threads of the IP stack can enter that code at the same time, causing havoc.

So, all in all, while I agree that concern you raise has its place, I don't "include" it my -1 mark on this PR (we can't really make mbedTLS usage reenterable if the entire stack is not reenterable).

But I raised another, more local reenterability concerns in my review comments, e.g, the fact that "credentials" management is not reenterable, so e.g. trying to create two independent TLS sockets in different threads has a chance to obscurely fail/cause havoc. With that, I'm firmly -1 (because existing stack cares to protect e.g. global net_context array; thus, new code added to the stack should not deteriorate already attained code quality patterns).

@rlubos
Copy link
Contributor

rlubos commented Jul 2, 2018

the fact that "credentials" management is not reenterable

Well, I didn't have an image of registering/removing credentials from different contexts. My idea was to do it at the start of the application, at least in the initial version. But if it's such a concern, I'll add mutex there.

so e.g. trying to create two independent TLS sockets in different threads has a chance to obscurely fail/cause havoc

TLS socket creation / connection does not modify credential array in any way, so why should it fail?

@pfalcon
Copy link
Contributor

pfalcon commented Jul 2, 2018

Well, I didn't have an image of registering/removing credentials from different contexts. My idea was to do it at the start of the application

TLS socket creation / connection does not modify credential array in any way, so why should it fail?

Ack, makes sense. Apparently, I pictured a situation when a credential is added, while another tries to look one up.

But if it's such a concern, I'll add mutex there.

Well, I'd say we either should document that particular calls aren't reenterable by design, or make them such. (I pay particular attention to this matter due to issues like #729 and #8386.)

@rlubos
Copy link
Contributor

rlubos commented Jul 2, 2018

Thank you all for reviewing this PR.

Taking all the feedback into account we've decided to rework the solution. As having support for TLS with net_context's callback-style API is not essential here, and it bring's extra complexity (e. g. duplicated fifo's for packet processing) I've decided to move the TLS handling to the socket shim layer. Overall API will remain as proposed here.

I will submit changes in smaller PRs, so they're easier to review. Hopefully I'll be able to propose something in a few days.

@pfalcon
Copy link
Contributor

pfalcon commented Jul 2, 2018

@rlubos: Thanks, that's a good compromise to alleviate some layering issues, and should make code clearer/more maintainable too.

@jukkar
Copy link
Member

jukkar commented Jul 30, 2018

The TLS code got merged already by other PRs so closing this one.

@jukkar jukkar closed this Jul 30, 2018
@ghost ghost removed the In progress For PRs: is work in progress and should not be merged yet. For issues: Is being worked on label Jul 30, 2018
@pfl
Copy link
Contributor Author

pfl commented Jul 30, 2018

The TLS code got merged already by other PRs so closing this one.

Yes indeed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.