net: Sockets (and likely net_contexts) should not be closed behind application's back #8008

Closed
pfalcon opened this issue May 29, 2018 · 8 comments
Labels: area: Networking, bug, priority: low, Stale

pfalcon commented May 29, 2018

This ticket is based on the discussion in #7865 (comment)

Currently, it's possible for a net_context/socket (they're represented by the same underlying structure, net_context) to be closed by the IP stack behind the application's back. This behavior is definitely not POSIX-compliant with regard to sockets: a socket opened by an application may change its state, but it can be closed/deallocated only on an explicit application request. Thus, for sockets this will definitely need to be fixed.
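
To recap the POSIX contract being violated here (a minimal sketch using plain POSIX calls, nothing Zephyr-specific): a peer's FIN is observed as recv() returning 0, but the descriptor stays alive until the application itself closes it.

#include <sys/socket.h>
#include <unistd.h>

/* POSIX lifetime rule: a peer's FIN is a state change, not a free.
 * recv() returning 0 signals EOF, but the fd remains valid. */
static void drain_and_close(int fd)
{
    char buf[64];

    while (recv(fd, buf, sizeof(buf), 0) > 0) {
        /* consume any data still queued after the peer's FIN */
    }

    /* Only this explicit call releases the descriptor. */
    close(fd);
}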

The question, as usual, is whether this should instead be fixed at the level of the native API. Note that deallocating a net_context opened by an application can hardly be safe: if the application opened it, it likely keeps a reference to it. If it keeps a reference, then it may (and will) access it after it's freed by the IP stack. This requires patches like #7865, but that's not reliable at all: a freed net_context can be allocated again by some other code, and then the application will refer to a completely different net_context and will try to perform operations on it!
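
To make the danger concrete, here is a self-contained toy model of the slot-reuse scenario (illustrative only; nothing here is Zephyr API, and the two-slot pool merely stands in for the real net_context pool):

#include <stdio.h>
#include <stdbool.h>

/* Toy stand-in for the net_context pool: fixed slots, no refcount. */
struct toy_ctx {
    bool in_use;
    int conn_id;
};

static struct toy_ctx pool[2];

static struct toy_ctx *toy_get(int conn_id)
{
    for (int i = 0; i < 2; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = true;
            pool[i].conn_id = conn_id;
            return &pool[i];
        }
    }
    return NULL;
}

static void toy_put(struct toy_ctx *ctx)
{
    ctx->in_use = false;
}

int main(void)
{
    struct toy_ctx *app = toy_get(1);   /* app opens a connection     */
    toy_put(app);                       /* "stack" frees it on FIN    */
    struct toy_ctx *other = toy_get(2); /* same slot handed out again */

    /* The app still holds its stale pointer and now observes somebody
     * else's connection: this prints conn_id=2, not 1. */
    printf("conn_id=%d (expected 1)\n", app->conn_id);
    return other == app ? 0 : 1;
}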

pfalcon commented May 29, 2018

@rlubos: FYI

pfalcon commented May 29, 2018

Note that the net_context API has the groundwork to deal with this issue: the structure is reference-counted. The problem can then be rephrased as: "a net_context created by an application in the normal way can be closed behind the app's back". In other words, reference counting is there, but it's not really used.

Again, the question is whether such behavior of the native API is a "feature" (it provides the building block, but leaves its usage to higher levels: to the apps themselves, or to extra layers like the sockets API), or whether it's an oversight, and reference counting should work "out of the box" even for native apps.
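
If the "out of the box" route were chosen, the intended pattern would look roughly like this (a sketch with made-up helper names, not the actual net_context API of the time):

#include <stdbool.h>

/* Sketch: the app and the stack each hold a counted reference, so
 * neither side can free the structure behind the other's back.
 * All names here are illustrative. */
struct ref_ctx {
    int refcount;
    bool in_use;
};

static void ctx_free_slot(struct ref_ctx *ctx)
{
    ctx->in_use = false;    /* slot returns to the pool */
}

static void ctx_ref(struct ref_ctx *ctx)
{
    ctx->refcount++;
}

static void ctx_unref(struct ref_ctx *ctx)
{
    if (--ctx->refcount == 0) {
        ctx_free_slot(ctx); /* freed only when BOTH sides are done */
    }
}

/* Lifecycle:
 *   open:      allocation gives refcount = 1; app calls ctx_ref() -> 2
 *   peer FIN:  stack calls ctx_unref() -> 1, memory stays valid
 *   app close: app calls ctx_unref()   -> 0, slot actually freed
 */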

@nashif added the bug label May 31, 2018
@MaureenHelm added the priority: low and area: Networking labels Jun 1, 2018
pfalcon commented Jun 6, 2018

Before this issue arose in Zephyr mainline, I faced it in the MicroPython Zephyr port. Here's the code I have in my git stash:

--- a/zephyr/modusocket.c
+++ b/zephyr/modusocket.c
@@ -303,8 +303,9 @@ STATIC mp_obj_t socket_accept(mp_obj_t self_in) {
     socket_check_closed(socket);
 
     struct net_context *ctx = k_fifo_get(&socket->accept_q, K_FOREVER);
-    // Was overwritten by fifo
-    ctx->refcount = 1;
+    // Was overwritten by fifo, init at 2, because stack may unref it
+    // on FIN reception, etc.
+    ctx->refcount = 2;
 
     socket_obj_t *socket2 = socket_new();
     socket2->ctx = ctx;
@@ -484,6 +485,7 @@ STATIC MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(socket_makefile_obj, 1, 3, socket_mak
 STATIC mp_obj_t socket_close(mp_obj_t self_in) {
     socket_obj_t *socket = self_in;
     if (socket->ctx != NULL) {
+        socket->ctx->refcount = 1;
         RAISE_ERRNO(net_context_put(socket->ctx));
         socket->ctx = NULL;
     }

Unless @jukkar expresses the opinion that it should be fixed at the native API level, that's how I'd fix it on just the socket shim layer (i.e. init the refcount to 2 on socket creation, reset it to 1 on closure). Well, the fact that the code above is from a git stash rather than a commit may hint that there's more to it (though likely it just wasn't tested enough).
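
For clarity, here is the same idiom stripped of the MicroPython specifics. Only net_context_put() and the refcount assignments come from the actual patch above; the shim_on_* hook names are made up for illustration.

#include <net/net_context.h>

/* On socket creation: hold one reference for the app on top of the
 * one the stack effectively holds, so a FIN-triggered unref cannot
 * free the context behind us. */
static void shim_on_open(struct net_context *ctx)
{
    ctx->refcount = 2;
}

/* On close: drop back to a single reference so that the final
 * net_context_put() actually releases the context. */
static void shim_on_close(struct net_context *ctx)
{
    ctx->refcount = 1;
    net_context_put(ctx);
}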

laperie commented Mar 14, 2019

Pushing this past 1.14, to when we will have more time to address it.

pfalcon commented Mar 14, 2019

This is a high-priority issue which absolutely must be addressed for 1.14, especially given that the patch is submitted and just needs to be merged: #13667

pfalcon commented Mar 18, 2019

> This is a high-priority issue which absolutely must be addressed for 1.14, especially given that the patch is submitted and just needs to be merged: #13667

Ok, #13667 was merged, so the situation is "good enough" for 1.14. #13667, however, addresses the issue at the socket level, while @rlubos suggested that it might be better to address it at the layers below: net_context, or even specifically at the TCP connection object level. So I'm not closing this, in the hope that we will look into that later.

carlescufi commented

@rlubos @jukkar is this still an issue?

github-actions commented

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed, otherwise this issue will automatically be closed in 14 days. Note that you can always re-open a closed issue at any time.
