IP Stack: Enabling MMU for qemu_x86 broke active connect support #752
Comments
Btw, this is the 3rd regression in a row from recent commits, found in the last 3 days. Which leads to thinking: From the commit message of the above patch:
Indeed, it helped ;-). But who's going to resolve them? How about reverting that patch, fixing the issues it uncovered as resources permit, and only then enabling it by default?
Let's not revert the MMU patch, but instead fix the errors in echo-client.
Ah, I could replicate this after all. Investigating the issue...
The problem seems to be a stray pointer that only occasionally goes to NULL; I have a hard time triggering this with gdb attached.
When the ARP message is received when the device is starting up, the network interface might not yet have IPv4 address setup correctly. In this case, the IP address pointer could be NULL and we must not use it for anything. Fixes zephyrproject-rtos#752 Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Found an issue in ARP handling; it depends a lot on timing, so it is a bit hard to replicate. This is only seen when the device is starting up.
@pfalcon we are not reverting the MMU patch.
@andrewboie : I don't see how submitting more user stories helps to resolve issues. I submitted GH-3559 "Fault injection framework for Zephyr" some time ago and set its Fix Version to 1.9.0; it was reset. Instead, it would need the involvement of folks experienced with fault injection, e.g. in Linux, to design something for Zephyr. And FI is required to even start talking about network code coverage. And as I mentioned, I found 3 issues in a row in 3 days, all using the MicroPython application, i.e. they aren't easy to find using Zephyr's own tree. (And I'm trying to make progress with the MicroPython port to move along with more testing, and instead hit these recent regressions.)
@pfalcon what exactly are you asking for?
When the ARP message is received when the device is starting up, the network interface might not yet have IPv4 address setup correctly. In this case, the IP address pointer could be NULL and we must not use it for anything. Fixes #752 Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
I can look into it, but I would need information on how to test MicroPython. Are there instructions somewhere describing how to do this?
This is currently the case: we are not doing proper end-to-end testing for the networking stack. We have a lot of unit tests, and some of them simulate networking, but that is not really the same thing as end-to-end testing of the stack. This needs to be improved, of course.
Unfortunately, it's not easy to test networking code with sanitycheck as it is now, and many tests aren't even possible without additional infrastructure (like the fault injection mentioned above).
Well, I just appreciate discussion of the issues we have (particularly because I imagine you're a member of the TSC and can affect prioritization of test-infra-related tasks). One question I was asking, though, and would appreciate an answer to is:
Searching by commit subject line, I can find only https://github.com/zephyrproject-rtos/zephyr/pull/245/commits , but it lacks the exact commit in question. Thanks.
@jukkar :
There are some, e.g. https://github.com/pfalcon/micropython/wiki/ZephyrSocket . But while I would appreciate you trying it, I imagine you have more important things to do, so no hurry with that, and I'll try to investigate the issue with connect() myself a bit later. (I'm just trying to refactor the MicroPython port from the out-of-tree socket code to the in-tree one, and can't establish a solid testing baseline; each time I try, I find a new regression even with the old out-of-tree code ;-) ).
Found the root cause for this connect crash. The local end point was not created, which then caused a crash because context->local.sin_addr was NULL and was accessed when connecting. The MicroPython sample test started to work when I added s.bind(("192.0.2.1",8888)) before the s.connect() call. I suppose we would need to fix this by binding automatically in net_context_connect() if the user has not bound the local end point. I am a bit busy now with DTLS support, so this will need to be postponed a bit; or, if possible, could you propose a patch that fixes this?
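The workaround described above (bind the local end point explicitly, then connect) can be sketched as follows. This is only a minimal illustration written against the CPython-style socket API, not the actual MicroPython sample; the helper name connect_with_explicit_bind is hypothetical, and whether the MicroPython Zephyr port accepts exactly these calls is an assumption.

```python
import socket


def connect_with_explicit_bind(local_addr, remote_addr):
    """Open a TCP socket, bind its local end point, then connect.

    Hypothetical helper: it mirrors the workaround of calling
    s.bind() before s.connect() so that the local end point
    exists by the time the active connect starts.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(local_addr)      # create the local end point explicitly
        s.connect(remote_addr)  # only then start the active connect
    except OSError:
        s.close()               # do not leak the socket on failure
        raise
    return s
```

With the address quoted in the comment above, the call would look like connect_with_explicit_bind(("192.0.2.1", 8888), (server_ip, server_port)), where server_ip and server_port are placeholders for the echo server being tested. This only works around the crash from the application side; the automatic bind in net_context_connect() suggested above would still be needed in the stack.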
@jukkar : Thanks for investigating this! Yes, I'll look into this, hopefully this week.
I suggest we close this. There are probably still issues in the stack regarding NULL pointer access, but the original issue reported by @pfalcon is fixed.
Oh, so the bugtracker is back, neat!
dbd7052 seems to have broken TCP connect() support. E.g. echo_client crashes:
I'm not exactly sure which operation crashes, but I initially discovered this with the MicroPython Zephyr port, with a socket.connect() call.
I wasn't able to find the pull request which introduced that commit (to cross-link to this ticket). @andrewboie , can you please point to it?