
QEMU serial output is not reliable, may affect SLIP and thus network testing #8187

Closed

pfalcon opened this issue Jun 5, 2018 · 10 comments

Labels: area: Networking, area: QEMU, bug, priority: low

Comments

pfalcon (Contributor) commented Jun 5, 2018

This ticket provides a (partial) answer to why the issue described in #7831 (comment) happens, specifically:

  1. when running the samples/net/socket/dumb_http_server sample app on qemu_cortex_m3,
  2. and running ab -n1000 http://192.0.2.1:8080/,
  3. processing of requests gets stuck after just a few dozen requests, and ab eventually times out
  4. (ab can be restarted and a number of requests can still be processed, i.e. the app keeps running, but requests soon get stuck again)

So, it's a more or less known issue, but it's not always kept in mind: UART emulation in QEMU is less than ideal, and there can be problems with the serial communication which is used by SLIP and loop-slip-tap.sh. This is what happens here.

For example, SLIP driver logging:

[slip] [INF] slip_send: sent: pkt 0x20001ec4 llr: 14, len: 54
[slip] [INF] slip_send: sent: pkt 0x20001ec4 llr: 14, len: 1506
[slip] [INF] slip_send: sent: pkt 0x20001e78 llr: 14, len: 783
[slip] [INF] slip_send: sent: pkt 0x20001e2c llr: 14, len: 54
Connection from 192.0.2.2 closed
[slip] [INF] slip_send: sent: pkt 0x20001e78 llr: 14, len: 783

What we can see here is that pkt 0x20001e78 was transmitted twice. But here's what Wireshark sees:

[Screenshot: Wireshark capture, 2018-06-05 23:50:25]

As can be seen, instead of the first 783-byte packet it receives a broken 275-byte packet, which gets ignored by the host. That's what causes the retransmission, and the next time the packet gets through.
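
A quick way to confirm this kind of truncation in a capture is to compare each packet's declared IP total length against the number of bytes actually captured. A minimal sketch (not part of the original report; it assumes scapy is installed and the capture was saved to a hypothetical slip.pcap):

# truncation_check.py - illustrative sketch, assumes scapy and a capture file named "slip.pcap"
from scapy.all import rdpcap, IP

for i, pkt in enumerate(rdpcap("slip.pcap")):
    if IP not in pkt:
        continue
    declared = pkt[IP].len      # total length the sender put into the IP header
    captured = len(pkt[IP])     # bytes of that packet actually present in the capture
    if captured < declared:
        print(f"frame {i}: truncated, header says {declared} bytes, captured {captured}")

A frame flagged here would correspond to the 275-vs-783 byte mismatch visible in the capture above.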

galak (Collaborator) commented Nov 20, 2018

Still an issue with new qemu?

pfalcon (Contributor, Author) commented Nov 20, 2018

I can retest to be 100% sure, but I don't see that that part of QEMU has changed. Indirectly, it's the same: the MicroPython testsuite running over QEMU serial emulation has a ~50% chance to fail: https://ci.linaro.org/view/lite-iot-ci/job/lite-aeolus-micropython/

pfalcon (Contributor, Author) commented Nov 21, 2018

@rlubos, this was submitted as a result of investigating the issue reported by you (see the comment link in the description), so I wonder why you haven't been mentioned yet ;-). @jukkar, you should be in the loop on every networking-related issue ;-).

pfalcon (Contributor, Author) commented Nov 21, 2018

Well, I'm actually pleasantly surprised, because the situation has visibly improved when running with the QEMU from SDK 0.9.5.

First run of dumb_http_server/qemu_cortex_m3 with ab -n1000 http://192.0.2.1:8080/ went without a hitch.

I then proceeded with -n10000, and it soon failed:

Benchmarking 192.0.2.1 (be patient)
Completed 1000 requests
apr_socket_recv: Connection reset by peer (104)
Total of 1219 requests completed

But note that the type of failure is different from the original description above: there it was a hang with an eventual timeout, here a quick ECONNRESET. Another ab session can be run after that. This ECONNRESET may well be a different issue, e.g. an issue in the stack rather than at the UART comm level - or not.

The ECONNRESETs are repeatable; the longest of 3 runs I got was:

apr_socket_recv: Connection reset by peer (104)
Total of 4499 requests completed

pfalcon (Contributor, Author) commented Nov 21, 2018

But! Now qemu_x86 and qemu_cortex_m3 have switched places, i.e. SLIP comm with qemu_x86 seems to be significantly broken:

Benchmarking 192.0.2.1 (be patient)
apr_pollset_poll: The timeout specified has expired (70007)
Total of 115 requests completed

I got these timeouts 2 times in a row (again, well before the 1000th request). All of that happened with -n10000. Surprisingly, running with -n1000, I got 2 successful runs. The thing smells the big number and gives up early, but cheerfully chews through the less frightening ones ;-).

Summing up: yes, QEMU SLIP is still not reliable.
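
To quantify this flakiness across boards and request counts, the benchmark can simply be repeated and the failure mode recorded each time. A minimal sketch of such a harness (not from the original report; it assumes ab is installed, the sample is already running behind loop-slip-tap.sh, and the URL/request count used above):

# ab_flakiness.py - illustrative harness, assumes ab is installed and the server is up
import subprocess

URL = "http://192.0.2.1:8080/"
RUNS = 5
REQUESTS = 10000

for run in range(RUNS):
    result = subprocess.run(["ab", f"-n{REQUESTS}", URL],
                            capture_output=True, text=True)
    out = result.stdout + result.stderr
    if "Connection reset by peer" in out:
        verdict = "ECONNRESET"
    elif "timeout specified has expired" in out:
        verdict = "timeout"
    elif result.returncode == 0:
        verdict = "ok"
    else:
        verdict = f"failed (exit {result.returncode})"
    # ab prints "Total of N requests completed" when it aborts early
    completed = [line for line in out.splitlines() if "requests completed" in line]
    print(f"run {run}: {verdict} {completed[0] if completed else ''}")

Something like this would separate the timeout failures seen with qemu_x86 from the ECONNRESETs seen with qemu_cortex_m3 above.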

pfalcon (Contributor, Author) commented Nov 21, 2018

@rlubos, given that this comes from your report, can you please put qemu_x86/qemu_cortex_m3 through some ordeal too?

jukkar (Member) commented Nov 28, 2018

Have you tried with the native_posix board? Just wondering if this is something related to SLIP or the serial connection, or if the issue is in some other part of the networking stack.

pfalcon (Contributor, Author) commented Nov 28, 2018

Have you tried with the native_posix board?

Frankly speaking, no, as I find the network setup of native_posix kind of cumbersome.

Just wondering if this is something related to SLIP or the serial connection, or if the issue is in some other part of the networking stack.

As the title of this ticket suggests, the best hypothesis is that the problem is on the side of the QEMU emulation. The behavior I observed is that the SLIP driver gets e.g. a 783-byte packet and spools it into the UART, yet Wireshark sees just a truncated 275-byte packet, which of course gets discarded.
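
SLIP framing itself is trivial (RFC 1055): each packet goes out byte-by-byte between END markers, with ESC sequences for the two special bytes, so if the emulated UART silently drops part of the stream, the receiver still sees a closing END and delivers a shorter packet, which the host then discards. A minimal sketch of the encoding (illustrative only, not the actual Zephyr drivers/slip code):

# slip_frame.py - illustrative SLIP (RFC 1055) encoder, not the Zephyr driver
END, ESC, ESC_END, ESC_ESC = 0xC0, 0xDB, 0xDC, 0xDD

def slip_encode(packet: bytes) -> bytes:
    """Wrap one packet into a SLIP frame for transmission over a UART."""
    frame = bytearray([END])                 # leading END flushes any line noise
    for b in packet:
        if b == END:
            frame += bytes([ESC, ESC_END])   # escape the frame delimiter
        elif b == ESC:
            frame += bytes([ESC, ESC_ESC])   # escape the escape byte itself
        else:
            frame.append(b)
    frame.append(END)                        # frame terminator
    return bytes(frame)

If bytes go missing between the two END markers, the decoder on the host side ends up with e.g. 275 bytes instead of 783, and the IP/TCP length and checksum checks reject the packet.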

jukkar (Member) commented Dec 14, 2018

QEMU now has native Ethernet support, so we could start to migrate away from SLIP. Closing this one.

jukkar closed this as completed Dec 14, 2018
pfalcon (Contributor, Author) commented Dec 14, 2018

Well, closing is a bit hasty, given the "we could start". And this ticket is about serial output non-reliability, not just networking as affected by it. So, I'll likely reopen it when hitting this issue again.
