The next step in considering UDP for communications is to gather concrete values for the resources used, and the performance improvements provided, by communicating over UDP with the lwIP raw API. For now, the performance optimizations can be applied and verified in the standalone project.
Optimizations (potential areas)
The 10/100 Mbps PHY on the F7 board should theoretically take 0.1/0.01 microseconds to transfer each bit at the physical level. In testing, we are seeing round-trip times averaging 0.8 milliseconds when sending a UDP packet with 80 bytes of data - about 1.25 microseconds per bit, or roughly 12.5-125x the theoretical wire speed. This is likely due to overhead in the network stack (IP processing, scheduling, kernel-to-user-mode context switches).
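The arithmetic behind those figures can be checked with a short script (all constants are the values quoted above):

```python
# Sanity-check of the latency figures quoted above.
PAYLOAD_BYTES = 80
bits = PAYLOAD_BYTES * 8        # 640 bits on the wire (payload only)
bit_time_100m = 0.01e-6         # s/bit at 100 Mbps
bit_time_10m = 0.1e-6           # s/bit at 10 Mbps
measured_rtt = 0.8e-3           # s, average observed round trip

measured_per_bit = measured_rtt / bits       # seconds per bit, measured
print(measured_per_bit * 1e6)                # 1.25 (microseconds per bit)
print(measured_per_bit / bit_time_10m)       # 12.5x the 10 Mbps wire time
print(measured_per_bit / bit_time_100m)      # 125.0x the 100 Mbps wire time
```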
It is difficult to find a theoretical latency for an application-to-application transmission (PC<=>MCU), since the network stacks, network interfaces, and configurations used in prior testing vary between OSs.
For now, we can aim to reduce latency in the echo test as much as possible. The fact that the transmission can be as fast as 0.3 ms (sometimes lower) suggests it would not be unreasonable to get a latency reliably centered around 0.5 ms, given the right OS-level configuration on the PC side. We could then expect a round-trip latency of <1 ms once implemented in the full system.
After disabling non-required modules such as TCP and ICMP (see the CubeMX config), the code size is
lwIP now uses the minimum pbuf pool size Cube allows (11 pbufs, down from the default 16). Each pbuf is 1524 bytes - enough for one MTU plus headers. Space is allocated for only 1 UDP protocol control block (PCB), since only one connection is needed (down from the default 4).
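For reference, the RAM impact of the smaller pool works out as follows (pool size and pbuf size are the values given above; this is a back-of-envelope figure that ignores per-pbuf bookkeeping overhead):

```python
# Back-of-envelope RAM usage of the lwIP pbuf pool described above.
PBUF_POOL_SIZE = 11        # down from the Cube default of 16
PBUF_POOL_BUFSIZE = 1524   # one Ethernet MTU (1500) plus link-layer headers
DEFAULT_POOL_SIZE = 16

pool_bytes = PBUF_POOL_SIZE * PBUF_POOL_BUFSIZE
saved_bytes = (DEFAULT_POOL_SIZE - PBUF_POOL_SIZE) * PBUF_POOL_BUFSIZE
print(pool_bytes)   # 16764 bytes used by the pool
print(saved_bytes)  # 7620 bytes saved versus the default
```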
As seen in the earlier results, using scheduling options on the PC side improves the predictability of the latency of a given transmission. The spreadsheet is uploaded, and the results can be reproduced using the scripts in the test kit with the other scheduling settings (FIFO and RR). We can control the priority of the communication task on the PC side, relative to other running tasks, via the priority value passed to the scheduling system call.
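A minimal sketch of how the PC-side process can be moved into a real-time scheduling class on Linux, using `os.sched_setscheduler` (the priority value `50` is an arbitrary example; setting a real-time policy requires root or `CAP_SYS_NICE`):

```python
# Sketch: raise the PC-side process priority with SCHED_FIFO on Linux.
# Requires root / CAP_SYS_NICE; priority 50 is an arbitrary example value.
import os

def set_realtime_fifo(priority: int = 50) -> None:
    """Move the current process into the SCHED_FIFO real-time class."""
    param = os.sched_param(priority)
    os.sched_setscheduler(0, os.SCHED_FIFO, param)  # pid 0 = this process

if __name__ == "__main__":
    try:
        set_realtime_fifo(50)
        print("scheduler is now:", os.sched_getscheduler(0))
    except PermissionError:
        print("need root/CAP_SYS_NICE to set a real-time policy")
```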
The results also show a relationship between the number of bytes and the transmission time of a message, so we can accurately predict an average round-trip time for the message sizes we decide on for commands sent to the robot.
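Given the measured (size, RTT) pairs, a simple least-squares line gives the predictor. The sample numbers below are made up for illustration; the real values come from the uploaded spreadsheet:

```python
# Illustrative least-squares fit of round-trip time vs. message size.
# The data points here are hypothetical; real numbers come from the spreadsheet.
def fit_line(xs, ys):
    """Return slope a and intercept b of the least-squares line y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

sizes_bytes = [16, 32, 64, 128]       # hypothetical message sizes
rtts_ms = [0.40, 0.48, 0.64, 0.96]    # hypothetical mean RTTs
a, b = fit_line(sizes_bytes, rtts_ms)
predicted = a * 80 + b                # predicted RTT for an 80-byte command
print(round(predicted, 3))            # 0.72 (ms)
```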
Opened this issue to keep track of any ideas for optimization - please edit the issue description, or comment, to add ideas. We can consider the issue resolved when all the checkboxes are complete and the metrics are collected (for the standalone application).
Attaching some experimental data, testing with a Python module called
The following results show time taken to send a UDP packet of variable size from the PC (running Ubuntu 18.04 LTS) to the MCU (STM32F767ZI running FreeRTOS+lwIP), and have the MCU send the packet back (echo). The mean, standard deviations as error bars, and max times are shown. 100 trials of each message size were carried out. The message sizes do not include any headers or metadata appended by the network stack.
Times are measured under four conditions: "not busy", "busy", "no scheddl", and "with scheddl" (where "scheddl" refers to the SCHED_DEADLINE scheduling policy).
Testing script used: https://github.com/utra-robosoccer/soccer-embedded/blob/rfairley-lwip-rtos-config/Testing/Ethernet/eth_echo_test.py.
Overall, setting a higher scheduling priority for the PC-side communication process does not decrease UDP round-trip times by much, but it greatly improves the consistency of the transmission speed when the PC is multitasking - and therefore improves the predictability of UDP communication latency.
Tooling for Ethernet testing is now better developed (https://github.com/utra-robosoccer/soccer-embedded/tree/f524d3b63fe2e03628893cad1c2c6c32ef49a570/Testing/Ethernet/eth_test_kit), and results are reproducible using the test script, with a few process-scheduling options available (deadline, FIFO, and round-robin).
From the results we now have values for the latency we can expect, and for its variance. The investigation can be considered complete once the tooling is merged.