L2 rework #10547

tbursztyka · 2018-10-12T11:36:39Z

Changing the way packet are sent: instead of going like net_if_send_data -> l2 send -> net_if tx queue -> net_if_tx -> driver send
it will be now:
net_if_send _data -> net_if tx queue -> net_if_tx -> l2 send -> driver send

This will enable removing ll reserve from l2 as well as doing interesting optimization in L2 side.

Now L2 is responsible for unrefing the packet on success. (net_if still keeps that responsibility in case of error) so drivers do not have to care about it.
l2 send does not return a verdict, but a int: 0+ (lenght of sent packet) on succes, negative errno otherwise.
driver API have to provide their own send/tx function, relatively to their respective L2. There is no more the net_if_api send function. That will permit l2 to have whatever signature they need for that function (15.4 has done so for quite a while, but it was thus a hack).

@tgorochowik Could you check if the sam gmac driver still works fine? It's a bit different in there, as it queues tx pkt. I hope I did not break it.
@Vudentz It was relatively simple in bt L2 to do the change, but I could not test it. I believe it's fine though.

All of this is part of the preliminary work to revise net_pkt buffer management totally.

codecov-io · 2018-10-12T12:05:37Z

Codecov Report

Merging #10547 into master will increase coverage by 0.07%.
The diff coverage is 62.32%.

@@            Coverage Diff             @@
##           master   #10547      +/-   ##
==========================================
+ Coverage    48.2%   48.28%   +0.07%     
==========================================
  Files         279      279              
  Lines       43312    43316       +4     
  Branches    10372    10372              
==========================================
+ Hits        20878    20913      +35     
+ Misses      18288    18252      -36     
- Partials     4146     4151       +5

Impacted Files	Coverage Δ
include/net/ethernet.h	`56.25% <ø> (ø)`	⬆️
include/net/net_if.h	`58.97% <ø> (-1.03%)`	⬇️
drivers/ethernet/eth_native_posix.c	`23.62% <0%> (+0.54%)`	⬆️
subsys/net/l2/ethernet/arp.c	`57.23% <100%> (+2.74%)`	⬆️
include/net/net_pkt.h	`85.84% <100%> (+0.13%)`	⬆️
subsys/net/l2/dummy/dummy.c	`100% <100%> (ø)`	⬆️
drivers/net/loopback.c	`80% <100%> (+2.22%)`	⬆️
subsys/net/l2/ethernet/ethernet.c	`56.01% <51.13%> (+2.98%)`	⬆️
subsys/net/ip/net_if.c	`62.65% <62.5%> (+0.19%)`	⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 94b72e7...673ff91. Read the comment docs.

tgorochowik

There are some left-over references to api->send, and the code doesn't compile:

In file included from /zephyr-path/subsys/net/ip/net_if.c:15:
/zephyr-path/subsys/net/ip/net_if.c: In function ‘init_iface’:
/zephyr-path/subsys/net/ip/net_if.c:246:17: error: ‘const struct net_if_api’ has no member named ‘send’
   NET_ASSERT(api->send);
                 ^~
/zephyr-path/include/net/net_core.h:64:9: note: in definition of macro ‘NET_ASSERT’
   if (!(cond)) {          \
         ^~~~
make[2]: *** [zephyr/subsys/net/ip/CMakeFiles/subsys__net__ip.dir/build.make:89: zephyr/subsys/net/ip/CMakeFiles/subsys__net__ip.dir/net_if.c.obj] Error 1
make[1]: *** [CMakeFiles/Makefile2:1808: zephyr/subsys/net/ip/CMakeFiles/subsys__net__ip.dir/all] Error 2

Therefore I am unable to test it on sam gmac.

tbursztyka · 2018-10-12T13:12:23Z

It always amaze me how local sanitycheck running never finds out such obvious thing...

rveerama1 · 2018-10-23T11:30:47Z

Overall looks good.

jukkar · 2018-10-23T11:51:20Z

include/net/ethernet.h

@@ -329,7 +329,6 @@ struct ethernet_context {
 */


The "net/ethernet: Remove as many ifdef for CONFIG_NET_VLAN as possible" could be sent to upstream separately.

tgorochowik · 2018-10-30T09:34:56Z

I just tested this on sam e70 with the dhcpv4_client sample with NET_BUF_POOL_USAGE and NET_SHELL enabled.

It is quite weird, your changes should only affect the the TX side in that driver, but zephyr seems to be leaking RX buffers.

After pinging zephyr with ping 192.168.0.101 -i 0.00001 for a few minutes, the buffer pools look like this:

uart:~$ net mem
Fragment length 128 bytes
Network buffer pools:
Address         Total   Avail   Name
0x20407638      14      40      RX
0x20407654      14      21      TX
0x204076e4      36      1       RX DATA (rx_bufs)
0x2040770c      36      36      TX DATA (tx_bufs)

And it stops responding to further pings.

I am going to dismiss my review for now, as this might be a broader issue. The TX side that you changed here seems to work fine.

Retested on SAM E70

tbursztyka · 2018-10-31T09:05:31Z

@tgorochowik thanks for retesting this. Looks like I may have messed up your driver because icmp echo replies are just built on rx echo requests packets, so these net_pkt do come for RX reserve. I'll check

pfalcon · 2018-10-31T09:20:59Z

Looks like I may have messed up your driver because icmp echo replies are just built on rx echo requests packets, so these net_pkt do come for RX reserve.

Well, that's bad. That's a hack, and we won't have a clean/reliable stack until we give that "optimization" up. RX packets are for RX, TX for TX. They may come from absolutely different pool implementations. E.g., from hardware-backed pool, allowing only RX or only TX, respectively.

tbursztyka · 2018-10-31T09:32:16Z

@pfalcon

Well, that's bad. That's a hack, and we won't have a clean/reliable stack until we give that "optimization" up. RX packets are for RX, TX for TX. They may come from absolutely different pool implementations. E.g., from hardware-backed pool, allowing only RX or only TX, respectively.

That has been like that almost since day 1. Actually, this RX/TX differentiation is going to disappear mostly in next to come buffer management rework (using various pools will remain). TX or RX you don't care much: if one of the pool if fully used, the system will eventually be unresponsive (normally, if it's not due to a leak - unlike what I did here - but an overload, timeout will kick in and system will start to response again). So it appears like a hack because of this tx/rx differentiation, but it's actually not a hack.

pfalcon · 2018-10-31T11:49:33Z

That has been like that almost since day 1.

Yeah, I know.

TX or RX you don't care much: if one of the pool if fully used, the system will eventually be unresponsive

Well, user may not care, but hardware definitely does. The processes behind RX and TX are different, and buffer management is part of that process. And there's no need to "help" system become unresponsive by misusing buffers. What should be done with RX buffers is quickly parse/copy data out of it, and get buffer back into the pool to accept new data. Holding onto the RX buffer for longer than needed, misusing it to stuff TX data and putting it into TX queue (which can be arbitrarily long) should be big no-no.

Again, IMHO we won't have a decent stack until we accept things above and implement it like that.

Otherwise, this is "oh, BTW" note for now, it just your comment triggered things I had in mind for a while.

tbursztyka · 2018-10-31T12:12:34Z

Well, user may not care, but hardware definitely does.

Not really. Hardware buffers, or local driver buffers are always unrelated to net_pkt buffers. We always copy (either on tx or rx path, in the drivers). On rx path, if drivers are not able to get a free net_pkt or the necessary net_buf , it will have to drop the frame from its internal buffers. That's fully nominal.
This TX/RX pool split really is misleading, hopefully we'll get rid of that.

pfalcon · 2018-10-31T12:18:02Z

Hardware buffers, or local driver buffers are always unrelated to net_pkt buffers. We always copy

Ok, so let me reformulate: all the above true if we want [one sweet day] to achieve zero-copy networking. Given that we don't have a reliable networking so far, I'm happy with anything which helps to achieve that first.

tbursztyka · 2018-10-31T12:23:03Z

0-copy is a different beast. We'll try to get to that eventually, but we are far from it.

This is currently unoptimized, as all frags are allocated with relevant ll reserve for such header space. However, this is the first step towards getting rid of that ll reserve concept everywhere. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>

Current code generating Ethernet header is scattered all over the place, sometimes in functions that are supposed to check something (and not filling the header). Not to say about innefficiency. Src ll address does not need to be set in L2 as net_if.c handles that already. Broadcast dst ll address is the same in ipv4 or ipv6, thus factorizing. In each case, multicast is filled in only at the relevant place. This is the first step towards changing L2 sending logic, when L2 send API function will be the only point of sending. The redirection from driver to L2 again (which finally uses the right device API function to send) it a temporary hack. This simplifies the code but will also enable using statically allocated net_buf and ethernet header payload buffer afterwards. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>

Currently, first part is done in L2's send, then the next one in ethernet device driver net_if send function. That last one was already moved to a L2 based implementation. Let's just move forward and place the whole logic of the L2's send in that second function. This is the first step, ethernet centric only, to move towards a one-pass sending logic in net stack. In future, net_if's send will disappear. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>

Now that sending is done at last time, in one pass, no need to go through net_if_send_data here. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>

Now instead of such path: net_if_send_data -> L2's send -> net_if tx_queue -> net_if_tx -> driver net_if's send It will be: net_if_send_data -> net_if tx_queue -> net_if_tx -> L2's send -> driver net_if's send Only Ethernet is adapted, but 15.4 and bt will follow up. All Ethernet drivers are made compatible with that new scheme also. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>

And adapt loopback and slip drivers relevantly Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>

As for Ethernet, up to ieee802154 L2's send to actually sent the packet. It's currently unoptimized as 6lo compression, 15.4 fragmentation and so on will reallocate net_buf etc... but it's the first step towards removing ll reserve space and more. Applying changes to Openthread L2 as well. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>

This is a much more trivial move than in Ethernet or ieee802154 L2s. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>

Many tests use either Ethernet or Dummy L2 and as such require modifications towards the driver API on their fake devices. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>

And apply that to modem driver setting that pointer to NULL. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>

Like ARP, GPTP works on top of Ethernet but is considered as a layer 2 protocol, so we cannot set an AF_* family type. Instead let's have a bit telling that current packet is a gptp message (family has to be AF_UNSPEC). The bit is unionized with a TCP related bit: TCP cannot be present on an AF_UNSPEC packet so there will not be any collision. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>

Instead of redoing what Ethernet L2 already does, let just create the gptp message without any Ethenet header. Which one will be done as sending phase by Ethernet L2 relevantly. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>

Such sent flag is in a union in net_pkt, shared with a gptp flag. Tweaking it when the family is not AF_INET or AF_INET6 will generate corrupted gptp packets. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>

jukkar

LGTM, also the latest version fixes memory leak in sam-e70 board

pfalcon · 2018-11-30T21:00:25Z

subsys/net/l2/ethernet/arp.c

 	net_pkt_set_vlan_tag(pkt, net_eth_get_vlan_tag(iface));

-	eth = net_eth_fill_header(ctx, pkt, htons(NET_ETH_PTYPE_ARP),
-				  NULL, NULL);
+	if(!net_eth_fill_header(ctx, pkt, htons(NET_ETH_PTYPE_ARP),


So, it went to the mainline like that, with missing space after "if"?

pfalcon · 2018-11-30T21:27:22Z

subsys/net/l2/ethernet/arp.h

@@ -71,7 +71,8 @@ void net_arp_init(void);
 */

 #else /* CONFIG_NET_ARP */
-
+#define net_arp_prepare(_kt, _u1, _u2) _kt


Nice names.

pfalcon · 2018-11-30T21:53:22Z

include/net/net_l2.h

@@ -49,8 +49,9 @@ struct net_l2 {
 	 * This function is used by net core to push a packet to lower layer
 	 * (interface's L2), which in turn might work on the packet relevantly.
 	 * (adding proper header etc...)
+	 * Returns a negative error code, or the number of bytes sent otherwise.


Driver patches as included in this commit don't seem to follow "the number of bytes sent otherwise", so stats would be off (unless I'm mixing up all the send()'s around).

pfalcon · 2018-11-30T21:54:20Z

subsys/net/l2/ieee802154/ieee802154.c


 	if (net_pkt_family(pkt) != AF_INET6) {
-		return NET_DROP;
+		return -EINVAL;


Maybe -EPROTONOTSUP or how it's called?

pfalcon · 2018-11-30T22:03:29Z

include/net/net_pkt.h

-				 * to be sent but has not reached the driver
-				 * yet. Used only if defined(CONFIG_NET_TCP)
-				 */
+	union {


Are you sure bitfield union works as expected? I'd expect compiler bugs.

pfalcon · 2018-12-01T08:10:56Z

@tbursztyka, I'm curious how well this was tested. Changes to existing Ethernet drivers are pretty light, whereas this refactor immediately breaks my cute WIP driver #11680, where I took freedom to rely on the fact that any net_buf (except last) has len of even u32_t's. Apparently, any optimization in Zephyr is premature so far ;-). But I wonder if no other driver had anything like that?

tbursztyka · 2018-12-03T08:42:04Z

@pfalcon would have been nice to get your review before it got merged. This PR has been public for more than a month.

pfalcon · 2018-12-03T09:40:30Z

This PR has been public for more than a month.

Really? In my list it's up since ~ end of June, before you went on leave, though maybe it was just a branch, not PR then. Who'd have known that would be merged last week? ;-)

Otherwise, I kinda skip such reviews, because you definitely already heard my comments: this is too complex, can we split it up, etc. I just accept that some changes are necessarily complex, and rely on our discussion in Le Mans that it's otherwise OK. You're absolutely right that I should at least eyeball them for simple issues nonetheless.

tbursztyka requested review from pfalcon, jukkar, Vudentz, tgorochowik and rveerama1 October 12, 2018 11:36

tbursztyka requested a review from tarunkum as a code owner October 12, 2018 11:36

jfischer-no self-requested a review October 12, 2018 12:00

tgorochowik previously requested changes Oct 12, 2018

View reviewed changes

tbursztyka force-pushed the l2_rework branch 2 times, most recently from a68854a to c5cd160 Compare October 12, 2018 14:07

tbursztyka requested a review from finikorg as a code owner October 12, 2018 14:07

tbursztyka force-pushed the l2_rework branch 2 times, most recently from d591f50 to a5114c3 Compare October 12, 2018 17:12

jukkar added the area: Networking label Oct 15, 2018

tbursztyka force-pushed the l2_rework branch 2 times, most recently from c677b87 to 14059f1 Compare October 23, 2018 08:02

jukkar reviewed Oct 23, 2018

View reviewed changes

tbursztyka force-pushed the l2_rework branch from 14059f1 to a5e9d9a Compare October 31, 2018 12:24

tbursztyka force-pushed the l2_rework branch 2 times, most recently from d4e82fc to 673ff91 Compare November 29, 2018 14:11

Tomasz Bursztyka added 13 commits November 30, 2018 15:06

net/arp: ARP packet can be queued right away

5b93d84

Now that sending is done at last time, in one pass, no need to go through net_if_send_data here. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>

net/dummy: Switch to L2 sending path

9371aaf

And adapt loopback and slip drivers relevantly Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>

net/bluetooth: Switch to L2 sending path

5c384ff

This is a much more trivial move than in Ethernet or ieee802154 L2s. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>

tests/net: Apply L2 change everywhere where relevant

ce8acda

Many tests use either Ethernet or Dummy L2 and as such require modifications towards the driver API on their fake devices. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>

net/iface: Finally remove the send function from net_if_api

5cec112

And apply that to modem driver setting that pointer to NULL. Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>

tbursztyka force-pushed the l2_rework branch from 673ff91 to f8aa817 Compare November 30, 2018 14:08

jukkar approved these changes Nov 30, 2018

View reviewed changes

nashif merged commit 10c4841 into zephyrproject-rtos:master Nov 30, 2018

pfalcon reviewed Nov 30, 2018

View reviewed changes

pfalcon mentioned this pull request Nov 30, 2018

[RFC][DNM] Experimental TCP #11443

Closed

pfalcon mentioned this pull request Dec 5, 2018

drivers/ethernet: Adapt stellaris driver to new L2 behaviour #11869

Merged

tbursztyka deleted the l2_rework branch December 7, 2018 10:18

pfalcon mentioned this pull request Oct 11, 2019

Noticeable delay between processing multiple client connection requests (200ms+) #19678

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

L2 rework #10547

L2 rework #10547

tbursztyka commented Oct 12, 2018 •

edited

codecov-io commented Oct 12, 2018 •

edited

tgorochowik left a comment

tbursztyka commented Oct 12, 2018

rveerama1 commented Oct 23, 2018

jukkar Oct 23, 2018

tgorochowik commented Oct 30, 2018

tbursztyka commented Oct 31, 2018

pfalcon commented Oct 31, 2018

tbursztyka commented Oct 31, 2018

pfalcon commented Oct 31, 2018

tbursztyka commented Oct 31, 2018

pfalcon commented Oct 31, 2018

tbursztyka commented Oct 31, 2018

jukkar left a comment

pfalcon Nov 30, 2018

pfalcon Nov 30, 2018

pfalcon Nov 30, 2018

pfalcon Nov 30, 2018

pfalcon Nov 30, 2018

pfalcon commented Dec 1, 2018

tbursztyka commented Dec 3, 2018

pfalcon commented Dec 3, 2018 •

edited

L2 rework #10547

L2 rework #10547

Conversation

tbursztyka commented Oct 12, 2018 • edited

codecov-io commented Oct 12, 2018 • edited

Codecov Report

tgorochowik left a comment

Choose a reason for hiding this comment

tbursztyka commented Oct 12, 2018

rveerama1 commented Oct 23, 2018

jukkar Oct 23, 2018

Choose a reason for hiding this comment

tgorochowik commented Oct 30, 2018

tbursztyka commented Oct 31, 2018

pfalcon commented Oct 31, 2018

tbursztyka commented Oct 31, 2018

pfalcon commented Oct 31, 2018

tbursztyka commented Oct 31, 2018

pfalcon commented Oct 31, 2018

tbursztyka commented Oct 31, 2018

jukkar left a comment

Choose a reason for hiding this comment

pfalcon Nov 30, 2018

Choose a reason for hiding this comment

pfalcon Nov 30, 2018

Choose a reason for hiding this comment

pfalcon Nov 30, 2018

Choose a reason for hiding this comment

pfalcon Nov 30, 2018

Choose a reason for hiding this comment

pfalcon Nov 30, 2018

Choose a reason for hiding this comment

pfalcon commented Dec 1, 2018

tbursztyka commented Dec 3, 2018

pfalcon commented Dec 3, 2018 • edited

tbursztyka commented Oct 12, 2018 •

edited

codecov-io commented Oct 12, 2018 •

edited

pfalcon commented Dec 3, 2018 •

edited