Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

L2 rework #10547

Merged
merged 13 commits into from Nov 30, 2018
Merged

L2 rework #10547

merged 13 commits into from Nov 30, 2018

Conversation

tbursztyka
Copy link
Collaborator

@tbursztyka tbursztyka commented Oct 12, 2018

Changing the way packet are sent: instead of going like net_if_send_data -> l2 send -> net_if tx queue -> net_if_tx -> driver send
it will be now:
net_if_send _data -> net_if tx queue -> net_if_tx -> l2 send -> driver send

This will enable removing ll reserve from l2 as well as doing interesting optimization in L2 side.

  • Now L2 is responsible for unrefing the packet on success. (net_if still keeps that responsibility in case of error) so drivers do not have to care about it.

  • l2 send does not return a verdict, but a int: 0+ (lenght of sent packet) on succes, negative errno otherwise.

  • driver API have to provide their own send/tx function, relatively to their respective L2. There is no more the net_if_api send function. That will permit l2 to have whatever signature they need for that function (15.4 has done so for quite a while, but it was thus a hack).

@tgorochowik Could you check if the sam gmac driver still works fine? It's a bit different in there, as it queues tx pkt. I hope I did not break it.
@Vudentz It was relatively simple in bt L2 to do the change, but I could not test it. I believe it's fine though.

All of this is part of the preliminary work to revise net_pkt buffer management totally.

@codecov-io
Copy link

codecov-io commented Oct 12, 2018

Codecov Report

Merging #10547 into master will increase coverage by 0.07%.
The diff coverage is 62.32%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10547      +/-   ##
==========================================
+ Coverage    48.2%   48.28%   +0.07%     
==========================================
  Files         279      279              
  Lines       43312    43316       +4     
  Branches    10372    10372              
==========================================
+ Hits        20878    20913      +35     
+ Misses      18288    18252      -36     
- Partials     4146     4151       +5
Impacted Files Coverage Δ
include/net/ethernet.h 56.25% <ø> (ø) ⬆️
include/net/net_if.h 58.97% <ø> (-1.03%) ⬇️
drivers/ethernet/eth_native_posix.c 23.62% <0%> (+0.54%) ⬆️
subsys/net/l2/ethernet/arp.c 57.23% <100%> (+2.74%) ⬆️
include/net/net_pkt.h 85.84% <100%> (+0.13%) ⬆️
subsys/net/l2/dummy/dummy.c 100% <100%> (ø) ⬆️
drivers/net/loopback.c 80% <100%> (+2.22%) ⬆️
subsys/net/l2/ethernet/ethernet.c 56.01% <51.13%> (+2.98%) ⬆️
subsys/net/ip/net_if.c 62.65% <62.5%> (+0.19%) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 94b72e7...673ff91. Read the comment docs.

Copy link
Member

@tgorochowik tgorochowik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some left-over references to api->send, and the code doesn't compile:

In file included from /zephyr-path/subsys/net/ip/net_if.c:15:
/zephyr-path/subsys/net/ip/net_if.c: In function ‘init_iface’:
/zephyr-path/subsys/net/ip/net_if.c:246:17: error: ‘const struct net_if_api’ has no member named ‘send’
   NET_ASSERT(api->send);
                 ^~
/zephyr-path/include/net/net_core.h:64:9: note: in definition of macro ‘NET_ASSERT’
   if (!(cond)) {          \
         ^~~~
make[2]: *** [zephyr/subsys/net/ip/CMakeFiles/subsys__net__ip.dir/build.make:89: zephyr/subsys/net/ip/CMakeFiles/subsys__net__ip.dir/net_if.c.obj] Error 1
make[1]: *** [CMakeFiles/Makefile2:1808: zephyr/subsys/net/ip/CMakeFiles/subsys__net__ip.dir/all] Error 2

Therefore I am unable to test it on sam gmac.

@tbursztyka
Copy link
Collaborator Author

It always amaze me how local sanitycheck running never finds out such obvious thing...

@rveerama1
Copy link
Collaborator

Overall looks good.

@@ -329,7 +329,6 @@ struct ethernet_context {
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "net/ethernet: Remove as many ifdef for CONFIG_NET_VLAN as possible" could be sent to upstream separately.

@tgorochowik
Copy link
Member

I just tested this on sam e70 with the dhcpv4_client sample with NET_BUF_POOL_USAGE and NET_SHELL enabled.

It is quite weird, your changes should only affect the the TX side in that driver, but zephyr seems to be leaking RX buffers.

After pinging zephyr with ping 192.168.0.101 -i 0.00001 for a few minutes, the buffer pools look like this:

uart:~$ net mem
Fragment length 128 bytes
Network buffer pools:
Address         Total   Avail   Name
0x20407638      14      40      RX
0x20407654      14      21      TX
0x204076e4      36      1       RX DATA (rx_bufs)
0x2040770c      36      36      TX DATA (tx_bufs)

And it stops responding to further pings.

I am going to dismiss my review for now, as this might be a broader issue. The TX side that you changed here seems to work fine.

@tgorochowik tgorochowik dismissed their stale review October 30, 2018 09:35

Retested on SAM E70

@tbursztyka
Copy link
Collaborator Author

@tgorochowik thanks for retesting this. Looks like I may have messed up your driver because icmp echo replies are just built on rx echo requests packets, so these net_pkt do come for RX reserve. I'll check

@pfalcon
Copy link
Contributor

pfalcon commented Oct 31, 2018

Looks like I may have messed up your driver because icmp echo replies are just built on rx echo requests packets, so these net_pkt do come for RX reserve.

Well, that's bad. That's a hack, and we won't have a clean/reliable stack until we give that "optimization" up. RX packets are for RX, TX for TX. They may come from absolutely different pool implementations. E.g., from hardware-backed pool, allowing only RX or only TX, respectively.

@tbursztyka
Copy link
Collaborator Author

@pfalcon

Well, that's bad. That's a hack, and we won't have a clean/reliable stack until we give that "optimization" up. RX packets are for RX, TX for TX. They may come from absolutely different pool implementations. E.g., from hardware-backed pool, allowing only RX or only TX, respectively.

That has been like that almost since day 1. Actually, this RX/TX differentiation is going to disappear mostly in next to come buffer management rework (using various pools will remain). TX or RX you don't care much: if one of the pool if fully used, the system will eventually be unresponsive (normally, if it's not due to a leak - unlike what I did here - but an overload, timeout will kick in and system will start to response again). So it appears like a hack because of this tx/rx differentiation, but it's actually not a hack.

@pfalcon
Copy link
Contributor

pfalcon commented Oct 31, 2018

That has been like that almost since day 1.

Yeah, I know.

TX or RX you don't care much: if one of the pool if fully used, the system will eventually be unresponsive

Well, user may not care, but hardware definitely does. The processes behind RX and TX are different, and buffer management is part of that process. And there's no need to "help" system become unresponsive by misusing buffers. What should be done with RX buffers is quickly parse/copy data out of it, and get buffer back into the pool to accept new data. Holding onto the RX buffer for longer than needed, misusing it to stuff TX data and putting it into TX queue (which can be arbitrarily long) should be big no-no.

Again, IMHO we won't have a decent stack until we accept things above and implement it like that.

Otherwise, this is "oh, BTW" note for now, it just your comment triggered things I had in mind for a while.

@tbursztyka
Copy link
Collaborator Author

Well, user may not care, but hardware definitely does.

Not really. Hardware buffers, or local driver buffers are always unrelated to net_pkt buffers. We always copy (either on tx or rx path, in the drivers). On rx path, if drivers are not able to get a free net_pkt or the necessary net_buf , it will have to drop the frame from its internal buffers. That's fully nominal.
This TX/RX pool split really is misleading, hopefully we'll get rid of that.

@pfalcon
Copy link
Contributor

pfalcon commented Oct 31, 2018

Hardware buffers, or local driver buffers are always unrelated to net_pkt buffers. We always copy

Ok, so let me reformulate: all the above true if we want [one sweet day] to achieve zero-copy networking. Given that we don't have a reliable networking so far, I'm happy with anything which helps to achieve that first.

@tbursztyka
Copy link
Collaborator Author

0-copy is a different beast. We'll try to get to that eventually, but we are far from it.

@tbursztyka tbursztyka force-pushed the l2_rework branch 2 times, most recently from d4e82fc to 673ff91 Compare November 29, 2018 14:11
Tomasz Bursztyka added 13 commits November 30, 2018 15:06
This is currently unoptimized, as all frags are allocated with relevant
ll reserve for such header space. However, this is the first step
towards getting rid of that ll reserve concept everywhere.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Current code generating Ethernet header is scattered all over the place,
sometimes in functions that are supposed to check something (and not
filling the header). Not to say about innefficiency.

Src ll address does not need to be set in L2 as net_if.c handles that
already.

Broadcast dst ll address is the same in ipv4 or ipv6, thus factorizing.
In each case, multicast is filled in only at the relevant place.

This is the first step towards changing L2 sending logic, when L2 send
API function will be the only point of sending. The redirection from
driver to L2 again (which finally uses the right device API function to
send) it a temporary hack.

This simplifies the code but will also enable using statically
allocated net_buf and ethernet header payload buffer afterwards.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Currently, first part is done in L2's send, then the next one in
ethernet device driver net_if send function. That last one was already
moved to a L2 based implementation. Let's just move forward and place
the whole logic of the L2's send in that second function.

This is the first step, ethernet centric only, to move towards a
one-pass sending logic in net stack. In future, net_if's send will
disappear.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Now that sending is done at last time, in one pass, no need to go
through net_if_send_data here.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Now instead of such path:

net_if_send_data -> L2's send -> net_if tx_queue -> net_if_tx -> driver
net_if's send

It will be:

net_if_send_data -> net_if tx_queue -> net_if_tx -> L2's send -> driver
net_if's send

Only Ethernet is adapted, but 15.4 and bt will follow up.
All Ethernet drivers are made compatible with that new scheme also.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
And adapt loopback and slip drivers relevantly

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
As for Ethernet, up to ieee802154 L2's send to actually sent the packet.
It's currently unoptimized as 6lo compression, 15.4 fragmentation and so
on will reallocate net_buf etc... but it's the first step towards
removing ll reserve space and more.

Applying changes to Openthread L2 as well.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
This is a much more trivial move than in Ethernet or ieee802154 L2s.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Many tests use either Ethernet or Dummy L2 and as such require
modifications towards the driver API on their fake devices.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
And apply that to modem driver setting that pointer to NULL.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Like ARP, GPTP works on top of Ethernet but is considered as a layer 2
protocol, so we cannot set an AF_* family type. Instead let's have a bit
telling that current packet is a gptp message (family has to be
AF_UNSPEC).

The bit is unionized with a TCP related bit: TCP cannot be present on an
AF_UNSPEC packet so there will not be any collision.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Instead of redoing what Ethernet L2 already does, let just create the
gptp message without any Ethenet header. Which one will be done as
sending phase by Ethernet L2 relevantly.

Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Such sent flag is in a union in net_pkt, shared with a gptp flag.
Tweaking it when the family is not AF_INET or AF_INET6 will generate
corrupted gptp packets.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Copy link
Member

@jukkar jukkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, also the latest version fixes memory leak in sam-e70 board

@nashif nashif merged commit 10c4841 into zephyrproject-rtos:master Nov 30, 2018
net_pkt_set_vlan_tag(pkt, net_eth_get_vlan_tag(iface));

eth = net_eth_fill_header(ctx, pkt, htons(NET_ETH_PTYPE_ARP),
NULL, NULL);
if(!net_eth_fill_header(ctx, pkt, htons(NET_ETH_PTYPE_ARP),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, it went to the mainline like that, with missing space after "if"?

@@ -71,7 +71,8 @@ void net_arp_init(void);
*/

#else /* CONFIG_NET_ARP */

#define net_arp_prepare(_kt, _u1, _u2) _kt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice names.

@@ -49,8 +49,9 @@ struct net_l2 {
* This function is used by net core to push a packet to lower layer
* (interface's L2), which in turn might work on the packet relevantly.
* (adding proper header etc...)
* Returns a negative error code, or the number of bytes sent otherwise.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Driver patches as included in this commit don't seem to follow "the number of bytes sent otherwise", so stats would be off (unless I'm mixing up all the send()'s around).


if (net_pkt_family(pkt) != AF_INET6) {
return NET_DROP;
return -EINVAL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe -EPROTONOTSUP or how it's called?

* to be sent but has not reached the driver
* yet. Used only if defined(CONFIG_NET_TCP)
*/
union {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure bitfield union works as expected? I'd expect compiler bugs.

@pfalcon
Copy link
Contributor

pfalcon commented Dec 1, 2018

@tbursztyka, I'm curious how well this was tested. Changes to existing Ethernet drivers are pretty light, whereas this refactor immediately breaks my cute WIP driver #11680, where I took freedom to rely on the fact that any net_buf (except last) has len of even u32_t's. Apparently, any optimization in Zephyr is premature so far ;-). But I wonder if no other driver had anything like that?

@tbursztyka
Copy link
Collaborator Author

@pfalcon would have been nice to get your review before it got merged. This PR has been public for more than a month.

@pfalcon
Copy link
Contributor

pfalcon commented Dec 3, 2018

This PR has been public for more than a month.

Really? In my list it's up since ~ end of June, before you went on leave, though maybe it was just a branch, not PR then. Who'd have known that would be merged last week? ;-)

Otherwise, I kinda skip such reviews, because you definitely already heard my comments: this is too complex, can we split it up, etc. I just accept that some changes are necessarily complex, and rely on our discussion in Le Mans that it's otherwise OK. You're absolutely right that I should at least eyeball them for simple issues nonetheless.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants