Add sampling to pping #13
Conversation
Rewrite/regroup/reorder some points for the General pping section. Also add some new points, add some additional comments to existing points, and check in the "Skip pure ACKs" as completed. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add a document outlining my thoughts for how to implement sampling. Intended both as a basis for discussion, as well as being a form of documentation. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Immediate thoughts: The "keep track of the last seen identifier and sample the first one seen" was what immediately came to mind for me as well, so that makes sense! As for the possibility of "missing" an event, one option could be to sample in a "window". I.e., when you figure out that it's time to do a new sample, you save the timestamps of the next (say) 3 identifiers. And when the replies come back you also rate limit the output, so you'll end up only outputting one of those three samples when they come back (since they should be close together). As for sampling frequency, I'd say start with the simple thing: a fixed interval that defaults to something sensible but which can be overridden at the command line. And then add dynamic RTT-based sampling as the next thing :)
Add sections on per-flow state, graceful degradation and some implementation considerations. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Thanks for your comments @tohojo. I've also been thinking about allowing timestamping of a few consecutive packets, however I didn't think about also rate-limiting the output, so that will probably be a good addition to make as well. Starting simple with the sampling frequency sounds like a good choice. Overall, I will probably start off with something pretty basic, and then add more of the features I discuss afterwards (as you approve of the basics :)). But I still wanted to outline the overall requirements for this whole sampling thing early on, so that I don't continue building on that basic thing, only to discover after a while that it has some fatal flaw and I have to start all over (guess there's still a risk for that, but can at least try to minimize it somewhat).
I was thinking a bit about the sampling as well: head sampling (i.e., sampling some consecutive packets at the beginning of a flow), or a time-based window sampling (this one is more intuitive and easier to override from the command line). About the output I have not thought much; maybe I missed something about the benefit of it.
See also Kathie's visualisation tool for pping: https://github.com/pollere/ppviz - haven't played with it yet, but may be worth keeping in mind... |
Yeah, saw that visualization tool from the email conversation you passed along. Haven't tried it out yet either, but it might be useful to keep the same "machine readable" format as her pping does, so one can use that tool with both pping implementations (or at least have it as an option; we might want to output some additional information, like protocol, in the future).
Add a per-flow rate limit, limiting how often new timestamp entries can be created. As part of this, add per-flow state keeping track of when the last timestamp was created and the last seen identifier for each flow. Additionally, remove the timestamp entry as soon as the RTT is calculated, as the last seen identifier is used to find the first unique value instead. Furthermore, remove the packet_timestamp struct and only use __u64 as timestamp, as the used member is no longer needed. This initial commit lacks cleanup of flow state, user configuration of the rate limit, a mechanism to handle bursts, etc. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
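To make the commit's idea concrete, here is a rough sketch of what the per-flow state and rate-limit check could look like. The struct layout, names, and the rate-limit constant are illustrative assumptions based on the commit message, not the actual code:

```c
#include <linux/types.h>
#include <stdbool.h>

#define RATE_LIMIT_NS 100000000UL /* hypothetical: 100 ms between entries */

/* Hypothetical per-flow state, following the commit text */
struct flow_state {
	__u64 last_timestamp; /* when a timestamp entry was last created */
	__u32 last_id;        /* last seen identifier for this flow */
	__u32 reserved;
};

/* Returns true if a new timestamp entry may be created for this flow */
static bool allow_new_timestamp(struct flow_state *f_state, __u64 now)
{
	if (now - f_state->last_timestamp < RATE_LIMIT_NS)
		return false;
	f_state->last_timestamp = now;
	return true;
}
```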
Have pushed a very early implementation of the per-flow rate limitation now. This still lacks many key components that I will add later (see commit message). However, first I plan to update the documentation with some of the outcomes from today's discussion. Also, before going further I think I should attempt the fix @tohojo previously suggested to always load the TC program with libbpf. If I've understood it correctly, the idea is to let the current userspace program (which loads and attaches the XDP program) also load and pin the TC program using libbpf. Then it simply uses tc to attach the pinned BPF program. This way, it should have libbpf support regardless of whether iproute has libbpf support or not, and I won't have to deal with the dual case of iproute with/without libbpf support anymore (which in this commit leads to two definitions of the flow_state map).
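A sketch of what that load-and-pin flow could look like with libbpf; the object file name, program title, and pin path below are hypothetical:

```c
#include <bpf/libbpf.h>

/* Load the tc BPF program with libbpf and pin it, so that tc only needs
 * to attach the pinned program. All names here are illustrative. */
int load_and_pin_tc_prog(void)
{
	struct bpf_object *obj;
	struct bpf_program *prog;

	obj = bpf_object__open_file("pping_kern_tc.o", NULL);
	if (libbpf_get_error(obj))
		return -1;
	if (bpf_object__load(obj))
		return -1;

	prog = bpf_object__find_program_by_title(obj, "pping_egress");
	if (!prog)
		return -1;

	/* tc can then attach it with:
	 * tc filter add dev <iface> egress bpf direct-action \
	 *    pinned /sys/fs/bpf/pping/pping_egress */
	return bpf_program__pin(prog, "/sys/fs/bpf/pping/pping_egress");
}
```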
Update SAMPLING_DESIGN.md, partly based on discussions during the meeting with Red Hat on 2021-03-01. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
pping/pping_kern_tc.c
Outdated
	.nh = { .pos = pctx.data },
};
struct flow_state *f_state;
struct flow_state new_state = { 0 }; // Rarely-ish used, but can't really dynamically allocate memory or?
No, you can't. But allocating this on the heap would be inefficient anyway since it's such a small value :)
pping/pping_kern_tc.c
Outdated
		    BPF_NOEXIST);
	f_state = bpf_map_lookup_elem(&flow_state, &p_id.flow);
	if (!f_state)
		goto end;
The convention in the kernel is generally to use 'out' for these kinds of 'end of function' labels. You're being consistent with 'end', though, and obviously I haven't noticed before, but just thought I'd mention it since I did notice this time ;)
Ok, good to know. Have fixed it in the latest commit.
pping/pping_kern_tc.c
Outdated
#define BURST_DURATION \
	4000000 // 4ms, duration for when it may burst packet timestamps
#define BURST_SIZE \
	3 // Number of packets it may create timestamps for in a burst
Maybe don't define these until you're actually using them?
True, just got a bit ahead of myself.
pping/pping_kern_tc.c
Outdated
 __uint(type, BPF_MAP_TYPE_HASH);
 __uint(key_size, sizeof(struct packet_id));
-__uint(value_size, sizeof(struct packet_timestamp));
+__uint(value_size, sizeof(__u64));
Oh, BTW, when you're using BTF-defined maps instead of value_size and key_size, you can use __type(key, struct packet_id) and __type(value, __u64) (IIRC that's the syntax), which will also encode the actual types into the BTF information that goes into the kernel with the map, which makes it possible for bpftool to print out the types, for instance...
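For illustration, a BTF-defined map using the suggested `__type()` macros could look like this (map name and size are placeholders):

```c
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__type(key, struct packet_id); /* type info ends up in the map's BTF */
	__type(value, __u64);
	__uint(max_entries, 16384);    /* placeholder size */
} packet_ts SEC(".maps");
```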
Right, I remember seeing that somewhere (possibly in the traffic-pacing-edt example), but forgot it when writing it myself. Have addressed it for all maps except the perf-buffer one, which didn't seem too happy about it (guess __u32 might not be the correct type, just the right size).
Load and pin the tc-bpf program in pping.c using libbpf, and only attach the pinned program using iproute. That way, one can use features that are not supported by the old iproute loader, even if iproute does not have libbpf support. To support this change, extend bpf_egress_loader with an option to load a pinned program. Additionally, remove the configure script and parts of the Makefile that are no longer needed. Furthermore, remove multiple definitions of the ts_start map, and place a single definition in pping_helpers.h, which is included by both BPF programs. Also, some minor fixes based on Toke's review. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
pping/pping.c
Outdated
/* return mkdir(path, 0700) ? -errno : 0; */
/* } */
/* return S_ISDIR(st.st_mode) ? 0 : -EEXIST; */
/* } */
Just remove code instead of commenting it out - it can always be resurrected from the git history...
Oops, forgot to clean that up. I often temporarily comment out some code while working on it, but missed removing it before the commit in this case.
pping/pping.c
Outdated
 * Could use bpf_obj__unpin_maps(obj, PINNED_DIR) if it only tried
 * unpinning pinned maps. But as it also attempts (and fails) to unpin
 * maps that aren't pinned, will instead manually unpin the one pinned
 * map for now.
Should be possible to call it with a NULL second argument, in which case it will only try to unpin the maps that are marked as pinned (so as long as it's the same bpf_object you originally pinned the maps on it should just work).
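That would reduce the cleanup to something like this (a sketch, assuming `obj` is the same `bpf_object` the maps were pinned from):

```c
/* With a NULL path, libbpf only unpins the maps it knows are pinned */
int err = bpf_object__unpin_maps(obj, NULL);

if (err)
	fprintf(stderr, "Failed unpinning maps: %s\n", strerror(-err));
```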
Completely missed that possibility. In that case that would be a much nicer solution.
Remove some commented-out code. Also use bpf_object__unpin_maps instead of manually unpinning the ts_start map. Additionally, change map_ipv4_to_ipv6 to use a clearer implementation (which now also works for tc due to always using libbpf to load the program). Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add parsing of TCP FIN/RST to determine if the connection is being closed, and if so delete state directly from the BPF programs. Only enabled on the tc program, as the verifier is unhappy about it on the XDP side for some reason. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Hi @tohojo, have pushed some new commits that attempt to add cleanup of the flow state map. The first commit adds the periodic userspace cleanup, which seems to work. The second commit contains an initial implementation of the BPF side cleanup, which attempts to delete flow state if the packet parsing detects that the flow is being closed (FIN or RST flags). I couldn't get it to work on the XDP side however, as the verifier started complaining then. I'm still not very good at figuring out what the verifier finds problematic, and being tired/frustrated probably doesn't help. When the verifier also started to complain simply from adding a bpf_printk statement that didn't access packet data at all, I decided to give up for the day. So if you have the time and energy, it would be great if you could take a look at it (I have commented out the section I have problems with in `pping_kern_xdp.c`, as well as the printk line in `pping_helpers.h/parse_tcp_identifier()`).
Simon Sundberg <notifications@github.com> writes:
> Hi @tohojo,
> Have pushed some new commits that attempt to add cleanup of the flow
> state map. The first commit adds the periodic userspace cleanup, which
> seems to work. The second commit contains an initial implementation of
> the BPF side cleanup, which attempts to delete flow state if the packet
> parsing detects that the flow is being closed (FIN or RST flags).
> I couldn't get it to work on the XDP side however, as the verifier
> started complaining then. I'm still not very good at figuring out what
> the verifier finds problematic, and being tired/frustrated probably
> doesn't help. When the verifier also started to complain simply from
> adding a bpf_printk statement that didn't access packet data at all I
> decided to give up for the day. So if you have the time and energy, it
> would be great if you could take a look at it (I have commented out
> the section I have problems with in `pping_kern_xdp.c`, as well as the
> printk line in `pping_helpers.h/parse_tcp_identifier()`).
Hmm, that's a really good question! I can see why you're frustrated...
As far as I can tell, the immediate cause of the trouble is that
including the bpf_printk() statement causes the compiler to allocate
registers differently, which means that some of the later code uses the
r7 register instead of the r1 register to store the packet offsets they
are loading from. And this means that the verifier suddenly thinks that
those packet reads are out of bounds.
Not sure if this is a verifier bug, or a compiler bug, or both. Or if
there is some other subtle bug lurking somewhere that just gets
triggered by this. But at least the above is an explanation :)
The bpf_map_delete() by itself seems to work fine for me if I leave only
the bpf_printk commented out...
Thanks for the answer, makes it a bit clearer what's going on at least, even if it doesn't really make it any clearer for me how to fix it.
Interesting, as I get a different error when I try to run with that (without the `bpf_printk`). What I find confusing (other than it suddenly complaining at the code that's parsing the TCP options) is that the specific
The verifier might have rejected the XDP program due to opt_size being loaded from memory, see https://blog.path.net/ebpf-xdp-and-network-security. Add a check of opt_size to attempt to convince the verifier that it's not a negative value or anything else crazy. This instead leads to the verifier thinking the program is too large (over 1m instructions). Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Hi again @tohojo. So my attempt at a fix was to add a little extra check to try and convince the verifier that opt_size can't contain a negative value (and limited the upper bound to 34, as that's the largest legal size for a TCP option as far as I can tell; simply putting it at 0xff would cause the compiler to skip the check as it's already a u8). But then I run into a new problem of the program suddenly being too large (over 1 million instructions). I find this a bit strange, as I don't feel like my program should have that many paths? Think I might somehow have gotten stuck in an infinite loop here, as just after pushing this commit I noticed that if I put the loop back to being unrolled the verifier accepts the program again. So if you have the time, could you check if this fix also causes the verifier to reject the program as being too big for you as well? And if so, any idea why the not-unrolled loop would become an infinite loop (or at least much larger than the unrolled version)? Finally, any idea how I should solve this whole mess?
Add a check to ensure the verifier that opt_size is positive in case it's been read in from the stack. Also enable (uncomment) the flow-state cleanup from the XDP program, as the added check avoids verifier rejection. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
With the last commit (e04284a) I've added a check (or rather a bitwise AND) that seems to work for convincing the verifier that `opt_size` can't be negative. Adding this does however cause the "infinite loop" problem we've talked about (at least for me), so until we've figured that out I've temporarily reverted to unrolling the loop. Will later on squash some of these commits to avoid this whole detour, but figured I'll leave them for now while they're still somewhat relevant to our discussion. Also, as a note to myself, I must figure out what the
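The masking trick mentioned here would look roughly like the following inside the option-parsing loop. The mask value is an assumption for illustration, not necessarily the one used in e04284a:

```c
/* Sketch: an AND gives the verifier a hard range for opt_size that
 * survives even if the value is spilled to the stack, where a plain
 * comparison apparently may not. The 0x3f mask is illustrative. */
opt_size = *(pos + 1);
opt_size &= 0x3f; /* verifier now knows opt_size is in [0, 63] */
if (opt_size < 2) /* malformed option */
	return -1;
```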
I can envision an interesting preface about your friendship with the verifier in your thesis :)
abrunstrom ***@***.***> writes:
> I can envision an interesting preface about your friendship with the
> verifier in your thesis :)
Heh, yeah. I've been beating my head against it for a couple of hours
now; think I'm just going to give up for the time being :/
One thing I did notice while doing this, though, is that I don't think
your code correctly handles the case where opt_size is 0 after you read
it from the packet (or at least it is only doing so incidentally by
exhausting the loop)?
Yeah, you are right about that. While `opt_size` _should_ never be 0, you can of course never know what you get in a packet. So as you say, my code right now would just loop over the same place until it's exhausted the loop. Guess the correct way to handle it would instead be to just terminate the loop immediately.
Simon Sundberg ***@***.***> writes:
> Yeah, you are right about that. While `opt_size` _should_ never be 0,
> you can of course never know what you get in a packet. So as you say,
> my code right now should just loop over the same place until it's
> exhausted the loop. Guess the correct way to handle it would instead
> be to just terminate the loop immediately.
Yup, that works :)
Add a check that opt_size is at least 2 in pping_helpers.h/parse_tcp_ts, otherwise terminate the loop unsuccessfully. Only check the lower bound of opt_size; the upper bound will be checked in the first step of the next loop iteration anyway. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
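Putting this thread together, the resulting option-parsing loop presumably looks roughly like the sketch below. It is reconstructed from the diffs and commit messages in this PR, not the verbatim code; the signature and the `TCP_MAX_OPTIONS` bound are assumptions:

```c
#include <linux/types.h>
#include <linux/tcp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define TCP_MAX_OPTIONS 10 /* illustrative loop bound */

/* Find the TCP timestamp option and extract TSval/TSecr (sketch) */
static __always_inline int parse_tcp_ts(struct tcphdr *tcph, void *data_end,
					__u32 *tsval, __u32 *tsecr)
{
	int len = tcph->doff << 2;
	void *opt_end = (void *)tcph + len;
	__u8 *pos = (__u8 *)(tcph + 1); /* options follow the fixed header */
	__u8 opt, opt_size;
	int i;

	if ((void *)(tcph + 1) > data_end || len <= (int)sizeof(*tcph))
		return -1;

	for (i = 0; i < TCP_MAX_OPTIONS; i++) {
		if ((void *)(pos + 1) > opt_end || (void *)(pos + 1) > data_end)
			return -1;
		opt = *pos;
		if (opt == 0) /* end-of-option-list */
			return -1;
		if (opt == 1) { /* NOP, single byte */
			pos++;
			continue;
		}

		if ((void *)(pos + 2) > opt_end || (void *)(pos + 2) > data_end)
			return -1;
		opt_size = *(pos + 1);
		if (opt_size < 2) /* malformed option: terminate immediately */
			return -1;

		if (opt == 8 && opt_size == 10) { /* timestamp option */
			if ((void *)(pos + 10) > opt_end ||
			    (void *)(pos + 10) > data_end)
				return -1;
			*tsval = bpf_ntohl(*(__u32 *)(pos + 2));
			*tsecr = bpf_ntohl(*(__u32 *)(pos + 6));
			return 0;
		}
		pos += opt_size;
	}
	return -1;
}
```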
tohojo left a comment
A few nits - also, your main() function is getting quite long, may be worth thinking about splitting it. For the BPF code, now that you're compiling and loading everything with libbpf, you could move the two functions into the same source file and avoid some code duplication (and the need for the pping_helpers.h file).
pping/pping.c
Outdated
		duration % NS_PER_SECOND);
#endif
cleanup:
	if (key)
It's perfectly fine to call free() on a NULL value, so you can skip these checks...
pping/pping.c
Outdated
 * Periodically cleans out entries from both the packet timestamp map and the
 * flow state map. Maybe better to split up the cleaning of the maps into two
 * separate threads instead, to better utilize multi-threading and allow for
 * maps to be cleaned up at different intervals?
I very much doubt the map cleaning is going to be a bottleneck, so adding threading is just likely to complicate things. If you do want to optimise this in the future, the right way would be to use the in-kernel map iterator functionality... :)
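For reference, a single-threaded cleanup pass over one of the maps could look like this from userspace. The value layout (a raw `__u64` timestamp), the `struct packet_id` key type from the PR, and the timeout constant are assumptions:

```c
#include <string.h>
#include <bpf/bpf.h>

#define TIMEOUT_NS (10ULL * 1000000000) /* illustrative: 10 s */

/* One cleanup pass over the packet timestamp map (sketch). Note that
 * deleting the key we advance from can make a hash map iteration
 * restart from the beginning; acceptable for a periodic cleanup. */
static void clean_packet_ts_map(int map_fd, __u64 now)
{
	struct packet_id key, next_key;
	__u64 ts;

	memset(&key, 0, sizeof(key));
	while (bpf_map_get_next_key(map_fd, &key, &next_key) == 0) {
		if (bpf_map_lookup_elem(map_fd, &next_key, &ts) == 0 &&
		    now - ts > TIMEOUT_NS)
			bpf_map_delete_elem(map_fd, &next_key);
		key = next_key;
	}
}
```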
 if (opt == 8 && opt_size == 10) {
-	if (pos + opt_size > opt_end ||
-	    pos + opt_size > data_end)
+	if (pos + 10 > opt_end || pos + 10 > data_end)
I think that if you add a check before the loop for opt_end <= data_end, you can avoid these double checks everywhere (i.e., just check against opt_end)...
I think I tried that before but the verifier wasn't convinced. That said, it was quite some time ago and I have changed some things around since then, so I can try that again and see if the verifier is smart enough this time.
Had another try at this, but still can't seem to convince the verifier that if opt_end <= data_end, then anything < opt_end must also be < data_end. So I think the double checks will have to stay for now.
Not sure if it's just that the verifier is too stupid, or if opt_end gets placed on the stack and the verifier then forgets about its limitations.
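For reference, the attempted simplification was along these lines (a sketch of the idea, not code from the PR):

```c
void *opt_end = (void *)tcph + (tcph->doff << 2);

if (opt_end > data_end) /* checked once, before the loop */
	return -1;

/* ...in the hope that each access inside the loop then only needs
 *     if ((void *)(pos + opt_size) > opt_end)
 * but the verifier loses the opt_end <= data_end relation (possibly
 * when opt_end is spilled to the stack), so the double checks stay. */
```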
Merge the pping_kern_tc.c, pping_kern_xdp.c and pping_helpers.h into the single file pping_kern.c. Do not change any of the BPF code, except renaming the map ts_start to packet_ts. To handle both BPF programs kept in a single ELF file, change the loading mechanism to extract and attach both the tc and XDP programs from it. Also refactor the main method into several smaller functions to reduce its size. Finally, add the --force (-f) and --cleanup-interval (-c) options to the argument parsing, and improve the parsing of the --rate-limit (-r) option. NOTE: The verifier rejects the program in its current state as too large (over 1 million instructions). Setting TCP_MAX_OPTIONS in pping_kern.c to 5 (or less) solves this. Unsure at the moment what causes the verifier to think the program is so large, as the code in pping_kern.c is identical to the one from the three files it was merged from. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Hi @tohojo, I think my latest commit addresses most of your points. There is however one major problem with it: the verifier rejects it due to hitting the 1 million instruction limit. I find this very confusing, as the code in pping_kern.c is identical to the code from the three files it was merged from (and setting TCP_MAX_OPTIONS to 5 or less makes it pass).
Will also need to update the documentation a bit to reflect the changes from this commit, as incorrect documentation is worse than no documentation at all. That said, I want to hold off on it for a little while in case we need to do some other restructuring to solve this "infinite loop" (or whatever it is that causes the verifier to think the program is so large) problem.
AFAICT what happens is that the compiler shuffles the code around in a way that includes a backwards jump even though the loop gets unrolled. And so the verifier still needs to trace all the jumps around, can't convince itself that the program exits, and ends up exiting due to the instruction limit. I think we may be at the point where we need to post this on the upstream list and see if someone there knows how the verifier can be made smarter, or if there's a way to convince the compiler to emit code that passes. Do you think you could produce a minimal self-contained example suitable for posting upstream? :)
Makes sense I guess, there are indeed some backward jumps in the bytecode. That said, those backjumps also exist for the 5-loop version, so I guess it just gets too complex for the verifier to track the jumps after 6 loops? It also works fine for the XDP program (if I basically just comment out all the code in the tc program), which should hit the same backward jumps, so not sure what differentiates them that's relevant here.
I can try, but as I'm not sure exactly what's triggering the problem I don't really know how to recreate it in a minimal fashion. I guess some of the key ingredients are two programs in the same file sharing a common function with a loop, but there's also some secret sauce which I don't really know what it is. Is there any way to get the verifier output even when it passes a program? Would be interesting to know how many instructions it exhausts on the XDP program, as well as when only using 5 loops. See if those are anywhere near 1 million (based on the much shorter loading times I suspect not).
Simon Sundberg ***@***.***> writes:
> > AFAICT what happens is that the compiler shuffles the code around in
> > a way that includes a backwards jump even though the loop gets
> > unrolled. And so the verifier still needs to trace all the jumps
> > around, can't convince itself that the program exits, and ends up
> > exiting due to the instruction limit.
>
> Makes sense I guess, there are indeed some backward jumps in the
> bytecode. That said, those backjumps also exist for the 5-loop
> version, so I guess it just gets too complex for the verifier to track
> the jumps after 6 loops? It also works fine for the XDP program (if I
> basically just comment out all the code in the tc program), which
> should hit the same backward jumps, so not sure what differentiates
> them that's relevant here.

Erm, no idea? I guess the context around the call to that function trips
the verifier for some reason?

> > I think we may be at the point where we need to post this on the
> > upstream list and see if someone there knows how the verifier can be
> > made smarter, or if there's a way to convince the compiler to emit
> > code that passes. Do you think you could produce a minimal
> > self-contained example suitable for posting upstream? :)
>
> I can try, but as I'm not sure exactly what's triggering the problem I
> don't really know how to recreate it in a minimal fashion. I guess
> some of the key ingredients are two programs in the same file sharing
> a common function with a loop, but there's also some secret sauce
> which I don't really know what it is.

"Minimal" doesn't have to mean "small". It should be as small as
possible, but no smaller. And it should be self-contained so you can put
it into a single email, which I guess means copying over the helpers
from parsing_helpers.h. So I'd start by doing this, and then delete as
much code as you can while still hitting the bug.

> Is there any way to get the verifier output even when it passes a
> program? Would be interesting to know how many instructions it
> exhausts on the XDP program, as well as when only using 5 loops. See
> if those are anywhere near 1 million (based on the much shorter
> loading times I suspect not).

Sure, the verifier will spit out the log in any case if requested to do
so. The trouble is that libbpf doesn't really expose this in a good way,
so you may have to hack the loading function. IIRC, it's enough to set
log_buf_size to a non-zero value at the top of load_program() in libbpf,
and set log_level in struct bpf_prog_load_attr (which means loading with
bpf_object__load_xattr() or bpf_prog_load_xattr()).
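Concretely, the log-enabled load described above would be something like this (a sketch against the libbpf API of that era; `bpf_object__load_xattr()` has since been deprecated):

```c
struct bpf_object_load_attr load_attr = {
	.obj = obj,     /* bpf_object opened earlier, e.g. with pin_root_path */
	.log_level = 1, /* ask the kernel to emit the verifier log */
};
int err = bpf_object__load_xattr(&load_attr);
```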
Hmm, might toy around a bit with that later, as I guess that means I need to set the map pin-location in some different manner. For now though I simply put some of my bpftrace knowledge to use to fetch it from the verifier's `bpf_verifier_env` struct using a kprobe and kretprobe for `do_check()`.
Simon Sundberg ***@***.***> writes:
> > Sure, the verifier will spit out the log in any case if requested to
> > do so. The trouble is that libbpf doesn't really expose this in a
> > good way, so you may have to hack the loading function. IIRC, it's
> > enough to set log_buf_size to a non-zero value at the top of
> > load_program() in libbpf, and set log_level in struct
> > bpf_prog_load_attr (which means loading with
> > bpf_object__load_xattr() or bpf_prog_load_xattr()).
>
> Hmm, might toy around a bit with that later, as I guess that means I
> need to set the map pin-location in some different manner.

No, you can still open with open_opts() setting pin_root_path, and then
load with load_xattr... bpf_object__load() is literally just:

int bpf_object__load(struct bpf_object *obj)
{
	struct bpf_object_load_attr attr = {
		.obj = obj,
	};

	return bpf_object__load_xattr(&attr);
}

> For now though I simply put some of my bpftrace knowledge to use to
> fetch it from the verifier's `bpf_verifier_env` struct using a kprobe
> and kretprobe for `do_check()`.

That works too, I guess ;)
But for just getting the processed instructions (which was mainly what I was after), my bpftrace script seems to work fine as well. Anyways, seems like even before I merged the two programs into a single file I wasn't too far away from hitting the verifier limit, as if I comment out the entire XDP program (so the file only contains a single program) it still uses about 700k instructions to verify the tc program. So this is not really a new issue, in the sense that I would have hit it before as well if I had just increased my loop count to say 13-15. When both the XDP and tc programs are present, it uses about 500k instructions to verify the XDP program and then hits the 1 million limit for the tc program. And it's quite interesting how commenting out different parts of the code makes a difference to the number of instructions the verifier needs to process. For example, I've noted that if I remove the "switch" logic on saddr and daddr depending on if it's egress or ingress, then the XDP and tc programs will use about the same amount of instructions (whereas with the switch logic the tc program needs about twice the number of instructions processed). If I drop the IPv6 branch it falls down to just 8k for the XDP program and 16k for the tc program.
Simon Sundberg ***@***.***> writes:
> > No, you can still open with open_opts() setting pin_root_path, and
> > then load with load_xattr... bpf_object__load() is literally just:
>
> But for just getting the processed instructions (which was mainly what
> I was after), my bpftrace script seems to work fine as well.

Ah, you just wanted the number of processed instructions? Sure, bpftrace
will probably work better for that :)

> Anyways, seems like even before I merged the two programs into a
> single file I wasn't too far away from hitting the verifier limit, as
> if I comment out the entire XDP program (so the file only contains a
> single program) it still uses about 700k instructions to verify the tc
> program. So this is not really a new issue, in the sense that I would
> have hit it before as well if I had just increased my loop count to
> say 13-15. When both the XDP and tc programs are present, it uses
> about 500k instructions to verify the XDP program and then hits the 1m
> limit for the tc program. And it's quite interesting how commenting
> out different parts of the code makes a difference to the number of
> instructions the verifier needs to process.

Clever! :)

> For example, I've noted that if I remove the "switch" logic on saddr
> and daddr depending on if it's egress or ingress, then the XDP and tc
> programs will use about the same amount of instructions (whereas with
> the switch logic the tc program needs about twice the number of
> instructions processed). If I drop the IPv6 branch it falls down to
> just 8k for the XDP program and 16k for the tc program.

Ah! I guess you're running into some kind of combinatorial explosion
with the loop in skip_ip6hdrext(), then. You could try commenting that
out? Another thing to play with is removing the __always_inline to see
if you can get each function to actually end up as separate functions in
the BPF code. That *should* make the verifier look at each function in
isolation, which might also help :)
Yep, I do indeed seem to be hitting some interesting combinatorial effect. Commenting out or reducing the loop count for `skip_ip6hdrext()` does indeed help.
Unfortunately removing the `__always_inline` didn't seem to do the trick. I have pushed a kinda minimal working example (or not working example I guess) to [the `Create_minimal_example` branch](https://github.com/simosund/bpf-examples/tree/Create_minimal_example/pping_mwe). The question is what to do now? Should we still try to report something upstream, and in that case what? That having multiple (unrolled, non-nested) loops seems to cause the verifier to exhaust a very large amount of processed instructions? And how do we solve it in the meantime? Splitting it up into two separate programs would probably work for now, but based on my testing that still causes the verifier to process a rather large amount of instructions, so I may still run into the same problem later on as I continue to add features to pping. Guess another option would be to lower the loop counts somewhat, as I think it's quite rare that packets would need to exhaust all of them anyway.
noinline does exist. You could try googling for the magical incantation for clang and trying that out?
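For the record, the incantation is just the standard attribute; a sketch of how it might be applied (the signature is illustrative):

```c
/* Keep the helper as a separate BPF-to-BPF function instead of having
 * clang inline it, so the verifier can analyze it in isolation */
static __attribute__((noinline)) int parse_tcp_ts(struct tcphdr *tcph,
						  void *data_end, __u32 *tsval,
						  __u32 *tsecr);
```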
I've now tried `noinline` on the `skip_ip6hdrext()` and `parse_tcp_ts()` functions (the ones containing the loops), and it did help a little bit. But it was not enough to get under the 1 million instruction limit for the tc program (based on the instructions processed for the XDP program I'm guessing it would need about 1.14 million instructions). Could potentially try and apply it to other methods and see then, but not sure if it's a good idea for me to start telling the compiler to stop inlining a bunch of functions (that it would normally like to inline).
Also tried a bit of the opposite, that is telling the compiler to inline everything (`__always_inline`). When the programs are kept in separate files it does inline everything, but when the programs are kept in the same file it calls `parse_packet_identifier()` as an actual function. So was hoping telling the compiler to simply inline it again would get me down below the 1 million limit, but it seems like even when using `__always_inline` the compiler will still generate BPF bytecode where it's called like a function. I've compiled some of the options I've tried and the resulting number of instructions the verifier processed [here](https://github.com/simosund/bpf-examples/tree/Create_minimal_example/pping_mwe) if it is of interest.
After some discussion with my supervisors we decided it would be good if I didn't get stuck on this too long, so thinking of implementing a bit of a quickfix for now and we can eventually come back to this in the future. I'm planning on just decreasing the maximum number of IPv6 header extensions parsed (they are apparently not very common). Some other options would be to split it up into two files again (largely undoing the last commit) or decreasing the maximum number of parsed TCP options, but think those come with more drawbacks. But if you have a different opinion I'm all ears.
Regarding reporting it upstream, I'm not sure if there's anything in particular to report anymore (that the verifier doesn't like multiple loops?). Another less formal alternative could be to just post it on the eBPF Slack channel and see if people have any good ideas on how to get around this.
Simon Sundberg ***@***.***> writes:
> > noinline does exist. You could try googling for the magical
> > incantation for clang and trying that out?
>
> I've now tried `noinline` on the `skip_ip6hdrext()` and
> `parse_tcp_ts()` functions (the ones containing the loops), and it did
> help a little bit. But it was not enough to get under the 1 million
> instruction limit for the tc program (based on the instructions
> processed for the XDP program I'm guessing it would need about 1.14
> million instructions). Could potentially try and apply it to other
> methods and see then, but not sure if it's a good idea for me to start
> telling the compiler to stop inlining a bunch of functions (that it
> would normally like to inline).

Ah, too bad - was hoping that would be a quick fix ;)

> Also tried a bit of the opposite, that is telling the compiler to
> inline everything (`__always_inline`). When the programs are kept in
> separate files it does inline everything, but when the programs are
> kept in the same file it calls `parse_packet_identifier()` as an
> actual function. So was hoping telling the compiler to simply inline
> it again would get me down below the 1 million limit, but it seems
> like even when using `__always_inline` the compiler will still
> generate BPF bytecode where it's called like a function. I've compiled
> some of the options I've tried and the resulting number of
> instructions the verifier processed
> [here](https://github.com/simosund/bpf-examples/tree/Create_minimal_example/pping_mwe)
> if it is of interest.

Wow, nice writeup! :)
Adding the kernel and compiler versions you're testing this on is the
only thing I can see that's missing for this to be a useful reference
for asking people; see below.

> After some discussion with my supervisors we decided it would be good
> if I didn't get stuck on this too long, so thinking of implementing a
> bit of a quickfix for now and we can eventually come back to this in
> the future. I'm planning on just decreasing the maximum number of IPv6
> header extensions parsed (they are apparently not very common). Some
> other options would be to split it up into two files again (largely
> undoing the last commit) or decreasing the maximum number of parsed
> TCP options, but think those come with more drawbacks. But if you have
> a different opinion I'm all ears.

Totally agreed, don't get held up on this; just lower one or both loop
counts and carry on!

> Regarding reporting it upstream, I'm not sure if there's anything in
> particular to report anymore (that the verifier doesn't like multiple
> loops?). Another less formal alternative could be to just post it on
> the eBPF Slack channel and see if people have any good ideas on how to
> get around this.

Given that you've already created such a comprehensive writeup, I
definitely think you should at least ask upstream! You could do this
either on Slack or on the mailing list; or both? On Slack you can
probably get away with just putting in the link above, but on the
mailing list you probably need to do a little bit of a writeup. You
could just edit the existing README file a bit and turn it into an
email, then send that to ***@***.*** and
***@***.***
I guess posting the link on Slack is easiest, so try that first, then
send the email afterwards if you don't get any replies? :)
Nice summary Simon! I agree, send it on the Slack channel and let's move on to the output :)
Reduce IPV6_EXT_MAX_CHAIN to 3 to avoid hitting the verifier limit of processing 1 million instructions. This results in fewer loops in parsing_helpers.h/skip_ip6hdrext, which simplifies the verifier analysis. IPv6 extension headers do not appear to be that common, so this is unlikely to cause a considerable limitation. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
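For context, the loop in question is roughly the following (a simplified sketch based on the xdp-tutorial parsing helpers, not the exact file). Each unrolled iteration multiplies the number of paths the verifier must walk, so lowering the bound directly cuts the processed-instruction count:

```c
#include <linux/in.h>
#include <linux/ipv6.h>
#include <bpf/bpf_helpers.h>

#define IPV6_EXT_MAX_CHAIN 3 /* reduced by this commit */

struct hdr_cursor {
	void *pos; /* current parsing position */
};

/* Skip IPv6 extension headers, returning the upper-layer protocol (sketch) */
static __always_inline int skip_ip6hdrext(struct hdr_cursor *nh,
					  void *data_end, __u8 next_hdr_type)
{
	int i;

	for (i = 0; i < IPV6_EXT_MAX_CHAIN; i++) {
		struct ipv6_opt_hdr *hdr = nh->pos;

		if ((void *)(hdr + 1) > data_end)
			return -1;

		switch (next_hdr_type) {
		case IPPROTO_HOPOPTS:
		case IPPROTO_DSTOPTS:
		case IPPROTO_ROUTING:
		case IPPROTO_MH:
			nh->pos = (char *)hdr + (hdr->hdrlen + 1) * 8;
			next_hdr_type = hdr->nexthdr;
			break;
		case IPPROTO_AH:
			nh->pos = (char *)hdr + (hdr->hdrlen + 2) * 4;
			next_hdr_type = hdr->nexthdr;
			break;
		case IPPROTO_FRAGMENT:
			nh->pos = (char *)hdr + 8;
			next_hdr_type = hdr->nexthdr;
			break;
		default:
			return next_hdr_type; /* not an extension header */
		}
	}
	return -1;
}
```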
Update documentation to reflect the current state of pping (after merging pping_kern_tc and pping_kern_xdp into a single file). Also add another point to the TODO list that has been discussed at a previous meeting. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Hi again @tohojo. I've now reduced the number of IPv6 extensions parsed, so that the program loads again. So if you could have a final look at it we could perhaps get it merged in. I think I've addressed all of your requests (minus the double checks for opt_end and data_end, which I can't seem to manage without), but there could of course be something more to do as well. Now that we load both the BPF programs from the same object file, I guess we don't need to pin the maps anymore? So guess I could also remove the code related to that (not that much, since libbpf takes care of most of it).
Yeah, good point about the pinning! I agree, let's get rid of it. We may need to bring it back later for when you enable background monitoring (in that case there won't necessarily have to be a long-running process at all if a cron job can just ping the maps), but for now it's just complicating stuff and incurring a dependency on bpffs... Couldn't spot anything else to complain about, so I'm fine with merging that once you've removed the pinning :)
When both BPF programs are kept in the same file, no longer need to pin the maps in order to share them between the programs. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Have now removed the code related to pinning the maps (as stated, not many lines). That said, we don't really get rid of the dependency on bpffs, as we still pin the tc program (so we don't need to rely on tc loading the program, only attaching it).
Right, but libbpf will hopefully gain support for attaching the tc progs soon, so that can go away as well :)
After some thinking and a bit of discussion with @abrunstrom and @sferlin, I've found that there are a number of potential issues that need to be solved in order to get sampling to work well. Before I start actually implementing the sampling, I would like to get some feedback on whether my plan for how to do the sampling makes sense. Therefore I've started documenting some of the challenges and how I plan to solve them. And instead of sending a bunch of mails back and forth, I figured we could simply have that discussion in this PR instead.
So this PR doesn't contain any code changes (yet), only a markdown document (SAMPLING_DESIGN.md) which I hope can serve as the basis for some discussion. I still have some sections left which I plan to fill in tomorrow, and then we can maybe discuss some of these issues at the next meeting.