From 2fa275a4639224577bce34a4cdf550102ba86ca2 Mon Sep 17 00:00:00 2001 From: liycheng Date: Mon, 13 Jan 2025 15:22:50 -0800 Subject: [PATCH 1/2] Update 2024-12-31-Request-delay-200ms-becauseof-MTU-setting-in-IPVS-Loadbalancer-with-IPIP-forwarding-method.md --- ...ting-in-IPVS-Loadbalancer-with-IPIP-forwarding-method.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/_posts/2024-12-31-Request-delay-200ms-becauseof-MTU-setting-in-IPVS-Loadbalancer-with-IPIP-forwarding-method.md b/_posts/2024-12-31-Request-delay-200ms-becauseof-MTU-setting-in-IPVS-Loadbalancer-with-IPIP-forwarding-method.md index 1f62315..62475d5 100644 --- a/_posts/2024-12-31-Request-delay-200ms-becauseof-MTU-setting-in-IPVS-Loadbalancer-with-IPIP-forwarding-method.md +++ b/_posts/2024-12-31-Request-delay-200ms-becauseof-MTU-setting-in-IPVS-Loadbalancer-with-IPIP-forwarding-method.md @@ -58,7 +58,7 @@ This is the tcpdump file captured from the 'lo' and 'eth0' interfaces on the cli - Packet number 259044 is the first packet sent from 10.0.0.1 to 10.0.0.2. The timestamp is 09:05:53.638466, indicating a delay of approximately 207.76 ms compared to the timestamp 09:05:53.430710 of packet number 257858. -But the strange thing is, there was another request almost at the same time, but it exhibited different behavior. +The strange thing is that another request occurred at almost the same time, yet it exhibited different behavior. ![](/assets/2024-12-31-TCPDUMP-request-A.png) @@ -240,7 +240,7 @@ Let's trace the kernel stack of the function icmp_send(skb, ICMP_DEST_UNREACH, I From the call stack, we can see that it checks the route MTU and attempts to perform IP fragmentation. If the DF (Don't Fragment) bit is set in the IP header, the kernel will send an ICMP packet with type 3 (ICMP_DEST_UNREACH) and code 4 (ICMP_FRAG_NEEDED). -There were two requests occurring almost simultaneously. 
The first request triggers the ICMP Destination Unreachable packet from the load balancer, which then changes the route MTU to 1480. The second request performs MSS (Maximum Segment Size) negotiation based on an MTU of 1500, but the route MTU has already been changed to 1480. Thus, when the packet length exceeds 1500, an ICMP packet with type 3 (ICMP_DEST_UNREACH) and code 4 (ICMP_FRAG_NEEDED) is sent from the kernel with the source IP equal to the destination IP. +There were two requests occurring almost simultaneously. The request A triggers the ICMP Destination Unreachable packet from the load balancer, which then changes the route MTU to 1480. The request B also performs MSS (Maximum Segment Size) negotiation based on an MTU of 1500(while the route MTU has already been changed to 1480 because of request A). Thus, when the packet length exceeds 1500, an ICMP packet with type 3 (ICMP_DEST_UNREACH) and code 4 (ICMP_FRAG_NEEDED) is sent from the kernel with the source IP equal to the destination IP. If we examine the statistics in the Linux kernel using netstat, we can see some relevant data: ``` @@ -333,7 +333,7 @@ Upon further tracing, in the function tcp_v4_err(), the packet is handled by tcp tcp_v4_mtu_reduced() is invoked when the socket is released by release_sock(). In tcp_v4_mtu_reduced(), tcp_simple_retransmit() is called [tcp_simple_retransmit()](https://elixir.bootlin.com/linux/v5.15.126/source/net/ipv4/tcp_ipv4.c#L372), but the packet is not sent out, so TCP retransmission is not triggered in this scenario, unlike request A [tcp_input.c](https://elixir.bootlin.com/linux/v5.15.126/source/net/ipv4/tcp_input.c#L2770). -Upon further tracing, in this condition, the packet is sent again during the handling of the TCP probe timer, resulting in a delay of more than 200ms. 
+In such scenario, the packet, from the request B, instead be sent in TCP retransmissionis triggered by ICMP, it is sent again during the handling of the TCP probe timer, resulting in a delay of more than 200ms. ``` From 76b452b7e6cf252e66f64032e096a8f1c8666f9e Mon Sep 17 00:00:00 2001 From: liycheng Date: Mon, 13 Jan 2025 15:29:34 -0800 Subject: [PATCH 2/2] Update 2024-12-31-Request-delay-200ms-becauseof-MTU-setting-in-IPVS-Loadbalancer-with-IPIP-forwarding-method.md --- ...ng-in-IPVS-Loadbalancer-with-IPIP-forwarding-method.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/_posts/2024-12-31-Request-delay-200ms-becauseof-MTU-setting-in-IPVS-Loadbalancer-with-IPIP-forwarding-method.md b/_posts/2024-12-31-Request-delay-200ms-becauseof-MTU-setting-in-IPVS-Loadbalancer-with-IPIP-forwarding-method.md index 62475d5..de5573b 100644 --- a/_posts/2024-12-31-Request-delay-200ms-becauseof-MTU-setting-in-IPVS-Loadbalancer-with-IPIP-forwarding-method.md +++ b/_posts/2024-12-31-Request-delay-200ms-becauseof-MTU-setting-in-IPVS-Loadbalancer-with-IPIP-forwarding-method.md @@ -240,7 +240,7 @@ Let's trace the kernel stack of the function icmp_send(skb, ICMP_DEST_UNREACH, I From the call stack, we can see that it checks the route MTU and attempts to perform IP fragmentation. If the DF (Don't Fragment) bit is set in the IP header, the kernel will send an ICMP packet with type 3 (ICMP_DEST_UNREACH) and code 4 (ICMP_FRAG_NEEDED). -There were two requests occurring almost simultaneously. The request A triggers the ICMP Destination Unreachable packet from the load balancer, which then changes the route MTU to 1480. The request B also performs MSS (Maximum Segment Size) negotiation based on an MTU of 1500(while the route MTU has already been changed to 1480 because of request A). 
Thus, when the packet length exceeds 1500, an ICMP packet with type 3 (ICMP_DEST_UNREACH) and code 4 (ICMP_FRAG_NEEDED) is sent from the kernel with the source IP equal to the destination IP. +There were two requests occurring almost simultaneously. Request A triggers the ICMP Destination Unreachable packet from the load balancer, which changes the route MTU from 1500 to 1480. Request B also performs MSS (Maximum Segment Size) negotiation based on an MTU of 1500 (while the route MTU has already been changed to 1480 because of request A). Since the packet length exceeds 1500, it triggers the ICMP Destination Unreachable packet as well. If we examine the statistics in the Linux kernel using netstat, we can see some relevant data: ``` @@ -281,9 +281,9 @@ OR This is based on the held status of the socket [tcp_ipv4.c](https://elixir.bootlin.com/linux/v5.15.126/source/net/ipv4/tcp_ipv4.c#L554) -From the tcpdump, request A triggers TCP retransmission, which helps reduce delay duration. However, request B does not trigger TCP retransmission, resulting in a delay of more than 200ms. +From the tcpdump, request A triggers TCP retransmission, which helps reduce the delay. However, request B does not trigger TCP retransmission, even though it triggers the ICMP Destination Unreachable packet just as request A does; this results in a delay of more than 200ms for request B. 
-To understand how the ICMP packet triggered by request B is handled and why it experiences a delay of over 200ms, we need to examine how this ICMP packet is processed in the Linux kernel: +To understand how the ICMP packet triggered by request B is handled and why it experiences a delay of over 200ms (no TCP retransmission), we need to examine how this ICMP packet is processed in the Linux kernel: ``` b'tcp_v4_err+0x1' @@ -333,7 +333,7 @@ Upon further tracing, in the function tcp_v4_err(), the packet is handled by tcp tcp_v4_mtu_reduced() is invoked when the socket is released by release_sock(). In tcp_v4_mtu_reduced(), tcp_simple_retransmit() is called [tcp_simple_retransmit()](https://elixir.bootlin.com/linux/v5.15.126/source/net/ipv4/tcp_ipv4.c#L372), but the packet is not sent out, so TCP retransmission is not triggered in this scenario, unlike request A [tcp_input.c](https://elixir.bootlin.com/linux/v5.15.126/source/net/ipv4/tcp_input.c#L2770). -In such scenario, the packet, from the request B, instead be sent in TCP retransmissionis triggered by ICMP, it is sent again during the handling of the TCP probe timer, resulting in a delay of more than 200ms. +In this scenario, the packet from request B is not resent by a TCP retransmission triggered by "ICMP Destination Unreachable"; instead, it is sent again during the handling of the TCP probe timer, resulting in a delay of more than 200ms. ```