
XDP benchmark baseline

Record some baseline benchmarks for XDP, and describe the hardware that Toke and Jesper have.

What kind of benchmarks

XDP_DROP

Parameters:
  • Different NICs
  • Touching (reading) packet data before drop vs. not touching it
  • Single RX-queue performance
  • Multi RX-queue performance scaling

Q: Will a packet size test make sense?

Since we are already saturating the PCI bus, I don’t think this is needed.
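
As a sanity check on that claim, here is a small sketch computing the theoretical maximum packet rate for minimum-size frames on a 100G link (standard Ethernet framing overhead of preamble, SFD and inter-frame gap assumed):

```python
# Theoretical max packet rate for minimum-size (64-byte) frames.
# On the wire each frame costs 7B preamble + 1B SFD + 64B frame + 12B inter-frame gap.
def max_pps(link_bps, frame_bytes=64):
    wire_bytes = frame_bytes + 7 + 1 + 12
    return link_bps / (wire_bytes * 8)

print(f"{max_pps(100e9) / 1e6:.1f} Mpps")  # ~148.8 Mpps at 100 Gbit/s
```

Against the ~148.8 Mpps theoretical line rate, the ~99 Mpps the generator achieves (and the ~88 Mpps the DUT actually receives) show the bottleneck sits below Ethernet line rate, consistent with PCIe saturation.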

Q: Should we compare against ‘iptables -t raw -j DROP’ ?

Yes, this makes sense; and against DPDK.

XDP_TX

TODO: Describe in the paper how XDP_TX actually achieves bulking, by delaying the tail-pointer/doorbell update (until the driver exits its NAPI poll call).

XDP_PASS

Idea: We could measure the overhead XDP introduces by comparing against iptables-raw drop?

Yes, this makes sense: touch the packet in XDP, then pass it to iptables-raw.

XDP_REDIRECT

The redirect needs a separate benchmark document.

file://bench05_xdp_redirect.org

Hardware: Jesper

DUT (Device Under Test):
  • CPU: E5-1650 v4 @ 3.60GHz

Jesper has more types of NICs.

Hardware: Toke

  • DUT:
    • model name : Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz

Benchmarks: Jesper

t-rex single stream

(Kernel: 4.17.0-rc6-bpf-next-rm-ndo-flush+ #24 SMP PREEMPT)

trex-console command:

start -f stl/udp_for_benchmarks.py -t packet_len=64 --port 0 -m 100%
-Per port stats table 
      ports |               0 |               1 
 -----------------------------------------------------------------------------------------
   opackets |    111670205505 |               0 
     obytes |   7146893156880 |               0 
   ipackets |             184 |             116 
     ibytes |           38924 |            7564 
    ierrors |               0 |               0 
    oerrors |               0 |               0 
      Tx Bw |      50.64 Gbps |       0.00  bps 

-Global stats enabled 
 Cpu Utilization : 99.4  %  17.0 Gb/core 
 Platform_factor : 1.0  
 Total-Tx        :      50.64 Gbps  
 Total-Rx        :       0.00  bps  
 Total-PPS       :      98.92 Mpps  
 Total-CPS       :       0.00  cps  

 Expected-PPS    :       0.00  pps  
 Expected-CPS    :       0.00  cps  
 Expected-BPS    :       0.00  bps  

 Active-flows    :        0  Clients :        0   Socket-util : 0.0000 %    
 Open-flows      :        0  Servers :        0   Socket :        0 Socket/Clients :  -nan 
 Total_queue_full : 658688774         
 drop-rate       :      50.64 Gbps   
 current time    : 1922.0 sec  
 test duration   : 0.0 sec  

Max XDP_DROP dropping speed single RX queue.

[jbrouer@broadwell kernel-bpf-samples]
$ sudo ./xdp_rxq_info --dev mlx5p1 --action XDP_DROP --sec 3

Running XDP on dev:mlx5p1 (ifindex:8) action:XDP_DROP
XDP stats       CPU     pps         issue-pps  
XDP-RX CPU      3       25,928,270  0          
XDP-RX CPU      total   25,928,270 

RXQ stats       RXQ:CPU pps         issue-pps  
rx_queue_index    3:3   25,928,274  0          
rx_queue_index    3:sum 25,928,274 
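
The single-queue number implies a tight per-packet budget; a small sketch (assuming the CPU runs at its 3.6 GHz base clock, ignoring turbo):

```python
# Per-packet time budget implied by 25.9 Mpps XDP_DROP on a single core.
pps = 25_928_270
cpu_hz = 3.6e9  # E5-1650 v4 base clock; turbo ignored (assumption)
print(f"{1e9 / pps:.1f} ns/packet")         # ~38.6 ns per packet
print(f"{cpu_hz / pps:.0f} cycles/packet")  # ~139 cycles per packet
```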

t-rex testing many streams “clients”

(Kernel: 4.17.0-rc6-bpf-next-rm-ndo-flush+ #24 SMP PREEMPT)

trex-console command:

start -f stl/udp_1pkt_range_clients.py -t packet_len=64 --port 0 -m 100%

Trex performance 88.20 Mpps

-Per port stats table 
      ports |               0 |               1 
 -----------------------------------------------------------------------------------------
   opackets |    130357779874 |               0 
     obytes |   8342897911936 |               0 
   ipackets |             234 |             143 
     ibytes |           49960 |            9292 
    ierrors |               0 |               0 
    oerrors |               0 |               0 
      Tx Bw |      45.16 Gbps |       0.00  bps 

-Global stats enabled 
 Cpu Utilization : 100.0  %  15.1 Gb/core 
 Platform_factor : 1.0  
 Total-Tx        :      45.16 Gbps  
 Total-Rx        :       1.35 Kbps  
 Total-PPS       :      88.20 Mpps  
 Total-CPS       :       0.00  cps  

 Expected-PPS    :       0.00  pps  
 Expected-CPS    :       0.00  cps  
 Expected-BPS    :       0.00  bps  

 Active-flows    :        0  Clients :        0   Socket-util : 0.0000 %    
 Open-flows      :        0  Servers :        0   Socket :        0 Socket/Clients :  -nan 
 Total_queue_full : 1091860676         
 drop-rate       :      45.16 Gbps   
 current time    : 2248.9 sec  
 test duration   : 0.0 sec  

XDP_DROP results: total 75,297,461 pps, and approx 12Mpps per RX queue.

[jbrouer@broadwell kernel-bpf-samples]$ sudo ./xdp_rxq_info --dev mlx5p1 --action XDP_DROP --sec 3

Running XDP on dev:mlx5p1 (ifindex:8) action:XDP_DROP
XDP stats       CPU     pps         issue-pps  
XDP-RX CPU      0       12,617,796  0          
XDP-RX CPU      1       13,106,530  0          
XDP-RX CPU      2       12,499,630  0          
XDP-RX CPU      3       12,276,195  0          
XDP-RX CPU      4       12,528,915  0          
XDP-RX CPU      5       12,268,394  0          
XDP-RX CPU      total   75,297,461 

RXQ stats       RXQ:CPU pps         issue-pps  
rx_queue_index    0:0   12,617,796  0          
rx_queue_index    0:sum 12,617,796 
rx_queue_index    1:1   13,106,511  0          
rx_queue_index    1:sum 13,106,511 
rx_queue_index    2:2   12,499,589  0          
rx_queue_index    2:sum 12,499,589 
rx_queue_index    3:3   12,276,230  0          
rx_queue_index    3:sum 12,276,230 
rx_queue_index    4:4   12,528,917  0          
rx_queue_index    4:sum 12,528,917 
rx_queue_index    5:5   12,268,394  0          
rx_queue_index    5:sum 12,268,394 

The issue is that the CPUs have approx. 40% idle cycles.

Show adapter(s) (ixgbe1 ixgbe2 mlx5p1 i40e1 i40e2) statistics (ONLY that changed!)
Ethtool(mlx5p1  ) stat:            0 (              0) <= outbound_pci_stalled_wr /sec
Ethtool(mlx5p1  ) stat:     12413351 (     12,413,351) <= rx0_cache_reuse /sec
Ethtool(mlx5p1  ) stat:     12413330 (     12,413,330) <= rx0_xdp_drop /sec
Ethtool(mlx5p1  ) stat:     12936295 (     12,936,295) <= rx1_cache_reuse /sec
Ethtool(mlx5p1  ) stat:     12936306 (     12,936,306) <= rx1_xdp_drop /sec
Ethtool(mlx5p1  ) stat:     12331680 (     12,331,680) <= rx2_cache_reuse /sec
Ethtool(mlx5p1  ) stat:     12331690 (     12,331,690) <= rx2_xdp_drop /sec
Ethtool(mlx5p1  ) stat:     12089538 (     12,089,538) <= rx3_cache_reuse /sec
Ethtool(mlx5p1  ) stat:     12089538 (     12,089,538) <= rx3_xdp_drop /sec
Ethtool(mlx5p1  ) stat:     12359246 (     12,359,246) <= rx4_cache_reuse /sec
Ethtool(mlx5p1  ) stat:     12359234 (     12,359,234) <= rx4_xdp_drop /sec
Ethtool(mlx5p1  ) stat:     12065542 (     12,065,542) <= rx5_cache_reuse /sec
Ethtool(mlx5p1  ) stat:     12065542 (     12,065,542) <= rx5_xdp_drop /sec
Ethtool(mlx5p1  ) stat:     88048303 (     88,048,303) <= rx_64_bytes_phy /sec
Ethtool(mlx5p1  ) stat:   5635163922 (  5,635,163,922) <= rx_bytes_phy /sec
Ethtool(mlx5p1  ) stat:     74194562 (     74,194,562) <= rx_cache_reuse /sec
Ethtool(mlx5p1  ) stat:     13760980 (     13,760,980) <= rx_discards_phy /sec
Ethtool(mlx5p1  ) stat:        93624 (         93,624) <= rx_out_of_buffer /sec
Ethtool(mlx5p1  ) stat:     88049446 (     88,049,446) <= rx_packets_phy /sec
Ethtool(mlx5p1  ) stat:   5635015858 (  5,635,015,858) <= rx_prio0_bytes /sec
Ethtool(mlx5p1  ) stat:     74287687 (     74,287,687) <= rx_prio0_packets /sec
Ethtool(mlx5p1  ) stat:   4457316248 (  4,457,316,248) <= rx_vport_unicast_bytes /sec
Ethtool(mlx5p1  ) stat:     74288600 (     74,288,600) <= rx_vport_unicast_packets /sec
Ethtool(mlx5p1  ) stat:     74194573 (     74,194,573) <= rx_xdp_drop /sec

10:26:26 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft    %idle
10:26:28 PM  all    0.00    0.00    0.08    0.00    1.26   59.61    39.04
10:26:28 PM    0    0.00    0.00    0.00    0.00    1.52   59.60    38.89
10:26:28 PM    1    0.00    0.00    0.00    0.00    1.50   61.50    37.00
10:26:28 PM    2    0.00    0.00    0.00    0.00    1.02   59.90    39.09
10:26:28 PM    3    0.00    0.00    0.00    0.00    1.51   58.29    40.20
10:26:28 PM    4    0.00    0.00    0.00    0.00    1.02   59.90    39.09
10:26:28 PM    5    0.00    0.00    0.00    0.00    1.02   59.18    39.80

10:26:26 PM  CPU    intr/s
10:26:28 PM  all 224677.50
10:26:28 PM    0 246317.00
10:26:28 PM    1 254161.00
10:26:28 PM    2 244789.50
10:26:28 PM    3 241976.00
10:26:28 PM    4 244768.50
10:26:28 PM    5 240606.00

Looking at the NAPI bulking, it is clear that the NAPI poll sometimes completes with less than 64 packets.

[jbrouer@broadwell prototype-kernel-bpf]$ sudo ./napi_monitor

NAPI RX bulking (measurement period: 2.000216)
bulk[00]	614	(           0 pps)
bulk[01]	775	(         387 pps)
bulk[02]	1361	(       1,361 pps)
bulk[03]	1353	(       2,029 pps)
bulk[04]	1794	(       3,588 pps)
bulk[05]	1965	(       4,912 pps)
bulk[06]	3681	(      11,042 pps)
bulk[07]	2607	(       9,124 pps)
bulk[08]	5051	(      20,202 pps)
bulk[09]	3222	(      14,497 pps)
bulk[10]	3556	(      17,778 pps)
bulk[11]	3586	(      19,721 pps)
bulk[12]	4118	(      24,705 pps)
bulk[13]	4024	(      26,153 pps)
bulk[14]	8025	(      56,169 pps)
bulk[15]	4744	(      35,576 pps)
bulk[16]	6937	(      55,490 pps)
bulk[17]	5301	(      45,054 pps)
bulk[18]	5841	(      52,563 pps)
bulk[19]	5457	(      51,836 pps)
bulk[20]	9812	(      98,109 pps)
bulk[21]	5502	(      57,765 pps)
bulk[22]	11503	(     126,519 pps)
bulk[23]	5710	(      65,658 pps)
bulk[24]	6488	(      77,848 pps)
bulk[25]	5735	(      71,680 pps)
bulk[26]	6745	(      87,676 pps)
bulk[27]	5805	(      78,359 pps)
bulk[28]	13623	(     190,701 pps)
bulk[29]	6440	(      93,370 pps)
bulk[30]	11199	(     167,967 pps)
bulk[31]	6804	(     105,451 pps)
bulk[32]	7566	(     121,043 pps)
bulk[33]	7002	(     115,521 pps)
bulk[34]	11034	(     187,558 pps)
bulk[35]	7053	(     123,414 pps)
bulk[36]	13220	(     237,934 pps)
bulk[37]	7036	(     130,152 pps)
bulk[38]	7932	(     150,692 pps)
bulk[39]	7220	(     140,775 pps)
bulk[40]	8610	(     172,181 pps)
bulk[41]	7374	(     151,151 pps)
bulk[42]	17750	(     372,710 pps)
bulk[43]	7703	(     165,597 pps)
bulk[44]	15153	(     333,330 pps)
bulk[45]	7931	(     178,428 pps)
bulk[46]	8739	(     200,975 pps)
bulk[47]	7923	(     186,170 pps)
bulk[48]	10461	(     251,037 pps)
bulk[49]	7989	(     195,709 pps)
bulk[50]	13136	(     328,365 pps)
bulk[51]	7983	(     203,545 pps)
bulk[52]	8710	(     226,436 pps)
bulk[53]	7980	(     211,447 pps)
bulk[54]	9153	(     247,104 pps)
bulk[55]	7931	(     218,079 pps)
bulk[56]	18446	(     516,432 pps)
bulk[57]	7919	(     225,667 pps)
bulk[58]	16643	(     482,595 pps)
bulk[59]	7759	(     228,866 pps)
bulk[60]	8778	(     263,312 pps)
bulk[61]	7735	(     235,892 pps)
bulk[62]	9413	(     291,772 pps)
bulk[63]	7707	(     242,744 pps)
bulk[64]	2077468	(  66,471,811 pps)
NAPI-from-idle,	2529350	average bulk	59.00	(  74,768,110 pps) bulk0=600
NAPI-ksoftirqd,	24485	average bulk	58.00	(     713,623 pps) bulk0=14

System global SOFTIRQ stats:
 SOFTIRQ_NET_RX/sec	enter:1276773/s	exit:1276773/s	raise:1276770/s
 SOFTIRQ_NET_TX/sec	enter:0/s	exit:0/s	raise:0/s
 SOFTIRQ_TIMER/sec	enter:3856/s	exit:3856/s	raise:3795/s
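
The “average bulk” numbers above are just the weighted mean of the bulk-size histogram; a minimal sketch (the bin counts below are an illustrative subset of the run above, so the result only approximates the reported 59):

```python
def average_bulk(hist):
    """Weighted mean of a NAPI bulk-size histogram {bulk_size: count}."""
    return sum(n * c for n, c in hist.items()) / sum(hist.values())

# Illustrative subset of bins from the napi_monitor output above:
sample = {1: 775, 16: 6937, 32: 7566, 48: 10461, 64: 2077468}
print(f"average bulk: {average_bulk(sample):.1f}")
```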

I captured an ethtool stats snapshot showing an unusual counter called “outbound_pci_stalled_wr” with a small nonzero value. The PHY counters show the maximum rate the generator is outputting.

Show adapter(s) (mlx5p1) statistics (ONLY that changed!)
Ethtool(mlx5p1  ) stat:            4 (              4) <= outbound_pci_stalled_wr /sec
Ethtool(mlx5p1  ) stat:     12602448 (     12,602,448) <= rx0_cache_reuse /sec
Ethtool(mlx5p1  ) stat:     12602431 (     12,602,431) <= rx0_xdp_drop /sec
Ethtool(mlx5p1  ) stat:     13091898 (     13,091,898) <= rx1_cache_reuse /sec
Ethtool(mlx5p1  ) stat:     13091898 (     13,091,898) <= rx1_xdp_drop /sec
Ethtool(mlx5p1  ) stat:     12485274 (     12,485,274) <= rx2_cache_reuse /sec
Ethtool(mlx5p1  ) stat:     12485388 (     12,485,388) <= rx2_xdp_drop /sec
Ethtool(mlx5p1  ) stat:     12267209 (     12,267,209) <= rx3_cache_reuse /sec
Ethtool(mlx5p1  ) stat:     12267201 (     12,267,201) <= rx3_xdp_drop /sec
Ethtool(mlx5p1  ) stat:     12506807 (     12,506,807) <= rx4_cache_reuse /sec
Ethtool(mlx5p1  ) stat:     12507044 (     12,507,044) <= rx4_xdp_drop /sec
Ethtool(mlx5p1  ) stat:     12252285 (     12,252,285) <= rx5_cache_reuse /sec
Ethtool(mlx5p1  ) stat:     12252221 (     12,252,221) <= rx5_xdp_drop /sec
Ethtool(mlx5p1  ) stat:     88295187 (     88,295,187) <= rx_64_bytes_phy /sec
Ethtool(mlx5p1  ) stat:   5650856237 (  5,650,856,237) <= rx_bytes_phy /sec
Ethtool(mlx5p1  ) stat:     75214073 (     75,214,073) <= rx_cache_reuse /sec
Ethtool(mlx5p1  ) stat:     13066088 (     13,066,088) <= rx_discards_phy /sec
Ethtool(mlx5p1  ) stat:        10106 (         10,106) <= rx_out_of_buffer /sec
Ethtool(mlx5p1  ) stat:     88294650 (     88,294,650) <= rx_packets_phy /sec
Ethtool(mlx5p1  ) stat:   5650511306 (  5,650,511,306) <= rx_prio0_bytes /sec
Ethtool(mlx5p1  ) stat:     75224002 (     75,224,002) <= rx_prio0_packets /sec
Ethtool(mlx5p1  ) stat:   4513478678 (  4,513,478,678) <= rx_vport_unicast_bytes /sec
Ethtool(mlx5p1  ) stat:     75224475 (     75,224,475) <= rx_vport_unicast_packets /sec
Ethtool(mlx5p1  ) stat:     75214086 (     75,214,086) <= rx_xdp_drop /sec

REDIRECT t-rex many streams “clients”

(Kernel: 4.17.0-rc6-bpf-next-rm-ndo-flush+ #24 SMP PREEMPT)

Redirect: ingress mlx5p1 redirect egress i40e1: 30,493,921 pps

$ sudo ./xdp_redirect_map $(</sys/class/net/mlx5p1/ifindex) $(</sys/class/net/i40e1/ifindex)
input: 8 output: 4
map[0] (vports) = 4, map[1] (map) = 5, map[2] (count) = 0
ifindex 4:   40192251 pkt/s
ifindex 4:   30493614 pkt/s
ifindex 4:   30493921 pkt/s
ifindex 4:   30490341 pkt/s
ifindex 4:   30495391 pkt/s
ifindex 4:   30498160 pkt/s
XDP-event       CPU:to  pps          drop-pps     extra-info
XDP_REDIRECT    total   0            0            Error
cpumap-kthread  total   0            0            0          
devmap-xmit     0       4,927,675    0            16.00      bulk-average 
devmap-xmit     1       4,986,185    0            16.00      bulk-average 
devmap-xmit     2       5,044,664    0            16.00      bulk-average 
devmap-xmit     3       4,994,976    0            16.00      bulk-average 
devmap-xmit     4       4,983,994    0            16.00      bulk-average 
devmap-xmit     5       5,014,333    0            16.00      bulk-average 
devmap-xmit     total   29,951,825   0            16.00      bulk-average 

Note: this run was with rx_cqe_compress=on.

ethtool --set-priv-flags mlx5p1 rx_cqe_compress on
Show adapter(s) (mlx5p1) statistics (ONLY that changed!)
Ethtool(mlx5p1  ) stat:      5073293 (      5,073,293) <= rx0_cache_empty /sec
Ethtool(mlx5p1  ) stat:       219224 (        219,224) <= rx0_cqe_compress_blks /sec
Ethtool(mlx5p1  ) stat:      1329666 (      1,329,666) <= rx0_cqe_compress_pkts /sec
Ethtool(mlx5p1  ) stat:      5051573 (      5,051,573) <= rx1_cache_empty /sec
Ethtool(mlx5p1  ) stat:       222439 (        222,439) <= rx1_cqe_compress_blks /sec
Ethtool(mlx5p1  ) stat:      1380038 (      1,380,038) <= rx1_cqe_compress_pkts /sec
Ethtool(mlx5p1  ) stat:      5067505 (      5,067,505) <= rx2_cache_empty /sec
Ethtool(mlx5p1  ) stat:       220519 (        220,519) <= rx2_cqe_compress_blks /sec
Ethtool(mlx5p1  ) stat:      1315711 (      1,315,711) <= rx2_cqe_compress_pkts /sec
Ethtool(mlx5p1  ) stat:      5043176 (      5,043,176) <= rx3_cache_empty /sec
Ethtool(mlx5p1  ) stat:       223895 (        223,895) <= rx3_cqe_compress_blks /sec
Ethtool(mlx5p1  ) stat:      1349297 (      1,349,297) <= rx3_cqe_compress_pkts /sec
Ethtool(mlx5p1  ) stat:      5032563 (      5,032,563) <= rx4_cache_empty /sec
Ethtool(mlx5p1  ) stat:       222138 (        222,138) <= rx4_cqe_compress_blks /sec
Ethtool(mlx5p1  ) stat:      1301549 (      1,301,549) <= rx4_cqe_compress_pkts /sec
Ethtool(mlx5p1  ) stat:      5093823 (      5,093,823) <= rx5_cache_empty /sec
Ethtool(mlx5p1  ) stat:       214919 (        214,919) <= rx5_cqe_compress_blks /sec
Ethtool(mlx5p1  ) stat:      1273362 (      1,273,362) <= rx5_cqe_compress_pkts /sec
Ethtool(mlx5p1  ) stat:     88243413 (     88,243,413) <= rx_64_bytes_phy /sec
Ethtool(mlx5p1  ) stat:   5647560737 (  5,647,560,737) <= rx_bytes_phy /sec
Ethtool(mlx5p1  ) stat:     30362207 (     30,362,207) <= rx_cache_empty /sec
Ethtool(mlx5p1  ) stat:      1323158 (      1,323,158) <= rx_cqe_compress_blks /sec
Ethtool(mlx5p1  ) stat:      7949743 (      7,949,743) <= rx_cqe_compress_pkts /sec
Ethtool(mlx5p1  ) stat:     14635008 (     14,635,008) <= rx_discards_phy /sec
Ethtool(mlx5p1  ) stat:     43246222 (     43,246,222) <= rx_out_of_buffer /sec
Ethtool(mlx5p1  ) stat:     88243138 (     88,243,138) <= rx_packets_phy /sec
Ethtool(mlx5p1  ) stat:   5647524379 (  5,647,524,379) <= rx_prio0_bytes /sec
Ethtool(mlx5p1  ) stat:     73608194 (     73,608,194) <= rx_prio0_packets /sec
Ethtool(mlx5p1  ) stat:   4416504322 (  4,416,504,322) <= rx_vport_unicast_bytes /sec
Ethtool(mlx5p1  ) stat:     73608402 (     73,608,402) <= rx_vport_unicast_packets /sec

Disabling rx_cqe_compress didn’t change performance, but stats changed:

Show adapter(s) (mlx5p1) statistics (ONLY that changed!)
Ethtool(mlx5p1  ) stat:      5133804 (      5,133,804) <= rx0_cache_empty /sec
Ethtool(mlx5p1  ) stat:      5119036 (      5,119,036) <= rx1_cache_empty /sec
Ethtool(mlx5p1  ) stat:      5110855 (      5,110,855) <= rx2_cache_empty /sec
Ethtool(mlx5p1  ) stat:      5168146 (      5,168,146) <= rx3_cache_empty /sec
Ethtool(mlx5p1  ) stat:      5111374 (      5,111,374) <= rx4_cache_empty /sec
Ethtool(mlx5p1  ) stat:      5137363 (      5,137,363) <= rx5_cache_empty /sec
Ethtool(mlx5p1  ) stat:     88164618 (     88,164,618) <= rx_64_bytes_phy /sec
Ethtool(mlx5p1  ) stat:   5642515995 (  5,642,515,995) <= rx_bytes_phy /sec
Ethtool(mlx5p1  ) stat:     30780489 (     30,780,489) <= rx_cache_empty /sec
Ethtool(mlx5p1  ) stat:     13169863 (     13,169,863) <= rx_discards_phy /sec
Ethtool(mlx5p1  ) stat:     44213842 (     44,213,842) <= rx_out_of_buffer /sec
Ethtool(mlx5p1  ) stat:     88164306 (     88,164,306) <= rx_packets_phy /sec
Ethtool(mlx5p1  ) stat:   5642429004 (  5,642,429,004) <= rx_prio0_bytes /sec
Ethtool(mlx5p1  ) stat:     74992720 (     74,992,720) <= rx_prio0_packets /sec
Ethtool(mlx5p1  ) stat:   4499667608 (  4,499,667,608) <= rx_vport_unicast_bytes /sec
Ethtool(mlx5p1  ) stat:     74994463 (     74,994,463) <= rx_vport_unicast_packets /sec

Data and graphs

Get real results for XDP_DROP 2-5 cores

  • State “DONE” from “TODO” [2018-06-09 Sat 20:59]

Try to fix the dip in performance at higher numbers of flows

Maybe running the traffic generator with 6 flows from the beginning, and varying only the flow rules on RX, would be better? That way we wouldn’t get the weird dips at higher numbers of cores.

Initial data from Jesper’s runs

| RXQs | XDP_DROP | XDP_REDIRECT | REDIR PREEMPT voluntary |
|------+----------+--------------+-------------------------|
|    1 | 25928270 |      7909103 |                 8649872 |
|    2 |          |     14964733 |                15975491 |
|    3 |          |     17586052 |                19222735 |
|    4 |          |     20167875 |                21535588 |
|    5 |          |     23863927 |                25464083 |
|    6 | 75297461 |     28376755 |                29828924 |

Testing with the newer xdp_rxq_info tool that has an option for reading packet data.

sudo ./xdp_rxq_info --dev mlx5p1 --action XDP_DROP --no-sep --read

https://git.kernel.org/pub/scm/linux/kernel/git/hawk/net-next-xdp.git/commit/?h=xdp_paper01&id=8b314e06b52b8111

Generator command:

start -f /home/jbrouer/git/xdp-paper/benchmarks/udp_for_benchmarks02.py -t packet_len=64,stream_count=RXQs --port 0 -m 100mpps

XDP_DROP pps per RX-ring size and data-access mode:

| RXQs | no_touch RX=1024 | no_touch RX=512 | read RX=1024 | read RX=512 |
|------+------------------+-----------------+--------------+-------------|
|    1 |         24379188 |        24804275 |     23095606 |    23062789 |
|    2 |         49805895 |        50232370 |     46552903 |    46537526 |
|    3 |         73230349 |        74350900 |     64474005 |    68859775 |
|    4 |         86624323 |        86198361 |     68250791 |    86168278 |
|    5 |         86830822 |        86973055 |     49905645 |    87341248 |
|    6 |         86608045 |        87101116 |     56323684 |    87585333 |

Update (<2018-06-19 Tue>): Jesper found that the maximum scaled drop rate can be improved by enabling the mlx5 priv-flags rx_cqe_compress=on (and rx_striding_rq=off). This confirms the PCIe bottleneck, as rx_cqe_compress reduces the number of PCIe transactions by compressing the RX descriptors.

One issue is that with rx_cqe_compress=on, the per-core performance is slightly lower, as it requires more CPU cycles to “decompress” the descriptors.
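
A rough back-of-the-envelope model of why CQE compression helps: every received packet costs a payload DMA write plus a completion-descriptor (CQE) write over PCIe. All numbers below (effective PCIe bandwidth, descriptor sizes) are illustrative assumptions, not measurements:

```python
# Rough PCIe budget: each packet = payload DMA + completion descriptor (CQE).
def pcie_pps_limit(eff_bw_bytes, pkt_bytes, desc_bytes):
    return eff_bw_bytes / (pkt_bytes + desc_bytes)

eff_bw = 13e9  # assumed effective PCIe Gen3 x16 bandwidth after TLP overhead
full = pcie_pps_limit(eff_bw, 64, 64)   # uncompressed: one 64B CQE per packet
packed = pcie_pps_limit(eff_bw, 64, 8)  # compressed: descriptor bytes amortized
print(f"{full / 1e6:.0f} -> {packed / 1e6:.0f} Mpps")
```

Shrinking the per-packet descriptor traffic moves the modeled ceiling from roughly 100 Mpps upward, making the measured jump (from ~87 to ~108 Mpps) plausible.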

| RXQs | no_touch RX=1024 | no_touch RX=512 | read RX=1024 |
|------+------------------+-----------------+--------------|
|    1 |         23902641 |        23863471 |     22653821 |
|    2 |         45463076 |        45514709 |     44345271 |
|    3 |         65800412 |        67796536 | cache-misses |
|    4 |         84563313 |        88821307 | starts…      |
|    5 |         99105872 |        99357978 |     98758694 |
|    6 |        108118978 |       108607056 |    105077478 |

Observations: Even with these extremely high numbers we are still seeing idle CPU cycles.

Enable rx_cqe_compress=on cmdline:

ethtool --set-priv-flags mlx5p1 rx_cqe_compress on
$ ethtool --show-priv-flags mlx5p1
Private flags for mlx5p1:
rx_cqe_moder   : on
tx_cqe_moder   : off
rx_cqe_compress: on
rx_striding_rq : off

Issues: With the rx_cqe_compress=on setting, I’m seeing errors in the kernel dmesg, and individual RX-queues stop working. (Stopping and restarting the XDP application enables the queues again.)

(dmesg errors)
 mlx5_core 0000:03:00.0: mlx5_eq_int:540:(pid 0): CQ error on CQN 0x41d, syndrome 0x1
 mlx5_core 0000:03:00.0 mlx5p1: mlx5e_cq_error_event: cqn=0x00041d event=0x04
 mlx5_core 0000:03:00.0: mlx5_cmd_check:714:(pid 28036): MODIFY_CQ(0x403) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x2f1396)

The default mlx5 NIC driver RX-ring size of 1024 frames turned out to be a performance problem. When scaling to multiple RX-queues, the NIC driver has more outstanding memory, which the DDIO mechanism tries to place in L3-cache. With RX-ring size 1024 we observed cache-misses (to main memory) on some RX-queues. This can only mean that DDIO somehow didn’t manage to keep all of the outstanding RX-ring buffers in L3-cache.

The fix was to reduce the RX-ring size, which is adjustable via ethtool:

ethtool -G mlx5p1 rx 512 tx 512
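
The effect can be estimated by comparing the outstanding RX-buffer working set against the L3 cache that DDIO targets (the 2 KiB buffer-per-descriptor figure is an assumption; the 15 MB L3 matches the E5-1650 v4):

```python
# Outstanding RX-buffer working set vs. L3 cache size.
def rx_working_set_mb(queues, ring_size, buf_bytes=2048):  # 2 KiB/frame assumed
    return queues * ring_size * buf_bytes / 2**20

l3_mb = 15  # E5-1650 v4 L3 cache
print(rx_working_set_mb(6, 1024), "MB vs", l3_mb, "MB L3")  # 12.0 MB: close to L3 size
print(rx_working_set_mb(6, 512), "MB vs", l3_mb, "MB L3")   # 6.0 MB: comfortable fit
```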

Data for RXQ=4 shows an imbalance: two CPUs process 21 Mpps each, and these CPUs still have 2.75% idle cycles:

Running XDP on dev:mlx5p1 (ifindex:8) action:XDP_DROP options:read
XDP stats       CPU     pps         issue-pps  
XDP-RX CPU      0       21620425    0          
XDP-RX CPU      1       21603145    0          
XDP-RX CPU      2       12676047    0          
XDP-RX CPU      3       12324059    0          
XDP-RX CPU      total   68223677   

RXQ stats       RXQ:CPU pps         issue-pps  
rx_queue_index    0:0   21620423    0          
rx_queue_index    0:sum 21620423   
rx_queue_index    1:1   21603146    0          
rx_queue_index    1:sum 21603146   
rx_queue_index    2:2   12676050    0          
rx_queue_index    2:sum 12676050   
rx_queue_index    3:3   12324050    0          
rx_queue_index    3:sum 12324050   

05:01:06 PM  CPU    %usr   %sys %iowait    %irq   %soft    %idle
05:01:08 PM  all    0.00   0.17    0.00    0.17   64.89    34.77
05:01:08 PM    0    0.00   0.55    0.00    0.00   96.70     2.75
05:01:08 PM    1    0.00   0.54    0.00    0.54   96.22     2.70
05:01:08 PM    2    0.00   0.00    0.00    0.00  100.00     0.00
05:01:08 PM    3    0.00   0.00    0.00    0.00  100.00     0.00
05:01:08 PM    4    0.00   0.50    0.00    0.00    0.50    99.00
05:01:08 PM    5    0.00   0.51    0.00    0.00    0.00    99.49

In RXQ=4 case, the CPUs are experiencing different levels of cache-misses.

$ sudo ~/perf stat -C3 -e cycles -e  instructions -e cache-references -e cache-misses -r 3 sleep 1

 Performance counter stats for 'CPU(s) 3' (3 runs):

  3,804,377,863  cycles                                         ( +-  0.01% )
  5,865,630,935  instructions    #    1.54  insn per cycle      ( +-  0.03% )
     43,829,681  cache-references                               ( +-  0.04% )
      9,360,529  cache-misses    #   21.357 % of all cache refs ( +-  0.03% )

$ sudo ~/perf stat -C0 -e cycles -e  instructions -e cache-references -e cache-misses -r 3 sleep 1

 Performance counter stats for 'CPU(s) 0' (3 runs):

  3,728,030,288  cycles                                         ( +-  0.01% )
 10,383,860,909  instructions     #    2.79  insn per cycle     ( +-  0.03% )
     85,613,852  cache-references                               ( +-  0.11% )
        358,027  cache-misses     #    0.418 % of all cache refs( +-  1.94% )
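
Dividing the instruction counters by the per-CPU packet rates from the RXQ=4 run above (pairing perf’s CPU0/CPU3 with the matching XDP-RX rows is an assumption) shows the per-packet work is essentially identical; the throughput gap is explained by the cache-miss rate, not by extra instructions:

```python
# Instructions per packet: counters from perf stat, rates from xdp_rxq_info above.
cpu0 = 10_383_860_909 / 21_620_425  # fast CPU: ~480 insn/pkt at 0.4% cache misses
cpu3 = 5_865_630_935 / 12_324_059   # slow CPU: ~476 insn/pkt at 21% cache misses
print(f"CPU0: {cpu0:.0f} insn/pkt, CPU3: {cpu3:.0f} insn/pkt")
```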

Data for RXQ=3

Running XDP on dev:mlx5p1 (ifindex:8) action:XDP_DROP options:read
XDP stats       CPU     pps         issue-pps  
XDP-RX CPU      0       22088751    0          
XDP-RX CPU      1       21279391    0          
XDP-RX CPU      2       21189691    0          
XDP-RX CPU      total   64557835   

RXQ stats       RXQ:CPU pps         issue-pps  
rx_queue_index    0:0   22088753    0          
rx_queue_index    0:sum 22088753   
rx_queue_index    1:1   21279391    0          
rx_queue_index    1:sum 21279391   
rx_queue_index    2:2   21189695    0          
rx_queue_index    2:sum 21189695   

05:03:04 PM  CPU    %usr   %sys %iowait    %irq   %soft   %idle
05:03:06 PM  all    0.00   0.08    0.00    0.00   50.04   49.87
05:03:06 PM    0    0.00   0.00    0.00    0.00  100.00    0.00
05:03:06 PM    1    0.00   0.00    0.00    0.00  100.00    0.00
05:03:06 PM    2    0.00   0.00    0.00    0.00  100.00    0.00
05:03:06 PM    3    0.50   0.50    0.00    0.50    0.00   98.51
05:03:06 PM    4    0.00   0.00    0.00    0.00    0.00  100.00
05:03:06 PM    5    0.00   0.00    0.00    0.00    0.00  100.00

XDP_DROP per number of queues

These are from Toke’s test run. Note that REDIRECT throughput drops by 5 Mpps (on a single core) when running xdp_monitor at the same time!

| RXQs | XDP_DROP | XDP_REDIRECT |
|------+----------+--------------|
|    1 | 25928270 |      8461375 |
|    2 | 51349744 |     16241020 |
|    3 | 76578241 |     18639798 |
|    4 | 82782450 |     21417122 |
|    5 | 82294143 |     25373567 |
|    6 | 80444303 |     29970889 |
import numpy as np
import matplotlib.pyplot as plt

# RXQs, XDP_DROP pps, XDP_REDIRECT pps (from the table above)
data = [(1, 25928270, 8461375), (2, 51349744, 16241020),
        (3, 76578241, 18639798), (4, 82782450, 21417122),
        (5, 82294143, 25373567), (6, 80444303, 29970889)]
d = np.array(data)
plt.plot(d[:,0], d[:,1]/10**6, marker='o', label="XDP_DROP")
plt.plot(d[:,0], d[:,2]/10**6, marker='o', label="XDP_REDIRECT")
plt.xlabel("Number of cores")
plt.ylabel("Mpps")
plt.legend()
plt.show()

./obipy-resources/yxxa1q.svg

CPU usage for lower PPS

For this test, we set the packet generator to a fixed pps and report the CPU usage using mpstat. For a single core, we step the rate up to the maximum single-core performance of 26 Mpps. For DPDK the CPU usage is always 100% by design (busy polling).

The interface is running the ‘xdp_rxq_info’ sample program with XDP_DROP as action.

3 samples of 30sec intervals:

mpstat -P ALL 30 3

We report the average %idle and plot the inverse. For XDP, we confirm from the output of the xdp1 program that the full packet rate is being dropped. For iptables-raw, we look at the ethtool stats.

| Mpps | XDP (%idle) | Linux (%idle) |
|------+-------------+---------------|
|    0 |         100 |           100 |
| 0.25 |        93.0 |          79.0 |
|  0.5 |        88.3 |          69.8 |
|    1 |        80.4 |          57.4 |
|    2 |        71.1 |          40.4 |
|    3 |        64.8 |          29.2 |
|    5 |        52.7 |             0 |
|   10 |        37.6 |             0 |
|   15 |        25.3 |             0 |
|   20 |         8.2 |             0 |
| 24.8 |           0 |             0 |
|   25 |           0 |             0 |
|   30 |           0 |             0 |
|   35 |           0 |             0 |
|   40 |           0 |             0 |
|   43 |           0 |             0 |
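
“Plotting the inverse” just means converting %idle to utilization; a minimal sketch over a few rows of the table above:

```python
# CPU utilization = 100 - %idle, from the (Mpps, XDP %idle, Linux %idle) table.
rows = [(0.25, 93.0, 79.0), (1, 80.4, 57.4), (3, 64.8, 29.2), (5, 52.7, 0.0)]
xdp_busy = {mpps: 100 - idle for mpps, idle, _ in rows}
linux_busy = {mpps: 100 - idle for mpps, _, idle in rows}
print(xdp_busy[1], linux_busy[1])  # at 1 Mpps: XDP ~19.6% vs Linux ~42.6% busy
```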

DPDK tests:

Test invocation for RX test:

sudo ./testpmd -l 0-5 -- -i --nb-cores=1 --forward-mode=rxonly --auto-start --portmask=0x2

To get multiple cores working, we need to enable multiple rxqs and txqs, and also enable UDP RSS:

sudo ./testpmd -l 0-5 -n 4 -- -i --nb-cores=2 --forward-mode=rxonly --auto-start --portmask=0x2 --rxq 2 --txq 2 --rss-udp

Or instead of interactive mode, use stats reporting mode:

sudo ./testpmd -l 0-5 -n 4 -- --nb-cores=5 --forward-mode=rxonly --auto-start --portmask=0x2 --rxq 5 --txq 5 --rss-udp --stats-period=1
| Cores |   RX PPS | Forward PPS |
|-------+----------+-------------|
|     1 | 43527279 |    23914513 |
|     2 | 70499318 |    35337558 |
|     3 | 82695730 |    56526568 |
|     4 | 82937531 |    58197505 |
|     5 | 80575187 |    62998140 |

From 3-5 cores this appears to be bounded by the traffic generator.

Trying faster generator setup:

Using two packet generators: T-rex sending 99 Mpps and kernel pktgen (pktgen_sample05_flow_per_thread.sh) sending approx. 45 Mpps. Ethtool stats on the 100G switch show 144,371,992 pps TX towards the DUT:

Show adapter(s) (sw1p5 sw1p9 sw1p13) statistics (ONLY that changed!)
Ethtool(sw1p5   ) stat:     46132421 (     46,132,421) <= a_frames_received_ok /sec
Ethtool(sw1p5   ) stat:     98833878 (     98,833,878) <= a_frames_transmitted_ok /sec
Ethtool(sw1p5   ) stat:       667329 (        667,329) <= a_mac_control_frames_received /sec
Ethtool(sw1p5   ) stat:   2952475678 (  2,952,475,678) <= a_octets_received_ok /sec
Ethtool(sw1p5   ) stat:   6325368523 (  6,325,368,523) <= a_octets_transmitted_ok /sec
Ethtool(sw1p5   ) stat:       667329 (        667,329) <= a_pause_mac_ctrl_frames_received /sec
Ethtool(sw1p5   ) stat:     45465150 (     45,465,150) <= rx_frames_prio_0 /sec
Ethtool(sw1p5   ) stat:   2952478555 (  2,952,478,555) <= rx_octets_prio_0 /sec
Ethtool(sw1p5   ) stat:         -122 (           -122) <= tc_transmit_queue_tc_0 /sec
Ethtool(sw1p5   ) stat:     98834011 (     98,834,011) <= tx_frames_prio_0 /sec
Ethtool(sw1p5   ) stat:   6325376817 (  6,325,376,817) <= tx_octets_prio_0 /sec
Ethtool(sw1p9   ) stat:    144371988 (    144,371,988) <= a_frames_transmitted_ok /sec
Ethtool(sw1p9   ) stat:   9239808140 (  9,239,808,140) <= a_octets_transmitted_ok /sec
Ethtool(sw1p9   ) stat:         -276 (           -276) <= tc_transmit_queue_tc_0 /sec
Ethtool(sw1p9   ) stat:    144371992 (    144,371,992) <= tx_frames_prio_0 /sec
Ethtool(sw1p9   ) stat:   9239807395 (  9,239,807,395) <= tx_octets_prio_0 /sec
Ethtool(sw1p13  ) stat:     98855049 (     98,855,049) <= a_frames_received_ok /sec
Ethtool(sw1p13  ) stat:     45474257 (     45,474,257) <= a_frames_transmitted_ok /sec
Ethtool(sw1p13  ) stat:   6326723716 (  6,326,723,716) <= a_octets_received_ok /sec
Ethtool(sw1p13  ) stat:   2910352561 (  2,910,352,561) <= a_octets_transmitted_ok /sec
Ethtool(sw1p13  ) stat:     98855091 (     98,855,091) <= rx_frames_prio_0 /sec
Ethtool(sw1p13  ) stat:   6326726369 (  6,326,726,369) <= rx_octets_prio_0 /sec
Ethtool(sw1p13  ) stat:          -61 (            -61) <= tc_transmit_queue_tc_0 /sec
Ethtool(sw1p13  ) stat:     45474290 (     45,474,290) <= tx_frames_prio_0 /sec
Ethtool(sw1p13  ) stat:   2910354419 (  2,910,354,419) <= tx_octets_prio_0 /sec
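
A quick consistency check on these switch counters (the port-to-generator mapping is inferred from the rates: ~99 Mpps matches T-rex, ~45 Mpps matches pktgen, an assumption):

```python
# Cross-check: per-generator switch RX counters vs. the aggregate TX towards the DUT.
trex_rx = 98_855_049    # sw1p13 a_frames_received_ok/sec (~99 Mpps: T-rex)
pktgen_rx = 46_132_421  # sw1p5 a_frames_received_ok/sec (~45 Mpps: pktgen)
dut_tx = 144_371_992    # sw1p9 tx_frames_prio_0/sec (port towards DUT)
print(trex_rx + pktgen_rx, "vs", dut_tx)  # within ~0.5% of each other
```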

Note: DUT was running kernel 4.16.13-200.fc27.x86_64 during these DPDK tests to please the MLNX_OFED software.

Testpmd DPDK-“drop” command used, varying the CORES variable:

export CORES=3 ; sudo build/app/testpmd -l 0-5 -n 4 -- --nb-cores=$CORES --forward-mode=rxonly --auto-start --portmask=0x1 --rxq $CORES --txq $CORES --rss-udp --stats-period=2

Testpmd DPDK-“forward” command used, varying the CORES variable:

export CORES=1 ; sudo build/app/testpmd -l 0-5 -n 4 -- --nb-cores=$CORES --forward-mode=mac --auto-start --portmask=0x3 --rxq $CORES --txq $CORES --rss-udp --stats-period=2

We used the testpmd forward-mode “mac”, though I’m not sure whether this is the correct mode(?).

| Cores |    RX PPS | DPDK Forward PPS | DPDK-rxonly-drop run#2 |
|-------+-----------+------------------+------------------------|
|     1 |  43617057 |         21893094 |               43503636 |
|     2 |  75034064 |         26549320 |               74380044 |
|     3 |  97535287 |         38194032 |               97203856 |
|     4 | 115806957 |         40641805 |              113876503 |
|     5 | 100600263 |         51173067 |              115453781 |

run#2 was with our net-next-xdp kernel, and the generators were sending 138,726,434 pps measured at the switch.

Ethtool stats (which work due to the Mellanox bifurcated driver) for dpdk_test2 with 5 cores:

Show adapter(s) (mlx5p1) statistics (ONLY that changed!)
Ethtool(mlx5p1  ) stat:    144075528 (    144,075,528) <= rx_64_bytes_phy /sec
Ethtool(mlx5p1  ) stat:   9220824984 (  9,220,824,984) <= rx_bytes_phy /sec
Ethtool(mlx5p1  ) stat:     42595068 (     42,595,068) <= rx_discards_phy /sec
Ethtool(mlx5p1  ) stat:    144075393 (    144,075,393) <= rx_packets_phy /sec
Ethtool(mlx5p1  ) stat:   9220821849 (  9,220,821,849) <= rx_prio0_bytes /sec
Ethtool(mlx5p1  ) stat:    101480704 (    101,480,704) <= rx_prio0_packets /sec
Ethtool(mlx5p1  ) stat:   6088740630 (  6,088,740,630) <= rx_vport_unicast_bytes /sec
Ethtool(mlx5p1  ) stat:    101479024 (    101,479,024) <= rx_vport_unicast_packets /sec

dpdk_test2 with 4 cores:

Show adapter(s) (mlx5p1) statistics (ONLY that changed!)
Ethtool(mlx5p1  ) stat:    144211662 (    144,211,662) <= rx_64_bytes_phy /sec
Ethtool(mlx5p1  ) stat:   9229552001 (  9,229,552,001) <= rx_bytes_phy /sec
Ethtool(mlx5p1  ) stat:     28429333 (     28,429,333) <= rx_discards_phy /sec
Ethtool(mlx5p1  ) stat:    144211748 (    144,211,748) <= rx_packets_phy /sec
Ethtool(mlx5p1  ) stat:   9229514359 (  9,229,514,359) <= rx_prio0_bytes /sec
Ethtool(mlx5p1  ) stat:    115783855 (    115,783,855) <= rx_prio0_packets /sec
Ethtool(mlx5p1  ) stat:   6946966145 (  6,946,966,145) <= rx_vport_unicast_bytes /sec
Ethtool(mlx5p1  ) stat:    115782780 (    115,782,780) <= rx_vport_unicast_packets /sec
# `data` is bound to the results table above (org-babel :var)
d = np.array(data)
plt.plot(d[:,0], d[:,1]/10**6, marker='o', label="rxonly")
#plt.plot(d[:,0], d[:,2]/10**6, marker='o', label="XDP_REDIRECT")
plt.xlabel("Number of cores")
plt.ylabel("Mpps")
plt.legend()
plt.show()

./obipy-resources/beY3Mq.svg

Baseline Linux tests

Linux “REDIRECT”

There’s no good way to do any kind of bypass, so we just run this with normal Linux forwarding. Throughput is measured by ethtool on the TX interface, e.g., for one core:

Show adapter(s) (ens3f1) statistics (ONLY that changed!)
Ethtool(ens3f1  ) stat:        27183 (         27,183) <= ch0_arm /sec
Ethtool(ens3f1  ) stat:        27183 (         27,183) <= ch0_events /sec
Ethtool(ens3f1  ) stat:        27183 (         27,183) <= ch0_poll /sec
Ethtool(ens3f1  ) stat:        27183 (         27,183) <= ch_arm /sec
Ethtool(ens3f1  ) stat:        27182 (         27,182) <= ch_events /sec
Ethtool(ens3f1  ) stat:        27183 (         27,183) <= ch_poll /sec
Ethtool(ens3f1  ) stat:    104380023 (    104,380,023) <= tx0_bytes /sec
Ethtool(ens3f1  ) stat:      1739714 (      1,739,714) <= tx0_cqes /sec
Ethtool(ens3f1  ) stat:      1739667 (      1,739,667) <= tx0_csum_none /sec
Ethtool(ens3f1  ) stat:      1739667 (      1,739,667) <= tx0_packets /sec
Ethtool(ens3f1  ) stat:    104380319 (    104,380,319) <= tx_bytes /sec
Ethtool(ens3f1  ) stat:    111340650 (    111,340,650) <= tx_bytes_phy /sec
Ethtool(ens3f1  ) stat:      1739714 (      1,739,714) <= tx_cqes /sec
Ethtool(ens3f1  ) stat:      1739672 (      1,739,672) <= tx_csum_none /sec
Ethtool(ens3f1  ) stat:      1739672 (      1,739,672) <= tx_packets /sec
Ethtool(ens3f1  ) stat:      1739707 (      1,739,707) <= tx_packets_phy /sec
Ethtool(ens3f1  ) stat:    111338817 (    111,338,817) <= tx_prio0_bytes /sec
Ethtool(ens3f1  ) stat:      1739669 (      1,739,669) <= tx_prio0_packets /sec
Ethtool(ens3f1  ) stat:    104381267 (    104,381,267) <= tx_vport_unicast_bytes /sec
Ethtool(ens3f1  ) stat:      1739687 (      1,739,687) <= tx_vport_unicast_packets /sec
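The per-second rates above come from sampling the ethtool counters at a fixed interval and taking deltas. A minimal Python sketch of that sampling idea, assuming `ethtool -S` output; the function names and the example interface name are illustrative:

```python
# Sketch: sample `ethtool -S <ifname>` twice and report per-second
# counter deltas, the same idea behind the per-second rates shown above.
import re
import subprocess
import time

def parse_ethtool_stats(text):
    """Parse `ethtool -S` output into a {counter: value} dict."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*([\w.]+):\s+(\d+)\s*$", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats

def counter_rates(ifname, interval=1.0):
    """Per-second deltas of all ethtool counters for ifname."""
    def sample():
        out = subprocess.check_output(["ethtool", "-S", ifname], text=True)
        return parse_ethtool_stats(out)
    before = sample()
    time.sleep(interval)
    after = sample()
    return {k: (after[k] - before[k]) / interval
            for k in after if k in before}

# e.g. counter_rates("ens3f1")["tx_packets"]  -> TX packets/sec
```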

| Cores |     PPS |
|-------+---------|
|     1 | 1739672 |
|     2 | 3370584 |
|     3 | 4976559 |
|     4 | 6488625 |
|     5 | 7848970 |
|     6 | 9285971 |

Linux iptables drop

What is the performance of iptables dropping SKBs in the ‘raw’ table?

Unloaded all iptables modules, and then invoked the ‘iptables -t raw’ command line so that only the needed iptables kernel modules got loaded.

Cmdline for ‘raw’ table:

iptables -t raw -I PREROUTING -p udp --dport 9:19 --j DROP

Cmdline for ‘filter’ table:

iptables -t filter -I INPUT  -p udp --dport 9:19 --j DROP

Cmdline for activating conntrack:

iptables -I INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

Default Fedora 27 firewalld setup, with jump rules and a REJECT target (reject-with icmp-host-prohibited) as the last rule being hit.

| Cores | table raw | table filter | conntrack | firewalld |
|-------+-----------+--------------+-----------+-----------|
|     1 |   5051787 |      3319718 |   1819409 |    721284 |
|     2 |  10226514 |      6707809 |   3274018 |   1403399 |
|     3 |  15104793 |      9944065 |   4713610 |   2036345 |
|     4 |  20075858 |     13235307 |   6189412 |   2657446 |
|     5 |  24995919 |     16442723 |   7497838 |   3380752 |
|     6 |  29443869 |     19518401 |   8726498 |   4001466 |

Same test, but measuring the overhead of XDP_PASS (using xdp_rxq_info):

| Cores | table raw + XDP_PASS | table raw |
|-------+----------------------+-----------|
|     1 |              4545188 |   4837269 |
|     2 |              9069234 |   9661542 |
|     3 |             13596920 |  14483976 |
|     4 |             18081421 |  19250649 |
|     5 |             22520733 |  24070686 |
|     6 |             27002648 |  28817001 |

Approx. 300 kpps overhead; 13.28 ns per packet (1/4545188 - 1/4837269)
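The 13.28 ns figure is simply the difference of the inverse 1-core rates; as a quick check (variable names are mine, values are the 1-core row of the table above):

```python
# XDP_PASS per-packet overhead, from the 1-core numbers in the table above.
pps_with_xdp_pass = 4545188  # table raw + XDP_PASS
pps_raw_only      = 4837269  # table raw
overhead_ns = (1 / pps_with_xdp_pass - 1 / pps_raw_only) * 1e9
print(f"{overhead_ns:.2f} ns/packet")  # → 13.28 ns/packet
```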

Benchmarks: XDP_TX

Modified the samples/bpf xdp_rxq_info tool to do ‘swapmac’ (swap source/destination MAC addresses) when doing XDP_TX benchmarks.

Copied the table from bench05, to compare XDP_TX against XDP_REDIRECT.

| RXQs | XDP_REDIRECT (1024) | RX-size=512 | RX-size=256 | RX-size=128 |
|------+---------------------+-------------+-------------+-------------|
|    1 |             8649872 |     8665664 |     8641197 |     7577448 |
|    2 |            15975491 |    16629074 |    16742325 |    14268397 |
|    3 |            19222735 |    25230973 |    25738189 |    22086964 |
|    4 |            21535588 |    28807445 |    34185239 |    29512278 |
|    5 |            25464083 |    31306207 |    41461874 |    36652247 |
|    6 |            29828924 |    33970445 |    46062737 |    43376903 |

Perf tool cmdline to measure cache misses on CPU-0:

perf stat -C0 -e L1-icache-load-misses -e cycles -e  instructions -e cache-misses -e cache-references -e branch-misses -e branches  -r 3 sleep 1
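The “% cache-miss” and “insn per cycle” columns in the tables below are derived from these perf counters. A minimal helper showing the arithmetic (function name and the example counter values are mine, chosen to be in the ballpark of the 1-RXQ row):

```python
# Derive the two metrics used in the tables below from raw perf counters:
# cache-miss rate = cache-misses / cache-references,
# IPC (insn per cycle) = instructions / cycles.
def perf_metrics(counters):
    miss_pct = 100.0 * counters["cache-misses"] / counters["cache-references"]
    ipc = counters["instructions"] / counters["cycles"]
    return miss_pct, ipc

# Hypothetical counter values, roughly matching the 1-RXQ row below:
example = {"cache-misses": 10, "cache-references": 1_000_000,
           "instructions": 9_936_000_000, "cycles": 3_600_000_000}
miss, ipc = perf_metrics(example)
print(f"{miss:.3f}% cache-miss, {ipc:.2f} insn per cycle")
# → 0.001% cache-miss, 2.76 insn per cycle
```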

Below it is clear that XDP_TX is suffering from a cache-miss issue.

| RXQs | XDP_REDIRECT RX=512 | XDP_TX RX=1024 | % cache-miss | insn per cycle |
|------+---------------------+----------------+--------------+----------------|
|    1 |             8665664 |       16230538 |        0.001 |           2.76 |
|    2 |            16629074 |       31226024 |        0.469 |           2.59 |
|    3 |            25230973 |       32344463 |       11.752 |           1.74 |
|    4 |            28807445 |       32604398 |       30.577 |           1.37 |
|    5 |            31306207 |       37494941 |       41.497 |           1.26 |
|    6 |            33970445 |       43230678 |       41.977 |           1.22 |

Testing the effect of changing the RX + TX ring sizes in the XDP_TX test:

ethtool -G mlx5p1 rx 512 tx 512

| RXQs | XDP_TX RX=512 | % cache-miss | insn per cycle |
|------+---------------+--------------+----------------|
|    1 |      17005709 |        0.001 |           2.86 |
|    2 |      33926316 |        0.004 |           2.81 |
|    3 |      45523114 |        1.159 |           2.60 |
|    4 |      50978970 |        4.868 |           2.22 |
|    5 |      50619968 |       13.544 |           1.76 |
|    6 |      52315564 |       22.498 |           1.46 |
ethtool -G mlx5p1 rx 256 tx 256

| RXQs | XDP_TX RX=256 | % cache-miss | insn per cycle |
|------+---------------+--------------+----------------|
|    1 |      16900699 |        0.001 |           2.84 |
|    2 |      34850297 |        0.002 |           2.87 |
|    3 |      51884051 |        0.002 |           2.90 |
|    4 |      68953922 |        0.010 |           2.90 |
|    5 |      70164732 |        0.316 |           2.72 |
|    6 |      68744044 |        1.340 |           2.48 |

Graphs used in the paper

Figure style

Evaluate this section to get the right figure styles:

%config InlineBackend.figure_format = 'svg'
import matplotlib as mpl
from matplotlib import pyplot as plt
import numpy as np
import os
BASEDIR=os.getenv("XDP_PAPER_BASEDIR") # or set manually

mpl.rcParams.update({
    'axes.axisbelow': True,
    'axes.edgecolor': 'white',
    'axes.facecolor': '#E6E6E6',
    'axes.formatter.useoffset': False,
    'axes.grid': True,
    'axes.labelcolor': 'black',
    'axes.linewidth': 0.0,
    'axes.prop_cycle': mpl.cycler('color', ["#1b9e77", "#d95f02", "#7570b3",
                                            "#e7298a", "#66a61e", "#e6ab02",
                                            "#a6761d", "#666666"]),
    'figure.edgecolor': 'white',
    'figure.facecolor': 'white',
    'figure.figsize': (8.0, 5.0),
    'figure.frameon': False,
    'figure.subplot.bottom': 0.125,
    'font.size': 16,
    'grid.color': 'white',
    'grid.linestyle': '-',
    'grid.linewidth': 1,
    'image.cmap': 'Greys',
    'legend.frameon': False,
    'legend.numpoints': 1,
    'legend.scatterpoints': 1,
    'lines.color': 'black',
    'lines.linewidth': 1,
    'lines.solid_capstyle': 'round',
    'pdf.fonttype': 42,
    'savefig.dpi': 100,
    'text.color': 'black',
    'xtick.color': 'black',
    'xtick.direction': 'out',
    'xtick.major.size': 0.0,
    'xtick.minor.size': 0.0,
    'ytick.color': 'black',
    'ytick.direction': 'out',
    'ytick.major.size': 0.0,
    'ytick.minor.size': 0.0})

DROP test max PPS

# dpdk_data, xdp_data and linux_data are bound to the result tables above (org-babel :var)
dpdk = np.array(dpdk_data)
xdp = np.array([i[:3] for i in xdp_data])
linux = np.array(linux_data)
plt.plot(dpdk[:,0], dpdk[:,3]/10**6, marker='o', label="DPDK")
plt.plot(xdp[:,0], xdp[:,2]/10**6, marker='s', label="XDP")
plt.plot(linux[:,0], linux[:,1]/10**6, marker='^', label="Linux (raw)")
plt.plot(linux[:,0], linux[:,3]/10**6, marker='x', label="Linux (conntrack)")
plt.xlabel("Number of cores")
plt.ylabel("Mpps")
plt.legend()
plt.ylim(0,130)
plt.savefig(BASEDIR+"/figures/drop-test.pdf", bbox_inches='tight')
plt.show()

./obipy-resources/I3OD3d.svg

DROP test CPU usage

data = np.array(data)
ones = np.ones(len(data[:,1]))*100
plt.plot(data[:,0], ones, marker='o', label="DPDK")
plt.plot(data[:11,0], ones[:11]-data[:11,1], marker='s', label="XDP")
plt.plot(data[:7,0], ones[:7]-data[:7,2], marker='^', label="Linux")
plt.xlabel("Offered load (Mpps)")
plt.ylabel("CPU usage (%)")
plt.legend()
plt.ylim(0,110)
plt.xlim(0,27)
plt.savefig(BASEDIR+"/figures/drop-cpu.pdf", bbox_inches='tight')
plt.show()

./obipy-resources/eQB3JJ.svg

REDIRECT test

dpdk = np.array(dpdk_data)
xdp = np.array(xdp_data)
tx = np.array(tx_data)
plt.plot(dpdk[:,0], dpdk[:,2]/10**6, marker='o', label="DPDK (different NIC)")
plt.plot(tx[:,0], tx[:,1]/10**6, marker='s', label="XDP (same NIC)")
plt.plot(xdp[:,0], xdp[:,3]/10**6, marker='^', label="XDP (different NIC)")
plt.xlabel("Number of cores")
plt.ylabel("Mpps")
plt.legend()
plt.ylim(0,80)
plt.savefig(BASEDIR+"/figures/redirect-test.pdf", bbox_inches='tight')
plt.show()

./obipy-resources/NMQMJP.svg

XDP_TX

tx = np.array(tx_data)
redir = np.array(redir_data)
plt.plot(tx[:,0], tx[:,1]/10**6, marker='o', label="TX")
plt.plot(redir[:,0], redir[:,3]/10**6, marker='s', label="REDIRECT")
plt.xlabel("Number of cores")
plt.ylabel("Mpps")
plt.legend()
plt.ylim(0,80)
plt.savefig(BASEDIR+"/figures/tx-test.pdf", bbox_inches='tight')
plt.show()

./obipy-resources/BQycom.svg