Kernel panic when syncing conntrack entries with conntrackd #176

Closed
didgaudin opened this issue Nov 20, 2023 · 6 comments

@didgaudin

I get a kernel panic when I try to sync conntrack entries to a backup server with conntrackd while using xt_ndpi.
Kernel: 5.10.165

nDPI configuration:
iptables -t mangle -A PREROUTING -m ndpi --proto all
iptables -t mangle -A PREROUTING -j NDPI --ndpi-id-p --set-mark
iptables -t mangle -A PREROUTING -j CONNMARK --save-mark --nfmask 0xffffffff --ctmask 0xffffffff

iptables -t mangle -A POSTROUTING -m ndpi --proto all
iptables -t mangle -A POSTROUTING -j NDPI --ndpi-id-p --set-mark
iptables -t mangle -A POSTROUTING -j CONNMARK --save-mark --nfmask 0xffffffff --ctmask 0xffffffff

Kernel panic:

[ 192.651772] ndpi_mt+0x931/0x1dd0 [xt_ndpi]
[ 192.656086] ? _raw_read_unlock+0x13/0x40
[ 192.660264] ? ndpi_mt+0x1f0/0x1dd0 [xt_ndpi]
[ 192.664785] ? put_cpu_partial+0xc5/0x120
[ 192.668841] ? _raw_spin_unlock+0xd/0x30
[ 192.672957] ? get_partial_node+0x123/0x3d0
[ 192.677273] ? _raw_spin_unlock_irqrestore+0xf/0x30
[ 192.682312] nft_match_large_eval+0x2c/0x60 [nft_compat]
[ 192.687769] nft_do_chain+0x17a/0x540 [nf_tables]
[ 192.692599] ? __local_bh_enable_ip+0x2e/0x80
[ 192.697136] ? ipt_do_table+0x3a1/0x710
[ 192.701143] ? nf_ct_get_tuple+0x1f9/0x230
[ 192.705449] ? sock_alloc_send_pskb+0x206/0x240
[ 192.710196] ? nf_conntrack_udp_packet+0x1e9/0x260
[ 192.715200] nf_route_table_hook4+0x96/0x130 [nf_tables]
[ 192.720763] nf_hook_slow+0x39/0xb0
[ 192.724361] __ip_local_out+0xea/0x170
[ 192.728232] ? ip_forward_options+0x190/0x190
[ 192.732704] ip_send_skb+0x19/0x70
[ 192.736198] udp_send_skb+0x14e/0x360
[ 192.740065] udp_sendmsg+0x9c5/0xc70
[ 192.743808] ? ip_frag_init+0x50/0x50
[ 192.747590] sock_sendmsg+0x58/0x80
[ 192.751233] __sys_sendto+0x129/0x190
[ 192.755025] __x64_sys_sendto+0x20/0x30
[ 192.758993] do_syscall_64+0x31/0x50
[ 192.762736] entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 192.768014] RIP: 0033:0x7f4a73ed9896
[ 192.771695] Code: 45 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 55 48 83 ec 30 44 89 4c 24 2c 4c 89
[ 192.791127] RSP: 002b:00007ffe4e86c4f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 192.799048] RAX: ffffffffffffffda RBX: 000000000084d1d0 RCX: 00007f4a73ed9896
[ 192.806447] RDX: 0000000000000010 RSI: 000000000084cbe0 RDI: 0000000000000005
[ 192.813937] RBP: 0000000000000000 R08: 000000000084d1d4 R09: 0000000000000010
[ 192.821356] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4a73ddb6c8
[ 192.828706] R13: 0000000000852150 R14: 00000000ffffffff R15: 0000000000000000
[ 192.836118] Modules linked in: nfnetlink_queue iptable_mangle bpfilter pnvcomp(O) act_mirred act_connmark cls_u32 sch_ingress cls_fw sch_fq cls_bpf sch_fq_codel sch_htb ifb nft_dup_ipv4 nf_dup_ipv4 macvlan xt_ndpi(O) nf_conntrack_netl]
[ 192.911213] CR2: ffff88a30c91b444
[ 192.914671] ---[ end trace 36e2c44841d679ea ]---
[ 192.919367] RIP: 0010:_raw_spin_lock_bh+0x15/0x30
[ 192.924151] Code: 75 06 5d c3 cc cc cc cc 48 8d 7d 00 5d e9 63 ff 41 ff 0f 1f 00 55 48 8d 2f bf 01 02 00 00 e8 c2 f5 3f ff 31 c0 ba 01 00 00 00 0f b1 55 00 75 06 5d c3 cc cc cc cc 48 8d 7d 00 89 c6 5d e9 a2
[ 192.943174] RSP: 0018:ffffaa37819af708 EFLAGS: 00010246
[ 192.948486] RAX: 0000000000000000 RBX: ffffa188c9c4c000 RCX: 0000000000000000
[ 192.955713] RDX: 0000000000000001 RSI: 0000000000000018 RDI: 0000000000000201
[ 192.962950] RBP: ffff88a30c91b444 R08: 000000000000a55a R09: 0000000000000000
[ 192.970187] R10: ffffa188c9c4c0b4 R11: ffff88a30c91b444 R12: 0000000000000011
[ 192.977423] R13: ffffa188ca1e3600 R14: 0000000000000001 R15: 0000000000000000
[ 192.984653] FS: 00007f4a73ddb740(0000) GS:ffffa18c1ec80000(0000) knlGS:0000000000000000
[ 192.992833] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 192.998648] CR2: ffff88a30c91b444 CR3: 0000000182826003 CR4: 00000000003706e0
[ 193.005868] Kernel panic - not syncing: Fatal exception in interrupt
[ 193.012533] Kernel Offset: 0x3a000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

@didgaudin didgaudin added the bug label Nov 20, 2023
@vel21ripn
Owner

Try without iptables emulation.
Remove everything related to nftables and install iptables-legacy.

@didgaudin
Author

I tried with only iptables-legacy; the kernel panics too.

[ 1415.927047]
[ 1415.929132] ndpi_mt+0x931/0x1dd0 [xt_ndpi]
[ 1415.933455] ? igb_xmit_frame_ring+0x5d3/0xc30 [igb]
[ 1415.938644] ? dev_hard_start_xmit+0xd5/0x230
[ 1415.943153] ipt_do_table+0x2a9/0x710
[ 1415.946966] nf_hook_slow+0x39/0xb0
[ 1415.950586] nf_hook_slow_list+0x67/0xe0
[ 1415.954651] ip_sublist_rcv+0x1fc/0x220
[ 1415.958688] ? ip_rcv_finish_core.constprop.0+0x4d0/0x4d0
[ 1415.964280] ip_list_rcv+0xf7/0x120
[ 1415.967981] __netif_receive_skb_list_core+0x253/0x2a0
[ 1415.973415] netif_receive_skb_list_internal+0x1cb/0x310
[ 1415.978927] napi_complete_done+0x6a/0x180
[ 1415.983261] igb_poll+0x824/0x13c0 [igb]
[ 1415.987307] ? load_balance+0x16a/0xca0
[ 1415.991269] net_rx_action+0x152/0x3d0
[ 1415.995203] __do_softirq+0xe5/0x2f9
[ 1415.998963] ? handle_fasteoi_mask_irq+0x1d0/0x1d0
[ 1416.003929] asm_call_irq_on_stack+0xf/0x20
[ 1416.008269]
[ 1416.010422] do_softirq_own_stack+0x5b/0x80
[ 1416.014833] irq_exit_rcu+0xc5/0x100
[ 1416.018628] common_interrupt+0xb8/0x1e0
[ 1416.022685] asm_common_interrupt+0x1e/0x40
[ 1416.026928] RIP: 0010:cpuidle_enter_state+0xd6/0x390

@vel21ripn
Owner

To find the place where the error occurs, I need the ndpi.o object file compiled with debugging information and the commit that was used for compilation.
(The command "objdump -l -d main.o" should show the line numbers from the source files.)

@didgaudin
Author

didgaudin commented Nov 22, 2023

main.o.gz
I am using the latest commit:
commit 9a6412b (HEAD -> flow_info-4, origin/flow_info-4, origin/HEAD)

obj.txt

@vel21ripn
Owner

I have a guess about the cause of this error: "conntrackd" restores the value of the "label", in which we store a pointer to our internal structures, and on the backup server that pointer is not valid.

If the guess is correct, then by fixing conntrackd (so that it does not restore the "label") you can get rid of kernel crashes.

The fix will require significant code changes.
I don't have time for such changes yet.
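
To illustrate the guess above, here is a small user-space model in Python. It is not xt_ndpi code and not the fix the maintainer has in mind; the `Host` class, the 16-byte label layout and the per-host `cookie` are all hypothetical names chosen for this sketch. It only shows why a reference stored in a label is meaningful on the host that created it, and how a receiver could treat a replicated label as "no flow attached" instead of following it.

```python
import struct

# Hypothetical label layout for this sketch: 8-byte owner cookie + 8-byte handle.
LABEL_FMT = "<QQ"


class Host:
    """One firewall node with its own private table of flow state."""

    def __init__(self, name, cookie):
        self.name = name
        self.cookie = cookie      # identifies the host (and boot) that owns a label
        self.flows = {}           # handle -> flow state, private to this host
        self._next_handle = 1

    def attach_flow(self, proto):
        """Create flow state and return a label that refers to it."""
        handle = self._next_handle
        self._next_handle += 1
        self.flows[handle] = proto
        return struct.pack(LABEL_FMT, self.cookie, handle)

    def lookup(self, label):
        """Return the flow state for a label, or None if the label is foreign."""
        cookie, handle = struct.unpack(LABEL_FMT, label)
        if cookie != self.cookie:
            # The label was created elsewhere (for example, replicated from a
            # peer). Its handle says nothing about our own table, so ignore it.
            return None
        return self.flows.get(handle)


if __name__ == "__main__":
    hub1 = Host("hub1", cookie=0x1111)
    hub2 = Host("hub2", cookie=0x2222)
    label = hub1.attach_flow("icmp")
    hub2.attach_flow("dns")       # hub2 happens to reuse the same handle value
    print(hub1.lookup(label))     # flow state found on the owning host
    print(hub2.lookup(label))     # replicated label is rejected on the peer
```

Without the cookie check, `hub2` would resolve the replicated label to its own unrelated "dns" entry. A raw kernel pointer is worse than that, because there is no table to miss in: the value is dereferenced directly.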

@Antaryo

Antaryo commented Jul 25, 2024

I also get a kernel panic when trying to sync sessions with conntrackd.
However, my call trace is different from the previous one:

[   95.261131] BUG: unable to handle page fault for address: ffff8d2f74c72024
[   95.265929] #PF: supervisor write access in kernel mode
[   95.269821] #PF: error_code(0x0002) - not-present page
[   95.273949] PGD 0 P4D 0
[   95.276154] Oops: 0002 [#1] PREEMPT SMP PTI
[   95.285489] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020
[   95.288763] Workqueue: events_power_efficient gc_worker [nf_conntrack]
[   95.290729] RIP: 0010:_raw_spin_lock_bh+0x1a/0x40
[   95.292122] Code: 90 5b c3 cc cc cc cc 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 89 fb bf 01 02 00 00 e8 cd 6e 61 ff 31 c0 ba 01 00 00 00 <3e> 0f b1 13 75 06 5b c3 cc cc cc cc 89 c6 48 89 df e8 30 01 00 00
[   95.297483] RSP: 0018:ffffb0ae027e7dd0 EFLAGS: 00010246
[   95.299027] RAX: 0000000000000000 RBX: ffff8d2f74c72024 RCX: 0000000000000000
[   95.301076] RDX: 0000000000000001 RSI: ffff9648328e1490 RDI: 0000000000000000
[   95.303141] RBP: 0000000000000000 R08: ffff964805ee3100 R09: 0000000000000000
[   95.305223] R10: 49b91b89bc11fbba R11: 8f9482879bee75b5 R12: ffff8d2f74c72000
[   95.307297] R13: ffff8d2f74c72024 R14: 0000000000000067 R15: 0000000000000000
[   95.309441] FS:  0000000000000000(0000) GS:ffff96483ce00000(0000) knlGS:0000000000000000
[   95.311310] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   95.312287] CR2: ffff8d2f74c72024 CR3: 00000000041ea001 CR4: 00000000001706f0
[   95.313465] Call Trace:
[   95.313881]  <TASK>
[   95.314221]  ? __die_body.cold+0x1a/0x1f
[   95.314896]  ? page_fault_oops+0xd2/0x2b0
[   95.315512]  ? exc_page_fault+0xca/0x170
[   95.316097]  ? asm_exc_page_fault+0x22/0x30
[   95.316698]  ? _raw_spin_lock_bh+0x1a/0x40
[   95.317415]  ct_ndpi_free_flow+0x4a/0x170 [xt_ndpi]
[   95.319320]  ndpi_nf_ct_destroy+0x2e/0x120 [xt_ndpi]
[   95.321009]  gc_worker+0x25e/0x570 [nf_conntrack]
[   95.322655]  process_one_work+0x1c4/0x380
[   95.324013]  worker_thread+0x4d/0x380
[   95.325472]  ? _raw_spin_lock_irqsave+0x23/0x50
[   95.326963]  ? rescuer_thread+0x3a0/0x3a0
[   95.328182]  kthread+0xe6/0x110
[   95.329294]  ? kthread_complete_and_exit+0x20/0x20
[   95.330597]  ret_from_fork+0x1f/0x30
[   95.331688]  </TASK>
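
The register dump above can be checked against the maintainer's guess about a restored pointer. The sketch below is a rough heuristic, not a diagnostic tool: it pulls register values out of oops text, keeps those that look like kernel data addresses, and groups them by their top bits. The function names, the 1 TiB bucket size and the address-range filter are assumptions made for this example. In this trace the faulting address in CR2 (`ffff8d2f74c72024`) and the registers holding it fall under `ffff8d...`, while the other kernel data pointers (RSI, R08, GS) fall under `ffff96...`. That is consistent with a pointer that was not allocated on this machine, but it does not prove it.

```python
import re

# Matches "RBX: ffff8d2f74c72024", "CR2: ..." and "GS:ffff96483ce00000".
# RIP/RSP are skipped because their values carry a segment prefix ("0018:...").
REG_RE = re.compile(r"\b(R[A-Z0-9]{2}|CR2|GS):\s*([0-9a-f]{16})")

# Assumed filter: keep canonical kernel addresses below the kernel text
# mapping, which drops user-space values, small integers and error codes.
KERNEL_LOW = 0xFFFF800000000000
KERNEL_TEXT = 0xFFFFFFFF80000000


def kernel_pointers(oops_text):
    """Return {register: value} for values that look like kernel data addresses."""
    regs = {}
    for name, value in REG_RE.findall(oops_text):
        addr = int(value, 16)
        if KERNEL_LOW <= addr < KERNEL_TEXT:
            regs.setdefault(name, addr)   # keep the first occurrence
    return regs


def regions(regs, shift=40):
    """Group register names by the top bits of their value (1 TiB buckets)."""
    groups = {}
    for name, addr in regs.items():
        groups.setdefault(addr >> shift, []).append(name)
    return groups


if __name__ == "__main__":
    import sys
    for prefix, names in sorted(regions(kernel_pointers(sys.stdin.read())).items()):
        print(hex(prefix), " ".join(names))
```

Saved under any file name, the script reads the oops text on standard input and prints one line per address bucket.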

I loaded an iptables rule with the nDPI module and called conntrack -c while ICMP traffic was flowing between Host 1 and Host 2.

Connection diagram:

           ┌─────┐
         ┌─┤Hub 1├─┐
┌──────┐ │ └──┬──┘ │ ┌──────┐
│Host 1├─┤    │    ├─┤Host 2│
└──────┘ │ ┌──┴──┐ │ └──────┘
         └─┤Hub 2├─┘
           └─────┘

This is a VRRP cluster test stand with active/passive hubs. When one of the hubs fails, the other starts routing traffic.

Could you please tell me what is needed to fix this problem? I'll try to follow your suggestion.
