Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xt_ndpi module: Memory leak from never freed flow allocations after linux kernel v5.17-rc1 #147

Closed
Vigilans opened this issue Nov 26, 2022 · 11 comments
Labels

Comments

@Vigilans
Copy link

Vigilans commented Nov 26, 2022

Describe the bug

In linux kernel, after commit torvalds/linux@6ae7989:

netfilter: conntrack: avoid useless indirection during conntrack destruction

nf_ct_put() results in a usesless indirection:

nf_ct_put -> nf_conntrack_put -> nf_conntrack_destroy -> rcu readlock +
indirect call of ct_hooks->destroy().

There are two _put helpers:
nf_ct_put and nf_conntrack_put. The latter is what should be used in
code that MUST NOT cause a linker dependency on the conntrack module
(e.g. calls from core network stack).

Everyone else should call nf_ct_put() instead.

A followup patch will convert a few nf_conntrack_put() calls to
nf_ct_put(), in particular from modules that already have a conntrack
dependency such as act_ct or even nf_conntrack itself.

Signed-off-by: Florian Westphal fw@strlen.de
Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org

The nf_conntrack_put and nf_ct_put api no longer have the same behavior once ct_hook->destroy is changed.

This caused xt_ndpi kernel module to show memory leak and cannot be unloaded once the leak happened, because conntrack code using nf_ct_put api will not call flow_free code of xt_ndpi.

Obtained behavior

  1. Abnormally large memory consumption and module used counts.
    In a 4G Pi device, after xt_ndpi been running for some days, 86% (3.2G/3.8G) of the memory has been occupied without any userspace process occupying more than 100MB memory.

    Investigating kernel module usage by lsmod | grep xt_ndpi, it shows a used count of 840620:

    # lsmod | grep xt_ndpi
    xt_ndpi              1359872  840620
    

    After reading the source code, I learned that such large usage count exactly derived from xt_ndpi flows that are not freed:

    # cat /sys/module/xt_ndpi/parameters/flow_created
    1481895
    # cat /sys/module/xt_ndpi/parameters/flow_deleted
    633595
    

    1481895-633595=848300, very close to the module usage of 840620. (In fact, during my reboot debug test, they are strictly equal, e.g. created-deleted = 214-103 = 111 == 111 = lsmod usage)

  2. conntrack -F and conntrack timeout does not reduce module usage at all.
    This is because both flush and timeout will call nf_ct_delete, which in turn will call nf_ct_put, so the hooked ndpi_destroy_conntrack will not be called at all.

Expected behavior

Figure out some way to clean the unused flow. I have no idea how to detour the nf_ct_put to call hooked ndpi function again. If hook no longer works, maybe some GC or some iptables targets that manually free the flow would help?

Environment:

  • OS name: Linux.
  • Kernel version: PC - v6.0.8, Pi - v5.15.72
  • Architecture: PC - amd64, Pi - aarch64

How to reproduce the reported bug

  1. Install kernel newer or equal to v5.17-rc1, or newer or equal to v5.15.36 (v5.15.36 also introduced this commit, see https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=linux-5.15.y&ofs=5782, v5.10.y seems not introduced).
  2. Insert iptables rules to catch some flow to ndpi.
  3. conntrack -F, see whether the module used count reduced.
@Vigilans Vigilans added the bug label Nov 26, 2022
@vel21ripn
Copy link
Owner

This is very bad news, since xt_ndpi won't work without a kernel patch :(
Earlier, to combat conntrack, I patched the kernel to add custom extensions that had their own destructor called from nf_conntrack_free().
Looks like I'll have to update this patch to the kernel.

vel21ripn added a commit that referenced this issue Nov 27, 2022
Fix for issue #147

Starting from the kernel 5.15.32 the use of nf_ct_hook->destroy
was stopped in the conntrack subsystem.
@vel21ripn
Copy link
Owner

It seems that the problem is not so serious.
I managed to abandon the dirty trick with the substitution of the pointer nf_ct_hook.destroy.
See 22e7331
Short testing on the core 5.15.80 did not reveal memory leakage.

@Vigilans
Copy link
Author

Vigilans commented Nov 28, 2022

@vel21ripn After torvalds/linux@1bc91a5 (v5.18-rc1) and torvalds/linux@1015c3d, the extension register api has been removed (according to the commit message, the destroy hook is replaced with nat_ops?), so the xt_ndpi module failed to compile on my linux PC (v6.0.8)

v5.15.y seems not introducing these commits by far.

@vel21ripn
Copy link
Owner

This is really bad news.
There is a simple option without a patch to the kernel - this is the use of "livepatch". We only need to make sure the kernels in the major distributions are built with this option.
Another option is to try sending a patch to the kernel that calls ct->destroy. But it is a very long difficult road.

@vel21ripn
Copy link
Owner

I was able to replace the ndpi_nf_ct_destroy() function with livepatch. For now, I consider this option as a temporary solution for newer kernels.
I will try to make a separate branch for new kernels.

@Vigilans
Copy link
Author

Vigilans commented Dec 8, 2022

I investigated a bit into this problem. Seems that the extension register apis are removed with following considerations:

  • There are many conntrack extensions written by kernel, whose info were initialized by conntrack register api.
  • The info provided to register api are very few: id, length, and possibly destroy hook. The destroy hook is only used by NAT extension.
  • The author of above commits uses commit torvalds/linux@5f31edc to remove the requirement of len field during register.
  • Since only NAT extension uses destroy field, it gets hard coded into nf_conntrack_free.
  • Therefore only id field is left during register, which does not provide any meaningful information. The author then removes register extension api completely.

So our problem is that with only NAT free logic hard coded into nf_conntrack_free, we do not have any way to inject ndpi free hook during conntrack connection freeing.

torvalds/linux@1bc91a5#diff-b8035b508222d2e4d442da3edc1241440ba2ca78aabebbcf52e70c53381c96a8L1600-R1610:

- nf_ct_ext_destroy(ct);
+ if (ct->status & IPS_SRC_NAT_DONE) {
+ 	const struct nf_nat_hook *nat_hook;
+
+ 	rcu_read_lock();
+ 	nat_hook = rcu_dereference(nf_nat_hook);
+ 	if (nat_hook)
+ 		nat_hook->remove_nat_bysrc(ct);
+ 	rcu_read_unlock();
+ }
+
+ kfree(ct->ext);

Therefore, we may focus more on making nf_conntrack_free call registered hook function, instead of trying to call ct->destroy again. To propose a patch that may gets accepted more easily, we may convert the legacy nf_ct_ext_destroy code into a more specific mechanism for registering conntrack free hook. I might be interested in creating such a patch.

@vel21ripn
Copy link
Owner

There are two options.

  1. We need to return an indirect call to "nf_ct_put(struct nf_conn *ct)" instead of a direct call to "nf_ct_destroy()".
  2. We need to add a callback to "struct nf_conn" to notify about connection deletion and call it before "kfree(ct->ext)" in "nf_conntrack_free()".

The second option has less overhead and can do without the rcu_pointer.

@vel21ripn
Copy link
Owner

I've started working on support for 6.x kernels. The live-patching mechanism is used.
Testing will be on the kernel version 6.1.10.

@vel21ripn
Copy link
Owner

If possible, then look at the commit 3e6d0f1

@k0ste
Copy link

k0ste commented Feb 17, 2023

If possible, then look at the commit 3e6d0f1

For me build with Arch kernel and enabled livepatch feature now successful

@Vigilans
Copy link
Author

If possible, then look at the commit 3e6d0f1

Tested on Arch kernel 6.1.12 and Debian kernel 6.0.8, it successfully works. Thanks for the work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants