
Another app is currently holding the xtables lock. Perhaps you want to use the -w option? #3828

Closed
Congelli501 opened this issue Jul 5, 2020 · 11 comments

@Congelli501

What did you expect to happen?

Weave starts normally on the first try.

What happened?

On a busy cluster (many pods waiting to be created on a node), Weave takes many tries to start: it fails at startup and goes into CrashLoopBackOff.
The "weave" container of the node's weave pod shows only the following log line:

Another app is currently holding the xtables lock. Perhaps you want to use the -w option?

After killing the weave pod many times (to retry after a CrashLoopBackOff), Weave eventually starts up and then runs normally.
On one of my nodes, it took 20 minutes to get Weave to start.

How to reproduce it?

Set up a busy Kubernetes node (100 pods) and reboot it. The node will come up with many assigned pods waiting to start and won't be able to start Weave. A quick way to create that load is sketched below.
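
For example (a sketch; the image and node name are placeholders, any small image will do):

$ kubectl create deployment busy --image=k8s.gcr.io/pause:3.2
$ kubectl scale deployment busy --replicas=100
# then reboot the node the pods landed on
$ ssh node0 sudo reboot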

Anything else we need to know?

This bug occurred after a cluster upgrade (reinstall).
I didn't have this issue on my previous cluster (Ubuntu 18.04 amd64, Kubernetes v1.13.11, Weave 2.5.2).

Cloud provider: bare metal

Versions:

$ weave version
2.6.5
$ docker version
Client:
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.13.8
 Git commit:        afacb8b7f0
 Built:             Tue Jun 23 22:26:12 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.8
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.8
  Git commit:       afacb8b7f0
  Built:            Thu Jun 18 08:26:54 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.3-0ubuntu2
  GitCommit:        
 runc:
  Version:          spec: 1.0.1-dev
  GitCommit:        
 docker-init:
  Version:          0.18.0
  GitCommit:        

$ uname -a
Linux master0 5.4.0-40-generic #44-Ubuntu SMP Tue Jun 23 00:01:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.5", GitCommit:"e6503f8d8f769ace2f338794c914a96fc335df0f", GitTreeState:"clean", BuildDate:"2020-06-26T03:47:41Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.5", GitCommit:"e6503f8d8f769ace2f338794c914a96fc335df0f", GitTreeState:"clean", BuildDate:"2020-06-26T03:39:24Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

$ lsb_release -a
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04 LTS
Release:	20.04
Codename:	focal

Logs:

$ kubectl logs -n kube-system weave-net-gsklb weave
Another app is currently holding the xtables lock. Perhaps you want to use the -w option?

Network:

Seems OK, same network as our previous cluster.

Troubleshooting?

It seems the problem comes from iptables being locked (in use) by other Kubernetes processes.
I saw a lot of portmap commands being run by kubelet (100+ commands sometimes).

Instead of retrying on an iptables lock error or waiting for its turn (with the --wait option), Weave crashes, which makes it hard to start (the iptables command has to succeed on the first try).
I can't explain why my previous cluster was working fine, though.
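
To illustrate the difference (with a made-up rule, purely to show the behaviour of the flag):

# without -w, iptables gives up immediately if another process holds the lock
$ iptables -A INPUT -s 10.0.0.0/8 -j ACCEPT
Another app is currently holding the xtables lock. Perhaps you want to use the -w option?

# with -w [seconds], it waits for the lock instead of failing
$ iptables -w 5 -A INPUT -s 10.0.0.0/8 -j ACCEPT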

@bboreham (Contributor) commented Jul 6, 2020

The whole log is that one line?

Generally, iptables commands in Weave Net are run with the --wait option, so you appear to have found an outlier.
Perhaps it's the auto-detection of nft: could you try setting IPTABLES_BACKEND and see if it improves?
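
For example (a sketch, assuming the standard weave-net DaemonSet name; adjust to your deployment):

$ kubectl set env daemonset/weave-net -n kube-system IPTABLES_BACKEND=nft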

@Congelli501 (Author)

Yes, that was the full log.

Indeed, the auto-detection seems to choose legacy:

# docker exec -it 54e01edb7c47 sh
/home/weave # ls -lah /sbin/iptables
lrwxrwxrwx    1 root     root          20 May 13 13:15 /sbin/iptables -> xtables-legacy-multi

The detection commands return the following results:

/home/weave # (iptables-legacy-save || true) 2>/dev/null | grep '^-' | wc -l
7033
/home/weave # (iptables-nft-save || true) 2>/dev/null | grep '^-' | wc -l
0

Running the iptables-nft-save command returns the following:

/home/weave # iptables-nft-save
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them
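
So presumably the auto-detection compares the rule counts reported by the two backends, something like this (my reconstruction, not the actual launch.sh code):

# pick whichever backend reports more rules
legacy_rules=$( (iptables-legacy-save || true) 2>/dev/null | grep -c '^-' )
nft_rules=$( (iptables-nft-save || true) 2>/dev/null | grep -c '^-' )
if [ "$nft_rules" -gt "$legacy_rules" ]; then
    backend=nft
else
    backend=legacy    # 7033 legacy rules vs 0 nft rules here, so legacy wins
fi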

I use iptables-restore directly at startup to set up my firewall; could that be related?

@bboreham (Contributor) commented Jul 7, 2020

Could you try setting IPTABLES_BACKEND as an environment variable in the Weave Net DaemonSet and see if it improves?

@Congelli501 (Author)

I just tried setting IPTABLES_BACKEND to nft on both containers (weave + weave-npc) of the weave DaemonSet.

The weave-npc container crash-loops with an error similar to #3816:

root@master0:~# kubectl logs -n kube-system weave-net-jw6m5 weave
ipset v7.2: Set cannot be destroyed: it is in use by a kernel component
root@master0:~# kubectl logs -n kube-system weave-net-jw6m5 weave-npc
INFO: 2020/07/21 23:51:38.072617 Starting Weaveworks NPC 2.6.5; node name "master0"
INFO: 2020/07/21 23:51:38.072733 Serving /metrics on :6781
Tue Jul 21 23:51:38 2020 <5> ulogd.c:408 registering plugin `NFLOG'
Tue Jul 21 23:51:38 2020 <5> ulogd.c:408 registering plugin `BASE'
Tue Jul 21 23:51:38 2020 <5> ulogd.c:408 registering plugin `PCAP'
Tue Jul 21 23:51:38 2020 <5> ulogd.c:981 building new pluginstance stack: 'log1:NFLOG,base1:BASE,pcap1:PCAP'
WARNING: scheduler configuration failed: Function not implemented
DEBU: 2020/07/21 23:51:38.114634 Got list of ipsets: [weave-;rGqyMIl1HN^cfDki~Z$3]6!N weave-s_+ChJId4Uy_$}G;WdH|~TK)I weave-k?Z;25^M}|1s7P3|H9i;*;MhG weave-5p+tpDg;~4g#*~UIK3LE5B#E7 weave-k*OnRy)yTQMY|1/JzrwAobr/X weave-!)3z_o[q?8ChN0_o[@u^aN~|j weave-P.B|!ZhkAr5q=XZ?3}tMBA+0 weave-E1ney4o[ojNrLk.6rOHi;7MPE weave-iuZcey(5DeXbzgRFs8Szo]+@p weave-]B*(W?)t*z5O17G044[gUo#$l weave-sui%__gZ}{kX~oZgI_Ttqp=Dp weave-mF}1zpEo4W6iYroE^=:V3{S6W weave-Rzff}h:=]JaaJl/G;(XJpGjZ[ weave-41s)5vQ^o/xWGz6a20N:~?#|E weave-4vtqMI+kx/2]jD%_c0S%thO%V weave-kube-test]
DEBU: 2020/07/21 23:51:38.114649 Flushing ipset 'weave-;rGqyMIl1HN^cfDki~Z$3]6!N'
DEBU: 2020/07/21 23:51:38.115150 Flushing ipset 'weave-s_+ChJId4Uy_$}G;WdH|~TK)I'
DEBU: 2020/07/21 23:51:38.115442 Flushing ipset 'weave-k?Z;25^M}|1s7P3|H9i;*;MhG'
DEBU: 2020/07/21 23:51:38.115740 Flushing ipset 'weave-5p+tpDg;~4g#*~UIK3LE5B#E7'
DEBU: 2020/07/21 23:51:38.116180 Flushing ipset 'weave-k*OnRy)yTQMY|1/JzrwAobr/X'
DEBU: 2020/07/21 23:51:38.116462 Flushing ipset 'weave-!)3z_o[q?8ChN0_o[@u^aN~|j'
DEBU: 2020/07/21 23:51:38.116738 Flushing ipset 'weave-P.B|!ZhkAr5q=XZ?3}tMBA+0'
DEBU: 2020/07/21 23:51:38.117119 Flushing ipset 'weave-E1ney4o[ojNrLk.6rOHi;7MPE'
DEBU: 2020/07/21 23:51:38.117408 Flushing ipset 'weave-iuZcey(5DeXbzgRFs8Szo]+@p'
DEBU: 2020/07/21 23:51:38.117684 Flushing ipset 'weave-]B*(W?)t*z5O17G044[gUo#$l'
DEBU: 2020/07/21 23:51:38.117949 Flushing ipset 'weave-sui%__gZ}{kX~oZgI_Ttqp=Dp'
DEBU: 2020/07/21 23:51:38.118230 Flushing ipset 'weave-mF}1zpEo4W6iYroE^=:V3{S6W'
DEBU: 2020/07/21 23:51:38.118501 Flushing ipset 'weave-Rzff}h:=]JaaJl/G;(XJpGjZ['
DEBU: 2020/07/21 23:51:38.118899 Flushing ipset 'weave-41s)5vQ^o/xWGz6a20N:~?#|E'
DEBU: 2020/07/21 23:51:38.119180 Flushing ipset 'weave-4vtqMI+kx/2]jD%_c0S%thO%V'
DEBU: 2020/07/21 23:51:38.119537 Flushing ipset 'weave-kube-test'
DEBU: 2020/07/21 23:51:38.120067 Destroying ipset 'weave-;rGqyMIl1HN^cfDki~Z$3]6!N'
ERRO: 2020/07/21 23:51:38.135906 Failed to destroy ipset 'weave-;rGqyMIl1HN^cfDki~Z$3]6!N'
FATA: 2020/07/21 23:51:38.135932 ipset [destroy weave-;rGqyMIl1HN^cfDki~Z$3]6!N] failed: ipset v7.2: Set cannot be destroyed: it is in use by a kernel component
: exit status 1

When I set the IPTABLES_BACKEND env back to legacy, Weave starts.
I tried rebooting the nodes, and the error persisted across reboots.
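
My guess is that the "in use by a kernel component" error means legacy iptables rules still reference those ipsets while weave-npc, now on the nft backend, tries to destroy them. Something like this should show the referencing rules (a guess, not a verified diagnosis):

# list legacy rules that still match against the weave ipsets
$ iptables-legacy-save | grep -- '--match-set weave-'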

@nesc58 commented Jul 28, 2020

I have found some lines in launch.sh where no -w is added to the iptables commands.
PR: #3835

Some iptables commands run outside of the weave application, within launch.sh. That could be why the log contains only that single line.
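
A hypothetical example of the kind of change (not the literal launch.sh lines):

# before: fails immediately if another process holds the xtables lock
iptables -t nat -N WEAVE
# after (PR #3835): wait for the lock to be released instead
iptables -w -t nat -N WEAVE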

@bboreham (Contributor)

Thanks @nesc58 - you are agreeing with what I said at #3828 (comment)

@bboreham (Contributor)

Sorry, my mistake; we were talking about different parts of the script.
Do you think the iptables-save command should also get a -w?

@bboreham (Contributor)

@Congelli501 I meant for you to try setting it to the correct backend for your machine. It sounds like "legacy" was correct.

@nesc58 commented Jul 29, 2020

Do you think the iptables-save command should also get a -w?

iptables-save does not have a -w option. Maybe the save command does not need a wait parameter because it is read-only and does not modify the rules. iptables-restore also has a -w command-line argument, but I have not found any calls to it.

I don't expect the parameter will be added: according to the man pages, iptables-nft and the nft executable no longer need a -w.

Because the xtables-nft tools use the nf_tables kernel API, rule additions and deletions are always atomic. Unlike iptables-legacy, iptables-nft -A .. will NOT need to retrieve the current ruleset from the kernel, change it, and re-load the altered ruleset. Instead, iptables-nft will tell the kernel to add one rule. For this reason, the iptables-legacy --wait option is a no-op in iptables-nft.
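
For reference, the flag is used the same way with iptables-restore (the file path is a placeholder):

# replace the whole ruleset atomically, waiting for the xtables lock if needed
iptables-restore -w < /etc/iptables/rules.v4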

A lot of the /test/*.sh files contain iptables calls. Should they also get the -w parameter?

@bboreham (Contributor)

iptables-save does not have a -w option.

Ah, thanks.

A lot of the /test/*.sh files have a lot of iptables calls. I would suggest that they also needs the -w parameters or?

It wouldn't be wrong, but the test scripts are expected to be run one at a time, so they should not encounter conflicts.
I've never seen a CI failure for that reason.

@bboreham (Contributor)

Closing as fixed by #3835
