
Minikube NFS mount to WinNFSd stops working after about 15-30 minutes #68

Open
pietervogelaar opened this issue Sep 10, 2018 · 16 comments

@pietervogelaar commented Sep 10, 2018

I'm running Windows 10 and WinNFSd 2.4.0.

I start WinNFSd in the foreground with:
WinNFSd.exe -pathFile "C:\Users\myname\etc\nfs-pathfile.txt"

The content of nfs-pathfile.txt:
C:\Users\myname\minikube\sources

From my Minikube VM I create an NFS mount with:
sudo mkdir -p /host-sources && sudo mount -t nfs -o nfsvers=3,tcp 192.168.99.1:/C/Users/myname/minikube/sources /host-sources

At this point, it works great! I can list, read and write files, and the WinNFSd program logs all kinds of file access very quickly.

However, after about 15-30 minutes, if I do an ls -sal /host-sources, the client just keeps waiting / hanging. When this happens, no information is added to the log. Retrying the command still hangs and still adds nothing to the WinNFSd log, so I have no clue what is going on.

I disabled all sleep / energy settings, and the disks are never turned off, but I still get this problem. Do you have any idea what goes wrong, or what options are left to try?

Do others have WinNFSd running fine for at least a couple of days?

@pietervogelaar (Author) commented Sep 11, 2018

It keeps working if I change the protocol from tcp to udp. Symantec security software is running; maybe that's the problem.
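For reference, the udp variant of the mount command from my first post would look like this (only the protocol option changes; the path is specific to my setup):

sudo mkdir -p /host-sources && sudo mount -t nfs -o nfsvers=3,udp 192.168.99.1:/C/Users/myname/minikube/sources /host-sources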

@pietervogelaar (Author) commented Sep 11, 2018

I completely removed Symantec and added an exclusion for the C:\Users\myname\minikube path in Windows Defender, but I still experience the same problem when I use the tcp protocol.

@jim5359 commented Oct 11, 2018

I have exactly the same problem using the same setup (Minikube on Windows). I've found that after 15-30 minutes of idle, when I try to access the NFS share it hangs. Sometimes it comes back in a few seconds, sometimes it takes close to a minute, and a couple of times it completely froze and I had to kill Minikube.

I tried switching to udp and so far I haven't experienced the problem, but I'm surprised there aren't more people complaining about this issue.

@cbj4074 commented Oct 18, 2018

You two are definitely not the first to complain about problems with TCP. For example:

#22
#24

At this point, it might be helpful to use a tool like Process Explorer to examine the hung thread and see if it reveals any clues, such as a specific procedure call, ballooning resource consumption over time, etc.

Also, @pietervogelaar, I would edit the issue title to include Minikube, as this problem seems to be specific to its use.

@pietervogelaar (Author) commented Oct 18, 2018

@cbj4074 It's not specific to Minikube; it could be any VM on Windows.

@cbj4074 commented Oct 19, 2018

@pietervogelaar Have you confirmed that, or are you speculating?

I'm not saying you're wrong, I just don't want to waste anybody's time trying to reproduce the issue when, last we knew, all of the "TCP hanging" issues were effectively ironed-out.

The fact that nobody has mentioned issues with TCP in quite some time, excepting two Minikube users, makes the question relevant, IMO.

@jim5359 commented Oct 19, 2018

Just to add a bit more color: for me, MiniKube is running inside a VirtualBox VM.

Linux minikube 4.15.0 #1 SMP Thu Sep 27 17:28:06 UTC 2018 x86_64 GNU/Linux

NAME=Buildroot
VERSION=2018.05
ID=buildroot
VERSION_ID=2018.05
PRETTY_NAME="Buildroot 2018.05"

@cbj4074's point is valid. I have not tried to reproduce this issue on a non-MiniKube VM, so it's possible it is directly related to MiniKube.

@danballance commented Nov 25, 2018

I believe I am experiencing this issue. I am using Vagrant 2.2.1 on Windows 10 to manage a VirtualBox VM. It works for a while, but then I switch to another long-running task, for example importing my databases, and when I try to access the NFS share from the Linux guest it hangs/crashes.

@nathansimmonds commented Dec 2, 2018

> @pietervogelaar Have you confirmed that, or are you speculating?
>
> I'm not saying you're wrong, I just don't want to waste anybody's time trying to reproduce the issue when, last we knew, all of the "TCP hanging" issues were effectively ironed out.
>
> The fact that nobody has mentioned issues with TCP in quite some time, excepting two Minikube users, makes the question relevant, IMO.

I am experiencing the same behaviour running Vagrant + DrupalVM with winnfsd. It works fine for a while, but leave it too long and the Linux shell just stops responding when attempting to access an NFS-mounted share. However, with winnfsd open in debug mode I notice I can still issue commands to the NFS shell, such as quit, etc.

Setting UDP via the mount_options worked for this too, except that the default Vagrant mount is automatic, so I updated the Vagrantfile and set the default value for nfs_udp to true within the vagrant_sync_folders section (see the sketch below).
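For anyone else who wants to try this, here is a minimal sketch of what the equivalent can look like in a plain Vagrantfile, assuming the standard Vagrant NFS synced-folder options (nfs_udp and mount_options) and placeholder host/guest paths; DrupalVM wraps this in its own config section, so adapt as needed:

# Sketch only: the paths are placeholders; nfs_udp and mount_options are
# standard Vagrant synced-folder options for NFS shares.
config.vm.synced_folder "C:/Users/myname/minikube/sources", "/host-sources",
  type: "nfs",
  nfs_udp: true,                       # use UDP instead of TCP
  mount_options: ["nfsvers=3", "udp"]  # mirror the manual mount options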

@nevdelap commented Dec 2, 2018

I had a problem similar to this about a month ago, but because I didn't have anything I thought could be used to help reproduce or debug the problem, I didn't report anything and went back to using cifs. I will try it again and see if either it works or I can add something useful to help.

Update: after using it for over two days (brought up identically to when I previously had the hanging after 15 minutes - same config and batch file I'd previously set up), it is working perfectly. The first time it had worked great for days before it stopped working after a reboot. I've rebooted probably several times since then. Now it is working flawlessly. This is Windows 10 in VMware sharing a directory on a RAM disk.
Update: I rebooted the Windows VM and Linux. Now my mount hangs after a while. I don't know exactly how long, but 15-30 minutes is probably about right. I have to close all applications and terminals that are using it in order to unmount. After remounting it happens again, and it has happened several times since. I am going to try UDP.

@pietervogelaar (Author) commented Dec 4, 2018

@cbj4074 Yesterday I tested a CentOS 7.5 VM with TCP, but I didn't experience any hanging issues.

Test:

vagrant init bento/centos-7.5
vagrant up
vagrant ssh
sudo mkdir -p /host-sources && sudo mount -t nfs -o nfsvers=3,tcp 192.168.99.1:/C/Users/myname/minikube/sources /host-sources

This works just fine. Only the first ls command after 15 minutes takes about 5 seconds, and after that it's fast again.

Do you have any clue what can possibly go wrong with the Minikube VM?

@pietervogelaar pietervogelaar changed the title NFS mount to WinNFSd stops working after about 15-30 minutes Minikube NFS mount to WinNFSd stops working after about 15-30 minutes Dec 4, 2018
@pietervogelaar (Author) commented Dec 4, 2018

Today I tested TCP again with Minikube v0.30.0 and the hanging issue still exists.

Output from debug commands after the NFS mount was hanging:

dmesg | grep -v audit | grep -v hpet1

[  101.284216] mount.nfs (3664) used greatest stack depth: 13272 bytes left
[  105.868100] docker0: port 1(veth7300085) entered blocking state
[  105.868102] docker0: port 1(veth7300085) entered disabled state
[  105.868141] device veth7300085 entered promiscuous mode
[  105.869838] IPv6: ADDRCONF(NETDEV_UP): veth7300085: link is not ready
[  106.169499] eth0: renamed from veth034d132
[  106.172640] IPv6: ADDRCONF(NETDEV_CHANGE): veth7300085: link becomes ready
[  106.172659] docker0: port 1(veth7300085) entered blocking state
[  106.172660] docker0: port 1(veth7300085) entered forwarding state
[  106.172681] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
[  108.459337] docker0: port 2(vethbe32148) entered blocking state
[  108.459338] docker0: port 2(vethbe32148) entered disabled state
[  108.459477] device vethbe32148 entered promiscuous mode
[  108.463073] IPv6: ADDRCONF(NETDEV_UP): vethbe32148: link is not ready
[  108.570814] docker0: port 3(vethf3a6e1b) entered blocking state
[  108.570815] docker0: port 3(vethf3a6e1b) entered disabled state
[  108.570886] device vethf3a6e1b entered promiscuous mode
[  108.578147] IPv6: ADDRCONF(NETDEV_UP): vethf3a6e1b: link is not ready
[  108.578150] docker0: port 3(vethf3a6e1b) entered blocking state
[  108.578151] docker0: port 3(vethf3a6e1b) entered forwarding state
[  108.816739] docker0: port 4(veth903835a) entered blocking state
[  108.816740] docker0: port 4(veth903835a) entered disabled state
[  108.816890] device veth903835a entered promiscuous mode
[  108.819198] IPv6: ADDRCONF(NETDEV_UP): veth903835a: link is not ready
[  108.819200] docker0: port 4(veth903835a) entered blocking state
[  108.819200] docker0: port 4(veth903835a) entered forwarding state
[  108.903450] docker0: port 5(veth4e8f225) entered blocking state
[  108.903451] docker0: port 5(veth4e8f225) entered disabled state
[  108.903587] device veth4e8f225 entered promiscuous mode
[  108.907759] IPv6: ADDRCONF(NETDEV_UP): veth4e8f225: link is not ready
[  108.907762] docker0: port 5(veth4e8f225) entered blocking state
[  108.907763] docker0: port 5(veth4e8f225) entered forwarding state
[  109.125150] eth0: renamed from vethc9a6035
[  109.129401] docker0: port 3(vethf3a6e1b) entered disabled state
[  109.129469] docker0: port 4(veth903835a) entered disabled state
[  109.129528] docker0: port 5(veth4e8f225) entered disabled state
[  109.129546] IPv6: ADDRCONF(NETDEV_CHANGE): vethbe32148: link becomes ready
[  109.129558] docker0: port 2(vethbe32148) entered blocking state
[  109.129559] docker0: port 2(vethbe32148) entered forwarding state
[  109.337932] docker0: port 6(vethfa88936) entered blocking state
[  109.337933] docker0: port 6(vethfa88936) entered disabled state
[  109.338794] device vethfa88936 entered promiscuous mode
[  109.353629] IPv6: ADDRCONF(NETDEV_UP): vethfa88936: link is not ready
[  109.353632] docker0: port 6(vethfa88936) entered blocking state
[  109.353633] docker0: port 6(vethfa88936) entered forwarding state
[  109.461531] docker0: port 6(vethfa88936) entered disabled state
[  109.529598] eth0: renamed from vethc2a6d1b
[  109.545038] IPv6: ADDRCONF(NETDEV_CHANGE): veth903835a: link becomes ready
[  109.545056] docker0: port 4(veth903835a) entered blocking state
[  109.545057] docker0: port 4(veth903835a) entered forwarding state
[  109.556089] eth0: renamed from vethd1a7a12
[  109.559633] eth0: renamed from veth7ec6305
[  109.564080] IPv6: ADDRCONF(NETDEV_CHANGE): veth4e8f225: link becomes ready
[  109.564126] docker0: port 5(veth4e8f225) entered blocking state
[  109.564127] docker0: port 5(veth4e8f225) entered forwarding state
[  109.564272] IPv6: ADDRCONF(NETDEV_CHANGE): vethf3a6e1b: link becomes ready
[  109.564281] docker0: port 3(vethf3a6e1b) entered blocking state
[  109.564282] docker0: port 3(vethf3a6e1b) entered forwarding state
[  109.770078] eth0: renamed from veth5df17ed
[  109.776160] IPv6: ADDRCONF(NETDEV_CHANGE): vethfa88936: link becomes ready
[  109.776177] docker0: port 6(vethfa88936) entered blocking state
[  109.776178] docker0: port 6(vethfa88936) entered forwarding state
[  109.883023] docker0: port 7(veth9e4be4f) entered blocking state
[  109.883024] docker0: port 7(veth9e4be4f) entered disabled state
[  109.883158] device veth9e4be4f entered promiscuous mode
[  109.888425] IPv6: ADDRCONF(NETDEV_UP): veth9e4be4f: link is not ready
[  109.888428] docker0: port 7(veth9e4be4f) entered blocking state
[  109.888429] docker0: port 7(veth9e4be4f) entered forwarding state
[  110.185464] eth0: renamed from veth52e76cf
[  110.190521] IPv6: ADDRCONF(NETDEV_CHANGE): veth9e4be4f: link becomes ready
[  110.426279] docker0: port 8(vethc39b273) entered blocking state
[  110.426280] docker0: port 8(vethc39b273) entered disabled state
[  110.426804] device vethc39b273 entered promiscuous mode
[  110.429924] IPv6: ADDRCONF(NETDEV_UP): vethc39b273: link is not ready
[  110.429926] docker0: port 8(vethc39b273) entered blocking state
[  110.429927] docker0: port 8(vethc39b273) entered forwarding state
[  110.462359] docker0: port 8(vethc39b273) entered disabled state
[  111.095048] eth0: renamed from veth3b81d4c
[  111.099524] IPv6: ADDRCONF(NETDEV_CHANGE): vethc39b273: link becomes ready
[  111.099541] docker0: port 8(vethc39b273) entered blocking state
[  111.099542] docker0: port 8(vethc39b273) entered forwarding state
[  111.884810] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
[  111.884821] IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
[  111.884822] IPVS: Each connection entry needs 288 bytes at least
[  112.509309] IPVS: ipvs loaded.
[  120.392211] hpet_rtc_timer_reinit: 69 callbacks suppressed
[  130.399590] hpet_rtc_timer_reinit: 9 callbacks suppressed
[  145.895264] NFSD: Unable to end grace period: -110
[  175.415045] hpet_rtc_timer_reinit: 4 callbacks suppressed
[  418.788655] kworker/dying (1132) used greatest stack depth: 13064 bytes left
[ 2286.534720] nfs: server 192.168.99.1 not responding, still trying
[ 2737.007988] device eth0 entered promiscuous mode
[ 2762.392952] device eth0 left promiscuous mode
[ 2907.930860] device eth1 entered promiscuous mode
[ 2938.958203] device eth1 left promiscuous mode
[ 3038.019977] device eth1 entered promiscuous mode

Copied from the error lines of the TCP dump opened in Wireshark:

No.	Time		Source		Destination	Proto	Length	Info
12	61.438201	192.168.99.101	192.168.99.1	TCP	54	[TCP Dup ACK 1#1] 992 → 2049 [ACK] Seq=1 Ack=1 Win=306 Len=0
13	61.438492	192.168.99.1	192.168.99.101	TCP	54	[TCP Dup ACK 2#1] [TCP ACKed unseen segment] 2049 → 992 [ACK] Seq=1 Ack=2 Win=2048 Len=0
@cbj4074 commented Dec 4, 2018

@nathansimmonds Thanks for the excellent tip here:

> Setting UDP via the mount_options worked for this too, except that the default Vagrant mount is automatic, so I updated the Vagrantfile and set the default value for nfs_udp to true within the vagrant_sync_folders section.

At this point, it seems clear that this is a TCP problem.

That said, it seems that MiniKube may also play a role in how the problem manifests, for whatever reason.

@jim5359 @danballance As a workaround, please try changing the NFS mount options to use the UDP protocol and see if the hanging behavior subsides. Please report back so we can further narrow down the root cause.

@danballance Which OS are you running in the guest?

@pietervogelaar I'm not sure in what way MiniKube contributes to this problem, unfortunately. Let's keep collecting info so we can better profile the root cause.

@pietervogelaar (Author) commented Dec 4, 2018

Some additional information: the Minikube VM NFS mount (nfsvers=3,tcp) works great on a Linux (Fedora) or Mac OS X (Mojave) host.

So the problem can't be entirely on the Minikube VM side; but on the other hand, WinNFSd with TCP works great on Windows with a CentOS 7.5 VM, so the problem can't be entirely on the WinNFSd side either.

So it must be an interaction problem of some kind.

@jim5359 commented Dec 4, 2018

@cbj4074 I've been using UDP to mount my NFS share inside MiniKube for the past 6 weeks or so and it hasn't hung once. So the problem is definitely with TCP. I do occasionally have a problem with WinNFSd slamming the CPU, but it's only occasional and after killing WinNFSd the problem goes away.

@JochemKlingeler commented Feb 26, 2019

I have been experiencing this problem as well. This is my setup:

  • Host: Windows 10
  • Vagrant 2.2.3
  • using vagrant-winnfsd 1.4.0 plugin
  • VirtualBox 6.0.4
  • VM box: CentOS/7 1901.01

Switching the mount options to use UDP like @cbj4074 suggested has solved the problem for me.
