Cannot stat file /proc/31856/fd/27: No such file or directory #2

Open
nomandera opened this issue Jun 10, 2015 · 8 comments

Comments

@nomandera

Running consld8 /mnt/user/backup

I see a number of these errors:

Cannot stat file /proc/31856/fd/27: No such file or directory

These feel like soft errors to me, as the process works correctly, but they are worrisome.

Do we have any ideas, or is the current thinking that it is cache_dirs related?

@trinapicot
Owner

I did some investigating and isolated the error to the fuser command that is used to check if a file is in use before attempting to move it. It happens more frequently with cache_dirs running, but it also happens without cache_dirs. Most of the process IDs listed in the error messages were gone by the time I could try ps. The one process that I could get information on was a dynamix script that checks cpu freq. I have no idea how that is related to fuser, but it showed up several times over the course of many tests.

The only way the error might cause an issue is if a file is in use and fuser happens to fail with this (or any other) error. In this case an attempt will be made to copy a file that is in use and if that copy succeeds, the source file is deleted. I don't know the possible consequences of this happening.

In short: these errors are generally harmless. But don't use files that you are trying to move between disks because there is a small chance that something unexpected might happen.

And if anyone knows of a more reliable way to check if a file is in use, I'm open to suggestions.

@nomandera
Author

That is interesting. Since my example above was my backup folder, I can be 100% confident no intentional process was touching the files, i.e. my backup scripts are called manually, unless I happened to have the CIFS share open somewhere, idling.

I am not sure where to go with this, as I still feel (like you) that these errors are "generally harmless", but that's a long way from "totally harmless".

Could we perhaps skip moving and deleting any files that see this error? Worst case, that means the tool would need to be run a couple of times. Ugly, but I can't think of a better answer.

Also, I have zero clues why dynamix would be touching my backup dir hundreds of times.

@trinapicot
Owner

> Could we perhaps skip moving and deleting any files that see this error?

The return code of fuser is used to skip a file. The problem is the return code from fuser doesn't differentiate between "successfully completed but didn't find a user of the file" and "unsuccessful".
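
One conservative way to act on that ambiguity, sketched under assumptions: treat a non-zero exit combined with any output on stderr as "unknown" and skip the file. The function name `classify_in_use` and the `FUSER_CMD` override are hypothetical and not part of diskmv; the only fuser behavior relied on is the exit status described above and the `-s` flag.

```bash
#!/bin/bash
# Hypothetical helper, not part of diskmv.
# Prints one of: in-use, unused, unknown.
classify_in_use() {
  local file=$1 err rc
  # Capture stderr only and discard stdout.
  # FUSER_CMD is an assumed override so the logic can be tested with a stub.
  err=$("${FUSER_CMD:-fuser}" -s "$file" 2>&1 >/dev/null)
  rc=$?
  if (( rc == 0 )); then
    echo "in-use"      # fuser found at least one user of the file
  elif [[ -z $err ]]; then
    echo "unused"      # non-zero exit and nothing printed on stderr
  else
    echo "unknown"     # non-zero exit plus an error message: cannot tell
  fi
}
```

A caller would move a file only when the result is `unused`. The trade-off is that a file may be skipped even though it is free, if fuser prints a warning but still finishes its scan, so the tool might need to be run more than once.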

> Also I have zero clues why dynamix would be touching my backup dir

There is no indication that dynamix is touching your backup dir. Only that fuser is stumbling over a process that is related to dynamix.

My current hypothesis is that every time fuser is called on a file, it goes through every process and checks if that process has the file open. Every once in a while, at the same time fuser is checking a process, that process closes a file. The /proc/<pid>/fd/<#> entry disappears and fuser complains that it cannot stat it. The important question is: does fuser continue checking the rest of the processes, or does it stop? I think it might continue, which would mean this error message is totally harmless.

I'll try to figure out a way to test this hypothesis.
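
The suspected race can be illustrated with a rough sketch that walks `/proc/<pid>/fd` by hand. This is not how fuser is implemented, and its real matching logic may differ; the function name `scan_fds` is hypothetical. It lists every descriptor entry first and then resolves each one, so an entry that is closed in between shows up as "vanished" while the scan carries on with the remaining processes.

```bash
#!/bin/bash
# Rough illustration only, assuming a Linux /proc filesystem.
# scan_fds FILE prints "matches=N vanished=M".
scan_fds() {
  local target fd link matches=0 vanished=0
  target=$(readlink -f -- "$1") || return 1
  shopt -s nullglob   # note: stays enabled after the function returns
  # The glob is expanded once, up front; entries may disappear afterwards.
  for fd in /proc/[0-9]*/fd/*; do
    if link=$(readlink -- "$fd" 2>/dev/null); then
      [[ $link == "$target" ]] && (( matches+=1 ))
    else
      # Entry vanished (or was otherwise unreadable) between listing and lookup.
      (( vanished+=1 ))
    fi
  done
  echo "matches=$matches vanished=$vanished"
}
```

Run as an unprivileged user, it only sees descriptors of processes it is allowed to inspect, so the counts are a lower bound.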

@trinapicot
Owner

I have convinced myself the "Cannot stat file /proc/<pid>/fd/<#>" error is totally harmless. Consider the following script:

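#!/bin/bash
# Call fuser 10,000 times on one file and tally the results.
pcount=0   # runs where fuser reported the file as in use (exit status 0)
fcount=0   # runs where fuser exited non-zero
for i in {1..10000}; do
  # -s: silent mode, so only the exit status is used
  if fuser -s /mnt/cache/testdata/test.txt; then
    (( pcount+=1 ))
  else
    (( fcount+=1 ))
  fi
done
echo "Used: $pcount"
echo "Unused: $fcount"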

It will loop 10,000 times and check if a file is in use. When I run it on my unRAID system I get somewhere between 30 and 90 cannot stat errors. When I have the test.txt file in use, fuser always successfully detects it. So when fuser reports the cannot stat error, it is still completing and exiting normally.

I am considering a change to diskmv to redirect the error output of fuser to /dev/null. It would make this totally harmless error totally invisible as well.
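
A minimal sketch of what that redirect could look like, assuming the check is wrapped in a small function. The name `file_in_use` and the `FUSER_CMD` override are hypothetical, not taken from diskmv. Only stderr is discarded; the exit status of fuser is passed through unchanged.

```bash
#!/bin/bash
# Hypothetical wrapper, not the actual diskmv code.
# Returns fuser's exit status and hides anything it prints on stderr.
file_in_use() {
  # FUSER_CMD is an assumed override so the wrapper can be tested with a stub.
  "${FUSER_CMD:-fuser}" -s "$1" 2>/dev/null
}
```

Usage would be along the lines of `if file_in_use "$f"; then echo "skipping $f"; fi`. Note that this also hides any other error fuser might print, not just the cannot stat message.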

@nomandera
Author

What you are saying makes sense; however, one thing I was not clear on is whether this is an unRAID-specific issue.

To test, I replicated your script above on unRAID v6RC4 and Debian 8. As you would expect, the issue happened on unRAID but not at all on Debian.

What is also interesting is the vast speed difference. In this example, unRAID is a much higher-specified server, using an SSD and a fast CPU, compared to the Debian box; however, Debian is more than twice as fast:

unRAID

time ./test.sh
...Pages of Cannot stat file errors here
Used: 0
Unused: 10000
real 2m58.437s
user 0m49.264s
sys 2m3.590s

Debian

time ./test.sh
Used: 0
Unused: 10000
real 1m16.231s
user 0m20.184s
sys 0m27.200s

I would suggest that, rather than suppress the error, we raise this as a bug upstream (although it is more than just nice to know it is a safe warning in the interim).

@jbrodriguez

I had seen these errors too when testing unBALANCE, but I thought they were related to the interaction between fuser and a docker environment.

They seemed harmless to me as well, but it's good to know that trinapicot validated the hypothesis.

@kylerw

kylerw commented Dec 10, 2015

I believe Apple .files also cause this error (and others). I eliminated the errors entirely by clearing any Apple generated .files.

@JustinAiken

Thanks for the verification @trinapicot, I'm not worried now that I saw your recreation! :)
