CNFSFile: retry if nfs_open returns EAGAIN #22714
I've added a fixup to use a lambda instead of the do/while loop. I think this looks cleaner as we only retry once anyways. @graysky2 can you try again as it will now retry on all errors.
|
Perhaps make the lambda just a (private) class method? Maybe even cleaner? |
@lrusak - I have been running a build with the updated version of this PR with the same NFSv4 exports for several hours now and I have not experienced any errors. Good job 👍 Please consider backporting this to Nexus as NFS4 currently is not usable. |
Still working as expected after multiple days of uptime (cherry picked against latest from nexus branch). |
updated
|
That means the nfsv4 context has expired and needs to be recreated

Signed-off-by: Lukas Rusak <lorusak@gmail.com>
squashed. The diff looks bad so I suggest using the split view to see the change. https://github.com/xbmc/xbmc/pull/22714/files?diff=split&w=0 |
Change looks good. Do you consider a backport for Nexus? While the "newish" C++ is quite hard to read, I find it nevertheless a bit funny when looking at the mix with globals ;-) |
I've made some formatting changes to meet the current code style. The diffs are available in the following links: For more information please see our current code style guidelines. |
This now crashes when opening any Blu-Ray BDMV folder structure over NFSv3. It is 100% reproducible on Android (Shield) and Windows:
I think in general the crash will occur every time you try to open a file that doesn't exist.... |
Some things I think are wrong in this PR: a return inside the lambda only returns from the lambda, and execution then continues:
Before, the return exited the main method; now it only returns from the lambda and execution continues. The failure is then evaluated the same way whatever its cause, i.e. this log may be inaccurate:
i.e. it is not necessary to retry if the error code is
I hadn't said it, but after reverting the commit of this PR the crash is gone 🙂 |
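The point about `return` inside a lambda can be shown in isolation. The following is a minimal sketch, not code from this PR: `FakeConnect` and `OpenWithLambda` are hypothetical names, and the trace vector exists only to make the control flow visible. It shows that a `return` in the lambda body ends the lambda only, so the enclosing function keeps running and has to check the result itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the connection step; this is not Kodi's API.
static bool FakeConnect(bool succeed)
{
  return succeed;
}

// Returns a trace of the steps that ran, so the control flow can be inspected.
std::vector<std::string> OpenWithLambda(bool connectSucceeds)
{
  std::vector<std::string> trace;

  auto tryOpen = [&trace, connectSucceeds]() -> int
  {
    if (!FakeConnect(connectSucceeds))
    {
      trace.push_back("lambda: connect failed");
      return -1; // leaves the lambda only, not OpenWithLambda
    }
    trace.push_back("lambda: opened");
    return 0;
  };

  const int ret = tryOpen();
  assert(!trace.empty()); // the lambda always records exactly one step

  // Execution always reaches this point, even when the lambda "returned" early.
  trace.push_back("caller: after lambda");
  if (ret != 0)
    trace.push_back("caller: handling failure"); // the caller must bail out itself
  return trace;
}
```

With `OpenWithLambda(false)` the trace contains the lambda's failure step followed by both caller steps, which is the behaviour described above: the early exit has to be repeated in the enclosing function.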
Possible fix:

    bool CNFSFile::Open(const CURL& url)
    {
      Close();
      // we can't open files like nfs://file.f or nfs://server/file.f
      // if a file matches the if below return false, it can't exist on a nfs share.
      if (!IsValidFile(url.GetFileName()))
      {
        CLog::Log(LOGINFO, "NFS: Bad URL : '{}'", url.GetFileName());
        return false;
      }

      constexpr int NFS4ERR_EXPIRED = -11;
      constexpr int NFSERR_CONNECTION = 1;

      std::unique_lock<CCriticalSection> lock(gNfsConnection);

      auto NfsOpen = [this](const CURL& url) -> int
      {
        std::string filename;
        if (!gNfsConnection.Connect(url, filename))
          return NFSERR_CONNECTION;

        m_pNfsContext = gNfsConnection.GetNfsContext();
        m_exportPath = gNfsConnection.GetContextMapId();

        return nfs_open(m_pNfsContext, filename.c_str(), O_RDONLY, &m_pFileHandle);
      };

      int ret = NfsOpen(url);

      if (ret == NFSERR_CONNECTION)
      {
        return false;
      }
      else if (ret == NFS4ERR_EXPIRED)
      {
        CLog::Log(LOGWARNING,
                  "CNFSFile::Open: Unable to open file - trying again with a new context: error: '{}'",
                  nfs_get_error(m_pNfsContext));

        gNfsConnection.Deinit();
        ret = NfsOpen(url);
      }

      if (ret != 0)
      {
        CLog::Log(LOGERROR, "CNFSFile::Open: Unable to open file: '{}' error: '{}'", url.GetFileName(),
                  nfs_get_error(m_pNfsContext));

        m_pNfsContext = nullptr;
        m_exportPath.clear();
        return false;
      }

      CLog::Log(LOGDEBUG, "CNFSFile::Open - opened {}", url.GetFileName());
      m_url = url;

      struct __stat64 tmpBuffer;
      if (Stat(&tmpBuffer))
      {
        m_url.Reset();
        Close();
        return false;
      }

      m_fileSize = tmpBuffer.st_size; // cache the size of this file
      // We've successfully opened the file!
      return true;
    }

|
That means the nfsv4 context has expired and needs to be recreated
fixes #22566
This moves the `nfs_open` call into a do/while loop that iterates while the return value is less than zero and retry attempts remain. I've limited it to 1 retry.

NFSv4 has a lease time, which means that if we don't have a connection within the timeout period we will receive `NFS4ERR_EXPIRED` and `EAGAIN`, in which case we need to recreate the nfs context.

We could probably adjust the keep alive parameters, but nfsd allows setting a different lease time, so the keep alive would have to be configurable as well. It looks like the default lease time for nfsd is 90 seconds.
ref: https://man7.org/linux/man-pages/man8/nfsd.8.html
ref: https://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=nfs.conf;h=323f072b2fc0a92539d3c9f10620fee94b806f5a;hb=HEAD#l72
You will now see something like this:
I did also test this with NFSv3 and it seemed fine.
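For illustration, the bounded do/while retry described above could look roughly like the sketch below. It is written against hypothetical callbacks rather than libnfs or Kodi code: `OpenWithRetry`, `RetryResult`, `openFn`, `resetFn` and `kExampleError` are all assumed names, and the error value is only an example. As in the description, it retries on any negative return value and allows one retry by default.

```cpp
#include <cassert>
#include <functional>

// Example error value only; real codes come from the platform and libnfs headers.
constexpr int kExampleError = -11;

struct RetryResult
{
  int ret;      // last return value from openFn
  int attempts; // number of times openFn was called
};

// openFn:  tries to open the file, returns 0 on success or a negative error.
// resetFn: drops the stale context so the next attempt creates a new one.
RetryResult OpenWithRetry(const std::function<int()>& openFn,
                          const std::function<void()>& resetFn,
                          int maxRetries = 1)
{
  assert(maxRetries >= 0);

  int retriesLeft = maxRetries;
  int attempts = 0;
  int ret = 0;

  do
  {
    ret = openFn();
    ++attempts;

    // Only reset the context if another attempt will actually follow.
    if (ret < 0 && retriesLeft > 0)
      resetFn();
  } while (ret < 0 && retriesLeft-- > 0);

  return {ret, attempts};
}
```

With the default of one retry, an open that fails once and then succeeds results in two attempts and a single context reset. An open that keeps failing also stops after two attempts, and the caller still has to report the final error.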