Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CNFSFile: retry if nfs_open returns EAGAIN #22714

Merged
merged 1 commit into from
Feb 19, 2023
Merged

Conversation

lrusak
Copy link
Contributor

@lrusak lrusak commented Feb 6, 2023

That means the nfsv4 context has expired and needs to be recreated

fixes #22566

This moves the nfs_open method into a do/while loop that will iterate while the return value is less than zero and we still have retry attempts remaining. I've limited it to 1 retry.

NFSv4 has a lease time which needs which means if the don't have a connection in the timeout period we will receive NFS4ERR_EXPIRED and EAGAIN in which case we need to recreate the nfs context.

We could probably adjust the keep alive parameters but nfsd allows setting a different lease time so the keep alive would have to be configurable as well. It looks like the default lease time for nfsd is 90 seconds.

ref: https://man7.org/linux/man-pages/man8/nfsd.8.html
ref: https://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=nfs.conf;h=323f072b2fc0a92539d3c9f10620fee94b806f5a;hb=HEAD#l72

You will now see something like this:

2023-02-06 14:45:38.721 T:1062417   debug <general>: CFileCache::Open - <nfs://192.168.1.108/mnt/tank/movies/Transformers Dark of the Moon (2011)/Transformers Dark of the Moon.mkv> opening
2023-02-06 14:45:38.721 T:1062417   debug <general>: NFS: Context for 192.168.1.108/ not open - get a new context.
2023-02-06 14:45:38.721 T:1062417   debug <general>: NFS: version: 4
2023-02-06 14:45:38.727 T:1062417   debug <general>: NFS: Connected to server 192.168.1.108 and export /
2023-02-06 14:45:38.727 T:1062417   debug <general>: NFS: chunks: r/w 0/0
2023-02-06 14:45:38.743 T:1062417   debug <general>: CNFSFile::Open - opened mnt/tank/movies/Transformers Dark of the Moon (2011)/Transformers Dark of the Moon.mkv
2023-02-06 14:45:38.743 T:1062417   debug <general>: CFileCache::Open - <nfs://192.168.1.108/mnt/tank/movies/Transformers Dark of the Moon (2011)/Transformers Dark of the Moon.mkv> source chunk size is 0, setting cache chunk size to 131072
2023-02-06 14:45:38.743 T:1062417   debug <general>: CFileCache::Open - <nfs://192.168.1.108/mnt/tank/movies/Transformers Dark of the Moon (2011)/Transformers Dark of the Moon.mkv> using double memory cache each sized 10485760 bytes

I did also test this with NFSv3 and it seemed fine.

@lrusak lrusak added Type: Fix non-breaking change which fixes an issue Component: FileSystem Filesystem v21 Omega labels Feb 6, 2023
@lrusak lrusak added this to the Omega 21.0 Alpha 1 milestone Feb 6, 2023
@lrusak lrusak mentioned this pull request Feb 6, 2023
7 tasks
@lrusak
Copy link
Contributor Author

lrusak commented Feb 8, 2023

I've added a fixup to use a lambda instead of the do/while loop. I think this looks cleaner as we only retry once anyways. @graysky2 can you try again as it will now retry on all errors.

2023-02-07 20:25:59.547 T:1198318   debug <general>: CFileCache::Open - <nfs://192.168.1.108/mnt/tank/movies/Star Trek Into Darkness (2013)/Star Trek Into Darkness.mkv> opening
2023-02-07 20:25:59.548 T:1198318   debug <general>: CNFSFile::Open: Unable to open file : 'mnt/tank/movies/Star Trek Into Darkness (2013)/Star Trek Into Darkness.mkv' errno: Resource temporarily unavailable (11) error : 'open call failed with "NFS4: (path /mnt/tank/movies/Star Trek Into Darkness (2013)) failed with NFS4ERR_EXPIRED(-5)"'
2023-02-07 20:25:59.548 T:1198318   debug <general>: CNFSFile::Open: Trying again with a new context
2023-02-07 20:25:59.548 T:1198318   debug <general>: NFS: Context for 192.168.1.108/ not open - get a new context.
2023-02-07 20:25:59.548 T:1198318   debug <general>: NFS: version: 4
2023-02-07 20:25:59.552 T:1198318   debug <general>: NFS: Connected to server 192.168.1.108 and export /
2023-02-07 20:25:59.553 T:1198318   debug <general>: NFS: chunks: r/w 0/0
2023-02-07 20:25:59.566 T:1198318   debug <general>: CNFSFile::Open - opened mnt/tank/movies/Star Trek Into Darkness (2013)/Star Trek Into Darkness.mkv

@arnova
Copy link
Member

arnova commented Feb 8, 2023

I've added a fixup to use a lambda instead of the do/while loop. I think this looks cleaner as we only retry once anyways. @graysky2 can you try again as it will now retry on all errors.

2023-02-07 20:25:59.547 T:1198318   debug <general>: CFileCache::Open - <nfs://192.168.1.108/mnt/tank/movies/Star Trek Into Darkness (2013)/Star Trek Into Darkness.mkv> opening
2023-02-07 20:25:59.548 T:1198318   debug <general>: CNFSFile::Open: Unable to open file : 'mnt/tank/movies/Star Trek Into Darkness (2013)/Star Trek Into Darkness.mkv' errno: Resource temporarily unavailable (11) error : 'open call failed with "NFS4: (path /mnt/tank/movies/Star Trek Into Darkness (2013)) failed with NFS4ERR_EXPIRED(-5)"'
2023-02-07 20:25:59.548 T:1198318   debug <general>: CNFSFile::Open: Trying again with a new context
2023-02-07 20:25:59.548 T:1198318   debug <general>: NFS: Context for 192.168.1.108/ not open - get a new context.
2023-02-07 20:25:59.548 T:1198318   debug <general>: NFS: version: 4
2023-02-07 20:25:59.552 T:1198318   debug <general>: NFS: Connected to server 192.168.1.108 and export /
2023-02-07 20:25:59.553 T:1198318   debug <general>: NFS: chunks: r/w 0/0
2023-02-07 20:25:59.566 T:1198318   debug <general>: CNFSFile::Open - opened mnt/tank/movies/Star Trek Into Darkness (2013)/Star Trek Into Darkness.mkv

Perhaps make the lambda just a (private) class method? Maybe even cleaner?

@graysky2
Copy link
Contributor

graysky2 commented Feb 8, 2023

@lrusak - I have been running a build with the updated version of this PR with the same NFSv4 exports for several hours now and I have not experienced any errors. Good job 👍

Please consider backporting this to Nexus as NFS4 currently is not usable.

xbmc/filesystem/NFSFile.cpp Outdated Show resolved Hide resolved
@graysky2
Copy link
Contributor

Still working as expected after multiple days of uptime (cherry picked against latest from nexus branch).

xbmc/filesystem/NFSFile.cpp Outdated Show resolved Hide resolved
@lrusak
Copy link
Contributor Author

lrusak commented Feb 12, 2023

updated

2023-02-11 19:58:05.572 T:1685898   debug <general>: connect replacing configured host 192.168.1.108 with resolved host 192.168.1.108
2023-02-11 19:58:05.578 T:1686246   debug <general>: CFileCache::Open - <nfs://192.168.1.108/mnt/tank/movies/Transformers Dark of the Moon (2011)/Transformers Dark of the Moon.mkv> opening
2023-02-11 19:58:05.579 T:1686246   error <general>: CNFSFile::Open: Unable to open file - trying again with a new context: error: 'open call failed with "NFS4: (path /mnt/tank/movies/Transformers Dark of the Moon (2011)) failed with NFS4ERR_EXPIRED(-5)"'
2023-02-11 19:58:05.579 T:1686246   debug <general>: NFS: Context for 192.168.1.108/ not open - get a new context.
2023-02-11 19:58:05.579 T:1686246   debug <general>: NFS: version: 4
2023-02-11 19:58:05.583 T:1686246   debug <general>: NFS: Connected to server 192.168.1.108 and export /
2023-02-11 19:58:05.583 T:1686246   debug <general>: NFS: chunks: r/w 0/0
2023-02-11 19:58:05.600 T:1686246   debug <general>: CNFSFile::Open - opened mnt/tank/movies/Transformers Dark of the Moon (2011)/Transformers Dark of the Moon.mkv

That means the nfsv4 context has expired and needs to be recreated

Signed-off-by: Lukas Rusak <lorusak@gmail.com>
@lrusak
Copy link
Contributor Author

lrusak commented Feb 12, 2023

squashed.

The diff looks bad so I suggest using the split view to see the change. https://github.com/xbmc/xbmc/pull/22714/files?diff=split&w=0

@fritsch
Copy link
Member

fritsch commented Feb 12, 2023

Change looks good. Do you consider backport for Nexus?

While the "newish" C++ is quite hard to read, I find it never the less a bit funny when looking at the mix with globals ;-)

@jenkins4kodi
Copy link
Contributor

I've made some formatting changes to meet the current code style. The diffs are available in the following links:

For more information please see our current code style guidelines.

@fuzzard fuzzard merged commit e7c9ac7 into xbmc:master Feb 19, 2023
@thexai
Copy link
Member

thexai commented Feb 22, 2023

Now crash opening any Blu-Ray BDMV folder structure using NFSv3:

crash

stack

100% reproducible in Android (Shield) and Windows:

2023-02-22 10:38:47.222 T:3260    debug <general>: NFS: Connected to server 192.168.50.13 and export /volume1/NAS
2023-02-22 10:38:47.222 T:3260    debug <general>: NFS: chunks: r/w 131072/131072
2023-02-22 10:38:47.223 T:3260    error <general>: CNFSFile::Open: Unable to open file: 'volume1/NAS/TEST-MEDIA/UHD/Full_UHD/Uncharted (2021) UHD/AACS/Unit_Key_RO.inf' error: 'open call failed with "NFS: Lookup of /TEST-MEDIA/UHD/Full_UHD/Uncharted (2021) UHD/AACS/Unit_Key_RO.inf failed with NFS3ERR_NOENT(-2)"'
Exception thrown at 0x00007FF64D8724CB in kodi.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.

I think in general the crash will occur every time you try to open a file that doesn't exist....

@thexai
Copy link
Member

thexai commented Feb 22, 2023

Some things I think are wrong in this PR:

return inside lambda code only returns from lambda and continues:

    if (!gNfsConnection.Connect(url, filename))
      return false;

Before was return outside main method, now only returns from lambda and continues. Then failure is evaluated same as fail is due nfs_open in nfs_get_error.

ie: this log may be inaccurate:

    CLog::Log(LOGERROR,
              "CNFSFile::Open: Unable to open file - trying again with a new context: error: '{}'",
              nfs_get_error(m_pNfsContext));

nfs_open returns integer error code and is ignored converted to true/false only.

ie: is not necessary retry if error code is NFS3ERR_NOENT(-2)

I hadn't said it but reverting the commit of this PR the crash is gone 🙂

@thexai
Copy link
Member

thexai commented Feb 22, 2023

Possible fix:

bool CNFSFile::Open(const CURL& url)
{
  Close();
  // we can't open files like nfs://file.f or nfs://server/file.f
  // if a file matches the if below return false, it can't exist on a nfs share.
  if (!IsValidFile(url.GetFileName()))
  {
    CLog::Log(LOGINFO, "NFS: Bad URL : '{}'", url.GetFileName());
    return false;
  }

  constexpr int NFS4ERR_EXPIRED = -11;
  constexpr int NFSERR_CONNECTION = 1;

  std::unique_lock<CCriticalSection> lock(gNfsConnection);

  auto NfsOpen = [this](const CURL& url) -> int
  {
    std::string filename;

    if (!gNfsConnection.Connect(url, filename))
      return NFSERR_CONNECTION;

    m_pNfsContext = gNfsConnection.GetNfsContext();
    m_exportPath = gNfsConnection.GetContextMapId();

    return nfs_open(m_pNfsContext, filename.c_str(), O_RDONLY, &m_pFileHandle);
  };

  int ret = NfsOpen(url);

  if (ret == NFSERR_CONNECTION)
  {
    return false;
  }
  else if (ret == NFS4ERR_EXPIRED)
  {
    CLog::Log(LOGWARNING,
              "CNFSFile::Open: Unable to open file - trying again with a new context: error: '{}'",
              nfs_get_error(m_pNfsContext));

    gNfsConnection.Deinit();
    ret = NfsOpen(url);
  }

  if (ret != 0)
  {
    CLog::Log(LOGERROR, "CNFSFile::Open: Unable to open file: '{}' error: '{}'", url.GetFileName(),
              nfs_get_error(m_pNfsContext));

    m_pNfsContext = nullptr;
    m_exportPath.clear();
    return false;
  }

  CLog::Log(LOGDEBUG, "CNFSFile::Open - opened {}", url.GetFileName());
  m_url=url;

  struct __stat64 tmpBuffer;

  if( Stat(&tmpBuffer) )
  {
    m_url.Reset();
    Close();
    return false;
  }

  m_fileSize = tmpBuffer.st_size;//cache the size of this file
  // We've successfully opened the file!
  return true;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Backport: Done Component: FileSystem Filesystem Type: Fix non-breaking change which fixes an issue v21 Omega
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NFS4ERR_EXPIRED(-11)
9 participants