Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[nfs] Encode filenames/paths so they end up properly in the db #18311

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

pprkut
Copy link

@pprkut pprkut commented Aug 21, 2020

Description

NFS as a filesystem supports certain special characters (like '?') that are valid within a URL. These characters cause filenames to end up truncated in the db, resulting in invalid video entries in the collection that can't be played (because the path doesn't exist) and get removed again when you clean the library.

I expect this to be still incomplete, but enough to start a conversation. Open points from my side are:

  • Filename encoding in debug messages. I left them as is for now (will show up URL encoded in logs), not sure what's preferred. There's probably a case for both options.
  • Not sure how tests work so will have to look at that next

Motivation and Context

This is a fix for #16383. As outlined in the issue comments, the suggested fix was to add proper curl handling to the NFS support. I looked at ZeroconfDirectory as reference implementation and compared it with how NFS works.

The fix, however, introduces a (somewhat breaking) side-effect. Since all paths/filenames are encoded now, when you rescan the NFS share, movies/episodes with special characters in them that were valid before the fix get added again to the library resulting in two working entries where neither will be removed by cleaning the library. Most prominently this affects filenames with spaces in them.

How Has This Been Tested?

I ran kodi locally on my laptop and connected it to my NFS share on a Synology NAS. In that share I had a movie with special characters in both the folder name as well as the file name:

Stermann & Grissemann - Wollt Ihr das totale Sieb?!/Stermann & Grissemann - Wollt Ihr das totale Sieb?!.mkv

Verified that master puts a truncated path in the sqlite db, and a correct path after the fix.
Verified playing from library works after the fix.
Verified direct play from NFS works both before and after the fix.
Verified clean library behavior.

Screenshots (if appropriate):

Types of change

  • Bug fix (non-breaking change which fixes an issue)
  • Clean up (non-breaking change which removes non-working, unmaintained functionality)
  • Improvement (non-breaking change which improves existing functionality)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that will cause existing functionality to change)
  • Cosmetic change (non-breaking change that doesn't touch code)
  • None of the above (please explain below)

Checklist:

  • My code follows the Code Guidelines of this project
  • My change requires a change to the documentation, either Doxygen or wiki
  • I have updated the documentation accordingly
  • I have read the Contributing document
  • I have added tests to cover my change
  • All new and existing tests passed

@AlwinEsch AlwinEsch added Component: Database Type: Fix non-breaking change which fixes an issue Type: Improvement non-breaking change which improves existing functionality v19 Matrix labels Aug 22, 2020
@AlwinEsch AlwinEsch added this to the Matrix 19.0-alpha 2 milestone Aug 22, 2020
@yol
Copy link
Member

yol commented Aug 24, 2020

Thanks for caring and the PR. Before we can consider this going in however:

Since all paths/filenames are encoded now, when you rescan the NFS share, movies/episodes with special characters in them that were valid before the fix get added again to the library resulting in two working entries where neither will be removed by cleaning the library.

This needs to be addressed. A DB migration must be added that fixes the encoding of the preexisting paths.

@yol
Copy link
Member

yol commented Aug 24, 2020

bool CNfsConnection::Connect(const CURL& url, std::string &relativePath)

relativePath should probably be filled with an already decoded path; it doesn't really make sense to return the url-encoded version here since it is expected to be used in NFS calls (IMO)

@DaveTBlake
Copy link
Member

Bumping it to v20 as v19 entering Beta phase with focus on testing and bug fixes, and this improvement needs more extensive work e.g. db migration and encoding of prexisting paths

@pprkut
Copy link
Author

pprkut commented Aug 21, 2021

Sorry it took me so long to get back to this :-/

There's still a couple kinks to iron out, but the core pieces should be in place now. Remaining things I have to check:

  • De/Encoding of the export path
  • Somehow the migrations don't work. I see the db files with the new version, but the content hasn't been touched. Still need to dig deeper into that one.

@DaveTBlake
Copy link
Member

I hate to say it, but on seeing the actual data I am unhappy with changing to have encoded path and filename data stored in the databases. e.g Stermann%20%26%20Grissemann%20-%20Wollt%20Ihr%20das%20totale%20Sieb%3F%21%2FStermann%20%26%20Grissemann%20-%20Wollt%20Ihr%20das%20totale%20Sieb%3F%21.mkv%0A

Not only is it is far easy to do db support if these are in a format easily readable by humans, but there are implications for query formation for smart playlists, filtering and searching. These were designed around the path and filename data being a human readable string not an encoded one. I also think there will be knock on effects for JSON API consumers.

Both SQLite and MySQL/Maria can store strings like Stermann & Grissemann - Wollt Ihr das totale Sieb?!/Stermann & Grissemann - Wollt Ihr das totale Sieb?!.mkv. Paths like these are currently truncated because of how paths are processed elsewhere in the code. In particular ? being used as an option delimiter, because historically ? was not expected (or even allowed) in a path or filename.

So I am pushing back on this approach, can another solution be found please.

@DaveTBlake
Copy link
Member

My thoughts on an alternative approach would be simply changing CVideoDatabase::SplitPath and CMusicDatabase::SplitPath . Rather than them calling URIUtils::Split for every strFileNameAndPath string then handling URL options if IsURL(), to check IsURL() first and use CURL methods to split into path (+ options) and filename for URLs which includes NFS paths.

This has advantage of handling ? in filenames other than just those from NFS, and removes the need to modify existing database data.

@pprkut
Copy link
Author

pprkut commented Aug 22, 2021

I just followed the recommendation from the linked bug report :)

Discussions about alternative options seem better at the original bug report, so I'll move the discussion there.

@jenkins4kodi jenkins4kodi added the Rebase needed PR that does not apply/merge cleanly to current base branch label Nov 17, 2021
@jenkins4kodi jenkins4kodi removed the Rebase needed PR that does not apply/merge cleanly to current base branch label Dec 1, 2021
@pprkut
Copy link
Author

pprkut commented Dec 1, 2021

@DaveTBlake I revisited this, but decided to go back to the original approach.

libnfs upstream added support for handling escape characters in URLs, so the URLs kodi stores in the db are now conform to what libnfs would expect. I'm just escaping ? since that is the only character that breaks right now and the only one that can be implemented without requiring database migrations.

I'm unescaping the URLs again because kodi doesn't actually use them for anything other than db storage. All libnfs api calls kodi makes require the unescaped paths.

@pprkut pprkut force-pushed the nfs_encode branch 2 times, most recently from 37ba341 to e09c7f4 Compare December 1, 2021 11:51
@pprkut pprkut force-pushed the nfs_encode branch 3 times, most recently from e178620 to 80f680a Compare January 30, 2022 14:32
@jenkins4kodi jenkins4kodi added the Rebase needed PR that does not apply/merge cleanly to current base branch label Jul 4, 2022
@jenkins4kodi jenkins4kodi removed the Rebase needed PR that does not apply/merge cleanly to current base branch label Jul 9, 2022
The return value is exclusively used for direct NFS operations,
so it makes more sense to do the decoding centrally here than
on every nfs call after.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Database Component: FileSystem Filesystem Component: Music Component: Video Type: Fix non-breaking change which fixes an issue Type: Improvement non-breaking change which improves existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants