Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

VideoInfoScanner slightly optimized #1191

Closed
wants to merge 1 commit into from

4 participants

@dez-dk

Using assumptions about the user-specified settings the VideoInfoScanner has been optimized in an attempt to reduce roundtrips when scanning a movie collection which is mounted from a network location.

The DoScan() method has been refactored to use a CFileItemPtr instead of just a string. This let's us grab a lot more fileinfo at once instead of doing individual stat() calls on every subdir and file.

If the user has chosen "Movies are in separate folders" we assume:

  • They do not want to scan the root dir (eg. "/movies/") itself as a movie but they want to scan subdirs (eg. "/movies/Movie Title (year)"). This saves a Stack() and GetPathHash() call on a potentially large dir ("/movies/").

If the user has also selected not to recurse the dir we assume:

  • That it's okay to fasthash subdirs (eg. "/movies/Movie Title (year)") to determine if we've already scanned the movie. We use the date from the FileItem we've already grabbed to do fast hashing.

EDIT: added some examples.

dez-dk VideoInfoScanner slightly optimized
Using assumptions about the user-specified settings the VideoInfoScanner
has been optimized in an attempt to reduce roundtrips when scanning a
movie collection which is mounted from a network location.
8a117f9
@ghost

Haven't looked at the code but no stack will break eg two cd movies in cd x subfolders

@dez-dk

I guess you're right about the CD-subdirs.

I'm not in the know here but why are CD-subdirs scanned? I would think the movie is still the same and that it has the same information even if CD2 has changed.

EDIT:
I should probably point out that the focus of this PR is the RESCANNING of movies. Movies that have not yet been scanned are still scanned in the same way. CD-subdirs are still scanned initially. However, they are not RESCANNED if the user has enabled the "Movies are in separate folders" and disabled recursive scanning.

@jmarshallnz
Owner

You can't assume that the root dir contains no movies even if "movies in folders named after the movie" is enabled. Simple reason: VIDEO_TS folders are stacked down to the parent folder item, which is then converted to a file.

Further, this would make essentially bugger all difference to the work that needs to be done anyway.

@dez-dk

Can you point me towards the work that needs to be done? Any plans?

@ghost

work to be done as in the work the scanner has to perform.

@dez-dk

I disagree. With these changes I can rescan about 18 movies/second from a remote location. Before these changes I couldn't even rescan 0.5 movies/second (I stopped the scan before it finished because it simple took too long, so that's perhaps even a bit optimistic.)

The thing that makes all the difference is not the stack() call that is saved but the fact that fast hashing is used more extensively. I will revert the stack() change because it obviously makes no sense because of the DVD stacking you mentioned but I urge you to reconsider the rest of the PR.

EDIT: Please note these estimates are based on the rescanning of an entire movie directory and hence they include time spent rescanning movies that are not scannable. In a given second XBMC is rescanning 160 directories/second.

@ghost

sure, we haven't commented on the rest :)

@jmarshallnz
Owner

Do you have rarred files?

@dez-dk

No, the files are not rarred.

@jmarshallnz
Owner

So why is it so slow? Retrieving directories should be pretty quick - the music scan for example always does this, and I don't think we've had any complaints about that's rescan speed.

@dez-dk

I am using an HTTP source and stat() doesn't retrieve a modification time for HTTP directories. Nginx doesn't send a Last-Modified header because the "directory" is just a generated HTML page (IIRC).

EDIT: Which means XBMC has to do a full path hash on each directory.
(I need to stop abusing the edit feature)

@jmarshallnz
Owner

So the root cause is that the fasthash does not apply to your movie folders because Stat() is not implemented?

Do each of your movie folders doesn't have any subdirs in them? i.e. would they normally be fasthashed if they were on any filesystem other than http?

@dez-dk

It was what prompted me to take a look at the VideoInfoScanner.

I have only tested with single file directories. They also have subdirs that match exclude regexps (eg. a subs-dir) and as far as I can tell that means the dirs are fully path hashed.

I also have multi-CD movies but I haven't yet tested those. I would like to fasthash multi-CD movies as well. As I have previously stated I don't think the directory will suddenly contain a different movie because one of the CD-subdirs has changed.

@jmarshallnz
Owner

Multi-cd movies will be stacked down to a file, so they're not a problem. See the comment in CanFastHash for the other one - I wouldn't see a problem with excluding folders there. A simple modification there would fix your issue?

@dez-dk

Right you are :)

I must have stared myself blind at the problem when I was stepping through the code. Do you have any suggestions about the missing stat() feature for HTTP?

@jmarshallnz
Owner

Well, the first step is determining how it sets m_dateTime when retrieving a listing to begin with. If you can get that information you'll be fine. My guess is you can't :)

@dez-dk

In a previous PR I made I added some regexps to HTTPDirectory that made it possible to extract the date from a directory listing.

I believe I would have to retrieve the parent directory and use it to get the date information of the dir I'm looking for ... Seems like a lot of work though.

@Karlson2k
Collaborator

Personally I'm against this commits.
I don't like the idea of skipping everything in root folder - something can be temporary placed here.
And I don't like that idea of assumption that everyone is using name template like 'Title (Year)'. I'm using 'Original Title (edition) {Year}= Localized Title [Genres]' template. And XBMC works fine with my template after tuning AdvancedSettings.xml.

May be it's worth to think about different tool for sharing? NFS, SMB works pretty good with last modified time. Or in case that http is required, try other http server.
In any case: instead of implementing not-always-working workarounds, try to use right tools that are intended for your needs.

@dez-dk

@Karlson2k
The PR has already been debunked.
What difference does it make whether something is "temporarily placed" in the root? Care to clarify?
This PR changes nothing about the naming scheme mechanisms. The naming scheme were just illustrative examples. I do not use this naming scheme myself.
I need to use a protocol that works across WAN. NFS and SMB need to be tunneled through a VPN for that to work IIRC.

@classicspam

Did you try FTP?

@dez-dk

@classicspam
Yes, but it timed out a lot when listing large directories and sometimes it would fail to stream videos the first couple of tries.
HTTP was a much smoother experience so that was why I made the switch.

@jmarshallnz jmarshallnz closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Jul 21, 2012
  1. VideoInfoScanner slightly optimized

    dez-dk authored
    Using assumptions about the user-specified settings the VideoInfoScanner
    has been optimized in an attempt to reduce roundtrips when scanning a
    movie collection which is mounted from a network location.
This page is out of date. Refresh to see the latest.
Showing with 67 additions and 38 deletions.
  1. +62 −35 xbmc/video/VideoInfoScanner.cpp
  2. +5 −3 xbmc/video/VideoInfoScanner.h
View
97 xbmc/video/VideoInfoScanner.cpp
@@ -190,8 +190,9 @@ namespace VIDEO
m_pObserver = pObserver;
}
- bool CVideoInfoScanner::DoScan(const CStdString& strDirectory)
+ bool CVideoInfoScanner::DoScan(const CFileItemPtr& file)
{
+ CStdString strDirectory = file->GetPath();
if (m_pObserver)
{
m_pObserver->OnDirectoryChanged(strDirectory);
@@ -233,7 +234,7 @@ namespace VIDEO
if (m_pObserver)
m_pObserver->OnStateChanged(content == CONTENT_MOVIES ? FETCHING_MOVIE_INFO : FETCHING_MUSICVIDEO_INFO);
- CStdString fastHash = GetFastHash(strDirectory);
+ CStdString fastHash = GetFastHash(file);
if (m_database.GetPathHash(strDirectory, dbHash) && !fastHash.IsEmpty() && fastHash == dbHash)
{ // fast hashes match - no need to process anything
CLog::Log(LOGDEBUG, "VideoInfoScanner: Skipping dir '%s' due to no change (fasthash)", strDirectory.c_str());
@@ -243,32 +244,41 @@ namespace VIDEO
if (!bSkip)
{ // need to fetch the folder
CDirectory::GetDirectory(strDirectory, items, g_settings.m_videoExtensions);
- items.Stack();
- // compute hash
- GetPathHash(items, hash);
- if (hash != dbHash && !hash.IsEmpty())
- {
- if (dbHash.IsEmpty())
- CLog::Log(LOGDEBUG, "VideoInfoScanner: Scanning dir '%s' as not in the database", strDirectory.c_str());
- else
- CLog::Log(LOGDEBUG, "VideoInfoScanner: Rescanning dir '%s' due to change (%s != %s)", strDirectory.c_str(), dbHash.c_str(), hash.c_str());
- }
- else
- { // they're the same or the hash is empty (dir empty/dir not retrievable)
- if (hash.IsEmpty() && !dbHash.IsEmpty())
+ if (!foundDirectly || settings.parent_name_root || !settings.parent_name)
+ { // folder may contain movies
+ items.Stack();
+ // compute hash
+ GetPathHash(items, hash);
+ if (hash != dbHash && !hash.IsEmpty())
{
- CLog::Log(LOGDEBUG, "VideoInfoScanner: Skipping dir '%s' as it's empty or doesn't exist - adding to clean list", strDirectory.c_str());
- m_pathsToClean.insert(m_database.GetPathId(strDirectory));
+ if (dbHash.IsEmpty())
+ CLog::Log(LOGDEBUG, "VideoInfoScanner: Scanning dir '%s' as not in the database", strDirectory.c_str());
+ else
+ CLog::Log(LOGDEBUG, "VideoInfoScanner: Rescanning dir '%s' due to change (%s != %s)", strDirectory.c_str(), dbHash.c_str(), hash.c_str());
}
else
- CLog::Log(LOGDEBUG, "VideoInfoScanner: Skipping dir '%s' due to no change", strDirectory.c_str());
+ { // they're the same or the hash is empty (dir empty/dir not retrievable)
+ if (hash.IsEmpty() && !dbHash.IsEmpty())
+ {
+ CLog::Log(LOGDEBUG, "VideoInfoScanner: Skipping dir '%s' as it's empty or doesn't exist - adding to clean list", strDirectory.c_str());
+ m_pathsToClean.insert(m_database.GetPathId(strDirectory));
+ }
+ else
+ CLog::Log(LOGDEBUG, "VideoInfoScanner: Skipping dir '%s' due to no change", strDirectory.c_str());
+ bSkip = true;
+ if (m_pObserver)
+ m_pObserver->OnDirectoryScanned(strDirectory);
+ }
+ // update the hash to a fast hash if needed
+ // it's okay to fast hash movies in seperate dirs (that are not scanned recursively)
+ if (!fastHash.IsEmpty() && ((settings.parent_name && settings.recurse <= 1) || CanFastHash(items)))
+ hash = fastHash;
+ }
+ else
+ {
+ CLog::Log(LOGDEBUG, "VideoInfoScanner: Skipping dir '%s' because we don't expect to find movie info in a root dir that contain movies in seperate folders", strDirectory.c_str());
bSkip = true;
- if (m_pObserver)
- m_pObserver->OnDirectoryScanned(strDirectory);
}
- // update the hash to a fast hash if needed
- if (CanFastHash(items) && !fastHash.IsEmpty())
- hash = fastHash;
}
}
else if (content == CONTENT_TVSHOWS)
@@ -336,7 +346,7 @@ namespace VIDEO
// do not recurse for tv shows - we have already looked recursively for episodes
if (pItem->m_bIsFolder && !pItem->IsParentFolder() && !pItem->IsPlayList() && settings.recurse > 0 && content != CONTENT_TVSHOWS)
{
- if (!DoScan(pItem->GetPath()))
+ if (!DoScan(pItem))
{
m_bStop = true;
}
@@ -345,6 +355,13 @@ namespace VIDEO
return !m_bStop;
}
+ bool CVideoInfoScanner::DoScan(const CStdString& strDirectory)
+ {
+ // wrap the strDirectory in a CFileItemPtr so we can start using the recursive DoScan() method
+ CFileItemPtr filePtr = CFileItemPtr(new CFileItem(strDirectory, true));
+ return DoScan(filePtr);
+ }
+
bool CVideoInfoScanner::RetrieveVideoInfo(CFileItemList& items, bool bDirNames, CONTENT_TYPE content, bool useLocal, CScraperUrl* pURL, bool fetchEpisodes, CGUIDialogProgress* pDlgProgress)
{
if (pDlgProgress)
@@ -1551,19 +1568,29 @@ namespace VIDEO
return items.GetFolderCount() == 0;
}
- CStdString CVideoInfoScanner::GetFastHash(const CStdString &directory) const
- {
- struct __stat64 buffer;
- if (XFILE::CFile::Stat(directory, &buffer) == 0)
+ CStdString CVideoInfoScanner::GetFastHash(const CFileItemPtr& file) const
+ { // attempt to use the time we've already retrieved
+ CStdString hash;
+ time_t timet;
+ file->m_dateTime.GetAsTime(timet);
+ if (timet)
{
- int64_t time = buffer.st_mtime;
- if (!time)
- time = buffer.st_ctime;
- if (time)
+ hash.Format("fast%"PRIu32, timet);
+ return hash;
+ }
+ else
+ { // fall back to doing a stat() call
+ struct __stat64 buffer;
+ if (XFILE::CFile::Stat(file->GetPath(), &buffer) == 0)
{
- CStdString hash;
- hash.Format("fast%"PRId64, time);
- return hash;
+ int64_t time = buffer.st_mtime;
+ if (!time)
+ time = buffer.st_ctime;
+ if (time)
+ {
+ hash.Format("fast%"PRId64, time);
+ return hash;
+ }
}
}
return "";
View
8 xbmc/video/VideoInfoScanner.h
@@ -137,6 +137,7 @@ namespace VIDEO
protected:
virtual void Process();
+ bool DoScan(const CFileItemPtr& file);
bool DoScan(const CStdString& strDirectory);
INFO_RET RetrieveInfoForTvShow(CFileItemPtr pItem, bool bDirNames, ADDON::ScraperPtr &scraper, bool useLocal, CScraperUrl* pURL, bool fetchEpisodes, CGUIDialogProgress* pDlgProgress);
@@ -197,13 +198,14 @@ namespace VIDEO
static int GetPathHash(const CFileItemList &items, CStdString &hash);
/*! \brief Retrieve a "fast" hash of the given directory (if available)
- Performs a stat() on the directory, and uses modified time to create a "fast"
- hash of the folder. If no modified time is available, the create time is used,
+ Uses the date of the directory to create a "fast" hash. If the directory has no date
+ a stat() is performed on the directory, and the modified time is used to create a
+ "fast" hash of the folder. If no modified time is available, the create time is used,
and if neither are available, an empty hash is returned.
\param directory folder to hash
\return the hash of the folder of the form "fast<datetime>"
*/
- CStdString GetFastHash(const CStdString &directory) const;
+ CStdString GetFastHash(const CFileItemPtr &file) const;
/*! \brief Decide whether a folder listing could use the "fast" hash
Fast hashing can be done whenever the folder contains no scannable subfolders, as the
Something went wrong with that request. Please try again.