Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persist caches to disk to avoid slow initialization upon restart #60

Merged
merged 27 commits into from
Jun 17, 2022

Conversation

blacklight
Copy link
Collaborator

@blacklight blacklight commented May 22, 2022

Note: this is still work in progress, but it already considerably reduces the extension bootstrap time upon restart.

Currently implemented:

  • Playlists cache
  • Tracks cache
  • Artists cache
  • Albums cache
  • Search cache
  • Unify the caches: they should all share the same LruCache base class, with filesystem caching support

@blacklight
Copy link
Collaborator Author

blacklight commented May 23, 2022

@tehkillerbee I have added cache_dir as an optional config parameter (default: <data-dir>/cache) and added disk persistence both for the tracks and playlists cache. After the first load, load time for some of my large playlists (~2500 items) went from ~15 minutes to a couple of seconds.

I also think that it may be worth refactoring of the caches used by the extension, what do you think? Right now there is library.Cache (used for tracks), lru_cache.LruCache (used by search, album tracks and images) and the simple dictionary cache used for playlists. Maybe it's worth using one single class to model them all, so we don't have to reinvent filesystem cache persistency for all of them?

@blacklight blacklight marked this pull request as draft May 23, 2022 10:00
@blacklight blacklight changed the title Disk caching support for playlists Persisting caches to disk to avoid slow initialization upon restart May 23, 2022
@blacklight blacklight changed the title Persisting caches to disk to avoid slow initialization upon restart Persist caches to disk to avoid slow initialization upon restart May 23, 2022
@tehkillerbee
Copy link
Owner

@blacklight Great to see you've made progress! I agree, refactoring the caches used in Mopidy-Tidal is a good idea. I noticed the same thing back when I started maintaining this project but did not find the time to do it. (If it works, right?).
I am no expert on cache frameworks so I cant say for sure which type that would work best for all use-cases - or if the use of different cache types were intentional by the previous developer.

I'm looking forward to test it later today. We also need to make sure that the default directory is suitable as a cache directory. I'd look at Mopidy documentation for pointers on where these data should be stored: link

@fmarzocca
Copy link

Other caches are here: /var/lib/mopidy/.cache/

If an item is unlikely to change (like a track, album or artist
associated to a Tidal ID) then there's usually no need to remove it from
the filesystem cache when it's removed from the memory cache because of
`max_items` limit.
@blacklight
Copy link
Collaborator Author

Other caches are here: /var/lib/mopidy/.cache/

Good point. I'll prepare a commit to change the cache dir to core/cache_dir instead of creating a subfolder under core/data_dir.

- Retrieve `cache_dir` from a static configuration-based context object
  instead of adding an additional level of indirection through the
  backend instance.

- Use mopidy's defaults for `cache_dir` - no need to explicitly set it
  among the supported configuration options.
The extension previously used three different caches:

- A simple in-memory dictionary for playlists
- A more sophisticated local `Cache` for tracks
- `LruCache` (extended by `SearchCache`) for everything else

There were also many overlaps between the tracks `Cache` and `LruCache`.

This commit rationalizes the caches and implements a single `LruCache`
for all the items, with support for filesystem persistence.

Items cached on the filesystem are indexed as it follows:

```
<cache_dir>
  -> <type>       # e.g. track or playlist
    -> <prefix>   # first two characters of the ID
      -> <id>.cache
```

Where `<id>.cache` is an object representation serialized by `pickle`.
@blacklight
Copy link
Collaborator Author

blacklight commented May 24, 2022

The persisted cache on storage has been implemented and the caches in the extension have been refactored. We now have a single LruCache class that implements (optional) filesystem persistence, and all the caches are now using it.

The LRU cache keeps a limited amount of items in memory (default: 1024) and all the other cached items are persisted on an indexed cache on the filesystem. The structure of the fs cache is inspired by that of mopidy-spotify:

<cache-dir>
  -> <type>
    -> <prefix>  # First two chars of the TIDAL ID
      -> tidal:<type>:<tidal-id>

In case of a miss on the memory cache, the filesystem cache will be inspected, and in case of a miss on the fs as well the extension would perform a lookup over API.

Some design decisions:

  • The search cache is not persisted. Search results can change over time and I don't want to go down the rabbit hole of cache invalidation for search results.
  • Image caches are not persisted. Caching them would blow the size of the cache directory. The price to pay is that images will be re-downloaded on-demand upon restart, but I think it's a quite reasonable trade-off.

Still pending:

  • I need to properly test it with playlist updates. Things should work in theory (upon load we compare the stored last_modified of the cached Mopidy playlist with the last_updated of the TIDAL playlist, and invalidate the cache entry if the upstream playlist has changed since last time), but I'd like to properly validate it.
  • I need to test artist caches.

Testing volunteers are welcome!

p.s. @kingosticks @adamcik I guess that several extensions that rely on slow remote backends have this cache persistence problem. The issue has been tackled quite well on mopidy-spotify, and I have tried to replicate some of the lessons learned there in this PR. But maybe it's worth to have something like a CacheProvider in mopidy itself that takes care of the boilerplate of implementing a backend-agnostic LRU persisted cache, so individual extensions don't have to reinvent the wheel?

@tehkillerbee
Copy link
Owner

@blacklight Great work, looking forward to test it!

@adamcik
Copy link

adamcik commented May 24, 2022

I don't disagree that this type of cache case is probably shared by multiple plugins, but I'm not sure if adding this to mopidy itself really makes sense. In a similar vein I've been wanting to factor out a bunch of the webservice/oauth type helpers so that we can just reuse things more. But I suspect that it might be better to just have someone create a mopidy-client-libs or a mopidy-plugin-helpers (or whatever a suitable name is).

If there are parts of this that can only be solved in mopidy, and there is real value to having it solved in that codebase then definitely open a bug with why this is the case. Otherwise I think this would benefit more from not being in "core", similar to how we've moved e.g. mopidy-mpd out so it can be improved independently of mopidy releases.

@blacklight
Copy link
Collaborator Author

@adamcik I totally agree - if the plan is to keep the core "lightweight" then non-core features should be moved out of the core, not in.

There's still a valid point though for features that are non-core but still shared by many extensions (such as caching, OAuth flows and playlist updates). Caching and OAuth in particular are implemented independently by multiple extensions (Spotify, SoundCloud, Tidal, YouTube etc.) even though they mostly share the same code - and, in some cases, most of the code of these features is copy-pasted from other extensions.

I like the idea of something like a mopidy-ext-helpers package that could collect most of these features, as well as provide some room for customization/extensibility on top of the boilerplate. We would have a long tail of extensions that should probably migrate at some point, but it's good to at least lay out the foundations.

@blacklight blacklight marked this pull request as ready for review May 24, 2022 23:00
@blacklight
Copy link
Collaborator Author

blacklight commented May 24, 2022

The PR is ready for test/review.

The latest changes involve a more consistent cache implementation/usage in library.lookup. I have tested the caching features for artists and albums across restarts with Iris and they also seem to work fine.

I have also fixed the logic for playlist cache invalidation upon updates. We still have a problem though because the current stable branch of python-tidal doesn't currently expose the last_updated attribute of the Tidal playlists, and with no last_updated we don't know if a playlist has been updated since last time we cached it.

More attributes are apparently exposed on the 0.7.x branch of python-tidal, but that version isn't on pip yet (the author said that it should be coming soon, but after waiting about two years I'm not sure of their definition of "soon"). In the meantime, I have pushed a fix myself in my existing PR for pagination, so you may want to use the pagination-support branch from my fork of python-tidal if you want both pagination, and your playlists to be up-to-date when you add/remove tracks.

@tehkillerbee
Copy link
Owner

tehkillerbee commented May 26, 2022

Browsing appears very fast indeed! Only issue I see is the images are continuously requested again from the server. I guess this is intentional, as we do not have a local cache of the downloaded images - only the URIs themselves. Perhaps we should consider a local image cache as well?

Edit: This is mainly an issue when requesting the full resolution images. See below comments for details on how to reduce the resolution.

mopidy_tidal/library.py Outdated Show resolved Hide resolved
mopidy_tidal/library.py Outdated Show resolved Hide resolved
@fmarzocca
Copy link

  • Error is thrown whenever playlist directory is refreshed from Iris.

Not completely true. It is thrown whenever Mopidy receive a [core.playlists.refresh](https://docs.mopidy.com/en/latest/api/core/#mopidy.core.PlaylistsController.refresh) call.

@blacklight
Copy link
Collaborator Author

Am I right in saying that for every cache miss in lookup() you will write once to disk? Can you write them once in bulk at the end?

@kingosticks this is now addressed in 6416ed3

@blacklight
Copy link
Collaborator Author

1a. Artist images are never loaded when opening "Artists" directory (no caching occurs). When clicking directory "My Artist", Images are requested but never displayed. Instead an error is shown.

1b. Playlists images are never loaded when opening "Playlists" directory (no caching occurs). Opening "My Playlists" directory results in partial loading of album art. Only some of the first playlists has an image. The remaining ones are blank.

These should both be fixed by b92064c

2. Attempting to open "Playlists" (not My Playlists) directory, results in the following (VERY long log message), although playlists get displayed. Error is only shown immediately after mopidy restart. Otherwise, it can be replicated by clicking "refresh" in the Iris GUI. The error is Expected a list of Track, not [Playlist...

This should be fixed by 2adfb83

4 Currently, we have not set the max resolution to use when requesting images. We should consider changing it, as mentioned above. Otherwise we request images/album art that is 1080x1080. It looks like iris does not like images that are too large(?).

This should be fixed by 063fd0b

5 Browsing "My Albums" result in the following error, followed by the caching itself.

6 Browsing My Playlists, result in the following error, followed by the caching itself.

This is actually expected, it's just the error logging trace that is misleading.

It happens because the extension builds URIs such as tidal:my_artists or tidal:my_albums. While browse provides a way to map those results to folders, lookup wants a list of tracks to be returned, so it's safe to just skip these entries. I have removed the error log trace from the file.

7 While refreshing "My Albums, certain albums cannot be displayed. Instead, an error is thrown:

I have tried to look up the same album (both through ncmpcpp and Iris) and I couldn't reproduce the error. I can't reproduce it when rendering My Albums either. Maybe it's a cache issue? If you have tried several versions of this PR it's likely that the cache folder may be screwed up :) if wiping the cache doesn't work can you try and replicate it when printing dir(track) and type(track) as well? The uri attribute should definitely be there on a Track instance. If it's missing on some tracks perhaps it's because is unavailable in the configured market?

@fmarzocca
Copy link

This should be fixed

Thank you. I'm going to update to latest commit and I will report any test.

@fmarzocca
Copy link

fmarzocca commented Jun 10, 2022

I have updated to the latest version of better-cache. These are the findings:

This is after a refresh (for every playlist):

Jun 10 10:50:00 pab mopidy[1056]: ERROR [Core-17] mopidy.core.library TidalBackend backend returned bad data: Expected a list of Track, not [Playlist(last_modified=0, name='I miei brani di Shazam', tracks=[Track(album=Album(artists=[Artist(name='C. Tangana', uri='tidal:artist:7995155')], name=

I don't understand this other error:
Jun 10 10:50:22 pab mopidy[1056]: ERROR [TidalBackend-6] mopidy_tidal.library <class 'requests.exceptions.HTTPError'> when processing URI 'tidal:track:3579856:70841514:70841515': 404 Client Error: Not Found for url: https://api.tidalhifi.com/v1/albums/70841514/tracks?sessionId=e5947e8d-14f4-45b7-a77e-3c363cac6ccd&countryCode=IT&limit=999

@blacklight
Copy link
Collaborator Author

I have updated to the latest version of better-cache. These are the findings:

This is after a refresh (for every playlist):

Jun 10 10:50:00 pab mopidy[1056]: ERROR [Core-17] mopidy.core.library TidalBackend backend returned bad data: Expected a list of Track, not [Playlist(last_modified=0, name='I miei brani di Shazam', tracks=[Track(album=Album(artists=[Artist(name='C. Tangana', uri='tidal:artist:7995155')], name=

I don't understand this other error:
Jun 10 10:50:22 pab mopidy[1056]: ERROR [TidalBackend-6] mopidy_tidal.library <class 'requests.exceptions.HTTPError'> when processing URI 'tidal:track:3579856:70841514:70841515': 404 Client Error: Not Found for url: https://api.tidalhifi.com/v1/albums/70841514/tracks?sessionId=e5947e8d-14f4-45b7-a77e-3c363cac6ccd&countryCode=IT&limit=999

The second error is either an album that isn't found in a market, or a maximum requests exceeded.

The first one should be fixed. Have you installed the latest version? If it still occurs, you may want to wipe the cache (older cache files may store playlists as list of tracks instead of objects).

@kingosticks
Copy link

I'm not saying we need this during development but what do you think about a mechanism to automatically clear the cache when the extension version changes?

@fmarzocca
Copy link

If it still occurs, you may want to wipe the cache (older cache files may store playlists as list of tracks instead of objects).

Yes, wiping the cache the error is gone. But wiping the cache is anyway a major PITA because I have to refresh EACH playlist, one by one. Sending a core.playlists.refresh doesn't do the job. Do you know a way to automatically refresh all playlists? I thought mopidy-tidal did this on server restart

@blacklight
Copy link
Collaborator Author

I'm not saying we need this during development but what do you think about a mechanism to automatically clear the cache when the extension version changes?

I don't expect changes in the cached formats to occur that often. While developing the feature and figuring things out the format can obviously change, but now that we've settled for pickle-ing native mopidy objects I don't expect many changes that would justify the effort of a version-based cache management logic.

But wiping the cache is anyway a major PITA because I have to refresh EACH playlist, one by one. Sending a core.playlists.refresh doesn't do the job. Do you know a way to automatically refresh all playlists? I thought mopidy-tidal did this on server restart

Manually wiping the cache should no longer be required - we've settled for storing playlists as native mopidy objects instead of lists of tracks, so there shouldn't be any more breaking changes.

When core.playlists.refresh is called, the playlist cache is hit if available, and it will be refreshed if the last_modified of the upstream object is greater than the cached one (e.g. if the playlist has changes). Otherwise, there should be no reason to force a cache refresh. Note that this works only if you're using my fork of python-tidal 0.6.x, because the standard version doesn't export the lastUpdated field.

Depending on the client implementation, however, a mopidy restart may be required in case of updates. Or at least that's the case for ncmpcpp, because the current playlist lookup logic doesn't call the upstream API to refresh the last modified date. I could fix this by forcing an API playlist lookup every time a playlist is looked up, but this could impact performance.

@fmarzocca
Copy link

Note that this works only if you're using my fork of python-tidal 0.6.x

Got it, thanks. So i have to install your python-tidal 0.6.x. Don't you think it should be set as a dependency on the final mopidy-tidal?

@blacklight
Copy link
Collaborator Author

Don't you think it should be set as a dependency on the final mopidy-tidal?

I still hope that I won't need to do that.

My PR is a temporary workaround while waiting (for the past two years) for the maintainer of python-tidal to finally push 0.7.x, which is supposed to be a total rewrite of the library.

The maintainer recently promised that 0.7.x should be coming up soon, and that's why I'm holding on that PR: it may soon become irrelevant, and since 0.7.x may also require the rewrite of some parts of mopidy-tidal, I don't want to become the de facto maintainer of the stale 0.6.x branch in the meantime.

However, if this PR gets approved before the new python-tidal comes out, it's probably worth to add at least a couple of lines to the documentation about the temporary workaround.

@fmarzocca
Copy link

I still hope that I won't need to do that.

I fully understand and agree with you. A friend of mine is used to say: "We are on this Earth to suffer". (maybe too much Calvinistic!)

These methods can be used to wipe cache entries
both from memory and disk
Playlist metadata is always fetched upstream upon
`PlaylistsProvider.as_list()` calls - the performance impact of a single
API call isn't very high.

This allows us to detect added/removed playlists at runtime and refresh
the cache accordingly, without requiring mopidy to be restarted.

cache_updates[cache_name][uri] = cache_data

tracks += data if hasattr(data, '__iter__') else [data]
Copy link
Owner

@tehkillerbee tehkillerbee Jun 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should change L251 to the following:

if item_type == 'playlist' and not cache_miss:
    tracks += data.tracks
else:
    tracks += data if hasattr(data, '__iter__') else [data]

Otherwise, we will still get the error:

ERROR    [Core-7] mopidy.core.library TidalBackend backend returned bad data: Expected a list of Track, not [Playlist

@fmarzocca
Copy link

fmarzocca commented Oct 11, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants