Jarvis restore AudioLibrary.GetSongs speed in SQLlite #9108

Merged
DaveTBlake merged 1 commit into xbmc:Jarvis from JarvisLeftJoinGetSongsFix on Feb 12, 2016

DaveTBlake
Member

AudioLibrary.GetSongs is at least 3 times slower in the current build compared to RC2 on SQLite (but not MySQL). This regression is caused by a bug in the way the SQLite query optimiser deals with left joins between views: a suboptimal query plan is applied. This issue is discussed at length in #9081. The left join was introduced in #8993 to ensure that songs without artists were returned.

As a workaround, songartistview is replaced with the explicit song_artist and artist tables. The alternative fix for applying limits to the number of songs, which also works in MySQL, is applied, modifying the one patched in #9034. In addition, only those artist fields exposed by the API are queried, and the unnecessary loading of replaygain from the cuesheet is removed because replaygain is not exposed by the API. This also improves the speed.
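The point about applying limits to the number of songs can be sketched with a toy SQLite database. The schema below is a heavily simplified assumption (the real Kodi tables and views have many more columns), and the ORDER BY clauses are added only to make the toy output deterministic. It shows why the limit is placed in a subquery on songview: a LIMIT applied after the join counts joined rows, so a song with two artists uses up two slots.

```python
import sqlite3

# Toy schema: a simplified assumption, not the real Kodi music database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE song (idSong INTEGER PRIMARY KEY, strTitle TEXT);
CREATE TABLE artist (idArtist INTEGER PRIMARY KEY, strArtist TEXT);
CREATE TABLE song_artist (idArtist INTEGER, idSong INTEGER, iOrder INTEGER);
CREATE VIEW songview AS SELECT idSong, strTitle FROM song;
INSERT INTO song VALUES (1, 'Duet'), (2, 'Solo'), (3, 'Other');
INSERT INTO artist VALUES (10, 'A'), (11, 'B'), (12, 'C');
INSERT INTO song_artist VALUES (10, 1, 0), (11, 1, 1), (12, 2, 0), (10, 3, 0);
""")


def songs_limit_on_rows(limit):
    """LIMIT applied after the join: counts joined rows, not songs."""
    rows = conn.execute(
        "SELECT songview.idSong FROM songview "
        "LEFT JOIN song_artist ON song_artist.idSong = songview.idSong "
        "LEFT JOIN artist ON song_artist.idArtist = artist.idArtist "
        "ORDER BY songview.idSong, song_artist.iOrder LIMIT ?",
        (limit,),
    ).fetchall()
    return sorted({row[0] for row in rows})


def songs_limit_on_songs(limit):
    """LIMIT applied in a subquery on songview: counts songs."""
    rows = conn.execute(
        "SELECT sv.idSong, artist.strArtist FROM "
        "(SELECT songview.* FROM songview ORDER BY idSong LIMIT ?) AS sv "
        "LEFT JOIN song_artist ON song_artist.idSong = sv.idSong "
        "LEFT JOIN artist ON song_artist.idArtist = artist.idArtist",
        (limit,),
    ).fetchall()
    return sorted({row[0] for row in rows})


print(songs_limit_on_rows(2))   # song 1 has two artists and fills both rows
print(songs_limit_on_songs(2))  # two distinct songs come back
```

With a limit of 2, the first function returns only song 1, while the second returns songs 1 and 2 together with all of their artist rows.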

Also the half-attempt at querying specific fields is removed, as it was not supported by the way CFileItem is filled and did not work. Must have been unused.

@MilhouseVH could you check this in MySQL please, I don't want to introduce more issues!
@Tolriq, @phate89 and @razzeee hope this makes sense from our discussion. More can be done in Krypton, but want to fix Jarvis as quickly and simply as possible.
@Montellese do changes to the underlying query require an API version change?

Thorough testing appreciated, will set a test build going for that and upload to mirrors.

@DaveTBlake DaveTBlake added Type: Fix non-breaking change which fixes an issue v16 Jarvis Component: Music labels Feb 11, 2016
@MartijnKaijser MartijnKaijser added this to the Jarvis 16.0 milestone Feb 11, 2016
- std::string strSQL = "SELECT %s FROM songview ";
+ std::string strSQL = "SELECT songview.* FROM songview ";
  if (artistData)

@phate89
Contributor

phate89 commented Feb 11, 2016

Good job. With this query we have the right query plan in SQLite. I can't test it right now in MySQL but it shouldn't change things.
Do you want to do the same in the other places where left join slows down?
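For anyone wanting to check the plan themselves, SQLite's EXPLAIN QUERY PLAN can be run against both query shapes. The sketch below uses a toy schema that is an assumption (simplified stand-ins for songview and songartistview), and the helper name query_plan is hypothetical. The plan text varies by SQLite version and by the real schema and indexes, so no specific output is claimed here.

```python
import sqlite3

# Toy schema: simplified stand-ins for the Kodi views, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE song (idSong INTEGER PRIMARY KEY, strTitle TEXT);
CREATE TABLE artist (idArtist INTEGER PRIMARY KEY, strArtist TEXT);
CREATE TABLE song_artist (idArtist INTEGER, idSong INTEGER, iOrder INTEGER);
CREATE VIEW songview AS SELECT idSong, strTitle FROM song;
CREATE VIEW songartistview AS
  SELECT song_artist.idSong AS idSong,
         song_artist.idArtist AS idArtist,
         artist.strArtist AS strArtist
  FROM song_artist JOIN artist ON song_artist.idArtist = artist.idArtist;
""")

# Shape used before the PR: left join between two views.
VIEW_JOIN = (
    "SELECT songview.*, songartistview.* FROM songview "
    "LEFT JOIN songartistview ON songartistview.idSong = songview.idSong"
)

# Shape used by the workaround: explicit tables instead of songartistview.
TABLE_JOIN = (
    "SELECT songview.*, song_artist.idArtist, artist.strArtist FROM songview "
    "LEFT JOIN song_artist ON song_artist.idSong = songview.idSong "
    "LEFT JOIN artist ON song_artist.idArtist = artist.idArtist"
)


def query_plan(connection, sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row for sql."""
    # The detail string is the last column of every plan row.
    return [row[-1] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]


view_plan = query_plan(conn, VIEW_JOIN)
table_plan = query_plan(conn, TABLE_JOIN)

for label, plan in (("view join", view_plan), ("table join", table_plan)):
    print(label)
    for step in plan:
        print("   ", step)
```

Comparing the two printed plans on a real library database is what tells you whether the optimiser picked the expected plan; the toy data here only demonstrates the mechanics.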

@MilhouseVH
Contributor

MySQL testing: I built Jarvis with/without this PR for x86 and as far as I can tell it's working as expected. With this PR it's a few seconds quicker than without (when querying with "limits":{"start":0,"end":7388}): 15-16 seconds for an end-to-end test with the PR, 21 seconds without.

MySQL and Kodi seem to be returning the correct results for various AudioLibrary.GetSongs queries. No errors being reported.

If you produce a version of this PR for master I can test on RPi2 hardware, should this be necessary.

With this PR the following query is being executed for AudioLibrary.GetSongs:

SELECT sv.*, song_artist.idArtist AS idArtist, artist.strArtist AS strArtist, artist.strMusicBrainzArtistID AS strMusicBrainzArtistID FROM (SELECT songview.* FROM songview  LIMIT 7388) AS sv LEFT JOIN song_artist on song_artist.idsong = sv.idsong JOIN artist ON song_artist.idArtist = artist.idArtist

Without this PR the query is:

SELECT songview.*, songartistview.*  FROM songview LEFT JOIN songartistview on songartistview.idsong = songview.idsong  WHERE songview.idsong IN (SELECT idsong FROM (SELECT idsong FROM songview  LIMIT 7388) as temp)

In MySQL Workbench, the timings for the two queries are essentially the same:

| Limit | With PR | Without PR |
|---|---|---|
| 1 | 0.000 sec | 0.000 sec |
| 1000,1500 | 0.015 sec | 0.016 sec |
| 7388 | 0.031 sec | 0.032 sec |

On this basis, elimination of the repeated cue and path queries would now be the most likely explanation for the improved performance when using MySQL - many thanks for that!

👍

Not related to this PR but I'll include it anyway as something else to think about (and I've got the test data!)... the majority of the remaining time during the end-to-end test is now taken by the repeated artwork queries:

                1440525 Query   SELECT type,url FROM art WHERE media_id=1 AND media_type='song'
                1440525 Query   SELECT type,url FROM art WHERE media_id=1 AND media_type='album'
                1440525 Query   SELECT url FROM art WHERE media_id=(SELECT idArtist from song_artist WHERE idsong=1 AND iOrder=0) AND media_type='artist' AND type='fanart'

ie. for LIMIT 7388, the timings from the MySQL server log:

160211 13:01:37 1440566 Query   SELECT type,url FROM art WHERE media_id=1 AND media_type='song'
                1440566 Query   SELECT type,url FROM art WHERE media_id=1 AND media_type='album'
                1440566 Query   SELECT url FROM art WHERE media_id=(SELECT idArtist from song_artist WHERE idsong=1 AND iOrder=0) AND media_type='artist' AND type='fanart'
...
160211 13:01:52 1440566 Query   SELECT url FROM art WHERE media_id=(SELECT idArtist from song_artist WHERE idsong=7404 AND iOrder=0) AND media_type='artist' AND type='fanart'
                1440566 Query   SELECT type,url FROM art WHERE media_id=7405 AND media_type='song'
                1440566 Query   SELECT url FROM art WHERE media_id=(SELECT idArtist from song_artist WHERE idsong=7405 AND iOrder=0) AND media_type='artist' AND type='fanart'

The end-to-end test which produced the above queries took 15.071 seconds to transfer all data, ~15 seconds of which is spent querying artwork (this is testing on a fast x86 box so time spent crunching the data is now minimal).

time curl -s --data-binary '{"jsonrpc":"2.0","id":45,"method":"AudioLibrary.GetSongs","params":{"properties":["track","albumid","displayartist","duration","artistid","thumbnail","genre","playcount","title","disc","year","file","rating"],"limits":{"start":0,"end":7388}}}' -H 'content-type: application/json;' http://localhost:8080/jsonrpc -o /dev/null

real    0m15.071s
user    0m0.004s
sys     0m0.000s

The same end-to-end test but without the thumbnail property (so no repeated artwork queries) completes in 0.628s 0.491s (repeated 3 times), so scope for some serious optimisation when querying artwork! :)
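To illustrate the pattern in the log (one artwork query per song) against a batched alternative, here is a minimal sketch on a toy art table. This is not how Kodi fetches artwork and not a proposed patch: the table contents and the function names art_per_song and art_batched are hypothetical, and only the song artwork query is modelled. SQLite also caps the number of bound parameters per statement, so a real batch would need to be split into chunks.

```python
import sqlite3

# Toy art table with only the columns used in the logged queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE art (media_id INTEGER, media_type TEXT, type TEXT, url TEXT);
INSERT INTO art VALUES
  (1, 'song', 'thumb', 'a.jpg'),
  (2, 'song', 'thumb', 'b.jpg'),
  (2, 'album', 'thumb', 'album.jpg');
""")


def art_per_song(song_ids):
    """One query per song, mirroring the repeated queries in the log."""
    result = {}
    for song_id in song_ids:
        rows = conn.execute(
            "SELECT type,url FROM art WHERE media_id=? AND media_type='song'",
            (song_id,),
        ).fetchall()
        result[song_id] = dict(rows)
    return result


def art_batched(song_ids):
    """One query for the whole batch, grouped per song afterwards."""
    result = {song_id: {} for song_id in song_ids}
    if not song_ids:
        return result
    placeholders = ",".join("?" * len(song_ids))
    query = (
        "SELECT media_id, type, url FROM art "
        f"WHERE media_type='song' AND media_id IN ({placeholders})"
    )
    for media_id, art_type, url in conn.execute(query, list(song_ids)):
        result[media_id][art_type] = url
    return result


print(art_per_song([1, 2, 3]))
print(art_batched([1, 2, 3]))
```

Both functions return the same mapping; the difference is that the batched version issues one statement instead of one per song.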

@DaveTBlake
Member Author

> Do you want to do the same in the other places where left join slows down?

The speed impact of left join on views elsewhere in Jarvis is much smaller, so at this stage (release Sunday) I would say best left as it is.

For Krypton I will look at changes across the music database, it has a greater impact. I didn't spot any left joins on views in video, but will check again.

@Tolriq
Contributor

Tolriq commented Feb 11, 2016

Any way to have the test builds? :) I have no faith in rebuilding all deps and everything for Jarvis :(

@MartijnKaijser
Member

kicked a win32 build. http://jenkins.kodi.tv/job/WIN-32/7811/

@DaveTBlake
Member Author

Already built for all platforms http://jenkins.kodi.tv/job/WIN-32/7809/ @MartijnKaijser @Tolriq, like I said I would in the first post.
Build on mirror here http://mirrors.kodi.tv/test-builds/win32/KodiSetup-20160211-a74871c-HEAD.exe

@Montellese
Member

@MilhouseVH this problem with one-to-many relationships has been known for a few years. Specifically for artwork, I once tried to optimize it by storing the different artwork for a single item as JSON in an extra column in the item's table (e.g. movie). In the end, deserializing the artwork from JSON for every item was about as slow as doing the separate queries, and you basically lose the relational model, so we (Jonathan and I) decided to abandon that road.

@Tolriq
Contributor

Tolriq commented Feb 11, 2016

@DaveTBlake Ok so have tested this one.

All works OK and it's even faster with this patch than in 15.2 so all good. (from Yatse 22s instead of 26s for the 22k songs in 750 chunks, way better than the 60s)
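For readers unfamiliar with fetching a library in chunks, the following sketch builds AudioLibrary.GetSongs request bodies using the same "limits" parameter as the curl test earlier in this thread. It assumes "750 chunks" means chunks of 750 songs, uses the 22079 song count reported later in the thread, and the function name get_songs_requests is hypothetical. How Yatse actually pages its requests is not shown in this conversation.

```python
import json


def get_songs_requests(total, chunk_size, properties):
    """Yield JSON-RPC request bodies covering songs [0, total) in chunks."""
    starts = range(0, total, chunk_size)
    for request_id, start in enumerate(starts, start=1):
        end = min(start + chunk_size, total)
        yield json.dumps({
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "AudioLibrary.GetSongs",
            "params": {
                "properties": properties,
                "limits": {"start": start, "end": end},
            },
        })


# Assumed figures: 22079 songs fetched 750 at a time.
bodies = list(get_songs_requests(22079, 750, ["title", "artistid"]))
print(len(bodies))
print(bodies[0])
print(bodies[-1])
```

Each body could then be POSTed to the /jsonrpc endpoint in the same way as the curl command above.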

Next time, as Montellese does, do not hesitate to ping me for tests and feedback ;)

Glad to have found this before release.

@DaveTBlake
Member Author

Thanks all for the testing, it seems we are good to go :) [Edit: Famous last words - no we weren't]

> Glad to have found this before release.

Yes indeed. The optimiser bug was a surprise, but that is what optimisers sometimes do. I should know that after all these years of DB work! I will incorporate changes to Krypton in due course.

else
{
strSQL = PrepareSQL(strSQL, !filter.fields.empty() && filter.fields.compare("*") != 0 ? filter.fields.c_str() : "songview.*, songartistview.* ");
{ // Get data from song and song_artist tables to fully populate songs with artists

@Millencolin

I have tested this as well and found an issue: it does not return songs without an artist in their ID3 tags.
Changing the second join to a left join probably fixes it.
Current: ... JOIN artist ON song_artist.idArtist = artist.idArtist
Proposed: ... LEFT JOIN artist ON song_artist.idArtist = artist.idArtist

@DaveTBlake
Member Author

Good catch @Millencolin, I got so focused on speed that I missed testing the original point of the left join! Of course, in splitting songartistview into its constituent tables, both joins have to be left joins or the songs without artists still get missed.
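The effect is easy to reproduce on a toy SQLite database. The schema and data below are assumptions made for illustration (the real Kodi tables have many more columns, and the real query selects from songview). An inner join to artist placed after the left join to song_artist discards the NULL-extended row for a song with no artist, so both joins have to be left joins.

```python
import sqlite3

# Toy data: song 2 has no row in song_artist, i.e. no artist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE song (idSong INTEGER PRIMARY KEY, strTitle TEXT);
CREATE TABLE artist (idArtist INTEGER PRIMARY KEY, strArtist TEXT);
CREATE TABLE song_artist (idArtist INTEGER, idSong INTEGER, iOrder INTEGER);
INSERT INTO song VALUES (1, 'Tagged'), (2, 'Untagged');
INSERT INTO artist VALUES (10, 'Some Artist');
INSERT INTO song_artist VALUES (10, 1, 0);
""")

# Left join followed by an inner join: the artist-less song is filtered out,
# because NULL = artist.idArtist is never true.
INNER_SECOND = (
    "SELECT song.idSong, artist.strArtist FROM song "
    "LEFT JOIN song_artist ON song_artist.idSong = song.idSong "
    "JOIN artist ON song_artist.idArtist = artist.idArtist "
    "ORDER BY song.idSong"
)

# Both joins are left joins: every song survives, with NULL artist fields.
LEFT_BOTH = (
    "SELECT song.idSong, artist.strArtist FROM song "
    "LEFT JOIN song_artist ON song_artist.idSong = song.idSong "
    "LEFT JOIN artist ON song_artist.idArtist = artist.idArtist "
    "ORDER BY song.idSong"
)

inner_rows = conn.execute(INNER_SECOND).fetchall()
left_rows = conn.execute(LEFT_BOTH).fetchall()

print(inner_rows)  # only the song that has an artist
print(left_rows)   # both songs; the second has None for the artist name
```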

@DaveTBlake
Member Author

Have fixed the query so that it does return songs that don't have an artist (the point of adding the left join in the first place!). It doesn't seem to impact timings too much; the optimiser is still using the right plan, but the change is necessary anyway.

@Tolriq could you test this with your 902 songs without artists please, and ensure we get them? A build will be on the mirrors for you in due course.

@DaveTBlake
Member Author

Here's a win32 test build http://mirrors.kodi.tv/test-builds/win32/KodiSetup-20160212-01b104a-HEAD.exe
Need someone else's eyes on this so it can go to release.

@Tolriq
Contributor

Tolriq commented Feb 12, 2016

Will test again but not available before a few hours :(

@Tolriq
Contributor

Tolriq commented Feb 12, 2016

Ok so tested, all 22079 songs are correctly returned with correct data.

23s from Yatse, but since there are 902 more songs, the timing is effectively unchanged and still better than 15.2.

@DaveTBlake
Member Author

Thanks @Tolriq that is a satisfying result. I'm going to hit merge then as it is what we want in 16.0

DaveTBlake added a commit that referenced this pull request Feb 12, 2016
Jarvis restore AudioLibrary.GetSongs speed in SQLlite work around Left Join query optimiser bug
@DaveTBlake DaveTBlake merged commit b1d401f into xbmc:Jarvis Feb 12, 2016
@DaveTBlake DaveTBlake deleted the JarvisLeftJoinGetSongsFix branch March 13, 2016 07:57