[db] speedup of ResultQueries using string key reuse #7329
Conversation
It's already very late (AM) here and I stopped reading at that very, very, very long SQL query... If that's the only side effect, by all means, nuke it! May it disappear in a blaze of fire.
epgtags.iYear, epgtags.sIMDBNumber, epgtags.iGenreType, epgtags.iGenreSubType, epgtags.sGenre, \
epgtags.iParentalRating, epgtags.iStarRating, epgtags.bNotify, epgtags.iEpisodeId, epgtags.iEpisodePart, \
epgtags.sEpisodeName, epgtags.iSeriesId, epgtags.sIconPath \
FROM epg left join epgtags on (epg.idEpg = epgtags.idEpg) order by epg.idEpg;");
Question is if you want speed or beauty. result["string"] costs a lot of time because the string must be searched every time. For the EPG that is (in my case) 5000 rows * 30 fields = 150,000 searches for a string in a list. [Edit] I have noticed that something is not OK: not all EPGs are displayed. I'll have to look it over, sorry for the early PR. But it was mainly posted to get a feeling for whether you would accept such a change...
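To illustrate the cost being discussed, here is a minimal sketch (the column list and function names are illustrative, not the actual Dataset code): each by-name access does a linear scan over the column-name list, so 5000 rows * 30 fields means 150,000 scans.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical column layout; a by-name access scans this list linearly,
// much like mapping a field name to its index in a result set.
static const std::vector<std::string> kColumns = {
    "idEpg", "iYear", "sIMDBNumber", "iGenreType", "sEpisodeName"};

// Linear lookup: O(number of columns) string compares per field access.
int field_index_linear(const char* name)
{
  for (size_t i = 0; i < kColumns.size(); ++i)
    if (kColumns[i] == name)
      return static_cast<int>(i);
  return -1; // not found
}
```

With 30 columns and 150,000 accesses this is roughly 2-4 million string comparisons per EPG load, which is the overhead the PR tries to remove.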
newTag->m_iEpisodePart = m_pDS->fv(25).get_asInt();
newTag->m_strEpisodeName = m_pDS->fv(26).get_asString().c_str();
newTag->m_iSeriesNumber = m_pDS->fv(27).get_asInt();
newTag->m_strIconPath = m_pDS->fv(28).get_asString().c_str();
11:45:51 T:2723148784 NOTICE: GetAll - time elapsed reading with indices: 3478 us. In my business that's a lot; for Kodi it may not make any difference. I'll revert indices to names.
Oh, my EPG database was empty when testing :-( Are you really sure that you want to keep the strings? [Edit] My next solution would be to revert everything above and use indices only in epgtags.
Speedup > 8 warrants indices imo. In case we use those in multiple places, a #define or enum for them would maybe be fine. Just my 2 cents - I leave it up to the database guys of course.
No, it's not the select / retrieval of the datasets; they are cheap. But: I have 5000 EPG lines (others would have more, as I already reduced the limit). Accessing 150,000 fields by string means that 150,000 times a list of 30 column names must be searched to find the physical index.
Should we cache the indexes we already determined instead? This would speed up things in all places where we use field-name access.
Question is what your plans are regarding backwards compatibility. I'm not sure about this. A common approach is to keep track that the table definition is as we expect (you already have a version table for it)
and in C++ just access with field[EPGTAG_COL_IDEPG]. This is the way I usually access big databases, but on the development side it must be made sure that the table fulfills the index convention in the .h file. Maybe this is already done - I haven't looked for this. @xhaggi Do I understand right:
If so, sure, it's an approach which could be used everywhere...
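The enum-based column access mentioned above could look like this sketch (the enum and its values are illustrative; the real column set is much longer, and the enum must stay in sync with the SELECT order):

```cpp
#include <cassert>
#include <vector>

// Hypothetical enum mirroring the column order of the EPG SELECT statement.
// It must be kept in sync with the query; the schema-version table can
// be used to verify the table still matches this convention.
enum EpgTagColumn
{
  EPGTAG_COL_IDEPG = 0,
  EPGTAG_COL_IYEAR,
  EPGTAG_COL_SIMDBNUMBER,
  EPGTAG_COL_COUNT // number of columns, handy for sanity checks
};

// Direct index access: no string search at all.
int get_field(const std::vector<int>& row, EpgTagColumn col)
{
  return row.at(col);
}
```

The trade-off is exactly the one raised here: O(1) access, but the header file becomes a contract that every query and schema migration has to honor.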
@mapfau yep, but it would be good to adjust the Dataset implementation instead of doing this only in CEpgDatabase. Take a look at https://github.com/xbmc/xbmc/blob/master/xbmc/dbwrappers/dataset.cpp#L325 and https://github.com/xbmc/xbmc/blob/master/xbmc/dbwrappers/sqlitedataset.cpp#L536
Good idea... I'll give it a try this evening if nobody else is ideologically opposed to something like this.
👍 cool, thanks
Caching the string -> index lookups somewhere in a map sounds like a reasonable thing to do if looking up columns by their name is really that slow compared to using their indexes.
No, we cannot use a map :-( Then we would do on the Kodi side the same thing sqlite currently does internally. Meaning: 150,000 std::map searches are the same as 150,000 sqlite field accesses using strings.
If you look at the implementation details you will find two compare operations which do an upper-case transformation etc. How could this be the same as a map find?
Btw it doesn't mean that it will be significantly faster, but you can try it :)
By "same" I mean technically similar: strings must be searched either way, and that is what costs time. My short answers should not sound complacent; when I'm at work or have the family around, my time is limited - sorry if it comes across that way... [Edit] I'll first change your (@xhaggi's) place in dbwrappers (https://github.com/xbmc/xbmc/blob/master/xbmc/dbwrappers/dataset.cpp#L325) to use a binary search instead of the linear search. Less development effort, and it could already be enough...
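A sketch of that binary-search idea (names are hypothetical, not the dataset.cpp code): build a (name, index) list once per result set, sort it by name, and replace the O(n) linear scan with an O(log n) lower_bound.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sorted (name, original column index) list, built once
// when the result set's field list is known.
using FieldEntry = std::pair<std::string, int>;

std::vector<FieldEntry> build_sorted_fields(const std::vector<std::string>& cols)
{
  std::vector<FieldEntry> v;
  for (size_t i = 0; i < cols.size(); ++i)
    v.emplace_back(cols[i], static_cast<int>(i));
  std::sort(v.begin(), v.end()); // sort by name for binary search
  return v;
}

// O(log n) lookup instead of the O(n) linear scan over column names.
int field_index_binary(const std::vector<FieldEntry>& sorted,
                       const std::string& name)
{
  auto it = std::lower_bound(sorted.begin(), sorted.end(), FieldEntry{name, -1});
  if (it != sorted.end() && it->first == name)
    return it->second;
  return -1;
}
```

For 30 columns this shrinks a worst-case 30-compare scan to about 5 compares per access, at the cost of one sort per result set.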
OK, this is a first approach, pushed for discussion. The idea: in general you make a query which returns a large number of result sets. Then you loop over the result sets and access the fields always in the same order. This order is now reflected in the new guess_Fields vector, which simply holds the string and the DB column index. Every time a field is accessed by string we look at the next entry in the vector, and if the string matches (in the EPG case this is 100% true, except for the first row, which is used to collect the entries) we simply return the field index that was collected from the first row. Using this guess_xxx feature can be enabled / disabled in the call to ResultQuery(). Please look it over. [Edit] If the guess doesn't hit, we search for the string using a sorted list (similar to std::map) to get the maximum performance.
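A rough sketch of the guessing scheme described above (GuessCache, field_index_guess and new_row are illustrative names, not the PR's actual code): the first row collects (name, index) pairs in access order; later rows just check the next expected entry, so a hit costs a single string comparison instead of a search.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cache of (name, index) pairs in the order fields are accessed.
struct GuessCache
{
  std::vector<std::pair<std::string, int>> entries; // filled on the first row
  size_t next = 0;                                  // cursor for later rows
};

// Reset the cursor when the next row is fetched.
void new_row(GuessCache& c) { c.next = 0; }

// Slow-path stand-in: linear search over column names.
int slow_lookup(const std::vector<std::string>& cols, const std::string& name)
{
  for (size_t i = 0; i < cols.size(); ++i)
    if (cols[i] == name)
      return static_cast<int>(i);
  return -1;
}

// Fast path: if fields are accessed in the same order every row,
// the next cached entry matches and no search is needed.
int field_index_guess(GuessCache& cache, const std::vector<std::string>& cols,
                      const std::string& name)
{
  if (cache.next < cache.entries.size() &&
      cache.entries[cache.next].first == name)
    return cache.entries[cache.next++].second; // guess hit
  int idx = slow_lookup(cols, name);           // guess miss: collect the entry
  cache.entries.emplace_back(name, idx);
  cache.next = cache.entries.size();
  return idx;
}
```

On the first row every access is a miss and gets collected; from the second row on, every access in the same order is a one-compare hit.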
Haven't got time to check the actual PR, but just noticed this in my mail:
std::map uses a red/black tree internally, meaning O(log(n)) for searches, not a simple binary search ;)
Agree, but I'll revert first for reviewing.
thanks 👍
@xhaggi are there other points besides the renaming? If not, I could finalize it this evening and we'd be done?
I don't think so.
What's missing? The performance benefit here shouldn't be thrown away, should it?
I meant no. Nothing except the renaming.
OK; then I'll give the things other names - we're still below 100 comments :-)
done
jenkins build this please
always the same field order.
We first look into this list and if we don't get a match we use the
slower but more flexible field_value method.
For the case the retrieval is against our assumption, guessed_sorter is
@mapfau sorry for this nitpicking, hope you can address all of it and then we are ready 😄
done
thanks 👍
// Let's try to reuse a string -> index conversion
if (get_index_map_entry(f_name))
  return get_field_value(static_cast<int>(fieldIndexMap_Entries[fieldIndexMapID].fieldIndex));

const char* name = strstr(f_name, ".");
if (name) name++;
if (ds_state != dsInactive) {
like this?
yep, didn't see that
jenkins build this please. Any other objections?
no, I'm fine with it
[db] speedup of ResultQueries using string key reuse
Reading EPGs from the DB is improved. On my Odroid XU3 it's now below the expected 1 second.
DB queries are slow, so instead of reading the EPGs and afterwards retrieving the tags for each EPG, I read all the stuff at once.
Negative side effect: loading is too fast for the progress bar - nothing is readable anymore.
Maybe displaying each channel while loading could be eliminated.