New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[db] speedup of ResultQueries using string key reuse #7329

Merged
merged 1 commit into from Jul 4, 2015

Conversation

Projects
None yet
@ghost

ghost commented Jun 22, 2015

Reading EPG's from DB improved. On my odroid XU3 its now below the expected 1 second.
DB queries are slow, so instead reading epg's and afterwards retrieving tags for each epg I read all the stuff once.
Negative side effect: loading is to fast for the Progessbar - nothing is readable anymore.
Maybe displaying each channel while loading could be eliminated.

@hudokkow

This comment has been minimized.

Member

hudokkow commented Jun 22, 2015

Negative side effect: loading is to fast for the Progessbar

It's already very AM here and I stopped reading at that very, very, very long sql query... If that's the only side effect, by all means, nuke it! May it disappear in a blaze of fire.

@MilhouseVH

View changes

xbmc/epg/EpgDatabase.cpp Outdated
epgtags.iYear, epgtags.sIMDBNumber, epgtags.iGenreType, epgtags.iGenreSubType, epgtags.sGenre, \
epgtags.iParentalRating, epgtags.iStarRating, epgtags.bNotify, epgtags.iEpisodeId, epgtags.iEpisodePart, \
epgtags.sEpisodeName, epgtags.iSeriesId, epgtags.sIconPath \
FROM epg left join epgtags on (epg.idEpg = epgtags.idEpg) order by epg.idEpg;");

This comment has been minimized.

@MilhouseVH

MilhouseVH Jun 22, 2015

Contributor

FROM epg LEFT JOIN epgtags ON (epg.idEpg = epgtags.idEpg) ORDER BY epg.idEpg");

No need for the extra spaces/indenting at the beginning of line 188.

Also, the semi-colon isn't required in PrepareSQL().

This comment has been minimized.

@xhaggi

xhaggi Jun 23, 2015

Member

i would prefer that you use quotes at the end instead of \
And if you want to fetch all columns of epgtags use epgtags.* in the select list.

"SELECT epg.idEpg, epg.sName, epg.sScraperName, epgtags.* "
"FROM epg "
"LEFT JOIN epgtags ON epg.idEpg = epgtags.idEpg "
"ORDER BY epg.idEpg"

EDIT: and please use upper-case for SQL keywords.

@MilhouseVH

View changes

xbmc/epg/EpgDatabase.cpp Outdated
epgtags.iYear, epgtags.sIMDBNumber, epgtags.iGenreType, epgtags.iGenreSubType, epgtags.sGenre, \
epgtags.iParentalRating, epgtags.iStarRating, epgtags.bNotify, epgtags.iEpisodeId, epgtags.iEpisodePart, \
epgtags.sEpisodeName, epgtags.iSeriesId, epgtags.sIconPath \
FROM epgtags WHERE idEpg = %u;", epg.EpgID());

This comment has been minimized.

@MilhouseVH

MilhouseVH Jun 22, 2015

Contributor

Table name alias is unnecessary for this query (as is the trailing semi-colon).

This comment has been minimized.

@ghost

ghost Jun 23, 2015

correct! Will change it if you really expect to use it....

This comment has been minimized.

@MilhouseVH

MilhouseVH Jun 23, 2015

Contributor

As I'm not qualified to comment on what you are actually doing I'm merely making you aware of the style issues with your new code as this can be a barrier to merging (so better to fix than not).

However if there's no push back on this throughout today I may include it in my next RPi test build later this evening, so it would be nice if you could squash your updates when you get the chance.

This comment has been minimized.

@xhaggi

xhaggi Jun 23, 2015

Member

same as above.

@MilhouseVH

View changes

xbmc/epg/EpgDatabase.cpp Outdated
FROM epgtags WHERE idEpg = %u;", epg.EpgID());
if (!ResultQuery(strQuery))
return -1;

This comment has been minimized.

@MilhouseVH

MilhouseVH Jun 22, 2015

Contributor

Incorrect indenting for the return.

@ghost

This comment has been minimized.

ghost commented Jun 23, 2015

Question is, If you want speed or beauty. result["string] cost a lot of time because the string must be searched every time. On Epg it is (in my case) 5000 rows * 30 fields = 150.000 Searches for string in a list.
B.t.w: I don't like the look either, but it is fast and this is what counts for me. No user will ever know whatis done internally.

[Edit] I have noticed that there is something not OK, not all Epgs are displayed - I've to look over it, sorry for the early PR. But it was mainly posted to get a feeling if you would acept such a change....

@MilhouseVH

View changes

xbmc/epg/EpgDatabase.cpp Outdated
newTag->m_iEpisodePart = m_pDS->fv(25).get_asInt();
newTag->m_strEpisodeName = m_pDS->fv(26).get_asString().c_str();
newTag->m_iSeriesNumber = m_pDS->fv(27).get_asInt();
newTag->m_strIconPath = m_pDS->fv(28).get_asString().c_str();

This comment has been minimized.

@MilhouseVH

MilhouseVH Jun 23, 2015

Contributor

What's the advantage of using the column index rather than column name - performance? I'd have thought you should be using the column name whenever possible (and if a value in the result set doesn't have a column name you can give it one, ie. 0 as your_column_name)

This comment has been minimized.

@ghost

ghost Jun 23, 2015

Sure, performance. If you have a query with 30 columns and request the result with a string, sqlite has to find the matching row index from string. This is mostly done with binary search and a kind of fast, but never as fast as using index directly.

Nicer approach: I'll define the index values once in the header file of the epgtags table, then make a epgtags.* and use for retrieval a named index value. Its always kind of risky because I'm not sure if the tables are al consistent. Should be must musten't.
The way with the long query looks really ugly, I agree, but it is safe regarding positions.

@xhaggi

View changes

xbmc/epg/EpgDatabase.cpp Outdated
CEpg epg;
/* Get number of Epgs for Progressbar */
std::string strQuery = PrepareSQL("SELECT count(*) FROM epg;");

This comment has been minimized.

@xhaggi

xhaggi Jun 23, 2015

Member

COUNT(*)

@xhaggi

View changes

xbmc/epg/EpgDatabase.cpp Outdated
newTag->m_iEpisodePart = m_pDS->fv("iEpisodePart").get_asInt();
newTag->m_strEpisodeName = m_pDS->fv("sEpisodeName").get_asString().c_str();
newTag->m_iSeriesNumber = m_pDS->fv("iSeriesId").get_asInt();
newTag->m_strIconPath = m_pDS->fv("sIconPath").get_asString().c_str();

This comment has been minimized.

@xhaggi

xhaggi Jun 23, 2015

Member

please revert these changes, we prefer getting the column by name not by index.

@xhaggi

View changes

xbmc/epg/EpgContainer.cpp Outdated
lock.Leave();
it->second->Load();
lock.Enter();
it->second->UpdateLoadState();

This comment has been minimized.

@xhaggi

xhaggi Jun 23, 2015

Member

wrong indentation

@xhaggi

View changes

xbmc/epg/EpgContainer.cpp Outdated
@@ -379,10 +374,11 @@ void CEpgContainer::InsertFromDatabase(int iEpgID, const std::string &strName, c
else
{
// create a new epg table
epg = new CEpg(iEpgID, strName, strScraperName, true);
epg = new CEpg(-1);

This comment has been minimized.

@xhaggi

xhaggi Jun 23, 2015

Member

why do we need to create a new instance here if we set epg to epgdb below.
dropping the this line and the check below should be fine.

This comment has been minimized.

@ghost

ghost Jun 23, 2015

epgdb is a local variable in EpgDatabase::GetAll(), I dont call new() there because I don't know if its really inserted or only updated.

@xhaggi

View changes

xbmc/epg/EpgDatabase.cpp Outdated
epg.SetEpgID(m_pDS->fv(0).get_asInt());
epg.SetName(m_pDS->fv(1).get_asString());
epg.SetScraperName(m_pDS->fv(2).get_asString());
container.UpdateProgressDialog(iReturn,nEpgMax,epg.Name());

This comment has been minimized.

@xhaggi

xhaggi Jun 23, 2015

Member

missing spaces between parameters.
please read our coding guideline http://kodi.wiki/view/Code_guide-lines_and_formatting_conventions

@xhaggi

View changes

xbmc/epg/EpgDatabase.cpp Outdated
@@ -181,13 +199,14 @@ int CEpgDatabase::Get(CEpgContainer &container)
{
while (!m_pDS->eof())
{
int iEpgID = m_pDS->fv("idEpg").get_asInt();
std::string strName = m_pDS->fv("sName").get_asString().c_str();
std::string strScraperName = m_pDS->fv("sScraperName").get_asString().c_str();

This comment has been minimized.

@xhaggi

xhaggi Jun 23, 2015

Member

please revert these changes, this isn't really performance critical if we look-up the columns by name

This comment has been minimized.

@ghost

ghost Jun 23, 2015

I'll track the time difference between indices and strings and let you know....

[Edit] Forgot to sort the tags by iStartTime, now epg's are displayed correctly...

@xhaggi xhaggi added the Improvement label Jun 23, 2015

@ghost

This comment has been minimized.

ghost commented Jun 23, 2015

11:45:51 T:2723148784 NOTICE: GetAll - time elapsed reading with indices: 3478 us
11:45:52 T:2723148784 NOTICE: GetAll2 - time elapsed reading with names: 74623 us

In my business thats a lot, for Kodi it may not make any difference. I'l revert indices to names.

@ghost

This comment has been minimized.

ghost commented Jun 23, 2015

Oh, my epg database was empty when testing :-(
Reality is: 0.3 sek with indices, 4 seconds with strings.

Are you really sure that you want to keep the stings?
Again: DB access is slower then retrieving Epg from client, I could not live with that because of beauty reaons....

[Edit] My next solution would be reverting anything above and use only Indices in epgtags.
The JOIN query does not save that much time, its simply and only the index / string question here.

@Memphiz

This comment has been minimized.

Member

Memphiz commented Jun 23, 2015

Speedup > 8 warrants indices imo. In case we use those in multiple places a #define or enum for them would be fine maybe. Just my 2 cents - i leave it up to the database guys of course.

@ghost

This comment has been minimized.

ghost commented Jun 23, 2015

No, its not the select / retrieval of the datasets, they are cheap.

But: I have 5000 epg Lines (other would have more as I already reduced the limit).
Each epgtag row has about 30 fields, these are 150.000 field accesses.

Accessing 150.000 field by string means that 150.000 times a list of 30 colum names must be searched to find the physical index.

@xhaggi

This comment has been minimized.

Member

xhaggi commented Jun 23, 2015

Accessing 150.000 field by string means that 150.000 times a list of 30 colum names must be searched to find the physical index.

should we cache the indexes we already determined instead? This would speed-up things on all places where we use field name access.

@ghost

This comment has been minimized.

ghost commented Jun 23, 2015

Question is how your plans are regarding backwards compatibility. I'm not sure about this.

A common approach is to keep track that the table definition is as we expect (you already have version table for it)

  • Then in Header File
    enum
    {
    EPGTAG_COL_IDEPG
    ....
    }

and in CPP just access with field[EPGTAG_COL_IDEPG].

This is the way I usally access big databases, but from development side it must be make sure that the table fullfills all the index convention in .h file. Maybe this is already done - haven't looked for this.

@xhaggi Do I understand right:

  • SHOW COLUMNS
  • Map Colum Names to Idx once
  • Acess DB Table with IDX?

If so, sure, its an approach wich could be used everywhere......

@xhaggi

This comment has been minimized.

Member

xhaggi commented Jun 23, 2015

@ghost

This comment has been minimized.

ghost commented Jun 23, 2015

Good idea... I'll give it a try this evening if nobody else is ideological against something like this.

@xhaggi

This comment has been minimized.

Member

xhaggi commented Jun 23, 2015

👍 cool, thanks

@Jalle19

This comment has been minimized.

Member

Jalle19 commented Jun 23, 2015

Caching the string -> index lookups somewhere in a map sounds like a reasonable thing to do if looking up columns by their name is really that slow compared to using their indexes.

@ghost

This comment has been minimized.

ghost commented Jun 23, 2015

No, we cannot use a map :-( Then we would do the same sqlite currently do internally on kodi side.
std::map is nothing magic, if you request a value for a key it does the same binary search for the key.

Means: 150.000 std::map searches are the same as 150.000 sqlite field accesses using strings.

@xhaggi

This comment has been minimized.

Member

xhaggi commented Jun 23, 2015

if you look at the implementation details you will find two compare operations which do a upper case transformation etc. how could this be the same as a map find?

@xhaggi

This comment has been minimized.

Member

xhaggi commented Jun 23, 2015

btw it doesn't mean that it will be significant faster but you can try it :)

@ghost

This comment has been minimized.

ghost commented Jun 23, 2015

With same I mean technical similar: Strings must be searched in any way. This is what cost time.
Uppercase translation couold be done easily with a lookup - cost nearly nothing.

My short answers should not sound complacent, if I'm at work or have the family around me my time is limited, sorry if one things in that way....

[Edit] I'l first change your (@xhaggi)'s place in dbwrappers (https://github.com/xbmc/xbmc/blob/master/xbmc/dbwrappers/dataset.cpp#L325) to use a binary search instead the linear search. Less development and could be already enough....

@ghost ghost changed the title from Read all Epg entries once instead only tags for each channel to [db] optional speedup of ResultQueries using string key reuse Jun 25, 2015

@ghost

This comment has been minimized.

ghost commented Jun 25, 2015

OK, this is an first approach pushed for discussion.

Idea is: In general you make a query wich returns a large number of Resultsets. Then you loop over the resultsets and access the fields always in the same order.

This order is now reflected in the new guess_Fields vector list wich holds simply the string and the db column index. Every time a field is accessed using strings we look at the next entry in the vector and if the string matches (in epg case its 100% true except for first row wich will be collected) we simply return the field index wich was collected using the first row.

Using this guess_xxx feature can be enabled / disabled in the call of ResultQuery().
I have enabled it for first only for Epg queries.

Pls. look over it

[Edit] If the guess doesn't hit, we search the string using a sorted list (similar to std::map) to get the maximum performance.
Speed is not same goot as defining a LEFT JOIN Epg Query as done earlier, but this was expected.
Nevertheless the performance gain is very good visible.

@opdenkamp

This comment has been minimized.

Member

opdenkamp commented Jun 25, 2015

Haven't got time to check the actual PR, but just noticed this in my mail:

No, we cannot use a map :-( Then we would do the same sqlite currently do internally on kodi side.
std::map is nothing magic, if you request a value for a key it does the same binary search for the key.

std::map uses a red/black tree internally, meaning O(log(n)) for searches, not a simple binary search ;)

@ghost

This comment has been minimized.

ghost commented Jun 25, 2015

agree, but I'll revert first for reviewing

@xhaggi

This comment has been minimized.

Member

xhaggi commented Jun 25, 2015

thanks 👍

@ghost

This comment has been minimized.

ghost commented Jun 30, 2015

@xhaggi are there other points than renaming? If not I could finalize it this evening and we are done?

@xhaggi

This comment has been minimized.

Member

xhaggi commented Jun 30, 2015

i don't think so

@ghost

This comment has been minimized.

ghost commented Jun 30, 2015

what's missing? The performance benefit of this here should not be thrown away, or?

@xhaggi

This comment has been minimized.

Member

xhaggi commented Jun 30, 2015

I meant no. Nothing except the renaming

@ghost

This comment has been minimized.

ghost commented Jun 30, 2015

OK; then I'll give the things other names - we're still below 100 comments :-)

@ghost

This comment has been minimized.

ghost commented Jun 30, 2015

done

@hudokkow

This comment has been minimized.

Member

hudokkow commented Jul 1, 2015

jenkins build this please

@xhaggi

View changes

xbmc/dbwrappers/dataset.h Outdated
always the same field order.
We first look into this list and if we don't get a match we use the
slower but more flexible field_value method
For the case the retrieval is against our assumption, guessed_sorter is

This comment has been minimized.

@xhaggi

xhaggi Jul 1, 2015

Member

Could you please adjust the comment guessed_sorter and IMO we should move it over the member declaration. Sorry that I dont found it a bit earlier

@xhaggi

View changes

xbmc/dbwrappers/dataset.cpp Outdated
unsigned int next(indexMapID+1 >= indexMap_Entries.size()?0:indexMapID+1);
if (indexMap_Entries[next].strName == f_name) //Yes, our assumption hits.
{
indexMapID=next;

This comment has been minimized.

@xhaggi

xhaggi Jul 1, 2015

Member

spaces between var = value

@xhaggi

View changes

xbmc/dbwrappers/dataset.cpp Outdated
{
if (~indexMapID)
{
unsigned int next(indexMapID+1 >= indexMap_Entries.size()?0:indexMapID+1);

This comment has been minimized.

@xhaggi

xhaggi Jul 1, 2015

Member

Could you please add spaces here too x ? y : z

@xhaggi

View changes

xbmc/dbwrappers/dataset.cpp Outdated
//Insert the new item just behind last retrieved item
//In general this should be always end(), but could be different
indexMap_Sorter.insert(ins, ++indexMapID);
indexMap_Entries.insert(indexMap_Entries.begin()+indexMapID,tmp);

This comment has been minimized.

@xhaggi

xhaggi Jul 1, 2015

Member

spaces please, between x + y and after , before parameter

@xhaggi

View changes

xbmc/dbwrappers/dataset.h Outdated
unsigned int indexMapID;
struct IndexMapEntry //Struct to store a indexMapped field access entry

This comment has been minimized.

@xhaggi

xhaggi Jul 1, 2015

Member

sorry to nit-pick here, but we should add "field" to the variable names. IndexMap is a bit too generic IMO. FieldIndexMapEntry FieldIndexMapComparator etc.

@xhaggi

View changes

xbmc/dbwrappers/dataset.h Outdated
{
return c[v1] < c[v2];
};
bool operator()(const IndexMapEntry &o,const unsigned int &v)const

This comment has been minimized.

@xhaggi

xhaggi Jul 1, 2015

Member

a space before const here and above thx

@xhaggi

This comment has been minimized.

Member

xhaggi commented Jul 1, 2015

@mapfau sorry for this nitpicking, hope you could address all and we are ready 😄

@ghost ghost changed the title from [db] optional speedup of ResultQueries using string key reuse to [db] speedup of ResultQueries using string key reuse Jul 2, 2015

@ghost

This comment has been minimized.

ghost commented Jul 2, 2015

done

@xhaggi

This comment has been minimized.

Member

xhaggi commented Jul 2, 2015

thanks 👍

@xhaggi

View changes

xbmc/dbwrappers/dataset.cpp Outdated
//Lets try to reuse a string ->index conversation
if (get_index_map_entry(f_name))
return get_field_value(static_cast<int>(fieldIndexMap_Entries[fieldIndexMapID].fieldIndex));
const char* name=strstr(f_name, ".");
if (name) name++;
if (ds_state != dsInactive) {

This comment has been minimized.

@xhaggi

xhaggi Jul 2, 2015

Member

we should move the name declaration code inside this.

if (ds_state != dsInactive) {
  const char* name = strstr(f_name, ".");
  if (name) 
    name++;

and the get_index_map_entry stuff inside the else of if (ds_state == dsEdit || ds_state == dsInsert){
We could remove the check for ds_state == dsEdit || ds_state == dsInsert in get_index_map_entry() to not doing it twice.

This comment has been minimized.

@xhaggi

xhaggi Jul 2, 2015

Member

or you also need to check for ds_state != dsInactive in get_index_map_entry. but we should go the other way around ;)

@ghost

This comment has been minimized.

ghost commented Jul 2, 2015

like this?

@xhaggi

This comment has been minimized.

Member

xhaggi commented Jul 2, 2015

yep, don't saw that name is only used in the else case. Thanks.

@MartijnKaijser

This comment has been minimized.

Member

MartijnKaijser commented Jul 4, 2015

jenkins build this please

any other objections?

@xhaggi

This comment has been minimized.

Member

xhaggi commented Jul 4, 2015

no, I'm fine with it

@MartijnKaijser MartijnKaijser added this to the Isengard 16.0-alpha1 milestone Jul 4, 2015

mkortstiege added a commit that referenced this pull request Jul 4, 2015

Merge pull request #7329 from mapfau/xbmc_improvements
[db] speedup of ResultQueries using string key reuse

@mkortstiege mkortstiege merged commit 4b79e39 into xbmc:master Jul 4, 2015

1 check passed

default Merged build finished.
Details

@ghost ghost deleted the peak3d:xbmc_improvements branch Mar 29, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment