Picture Library #1562

Closed
@garbear wants to merge 4 commits

6 participants

@garbear
Collaborator

Because I needed a better way to show embarrassing college photos to some friends

Scope: picture database, VFS, GUI browsing, picture scanning

NOT FOR FRODO

@Montellese
Owner

I haven't looked at your code yet, but based on what you explained on IRC a while ago I have a question. I'm currently working on an approach to improve artwork loading from the database for video and music items by using JSON to avoid additional queries caused by the one-to-many relationship between media items and artwork. I ran some tests to compare the performance of the current implementation with my new approach and saw some performance penalties in the JSON approach (which basically means retrieving an additional db field and running its value through CJSONVariantParser::Parse()). For 560+ movies the time spent in CVideoDatabase::GetMoviesByWhere went from ~106ms to ~135ms, that's +27%. For 1650+ episodes the time spent in CVideoDatabase::GetEpisodesByWhere went from ~315ms to ~375ms, that's +19%.

Did you do any tests on the impact of using JSON? I know that it's a bit hard because your whole approach is based on the JSON object that is passed to the database.
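To make the trade-off concrete, here is a minimal sketch of the two loading strategies described above: one extra query per item for its artwork, versus a single query that returns the artwork as a JSON field. The table and column names (`movie`, `art`, `art_json`) are hypothetical and are not XBMC's schema; Python's `json` module stands in for `CJSONVariantParser::Parse()`.

```python
import json
import sqlite3

# Hypothetical schema for illustration only (not XBMC's actual tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, art_json TEXT);
    CREATE TABLE art (movie_id INTEGER, type TEXT, url TEXT);
""")
artwork = {"poster": "poster.jpg", "fanart": "fanart.jpg"}
conn.execute("INSERT INTO movie VALUES (1, 'Example', ?)", (json.dumps(artwork),))
conn.executemany("INSERT INTO art VALUES (1, ?, ?)", list(artwork.items()))


def art_by_extra_query(conn):
    """One query for the movies, then one more query per movie for its artwork."""
    result = {}
    for (movie_id,) in conn.execute("SELECT id FROM movie").fetchall():
        rows = conn.execute(
            "SELECT type, url FROM art WHERE movie_id = ?", (movie_id,)
        ).fetchall()
        result[movie_id] = dict(rows)
    return result


def art_by_json_field(conn):
    """A single query; the artwork arrives as a JSON string that must be parsed."""
    return {
        movie_id: json.loads(blob)
        for movie_id, blob in conn.execute("SELECT id, art_json FROM movie")
    }
```

Both functions return the same mapping. The second saves the per-item queries but pays for a parse on every row, which is the cost being measured in the numbers above.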

@garbear
Collaborator

Interesting, I wasn't expecting the JSON parser to be so heavy. I'm storing the entire picture info tag (EXIF AND IPTC), so I imagine I'll be seeing some performance hits as well. I'll do some tests later and get back to you.

@garbear
Collaborator

I tested 1,000 pics from 16 different cameras under four scenarios:

1. Ignore picture info: all picture IDs are queried but no picture info is decoded. This sets the baseline for the next tests.
2. Decode from the database: all available data is retrieved from a completely normalized database. This skips JSON decoding and tests the speed of LEFT JOINs on six 1:N relationships.
3. Parse JSON, like the PR currently does.
4. Decode info tags from files: this is kind of creative. Don't store anything in the DB except file name and path, and re-load info tags from the files. Surprisingly, on an SSD, this is way faster than JSON parsing.

             SQLite (± sigma)         MySQL (± sigma)
Ignore:        82ms ±   11ms            83ms ±    4ms
Database:    1339ms ±   83ms          2043ms ±   75ms
JSON:        7451ms ± 2134ms          6017ms ±  591ms
Decode:      2595ms ±  349ms          2476ms ±  466ms

Results: Picture info tags have ~65 fields, most of them useless. The Database test was on a database completely normalized over all fields, as an upper bound on performance. If fully initializing the picture info tag isn't important (maybe the complete info tag can be loaded from the file on demand), the end result is a number between the baseline and the upper bound.

I think my database can adapt to this strategy. Completely normalizing the db was done without touching any code, save for the declarations of the picture-field relationships. The JSON parsing could be avoided by having the picture database instantiate the FileItem from the dataset instead of JSON, which I do in the test. From the numbers it seems like this is the best way to go.
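A rough sketch of how a "mean ± sigma" comparison like the one above could be set up, contrasting JSON parsing with building the record straight from a row. Everything here is an assumption for illustration: the synthetic 65-field record, the helper names (`make_record`, `from_json`, `from_row`, `benchmark`), and the use of Python's `json` module in place of the real parser. It does not reproduce the numbers in the table.

```python
import json
import statistics
import time

FIELD_COUNT = 65  # roughly the number of info-tag fields mentioned above


def make_record(i):
    """Build a synthetic info tag with FIELD_COUNT string fields."""
    return {f"field{n}": f"value-{i}-{n}" for n in range(FIELD_COUNT)}


def from_json(blob):
    """Scenario 3: the whole tag is stored as one JSON string."""
    return json.loads(blob)


def from_row(row):
    """Scenario 2 (simplified): the tag is rebuilt from a result-set row."""
    return {f"field{n}": value for n, value in enumerate(row)}


def benchmark(func, inputs, runs=5):
    """Return (mean_ms, sigma_ms) for decoding all inputs, over several runs."""
    totals = []
    for _ in range(runs):
        start = time.perf_counter()
        for item in inputs:
            func(item)
        totals.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(totals), statistics.stdev(totals)


records = [make_record(i) for i in range(1000)]
json_blobs = [json.dumps(r) for r in records]
rows = [tuple(r.values()) for r in records]

for name, func, inputs in (("JSON", from_json, json_blobs), ("Row", from_row, rows)):
    mean_ms, sigma_ms = benchmark(func, inputs)
    print(f"{name}: {mean_ms:7.1f}ms ± {sigma_ms:6.1f}ms")
```

Reporting the standard deviation alongside the mean matters here because some of the runs above vary widely (the SQLite JSON case has a sigma of over 2 seconds).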

@garbear
Collaborator

Digging deeper into the JSON test, parsing each JSON string takes 3.8ms ± 1.3ms. I'm going to try pulling in BSON as a dependency to accelerate the deserialization stage.

@davilla

ouch, wth is it doing during that ~3.8ms? Is JSON trying to out-suck SoftAE for the title of top cpu-vampire?

@Montellese
Owner

When I did my tests I wasn't able to get proper values for the JSON (de)serialization because they were always below 1ms and I didn't have micro- or nanosecond resolution available. Maybe there's just something wrong with our JSON deserialization code or maybe yajl (our JSON library) is rather slow.
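When a single call finishes in well under a millisecond, one common workaround is to time many repetitions with a nanosecond counter and divide. The sketch below shows that idea in Python, assuming `json.loads` as a stand-in for the project's deserializer; `time_per_call_us` is a hypothetical helper name.

```python
import json
import time


def time_per_call_us(func, arg, repeats=1000):
    """Average time of func(arg) in microseconds, measured over many repeats."""
    start = time.perf_counter_ns()
    for _ in range(repeats):
        func(arg)
    elapsed_ns = time.perf_counter_ns() - start
    return elapsed_ns / repeats / 1000.0


# A small JSON document, similar in spirit to a per-item artwork field.
sample = json.dumps({"poster": "poster.jpg", "fanart": "fanart.jpg"})
per_call = time_per_call_us(json.loads, sample)
print(f"json.loads: {per_call:.2f} µs per call")
```

Averaging over a batch hides per-call variance, so it gives a mean but not a sigma; timing several batches would recover a spread.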

@garbear
Collaborator

I'm just glad my ghetto AMD processor was finally good for something - making quick calcs take more than 1ms :)

I swapped JSON for BSON and ran some more tests. The BSON decoding involves Base64 decoding as well because our database wrappers don't support BLOBs.

BSON
Ignore:    99ms ±    4ms
JSON:    6753ms ± 1220ms
BSON:    1324ms ±  103ms

Results: BSON is an 80% reduction in decoding time versus JSON. It is even slightly faster than directly querying the normalized database. Swapping YAJL with a different JSON parser might improve parsing times, though it would require a 5-fold increase in speed to compete with BSON.
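The two figures quoted in the results follow directly from the measured totals in the BSON table (6753ms for JSON, 1324ms for BSON). A quick check of the arithmetic:

```python
json_ms = 6753.0  # JSON total from the table above
bson_ms = 1324.0  # BSON total from the table above

reduction = 1.0 - bson_ms / json_ms  # fraction of time saved by BSON
speedup = json_ms / bson_ms          # factor a JSON parser would need to gain

print(f"time reduction: {reduction:.1%}")
print(f"speed-up needed: {speedup:.1f}x")
```

This prints a reduction of about 80% and a required speed-up of about 5x, matching the statement above.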

garbear added some commits
@garbear garbear BSON: Add latest snapshot of BSON C implementation from https://github.com/jsbattig/mongo-c-driver (0cf8eb696 on 2012-10-04)

mongodb/mongo-c-driver is the official repo. jsbattig/mongo-c-driver is branched from mongodb/mongo-c-driver and contains bug fixes and VS project files.
b298af5
@garbear garbear Picture database built on dynamic database abstraction
Backend is a denormalized database with dynamic normalization for the purpose of achieving optimal read queries. Objects are stored in a central table with key/value pairs. When 1:1, 1:N and N:N relationships are declared, data is drawn from the key/value pairs to fill the newly created tables. Data is kept in sync across adds and deletes, as well as when new relations are added/dropped.
e016264
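The commit message above describes objects stored as key/value pairs, with relationship tables filled from those pairs when a relation is declared. A minimal sketch of that idea follows, using SQLite. The schema and the function names (`add_object`, `declare_one_to_many`, `objects_with`) are hypothetical and are not taken from the PR's code; this only shows the initial fill and does not keep the tables in sync on later adds or deletes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE object (id INTEGER PRIMARY KEY);
    CREATE TABLE keyvalue (object_id INTEGER, key TEXT, value TEXT);
""")


def add_object(conn, fields):
    """Store an object as a row in the central table plus its key/value pairs."""
    object_id = conn.execute("INSERT INTO object DEFAULT VALUES").lastrowid
    conn.executemany(
        "INSERT INTO keyvalue VALUES (?, ?, ?)",
        [(object_id, k, v) for k, v in fields.items()],
    )
    return object_id


def declare_one_to_many(conn, key):
    """Create tables for `key` and fill them from the existing key/value pairs."""
    if not key.isidentifier():
        raise ValueError(f"unsafe table name: {key!r}")
    conn.execute(f"CREATE TABLE {key} (id INTEGER PRIMARY KEY, value TEXT UNIQUE)")
    conn.execute(
        f"CREATE TABLE object_{key} (object_id INTEGER PRIMARY KEY, {key}_id INTEGER)"
    )
    # One row per distinct value, e.g. one row per camera model.
    conn.execute(
        f"INSERT INTO {key} (value) "
        "SELECT DISTINCT value FROM keyvalue WHERE key = ? ORDER BY value",
        (key,),
    )
    # Link every object to the row holding its value.
    conn.execute(
        f"INSERT INTO object_{key} "
        f"SELECT kv.object_id, t.id FROM keyvalue AS kv "
        f"JOIN {key} AS t ON t.value = kv.value WHERE kv.key = ?",
        (key,),
    )


def objects_with(conn, key, value):
    """Read query against the normalized tables instead of the key/value pairs."""
    rows = conn.execute(
        f"SELECT link.object_id FROM object_{key} AS link "
        f"JOIN {key} AS t ON t.id = link.{key}_id "
        "WHERE t.value = ? ORDER BY link.object_id",
        (value,),
    )
    return [row[0] for row in rows]


for camera in ("Canon", "Nikon", "Canon"):
    add_object(conn, {"camera": camera, "iso": "100"})
declare_one_to_many(conn, "camera")
print(objects_with(conn, "camera", "Canon"))
```

The point of the design is that reads such as "all pictures taken with this camera" hit an indexed, normalized table, while writes still go through the generic key/value store.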
@garbear garbear Picture database VFS - picturedb://
URI looks like picturedb://tag/1/5, where 1 is the tag ID and 5 is the folder ID
b0680da
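As an illustration of the URI layout in the commit message above, here is a small parser sketch. It assumes the general shape `picturedb://<type>/<id>/<id>...`, which is inferred only from the single example given; `parse_picturedb_uri` is a hypothetical name, not a function from the PR.

```python
from urllib.parse import urlsplit


def parse_picturedb_uri(uri):
    """Split a picturedb:// URI into its node type and the numeric IDs after it."""
    parts = urlsplit(uri)
    if parts.scheme != "picturedb":
        raise ValueError(f"not a picturedb URI: {uri!r}")
    node_type = parts.netloc  # "tag" in the example
    ids = [int(segment) for segment in parts.path.split("/") if segment]
    return node_type, ids


node_type, ids = parse_picturedb_uri("picturedb://tag/1/5")
print(node_type, ids)
```

For the example URI, the first ID is the tag ID and the second is the folder ID, as described in the commit message.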
@garbear garbear Picture library caboodle - GUI browsing and picture scanning
This commit ties the picture database and picturedb VFS together, and adds GUI browsing and the ability to scan pictures into the new picture library.
c2e392c
@garbear
Collaborator

I updated the PR; the database abstraction and test cases now use BSON.

@topfs2
Collaborator

Just wanted to mention SPARQL, and librdf in particular. It's a metadata store which is able to reside on top of SQLite, MySQL, Postgres and even standardized XML triples. Would this be another option to use instead of the metadata store you made, so that we don't have to maintain it?

http://librdf.org/

@garbear
Collaborator

BSON is fast, and I'm guessing that migrating to RDF would take a performance hit. That said, from my understanding, we would be gaining access to more complex object relationships, subject and object URIs, and external maintainability. Having native support for subject and object URIs will be welcome, if not critical, when Heimdall is ready [hint hint ;) ].

What can I do to help us decide on the best path? Any ideas, @topfs2?

@bulkzooi

@garbear: I have to ask: is this work abandoned?

@t-nelson
@garbear
Collaborator

yes this work is abandoned

@garbear garbear closed this
Commits on Oct 17, 2012 (all authored by garbear)
  1. BSON: Add latest snapshot of BSON C implementation from https://github.com/jsbattig/mongo-c-driver (0cf8eb696 on 2012-10-04)
  2. Picture database built on dynamic database abstraction
  3. Picture database VFS - picturedb://
  4. Picture library caboodle - GUI browsing and picture scanning