appstream not working properly in Turkish language #264

Closed
queeup opened this issue Jan 4, 2020 · 10 comments

queeup commented Jan 4, 2020

I am getting this error on both elementary OS 5.1 and Kubuntu 19.10:
Error while loading the metadata pool: Unable to add data: MDB_BAD_VALSIZE: Unsupported size of key/DB name/data, or wrong DUPFIXED size

Because of this problem, the AppCenter (eOS) and Discover (Kubuntu) package managers are not working properly: they do not show all apps.

My comprehensive bug report is here: elementary/triage#6

ximion added the bug label Jan 4, 2020

ximion (Owner) commented Jan 4, 2020

I can reproduce some kind of issue here, but I'm not sure what exactly is causing it.
Thank you for reporting the bug, this is very helpful!

ximion added a commit that referenced this issue Jan 4, 2020
See #264 for the bug report. This does not fix the bug in question, but
prevents a hard fail in case we encounter such tokens.

ximion (Owner) commented Jan 4, 2020

So, your issue should be fixed with this commit. However, a zero-length search token should never have been created in the first place, so I want to keep this report open until I have had some time to look into this, find out what produced it, and add a test case for it.
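The commit described above is defensive: meeting a zero-length token should no longer cause a hard failure. A minimal sketch of that idea, assuming a toy dict-backed cache that rejects empty keys the way LMDB does. `ToyCache`, `add_search_tokens` and the component ID `parole` are hypothetical names for illustration, not AppStream's API.

```python
import logging

log = logging.getLogger("toy-cache")


class ToyCache:
    """Hypothetical stand-in for an LMDB-backed cache; it rejects empty keys."""

    def __init__(self) -> None:
        self.index: dict[str, set[str]] = {}

    def put(self, key: str, component_id: str) -> None:
        if len(key) == 0:
            # LMDB refuses zero-length keys; this toy cache raises instead.
            raise ValueError("zero-length key")
        self.index.setdefault(key, set()).add(component_id)


def add_search_tokens(cache: ToyCache, component_id: str, tokens: list[str]) -> int:
    """Add one component's tokens, skipping empty ones instead of failing hard.

    Returns the number of tokens that were skipped.
    """
    skipped = 0
    for token in tokens:
        if not token:
            log.warning("skipping zero-length search token for %s", component_id)
            skipped += 1
            continue
        cache.put(token, component_id)
    return skipped


cache = ToyCache()
print(add_search_tokens(cache, "parole", ["ses", "", "dvd"]))  # prints 1
print(sorted(cache.index))  # prints ['dvd', 'ses']
```

Skipping the bad token keeps the rest of the component indexed, but it does not explain where the empty token came from, which is why the report stays open.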

queeup (Author) commented Jan 4, 2020

Thank you for taking care of it. Can I try this fix without compiling appstream? Like a deb package maybe? Or can we (Turkish users) have this fix on our systems soon via package managers?

If there is anything I can do, please feel free to ask.

ximion (Owner) commented Jan 4, 2020

So, apparently the Snowball stemmer thinks that the stem of the Turkish token "leri" is an empty string.
I assume Snowball is doing the right thing here and effectively helps filter out low-quality search tokens, since "leri" does not seem to be a word of its own that a user would search for.
I extended AppStream to take that information into account properly, so this issue should be fixed now.
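The fix described here handles the problem where the token is produced: a token whose stem is empty is treated as low quality and dropped. A minimal sketch, assuming a hypothetical `toy_stem` in place of Snowball. It only strips two hard-coded suffixes so that "leri" reduces to an empty string, mimicking the behaviour reported in this thread; it is not Snowball's Turkish algorithm, and `stemmed_tokens` is likewise an illustrative name.

```python
# Hypothetical suffix list; Snowball's real Turkish algorithm is far more involved.
SUFFIXES = ("leri", "ler")


def toy_stem(token: str) -> str:
    """Strip the first matching suffix, so a bare suffix stems to ''."""
    for suffix in SUFFIXES:
        if token.endswith(suffix):
            return token[: -len(suffix)]
    return token


def stemmed_tokens(words: list[str]) -> list[str]:
    """Stem each word and discard tokens whose stem is empty."""
    result = []
    for word in words:
        stem = toy_stem(word.lower())
        if stem:  # an empty stem marks a low-quality token: drop it
            result.append(stem)
    return result


print(stemmed_tokens(["ses", "CD", "leri", "ve"]))  # prints ['ses', 'cd', 've']
```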

ximion closed this as completed in c7b2b58 Jan 4, 2020

ximion (Owner) commented Jan 4, 2020

> Thank you for taking care of it. Can I try this fix without compiling appstream? Like a deb package maybe? Or can we (Turkish users) have this fix on our systems soon via package managers?

For now, I'm afraid you'll have to rebuild AppStream yourself to get this fix. However, if everything goes well, I intend to make another AppStream release within the next two weeks. That should make it into Debian as a package very quickly, arrive in time for the upcoming Ubuntu LTS, and can be built for older distribution releases as well (provided they have a recent enough GLib available).

queeup (Author) commented Jan 4, 2020

"leri" is a possessive suffix in the Turkish language. Nobody would search for that. I will keep an eye out for the new release. Thank you.

queeup (Author) commented Jan 4, 2020

Please excuse my curiosity, but where did this bug come from? It appeared a few months ago. Did it come from a wrong translation in AppStream or from an app's translation?

ximion (Owner) commented Jan 4, 2020

Curiosity is never a bad thing :-)
This is a plain AppStream bug. In order to save memory but still offer fast searches, AppStream creates a memory-mapped on-disk cache to which it offloads data it would otherwise keep in memory. The cache backend, LMDB, does not accept zero-length keys. In this case, a key is a search token. Tokens are generated either from explicit keywords for a software component or by processing texts provided by the app. Tokens are stemmed, so a search for "cat" will also return results for apps that contain keywords like "cats", "catlike", or "catty".
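The stemmed lookup described above can be sketched as a small inverted index whose keys are stems, with the query stemmed the same way before lookup. This is a toy illustration: `toy_stem` is a hypothetical suffix stripper standing in for Snowball, the plain dict stands in for the LMDB cache, and the component IDs are made up.

```python
# Hypothetical English suffixes for this toy; Snowball uses real rule sets.
SUFFIXES = ("like", "ty", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, never reducing a word to nothing."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def build_index(components: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each stemmed keyword to the IDs of the components that use it."""
    index: dict[str, set[str]] = {}
    for component_id, keywords in components.items():
        for keyword in keywords:
            index.setdefault(toy_stem(keyword), set()).add(component_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Stem the query exactly like the keywords, then look it up."""
    return index.get(toy_stem(query), set())


index = build_index({
    "app-a": ["cats"],
    "app-b": ["catlike"],
    "app-c": ["catty"],
    "app-d": ["dog"],
})
print(sorted(search(index, "cat")))  # prints ['app-a', 'app-b', 'app-c']
```

Stemming the query with the same function as the keywords is what makes the lookup work. The flip side is that a stemmer which reduces a token to nothing yields an empty key, which is exactly what LMDB refuses.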

The bug was triggered by Xfce's Turkish translation of the Parole app's description, which contains "leri": "ses CD'leri DVD'leri ve canlı". AppStream's tokenization algorithm correctly isolated "leri" as a potential token and passed it to our stemming algorithm, Snowball. Snowball stemmed this token down to an empty string. That empty token was then registered with the AppStream component and passed down to the cache (the cache gets updated during a refresh action), where it caused registering the component with the cache to fail.
AppStream considers a failure to add a component to the cache to be very severe and therefore halts all caching. (If it didn't, we might lose components and only notice later when looking at logs; such issues may also have a cascading effect and cause other problems later.)
Since the cache layout is used even when the cache is kept in memory, this effectively results in almost no components being loaded into AppStream's metadata pool, so the software center thinks no apps are available (although it should have received the error message from AppStream).
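The chain of events above can be reproduced in miniature. Several assumptions apply: `tokenize` splits on whitespace and apostrophes (a guess at how "leri" was isolated from "CD'leri"; AppStream's real tokenizer is not shown in this thread), `toy_stem` is a hypothetical stand-in for Snowball that reduces "leri" to an empty string, and `StrictCache` with `refresh_cache` mimics a cache that refuses empty keys (presumably the source of the MDB_BAD_VALSIZE error in the report) and halts all caching on the first failure. Component IDs are made up.

```python
import re
from typing import Optional


def tokenize(text: str) -> list[str]:
    """Split on whitespace and apostrophes (an assumption made for this sketch)."""
    return [t for t in re.split(r"[\s']+", text.lower()) if t]


def toy_stem(token: str) -> str:
    """Hypothetical stemmer: strips a trailing 'leri', so 'leri' itself becomes ''."""
    return token[: -len("leri")] if token.endswith("leri") else token


class StrictCache:
    """Toy cache that, like LMDB, refuses zero-length keys."""

    def __init__(self) -> None:
        self.index: dict[str, set[str]] = {}
        self.components: list[str] = []

    def add_component(self, component_id: str, text: str) -> None:
        stems = [toy_stem(t) for t in tokenize(text)]  # note: no empty-stem filter
        if any(len(s) == 0 for s in stems):
            raise ValueError(f"{component_id}: zero-length key")
        for s in stems:
            self.index.setdefault(s, set()).add(component_id)
        self.components.append(component_id)


def refresh_cache(cache: StrictCache, entries: list[tuple[str, str]]) -> Optional[str]:
    """Add components in order; the first failure halts all further caching."""
    for component_id, text in entries:
        try:
            cache.add_component(component_id, text)
        except ValueError as err:
            return str(err)
    return None


cache = StrictCache()
error = refresh_cache(cache, [
    ("app-a", "media player"),
    ("parole", "ses CD'leri DVD'leri ve canlı"),
    ("app-c", "image viewer"),
])
print(error)             # prints parole: zero-length key
print(cache.components)  # prints ['app-a']
```

Every component after the failing one is never loaded, which is why the software centers showed so few apps.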

queeup (Author) commented Jan 4, 2020

Wonderful explanation. Thank you.

safak45x commented Jan 4, 2020

Thank you


3 participants