Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

media_baseurl is always prepended to screenshots urls #395

Closed
minlexx opened this issue Apr 16, 2022 · 31 comments
Closed

media_baseurl is always prepended to screenshots urls #395

minlexx opened this issue Apr 16, 2022 · 31 comments
Labels
compose Affects appstream-compose

Comments

@minlexx
Copy link

minlexx commented Apr 16, 2022

Symptom: screenshot URLs look like this (note double https://) (sreenshot from debugger session)
image

As we can see in src/as-image.c#L342 base url is appended unconditinally, resulting in screenshot image that cannot be downloaded and degraded user experience in software center application (plasma-discover in our case).

Metadata XML fragment for this looks like:

<screenshot type="default">
      <image type="source">https://cdn.kde.org/screenshots/ktorrent/ktorrent.png</image>
      <image type="source" width="766" height="541">org/kde/ktorrent.desktop/214435abe2b8d85089dc352adbfb4873/screenshots/image-1_orig.png</image>
      <image type="thumbnail" width="752" height="531">org/kde/ktorrent.desktop/214435abe2b8d85089dc352adbfb4873/screenshots/image-1_752x531.png</image>
      <image type="thumbnail" width="624" height="440">org/kde/ktorrent.desktop/214435abe2b8d85089dc352adbfb4873/screenshots/image-1_624x440.png</image>
      <image type="thumbnail" width="224" height="158">org/kde/ktorrent.desktop/214435abe2b8d85089dc352adbfb4873/screenshots/image-1_224x158.png</image>
</screenshot>

The issue can be reproduced by small example program that loads single metadata file:

test: Pool default flags:  "FlagLoadOsCollection | FlagLoadOsMetainfo | FlagLoadFlatpak"
test: Pool new flags:  "FlagLoadOsCollection"
test: Adding extra metadata path:  "/home/lexx/dev/test/build-test_appstream-System_Qt-Debug/metadata"
test: Loaded total  1  components
test: Found ktorrent  1  times
test:    "org.kde.ktorrent.desktop"
test:      "KindSource" ; url:  "https://appstream.alpinelinux.org/media/edge/https://cdn.kde.org/screenshots/ktorrent/ktorrent.png"
test:      "KindSource" ; url:  "https://appstream.alpinelinux.org/media/edge/org/kde/ktorrent.desktop/2250ef363428df697ba6e87d1d66e821/screenshots/image-1_orig.png"
test:      "KindThumbnail" ; url:  "https://appstream.alpinelinux.org/media/edge/org/kde/ktorrent.desktop/2250ef363428df697ba6e87d1d66e821/screenshots/image-1_752x531.png"
test:      "KindThumbnail" ; url:  "https://appstream.alpinelinux.org/media/edge/org/kde/ktorrent.desktop/2250ef363428df697ba6e87d1d66e821/screenshots/image-1_624x440.png"
test:      "KindThumbnail" ; url:  "https://appstream.alpinelinux.org/media/edge/org/kde/ktorrent.desktop/2250ef363428df697ba6e87d1d66e821/screenshots/image-1_224x158.png"

As you can see also from test example, first screenshot URL has extra https:// added.

Suggested fix is by checking for absolute URL with g_str_has_prefix (or whatever the correct way in glib is, I'm Qt user, don't know much about glib). In case if screenshot URL is already starting with https://, do not append media_baseurl.

P.S. downstream issue for reference: https://gitlab.alpinelinux.org/alpine/infra/compose/appstream-generator/-/issues/2

@minlexx
Copy link
Author

minlexx commented Apr 16, 2022

I have a dumb fix that works, but I'm not sure this is absolutely correct way to handle this situation

diff --git a/src/as-image.c b/src/as-image.c
index 012d243b..ee9fe9db 100644
--- a/src/as-image.c
+++ b/src/as-image.c
@@ -266,6 +266,11 @@ as_image_set_locale (AsImage *image, const gchar *locale)
        as_ref_string_assign_safe (&priv->locale, locale);
 }
 
+static gboolean is_url_absolute (const gchar *url)
+{
+       return g_str_has_prefix (url, "https://") || g_str_has_prefix (url, "http://");
+}
+
 /**
  * as_image_load_from_xml:
  * @image: a #AsImage instance.
@@ -333,8 +338,8 @@ as_image_load_from_xml (AsImage *image, AsContext *ctx, xmlNode *node, GError **
        }
 
        g_strstrip (content);
-       if (!as_context_has_media_baseurl (ctx)) {
-               /* no baseurl, we can just set the value as URL */
+       if (!as_context_has_media_baseurl (ctx) || is_url_absolute (content)) {
+               /* no baseurl or URL is already absolute path, we can just set the value as URL */
                as_image_set_url (image, content);
        } else {
                /* handle the media baseurl */

After applying this I see correct output from my test program

test: Pool default flags:  "FlagLoadOsCollection | FlagLoadOsMetainfo | FlagLoadFlatpak"
test: Pool new flags:  "FlagLoadOsCollection"
test: Adding extra metadata path:  "/home/lexx/dev/test/build-test_appstream-System_Qt-Debug/metadata"
test: Loaded total  1  components
test: Found ktorrent  1  times
test:    "org.kde.ktorrent.desktop"
test:      "KindSource" ; url:  "https://cdn.kde.org/screenshots/ktorrent/ktorrent.png"
test:      "KindSource" ; url:  "https://appstream.alpinelinux.org/media/edge/org/kde/ktorrent.desktop/2250ef363428df697ba6e87d1d66e821/screenshots/image-1_orig.png"
test:      "KindThumbnail" ; url:  "https://appstream.alpinelinux.org/media/edge/org/kde/ktorrent.desktop/2250ef363428df697ba6e87d1d66e821/screenshots/image-1_752x531.png"
test:      "KindThumbnail" ; url:  "https://appstream.alpinelinux.org/media/edge/org/kde/ktorrent.desktop/2250ef363428df697ba6e87d1d66e821/screenshots/image-1_624x440.png"
test:      "KindThumbnail" ; url:  "https://appstream.alpinelinux.org/media/edge/org/kde/ktorrent.desktop/2250ef363428df697ba6e87d1d66e821/screenshots/image-1_224x158.png"

P.S need to not forget to remove ~/.cache/appstream to see the effect

@ximion
Copy link
Owner

ximion commented Apr 16, 2022

The AppStream behavior is actually correct: If there is a media baseurl, it must be prepended to every media URL segment in a metadata file (great performance improvement when processing thousands of components).
The real bug here is the <image type="source">https://cdn.kde.org/screenshots/ktorrent/ktorrent.png</image> entry in a file that has a media_baseurl set - that entry should not be there, so unless it was inserted by other means, there is a bug in the code that generated this file, most likely libappstream-compose or appstream-generator.

@minlexx
Copy link
Author

minlexx commented Apr 17, 2022

The data is taken without modifications from what was generated by appstream-generator (with alpinelinux backend).

@ximion
Copy link
Owner

ximion commented Apr 17, 2022

Yeah, that's wrong, there is a bug somewhere. What is your settings file for asgen that you used to generate this data?

@minlexx
Copy link
Author

minlexx commented Apr 18, 2022

It's generated on Alpine infra, using this I believe https://gitlab.alpinelinux.org/alpine/infra/compose/appstream-generator/-/blob/master/asgen-config.json

@ximion
Copy link
Owner

ximion commented Apr 21, 2022

I had a look at this, but still don't know how this could happen - the code shouldn't be able to take a branch where the media_baseurl is set and source image URLs from screenshots are taken without using the media_baseurl and the media_baseurl is correctly used for any generated thumbnails.
I also couldn't reproduce this yet, but I'll keep trying. Odd question, just on the offchance that you hit an old bug: Can you make sure the generator reprocesses the ktorrent package and doesn't just return cached data? Something like appstream-generator forget ktorrent (followed by a process --force) should do the trick!
I don't think this will change anything though, because I am not aware that any such bug happened and was fixed in the past... But output like this should definitely not have been generated.

@ximion ximion added the compose Affects appstream-compose label Apr 21, 2022
@minlexx
Copy link
Author

minlexx commented Apr 21, 2022

Thank you for looking into this!

I'll try to run generator locally, but I haven't tried it yet myself and I'm afraid it will take some time.

P.S. ktorrent is not the only package with this issue, many org.kde.* packages I looked at had similar screenshot URLs

@ximion
Copy link
Owner

ximion commented Apr 21, 2022

P.S. ktorrent is not the only package with this issue, many org.kde.* packages I looked at had similar screenshot URLs

Yeah, I bet! There's a bug here somewhere, and since I really don't think it has anything to do with Alpine, I should be able to reproduce it. I may have a bit more time the next few days to try this properly.
For completeness: Which version of AppStream and which version of appstream-generator are you using?

@pabloyoyoista
Copy link
Contributor

Which version of AppStream and which version of appstream-generator are you using?

I can confirm that this is happening with appstream-generator=0.8.8 and appstream=0.15.3. I "forgot" d-feet package in main, regenerated, and the issue is still there: https://appstream.pcorreag.tk/data/edge/main/Components-x86_64.xml.gz

@pabloyoyoista
Copy link
Contributor

Would you like us to try with the latest master commit?

@ximion
Copy link
Owner

ximion commented Apr 21, 2022

Would you like us to try with the latest master commit?

You can, but I don't think this will do anything - I just fixed another potential issue that nobody encountered yet that I found while reviewing the code.
Running appstream-generator with --verbose and attaching the log of a run run may give a clue though!

@pabloyoyoista
Copy link
Contributor

Running appstream-generator with --verbose and attaching the log of a run run may give a clue though!

Absolutely!
generator.log

I modified the config Alexei pointed to. I kept only "main" section and "aarch64" and "x86_64" architectures to reduce cruft from the output.

@ximion
Copy link
Owner

ximion commented Apr 24, 2022

No matter what I try, I can't reproduce this issue at all...
I used this config file:

{
"ProjectName": "AlpineLinux",
"ArchiveRoot": "http://dl-cdn.alpinelinux.org/alpine/",
"MediaBaseUrl": "https://appstream.alpinelinux.org/media",
"Backend": "alpinelinux",
"Suites":
  {
    "edge":
      {
        "sections": ["main"],
        "architectures": ["x86_64"]
      }
  },
  "Icons":
  {
    "48x48":   {"cached": false, "remote": true},
    "64x64":   {"cached": true, "remote": true},
    "128x128": {"cached": false, "remote": true}
  }
}

Using AppStream Generator 0.8.9, AS: 0.15.4 [Alpine Linux] on Debian as host.
I guess you are running this on Alpine, which uses musl(?) as libc, but that shouldn't actually cause any problems, especially not an odd one such as this one.
I guess you already did try removing database and cache entirely and trying this from scratch?

@ximion
Copy link
Owner

ximion commented Apr 24, 2022

You can try to take ximion/appstream-generator@cc4df09 (fixes a few other issues), but I don't think this will do anything...
Maybe I do need an Alpine chroot to reproduce this issue...

@pabloyoyoista
Copy link
Contributor

pabloyoyoista commented Apr 25, 2022

@ximion, I have tried to reproduce your problems... And it is indeed a problem with alpine. Either muslc, some part of the D stack or something in between is messing up. The same configuration with 0.8.8+0.15.3 from debian sid and from alpine edge yield different results, running both from scratch and the same configuration file. The run on debian sid outputs the component file you expect, while the one in alpine edge outputs what we are reporting. I guess unless you feel like spending more time on this, we will have to report to alpine and see how can we solve it. Likely, this is also the reason for ximion/appstream-generator#100

Thanks a lot for your time and your help! And sorry in the end the bug seems to be totally unrelated to the code you wrote, it feels like we are going to have a hard time figuring this one out :(

EDIT: I also tried compiling latest master from the generator as suggested. In alpine, I still get the same results.

@ximion
Copy link
Owner

ximion commented Apr 25, 2022

@pabloyoyoista Don't worry :-) A bug is a bug. Which D compiler and which C compiler does Alpine use? This is an extremely odd bug...

@pabloyoyoista
Copy link
Contributor

This is an extremely odd bug...

Yes, if at least it crashed...

Which D compiler and which C compiler does Alpine use?

We build appstream-generator with ldc because gdc failed the build. I saw similar issues in your CI, so I thought the compiler change was uncontroversial. From appstream-generator logs:

D compiler for the host machine: ldc2 (llvm 1.28.1 "LDC - the LLVM D compiler (1.28.1):")
D linker for the host machine: ldc2 ld.bfd 2.38

And from ldc logs:

-- The C compiler identification is GNU 11.2.1
-- The CXX compiler identification is GNU 11.2.1
....
-- Found host D compiler /usr/bin/gdmd, with default flags ''
-- Host D compiler ID: GDMD
-- Host D compiler version: gdc (Alpine 11.2.1_git20220117) 11.2.1 20220117
-- Host D compiler front-end version: 2076
-- LLVM_NATIVE_ARCH: AArch64
-- Found LLVM: /usr/lib/llvm12 (found suitable version "12.0.1", minimum required is "6.0") 

I understand these are actually the C and D default compilers.

Also, all of those link against musl :)

@ximion
Copy link
Owner

ximion commented Apr 25, 2022

If you also use GCC for the C code, then that's very uncontroversial, that would be what pretty much everyone uses - except for musl, of course.
I would expect a different libc to cause a wide variety of bugs, but not this one - the code that generates the AppStream XML output also is 100% C, so no D here.
Can you extract the d-feet archive into a directory and run something like appstreamcli compose --media-baseurl='https://example.org' --media-dir=/tmp/ascmedia --result-root=/tmp/ascresult /path/to/extracted/data and see what output you get in /tmp/ascresult?

@pabloyoyoista
Copy link
Contributor

Well, it actually looks like the problem is in the C code-path:

/cache # appstreamcli compose --media-baseurl='https://example.org' --media-dir=/tmp/ascmedia --result-root=/tmp/ascresult /cache/d-feet/
WARNING: Metadata origin not set, using 'example'
Processing directory: /cache/d-feet/
Composing metadata...
Success!

/cache # ls -lR /tmp/ascresult/
/tmp/ascresult/:
total 1
drwxr-xr-x    3 root     root             4 Apr 26 10:42 usr

/tmp/ascresult/usr:
total 1
drwxr-xr-x    3 root     root             4 Apr 26 10:42 share

/tmp/ascresult/usr/share:
total 1
drwxr-xr-x    4 root     root             5 Apr 26 10:43 swcatalog

/tmp/ascresult/usr/share/swcatalog:
total 2
drwxr-xr-x    3 root     root             4 Apr 26 10:42 icons
drwxr-xr-x    2 root     root             4 Apr 26 10:43 xml

/tmp/ascresult/usr/share/swcatalog/icons:
total 9
drwxr-xr-x    5 root     root             6 Apr 26 10:42 example

/tmp/ascresult/usr/share/swcatalog/icons/example:
total 3
drwxr-xr-x    2 root     root             4 Apr 26 10:42 128x128
drwxr-xr-x    2 root     root             4 Apr 26 10:42 48x48
drwxr-xr-x    2 root     root             4 Apr 26 10:42 64x64

/tmp/ascresult/usr/share/swcatalog/icons/example/128x128:
total 9
-rw-r--r--    1 root     root          5439 Apr 26 10:42 org.gnome.dfeet.desktop.png

/tmp/ascresult/usr/share/swcatalog/icons/example/48x48:
total 5
-rw-r--r--    1 root     root          2005 Apr 26 10:42 org.gnome.dfeet.desktop.png

/tmp/ascresult/usr/share/swcatalog/icons/example/64x64:
total 5
-rw-r--r--    1 root     root          2497 Apr 26 10:42 org.gnome.dfeet.desktop.png

/tmp/ascresult/usr/share/swcatalog/xml:
total 9
-rw-r--r--    1 root     root          5585 Apr 26 10:43 example.xml.gz

The example.xml.gz actually has the wrong screenshot entry listed.

example.xml.gz

@ximion
Copy link
Owner

ximion commented Apr 26, 2022

That's what I expected. Still not sure what the cause of this issue is, but this will simplify debugging greatly :-)

@pabloyoyoista
Copy link
Contributor

I've finally had the time and the energy to step by step dig into this with GDB, and it turns out the problem is not with musl, but with locales and maybe a bug in this project. What is currently happening in my alpine system with LANG=C.UTF-8 in the environment. Note that the container in which we are running the generator is quite minimal and does not even have a locale setup, as far as I know. I will follow my example using the suggested appstreamcli command and d-feet as an example, which is downloadable here and extractable with tar -xf d-feet-0.3.16-r1.apk.

  • First compose does the processing of the screenshots images in asc_process_screenshot_images:
    • Note, that the AsScreenshot already has one image coming from its metainfo.
    • That image is read and downloaded successfully to create the screenshot store and its correspondent local image is created and inserted into the images.
    • At this point, my intuition told me that there was something going wrong, since the previous original screenshot already existed in the list of images. However, an as_screenshot_get_images(scr) reported a GPtrArray of length one. Upon inspection of the AsScreenshot, I figured out that the original "remote" image, was only stored in priv->images, whereas the downloaded image existed both in priv->images and priv->images_lang.
    • The function continues and generates all the thumbnails. At this stage, priv->images_lang returned by as_screenshot_get_images(scr) has 5 images, reportedly the correct ones, whereas priv->images returned by as_screenshot_get_images_all(scr) has 6 images, reportedly the problem we are seeing in this issue.
  • After the processing, the screenshots do not seem to be modified, and one arrives at as_screenshot_to_xml_node
    • Such function iterates through priv->images here, giving out the 6 images we see in alpine.

A simple fix would be to use as_screenshot_get_images in as_screenshot_to_xml_node, but I am not sure which is the expected behavior or how the images_lang is supposed to work. @ximion, you say if this is the correct fix or there is something that we must change in the locales and retry. I have not tried to reproduce with my proposed fix, although the GDB output is pretty clear at this point :)

Thread 1 "appstreamcli-co" hit Breakpoint 10, as_screenshot_to_xml_node (screenshot=0x7ffff6a32680 [AsScreenshot], ctx=ctx@entry=0x7ffff6d59500 [AsContext], root=root@entry=0x7ffff62fc690) at ../src/as-screenshot.c:573
573     ../src/as-screenshot.c: No such file or directory.
(gdb) n
56      in ../src/as-screenshot.c
(gdb) n
573     in ../src/as-screenshot.c
(gdb) n
576     in ../src/as-screenshot.c
(gdb) n
577     in ../src/as-screenshot.c
(gdb) p priv->images
$72 = (GPtrArray *) 0x7ffff6d59b80
(gdb) p *priv->images
$73 = {pdata = 0x7ffff5fa5a70, len = 6}
(gdb) p *priv->images_lang
$74 = {pdata = 0x7ffff5f6e0a0, len = 5}

ximion added a commit that referenced this issue May 13, 2022
This helps #395, but the actual proper fix is to implement handling of
screenshot localization in compose properly, as the current code is
completely oblivious to that.
In addition to that change, language compatibility checks for the C
locales also don't seem to work as expected on Alpine, which needs
further investigation as well.
@ximion
Copy link
Owner

ximion commented May 13, 2022

Oh no! Thank you for the debugging effort! This is a direct cause of the localized-screenshots support having been bolted on at a later stage, with older code now doing the wrong thing... On its own that would be a bug, but there also appears to be a second issue here with AppStream not considering the "C" locale compatible with "C.UTF-8" on Alpine in as_utils_locale_is_compatible.
Not sure what that is about yet, but the compose code needs to be made aware of localized screenshots. I will work on that next week (I can't squeeze this in right now...).

In the meanwhile, you could maybe try the patch above, which, if I understand the problem correctly, should at least prevent the bad data generation for now (fixing the root cause will take a bit more effort though).

@pabloyoyoista
Copy link
Contributor

That certainly looks like it will certainly fix the issue. I'll test it this weekend locally, and if it works, push the fix to alpine.

there also appears to be a second issue here with AppStream not considering the "C" locale compatible with "C.UTF-8" on Alpine in as_utils_locale_is_compatible

Is this something you would need us to debug too? Some specific information on what is the expected behavior and what is that the localization is messing up?

@pabloyoyoista
Copy link
Contributor

And BTW, thanks a lot for the very fast fix!! :D

@pabloyoyoista
Copy link
Contributor

I can confirm the issue is gone locally!! Will proceed to push it to alpine and to test it in the generator container. Thank you very, very much for your help!!

@pabloyoyoista
Copy link
Contributor

I can confirm this is gone in the container after dropping the caches :) Feel free to close this if you consider it solved!

@minlexx
Copy link
Author

minlexx commented May 19, 2022

I can confirm this is gone in the container after dropping the caches :) Feel free to close this if you consider it solved!

Is there a new data generated somewhere already, to test?

@pabloyoyoista
Copy link
Contributor

Is there a new data generated somewhere already, to test?

The full run is in process. You'll find it updated in my testing machine in a couple of hours: https://appstream.pcorreag.tk/data/

@ximion
Copy link
Owner

ximion commented May 19, 2022

I can confirm this is gone in the container after dropping the caches :) Feel free to close this if you consider it solved!

This needs two more things to be properly fixed :-) Not sure yet what the locale issue is in Alpine, but I'll add a unit test for that so we can investigate that further (and maybe doing so will also yield some insight already).
The localized screenshots stuff also needs to be implemented. All of this will for sure be in the next release.

ximion added a commit that referenced this issue May 22, 2022
@ximion
Copy link
Owner

ximion commented May 22, 2022

This should work now, but keep an eye out if localization of AppStream data works well on Alpine clients!
I am not 100% sure that that issue is fully addressed as well (this one would definitely be Alpine-specific though).

@ximion ximion closed this as completed May 22, 2022
@pabloyoyoista
Copy link
Contributor

pabloyoyoista commented May 22, 2022

Wonderful, thank you very much! We'll see if we bump into more issues. But it looks like the code handles well the circumstance where there is no locale environment variable set. So it should be all fine :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compose Affects appstream-compose
Projects
None yet
Development

No branches or pull requests

3 participants