@daniel-boehme
Copy link

Investigating why KDE Discover crashes constantly on startup, I could trace it back to an XML appstream file, that contains a component without an id tag.
While this is violating the spec, it still should not cause a segmentation fault.

Version: appstream-0.14.5-1.fc34
After a quick look at master it seems like the issue still exists.

Backtrace of the segmentation fault:

#0  g_str_hash (v=0x0) at ../glib/ghash.c:2333
#1  0x00007ffff4501e08 in g_hash_table_lookup_node (hash_return=<synthetic pointer>, key=0x0, hash_table=0x7fff700019e0) at ../glib/ghash.c:472
#2  g_hash_table_insert_internal (keep_new_key=1, value=0x0, key=0x0, hash_table=0x7fff700019e0) at ../glib/ghash.c:1598
#3  g_hash_table_add (hash_table=0x7fff700019e0, key=0x0) at ../glib/ghash.c:1689
#4  0x00007ffff3e79ae1 in as_cache_insert (cache=0x7fff900283d0, cpt=0x7fff3119bf90, error=0x7fff7f7fd710) at ../src/as-cache.c:1156
#5  0x00007ffff3e9498e in as_pool_insert (pool=<optimized out>, cpt=0x7fff3119bf90, error=0x7fff7f7fd710) at ../src/as-pool.c:324
#6  0x00007ffff3e94dec in as_pool_add_component_internal (pool=0x7fff8800eb40, cpt=0x7fff3119bf90, pedantic_noadd=1, error=0x7fff7f7fd710) at ../src/as-pool.c:447
#7  0x00007ffff3e9583f in as_pool_load_collection_data (pool=0x7fff8800eb40, refresh=<optimized out>, error=0x7fff7f7fd788) at ../src/as-pool.c:980
#8  0x00007ffff3e9da8f in as_pool_refresh_system_cache (pool=0x7fff8800eb40, user=1, force=<optimized out>, error=0x7fff7f7fd8b0) at ../src/as-pool.c:2071
#9  0x00007ffff3e95c48 in as_pool_load_collection_data (pool=0x7fff9000af00, refresh=0, error=0x7fff7f7fd9f8) at ../src/as-pool.c:799
#10 0x00007ffff3e96f04 in as_pool_load (pool=0x7fff9000af00, cancellable=<optimized out>, error=0x7fff7f7fd9f8) at ../src/as-pool.c:1333
#11 0x00007ffff5482f9d in AppStream::Pool::load (this=this@entry=0x5555560e05e0) at ../qt/pool.cpp:72
#12 0x00007fffa068e9d2 in loadAppStream (appdata=0x5555560e05e0) at /usr/src/debug/plasma-discover-5.22.5-1.fc34.x86_64/libdiscover/backends/PackageKitBackend/PackageKitBackend.cpp:166
#13 0x00007fffa06947cb in QtConcurrent::StoredFunctorCall1<DelayedAppStreamLoad, DelayedAppStreamLoad (*)(AppStream::Pool*), AppStream::Pool*>::runFunctor (this=0x555556553dc0)
    at /usr/include/qt5/QtConcurrent/qtconcurrentstoredfunctioncall.h:422
#14 0x00007fffa06955a3 in QtConcurrent::RunFunctionTask<DelayedAppStreamLoad>::run (this=0x555556553dc0) at /usr/include/qt5/QtConcurrent/qtconcurrentrunbase.h:108
#15 0x00007ffff5e495d0 in QThreadPoolThread::run (this=0x55555644eba0) at thread/qthreadpool.cpp:100
#16 0x00007ffff5e464c6 in QThreadPrivate::start (arg=0x55555644eba0) at thread/qthread_unix.cpp:329
#17 0x00007ffff4e99299 in start_thread () from /lib64/libpthread.so.0
#18 0x00007ffff5a6f353 in clone () from /lib64/libc.so.6

It seems like the code loading the component (as_metadata_parse_file(), as_metadata_parse_data(), as_metadata_xml_parse_components_node(), as_metadata_xml_parse_components_node()) does not verify if an id is present, and then later it is just assumed that it exists.

I could imagine the check for an id existing at the end of as_component_load_from_xml(), but I don't know the code well enough to judge if there is a better, more central place for that.