Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault when processing a pax-format tarball #101

Closed
antonio-rojas opened this issue Mar 23, 2022 · 11 comments
Closed

Segfault when processing a pax-format tarball #101

antonio-rojas opened this issue Mar 23, 2022 · 11 comments

Comments

@antonio-rojas
Copy link
Contributor

appstream-generator segfaults when processing a repo database tarball if it is in pax format, like the attached one, with this backtrace:

#0  0x00007ffff6df866f in find_derivation () at /usr/lib/libc.so.6
#1  0x00007ffff6df8f6c in __gconv_find_transform () at /usr/lib/libc.so.6
#2  0x00007ffff6df7bd6 in __gconv_open () at /usr/lib/libc.so.6
#3  0x00007ffff6df7788 in iconv_open () at /usr/lib/libc.so.6
#4  0x00007ffff774762d in create_sconv_object
    (flag=582, current_codepage=<optimized out>, tc=0x7ffff6f8bb60 <_nl_C_codeset> "ANSI_X3.4-1968", fc=0x7ffff777c91a "UTF-8")
    at libarchive/archive_string.c:1267
#5  get_sconv_object.part.0.lto_priv.0
    (a=0x5555558f7dc0, fc=0x7ffff777c91a "UTF-8", tc=0x7ffff6f8bb60 <_nl_C_codeset> "ANSI_X3.4-1968", flag=<optimized out>)
    at libarchive/archive_string.c:1656
#6  0x00007ffff7736d4c in pax_header (in_as=<optimized out>, in_as=<optimized out>, entry=0x5555558f3a2f, tar=0x5555558f8b30, a=0x5555558f8b30)
    at libarchive/archive_read_support_format_tar.c:1696
#7  header_pax_extensions
    (a=a@entry=0x5555558f7dc0, tar=tar@entry=0x5555558f8b30, entry=entry@entry=0x5555558f8620, h=h@entry=0x55555592d140, unconsumed=unconsumed@entry=0x7ffff6db9e60) at libarchive/archive_read_support_format_tar.c:1512
#8  0x00007ffff7734cfb in tar_read_header
    (a=a@entry=0x5555558f7dc0, tar=tar@entry=0x5555558f8b30, entry=entry@entry=0x5555558f8620, unconsumed=unconsumed@entry=0x7ffff6db9e60)
    at libarchive/archive_read_support_format_tar.c:809
#9  0x00007ffff7735b84 in archive_read_format_tar_read_header (a=0x5555558f7dc0, entry=0x5555558f8620)
    at libarchive/archive_read_support_format_tar.c:544
#10 0x00007ffff7706055 in _archive_read_next_header2 (_a=_a@entry=0x5555558f7dc0, entry=0x5555558f8620) at libarchive/archive_read.c:647
#11 0x00007ffff7706173 in _archive_read_next_header (_a=0x5555558f7dc0, entryp=0x7ffff6db9fd8) at libarchive/archive_read.c:685
#12 0x00005555557ed8c3 in _D5asgen8zarchive19ArchiveDecompressor4readMFZ9__lambda1MFZv () at ../appstream-generator-0.8.7/src/asgen/zarchive.d:404
#13 0x00007ffff71a8c0b in fiber_entryPoint () at /usr/lib/libdruntime-ldc-shared.so.98

Although the backtrace points to a libarchive crash, I am not able to reproduce this by running the same code appstream-generator runs in a separate C program. This works flawlessly on the attached file:

#include <stdlib.h>
#include <archive.h>
#include <archive_entry.h>

int main() {
  struct archive *a;
  struct archive_entry *entry;
  int r;

  a = archive_read_new();
  archive_read_support_filter_all(a);
  archive_read_support_format_all(a);
  r = archive_read_open_filename(a, "core.files.tar.gz", 65536);
  if (r != ARCHIVE_OK)
    exit(1);
  while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
    const char* pathname = archive_entry_pathname(entry);
    printf("%s\n", pathname);
    archive_read_data_skip(a);
  }
  r = archive_read_free(a);
}

core.files.tar.gz

@ximion
Copy link
Owner

ximion commented Mar 26, 2022

Weird, this absolutely worked before!
I'll try to reproduce this.... Thanks for the sample tarball!

@ximion
Copy link
Owner

ximion commented Mar 26, 2022

Weird, I can't reproduce this at all with the file you attached. My libarchive version is 3.5.2, asgen was compiled with LDC 1.28.0, are you using different versions?

@antonio-rojas
Copy link
Contributor Author

Using ldc 1.28.1 and libarchive 3.6.0, but downgrading doesn't make any difference

@antonio-rojas
Copy link
Contributor Author

Weird, I can't reproduce this at all with the file you attached. My libarchive version is 3.5.2, asgen was compiled with LDC 1.28.0, are you using different versions?

Really weird indeed: I have tested the binaries from Fedora and Debian, including their respective shared libraries down to glibc, and I can always reproduce it.

@antonio-rojas
Copy link
Contributor Author

antonio-rojas commented Mar 27, 2022

Recompiling libarchive without optimization fixes this. Still, I've found no way to reproduce it with pure libarchive code.

Edit: Nevermind, removing optimization actually breaks iconv detection, so libarchive was compiled without the codepath that causes the crash

@heftig
Copy link

heftig commented Mar 27, 2022

This seems to be stack exhaustion. This could fix it:

diff -u -r a/src/asgen/zarchive.d a2/src/asgen/zarchive.d
--- a/src/asgen/zarchive.d	2022-02-22 18:16:54.000000000 +0100
+++ a2/src/asgen/zarchive.d	2022-03-27 20:51:22.390736900 +0200
@@ -436,7 +436,7 @@
                 aentry.data = this.readEntry (ar);
                 yield (aentry);
             }
-        });
+        }, 65536);
 
         return gen;
     }

@ximion
Copy link
Owner

ximion commented Mar 27, 2022

@heftig That would make quite a bit of sense! @antonio-rojas can you test this?

@antonio-rojas
Copy link
Contributor Author

Can confirm the fix, thanks!

@ximion
Copy link
Owner

ximion commented Mar 27, 2022

Brrr, eww! That's an ugly trap set for D's Fibers, but at least it's documented... @heftig do you want to submit a PR to fix this, or should I include a patch? (The thing that I might want to change is storing the stack size in a constant on top of the function and add a comment about this behavior there, so it's clear what the argument does and we have a future reference).

@heftig
Copy link

heftig commented Mar 27, 2022

Do what you think is best. This was my first time touching D code.

@ximion ximion closed this as completed in c5d28f2 Mar 27, 2022
@ximion
Copy link
Owner

ximion commented Mar 27, 2022

I can imagine many issues in D code that would be a lot easier to find and easier to debug when starting with the language than this one!
Thank you for having a look and finding the issue! :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants