Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mke2fs: the -d option can now handle tarball input #118

Merged
merged 1 commit into from Apr 21, 2024

Conversation

josch
Copy link
Contributor

@josch josch commented Jul 22, 2022

This is an independent implementation that does the same thing as #107. Unfortunately I wasn't aware that e2fsprogs had a github repository (http://e2fsprogs.sourceforge.net/ only mentions the kernel.org repo). Differences of this approach compared to #107 are:

  • no hard dependency on libarchive via some #ifdefs
  • re-uses the -d option instead of adding a new option
  • supports reading the archive from stdin using -
  • comes with several test cases checking that
    • a roundtrip from tar to ext4 to tar produces bit-by-bit identical results
    • extended attributes work
    • gnu tar and pax work
    • character devices and symlinks work

Just as @russdill I'm looking for comments on this. Thanks!

@josch
Copy link
Contributor Author

josch commented Jul 23, 2022

Use dlopen instead of linking against libarchive to keep runtime dependencies minimal as requested in #107 (comment)

@josch
Copy link
Contributor Author

josch commented Sep 19, 2022

Hi, @tytso, could you give some feedback for this pull request or for #107 so that I can work on fixing remaining issues? Thanks!

@tytso
Copy link
Owner

tytso commented Feb 5, 2023

Sorry for the delay in looking at this pull request. Things have been pretty busy. So some comments about this change.

First of all, when I tried doing a trial merge, I got test failures for the m_rootgnutar test. There are a lot of failures of the form:

dump_file: Operation not permitted while changing ownership of test.img.dir/progs/hold_inode.c
dump_file: Operation not permitted while changing ownership of test.img.dir/progs/random_exercise.c
....

Secondly, please test to make sure that (a) e2fsprogs builds with and without libarchive installed. One of the really nasty things about configure scripts is that very often, people are careless about them, and the autoconf scripts slow down the build process (since you have to run configure script, which takes time), but it doesn't buy you anything in terms of portability if the build crashes and burns if the build environment does not exactly match the developer's. If it's going to be non-portable, why waste time with the configure script? Secondly, after fixing the configure script, we then find that "make check" fails all of the three new tests, since m_rootgnutar, m_rootpaxtar, and m_roottar are written assuming that the libarchive functionality is present.

While I do appreciate adding the dlopen() support, one question that comes to mind is how stable is the libarchive ABI. Given that the the SOVERSION is up to 13, it causes me to wonder whether it's like openssl, where the ABI is so badly designed that a SOBUMP is needed at essentially every single release. Also, the man page needs to warn that a particular bit of functionality might not be present --- either because at compile-time, the archive.h header file isn't there, in which case the functionality will never be present, or at run-time, because the shared library for libarchive isn't -present, in which case presumably we should get a nice warning message a file was passed to the -d option, but we can't handle it because libarchive isn't available --- and not a confusing error message such as:

__populate_fs: Not a directory while changing working directory to "test.img.tar"

@josch josch force-pushed the libarchive branch 3 times, most recently from 8980abb to 701d783 Compare April 2, 2023 15:27
@josch
Copy link
Contributor Author

josch commented Apr 2, 2023

Sorry for the delay in looking at this pull request. Things have been pretty busy.

No problem, same over here. :)

So some comments about this change.

Thank you for your review!

First of all, when I tried doing a trial merge, I got test failures for the m_rootgnutar test. There are a lot of failures of the form:

dump_file: Operation not permitted while changing ownership of test.img.dir/progs/hold_inode.c
dump_file: Operation not permitted while changing ownership of test.img.dir/progs/random_exercise.c
....

This should be fixed now.

Secondly, please test to make sure that (a) e2fsprogs builds with and without libarchive installed. One of the really nasty things about configure scripts is that very often, people are careless about them, and the autoconf scripts slow down the build process (since you have to run configure script, which takes time), but it doesn't buy you anything in terms of portability if the build crashes and burns if the build environment does not exactly match the developer's. If it's going to be non-portable, why waste time with the configure script?

I verified that e2fsprog builds with and without libarchive installed.

Secondly, after fixing the configure script, we then find that "make check" fails all of the three new tests, since m_rootgnutar, m_rootpaxtar, and m_roottar are written assuming that the libarchive functionality is present.

Fixed now as well.

While I do appreciate adding the dlopen() support, one question that comes to mind is how stable is the libarchive ABI. Given that the the SOVERSION is up to 13, it causes me to wonder whether it's like openssl, where the ABI is so badly designed that a SOBUMP is needed at essentially every single release.

The last SOVERSION bump happened 10 years ago. I asked about the ABI stability here but got no reply yet: libarchive/libarchive#1854

Also, the man page needs to warn that a particular bit of functionality might not be present --- either because at compile-time, the archive.h header file isn't there, in which case the functionality will never be present, or at run-time, because the shared library for libarchive isn't -present,

I added this information to the man page.

in which case presumably we should get a nice warning message a file was passed to the -d option, but we can't handle it because libarchive isn't available and not a confusing error message such as:

__populate_fs: Not a directory while changing working directory to "test.img.tar"

I was unable to trigger this error. Do you remember what you did to get it? In any case, I changed some of the code paths surrounding __populate_fs so this might not get generated anymore.

@josch
Copy link
Contributor Author

josch commented Jun 20, 2023

I'm submitted my patch to the linux-ext4 list to get more feedback: https://lore.kernel.org/linux-ext4/20230620121641.469078-1-josch@mister-muffin.de/

@guillemj
Copy link

While I do appreciate adding the dlopen() support, one question that comes to mind is how stable is the libarchive ABI. Given that the the SOVERSION is up to 13, it causes me to wonder whether it's like openssl, where the ABI is so badly designed that a SOBUMP is needed at essentially every single release.

In general I think dlopen()ing a specific shared library via its full SONAME from an external project where there is no tight coordination going on is the wrong approach because it's crossing an ABI boundary with no checks in place, which requires manual patching when the SONAME gets bumped even if the used symbols have not seen any ABI or ABI change, and the code will not automatically pick up ABI changes that are API compatible. I think a better solution is to create a shared object "module" or "plugin" exposing the functions that you need that links against the libarchive library as usual, which will mean the external project can always be used safely, or the compiled will just barf, and this module within the project boundaries can be safely dlopen()ed if present. Then when packaging this module can be split into its own package and installed if desired f.ex, which will also make it possible to automatically pick up the required dependencies w/o having to hardcode those as well.

@josch josch force-pushed the libarchive branch 2 times, most recently from 17fb226 to e1a8915 Compare August 15, 2023 18:53
Copy link

@evelikov evelikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The libarchive ABI has been fairly stable for nearly a decade now and I would be in favour of seeing e2fsprogs make use of it.

As mentioned inline - I would suggest sticking to dlopen, in part because forcing extra deps isn't fun. And in part because the cmake libarchive build produces a different soname - see this bug.

The cmake build does not produce a pkg-config file, which in part why I would suggest using the PKG_CHECK_MODULES macro.

HTH o/

ARCHIVE_LIB=$DLOPEN_LIB
fi
AC_SUBST(ARCHIVE_LIB)
dnl

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would swap the above check for a simple PKG_CHECK_MODULES(ARCHIVE, [libarchive >= the-version-whose-api-is-used.

Since this is an optional feature, I would only have the dlopen path. Using linking and forcing bunch of runtime dependencies onto people who won't use the feature seems moot IMHO.

/* 64KiB is the minimum blksize to best minimize system call overhead. */
//#define COPY_FILE_BUFLEN 65536
//#define COPY_FILE_BUFLEN 1048576
#define COPY_FILE_BUFLEN 16777216

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our of curiosity: where does the 16MiB come from? It's few orders of magnitude larger than the minimum 64KiB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth adding an inline comment/reference IMHO.

@richardweinberger
Copy link

FWIW, I really would love to see this feature merged.

@josch
Copy link
Contributor Author

josch commented Jan 9, 2024

Hello and happy new year! 🙂

I just rebased this branch on top of master. If there is anything I can do to move this forward, @tytso, please do tell. Thank you!

@josch
Copy link
Contributor Author

josch commented Jan 9, 2024

The github workflow seems to fail for macos and android but the failures do not seem to be due to the changes of this merge request. In fact on android, the relevant tests are successfully skipped:

2024-01-09T11:58:40.7151560Z m_rootgnutar: create fs image from GNU tarball: skipped (no libarchive)
2024-01-09T11:58:40.7473800Z m_roottar: skipped (no libarchive)
2024-01-09T11:58:40.7475090Z m_rootpaxtar: skipped (no libarchive)

@h01ger
Copy link

h01ger commented Feb 27, 2024

hey, any update on this? being able to create reproducible ext3/4 filesystems would be really great.

@tytso
Copy link
Owner

tytso commented Apr 17, 2024

Apologies for the delay in reviewing it. Things have just been really crazy. I am in the middle of reviewing the patch right now. One thing which I've noticed is that the test m_rootgnutar assumes that "tar" is the GNU tar. This might not be true on non-Linux platforms for which e2fprogs is built (e.g., *BSD, MacOS, etc.). So it might be that we need a mkgnutar.pl ala the m_roottar and m_rootpaxtar tests.

@josch
Copy link
Contributor Author

josch commented Apr 17, 2024

Thanks a lot for reviewing my code! 🥳 I'm currently in the middle of switching jobs, so I cannot promise you until when I can cook up a mkgnutar.pl. Maybe a quicker fix would be to check for gnu tar and skip the test if it's not available for now?

EDIT

Nevermind. Replicating what GNU tar does and creating bit-by-bit identical tarballs with a Perl script was easy enough. @tytso do you want me to throw out GNU tar completely or do you think that there is value to have two tests: one which is only run if GNU tar is available and one which does the same thing but with a Perl fake tar.

@josch
Copy link
Contributor Author

josch commented Apr 18, 2024

I now replaced all calls to tar in m_rootgnutar by mkgnutar.pl. I didn't rebase on master yet, so that you can easily compare the changes of my last force-push. In the process I also found a bug in the GNU tar version, hence the changes to the expect file.

If archive.h is available during compilation, enable mke2fs to read a
tarball as input. Since libarchive.so.13 is opened with dlopen,
libarchive is not a hard library dependency of the resulting binary.

In comparison with feeding a directory tree to mke2fs via -d this has
the following advantages:

 - no superuser privileges, nor fakeroot, nor unshared user namespaces
   are needed to create filesystems with arbitrary ownership information
   and special files like device nodes which otherwise require being root

 - by reading a tarball from standard input, no temporary files need to
   be written out first as mke2fs can be used as part of a shell pipeline
   which reduces disk usage and makes the conversion independent of the
   underlying file system

A round-trip from tarball to ext4 to tarball yields bit-by-bit identical
results

Signed-off-by: Johannes Schauer Marin Rodrigues <josch@mister-muffin.de>
@tytso
Copy link
Owner

tytso commented Apr 19, 2024

OK, things looks good. I've done a trial merge with the next branch, and the CI passes clean:

https://github.com/tytso/e2fsprogs/actions/runs/8743318976

Do you have any other changes you want to make? If not, I can just take the merge at https://github.com/tytso/e2fsprogs/tree/josch-libarchive

@josch
Copy link
Contributor Author

josch commented Apr 19, 2024

If there are more things that I want to change/fix, you will probably receive pull requests after the first e2fsprogs version with this feature lands in Debian unstable and I start using this functionality in all the places I plan to us it (like mmdebstrap, debvm, or the MNT Reform system image tooling).

If there are any bugs related to this functionality, please do not hesitate to ping me about it either via github or via josch@debian.org.

Thank you!! ❤️

@tytso
Copy link
Owner

tytso commented Apr 21, 2024

I've merged this on the next branch, and have noted a potential portability problem with the m_root*tar test scripts when I tried doing a test build on FreeBSD 14. Namely, the -c option to stat is something which is only in the GNU coreutils implementation. It is not in the version of stat found in the *BSD userspace, which is also going to be the version found in MacOS/Darwin (because it is based on FreeBSD's userspace). Fortunately, MacOS apparently doesn't ship with libarchive, so it's not breaking the CI test on github. Since the test scripts are using perl, probably what needs to happen is implement "stat -c %Y" in perl.

stat: illegal option -- c
usage: stat [-FLnq] [-f format | -l | -r | -s | -x] [-t timefmt] [file|handle ...]
Value "" invalid for option mtime (number expected)
Use of uninitialized value $mtime in sprintf at m_rootgnutar/mkgnutar.pl line 73.
Use of uninitialized value $mtime in sprintf at m_rootgnutar/mkgnutar.pl line 73.
Use of uninitialized value $mtime in sprintf at m_rootgnutar/mkgnutar.pl line 73.

If you have time to fix this, that would be great. If not, I'm sure that once I do an e2fsprogs release, the FreeBSD ports maintainer for e2fsprogs will send a patcch fairly quickly once it is released. :-)

@josch
Copy link
Contributor Author

josch commented Apr 21, 2024

Since the test scripts are using perl, probably what needs to happen is implement "stat -c %Y" in perl.

$ stat -c %Y testfile
1706820340
$ perl -e 'print((stat ($ARGV[0]))[9])' testfile
1706820340

I can also send you a full patch but only later as it is currently Sunday morning over here. Thank you for notifying me about this and thank you for merging! :)

EDIT: sent you a patch by mail

@tytso tytso merged commit ddf8d89 into tytso:master Apr 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants