
xz-after-bootstrap: init at 5.6.4 & add to release blockers #391569

Draft: wants to merge 1 commit into master
Conversation

JulienMalka
Member

Opening this PR as a draft because it is more of a proof-of-concept design to trigger community feedback than a patch destined to be merged as is. At the very least, the comparePhase should be generalized to check all outputs.

This PR showcases a protection option that helps build trust in tarballs used to build software in our stdenv. In particular, with this kind of protection we avoid software supply chain attacks built on malicious maintainer-provided tarballs, such as the xz backdoor.

The idea here is to rebuild xz after the bootstrap from trusted sources. Given that xz is reproducible, if the rebuild doesn't produce the same result as the xz built during the bootstrap, something is wrong.
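The rebuild-and-diff check can be sketched in plain shell, outside of Nix. Everything below is a hypothetical stand-in: the temporary directories and file contents are placeholders, not real store paths or real xz outputs.

```shell
# Placeholder trees standing in for the bootstrap-built xz output
# and the rebuild from the upstream Git sources.
bootstrap_xz=$(mktemp -d)
rebuilt_xz=$(mktemp -d)
mkdir -p "$bootstrap_xz/lib" "$rebuilt_xz/lib"
printf 'identical bytes' > "$bootstrap_xz/lib/liblzma.so"
printf 'identical bytes' > "$rebuilt_xz/lib/liblzma.so"

# If xz is reproducible, the two trees should be byte-for-byte identical;
# any difference means the two sources are not semantically identical.
if diff -r "$bootstrap_xz" "$rebuilt_xz" > /dev/null; then
  result=MATCH
else
  result=MISMATCH
fi
echo "$result"
```

In the actual PR this comparison runs inside a build phase, so a mismatch fails the derivation rather than printing a result.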

I discuss this proposition and alternatives at great length in this blog post.

@JulienMalka JulienMalka marked this pull request as draft March 20, 2025 13:31
@github-actions github-actions bot added the 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS label Mar 20, 2025
The idea here is to rebuild xz after the bootstrap from trusted
sources. Given that xz is reproducible, if it doesn't produce the same
results as the xz built during the bootstrap, something is wrong.
Member

@getchoo getchoo left a comment


First off, thanks for your blog post (and your previous one!). I can't say I agree with everything in it, but it was a good read during lunch and it makes me happy to see discussions about this kind of stuff. Keep it up, it's awesome :)

I see a few issues here though:

  • This implementation is limited to just xz

    • At the very least, verifying the output between two sources should work across multiple packages. I don't think it's worth only implementing here since xz just so happens to be the package that was compromised (somewhat) recently; this could happen to anything in the stdenv (or Nixpkgs), and should thus be re-usable
    • The previous work you linked in your post had a much better approach in this aspect
  • This is built on the assumption that differing binaries from differing sources is a problem

    • In the case of xz, the maintainer provided tarball and auto generated one should currently result in the same binary, but this may not be true forever, malicious intent or not
    • Again, as mentioned in your post, some maintainers do regularly prefer to generate their own tarballs -- like curl for example. These tarballs may have completely different (non-malicious) files from the auto-generated equivalent, and differing outputs should regularly be expected
  • This can block channels for benign differences

    • Again, different sources may produce different outputs for totally innocent reasons. Blocking a channel over this doesn't make much sense IMO
    • Reproducibility problems in the base package may also exist, triggering the above
    • There's no way to "allow" certain differences between the binaries in these cases (sorta like how we can skip some tests in actual packages)
  • This assumes trust in auto-generated artifacts that could still be manipulated by bad actors

    • From how I see it, the threat model of this solution is to protect against compromised maintainers with release perms
    • In this situation, how can we guarantee that something malicious wasn't checked into the upstream Git repository and is now present in the generated tarball? What if only some parts of a malicious payload are in the Git repo, and the rest are put into the maintainer-provided one so as to attract less attention in the source and binary diffs?
    • This is far from impossible too, with university students being able to get hypocrite commits through even Linux kernel review. One could only imagine this type of attack from actors trying to cause real harm
    • Generally, I feel this approach is specifically targeting the xz situation rather than trying to deal with the broader potential of malicious/compromised maintainers. It could help in some scenarios still, but I don't think it would really "catch" much unless the same exact situation happened again (unlikely)
  • This isn't the only solution

    • From your post (last time I swear!), you mention two solutions in better trusting these kinds of sources: the implementation here, and comparing the sources directly
    • Naturally, inspecting source code differences is (almost always) much more reliable than diffing binaries directly. Malicious code being injected into a function for example is much easier to find when comparing your source compared to attempting to decompile the binary -- new symbols being introduced isn't the only attack vector after all
    • There are some tradeoffs of course, like source diffing being harder to automate and Nixpkgs maybe not being the best place for this kind of thing, but I would prefer it to many potential false positives and (manual) binary diffing in general. It could also be applied to the ecosystem at large rather than keeping it pseudo-exclusive here in Nixpkgs -- maybe something like Google's oss-fuzz?

TL;DR: I don't think this is the best approach to take, as it's quite specific to xz and its past "situation", both in the current implementation and, to a degree, in concept

Trying to track potential compromises like this may also be better off in the wider ecosystem, as it's much larger than any one repository and would probably benefit a lot from more focused attention

Contributor

@doronbehar doronbehar left a comment


Thanks for this contribution! Very much enjoyed reading the blog post.

@@ -99,6 +99,7 @@ rec {
mariadb
nginx
nodejs
xz-after-bootstrap
Contributor


Naturally, I wouldn't define this package globally, but only here via callPackage.

Member Author


Makes complete sense, yes.

Comment on lines +58 to +84
meta = with lib; {
changelog = "https://github.com/tukaani-project/xz/releases/tag/v${finalAttrs.version}";
description = "General-purpose data compression software, successor of LZMA";
homepage = "https://tukaani.org/xz/";
longDescription = ''
XZ Utils is free general-purpose data compression software with high
compression ratio. XZ Utils were written for POSIX-like systems,
but also work on some not-so-POSIX systems. XZ Utils are the
successor to LZMA Utils.

The core of the XZ Utils compression code is based on LZMA SDK, but
it has been modified quite a lot to be suitable for XZ Utils. The
primary compression algorithm is currently LZMA2, which is used
inside the .xz container format. With typical files, XZ Utils
create 30 % smaller output than gzip and 15 % smaller output than
bzip2.

Note that this is the version built after the bootstrap of nixpkgs.
'';
license = with licenses; [
gpl2Plus
lgpl21Plus
];
maintainers = with maintainers; [ julienmalka ];
platforms = platforms.all;
pkgConfigModules = [ "liblzma" ];
};
Contributor


Suggested change
meta = with lib; {
changelog = "https://github.com/tukaani-project/xz/releases/tag/v${finalAttrs.version}";
description = "General-purpose data compression software, successor of LZMA";
homepage = "https://tukaani.org/xz/";
longDescription = ''
XZ Utils is free general-purpose data compression software with high
compression ratio. XZ Utils were written for POSIX-like systems,
but also work on some not-so-POSIX systems. XZ Utils are the
successor to LZMA Utils.
The core of the XZ Utils compression code is based on LZMA SDK, but
it has been modified quite a lot to be suitable for XZ Utils. The
primary compression algorithm is currently LZMA2, which is used
inside the .xz container format. With typical files, XZ Utils
create 30 % smaller output than gzip and 15 % smaller output than
bzip2.
Note that this is the version built after the bootstrap of nixpkgs.
'';
license = with licenses; [
gpl2Plus
lgpl21Plus
];
maintainers = with maintainers; [ julienmalka ];
platforms = platforms.all;
pkgConfigModules = [ "liblzma" ];
};
meta = xz.meta // {
maintainers = with lib.maintainers; [ julienmalka ];
};

Comment on lines +14 to +43
stdenv.mkDerivation (finalAttrs: {
pname = "xz";
version = xz.version;

src = fetchFromGitHub {
owner = "tukaani-project";
repo = "xz";
rev = "v${finalAttrs.version}";
hash = "sha256-Xp1uLtQIoOG/qVLpq5D/KFmTOJ0+mNkNclyuJsvlUbE=";
};

strictDeps = true;

outputs = [
"bin"
"dev"
"out"
"man"
"doc"
];

enableParallelBuilding = true;

doCheck = true;

nativeBuildInputs = [
gettext
libtool
automake
autoconf
Contributor


Maybe in general it'd be better to use .overrideAttrs, so the inputs will stay the same, up to auto* tools of course.

Comment on lines +54 to +56
compareArtifacts = ''
diff $out/lib/liblzma.so ${xz.out}/lib/liblzma.so
'';
Contributor


Maybe we can write a Nix function that gets a derivation as a single input, and iterates all .so files in it and compares them to the .so files in $out.. That should be a fairly good starting point in generalizing this.
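As a rough shell sketch of what such a generalized comparison could do, independent of how the eventual Nix function is shaped (the directory layout below is a hypothetical placeholder, not the proposed function):

```shell
# Hypothetical sketch: compare every shared object in a reference output
# against the file at the same relative path in a rebuilt output.
reference=$(mktemp -d)
rebuilt=$(mktemp -d)
mkdir -p "$reference/lib" "$rebuilt/lib"
printf 'same bytes' > "$reference/lib/liblzma.so"
printf 'same bytes' > "$rebuilt/lib/liblzma.so"

status=0
# Assumes paths without whitespace, which holds for store outputs.
for so in $(cd "$reference" && find . -type f -name '*.so*'); do
  if ! cmp -s "$reference/$so" "$rebuilt/$so"; then
    echo "differs: $so"
    status=1
  fi
done
echo "status=$status"
```

A real Nix wrapper would receive the two derivations as arguments and run something like this in a check phase, failing the build when `status` is nonzero.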

Member


I think that ideally we would check all binaries
Could also just diff the $out paths entirely if everything else is reproducible -- though that would probably be unlikely at scale

Member Author


There is also something I ran into in my testing (again, to keep things simple for this discussion, I did not include it): sometimes packages have self-references, and we have to get rid of them before comparing the two versions.
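As a hedged illustration of that normalization step: self-references (the package's own store path baked into its files) can be rewritten to a constant placeholder before diffing. Real builds might use a tool like nuke-refs; plain sed stands in here, and the file contents and store paths are invented for the example.

```shell
# Two pkg-config files embed their own (different) store paths; after
# rewriting any /nix/store/<hash>- prefix to a constant, they compare equal.
workdir=$(mktemp -d)
printf 'prefix=/nix/store/aaaa-xz-5.6.4\n' > "$workdir/a.pc"
printf 'prefix=/nix/store/bbbb-xz-5.6.4\n' > "$workdir/b.pc"

# Normalize self-references to a fixed placeholder path.
sed 's|/nix/store/[a-z0-9]*-|/nix/store/0000-|g' "$workdir/a.pc" > "$workdir/a.norm"
sed 's|/nix/store/[a-z0-9]*-|/nix/store/0000-|g' "$workdir/b.pc" > "$workdir/b.norm"

if diff "$workdir/a.norm" "$workdir/b.norm" > /dev/null; then
  norm_result=MATCH
else
  norm_result=MISMATCH
fi
echo "$norm_result"
```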

postPhases = [ "compareArtifacts" ];

compareArtifacts = ''
diff $out/lib/liblzma.so ${xz.out}/lib/liblzma.so
Member


Suggested change
diff $out/lib/liblzma.so ${xz.out}/lib/liblzma.so
diff $out/lib/liblzma.so ${xz.out}/lib/liblzma${lib.optionalString stdenv.hostPlatform.hasSharedLibraries stdenv.hostPlatform.extensions.sharedLibrary}

Should fix building on Darwin

@JulienMalka
Member Author

JulienMalka commented Mar 20, 2025

First off, thanks for your blog post (and your previous one!). I can't say I agree with everything in it, but it was a good read during lunch and it makes me happy to see discussions about this kind of stuff. Keep it up, it's awesome :)

Thank you!

  • This implementation is limited to just xz

That's true! It's an example though. I suggest doing this kind of thing for every package in the bootstrap that is built from untrusted tarballs. However, I didn't find a less "manual" way to implement this. Given the number of packages that could be targeted, the effort may stay limited though.

  • This is built on the assumption that differing binaries from differing sources is a problem
    • In the case of xz, the maintainer provided tarball and auto generated one should currently result in the same binary, but this may not be true forever, malicious intent or not

    • Again, as mentioned in your post, some maintainers do regularly prefer to generate their own tarballs -- like curl for example. These tarballs may have completely different (non-malicious) files from the auto-generated equivalent, and differing outputs should regularly be expected

Why should differing outputs be expected? Can you provide examples? Indeed, this is built on the assumption that if the two artifacts differ, then we must have built from two sources that are not semantically identical, meaning the tarball build is not semantically equivalent to building from source, and that is probably something to look at. I don't think you have explained why this assumption is not sane.

  • This can block channels for benign differences

Maybe a release blocker is not the best tool for the job. I can see this breaking and being a false positive, at the very least if the software becomes non-reproducible.
Unfortunately, in nixpkgs we don't have anything that triggers a red alarm if it starts failing, apart from release blockers.

  • Reproducibility problems in the base package may also exist, triggering the above

Indeed, this also should only be active on packages that build reproducibly.

  • There's no way to "allow" certain differences between the binaries in these cases (sorta like how we can skip some tests in actual packages)

There is no such thing as an "innocuous" difference between two artifacts. Reasoning about the behavioral differences between two executables is extremely difficult. That is the reason behind the reproducible-builds project.

  • In this situation, how can we guarantee that something malicious wasn't checked into the upstream Git repository and is now present in the generated tarball? What if only some parts of a malicious payload are in the Git repo, and the restare put into the maintainer-provided one as to attract less attention in the source and binary diffs?

This is not about providing a magical solution to all supply chain security issues, this is about providing a mitigation against attacks using malicious tarballs.

  • This isn't the only solution

    • From your post (last time I swear!), you mention two solutions in better trusting these kinds of sources: the implementation here, and comparing the sources directly

    • Naturally, inspecting source code differences is (almost always) much more reliable than diffing binaries directly. Malicious code being injected into a function for example is much easier to find when comparing your source compared to attempting to decompile the binary -- new symbols being introduced isn't the only attack vector after all

I disagree. As Daniel Stenberg explains, it is often the case that the content differs between the source code and the release tarball. On the other hand, if the software is reproducible and the release tarball doesn't contain anything that could influence the build process, both artifacts should be identical.

@getchoo
Member

getchoo commented Mar 21, 2025

Why should differing outputs be expected?

These are different sources that may have different files that can affect the build. Nix is (largely) built on the assumption that with the same inputs, you should get the same outputs; this goes directly against that, so differing outputs should be expected

Can you provide examples?

From a quick test with curl:

curl.overrideAttrs (finalAttrs: {
  src = fetchFromGitHub {
    owner = "curl";
    repo = "curl";
    tag = "curl-${lib.replaceStrings ["."] ["_"] finalAttrs.version}";
    hash = "sha256-Q6vGENvtpxuhehhyMSz/Q6NmJkhJgp3Vccf5H23+rrs=";
  };
})

The build fails outright due to the missing files in the maintainer tarball

error: builder for '/nix/store/5ahxb3cszca6kv1k8jmi37pkza62jnds-curl-8.12.1.drv' failed with exit code 1;
       last 5 log lines:
       > Running phase: unpackPhase
       > unpacking source archive /nix/store/gyv2xv3cd0pnh6fn1asnfxanxw3n0x3i-source
       > source root is source
       > Running phase: patchPhase
       > substitute(): ERROR: file './config.guess' does not exist
       For full logs, run 'nix log /nix/store/5ahxb3cszca6kv1k8jmi37pkza62jnds-curl-8.12.1.drv'.

A project I personally work on -- Prism Launcher -- distributes both the generated source tarball, as well as a custom tarball including our submodules (since GitHub will not include them by default). You can see them here

This setup can make it pretty easy to accidentally include the dependencies we vendor over the system ones, which is basically required for many distributions that don't happen to package the sometimes obscure libraries we use. We have tried our best to avoid this issue by adding proper guards for when our CMake configuration decides to fall back to our vendored dependencies, but this is a problem I've actually had to help other packagers with and still have a small "workaround" for

There are probably many more examples of this, with some being much harder to catch. This is just what first came to mind tonight

Unfortunately, in nixpkgs we don't have anything that triggers a red alarm if it starts failing, apart from release blockers.

Yep, this is something that would probably need some kind of specific/individual monitoring IMO. I have no idea what that would be, but hey 😆

Indeed, this also should only be active on packages that build reproducibly.

There would still be an issue in cases where a package is no longer reproducible, but if it's not a channel blocker or anything, I'm really just nitpicking

There is no such thing as a "innocuous" difference between two artifacts

When the two artifacts are technically built from different sources, there most definitely can be. This is why the Reproducible Builds project explicitly defines "reproducible builds" as (emphasis mine)

A build is reproducible if given the same source code, build environment and build instructions, any party can recreate bit-by-bit identical copies of all specified artifacts.


This is not about providing a magical solution to all supply chain security issues, this is about providing a mitigation against attacks using malicious tarballs.

Of course, and I'll admit I was getting pretty out of scope there with the first instance. The second -- referring to payloads split across the actual Git repo and the maintainer tarball -- is still a bit worrying to me though, as the release tarballs exist as a sort of root of trust with this implementation. How reliable is checking the artifacts of two different sources for the actions of a bad actor in one, when that same bad actor controls both sources?

I disagree. As Daniel Stenberg explains, it is often the case that the content differs between the source code and the release tarball. On the other hand, if the software is reproducible and the release tarball doesn't contain anything that could influence the build process, both artifacts should be identical.

I would put a lot of emphasis on the if here. While reproducibility is common in Nixpkgs, having files that may influence the build seems to be at least somewhat common in release tarballs as well -- with Daniel Stenberg's curl being a small example of this, as shown above

@JulienMalka
Member Author

Why should differing outputs be expected?

These are different sources that may have different files that can affect the build. Nix is (largely) built on the assumption that with the same inputs, you should get the same outputs; this goes directly against that, so differing outputs should be expected

Can you provide examples?

From a quick test with curl:

curl.overrideAttrs (finalAttrs: {
  src = fetchFromGitHub {
    owner = "curl";
    repo = "curl";
    tag = "curl-${lib.replaceStrings ["."] ["_"] finalAttrs.version}";
    hash = "sha256-Q6vGENvtpxuhehhyMSz/Q6NmJkhJgp3Vccf5H23+rrs=";
  };
})

The build fails outright due to the missing files in the maintainer tarball

error: builder for '/nix/store/5ahxb3cszca6kv1k8jmi37pkza62jnds-curl-8.12.1.drv' failed with exit code 1;
       last 5 log lines:
       > Running phase: unpackPhase
       > unpacking source archive /nix/store/gyv2xv3cd0pnh6fn1asnfxanxw3n0x3i-source
       > source root is source
       > Running phase: patchPhase
       > substitute(): ERROR: file './config.guess' does not exist
       For full logs, run 'nix log /nix/store/5ahxb3cszca6kv1k8jmi37pkza62jnds-curl-8.12.1.drv'.

This doesn't really prove your point though, because the build fails rather than producing a different output. For the xz example, I also had to write a different Nix expression, because the build from source needs more steps than the build from the tarball.

A project I personally work on -- Prism Launcher -- distributes both the generated source tarball, as well as a custom tarball including our submodules (since GitHub will not include them by default). You can see them here

This setup can make it pretty easy to accidentally include the dependencies we vendor over the systems, which is basically required for many distributions that don't happen to package the sometimes obscure libraries we use.

IMO, if your tarball embeds code that is not in the original repository, maybe it is not a software supply chain attack, but it's still a red flag and should be vetted appropriately. There is no easy way to differentiate between code that is added for legitimate reasons and code that is added for malicious reasons (apart, maybe, from when they generate the exact same artifacts). In that sense I stand by my proposition.

Of course, and I'll admit I was getting pretty out of scope there with the first instance. The second -- referring to payloads split across the actual Git repo and the maintainer tarball -- is still a bit worrying to me though, as the release tarballs exist as a sort of root of trust with this implementation. How reliable is checking the artifacts of two different sources for the actions of a bad actor in one, when that same bad actor controls both sources?

The idea is to remove the option of including malicious code in the release tarball but not in the original source. This of course doesn't prevent a malicious maintainer from putting malicious code in the VCS, but it probably makes the attack a bit less stealthy, and we have made one more attack vector impossible to use.
