xz-after-bootstrap: init at 5.6.4 & add to release blockers #391569
base: master
Conversation
The idea here is to rebuild xz after the bootstrap from trusted sources. Given that xz is reproducible, if it doesn't produce the same results as the xz built during the bootstrap, something is wrong.
Force-pushed from 57c6f48 to f8a7c48
First off, thanks for your blog post (and your previous one!). I can't say I agree with everything in it, but it was a good read during lunch and it makes me happy to see discussions about this kind of stuff. Keep it up, it's awesome :)
I see a few issues here though:

- This implementation is limited to just `xz`
  - At the very least, verifying the output between two sources should work across multiple packages. I don't think it's worth only implementing here since `xz` just so happens to be the package that was compromised (somewhat) recently; this could happen to anything in the stdenv (or Nixpkgs), and should thus be re-usable
  - The previous work you linked in your post had a much better approach in this aspect
- This is built on the assumption that differing binaries from differing sources is a problem
  - In the case of `xz`, the maintainer-provided tarball and the auto-generated one should currently result in the same binary, but this may not be true forever, malicious intent or not
  - Again, as mentioned in your post, some maintainers regularly prefer to generate their own tarballs -- like curl, for example. These tarballs may have completely different (non-malicious) files from the auto-generated equivalent, and differing outputs should regularly be expected
- This can block channels for benign differences
  - Again, different sources may produce different outputs for totally innocent reasons. Blocking a channel over this doesn't make much sense IMO
  - Reproducibility problems in the base package may also exist, triggering the above
  - There's no way to "allow" certain differences between the binaries in these cases (sorta like how we can skip some tests in actual packages)
- This assumes trust in auto-generated artifacts that could still be manipulated by bad actors
  - From how I see it, the threat model of this solution is to protect against compromised maintainers with release perms
  - In this situation, how can we guarantee that something malicious wasn't checked into the upstream Git repository and is now present in the generated tarball? What if only some parts of a malicious payload are in the Git repo, and the rest are put into the maintainer-provided one so as to attract less attention in the source and binary diffs?
  - This is far from impossible too, with university students being able to get hypocrite commits through even Linux kernel review. One could only imagine this type of attack from actors trying to cause real harm
  - Generally, I feel this approach specifically targets the xz situation rather than dealing with the broader potential of malicious/compromised maintainers. It could still help in some scenarios, but I don't think it would really "catch" much unless the exact same situation happened again (unlikely)
- This isn't the only solution
  - From your post (last time I swear!), you mention two solutions for better trusting these kinds of sources: the implementation here, and comparing the sources directly
  - Naturally, inspecting source code differences is (almost always) much more reliable than diffing binaries directly. Malicious code injected into a function, for example, is much easier to find by comparing sources than by attempting to decompile the binary -- new symbols being introduced isn't the only attack vector, after all
  - There are some tradeoffs of course, like source diffing being harder to automate and Nixpkgs maybe not being the best place for this kind of thing, but I would prefer it to many potential false positives and (manual) binary diffing in general. It could also be applied to the ecosystem at large rather than keeping it pseudo-exclusive here in Nixpkgs -- maybe something like Google's oss-fuzz?

TL;DR: I don't think this is the best approach to take, as it's quite specific to xz and its past "situation", in both the current implementation and a bit in concept.
Trying to track potential compromises like this may also be better off in the wider ecosystem, as it's much larger than any one repository and would probably benefit a lot from more focused attention
Thanks for this contribution! Very much enjoyed reading the blog post.
```diff
@@ -99,6 +99,7 @@ rec {
   mariadb
   nginx
   nodejs
+  xz-after-bootstrap
```
Naturally, I wouldn't define this package globally, but only here via `callPackage`.
Makes complete sense, yes.
```nix
meta = with lib; {
  changelog = "https://github.com/tukaani-project/xz/releases/tag/v${finalAttrs.version}";
  description = "General-purpose data compression software, successor of LZMA";
  homepage = "https://tukaani.org/xz/";
  longDescription = ''
    XZ Utils is free general-purpose data compression software with high
    compression ratio. XZ Utils were written for POSIX-like systems,
    but also work on some not-so-POSIX systems. XZ Utils are the
    successor to LZMA Utils.

    The core of the XZ Utils compression code is based on LZMA SDK, but
    it has been modified quite a lot to be suitable for XZ Utils. The
    primary compression algorithm is currently LZMA2, which is used
    inside the .xz container format. With typical files, XZ Utils
    create 30 % smaller output than gzip and 15 % smaller output than
    bzip2.

    Note that this is the version built after the bootstrap of nixpkgs.
  '';
  license = with licenses; [
    gpl2Plus
    lgpl21Plus
  ];
  maintainers = with maintainers; [ julienmalka ];
  platforms = platforms.all;
  pkgConfigModules = [ "liblzma" ];
};
```
Suggested change (replacing the full `meta` block with a reference to the original package's metadata):

```nix
meta = xz.meta // {
  maintainers = with lib.maintainers; [ julienmalka ];
};
```
```nix
stdenv.mkDerivation (finalAttrs: {
  pname = "xz";
  version = xz.version;

  src = fetchFromGitHub {
    owner = "tukaani-project";
    repo = "xz";
    rev = "v${finalAttrs.version}";
    hash = "sha256-Xp1uLtQIoOG/qVLpq5D/KFmTOJ0+mNkNclyuJsvlUbE=";
  };

  strictDeps = true;

  outputs = [
    "bin"
    "dev"
    "out"
    "man"
    "doc"
  ];

  enableParallelBuilding = true;

  doCheck = true;

  nativeBuildInputs = [
    gettext
    libtool
    automake
    autoconf
```
Maybe in general it'd be better to use `.overrideAttrs`, so the inputs will stay the same -- up to the `auto*` tools, of course.
```nix
compareArtifacts = ''
  diff $out/lib/liblzma.so ${xz.out}/lib/liblzma.so
'';
```
Maybe we can write a Nix function that gets a derivation as a single input, iterates over all `.so` files in it, and compares them to the `.so` files in `$out`. That should be a fairly good starting point in generalizing this.
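The comparison loop inside such a phase could be sketched in bash roughly as follows. This is a hypothetical sketch, not code from this PR: `out-rebuilt` and `out-bootstrap` are stand-in directories for `$out` and the trusted bootstrap output, fabricated here so the snippet runs on its own.

```shell
# Sketch of a generalized compare phase: diff every shared object found in
# the rebuilt tree against its counterpart in the bootstrap-built reference.
# In a real derivation the two roots would be Nix store paths; two tiny demo
# trees are created here just to demonstrate the loop.
mkdir -p out-rebuilt/lib out-bootstrap/lib
printf 'identical bytes' > out-rebuilt/lib/liblzma.so
printf 'identical bytes' > out-bootstrap/lib/liblzma.so

status=0
# Store paths contain no whitespace, so plain word splitting is fine here.
for so in $(find out-rebuilt -type f -name '*.so*'); do
  rel=${so#out-rebuilt/}
  if cmp -s "$so" "out-bootstrap/$rel"; then
    echo "OK: $rel"
  else
    echo "MISMATCH: $rel"
    status=1
  fi
done
echo "status=$status"
```

In a derivation, the loop would end with `exit $status` (or simply let `set -e` fail the build on the first mismatch).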
I think that ideally we would check all binaries
Could also just diff the $out paths entirely if everything else is reproducible -- though that case would probably be unlikely at scale
There is also something I experienced in my testing (again, to keep things simple for this discussion, I did not include it): sometimes packages have self-references, and we have to get rid of them before comparing the two versions.
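One way to handle this (a sketch with fabricated paths, not the code from that testing) is to rewrite each artifact's own store path to a same-length placeholder before diffing, so that the one difference that must exist between the two builds is erased:

```shell
# Sketch: a package often embeds its own store path (a self-reference),
# which necessarily differs between the two builds. Replacing each path
# with a same-length placeholder keeps offsets stable and removes this
# expected difference. The store paths and file contents are fabricated.
path_a='/nix/store/aaaaaaaa-xz-5.6.4'
path_b='/nix/store/bbbbbbbb-xz-5.6.4'
printf 'prefix=%s;payload' "$path_a" > artifact_a
printf 'prefix=%s;payload' "$path_b" > artifact_b

# Build a string of '0's exactly as long as the store path.
placeholder=$(printf '%s' "$path_a" | sed 's/./0/g')

# '|' as the sed delimiter because store paths contain '/';
# regex-escaping of the paths is elided for brevity.
sed "s|$path_a|$placeholder|g" artifact_a > norm_a
sed "s|$path_b|$placeholder|g" artifact_b > norm_b

if cmp -s norm_a norm_b; then
  echo "identical after erasing self-references"
else
  echo "real difference remains"
fi
```

Real store outputs are binary, so a production version would use a binary-safe rewriter rather than `sed`, but the principle is the same.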
```nix
postPhases = [ "compareArtifacts" ];

compareArtifacts = ''
  diff $out/lib/liblzma.so ${xz.out}/lib/liblzma.so
```
Suggested change:

```nix
diff $out/lib/liblzma.so ${xz.out}/lib/liblzma${lib.optionalString stdenv.hostPlatform.hasSharedLibraries stdenv.hostPlatform.extensions.sharedLibrary}
```
Should fix building on Darwin
Thank you!
That's true! It's an example though. I suggest doing this kind of thing for every package in the bootstrap that is built from untrusted tarballs. However, I didn't find a less "manual" way to implement this. With the number of packages that could be targeted, the effort may stay limited though.
Why should differing outputs be expected? Can you provide examples? Indeed, this is built on the assumption that if the two artifacts differ, then we must have built from two sources that are not semantically identical -- so not semantically equivalent to building from source -- and that is probably something to look at. I don't think you have explained why this assumption is not sane.
Maybe a release blocker is not the best tool for the job. I can see this breaking and being a false positive, at the very least if the software becomes non-reproducible.
Indeed, this also should only be active on packages that build reproducibly.
There is no such thing as an "innocuous" difference between two artifacts. Reasoning about the behavioral differences between two executables is extremely difficult. That is the reason behind the reproducible-builds project.
This is not about providing a magical solution to all supply chain security issues; this is about providing a mitigation against attacks using malicious tarballs.
I disagree. As Daniel Stenberg explains, it is often the case that the contents differ between the source code and the release tarball. On the other hand, if the software is reproducible and the release tarball doesn't contain anything that could influence the build process, both artifacts should be identical.
These are different sources that may contain different files that can affect the build. Nix is (largely) built on the assumption that with the same inputs you should get the same outputs; this goes directly against that, and differing outputs should thus be expected.
From a quick test with curl:

```nix
curl.overrideAttrs (finalAttrs: {
  src = fetchFromGitHub {
    owner = "curl";
    repo = "curl";
    tag = "curl-${lib.replaceStrings ["."] ["_"] finalAttrs.version}";
    hash = "sha256-Q6vGENvtpxuhehhyMSz/Q6NmJkhJgp3Vccf5H23+rrs=";
  };
})
```

The build fails outright due to the missing files in the maintainer tarball.
A project I personally work on -- Prism Launcher -- distributes both the generated source tarball and a custom tarball including our submodules (since GitHub will not include them by default). You can see them here.

This setup can make it pretty easy to accidentally include the dependencies we vendor over the system ones, which is basically required for the many distributions that don't happen to package the sometimes obscure libraries we use. We have tried our best to avoid this issue by adding proper guards around when our CMake configuration will decide to fall back to our vendored dependencies, but this is a problem I've actually had to help other packagers with, and I still have a small "workaround" for it.

There are probably many more examples of this, with some being much harder to catch. This is just what first came to mind tonight.
Yep, this is something that would probably need some kind of specific/individual monitoring IMO. I have no idea what that would be, but hey 😆
There would still be an issue in cases where a package is no longer reproducible, but if it's not a channel blocker or anything, I'm really just nitpicking.
When the two artifacts are technically built from different sources, there most definitely can be. This is why the Reproducible Builds project explicitly defines "reproducible builds" as (emphasis mine)
Of course, and I'll admit I was getting pretty out of scope there with the first instance. The second -- referring to payloads split across the actual Git repo and the maintainer tarball -- is still a bit worrying to me though, as the release tarballs exist as a sort of root of trust with this implementation. How reliable is checking the artifacts of two different sources for the actions of a bad actor in one, when that same bad actor controls both sources?
I would put a lot of emphasis on the if here. While reproducibility is common in Nixpkgs, having files that may influence the build seems to be at least somewhat common in release tarballs as well -- with Daniel Stenberg's curl being a small example of this, as shown above.
This doesn't really prove your point though, because the build is failing and not producing a different output. For the
IMO, if your tarball embeds code that is not in the original repository, maybe it is not a software supply chain attack, but it's still a red flag and should be vetted appropriately. There is no easy way to differentiate between code that is added for legitimate reasons and code that is added for malicious reasons (apart, maybe, from when they generate the exact same artifacts). In that sense, I stand by my proposition.
The idea is to remove the option of including malicious code in the release tarball but not in the original source. This of course doesn't prevent a malicious maintainer from putting malicious code in the VCS, but it probably makes the attack a bit less stealthy, and we make one more attack vector impossible to use.
Opening this PR as a draft because it is more of a proof-of-concept design to trigger community feedback than a patch destined to be merged as is. At the very least, the `comparePhase` should be generalized to check all outputs.

This PR showcases a protection option that helps build trust in the tarballs used to build software in our `stdenv`. In particular, with this kind of protection we avoid software supply chain attacks built on malicious maintainer-provided tarballs, such as the `xz` backdoor.

The idea here is to rebuild xz after the bootstrap from trusted sources. Given that xz is reproducible, if it doesn't produce the same results as the xz built during the bootstrap, something is wrong.
I discuss this proposition and alternatives at great length in this blog post.