9 comments on commit b4061a1
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 1 changed file with 7 additions and 0 deletions.
b4061a1
Rootkit? Humor or pentest?
b4061a1
Response from Linus:
Misleading URL. Not in my tree, just using github to make it look like it.
There is no actual commit ID b4061a1 in my tree, but when you pass github a SHA1, it doesn't do any reachability analysis whether that actually exists in the named tree, so it uses a completely unrelated commit from somebody else's tree on github.
b4061a1
As far as I know, all forks of a GitHub repo are set up to use a sort of "super-repository" containing all objects from all other forks. The actual forked repositories are thin repacks with alternates set to point to that "super-repo." This allows for huge savings in disk space, because git is able to deduplicate a lot of redundant data and create efficient deltas for most commits. However, this also means that you can fork a repo, add a nasty commit to it like this one, and wait until the "super-repo" fetches it. After that happens, you are able to refer to it from any of the other forks, as is demonstrated here.
This behaviour is benign in the sense that the commit in question is not actually part of torvalds/linux.git -- you can clone this repo from GitHub right now and you won't find this object in the resulting repository. The reason this works on GitHub is that, with alternates, if you look up an object that's not in torvalds/linux.git, git will look in that "super-repository" containing objects from all other forks. Git has no way of telling whether such a loose object actually belongs in the current fork, because it could have been part of a deleted branch. For example, say you created refs/heads/test and added a few commits to it, but then deleted the test branch. Those commits are now "loose" and will be cleaned up by "git gc" after a period of time. However, you can still get to them with git if you know their hash. When a repository has alternates, all such objects not belonging to any of the heads but present in the "super-repo" are basically "loose objects" as far as git is concerned, and if you know their exact hash, you can get git to show them to you.
This is probably not the behaviour that GitHub would want, though, as it can obviously lead to confusion. However, I'm not sure if there's a sane (or inexpensive) fix that can limit displayed objects to just what is reachable from any of the actual heads.
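To make the mechanism described above concrete, here is a toy in-memory model. It is only a sketch: the class names, the repository names and the short hashes are made up for illustration, and it says nothing about how GitHub actually implements its storage. It assumes one shared object pool and a separate set of refs per fork, as the comment describes.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStore:
    """Toy stand-in for the shared "super-repo": one object pool for every fork."""
    objects: dict = field(default_factory=dict)  # commit hash -> commit message


@dataclass
class Fork:
    """Toy fork: it owns its refs, but object lookups go to the shared store."""
    name: str
    store: SharedStore
    refs: dict = field(default_factory=dict)  # branch name -> commit hash

    def push(self, branch, commit_hash, message):
        # The object lands in the shared pool; only the ref is per-fork.
        self.store.objects[commit_hash] = message
        self.refs[branch] = commit_hash

    def show(self, commit_hash):
        # Lookup by hash only: nothing checks that this fork's refs lead to it.
        return self.store.objects.get(commit_hash)


store = SharedStore()
upstream = Fork("upstream/project", store)   # hypothetical names
attacker = Fork("attacker/project", store)

upstream.push("master", "aaaa111", "legitimate change")
attacker.push("evil", "bbbb222", "fake backdoor")

# The attacker's commit can be shown through the upstream fork by hash alone,
# even though no upstream ref points at it.
leaked = upstream.show("bbbb222")
upstream_has_ref = "bbbb222" in upstream.refs.values()
print(leaked, upstream_has_ref)
```

In this model a fresh clone of the upstream fork would only follow `upstream.refs`, which is why the object never shows up in a real clone.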
b4061a1
@mricon That's a very helpful analysis, thank you.
b4061a1
Hmm, maybe cache available heads in a hash set for speed, and gc the tree more often?
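One possible reading of this suggestion, sketched below under stated assumptions: walk the history from a fork's heads once, keep the resulting hashes in a set, and only display a commit if its hash is in that set. The commit graph, the hashes and the function names are hypothetical, and the sketch ignores the cost of rebuilding the set whenever refs change, which is the expense the earlier comment worries about.

```python
def reachable_commits(heads, parents):
    """Return every commit hash reachable from the given heads.

    heads:   iterable of commit hashes that the fork's refs point to
    parents: dict mapping a commit hash to a list of its parent hashes
    """
    seen = set()
    stack = list(heads)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


# Hypothetical history: c3 -> c2 -> c1 on the fork's only branch, plus x1,
# which sits in the shared store but was pushed to a different fork.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "x1": ["c1"]}
fork_heads = {"master": "c3"}

# Build the set once per ref update; membership checks are then cheap.
visible = reachable_commits(fork_heads.values(), parents)


def can_display(commit_hash):
    """True only for commits reachable from this fork's heads."""
    return commit_hash in visible


print(sorted(visible), can_display("x1"))
```

Caching the heads alone would not be enough, since older commits behind each head also need to be displayable; that is why the sketch caches the whole reachable set.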
b4061a1
Kinda pointless but... does anyone know what the payload actually does? I stripped the \x and got 897df88845f74889025dc3488b45f848d8488b4de08a55f7, then disassembled it, but it doesn't look like it's doing anything nefarious...
b4061a1
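The stripping step mentioned in the question above (removing the \x escapes to get a plain hex string) can be sketched as follows. The hex string is the one quoted in the comment; the escaped form is rebuilt from it here and is only an assumption about how the bytes were written in the commit. The helper name is made up.

```python
HEX_FROM_COMMENT = "897df88845f74889025dc3488b45f848d8488b4de08a55f7"


def strip_escapes(escaped):
    """Turn a C-style '\\x89\\x7d...' string into raw bytes."""
    return bytes.fromhex(escaped.replace("\\x", ""))


# Rebuild an escaped form from the hex in the comment (an assumption about
# the original layout), then strip the escapes again.
pairs = [HEX_FROM_COMMENT[i:i + 2] for i in range(0, len(HEX_FROM_COMMENT), 2)]
escaped = "".join("\\x" + p for p in pairs)

payload = strip_escapes(escaped)
print(len(payload), payload.hex())
```

The resulting bytes could then be written to a file and fed to a disassembler of your choice.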
@nikhiljha it's nonsense. I used hexdump -C on a random .o file I had lying around, grabbed some random bytes from it, lazily typed up this code to look vaguely like a back door, and completely forgot the char and the 3rd argument to memcpy. I'm just showing that GitHub has a bug. This has nothing to do with Linux. I'm sorry that Linus had to be involved in any way.
b4061a1
I reported this to GitHub back in October. Their response at the time was that this is known behavior and a low-risk issue. They store blobs and commits in the same super-repo, as described in @mricon's comment, and store refs separately for each fork. They only remove blobs and commits from the super-repo when visibility changes (e.g. a repository going from public to private).
b4061a1
I just stumbled upon this issue yesterday and reported it as well. I hadn't heard about it before, but GitHub's staff say that it's been discussed a lot "over the years". So no fix in sight, it seems.
The issue applies to more URLs than just the GitHub UI, e.g. https://raw.githubusercontent.com/torvalds/linux/b4061a1/drivers/hid/hid-samsung.c
A timely reminder to not trust these types of URLs, or indeed any URL without careful review.
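As a small aid for that kind of review, the sketch below splits a raw URL like the one above into its parts and flags refs that look like a commit hash. The function names are made up, and the split is deliberately naive: it assumes the ref contains no slashes, and a branch could in principle have an all-hex name. A flagged URL is not proof of anything; per Linus's comment, the hash still has to be checked for reachability in an actual clone of the named repository.

```python
import re
from urllib.parse import urlparse


def split_raw_url(url):
    """Split a raw.githubusercontent.com URL into (owner, repo, ref, path)."""
    parts = urlparse(url).path.lstrip("/").split("/", 3)
    if len(parts) != 4:
        raise ValueError("expected /owner/repo/ref/path")
    return tuple(parts)


def looks_like_commit_hash(ref):
    # 7 to 40 hex digits: an abbreviated or full SHA-1 rather than a branch name.
    return re.fullmatch(r"[0-9a-f]{7,40}", ref) is not None


url = "https://raw.githubusercontent.com/torvalds/linux/b4061a1/drivers/hid/hid-samsung.c"
owner, repo, ref, path = split_raw_url(url)
needs_checking = looks_like_commit_hash(ref)
print(owner, repo, ref, path, needs_checking)
```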