This repo is now suddenly massive (many blobs)? #7011

Open
Earnestly opened this issue Sep 28, 2015 · 5 comments
@Earnestly Earnestly commented Sep 28, 2015

There suddenly seem to be a lot of very strange blobs in this repo (at least since the last time I looked), which have pushed the total size to around 121M.

Some of these blobs are a little concerning, such as one containing the string "This program cannot be run in DOS mode." (c20d06ec2c7d643a4eeb45be26a0c6a6e5b03990)

Here is a list of the largest suspicious blobs (there are many more though):

c20d06ec2c7d643a4eeb45be26a0c6a6e5b03990 blob   35795968 14425221 118568
7641ba9ecc6f3aa3c42e5bd0c999bae88578e221 blob   21602572 21556474 43269236
37d582aa9f2207261cef5f06a2f69f305a5ec16e blob   21593168 21521197 16331241
3fbd17129ef9478c3cab5f9b804ba28a0fb225d4 blob   5612499 5416798 37852438
ec793eceeba58e8d621d4007d48bb302352476e6 blob   5488269 4340874 12539858
2ee57c593047f095717a6039089e73e9032b7865 blob   3989886 2837559 4204850
fb3950597507fe72e738b009bdb37f1c9dbe2b46 blob   3988820 2771850 19143485
2d25af7120c35490fc60ba65d5cd30a8fa360399 blob   2868255 1122468 21915335
3122ac66f28bb5eee7a127751a562eed09933330 blob   1237893 98544 18645071
8ee40ce4cf3d69f9e88e9acbe85cd53af51f22b3 blob   1199167 94251 18841299
...

You can find the rest via:
From here

git verify-pack -v .git/objects/pack/pack-*.idx | grep -E '^\w+ blob\W+[0-9]+ [0-9]+ [0-9]+$' | sort -nrk 3
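For reference, the columns in the `git verify-pack -v` listing above are: object name, type, uncompressed size, size in the packfile, and offset in the packfile. The pipeline above only matches five-column lines, so blobs stored as deltas (which carry two extra columns, delta depth and base object) are skipped. Below is a minimal sketch of a variant that keeps every blob line, assuming a POSIX shell with awk; the function name `largest_blobs` is made up for illustration and is not part of git.

```shell
# Read `git verify-pack -v` output on stdin, keep only blob entries
# (including deltified ones), and print the N largest by uncompressed
# size (column 3). N defaults to 10. `largest_blobs` is a hypothetical
# helper name, not a git command.
largest_blobs() {
    awk '$2 == "blob"' | sort -k3,3nr | head -n "${1:-10}"
}
```

It could be used as `git verify-pack -v .git/objects/pack/pack-*.idx | largest_blobs 10`. To see which path a given blob was stored under, `git rev-list --objects --all | grep <sha>` lists reachable objects together with their paths.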

It seems that contributors may have committed large blobs and software to this repo only to later remove them, but they remained in the history.

Please consider purging/filtering this out of the repository.


More findings in some of these blobs: http://ix.io/l3J

Collaborator

@jaimeMF jaimeMF commented Sep 28, 2015

The reason is that the repo used to have both the youtube-dl and youtube-dl.exe binaries.

As far as I know, filtering could be problematic: everyone would need to clone the whole repository again, and existing branches would need to be updated (and what would happen to the open pull requests?). I don't know if it's worth purging. If you don't need the whole history, you can always use git clone --depth 1 <repo>.
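To illustrate what `--depth 1` does, here is a small self-contained sketch, assuming git is installed. It builds a throwaway repository with three commits and shallow-clones it; the function name, file names, and commit messages are invented for the demo.

```shell
# Hypothetical demo: build a throwaway repository with three commits, then
# shallow-clone it and print how many commits the clone actually contains.
shallow_clone_demo() {
    tmp=$(mktemp -d) || return 1
    git init -q "$tmp/src"
    for n in 1 2 3; do
        echo "revision $n" > "$tmp/src/file.txt"
        git -C "$tmp/src" add file.txt
        git -C "$tmp/src" -c user.name=demo -c user.email=demo@example.invalid \
            commit -q -m "commit $n"
    done
    # Use a file:// URL: --depth is ignored when cloning from a plain local path.
    git clone -q --depth 1 "file://$tmp/src" "$tmp/shallow"
    # Count the commits reachable from HEAD in the shallow clone.
    git -C "$tmp/shallow" rev-list --count HEAD
    rm -rf "$tmp"
}
```

Only the most recent commit and the objects it needs are transferred, so blobs that exist solely in older history are not downloaded. If the full history is wanted later, `git fetch --unshallow` retrieves the rest.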

Author

@Earnestly Earnestly commented Sep 28, 2015

Sorry, but what? Yeah, what if I do want the history? Yeah, fixing it is probably not going to be trivial, but then maybe there shouldn't be almost 100M of blobs in a repo in the first place?

Perhaps when removing the binaries they should have also cleaned the history of these blobs too?

But I don't think this is just the youtube-dl binaries, as I've been pulling from git master for a while now, and after only a few weeks the repo almost doubled in size. In theory, the repo should not have increased in size (so drastically) since the removal of the blobs (even if they were not removed from history).

Collaborator

@jaimeMF jaimeMF commented Sep 28, 2015

I wasn't around when it was used, and nobody seemed to complain at the time and/or step in to help until #603. From then until Sep 2014 they were only kept for compatibility with older youtube-dl versions.

Don't misunderstand me, it's an unfortunate situation and it's bad that you need >100M to clone the full repo, but the solution is not easy.

> Perhaps when removing the binaries they should have also cleaned the history of these blobs too?

In a public repo with a relatively large user base, a portion of which uses the git repo directly, that's not a simple task.
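The difficulty comes from the fact that removing a file from history rewrites every affected commit, so existing clones, branches, and pull requests no longer share history with the rewritten repository. The sketch below demonstrates this on a throwaway repository, assuming git is installed. It uses `git filter-branch`, whose own documentation now recommends the separate `git filter-repo` tool instead; the function name and file names are invented for the demo and do not reflect how this project would actually do it.

```shell
# Hypothetical demo: purging a file from history changes commit IDs.
# Repository layout, file names, and the function name are invented here.
history_rewrite_demo() {
    tmp=$(mktemp -d) || return 1
    repo="$tmp/repo"
    git init -q "$repo"
    (
        cd "$repo" || exit 1
        commit() {
            git -c user.name=demo -c user.email=demo@example.invalid \
                commit -q -m "$1"
        }

        # Commit 1 adds a stand-in for a large binary next to a normal file.
        echo "docs" > README
        echo "pretend this is a large binary" > big.bin
        git add README big.bin
        commit "add README and binary"

        # Commit 2 deletes the binary, but the blob stays reachable via commit 1.
        git rm -q big.bin
        commit "remove binary"

        before=$(git rev-parse HEAD)

        # Drop big.bin from every commit on every ref; commits left empty
        # by the removal are pruned.
        FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --prune-empty \
            --index-filter 'git rm -q --cached --ignore-unmatch big.bin' \
            -- --all >/dev/null 2>&1

        after=$(git rev-parse HEAD)
        if [ "$before" != "$after" ]; then
            echo "rewritten"
        else
            echo "unchanged"
        fi
    )
    rm -rf "$tmp"
}
```

After such a rewrite the branch tip has a different ID, which is why everyone with an existing clone would have to re-clone or reset their branches.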

Author

@Earnestly Earnestly commented Sep 28, 2015

It's a fairly simple task if you simply don't worry about things you don't need to worry about. Anyway, if nothing will be done about this, okay. At least it's here and Google can find it for others looking.

(Most people don't complain because most people don't really care about the means, only the ends.)

@viluon viluon commented Jul 31, 2019

Sorry for necroing, but I installed from AUR and had to download about 165 MB (while cloning to a bare repository) to build youtube-dl as a package.
