Unnecessary disk space usage with the native hls downloader #7087

Closed
e00E opened this issue Oct 7, 2015 · 7 comments



@e00E e00E commented Oct 7, 2015

When downloading an HLS stream (from Twitch.tv, for example) with the --hls-prefer-native command line option, the disk space used during the download is effectively double what it needs to be.
This happens because youtube-dl keeps all the fragment files while also writing their contents into the final output file.

I believe this is done at youtube_dl/downloader/hls.py +97. There, in a loop over all fragments, each fragment is downloaded AND appended to the final output file. Only after the loop has finished are the fragment files removed.
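
To illustrate the behaviour described above, here is a minimal, self-contained simulation. It is not the code from youtube_dl/downloader/hls.py: the function names, the `video.ts` file name and the `-Frag%d` suffix are assumptions made for illustration, and in-memory byte strings stand in for fragments fetched over the network.

```python
import os
import tempfile


def dir_size(path):
    """Total size in bytes of all files directly inside `path`."""
    return sum(os.path.getsize(os.path.join(path, name)) for name in os.listdir(path))


def peak_usage_keeping_fragments(fragments):
    """Simulate the loop and return (peak_bytes_on_disk, final_bytes_on_disk).

    `fragments` is a list of bytes objects standing in for downloaded segments.
    """
    with tempfile.TemporaryDirectory() as workdir:
        out_path = os.path.join(workdir, 'video.ts')  # hypothetical output name
        frag_paths = []
        peak = 0
        with open(out_path, 'wb') as out:
            for index, data in enumerate(fragments):
                frag_path = '%s-Frag%d' % (out_path, index)
                with open(frag_path, 'wb') as frag:
                    frag.write(data)      # stands in for the network download
                frag_paths.append(frag_path)
                out.write(data)           # append to the final output file
                out.flush()
                peak = max(peak, dir_size(workdir))
        for frag_path in frag_paths:      # fragments are removed only after the loop
            os.remove(frag_path)
        return peak, dir_size(workdir)
```

With five 100-byte fragments, the peak is 1000 bytes on disk for a 500-byte result, which is the doubling reported in this issue.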

This can be fixed either by removing each successfully downloaded fragment right after appending it to the final file, or by creating the final file only in the deletion step, after all fragment files have been downloaded.
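
The first option could look roughly like the sketch below. It is a simulation under stated assumptions, not a patch against youtube-dl: the function names, the `video.ts` file name and the `-Frag%d` suffix are hypothetical, and byte strings stand in for downloaded fragments.

```python
import os
import tempfile


def dir_size(path):
    """Total size in bytes of all files directly inside `path`."""
    return sum(os.path.getsize(os.path.join(path, name)) for name in os.listdir(path))


def peak_usage_deleting_fragments(fragments):
    """Append each fragment, then delete it at once.

    Returns (peak_bytes_on_disk, final_bytes_on_disk).
    """
    with tempfile.TemporaryDirectory() as workdir:
        out_path = os.path.join(workdir, 'video.ts')  # hypothetical output name
        peak = 0
        with open(out_path, 'wb') as out:
            for index, data in enumerate(fragments):
                frag_path = '%s-Frag%d' % (out_path, index)
                with open(frag_path, 'wb') as frag:
                    frag.write(data)      # stands in for the network download
                out.write(data)           # append to the final output file
                out.flush()
                peak = max(peak, dir_size(workdir))
                os.remove(frag_path)      # the fragment is no longer needed
        return peak, dir_size(workdir)
```

With five 100-byte fragments, at most one fragment file exists next to the output at any time, so the peak is 600 bytes instead of 1000. The cost is that the fragment files can no longer serve as the record of what has already been downloaded.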

I am guessing that fixing this might create a problem for resuming (i.e. continuing) an older download. However, I think that could be solved by merely storing a list of the fragments that have been successfully downloaded, instead of the fragment files and their contents.
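
One way to picture the resume idea is a small sidecar state file that records how many fragments have been appended and how large the output file was at that point. Everything here is an assumption for illustration: the `.state.json` suffix, the `done`/`size` keys, and the `fetch` callable (any function that takes a URL and returns bytes) are hypothetical and not part of youtube-dl.

```python
import json
import os


def load_state(state_path):
    """Return the saved progress, or a fresh state if none exists."""
    if not os.path.exists(state_path):
        return {'done': 0, 'size': 0}
    with open(state_path) as f:
        return json.load(f)


def download_with_state(urls, fetch, out_path):
    """Append fragments to out_path, tracking progress in a sidecar file."""
    state_path = out_path + '.state.json'  # hypothetical naming
    state = load_state(state_path)
    # Reopen the existing output when resuming, otherwise start from scratch.
    with open(out_path, 'r+b' if state['done'] else 'wb') as out:
        out.truncate(state['size'])  # drop any partially appended fragment
        out.seek(state['size'])
        for index in range(state['done'], len(urls)):
            out.write(fetch(urls[index]))
            out.flush()
            state = {'done': index + 1, 'size': out.tell()}
            with open(state_path, 'w') as f:
                json.dump(state, f)
    if os.path.exists(state_path):
        os.remove(state_path)  # download complete, nothing left to resume
```

If `fetch` raises partway through, the state file stays behind, and calling `download_with_state` again with the same arguments continues from the first fragment that was not recorded. Recording the output size as well as the count lets a resumed run discard a fragment that was only partly written before the interruption.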

Collaborator

@remitamine remitamine commented Oct 7, 2015

I asked for this change before (in pull request #6392), and this is the answer from one of the core collaborators (@jaimeMF):

Segments should be appended to the file as they are being downloaded, because it allows to directly play a livestream.
About keeping the segments: that's how we currently support resuming, it could be better to store the list of downloaded segments but it would make the code a bit more complex.

Author

@e00E e00E commented Oct 7, 2015

I made pull request #7088 to fix this bug.

Author

@e00E e00E commented Oct 7, 2015

Oh, I just saw your comment. I see. If the default behavior is not going to change, I would really like a command line option that gives up being able to play the file directly in exchange for not using double the disk space.
I am downloading fairly big VODs and this is a real problem for me.

Collaborator

@yan12125 yan12125 commented Oct 7, 2015

With --continue (it's the default now), segments should be kept unless there's a better implementation. I guess dropping intermediate fragments is acceptable if --no-continue is specified.

Collaborator

@remitamine remitamine commented Oct 7, 2015

For m3u8 it is possible to continue a download and, at the same time, be able to play the video without appending the segments to the file.

Author

@e00E e00E commented Oct 7, 2015

I might just look into storing a list of downloaded fragments, instead of the fragments themselves, and submit that as a pull request myself. It does not look that complicated.


@raszpl raszpl commented Sep 6, 2016

I remember experiencing this about a year ago, and then it went away?!
Up to yesterday, when downloading Twitch streams, files were kept at 0 size (or grew very slowly compared to the downstream traffic), and I assumed fragments were stored in an alternate NTFS stream. But today I got a Twitch stream that wouldn't download using a 6-month-old youtube-dl.exe, so I upgraded :/ and there it is again: double disk usage and >10000 small files in one directory while downloading :(
