
Store the source URL in extended attributes of a .part file #5467

Open

fstirlitz opened this issue Apr 18, 2015 · 6 comments

fstirlitz (Contributor) commented Apr 18, 2015

It would be nice if youtube-dl stored the source URL in an extended attribute of the .part file (this could be conditional on the --xattrs option). This would be useful when one wishes to complete a partial video download whose original URL they have forgotten.

Bonus points for having youtube-dl -c path/to/video.mp4.part resume the download (and maybe youtube-dl path/to/video.mp4.part restart downloading from scratch; maybe some other combination of options would be better).

jaimeMF (Collaborator) commented Apr 18, 2015

This can be achieved by running youtube-dl --write-info-json URL, and then later youtube-dl --load-info <filename>.info.json (I have --write-info-json in my config file, so I can always do that).
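The round trip behind those two flags is plain JSON serialisation of the info_dict; a minimal sketch, where the field set and the filename are illustrative, not what youtube-dl actually writes:

```python
import json
import os
import tempfile

# Toy stand-in for youtube-dl's info_dict; real ones carry many more fields.
info_dict = {
    "id": "tw2WwcX7fAg",
    "title": "Example video",
    "webpage_url": "https://www.youtube.com/watch?v=tw2WwcX7fAg",
}

# --write-info-json: dump the dict next to the downloaded file.
path = os.path.join(tempfile.mkdtemp(), "Example video-tw2WwcX7fAg.info.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(info_dict, f)

# --load-info: read it back instead of re-extracting from the original URL.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

assert loaded == info_dict
```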

jaimeMF added the request label on Apr 18, 2015
fstirlitz (Contributor, author) commented Apr 18, 2015

Interesting, thanks. But that creates another file, which clutters up directory listings, and which I would have to clean up later. I would rather avoid this kind of inconvenience for the sake of a feature I'll only use occasionally and at rather unpredictable times.

Still, that suggests it should be a smallish patch. Maybe I'll look into it myself if no one else does.

fstirlitz (Contributor, author) commented Apr 19, 2015

Okay, I tried implementing this in a local branch and ran into a problem: ext4 limits each extended attribute to one filesystem block (4 KiB), while the info_dict takes ~50 KB when serialised into JSON. Since I noticed it contains many redundancies, I tried gzipping the JSON, but even then it takes ~5 KB.

Three options now:

  • Split the info_dict JSON into parts stored in separate xattrs (straightforward but ugly)
  • Store only the original URL (will not allow transparent restart)
  • Strip down info_dict to bare essentials before serialising (may be hard, invasive, and/or fragile)
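The first option can be sketched roughly like this, assuming Linux's os.setxattr/os.getxattr and the 4 KiB per-attribute cap; the user.ytdl.info.* attribute names are invented for illustration:

```python
import os

CHUNK = 4096  # ext4 caps each extended attribute at one filesystem block


def split_chunks(blob, size=CHUNK):
    """Cut a serialised (e.g. gzipped-JSON) info_dict into xattr-sized pieces."""
    return [blob[i:i + size] for i in range(0, len(blob), size)]


def save_info_xattrs(path, blob):
    """Store the blob across numbered xattrs (Linux-only os.setxattr)."""
    pieces = split_chunks(blob)
    for n, piece in enumerate(pieces):
        os.setxattr(path, "user.ytdl.info.%d" % n, piece)
    os.setxattr(path, "user.ytdl.info.count", str(len(pieces)).encode())


def load_info_xattrs(path):
    """Reassemble the blob from the numbered xattrs."""
    count = int(os.getxattr(path, "user.ytdl.info.count"))
    return b"".join(
        os.getxattr(path, "user.ytdl.info.%d" % n) for n in range(count)
    )
```

split_chunks keeps every value within one block; the xattr calls themselves need a filesystem that actually supports user attributes.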
jaimeMF (Collaborator) commented Apr 19, 2015

Assuming you are using the same mechanism as the --load-info option, I think you can safely remove the formats field (which for YouTube videos can be quite big). The rest is needed for building the filename, since any of the fields can appear in the output template.
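The saving is easy to see on a mocked-up dict; the field names mirror info_dict, but the entries and sizes here are illustrative:

```python
import json

# Mocked-up info_dict: a few template fields plus a bulky formats list.
info_dict = {
    "id": "tw2WwcX7fAg",
    "title": "Example video",
    "ext": "mp4",
    "formats": [
        {"format_id": str(i), "width": 1280, "height": 720,
         "url": "https://example.com/videoplayback?itag=%d" % i}
        for i in range(50)
    ],
}

# Drop the formats field before serialising; keep everything else,
# since any remaining field may be referenced by the output template.
slim = {k: v for k, v in info_dict.items() if k != "formats"}

full_size = len(json.dumps(info_dict))
slim_size = len(json.dumps(slim))
assert "formats" not in slim and slim_size < full_size
```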

Another solution that may be simpler would be to add an option (--save-state or something like that) that will write a <filename>.state.json file that would be deleted when the download is done. Then you could run youtube-dl --load-info <filename>.state.json (maybe we could add a --resume alias to make it clear).

fstirlitz (Contributor, author) commented Apr 19, 2015

By the time you want to resume the download, the target file name has already been determined. On the other hand, the metadata may need to be kept if a post-processor needs it. (By the way, no information about post-processors seems to be present in info_dict. Is this a bug or a feature?)

Also:

$ youtube-dl -j tw2WwcX7fAg -f bestvideo+bestaudio | jshon -d formats -j | wc -c
4333
$ youtube-dl -j tw2WwcX7fAg -f bestvideo+bestaudio | jshon -d formats -j | gzip | wc -c
1694

Hmm.
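The same measurement can be reproduced without jshon; a sketch where the format entries are fabricated stand-ins for a real formats array, so no exact byte counts are asserted:

```python
import gzip
import json

# Stand-in for the formats array of an info_dict: repetitive JSON
# like this is exactly what gzip compresses well.
formats = [
    {"format_id": str(i), "ext": "mp4", "protocol": "https",
     "url": "https://example.googlevideo.com/videoplayback?itag=%d" % i}
    for i in range(40)
]

raw = json.dumps(formats).encode()
packed = gzip.compress(raw)

# Compressed output is smaller, but as the numbers above show,
# not necessarily small enough to fit comfortably in one 4 KiB xattr.
assert len(packed) < len(raw)
```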

jaimeMF (Collaborator) commented Apr 19, 2015

> By the time you want to resume the download, the target file name has already been determined.

You are right, I was thinking of the current implementation of --load-info. You'll have to implement your own.

> no information about post-processors seems to be present in info_dict. Is this a bug or a feature?

Neither; the postprocessors are global. As with the rest of the options, you would need to run with the same command-line arguments (unless you want to store the arguments as well).
