convert urls to the format that the option --download-archive archiveFile.txt converts them to #32730

Closed · Iridium-Lo opened this issue Feb 23, 2024 · 5 comments

Iridium-Lo commented Feb 23, 2024

Checklist

  • I'm asking a question
  • I've looked through the README and FAQ for similar questions
  • I've searched the bugtracker for similar questions including closed ones

Question

Is there a way I can convert URLs I have into the format that the option --download-archive archiveFile.txt converts them to?

I deleted my archive file by accident but still have the URLs; I'd like to convert them back into the format they'd be in archiveFile.txt so I don't duplicate downloads.

dirkf (Contributor) commented Feb 23, 2024

The format is f'{cls.ie_key()} {video_id}', where cls is the IE class used by yt-dl to download the item and video_id is the id of the item from the info-json.

If you still have the info-json files for the archived items you can use a jq command to extract and format these values, or a Python (etc) script.
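For instance, a minimal sketch of the jq route, assuming the .info.json files written with --write-info-json are still in the current directory and that archive entries use the lowercased extractor key followed by the id (e.g. youtube dQw4w9WgXcQ):

# sketch only: rebuild archive lines from the standard extractor_key and id fields
jq -r '"\(.extractor_key | ascii_downcase) \(.id)"' *.info.json >> archiveFile.txt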

In general the only way to generate the archive entry is to process the URL with yt-dl, but that won't help with items that are no longer available (though the effect for such items is as if they were in the archive anyway).

Subject to that, I think that the only simple way to regenerate the archive is to re-download the items to a junk location. By using -f "worstvideo/worst" --test, the actual amount of downloading would be trivial, certainly compared with fetching a YouTube bloat web page.
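A sketch of that re-download route, assuming the surviving URLs are in a file urls.txt (hypothetical name) and the throwaway output goes to a junk directory:

# sketch only: --test keeps the transfer tiny while yt-dl still records each item in the archive
youtube-dl -a urls.txt -f 'worstvideo/worst' --test \
    --download-archive archiveFile.txt -o 'junk/%(title)s.%(ext)s'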

If you can reliably work out the archive index values for some site, then it could be easy to make a script to write a fake archive file.
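For plain YouTube watch links, for example, a sketch along these lines could fake the entries directly, assuming urls.txt (hypothetical name) holds one URL per line and that YouTube archive entries take the form "youtube <11-character id>":

# sketch only: extract the 11-character video id and prefix the extractor key
sed -nE 's#.*(youtu\.be/|[?&]v=)([A-Za-z0-9_-]{11}).*#youtube \2#p' urls.txt >> archiveFile.txt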

Related: #13687.

Iridium-Lo (Author) commented Feb 23, 2024

Many thanks @dirkf, I was trying to avoid re-downloading all the files.

I have a bash script which uses GNU parallel to download an array of URLs simultaneously with youtube-dl. I use it all the time.

You might think "that's what a playlist is for", but creating playlists is time-consuming: you have to select each video and add it to the playlist one by one.

What I do is:

  • open many tabs with the video URLs I want
  • 'bookmark all'
  • copy the text from the bookmarks
  • paste it into a file named <whatever>
  • run bash downloadSimultaneously.bash <whatever>, which writes the file's contents to an array
  • it makes a directory named <whatever> and downloads the videos simultaneously to that directory, with the archive option set (a rough sketch of such a script follows this list)
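A sketch of that kind of script, assuming GNU parallel and bash 4+; the ".d" directory suffix, the -j limit and the archive file name are my guesses, not the real downloadSimultaneously.bash:

#!/usr/bin/env bash
# sketch only: usage  bash downloadSimultaneously.bash <whatever>
# <whatever> is the text file of URLs; downloads go into <whatever>.d
# (the ".d" suffix avoids a clash with the file of the same name)
list=$1
mapfile -t urls < "$list"              # parallel is fed an array, one URL per element
mkdir -p "$list.d" && cd "$list.d" || exit 1
parallel -j 60 youtube-dl --download-archive archiveFile.txt ::: "${urls[@]}"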

Would you accept a PR for that?

dirkf (Contributor) commented Feb 24, 2024

Isn't this just:

<whatever xargs --max-procs=1 --max-args=1 --delimiter=' ' youtube-dl args...

If the URL list contains items whose generated filename happens to be the same, those downloads could interfere with each other. Ideally two yt-dl instances running at the same time should have different current directories, and the output templates should be relative to those directories.
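One way to get those separate current directories with plain xargs, assuming a urls.txt list (hypothetical name) and GNU mktemp:

# sketch only: each job gets its own temporary working directory, so two
# downloads that would produce the same filename cannot collide
<urls.txt xargs --max-procs=4 -I{} \
    sh -c 'd=$(mktemp -d dl.XXXXXX) && cd "$d" && youtube-dl "$1"' _ {}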

Iridium-Lo (Author) commented Feb 24, 2024

Having written this I changed my logic. My script has no issues with same-URL downloads.

  • 1 directory with a text file of download links
  • bash downloadSimultaneously.bash textFileName
  • it creates a directory, cds to it, yt-dl starts and makes the archive file there; it also writes textFileName's contents to an array (parallel needs an array)
  • parallel limits it to 60 instances
  • running 400 yt-dl instances, for example, would be too much (for my laptop at least)

Say I do args $(cat textFileName): does your command download all the URLs simultaneously, or one at a time?

Also, you'd have to create a directory and cd to it, which is a manual step; this script is just one command to run.

Also, why do you extract things like view count etc.? I wanted to add some sites but can't be bothered with that kind of stuff.

Iridium-Lo (Author) commented

solution given
