Cross-platform duplicates (extension duplicates) #23502

Closed
DrewImm opened this issue Dec 23, 2019 · 5 comments

Comments


@DrewImm DrewImm commented Dec 23, 2019

  • I'm reporting a broken site support issue
  • I've verified that I'm running youtube-dl version 2019.11.28
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar bug reports including closed ones
  • I've read bugs section in FAQ

Verbose log

No errors or warning generated

Description

Duplicates are generated when a drive is shared between operating systems, e.g. Windows and Linux.

This happens because the same video is saved with a different file extension on each system (e.g. .mkv and .mp4).

For example, if a batch file is executed on Linux, and then resumed on Windows, each file will be duplicated (once with mkv and once with mp4). This results in duplicated videos and wasted download time.

Example

A batch file contains many links:

...
https://www.youtube.com/watch?v=z7aXex18_wg
...

The batch file is downloaded on Linux, resulting in the file:
5 Simple RV Hacks-z7aXex18_wg.mkv

Now the batch file is resumed on Windows. You'd expect to see '... has already been downloaded', but instead you get the following file:
5 Simple RV Hacks-z7aXex18_wg.mp4

Now you have duplicate files, and the files that had already been downloaded on Linux are downloaded again.

Expected Behavior

Files should not be duplicated, regardless of extension.

Proposed Fix

Before downloading a video, check whether that video already exists in the output directory with any extension.
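
The proposed check could look roughly like the sketch below. It is not part of youtube-dl; `find_existing` is a hypothetical helper, and it assumes the files were written with the default `<title>-<id>.<ext>` naming seen in the example above.

```python
from pathlib import Path


def find_existing(directory, video_id):
    """Return files in `directory` named '<title>-<video_id>.<ext>', for any <ext>.

    Hypothetical helper: assumes the default '<title>-<id>.<ext>' naming.
    """
    suffix = "-" + video_id
    return sorted(
        path
        for path in Path(directory).iterdir()
        # Compare against the name without its final extension, so
        # '.mkv' and '.mp4' copies of the same video both match.
        if path.is_file() and path.stem.endswith(suffix)
    )
```

A caller would skip the download whenever `find_existing(output_dir, "z7aXex18_wg")` returns a non-empty list. The check only works while the output template stays the same on both systems.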

Collaborator

@remitamine remitamine commented Dec 24, 2019

--download-archive FILE          Download only videos not listed in the
                                 archive file. Record the IDs of all
                                 downloaded videos in it.
@remitamine remitamine closed this Dec 24, 2019
Author

@DrewImm DrewImm commented Dec 24, 2019

This does not solve the issue because there is no archive file in the above scenario.

Shouldn't the default behavior be to avoid duplicates?

If a video is present within the output directory, shouldn't it be assumed that it would be present in the archive?

Collaborator

@remitamine remitamine commented Dec 24, 2019

no:

  • the user might want to download multiple formats, for example, both webm and mp4.
  • the user might use a different output template.
    ...

so, either download with the same environment (the same executables (e.g. ffmpeg) and dependencies (e.g. pycrypto) present, and the same configuration) or use a download archive.
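
For a directory that was filled before any archive existed, an archive file could be seeded from the files already present. The sketch below is only an illustration under stated assumptions: `seed_archive` is a hypothetical helper, files are assumed to use the default `<title>-<id>.<ext>` naming with 11-character YouTube IDs, and the archive is assumed to hold one `<extractor> <id>` line per video (e.g. `youtube z7aXex18_wg`). Compare against an archive written by youtube-dl itself before relying on it.

```python
import re
from pathlib import Path

# Assumption: default '<title>-<id>.<ext>' naming, with an 11-character
# YouTube ID made of letters, digits, '_' and '-'.
ID_PATTERN = re.compile(r"-([A-Za-z0-9_-]{11})$")


def seed_archive(directory, archive_path, extractor="youtube"):
    """Append '<extractor> <id>' lines for videos already in `directory`.

    Hypothetical helper; the archive line format is an assumption.
    Returns the list of entries that were newly added.
    """
    archive = Path(archive_path)
    known = set(archive.read_text().splitlines()) if archive.exists() else set()
    added = []
    for path in sorted(Path(directory).iterdir()):
        # Match on the name without its final extension.
        match = ID_PATTERN.search(path.stem) if path.is_file() else None
        if not match:
            continue
        entry = f"{extractor} {match.group(1)}"
        if entry not in known:  # one entry per video, whatever the extension
            known.add(entry)
            added.append(entry)
    if added:
        with archive.open("a") as handle:
            handle.writelines(entry + "\n" for entry in added)
    return added
```

After seeding, passing the same file to `--download-archive` on both systems should make the batch skip those IDs regardless of extension. Any unrelated file whose name happens to end in a dash plus 11 such characters would also be picked up, so the result is worth a quick review.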

Collaborator

@dstftw dstftw commented Dec 24, 2019

  1. These files are not duplicates per se.
  2. As already pointed out by @remitamine, you must use the download archive feature, which is specifically designed for such scenarios.
Author

@DrewImm DrewImm commented Dec 24, 2019

Hindsight is always 20/20. I believe avoiding duplicates should be the default, and allowing extension duplicates should be a CLI argument, but that's just me.

Although I don't quite agree, I appreciate the feedback and the awesome software!

@ytdl-org ytdl-org deleted a comment from DrewImm Dec 24, 2019
@ytdl-org ytdl-org locked and limited conversation to collaborators Dec 24, 2019
3 participants