Should `--all-sub` works also for automatic captions or only for subtitles? #1412

Closed
jaimeMF opened this issue Sep 12, 2013 · 11 comments

@jaimeMF (Collaborator) commented Sep 12, 2013

I think that it should also work for automatic captions.
The other question is whether --all-sub should imply --write-sub to keep backwards compatibility, or whether it should require either --write-sub or --write-auto-sub to work.

@phihag (Contributor) commented Sep 12, 2013

The way I understand it, --all-sub is an alternative to --sub-lang, and should be handled as such (we may want to deprecate it in favor of a better name or simply --sub-lang all). I.e. if the available subtitles are en, fr, auto-en, auto-es, download

  • --write-sub --sub-lang en: Download en
  • --write-sub --write-auto-sub --sub-lang en: Download en, auto-en
  • --write-sub --all-sub: Download en, fr
  • --write-auto-sub --all-sub: Download auto-en, auto-es
  • --write-sub --write-auto-sub --all-sub: Download en, fr, auto-en, auto-es
  • --all-sub: Download en, fr (Backwards compatibility)
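
The rules in the list above can be sketched as a small selection function. This is only an illustration of the proposed semantics, not youtube-dl's actual implementation: `select_tracks` and its parameters are hypothetical names that mirror the command-line flags, and automatic captions are modelled as language codes prefixed with `auto-`, as in the example.

```python
def select_tracks(manual, auto, write_sub=False, write_auto_sub=False,
                  all_sub=False, sub_lang=None):
    """Return the track names to download under the proposed rules.

    manual / auto are lists of language codes; every name here is
    hypothetical and only mirrors the flags discussed in this thread.
    """
    # Backwards compatibility: --all-sub alone behaves like --write-sub --all-sub.
    if all_sub and not (write_sub or write_auto_sub):
        write_sub = True

    def wanted(lang):
        # --all-sub is treated as an alternative to --sub-lang.
        return all_sub or lang in (sub_lang or [])

    selected = []
    if write_sub:
        selected += [lang for lang in manual if wanted(lang)]
    if write_auto_sub:
        selected += ['auto-' + lang for lang in auto if wanted(lang)]
    return selected


manual, auto = ['en', 'fr'], ['en', 'es']
print(select_tracks(manual, auto, write_sub=True, sub_lang=['en']))
print(select_tracks(manual, auto, write_auto_sub=True, all_sub=True))
print(select_tracks(manual, auto, all_sub=True))
```

Keeping the "which kinds" decision (--write-sub, --write-auto-sub) separate from the "which languages" decision (--sub-lang, --all-sub) is what makes the six cases fall out of two independent checks.
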
@ghost ghost assigned jaimeMF Sep 12, 2013
@jaimeMF jaimeMF closed this in 0b7f311 Sep 14, 2013
@jaimeMF (Collaborator, Author) commented Sep 14, 2013

Done, but currently --write-sub --write-auto-sub --all-sub would download en, fr, auto-es. Downloading both en and auto-en would require creating a new field in info dicts for automatic captions.
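
One way to picture this limitation: if manual subtitles and automatic captions end up in a single dictionary keyed by language code, a manual track and an automatic track for the same language cannot coexist. The sketch below is an assumption about the shape of the problem, not the code from 0b7f311; `merge_tracks` and the sample data are hypothetical.

```python
def merge_tracks(manual, auto):
    """Merge both kinds of tracks into one dict keyed by language code."""
    merged = {lang: ('auto', data) for lang, data in auto.items()}
    # Manual subtitles overwrite automatic captions that share the same key.
    merged.update({lang: ('manual', data) for lang, data in manual.items()})
    return merged


manual = {'en': 'manual en', 'fr': 'manual fr'}
auto = {'en': 'auto en', 'es': 'auto es'}
merged = merge_tracks(manual, auto)
for lang in sorted(merged):
    print(lang, merged[lang][0])
```

Because `en` can hold only one entry, the automatic English track is dropped. Keeping both would need a second mapping for automatic captions, which is the new field mentioned above.
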

@porg commented Jan 1, 2016

After a lot of trial and error, reading the manpages and the issue tracker, plus finding a proper demo video, I determined the subtitle logic as of youtube-dl --version 2015.12.23 to suit my needs.

I wanted to achieve this: I specify my desired languages. If manually created subtitles exist, download these, else try to get auto subtitles. Manually created subtitles are preferred over auto subtitles, and there is no need to keep both versions. Thus no language code conflicts, and no need for a further "auto" language code class for me.

BTW: The auto-translated version at best comes from translating a manual subtitle, or from translating a speech-recognition transcript of the audio source, or it's a combination of multiple manual subtitles and/or speech recognition, or some even weirder Big Data algorithm voodoo.

Test video

https://www.youtube.com/watch?v=4zVS7BLbbkk has an English manual subtitle.
I'm 100% certain that this one has a manually created English subtitle because:

  1. The Youtube CC states so (not always reliably in conjunction with youtube-dl, see the remark later).
  2. The dialogues are sometimes prefixed with the character names. Current AI could never achieve this!
  3. Contains paraphrases such as (GIGGLES) (WHINING) (PANTING HAPPILY). Current AI can not do this!
  4. It's an official Disney Youtube outlet, as such providing the necessary quality.

Command line and results

youtube-dl https://www.youtube.com/watch?v=4zVS7BLbbkk --skip-download --write-sub --write-auto-sub --sub-lang "de,en,fr"

The EN manual subtitle is downloaded. I'm certain as it contains the character name prefixes and paraphrases.
The DE and FR auto subtitles get downloaded. These are translations of the EN manual subtitle, as they contain the corresponding character name prefixes and paraphrases.
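
The behaviour observed above, manual subtitles preferred with automatic captions as a fallback, can be sketched per requested language. This is a guess at the logic based on the results, not youtube-dl's source; `pick_subtitles` and the sample availability data are hypothetical.

```python
def pick_subtitles(requested, manual, auto):
    """For each requested language, prefer a manual track, else an automatic one."""
    chosen = {}
    for lang in requested:
        if lang in manual:
            chosen[lang] = 'manual'
        elif lang in auto:
            chosen[lang] = 'auto'
        # Languages available in neither set are skipped.
    return chosen


# Hypothetical availability resembling the test video: one manual English
# track, plus automatic captions in several languages.
print(pick_subtitles(['de', 'en', 'fr'], manual={'en'}, auto={'de', 'en', 'fr'}))
```
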

The CCs as stated by Youtube seem to not always work

--write-subs --sub-lang "en" works for the aforementioned https://www.youtube.com/watch?v=4zVS7BLbbkk but not for https://www.youtube.com/watch?v=tFoUuFq3vHw. There the Youtube CC is stated as "English UK", but regardless of whether I request "en", "en-uk", "uk" or "en*", I always get WARNING: en subtitles not available for tFoUuFq3vHw. --write-sub --write-auto-sub --sub-lang "en" works though. I guess EN is technically an auto subtitle here, but it comes from a manually created subtitle with a language code known by Youtube but not by youtube-dl, which results in the auto subtitle being the same as the manual source, with at most very small changes (such as spelling variations). What leads me to suppose this are these two sample lines, which the auto recognizer could hardly ever create: the "hisssss" with the many S-es and the quotes around "quack". Current speech recognition hardly ever produces this.

2
00:00:19,880 --> 00:00:27,160
Mother duck said, "Quack, quack, quack"
But only four little ducks came back

16
00:02:31,840 --> 00:02:36,800
The tyre on the bus goes…hisssssssssssssssssssssssssssssssssssssssssssss

@dstftw (Collaborator) commented Jan 1, 2016

youtube-dl https://www.youtube.com/watch?v=tFoUuFq3vHw --list-sub

...
Available subtitles for tFoUuFq3vHw:
Language formats
en-GB    srt, vtt, sbv
...

@porg commented Jan 1, 2016

Thanks, --list-sub is even mentioned in the manpage! I overlooked it!
Nevertheless, is there a query syntax for "any English variation available" like "en*" or similar?

@dstftw (Collaborator) commented Jan 1, 2016

No.

@porg commented Jan 1, 2016

But with "en,en-us,en-uk,en-gb,etc" one should be safe?
Or are there further permutations such as "eng-gb, eng-uk, eng-us" or is it even case sensitive?

I'd welcome a language wildcard ("I want 'en' no matter what, or I want 'en-uk,en*' as in 'British English else any English available' ") to spare the user lengthy language code guesswork/generalisation or the need for a manual --list-sub investigation prior to downloading, which somewhat diminishes youtube-dl as a batch tool.
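
To illustrate the request (this is not an existing youtube-dl feature), such a wildcard could be sketched with shell-style patterns from Python's standard `fnmatch` module. `first_match` is a hypothetical helper: the patterns form a priority list, and matching is made case-insensitive by lowercasing both sides.

```python
from fnmatch import fnmatchcase


def first_match(patterns, available):
    """Return the first available code matched by the earliest pattern, or None."""
    for pattern in patterns:
        for code in available:
            # Lowercase both sides so 'en-gb' matches 'en-GB'.
            if fnmatchcase(code.lower(), pattern.lower()):
                return code
    return None


# 'en-uk,en*' as in "British English, else any English available".
print(first_match(['en-uk', 'en*'], ['fr', 'en-GB', 'en-US']))
print(first_match(['en*'], ['de']))
```

With the codes from the --list-sub output above, `en-uk` matches nothing, so the `en*` fallback picks `en-GB`.
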

@dstftw (Collaborator) commented Jan 1, 2016

Format code in general is an arbitrary string, since this metadata is provided by video services.

@porg commented Jan 1, 2016

Is there no possibility for a wildcard syntax that greps through those arbitrary strings? That would empower us, since they potentially differ across providers anyhow.

@jaimeMF (Collaborator, Author) commented Jan 2, 2016

@porg as explained in the bug reporting instructions, please open a new issue.

@porg commented Jan 2, 2016

@jaimeMF opened #8120.
