[Crunchyroll?] Only extract requested subtitles #6264

Open
wiiaboo opened this issue Jul 18, 2015 · 8 comments

@wiiaboo
Contributor

@wiiaboo wiiaboo commented Jul 18, 2015

Problem

Using --sub-lang to request one or two subtitles from Crunchyroll doesn't just extract the requested subtitles, but instead extracts all of them, leading to big delays before starting the stream, whether you use --all-subs or just --sub-lang enUS.
In the case of sites where the subs just point to a certain URL, the extraction seems faster, so it's probably more of a problem for sites like Crunchyroll where you extract the full subtitles.

Solution 1

At least for sites like Crunchyroll, just extract the requested languages.

Solution 2

Add an option that just extracts the requested languages?

I should probably also mention that this is mostly useful when you want to stream the resulting URL, for example through mpv. When you're just using youtube-dl directly to download the video, the time spent extracting the subs is probably not an issue.

@dstftw dstftw added the request label Jul 18, 2015
@dstftw dstftw mentioned this issue Jul 18, 2015
@remitamine
Collaborator

@remitamine remitamine commented Jul 18, 2015

I propose two solutions for this:

  • pass the requested subtitles to the extractor (I think this isn't possible with the current youtube-dl code, because the only information passed to the extractor is the URL)
  • change the extractor to return the subtitle URLs and, in process_subtitles, detect whether they come from Crunchyroll, then process them the same way the Crunchyroll extractor processes them now, with the difference that the YoutubeDL object knows subtitleslangs, so it can fetch only the requested languages.
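
The second idea, deferring the expensive per-language work until the requested languages are known, can be sketched roughly as follows. This is not youtube-dl code: `select_subtitles`, the `available` mapping, and the `fetch` callable are hypothetical names used only for illustration, and the URLs are placeholders.

```python
def select_subtitles(available, requested_langs, fetch):
    """Fetch subtitle bodies only for the requested languages.

    available: dict mapping language code -> subtitle URL (assumed shape)
    requested_langs: iterable of language codes the user asked for
    fetch: callable taking a URL and returning the subtitle text
    """
    selected = {}
    for lang in requested_langs:
        url = available.get(lang)
        if url is None:
            # Requested language is not offered; skip it without fetching.
            continue
        selected[lang] = fetch(url)
    return selected


# Stand-in for the slow download/decrypt step; it records what was fetched.
fetched = []


def fake_fetch(url):
    fetched.append(url)
    return 'subtitle data from ' + url


available = {
    'enUS': 'https://example.invalid/subs/enUS',
    'esLA': 'https://example.invalid/subs/esLA',
    'ptBR': 'https://example.invalid/subs/ptBR',
}
result = select_subtitles(available, ['enUS', 'frFR'], fake_fetch)
```

With this shape, only the `enUS` entry is fetched; the other available languages are never touched, which is the saving the issue asks for.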
@dstftw
Collaborator

@dstftw dstftw commented Jul 18, 2015

@remitamine both are flawed, as is the current approach. The reasonable solution would be customizable extraction behavior (for Crunchyroll in particular, subtitle decryption) that is used by a subtitles extractor or even a postprocessor.

@fstirlitz
Contributor

@fstirlitz fstirlitz commented Jul 22, 2015

I had a similar problem while writing #6144. I ended up solving it with a few kludges to plug the downloader infrastructure into subtitle downloading (commit acbc6d38660092e90c4ab36110b30355d26c4363), but I'm not particularly proud of it.

@wiiaboo
Contributor Author

@wiiaboo wiiaboo commented Jul 22, 2015

Seems to be an issue not just with subtitles but with resolutions too. At least on my connection, it takes half-a-dozen seconds for each "media info" page to download, even if I just request one resolution.

@humitos

@humitos humitos commented Aug 11, 2015

I'm having a similar problem: youtube-dl doesn't download just the requested subtitles. Take a look at this example run:

$ youtube-dl --verbose --sub-lang "en,es" http://www.ted.com/talks/lang/es/john_hodgman_s_brief_digression
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'--restrict-filenames', u'--retries', u'50', u'--continue', u'--verbose', u'--sub-lang', u'en,es', u'http://www.ted.com/talks/lang/es/john_hodgman_s_brief_digression']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2015.08.09
[debug] Git HEAD: 9f3da13
[debug] Python version 2.7.6 - Linux-3.13.0-57-generic-x86_64-with-Ubuntu-14.04-trusty
[debug] exe versions: avconv 9.18-6, avprobe 9.18-6
[debug] Proxy map: {}
[ted] john_hodgman_s_brief_digression: Downloading webpage
[ted] john_hodgman_s_brief_digression: Extracting information
[ted] john_hodgman_s_brief_digression: Downloading m3u8 information
WARNING: Your copy of avconv is outdated and unable to properly mux separate video and audio files, youtube-dl will download single file media. Update avconv to version 10-0 or newer to fix this.
[debug] Invoking downloader on u'http://download.ted.com/talks/JohnHodgman_2008-480p.mp4?apikey=489b859150fc58263f17110eeb44ed5fba4a3b22'
[download] Resuming download at byte 1865239
[download] Destination: John_Hodgman_-_Una_breve_digresi_n_sobre_asuntos_del_tiempo_perdido-374.mp4
[download]   2.9% of 110.46MiB at 98.36KiB/s ETA 18:36^C
ERROR: Interrupted by user
$

But if I list the subtitles, they appear:

$ youtube-dl --verbose --list-subs http://www.ted.com/talks/lang/es/john_hodgman_s_brief_digression
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'--restrict-filenames', u'--retries', u'50', u'--continue', u'--verbose', u'--list-subs', u'http://www.ted.com/talks/lang/es/john_hodgman_s_brief_digression']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2015.08.09
[debug] Git HEAD: 9f3da13
[debug] Python version 2.7.6 - Linux-3.13.0-57-generic-x86_64-with-Ubuntu-14.04-trusty
[debug] exe versions: avconv 9.18-6, avprobe 9.18-6
[debug] Proxy map: {}
[ted] john_hodgman_s_brief_digression: Downloading webpage
[ted] john_hodgman_s_brief_digression: Extracting information
[ted] john_hodgman_s_brief_digression: Downloading m3u8 information
Available subtitles for 374:
Language formats
el       srt, ted
en       srt, ted
it       srt, ted
ar       srt, ted
pt-br    srt, ted
cs       srt, ted
es       srt, ted
ru       srt, ted
nl       srt, ted
pt       srt, ted
zh-tw    srt, ted
tr       srt, ted
zh-cn    srt, ted
ro       srt, ted
pl       srt, ted
fr       srt, ted
bg       srt, ted
hr       srt, ted
de       srt, ted
hu       srt, ted
ja       srt, ted
he       srt, ted
sr       srt, ted
ko       srt, ted
sv       srt, ted
$ 

Thanks!

@wiiaboo
Contributor Author

@wiiaboo wiiaboo commented Aug 11, 2015

You need --write-sub in addition to --sub-lang. --sub-lang just selects the ones to download. --all-subs doesn't need --write-sub.
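
As a small illustration of that flag combination, here is a sketch that builds the command line from a list of languages. `build_subtitle_command` is a hypothetical helper, not part of youtube-dl; only the `--write-sub` and `--sub-lang` flags come from the thread.

```python
def build_subtitle_command(url, langs):
    """Build a youtube-dl argv that writes only the given subtitle languages.

    --write-sub turns subtitle writing on; --sub-lang narrows it to `langs`.
    """
    return ['youtube-dl', '--write-sub', '--sub-lang', ','.join(langs), url]


cmd = build_subtitle_command(
    'http://www.ted.com/talks/lang/es/john_hodgman_s_brief_digression',
    ['en', 'es'],
)
```

The resulting list could be handed to `subprocess.run(cmd)`; that step is left out here because it needs youtube-dl installed and network access.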

@humitos

@humitos humitos commented Aug 11, 2015

@wiiaboo thanks a lot! It worked! I think it shouldn't be necessary to add that option; it doesn't make sense to me :)

@wiiaboo
Contributor Author

@wiiaboo wiiaboo commented Oct 10, 2015

There's another way to associate the language names with the codes: by reading the page's language selection. Example:

languages = {k: v for (v, k) in re.findall(r';([a-z]{2}[A-Z]{2})[^ ]+ data-language="([^"]+)', webpage)}

Is there any way for _get_subtitles or _extract_subtitles to know which languages were requested?
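
To show what the one-liner above produces, here is a self-contained sketch that runs the same regex over a made-up HTML fragment. The fragment is an assumption shaped to match the pattern, not a copy of the real page, and `parse_language_map` is a hypothetical wrapper name.

```python
import re

# Invented markup for illustration only; the real page may differ.
SAMPLE_WEBPAGE = (
    '<a href="#" onclick="setLang(&quot;enUS&quot;);" data-language="en">English</a>\n'
    '<a href="#" onclick="setLang(&quot;ptBR&quot;);" data-language="pt">Portuguese</a>\n'
)


def parse_language_map(webpage):
    """Map each data-language value to the xxYY code found just before it."""
    pairs = re.findall(
        r';([a-z]{2}[A-Z]{2})[^ ]+ data-language="([^"]+)', webpage)
    # Each pair is (code, data-language value); the dict is keyed by the value.
    return {k: v for (v, k) in pairs}


languages = parse_language_map(SAMPLE_WEBPAGE)
```

Note that the dict is keyed by the `data-language` attribute, so two entries sharing the same attribute value would collapse into one.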

5 participants