[dramafever] Support subtitle extraction in .srt format #6207

Closed
ping opened this issue Jul 13, 2015 · 8 comments

@ping ping commented Jul 13, 2015

The DramaFever extractor currently only extracts subtitles in .ttml format. DramaFever also provides subtitles in .srt format, but the path to the file is only found in the episode/series endpoint (afaik).

Example: http://www.dramafever.com/api/4/episode/series/?cs=DA59dtVXYLxajktV&series_id=4709&page_size=1

{
  "type": "Array",
  "registration_wall_starts_after": 5,
  "num_pages": 4,
  "is_provisioned": true,
  "value": [
    {
      "episode_url": "/drama/4709/1/Oh_My_Ghostess/",
      "subfile": "http://www.dramafever.com/st/sub/ohmyghostess_01_clean_JS.srt",
      "premium": false,
      "is_user_fan_of_series": false,
      "registration_required": false,
      "title": "",
      "new_subfile": "",
      "duration_android": "",
      "release_date": "2015-07-03T23:45:07",
      "twist_thumb": "http://www.dramafever.com/st/img/epth/4709_1.jpg",
      "number": 1,
      "geoblocked": false,
      "is_provisioned": true,
      "duration": "01:05:04",
      "premium_required": false,
      "guid": "4709.1",
      "type": "Episode",
      "id": 18847,
      "has_download_files": false
    }
  ]
}

In my opinion, .srt is more widely supported by media players than .ttml. Can the extractor pull srt subtitles instead, or support an option to choose srt over ttml?
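
For illustration, here is a minimal sketch of how the `subfile` path could be read out of that response. It works on a trimmed copy of the JSON quoted above rather than a live request, and `find_srt_url` is a hypothetical helper name, not part of youtube-dl. It also assumes an empty `subfile` string means no srt is listed, which the sample does not confirm.

```python
import json

# Trimmed copy of the API response quoted above (only the fields used here).
SAMPLE_RESPONSE = """
{
  "type": "Array",
  "num_pages": 4,
  "value": [
    {
      "episode_url": "/drama/4709/1/Oh_My_Ghostess/",
      "subfile": "http://www.dramafever.com/st/sub/ohmyghostess_01_clean_JS.srt",
      "new_subfile": "",
      "number": 1,
      "guid": "4709.1",
      "type": "Episode"
    }
  ]
}
"""


def find_srt_url(response_text, episode_number):
    """Return the .srt URL for the given episode number, or None if absent.

    Hypothetical helper for illustration only.
    """
    data = json.loads(response_text)
    for episode in data.get("value", []):
        if episode.get("number") == episode_number:
            # Assumption: an empty string means no subtitle file is listed.
            return episode.get("subfile") or None
    return None


print(find_srt_url(SAMPLE_RESPONSE, 1))
```

A real extractor would also have to page through `num_pages` to reach later episodes, which is where the extra requests discussed below come from.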

@dstftw dstftw commented Jul 13, 2015

Generally, you should use --convert-subtitles for that.

@ping ping commented Jul 13, 2015

Is ttml supported for conversion? Either I'm doing it wrong, or it doesn't work for me.

$ youtube-dl --convert-subtitles "srt" --sub-lang "English" --write-sub --skip-download --verbose "http://www.dramafever.com/drama/4709/1/oh-my-ghostess/"
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'--convert-subtitles', u'srt', u'--sub-lang', u'English', u'--write-sub', u'--skip-download', u'--verbose', u'http://www.dramafever.com/drama/4709/1/oh-my-ghostess/']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2015.07.07
[debug] Python version 2.7.5 - Darwin-13.4.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 2.6.2, ffprobe 2.6.2, rtmpdump 2.4
[debug] Proxy map: {}
[dramafever] 4709.1: Downloading episode JSON
[dramafever] 4709.1: Downloading f4m manifest
[dramafever] 4709.1: Downloading m3u8 information
[info] Writing video subtitles to: Oh My Ghostess 4709.1-4709.1.English.ttml

@dstftw dstftw commented Jul 13, 2015

Yes. Conversion itself takes place after the download is finished. However, there is a problem when one wants to download just the subtitles.

@ping ping commented Jul 13, 2015

This is unfortunate. I would still argue, though, that if srt is available at the source, it should be the preferred format over ttml.

@dstftw dstftw commented Jul 13, 2015

IIRC srt would require an additional HTTP request; my rationale was to minimize networking for the info extraction pass. Moreover, ttml is more markup-rich, so it's always possible to convert ttml to srt without information loss, but not vice versa.
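
As a rough illustration of the ttml-to-srt direction, here is a minimal sketch of such a conversion. It is not youtube-dl's converter. It assumes the `http://www.w3.org/ns/ttml` namespace, `begin`/`end` attributes in `HH:MM:SS.mmm` form, and `<br/>` for line breaks; DramaFever's files may differ on any of these. The sample document and the function names are made up for the example, and styling is simply dropped.

```python
import xml.etree.ElementTree as ET

# Assumed namespace; older TTML drafts use different ones.
TTML_NS = "{http://www.w3.org/ns/ttml}"

# Made-up sample document, not taken from DramaFever.
SAMPLE_TTML = """<tt xmlns="http://www.w3.org/ns/ttml">
  <body>
    <div>
      <p begin="00:00:01.000" end="00:00:03.500">Hello<br/>there</p>
      <p begin="00:00:04.000" end="00:00:06.000">Second cue</p>
    </div>
  </body>
</tt>
"""


def to_srt_time(clock):
    """Convert 'HH:MM:SS.mmm' (assumed input form) to SRT's 'HH:MM:SS,mmm'."""
    hms, _, frac = clock.partition(".")
    millis = (frac + "000")[:3]  # pad or truncate to milliseconds
    return "{},{}".format(hms, millis)


def cue_text(p):
    """Flatten a <p> element to plain text, turning direct <br/> children into newlines."""
    parts = [p.text or ""]
    for child in p:
        if child.tag == TTML_NS + "br":
            parts.append("\n")
        else:
            # Keep the text of other children (e.g. <span>), drop their styling.
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts).strip()


def ttml_to_srt(ttml_text):
    """Return SRT text with one numbered block per <p> cue."""
    root = ET.fromstring(ttml_text)
    blocks = []
    for index, p in enumerate(root.iter(TTML_NS + "p"), start=1):
        blocks.append("{}\n{} --> {}\n{}\n".format(
            index,
            to_srt_time(p.attrib["begin"]),
            to_srt_time(p.attrib["end"]),
            cue_text(p),
        ))
    return "\n".join(blocks)


print(ttml_to_srt(SAMPLE_TTML))
```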

@ping ping commented Jul 13, 2015

Understood.

  1. The larger file size of a ttml will nullify much of the network saving in my opinion.
  2. I believe DramaFever's original source subs are in srt, and the ttml is generated from that for the purpose of web streaming.

If you compare http://www.dramafever.com/st/subtitle/4709_1_en.xml against http://www.dramafever.com/st/sub/ohmyghostess_01_clean_JS.srt, a line containing <i> in the srt file is encoded (wrongly) as &lt;i&gt; in the ttml version. The ttml file also contains only the bare minimum of style info (just the overall style of the sub) and does not make full use of the markup.

For other sources, ttml may well be preferable, but in DramaFever's case, I do not see the value.
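
To make the escaping point concrete, here is a small sketch comparing two hand-written cues: one where the srt tag was escaped into character data, and one where the italics are expressed with TTML styling. Both fragments are hypothetical and only imitate the pattern described above; they are not copied from DramaFever's files.

```python
import xml.etree.ElementTree as ET

NS = {"tt": "http://www.w3.org/ns/ttml"}
FONT_STYLE = "{http://www.w3.org/ns/ttml#styling}fontStyle"

# Hypothetical cue in which the srt tag was escaped into character data.
ESCAPED = (
    '<tt xmlns="http://www.w3.org/ns/ttml">'
    "<body><div><p>&lt;i&gt;Hello&lt;/i&gt;</p></div></body></tt>"
)

# Hypothetical cue expressing the same italics with TTML styling markup.
STYLED = (
    '<tt xmlns="http://www.w3.org/ns/ttml" '
    'xmlns:tts="http://www.w3.org/ns/ttml#styling">'
    '<body><div><p><span tts:fontStyle="italic">Hello</span></p></div></body></tt>'
)


def first_cue(ttml_text):
    """Return the first <p> element of a TTML document."""
    return ET.fromstring(ttml_text).find(".//tt:p", NS)


escaped_p = first_cue(ESCAPED)
styled_p = first_cue(STYLED)

# After XML parsing, the escaped tag is plain text that a player would display.
print(repr(escaped_p.text))       # '<i>Hello</i>'
print(len(escaped_p))             # 0 child elements

# The styled variant keeps the text clean and carries italics as an attribute.
span = styled_p[0]
print(span.text, span.get(FONT_STYLE))  # Hello italic
```

In the escaped form a ttml renderer has nothing to interpret, so the angle brackets would show up on screen, whereas the same string passed through unchanged into an srt file happens to be valid srt italics.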

@dstftw dstftw commented Jul 13, 2015

I'm talking about the info extraction pass (e.g. --list-sub), not the actual subtitle download; in that case ttml is one request fewer, as I remember.
In fact, srt costs two more requests, since the consumer secret is required as well.

@dstftw dstftw closed this in 1d1dd59 Jul 13, 2015