
[DISCOVERY] Unable to download .scc subtitles! #19920

Closed
dnlzzxz opened this issue Feb 24, 2019 · 9 comments
@dnlzzxz dnlzzxz commented Feb 24, 2019

Please follow the guide below

  • You will be asked some questions and requested to provide some information, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your issue (like this: [x])
  • Use the Preview tab to see what your issue will actually look like

Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2019.02.18. If it's not, read this FAQ entry and update. Issues with outdated version will be rejected.

  • I've verified and I assure that I'm running youtube-dl 2019.02.18

Before submitting an issue make sure you have:

  • At least skimmed through the README, most notably the FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones
  • Checked that provided video/audio/playlist URLs (if any) are alive and playable in a browser

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other

youtube-dl -a list.txt --cookies cookies.txt --write-sub --add-metadata --geo-verification-proxy http://127.0.0.1:10331 --console-title -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-a', 'list.txt', '--cookies', 'cookies.txt', '--write-sub', '--add-metadata', '--geo-verification-proxy', 'http://127.0.0.1:10331', '--console-title', '-v']
[debug] Batch file urls: ['https://www.discovery.com/tv-shows/ed-stafford-first-man-out/full-episodes/mongolia']
[debug] Encodings: locale cp1252, fs mbcs, out cp850, pref cp1252
[debug] youtube-dl version 2019.02.18
[debug] Python version 3.4.4 (CPython) - Windows-10-10.0.17134
[debug] exe versions: ffmpeg 4.1, ffprobe 4.1
[debug] Proxy map: {}
[Discovery] mongolia: Downloading webpage
[Discovery] mongolia: Downloading JSON metadata
[Discovery] mongolia: Downloading m3u8 information
[debug] Default format spec: bestvideo+bestaudio/best
[info] Writing video subtitles to: Mongolia-5c71ecb16b66d145a86cf55b.en.scc
WARNING: Unable to download subtitle for "en": Unable to download webpage: HTTP Error 403: Forbidden (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
[debug] Invoking downloader on 'https://content-asab1.uplynk.com/e731b07d90b94f3e947bd2357483709d/i.m3u8?tc=1&exp=1551042175&rn=844789622&ct=a&cid=e731b07d90b94f3e947bd2357483709d&ad.pingf=3&ad.customer_id=&ad.nw=&ad.prof=&ad.csid=&ad.vip=64.71.174.91&pp2ip=0&ad.cping=1&ad=fw&rays=cdefghiba&v=2&sig=2fd10b538a36b74d888a458d33f364ba21de0c0239003860575716fd608db68e&pbs=a355943f780a4ddea5913c1cf9ba7925'
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 604
[download] Destination: Mongolia-5c71ecb16b66d145a86cf55b.mp4
[download]   1.0% of ~1.55GiB at  2.52MiB/s ETA 06:02
...
<end of log>

Description of your issue, suggested solution and other information

Unable to download .scc subtitles from the Discovery network sites. Other subtitle formats seem to download fine; only the .scc subtitles fail.

Thank you very much for your time and attention.


@Nii-90 Nii-90 commented Feb 25, 2019

If it's anything like How The Universe Works (also affected by this, since Science Channel is owned by Discovery), there should also be a Closed Captions stream inside the video stream itself (I mean that literally: EIA-608 subtitles packed into the H.264 frames as a substream). You can extract them with FFmpeg:

ffmpeg -f lavfi -i movie=input.ts[out+subcc] -map 0:1 output.srt

If you absolutely need Scenarist subtitles, there's probably something out there that can convert SRT to .scc.
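The tricky part of that FFmpeg invocation is that the lavfi `movie` source takes the filename inside the filter graph, where backslashes, colons, and quotes are special and must be escaped. A minimal Python sketch of building that command (the `extract_eia608_cmd` helper and its escaping rules are my own illustration, not part of youtube-dl or FFmpeg):

```python
def extract_eia608_cmd(video_path: str, out_path: str) -> list:
    """Build the ffmpeg command that pulls EIA-608 captions out of the
    video stream via the lavfi 'movie' source with the [out+subcc] pad."""
    # Inside a lavfi filter graph, backslashes, colons, and quotes are
    # special characters, so escape them before embedding the filename.
    escaped = (video_path.replace("\\", "\\\\")
                         .replace(":", "\\:")
                         .replace("'", "\\'"))
    return [
        "ffmpeg",
        "-f", "lavfi",
        "-i", f"movie='{escaped}'[out+subcc]",  # decode video, expose captions pad
        "-map", "0:1",                          # stream 1 is the eia_608 subtitle pad
        out_path,
    ]

cmd = extract_eia608_cmd("input.mp4", "output.srt")
# Pass cmd to subprocess.run(cmd, check=True) to actually invoke ffmpeg.
```

Quoting the filename inside the graph avoids most shell-vs-filtergraph escaping confusion on paths with spaces.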


@dnlzzxz dnlzzxz commented Feb 25, 2019

@Nii-90 Thank you very much for the heads-up. Actually, to process it via FFmpeg it has to be an MPEG-TS file... But I found this software, https://www.ccextractor.org/public:general:downloads, which is free and has a GUI for Windows; I could extract the .srt from the file in just a few clicks and literally seconds. Happy days!

Thanks again, I didn't know this detail!

@dnlzzxz dnlzzxz closed this Feb 25, 2019

@Nii-90 Nii-90 commented Feb 25, 2019

FFmpeg doesn't require it to be MPEG-TS. It did at one time, but not anymore (or the handling of this particular edge case has gotten better). Whenever I grab HTUW it's definitely ISO Base Media/MP4, but the EIA-608 subs are correctly extracted with the command above. Before, you could still do it, but the timings would be all wrong; not anymore, though.

On other shows (Miracle Workers from TBS, for example) ccextractor works correctly, but not on shows where I've had to reconstruct the chapter layout because ytdl doesn't preserve it (stuff from FOX and HTUW, and presumably other Science Channel/Discovery shows too, since this is a basic problem with Uplynk-handled streams). On those files the 'Closed Captions' indicator no longer appears in the H.264 stream list, but the captions are still there; FFmpeg is required at that point because ccextractor errors out.


@Nii-90 Nii-90 commented Feb 25, 2019

To illustrate:

>ccextractorwin -out=smptett -nobom "How The Universe Works - S07E01 - Nightmares of Neutron Stars.mp4"
CCExtractor 0.87, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: How The Universe Works - S07E01 - Nightmares of Neutron Stars.mp4
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .ttml] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: How The Universe Works - S07E01 - Nightmares of Neutron Stars.mp4
Detected MP4 box with name: ftyp
Detected MP4 box with name: moov
File seems to be a MP4
Analyzing data with GPAC (MP4 library)
Opening 'How The Universe Works - S07E01 - Nightmares of Neutron Stars.mp4': [iso file] Unknown box type gmhd
[iso file] Unknown box type gmin
[ISO file] dataReferenceIndex set to 0 in sample entry, overriding to 1

ccextractor didn't extract anything, probably because it didn't recognize one of the atoms that FFmpeg wrote.

Meanwhile, FFmpeg on the same file:

>ffmpeg -f lavfi -i movie="How The Universe Works - S07E01 - Nightmares of Neutron Stars.mp4"[out+subcc] -map 0:1 test.ass
ffmpeg version r93220+5 master-c679119a73 HEAD-2dbbe2c02c
 contains: datetime new_pkgconfig silent_invoke versioninfo
 Copyright (c) 2000-2019 the FFmpeg developers
  built on Feb 22 2019 22:24:01 with gcc 8.3.0 (GCC)
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 47.102 / 58. 47.102
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  6.101 / 58.  6.101
  libavfilter     7. 48.100 /  7. 48.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
  libpostproc    55.  4.100 / 55.  4.100
[mov,mp4,m4a,3gp,3g2,mj2 @ 00000274d0389400] stream 0, timescale not set
Input #0, lavfi, from 'movie=How The Universe Works - S07E01 - Nightmares of Neutron Stars.mp4[out+subcc]':
  Duration: N/A, start: 0.000000, bitrate: N/A
    Stream #0:0: Video: rawvideo (I420 / 0x30323449), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], 29.97 fps, 29.97 tbr, 90k t
bn, 90k tbc
    Stream #0:1: Subtitle: eia_608
Output #0, ass, to 'test.ass':
  Metadata:
    encoder         : Lavf58.26.101
    Stream #0:0: Subtitle: ass (ssa)
    Metadata:
      encoder         : Lavc58.47.102 ssa
Stream mapping:
  Stream #0:1 -> #0:0 (eia_608 (cc_dec) -> ass (ssa))
Press [q] to stop, [?] for help
size=       7kB time=00:02:48.87 bitrate=   0.3kbits/s speed=1.85x
video:0kB audio:0kB subtitle:5kB other streams:0kB global headers:1kB muxing overhead: 42.817738%

I stopped it early, since the lavfi subcc filter is slow. It properly detects the file as MP4, and it extracts the subtitles.


@dnlzzxz dnlzzxz commented Feb 26, 2019

Huh, that's interesting! But I'm actually able to extract subs from HTUW correctly with ccextractor. I just ran the command and it extracted them flawlessly and converted them to .srt.

Look:

ccextractorwin -out=srt -bom -latin1 "Battle of the Dark Universe-5c6c897f6b66d145a86ced8c.mp4"
CCExtractor 0.87, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc

Input: Battle of the Dark Universe-5c6c897f6b66d145a86ced8c.mp4
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: Latin-1] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

Opening file: Battle of the Dark Universe-5c6c897f6b66d145a86ced8c.mp4
Detected MP4 box with name: ftyp
Detected MP4 box with name: free
Detected MP4 box with name: mdat
File seems to be a MP4
Analyzing data with GPAC (MP4 library)
Opening 'Battle of the Dark Universe-5c6c897f6b66d145a86ced8c.mp4': [iso file] Unknown box type desc
[iso file] Box "data" is invalid in container desc
ok
Track 1, type=vide subtype=avc1
Track 2, type=soun subtype=MPEG
MP4: found 2 tracks: 1 avc and 0 cc
Processing track 1, type=vide subtype=avc1
100%  |  42:29Processing track 2, type=soun subtype=MPEG

Closing media: ok
Found 1 AVC track(s). Found no dedicated CC track(s).


Total frames time:        00:42:29:079  (76396 frames at 29.97fps)

Min PTS:                                00:00:00:000
Max PTS:                                00:42:29:112
Length:                          00:42:29:112
Done, processing time = 19 seconds
Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues

But in this case I used the parameters that were recommended in the GUI, and that's the difference: you're using different parameters. Why?

I played the episode and the characters display fine, and the sync seems fine as well. The most impressive part is the processing time: just 19 seconds. That's amazing!

In practice I'm running a .bat script, for %A IN (*.mp4) DO ccextractorwin -out=srt -bom -latin1 "%A", that takes all the video files in the folder and extracts their subs to .srt files. I don't think it can get any better.
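The same batch-folder idea as a small Python sketch (as an aside, inside an actual .bat file the loop variable must be doubled, %%A; the skip-if-.srt-exists check is my own addition so reruns don't redo work):

```python
import subprocess
from pathlib import Path

def batch_extract(folder: str) -> list:
    """Run ccextractor over every .mp4 in a folder, mirroring the
    'for %A in (*.mp4) do ...' batch loop; returns the commands built."""
    commands = []
    for mp4 in sorted(Path(folder).glob("*.mp4")):
        srt = mp4.with_suffix(".srt")
        if srt.exists():  # skip files already processed on an earlier run
            continue
        cmd = ["ccextractorwin", "-out=srt", "-bom", "-latin1", str(mp4)]
        commands.append(cmd)
        # subprocess.run(cmd, check=True)  # uncomment when ccextractor is installed
    return commands
```

The loop only builds the command lines; uncommenting the `subprocess.run` call executes them one by one.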

I just didn't understand the chapters part you mentioned, though. What do you mean? Do the episodes have chapters, and why would you keep them?


@Nii-90 Nii-90 commented Feb 26, 2019

The reason it succeeded in your HTUW example is that you just told youtube-dl to download it with hls-native (because I know the --hls-prefer-ffmpeg option doesn't operate well on Uplynk), plain as anything else. That's the exact same reason I gave for why Miracle Workers worked, although TBS doesn't use Uplynk and the chapter information is preserved in the file youtube-dl downloads when you use the --add-metadata option.

TTML output and BOM-less UTF-8 is just a personal choice; every output format errors out the same way, because there's something about the concatenated file ccextractor doesn't like (my money's on that gmin box error).

Yes, the videos do have chapters, as you can see when you watch them through the online players (they sit at the spots where ads would be inserted, and those boundaries exist in the *.m3u8 files as well, which is how I can recover them). Partly I keep them for completeness. And while I could certainly download everything twice and then throw away the individual segments, I tried to streamline the process to use the minimum amount of bandwidth, which leads to the problem that the stitched-together file can only have its subs extracted by FFmpeg.
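A sketch of the kind of recovery described: walking an HLS media playlist, summing segment durations, and treating each discontinuity marker as the start of a new chapter. This uses generic HLS tags (#EXTINF, #EXT-X-DISCONTINUITY); the exact tags Uplynk emits at ad boundaries may differ, so take this as an assumption-laden illustration:

```python
def chapter_starts_from_m3u8(manifest_text: str) -> list:
    """Return chapter start times (in seconds) recovered from a media
    playlist: sum #EXTINF durations and start a new chapter at every
    #EXT-X-DISCONTINUITY marker."""
    starts = [0.0]   # the first chapter always begins at 0
    elapsed = 0.0
    for line in manifest_text.splitlines():
        line = line.strip()
        if line.startswith("#EXTINF:"):
            # "#EXTINF:6.006," -> segment duration before the comma
            elapsed += float(line[len("#EXTINF:"):].split(",")[0])
        elif line == "#EXT-X-DISCONTINUITY":
            starts.append(elapsed)
    return starts
```

These start times can then be fed to FFmpeg as chapter metadata when re-muxing the stitched file.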


@dnlzzxz dnlzzxz commented Feb 27, 2019

Oh well, now I get it! Thank you for taking the time to explain, I really appreciate it.
Have good downloads, my friend. See you!


@Nii-90 Nii-90 commented Mar 4, 2019

As a follow-up, after a bit of a run-around, I found out what lay at the heart of the problem with my chapter fix not working with ccextractor: FFmpeg creates a junk track (or rather, likely-redundant iTunes-style chapter info) when muxing chapter-laden metadata into the file, and that is what ccextractor chokes on.

A bug report on FFmpeg's tracker from several years ago pointed to a solution, and now I can generate files without that junk (while retaining all the metadata, cover art/thumbnail, and chapter info), and ccextractor is fine with them.
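One way to spot whether a muxed file picked up a stray chapter track is to list the stream types with ffprobe. The helpers below only build and interpret that command; treating any non-video/non-audio stream as the suspect track is my own heuristic, not something from the FFmpeg bug report:

```python
def ffprobe_types_cmd(path: str) -> list:
    """Build an ffprobe invocation that prints one codec_type per stream
    (e.g. 'video', 'audio', 'data'), one per line, with no headers."""
    return ["ffprobe", "-v", "error",
            "-show_entries", "stream=codec_type",
            "-of", "csv=p=0",
            path]

def has_extra_track(types_output: str) -> bool:
    """True if ffprobe reported any stream that is neither video nor
    audio, e.g. a chapter text track written by the mp4 muxer."""
    types = [t.strip() for t in types_output.strip().splitlines() if t.strip()]
    return any(t not in ("video", "audio") for t in types)
```

Running the built command with `subprocess.run(..., capture_output=True)` and passing its stdout to `has_extra_track` gives a quick yes/no before handing the file to ccextractor.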


@dnlzzxz dnlzzxz commented Mar 14, 2019

Well, I'm glad you worked it out! It's a shame they didn't fix it; someone else is probably struggling with it somewhere.
