mewatch unable to download video; Unable to download JSON metadata #32043

Open
6 tasks done
TechvitalCompitar opened this issue Apr 14, 2023 · 19 comments · May be fixed by #32172
Labels
broken-IE problem with existing site extraction patch-available

Comments

@TechvitalCompitar

Checklist

  • I'm reporting a broken site support issue
  • I've verified that I'm running youtube-dl version 2021.12.17
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar bug reports including closed ones
  • I've read bugs section in FAQ

Verbose log

PASTE VERBOSE LOG HERE

Description

WRITE DESCRIPTION HERE

Hi, I have recently been unable to download any video from mewatch; I receive this error whenever I try to download a video.
image

@dirkf
Contributor

dirkf commented Apr 14, 2023

I checked this earlier report: yt-dlp/yt-dlp#6718

The site is obviously not working in the way it did before.

The failure is on access to this API URL: http://tvpapi.as.tvinci.com/v2_9/gateways/jsonpostgw.aspx?m=GetMediaInfo

Either this API no longer works, or there is a newer version (.../v3_0/... ?). This would have to be identified by tracing browser requests, or by reverse engineering the site JS, or from secret knowledge. This might have to be done in-region.

However there is an outstanding PR #25898 which does update the API version to v3_9. Please try that.

CC: @hueyy (PR author)

@dirkf
Contributor

dirkf commented Apr 15, 2023

Apparently the original video host tvinci.com was acquired by Kaltura; however the transitional API URL added in the PR is also 404 now.

Probably the site is using the Kaltura hosting directly. For other sites that use Kaltura we can form the pseudo-URL kaltura:{partner_id}:{entry_id}, where partner_id is linked to the site and entry_id identifies the media item.

As OP's example is a super-long URL in an image I won't be bothering to test it (see manual: BUGS). The page in the yt-dlp issue gives this

  • partner_id: 2082301, from this URL loaded in the page https://cdnapisec.kaltura.com/p/2082301/embedPlaykitJs/uiconf_id/49453092/
  • entry_id: 1_g9ihx6sz, from the hydration JSON.
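As a rough illustration of how the two IDs combine, here is a minimal sketch that pulls the partner ID out of a Kaltura embed URL and forms the pseudo-URL. The helper names are made up for this example, and the `/p/<digits>/` pattern is only inferred from the URL quoted above.

```python
import re


def partner_id_from_url(kaltura_url):
    """Return the digits after '/p/' in a Kaltura URL, or None (hypothetical helper)."""
    m = re.search(r'/p/(\d+)/', kaltura_url)
    return m.group(1) if m else None


def kaltura_pseudo_url(partner_id, entry_id):
    """Build the kaltura:{partner_id}:{entry_id} pseudo-URL described above."""
    return 'kaltura:%s:%s' % (partner_id, entry_id)


embed_url = 'https://cdnapisec.kaltura.com/p/2082301/embedPlaykitJs/uiconf_id/49453092/'
pseudo_url = kaltura_pseudo_url(partner_id_from_url(embed_url), '1_g9ihx6sz')
print(pseudo_url)
```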

Using yt-dl on kaltura:2082301:1_g9ihx6sz: An extractor error has occurred. (caused by KeyError(u'dataUrl',))

With yt-dlp: Kaltura said: Invalid entry id ["1_g9ihx6sz"]

Perhaps this has expired?

@october262

this episode works for me - https://www.mewatch.sg/watch/Titoudao-Inspired-by-the-True-Story-of-a-Wayang-Star-E1-Matchstick-Girl-136853
but you need to use a browser add-on called the stream detector
to download the episode - yt-dlp --no-part --restrict-filenames -N 4 --user-agent "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36" --referer "https://www.mewatch.sg/" "https://cdnapisec.kaltura.com/p/2082311/sp/208231100/playManifest/protocol/https/entryId/0_ie93g2ql/format/mpegdash/tags/web_hd/f/a.mpd?clientTag=html5:v2.0.0&playSessionId=7ac32418-35f1-35a0-233a-e2a25f47ab7f:b9995535-1445-34c0-88bc-87760b867929"

tried the awards ceremony but it is region locked.

@dirkf
Contributor

dirkf commented Apr 15, 2023

The partner ID seems to have changed since the page from the yt-dlp issue was created. Now 2082311, was 2082301.

The modified pseudo-URL kaltura:2082311:1_g9ihx6sz works in both yt-dl and yt-dlp. -f worst gives http://cdnapi.kaltura.com/p/2082311/sp/208231100/playManifest/entryId/1_g9ihx6sz/format/url/protocol/http/flavorId/1_d730hmji which is gettable from UK.

Similarly kaltura:2082311:0_ie93g2ql is playable from yt-dlp with -f worst -o - | mpv -.
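For comparison, the DASH manifest URL quoted earlier in the thread appears to follow a fixed pattern around the same two IDs. The sketch below rebuilds it; the function name is hypothetical, and treating the `sp` component as the partner ID followed by `00` is an inference from the single partner seen here, not documented behaviour. The query parameters (`clientTag`, `playSessionId`) are left out.

```python
def dash_manifest_url(partner_id, entry_id):
    """Assemble a playManifest URL in the shape of the one quoted in this thread.

    Assumption: the 'sp' path component is the partner ID followed by '00',
    as in the only example seen here (2082311 -> 208231100).
    """
    return (
        'https://cdnapisec.kaltura.com/p/{p}/sp/{p}00/playManifest'
        '/protocol/https/entryId/{e}/format/mpegdash/tags/web_hd/f/a.mpd'
    ).format(p=partner_id, e=entry_id)


manifest_url = dash_manifest_url(2082311, '0_ie93g2ql')
print(manifest_url)
```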

@dirkf
Contributor

dirkf commented Apr 15, 2023

This MeWatchIE._real_extract() seems to work:

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(
            url, video_id, note='Downloading video page')
        page_data = self._search_regex(
            r'(?s)window\s*\.\s*__data\s*=\s*(\{.*?\})\s*</script>',
            webpage, 'hydration JSON')
        page_data = self._parse_json(page_data, video_id)
        partner_id = traverse_obj(
            page_data, ('app', 'config', 'playback', 'kalturaThumbnailBaseUrl'),
            expected_type=lambda u: (url_or_none(u) or '').rstrip('/').rpartition('/')[2] or 2082311)
        show_data = traverse_obj(page_data,
                                 ('cache', 'page', Ellipsis, 'entries',
                                  lambda _, v: v['item']['id'] == video_id),
                                 get_all=False)
        
        entry_id = traverse_obj(show_data, ('item', 'customFields', 'EntryId'))

        txt_or_none = lambda x: x.strip() or None

        return merge_dicts(
            {'_type': 'url_transparent'},
            self.url_result(
                'kaltura:%s:%s' % (partner_id, entry_id), ie='Kaltura', video_id=video_id),
            {
                'title': traverse_obj(show_data, 'title', ('item', ('title', ('customFields', 'sortTitle'), 'episodeName')),
                                      get_all=False, expected_type=txt_or_none) or self._generic_title(url),
                'description': traverse_obj(show_data, ('item', ('description', 'shortDescription')), get_all=False, expected_type=txt_or_none),
                'uploader': traverse_obj(show_data, ('item', 'distributor'), expected_type=txt_or_none),
                'categories': traverse_obj(show_data, ('item', 'genres', Ellipsis), expected_type=txt_or_none),
                'episode_number': traverse_obj(show_data, ('item', 'episodeNumber'), expected_type=int_or_none),
                'episode': traverse_obj(show_data, ('item', 'episodeName'), expected_type=lambda x: re.sub(r'^Ep\s+\d+\s+(.*?)\s*$', r'\1', x) or None),
                'season_id': traverse_obj(show_data, ('item', 'seasonId'), expected_type=txt_or_none),
                'series_id': traverse_obj(show_data, ('item', ('showId', ('season', 'show', 'id'))), get_all=False, expected_type=txt_or_none),
                'season_number': traverse_obj(show_data, ('item', 'season', 'seasonNumber'), expected_type=int_or_none),
                'season': traverse_obj(show_data, ('item', 'season', 'title'), expected_type=txt_or_none),
                'series': traverse_obj(show_data, ('item', 'season', 'show', 'title'), expected_type=txt_or_none),
            })

I didn't investigate how ToggleIE should be updated. The first test fails in the same way as the MeWatch pages.
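The hydration-JSON lookup in the `_real_extract()` above can be tried outside youtube-dl with only the standard library. This is a sketch under assumptions: the sample page is invented and mimics only the keys the extractor reads, `cache.page` is assumed to be a dict of pages, and `find_entry_id` is a hypothetical name.

```python
import json
import re

# Hypothetical page: the real hydration JSON is far larger; only the keys
# that the extractor above reads are mimicked here.
SAMPLE_PAGE = '''
<html><body><script>
window.__data = {"cache": {"page": {"/watch/x": {"entries": [
  {"item": {"id": "368951", "customFields": {"EntryId": "1_4n8bmm4x"}}}
]}}}}
</script></body></html>
'''


def find_entry_id(webpage, video_id):
    """Return the Kaltura entry ID for video_id, or None if it is not found."""
    m = re.search(
        r'(?s)window\s*\.\s*__data\s*=\s*(\{.*?\})\s*</script>', webpage)
    if not m:
        return None
    page_data = json.loads(m.group(1))
    # Plain-Python counterpart of the traverse_obj() path
    # ('cache', 'page', Ellipsis, 'entries', <id filter>).
    for page in page_data.get('cache', {}).get('page', {}).values():
        for entry in page.get('entries') or []:
            item = entry.get('item') or {}
            if item.get('id') == video_id:
                return (item.get('customFields') or {}).get('EntryId')
    return None


entry_id = find_entry_id(SAMPLE_PAGE, '368951')
print(entry_id)
```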

@dirkf
Contributor

dirkf commented Apr 16, 2023

Apparently the ToggleIE tests are all expired. New examples welcome.

@benjaminyam

benjaminyam commented Apr 19, 2023

I just realized what is being asked for ToggleIE: that URL format no longer exists. All URLs are in the MeWatchIE format now, so we can just go with MeWatchIE and not bother with the ToggleIE format.

@zengjiawei98

I am not sure how I can help but I hope my info can help in some way.

Say I wanted to download this video link,
https://www.mewatch.sg/watch/The-Star-Athlete-E9-Floorball-368951

It creates a .mpd file where it shows that files are hosted on cloudfront.net

And with the .mpd file, I could play the stream via VLC.

These are the contents of the .mpd file:
<?xml version="1.0"?>
<MPD xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:mpeg:dash:schema:mpd:2011" xsi:schemaLocation="urn:mpeg:dash:schema:mpd:2011 http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-DASH_schema_files/DASH-MPD.xsd" type="static" mediaPresentationDuration="PT2727.080S" minBufferTime="PT4S" profiles="urn:mpeg:dash:profile:isoff-main:2011">
  <Period>
    <AdaptationSet id="1" segmentAlignment="true" maxWidth="1920" maxHeight="1080" maxFrameRate="25">
      <SegmentTemplate timescale="1000" media="https://d1gbibt4vs5yna.cloudfront.net/dash/p/2082311/sp/208231100/serveFlavor/entryId/1_4n8bmm4x/v/1/pv/1/ev/17/flavorId/1_,a5kbb1y2,f1tmjacw,dze9tizj,lcfh661u,9szc627d,c8c75745,/forceproxy/true/name/a.mp4.urlset/fragment-$Number$-$RepresentationID$.m4s" initialization="https://d1gbibt4vs5yna.cloudfront.net/dash/p/2082311/sp/208231100/serveFlavor/entryId/1_4n8bmm4x/v/1/pv/1/ev/17/flavorId/1_,a5kbb1y2,f1tmjacw,dze9tizj,lcfh661u,9szc627d,c8c75745,/forceproxy/true/name/a.mp4.urlset/init-$RepresentationID$.mp4" duration="4000" startNumber="1">
      </SegmentTemplate>
      <Representation id="f1-v1-x3" mimeType="video/mp4" codecs="avc1.42c01e" width="640" height="360" frameRate="25" sar="1:1" startWithSAP="1" bandwidth="715430">
      </Representation>
      <Representation id="f2-v1-x3" mimeType="video/mp4" codecs="avc1.42c01e" width="854" height="480" frameRate="25" sar="1:1" startWithSAP="1" bandwidth="1293595">
      </Representation>
      <Representation id="f3-v1-x3" mimeType="video/mp4" codecs="avc1.42c01f" width="960" height="540" frameRate="25" sar="1:1" startWithSAP="1" bandwidth="1811971">
      </Representation>
      <Representation id="f4-v1-x3" mimeType="video/mp4" codecs="avc1.42c028" width="1280" height="720" frameRate="25" sar="1:1" startWithSAP="1" bandwidth="2434897">
      </Representation>
      <Representation id="f5-v1-x3" mimeType="video/mp4" codecs="avc1.640028" width="1920" height="1080" frameRate="25" sar="1:1" startWithSAP="1" bandwidth="3945434">
      </Representation>
    </AdaptationSet>
    <AdaptationSet id="2" segmentAlignment="true">
      <AudioChannelConfiguration schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" value="1"/>
      <SegmentTemplate timescale="1000" media="https://d1gbibt4vs5yna.cloudfront.net/dash/p/2082311/sp/208231100/serveFlavor/entryId/1_4n8bmm4x/v/1/pv/1/ev/17/flavorId/1_,a5kbb1y2,f1tmjacw,dze9tizj,lcfh661u,9szc627d,c8c75745,/forceproxy/true/name/a.mp4.urlset/fragment-$Number$-$RepresentationID$.m4s" initialization="https://d1gbibt4vs5yna.cloudfront.net/dash/p/2082311/sp/208231100/serveFlavor/entryId/1_4n8bmm4x/v/1/pv/1/ev/17/flavorId/1_,a5kbb1y2,f1tmjacw,dze9tizj,lcfh661u,9szc627d,c8c75745,/forceproxy/true/name/a.mp4.urlset/init-$RepresentationID$.mp4" duration="4000" startNumber="1">
      </SegmentTemplate>
      <Representation id="f2-a1-x3" mimeType="audio/mp4" codecs="mp4a.40.2" audioSamplingRate="44100" startWithSAP="1" bandwidth="93589">
      </Representation>
      <Representation id="f4-a1-x3" mimeType="audio/mp4" codecs="mp4a.40.2" audioSamplingRate="44100" startWithSAP="1" bandwidth="125588">
      </Representation>
      <Representation id="f6-a1-x3" mimeType="audio/mp4" codecs="mp4a.40.2" audioSamplingRate="44100" startWithSAP="1" bandwidth="64001">
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
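To make the manifest easier to read, here is a small sketch of how a `SegmentTemplate` with `$Number$` addressing expands into segment URLs. It is a simplification, not what any downloader actually does: only `PT<seconds>S` durations and the two placeholders used above are handled, and the sample manifest keeps the timing attributes but swaps the long CloudFront URLs for a placeholder host.

```python
import math
import re
import xml.etree.ElementTree as ET

NS = {'mpd': 'urn:mpeg:dash:schema:mpd:2011'}

# Trimmed copy of the manifest above: same timing attributes, but the long
# CloudFront URLs are replaced by a placeholder host.
SAMPLE_MPD = '''<?xml version="1.0"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     mediaPresentationDuration="PT2727.080S" minBufferTime="PT4S">
  <Period>
    <AdaptationSet id="1">
      <SegmentTemplate timescale="1000" duration="4000" startNumber="1"
          media="https://cdn.example.invalid/fragment-$Number$-$RepresentationID$.m4s"
          initialization="https://cdn.example.invalid/init-$RepresentationID$.mp4"/>
      <Representation id="f5-v1-x3" bandwidth="3945434"/>
    </AdaptationSet>
  </Period>
</MPD>'''


def media_segment_urls(mpd_text, representation_id):
    """Expand the first SegmentTemplate into media segment URLs."""
    root = ET.fromstring(mpd_text)
    total = float(re.fullmatch(
        r'PT([\d.]+)S', root.get('mediaPresentationDuration')).group(1))
    tmpl = root.find('.//mpd:SegmentTemplate', NS)
    # Segment length in seconds: duration is expressed in timescale units.
    seg_len = int(tmpl.get('duration')) / int(tmpl.get('timescale'))
    start = int(tmpl.get('startNumber', '1'))
    count = math.ceil(total / seg_len)
    return [
        tmpl.get('media')
        .replace('$Number$', str(n))
        .replace('$RepresentationID$', representation_id)
        for n in range(start, start + count)
    ]


urls = media_segment_urls(SAMPLE_MPD, 'f5-v1-x3')
print(len(urls), urls[0])
```

With 4-second segments over 2727.08 seconds this gives 682 media segments per representation; adding the initialization segment would be consistent with the 683 fragments reported in a yt-dlp log later in this thread.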

@dirkf
Contributor

dirkf commented Apr 19, 2023

For URLs like that we know what to do, but it doesn't obviously involve DASH:

$ python -m youtube_dl -v -F 'https://www.mewatch.sg/watch/The-Star-Athlete-E9-Floorball-368951'
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-F', u'https://www.mewatch.sg/watch/The-Star-Athlete-E9-Floorball-368951']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: d7b502a72
[debug] Python version 2.7.18 (CPython) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[mewatch] Extracting URL: https://www.mewatch.sg/watch/The-Star-Athlete-E9-Floorball-368951
[mewatch] 368951: Downloading video page
[Kaltura] Extracting URL: kaltura:2082311:1_4n8bmm4x
[Kaltura] 1_4n8bmm4x: Downloading video info JSON
[Kaltura] 1_4n8bmm4x: Downloading m3u8 information
[info] Available formats for 1_4n8bmm4x:
format code        extension  resolution note
hls-audio-Chinese  mp4        audio only [zh] Chinese 
mp4-65             mp4        audio only   65k , isom container, 0fps, audio@ 65k, ~21.26MiB
mp4-195            mp4        320x180     195k , isom container, avc1@ 195k, 25fps, audio@  0k, ~63.60MiB
hls-222            mp4        320x180     222k video@ 222k, audio@  0k
mp4-472            mp4        480x270     472k , isom container, avc1@ 472k, 25fps, audio@  0k, ~153.73MiB
hls-512            mp4        480x270     512k video@ 512k, audio@  0k
mp4-789            mp4        640x360     789k , isom container, avc1@ 789k, 25fps, audio@  0k, ~256.64MiB
hls-844            mp4        640x360     844k video@ 844k, audio@  0k
mp4-1399           mp4        854x480    1399k , isom container, avc1@1399k, 25fps, audio@  0k, ~455.00MiB
hls-1482           mp4        854x480    1482k video@1482k, audio@  0k
mp4-1917           mp4        960x540    1917k , isom container, avc1@1917k, 25fps, audio@  0k, ~623.53MiB
hls-2024           mp4        960x540    2024k video@2024k, audio@  0k
mp4-2572           mp4        1280x720   2572k , isom container, avc1@2572k, 25fps, audio@  0k, ~836.39MiB
mp4-4084           mp4        1920x1080  4084k , isom container, avc1@4084k, 25fps, audio@  0k, ~1.30GiB (best)
$

@dirkf dirkf added broken-IE problem with existing site extraction patch-available labels Apr 19, 2023
@benjaminyam

@dirkf: Can you check if the downloaded files are playable? I was able to "download" using your mewatch _real_extract code, but the output file was not playable in VLC.

@dirkf
Contributor

dirkf commented Apr 21, 2023

python -m youtube_dl -v -f worst -o - 'https://www.mewatch.sg/watch/The-Star-Athlete-E9-Floorball-368951' | mpv - gives a grey screen and lots of decode errors from mpv for me. I assume it's encrypted and we ought to find out how that can be detected from the metadata.

Same for 'https://www.mewatch.sg/watch/Titoudao-Inspired-by-the-True-Story-of-a-Wayang-Star-E1-Matchstick-Girl-136853'.

Maybe all shows are "protected"? This needs to be tested in-region using a browser with DRM disabled: how?
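For DASH streams, one conventional place to look for such a signal is a `ContentProtection` element in the manifest. Whether mewatch's manifests carry it for the affected formats is not established in this thread, so the following is only a sketch of the check; both sample manifests are invented and `has_content_protection` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

DASH_NS = '{urn:mpeg:dash:schema:mpd:2011}'


def has_content_protection(mpd_text):
    """True if any ContentProtection element appears in the manifest."""
    root = ET.fromstring(mpd_text)
    return next(root.iter(DASH_NS + 'ContentProtection'), None) is not None


# Both manifests are made-up minimal examples, not captured from mewatch.
CLEAR_MPD = '''<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"><Period>
<AdaptationSet><Representation id="v1"/></AdaptationSet></Period></MPD>'''
PROTECTED_MPD = '''<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"><Period>
<AdaptationSet>
<ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc"/>
<Representation id="v1"/></AdaptationSet></Period></MPD>'''

print(has_content_protection(CLEAR_MPD), has_content_protection(PROTECTED_MPD))
```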

@benjaminyam

benjaminyam commented Apr 21, 2023

this episode works for me - https://www.mewatch.sg/watch/Titoudao-Inspired-by-the-True-Story-of-a-Wayang-Star-E1-Matchstick-Girl-136853 but you need to use a browser add-on called the stream detector to download the episode - yt-dlp --no-part --restrict-filenames -N 4 --user-agent "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36" --referer "https://www.mewatch.sg/" "https://cdnapisec.kaltura.com/p/2082311/sp/208231100/playManifest/protocol/https/entryId/0_ie93g2ql/format/mpegdash/tags/web_hd/f/a.mpd?clientTag=html5:v2.0.0&playSessionId=7ac32418-35f1-35a0-233a-e2a25f47ab7f:b9995535-1445-34c0-88bc-87760b867929"

tried the awards ceremony but it is region locked.

@dirkf: The yt-dlp command provided by @october262 works for me in the correct region, but if I pass https://www.mewatch.sg/watch/Titoudao-Inspired-by-the-True-Story-of-a-Wayang-Star-E1-Matchstick-Girl-136853 to yt-dlp using your _real_extract(), I get the grey screen you mention.

@dirkf
Contributor

dirkf commented Apr 22, 2023

The HLS formats seem to work, but the Kaltura extractor doesn't know about DASH. Using a similar technique to that used for HLS gets some DASH formats but they give 400.

Passing the original URL through to Kaltura (kaltura_url = smuggle_url(kaltura_url, {'source_url': url})) seems like a good idea, but the plain formats (e.g. worst) are even less playable: "Failed to recognize file format." ffprobe produces errors and then identifies H.264+AAC(LC) video.

Maybe some browser tracing would show how the playSessionId query parameter is being generated. The other parts of the quoted DASH URL all seem to be accessible.

It might also be useful to know what @zengjiawei98's MPD URL was.

@chlee00

chlee00 commented Apr 24, 2023

3 URLs were captured with the HLS stream detector:

https://d1gbibt4vs5yna.cloudfront.net/dash/p/2082311/sp/208231100/serveFlavor/entryId/1_4n8bmm4x/v/1/pv/1/ev/17/flavorId/1_,a5kbb1y2,f1tmjacw,dze9tizj,lcfh661u,9szc627d,c8c75745,/forceproxy/true/name/a.mp4.urlset/manifest.mpd

https://cdnapisec.kaltura.com/p/2082311/sp/208231100/playManifest/protocol/https/entryId/1_4n8bmm4x/format/mpegdash/tags/web_hd/f/a.mpd?clientTag=html5:v2.0.0&playSessionId=189b24a0-13c2-cb94-0f18-d61f8161d74b:c87f8895-e1f8-045c-be8d-c4885f66bef4

https://rest-as.ott.kaltura.com/api_v3/service/assetFile/action/playManifest/partnerId/147/assetId/1461060/assetType/media/assetFileId/18641451/contextType/PLAYBACK/isAltUrl/False/ks/djJ8MTQ3fK4vKULHMeo0NLLwFzN8mlZbK3sx9_NBc5rflsZ5VulcejRvAfmFnqR53pswqosPRVNF1rV6nq2H6deDViKKkwd9B-SrukEEEKxByUVIga__QcytKI5F9yhx_jFXX2pBDzyXr4011Rs-93khQN18wqFlStF9d-7ADZ7vL3odzHNnAa9xPSyMQX7pw39GivAYhKgj1LDDmt-8EgoQVcB5GxcFiq0Nt46plYInJEMWlitVXZQAwLZWo7wCXjuXIjBPHql6zIEIFleeHnFheB1dZOfz2FvbBOuc89s7f_1bsOQm-t_xIiWipOxXgvs14_2f587EcwdoU_CtpcOf4ccyI1MLQFKpgV5dEIKNOI9zzflZBq05-GTGriQoNJLTP9JyIq7DvaTZdfB3MXbSa6iAb52XyG3A4kOem6mKmshKhCLuVrI9FyTan_juCiVJAJG9LQ67MtfxKbyuYre7fRcq_p-C6Lo8Td_9YqlZdG2TSULV2PBjcZgNHjSYu86PqWTkUXM7RduoNtj1VicyY_2J2OTfQrs0d9Z0tTpDLWZHJxmu/a.mpd?playSessionId=189b24a0-13c2-cb94-0f18-d61f8161d74b:c87f8895-e1f8-045c-be8d-c4885f66bef4&referrer=aHR0cHM6Ly93d3cubWV3YXRjaC5zZy93YXRjaC9UaGUtU3Rhci1BdGhsZXRlLUU5LUZsb29yYmFsbC0zNjg5NTE=&clientTag=html5:v2.0.0
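The `referrer` query parameter in the third URL looks like plain Base64 of the watch-page URL. A quick way to check is sketched below; the host and path are shortened placeholders, while the `referrer` value is copied from the URL above.

```python
import base64
from urllib.parse import parse_qs, urlsplit


def decode_referrer(manifest_url):
    """Return the Base64-decoded 'referrer' query parameter, or None."""
    values = parse_qs(urlsplit(manifest_url).query).get('referrer')
    if not values:
        return None
    return base64.b64decode(values[0]).decode('utf-8')


# Shortened stand-in for the third URL: placeholder host and path, but the
# referrer value is copied from the URL above.
url = (
    'https://rest.example.invalid/playManifest/a.mpd'
    '?playSessionId=x'
    '&referrer=aHR0cHM6Ly93d3cubWV3YXRjaC5zZy93YXRjaC9UaGUtU3Rhci1BdGhsZXRl'
    'LUU5LUZsb29yYmFsbC0zNjg5NTE=&clientTag=html5:v2.0.0'
)
print(decode_referrer(url))
```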

================
Using the 2nd link, the episode can be downloaded. The stream is playable and is 1080p.

yt-dlp -o "S01.E09.mp4" -uV --no-part --restrict-filenames -N 4 --user-agent "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36" --referer "https://www.mewatch.sg/"  --merge-output-format mp4 --ffmpeg-location ffmpeg\bin "https://cdnapisec.kaltura.com/p/2082311/sp/208231100/playManifest/protocol/https/entryId/1_4n8bmm4x/format/mpegdash/tags/web_hd/f/a.mpd?clientTag=html5:v2.0.0&playSessionId=58dd2045-ee1e-5ac8-0784-8d6009fb3144:f895f2d5-d010-31d7-e8af-3b23ba901857"
Type account password and press [Return]:
[generic] Extracting URL: https://cdnapisec.kaltura.com/p/2082311/sp/208231100/playManifest/protocol/https/entryId/1_4n8bmm...d7-e8af-3b23ba901857
[generic] a.mpd?clientTag=html5:v2.0: Downloading webpage
[redirect] Following redirect to https://d1gbibt4vs5yna.cloudfront.net/dash/p/2082311/sp/208231100/serveFlavor/entryId/1_4n8bmm4x/v/1/pv/1/ev/17/flavorId/1_,a5kbb1y2,f1tmjacw,dze9tizj,lcfh661u,9szc627d,c8c75745,/forceproxy/true/name/a.mp4.urlset/manifest.mpd
[generic] Extracting URL: https://d1gbibt4vs5yna.cloudfront.net/dash/p/2082311/sp/208231100/serveFlavor/entryId/1_4n8bmm4x/....urlset/manifest.mpd
[generic] manifest: Downloading webpage
WARNING: [generic] Falling back on generic information extractor
[generic] manifest: Extracting information
[info] manifest: Downloading 1 format(s): f5-v1-x3+f4-a1-x3
[dashsegments] Total fragments: 683
[download] Destination: S01.E09.ff5-v1-x3.mp4
WARNING: The download speed shown is only of one thread. This is a known issue
[download] 100% of    1.25GiB in 00:01:38 at 13.02MiB/s
[dashsegments] Total fragments: 683
[download] Destination: S01.E09.ff4-a1-x3.m4a
WARNING: The download speed shown is only of one thread. This is a known issue
[download] 100% of   41.83MiB in 00:01:20 at 533.70KiB/s
[Merger] Merging formats into "S01.E09.mp4"
Deleting original file S01.E09.ff5-v1-x3.mp4 (pass -k to keep)
Deleting original file S01.E09.ff4-a1-x3.m4a (pass -k to keep)
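Note that the command above used a playSessionId different from the one in the captured URL and the download still succeeded, which hints that the value may not be checked strictly. On that assumption (untested beyond this one observation), a manifest URL could be completed with a freshly generated ID of the same shape, two UUID-style strings joined by a colon. The function name is hypothetical.

```python
import uuid


def with_play_session(manifest_url):
    """Append clientTag and a random playSessionId to a manifest URL.

    Assumption: any two UUID-shaped strings joined by ':' are accepted;
    this mirrors the shape of the captured values and is otherwise untested.
    """
    session_id = '%s:%s' % (uuid.uuid4(), uuid.uuid4())
    sep = '&' if '?' in manifest_url else '?'
    return '%s%sclientTag=html5:v2.0.0&playSessionId=%s' % (
        manifest_url, sep, session_id)


base = ('https://cdnapisec.kaltura.com/p/2082311/sp/208231100/playManifest'
        '/protocol/https/entryId/1_4n8bmm4x/format/mpegdash/tags/web_hd/f/a.mpd')
print(with_play_session(base))
```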

@zengjiawei98

zengjiawei98 commented Apr 24, 2023

3 URLs were captured with the HLS stream detector:

https://d1gbibt4vs5yna.cloudfront.net/dash/p/2082311/sp/208231100/serveFlavor/entryId/1_4n8bmm4x/v/1/pv/1/ev/17/flavorId/1_,a5kbb1y2,f1tmjacw,dze9tizj,lcfh661u,9szc627d,c8c75745,/forceproxy/true/name/a.mp4.urlset/manifest.mpd

https://cdnapisec.kaltura.com/p/2082311/sp/208231100/playManifest/protocol/https/entryId/1_4n8bmm4x/format/mpegdash/tags/web_hd/f/a.mpd?clientTag=html5:v2.0.0&playSessionId=189b24a0-13c2-cb94-0f18-d61f8161d74b:c87f8895-e1f8-045c-be8d-c4885f66bef4

https://rest-as.ott.kaltura.com/api_v3/service/assetFile/action/playManifest/partnerId/147/assetId/1461060/assetType/media/assetFileId/18641451/contextType/PLAYBACK/isAltUrl/False/ks/djJ8MTQ3fK4vKULHMeo0NLLwFzN8mlZbK3sx9_NBc5rflsZ5VulcejRvAfmFnqR53pswqosPRVNF1rV6nq2H6deDViKKkwd9B-SrukEEEKxByUVIga__QcytKI5F9yhx_jFXX2pBDzyXr4011Rs-93khQN18wqFlStF9d-7ADZ7vL3odzHNnAa9xPSyMQX7pw39GivAYhKgj1LDDmt-8EgoQVcB5GxcFiq0Nt46plYInJEMWlitVXZQAwLZWo7wCXjuXIjBPHql6zIEIFleeHnFheB1dZOfz2FvbBOuc89s7f_1bsOQm-t_xIiWipOxXgvs14_2f587EcwdoU_CtpcOf4ccyI1MLQFKpgV5dEIKNOI9zzflZBq05-GTGriQoNJLTP9JyIq7DvaTZdfB3MXbSa6iAb52XyG3A4kOem6mKmshKhCLuVrI9FyTan_juCiVJAJG9LQ67MtfxKbyuYre7fRcq_p-C6Lo8Td_9YqlZdG2TSULV2PBjcZgNHjSYu86PqWTkUXM7RduoNtj1VicyY_2J2OTfQrs0d9Z0tTpDLWZHJxmu/a.mpd?playSessionId=189b24a0-13c2-cb94-0f18-d61f8161d74b:c87f8895-e1f8-045c-be8d-c4885f66bef4&referrer=aHR0cHM6Ly93d3cubWV3YXRjaC5zZy93YXRjaC9UaGUtU3Rhci1BdGhsZXRlLUU5LUZsb29yYmFsbC0zNjg5NTE=&clientTag=html5:v2.0.0

================ Using the 2nd link, the episode can be downloaded. The stream is playable and is 1080p.

yt-dlp -o "S01.E09.mp4" -uV --no-part --restrict-filenames -N 4 --user-agent "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36" --referer "https://www.mewatch.sg/"  --merge-output-format mp4 --ffmpeg-location ffmpeg\bin "https://cdnapisec.kaltura.com/p/2082311/sp/208231100/playManifest/protocol/https/entryId/1_4n8bmm4x/format/mpegdash/tags/web_hd/f/a.mpd?clientTag=html5:v2.0.0&playSessionId=58dd2045-ee1e-5ac8-0784-8d6009fb3144:f895f2d5-d010-31d7-e8af-3b23ba901857"
Type account password and press [Return]:
[generic] Extracting URL: https://cdnapisec.kaltura.com/p/2082311/sp/208231100/playManifest/protocol/https/entryId/1_4n8bmm...d7-e8af-3b23ba901857
[generic] a.mpd?clientTag=html5:v2.0: Downloading webpage
[redirect] Following redirect to https://d1gbibt4vs5yna.cloudfront.net/dash/p/2082311/sp/208231100/serveFlavor/entryId/1_4n8bmm4x/v/1/pv/1/ev/17/flavorId/1_,a5kbb1y2,f1tmjacw,dze9tizj,lcfh661u,9szc627d,c8c75745,/forceproxy/true/name/a.mp4.urlset/manifest.mpd
[generic] Extracting URL: https://d1gbibt4vs5yna.cloudfront.net/dash/p/2082311/sp/208231100/serveFlavor/entryId/1_4n8bmm4x/....urlset/manifest.mpd
[generic] manifest: Downloading webpage
WARNING: [generic] Falling back on generic information extractor
[generic] manifest: Extracting information
[info] manifest: Downloading 1 format(s): f5-v1-x3+f4-a1-x3
[dashsegments] Total fragments: 683
[download] Destination: S01.E09.ff5-v1-x3.mp4
WARNING: The download speed shown is only of one thread. This is a known issue
[download] 100% of    1.25GiB in 00:01:38 at 13.02MiB/s
[dashsegments] Total fragments: 683
[download] Destination: S01.E09.ff4-a1-x3.m4a
WARNING: The download speed shown is only of one thread. This is a known issue
[download] 100% of   41.83MiB in 00:01:20 at 533.70KiB/s
[Merger] Merging formats into "S01.E09.mp4"
Deleting original file S01.E09.ff5-v1-x3.mp4 (pass -k to keep)
Deleting original file S01.E09.ff4-a1-x3.m4a (pass -k to keep)

Thanks for this info! It works, but these commands download fragments instead of the original files. That is fine for now, although it takes a bit more work. Better than nothing, of course! Also, all 3 .mpd links point to the same library and can be downloaded.

@humanitiesclinic

Is anyone working on updating youtube-dl itself to solve this issue, so that a smooth download directly with the youtube-dl command is possible (rather than just a workaround)? I am in Singapore and can access the site URLs without location restrictions. Is there anything I can help with? (I am not fully familiar with the source code, though.)

@shouldsee shouldsee linked a pull request May 9, 2023 that will close this issue
11 tasks
@benjaminyam

benjaminyam commented Aug 23, 2023

This MeWatchIE._real_extract() seems to work:

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(
            url, video_id, note='Downloading video page')
        page_data = self._search_regex(
            r'(?s)window\s*\.\s*__data\s*=\s*(\{.*?\})\s*</script>',
            webpage, 'hydration JSON')
        page_data = self._parse_json(page_data, video_id)
        partner_id = traverse_obj(
            page_data, ('app', 'config', 'playback', 'kalturaThumbnailBaseUrl'),
            expected_type=lambda u: (url_or_none(u) or '').rstrip('/').rpartition('/')[2] or 2082311)
        show_data = traverse_obj(page_data,
                                 ('cache', 'page', Ellipsis, 'entries',
                                  lambda _, v: v['item']['id'] == video_id),
                                 get_all=False)
        
        entry_id = traverse_obj(show_data, ('item', 'customFields', 'EntryId'))

        txt_or_none = lambda x: x.strip() or None

        return merge_dicts(
            {'_type': 'url_transparent'},
            self.url_result(
                'kaltura:%s:%s' % (partner_id, entry_id), ie='Kaltura', video_id=video_id),
            {
                'title': traverse_obj(show_data, 'title', ('item', ('title', ('customFields', 'sortTitle'), 'episodeName')),
                                      get_all=False, expected_type=txt_or_none) or self._generic_title(url),
                'description': traverse_obj(show_data, ('item', ('description', 'shortDescription')), get_all=False, expected_type=txt_or_none),
                'uploader': traverse_obj(show_data, ('item', 'distributor'), expected_type=txt_or_none),
                'categories': traverse_obj(show_data, ('item', 'genres', Ellipsis), expected_type=txt_or_none),
                'episode_number': traverse_obj(show_data, ('item', 'episodeNumber'), expected_type=int_or_none),
                'episode': traverse_obj(show_data, ('item', 'episodeName'), expected_type=lambda x: re.sub(r'^Ep\s+\d+\s+(.*?)\s*$', r'\1', x) or None),
                'season_id': traverse_obj(show_data, ('item', 'seasonId'), expected_type=txt_or_none),
                'series_id': traverse_obj(show_data, ('item', ('showId', ('season', 'show', 'id'))), get_all=False, expected_type=txt_or_none),
                'season_number': traverse_obj(show_data, ('item', 'season', 'seasonNumber'), expected_type=int_or_none),
                'season': traverse_obj(show_data, ('item', 'season', 'title'), expected_type=txt_or_none),
                'series': traverse_obj(show_data, ('item', 'season', 'show', 'title'), expected_type=txt_or_none),
            })

I didn't investigate how ToggleIE should be updated. The first test fails in the same way as the MeWatch pages.

I added ToggleIE back in, as it seems to be used by www.channelnewsasia.com.
This code works for me. Can someone help commit and merge it?

toggle.py

import json
import re

from .common import InfoExtractor
from ..utils import (
    determine_ext,
    float_or_none,
    int_or_none,
    parse_iso8601,
    strip_or_none,
    url_or_none,
    traverse_obj,
    merge_dicts,
)


class ToggleIE(InfoExtractor):
    IE_NAME = 'toggle'
    _VALID_URL = r'(?:https?://(?:(?:www\.)?mewatch|video\.toggle)\.sg/(?:en|zh)/(?:[^/]+/){2,}|toggle:)(?P<id>[0-9]+)'
    _TESTS = [{
        'url': 'http://www.mewatch.sg/en/series/lion-moms-tif/trailers/lion-moms-premier/343115',
        'info_dict': {
            'id': '343115',
            'ext': 'mp4',
            'title': 'Lion Moms Premiere',
            'description': 'md5:aea1149404bff4d7f7b6da11fafd8e6b',
            'upload_date': '20150910',
            'timestamp': 1441858274,
        },
        'params': {
            'skip_download': 'm3u8 download',
        }
    }, {
        'note': 'DRM-protected video',
        'url': 'http://www.mewatch.sg/en/movies/dug-s-special-mission/341413',
        'info_dict': {
            'id': '341413',
            'ext': 'wvm',
            'title': 'Dug\'s Special Mission',
            'description': 'md5:e86c6f4458214905c1772398fabc93e0',
            'upload_date': '20150827',
            'timestamp': 1440644006,
        },
        'params': {
            'skip_download': 'DRM-protected wvm download',
        }
    }, {
        # this also tests correct video id extraction
        'note': 'm3u8 links are geo-restricted, but Android/mp4 is okay',
        'url': 'http://www.mewatch.sg/en/series/28th-sea-games-5-show/28th-sea-games-5-show-ep11/332861',
        'info_dict': {
            'id': '332861',
            'ext': 'mp4',
            'title': '28th SEA Games (5 Show) -  Episode  11',
            'description': 'md5:3cd4f5f56c7c3b1340c50a863f896faa',
            'upload_date': '20150605',
            'timestamp': 1433480166,
        },
        'params': {
            'skip_download': 'DRM-protected wvm download',
        },
        'skip': 'm3u8 links are geo-restricted'
    }, {
        'url': 'http://video.toggle.sg/en/clips/seraph-sun-aloysius-will-suddenly-sing-some-old-songs-in-high-pitch-on-set/343331',
        'only_matching': True,
    }, {
        'url': 'http://www.mewatch.sg/en/clips/seraph-sun-aloysius-will-suddenly-sing-some-old-songs-in-high-pitch-on-set/343331',
        'only_matching': True,
    }, {
        'url': 'http://www.mewatch.sg/zh/series/zero-calling-s2-hd/ep13/336367',
        'only_matching': True,
    }, {
        'url': 'http://www.mewatch.sg/en/series/vetri-s2/webisodes/jeeva-is-an-orphan-vetri-s2-webisode-7/342302',
        'only_matching': True,
    }, {
        'url': 'http://www.mewatch.sg/en/movies/seven-days/321936',
        'only_matching': True,
    }, {
        'url': 'https://www.mewatch.sg/en/tv-show/news/may-2017-cna-singapore-tonight/fri-19-may-2017/512456',
        'only_matching': True,
    }, {
        'url': 'http://www.mewatch.sg/en/channels/eleven-plus/401585',
        'only_matching': True,
    }]

    _API_USER = 'tvpapi_147'
    _API_PASS = '11111'

    def _real_extract(self, url):
        video_id = self._match_id(url)

        params = {
            'initObj': {
                'Locale': {
                    'LocaleLanguage': '',
                    'LocaleCountry': '',
                    'LocaleDevice': '',
                    'LocaleUserState': 0
                },
                'Platform': 0,
                'SiteGuid': 0,
                'DomainID': '0',
                'UDID': '',
                'ApiUser': self._API_USER,
                'ApiPass': self._API_PASS
            },
            'MediaID': video_id,
            'mediaType': 0,
        }

        info = self._download_json(
            'http://tvpapi.as.tvinci.com/v2_9/gateways/jsonpostgw.aspx?m=GetMediaInfo',
            video_id, 'Downloading video info json', data=json.dumps(params).encode('utf-8'))

        title = info['MediaName']

        formats = []
        for video_file in info.get('Files', []):
            video_url, vid_format = video_file.get('URL'), video_file.get('Format')
            if not video_url or video_url == 'NA' or not vid_format:
                continue
            ext = determine_ext(video_url)
            vid_format = vid_format.replace(' ', '')
            # if geo-restricted, m3u8 is inaccessible, but mp4 is okay
            if ext == 'm3u8':
                m3u8_formats = self._extract_m3u8_formats(
                    video_url, video_id, ext='mp4', m3u8_id=vid_format,
                    note='Downloading %s m3u8 information' % vid_format,
                    errnote='Failed to download %s m3u8 information' % vid_format,
                    fatal=False)
                for f in m3u8_formats:
                    # Apple FairPlay Streaming
                    if '/fpshls/' in f['url']:
                        continue
                    formats.append(f)
            elif ext == 'mpd':
                formats.extend(self._extract_mpd_formats(
                    video_url, video_id, mpd_id=vid_format,
                    note='Downloading %s MPD manifest' % vid_format,
                    errnote='Failed to download %s MPD manifest' % vid_format,
                    fatal=False))
            elif ext == 'ism':
                formats.extend(self._extract_ism_formats(
                    video_url, video_id, ism_id=vid_format,
                    note='Downloading %s ISM manifest' % vid_format,
                    errnote='Failed to download %s ISM manifest' % vid_format,
                    fatal=False))
            elif ext == 'mp4':
                formats.append({
                    'ext': ext,
                    'url': video_url,
                    'format_id': vid_format,
                })
        if not formats:
            for meta in (info.get('Metas') or []):
                if (not self.get_param('allow_unplayable_formats')
                        and meta.get('Key') == 'Encryption' and meta.get('Value') == '1'):
                    self.report_drm(video_id)
            # Most likely because geo-blocked if no formats and no DRM

        thumbnails = []
        for picture in info.get('Pictures') or []:
            if not isinstance(picture, dict):
                continue
            pic_url = picture.get('URL')
            if not pic_url:
                continue
            thumbnail = {
                'url': pic_url,
            }
            pic_size = picture.get('PicSize') or ''
            m = re.search(r'(?P<width>\d+)[xX](?P<height>\d+)', pic_size)
            if m:
                thumbnail.update({
                    'width': int(m.group('width')),
                    'height': int(m.group('height')),
                })
            thumbnails.append(thumbnail)

        def counter(prefix):
            return int_or_none(
                info.get(prefix + 'Counter') or info.get(prefix.lower() + '_counter'))

        return {
            'id': video_id,
            'title': title,
            'description': strip_or_none(info.get('Description')),
            'duration': int_or_none(info.get('Duration')),
            'timestamp': parse_iso8601(info.get('CreationDate') or None),
            'average_rating': float_or_none(info.get('Rating')),
            'view_count': counter('View'),
            'like_count': counter('Like'),
            'thumbnails': thumbnails,
            'formats': formats,
        }


class MeWatchIE(InfoExtractor):
    IE_NAME = 'mewatch'
    _VALID_URL = r'https?://(?:(?:www|live)\.)?mewatch\.sg/watch/[^/?#&]+-(?P<id>[0-9]+)'
    _TESTS = [{
        'url': 'https://www.mewatch.sg/watch/Recipe-Of-Life-E1-179371',
        'info_dict': {
            'id': '1008625',
            'ext': 'mp4',
            'title': 'Recipe Of Life 味之道',
            'timestamp': 1603306526,
            'description': 'md5:6e88cde8af2068444fc8e1bc3ebf257c',
            'upload_date': '20201021',
        },
        'params': {
            'skip_download': 'm3u8 download',
        },
    }, {
        'url': 'https://www.mewatch.sg/watch/Little-Red-Dot-Detectives-S2-搜密。打卡。小红点-S2-E1-176232',
        'only_matching': True,
    }, {
        'url': 'https://www.mewatch.sg/watch/Little-Red-Dot-Detectives-S2-%E6%90%9C%E5%AF%86%E3%80%82%E6%89%93%E5%8D%A1%E3%80%82%E5%B0%8F%E7%BA%A2%E7%82%B9-S2-E1-176232',
        'only_matching': True,
    }, {
        'url': 'https://live.mewatch.sg/watch/Recipe-Of-Life-E41-189759',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(
            url, video_id, note='Downloading video page')
        page_data = self._search_regex(
            r'(?s)window\s*\.\s*__data\s*=\s*(\{.*?\})\s*</script>',
            webpage, 'hydration JSON')
        page_data = self._parse_json(page_data, video_id)
        # fall back to the known partner ID if the thumbnail URL is missing or malformed
        partner_id = traverse_obj(
            page_data, ('app', 'config', 'playback', 'kalturaThumbnailBaseUrl'),
            expected_type=lambda u: (url_or_none(u) or '').rstrip('/').rpartition('/')[2] or None) or 2082311
        show_data = traverse_obj(page_data,
                                 ('cache', 'page', Ellipsis, 'entries',
                                  lambda _, v: v['item']['id'] == video_id),
                                 get_all=False)

        entry_id = traverse_obj(show_data, ('item', 'customFields', 'EntryId'))

        txt_or_none = lambda x: x.strip() or None

        return merge_dicts(
            {'_type': 'url_transparent'},
            self.url_result(
                'kaltura:%s:%s' % (partner_id, entry_id), ie='Kaltura', video_id=video_id),
            {
                'title': traverse_obj(show_data, 'title', ('item', ('title', ('customFields', 'sortTitle'), 'episodeName')),
                                      get_all=False, expected_type=txt_or_none) or self._generic_title(url),
                'description': traverse_obj(show_data, ('item', ('description', 'shortDescription')), get_all=False, expected_type=txt_or_none),
                'uploader': traverse_obj(show_data, ('item', 'distributor'), expected_type=txt_or_none),
                'categories': traverse_obj(show_data, ('item', 'genres', Ellipsis), expected_type=txt_or_none),
                'episode_number': traverse_obj(show_data, ('item', 'episodeNumber'), expected_type=int_or_none),
                'episode': traverse_obj(show_data, ('item', 'episodeName'), expected_type=lambda x: re.sub(r'^Ep\s+\d+\s+(.*?)\s*$', r'\1', x) or None),
                'season_id': traverse_obj(show_data, ('item', 'seasonId'), expected_type=txt_or_none),
                'series_id': traverse_obj(show_data, ('item', ('showId', ('season', 'show', 'id'))), get_all=False, expected_type=txt_or_none),
                'season_number': traverse_obj(show_data, ('item', 'season', 'seasonNumber'), expected_type=int_or_none),
                'season': traverse_obj(show_data, ('item', 'season', 'title'), expected_type=txt_or_none),
                'series': traverse_obj(show_data, ('item', 'season', 'show', 'title'), expected_type=txt_or_none),
            })
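
For readers following the `MeWatchIE` change above, here is a minimal standalone sketch of the same lookup using only the standard library; it is not the extractor code itself. It assumes the page embeds its state as `window.__data = {...}` and that the Kaltura partner ID is the last path segment of `kalturaThumbnailBaseUrl`, as the code above does. `SAMPLE_PAGE`, the helper names, the example host, the partner ID `1234567` and the entry ID are all made up for illustration, and the real page layout may differ.

```python
import json
import re

# Hypothetical page snippet; the real mewatch markup may differ.
SAMPLE_PAGE = '''
<script>window.__data = {"app": {"config": {"playback": {
  "kalturaThumbnailBaseUrl": "https://cdn.example.invalid/p/1234567/"}}},
  "cache": {"page": {"/watch/x": {"entries": [
    {"item": {"id": "179371", "customFields": {"EntryId": "1_abcd1234"}}}]}}}}
</script>
'''

# Fallback partner ID taken from the extractor code above.
DEFAULT_PARTNER_ID = '2082311'


def extract_page_data(webpage):
    """Return the JSON object assigned to window.__data, or None if absent."""
    m = re.search(
        r'(?s)window\s*\.\s*__data\s*=\s*(\{.*?\})\s*</script>', webpage)
    return json.loads(m.group(1)) if m else None


def partner_id_from(page_data, default=DEFAULT_PARTNER_ID):
    """Use the last path segment of kalturaThumbnailBaseUrl as the partner ID."""
    base = (((page_data.get('app') or {}).get('config') or {})
            .get('playback') or {}).get('kalturaThumbnailBaseUrl')
    if not isinstance(base, str):
        return default
    return base.rstrip('/').rpartition('/')[2] or default


def find_entry_id(page_data, video_id):
    """Search every cached page for the entry whose item id equals video_id."""
    pages = (page_data.get('cache') or {}).get('page') or {}
    for page in pages.values():
        for entry in (page or {}).get('entries') or []:
            item = entry.get('item') or {}
            # Assumes item ids are strings, like the ID matched from the URL.
            if item.get('id') == video_id:
                return (item.get('customFields') or {}).get('EntryId')
    return None


def kaltura_url(webpage, video_id):
    """Build a 'kaltura:<partner>:<entry>' string, or None if data is missing."""
    page_data = extract_page_data(webpage)
    if page_data is None:
        return None
    entry_id = find_entry_id(page_data, video_id)
    if not entry_id:
        return None
    return 'kaltura:%s:%s' % (partner_id_from(page_data), entry_id)
```

With the sample page, `kaltura_url(SAMPLE_PAGE, '179371')` builds a `kaltura:<partner>:<entry>` string of the kind passed to `url_result` above. A missing `__data` blob or an unmatched ID gives `None` instead of raising.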

@dirkf
Contributor

dirkf commented Aug 24, 2023

See PR #32172.
