mewatch unable to download video; Unable to download JSON metadata #32043
Comments
I checked this earlier report: yt-dlp/yt-dlp#6718 The site is obviously not working in the way it did before. The failure is on access to this API URL: http://tvpapi.as.tvinci.com/v2_9/gateways/jsonpostgw.aspx?m=GetMediaInfo Either this API no longer works, or there is a newer version ( However there is an outstanding PR #25898 which does update the API version to CC: @hueyy (PR author) |
Apparently the original video host tvinci.com was acquired by Kaltura; however the transitional API URL added in the PR is also 404 now. Probably the site is using the Kaltura hosting directly. For other sites that use Kaltura we can form the pseudo-URL As OP's example is a super-long URL in an image I won't be bothering to test it (see manual: BUGS). The page in the yt-dlp issue gives this
Using yt-dl on With yt-dlp: Perhaps this has expired? |
This episode works for me: https://www.mewatch.sg/watch/Titoudao-Inspired-by-the-True-Story-of-a-Wayang-Star-E1-Matchstick-Girl-136853 I also tried the awards ceremony, but it is region-locked. |
The partner ID seems to have changed since the page from the yt-dlp issue was created. Now 2082311, was 2082301. The modified pseudo-URL Similarly |
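For reference, the pseudo-URL handed to the Kaltura extractor appears to be just the partner ID and the entry ID joined as `kaltura:<partner_id>:<entry_id>`. A minimal sketch of that, where `kaltura_pseudo_url` is a hypothetical helper name and the entry ID is the one that shows up in the verbose log further down this thread:

```python
def kaltura_pseudo_url(partner_id, entry_id):
    """Join partner ID and entry ID into the 'kaltura:<partner>:<entry>' form.

    Hypothetical helper for illustration; it is not part of youtube-dl.
    """
    if not partner_id or not entry_id:
        raise ValueError('both partner_id and entry_id are required')
    return 'kaltura:%s:%s' % (partner_id, entry_id)


# Partner ID reported above (was 2082301, now 2082311).
print(kaltura_pseudo_url(2082311, '1_4n8bmm4x'))
```

If the partner ID changes again, only the first argument needs updating, which is why reading it from the page instead of hard-coding it seems preferable.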
This
def _real_extract(self, url):
    video_id = self._match_id(url)
    webpage = self._download_webpage(
        url, video_id, note='Downloading video page')
    page_data = self._search_regex(
        r'(?s)window\s*\.\s*__data\s*=\s*(\{.*?\})\s*</script>',
        webpage, 'hydration JSON')
    page_data = self._parse_json(page_data, video_id)
    # the partner ID is the last path component of the thumbnail base URL
    partner_id = traverse_obj(
        page_data, ('app', 'config', 'playback', 'kalturaThumbnailBaseUrl'),
        expected_type=lambda u: (url_or_none(u) or '').rstrip('/').rpartition('/')[2] or 2082311)
    # find the page entry whose item ID matches the ID from the URL
    show_data = traverse_obj(
        page_data,
        ('cache', 'page', Ellipsis, 'entries',
         lambda _, v: v['item']['id'] == video_id),
        get_all=False)
    entry_id = traverse_obj(show_data, ('item', 'customFields', 'EntryId'))
    txt_or_none = lambda x: x.strip() or None
    # delegate format extraction to the Kaltura extractor, keeping our metadata
    return merge_dicts(
        {'_type': 'url_transparent'},
        self.url_result(
            'kaltura:%s:%s' % (partner_id, entry_id), ie='Kaltura', video_id=video_id),
        {
            'title': traverse_obj(
                show_data, 'title', ('item', ('title', ('customFields', 'sortTitle'), 'episodeName')),
                get_all=False, expected_type=txt_or_none) or self._generic_title(url),
            'description': traverse_obj(show_data, ('item', ('description', 'shortDescription')), get_all=False, expected_type=txt_or_none),
            'uploader': traverse_obj(show_data, ('item', 'distributor'), expected_type=txt_or_none),
            'categories': traverse_obj(show_data, ('item', 'genres', Ellipsis), expected_type=txt_or_none),
            'episode_number': traverse_obj(show_data, ('item', 'episodeNumber'), expected_type=int_or_none),
            'episode': traverse_obj(show_data, ('item', 'episodeName'), expected_type=lambda x: re.sub(r'^Ep\s+\d+\s+(.*?)\s*$', r'\1', x) or None),
            'season_id': traverse_obj(show_data, ('item', 'seasonId'), expected_type=txt_or_none),
            'series_id': traverse_obj(show_data, ('item', ('showId', ('season', 'show', 'id'))), get_all=False, expected_type=txt_or_none),
            'season_number': traverse_obj(show_data, ('item', 'season', 'seasonNumber'), expected_type=int_or_none),
            'season': traverse_obj(show_data, ('item', 'season', 'title'), expected_type=txt_or_none),
            'series': traverse_obj(show_data, ('item', 'season', 'show', 'title'), expected_type=txt_or_none),
        })
I didn't investigate how |
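The partner ID lookup in the code above takes the last path component of `kalturaThumbnailBaseUrl` and falls back to 2082311. A standalone sketch of just that step, using only the standard library; `partner_id_from_thumbnail_base` is a hypothetical name and the sample URL is made up (the real base URL is not shown in this thread):

```python
DEFAULT_PARTNER_ID = '2082311'  # value reported in this thread; it has changed before


def partner_id_from_thumbnail_base(base_url, default=DEFAULT_PARTNER_ID):
    """Return the last path component of the thumbnail base URL if it is numeric.

    Hypothetical helper; falls back to `default` when the value is missing,
    is not an http(s) URL, or does not end in digits.
    """
    if not isinstance(base_url, str) or not base_url.startswith(('http://', 'https://')):
        return default
    last = base_url.rstrip('/').rpartition('/')[2]
    return last if last.isdigit() else default


# Made-up URL in the assumed shape '<host>/.../p/<partner_id>/'
print(partner_id_from_thumbnail_base('https://thumbs.example.invalid/p/2082311/'))
print(partner_id_from_thumbnail_base(None))
```

Unlike the one-liner above, this variant also applies the default when the value is absent altogether and rejects a non-numeric last component, which may or may not be what is wanted.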
Apparently the |
I just realized that the URL format handled by ToggleIE no longer exists. All URLs are in the MeWatchIE format now, so we can just go with MeWatchIE and not bother with the ToggleIE format. |
I am not sure how I can help, but I hope this information is useful. Say I wanted to download this video link: it creates a .mpd file, which shows that the files are hosted on cloudfront.net. With the .mpd file, I could play the stream in VLC. These are the contents of the .mpd file: |
For URLs like that we know what to do, but it doesn't obviously involve DASH:
$ python -m youtube_dl -v -F 'https://www.mewatch.sg/watch/The-Star-Athlete-E9-Floorball-368951'
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-F', u'https://www.mewatch.sg/watch/The-Star-Athlete-E9-Floorball-368951']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: d7b502a72
[debug] Python version 2.7.18 (CPython) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[mewatch] Extracting URL: https://www.mewatch.sg/watch/The-Star-Athlete-E9-Floorball-368951
[mewatch] 368951: Downloading video page
[Kaltura] Extracting URL: kaltura:2082311:1_4n8bmm4x
[Kaltura] 1_4n8bmm4x: Downloading video info JSON
[Kaltura] 1_4n8bmm4x: Downloading m3u8 information
[info] Available formats for 1_4n8bmm4x:
format code        extension  resolution  note
hls-audio-Chinese  mp4        audio only  [zh] Chinese
mp4-65             mp4        audio only  65k , isom container, 0fps, audio@ 65k, ~21.26MiB
mp4-195            mp4        320x180     195k , isom container, avc1@ 195k, 25fps, audio@ 0k, ~63.60MiB
hls-222            mp4        320x180     222k video@ 222k, audio@ 0k
mp4-472            mp4        480x270     472k , isom container, avc1@ 472k, 25fps, audio@ 0k, ~153.73MiB
hls-512            mp4        480x270     512k video@ 512k, audio@ 0k
mp4-789            mp4        640x360     789k , isom container, avc1@ 789k, 25fps, audio@ 0k, ~256.64MiB
hls-844            mp4        640x360     844k video@ 844k, audio@ 0k
mp4-1399           mp4        854x480     1399k , isom container, avc1@1399k, 25fps, audio@ 0k, ~455.00MiB
hls-1482           mp4        854x480     1482k video@1482k, audio@ 0k
mp4-1917           mp4        960x540     1917k , isom container, avc1@1917k, 25fps, audio@ 0k, ~623.53MiB
hls-2024           mp4        960x540     2024k video@2024k, audio@ 0k
mp4-2572           mp4        1280x720    2572k , isom container, avc1@2572k, 25fps, audio@ 0k, ~836.39MiB
mp4-4084           mp4        1920x1080   4084k , isom container, avc1@4084k, 25fps, audio@ 0k, ~1.30GiB (best)
$ |
@dirkf: Can you check if the downloaded files are playable? I was able to "download" using your mewatch _real_extract code, but the output file was not playable in VLC. |
Same for 'https://www.mewatch.sg/watch/Titoudao-Inspired-by-the-True-Story-of-a-Wayang-Star-E1-Matchstick-Girl-136853'. Maybe all shows are "protected"? This needs to be tested in-region using a browser with DRM disabled, but how? |
@dirkf: The yt-dlp command provided by @october262 works for me in the correct region, but if I pass https://www.mewatch.sg/watch/Titoudao-Inspired-by-the-True-Story-of-a-Wayang-Star-E1-Matchstick-Girl-136853 to yt-dlp using your _real_extract, the result shows the grey screen you mention.
The HLS formats seem to work, but the Kaltura extractor doesn't know about DASH. Using a similar Passing the original URL through to Kaltura ( Maybe some browser tracing would show how the It might also be useful to know what @zengjiawei98's MPD URL was. |
Three URLs were captured with an HLS stream detector.
================
yt-dlp -o "S01.E09.mp4" -uV --no-part --restrict-filenames -N 4 --user-agent "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36" --referer "https://www.mewatch.sg/" --merge-output-format mp4 --ffmpeg-location ffmpeg\bin "https://cdnapisec.kaltura.com/p/2082311/sp/208231100/playManifest/protocol/https/entryId/1_4n8bmm4x/format/mpegdash/tags/web_hd/f/a.mpd?clientTag=html5:v2.0.0&playSessionId=58dd2045-ee1e-5ac8-0784-8d6009fb3144:f895f2d5-d010-31d7-e8af-3b23ba901857"
Type account password and press [Return]:
[generic] Extracting URL: https://cdnapisec.kaltura.com/p/2082311/sp/208231100/playManifest/protocol/https/entryId/1_4n8bmm...d7-e8af-3b23ba901857
[generic] a.mpd?clientTag=html5:v2.0: Downloading webpage
[redirect] Following redirect to https://d1gbibt4vs5yna.cloudfront.net/dash/p/2082311/sp/208231100/serveFlavor/entryId/1_4n8bmm4x/v/1/pv/1/ev/17/flavorId/1_,a5kbb1y2,f1tmjacw,dze9tizj,lcfh661u,9szc627d,c8c75745,/forceproxy/true/name/a.mp4.urlset/manifest.mpd
[generic] Extracting URL: https://d1gbibt4vs5yna.cloudfront.net/dash/p/2082311/sp/208231100/serveFlavor/entryId/1_4n8bmm4x/....urlset/manifest.mpd
[generic] manifest: Downloading webpage
WARNING: [generic] Falling back on generic information extractor
[generic] manifest: Extracting information
[info] manifest: Downloading 1 format(s): f5-v1-x3+f4-a1-x3
[dashsegments] Total fragments: 683
[download] Destination: S01.E09.ff5-v1-x3.mp4
WARNING: The download speed shown is only of one thread. This is a known issue
[download] 100% of 1.25GiB in 00:01:38 at 13.02MiB/s
[dashsegments] Total fragments: 683
[download] Destination: S01.E09.ff4-a1-x3.m4a
WARNING: The download speed shown is only of one thread. This is a known issue
[download] 100% of 41.83MiB in 00:01:20 at 533.70KiB/s
[Merger] Merging formats into "S01.E09.mp4"
Deleting original file S01.E09.ff5-v1-x3.mp4 (pass -k to keep)
Deleting original file S01.E09.ff4-a1-x3.m4a (pass -k to keep) |
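Going by the manifest URL in the command above, the DASH manifest address looks derivable from the same partner ID and entry ID that go into the `kaltura:` pseudo-URL. A minimal sketch under two assumptions: that the `sp` segment is always the partner ID followed by `00` (inferred from this single URL only), and that the `clientTag` and `playSessionId` query parameters can be left out (untested). `kaltura_dash_manifest_url` is a hypothetical helper, not an existing youtube-dl function.

```python
def kaltura_dash_manifest_url(partner_id, entry_id, tags='web_hd'):
    """Rebuild the playManifest URL pattern seen in the captured command.

    Assumptions (not confirmed): 'sp' is the partner ID with '00' appended,
    and the query string from the captured URL is not required.
    """
    return (
        'https://cdnapisec.kaltura.com/p/{p}/sp/{p}00/playManifest/protocol/https'
        '/entryId/{e}/format/mpegdash/tags/{t}/f/a.mpd'
    ).format(p=partner_id, e=entry_id, t=tags)


print(kaltura_dash_manifest_url(2082311, '1_4n8bmm4x'))
```

The printed URL matches the captured one up to the query string, so it could be fed to an MPD-aware extractor to check whether the assumptions hold.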
Thanks for this info! It works, but this downloads fragments instead of the original files, so it takes a bit more work. Still, better than nothing, of course! Also, all three generated .mpd links point to the same library and can be downloaded. |
Is anyone working on updating youtube-dl itself to solve this issue, so that a smooth download directly with the youtube-dl command is possible (rather than just a workaround)? I am in Singapore, so I can access the site URLs without location restriction. Is there anything I can help with? (I am not fully familiar with the source code, though.) |
Added ToggleIE back in, as it seems to be used by www.channelnewsasia.com.
toggle.py
import json
import re

from .common import InfoExtractor
from ..utils import (
    determine_ext,
    float_or_none,
    int_or_none,
    parse_iso8601,
    strip_or_none,
    url_or_none,
    traverse_obj,
    merge_dicts,
)


class ToggleIE(InfoExtractor):
    IE_NAME = 'toggle'
    _VALID_URL = r'(?:https?://(?:(?:www\.)?mewatch|video\.toggle)\.sg/(?:en|zh)/(?:[^/]+/){2,}|toggle:)(?P<id>[0-9]+)'
    _TESTS = [{
        'url': 'http://www.mewatch.sg/en/series/lion-moms-tif/trailers/lion-moms-premier/343115',
        'info_dict': {
            'id': '343115',
            'ext': 'mp4',
            'title': 'Lion Moms Premiere',
            'description': 'md5:aea1149404bff4d7f7b6da11fafd8e6b',
            'upload_date': '20150910',
            'timestamp': 1441858274,
        },
        'params': {
            'skip_download': 'm3u8 download',
        }
    }, {
        'note': 'DRM-protected video',
        'url': 'http://www.mewatch.sg/en/movies/dug-s-special-mission/341413',
        'info_dict': {
            'id': '341413',
            'ext': 'wvm',
            'title': 'Dug\'s Special Mission',
            'description': 'md5:e86c6f4458214905c1772398fabc93e0',
            'upload_date': '20150827',
            'timestamp': 1440644006,
        },
        'params': {
            'skip_download': 'DRM-protected wvm download',
        }
    }, {
        # this also tests correct video id extraction
        'note': 'm3u8 links are geo-restricted, but Android/mp4 is okay',
        'url': 'http://www.mewatch.sg/en/series/28th-sea-games-5-show/28th-sea-games-5-show-ep11/332861',
        'info_dict': {
            'id': '332861',
            'ext': 'mp4',
            'title': '28th SEA Games (5 Show) - Episode 11',
            'description': 'md5:3cd4f5f56c7c3b1340c50a863f896faa',
            'upload_date': '20150605',
            'timestamp': 1433480166,
        },
        'params': {
            'skip_download': 'DRM-protected wvm download',
        },
        'skip': 'm3u8 links are geo-restricted'
    }, {
        'url': 'http://video.toggle.sg/en/clips/seraph-sun-aloysius-will-suddenly-sing-some-old-songs-in-high-pitch-on-set/343331',
        'only_matching': True,
    }, {
        'url': 'http://www.mewatch.sg/en/clips/seraph-sun-aloysius-will-suddenly-sing-some-old-songs-in-high-pitch-on-set/343331',
        'only_matching': True,
    }, {
        'url': 'http://www.mewatch.sg/zh/series/zero-calling-s2-hd/ep13/336367',
        'only_matching': True,
    }, {
        'url': 'http://www.mewatch.sg/en/series/vetri-s2/webisodes/jeeva-is-an-orphan-vetri-s2-webisode-7/342302',
        'only_matching': True,
    }, {
        'url': 'http://www.mewatch.sg/en/movies/seven-days/321936',
        'only_matching': True,
    }, {
        'url': 'https://www.mewatch.sg/en/tv-show/news/may-2017-cna-singapore-tonight/fri-19-may-2017/512456',
        'only_matching': True,
    }, {
        'url': 'http://www.mewatch.sg/en/channels/eleven-plus/401585',
        'only_matching': True,
    }]

    _API_USER = 'tvpapi_147'
    _API_PASS = '11111'

    def _real_extract(self, url):
        video_id = self._match_id(url)

        params = {
            'initObj': {
                'Locale': {
                    'LocaleLanguage': '',
                    'LocaleCountry': '',
                    'LocaleDevice': '',
                    'LocaleUserState': 0
                },
                'Platform': 0,
                'SiteGuid': 0,
                'DomainID': '0',
                'UDID': '',
                'ApiUser': self._API_USER,
                'ApiPass': self._API_PASS
            },
            'MediaID': video_id,
            'mediaType': 0,
        }

        info = self._download_json(
            'http://tvpapi.as.tvinci.com/v2_9/gateways/jsonpostgw.aspx?m=GetMediaInfo',
            video_id, 'Downloading video info json', data=json.dumps(params).encode('utf-8'))

        title = info['MediaName']

        formats = []
        for video_file in info.get('Files', []):
            video_url, vid_format = video_file.get('URL'), video_file.get('Format')
            if not video_url or video_url == 'NA' or not vid_format:
                continue
            ext = determine_ext(video_url)
            vid_format = vid_format.replace(' ', '')
            # if geo-restricted, m3u8 is inaccessible, but mp4 is okay
            if ext == 'm3u8':
                m3u8_formats = self._extract_m3u8_formats(
                    video_url, video_id, ext='mp4', m3u8_id=vid_format,
                    note='Downloading %s m3u8 information' % vid_format,
                    errnote='Failed to download %s m3u8 information' % vid_format,
                    fatal=False)
                for f in m3u8_formats:
                    # Apple FairPlay Streaming
                    if '/fpshls/' in f['url']:
                        continue
                    formats.append(f)
            elif ext == 'mpd':
                formats.extend(self._extract_mpd_formats(
                    video_url, video_id, mpd_id=vid_format,
                    note='Downloading %s MPD manifest' % vid_format,
                    errnote='Failed to download %s MPD manifest' % vid_format,
                    fatal=False))
            elif ext == 'ism':
                formats.extend(self._extract_ism_formats(
                    video_url, video_id, ism_id=vid_format,
                    note='Downloading %s ISM manifest' % vid_format,
                    errnote='Failed to download %s ISM manifest' % vid_format,
                    fatal=False))
            elif ext == 'mp4':
                formats.append({
                    'ext': ext,
                    'url': video_url,
                    'format_id': vid_format,
                })

        if not formats:
            for meta in (info.get('Metas') or []):
                if (not self.get_param('allow_unplayable_formats')
                        and meta.get('Key') == 'Encryption' and meta.get('Value') == '1'):
                    self.report_drm(video_id)
            # Most likely because geo-blocked if no formats and no DRM

        thumbnails = []
        for picture in info.get('Pictures', []):
            if not isinstance(picture, dict):
                continue
            pic_url = picture.get('URL')
            if not pic_url:
                continue
            thumbnail = {
                'url': pic_url,
            }
            pic_size = picture.get('PicSize', '')
            m = re.search(r'(?P<width>\d+)[xX](?P<height>\d+)', pic_size)
            if m:
                thumbnail.update({
                    'width': int(m.group('width')),
                    'height': int(m.group('height')),
                })
            thumbnails.append(thumbnail)

        def counter(prefix):
            return int_or_none(
                info.get(prefix + 'Counter') or info.get(prefix.lower() + '_counter'))

        return {
            'id': video_id,
            'title': title,
            'description': strip_or_none(info.get('Description')),
            'duration': int_or_none(info.get('Duration')),
            'timestamp': parse_iso8601(info.get('CreationDate') or None),
            'average_rating': float_or_none(info.get('Rating')),
            'view_count': counter('View'),
            'like_count': counter('Like'),
            'thumbnails': thumbnails,
            'formats': formats,
        }


class MeWatchIE(InfoExtractor):
    IE_NAME = 'mewatch'
    _VALID_URL = r'https?://(?:(?:www|live)\.)?mewatch\.sg/watch/[^/?#&]+-(?P<id>[0-9]+)'
    _TESTS = [{
        'url': 'https://www.mewatch.sg/watch/Recipe-Of-Life-E1-179371',
        'info_dict': {
            'id': '1008625',
            'ext': 'mp4',
            'title': 'Recipe Of Life 味之道',
            'timestamp': 1603306526,
            'description': 'md5:6e88cde8af2068444fc8e1bc3ebf257c',
            'upload_date': '20201021',
        },
        'params': {
            'skip_download': 'm3u8 download',
        },
    }, {
        'url': 'https://www.mewatch.sg/watch/Little-Red-Dot-Detectives-S2-搜密。打卡。小红点-S2-E1-176232',
        'only_matching': True,
    }, {
        'url': 'https://www.mewatch.sg/watch/Little-Red-Dot-Detectives-S2-%E6%90%9C%E5%AF%86%E3%80%82%E6%89%93%E5%8D%A1%E3%80%82%E5%B0%8F%E7%BA%A2%E7%82%B9-S2-E1-176232',
        'only_matching': True,
    }, {
        'url': 'https://live.mewatch.sg/watch/Recipe-Of-Life-E41-189759',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(
            url, video_id, note='Downloading video page')
        page_data = self._search_regex(
            r'(?s)window\s*\.\s*__data\s*=\s*(\{.*?\})\s*</script>',
            webpage, 'hydration JSON')
        page_data = self._parse_json(page_data, video_id)
        # the partner ID is the last path component of the thumbnail base URL
        partner_id = traverse_obj(
            page_data, ('app', 'config', 'playback', 'kalturaThumbnailBaseUrl'),
            expected_type=lambda u: (url_or_none(u) or '').rstrip('/').rpartition('/')[2] or 2082311)
        # find the page entry whose item ID matches the ID from the URL
        show_data = traverse_obj(
            page_data,
            ('cache', 'page', Ellipsis, 'entries',
             lambda _, v: v['item']['id'] == video_id),
            get_all=False)
        entry_id = traverse_obj(show_data, ('item', 'customFields', 'EntryId'))
        txt_or_none = lambda x: x.strip() or None
        # delegate format extraction to the Kaltura extractor, keeping our metadata
        return merge_dicts(
            {'_type': 'url_transparent'},
            self.url_result(
                'kaltura:%s:%s' % (partner_id, entry_id), ie='Kaltura', video_id=video_id),
            {
                'title': traverse_obj(
                    show_data, 'title', ('item', ('title', ('customFields', 'sortTitle'), 'episodeName')),
                    get_all=False, expected_type=txt_or_none) or self._generic_title(url),
                'description': traverse_obj(show_data, ('item', ('description', 'shortDescription')), get_all=False, expected_type=txt_or_none),
                'uploader': traverse_obj(show_data, ('item', 'distributor'), expected_type=txt_or_none),
                'categories': traverse_obj(show_data, ('item', 'genres', Ellipsis), expected_type=txt_or_none),
                'episode_number': traverse_obj(show_data, ('item', 'episodeNumber'), expected_type=int_or_none),
                'episode': traverse_obj(show_data, ('item', 'episodeName'), expected_type=lambda x: re.sub(r'^Ep\s+\d+\s+(.*?)\s*$', r'\1', x) or None),
                'season_id': traverse_obj(show_data, ('item', 'seasonId'), expected_type=txt_or_none),
                'series_id': traverse_obj(show_data, ('item', ('showId', ('season', 'show', 'id'))), get_all=False, expected_type=txt_or_none),
                'season_number': traverse_obj(show_data, ('item', 'season', 'seasonNumber'), expected_type=int_or_none),
                'season': traverse_obj(show_data, ('item', 'season', 'title'), expected_type=txt_or_none),
                'series': traverse_obj(show_data, ('item', 'season', 'show', 'title'), expected_type=txt_or_none),
            }) |
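For anyone who wants to poke at the page data without a youtube-dl checkout, the hydration-JSON step in MeWatchIE can be approximated with the standard library alone. This is only a sketch: `find_entry_id` is a hypothetical name, the sample HTML is invented, and it assumes that `cache.page` is a dict keyed by page path (the extractor's `Ellipsis` step would also accept a list).

```python
import json
import re

# Same pattern as in the extractor: grab the object assigned to window.__data
HYDRATION_RE = r'(?s)window\s*\.\s*__data\s*=\s*(\{.*?\})\s*</script>'


def find_entry_id(webpage, video_id):
    """Return customFields.EntryId of the entry whose item.id equals video_id."""
    m = re.search(HYDRATION_RE, webpage)
    if not m:
        return None
    data = json.loads(m.group(1))
    pages = (data.get('cache') or {}).get('page') or {}
    if not isinstance(pages, dict):  # assumption: pages are keyed by path
        return None
    for page in pages.values():
        if not isinstance(page, dict):
            continue
        for entry in page.get('entries') or []:
            item = entry.get('item') or {}
            if item.get('id') == video_id:
                return (item.get('customFields') or {}).get('EntryId')
    return None


# Invented sample in the assumed shape; a real page carries much more data.
SAMPLE = '''<script>window.__data = {"cache": {"page": {"/watch/x-368951": {"entries": [
  {"item": {"id": "368951", "customFields": {"EntryId": "1_4n8bmm4x"}}}
]}}}}</script>'''

print(find_entry_id(SAMPLE, '368951'))
```

Running this against a saved copy of a real watch page would be a quick way to confirm whether the `EntryId` field is still where the extractor expects it.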
See PR #32172. |
Description
Hi, I have recently been unable to download any video from mewatch. I receive this error whenever I try to download a video.