add support for muse.ai #7543

2011 · 2023-07-08T12:26:09Z

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

I understand that I will be blocked if I intentionally remove or skip any mandatory* field

Checklist

I'm reporting a new site support request
I've verified that I'm running yt-dlp version 2023.07.06 (update instructions) or later (specify commit)
I've checked that all provided URLs are playable in a browser with the same IP and same login details
I've checked that none of provided URLs violate any copyrights or contain any DRM to the best of my knowledge
I've searched known issues and the bugtracker for similar issues including closed ones. DO NOT post duplicates
I've read the guidelines for opening an issue
I've read about sharing account credentials and am willing to share it if required

Region

global

Example URLs

https://muse.ai/embed/YdTWvUW

Provide a description that is worded well enough to be understood

The above link returns "Unsupported URL", but digging through the page source (and the network traffic with JavaScript enabled) turns up the following .mpd manifest, which yt-dlp can download without any difficulty:

https://cdn-eu.muse.ai/u/JdsD4tX/04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0/videos/dash.mpd

Provide verbose output that clearly demonstrates the problem

Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
If using API, add 'verbose': True to YoutubeDL params instead
Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

[debug] Command-line config: ['--restrict-filenames', '-o', '%(title)s-%(id)s-%(uploader)s.%(ext)s', '-w', '-v', 'https://muse.ai/embed/YdTWvUW']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8 (No ANSI), error utf-8 (No ANSI), screen utf-8 (No ANSI)
[debug] yt-dlp version stable@2023.07.06 [b532a3481]
[debug] Python 3.11.4 (CPython x86_64 64bit) - Linux-6.1.28-with-glibc2.37 (OpenSSL 3.0.9 30 May 2023, glibc 2.37)
[debug] exe versions: ffmpeg 4.4.4 (setts), ffprobe 4.4.4
[debug] Optional libraries: certifi-3021.03.16, pycrypto-3.18.0
[debug] Proxy map: {}
[debug] Loaded 1855 extractors
[generic] Extracting URL: https://muse.ai/embed/YdTWvUW
[generic] YdTWvUW: Downloading webpage
WARNING: [generic] Falling back on generic information extractor
[generic] YdTWvUW: Extracting information
[debug] Looking for embeds
ERROR: Unsupported URL: https://muse.ai/embed/YdTWvUW
Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/yt_dlp/YoutubeDL.py", line 1560, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/yt_dlp/YoutubeDL.py", line 1688, in __extract_info
    ie_result = ie.extract(url)
                ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/yt_dlp/extractor/common.py", line 710, in extract
    ie_result = self._real_extract(url)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/yt_dlp/extractor/generic.py", line 2568, in _real_extract
    raise UnsupportedError(url)
yt_dlp.utils.UnsupportedError: Unsupported URL: https://muse.ai/embed/YdTWvUW

CHJ85 · 2023-07-13T22:39:29Z

The example you provided won't do as it's encrypted.
Use the HLS format instead.
https://cdn-eu.muse.ai/u/JdsD4tX/04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0/videos/hls.m3u8
As for your embed link, it says Video Unavailable.
Please provide a valid link.

2011 · 2023-07-14T13:21:48Z

> The example you provided won't do as it's encrypted. Use the HLS format instead.
> https://cdn-eu.muse.ai/u/JdsD4tX/04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0/videos/hls.m3u8

Where did you get that link from? I have rather minimal skills at traffic analysis (meaning looking at things in a browser's Developer Tools window). I didn't see that. The (.mpd) link I provided works fine (for me) in yt-dlp.

> As for your embed link, it says Video Unavailable. Please provide a valid link.

I just tried the embed link again, and it also opens fine for me. Maybe a geographic restriction exists that I don't know about (I have a United States IP address).

Of note also (looking at the Network tab of Developer Tools), the traffic shows links to the complete files (which I can download with wget, for example), even though the browser keeps returning status 206 (partial content):

https://cdn-na.muse.ai/u/JdsD4tX/04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0/videos/video-720p-video.mp4

https://cdn-na.muse.ai/u/JdsD4tX/04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0/audios/audio-196k-stereo.mp4
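
The 206 responses are most likely the browser's media player asking for byte ranges: on a server that supports ranges, a request carrying a `Range` header is answered with `206 Partial Content`, whereas a plain GET for the same URL (which is what wget sends by default) is answered with the whole file. A minimal sketch of the two request shapes, using only the standard library. `build_request` is a hypothetical helper, and nothing here is actually sent over the network:

```python
from urllib.request import Request


def build_request(url, byte_range=None):
    """Create a GET request, optionally limited to an inclusive byte range."""
    req = Request(url)
    if byte_range is not None:
        start, end = byte_range
        # An open-ended range ("bytes=1000-") asks for everything from start.
        end_text = '' if end is None else str(end)
        req.add_header('Range', f'bytes={start}-{end_text}')
    return req


video_url = ('https://cdn-na.muse.ai/u/JdsD4tX/'
             '04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0'
             '/videos/video-720p-video.mp4')

# No Range header: the server is asked for the complete file.
whole_file = build_request(video_url)
# Range header present: the server is asked for the first 1024 bytes only.
first_chunk = build_request(video_url, (0, 1023))

print(whole_file.has_header('Range'))
print(first_chunk.get_header('Range'))
```

So the same URL can legitimately show up as "partial content" in the Network tab and still download in one piece from the command line.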

CHJ85 · 2023-07-14T17:59:43Z

It must be geo-protected, I guess.
Also, I didn't "get" the link from anywhere. I literally just renamed the file itself (dash.mpd), because hls.m3u8 made the most sense.
But without access to the video on the website itself, I can't confirm that.

That being said, the video thumbnail is https://cdn-eu.muse.ai/u/JdsD4tX/04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0/thumbnails/video.jpg
but the playlist that lists the video files is https://cdn-eu.muse.ai/u/JdsD4tX/04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0/videos/hls.m3u8
So unless the video URL appears in the page source, the extractor needs a rename step to derive the correct video URL, which can then be passed to ffmpeg.
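
The rename step described above can be sketched with plain URL handling. This is only an illustration, not the extractor that was eventually merged. The helper name `sibling_asset_url` is hypothetical, and it assumes every asset sits under the same `/u/{user_id}/{video_id}/` prefix, as the URLs in this thread suggest:

```python
from urllib.parse import urlsplit, urlunsplit


def sibling_asset_url(asset_url, new_relpath):
    """Swap the last two path segments of a muse.ai CDN URL.

    Hypothetical helper: assumes the layout /u/<user>/<video>/<dir>/<file>
    seen in this thread, e.g. .../thumbnails/video.jpg or .../videos/dash.mpd.
    """
    parts = urlsplit(asset_url)
    # Drop "<dir>/<file>" and keep the shared "/u/<user>/<video>" prefix.
    base = parts.path.rsplit('/', 2)[0]
    return urlunsplit((parts.scheme, parts.netloc, f'{base}/{new_relpath}', '', ''))


thumb = ('https://cdn-eu.muse.ai/u/JdsD4tX/'
         '04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0'
         '/thumbnails/video.jpg')
hls_url = sibling_asset_url(thumb, 'videos/hls.m3u8')
print(hls_url)
```

Splitting the URL first keeps the host untouched, so the same helper works for either CDN host that appears in the thread.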

CHJ85 · 2023-07-14T18:39:52Z

Here's what I put together, but I'm stuck between a rock and a hard place.
My attempt here is to target the thumbnail class and rename 'thumbnails/video.jpg' to 'videos/hls.m3u8'.
With the current setup, the m3u8 playlist URL derived from the muse.ai page should be in the following format:
https://cdn-eu.muse.ai/u/{user_id}/{video_id}/videos/hls.m3u8
This is the part that, for some reason, goes wrong.

from yt_dlp.extractor.common import InfoExtractor
from yt_dlp.utils import ExtractorError


class MuseAIIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?muse\.ai/embed/(?P<id>[^/?#]+)'
    _TESTS = [
        {
            'url': 'https://muse.ai/embed/YdTWvUW',
            'info_dict': {
                'id': 'YdTWvUW',
                # 'ext' is the container of the downloaded media, not 'm3u8'
                'ext': 'mp4',
                # Placeholder value, not the video's real title
                'title': 'Video Title',
                'thumbnail': r're:https?://.*\.jpg$',
            },
            'params': {
                'skip_download': True,
            },
        }
    ]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        # Request the page with a Chrome user agent
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
        }
        webpage = self._download_webpage(url, video_id, headers=headers)

        thumbnail = self._html_search_regex(
            r'<img[^>]+src=["\'](https?://[^"\']+\.jpg)["\']',
            webpage, 'thumbnail', default=None)

        if not thumbnail:
            # _real_extract must not return None; raise a proper error instead
            raise ExtractorError('Thumbnail not found', expected=True)

        # str.replace() never raises, so no try/except is needed here
        playlist_url = thumbnail.replace('thumbnails/video.jpg', 'videos/hls.m3u8')

        title = self._html_search_regex(
            r'<a[^>]+class=["\']player-title["\'][^>]+href=["\'][^"\']*["\'][^>]*>([^<]+)',
            webpage, 'title', default='Untitled Video')

        # Let yt-dlp parse the playlist instead of hand-building one format
        # with guessed codecs
        formats = self._extract_m3u8_formats(
            playlist_url, video_id, 'mp4', m3u8_id='hls', headers=headers)

        return {
            'id': video_id,
            'title': title,
            'formats': formats,
            'thumbnail': thumbnail,
        }

CHJ85 · 2023-07-14T19:42:58Z

@pukkandan It turned out to be Firefox related. Not a geo block.
As it turns out, muse.ai is not optimized to work with Firefox for some reason.

CHJ85 · 2023-07-15T02:51:11Z

I took another look at this project. The website seems to block access from user agents that aren't Chrome-based browsers, so I tried using a recent Chrome user agent. Still no luck grabbing the thumbnail URL.
Not sure what else I can do here, with my limited programming skills. 😀 Sorry.

2011 · 2023-07-16T12:06:30Z

> Maybe a geographic restriction exists that I don't know about (I have a United States IP address).

> This is why you don't fill in "global"...

No indication existed of any geographical blocking (the site basically hosts videos for customers, and news sites generally want their content available everywhere).

> Not sure what else I can do here, with my limited programming skills. 😀 Sorry.

I have even more limited programming skills, but I have to ask if downloading the files in one piece (from the .mpd file) uses fewer resources (and reduces complexity) compared to downloading "streamed" (hls) versions.

> Still no luck grabbing the thumbnail url.

Even the very short (non-javascript) version of the web page has this as part of the (non-executed, obviously) script:

player.setData(
{"description": "", "download": 0, "duration": 1291.03, "embed_domains": ["todaynewsafrica.com", "substack.com"], "fid": "04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0", "filename": "2023-05-28-Grabien-1941111 1.mp4", "height": 720, "ingest_video": 2, "ingesting": false, "license": "owned", "mature": false, "owner_name": "Today News Africa", "owner_shid": "8ytRPX2", "owner_username": "TodayNewsAfrica", "regions": ["na", "eu"], "size": 894950240, "svid": "YdTWvUW", "tcreated": 1685285044, "title": "2023-05-28-Grabien-1941111 (1)", "twatched": 3164676, "url": "https://cdn.muse.ai/u/JdsD4tX/04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0/data", "views": 5852, "visibility": "public", "width": 1280},
{time: args.get('start') || 0},
);

That snippet appears to contain all of the information needed (user id and video id) to download the video and audio files.
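
As a rough sketch of how that snippet could be used, the JSON argument can be pulled out of the page and turned into candidate manifest URLs. The function names are hypothetical, the sample page is a trimmed copy of the data above, and the `cdn-{region}` host pattern is an assumption inferred from the `cdn-eu` and `cdn-na` URLs seen earlier in this thread together with the `"regions"` field:

```python
import json
import re
from urllib.parse import urlsplit


def parse_set_data(webpage):
    """Return the first JSON object passed to player.setData(...)."""
    match = re.search(r'player\.setData\(\s*', webpage)
    if not match:
        raise ValueError('player.setData(...) call not found')
    # raw_decode parses one JSON value starting at the given index and
    # ignores what follows, so the non-JSON second argument is never parsed.
    data, _ = json.JSONDecoder().raw_decode(webpage, match.end())
    return data


def candidate_urls(data):
    """Build per-region DASH/HLS URLs (assumed layout, see the note above)."""
    # "url" ends in "/u/<user>/<fid>/data"; strip the last segment.
    base_path = urlsplit(data['url']).path.rsplit('/', 1)[0]
    return {
        region: {
            'dash': f'https://cdn-{region}.muse.ai{base_path}/videos/dash.mpd',
            'hls': f'https://cdn-{region}.muse.ai{base_path}/videos/hls.m3u8',
        }
        for region in data.get('regions', [])
    }


page = '''<script>
player.setData(
{"fid": "04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0", "regions": ["na", "eu"], "svid": "YdTWvUW", "title": "2023-05-28-Grabien-1941111 (1)", "url": "https://cdn.muse.ai/u/JdsD4tX/04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0/data"},
{time: args.get('start') || 0},
);
</script>'''

data = parse_set_data(page)
urls = candidate_urls(data)
print(data['title'])
print(urls['eu']['dash'])
```

Reading the metadata this way avoids depending on the thumbnail element, which only appears once the page's JavaScript has run.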

Also, this looks like another form of the video (although that page contains other videos, which makes extracting the correct video information slightly trickier):

https://muse.ai/v/YdTWvUW
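
If both URL forms should resolve to the same video, a URL pattern could accept either path. The pattern below is a hypothetical widening of the `/embed/` pattern posted earlier in this thread, not necessarily what the merged extractor uses:

```python
import re

# Hypothetical pattern: accepts both /embed/<id> and /v/<id>.
MUSE_URL_RE = re.compile(
    r'https?://(?:www\.)?muse\.ai/(?:v|embed)/(?P<id>[^/?#]+)')


def muse_video_id(url):
    """Return the short video id, or None if the URL is not recognised."""
    match = MUSE_URL_RE.match(url)
    return match.group('id') if match else None


for candidate in ('https://muse.ai/embed/YdTWvUW',
                  'https://muse.ai/v/YdTWvUW'):
    print(candidate, '->', muse_video_id(candidate))
```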

CHJ85 · 2023-07-16T22:35:39Z

@bashonly Great work as always, man!

Closes #7543 Authored by: bashonly

Closes yt-dlp#7543 Authored by: bashonly

2011 added the site-request and triage labels Jul 8, 2023

pukkandan removed the triage label Jul 14, 2023

bashonly mentioned this issue Jul 16, 2023

[ie/MuseAI] Add extractor #7614

Merged

9 tasks

bashonly closed this as completed in #7614 Jul 20, 2023

bashonly added a commit that referenced this issue Jul 20, 2023

[ie/MuseAI] Add extractor (#7614)

65cfa2b

Closes #7543 Authored by: bashonly

aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this issue Apr 21, 2024

[ie/MuseAI] Add extractor (yt-dlp#7614)

b0ebaac

Closes yt-dlp#7543 Authored by: bashonly
