add support for muse.ai #7543

Closed
9 of 11 tasks
2011 opened this issue Jul 8, 2023 · 9 comments · Fixed by #7614
Labels
site-request Request to support a new website

Comments

2011 commented Jul 8, 2023

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

  • I understand that I will be blocked if I intentionally remove or skip any mandatory* field

Checklist

Region

global

Example URLs

https://muse.ai/embed/YdTWvUW

Provide a description that is worded well enough to be understood

The above link returns "Unsupported URL", but digging through the page source (and the network traffic with JavaScript enabled) reveals the following .mpd file, which yt-dlp can download without any difficulty:

https://cdn-eu.muse.ai/u/JdsD4tX/04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0/videos/dash.mpd
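A minimal sketch of how the CDN URLs quoted in this thread break down: regional host, user ID, 64-hex-character file ID, and asset path. The layout is inferred only from the URLs in this issue, not from any muse.ai documentation, and `parse_cdn_url` / `CdnUrl` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred only from URLs quoted in this issue:
#   https://cdn[-<region>].muse.ai/u/<user id>/<file id>/<asset path>
CDN_URL_RE = re.compile(
    r'https?://cdn(?:-(?P<region>[a-z]+))?\.muse\.ai'
    r'/u/(?P<user>[^/]+)/(?P<fid>[0-9a-f]+)/(?P<path>.+)')


class CdnUrl(NamedTuple):
    region: Optional[str]  # 'eu', 'na', ... or None for the bare cdn.muse.ai host
    user: str              # e.g. 'JdsD4tX'
    fid: str               # 64 hex characters in the examples seen here
    path: str              # e.g. 'videos/dash.mpd'


def parse_cdn_url(url: str) -> Optional[CdnUrl]:
    """Split a muse.ai CDN URL into its parts, or return None if it doesn't match."""
    m = CDN_URL_RE.match(url)
    if not m:
        return None
    return CdnUrl(m.group('region'), m.group('user'), m.group('fid'), m.group('path'))
```

Once the user ID and file ID are known, the other assets (HLS playlist, thumbnail, progressive MP4s) differ only in the `path` part.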

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • If using API, add 'verbose': True to YoutubeDL params instead
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

[debug] Command-line config: ['--restrict-filenames', '-o', '%(title)s-%(id)s-%(uploader)s.%(ext)s', '-w', '-v', 'https://muse.ai/embed/YdTWvUW']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8 (No ANSI), error utf-8 (No ANSI), screen utf-8 (No ANSI)
[debug] yt-dlp version stable@2023.07.06 [b532a3481]
[debug] Python 3.11.4 (CPython x86_64 64bit) - Linux-6.1.28-with-glibc2.37 (OpenSSL 3.0.9 30 May 2023, glibc 2.37)
[debug] exe versions: ffmpeg 4.4.4 (setts), ffprobe 4.4.4
[debug] Optional libraries: certifi-3021.03.16, pycrypto-3.18.0
[debug] Proxy map: {}
[debug] Loaded 1855 extractors
[generic] Extracting URL: https://muse.ai/embed/YdTWvUW
[generic] YdTWvUW: Downloading webpage
WARNING: [generic] Falling back on generic information extractor
[generic] YdTWvUW: Extracting information
[debug] Looking for embeds
ERROR: Unsupported URL: https://muse.ai/embed/YdTWvUW
Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/yt_dlp/YoutubeDL.py", line 1560, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/yt_dlp/YoutubeDL.py", line 1688, in __extract_info
    ie_result = ie.extract(url)
                ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/yt_dlp/extractor/common.py", line 710, in extract
    ie_result = self._real_extract(url)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/yt_dlp/extractor/generic.py", line 2568, in _real_extract
    raise UnsupportedError(url)
yt_dlp.utils.UnsupportedError: Unsupported URL: https://muse.ai/embed/YdTWvUW
2011 added the site-request (Request to support a new website) and triage (Untriaged issue) labels Jul 8, 2023
CHJ85 (Contributor) commented Jul 13, 2023

The example you provided won't do as it's encrypted.
Use the HLS format instead.
https://cdn-eu.muse.ai/u/JdsD4tX/04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0/videos/hls.m3u8
As for your embed link, it says Video Unavailable.
Please provide a valid link.

2011 (Author) commented Jul 14, 2023

> The example you provided won't do as it's encrypted. Use the HLS format instead.
> https://cdn-eu.muse.ai/u/JdsD4tX/04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0/videos/hls.m3u8

Where did you get that link from? My traffic-analysis skills are minimal (meaning looking at things in a browser's Developer Tools window), and I didn't see that link. The .mpd link I provided works fine for me in yt-dlp.

> As for your embed link, it says Video Unavailable. Please provide a valid link.

I just tried the embed link again, and it also opens fine for me. Maybe a geographic restriction exists that I don't know about (I have a United States IP address).

Also of note (looking at the Network tab of Developer Tools): the traffic shows links to the complete files, which I can download in full with wget, for example, even though the browser's own requests for them keep getting status 206 (Partial Content) responses:

https://cdn-na.muse.ai/u/JdsD4tX/04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0/videos/video-720p-video.mp4

https://cdn-na.muse.ai/u/JdsD4tX/04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0/audios/audio-196k-stereo.mp4
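The 206 responses are ordinary HTTP range requests: a media player asks for byte ranges, while wget sends a plain GET and gets the whole file. The sketch below only builds the two kinds of request (nothing is sent) and parses a `Content-Range` header, assuming the CDN follows standard HTTP range semantics; the helper names are hypothetical.

```python
from typing import Optional, Tuple
from urllib.request import Request

# Progressive MP4 URL quoted above (observed for this one video only)
VIDEO_URL = ('https://cdn-na.muse.ai/u/JdsD4tX/'
             '04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0/'
             'videos/video-720p-video.mp4')


def ranged_request(url: str, start: int, end: int) -> Request:
    """Build (without sending) a request for bytes start..end inclusive.

    Browser players issue requests like this; a server that supports
    ranges answers them with 206 Partial Content.
    """
    return Request(url, headers={'Range': f'bytes={start}-{end}'})


def full_request(url: str) -> Request:
    """Build (without sending) a plain GET with no Range header, like wget's.

    A server normally answers this with 200 OK and the complete file.
    """
    return Request(url)


def parse_content_range(value: str) -> Tuple[int, int, Optional[int]]:
    """Parse a 206 response's Content-Range header, e.g. 'bytes 0-1023/2048'.

    Returns (first byte, last byte, total size); total is None when the
    server reports the size as '*' (unknown).
    """
    unit, _, rest = value.partition(' ')
    if unit != 'bytes':
        raise ValueError(f'unsupported range unit: {unit!r}')
    span, _, total = rest.partition('/')
    first, _, last = span.partition('-')
    return int(first), int(last), None if total == '*' else int(total)
```

So the 206 status says nothing about the files being split; it only reflects how the browser chose to fetch them.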

CHJ85 (Contributor) commented Jul 14, 2023

It must be geo-protected, I guess.
Also, I didn't "get" the link from anywhere. I literally just renamed the file itself (dash.mpd), because hls.m3u8 made the most sense, even without access to the video on the website itself.

That said, the video thumbnail is https://cdn-eu.muse.ai/u/JdsD4tX/04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0/thumbnails/video.jpg
but the playlist where the video files are available is https://cdn-eu.muse.ai/u/JdsD4tX/04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0/videos/hls.m3u8.
So unless the video URL appears in the page source, the extractor needs a rename function to derive the correct video URL, and then has to pass that URL to ffmpeg.
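The rename described above can be sketched as a small function. It assumes, based only on the two URLs in this comment, that the thumbnail and the HLS playlist share a directory and differ only in their last two path segments; `thumbnail_to_hls` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed from the two URLs above, not from any muse.ai documentation
THUMB_SUFFIX = '/thumbnails/video.jpg'
HLS_SUFFIX = '/videos/hls.m3u8'


def thumbnail_to_hls(thumbnail_url: str) -> str:
    """Derive the HLS playlist URL from a muse.ai thumbnail URL."""
    parts = urlsplit(thumbnail_url)
    if not parts.path.endswith(THUMB_SUFFIX):
        # Fail loudly instead of silently returning the thumbnail URL unchanged
        raise ValueError(f'unexpected thumbnail URL: {thumbnail_url}')
    new_path = parts.path[:-len(THUMB_SUFFIX)] + HLS_SUFFIX
    # Drop any query string or fragment the thumbnail URL may carry
    return urlunsplit((parts.scheme, parts.netloc, new_path, '', ''))
```

Checking the suffix, rather than calling `str.replace` blindly, means a thumbnail URL in some other shape raises an error instead of producing a "playlist" URL that is really still the JPEG.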

A comment by pukkandan was marked as resolved.

pukkandan removed the triage (Untriaged issue) label Jul 14, 2023
CHJ85 (Contributor) commented Jul 14, 2023

Here's what I put together, but I'm stuck between a rock and a hard place.
My approach is to find the thumbnail URL and rename 'thumbnails/video.jpg' to 'videos/hls.m3u8'.
With this setup, the m3u8 playlist URL derived from the muse.ai page should have the following format:
https://cdn-eu.muse.ai/u/{user_id}/{video_id}/videos/hls.m3u8.
This is the part that, for some reason, goes wrong.

from yt_dlp.extractor.common import InfoExtractor
from yt_dlp.utils import ExtractorError


class MuseAIIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?muse\.ai/embed/(?P<id>[^/?#]+)'
    _TESTS = [{
        'url': 'https://muse.ai/embed/YdTWvUW',
        'info_dict': {
            'id': 'YdTWvUW',
            # HLS formats are downloaded into an mp4 container; 'm3u8' is not a valid ext
            'ext': 'mp4',
            # Title taken from the player.setData() JSON quoted later in this thread
            'title': '2023-05-28-Grabien-1941111 (1)',
            'thumbnail': r're:https?://.*\.jpg$',
        },
        'params': {
            'skip_download': True,
        },
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        # The site appears to serve a different page to non-Chrome user agents
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36',
        }
        webpage = self._download_webpage(url, video_id, headers=headers)

        thumbnail = self._html_search_regex(
            r'<img[^>]+src=["\'](https?://[^"\']+\.jpg)["\']',
            webpage, 'thumbnail', default=None)
        if not thumbnail or not thumbnail.endswith('thumbnails/video.jpg'):
            # _real_extract must return an info dict or raise; returning None breaks YoutubeDL
            raise ExtractorError('Unable to find the video thumbnail URL', expected=True)

        # Assumption: the HLS playlist sits next to the thumbnail on the CDN
        playlist_url = thumbnail.replace('thumbnails/video.jpg', 'videos/hls.m3u8')

        title = self._html_search_regex(
            r'<a[^>]+class=["\']player-title["\'][^>]+href=["\'][^"\']*["\'][^>]*>([^<]+)',
            webpage, 'title', default='Untitled Video')

        # Let yt-dlp parse the master playlist into formats instead of
        # hand-building a single format with guessed codecs and resolution
        formats = self._extract_m3u8_formats(
            playlist_url, video_id, 'mp4', m3u8_id='hls')

        return {
            'id': video_id,
            'title': title,
            'formats': formats,
            'thumbnail': thumbnail,
        }


CHJ85 (Contributor) commented Jul 14, 2023

@pukkandan It turned out to be Firefox-related, not a geo block.
For some reason, muse.ai is not optimized to work with Firefox.

CHJ85 (Contributor) commented Jul 15, 2023

I took another look at this project. The website seems to block user agents that aren't Chrome-based browsers, so I tried a recent Chrome user agent. Still no luck grabbing the thumbnail url.
Not sure what else I can do here, with my limited programming skills. 😀 Sorry.

2011 (Author) commented Jul 16, 2023

> > Maybe a geographic restriction exists that I don't know about (I have a United States IP address).
>
> This is why you don't fill in "global"...

There was no indication of any geographical blocking (the site basically hosts videos for customers, and news sites generally want their content available everywhere).

> Not sure what else I can do here, with my limited programming skills. 😀 Sorry.

My programming skills are even more limited, but I have to ask whether downloading the files in one piece (from the .mpd file) uses fewer resources (and reduces complexity) compared to downloading the "streamed" (HLS) versions.

> Still no luck grabbing the thumbnail url.

Even the very short (non-JavaScript) version of the web page includes this as part of its (obviously non-executed) script:

player.setData(
{"description": "", "download": 0, "duration": 1291.03, "embed_domains": ["todaynewsafrica.com", "substack.com"], "fid": "04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0", "filename": "2023-05-28-Grabien-1941111 1.mp4", "height": 720, "ingest_video": 2, "ingesting": false, "license": "owned", "mature": false, "owner_name": "Today News Africa", "owner_shid": "8ytRPX2", "owner_username": "TodayNewsAfrica", "regions": ["na", "eu"], "size": 894950240, "svid": "YdTWvUW", "tcreated": 1685285044, "title": "2023-05-28-Grabien-1941111 (1)", "twatched": 3164676, "url": "https://cdn.muse.ai/u/JdsD4tX/04a549c2aa68bdc90d9c6fe59913aa09d40404b9abfbc3a77a338426ec3590d0/data", "views": 5852, "visibility": "public", "width": 1280},
{time: args.get('start') || 0},
);

That snippet appears to contain all of the information needed (user id and video id) to download the video and audio files.
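As a rough sketch of using that snippet: pull the first `player.setData(...)` argument out of the page, parse it as JSON, and build per-region CDN URLs from its `url` and `regions` fields. The assumptions (that `url` always ends in `/data`, and that each region is served from `cdn-<region>.muse.ai` with the same path) come only from the URLs in this thread; `extract_player_data` and `candidate_urls` are hypothetical names.

```python
import json
import re
from urllib.parse import urlsplit

# Captures the first argument of player.setData(...), relying on it being
# followed by ", {time:" as in the snippet quoted above.
SETDATA_RE = re.compile(r'player\.setData\(\s*(\{.*?\})\s*,\s*\{time:', re.DOTALL)


def extract_player_data(webpage: str) -> dict:
    """Return the setData JSON object as a dict."""
    m = SETDATA_RE.search(webpage)
    if not m:
        raise ValueError('player.setData(...) call not found')
    return json.loads(m.group(1))


def candidate_urls(data: dict) -> dict:
    """Build per-region asset URLs from the setData "url" and "regions" fields."""
    # Assumption: "url" looks like https://cdn.muse.ai/u/<user>/<fid>/data
    prefix, _, last = urlsplit(data['url']).path.rpartition('/')
    if last != 'data':
        raise ValueError(f'unexpected data URL: {data["url"]}')
    urls = {}
    for region in data.get('regions') or []:
        # Assumption: each region serves the same path from cdn-<region>.muse.ai
        base = f'https://cdn-{region}.muse.ai{prefix}'
        urls[region] = {
            'hls': f'{base}/videos/hls.m3u8',
            'dash': f'{base}/videos/dash.mpd',
            'thumbnail': f'{base}/thumbnails/video.jpg',
        }
    return urls
```

Because this reads the data the page already embeds, it sidesteps the thumbnail scraping that failed in the draft extractor. Inside a real extractor, yt-dlp's own JSON-search helpers (such as `_search_json`) would likely replace the hand-written regex.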

Also, this looks like another form of the video (although that page contains other videos, which makes extracting the correct video information slightly trickier):

https://muse.ai/v/YdTWvUW
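A minimal sketch of handling both URL forms: one `_VALID_URL`-style pattern that accepts `/embed/` and `/v/`, plus a filter that keeps only the player data whose `svid` matches the ID in the URL (in the snippet above, `svid` equals the embed ID). The pattern's ID character class is copied from the draft extractor; `match_id` and `select_by_svid` are hypothetical helpers.

```python
import re
from typing import Optional

# Accepts both https://muse.ai/embed/<id> and https://muse.ai/v/<id>
VALID_URL = r'https?://(?:www\.)?muse\.ai/(?:v|embed)/(?P<id>[^/?#]+)'


def match_id(url: str) -> Optional[str]:
    """Return the video ID from a muse.ai page URL, or None if it doesn't match."""
    m = re.match(VALID_URL, url)
    return m.group('id') if m else None


def select_by_svid(entries: list, video_id: str) -> Optional[dict]:
    """Pick the player-data dict for this video when a page embeds several."""
    return next((e for e in entries if e.get('svid') == video_id), None)
```

On a `/v/` page that also lists other videos, filtering by `svid` avoids picking up the wrong video's data.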

CHJ85 (Contributor) commented Jul 16, 2023

@bashonly Great work as always, man!

bashonly added a commit that referenced this issue Jul 20, 2023
Closes #7543
Authored by: bashonly
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this issue Apr 21, 2024

3 participants