[VideoCdn] Add new extractor #31481

Open
wants to merge 9 commits into
base: master
from

@mbunse mbunse commented Jan 21, 2023

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

I came across some company websites that serve videos via video-cdn.net, e.g. this. More can be found with the help of Google. This PR adds an extractor for those videos.

Contributor

@dirkf dirkf left a comment

Thanks for your work!

I've made a few comments.

Also, if there's some standard embedding that's used with this host, that embedding should be supported in the generic extractor.

youtube_dl/extractor/videocdn.py Outdated Show resolved Hide resolved
@mbunse
Author

mbunse commented Jan 21, 2023

> Also, if there's some standard embedding that's used with this host, that embedding should be supported in the generic extractor.

I see many different ways in which videos from this CDN are embedded, e.g. via a clickable image without any hint of the 'player-id', a div element with mi24-video-player video-id and player-id attributes, or a div with data-video-id and data-player-id attributes. I think it will be hard to maintain all of those different embeddings. What do you think, @dirkf?

@mbunse mbunse requested a review from dirkf January 21, 2023 20:22
@dirkf
Contributor

dirkf commented Jan 23, 2023

These pages don't even mention the video host in the non-JS HTML seen by yt-dl:

This has each video link as the value of the src attribute of an <iframe>:

Embedded links of the above type could be extracted. yt-dl doesn't find links like this by default (#12692, #6216) but a lot of extractors are doing something similar:

<iframe> extraction list

./cbc.py:144:                  r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"',
./dbtv.py:37:              r'<iframe[^>]+src=(["\'])((?:https?:)?//(?:www\.)?dagbladet\.no/video/embed/(?:[0-9A-Za-z_-]{11}|[a-zA-Z0-9]{8}).*?)\1',
./springboardplatform.py:56:                  r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//cms\.springboardplatform\.com/embed_iframe/\d+/video/\d+.*?)\1',
./thisoldhouse.py:45:              r'<iframe[^>]+src=[\'"](?:https?:)?//(?:www\.)?thisoldhouse\.(?:chorus\.build|com)/videos/zype/([0-9a-f]{24})',
./ustream.py:78:              r'<iframe[^>]+?src=(["\'])(?P<url>https?://(?:www\.)?(?:ustream\.tv|video\.ibm\.com)/embed/.+?)\1', webpage)
./vshare.py:32:              r'<iframe[^>]+?src=["\'](?P<url>(?:https?:)?//(?:www\.)?vshare\.io/v/[^/?#&]+)',
./iwara.py:64:                  r'<iframe[^>]+src=([\'"])(?P<url>[^\'"]+)\1',
./vodlocker.py:56:                  r'<iframe[^>]+src=(["\'])(?P<url>(?:https?://)?vodlocker\.(?:com|city)/embed-.+?)\1',
./generic.py:2831:              r'<iframe[^>]+?src="((?:https?:)?//(?:(?:www|static)\.)?rtl\.nl/(?:system/videoplayer/[^"]+(?:video_)?)?embed[^"]+)"',
./generic.py:2862:              r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.[a-z]{2,3}/widget/jukebox\?.+?)\1', webpage)
./generic.py:2906:              r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:cache\.)?vevo\.com/.+?)\1', webpage)
./generic.py:2919:              r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//graphics8\.nytimes\.com/bcvideo/[^/]+/iframe/embed\.html.+?)\1>',
./generic.py:2926:              r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//html5-player\.libsyn\.com/embed/.+?)\1', webpage)
./generic.py:2956:          mobj = re.search(r'<iframe .*?src="(http://www\.aparat\.com/video/[^"]+)"', webpage)
./generic.py:2961:          mobj = re.search(r'<iframe .*?src="(http://mpora\.(?:com|de)/videos/[^"]+)"', webpage)
./generic.py:2971:          mobj = re.search(r'<iframe[^>]+?src=(["\'])(?P<url>https?://vk\.com/video_ext\.php.+?)\1', webpage)
./generic.py:2992:              r'<iframe[^>]+?src=(["\'])(?P<url>https?://embed\.live\.huffingtonpost\.com/.+?)\1', webpage)
./generic.py:3005:          matches = re.findall(r'<iframe[^>]+?src="(https?://(?:www\.)?funnyordie\.com/embed/[^"]+)"', webpage)
./generic.py:3083:              r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//cloud\.tvigle\.ru/video/.+?)\1', webpage)
./generic.py:3089:              r'<iframe[^>]+?src=(["\'])(?P<url>https?://embed(?:-ssl)?\.ted\.com/.+?)\1', webpage)
./generic.py:3105:              r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?://)?embed\.francetv\.fr/\?ue=.+?)\1',
./generic.py:3132:              r'<iframe[^>]+?src=(["\'])(?P<url>https?://(?:screen|movies)\.yahoo\.com/.+?\.html\?format=embed)\1',
./generic.py:3151:              r'<iframe[^>]+?src=(["\'])(?P<url>https?://player\.cinchcast\.com/.+?)\1',
./generic.py:3157:              r'<iframe[^>]+?src=(["\'])(?P<url>https?://m(?:lb)?\.mlb\.com/shared/video/embed/embed\.html\?.+?)\1',
./generic.py:3173:              r'<iframe[^>]+src="(?P<url>https?://(?:new\.)?livestream\.com/[^"]+/player[^"]+)"',
./generic.py:3180:              r'<iframe[^>]+src="(?P<url>https?://(?:www\.)?zapiks\.fr/index\.php\?.+?)"', webpage)
./generic.py:3199:              r'<iframe[^>]+src="https?://(?P<host>media\.clipyou\.ru)/index/player\?.*\brecord_id=(?P<id>\d+).*"', webpage)
./generic.py:3243:              r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//www\.nbcnews\.com/widget/video-embed/[^"\']+)\1', webpage)
./generic.py:3254:              r'<iframe[^>]+src="(?:https?:)?(?P<url>%s)"' % UDNEmbedIE._PROTOCOL_RELATIVE_VALID_URL, webpage)
./generic.py:3314:              r'<iframe[^>]+src=[\'"]((?:https?:)?//video\.tv\.adobe\.com/v/\d+[^"]+)[\'"]',
./generic.py:3323:              r'<iframe[^>]+src=[\'"]((?:https?:)?//(?:www\.)?vine\.co/v/[^/]+/embed/(?:simple|postcard))',
./generic.py:3331:              r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:(?:www\.)?vod-platform\.net|embed\.kwikmotion\.com)/[eE]mbed/.+?)\1',
./generic.py:3519:              r'<iframe[^>]+?\bsrc\s*=\s*(["\'])(?P<url>(?:https?:)?//embed\.share-videos\.se/auto/embed/\d+\?.*?\buid=\d+.*?)\1',
./myvi.py:65:              r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//myvi\.(?:ru/player|tv)/(?:embed/html|flash)/[^"]+)\1', webpage)
./vine.py:17:      _EMBED_REGEX = [r'<iframe\b[^>]+\bsrc\s*=\s*[\'"](?P<url>(?:https?:)?//(?:www\.)?vine\.co/v/[^/]+/embed/(?:simple|postcard))']
./common.py.ld+json:701:                  r'<iframe src="([^"]+)"', content,
./tunein.py:17:              r'<iframe[^>]+src=["\'](?P<url>(?:https?://)?tunein\.com/embed/player/[pst]\d+)',
./videopress.py:48:              r'<iframe[^>]+src=["\']((?:https?://)?%s%s)' % (VideoPressIE._PATH_REGEX, VideoPressIE._ID_REGEX),
./threeqsdn.py:83:              r'<iframe[^>]+\b(?:data-)?src=(["\'])(?P<url>%s.*?)\1' % ThreeQSDNIE._VALID_URL, webpage)
./streamable.py:60:              r'<iframe[^>]+src=(?P<q1>[\'"])(?P<src>(?:https?:)?//streamable\.com/(?:(?!\1).+))(?P=q1)',
./francetv.py:45:      _EMBED_REGEX = [r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?://)?embed\.francetv\.fr/\?ue=.+?)\1']
./iprima.py:86:              (r'<iframe[^>]+\bsrc=["\'](?:https?:)?//(?:api\.play-backend\.iprima\.cz/prehravac/embedded|prima\.iprima\.cz/[^/]+/[^/]+)\?.*?\bid=(p\d+)',
./karaoketv.py:27:              r'<iframe[^>]+src=(["\'])(?P<url>https?://www\.karaoke\.co\.il/api_play\.php\?.+?)\1',
./karaoketv.py:32:              r'<iframe[^>]+src=(["\'])(?P<url>https?://www\.video-cdn\.com/embed/iframe/.+?)\1',
./seznamzpravy.py:58:                  r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?seznamzpravy\.cz/iframe/player\?.*?)\1',
./lcp.py:72:              r'<iframe[^>]+src=(["\'])(?P<url>%s?(?:(?!\1).)*)\1' % LcpPlayIE._VALID_URL,
./mofosex.py:72:              r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?mofosex\.com/embed/?\?.*?\bvideoid=\d+)',
./vice.py:112:              r'<iframe\b[^>]+\bsrc=["\']((?:https?:)?//video\.vice\.com/[^/]+/embed/[\da-f]{24})',
./channel9.py:87:              r'<iframe[^>]+src=["\'](https?://channel9\.msdn\.com/(?:[^/]+/)+)player\b',
./abc.py:110:                  mobj = re.search(r'<iframe width="100%" src="(?P<url>//www\.youtube-nocookie\.com/embed/[^?"]+)', webpage)
./howcast.py:33:              r'<iframe[^>]+src="[^"]+\bembed_code=([^\b]+)\b',
./tvc.py:30:              r'<iframe[^>]+?src=(["\'])(?P<url>(?:http:)?//(?:www\.)?tvc\.ru/video/iframe/id/[^"]+)\1', webpage)
./nova.py:211:              r'<iframe[^>]+\bsrc=["\'](?:https?:)?//media\.cms\.nova\.cz/embed/([^/?#&]+)',
./tvnoe.py:31:              r'<iframe[^>]+src="([^"]+)"', webpage, 'iframe URL')
./sportbox.py:51:              r'<iframe[^>]+src="(https?://(?:news\.sportbox|matchtv)\.ru/vdl/player[^"]+)"',
./instagram.py.org:127:              r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?instagram\.com/p/[^/]+/embed.*?)\1',
./facebook.py:303:                  r'<iframe[^>]+?src=(["\'])(?P<url>https?://www\.facebook\.com/(?:video/embed|plugins/video\.php).+?)\1',
./twentymin.py:53:              r'<iframe[^>]+src=(["\'])(?P<url>(?:(?:https?:)?//)?(?:www\.)?20min\.ch/videoplayer/videoplayer.html\?.*?\bvideoId@\d+.*?)\1',
./abc.dlp.py:109:                  mobj = re.search(r'<iframe width="100%" src="(?P<url>//www\.youtube-nocookie\.com/embed/[^?"]+)', webpage)
./veehd.py:89:                  r'<iframe[^>]+src="/?([^"]+)"', player_page, 'iframe url')
./mtv.py:430:      _EMBED_REGEX = [r'<iframe\b[^>]+?\bsrc=(["\'])(?P<url>(?:https?:)?//media\.mtvnservices\.com/embed/.+?)\1']
./spankwire.py:73:              r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?spankwire\.com/EmbedPlayer\.aspx/?\?.*?\bArticleId=\d+)',
./rutube.py:138:              r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//rutube\.ru/embed/[\da-z]{32}.*?)\1',
./redtube.py:42:              r'<iframe[^>]+?src=["\'](?P<url>(?:https?:)?//embed\.redtube\.com/\?.*?\bid=\d+)',
./yapfiles.py:39:              r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?%s.*?)\1'
./cbssports.py:77:              r'<iframe[^>]+(?:data-)?src="(https?://[^/]+/player/embed[^"]+)"',
./tube8.py:37:              r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?tube8\.com/embed/(?:[^/]+/)+\d+)',
./drtuber.py:41:              r'<iframe[^>]+?src=["\'](?P<url>(?:https?:)?//(?:www\.)?drtuber\.com/embed/\d+)',
./viqeo.py:46:                  r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//cdn\.viqeo\.tv/embed/*\?.*?\bvid=[\da-f]+.*?)\1',
./espn.py:235:              r'<iframe[^>]+src=["\'](https?://fivethirtyeight\.abcnews\.go\.com/video/embed/\d+/\d+)',
./gdcvault.py:182:              PLAYER_REGEX = r'<iframe src="(?P<xml_root>.+?)/(?:gdc-)?player.*?\.html.*?".*?</iframe>'
./gdcvault.py:198:                  r'<iframe src=".*?\?xml(?:=|URL=xml/)(.+?\.xml).*?".*?</iframe>',
./odnoklassniki.py:129:              r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:odnoklassniki|ok)\.ru/videoembed/.+?)\1', webpage)
./googledrive.py:85:              r'<iframe[^>]+src="https?://(?:video\.google\.com/get_player\?.*?docid=|(?:docs|drive)\.google\.com/file/d/)(?P<id>[a-zA-Z0-9_-]{28,})',
./rutv.py:116:              r'<iframe[^>]+?src=(["\'])(?P<url>https?://(?:test)?player\.(?:rutv\.ru|vgtrk\.com)/(?:iframe/(?:swf|video|live)/id|index/iframe/cast_id)/.+?)\1', webpage)
./indavideo.py:54:              r'<iframe[^>]+\bsrc=["\'](?P<url>(?:https?:)?//embed\.indavideo\.hu/player/video/[\da-f]+)',
./pornhub.py:254:              r'<iframe[^>]+?src=["\'](?P<url>(?:https?:)?//(?:www\.)?pornhub(?:premium)?\.(?:com|net|org)/embed/[\da-z]+)',
./foxgay.py:40:              r'<iframe[^>]+src=([\'"])(?P<url>[^\'"]+)\1', webpage,
./pbs.py:472:                  r'<iframe[^>]+\bsrc=["\'](?:https?:)?//video\.pbs\.org/widget/partnerplayer/(\d+)',  # https://www.pbs.org/wgbh/masterpiece/episodes/victoria-s2-e1/
./washingtonpost.py:35:              r'<iframe[^>]+\bsrc=["\'](%s)' % cls._EMBED_URL, webpage)
./cbsnews.py:98:          for embed_url in re.findall(r'<iframe[^>]+data-src="(https?://(?:www\.)?cbsnews\.com/embed/video/[^#]*#[^"]+)"', webpage):
./funimation.py:100:              r'<iframe[^>]+src="/player/(\d+)',
./dailymail.py:35:              r'<iframe\b[^>]+\bsrc=["\'](?P<url>(?:https?:)?//(?:www\.)?dailymail\.co\.uk/embed/video/\d+\.html)',
./common.py:702:                  r'<iframe src="([^"]+)"', content,
./filmweb.py:35:              r'<iframe[^>]+src="([^"]+)', embed_code, 'iframe url'))
./ampl.py:38:              r'<iframe\b[^>]+?\bsrc=["\'](?P<url>%s)' % (cls._VALID_URL),
./pornhub.meld.py:137:      _EMBED_REGEX = [r'<iframe[^>]+?src=["\'](?P<url>(?:https?:)?//(?:www\.)?pornhub(?:premium)?\.(?:com|net|org)/embed/[\da-z]+)']
./pornhub.meld.py:287:              r'<iframe[^>]+?src=["\'](?P<url>(?:https?:)?//(?:www\.)?pornhub(?:premium)?\.(?:com|net|org)/embed/[\da-z]+)',
./mediaset.py:110:                  r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?video\.mediaset\.it/player/playerIFrame(?:Twitter)?\.shtml.*?)\1',
./ministrygrid.py:50:                  r'<iframe.*?src="([^"]+)"', portlet_code, 'video iframe',
./nexx.py:445:              r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//embed\.nexx(?:\.cloud|cdn\.com)/\d+/(?:(?!\1).)+)\1',
./soundcloud.py:45:              r'<iframe[^>]+src=(["\'])(?P<url>(?:https?://)?(?:w\.)?soundcloud\.com/player.+?)\1',
./instagram.py:305:      _EMBED_REGEX = [r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?instagram\.com/p/[^/]+/embed.*?)\1']
./instagram.py:464:              r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?instagram\.com/p/[^/]+/embed.*?)\1',
./vbox7.py:60:              r'<iframe[^>]+src=(?P<q>["\'])(?P<url>(?:https?:)?//vbox7\.com/emb/external\.php.+?)(?P=q)',
./apa.py:41:                  r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//[^/]+\.apa\.at/embed/[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}.*?)\1',
./joj.py:48:                  r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//media\.joj\.sk/embed/(?:(?!\1).)+)\1',
./vk.py:123:      _EMBED_REGEX = [r'<iframe[^>]+?src=(["\'])(?P<url>https?://vk\.(?:com|ru)/video_ext\.php.+?)\1']
./vk.py:125:      __SIBNET_EMBED_REGEX = r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//video\.sibnet\.ru/shell\.php\?.*?\bvideoid=\d+.*?)\1'
./biobiochiletv.py:67:              r'<iframe[^>]+src=(?P<q1>[\'"])(?P<url>(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)',
./lifenews.py:102:              r'<iframe[^>]+src=["\']((?:https?:)?//embed\.life\.ru/(?:embed|video)/.+?)["\']',
./viewlift.py:89:              r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:embed\.)?(?:%s)/embed/player.+?)\1' % ViewLiftBaseIE._DOMAINS_REGEX,
./pladform.py:52:              r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//out\.pladform\.ru/player\?.+?)\1', webpage)
./mwave.py:87:              r'<iframe[^>]+src="/mnettv/ifr_clip\.m\?searchVideoDetailVO\.clip_id=(\d+)',
./youporn.py:74:              r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?youporn\.com/embed/\d+)',
./abc.py.txt:112:                  mobj = re.search(r'<iframe width="100%" src="(?P<url>//www\.youtube-nocookie\.com/embed/[^?"]+)', webpage)
./periscope.py:101:              r'<iframe[^>]+src=([\'"])(?P<url>(?:https?:)?//(?:www\.)?(?:periscope|pscp)\.tv/(?:(?!\1).)+)\1', webpage)
./tnaflix.py:202:              r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//player\.(?:tna|emp)flix\.com/video/\d+)\1',
./tvp.py:326:              r'<iframe[^>]+src="[^"]*?embed\.php\?(?:[^&]+&)*ID=(\d+)',
./tvp.py:327:              r'<iframe[^>]+src="[^"]*?object_id=(\d+)',
./brightcove.py:284:              r'<iframe[^>]+src=([\'"])((?:https?:)?//link\.brightcove\.com/services/player/(?!\1).+)\1', webpage)]
./brightcove.py:423:                  r'<iframe[^>]+src=(["\'])((?:https?:)?//players\.brightcove\.net/\d+/[^/]+/index\.html.+?)\1', webpage):
./piksel.py:71:              r'<iframe[^>]+src=["\'](?P<url>(?:https?:)?//player\.piksel\.com/v/[a-z0-9]+)',
./vimeo.py:556:                  r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//player\.vimeo\.com/video/\d+.*?)\1',
./vimeo.py:1159:              r'<iframe[^>]+src="(https?://embed\.vhx\.tv/videos/\d+[^"]*)"', webpage)
./ndr.py:206:               r'<iframe[^>]+id="pp_([\da-z]+)"', ),
./bilibili.org.py:138:                   r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'],
./xhamster.py:396:              r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?xhamster\.com/xembed\.php\?video=\d+)\1',
./arkena.py:60:              r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//play\.arkena\.com/embed/avp/.+?)\1',
./hentaistigma.py:28:              r'<iframe[^>]+src="([^"]+mp4)"', webpage, 'wrapper url')
./common.py.org:697:                  r'<iframe src="([^"]+)"', content,
./megaphone.py:55:              r'<iframe[^>]*?\ssrc=["\'](%s)' % cls._VALID_URL, webpage)]
./expressen.py:54:                  r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?(?:expressen|di)\.se/(?:tvspelare/video|videoplayer/embed)/tv/.+?)\1',
./videa.py:86:              r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//videa\.hu/player\?.*?\bv=.+?)\1',
./videa.py:121:                  r'<iframe.*?src="(/player\?[^"]+)"', video_page, 'player url')
./eagleplatform.py:62:              r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//.+?\.media\.eagleplatform\.com/index/player\?.+?)\1',
./motorsport.py:37:              r'<iframe id="player_iframe"[^>]+src="([^"]+)"', webpage,
./xfileshare.py:96:                  r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:%s)/embed-[0-9a-zA-Z]+.*?)\1'
./vzaar.py:56:              r'<iframe[^>]+src=["\']((?:https?:)?//(?:view\.vzaar\.com)/[0-9]+)',
./videomore.py:141:                  r'<iframe[^>]+src=([\'"])(?P<url>https?://videomore\.ru/embed/\d+)',
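Most of the patterns in the list share one idiom: capture the opening quote of the src attribute, then require the same quote again with a backreference. A minimal way to try that in isolation, reusing the vimeo.py pattern from the list verbatim (the function name `find_iframe_urls` and the constant name are made up for this sketch):

```python
import re

# Pattern copied from the vimeo.py entry in the list above.
VIMEO_IFRAME_RE = r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//player\.vimeo\.com/video/\d+.*?)\1'


def find_iframe_urls(webpage, pattern=VIMEO_IFRAME_RE):
    # Return every src value matched by the pattern's named group "url".
    return [m.group('url') for m in re.finditer(pattern, webpage)]
```

Because the closing quote is the backreference `\1`, the same pattern handles both single- and double-quoted attributes without duplicating the URL part.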

In due course we'll follow this refactoring of the generic extractor from yt-dlp. If a routine is added for this case, it should use the APIs from that refactoring: e.g. add a local IE with a class property _EMBED_REGEX and a (temporary) class method _extract_from_webpage(cls, url, webpage) that generates (or returns) the unique entries from the page, then call that method from a new fragment in the generic extractor.
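As a rough, standalone sketch of that shape (it does not import youtube_dl, so it can be run by itself): the class name, the iframe URL layout for video-cdn.net and the 'VideoCdn' ie_key are all assumptions made for illustration, not taken from this PR.

```python
import re
from urllib.parse import urljoin


class VideoCdnEmbedSketch(object):
    # Assumed iframe URL layout; the real embed URLs may look different.
    _EMBED_REGEX = [
        r'<iframe\b[^>]+\bsrc\s*=\s*(["\'])(?P<url>(?:https?:)?//e\.video-cdn\.net/video\?(?:(?!\1).)+)\1',
    ]

    @classmethod
    def _extract_from_webpage(cls, url, webpage):
        # Yield one url-type result per unique embed URL found in the page.
        seen = set()
        for pattern in cls._EMBED_REGEX:
            for mobj in re.finditer(pattern, webpage):
                # Resolve protocol-relative URLs against the page URL.
                embed_url = urljoin(url, mobj.group('url'))
                if embed_url in seen:
                    continue
                seen.add(embed_url)
                # 'VideoCdn' as ie_key is a guess based on the PR title.
                yield {'_type': 'url', 'url': embed_url, 'ie_key': 'VideoCdn'}
```

The new fragment in the generic extractor would then only need to collect `list(VideoCdnEmbedSketch._extract_from_webpage(url, webpage))` and hand the entries on if the list is non-empty.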

Contributor

@dirkf dirkf left a comment

You need to delete line 5 to pass the linter test (done). Otherwise all good so far.

@mbunse
Author

mbunse commented Jan 23, 2023

@dirkf Thank you for the linter fix!

So, should I provide additional code for the generic extractor to at least find the iframe embeddings in this PR?

@dirkf
Contributor

dirkf commented Jan 23, 2023

Up to you, really.

2 participants