How to properly add custom extractor? #32402

CHJ85 · 2023-07-04T14:30:15Z

Hi there. So I know how to create an extractor python file.
But if I add from .extractor import NameIE to _extractors.py, it'll eventually be overwritten next time an update edits the file.
Is there a better way to do this that I don't know of?

dirkf · 2023-07-04T14:48:37Z

_extractors.py belongs to yt-dlp. In that case read about yt-dlp plugins, which is probably what you want. With yt-dl, you have to maintain your own version of the program if you want to use a private custom extractor.

CHJ85 · 2023-07-04T14:55:41Z

Right. I mixed up the two cuz I have both installed.
Sorry about that.
So how do I add a custom extractor to youtube-dl?
Do I just add it to the __init__.py file and hope it won't be updated anytime soon?

dirkf · 2023-07-04T15:25:42Z

See https://github.com/ytdl-org/youtube-dl#user-content-adding-support-for-a-new-site, especially item 5.

CHJ85 · 2023-07-04T16:52:35Z

Thank you.
But just to clarify, this is the correct way?
Because I did this, but the site is still not a "supported url".
And if this is correct, there must be something wrong with my extractor.

import contextlib
import os

from ..plugins import load_plugins

# NB: Must be before other imports so that plugins can be correctly injected
_PLUGIN_CLASSES = load_plugins('extractor', 'IE')

_LAZY_LOADER = False
if not os.environ.get('YTDLP_NO_LAZY_EXTRACTORS'):
    with contextlib.suppress(ImportError):
        from .lazy_extractors import *  # noqa: F403
        from .lazy_extractors import _ALL_CLASSES
        _LAZY_LOADER = True

if not _LAZY_LOADER:
    from ._extractors import *  # noqa: F403
    _ALL_CLASSES = [  # noqa: F811
        klass
        for name, klass in globals().items()
        if name.endswith('IE') and name != 'GenericIE'
    ]
    _ALL_CLASSES.append(GenericIE)  # noqa: F405

globals().update(_PLUGIN_CLASSES)
_ALL_CLASSES[:0] = _PLUGIN_CLASSES.values()

from .common import _PLUGIN_OVERRIDES  # noqa: F401

# Custom extractors
from .kimcartoon import KimCartoonIE
_ALL_CLASSES.append(KimCartoonIE)
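
A note on the last two lines of the paste above: the list is built so that GenericIE ends up last, which suggests the order matters. The following is a minimal sketch of why, assuming extractors are tried in list order and the first one whose URL pattern matches is used. The Toy* classes, pick() and the example.com URL are hypothetical stand-ins, not part of youtube-dl or yt-dlp.

```python
import re


class ToyIE:
    """Hypothetical stand-in for an extractor base class."""
    _VALID_URL = None

    @classmethod
    def suitable(cls, url):
        # An extractor claims a URL when its pattern matches from the start.
        return re.match(cls._VALID_URL, url) is not None


class ToySiteIE(ToyIE):
    _VALID_URL = r'https?://(?:www\.)?example\.com/watch/(?P<id>\w+)'


class ToyGenericIE(ToyIE):
    # Catch-all pattern, standing in for a generic fallback extractor.
    _VALID_URL = r'.*'


def pick(classes, url):
    """Return the first class in list order that accepts the URL."""
    return next((c for c in classes if c.suitable(url)), None)


url = 'https://example.com/watch/abc'

# Appending after the catch-all: the catch-all is reached first.
appended_last = [ToyGenericIE, ToySiteIE]
# Inserting before the catch-all: the site-specific class is reached first.
inserted_first = [ToySiteIE, ToyGenericIE]

first_pick = pick(appended_last, url).__name__
second_pick = pick(inserted_first, url).__name__
print(first_pick, second_pick)
```

Under that assumption, a class appended after the catch-all would never be selected, so a custom extractor would need to sit ahead of it in the list.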

dirkf · 2023-07-04T17:01:28Z

In yt-dl, edit extractor/extractors.py. For yt-dlp, ask there (#30839).

CHJ85 · 2023-07-04T17:22:01Z

Yea, that is my extractors.py file in youtube-dl.
I don't really care for editing yt-dlp. I just confused it earlier.
So is the one I provided the correct implementation?

dirkf · 2023-07-04T23:11:22Z

Example (git log --patch de48105dd870e353af468bfb8d49b14d9894e649 youtube_dl/extractor/extractors.py):

commit de48105dd870e353af468bfb8d49b14d9894e649
Author: fonkap <fonk666@gmail.com>
Date:   Sat Feb 11 03:47:43 2023 +0100

    [KommunetvIE] Add extractor for kommunetv.no (#31516)
    
    * Add extractor for kommunetv.no
    * Using utils.update_url instead of regex
    
    ---------
    
    Co-authored-by: dirkf <fieldhouse@gmx.net>

diff --git a/youtube_dl/extractor/extractors.py b/youtube_dl/extractor/extractors.py
index f63a2e030..d8428f46f 100644
--- a/youtube_dl/extractor/extractors.py
+++ b/youtube_dl/extractor/extractors.py
@@ -557,6 +557,7 @@ from .khanacademy import (
 from .kickstarter import KickStarterIE
 from .kinja import KinjaEmbedIE
 from .kinopoisk import KinoPoiskIE
+from .kommunetv import KommunetvIE
 from .konserthusetplay import KonserthusetPlayIE
 from .krasview import KrasViewIE
 from .kth import KTHIE

CHJ85 · 2023-07-05T01:13:48Z

Thank you. Yea, I figured that part out.
Turns out there's an issue with my extractor.

from __future__ import unicode_literals
from youtube_dl.extractor.common import InfoExtractor


class KimCartoonIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?kimcartoon\.(?:li|me)/[^/]+/watch/[^/]+'
    _TESTS = [
        {
            'url': 'https://kimcartoon.li/Cartoon/Camp-Lazlo/Season-02-Episode-12-Theres-No-Place-Like-Gnome-Hot-Spring-Fever',
            'info_dict': {
                'id': '21899',
                'ext': 'mp4',
                'description': 'Watch online and download cartoon Camp Lazlo! Season 02 Episode 12 Theres No Place Like Gnome - Hot Spring Fever  in high quality. Various formats from 240p to 720p HD (or even 1080p). HTML5 available for mobile devices',
                'title': 'Season 02 Episode 12 Theres No Place Like Gnome - Hot Spring Fever',
            },
            'params': {
                # Some episodes may be geo-restricted
                'skip_download': True,
            },
        }
    ]

    def _real_extract(self, url):
        webpage = self._download_webpage(url, None, note=False)

        # Extract video information from the webpage
        video_id = self._match_id(url)
        title = self._html_search_regex(r'<title>(.*?)</title>', webpage, 'title')
        description = self._html_search_regex(r'<meta name="description" content="(.*?)"', webpage, 'description')

        return {
            'id': video_id,
            'title': title,
            'url': 'Sample URL',
            'ext': 'mp4',
            'description': description,
        }
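
The thread does not say what the issue was. One thing that can be checked with nothing but the re module is whether the _VALID_URL above matches the URL in _TESTS. The sketch below does that. CANDIDATE is a hypothetical pattern written only for illustration, assuming the site's episode URLs look like the one in the test. It also adds a named id group, which _match_id() reads from _VALID_URL.

```python
import re

# Pattern as posted: requires a literal "/watch/" path segment and
# defines no "id" group.
POSTED = r'https?://(?:www\.)?kimcartoon\.(?:li|me)/[^/]+/watch/[^/]+'

# Hypothetical alternative (an assumption, not a confirmed URL scheme):
# matches /Cartoon/<series>/<episode> and captures the episode slug as "id".
CANDIDATE = (r'https?://(?:www\.)?kimcartoon\.(?:li|me)'
             r'/Cartoon/[^/]+/(?P<id>[^/?#]+)')

# The URL from the extractor's own _TESTS entry.
TEST_URL = ('https://kimcartoon.li/Cartoon/Camp-Lazlo/'
            'Season-02-Episode-12-Theres-No-Place-Like-Gnome-Hot-Spring-Fever')

posted_match = re.match(POSTED, TEST_URL)
candidate_match = re.match(CANDIDATE, TEST_URL)

print('posted pattern matches:', posted_match is not None)
print('candidate id:', candidate_match.group('id'))
```

With a pattern like CANDIDATE the captured id is the episode slug, so the 'id': '21899' expected by the test would have to change, or the numeric id would have to be extracted from the page instead.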

CHJ85 added the question label Jul 4, 2023

dirkf closed this as completed Jul 4, 2023

dirkf added the documentation label Jul 4, 2023

