How to properly add custom extractor? #32402

Closed
CHJ85 opened this issue Jul 4, 2023 · 8 comments

Comments

@CHJ85

CHJ85 commented Jul 4, 2023

Hi there. So I know how to create an extractor Python file.
But if I add from .extractor import NameIE to _extractors.py, it will eventually be overwritten the next time an update edits the file.
Is there a better way to do this that I don't know of?

@CHJ85 CHJ85 added the question label Jul 4, 2023
@dirkf
Contributor

dirkf commented Jul 4, 2023

_extractors.py belongs to yt-dlp. In that case read about yt-dlp plugins, which is probably what you want. With yt-dl, you have to maintain your own version of the program if you want to use a private custom extractor.

@dirkf dirkf closed this as completed Jul 4, 2023
@CHJ85
Author

CHJ85 commented Jul 4, 2023

Right. I mixed up the two because I have both installed.
Sorry about that.
So how do I add a custom extractor to youtube-dl?
Do I just add it to the __init__.py file and hope it won't be updated anytime soon?

@dirkf
Contributor

dirkf commented Jul 4, 2023

See https://github.com/ytdl-org/youtube-dl#user-content-adding-support-for-a-new-site, especially item 5.

@CHJ85
Author

CHJ85 commented Jul 4, 2023

Thank you.
But just to clarify, this is the correct way?
Because I did this, but the site is still not a "supported url".
And if this is correct, there must be something wrong with my extractor.

import contextlib
import os

from ..plugins import load_plugins

# NB: Must be before other imports so that plugins can be correctly injected
_PLUGIN_CLASSES = load_plugins('extractor', 'IE')

_LAZY_LOADER = False
if not os.environ.get('YTDLP_NO_LAZY_EXTRACTORS'):
    with contextlib.suppress(ImportError):
        from .lazy_extractors import *  # noqa: F403
        from .lazy_extractors import _ALL_CLASSES
        _LAZY_LOADER = True

if not _LAZY_LOADER:
    from ._extractors import *  # noqa: F403
    _ALL_CLASSES = [  # noqa: F811
        klass
        for name, klass in globals().items()
        if name.endswith('IE') and name != 'GenericIE'
    ]
    _ALL_CLASSES.append(GenericIE)  # noqa: F405

globals().update(_PLUGIN_CLASSES)
_ALL_CLASSES[:0] = _PLUGIN_CLASSES.values()

from .common import _PLUGIN_OVERRIDES  # noqa: F401

# Custom extractors
from .kimcartoon import KimCartoonIE
_ALL_CLASSES.append(KimCartoonIE)
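
One possible reason an extractor appended this way still yields an unsupported-URL error: the file above deliberately appends GenericIE last, and a catch-all extractor that sits earlier in the list would win if extractors are tried in list order. The sketch below is a simplified model of that first-match selection, not the project's actual code. ToyExtractor, ToySiteIE, ToyGenericIE, pick_extractor and the example.com URL are all hypothetical names made up for illustration.

```python
import re


class ToyExtractor:
    """Hypothetical stand-in for an extractor base class."""
    _VALID_URL = None

    @classmethod
    def suitable(cls, url):
        # An extractor claims a URL when its pattern matches from the start.
        return re.match(cls._VALID_URL, url) is not None


class ToySiteIE(ToyExtractor):
    # Site-specific pattern (hypothetical site).
    _VALID_URL = r'https?://example\.com/video/(?P<id>\d+)'


class ToyGenericIE(ToyExtractor):
    # Catch-all pattern: claims every URL.
    _VALID_URL = r'.*'


def pick_extractor(classes, url):
    """Return the first class in list order that claims the URL."""
    for klass in classes:
        if klass.suitable(url):
            return klass
    return None


url = 'https://example.com/video/42'

# Custom extractor appended after the catch-all: the catch-all wins.
appended_after_generic = [ToyGenericIE, ToySiteIE]
# Custom extractor placed before the catch-all: the custom one wins.
inserted_before_generic = [ToySiteIE, ToyGenericIE]

first = pick_extractor(appended_after_generic, url).__name__
second = pick_extractor(inserted_before_generic, url).__name__
print(first, second)
```

Under that assumption, a custom class needs to end up ahead of the generic one in the list, which is what importing it alongside the other site extractors achieves.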

@dirkf
Contributor

dirkf commented Jul 4, 2023

In yt-dl, edit extractor/extractors.py. For yt-dlp, ask there (#30839).

@CHJ85
Author

CHJ85 commented Jul 4, 2023

Yea, that is my extractors.py file in youtube-dl.
I don't really care for editing yt-dlp. I just confused it earlier.
So is the one I provided the correct implementation?

@dirkf
Contributor

dirkf commented Jul 4, 2023

Example (git log --patch de48105dd870e353af468bfb8d49b14d9894e649 youtube_dl/extractor/extractors.py):

commit de48105dd870e353af468bfb8d49b14d9894e649
Author: fonkap <fonk666@gmail.com>
Date:   Sat Feb 11 03:47:43 2023 +0100

    [KommunetvIE] Add extractor for kommunetv.no (#31516)
    
    * Add extractor for kommunetv.no
    * Using utils.update_url instead of regex
    
    ---------
    
    Co-authored-by: dirkf <fieldhouse@gmx.net>

diff --git a/youtube_dl/extractor/extractors.py b/youtube_dl/extractor/extractors.py
index f63a2e030..d8428f46f 100644
--- a/youtube_dl/extractor/extractors.py
+++ b/youtube_dl/extractor/extractors.py
@@ -557,6 +557,7 @@ from .khanacademy import (
 from .kickstarter import KickStarterIE
 from .kinja import KinjaEmbedIE
 from .kinopoisk import KinoPoiskIE
+from .kommunetv import KommunetvIE
 from .konserthusetplay import KonserthusetPlayIE
 from .krasview import KrasViewIE
 from .kth import KTHIE
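
A one-line import is enough here because the package's __init__.py collects every imported name ending in IE, using the same list comprehension that appears in the file pasted earlier in this thread. The sketch below reproduces that idiom on a plain dict instead of globals(), so it runs on its own. The stub classes and the helper function are hypothetical placeholders, not the real extractors.

```python
class InfoExtractor:
    """Hypothetical stub for the extractor base class."""


class KommunetvIE(InfoExtractor):
    """Stub standing in for the class imported by the diff above."""


class GenericIE(InfoExtractor):
    """Stub for the catch-all extractor."""


def helper():
    """A non-extractor name that the filter should skip."""


# Stand-in for the module namespace after `from .extractors import *`.
namespace = {
    'KommunetvIE': KommunetvIE,
    'GenericIE': GenericIE,
    'helper': helper,
}

# Same filter as the pasted __init__.py: keep names ending in 'IE',
# hold GenericIE back, then append it so it ends up last.
all_classes = [
    klass
    for name, klass in namespace.items()
    if name.endswith('IE') and name != 'GenericIE'
]
all_classes.append(GenericIE)

names = [klass.__name__ for klass in all_classes]
print(names)
```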

@CHJ85
Author

CHJ85 commented Jul 5, 2023

Thank you. Yeah, I figured that part out.
Turns out there's an issue with my extractor.

from __future__ import unicode_literals
from youtube_dl.extractor.common import InfoExtractor


class KimCartoonIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?kimcartoon\.(?:li|me)/[^/]+/watch/[^/]+'
    _TESTS = [
        {
            'url': 'https://kimcartoon.li/Cartoon/Camp-Lazlo/Season-02-Episode-12-Theres-No-Place-Like-Gnome-Hot-Spring-Fever',
            'info_dict': {
                'id': '21899',
                'ext': 'mp4',
                'description': 'Watch online and download cartoon Camp Lazlo! Season 02 Episode 12 Theres No Place Like Gnome - Hot Spring Fever  in high quality. Various formats from 240p to 720p HD (or even 1080p). HTML5 available for mobile devices',
                'title': 'Season 02 Episode 12 Theres No Place Like Gnome - Hot Spring Fever',
            },
            'params': {
                # Some episodes may be geo-restricted
                'skip_download': True,
            },
        }
    ]

    def _real_extract(self, url):
        webpage = self._download_webpage(url, None, note=False)

        # Extract video information from the webpage
        video_id = self._match_id(url)
        title = self._html_search_regex(r'<title>(.*?)</title>', webpage, 'title')
        description = self._html_search_regex(r'<meta name="description" content="(.*?)"', webpage, 'description')

        return {
            'id': video_id,
            'title': title,
            'url': 'Sample URL',
            'ext': 'mp4',
            'description': description,
        }
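
Two things in the extractor above can be checked without youtube-dl at all: the _VALID_URL pattern requires a /watch/ path segment that the test URL does not contain, and the pattern has no named id group, which _match_id relies on. The sketch below tests both with the standard re module. CANDIDATE_URL is only a hypothetical replacement pattern based on the single test URL shown, not a verified description of the site's URL scheme.

```python
import re

# Pattern and test URL copied from the extractor above.
VALID_URL = r'https?://(?:www\.)?kimcartoon\.(?:li|me)/[^/]+/watch/[^/]+'
TEST_URL = (
    'https://kimcartoon.li/Cartoon/Camp-Lazlo/'
    'Season-02-Episode-12-Theres-No-Place-Like-Gnome-Hot-Spring-Fever'
)

# The original pattern expects ".../<segment>/watch/<segment>".
original_match = re.match(VALID_URL, TEST_URL)

# Named groups defined by the original pattern (empty dict: no 'id' group).
original_groups = re.compile(VALID_URL).groupindex

# Hypothetical alternative: match /Cartoon/<series>/<episode> and capture
# the episode slug as the 'id' group.
CANDIDATE_URL = (
    r'https?://(?:www\.)?kimcartoon\.(?:li|me)'
    r'/Cartoon/[^/]+/(?P<id>[^/?#]+)'
)
candidate_match = re.match(CANDIDATE_URL, TEST_URL)

print(original_match)
print(dict(original_groups))
print(candidate_match.group('id'))
```

With a pattern like the candidate, the captured id would be the episode slug rather than the '21899' listed in _TESTS, so either the expected id or the way the id is obtained would also need to change. The 'Sample URL' placeholder in the returned dict would likewise have to be replaced with a real media URL taken from the page.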
