
ecchi.iwara.tv metadata uploader being NA #24237

Open
mo-han opened this issue Mar 4, 2020 · 8 comments
Comments

mo-han commented Mar 4, 2020

Checklist

  • I'm reporting a broken site support issue
  • I've verified that I'm running youtube-dl version 2020.03.01
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar bug reports including closed ones
  • I've read the bugs section in the FAQ

Verbose log

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-f', '(mp4)[height<=1080][fps<=60]+(m4a/aac)/bestvideo+bestaudio/best', '--proxy=http://127.0.0.1:7777', '--external-downloader', 'aria2c', '--external-downloader-args', '-x5 -s5 -k 1M --file-allocation=trunc', '-o', '%(title)s [%(id)s][%(uploader)s].%(ext)s', '--yes-playlist', 'https://ecchi.iwara.tv/videos/z0y6puxqzqfkm7vre', '--no-check-certificate', '-v']
[debug] Encodings: locale cp936, fs utf-8, out utf-8, pref cp936
[debug] youtube-dl version 2020.03.01
[debug] Python version 3.6.6 (CPython) - Windows-10-10.0.10240-SP0
[debug] exe versions: ffmpeg 4.0.2, ffprobe 4.0.2, phantomjs 2.1.1
[debug] Proxy map: {'http': 'http://127.0.0.1:7777', 'https': 'http://127.0.0.1:7777'}
[Iwara] z0y6puxqzqfkm7vre: Downloading webpage
[Iwara] z0y6puxqzqfkm7vre: Downloading JSON metadata
[debug] Invoking downloader on 'https://ling.iwara.tv/file.php?expire=1583332380&hash=14947e72f69e3f85d6cfdd589ebde21cdc20ee0d&file=2019%2F03%2F12%2F1552384452_z0y6puXQZQFkm7vRE_Source.mp4&op=dl&r=0'
[download] 疑心暗鬼 [z0y6puxqzqfkm7vre][NA].mp4 has already been downloaded
[download] 100% of 105.55MiB

Description

Videos download successfully, but %(uploader)s is always rendered as NA (it is part of my desired output filename template).

mo-han (author) commented Mar 23, 2020

Dammit, I couldn't wait any longer, so I wrote a tiny Python script to fetch the uploader and rename the downloaded mp4 files. It turns out the iwara video page can be parsed straightforwardly with the lxml library, and the uploader is easy to extract. I haven't opened a pull request, though, because something this simple should be trivial to fix upstream.

Here is my script (written for my own use), anyway.

#!/usr/bin/env python3
# encoding=utf8
# Look up the uploader of an iwara.tv video on its web page, then rename
# already-downloaded mp4 files, replacing youtube-dl's [NA] tag with [uploader].
import sys
from urllib.parse import urlparse
from lxml import html
from requests import get
from glob import glob
from os.path import split, splitext, join
from os import rename


class IwaraVideo:
    def __init__(self, url: str):
        self.urlparse = urlparse(url)
        if 'iwara' not in self.urlparse.hostname:
            raise ValueError(url)
        elif 'video' not in self.urlparse.path:
            raise ValueError(url)
        self.url = url
        self.html = None
        self.meta = {
            'id': self.urlparse.path.split('/')[-1],
        }

    def get_page(self):
        if not self.html:
            r = get(self.url)
            self.html = html.document_fromstring(r.text)
        return self.html

    def get_uploader(self):
        video_page = self.get_page()
        uploader = video_page.xpath('//div[@class="node-info"]//div[@class="submitted"]//a[@class="username"]')[0].text
        self.meta['uploader'] = uploader
        return uploader

    def find_files_by_id(self, search_in=''):
        id_tag = '[{}]'.format(self.meta['id'])
        self.meta['id_tag'] = id_tag
        mp4_l = glob(search_in + '*.mp4')
        r_l = []
        for i in mp4_l:
            if id_tag in i:
                r_l.append(i)
        return r_l

    def rename_files_from_ytdl_na_to_uploader(self, search_in=''):
        na_tag = '[NA]'
        path_l = self.find_files_by_id(search_in=search_in)
        id_tag = self.meta['id_tag']
        uploader = self.get_uploader()
        up_tag = '[{}]'.format(uploader)
        for p in path_l:
            dirname, basename = split(p)
            filename, extension = splitext(basename)
            if na_tag in filename:
                left, right = filename.split(id_tag, maxsplit=1)
                right = right.replace(na_tag, up_tag, 1)
                new_basename = left + id_tag + right + extension
                new_path = join(dirname, new_basename)
                rename(p, new_path)


if __name__ == '__main__':
    u = sys.argv[1]
    video = IwaraVideo(u)
    video.rename_files_from_ytdl_na_to_uploader()
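The core of the rename step is pure string manipulation, so it can be isolated and checked without touching the filesystem. A minimal sketch (`replace_na_tag` is a hypothetical helper name; the `[id][NA]` filename layout comes from the `-o` template shown in the verbose log above):

```python
from os.path import splitext


def replace_na_tag(basename, video_id, uploader, na_tag='[NA]'):
    """Rewrite 'Title [id][NA].mp4' to 'Title [id][uploader].mp4'.

    Mirrors the rename logic above: only the first '[NA]' after the
    '[id]' tag is replaced; filenames without both tags pass through.
    """
    id_tag = '[{}]'.format(video_id)
    filename, extension = splitext(basename)
    if id_tag not in filename or na_tag not in filename:
        return basename  # nothing to rewrite
    left, right = filename.split(id_tag, maxsplit=1)
    right = right.replace(na_tag, '[{}]'.format(uploader), 1)
    return left + id_tag + right + extension
```

Because the function never touches the disk, it is easy to dry-run against a list of candidate filenames before committing to the actual `rename` calls.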
mo-han (author) commented Jul 13, 2020

monkey patch version:

import youtube_dl
from abc import ABCMeta

# get_html_element_tree() is a helper of mine (not shown here) that fetches
# the URL and returns an lxml element tree of the page.


class YoutubeDLIwaraX(youtube_dl.extractor.iwara.IwaraIE, metaclass=ABCMeta):
    def _real_extract(self, url):
        html = get_html_element_tree(url)
        uploader = html.xpath('//div[@class="node-info"]//div[@class="submitted"]//a[@class="username"]')[0].text
        data = super(YoutubeDLIwaraX, self)._real_extract(url)
        data['uploader'] = uploader
        return data


def youtube_dl_main_x_iwara(argv=None):
    youtube_dl.extractor.IwaraIE = YoutubeDLIwaraX
    youtube_dl.main(argv)
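For reference, the XPath in the patch implies a page structure like `div.node-info > div.submitted > a.username`. The same extraction can be sketched with only the standard library, assuming that structure (the sample markup in the test is my guess at iwara's layout, not captured from the site):

```python
from html.parser import HTMLParser


class UploaderParser(HTMLParser):
    """Grab the text of <a class="username"> nested under
    div.node-info and div.submitted, like the XPath in the patch."""

    def __init__(self):
        super().__init__()
        self.stack = []      # open (tag, class) pairs
        self.uploader = None

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, dict(attrs).get('class', '')))

    def handle_endtag(self, tag):
        # pop back to the matching open tag (tolerates simple mismatches)
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        classes = [cls for _, cls in self.stack]
        if (self.uploader is None and self.stack
                and self.stack[-1] == ('a', 'username')
                and 'node-info' in classes and 'submitted' in classes):
            self.uploader = data.strip()


def extract_uploader(page_html):
    parser = UploaderParser()
    parser.feed(page_html)
    return parser.uploader
```

This avoids the lxml/requests dependencies, at the cost of hand-rolled class tracking; the lxml XPath version above is the more robust choice if those libraries are available.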
ZYinMD commented Jul 13, 2020

Hi mo-han, in case this issue is never fixed, could you explain to a non-python-programmer how to use your code?

mo-han (author) commented Jul 14, 2020

@ZYinMD
Refer to my module (written for my own use) as an example -- it's really simple. youtube_dl_main_x_iwara is a modified main() for the original youtube-dl: call it instead, and everything works the same as before, except that the iwara extractor now fills in the uploader.

ZYinMD commented Jul 15, 2020

Thanks! I'll try... By the way, since youtube-dl doesn't support downloading "channels" on iwara, how do you download all videos from one uploader? Do you write your own crawler? I know it's quite easy, but just wondering.

mo-han (author) commented Jul 15, 2020

I don't have a "channel" extractor (and neither does ytdl), and writing one is not "quite easy" for me -- it would need to check each video's "private" flag, follow the "next page" links on the "all videos" listing, and so on. I haven't tried, and there would certainly be plenty of work and problems along the way.

As for your need -- batch downloading from iwara.tv or similar sites -- I do have a solution. It's not fully automated, but it still saves a lot of copy-pasting and mouse clicking.

First we need to get the URLs of the selected videos. I don't use Chrome, but Firefox has a feature called "View Selection Source". When anything is selected (or select everything with Ctrl+A), that entry appears in the right-click context menu; it opens a new tab showing the page's source code, with the parts corresponding to your selection pre-highlighted. So we can select multiple videos (their thumbnails on the page) or simply select the whole page, choose View Selection Source from the context menu, copy (Ctrl+C) the highlighted source to the clipboard, and move on.

Secondly, we need to find all the video URLs in the clipboard. Plenty of tools and methods could do this job, but I wrote my own, called mykit.py. It's a CLI program with many sub-commands, one of which is clipboard.findurl (aliases: cb.url, cburl -- the same command). This sub-command extracts strings matching a given pattern from a file or the clipboard. The pattern is a regex, but we don't need to write one ourselves, since iwara's video URL pattern is already one of the presets. So a simple mykit cburl iwara finds all the video URLs in the source code on the clipboard, prints them line by line, and also copies the results back to the clipboard.
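mykit.py itself isn't shown in this thread, but the cburl step boils down to a regex scan over the clipboard text. A minimal sketch -- the pattern below is an assumed shape for iwara video links, not mykit's actual preset:

```python
import re

# Assumed pattern for iwara video links, e.g.
# https://ecchi.iwara.tv/videos/z0y6puxqzqfkm7vre
IWARA_VIDEO_URL = re.compile(r'https?://(?:\w+\.)?iwara\.tv/videos/[0-9a-zA-Z]+')


def find_iwara_urls(text):
    """Return the video URLs found in text, deduplicated,
    in first-seen order."""
    return list(dict.fromkeys(IWARA_VIDEO_URL.findall(text)))
```

Feeding it the copied page source and joining the result with newlines reproduces the line-by-line output described above.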

Finally, use those URL lines as arguments to launch the download processes. We could save them into a file and use a shell script to read them out and run youtube-dl (or the modified version) on each line. Again, mykit.py can lend a hand, with a sub-command called run.from.lines (aliases: runlines, rl), which reads lines from a file or the clipboard and runs a command template with each line. All I type is the single command mykit.py rl ytdl {}, and it reads the URL lines from the clipboard and runs ytdl {url} for each.
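The runlines step amounts to substituting each line into a {} command template and spawning it. A rough sketch with hypothetical function names (a dry_run flag is added so the command construction can be checked without spawning anything):

```python
import shlex
import subprocess


def format_command(template, line):
    """Split the template shell-style and substitute one line for {}."""
    return [arg.replace('{}', line) for arg in shlex.split(template)]


def run_lines(template, lines, dry_run=False):
    """Run the template once per non-empty line; return the commands built."""
    commands = [format_command(template, line) for line in lines if line.strip()]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=False)  # keep going on failures
    return commands
```

Building argument lists (rather than interpolating into a shell string) sidesteps quoting problems when a URL or title contains shell metacharacters.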

Not very automatic, but convenient enough, isn't it?

Or you could write your own "channel" extractor, if it's worth it.

ZYinMD commented Jul 15, 2020

Thanks so much!! I read all the code in those script files you mentioned, and it makes perfect sense. As a Python noob and PowerShell noob I still have questions about installation; I think I'll open issues in your repo. Thanks and see you there!

mo-han (author) commented Jul 16, 2020

@ZYinMD
That's fine -- I'm half a Python noob and a total PowerShell noob myself.
