ecchi.iwara.tv metadata uploader being NA #24237
Dammit, I couldn't wait any more, so I wrote a tiny Python script to get the uploader and rename the downloaded mp4 files. It turns out the iwara web page can be parsed simply with the lxml library, and the uploader is easy to extract. I didn't open a pull request because such a small change should be easy to do anyway. Here is my self-use script:

```python
#!/usr/bin/env python3
# encoding=utf8
import sys
from urllib.parse import urlparse
from lxml import html
from requests import get
from glob import glob
from os.path import split, splitext, join
from os import rename


class IwaraVideo:
    def __init__(self, url: str):
        self.urlparse = urlparse(url)
        if 'iwara' not in self.urlparse.hostname:
            raise ValueError(url)
        elif 'video' not in self.urlparse.path:
            raise ValueError(url)
        self.url = url
        self.html = None
        self.meta = {
            'id': self.urlparse.path.split('/')[-1],
        }

    def get_page(self):
        if self.html is None:
            r = get(self.url)
            self.html = html.document_fromstring(r.text)
        return self.html

    def get_uploader(self):
        video_page = self.get_page()
        uploader = video_page.xpath(
            '//div[@class="node-info"]//div[@class="submitted"]'
            '//a[@class="username"]')[0].text
        self.meta['uploader'] = uploader
        return uploader

    def find_files_by_id(self, search_in=''):
        id_tag = '[{}]'.format(self.meta['id'])
        self.meta['id_tag'] = id_tag
        mp4_l = glob(search_in + '*.mp4')
        r_l = []
        for i in mp4_l:
            if id_tag in i:
                r_l.append(i)
        return r_l

    def rename_files_from_ytdl_na_to_uploader(self, search_in=''):
        na_tag = '[NA]'
        path_l = self.find_files_by_id(search_in=search_in)
        id_tag = self.meta['id_tag']
        uploader = self.get_uploader()
        up_tag = '[{}]'.format(uploader)
        for p in path_l:
            dirname, basename = split(p)
            filename, extension = splitext(basename)
            if na_tag in filename:
                left, right = filename.split(id_tag, maxsplit=1)
                right = right.replace(na_tag, up_tag, 1)
                new_basename = left + id_tag + right + extension
                new_path = join(dirname, new_basename)
                rename(p, new_path)


if __name__ == '__main__':
    u = sys.argv[1]
    video = IwaraVideo(u)
    video.rename_files_from_ytdl_na_to_uploader()
```
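To make the renaming rule concrete, here is a self-contained sketch of the string transformation the script performs (the filename, id, and uploader values below are made up for illustration):

```python
# Illustrative values only; the real script derives these from the URL
# and from the downloaded files on disk.
id_tag = '[abc123]'          # '[' + video id + ']'
na_tag = '[NA]'              # what youtube-dl writes when uploader is missing
up_tag = '[SomeUploader]'    # '[' + scraped uploader + ']'

filename = 'Some Title [abc123] [NA]'
left, right = filename.split(id_tag, maxsplit=1)  # split once at the id tag
right = right.replace(na_tag, up_tag, 1)          # replace only the first [NA]
new_name = left + id_tag + right
print(new_name)  # Some Title [abc123] [SomeUploader]
```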
Monkey patch version:

```python
import youtube_dl
from abc import ABCMeta

# get_html_element_tree() is a helper (defined elsewhere) that fetches
# the page and returns an lxml element tree of it.

class YoutubeDLIwaraX(youtube_dl.extractor.iwara.IwaraIE, metaclass=ABCMeta):
    def _real_extract(self, url):
        html = get_html_element_tree(url)
        uploader = html.xpath(
            '//div[@class="node-info"]//div[@class="submitted"]'
            '//a[@class="username"]')[0].text
        data = super(YoutubeDLIwaraX, self)._real_extract(url)
        data['uploader'] = uploader
        # print('#', 'uploader:', uploader)
        return data


def youtube_dl_main_x_iwara(argv=None):
    youtube_dl.extractor.IwaraIE = YoutubeDLIwaraX
    youtube_dl.main(argv)
```
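The key move in the patch is replacing the extractor class on the module before youtube-dl instantiates it. The pattern in isolation, using stdlib-only stand-ins (the class and module names here are illustrative, not youtube-dl's real API):

```python
import types

# A fake "extractor module" holding the class a framework would look up.
extractor_module = types.SimpleNamespace()

class FakeIwaraIE:
    def _real_extract(self, url):
        # Pretend the upstream extractor cannot determine the uploader.
        return {'id': url.rsplit('/', 1)[-1], 'uploader': 'NA'}

extractor_module.IwaraIE = FakeIwaraIE

class PatchedIwaraIE(FakeIwaraIE):
    def _real_extract(self, url):
        data = super()._real_extract(url)
        data['uploader'] = 'scraped-name'  # in the real patch: an XPath lookup
        return data

# Monkey patch: the framework now instantiates the subclass instead.
extractor_module.IwaraIE = PatchedIwaraIE
info = extractor_module.IwaraIE()._real_extract('https://example/videos/abc')
print(info['uploader'])  # scraped-name
```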
Hi mo-han, in case this issue is never fixed, could you explain to a non-python-programmer how to use your code?
@ZYinMD |
Thanks! I'll try... By the way, since youtube-dl doesn't support downloading "channels" on iwara, how do you download all videos from one uploader? Do you write your own crawler? I know it's quite easy, but just wondering.
I don't have a "channel" extractor (nor does ytdl), and it's not "quite easy" for me -- it needs to check the "private" flag of the videos, do "next page" actions on the "all videos" result page, etc. I didn't try to do that, and there would definitely be a lot of problems and work to achieve it.

As for your demand -- batch downloading from iwara.tv or similar -- I do have a solution. It's not fully automated, but it still saves a lot of copy-paste and mouse-click operations.

First, we need to get the URLs of the selected videos. I don't use Chrome, but Firefox has a feature called "View Selection Source". When anything is selected (or you select everything with Ctrl+A), that feature appears in the right-click context menu; it brings you to a new tab containing the source code of the page, with the parts corresponding to your selection auto-selected for you. So we can just use the mouse to select multiple videos (their thumbnails on the web page), or select all elements on the page, then choose "View Selection Source" and copy the selected source.

Secondly, we need to find all the video URLs in the clipboard. While a lot of tools and methods could be used for this job, I wrote my own tool, called mykit.py. It's a CLI program with a lot of sub-commands, one of which does exactly this.

Finally, just use those lines of URLs as arguments to launch multiple download processes. We could save the URL lines into a file and use a shell script to read them out and run youtube-dl (or the modified version) on each line.

Not very automatic, but convenient enough, isn't it? Or you could write your own "channel" extractor, if it's worth it.
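The "find all the video URLs" step doesn't strictly require mykit.py; a minimal stand-in with the stdlib re module might look like this (the URL pattern is an assumption about iwara's `/videos/<id>` path layout, based on the script above):

```python
import re

def find_iwara_video_urls(text):
    # Match http(s)://...iwara.tv/videos/<id> style links in arbitrary text.
    pattern = r'https?://[\w.]*iwara\.tv/videos/[\w-]+'
    # De-duplicate while keeping first-seen order.
    seen, urls = set(), []
    for match in re.findall(pattern, text):
        if match not in seen:
            seen.add(match)
            urls.append(match)
    return urls

page_source = '''
<a href="https://ecchi.iwara.tv/videos/abc123">one</a>
<a href="https://ecchi.iwara.tv/videos/abc123">dup</a>
<a href="https://ecchi.iwara.tv/videos/xyz-789">two</a>
'''
print('\n'.join(find_iwara_video_urls(page_source)))
```

The resulting lines can then be saved to a file and fed to youtube-dl one per invocation, as described above.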
Thanks so much!! I read all the code in those script files you mentioned, and they make perfect sense. As a python-noob and powershell-noob I still have questions about installing; I think I could open issues in your repo. Thanks and see you there!
@ZYinMD |
Checklist
Verbose log
Description
Videos are downloaded successfully, but `%(uploader)s` is always replaced by `NA` (it's part of my wanted filename format).
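For context on why `NA` shows up at all: youtube-dl fills its output template from the extractor's info dict and substitutes the string `NA` for fields the extractor didn't provide. A simplified sketch of that behavior (not youtube-dl's actual code):

```python
# Simplified: youtube-dl's real template handling is more involved, but
# the net effect for a missing field is the same.
template = '%(title)s [%(id)s] [%(uploader)s].%(ext)s'
info = {'title': 'Some Title', 'id': 'abc123', 'ext': 'mp4'}  # no 'uploader'

fields = {key: info.get(key, 'NA')
          for key in ('title', 'id', 'uploader', 'ext')}
print(template % fields)  # Some Title [abc123] [NA].mp4
```

So the fix on the extractor side is simply to populate `uploader` in the info dict, which is what the scripts above work around from the outside.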