[hanime] Add new extractor #24328

Closed
wants to merge 10 commits into from


BrutuZ commented Mar 12, 2020

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

Adds an extractor for hanime.tv (NSFW) that fills in as many metadata fields as the site provides.

dstftw (Collaborator) commented Mar 12, 2020

Read coding conventions.

Calculate TBR from Filesize and Duration, if provided
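A minimal sketch of the TBR calculation this commit describes. It assumes the file size is in bytes and the duration in seconds, and follows youtube-dl's convention of reporting `tbr` in kilobits per second. The helper name `calc_tbr` is hypothetical.

```python
def calc_tbr(filesize, duration):
    """Return the average total bitrate in KBit/s, or None if unknown.

    filesize: size in bytes (assumed); duration: length in seconds (assumed).
    """
    if not filesize or not duration:
        # Missing or zero values: leave tbr unset rather than guess.
        return None
    # bytes -> bits (* 8), per second (/ duration), bits -> kilobits (/ 1000)
    return float(filesize) * 8 / duration / 1000
```

For example, a 50,000,000-byte file that runs 400 seconds gives `calc_tbr(50000000, 400)`, i.e. 1000.0 KBit/s.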
Use parsing and conversion functions
on int_or_none and float_or_none
BrutuZ (Author) commented Mar 12, 2020

Did that cover all the coding-convention recommendations, or did I still miss something?

video_slug = self._match_id(url)

webpage = self._download_webpage(url, video_slug)
page_json = self._html_search_regex(r'window.__NUXT__=(.+?);<\/script>', webpage, 'Inline JSON')
dstftw (Collaborator):

  1. Extract dict if you expect dict.
  2. Relax regex.
  3. Escape dots.
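One way to read the three points together, sketched outside youtube-dl with plain `re` and `json`: capture only an object literal (`{...}`), tolerate optional whitespace around `=` and `;`, and escape the dot in `window.__NUXT__`. Inside the extractor the equivalents would be `self._search_regex` followed by `self._parse_json`. The function name and the sample markup are made up for illustration, and the sketch assumes the page embeds plain JSON rather than a JavaScript expression.

```python
import json
import re

# Escaped dot, optional whitespace, and a capture that starts at '{' so only
# a JSON object can match.
NUXT_RE = re.compile(
    r'window\.__NUXT__\s*=\s*(\{.+?\})\s*;?\s*</script>', re.DOTALL)


def extract_nuxt_state(webpage):
    """Return the window.__NUXT__ object as a dict, or raise ValueError."""
    mobj = NUXT_RE.search(webpage)
    if not mobj:
        raise ValueError('Unable to find window.__NUXT__')
    # The captured text begins with '{' and ends with '}', so valid JSON
    # here always decodes to a dict.
    return json.loads(mobj.group(1))
```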


page_json = self._parse_json(page_json, video_slug).get('state').get('data').get('video').get('hentai_video')
dstftw (Collaborator):

Read coding conventions on mandatory data.
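youtube-dl's coding conventions separate mandatory metadata, accessed directly with `[...]` so extraction fails loudly if it is missing, from optional metadata, accessed with `.get()` (or `try_get`) so a missing value just becomes `None`. Chained `.get(...).get(...)` does neither cleanly: a missing level raises `AttributeError: 'NoneType' object has no attribute 'get'` instead of a clear failure. A minimal sketch of the split, assuming the NUXT layout quoted above and treating `name` as mandatory; the function name and the optional keys `description` and `views` are hypothetical.

```python
def extract_video_metadata(nuxt):
    # Mandatory path: a KeyError here points straight at the missing key.
    video = nuxt['state']['data']['video']['hentai_video']
    return {
        # Mandatory: without a title the extraction should fail.
        'title': video['name'],
        # Optional (hypothetical keys): missing values simply become None.
        'description': video.get('description'),
        'view_count': video.get('views'),
    }
```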

'API Call', headers={'X-Directive': 'api'}).get('videos_manifest').get('servers')[0].get('streams')

title = page_json.get('name') or api_json.get[0].get('video_stream_group_id')
tags = [t.get('text') for t in page_json.get('hentai_tags')]
dstftw (Collaborator):

Breaks.
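The comment does not say how these lines break. Two failures are visible in the excerpt: `api_json.get[0]` subscripts a method and raises `TypeError`, and the tag comprehension raises `TypeError` when `hentai_tags` is missing (iterating `None`) or `AttributeError` when an entry is not a dict. A defensive sketch for the tags, assuming `hentai_tags` is meant to be a list of `{'text': ...}` objects; the helper name is hypothetical.

```python
def extract_tags(video):
    tags = []
    # `or []` guards against a missing or None 'hentai_tags' value.
    for tag in video.get('hentai_tags') or []:
        if not isinstance(tag, dict):
            continue  # skip malformed entries instead of crashing
        text = tag.get('text')
        if text:
            tags.append(text)
    return tags
```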


formats = []
for f in api_json:
    item_url = url_or_none(f.get('url')) or url_or_none('https://hanime.tv/api/v1/m3u8s/%s.m3u8' % f.get('id'))
dstftw (Collaborator):

Breaks.
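Again the failure mode is not spelled out. One likely problem: when `f.get('id')` is `None`, the fallback becomes `https://hanime.tv/api/v1/m3u8s/None.m3u8`, which `url_or_none` accepts because it only checks the URL scheme, so a broken link is emitted instead of the format being skipped. A sketch that builds the fallback only when an id exists. The `url_or_none` below is a simplified stand-in for youtube-dl's helper (http(s) and protocol-relative URLs only), `stream_url` is a hypothetical name, and the fallback template is copied from the diff.

```python
import re


def url_or_none(url):
    # Simplified stand-in: accept only strings that look like http(s) or
    # protocol-relative URLs.
    if not isinstance(url, str):
        return None
    url = url.strip()
    return url if re.match(r'^(?:https?:)?//', url) else None


def stream_url(stream):
    url = url_or_none(stream.get('url'))
    if url:
        return url
    stream_id = stream.get('id')
    if stream_id is None:
        return None  # no usable URL: let the caller skip this format
    return 'https://hanime.tv/api/v1/m3u8s/%s.m3u8' % stream_id
```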

Resolved, outdated review threads: four on youtube_dl/extractor/hanime.py and two on youtube_dl/utils.py.
BrutuZ (Author) left a comment:

I think everything has been addressed. Hopefully I got the meaning of the one-word comments right xD

BrutuZ requested a review from dstftw on March 13, 2020 at 19:48
Iterate over server list instead of always using first index
Add a couple of fallbacks
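A sketch of iterating every server rather than only `servers[0]`. It uses a simplified stand-in for youtube-dl's `try_get(src, getter, expected_type)` to reach `videos_manifest.servers` without crashing. The manifest layout (`servers`, `streams`, `compatibility`, `url`, `width`, `height`) is taken from the diff excerpts; `manifest_formats` is a hypothetical name, and the output uses youtube-dl's standard format keys.

```python
def try_get(src, getter, expected_type=None):
    # Simplified stand-in: return getter(src), or None if the lookup fails
    # or the result is not of expected_type.
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


def manifest_formats(manifest):
    servers = try_get(manifest, lambda x: x['videos_manifest']['servers'], list) or []
    formats = []
    for server in servers:
        for stream in try_get(server, lambda x: x['streams'], list) or []:
            # Mirrors the diff: keep only streams marked compatible with all clients.
            if not isinstance(stream, dict) or stream.get('compatibility') != 'all':
                continue
            url = stream.get('url')
            if not url:
                continue
            formats.append({
                'url': url,
                'width': stream.get('width'),
                'height': stream.get('height'),
            })
    return formats
```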
'https://members.hanime.tv/api/v3/videos_manifests/%s' % video_slug,
video_slug,
'API Call', headers={'X-Directive': 'api'}), lambda x: x['videos_manifest']['servers'], list) or []
title = page_json.get('name')
dstftw (Collaborator):

Mandatory.

duration = parse_duration('%sms' % page_json.get('duration_in_ms'))
dstftw (Collaborator):

Again: float_or_none, not parse_duration.
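`parse_duration` is meant for human-readable duration strings, so formatting a number into `'%sms'` and parsing it back is a roundabout path, and it builds the string `'Nonems'` when the field is missing. youtube-dl's `float_or_none(v, scale=1, invscale=1, default=None)` already divides by `scale`, so milliseconds convert to seconds with `scale=1000`. The function below is a simplified stand-in with the same core behaviour, and the sample value is made up.

```python
def float_or_none(v, scale=1, invscale=1, default=None):
    # Stand-in mirroring youtube-dl's helper: scale numbers, tolerate junk.
    if v is None:
        return default
    try:
        return float(v) * invscale / scale
    except (ValueError, TypeError):
        return default


page_json = {'duration_in_ms': 1425000}  # made-up sample
# duration_in_ms -> seconds
duration = float_or_none(page_json.get('duration_in_ms'), 1000)
```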

tags = []
for tag in page_json.get('hentai_tags'):
dstftw (Collaborator):

Breaks.


def _real_extract(self, url):
    video_slug = self._match_id(url)
    page_json = self._html_search_regex(r'<script>.+__NUXT__=(.+?);<\/script>', self._download_webpage(url, video_slug), 'Inline JSON')
dstftw (Collaborator):

Nothing changed.

for stream in server['streams']:
    if stream.get('compatibility') != 'all':
        continue
    item_url = sanitize_url(stream.get('url')) or sanitize_url('https://hanime.tv/api/v1/m3u8s/%s.m3u8' % stream.get('id'))
dstftw (Collaborator):

Nothing changed.

format = {
    'width': width,
    'height': height,
    'filesize_approx': float_or_none(parse_filesize('%sMb' % stream.get('filesize_mbs'))),
dstftw (Collaborator):

See above.
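Following the same advice, the size can be computed with the `invscale` argument of `int_or_none` instead of round-tripping through `parse_filesize('%sMb' % ...)`, which also produces the string `'NoneMb'` when the field is absent. The thread does not say whether `filesize_mbs` means 10^6 or 2^20 bytes; the sketch assumes decimal megabytes. `int_or_none` here is a simplified stand-in for youtube-dl's helper of the same name, and the stream entry is hypothetical.

```python
def int_or_none(v, scale=1, default=None, invscale=1):
    # Stand-in mirroring youtube-dl's int_or_none: int(v) * invscale // scale
    if v is None or v == '':
        return default
    try:
        return int(v) * invscale // scale
    except (ValueError, TypeError):
        return default


stream = {'filesize_mbs': 312}  # hypothetical stream entry
# megabytes (assumed decimal) -> bytes
filesize_approx = int_or_none(stream.get('filesize_mbs'), invscale=1000 * 1000)
```

If the site reports fractional sizes, `int()` would truncate them; `float_or_none` with the same `invscale` keeps the fraction.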

Comment on lines 93 to 94
{'preference': 0, 'id': 'Poster', 'url': page_json.get('poster_url')},
{'preference': 1, 'id': 'Cover', 'url': page_json.get('cover_url')},
dstftw (Collaborator):

Nothing changed.

BrutuZ (Author) commented Mar 14, 2020

Since I'm not an actual programmer, could I kindly ask for more than a couple of words on the requested changes? Laconic answers can become (and some already have become) a time-consuming guessing game 😕


2 participants