[duboku] add new extractor #26467

Open
wants to merge 8 commits into
base: master

@lkho lkho commented Aug 29, 2020

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense

What is the purpose of your pull request?

  • New extractor

Resolves #22125.

@dirkf dirkf left a comment

Passes tests with all suggested changes applied on top of 530f458.

I'll apply them if you like.

IE_NAME = 'duboku'
IE_DESC = 'www.duboku.co'

_VALID_URL = r'(?:https?://[^/]+\.duboku\.co/vodplay/)(?P<id>[0-9]+-[0-9-]+)\.html.*'

Require n-n-n in id field; no need to match the tail:

Suggested change
- _VALID_URL = r'(?:https?://[^/]+\.duboku\.co/vodplay/)(?P<id>[0-9]+-[0-9-]+)\.html.*'
+ _VALID_URL = r'(?:https?://[^/]+\.duboku\.co/vodplay/)(?P<id>(?:[0-9]+-){2}[0-9]+)\.html'
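
To illustrate the difference between the two patterns, here is a minimal sketch using only Python's `re` module. The helper `match_id` and the two-part URL are made up for illustration; only the first URL comes from the tests in this PR.

```python
import re

OLD = r'(?:https?://[^/]+\.duboku\.co/vodplay/)(?P<id>[0-9]+-[0-9-]+)\.html.*'
NEW = r'(?:https?://[^/]+\.duboku\.co/vodplay/)(?P<id>(?:[0-9]+-){2}[0-9]+)\.html'


def match_id(pattern, url):
    """Return the captured id, or None if the URL does not match (hypothetical helper)."""
    mobj = re.match(pattern, url)
    return mobj.group('id') if mobj else None


good = 'https://www.duboku.co/vodplay/1575-1-1.html'
malformed = 'https://www.duboku.co/vodplay/1575-1.html'  # hypothetical two-part id

# The old pattern accepts a two-part id; the new one insists on n-n-n.
print(match_id(OLD, malformed), match_id(NEW, malformed))

# re.match() only anchors at the start, so a trailing '.*' is not needed
# for URLs that carry a query string or fragment after '.html'.
print(match_id(NEW, good + '?from=home'))
```

Because `re.match` does not require the whole string to match, dropping the tail does not reject URLs that have extra text after `.html`.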

'url': 'https://www.duboku.co/vodplay/1575-1-1.html',
'info_dict': {
'id': '1575-1-1',
'ext': 'ts',

Fix test:

Suggested change
- 'ext': 'ts',
+ 'ext': 'mp4',

'url': 'https://www.duboku.co/vodplay/1588-1-1.html',
'info_dict': {
'id': '1588-1-1',
'ext': 'ts',

Fix test:

Suggested change
- 'ext': 'ts',
+ 'ext': 'mp4',

'id': '1588-1-1',
'ext': 'ts',
'series': '亲爱的自己',
'title': 'contains:预告片',

Page has changed:

Suggested change
- 'title': 'contains:预告片',
+ 'title': '亲爱的自己 第1集',

Comment on lines +92 to +95
temp = video_id.split('-')
series_id = temp[0]
season_id = temp[1]
episode_id = temp[2]

Simpler:

Suggested change
- temp = video_id.split('-')
- series_id = temp[0]
- season_id = temp[1]
- episode_id = temp[2]
+ series_id, season_id, episode_id = video_id.split('-')
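
One side effect of the unpacking form is worth noting: it raises `ValueError` unless the id has exactly three parts, whereas the indexed form silently ignores extra parts. A small sketch, with `split_video_id` as a hypothetical wrapper that is not part of the PR:

```python
def split_video_id(video_id):
    """Split an 'n-n-n' id into its three parts (hypothetical wrapper)."""
    # Tuple unpacking fails loudly if there are not exactly three parts.
    series_id, season_id, episode_id = video_id.split('-')
    return series_id, season_id, episode_id


print(split_video_id('1575-1-1'))
```

With `_VALID_URL` requiring n-n-n in the id field, the three-part assumption should always hold for ids coming from a matched URL.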

Comment on lines +114 to +122
href = extract_attributes(mobj.group(0)).get('href')
if href:
    mobj1 = re.search(r'/(\d+)\.html', href)
    if mobj1 and mobj1.group(1) == series_id:
        series_title = clean_html(mobj.group(0))
        series_title = re.sub(r'[\s\r\n\t]+', ' ', series_title)
        title = clean_html(html)
        title = re.sub(r'[\s\r\n\t]+', ' ', title)
        break

  • use the resulting match object
  • avoid excessive indentation
  • r'\s' includes any whitespace
  • simplify clean_html() expressions

Suggested change
- href = extract_attributes(mobj.group(0)).get('href')
- if href:
-     mobj1 = re.search(r'/(\d+)\.html', href)
-     if mobj1 and mobj1.group(1) == series_id:
-         series_title = clean_html(mobj.group(0))
-         series_title = re.sub(r'[\s\r\n\t]+', ' ', series_title)
-         title = clean_html(html)
-         title = re.sub(r'[\s\r\n\t]+', ' ', title)
-         break
+ href = extract_attributes(html[mobj.start(0):mobj.start('content')]).get('href')
+ if not href:
+     continue
+ mobj1 = re.search(r'/(?P<s_id>\d+)\.html', href)
+ if mobj1 and mobj1.group('s_id') == series_id:
+     series_title = clean_html(re.sub(r'\s+', ' ', mobj.group('content')))
+     title = clean_html(re.sub(r'\s+', ' ', html))
+     break
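
For reference, a stdlib-only sketch of two of the points above: `\s` already covers `\r`, `\n` and `\t`, and the opening tag can be sliced out of the page using the match object's offsets. The sample HTML and the `<a>` regex are assumptions made up for this example; the PR's actual regex is not shown in this thread, and youtube-dl's `extract_attributes()` and `clean_html()` are deliberately not used here.

```python
import re

# 1. The explicit \r\n\t in the character class is redundant.
sample = 'Title\r\n\t  Episode 1\n'
old = re.sub(r'[\s\r\n\t]+', ' ', sample)
new = re.sub(r'\s+', ' ', sample)
print(old == new, repr(new))

# 2. Slice the opening tag out of the page with the match offsets.
#    The HTML and pattern below are hypothetical.
html = '<a href="/voddetail/1575.html">  Some\n Series </a>'
mobj = re.search(r'<a\s[^>]*>(?P<content>.*?)</a>', html, re.S)
opening_tag = html[mobj.start(0):mobj.start('content')]

# A named group reads better than a positional group(1).
s_id = re.search(r'/(?P<s_id>\d+)\.html', opening_tag).group('s_id')
print(opening_tag, s_id)
```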

'episode_id': episode_id,
}

formats = self._extract_m3u8_formats(data_url, video_id, 'mp4')

Pass Referer header to avoid 403:

Suggested change
- formats = self._extract_m3u8_formats(data_url, video_id, 'mp4')
+ headers = {'Referer': 'https://www.duboku.co/static/player/videojs.html'}
+ formats = self._extract_m3u8_formats(data_url, video_id, 'mp4', headers=headers)

'episode_number': int_or_none(episode_id),
'episode_id': episode_id,
'formats': formats,
'http_headers': {'Referer': 'https://www.duboku.co/static/player/videojs.html'}

Use headers as introduced above:

Suggested change
- 'http_headers': {'Referer': 'https://www.duboku.co/static/player/videojs.html'}
+ 'http_headers': headers,
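
The idea behind the two `headers` suggestions is to define the dict once and use it both for the manifest request and in the returned info dict. A rough stdlib-only sketch of that pattern follows; it is not the extractor code. `build_request`, the `example.invalid` URL and the trimmed `info` dict are placeholders, and no network request is made.

```python
from urllib.request import Request

# Defined once, reused for the manifest request and for later downloads.
headers = {'Referer': 'https://www.duboku.co/static/player/videojs.html'}


def build_request(url, headers):
    """Attach the shared headers to a request object (hypothetical helper)."""
    return Request(url, headers=headers)


# Placeholder manifest URL; the real data_url comes from the player page.
req = build_request('https://example.invalid/stream/index.m3u8', headers)
print(req.get_header('Referer'))

# The same dict goes into the result so fragment downloads send it too.
info = {'id': '1575-1-1', 'http_headers': headers}
```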

Comment on lines +180 to +182
'url': 'https://www.duboku.co/voddetail/1554.html#playlist2',
'info_dict': {
'id': '1554#playlist2',

#playlist2 has gone: use #playlist1 instead:

Suggested change
- 'url': 'https://www.duboku.co/voddetail/1554.html#playlist2',
- 'info_dict': {
-     'id': '1554#playlist2',
+ 'url': 'https://www.duboku.co/voddetail/1554.html#playlist1',
+ 'info_dict': {
+     'id': '1554#playlist1',

Comment on lines +189 to +192
mobj = re.match(self._VALID_URL, url)
if mobj is None:
    raise ExtractorError('Invalid URL: %s' % url)
series_id = mobj.group('id')

Simplify:

Suggested change
- mobj = re.match(self._VALID_URL, url)
- if mobj is None:
-     raise ExtractorError('Invalid URL: %s' % url)
- series_id = mobj.group('id')
+ series_id = self._match_id(url)
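
As I understand it, the base-class helper does essentially what the removed lines did: match the URL against `_VALID_URL` and return the `id` group. The toy classes below only sketch that idea and are not youtube-dl's `InfoExtractor`; the real helper differs in detail. The `voddetail` pattern is a simplified assumption, since the series extractor's actual `_VALID_URL` does not appear in this thread.

```python
import re


class MiniExtractor:
    """Toy stand-in for an extractor base class (not youtube-dl's own)."""

    _VALID_URL = None

    @classmethod
    def _match_id(cls, url):
        # Same steps as the hand-written version, shared by every subclass.
        mobj = re.match(cls._VALID_URL, url)
        if mobj is None:
            raise ValueError('Invalid URL: %s' % url)
        return mobj.group('id')


class MiniSeriesIE(MiniExtractor):
    # Hypothetical, simplified pattern for illustration only.
    _VALID_URL = r'https?://[^/]+\.duboku\.co/voddetail/(?P<id>[0-9]+)\.html'


print(MiniSeriesIE._match_id('https://www.duboku.co/voddetail/1554.html#playlist1'))
```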

lkho commented Jun 16, 2022

@dirkf, could you please help me apply the changes? My original repo was deleted by GitHub.

dirkf commented Jun 16, 2022

> I'll apply them if you like.

Unfortunately the GH website logic says "diff is outdated" if I try to do that, presumably because the source branch is blocked.

Contact GH support to get your repo unblocked (mention #27013). Much the easiest.

Or read https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally.

Or perhaps:

  • delete your blocked repo
  • fork yt-dl again under the same name
  • clone it to your system
  • cd to the youtube-dl directory
  • git checkout -b same_name_as_your_original_branch 2020.07.28
  • save patch file https://github.com/ytdl-org/youtube-dl/pull/26467.patch as 26467.patch
  • git am 26467.patch
  • git push your_forked_repo
  • now we hope that GH will treat your new branch as the original PR source.

@pukkandan

> delete your blocked repo

You can't delete a blocked repo without contacting support. You can create a new fork and open a new PR from it, though.

> Contact GH support to get your repo unblocked (mention #27013). Much the easiest.

When I contacted support a while ago to get my fork (not yt-dlp) restored, the only option they offered was to delete it. I had to let them delete it and then re-fork. Luckily, I had a local copy of all the branches.

Successfully merging this pull request may close these issues.

unable to download video from duboku.net