[vxxx] add new extractors for vxxx and "friend" sites #31288

Open: tabjy wants to merge 13 commits into base: master

@tabjy tabjy commented Oct 14, 2022

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

This pull request adds an extractor for vxxx.com (NSFW!) and its "friend" sites, which presumably use the same technology stack and can therefore be extracted in a similar way. These sites are:

All sites below are NSFW!

Since there is no existing issue requesting support for the above-mentioned sites, I'm attaching the site support request info here:

Checklist

  • I'm reporting a new site support request
  • I've verified that I'm running youtube-dl version 2021.12.17
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that none of provided URLs violate any copyrights
  • I've searched the bugtracker for similar site support requests including closed ones

Example URLs

All links below are NSFW!

None of these sites supports playlists.

@dirkf dirkf added the nsfw and site-support-request labels Oct 15, 2022

Contributor

@dirkf dirkf left a comment

Thanks for your work!

> I've checked that none of provided URLs violate any copyrights

Really?

Generally we assume that even an apparently fly-by-night site like these has permission to serve the media if it appears to operate a DMCA policy. A non-JS example page (that yt-dl would see if it downloaded the target URL) doesn't show evidence of this but perhaps the JS-enabled pages do?

I've made some suggestions. The main thing is that this should all go in one module and avoid duplicated code in derived extractor classes. Otherwise it's all pretty good.

Outdated, resolved review threads on: youtube_dl/extractor/bdsmxtube.py, youtube_dl/extractor/xmilf.py, youtube_dl/extractor/vxxx.py, youtube_dl/extractor/blackporntube.py

Author

tabjy commented Oct 29, 2022

@dirkf

Sorry, I've been quite busy over the last few weeks. I've applied your suggested changes and rebased onto the latest master.

> Generally we assume that even an apparently fly-by-night site like these has permission to serve the media if it appears to operate a DMCA policy.

Yeah, they're quite sketchy. They do have DMCA policy pages, but some are just straight-up blank...

(Again, all sites below are NSFW.)

Maybe we could remove support for those with empty DMCA pages?

Contributor

dirkf commented Oct 30, 2022

Please exclude any sites that don't have a working DMCA page (minimal requirement: valid email address or working contact page).

If you want to make it easier to revert any excluded sites, omit them from extractor/extractors.py and either set the class var _WORKING to False with an appropriate comment or just wrap a block of excluded sites in """...""", so that yt-dl can't see the sites.
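The `_WORKING` option can be illustrated with a minimal, self-contained sketch. Everything here is an assumption for illustration: `InfoExtractorStub` is a stand-in for youtube-dl's real base class, the two site classes and their `.example` URLs are hypothetical, and the final filter only shows how the flag can be read; the project's own handling of non-working extractors may differ.

```python
class InfoExtractorStub(object):
    """Stand-in for the real extractor base class (assumption)."""
    _WORKING = True

    @classmethod
    def working(cls):
        # Simple getter for the class-level flag.
        return cls._WORKING


class KeptSiteIE(InfoExtractorStub):
    # Hypothetical site that has a working DMCA page.
    _VALID_URL = r'https?://kept\.example/video-(?P<id>\d+)'


class ExcludedSiteIE(KeptSiteIE):
    # Excluded: no working DMCA page (no valid email or contact page).
    # Reverting the exclusion only requires deleting the next-but-one line.
    _VALID_URL = r'https?://excluded\.example/video-(?P<id>\d+)'
    _WORKING = False


EXTRACTORS = [KeptSiteIE, ExcludedSiteIE]
# Illustrative filter: keep only the classes still marked as working.
active = [ie.__name__ for ie in EXTRACTORS if ie.working()]
```

Because the derived class only overrides two class attributes, the shared extraction logic stays in one place, which also fits the earlier request to avoid duplicated code in derived extractor classes.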

Author

tabjy commented Nov 2, 2022

@dirkf

> Please exclude any sites that don't have a working DMCA page

Done. Thank you very much!

@tabjy tabjy requested a review from dirkf November 11, 2022 21:05
Contributor

@dirkf dirkf left a comment

Some changes needed based on the CI tests.

    unified_timestamp,
    url_or_none,
)

Contributor

str.maketrans() and str.translate() need to be shimmed for Python 2.

If this works the definitions can eventually be moved into compat.py.

Suggested change
try:
    compat_str_maketrans, compat_str_translate = (
        compat_str.maketrans,
        lambda s, table: s.translate(table)
    )
except AttributeError:
    # Python 2
    def compat_str_maketrans(x, *args):
        if not args:
            return x
        y, z = args[0], args[1] if len(args) > 1 else ''
        if len(x) != len(y):
            raise ValueError(
                'the first two maketrans arguments must have equal length')
        tbl = dict(zip(x, y))
        tbl.update((k, None) for k in z)
        return tbl

    def compat_str_translate(s, table):
        def xlate(c):
            try:
                return table[c] or ''
            except LookupError:
                return c
        return ''.join(xlate(c) for c in s)

def get_trans_tbl(from_, to, tbl={}):
    k = (from_, to)
    if not tbl.get(k):
        tbl[k] = str.maketrans(from_, to)

Contributor

Suggested change
-        tbl[k] = str.maketrans(from_, to)
+        tbl[k] = compat_str_maketrans(from_, to)

trans_tbl = get_trans_tbl(
    '\u0410\u0412\u0421\u0415\u041c.,~',
    'ABCEM+/=')
return base64.b64decode(e.translate(trans_tbl)).decode()

Contributor

Suggested change
- return base64.b64decode(e.translate(trans_tbl)).decode()
+ return base64.b64decode(compat_str_translate(e, trans_tbl)).decode()
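To make the decode step under review easier to follow, here is a minimal sketch that runs unchanged on Python 2 and 3. It deliberately swaps in a plain dict lookup for `str.maketrans()`/`str.translate()`, so it is not the shim proposed in this review. The character mapping is taken from the diff context; the function names `decode_obfuscated_b64` and `encode_obfuscated_b64` are hypothetical, and the encoder exists only to build test input.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import base64

# Mapping taken from the diff context: Cyrillic look-alike letters and
# '.,~' stand in for the base64 characters 'ABCEM' and '+/='.
OBFUSCATED = '\u0410\u0412\u0421\u0415\u041c.,~'
PLAIN = 'ABCEM+/='


def decode_obfuscated_b64(text):
    """Map each substituted character back, then base64-decode."""
    table = dict(zip(OBFUSCATED, PLAIN))
    restored = ''.join(table.get(c, c) for c in text)
    return base64.b64decode(restored).decode('utf-8')


def encode_obfuscated_b64(text):
    """Inverse helper (hypothetical), used only to build sample input."""
    table = dict(zip(PLAIN, OBFUSCATED))
    encoded = base64.b64encode(text.encode('utf-8')).decode('ascii')
    return ''.join(table.get(c, c) for c in encoded)
```

For example, the standard base64 form of `hi` is `aGk=`, so the substituted form `aGk~` should decode back to `hi`. A dict lookup avoids the Python 2/3 difference entirely, at the cost of being slower than `translate()` on long strings.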

        self._BASE_URL,
        self._decode_base164(format_object[0]['video_url'])
    ),
    video_id, 'mp4')

Contributor

Let's try this.

Suggested change
-     video_id, 'mp4')
+     video_id, 'mp4', entry_protocol='m3u8_native')

Otherwise the download tests will have to be tweaked to skip the actual download.

        'categories': ['Asian', 'Brunette', 'Casting', 'HD', 'Japanese',
                       'JAV Uncensored'],
        'age_limit': 18,
    },

Contributor

If m3u8_native doesn't work, put this in (here and in the other tests). The skip_download line can be commented out for local testing.

Suggested change
-     },
+     },
+     'params': {
+         # ffmpeg download
+         'skip_download': True,
+     },
