[91porn] fix title & comment extraction #5932
Can you give me an example for m3u8?
@pukkandan as for that m3u8 example: I don't have one. The latest version of the PR uses |
yt_dlp/extractor/porn91.py
'info_dict': {
'id': '726186267387ffe1e5e6',
'title': '见过卖老婆的,那你见过卖亲闺女的吗?', | ||
'description': '疫情当下,如何约炮?\n--19kn.cc--\n拥有全国线下学生、少妇、反差婊、兼职良家。\n并且免费!!!\n只需要一个电话,一个定位,就能送炮上门。可提前查看照片\n(妹子自带48小时核酸报告)\n约炮,我们是认真的!\n并且拥有三大优势!\n\n1、各种求包养母狗,学生妹资源。为你解决各种需要。--19kn.cc--\n\n2、所有女性会员经过实名视频验证,平台严选,杜绝各种骗红包,口嗨者。--19kn.cc--\n\n3、5年大平台,91许多约炮案例,包括知名博主女伴,均是我们撮合成功的,保障会员隐私,并且约炮3次可自行联系平台进行信息发布。--19kn.cc--\n\n平台5周年庆活动,特回馈91狼友\n\n1、所有女性会员,如果参假,举报客服,核实成功奖励10000人民币。\n\n2、约炮成功并且反馈客服,赠送91vip自拍达人号\n\n3、情侣入驻,可享受专属奖励(奖金5000元)\n\n年关将近,平台大放血,只为各位狼友能找到固定性伴侣,度过美好新年!\n\n约炮渠道请登录--19kn.cc--\n\nPS:招网络客服,对接客户,安排妹子(要求耐心,熟悉客服流程优先,有电脑优先)工作时间:12小时制,\n\n招男模,女模(要求形象气质佳,需提供体检报告)\n有意可以联系官方招聘邮箱[email\xa0protected]', |
Use an md5: value for the description instead of the full text.
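yt-dlp test definitions accept an "md5:" prefix for long string fields, so the full description does not have to be pasted into info_dict. The sketch below shows one way to generate such a value and check a field against it; the helper names are hypothetical, and hashing the UTF-8 encoding is an assumption rather than a copy of yt-dlp's test helper.

```python
import hashlib


def md5_field(value: str) -> str:
    """Build an 'md5:<hexdigest>' test value for a long string field.

    Assumption: the hash is taken over the UTF-8 encoding of the string.
    """
    return 'md5:' + hashlib.md5(value.encode('utf-8')).hexdigest()


def field_matches(expected: str, got: str) -> bool:
    """Compare an extracted field against either a literal or an 'md5:' value."""
    if expected.startswith('md5:'):
        return md5_field(got) == expected
    return expected == got
```

In practice you would print md5_field(description) once for the extracted description and paste the result into the test's info_dict.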
yt_dlp/extractor/porn91.py
'info_dict': {
'id': '7e42283b4f5ab36da134',
'title': '18岁大一漂亮学妹,水嫩性感,再爽一次!', | ||
'description': '想看我拍新的系列都请帮我加精跟5星好评哦!希望大家鼎力支持,谢过了。我再重申,这次是朋友介绍安排的漂亮学生,费用不低,不过胜在年轻听话,水嫩性感,很超值的女生(6分05有91验证)。PS:本人强壮耐久,事业型男,愿意结交江浙沪的漂亮学妹,加Q:2889560495,语音验证性别,欢迎女生约我,或者靠谱男来一起泡美眉。', |
Same here: use md5.
yt_dlp/extractor/porn91.py
@@ -29,32 +51,42 @@ def _real_extract(self, url):
webpage = self._download_webpage(
'http://91porn.com/view_video.php?viewkey=%s' % video_id, video_id)

if '作为游客,你每天只可观看10个视频' in webpage:
raise ExtractorError('91 Porn says: Daily limit 10 videos exceeded', expected=True)
if '作为游客,你每天只可观看15个视频' in webpage:
Why not use a regex to extract the number?
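The notice the extractor checks for, 作为游客,你每天只可观看N个视频, reads "As a guest, you can only watch N videos per day". Capturing N with one regex replaces the separate hard-coded checks for 10 and 15. A minimal sketch, assuming the notice text is exactly as matched in the PR; DailyLimitError is a hypothetical stand-in for ExtractorError(..., expected=True), and inside the extractor the search would go through self._search_regex with default=None.

```python
import re

# Guest-limit notice as matched in the PR; the limit itself is captured.
DAILY_LIMIT_RE = re.compile(r'作为游客,你每天只可观看(\d+)个视频')


class DailyLimitError(Exception):
    """Hypothetical stand-in for yt-dlp's ExtractorError(..., expected=True)."""


def check_daily_limit(webpage: str) -> None:
    """Raise DailyLimitError if the page shows the guest daily-limit notice."""
    mobj = DAILY_LIMIT_RE.search(webpage)
    if mobj:
        raise DailyLimitError(
            f'91 Porn says: Daily limit {mobj.group(1)} videos exceeded')
```

A page without the notice passes silently, so a future change of the limit needs no code change.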
yt_dlp/extractor/porn91.py
r'<textarea[^>]+id=["\']fm-video_link[^>]+>([^<]+)</textarea>',
webpage, 'video link')
videopage = self._download_webpage(video_link_url, video_id)
r'document\.write\(\s*strencode2\s*\(\s*((?:"[^"]+")|(?:\'[^\']+\'))\s*\)\s*\)', webpage, 'video link')
Suggested change:
- r'document\.write\(\s*strencode2\s*\(\s*((?:"[^"]+")|(?:\'[^\']+\'))\s*\)\s*\)', webpage, 'video link')
+ r'document\.write\(\s*strencode2\s*\(\s*((?:"[^"]+")|(?:\'[^\']+\'))', webpage, 'video link')
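The suggested regex stops right after the quoted argument of strencode2(...): once the closing quote has matched, the capture is complete, and the trailing \s*\)\s*\) only makes the pattern more brittle. The sketch below pulls the media URL out of such a call. It assumes (not confirmed by this thread) that strencode2 merely percent-decodes its argument, so urllib.parse.unquote recovers the HTML it writes; extract_video_url and the example.com URLs are hypothetical.

```python
import re
import urllib.parse

# Suggested regex: capture the quoted argument, including its quotes.
STRENCODE2_RE = r'document\.write\(\s*strencode2\s*\(\s*((?:"[^"]+")|(?:\'[^\']+\'))'


def extract_video_url(webpage: str):
    """Return the src URL written by document.write(strencode2("...")), or None.

    Assumption: strencode2 only percent-decodes its argument.
    """
    mobj = re.search(STRENCODE2_RE, webpage)
    if not mobj:
        return None
    # Strip the surrounding quotes, then undo the percent-encoding.
    decoded = urllib.parse.unquote(mobj.group(1)[1:-1])
    src = re.search(r'src=["\']([^"\']+)["\']', decoded)
    return src.group(1) if src else None
```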
yt_dlp/extractor/porn91.py
'title': title,
'upload_date': upload_date,
Suggested change:
- 'upload_date': upload_date,
+ 'upload_date': unified_strdate(self._search_regex(
+ r'<span\s+class=["\']title-yakov["\']>(\d{4}-\d{2}-\d{2})</span>',
+ webpage, 'upload_date', fatal=False)),
And so on for the other fields.
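The suggestion extracts and normalizes the value inline, where the field is populated, instead of through a separate variable. A standalone sketch of the same step follows: it applies the suggested regex and converts the date to YYYYMMDD, the format unified_strdate returns. datetime.strptime is a stand-in for unified_strdate here, and extract_upload_date is a hypothetical name.

```python
import re
from datetime import datetime

# Regex from the review suggestion.
UPLOAD_DATE_RE = r'<span\s+class=["\']title-yakov["\']>(\d{4}-\d{2}-\d{2})</span>'


def extract_upload_date(webpage: str):
    """Return the upload date as YYYYMMDD, or None.

    Like fatal=False, a missing or malformed date yields None instead of an error.
    """
    mobj = re.search(UPLOAD_DATE_RE, webpage)
    if not mobj:
        return None
    try:
        return datetime.strptime(mobj.group(1), '%Y-%m-%d').strftime('%Y%m%d')
    except ValueError:
        return None
```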
yt_dlp/extractor/porn91.py
duration = parse_duration(self._search_regex(
r'时长:\s*</span>\s*(\d+:\d+)', webpage, 'duration', fatal=False))
r'时长:\s*<span[^>]*>\s*(\d+:\d+:\d+)\s*</span>', webpage, 'duration', fatal=False))
Suggested change:
- r'时长:\s*<span[^>]*>\s*(\d+:\d+:\d+)\s*</span>', webpage, 'duration', fatal=False))
+ r'时长:\s*<span[^>]*>\s*(\d+(?::\d+){1,2})', webpage, 'duration', fatal=False))
Use {1,2} to support the old two-part (\d+:\d+) format too. Is the </span> needed?
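The label 时长: means "Duration:". With (?::\d+){1,2} the capture accepts both two-part and three-part timestamps, and nothing after the digits needs to match, so the trailing </span> can go. In this sketch the seconds conversion is a simplified stand-in for yt-dlp's parse_duration, and extract_duration is a hypothetical name.

```python
import re

# Suggested regex: two-part (MM:SS) or three-part (HH:MM:SS) durations.
DURATION_RE = r'时长:\s*<span[^>]*>\s*(\d+(?::\d+){1,2})'


def extract_duration(webpage: str):
    """Return the duration in seconds, or None if no duration is found."""
    mobj = re.search(DURATION_RE, webpage)
    if not mobj:
        return None
    seconds = 0
    for part in mobj.group(1).split(':'):
        # Each colon-separated field is worth 60 times the one after it.
        seconds = seconds * 60 + int(part)
    return seconds
```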
yt_dlp/extractor/porn91.py
upload_date = unified_strdate(upload_date)

description = self._html_search_regex(
r'<span\s+class=["\']more title["\']>\s*(.*(?!</span>))\s*</span>', webpage, 'description', fatal=False)
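The regex is wrong: (?!</span>) placed after .* is a zero-width check at a single position, so it does not stop .* from matching past </span>. To do what you want, you have to group the . with (?!</span>) and apply * to the whole thing, i.e. (?:(?!</span>).)*. But it's better to just do:
Suggested change:
- r'<span\s+class=["\']more title["\']>\s*(.*(?!</span>))\s*</span>', webpage, 'description', fatal=False)
+ r'<span\s+class=["\']more title["\']>\s*([^<]+)', webpage, 'description', fatal=False)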
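The three patterns can be compared directly on small synthetic snippets. The PR's version either fails outright, when no whitespace precedes </span>, or runs past the first </span>. The tempered (?:(?!</span>).)* form and the simpler [^<]+ both stop at the right place. Note that [^<]+ also stops at any inner tag such as <br>, which is acceptable only if descriptions contain no markup. The helper extract is hypothetical; .strip() roughly imitates the cleanup _html_search_regex performs.

```python
import re

# PR revision: the lookahead checks one position and cannot bound .*.
BAD_RE = r'<span\s+class=["\']more title["\']>\s*(.*(?!</span>))\s*</span>'
# Tempered token: the lookahead runs before every character consumed.
TEMPERED_RE = r'<span\s+class=["\']more title["\']>\s*((?:(?!</span>).)*)\s*</span>'
# Review suggestion: take everything up to the first '<'.
SIMPLE_RE = r'<span\s+class=["\']more title["\']>\s*([^<]+)'


def extract(pattern: str, html: str):
    """Return the stripped first capture group, or None if there is no match."""
    mobj = re.search(pattern, html)
    return mobj.group(1).strip() if mobj else None
```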
yt_dlp/extractor/porn91.py
'id': video_id,
'url': video_link_url,
'ext': determine_ext(video_link_url),
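Suggested change (remove this line):
- 'ext': determine_ext(video_link_url),
Unnecessary: yt-dlp derives the extension from the URL when 'ext' is not set.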
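To illustrate why the explicit line is redundant: the extension can be read off the URL path, which is what yt-dlp does for a format without 'ext'. The function below is a hypothetical, simplified stand-in for determine_ext, not its actual code; only the 'unknown_video' fallback mirrors determine_ext's default.

```python
import posixpath
from urllib.parse import urlparse


def guess_ext(url: str, default: str = 'unknown_video') -> str:
    """Return the lower-cased suffix of the URL path, ignoring query and fragment."""
    ext = posixpath.splitext(urlparse(url).path)[1][1:]
    return ext.lower() if ext.isalnum() else default
```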
yt_dlp/extractor/porn91.py
'id': '726186267387ffe1e5e6',
'title': '见过卖老婆的,那你见过卖亲闺女的吗?', | ||
'description': '疫情当下,如何约炮?\n--19kn.cc--\n拥有全国线下学生、少妇、反差婊、兼职良家。\n并且免费!!!\n只需要一个电话,一个定位,就能送炮上门。可提前查看照片\n(妹子自带48小时核酸报告)\n约炮,我们是认真的!\n并且拥有三大优势!\n\n1、各种求包养母狗,学生妹资源。为你解决各种需要。--19kn.cc--\n\n2、所有女性会员经过实名视频验证,平台严选,杜绝各种骗红包,口嗨者。--19kn.cc--\n\n3、5年大平台,91许多约炮案例,包括知名博主女伴,均是我们撮合成功的,保障会员隐私,并且约炮3次可自行联系平台进行信息发布。--19kn.cc--\n\n平台5周年庆活动,特回馈91狼友\n\n1、所有女性会员,如果参假,举报客服,核实成功奖励10000人民币。\n\n2、约炮成功并且反馈客服,赠送91vip自拍达人号\n\n3、情侣入驻,可享受专属奖励(奖金5000元)\n\n年关将近,平台大放血,只为各位狼友能找到固定性伴侣,度过美好新年!\n\n约炮渠道请登录--19kn.cc--\n\nPS:招网络客服,对接客户,安排妹子(要求耐心,熟悉客服流程优先,有电脑优先)工作时间:12小时制,\n\n招男模,女模(要求形象气质佳,需提供体检报告)\n有意可以联系官方招聘邮箱[email\xa0protected]', | ||
'ext': 'm3u8', |
Wrong. Either the test definition is wrong, or you need to use _extract_m3u8_formats_and_subtitles in the code.
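"m3u8" is the playlist type, not the extension of the downloaded video. A master playlist lists several variant streams, and _extract_m3u8_formats_and_subtitles turns each into its own format (extractors typically pass ext='mp4'). The sketch below is a simplified illustration of that processing step, not yt-dlp's implementation. It reads only BANDWIDTH and RESOLUTION from each #EXT-X-STREAM-INF tag and resolves the following URI against the playlist URL; the fixed 'mp4' ext and 'm3u8_native' protocol are assumptions.

```python
import re
from urllib.parse import urljoin

# KEY=value pairs in an EXT-X-STREAM-INF tag; quoted values may contain commas.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_master_playlist(text: str, base_url: str) -> list:
    """Return one format dict per variant stream in an HLS master playlist."""
    formats, pending = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('#EXT-X-STREAM-INF:'):
            attrs = line.split(':', 1)[1]
            pending = {k: v.strip('"') for k, v in ATTR_RE.findall(attrs)}
        elif line and not line.startswith('#') and pending is not None:
            # The first URI line after the tag belongs to that variant.
            fmt = {'url': urljoin(base_url, line),
                   'protocol': 'm3u8_native', 'ext': 'mp4'}
            if 'BANDWIDTH' in pending:
                fmt['tbr'] = int(pending['BANDWIDTH']) / 1000  # kbit/s
            if 'RESOLUTION' in pending:
                width, height = pending['RESOLUTION'].split('x')
                fmt['width'], fmt['height'] = int(width), int(height)
            formats.append(fmt)
            pending = None
    return formats
```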
yt_dlp/extractor/porn91.py
ExtractorError,
)


class Porn91IE(InfoExtractor):
IE_NAME = '91porn'
_VALID_URL = r'(?:https?://)(?:www\.|)91porn\.com/.+?\?viewkey=(?P<id>[\w\d]+)'
_VALID_URL = r'(?:https?://)(?:www\.|)91porn\.com/.*([\?&])viewkey=(?P<id>[\w\d]+)'
Suggested change:
- _VALID_URL = r'(?:https?://)(?:www\.|)91porn\.com/.*([\?&])viewkey=(?P<id>[\w\d]+)'
+ _VALID_URL = r'(?:https?://)(?:www\.|)91porn\.com/view_video.php\?([^#]+&)?viewkey=(?P<id>\w+)'
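The suggested pattern accepts only view_video.php pages and allows viewkey anywhere in the query string, but not inside a #fragment. It also drops the redundant [\w\d], since \w already covers digits. yt-dlp matches _VALID_URL from the start of the URL, which re.match reproduces here; match_id is a hypothetical helper. The unescaped . in view_video.php matches any character, so escaping it as view_video\.php would be slightly stricter.

```python
import re

# _VALID_URL as suggested in review.
VALID_URL = r'(?:https?://)(?:www\.|)91porn\.com/view_video.php\?([^#]+&)?viewkey=(?P<id>\w+)'


def match_id(url: str):
    """Return the viewkey of a supported URL, or None."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None
```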
Explain 6a7a551. You shouldn't just pass a master m3u8 through without processing it.
If it's an m3u8, use
if the hard-coded
Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
Authored by: pmitchell86
Fixes yt-dlp#3256
Description of your pull request and other information.
Fix extraction of the Title and Comments fields for the Porn91 info extractor. I noticed that ytdl-org/youtube-dl#29876 attempts to do the same, but it's not working and appears abandoned.
Fixes #3256