Request update thisav.com #32595

Open
5 tasks done
WalterShomer opened this issue Oct 12, 2023 · 9 comments
Labels
broken-IE problem with existing site extraction

Comments

@WalterShomer

WalterShomer commented Oct 12, 2023

The link ( nsfw )
https://thisav.com/ja/juq-380

  • I'm reporting a broken site support
  • I've verified that I'm running youtube-dl version 2021.12.17
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar issues including closed ones

Version log (not sure why it won't update)

C:\Users\homebase>youtube-dl -U
ERROR: can't find the current version. Please try again later.

C:\Users\homebase>youtube-dl --version
2021.12.17

Verbose log

C:\Users\homebase>youtube-dl https://thisav.com/ja/juq-380 -o "D:\2 - Programs\youtube-dl" --verbose
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['https://thisav.com/ja/juq-380', '-o', 'D:\\2 - Programs\\youtube-dl', '--verbose']
[debug] Encodings: locale cp932, fs mbcs, out cp932, pref cp932
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.4.4 (CPython) - Windows-10-10.0.19041
[debug] exe versions: ffmpeg 2023-04-26-git-e3143703e9-full_build-www.gyan.dev, ffprobe 2023-04-26-git-e3143703e9-full_build-www.gyan.dev
[debug] Proxy map: {}
[generic] juq-380: Requesting header
WARNING: Could not send HEAD request to https://thisav.com/ja/juq-380: HTTP Error 403: Forbidden
[generic] juq-380: Downloading webpage
ERROR: Unable to download webpage: HTTP Error 403: Forbidden (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\extractor\common.py", line 634, in _request_webpage
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\YoutubeDL.py", line 2288, in urlopen
  File "C:\Python\Python34\lib\urllib\request.py", line 470, in open
  File "C:\Python\Python34\lib\urllib\request.py", line 580, in http_response
  File "C:\Python\Python34\lib\urllib\request.py", line 508, in error
  File "C:\Python\Python34\lib\urllib\request.py", line 442, in _call_chain
  File "C:\Python\Python34\lib\urllib\request.py", line 588, in http_error_default

Description

The downloader doesn't work. I've seen that it supports thisjav.com; I can't find the file right now, but I remember that the expected URL was written as thisjav.com/video/name, whereas the site now points to thisjav.com/name.

Also, I'm running youtube-dl.exe from the PATH.

@dirkf
Contributor

dirkf commented Oct 13, 2023

Various issues are preventing the archival of important Asian babe content.

  1. The expected URL pattern has /videos/... while the site now uses /{lang}/... with an optional 2-3 character language component. However old-style URLs may still be valid.
  2. Once the extractor sees the page, it can't find any video link using the existing tactics. The link is obfuscated in an eval(function (p,a,c,k,e,d){..}) JS block. extractor/xfileshare.py knows how to decode this, and the decoded URL in a test page could be retrieved when using the page URL as Referer header.
  3. The DASH MPD manifest retrieved seems to be degenerate, in that no segment URLs are provided. Further investigation, ideally by someone who is a DASH expert, unlike me, is needed.
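
Points 1 and 2 can be illustrated with a short Python sketch. This is not the code from the extractor or from any PR: the URL pattern, the function names, and the sample packed string are all assumptions made up for illustration. The first part accepts both the old `/videos/...` layout and the newer layout with an optional 2-3 letter language prefix; the second part decodes a packed `eval(function(p,a,c,k,e,d){..})` block by mapping each base-N token back to its word.

```python
import re

# Hypothetical URL pattern (an assumption, not the real extractor's _VALID_URL):
# accepts /videos/<id>/..., /<lang>/<id> and /<id>.
VALID_URL = re.compile(
    r'https?://(?:www\.)?thisav\.com/'
    r'(?:videos?/|[a-z]{2,3}/)?'
    r'(?P<id>[^/?#]+)')


def extract_id(url):
    """Return the video id from a page URL, or None if it does not match."""
    m = VALID_URL.match(url)
    return m.group('id') if m else None


# Matches the argument list of a packed block: ('payload',radix,count,'a|b|c'.split('|')
PACKED_RE = re.compile(
    r"\}\('(?P<payload>.+)',(?P<radix>\d+),(?P<count>\d+),"
    r"'(?P<words>[^']+)'\.split\('\|'\)")

DIGITS = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'


def to_base_n(num, radix):
    """Encode a non-negative integer using the packer's digit alphabet (radix <= 62)."""
    if num == 0:
        return '0'
    out = ''
    while num:
        num, rem = divmod(num, radix)
        out = DIGITS[rem] + out
    return out


def decode_packed(code):
    """Replace every base-N token in the payload with its word from the symbol list."""
    m = PACKED_RE.search(code)
    if not m:
        return None
    radix, count = int(m.group('radix')), int(m.group('count'))
    words = m.group('words').split('|')
    table = {}
    for index in range(min(count, len(words))):
        token = to_base_n(index, radix)
        # An empty word means the token stands for itself.
        table[token] = words[index] or token
    return re.sub(r'\b\w+\b',
                  lambda t: table.get(t.group(0), t.group(0)),
                  m.group('payload'))


# Made-up sample, only to exercise the decoder.
SAMPLE = r'''eval(function(p,a,c,k,e,d){return p}('0 1="2://3.4/5.6";',10,7,'var|src|https|example|com|video|m3u8'.split('|'),0,{}))'''

if __name__ == '__main__':
    print(extract_id('https://thisav.com/ja/juq-380'))
    print(decode_packed(SAMPLE))
```

Retrieving the decoded URL would additionally need the page URL sent as the `Referer` header, as noted in point 2; that request is left out of the sketch.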

@dirkf
Contributor

dirkf commented Oct 14, 2023

Well, I've tweaked the DASH format extraction so that the old test video 2 in the extractor works, though I might well have broken some or all other DASH extraction in doing so. With 1 and 2 above as well:

$ python -m youtube_dl -v -F 'https://thisav.com/ja/juq-380'
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-F', u'https://thisav.com/ja/juq-380']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 66ab0814c
[debug] Python 2.7.18 (CPython i686 32bit) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial - OpenSSL 1.1.1w  11 Sep 2023 - glibc 2.15
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[ThisAV] juq-380: Downloading webpage
[ThisAV] juq-380: Extracting from obfuscated HTML5
[ThisAV] juq-380: Downloading m3u8 information
WARNING: unable to extract uploader name; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
[info] Available formats for juq-380:
format code  extension  resolution note
800          mp4        640x360     800k 
1400         mp4        842x480    1400k 
2800         mp4        1280x720   2800k  (best)
$ 

I was able to view the video using format 800. This video uses HLS instead of DASH and doesn't need ffmpeg. The new page format (even for old URLs) doesn't seem to include any uploader info.
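
A format list like the one above is what an HLS master playlist with three variants would typically yield. As a rough sketch of how such a list can be derived (this is not youtube-dl's own m3u8 handling; the playlist text, the relative URIs, and the function name are invented for illustration), each `#EXT-X-STREAM-INF` line carries the bandwidth and resolution, and the next non-comment line is the variant's URI:

```python
import re

# Attribute lists look like KEY=value,KEY="quoted,value"; quoted values may contain commas.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_master_playlist(text):
    """Return the variants of an HLS master playlist, sorted by bitrate (kbit/s)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    variants = []
    for i, line in enumerate(lines):
        if not line.startswith('#EXT-X-STREAM-INF:'):
            continue
        attrs = {key: value.strip('"')
                 for key, value in ATTR_RE.findall(line.split(':', 1)[1])}
        # The variant URI is the first following line that is not a tag or comment.
        uri = next((ln for ln in lines[i + 1:] if not ln.startswith('#')), None)
        variants.append({
            'tbr': int(attrs['BANDWIDTH']) // 1000,
            'resolution': attrs.get('RESOLUTION'),
            'url': uri,
        })
    return sorted(variants, key=lambda v: v['tbr'])


# Made-up playlist for illustration; not data retrieved from the site.
SAMPLE_PLAYLIST = '''#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720,CODECS="avc1.64001f,mp4a.40.2"
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/index.m3u8
'''

if __name__ == '__main__':
    for variant in parse_master_playlist(SAMPLE_PLAYLIST):
        print(variant['tbr'], variant['resolution'], variant['url'])
```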

@WalterShomer
Author

Various issues are preventing the archival of important Asian babe content.

1. The expected URL pattern has `/videos/...` while the site now uses `/{lang}/...` with an optional 2-3 character language component. However old-style URLs may still be valid.

2. Once the extractor sees the page, it can't find any video link using the existing tactics. The link is obfuscated in an `eval(function (p,a,c,k,e,d){..})` JS block. `extractor/xfileshare.py` knows how to decode this, and the decoded URL in a test page could be retrieved when using the page URL as `Referer` header.

3. The DASH MPD manifest retrieved seems to be degenerate, in that no segment URLs are provided. Further investigation, ideally by someone who is a DASH expert, unlike me, is needed.

Hi, thanks for the quick reply and the attention.

Could you please elaborate on what you changed in point 1? I've tried some small edits in thisav.py but didn't get anywhere.

Also, for point 2, would a Referer like this be enough? python -m youtube_dl -v -F --add-header referer 'http://xvideosharing.com/fq65f94nd2ve' 'https://thisav.com/ja/juq-380'
or --referer 'http://xvideosharing.com/fq65f94nd2ve'

@dirkf
Contributor

dirkf commented Oct 14, 2023

Really the changes are too extensive to publish as a patch. A PR will be needed.

@longsack

I'm following this but am not familiar with much of it; I'm just learning. What is a PR (public release)?

@dirkf
Contributor

dirkf commented Oct 15, 2023

@longsack

Thanks @dirkf, I will keep an eye on this issue; I'm really interested in this site.

@dirkf
Contributor

dirkf commented Jan 24, 2024

... I might well have broken some or all other DASH extraction in doing so. ...

The specification for the resolution of BaseURL is in ISO/IEC 23009-1 section 5.6 which in turn references RFC 3986 (the worm has certainly turned there).
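
To make the RFC 3986 point concrete, here is a minimal sketch, assuming Python 3's `urllib.parse.urljoin` as the reference resolver. It is not the code changed in youtube-dl; the helper name and the example.com URLs are invented. Each `BaseURL` found on the way down the manifest (MPD, Period, AdaptationSet, Representation) is resolved against the result of the previous level, starting from the manifest's own URL, and the segment reference is resolved last.

```python
from urllib.parse import urljoin


def resolve_base_urls(mpd_url, references):
    """Resolve a chain of (possibly relative) references, outermost first.

    Each reference is resolved against the result of the previous one,
    starting from the URL the manifest was fetched from.
    """
    resolved = mpd_url
    for ref in references:
        resolved = urljoin(resolved, ref)
    return resolved


if __name__ == '__main__':
    mpd = 'https://cdn.example.com/videos/123/manifest.mpd'  # hypothetical URL

    # Relative BaseURLs accumulate level by level.
    print(resolve_base_urls(mpd, ['dash/', 'video_720/', 'seg-1.m4s']))

    # An absolute BaseURL replaces everything resolved so far.
    print(resolve_base_urls(mpd, ['dash/', 'https://other.example.com/a/', 'seg-1.m4s']))

    # Without a trailing slash the last path segment is replaced, not extended.
    print(resolve_base_urls(mpd, ['dash', 'seg-1.m4s']))
```

The last case is the kind of detail that simple string concatenation gets wrong: a `BaseURL` of `dash` without a trailing slash does not act as a directory for later references.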

@dirkf
Contributor

dirkf commented Jan 28, 2024

My modified extractor code still succeeds as above (i.e., no uploader is found, but otherwise OK), with PR #32710.

@dirkf dirkf mentioned this issue Jan 28, 2024
6 tasks
@dirkf dirkf added the broken-IE problem with existing site extraction label Feb 22, 2024