[HQPorner] Add new extractor: video, playlist and search pages #32245

Open: dirkf wants to merge 1 commit into master

@dirkf (Contributor) commented May 28, 2023

Boilerplate: own code, new extractor

## Please follow the guide below
  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use the Preview tab to see what your pull request will actually look like

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • [x] I am the original author of this code and I am willing to release it under Unlicense
  • [ ] I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • [ ] Bug fix
  • [ ] Improvement
  • [x] New extractor
  • [ ] New feature

Description of your pull request and other information

This PR adds an extractor module for hqporner.com, as suggested in yt-dlp/yt-dlp#7116. The module provides three extractors:

  • HQPornerIE for video pages
  • HQPornerListIE for playlist pages based on category or performer
  • HQPornerSearchIE for search pages.

For video pages, the following metadata is extracted:

  • title from video caption or page title
  • age_limit fixed at 18, though pages include RTA tag
  • upload_date from approximate age in caption
  • description from "featuring ..." in caption
  • duration from caption or page description
  • categories from caption or page meta keywords
  • tags from page description
  • thumbnail from HTML5 video.

Playlist entries get title, duration, thumbnail.
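
As an illustration of the `upload_date` item above, here is a minimal sketch of how an approximate age could be turned into a `YYYYMMDD` date. It is not the code from this PR: the function name `approx_upload_date`, the `_UNIT_DAYS` table and the assumed caption wording ("N days/weeks/months/years ago") are all hypothetical, and the site's actual caption format may differ.

```python
import re
from datetime import date, timedelta

# Rough unit lengths in days (an assumption for this sketch; months and
# years are approximated, so the result is only a best guess).
_UNIT_DAYS = {
    'day': 1,
    'week': 7,
    'month': 30,
    'year': 365,
}


def approx_upload_date(age_text, today=None):
    """Turn text like '2 weeks ago' into a YYYYMMDD string, or None.

    Hypothetical helper: the caption wording it expects is assumed,
    not taken from the site or from the PR.
    """
    today = today or date.today()
    m = re.search(r'(\d+)\s+(day|week|month|year)s?\s+ago', age_text, re.I)
    if not m:
        # No recognisable age in the text
        return None
    days = int(m.group(1)) * _UNIT_DAYS[m.group(2).lower()]
    return (today - timedelta(days=days)).strftime('%Y%m%d')
```

For example, `approx_upload_date('2 weeks ago', today=date(2023, 5, 28))` gives `'20230514'`. Because the input is only an approximate age, a date derived this way should be treated as an estimate rather than the true upload date.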

@JChris246 (Contributor)

An issue was previously opened for this site and subsequently closed after it was determined that the site hosts copyrighted content: #7201.

@dirkf (Contributor, Author) commented May 29, 2023

Since content is generally subject to copyright, that's not the deciding factor. Look at other sites supported by yt-dl and you will find copyrighted material that may be hosted with, or (shockingly) without, the copyright holder's permission: not just porn, but user-generated content sites like YouTube.

Also, #7201 applies to the site as it was in 2015, though there's no indication of whether the site has changed.

The project's policy has two parts:

Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites.

In order for site support request to be accepted all provided example URLs should not violate any copyrights.

The linked guidance is this:

As a matter of policy (as well as legality), youtube-dl does not include support for services that specialize in infringing copyright. As a rule of thumb, if you cannot easily find a video that the service is quite obviously allowed to distribute (i.e. that has been uploaded by the creator, the creator's distributor, or is published under a free license), the service is probably unfit for inclusion to youtube-dl.

A note on the service that they don't host the infringing content, but just link to those who do, is evidence that the service should not be included into youtube-dl. The same goes for any DMCA note when the whole front page of the service is filled with videos they are not allowed to distribute. A "fair use" note is equally unconvincing if the service shows copyright-protected videos in full without authorization.

I would say that "the whole front page of the service is filled with videos they are not allowed to distribute" could cover almost every porn site that yt-dl supports. I looked at a large sample of yt-dl[p] porn site extractors and most failed the test, e.g. daft.sex.

The porn ecosystem is entirely different from, say, broadcasting and movie streaming sites. If a movie is available on BBC iPlayer, we can take it that the BBC is distributing it with permission. Sites like YouTube and Vimeo do distribute actual user-generated content alongside material for which the submitter may not have the rights. In fact, YT conspires with its users and the music industry to promote ad-supported content submitted by users who do not own the content.

A typical porn site provides no explicit copyright attribution. The user can't be expected to know which video might have been licensed by the copyright owner, even in the presence of a logo or watermark. Does the site have some sort of ad-revenue-sharing deal with content owners? Is the content being supplied legitimately as a promotion? No-one can tell for sure, and the waters are further muddied because the same company may operate both a paid content site and ad-supported "user-generated" sites that include old content from the paid site.

My interpretation of the policy is rather that yt-dl is like a web browser and it cannot be responsible for identifying whether any particular content is legitimate. A user who is concerned about some material should simply not download it. The site must allow copyright owners to remove unauthorised content. If the site claims to follow DMCA, that should be enough, as it is for YouTube. Therefore, I don't entirely support the statement about DMCA in the guidance quoted above.

Also, the project requirement "example URLs should not violate any copyrights" seems to be very difficult to interpret in this context. One might think that a URL is a reference that cannot be restricted by copyright: otherwise no-one could write the title of the movie "Top Gun - Maverick" without permission. If the policy is saying that no example URL should reference content that is explicitly being distributed without permission ("hey l33t guys, grab this warez movie I hacked"), that's clear enough. But anything more than that fails in the UGC or porn site context.

Regarding HQPorner, there are particular issues:

  1. it claims to host user-submitted content, but there's no identification of the uploader (the site says it registers users manually)
  2. its terms require that users have permission to post videos, but all videos that I checked appeared to be unedited commercial productions
  3. its terms acknowledge DMCA and there is a dedicated email address for DMCA take-down requests, but DMCA is linked in the terms and contact pages, rather than on the main video and list pages.

Because of point 1 above, as well as the ecosystem context, it's difficult to tell whether any video is posted with valid permission. Maybe (point 2) the material is actually being provided by the owners as a sort of ad-supported front? The example URL in yt-dlp/yt-dlp#7147 showed a gigantic banner ad for the apparent owner of the video.

In this PR I selected a more anonymous video for the main test.
