-
-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ie/sheeta] Support websites based on sheeta #9978
base: master
Are you sure you want to change the base?
Conversation
A target: OOP. Make |
if not mail_tel: | ||
return | ||
|
||
cache_key = hashlib.sha1(f'{self._DOMAIN}:{mail_tel}:{password}'.encode()).hexdigest() |
Check failure
Code scanning / CodeQL
Use of a broken or weak cryptographic hashing algorithm on sensitive data High
Sensitive data (password)
Sensitive data (id)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we caching this to disk? From what I can tell, the data we are caching cache_key
doesn't depend on any responses from the site, we are just caching username and password to disk? And it will only be loaded if the user is already passing their username and password?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I'm caching tokens for successfully logged-in users.
Almost all sheeta-based websites (except https://nicochannel.jp/) don't support cookie-based authorization, so login credentials are required. If a user runs yt-dlp multiple times without cached tokens, time will be wasted submitting login information.
This is emulating a web browser. The JS code stores tokens in Local Storage, so users don't have to type password every time they enter sheeta sites.
To keep each user's token distinct from anyone else's on each website, I'm using a hash algorithm to make the cache key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cache_key
just seems unnecessary to me, and there is definitely no reason to be hashing the password. Using self._NETRC_MACHINE
as the section
should do the work of distinguishing between different sites, and the key
can just be the username
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or if for some reason you do need _DOMAIN
instead of simply self._NETRC_MACHINE
, the password can still be omitted from the key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this IE supports multiple sites, but self._NETRC_MACHINE
is a constant string. If a user uses the same username (e-mail address) on different sites, the cache will be overwritten. So, I think it's not OK to use self._NETRC_MACHINE
as the section
.
About hashing password: Now I think that's totally unnecessary. If a cached token is invalidated for some reason, our code will drop it and try getting a new one. We shouldn't care if the password is changed or not.
I still need a cache_key
since I need update cached tokens outside of the login procedure.
So:
cache_key = base64.b64decode(f'{self._DOMAIN}:{mail_tel}'.encode()).decode()
Your opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this IE supports multiple sites, but self._NETRC_MACHINE is a constant string.
We should not be subclassing these other sites' extractors from a concrete IE like SheetaEmbedIE
. There should be a SheetaBaseIE
from which they are subclassed. This will also simplify it so we can set the _NETRC_MACHINE
in the subclasses and it will be unique per site
IMPORTANT: PRs without the template will be CLOSED
Description of your pull request and other information
This PR fixes #9541, fixes #4655, closes #8645, fixes #8317.
Sheeta is a fan club system developed by DWANGO Co., Ltd. (株式会社ドワンゴ).
You can find websites supported by this extractor by Googling "
"(C) DWANGO Co., Ltd." "入会"
".This PR should NOT be merged unless the
expected_status=404
logic for generic sites gets implemented (see #9541 (comment)).Template
Before submitting a pull request make sure you have:
In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:
What is the purpose of your pull request?