Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Rokfin] Add extractor #1534

Merged
merged 53 commits into from Mar 8, 2022
Merged

Conversation

P-reducible
Copy link
Contributor

Please follow the guide below

  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

This closes issue 1351.

@pukkandan pukkandan added pending-fixes PR has had changes requested pending-review PR needs a review and removed pending-fixes PR has had changes requested labels Nov 3, 2021
@pukkandan
Copy link
Member

let me know when you are done making changes

@P-reducible
Copy link
Contributor Author

You can go ahead. I am done.

@P-reducible
Copy link
Contributor Author

P-reducible commented Nov 18, 2021

It's been 9 days, so I was wondering where the review process is. Any outstanding issues?

I would not be asking, but Rokfin is such an important site. It's a political website. Supporting it supports free speech. Rokfin
became popular because of the on-going censorship on YT. Examples of this censorship are countless. You can find one of them in rokfin.py. The second test case has a link to a show that's been wiped from YT. The author got his entire YT channel taken down with no clear explanation.

The other reason Rokfin support is so important is lack of the "Download" feature. The only way to download something from Rokfin is through things like yt-dlp.

I know that every extractor may be released as a plugin. However, there are two problems with this. First, not everyone has technical expertise to be able to use plugins. Second, even I (to be fully honest) didn't know about yt-dlp plugins until recently, and I've been using youtube-dl for years. So, can you really trust an average user to figure this out, since plugins are unsupported?

In summary, I would appreciate a substantive feedback regarding the Rokfin extractor. If everything is good, then perhaps it can be merged. Simultaneously, I will do what I can to speed the process up: let me know if there are urgent/important yt-dlp bugs
that must be fixed, and I will look into fixing them. (Yes, I am familiar with the Discord channel).

I hope this message won't be ignored. Certain videos on Rokfin are so important that they can (literally, not metaphorically) save lives. I can provide specifics privately, if you are interested.

Thank you.

@P-reducible
Copy link
Contributor Author

I've been testing the Rokfin extractor with the --wait-for-video option, and, as I was waiting for a stream, a network glitch occurred. The 'release_timestamp' immediately became None, as did diff (line 1370 in YoutubeDL.py). So far, so good. But then line 1375 gave a warning that the video should have become available by now, which is false, and line 1376 gave

TypeError: '>' not supported between instances of 'int' and 'NoneType'

There are two ways out of this. The extractor could do _download_json(... fatal = True ...) instead of fatal = False, the way this is done right now. That would silence the warning and avoid the TypeError. Problem is: this won't work well if --wait-for-network gets implemented.

So, should your code handle both diff and 'live_status' being None?

@pukkandan
Copy link
Member

It's a simple fix. I'll push later

diff --git a/yt_dlp/YoutubeDL.py b/yt_dlp/YoutubeDL.py
index 227098656..528d727ee 100644
--- a/yt_dlp/YoutubeDL.py
+++ b/yt_dlp/YoutubeDL.py
@@ -1375,7 +1375,7 @@ class YoutubeDL(object):
             self.report_warning('Release time of video is not known')
         elif (diff or 0) <= 0:
             self.report_warning('Video should already be available according to extracted info')
-        diff = min(max(diff, min_wait or 0), max_wait or float('inf'))
+        diff = min(max(diff or 0, min_wait or 0), max_wait or float('inf'))
         self.to_screen(f'[wait] Waiting for {format_dur(diff)} - Press Ctrl+C to try now')

         wait_till = time.time() + diff

@P-reducible
Copy link
Contributor Author

Yes, but the lines

elif (diff or 0) <= 0:
    self.report_warning('Video should already be available according to extracted info')

will still give a false warning in the situation I described.

@pukkandan
Copy link
Member

The warning is correct. If the extractor doesn't set was_upcoming, it means that the video should be already available according to extracted info

@orbea
Copy link

orbea commented Dec 9, 2021

@P-reducible Can you please rebase and squash these commits so there aren't a lot of unrelated changes in the commit history?

@P-reducible
Copy link
Contributor Author

@orbea Yeah, sure. Right now, I'm still testing, fixing bugs, and adding new things. Should be finished soon.

@P-reducible
Copy link
Contributor Author

Suppose there is a pre-filmed non-live video. What's the difference between release_timestamp and timestamp in this case? Does release_timestamp show when the video was released by the author, possibly on other platforms? Does timestamp show when the video (quoting the description) "became available" on this particular platform?

Also, what should the timestamp be for live streams?

Is there a place where the difference is explained?

@pukkandan
Copy link
Member

pukkandan commented Dec 19, 2021

For some services, there is a clear separation between "uploaded" time and "released" time of the video. The two fields exists to account for that. If such a distincton doesnt make sense for a site, just use timestamp

Does release_timestamp show when the video was released by the author, possibly on other platforms?

No, it has nothing to do with whatever happens on other platforms

@P-reducible
Copy link
Contributor Author

Ok, I got confused because common.py and README.md say different things about timestamp. It looks like timestamp would be better named upload_timestamp due to its link with upload_date.

But what about live streams? Live streams need release_timestamp. Otherwise --wait-for-video won't work (line 1378 in YoutubeDL.py). What should their timestamp be?

@pukkandan
Copy link
Member

It looks like timestamp would be better named upload_timestamp due to its link with upload_date.

Originally there was only timestamp and upload_date. The release_ variants were added later. The field names cannot be changed for compatibility reasons

But what about live streams? Live streams need release_timestamp. Otherwise --wait-for-video won't work (line 1378 in YoutubeDL.py). What should their timestamp be?

For livestreams, I think it makes sense to have only release_timestamp. But if you want to copy the same to timestamp, that's fine as well

@pukkandan
Copy link
Member

Is this done now? If so, rebase on master and ping me

@P-reducible
Copy link
Contributor Author

@pukkandan As you requested ...

Copy link
Member

@pukkandan pukkandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same issues exist in the other classes too

yt_dlp/extractor/rokfin.py Outdated Show resolved Hide resolved
Comment on lines 88 to 92
downloaded_json = try_get(downloaded_json, lambda x: x.get)
created_by = try_get(downloaded_json, lambda x: x('createdBy')).get
upload_date_time = try_get(downloaded_json, lambda x: x('creationDateTime'))
channel_name = try_get(created_by, lambda x: x('name'))
content_subdict = try_get(content_subdict, lambda x: x.get)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove all this. see below comment

yt_dlp/extractor/rokfin.py Outdated Show resolved Hide resolved
is_unlisted=False)

content_subdict = try_get(downloaded_json, lambda x: x['content'])
video_formats_url = try_get(content_subdict, lambda x: url_or_none(x['contentUrl']))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
video_formats_url = try_get(content_subdict, lambda x: url_or_none(x['contentUrl']))
video_formats_url = url_or_none(content.get('contentUrl'))

The point of try_get is to avoid .get chains. Not to unnecessarily over-complicate

Comment on lines 103 to 115
'duration': try_get(content_subdict, lambda x: float_or_none(x('duration'))),
'thumbnail': try_get(content_subdict, lambda x: x('thumbnailUrl1')),
'description': try_get(content_subdict, lambda x: x('contentDescription')),
'like_count': try_get(downloaded_json, lambda x: x('likeCount')),
'dislike_count': try_get(downloaded_json, lambda x: x('dislikeCount')),
'comment_count': try_get(downloaded_json, lambda x: x('numComments')),
'availability': availability,
'creator': channel_name,
'channel_id': try_get(created_by, lambda x: x('id')),
'channel': channel_name,
'channel_url': try_get(created_by, lambda x: self._CHANNEL_BASE_URL + x('username')),
'timestamp': unified_timestamp(upload_date_time),
'tags': [str(tag) for tag in try_get(downloaded_json, lambda x: x('tags')) or []],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'duration': try_get(content_subdict, lambda x: float_or_none(x('duration'))),
'thumbnail': try_get(content_subdict, lambda x: x('thumbnailUrl1')),
'description': try_get(content_subdict, lambda x: x('contentDescription')),
'like_count': try_get(downloaded_json, lambda x: x('likeCount')),
'dislike_count': try_get(downloaded_json, lambda x: x('dislikeCount')),
'comment_count': try_get(downloaded_json, lambda x: x('numComments')),
'availability': availability,
'creator': channel_name,
'channel_id': try_get(created_by, lambda x: x('id')),
'channel': channel_name,
'channel_url': try_get(created_by, lambda x: self._CHANNEL_BASE_URL + x('username')),
'timestamp': unified_timestamp(upload_date_time),
'tags': [str(tag) for tag in try_get(downloaded_json, lambda x: x('tags')) or []],
'duration': content.get('duration'),
'thumbnail': url_or_none(content.get('thumbnailUrl1')),
'description': content.get('contentDescription'),
'like_count': downloaded_json.get('likeCount'),
'dislike_count': downloaded_json.get('dislikeCount'),
'comment_count': downloaded_json.get('numComments'),
'availability': availability,
'creator': channel_name,
'channel_id': try_get(downloaded_json, x['createdBy']['id']),
'channel': channel_name,
'channel_url': try_get(downloaded_json, self._CHANNEL_BASE_URL + x['createdBy']['username'])
'timestamp': unified_timestamp(upload_date_time),
'tags': [str(tag) for tag in downloaded_json.get('tags') or []],

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, try_get(d, lambda x: x[a][b][c]) can be simplified to traverse_obj(d, (a, b, c) if you want

yt_dlp/extractor/rokfin.py Outdated Show resolved Hide resolved
yt_dlp/extractor/rokfin.py Outdated Show resolved Hide resolved
yt_dlp/extractor/rokfin.py Outdated Show resolved Hide resolved
yt_dlp/extractor/rokfin.py Outdated Show resolved Hide resolved
yt_dlp/extractor/rokfin.py Outdated Show resolved Hide resolved
yt_dlp/extractor/rokfin.py Outdated Show resolved Hide resolved
yt_dlp/extractor/rokfin.py Outdated Show resolved Hide resolved
yt_dlp/extractor/rokfin.py Outdated Show resolved Hide resolved
yt_dlp/extractor/rokfin.py Outdated Show resolved Hide resolved
@orbea
Copy link

orbea commented Dec 25, 2021

Is this supposed to work yet? I applied it to 2021.12.25 yt-dlp release and ran into this error.

$ yt-dlp --verbose 'https://rokfin.com/stream/12029/What-were-allowed-to-debate--Dr-Pierre-Kory'
[debug] Command-line config: ['--verbose', 'https://rokfin.com/stream/12029/What-were-allowed-to-debate--Dr-Pierre-Kory']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, err utf-8, pref UTF-8
[debug] yt-dlp version 2021.12.25 [87e049962]
[debug] Python version 3.9.0 (CPython 64bit) - Linux-5.10.45-x86_64-AMD_FX-tm-6350_Six-Core_Processor-with-glibc2.33
[debug] exe versions: ffmpeg git-2021-06-26-ad06929 (setts), ffprobe git-2021-06-26-ad06929, rtmpdump 2.4
[debug] Optional libraries: sqlite
[debug] Proxy map: {}
[debug] [generic] Extracting URL: https://rokfin.com/stream/12029/What-were-allowed-to-debate--Dr-Pierre-Kory
[generic] What-were-allowed-to-debate--Dr-Pierre-Kory: Requesting header
WARNING: [generic] Could not send HEAD request to https://rokfin.com/stream/12029/What-were-allowed-to-debate--Dr-Pierre-Kory: HTTP Error 404: Not Found
[generic] What-were-allowed-to-debate--Dr-Pierre-Kory: Downloading webpage
ERROR: [generic] Unable to download webpage: HTTP Error 404: Not Found (caused by <HTTPError 404: 'Not Found'>); please report this issue on  https://github.com/yt-dlp/yt-dlp . Make sure you are using the latest version; see  https://github.com/yt-dlp/yt-dlp  on how to update. Be sure to call yt-dlp with the --verbose flag and include its complete output. (caused by <HTTPError 404: 'Not Found'>); please report this issue on  https://github.com/yt-dlp/yt-dlp . Make sure you are using the latest version; see  https://github.com/yt-dlp/yt-dlp  on how to update. Be sure to call yt-dlp with the --verbose flag and include its complete output.
  File "/usr/lib64/python3.9/site-packages/yt_dlp/extractor/common.py", line 717, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/usr/lib64/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 3456, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib64/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/usr/lib64/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/usr/lib64/python3.9/urllib/request.py", line 561, in error
    return self._call_chain(*args)
  File "/usr/lib64/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/usr/lib64/python3.9/urllib/request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

@P-reducible
Copy link
Contributor Author

@pukkandan

  1. The reason searchIE is removed is that I started with the merged version and am putting the search back in. However, I am not finished with this yet and have not indicated that the code is ready for the next review. My intention is to keep searchIE in this PR, if you are still ok with that.
  2. Only videos and audios are being returned by channelIE and stackIE, by default.

@pukkandan
Copy link
Member

My intention is to keep searchIE in this PR, if you are still ok with that.

sure, either way is fine by me

@pukkandan
Copy link
Member

Only videos and audios are being returned by channelIE and stackIE, by default.

yt-dlp MUST NOT return non-video URLs in playlists! It doesn;t matter if you put it behind an extractor-arg

@P-reducible
Copy link
Contributor Author

P-reducible commented Mar 4, 2022

@pukkandan

yt-dlp MUST NOT return non-video URLs in playlists! It doesn;t matter if you put it behind an extractor-arg

  1. In our most recent exchange (which has since disappeared), you seemingly left the door open for an extractor-specific argument, so I went ahead with it.
  2. You can take this functionality out if you choose to. I would not put it in in the first place if you made yourself clear from the start.

@pukkandan
Copy link
Member

In our most recent exchange (which has since disappeared), you seemingly left the door open for an extractor-specific argument, so I went ahead with it.

I clearly said the same thing in our previous discussion (which has not "disappeared" btw). When you asked about adding an extractor arg, I asked for your reasoning on why yt-dlp "should return non-video URLs in playlists" and you failed to give any genuine reason beside a comment arguing the semantics that "it's not really a playlist" (which is relevant how?)

@P-reducible
Copy link
Contributor Author

@pukkandan

  1. Fair enough. Why didn't you say that right there and then? That would have saved my time and coding effort.
  2. I am glad the old discussion is still up, although I still can't see it. Regardless, I take the "disappeared" part back.

@pukkandan
Copy link
Member

Fair enough. Why didn't you say that right there and then? That would have saved my time and coding effort.

what!? I suggest you read the conversation again
image

@P-reducible
Copy link
Contributor Author

@pukkandan What I said makes perfect sense. I gave an explanation, and you didn't reply. Only in retrospect did I learn that my explanation was not "genuine".

@P-reducible
Copy link
Contributor Author

@pukkandan Are you planning a new release? If so, when should I give you the code in time for the release?

@pukkandan
Copy link
Member

There are a couple more issues (unrelated to this) that I want to fix before a release. Will make one as soon as I get the time to finish those up. It is unlikely for new code to make it in. Feel free to take your time

@P-reducible
Copy link
Contributor Author

@pukkandan If that's the case, you may want to merge this regression fix I just committed: 'premiumPlan' applies to posts only. For streams, the field is just called 'premium'. I also added multi_video=True to playlist_result in RokfinStackIE.

@iambumblehead
Copy link

thanks @P-reducible I look forward to this feature.

On my flakey internet connection, rokfin videos frequently buffer and break and are impractical to play through. It will be GREAT if yt-dlp will acquire the video for me so that it can be watched later without buffering and network issues.

yt_dlp/extractor/rokfin.py Outdated Show resolved Hide resolved
yt_dlp/extractor/rokfin.py Outdated Show resolved Hide resolved
yt_dlp/extractor/rokfin.py Outdated Show resolved Hide resolved
@pukkandan pukkandan merged commit 9461cb5 into yt-dlp:master Mar 8, 2022
@P-reducible
Copy link
Contributor Author

thanks ...

@iambumblehead, you are welcome. The speed lag affects me, as well, to some degree. Must be a Rokfin issue.

@pukkandan pukkandan mentioned this pull request Mar 9, 2022
9 tasks
@P-reducible
Copy link
Contributor Author

Site search and login moved to PR #2992.

@P-reducible P-reducible deleted the rokfin-extractor branch April 19, 2022 03:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge site-enhancement Feature request for some website
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rokfin support
7 participants