[Rokfin] Add extractor #1534

P-reducible · 2021-11-03T15:46:39Z

Please follow the guide below

You will be asked some questions, please read them carefully and answer honestly
Put an x into all the boxes [ ] relevant to your pull request (like that [x])
Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

At least skimmed through contributing guidelines including yt-dlp coding conventions
Searched the bugtracker for similar pull requests
Checked the code with flake8

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check one of the following options:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Bug fix
Improvement
New extractor
New feature

This closes issue 1351.

yt_dlp/extractor/rokfin.py

pukkandan · 2021-11-08T23:23:15Z

let me know when you are done making changes

P-reducible · 2021-11-09T04:47:23Z

You can go ahead. I am done.

P-reducible · 2021-11-18T02:15:50Z

It's been 9 days, so I was wondering where the review process is. Any outstanding issues?

I would not be asking, but Rokfin is such an important site. It's a political website. Supporting it supports free speech. Rokfin
became popular because of the on-going censorship on YT. Examples of this censorship are countless. You can find one of them in rokfin.py. The second test case has a link to a show that's been wiped from YT. The author got his entire YT channel taken down with no clear explanation.

The other reason Rokfin support is so important is lack of the "Download" feature. The only way to download something from Rokfin is through things like yt-dlp.

I know that every extractor may be released as a plugin. However, there are two problems with this. First, not everyone has technical expertise to be able to use plugins. Second, even I (to be fully honest) didn't know about yt-dlp plugins until recently, and I've been using youtube-dl for years. So, can you really trust an average user to figure this out, since plugins are unsupported?

In summary, I would appreciate a substantive feedback regarding the Rokfin extractor. If everything is good, then perhaps it can be merged. Simultaneously, I will do what I can to speed the process up: let me know if there are urgent/important yt-dlp bugs
that must be fixed, and I will look into fixing them. (Yes, I am familiar with the Discord channel).

I hope this message won't be ignored. Certain videos on Rokfin are so important that they can (literally, not metaphorically) save lives. I can provide specifics privately, if you are interested.

Thank you.

P-reducible · 2021-12-03T05:00:49Z

I've been testing the Rokfin extractor with the --wait-for-video option, and, as I was waiting for a stream, a network glitch occurred. The 'release_timestamp' immediately became None, as did diff (line 1370 in YoutubeDL.py). So far, so good. But then line 1375 gave a warning that the video should have become available by now, which is false, and line 1376 gave

TypeError: '>' not supported between instances of 'int' and 'NoneType'

There are two ways out of this. The extractor could do _download_json(... fatal = True ...) instead of fatal = False, the way this is done right now. That would silence the warning and avoid the TypeError. Problem is: this won't work well if --wait-for-network gets implemented.

So, should your code handle both diff and 'live_status' being None?

pukkandan · 2021-12-03T12:59:32Z

It's a simple fix. I'll push later

diff --git a/yt_dlp/YoutubeDL.py b/yt_dlp/YoutubeDL.py
index 227098656..528d727ee 100644
--- a/yt_dlp/YoutubeDL.py
+++ b/yt_dlp/YoutubeDL.py
@@ -1375,7 +1375,7 @@ class YoutubeDL(object):
             self.report_warning('Release time of video is not known')
         elif (diff or 0) <= 0:
             self.report_warning('Video should already be available according to extracted info')
-        diff = min(max(diff, min_wait or 0), max_wait or float('inf'))
+        diff = min(max(diff or 0, min_wait or 0), max_wait or float('inf'))
         self.to_screen(f'[wait] Waiting for {format_dur(diff)} - Press Ctrl+C to try now')

         wait_till = time.time() + diff

P-reducible · 2021-12-03T22:47:22Z

Yes, but the lines

elif (diff or 0) <= 0:
    self.report_warning('Video should already be available according to extracted info')

will still give a false warning in the situation I described.

pukkandan · 2021-12-04T12:15:51Z

The warning is correct. If the extractor doesn't set was_upcoming, it means that the video should be already available according to extracted info

orbea · 2021-12-09T19:11:49Z

@P-reducible Can you please rebase and squash these commits so there aren't a lot of unrelated changes in the commit history?

P-reducible · 2021-12-09T19:58:16Z

@orbea Yeah, sure. Right now, I'm still testing, fixing bugs, and adding new things. Should be finished soon.

P-reducible · 2021-12-19T01:49:52Z

Suppose there is a pre-filmed non-live video. What's the difference between release_timestamp and timestamp in this case? Does release_timestamp show when the video was released by the author, possibly on other platforms? Does timestamp show when the video (quoting the description) "became available" on this particular platform?

Also, what should the timestamp be for live streams?

Is there a place where the difference is explained?

pukkandan · 2021-12-19T02:04:18Z

For some services, there is a clear separation between "uploaded" time and "released" time of the video. The two fields exists to account for that. If such a distincton doesnt make sense for a site, just use timestamp

Does release_timestamp show when the video was released by the author, possibly on other platforms?

No, it has nothing to do with whatever happens on other platforms

P-reducible · 2021-12-19T03:41:24Z

Ok, I got confused because common.py and README.md say different things about timestamp. It looks like timestamp would be better named upload_timestamp due to its link with upload_date.

But what about live streams? Live streams need release_timestamp. Otherwise --wait-for-video won't work (line 1378 in YoutubeDL.py). What should their timestamp be?

pukkandan · 2021-12-19T03:49:12Z

It looks like timestamp would be better named upload_timestamp due to its link with upload_date.

Originally there was only timestamp and upload_date. The release_ variants were added later. The field names cannot be changed for compatibility reasons

But what about live streams? Live streams need release_timestamp. Otherwise --wait-for-video won't work (line 1378 in YoutubeDL.py). What should their timestamp be?

For livestreams, I think it makes sense to have only release_timestamp. But if you want to copy the same to timestamp, that's fine as well

pukkandan · 2021-12-24T00:24:52Z

Is this done now? If so, rebase on master and ping me

P-reducible · 2021-12-24T03:36:25Z

@pukkandan As you requested ...

pukkandan

the same issues exist in the other classes too

yt_dlp/extractor/rokfin.py

pukkandan · 2021-12-24T04:01:01Z

yt_dlp/extractor/rokfin.py

+        downloaded_json = try_get(downloaded_json, lambda x: x.get)
+        created_by = try_get(downloaded_json, lambda x: x('createdBy')).get
+        upload_date_time = try_get(downloaded_json, lambda x: x('creationDateTime'))
+        channel_name = try_get(created_by, lambda x: x('name'))
+        content_subdict = try_get(content_subdict, lambda x: x.get)


remove all this. see below comment

yt_dlp/extractor/rokfin.py

pukkandan · 2021-12-24T04:07:14Z

yt_dlp/extractor/rokfin.py

+                is_unlisted=False)
+
+        content_subdict = try_get(downloaded_json, lambda x: x['content'])
+        video_formats_url = try_get(content_subdict, lambda x: url_or_none(x['contentUrl']))


Suggested change

video_formats_url = try_get(content_subdict, lambda x: url_or_none(x['contentUrl']))

video_formats_url = url_or_none(content.get('contentUrl'))

The point of try_get is to avoid .get chains. Not to unnecessarily over-complicate

pukkandan · 2021-12-24T04:12:12Z

yt_dlp/extractor/rokfin.py

+            'duration': try_get(content_subdict, lambda x: float_or_none(x('duration'))),
+            'thumbnail': try_get(content_subdict, lambda x: x('thumbnailUrl1')),
+            'description': try_get(content_subdict, lambda x: x('contentDescription')),
+            'like_count': try_get(downloaded_json, lambda x: x('likeCount')),
+            'dislike_count': try_get(downloaded_json, lambda x: x('dislikeCount')),
+            'comment_count': try_get(downloaded_json, lambda x: x('numComments')),
+            'availability': availability,
+            'creator': channel_name,
+            'channel_id': try_get(created_by, lambda x: x('id')),
+            'channel': channel_name,
+            'channel_url': try_get(created_by, lambda x: self._CHANNEL_BASE_URL + x('username')),
+            'timestamp': unified_timestamp(upload_date_time),
+            'tags': [str(tag) for tag in try_get(downloaded_json, lambda x: x('tags')) or []],


Suggested change

'duration': try_get(content_subdict, lambda x: float_or_none(x('duration'))),

'thumbnail': try_get(content_subdict, lambda x: x('thumbnailUrl1')),

'description': try_get(content_subdict, lambda x: x('contentDescription')),

'like_count': try_get(downloaded_json, lambda x: x('likeCount')),

'dislike_count': try_get(downloaded_json, lambda x: x('dislikeCount')),

'comment_count': try_get(downloaded_json, lambda x: x('numComments')),

'availability': availability,

'creator': channel_name,

'channel_id': try_get(created_by, lambda x: x('id')),

'channel': channel_name,

'channel_url': try_get(created_by, lambda x: self._CHANNEL_BASE_URL + x('username')),

'timestamp': unified_timestamp(upload_date_time),

'tags': [str(tag) for tag in try_get(downloaded_json, lambda x: x('tags')) or []],

'duration': content.get('duration'),

'thumbnail': url_or_none(content.get('thumbnailUrl1')),

'description': content.get('contentDescription'),

'like_count': downloaded_json.get('likeCount'),

'dislike_count': downloaded_json.get('dislikeCount'),

'comment_count': downloaded_json.get('numComments'),

'availability': availability,

'creator': channel_name,

'channel_id': try_get(downloaded_json, x['createdBy']['id']),

'channel': channel_name,

'channel_url': try_get(downloaded_json, self._CHANNEL_BASE_URL + x['createdBy']['username'])

'timestamp': unified_timestamp(upload_date_time),

'tags': [str(tag) for tag in downloaded_json.get('tags') or []],

Also, try_get(d, lambda x: x[a][b][c]) can be simplified to traverse_obj(d, (a, b, c) if you want

yt_dlp/extractor/rokfin.py

orbea · 2021-12-25T23:37:23Z

Is this supposed to work yet? I applied it to 2021.12.25 yt-dlp release and ran into this error.

$ yt-dlp --verbose 'https://rokfin.com/stream/12029/What-were-allowed-to-debate--Dr-Pierre-Kory'
[debug] Command-line config: ['--verbose', 'https://rokfin.com/stream/12029/What-were-allowed-to-debate--Dr-Pierre-Kory']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, err utf-8, pref UTF-8
[debug] yt-dlp version 2021.12.25 [87e049962]
[debug] Python version 3.9.0 (CPython 64bit) - Linux-5.10.45-x86_64-AMD_FX-tm-6350_Six-Core_Processor-with-glibc2.33
[debug] exe versions: ffmpeg git-2021-06-26-ad06929 (setts), ffprobe git-2021-06-26-ad06929, rtmpdump 2.4
[debug] Optional libraries: sqlite
[debug] Proxy map: {}
[debug] [generic] Extracting URL: https://rokfin.com/stream/12029/What-were-allowed-to-debate--Dr-Pierre-Kory
[generic] What-were-allowed-to-debate--Dr-Pierre-Kory: Requesting header
WARNING: [generic] Could not send HEAD request to https://rokfin.com/stream/12029/What-were-allowed-to-debate--Dr-Pierre-Kory: HTTP Error 404: Not Found
[generic] What-were-allowed-to-debate--Dr-Pierre-Kory: Downloading webpage
ERROR: [generic] Unable to download webpage: HTTP Error 404: Not Found (caused by <HTTPError 404: 'Not Found'>); please report this issue on  https://github.com/yt-dlp/yt-dlp . Make sure you are using the latest version; see  https://github.com/yt-dlp/yt-dlp  on how to update. Be sure to call yt-dlp with the --verbose flag and include its complete output. (caused by <HTTPError 404: 'Not Found'>); please report this issue on  https://github.com/yt-dlp/yt-dlp . Make sure you are using the latest version; see  https://github.com/yt-dlp/yt-dlp  on how to update. Be sure to call yt-dlp with the --verbose flag and include its complete output.
  File "/usr/lib64/python3.9/site-packages/yt_dlp/extractor/common.py", line 717, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/usr/lib64/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 3456, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib64/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/usr/lib64/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/usr/lib64/python3.9/urllib/request.py", line 561, in error
    return self._call_chain(*args)
  File "/usr/lib64/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/usr/lib64/python3.9/urllib/request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

P-reducible · 2022-03-04T13:48:35Z

@pukkandan

The reason searchIE is removed is that I started with the merged version and am putting the search back in. However, I am not finished with this yet and have not indicated that the code is ready for the next review. My intention is to keep searchIE in this PR, if you are still ok with that.
Only videos and audios are being returned by channelIE and stackIE, by default.

pukkandan · 2022-03-04T13:49:46Z

My intention is to keep searchIE in this PR, if you are still ok with that.

sure, either way is fine by me

pukkandan · 2022-03-04T13:50:07Z

Only videos and audios are being returned by channelIE and stackIE, by default.

yt-dlp MUST NOT return non-video URLs in playlists! It doesn;t matter if you put it behind an extractor-arg

P-reducible · 2022-03-04T19:22:37Z

@pukkandan

yt-dlp MUST NOT return non-video URLs in playlists! It doesn;t matter if you put it behind an extractor-arg

In our most recent exchange (which has since disappeared), you seemingly left the door open for an extractor-specific argument, so I went ahead with it.
You can take this functionality out if you choose to. I would not put it in in the first place if you made yourself clear from the start.

pukkandan · 2022-03-04T19:51:20Z

In our most recent exchange (which has since disappeared), you seemingly left the door open for an extractor-specific argument, so I went ahead with it.

I clearly said the same thing in our previous discussion (which has not "disappeared" btw). When you asked about adding an extractor arg, I asked for your reasoning on why yt-dlp "should return non-video URLs in playlists" and you failed to give any genuine reason beside a comment arguing the semantics that "it's not really a playlist" (which is relevant how?)

P-reducible · 2022-03-04T20:11:28Z

@pukkandan

Fair enough. Why didn't you say that right there and then? That would have saved my time and coding effort.
I am glad the old discussion is still up, although I still can't see it. Regardless, I take the "disappeared" part back.

pukkandan · 2022-03-04T20:14:53Z

Fair enough. Why didn't you say that right there and then? That would have saved my time and coding effort.

what!? I suggest you read the conversation again

P-reducible · 2022-03-04T20:37:08Z

@pukkandan What I said makes perfect sense. I gave an explanation, and you didn't reply. Only in retrospect did I learn that my explanation was not "genuine".

P-reducible · 2022-03-05T22:51:02Z

@pukkandan Are you planning a new release? If so, when should I give you the code in time for the release?

pukkandan · 2022-03-05T23:09:34Z

There are a couple more issues (unrelated to this) that I want to fix before a release. Will make one as soon as I get the time to finish those up. It is unlikely for new code to make it in. Feel free to take your time

into rokfin-extractor

P-reducible · 2022-03-05T23:56:59Z

@pukkandan If that's the case, you may want to merge this regression fix I just committed: 'premiumPlan' applies to posts only. For streams, the field is just called 'premium'. I also added multi_video=True to playlist_result in RokfinStackIE.

iambumblehead · 2022-03-07T06:12:44Z

thanks @P-reducible I look forward to this feature.

On my flakey internet connection, rokfin videos frequently buffer and break and are impractical to play through. It will be GREAT if yt-dlp will acquire the video for me so that it can be watched later without buffering and network issues.

yt_dlp/extractor/rokfin.py

P-reducible · 2022-03-08T20:18:55Z

thanks ...

@iambumblehead, you are welcome. The speed lag affects me, as well, to some degree. Must be a Rokfin issue.

P-reducible · 2022-03-09T13:00:15Z

Site search and login moved to PR #2992.

pukkandan reviewed Nov 3, 2021

View reviewed changes

yt_dlp/extractor/rokfin.py Outdated Show resolved Hide resolved

pukkandan added pending-fixes PR has had changes requested pending-review PR needs a review and removed pending-fixes PR has had changes requested labels Nov 3, 2021

P-reducible requested a review from pukkandan November 16, 2021 01:32

P-reducible mentioned this pull request Nov 18, 2021

Rokfin site support request ytdl-org/youtube-dl#29699

Closed

5 tasks

pukkandan added the site-request Request to support a new website label Nov 18, 2021

pukkandan force-pushed the master branch from 077d0ba to 717216b Compare November 29, 2021 19:33

P-reducible force-pushed the rokfin-extractor branch from 9b09659 to 3fa1757 Compare December 10, 2021 01:27

P-reducible force-pushed the rokfin-extractor branch from 05305e2 to f879e84 Compare December 24, 2021 03:30

pukkandan requested changes Dec 24, 2021

View reviewed changes

pukkandan reviewed Dec 24, 2021

View reviewed changes

yt_dlp/extractor/rokfin.py Outdated Show resolved Hide resolved

pukkandan requested changes Dec 24, 2021

View reviewed changes

pukkandan force-pushed the master branch from f0650dd to 87e0499 Compare December 25, 2021 00:33

pukkandan added the do-not-merge label Mar 4, 2022

P-reducible added 3 commits March 6, 2022 00:40

Regression fixed

09538ef

Merge branch 'yt-dlp:master' into rokfin-extractor

671ba61

Merge branch 'rokfin-extractor' of https://github.com/P-reducible/yt-dlp

c2211af

into rokfin-extractor

pukkandan requested changes Mar 8, 2022

View reviewed changes

yt_dlp/extractor/rokfin.py Outdated Show resolved Hide resolved

yt_dlp/extractor/rokfin.py Outdated Show resolved Hide resolved

yt_dlp/extractor/rokfin.py Outdated Show resolved Hide resolved

Rewrite!

8d800a3

pukkandan merged commit 9461cb5 into yt-dlp:master Mar 8, 2022

pukkandan mentioned this pull request Mar 9, 2022

[Rokfin] Search & login #2992

Merged

9 tasks

P-reducible deleted the rokfin-extractor branch April 19, 2022 03:28

	video_formats_url = try_get(content_subdict, lambda x: url_or_none(x['contentUrl']))
	video_formats_url = url_or_none(content.get('contentUrl'))

[Rokfin] Add extractor #1534

[Rokfin] Add extractor #1534

Conversation

P-reducible commented Nov 3, 2021

Please follow the guide below

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check one of the following options:

What is the purpose of your pull request?

pukkandan commented Nov 8, 2021

P-reducible commented Nov 9, 2021

P-reducible commented Nov 18, 2021 • edited

P-reducible commented Dec 3, 2021

pukkandan commented Dec 3, 2021

P-reducible commented Dec 3, 2021

pukkandan commented Dec 4, 2021

orbea commented Dec 9, 2021

P-reducible commented Dec 9, 2021

P-reducible commented Dec 19, 2021

pukkandan commented Dec 19, 2021 • edited

P-reducible commented Dec 19, 2021

pukkandan commented Dec 19, 2021

pukkandan commented Dec 24, 2021

P-reducible commented Dec 24, 2021

pukkandan left a comment

Choose a reason for hiding this comment

pukkandan Dec 24, 2021

Choose a reason for hiding this comment

pukkandan Dec 24, 2021

Choose a reason for hiding this comment

pukkandan Dec 24, 2021

Choose a reason for hiding this comment

pukkandan Dec 24, 2021

Choose a reason for hiding this comment

orbea commented Dec 25, 2021

P-reducible commented Mar 4, 2022

pukkandan commented Mar 4, 2022

pukkandan commented Mar 4, 2022

P-reducible commented Mar 4, 2022 • edited

pukkandan commented Mar 4, 2022

P-reducible commented Mar 4, 2022

pukkandan commented Mar 4, 2022

P-reducible commented Mar 4, 2022

P-reducible commented Mar 5, 2022

pukkandan commented Mar 5, 2022

P-reducible commented Mar 5, 2022

iambumblehead commented Mar 7, 2022

P-reducible commented Mar 8, 2022

P-reducible commented Mar 9, 2022

P-reducible commented Nov 18, 2021 •

edited

pukkandan commented Dec 19, 2021 •

edited

P-reducible commented Mar 4, 2022 •

edited