Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[hketv] Add new extractor #18696
[hketv] Add new extractor #18696
Changes from 1 commit
ec2e471
7468be2
319b00d
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relax regexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the regex to
r'post_var\["file_id"\]\s*=\s*(.+?);'
for example.Hope that is relax enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Breaks on uninitialized
h
. Breaks on None width and height.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Indeed, some older videos were uploaded without width and height information, as in they are set to "0". Changed to the following:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
urljoin
replaced with+
.Or did you mean the use of
self._downloader.urlopen()
? That was intentionally done so that the download link of the video could be downloaded without geo-restriction, i.e. without proxy, and without the PHPSESSID cookie. (The geo-restriction is done within their PHP script only.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, may I keep using self._downloader.urlopen() for the MPEG-4 video file? :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, what for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that no cookie is needed with the link given by
-g
. The link given byurljoin(_APPS_BASE_URL, fmt.get('file'))
looks something like this:https://apps.hkedcity.net/media/mediaplayer/filename.php?t=85ee74dae3477d95b05cc3091f26ff5a&m=mp4&p=fc61b1acf92b52238ad5c616409759360b4d914f183691d79c03e27a335a1c7e99a0f5d12b7183530daad9edc79fb17d97c3b6c0d3c445d68b96c92fa88b3a2f&file=551cc229795bb70003e9962bda09899612a7ae10b183be5e84e136419d91c684c63c863a8e3237a753ed31921500c2d28aa4db43a8684e37caac000571948110&e=.mp4
which requires the correct cookies to work. It gives a "HTTP/1.1 302 Found" redirecting to a final URL that looks like https://video1.hkedcity.net/streaming/b3848d0a3296b85d67d88b749642c29a/5c3738d1/channels/etv/201507/20150716151039_989507_720.mp4 which requires no cookie. This final URL is easier for me to copy-and-paste to other programs or even to stream to other devices.
That said, I have decided to take your advice and not "fully resolve" to the final URL because the intermediate URL, with the appropriate cookies saved with
--cookies
, remains valid for many hours, whereas the fully resolved URL would be "410 Gone" within one or two hours. Also, I realizedself._downloader.urlopen
was being run twice in thefor fmt in fmts:
loop, leading to quite a few seconds of delay. (The HKETV server seems rather slow in responding to this request.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
str_or_none
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I copied that few lines verbatim from
_parse_jwplayer_data
in youtube_dl/extractor/common.py! Doesn't that meanstr_or_none
needs to be added there too?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise_geo_restricted
and specifycountries
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! After consulting other extractors, I am now using this:
with the following definitions near the top:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emotion
is not guaranteed to be a dict.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Line removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in
alone will work just fine without the extraneous
if emotion.get('result'):
check.