[rai] changed the extractor for the new site and dismissal of the old one #11790

timendum · 2017-01-20T16:43:51Z

Before submitting a pull request make sure you have:

At least skimmed through adding new extractor tutorial and youtube-dl coding conventions sections
Searched the bugtracker for similar pull requests

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Bug fix
Improvement
New extractor
New feature

Description of your pull request and other information

Rai moved its streaming website to raiplay.it and shut down the old one, so I heavily changed the extractor to reflect this.

This is a follow-up to #11134; I've dropped the old one and fixed the conflicts.

All the tests pass from Italy; some media may be geo-restricted.

timendum · 2017-01-27T10:59:54Z

@dstftw can you check?

Thanks

dstftw · 2017-01-28T12:48:55Z

Not working:
http://www.report.rai.it/dl/Report/puntata/ContentItem-0c7a664b-d0f4-4b2c-8835-3f82e46f433e.html
http://www.tg1.rai.it/dl/tg1/2010/edizioni/ContentSet-9b6e0cba-4bef-4aef-8cf0-9f7f665b7dfb-tg1.html?item=undefined
http://www.rainews.it/dl/rainews/live/ContentItem-3156f2f2-dc70-4953-8e2f-70d7489d4ce9.html

timendum · 2017-01-30T17:12:41Z

@dstftw
I've imported some code from the older extractor for better handling of old urls.

Can you check now?

Should I squash the commits?

Thanks.

dstftw · 2017-01-30T17:14:00Z

> Should I squash the commits?

Yes.

timendum · 2017-01-31T12:01:46Z

@dstftw
Commits squashed.

rabblac · 2017-02-07T15:09:33Z

Still open? Is anything blocking this? @dstftw

dstftw · 2017-02-18T12:38:26Z

Check code with flake8.

timendum · 2017-02-20T15:02:48Z

Done and squashed.

There are still some lines longer than 100 characters, but they are test cases and regexps.
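
For long regular expressions, one common way to stay under a line-length limit is to split the pattern across adjacent raw string literals, which Python joins at compile time. The sketch below is illustrative only: the `VALID_URL` name, the `example.com` host, and the path layout are assumptions, not the pattern used in this pull request.

```python
import re

# Hypothetical URL pattern, split over several short raw strings.
# Adjacent string literals inside parentheses are concatenated by Python.
VALID_URL = (
    r'https?://(?:www\.)?example\.com/'
    r'video/'
    r'(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
)

match = re.match(
    VALID_URL,
    'https://www.example.com/video/0c7a664b-d0f4-4b2c-8835-3f82e46f433e')
video_id = match.group('id')
```

The pattern behaves exactly as if it had been written on one line; only the source layout changes.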

dstftw · 2017-03-04T16:37:47Z

youtube_dl/extractor/rai.py

+        formats = None
+        duration = None
+        if 'video' in media:
+            formats


What's this for?

dstftw · 2017-03-04T16:38:48Z

youtube_dl/extractor/rai.py

-        video_id = self._match_id(url)
+        formats = None
+        duration = None
+        if 'video' in media:


This is pointless. If no formats can be extracted, extraction should stop immediately.

timendum · 2017-03-06

It's only a variable initialization and I see no harm in it, but I'm open to suggestions.

dstftw · 2017-03-11T12:48:46Z

It's a conditional expression that allows format extraction to be skipped. As I've already said, this is pointless since a valid list of formats must always be present. A None value for formats is not allowed.
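
The fail-fast behaviour requested in this thread can be sketched as follows. This is a minimal, self-contained illustration and not the code that was merged: `ExtractorError` here is a local stand-in for youtube-dl's exception of the same name, and `extract_formats` and the shape of the `media` dict are assumptions made for the example.

```python
class ExtractorError(Exception):
    """Local stand-in for youtube-dl's ExtractorError (assumption)."""


def extract_formats(media):
    """Return a non-empty list of formats, or raise. Never returns None.

    `media` is a hypothetical dict whose 'video' key maps to a list of
    media URLs.
    """
    video_urls = media.get('video') or []
    formats = [
        {'url': video_url, 'format_id': 'http-%d' % index}
        for index, video_url in enumerate(video_urls)
        if video_url
    ]
    if not formats:
        # Stop immediately instead of carrying a None/empty value forward.
        raise ExtractorError('No video formats found')
    return formats
```

With this shape there is no `formats = None` placeholder to initialize: the caller either receives a usable list or the extraction aborts with an error.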

dstftw · 2017-03-04T16:45:04Z

youtube_dl/extractor/rai.py

-            'formats': formats,
-            'subtitles': subtitles,
-        }
+        canonical_url = self._og_search_url(webpage)


What's the point of this? canonical_url is the same as url.

timendum · 2017-03-06

Not in the first test case, nor in any case where a query parameter is added to the url.

dstftw · 2017-03-11T12:40:22Z

Everything apart from the query is the same, and this unnecessary webpage download should be removed.

dstftw · 2017-03-11T12:40:22Z

youtube_dl/extractor/rai.py

-            'formats': formats,
-            'subtitles': subtitles,
-        }
+        canonical_url = self._og_search_url(webpage)


Everything apart from the query is the same, and this unnecessary webpage download should be removed.

dstftw · 2017-03-11T12:48:46Z

youtube_dl/extractor/rai.py

-        video_id = self._match_id(url)
+        formats = None
+        duration = None
+        if 'video' in media:


It's a conditional expression that allows format extraction to be skipped. As I've already said, this is pointless since a valid list of formats must always be present. A None value for formats is not allowed.

timendum · 2017-03-14T15:13:53Z

I've hopefully resolved all the issues:

The canonical URL is now obtained by removing the query with urlparse
An ExtractorError is raised if no video is found
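
Removing the query without downloading the page can be sketched with the standard library alone. This is an illustration under assumptions, not the merged code: it uses Python 3's `urllib.parse` directly rather than youtube-dl's compatibility wrappers, and `strip_query` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def strip_query(url):
    """Return `url` with its query string removed (hypothetical helper)."""
    # SplitResult is a named tuple, so _replace builds a modified copy.
    return urlsplit(url)._replace(query='').geturl()


canonical_url = strip_query(
    'http://www.tg1.rai.it/dl/tg1/2010/edizioni/'
    'ContentSet-9b6e0cba-4bef-4aef-8cf0-9f7f665b7dfb-tg1.html?item=undefined')
```

Because the result is derived purely from the input URL, no extra webpage request is needed, which addresses the objection raised in the review.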


dstftw added the pending-fixes label Jan 28, 2017

rabblac mentioned this pull request Feb 16, 2017

[RaiPlay] Add support for raiplay.it URLs #12123

dstftw requested changes Mar 4, 2017

dstftw requested changes Mar 11, 2017

[rai] changed the extractor for the new site

6240f5c

dstftw closed this in b8d8cce Apr 1, 2017

dstftw added a commit that referenced this pull request May 22, 2017

Credit @timendum for rai (#11790) and mediaset (#12964)

de53511

timendum deleted the rai branch December 19, 2017 14:05
