Please add support for the family of Fairfax Newspaper websites. #4957
Comments
Just bumping this request. The generic extractor is extracting the wrong video.

Brightcove-embeds. Currently working
Fairfax owns a number of newspaper websites: www.brisbanetimes.com.au, www.watoday.com.au, www.theage.com.au, www.canberratimes.com.au and www.smh.com.au. The generic extractor downloads the wrong videos from these sites.
All of them use similar but not identical HTML to present titles and descriptions, and videos are served in the same way on each site. Each site also has a "media.*" subdomain that hosts videos; it uses slightly different HTML from the main site to show titles and descriptions.
Example URLs:
http://www.canberratimes.com.au/federal-politics/political-news/malcolm-turnbull-contradicts-tony-abbott-on-gillian-triggs-strategy-20150225-13o8lb.html
http://media.smh.com.au/video-news/video-national-news/julie-bishops-emoji-response-6295549.html
Also:
http://media.smh.com.au/featured/christopher-pynes-chum-bucket-6296463.html (note the one subfolder from the domain versus three)
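To illustrate how one URL pattern might cover both the main sites and the "media.*" subdomains, here is a rough sketch in Python. It is not an existing youtube-dl extractor: the names `FAIRFAX_URL_RE` and `parse_fairfax_url` are made up for this example, and the ID formats (a date plus short code on article pages, a plain number on media pages) are assumptions inferred only from the three example URLs above.

```python
import re

# Site names taken from the list in this issue.
FAIRFAX_SITES = ("brisbanetimes", "watoday", "theage", "canberratimes", "smh")

# Hypothetical pattern: "www" or "media" subdomain, any number of
# subfolders (at least one), then "<slug>-<id>.html".
# Assumption: article IDs look like "20150225-13o8lb", media IDs like "6295549".
FAIRFAX_URL_RE = re.compile(
    r"^https?://(?P<sub>www|media)\.(?P<site>%s)\.com\.au/"
    r"(?:[^/?#]+/)+"
    r"(?P<slug>[^/?#]+?)-(?P<id>\d{8}-[0-9a-z]+|\d+)\.html$"
    % "|".join(FAIRFAX_SITES)
)


def parse_fairfax_url(url):
    """Return the site, subdomain, slug and ID for a Fairfax URL, or None."""
    match = FAIRFAX_URL_RE.match(url)
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    examples = [
        "http://www.canberratimes.com.au/federal-politics/political-news/"
        "malcolm-turnbull-contradicts-tony-abbott-on-gillian-triggs-strategy-"
        "20150225-13o8lb.html",
        "http://media.smh.com.au/video-news/video-national-news/"
        "julie-bishops-emoji-response-6295549.html",
        "http://media.smh.com.au/featured/christopher-pynes-chum-bucket-6296463.html",
    ]
    for url in examples:
        print(parse_fairfax_url(url))
```

Matching the URL is only the first step; picking the right video out of the page (rather than whatever the generic extractor finds first) would still need per-site handling of the slightly different HTML.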