Making an IE for english.cntv.cn #879

Closed
yasoob opened this issue Jun 8, 2013 · 3 comments

@yasoob (Contributor) commented Jun 8, 2013

Hi, sorry to disturb you again. I am writing an IE (InfoExtractor) for english.cntv.cn and have run into a small problem 😅. My test code is:

import requests
import re
import sys
import json

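def get_url(url):
    # Escape every literal dot so the pattern only matches english.cntv.cn
    _VALID_URL = r'(?:http://)?(?:www\.)?english\.cntv\.cn/program/([^/]+)/([^/]+)/([^/]+)\.shtml'
    mobj = re.match(_VALID_URL, url)
    if mobj is None:
        print(u'Invalid URL: %s' % url)
        return  # stop here instead of requesting an invalid page
    print("Opening main page")
    html = requests.get(url)
    # The page passes the video ID to the Flash player via fo.addVariable("videoCenterId", ...)
    video_id_m = re.search(r'fo\.addVariable\("videoCenterId","(.*?)"\);fo\.addVariable\("channelId",channelId_code\)', html.text)
    if video_id_m is None:
        print(u'Unable to extract video ID')
        return
    # Guard against pages without an "Editor:" line; strip separator pipes and whitespace
    editor_m = re.search(r'<b>Editor:</b>(.*)<b>Source:', html.text)
    editor = editor_m.group(1).strip('|').strip() if editor_m else None
    print("Opening info page")
    info = json.loads(requests.get('http://vdn.apps.cntv.cn/api/getHttpVideoInfo.do?pid=' + video_id_m.group(1)).text)
    title = info['title']
    video = info['video']
    # Prefer 'chapters2' when present, otherwise fall back to 'chapters'
    chapters = video['chapters2'] if 'chapters2' in video else video['chapters']
    # One URL per chapter entry
    urls = [x['url'] for x in chapters]
    ext = "mp4"
    print({'url':    urls,
           'title':  title,
           'ext':    ext,
           'editor': editor,
    })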

if __name__ == '__main__':
    url = sys.argv[-1]
    get_url(url)

The problem is that the urls variable does not always contain a single URL; it depends on which page you open. For example, http://english.cntv.cn/program/china24/20130607/106071.shtml gives only one URL, but http://english.cntv.cn/program/newshour/20120307/118190.shtml gives five. What should we do here? Any suggestions?

@jaimeMF (Collaborator) commented Jun 27, 2013

Try to pick the one with the best quality.

@yasoob (Contributor, Author) commented Jun 29, 2013

I think the videos are different parts. Take a look at the URLs:

http://v.cctv.com/flash/mp4video19/TMS/2012/03/07/dd4c11e583c34d5d89a2b1fde0c4614c_h264818000nero_aac32-1.mp4
http://v.cctv.com/flash/mp4video19/TMS/2012/03/07/dd4c11e583c34d5d89a2b1fde0c4614c_h264818000nero_aac32-2.mp4
http://v.cctv.com/flash/mp4video19/TMS/2012/03/07/dd4c11e583c34d5d89a2b1fde0c4614c_h264818000nero_aac32-3.mp4
http://v.cctv.com/flash/mp4video19/TMS/2012/03/07/dd4c11e583c34d5d89a2b1fde0c4614c_h264818000nero_aac32-4.mp4
http://v.cctv.com/flash/mp4video19/TMS/2012/03/07/dd4c11e583c34d5d89a2b1fde0c4614c_h264818000nero_aac32-5.mp4

Only the trailing number increments (-1 through -5). What do you think? Also, when I open all five URLs, each one contains a different video.
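Because the five URLs differ only in the trailing -N before .mp4, a small check can confirm they belong to one recording and put them in playback order. This is a minimal sketch, assuming the "-N.mp4" naming seen in these five URLs also holds for other programmes; the helper name split_parts is hypothetical.

```python
import re

# Assumption: part files end in "-<n>.mp4", as in the five URLs above.
_PART_RE = re.compile(r'^(?P<base>.+)-(?P<num>\d+)\.mp4$')


def split_parts(urls):
    """Return (base, [(part_number, url), ...]) sorted by part number.

    Returns None if any URL lacks the "-<n>.mp4" suffix or if the URLs
    do not share one base, i.e. they are not parts of one recording.
    """
    parts = []
    bases = set()
    for url in urls:
        m = _PART_RE.match(url)
        if m is None:
            return None
        bases.add(m.group('base'))
        parts.append((int(m.group('num')), url))
    if len(bases) != 1:
        return None
    # Sort numerically so that part 10 would come after part 9
    parts.sort()
    return bases.pop(), parts
```

Sorting on the parsed integer rather than the raw string avoids "-10" landing between "-1" and "-2" if a programme ever has ten or more parts.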

@jaimeMF (Collaborator) commented Jun 29, 2013

Return 5 info_dicts, one for each URL; it seems the video is split into different parts. It may be better to return a playlist, but I'm not sure.
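Both options can be sketched with one helper. This is an illustration under stated assumptions, not the extractor that was eventually merged: the helper name build_results, the "_partN" ID suffix and the " (Part N)" title suffix are hypothetical, video_id would be the videoCenterId extracted in the script above, and the playlist dict is modelled on youtube-dl's "_type": "playlist" result with an "entries" list.

```python
def build_results(video_id, title, urls, ext='mp4', as_playlist=False):
    """Turn the chapter URLs into result dicts (hypothetical helper).

    A single URL yields one plain info dict. Several URLs yield one info
    dict per part, optionally wrapped in a playlist dict.
    """
    if len(urls) == 1:
        return {'id': video_id, 'url': urls[0], 'title': title, 'ext': ext}
    entries = [
        {
            # Assumed ID/title scheme: suffix the 1-based part number
            'id': '%s_part%d' % (video_id, n),
            'url': url,
            'title': '%s (Part %d)' % (title, n),
            'ext': ext,
        }
        for n, url in enumerate(urls, start=1)
    ]
    if not as_playlist:
        return entries
    # Playlist shape modelled on youtube-dl's '_type': 'playlist' results
    return {'_type': 'playlist', 'id': video_id, 'title': title, 'entries': entries}
```

Returning a playlist keeps the parts grouped under the original page's title, while the flat list of info dicts is simpler if each part should be treated as an independent download.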

@dstftw dstftw closed this in ce7ccb1 Jan 2, 2017