Youtube-dl carries over 'Referer' header for subsequent requests #8778

Closed
kotofond opened this issue Mar 6, 2016 · 5 comments
Labels
bug

Comments

@kotofond (Contributor) commented Mar 6, 2016

I use youtube-dl in a python program which reuses YoutubeDL instance for downloading many videos. The issue I've encountered can be reproduced using the following code:

import youtube_dl

# One YoutubeDL instance reused for two different sites.
downloader = youtube_dl.YoutubeDL()
vimeo_info = downloader.extract_info('https://vimeo.com/149274392', download=False)
tudou_info = downloader.extract_info('http://www.tudou.com/programs/view/ayXy8TTcG0M/', download=False)

The output is as follows:

[vimeo] 149274392: Downloading webpage
[vimeo] 149274392: Extracting information
[vimeo] 149274392: Downloading webpage
[vimeo] 149274392: Downloading JSON metadata
[vimeo] 149274392: Downloading m3u8 information
[tudou] ayXy8TTcG0M: Downloading JSON metadata
[tudou] ayXy8TTcG0M: found 26 parts
[tudou] 400753010: Opening the info XML page

Traceback (most recent call last):
  File "C:\Python35\lib\site-packages\django\core\handlers\base.py", line 149, in get_response
    response = self.process_exception_by_middleware(e, request)
  File "C:\Python35\lib\site-packages\django\core\handlers\base.py", line 147, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "D:/Python projects/getvideo/getvideo\getvideo\views.py", line 175, in search
    tudou_info = downloader.extract_info('http://www.tudou.com/programs/view/ayXy8TTcG0M/', download=False)
  File "C:\Python35\lib\site-packages\youtube_dl\YoutubeDL.py", line 666, in extract_info
    ie_result = ie.extract(url)
  File "C:\Python35\lib\site-packages\youtube_dl\extractor\common.py", line 316, in extract
    return self._real_extract(url)
  File "C:\Python35\lib\site-packages\youtube_dl\extractor\tudou.py", line 85, in _real_extract
    ext = (final_url.split('?')[0]).split('.')[-1]
AttributeError: 'NoneType' object has no attribute 'split'

However the following code works OK:

import youtube_dl

downloader = youtube_dl.YoutubeDL()
tudou_info = downloader.extract_info('http://www.tudou.com/programs/view/ayXy8TTcG0M/', download=False)

With the help of tcpdump I found that when youtube-dl makes the tudou request after the vimeo one, it sends a Referer HTTP header left over from the vimeo extraction:

GET /f?id=400753010&hd3 HTTP/1.1
Host: v2.tudou.com
Accept-Language: en-us,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Connection: close
Accept-Encoding: gzip, deflate
Referer: https://player.vimeo.com/video/151382011?wmode=opaque
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/44.0 (Chrome)

which makes the tudou website reply with <e errno='4' error='site is forbidden' tk='75028837309588150240200081' /><!--pageview_candidate-->, which in turn crashes youtube-dl.
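To illustrate why that reply ends in AttributeError rather than a readable error, here is a minimal sketch. It assumes the extractor takes the text of the XML root element as the media URL, which is a guess based on the traceback and not confirmed here. The function name parse_info_xml and the example.com URL are hypothetical and not part of youtube-dl.

```python
import xml.etree.ElementTree as ET


def parse_info_xml(xml_text):
    """Return the media URL carried as the root element's text.

    Hypothetical helper: raises ValueError with the server's own message
    instead of letting a None value reach a later .split() call.
    """
    root = ET.fromstring(xml_text)
    errno = root.get('errno')
    if errno is not None:
        # The error reply is an empty element, so root.text is None.
        raise ValueError('server refused the request: errno=%s, error=%s'
                         % (errno, root.get('error')))
    if not root.text:
        raise ValueError('info XML contains no media URL')
    return root.text.strip()


if __name__ == '__main__':
    reply = ("<e errno='4' error='site is forbidden' "
             "tk='75028837309588150240200081' /><!--pageview_candidate-->")
    try:
        parse_info_xml(reply)
    except ValueError as exc:
        print(exc)
```

With a check like this the failure would name the server-side refusal, which points at the request headers much sooner than the 'NoneType' message does.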

So I was wondering if it's possible to make youtube-dl start "from scratch" and not carry over unnecessary headers when making new requests?

Thanks!

@yan12125 (Collaborator) commented Mar 6, 2016

It's a bug in VimeoIE. VimeoIE modifies the global variable std_headers, which it shouldn't. Another example is `SafariBaseIE`.
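A minimal sketch of this class of bug, using a stand-in dict and not the real youtube-dl code: STD_HEADERS, headers_leaky and headers_isolated are hypothetical names chosen for illustration. Writing through an alias of a shared dict leaks the Referer into every later request, while writing into a copy does not.

```python
# Stand-in for a module-level dict of default request headers (hypothetical).
STD_HEADERS = {
    'User-Agent': 'Mozilla/5.0',
    'Accept-Language': 'en-us,en;q=0.5',
}


def headers_leaky(referer):
    """Buggy pattern: 'headers' is the shared dict itself, not a copy."""
    headers = STD_HEADERS
    headers['Referer'] = referer  # mutates the global defaults
    return headers


def headers_isolated(referer):
    """Safer pattern: copy the defaults, then add per-request headers."""
    headers = dict(STD_HEADERS)
    headers['Referer'] = referer  # only the copy is changed
    return headers


if __name__ == '__main__':
    headers_isolated('https://player.vimeo.com/video/151382011')
    print('Referer' in STD_HEADERS)  # False: the defaults are untouched
    headers_leaky('https://player.vimeo.com/video/151382011')
    print('Referer' in STD_HEADERS)  # True: later requests inherit it
```

Since the defaults in this sketch live at module level, creating a second object that reads them would not help once they have been mutated. That is consistent with the report above, where only a fresh run worked.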

@yan12125 yan12125 added the bug label Mar 6, 2016
@yan12125 yan12125 closed this in 0f56a4b Mar 6, 2016
@yan12125 (Collaborator) commented Mar 6, 2016

Thanks for the report. Consecutive requests should work again. By the way, you should use a with statement so that YoutubeDL can do some cleanup work:

import youtube_dl

with youtube_dl.YoutubeDL() as downloader:
    vimeo_info = downloader.extract_info('https://vimeo.com/149274392', download=False)
    tudou_info = downloader.extract_info('http://www.tudou.com/programs/view/ayXy8TTcG0M/', download=False)

yan12125 added a commit that referenced this issue Mar 6, 2016

@kotofond (Contributor, Author) commented Mar 6, 2016

Many thanks for the prompt fix!

@goonbag commented Apr 12, 2016

This issue seems related, so I will post here rather than opening a new bug. The Tudou module does not seem to work in batch mode. I am using youtube-dl at the command line with a batch txt file as input; all videos listed in the txt file are from tudou:

$ youtube-dl -a batch.txt
[tudou] B0zXrl9qXZY: Downloading JSON metadata
[download] Downloading playlist: June 4 - Voice of Korea - Part 2
[tudou] 114473647: Opening the info XML page
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/bin/youtube-dl/__main__.py", line 19, in <module>
  File "/usr/local/bin/youtube-dl/youtube_dl/__init__.py", line 419, in main
  File "/usr/local/bin/youtube-dl/youtube_dl/__init__.py", line 409, in _real_main
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 1725, in download
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 680, in extract_info
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 799, in process_ie_result
  File "/usr/local/bin/youtube-dl/youtube_dl/utils.py", line 1706, in getslice
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/tudou.py", line 110, in part_func
AttributeError: 'NoneType' object has no attribute 'split'

Please let me know if there is anything I can do to provide more info. Cheers

@yan12125 (Collaborator) commented Apr 12, 2016

Open a new issue and post the full verbose output of the following command:

youtube-dl -v -a batch.txt

And the content of batch.txt
