
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb6 in position 147: invalid start byte #8070

Closed
matiasw opened this issue Dec 31, 2015 · 14 comments

@matiasw matiasw commented Dec 31, 2015

Thanks for helping create youtube-dl.

There seems to be a problem with Unicode handling. Running youtube-dl [any video], or building the latest git master HEAD (d5f6429) with ./setup.py build, fails with:

Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3138, in <module>
    @_call_aside
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3124, in _call_aside
    f(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3151, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 652, in _build_master
    ws = cls()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 645, in __init__
    self.add_entry(entry)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 701, in add_entry
    for dist in find_distributions(entry, True):
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2139, in find_on_path
    path_item, entry, metadata, precedence=DEVELOP_DIST
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2521, in from_location
    py_version=py_version, platform=platform, **kw
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2835, in _reload_version
    md_version = _version_from_file(self._get_metadata(self.PKG_INFO))
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2486, in _version_from_file
    line = next(iter(version_lines), '')
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2654, in _get_metadata
    for line in self.get_metadata_lines(name):
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2030, in get_metadata_lines
    return yield_lines(self.get_metadata(name))
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2025, in get_metadata
    metadata = f.read()
  File "/usr/lib/python2.7/codecs.py", line 314, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb6 in position 147: invalid start byte

The line in question is:
from pkg_resources import load_entry_point

The installed youtube-dl version (/usr/local/bin/youtube-dl) is 2015.12.18.
As I understand it, 0xb6 is indeed invalid as a UTF-8 start byte. Where it comes from is beyond me at the moment. As a second (continuation) byte, it appears in, e.g., ¶ (U+00B6, encoded as 0xc2 0xb6) and ö (U+00F6, encoded as 0xc3 0xb6).
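For illustration, a small Python 3 check of the byte arithmetic above (not part of the original report; the helper name classify_utf8_byte is hypothetical). Bytes of the form 10xxxxxx are UTF-8 continuation bytes, so 0xb6 can only follow a lead byte. In Latin-1, the same single byte is ¶, which is one common way a stray 0xb6 ends up in a file.

```python
def classify_utf8_byte(b):
    """Classify a byte value by its UTF-8 bit pattern only.

    Overlong or out-of-range lead bytes (0xC0, 0xC1, 0xF5-0xF7) are not
    singled out; this only inspects the high bits.
    """
    if b < 0x80:
        return "ascii"          # 0xxxxxxx
    if (b & 0xC0) == 0x80:
        return "continuation"   # 10xxxxxx: never valid as a start byte
    if (b & 0xE0) == 0xC0:
        return "lead-2"         # 110xxxxx: starts a 2-byte sequence
    if (b & 0xF0) == 0xE0:
        return "lead-3"         # 1110xxxx: starts a 3-byte sequence
    if (b & 0xF8) == 0xF0:
        return "lead-4"         # 11110xxx: starts a 4-byte sequence
    return "invalid"            # 11111xxx


print(classify_utf8_byte(0xB6))                  # continuation
print("¶".encode("utf-8").hex())                 # c2b6
print("ö".encode("utf-8").hex())                 # c3b6
print(b"\xb6".decode("latin-1") == "\u00b6")     # True: one byte in Latin-1

# Decoding the lone byte as UTF-8 gives the same reason as the traceback.
utf8_error_reason = None
try:
    b"\xb6".decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_error_reason = exc.reason
print(utf8_error_reason)                         # invalid start byte
```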
So far, I've found this presentation: http://nedbatchelder.com/text/unipain.html
and this post from someone with roughly the same problem: http://www.gossamer-threads.com/lists/engine?do=post_view_flat;post=1070061;page=1;mh=-1;list=python;sb=post_latest_reply;so=ASC

Collaborator

@yan12125 yan12125 commented Dec 31, 2015

The installed youtube-dl version (/usr/local/bin/youtube-dl) is 2015.12.18.

FYI: The latest version of youtube-dl is 2015.12.29. You may want to check whether there are multiple versions on your device.

Collaborator

@yan12125 yan12125 commented Dec 31, 2015

Can you paste the output of the following command?

xxd /usr/lib/python2.7/dist-packages/youtube_dl-2015.12.29-py2.7.egg/EGG-INFO/PKG-INFO

The actual path may be different.
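If xxd is not at hand, a small Python 3 helper along these lines (hypothetical, not from the thread) reports the offset of the first byte that is not valid UTF-8, plus the neighbouring bytes in hex. When a file is decoded in one go, that offset should normally match the "position" in the UnicodeDecodeError message.

```python
def first_invalid_utf8(data, context=16):
    """Return (offset, nearby_bytes) for the first invalid UTF-8 byte, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        lo = max(exc.start - context, 0)
        return exc.start, data[lo:exc.start + context]
    return None


def report(path):
    """Print where (if anywhere) the file at `path` stops being valid UTF-8."""
    with open(path, "rb") as f:
        result = first_invalid_utf8(f.read())
    if result is None:
        print("%s: valid UTF-8" % path)
        return
    offset, nearby = result
    print("%s: invalid byte at offset %d (0x%x)" % (path, offset, offset))
    print(" ".join("%02x" % b for b in nearby))
```

Calling report() with the PKG-INFO path (whatever it actually is on your system) prints either "valid UTF-8" or the offending offset with its surrounding bytes.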

Author

@matiasw matiasw commented Dec 31, 2015

OK, there was indeed another version installed on my machine, 2015.11.27.1-1. I removed it with apt, but since I can't build the current version, I had to reinstall 2015.11.27.1 from the repositories. Here's the xxd output of /usr/lib/python2.7/dist-packages/youtube_dl-2015.11.27.1.egg-info/PKG-INFO:
00000000: 4d65 7461 6461 7461 2d56 6572 7369 6f6e Metadata-Version
00000010: 3a20 312e 310a 4e61 6d65 3a20 796f 7574 : 1.1.Name: yout
00000020: 7562 652d 646c 0a56 6572 7369 6f6e 3a20 ube-dl.Version:
00000030: 3230 3135 2e31 312e 3237 2e31 0a53 756d 2015.11.27.1.Sum
00000040: 6d61 7279 3a20 596f 7554 7562 6520 7669 mary: YouTube vi
00000050: 6465 6f20 646f 776e 6c6f 6164 6572 0a48 deo downloader.H
00000060: 6f6d 652d 7061 6765 3a20 6874 7470 733a ome-page: https:
00000070: 2f2f 6769 7468 7562 2e63 6f6d 2f72 6733 //github.com/rg3
00000080: 2f79 6f75 7475 6265 2d64 6c0a 4175 7468 /youtube-dl.Auth
00000090: 6f72 3a20 5068 696c 6970 7020 4861 6765 or: Philipp Hage
000000a0: 6d65 6973 7465 720a 4175 7468 6f72 2d65 meister.Author-e
000000b0: 6d61 696c 3a20 7068 6968 6167 4070 6869 mail: phihag@phi
000000c0: 6861 672e 6465 0a4c 6963 656e 7365 3a20 hag.de.License:
000000d0: 554e 4b4e 4f57 4e0a 4465 7363 7269 7074 UNKNOWN.Descript
000000e0: 696f 6e3a 2053 6d61 6c6c 2063 6f6d 6d61 ion: Small comma
000000f0: 6e64 2d6c 696e 6520 7072 6f67 7261 6d20 nd-line program
00000100: 746f 2064 6f77 6e6c 6f61 6420 7669 6465 to download vide
00000110: 6f73 2066 726f 6d20 596f 7554 7562 652e os from YouTube.
00000120: 636f 6d20 616e 6420 6f74 6865 7220 7669 com and other vi
00000130: 6465 6f20 7369 7465 732e 0a50 6c61 7466 deo sites..Platf
00000140: 6f72 6d3a 2055 4e4b 4e4f 574e 0a43 6c61 orm: UNKNOWN.Cla
00000150: 7373 6966 6965 723a 2054 6f70 6963 203a ssifier: Topic :
00000160: 3a20 4d75 6c74 696d 6564 6961 203a 3a20 : Multimedia ::
00000170: 5669 6465 6f0a 436c 6173 7369 6669 6572 Video.Classifier
00000180: 3a20 4465 7665 6c6f 706d 656e 7420 5374 : Development St
00000190: 6174 7573 203a 3a20 3520 2d20 5072 6f64 atus :: 5 - Prod
000001a0: 7563 7469 6f6e 2f53 7461 626c 650a 436c uction/Stable.Cl
000001b0: 6173 7369 6669 6572 3a20 456e 7669 726f assifier: Enviro
000001c0: 6e6d 656e 7420 3a3a 2043 6f6e 736f 6c65 nment :: Console
000001d0: 0a43 6c61 7373 6966 6965 723a 204c 6963 .Classifier: Lic
000001e0: 656e 7365 203a 3a20 5075 626c 6963 2044 ense :: Public D
000001f0: 6f6d 6169 6e0a 436c 6173 7369 6669 6572 omain.Classifier
00000200: 3a20 5072 6f67 7261 6d6d 696e 6720 4c61 : Programming La
00000210: 6e67 7561 6765 203a 3a20 5079 7468 6f6e nguage :: Python
00000220: 203a 3a20 322e 360a 436c 6173 7369 6669 :: 2.6.Classifi
00000230: 6572 3a20 5072 6f67 7261 6d6d 696e 6720 er: Programming
00000240: 4c61 6e67 7561 6765 203a 3a20 5079 7468 Language :: Pyth
00000250: 6f6e 203a 3a20 322e 370a 436c 6173 7369 on :: 2.7.Classi
00000260: 6669 6572 3a20 5072 6f67 7261 6d6d 696e fier: Programmin
00000270: 6720 4c61 6e67 7561 6765 203a 3a20 5079 g Language :: Py
00000280: 7468 6f6e 203a 3a20 330a 436c 6173 7369 thon :: 3.Classi
00000290: 6669 6572 3a20 5072 6f67 7261 6d6d 696e fier: Programmin
000002a0: 6720 4c61 6e67 7561 6765 203a 3a20 5079 g Language :: Py
000002b0: 7468 6f6e 203a 3a20 332e 320a 436c 6173 thon :: 3.2.Clas
000002c0: 7369 6669 6572 3a20 5072 6f67 7261 6d6d sifier: Programm
000002d0: 696e 6720 4c61 6e67 7561 6765 203a 3a20 ing Language ::
000002e0: 5079 7468 6f6e 203a 3a20 332e 330a 436c Python :: 3.3.Cl
000002f0: 6173 7369 6669 6572 3a20 5072 6f67 7261 assifier: Progra
00000300: 6d6d 696e 6720 4c61 6e67 7561 6765 203a mming Language :
00000310: 3a20 5079 7468 6f6e 203a 3a20 332e 340a : Python :: 3.4.

Collaborator

@yan12125 yan12125 commented Dec 31, 2015

What's the output if you remove all existing versions, then build and run the latest version? Please also paste the content of /usr/lib/python2.7/dist-packages/youtube_dl-2015.12.29-py2.7.egg/EGG-INFO/PKG-INFO after you've built and installed the latest version.

Author

@matiasw matiasw commented Jan 1, 2016

Sorry, but I am unable to build the latest version. I get the same error as above. Personally, I find this strange.

Collaborator

@yan12125 yan12125 commented Jan 1, 2016

Could you provide more details about "unable to build the latest version", including the commands you used to build it and the error messages?

Author

@matiasw matiasw commented Jan 1, 2016

./setup.py build
Traceback (most recent call last):
  File "./setup.py", line 11, in <module>
    from setuptools import setup
  File "/usr/lib/python2.7/dist-packages/setuptools/__init__.py", line 12, in <module>
    from setuptools.extension import Extension
  File "/usr/lib/python2.7/dist-packages/setuptools/extension.py", line 8, in <module>
    from .dist import _get_unpatched
  File "/usr/lib/python2.7/dist-packages/setuptools/dist.py", line 19, in <module>
    import pkg_resources
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3138, in <module>
    @_call_aside
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3124, in _call_aside
    f(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3151, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 652, in _build_master
    ws = cls()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 645, in __init__
    self.add_entry(entry)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 701, in add_entry
    for dist in find_distributions(entry, True):
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2139, in find_on_path
    path_item, entry, metadata, precedence=DEVELOP_DIST
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2521, in from_location
    py_version=py_version, platform=platform, **kw
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2835, in _reload_version
    md_version = _version_from_file(self._get_metadata(self.PKG_INFO))
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2486, in _version_from_file
    line = next(iter(version_lines), '')
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2654, in _get_metadata
    for line in self.get_metadata_lines(name):
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2030, in get_metadata_lines
    return yield_lines(self.get_metadata(name))
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2025, in get_metadata
    metadata = f.read()
  File "/usr/lib/python2.7/codecs.py", line 314, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb6 in position 147: invalid start byte

Collaborator

@yan12125 yan12125 commented Jan 1, 2016

In /usr/lib/python2.7/dist-packages/pkg_resources/__init__.py around line 2022:

    def get_metadata(self, name):
        if name=='PKG-INFO':
            with io.open(self.path, encoding='utf-8') as f:
                metadata = f.read()
            return metadata
        raise KeyError("No metadata except PKG-INFO is available")
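To see the failure without touching pkg_resources, here is a standalone Python 3 sketch that repeats the decoding step of get_metadata quoted above (io.open(..., encoding='utf-8') followed by read()) on a throwaway file. The file contents and the function names are made up: a PKG-INFO-like text containing the byte 0xb6, which is how ¶ is stored in Latin-1.

```python
import io
import os
import tempfile


def read_pkg_info(path):
    # Same decoding step as FileMetadata.get_metadata: the whole file is
    # read as UTF-8, with no fallback encoding and no error handler.
    with io.open(path, encoding="utf-8") as f:
        return f.read()


def demo():
    """Write a made-up PKG-INFO containing 0xb6 and try to read it."""
    fd, path = tempfile.mkstemp(suffix=".egg-info")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"Metadata-Version: 1.1\nName: demo\nSummary: \xb6 demo\n")
        try:
            read_pkg_info(path)
        except UnicodeDecodeError as exc:
            return exc.reason
        return None
    finally:
        os.remove(path)


print(demo())  # invalid start byte
```

Because pkg_resources reads every such file on sys.path while it is being imported, one badly encoded metadata file anywhere is enough to break any program that imports it.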

Could you add the line print(self.path) right before with io.open(self.path, ...)? Like this:

    def get_metadata(self, name):
        if name=='PKG-INFO':
            print(self.path)
            with io.open(self.path, encoding='utf-8') as f:
                metadata = f.read()
            return metadata
        raise KeyError("No metadata except PKG-INFO is available")

Then run the build command again.
WARNING: Changing system files is risky. Please back up every file you're going to change.
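Given that warning, a read-only check may be preferable. The hypothetical sketch below walks the directories on sys.path and tries to decode every *.egg-info entry as UTF-8: a .egg-info file is checked directly (as FileMetadata does with self.path), and a .egg-info directory is checked via its PKG-INFO. It only approximates what pkg_resources scans. It avoids Python-3-only syntax, so it may also run under Python 2.7 (an assumption); run it with the same interpreter that fails so that sys.path matches.

```python
import os
import sys


def undecodable_metadata(paths=None):
    """Yield (metadata_path, byte_offset) for *.egg-info metadata that is not valid UTF-8."""
    if paths is None:
        paths = sys.path
    for entry in paths:
        if not os.path.isdir(entry):
            continue  # skips "", zip files and missing directories
        try:
            names = sorted(os.listdir(entry))
        except OSError:
            continue  # unreadable directory
        for name in names:
            if not name.lower().endswith(".egg-info"):
                continue
            target = os.path.join(entry, name)
            if os.path.isdir(target):
                # Directory-style metadata keeps its data in PKG-INFO.
                target = os.path.join(target, "PKG-INFO")
            if not os.path.isfile(target):
                continue
            with open(target, "rb") as f:
                data = f.read()
            try:
                data.decode("utf-8")
            except UnicodeDecodeError as exc:
                yield target, exc.start
```

Iterating over undecodable_metadata() and printing each pair lists every offending file at once, instead of stopping at the first one the way the import does.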

Author

@matiasw matiasw commented Jan 2, 2016

./setup.py build
Traceback (most recent call last):
  File "./setup.py", line 11, in <module>
    from setuptools import setup
  File "/usr/lib/python2.7/dist-packages/setuptools/__init__.py", line 12, in <module>
    from setuptools.extension import Extension
  File "/usr/lib/python2.7/dist-packages/setuptools/extension.py", line 8, in <module>
    from .dist import _get_unpatched
  File "/usr/lib/python2.7/dist-packages/setuptools/dist.py", line 19, in <module>
    import pkg_resources
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2004, in <module>
    class FileMetadata(EmptyProvider):
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2022, in FileMetadata
    print(self.path)
NameError: name 'self' is not defined
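The NameError comes from where the line ended up: at class level, alongside the method definitions, rather than inside get_metadata. A class body runs once, when the class is defined (here, while pkg_resources is being imported), and at that point there is no instance and therefore no self. A minimal Python 3 reproduction, with made-up class and function names:

```python
def define_class_with_misplaced_print():
    class FileMetadataLike(object):
        # Statements directly in the class body execute once, at class
        # definition time. There is no instance here, so `self` is unbound.
        print(self.path)

        def get_metadata(self, name):
            # Inside a method, `self` is the instance, so this would work.
            print(self.path)

    return FileMetadataLike


error_message = None
try:
    define_class_with_misplaced_print()
except NameError as exc:
    error_message = str(exc)
print(error_message)  # name 'self' is not defined
```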

Collaborator

@yan12125 yan12125 commented Jan 2, 2016

Could you paste roughly lines 2000 to 2050 of the original /usr/lib/python2.7/dist-packages/pkg_resources/__init__.py?

Author

@matiasw matiasw commented Jan 2, 2016

register_loader_type(zipimport.zipimporter, ZipProvider)

class FileMetadata(EmptyProvider):
    """Metadata handler for standalone PKG-INFO files

    Usage::

        metadata = FileMetadata("/path/to/PKG-INFO")

    This provider rejects all data and metadata requests except for PKG-INFO,
    which is treated as existing, and will be the contents of the file at
    the provided location.
    """

    def __init__(self, path):
        self.path = path

    def has_metadata(self, name):
        return name=='PKG-INFO' and os.path.isfile(self.path)

    def get_metadata(self, name):
        if name=='PKG-INFO':
            with io.open(self.path, encoding='utf-8') as f:
                metadata = f.read()
            return metadata
        raise KeyError("No metadata except PKG-INFO is available")

    def get_metadata_lines(self, name):
        return yield_lines(self.get_metadata(name))

class PathMetadata(DefaultProvider):
    """Metadata provider for egg directories

    Usage::

        # Development eggs:

        egg_info = "/path/to/PackageName.egg-info"
        base_dir = os.path.dirname(egg_info)
        metadata = PathMetadata(base_dir, egg_info)
        dist_name = os.path.splitext(os.path.basename(egg_info))[0]
        dist = Distribution(basedir, project_name=dist_name, metadata=metadata)

        # Unpacked egg directories:

        egg_path = "/path/to/PackageName-ver-pyver-etc.egg"
        metadata = PathMetadata(egg_path, os.path.join(egg_path,'EGG-INFO'))
        dist = Distribution.from_filename(egg_path, metadata=metadata)
    """

Collaborator

@yan12125 yan12125 commented Jan 2, 2016

It seems you've added it on the wrong line. The original is:

def get_metadata(self, name):
    if name=='PKG-INFO':
        with io.open(self.path, encoding='utf-8') as f:
            metadata = f.read()
        return metadata
    raise KeyError("No metadata except PKG-INFO is available")

Change it to:

def get_metadata(self, name):
    if name=='PKG-INFO':
        print(self.path)
        with io.open(self.path, encoding='utf-8') as f:
            metadata = f.read()
        return metadata
    raise KeyError("No metadata except PKG-INFO is available")

Note that the print line must go immediately before the with io.open(... line.

Author

@matiasw matiasw commented Jan 3, 2016

Oops, sorry.
Here's the output with that line added:
./setup.py build
/usr/lib/python2.7/argparse.egg-info
/usr/lib/python2.7/wsgiref.egg-info
/usr/lib/python2.7/lib-dynload/Python-2.7.egg-info
/usr/local/lib/python2.7/dist-packages/youtube_upload-0.8.0.egg-info
/usr/local/lib/python2.7/dist-packages/escpos-1.0.7.egg-info
/usr/local/lib/python2.7/dist-packages/calendar_indicator-0.3.1.egg-info
/usr/lib/python2.7/dist-packages/pyxdg-0.25.egg-info
/usr/lib/python2.7/dist-packages/yum_metadata_parser-1.1.4.egg-info
/usr/lib/python2.7/dist-packages/mercurial-3.5.2.egg-info
/usr/lib/python2.7/dist-packages/vboxapi-1.0.egg-info
/usr/lib/python2.7/dist-packages/wxPython_common-3.0.2.0.egg-info
/usr/lib/python2.7/dist-packages/bzr-2.7.0dev1.egg-info
/usr/lib/python2.7/dist-packages/pycurl-7.19.5.3.egg-info
/usr/lib/python2.7/dist-packages/lhafile-0.1.0fs4.egg-info
/usr/lib/python2.7/dist-packages/arandr-0.1.8.egg-info
/usr/lib/python2.7/dist-packages/rpm_python-4.12.0.1.egg-info
/usr/lib/python2.7/dist-packages/python_debianbts-2.6.0.egg-info
/usr/lib/python2.7/dist-packages/qbzr-0.23.1.egg-info
/usr/lib/python2.7/dist-packages/SecretStorage-2.1.2.egg-info
/usr/lib/python2.7/dist-packages/ecdsa-0.13.egg-info
/usr/lib/python2.7/dist-packages/docutils-0.12.egg-info
/usr/lib/python2.7/dist-packages/simplejson-3.7.3.egg-info
/usr/lib/python2.7/dist-packages/python_apt-1.1.0.b1.egg-info
/usr/lib/python2.7/dist-packages/scapy-2.2.0.egg-info
/usr/lib/python2.7/dist-packages/httplib2-0.9.1.egg-info
/usr/lib/python2.7/dist-packages/BzrTools-2.6.0.egg-info
/usr/lib/python2.7/dist-packages/gdata-2.0.18.egg-info
/usr/lib/python2.7/dist-packages/zenmap-7.00.egg-info
/usr/lib/python2.7/dist-packages/pycrypto-2.6.1.egg-info
/usr/lib/python2.7/dist-packages/pygobject-3.18.2.egg-info
/usr/lib/python2.7/dist-packages/pysqlite-1.0.1.egg-info
/usr/lib/python2.7/dist-packages/pyrit-0.4.0.egg-info
/usr/lib/python2.7/dist-packages/python_xlib-0.14.egg-info
/usr/lib/python2.7/dist-packages/numpy-1.9.2.egg-info
/usr/lib/python2.7/dist-packages/apt_xapian_index-0.47.egg-info
/usr/lib/python2.7/dist-packages/pygpgme-0.3.egg-info
/usr/lib/python2.7/dist-packages/pygame-1.9.1release.egg-info
/usr/lib/python2.7/dist-packages/bzr_builddeb-2.8.6.egg-info
/usr/lib/python2.7/dist-packages/roman-2.0.0.egg-info
/usr/lib/pymodules/python2.7/rpl-1.5.5.egg-info
Traceback (most recent call last):
  File "./setup.py", line 11, in <module>
    from setuptools import setup
  File "/usr/lib/python2.7/dist-packages/setuptools/__init__.py", line 12, in <module>
    from setuptools.extension import Extension
  File "/usr/lib/python2.7/dist-packages/setuptools/extension.py", line 8, in <module>
    from .dist import _get_unpatched
  File "/usr/lib/python2.7/dist-packages/setuptools/dist.py", line 19, in <module>
    import pkg_resources
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3139, in <module>
    @_call_aside
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3125, in _call_aside
    f(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 3152, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 652, in _build_master
    ws = cls()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 645, in __init__
    self.add_entry(entry)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 701, in add_entry
    for dist in find_distributions(entry, True):
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2140, in find_on_path
    path_item, entry, metadata, precedence=DEVELOP_DIST
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2522, in from_location
    py_version=py_version, platform=platform, **kw
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2836, in _reload_version
    md_version = _version_from_file(self._get_metadata(self.PKG_INFO))
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2487, in _version_from_file
    line = next(iter(version_lines), '')
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2655, in _get_metadata
    for line in self.get_metadata_lines(name):
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2031, in get_metadata_lines
    return yield_lines(self.get_metadata(name))
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2026, in get_metadata
    metadata = f.read()
  File "/usr/lib/python2.7/codecs.py", line 314, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb6 in position 147: invalid start byte

Collaborator

@yan12125 yan12125 commented Jan 3, 2016

Thanks for helping debug this. The problem is /usr/lib/pymodules/python2.7/rpl-1.5.5.egg-info, not youtube-dl: it is the last path printed before the traceback, so it is the file that failed to decode. As far as I can see from http://sourceforge.net/projects/rpl/files/rpl/rpl-1.5.5/, the author of rpl uses non-UTF-8 characters in its setup.py. You may want to uninstall it and contact its author.
Also, don't forget to restore the original /usr/lib/python2.7/dist-packages/pkg_resources/__init__.py.

@yan12125 yan12125 closed this Jan 3, 2016