Unexpected UnicodeDecodeError while reading an OSX file comment. #109
Comments
You can’t decode that as UTF-8 because it isn’t. That is a binary property list (bplist). You’ll need to use another library to decode it, such as https://docs.python.org/3/library/plistlib.html
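A minimal sketch of that suggestion, using only the standard library. The helper name `decode_bplist_xattr` is hypothetical, and the sample bytes are built with `plistlib.dumps` as a stand-in for what `xattr.getxattr()` would return for a Finder comment:

```python
import plistlib


def decode_bplist_xattr(raw: bytes):
    """Decode an extended-attribute value that holds a binary property list.

    Hypothetical helper, not part of xattr or plistlib.
    """
    if not raw.startswith(b"bplist00"):
        raise ValueError("value does not look like a binary property list")
    # plistlib.loads detects the binary format from the header.
    return plistlib.loads(raw)


# Stand-in for the bytes that xattr.getxattr() would return for a Finder comment.
raw = plistlib.dumps("10-2: © delpieroo/Depositphotos", fmt=plistlib.FMT_BINARY)

try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    # Non-ASCII strings are stored as UTF-16 inside the plist, so this fails.
    print("not UTF-8 text")

print(decode_bplist_xattr(raw))
```

The header check is a design choice for this sketch: it fails early with a clear message instead of letting the parser guess at arbitrary bytes.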
UPDATE:
comment_str = comment.decode("unicode_escape")
I think that this attempt should be done inside the
@etrepum you should not close this issue since it can be solved directly inside this library.
I think you might change your mind about that if you had more experience working with binary formats and text encodings.
UPDATE:
pip install bplist

comment = xattr.getxattr(
    filepath, "com.apple.metadata:kMDItemFinderComment"
)
comment_str = BPListReader(comment).parse()
Anyway... this could be implemented inside this library to help many people.
The problem here is that the data in these attributes could be encoded in any possible way; there is no universal decoding strategy that could be used. For reading Apple-specific metadata, it would be a good idea to have a separate library that uses xattr to abstract this. xattr is a very low-level library that is only concerned with the direct reading and writing of these attributes.
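To illustrate why that choice has to sit with the caller, here is a best-effort sketch. `best_effort_decode` is a hypothetical helper, and the order of attempts (binary plist, then UTF-8, then raw bytes) is an assumption that will not suit every attribute:

```python
import plistlib


def best_effort_decode(raw: bytes):
    """Guess how an xattr value is encoded (hypothetical helper).

    Returns a decoded plist object, a str, or the original bytes.
    """
    if raw.startswith(b"bplist"):
        try:
            return plistlib.loads(raw)
        except plistlib.InvalidFileException:
            pass  # header looked like a bplist, but the payload was not valid
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw  # unknown encoding: hand the bytes back untouched
```

Because the return type depends on the data, a guess like this belongs in a higher-level wrapper rather than in a library that only reads and writes the raw attribute bytes.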
I understand, thank you for the good explanation!
@fabiocaccamo as @etrepum said, this is a low-level library. For working with Mac metadata, I recommend you look at osxmetadata (disclaimer: I'm the author), which provides direct access to all macOS metadata indexed by Spotlight as well as many other attributes. It does use xattr under the hood for some functions. For example, to read Finder comments:
import osxmetadata
md = osxmetadata.OSXMetaData(filepath)
comment = md.findercomment
# also
comment = md.kMDItemFinderComment
# also, something you cannot do via setting the xattr:
md.findercomment = "My new comment"
You can also directly access the extended attributes (but be aware that the extended attribute is not the source of truth for macOS Spotlight metadata, and changing it won't necessarily update the Spotlight database):
>>> from osxmetadata import *
>>> import plistlib
>>> from plistlib import FMT_BINARY
>>> from functools import partial
>>> md = OSXMetaData("test_file.txt")
>>> md.kMDItemWhereFroms = ["apple.com"]
>>> md.kMDItemWhereFroms
['apple.com']
>>> decode = partial(plistlib.loads, fmt=FMT_BINARY)
>>> encode = partial(plistlib.dumps, fmt=FMT_BINARY)
>>> md.get_xattr("com.apple.metadata:kMDItemWhereFroms")
b'bplist00\xa1\x01Yapple.com\x08\n\x00\x00\x00\x00\x00\x00\x01\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x14'
>>> md.get_xattr("com.apple.metadata:kMDItemWhereFroms", decode=decode)
['apple.com']
>>> md.set_xattr("com.apple.metadata:kMDItemWhereFroms", ["google.com"], encode=encode)
>>> md.get_xattr("com.apple.metadata:kMDItemWhereFroms", decode=decode)
['google.com']
>>> md.remove_xattr("com.apple.metadata:kMDItemWhereFroms")
@etrepum Thank you for your work on xattr, by the way! It has been extremely useful for me.
@RhetTbull thank you very much for pointing me out to
@RhetTbull if you'd like to plug osxmetadata in the xattr README.md I'd be happy to merge it! Making it easier to discover your library seems like a win for everyone.
Hi,
I'm experiencing an unexpected UnicodeDecodeError while reading an OSX file comment:
Output:
b'bplist00o\x10\x1f\x001\x000\x00-\x002\x00:\x00 \x00\xa9\x00 \x00d\x00e\x00l\x00p\x00i\x00e\x00r\x00o\x00o\x00/\x00D\x00e\x00p\x00o\x00s\x00i\x00t\x00p\x00h\x00o\x00t\x00o\x00s\x08\x00\x00\x00\x00\x00\x00\x01\x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00I'
Then I try to decode the output:
But the following error is raised: