MemoryFSHandler.AddFile(textdata) fails silently if textdata contains unicode characters (python 2) #969

Closed
pieleric opened this Issue Aug 22, 2018 · 2 comments


pieleric commented Aug 22, 2018

Operating system: Linux (Ubuntu 16.04)
wxPython version & source: (pypi, self-built, etc.) 4.0.3 (from pypi)
Python version & source: (stock, anaconda, EDM, distro, self-built, etc.) Python 2.7.11 (from distro)

Description of the problem:
I've encountered an issue while migrating code from wxPython 3 to wxPython 4 using "embedded" XRC files (i.e., using pywxrc). My XRC file contains unicode characters. It used to work fine, but no longer does.

When wx.MemoryFSHandler.AddFile(string, textdata) is called with textdata that contains non-Latin-1 characters (aka unicode characters), the call appears to succeed, but reading the file back returns empty content. It used to work with wxPython 3. Note also that this is happening with Python 2. (I couldn't try with Python 3, but I don't expect the same error.)

I'm not sure what the behaviour should be. With wxPython3, it used to basically serialize the string, and reading the "file" back would return an identical object, which seems to me a nice behaviour. At least, if it fails, it should raise an Exception, and not fail silently.

Note that a workaround is to explicitly encode the unicode string and convert it to a bytearray.

See the code below for an example of the behaviour with wxPython 4 (compared to wxPython 3).

# -*- coding: UTF-8 -*-
import wx

fs = wx.FileSystem()
wx.FileSystem.AddHandler(wx.MemoryFSHandler())

print(wx.__version__)

# Case 1: Python 2 byte str holding UTF-8 encoded, non-Latin-1 text
wx.MemoryFSHandler.AddFile("down", "test ↓")
f = fs.OpenFile('memory:down')
print(repr(f.Stream.readline()))
# wx4: ''
# wx3: 'test \xe2\x86\x93'

# Case 2: unicode string that fits in Latin-1
wx.MemoryFSHandler.AddFile("d", u"test É")
f = fs.OpenFile('memory:d')
print(repr(f.Stream.readline()))
# wx4: 'test \xc9'
# wx3: 't\x00\x00\x00e\x00\x00\x00s\x00\x00\x00t\x00\x00\x00 \x00\x00\x00\xc9\x00\x00\x00'

# Case 3: unicode string with a non-Latin-1 character
wx.MemoryFSHandler.AddFile("downu", u"test ↓")
f = fs.OpenFile('memory:downu')
print(repr(f.Stream.readline()))
# wx4: ''
# wx3: 't\x00\x00\x00e\x00\x00\x00s\x00\x00\x00t\x00\x00\x00 \x00\x00\x00\x93!\x00\x00'

# Case 4 (workaround): explicitly encode and pass a bytearray
wx.MemoryFSHandler.AddFile("b", bytearray(u"test ↓".encode("utf-8")))
f = fs.OpenFile('memory:b')
print(repr(f.Stream.readline()))
# wx4: 'test \xe2\x86\x93'
# wx3: 'test \xe2\x86\x93'

# Case 5: byte str that is not valid UTF-8
wx.MemoryFSHandler.AddFile("i", 'test \xe2')
f = fs.OpenFile('memory:i')
print(repr(f.Stream.readline()))
# wx4: UnicodeDecodeError: 'utf8' codec can't decode byte 0xe2 in position 5: unexpected end of data
# wx3: 'test \xe2'
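
For reference, the byte strings quoted in the comments above can be reproduced without wx. The sketch below uses only the Python 3 standard library. It assumes that the wxPython 3 output is the string's UTF-32 little-endian form (it matches byte for byte, but the report does not say so), and it notes that "↓" has no Latin-1 representation, which is at least consistent with the empty wxPython 4 results, although the report does not identify the cause.

```python
# Reproduce the byte strings from the report with the standard library only.
# Assumption (not stated in the report): the wxPython 3 output is UTF-32-LE,
# which would be consistent with a 4-byte wchar_t on Linux.

arrow = u"test \u2193"    # "test ↓"
e_acute = u"test \u00c9"  # "test É"

utf8_arrow = arrow.encode("utf-8")            # matches the bytearray workaround
utf32_arrow = arrow.encode("utf-32-le")       # matches the wx3 output for u"test ↓"
utf32_e_acute = e_acute.encode("utf-32-le")   # matches the wx3 output for u"test É"
latin1_e_acute = e_acute.encode("latin-1")    # matches the wx4 output for u"test É"

# "↓" (U+2193) cannot be encoded as Latin-1 at all.
try:
    arrow.encode("latin-1")
    arrow_fits_latin1 = True
except UnicodeEncodeError:
    arrow_fits_latin1 = False

for value in (utf8_arrow, utf32_arrow, utf32_e_acute, latin1_e_acute):
    print(repr(value))
print("arrow fits in Latin-1:", arrow_fits_latin1)
```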

RobinD42 commented Oct 16, 2018

A few things have contributed to the confusion here: string constants are now unicode in Python 3 instead of 8-bit; wxPython 4 autoconverts all Py2 str/bytes parameters to wxString using only utf-8; and many things in the wxPython 4 wrappers (including AddFile) now use an intermediate type, wxPyBuffer, for converting from Python objects compatible with the new buffer protocol.

Since typical file-like objects in Python 3 should really be reading and writing bytes to be consistent with other file objects, I've made some changes in wx.MemoryFSHandler to work better with the automatic conversions and to ensure that every text item that goes into a virtual file is actually bytes data (or str in Py2). Although it's still a little different from Classic wxPython, I think this change will make it more consistent between Py2 and Py3 and will also align more closely with how you read/write from real Python file objects. The one exception is that it follows wxPython's policy of always using utf-8 when converting from unicode, but I think that's an acceptable limitation. There's also a way around that, at least with Py3, by using a bytearray.
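
As a rough illustration of the "virtual files hold bytes" idea, the sketch below normalizes text to bytes before it would be stored, and shows how an explicit bytearray sidesteps the utf-8 default. It is not the wxPython implementation: `as_file_bytes` is a hypothetical helper, `io.BytesIO` stands in for the virtual file, and no wx call is made.

```python
import io


def as_file_bytes(data, encoding="utf-8"):
    """Hypothetical helper: return the bytes a virtual file should hold.

    Bytes-like input is passed through unchanged; text is encoded
    (utf-8 by default, mirroring the policy described above).
    """
    if isinstance(data, (bytes, bytearray)):
        return bytes(data)
    return data.encode(encoding)


# Text goes in, utf-8 bytes come out, as with a real binary file object.
payload = as_file_bytes(u"test \u2193")   # "test ↓"
stream = io.BytesIO(payload)              # stand-in for the virtual file
line = stream.readline()
print(repr(line))

# To use another encoding, encode explicitly and pass bytes-like data.
# With wxPython installed, a bytearray like this is what the workaround
# from the report hands to wx.MemoryFSHandler.AddFile.
latin1_payload = bytearray(as_file_bytes(u"test \u00c9", encoding="latin-1"))
print(repr(latin1_payload))
```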

With this change, I get the following from a slightly tweaked version of your sample code:

Python: 3.7.0, wxPython: 4.0.4a1
b'test no unicode'
b'test \xe2\x86\x93'
b'test \xc3\x89'
b'test \xe2\x86\x93'
b'test \xe2\x86\x93'
b'test \xc3\xa2'

Python: 2.7.15, wxPython: 4.0.4a1
'test no unicode'
'test \xe2\x86\x93'
'test \xc3\x89'
'test \xe2\x86\x93'
'test \xe2\x86\x93'
Traceback (most recent call last):
  File "tmp/test_memfile.py", line 38, in <module>
    wx.MemoryFSHandler.AddFile("i", 'test \xe2')
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe2 in position 5: unexpected end of data

The last error is due to trying to autoconvert from a Py2 str to a unicode wxString, and doesn't have anything to do with the wx.MemoryFSHandler code.
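
The same failure can be reproduced in plain Python, with no wx involved. This is only a sketch of the decoding step: it assumes the autoconversion behaves like a strict utf-8 decode, which is what the traceback suggests. It also shows why the Python 3 run prints `b'test \xc3\xa2'` for the same literal: there, `'test \xe2'` is already text containing U+00E2, so it is encoded rather than decoded.

```python
# Python 2 case: 'test \xe2' is a byte string. 0xe2 starts a three-byte
# utf-8 sequence, but the data ends right after it, so decoding fails.
raw = b"test \xe2"
try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc
print(error)

# Python 3 case: the same literal is text (U+00E2), which encodes cleanly.
text = u"test \xe2"
encoded = text.encode("utf-8")
print(repr(encoded))
```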


RobinD42 commented Oct 24, 2018

Fixed in #1039

RobinD42 closed this Oct 24, 2018
