Full Unicode support on Windows #16
Comments
Could you please elaborate on the problem? I have no Windows machine to test on, and I have no clue what either the expected behaviour or the actual behaviour is. Just to clarify, the policy here is:
Both the problem and the solution are described in detail in https://github.com/bastibe/PySoundFile/pull/129. If you provide bytes, everything is OK (it's the user's problem to get the encoding right).
I'll echo @batisbe's rant: Why Windows!! :-) There is a problem here, you are right. But the test code you point to fails to highlight it because it is just using
So for further discussion, it would be very helpful to have:
And remember:
That said, let's take a look at the actual problem: why should the test case you describe work? If the locale codepage is compatible with the Unicode characters you are using in the filename,
When would we get the actual error? When we try to use a filename that is not representable in the locale codepage. The system itself can handle such a name, but we are going through a middleman, the codepage, and that is what makes it fail. And yes, to avoid that we should use the wchar version. A test case would be a name taking characters from different codepages (Cyrillic, Korean, ...) so that it fails whatever your codepage is. Let me write one test using only Korean, which should not fail on your computer, and one test that should fail on every computer. Once we have that, let's do the fix.
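To make the failure condition above concrete, here is a minimal Python 3 sketch that asks which codepages can represent a given filename. The helper `codepages_that_fit`, the codepage list, and both sample names are assumptions made for illustration; they are not taken from the repository's tests. The mixed name combines Hangul with Hebrew, which is my own choice of a second script, so that none of the listed codepages can hold the whole name.

```python
def codepages_that_fit(name, codepages):
    """Return the codepages that can encode every character of `name`.

    Hypothetical helper for illustration only.
    """
    fits = []
    for codepage in codepages:
        try:
            name.encode(codepage)
        except UnicodeEncodeError:
            # At least one character has no representation in this codepage.
            continue
        fits.append(codepage)
    return fits


# Illustrative selection of Windows "ANSI" codepages:
# Western European, Cyrillic, Hebrew, Simplified Chinese, Korean.
CODEPAGES = ["cp1252", "cp1251", "cp1255", "cp936", "cp949"]

# Hangul only: should be representable on a Korean locale.
korean_only = "\ud55c\uad6d\uc5b4.wav"

# Hangul plus Hebrew: no single codepage in the list covers both scripts.
mixed = "\ud55c\uad6d\uc5b4_\u05e9\u05dc\u05d5\u05dd.wav"

print(codepages_that_fit(korean_only, CODEPAGES))
print(codepages_that_fit(mixed, CODEPAGES))
```

A name for which this list comes back empty is the kind of name that can only be opened through a wide-character (or UTF-8) API, whatever the machine's locale is.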
I just committed two unit tests that should break on Windows. I would like to see a red test before proceeding with the fix, since I can't test it myself:
Could you run them on a Korean Windows and paste the output so that we can confirm we have a failing case? It would be nice to have them run on both Py2 and Py3. And please give the information I asked for above (Windows version, locale codepage).
I'm normally testing this stuff on VirtualBox using an image from http://modern.ie. The Korean computer I was talking about was a co-worker's; I currently don't have access to it. I tried the file names from your new tests, and they all work in PySoundFile (with the above-mentioned VirtualBox image). The characters couldn't be displayed in the command prompt or in the IPython terminal. But it worked when pasting the names into a
But I'm quite confident that if you use my solution it will just work. I'm not sure if your tests are sensible, since you are using your library for both writing and reading. If there is an error in the name conversion, but the same error happens in both writing and reading, you won't be able to detect it. This makes this stuff really unpleasant to test ... BTW, the file name in your tests (
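The concern about symmetric errors can be sketched as follows, in Python 3. Everything here is hypothetical: `through_codepage`, `lib_write`, and `lib_read` stand in for a library that pushes filenames through a lossy codepage conversion, and they do not reproduce the real behaviour of any binding. The point is only that a write-then-read round trip succeeds while an independent check with the standard library shows the file is not stored under the requested name.

```python
import os
import tempfile


def through_codepage(name, codepage="cp1252"):
    """Hypothetical lossy conversion: characters the codepage cannot
    represent are replaced by '_'."""
    converted = []
    for ch in name:
        try:
            ch.encode(codepage)
            converted.append(ch)
        except UnicodeEncodeError:
            converted.append("_")
    return "".join(converted)


def lib_write(directory, name, data):
    # Stand-in for the library's write path: mangles the name first.
    with open(os.path.join(directory, through_codepage(name)), "wb") as f:
        f.write(data)


def lib_read(directory, name):
    # Stand-in for the library's read path: applies the same mangling.
    with open(os.path.join(directory, through_codepage(name)), "rb") as f:
        return f.read()


def check(name, data=b"RIFF"):
    """Return (round_trip_ok, on_disk_ok) for a given filename."""
    with tempfile.TemporaryDirectory() as directory:
        lib_write(directory, name, data)
        # Same error on both sides, so the round trip still succeeds.
        round_trip_ok = lib_read(directory, name) == data
        # Independent check that bypasses the library entirely.
        on_disk_ok = os.path.exists(os.path.join(directory, name))
        return round_trip_ok, on_disk_ok


print(check("test.wav"))
print(check("\ud55c\uad6d\uc5b4.wav"))
```

A test that also checks the name on disk with `os.path.exists` or `os.listdir`, instead of trusting only the library's own reader, would catch this kind of conversion error.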
It's important how you encode the file, but it is also important how you declare the data types and how you use encoding and decoding afterwards. Could you please run the tests, as they are, on a Windows machine so that we can have a failing test? I am preparing the commit with your solution, but I would like to have a failing test first. If the symbols are Chinese, then both cases should fail on a non-Chinese Windows computer. And then it is normal that it failed on a Korean locale, because it was transcoded from Unicode to Unicode through the Korean encoding. For the tests I am relying on the Python standard library
Anyway, that's why I want a failing test, so that we can expose the failing case. Could you run it? I just updated the tests to have one with only Korean and another with both Chinese and Russian.
As I said, I don't have time for this, sorry. I missed the
As I cannot test the code myself, I have published the proposed fix in the unicode_for_windows branch. I would appreciate help from anyone interested in testing the fix (and in confirming that the tests fail without it).
The new way libsndfile handles filenames on Windows, using UTF-8, should solve this and other issues, or it may surface new ones. Either way, this particular issue is no longer relevant. File new issues against the new libsndfile behaviour if you hit them. https://github.com/libsndfile/libsndfile/blob/master/CHANGELOG.md#changed
Thanks for the idea with sys.getfilesystemencoding(), I stole it from you in https://github.com/bastibe/PySoundFile/pull/119!
However, this doesn't seem to be enough to support all Unicode filenames on Windows: there are Unicode code points that cannot be encoded using the mbcs encoding, see https://github.com/bastibe/PySoundFile/pull/129.
I think this issue would be a good benchmark: libsndfile/libsndfile#74