not all String values round-trip through #540
I used persistent-sqlite with a table that contains a FilePath. After storing "test_öüä" in the database, I retrieved it back out, and got back "test_������".
This only occurred when I was not using a UTF-8 capable locale (i.e., LANG=C).
A String can represent a filepath that may be encoded using any encoding, not just the current system encoding. This is handled by using Unicode surrogate characters. These surrogates are what don't round-trip through persistent.
I suspect that it may come down to PersistField being implemented in terms of Text. See the "Acceptable data" in Data.Text's haddock. Since Text cannot represent unicode surrogates, packing the String to Text loses them.
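The loss can be seen without persistent at all. The following is a minimal sketch, assuming that under LANG=C GHC decodes each undecodable filename byte as a lone surrogate in the range U+DC80..U+DCFF (so the two UTF-8 bytes of "ö", 0xC3 0xB6, become U+DCC3 U+DCB6). The names `original` and `viaText` are made up for illustration.

```haskell
import qualified Data.Text as T

-- Assumed in-memory form of the filename "test_ö" when its UTF-8 bytes
-- (0xC3 0xB6) are decoded under LANG=C: one lone surrogate per byte.
original :: String
original = "test_\xDCC3\xDCB6"

-- Round-trip a String through Text, as a Text-based field would.
-- Data.Text.pack replaces every surrogate code point with U+FFFD.
viaText :: String -> String
viaText = T.unpack . T.pack

-- In GHCi:
--   viaText original == original             -- False
--   viaText original == "test_\xFFFD\xFFFD"  -- True
```

Each escaped byte comes back as one replacement character, which would account for the six "�" characters in the output reported above (three two-byte characters).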
This is likely to mostly impact programs that store FilePaths in a database. And it's easy to miss that such a program has a bug, because it will mostly only happen when using a non-unicode locale, or perhaps when dealing with strange filenames that are not encoded with utf-8.
I don't know if this can be fixed as long as PersistField is using Text internally. If it were using only ByteString, it could probably be made to roundtrip all Strings through it. But, I have not checked what happens when PersistField operates on a PersistByteString.
I worked around this in git-annex with a newtype with its own PersistField implementation. The simplest approach is to show the String, which encodes the surrogate characters as \nnnn. But that is not backwards compatible with existing data in the database. So, I
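The show-based approach mentioned above can be sketched with base alone. This is not the git-annex code; `encodeForStorage` and `decodeFromStorage` are hypothetical helper names, and a real workaround would wrap them in a newtype's PersistField instance.

```haskell
import Text.Read (readMaybe)

-- Hypothetical helpers, not part of persistent or git-annex.

-- show escapes every non-ASCII character, lone surrogates included,
-- as a decimal \nnnn escape, so the result is plain ASCII and can be
-- packed into Text without loss.
encodeForStorage :: String -> String
encodeForStorage = show

-- Reverse the encoding; Nothing if the stored value is not a valid
-- Haskell string literal (for example, data written before the change).
decodeFromStorage :: String -> Maybe String
decodeFromStorage = readMaybe

-- In GHCi:
--   decodeFromStorage (encodeForStorage "test_\xDCC3\xDCB6")
--     == Just "test_\xDCC3\xDCB6"   -- True
```

The `Nothing` case is where the backwards-compatibility problem shows up: rows stored as raw strings are not valid literals and will not decode.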
I doubt there's anything reasonable we can do here. There's certainly a
On Mon, Feb 15, 2016, 12:01 AM Joey Hess email@example.com wrote: