Blacklist filename chars #2077
They are code points (U+D800 to U+DFFF) in UCS-2 that are reserved for forming surrogate pairs, which encode the astral planes in UTF-16. They should not appear in UTF-8, though some implementations allow them as a way of reversibly encoding broken UTF-16 (or any even-length byte stream at all).
So the check forbids malformed UTF-8 that might be interpreted as an alternative encoding of UTF-16.
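To illustrate the kind of check described above, here is a minimal sketch in Python. It is not the code from this PR; the function name `has_encoded_surrogate` is hypothetical. It relies on the fact that a surrogate code point, if forced into UTF-8, becomes the lead byte 0xED followed by a continuation byte in the range 0xA0 to 0xBF, and it assumes the input is otherwise well-formed UTF-8.

```python
def has_encoded_surrogate(raw: bytes) -> bool:
    """Return True if raw contains a 3-byte sequence encoding U+D800..U+DFFF.

    Hypothetical helper for illustration only. Assumes the rest of the
    input is well-formed UTF-8, where 0xED can only be a lead byte.
    """
    for i in range(len(raw) - 2):
        if (raw[i] == 0xED
                and 0xA0 <= raw[i + 1] <= 0xBF
                and 0x80 <= raw[i + 2] <= 0xBF):
            return True
    return False


# Python's "surrogatepass" error handler produces exactly this kind of
# malformed UTF-8, which the strict decoder then refuses to read back.
bad_name = "\ud800".encode("utf-8", "surrogatepass")   # b'\xed\xa0\x80'
good_name = "résumé.cfg".encode("utf-8")

print(has_encoded_surrogate(bad_name))    # True
print(has_encoded_surrogate(good_name))   # False
```

A strict UTF-8 decoder already rejects these sequences, so a separate scan like this is mainly useful for producing a more specific error message.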
referenced this pull request
Oct 16, 2017
If this is merged, #2070 will get a conflict to resolve.
The messages in this pull request are confusing. I think it is better when the rules for allowed characters are short and concise, such as when all allowed characters can be listed.
Maybe merge #2070 first? And wait until proper handling of the entire Unicode range on all operating systems is implemented, if there is any need at all to support filenames that are not universally readable.
So far, all uses of Unicode in file names were accidental, as far as I can see.
I can fix the conflicts, if necessary.
This PR does not do so.
This PR will use shadowm's badlist implementation to show the list of files violating the rules. The only exception is invalid UTF-8, which also failed previously, but with a less informative error message.
This PR simply extends the blacklist a bit. #2070 changes the entire system. I don't understand your rationale here.
No. There are various cases, including Cyrillic and accented Latin.