New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blacklist filename chars #2077

Merged
merged 6 commits into from Oct 18, 2017

Conversation

Projects
None yet
5 participants
@AI0867
Member

AI0867 commented Oct 6, 2017

This PR (alternative to #2070) extends and expands the blacklist with the parts we could mostly agree on.

@jyrkive

This comment has been minimized.

Show comment
Hide comment
@jyrkive

jyrkive Oct 6, 2017

Member

This pull request looks fine to me. (I'd also disallow astral Unicode planes, and AFAIK no one finds them important to keep.)

Member

jyrkive commented Oct 6, 2017

This pull request looks fine to me. (I'd also disallow astral Unicode planes, and AFAIK no one finds them important to keep.)

return true;
const ucs4::string name_ucs4 = unicode_cast<ucs4::string>(name);
const std::string name_utf8 = unicode_cast<utf8::string>(name_ucs4);
if(name != name_utf8){ // name is invalid UTF-8

This comment has been minimized.

@gfgtdf

gfgtdf Oct 6, 2017

Contributor

did you test that this check works? I'd think that unicode_cast simply throws an exception in case of invalid utf8

@gfgtdf

gfgtdf Oct 6, 2017

Contributor

did you test that this check works? I'd think that unicode_cast simply throws an exception in case of invalid utf8

This comment has been minimized.

@AI0867

AI0867 Oct 6, 2017

Member

See unicode_cast.hpp. The cast swallows the exception and returns the part of the string that it was able to convert.

@AI0867

AI0867 Oct 6, 2017

Member

See unicode_cast.hpp. The cast swallows the exception and returns the part of the string that it was able to convert.

This comment has been minimized.

@AI0867

AI0867 Oct 16, 2017

Member

I tested this yesterday. It works, but a different check failed in archive_addon. Specifically filesystem::looks_like_pbl failed because it uses utf8::lowercase. I've added a try-catch block for this.

@AI0867

AI0867 Oct 16, 2017

Member

I tested this yesterday. It works, but a different check failed in archive_addon. Specifically filesystem::looks_like_pbl failed because it uses utf8::lowercase. I've added a try-catch block for this.

@CelticMinstrel

This comment has been minimized.

Show comment
Hide comment
@CelticMinstrel

CelticMinstrel Oct 7, 2017

Member

This looks quite reasonable to me. Was space intentionally re-allowed? Not that I mind; I support including space in the list of allowed characters. Just wondering if it was intended.

Member

CelticMinstrel commented Oct 7, 2017

This looks quite reasonable to me. Was space intentionally re-allowed? Not that I mind; I support including space in the list of allowed characters. Just wondering if it was intended.

@AI0867

This comment has been minimized.

Show comment
Hide comment
@AI0867

AI0867 Oct 7, 2017

Member

Look at the first case in the ucs4 blacklist.

Member

AI0867 commented Oct 7, 2017

Look at the first case in the ucs4 blacklist.

@CelticMinstrel

This comment has been minimized.

Show comment
Hide comment
@CelticMinstrel

CelticMinstrel Oct 7, 2017

Member

Oh, you're right. I missed that and assumed that < 0x20 (rather than <= 0x20) meant it hadn't been accounted for.

Member

CelticMinstrel commented Oct 7, 2017

Oh, you're right. I missed that and assumed that < 0x20 (rather than <= 0x20) meant it hadn't been accounted for.

@CelticMinstrel

This comment has been minimized.

Show comment
Hide comment
@CelticMinstrel

CelticMinstrel Oct 7, 2017

Member

Oh right, what does the check for surrogates actually do? Does it forbid Unicode, malformed UTF-8, or some subset of Unicode?

Member

CelticMinstrel commented Oct 7, 2017

Oh right, what does the check for surrogates actually do? Does it forbid Unicode, malformed UTF-8, or some subset of Unicode?

@AI0867

This comment has been minimized.

Show comment
Hide comment
@AI0867

AI0867 Oct 7, 2017

Member

They are codepoints in UCS-2 that are reserved for making surrogate pairs to encode the astral planes in UTF-16. They should not appear in UTF-8, though some implementations allow them as a way of reversibly encoding broken UTF-16 (or any even-length bytestream at all).

So the check forbids malformed UTF-8 that might be interpreted as an alternative encoding of UTF-16.

Member

AI0867 commented Oct 7, 2017

They are codepoints in UCS-2 that are reserved for making surrogate pairs to encode the astral planes in UTF-16. They should not appear in UTF-8, though some implementations allow them as a way of reversibly encoding broken UTF-16 (or any even-length bytestream at all).

So the check forbids malformed UTF-8 that might be interpreted as an alternative encoding of UTF-16.

@AI0867

This comment has been minimized.

Show comment
Hide comment
@AI0867

AI0867 Oct 17, 2017

Member

While there is a lot of discussion going on at #2070, I don't think this PR is particularly controversial. Is there anyone opposed to merging it?

Member

AI0867 commented Oct 17, 2017

While there is a lot of discussion going on at #2070, I don't think this PR is particularly controversial. Is there anyone opposed to merging it?

@Arcanister

This comment has been minimized.

Show comment
Hide comment
@Arcanister

Arcanister Oct 18, 2017

Contributor

If this is merged, #2070 will get a conflict to resolve.
I do not like the idea of limiting Unicode to BMP if Unicode is allowed at all.

Messages in this patch request are confusing. I think it is better when rules for allowed characters are short and concise, such when all allowed characters can be listed.

Maybe merge #2070 first? And wait until proper handling of entire Unicode range on all operating system is implemented, if there is any need to support not universally readable filenames at all.

So far, all uses of Unicode in file names were accidental, as far as I can see.

Contributor

Arcanister commented Oct 18, 2017

If this is merged, #2070 will get a conflict to resolve.
I do not like the idea of limiting Unicode to BMP if Unicode is allowed at all.

Messages in this patch request are confusing. I think it is better when rules for allowed characters are short and concise, such when all allowed characters can be listed.

Maybe merge #2070 first? And wait until proper handling of entire Unicode range on all operating system is implemented, if there is any need to support not universally readable filenames at all.

So far, all uses of Unicode in file names were accidental, as far as I can see.

@AI0867

This comment has been minimized.

Show comment
Hide comment
@AI0867

AI0867 Oct 18, 2017

Member

If this is merged, #2070 will get a conflict to resolve.

I can fix the conflicts, if necessary.

I do not like the idea of limiting Unicode to BMP if Unicode is allowed at all.

This PR does not do so.

Messages in this patch request are confusing. I think it is better when rules for allowed characters are short and concise, such when all allowed characters can be listed.

This PR will use the shadowm's badlist implementation to show the list of files violating the rules. The only exception to this is invalid UTF-8, which would previously also fail, but with a less informative error message.

Maybe merge #2070 first? And wait until proper handling of entire Unicode range on all operating system is implemented, if there is any need to support not universally readable filenames at all.

This PR simply extends the blacklist a bit. #2070 changes the entire system. I don't understand your rationale here.

So far, all uses of Unicode in file names were accidental, as far as I can see.

No. There are various cases including Cyrillic and accented latin.

Member

AI0867 commented Oct 18, 2017

If this is merged, #2070 will get a conflict to resolve.

I can fix the conflicts, if necessary.

I do not like the idea of limiting Unicode to BMP if Unicode is allowed at all.

This PR does not do so.

Messages in this patch request are confusing. I think it is better when rules for allowed characters are short and concise, such when all allowed characters can be listed.

This PR will use the shadowm's badlist implementation to show the list of files violating the rules. The only exception to this is invalid UTF-8, which would previously also fail, but with a less informative error message.

Maybe merge #2070 first? And wait until proper handling of entire Unicode range on all operating system is implemented, if there is any need to support not universally readable filenames at all.

This PR simply extends the blacklist a bit. #2070 changes the entire system. I don't understand your rationale here.

So far, all uses of Unicode in file names were accidental, as far as I can see.

No. There are various cases including Cyrillic and accented latin.

@AI0867 AI0867 merged commit ff3855a into wesnoth:master Oct 18, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

AI0867 added a commit that referenced this pull request Oct 18, 2017

@AI0867 AI0867 deleted the AI0867:blacklist-filename-chars branch Oct 18, 2017

fendrin pushed a commit to fendrin/wesnoth-src that referenced this pull request Nov 2, 2017

fendrin pushed a commit to fendrin/wesnoth-src that referenced this pull request Nov 2, 2017

fendrin pushed a commit to fendrin/wesnoth-src that referenced this pull request Nov 2, 2017

Merge branch 'blacklist-filename-chars' (PR #2077)
Former-commit-id: 1a7d27a7d2a59d0ce9a2a08e2db51e3b3a5e871c

fendrin pushed a commit to fendrin/wesnoth-src that referenced this pull request Nov 2, 2017

Merge branch 'blacklist-filename-chars' (PR #2077)
Former-commit-id: 1a7d27a7d2a59d0ce9a2a08e2db51e3b3a5e871c
Former-commit-id: fec4c6fda8f3f94149a48064d2723e42250108f8

fendrin pushed a commit to fendrin/wesnoth-src that referenced this pull request Nov 2, 2017

Merge branch 'blacklist-filename-chars' (PR #2077)
Former-commit-id: 1a7d27a7d2a59d0ce9a2a08e2db51e3b3a5e871c
Former-commit-id: fec4c6fda8f3f94149a48064d2723e42250108f8
Former-commit-id: b97ceb9c29698cd698cd4cd5005a0088a796f13d

fendrin pushed a commit to fendrin/wesnoth-src that referenced this pull request Nov 2, 2017

Merge branch 'blacklist-filename-chars' (PR #2077)
Former-commit-id: 1a7d27a7d2a59d0ce9a2a08e2db51e3b3a5e871c
Former-commit-id: fec4c6fda8f3f94149a48064d2723e42250108f8
Former-commit-id: b97ceb9c29698cd698cd4cd5005a0088a796f13d

fendrin pushed a commit to fendrin/wesnoth-src that referenced this pull request Nov 2, 2017

Merge branch 'blacklist-filename-chars' (PR #2077)
Former-commit-id: 1a7d27a7d2a59d0ce9a2a08e2db51e3b3a5e871c
Former-commit-id: fec4c6fda8f3f94149a48064d2723e42250108f8
Former-commit-id: b97ceb9c29698cd698cd4cd5005a0088a796f13d

fendrin pushed a commit to fendrin/wesnoth-src that referenced this pull request Nov 2, 2017

Merge branch 'blacklist-filename-chars' (PR #2077)
Former-commit-id: 1a7d27a7d2a59d0ce9a2a08e2db51e3b3a5e871c
Former-commit-id: fec4c6fda8f3f94149a48064d2723e42250108f8
Former-commit-id: b97ceb9c29698cd698cd4cd5005a0088a796f13d

fendrin pushed a commit to fendrin/wesnoth-src that referenced this pull request Nov 2, 2017

Merge branch 'blacklist-filename-chars' (PR #2077)
Former-commit-id: 1a7d27a7d2a59d0ce9a2a08e2db51e3b3a5e871c
Former-commit-id: fec4c6fda8f3f94149a48064d2723e42250108f8
Former-commit-id: b97ceb9c29698cd698cd4cd5005a0088a796f13d

fendrin pushed a commit to fendrin/wesnoth-src that referenced this pull request Nov 6, 2017

Merge branch 'blacklist-filename-chars' (PR #2077)
Former-commit-id: 1a7d27a7d2a59d0ce9a2a08e2db51e3b3a5e871c
Former-commit-id: fec4c6fda8f3f94149a48064d2723e42250108f8
Former-commit-id: b97ceb9c29698cd698cd4cd5005a0088a796f13d

fendrin pushed a commit to fendrin/wesnoth-src that referenced this pull request Nov 6, 2017

Merge branch 'blacklist-filename-chars' (PR #2077)
Former-commit-id: 1a7d27a7d2a59d0ce9a2a08e2db51e3b3a5e871c
Former-commit-id: fec4c6fda8f3f94149a48064d2723e42250108f8
Former-commit-id: b97ceb9c29698cd698cd4cd5005a0088a796f13d
Former-commit-id: fcb25e02c59e94bc5eb4b8aab867bf20aea4caeb [formerly 591aba69bee625ea915ec730e60cb6bfc467007e]
Former-commit-id: 3fb003ba0f0b7cac03607cd18c2fafc88ed0b54c

fendrin pushed a commit to fendrin/wesnoth-src that referenced this pull request Nov 6, 2017

Merge branch 'blacklist-filename-chars' (PR #2077)
Former-commit-id: 1a7d27a7d2a59d0ce9a2a08e2db51e3b3a5e871c
Former-commit-id: fec4c6fda8f3f94149a48064d2723e42250108f8
Former-commit-id: b97ceb9c29698cd698cd4cd5005a0088a796f13d
Former-commit-id: fcb25e02c59e94bc5eb4b8aab867bf20aea4caeb [formerly 591aba69bee625ea915ec730e60cb6bfc467007e]
Former-commit-id: 3fb003ba0f0b7cac03607cd18c2fafc88ed0b54c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment