canonical caseless match? #30

rowlesmr · 2023-07-14T09:52:43Z


In your library, you have una::caseless::compare_utf8, which is described as "case insensitive".

Is this the same as a canonical caseless match?

I'm writing a program that needs to handle Unicode. So far I have treated text transparently as a sequence of bytes in a std::string, but now I actually need to do some comparisons, and your library stands out to me as a lightweight (header-only!) solution.

Answered by mg152


mg152 · 2023-07-14T19:16:29Z

mg152
Jul 14, 2023
Maintainer

No, it's not the same. The Unicode Standard describes canonical caseless match like this:

A string X is a canonical caseless match for a string Y if and only if:
NFD(toCasefold(NFD(X))) = NFD(toCasefold(NFD(Y)))

So it will look like this with the library:

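#include <string_view>
#include "uni_algo/case.h" // una::cases::to_casefold_utf8
#include "uni_algo/norm.h" // una::norm::to_nfd_utf8

// Compares NFD(toCasefold(NFD(X))) with NFD(toCasefold(NFD(Y)))
bool canonical_caseless_match(std::string_view str1, std::string_view str2)
{
    return una::norm::to_nfd_utf8(una::cases::to_casefold_utf8(una::norm::to_nfd_utf8(str1))) ==
           una::norm::to_nfd_utf8(una::cases::to_casefold_utf8(una::norm::to_nfd_utf8(str2)));
}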

Of course the performance of this won't be great, because every call allocates several temporary strings. Technically it's possible to implement it better with views, but the problem is that right now the library doesn't have a case folding view (it's planned). So you need to implement it manually with the una::codepoint::to_casefold_u32 function from uni_algo/prop.h.
If you want to try it, you can use the canonical_equivalence function in example/cpp_ranges.h as a starting point. But it won't be easy for sure: you probably need to implement your own case folding view first, because the algorithm requires case folding to sit between the two NFDs.
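
To make the formula above more concrete, here is a small self-contained sketch that does not use the library at all. Everything in it is an assumption made for illustration: toy_nfd and toy_casefold are hypothetical helpers with hand-written tables for a handful of code points, canonical reordering of combining marks is left out, and the strings are UTF-32 to keep the loops simple. It only shows the shape of NFD(toCasefold(NFD(X))) and why plain case folding is not enough.

```cpp
#include <string>
#include <string_view>

// Toy canonical decomposition. It knows only the three code points below;
// real NFD needs the full Unicode data plus canonical reordering of
// combining marks, both omitted here (assumption for illustration).
std::u32string toy_nfd(std::u32string_view in)
{
    std::u32string out;
    for (char32_t c : in) {
        switch (c) {
            case U'\u00C5': // LATIN CAPITAL LETTER A WITH RING ABOVE
            case U'\u212B': // ANGSTROM SIGN
                out += U"A\u030A";
                break;
            case U'\u00E5': // LATIN SMALL LETTER A WITH RING ABOVE
                out += U"a\u030A";
                break;
            default:
                out += c;
        }
    }
    return out;
}

// Toy full case folding: ASCII letters plus a few hand-picked mappings.
// This is not the real CaseFolding.txt table.
std::u32string toy_casefold(std::u32string_view in)
{
    std::u32string out;
    for (char32_t c : in) {
        if (c >= U'A' && c <= U'Z')
            out += static_cast<char32_t>(c - U'A' + U'a');
        else if (c == U'\u00C5' || c == U'\u212B')
            out += U'\u00E5'; // folds to a precomposed character
        else if (c == U'\u00DF')
            out += U"ss";     // sharp s folds to two code points
        else
            out += c;
    }
    return out;
}

// Same three-step shape as the formula: NFD, then case fold, then NFD again.
bool toy_canonical_caseless_match(std::u32string_view x, std::u32string_view y)
{
    auto key = [](std::u32string_view s) {
        return toy_nfd(toy_casefold(toy_nfd(s)));
    };
    return key(x) == key(y);
}
```

With these toy tables, the ANGSTROM SIGN (U+212B) and the sequence "a" followed by U+030A fold to different code point sequences when only toy_casefold is applied, but toy_canonical_caseless_match treats them as equal because both sides are decomposed before and after folding.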

0 replies
