Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove global locale #3611

Open
idotobi opened this issue Nov 23, 2021 · 12 comments
Open

Remove global locale #3611

idotobi opened this issue Nov 23, 2021 · 12 comments
Assignees

Comments

@idotobi
Copy link
Contributor

idotobi commented Nov 23, 2021

Is your feature request related to a problem? Please describe.

Setting a global cpp locale caused some problems in the past (see #3606) and required some fixes (see #3610). However, these rather cured the symptom than the cause.

Describe the solution you'd like

A cleaner way would be to remove the global locale, by adapting

void initLocalisation() {
#ifdef ENABLE_NLS
fs::path localeDir = Util::getGettextFilepath(Util::getLocalePath().u8string().c_str());
bindtextdomain(GETTEXT_PACKAGE, localeDir.u8string().c_str());
textdomain(GETTEXT_PACKAGE);
#ifdef _WIN32
bind_textdomain_codeset(GETTEXT_PACKAGE, "UTF-8");
#endif
#endif // ENABLE_NLS
// Not working on GNU g++(mingww) forWindows! Only working on Linux/macOS and with msvc
try {
std::locale::global(std::locale("")); // "" - system default locale
} catch (std::runtime_error& e) {
g_warning("XournalMain: System default locale could not be set.\n - Caused by: %s\n - Note that it is not "
"supported to set the locale using mingw-w64 on windows.\n - This could be solved by compiling "
"xournalpp with msvc",
e.what());
}
/**
* Force numbers to be printed out and parsed by C libraries (cairo) in the "classic" locale.
* This avoids issue with tags when exporting to PDF, see #3551
*/
setlocale(LC_NUMERIC, "C");
std::cout.imbue(std::locale());
}

Describe alternatives you've considered

  • adding imbue to a lot of places in the codebase and testing several locales whenever input/output is implemented

Note
This change should be followed by extensive testing, as it might have far reaching unintended side effects.

@Febbe
Copy link
Collaborator

Febbe commented Nov 24, 2021

The thing is, that we always want to have the C locale for serdes and always want to have the default locale for display.
When we touch this, we should create 2 template functions:

template<class Stream, class ...Args>
auto xoj::init_serdes_stream(Args... args) -> Stream{
    Stream stream{std::forward<Args>(args)...};
    stream.imbue(std::locale::classic());
    return stream;
}

template<class Stream, class ...Args>
auto xoj::init_display_stream(Args... args) -> Stream;

Regarding the global locale: I don't know if we want to set it to the C locale, since it could affect the locale of gtk.

@Febbe
Copy link
Collaborator

Febbe commented Nov 24, 2021

Regarding the global locale: I don't know if we want to set it to the C locale, since it could affect the locale of gtk.

Gtk itself sets the c-locale to the system's locale, so we should not override that after e.g. gtk_init. But setting the c++ locale to C/classic is ok. But this should still happen before the init of gtk, since c++ also sets the c locale.

@bhennion
Copy link
Contributor

The thing is, that we always want to have the C locale for serdes and always want to have the default locale for display. When we touch this, we should create 2 template functions:

template<class Stream, class ...Args>
auto xoj::init_serdes_stream(Args... args) -> Stream{
    Stream stream{std::forward<Args>(args)...};
    stream.imbue(std::locale::classic());
    return stream;
}

template<class Stream, class ...Args>
auto xoj::init_display_stream(Args... args) -> Stream;

Regarding the global locale: I don't know if we want to set it to the C locale, since it could affect the locale of gtk.

I'd do it the other way around: leave the default C++ locale to "C", and imbue the display streams with the system locale.
My reasonning is the following: if a SerDes stream is mistakenly not in the "C" locale, this leads to bugs that are quite time consuming to solve. On the other hand, if a display stream is not in the right locale, it create a very visible bug, that is easy to pinpoint and easy to fix.

As for GTK and the C locale, well since #3551, the C locale is forcefully reset to "C" (because cairo parses stuff with sscanf and this created a bug). Still, the app is in the right language (to be confirmed on other OS's). It seems to only affects the g_message's. The best way to deal with the localization of those messages would be to replace g_message with a cout-based function, and imbue cout with the right locale upon startup.

@idotobi
Copy link
Contributor Author

idotobi commented Nov 24, 2021

@bhennion : Sounds reasonable.

The only thing I'm wondering is if we really need g_message to be replaced by our own function (cout + imbue). What's the negative effect of using g_message regardless?

@Febbe
Copy link
Collaborator

Febbe commented Nov 24, 2021

The locale of debug output is not relevant. Mostly it is not even shown to the user...

But in my opinion, wrong coding should not be hidden. It should crash the app. So crashing is better than a locale shown wrongly (nearly no one complains about wrong locales).

@bhennion
Copy link
Contributor

But in my opinion, wrong coding should not be hidden. It should crash the app. So crashing is better than a locale shown wrongly (nearly no one complains about wrong locales).

I could agree on principle, but in practice this leads to us (and others) wasting a lot of time. The case of #3388 is a very good example. IMO, the amount of people who got involved and the time they spent is way to much a cost for sticking to the principle

@LittleHuba
Copy link
Member

I would argue that all debug output should be in C locale. They are meant for us devs and not for some user with a specific locale that prefers whitespace over comma or whatever.

@idotobi
Copy link
Contributor Author

idotobi commented Nov 27, 2021

Let me try to summarize the above points, to make sure I understand it right.

We have the following two options:

(A) Use "C" locale as C++ default locale and use a custom stream (with imbue) for display relevant stuff (including CLI messages)
(B) Use system locale as C++ default locale and use a custom stream (with imbue) for SerDes (serialize deserialize) stuff

In both cases it's pretty likely that our future selves or contributors will forget that there exists a custom stream that they should use for the respective use case.

Hence we have the following Project Risks:

(A) "C" locale as C++ default:

  • wrong display of locales
    • probability: high
    • severity: low (users complain or they don't even notice)
      • detectability: high (it's user facing)
      • revoverability: easy (NO persistent state changed)
    • => Risk: middle

(B) system locale as C++ default:

  • wrong SerDes behaviour
    • probability: high
    • severity: high (data corruption, crashes)
      • detectability: middle (in the "happy" case it leads to a crash, in the unhappy case to just strange behaviour that's hard to trace back and might corrupt data on the way)
      • recoverability: difficult (serialized data might be persistent)
    • => Risk: high

@idotobi
Copy link
Contributor Author

idotobi commented Nov 27, 2021

If the above is correct I would prefer (A).

@LittleHuba
Copy link
Member

LittleHuba commented Nov 27, 2021

That sums it up quite nicely. Definitely A for me.

I would also restrict translations to the UI and the console interface. Any kind of debug or error message to the console using g_messages should stay in English. Most users can't really decipher what they mean anyway and we can't translate them from an unknown language to English when they open a ticket.

That should cut down on the changes significantly as most g_messages would stay the same.

@bhennion
Copy link
Contributor

I'm for (A), and I agree the g_message outputs should stay in english. All in all, it just means that the global locales (both C and C++) would be set to classic "C" and we'd make a custom stream for GUI output purposes.

@Febbe
Copy link
Collaborator

Febbe commented Nov 27, 2021

Let me try to summarize the above points, to make sure I understand it right.

I have some objections; @bhennion summarized it very well, "leads to us (and others) wasting a lot of time", which basically means, the problem is laziness here.

We have the following two options:

(A) Use "C" locale as C++ default locale and use a custom stream (with imbue) for display relevant stuff (including CLI messages)
(B) Use system locale as C++ default locale and use a custom stream (with imbue) for SerDes (serialize deserialize) stuff
(C) Don't rely on global locales; Imbue every stream; fallback to (A), for forgotten streams.

In both cases it's pretty likely that our future selves or contributors will forget that there exists a custom stream that they should use for the respective use case.

Hence we have the following Project Risks:

(A) "C" locale as C++ default:

wrong display of locales
    probability: high
    severity: low (users complain or they don't even notice)
        detectability: **low** ("users don't even notice")
        recoverability: **hard** (NO persistent state changed, **but fixing my not be possible anymore**)
    => Risk: middle **(no serious damage, but also not fixable without increasing the risk for one)**

(B) system locale as C++ default:

wrong SerDes behavior
    probability: high
    severity: high (data corruption, crashes)
        detectability: **high** (in the "happy" case it leads to a crash, in the unhappy case to just strange behaviour that's hard to trace back and might corrupt data on the way;**but users will complain**)
        recoverability: difficult (serialized data might be persistent;**easy to fix itself, but may have big negative impact**)
    => Risk: high

(C) Don't rely on global locales, imbue every stream (maybe fallback to (A))

    wrong display of locales
    probability: low
    severity: low (users complain or they don't even notice)
        detectability: low ("users don't even notice")
        recoverability: easy (fallback behavior can be changed if required)
    => Risk: low


wrong SerDes behavior
    probability: low
    severity: high (data corruption, crashes)
        detectability: high
        recoverability: easy
    => Risk: low

=> High effort

Therefore, I propose to use (C) with the fallback behavior (A). The latter can be changed at least for the c++ locale but should not be relevant there.

The problem I see with A and B is, that we strongly rely on global static (thread unsafe) state, which can be changed in every linked library. This is extremely dangerous, nearly impossible to find, and also hard to be fixed. For C locales, this is hard to solve, since the locale apis differ between the three great OSes. But for C++ locales, we can easily use imbue for every stream.

For C locales, we should use a guard, setting the local thread locale (requires writing OS specific code), and resetting it to the previous after leaving the scope.
For C++: just use the std::locale class and imbue.

Some References:
https://docs.microsoft.com/en-us/cpp/parallel/multithreading-and-locales?view=msvc-170
https://stackoverflow.com/questions/4057319/is-setlocale-thread-safe-function

=> If implemented, we can easily switch to (C(B)) again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants