Script support for languages #65

Open
juhovh opened this issue Jul 22, 2015 · 4 comments

Comments

@juhovh

juhovh commented Jul 22, 2015

I'm writing this on my mobile phone, so sorry if this description is too brief.

On the JVM, language tags are assumed to be in the format language-region-variant. This is valid, but it makes it impossible to select a translation based on script, which is kind of a deal breaker for me.

To give an example, take Serbian. There are versions sr-Latn-RS (Serbian in Latin script in Serbia), sr-Cyrl-RS (Serbian in Cyrillic script in Serbia), sr-Latn-ME (Serbian in Latin script in Montenegro), and sr-Cyrl-ME (Serbian in Cyrillic script in Montenegro). There is currently no way of providing script information to the library.

Another example is Chinese, where zh-Hans (simplified Chinese) is used in mainland China and Singapore, whereas zh-Hant (traditional Chinese) is used in Taiwan and Hong Kong. It would be really useful to have generic localizations for simplified and traditional Chinese, plus some country-specific localizations, for example for Taiwan and Singapore.
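To make the script subtag concrete, here is a minimal sketch in plain Java against the JVM's own `java.util.Locale` API (Java 7+), which parses BCP 47 tags into separate language, script, and region fields. The class and method names (`ScriptSubtags`, `describe`) are hypothetical and only for illustration; nothing here is part of Tower.

```java
import java.util.Locale;

final class ScriptSubtags {
    /** Returns "language / script / region" for a BCP 47 tag; missing parts are empty. */
    static String describe(String tag) {
        Locale loc = Locale.forLanguageTag(tag);
        return loc.getLanguage() + " / " + loc.getScript() + " / " + loc.getCountry();
    }

    public static void main(String[] args) {
        // Tags taken from the examples in this issue.
        String[] tags = {"sr-Latn-RS", "sr-Cyrl-RS", "zh-Hans", "zh-Hant-TW"};
        for (String tag : tags) {
            System.out.println(tag + " -> " + describe(tag));
        }
    }
}
```

The point is that the script is its own field, distinct from both region and variant, so a language-region-variant split has nowhere to put it.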

There are simple fixes and more complicated ones. The simplest would be to require language tags to be in the form language-script-region-variant, but that would break backwards compatibility. The more complicated version would require regexps and/or parsing. Which version would you like a pull request for?

@juhovh
Author

juhovh commented Jul 22, 2015

After thinking this through, I think properly parsing BCP 47 tags (probably still supporting underscores for backwards compatibility?) is the only reasonable option, so I'll start preparing a patch for that.

@ptaoussanis
Member

Hi there,

I'm afraid I'm not really following. To clarify: Tower has two notions of "locale":

  1. A JVM locale used for the JVM-dependent localization features
  2. A keyword locale used for translations

The second is a strict superset of the first. No JVM validation or structural requirements are imposed on the second type.

So you can use any kind of locale structure you like for translations: :sr-Latn-RS and :zh-Hans should work fine.

You can't use these for anything that requires a JVM locale (since the JVM wouldn't be able to recognize these); but you're free to use them for the translations API (which is pure Clojure/Script).

The only semantic requirement is that an :x-y-z locale can sensibly fall back to :x-y and then :x.

Does that make sense?
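The fallback rule described above (:x-y-z, then :x-y, then :x) can be sketched as follows. This is plain Java for illustration, not Tower's actual implementation; `fallbackChain`, `translate`, and the sample translation map are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LocaleFallback {
    /** Returns the tag followed by each shorter dash-separated prefix. */
    static List<String> fallbackChain(String tag) {
        List<String> chain = new ArrayList<>();
        String current = tag;
        while (!current.isEmpty()) {
            chain.add(current);
            int lastDash = current.lastIndexOf('-');
            current = lastDash < 0 ? "" : current.substring(0, lastDash);
        }
        return chain;
    }

    /** Returns the first translation found along the chain, or null if none matches. */
    static String translate(Map<String, String> translations, String tag) {
        for (String candidate : fallbackChain(tag)) {
            String text = translations.get(candidate);
            if (text != null) {
                return text;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up sample data: generic per-script entries plus one regional override.
        Map<String, String> greetings = new HashMap<>();
        greetings.put("zh-Hans", "greeting (simplified)");
        greetings.put("zh-Hant", "greeting (traditional)");
        greetings.put("zh-Hant-TW", "greeting (traditional, Taiwan)");

        System.out.println(fallbackChain("sr-Latn-RS"));
        System.out.println(translate(greetings, "zh-Hans-SG"));
        System.out.println(translate(greetings, "zh-Hant-TW"));
    }
}
```

With script-first tags such as zh-Hans-SG, simple truncation already lands on the generic per-script entry, which is why the translations side works with these tags.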

@juhovh
Author

juhovh commented Jul 24, 2015

Hi,

Sorry for not being clear enough in my description, I'll try to be short and describe my use case better.

I have a system where I have a valid IETF BCP 47 locale in my database for each user. I want to use this locale for formatting both translations and JVM-dependent localization features. If we take zh-Hans-CN as an example, I can use it just fine for translations, but if I try to use it for localization features the following happens:

=> (tower/fmt-str :zh-Hans-CN "%f" 5.5)
ExceptionInfo Invalid locale: :zh-Hans-CN  clojure.core/ex-info (core.clj:4403)

This is not nice, because zh-Hans-CN is a perfectly valid locale according to IETF best current practice (hence BCP).
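A rough sketch of why a tag like this gets rejected, assuming the library splits the name on `-` or `_`, passes the parts to the legacy `Locale` constructor as language, country, and variant, and then checks the result against the JVM's available locales (that last step is an assumption about Tower's internals). The names `LegacySplit`, `legacyLocale`, and `isAvailable` are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

final class LegacySplit {
    /** Mimics the legacy handling: parts are read as language, country, variant. */
    @SuppressWarnings("deprecation") // Locale constructors are deprecated on recent JDKs
    static Locale legacyLocale(String name) {
        String[] parts = name.split("[-_]");
        String language = parts[0];
        String country = parts.length > 1 ? parts[1] : "";
        String variant = parts.length > 2 ? parts[2] : "";
        return new Locale(language, country, variant);
    }

    /** True if the JVM lists this exact locale among its installed locales. */
    static boolean isAvailable(Locale loc) {
        return Arrays.asList(Locale.getAvailableLocales()).contains(loc);
    }

    public static void main(String[] args) {
        Locale legacy = legacyLocale("zh-Hans-CN");
        Locale parsed = Locale.forLanguageTag("zh-Hans-CN");

        // The legacy split puts "Hans" in the country slot and leaves the script empty.
        System.out.println("legacy country: " + legacy.getCountry());
        System.out.println("legacy script empty: " + legacy.getScript().isEmpty());
        System.out.println("legacy available: " + isAvailable(legacy));

        // BCP 47 parsing keeps script and region apart, and the result is usable for formatting.
        System.out.println("parsed script: " + parsed.getScript());
        System.out.println("parsed country: " + parsed.getCountry());
        System.out.println(String.format(parsed, "%f", 5.5));
    }
}
```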

What I would really like to have as a first step would be something like the following:

diff --git a/src/taoensso/tower.cljx b/src/taoensso/tower.cljx
index 2abb09d..933decd 100644
--- a/src/taoensso/tower.cljx
+++ b/src/taoensso/tower.cljx
@@ -7,7 +7,7 @@
                   [taoensso.encore :as encore]
                   [taoensso.timbre :as timbre]
                   [taoensso.tower.utils :as utils :refer (defmem- defmem-*)])
-  #+clj (:import  [java.util Date Locale TimeZone Formatter]
+  #+clj (:import  [java.util Date Locale Locale$Builder TimeZone Formatter IllformedLocaleException]
                   [java.text Collator NumberFormat DateFormat])
   #+cljs (:require-macros [taoensso.encore :as encore]
                           [taoensso.tower  :as tower-macros])
@@ -67,11 +67,16 @@
         (make-Locale (.getLanguage ^Locale loc)))

       :else
-      (let [loc-parts (str/split (name loc) #"[-_]")]
-        (all-Locales
-          (if-not lang-only?
-            (apply make-Locale loc-parts)
-            (make-Locale (first loc-parts))))))))
+      (try
+        (let [loc-obj (.build (.setLanguageTag (Locale$Builder.) (name loc)))]
+          (if-not lang-only? loc-obj
+            (make-Locale (.getLanguage ^Locale loc-obj))))
+        (catch IllformedLocaleException e
+          (let [loc-parts (str/split (name loc) #"[_]")]
+            (all-Locales
+              (if-not lang-only?
+                (apply make-Locale loc-parts)
+                (make-Locale (first loc-parts))))))))))

 #+clj
 (def jvm-locale

This would support all BCP 47 locales and fall back to the old legacy handling when a name is not a well-formed tag. Notice that I removed - from the split regex, because I think it would be much nicer to require every dash-separated locale name to be a well-formed BCP 47 tag.
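For readers less familiar with the interop, the core of the patch can be sketched in plain Java as below. This is a simplified illustration only: it omits the `lang-only?` branch and the `all-Locales` lookup from the diff, and `LocaleParsing` and `parse` are hypothetical names.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

final class LocaleParsing {
    /**
     * Tries strict BCP 47 parsing first; if the name is ill-formed as a tag,
     * falls back to the legacy underscore-separated language_country_variant form.
     */
    @SuppressWarnings("deprecation") // Locale constructors are deprecated on recent JDKs
    static Locale parse(String name) {
        try {
            return new Locale.Builder().setLanguageTag(name).build();
        } catch (IllformedLocaleException e) {
            String[] parts = name.split("_");
            String language = parts[0];
            String country = parts.length > 1 ? parts[1] : "";
            String variant = parts.length > 2 ? parts[2] : "";
            return new Locale(language, country, variant);
        }
    }

    public static void main(String[] args) {
        Locale bcp47 = parse("zh-Hans-CN"); // well-formed tag, handled by the builder
        Locale legacy = parse("zh_CN");     // underscore form, handled by the fallback
        System.out.println(bcp47.getLanguage() + " " + bcp47.getScript() + " " + bcp47.getCountry());
        System.out.println(legacy.getLanguage() + " " + legacy.getCountry());
    }
}
```

`Locale.Builder.setLanguageTag` is used rather than `Locale.forLanguageTag` because the builder throws on ill-formed input instead of silently dropping the offending subtags, which is what makes the fallback possible.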

What would be nice in the long run would be something like what .NET does as explained in https://msdn.microsoft.com/en-us/library/vstudio/dd997383(v=vs.100).aspx

The parent chain of the Chinese cultures now includes the root Chinese culture. The following examples show the complete parent chain for two of the Chinese specific cultures:

zh-CN → zh-CHS → zh-Hans → zh → Invariant
zh-TW → zh-CHT → zh-Hant → zh → Invariant

So the system should know that simplified Chinese is used in mainland China and fall back to zh-Hans if zh-CN is used. But it seems that the JVM doesn't support this either and leaves it to the implementation to handle the correct mappings, so I think it's out of scope for Tower.
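For illustration only, such a parent chain could be sketched like this, assuming a small hand-maintained table that maps Chinese regions to scripts (taken from the examples earlier in this issue). The table, `ChineseParents`, and `parentChain` are hypothetical, and the legacy .NET names zh-CHS and zh-CHT are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class ChineseParents {
    // Hypothetical, hand-maintained mapping from region to default script for Chinese.
    private static final Map<String, String> ZH_SCRIPT_BY_REGION = new HashMap<>();
    static {
        ZH_SCRIPT_BY_REGION.put("CN", "Hans");
        ZH_SCRIPT_BY_REGION.put("SG", "Hans");
        ZH_SCRIPT_BY_REGION.put("TW", "Hant");
        ZH_SCRIPT_BY_REGION.put("HK", "Hant");
    }

    /** Returns the tag, then language-script (inferred for zh if absent), then language. */
    static List<String> parentChain(String tag) {
        Locale loc = Locale.forLanguageTag(tag);
        String language = loc.getLanguage();
        String script = loc.getScript();
        if (script.isEmpty() && language.equals("zh")) {
            script = ZH_SCRIPT_BY_REGION.getOrDefault(loc.getCountry(), "");
        }

        List<String> chain = new ArrayList<>();
        addIfNew(chain, loc.toLanguageTag());
        if (!language.isEmpty() && !script.isEmpty()) {
            addIfNew(chain, language + "-" + script);
        }
        if (!language.isEmpty()) {
            addIfNew(chain, language);
        }
        return chain;
    }

    private static void addIfNew(List<String> chain, String tag) {
        if (!chain.contains(tag)) {
            chain.add(tag);
        }
    }

    public static void main(String[] args) {
        System.out.println(parentChain("zh-CN"));
        System.out.println(parentChain("zh-TW"));
        System.out.println(parentChain("zh-Hant-HK"));
    }
}
```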

Did this clarify my point of script support any better?

@ptaoussanis
Member

Hi,

> Did this clarify my point of script support any better?

It did, thank you for the detailed info! Not ready to reply just yet, need some time to go over this all.
Just leaving a reference here in the meantime: http://openjdk.java.net/jeps/128
