Programming languages have either no support or only basic support for Unicode. Libraries are required to get full Unicode support on all platforms.
Qt is a big :ref:`C++ <cpp>` library covering different topics, but it is typically used to create graphical interfaces. It is distributed under the GNU LGPL license (version 2.1), and is also available under a commercial license.
Character and string classes
QChar is a Unicode character, only able to store :ref:`BMP characters
<bmp>`. It is implemented as a 16-bit unsigned number. Interesting
QChar methods:

- isSpace(): true if the :ref:`character category <unicode categories>` is separator (Zl, Zp or Zs)
- toUpper(): convert to upper case
QString is a :ref:`character string <str>` implemented as an array of
QChar using :ref:`UTF-16 <utf16>`. A :ref:`non-BMP character <bmp>` is
stored as two QChar (a :ref:`surrogate pair <surrogates>`).
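To make the surrogate pair arithmetic concrete, here is a minimal sketch in plain C++ (it does not use Qt). The function name ``encode_utf16`` is hypothetical. It only illustrates how UTF-16 splits a non-BMP code point into two 16-bit units, which is the representation a QString holds.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode one Unicode code point as UTF-16 code units.
// A BMP character needs one unit, a non-BMP character needs two
// (a high surrogate followed by a low surrogate).
std::vector<std::uint16_t> encode_utf16(std::uint32_t code_point)
{
    // Precondition: a valid code point that is not itself a surrogate.
    assert(code_point <= 0x10FFFF);
    assert(code_point < 0xD800 || code_point > 0xDFFF);

    if (code_point < 0x10000) {
        return { static_cast<std::uint16_t>(code_point) };
    }

    // Subtracting 0x10000 leaves a 20-bit value, split into 10 + 10 bits.
    std::uint32_t offset = code_point - 0x10000;
    std::uint16_t high = static_cast<std::uint16_t>(0xD800 + (offset >> 10));
    std::uint16_t low = static_cast<std::uint16_t>(0xDC00 + (offset & 0x3FF));
    return { high, low };
}
```

For example, U+00E9 yields the single unit 0x00E9, while U+1F600 yields the pair 0xD83D, 0xDE00. This is why the length of a UTF-16 string in code units can be larger than its number of characters.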
Qt :ref:`decodes <decode>` literal byte strings from :ref:`ISO-8859-1` using the
QLatin1String class, a thin wrapper around :c:type:`char*`. QLatin1String
is a character string storing each character as a single byte. This is possible
because it only supports characters in the U+0000-U+00FF range. QLatin1String
cannot be used to manipulate text: it has a smaller API than QString. For
example, it is not possible to concatenate two QLatin1String strings.
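Decoding ISO-8859-1 is cheap because each byte value is also the code point of the character. The following sketch shows that mapping in plain C++, without Qt. ``decode_latin1`` is a hypothetical helper, not a Qt function, and it produces UTF-16 code units like those stored in a QString.

```cpp
#include <cassert>
#include <string>

// Decode ISO-8859-1 bytes to UTF-16: the byte value is the code point.
std::u16string decode_latin1(const std::string& bytes)
{
    std::u16string text;
    text.reserve(bytes.size());
    for (char byte : bytes) {
        // Convert through unsigned char so that bytes >= 0x80 are not
        // sign-extended on platforms where char is signed.
        text.push_back(static_cast<char16_t>(static_cast<unsigned char>(byte)));
    }
    // Every byte yields exactly one 16-bit unit: no surrogate pairs are
    // needed, since the range stops at U+00FF.
    assert(text.size() == bytes.size());
    return text;
}
```

The conversion cannot fail: every byte from 0x00 to 0xFF maps to a character, so there are no undecodable bytes in this encoding.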
QTextCodec::codecForLocale() returns the codec of the locale encoding.
QFile::decodeName() is the reverse operation of QFile::encodeName(): it decodes a filename from its local 8-bit encoding to a QString.
.. todo:: what about undecodable filenames?
Qt has two implementations of its file system engine:

- Windows: uses the Windows native API
- UNIX: uses the POSIX API
The glib library
The :c:type:`gunichar` type is a character. It is able to store any Unicode 6.0 character (U+0000—U+10FFFF).
- :c:func:`g_convert`: :ref:`decode <decode>` from an encoding and :ref:`encode <encode>` to another encoding with the :ref:`iconv library <iconv>`. Use :c:func:`g_convert_with_fallback` to choose :ref:`how to handle <errors>` :ref:`undecodable bytes <undecodable>` and :ref:`unencodable characters <unencodable>`.
- :c:func:`g_locale_from_utf8` / :c:func:`g_locale_to_utf8`: encode to/decode from the current :ref:`locale encoding <locale encoding>`.
- :c:func:`g_get_charset`: get the locale encoding
- :c:func:`g_utf8_get_char`: get the first character of a UTF-8 string as :c:type:`gunichar`
- :c:func:`g_filename_from_utf8` / :c:func:`g_filename_to_utf8`: :ref:`encode <encode>`/:ref:`decode <decode>` a filename to/from UTF-8
- :c:func:`g_filename_display_name`: human-readable version of a filename. Tries to decode the filename from each encoding of the :c:func:`g_get_filename_charsets` list. If all decoding attempts fail, decodes the filename from :ref:`UTF-8` and :ref:`replaces <replace>` :ref:`undecodable bytes <undecodable>` with � (U+FFFD).
- :c:func:`g_get_filename_charsets`: get the list of charsets used to decode and encode filenames. :c:func:`g_filename_display_name` tries each encoding of this list, other functions just use the first encoding. Use :ref:`UTF-8` on :ref:`Windows`. On other operating systems, use:

  - the G_FILENAME_ENCODING environment variable (if set): a comma-separated list of character set names, where the special token "@locale" is taken to mean the :ref:`locale encoding <locale encoding>`
  - or the :ref:`locale encoding <locale encoding>` (:c:func:`g_get_charset`) if the G_BROKEN_FILENAMES environment variable is set
  - or UTF-8 if neither variable is set
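As an illustration of what reading the first character of a UTF-8 string involves (the job :c:func:`g_utf8_get_char` does), here is a minimal sketch in plain C++. It is not glib's code, and ``first_code_point`` is a hypothetical name. Like the glib function, it assumes the input is valid UTF-8 and does no validation.

```cpp
#include <cassert>
#include <cstdint>

// Decode the first code point of a NUL-terminated UTF-8 string.
// Assumption: the input is well-formed UTF-8; invalid input gives a
// meaningless result.
std::uint32_t first_code_point(const char* s)
{
    assert(s != nullptr);
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
    unsigned char lead = p[0];

    if (lead < 0x80) {
        return lead;  // 1 byte: 0xxxxxxx
    }

    int extra;           // number of continuation bytes
    std::uint32_t cp;    // payload bits collected so far
    if ((lead & 0xE0) == 0xC0) {
        extra = 1; cp = lead & 0x1F;  // 2 bytes: 110xxxxx
    } else if ((lead & 0xF0) == 0xE0) {
        extra = 2; cp = lead & 0x0F;  // 3 bytes: 1110xxxx
    } else {
        extra = 3; cp = lead & 0x07;  // 4 bytes: 11110xxx
    }

    for (int i = 1; i <= extra; ++i) {
        cp = (cp << 6) | (p[i] & 0x3F);  // continuation byte: 10xxxxxx
    }
    return cp;
}
```

The result is a 32-bit value, which is why a type such as :c:type:`gunichar` can hold any code point up to U+10FFFF, unlike a 16-bit character type.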
By default, libiconv is :ref:`strict <strict>`: an :ref:`unencodable character
<unencodable>` raises an error. You can :ref:`ignore <ignore>` these characters
by adding the //IGNORE suffix to the encoding name. There is also the //TRANSLIT
suffix to :ref:`replace unencodable characters <translit>` with similar-looking
characters.
:ref:`PHP <php>` has a builtin binding of iconv.
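The strict default can be observed with the standard ``iconv_open()``, ``iconv()`` and ``iconv_close()`` functions. The sketch below is written in C++ against the POSIX ``<iconv.h>`` header. ``convert_from_utf8`` is a hypothetical wrapper, and the code assumes an implementation whose ``iconv()`` takes a ``char**`` input pointer, as glibc does.

```cpp
#include <cassert>
#include <cerrno>
#include <iconv.h>
#include <string>

// Convert UTF-8 text to `to_encoding` with iconv.
// Returns true on success. On failure, `out` holds the bytes that were
// converted before the error was reported.
bool convert_from_utf8(const char* to_encoding, const std::string& input,
                       std::string& out)
{
    assert(to_encoding != nullptr);
    out.clear();

    iconv_t cd = iconv_open(to_encoding, "UTF-8");
    if (cd == (iconv_t)-1) {
        return false;  // unknown encoding name
    }

    std::string in_copy = input;  // iconv() wants a non-const pointer
    char* in_ptr = &in_copy[0];
    size_t in_left = in_copy.size();

    char buffer[256];
    bool ok = true;
    while (in_left > 0) {
        char* out_ptr = buffer;
        size_t out_left = sizeof buffer;
        size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
        out.append(buffer, sizeof buffer - out_left);
        // E2BIG only means the output buffer is full: loop again.
        if (rc == (size_t)-1 && errno != E2BIG) {
            ok = false;  // e.g. EILSEQ for an unencodable character
            break;
        }
    }
    iconv_close(cd);
    return ok;
}
```

With the target ``"ASCII"``, converting the UTF-8 bytes of "café" is expected to fail because é is unencodable, whereas the target ``"ISO-8859-1"`` succeeds. Passing ``"ASCII//IGNORE"`` or ``"ASCII//TRANSLIT"`` as the target selects the other error handlers. The return value and the exact output in those modes differ between iconv implementations, so check the one you are using.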
International Components for Unicode (ICU) is a mature, widely used set of :ref:`C <c>`, :ref:`C++ <cpp>` and :ref:`Java <java>` libraries providing Unicode and globalization support for software applications. ICU is an open source project: releases up to ICU 57 were distributed under the MIT-style ICU License, and later releases use the Unicode License.