Libraries
=========

Many programming languages have no Unicode support, or only basic support. Libraries are required to get full Unicode support on all platforms.

Qt library
----------

Qt is a large :ref:`C++ <cpp>` library covering many areas, but it is mostly used to create graphical user interfaces. It is distributed under the GNU LGPL license (version 2.1) and is also available under a commercial license.

Character and string classes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

QChar is a Unicode character that can only store :ref:`BMP characters <bmp>`, because it is implemented as a 16-bit unsigned number. Interesting QChar methods:

QString is a :ref:`character string <str>` implemented as an array of QChar using :ref:`UTF-16 <utf16>`. A :ref:`non-BMP character <bmp>` is stored as two QChar values (a :ref:`surrogate pair <surrogates>`). Interesting QString methods:
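
The following C sketch shows how a code point is split into UTF-16 code units, which is how a QString stores a non-BMP character. It does not use Qt: each ``uint16_t`` plays the role of one QChar, and the names ``utf16_encode()`` and ``utf16_decode_pair()`` are hypothetical.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Encode one code point as UTF-16 code units (what QString stores).
       Returns the number of 16-bit units written (1 or 2), or 0 if the
       value is a surrogate or is above U+10FFFF. */
    size_t utf16_encode(uint32_t cp, uint16_t out[2])
    {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        if (cp <= 0xFFFF) {             /* BMP: one unit, like one QChar */
            out[0] = (uint16_t)cp;
            return 1;
        }
        cp -= 0x10000;                  /* 20 bits remain: 10 high + 10 low */
        out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
        return 2;
    }

    /* Combine a surrogate pair back into a single code point. */
    uint32_t utf16_decode_pair(uint16_t high, uint16_t low)
    {
        return 0x10000 + (((uint32_t)(high - 0xD800) << 10)
                          | (uint32_t)(low - 0xDC00));
    }

For example, U+1F600 becomes the pair 0xD83D 0xDE00, so a QString containing only that character has a length of 2.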

Qt :ref:`decodes <decode>` literal byte strings from :ref:`ISO-8859-1` using the QLatin1String class, a thin wrapper around :c:type:`char*`. QLatin1String is a character string storing each character as a single byte, which is possible because it only supports characters in the U+0000–U+00FF range. QLatin1String cannot be used to manipulate text: its API is much smaller than the QString API. For example, it is not possible to concatenate two QLatin1String strings.
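
Because ISO-8859-1 maps each byte value 0x00–0xFF to the code point with the same number, decoding it to UTF-16 is a plain widening copy, and encoding back only fails for code units above 0xFF. A minimal C sketch of both directions, without Qt (``latin1_to_utf16()`` and ``utf16_to_latin1()`` are hypothetical names):

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Decode ISO-8859-1 bytes to UTF-16: each byte value is the code point,
       so this can never fail. */
    void latin1_to_utf16(const char *in, size_t len, uint16_t *out)
    {
        for (size_t i = 0; i < len; i++)
            out[i] = (uint16_t)(unsigned char)in[i];  /* byte 0xE9 -> U+00E9 */
    }

    /* Encode UTF-16 code units to ISO-8859-1. Returns 0 on success, or -1 if
       a unit is outside U+0000-U+00FF (it does not fit in one byte).
       On failure, out may be partially written. */
    int utf16_to_latin1(const uint16_t *in, size_t len, char *out)
    {
        for (size_t i = 0; i < len; i++) {
            if (in[i] > 0xFF)
                return -1;
            out[i] = (char)(unsigned char)in[i];
        }
        return 0;
    }

Encoding U+20AC (the euro sign) with ``utf16_to_latin1()`` fails, which illustrates why text outside U+0000–U+00FF needs QString rather than QLatin1String.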

Codec
~~~~~

QTextCodec.codecForLocale() gets the locale encoding codec:
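
On POSIX systems, a C program can ask for the name of the locale encoding with ``nl_langinfo(CODESET)`` after calling ``setlocale()``. This sketch shows only that standard POSIX query; it is not Qt's implementation, and ``locale_encoding()`` is a hypothetical helper.

.. code-block:: c

    #include <langinfo.h>
    #include <locale.h>

    /* Return the name of the locale encoding, for example "UTF-8".
       setlocale(LC_CTYPE, "") applies the user's LC_ALL, LC_CTYPE or LANG
       environment variables; note that it changes the process locale. */
    const char *locale_encoding(void)
    {
        setlocale(LC_CTYPE, "");
        return nl_langinfo(CODESET);
    }

Without the ``setlocale()`` call, a C program stays in the "C" locale, whose encoding the GNU C library reports as ``ANSI_X3.4-1968`` (ASCII).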

Filesystem
~~~~~~~~~~

QFile.encodeName():

QFile.decodeName() is the reverse operation.

.. todo:: what about undecodable filenames?

Qt has two implementations of its QFSFileEngine:

* Windows: uses the native Windows API
* UNIX: uses the POSIX API, for example fopen(), getcwd() or get_current_dir_name(), mkdir(), etc.

Related classes: QFile, QFileInfo, QAbstractFileEngineHandler, QFSFileEngine.

The glib library
----------------

The glib library is a general-purpose :ref:`C <c>` library distributed under the GNU LGPL license (version 2.1).

Character strings
~~~~~~~~~~~~~~~~~

The :c:type:`gunichar` type is a character, stored as a 32-bit unsigned integer. It is able to store any Unicode character (U+0000–U+10FFFF).
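
glib provides ``g_unichar_validate()`` to check whether a :c:type:`gunichar` holds a valid character. The standalone sketch below shows a simplified version of such a check without glib: a value is accepted if it is at most U+10FFFF and is not a surrogate. ``unichar`` and ``unichar_is_valid()`` are hypothetical stand-ins, and glib's own rules may differ in details.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical stand-in for gunichar, which is a 32-bit unsigned type. */
    typedef uint32_t unichar;

    /* A Unicode scalar value: in U+0000-U+10FFFF and not a surrogate
       (U+D800-U+DFFF are reserved for UTF-16 surrogate pairs). */
    bool unichar_is_valid(unichar c)
    {
        return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
    }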

The glib library has no :ref:`character string <str>` type. It uses :ref:`byte strings <bytes>` of type :c:type:`gchar*`, but most functions expect these byte strings to be :ref:`UTF-8` encoded.
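
Since most glib functions expect UTF-8, a :c:type:`gunichar` often has to be converted to UTF-8 bytes; glib provides ``g_unichar_to_utf8()`` for this. The standalone sketch below shows what such a conversion involves, without glib. ``utf8_encode()`` is a hypothetical name, and it returns 0 for surrogates and values above U+10FFFF.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Encode one code point as UTF-8 into buf (at least 4 bytes).
       Returns the number of bytes written, or 0 for an invalid code point. */
    size_t utf8_encode(uint32_t cp, unsigned char buf[4])
    {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        if (cp < 0x80) {                /* ASCII: 1 byte, 0xxxxxxx */
            buf[0] = (unsigned char)cp;
            return 1;
        }
        if (cp < 0x800) {               /* 2 bytes: 110xxxxx 10xxxxxx */
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
            return 2;
        }
        if (cp < 0x10000) {             /* 3 bytes: rest of the BMP */
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
            return 3;
        }
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));  /* 4 bytes: non-BMP */
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }

For example, U+00E9 is encoded as the two bytes 0xC3 0xA9 and U+20AC as the three bytes 0xE2 0x82 0xAC.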

Codec functions
~~~~~~~~~~~~~~~

Filename functions
~~~~~~~~~~~~~~~~~~

iconv library
-------------

libiconv is a library to encode and decode text in different encodings. It is distributed under the GNU LGPL license. It supports many encodings, including rare and old ones.
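
A minimal sketch of a conversion with the iconv API (``iconv_open()``, ``iconv()``, ``iconv_close()``), which libiconv shares with the iconv built into the GNU C library. ``convert()`` is a hypothetical helper, and encoding names such as ``"ISO-8859-1"`` are assumed to be supported by the iconv implementation in use.

.. code-block:: c

    #include <errno.h>
    #include <iconv.h>
    #include <string.h>

    /* Convert the NUL-terminated string in from encoding `from` to encoding
       `to`, writing a NUL-terminated result into out (outsize >= 1).
       Returns 0 on success, or -1 on failure with errno set. */
    int convert(const char *from, const char *to,
                const char *in, char *out, size_t outsize)
    {
        iconv_t cd = iconv_open(to, from);  /* target encoding comes first */
        if (cd == (iconv_t)-1)
            return -1;                      /* unsupported conversion */

        char *inptr = (char *)in;           /* glibc declares char ** here */
        size_t inleft = strlen(in);
        char *outptr = out;
        size_t outleft = outsize - 1;       /* keep room for the final NUL */

        size_t res = iconv(cd, &inptr, &inleft, &outptr, &outleft);
        if (res != (size_t)-1)              /* flush any shift state */
            res = iconv(cd, NULL, NULL, &outptr, &outleft);
        int saved_errno = errno;
        iconv_close(cd);
        if (res == (size_t)-1) {
            errno = saved_errno;            /* e.g. EILSEQ or E2BIG */
            return -1;
        }
        *outptr = '\0';
        return 0;
    }

Note the argument order of ``iconv_open()``: the target encoding is passed first and the source encoding second.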

By default, libiconv is :ref:`strict <strict>`: an :ref:`unencodable character <unencodable>` raises an error. You can :ref:`ignore <ignore>` these characters by adding the ``//IGNORE`` suffix to the target encoding name. There is also the ``//TRANSLIT`` suffix to :ref:`replace unencodable characters <translit>` with similar-looking characters.
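
In strict mode, ``iconv()`` fails with ``errno`` set to ``EILSEQ`` when it meets an unencodable character. The sketch below checks this; ``try_encode()`` is a hypothetical helper. Appending ``//IGNORE`` or ``//TRANSLIT`` to the target encoding name (for example ``"ISO-8859-1//TRANSLIT"``) changes this behavior, but the exact results may differ between libiconv and the GNU C library iconv, so they are not tested here.

.. code-block:: c

    #include <errno.h>
    #include <iconv.h>
    #include <string.h>

    /* Try to encode the UTF-8 string utf8 to encoding `to` (strict mode).
       Returns 0 on success, or the errno value reported by iconv:
       EILSEQ for an unencodable character, EINVAL for an unknown encoding. */
    int try_encode(const char *to, const char *utf8)
    {
        char out[64];                       /* enough for short test strings */
        char *inptr = (char *)utf8;
        size_t inleft = strlen(utf8);
        char *outptr = out;
        size_t outleft = sizeof out;

        iconv_t cd = iconv_open(to, "UTF-8");
        if (cd == (iconv_t)-1)
            return errno;
        size_t res = iconv(cd, &inptr, &inleft, &outptr, &outleft);
        int err = (res == (size_t)-1) ? errno : 0;
        iconv_close(cd);
        return err;
    }

For example, the euro sign (U+20AC) has no ISO-8859-1 byte, so encoding ``"10 €"`` to ISO-8859-1 fails with ``EILSEQ``.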

:ref:`PHP <php>` has a built-in binding to iconv.

ICU libraries
-------------

International Components for Unicode (ICU) is a mature, widely used set of :ref:`C <c>`, :ref:`C++ <cpp>` and :ref:`Java <java>` libraries providing Unicode and Globalization support for software applications. ICU is an open source project distributed under the MIT license.

libunistring
------------

libunistring provides functions for manipulating Unicode strings and for manipulating C strings according to the Unicode standard. It is distributed under the GNU LGPL license version 3.