Programming languages have either no Unicode support or only basic support. Libraries are required to get full Unicode support on all platforms.

Qt library

Qt is a big :ref:`C++ <cpp>` library covering different topics, but it is typically used to create graphical interfaces. It is distributed under the GNU LGPL license (version 2.1), and is also available under a commercial license.

Character and string classes

QChar is a Unicode character, only able to store :ref:`BMP characters <bmp>`. It is implemented as a 16-bit unsigned integer. Interesting QChar methods:

QString is a :ref:`character string <str>` implemented as an array of QChar using :ref:`UTF-16 <utf16>`. A :ref:`non-BMP character <bmp>` is stored as two QChar values (a :ref:`surrogate pair <surrogates>`). Interesting QString methods:
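
The surrogate-pair arithmetic behind this storage scheme can be sketched in plain C. This is only an illustration of UTF-16, not Qt code: the function names ``encode_surrogate_pair`` and ``decode_surrogate_pair`` are hypothetical, and ``uint16_t`` stands in for the 16-bit storage unit of a QChar.

```c
#include <assert.h>
#include <stdint.h>

/* Split a non-BMP code point (U+10000..U+10FFFF) into a UTF-16
 * surrogate pair. Hypothetical helper, for illustration only. */
void encode_surrogate_pair(uint32_t code_point, uint16_t *high, uint16_t *low)
{
    assert(code_point >= 0x10000 && code_point <= 0x10FFFF);
    uint32_t offset = code_point - 0x10000;         /* 20-bit value */
    *high = (uint16_t)(0xD800 + (offset >> 10));    /* top 10 bits */
    *low = (uint16_t)(0xDC00 + (offset & 0x3FF));   /* bottom 10 bits */
}

/* Reverse operation: combine a surrogate pair into one code point. */
uint32_t decode_surrogate_pair(uint16_t high, uint16_t low)
{
    assert(high >= 0xD800 && high <= 0xDBFF);
    assert(low >= 0xDC00 && low <= 0xDFFF);
    return 0x10000 + (((uint32_t)(high - 0xD800) << 10)
                      | (uint32_t)(low - 0xDC00));
}
```

For example, U+1F600 is split into the two 16-bit units 0xD83D and 0xDE00, which is why a single non-BMP character counts as two QChar values in a QString.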

Qt :ref:`decodes <decode>` literal byte strings from :ref:`ISO-8859-1` using the QLatin1String class, a thin wrapper around :c:type:`char*`. QLatin1String is a character string storing each character as a single byte, which is possible because it only supports characters in the U+0000-U+00FF range. QLatin1String is not meant for manipulating text: it has a smaller API than QString. For example, it is not possible to concatenate two QLatin1String strings.
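
Decoding ISO-8859-1 is cheap because each byte value is equal to the code point it represents. A minimal sketch in plain C follows. It is not Qt's implementation: the function name ``latin1_to_utf16`` is hypothetical, and ``uint16_t`` is assumed as the 16-bit storage unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode `length` ISO-8859-1 bytes into 16-bit code units.
 * Every byte maps to exactly one unit in U+0000..U+00FF.
 * Returns the number of units written (always `length`). */
size_t latin1_to_utf16(const char *bytes, size_t length, uint16_t *units)
{
    assert(bytes != NULL && units != NULL);
    for (size_t i = 0; i < length; i++) {
        /* Go through unsigned char so that bytes >= 0x80 are not
         * sign-extended on platforms where char is signed. */
        units[i] = (uint16_t)(unsigned char)bytes[i];
    }
    return length;
}
```

The byte 0xE9 becomes the unit 0x00E9 (U+00E9), so no lookup table and no error handling are needed: every byte sequence is valid ISO-8859-1.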


QTextCodec.codecForLocale() gets the locale encoding codec:



QFile.decodeName() is the reverse operation of QFile.encodeName(): it decodes a filename from a byte string to a QString.

Qt has two implementations of its QFSFileEngine:

  • Windows: uses the native Windows API
  • UNIX: uses the POSIX API. Examples: fopen(), getcwd() or get_current_dir_name(), mkdir(), etc.

Related classes: QFile, QFileInfo, QAbstractFileEngineHandler, QFSFileEngine.

The glib library

The glib library is a great :ref:`C <c>` library distributed under the GNU LGPL license (version 2.1).

Character strings

The :c:type:`gunichar` type is a character. It is able to store any Unicode 6.0 character (U+0000—U+10FFFF).

The glib library has no :ref:`character string <str>` type. It uses :ref:`byte strings <bytes>` of type :c:type:`gchar*`, but most functions expect :ref:`UTF-8` encoded strings.
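
To make the relation between a 32-bit character value and a UTF-8 byte string concrete, here is a minimal sketch in plain C. It is not glib code, and glib ships its own conversion functions. The name ``encode_utf8`` is hypothetical, and ``uint32_t`` and ``unsigned char`` are used as stand-ins for the glib types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into `out` (room for 4 bytes).
 * Returns the number of bytes written, or 0 if the value cannot be
 * encoded (a surrogate, or a value beyond U+10FFFF). */
size_t encode_utf8(uint32_t code_point, unsigned char *out)
{
    assert(out != NULL);
    if (code_point < 0x80) {                       /* 1 byte: ASCII */
        out[0] = (unsigned char)code_point;
        return 1;
    }
    if (code_point < 0x800) {                      /* 2 bytes */
        out[0] = (unsigned char)(0xC0 | (code_point >> 6));
        out[1] = (unsigned char)(0x80 | (code_point & 0x3F));
        return 2;
    }
    if (code_point >= 0xD800 && code_point <= 0xDFFF)
        return 0;                                  /* surrogates are not characters */
    if (code_point < 0x10000) {                    /* 3 bytes: rest of the BMP */
        out[0] = (unsigned char)(0xE0 | (code_point >> 12));
        out[1] = (unsigned char)(0x80 | ((code_point >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (code_point & 0x3F));
        return 3;
    }
    if (code_point <= 0x10FFFF) {                  /* 4 bytes: non-BMP */
        out[0] = (unsigned char)(0xF0 | (code_point >> 18));
        out[1] = (unsigned char)(0x80 | ((code_point >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((code_point >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (code_point & 0x3F));
        return 4;
    }
    return 0;                                      /* outside the Unicode range */
}
```

With this sketch, U+20AC is encoded as the three bytes 0xE2 0x82 0xAC, which shows why the length of a UTF-8 byte string in bytes generally differs from its length in characters.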

Codec functions

Filename functions

iconv library

libiconv is a library to encode and decode text in different encodings. It is distributed under the GNU LGPL license. It supports many encodings, including rare and old ones.

By default, libiconv is :ref:`strict <strict>`: an :ref:`unencodable character <unencodable>` raises an error. You can :ref:`ignore <ignore>` these characters by adding the //IGNORE suffix to the encoding name. There is also the //TRANSLIT suffix to :ref:`replace unencodable characters <translit>` with similar-looking characters.
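
The strict default can be observed with the standard ``iconv_open()``, ``iconv()`` and ``iconv_close()`` functions. The sketch below is a minimal example under some assumptions: the wrapper name ``convert_from_utf8`` is hypothetical, the input is a NUL-terminated UTF-8 string, and the ``EILSEQ`` error code for an unencodable character is the behaviour of GNU libiconv and glibc (other implementations may differ).

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert a NUL-terminated UTF-8 string to `to_encoding`.
 * The result is written to `output` and is always NUL-terminated.
 * Returns 0 on success, or the errno value reported by iconv. */
int convert_from_utf8(const char *to_encoding, const char *input,
                      char *output, size_t output_size)
{
    assert(output_size > 0);
    iconv_t cd = iconv_open(to_encoding, "UTF-8");
    if (cd == (iconv_t)-1)
        return errno;                    /* unknown encoding name */

    char *in = (char *)input;            /* iconv() wants a non-const char ** */
    size_t in_left = strlen(input);
    char *out = output;
    size_t out_left = output_size - 1;   /* keep room for the final NUL */

    int error = 0;
    if (iconv(cd, &in, &in_left, &out, &out_left) == (size_t)-1)
        error = errno;                   /* e.g. EILSEQ or E2BIG */
    *out = '\0';
    iconv_close(cd);
    return error;
}
```

Converting ``"abc"`` to ``"ASCII"`` succeeds, whereas a string containing U+00E9 fails with ``EILSEQ`` in strict mode. Passing ``"ASCII//IGNORE"`` or ``"ASCII//TRANSLIT"`` as ``to_encoding`` selects the other error handlers; their exact output depends on the iconv implementation and, for transliteration, on the current locale.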

:ref:`PHP <php>` has a built-in binding for iconv.

ICU libraries

International Components for Unicode (ICU) is a mature, widely used set of :ref:`C <c>`, :ref:`C++ <cpp>` and :ref:`Java <java>` libraries providing Unicode and Globalization support for software applications. ICU is an open source project distributed under the MIT license.


libunistring

libunistring provides functions for manipulating Unicode strings and for manipulating C strings according to the Unicode standard. It is distributed under the GNU LGPL license version 3.