
Text Rendering

Tristan Gerritsen edited this page Aug 16, 2023 · 1 revision


Text rendering is a complex topic because it involves many steps (...). Retina uses the following workflow:

Character Set

At the lowest level, text is made up of characters, or, in Unicode terms, code points.

Basic ASCII contains only 128 code points: the Latin alphabet used by English, the digits, some symbols and control characters. Since English isn't the only language, and emoji are also nice to have, we need a standardized way of assigning numbers (code points) to a far larger set of characters.
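To make the distinction concrete, here is a small Python sketch (an illustration only, not Retina's code) that lists the code points of a string and checks whether the string fits in ASCII, i.e. whether every code point is below 0x80. The function names `code_points` and `fits_in_ascii` are hypothetical.

```python
def code_points(text: str) -> list[int]:
    """Return the Unicode code point of every character in `text`."""
    # Iterating over a Python str yields one code point at a time.
    return [ord(ch) for ch in text]


def fits_in_ascii(text: str) -> bool:
    """ASCII defines only the 128 code points U+0000..U+007F."""
    return all(cp < 0x80 for cp in code_points(text))


# "caf\u00e9" is "café" written with the precomposed é (U+00E9).
for word in ["Hello", "caf\u00e9"]:
    cps = " ".join(f"U+{cp:04X}" for cp in code_points(word))
    print(f"{word!r}: {cps} (ASCII: {fits_in_ascii(word)})")
```

The second word fails the check because U+00E9 lies outside the ASCII range, which is exactly the gap Unicode fills.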

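This is where Unicode comes in. The Unicode standard defines a code space of over a million code points (U+0000 to U+10FFFF), of which around 150,000 are currently assigned to characters, each with its own meaning (the exact number grows with every version). This makes it possible to encode complex non-Latin scripts, mathematical and other scientific symbols, emoji, and more.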

Unicode also specifies how certain code points combine and what the combination means. For example, U+200D ZERO WIDTH JOINER (ZWJ) joins multiple code points into a single complex character. ZWJ sequences are used not only in complex scripts, but also to compose emoji such as families and professions by gender, which can additionally carry skin tone modifiers (themselves separate code points, U+1F3FB to U+1F3FF).
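A ZWJ sequence is still just a list of code points. The Python sketch below (illustrative; it only uses the standard `unicodedata` module) spells out the woman technologist emoji, which is WOMAN + ZWJ + PERSONAL COMPUTER. Python's `len` counts code points, so what a user sees as one emoji has length 3; grouping such sequences into a single user-perceived character is a separate step.

```python
import unicodedata

ZWJ = "\u200d"  # U+200D ZERO WIDTH JOINER

# WOMAN (U+1F469) + ZWJ + PERSONAL COMPUTER (U+1F4BB) is displayed as a
# single emoji when the font supports the sequence.
woman_technologist = "\U0001F469" + ZWJ + "\U0001F4BB"

for ch in woman_technologist:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# len() counts code points, not user-perceived characters.
print(len(woman_technologist))
```

If a font lacks the combined glyph, renderers typically fall back to drawing the individual emoji side by side, since the ZWJ itself is invisible.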

Character Encoding

Whilst the Unicode specification provides a list of code points, we still need a way of representing them as bytes. Unicode code points range from U+0000 to U+10FFFF, so some fit in a single byte (U+0030, the digit 0, is just 0x30), whilst others need up to 21 bits.

The naive way of encoding these code points as bytes is to store each one as a 32-bit integer: the largest code point needs 21 bits, and 32 bits is the smallest common integer width that fits it. There is an official encoding that does exactly this: UTF-32. However, webpages largely consist of HTML, CSS and JavaScript, so the most common code points are those in the ASCII range, such as the English alphabet.
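As a rough illustration of why UTF-32 is wasteful for ASCII-heavy content, the Python sketch below encodes each code point as a fixed 4-byte little-endian integer (UTF-32LE, without a byte order mark) and compares the result with Python's built-in codecs. The function name `utf32le_encode` and the HTML snippet are made up for the example.

```python
import struct

MAX_CODE_POINT = 0x10FFFF
# The largest code point needs 21 bits; 32 bits is the next common integer width.
assert MAX_CODE_POINT.bit_length() == 21


def utf32le_encode(text: str) -> bytes:
    """Encode every code point as a fixed-width, little-endian 32-bit integer.

    Sketch only: unlike the real codec, this does not reject lone surrogates.
    """
    return b"".join(struct.pack("<I", ord(ch)) for ch in text)


html = "<p>Hello</p>"
utf32 = utf32le_encode(html)
assert utf32 == html.encode("utf-32-le")  # matches the standard codec

# For ASCII text, three of every four UTF-32 bytes are zero.
print(len(utf32), len(html.encode("utf-8")))
print(utf32[:8])
```

The 12-character snippet takes 48 bytes in UTF-32 but only 12 in UTF-8, since every character is ASCII.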

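Encoding these documents as UTF-32 would waste a lot of zero bytes (three out of every four for ASCII text), so most webpages use UTF-8 instead. UTF-8 uses one to four bytes per code point. ASCII code points (U+0000 to U+007F) take a single byte, identical to their ASCII encoding. For larger code points, the upper bits of the first byte signify how many bytes the sequence uses (110xxxxx for two, 1110xxxx for three, 11110xxx for four), and every following byte starts with the bits 10. This is called a variable-width encoding, and for ASCII-heavy documents it is much more compact.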
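The UTF-8 bit layout can be made concrete with a small Python encoder. This is a sketch for illustration, not Retina's implementation; `utf8_encode` and `sequence_length` are hypothetical names, and the output is checked against Python's built-in UTF-8 codec.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single Unicode scalar value as 1 to 4 UTF-8 bytes."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:  # 0xxxxxxx: ASCII is stored unchanged
        return bytes([cp])
    if cp < 0x800:  # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def sequence_length(lead: int) -> int:
    """Read the byte count of a UTF-8 sequence from the high bits of its first byte.

    Sketch only: overlong leads such as 0xC0 are not rejected here.
    """
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError(f"not a valid leading byte: {lead:#04x}")


for cp in (0x30, 0xE9, 0x20AC, 0x1F469):
    encoded = utf8_encode(cp)
    assert encoded == chr(cp).encode("utf-8")
    assert sequence_length(encoded[0]) == len(encoded)
    print(f"U+{cp:04X} -> {encoded.hex(' ')}")
```

A decoder works the other way around: it reads the leading byte, uses its high bits to learn how many continuation bytes follow, and reassembles the payload bits.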

Graphemes

Font Faces

Text Shaping

Kerning

Ligatures and other Font Features

Text Layout

Glyph Rasterization

Hinting

Subpixel/ClearType

Glyph Rendering

Gamma Correction

Alpha Blending

Masking