# Decomposition and Composition

In these cases we use Unicode normalization to *normalize* our characters so that equivalent sequences compare as equal. Just as there are different forms of equivalence, there are different forms of normalization. Each is called a **N**ormal **F**orm, and there are four of them:

| Name | Abbreviation | Description | Example |
| --- | --- | --- | --- |
| Form D | NFD | *Canonical* decomposition | `Ç` → `C ̧` |
| Form C | NFC | *Canonical* decomposition followed by *canonical* composition | `Ç` → `C ̧` → `Ç` |
| Form KD | NFKD | *Compatibility* decomposition | `ℌ ̧` → `H ̧` |
| Form KC | NFKC | *Compatibility* decomposition followed by *canonical* composition | `ℌ ̧` → `H ̧` → `Ḩ` |
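
The rows of the table can be checked directly. The following is a minimal sketch that applies all four forms to the two example strings from the table and prints the resulting code points. The helper `code_points` and the `samples` and `results` dictionaries are names made up for this illustration; only `unicodedata.normalize` comes from the standard library.

```python
import unicodedata


def code_points(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


# The two example strings from the table (labels are illustrative only).
samples = {
    "C with cedilla": "\u00C7",
    "black-letter H + combining cedilla": "\u210C\u0327",
}

results = {}
for label, text in samples.items():
    for form in ("NFD", "NFC", "NFKD", "NFKC"):
        normalized = unicodedata.normalize(form, text)
        results[(label, form)] = code_points(normalized)
        print(f"{label:36} {form:5} {' '.join(results[(label, form)])}")
```

Note that the canonical forms (NFD, NFC) leave `ℌ` untouched, because it only has a compatibility decomposition; only NFKD and NFKC replace it with a plain `H`.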

Let's take a look at these forms in action. Our C with cedilla character Ç can be represented in two ways: as a single character called *Latin capital letter C with cedilla* (*\u00C7*), or as two characters called *Latin capital letter C* (*\u0043*) and *combining cedilla* (*\u0327*):

In [6]:
import unicodedata

In [2]:
c_with_cedilla = "\u00C7"
c_with_cedilla

'Ç'

In [3]:
c_plus_cedilla = "\u0043\u0327"
c_plus_cedilla

'Ç'

In [4]:
c_with_cedilla == c_plus_cedilla

False

In [7]:
unicodedata.normalize('NFD', c_with_cedilla) == c_plus_cedilla

True

In [8]:
unicodedata.normalize('NFC', c_plus_cedilla) == c_with_cedilla

True

In [9]:
unicodedata.normalize('NFC', c_plus_cedilla) == unicodedata.normalize('NFC', c_with_cedilla)

True
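
The comparisons above suggest a general pattern: normalize both sides to the same form before comparing. Below is a minimal sketch of that idea, extended to the compatibility forms from the table. The function `same_text` and the variable names are hypothetical, introduced only for this example; which form to use is an assumption that depends on whether compatibility characters such as `ℌ` should count as equal to their plain counterparts.

```python
import unicodedata


def same_text(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same form.

    Hypothetical helper for illustration; `form` may be any of
    'NFC', 'NFD', 'NFKC', or 'NFKD'.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


c_with_cedilla = "\u00C7"        # single precomposed character
c_plus_cedilla = "\u0043\u0327"  # C followed by combining cedilla

fraktur_h_plus_cedilla = "\u210C\u0327"  # black-letter H + combining cedilla
h_with_cedilla = "\u1E28"                # Latin capital letter H with cedilla

# Canonically equivalent strings match under a canonical form.
canonical_match = same_text(c_with_cedilla, c_plus_cedilla)

# A compatibility character does not match under a canonical form...
compat_under_nfc = same_text(fraktur_h_plus_cedilla, h_with_cedilla)

# ...but does match once a compatibility form is used.
compat_under_nfkc = same_text(fraktur_h_plus_cedilla, h_with_cedilla, form="NFKC")

print(canonical_match, compat_under_nfc, compat_under_nfkc)
```

Normalizing both arguments inside the helper means callers do not need to know which representation either string arrived in.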