Provided detailed explanation of v4 slug regex
petemoore committed Feb 12, 2016
1 parent 5fc9b5e commit a2b7379
Showing 1 changed file with 146 additions and 1 deletion.
147 changes: 146 additions & 1 deletion README.rst
Generated slugs take the form ``[A-Za-z0-9_-]{22}``, or more precisely:
- ``slugid.nice()`` slugs conform to
``[A-Za-f][A-Za-z0-9_-]{7}[Q-T][A-Za-z0-9_-][CGKOSWaeimquy26-][A-Za-z0-9_-]{10}[AQgw]``

RFC 4122 fixes the values of six bits of a v4 UUID, so v4 slugs provide
128 - 6 = 122 bits of entropy. Because "nice" slugs additionally unset the
first bit, they provide 121 bits of entropy.
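The entropy figures can be cross-checked against the character classes of the two patterns in this section. The following is a minimal sketch, not part of the library: the names ``V4_CLASS_SIZES``, ``NICE_CLASS_SIZES`` and ``entropy_bits`` are made up for illustration, and the class sizes are read off the regular expressions by hand (for example ``[A-Za-f]`` has 32 members and ``[Q-T]`` has 4).

```python
import math

# Number of allowed characters at each of the 22 slug positions,
# read off the regular expressions by hand (illustrative only).
V4_CLASS_SIZES = [64] * 8 + [4, 64, 16] + [64] * 10 + [4]
NICE_CLASS_SIZES = [32] + [64] * 7 + [4, 64, 16] + [64] * 10 + [4]


def entropy_bits(class_sizes):
    """Sum log2 of each class size: the bits of entropy the pattern admits."""
    return sum(math.log2(size) for size in class_sizes)


print(entropy_bits(V4_CLASS_SIZES))    # 122.0
print(entropy_bits(NICE_CLASS_SIZES))  # 121.0
```

Every class size is a power of two, so the sums are exact.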

These are the six fixed bits:

- bit 48: ``0``
- bit 49: ``1``
- bit 50: ``0``
- bit 51: ``0``
- bit 64: ``1``
- bit 65: ``0``
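As a sanity check, the fixed bits can be read back from a UUID produced by Python's standard ``uuid`` module. This is a small sketch under the assumption that bit 0 is the most significant bit of the 128-bit value, as in the list above; ``FIXED_BITS``, ``bit`` and ``has_v4_fixed_bits`` are hypothetical helper names, not part of slugid.

```python
import uuid

# Fixed bit positions and values, counting from 0 at the most significant bit.
FIXED_BITS = {48: 0, 49: 1, 50: 0, 51: 0, 64: 1, 65: 0}


def bit(u, position):
    """Return the bit of UUID `u` at `position` (0 = most significant bit)."""
    return (u.int >> (127 - position)) & 1


def has_v4_fixed_bits(u):
    """True if all six fixed bits have the values required for a v4 UUID."""
    return all(bit(u, pos) == value for pos, value in FIXED_BITS.items())


print(has_v4_fixed_bits(uuid.uuid4()))      # True
print(has_v4_fixed_bits(uuid.UUID(int=0)))  # False: bits 49 and 64 are unset
```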

Splitting the 128 bits into groups of six shows the base64 character
boundaries. The final group holds only two UUID bits (126 and 127), padded
with four zero bits:

::

position:                                                                                                                      11 111111 111111 111111 111111 11
                      11 111111 112222 222222 333333 333344 444444 445555 555555 666666 666677 777777 778888 888888 999999 999900 000000 001111 111111 222222 22
           012345 678901 234567 890123 456789 012345 678901 234567 890123 456789 012345 678901 234567 890123 456789 012345 678901 234567 890123 456789 012345 67
bin:      |......|......|......|......|......|......|......|......|0100..|......|....10|......|......|......|......|......|......|......|......|......|......|..0000|
b64:      |  α   |  α   |  α   |  α   |  α   |  α   |  α   |  α   |  β   |  α   |  γ   |  α   |  α   |  α   |  α   |  α   |  α   |  α   |  α   |  α   |  α   |  δ   |
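The character boundaries in the diagram follow from integer division of the bit position by six. A minimal sketch of that arithmetic, with illustrative names (``char_index``, ``offset_in_char``) that do not come from the library:

```python
UUID_BITS = 128
BITS_PER_CHAR = 6

# Ceiling division: 128 bits need 22 six-bit characters.
SLUG_LENGTH = -(-UUID_BITS // BITS_PER_CHAR)
# Zero bits appended to fill the last character.
PADDING_BITS = SLUG_LENGTH * BITS_PER_CHAR - UUID_BITS

FIXED_BIT_POSITIONS = [48, 49, 50, 51, 64, 65]


def char_index(bit_position):
    """Zero-based index of the slug character that carries this bit."""
    return bit_position // BITS_PER_CHAR


def offset_in_char(bit_position):
    """Position of the bit inside its six-bit group (0 = leftmost)."""
    return bit_position % BITS_PER_CHAR


print(SLUG_LENGTH, PADDING_BITS)  # 22 4
for pos in FIXED_BIT_POSITIONS:
    print(pos, char_index(pos), offset_in_char(pos))
```

Bits 48 to 51 land in the first four bits of character 8, and bits 64 and 65 in the last two bits of character 10, which is where the diagram shows β and γ.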

Using the base64url encoding scheme, we can see which characters are
allowed at each of the 22 positions.


- α = ``0b......`` ∈ {

::

000000 A
000001 B
000010 C
000011 D
000100 E
000101 F
000110 G
000111 H
001000 I
001001 J
001010 K
001011 L
001100 M
001101 N
001110 O
001111 P
010000 Q
010001 R
010010 S
010011 T
010100 U
010101 V
010110 W
010111 X
011000 Y
011001 Z
011010 a
011011 b
011100 c
011101 d
011110 e
011111 f
100000 g
100001 h
100010 i
100011 j
100100 k
100101 l
100110 m
100111 n
101000 o
101001 p
101010 q
101011 r
101100 s
101101 t
101110 u
101111 v
110000 w
110001 x
110010 y
110011 z
110100 0
110101 1
110110 2
110111 3
111000 4
111001 5
111010 6
111011 7
111100 8
111101 9
111110 -
111111 _

}
- β = ``0b0100..`` ∈ {

::

010000 Q
010001 R
010010 S
010011 T

}
- γ = ``0b....10`` ∈ {

::

000010 C
000110 G
001010 K
001110 O
010010 S
010110 W
011010 a
011110 e
100010 i
100110 m
101010 q
101110 u
110010 y
110110 2
111010 6
111110 -

}
- δ = ``0b..0000`` ∈ {

::

000000 A
010000 Q
100000 g
110000 w

}
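The four sets above can also be derived mechanically from their bit patterns. The sketch below assumes the standard base64url alphabet (``A-Z``, ``a-z``, ``0-9``, ``-``, ``_``); ``ALPHABET`` and ``allowed_chars`` are hypothetical names used only for this illustration.

```python
import string

# base64url alphabet: the index of each character is its six-bit value.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"


def allowed_chars(pattern):
    """Return the characters whose six-bit value matches `pattern`.

    `pattern` is six characters long; each is '0', '1', or '.' (any bit).
    """
    return "".join(
        char
        for value, char in enumerate(ALPHABET)
        if all(p in (".", b) for p, b in zip(pattern, format(value, "06b")))
    )


print(allowed_chars("0100.."))  # QRST
print(allowed_chars("....10"))  # CGKOSWaeimquy26-
print(allowed_chars("..0000"))  # AQgw
```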

Thus we reach a 22-character encoding of:

- α{8}βαγα{10}δ

which, expanded into character classes, becomes:

- ``^[A-Za-z0-9_-]{8}[Q-T][A-Za-z0-9_-][CGKOSWaeimquy26-][A-Za-z0-9_-]{10}[AQgw]$``
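The pattern can be exercised against freshly generated values. The sketch below is not the library's implementation: it assumes a v4 slug is the base64url encoding of a v4 UUID's 16 bytes with the trailing ``==`` padding removed, and ``v4_slug`` is a hypothetical stand-in written for this example.

```python
import base64
import re
import uuid

V4_SLUG = re.compile(
    r"^[A-Za-z0-9_-]{8}[Q-T][A-Za-z0-9_-][CGKOSWaeimquy26-][A-Za-z0-9_-]{10}[AQgw]$"
)


def v4_slug():
    """Illustrative only: base64url-encode a v4 UUID and strip the padding."""
    encoded = base64.urlsafe_b64encode(uuid.uuid4().bytes)
    return encoded.decode("ascii").rstrip("=")


slug = v4_slug()
print(slug, bool(V4_SLUG.match(slug)))
```

Every slug produced this way should match, since the six fixed bits and the four padding bits are the only constraints the pattern encodes.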

Usage
-----