## BIP 39 + BIP 32 Python Implementation

**BIP 39** describes the generation of mnemonic sentences (AKA seed/recovery phrases) ranging from 12-24 words. From those words, a 512-bit seed can be derived.

More comprehensive step explanations can be found in the accompanying blog post: [Ethereum 201: Mnemonics](TODO).

_Disclaimer: this code is for educational purposes and not optimized for a production environment._

### Generate entropy

A random number between 128 and 256 bits. Must be a multiple of 32.

In [11]:
import os

# valid_entropy_bit_sizes = [128, 160, 192, 224, 256]
entropy_bit_size = 128
entropy_bytes = os.urandom(entropy_bit_size // 8)

# view bytes:
print(entropy_bytes)
# e.g. b'\x02@\x1f\x17\xd8GF\xd2\x8b\x18\xb5\xef\xbd\xd8\x1c\x96'

# view bits:
from bitarray import bitarray
entropy_bits = bitarray()
entropy_bits.frombytes(entropy)
print(entropy_bits)
# e.g. bitarray('0000001001000000000111...0110000001110010010110')

b'\x1em\x8a\x9f\x95]\xcc\xe9\x1a\x90\x0eTrJF\xfe'
bitarray('10101100101010110101010100111110111110111110010010001001110100000000100111011111001011011001010001100111011100110111100101010001')


### Make the entropy evenly disible by 11 bits

A checksum needs to be added to make the entropy evenly divisible by 11. (You'll see why later.) Divide the entropy size by 32 to get the length of checksum required:

In [12]:
checksum_length = entropy_bit_size // 32

print(checksum_length)
# 4

4


Which four bits? The first four bits of the hashed entropy:

In [13]:
from hashlib import sha256
hash = sha256(entropy).digest()

# print(hash)
# e.g. b'\x1ay\xc9&[\x8a\xe03Z\x8f\xa4...'

hash_bits = bitarray()
hash_bits.frombytes(hash)

# print(hash_bits)
# e.g. bitarray('0001101001111...')

checksum = hash_bits[:checksum_length]
print(checksum)
# e.g. bitarray('0001')

bitarray('0011')


Add those first four bits to the end of the unhashed entropy:

In [14]:
print(len(entropy_bits))
# e.g. 128

entropy_bits.extend(checksum)

print(len(entropy_bits))
# e.g. 132

128
132


### Split the entropy into groups of 11 bits

The number of groups is the number of mnemonic words that will be produced.

In [16]:
grouped_bitarrays = tuple(entropy_bits[i * 11: (i + 1) * 11] for i in range(len(entropy_bits) // 11))

print(grouped_bitarrays)
# e.g. (bitarray('00000010010'), bitarray('00000000111'), ...)

print(len(grouped_bitarrays))
# e.g. 12

(bitarray('10101100101'), bitarray('01011010101'), bitarray('01001111101'), bitarray('11110111110'), bitarray('01001000100'), bitarray('11101000000'), bitarray('00100111011'), bitarray('11100101101'), bitarray('10010100011'), bitarray('00111011100'), bitarray('11011110010'), bitarray('10100010011'))
12


Convert the bitarrays to integers. Each 11-bit number is between 0-2047.

In [17]:
from bitarray.util import ba2int
indices = tuple(ba2int(ba) for ba in grouped_bitarrays)

print(indices)
# e.g. (18, 7, 1583, 1412, 931, 842, 355, 181, 1917, 1910, 57, 353)

(1381, 725, 637, 1982, 580, 1856, 315, 1837, 1187, 476, 1778, 1299)


### Convert each index into an English word

The BIP 39 spec links to official word lists for several languages. There are 2048 words in each list - one for each possible 11-bit number. Load the words into memory and swap out each index for its corresponding English word:

In [18]:
with open('english.txt', 'r') as file:
    english_word_list = file.read().strip().split()

print(len(english_word_list))
print(english_word_list[:5])
# ['abandon', 'ability', 'able', 'about', 'above']

2048
['abandon', 'ability', 'able', 'about', 'above']


In [22]:
words = tuple(english_word_list[i] for i in indices)

# print(words)
# e.g. ('across', 'abstract', 'shine', 'rack', 'inner', 'harsh', 
#  'cluster', 'birth', 'use', 'uphold', 'already', 'club')

mnemonic_string = ' '.join(words)
print(mnemonic_string)
# 'across abstract shine ... uphold already club'

protect follow exile water embrace trend cherry toss nerve describe tattoo pen


### Generate the seed

Use a password-based key derivation function (PBKDF2) to create the seed.

Bonus security: you can set an optional passphrase to be included in the salt. (Defaults to empty string.)

In [24]:
salt = "mnemonic" # + passphrase (optional)

import hashlib
seed = hashlib.pbkdf2_hmac(
   "sha512", 
   mnemonic_string.encode("utf-8"), 
   salt.encode("utf-8"), 
   2048
)

print(seed)
# b"\xf8\xb7W}\xba\x02Wx\xb9\xbf$\xf8..."

print(len(seed))
# 64 (bytes, i.e. 512 bits)

print(seed.hex())
# Behold: your seed!

b'a\xcby\xd6Q\x16\xa7\xca\x0c\xa2ka\x8f\xceB2s$@\n.k\xb6<\x88\xc6* \xfe\x95\xaf\xb5\xad_\x93\x99\xa0\x97\x8a\xb2P\xcd\x89$\x97C$L!mC`\x9e\xc0t\xd5Bh\x00R\x81\x18\xb6|'
64
61cb79d65116a7ca0ca26b618fce42327324400a2e6bb63c88c62a20fe95afb5ad5f9399a0978ab250cd89249743244c216d43609ec074d5426800528118b67c


Seed generated!

Next up: **BIP 32** (coming soon)