## BIP 39 + BIP 32 Python Implementation

**BIP 39** describes the generation of mnemonic sentences (AKA seed/recovery phrases) of 12 to 24 words. From those words, a 512-bit seed can be derived.

More comprehensive step explanations can be found in the accompanying blog post: [Ethereum 201: Mnemonics](TODO).

_Disclaimer: this code is for educational purposes and not optimized for a production environment._

### Generate entropy

Start with a random number of 128 to 256 bits. The size in bits must be a multiple of 32.

In [1]:
import os

# valid_entropy_bit_sizes = [128, 160, 192, 224, 256]
entropy_bit_size = 128
entropy_bytes = os.urandom(entropy_bit_size // 8)

print(entropy_bytes)
# e.g. b'\x02@\x1f\x17\xd8GF\xd2\x8b\x18\xb5\xef\xbd\xd8\x1c\x96'

# bitarray is a third-party package (pip install bitarray)
from bitarray import bitarray
entropy_bits = bitarray()
entropy_bits.frombytes(entropy_bytes)
print(entropy_bits)
# e.g. bitarray('0000001001000000000111...0110000001110010010110')

b',\x04q\x0f\x10\xff7\x0eH\xcd><2\x06~\xa2'
bitarray('00101100000001000111000100001111000100001111111100110111000011100100100011001101001111100011110000110010000001100111111010100010')
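
The size rule above can also be enforced in code before any bytes are drawn. The following is a minimal sketch, not part of the original notebook: the helper name `generate_entropy` is hypothetical, and it swaps in the standard library's `secrets.token_bytes` where the cell above uses `os.urandom`.

```python
import secrets

# Sizes allowed by the rule above: 128 to 256 bits, in multiples of 32
VALID_ENTROPY_BIT_SIZES = (128, 160, 192, 224, 256)

def generate_entropy(entropy_bit_size=128):
    """Hypothetical helper: return random entropy bytes of a valid size."""
    if entropy_bit_size not in VALID_ENTROPY_BIT_SIZES:
        raise ValueError(f"entropy size must be one of {VALID_ENTROPY_BIT_SIZES}")
    # secrets.token_bytes draws from the operating system's secure random source
    return secrets.token_bytes(entropy_bit_size // 8)

entropy_bytes = generate_entropy(128)
print(len(entropy_bytes) * 8)
```

Rejecting an invalid size up front keeps the later checksum and grouping steps from silently producing a malformed mnemonic.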


### Make the entropy evenly divisible by 11 bits

A checksum needs to be appended so that the total number of bits is evenly divisible by 11. (You'll see why later.) Divide the entropy size in bits by 32 to get the required checksum length in bits:

In [2]:
checksum_length = entropy_bit_size // 32

print(checksum_length)
# e.g. 4

4
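
To see why the divide-by-32 rule always yields a total divisible by 11, it can help to tabulate every allowed entropy size. This is a small sketch; the helper name `mnemonic_layout` is hypothetical and not part of the notebook.

```python
# Hypothetical helper: derive checksum length and word count from the
# entropy size, using the divide-by-32 rule described above.
VALID_ENTROPY_BIT_SIZES = (128, 160, 192, 224, 256)

def mnemonic_layout(entropy_bit_size):
    """Return (checksum_bits, total_bits, word_count) for an entropy size."""
    if entropy_bit_size not in VALID_ENTROPY_BIT_SIZES:
        raise ValueError(f"unsupported entropy size: {entropy_bit_size}")
    checksum_bits = entropy_bit_size // 32
    total_bits = entropy_bit_size + checksum_bits
    # total_bits is a multiple of 11 for every valid size
    word_count = total_bits // 11
    return checksum_bits, total_bits, word_count

for size in VALID_ENTROPY_BIT_SIZES:
    checksum_bits, total_bits, word_count = mnemonic_layout(size)
    print(size, checksum_bits, total_bits, word_count)
```

For 128 bits of entropy this gives a 4-bit checksum, 132 total bits, and 12 words; for 256 bits, an 8-bit checksum, 264 total bits, and 24 words.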


Which four bits? The first four bits of the SHA-256 hash of the entropy:

In [3]:
from hashlib import sha256
hash_bytes = sha256(entropy_bytes).digest()

# print(hash_bytes)
# e.g. b'\x1ay\xc9&[\x8a\xe03Z\x8f\xa4...'

hash_bits = bitarray()
hash_bits.frombytes(hash_bytes)

# print(hash_bits)
# e.g. bitarray('0001101001111...')

checksum = hash_bits[:checksum_length]
print(checksum)
# e.g. bitarray('0001')

bitarray('1010')


Add those first four bits to the end of the unhashed entropy:

In [4]:
print(len(entropy_bits))
# e.g. 128

entropy_bits.extend(checksum)

print(len(entropy_bits))
# e.g. 132

128
132


### Split the entropy into groups of 11 bits

The number of groups is the number of mnemonic words that will be produced.

In [5]:
grouped_bitarrays = tuple(entropy_bits[i * 11: (i + 1) * 11] for i in range(len(entropy_bits) // 11))

print(grouped_bitarrays)
# e.g. (bitarray('00000010010'), bitarray('00000000111'), ...)

print(len(grouped_bitarrays))
# e.g. 12

(bitarray('00101100000'), bitarray('00100011100'), bitarray('01000011110'), bitarray('00100001111'), bitarray('11110011011'), bitarray('10000111001'), bitarray('00100011001'), bitarray('10100111110'), bitarray('00111100001'), bitarray('10010000001'), bitarray('10011111101'), bitarray('01000101010'))
12


Convert the bitarrays to integers. Each 11-bit number is between 0 and 2047, inclusive.

In [6]:
from bitarray.util import ba2int
indices = tuple(ba2int(ba) for ba in grouped_bitarrays)

print(indices)
# e.g. (18, 7, 1583, 1412, 931, 842, 355, 181, 1917, 1910, 57, 353)

(352, 284, 542, 271, 1947, 1081, 281, 1342, 481, 1153, 1277, 554)
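
The checksum, append, split, and convert steps can also be done with plain integer arithmetic instead of `bitarray`. The sketch below is an alternative formulation, not the notebook's method; the function name `entropy_to_indices` is hypothetical.

```python
from hashlib import sha256

def entropy_to_indices(entropy_bytes):
    """Map entropy bytes to 11-bit word indices using only integer arithmetic."""
    entropy_bit_size = len(entropy_bytes) * 8
    checksum_length = entropy_bit_size // 32
    # The checksum is at most 8 bits, so the first digest byte is enough
    checksum = sha256(entropy_bytes).digest()[0] >> (8 - checksum_length)
    # Append the checksum bits to the entropy, treated as one big integer
    combined = (int.from_bytes(entropy_bytes, "big") << checksum_length) | checksum
    word_count = (entropy_bit_size + checksum_length) // 11
    # Read 11-bit groups from most significant to least significant
    return tuple(
        (combined >> (11 * (word_count - 1 - i))) & 0x7FF
        for i in range(word_count)
    )

# 16 zero bytes: a convenient fixed input for experimenting
print(entropy_to_indices(bytes(16)))
```

Shifting and masking with `0x7FF` (eleven 1-bits) plays the same role as slicing the bitarray into 11-bit groups and calling `ba2int` on each.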


### Convert each index into an English word

The BIP 39 spec links to official word lists for several languages. There are 2048 words in each list, one for each possible 11-bit number. Load the words into memory and swap out each index for its corresponding English word:

In [7]:
# english.txt is the English word list linked from the BIP 39 spec
with open('english.txt', 'r', encoding='utf-8') as file:
    english_word_list = file.read().strip().split()

print(len(english_word_list))
print(english_word_list[:5])
# ['abandon', 'ability', 'able', 'about', 'above']

2048
['abandon', 'ability', 'able', 'about', 'above']


In [8]:
words = tuple(english_word_list[i] for i in indices)

# print(words)
# e.g. ('across', 'abstract', 'shine', 'rack', 'inner', 'harsh', 
#  'cluster', 'birth', 'use', 'uphold', 'already', 'club')

mnemonic_string = ' '.join(words)
print(mnemonic_string)
# e.g. 'across abstract shine ... uphold already club'

clown castle duck capable vibrant mango case pond destroy mother panic earn
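
Because the checksum is embedded in the final word, a mnemonic can be checked by running the steps in reverse: map words back to indices, split off the checksum bits, and compare them with a freshly computed hash. The following is a sketch under stated assumptions: the function name `mnemonic_is_valid` is hypothetical, and the demonstration uses a placeholder word list rather than the real `english.txt`.

```python
from hashlib import sha256

def mnemonic_is_valid(mnemonic_string, word_list):
    """Recompute the checksum of a mnemonic and compare it with the embedded one."""
    words = mnemonic_string.split()
    if len(words) not in (12, 15, 18, 21, 24):
        return False
    try:
        indices = [word_list.index(word) for word in words]
    except ValueError:
        return False  # a word is missing from the list

    # Rebuild the combined entropy + checksum bits as one integer
    combined = 0
    for index in indices:
        combined = (combined << 11) | index

    total_bits = len(words) * 11
    checksum_length = total_bits // 33  # total = entropy + entropy / 32
    entropy_bit_size = total_bits - checksum_length
    embedded_checksum = combined & ((1 << checksum_length) - 1)
    entropy_bytes = (combined >> checksum_length).to_bytes(entropy_bit_size // 8, "big")

    expected_checksum = sha256(entropy_bytes).digest()[0] >> (8 - checksum_length)
    return embedded_checksum == expected_checksum

# Placeholder list for demonstration only; use the real 2048-word list in practice
placeholder_words = [f"w{i}" for i in range(2048)]
print(mnemonic_is_valid(" ".join(["w0"] * 11 + ["w3"]), placeholder_words))
```

In the notebook, calling `mnemonic_is_valid(mnemonic_string, english_word_list)` would be the equivalent check against the real word list.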


### Generate the seed

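Use PBKDF2 (Password-Based Key Derivation Function 2) with HMAC-SHA512 and 2048 iterations to create the seed. The mnemonic sentence serves as the password.

Bonus security: you can set an optional passphrase, which is appended to the string "mnemonic" to form the salt. (Defaults to empty string.)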

In [9]:
from hashlib import pbkdf2_hmac

passphrase = ""  # optional; defaults to the empty string
salt = "mnemonic" + passphrase

seed = pbkdf2_hmac(
    "sha512",
    mnemonic_string.encode("utf-8"),
    salt.encode("utf-8"),
    2048,  # iteration count
)

print(seed)
# b"\xf8\xb7W}\xba\x02Wx\xb9\xbf$\xf8..."

print(len(seed))
# 64 (bytes, i.e. 512 bits)

print(seed.hex())
# Behold: your seed!

b'\x1a\x12\xa8\xcb\xe0(\x8eu\xbb\\\x8b\xda\xe0m\x8cW\x0eK\x8b\xa4\xef\xf4\x17G\x19<uq\x88\xf5!0\xfb\x10\xd6\x94\xa6\x96 \xfe\x1f\x0e\x93\x16\xabu\xc3\xdaj>Q\xfe\xa8\x88\xb3+\x16\x8e\x01B4\xe0\xf2\x1d'
64
1a12a8cbe0288e75bb5c8bdae06d8c570e4b8ba4eff41747193c757188f52130fb10d694a69620fe1f0e9316ab75c3da6a3e51fea888b32b168e014234e0f21d
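
To illustrate the effect of the optional passphrase, the derivation can be wrapped in a function and called twice. This is a sketch, not the notebook's code: the function name `mnemonic_to_seed` is hypothetical, and the NFKD normalization step follows the BIP 39 spec rather than anything shown in the cells above (it makes no difference for plain ASCII input).

```python
import unicodedata
from hashlib import pbkdf2_hmac

def mnemonic_to_seed(mnemonic_string, passphrase=""):
    """Derive the 64-byte seed; `passphrase` is appended to the "mnemonic" salt."""
    # BIP 39 specifies NFKD normalization before UTF-8 encoding
    normalized_mnemonic = unicodedata.normalize("NFKD", mnemonic_string)
    normalized_salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase)
    return pbkdf2_hmac(
        "sha512",
        normalized_mnemonic.encode("utf-8"),
        normalized_salt.encode("utf-8"),
        2048,
    )

# Example mnemonic: the word "abandon" eleven times, then "about"
example_mnemonic = "abandon " * 11 + "about"
seed_without = mnemonic_to_seed(example_mnemonic)
seed_with = mnemonic_to_seed(example_mnemonic, passphrase="TREZOR")

print(seed_without.hex())
print(seed_with.hex())
print(seed_without != seed_with)
```

The same mnemonic yields an unrelated seed for each passphrase, so a forgotten passphrase cannot be recovered from the words alone.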


Seed generated!

Next up: **BIP 32** (coming soon)