# Generating a list of words that represents a hexadecimal string

## Random hexadecimal string
First we create a random hexadecimal string and store it in the "private_key" variable.

In [1]:
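# Generate a random hexadecimal string
# Note: the random module is not cryptographically secure; it is used here for demonstration only
import random
total_bits = 256
bits_per_char = 4  # each hexadecimal character encodes 4 bits
private_key = "".join(random.choice("0123456789abcdef") for _ in range(total_bits // bits_per_char))
print(private_key)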

09e8bb3529506dccc8b0a70cbac76b78c37792de9e1a948c7da0290632b62ced
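
For a key that protects real funds, the random source should be cryptographically secure. A minimal sketch, assuming Python's standard `secrets` module is an acceptable replacement for `random` here (the name `secure_private_key` is only illustrative and is not used elsewhere in this notebook):

```python
import secrets

total_bits = 256
# token_hex(n) returns 2*n hexadecimal characters drawn from a secure source
secure_private_key = secrets.token_hex(total_bits // 8)
print(secure_private_key)
print(len(secure_private_key) * 4)  # number of bits represented by the string
```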


## Mnemonic representation function
We create a function that takes a list of words and uses it to represent the hexadecimal number.
The list length must be a power of 2 (2, 4, 8, 16, 32, 64, ...), so that each word encodes a whole number of bits.
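
The power-of-2 requirement can be checked before encoding. The helper below is only a sketch: `wordlist_bits` is a hypothetical name, not part of the function defined in this notebook.

```python
def wordlist_bits(wordlist):
    """Return how many bits each word encodes, or raise if the size is unusable."""
    size = len(wordlist)
    # A power of 2 has exactly one bit set, so size & (size - 1) is 0
    if size < 2 or size & (size - 1) != 0:
        raise ValueError(f"wordlist size {size} must be a power of 2 and at least 2")
    return size.bit_length() - 1

print(wordlist_bits(["0", "1"]))                # 1
print(wordlist_bits(list("0123456789abcdef")))  # 4
```

A list of 2048 words would give 11 bits per word with the same check.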

In [2]:
# Import binascii for hex to bytes conversion
import binascii
import math

# Define a function that takes a hexadecimal string and a wordlist as input and returns a mnemonic sentence without checksum as output
def hex_to_mnemonic_no_checksum(hex_string, wordlist, separator=" "):
    # Convert the hex string to bytes
    data = binascii.unhexlify(hex_string)
    # Convert the data to a binary string
    # bin() drops leading zeroes, so zfill pads the result back to 8 bits per byte
    data_bits = bin(int.from_bytes(data, "big"))[2:].zfill(len(data) * 8)
    # Split the bits into groups of log2(wordlist_size) and convert each group to an index in the wordlist
    # If the bit count is not a multiple of the group size, the last group is shorter than the others
    words = []
    group_size = int(math.log2(len(wordlist)))
    for i in range(0, len(data_bits), group_size):
        index = int(data_bits[i:i+group_size], 2)
        word = wordlist[index]
        words.append(word)
    # Join the words with the separator and return the mnemonic sentence
    mnemonic = separator.join(words)
    return mnemonic

# Define a wordlist whose size is a power of 2 (here 8 words, so 3 bits per word)
wordlist = ["apple", "banana", "carrot", "date", "elderberry", "fig", "cat", "dog"]
# Test with the 256-bit (32-byte) private key generated above
mnemonic = hex_to_mnemonic_no_checksum(private_key, wordlist)
print(mnemonic)
# Show the length of the source word list and the number of words used in the mnemonic
print("Source word list length: " + str(len(wordlist)))
print("Number of words in the mnemonic: " + str(len(mnemonic.split())))

apple carrot date cat elderberry carrot dog date banana fig carrot carrot elderberry fig carrot apple date date date elderberry cat date banana apple fig elderberry banana carrot date elderberry banana elderberry fig cat fig elderberry date fig fig date date cat banana elderberry banana fig cat dog elderberry elderberry fig fig dog carrot date cat apple cat fig banana carrot carrot banana elderberry date dog date carrot apple apple fig banana apple banana elderberry date banana carrot cat cat banana date banana cat cat banana
Source word list length: 8
Number of words in the mnemonic: 86
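
With 3 bits per word, 256 bits split into 85 full groups plus one leftover bit, so the final word is built from a shorter group. To reverse the encoding, a decoder therefore needs to know the original bit length. The function below is a sketch of such an inverse; `mnemonic_to_hex_no_checksum` and its `total_bits` parameter are hypothetical additions, not part of the original notebook.

```python
def mnemonic_to_hex_no_checksum(mnemonic, wordlist, total_bits, separator=" "):
    """Rebuild the hexadecimal string from a mnemonic produced without checksum."""
    group_size = len(wordlist).bit_length() - 1  # log2 of a power-of-2 size
    # str.split("") raises ValueError, so single-character words are split by hand
    words = mnemonic.split(separator) if separator else list(mnemonic)
    bits = ""
    for position, word in enumerate(words):
        # The last group may hold fewer bits than group_size
        remaining = total_bits - position * group_size
        width = min(group_size, remaining)
        bits += format(wordlist.index(word), "b").zfill(width)
    # Assumes total_bits is a multiple of 4, as it is for any hexadecimal string
    return format(int(bits, 2), "x").zfill(total_bits // 4)

fruit = ["apple", "banana", "carrot", "date", "elderberry", "fig", "cat", "dog"]
# "09" is 00001001, which splits into 000, 010 and the short group 01
print(mnemonic_to_hex_no_checksum("apple carrot banana", fruit, 8))  # 09
```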


## Using 0-9, a-f
When we use the characters 0-9 and a-f as the source list, each word encodes 4 bits, so the output is exactly the original hexadecimal string.

In [3]:
wordlist = ["0","1", "2","3","4","5","6","7","8","9","a","b","c","d","e","f"]
mnemonic = hex_to_mnemonic_no_checksum(private_key, wordlist, "")
print(mnemonic)
print("Source word list length: " + str(len(wordlist)))
print("Number of words in the mnemonic: "+ str(len(mnemonic)))

09e8bb3529506dccc8b0a70cbac76b78c37792de9e1a948c7da0290632b62ced
Source word list length: 16
Number of words in the mnemonic: 64


## Using 0-1
Repeating the same with just "0" and "1" gives the binary representation of the number.

In [4]:
wordlist = ["0","1"]
mnemonic = hex_to_mnemonic_no_checksum(private_key, wordlist, "")
print(mnemonic)
print("Source word list length: " + str(len(wordlist)))
print("Number of words in the mnemonic: "+ str(len(mnemonic)))

0000100111101000101110110011010100101001010100000110110111001100110010001011000010100111000011001011101011000111011010110111100011000011011101111001001011011110100111100001101010010100100011000111110110100000001010010000011000110010101101100010110011101101
Source word list length: 2
Number of words in the mnemonic: 256
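
As a cross-check, the same bit string can be obtained with Python's built-in formatting, without any word list. This is a sketch that hardcodes the private key printed earlier in the notebook:

```python
private_key = "09e8bb3529506dccc8b0a70cbac76b78c37792de9e1a948c7da0290632b62ced"

# Each hexadecimal character carries 4 bits; pad so leading zeroes are kept
bit_length = len(private_key) * 4
bits = format(int(private_key, 16), f"0{bit_length}b")

print(bits[:16])   # 0000100111101000
print(len(bits))   # 256
```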


## Using Bitcoin BIP39 list of 2048 source mnemonic words
We can use the BIP39 list of 2048 words used for Bitcoin wallet mnemonics. Since 2048 = 2^11, each word encodes 11 bits.

In [5]:
import requests
wordlist_url = "https://raw.githubusercontent.com/bitcoin/bips/master/bip-0039/english.txt"
wordlist = requests.get(wordlist_url).text.split()
mnemonic = hex_to_mnemonic_no_checksum(private_key, wordlist, " ")
print(mnemonic)
print("Source word list length: " + str(len(wordlist)))
print("Number of words in the mnemonic: "+ str(mnemonic.count(" ")+1))

antique easy snap famous almost toy carpet belt arrow stomach suspect various danger siren kidney select nest glue gym faith glimpse force recipe absent
Source word list length: 2048
Number of words in the mnemonic: 24
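
The word count follows from the bit arithmetic: 256 bits do not divide evenly into 11-bit groups, so the final word is built from the leftover bits only. The sketch below works this out for the private key printed earlier (hardcoded here so the block runs on its own).

```python
import math

total_bits = 256
bits_per_word = 11  # log2(2048)

word_count = math.ceil(total_bits / bits_per_word)
leftover_bits = total_bits - (word_count - 1) * bits_per_word
print(word_count)     # 24
print(leftover_bits)  # 3

private_key = "09e8bb3529506dccc8b0a70cbac76b78c37792de9e1a948c7da0290632b62ced"
# Index of the last word: the value of the final leftover bits
last_index = int(private_key, 16) & ((1 << leftover_bits) - 1)
print(last_index)     # 5
```

Because the last group holds only 3 bits, the final word always comes from the first 8 entries of the list. Index 5 is consistent with the final word "absent" in the output above.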


## Final example including a checksum
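In the real BIP39 representation, a checksum is appended to the entropy before it is split into words: the first ENTROPY_BITS / 32 bits of the SHA-256 hash of the entropy. With 256 bits of entropy this gives 8 checksum bits, so the resulting 264 bits divide evenly into 24 groups of 11.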

In [7]:
# Import libraries
import secrets
import hashlib
import requests

# Define constants
WORDLIST = "https://raw.githubusercontent.com/bitcoin/bips/master/bip-0039/english.txt" # URL of BIP-39 wordlist
ENTROPY_BITS = 256 # Number of bits of entropy
CHECKSUM_BITS = ENTROPY_BITS // 32 # Number of bits of checksum

# Download wordlist
response = requests.get(WORDLIST)
words = response.text.split()

# Generate random entropy
entropy = secrets.token_bytes(ENTROPY_BITS // 8)

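# Compute checksum: the first CHECKSUM_BITS bits of the SHA-256 hash of the entropy
# (the shift is a no-op for 256-bit entropy, but keeps shorter entropy sizes correct)
checksum = hashlib.sha256(entropy).digest()[0] >> (8 - CHECKSUM_BITS)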

# Convert entropy and checksum to binary string
binary = bin(int.from_bytes(entropy, "big"))[2:].zfill(ENTROPY_BITS) + bin(checksum)[2:].zfill(CHECKSUM_BITS)

# Split binary string into groups of 11 bits
groups = [binary[i:i+11] for i in range(0, len(binary), 11)]

# Convert each group to an index and look up the corresponding word
mnemonic = [words[int(g, 2)] for g in groups]

# Join the words with spaces
mnemonic = " ".join(mnemonic)

# Print the mnemonic phrase
print(mnemonic)

panel cause ribbon silent tennis can giant cable faith zero end exercise twelve hockey electric spoon chef dilemma table rent bunker similar service sight
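
The checksum lets a wallet detect a mistyped or reordered word. The function below is a sketch of that check, assuming the mnemonic was built as in the cell above (entropy bits followed by checksum bits, 11 bits per word). `checksum_is_valid` is a hypothetical helper, and the word list is passed in as a parameter so that the block does not depend on a download.

```python
import hashlib

def checksum_is_valid(mnemonic, wordlist):
    """Return True if the trailing checksum bits match SHA-256 of the entropy."""
    indices = [wordlist.index(word) for word in mnemonic.split()]
    bits = "".join(format(index, "011b") for index in indices)
    # Total length is ENT + ENT/32 = 33 * ENT/32, so the checksum is len/33 bits
    checksum_len = len(bits) // 33
    entropy_bits, checksum_bits = bits[:-checksum_len], bits[-checksum_len:]
    entropy = int(entropy_bits, 2).to_bytes(len(entropy_bits) // 8, "big")
    digest = hashlib.sha256(entropy).digest()
    digest_bits = format(int.from_bytes(digest, "big"), "0256b")
    return digest_bits[:checksum_len] == checksum_bits
```

With the `words` list and `mnemonic` string from the cell above, `checksum_is_valid(mnemonic, words)` would be expected to return True. Changing one word will usually make it return False, although with only 8 checksum bits a small fraction of altered phrases still pass.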
