# HD Wallet (Hierarchical Deterministic Wallet) Demonstration

## Introduction to HD Wallet
### What is an HD Wallet?
An HD Wallet (Hierarchical Deterministic Wallet) is a type of cryptocurrency wallet that uses a single seed phrase (mnemonic) to generate an entire hierarchy of private and public keys. This structure allows:
- Easy backup and recovery through a single mnemonic phrase.
- Deterministic generation of keys, ensuring consistent results across different software.
- Enhanced security and usability through hierarchical key derivation.

In this notebook, we will:
1. Convert a mnemonic into a seed.
2. Derive the master key and chain code.
3. Use the master key to derive child keys.
4. Understand and implement derivation paths.

### Structure of an HD Wallet
- **Mnemonic**: A human-readable set of words that encodes the entropy.
- **Seed**: A 512-bit value derived from the mnemonic and an optional passphrase.
- **Master Key & Chain Code**: The starting point for generating child keys in the hierarchy.
- **Derivation Path**: A structured path that determines how keys are derived at each level of the hierarchy.

---

## Step 1: Convert Mnemonic to Seed
The mnemonic phrase, along with an optional passphrase, is transformed into a 64-byte seed using the PBKDF2-HMAC-SHA512 algorithm. This seed is used as the basis for generating cryptographic keys in hierarchical deterministic (HD) wallets. Below is a detailed breakdown of the process and rationale:

### 1. Input Components
- **Mnemonic**: A sequence of 12, 15, 18, 21, or 24 words derived from BIP-39 word lists. This serves as the primary input.
- **Passphrase** (Optional): A user-defined string that enhances security by introducing additional entropy. If not provided, an empty string is used.
- **Salt**: The passphrase is combined with the fixed prefix string "mnemonic" to form the salt.

### 2. PBKDF2-HMAC-SHA512 Process

PBKDF2 (Password-Based Key Derivation Function 2) is a standard algorithm used to securely derive a cryptographic key from a password or mnemonic. It works as follows:
1. The algorithm uses HMAC-SHA512 as its hash function.
2. The mnemonic, normalized to Unicode NFKD and encoded as UTF-8, is used as the password input.
3. The salt, which is "mnemonic" concatenated with the optional passphrase, is also NFKD-normalized and encoded as UTF-8.
4. The algorithm performs 2048 iterations of HMAC-SHA512, repeatedly hashing the input to strengthen the derived key.
5. The output is a fixed 64-byte (512-bit) seed.

### 3. Why PBKDF2-HMAC-SHA512?

The choice of PBKDF2-HMAC-SHA512 is driven by its ability to meet the following security and compatibility requirements:

**(a) Resistance to Brute-Force Attacks**
- PBKDF2 repeats the underlying HMAC many times, which raises the cost of every guess in a brute-force attack.
- In BIP-39, the number of iterations is fixed at 2048. This is modest by modern password-hashing standards, so most of the protection comes from the 128 to 256 bits of entropy in the mnemonic itself.

**(b) Key Stretching**
- Key stretching does not add entropy to a weak mnemonic or passphrase. It makes each guess slower, which raises the cost of a dictionary or brute-force attack.

**(c) Standardized and Reliable**
- PBKDF2 is widely adopted in cryptographic systems and follows the RFC 2898 standard, ensuring interoperability and reliability.

**(d) HMAC-SHA512 Features**
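- HMAC (Hash-based Message Authentication Code) is a keyed construction built on a hash function. PBKDF2 uses it as a pseudorandom function, and its security does not depend on the collision resistance of the underlying hash.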
- SHA-512 offers high resistance against cryptographic attacks, such as collision or pre-image attacks.

**(e) Deterministic Output**
- For the same mnemonic and passphrase, the derived seed will always be identical, ensuring deterministic wallet recovery.

### 4. Security Implications
- The optional passphrase acts as a “second factor,” making it significantly harder for an attacker to derive the seed from the mnemonic alone.
- The derived 64-byte seed is then used in hierarchical deterministic wallets to generate the master key and master chain code, from which all child keys are derived.
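
The effect of the passphrase can be illustrated with a minimal sketch that uses only the standard library. The helper name `bip39_seed` and the example passphrase are illustrative assumptions; the mnemonic is the one used in the notebook's own example.

```python
import hashlib
import unicodedata

def bip39_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Hypothetical helper: derive a BIP-39 seed with PBKDF2-HMAC-SHA512."""
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048)

words = "bring boil cattle dawn off buyer weird plug summer federal misery ship"

seed_plain = bip39_seed(words)                      # empty passphrase
seed_protected = bip39_seed(words, "example pass")  # illustrative passphrase

# Same inputs always give the same seed; a different passphrase gives
# an unrelated seed, and therefore a completely different wallet.
print(seed_plain == bip39_seed(words))       # True
print(seed_plain == seed_protected)          # False
print(len(seed_plain), len(seed_protected))  # 64 64
```

Because the passphrase is part of the salt, losing it is equivalent to losing the wallet: the mnemonic alone recovers only the empty-passphrase seed.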

In [1]:
import hashlib
import unicodedata

PBKDF2_ROUNDS = 2048

def mnemonic_to_seed(mnemonic_words: list[str], passphrase: str = "") -> bytes:
    """
    Convert a mnemonic phrase into a seed.

    :param mnemonic_words: List of mnemonic words.
    :param passphrase: An optional passphrase to enhance security.
    :return: A 64-byte seed.
    """
    # BIP-39 requires NFKD normalization of both the mnemonic and the salt
    mnemonic_str = unicodedata.normalize("NFKD", " ".join(mnemonic_words).strip())
    print(f"mnemonic_str: {mnemonic_str}")
    # Do not strip the passphrase: whitespace is significant and changes the seed
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase)
    print(f"salt: {salt}")

    # Generate the seed using PBKDF2-HMAC-SHA512 (password-based key derivation function 2)
    seed = hashlib.pbkdf2_hmac(
        "sha512", mnemonic_str.encode("utf-8"), salt.encode("utf-8"), PBKDF2_ROUNDS
    )
    return seed

# Example usage
mnemonic_list = ['bring', 'boil', 'cattle', 'dawn', 'off', 'buyer', 'weird', 'plug', 'summer', 'federal', 'misery', 'ship']
passphrase = ""
seed = mnemonic_to_seed(mnemonic_list, passphrase)
print(f"Generated seed: {seed.hex()} ({len(seed)} bytes)")

mnemonic_str: bring boil cattle dawn off buyer weird plug summer federal misery ship
salt: mnemonic
Generated seed: 626a89a999816e971d054d3a2a2dc33c13dace2ce95f6ea36b4e4245f9541368128310d3d9f4b281c0839c22828628fa8648725916879364aec673981c8f98c2 (64 bytes)


## Step 2: Derive Master Key and Chain Code

### What are Master Key and Chain Code?
- **Master Key**: This is a 256-bit private key derived from the seed. It is the root key of the HD Wallet hierarchy and is used to derive all subsequent keys.
- **Chain Code**: The chain code is another 256-bit value derived alongside the master key. It is used as part of the key derivation process to ensure deterministic but secure generation of child keys.

Both values come from a single HMAC-SHA512 computation over the seed, keyed with the fixed string `"Bitcoin seed"`. The 512-bit result is split into two halves:
- The **first 32 bytes** form the master private key.
- The **last 32 bytes** form the chain code.

The term "chain code" reflects its role in linking each parent key to its derived child keys. It acts as extra entropy in every derivation step, so knowing a key without its chain code is not enough to derive that key's children.

In [2]:
import hmac

def derive_master_key(seed: bytes) -> tuple[bytes, bytes]:
    """
    Derive the master key and chain code from the seed.

    :param seed: A 64-byte seed.
    :return: A tuple containing the master private key and chain code.
    """
    key = b"Bitcoin seed"
    h = hmac.new(key, seed, hashlib.sha512).digest()
    master_key = h[:32]
    chain_code = h[32:]
    return master_key, chain_code

# Example usage
print(f"Generate master key using seed {seed.hex()} ({len(seed)} bytes)")
master_key, chain_code = derive_master_key(seed)
print(f"Master Key: {master_key.hex()} ({len(master_key)} bytes)")
print(f"Chain Code: {chain_code.hex()} ({len(chain_code)} bytes)")

Generate master key using seed 626a89a999816e971d054d3a2a2dc33c13dace2ce95f6ea36b4e4245f9541368128310d3d9f4b281c0839c22828628fa8648725916879364aec673981c8f98c2 (64 bytes)
Master Key: c4e4698f728127e084769296bd9736fe03f164fe14035dfa6c3359aa738eda5b (32 bytes)
Chain Code: 6644d0d1fa201149d49aae993511214646afde1de3526be2792fb5d4acc3cb23 (32 bytes)
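
BIP-32 additionally treats a master key of zero, or one greater than or equal to the secp256k1 curve order, as invalid, and expects a seed of 128 to 512 bits. A minimal sketch of those checks follows; `derive_master_key_checked` is a hypothetical name, and the 16-byte seed is an arbitrary demonstration value rather than a real wallet seed.

```python
import hashlib
import hmac

# Order of the secp256k1 curve (a public constant).
SECP256K1_ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def derive_master_key_checked(seed: bytes) -> tuple[bytes, bytes]:
    """Hypothetical variant of derive_master_key with BIP-32 range checks."""
    if not 16 <= len(seed) <= 64:
        raise ValueError("BIP-32 seeds are 128 to 512 bits long.")
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    key_int = int.from_bytes(digest[:32], "big")
    if key_int == 0 or key_int >= SECP256K1_ORDER:
        raise ValueError("Invalid master key; a different seed is required.")
    return digest[:32], digest[32:]

demo_seed = bytes(range(16))  # arbitrary 16-byte demonstration seed
demo_key, demo_chain_code = derive_master_key_checked(demo_seed)
print(len(demo_key), len(demo_chain_code))  # 32 32
```

An out-of-range key is astronomically unlikely in practice, but the check costs almost nothing and matches the specification.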


## Step 3: Derive Child Keys

### What Is a Hardened Index?
When deriving child keys, there are two types of derivation:
- **Normal Derivation**: Child public keys can be derived from the parent public key and chain code, without access to any private key.
- **Hardened Derivation**: Child keys can only be derived with the parent private key, so a leaked child private key cannot be combined with public data to recover the parent private key.

### How Does Normal Derivation Work?
In normal derivation, the HMAC input is the parent public key and the index, so the parent private key is needed only to compute child private keys, not child public keys. This means:
- Anyone with the parent public key and chain code (together, the extended public key or xpub) can generate all normal child public keys.
- If a child private key leaks and the attacker also knows the parent public key and chain code, the parent private key can be calculated, which is a significant security risk.

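The formulas for normal derivation are:
```
Child Private Key = (Parent Private Key + Derived Value) mod n
Child Public Key  = Parent Public Key + Derived Value * G
```
where Derived Value is the first 32 bytes of an HMAC-SHA512 keyed with the parent chain code and computed over the compressed parent public key followed by the 4-byte big-endian index. The last 32 bytes become the child chain code:
```
I = HMAC-SHA512(key = Chain Code, data = Parent Public Key || Index)
Derived Value    = I[0:32]
Child Chain Code = I[32:64]
```
Because every input to the HMAC is public data from the extended public key, anyone who holds the parent public key and chain code can recompute the Derived Value. Combined with a single leaked child private key, that reveals the parent private key.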
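
The watch-only property of normal derivation can be checked with a small, self-contained sketch. It uses plain-Python secp256k1 arithmetic that is not constant time and is meant for illustration only; all function names and the `demo_` values are assumptions, not part of the notebook.

```python
import hashlib
import hmac

# secp256k1 parameters (public constants).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(a, b):
    """Add two affine points; None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    return x3, (slope * (x1 - x3) - y1) % P

def point_mul(k, point):
    """Double-and-add scalar multiplication (not constant time)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result

def compress(point):
    """Serialize a point as a 33-byte compressed public key."""
    x, y = point
    return (b"\x02" if y % 2 == 0 else b"\x03") + x.to_bytes(32, "big")

def tweak_and_chain_code(parent_pub, chain_code, index):
    """Split HMAC-SHA512(chain code, pubkey || index) into tweak and chain code."""
    data = compress(parent_pub) + index.to_bytes(4, "big")
    digest = hmac.new(chain_code, data, hashlib.sha512).digest()
    return int.from_bytes(digest[:32], "big"), digest[32:]

# Arbitrary demonstration values, not real wallet keys.
demo_parent_priv = int.from_bytes(hashlib.sha256(b"demo parent key").digest(), "big") % N
demo_cc = hashlib.sha256(b"demo chain code").digest()
demo_parent_pub = point_mul(demo_parent_priv, G)

# The tweak depends only on public data (parent public key, chain code, index).
demo_tweak, demo_child_cc = tweak_and_chain_code(demo_parent_pub, demo_cc, 0)

demo_child_priv = (demo_parent_priv + demo_tweak) % N                    # needs the private key
demo_child_pub = point_add(demo_parent_pub, point_mul(demo_tweak, G))    # public data only

print(point_mul(demo_child_priv, G) == demo_child_pub)  # True
```

The sketch omits the BIP-32 range check on the tweak for brevity; a production implementation would use a vetted elliptic-curve library instead of hand-written arithmetic.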

### How Does Hardened Derivation Work?
Hardened derivation feeds the parent private key, rather than the parent public key, into HMAC-SHA512. Child keys therefore cannot be derived from the parent public key and chain code alone. The key distinctions are:
- The derived value depends on the parent private key, so it cannot be recomputed from public data.
- Even if a hardened child private key leaks together with the parent chain code, the parent private key cannot be recovered from it.

To indicate hardened derivation, the index is "hardened" by setting the most significant bit (MSB) to `1`. This is achieved by applying a bitwise OR operation with `0x80000000`:
```python
hardened_index = 0x80000000 | index
```
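The value `0x80000000` equals 2^31, the most significant bit of the 32-bit index (bit 31, counting from 0). Indices `0` through `2^31 - 1` are normal and indices `2^31` through `2^32 - 1` are hardened, so `44'` is stored as `0x8000002C`.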
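
As a quick check of the index arithmetic, the following sketch hardens the first three levels of a BIP-44 path. `HARDENED_OFFSET`, `harden`, and `is_hardened` are hypothetical helper names introduced for illustration.

```python
HARDENED_OFFSET = 0x80000000  # 2**31, the top bit of a 32-bit index

def harden(index: int) -> int:
    """Hypothetical helper: map a normal index to its hardened counterpart."""
    if not 0 <= index < HARDENED_OFFSET:
        raise ValueError("index must be in the range 0 .. 2**31 - 1")
    return index | HARDENED_OFFSET

def is_hardened(index: int) -> bool:
    """Return True when the top bit of the 32-bit index is set."""
    return bool(index & HARDENED_OFFSET)

for level in (44, 60, 0):
    hardened = harden(level)
    print(level, hardened, hardened.to_bytes(4, "big").hex(), is_hardened(hardened))
# 44 2147483692 8000002c True
# 60 2147483708 8000003c True
# 0 2147483648 80000000 True
```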

### Security comparison: Normal vs. Hardened Derivation
| Feature | Normal Derivation | Hardened Derivation |
|---|---|---|
| Requires Parent Private Key? | ❌ No (for child public keys) | ✅ Yes |
| Can Child Public Keys Be Derived from the Parent Public Key? | ✅ Yes | ❌ No |
| Can Parent Private Key Be Inferred from a Leaked Child Private Key? | ✅ Possible (if parent public key and chain code are known) | ❌ No |
| Chain Code Exposure Risk | ⚠️ High | ✅ Low (parent private key is still required) |
| Recommended For | Non-sensitive addresses | High-security account separation |
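
The third row of the table can be demonstrated directly. The sketch below recovers a parent private key from a leaked normal child key plus public xpub data, then shows that the same approach fails for a hardened child. It uses illustrative plain-Python secp256k1 arithmetic (not constant time), and every name and demonstration value in it is an assumption.

```python
import hashlib
import hmac

# secp256k1 parameters (public constants).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(a, b):
    """Add two affine points; None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    return x3, (slope * (x1 - x3) - y1) % P

def point_mul(k, point):
    """Double-and-add scalar multiplication (not constant time)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result

def compress(point):
    """Serialize a point as a 33-byte compressed public key."""
    x, y = point
    return (b"\x02" if y % 2 == 0 else b"\x03") + x.to_bytes(32, "big")

def hmac_tweak(chain_code, data):
    """Return the left 32 bytes of HMAC-SHA512 as an integer."""
    digest = hmac.new(chain_code, data, hashlib.sha512).digest()
    return int.from_bytes(digest[:32], "big")

# Secret side: an arbitrary demonstration key, not a real wallet key.
secret_parent = int.from_bytes(hashlib.sha256(b"demo secret").digest(), "big") % N
# Public side: what an xpub reveals.
xpub_point = point_mul(secret_parent, G)
xpub_chain_code = hashlib.sha256(b"demo chain code").digest()

# Normal child 0: the attacker recomputes the tweak from the xpub alone.
normal_data = compress(xpub_point) + (0).to_bytes(4, "big")
leaked_normal_child = (secret_parent + hmac_tweak(xpub_chain_code, normal_data)) % N
recovered = (leaked_normal_child - hmac_tweak(xpub_chain_code, normal_data)) % N
print(recovered == secret_parent)  # True

# Hardened child 0': the real tweak requires the parent private key.
hardened_index = (0x80000000).to_bytes(4, "big")
hardened_data = b"\x00" + secret_parent.to_bytes(32, "big") + hardened_index
leaked_hardened_child = (secret_parent + hmac_tweak(xpub_chain_code, hardened_data)) % N
# The attacker can only try the public-key formula, which gives the wrong tweak.
guess_data = compress(xpub_point) + hardened_index
wrong_guess = (leaked_hardened_child - hmac_tweak(xpub_chain_code, guess_data)) % N
print(wrong_guess == secret_parent)  # False
```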

### Why Use Hardened Derivation?
Hardened keys ensure:
1. **No backward traceability**: A compromised child key does not expose the parent private key.
2. **Secure account separation**: Each hardened path (e.g., `account'`) is securely isolated from others.
3. **Protection against chain code exposure**: Since the parent private key is required for hardened derivation, an exposed chain code alone is not enough to compromise security.

### Best Practices for HD Wallet Security
- **Use Hardened Derivation for critical keys**, such as account-level keys (m/44'/60'/0' in BIP-44).
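- **Share an xpub (Extended Public Key) only when necessary**: it always contains the chain code, so it reveals every normal child public key and, combined with one leaked child private key, the parent private key.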
- **Always securely store the seed**, as it derives both private keys and chain codes.
- **Use a hardware wallet** to keep private keys isolated from networked environments.

Following these practices keeps the flexibility of HD key generation while limiting the damage that a single leaked key can cause.

In [3]:
from ecdsa.util import string_to_number
from ecdsa import SigningKey, SECP256k1
import hashlib
import hmac

def derive_child_key(
    parent_key: bytes,
    parent_chain_code: bytes,
    index: int
) -> tuple[bytes, bytes, bytes]:
    """
    Derive a child key from a parent key and chain code.

    :param parent_key: The parent private key (32 bytes).
    :param parent_chain_code: The parent chain code.
    :param index: The child index (use hardened index for added security).
    :return: A tuple containing the child private key, chain code, and public key.
    """
    # Convert index to 4-byte big-endian
    hdi = index.to_bytes(4, "big")
    print(f"Index (4 bytes): {hdi.hex()}")

    if index & 0x80000000:  # Hardened derivation
        # Hardened: Use private key
        data = b"\x00" + parent_key + hdi
    else:  # Normal derivation
        parent_pub_key = derive_public_key_from_private_key(parent_key)
        data = parent_pub_key + hdi
    
    print(f"HMAC input data: {data.hex()}")

    # Perform HMAC-SHA512
    h = hmac.new(parent_chain_code, data, hashlib.sha512).digest()
    child_tweak = string_to_number(h[:32])  # First 32 bytes
    child_chain_code = h[32:]  # Last 32 bytes

    print(f"Child tweak: {h[:32].hex()}, Child chain code: {child_chain_code.hex()}")

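    # SECP256k1's curve order
    n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

    # BIP-32: a tweak >= n makes this index invalid; callers should try the next index
    if child_tweak >= n:
        raise ValueError("Derived tweak is out of range (>= curve order).")

    # Compute child private key
    parent_int = string_to_number(parent_key)
    child_private_key = (parent_int + child_tweak) % n
    if child_private_key == 0:
        raise ValueError("Derived child private key is invalid (zero).")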

    child_private_key_bytes = child_private_key.to_bytes(32, "big")
    print(f"Derived child private key: {child_private_key_bytes.hex()}")

    # Compute child public key from private key
    child_pub_key = derive_public_key_from_private_key(child_private_key_bytes)

    return child_private_key_bytes, child_chain_code, child_pub_key

def derive_public_key_from_private_key(private_key: bytes) -> bytes:
    """
    Derive the public key from the private key.

    :param private_key: The private key (32 bytes).
    :return: The compressed public key (33 bytes).
    """
    sk = SigningKey.from_string(private_key, curve=SECP256k1)
    vk = sk.verifying_key
    x = vk.pubkey.point.x()
    prefix = b"\x02" if vk.pubkey.point.y() % 2 == 0 else b"\x03"
    return prefix + x.to_bytes(32, "big")

## Step 4: Derivation Paths
### What is a Derivation Path?
A derivation path specifies how keys are derived from the master key in an HD wallet. It provides a structured way to navigate the hierarchy of keys. The path typically looks like this:
```
m / purpose' / coin_type' / account' / change / address_index
```
Each segment represents a level in the hierarchy:
- `m`: The master key (root).
- `purpose'`: A hardened value indicating the protocol purpose (e.g., `44'` for BIP-44).
- `coin_type'`: A hardened value specifying the cryptocurrency (e.g., `60'` for Ethereum).
- `account'`: A hardened value representing the account index (e.g., `0'` for the first account).
- `change`: `0` for the external chain (receiving addresses) or `1` for the internal chain (change addresses).
- `address_index`: The specific address index.

### Example: Ethereum Default Path
The default derivation path for Ethereum is:
```
m / 44' / 60' / 0' / 0 / 0
```
This path derives the first external address for the first account.
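
To make the mapping from path text to numeric indices concrete, here is a minimal parsing sketch. `parse_path` and `HARDENED_OFFSET` are hypothetical names, and the parser assumes the apostrophe notation used in this notebook (not the alternative `h` suffix).

```python
HARDENED_OFFSET = 0x80000000  # 2**31

def parse_path(path: str) -> list[int]:
    """Hypothetical helper: turn "m/44'/60'/0'/0/0" into 32-bit child indices."""
    parts = path.replace(" ", "").split("/")
    if parts[0] != "m":
        raise ValueError("a derivation path must start at the master key 'm'")
    indices = []
    for part in parts[1:]:
        hardened = part.endswith("'")
        number = int(part[:-1] if hardened else part)
        if not 0 <= number < HARDENED_OFFSET:
            raise ValueError(f"index out of range: {part}")
        indices.append(number | HARDENED_OFFSET if hardened else number)
    return indices

print([hex(i) for i in parse_path("m / 44' / 60' / 0' / 0 / 0")])
# ['0x8000002c', '0x8000003c', '0x80000000', '0x0', '0x0']
```

The first three indices have the top bit set, which is what routes them through hardened derivation; the last two are normal indices.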

### How to Apply Derivation Paths in Code
You can derive child keys for each segment of the path by iterating through the levels:

In [4]:
def derive_from_path(
    master_key: bytes,
    chain_code: bytes,
    path: str
) -> tuple[bytes, bytes, bytes]:
    """
    Derive a key from a given derivation path.

    :param master_key: The master private key.
    :param chain_code: The master chain code.
    :param path: The derivation path (e.g., "m/44'/60'/0'/0/0").
    :return: The derived private key, chain code, and public key.
    """
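    levels = path.strip().split("/")
    if levels[0] != "m" or len(levels) < 2:
        raise ValueError("Path must start with 'm' and contain at least one level.")
    key, code = master_key, chain_code

    for level in levels[1:]:
        if level.endswith("'"):  # Hardened key
            index = 0x80000000 | int(level[:-1])
        else:  # Normal key
            index = int(level)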

        print(f"Processing index: {index} (Hardened: {'Yes' if index & 0x80000000 else 'No'})")
        key, code, pub_key = derive_child_key(key, code, index)

        # Log the current state
        print(f"Derived private key: {key.hex()}")
        print(f"Derived chain code: {code.hex()}")
        print(f"Derived public key: {pub_key.hex()}")

    return key, code, pub_key
    
# Example usage
path = "m/44'/60'/0'/0/0"  # Compare with MetaMask
derived_key, derived_chain_code, derived_pub_key = derive_from_path(master_key, chain_code, path)
print(f"Derived Private Key: {derived_key.hex()}")
print(f"Derived Chain Code: {derived_chain_code.hex()}")
print(f"Derived Public Key: {derived_pub_key.hex()}")

Processing index: 2147483692 (Hardened: Yes)
Index (4 bytes): 8000002c
HMAC input data: 00c4e4698f728127e084769296bd9736fe03f164fe14035dfa6c3359aa738eda5b8000002c
Child tweak: 6c2d225f8851a571c5f837422bce174f703cfd1f1d0c475fb49f94dc4cecace1, Child chain code: 4542e1ab8ac945b08b0a70ad6dc092d9c8bd06b992517ad2df5e11e009d2551d
Derived child private key: 31118beefad2cd524a6ec9d8e9654e4eb97f853681c7051e61008ff9f04545fb
Derived private key: 31118beefad2cd524a6ec9d8e9654e4eb97f853681c7051e61008ff9f04545fb
Derived chain code: 4542e1ab8ac945b08b0a70ad6dc092d9c8bd06b992517ad2df5e11e009d2551d
Derived public key: 0243eccffb1abb75a8f30d987a8c52148dc05035756d7cee25f92407d54fcb94e6
Processing index: 2147483708 (Hardened: Yes)
Index (4 bytes): 8000003c
HMAC input data: 0031118beefad2cd524a6ec9d8e9654e4eb97f853681c7051e61008ff9f04545fb8000003c
Child tweak: a961b4ab31725a74ac127e2582582224cc8f4e73d074a7881679ba7eb894cc8f, Child chain code: 9c228cc7708f39d381f5295fb6bb6f1adc19942b91a347817ee08fe6d95e6874


---

## Summary
In this notebook, we demonstrated the key steps to:
1. Convert a mnemonic to a seed.
2. Derive the master key and chain code from the seed.
3. Generate child keys using hierarchical deterministic derivation.
4. Understand and implement derivation paths.

Key takeaways:
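- The chain code is the extra input that links a parent key to its children; whoever holds it together with the parent public key can derive every normal child public key.
- Hardened derivation ensures that a leaked child private key cannot be combined with the parent public key and chain code to recover the parent private key.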
- Derivation paths provide a structured approach to navigate the hierarchy of HD Wallet keys.
