## Problem: Write a Byte Pair Encoder in Python

### Problem Statement
Implement **byte pair encoding (BPE)** in Python by completing the required sections. Starting from a corpus of words, the encoder should build a character-level vocabulary, repeatedly merge the most frequent pair of adjacent symbols, and return the final vocabulary together with the list of merges it performed.

### Requirements
1. **Build the Initial Vocabulary** (`get_vocab`):
   - Split each word into its characters and append the end-of-word token `</w>`.
   - Return a dictionary that maps each symbol tuple to the number of times the word occurs in the corpus.

2. **Count Adjacent Pairs** (`get_stats`):
   - For every word in the vocabulary, count each pair of neighbouring symbols.
   - Return a dictionary that maps each pair to its frequency.

3. **Merge a Pair** (`merge_vocab`):
   - Replace the given pair with a single concatenated symbol, for example `('e', 's')` becomes `'es'`.
   - Return a new vocabulary with the updated symbol tuples.

4. **Run the Merge Loop** (`byte_pair_encoding`):
   - Build the vocabulary, then repeat up to `num_merges` times: count the pairs, merge the most frequent one, and record it.
   - Return the final vocabulary and the list of merged pairs.

### Constraints
- Represent every word as a tuple of symbols that ends with `</w>`, as the tests expect.
- Return the merges as a list of tuples, one entry per merge performed.
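
To make the data shapes used by the starter code and tests below concrete, here is a minimal hand-built sketch for the single word `test`. It calls none of the functions to be implemented; the variable names are illustrative only, and weighting each pair by its word's frequency is an assumption consistent with the tests.

```python
from collections import Counter

# One word, split into characters plus the end-of-word marker.
word = "test"
symbols = tuple(word) + ("</w>",)

# Vocabulary: symbol tuple -> number of times the word occurs.
vocab = {symbols: 1}

# Adjacent pairs inside each word, weighted by the word's frequency.
pair_counts = Counter()
for seq, freq in vocab.items():
    for left, right in zip(seq, seq[1:]):
        pair_counts[(left, right)] += freq

# Shape of the word after merging the pair ('e', 's') by hand.
merged_by_hand = ("t", "es", "t", "</w>")
```

The word has five symbols, so it contributes four adjacent pairs, and a merge shortens the tuple by one symbol.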


In [None]:
from collections import defaultdict, Counter

def get_vocab(corpus):
    """Creates a vocabulary with words split into characters and a special end-of-word token."""
    # TODO: map each word's symbol tuple (its characters plus '</w>') to its frequency in the corpus.
    return vocab

def get_stats(vocab):
    """Counts frequency of adjacent symbol pairs."""
    # TODO: for each word, add its frequency to the count of every adjacent symbol pair.
    return pairs

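def merge_vocab(pair, vocab):
    """Merges every occurrence of the given pair into a single symbol."""
    # TODO: rebuild each symbol tuple with `pair` fused into one symbol, keeping the frequencies.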
    return new_vocab

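def byte_pair_encoding(corpus, num_merges=10):
    """Performs BPE on a corpus and returns the final vocabulary and the list of merges."""
    # TODO: build the vocabulary, then repeat num_merges times: count pairs, merge the most frequent, record it.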
    return vocab, merges

# Example usage
corpus = ["low", "lowest", "newer", "wider"]
final_vocab, merge_operations = byte_pair_encoding(corpus, num_merges=10)

print("\nFinal Vocabulary:")
for word in final_vocab:
    print(' '.join(word), ":", final_vocab[word])
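
One possible way to fill in the stubs above is sketched here. It is not the only valid solution and rests on a few assumptions that the starter code does not specify: pair counts are weighted by word frequency, ties for the most frequent pair go to the pair seen first, and the loop stops early if no pairs remain.

```python
from collections import Counter

def get_vocab(corpus):
    """Map each word, as a tuple of characters plus '</w>', to its frequency."""
    vocab = Counter()
    for word in corpus:
        vocab[tuple(word) + ("</w>",)] += 1
    return dict(vocab)

def get_stats(vocab):
    """Count adjacent symbol pairs, weighted by word frequency (assumption)."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return dict(pairs)

def merge_vocab(pair, vocab):
    """Fuse every non-overlapping occurrence of `pair` into one symbol."""
    new_vocab = {}
    for symbols, freq in vocab.items():
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2  # skip both halves of the merged pair
            else:
                merged.append(symbols[i])
                i += 1
        key = tuple(merged)
        new_vocab[key] = new_vocab.get(key, 0) + freq
    return new_vocab

def byte_pair_encoding(corpus, num_merges=10):
    """Run up to `num_merges` merges and return (vocab, merges)."""
    vocab = get_vocab(corpus)
    merges = []
    for _ in range(num_merges):
        pairs = get_stats(vocab)
        if not pairs:  # every word is already a single symbol (assumed stop rule)
            break
        best = max(pairs, key=pairs.get)  # first-seen pair wins ties (assumption)
        vocab = merge_vocab(best, vocab)
        merges.append(best)
    return vocab, merges
```

Scanning left to right and skipping two positions after a merge keeps occurrences from overlapping, so a run such as `('a', 'a', 'a')` merged on `('a', 'a')` becomes `('aa', 'a')` rather than being merged twice.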


In [None]:
def test_get_vocab():
    corpus = ["test"]
    vocab = get_vocab(corpus)
    assert vocab == {('t', 'e', 's', 't', '</w>'): 1}
    print("✓ test_get_vocab passed")

def test_get_stats():
    vocab = {('t', 'e', 's', 't', '</w>'): 1}
    stats = get_stats(vocab)
    expected = {
        ('t', 'e'): 1,
        ('e', 's'): 1,
        ('s', 't'): 1,
        ('t', '</w>'): 1
    }
    assert stats == expected
    print("✓ test_get_stats passed")

def test_merge_vocab():
    vocab = {('t', 'e', 's', 't', '</w>'): 1}
    merged = merge_vocab(('e', 's'), vocab)
    expected = {('t', 'es', 't', '</w>'): 1}
    assert merged == expected
    print("✓ test_merge_vocab passed")

def test_bpe_sequence():
    corpus = ["low", "lower", "newest", "widest"]
    final_vocab, merges = byte_pair_encoding(corpus, num_merges=5)
    assert isinstance(final_vocab, dict)
    assert all(isinstance(pair, tuple) for pair in merges)
    assert len(merges) == 5
    print("✓ test_bpe_sequence passed")

# Run all tests
test_get_vocab()
test_get_
