# uubed Basic Usage

This notebook demonstrates the basic usage of uubed for encoding and decoding embedding vectors.

In [None]:
import numpy as np
from uubed import encode, decode

# Display version
import uubed
print(f"uubed version: {uubed.__version__}")

## 1. Basic Encoding and Decoding

Let's start with a simple example of encoding and decoding a small embedding vector.

In [None]:
# Create a random embedding vector
embedding = np.random.randint(0, 256, 32, dtype=np.uint8)
print(f"Original embedding shape: {embedding.shape}")
print(f"First 10 values: {embedding[:10]}")

In [None]:
# Encode using the default method (auto-selects based on size)
encoded = encode(embedding)
print(f"Encoded string: {encoded}")
print(f"Encoded length: {len(encoded)} characters")

In [None]:
# Decode back to bytes (only works with eq64 method)
encoded_eq64 = encode(embedding, method="eq64")
decoded = decode(encoded_eq64, method="eq64")
decoded_array = np.frombuffer(decoded, dtype=np.uint8)

print(f"Decoded shape: {decoded_array.shape}")
print(f"Decoded matches original: {np.array_equal(embedding, decoded_array)}")

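The round trip above succeeds because eq64 is lossless. The sketch below shows how a position-dependent text encoding can be exactly reversible. Each byte is split into two 4-bit nibbles, and each nibble is written with one of four disjoint 16-character alphabets, chosen by the character's position modulo 4. The alphabets and the helper names `toy_encode`/`toy_decode` are hypothetical illustrations, not uubed's actual eq64 format.

```python
# Hypothetical alphabets: four disjoint 16-character sets, chosen by position mod 4.
# Illustrative only; these are NOT guaranteed to match uubed's eq64 alphabet.
ALPHABETS = [
    "ABCDEFGHIJKLMNOP",
    "QRSTUVWXYZabcdef",
    "ghijklmnopqrstuv",
    "wxyz0123456789-_",
]
# Reverse lookup: character -> (alphabet index, nibble value)
REVERSE = {ch: (i, v) for i, alpha in enumerate(ALPHABETS) for v, ch in enumerate(alpha)}


def toy_encode(data: bytes) -> str:
    """Encode each byte as two characters, one per 4-bit nibble."""
    chars = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            alphabet = ALPHABETS[len(chars) % 4]  # position-dependent alphabet
            chars.append(alphabet[nibble])
    return "".join(chars)


def toy_decode(text: str) -> bytes:
    """Invert toy_encode, rejecting characters that appear at the wrong position."""
    if len(text) % 2:
        raise ValueError("encoded length must be even")
    nibbles = []
    for pos, ch in enumerate(text):
        entry = REVERSE.get(ch)
        if entry is None or entry[0] != pos % 4:
            raise ValueError(f"character {ch!r} is not valid at position {pos}")
        nibbles.append(entry[1])
    # Recombine consecutive (high, low) nibble pairs into bytes
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[::2], nibbles[1::2]))


original = bytes([0, 127, 255, 42])
code = toy_encode(original)
print(code, len(code))
assert toy_decode(code) == original
```

Under this toy scheme every byte becomes exactly two characters. The length that uubed's real eq64 produces may differ (for example, if it adds separators), so rely on the length printed in the cells above.
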
## 2. Different Encoding Methods

uubed provides several encoding methods optimized for different use cases:

In [None]:
# Create a larger embedding for demonstration
large_embedding = np.random.randint(0, 256, 768, dtype=np.uint8)

# Compare different encoding methods
methods = {
    "eq64": "Full precision (lossless)",
    "shq64": "SimHash (compact, similarity-preserving)",
    "t8q64": "Top-K indices (sparse representation)",
    "zoq64": "Z-order (spatial locality)"
}

for method, description in methods.items():
    if method == "t8q64":
        encoded = encode(large_embedding, method=method, k=8)
    elif method == "shq64":
        encoded = encode(large_embedding, method=method, planes=64)
    else:
        encoded = encode(large_embedding, method=method)
    
    print(f"\n{method}: {description}")
    print(f"  Encoded: {encoded[:32]}...")
    print(f"  Length: {len(encoded)} characters")

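shq64 is described above as a compact, similarity-preserving SimHash code. The generic random-hyperplane SimHash idea can be sketched in pure Python. Project the centered vector onto a fixed set of random directions and keep one sign bit per direction. Vectors that point in similar directions then share most of their bits. This is a textbook sketch, not uubed's implementation. The seed, the centering around 128, and the Gaussian hyperplanes are all assumptions.

```python
import random


def simhash_bits(vec, planes=64, seed=42):
    """Generic random-hyperplane SimHash: one sign bit per random direction."""
    rng = random.Random(seed)  # same seed => same hyperplanes for every vector
    centered = [v - 128 for v in vec]  # assumption: center uint8 values around zero
    bits = []
    for _ in range(planes):
        normal = [rng.gauss(0.0, 1.0) for _ in centered]
        dot = sum(c * n for c, n in zip(centered, normal))
        bits.append(1 if dot > 0 else 0)
    return bits


def pack_bits(bits):
    """Pack a list of 0/1 bits into bytes, 8 bits per byte (most significant bit first)."""
    return bytes(
        int("".join(str(b) for b in bits[i:i + 8]), 2) for i in range(0, len(bits), 8)
    )


def hamming(a, b):
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)
base = [rng.randrange(256) for _ in range(768)]
near = [min(255, max(0, v + rng.randint(-5, 5))) for v in base]  # small perturbation
far = [rng.randrange(256) for _ in range(768)]  # unrelated vector

h_base, h_near, h_far = (simhash_bits(v) for v in (base, near, far))
print("packed size:", len(pack_bits(h_base)), "bytes")
print("Hamming distance to near vector:", hamming(h_base, h_near))
print("Hamming distance to far vector: ", hamming(h_base, h_far))
```

With 64 planes, a 768-byte vector shrinks to 8 bytes of bits. The Hamming distance between codes then approximates the angle between the original vectors.
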
## 3. Working with Different Input Types

uubed accepts several input types: `bytes`, lists of integers in the range 0 to 255, and `uint8` NumPy arrays:

In [None]:
# From bytes
byte_data = b"Hello, uubed! This is a test."
encoded_bytes = encode(byte_data, method="eq64")
print(f"From bytes: {encoded_bytes[:40]}...")

# From list of integers
int_list = [72, 101, 108, 108, 111]  # "Hello" in ASCII
encoded_list = encode(int_list, method="eq64")
print(f"From list: {encoded_list}")

# From numpy array
np_array = np.array([72, 101, 108, 108, 111], dtype=np.uint8)
encoded_array = encode(np_array, method="eq64")
print(f"From numpy: {encoded_array}")

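All three calls above encode byte values in the range 0 to 255. The sketch below shows how such inputs could be normalized into a single `bytes` object, with out-of-range values rejected (the case exercised in Section 5). `to_uint8_bytes` is a hypothetical helper, not part of uubed. To stay dependency-free, it handles NumPy arrays through the generic iterable path.

```python
from collections.abc import Iterable


def to_uint8_bytes(data) -> bytes:
    """Hypothetical normalizer: bytes-like input passes through, integer iterables are range-checked."""
    if isinstance(data, (bytes, bytearray, memoryview)):
        return bytes(data)
    if isinstance(data, str) or not isinstance(data, Iterable):
        raise TypeError(f"unsupported input type: {type(data).__name__}")
    values = [int(v) for v in data]  # int() also accepts NumPy integer scalars
    bad = [v for v in values if not 0 <= v <= 255]
    if bad:
        raise ValueError(f"values must be in range 0-255, got {bad}")
    return bytes(values)


print(to_uint8_bytes(b"Hello"))
print(to_uint8_bytes([72, 101, 108, 108, 111]))
try:
    to_uint8_bytes([0, 100, 300, 50])
except ValueError as e:
    print("ValueError:", e)
```
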
## 4. Performance Comparison

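Let's compare the encoding speed of each method. Absolute timings depend on your hardware, so compare the methods against each other rather than reading the numbers in isolation: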

In [None]:
import time

# Generate test data
test_embeddings = [np.random.randint(0, 256, 768, dtype=np.uint8) for _ in range(100)]

# Benchmark each method, using the same parameters as in Section 2
results = {}
for method in ["eq64", "shq64", "t8q64", "zoq64"]:
    if method == "t8q64":
        kwargs = {"k": 8}
    elif method == "shq64":
        kwargs = {"planes": 64}
    else:
        kwargs = {}

    start_time = time.perf_counter()  # monotonic, high-resolution timer
    for emb in test_embeddings:
        encode(emb, method=method, **kwargs)
    elapsed = time.perf_counter() - start_time
    results[method] = elapsed

# Display results, fastest first
n = len(test_embeddings)
print(f"Performance Results ({n} embeddings, 768 dimensions):")
print("-" * 50)
for method, elapsed in sorted(results.items(), key=lambda x: x[1]):
    print(f"{method:8s}: {elapsed:.4f}s ({n / elapsed:.1f} embeddings/sec)")

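A single timed pass over 100 vectors is sensitive to warm-up effects and background noise. The standard-library `timeit.repeat` function runs the measurement several times, and the minimum is usually the most stable estimate. The sketch below times a stand-in workload (`base64.b64encode` over random bytes) so that it runs without uubed installed. To benchmark uubed itself, replace the body of the hypothetical `encode_all` function with calls to `encode(emb, method=...)`.

```python
import base64
import os
import timeit

# Stand-in workload: 100 random 768-byte vectors (os.urandom avoids a NumPy dependency)
vectors = [os.urandom(768) for _ in range(100)]


def encode_all():
    # Stand-in encoder; swap in uubed's encode(...) to benchmark the real library
    for v in vectors:
        base64.b64encode(v)


# Each measurement runs the whole batch 5 times; take 3 measurements and keep the best
runs = timeit.repeat(encode_all, number=5, repeat=3)
best_per_batch = min(runs) / 5
print(f"best: {best_per_batch * 1e3:.3f} ms per batch, "
      f"{len(vectors) / best_per_batch:,.0f} vectors/sec")
```
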
## 5. Error Handling

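uubed raises descriptive exceptions for common mistakes, such as out-of-range input values (`ValueError`) and attempts to decode a lossy encoding (`NotImplementedError`):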

In [None]:
# Invalid input values
try:
    invalid_embedding = [0, 100, 300, 50]  # 300 is out of range
    encode(invalid_embedding)
except ValueError as e:
    print(f"ValueError: {e}")

# Invalid decoding
try:
    compressed = encode(embedding, method="shq64")
    decode(compressed, method="shq64")  # shq64 is lossy, can't decode
except NotImplementedError as e:
    print(f"\nNotImplementedError: {e}")

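The `NotImplementedError` reflects a basic fact: lossy codes discard information, so no decoder can recover the original bytes from them. The sketch below shows a dispatcher that enforces this rule. The names `LOSSY`, `DECODERS`, and `decode_sketch` are hypothetical, and `bytes.fromhex` stands in for a real lossless decoder. This is not uubed's actual decode implementation.

```python
# Hypothetical registry: only lossless methods get a decoder.
LOSSY = {"shq64", "t8q64", "zoq64"}
DECODERS = {"eq64": bytes.fromhex}  # stand-in: hex decoding instead of the real eq64 format


def decode_sketch(encoded: str, method: str) -> bytes:
    """Decode with a lossless method, or explain why decoding is impossible."""
    if method in LOSSY:
        raise NotImplementedError(f"{method} is lossy; the original bytes cannot be recovered")
    try:
        decoder = DECODERS[method]
    except KeyError:
        raise ValueError(f"unknown method: {method}") from None
    return decoder(encoded)


print(decode_sketch("48656c6c6f", "eq64"))  # b'Hello'
try:
    decode_sketch("anything", "shq64")
except NotImplementedError as e:
    print("NotImplementedError:", e)
```
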
## Summary

In this notebook, we covered:
1. Basic encoding and decoding with uubed
2. Different encoding methods and their use cases
3. Working with various input types
4. Performance comparison
5. Error handling

Key takeaways:
- **eq64** provides full precision and is the only reversible encoding
- **shq64** is best for compact similarity-preserving codes
- **t8q64** works well for sparse representations
- **zoq64** maintains spatial locality for range queries
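
The last two takeaways can be made concrete with generic versions of the underlying techniques. The sketch below shows two of them. Top-k index selection keeps the positions of the k largest values. Z-order (Morton) bit interleaving gives points that are close in space codes that share a long common prefix. These are textbook sketches with assumed parameters (a small k, 8 bits per coordinate, 2-D points), not uubed's t8q64 or zoq64 implementations.

```python
def top_k_indices(vec, k=8):
    """Indices of the k largest values, sorted ascending (a sparse summary of the vector)."""
    ranked = sorted(range(len(vec)), key=lambda i: vec[i], reverse=True)
    return sorted(ranked[:k])


def morton_code(coords, bits=8):
    """Interleave the bits of several non-negative integers into one Z-order code."""
    code = 0
    for bit in range(bits - 1, -1, -1):  # most significant bit first
        for c in coords:
            code = (code << 1) | ((c >> bit) & 1)
    return code


vec = [3, 250, 17, 90, 200, 5, 255, 64, 128, 1]
print(top_k_indices(vec, k=3))  # [1, 4, 6]: positions of 250, 200 and 255

# Nearby 2-D points share a long code prefix; a distant point does not
for p in [(10, 12), (11, 12), (200, 40)]:
    print(p, format(morton_code(p), "016b"))
```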

Next, check out the streaming API notebook for handling large datasets!