# Trading Time for Space: Adding TTL Caching to Your Python Functions

When optimizing algorithms, we often face the classic time vs. space complexity trade-off. Sometimes it makes sense to use more memory to achieve faster execution, especially when the same operation is repeated. In this notebook, we'll add Time-To-Live (TTL) caching to a Python function that already spends extra memory on a hash map to speed up lookups, so that this memory pays off across repeated calls instead of being rebuilt every time.

## The Problem

Let's start with a simple hash search function that converts an array into a dictionary for O(1) lookups:

In [None]:
def hash_search(array, item):
    if array is None or len(array) < 1: 
        return None
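    # Map each value to its first index; setdefault keeps the earliest
    # position for duplicates, matching what list.index() returns
    m = {}
    for n, val in enumerate(array):
        m.setdefault(val, n)
    return m.get(item)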

# Test the basic function
test_array = [1, 2, 3, 4, 5]
result = hash_search(test_array, 3)
print(f"Index of 3 in {test_array}: {result}")

This function spends O(n) extra memory on a hash map (`m`) so that the lookup itself is O(1). However, the map is rebuilt on every call, even for the same array, so each search still costs O(n) time, which is no better than a linear scan. If we're searching the same arrays repeatedly, most of that work is wasted.
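One detail of building a value-to-index map: when a value appears more than once, a dict comprehension lets later indices overwrite earlier ones, so the map returns the *last* occurrence, while `list.index()` returns the *first*. A small illustration, with `dict.setdefault` as one way to keep the first index instead (the `data` list is just sample input):

```python
data = [7, 3, 7, 3]

# Comprehension: later (value, index) pairs overwrite earlier ones, so the last index wins
last_index = {val: n for n, val in enumerate(data)}
print(last_index)  # {7: 2, 3: 3}

# setdefault only stores a key the first time it is seen, so the first index wins
first_index = {}
for n, val in enumerate(data):
    first_index.setdefault(val, n)
print(first_index)  # {7: 0, 3: 1}

print(data.index(3))  # 1, which agrees with first_index
```

This matters whenever hash-based and linear search are mixed behind one API, since the two should agree on which index they return.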

## The Solution: TTL Caching

We can cache the hash map with a Time-To-Live (TTL) mechanism, so subsequent calls with the same array reuse the cached dictionary until it expires.
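Conceptually, a TTL cache stores an expiry time next to each value and treats anything past its expiry as missing. A minimal stdlib-only sketch of that idea follows. The `SimpleTTLCache` class is hypothetical and much simpler than `cachetools.TTLCache`: it has no size limit, no thread safety, and expired entries are only removed when they are looked up. The `timer` parameter lets a fake clock stand in for real time.

```python
import time


class SimpleTTLCache:
    """Hypothetical minimal TTL cache: each entry expires `ttl` seconds after it was stored."""

    def __init__(self, ttl, timer=time.monotonic):
        self.ttl = ttl
        self.timer = timer  # injectable clock, handy for tests
        self._data = {}     # key -> (expires_at, value)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self.timer() >= expires_at:
            # Stale entry: drop it and report a miss
            del self._data[key]
            return default
        return value

    def set(self, key, value):
        self._data[key] = (self.timer() + self.ttl, value)


# Usage with a fake clock, so expiry can be shown without sleeping
now = [0.0]
cache = SimpleTTLCache(ttl=300, timer=lambda: now[0])
cache.set((1, 2, 3), {1: 0, 2: 1, 3: 2})
print(cache.get((1, 2, 3)))  # {1: 0, 2: 1, 3: 2}
now[0] = 301.0
print(cache.get((1, 2, 3)))  # None: the entry has expired
```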

### Functional Approach with Global Cache

In [None]:
# Install required package if not already installed
import subprocess
import sys

def install_package(package):
    try:
        __import__(package)
    except ImportError:
        print(f"Installing {package}...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])

install_package("cachetools")

In [None]:
from cachetools import TTLCache
import hashlib
import pickle

# Global cache with TTL (100 entries max, 5-minute expiration)
_hash_cache = TTLCache(maxsize=100, ttl=300)

def hash_search_cached(array, item):
    if array is None or len(array) < 1: 
        return None
    
    # Key on the array's contents. Using the tuple itself rather than
    # hash(tuple(array)) means two different arrays can never share an
    # entry, because the cache compares keys by equality after hashing.
    # This stores a copy of the contents as the key (more memory, no collisions).
    # Elements must be hashable anyway, since they become keys of the map.
    array_key = tuple(array)

    # One .get() call instead of an `in` check followed by a second lookup
    m = _hash_cache.get(array_key)
    if m is None:
        print(f"Cache MISS - creating hash map for array of length {len(array)}")
        # Cache miss: build the map (keeping each value's first index) and store it
        m = {}
        for n, val in enumerate(array):
            m.setdefault(val, n)
        _hash_cache[array_key] = m
    else:
        print(f"Cache HIT - reusing hash map for array of length {len(array)}")

    return m.get(item)

In [None]:
# Test the cached version
test_array = [1, 2, 3, 4, 5] * 1000  # Large array

print("First call (should be cache miss):")
result1 = hash_search_cached(test_array, 3)
print(f"Result: {result1}")

print("\nSecond call (should be cache hit):")
result2 = hash_search_cached(test_array, 4)
print(f"Result: {result2}")

print(f"\nCache size: {len(_hash_cache)}")
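Why does the choice of key matter? A cache keyed only on `hash(tuple(array))` trusts that integer to identify the array, but different arrays can hash to the same value, and the cache would then hand back another array's hash map. A small CPython example of such a collision, with plain dicts standing in for the TTL cache (a dict follows the same hash-then-equality lookup rules):

```python
# In CPython, -1 is reserved as an error code in the C API, so hash(-1) is -2,
# the same as hash(-2). A tuple's hash is computed from its elements' hashes:
a, b = [-1], [-2]
print(hash(tuple(a)) == hash(tuple(b)))  # True on CPython

# Keyed on the integer hash: looking up b finds the entry stored for a
by_hash = {hash(tuple(a)): {-1: 0}}
print(hash(tuple(b)) in by_hash)  # True: b would reuse a's hash map

# Keyed on the tuple itself: the contents are compared, so there is no false hit
by_tuple = {tuple(a): {-1: 0}}
print(tuple(b) in by_tuple)  # False
```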

## Class-Based Approach: Better for Apps and Libraries

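For applications and libraries, wrapping the cache in a class is usually a better fit: each instance owns its own cache and configuration instead of sharing one module-level global, and the class gives us a natural place for helpers such as clearing or inspecting the cache: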

In [None]:
class HashSearcher:
    def __init__(self, ttl_seconds=300, maxsize=100):
        self._cache = TTLCache(maxsize=maxsize, ttl=ttl_seconds)

    def _create_key(self, array):
        # The tuple of contents is the key: the cache compares keys by
        # equality, so two different arrays never share an entry
        return tuple(array)

    def search(self, array, item):
        if array is None or len(array) < 1:
            return None

        array_key = self._create_key(array)

        # Check cache first (one lookup instead of `in` plus indexing)
        m = self._cache.get(array_key)
        if m is None:
            print("Cache MISS - creating hash map")
            m = {}
            for n, val in enumerate(array):
                m.setdefault(val, n)  # keep the first index, like list.index()
            self._cache[array_key] = m
        else:
            print("Cache HIT - reusing hash map")

        return m.get(item)

    def clear_cache(self):
        """Manually clear the cache if needed."""
        self._cache.clear()

    def cache_info(self):
        """Return the current size and configuration of the cache."""
        return {
            'size': len(self._cache),
            'maxsize': self._cache.maxsize,
            'ttl': self._cache.ttl,
        }

    def contains_array(self, array):
        """Check whether the array's hash map is currently cached."""
        return self._create_key(array) in self._cache

In [None]:
# Test the class-based approach
searcher = HashSearcher(ttl_seconds=60, maxsize=50)

test_array = [10, 20, 30, 40, 50]
print("Testing HashSearcher:")
print(f"Search for 30: {searcher.search(test_array, 30)}")
print(f"Search for 40: {searcher.search(test_array, 40)}")
print(f"Cache info: {searcher.cache_info()}")
print(f"Array cached: {searcher.contains_array(test_array)}")

### Why Class-Based Is Better for Production

#### 1. Multiple Cache Configurations

In [None]:
# Different strategies for different use cases
fast_searcher = HashSearcher(ttl_seconds=60, maxsize=50)    # Quick operations
persistent_searcher = HashSearcher(ttl_seconds=3600, maxsize=200)  # Long-lived data
user_searcher = HashSearcher(ttl_seconds=1800, maxsize=10)  # Per-user cache

print("Multiple searchers created with different configurations:")
print(f"Fast: {fast_searcher.cache_info()}")
print(f"Persistent: {persistent_searcher.cache_info()}")
print(f"User: {user_searcher.cache_info()}")

#### 2. Enhanced API with Statistics

In [None]:
class AdvancedHashSearcher(HashSearcher):
    def __init__(self, ttl_seconds=300, maxsize=100, enable_stats=True):
        super().__init__(ttl_seconds, maxsize)
        self.enable_stats = enable_stats
        self.hit_count = 0
        self.miss_count = 0

    def search(self, array, item):
        if array is None or len(array) < 1:
            return None

        if self.enable_stats:
            # Ask the parent class whether the array is cached, so the hit/miss
            # count always uses the same key as the lookup in search() below
            if self.contains_array(array):
                self.hit_count += 1
                print(f"Cache HIT (total hits: {self.hit_count})")
            else:
                self.miss_count += 1
                print(f"Cache MISS (total misses: {self.miss_count})")

        return super().search(array, item)

    def get_hit_ratio(self):
        """Fraction of searches served from the cache (0.0 before any search)."""
        total = self.hit_count + self.miss_count
        return self.hit_count / total if total > 0 else 0.0

In [None]:
# Test the advanced searcher
advanced_searcher = AdvancedHashSearcher(ttl_seconds=300)
test_array = [1, 2, 3, 4, 5]

print("Testing AdvancedHashSearcher:")
# Expect one miss on the first call, then four hits (hit ratio 80%)
for _ in range(5):
    advanced_searcher.search(test_array, 3)

print(f"\nFinal hit ratio: {advanced_searcher.get_hit_ratio():.2%}")

## Performance Benefits

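Let's measure the improvement. Keep in mind that a cached call is still O(n): building the cache key walks the whole array. What the cache saves is rebuilding the dictionary, which is the more expensive O(n) step.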

In [None]:
import contextlib
import io
import time

# Large array for testing
large_array = list(range(10000))

# Without caching - rebuilds the hash map on every call
print("Testing without caching...")
start = time.perf_counter()  # perf_counter is the clock intended for timing
for _ in range(100):  # Reduced iterations for notebook
    hash_search(large_array, 5000)
no_cache_time = time.perf_counter() - start
print(f"Without caching: {no_cache_time:.3f}s")

# With caching - builds the hash map once, reuses it 99 times.
# The per-call HIT/MISS prints are silenced so they don't distort the timing.
print("\nTesting with caching...")
searcher = HashSearcher()
start = time.perf_counter()
with contextlib.redirect_stdout(io.StringIO()):
    for _ in range(100):
        searcher.search(large_array, 5000)
cache_time = time.perf_counter() - start
print(f"With caching: {cache_time:.3f}s")

print(f"\nSpeedup: {no_cache_time / cache_time:.1f}x faster with caching")
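To see where the remaining cost of a cached call lives, the two pieces can be timed on their own: producing the key, which happens on every call, and building the dictionary, which only happens on a miss. A rough sketch using `timeit` (the map is built with a dict comprehension here for brevity; absolute numbers will vary by machine):

```python
import timeit

data = list(range(10_000))

# Paid on every call, hit or miss: copy the array into a tuple and hash it
key_time = timeit.timeit(lambda: hash(tuple(data)), number=200)

# Paid only on a miss: build the value -> index dictionary
build_time = timeit.timeit(lambda: {val: n for n, val in enumerate(data)}, number=200)

print(f"Key construction (200 runs): {key_time:.4f}s")
print(f"Map construction (200 runs): {build_time:.4f}s")
```

Both steps are linear in the array length, so the cache improves the constant factor rather than the asymptotic cost of a search.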

## Library-Friendly Design

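Because each `HashSearcher` owns its cache, it can be embedded in a larger API. The `DataSearchLibrary` below uses the cached hash search for large arrays and falls back to a plain linear search for small ones: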

In [None]:
class DataSearchLibrary:
    def __init__(self):
        self.hash_searcher = HashSearcher(ttl_seconds=600)
        # Could add other search strategies here
    
    def find_in_array(self, array, item, method='auto'):
        """
        Find item in array using the best search method.
        
        Args:
            array: List to search in
            item: Item to find
            method: 'hash', 'linear', or 'auto' ('auto' uses hash search
                for arrays longer than 100 elements, linear search otherwise)
        
        Returns:
            Index of item or None if not found
        """
        if method == 'hash' or (method == 'auto' and len(array) > 100):
            return self.hash_searcher.search(array, item)
        else:
            # Fallback to linear search for small arrays
            try:
                return array.index(item)
            except ValueError:
                return None
    
    def get_cache_stats(self):
        """Get performance statistics"""
        return self.hash_searcher.cache_info()

In [None]:
# Test the library
search_lib = DataSearchLibrary()

# Test with small array (should use linear search)
small_array = [1, 2, 3, 4, 5]
print(f"Small array search (auto): {search_lib.find_in_array(small_array, 3)}")

# Test with large array (should use hash search)
large_array = list(range(1000))
print(f"Large array search (auto): {search_lib.find_in_array(large_array, 500)}")
print(f"Cache stats: {search_lib.get_cache_stats()}")

# Force hash method on small array
print(f"Small array with hash method: {search_lib.find_in_array(small_array, 3, 'hash')}")

## Learn More About Cachetools

The `cachetools` library provides several powerful caching strategies beyond the TTL approach we used:

In [None]:
from cachetools import (
    LRUCache,    # Least Recently Used
    TTLCache,    # Time To Live  
    TLRUCache,   # Time-aware LRU
    LFUCache,    # Least Frequently Used
    RRCache,     # Random Replacement
    FIFOCache    # First In, First Out
)

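# Demonstrate LRU eviction.
# Print keys with list(lru_cache), not dict(lru_cache): dict() reads every
# value through lru_cache[key], which counts as an access and would reset
# the recency order this demo depends on.
print("=== LRU Cache Demo ===")
lru_cache = LRUCache(maxsize=3)

lru_cache['A'] = 1
lru_cache['B'] = 2
lru_cache['C'] = 3
print(f"After adding A, B, C: {list(lru_cache)}")

# Access A: it becomes most recently used, leaving B as least recently used.
# Iteration order doesn't change; LRUCache tracks recency separately.
value = lru_cache['A']
print(f"After accessing A: {list(lru_cache)}")

lru_cache['D'] = 4  # Evicts B (least recently used)
print(f"After adding D: {list(lru_cache)}")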

In [None]:
import time

print("=== TTL Cache Demo ===")
ttl_cache = TTLCache(maxsize=100, ttl=2)  # 2-second expiration

ttl_cache['data'] = 'fresh_data'
print(f"Immediately after adding: {'data' in ttl_cache}")

time.sleep(1)
print(f"After 1 second: {'data' in ttl_cache}")

time.sleep(2)
print(f"After 3 seconds total: {'data' in ttl_cache}")

## Key Takeaways

1. **Class-based design** provides better encapsulation and flexibility for apps and libraries
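2. **TTL caching** avoids rebuilding the hash map when the same array is searched repeatedly, and expiration releases memory for arrays that are no longer being searched
3. **Cache keys must be collision-free**: keying on `hash(tuple(array))` alone lets two different arrays share an entry, while keying on `tuple(array)` makes the cache compare full contents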
4. **Configurable cache parameters** allow optimization for different use cases
5. **Additional methods** like `cache_info()` and `clear_cache()` provide operational control
6. **Thread safety** is not built in (`cachetools` caches are not thread-safe), but a lock can be added inside the class without changing its public API
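The cachetools documentation notes that its cache classes are not thread-safe and that access to a shared cache must be synchronized, for example with a lock. A minimal sketch of what that could look like inside a searcher class, using a hypothetical `LockedHashSearcher` whose `search()` signature matches `HashSearcher`. The cache is passed in so the demo can run with a plain dict; in the notebook you would pass a `TTLCache` instead.

```python
import threading


class LockedHashSearcher:
    """Hypothetical sketch: HashSearcher-style search() guarded by a lock.

    `cache` can be any mutable mapping, e.g. cachetools.TTLCache(maxsize=100,
    ttl=300). The demo below passes a plain dict so it runs without cachetools.
    """

    def __init__(self, cache):
        self._cache = cache
        self._lock = threading.Lock()

    def search(self, array, item):
        if array is None or len(array) < 1:
            return None
        key = tuple(array)
        with self._lock:
            m = self._cache.get(key)
        if m is None:
            # Build outside the lock so other threads aren't blocked by O(n) work
            m = {}
            for n, val in enumerate(array):
                m.setdefault(val, n)
            with self._lock:
                # If another thread stored a map in the meantime, reuse that one
                m = self._cache.setdefault(key, m)
        return m.get(item)


locked_searcher = LockedHashSearcher(cache={})
data = [5, 6, 7, 6]
results = []
threads = [
    threading.Thread(target=lambda: results.append(locked_searcher.search(data, 6)))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [1, 1, 1, 1, 1, 1, 1, 1]
```

Building the map outside the lock means two threads may occasionally both build it on a simultaneous miss; `setdefault` then keeps whichever was stored first, trading a little duplicate work for shorter lock hold times.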

By turning our simple hash search function into a class that owns a TTL cache, we've built a reusable component: each instance can be configured, inspected, and cleared independently, and the caching stays an implementation detail behind `search()`. That keeps the class easy to test, extend, and integrate into larger systems.