# 🌐 Tensorus Tutorial 8: Multi-Modal - Text, Images, Tensors Together

## 🎯 Learning Objectives
- **Integrate** text, images, audio, and tensor data seamlessly
- **Build** cross-modal search and retrieval systems
- **Implement** multi-modal AI applications
- **Create** unified embeddings across data types
- **Deploy** production multi-modal workflows

**⏱️ Duration:** 30 minutes | **🎓 Level:** Expert

---

## 🌈 Multi-Modal Revolution

Tensorus enables **seamless integration** of different data modalities in a single, unified system, something that traditional setups typically handle with a separate database per data type.

### 🎨 Multi-Modal Capabilities:

| Traditional Systems | **Tensorus Multi-Modal** |
|--------------------|---------------------------|
| Separate databases per type | 🌐 **Unified multi-modal storage** |
| Manual data alignment | 🔄 **Automatic cross-modal linking** |
| Limited search capabilities | 🔍 **Cross-modal semantic search** |
| Complex integration | ✨ **Native multi-modal operations** |
| Isolated embeddings | 🧠 **Unified embedding space** |
| Manual synchronization | ⚡ **Real-time multi-modal sync** |

### 🎯 Supported Modalities:

1. **📝 Text Data** - Documents, articles, code, natural language
2. **🖼️ Image Data** - Photos, diagrams, medical scans, satellite imagery
3. **🎵 Audio Data** - Speech, music, environmental sounds
4. **📊 Tensor Data** - Neural networks, scientific data, simulations
5. **📹 Video Data** - Movies, surveillance, time-series imagery
6. **🗂️ Structured Data** - Tables, graphs, knowledge bases
7. **🧬 Scientific Data** - Molecular structures, genomics, physics
8. **🌍 Geospatial Data** - Maps, GPS tracks, geographic information

The code in this tutorial implements embedders for text, image, and tensor data only; the other modalities are declared in `ModalityType` but have no embedder here.

### 🚀 Revolutionary Features:
- **🔍 Cross-Modal Search**: Find images using text descriptions
- **🧠 Unified Embeddings**: Single vector space for all data types
- **🔄 Automatic Alignment**: Link related data across modalities
- **📊 Multi-Modal Analytics**: Analyze patterns across data types
- **🤖 AI Integration**: Native support for multi-modal AI models

**🌟 Result: a single tensor database for text, images, tensors, and more!**
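
To make the idea of a unified embedding space concrete, here is a minimal sketch of text-to-image search. It assumes every item, whatever its modality, has already been mapped to a vector of the same length. The three-dimensional vectors, the item IDs, and the `search` helper are hypothetical and are not part of the Tensorus API.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical embeddings: item ID -> (modality, vector in the shared space).
store: Dict[str, Tuple[str, List[float]]] = {
    "img_cat": ("image", [0.9, 0.1, 0.0]),
    "img_car": ("image", [0.0, 0.2, 0.9]),
    "txt_pets": ("text", [0.8, 0.3, 0.1]),
}


def search(query_vec: List[float], target_modality: str,
           threshold: float = 0.7) -> List[Tuple[str, float]]:
    """Rank items of one modality by similarity to a query from any modality."""
    hits = [
        (item_id, cosine(query_vec, vec))
        for item_id, (modality, vec) in store.items()
        if modality == target_modality
    ]
    # Keep only items above the threshold, most similar first.
    hits = [(item_id, score) for item_id, score in hits if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Stands in for the embedding of a text query such as "a photo of a cat".
text_query = [1.0, 0.2, 0.0]
results = search(text_query, "image")
```

Because both modalities live in the same space, the text query retrieves `img_cat` and drops `img_car`, whose similarity falls below the threshold.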

In [None]:
# 🛠️ Setup: Advanced Multi-Modal Framework
import torch
import torch.nn as nn
import numpy as np
import requests
import json
import time
import base64
import io
from typing import Dict, List, Tuple, Optional, Any, Union
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
import matplotlib.pyplot as plt
import seaborn as sns
from PIL import Image
import warnings
warnings.filterwarnings('ignore')

# Set visualization style
plt.style.use('seaborn-v0_8')
sns.set_palette("Set2")

class ModalityType(Enum):
    """Supported data modalities"""
    TEXT = "text"
    IMAGE = "image"
    AUDIO = "audio"
    VIDEO = "video"
    TENSOR = "tensor"
    STRUCTURED = "structured"
    SCIENTIFIC = "scientific"
    GEOSPATIAL = "geospatial"

@dataclass
class MultiModalData:
    """Container for multi-modal data with unified metadata"""
    data_id: str
    modality: ModalityType
    content: Any
    embedding: Optional[torch.Tensor] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    relationships: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)
    
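    def __post_init__(self):
        if not self.data_id:
            import uuid  # local import keeps this fix self-contained
            # A random suffix avoids ID collisions when items are created within the same millisecond
            self.data_id = f"{self.modality.value}_{int(time.time()*1000)}_{uuid.uuid4().hex[:8]}"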

@dataclass
class CrossModalQuery:
    """Cross-modal search query"""
    query_modality: ModalityType
    query_content: Any
    target_modalities: List[ModalityType]
    similarity_threshold: float = 0.7
    max_results: int = 10
    filters: Dict[str, Any] = field(default_factory=dict)

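class MultiModalEmbedder:
    """Multi-modal embedding system.

    The per-modality models below are small, randomly initialized placeholders.
    They share an output dimension but are not trained or aligned with each other,
    so cross-modal similarity scores are illustrative only.
    """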
    
    def __init__(self):
        self.embedding_models = {}
        self.embedding_dim = 512  # Unified embedding dimension
        self._initialize_models()
    
    def _initialize_models(self):
        """Initialize embedding models for different modalities"""
        print("🧠 Initializing multi-modal embedding models...")
        
        # Text embedder (simplified)
        self.embedding_models[ModalityType.TEXT] = self._create_text_embedder()
        
        # Image embedder (simplified)
        self.embedding_models[ModalityType.IMAGE] = self._create_image_embedder()
        
        # Tensor embedder
        self.embedding_models[ModalityType.TENSOR] = self._create_tensor_embedder()
        
        print("✅ Multi-modal embedders initialized")
    
    def _create_text_embedder(self) -> nn.Module:
        """Create text embedding model"""
        class SimpleTextEmbedder(nn.Module):
            def __init__(self, vocab_size=10000, embed_dim=512):
                super().__init__()
                self.embedding = nn.Embedding(vocab_size, embed_dim)
                self.pooling = nn.AdaptiveAvgPool1d(1)
            
            def forward(self, text_tokens):
                embedded = self.embedding(text_tokens)
                pooled = self.pooling(embedded.transpose(1, 2)).squeeze(-1)
                return pooled
        
        return SimpleTextEmbedder()
    
    def _create_image_embedder(self) -> nn.Module:
        """Create image embedding model"""
        class SimpleImageEmbedder(nn.Module):
            def __init__(self, embed_dim=512):
                super().__init__()
                self.conv_layers = nn.Sequential(
                    nn.Conv2d(3, 64, 3, padding=1),
                    nn.ReLU(),
                    nn.AdaptiveAvgPool2d((8, 8)),
                    nn.Conv2d(64, 128, 3, padding=1),
                    nn.ReLU(),
                    nn.AdaptiveAvgPool2d((4, 4)),
                    nn.Flatten(),
                    nn.Linear(128 * 4 * 4, embed_dim)
                )
            
            def forward(self, images):
                return self.conv_layers(images)
        
        return SimpleImageEmbedder()
    
    def _create_tensor_embedder(self) -> nn.Module:
        """Create tensor embedding model"""
        class TensorEmbedder(nn.Module):
            def __init__(self, embed_dim=512):
                super().__init__()
                self.projection = nn.Linear(1024, embed_dim)  # Assume max 1024 features
            
            def forward(self, tensors):
                # Flatten and project tensors to unified space
                flattened = tensors.flatten(start_dim=1).float()  # cast so integer tensors also work
                # Pad or truncate to 1024 dimensions
                if flattened.shape[1] > 1024:
                    flattened = flattened[:, :1024]
                elif flattened.shape[1] < 1024:
                    # new_zeros matches the dtype and device of the input
                    padding = flattened.new_zeros((flattened.shape[0], 1024 - flattened.shape[1]))
                    flattened = torch.cat([flattened, padding], dim=1)
                
                return self.projection(flattened)
        
        return TensorEmbedder()
    
    def embed_text(self, text: str) -> torch.Tensor:
        """Generate embedding for text"""
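        # Simplified, deterministic tokenization: hash each word to a vocabulary ID
        # (in practice, use a proper tokenizer)
        import zlib  # local import keeps this method self-contained
        vocab_size = self.embedding_models[ModalityType.TEXT].embedding.num_embeddings
        words = text.lower().split()[:50] or ["<empty>"]  # avoid a zero-length sequence
        tokens = torch.tensor([[zlib.crc32(w.encode("utf-8")) % vocab_size for w in words]])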
        
        with torch.no_grad():
            embedding = self.embedding_models[ModalityType.TEXT](tokens)
        
        return embedding.squeeze(0)
    
    def embed_image(self, image_array: np.ndarray) -> torch.Tensor:
        """Generate embedding for image"""
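        # Convert an HxW or HxWxC array (values in 0-255) to a 1x3xHxW float tensor
        if image_array.ndim == 2:
            # Grayscale: replicate the single channel across RGB
            image_array = np.stack([image_array] * 3, axis=-1)
        elif image_array.ndim != 3 or image_array.shape[2] < 3:
            raise ValueError(f"Expected an HxW or HxWxC (C >= 3) image, got shape {image_array.shape}")
        # Keep only the RGB channels (drops alpha if present)
        rgb = np.ascontiguousarray(image_array[..., :3])
        image_tensor = torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0).float() / 255.0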
        
        with torch.no_grad():
            embedding = self.embedding_models[ModalityType.IMAGE](image_tensor)
        
        return embedding.squeeze(0)
    
    def embed_tensor(self, tensor: torch.Tensor) -> torch.Tensor:
        """Generate embedding for arbitrary tensor"""
        with torch.no_grad():
            embedding = self.embedding_models[ModalityType.TENSOR](tensor.unsqueeze(0))
        
        return embedding.squeeze(0)
    
    def compute_similarity(self, embedding1: torch.Tensor, embedding2: torch.Tensor) -> float:
        """Compute cosine similarity between embeddings"""
        cos_sim = torch.nn.functional.cosine_similarity(embedding1, embedding2, dim=0)
        return cos_sim.item()

class MultiModalSystem:
    """Advanced multi-modal data management system"""
    
    def __init__(self, api_url: str = "http://127.0.0.1:7860"):
        self.api_url = api_url
        self.server_available = self._test_connection()
        self.embedder = MultiModalEmbedder()
        self.data_store: Dict[str, MultiModalData] = {}
        self.modality_indices: Dict[ModalityType, List[str]] = {modality: [] for modality in ModalityType}
        
        print("🌐 Multi-Modal System Initialized")
        print(f"📡 Tensorus: {'✅ Connected' if self.server_available else '⚠️ Local Mode'}")
    
    def _test_connection(self) -> bool:
        try:
            response = requests.get(f"{self.api_url}/health", timeout=3)
            return response.status_code == 200
        except requests.RequestException:
            return False
    
    def add_text_data(self, text: str, metadata: Optional[Dict[str, Any]] = None, tags: Optional[List[str]] = None) -> MultiModalData:
        """Add text data to the multi-modal system"""
        print(f"\n📝 Adding text data: '{text[:50]}{'...' if len(text) > 50 else ''}'")
        
        # Generate embedding
        embedding = self.embedder.embed_text(text)
        
        # Create multi-modal data object
        data = MultiModalData(
            data_id="",  # Will be auto-generated
            modality=ModalityType.TEXT,
            content=text,
            embedding=embedding,
            metadata=metadata or {},
            tags=tags or []
        )
        
        # Store in system
        self.data_store[data.data_id] = data
        self.modality_indices[ModalityType.TEXT].append(data.data_id)
        
        # Store in Tensorus if available
        if self.server_available:
            self._store_in_tensorus(data)
        
        print(f"   ✅ Stored as {data.data_id}")
        print(f"   🧠 Embedding shape: {embedding.shape}")
        
        return data
    
    def add_image_data(self, image_array: np.ndarray, metadata: Optional[Dict[str, Any]] = None, tags: Optional[List[str]] = None) -> MultiModalData:
        """Add image data to the multi-modal system"""
        print(f"\n🖼️ Adding image data: {image_array.shape}")
        
        # Generate embedding
        embedding = self.embedder.embed_image(image_array)
        
        # Create multi-modal data object
        data = MultiModalData(
            data_id="",  # Will be auto-generated
            modality=ModalityType.IMAGE,
            content=image_array,
            embedding=embedding,
            metadata=metadata or {},
            tags=tags or []
        )
        
        # Store in system
        self.data_store[data.data_id] = data
        self.modality_indices[ModalityType.IMAGE].append(data.data_id)
        
        # Store in Tensorus if available
        if self.server_available:
            self._store_in_tensorus(data)
        
        print(f"   ✅ Stored as {data.data_id}")
        print(f"   🧠 Embedding shape: {embedding.shape}")
        
        return data
    
    def add_tensor_data(self, tensor: torch.Tensor, metadata: Optional[Dict[str, Any]] = None, tags: Optional[List[str]] = None) -> MultiModalData:
        """Add tensor data to the multi-modal system"""
        print(f"\n📊 Adding tensor data: {tensor.shape} {tensor.dtype}")
        
        # Generate embedding
        embedding = self.embedder.embed_tensor(tensor)
        
        # Create multi-modal data object
        data = MultiModalData(
            data_id="",  # Will be auto-generated
            modality=ModalityType.TENSOR,
            content=tensor,
            embedding=embedding,
            metadata=metadata or {},
            tags=tags or []
        )
        
        # Store in system
        self.data_store[data.data_id] = data
        self.modality_indices[ModalityType.TENSOR].append(data.data_id)
        
        # Store in Tensorus if available
        if self.server_available:
            self._store_in_tensorus(data)
        
        print(f"   ✅ Stored as {data.data_id}")
        print(f"   🧠 Embedding shape: {embedding.shape}")
        
        return data
    
    def _store_in_tensorus(self, data: MultiModalData):
        """Store multi-modal data in Tensorus"""
        try:
            # Prepare payload based on modality
            if data.modality == ModalityType.TEXT:
                content_data = data.content
            elif data.modality == ModalityType.IMAGE:
                content_data = data.content.tolist()  # Convert numpy array
            elif data.modality == ModalityType.TENSOR:
                content_data = data.content.tolist()  # Convert tensor
            else:
                content_data = str(data.content)  # Fallback to string
            
            payload = {
                "data_id": data.data_id,
                "modality": data.modality.value,
                "content": content_data,
                "embedding": data.embedding.tolist(),
                "metadata": data.metadata,
                "tags": data.tags,
                "created_at": data.created_at.isoformat()
            }
            
            response = requests.post(f"{self.api_url}/api/v1/multimodal/data", json=payload, timeout=10)
            response.raise_for_status()  # Report HTTP errors instead of ignoring them
        
        except Exception as e:
            print(f"   ⚠️ Failed to store in Tensorus: {e}")
    
    def cross_modal_search(self, query: CrossModalQuery) -> List[Tuple[MultiModalData, float]]:
        """Perform cross-modal search across different data types"""
        print("\n🔍 Cross-Modal Search")
        print(f"   📥 Query modality: {query.query_modality.value}")
        print(f"   🎯 Target modalities: {[m.value for m in query.target_modalities]}")
        
        # Generate query embedding
        if query.query_modality == ModalityType.TEXT:
            query_embedding = self.embedder.embed_text(query.query_content)
        elif query.query_modality == ModalityType.IMAGE:
            query_embedding = self.embedder.embed_image(query.query_content)
        elif query.query_modality == ModalityType.TENSOR:
            query_embedding = self.embedder.embed_tensor(query.query_content)
        else:
            raise ValueError(f"Unsupported query modality: {query.query_modality}")
        
        # Search across target modalities
        results = []
        
        for modality in query.target_modalities:
            for data_id in self.modality_indices[modality]:
                data = self.data_store[data_id]
                
                # Compute similarity
                similarity = self.embedder.compute_similarity(query_embedding, data.embedding)
                
                # Apply threshold and filters
                if similarity >= query.similarity_threshold:
                    # Apply metadata filters
                    if self._passes_filters(data, query.filters):
                        results.append((data, similarity))
        
        # Sort by similarity and limit results
        results.sort(key=lambda x: x[1], reverse=True)
        results = results[:query.max_results]
        
        print(f"   📊 Found {len(results)} matching results")
        
        return results
    
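    def _passes_filters(self, data: MultiModalData, filters: Dict[str, Any]) -> bool:
        """Check if data passes metadata filters (the special key "tags" filters on tags)"""
        for key, value in filters.items():
            if key == "tags":
                # Accept a single tag or a collection; every requested tag must be present
                required = [value] if isinstance(value, str) else list(value)
                if any(tag not in data.tags for tag in required):
                    return False
            elif key not in data.metadata or data.metadata[key] != value:
                # A missing metadata key fails the filter instead of silently passing
                return False
        return True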
    
    def create_relationships(self, data_ids: List[str], relationship_type: str = "related"):
        """Create relationships between multi-modal data items"""
        print(f"\n🔗 Creating relationships: {relationship_type}")
        print(f"   📋 Data IDs: {data_ids}")
        
        # Add bidirectional relationships
        for data_id in data_ids:
            if data_id in self.data_store:
                existing = self.data_store[data_id].relationships
                for other_id in data_ids:
                    # Skip self-links and avoid duplicates on repeated calls
                    if other_id != data_id and other_id not in existing:
                        existing.append(other_id)
        
        print("   ✅ Relationships created")
    
    def get_system_summary(self) -> Dict[str, Any]:
        """Get comprehensive system summary"""
        modality_counts = {modality.value: len(ids) for modality, ids in self.modality_indices.items()}
        
        return {
            "total_items": len(self.data_store),
            "modality_distribution": modality_counts,
            "embedding_dimension": self.embedder.embedding_dim,
            "server_connected": self.server_available,
            "supported_modalities": [m.value for m in ModalityType]
        }

# Initialize multi-modal system
multimodal_system = MultiModalSystem()

print("\n🌐 MULTI-MODAL TUTORIAL")
print("=" * 50)
print("🎯 Ready to explore cross-modal AI capabilities!")
print(f"🌈 Supported modalities: {len(ModalityType)} types")
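
The text embedder above expects integer token IDs. One simple way to get reproducible IDs without a trained tokenizer is the hashing trick, sketched below with the standard library's `zlib.crc32`. The `hash_tokenize` helper and its default sizes are assumptions for illustration and are not part of Tensorus.

```python
import zlib
from typing import List


def hash_tokenize(text: str, vocab_size: int = 10000, max_len: int = 50) -> List[int]:
    """Map each lowercased word to a stable ID in [0, vocab_size) via CRC32."""
    words = text.lower().split()[:max_len]
    return [zlib.crc32(word.encode("utf-8")) % vocab_size for word in words]


# The same words always yield the same IDs, regardless of case or process restarts.
ids_a = hash_tokenize("Tensor databases store tensors")
ids_b = hash_tokenize("tensor DATABASES store tensors")
```

Unlike Python's built-in `hash`, CRC32 is not salted per process, so the IDs stay stable across runs. The trade-off is that distinct words can collide on the same ID, which a real tokenizer avoids.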