## Project Overview
A real-time 3D pose estimation and semantic analysis system for human movement, built on multiple synchronized cameras. The project evolved from foundational planning (Meeting 1: Trevor & Jason, ~Sept 5, 2025) to implementation refinement (Meeting 2: Trevor, Jason & Henry, ~Sept 6-7, 2025).

### Core Applications:
- **Combat Sports**: Kickboxing analysis (jabs, crosses, hooks, kicks) with multi-person scenarios
- **Dance Performance**: Pose-to-sound mapping for immersive art installations (Wei Dong collaboration)
- **Medical/Recovery**: Gait analysis for concussion recovery tracking
- **Fitness**: Kettlebell exercise form analysis and technique correction
- **General Motion Capture**: Extensible framework for various movement analysis scenarios

**Demo Target**: September 24, 2025 in San Francisco with Wei Dong (Kineve/Kinetech Arts)

## Project Evolution & Meeting Insights

### Meeting 1 (Foundational Architecture, ~2 hours)
**Participants**: Trevor Conger (intern/student) & Jason (mentor/lead with IoT/sensors background)
**Focus**: Deep-dive architectural planning with educational emphasis
- Established pipeline blueprint: Capture → Normalize → Fuse → Model → Semantics
- Defined real-time as SLA (Service Level Agreement) with emergency response analogies
- Explored graph-based approach with explicit edges (wrist-elbow connections)
- Emphasized mentorship and foundational understanding

### Meeting 2 (Implementation Pivot, ~30-60 mins with tech issues)
**Participants**: Trevor, Jason & Henry (Trevor's classmate, 3D modeling expertise)
**Focus**: Tactical implementation with collaborative problem-solving
- **Key Shift**: From graph-based to point cloud approach (no explicit edges)
- Introduced OFF format and ModelNet integration strategy
- Added Henry's 3D mesh experience (Minecraft replicas, mesh simplification)
- Refined tools and setup procedures

## Technical Architecture

### 1. Input Capture & Synchronization
- **Hardware**: 3 synchronized cameras (Raspberry Pi cameras: 640x480, 240 FPS)
- **Synchronization Method**: Hollywood-style light-flash technique (hacky but effective)
- **Processing**: MediaPipe for pose estimation
  - Outputs 33 keypoints per frame per camera
  - Each keypoint has XYZ coordinates (99 columns total: 33 × 3)
  - Confidence scores available but ignored initially
- **Data Source**: KB_snatch_Sept5_prim folder (3 Parquet/CSV files from kettlebell footage)
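
The 99-column layout above can be unpacked into 33 (x, y, z) keypoints per frame. The sketch below is a minimal illustration, assuming the columns are interleaved as x0, y0, z0, x1, y1, z1, and so on; the actual column order in the KB_snatch_Sept5_prim files is not stated in these notes and should be checked first. `row_to_keypoints` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

NUM_KEYPOINTS = 33  # MediaPipe pose keypoints per frame per camera


def row_to_keypoints(row: Sequence[float]) -> List[Point]:
    """Split one flat 99-value row into 33 (x, y, z) tuples.

    Assumption: values are interleaved as x0, y0, z0, x1, y1, z1, ...
    """
    expected = NUM_KEYPOINTS * 3
    if len(row) != expected:
        raise ValueError(f"expected {expected} values, got {len(row)}")
    return [
        (float(row[i]), float(row[i + 1]), float(row[i + 2]))
        for i in range(0, expected, 3)
    ]
```

Confidence scores are left out here, matching the decision to ignore them initially.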

### 2. Data Pipeline & Storage
- **Input Format**: CSV/Parquet files (3 files per session, one per camera)
- **Data Structure**: High frame rate (240 FPS) provides density; can downsample to 30 FPS
- **Database**: KDB (column-based time series with in-memory capabilities)
- **File Management**: GitHub for code, DVC for data versioning, UV for package management
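
Downsampling from 240 FPS to 30 FPS can be as simple as keeping every 8th frame. The sketch below shows that stride-based approach on any sequence of frames; `downsample` is a hypothetical helper, and the notes do not say whether the project drops frames or aggregates them, so treat this as one option.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def downsample(frames: Sequence[T], src_fps: int = 240, dst_fps: int = 30) -> List[T]:
    """Keep every (src_fps // dst_fps)-th frame, starting with the first.

    Requires src_fps to be a whole multiple of dst_fps so the stride is exact.
    """
    if src_fps <= 0 or dst_fps <= 0 or src_fps % dst_fps != 0:
        raise ValueError("src_fps must be a positive multiple of dst_fps")
    step = src_fps // dst_fps  # 240 // 30 == 8
    return list(frames[::step])
```

One second of 240 FPS footage (240 frames) comes out as 30 frames with the defaults.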

### 3. Normalization & Fusion Strategy
- **Reference Frame**: MediaPipe provides world coordinates (standard reference frame)
- **Fusion Method**: For each frame and keypoint, combine the XYZ estimates from all 3 cameras
  - The three camera views of a keypoint form one triangle (or a 3-point set)
  - Results in a sparse point cloud per frame (33 keypoints × 3 cameras = 99 points)
- **Alternative Approach**: Point cloud concatenation without explicit edge collapse
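
The concatenation approach can be sketched as follows: take the 33 keypoints from each of the three cameras and lay them out so that the three views of the same keypoint sit next to each other. This is a minimal sketch, assuming all three cameras already share one reference frame and are frame-aligned; `fuse_frame` and the per-keypoint ordering are illustrative choices, not something fixed by the notes.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def fuse_frame(
    cam_a: Sequence[Point], cam_b: Sequence[Point], cam_c: Sequence[Point]
) -> List[Point]:
    """Concatenate three camera views of one frame into a single point cloud.

    Output order is keypoint-major: kp0@A, kp0@B, kp0@C, kp1@A, ...
    so points 3k, 3k+1, 3k+2 are the three views of keypoint k.
    """
    if not (len(cam_a) == len(cam_b) == len(cam_c)):
        raise ValueError("all cameras must report the same number of keypoints")
    cloud: List[Point] = []
    for a, b, c in zip(cam_a, cam_b, cam_c):
        cloud.extend([a, b, c])
    return cloud
```

Keeping the three views of a keypoint adjacent makes it easy to describe each keypoint's triangle later by index, while the cloud itself carries no explicit edges.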

### 4. Data Transformation for ML
- **Target Format**: OFF (Object File Format) for compatibility with ModelNet
- **OFF Structure**:
  - Header: "OFF"
  - Counts: vertices (99), faces (33, one triangle per keypoint), edges (0)
  - Vertex list: XYZ coordinates per line
  - Face list: Triangles defined by 3-camera views per keypoint
- **Implementation**: Hack ModelNet class to use local OFF files instead of downloads
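
The OFF structure above is plain text, so a writer and a matching reader fit in a few lines. The sketch below is a minimal illustration, assuming the 99 vertices are ordered so that vertices 3k, 3k+1, 3k+2 are the three camera views of keypoint k; `keypoint_faces`, `format_off`, and `parse_off` are hypothetical helper names, and the reader only handles the simple triangle-only layout produced here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Face = Tuple[int, int, int]


def keypoint_faces(num_keypoints: int = 33) -> List[Face]:
    """One triangle per keypoint, joining its three camera views."""
    return [(3 * k, 3 * k + 1, 3 * k + 2) for k in range(num_keypoints)]


def format_off(vertices: Sequence[Point], faces: Sequence[Face]) -> str:
    """Serialize vertices and triangular faces as OFF text."""
    lines = ["OFF", f"{len(vertices)} {len(faces)} 0"]  # edge count left at 0
    lines += [f"{float(x)!r} {float(y)!r} {float(z)!r}" for x, y, z in vertices]
    # Each face line starts with its vertex count (3 for a triangle).
    lines += ["3 " + " ".join(str(i) for i in face) for face in faces]
    return "\n".join(lines) + "\n"


def parse_off(text: str) -> Tuple[List[Point], List[Face]]:
    """Parse OFF text written by format_off (triangles only)."""
    rows = [
        ln.split()
        for ln in text.splitlines()
        if ln.strip() and not ln.lstrip().startswith("#")
    ]
    if not rows or rows[0] != ["OFF"]:
        raise ValueError("missing OFF header")
    n_vertices, n_faces = int(rows[1][0]), int(rows[1][1])
    vertex_rows = rows[2 : 2 + n_vertices]
    face_rows = rows[2 + n_vertices : 2 + n_vertices + n_faces]
    vertices = [(float(x), float(y), float(z)) for x, y, z in vertex_rows]
    faces = [(int(r[1]), int(r[2]), int(r[3])) for r in face_rows]
    return vertices, faces
```

Writing one frame to disk is then `open(path, "w").write(format_off(cloud, keypoint_faces()))`, where `cloud` is the fused 99-point frame and `path` is wherever the local dataset lives.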

### 5. Machine Learning Framework
- **Primary Framework**: PyTorch Geometric
- **Model Architecture**: PointNet/PointNet++ for point cloud processing
  - **Evolution**: Shifted from graph-based (Meeting 1) to point cloud focus (Meeting 2)
  - Handles sparse point clouds without explicit edges
  - Uses distance functions for associations instead of predefined connections
- **Reference Implementation**: Weights & Biases PointNet example for architecture guidance
- **Data Classes**: Data/TemporalData for graph representation
- **Logging**: MLflow (preferred over Weights & Biases for cost/market relevance)
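- **Scaling**: PyTorch Geometric supports multi-GPU training, but it must be set up explicitly (for example with PyTorch's `DistributedDataParallel`); it does not scale automatically. PaperSpace is used for cloud GPU development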
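
The reason a PointNet-style model needs no explicit edges is that it applies the same small network to every point and then pools with a symmetric function (max), so point order does not matter. The sketch below illustrates only that core idea in plain Python; it is not PyTorch Geometric's implementation, has no training loop, and `shared_layer` and `global_feature` are hypothetical names.

```python
from typing import List, Sequence

Point = Sequence[float]


def shared_layer(
    point: Point, weights: Sequence[Sequence[float]], bias: Sequence[float]
) -> List[float]:
    """Apply one linear layer plus ReLU to a single point.

    The same weights are reused for every point in the cloud.
    """
    return [
        max(0.0, sum(w * x for w, x in zip(w_row, point)) + b)
        for w_row, b in zip(weights, bias)
    ]


def global_feature(
    points: Sequence[Point], weights: Sequence[Sequence[float]], bias: Sequence[float]
) -> List[float]:
    """Max-pool per-point features into one vector for the whole cloud.

    Max is symmetric, so reordering the points leaves the result unchanged.
    """
    if not points:
        raise ValueError("empty point cloud")
    features = [shared_layer(p, weights, bias) for p in points]
    return [max(column) for column in zip(*features)]
```

Feeding the same 99 points in a different order returns the same global feature, which is why the fused cloud can be passed in without predefined wrist-elbow style connections.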

### 6. Real-Time Processing Definition
- **Real-Time = SLA**: Service Level Agreement guaranteeing response within time window
- **Not Required**: Ultra-low latency; what matters is predictable throughput
- **Target**: Based on input volume and processing needs
- **Hardware Scaling**: Add GPUs when needed; multi-GPU training with PyTorch Geometric requires explicit setup rather than happening automatically
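
Treating real-time as an SLA means measuring how often processing finishes inside the agreed window, not chasing the lowest possible latency. The sketch below shows one way to check that from recorded per-frame latencies. The budget and the required fraction are placeholders, since the notes leave the actual target open, and `sla_compliance` and `meets_sla` are hypothetical names.

```python
from typing import Sequence


def sla_compliance(latencies_ms: Sequence[float], budget_ms: float) -> float:
    """Return the fraction of samples that finished within the budget."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    within = sum(1 for t in latencies_ms if t <= budget_ms)
    return within / len(latencies_ms)


def meets_sla(
    latencies_ms: Sequence[float],
    budget_ms: float,
    required_fraction: float = 0.99,  # placeholder target, not from the notes
) -> bool:
    """True if enough samples met the budget to satisfy the SLA."""
    return sla_compliance(latencies_ms, budget_ms) >= required_fraction
```

For example, at 30 FPS a per-frame budget of roughly 33 ms would keep processing in step with the input, but the right window depends on input volume and what the downstream consumer needs.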

## Current Implementation Status

### Completed/Defined:
- ✅ Overall architecture and data flow
- ✅ Hardware selection and synchronization method  
- ✅ Data format specifications (CSV/Parquet → OFF)
- ✅ ML framework selection (PyTorch Geometric + PointNet)
- ✅ Database choice (KDB) and tooling (UV, GitHub, DVC)

### In Progress:
- 🔄 Data transformation pipeline (CSV/Parquet → OFF format)
- 🔄 ModelNet class modification for local file processing
- 🔄 Environment setup and repository structure

### Next Steps:
1. **Trevor/Henry Tasks**:
   - Process 3 Parquet/CSV files from KB_snatch_Sept5_prim folder
   - Implement CSV → OFF transformation
   - Research CLI tools (MeshLab) for format conversion
   - Setup development environment (UV, GitHub repo)
   - Study PointNet papers and PyTorch Geometric documentation

2. **Technical Validation**:
   - Test sparse mesh → point cloud conversion
   - Validate temporal encoding (single vs. multi-frame OFF)
   - Address sparsity challenges (99 points vs. typical 10k+ point clouds)

## Key Technical Challenges & Solutions

### Critical Issues Identified:
- **Sparsity Challenge**: Only 99 points/frame vs. typical dense point clouds (10k+)
  - *Risk*: PointNet may struggle with sparse data
  - *Solution*: Group consecutive frames into one cloud for density (e.g. merging 8 frames at 240 FPS gives 30 clouds per second with 8 × 99 = 792 points each)
  
- **Temporal Encoding**: Multi-frame sequences and TemporalData integration
  - *Question*: Single OFF file per frame vs. multi-frame encoding
  - *Approach*: Test TemporalData class for sequence handling
  
- **Fusion Accuracy**: Camera displacement creates "weak" triangular meshes
  - *Henry's Proposal*: Average XYZ coordinates across cameras per keypoint
  - *Status*: Deferred until baseline implementation works
  
- **Data Volume**: High frame rate (240 FPS) processing pressure
  - *Solutions*: Efficient batching, data loaders, modern hardware scaling
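
Two of the ideas above are small enough to sketch directly: grouping consecutive frames into one denser cloud, and Henry's proposal to average the three camera estimates per keypoint. This is a minimal sketch under the assumption that frames are already fused and time-aligned; `group_frames` and `average_views` are hypothetical names, and dropping a trailing incomplete group is an illustrative choice.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def group_frames(
    clouds: Sequence[Sequence[Point]], group_size: int = 8
) -> List[List[Point]]:
    """Merge each run of group_size consecutive clouds into one cloud.

    With 99 points per frame and group_size=8, each merged cloud has
    792 points. A trailing incomplete group is dropped.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    merged: List[List[Point]] = []
    for start in range(0, len(clouds) - group_size + 1, group_size):
        cloud: List[Point] = []
        for frame in clouds[start : start + group_size]:
            cloud.extend(frame)
        merged.append(cloud)
    return merged


def average_views(
    cam_a: Sequence[Point], cam_b: Sequence[Point], cam_c: Sequence[Point]
) -> List[Point]:
    """Average the three camera estimates of each keypoint (33 points out)."""
    if not (len(cam_a) == len(cam_b) == len(cam_c)):
        raise ValueError("all cameras must report the same number of keypoints")
    return [
        tuple((a[i] + b[i] + c[i]) / 3.0 for i in range(3))
        for a, b, c in zip(cam_a, cam_b, cam_c)
    ]
```

The two pull in opposite directions on density: grouping raises the point count per cloud at the cost of blurring motion within the window, while averaging cuts each frame from 99 points to 33.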

### Implementation Strategies:
- **"Straight Path First"**: Get basic pipeline working before optimization
- **Prototype-Driven**: End-to-end functionality over perfection
- **Collaborative Debugging**: Leverage team's diverse expertise
- **Hardware Scaling**: Local development → GPU cloud scaling (PaperSpace)

## Team Structure & Collaboration Dynamics

### Core Team:
- **Trevor Conger**: Data transformation implementation (intern/student, passionate collaborator)
- **Jason (Speaker 1)**: Hardware/IoT integration, project mentorship (background: healthcare simulations, Spark/MLpack)
- **Henry**: 3D modeling and ML aspects (Trevor's classmate, mesh expertise from Minecraft replicas)
- **Wei Dong**: External collaboration partner (Kineve/Kinetech Arts, dance/performance focus)

### Collaboration Evolution:
- **Meeting 1**: Mentor-student dynamic (Jason's educational approach, 2-hour deep dive)
- **Meeting 2**: Collaborative problem-solving (Henry's fresh perspective, practical questions)
- **Key Insight**: "Collaboration as energy source" - Wei's involvement sparked project restart

## Project Philosophy & Context

### Core Principles:
- **Prototype-First**: End-to-end functionality over perfection ("straight path first")
- **Collaborative Energy**: Team motivation essential (solo work draining, all-nighters unsustainable)
- **Accessible Technology**: Low-cost hardware (Raspberry Pi) for broader adoption
- **Modular Architecture**: Capture → Normalize → Fuse → Model → Semantics pipeline
- **Educational Value**: Learning through implementation (Trevor's PointNet paper reading assignments)

### Ethical Considerations:
- **Contrast with AI-Generated Work**: Trevor's discomfort with fully AI-generated projects (Gerard's ethics tool)
- **Hands-On Approach**: Emphasis on understanding and building rather than automated generation
- **Real-World Applications**: Focus on meaningful use cases (medical recovery, performance art)

### Broader Context:
- **Historical Connections**: Jason's background with Spark early days, MLpack collaborations
- **Industry Relevance**: Dance/performance art integration, medical applications, sports analysis
- **Timeline Pressure**: Demo deadline creates focused urgency while maintaining learning goals


```python
from IPython.display import Video

# Embed the example clip inline in the notebook.
Video("example.mp4")
```