From 47d405e066e02577c35bcdf6f3e17f36253f907d Mon Sep 17 00:00:00 2001
From: Feng Zhang
Date: Thu, 24 Jul 2025 09:59:10 -0700
Subject: [PATCH] Clean up documents and add storage guides

---
 README.md                                     |  16 +-
 ...versioned-memory.md => ai-agent-memory.md} |   0
 ...xternal-storage.md => external-storage.md} |   0
 doc/guide/git-workflows.md                    | 776 ------------------
 doc/guide/{git-prolly.md => git.md}           |   0
 doc/guide/{git-prolly-sql.md => sql.md}       |   0
 doc/guide/storage.md                          | 399 +++++++++
 7 files changed, 405 insertions(+), 786 deletions(-)
 rename doc/design/{ai-agent-versioned-memory.md => ai-agent-memory.md} (100%)
 rename doc/design/{git-external-storage.md => external-storage.md} (100%)
 delete mode 100644 doc/guide/git-workflows.md
 rename doc/guide/{git-prolly.md => git.md} (100%)
 rename doc/guide/{git-prolly-sql.md => sql.md} (100%)
 create mode 100644 doc/guide/storage.md

diff --git a/README.md b/README.md
index 8ce0f6c..c7d06b6 100644
--- a/README.md
+++ b/README.md
@@ -221,16 +221,12 @@ The following features are for Prolly tree library for Version 0.2.0:
 The following features are for Prolly tree library for Version 0.2.1:
 - [X] tree diffing and merging examples
 - [X] show history of changes of the Prolly tree (git logs style) using `gitoxide` crate
-- [ ] support gluesql as a kv store
-
-The following features are for Prolly tree library for Version 0.2.2:
-- [ ] version-controlled databases
-  - [ ] support for IPFS (InterPlanetary File System) for distributed storage
-  - [ ] advanced probabilistic splitting algorithms
-- [ ] decentralized databases
-  - [ ] add support for cryptographic hash functions like SHA-256, BLAKE3, etc.
-  - [ ] support ZK-friendly hashes such as Poseidon, MiMC, etc.
-  - [ ] supporting a rollup-style L2 for a decentralized database
+- [X] support python bindings for Prolly Tree
+- [X] support sql query based on gluesql as a query engine
+- [X] add usage examples for git-prolly use cases
+- [X] add usage examples for ai agent memory use cases
+- [X] support rocksdb as storage backend
+- [ ] support ipld as storage backend
 
 ## Contributing
 
diff --git a/doc/design/ai-agent-versioned-memory.md b/doc/design/ai-agent-memory.md
similarity index 100%
rename from doc/design/ai-agent-versioned-memory.md
rename to doc/design/ai-agent-memory.md
diff --git a/doc/design/git-external-storage.md b/doc/design/external-storage.md
similarity index 100%
rename from doc/design/git-external-storage.md
rename to doc/design/external-storage.md
diff --git a/doc/guide/git-workflows.md b/doc/guide/git-workflows.md
deleted file mode 100644
index 94ed26c..0000000
--- a/doc/guide/git-workflows.md
+++ /dev/null
@@ -1,776 +0,0 @@
-# Design of Development Workflows using Prollytree and Git
-
-A comprehensive guide to integrating git-prolly into your development workflows, covering both separate repository and monorepo approaches.
-
-## Table of Contents
-
-1. [Overview](#overview)
-2. [Repository Architecture Patterns](#repository-architecture-patterns)
-3. [Separate Repository Workflow](#separate-repository-workflow)
-4. [Monorepo Workflow](#monorepo-workflow)
-5. [Cross-Branch Data Testing](#cross-branch-data-testing)
-6. [Advanced Debugging Techniques](#advanced-debugging-techniques)
-7. [Best Practices](#best-practices)
-8. [Common Scenarios](#common-scenarios)
-
-## Overview
-
-git-prolly enables versioned key-value storage with full Git integration, allowing you to version your data alongside your code. This manual covers recommended workflows for different development scenarios.
-
-### Key Benefits
-- **Version Control**: Full history tracking for both code and data
-- **Branching**: Separate data states for different features/environments
-- **Collaboration**: Standard Git workflows for team development
-- **Debugging**: Test code against different data states
-- **Deployment**: Coordinate code and data deployments
-
-## Repository Architecture Patterns
-
-### Pattern 1: Separate Repositories
-```
-myapp/          (Main application repository)
-├── .git/
-├── src/
-├── Cargo.toml
-└── data/       (Git submodule → kv-data repo)
-
-kv-data/        (Separate KV data repository)
-├── .git/
-├── prolly_tree_root
-└── README.md
-```
-
-### Pattern 2: Monorepo (Single Repository)
-```
-myapp/          (Single repository)
-├── .git/
-├── src/        (Application code)
-├── config/     (KV data store)
-│   └── prolly_tree_root
-├── user-data/  (Another KV store)
-│   └── prolly_tree_root
-└── Cargo.toml
-```
-
-## Separate Repository Workflow
-
-### Setup
-
-#### 1. Create KV Data Repository
-```bash
-# Create and initialize KV data repository
-git clone --bare https://github.com/myteam/kv-data.git
-cd kv-data
-git-prolly init
-git-prolly set config:app:name "MyApp"
-git-prolly set config:app:version "1.0.0"
-git-prolly commit -m "Initial configuration"
-git push origin main
-```
-
-#### 2. Add KV Data as Submodule
-```bash
-# In your main application repository
-git submodule add https://github.com/myteam/kv-data.git data
-git commit -m "Add KV data submodule"
-```
-
-### Development Workflow
-
-#### Feature Development
-```bash
-# Start new feature
-git checkout -b feature/new-ui
-
-# Update KV data for this feature
-cd data
-git checkout -b feature/new-ui-config
-git-prolly set ui:theme "material"
-git-prolly set ui:layout "grid"
-git-prolly commit -m "Add new UI configuration"
-git push origin feature/new-ui-config
-cd ..
-
-# Update submodule reference
-git add data
-git commit -m "Update KV data for new UI feature"
-```
-
-#### Environment-Specific Branches
-```bash
-# Production KV data
-cd data
-git checkout production
-git-prolly set db:host "prod-db.example.com"
-git-prolly set cache:ttl "3600"
-git-prolly commit -m "Production configuration"
-cd ..
-
-# Staging KV data
-cd data
-git checkout staging
-git-prolly set db:host "staging-db.example.com"
-git-prolly set cache:ttl "300"
-git-prolly commit -m "Staging configuration"
-cd ..
-```
-
-### Using KV Data in Code
-```rust
-// src/main.rs
-use prollytree::git::VersionedKvStore;
-
-fn main() -> Result<(), Box<dyn std::error::Error>> {
-    // Open KV store from submodule
-    let store = VersionedKvStore::open("./data")?;
-
-    let app_name = store.get(b"config:app:name")?;
-    let db_host = store.get(b"db:host")?;
-
-    println!("Starting {} with database at {}",
-        String::from_utf8_lossy(&app_name.unwrap_or_default()),
-        String::from_utf8_lossy(&db_host.unwrap_or_default())
-    );
-
-    Ok(())
-}
-```
-
-### Deployment
-```bash
-# Deploy to production
-git checkout main
-cd data
-git checkout production  # Use production KV data
-cd ..
-git add data
-git commit -m "Deploy with production configuration"
-git push origin main
-
-# Deploy to staging
-git checkout staging
-cd data
-git checkout staging     # Use staging KV data
-cd ..
-git add data
-git commit -m "Deploy with staging configuration"
-git push origin staging
-```
-
-## Monorepo Workflow
-
-### Setup
-
-#### 1. Initialize Monorepo
-```bash
-# Create project structure
-mkdir myapp && cd myapp
-git init
-
-# Initialize KV stores
-mkdir config && cd config
-git-prolly init
-git-prolly set app:name "MyApp"
-git-prolly set app:version "1.0.0"
-git-prolly commit -m "Initial app configuration"
-cd ..
-
-mkdir user-data && cd user-data
-git-prolly init
-git-prolly set schema:version "1"
-git-prolly commit -m "Initial user data schema"
-cd ..
-
-# Add application code
-mkdir src
-echo 'fn main() { println!("Hello World"); }' > src/main.rs
-
-# Commit everything
-git add .
-git commit -m "Initial project setup"
-```
-
-### Development Workflow
-
-#### Feature Development
-```bash
-# Create feature branch
-git checkout -b feature/user-profiles
-
-# Update both code and KV data
-echo 'fn create_user_profile() {}' >> src/lib.rs
-
-cd config
-git-prolly set features:user_profiles "true"
-git-prolly set ui:profile_page "enabled"
-git-prolly commit -m "Enable user profiles feature"
-cd ..
-
-cd user-data
-git-prolly set schema:user_profile "name,email,created_at"
-git-prolly commit -m "Add user profile schema"
-cd ..
-
-# Commit all changes together
-git add .
-git commit -m "Implement user profiles feature"
-```
-
-#### Environment-Specific Configurations
-```bash
-# Production configuration
-git checkout main
-cd config
-git-prolly set db:host "prod-db.example.com"
-git-prolly set features:beta_features "false"
-git-prolly commit -m "Production settings"
-cd ..
-git add config/
-git commit -m "Update production configuration"
-
-# Staging configuration
-git checkout -b staging
-cd config
-git-prolly set db:host "staging-db.example.com"
-git-prolly set features:beta_features "true"
-git-prolly commit -m "Staging settings"
-cd ..
-git add config/
-git commit -m "Update staging configuration"
-```
-
-### Using Multiple KV Stores
-```rust
-// src/main.rs
-use prollytree::git::VersionedKvStore;
-
-fn main() -> Result<(), Box<dyn std::error::Error>> {
-    // Open multiple KV stores
-    let config_store = VersionedKvStore::open("./config")?;
-    let user_store = VersionedKvStore::open("./user-data")?;
-
-    // Use configuration
-    let app_name = config_store.get(b"app:name")?;
-    let db_host = config_store.get(b"db:host")?;
-
-    // Use user data schema
-    let schema = user_store.get(b"schema:user_profile")?;
-
-    println!("App: {} | DB: {} | Schema: {}",
-        String::from_utf8_lossy(&app_name.unwrap_or_default()),
-        String::from_utf8_lossy(&db_host.unwrap_or_default()),
-        String::from_utf8_lossy(&schema.unwrap_or_default())
-    );
-
-    Ok(())
-}
-```
-
-## Cross-Branch Data Testing
-
-### The Problem
-You're working on a hotfix and need to test it against data from different branches/environments:
-- Production data (stable)
-- Staging data (recent changes)
-- Production-sample data (subset for testing)
-
-### Solution 1: Git Worktrees (Recommended)
-
-```bash
-# Create separate worktrees for different environments
-git worktree add ../myapp-staging staging
-git worktree add ../myapp-production production
-git worktree add ../myapp-sample production-sample
-
-# Test your hotfix against each environment
-cd ../myapp-staging
-cargo test -- --test-threads=1
-
-cd ../myapp-production
-cargo test -- --test-threads=1
-
-cd ../myapp-sample
-cargo test -- --test-threads=1
-
-# Clean up when done
-cd ../myapp
-git worktree remove ../myapp-staging
-git worktree remove ../myapp-production
-git worktree remove ../myapp-sample
-```
-
-### Solution 2: KV Data Branch Switching
-
-```bash
-#!/bin/bash
-# test_cross_branch.sh
-
-BRANCHES=("staging" "production" "production-sample")
-ORIGINAL_BRANCH=$(cd config && git-prolly current-branch)
-
-echo "Testing hotfix against multiple data branches..."
-
-for branch in "${BRANCHES[@]}"; do
-    echo "========================================="
-    echo "Testing against $branch data..."
-
-    # Switch KV data to this branch
-    cd config
-    git-prolly checkout $branch
-    cd ..
-
-    # Run tests
-    echo "Running tests with $branch data:"
-    cargo test --test integration_tests
-
-    if [ $? -eq 0 ]; then
-        echo "✅ Tests PASSED with $branch data"
-    else
-        echo "❌ Tests FAILED with $branch data"
-    fi
-
-    echo ""
-done
-
-# Restore original branch
-cd config
-git-prolly checkout $ORIGINAL_BRANCH
-cd ..
-
-echo "Cross-branch testing complete!"
-```
-
-### Solution 3: Programmatic Testing
-
-```rust
-// tests/cross_branch_test.rs
-use prollytree::git::VersionedKvStore;
-use std::process::Command;
-
-#[derive(Debug)]
-struct TestResult {
-    branch: String,
-    passed: bool,
-    details: String,
-}
-
-struct CrossBranchTester {
-    config_path: String,
-}
-
-impl CrossBranchTester {
-    fn new(config_path: &str) -> Self {
-        Self {
-            config_path: config_path.to_string(),
-        }
-    }
-
-    fn test_against_branch(&self, branch: &str) -> Result<TestResult, Box<dyn std::error::Error>> {
-        // Switch to the test branch
-        let mut store = VersionedKvStore::open(&self.config_path)?;
-        let current_branch = store.current_branch().to_string();
-
-        store.checkout(branch)?;
-
-        // Run your hotfix logic
-        let result = self.run_hotfix_tests(&store);
-
-        // Restore original branch
-        store.checkout(&current_branch)?;
-
-        Ok(TestResult {
-            branch: branch.to_string(),
-            passed: result.is_ok(),
-            details: match result {
-                Ok(msg) => msg,
-                Err(e) => format!("Error: {}", e),
-            },
-        })
-    }
-
-    fn run_hotfix_tests(&self, store: &VersionedKvStore<32>) -> Result<String, Box<dyn std::error::Error>> {
-        // Your actual hotfix testing logic
-        let db_host = store.get(b"db:host")?;
-        let timeout = store.get(b"db:timeout")?;
-
-        // Simulate hotfix logic
-        match (db_host, timeout) {
-            (Some(host), Some(timeout_val)) => {
-                let host_str = String::from_utf8_lossy(&host);
-                let timeout_str = String::from_utf8_lossy(&timeout_val);
-
-                // Your hotfix validation logic here
-                if host_str.contains("prod") && timeout_str.parse::<u64>().unwrap_or(0) > 1000 {
-                    Ok("Hotfix works correctly".to_string())
-                } else {
-                    Err("Hotfix validation failed".into())
-                }
-            }
-            _ => Err("Required configuration missing".into()),
-        }
-    }
-
-    fn test_all_branches(&self) -> Result<Vec<TestResult>, Box<dyn std::error::Error>> {
-        let branches = vec!["staging", "production", "production-sample"];
-        let mut results = Vec::new();
-
-        for branch in branches {
-            match self.test_against_branch(branch) {
-                Ok(result) => results.push(result),
-                Err(e) => {
-                    results.push(TestResult {
-                        branch: branch.to_string(),
-                        passed: false,
-                        details: format!("Error: {}", e),
-                    });
-                }
-            }
-        }
-
-        Ok(results)
-    }
-}
-
-#[test]
-fn test_hotfix_cross_branch() {
-    let tester = CrossBranchTester::new("./config");
-    let results = tester.test_all_branches().unwrap();
-
-    for result in results {
-        println!("Branch: {} - Passed: {} - Details: {}",
-            result.branch, result.passed, result.details);
-
-        // You can assert specific conditions here
-        // assert!(result.passed, "Hotfix failed for branch: {}", result.branch);
-    }
-}
-```
-
-## Advanced Debugging Techniques
-
-### 1. Historical Data Testing
-
-```bash
-# Test against specific historical commits
-cd config
-git-prolly checkout abc123def  # Specific commit
-cd ..
-cargo test
-
-# Test against tagged versions
-cd config
-git-prolly checkout v1.2.3
-cd ..
-cargo test
-```
-
-### 2. Data Diff Analysis
-
-```bash
-# Compare data between branches
-git-prolly diff production staging
-
-# Compare specific commits
-git-prolly diff abc123def def456abc
-
-# JSON output for automation
-git-prolly diff production staging --format=json > data_diff.json
-```
-
-### 3. Debugging with Multiple Datasets
-
-```rust
-// src/debug_tools.rs
-use prollytree::git::VersionedKvStore;
-
-pub fn debug_with_multiple_datasets() -> Result<(), Box<dyn std::error::Error>> {
-    let datasets = vec![
-        ("staging", "./config"),
-        ("production", "./config"),
-        ("production-sample", "./config"),
-    ];
-
-    for (name, path) in datasets {
-        println!("=== Debugging with {} dataset ===", name);
-
-        let mut store = VersionedKvStore::open(path)?;
-        store.checkout(name)?;
-
-        // Your debugging logic here
-        debug_specific_issue(&store, name)?;
-    }
-
-    Ok(())
-}
-
-fn debug_specific_issue(store: &VersionedKvStore<32>, dataset: &str) -> Result<(), Box<dyn std::error::Error>> {
-    // Example: Debug a specific configuration issue
-    let problematic_config = store.get(b"feature:problematic_feature")?;
-
-    if let Some(config) = problematic_config {
-        println!("Dataset {}: problematic_feature = {}",
-            dataset, String::from_utf8_lossy(&config));
-
-        // Apply your fix logic and test
-        let result = test_fix_logic(&config);
-        println!("Fix result for {}: {:?}", dataset, result);
-    }
-
-    Ok(())
-}
-
-fn test_fix_logic(config: &[u8]) -> bool {
-    // Your fix logic here
-    true
-}
-```
-
-## Best Practices
-
-### Repository Structure
-
-#### Separate Repositories
-```
-# Use when:
-- Teams manage data and code separately
-- Different release cycles for data and code
-- Multiple applications share the same data
-- Strict separation of concerns required
-
-# Benefits:
-- Clear ownership boundaries
-- Independent versioning
-- Reusable data across projects
-- Granular access control
-```
-
-#### Monorepo
-```
-# Use when:
-- Small team with unified workflow
-- Data and code are tightly coupled
-- Atomic updates required
-- Simple deployment pipeline
-
-# Benefits:
-- Atomic commits
-- Simplified dependency management
-- Unified testing and CI/CD
-- Easier refactoring
-```
-
-### Branch Strategy
-
-#### For Data Branches
-```bash
-# Environment branches
-main                    # Production-ready
-staging                 # Pre-production testing
-development             # Integration testing
-
-# Feature branches
-feature/new-ui-config   # UI configuration changes
-feature/api-v2-schema   # API schema updates
-hotfix/critical-config  # Critical configuration fixes
-```
-
-#### For Code Branches
-```bash
-# Standard Git flow
-main                    # Production code
-develop                 # Integration branch
-feature/new-feature     # Feature development
-hotfix/critical-fix     # Critical fixes
-```
-
-### Testing Strategy
-
-#### Unit Tests
-```rust
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn test_with_mock_data() {
-        // Test with controlled data
-        let mut store = create_test_store();
-        store.insert(b"test:key".to_vec(), b"test:value".to_vec()).unwrap();
-
-        // Your test logic
-        assert_eq!(get_processed_value(&store), Some("expected".to_string()));
-    }
-
-    fn create_test_store() -> VersionedKvStore<32> {
-        // Create a temporary store for testing
-        let temp_dir = tempfile::tempdir().unwrap();
-        VersionedKvStore::init(temp_dir.path()).unwrap()
-    }
-}
-```
-
-#### Integration Tests
-```rust
-#[cfg(test)]
-mod integration_tests {
-    use super::*;
-
-    #[test]
-    fn test_with_real_data() {
-        // Test with real data from different branches
-        let tester = CrossBranchTester::new("./config");
-        let results = tester.test_all_branches().unwrap();
-
-        for result in results {
-            assert!(result.passed, "Integration test failed for {}: {}",
-                result.branch, result.details);
-        }
-    }
-}
-```
-
-### CI/CD Integration
-
-#### GitHub Actions Example
-```yaml
-# .github/workflows/test.yml
-name: Test with Multiple Datasets
-
-on: [push, pull_request]
-
-jobs:
-  test:
-    runs-on: ubuntu-latest
-    strategy:
-      matrix:
-        dataset: [staging, production, production-sample]
-
-    steps:
-    - uses: actions/checkout@v2
-      with:
-        submodules: true  # For separate repo approach
-
-    - name: Setup Rust
-      uses: actions-rs/toolchain@v1
-      with:
-        toolchain: stable
-
-    - name: Switch to test dataset
-      run: |
-        cd config
-        git-prolly checkout ${{ matrix.dataset }}
-        cd ..
-
-    - name: Run tests
-      run: cargo test
-```
-
-## Common Scenarios
-
-### Scenario 1: Feature Development with Data Changes
-
-```bash
-# Developer workflow
-git checkout -b feature/recommendation-engine
-
-# Update KV data
-cd config
-git-prolly set ml:model_version "2.1"
-git-prolly set ml:confidence_threshold "0.85"
-git-prolly commit -m "Update ML model configuration"
-cd ..
-
-# Update application code
-# ... make code changes ...
-
-# Test together
-cargo test
-
-# Commit everything
-git add .
-git commit -m "Implement recommendation engine v2.1"
-```
-
-### Scenario 2: Hotfix Testing
-
-```bash
-# Critical bug in production
-git checkout -b hotfix/memory-leak-fix
-
-# Fix the code
-vim src/memory_manager.rs
-
-# Test against production data
-cd config
-git-prolly checkout production
-cd ..
-cargo test --test memory_tests
-
-# Test against staging data
-cd config
-git-prolly checkout staging
-cd ..
-cargo test --test memory_tests
-
-# Deploy with confidence
-git checkout main
-git merge hotfix/memory-leak-fix
-```
-
-### Scenario 3: Environment Promotion
-
-```bash
-# Promote from staging to production
-git checkout staging
-
-# Verify staging tests pass
-cargo test
-
-# Update production KV data
-cd config
-git-prolly checkout production
-git-prolly merge staging
-git-prolly commit -m "Promote staging configuration to production"
-cd ..
-
-# Deploy to production
-git checkout main
-git merge staging
-git push origin main
-```
-
-### Scenario 4: Data Migration
-
-```bash
-# Migrate data schema
-cd config
-git-prolly branch migration/v2-schema
-git-prolly checkout migration/v2-schema
-
-# Update schema
-git-prolly set schema:version "2"
-git-prolly set schema:user_table "id,name,email,created_at,updated_at"
-git-prolly delete schema:legacy_fields
-git-prolly commit -m "Migrate to schema v2"
-
-# Test migration
-cd ..
-cargo test --test migration_tests
-
-# Merge when ready
-cd config
-git-prolly checkout main
-git-prolly merge migration/v2-schema
-```
-
-## Conclusion
-
-git-prolly provides powerful workflows for managing versioned key-value data alongside your application code. Whether you choose separate repositories or a monorepo approach, the key is to:
-
-1. **Maintain consistency** between code and data versions
-2. **Test thoroughly** across different data states
-3. **Use Git workflows** you're already familiar with
-4. **Automate testing** for multiple datasets
-5. **Document your patterns** for team consistency
-
-Choose the approach that best fits your team size, deployment complexity, and organizational structure. Both patterns provide robust solutions for different scenarios.
\ No newline at end of file diff --git a/doc/guide/git-prolly.md b/doc/guide/git.md similarity index 100% rename from doc/guide/git-prolly.md rename to doc/guide/git.md diff --git a/doc/guide/git-prolly-sql.md b/doc/guide/sql.md similarity index 100% rename from doc/guide/git-prolly-sql.md rename to doc/guide/sql.md diff --git a/doc/guide/storage.md b/doc/guide/storage.md new file mode 100644 index 0000000..ffbdb07 --- /dev/null +++ b/doc/guide/storage.md @@ -0,0 +1,399 @@ +# ProllyTree Storage Backends Guide + +ProllyTree supports multiple storage backends to meet different performance, persistence, and deployment requirements. This guide provides a comprehensive overview of each available storage backend, their characteristics, use cases, and configuration options. + +## Overview + +ProllyTree uses a pluggable storage architecture through the `NodeStorage` trait, allowing you to choose the appropriate backend for your specific needs: + +- **InMemoryNodeStorage**: Fast, volatile storage for development and testing +- **FileNodeStorage**: Simple file-based persistence for local applications +- **RocksDBNodeStorage**: High-performance LSM-tree storage for production workloads +- **GitNodeStorage**: Git object store integration for development (experimental) + +## InMemoryNodeStorage + +### Description +The in-memory storage backend keeps all ProllyTree nodes in a `HashMap` in RAM. This provides the fastest access times but offers no persistence across application restarts. + +### Characteristics +- **Performance**: Fastest read/write operations +- **Persistence**: None - data is lost when application terminates +- **Memory Usage**: Entire tree stored in RAM +- **Concurrency**: Thread-safe with internal locking +- **Storage Overhead**: Minimal (just HashMap overhead) + +### Use Cases +- **Unit testing**: Fast test execution without I/O overhead +- **Development**: Quick prototyping and debugging +- **Caching layer**: Temporary storage for frequently accessed data +- **Small datasets**: When entire dataset fits comfortably in memory + +### Usage Example +```rust +use prollytree::storage::InMemoryNodeStorage; +use prollytree::tree::{ProllyTree, Tree}; +use prollytree::config::TreeConfig; + +let storage = InMemoryNodeStorage::<32>::new(); +let config = TreeConfig::<32>::default(); +let mut tree = ProllyTree::new(storage, config); + +// Data will be lost when `tree` goes out of scope +tree.insert(b"key".to_vec(), b"value".to_vec()); +``` + +### Configuration +The in-memory storage is self-contained and requires no configuration. It automatically manages memory allocation and cleanup. + +## FileNodeStorage + +### Description +The file storage backend persists each ProllyTree node as a separate file on the filesystem using binary serialization. Configuration data is stored in separate files with a `config_` prefix. 
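+
+Because nodes are persisted on disk, a tree written by one process run can be reloaded by a later one. A minimal sketch (reusing `ProllyTree::load_from_storage` from the migration example later in this guide; error handling elided):
+
+```rust
+use prollytree::storage::FileNodeStorage;
+use prollytree::tree::{ProllyTree, Tree};
+use prollytree::config::TreeConfig;
+use std::path::PathBuf;
+
+// Point at the directory a previous run populated.
+let storage = FileNodeStorage::<32>::new(PathBuf::from("./prolly_data"));
+let config = TreeConfig::<32>::default();
+
+// Reload the persisted tree instead of creating an empty one.
+let tree = ProllyTree::load_from_storage(storage, config)?;
+```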
+
+### Characteristics
+- **Performance**: Moderate - limited by filesystem I/O
+- **Persistence**: Full persistence across application restarts
+- **Storage Format**: Binary-serialized nodes (using bincode)
+- **File Organization**: One file per node, named by hash
+- **Platform Support**: Works on all platforms with filesystem access
+
+### Use Cases
+- **Local applications**: Desktop applications needing persistence
+- **Development**: When you need persistence but don't want database setup
+- **Small to medium datasets**: Up to thousands of nodes
+- **Debugging**: Easy to inspect individual node files
+
+### Usage Example
+```rust
+use prollytree::storage::FileNodeStorage;
+use prollytree::tree::{ProllyTree, Tree};
+use prollytree::config::TreeConfig;
+use std::path::PathBuf;
+
+let storage_dir = PathBuf::from("./prolly_data");
+let storage = FileNodeStorage::<32>::new(storage_dir);
+let config = TreeConfig::<32>::default();
+let mut tree = ProllyTree::new(storage, config);
+
+tree.insert(b"key".to_vec(), b"value".to_vec());
+// Data persists in ./prolly_data/ directory
+```
+
+### File Structure
+```
+prolly_data/
+├── a1b2c3d4e5f6...     (node file - hex hash)
+├── f6e5d4c3b2a1...     (node file - hex hash)
+├── config_tree_config  (configuration file)
+└── config_custom_key   (custom configuration)
+```
+
+### Limitations
+- **Scalability**: Performance degrades with a large number of nodes
+- **Atomicity**: No atomic updates across multiple nodes
+- **Concurrent Access**: Not safe for concurrent writers
+
+## RocksDBNodeStorage
+
+### Description
+RocksDB storage provides a production-ready, high-performance backend using Facebook's RocksDB LSM-tree implementation. It's optimized for ProllyTree's content-addressed, write-heavy workload patterns.
+
+### Characteristics
+- **Performance**: High throughput for both reads and writes
+- **Persistence**: Durable storage with WAL (Write-Ahead Log)
+- **Scalability**: Handles millions of nodes efficiently
+- **Compression**: LZ4 for hot data, Zstd for cold data
+- **Caching**: Multi-level caching (LRU cache + RocksDB block cache)
+- **Compaction**: Background cleanup of obsolete data
+
+### Architecture
+```
+Application
+    ↓
+LRU Cache (1000 nodes default)
+    ↓
+RocksDB
+├── Write Buffer (128MB)
+├── Block Cache (512MB)
+├── Bloom Filters (10 bits/key)
+└── SST Files (compressed)
+```
+
+### Use Cases
+- **Production applications**: High-performance persistent storage
+- **Large datasets**: Millions of nodes and frequent updates
+- **Write-heavy workloads**: Frequent tree modifications
+- **Distributed systems**: Building block for distributed storage
+
+### Usage Example
+```rust
+use prollytree::storage::RocksDBNodeStorage;
+use prollytree::tree::{ProllyTree, Tree};
+use prollytree::config::TreeConfig;
+use std::path::PathBuf;
+
+// Basic usage (the path is cloned so the alternatives below can reuse it)
+let db_path = PathBuf::from("./rocksdb_data");
+let storage = RocksDBNodeStorage::<32>::new(db_path.clone())?;
+let config = TreeConfig::<32>::default();
+let mut tree = ProllyTree::new(storage, config);
+
+// Custom cache size
+let storage = RocksDBNodeStorage::<32>::with_cache_size(db_path.clone(), 5000)?;
+
+// Custom RocksDB options
+let mut opts = RocksDBNodeStorage::<32>::default_options();
+opts.set_write_buffer_size(256 * 1024 * 1024); // 256MB
+let storage = RocksDBNodeStorage::<32>::with_options(db_path, opts)?;
+```
+
+### Configuration Options
+
+#### Default Optimizations
+- **Write Buffer**: 128MB for batching writes
+- **Memory Tables**: Up to 4 concurrent memtables
+- **Compression**: LZ4 for L0-L2, Zstd for bottom levels
+- **Block Cache**: 512MB for frequently accessed data
+- **Bloom Filters**: 10 bits per key for faster lookups
+
+#### Performance Tuning
+```rust
+use rocksdb::{Options, DBCompressionType, BlockBasedOptions, Cache};
+
+let mut opts = Options::default();
+
+// Increase write buffer for high write throughput
+opts.set_write_buffer_size(256 * 1024 * 1024);
+
+// More aggressive compression for storage efficiency
+opts.set_compression_type(DBCompressionType::Zstd);
+
+// Larger block cache for read-heavy workloads
+let cache = Cache::new_lru_cache(1024 * 1024 * 1024); // 1GB
+let mut block_opts = BlockBasedOptions::default();
+block_opts.set_block_cache(&cache);
+opts.set_block_based_table_factory(&block_opts);
+```
+
+### Batch Operations
+RocksDB storage supports efficient batch operations:
+
+```rust
+let nodes = vec![
+    (hash1, node1),
+    (hash2, node2),
+    (hash3, node3),
+];
+
+// Atomic batch insert
+storage.batch_insert_nodes(nodes)?;
+
+// Atomic batch delete
+storage.batch_delete_nodes(&[hash1, hash2])?;
+```
+
+### Monitoring and Maintenance
+- **Statistics**: RocksDB provides detailed performance metrics
+- **Compaction**: Automatic background compaction
+- **Backup**: Use RocksDB backup utilities for data safety
+- **Tuning**: Monitor write amplification and adjust settings
+
+## GitNodeStorage
+
+### Description
+The Git storage backend stores ProllyTree nodes as Git blob objects in a Git repository. This experimental backend is designed for development workflows where you want to leverage Git's content-addressable storage.
+
+### ⚠️ Important Limitations
+
+**Development Use Only**: GitNodeStorage should only be used for local development and experimentation. It is not suitable for production use due to several important limitations:
+
+1. **Dangling Objects**: ProllyTree nodes are stored as Git blob objects but are **not committed** to any branch or tag. These objects exist as "dangling" or "unreachable" objects in Git's object database.
+
+2. **Garbage Collection Risk**: Git's garbage collector (`git gc`) will **delete these dangling objects** during cleanup operations. This can happen:
+   - When running `git gc` manually
+   - Automatically during Git operations (push, pull, repack, etc.)
+   - When Git's automatic garbage collection triggers
+
+3. **Data Loss**: Since the objects are not referenced by any commit, branch, or tag, they will be permanently lost when garbage collected. There is no recovery mechanism.
+
+### Characteristics
+- **Storage Format**: Git blob objects (binary serialized nodes)
+- **Content Addressing**: Leverages Git's SHA-1 content addressing
+- **Persistence**: Temporary - objects can be garbage collected
+- **Integration**: Works with existing Git repositories
+- **Caching**: LRU cache for performance
+
+### Use Cases (Development Only)
+- **Git Integration Experiments**: Testing Git-based storage concepts
+- **Development Workflows**: Temporary storage during development
+- **Learning**: Understanding content-addressable storage
+- **Prototyping**: Rapid prototyping with Git infrastructure
+
+### Usage Example
+```rust
+// Only available with "git" feature
+#[cfg(feature = "git")]
+use prollytree::git::GitNodeStorage;
+
+let repo = gix::open(".")?;
+let dataset_dir = std::path::PathBuf::from("./git_data");
+let storage = GitNodeStorage::<32>::new(repo, dataset_dir)?;
+
+// ⚠️ WARNING: Data may be lost during git gc!
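+// The nodes written below become loose blobs that no commit, branch, or tag
+// references, so a manual or automatic `git gc` can prune them permanently
+// (see the limitations above).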
+let config = TreeConfig::<32>::default();
+let mut tree = ProllyTree::new(storage, config);
+tree.insert(b"key".to_vec(), b"value".to_vec());
+```
+
+### Data Safety Measures
+
+If you must use GitNodeStorage for development, consider these safety measures:
+
+1. **Disable Automatic GC**:
+   ```bash
+   git config gc.auto 0
+   git config gc.autopacklimit 0
+   ```
+
+2. **Create Temporary Commits** (advanced):
+   ```bash
+   # Periodically commit to preserve objects
+   git add -A
+   git commit -m "temp: preserve prolly objects"
+   ```
+
+3. **Use Separate Repository**:
+   Create a dedicated Git repository just for ProllyTree storage to avoid conflicts.
+
+### Architecture
+```
+ProllyTree Node
+    ↓
+Bincode Serialization
+    ↓
+Git Blob Object (dangling)
+    ↓
+Git Object Database
+    ↓
+⚠️ git gc → Deletion
+```
+
+## Storage Backend Comparison
+
+| Feature | InMemory | File | RocksDB | Git |
+|---------|----------|------|---------|-----|
+| **Persistence** | None | Full | Full | Temporary⚠️ |
+| **Performance** | Fastest | Moderate | High | Moderate |
+| **Scalability** | RAM-limited | Poor | Excellent | Poor |
+| **Setup Complexity** | None | None | Low | Medium |
+| **Production Ready** | No | Limited | Yes | No⚠️ |
+| **Concurrent Access** | Limited | No | Yes | Limited |
+| **Storage Overhead** | None | High | Low | Medium |
+| **Backup/Recovery** | N/A | File copy | RocksDB tools | Git tools |
+
+## Choosing the Right Backend
+
+### Development & Testing
+- **Unit Tests**: InMemoryNodeStorage
+- **Integration Tests**: FileNodeStorage or InMemoryNodeStorage
+- **Local Development**: FileNodeStorage or RocksDBNodeStorage
+
+### Production Deployments
+- **Small Applications**: FileNodeStorage (with careful consideration)
+- **High-Performance Applications**: RocksDBNodeStorage
+- **Distributed Systems**: RocksDBNodeStorage as foundation
+
+### Experimental
+- **Git Integration Research**: GitNodeStorage (development only)
+
+## Performance Benchmarks
+
+Run the storage comparison benchmarks to understand performance characteristics:
+
+```bash
+# Compare all available backends
+cargo bench --bench storage_bench --features rocksdb_storage
+
+# Run specific benchmark
+cargo bench --bench storage_bench storage_insert
+```
+
+## Migration Between Backends
+
+Currently, there's no built-in migration tool between storage backends. To migrate:
+
+1. **Export Data**: Iterate through the old storage and collect all key-value pairs
+2. **Create New Storage**: Initialize the target storage backend
+3. **Import Data**: Insert all data into the new storage
+4. **Validate**: Verify data integrity after migration
+
+Example migration pattern:
+```rust
+// Export from old storage
+let old_tree = ProllyTree::load_from_storage(old_storage, config.clone())?;
+let mut data = Vec::new();
+// ... collect all key-value pairs
+
+// Import to new storage
+let mut new_tree = ProllyTree::new(new_storage, config);
+for (key, value) in data {
+    new_tree.insert(key, value);
+}
+```
+
+## Best Practices
+
+### General
+- Choose the simplest backend that meets your requirements
+- Always benchmark with your specific data patterns
+- Consider backup and recovery procedures
+- Plan for data growth and scaling needs
+
+### InMemoryNodeStorage
+- Monitor memory usage to prevent OOM conditions
+- Use for temporary data only
+- Consider data loss implications
+
+### FileNodeStorage
+- Ensure adequate disk space and I/O performance
+- Implement application-level locking for concurrent access
+- Regular filesystem maintenance and monitoring
+
+### RocksDBNodeStorage
+- Monitor RocksDB metrics for performance tuning
+- Configure appropriate cache sizes for your workload
+- Plan for disk space and compaction overhead
+- Use batch operations for bulk updates
+
+### GitNodeStorage
+- **Never use in production**
+- Disable automatic garbage collection during development
+- Use dedicated Git repositories
+- Regularly backup important data to commits
+- Understand that data can be lost without warning
+
+## Troubleshooting
+
+### Common Issues
+
+#### OutOfMemory with InMemoryNodeStorage
+- Reduce dataset size or switch to persistent storage
+- Monitor heap usage and tune runtime parameters
+
+#### Poor Performance with FileNodeStorage
+- Check filesystem performance and available disk space
+- Consider switching to RocksDBNodeStorage for better performance
+- Reduce concurrent access patterns
+
+#### RocksDB Compilation Issues
+- Ensure proper build tools (cmake, C++ compiler)
+- Check RocksDB system dependencies
+- Use pre-built binaries if available
+
+#### Git Storage Data Loss
+- This is expected behavior - objects are not committed
+- Disable garbage collection or switch to persistent storage
+- Create periodic commits to preserve important data
+
+For additional help, consult the project documentation or open an issue on the GitHub repository.
\ No newline at end of file