## Problem
terraphim-ai has no local inference capability. All LLM calls go through remote APIs (OpenRouter via `genai`). For machines without GPU or API access, there is no fallback. GGUF models for MedGemma are available (`unsloth/medgemma-1.5-4b-it-GGUF`, 11.8K downloads on HuggingFace).
## Proposed Change
Add a `terraphim_llm_local` crate (or a feature in `terraphim_multi_agent`) that wraps `llama-cpp-rs` for local GGUF inference. Implement the same LLM client trait so agents can transparently use local or remote models.
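To illustrate the transparent-swap idea, here is a minimal sketch of what a shared client trait could look like. The trait and type names (`LlmClient`, `RemoteClient`, `LocalGgufClient`) are hypothetical stand-ins, not the actual terraphim trait:

```rust
// Hypothetical shared trait -- the real terraphim LLM client trait
// (and its error type) will differ.
pub trait LlmClient {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

// Stand-in for the remote genai/OpenRouter client.
struct RemoteClient;
impl LlmClient for RemoteClient {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("[remote] {prompt}"))
    }
}

// Stand-in for a llama-cpp-rs-backed local GGUF client.
struct LocalGgufClient;
impl LlmClient for LocalGgufClient {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("[local] {prompt}"))
    }
}

// Agents depend only on the trait object, so backends swap freely.
fn run(client: &dyn LlmClient, prompt: &str) -> String {
    client.complete(prompt).unwrap_or_default()
}

fn main() {
    println!("{}", run(&RemoteClient, "hi"));    // [remote] hi
    println!("{}", run(&LocalGgufClient, "hi")); // [local] hi
}
```

Because agents hold a `&dyn LlmClient` (or a generic bound), the local/remote decision becomes pure configuration.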
Key requirements:
- CPU-only inference support (many dev machines lack GPU)
- Automatic GGUF model download via the `hf-hub` crate
- Quantization variant selection (`Q4_K_M`, ~2.5 GB for 4B models; `Q8_0` for higher quality)
- Same trait interface as the remote genai client for seamless swapping
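The quantization-selection requirement could be sketched as a small enum that maps a variant to a GGUF filename; the download itself would then go through `hf-hub`. Everything here is illustrative: the enum, the filename layout on the unsloth repo, and the commented-out `hf-hub` call should all be verified against the actual repo contents:

```rust
/// Hypothetical quantization selector; variant names follow the
/// llama.cpp GGUF convention (Q4_K_M, Q8_0).
#[derive(Clone, Copy, Debug)]
#[allow(non_camel_case_types)]
enum Quant {
    Q4_K_M, // ~2.5 GB for a 4B model: reasonable CPU-only default
    Q8_0,   // larger download, higher quality
}

impl Quant {
    fn suffix(self) -> &'static str {
        match self {
            Quant::Q4_K_M => "Q4_K_M",
            Quant::Q8_0 => "Q8_0",
        }
    }
}

/// Build a GGUF filename; the `<model>-<quant>.gguf` layout is an
/// assumption about how the repo names its files.
fn gguf_filename(model: &str, quant: Quant) -> String {
    format!("{model}-{}.gguf", quant.suffix())
}

fn main() {
    let file = gguf_filename("medgemma-1.5-4b-it", Quant::Q4_K_M);
    println!("{file}"); // medgemma-1.5-4b-it-Q4_K_M.gguf
    // Download sketch via hf-hub's blocking API (not compiled here):
    // let path = hf_hub::api::sync::Api::new()?
    //     .model("unsloth/medgemma-1.5-4b-it-GGUF".to_string())
    //     .get(&file)?;
}
```

`hf-hub` caches downloads under the standard HuggingFace cache directory, so repeated runs would not re-fetch the model.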
## Scope
- New crate `crates/terraphim_llm_local/` or feature gate in `terraphim_multi_agent`
- Dependencies: `llama-cpp-rs`, `hf-hub` (both approved)
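If the feature-gate route is chosen, the dependencies could be made optional in `terraphim_multi_agent`'s `Cargo.toml`. A sketch only: the feature name (`llm-local`) and version numbers are placeholders:

```toml
[dependencies]
llama-cpp-rs = { version = "*", optional = true }  # pin a real version
hf-hub = { version = "*", optional = true }        # pin a real version

[features]
llm-local = ["dep:llama-cpp-rs", "dep:hf-hub"]
```

Builds without `--features llm-local` would then compile exactly as today, which keeps CI and GPU-less machines unaffected by default.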
## Context
This is UPLIFT-5 from the medgemma-competition multi-agent integration plan. Local GGUF inference is essential for development workflows where remote API calls are slow or unavailable. The MedGemma 1.5-4b-it GGUF model is the primary target for local inference.
Related upstream issues: #534, #535, #536