02 Product Overview.md

Product Introduction

Quiver is a pure .NET embedded vector database with zero native dependencies. It runs as an in-process library, so there is no standalone database server to deploy. Modeled on EF Core's DbContext design pattern, it lets developers define entities, payload fields, and indexing strategies through declarative attributes such as [QuiverKey], [QuiverVector], [QuiverLargeField], and [QuiverIndex]; the framework then handles model discovery, index construction, and persistence management at runtime.

Core Capabilities at a Glance:

Code-First Declarative Modeling: As in EF Core, you annotate entity classes with attributes, and the framework discovers and registers QuiverSet<T> collections automatically via reflection, with no additional configuration.
Multiple ANN Index Algorithms: Built-in Flat (brute-force search), HNSW (Hierarchical Navigable Small World graph), IVF (Inverted File Index), and KDTree indexes cover the range from small-scale exact search to million-scale approximate search.
Binary-First Persistence (v4 segmented): Primary storage always uses the high-performance v4 binary format (QDB\x04). SaveAsync writes a full atomic snapshot, while AppendAsync / FlushTombstonesAsync write new segments and rewrite only the footer, giving O(Δ) disk cost without WAL-style memory doubling. JSON and XML remain available as ExportAsync / ImportAsync side channels.
Mmap Vector Storage: VectorMemoryMode.MemoryMapped / Auto backs vector arenas with a read-only MemoryMappedFile view over the VectorBlob segment, reducing resident memory for large vector sets while keeping search SIMD-friendly.
Non-InMemory Vector Fields & Large Fields: [QuiverVector(MemoryMode = ...)] (partial property + source generator) loads vector payloads on demand; [QuiverLargeField] byte[] keeps large binary payloads in a dedicated Blob segment outside EntityMeta.
Background Auto-Merge & File Utilities: EnableBackgroundMerge triggers MaybeAutoMergeAsync after appends; QuiverDbFile.InspectAsync / MergeAsync provide per-segment CRC verification and multi-file merge with FirstWriterWins / LastWriterWins policies.
Out-of-the-box Concurrency Safety: QuiverSet<T> guards its state with a ReaderWriterLockSlim reader-writer lock, so concurrent searches and writes from multiple threads are safe without external locking.
9 Distance Metrics + Custom Similarity: Built-in Cosine, Euclidean, DotProduct, Manhattan, Chebyshev, Pearson, Hamming, Jaccard, and Canberra. User-defined ISimilarity<float> implementations are also supported via the CustomSimilarity attribute.
SIMD Hardware Acceleration: All similarity implementations use internal VectorMath helpers backed by Vector<float> SIMD instructions, adapting automatically to SSE4 / AVX2 / AVX-512 register widths without depending on System.Numerics.Tensors.
Schema Migration
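
For readers less familiar with the metric names in the list above, the following is a minimal, language-neutral sketch of four of them (Cosine, Euclidean, Manhattan, Chebyshev) using their standard textbook definitions. It is written in Python purely for illustration: the function names are hypothetical, and the sketch does not reflect Quiver's C# API, its internal VectorMath helpers, or how Quiver treats edge cases such as zero vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption for this sketch only: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (L1)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev_distance(a, b):
    """Largest absolute coordinate difference (L-infinity)."""
    return max(abs(x - y) for x, y in zip(a, b))
```

Note that cosine is a similarity (larger means closer), while the other three are distances (smaller means closer), so a search routine has to know which direction to rank in.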

Typical Use Cases: Semantic search, RAG (Retrieval-Augmented Generation), face recognition, image-to-image search, recommendation systems, multimodal retrieval, etc.

⚠️ Native AOT Compatibility: Quiver is not compatible with Native AOT publishing. The framework relies on runtime reflection to discover QuiverSet<T> properties and scan [QuiverKey] / [QuiverVector] / [QuiverIndex] / [QuiverLargeField] attributes, and compiles expression-tree accessors (Expression.Lambda(...).Compile()) at startup — both of which are unsupported under Native AOT. Quiver targets standard JIT / .NET 10 runtimes only.

Creation Overview

The inspiration for creating Quiver can be traced back to my development of the Vorcyc.AwesomeAI.Ash class, which provided simple vector storage and retrieval functionality to meet lightweight semantic search needs. Although Ash pursued minimalism and ease of use, as application scenarios evolved, its design bottlenecks became increasingly apparent:

Non-customizable table structure: Ash's storage layout was fixed by the framework. Users could only access data through a preset field layout and could not define entity properties and structures to match business requirements. This limitation was most visible when different scenarios (such as face recognition, document retrieval, and multimodal search) each needed their own data model.
Only brute-force search supported: Ash retrieved results by brute force, traversing every record and computing similarity one by one, with O(n×d) time complexity for n vectors of dimension d. This was acceptable for small data volumes, but search latency grew in proportion to the number of vectors and became a problem at tens or hundreds of thousands of vectors. Without Approximate Nearest Neighbor (ANN) index support, Ash was unsuitable for production scenarios that require fast responses.
No concurrent operations supported: Ash's internal data structures had no thread synchronization. Simultaneous reads and writes in a multi-threaded environment caused data races and unpredictable exceptions. In server-side scenarios with concurrent queries (such as an ASP.NET Web API handling multiple search requests at once), users had to add external locks themselves, which complicated usage and, with poorly chosen lock granularity, invited performance bottlenecks or deadlocks.
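
The O(n×d) cost of brute-force search mentioned above can be made concrete with a small sketch. It is written in Python for illustration only and is neither Ash's nor Quiver's code: `dot` and `flat_search` are hypothetical names, and dot product is used as the similarity simply as an assumption to keep the example short.

```python
import heapq


def dot(a, b):
    # d multiplications and additions for vectors of dimension d.
    return sum(x * y for x, y in zip(a, b))


def flat_search(records, query, k):
    """Brute-force top-k search.

    records: iterable of (key, vector) pairs.
    Returns up to k (score, key) tuples, highest score first.
    Every record is scored once, so the work is O(n * d).
    """
    scored = ((dot(vector, query), key) for key, vector in records)
    return heapq.nlargest(k, scored)


records = [
    ("a", [1.0, 0.0]),
    ("b", [0.0, 1.0]),
    ("c", [0.5, 0.5]),
]
top2 = flat_search(records, [1.0, 0.0], k=2)
```

Because every query touches every stored vector, doubling the number of records doubles the query time. ANN indexes such as HNSW or IVF avoid this by examining only a subset of candidates, at the cost of occasionally missing a true nearest neighbor.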

While reflecting on these pain points, I found key inspiration in EF Core's design philosophy, especially its "Code-First" concept: developers simply annotate entity class properties with attributes, and the framework completes model discovery, relationship mapping, and data persistence in a declarative, non-intrusive manner. The Annoy library (Approximate Nearest Neighbors Oh Yeah), widely used from Python, also provided inspiration. On the .NET side, HNSWSharp did not offer a structured, database-like design and provided only a single HNSW index type, so it lacked flexibility and index diversity.

Therefore, I decided to design a brand-new vector database framework that would maintain EF Core-style ease of use and declarative modeling, support multiple ANN index algorithms to accommodate scenarios with different scales and performance requirements, and also include built-in concurrency safety mechanisms and efficient persistence solutions.

English

#	Chapter
01	Release Notes
02	Product Overview
03	Architecture Overview
04	Quick Start
05	Core Concepts
06	Distance Metrics
07	Index Types
08	CRUD Operations
09	Vector Search
10	Persistent Storage
11	Migration System
11a	Schema Migration
12	Multi-Vector Field Support
13	Thread Safety and Concurrency
14	Lifecycle Management
15	Configuration Options
16	Internal Implementation Details
17	Complete Examples
18	API Reference Cheat Sheet
19	Usage Recommendations
