16 Internal Implementation.md
The framework uses expression trees to compile high-performance accessors for each primary key and vector property, replacing runtime reflection calls:
```csharp
// Before compilation (reflection): ~200ns / call
var value = propertyInfo.GetValue(entity);

// After compilation (expression tree): ~2ns / call, ~100x improvement
private static Func<TEntity, TResult> CompileGetter<TResult>(PropertyInfo prop)
{
    var param = Expression.Parameter(typeof(TEntity), "e");
    Expression body = Expression.Property(param, prop);
    // When the property type differs from TResult, insert a Convert node
    // (this boxes value types, e.g. int -> object)
    if (prop.PropertyType != typeof(TResult))
        body = Expression.Convert(body, typeof(TResult));
    return Expression.Lambda<Func<TEntity, TResult>>(body, param).Compile();
}
```

Similarity functions use C# static abstract interface members on readonly struct types. The JIT generates specialized machine code for each concrete type, enabling direct inlining at call sites with zero virtual dispatch or delegate indirection:
```csharp
// Interface definition
public interface ISimilarity<T> where T : unmanaged, INumber<T>, IRootFunctions<T>
{
    static abstract T Compute(ReadOnlySpan<T> x, ReadOnlySpan<T> y);
}

// Built-in implementations (readonly struct, zero-size, JIT-inlined)
public readonly struct DotProductSimilarity : ISimilarity<float>
{
    public static float Compute(ReadOnlySpan<float> x, ReadOnlySpan<float> y)
        => VectorMath.Dot(x, y);
}

public readonly struct ManhattanSimilarity : ISimilarity<float>
{
    public static float Compute(ReadOnlySpan<float> x, ReadOnlySpan<float> y)
    {
        // SIMD-accelerated via Vector<float>
        int i = 0;
        float sum = 0f;
        if (Vector.IsHardwareAccelerated && x.Length >= Vector<float>.Count)
        {
            var vsum = Vector<float>.Zero;
            var lastBlock = x.Length - x.Length % Vector<float>.Count;
            for (; i < lastBlock; i += Vector<float>.Count)
                vsum += Vector.Abs(new Vector<float>(x[i..]) - new Vector<float>(y[i..]));
            sum = Vector.Sum(vsum);
        }
        for (; i < x.Length; i++)
            sum += MathF.Abs(x[i] - y[i]);
        return 1f / (1f + sum);
    }
}
```

Advantages over the v1 delegate approach:
| Dimension | v1 `SimilarityFunc` delegate | v2 `ISimilarity<T>` static abstract |
|---|---|---|
| Dispatch | Indirect call (~2ns overhead) | JIT-inlined, zero overhead |
| Generics | `float` only | Generic over `T : INumber<T>` (`float`, `double`, `Half`) |
| Extensibility | Framework-internal only | Users implement `ISimilarity<float>` + `[QuiverVector(CustomSimilarity=...)]` |
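To make the dispatch row concrete, here is a minimal sketch of a generic consumer that receives the metric as a type parameter. `ISimilarity<T>` is repeated from above so the sample compiles on its own; `CosineSimilarity` and `SimilaritySearch.BestMatch` are hypothetical names invented for illustration and are not part of Quiver's API. The sketch assumes C# 11 / .NET 7 or later and that every candidate has the same length as the query.

```csharp
using System;
using System.Numerics;

public interface ISimilarity<T> where T : unmanaged, INumber<T>, IRootFunctions<T>
{
    static abstract T Compute(ReadOnlySpan<T> x, ReadOnlySpan<T> y);
}

// Hypothetical user-defined metric (not a Quiver built-in).
public readonly struct CosineSimilarity : ISimilarity<float>
{
    public static float Compute(ReadOnlySpan<float> x, ReadOnlySpan<float> y)
    {
        float dot = 0f, normX = 0f, normY = 0f;
        for (int i = 0; i < x.Length; i++)
        {
            dot += x[i] * y[i];
            normX += x[i] * x[i];
            normY += y[i] * y[i];
        }
        float denom = MathF.Sqrt(normX) * MathF.Sqrt(normY);
        return denom == 0f ? 0f : dot / denom;
    }
}

public static class SimilaritySearch
{
    // TSim is a struct type argument, so the runtime compiles a separate
    // method body per metric and TSim.Compute is a direct static call.
    public static int BestMatch<TSim>(ReadOnlySpan<float> query, float[][] candidates)
        where TSim : struct, ISimilarity<float>
    {
        int best = -1;
        float bestScore = float.NegativeInfinity;
        for (int i = 0; i < candidates.Length; i++)
        {
            float score = TSim.Compute(query, candidates[i]);
            if (score > bestScore)
            {
                bestScore = score;
                best = i;
            }
        }
        return best; // index of the highest-scoring candidate, or -1 if none
    }
}
```

Switching metrics is then a change of type argument at the call site, for example `SimilaritySearch.BestMatch<CosineSimilarity>(query, candidates)`, with no delegate allocated or invoked.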
HNSW node levels are drawn from an exponentially decaying distribution, so upper layers are sparse and lower layers are dense:
level = floor(-ln(uniform(0, 1)) × ml)
where ml = 1 / ln(M)
Most nodes (~93.75% when M=16) exist only on layer 0, while a few nodes exist on higher layers serving as "highway" entry points.
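The formula can be sketched in a few lines of C#. The class and method names below (`HnswLevels`, `RandomLevel`, `ExpectedLayerZeroFraction`) are illustrative assumptions, not Quiver's internal names. The layer-0 share follows directly from the formula: `level = 0` exactly when `-ln(u) × ml < 1`, that is when `u > 1/M`, which has probability `1 - 1/M`.

```csharp
using System;

public static class HnswLevels
{
    // Draws a layer index as floor(-ln(u) * ml) with ml = 1 / ln(M).
    public static int RandomLevel(Random rng, int m)
    {
        double ml = 1.0 / Math.Log(m);
        // NextDouble() returns [0, 1); flip it to (0, 1] so Math.Log never sees 0.
        double u = 1.0 - rng.NextDouble();
        return (int)Math.Floor(-Math.Log(u) * ml);
    }

    // Expected fraction of nodes that exist only on layer 0: P(u > 1/M) = 1 - 1/M.
    public static double ExpectedLayerZeroFraction(int m) => 1.0 - 1.0 / m;
}
```

For `M = 16`, `ExpectedLayerZeroFraction` gives 15/16, which matches the 93.75% quoted above.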
K-means++ seeding converges faster and produces higher-quality clusters than random initialization:
- Randomly select the first centroid
- For each vector not yet selected as a centroid, compute its distance D(x) to the nearest centroid
- Select the next centroid with probability proportional to D(x)²
- Repeat until K centroids are selected
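The four steps above can be sketched as follows. This is an illustrative version under stated assumptions, not the framework's code: `KMeansPlusPlus.SelectCentroids` is a hypothetical name, distances are squared Euclidean, and a uniform fallback is used when every remaining vector coincides with an existing centroid.

```csharp
using System;
using System.Collections.Generic;

public static class KMeansPlusPlus
{
    public static List<float[]> SelectCentroids(IReadOnlyList<float[]> vectors, int k, Random rng)
    {
        if (k <= 0 || k > vectors.Count) throw new ArgumentOutOfRangeException(nameof(k));

        // Step 1: pick the first centroid uniformly at random.
        var centroids = new List<float[]>(k) { vectors[rng.Next(vectors.Count)] };
        var distSq = new double[vectors.Count];
        Array.Fill(distSq, double.MaxValue);

        while (centroids.Count < k)
        {
            // Step 2: refresh D(x)^2 against the most recently added centroid.
            float[] last = centroids[centroids.Count - 1];
            double total = 0;
            for (int i = 0; i < vectors.Count; i++)
            {
                distSq[i] = Math.Min(distSq[i], SquaredDistance(vectors[i], last));
                total += distSq[i];
            }

            // Step 3: sample an index with probability proportional to D(x)^2.
            int next = vectors.Count - 1;
            if (total <= 0)
            {
                next = rng.Next(vectors.Count); // all weights are zero: uniform fallback
            }
            else
            {
                double target = rng.NextDouble() * total;
                double cumulative = 0;
                for (int i = 0; i < vectors.Count; i++)
                {
                    cumulative += distSq[i];
                    if (cumulative > target) { next = i; break; }
                }
            }
            centroids.Add(vectors[next]); // Step 4: repeat until k centroids exist
        }
        return centroids;
    }

    private static double SquaredDistance(float[] a, float[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```

Because vectors already chosen have `D(x)² = 0`, they carry no sampling weight, which is what pushes later centroids toward regions not yet covered.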
KD-tree search uses the distance to the split hyperplane for pruning:

`diff = query[splitDim] - node.splitValue`

- Prioritize searching the side containing the query point
- For the other side: explore only when the heap is not full or `|diff| < current search radius`
- Pruning can skip large numbers of subtrees in low dimensions but becomes ineffective in high dimensions
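A minimal sketch of this pruning rule, assuming Euclidean distance and a bounded max-heap of the best `k` candidates. `KdNode` and `KdTreeSearch` are hypothetical types written for this example; the real node layout in Quiver may differ. The max-heap is emulated with .NET's `PriorityQueue` by negating the distance.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public sealed class KdNode
{
    public float[] Point = Array.Empty<float>();
    public int SplitDim;
    public float SplitValue;  // equals Point[SplitDim] in this sketch
    public KdNode? Left;      // subtree with values below SplitValue
    public KdNode? Right;     // subtree with values at or above SplitValue
}

public static class KdTreeSearch
{
    // Returns up to k points, nearest first.
    public static List<float[]> Nearest(KdNode? root, float[] query, int k)
    {
        var result = new List<float[]>();
        if (k <= 0) return result;

        var heap = new PriorityQueue<float[], float>();
        Visit(root, query, k, heap);
        while (heap.Count > 0) result.Add(heap.Dequeue()); // farthest first
        result.Reverse();
        return result;
    }

    private static void Visit(KdNode? node, float[] query, int k,
                              PriorityQueue<float[], float> heap)
    {
        if (node is null) return;

        float dist = Distance(query, node.Point);
        if (heap.Count < k)
        {
            heap.Enqueue(node.Point, -dist);
        }
        else if (dist < Radius(heap))
        {
            heap.Dequeue();                  // drop the current worst candidate
            heap.Enqueue(node.Point, -dist);
        }

        float diff = query[node.SplitDim] - node.SplitValue;
        var (near, far) = diff < 0 ? (node.Left, node.Right) : (node.Right, node.Left);

        Visit(near, query, k, heap);         // side containing the query first
        if (heap.Count < k || MathF.Abs(diff) < Radius(heap))
            Visit(far, query, k, heap);      // other side only if it can still help
    }

    // Current search radius: distance of the worst candidate kept so far.
    private static float Radius(PriorityQueue<float[], float> heap) =>
        heap.TryPeek(out _, out float negDist) ? -negDist : float.PositiveInfinity;

    private static float Distance(float[] a, float[] b)
    {
        float sum = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return MathF.Sqrt(sum);
    }
}
```

In high dimensions `|diff|` along a single axis is usually much smaller than the search radius, so the far-side check almost always passes and little is pruned.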
The storage provider factory is a simple factory, invoked only during `ExportAsync` / `ImportAsync`. Primary storage always uses `BinaryStorageProvider` directly:

```csharp
// Primary storage: always binary, created directly in QuiverDbContext
var storageProvider = new BinaryStorageProvider();

// Export/Import factory: creates export-only providers on demand
internal static IStorageProvider Create(ExportFormat format, JsonSerializerOptions? jsonOptions = null) =>
    format switch
    {
        ExportFormat.Json => new JsonExportProvider(jsonOptions ?? DefaultJsonOptions),
        ExportFormat.Xml => new XmlExportProvider(),
        _ => throw new ArgumentOutOfRangeException(nameof(format))
    };
```

In v4, `QuiverSet<T>` no longer maintains a WAL-style change log. Instead, it tracks pending deletions in a tombstone buffer that is drained by `AppendAsync` / `FlushTombstonesAsync`:
- `Add` / `Upsert` mutate the in-memory dictionary and indexes directly; the entity becomes part of the next appended `EntityMeta` segment.
- `Remove` / `RemoveByKey` / `Clear` register tombstone entries (type + key) on the parent `QuiverDbContext` so they can be written as a `Tombstone` segment.
- During `LoadAsync`, segments are replayed in file order; tombstones are applied last so they consistently shadow earlier `Add` records, regardless of segment layout.
- `ReplayAdd` silently skips when the primary key already exists, matching the v3 semantics for forward-compatible merges.
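The replay rules in the last two bullets can be sketched as below. Everything here is a simplified assumption made for illustration: `Segment`, `SegmentKind`, and `SegmentReplay.Load` are hypothetical, keys and payloads are plain strings, and the real on-disk segment format is not modeled. In particular, the sketch lets a tombstone shadow every add of the same key; how Quiver treats a key that is re-added after deletion is not stated above and is not represented here.

```csharp
using System;
using System.Collections.Generic;

public enum SegmentKind { EntityMeta, Tombstone }

// Hypothetical in-memory stand-in for a decoded file segment.
public sealed record Segment(SegmentKind Kind, (string Key, string Payload)[] Entries);

public static class SegmentReplay
{
    public static Dictionary<string, string> Load(IEnumerable<Segment> segmentsInFileOrder)
    {
        var entities = new Dictionary<string, string>();
        var tombstones = new HashSet<string>();

        foreach (var segment in segmentsInFileOrder)
        {
            foreach (var (key, payload) in segment.Entries)
            {
                if (segment.Kind == SegmentKind.Tombstone)
                    tombstones.Add(key);            // defer deletion until the end
                else
                    entities.TryAdd(key, payload);  // silently skip existing keys
            }
        }

        // Tombstones are applied last, so the result does not depend on
        // where the tombstone segment sits relative to the add segments.
        foreach (var key in tombstones)
            entities.Remove(key);

        return entities;
    }
}
```

Deferring deletions means a single pass over the file is enough, and the first record written for a key wins over any later duplicate.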
Full-snapshot writes still use the temp-file-then-rename pattern, which prevents mid-write corruption:
```csharp
var tempPath = filePath + ".tmp";
await _storageProvider.SaveAsync(tempPath, setsData);
File.Move(tempPath, filePath, overwrite: true); // Atomic replace
```

`AppendAsync` and `FlushTombstonesAsync` open the existing file in append mode and only rewrite the footer at the end of the operation, so a crash mid-append leaves the previous footer (and all previously committed segments) intact.
Each segment payload is checksummed with Quiver's internal IEEE CRC32 helper; the segment table in the footer also stores the per-segment CRC. `QuiverDbFile.InspectAsync(path, verifyCrc: true)` walks the segment table and recomputes every CRC, surfacing corrupt or truncated segments without modifying the file. The CRC result remains compatible with files written by the previous `System.IO.Hashing.Crc32` implementation.
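For reference, a table-driven IEEE CRC-32 (reflected polynomial `0xEDB88320`) fits in a few lines. This is a generic sketch of the algorithm, not Quiver's internal helper; the class name `Crc32Ieee` is an assumption.

```csharp
using System;

public static class Crc32Ieee
{
    private static readonly uint[] Table = BuildTable();

    // Precomputes the CRC of every possible byte value.
    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint n = 0; n < 256; n++)
        {
            uint c = n;
            for (int bit = 0; bit < 8; bit++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[n] = c;
        }
        return table;
    }

    // Standard CRC-32: initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF.
    public static uint Compute(ReadOnlySpan<byte> data)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
            crc = Table[(crc ^ b) & 0xFF] ^ (crc >> 8);
        return crc ^ 0xFFFFFFFFu;
    }
}
```

The standard check value for this CRC variant is `0xCBF43926` for the ASCII bytes of `"123456789"`. Verifying a segment amounts to recomputing the value over its payload and comparing it with the CRC stored in the segment table.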
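| # | Chapter |
|---|---|
| 01 | Release Notes |
| 02 | Product Overview |
| 03 | Architecture Overview |
| 04 | Quick Start |
| 05 | Core Concepts |
| 06 | Distance Metrics |
| 07 | Index Types |
| 08 | CRUD Operations |
| 09 | Vector Search |
| 10 | Persistent Storage |
| 11 | Migration System |
| 11a | Schema Migration |
| 12 | Multi-Vector Field Support |
| 13 | Thread Safety and Concurrency |
| 14 | Lifecycle Management |
| 15 | Configuration Options |
| 16 | Internal Implementation Details |
| 17 | Complete Examples |
| 18 | API Quick Reference |
| 19 | Usage Recommendations |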