06 Distance Metrics.md
cyclonedll edited this page Jun 2, 2026 · 1 revision
The DistanceMetric enum defines nine vector similarity computation methods:
| Metric Type | Mathematical Formula | Range | Use Case | Pre-normalization |
|---|---|---|---|---|
| Cosine | | [-1, 1] | Text embeddings, semantic search | ✅ Automatically enabled |
| Euclidean | | (0, 1] | Spatial coordinates, physical distances | ❌ |
| DotProduct | | | Pre-normalized vectors, MIPS | ❌ |
| Manhattan | $\frac{1}{1 + \sum \lvert a_i - b_i \rvert}$ | (0, 1] | Sparse features, recommendation systems | ❌ |
| Chebyshev | $\frac{1}{1 + \max \lvert a_i - b_i \rvert}$ | (0, 1] | | ❌ |
| Pearson | | [-1, 1] | Text embeddings (de-biased), TF-IDF, rating vectors | ❌ |
| Hamming | | [0, 1] | Binary hash codes, LSH, SimHash/MinHash fingerprints | ❌ |
| Jaccard | | [0, 1] | BoW/TF-IDF sparse features, histogram comparison | ❌ |
| Canberra | $1 - \frac{1}{n}\sum\frac{\lvert a_i - b_i \rvert}{\lvert a_i \rvert + \lvert b_i \rvert}$ | | | ❌ |
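To make the distance-to-similarity mapping in the table concrete, here is a minimal scalar sketch of the Manhattan and Chebyshev rows. The class and method names are hypothetical and chosen for illustration. This is a plain reference loop, not the library's SIMD implementation.

```csharp
using System;

// Hypothetical reference sketch of two formulas from the table above.
// It only illustrates the 1 / (1 + distance) mapping; it is not the library's code.
public static class DistanceToSimilaritySketch
{
    // Manhattan similarity: 1 / (1 + sum |a_i - b_i|)
    public static float Manhattan(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        float sum = 0f;
        for (int i = 0; i < a.Length; i++)
            sum += MathF.Abs(a[i] - b[i]);

        return 1f / (1f + sum);
    }

    // Chebyshev similarity: 1 / (1 + max |a_i - b_i|)
    public static float Chebyshev(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        float max = 0f;
        for (int i = 0; i < a.Length; i++)
            max = MathF.Max(max, MathF.Abs(a[i] - b[i]));

        return 1f / (1f + max);
    }
}
```

For `a = (1, 2, 3)` and `b = (2, 4, 3)` the absolute differences are `1, 2, 0`, so the Manhattan similarity is `1 / (1 + 3) = 0.25` and the Chebyshev similarity is `1 / (1 + 2)`, roughly `0.333`. Identical vectors give a distance of 0 and therefore the maximum similarity of 1, which is why the range is `(0, 1]`.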
```mermaid
flowchart LR
    subgraph On Write
        V["Original vector v"] --> NORM["L2 normalize<br/>v-hat = v / ||v||"]
        NORM --> IDX["Store in index: v-hat"]
    end
    subgraph On Search
        Q["Query vector q"] --> QNORM["L2 normalize<br/>q-hat = q / ||q||"]
        QNORM --> DOT["Dot(q-hat, v-hat)"]
        DOT --> RES["= CosineSimilarity(q, v)"]
    end
    style NORM fill:#d4edda
    style QNORM fill:#d4edda
    style DOT fill:#cce5ff
```
Why is Dot faster than Cosine?
- `CosineSimilarity(a, b)` = one dot product + two norm computations = 3 vector traversals
- After pre-normalization, `Dot(a-hat, b-hat)` = one dot product = 1 vector traversal
- The normalization cost is paid once per vector at write time and once per query; the search loop then performs only dot products against the N candidates
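The equivalence behind this optimization can be checked with a small scalar sketch. The class below is hypothetical and uses plain loops rather than the library's `VectorMath`. It shows that the dot product of two L2-normalized vectors matches the cosine similarity of the originals.

```csharp
using System;

// Hypothetical scalar sketch: cosine via three traversals versus
// a single dot product over pre-normalized vectors.
public static class PreNormalizationSketch
{
    // One traversal: sum of element-wise products.
    public static float Dot(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        float sum = 0f;
        for (int i = 0; i < a.Length; i++)
            sum += a[i] * b[i];
        return sum;
    }

    // Three traversals: one dot product plus two norm computations.
    public static float Cosine(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        float denominator = MathF.Sqrt(Dot(a, a)) * MathF.Sqrt(Dot(b, b));
        return denominator > 0f ? Dot(a, b) / denominator : 0f;
    }

    // L2 normalization, done once per stored vector and once per query.
    public static float[] Normalize(ReadOnlySpan<float> v)
    {
        var result = new float[v.Length];
        float norm = MathF.Sqrt(Dot(v, v));
        if (norm > 0f)
        {
            for (int i = 0; i < v.Length; i++)
                result[i] = v[i] / norm;
        }
        return result; // a zero vector stays all zeros, which avoids NaN
    }
}
```

For `q = (3, 4)` and `v = (4, 3)`, `Cosine(q, v)` and `Dot(Normalize(q), Normalize(v))` both come out at about `0.96`, up to floating-point rounding. With N stored vectors, the per-candidate work drops from three traversals to one because the stored vectors were normalized when they were written.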
SIMD-Accelerated Implementation:
```csharp
// SIMD-optimized implementation using internal VectorMath
private static void NormalizeVector(ReadOnlySpan<float> source, Span<float> destination)
{
    var norm = VectorMath.Norm(source);                // SIMD-accelerated L2 norm
    if (norm > 0f)
        VectorMath.Divide(source, norm, destination);  // SIMD-accelerated vector division
    else
        destination.Clear();                           // Zero-vector safety: avoid NaN
}
```

The metric for each vector field is selected through the `[QuiverVector]` attribute:

```csharp
// Cosine: most common, text/semantic search
[QuiverVector(384, DistanceMetric.Cosine)]
public float[] TextEmbedding { get; set; } = [];

// Euclidean: scenarios that care about absolute distance (geographic coordinates, physical space)
[QuiverVector(3, DistanceMetric.Euclidean)]
public float[] Position { get; set; } = [];

// DotProduct: vectors that are already normalized, or Maximum Inner Product Search (MIPS)
[QuiverVector(128, DistanceMetric.DotProduct)]
public float[] Feature { get; set; } = [];

// Manhattan: sparse features, recommendation systems
[QuiverVector(256, DistanceMetric.Manhattan)]
public float[] SparseFeature { get; set; } = [];

// Hamming: binary hash codes, LSH fingerprints
[QuiverVector(64, DistanceMetric.Hamming)]
public float[] BinaryHash { get; set; } = [];
```

Implement `ISimilarity<float>` to define a custom metric, then assign it via the `CustomSimilarity` property on `[QuiverVector]`. When `CustomSimilarity` is set, the metric parameter is ignored.
```csharp
using System; // MathF, ReadOnlySpan<T>

// 1. Define custom similarity (readonly struct + ISimilarity<float>)
public readonly struct WeightedL1Similarity : ISimilarity<float>
{
    public static float Compute(ReadOnlySpan<float> x, ReadOnlySpan<float> y)
    {
        float sum = 0f;
        for (int i = 0; i < x.Length; i++)
            sum += MathF.Abs(x[i] - y[i]) * (i < 128 ? 2f : 1f); // first 128 dims weighted 2x
        return 1f / (1f + sum);
    }
}

// 2. Use it on entity
public class MyEntity
{
    [QuiverKey]
    public string Id { get; set; } = string.Empty;

    [QuiverVector(256, CustomSimilarity = typeof(WeightedL1Similarity))]
    public float[] Embedding { get; set; } = [];
}
```

Requirements:
- Must be a `readonly struct` implementing `ISimilarity<float>`
- Must have a public parameterless constructor (default for structs)
- JIT will inline `TSim.Compute()` at the call site, so there is zero overhead vs built-in metrics
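The struct requirement makes more sense once the call site is visible. The sketch below assumes C# 11 or later (static abstract interface members) and uses a hypothetical stand-in interface, `ISimilaritySketch<T>`, because the real `ISimilarity<float>` declaration is not shown on this page. `InverseL1Sketch` and `BruteForceSketch` are likewise illustrative names, not library types.

```csharp
using System;

// Hypothetical stand-in for the library's ISimilarity<float>; the real
// declaration may differ. Requires C# 11 / .NET 7 or later.
public interface ISimilaritySketch<T>
{
    static abstract T Compute(ReadOnlySpan<T> x, ReadOnlySpan<T> y);
}

// Illustrative metric: 1 / (1 + L1 distance).
public readonly struct InverseL1Sketch : ISimilaritySketch<float>
{
    public static float Compute(ReadOnlySpan<float> x, ReadOnlySpan<float> y)
    {
        float sum = 0f;
        for (int i = 0; i < x.Length; i++)
            sum += MathF.Abs(x[i] - y[i]);
        return 1f / (1f + sum);
    }
}

public static class BruteForceSketch
{
    // TSim is a struct type argument, so the runtime compiles a specialized
    // copy of this method per metric and TSim.Compute resolves to a direct,
    // inlinable call rather than a virtual or delegate call.
    public static int BestMatch<TSim>(ReadOnlySpan<float> query, float[][] candidates)
        where TSim : struct, ISimilaritySketch<float>
    {
        int bestIndex = -1;
        float bestScore = float.NegativeInfinity;
        for (int i = 0; i < candidates.Length; i++)
        {
            float score = TSim.Compute(query, candidates[i]);
            if (score > bestScore)
            {
                bestScore = score;
                bestIndex = i;
            }
        }
        return bestIndex; // -1 when there are no candidates
    }
}
```

Calling `BruteForceSketch.BestMatch<InverseL1Sketch>(query, candidates)` picks the metric at compile time through the type argument. A class-based or delegate-based metric would add an indirect call per candidate, which is the cost the `readonly struct` constraint is meant to avoid.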
The framework internally resolves each metric to an ISimilarity<float> implementation. All implementations use SIMD-accelerated computation:
| Metric | PreNormalize | ISimilarity Type | SIMD Backend |
|---|---|---|---|
| Cosine | `true` | `DotProductSimilarity` | `VectorMath.Dot` |
| Cosine (fallback) | `false` | `CosineSimilarity` | `VectorMath.CosineSimilarity` |
| DotProduct | `false` | `DotProductSimilarity` | `VectorMath.Dot` |
| Euclidean | `false` | `EuclideanSimilarity` | `VectorMath.Distance` |
| Manhattan | `false` | `ManhattanSimilarity` | `Vector<float>` abs-diff accumulation |
| Chebyshev | `false` | `ChebyshevSimilarity` | `Vector<float>` abs-diff max tracking |
| Pearson | `false` | `PearsonCorrelationSimilarity` | `VectorMath.Sum` + `Vector<float>` centered-dot |
| Hamming | `false` | `HammingSimilarity` | `Vector<float>` equality mask + `ConditionalSelect` |
| Jaccard | `false` | `JaccardSimilarity` | `Vector<float>` Min/Max accumulation |
| Canberra | `false` | `CanberraSimilarity` | `Vector<float>` weighted division |
| (custom) | user-defined | User's `ISimilarity<float>` | User-defined |
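As an illustration of what "`Vector<float>` abs-diff accumulation" can look like, here is a minimal sketch built on `System.Numerics`. It is an assumption about the general technique, not the library's `ManhattanSimilarity` source, and the class name is hypothetical. `Vector.Sum` requires .NET 6 or later.

```csharp
using System;
using System.Numerics;

// Hypothetical sketch of SIMD abs-diff accumulation for the Manhattan metric.
public static class ManhattanSimdSketch
{
    public static float Similarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        var accumulator = Vector<float>.Zero;
        int width = Vector<float>.Count; // lanes per hardware vector
        int i = 0;

        // Main loop: process `width` floats per iteration.
        for (; i <= a.Length - width; i += width)
        {
            var va = new Vector<float>(a.Slice(i, width));
            var vb = new Vector<float>(b.Slice(i, width));
            accumulator += Vector.Abs(va - vb);
        }

        // Horizontal add of the lanes, then a scalar loop for the remainder.
        float sum = Vector.Sum(accumulator);
        for (; i < a.Length; i++)
            sum += MathF.Abs(a[i] - b[i]);

        return 1f / (1f + sum);
    }
}
```

The scalar tail loop handles lengths that are not a multiple of `Vector<float>.Count`, so the method also works on hardware where that count is larger than the input. The Chebyshev variant would follow the same shape, with `Vector.Max` replacing the addition.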