New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hnsw pq #2592
Conversation
…ed vectors are stored)
| @@ -29,6 +29,11 @@ var dotProductImplementation func(a, b []float32) float32 = func(a, b []float32) | |||
| return sum | |||
| } | |||
|
|
|||
| // ToDo: also have an ASM implementation | |||
| var dotProductStepImplementation func(x, y float32) float32 = func(x, y float32) float32 { | |||
| return x * y | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dotProductStepImplementation(x, y float32) is equivalent to xy.
I think this function can be replaced by xy. The simple product in this case would be more efficient that dot-product since x and y are scalers
| } | ||
|
|
||
| return sum | ||
| } | ||
|
|
||
| var manhattanStepImpl func(a, b float32) float32 = func(a, b float32) float32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use a function instead of a closure
We can try something like this
func manhattanStepImpl(a, b float32) float32 {
// take absolute difference, converted to float64 because math.Abs needs that
return float32(math.Abs(float64(a - b)))
}
| } | ||
|
|
||
| return sum | ||
| } | ||
|
|
||
| var l2SquaredStepImpl func(a, b float32) float32 = func(a, b float32) float32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as before I would use a function instead of a closure
func l2SquaredStepImpl(a, b float32) float32 {
diff := a - b
return diff * diff
}
| func (h *hnsw) initCompressedStore() error { | ||
| store, err := lsmkv.New(fmt.Sprintf("%s/%s/%s", h.rootPath, h.className, h.shardName), "", h.logger, nil) | ||
| if err != nil { | ||
| return errors.Wrapf(err, "init hnsw") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more descriptive error like would be better
fmt.Errorf("init lsmkv %w", err)
errors.Wrapf is deprecated I'd not to use it in future
| } | ||
| err = store.CreateOrLoadBucket(context.Background(), helpers.CompressedObjectsBucketLSM) | ||
| if err != nil { | ||
| return errors.Wrapf(err, "init hnsw") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as before a more descriptive error is needed
|
|
||
| func (m *KMeans) recalcCenters(data [][]float32) { | ||
| for index := 0; index < m.K; index++ { | ||
| m.centers[index] = make([]float32, m.dimensions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's be more efficient to reuse the existing buffer m.centers[index] if len(m.centers[index]) >= m.demensions
| } | ||
|
|
||
| func (m *KMeans) clearData() { | ||
| m.data.points = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about m.data.changes?
| ) | ||
| err := pq.kms[i].Fit(data) | ||
| if err != nil { | ||
| panic(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is panic intended?
| wg.Wait() | ||
| } | ||
|
|
||
| func FilterSegment(i int, ds int) FilterFunc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to create new function extractSegment() instead we could try
func FilterSegment(i int, ds int) FilterFunc {
return func(x []float32) []float32 {
return x[i*ds : (i+1)*ds]
}
}
IMHO this is more clear and more efficient
entities/vectorindex/hnsw/config.go
Outdated
| @@ -83,6 +84,14 @@ func (c *UserConfig) SetDefaults() { | |||
| c.Skip = DefaultSkip | |||
| c.FlatSearchCutoff = DefaultFlatSearchCutoff | |||
| c.Distance = DefaultDistanceMetric | |||
| c.PQ = PQConfig{} | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use standard struct initialisation in go like this
c.PQ = PQConfig{
Enabled: DefaultPQEnabled,
BitCompression: DefaultPQBitCompression,
Segments: DefaultPQSegments,
Centroids: DefaultPQCentroids,
Encoder: PQEncoder{
Type: DefaultPQEncoderType,
Distribution: DefaultPQEncoderDistribution,
},
}
Still pending the step function at the distancer
Maybe something did change affecting recall and needs to be investigated
as they take incredibly long with the race detector on
What's being changed:
Review checklist