Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hnsw pq #2592

Merged
merged 114 commits into from Mar 1, 2023
Merged

Hnsw pq #2592

merged 114 commits into from Mar 1, 2023

Conversation

abdelr
Copy link
Contributor

@abdelr abdelr commented Feb 1, 2023

What's being changed:

Review checklist

  • Documentation has been updated, if necessary. Link to changed documentation:
  • Chaos pipeline run or not necessary. Link to pipeline:
  • All new code is covered by tests where it is reasonable.
  • Performance tests have been run or not necessary.

@@ -29,6 +29,11 @@ var dotProductImplementation func(a, b []float32) float32 = func(a, b []float32)
return sum
}

// ToDo: also have an ASM implementation
var dotProductStepImplementation func(x, y float32) float32 = func(x, y float32) float32 {
return x * y
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dotProductStepImplementation(x, y float32) is equivalent to xy.
I think this function can be replaced by x
y. The simple product in this case would be more efficient that dot-product since x and y are scalers

}

return sum
}

var manhattanStepImpl func(a, b float32) float32 = func(a, b float32) float32 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use a function instead of a closure
We can try something like this

func manhattanStepImpl(a, b float32) float32 {
	// take absolute difference, converted to float64 because math.Abs needs that
	return float32(math.Abs(float64(a - b)))
}

}

return sum
}

var l2SquaredStepImpl func(a, b float32) float32 = func(a, b float32) float32 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before I would use a function instead of a closure

func l2SquaredStepImpl(a, b float32) float32 {
	diff := a - b
	return diff * diff
}

func (h *hnsw) initCompressedStore() error {
store, err := lsmkv.New(fmt.Sprintf("%s/%s/%s", h.rootPath, h.className, h.shardName), "", h.logger, nil)
if err != nil {
return errors.Wrapf(err, "init hnsw")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more descriptive error like would be better
fmt.Errorf("init lsmkv %w", err)

errors.Wrapf is deprecated I'd not to use it in future

}
err = store.CreateOrLoadBucket(context.Background(), helpers.CompressedObjectsBucketLSM)
if err != nil {
return errors.Wrapf(err, "init hnsw")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as before a more descriptive error is needed


func (m *KMeans) recalcCenters(data [][]float32) {
for index := 0; index < m.K; index++ {
m.centers[index] = make([]float32, m.dimensions)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's be more efficient to reuse the existing buffer m.centers[index] if len(m.centers[index]) >= m.demensions

}

func (m *KMeans) clearData() {
m.data.points = nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about m.data.changes?

)
err := pq.kms[i].Fit(data)
if err != nil {
panic(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is panic intended?

wg.Wait()
}

func FilterSegment(i int, ds int) FilterFunc {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to create new function extractSegment() instead we could try

func FilterSegment(i int, ds int) FilterFunc {
	return func(x []float32) []float32 {
		return  x[i*ds : (i+1)*ds]
	}
}

IMHO this is more clear and more efficient

@@ -83,6 +84,14 @@ func (c *UserConfig) SetDefaults() {
c.Skip = DefaultSkip
c.FlatSearchCutoff = DefaultFlatSearchCutoff
c.Distance = DefaultDistanceMetric
c.PQ = PQConfig{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use standard struct initialisation in go like this

c.PQ = PQConfig{
		Enabled:        DefaultPQEnabled,
		BitCompression: DefaultPQBitCompression,
		Segments:       DefaultPQSegments,
		Centroids:      DefaultPQCentroids,
		Encoder: PQEncoder{
			Type:         DefaultPQEncoderType,
			Distribution: DefaultPQEncoderDistribution,
		},
	}

@abdelr abdelr merged commit 41f7cb9 into master Mar 1, 2023
11 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants