Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-1837 WEAVIATE-2 fix segfault by copying memory #1843

Merged
merged 1 commit into from
Mar 8, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
18 changes: 17 additions & 1 deletion adapters/repos/db/lsmkv/segment_collection_strategy.go
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,23 @@ func (i *segment) getCollection(key []byte) ([]value, error) {
}
}

return i.collectionStratParseData(i.contents[node.Start:node.End])
// We need to copy the data we read from the segment exactly once in this
// place. This means that future processing can share this memory as much as
// it wants to, as it can now be considered immutable. If we didn't copy in
// this place it would only be safe to hold this data while still under the
// protection of the segmentGroup.maintenanceLock. This lock makes sure that
// no compaction is started during an ongoing read. However, as we could show
// as part of https://github.com/semi-technologies/weaviate/issues/1837
// further processing, such as map-decoding and eventually map-merging would
// happen inside the bucket.MapList() method. This scope has its own lock,
// but that lock can only protecting against flushing (i.e. changing the
// active/flushing memtable), not against removing the disk segment. If a
// compaction completes and the old segment is removed, we would be accessing
// invalid memory without the copy, thus leading to a SEGFAULT.
contentsCopy := make([]byte, node.End-node.Start)
copy(contentsCopy, i.contents[node.Start:node.End])

return i.collectionStratParseData(contentsCopy)
}

func (i *segment) collectionStratParseData(in []byte) ([]value, error) {
Expand Down
19 changes: 11 additions & 8 deletions adapters/repos/db/lsmkv/strategies_map.go
Original file line number Diff line number Diff line change
Expand Up @@ -212,11 +212,20 @@ func (kv MapPair) Bytes() ([]byte, error) {
func (kv *MapPair) FromBytes(in []byte, keyOnly bool) error {
var read uint16

// NOTE: A previous implementation was using copy statements in here to avoid
// sharing the memory. The general idea of that is good (protect against the
// mmaped memory being removed from a completed compaction), however this is
// the wrong place. By the time we are in this method, we can no longer
// control the memory safety of the "in" argument. Thus, such a copy must
// happen at a much earlier scope when a lock is held that protects against
// removing the segment. Such an implmentation can now be found in
// segment_collection_strategy.go as part of the *segment.getCollection
// method. As a result all memory used here can now be considered read-only
// and is safe to be used indefinitely.

keyLen := binary.LittleEndian.Uint16(in[:2])
read += 2 // uint16 -> 2 bytes

// kv.Key = make([]byte, keyLen)
// copy(kv.Key, in[read:read+keyLen])
kv.Key = in[read : read+keyLen]
read += keyLen

Expand All @@ -227,12 +236,6 @@ func (kv *MapPair) FromBytes(in []byte, keyOnly bool) error {
valueLen := binary.LittleEndian.Uint16(in[read : read+2])
read += 2

// if int(valueLen) > cap(kv.Value) {
// kv.Value = make([]byte, valueLen)
// } else {
// kv.Value = kv.Value[:valueLen]
// }
// copy(kv.Value, in[read:read+valueLen])
kv.Value = in[read : read+valueLen]
read += valueLen

Expand Down