Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flush memtable after timeout since got dirty (1st write) instead of idle (last write) #4335

Merged
merged 1 commit into from
Mar 6, 2024

Conversation

aliszka
Copy link
Member

@aliszka aliszka commented Feb 27, 2024

What's being changed:

Memtable is flushed to segment file when its size exceeds given threshold or time has passed since it is being written to.

Previously idle timeout (time since last write) was used to trigger flush. In case of multiple tenants having little data pushed to memtable very often, both flush conditions (exceeded size or idle timeout) could not be met and a lot of data had to be stored in memory before it was eventually saved to file.

This PR changes flush trigger to dirty timeout (time since first write). That way memtable is flushed even if very little data is written within configured timeframe, preventing data being kept in memory for too long.

Notes:

  • new env var is introduced PERSISTENCE_MEMTABLES_FLUSH_DIRTY_AFTER_SECONDS replacing existing PERSISTENCE_MEMTABLES_FLUSH_IDLE_AFTER_SECONDS. If first one is not set, latter one is used as a fallback.
  • default value of PERSISTENCE_MEMTABLES_FLUSH_DIRTY_AFTER_SECONDS is 60s (same as PERSISTENCE_MEMTABLES_FLUSH_IDLE_AFTER_SECONDS used to be)

Review checklist

  • Documentation has been updated, if necessary. Link to changed documentation:
  • Chaos pipeline run or not necessary. Link to pipeline: [TEST] memtables dirty flush weaviate-chaos-engineering#185
  • All new code is covered by tests where it is reasonable.
  • Performance tests have been run or not necessary.

@aliszka aliszka changed the title Memtable flush on continuous writing Flush memtable after time elapsed since 1st write instead of last write (idle) Feb 27, 2024
@aliszka aliszka force-pushed the memtable_flush_on_continuous_writing branch 2 times, most recently from 9285308 to f58bf37 Compare February 27, 2024 16:47
@aliszka aliszka changed the title Flush memtable after time elapsed since 1st write instead of last write (idle) Flush memtable after timeout since got dirty (1st write) instead of idle (last write) Feb 27, 2024
@aliszka aliszka marked this pull request as ready for review February 27, 2024 17:00
Copy link
Contributor

@amourao amourao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look great, I just think we should do "standard" bm25 benchmarking suite to see how/if this affects performance and also run the chaos pipeline, to see if there are any changes in the tests.

adapters/repos/db/lsmkv/memtable.go Show resolved Hide resolved
usecases/config/environment.go Show resolved Hide resolved
@aliszka aliszka force-pushed the memtable_flush_on_continuous_writing branch from 6ea667c to aa2d2be Compare March 6, 2024 09:41
Copy link

sonarcloud bot commented Mar 6, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
36.2% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud

@aliszka aliszka merged commit e6bcdfe into stable/v1.24 Mar 6, 2024
30 of 32 checks passed
@aliszka aliszka deleted the memtable_flush_on_continuous_writing branch March 6, 2024 12:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants