Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding ability to remove all keys #49

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

ryan-summers
Copy link
Contributor

Fixes #47 by adding a remove_all_items() function to map to erase all of the saved keys using an overwrite mechanism. This prevents needing to erase device flash, which could take a long time.

The benefit of this approach is that flash erasure counts are minimized, as you need to loop through all of flash before anything ever needs to be erased.

I haven't tested this on hardware yet, so leaving it as a draft.

Copy link
Collaborator

@diondokter diondokter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that's fast!

Would be nice if you could add a test for this and add to the changelog as well.
The mock flash tests are very reliable.

I can add this to fuzzing later as well

src/map.rs Outdated Show resolved Hide resolved
src/map.rs Outdated Show resolved Hide resolved
let (item_key, _) = K::deserialize_from(item.data())?;
&item_key == search_key
} else {
false
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
false
true

?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is set to true, then there doesn't need to be the search_key.is_none() in the if statement after this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I did this intentionally to make it more clear what the code is doing, even though it's slightly more logic.

The intent is to make the conditional read as "If there is no search key or if the search key matched**. We could combine it down, but I was letting the compiler handle that to keep the "search key matched" logic a bit more pure.

If you'd prefer I condense it down, I'd be happy to (it was my original implementation)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the pros in both yeah. I don't have a strong opinion on it

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah!

I'd take this here but don't mind either way:

                    let item_match = if let Some(search_key) = &search_key {
                        let (item_key, _) = K::deserialize_from(item.data())?;
                        &item_key == search_key
                    } else {
                        true
                    }

                    if item_match {
...

/// new items are stored again.
///
/// <div class="warning">
/// This might be really slow!
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this?
Compared to e.g. a fetch_item with a cold/no cache, don't they both need to scan a significant part of the flash? The fetch needs to read/deserialize and check the keys while the remove needs to erase the checksum.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you fetch, the algo looks for the newest page and scans each item there to get the newest value of a given key. Only if it doesn't find it does it need to scan the page before it and then so on until the value is found.

The difference with remove is that it always needs to scan all items/pages and needs to write to flash too

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. In the few-page scenario there isn't much difference then. And in any scenario both have the same time complexity AFAICT.

As as sidenote while browsing the code (please let me know if you mind and we can move these discussions elsewhere):
I wonder why ItemIter iterates forward and not backward? Wouldn't it be faster (for fetch_item at least) to first quickly seek from page start to the last item in a page and then iterate backward from there? It would certainly need changes to the data structure (store previous length or previous address as well) but since the most recent key is always the last, it would be faster (require fewer key deserialization/comparison).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I love feedback!

Yes, iterating back would yield higher performance (in time), but I also need forward iteration for the queue implementation. So that would require storing extra data in the header.

While in your case that would make sense with your giant pages, it wouldn't in the case I found myself when writing this crate (namely 8x4kb pages). So it's a size vs speed tradeoff.

Also, just adding the back-size or back-pointer to the header wouldn't work well. Because if you start from the back, how do you find the header? If you need to start from the front anyways, to search for the last header first, then that will lessen the gains from back-iterating a bit.

But this ordeal is already solved by using the KeyPointerCache. If you keep that cache around, it will always know where each key is stored and be able to read the item directly.

So I don't think adding back-iteration would make sense. It would add disksize overhead and there's already a good alternative if you really need better performance (that outperforms back-iteration).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do you find the header?

Quickly seek forward. Read and add up lenghts on Iter creation without (having to read and deserialize keys).

Sure there is a bit of a downside and not a gigantic advantage. I just got curious because we built https://git.m-labs.hk/M-Labs/sfkv many years ago and the design decisions came to mind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Erasing all items
3 participants