Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MemoryBackedStore cloned on every FileLoad::map #35

Closed
spl opened this issue Feb 24, 2020 · 4 comments
Closed

MemoryBackedStore cloned on every FileLoad::map #35

spl opened this issue Feb 24, 2020 · 4 comments

Comments

@spl
Copy link
Contributor

spl commented Feb 24, 2020

Consider this snippet of the implementation of the FileLoad trait for MemoryBackedStore (src/storage/memory.rs):

pub struct MemoryBackedStore {
    vec: Arc<sync::RwLock<Vec<u8>>>,
}
impl FileLoad for MemoryBackedStore {
    // ...
    fn map(&self) -> Box<dyn Future<Item = SharedVec, Error = std::io::Error> + Send> {
        let vec = self.vec.clone();
        Box::new(future::lazy(move || {
            future::ok(SharedVec(Arc::new(vec.read().unwrap().clone())))
        }))
    }
}

In the expression vec.read().unwrap().clone() here, if my understanding is correct, it appears that the Vec<u8> underlying the RwLock is being cloned.

I understand that MemoryBackedStore may be primarily intended for testing and, therefore, for relatively small vectors. However, that may not always be the case, and I'd guess that a clone like this could result in excessive memory usage and possibly even a surprise out-of-memory error. (Then again, I could be blowing things out of proportion, pun intended! 💥)

Would it be a good idea to avoid the .clone() here?

@matko
Copy link
Member

matko commented Feb 24, 2020

It is probably a good idea yes. While it was indeed primarily intended for tests, we are actually using the memory store in terminusdb right now, and I'm sure there's many use cases.

The idea behind the extra clone here is that you get a snapshot view of whatever the contents of the in-memory 'file' is, at the time of resolving this future. There's probably better ways of doing this though.

@spl
Copy link
Contributor Author

spl commented Feb 24, 2020

The idea behind the extra clone here is that you get a snapshot view of whatever the contents of the in-memory 'file' is, at the time of resolving this future. There's probably better ways of doing this though.

Ah, thanks, it's good to know that the clone was intended for a snapshot. Is there an important reason not to replace it with an active view of the data?

@matko
Copy link
Member

matko commented Feb 24, 2020

It shouldn't be important. The data should not actually change, since all files are write-once, but this is not properly enforced by the types.
For the file backend, we actually give out mmap'ed buffers which could in theory change if the underlying file changes though (although I'm seriously considering replacing that with a version that reads all data into memory instead). But for the current MemoryBackedStore implementation we can't really give out the RwLock-wrapped vec directly though, since it isn't an AsRef<[u8]>.

@spl
Copy link
Contributor Author

spl commented Feb 26, 2020

I don't know the order of operations. Does the MemoryBackedStore need to always be backed by a RwLock? Is the data being written more than once, or is it written once and then read multiple times without being written to again? If the latter is true, then can the RwLock be consumed after writing?

@matko matko added this to actual-triage in Core dev (old) Mar 5, 2021
@matko matko moved this from triage to Later in Core dev (old) Mar 8, 2021
@matko matko closed this as completed Apr 1, 2021
Core dev (old) automation moved this from Later to Done Apr 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

No branches or pull requests

2 participants