Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strange segfault's occur related to lmdb-rs #29

Closed
code-ape opened this issue Feb 21, 2016 · 8 comments
Closed

strange segfault's occur related to lmdb-rs #29

code-ape opened this issue Feb 21, 2016 · 8 comments

Comments

@code-ape
Copy link

I've encountered a rather bizarre segfault occurrence that is somehow linked to lmdb-rs. The segfault is odd and manifests in strangle places. In my project that I currently am experiencing issues with this it generally manifests when trying to transform data read from the db or move an Arc to the environment into a new thread. You can check out the code here.

I created a repository purely for reference in helping solve this issue, which you can find here. I'm happy to help in fixing this if you can point me in the right direction. I've really enjoyed using this library and need it to work to complete my senior project.

@code-ape
Copy link
Author

Simply reiterating comments from code-ape/lmdbrs_segfault#1:

I suspect it may be an issue with how Vec's work in Rust. I don't know if this relates to heap vs stack operations and pointers but I suspect this is not an issue with mmapping as if it were the case then it would also fail for the &[u8] allocation which also uses it's own, near identical, ::from_raw_parts(...). I was thinking that these should both work identically, but when I looked up the method for transforming a slice to a Vec I found this:

pub fn into_vec<T>(mut b: Box<[T]>) -> Vec<T> {
    unsafe {
        let xs = Vec::from_raw_parts(b.as_mut_ptr(), b.len(), b.len());
        mem::forget(b);
        xs
    }
}

It appears that the use of men::forget(b) is an important difference that is also related to this issue.

Also upon examining docs for Vec::from_raw_parts there seems to be a strong case that there may just be an issue with trying to cast the original data directly to a Vec:

This is highly unsafe, due to the number of invariants that aren't checked:

  • ptr needs to have been previously allocated via String/Vec (at least, it's highly likely to be incorrect if it wasn't).
  • length needs to be the length that less than or equal to capacity.
  • capacity needs to be the capacity that the pointer was allocated with.

Violating these may cause problems like corrupting the allocator's internal data structures.

It seems like the lifetime of the Vec may also have something to do with this issue, see here, where when the Vec goes out of scope all data related to it does as well. Maybe this could be fixed by casting to a slice and then converting to a Vec? I'm not sure.

@xitep
Copy link
Collaborator

xitep commented Feb 22, 2016

thanks for the research! the pointer to the mailing list confirms my suspicion. either Vec<u8> should not implement FromMdbValue (preferably) or it must make its own copy of the bytes it receives in the FromMdbValue implementation. currently, the implementation is unsafe and unsound since it 1) will try to free the memory it got from lmdb when it's being dropped and 2) since it can outlive its source.

as i mentioned already in the pull request, i'd prefer dropping the FromMdbValue implementation for Vec<u8> and just go with slices since that will force users to explicitly think about when to make a copy and when not.

@code-ape
Copy link
Author

I forked the repo and removed the FromMdbValue trait for Vec<u8>. I was wondering if this means ToMdbValue should also be removed? It seems odd to be able to only cast it one way and I imagine some of the memory issues with retrieving a value to a Vec also apply to saving a value from Vec, such as if a vector went out of scope before the transaction for lmdb was committed.

@xitep
Copy link
Collaborator

xitep commented Feb 23, 2016

hello @code-ape,

the one way direction of the conversion is ok. i'd see it as a convenience, otherwise i'd have to pass in &v[..] explicitly (where v is a Vec<u8>).

regarding your memory safety concern on ToMdbValue: i'm not entirely sure, but i think its implementation for Vec<u8> is fine:

impl ToMdbValue for Vec<u8> {
    fn to_mdb_value<'a>(&'a self) -> MdbValue<'a> {
        unsafe {
            MdbValue::new(std::mem::transmute(self.as_ptr()), self.len())
        }
    }
}
  • the returned MdbValue returned from to_mdb_value cannot outlive the original vector due to the lifetime 'a.
  • MdbValue seems not to do any any "freeing" of the memory its receives a pointer to.
  • as you mentioned, typically the value gets saved into the lmdb ... it seems the save routine makes a copy of the data (into the mmapped pages).

@vhbit
Copy link
Owner

vhbit commented Feb 23, 2016

@xitep

as i mentioned already in the pull request, i'd prefer dropping the FromMdbValue implementation for Vec and just go with slices since that will force users to explicitly think about when to make a copy and when not.

Actually, it should just copy as that's semantics of Vec<T> - it's owned data. As I remember implementation for Vec was a quick hack for one of internal projects where I knew how to use without a segfault (shame on me, it's not a correct for a library).

@vhbit
Copy link
Owner

vhbit commented Feb 23, 2016

I apologize for a delay and I'll try to get some time tomorrow to fix it.

@xitep
Copy link
Collaborator

xitep commented Feb 24, 2016

@vhbit many thanks for the feedback. yes, making a copy of the data in Vec<u8>'s implementation for FromMdbValue would definitely be correct. i surely have no objections against keeping it (with the copy semantics). btw: i think we have the same issue with the FromMdbValue implementation for String.

it's just that i personally - as a client code programmer - prefer to see allocations explicitly ... however, thinking again about it, i agree that this can lead to more verbose client code. after all, it's the user's choice whether to db_ref.get::<Vec<u8>>(&"log/0") or db_ref.get::<&[u8]>(&"log/0") (and making a copy of that). so, i'm taking back my preference for not having the trait implementation.

@vhbit vhbit closed this as completed in 29b2f05 Feb 24, 2016
xitep pushed a commit that referenced this issue Feb 24, 2016
@xitep
Copy link
Collaborator

xitep commented Feb 24, 2016

Thank you for the fix Valerii! That was quite fast and works nicely! :) I added a test to cover this.

vhbit added a commit that referenced this issue Feb 24, 2016
#29 Test on safe Vec<u8> conversion
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants