use custom allocators #28

Closed
romainfrancois opened this issue Apr 16, 2014 · 3 comments

@romainfrancois
Member

Suppose we have a binary file of some sort in which each vector of the data frame is stored contiguously. I'm pretty sure we could leverage R's custom allocators (wch/r-source@ebf11b0):

/* R_allocator_t typedef is also declared in Rinternals.h 
   so we guard against random inclusion order */
#ifndef R_ALLOCATOR_TYPE
#define R_ALLOCATOR_TYPE
typedef struct R_allocator R_allocator_t;
#endif

typedef void *(*custom_alloc_t)(R_allocator_t *allocator, size_t);
typedef void  (*custom_free_t)(R_allocator_t *allocator, void *);

struct R_allocator {
    custom_alloc_t mem_alloc; /* malloc equivalent */
    custom_free_t  mem_free;  /* free equivalent */
    void *res;                /* reserved (maybe for copy) - must be NULL */
    void *data;               /* custom data for the allocator implementation */
};

So we would need to come up with a custom_alloc_t function that returns the appropriate location in the mmapped binary file rather than allocating new memory.

Perhaps we can maintain a count of the vectors that come from this mmapped file and then unmap it when mem_free is called on the last of them.
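To make the idea concrete, here is a minimal sketch of such an allocator pair, written against the struct above but not tested inside R. The names `mmap_pool_t`, `pool_open`, `pool_alloc` and `pool_free` are hypothetical, and the sketch assumes the file stores the vectors back to back in the order R will request them, with each stored region already sized to whatever R asks the allocator for (header included). It uses a private, writable mapping because R writes the vector header into the memory it gets back.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Same layout as the R_allocator struct quoted above. */
typedef struct R_allocator R_allocator_t;
typedef void *(*custom_alloc_t)(R_allocator_t *allocator, size_t);
typedef void  (*custom_free_t)(R_allocator_t *allocator, void *);
struct R_allocator {
    custom_alloc_t mem_alloc;
    custom_free_t  mem_free;
    void *res;   /* reserved, must be NULL */
    void *data;  /* here: pointer to a mmap_pool_t */
};

/* Hypothetical bookkeeping shared by every vector backed by one mapping. */
typedef struct {
    char  *base;  /* start of the mapping, NULL once unmapped */
    size_t len;   /* length of the mapping in bytes */
    size_t next;  /* offset of the next region to hand out */
    int    live;  /* regions handed out and not yet freed */
} mmap_pool_t;

/* Map the whole file copy-on-write: writes stay private to this process. */
static int pool_open(mmap_pool_t *p, const char *path) {
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) { close(fd); return -1; }
    void *m = mmap(NULL, (size_t) st.st_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (m == MAP_FAILED) return -1;
    p->base = m;
    p->len = (size_t) st.st_size;
    p->next = 0;
    p->live = 0;
    return 0;
}

/* "malloc": return the next region of the mapping instead of allocating. */
static void *pool_alloc(R_allocator_t *a, size_t size) {
    mmap_pool_t *p = a->data;
    if (p->base == NULL || size > p->len - p->next) return NULL;
    void *ptr = p->base + p->next;
    p->next += size;
    p->live++;
    return ptr;
}

/* "free": drop one reference, unmap when the last vector goes away. */
static void pool_free(R_allocator_t *a, void *ptr) {
    mmap_pool_t *p = a->data;  /* read before a possible munmap */
    (void) ptr;
    assert(p->live > 0);
    if (--p->live == 0) {
        munmap(p->base, p->len);
        p->base = NULL;
    }
}
```

The allocator would then be filled in as `{ pool_alloc, pool_free, NULL, &pool }` and handed to R for each vector; the `mmap_pool_t` itself has to live outside the mapping so that it survives the final `munmap`.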

@s-u does that sound like a potential use case for this feature? It seems that with this we could load data almost instantly (for just a bit more than the cost of the mmap, I guess).

@romainfrancois
Member Author

One thing I guess to worry about is the long vector support:

R_size_t hdrsize = sizeof(SEXPREC_ALIGN);
#ifdef LONG_VECTOR_SUPPORT
if (length > R_SHORT_LEN_MAX)
    hdrsize = sizeof(SEXPREC_ALIGN) + sizeof(R_long_vec_hdr_t);
#endif

because then there is:

mem = allocator ?
    custom_node_alloc(allocator, hdrsize + size * sizeof(VECREC)) :
    malloc(hdrsize + size * sizeof(VECREC));

So if we saved a binary file from a build of R that has long vector support and loaded it in a build that does not, this might cause issues.
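One way to guard against that, sketched here purely as an assumption and not as anything R provides, would be to stamp the file with the layout of the build that wrote it and refuse to map it when the stamp differs. Everything below (`layout_stamp_t`, `make_stamp`, `stamp_compatible`, the magic string) is hypothetical; the header size is passed in as a plain number because `sizeof(SEXPREC_ALIGN)` is only visible from R's internal headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical preamble written at the start of the binary file. */
typedef struct {
    char     magic[8];             /* file type tag */
    uint32_t sizeof_pointer;       /* sizeof(void *) in the writing build */
    uint32_t vector_header_size;   /* sizeof(SEXPREC_ALIGN) in the writing build */
    uint32_t long_vector_support;  /* 1 if LONG_VECTOR_SUPPORT was defined */
} layout_stamp_t;

/* Build the stamp describing the current build. */
static layout_stamp_t make_stamp(uint32_t header_size, int long_vectors) {
    layout_stamp_t s;
    assert(header_size > 0);
    memset(&s, 0, sizeof s);               /* deterministic bytes everywhere */
    memcpy(s.magic, "RMMAPV1", 8);         /* 7 characters plus the NUL */
    s.sizeof_pointer = (uint32_t) sizeof(void *);
    s.vector_header_size = header_size;
    s.long_vector_support = long_vectors ? 1u : 0u;
    return s;
}

/* Byte-wise comparison: a file from a build with another layout
   (or another byte order) is rejected. */
static int stamp_compatible(const layout_stamp_t *file,
                            const layout_stamp_t *host) {
    return memcmp(file, host, sizeof *file) == 0;
}
```

A loader would read the stamp from the file, compare it with `make_stamp(...)` for the running build, and fall back to a regular parse when `stamp_compatible` returns 0.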

@s-u

s-u commented Apr 16, 2014

In general, anything you do is specific to that build of R and platform. But, yes, it essentially gives you the freedom to use arbitrary ways of obtaining memory for specific vector allocations. It's entirely up to you how you obtain the memory that you pass to R.

@hadley
Member

hadley commented Mar 9, 2015

@romainfrancois I've just pushed a big update that I think obviates the need for this. The parser now makes very few copies.

@hadley closed this as completed Mar 9, 2015
@lock lock bot locked and limited conversation to collaborators Sep 25, 2018