Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Precise GC #3

Open
ysbaddaden opened this issue Jan 17, 2018 · 5 comments
Open

Precise GC #3

ysbaddaden opened this issue Jan 17, 2018 · 5 comments

Comments

@ysbaddaden
Copy link
Owner

For now, the identification of reachable objects is conservative; we iterate stacks (and objects) for each sizeof(void *) to identify pointers to the managed HEAP. If it looks like a pointer then it's a pointer. We have no mean to know that it's actually an integer than happens to look like a valid pointer.

A precise tracing would identify what object is allocated and knows what are references to HEAP managed memory (no need to iterate all sizeof(void *)), knows whether this is a direct reference (identifies the allocation directly) or an inner pointer (search the allocation).

There are two places where we identify reachable objects: 1. we trace stacks, then 2. trace objects found on the stacks for inner references (recursively).

LLVM provides solutions for generating stack maps, but they seem either slow (shadow stack map) or do require to create an LLVM plugin manually in C++ (i.e. a lot of work). I propose to keep conservative tracking of stacks for the time being, but enable precise tracing of objects.

I have a few ideas for implementation. Maybe registering a marking callback, maybe registering a map for objects (once at the beginning of the program) then having the GC identify objects to find the mapping... Please propose :)

@RX14
Copy link

RX14 commented Jan 17, 2018

Luckily we get the class ID of any object for free, it's just the first 4 bytes after the pointer. Then you'd need a fairly optimized data structure to get from those IDs to the pointer maps. Luckily, (I think) the compiler allocates the class IDs sequentially, so you might be able to get away with just allocating a huge lookup table of class ID to pointers to the object layouts.

Not a clue on how to store object layouts, i'm sure that's a well covered topic in literature.

@RX14
Copy link

RX14 commented Jan 17, 2018

Actually, if we assume every pointer is aligned, for most classes a 32-bit bitmask of which 64-bit offsets after the pointer are pointers would do. For example 0b1010 would be a class with a 64-bit non-pointer, 64-bit pointer, 64-bit non-pointer, 64-bit pointer.

@ysbaddaden
Copy link
Owner Author

Let's skip low level implementation detail. Bitmap is a great idea, but itand doesn't help distinguish object references from nameless pointers for example. Searching the object for a pointer is costly, knowing a pointer points isn't an inner pointers allows to skip the search.

Let's focus on the overall design. What information the compiler must provide to the GC allocator, so the collector can identify pointers and references, and seek them efficiently. How to differentiate between a generic GC.malloc, another one to account for a HEAP allocated struct, and lastly one for a Reference (with a class id), ...

@RX14
Copy link

RX14 commented Jan 17, 2018

I actually don't have a sense of the amount of information needed to implement precise. Perhaps just: "things where we know where the pointers are", "things where we don't know where the pointers are" and "things we know don't contain pointers". bdwgc has the latter two only. I'd imagine you'd embed that on the object layout side.

@ysbaddaden
Copy link
Owner Author

ysbaddaden commented Jan 18, 2018

For now I have the atomic flag on each allocation. If true then the allocation doesn't have any pointer to managed memory, if false then the allocation may have pointers. It's here to stay.

The precise information means that the GC can identify an allocation has being object T, which has a known mapping. Instead of blindly tracing the allocation for pointers, the GC knows where pointers are within the allocation and can test them directly. To improve performance, the GC should distinguish between direct references, that is the pointer points to the first byte of an allocation, so its metadata is at fixed place, such as pointer - sizeof(GC_Object), otherwise it may be an inner pointer, and we must search the allocation metadata (expensive).

AFAIK: structs are inlined into the class definition, and the first UInt64 of class allocations is the class id. If we know whether an allocation is a class, we can identify its mapping (with flattened struct definitions), so GC needs to know whether the allocation is a class, or not.

The mapping should provide at least those informations:

  • an identifier (e.g. class id);
  • a count of sizeof(void *) indexes;
  • which indexes are pointers;
  • which indexes are direct references.

Lasts but not leasts:

  • the allocation may be a collection, such as an array or a hash;
  • the allocation may be a collection of aggregate types (unions), such as an Array(X | Y | Z);
  • the allocation could be one (or many) struct(s), for example Pointer(X).malloc(12).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants