GC with limited overhead for metadata
-------------------------------------

For each type with outgoing references (including opaque types that may or
may not have outgoing refs), have a "scan-function" that builds a structure:

    ( size in bytes,
      array of (pointer to data, pointer to scan-function) for outgoing refs )

This "scan-function" needs to be included in interfaces as well (i.e. it
needs to be part of the vtable).

The stack needs either:

1. fat pointers, perhaps in a special area, in order to know the
   "scan-function" for each root reference, or
2. information about the stack frame.

Option 2 is probably better. The overhead is:

- metadata for stack frames (basically, "debug info")
- pointers to the scan function in interfaces
- scan functions (in the code segment)


Generational GC without tracking reference changes
--------------------------------------------------

Could count the instance count / out-refs count / in-refs count of each type
on each full scan, and then activate "tracking" of allocations of specific
types, based on which types can point to which other types.

This will probably only work well if generic parameters can be taken into
account (otherwise generic types can point to anything).

If types {c,d,e} can only be referenced by {a,b}, then {a,b,c,d,e} can be
tracked. Then all {a,b} are scanned, and the unreachable {c,d,e} objects can
be queued for deletion.


Other possible GC optimizations
-------------------------------

* Don't scan immutable objects (or arena chunks?) over and over again.
* Could do GC during blocking I/O calls.
* Can dirtiness/accessedness of pages be queried from the OS?
  - would require stopping all threads, so maybe not any real optimization?
  - maybe the check could be done AFTER having done the scanning?
* Can the pages be protected/unmapped during GC, and then userfaultfd be used
  to handle page faults?


Calling vtable functions vs having RTTI as data
-----------------------------------------------

Calling a vtable function for interfaces could be a bottleneck if there are
lots of objects that are reached from interface-typed references.

Maybe RTTI refs could be 1) compact and 2) de-duplicated? E.g. a 24-bit type
id + an 8-bit repeat count.

The RTTI pointer is "only" needed when the reference is a non-concrete type.
...but the types of the references can't be known when the object is
allocated, so in practice the RTTI pointer always has to be present if the
type implements at least one interface.


Return value separate-allocation optimization?
----------------------------------------------

Consider the following:

    func f
    code
        Obj obj = new
        obj.add_stuff stuff
        set_value obj.get_value
    end

Here, `obj` is short-lived, but the value `obj.get_value` is longer-lived.
Can the return value of `obj.get_value` be allocated in the normal arena,
while `obj` itself and any objects allocated in `obj.add_stuff` go into a
temporary (LIFO?) arena?

* Perhaps obj could be provided with two arenas? But then they would have to
  be stored in the object.
* Perhaps thread-local storage could be used to store the current "normal"
  as well as the "temporary" arena? The temporary arena could then be changed
  during the execution of `f`.

But... what if `obj.get_value` returns `obj` itself??


Tracking object continuation via a bloom filter?
------------------------------------------------

The bloom filter would contain a 1 bit for each address that is a
"continuation", i.e. an address that belongs to the same object as the
previous address.
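
A minimal sketch of such a continuation filter, assuming a fixed-size bit
array, two placeholder hash functions and word-aligned addresses (the sizes,
hashes and names below are illustrative assumptions, not part of the notes):

    #include <stdbool.h>
    #include <stdint.h>

    #define FILTER_BITS (1u << 20)          /* 128 KiB of filter bits */

    static uint8_t filter[FILTER_BITS / 8];

    static uint32_t hash(uintptr_t addr, uint64_t seed)
    {
        /* Objects are word-aligned, so drop the low bits before mixing. */
        uint64_t x = (uint64_t)(addr >> 3) * (0x9E3779B97F4A7C15ull ^ seed);
        return (uint32_t)(x >> 40) % FILTER_BITS;
    }

    /* Called by the allocator for every address of an object except its start. */
    static void mark_continuation(uintptr_t addr)
    {
        for (uint64_t seed = 1; seed <= 2; seed++) {
            uint32_t bit = hash(addr, seed);
            filter[bit / 8] |= (uint8_t)(1u << (bit % 8));
        }
    }

    /* Returns false only if the address is guaranteed NOT to be a continuation,
     * i.e. it is a valid object start; true may be a false positive. */
    static bool maybe_continuation(uintptr_t addr)
    {
        for (uint64_t seed = 1; seed <= 2; seed++) {
            uint32_t bit = hash(addr, seed);
            if ((filter[bit / 8] & (1u << (bit % 8))) == 0)
                return false;
        }
        return true;
    }

mark_continuation would be called once per non-start address at allocation
time, which is exactly the insertion cost flagged as a disadvantage below.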
With such a filter, it becomes possible to find where objects start: if the
bloom filter is 0 for a given location, then it is guaranteed to be a valid
start address of an object (i.e. it can be de-allocated independently).

Disadvantages: there will be LOTS of "continuation" addresses in the bloom
filter. This would lead to lots of false positives and also to inefficient
allocation (due to many insertions).


Track and merge start-addresses?
--------------------------------

Could track start addresses in a fixed-size buffer, and when it's full,
(perhaps do a GC and) perform the "least lossy" merges until half of the
tracking buffer is free.

The tracking information could also contain a bit that indicates whether the
addresses could possibly have been leaked outside the arena / arena chunk.

To keep the data structure compact, it would only have to store lengths, not
actual addresses. Address 0 will be the first object in the arena chunk.


Copy collection with meta-data growing down at the end of the arena chunk?
--------------------------------------------------------------------------

Copy collection needs these three items of information for each object:

* Start address
* Length
* Outgoing pointers

Since a bump allocator doesn't leave any holes, and the first address is
always at the start of the arena chunk, the start addresses can be computed
from the lengths alone. This could be stored as an array of 16-bit
(type, length) values (see the sketch after this section).

Without nested objects:

* Type: OBJECT | POINTER
* Length

With nested objects:

* Type: OBJ_START | DATA_RUN | POINTER_RUN | SEQUENCE_OF
* Length (or count for SEQUENCE_OF)

But this cannot encode union types where the kind isn't known at the time of
allocation.
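
A minimal sketch of the non-nested (type, length) encoding, assuming 2 type
bits plus 14 length bits per entry and lengths counted in words (the field
widths and names are assumptions, not a fixed format):

    #include <stdint.h>
    #include <stdio.h>

    /* Two placeholder run kinds for the non-nested case. */
    #define META_OBJECT  0u   /* run of plain data words */
    #define META_POINTER 1u   /* run of pointer words    */

    typedef uint16_t meta_t;  /* 2 type bits | 14 length bits (in words) */

    static meta_t meta_make(uint16_t type, uint16_t len_words)
    {
        return (meta_t)((type << 14) | (len_words & 0x3FFF));
    }

    static uint16_t meta_type(meta_t m) { return m >> 14; }
    static uint16_t meta_len(meta_t m)  { return m & 0x3FFF; }

    /* Recompute each run's start address by summing the preceding lengths,
     * which works because bump allocation leaves no holes and the first run
     * starts at the beginning of the arena chunk. */
    static void walk(uintptr_t chunk_start, const meta_t *meta, size_t n)
    {
        uintptr_t addr = chunk_start;
        for (size_t i = 0; i < n; i++) {
            printf("%s run at %#lx, %u words\n",
                   meta_type(meta[i]) == META_POINTER ? "pointer" : "data",
                   (unsigned long)addr, (unsigned)meta_len(meta[i]));
            addr += (uintptr_t)meta_len(meta[i]) * sizeof(void *);
        }
    }

Here the metadata is a plain array; in the layout suggested by the heading it
would grow down from the end of the arena chunk, and the copy step would use
the POINTER runs to find and fix up outgoing references.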