GC with limited overhead for metadata
-------------------------------------

For each type with outgoing references (including opaque types that may or
may not have outgoing refs), have a "scan-function" that builds a structure:

    ( size in bytes,
      array of (pointer to data, pointer to scan-function) for outgoing refs )

This "scan-function" needs to be included in interfaces as well (i.e. it
needs to be part of the vtable).

The stack needs either:

1. fat pointers, perhaps in a special area, in order to know the
   "scan-function" for each root reference, or
2. information about the stack frame.

Option 2 is probably better. The overhead is:

- metadata for stack frames (basically, "debug info")
- pointers to the scan function in interfaces
- scan functions (in the code segment)


Generational GC without tracking reference changes
--------------------------------------------------

Could count the instance count / out-refs count / in-refs count of each type
on each full scan, and then activate "tracking" of allocations of specific
types, based on which types can point to which other types.

This will probably only work well if generic parameters can be taken into
account (otherwise generic types can point to anything).

If types {c,d,e} can only be referenced by {a,b}, then {a,b,c,d,e} can be
tracked. Then all {a,b} are scanned, and the unreachable {c,d,e} objects can
be queued for deletion.


Other possible GC optimizations
-------------------------------

* Don't scan immutable objects (or arena chunks?) over and over again.
* Could do GC during blocking I/O calls.
* Can dirtiness/accessedness of pages be queried from the OS?
  - would require stopping all threads, so maybe not any real optimization?
  - maybe the check could be done AFTER having done the scanning?
* Can the pages be protected/unmapped during GC, and then userfaultfd be used
  to handle page faults?


Calling vtable functions vs having RTTI as data
-----------------------------------------------

Calling a vtable function for interfaces could be a bottleneck if there are
lots of objects that are reached from interface-typed references.

Maybe RTTI refs could be 1) compact and 2) de-duplicated? E.g. a 24-bit type
id + an 8-bit repeat count.

The RTTI pointer is "only" needed when the reference is a non-concrete type.
...but the types of the references can't be known when the object is
allocated, so in practice the RTTI pointer always has to be present if the
type implements at least one interface.


Return value separate-allocation optimization?
----------------------------------------------

Consider the following:

    func f
    code
        Obj obj = new
        obj.add_stuff stuff
        set_value obj.get_value
    end

Here, `obj` is short-lived, but the value `obj.get_value` is longer-lived.
Can the return value of `obj.get_value` be allocated in the normal arena,
while `obj` itself and any objects allocated in `obj.add_stuff` go into a
temporary (LIFO?) arena?

* Perhaps obj could be provided with two arenas? But then they would have to
  be stored in the object.
* Perhaps thread-local storage could be used to store the current "normal"
  as well as the "temporary" arena? The temporary arena could then be changed
  during the execution of `f`.

But... what if `obj.get_value` returns `obj` itself??


Tracking object continuation via a bloom filter?
------------------------------------------------

The bloom filter would contain a 1 bit for each address that is a
"continuation", i.e. an address that belongs to the same object as the
previous address.
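
A minimal sketch of such a continuation filter, assuming a fixed-size bit
array, two placeholder hash functions and word-aligned addresses (the sizes,
hashes and names below are illustrative assumptions, not part of the notes):

    #include <stdbool.h>
    #include <stdint.h>

    #define FILTER_BITS (1u << 20)          /* 128 KiB of filter bits */

    static uint8_t filter[FILTER_BITS / 8];

    static uint32_t hash(uintptr_t addr, uint64_t seed)
    {
        /* Objects are word-aligned, so drop the low bits before mixing. */
        uint64_t x = (uint64_t)(addr >> 3) * (0x9E3779B97F4A7C15ull ^ seed);
        return (uint32_t)(x >> 40) % FILTER_BITS;
    }

    /* Called by the allocator for every address of an object except its start. */
    static void mark_continuation(uintptr_t addr)
    {
        for (uint64_t seed = 1; seed <= 2; seed++) {
            uint32_t bit = hash(addr, seed);
            filter[bit / 8] |= (uint8_t)(1u << (bit % 8));
        }
    }

    /* Returns false only if the address is guaranteed NOT to be a continuation,
     * i.e. it is a valid object start; true may be a false positive. */
    static bool maybe_continuation(uintptr_t addr)
    {
        for (uint64_t seed = 1; seed <= 2; seed++) {
            uint32_t bit = hash(addr, seed);
            if ((filter[bit / 8] & (1u << (bit % 8))) == 0)
                return false;
        }
        return true;
    }

mark_continuation would be called once per non-start address at allocation
time, which is exactly the insertion cost flagged as a disadvantage below.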
With such a filter, it becomes possible to find where objects start: if the
bloom filter is 0 for a given location, then it is guaranteed to be a valid
start address of an object (i.e. it can be de-allocated independently).

Disadvantages: there will be LOTS of "continuation" addresses in the bloom
filter. This would lead to lots of false positives and also to inefficient
allocation (due to many insertions).


Track and merge start-addresses?
--------------------------------

Could track start addresses in a fixed-size buffer, and when it's full,
(perhaps do a GC and) perform the "least lossy" merges until half of the
tracking buffer is free.

The tracking information could also contain a bit that indicates whether the
addresses could possibly have been leaked outside the arena / arena chunk.

To keep the data structure compact, it would only have to store lengths, not
actual addresses. Address 0 will be the first object in the arena chunk.


Copy collection with meta-data growing down at the end of the arena chunk?
--------------------------------------------------------------------------

Copy collection needs these three items of information for each object:

* Start address
* Length
* Outgoing pointers

Since a bump allocator doesn't leave any holes, and the first address is
always at the start of the arena chunk, the start addresses can be computed
from the lengths alone. This could be stored as an array of 16-bit
(type, length) values (see the sketch after this section).

Without nested objects:

* Type: OBJECT | POINTER
* Length

With nested objects:

* Type: OBJ_START | DATA_RUN | POINTER_RUN | SEQUENCE_OF
* Length (or count for SEQUENCE_OF)

But this cannot encode union types where the kind isn't known at the time of
allocation.
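
A minimal sketch of the non-nested (type, length) encoding, assuming 2 type
bits plus 14 length bits per entry and lengths counted in words (the field
widths and names are assumptions, not a fixed format):

    #include <stdint.h>
    #include <stdio.h>

    /* Two placeholder run kinds for the non-nested case. */
    #define META_OBJECT  0u   /* run of plain data words */
    #define META_POINTER 1u   /* run of pointer words    */

    typedef uint16_t meta_t;  /* 2 type bits | 14 length bits (in words) */

    static meta_t meta_make(uint16_t type, uint16_t len_words)
    {
        return (meta_t)((type << 14) | (len_words & 0x3FFF));
    }

    static uint16_t meta_type(meta_t m) { return m >> 14; }
    static uint16_t meta_len(meta_t m)  { return m & 0x3FFF; }

    /* Recompute each run's start address by summing the preceding lengths,
     * which works because bump allocation leaves no holes and the first run
     * starts at the beginning of the arena chunk. */
    static void walk(uintptr_t chunk_start, const meta_t *meta, size_t n)
    {
        uintptr_t addr = chunk_start;
        for (size_t i = 0; i < n; i++) {
            printf("%s run at %#lx, %u words\n",
                   meta_type(meta[i]) == META_POINTER ? "pointer" : "data",
                   (unsigned long)addr, (unsigned)meta_len(meta[i]));
            addr += (uintptr_t)meta_len(meta[i]) * sizeof(void *);
        }
    }

Here the metadata is a plain array; in the layout suggested by the heading it
would grow down from the end of the arena chunk, and the copy step would use
the POINTER runs to find and fix up outgoing references.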