aboutsummaryrefslogtreecommitdiff
path: root/notes/arena_io_non_io.txt
diff options
context:
space:
mode:
Diffstat (limited to 'notes/arena_io_non_io.txt')
-rw-r--r--notes/arena_io_non_io.txt142
1 files changed, 142 insertions, 0 deletions
diff --git a/notes/arena_io_non_io.txt b/notes/arena_io_non_io.txt
new file mode 100644
index 0000000..6064b98
--- /dev/null
+++ b/notes/arena_io_non_io.txt
@@ -0,0 +1,142 @@
+arena pointers / io vs non-io
+=============================
+
+Could require all "io-capable" objects to have a arena information at
+the beginning of the page (i.e. the address with the low order bits
+set to zero).
+
+This way, constant data can still work as usual without special alignment
+or ordering requirements (e.g. having .rodata before .text in the same page,
+which might not even be possible).
+
+It also allows large data to span across a page boundary, as long as any
+"io-capable" objects are not allocated in a such page (since they need
+the arena-chunk header).
+
+Contents of the arena-chunk
+---------------------------
+
+* Pointer to arena base.
+* Size used in chunk (= offset to next free byte)
+
+If the arena-base is always allocated at a fixed offset (e.g. as the very
+first allocation), then the pointer and the size can be merged into a single
+pointer.
+
+
+Contents of the arena base
+--------------------------
+
+This can be allocated inside the first arena-chunk.
+
+* Capabilities
+* IO-function pointer overrides
+* Chunked-list of pointers to the arena-chunks.
+ (constant-sized block, but can continue in increasing block sizes
+ as normal allocations)
+
+How / where to store information for garbage collection?
+--------------------------------------------------------
+
+Some form of GC would be useful. I don't think it has to be very
+fine-grained, though. It also doesn't have to free everything, but it
+would be good to have some kind of guarantee that it frees e.g. at least
+say 80% of the completely unused "allocation units".
+
+However, "allocation unit" should probably be smaller than the arena-chunk,
+which can be quite large, in particular if the page size is large (like
+64 KiB or even 2 MiB).
+
+It has to be somewhat resistent against attacks. Putting random data that
+happens to resemble pointers MUST NOT affect memory usage, for two reasons:
+1) it could work as a Denial-of-Service attack, and 2) it could cause the
+heap memory base address to be leaked.
+
+The really tricky parts:
+
+* Can it be done without lots of overhead for meta-data / RTTI?
+ - Maybe just a bitmap with 1 bit per pointer-sized word.
+ That's 128 bytes per 4K page on x86, and 64 bytes on x86_64.
+ - It could also be an area that grows down from the top
+ (e.g. run-length encoded). But the overhead might make it
+ less efficient than a bitmap.
+ - Just tracking the pointers is not enough to free parts of a
+ chunk, nor to do copy collection.
+ - That would require knowing the start (or size) of each
+ object.
+ - Could have 2 bits (is-start, contains-pointers)
+ - Or could store only sizes.
+ - Maybe it doesn't need object granularity?
+ - Could have an "end-bit" per e.g. pointer-sized word.
+ It would indicate if at least one object ends inside the word.
+ Granularity -> Size usage -> Percentage of 4K page on 32-bit:
+ 8 bytes 128+128 = 256 6.25%
+ 64 bytes 128+8 = 136 3.32%
+ - With a pointer-bitmap + a 1 byte size (16 byte granularity),
+ assuming 20 byte allocations on average:
+ 16 bytes 128+204 = 332 8.11%
+ assuming 60 byte allocations on average:
+ 16 bytes 128+68 = 196 4.79%
+* With non-io chunks, the meta-data may cover a larger (combined) chunk, and
+ hence the meta-data part will be larger, too.
+* Is limited granularity a problem when detecting live objects.
+ - Yes, it will be less accurate! Objects within the same "object-block"
+ that have a higher address will be incorrectly assumed to be in use.
+* Can GC'ed objects safely be read by C code?
+* Can GC'ed objects safely be modified by C code?
+* Is any temporary space required to perform GC?
+ - Needs to mark used object-blocks.
+ - Needs to queue objects to scan?
+
+Weird ideas:
+
+* Keep an allocation journal (with object sizes and locations of pointers)
+ - Has to be very fast to append to.
+ - Has to be very compact.
+ - Even with only 1 byte used:
+ with avg alloc size of 20 bytes => 5.00% overhead
+ with avg alloc size of 60 bytes => 1.67% overhead
+* Extremely coarse-grained GC:
+ - Just have a bitmap of pointers, don't bother about sizes.
+ Size: pagesize/ptralign/8
+ Overhead: 1.56% (64b) - 3.13% (32b)
+ (pagesize/ptralign/8)/pagesize = 1/(ptralign*8)
+ Possibly no overhead with CHERI
+ (because it already tracks this info)
+ - Track pointers that might point to another arena-chunk
+ (or even another arena).
+ - Garbage collect only entire arena-chunks.
+* Track only allocations in the "topmost" arena-chunk
+ (the one where new allocations go).
+ - Use a bitmaps for:
+ - words that contain a pointer
+ - allocation start status per word:
+ - 00 = inside allocation
+ - 01 = inside allocation, and is the last word of it
+ - 10 = start, hasn't escaped
+ - 11 = start, could have escaped
+ - 3 bits in total per word.
+ - On simple allocations w/o pointers, only a single bit has to be set
+ - Allocations with pointers will be word-aligned, so setting the
+ bitmap bits should be easy for them as well (on 64b platform it's
+ just a byte-aligned OR to the bitmap)
+ - Escaped blocks cannot be moved
+ - Therefore, trigger GC before escaping a block if let's say
+ >= 50% of the arena-chunk has been used and no GC has been
+ performed.
+ - Track free "holes", so escaped blocks can be moved there quickly
+ (by just copying them and updating the refs to them.
+ the refs update is just a linear scan).
+ - Set a "might need GC" flag when a pointer (that at least before pointed
+ to the arena-chunk) is somehow changed.
+ - Just adding new pointers cannot trigger GC
+ - Note that the pointers can come from stack or other chunks.
+ - Scan all words marked as containing a pointer
+ - Can use the next unallocated page as a temporary storage
+ - Will be all 0 at start
+ - Needs to be all 0 at end
+ - Is it possible to use dynamic programming can be used to scan
+ linearly a few times?
+ - If not, simply queue the roots and start scanning from those.
+ - Unmarked words are then free.
+ - Re-pack the movable words to use space more efficiently.