diff options
Diffstat (limited to 'notes/arena_io_non_io.txt')
-rw-r--r-- | notes/arena_io_non_io.txt | 142 |
1 files changed, 142 insertions, 0 deletions
diff --git a/notes/arena_io_non_io.txt b/notes/arena_io_non_io.txt new file mode 100644 index 0000000..6064b98 --- /dev/null +++ b/notes/arena_io_non_io.txt @@ -0,0 +1,142 @@ +arena pointers / io vs non-io +============================= + +Could require all "io-capable" objects to have a arena information at +the beginning of the page (i.e. the address with the low order bits +set to zero). + +This way, constant data can still work as usual without special alignment +or ordering requirements (e.g. having .rodata before .text in the same page, +which might not even be possible). + +It also allows large data to span across a page boundary, as long as any +"io-capable" objects are not allocated in a such page (since they need +the arena-chunk header). + +Contents of the arena-chunk +--------------------------- + +* Pointer to arena base. +* Size used in chunk (= offset to next free byte) + +If the arena-base is always allocated at a fixed offset (e.g. as the very +first allocation), then the pointer and the size can be merged into a single +pointer. + + +Contents of the arena base +-------------------------- + +This can be allocated inside the first arena-chunk. + +* Capabilities +* IO-function pointer overrides +* Chunked-list of pointers to the arena-chunks. + (constant-sized block, but can continue in increasing block sizes + as normal allocations) + +How / where to store information for garbage collection? +-------------------------------------------------------- + +Some form of GC would be useful. I don't think it has to be very +fine-grained, though. It also doesn't have to free everything, but it +would be good to have some kind of guarantee that it frees e.g. at least +say 80% of the completely unused "allocation units". + +However, "allocation unit" should probably be smaller than the arena-chunk, +which can be quite large, in particular if the page size is large (like +64 KiB or even 2 MiB). + +It has to be somewhat resistent against attacks. Putting random data that +happens to resemble pointers MUST NOT affect memory usage, for two reasons: +1) it could work as a Denial-of-Service attack, and 2) it could cause the +heap memory base address to be leaked. + +The really tricky parts: + +* Can it be done without lots of overhead for meta-data / RTTI? + - Maybe just a bitmap with 1 bit per pointer-sized word. + That's 128 bytes per 4K page on x86, and 64 bytes on x86_64. + - It could also be an area that grows down from the top + (e.g. run-length encoded). But the overhead might make it + less efficient than a bitmap. + - Just tracking the pointers is not enough to free parts of a + chunk, nor to do copy collection. + - That would require knowing the start (or size) of each + object. + - Could have 2 bits (is-start, contains-pointers) + - Or could store only sizes. + - Maybe it doesn't need object granularity? + - Could have an "end-bit" per e.g. pointer-sized word. + It would indicate if at least one object ends inside the word. + Granularity -> Size usage -> Percentage of 4K page on 32-bit: + 8 bytes 128+128 = 256 6.25% + 64 bytes 128+8 = 136 3.32% + - With a pointer-bitmap + a 1 byte size (16 byte granularity), + assuming 20 byte allocations on average: + 16 bytes 128+204 = 332 8.11% + assuming 60 byte allocations on average: + 16 bytes 128+68 = 196 4.79% +* With non-io chunks, the meta-data may cover a larger (combined) chunk, and + hence the meta-data part will be larger, too. +* Is limited granularity a problem when detecting live objects. + - Yes, it will be less accurate! Objects within the same "object-block" + that have a higher address will be incorrectly assumed to be in use. +* Can GC'ed objects safely be read by C code? +* Can GC'ed objects safely be modified by C code? +* Is any temporary space required to perform GC? + - Needs to mark used object-blocks. + - Needs to queue objects to scan? + +Weird ideas: + +* Keep an allocation journal (with object sizes and locations of pointers) + - Has to be very fast to append to. + - Has to be very compact. + - Even with only 1 byte used: + with avg alloc size of 20 bytes => 5.00% overhead + with avg alloc size of 60 bytes => 1.67% overhead +* Extremely coarse-grained GC: + - Just have a bitmap of pointers, don't bother about sizes. + Size: pagesize/ptralign/8 + Overhead: 1.56% (64b) - 3.13% (32b) + (pagesize/ptralign/8)/pagesize = 1/(ptralign*8) + Possibly no overhead with CHERI + (because it already tracks this info) + - Track pointers that might point to another arena-chunk + (or even another arena). + - Garbage collect only entire arena-chunks. +* Track only allocations in the "topmost" arena-chunk + (the one where new allocations go). + - Use a bitmaps for: + - words that contain a pointer + - allocation start status per word: + - 00 = inside allocation + - 01 = inside allocation, and is the last word of it + - 10 = start, hasn't escaped + - 11 = start, could have escaped + - 3 bits in total per word. + - On simple allocations w/o pointers, only a single bit has to be set + - Allocations with pointers will be word-aligned, so setting the + bitmap bits should be easy for them as well (on 64b platform it's + just a byte-aligned OR to the bitmap) + - Escaped blocks cannot be moved + - Therefore, trigger GC before escaping a block if let's say + >= 50% of the arena-chunk has been used and no GC has been + performed. + - Track free "holes", so escaped blocks can be moved there quickly + (by just copying them and updating the refs to them. + the refs update is just a linear scan). + - Set a "might need GC" flag when a pointer (that at least before pointed + to the arena-chunk) is somehow changed. + - Just adding new pointers cannot trigger GC + - Note that the pointers can come from stack or other chunks. + - Scan all words marked as containing a pointer + - Can use the next unallocated page as a temporary storage + - Will be all 0 at start + - Needs to be all 0 at end + - Is it possible to use dynamic programming can be used to scan + linearly a few times? + - If not, simply queue the roots and start scanning from those. + - Unmarked words are then free. + - Re-pack the movable words to use space more efficiently. |