1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
|
arena pointers / io vs non-io
=============================
Could require all "io-capable" objects to have a arena information at
the beginning of the page (i.e. the address with the low order bits
set to zero).
This way, constant data can still work as usual without special alignment
or ordering requirements (e.g. having .rodata before .text in the same page,
which might not even be possible).
It also allows large data to span across a page boundary, as long as any
"io-capable" objects are not allocated in a such page (since they need
the arena-chunk header).
Contents of the arena-chunk
---------------------------
* Pointer to arena base.
* Size used in chunk (= offset to next free byte)
If the arena-base is always allocated at a fixed offset (e.g. as the very
first allocation), then the pointer and the size can be merged into a single
pointer.
Contents of the arena base
--------------------------
This can be allocated inside the first arena-chunk.
* Capabilities
* IO-function pointer overrides
* Chunked-list of pointers to the arena-chunks.
(constant-sized block, but can continue in increasing block sizes
as normal allocations)
How / where to store information for garbage collection?
--------------------------------------------------------
Some form of GC would be useful. I don't think it has to be very
fine-grained, though. It also doesn't have to free everything, but it
would be good to have some kind of guarantee that it frees e.g. at least
say 80% of the completely unused "allocation units".
However, "allocation unit" should probably be smaller than the arena-chunk,
which can be quite large, in particular if the page size is large (like
64 KiB or even 2 MiB).
It has to be somewhat resistent against attacks. Putting random data that
happens to resemble pointers MUST NOT affect memory usage, for two reasons:
1) it could work as a Denial-of-Service attack, and 2) it could cause the
heap memory base address to be leaked.
The really tricky parts:
* Can it be done without lots of overhead for meta-data / RTTI?
- Maybe just a bitmap with 1 bit per pointer-sized word.
That's 128 bytes per 4K page on x86, and 64 bytes on x86_64.
- It could also be an area that grows down from the top
(e.g. run-length encoded). But the overhead might make it
less efficient than a bitmap.
- Just tracking the pointers is not enough to free parts of a
chunk, nor to do copy collection.
- That would require knowing the start (or size) of each
object.
- Could have 2 bits (is-start, contains-pointers)
- Or could store only sizes.
- Maybe it doesn't need object granularity?
- Could have an "end-bit" per e.g. pointer-sized word.
It would indicate if at least one object ends inside the word.
Granularity -> Size usage -> Percentage of 4K page on 32-bit:
8 bytes 128+128 = 256 6.25%
64 bytes 128+8 = 136 3.32%
- With a pointer-bitmap + a 1 byte size (16 byte granularity),
assuming 20 byte allocations on average:
16 bytes 128+204 = 332 8.11%
assuming 60 byte allocations on average:
16 bytes 128+68 = 196 4.79%
* With non-io chunks, the meta-data may cover a larger (combined) chunk, and
hence the meta-data part will be larger, too.
* Is limited granularity a problem when detecting live objects.
- Yes, it will be less accurate! Objects within the same "object-block"
that have a higher address will be incorrectly assumed to be in use.
* Can GC'ed objects safely be read by C code?
* Can GC'ed objects safely be modified by C code?
* Is any temporary space required to perform GC?
- Needs to mark used object-blocks.
- Needs to queue objects to scan?
Weird ideas:
* Keep an allocation journal (with object sizes and locations of pointers)
- Has to be very fast to append to.
- Has to be very compact.
- Even with only 1 byte used:
with avg alloc size of 20 bytes => 5.00% overhead
with avg alloc size of 60 bytes => 1.67% overhead
* Extremely coarse-grained GC:
- Just have a bitmap of pointers, don't bother about sizes.
Size: pagesize/ptralign/8
Overhead: 1.56% (64b) - 3.13% (32b)
(pagesize/ptralign/8)/pagesize = 1/(ptralign*8)
Possibly no overhead with CHERI
(because it already tracks this info)
- Track pointers that might point to another arena-chunk
(or even another arena).
- Garbage collect only entire arena-chunks.
* Track only allocations in the "topmost" arena-chunk
(the one where new allocations go).
- Use a bitmaps for:
- words that contain a pointer
- allocation start status per word:
- 00 = inside allocation
- 01 = inside allocation, and is the last word of it
- 10 = start, hasn't escaped
- 11 = start, could have escaped
- 3 bits in total per word.
- On simple allocations w/o pointers, only a single bit has to be set
- Allocations with pointers will be word-aligned, so setting the
bitmap bits should be easy for them as well (on 64b platform it's
just a byte-aligned OR to the bitmap)
- Escaped blocks cannot be moved
- Therefore, trigger GC before escaping a block if let's say
>= 50% of the arena-chunk has been used and no GC has been
performed.
- Track free "holes", so escaped blocks can be moved there quickly
(by just copying them and updating the refs to them.
the refs update is just a linear scan).
- Set a "might need GC" flag when a pointer (that at least before pointed
to the arena-chunk) is somehow changed.
- Just adding new pointers cannot trigger GC
- Note that the pointers can come from stack or other chunks.
- Scan all words marked as containing a pointer
- Can use the next unallocated page as a temporary storage
- Will be all 0 at start
- Needs to be all 0 at end
- Is it possible to use dynamic programming can be used to scan
linearly a few times?
- If not, simply queue the roots and start scanning from those.
- Unmarked words are then free.
- Re-pack the movable words to use space more efficiently.
|