Various ideas for hardening against exploits
============================================

While SLUL is meant to be a strictly memory-safe language with no loopholes
in the language itself, there can still be bugs in the runtime, in the
compiler (code generation bugs), in the dynamic loader, in the kernel or in
the hardware, and there can be unavoidable random memory corruption due to
interference.

Here are some ideas for how to harden SLUL and make exploits harder:

* Hardened function references:
    - Index into a table instead of a pointer?
        - With a randomized check value (stored in high bits, and also in the table)
    - Or a fat pointer?
        - Check value + Address
        - Check value is stored (XOR'ed with some value) before the address?
    - For non-CFI capable CPUs:
        - Check that the function begins with the correct bytes.
        - This can't be done if the code is mapped as non-readable (execute-only).
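
A minimal sketch of the table-index variant in C, assuming 64-bit handles
whose low 32 bits index a function table and whose high 32 bits carry a
check value that is also stored in the table entry. The names (fn_entry,
fn_table, fn_handle_make, fn_handle_resolve) are hypothetical, and the check
values are fixed constants here rather than randomized at startup.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*int_fn)(int);

/* Hypothetical table entry: the target plus the check value that a
   valid handle must carry in its high bits. */
struct fn_entry {
    int_fn   target;
    uint32_t check;
};

static int add_one(int x) { return x + 1; }
static int twice(int x)   { return x * 2; }

/* In a real runtime the check values would be randomized at startup
   and the table mapped read-only afterwards. */
static const struct fn_entry fn_table[] = {
    { add_one, 0x5a3c91e7u },
    { twice,   0x1f0b6d42u },
};
#define FN_TABLE_LEN (sizeof fn_table / sizeof fn_table[0])

/* Handle layout: high 32 bits = check value, low 32 bits = index.
   The index must be valid when the handle is created. */
static uint64_t fn_handle_make(uint32_t index)
{
    return ((uint64_t)fn_table[index].check << 32) | index;
}

/* Returns the target, or NULL if the index is out of range or the
   check value does not match (a forged or corrupted handle). */
static int_fn fn_handle_resolve(uint64_t handle)
{
    uint32_t index = (uint32_t)handle;
    uint32_t check = (uint32_t)(handle >> 32);
    if (index >= FN_TABLE_LEN || fn_table[index].check != check)
        return NULL;
    return fn_table[index].target;
}
```

Returning NULL keeps the sketch testable; a real runtime would most likely
trap instead. A forged handle has to guess the 32-bit check value that
belongs to the index it targets.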

* Allow-list for function pointers
    - Build a hash-set of valid function pointer targets (and their function
      type hashes).
        - Could be built at startup and then mapped read-only.
    - Instead of calling function-pointers directly, call a wrapper function
      that checks that the target address is valid given the function
      type hash.
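
A rough sketch of the allow-list in C, assuming a fixed-size
open-addressing hash set keyed on the target address, with the function type
hash stored alongside. All names (allow_add, allow_contains,
checked_call_int, TYPE_HASH_INT_TO_INT) and the hash value are hypothetical.
In practice the set would be built at startup and then mapped read-only
(e.g. with mprotect), which is not shown here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef int (*int_fn)(int);

/* Hypothetical hash of the function type "int -> int". */
#define TYPE_HASH_INT_TO_INT 0x3d1f8a2b6c4e9017u

/* One slot: a valid indirect-call target and its function type hash.
   target == NULL marks an empty slot. */
struct allow_entry {
    int_fn   target;
    uint64_t type_hash;
};

#define ALLOW_SLOTS 64 /* power of two, larger than the number of targets */
static struct allow_entry allow_set[ALLOW_SLOTS];

static size_t allow_slot(int_fn target)
{
    uintptr_t x = (uintptr_t)target;
    return (size_t)((x >> 4) ^ (x >> 12)) & (ALLOW_SLOTS - 1);
}

/* Linear probing; assumes the set never becomes full. */
static void allow_add(int_fn target, uint64_t type_hash)
{
    size_t i = allow_slot(target);
    while (allow_set[i].target != NULL)
        i = (i + 1) & (ALLOW_SLOTS - 1);
    allow_set[i].target = target;
    allow_set[i].type_hash = type_hash;
}

static bool allow_contains(int_fn target, uint64_t type_hash)
{
    size_t i = allow_slot(target);
    while (allow_set[i].target != NULL) {
        if (allow_set[i].target == target)
            return allow_set[i].type_hash == type_hash;
        i = (i + 1) & (ALLOW_SLOTS - 1);
    }
    return false;
}

/* Wrapper used instead of a direct indirect call: aborts if the target
   is not in the allow-list with the expected type hash. */
static int checked_call_int(int_fn target, int arg)
{
    if (!allow_contains(target, TYPE_HASH_INT_TO_INT))
        abort();
    return target(arg);
}

static int square(int x) { return x * x; }
static int negate(int x) { return -x; }
```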

* Return check value (first attempt):
    - Push a "check value" before calling functions
      (but leave it untouched on tail calls)
    - Before returning, pop and check the value (compare to the return address)
    - On ISAs that use a link register, a (callee-saved?) register could be
      used instead (would this improve performance?).
    - The check value could be:
        - The full return address, on ISAs that allow obtaining the PC.
        - The low order ("within-page") bits of the return address?
    - Perhaps XOR'ed with some hash?
        - Note that this must be a static value in order to support tail calls,
          so it might not be very useful.

* Return check value (second attempt):
    - Only in functions taking an arena/pointer parameter, and
      only when making a non-tail call:
        - Push a "check value" before calling functions
          (but leave it untouched on tail calls)
        - Before returning, pop and check the value (compare to the return address)
        - On ISAs that use a link register, a (callee-saved?) register could
          be used instead (would this improve performance?).
    - The check value could be:
        - The arena pointer (or the first pointer) XOR'ed with some constant
            - For arenas, it would also be possible to XOR this with some
              data inside the arena.

* Pre-return NOP check:
    - Require that all calls are followed by a NOP
    - Before returning, check that the instruction at the return addr is a NOP.
    - Requires a readable code segment
    - This prevents returning to (most) code locations that do not directly
      follow a call.

* Pre-return type check:
    - Require that all calls are followed by some form of NOP/useless instruction
    - This instruction encodes hash of the return type.
    - Before returning, the callee reads the instruction at the return addr
      and checks that it matches the expected return type.
    - The check could be performed on function entry as well.
    - Requires a readable code segment

* Pre-return frame size check:
    - Check that the difference between the frame pointer and the
      stack pointer is the expected value before returning.
      (All of these pre-return checks should be preceded by some NOP
      instructions.)

* Return address bounds check
    - For private functions, we can bounds check the return address.
      (Or, if the function is only called from a small number of locations,
       then we can compare it against those).

* Separate return address stack and data stack
    - Use stack pointer for return addresses only.
      I.e. don't put stack data there.
    - Use frame pointer for stack data.
    - Advantages:
        - Return addresses cannot be overwritten by stack smashing.
        - Does not require any hardware support at all.
    - Disadvantages:
        - Unwinding code needs support for it (does it affect FFI?)
        - Tool-chain needs support for it
        - The frame pointer cannot be used as an arbitrary register.
          (But it seems that the trend is to move away from that anyway).
        - Uses an additional virtual address area to separate
          the "return address stack" and the "data stack".
    - Fast unwinding could possibly be supported by saving the frame pointers
      to the "return address stack".

* Mid-instruction jump protection:
    - Avoid generating RET instruction bytes inside other instructions
      (might be hard, because the RET opcode is a single C2/C3 byte on x86)
    - Other instructions too? (Maybe the NOP used by the pre-return NOP check)
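
The compiler side of this could look roughly like the following C sketch.
It only covers 32-bit immediates (displacements, relative branch offsets and
ModRM/SIB bytes can produce the same bytes), and both the mov/xor splitting
trick and the function names are illustrative assumptions, not an existing
compiler feature.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if any byte of a little-endian 32-bit immediate is an x86 RET
   opcode byte (C3 = RET, C2 = RET imm16). */
static bool imm_has_ret_byte(uint32_t imm)
{
    for (int i = 0; i < 4; i++) {
        uint8_t b = (uint8_t)(imm >> (8 * i));
        if (b == 0xC2 || b == 0xC3)
            return true;
    }
    return false;
}

/* If an immediate would contain a RET byte, the compiler could emit
   "mov reg, imm ^ key; xor reg, key" instead, with a key chosen so that
   neither encoded immediate contains one. Returns 0 if no split is needed.
   The loop always terminates: at most 10 of the 255 repeated-byte keys
   are excluded, so a valid key is found before the key overflows. */
static uint32_t pick_split_key(uint32_t imm)
{
    if (!imm_has_ret_byte(imm))
        return 0;
    for (uint32_t key = 0x01010101u; ; key += 0x01010101u) {
        if (!imm_has_ret_byte(key) && !imm_has_ret_byte(imm ^ key))
            return key;
    }
}
```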

* Stack data clearing:
    - Needed to prevent PII (personally identifiable information) from
      lingering in memory
    - Set stack bytes to 0xCC when popping or removing from stack
      (including when returning from a function)
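
In SLUL the compiler would presumably emit the clearing itself, which would
also cover spill slots and temporaries. A source-level C approximation for a
single named buffer, assuming the 0xCC fill value from above, might look like
this; the volatile stores keep the compiler from removing the writes as dead
stores. poison_bytes and checksum_secret are hypothetical names. Where
available, C23 memset_explicit serves the same purpose.

```c
#include <stddef.h>
#include <string.h>

/* Fill memory with 0xCC through a volatile pointer, so the stores are
   not dropped just because the buffer is about to go out of scope. */
static void poison_bytes(void *p, size_t n)
{
    volatile unsigned char *v = p;
    for (size_t i = 0; i < n; i++)
        v[i] = 0xCC;
}

/* Hypothetical example: a local copy of sensitive input is poisoned
   before returning, so it does not linger in the dead stack frame. */
static unsigned checksum_secret(const char *secret)
{
    char copy[64];
    size_t n = strlen(secret);
    if (n > sizeof copy)
        n = sizeof copy;
    memcpy(copy, secret, n);

    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = sum * 31u + (unsigned char)copy[i];

    poison_bytes(copy, sizeof copy); /* clear before the frame is popped */
    return sum;
}
```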

* Heap data clearing:
    - Needed to prevent PII from lingering in memory
    - Set heap bytes to 0xCC when an arena is freed
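
A minimal sketch of a bump arena whose free operation poisons all handed-out
bytes with 0xCC. The arena here works on caller-provided memory (standing in
for pages obtained from mmap) so that the effect can be observed; struct
arena and its functions are hypothetical names, and the base address is
assumed to be at least 8-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal bump arena over caller-provided memory. */
struct arena {
    unsigned char *base;
    size_t size;
    size_t used;
};

static void arena_init(struct arena *a, void *mem, size_t size)
{
    a->base = mem;
    a->size = size;
    a->used = 0;
}

/* Returns NULL when the arena is exhausted. Offsets are rounded up to
   8 bytes, assumed to be enough alignment for this sketch. */
static void *arena_alloc(struct arena *a, size_t n)
{
    size_t start = (a->used + 7u) & ~(size_t)7u;
    if (start > a->size || n > a->size - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Freeing the arena overwrites everything that was handed out with 0xCC
   (volatile stores, so they are not optimized away) before the memory
   can be reused or returned to the OS. */
static void arena_free(struct arena *a)
{
    volatile unsigned char *v = a->base;
    for (size_t i = 0; i < a->used; i++)
        v[i] = 0xCC;
    a->used = 0;
}
```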

* Opaque-pointer obfuscation:
    - Scramble opaque pointers before calling into another module
      (or returning)
    - Unfortunately it also makes the information unavailable
      to debuggers (unless one steps into a function that can see
      the struct contents, and only after stepping past the de-scrambling
      instructions).
    - Pointers in structs could be scrambled using a combination of the
      type hash (of the struct) and the field hash.
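
A sketch of the scrambling itself, assuming a per-process secret (fixed here
for reproducibility) mixed with the struct's type hash and the field hash,
and applied with XOR. The mixing function and all names are assumptions.
Plain XOR is weak: anyone who learns one plaintext/scrambled pair for a
field also learns the key for that field.

```c
#include <stdint.h>

/* Hypothetical per-process secret, e.g. filled from a random source at
   startup. Fixed here so the sketch is deterministic. */
static uintptr_t scramble_secret = (uintptr_t)0x5bd1e9955bd1e995u;

/* Derive the key for one pointer field from the struct's type hash and
   the field's hash, then mix in the per-process secret. */
static uintptr_t scramble_key(uint64_t type_hash, uint64_t field_hash)
{
    uint64_t k = (type_hash * 0x9E3779B97F4A7C15u) ^ field_hash;
    return (uintptr_t)k ^ scramble_secret;
}

/* Applied before the pointer is passed to another module. */
static uintptr_t ptr_scramble(const void *p, uint64_t type_hash,
                              uint64_t field_hash)
{
    return (uintptr_t)p ^ scramble_key(type_hash, field_hash);
}

/* Applied by the owning module before it dereferences the pointer. */
static void *ptr_unscramble(uintptr_t s, uint64_t type_hash,
                            uint64_t field_hash)
{
    return (void *)(s ^ scramble_key(type_hash, field_hash));
}
```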

* Opaque-struct checksumming
    - Compute a checksum of opaque/internal structs
    - Mainly useful for small immutable structs.
    - Allows debuggers to see, but not modify, the contents.
    - Can be XOR'ed with the type hash (see "Type hash before structs").
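
A sketch for a small immutable struct, assuming 64-bit FNV-1a as the
checksum and a hypothetical type hash constant. The fields are fed to the
hash explicitly so that padding bytes never affect the result; struct point
and its functions are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

#define POINT_TYPE_HASH 0xa17c3e5d29f04b68u /* hypothetical type hash */

/* Hypothetical small immutable internal struct with a checksum field. */
struct point {
    int32_t  x;
    int32_t  y;
    uint64_t checksum; /* FNV-1a of the fields, XOR'ed with the type hash */
};

/* 64-bit FNV-1a over the 8 bytes of the two fields. */
static uint64_t point_checksum(const struct point *p)
{
    uint32_t words[2] = { (uint32_t)p->x, (uint32_t)p->y };
    uint64_t h = 0xcbf29ce484222325u;     /* FNV-1a offset basis */
    for (int w = 0; w < 2; w++) {
        for (int b = 0; b < 4; b++) {
            h ^= (words[w] >> (8 * b)) & 0xFFu;
            h *= 0x100000001b3u;          /* FNV-1a prime */
        }
    }
    return h ^ POINT_TYPE_HASH;
}

static struct point point_make(int32_t x, int32_t y)
{
    struct point p = { x, y, 0 };
    p.checksum = point_checksum(&p);
    return p;
}

/* A debugger can read x and y, but changing them without recomputing
   the checksum is detected here. */
static bool point_valid(const struct point *p)
{
    return p->checksum == point_checksum(p);
}
```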

* Stack Frame Size Randomization
    - Add a few words when calling functions (before pushing function args).
    - The size should be small (or zero) most of the time, and sometimes
      a somewhat larger value. But it's not possible to randomize by much
      without risking a stack overflow.
    - Most likely incompatible with .eh_frame etc.

* Out-of-band arena meta-data:
    - The arena meta-data could be placed out-of-band:
      Either at a fixed virtual address offset (locked by ABI),
      or accessible via the thread-local storage.
    - Prevents corruption of arena state
    - Allows allocations to span an arena-chunk boundary.
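
A rough illustration of the idea in C, where a separate array stands in for
meta-data placed at a fixed virtual address offset (or reached via
thread-local storage). The chunk size, region size and all names are
assumptions. Because no header sits at the start of each chunk, an
allocation can cover several consecutive chunks.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE  4096u
#define CHUNK_COUNT 16u

/* Arena data lives in one region; the per-chunk meta-data lives in a
   separate array, so an overflow inside the data region cannot reach it. */
static unsigned char data_region[CHUNK_SIZE * CHUNK_COUNT];

struct chunk_meta {
    uint32_t arena_id; /* 0 = not owned by any arena */
};
static struct chunk_meta meta_table[CHUNK_COUNT];

/* Map any address inside the data region to its out-of-band meta-data,
   or NULL for addresses outside the region. */
static struct chunk_meta *meta_for(const void *p)
{
    uintptr_t off = (uintptr_t)p - (uintptr_t)data_region;
    if (off >= sizeof data_region)
        return NULL;
    return &meta_table[off / CHUNK_SIZE];
}

/* Assign chunks [first, first + n) to an arena and return their start.
   The caller must keep the range within CHUNK_COUNT. */
static void *chunks_claim(uint32_t arena_id, size_t first, size_t n)
{
    for (size_t i = first; i < first + n; i++)
        meta_table[i].arena_id = arena_id;
    return &data_region[first * CHUNK_SIZE];
}
```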

* Out-of-band "type hash bit"
    - Incompatible with C ABI (unless applied only to SLUL-allocated types)
    - Store a type hash bit for every machine word.
    - Check the element bits before accessing an array element.
    - Check all the bits before accessing a struct or other types.
    - Increases memory usage by pagesize/8/wordsize_bytes bytes per page,
      i.e. 1/(8*wordsize_bytes): 3.1% on 32-bit platforms and 1.6% on
      64-bit platforms.
    - Accuracy can be increased with more bits (but uses more memory).
        - Perhaps it could be a bloom filter?
        - And it would not need as much info in it if pointers are
          only allowed to point to (the beginning of) structs.
    - Kind of a weak software-based CHERI.
    - It could also include qualifiers such as arena, var, threaded, etc.
      (these should be set when the object is fully initialized!)
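
A sketch of the one-bit-per-word shadow map, assuming a small static heap, a
separate bitmap, and that the expected bit for word i of a type is simply bit
(i mod 64) of its type hash. That derivation and all names are assumptions;
a real scheme might use more bits per word or a bloom filter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_WORDS 1024u

/* Heap of machine words plus an out-of-band bitmap with one
   "type hash bit" per word. */
static uintptr_t heap[HEAP_WORDS];
static uint8_t   shadow[HEAP_WORDS / 8];

static size_t word_index(const void *p)
{
    return (size_t)((const uintptr_t *)p - heap);
}

/* Expected bit for word i of an object of the given type. */
static unsigned expected_bit(uint64_t type_hash, size_t i)
{
    return (unsigned)(type_hash >> (i % 64)) & 1u;
}

/* Called once the object is fully initialized. */
static void shadow_mark(const void *obj, size_t nwords, uint64_t type_hash)
{
    size_t base = word_index(obj);
    for (size_t i = 0; i < nwords; i++) {
        size_t w = base + i;
        if (expected_bit(type_hash, i))
            shadow[w / 8] |= (uint8_t)(1u << (w % 8));
        else
            shadow[w / 8] &= (uint8_t)~(1u << (w % 8));
    }
}

/* Checked before accessing obj as the given type. For an unrelated type
   hash, a mismatch goes unnoticed with probability about 2^-nwords. */
static bool shadow_check(const void *obj, size_t nwords, uint64_t type_hash)
{
    size_t base = word_index(obj);
    for (size_t i = 0; i < nwords; i++) {
        size_t w = base + i;
        if (((shadow[w / 8] >> (w % 8)) & 1u) != expected_bit(type_hash, i))
            return false;
    }
    return true;
}

/* Memory overhead: one bit per word, i.e. 1 / (8 * sizeof(uintptr_t)). */
static double shadow_overhead(void)
{
    return 1.0 / (8.0 * (double)sizeof(uintptr_t));
}
```

With 4-byte words the overhead is 1/32 (about 3.1%), with 8-byte words 1/64
(about 1.6%).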

* Type hash before structs:
    - Incompatible with C ABI (unless applied only to opaque/internal types)
    - Could prevent type confusion
        - BUT it could also make it easier for an exploit to
          scan for a given type :(
          (can be combined with "Opaque-struct checksumming" to avoid this)
    - Could be applied to opaque or internal structs
      (can't expose such structs (if nested) with a C ABI)
    - Uses one machine word per allocation
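
A sketch using malloc, assuming a 64-bit type hash header placed directly
before each object. The type hash constants and the typed_* names are
hypothetical. The header only guarantees 8-byte alignment for the object
that follows, so types that need stricter alignment would need a larger
header.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical type hashes for two internal structs. */
#define COUNTER_TYPE_HASH 0x7c2e91d45ab3f016u
#define BUFFER_TYPE_HASH  0x13f5a8c70e6d2b94u

/* One word in front of every internal allocation. */
struct typed_header {
    uint64_t type_hash;
};

static void *typed_alloc(size_t size, uint64_t type_hash)
{
    struct typed_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->type_hash = type_hash;
    return h + 1; /* the object starts right after the header */
}

/* Returns obj if its header carries the expected type hash, else NULL
   (a real runtime would trap instead). */
static void *typed_check(void *obj, uint64_t type_hash)
{
    struct typed_header *h = (struct typed_header *)obj - 1;
    return h->type_hash == type_hash ? obj : NULL;
}

static void typed_free(void *obj)
{
    if (obj != NULL)
        free((struct typed_header *)obj - 1);
}

struct counter { long value; };
```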

* Type hash in register before function calls:
    - Incompatible with C ABI
    - Checks that functions are called with the correct
      parameters, return types and lifetime annotations.
    - Needs an extra "landing pad" before functions that performs that check
      (and has the endbr64 or equivalent instruction).
    - Direct jumps/calls skip past the "landing pad".

* Type hash check before calling pointers:
    - Incompatible with C ABI
    - See here:
      https://pax.grsecurity.net/docs/PaXTeam-H2HC15-RAP-RIP-ROP.pdf
    - The type hash is stored before the function,
      so it requires a readable code segment.
        - Unless the information can be stored out-of-band
          (e.g. at a fixed offset -- but that would use a lot of RAM)
        - Or, unless there is a "is_valid_indirect_target" function
          that takes the type hash and function pointer as
          parameters. This could be implemented in the main module
          (should work on ELF) and would recursively call into dependencies
          (with cycle detection). Extremely slow but safe.

* Protection of indirect-calls to arena "system functions"
    - Use "type hash" or "Hardened function reference" protection.
    - And of course CFI/endbr64
    - "Out-of-band arena meta-data" would protect the
      function pointer itself from buffer overflows.
    - No need to care about C ABI here.
    - Require that some register has a specific value on entry?
      (or just use the "type hash" check)
        - Perhaps the first param should be a
          "typically-invalid value"? e.g. 0xf00f123123123123

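A sketch of the "typically-invalid first parameter" idea for arena system
functions, using the example value from above. The function and struct names
are hypothetical. This only stops jumps that do not set up the argument
register; it is meant to be combined with CFI or type hash checks, not to
replace them.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Not a plausible pointer or small integer, so a stray jump into the
   function is unlikely to have it in the first argument register. */
#define SYSFN_MAGIC UINT64_C(0xf00f123123123123)

/* Hypothetical arena "system function", e.g. the page allocator that an
   arena calls through a function pointer. */
static void *arena_sys_alloc(uint64_t magic, size_t size)
{
    if (magic != SYSFN_MAGIC)
        abort(); /* entered without the expected magic value */
    return malloc(size);
}

/* Hypothetical (ideally out-of-band) arena meta-data holding the pointer. */
struct arena_sys_ops {
    void *(*alloc)(uint64_t magic, size_t size);
};

static void *arena_grow(const struct arena_sys_ops *ops, size_t size)
{
    return ops->alloc(SYSFN_MAGIC, size);
}
```
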
* Variable/parameter checks
    - Range constraints checks
    - Alignment of pointers
    - Not null for non-optional pointers
    - No aliasing of non-aliased pointers.
    - Type hash checks
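
A sketch of the entry checks a compiler might emit for a function taking a
non-aliased destination pointer, a source pointer and a count constrained to
0..1024. The function, its parameters and the limit are hypothetical, and
type hash checks are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool params_ok(const uint32_t *dst, const uint32_t *src, size_t n)
{
    /* Not null for non-optional pointers */
    if (dst == NULL || src == NULL)
        return false;
    /* Alignment of pointers */
    if ((uintptr_t)dst % _Alignof(uint32_t) != 0 ||
        (uintptr_t)src % _Alignof(uint32_t) != 0)
        return false;
    /* Range constraint on the count */
    if (n > 1024)
        return false;
    /* No aliasing: the two n-element ranges must not overlap */
    uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
    size_t bytes = n * sizeof(uint32_t);
    if (d < s + bytes && s < d + bytes)
        return false;
    return true;
}
```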

* Struct checksums
    - Internal/opaque structs could have a checksum value of their contents.
    - This could be XOR'ed with the type hash.
        - That would also make it harder for exploits to scan for
          specific types.

* Serialized data "no-arena check"
    - Serialized data should only ever go into .rodata
      (or possibly in memory mapped files).
    - Therefore it must NOT belong to an arena.
    - With in-band arena meta-data, the start of the page
      can be checked to NOT be a valid arena chunk.
    - With out-of-band arena meta-data, ???
    - For known module-local data, the pointer could be checked to be
      in bounds of .rodata (but when can a pointer be known to be
      module-local???)

* Allocation zero-ness check
    - Require that allocations contain only null values upon initializing
      them with data.
      (That's what mmap returns, and if "heap clearing" is implemented,
      then re-used allocations would also contain zeros)
    - This only makes sense in three cases:
        1. Reused memory (no point in checking what mmap returns)
        2. Delayed initialization of allocated data.
        3. Sufficiently large allocations in multi-threaded applications,
           that could get overwritten during the initialization phase.
    - I.e. it only prevents a few data-only attacks.
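
A sketch of the check, assuming memory is expected to be all-zero right
before it is initialized. all_zero and init_checked are hypothetical names;
a real runtime would trap instead of returning false.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if all n bytes at p are zero. */
static bool all_zero(const void *p, size_t n)
{
    const unsigned char *b = p;
    for (size_t i = 0; i < n; i++)
        if (b[i] != 0)
            return false;
    return true;
}

/* Refuses to initialize memory that is not in the expected all-zero
   state (e.g. reused memory that was not cleared, or memory that was
   written to before initialization). */
static bool init_checked(void *dst, const void *src, size_t n)
{
    if (!all_zero(dst, n))
        return false;
    memcpy(dst, src, n);
    return true;
}
```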

* Protected code block:
    - It would be nice to have a code block that could only be entered
      at a specific location (no jumps into the middle of it).
    - For example, it could check capabilities before calling OS functions.
    - This can be done with CFI, but it would be nice to be able to
      protect it on CPUs and on environments that don't have CFI
      (and even in the presence of C code).
    - But it could be bypassed by calling directly into the VDSO (on Linux)
      or into ntdll (on Windows).
    - On x86 it could also be bypassed by jumping into the middle of an
      instruction so it gets parsed as an INT or SYSENTER instruction.