notes/runtime_hardening.txt



Hardening via the runtime
=========================

Hardening against exploits that rely on memory corruption etc.
Even though SLUL is meant to be a safe language, it could still have
bugs (e.g. miscompilation).


Hardening vtable-like structures?
---------------------------------

Two possibilities here:

1. Keep pointers in a separate data area, which can be mprotect()'ed
as read-only and then mimmutable()'ed / mseal()'ed.

2. Keep *jump instructions* in a separate area, which can be mprotect()'ed
as execute-only (even read disallowed) and then mimmutable()'ed / mseal()'ed.

Pros: Makes it harder to obtain the address of the runtime.
(In particular if the runtime clears all pointers from stack before
returning. Or maybe the stack should always be zero'ed out before returning
from any function call.)

Cons: Requires the platform to allow generating executable code at runtime.
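
A minimal sketch of possibility 1 (pointers in a separate read-only area),
assuming a POSIX system with mmap()/mprotect(). The names `handler_fn`,
`op_double`, `op_negate` and `make_readonly_table` are made up for this
example and are not part of any SLUL runtime. The final sealing step is only
mentioned in a comment, because mimmutable() (OpenBSD) and mseal() (recent
Linux) are not portable.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*handler_fn)(int);

/* Two hypothetical "virtual methods". */
static int op_double(int x) { return 2 * x; }
static int op_negate(int x) { return -x; }

/* Builds a dispatch table in its own page, then makes it read-only.
 * Returns NULL on failure. */
const handler_fn *make_readonly_table(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    assert(len >= 2 * sizeof(handler_fn));

    handler_fn *table = mmap(NULL, len, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (table == MAP_FAILED)
        return NULL;
    table[0] = op_double;
    table[1] = op_negate;

    /* From here on, a stray write to the table faults instead of
     * silently redirecting control flow. */
    if (mprotect(table, len, PROT_READ) != 0) {
        munmap(table, len);
        return NULL;
    }
    /* A real runtime would now call mimmutable() / mseal() so that the
     * protection cannot be reverted with another mprotect(). */
    return table;
}
```

Calls then go through the table, e.g. `table[0](21)`.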


Zero out registers and stack on function-call, return, OR immediately
---------------------------------------------------------------------

When a function makes calls, or gets called, "seldom enough", it can be a good
idea to zero out stack and registers as much as possible.

OR, perhaps better (and more secure): Zero out anything that becomes "dead":
registers, stack variables, heap variables, and global variables.
Registers can contain both variables and compiler temporaries.

(As an optimization, this could be done for all addresses and for all
memory/non-register variables)
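
A rough C-level illustration of zeroing a stack variable once it becomes
dead. In SLUL this would be emitted by the code generator. The sketch only
shows the effect, using a volatile store loop so that a C compiler does not
remove the "useless" stores. `wipe_dead` and `sum_digits` are hypothetical
names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Zeroes `len` bytes through a volatile pointer, so the stores are kept
 * even though the object is never read again. */
void wipe_dead(void *p, size_t len)
{
    volatile unsigned char *bytes = p;
    while (len--)
        *bytes++ = 0;
}

/* Example function with a short-lived sensitive copy on the stack. */
unsigned sum_digits(const char *pin)
{
    char copy[16] = {0};
    unsigned sum = 0;

    strncpy(copy, pin, sizeof copy - 1);
    for (size_t i = 0; copy[i] != '\0'; i++)
        sum += (unsigned)(copy[i] - '0');

    wipe_dead(copy, sizeof copy);   /* `copy` is dead from here on */
    return sum;
}
```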

The following can NOT be zero'ed out when calling a function:

* Function parameter registers that are used.
* Callee-saved registers that are in use.
* Stack variables that are in use (including passing them to the function).

The following can NOT be zero'ed out when returning:

* Return registers that are in use.
* Return values that are passed on the stack.
* Any callee-saved registers.

See also "Setting non-return argument registers to known garbage values".


Unmap parts of RTL before calling main()
----------------------------------------

Could unmap the following before the RTL transfers control to the
application:

* RTL initialization code.
* `giveme` constructors.
* Dynamic library loading code (if dynamic loading is handled by the runtime)
* Dynamically-dead-code (things that are not needed for the given `giveme`s).
  Note that the code to jump to the final entry point is always necessary,
  and so is (almost always) the exit function.

Probably it shouldn't simply be unmapped, but rather mapped with some guard
pages or similar (and those could be mimmutable/mseal'ed).
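
One way to sketch "mapped with guard pages instead of unmapped", assuming
POSIX mmap() with MAP_FIXED: the old range is replaced in place by fresh
pages without any access rights. `retire_region` is a hypothetical helper
name. Sealing (mimmutable/mseal) is left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Replaces [start, start+len) with inaccessible pages. The old contents
 * are discarded. Both `start` and `len` must be page-aligned.
 * Returns 0 on success, -1 on failure. */
int retire_region(void *start, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert((uintptr_t)start % page == 0 && len % page == 0);

    void *p = mmap(start, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == start ? 0 : -1;
}
```

Any later jump into, or read from, the retired range faults, and the address
range cannot be reused by a later mmap() without MAP_FIXED.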

This gives two benefits:

1. It removes some (out of many) possible gadgets in the runtime.
2. It possibly frees up a few kilobytes of memory (not guaranteed to make
   any difference on virtual memory systems).

There's also a downside: It might cause more overhead in the kernel to have
two small mappings rather than a single one.


Turn EFAULT into trap / abort() / segfault
-------------------------------------------

EFAULT can be used to probe for accessibility of pages. I.e. you can scan
memory to locate gadgets and/or certain libraries / functions / etc.

EFAULT should never happen in normal programs written in a safe programming
language.
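
A sketch of how a runtime-library wrapper could turn EFAULT into an abort,
assuming POSIX write(). The wrapper name `rtl_write` is made up for this
example. Other errors are passed through unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* write() wrapper: a bad user-space pointer is treated as a fatal bug
 * rather than a recoverable error, so the call cannot be used as an
 * "is this page mapped?" oracle. */
ssize_t rtl_write(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);
    if (n < 0 && errno == EFAULT)
        abort();    /* does not return, so no probe result is observed */
    return n;
}
```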


Pre-map memory and disable mmap
-------------------------------

Pre-map all memory that might be needed, e.g. gigabytes.

On some platforms (mainly shared-kernel VPS platforms) having a page mapped
but not populated yet will count against the memory quota (a.k.a. "RAM" size
of the VPS).

So the runtime would have to detect whether this should be done or not.

Downsides:

* Special-casing for different platforms.
* There's probably *some* cost of having pages mapped, even if not
  populated yet. Even if the kernel creates page-table entries lazily,
  there might still be some kind of data structure in the kernel that
  occupies some space.
* Some exploits become *easier* because some out-of-bounds no longer
  trigger a segmentation fault.
* Similarly, some out-of-bounds bugs may go undetected.

See also the "seccomp ruleset" section.
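
A sketch of the pre-mapping idea, assuming Linux-style mmap() with
MAP_NORESERVE: one large region is mapped at startup and later allocations
are carved out of it without further mmap() calls. `struct premap`,
`premap_init` and `premap_alloc` are hypothetical names, and the bump
allocator is only a stand-in for a real allocator.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

struct premap {
    unsigned char *base;
    size_t size;
    size_t used;
};

/* Maps `size` bytes once. Pages are populated lazily on first touch. */
int premap_init(struct premap *pm, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    pm->base = p;
    pm->size = size;
    pm->used = 0;
    return 0;
}

/* Bump allocation (16-byte granularity) from the pre-mapped region.
 * Returns NULL when the region is exhausted. */
void *premap_alloc(struct premap *pm, size_t n)
{
    size_t aligned = (n + 15u) & ~(size_t)15u;
    assert(pm->used <= pm->size);
    if (aligned < n || aligned > pm->size - pm->used)
        return NULL;
    void *p = pm->base + pm->used;
    pm->used += aligned;
    return p;
}
```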


seccomp ruleset
---------------

Install a BPF program at startup to disallow unexpected calls.

For example, it could:

* allow-list only known-to-be-called syscalls (based on the `giveme`s
  in the application). E.g. execve etc. might not be needed.
* allow-list flags/parameters in syscalls. E.g. for mmap, PROT_EXEC
  might not be needed.
* allow only system calls originating from specific locations in the
  runtime library and/or the vDSO.
    - IF the vDSO is allowed to make syscalls, then the application code
      shouldn't be allowed to know the address of the vDSO.
      (but it can learn this via /proc...)
* kill the process if it tries to EFAULT-probe memory via e.g. write() or
  other syscalls that read memory from userspace.
    - there seems to be no way to invoke bpf rules on syscall returns
    - but checking pointer arguments might work (of all allowed syscalls)
    - some syscalls might be impossible, though, if they use more than 6 args.
  if it is too hard (or slow) to do with seccomp, would it be enough
  to do it in the runtime library?

In some cases, it might even be possible to use SECCOMP_SET_MODE_STRICT.
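
A minimal sketch of a syscall allow-list with seccomp-BPF on Linux. The
fixed list (read, write, exit_group) is only an assumption for the example.
In the scheme above it would be derived from the `giveme`s. The names
`allow_list`, `install_allow_list` and `RT_AUDIT_ARCH` are hypothetical.
Argument checks (e.g. PROT_EXEC for mmap) and call-site checks are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

#if defined(__x86_64__)
#define RT_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define RT_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

static struct sock_filter allow_list[] = {
    /* 0-1: kill if the syscall ABI is not the expected one. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, RT_AUDIT_ARCH, 0, 4),
    /* 2-5: allow the listed syscall numbers. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_read, 3, 0),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_write, 2, 0),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_exit_group, 1, 0),
    /* 6: everything else kills the process. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    /* 7: target of the "allow" jumps. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Installs the filter for the calling thread. Returns 0 on success. */
int install_allow_list(void)
{
    struct sock_fprog prog = {
        .len = sizeof allow_list / sizeof allow_list[0],
        .filter = allow_list,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

After a successful install_allow_list(), any syscall outside the list
terminates the process, so it would have to be the last step of runtime
initialization.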


Lock-out on segmentation fault / trap / assertion failure / etc.
----------------------------------------------------------------

grsecurity has this for the kernel. Maybe it can be done in userspace,
in servers, as well?

Should then block the IP (or maybe subnet in case of rotating IPs),
and (maybe) kill all existing processes that are serving that IP (or
maybe even those that *have* recently served it).

This would require some stickiness of each IP / subnet, or it would
open up for DoS attacks.

Also, there's the case when all (or most) requests come from a single
IP (e.g. a reverse proxy / load balancer). That needs to be taken into
account as well.


Runtime/Software-based Pointer Authentication (very slow?)
----------------------------------------------------------

Jump to an address in execute-only memory in the runtime-library.

At this address, there's a jump to a function at random location.

That function can then perform the operation.

There have to be at least two such functions (for code pointers) and
maybe more for data pointers:

* Call authenticated code pointer
* Make authenticated code pointer
* Dereference authenticated data pointer with offset and read <byte/u16/u32/u64/...>
* Dereference authenticated data pointer with offset and write <byte/u16/u32/u64/...>
* Dereference and copy between source/target authenticated data pointers with offsets
* Dereference and compare authenticated data pointers with offsets (side-channel free)
* Dereference and memset authenticated data pointer with offset
* Create authenticated data pointer from stack address
* Create authenticated data pointer from .rodata address
* Advance authenticated data pointer
* Compare authenticated data pointers with offsets for equality
* Compare authenticated data pointers with offsets for ordering (less, equal, greater)
* Navigate in authenticated data pointer with offset (a = *b->f)

Note: This leads to a separate, incompatible, ABI.
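
The notes above do not say how a pointer is authenticated. The sketch below
assumes one possible scheme: the runtime keeps a secret key and stores a
keyed tag next to each code pointer. It covers only "make" and "call" for
code pointers. All names (`auth_code_ptr`, `make_auth`, `auth_ok`,
`call_auth`, `add_one`) are hypothetical, the key is a fixed placeholder,
and the mixing function is NOT cryptographically secure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef int (*code_fn)(int);

struct auth_code_ptr {
    code_fn fn;
    uint64_t tag;
};

/* Placeholder key. A real runtime would pick it randomly at startup and
 * keep it where application code cannot read it. */
static uint64_t rt_key = 0x9E3779B97F4A7C15ULL;

/* Simple 64-bit mixer, for illustration only. */
static uint64_t mix(uint64_t x)
{
    x ^= x >> 30; x *= 0xBF58476D1CE4E5B9ULL;
    x ^= x >> 27; x *= 0x94D049BB133111EBULL;
    x ^= x >> 31;
    return x;
}

/* "Make authenticated code pointer" */
struct auth_code_ptr make_auth(code_fn fn)
{
    struct auth_code_ptr p = { fn, mix((uint64_t)(uintptr_t)fn ^ rt_key) };
    return p;
}

bool auth_ok(struct auth_code_ptr p)
{
    return p.tag == mix((uint64_t)(uintptr_t)p.fn ^ rt_key);
}

/* "Call authenticated code pointer": aborts on a tag mismatch. */
int call_auth(struct auth_code_ptr p, int arg)
{
    if (!auth_ok(p))
        abort();
    return p.fn(arg);
}

/* Example call target. */
int add_one(int x) { return x + 1; }
```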


Runtime/Software-based indexed off-side array pointer storage (very slow?)
--------------------------------------------------------------------------

Instead of normal pointers, have indexes into an array that is only
accessible by the runtime (or at least only writeable by the runtime).
The array would contain elements like this:

    (address of pointer itself, real pointer value)

Support the operations in the previous section. Read-only operations might
be possible to inline safely.

The total size of a pointer becomes rather large:

    (32+2*W) bits.
    32 bits for the index into the array
    2*W per array element.
        (W is the machine pointer word size).

This approach could be extended with hashed type information, with the
following tuples in the array:

    (address of pointer itself, type-hash, real pointer value)

Note that there might have to be two areas, one for code pointers and
another for data pointers. Also, note that one index (e.g. 0) must be
reserved for the NULL value.

Note: This design leads to a separate, incompatible, ABI.
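
A sketch of the off-side array with (address of pointer itself, real pointer
value) elements and index 0 reserved for NULL. The array is an ordinary
static array here, standing in for runtime-only memory. Slots are never
reused, and a failed check returns NULL where a real runtime would probably
trap. The names `ptr_index`, `ptr_store` and `ptr_load` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ptr_index;          /* what application code stores */

struct slot {
    const ptr_index *home;           /* address of the pointer itself */
    void *real;                      /* real pointer value */
};

#define MAX_SLOTS 1024
static struct slot slots[MAX_SLOTS];
static ptr_index next_free = 1;      /* index 0 is reserved for NULL */

/* Stores `real` in the pointer variable `*home`. Returns 0 on success. */
int ptr_store(ptr_index *home, void *real)
{
    if (real == NULL) {
        *home = 0;
        return 0;
    }
    if (next_free >= MAX_SLOTS)
        return -1;
    slots[next_free].home = home;
    slots[next_free].real = real;
    *home = next_free++;
    return 0;
}

/* Loads through `*home`. Returns NULL for index 0 and for any index whose
 * recorded home address does not match (e.g. a copied or forged index). */
void *ptr_load(const ptr_index *home)
{
    ptr_index i = *home;
    if (i == 0 || i >= next_free || slots[i].home != home)
        return NULL;
    return slots[i].real;
}
```

With W = 64 the cost per pointer is 32 + 2*64 = 160 bits, as computed above.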

Runtime/Software-based tree with pointer/object info (extremely slow?)
----------------------------------------------------------------------

In a restricted area in memory, accessible only by the runtime,
store a tree of:

    address of pointer itself -> real pointer value

When reading a pointer value, check that the value in the map and the value
of the pointer match.

When writing a pointer value, update the map accordingly.

This technique could be extended to type information also, e.g.

    address of object -> type and size of object
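
A sketch of the "address of pointer itself -> real pointer value" map,
using an unbalanced binary search tree with a fixed node pool. The pool is
an ordinary static array standing in for the restricted area.
`shadow_write` and `shadow_check` are hypothetical names. Removal of
entries (when a pointer variable dies) is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct node {
    uintptr_t key;               /* address of the pointer itself */
    void *value;                 /* expected pointer value */
    struct node *left, *right;
};

#define POOL_SIZE 256
static struct node pool[POOL_SIZE];
static size_t pool_used;
static struct node *root;

/* Returns the link that holds, or would hold, the node for `key`. */
static struct node **find(uintptr_t key)
{
    struct node **n = &root;
    while (*n != NULL && (*n)->key != key)
        n = key < (*n)->key ? &(*n)->left : &(*n)->right;
    return n;
}

/* Writes `value` to `*slot` and records it in the tree. */
bool shadow_write(void **slot, void *value)
{
    struct node **n = find((uintptr_t)slot);
    if (*n == NULL) {
        if (pool_used == POOL_SIZE)
            return false;
        *n = &pool[pool_used++];
        (*n)->key = (uintptr_t)slot;
        (*n)->left = (*n)->right = NULL;
    }
    (*n)->value = value;
    *slot = value;
    return true;
}

/* True if `*slot` still holds the value last written via shadow_write(). */
bool shadow_check(void *const *slot)
{
    struct node **n = find((uintptr_t)slot);
    return *n != NULL && (*n)->value == *slot;
}
```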


Protected runtime storage for important bits
--------------------------------------------

Important flags, e.g. "isAdmin", could be shadowed by the runtime, in a
protected area that is only accessible by the runtime.
There could also be set-only "fuse-like" bits.

Internally, this could be a tree structure.

The (internal) API could be:

    slulrt__set_bit(id, value)
    slulrt__clear_bit(id)
    slulrt__require_bit(id)

    slulrt__set_oneway_bit(id)
    slulrt__require_no_oneway_bit(id)

Setting/checking a bit with conflicting APIs (bit vs oneway_bit) should
trigger a trap.

The `id` would just be some kind of compile-time hash of the attribute.

The authentication code path could be:

    login  username password
    Runtime.set_bit  "is_admin" is_admin

A following authorization path might be:

    if is_admin
        Runtime.require_bit  "is_admin"
        add_user  new_user_name
    end

This could protect against some specific data-only attacks.
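
A sketch of the internal API listed above, with the trap modelled as
abort(). It uses a small linear table instead of a tree, and an ordinary
static array instead of protected memory. Both are simplifications for the
example. The parameter types and the `BIT_*` enum are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum bit_kind { BIT_UNUSED = 0, BIT_PLAIN, BIT_ONEWAY };

struct entry {
    uint32_t id;
    enum bit_kind kind;
    bool set;
};

#define MAX_BITS 64
static struct entry bits[MAX_BITS];

/* Finds or creates the entry for `id`. Traps if `id` was previously used
 * through the other API family (bit vs oneway_bit). */
static struct entry *lookup(uint32_t id, enum bit_kind kind)
{
    for (size_t i = 0; i < MAX_BITS; i++) {
        struct entry *e = &bits[i];
        if (e->kind == BIT_UNUSED) {
            e->id = id;
            e->kind = kind;
            e->set = false;
            return e;
        }
        if (e->id == id) {
            if (e->kind != kind)
                abort();
            return e;
        }
    }
    abort();    /* table full */
}

void slulrt__set_bit(uint32_t id, bool value) { lookup(id, BIT_PLAIN)->set = value; }
void slulrt__clear_bit(uint32_t id) { lookup(id, BIT_PLAIN)->set = false; }
void slulrt__require_bit(uint32_t id) { if (!lookup(id, BIT_PLAIN)->set) abort(); }

void slulrt__set_oneway_bit(uint32_t id) { lookup(id, BIT_ONEWAY)->set = true; }
void slulrt__require_no_oneway_bit(uint32_t id) { if (lookup(id, BIT_ONEWAY)->set) abort(); }
```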


Setting non-return argument registers to known garbage values
-------------------------------------------------------------

As an anti-ROP (return-oriented programming) measure, one could set
all argument registers (except the return register(s)) to garbage values
just before the return instruction.

This will also clear out any sensitive data in those registers, as a
side-effect.

    mov x0, ...
    mov x1, -123
    mov x2, -123
    ...
    ret

In order to harden against JOP (jump-oriented programming), the registers
could also be "junked" before any indirect jump or indirect call.

This "register junking" could also happen when the encoding of an instruction
happens to contain the opcode of a return (or jump?) instruction, e.g. a 0xC2
or 0xC3 byte on x86.


Checking that padding bytes/bits are zero
-----------------------------------------

There are many places where there are alignment padding bytes.
If those are always initialized to zero (they probably should be),
then it is simple to check that they are still zero at "important
places" in the program (e.g. before a return from a function).

Similarly, pointers could be validated to be aligned (zero bits at end)
and not have low non-zero values (e.g. 1..65535), and if non-nullable,
not be zero.

Disadvantages:

* Calling from C code is trickier, because everything must be initially
  zero-initialized (or padding areas zeroed out).
* Some out-of-bounds accesses that would previously be detected as an
  uninitialized value in Valgrind will no longer be detected.
  (Due to padding areas now appearing to be properly initialized).
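
A sketch of both checks in C, for a hypothetical `struct record` with
padding between its two fields. The helper names are made up. Note that C
itself does not promise that padding bytes keep their value after a store
to the struct, so a compiler that wants these checks has to guarantee that
on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct record {
    uint8_t  tag;
    /* alignment padding is expected here on common ABIs */
    uint32_t value;
};

/* True if every byte between `tag` and `value` is zero. */
bool record_padding_is_zero(const struct record *r)
{
    const unsigned char *bytes = (const unsigned char *)r;
    size_t begin = offsetof(struct record, tag) + sizeof r->tag;
    size_t end = offsetof(struct record, value);

    for (size_t i = begin; i < end; i++)
        if (bytes[i] != 0)
            return false;
    return true;
}

/* Check for a non-nullable pointer: above the low 64 KiB and aligned to
 * `align`, which must be a power of two. */
bool pointer_looks_valid(const void *p, size_t align)
{
    uintptr_t v = (uintptr_t)p;
    return v > 65535u && (v & (align - 1)) == 0;
}
```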


Special calling convention for "risky" functions
------------------------------------------------

Low-level RTL functions that do risky stuff could require special register
values, for example this on aarch64:

    x1 = 0xFF00007F + <func_id>
        this contains several special bytes, and
        requires multiple instructions to produce.
        Also, it requires a "func_id" to prevent "stealing" it from
        another risky function.
    x2 = <instruction pointer> + x1
        this isn't constant, which makes it harder to produce.

"Risky" functions could include e.g. mapping memory, creating files,
establishing network connections, executing processes, or perhaps anything
OS or native-code related.

Risky functions could additionally be mapped into a separate memory mapping
(with a random address) to prevent calling them directly. If all code is
execute-only (no read permission) and there are no leaks (e.g. via registers),
then it will be hard to figure out the address to the "risky function area".
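
A C model of the register check, with x1 and x2 passed as ordinary
parameters. It assumes that "<instruction pointer>" means the entry address
of the risky function itself, which is only one possible reading. The ids,
`risky_tokens_ok` and `rt_create_file` are hypothetical, and a real runtime
would trap instead of returning -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RISKY_MAGIC 0xFF00007Fu

enum risky_func_id { RISKY_MAP_MEMORY = 1, RISKY_CREATE_FILE = 2 };

/* `x1` must be the magic constant plus the callee's id, and `x2` must be
 * the callee's address plus `x1`. */
bool risky_tokens_ok(uint64_t x1, uint64_t x2,
                     enum risky_func_id id, uint64_t callee_addr)
{
    return x1 == (uint64_t)RISKY_MAGIC + (uint64_t)id
        && x2 == callee_addr + x1;
}

/* Example risky function: refuses to run without the right tokens. */
int rt_create_file(uint64_t x1, uint64_t x2)
{
    uint64_t self = (uint64_t)(uintptr_t)&rt_create_file;
    if (!risky_tokens_ok(x1, x2, RISKY_CREATE_FILE, self))
        return -1;
    /* ... the risky operation would go here ... */
    return 0;
}
```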


Restrict writes within an arena
-------------------------------

If an arena only has access to its own data, and the virtual memory of
the arena can be reserved beforehand (size and contiguity), then all
write operations could be bounds-checked.

But this is tricky to do:

1. Might not know whether to bounds-check at compile time.
2. Other code might be called, that doesn't know about the write restriction.
3. Setting up a separate memory map might require spawning a full new process.
    - But it seems that aarch64 has some kind of "permission overlays"?
4. With a separate memory map, (2) would cause segfaults, if enforced
   strictly.
    - This would also make I/O or other system functions trickier.
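
A sketch of a bounds-checked write into a contiguous, pre-reserved arena.
It only covers the software check, not the separate memory map. `struct
arena` and `arena_write` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct arena {
    unsigned char *base;   /* start of the reserved, contiguous region */
    size_t size;           /* reserved size in bytes */
};

/* Copies `len` bytes to `dst` only if [dst, dst+len) lies inside the
 * arena. Returns false, without writing, otherwise. */
bool arena_write(const struct arena *a, void *dst,
                 const void *src, size_t len)
{
    uintptr_t lo = (uintptr_t)a->base;
    uintptr_t d = (uintptr_t)dst;

    if (d < lo || d - lo > a->size || len > a->size - (d - lo))
        return false;
    memcpy(dst, src, len);
    return true;
}
```

The comparisons are arranged so that neither `d - lo` nor the remaining-size
computation can wrap around.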


Aside: Enforcing "not-worsened" constant-time for `wrapping` integers
---------------------------------------------------------------------

Wrapping integers are intended to be used for e.g. cryptographic operations.

When a variable is of a `wrapping` type:

* If it is not used in any branches, enforce (in the code generator)
  that it is never used in branches in the generated machine code.
* If it's not used in any data-dependent lookups etc., use the "DIT"
  subset on aarch64 (and equivalent subsets on other CPU ISAs).

Or perhaps there should be a separate type that enforces these two?
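
For reference, this is the kind of source-level code the rule is meant to
protect: branch-free selection and comparison on unsigned (wrapping)
integers. The function names are hypothetical. A C compiler gives no
guarantee that the generated code stays branch-free, which is the gap the
enforcement above would close.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns `a` if `choose_a` is 1 and `b` if it is 0, without branching
 * on `choose_a`. */
uint32_t ct_select(uint32_t choose_a, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - choose_a;   /* 0x00000000 or 0xFFFFFFFF */
    return (a & mask) | (b & ~mask);
}

/* Returns 1 if the buffers are equal and 0 otherwise. Always reads all
 * `n` bytes and has no data-dependent branch or lookup. */
uint32_t ct_equal(const uint8_t *x, const uint8_t *y, size_t n)
{
    uint32_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (uint32_t)(x[i] ^ y[i]);
    /* (diff | -diff) has its top bit set exactly when diff != 0. */
    return 1u ^ ((diff | (0u - diff)) >> 31);
}
```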