If the final compiler will be written in SLUL, shouldn't the stdlib also
be written in SLUL?
But some things cannot be implemented in SLUL itself, such as
low-level code and core data structures:
* syscall code
* mem-copy, vector, and similar.
* allocation
* threading
* fatal error handling
Should the bootstrap compiler use the normal stdlib?
Or a limited built-in one?
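To make the problem concrete, here is a minimal C sketch of one item from
the list above, a mem-copy primitive. The name raw_copy is hypothetical and
not part of any SLUL stdlib. It shows the kind of unchecked pointer access
that a safe language would presumably have to reject: nothing verifies that
either region really holds n bytes.

```c
#include <stddef.h>

/* Hypothetical helper: copies n bytes from src to dst.
 * The regions must not overlap. */
void raw_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        /* No bounds check: the caller must guarantee that both
         * regions are at least n bytes long. */
        dst[i] = src[i];
    }
}
```

Whatever language ends up hosting such code, the safety argument has to
come from the caller or from some external proof, not from the function
body itself.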
Solutions
---------
Can combine more than one!
Write those things in:
* C
* Assembler
(probably safer than using inline assembler)
* Annotated "hexcode"/binary
- perhaps with macro support to make it more usable.
* Unsafe IR
* Safe IR with unsafe elements
* Some neither-100%-nor-0%-safe language:
Ada, Zig, Rust-with-unsafe, Hare, ...
(note that some of those are not very portable,
e.g. Hare uses QBE, which has somewhat limited target support)
* Formally-verified IR/assembler (see below)
Discourage people from writing in the unsafe language
-----------------------------------------------------
It would be sad if the safe SLUL language loses out to an unsafe cousin
language. So IF an unsafe cousin language is ever created, it should
1) be very clear that it is NOT SLUL, and 2) discourage its use by making
it an "unappealing" language.
Make it explicitly a different language:
* Different name (uhoh? oops? noo? slunsafe?)
* Different file extension (.rtlc? .stdlib? .lul? .uhoh? .rtlu? .noo? .rsy? .slunsafe?)
* Different syntax
Make it ugly but still readable:
* Use uppercase keywords?
* Or use some sigil?
Make unsafe stuff really stand out:
* Maybe require an UNSAFE_xxx keyword around anything unsafe
Hard-core solution: formally verified IR/assembler
--------------------------------------------------
* Could build in some minimal but powerful proof language into the IR
(this could then also be used by the frontend)
- ...and then it would be safe to expose this
(although without backwards compatibility guarantees)
* But it's still necessary to do things like syscalls etc.
That can't be done safely at all... unless the compiler can
directly generate syscalls AND enforce that they are used in a
safe way (e.g. by emitting pre/post-conditions that can be
used by the proof language)
Pre/post-condition ideas:
For registers, and for fields in structs in memory:
* valid pointer of type X
* valid uninitialized memory of size N
* free'd pointer
* not null
* length of field F
* etc.
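A rough C sketch of how such conditions could be checked, assuming each
register or field is tracked as one abstract state. All names here
(slot_state, contract, OP_INIT, OP_FREE, check_sequence) are made up for
illustration and do not come from any existing SLUL IR.

```c
#include <stddef.h>

/* Hypothetical abstract states for one register or struct field. */
enum slot_state {
    SLOT_UNKNOWN,
    SLOT_VALID_PTR,   /* valid pointer of type X */
    SLOT_UNINIT_MEM,  /* valid uninitialized memory of size N */
    SLOT_FREED_PTR    /* free'd pointer */
};

/* Contract of one operation: the state it requires and the state
 * it leaves behind. */
struct contract {
    enum slot_state pre;
    enum slot_state post;
};

/* Illustrative contracts only. */
static const struct contract OP_INIT = { SLOT_UNINIT_MEM, SLOT_VALID_PTR };
static const struct contract OP_FREE = { SLOT_VALID_PTR, SLOT_FREED_PTR };

/* Applies the contracts in order, starting from the given state.
 * Returns the index of the first operation whose precondition does
 * not hold, or -1 if every precondition holds. */
int check_sequence(enum slot_state start,
                   const struct contract *ops, size_t n_ops)
{
    enum slot_state cur = start;
    size_t i;
    for (i = 0; i < n_ops; i++) {
        if (ops[i].pre != cur) {
            return (int)i;
        }
        cur = ops[i].post;
    }
    return -1;
}
```

With these contracts, the sequence init, free passes when starting from
uninitialized memory, while free, free on a valid pointer is rejected at
the second operation (a double free). A real proof language would need
richer facts, such as sizes and types, than a single enum can carry.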
Hard-core solution 2: write a C compiler in SLUL
------------------------------------------------
Could write a C compiler in SLUL, using the same backend as the SLUL
compiler will use.
This could even be a subset of C89 (+ some things like uint64_t etc.).
Candidates for things to remove/forbid:
* digraphs/trigraphs
* float/double
* varargs (usage, declarations, va_list, etc.)
* multi-char chars
* octal literals and escape sequences
* assuming char is either signed or unsigned, i.e. assigning < 0 or >= 128
* strange control structures:
- case not directly under switch
- statements between switch and first case
- not using {} when block comes on a separate line
* some types:
- void *
- wchar_t
* several casts:
- pointer to/from int
- function ptr to/from other data pointer types
* depending on operator precedence of:
- & vs && levels
* declarations:
- * vs [] precedence
- prototypes: `f()`
- auto/register storage classes
- common initialization: `static int i;` or `int i;` at top-level.
- volatile
- new struct/enum inside function parameter list
* unwanted macros:
- __DATE__, __TIME__
- #pragma
* stdlib except for:
- memset, memcmp, memcpy, strlen
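As an illustration, the following sketch shows what code written in such a
subset might look like. The function names are hypothetical. It stays away
from the constructs listed above: full prototypes, braces on every block,
unsigned char instead of plain char or void pointers, no octal literals,
and explicit parentheses around & and &&.

```c
#include <stddef.h>

/* Counts the bytes before the terminating zero byte.
 * Uses unsigned char, so no assumption about char signedness. */
size_t subset_strlen(const unsigned char *s)
{
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}

/* Returns 1 if both bit 0 and bit 1 of flags are set, else 0.
 * Every sub-expression is parenthesized, so the result does not
 * depend on the relative precedence of & and &&. */
int subset_both_low_bits(unsigned int flags)
{
    if (((flags & 1u) != 0u) && ((flags & 2u) != 0u)) {
        return 1;
    }
    return 0;
}
```

Whether each of these restrictions is worth enforcing in the compiler,
rather than by convention, is a separate question.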
Syntax test
-----------
Using UNSAFE_ everywhere to make unsafe stuff stand out and to make the
language more ugly.
Using __ in identifiers to avoid collisions and also for increased ugliness.
Using % before all keywords for even more ugliness.
%func __init_arena
%code
[UNSAFEPointer base_ptr, UNSAFESize pgsize] = __sys_mmap 1
ArenaBase base_info %UNSAFE_OVERLAY base_ptr
.chunk = [
# ability to merge base pointers and offsets?
#.info = %UNSAFE_ADDROF .base %UNSAFE_PTRADD %UNSAFE_OFFSETOF .freespace
.baseptr = %UNSAFE_ADDROF .base
.alloced = %UNSAFE_OFFSETOF .freespace
.size = pgsize
]
.base = [
# capability info, list of chunks, ...
]
.freespace = []
%end
%end
%UNSAFE_GLOBAL __pagesize
%func __sys_mmap
int npages
%code
%%SYS_if os linux
__syscall6 NR_mmap2 ...
%%SYS_elif os windows
__sys_VirtualAlloc ...
%%SYS_endif
%end
%%SYS_switch os
%%SYS_case linux
%func __syscall6
int syscall_number
unsigned long arg1
...
unsigned long arg6
%code
%SYS_switch cpu
%SYS_case aarch64
%UNSAFE_asmdef $sysc($n) $w32_le(...)
...
%SYS_case i386
%UNSAFE_asmdef $int($n) CD $n
# Should probably use the vdso instead...
%UNSAFE_machinecode $int(80)
%SYS_case x86_64
...
%SYS_endswitch
%end
%%SYS_case windows
%func __sys_VirtualAlloc
...
%dllimport "kernel32.dll" "VirtualAlloc"
%end
%%SYS_endswitch
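For comparison, here is roughly what the __init_arena part of the syntax
test could correspond to in plain C. This is a sketch under assumptions: a
static page-sized buffer stands in for the memory returned by __sys_mmap,
the page size is fixed at 4096, baseptr is taken to be the start of the
page, and the struct layout, the n_chunks field, and arena_alloc are
invented for illustration.

```c
#include <stddef.h>

#define ARENA_PAGE_SIZE 4096u  /* assumed; the text gets it from __sys_mmap */

struct chunk {
    unsigned char *baseptr;  /* start of the chunk's memory */
    size_t alloced;          /* bytes in use, counted from baseptr */
    size_t size;             /* total bytes in the chunk */
};

struct arena_base {
    struct chunk chunk;
    size_t n_chunks;             /* stand-in for "capability info, list of chunks" */
    unsigned char freespace[1];  /* first free byte; the rest of the page follows */
};

/* Stand-in for one mmap'ed page. The union gives the header proper
 * alignment and makes the overlay well-defined in C. */
static union {
    struct arena_base header;
    unsigned char bytes[ARENA_PAGE_SIZE];
} arena_page;

/* Overlays the header on the start of the page and marks the header
 * itself as already allocated. */
struct arena_base *init_arena(void)
{
    struct arena_base *base = &arena_page.header;
    base->chunk.baseptr = arena_page.bytes;
    base->chunk.alloced = offsetof(struct arena_base, freespace);
    base->chunk.size = sizeof(arena_page.bytes);
    base->n_chunks = 1;
    return base;
}

/* Simple bump allocation from the chunk (no alignment handling).
 * Returns NULL when fewer than n bytes remain. */
unsigned char *arena_alloc(struct arena_base *base, size_t n)
{
    unsigned char *p;
    if (n > base->chunk.size - base->chunk.alloced) {
        return NULL;
    }
    p = base->chunk.baseptr + base->chunk.alloced;
    base->chunk.alloced += n;
    return p;
}
```

The offsetof expression plays the role of %UNSAFE_OFFSETOF .freespace, and
the pointer into the union plays the role of %UNSAFE_OVERLAY.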