notes/bootstrap_stdlib.txt


stdlib for the bootstrap compiler
=================================

What should go into the bootstrap stdlib, and what should the APIs look like?

Areas that would be necessary:

* Data structures:
    - Strings
    - Lists
    - Maps
* File IO:
    - Files (corresponding to a file descriptor)
    - Separate buffers or buffer functionality embedded into file objects?
* CLI argument parsing
    - And help texts etc.
    - Environment variables
        - A variable might be assignable via both env-var and CLI-argument
        - Could have an `envVar` parameter in the CLI parameter definition
* Error output
* Localization
    - Would just be a no-op in the bootstrap compiler, but the API
      has to be designed.
* Arena management
    - Only stubs needed in bootstrap compiler:
        - For memory management
        - For sandboxing
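
A minimal sketch of the `envVar` idea from the CLI bullet above, written in
Python on top of `argparse` only for illustration. The helper `add_param`,
the program name `bootstrapc` and the variable `BOOTSTRAP_TARGET` are all
hypothetical; the assumed precedence is CLI argument, then environment
variable, then the built-in default.

```python
import argparse
import os


def add_param(parser, flag, env_var=None, default=None, help_text=""):
    """Register an option that can also be set through an environment variable.

    Assumed precedence: explicit CLI argument > environment variable > default.
    """
    if env_var is not None:
        # The environment only supplies the default; argparse lets an
        # explicit CLI argument override it.
        default = os.environ.get(env_var, default)
        help_text = f"{help_text} (env: {env_var})".strip()
    parser.add_argument(flag, default=default, help=help_text)


def parse(argv):
    # "bootstrapc" and "BOOTSTRAP_TARGET" are made-up names for this sketch.
    parser = argparse.ArgumentParser(prog="bootstrapc")
    add_param(parser, "--target", env_var="BOOTSTRAP_TARGET",
              default="native", help_text="target platform")
    return parser.parse_args(argv)
```

Putting the env-var name into the parameter definition also lets the help
text mention it automatically.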


File I/O
--------
* InputFile, OutputFile, RWFile, AppendFile, SeekableInputFile, ...?
  (Or just one File type with multiple constructors, and functions
  returning I/O error if the operation is not allowed? Or use typestates
  to distinguish between them? Or subtypes, or interfaces?)
    - These should map to a file descriptor
    - The fd could be encoded into a pointer (but with some offset
      to allow for `none` values). This should work even with generic
      `slot` types, since integers will be supported there.
    - Maybe typestates could be used to allow only one buffer?
      Not all OSes have `pwrite`, and some types of files
      don't support that anyway (non-seekable files such as pipes).
    - Disallow multiple command line arguments of different conflicting
      types (e.g. OutputFile and InputFile) to point to the same filename
      (perhaps compare inode on *nix-like systems).
* InputStream, OutputStream, RWStream
    - These should have buffering
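
The fd-with-offset encoding mentioned above could look roughly like this.
It is only a sketch in Python: the offset of 1 and the names `FD_OFFSET`,
`encode_fd` and `decode_fd` are assumptions, and a real implementation would
store the value in a pointer-sized `slot` rather than a Python int.

```python
FD_OFFSET = 1  # assumed offset: slot value 0 is reserved for `none`


def encode_fd(fd):
    """Map an optional file descriptor to a non-negative slot value."""
    if fd is None:
        return 0
    if fd < 0:
        raise ValueError("file descriptors are non-negative")
    return fd + FD_OFFSET


def decode_fd(slot):
    """Inverse of encode_fd: 0 decodes to None, anything else to the fd."""
    if slot < 0:
        raise ValueError("invalid slot value")
    if slot == 0:
        return None
    return slot - FD_OFFSET
```

Without the offset, fd 0 (stdin) would be indistinguishable from `none`.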

Output files from CLI are tricky, because we don't want to create (or worse,
truncate) files if an earlier stage fails.
* Some OSes might have some kind of "filename reference"?
* In Linux, there's O_PATH, but does that do what is wanted?
    - It seems that `openat`/`openat2` only support directories in the `fd`
      parameter, so no.
* In Linux, there's name_to_handle_at, but it uses an unprotected userspace
  buffer (not suitable for sandboxing) and doesn't support all filesystems
  anyway.
* Opening with O_CREAT|O_NOATIME|O_NOCTTY and without O_TRUNC, and performing
  truncation lazily, almost works. But then the file needs to be deleted
  again if the application never actually writes it (including when the
  process is terminated by a signal).
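
A rough Python sketch of the lazy-truncation approach, to show where it gets
awkward. The class name `LazyOutputFile` is hypothetical. It leaves out
O_NOATIME (Linux-only), uses O_EXCL to find out whether this process created
the file (which is racy), and does not handle signals at all, which is
exactly the part that makes the approach only "almost" work.

```python
import os


class LazyOutputFile:
    """Output file that is created eagerly but truncated on first write."""

    def __init__(self, path):
        self.path = path
        self.written = False
        flags = os.O_WRONLY | getattr(os, "O_NOCTTY", 0)
        try:
            # O_EXCL tells us whether this process created the file.
            self.fd = os.open(path, flags | os.O_CREAT | os.O_EXCL, 0o666)
            self.created = True
        except FileExistsError:
            # Existing file: no O_TRUNC, so its contents survive for now.
            self.fd = os.open(path, flags | os.O_CREAT, 0o666)
            self.created = False

    def write(self, data):
        if not self.written:
            os.ftruncate(self.fd, 0)  # lazy truncation before first write
            self.written = True
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            n = os.write(self.fd, view)
            view = view[n:]

    def close(self):
        os.close(self.fd)
        if self.created and not self.written:
            # We created the file but never wrote to it: remove it again.
            os.unlink(self.path)
```

If an earlier stage fails, calling `close()` without any `write()` leaves a
pre-existing file untouched and removes a file that was only created here.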

Alternative solution:
* Have `File`s correspond to a filename (like in Java)
* For `File`s opened from the command line, store the filename in a
  memory area that gets `mprotect`ed as read-only.
* Using `seccomp` on Linux, allow only the read-only arguments to be used
  in calls to `open()`.
    - Or `unveil` on OpenBSD.
* Files could also be blocked (with some reference counting system
  perhaps? or allow only one `File` to point to a specific file?
  but how should that work with symlinks?)
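
One possible answer to the symlink question, sketched in Python: key the
blocking on (device, inode) instead of on the filename, since `stat` follows
symlinks. `FileRegistry`, `acquire` and `release` are hypothetical names, and
the sketch assumes a POSIX-like system and files that already exist (an
output file that has not been created yet has no inode to compare).

```python
import os


class FileRegistry:
    """Tracks files in use, keyed by (device, inode), with reference counts."""

    def __init__(self):
        self._refs = {}  # key -> (reference count, held exclusively?)

    @staticmethod
    def _key(path):
        st = os.stat(path)  # follows symlinks, so aliases share one key
        return (st.st_dev, st.st_ino)

    def acquire(self, path, exclusive=True):
        key = self._key(path)
        count, held_exclusively = self._refs.get(key, (0, False))
        if count and (exclusive or held_exclusively):
            raise PermissionError(f"{path} is already in use")
        self._refs[key] = (count + 1, exclusive)
        return key

    def release(self, key):
        count, held_exclusively = self._refs[key]
        if count == 1:
            del self._refs[key]
        else:
            self._refs[key] = (count - 1, held_exclusively)
```

With `exclusive=True` this is the "only one `File` per file" variant; with
`exclusive=False` it behaves like the reference-counting variant.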