Code generation
===============

Before code generation we need these attributes...

For functions:
- whether the function is public
- for private functions, the number of usages
  (0=>omit, 1=>always inline, more=>maybe inline [perhaps if params_opcodes_size <= calls*body_opcode_size?])
- low level representation of parameter types
- for private functions, it is also useful to know under which conditions the function is called.
  for example, which callee-saved registers actually need saving?
  (this is a chicken-and-egg problem. might be tricky)

...but we want functions that call each other to be close together, so
   generating each function as soon as its prerequisites are fulfilled
   is not ideal
   (unless the code is written to a temp file. This may be a reason
    to put function bodies in a separate arena. That way they can
    be freed once we are done generating code)
...also, how about generating code in a way that is fast with SIMD/vector instructions?
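
The usage-count rule for private functions (0 => omit, 1 => always inline,
more => maybe inline) could be sketched roughly as below. This is only an
illustration: the names `func_stats`, `inline_decision` and `decide_inline`
are hypothetical, and the size comparison is the tentative heuristic from the
list above taken literally, not a tuned rule.

```c
#include <stddef.h>

/* Hypothetical names; nothing here is taken from existing compiler code. */
enum inline_decision { FUNC_OMIT, FUNC_INLINE, FUNC_EMIT };

struct func_stats {
    int is_public;               /* public functions are always emitted */
    unsigned num_calls;          /* number of usages (call sites) */
    size_t params_opcodes_size;  /* size of the parameter-passing code */
    size_t body_opcodes_size;    /* size of the function body */
};

enum inline_decision decide_inline(const struct func_stats *f)
{
    if (f->is_public) return FUNC_EMIT;         /* must stay callable */
    if (f->num_calls == 0) return FUNC_OMIT;    /* unused private function */
    if (f->num_calls == 1) return FUNC_INLINE;  /* single use: always inline */

    /* Tentative heuristic from the notes, applied as written:
       params_opcodes_size <= calls * body_opcode_size */
    if (f->params_opcodes_size <= f->num_calls * f->body_opcodes_size)
        return FUNC_INLINE;
    return FUNC_EMIT;
}
```

The usage count has to be known before the first function is generated, which
is why it is listed as a prerequisite attribute.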

For basic blocks:
- parameter types of called functions
- types of data (local/params/global)
- for called private functions, it is useful to know which caller-saved registers actually need saving.

For local/param data:
- type
- whether the address is taken

For global data:
- whether the address is taken OR the symbol is public
  (if not, it may be de-duplicated)
- low level representation of type
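
The de-duplication condition for global data could be expressed as a small
predicate. This is a sketch under assumptions: `global_data`,
`may_deduplicate` and `can_share_storage` are hypothetical names, and
requiring *both* objects to be mergeable is a deliberately conservative
choice that the notes do not prescribe.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one global data object. */
struct global_data {
    bool address_taken;          /* is the address observable? */
    bool is_public;              /* is the symbol exported? */
    size_t size;                 /* size in bytes */
    const unsigned char *bytes;  /* low level representation */
};

/* An object may be merged only if nobody can observe its identity. */
bool may_deduplicate(const struct global_data *g)
{
    return !g->address_taken && !g->is_public;
}

/* Conservative assumption: both objects must be mergeable and
   have byte-identical contents. */
bool can_share_storage(const struct global_data *a,
                       const struct global_data *b)
{
    return may_deduplicate(a) && may_deduplicate(b)
        && a->size == b->size
        && memcmp(a->bytes, b->bytes, a->size) == 0;
}
```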

Existing
--------
First:
- check how good/easy/portable etc. existing libraries are, such as QBE.
- in particular:
    - small?
    - easy to use?
    - easy to add more target platforms? (or good existing support for target platforms)
- QBE appears to support x86_64 only (+aarch64 and riscv64 now),
  and supports static linking only.
- Cwerg looks better, but also has some limitations and also does
  not support dynamic linking.
- LLVM is of course the most powerful library.
- TenDRA has an architecture neutral intermediate format (which is exactly
  what is desired for SLUL). But sadly it seems to have some bugs (it seems
  to miscompile cslul, but it could of course be a bug in the cslul code
  and/or in the C frontend). Also, it lacks support for modern CPU
  architectures such as x86_64 and aarch64.
- Generating C code is also an option
  (...but this requires generation of "dummy libraries" when linking
   against SLUL modules, because C requires the library to be present
   while SLUL doesn't)
- Using qemu's TCG (Tiny Code Generator) could be an option.
    - but it has no 8 or 16 bit types (=> can't pack two bytes in e.g. AL and AH on x86)
    - but the pointer type is an alias to a 32/64 bit int.
      => special pointer types might be hard to add? (e.g. mixed func/data pointer sizes, CHERI, etc)
    - no floating point support
    - probably no support for stable ABIs
    - no support for calling/generating dynamic libraries
      (but it might be possible to call a "helper"?)
    - really nice way to handle temporaries: temporaries and local temporaries
      both are modifiable.
    - has a limit of max opcodes per emulated "instruction"...
      not sure what this means if TCG is used in a compiler.
    - MANY supported ISAs (aarch64, arm, i386, loongarch64, mips, ppc, ppc64, riscv, s390x, sparc) + an interpreter

Evaluation of LLVM
------------------

- Tail calls should be doable using "sibling call optimization", which
  appears to be ABI-stable. But it also has some limitations regarding
  arguments passed on the stack and it is currently limited to
  x86/x86_64
- There is an ongoing transition to opaque pointers (started in 14, will be
  completed in 15 and the old pointer system is planned to be
  deprecated/removed in 16).
- Pointers can be either integral or non-integral. I don't know if
  non-integral pointers can support arenas (bitmasking to get arena/page
  base address where the bump pointer etc. is).
- A C++ compiler might be required to link to the C API (!?)
  No, this seems to be just a rumor.
- LLVM does not do linking, so ld has to be called (and this makes cross-compilation and compilation on Windows harder)
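
For reference, the arena bitmasking mentioned in the pointer item above looks
roughly like this in plain C. This is a sketch with assumptions: `ARENA_SIZE`,
`arena_header` and `arena_of` are hypothetical names, and arenas are assumed
to be power-of-two sized and aligned to their own size. Whether LLVM's
non-integral pointers permit the equivalent pointer/integer round trip is
exactly the open question.

```c
#include <stdint.h>

/* Assumed arena size; must be a power of two, and every arena
   must start at an address that is a multiple of it. */
#define ARENA_SIZE ((uintptr_t)4096)

/* Hypothetical header stored at the base of each arena. */
struct arena_header {
    unsigned char *bump;  /* next free byte in this arena */
};

/* Find the arena that contains p by masking off the low address bits. */
static inline struct arena_header *arena_of(const void *p)
{
    return (struct arena_header *)((uintptr_t)p & ~(ARENA_SIZE - 1));
}
```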

Misc notes
----------
* Data section should be easy
    - No pointers
        => No relocations or initializers.
    - All global data is read-only
        => Data section could be merged with executable section
           (but this could create gadgets for exploits)
        => Objects could be allocated close to the code/functions that use them
        => Can use PC-relative addressing
    - We're unlikely to have a lot of zero-initialized constant data

* Types:
    - We need to compute the layout of each type
    - Struct layout is different from C, but the algorithm is simple
        - Fill from first to last field
        - Prefer allocation inside alignment padding gaps when possible

* Imported functions/data
    - Generate dynamic library imports for these
    - Calls follow calling conventions strictly

* Exported functions
    - Generate dynamic library exports for these
      ("main" might be special, and might go through the runtime)
    - Calls follow calling conventions strictly

* Startup code
    - On *nix we have to use some kind of "crt*.o" code from the C library
      in order to be able to do dynamic linking, filesystem/network access
      (that respects LD_PRELOAD), DNS lookups, username lookups, etc.

* Code:
    - ...

* Other:
    - Different types of outputs:
        - executable, library, object file
        - dynamically linked, statically linked
    - Also non-binary outputs: headers
        - These should only be generated once, even if multiple targets have been specified
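
The struct layout rule under "Types" (fill from first to last field, prefer
alignment padding gaps) could be sketched as follows. This is one possible
reading, not a specification: `field`, `gap`, `layout_struct` and `MAX_GAPS`
are hypothetical names, alignments are assumed to be powers of two, and space
skipped to align a field inside a gap is simply discarded to keep the example
short.

```c
#include <stddef.h>

#define MAX_GAPS 16  /* assumed limit; gaps beyond this are not tracked */

struct field { size_t size, align, offset; };  /* offset is the output */
struct gap { size_t start, end; };             /* unused bytes [start, end) */

/* Round v up to a multiple of a (a must be a power of two). */
static size_t align_up(size_t v, size_t a) { return (v + a - 1) & ~(a - 1); }

/* Assigns offsets to all fields; returns the struct size. */
size_t layout_struct(struct field *fields, size_t n, size_t *struct_align)
{
    struct gap gaps[MAX_GAPS];
    size_t num_gaps = 0, end = 0, max_align = 1;

    for (size_t i = 0; i < n; i++) {
        struct field *f = &fields[i];
        int placed = 0;
        if (f->align > max_align) max_align = f->align;

        /* Prefer a padding gap left behind by an earlier field. */
        for (size_t g = 0; g < num_gaps && !placed; g++) {
            size_t off = align_up(gaps[g].start, f->align);
            if (off + f->size <= gaps[g].end) {
                f->offset = off;
                gaps[g].start = off + f->size;  /* keep only the tail */
                placed = 1;
            }
        }
        if (!placed) {
            /* Append at the end and remember any padding as a new gap. */
            size_t off = align_up(end, f->align);
            if (off > end && num_gaps < MAX_GAPS) {
                gaps[num_gaps].start = end;
                gaps[num_gaps].end = off;
                num_gaps++;
            }
            f->offset = off;
            end = off + f->size;
        }
    }
    *struct_align = max_align;
    return align_up(end, max_align);
}
```

With fields of sizes 1, 8, 1 and 4 (each aligned to its own size), this
sketch places the last two fields in the padding between the first two and
yields a 16-byte struct, where the usual C layout would need 24 bytes.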