Avoiding linking against libc with slulrt-interp
================================================

Idea: libslulrt contains all functionality needed by basic SLUL programs.
It links to the libc library, and is dependent on the C library version.

We don't want binaries built by SLUL to depend on the C library, because
the C library does not have a stable ABI on all platforms (e.g. OpenBSD).

(TODO check this "probably" item):
Probably, we need to execute the initialization of the C library (crt*.o).
But I think this is not PIC code? So we can't include it directly in
libslulrt.

Instead, we can use the following setup:
* Custom _start in executables built by SLUL. This code calls some function
  in libslulrt (e.g. slulrt_start_native_syslib), and then calls some kind
  of main function (which could have a SLUL-specific parameter list etc).
* slulrt_start_libc maps the crt*.o into memory, relocates it and executes
  whatever initialization code is there.
  [crt1.o DOES some kind of initialization!]
* libslulrt depends on libc.so, so it gets loaded by ld-linux.so
  (or whatever linker is used on the system).

NOTE: Since SLUL code should also be usable in C programs, the call to
slulrt_start_native_syslib must NOT be required for programs that contain
standard crt*.o initialization code.
(Also, the slulrt_start_native_syslib call should only ever be made by the
_start routine, NEVER by any other code.)

ld-linux and libc.so are apparently tightly coupled... So creating a
slulrt-interp would be tricky, because it would have to load all dynamic
libraries etc.
- Using the target system's default interpreter (linker) means that the
  resulting binaries will be locked to the corresponding libc. But as long
  as the interpreter (linker) does not have a version in its name, it
  should still not lock the binaries to any particular version.
  (TODO: check how the BSDs, which have an unstable libc, handle the
  interpreter (linker) filename on major version changes.)

What does ld-*.so actually do?
==============================

Can we have a ld-slul-.so that does the following:
- patches the executable in memory, such that it:
  1) depends on the system libc (e.g. libc.so.6), and
  2) contains a relocated crt1.o (copied from ld-slul.so, so the binary
     does not need to have it)
  BUT..... we cannot call any libc functions at this point. All mappings
  must be prepared by the loader, by ELF "info" in the executable or in
  ld-slul.so
- loads the system linker (e.g. ld-linux.so) and executes it, but makes it
  operate on the main slul library (which has been patched).
- when libc initialization is done, we can unmap the ld-slul.so code.

When there is an interpreter present, does the loader (Linux/BSD/whatever)
check that the system-specific values (like ABI) in the ELF header match?
- on Linux, the ABI seems to be System V (instead of Linux!!!) with ABI
  version +0 (at least for dynamic executables and the loader)
  - some(!) dynamic executables use the Linux ABI (or "UNIX - GNU")
  - relocatable (.o) files also use "System V" as the ABI
  - same across architectures (x86_64 vs aarch64)
- static binaries (x86_64 at least) seem to use Linux as the ABI.
(maybe the ABI field is totally ignored??)

How do "libc patchers", such as fakeroot/proxychains/etc, react to custom
interpreters?
How do debuggers / toolchain tools react to custom interpreters?

ld-slul-.so should be a symlink to the real interpreter:
- ld-slul-.so
- for example: ld-slul-aarch64.so --> ld-slul-aarch64-linux-gnu.so

PROBLEM: ld-slul.so cannot call any system functions, but it needs libc
and ld.so to be mapped into memory. We probably can't make the kernel do
that.

So it looks like using a custom SLUL program interpreter file will not work.

Final solution?
===============

Executable contains the following on ELF platforms:
* _start code that does the minimal needed initialization (of argv/env etc,
  if needed on the target platform) and calls _SLULRT_start
* Program interpreter is the default for the system, i.e.
  possibly locked to a specific libc if ld.so and libc6 are coupled.
* Depends on libslulrt.so (no libc). The program interpreter is responsible
  for looking it up in the correct directory
  (e.g. /usr/lib/ or /usr/lib, /usr/lib{32,64} etc)
* _SLUL_main function

libslulrt.so is specific to the triple and contains the following:
* crt*.o somehow embedded
* memory mappings for crt*.o in the ELF program header (we cannot call mmap
  until libc has been initialized, i.e. crt*.o has been called)
* main function that calls _SLUL_main
* _SLULRT_start that:
  1) prepares crt*.o (if this is needed? e.g. relocations, if ld.so does
     not do that for us)
  2) calls the crt*.o startup code (with our "main" function in libslulrt)
  3) if crt*.o does not call main, then calls "main" itself

Note: We cannot do any SLUL-specific initialization, since _SLULRT_start
will not get called in C executables.

PROBLEM 1: Can we include crt*.o on all ELF platforms, or will there be
license issues? Perhaps we can include an MIT-licensed re-implementation
of it?
PROBLEM 2: We don't want to depend on cross compilation support of the
C compiler.
SOLUTION to 1&2: Write crt*.o as C macros that evaluate to binary machine
code. (If it's large we could include SLUL code also, but SLUL does not
support unsafe code.)

NOTE: We should increase SONAME versions in lockstep with the ld.so file,
e.g. /lib/ld-linux-aarch64.so.1 -> libslul.so.1 (Linux/glibc)

Many names for ld.so:
- /lib/ld-linux.so.2 (32 bit), /lib/ld-linux-aarch64.so.1 etc.
  (Linux/glibc)
- /lib/ld.so.1, /lib/64/ld.so.1, /lib/ld-elf.so.1, ... which one is
  actually used? (FreeBSD/libc)
- /lib/ld64.so.1, /lib/ld.so.1 ... which one is actually used? (OpenBSD)

NOTE: We need to be really careful in _SLULRT_start since it will also be
called in setuid/setgid binaries!
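
The "which one is actually used?" questions above can be answered per
platform by reading the PT_INTERP program header of an existing executable
on that system. The following is only a minimal sketch in C, assuming a
little-endian ELF64 image that has already been read into memory. The names
elf64_find_interp and rd_le are made up for illustration and are not part
of libslulrt; the offsets are those of the ELF64 header and program header
layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PT_INTERP_TYPE 3u /* value of PT_INTERP in the ELF specification */

/* Read an unsigned little-endian integer of nbytes bytes (hypothetical helper). */
static uint64_t rd_le(const unsigned char *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

/* Returns a pointer (into buf) to the NUL-terminated interpreter path,
 * or NULL if the image is not little-endian ELF64 or has no PT_INTERP. */
static const char *elf64_find_interp(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 64 || memcmp(buf, "\177ELF", 4) != 0) return NULL;
    if (buf[4] != 2 || buf[5] != 1) return NULL; /* ELFCLASS64 + ELFDATA2LSB */

    uint64_t phoff     = rd_le(buf + 32, 8);     /* e_phoff */
    uint64_t phentsize = rd_le(buf + 54, 2);     /* e_phentsize */
    uint64_t phnum     = rd_le(buf + 56, 2);     /* e_phnum */
    if (phoff > len || phentsize < 56) return NULL;

    for (uint64_t i = 0; i < phnum; i++) {
        uint64_t off = phoff + i * phentsize;
        if (off > len || len - off < 56) return NULL;
        const unsigned char *ph = buf + off;
        if (rd_le(ph, 4) != PT_INTERP_TYPE) continue; /* p_type */

        uint64_t p_offset = rd_le(ph + 8, 8);
        uint64_t p_filesz = rd_le(ph + 32, 8);
        if (p_filesz == 0 || p_offset > len || len - p_offset < p_filesz)
            return NULL;
        if (buf[p_offset + p_filesz - 1] != '\0') return NULL;
        return (const char *)(buf + p_offset);
    }
    return NULL;
}
```

Running something like this over /bin/sh on each target system would show
which ld.so path that system's own toolchain writes into executables. The
same lookup could serve a libslulrt that has to tell several C libraries
apart by interpreter path.
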
NOTE: If there are multiple ld.so files for the same multiarch on the
system, then libslulrt may need to support all of the corresponding
C libraries, and detect which one to use by checking the PT_INTERP string
in the executable. (Can this ever happen?)

PE platforms (Win32/Win64)
--------------------------

PE executable contains the following:
* Dependency on slulrt.dll (and never on msvcr*.dll or crtdll)
* WinMain function that calls _SLULRT_start
* _SLUL_main function

slulrt.dll contains the following:
* Dependency on kernel32.dll, user32.dll and maybe ntdll.dll also?
* NO dependencies on msvcr*.dll/crtdll!
* _SLULRT_start function that calls _SLUL_main
  (this function should not have to do anything else)

Building slulrt
===============

* slulrt needs crt*.o on ELF platforms.
* slulrt needs to use symbol versioning (because otherwise distros will not
  consider it to be a stable ABI)
* slulrt needs to link to the system libc. On BSD, this means using the
  system C headers.

---> So the ELF versions (and especially the BSD versions) need to be coded
     in C and built on each platform.

Alternative ELF solution
========================

It looks like *maybe* ld.so does the C library initialization. If so,
_start could be quite simple:
1) check for library destructors that need to be registered with atexit
   (apparently these are passed in RDX on x86_64)
2) handle argv/envp and align the stack correctly
3) call slul_main
4) on return, call exit (which calls the atexit callbacks)

This way crt*.o can be avoided altogether (not sure if it is safe to do so
on *BSD, though).

This will only work for dynamic executables. For static executables it
looks like we need crt*.o, but on the other hand, static executables are
a low priority (and fully static binaries are not really
meaningful/possible on *BSD or Windows).

Maybe it is still safer to explicitly call the libc init functions?
This should be done in libslulrt.so, and the (machine) code for doing so
could be embedded in libslulrt.so as well (it does not need to copy crt*.o,
at least not on linux-gnu or linux-musl).
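
Step 2 of the simple _start described above (handling argv/envp) could look
roughly like this in C. This is only a sketch, and it assumes the System V
layout of the initial process stack: argc, then the argv pointers, a NULL,
then the envp pointers, and another NULL. The names start_args and
parse_initial_stack are hypothetical. The real _start would have to capture
the stack pointer in machine code before any C code runs, and the stack
alignment part cannot be expressed in portable C at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type; not part of any existing SLUL API. */
struct start_args {
    int argc;
    char **argv;
    char **envp;
};

/* sp points at the word holding argc, as found at process entry. */
static struct start_args parse_initial_stack(char **sp)
{
    struct start_args a;
    assert(sp != NULL);
    a.argc = (int)(intptr_t)sp[0];  /* first word is argc, not a pointer */
    a.argv = sp + 1;                /* argc pointers, then a NULL */
    a.envp = a.argv + a.argc + 1;   /* environment starts after that NULL */
    return a;
}
```

The auxiliary vector follows the NULL that ends envp, so a runtime that
needs it (for example to be careful in setuid/setgid binaries) would keep
walking from there.
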