Jagged arrays
=============

Jagged arrays are arrays whose elements have different sizes, where the
elements are still stored sequentially *within* the array.

Jagged arrays are needed to allocate data as static constant data
(which cannot contain references in SLUL). For example:

- For arrays of strings
- For lists of lists (of possibly different size)

For arrays of strings:

- Large arrays of strings could perhaps be compressed as a whole.

Alternative solutions:

- Relative pointers
- Compile-time execution (this could provide compression as well)

A general solution: Serialize the array
---------------------------------------

List should be an opaque/private type, so normally it would not be possible
to use it as a non-reference type, and hence it could typically not be
allocated as static constant data.

Note: This could possibly make it easier to perform exploits, because the
serialized format does not involve any absolute pointers, and thereby
defeats ASLR in a way.

The following types need special handling:

- Strings
- Lists
- Maps

A workaround could be to provide a stable serialization format for those
types, and have some marker to distinguish them from normal arrays at
runtime.

So two things are necessary. First, some syntax for stable serialization
in the interface:

    type List<T>
        # Syntax with a per-type sentinel value. Not perfect, because it
        # would be nice to have multiple sentinel values to support mixed
        # 8/16/32/64 bit sizes/lengths/offsets.
        serialization array(T) sentinel usize .max
        # Syntax with a fixed range of sentinel values per type (usize, ref, ...)
        serialization array(T) sentinel usize
        # Syntax where the sentinel value is a type-independent bit pattern.
        # On little-endian platforms, this means that the first field has to
        # reserve the highest N values (e.g. 0xF0..0xFF).
        # On big-endian platforms, a fixed number of bits are always 1 and the
        # high-order bits in the following machine-word contain the mode bits.
        # The same range of highest N values still has to be reserved for
        # source-level platform independence.
        serialize array(T)

Then the sentinel value in the impl also:

    type List<T> = struct {
        usize len
            cond != .max  # must be forbidden here, or you get an error!
        ...
    }

Then some special casing in methods:

    func List.get(usize index) -> T
    {
        if this.is_serialized() { # compiler built-in function
            Deserializer des = .new(this)
            des.skip(index)
            # note that the element is guaranteed to be of a serializable
            # type, or one would get a compiler error.
            return des.get()
        } else {
            # Handle normally
            ...
        }
    }

Methods that allocate Lists would have to limit the range, so that they do
not accidentally use the sentinel value as a length:

    func .new_allocated(arena, usize length, slot T initval) -> arena List<T>
        cond length < .max
        # also: should constraints go here or immediately after the parameter?

And then some usage of the serialization:

    # The type MUST be serializable here!
    data List<List<string>> the_lists = {
        { "Hello", "world" },
        { "abc", "123" },
    }

Note that this can't be extended to arrays, because arrays don't contain
any struct/metadata about the array, so there is no way to distinguish
between normal and serialized data. This means that []string cannot be
used for static constant data.

Format of the serialized data
-----------------------------

The serialized data would have to support pointers to interior
fields/elements! And it cannot have normal (absolute) pointers inside
itself, only relative pointers.

So perhaps some encoding like this could be used:

    sentinel value, i.e. max value
    total length of object (including elements and their nested elements).
        Needed only if memory mapping of non 100% trusted data is to be
        supported.
    number of elements
    relative pointer to first element
    relative pointer to second element
    ...
    relative pointer to n'th element
    serialization of element 1 (with sentinel value again)
        ... any nested objects inside element 1
    serialization of element 2 (with sentinel value again)
        ... any nested objects inside element 2
    ...
    serialization of element n (with sentinel value again)
        ... any nested objects inside element n

Mapping serialized data from files
----------------------------------

Note: There seems to be no safe way to do this, because the mapped file
could be modified after the safety checks (i.e. a race condition). So it
should only be done on unmodifiable system files. Perhaps it should only
be allowed from pre-defined directories (e.g. /usr/lib/mmap/ or ~/.mmap/).

Note 2: There should be some indication in the filename that the contents
could trigger security exploits. Perhaps the files should be plain ELF .so
files? Or perhaps they should end with .insecure, .yolo or something.

Note 3: Such files cannot be portable. Consider that integers and booleans
are encoded as-is; if a reference is created to such a value, then it MUST
have the system endianness/size, i.e. it's not portable. Even the sentinel
values have this problem.

Note 4: When the safety checks below fail, there should be an error
message informing the user that the safety checks are prone to race
conditions.

An advantage of supporting "direct usage" of serialized data like this,
without any conversion, is that serialized data could possibly be mmap'ed
instead of requiring parsing, memory allocation, copying, etc. But the
files MUST have the sentinel values in place, or the safety checks of the
compiler could be bypassed.

The following safety measures are required:

1. Before getting the "root pointer" to the outer array/dictionary of the
   data, the object must be checked.
2. Before getting a pointer to a nested object, the nested object must be
   checked.
3. The mapping must not be modifiable by other processes.

Checks for objects:

1. It should be checked that there is a sentinel value at the start, so
   the data does not get parsed as non-serialized data (which would be
   used as-is, without deserialization).
   (And if there are different sentinel values for different types, then
   the type has to be known.)
2. The end address (start address + total length) must not exceed that of
   the outer type (or the size of the memory mapping, if it is the
   outermost object).

Related problem: Enums with values
----------------------------------

    type ItemKind = enum(string id, string description) {
        box("box", "a box")
        chair("chair", "a chair")
        table("table", "a table")
    }

Problem: Keeping the data private!
Problem: How to store the data?

Solution: Jagged array constants with checked indexes:

    type ItemKind = enum {
        box
        chair
        table
    }

    # in private impl file:
    stringtable item_ids = {
        "box"
        "chair"
        "table"
    };
    stringtable item_descriptions

Advantages with this solution:

1. Implementation (=values) can be hidden
2. stringtable can be used as a generic stringtable
   Negative indices are not used. But what about the length??
   Should it be stored somewhere?
3. Values can be defined in a different module from the enum definition!
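
A rough sketch of the stringtable idea, written in Python rather than SLUL
since it is only meant to illustrate the data layout. Everything about the
layout here is an assumption and not part of the design above: the table is
one pointer-free blob with a little-endian u32 count, then count+1 offsets
relative to the start of the blob, then the UTF-8 string data. The names
build_stringtable and stringtable_get are hypothetical. Storing the count
as the first field is one possible answer to the length question; lookups
check the enum index against it.

```python
import struct
from enum import IntEnum


class ItemKind(IntEnum):
    """Public enum: only the names are visible in the interface."""
    BOX = 0
    CHAIR = 1
    TABLE = 2


def build_stringtable(strings):
    """Pack strings into one blob without absolute pointers.

    Assumed layout (little-endian u32 fields): count, then count+1
    offsets relative to the blob start, then the UTF-8 string data.
    """
    encoded = [s.encode("utf-8") for s in strings]
    header_size = 4 * (1 + len(encoded) + 1)  # count + (count+1) offsets
    offsets = [header_size]
    for e in encoded:
        offsets.append(offsets[-1] + len(e))
    header = struct.pack(f"<{len(offsets) + 1}I", len(encoded), *offsets)
    return header + b"".join(encoded)


def stringtable_get(table, index):
    """Return the string at `index`, rejecting out-of-range indexes."""
    (count,) = struct.unpack_from("<I", table, 0)
    if not 0 <= index < count:
        raise IndexError(f"stringtable index {index} out of range")
    # Entry i spans offsets[i]..offsets[i+1], both relative to the blob.
    start, end = struct.unpack_from("<2I", table, 4 + 4 * index)
    if not start <= end <= len(table):
        raise ValueError("corrupt stringtable offsets")
    return table[start:end].decode("utf-8")


# "Private impl" side: the values live apart from the enum definition.
ITEM_IDS = build_stringtable(["box", "chair", "table"])
ITEM_DESCRIPTIONS = build_stringtable(["a box", "a chair", "a table"])
```

With these tables, stringtable_get(ITEM_DESCRIPTIONS, ItemKind.CHAIR)
gives "a chair", and an index equal to or above the stored count (or a
negative one) raises IndexError instead of reading outside the table.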