aboutsummaryrefslogtreecommitdiffhomepage
path: root/notes/unified_lists_arrays.txt
blob: 60705f64d593a043f71ae140ec5d4002b0fdba29 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91

Unified lists, arrays (and iterators/pipes?)
============================================

Problems with lists (vs arrays):

1. Can't be stored directly in strutcs/arrays (need a reference).
2. Types smaller than a pointer use extra space
   (i.e. byte, int16, sometimes int32, and on CHERI platforms even int64).
3. The length cannot be specified in the type system

Solution to 1 & 2: Serialization
--------------------------------

Fixed-size lists in structs could be stored in serialized form.
For example, for this struct:

    struct {
        uint16 info
        List<byte,3> array
        uint16 trailer
    }

It could be stored like this in memory:
(on a 32-bit platform, but the technique works on 64-bit as well):

    uint16  info
    uint16  alignment_padding1
    uint32  serialization_marker_and_type  (type = immutable byte array)
    uint32  length   (small lengths could be merged into the marker)
    byte    value1
    byte    value2
    byte    value3
    byte    alignment_padding2
    uint16  trailer

Note: The serialization marker could include a mutability bit to allow
for modification (but not size change) or serialized lists.

Advantages:
* Can be used where a reference to a list is expected
  (no copying needed).
* Can be used in .rodata
* Can be used for mutable but fixed-size lists.

Disadvantages:
* Can obviously only support fixed-size lists.
* Uses up to two additional words + alignment padding before and after.
  (in the example, it is 8+2 = 10 additional bytes for a 3 byte array,
   on a 64-bit system it would have been 16+7 = 25 additional bytes.
   I.e. 4.3x vs 8.3x memory usage, but this is close to the worst case)
* The serialization format of immutable-List<byte> is different from String.
  Note that String cannot have a fully compatible format, because it is
  not fixed size and it can additionally change size on element mutation
  (non-UTF-8 or NUL bytes in the beginning require a header with length etc).

Solution to 3
-------------

Allow a type parameter to be:
* Type only    <T>
* Value only   <v>
* Value+Type   <v T>
* ?Value+Type  <v T> or <T>

Usage syntax:

    Cache<byte>
    BitArray<3>
    List<byte>
    List<3 byte>
    List<len byte>

Definition syntax with support for usize only:
(should booleans, enums and signed integers be allowed?)

    type Cache<T>
    type BitArray<len>
    type List<?len T>


Problem: Confusing when some types can be used without "ref"
------------------------------------------------------------

It could be confusing that List<n T> can be placed directly in a
struct/array, but List<T> or "OtherType" or "PrivateType" can't.

Solutions:
* Automatically add "ref" if needed?
* Support serialization on user-defined types?
* Make List<T> a fully built-in type and call it
  (along with string) by a lower-case name?