aboutsummaryrefslogtreecommitdiff
path: root/notes/combined_list_and_array.txt
blob: 71497c754e86c07d496414e06e5747e15681bd09 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118

Combined list and array type
============================

It is inconvenient to have two separate types for lists and arrays that
require conversion. But I think lists and arrays can be combined.

The same techniques presented here could be used for strings also.


Why arrays?
-----------

* Arrays can be allocated inline in a struct without any header.
  This is useful for interoperating with C and for e.g. hardware or kernel
  data structures that need a certain layout.
* Arrays has the size as part of the type, which means that can detect an
  incorrect size at compile-time.


Overview of idea
----------------

Add support for dependent generic types, that is, non-type parameters to
generic types.

Perform automatic conversions via auto-boxning (allocate a header/metadata
object with a pointer to the array) whenever it needs to be passed as a List
or in a type-parameterised type.


Possible Syntax
---------------

    # Unconstrained variable-sized list
    List Int uvl
    # Fixed-size list
    List[8]Int fl
    # Constrained variable-sized list
    List[len]Int cvl

This syntax could possibly also be used for non-dependent types, in order to
have better visual grouping:

    List[]Int l
    MapFromString[]SomeItem m


Data Layout and Values
----------------------

Unknown-sized lists:

* Size is NOT known at compile time
* May NOT be embedded inline in objects, always indirected via pointer.
* Passed as a single pointer
  (empty list may have a special representation)
* List object contains a metadata header
* Elements are always "generic-word-sized"
  (larger elements are indirected via pointers).
* If there are few elements AND they are not in an existing list,
  they might be stored directly in the header.

Fixed-size list:

* Size is known at compile time
* May be embedded (with the `embed` keyword) inline into objects.
* No metadata header, data is stored immediately at the location.
* Passed as a single pointer
* Elements have their natural size

Constrained variable-size list:

* Size is NOT known at compile time
* May NOT be embedded inline in objects, always indirected via pointer.
* No metadata header, data is stored immediately at the location.
* Passed as a single pointer
* Elements have their natural size

In general, it should work, but it could trigger a lot of copies between the
formats.


Automatic Conversions
---------------------

Unknown-size to fixed-size or variable-size:

* Require an explicit check that the size matches.
* The source and target lists must be immutable.
* If element type is smaller than the generic-word-size, then it requires
  a re-build of the array (to get the expected element sizes / offsets) and
  that may require an allocation.
    - Very small arrays can be stored directly in one or two registers when
      passing them to functions etc.
* If element type is generic-word-size or larger, then just return a pointer
  to it. Note that the elements might be stored inline if they are few, or
  indirected via a pointer if not.

Fixed-size or variable-size to unknown-size:

* The source and target lists must be immutable.
* Allocate the metadata header. If the target is known to not escape, then
  it could be allocated on stack.


Extending it to Strings?
------------------------

This could be extended to strings to allow them to be stored embedded/inline
in objects. However, here are some problems:

* It could trigger a lot of copies, in practice it could happen more
  frequently than with lists due to how often strings are passed around.
* The parameter would be the SIZE (in bytes) and NOT the width or character
  count of the string. That could be confusing.

Also, it is redundant with `List(n)Byte`, so it might be better to skip it
but allow easy conversion from `List(n)Byte` to `String` and vice versa.