Expressions
===========
An expression is either a value, an operation (such as addition) or one of
the "special expressions". The type of an expression is usually determined
by the type of the target, so in the following declaration

    byte x = expr;

_expr_ would be taken to be of the **byte** type.

Values
======

Identifiers
-----------
An identifier in an expression references some kind of data (such as a
variable or a constant). An identifier expression always has the same type
as the definition of the variable. See identifiers.md for an explanation
of how identifiers work in LRL.

Type identifiers
----------------
A type identifier is a special kind of identifier that has a colon in front of
it. Such identifiers are searched for in the namespace of the _target type_.
For instance, let's assume that the "bool" type is defined like this

    typedef bool = enum(false, true);

then the following declaration

    bool b = :false;

would search for the identifier "false" in the bool typedef. Type identifiers
may also reference functions or data definitions in the namespace (and nested
namespaces) of the target type's definition. If the target type is not a
typedef, then the search for a typedef will continue according to the following
rules, which are applied recursively:

 - **pointer type**: continue with the type that the pointer type points to.
 - **optional type**: continue with the value type.
 - **array type**: continue with the element type.
 - **parametric type**: bind the type parameters and continue with the base type
   (which should be a typedef).
 - **enum type**: search among the identifiers of the values.
 - otherwise, stop and report an error.
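
The recursive search can be modelled outside of LRL. The following Python
sketch is only an illustration of the rules above, not the compiler's
implementation: all class and function names are made up for this example,
the binding of type parameters is left out, and it assumes that the values of
an enum typedef are found through the typedef itself (as in the "bool" example).

```python
from dataclasses import dataclass, field


@dataclass
class EnumType:
    values: tuple


@dataclass
class PointerType:
    target: object


@dataclass
class OptionalType:
    value: object


@dataclass
class ArrayType:
    element: object


@dataclass
class ParametricType:
    base: object  # should be a Typedef
    arguments: tuple = ()


@dataclass
class Typedef:
    name: str
    definition: object
    namespace: dict = field(default_factory=dict)  # functions and data definitions


def find_type_identifier(target, name):
    """Return a description of where :name was found, or raise LookupError."""
    if isinstance(target, Typedef):
        if name in target.namespace:
            return f"{name} in namespace of {target.name}"
        # Assumption: enum values are visible through the typedef.
        if isinstance(target.definition, EnumType) and name in target.definition.values:
            return f"{name} in enum values of {target.name}"
        raise LookupError(f"{name} not found in typedef {target.name}")
    if isinstance(target, PointerType):
        return find_type_identifier(target.target, name)
    if isinstance(target, OptionalType):
        return find_type_identifier(target.value, name)
    if isinstance(target, ArrayType):
        return find_type_identifier(target.element, name)
    if isinstance(target, ParametricType):
        # Binding of the type parameters is omitted in this sketch.
        return find_type_identifier(target.base, name)
    if isinstance(target, EnumType):
        if name in target.values:
            return f"{name} in anonymous enum"
        raise LookupError(f"{name} is not a value of the enum")
    raise LookupError(f"cannot search for {name} in {target!r}")


bool_type = Typedef("bool", EnumType(("false", "true")))
print(find_type_identifier(bool_type, "false"))
# The target type bool?#[...] is searched through the array and the optional.
print(find_type_identifier(ArrayType(OptionalType(bool_type)), "true"))
```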

Numbers
-------
LRL currently supports decimal and hexadecimal integers, as well as floating
point numbers. The type of an integer is determined by the number: larger
numbers need larger integer types, and negative numbers need signed integer
types. For example:

      0     // type is eint8 (sign-less 8-bit integer)
      127   // type is eint8
      128   // type is uint8 (unsigned 8-bit integer)
     -128   // type is int8  (signed 8-bit integer)

    // Hexadecimal is written like this
     0x7f   // type is eint8
    -0x80   // type is int8
     0xff   // type is uint8
    0x100   // type is eint16
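
The pattern in these examples can be written down as a small function. This
is a Python sketch rather than LRL code, and it assumes that the same rule
continues for the 16, 32 and 64 bit types; the function name is made up for
the illustration.

```python
def integer_literal_type(value: int) -> str:
    """Return the name of the smallest integer type that fits the literal."""
    for bits in (8, 16, 32, 64):
        if value < 0:
            # Negative literals need a signed type.
            if value >= -(1 << (bits - 1)):
                return f"int{bits}"
        elif value < (1 << (bits - 1)):
            # Fits in both the signed and the unsigned range: sign-less "eint".
            return f"eint{bits}"
        elif value < (1 << bits):
            # Needs the top bit, so only the unsigned type can hold it.
            return f"uint{bits}"
    raise ValueError(f"{value} does not fit in a 64-bit integer type")


for literal in (0, 127, 128, -128, 0x7f, -0x80, 0xff, 0x100):
    print(literal, integer_literal_type(literal))
```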

Optionally, numbers may be written with thousands separators like this:

    1_000_000

See types.md for more information about the numeric types. There are also
floating point number literals. For these you need to explicitly specify the
type using an as-expression:

    3.14 as float
    3.14 as float32
    3.14 as float64
    3.14 as cfloat

Floating point numbers may also be written with thousands separators, and may
in addition have a power-of-ten exponent:

    1_000.0
    4_321.987_654
    1e5             // 1 * 10^5
    1.234e6         // 1.234 * 10^6 = 1 234 000
    1_234.567e3     // 1 234.567 * 10^3 = 1 234 567
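
To check what value such a literal stands for, the separators can simply be
removed before the number is parsed. The Python sketch below does this; the
function name is hypothetical, and it says nothing about how the LRL compiler
itself parses numbers.

```python
def parse_float_literal(text: str) -> float:
    """Convert a literal such as 1_234.567e3 to a Python float."""
    # Drop the thousands separators, then let float() handle the
    # decimal point and the power-of-ten exponent.
    return float(text.replace("_", ""))


for literal in ("1_000.0", "4_321.987_654", "1e5", "1.234e6", "1_234.567e3"):
    print(literal, "=", parse_float_literal(literal))
```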

For NaN and Inf, see "Special values".

Strings
-------
A string is an array of bytes, and is written like this:

    "Hello"

They are also directly translated to arrays of bytes in LRL. The encoding of
the source files should always be UTF-8, so non-ASCII characters will use
several bytes in the array. These bytes will have a byte value >= 128.

**TODO:** Currently, strings are translated to a byte pointer. Change to an array? But strings should be null-terminated if the length is not specified (e.g. if we take the address directly, as in @"Hello")
**TODO:** Enforce that strings (and identifiers) are actually UTF-8

Special characters such as quotes, newlines and backslashes can be escaped
with a backslash. For example "Double quote \"\nBackslash \\". The following
escape sequences are supported:

    \\      Backslash
    \"      Double quote
    \a      Alarm character (0x07)
    \b      Backspace character (0x08)
    \f      Form feed character (0x0C)
    \n      Linefeed character (0x0A)
    \r      Carriage return character (0x0D)
    \t      Tab character (0x09)
    \v      Vertical tab character (0x0B)
    \xNN    Raw hexadecimal byte 0xNN
    \uNNNN  Unicode character, 4 hexadecimal digits
    \UNN... Unicode character, 8 hexadecimal digits
    /\*     Start of comment /* (which must be escaped)
    *\/     End of comment */ (which must be escaped)
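
The table can be read as a mapping from source text to bytes. The following
Python sketch decodes the text between the quotes into a byte array. It is an
illustration only: the names are invented, and it assumes that \u and \U
characters are stored as UTF-8, like the rest of the string.

```python
SIMPLE_ESCAPES = {
    "\\": b"\\", '"': b'"', "a": b"\x07", "b": b"\x08", "f": b"\x0c",
    "n": b"\x0a", "r": b"\x0d", "t": b"\x09", "v": b"\x0b",
    "*": b"*", "/": b"/",  # from the /\* and *\/ comment escapes
}
HEX_ESCAPES = {"x": 2, "u": 4, "U": 8}  # escape letter -> number of hex digits


def decode_string_body(body: str) -> bytes:
    """Turn the text between the quotes into the resulting byte array."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out += ch.encode("utf-8")  # non-ASCII becomes several bytes >= 128
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash at end of string")
        kind = body[i + 1]
        if kind in SIMPLE_ESCAPES:
            out += SIMPLE_ESCAPES[kind]
            i += 2
        elif kind in HEX_ESCAPES:
            digits = body[i + 2 : i + 2 + HEX_ESCAPES[kind]]
            if len(digits) != HEX_ESCAPES[kind]:
                raise ValueError(f"truncated \\{kind} escape")
            code = int(digits, 16)
            # \x is a raw byte; \u and \U are assumed to be UTF-8 encoded.
            out += bytes([code]) if kind == "x" else chr(code).encode("utf-8")
            i += 2 + len(digits)
        else:
            raise ValueError(f"unknown escape \\{kind}")
    return bytes(out)


print(decode_string_body(r'Double quote \"\nBackslash \\'))
print(list(decode_string_body("é")))  # two bytes, both >= 128
```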

Arrays
------
A literal array value is written in square brackets, for instance like this:

    [1, 2, 3, 4]

The type of the array elements is determined by the target type. For instance,
in the following example the element values will be int32 values even though
they could be eint8 values:

    int32#[4] data = [1, 2, 3, 4];

Array values can be nested:

    byte#[4,2] data = [[1,2], [2,3], [3,4], [4,5]];
    
It's allowed to put a comma after the last element, which is useful when an
array spans multiple lines. For example, like this:

    byte#[4,2] data = [
                       [1,2,],
                       [2,3,],
                       [3,4,],
                       [4,5,],
                      ];

Structs
-------
Struct values are written in parentheses with elements separated by commas,
like this:

    (1, :true, 2, "test", 3.0, 4)

Just like array values, struct values may be nested and a final trailing comma
is allowed. Also, in structs with exactly one element you **have** to put a
trailing comma (so the parser can distinguish it from a grouping parenthesis).
For example:

    (123,:true,)  // trailing comma
    (123,)        // trailing comma is required if there's exactly one element
    ((1,2,3),1)   // nested struct
    ()            // empty struct

The types of the elements are determined by the members of the target type,
which must be a struct type. For example:

    (int, bool) a = (x, y);   // x must be an int, y must be a bool

Special values
--------------
In addition, LRL supports these special literal values:

 - **undefined** - this value can be used to indicate that the value should
   never be read from. It can be used by code analyzers to check that this
   is in fact the case.
 - **none** - this special value can be used with optional types, but also
   with "raw pointers" (see types.md). It's analogous to null, NULL or nil
   in other programming languages.
 - **NaN** - This is a special floating point value, which stands for
   "Not A Number". This is used to indicate an error in a calculation.
 - **Inf** - This is another special floating point value. It is used to
   represent infinity, and can be both positive and negative (i.e. +Inf and
   -Inf, though the + is optional).

Operators
=========

Arithmetic Operators
--------------------
Syntax:

    expr + expr
    expr - expr
    expr * expr
    expr / expr
    expr mod expr
    -expr
    +expr

**TODO** special cases?
**TODO** overflow/underflow is not allowed, except in wuint types.
**TODO** All operations except for "mod" require the target type to be known.
**TODO**

Bitwise Operators
-----------------
Syntax:

    expr bitand expr
    expr bitor expr
    expr bitxor expr
    expr << expr
    expr >> expr
    compl expr

**TODO** bitwise operations are not allowed on signed types
**TODO** compl and right shift are, in addition, not allowed on eint types (because the backing type could have a sign bit)
**TODO** check that "uint16 u = compl 1;" works, since 1 is an eint8
**TODO**

Boolean Operators
-----------------
Syntax:

    not expr
    expr and expr
    expr or expr
    expr xor expr

**TODO**

Comparison Operators
--------------------
Syntax:

    expr == expr
    expr != expr
    expr < expr
    expr <= expr
    expr > expr
    expr >= expr

**TODO** what rules apply to the types? e.g. you should be able to compare int/byte but not (int)/(byte) or int^/byte^
**TODO**

Assignment Operators
--------------------
Syntax:

    expr = expr
    expr += expr
    expr -= expr
    expr *= expr
    expr /= expr
    expr <<= expr
    expr >>= expr

**TODO** Describe what the left hand side expression is, and what types of expressions are allowed.
**TODO** All subexpressions are evaluated exactly once.
**TODO** Assignment operations may not be nested inside expressions.
**TODO** Multiple assignment is allowed with the = operator. Then the last expression will be read, and will be assigned in any order to the other expressions.
**TODO**

Address-of Operator
-------------------
Syntax:

    @expr

**TODO**

Dereference Operator
--------------------
Syntax:

    expr^

**TODO**

Size and Offset Operators
-------------------------
Syntax:

    sizeof expr
    minsizeof expr
    alignof expr
    offsetof expr

**TODO** what about determining the size of a type? Should there be something like "sizeof type int"?
**TODO**

enumbase Operator
-----------------
Syntax:

    enumbase expr

Extracts the base value of an enum value. For example:

    typedef Color = int enum (red=1, green=2, blue=3);
    
    Color g = :green;
    int i = enumbase g;

The operand must be an enum type, but there's no specific target type
(since the operand can be of any enum type). Note that the default base type
of enums is "count" and not "int".

makeopt Operator
----------------
Syntax:

    makeopt expr

Wraps the expression in an optional value type. The target type of the operand
is the value type of the operator's target type. The result is an optional
type. Here's an example:

    bool? b = makeopt :false;

**TODO** The C backend doesn't support makeopt for non-pointer types.

Optional Operator
-----------------
Syntax:

    expr?

Extracts the value of an optional value. If the expression is none, then
that's an error and the behavior is undefined. The target type of the operand
is an optional type of the target type of the operator.

then-else Operator
------------------
Syntax:

    condexpr then trueexpr else falseexpr

Evaluate either _trueexpr_ or _falseexpr_, depending on the value of
_condexpr_. The result of the expression is that of the expression that was
evaluated.

The expression _condexpr_ must be of bool type. The expressions _trueexpr_ and
_falseexpr_ have the same target type as the then-else expression.
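
The evaluation rule can be demonstrated with a small model. The Python sketch
below is not LRL: the two branches are passed as callables so that it is
visible which of them is evaluated, and all names are invented for the
example.

```python
def then_else(cond, true_branch, false_branch):
    """Model of `cond then a else b`; the branches are passed as callables."""
    if not isinstance(cond, bool):
        raise TypeError("condexpr must be of bool type")
    # Only the selected branch is called, so only one of them is evaluated.
    return true_branch() if cond else false_branch()


evaluated = []


def traced(label, value):
    """Record that an operand was evaluated, then return its value."""
    evaluated.append(label)
    return value


result = then_else(
    traced("condexpr", 3 > 2),
    lambda: traced("trueexpr", 10),
    lambda: traced("falseexpr", 20),
)
print(result, evaluated)
```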

Special Expressions
===================

Array Index Operation
---------------------
Syntax:

    arrayexpr#[indexexpr]
    arrayexpr#[indexexpr,indexexpr...]

An array index operation references the element at the given index in the
array. Array indexes are zero-based, so the allowed range of array indexes
in an array of type int#[5] is 0 to 4.

The index expression must be of type "count", or of a type that is strictly
smaller on every platform (which "int", for example, isn't necessarily).
Otherwise, you can use a **typeassert** expression:

    int i = 1;
    byte#[5] arr = [5,4,3,2,1];
    
    byte x = arr#[i typeassert count]; // convert i into a count type

**TODO** it would be nice to have a bounds check at the same time (and/or index types, e.g. typedef arr = byte#[5 indextype arrlentype])

See the section on the typeassert operation for more information.

Elements in nested arrays may be accessed by using comma-separated indexes,
like this:

    int#[3,2] nested = [[00,01], [10,11], [20,21]];
    
    int x = nested#[2,1]; // = 21
    int y = nested#[2]#[1]; // This is equivalent to the expression above

An out-of-bounds access is an error and has undefined behavior. However, it is
allowed to have a pointer point to the element (which doesn't exist) after
the last element. In the example below this index is 3 in the outer
array and 2 in the inner arrays. But referencing anything inside such a
non-existent element is not allowed. For example:

    int#[3,2] nested = [[00,01], [10,11], [20,21]];
    
    int#[2]^ optr1 = @nested#[0]; // Pointer to normal array index
    int#[2]^ optr2 = @nested#[3]; // Pointer to index one step
                                  // after the last one
    int^ iptr1 = @nested#[0,2]; // Pointer to index one step after
                                // the last one in the inner array.
    int^ iptr2 = @nested#[3,2]; // ERROR! Can't reference anything inside
                                // a non-existent element. This has undefined
                                // behavior.
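
The rule about the element after the last one can be summarized as: only the
last index in the list may be equal to the length of its dimension, and then
only a pointer may be formed. The following Python sketch classifies index
lists this way. It is an illustration with made-up names, not a description
of any check that the compiler performs.

```python
def classify_index(shape, indexes):
    """Classify an index list against an array shape.

    Returns "element" for an existing element, "one-past-end" when only a
    pointer may be formed, and "invalid" otherwise.
    """
    if len(indexes) > len(shape):
        return "invalid"
    for position, (index, length) in enumerate(zip(indexes, shape)):
        is_last = position == len(indexes) - 1
        if 0 <= index < length:
            continue
        if index == length and is_last:
            return "one-past-end"
        # Either out of range, or inside an element that doesn't exist.
        return "invalid"
    return "element"


shape = (3, 2)  # corresponds to int#[3,2]
for indexes in ([0], [3], [0, 2], [3, 2], [2, 1]):
    print(indexes, classify_index(shape, indexes))
```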

Struct Member Operation
-----------------------
Syntax:

    structexpr.membername

The struct member operator is similar to the array index operator, except
that it works on structs. It's used to access a value in a struct.
Here's an example:

    typedef Point = (int x, int y);
    
    () test() {
        var Point point;
        
        // Set a value in the struct
        point.x = 123;
        
        // Read a value from the struct
        int x = point.x;
    }

Function Member Operation
-------------------------
Syntax:

    expr->functionname

**TODO** maybe change to expr:>functionname or expr.:functionname
**TODO** also the current syntax makes it harder to add a -- operator
         (but x-->y can only be parsed as x-- > y)
**TODO** should it look up typedefs instead of structs?
**TODO**

Call Operation
--------------
Syntax:

    functionexpr()
    functionexpr(x)
    functionexpr(x,y)
    ...

**TODO**

as Operation
------------
Syntax:

    expr as type

This operation sets the target type. It is useful in situations where the
target type can't be determined, such as in comparison expressions:

    int a = 1;
    int b = 2;
    
    if a + b as int == 5-2 as int {
        // do something
    }

It is only there to help the type checker determine the type. It can't be
used to cast a type into another type. For that, use the typeassert
expression.

typeassert Operation
--------------------
Syntax:

    expr typeassert type

The typeassert operator casts the value of the expression into the given
type, if the type can hold that value. Unlike the typeassert statement, the
typeassert expression offers no way of handling cases where the value is not
compatible with the type (e.g. out of range). Using the typeassert expression
on incompatible values is an error, and produces undefined behavior.

Here's an example:

    int i = 3;
    byte b = i typeassert byte; // int to byte
    
    int#[5] arr = [5,4,3,2,1];
    int v = arr#[i typeassert count]; // int to count

Note that the result type of the expression to cast must be known. If not,
please use the "as" expression, like this:

    int i = 3;
    // the type of "i + 1" is not known
    byte b = i + 1 as int typeassert byte;