Notes: Usability, references, numeric types / comparisons, etc.

author: Samuel Lidén Borell <samuel@kodafritt.se> 2024-06-02 21:20:48 +0200
committer: Samuel Lidén Borell <samuel@kodafritt.se> 2024-06-02 21:20:48 +0200
commit: 580bf6130632f6855fddeea7b07c8401c56108f2 (patch)
tree: 4bd5e7cdb68408c52ad8df030f7f887c7d97def0 /notes
parent: db73835b12f41be8766384a1cdcc34a0848354dc (diff)
9 files changed, 671 insertions, 1 deletions
diff --git a/notes/comparison_semantics.txt b/notes/comparison_semantics.txt
new file mode 100644
index 0000000..44ed548
--- /dev/null
+++ b/notes/comparison_semantics.txt
@@ -0,0 +1,78 @@
+Comparison semantics / typing
+=============================
+
+In most other proglangs, the terms get assigned types, and then the outer
+expressions are recursively assigned types. This means that deciding the
+integer type sizes/signedness in comparison operations is trivial there.
+
+In SLUL, it works the other way around: First the outermost expression is
+assigned a type (i.e. boolean for a comparison expression), and then the
+types of the integer expressions have to be inferred somehow.
+
+
+Options:
+* Always require either that both sides have an unambiguous type, or that
+  the left side has an unambiguous type and the right side is a literal.
+* Use the largest type of any term, and report an error for mixed signedness
+  within one side. (Literals get promoted to this type, and an error is
+  reported if the literal value is not in range).
+    - Will easily trap with small types such as bytes.
+* Use the type of all terms, i.e. require all terms to have exactly the same
+  type.
+    - Will easily trap with small types such as bytes.
+* Like either of the above, but never use types smaller than int/uint.
+
+
+Orthogonally, there is also the question of whether and how comparisons with
+mixed signedness should work:
+
+* Forbid
+* Forbid, but allow if the signed operand is never negative.
+* Promote to unsigned (like C). But this is confusing.
+* Compare by value, i.e. out-of-range values are handled specially
+  (and always return true/false depending on the side and operator).
+
+
+Revise type detection/promotion entirely?
+-----------------------------------------
+
+There is a performance and usability problem with the current system for
+type detection. For example, given the following expression:
+
+    [3]byte a = ...
+    byte b = a[0] + a[1] + a[2]
+
+Currently, each addition has to be performed as a byte, and range-checked.
+That is both annoying (because it could overflow and/or give range errors
+at compile time) and also slow, because the compiler needs to insert
+instructions to range check and/or to remove excess bits.
+
+It would be better if it was computed as a uint/int.
+
+To avoid confusion, maybe the byte/int16 types should only be allowed
+in structs/arrays? (and maybe in function parameters).
+
+If calculations are done with higher bit-width, then there are some edge
+cases that need to be tested:
+
+    var byte u8
+    var wuint16 w16
+    var uint u
+
+    u = w16 = (u8 * u8 * u8)
+
+The multiplications could yield a larger number than "w16" can hold.
+In that case, "u" should still receive the non-truncated value.
+
+
+How other languages handle integer promotion
+--------------------------------------------
+
+* C:
+    - Promote to larger
+    - Promote to unsigned
+    - This leads to strange behaviour in mixed-signedness comparisons
+* Hare:
+    - Promote to larger (but limited for uintptr/size types)
+    - Mixed signedness is an error
+    - This also solves the comparison issue
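Note on the "compare by value" option in notes/comparison_semantics.txt: a minimal sketch in C of how a mixed-signedness `<` could be decided by value, next to the promote-to-unsigned behaviour listed for C. The helper names are hypothetical and only illustrate the idea; they are not part of SLUL or of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: "compare by value" for signed < unsigned.
 * A negative signed operand is smaller than every unsigned value,
 * so that case is decided before any conversion takes place. */
static bool less_than_by_value(int32_t s, uint32_t u)
{
    if (s < 0) {
        return true;            /* out of range for unsigned: always less */
    }
    return (uint32_t)s < u;     /* non-negative: the conversion is lossless */
}

/* Promote-to-unsigned, written out explicitly. This is what `s < u`
 * does in C when int is 32 bits wide. */
static bool less_than_promoted(int32_t s, uint32_t u)
{
    return (uint32_t)s < u;     /* -1 becomes 4294967295 */
}

/* Usage: the two rules disagree exactly when the signed side is negative. */
static void usage_example(void)
{
    assert(less_than_by_value(-1, 1u));
    assert(!less_than_promoted(-1, 1u));
    assert(less_than_by_value(3, 4u) == less_than_promoted(3, 4u));
}
```

The extra branch is the cost of the "compare by value" option; the "forbid" options avoid it by rejecting the comparison at compile time.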
diff --git a/notes/compilation_failure_testing.txt b/notes/compilation_failure_testing.txt
new file mode 100644
index 0000000..59e8c22
--- /dev/null
+++ b/notes/compilation_failure_testing.txt
@@ -0,0 +1,31 @@
+Compilation failure testing
+===========================
+
+With more advanced type systems and/or parameter constraints, it can be
+useful to be able to check that those are enforced correctly.
+
+That could be done with some kind of special test.
+There is apparently something like this being used in Rust,
+see nolife/counterexamples.
+
+Should something like this be added to SLUL?
+What should the syntax be like?
+
+Syntax 1: Extend long/nestable comments
+---------------------------------------
+
+    #{{nocompile
+    func test()
+    {
+        # ERROR: .*Invalid code.*
+        invalid
+    }
+    #}}
+
+Syntax 2: Special keywords
+--------------------------
+
+    noncompiling func test()
+    {
+        bad(invalid)
+    }
diff --git a/notes/deref_syntax.txt b/notes/deref_syntax.txt
new file mode 100644
index 0000000..eb3852d
--- /dev/null
+++ b/notes/deref_syntax.txt
@@ -0,0 +1,17 @@
+Syntax for pointer dereferencing
+================================
+
+Current syntax in SLUL:
+
+    deref p = 123
+    int y = deref p
+
+Pascal-like syntax:
+
+    p^ = 123
+    int y = p^
+
+Zig-like syntax:
+
+    p.* = 123
+    int y = p.*
diff --git a/notes/goto_local_labels.txt b/notes/goto_local_labels.txt
new file mode 100644
index 0000000..a91cbbe
--- /dev/null
+++ b/notes/goto_local_labels.txt
@@ -0,0 +1,27 @@
+Local goto labels
+=================
+
+This can be used to provide scope for goto labels and avoid accidentally
+jumping to an unrelated block.
+
+Syntax idea (borrowed from the GCC extension):
+
+    func f()
+    {
+        {
+            label skip
+            ...
+            goto skip
+            ...
+          skip:
+            ...
+        }
+        {
+            label skip
+            ...
+            goto skip
+            ...
+          skip:
+            ...
+        }
+    }
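Note on notes/goto_local_labels.txt: for reference, this is roughly what the GCC extension that the syntax idea borrows from looks like in GNU C. `__label__` declares a label local to the enclosing block, so the two blocks can reuse the name `skip`. The function name and the counter are made up for the illustration, and the code needs a compiler that supports this GNU extension (GCC or Clang).

```c
#include <assert.h>

/* GNU C extension: __label__ declares a block-local label, so each
 * `goto skip` can only reach the `skip:` in its own block. */
static int count_skips(void)
{
    int landed = 0;
    {
        __label__ skip;
        goto skip;
        landed += 100;      /* never executed */
      skip:
        landed += 1;
    }
    {
        __label__ skip;
        goto skip;
        landed += 100;      /* never executed */
      skip:
        landed += 1;
    }
    return landed;
}

/* Usage: both jumps land on their own block's label. */
static void usage_example(void)
{
    assert(count_skips() == 2);
}
```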
diff --git a/notes/numeric_operations_type.txt b/notes/numeric_operations_type.txt
new file mode 100644
index 0000000..53a760d
--- /dev/null
+++ b/notes/numeric_operations_type.txt
@@ -0,0 +1,28 @@
+Which type should numeric operations use?
+
+Currently, it works like this:
+
+    byte a = 250
+    byte b = 10
+    ...
+    byte x = (a + b) - 20   # this overflows, because a+b is computed as a byte
+
+And this is actually inefficient, because the operation would most likely be
+performed using 32 or even 64 bit registers. The exception is when the
+temporary variable for (a+b) is spilled to stack.
+
+Comparison to other languages
+-----------------------------
+
+C (from reading the ANSI C spec):
+    - performs operations as int or a larger type
+    - promotes to the largest operand (appears to not care about the result type)
+    - unsigned wins when there are operands with mixed signedness
+
+Rust (from reading random blogs):
+    - integer literals may carry a type suffix, e.g. 1u8; unsuffixed literals
+      are inferred from context and default to i32
+    - all integer conversions, even widening ones, have to be explicit
+
+FreePascal (don't remember this exactly):
+    - Promotes to Int64 in mixed-sign comparisons
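Note on notes/numeric_operations_type.txt: a small sketch in C of the alternative discussed above, where the arithmetic is done in a wider type and the range check happens once, when the result is narrowed. The second function shows what C itself does with the `u = w16 = (u8 * u8 * u8)` edge case from notes/comparison_semantics.txt. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Computes (a + b) - c in int and range-checks only the final value.
 * Returns false if the result does not fit in a byte. */
static bool add_sub_to_byte(uint8_t a, uint8_t b, uint8_t c, uint8_t *out)
{
    int wide = ((int)a + (int)b) - (int)c;  /* intermediate 260 is fine */
    if (wide < 0 || wide > UINT8_MAX) {
        return false;
    }
    *out = (uint8_t)wide;
    return true;
}

/* C semantics for the chained assignment: the value of an assignment
 * expression is the value that was stored, so u gets the truncated
 * result. The note above proposes that "u" should NOT be truncated. */
static uint32_t chained_assign_c(uint8_t u8, uint16_t *w16)
{
    uint32_t u;
    u = *w16 = (uint32_t)u8 * u8 * u8;
    return u;
}

/* Usage: (250 + 10) - 20 fits in a byte when computed in int. */
static void usage_example(void)
{
    uint8_t x = 0;
    uint16_t w16 = 0;
    assert(add_sub_to_byte(250, 10, 20, &x) && x == 240);
    assert(chained_assign_c(255, &w16) == 767);   /* 16581375 mod 65536 */
}
```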
diff --git a/notes/numeric_types.txt b/notes/numeric_types.txt
index 65aeae9..837d7d2 100644
--- a/notes/numeric_types.txt
+++ b/notes/numeric_types.txt
@@ -11,10 +11,10 @@ Definitely YES:
 Yes:
     int   (= min(machine-word-size, 32), to avoid larger memory usage for arrays/structs on 64 bit platforms)
     usize
-    fileoffs
 
 Maybe:
     ssize
+    fileoffs
 
 Maybe but unlikely:
     intN/uintN/wintN  for arbitrary N up to some limit
diff --git a/notes/simple_to_parallel.txt b/notes/simple_to_parallel.txt
new file mode 100644
index 0000000..c46bc59
--- /dev/null
+++ b/notes/simple_to_parallel.txt
@@ -0,0 +1,83 @@
+From simple to parallel
+=======================
+
+Single-threaded code has some advantages:
+
+* Can modify data in place without issues
+* Does not need synchronization
+* Overall, it can be lightweight and simple
+
+Can code be made to be usable in both simple single-threaded code
+as well as in multi-threaded code?
+
+(Note that single-threaded includes some forms of "shared nothing"
+parallelism).
+
+In-place mutation vs. copy vs. specialized multi-threaded code
+--------------------------------------------------------------
+
+(The latter might be using some kind of multi-threading capable data structure)
+
+Let's say that there is a function that appends "=true" to a string if it does
+not already end with "=true".
+
+This can be done in several ways:
+
+* In-place modification, if the string has sufficient capacity.
+* Allocating a new string and writing the result string there.
+* Using a thread-safe refcount to select the best strategy.
+* Using some kind of data structure that allows this operation
+  to work without copying. E.g. a chain of delta encoded updates
+  to the string. This could be done in a lock-free/opportunistic way:
+    - Find the last delta entry of the string.
+    - Parse the string, check if it ends with "=true".
+    - Generate a "delta entry" to append.
+    - Atomically swap the pointer in the last delta entry
+      from null to the newly generated delta entry.
+    - If the swap operation fails, restart the process.
+
+The idea
+--------
+
+Can we generate "combined single/multi-threaded" code from the same source
+code, and have the compiler (and/or runtime) select the appropriate code
+variant?
+
+Related: Combined code for mutator methods:
+1. pre-allocated result (with mandatory usage)
+2. pre-allocated result (with optional usage, i.e. input lifetime is at least as long)
+3. arena-allocated result
+4. shallow-ref in-place modification (with existing refs)
+5. deep-copy in-place modification (with copies of refs)
+
+For example, given source code like this:
+
+    func String.append_if_absent(String suffix) -> String
+    {
+        if this.ends_with(suffix) {
+            return this
+        } else {
+            return .concat(this, suffix)
+        }
+    }
+
+It might be transformed into:
+
+    func String.append_if_absent(arena, ref var String placement, String suffix) -> String
+    {
+        String result
+        transaction_start(arena, .[this])
+        do {
+            if this.ends_with(suffix) {
+                result = .copy_if_needed(arena, placement, this)
+            } else {
+                # .concat selects the requested method 1-5
+                result = .concat(arena, placement, this, suffix)
+            }
+        } while not transaction_commit(arena)
+        return result
+    }
+
+Apparently, something similar is done by Swift already and is called "value semantics".
+- ...and apparently, it can still cause unintended behavior when a value
+  gets copied and then silently discarded.
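Note on the lock-free delta chain in notes/simple_to_parallel.txt: the five steps (find the last entry, check the suffix, build a delta, swap the null pointer, restart on failure) could be sketched with C11 atomics as below. The `struct delta` layout, the function names and the fixed-size scratch buffer are assumptions made for this illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical delta entry: a text fragment plus a link to the next one. */
struct delta {
    const char *text;
    _Atomic(struct delta *) next;
};

/* Copies the logical string (all fragments concatenated) into buf and
 * returns the last entry seen, or NULL if buf is too small. */
static struct delta *read_chain(struct delta *head, char *buf, size_t cap)
{
    size_t len = 0;
    struct delta *last = head;
    for (struct delta *d = head; d != NULL; d = atomic_load(&d->next)) {
        size_t n = strlen(d->text);
        if (len + n + 1 > cap) {
            return NULL;
        }
        memcpy(buf + len, d->text, n);
        len += n;
        last = d;
    }
    buf[len] = '\0';
    return last;
}

static bool ends_with(const char *s, const char *suffix)
{
    size_t n = strlen(s), m = strlen(suffix);
    return n >= m && memcmp(s + n - m, suffix, m) == 0;
}

/* Links `entry` to the end of the chain unless the string already ends
 * with entry->text. Returns true if the entry was appended. */
static bool append_if_absent(struct delta *head, struct delta *entry)
{
    char buf[256];                      /* sketch only: bounded length */
    for (;;) {
        struct delta *last = read_chain(head, buf, sizeof buf);
        if (last == NULL || ends_with(buf, entry->text)) {
            return false;
        }
        struct delta *expected = NULL;
        atomic_store(&entry->next, NULL);
        if (atomic_compare_exchange_strong(&last->next, &expected, entry)) {
            return true;
        }
        /* Another thread appended first: restart and re-check the suffix. */
    }
}
```

Because the suffix check is repeated after every failed swap, two threads racing to append "=true" end up adding it only once.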
diff --git a/notes/slul2.txt b/notes/slul2.txt
new file mode 100644
index 0000000..9d103d0
--- /dev/null
+++ b/notes/slul2.txt
@@ -0,0 +1,340 @@
+SLUL2
+=====
+
+Making SLUL:
+* easier to use
+* easier to implement
+* more future-proof / more portable
+
+Desirable changes:
+
+* Revise ref
+    - forbidding refs to non-struct/non-array types might enable some optimizations
+    - *removing* explicit refs would definitely have impacts on usability.
+      not sure if good or bad. it makes the proglang more implicit/"magic",
+      which can be a bad thing.
+* Implicit arenas?
+    - and for mutating methods, use the 4 "allocation variants": (inspired by Vale)
+        1. placement new        (uses arena only for "indirect"/references fields)
+        2. arena new            (allocate in given arena)
+        3. modify self          (in-place modification. Can keep referenced data)
+        4. discarding self-mod  (in-place modification. Cannot keep referenced data)
+        some of these are applicable to constructors also
+* Garbage-collected arenas?
+    - I.e. local GC
+    - It would remove "anxiety" around memory allocation
+    - Downside 1: It requires meta-data, which isn't needed with plain
+      arena allocation.
+    - Downsides N: The usual downsides with GC
+* Revise expr integer types
+  What are the use-cases for "non-plain ints"?
+    - length type / ssize/usize
+    - byte/int16 arrays
+    - small/bitsliced fields in structs
+    - fixed-size wrapping uints (e.g. for hash functions).
+* Move some stuff from hard-coded syntax to code? e.g. like Scheme, REBOL, Nim.
+    - might actually work with statements:
+        - they can only appear inside function bodies
+        - so the toplevels are available, and their types are known.
+    - But it would definitely need inlining to work with reasonable speed.
+    - Perhaps a bad idea after all?
+* Misc syntax stuff:
+    - Use tabs instead of spaces?
+      But this gets tricky with alignment of e.g. parameters.
+
+
+Control statements defined in library module headers
+----------------------------------------------------
+
+These need some kind of analysis of the IR to check varstates (liveness, etc).
+It also needs to handle nested if-elseif-elseif...
+
+So perhaps this is a bad idea?
+
+Example:
+
+    statement "if" Expr cond Statements true_block "else" Statements false_block
+    {
+        cond
+        CONDJUMP FALSE false_block
+        true_block
+        JUMP end
+        false_block
+        end
+    }
+    statement
+
+Super-simple proglang
+---------------------
+
+Only two/three kinds of typedefs. Not allowed as anonymous types (or? it is useful for e.g. return values)
+
+    record SomeStruct {
+        int field1
+        OtherType field2    # <--- compiler chooses whether to put in ref or not
+                            # this makes FFI trickier. But non-closed types are always refs.
+                            # this also makes lifetimes and aliasing trickier.
+                            # maybe "var" should not be allowed to alias?
+    }
+
+    enum SomeEnum {
+        ...
+    }
+
+    # Maybe some kind of sum/union/variant type
+    record ExprNode {
+        ExprType type
+        int line
+        int column
+        switch type {
+        case .unary
+        case .binary
+            Expr operand_a
+            if type == .binary {
+                Expr operand_b
+            }
+        case .call
+            Expr func_expr
+            int num_args
+            # have a built-in list type?
+            # and choose the best possible representation?
+            # (in this case it's runtime-determined frozen-length, so it could be a pointer to an array. or a full-blown list type)
+            int[num_args] args
+        }
+    }
+
+    # Maybe some kind of constraints
+    func process_op(ExprNode<.type in (.unary, .binary)> expr)
+    func process_op(ExprNode expr [.unary, .binary])
+    func process_op(ExprNode(.unary .binary) expr)
+    func process_op(ExprNode<.unary .binary> expr)
+    func ExprNode<.unary .binary>.process_op()
+    func ExprNode.process_op()
+        for (.unary, .binary)
+    func ExprNode.process_op()
+        with (.unary, .binary)
+    func ExprNode.process_op()
+        this in (.unary, .binary)
+    func ExprNode.process_op()
+        given type == .unary or type == .binary
+
+Qualifiers for records and enums:
+
+    record Point closed {  # (require a newline here?)
+        int x
+        int y
+        # no more fields can be added. allows some optimizations, such as call-by-value / embedding into structs
+    }
+
+    enum SubPixel closed {
+        .red
+        .green
+        .blue
+    }
+
+Enums can have a base type and/or integral values also
+(this is mainly useful for FFI)
+
+    enum StatusByte closed byte {
+        .ready = 10
+        .running = 20
+        .failure = 90
+    }
+
+Integer / elementary types:
+
+* Perhaps even use variable-size integers?
+  The downside is that += 1 etc. might require allocation.
+
+Methods:
+
+* Skip "this". But disallow shadowing.
+
+Type identifiers
+
+* For consistency, always include the "." in typeidentifiers, even in
+  e.g. enum definitions.
+* Constructors are maybe not that intuitive (can they be improved?):
+
+    func .new(int a, int b) -> Thing
+
+
+Avoiding punctuation:
+
+* Can the . in typeidentifiers be skipped?
+* Can the () in function calls be skipped?
+    - if the function call fits on one line
+    - (unless a comma is required between them) and the parameters are terms
+    - and the function call is not nested inside
+      a function call, field or index expression.
+    - related: tuples. but that would be ambiguous if used as function arguments
+* Can the () in function declarations be skipped?
+
+    func example
+        int a
+        int b
+        return bool
+    {
+        if a == b {
+            otherfunc a, 123
+            return true
+        }
+        
+    }
+
+Can refs be avoided?
+
+    # objects:
+    # These are always passed by reference.
+    # References can be compared with "ref_is" or "is" or a similar operator.
+    # The "==" and "!=" operators are not allowed (maybe it should be allowed to implement them? e.g. with a method called "equals"?)
+    type Box = object {
+        # These are references:
+        Item a
+        Item b
+        # Perhaps allow syntax like this:
+        Item a1, b1
+        Item a1, Item b1
+        # Regarding tuples:
+        # I think that maybe they CAN be references if they are too large to use values :)
+        # - We can require that if the object is mutable, it must also be passed by arena-ref.
+        # - Tuples up to some certain size could be embedded / passed by value
+        #   (Check the optimal limit. It's at least the size of two pointers, but it could be larger)
+        # - Tuples allocated in the *same* arena can just be referenced directly!
+        #   (this should be fairly simple and fast to check).
+        #   - If each thread uses a contiguous virtual-memory block,
+        #     then this would be a trivial range check.
+        # - Tuples allocated by the same thread and in SLUL code, can
+        #   (as an optional optimization) be referenced if
+        #   1) the lifetime allows it (how to check this at runtime?), or
+        #   2) the runtime uses garbage collection, and can perform GC in
+        #   this case.
+        # - Tuples allocated in SLUL code from other threads may or may not
+        #   be possible to reference depending on whether the runtime
+        #   supports cross-thread GC. For consistency across implementations,
+        #   it might be better to just re-allocate/copy in this case.
+        # - Other tuples would require a copy. (This is really a requirement
+        #   for tuples allocated from C code, unless it uses SLUL's arena
+        #   allocation functions in slulrt.)
+        #   
+        LargeValue large
+    }
+    # opaque objects:
+    # - Like objects, but fields (and layout/size) are inaccessible
+    # - Lacks {} and has the layout defined in the impl, just like a function can have its body in the impl
+    # - Perhaps it should be forbidden to have non-opaque objects in interfaces? It's generally an anti-pattern.
+    type Item = object
+    # tuples:
+    # - The ABI decides when to pass these by ref or value
+    # - Reference comparison operators are not allowed.
+    # - The contents can be compared with the "==" and "!=" operators.
+    # - Tuples can't be opaque/private.
+    type Point = (int x, int y)
+    type Point = (int x, y) # perhaps allow this syntax as well (...and multi-line syntax without comma also)
+    type LargeValue = ([10000]byte buffer)
+    # For type-scoped functions that return an object ("constructors"):
+    # - They implicitly take an arena parameter
+    # - The returned reference is an arena reference
+    func .new() -> Box
+    constructor Box.new()   # maybe type-scoped functions should have this syntax?
+    # Return values in methods have the same lifetime as the object itself
+    # - Should the this parameter be "var"? 
+    # - Should the this parameter be "arena"? 
+    func Box.get_contents() -> Item
+    # Parameters do not implicitly transfer ownership. Inside the callee, the lifetime of "other" ends when the function returns.
+    # - Should the parameter be "arena"? 
+    func Box.equals(Box other) -> bool
+    # Parameters can be marked with "keep" to allow shared ownership
+    func Box.set_contents(keep Item contents)
+    func Box.set_contents!(keep Item contents)    # perhaps there should be a ! for functions that modify the object?
+    func var Box.set_contents(keep Item contents) # or a qualifier like this.
+    # Parameters can be passed as "var"
+    # - if passed as "keep", we need exclusive access (or the item can be marked as aliased)
+    func Box.squeeze_item(keep var Item item)
+    func Box.squeeze_item(keep aliased var Item item)
+    # To have multiple outputs from a function, use a tuple as the return value:
+    # - The ABI decides when to pass these by ref (implicit parameter) or value
+    # - Because tuples can't be opaque, the return value could be returned by value.
+    # - Because tuples can't be opaque, the *caller* allocates (on stack) if it's not possible to pass by value.
+    func Box.get_both_items() -> (Item a, Item b)
+
+    # How should function references work?
+    # - What keyword to use when there are no refs?
+    # - Most of the time, you want a context-parameter
+    # - For non-ref (or slot) types, you may want a (reference, length) to process multiple items at once.
+    func Box.process_contents(delegate(Item item) handler)
+
+Should the builtin types use TitleCase names also?
+
+    Probably yes, for consistency.
+
+    String
+    Byte
+    Int16
+
+    Java developers might confuse these with reference types, though :(
+    And worse, as a Java developer, you might start using e.g. Byte
+    where it should be byte in your Java code. That will usually silently
+    compile without any warnings, but can be broken (with == != operators)
+    or slow.
+
+    Solution:
+
+    Use different names:
+    - byte -> UInt8 or U8
+    - int  -> Integer (or even skip this type, and have only fixed-sized Int*/UInt*)
+
+    Regarding the extra finger strain to hold shift and stretching out the
+    finger to push the letter button: That could be solved by having the IDE
+    auto-capitalize the type if it exists and a type is expected at the given
+    location.
+
+
+Should/can there be an arbitrary-sized integer type?
+
+    E.g. allow integers -16384..16383 to be stored directly, and use a
+    reference for larger integers.
+
+    What should it be called?
+
+        num
+        int
+        intn
+        integer
+        BigNum
+        BigInt
+        Num
+        Int
+        IntN
+        Integer
+
+    The compiler could optimize it to a more efficient type if the range
+    is known!
+
+        num i = get_number()  # since it is immutable, we can infer the type from the return value of get_number()
+
+    Maybe it should be possible to specify a range? What syntax to use?
+
+        var num<0..=10> i = 0
+        var num<0 upto 10> i = 0
+        var num i [[0 <= value <= 10]] = 0
+        var num<0-10> i = 0     # but "-" is also the minus operator :(
+        var num<0~10> i = 0
+
+Print function:
+
+    How simple can it be, without creating confusion/problems or hard-coding things?
+
+        out.print("number: {}", .[123])     # array constructor
+        out.print("number: {}", 123)        # safe variant-type var-arg
+        out.print "number: {}", 123         # allowing () to be skipped (in some cases)
+        out "number: {}", 123               # allowing a default function on objects
+        out("number: {}", 123)              # allowing a default function on objects, but without allowing () to be skipped
+        # error handling?
+
+    Input streams could also have a default function.
+    But it would be limited to only reading e.g. a line.
+    (That's probably what iterators should do as well.)
+    What should it do on error?
+
+        string s = in()     # reads a line
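Note on the arbitrary-sized integer idea in notes/slul2.txt: one possible way to picture "store -16384..16383 directly, use a reference for larger integers" is sketched below in C. The struct layout, the names and the use of `int64_t` as the stand-in for "larger integers" are all assumptions; a real implementation would more likely pack the tag into the word itself and reference a true bignum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_SMALL_MIN (-16384)
#define NUM_SMALL_MAX 16383

/* Hypothetical representation of "num": small values are stored inline,
 * larger ones behind a reference whose storage is owned by the caller. */
struct num {
    bool is_small;
    union {
        int16_t small;        /* used when the value is in the small range */
        const int64_t *big;   /* used otherwise */
    } v;
};

static struct num num_make(const int64_t *value)
{
    struct num n;
    if (*value >= NUM_SMALL_MIN && *value <= NUM_SMALL_MAX) {
        n.is_small = true;
        n.v.small = (int16_t)*value;
    } else {
        n.is_small = false;
        n.v.big = value;
    }
    return n;
}

static int64_t num_get(struct num n)
{
    return n.is_small ? (int64_t)n.v.small : *n.v.big;
}

/* Usage: values at the edge of the range stay inline, the next one does not. */
static void usage_example(void)
{
    int64_t inside = 16383, outside = 16384;
    assert(num_make(&inside).is_small);
    assert(!num_make(&outside).is_small);
    assert(num_get(num_make(&outside)) == 16384);
}
```

This also shows the downside mentioned earlier in the note: `+= 1` on 16383 crosses into the referenced form and therefore needs storage from somewhere.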
diff --git a/notes/usability_improvements.txt b/notes/usability_improvements.txt
index 1da5546..5935860 100644
--- a/notes/usability_improvements.txt
+++ b/notes/usability_improvements.txt
@@ -657,3 +657,69 @@ Perhaps some interactive functions that can work either in a GUI or in a CLI:
   This prevents both security and portability issues.
 - Similarly, strange Unicode characters (RTL, control, etc)
   should be replaced with replacement characters.
+
+Avoid special characters: Module headers
+----------------------------------------
+
+Instead of \ for module headers:
+
+    \slul 0.0.0
+    \name test
+    \version 0.1.0
+
+It could be a ":" at the end:
+
+    slul: 0.0.0
+    name: test
+    version: 0.1.0
+
+(Maybe some module headers should be renamed to better
+work as "attribute:" rather than "\directive")
+
+
+Avoid special characters: Type identifiers?
+-------------------------------------------
+
+Can this be done at all? Is it a good idea?
+
+Currently:
+
+    ref Thing t = .new(arena)
+    t.set_type(.a)
+    t.set_flags(.visible=true, .enabled=false)
+
+Could/should the dots be skipped?
+
+    ref Thing t = new(arena)
+    t.set_type(a)
+    t.set_flags(visible=true, enabled=false)
+
+
+arena-refs vs non-arena refs
+----------------------------
+
+This could be confusing, because no major language has arenas.
+
+    ref Thing t1 = ...
+    arena Thing t2 = ...
+
+
+Add tuple type and disallow it to contain certain types?
+--------------------------------------------------------
+
+A tuple type could be useful for e.g. multiple return values:
+
+    func Thing.do_stuff() -> (int x, int y)
+
+Tuples:
+
+* Can be initialized with or without field names (e.g. (1,0))
+* Can be compared
+* Can't contain funcrefs
+* Can't contain structs directly
+* Can't contain arrays of funcrefs/structs
+
+Structs:
+
+* Can only be initialized with field names, e.g. (.x=1,.y=0)
+* Can't be compared?