author | Samuel Lidén Borell <samuel@kodafritt.se> | 2024-06-02 21:20:48 +0200 |
---|---|---|
committer | Samuel Lidén Borell <samuel@kodafritt.se> | 2024-06-02 21:20:48 +0200 |
commit | 580bf6130632f6855fddeea7b07c8401c56108f2 (patch) | |
tree | 4bd5e7cdb68408c52ad8df030f7f887c7d97def0 | |
parent | db73835b12f41be8766384a1cdcc34a0848354dc (diff) | |
-rw-r--r-- | notes/comparison_semantics.txt | 78 |
-rw-r--r-- | notes/compilation_failure_testing.txt | 31 |
-rw-r--r-- | notes/deref_syntax.txt | 17 |
-rw-r--r-- | notes/goto_local_labels.txt | 27 |
-rw-r--r-- | notes/numeric_operations_type.txt | 28 |
-rw-r--r-- | notes/numeric_types.txt | 2 |
-rw-r--r-- | notes/simple_to_parallel.txt | 83 |
-rw-r--r-- | notes/slul2.txt | 340 |
-rw-r--r-- | notes/usability_improvements.txt | 66 |
9 files changed, 671 insertions, 1 deletions
diff --git a/notes/comparison_semantics.txt b/notes/comparison_semantics.txt
new file mode 100644
index 0000000..44ed548
--- /dev/null
+++ b/notes/comparison_semantics.txt
@@ -0,0 +1,78 @@
+Comparison semantics / typing
+=============================
+
+In most other proglangs, the terms get assigned types, and then the outer
+expressions are recursively assigned types. This means that deciding the
+integer type sizes/signedness in comparison operations is trivial there.
+
+In SLUL, it works the other way around: First the outermost expression is
+assigned a type (i.e. boolean for a comparison expression), and then the
+types of the integer expressions have to be inferred somehow.
+
+
+Options:
+* Always require either that both sides have an unambiguous type, or that
+  the left side has an unambiguous type and the right side is a literal.
+* Use the largest type of any term, and report an error for mixed signedness
+  within one side. (Literals get promoted to this type, and an error is
+  reported if the literal value is not in range).
+  - Will easily trap with small types such as bytes.
+* Use the type of all terms, i.e. require all terms to have exactly the same
+  type.
+  - Will easily trap with small types such as bytes.
+* Like either of the above, but never use types smaller than int/uint.
+
+
+Orthogonally, there is also the question whether and how comparisons with
+mixed signedness should work:
+
+* Forbid
+* Forbid, but allow if the signed operand is never negative.
+* Promote to unsigned (like C). But this is confusing.
+* Compare by value, i.e. out-of-range values are handled specially
+  (and always return true/false depending on the side and operator).
+
+
+Revise type detection/promotion entirely?
+-----------------------------------------
+
+There is a performance and usability problem with the current system for
+type detection. For example, given the following expression:
+
+    [3]byte a = ...
+    byte b = a[0] + a[1] + a[2]
+
+Currently, each addition has to be performed as a byte, and range-checked.
+That is both annoying (because it could overflow and/or give range errors
+at compile time) and also slow, because the compiler needs to insert
+instructions to range check and/or to remove excess bits.
+
+It would be better if it was computed as a uint/int.
+
+To avoid confusion, maybe the byte/int16 types should only be allowed
+in structs/arrays? (and maybe in function parameters).
+
+If calculations are done with higher bit-width, then there are some edge
+cases that need to be tested:
+
+    var byte u8
+    var wuint16 w16
+    var uint u
+
+    u = w16 = (u8 * u8 * u8)
+
+The multiplications could yield a larger number than "w16" can hold.
+In that case, "u" should still receive the non-truncated value.
+
+
+How other languages handle integer promotion
+--------------------------------------------
+
+* C:
+  - Promote to larger
+  - Promote to unsigned
+  - This leads to strange behaviour in mixed-signedness comparisons
+* Hare:
+  - Promote to larger (but limited for uintptr/size types)
+  - Mixed signedness is an error
+  - This also solves the comparison issue
diff --git a/notes/compilation_failure_testing.txt b/notes/compilation_failure_testing.txt
new file mode 100644
index 0000000..59e8c22
--- /dev/null
+++ b/notes/compilation_failure_testing.txt
@@ -0,0 +1,31 @@
+Compilation failure testing
+===========================
+
+With more advanced type systems and/or parameter constraints, it can be
+useful to be able to check that those are enforced correctly.
+
+That could be done with some kind of special test.
+There is apparently something like this being used in Rust,
+see nolife/counterexamples.
+
+Should something like this be added to SLUL?
+What should the syntax be like?
+
+Syntax 1: Extend long/nestable comments
+---------------------------------------
+
+    #{{nocompile
+    func test()
+    {
+        # ERROR: .*Invalid code.*
+        invalid
+    }
+    #}}
+
+Syntax 2: Special keywords
+--------------------------
+
+    noncompiling func test()
+    {
+        bad(invalid)
+    }
diff --git a/notes/deref_syntax.txt b/notes/deref_syntax.txt
new file mode 100644
index 0000000..eb3852d
--- /dev/null
+++ b/notes/deref_syntax.txt
@@ -0,0 +1,17 @@
+Syntax for pointer dereferencing
+================================
+
+Current syntax in SLUL:
+
+    deref p = 123
+    int y = deref p
+
+Pascal-like syntax:
+
+    p^ = 123
+    int y = p^
+
+Zig-like syntax:
+
+    p.* = 123
+    int y = p.*
diff --git a/notes/goto_local_labels.txt b/notes/goto_local_labels.txt
new file mode 100644
index 0000000..a91cbbe
--- /dev/null
+++ b/notes/goto_local_labels.txt
@@ -0,0 +1,27 @@
+Local goto labels
+=================
+
+This can be used to provide scope for goto labels and avoid accidentally
+jumping to an unrelated block.
+
+Syntax idea (borrowed from the GCC extension):
+
+    func f()
+    {
+        {
+            label skip
+            ...
+            goto skip
+            ...
+          skip:
+            ...
+        }
+        {
+            label skip
+            ...
+            goto skip
+            ...
+          skip:
+            ...
+        }
+    }
diff --git a/notes/numeric_operations_type.txt b/notes/numeric_operations_type.txt
new file mode 100644
index 0000000..53a760d
--- /dev/null
+++ b/notes/numeric_operations_type.txt
@@ -0,0 +1,28 @@
+Which type should numeric operations use?
+
+Currently, it works like this:
+
+    byte a = 250
+    byte b = 10
+    ...
+    byte x = (a + b) - 20   # this overflows, because a+b is computed as a byte
+
+And this is actually inefficient, because the operation would most likely be
+performed using 32 or even 64 bit registers. The exception is when the
+temporary variable for (a+b) is spilled to stack.
+
+Comparison to other languages
+-----------------------------
+
+C (from reading the ANSI C spec):
+ - performs operations as int or a larger type
+ - promotes to the largest operand (appears to not care about the result type)
+ - unsigned wins when there are operands with mixed signedness
+
+Rust (from reading random blogs):
+ - integer literals can carry a type suffix, e.g. 250u8; unsuffixed
+   literals get their type by inference (the default is i32)
+ - all casts, even widening ones, have to be explicit
+
+FreePascal (don't remember this exactly):
+ - Promotes to Int64 in mixed-sign comparisons
diff --git a/notes/numeric_types.txt b/notes/numeric_types.txt
index 65aeae9..837d7d2 100644
--- a/notes/numeric_types.txt
+++ b/notes/numeric_types.txt
@@ -11,10 +11,10 @@ Definitely YES:
 Yes:
     int (= min(machine-word-size, 32), to avoid larger memory usage for arrays/structs on 64 bit platforms)
     usize
-    fileoffs
 
 Maybe:
     ssize
+    fileoffs
 
 Maybe but unlikely:
     intN/uintN/wintN for arbitrary N up to some limit
diff --git a/notes/simple_to_parallel.txt b/notes/simple_to_parallel.txt
new file mode 100644
index 0000000..c46bc59
--- /dev/null
+++ b/notes/simple_to_parallel.txt
@@ -0,0 +1,83 @@
+From simple to parallel
+=======================
+
+Single-threaded code has some advantages:
+
+* Can modify data in place without issues
+* Does not need synchronization
+* Overall, it can be lightweight and simple
+
+Can code be made to be usable in both simple single-threaded code
+as well as in multi-threaded code?
+
+(Note that single-threaded includes some forms of "shared nothing"
+parallelism).
+
+In-place mutation vs. copy vs. specialized multi-threaded code
+--------------------------------------------------------------
+
+(The latter might be using some kind of multi-threading capable data structure)
+
+Let's say that there is a function that appends "=true" to a string if it does
+not already end with "=true".
+
+This can be done in several ways:
+
+* In-place modification, if the string has sufficient capacity.
+* Allocating a new string and writing the result string there.
+* Using a thread-safe refcount to select the best strategy.
+* Using some kind of data structure that allows this operation
+  to work without copying. E.g. a chain of delta encoded updates
+  to the string. This could be done in a lock-free/opportunistic way:
+  - Find the last delta entry of the string.
+  - Parse the string, check if it ends with "=true".
+  - Generate a "delta entry" to append.
+  - Atomically swap the pointer in the last delta entry
+    from null to the newly generated delta entry.
+  - If the swap operation fails, restart the process.
+
+The idea
+--------
+
+Can we generate "combined single/multi-threaded" code from the same source
+code, and have the compiler (and/or runtime) select the appropriate code
+variant?
+
+Related: Combined code for mutator methods:
+1. pre-allocated result (with mandatory usage)
+2. pre-allocated result (with optional usage, i.e. input lifetime is at least as long)
+3. arena-allocated result
+4. shallow-ref in-place modification (with existing refs)
+5. deep-copy in-place modification (with copies of refs)
+
+For example, given source code like this:
+
+    func String.append_if_absent(String suffix) -> String
+    {
+        if this.ends_with(suffix) {
+            return this
+        } else {
+            return .concat(this, suffix)
+        }
+    }
+
+It might be transformed into:
+
+    func String.append_if_absent(arena, ref var String placement, String suffix) -> String
+    {
+        String result
+        transaction_start(arena, .[this])
+        do {
+            if this.ends_with(suffix) {
+                result = .copy_if_needed(arena, placement, this)
+            } else {
+                # .concat selects the requested method 1-5
+                result = .concat(arena, placement, this, suffix)
+            }
+        } while not transaction_commit(arena)
+        return result
+    }
+
+Apparently, something similar is done by Swift already and is called "value semantics".
+- ...and apparently, it can still cause unintended behavior when a value
+  gets copied and then silently discarded.
diff --git a/notes/slul2.txt b/notes/slul2.txt
new file mode 100644
index 0000000..9d103d0
--- /dev/null
+++ b/notes/slul2.txt
@@ -0,0 +1,340 @@
+SLUL2
+=====
+
+Making SLUL:
+* easier to use
+* easier to implement
+* more future-proof / more portable
+
+Desirable changes:
+
+* Revise ref
+  - forbidding refs to non-struct/non-array types might enable some optimizations
+  - *removing* explicit refs would definitely have impacts on usability.
+    not sure if good or bad. it makes the proglang more implicit/"magic",
+    which can be a bad thing.
+* Implicit arenas?
+  - and for mutating methods, use the 4 "allocation variants": (inspired by Vale)
+    1. placement new (uses arena only for "indirect"/references fields)
+    2. arena new (allocate in given arena)
+    3. modify self (in-place modification. Can keep referenced data)
+    4. discarding self-mod (in-place modification. Cannot keep referenced data)
+    some of these are applicable to constructors also
+* Garbage-collected arenas?
+  - I.e. local GC
+  - It would remove "anxiety" around memory allocation
+  - Downside 1: It requires meta-data, which isn't needed with plain
+    arena allocation.
+  - Downsides N: The usual downsides with GC
+* Revise expr integer types
+  What are the use-cases for "non-plain ints"?
+  - length type / ssize/usize
+  - byte/int16 arrays
+  - small/bitsliced fields in structs
+  - fixed-size wrapping uints (e.g. for hash functions).
+* Move some stuff from hard-coded syntax to code? e.g. like Scheme, REBOL, Nim.
+  - might actually work with statements:
+    - they can only appear inside function bodies
+    - so the toplevels are available, and their types are known.
+  - But it would definitely need inlining to work with reasonable speed.
+  - Perhaps a bad idea after all?
+* Misc syntax stuff:
+  - Use tabs instead of spaces?
+    But this gets tricky with alignment of e.g. parameters.
+
+
+Control statements defined in library module headers
+----------------------------------------------------
+
+These need some kind of analysis of the IR to check varstates (liveness, etc).
+It also needs to handle nested if-elseif-elseif...
+
+So perhaps this is a bad idea?
+
+Example:
+
+    statement "if" Expr cond Statements true_block "else" Statements false_block
+    {
+        cond
+        CONDJUMP FALSE false_block
+        true_block
+        JUMP end
+        false_block
+        end
+    }
+    statement
+
+Super-simple proglang
+---------------------
+
+Only two/three kinds of typedefs. Not allowed as anonymous types (or? it is useful for e.g. return values)
+
+    record SomeStruct {
+        int field1
+        OtherType field2  # <--- compiler chooses whether to put in ref or not
+                          # this makes FFI trickier. But non-closed types are always refs.
+                          # this also makes lifetimes and aliasing trickier.
+                          # maybe "var" should not be allowed to alias?
+    }
+
+    enum SomeEnum {
+        ...
+    }
+
+    # Maybe some kind of sum/union/variant type
+    record ExprNode {
+        ExprType type
+        int line
+        int column
+        switch type {
+        case .unary
+        case .binary
+            Expr operand_a
+            if type == .binary {
+                Expr operand_b
+            }
+        case .call
+            Expr func_expr
+            int num_args
+            # have a built-in list type?
+            # and choose the best possible representation?
+            # (in this case it's runtime-determined frozen-length, so it could be a pointer to an array. or a full-blown list type)
+            int[num_args] args
+        }
+    }
+
+    # Maybe some kind of constraints
+    func process_op(ExprNode<.type in (.unary, .binary)> expr)
+    func process_op(ExprNode expr [.unary, .binary])
+    func process_op(ExprNode(.unary .binary) expr)
+    func process_op(ExprNode<.unary .binary> expr)
+    func ExprNode<.unary .binary>.process_op()
+    func ExprNode.process_op()
+        for (.unary, .binary)
+    func ExprNode.process_op()
+        with (.unary, .binary)
+    func ExprNode.process_op()
+        this in (.unary, .binary)
+    func ExprNode.process_op()
+        given type == .unary or type == .binary
+
+Qualifiers for records and enums:
+
+    record Point closed { # (require a newline here?)
+        int x
+        int y
+        # no more fields can be added. allows some optimizations, such as call-by-value / embedding into structs
+    }
+
+    enum SubPixel closed {
+        .red
+        .green
+        .blue
+    }
+
+Enums can have a base type and/or integral values also
+(this is mainly useful for FFI)
+
+    enum StatusByte closed byte {
+        .ready = 10
+        .running = 20
+        .failure = 90
+    }
+
+Integer / elementary types:
+
+* Perhaps even use variable-size integers?
+  The downside is that += 1 etc. might require allocation.
+
+Methods:
+
+* Skip "this". But disallow shadowing.
+
+Type identifiers
+
+* For consistency, always include the "." in typeidentifiers, even in
+  e.g. enum definitions.
+* Constructors are maybe not that intuitive (can they be improved?):
+
+    func .new(int a, int b) -> Thing
+
+
+Avoiding punctuation:
+
+* Can the . in typeidentifiers be skipped?
+* Can the () in function calls be skipped?
+  - if the function call fits on one line
+  - (unless a comma is required between them) and the parameters are terms
+  - and the function call is not nested inside
+    a function call, field or index expression.
+  - related: tuples. but that would be ambiguous if used as function arguments
+* Can the () in function declarations be skipped?
+
+    func example
+        int a
+        int b
+        return bool
+    {
+        if a == b {
+            otherfunc a, 123
+            return true
+        }
+
+    }
+
+Can refs be avoided?
+
+    # objects:
+    # These are always passed by reference.
+    # References can be compared with "ref_is" or "is" or a similar operator.
+    # The "==" and "!=" operators are not allowed (maybe it should be allowed to implement them? e.g. with a method called "equals"?)
+    type Box = object {
+        # These are references:
+        Item a
+        Item b
+        # Perhaps allow syntax like this:
+        Item a1, b1
+        Item a1, Item b1
+        # Regarding tuples:
+        # I think that maybe they CAN be references if they are too large to use values :)
+        # - We can require that if the object is mutable, it must also be passed by arena-ref.
+        # - Tuples up to some certain size could be embedded / passed by value
+        #   (Check the optimal limit. It's at least the size of two pointers, but it could be larger)
+        # - Tuples allocated in the *same* arena can just be referenced directly!
+        #   (this should be fairly simple and fast to check).
+        #   - If each thread uses a contiguous virtual-memory block,
+        #     then this would be a trivial range check.
+        # - Tuples allocated by the same thread and in SLUL code, can
+        #   (as an optional optimization) be referenced if
+        #   1) the lifetime allows it (how to check this at runtime?), or
+        #   2) the runtime uses garbage collection, and can perform GC in
+        #      this case.
+        # - Tuples allocated in SLUL code from other threads may or may not
+        #   be possible to reference depending on whether the runtime
+        #   supports cross-thread GC. For consistency across implementations,
+        #   it might be better to just re-allocate/copy in this case.
+        # - Other tuples would require a copy. (This is really a requirement
+        #   for tuples allocated from C code, unless it uses SLUL's arena
+        #   allocation functions in slulrt.)
+        #
+        LargeValue large
+    }
+    # opaque objects:
+    # - Like objects, but fields (and layout/size) are inaccessible
+    # - Lacks {} and has the layout defined in the impl, just like a function can have its body in the impl
+    # - Perhaps it should be forbidden to have non-opaque objects in interfaces? It's generally an anti-pattern.
+    type Item = object
+    # tuples:
+    # - The ABI decides when to pass these by ref or value
+    # - Reference comparison operators are not allowed.
+    # - The contents can be compared with the "==" and "!=" operators.
+    # - Tuples can't be opaque/private.
+    type Point = (int x, int y)
+    type Point = (int x, y) # perhaps allow this syntax as well (...and multi-line syntax without comma also)
+    type LargeValue = ([10000]byte buffer)
+    # For type-scoped functions that return an object ("constructors"):
+    # - They implicitly take an arena parameter
+    # - The returned reference is an arena reference
+    func .new() -> Box
+    constructor Box.new() # maybe type-scoped functions should have this syntax?
+    # Return values in methods have the same lifetime as the object itself
+    # - Should the this parameter be "var"?
+    # - Should the this parameter be "arena"?
+    func Box.get_contents() -> Item
+    # Parameters do not implicitly transfer ownership. Inside the callee, the lifetime of "other" ends when the function returns.
+    # - Should the parameter be "arena"?
+    func Box.equals(Box other) -> bool
+    # Parameters can be marked with "keep" to allow shared ownership
+    func Box.set_contents(keep Item contents)
+    func Box.set_contents!(keep Item contents) # perhaps there should be a ! for functions that modify the object?
+    func var Box.set_contents(keep Item contents) # or a qualifier like this.
+    # Parameters can be passed as "var"
+    # - if passed as "keep", we need exclusive access (or the item can be marked as aliased)
+    func Box.squeeze_item(keep var Item item)
+    func Box.squeeze_item(keep aliased var Item item)
+    # To have multiple outputs from a function, use a tuple as the return value:
+    # - The ABI decides when to pass these by ref (implicit parameter) or value
+    # - Because tuples can't be opaque, the return value could be returned by value.
+    # - Because tuples can't be opaque, the *caller* allocates (on stack) if it's not possible to pass by value.
+    func Box.get_both_items() -> (Item a, Item b)
+
+    # How should function references work?
+    # - What keyword to use when there are no refs?
+    # - Most of the time, you want a context-parameter
+    # - For non-ref (or slot) types, you may want a (reference, length) to process multiple items at once.
+    func Box.process_contents(delegate(Item item) handler)
+
+Should the builtin types use TitleCase names also?
+
+    Probably yes, for consistency.
+
+        String
+        Byte
+        Int16
+
+    Java developers might confuse these with reference types, though :(
+    And worse, as a Java developer, you might start using e.g. Byte
+    where it should be byte in your Java code. That will usually silently
+    compile without any warnings, but can be broken (with == != operators)
+    or slow.
+
+    Solution:
+
+    Use different names:
+    - byte -> UInt8 or U8
+    - int -> Integer (or even skip this type, and have only fixed-sized Int*/UInt*)
+
+    Regarding the extra finger strain to hold shift and stretching out the
+    finger to push the letter button: That could be solved by having the IDE
+    auto-capitalize the type if it exists and a type is expected at the given
+    location.
+
+
+Should/can there be an arbitrary-sized integer type?
+
+    E.g. allow integers -16384..16383 to be stored directly, and use a
+    reference for larger integers.
+
+    What should it be called?
+
+        num
+        int
+        intn
+        integer
+        BigNum
+        BigInt
+        Num
+        Int
+        IntN
+        Integer
+
+    The compiler could optimize it to a more efficient type if the range
+    is known!
+
+        num i = get_number() # since it is immutable, we can infer the type from the return value of get_number()
+
+    Maybe it should be possible to specify a range? What syntax to use?
+
+        var num<0..=10> i = 0
+        var num<0 upto 10> i = 0
+        var num i [[0 <= value <= 10]] = 0
+        var num<0-10> i = 0 # but "-" is also the minus operator :(
+        var num<0~10> i = 0
+
+Print function:
+
+    How simple can it be, without creating confusion/problems or hard-coding things?
+
+        out.print("number: {}", .[123]) # array constructor
+        out.print("number: {}", 123) # safe variant-type var-arg
+        out.print "number: {}", 123 # allowing () to be skipped (in some cases)
+        out "number: {}", 123 # allowing a default function on objects
+        out("number: {}", 123) # allowing a default function on objects, but without allowing () to be skipped
+        # error handling?
+
+    Input streams could also have a default function.
+    But it would be limited to only reading e.g. a line.
+    (That's probably what iterators should do as well.)
+    What should it do on error?
+
+        string s = in() # reads a line
diff --git a/notes/usability_improvements.txt b/notes/usability_improvements.txt
index 1da5546..5935860 100644
--- a/notes/usability_improvements.txt
+++ b/notes/usability_improvements.txt
@@ -657,3 +657,69 @@ Perhaps some interactive functions that can work either in a GUI or in a CLI:
       This prevents both security and portability issues.
     - Similarly, strange Unicode characters (RTL, control, etc) should be
       replaced with replacement characters.
+
+Avoid special characters: Module headers
+----------------------------------------
+
+Instead of \ for module headers:
+
+    \slul 0.0.0
+    \name test
+    \version 0.1.0
+
+It could be a ":" at the end:
+
+    slul: 0.0.0
+    name: test
+    version: 0.1.0
+
+(Maybe some module headers should be renamed to better
+work as "attribute:" rather than "\directive")
+
+
+Avoid special characters: Type identifiers?
+-------------------------------------------
+
+Can this be done at all? Is it a good idea?
+
+Currently:
+
+    ref Thing t = .new(arena)
+    t.set_type(.a)
+    t.set_flags(.visible=true, .enabled=false)
+
+Could/should the dots be skipped?
+
+    ref Thing t = new(arena)
+    t.set_type(a)
+    t.set_flags(visible=true, enabled=false)
+
+
+arena-refs vs non-arena refs
+----------------------------
+
+This could be confusing, because no major language has arenas.
+
+    ref Thing t1 = ...
+    arena Thing t2 = ...
+
+
+Add tuple type and disallow it to contain certain types?
+--------------------------------------------------------
+
+A tuple type could be useful for e.g. multiple return values:
+
+    func Thing.do_stuff() -> (int x, int y)
+
+Tuples:
+
+* Can be initialized with or without (e.g. (1,0)) field names
+* Can be compared
+* Can't contain funcrefs
+* Can't contain structs directly
+* Can't contain arrays of funcrefs/structs
+
+Structs:
+
+* Can only be initialized with field names, e.g. (.x=1,.y=0)
+* Can't be compared?