author Samuel Lidén Borell <samuel@kodafritt.se> 2024-06-02 21:20:48 +0200
committer Samuel Lidén Borell <samuel@kodafritt.se> 2024-06-02 21:20:48 +0200
commit 580bf6130632f6855fddeea7b07c8401c56108f2
tree 4bd5e7cdb68408c52ad8df030f7f887c7d97def0
parent db73835b12f41be8766384a1cdcc34a0848354dc
Notes: Usability, references, numeric types / comparisons, etc.
-rw-r--r-- notes/comparison_semantics.txt 78
-rw-r--r-- notes/compilation_failure_testing.txt 31
-rw-r--r-- notes/deref_syntax.txt 17
-rw-r--r-- notes/goto_local_labels.txt 27
-rw-r--r-- notes/numeric_operations_type.txt 28
-rw-r--r-- notes/numeric_types.txt 2
-rw-r--r-- notes/simple_to_parallel.txt 83
-rw-r--r-- notes/slul2.txt 340
-rw-r--r-- notes/usability_improvements.txt 66
9 files changed, 671 insertions, 1 deletions
diff --git a/notes/comparison_semantics.txt b/notes/comparison_semantics.txt
new file mode 100644
index 0000000..44ed548
--- /dev/null
+++ b/notes/comparison_semantics.txt
@@ -0,0 +1,78 @@
+Comparison semantics / typing
+=============================
+
+In most other proglangs, the terms get assigned types, and then the outer
+expressions are recursively assigned types. This means that deciding the
+integer type sizes/signedness in comparison operations is trivial there.
+
+In SLUL, it works the other way around: First the outermost expression is
+assigned a type (i.e. boolean for a comparison expression), and then the
+types of the integer expressions have to be inferred somehow.
+
+
+Options:
+* Always require either that both sides have an unambiguous type, or that
+ the left side has an unambiguous type and the right side is a literal.
+* Use the largest type of any term, and report an error for mixed signedness
+ within one side. (Literals get promoted to this type, and an error is
+ reported if the literal value is not in range).
+ - Will easily trap with small types such as bytes.
+* Use the type of all terms, i.e. require all terms to have exactly the same
+ type.
+ - Will easily trap with small types such as bytes.
+* Like either of the above, but never use types smaller than int/uint.
+
+
+Orthogonally, there is also the question of whether and how comparisons with
+mixed signedness should work:
+
+* Forbid
+* Forbid, but allow if the signed operand is never negative.
+* Promote to unsigned (like C). But this is confusing.
+* Compare by value, i.e. out-of-range values are handled specially
+ (and always return true/false depending on the side and operator).
+
+
+Revise type detection/promotion entirely?
+-----------------------------------------
+
+There is a performance and usability problem with the current system for
+type detection. For example, given the following expression:
+
+ [3]byte a = ...
+ byte b = a[0] + a[1] + a[2]
+
+Currently, each addition has to be performed as a byte, and range-checked.
+That is both annoying (because it could overflow and/or give range errors
+at compile time) and also slow, because the compiler needs to insert
+instructions to range check and/or to remove excess bits.
+
+It would be better if it was computed as a uint/int.
+
+To avoid confusion, maybe the byte/int16 types should only be allowed
+in structs/arrays? (and maybe in function parameters).
+
+If calculations are done with higher bit-width, then there are some edge
+cases that need to be tested:
+
+ var byte u8
+ var wuint16 w16
+ var uint u
+
+ u = w16 = (u8 * u8 * u8)
+
+The multiplications could yield a larger number than "w16" can hold.
+In that case, "u" should still receive the non-truncated value.
+
+
+How other languages handle integer promotion
+--------------------------------------------
+
+* C:
+ - Promote to larger
+ - Promote to unsigned
+ - This leads to strange behaviour in mixed-signedness comparisons
+* Hare:
+ - Promote to larger (but limited for uintptr/size types)
+ - Mixed signedness is an error
+ - This also solves the comparison issue
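Note on notes/comparison_semantics.txt: the "promote to unsigned (like C)" and "compare by value" options for mixed-signedness comparisons can be contrasted with a small model. This is only a sketch in Python, not SLUL code. It assumes a 32-bit signed operand on the left and a 32-bit unsigned operand on the right, and the function names are made up for illustration.

```python
UINT32_MAX = 2**32 - 1


def c_style_less_than(signed_value: int, unsigned_value: int) -> bool:
    """Model of the C-like rule: convert the signed operand to unsigned first."""
    # Conversion to an unsigned 32-bit type is reduction modulo 2**32,
    # so -1 becomes 4294967295.
    converted = signed_value & UINT32_MAX
    return converted < unsigned_value


def by_value_less_than(signed_value: int, unsigned_value: int) -> bool:
    """Model of "compare by value": use the mathematical values of both sides."""
    if signed_value < 0:
        # A negative value is outside the unsigned range, so the result is
        # fixed by the operator and the side the signed operand is on.
        return True
    return signed_value < unsigned_value
```

With these definitions, `c_style_less_than(-1, 1)` is false while `by_value_less_than(-1, 1)` is true, which is the confusing case the note refers to. Both functions agree whenever the signed operand is non-negative.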
diff --git a/notes/compilation_failure_testing.txt b/notes/compilation_failure_testing.txt
new file mode 100644
index 0000000..59e8c22
--- /dev/null
+++ b/notes/compilation_failure_testing.txt
@@ -0,0 +1,31 @@
+Compilation failure testing
+===========================
+
+With more advanced type systems and/or parameter constraints, it can be
+useful to be able to check that those are enforced correctly.
+
+That could be done with some kind of special test.
+There is apparently something like this being used in Rust,
+see nolife/counterexamples.
+
+Should something like this be added to SLUL?
+What should the syntax be like?
+
+Syntax 1: Extend long/nestable comments
+---------------------------------------
+
+ #{{nocompile
+ func test()
+ {
+ # ERROR: .*Invalid code.*
+ invalid
+ }
+ #}}
+
+Syntax 2: Special keywords
+--------------------------
+
+ noncompiling func test()
+ {
+ bad(invalid)
+ }
diff --git a/notes/deref_syntax.txt b/notes/deref_syntax.txt
new file mode 100644
index 0000000..eb3852d
--- /dev/null
+++ b/notes/deref_syntax.txt
@@ -0,0 +1,17 @@
+Syntax for pointer dereferencing
+================================
+
+Current syntax in SLUL:
+
+ deref p = 123
+ int y = deref p
+
+Pascal-like syntax:
+
+ p^ = 123
+ int y = p^
+
+Zig-like syntax:
+
+ p.* = 123
+ int y = p.*
diff --git a/notes/goto_local_labels.txt b/notes/goto_local_labels.txt
new file mode 100644
index 0000000..a91cbbe
--- /dev/null
+++ b/notes/goto_local_labels.txt
@@ -0,0 +1,27 @@
+Local goto labels
+=================
+
+This can be used to provide scope for goto labels and avoid accidentally
+jumping to an unrelated block.
+
+Syntax idea (borrowed from the GCC extension):
+
+ func f()
+ {
+ {
+ label skip
+ ...
+ goto skip
+ ...
+ skip:
+ ...
+ }
+ {
+ label skip
+ ...
+ goto skip
+ ...
+ skip:
+ ...
+ }
+ }
diff --git a/notes/numeric_operations_type.txt b/notes/numeric_operations_type.txt
new file mode 100644
index 0000000..53a760d
--- /dev/null
+++ b/notes/numeric_operations_type.txt
@@ -0,0 +1,28 @@
+Which type should numeric operations use?
+
+Currently, it works like this:
+
+ byte a = 250
+ byte b = 10
+ ...
+ byte x = (a + b) - 20 # this overflows, because a+b is computed as a byte
+
+And this is actually inefficient, because the operation would most likely be
+performed using 32 or even 64 bit registers. The exception is when the
+temporary variable for (a+b) is spilled to stack.
+
+Comparison to other languages
+-----------------------------
+
+C (from reading the ANSI C spec):
+ - performs operations as int or a larger type
+ - promotes to the largest operand (appears to not care about the result type)
+ - unsigned wins when there are operands with mixed signedness
+
+Rust (from reading random blogs):
+ - integer literals can have a type suffix, e.g. 10u8; without one, the type
+ is inferred from context (the default is i32)
+ - all conversions, even widening ones, have to be explicit
+
+FreePascal (don't remember this exactly):
+ - Promotes to Int64 in mixed-sign comparisons
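Note on notes/numeric_operations_type.txt: the difference between range-checking every intermediate result as a byte and computing in a wider type can be modeled as follows. This is a Python sketch under the assumption that an out-of-range byte result is reported as a range error. `ByteRangeError`, `checked_byte`, `eval_per_operation` and `eval_widened` are hypothetical names, not part of SLUL.

```python
BYTE_MAX = 255


class ByteRangeError(Exception):
    """Raised when a value does not fit in an unsigned 8-bit byte."""


def checked_byte(value: int) -> int:
    """Return the value unchanged if it fits in a byte, else raise."""
    if not 0 <= value <= BYTE_MAX:
        raise ByteRangeError(f"{value} is out of range for byte")
    return value


def eval_per_operation(a: int, b: int, c: int) -> int:
    """(a + b) - c with every intermediate result range-checked as a byte."""
    return checked_byte(checked_byte(a + b) - c)


def eval_widened(a: int, b: int, c: int) -> int:
    """(a + b) - c computed at a wider width, checked only on assignment."""
    return checked_byte((a + b) - c)
```

For the values in the note (a = 250, b = 10, subtracting 20), `eval_per_operation` fails on the intermediate 260, while `eval_widened` yields 240, which fits in a byte.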
diff --git a/notes/numeric_types.txt b/notes/numeric_types.txt
index 65aeae9..837d7d2 100644
--- a/notes/numeric_types.txt
+++ b/notes/numeric_types.txt
@@ -11,10 +11,10 @@ Definitely YES:
Yes:
int (= min(machine-word-size, 32), to avoid larger memory usage for arrays/structs on 64 bit platforms)
usize
- fileoffs
Maybe:
ssize
+ fileoffs
Maybe but unlikely:
intN/uintN/wintN for arbitrary N up to some limit
diff --git a/notes/simple_to_parallel.txt b/notes/simple_to_parallel.txt
new file mode 100644
index 0000000..c46bc59
--- /dev/null
+++ b/notes/simple_to_parallel.txt
@@ -0,0 +1,83 @@
+From simple to parallel
+=======================
+
+Single-threaded code has some advantages:
+
+* Can modify data in place without issues
+* Does not need synchronization
+* Overall, it can be lightweight and simple
+
+Can code be made to be usable in both simple single-threaded code
+as well as in multi-threaded code?
+
+(Note that single-threaded includes some forms of "shared nothing"
+parallelism).
+
+In-place mutation vs. copy vs. specialized multi-threaded code
+--------------------------------------------------------------
+
+(The latter might be using some kind of multi-threading capable data structure)
+
+Let's say that there is a function that appends "=true" to a string if it does
+not already end with "=true".
+
+This can be done in several ways:
+
+* In-place modification, if the string has sufficient capacity.
+* Allocating a new string and writing the result string there.
+* Using a thread-safe refcount to select the best strategy.
+* Using some kind of data structure that allows this operation
+ to work without copying. E.g. a chain of delta encoded updates
+ to the string. This could be done in a lock-free/opportunistic way:
+ - Find the last delta entry of the string.
+ - Parse the string, check if it ends with "=true".
+ - Generate a "delta entry" to append.
+ - Atomically swap the pointer in the last delta entry
+ from null to the newly generated delta entry.
+ - If the swap operation fails, restart the process.
+
+The idea
+--------
+
+Can we generate "combined single/multi-threaded" code from the same source
+code, and have the compiler (and/or runtime) select the appropriate code
+variant?
+
+Related: Combined code for mutator methods:
+1. pre-allocated result (with mandatory usage)
+2. pre-allocated result (with optional usage, i.e. input lifetime is at least as long)
+3. arena-allocated result
+4. shallow-ref in-place modification (with existing refs)
+5. deep-copy in-place modification (with copies of refs)
+
+For example, given source code like this:
+
+ func String.append_if_absent(String suffix) -> String
+ {
+ if this.ends_with(suffix) {
+ return this
+ } else {
+ return .concat(this, suffix)
+ }
+ }
+
+It might be transformed into:
+
+ func String.append_if_absent(arena, ref var String placement, String suffix) -> String
+ {
+ String result
+ transaction_start(arena, .[this])
+ do {
+ if this.ends_with(suffix) {
+ result = .copy_if_needed(arena, placement, this)
+ } else {
+ # .concat selects the requested method 1-5
+ result = .concat(arena, placement, this, suffix)
+ }
+ } while not transaction_commit(arena)
+ return result
+ }
+
+Apparently, something similar is done by Swift already and is called "value semantics".
+- ...and apparently, it can still cause unintended behavior when a value
+ gets copied and then silently discarded.
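Note on notes/simple_to_parallel.txt: the lock-free/opportunistic delta-chain steps (find the last entry, check the suffix, generate an entry, swap the pointer from null, restart on failure) can be sketched like this. It is a Python model only. Python has no hardware compare-and-swap, so `cas_next` emulates one with a small lock; `DeltaEntry`, `read_string` and `append_if_absent` are hypothetical names chosen for this sketch.

```python
import threading


class DeltaEntry:
    """One link in a chain of appended string fragments."""

    def __init__(self, text: str):
        self.text = text
        self.next = None
        # Stand-in for an atomic compare-and-swap instruction.
        self._cas_lock = threading.Lock()

    def cas_next(self, expected, new) -> bool:
        """Set self.next to new only if it still equals expected."""
        with self._cas_lock:
            if self.next is expected:
                self.next = new
                return True
            return False


def read_string(head: DeltaEntry):
    """Walk the chain; return the full string and the last entry seen."""
    parts = []
    last = head
    entry = head
    while entry is not None:
        parts.append(entry.text)
        last = entry
        entry = entry.next
    return "".join(parts), last


def append_if_absent(head: DeltaEntry, suffix: str) -> None:
    """Append suffix as a new delta entry unless the string already ends with it."""
    while True:
        text, last = read_string(head)
        if text.endswith(suffix):
            return
        if last.cas_next(None, DeltaEntry(suffix)):
            return
        # Another thread appended first: restart with the updated chain.
```

A failed swap means the chain grew after it was read, so the suffix check has to be repeated against the new contents before trying again.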
diff --git a/notes/slul2.txt b/notes/slul2.txt
new file mode 100644
index 0000000..9d103d0
--- /dev/null
+++ b/notes/slul2.txt
@@ -0,0 +1,340 @@
+SLUL2
+=====
+
+Making SLUL:
+* easier to use
+* easier to implement
+* more future-proof / more portable
+
+Desirable changes:
+
+* Revise ref
+ - forbidding refs to non-struct/non-array types might enable some optimizations
+ - *removing* explicit refs would definitely have impacts on usability.
+ not sure if good or bad. it makes the proglang more implicit/"magic",
+ which can be a bad thing.
+* Implicit arenas?
+ - and for mutating methods, use the 4 "allocation variants": (inspired by Vale)
+ 1. placement new (uses arena only for "indirect"/references fields)
+ 2. arena new (allocate in given arena)
+ 3. modify self (in-place modification. Can keep referenced data)
+ 4. discarding self-mod (in-place modification. Cannot keep referenced data)
+ some of these are applicable to constructors also
+* Garbage-collected arenas?
+ - I.e. local GC
+ - It would remove "anxiety" around memory allocation
+ - Downside 1: It requires meta-data, which isn't needed with plain
+ arena allocation.
+ - Downsides N: The usual downsides with GC
+* Revise expr integer types
+ What are the use-cases for "non-plain ints"?
+ - length type / ssize/usize
+ - byte/int16 arrays
+ - small/bitsliced fields in structs
+ - fixed-size wrapping uints (e.g. for hash functions).
+* Move some stuff from hard-coded syntax to code? e.g. like Scheme, REBOL, Nim.
+ - might actually work with statements:
+ - they can only appear inside function bodies
+ - so the toplevels are available, and their types are known.
+ - But it would definitely need inlining to work with reasonable speed.
+ - Perhaps a bad idea after all?
+* Misc syntax stuff:
+ - Use tabs instead of spaces?
+ But this gets tricky with alignment of e.g. parameters.
+
+
+Control statements defined in library module headers
+----------------------------------------------------
+
+These need some kind of analysis of the IR to check varstates (liveness, etc).
+It also needs to handle nested if-elseif-elseif...
+
+So perhaps this is a bad idea?
+
+Example:
+
+ statement "if" Expr cond Statements true_block "else" Statements false_block
+ {
+ cond
+ CONDJUMP FALSE false_block
+ true_block
+ JUMP end
+ false_block
+ end
+ }
+ statement
+
+Super-simple proglang
+---------------------
+
+Only two/three kinds of typedefs. Not allowed as anonymous types (or? it is useful for e.g. return values)
+
+ record SomeStruct {
+ int field1
+ OtherType field2 # <--- compiler chooses whether to put in ref or not
+ # this makes FFI trickier. But non-closed types are always refs.
+ # this also makes lifetimes and aliasing trickier.
+ # maybe "var" should not be allowed to alias?
+ }
+
+ enum SomeEnum {
+ ...
+ }
+
+ # Maybe some kind of sum/union/variant type
+ record ExprNode {
+ ExprType type
+ int line
+ int column
+ switch type {
+ case .unary
+ case .binary
+ Expr operand_a
+ if type == .binary {
+ Expr operand_b
+ }
+ case .call
+ Expr func_expr
+ int num_args
+ # have a built-in list type?
+ # and choose the best possible representation?
+ # (in this case it's runtime-determined frozen-length, so it could be a pointer to an array. or a full-blown list type)
+ int[num_args] args
+ }
+ }
+
+ # Maybe some kind of constraints
+ func process_op(ExprNode<.type in (.unary, .binary)> expr)
+ func process_op(ExprNode expr [.unary, .binary])
+ func process_op(ExprNode(.unary .binary) expr)
+ func process_op(ExprNode<.unary .binary> expr)
+ func ExprNode<.unary .binary>.process_op()
+ func ExprNode.process_op()
+ for (.unary, .binary)
+ func ExprNode.process_op()
+ with (.unary, .binary)
+ func ExprNode.process_op()
+ this in (.unary, .binary)
+ func ExprNode.process_op()
+ given type == .unary or type == .binary
+
+Qualifiers for records and enums:
+
+ record Point closed { # (require a newline here?)
+ int x
+ int y
+ # no more fields can be added. allows some optimizations, such as call-by-value / embedding into structs
+ }
+
+ enum SubPixel closed {
+ .red
+ .green
+ .blue
+ }
+
+Enums can have a base type and/or integral values also
+(this is mainly useful for FFI)
+
+ enum StatusByte closed byte {
+ .ready = 10
+ .running = 20
+ .failure = 90
+ }
+
+Integer / elementary types:
+
+* Perhaps even use variable-size integers?
+ The downside is that += 1 etc. might require allocation.
+
+Methods:
+
+* Skip "this". But disallow shadowing.
+
+Type identifiers
+
+* For consistency, always include the "." in typeidentifiers, even in
+ e.g. enum definitions.
+* Constructors are maybe not that intuitive (can they be improved?):
+
+ func .new(int a, int b) -> Thing
+
+
+Avoiding punctuation:
+
+* Can the . in typeidentifiers be skipped?
+* Can the () in function calls be skipped?
+ - if the function call fits on one line
+ - (unless a comma is required between them) and the parameters are terms
+ - and the function call is not nested inside
+ a function call, field or index expression.
+ - related: tuples. but that would be ambiguous if used as function arguments
+* Can the () in function declarations be skipped?
+
+ func example
+ int a
+ int b
+ return bool
+ {
+ if a == b {
+ otherfunc a, 123
+ return true
+ }
+
+ }
+
+Can refs be avoided?
+
+ # objects:
+ # These are always passed by reference.
+ # References can be compared with "ref_is" or "is" or a similar operator.
+ # The "==" and "!=" operators are not allowed (maybe it should be allowed to implement them? e.g. with a method called "equals"?)
+ type Box = object {
+ # These are references:
+ Item a
+ Item b
+ # Perhaps allow syntax like this:
+ Item a1, b1
+ Item a1, Item b1
+ # Regarding tuples:
+ # I think that maybe they CAN be references if they are too large to pass by value :)
+ # - We can require that if the object is mutable, it must also be passed by arena-ref.
+ # - Tuples up to some certain size could be embedded / passed by value
+ # (Check the optimal limit. It's at least the size of two pointers, but it could be larger)
+ # - Tuples allocated in the *same* arena can just be referenced directly!
+ # (this should be fairly simple and fast to check).
+ # - If each thread uses a contiguous virtual-memory block,
+ # then this would be a trivial range check.
+ # - Tuples allocated by the same thread and in SLUL code, can
+ # (as an optional optimization) be referenced if
+ # 1) the lifetime allows it (how to check this at runtime?), or
+ # 2) the runtime uses garbage collection, and can perform GC in
+ # this case.
+ # - Tuples allocated in SLUL code from other threads may or may not
+ # be possible to reference depending on whether the runtime
+ # supports cross-thread GC. For consistency across implementations,
+ # it might be better to just re-allocate/copy in this case.
+ # - Other tuples would require a copy. (This is really a requirement
+ # for tuples allocated from C code, unless it uses SLUL's arena
+ # allocation functions in slulrt.)
+ #
+ LargeValue large
+ }
+ # opaque objects:
+ # - Like objects, but fields (and layout/size) are inaccessible
+ # - Lacks {} and has the layout defined in the impl, just like a function can have its body in the impl
+ # - Perhaps it should be forbidden to have non-opaque objects in interfaces? It's generally an anti-pattern.
+ type Item = object
+ # tuples:
+ # - The ABI decides when to pass these by ref or value
+ # - Reference comparison operators are not allowed.
+ # - The contents can be compared with the "==" and "!=" operators.
+ # - Tuples can't be opaque/private.
+ type Point = (int x, int y)
+ type Point = (int x, y) # perhaps allow this syntax as well (...and multi-line syntax without comma also)
+ type LargeValue = ([10000]byte buffer)
+ # For type-scoped functions that return an object ("constructors"):
+ # - They implicitly take an arena parameter
+ # - The returned reference is an arena reference
+ func .new() -> Box
+ constructor Box.new() # maybe type-scoped functions should have this syntax?
+ # Return values in methods have the same lifetime as the object itself
+ # - Should the this parameter be "var"?
+ # - Should the this parameter be "arena"?
+ func Box.get_contents() -> Item
+ # Parameters do not implicitly transfer ownership. Inside the callee, the lifetime of "other" ends when the function returns.
+ # - Should the parameter be "arena"?
+ func Box.equals(Box other) -> bool
+ # Parameters can be marked with "keep" to allow shared ownership
+ func Box.set_contents(keep Item contents)
+ func Box.set_contents!(keep Item contents) # perhaps there should be a ! for functions that modify the object?
+ func var Box.set_contents(keep Item contents) # or a qualifier like this.
+ # Parameters can be passed as "var"
+ # - if passed as "keep", we need exclusive access (or the item can be marked as aliased)
+ func Box.squeeze_item(keep var Item item)
+ func Box.squeeze_item(keep aliased var Item item)
+ # To have multiple outputs from a function, use a tuple as the return value:
+ # - The ABI decides when to pass these by ref (implicit parameter) or value
+ # - Because tuples can't be opaque, the return value could be returned by value.
+ # - Because tuples can't be opaque, the *caller* allocates (on stack) if it's not possible to pass by value.
+ func Box.get_both_items() -> (Item a, Item b)
+
+ # How should function references work?
+ # - What keyword to use when there are no refs?
+ # - Most of the time, you want a context-parameter
+ # - For non-ref (or slot) types, you may want a (reference, length) to process multiple items at once.
+ func Box.process_contents(delegate(Item item) handler)
+
+Should the builtin types use TitleCase names also?
+
+ Probably yes, for consistency.
+
+ String
+ Byte
+ Int16
+
+ Java developers might confuse these with reference types, though :(
+ And worse, as a Java developer, you might start using e.g. Byte
+ where it should be byte in your Java code. That will usually silently
+ compile without any warnings, but can be broken (with == != operators)
+ or slow.
+
+ Solution:
+
+ Use different names:
+ - byte -> UInt8 or U8
+ - int -> Integer (or even skip this type, and have only fixed-sized Int*/UInt*)
+
+ Regarding the extra finger strain to hold shift and stretching out the
+ finger to push the letter button: That could be solved by having the IDE
+ auto-capitalize the type if it exists and a type is expected at the given
+ location.
+
+
+Should/can there be an arbitrary-sized integer type?
+
+ E.g. allow integers -16384..16383 to be stored directly, and use a
+ reference for larger integers.
+
+ What should it be called?
+
+ num
+ int
+ intn
+ integer
+ BigNum
+ BigInt
+ Num
+ Int
+ IntN
+ Integer
+
+ The compiler could optimize it to a more efficient type if the range
+ is known!
+
+ num i = get_number() # since it is immutable, we can infer the type from the return value of get_number()
+
+ Maybe it should be possible to specify a range? What syntax to use?
+
+ var num<0..=10> i = 0
+ var num<0 upto 10> i = 0
+ var num i [[0 <= value <= 10]] = 0
+ var num<0-10> i = 0 # but "-" is also the minus operator :(
+ var num<0~10> i = 0
+
+Print function:
+
+ How simple can it be, without creating confusion/problems or hard-coding things?
+
+ out.print("number: {}", .[123]) # array constructor
+ out.print("number: {}", 123) # safe variant-type var-arg
+ out.print "number: {}", 123 # allowing () to be skipped (in some cases)
+ out "number: {}", 123 # allowing a default function on objects
+ out("number: {}", 123) # allowing a default function on objects, but without allowing () to be skipped
+ # error handling?
+
+ Input streams could also have a default function.
+ But it would be limited to only reading e.g. a line.
+ (That's probably what iterators should do as well.)
+ What should it do on error?
+
+ string s = in() # reads a line
diff --git a/notes/usability_improvements.txt b/notes/usability_improvements.txt
index 1da5546..5935860 100644
--- a/notes/usability_improvements.txt
+++ b/notes/usability_improvements.txt
@@ -657,3 +657,69 @@ Perhaps some interactive functions that can work either in a GUI or in a CLI:
This prevents both security and portability issues.
- Similarly, strange Unicode characters (RTL, control, etc)
should be replaced with replacement characters.
+
+Avoid special characters: Module headers
+----------------------------------------
+
+Instead of \ for module headers:
+
+ \slul 0.0.0
+ \name test
+ \version 0.1.0
+
+It could be a ":" at the end:
+
+ slul: 0.0.0
+ name: test
+ version: 0.1.0
+
+(Maybe some module headers should be renamed to better
+work as "attribute:" rather than "\directive")
+
+
+Avoid special characters: Type identifiers?
+-------------------------------------------
+
+Can this be done at all? Is it a good idea?
+
+Currently:
+
+ ref Thing t = .new(arena)
+ t.set_type(.a)
+ t.set_flags(.visible=true, .enabled=false)
+
+Could/should the dots be skipped?
+
+ ref Thing t = new(arena)
+ t.set_type(a)
+ t.set_flags(visible=true, enabled=false)
+
+
+arena-refs vs non-arena refs
+----------------------------
+
+This could be confusing, because no major language has arenas.
+
+ ref Thing t1 = ...
+ arena Thing t2 = ...
+
+
+Add tuple type and disallow it to contain certain types?
+--------------------------------------------------------
+
+A tuple type could be useful for e.g. multiple return values:
+
+ func Thing.do_stuff() -> (int x, int y)
+
+Tuples:
+
+* Can be initialized with or without field names (e.g. (1,0) without)
+* Can be compared
+* Can't contain funcrefs
+* Can't contain structs directly
+* Can't contain arrays of funcrefs/structs
+
+Structs:
+
+* Can only be initialized with field names, e.g. (.x=1,.y=0)
+* Can't be compared?