What should SLUL do?
====================

SLUL will use "UTF-8 everywhere", but will allow broken encodings / WTF-8,
and only show replacement characters on display (and filter/reject on
validation, e.g. an "is_valid_email" function might reject invalid UTF-8).
This could be called "(Possibly broken) UTF-8 everywhere".

For code that really needs some other encoding, there could be an escape
hatch. See below.

Details
-------

A combination of the following could be both simple and powerful:

* byte strings for everything that does not care about encoding
* "possibly broken UTF-8" for everything that is unlikely to be in, or to
  support, other encodings (e.g. text in a user interface, filenames
  [which can be WTF-8])
* Maybe some kind of encoding override and/or storage of the original
  encoding. This does not have to be as fast as UTF-8.
  Code can be "capable of non-UTF-8" and use this data, or it can
  assume "UTF-8 everywhere" and just see the UTF-8 bytes (which can
  either be generated on the fly, or be stored in the string).

And this little modification makes things much more powerful:

    # in pseudo-code:
    type StringChunk = struct {
        # Most code will only support "possibly broken UTF-8"
        # Should it be possible to omit this, and generate UTF-8 on the fly?
        usize len
        byte[len] s
        # Code that does more advanced text operations
        # can choose to support loss-less access to the original encoding
        ?ref OrigEncChunk orig_encoding
    }
    type OrigEncChunk = struct {
        usize len
        ref OrigEncoding
        byte[len] s
    }
    type OrigEncoding = struct {
        ?ref CodePointType codepoint_type
        ?ref GraphemeType grapheme_type
        ?ref PhonemeType phoneme_type
        # "semanteme" = semantic unit
        ?ref SemantemeType semanteme_type
        # TODO "slot var" type! i.e. qualifiers inside slot types!
        func pipe to_codepoint(Iterator b, ref var C codepoint)
        func pipe to_grapheme(Iterator b, ref var G grapheme)
        func pipe to_phoneme(Iterator b, ref var P phoneme)
    }

Further reading
---------------

* https://thephd.dev/the-c-c++-rust-string-text-encoding-api-landscape
* https://ztdtext.readthedocs.io/en/latest/design.html
* https://ztdtext.readthedocs.io/en/latest/design/lucky%207.html
    - type: code unit
    - type: code point
    - max u/p and p/u
    - function: encode (code points -> code units)
    - function: decode (code units -> code points)
    - translation between encodings is possible if they use the same
      code point type
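
As a rough illustration of the "(Possibly broken) UTF-8 everywhere" idea from the first two sections, here is a minimal sketch in Python (not SLUL). It shows the three behaviours described there: replacement characters only on display, rejection on validation, and an optional copy of the original bytes next to the UTF-8 bytes. All names (`display_text`, `is_valid_utf8`, `chunk_from_encoded`, and the Python versions of `StringChunk` and `OrigEncChunk`) are assumptions made for this example, and a plain codec name stands in for `ref OrigEncoding`. Python's strict UTF-8 decoder also rejects encoded surrogates, so WTF-8 input counts as "broken" here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OrigEncChunk:
    # Assumption: a Python codec name stands in for `ref OrigEncoding`.
    encoding: str
    data: bytes


@dataclass(frozen=True)
class StringChunk:
    # "Possibly broken" UTF-8 bytes; never validated on construction.
    utf8: bytes
    # Optional loss-less copy of the original, non-UTF-8 bytes.
    orig: Optional[OrigEncChunk] = None


def display_text(raw: bytes) -> str:
    """Decode for display; malformed sequences become U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def is_valid_utf8(raw: bytes) -> bool:
    """Strict check, meant for validation code paths only."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def is_valid_email(raw: bytes) -> bool:
    """Deliberately simplistic: valid UTF-8 and exactly one '@'."""
    return is_valid_utf8(raw) and raw.count(b"@") == 1


def chunk_from_encoded(data: bytes, encoding: str) -> StringChunk:
    """Store UTF-8 bytes plus the original bytes (the escape hatch)."""
    text = data.decode(encoding)
    return StringChunk(text.encode("utf-8"), OrigEncChunk(encoding, data))
```

Code that assumes "UTF-8 everywhere" would only ever read `utf8`, while code that is "capable of non-UTF-8" could look at `orig` when it is present. Whether `utf8` is stored or generated on the fly is left open here, as it is in the pseudo-code.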