notes/compact_expr_encoding.txt




Compact encoding of expressions
===============================

First byte:

    7 bits: token type
    1 bit:  long encoding (= 0)

Second byte:

    6 bits: column advancement (relative to the previous token; if line
            advancement != 0, counted from the start of the new line)
    2 bits: line advancement
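
A minimal sketch of how the two advancement fields could be computed and
applied. Everything here is an assumption for illustration: positions are
(line, column) pairs with 0-based columns, the column advancement is relative
to the previous token when the line does not change, and the helper names
encode_advance / apply_advance are hypothetical.

```python
def encode_advance(prev, cur):
    """Return (line_adv, col_adv) for moving from prev to cur.

    Returns None when the move does not fit into 2 + 6 bits; a caller
    would then presumably switch to the long encoding.
    """
    (prev_line, prev_col), (line, col) = prev, cur
    line_adv = line - prev_line
    # On a new line the column is counted from the start of that line.
    col_adv = col if line_adv != 0 else col - prev_col
    if 0 <= line_adv < 4 and 0 <= col_adv < 64:
        return line_adv, col_adv
    return None


def apply_advance(prev, line_adv, col_adv):
    """Inverse of encode_advance: recover the position of the current token."""
    prev_line, prev_col = prev
    if line_adv != 0:
        return prev_line + line_adv, col_adv
    return prev_line, prev_col + col_adv
```

With these assumptions, a token 15 columns further along the same line is
stored as (0, 15), while a token two lines down at column 4 is stored as
(2, 4) regardless of the previous column.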

Third and fourth bytes:

    16 bits: string table ID
             (different namespaces for:
              types/methods/fields/locals/constructors/typeidents/strings)
    - The IDs could be sorted such that the most common ones have low
      (16 bit) IDs.
    - This is a string, not a bound symbol, so the same ID might be used
      for different symbols/variables in different contexts!
    - With a 32-bit ID, it would be possible to have an offset into a string
      table. BUT constructing such a string table (with immutable offsets)
      might be inefficient.
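
The four bytes described so far could be packed roughly as follows. This is
only a sketch: the notes do not fix the bit order, so the layout here (token
type in the upper 7 bits of byte 1, column advancement in the upper 6 bits of
byte 2, little-endian ID) is an assumption, and assign_ids, pack_short and
unpack_short are hypothetical names. assign_ids illustrates the idea of giving
the most common strings the lowest IDs within one namespace.

```python
import struct
from collections import Counter


def assign_ids(names):
    """Give the most frequent strings the lowest IDs (one namespace)."""
    counts = Counter(names)
    return {name: i for i, (name, _) in enumerate(counts.most_common())}


def pack_short(token_type, col_adv, line_adv, string_id):
    """Pack one token in the short (4-byte) form, long-encoding bit = 0."""
    if not (0 <= token_type < 128 and 0 <= col_adv < 64
            and 0 <= line_adv < 4 and 0 <= string_id < 65536):
        raise ValueError("field does not fit the short encoding")
    byte0 = token_type << 1            # low bit 0: not the long encoding
    byte1 = (col_adv << 2) | line_adv  # 6 bits column, 2 bits line
    return struct.pack("<BBH", byte0, byte1, string_id)


def unpack_short(data):
    """Return (token_type, col_adv, line_adv, string_id)."""
    byte0, byte1, string_id = struct.unpack("<BBH", data)
    if byte0 & 1:
        raise ValueError("long-encoding bit is set")
    return byte0 >> 1, byte1 >> 2, byte1 & 3, string_id
```

Because the ID refers to a string and not to a bound symbol, the same
string_id can appear in tokens that resolve to different variables.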

Long encoding
-------------

    7 bits: token type
    1 bit:  long encoding (= 1)
    8 bits: column number (0 = too large to fit)
   16 bits: line number (0 = too large to fit)
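
A sketch of packing the long form, under the same assumed bit order as for the
short form (token type in the upper 7 bits of the first byte, little-endian
16-bit field). It further assumes 1-based column and line numbers and treats 0
as the marker for a value that does not fit in its field. pack_long is a
hypothetical name.

```python
import struct


def pack_long(token_type, column, line):
    """Pack one token in the long form, long-encoding bit = 1."""
    if not 0 <= token_type < 128:
        raise ValueError("token type does not fit in 7 bits")
    # 0 is reserved: it marks a value that cannot be represented.
    col_field = column if 1 <= column <= 0xFF else 0
    line_field = line if 1 <= line <= 0xFFFF else 0
    byte0 = (token_type << 1) | 1
    return struct.pack("<BBH", byte0, col_field, line_field)
```

Under these assumptions a token at column 300 of line 70000 keeps its type but
stores 0 in both position fields.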