path: root/notes/source_locations.txt

What items can be used in error messages AFTER parsing, i.e.
which source locations do we need to track?
(during parsing we know the current source location)

These need to be tracked:
* Expressions (including identifier references)
* Declared identifiers
* Control statements
Alternatively, track the following (this is option 6 below):
* Declarations (types/data/functions)
* Statements (these could then have relative line/column counting also)
* RPN (also using relative counting)

-----

How to store them?
1) Store directly in expression/declaration:
    uint32 line
    uint32 column
    const char *filename
    (96/128 bits = 12/16 bytes, with 32/64-bit pointers)
2) Store somewhat compressed, but still directly in expression/declaration:
    uint32 line
    uint16 column
    uint16 file_id
    (64 bits = 8 bytes)
3) Store line and column only, and detect file from arena
    uint32 line
    uint32 column
    (64 bits = 8 bytes)
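
A minimal C sketch of options 1-3, only to make the size estimates concrete.
The struct names (LocFull, LocCompact, LocArena) are hypothetical, and the
sizes assume typical alignment rules on 32- or 64-bit targets:

```c
#include <assert.h>
#include <stdint.h>

/* Option 1: everything stored directly; 12 or 16 bytes depending on
 * the pointer size (assuming no unusual padding). */
struct LocFull {
    uint32_t line;
    uint32_t column;
    const char *filename;
};

/* Option 2: compressed; file_id is assumed to index some table of files. */
struct LocCompact {
    uint32_t line;
    uint16_t column;
    uint16_t file_id;
};

/* Option 3: the file is not stored at all and would have to be found
 * from the arena that the expression/declaration lives in. */
struct LocArena {
    uint32_t line;
    uint32_t column;
};

/* Compile-time checks of the 8-byte estimates (C11). */
static_assert(sizeof(struct LocCompact) == 8, "option 2 should be 8 bytes");
static_assert(sizeof(struct LocArena) == 8, "option 3 should be 8 bytes");
```

Option 2 limits columns and file ids to 65535, which is the price for fitting
in the same 8 bytes as option 3 while still identifying the file directly.
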
4) Store an index to some array (separate arrays for file-local stuff and symbols visible outside of the file)
    uint32 index
    (32 bits = 4 bytes)
    array of bytes:
        0xxx xxxx Increase line number
        1xxx xxxx Increase column number
        0000 0000 Change of file (following bytes are some kind of file identifier or pointer)
        Two repeated increases of the same type indicate that the following bits are high(er) order bits.
    (usually 8-16 bits = 1-2 bytes)
    Total: 5-6 bytes (4-byte index plus 1-2 bytes in the array)
5) Store subexprs as an array in variable-length RPN format and encode it as follows:
    (it would be nice if the array could be stored separately)
    First:
    0ccc cccc             signed change of column number
    1ccc cccc  llll llll  signed change of column and line number
    0000 0000  int16 column int16 line
    1111 1111  int32 column int32 line
    (usually 8-16 bits = 1-2 bytes)
    Then the operator/terminal type follows.

    Declarations and top-level exprs still need full (absolute) line/column info.
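
The position prefix of option 5 could be decoded roughly as in the C sketch
below. Several details are assumptions that the notes do not settle: the
int16/int32 forms are treated as signed changes (like the short forms) rather
than absolute values, multi-byte fields are little-endian, and the names
(Pos, sext7, read_le_signed, decode_pos) are made up for illustration. Note
that 0000 0000 and 1111 1111 are taken as escapes, so a column change of 0
without a line change, or of -1 with a line change, needs a longer form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct Pos {
    int32_t line;
    int32_t column;
};

/* Sign-extends the low 7 bits of b (range -64..63). */
static int32_t sext7(uint8_t b)
{
    int32_t v = b & 0x7F;
    return (v & 0x40) ? v - 0x80 : v;
}

/* Reads a little-endian two's complement integer of 1, 2 or 4 bytes. */
static int32_t read_le_signed(const uint8_t *p, int nbytes)
{
    uint32_t u = 0;
    for (int i = 0; i < nbytes; i++)
        u |= (uint32_t)p[i] << (8 * i);
    if (nbytes < 4 && (u & ((uint32_t)1 << (8 * nbytes - 1))))
        u |= UINT32_MAX << (8 * nbytes);      /* sign-extend to 32 bits */
    /* Convert without relying on implementation-defined casts. */
    return u >= 0x80000000u ? -(int32_t)(uint32_t)~u - 1 : (int32_t)u;
}

/* Applies one encoded position change at p to *pos and returns the number
 * of bytes consumed; the operator/terminal type would follow after that. */
static size_t decode_pos(const uint8_t *p, struct Pos *pos)
{
    assert(p != NULL && pos != NULL);
    uint8_t b = p[0];

    if (b == 0x00) {                /* 0000 0000  int16 column int16 line */
        pos->column += read_le_signed(p + 1, 2);
        pos->line   += read_le_signed(p + 3, 2);
        return 5;
    }
    if (b == 0xFF) {                /* 1111 1111  int32 column int32 line */
        pos->column += read_le_signed(p + 1, 4);
        pos->line   += read_le_signed(p + 5, 4);
        return 9;
    }
    pos->column += sext7(b);        /* 0ccc cccc or 1ccc cccc */
    if (b & 0x80) {                 /* 1ccc cccc  llll llll */
        pos->line += read_le_signed(p + 1, 1);
        return 2;
    }
    return 1;
}
```

Starting from the absolute position of the enclosing declaration or top-level
expression, a reader would call decode_pos once per subexpression and then
read the operator/terminal type at the returned offset.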

6) Store a uint32/uint32 line/column pair for each declaration
   (should the filename be detected from the arena?)
   For each statement have a uint16 as follows:
        int8 start_line_increment  (relative to previous statement)
        uint8 start_column
   If start_column is 0, read a (uint32,uint32) tuple after the statement*
   For each subexpr have a uint32 as follows:
        uint8 
   * This requires that we detect "end of arena block" when reading, and then
     continue from the next arena (which must then start with the tuple OR
     there must be space reserved at the end of the arena for a pointer to
     the next location).
   *2 If we use the hack above, then we could just as well skip pointers and
      store everything as a stream of bytes.
   *3 And in that case, we could dump those streams to disk in a target- and
      host-independent format, e.g.:
        uint32 offset to implementations
        ...declarations...
        ...implementations...
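
The per-statement part of option 6 could look roughly like the C sketch
below. It is only an illustration: the names (StmtLoc, FullLoc,
stmt_loc_encode, stmt_loc_decode) are hypothetical, the (line, column) order
of the escape tuple is an assumption, and arena-block handling and the
per-subexpression word are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compact per-statement location (2 bytes). */
struct StmtLoc {
    int8_t  start_line_increment;  /* relative to the previous statement */
    uint8_t start_column;          /* 0 means: a full tuple follows */
};

/* Full location, used for the escape tuple. */
struct FullLoc {
    uint32_t line;
    uint32_t column;
};

/* Tries to encode cur relative to the previous statement's line.
 * Returns 1 if the compact form is enough, or 0 if the caller must also
 * store cur as a full (uint32,uint32) tuple after the statement. */
static int stmt_loc_encode(uint32_t prev_line, struct FullLoc cur,
                           struct StmtLoc *out)
{
    int64_t delta = (int64_t)cur.line - (int64_t)prev_line;

    assert(out != NULL);
    if (delta < INT8_MIN || delta > INT8_MAX ||
        cur.column == 0 || cur.column > UINT8_MAX) {
        out->start_line_increment = 0;
        out->start_column = 0;     /* escape marker */
        return 0;
    }
    out->start_line_increment = (int8_t)delta;
    out->start_column = (uint8_t)cur.column;
    return 1;
}

/* Decodes a statement location. escape is only read when
 * loc.start_column is 0 and must then point to the full tuple. */
static struct FullLoc stmt_loc_decode(uint32_t prev_line, struct StmtLoc loc,
                                      const struct FullLoc *escape)
{
    struct FullLoc result;

    if (loc.start_column == 0) {
        assert(escape != NULL);
        return *escape;
    }
    result.line = (uint32_t)((int64_t)prev_line + loc.start_line_increment);
    result.column = loc.start_column;
    return result;
}
```

A statement that starts more than 127 lines away from the previous one, or
beyond column 255, falls back to the full tuple, so the common case costs
2 bytes and the rare case 10.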