Code Overview
=============

Directory Overview
------------------
The source directories:

    src-backend     - The backend part. Also called CSBE (CSlul BackEnd)
    |-- codegen     - The code generators (e.g. aarch64)
    \-- outformat   - The output formats (e.g. ELF)
    src-common      - Common code for unit testing etc.
    src-cslul       - The compiler frontend (parser/analyzer/ir-gen) and CLI.
    \-- winlibc     - Minimal libc for WINE/Windows.
    src-runtime     - The runtime library for SLUL programs.

The directories with test SLUL source code:

    errortest   - Tests of erroneous syntax.
    testexec    - Test SLUL application and libraries.

In addition, the source directories contain `unittest` sub-directories
with a `test_*.c` file corresponding to each `.c` source file.

Miscellaneous directories:

    misc        - Miscellaneous files
    |-- icons   - Very ugly icons. Should be replaced/improved :)
    \-- syntax  - Syntax highlighting definitions.
    notes       - Various random notes. Some may be irrelevant/obsolete,
                  or just some crazy ideas that will never be implemented.

The backend - CSBE
------------------
CSBE is a minimalistic self-contained compiler backend. It has the
following functionality:

* Functions for constructing an Intermediate Representation (IR)
* Machine code generation from the IR. Currently only aarch64.
* Output file generation from the machine code. Currently only ELF.

CSBE differs from other code generators (LLVM, QBE, ...) in these ways:

* There is no/minimal optimization. The goal is simplicity.
* The IR is *fully* architecture-neutral.
* There is no need for a toolchain/linker.
* There is no need for a sysroot. The IR has enough information
  to generate symbol imports, without any external information
  (i.e. libraries *don't* have to be installed at compile-time).
* **It is in a very, very early development phase :)**

Public header files (note that the API is **not** stable yet):

    include/csbe.h      - Public functions used by the frontend.
    include/csbe_ops.h  - Definitions of the IR operations.

Internal header files:

    csbe_internal.h     - Non-static internal functions and types.
    codegen/codegen_common.h - Helper functions for the codegen.
    outformat/outformat_common.h - Helper functions for ELF/PE.


The compiler frontend
---------------------
The compiler works in the following steps:

1. `main.c` parses the command line options, and initializes a
   compilation context object, `struct CSlul`.
2. The compilation stages are handled in `build.c`.
3. `mhtoken.c` / `mhparse.c` parse the "module header" lines in `main.slul`.
4. The main parsing (`token.c` / `parse.c`) is then done as follows:
    * If the module turns out to be an *application*:
        * The code in `main.slul` and any `\source` files is parsed
          into an AST (Abstract Syntax Tree).
    * If the module turns out to be a *library*:
        * The interface in `main.slul` is parsed to an AST.
        * The implementation in `\source` files is parsed. This is
          done in a separate AST.
    * Identifiers are "created" when they are first encountered during
      parsing. This includes references, not just definitions!
      Identifiers defined in a different AST (struct TopLevels) will
      be bound later, in the semantic verification phase.
5. For each dependency:
    * The currently installed version of each dependency is parsed
      (only the interface). Each one gets a separate AST.
    * Note that all interface dependencies must be specified in the module
      being compiled, so there is no need to handle recursive
      dependencies.
6. Semantic verification begins (see `cslul_ll_start_phase` in `context.c`)
    * `tlverify.c` binds identifiers to definitions in interfaces of
      libraries.
    * `tlverify.c` verifies declarations. Type definitions are verified
      by `typechk.c`.
    * `tlverify.c` calls `check_funcbody` in `funcchk.c` on each
      function body. This will also check that variables are
      assigned before use, etc.
    * Expressions are verified by `exprchk.c`.
    * Type compatibility is checked by `typecompat.c`.
7. `ir.c` generates IR from the AST(s).
8. `bwrapper.c` asks the backend (CSBE) to generate output file
    contents.

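The "create on first encounter, bind later" behaviour from step 4 can be illustrated with a minimal sketch. Everything in it (`struct Ident`, `struct IdentTable`, `ident_get`, `ident_define`, `ident_count_unbound`, the fixed-size table) is hypothetical and chosen for illustration only; the real structures in `ast.h` and the real lookup code are likely quite different. A reference and a definition both go through the same lookup, so an entry exists as soon as the name is seen, and a later pass can report whatever is still unbound.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_IDENTS 64

/* Hypothetical identifier entry; not the real CSLUL structure. */
struct Ident {
    const char *name;
    int is_defined;     /* 0 until a definition has been seen */
};

struct IdentTable {
    struct Ident entries[MAX_IDENTS];
    size_t count;
};

/* Returns the entry for `name`, creating an undefined one on first use.
   This is called for references as well as for definitions. */
struct Ident *ident_get(struct IdentTable *t, const char *name)
{
    size_t i;
    for (i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].name, name) == 0) {
            return &t->entries[i];
        }
    }
    assert(t->count < MAX_IDENTS);
    t->entries[t->count].name = name;
    t->entries[t->count].is_defined = 0;
    return &t->entries[t->count++];
}

/* Marks `name` as defined, whether or not it was referenced earlier. */
void ident_define(struct IdentTable *t, const char *name)
{
    ident_get(t, name)->is_defined = 1;
}

/* Counts identifiers that were referenced but never defined in this
   table. A later phase would try to bind these elsewhere. */
size_t ident_count_unbound(const struct IdentTable *t)
{
    size_t i, n = 0;
    for (i = 0; i < t->count; i++) {
        if (!t->entries[i].is_defined) {
            n++;
        }
    }
    return n;
}
```
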
Public header file (note that the API is **totally UNstable**):

    cslul.h     - The interface used by main.c to perform compilation.

Internal header files:

    ast.h       - Structures in the AST
    backend.h   - Functions in bwrapper.c, which in turn calls CSBE.
    defaults.h  - Default paths on POSIX platforms (not used on Windows).
    errors.h    - Compiler error codes + messages.
    hash.h      - Pre-computed hashes of SLUL keywords.
    internal.h  - Non-static internal functions and types.
    tokencase.h - "switch/case groups" of tokens.

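As an illustration of what pre-computing keyword hashes can look like, here is a minimal sketch of a generator that emits one `#define` line per keyword. The hash function (32-bit FNV-1a), the `H_` prefix, and the function names `keyword_hash` and `format_hash_define` are all assumptions made for this example; the actual hash used by `print_hashes.c` and the layout of `hash.h` may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 32-bit FNV-1a. An assumed stand-in; the real SLUL hash may differ. */
uint32_t keyword_hash(const char *s, size_t len)
{
    uint32_t h = 2166136261u;       /* FNV offset basis */
    size_t i;
    for (i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;             /* FNV prime */
    }
    return h;
}

/* Writes a line of the form "#define H_<keyword> 0x<hash>u" into `out`.
   The H_ naming scheme is hypothetical. Returns the snprintf result. */
int format_hash_define(char *out, size_t outsize, const char *keyword)
{
    assert(out != NULL && keyword != NULL);
    return snprintf(out, outsize, "#define H_%s 0x%08lxu\n", keyword,
                    (unsigned long)keyword_hash(keyword, strlen(keyword)));
}
```

With constants generated this way, a tokenizer can hash an identifier once and compare the result against the constants (for example in a `switch`), instead of running a string comparison per keyword. A final string comparison is still needed if hash collisions are possible.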

The runtime library - `libslulrt.so`/`slulrt.dll`
-------------------------------------------------
The runtime library will contain the following functionality:

* Initialization of the SlulApp object and the root arena.
* Management of arenas.
* Wrappers around memcpy and memcmp.
* String functions.
* Possibly list functions as well.
* System functions (e.g. file I/O, network functions, etc.)

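For readers unfamiliar with arenas, the following is a minimal sketch of the general technique: a bump-pointer allocator over a caller-supplied buffer, where all allocations are released together. It is not the slulrt implementation; `struct Arena`, `arena_init`, `arena_alloc` and `arena_reset` are hypothetical names, and the real runtime (see `rtarena.c`) may manage memory quite differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bump-pointer arena; not the actual slulrt data structure. */
struct Arena {
    unsigned char *base;    /* start of the backing buffer */
    size_t size;            /* total bytes available */
    size_t used;            /* bytes handed out so far */
};

void arena_init(struct Arena *a, void *buffer, size_t size)
{
    assert(a != NULL && buffer != NULL);
    a->base = buffer;
    a->size = size;
    a->used = 0;
}

/* Returns n bytes from the arena, or NULL when it is exhausted.
   Offsets are rounded up to pointer size; the buffer itself is
   assumed to be suitably aligned. */
void *arena_alloc(struct Arena *a, size_t n)
{
    const size_t align = sizeof(void *);
    size_t start = (a->used + align - 1) / align * align;
    if (start > a->size || n > a->size - start) {
        return NULL;
    }
    a->used = start + n;
    return a->base + start;
}

/* Frees every allocation at once by rewinding the arena. */
void arena_reset(struct Arena *a)
{
    a->used = 0;
}
```
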
Public header file:

    include/slulrt.h  - Definitions for accessing slulrt from C


The Makefile
------------
The makefile supports common Makefile variables such as DESTDIR, prefix,
srcdir, etc. See `notes/build_defines.txt` for a summary. There are some
system-specific makefiles, e.g. `Makefile.bsd`, that set some appropriate
variables for the given system and then include the main makefile.

A set of fast tests can be run with:

    make -s -j4 check

If you have TCC installed, you can run (most of the) tests with
bounds-checking enabled:

    make -s tcc-boundscheck

The tests can be run with Valgrind (use VALGRIND_OPTS=... to set options):

    make -s -j4 check-valgrind

A full check and scan, using several analysis tools, can also be run. This
can take over 30 minutes on slow devices:

    make -s -j4 scan-all

If running `make` outside the source root directory, you need to either
use the `-C` option or set `srcdir`. If the source and build directories
are different, you need to run `make outdirs` before running any other
make commands. Examples:

    # Using -C
    make -s -j4 -C .. check
    # Using srcdir
    make -s srcdir=/home/user/Code/slul outdirs
    make -s -j4 srcdir=/home/user/Code/slul -f /tmp/slul/Makefile check


Appendix: Descriptions of all .c files
--------------------------------------

Note that all `unittest/test_*.c` files are omitted. Those are all tests of
the corresponding `.c` file in the parent directory.

This listing can be generated with `make -s source-overview`.


In src-backend:

    analyze.c -- IR analysis functions
    datastruct.c -- Functions for creating CSBE data structures
    init.c -- Initialization for CSBE
    output.c -- Output generation

In src-backend/outformat:

    elf.c -- ELF file handling
    outformat_common.c -- Common functions for ELF/PE output
    raw.c -- Raw output format. Used to dump a textual IR

In src-backend/codegen:

    aarch64.c -- Code generator for Aarch64
    codegen_common.c -- Common functions for the code generators
    irdump.c -- Dumps IR in text form
    x86.c -- Code generator for i386 and x86_64

In src-cslul:

    arch.c -- Handling of targets/multiarch
    arena.c -- Arena allocator
    build.c -- The main build function
    builtins.c -- Sets up built-in definitions
    bwrapper.c -- Wrapper around the backend (CSBE) and IR generator
    chkutil.c -- Utility functions for the semantic checker
    config.c -- Functions for configuring compilation contexts.
    context.c -- Handles compilation context state
    errors.c -- Builds table for error message strings
    exprchk.c -- Expression checker
    funcchk.c -- Checking of function bodies
    ir.c -- Generates Intermediate Representation (IR)
    main.c -- Entry point for CSLUL compiler
    mhparse.c -- Parsing of module headers
    mhtoken.c -- Tokenization of module headers
    misc.c -- Miscellaneous functions
    parse.c -- Parsing of a token stream to an AST
    platform.c -- Platform dependent code
    print_hashes.c -- Generates pre-computed hashcodes for hash.h
    tlverify.c -- Verifies top-level symbols
    token.c -- Tokenization of source
    tree.c -- AVL tree map for hashed items
    typechk.c -- Type checker
    typecompat.c -- Type compatibility checker

In src-cslul/fuzz:

    aflmain.c -- Special entry point for fuzzing C-SLUL

In src-cslul/testgen:

    testgen.c -- Generates a very large source file with random functions

In src-cslul/winlibc:

    winlibc.c -- Basic (incomplete) C library for Windows with UTF-8 support

In src-runtime:

    rtarena.c -- Arena management functions
    rtinit.c -- Initialization functions for the runtime