
Many formats (e.g. YAML, TOML) are quite complex and have syntax-based typing.

INI-like formats are simple, but different parsers have different limitations:

* spaces in [sections] might not be allowed
* properties and [sections] may be case insensitive
    - some implementations are, some aren't.
* escaping special characters in keys is usually not possible
* duplicate properties might be disallowed
    - GLib merges them!
* duplicate sections might be disallowed (good, would just be confusing anyway)
* leading and trailing spaces seem to be removed, but the handling of spaces around = seems to be implementation dependent (I think most parsers remove them)
    - https://cloanto.com/specs/ini/ allows spacing around the =
* some specs disallow non-ASCII property names. Some probably use latin1?
* in keys, many special characters may be problematic:  . " \ _ etc.
    - but . appears in filenames!

* are properties without = allowed???
    - in some formats yes
    - in libconfini, these are called "implicit keys"
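
As a concrete data point for the list above, here is a minimal sketch of how one
widely available parser, Python's standard-library configparser, behaves on
several of these points. The section and key names ("Build", "files",
"main.slul") are made up for illustration, and other parsers will differ.

```python
import configparser

def parse(text, **options):
    """Parse INI text with Python's configparser and return the parser."""
    parser = configparser.ConfigParser(**options)
    parser.read_string(text)
    return parser

# Spaces around "=" are stripped, and property names are lowercased by default.
p = parse("[Build]\nName = demo\n")
name_value = p["Build"]["name"]
# Section names, however, stay case sensitive in this parser.
lowercase_section_found = p.has_section("build")

# Duplicate properties are rejected in the default strict mode...
try:
    parse("[a]\nx=1\nx=2\n")
    duplicate_rejected = False
except configparser.DuplicateOptionError:
    duplicate_rejected = True

# ...but with strict=False the last value silently wins.
last_wins = parse("[a]\nx=1\nx=2\n", strict=False)["a"]["x"]

# Properties without "=" are a parse error unless allow_no_value=True.
try:
    parse("[files]\nmain.slul\n")
    implicit_key_rejected = False
except configparser.ParsingError:
    implicit_key_rejected = True

# With allow_no_value=True the key exists and its value is None.
implicit_value = parse("[files]\nmain.slul\n", allow_no_value=True)["files"]["main.slul"]
```

So a single parser already mixes case-insensitive properties with case-sensitive
sections, and whether duplicates or properties without = are accepted depends
on constructor options.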


SLUL format:
* Avoid spaces in [sections]
* lowercase or Capitalized?
* multiple word sections? multi_word or MultiWord?
* Comment character, both # ; or only one?
* Disallow comments in values? Allowing them would require escaping of literal # and ;.
    - but on the other hand, many libraries parse comments in values,
      so allowing literal # or ; in values will break with those libraries
      (and only seldom, which is even worse)
    - https://cloanto.com/specs/ini/ seems to only allow ; comments after a space or at the start of a line
      (which is good to enforce if comments are allowed)
    - the same applies to quotes... perhaps " should simply be disallowed in values?
    - also, warn about repeated spaces in values? some implementations remove repeated spaces
* only allow lowercase names for files and properties
    - also solves issues when transferring files between case-sensitive (CS)
      and case-insensitive (CI) file systems.
    - CS -> CI => duplicate files issue
    - CI -> CS => wrong case -> file not found
* only allow a-z0-9_ in file names, excluding file extension
    - otherwise names can't be lowercased reliably (it is tricky with Unicode, and locale dependent)
    - spaces and special characters are not allowed by all ini parsers
      (e.g. git-config does not seem to allow underscores)
    - filesystems have restrictions on the allowed characters. don't want portability issues.
    - strange Unicode characters could be a security issue (e.g. lookalikes, or RTL)
    - special characters could be a security issue (e.g. quotes, control characters, etc.)
    - some INI parser implementations parse stuff before a . as a section,
      so perhaps the file extension should be omitted?
      (may be a non-issue, since those usually merge the section+key into a string)
      but omitting it is bad for usability/discoverability (i.e. what the sections should contain)
* Encoding?
    - UTF-8, but non-ASCII characters only allowed in comments
    - Exception for "author", "description" etc.
    - If UTF-8 is allowed, then we need to restrict control characters, RTL etc.
      (control characters need to be limited regardless. don't want to allow NUL characters for example)
    - If we move out translations and author names, then we could require all data to be ASCII
* Print warning if BOM is present
* Line continuations?
    - Needed if lines can get long
    - But the best would be if long lines can be avoided
    - Regardless, it makes sense to have length limits.
      (for example, a 1 MB module name can only cause problems)
        - 50 ASCII characters for names/identifiers/filenames
        - Values that allow UTF-8 should have a limit in bytes
        - 50 bytes for ALL values??
    - If line continuations are disallowed, we should report errors when possible (e.g. indented key, invalid symbol in value)
* Internationalization?
    - Multi-language support for properties?
    - GLib uses Property[xx]=Value for this
* Avoid duplicate keys.
    [dependencies]
    gtk=gtk2 or gtk3
    # versions can be specified
    gtk3=3.2
    somelib=any
    # if a version starts with a letter, write it like this:
    somelib=somelib A1.1
    # syntax examples:
    #       lib1=1.0 or lib1x 1.0
    #       lib2=lib2 1.0 or lib2x 1.0
    # and maybe:
    #       lib3=lib3 1.0 !1.1-1.1.5
    #       lib3=lib3 1.0 (not 1.1-1.1.5)
    # the same library name may not be specified (explicitly or implicitly) twice
    #
* Disallow duplicate sections
* The most important properties should have = in them, so they can be parsed
  by tools that might use some library that requires them.
    - file names don't need this. those should only be used by the compiler
    - names, versions, URLs might need it.
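
To make some of the rules above concrete, here is a rough sketch of a checker
for them. It is not the SLUL implementation. It assumes the open questions are
settled as follows: section names are lowercase with underscores, # and ; both
start comment lines, names are limited to 50 characters of a-z0-9_ (with one
file extension ignored), line continuations are disallowed, and duplicates are
errors. The function name check_buildfile and the messages are made up for
illustration. Values are not checked.

```python
import re

MAX_NAME_LEN = 50  # limit floated in the notes for names/identifiers/filenames
NAME_RE = re.compile(r"[a-z0-9_]+")  # allowed characters, extension excluded
BOM = "\ufeff"

def check_buildfile(text):
    """Return a list of (line_number, message) problems found in `text`."""
    problems = []
    if text.startswith(BOM):
        problems.append((1, "warning: BOM is present"))
        text = text[len(BOM):]
    section = None
    seen_sections = set()
    seen_keys = set()
    for lineno, raw in enumerate(text.split("\n"), start=1):
        # Control characters (e.g. NUL) are rejected everywhere, even in comments.
        # Note that this also flags the CR of CRLF line endings.
        if any((ord(ch) < 0x20 and ch != "\t") or ord(ch) == 0x7F for ch in raw):
            problems.append((lineno, "control character"))
            continue
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        if raw[0] in " \t":
            # No line continuations: an indented line is reported, not joined.
            problems.append((lineno, "indented line (continuations are not allowed)"))
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            if not NAME_RE.fullmatch(section) or len(section) > MAX_NAME_LEN:
                problems.append((lineno, "invalid section name"))
            if section in seen_sections:
                problems.append((lineno, "duplicate section"))
            seen_sections.add(section)
            continue
        # A property without "=" (e.g. a file name) is treated as a bare key.
        key = line.split("=", 1)[0].strip()
        stem = key.rsplit(".", 1)[0]  # drop one file extension, if any
        if not NAME_RE.fullmatch(stem) or len(stem) > MAX_NAME_LEN:
            problems.append((lineno, "invalid name: " + key))
        if (section, key) in seen_keys:
            problems.append((lineno, "duplicate key: " + key))
        seen_keys.add((section, key))
    return problems
```

For example, check_buildfile("[a]\nx=1\nx=2\n[a]\n") reports a duplicate key on
line 3 and a duplicate section on line 4, while a file that follows all of the
rules yields an empty list.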