Many formats are quite complex and have syntax-based typing. E.g. YAML, TOML

INI-like formats are simple but different parsers have different limitations:
* spaces in [sections] might not be allowed
* properties and [sections] may be case insensitive
  - some implementations are, some aren't.
* escaping special characters in keys is usually not possible
* duplicate properties might be disallowed
  - GLib merges them!
* duplicate sections might be disallowed (good, would just be confusing anyway)
* leading and trailing spaces seem to be removed, but spacing around = seems
  to be implementation dependent (but I think most remove it)
  - https://cloanto.com/specs/ini/ allows spacing around the =
* some specs disallow non-ASCII property names. Some probably use latin1?
* in keys, many special characters may be problematic: . " \ _ etc.
  - but . appears in filenames!
* are properties without = allowed???
  - in some formats yes
  - in libconfini, these are called "implicit keys"

SLUL format:
* Avoid spaces in [sections]
* lowercase or Capitalized?
* multiple word sections? multi_word or MultiWord?
* Comment character, both # ; or only one?
* Disallow comments in values? Would require escaping.
  - but on the other hand, many libraries parse them, so allowing #; in
    values will break those libraries (and only seldom, which is even worse)
  - https://cloanto.com/specs/ini/ seems to only allow ; comments after
    a space or at the start of a line (which is good to enforce if
    comments are allowed)
  - the same applies to quotes... perhaps " should simply be disallowed
    in values?
  - also, warn about repeated spaces in values? some implementations
    remove repeated spaces
* only allow lowercase filenames in files and properties
  - also solves issues when transferring files between case
    sensitive/insensitive file systems.
    - CS -> CI => duplicate files issue
    - CI -> CS => wrong case -> file not found
* only allow a-z0-9_ in file names, excluding file extension
  - can't lowercase otherwise (it is tricky with Unicode, and locale dependent)
  - spaces and special characters are not allowed by all ini parsers
    (e.g. git-config does not seem to allow underscores)
  - filesystems have restrictions on the allowed characters.
    don't want portability issues.
  - strange Unicode characters could be a security issue
    (e.g. lookalikes, or RTL)
  - special characters could be a security issue
    (e.g. quotes, control characters, etc.)
  - some INI parser implementations parse stuff before a . as a section,
    [may be a non-issue, since those usually merge the section+key into a string]
    so perhaps the file extension should be omitted? but this is bad for
    usability/discoverability (i.e. what the sections should contain)
* Encoding?
  - UTF-8 but only allowed in comments
    - Exception for "author", "description" etc.
  - If UTF-8 is allowed, then we need to restrict control characters, RTL etc.
    (control characters need to be limited regardless. don't want to allow
    NUL characters for example)
  - If we move out translations and author names, then we could require
    all data to be ASCII
* Print warning if BOM is present
* Line continuations?
  - Needed if lines can get long
  - But the best would be if long lines can be avoided
  - Regardless, it makes sense to have length limits.
    (for example, a 1MB module name is only a problem)
    - 50 ASCII characters for names/identifiers/filenames
    - Values that allow UTF-8 should have a limit in bytes
    - 50 bytes for ALL values??
  - If disallowed, we should report errors when possible
    (e.g. indented key, invalid symbol in value)
* Internationalization?
  - Multi-language support for properties?
  - GLib uses Property[xx]=Value for this
* Avoid duplicate keys.
    [dependencies]
    gtk=gtk2 or gtk3
    # versions can be specified
    gtk3=3.2
    somelib=any
    # if a version starts with a letter, write it like this:
    somelib=somelib A1.1
    # syntax examples:
    #   lib1=1.0 or lib1x 1.0
    #   lib2=lib2 1.0 or lib2x 1.0
    # and maybe:
    #   lib3=lib3 1.0 !1.1-1.1.5
    #   lib3=lib3 1.0 (not 1.1-1.1.5)
    # the same library name may not be specified (explicitly or implicitly) twice
    #
* Disallow duplicate sections
* The most important properties should have = in them, so they can be parsed
  by tools that might use some library that requires them.
  - file names don't need this. those should only be used by the compiler
  - names, versions, urls might need it.
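
To get a feel for how strict these rules would be in practice, here is a
minimal Python sketch of a validating reader. It is not the SLUL parser.
The function name parse_strict_ini, the constant names, and every rule it
enforces are assumptions that pick one option from the open questions above:
lowercase a-z0-9_ names of at most 50 characters, no duplicate sections or
keys, no indented lines (so no line continuations), no control characters,
full-line # or ; comments only, and a warning rather than an error for a BOM.
It also requires = on every property, which is a simplification, since the
notes leave room for file names without =.

```python
import re

# Assumed limits and character set, taken from options floated in the notes.
MAX_NAME_LEN = 50
NAME_RE = re.compile(r"[a-z0-9_]+")


def _check_name(kind, name, lineno):
    """Reject names that are too long or use characters outside a-z0-9_."""
    if len(name) > MAX_NAME_LEN:
        raise ValueError(f"line {lineno}: {kind} name longer than {MAX_NAME_LEN}")
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"line {lineno}: invalid {kind} name {name!r}")


def parse_strict_ini(text):
    """Parse text into {section: {key: value}} and a list of warnings.

    Raises ValueError on the first violation of the (assumed) rules.
    """
    warnings = []
    if text.startswith("\ufeff"):
        warnings.append("BOM present")
        text = text[1:]

    sections = {}
    current = None
    for lineno, raw in enumerate(text.split("\n"), start=1):
        raw = raw.rstrip("\r")  # tolerate CRLF line endings
        if not raw.strip():
            continue  # blank line
        if raw[0] in " \t":
            # An indented line would be a continuation in some parsers.
            raise ValueError(f"line {lineno}: indented line")
        if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in raw):
            raise ValueError(f"line {lineno}: control character")

        line = raw.strip()
        if line[0] in "#;":
            continue  # full-line comment

        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            _check_name("section", name, lineno)
            if name in sections:
                raise ValueError(f"line {lineno}: duplicate section [{name}]")
            current = sections[name] = {}
            continue

        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: property without '='")
        if current is None:
            raise ValueError(f"line {lineno}: property outside of a section")
        key = key.strip()
        _check_name("property", key, lineno)
        if key in current:
            raise ValueError(f"line {lineno}: duplicate property {key!r}")
        current[key] = value.strip()  # spacing around = is removed

    return sections, warnings
```

For example, parse_strict_ini("[dependencies]\ngtk3=3.2\nsomelib = any\n")
returns the sections {"dependencies": {"gtk3": "3.2", "somelib": "any"}}
with no warnings, while a second somelib= line in the same section raises
ValueError. Values are deliberately left unrestricted here (apart from
control characters), because the notes have not settled on an encoding or
a byte limit for them.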