Language
The language understood by the com.io7m.jsx package can be defined by
the following grammar:
The com.io7m.jsx lexer recognizes the sequences U+000D U+000A and U+000A
as line terminators for the purposes of tracking line and column numbers
for diagnostic messages. The lexer does not permit bare U+000D characters
to appear outside of quoted strings.
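The line terminator rules above can be sketched as a small position
tracker. This is a minimal illustration, not the package's actual lexer:
the class name LineTrackerSketch, the method scan, the 1-based line and
0-based column convention, and the simple quote toggling (which ignores
escape codes) are all assumptions made for the example.

```java
final class LineTrackerSketch {
  /**
   * Scans the text and returns {line, column} after the last character.
   * Hypothetical convention: lines start at 1, columns at 0.
   */
  static int[] scan(final String text) {
    int line = 1;
    int column = 0;
    boolean inQuotes = false; // Assumption: quotes toggle; escapes are ignored.
    int i = 0;

    while (i < text.length()) {
      final int cp = text.codePointAt(i);
      final int next = i + Character.charCount(cp);

      if (cp == 0x000D) {
        final boolean lfFollows =
          next < text.length() && text.charAt(next) == 0x000A;
        if (lfFollows) {
          // U+000D U+000A counts as a single line terminator.
          line++;
          column = 0;
          i = next + 1;
          continue;
        }
        if (!inQuotes) {
          // A bare U+000D is only permitted inside a quoted string.
          throw new IllegalArgumentException(
            "Bare U+000D at line " + line + ", column " + column);
        }
        column++;
      } else if (cp == 0x000A) {
        // U+000A alone is also a line terminator.
        line++;
        column = 0;
      } else {
        if (cp == 0x0022) {
          inQuotes = !inQuotes;
        }
        column++;
      }
      i = next;
    }
    return new int[] {line, column};
  }
}
```

Under these assumptions, scanning a text in which "(a" is followed by
U+000D U+000A and then "b)" ends on line 2, whereas a text containing a
bare U+000D outside of quotes is rejected with an exception.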
Many systems that parse S-expressions allow for the use of square
brackets to increase readability. For example:
Note that the second version, which uses square brackets, is slightly
easier to read because the brackets indicate more clearly which of the
nested lists is being terminated. The com.io7m.jsx lexer can optionally
treat U+005B [ and U+005D ] as tokens, and the parser ensures that
brackets are balanced with respect to ordinary parentheses as part of
the grammar above. If square brackets are enabled, the language
understood by the com.io7m.jsx package is defined by expression_squares.
Otherwise, the language is defined by expression.
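The balancing requirement can be illustrated with a stack-based check.
This is only a sketch of the idea and not the package's parser: the
class BracketBalanceSketch and the method isBalanced are hypothetical
names, and quoted strings are skipped with a simple toggle that ignores
escape codes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class BracketBalanceSketch {
  /**
   * Returns true if every '(' is closed by ')' and every '[' by ']',
   * with correct nesting. Characters inside quoted strings are ignored.
   */
  static boolean isBalanced(final String text) {
    final Deque<Character> open = new ArrayDeque<>();
    boolean inQuotes = false; // Assumption: no escape handling.

    for (int i = 0; i < text.length(); i++) {
      final char c = text.charAt(i);
      if (c == '"') {
        inQuotes = !inQuotes;
        continue;
      }
      if (inQuotes) {
        continue;
      }
      switch (c) {
        case '(':
        case '[':
          open.push(c);
          break;
        case ')':
          // A ')' must close the most recently opened '('.
          if (open.isEmpty() || open.pop() != '(') {
            return false;
          }
          break;
        case ']':
          // A ']' must close the most recently opened '['.
          if (open.isEmpty() || open.pop() != '[') {
            return false;
          }
          break;
        default:
          break;
      }
    }
    // Unclosed lists or an unterminated string are also unbalanced.
    return open.isEmpty() && !inQuotes;
  }
}
```

With this check, an input such as (a [b c] d) is accepted, while
(a [b c) d] is rejected because the ) attempts to close a list that was
opened with [.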
The terminals of the language are given by:
Due to limitations in the EBNF format, the definitions for
symbol_character and quoted_character cannot be expressed directly.
Informally, the symbol_character rule should be understood to specify
any Unicode character that is not whitespace, is not U+0028 (, is not
U+0029 ), and is not U+0022 ". If square brackets are enabled, the
symbol_character_squares rule should be understood to replace the
symbol_character rule.
The quoted_character rule should be understood to specify any character
that is not U+0022 ".
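The informal character rules can be written as predicates over Unicode
code points. The following is a sketch under stated assumptions rather
than the package's implementation: the class and method names are
hypothetical, whitespace is approximated with Java's
Character.isWhitespace, and the square bracket variant is assumed to
additionally exclude U+005B [ and U+005D ], which the text above does
not state explicitly.

```java
final class SymbolCharacterSketch {
  /**
   * Approximates symbol_character, or symbol_character_squares when
   * squareBrackets is true.
   */
  static boolean isSymbolCharacter(
    final int codePoint,
    final boolean squareBrackets)
  {
    // Assumption: Java's definition of whitespace stands in for the
    // grammar's notion of whitespace.
    if (Character.isWhitespace(codePoint)) {
      return false;
    }
    switch (codePoint) {
      case 0x0028: // (
      case 0x0029: // )
      case 0x0022: // "
        return false;
      case 0x005B: // [
      case 0x005D: // ]
        // Assumption: brackets stop being symbol characters only when
        // square bracket support is enabled.
        return !squareBrackets;
      default:
        return true;
    }
  }

  /** Approximates quoted_character: anything except U+0022. */
  static boolean isQuotedCharacter(final int codePoint) {
    return codePoint != 0x0022;
  }
}
```

Under these assumptions, U+005B [ is an ordinary symbol character when
square brackets are disabled and a separate token when they are enabled.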
Quoted strings may contain escape codes
that are transformed to specific characters during lexing.