bshell/doc/sample/syntax.bshell

# The lexer has three modes: ARITHMETIC, COMMAND, and STRING
# ARITHMETIC mode is operand-based: all symbols, keywords, and constant parsing
# are enabled.
# COMMAND mode is word-based: only a subset of symbols is enabled, no keyword
# or constant parsing is performed, and more liberal word formations and
# substitutions are allowed.
# STRING mode is used to read string literals (i.e. those strings that DON'T
# support variable substitutions). All chars read are appended to the resulting
# string, with no further parsing performed.

# Initially, the lexer mode is unspecified, until:
#  a) The lexer reads a character, from which the correct mode is deduced.
#  b) The parser manually switches the lexer's mode
# Lexer state supports nesting.

# ARITHMETIC
# both of these are equivalent
$a = 2
# VAR(a)
# SYMBOL(=)
# INT(2)

$b=4
# VAR(b)
# SYMBOL(=)
# INT(4)

# ARITHMETIC
# this is a syntax error (there should be an operator between the two vars)
$a$b
# VAR(a)
# VAR(b)

# When the parser encounters SYMBOL(%) it should switch the lexer to COMMAND
# mode, which will allow the following word construction to be used.
# this executes the command whose name is equal to concatenating the values
# of $a and $b (in this case, '24')
% $a$b
# SYMBOL(%)
# WORD_START
# VAR(a)
# VAR(b)
# WORD_END

# executes the command with the name 'a+2b'. because the first char encountered
# by the lexer is alphabetic, it reads a regular word in COMMAND mode.
a+2b
# WORD(a+2b)

# executes the command with the name '-no$a' ($a is not substituted).
# the first char encountered is a symbol, which is read as a word in COMMAND
# mode
-no$a
# WORD(-no$a)

# returns the result of applying the NOT operator to the value of $a.
# the first char encountered is a symbol, which is read as a word in COMMAND
# mode. as characters are read, they are compared against registered operators.
# if a match is found, the operator is emitted, and the parser will switch
# the lexer to ARITHMETIC mode
-not$a
# OP(not)
# VAR(a)

# executes the command with the name '-not$a' ($a is NOT substituted)
# because of the preceding hyphen, variable substitution is not performed.
% -not$a
# SYMBOL(%)
# WORD(-not$a)

# executes the command with the name '-not2' ($a IS substituted)
# variable substitution IS performed in dquote strings regardless of the hyphen.
% "-not$a"
# SYMBOL(%)
# STR_START
# STRING(-not)
# VAR(a)
# STR_END

# interpreted as a command with args ['a', '+b', '/c']
# the first char encountered is alphabetic, so the expression is parsed in
# COMMAND mode
a +b /c
# WORD(a)
# WORD(+b)
# WORD(/c)

# interpreted as an arithmetic expression (but not a well-formed one)
+b /c
# SYMBOL(+)
# WORD(b)
# SYMBOL(/)
# WORD(c)

# interpreted as a command with name '%+'
%+
# WORD(%+)

# interpreted as a command with args ['%', '+']
% +
# WORD(%)
# WORD(+)

# interpreted as a command with name '%'
%;
# WORD(%)
# SYMBOL(;)

# interpreted as a command with name '+'
&+
# SYMBOL(&)
# WORD(+)

# interpreted as a string, which triggers the parser to switch the lexer to
# ARITHMETIC mode
'hello world'
# STRING(hello world)

# interpreted as a command with args ['echo', 'hello world']
echo 'hello world'
# WORD(echo)
# STRING(hello world)
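
# a sketch combining the two cases above, assuming dquote strings in COMMAND
# mode are tokenized the same way as in the '% "-not$a"' example (this token
# stream is an illustration, not confirmed lexer output).
# interpreted as a command with args ['echo', 'hello 2'] when $a is 2
echo "hello $a"
# WORD(echo)
# STR_START
# STRING(hello )
# VAR(a)
# STR_END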

# interpreted as an interpolated string
"Hello $(if ($x -lt 5) { echo 'yes' } else {echo 'no'})"


###############################################################################
# The lexer operates as a state machine, moving between states as different
# characters are encountered.
# The states are stored in a stack, to allow recursive parsing.
# The lexer has the following states:
# STATEMENT: A generic statement, could be a command, keyword, arithmetic
#       expression, etc. The next char or symbol encountered will cause the
#       lexer to switch to the appropriate state type:
#           Letters, word-symbols -> COMMAND
#           squote -> ARITHMETIC
#           dquote -> ARITHMETIC, FSTRING
#           Digits, vars, var-splats, keywords, all other symbols -> ARITHMETIC
# EXPRESSION: Similar to STATEMENT, but only allows a single command or
#       arithmetic expression. CANNOT use keywords or statement terminators.
#           Letters, word-symbols -> COMMAND
#           squote -> ARITHMETIC
#           dquote -> ARITHMETIC, FSTRING
#           Digits, vars, var-splats, keywords, all other symbols -> ARITHMETIC
# COMMAND: Only words, (f)strings, vars, var-splats, and a subset of symbols are
#       parsed.
# ARITHMETIC: Words, strings, vars, var-splats, all symbols, keywords are parsed.
# STRING: Only a subset of symbols are parsed, all other characters are appended
#       to the resulting string.
#
# Once a state has changed from EXPRESSION to one of the other three state
# types, certain characters will result in the current state either changing
# type or being popped from the stack:
#   STATEMENT: semicolon -> STATEMENT
#            right-paren, right-brace -> POP
#   EXPRESSION: semicolon -> POP
#            right-paren, right-brace -> POP
#   COMMAND: semicolon -> STATEMENT
#            right-paren, right-brace -> POP
#   ARITHMETIC: semicolon -> STATEMENT
#            right-paren, right-brace -> POP
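#
# For example, assuming the transitions above, the line
#       $a = 2; echo $a
# would move the lexer through these states (an illustrative trace, not
# confirmed lexer output):
#   '$a' -> STATEMENT resolves to ARITHMETIC (vars -> ARITHMETIC)
#   ';'  -> ARITHMETIC returns to STATEMENT (semicolon -> STATEMENT)
#   'e'  -> STATEMENT resolves to COMMAND (letters -> COMMAND)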
#
# Certain symbols require recursive parsing:
# - dquote strings allow string interpolation, so expressions within the string
#   may be parsed in a different state. Once the expression is complete, the
#   lexer returns to the previous state.
# - in most cases, $(...) can be used to delimit sub-expressions (including in
#   strings). When '$(' is encountered, a new state entry of type EXPRESSION is
#   pushed onto the stack. When the corresponding ')' is encountered, that state
#   entry is popped from the stack.
# - similarly to $(...), (...) can be used to group expressions, just like in
#   mathematical expressions.
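#
# A hypothetical trace of the state stack for a nested sub-expression, assuming
# the rules above. Whether an interpolated string gets its own FSTRING stack
# entry is an assumption made for illustration, not confirmed lexer behavior.
echo "sum: $($a + $b)"
# 'e'   STATEMENT resolves to COMMAND        stack: [COMMAND]
# '"'   interpolated string begins           stack: [COMMAND, FSTRING]
# '$('  EXPRESSION is pushed                 stack: [COMMAND, FSTRING, EXPRESSION]
# '$a'  EXPRESSION resolves to ARITHMETIC    stack: [COMMAND, FSTRING, ARITHMETIC]
# ')'   sub-expression entry is popped       stack: [COMMAND, FSTRING]
# '"'   string ends                          stack: [COMMAND]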