On Wed, 19 Feb 2020 14:34:33 +0100, Kelly Smith wrote:
>On Wed, 19 Feb 2020 12:27:31 +0100, Bruno Le Floch wrote:
>>On 2/19/20 10:25 AM, Jonathan Spratte wrote:
>>>> - Defining an environment that sets up active characters to emulate
>>>> traditional BNF syntax is very concise, but comes at the cost of
>>>> delimited arguments and catcode madness.
>>> You could as well use a letter-by-letter parser that doesn't need altered
>>> category codes. Take a look at the pgf module `parser`. That's not `expl3`
>>> but could give you an idea on what's possible.
>>I've been *toying* for a long time (haven't gone very far) with the idea
>>of writing a parser generator, but I couldn't decide how powerful to
>>make it. One option would be to support "parsing expression grammars"
>>(PEGs), which can be parsed in linear time using a packrat parser (but
>>use a lot of memory, possibly problematic). Another option would be to
>>stick with more traditional things like LL or LR parsers. To be honest,
>>I don't know enough about parsers and what useful languages they cover
>>to decide. Thoughts welcome.
>Given that the machinery already exists in the regex module, I would
>suggest making a lexer generator and then complementing it with
>a parser generator.
>Some quick reading suggests that LALR parsers (e.g. YACC) would be
>a good balance of expressiveness and efficiency. Since this would be
>used for document syntax, the parsers do not need to be very powerful,
>since user-facing syntax should be relatively simple and certainly
>should be unambiguous.
After discussing this with a friend who’s more knowledgeable about parsers
than I am, I’m retracting my suggestion of an LALR parser in favor of PEGs.
On reflection, though, I’ve come to favor the idea that document syntax
should still be parsed as it always has been in LaTeX: incrementally and
interleaved with command execution. Parsing and heavily processing entire
documents, or even large sections, at once would be very resource-intensive
and would require a lot of tooling. I’ll have to consider this further and
see if I can explain it more fully.
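
To make the PEG trade-off mentioned earlier in the thread a bit more
concrete, here is a minimal packrat sketch in Python (not expl3). Everything
in it is an assumption made for illustration: the Grammar class, the
combinator names (lit, charset, ref, seq, choice, star) and the toy
arithmetic grammar are hypothetical and do not correspond to any existing
LaTeX3 module. It only recognizes input and builds no parse tree.

```python
# Minimal packrat recognizer for a PEG (illustrative sketch only).
# A parsing expression is a function (grammar, text, pos) -> end position,
# or None when it fails to match at pos.

def lit(s):
    """Match the literal string s."""
    def p(g, text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return p

def charset(chars):
    """Match exactly one character drawn from chars."""
    def p(g, text, pos):
        return pos + 1 if pos < len(text) and text[pos] in chars else None
    return p

def ref(name):
    """Refer to a named rule; the call goes through the memo table."""
    return lambda g, text, pos: g.apply(name, text, pos)

def seq(*parts):
    """Match every part in order; fail as soon as one part fails."""
    def p(g, text, pos):
        for part in parts:
            pos = part(g, text, pos)
            if pos is None:
                return None
        return pos
    return p

def choice(*alternatives):
    """PEG ordered choice: the first alternative that matches wins."""
    def p(g, text, pos):
        for alt in alternatives:
            end = alt(g, text, pos)
            if end is not None:
                return end
        return None
    return p

def star(inner):
    """Greedy zero-or-more repetition; never fails."""
    def p(g, text, pos):
        while True:
            end = inner(g, text, pos)
            if end is None or end == pos:  # stop on failure or empty match
                return pos
            pos = end
    return p

class Grammar:
    def __init__(self, rules, start):
        self.rules = rules  # rule name -> parsing expression
        self.start = start
        self.memo = {}      # (rule name, position) -> end position or None

    def apply(self, name, text, pos):
        # Packrat step: each rule is evaluated at most once per position.
        key = (name, pos)
        if key not in self.memo:
            self.memo[key] = self.rules[name](self, text, pos)
        return self.memo[key]

    def match(self, text):
        self.memo = {}  # the table is only valid for one input
        return self.apply(self.start, text, 0) == len(text)

digit = charset("0123456789")

# Expr <- Term ('+' Term)*
# Term <- Atom ('*' Atom)*
# Atom <- '(' Expr ')' / [0-9]+
ARITH = Grammar({
    "Expr": seq(ref("Term"), star(seq(lit("+"), ref("Term")))),
    "Term": seq(ref("Atom"), star(seq(lit("*"), ref("Atom")))),
    "Atom": choice(seq(lit("("), ref("Expr"), lit(")")),
                   seq(digit, star(digit))),
}, start="Expr")

if __name__ == "__main__":
    for sample in ["1+2*(3+4)", "1+", "(12"]:
        print(repr(sample), ARITH.match(sample))
```

The memo table holds at most one entry per (rule, position) pair. That
bound is what keeps the work per input roughly linear, and it is also the
memory cost raised above: the table grows with the number of rules times
the length of the input, which is why whole-document parsing could get
expensive.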