## LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

#### View:

 Message: [ First | Previous | Next | Last ] By Topic: [ First | Previous | Next | Last ] By Author: [ First | Previous | Next | Last ] Font: Proportional Font

Subject:

Re: Strings, and regular expressions

From:

Date:

Thu, 13 Oct 2011 14:34:45 -0400

Content-Type:

text/plain

Parts/Attachments:

 text/plain (57 lines)
 >> Do you have any idea on the natural resulting data structure? It's not >> clear how trees should be implemented in TeX. > > I would suggest just return a token sequence, into which the TeX commands > the user specified has been inserted to reflect the aspects of the structure > that he's interested in. A silly example might be something like > > \begin{Sentence}\begin{NounPhrase}My > \begin{Noun}hovercraft\end{Noun}\end{NounPhrase} > \begin{VerbPhrase}\begin{Verb}is\end{Verb} full > \begin{PrepositionalPhrase}of > \begin{Noun}eels\end{Noun}\end{PrepositionalPhrase}\end{VerbPhrase}.\end{Sentence} > > It seems to be a fairly standard extension that something inside braces in > an PEG means "When passing through here upon matching, then insert this > piece of material." An awkward point is however that you'd often want to > insert an opening brace at one point and a closing brace a bit later -- for > example to make that \Noun{hovercraft} rather than the less flexible > \begin{Noun}hovercraft\end{Noun} -- which is not obvious how to express. My main problem was the insertion of { and }. We cannot do that directly, of course, and need to walk an extra mile. A reasonable but ugly approach is to insert " { \iffalse } \fi " instead of " { " and " \iffalse { \fi } " instead of " } ", also putting every other piece of material within "\unexpanded", then once the token list is built, \edef the whole thing to get braces right. Of course, there needs to be a check (before \edef-ing) that the result will be brace balanced. I will probably implement PEGs at some point, but it is unlikely that I do it before I finish rewriting l3fp (so count a few months). > I'd be inclined to consider a grammar with a left recursion loop a > programming error, just like it is for ordinary TeX macros. There is a > transformation of PEGs that turn left-recursion into runtime match failures > rather than infinite loops, but I don't know how complicated it is, and I > suspect it's mainly interesting when the PEG effectively gets compiled to > machine code. LaTeX already runs inside the TeX virtual machine, so an > infinite loop is not that fatal. Detecting left-recursion should be easy, and then I'll just raise an error, and possibly try to behave in a sensible way (not loop eternally). The problem is not that different from epsilon-transitions in an NFA running in a loop. > And exchanging the sets of initial and final states. Yes, that's all there > is to it. (I assume you don't worry about submatch capturing there anyway.) > >> but the >> representation I have makes that non-trivial). > > It is often there that the crux lies. Indeed, I don't need to keep track of submatches. Some more thoughts about that this weekend. Regards, Bruno