LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Subject:	xparse and xdoc
From:	Lars Hellström
Reply To:	Mailing list for the LaTeX3 project
Date:	Sun, 14 Sep 2008 23:03:23 +0200
Content-Type:	text/plain
Parts/Attachments:	text/plain (453 lines)
Since this list is currently in one of its active episodes, and 

questions are actually being answered, I thought maybe I should reraise 

some issues regarding the evolution of xparse that I put forth in 

February (according to my mailbox) but didn't get much of a response 

to. If memory serves, Frank wrote he wanted 3 weeks to think about it 

(the time I had taken to respond to a posting of his), but now it's 

been more like 30 weeks, and Frank is obviously not the only one around...



But no need to go digging through mailboxes, just read on:





PART I: Syntax



I now have a working implementation (for LaTeX2e) of an argspec based 

\NewDocumentCommand, which seems to work very well. The current code 

can be found at



   http://abel.math.umu.se/~lars/xdoc/xdoc2l3.dtx

   http://abel.math.umu.se/~lars/xdoc/xdoc2l3.ins



and an example document demonstrating various argspecs is



   http://abel.math.umu.se/~lars/xdoc/xdoc2l3test.dvi





When I started coding this, I thought I needed to stick very close to 

the final xparse (a somewhat difficult task, considering xparse is 

still experimental) -- hence the use of \NewDocumentCommand as command 

name -- but as time went on I've drifted more towards the opinion that 

it will be sufficient to have a path for manual migration to xparse. In 

other words, I'd like the basic features to be available (in the core 

or after additional definitions), but it's no big deal if the names and 

details in syntax are different.



[What could maybe be an issue for you is that I'm considering splitting 

off generic parts of xdoc2 into separate packages -- like shortvrb is 

from doc -- and one of these would then be the referenced xdoc2l3: 

select LaTeX3(ish) features for xdoc2. If for some reason it would 

become popular, then there could be a corresponding pressure for 

backward compatibility, regarding for example \IfNoValueTF. ;-) ]



Anyway...

Since I wanted to stay close to the xparse syntax, I stuck with the 

idea of having each argument type be denoted by a single character, 

followed by zero or more arguments. Since the main new argument type I 

introduced was "composition of processors", I chose the syntax



   @{<processors>}



for this; @ happens to be the function composition operation (\circ) in 

at least Maple (although you'll probably see that it isn't so much the 

composition that is going to be characteristic of this specifier type, 

so I should probably rethink this). Like functions under composition, 

the processors are going to act on the argument in order right to left; 

this turned out to simplify the translation of these specifiers into 

actual code.



For the individual processors, I again chose the "single character 

followed by zero or more arguments" syntax, with the choice of 

character being influenced by xparse argument specifiers. Thus there is 

for example



   O{<default>}     Look ahead for a left bracket. If there is one, grab

                    the entire optional argument and place it within

                    braces. Otherwise insert <default> within braces.



   o                Look ahead for a left bracket. If there is one, grab

                    the entire optional argument and place it within

                    braces. Otherwise insert \NoValue and skip all

                    following processors.



   S{<char>}        Look ahead for a <char>. If there is one, gobble it

                    and insert \BooleanTrue, otherwise insert

                    \BooleanFalse.



   g{<assignments>} Make the specified <assignments> (usually to

                    \catcode or other parameters relevant for scanning).



With these, I can implement several (existing or proposed) xparse 

argument specifiers as mere shorthands for @{...} constructions:



   m                is  @{}  (no processors, just grab an argument)

   o                is  @{o}

   O{<default>}     is  @{O{<default>}}

   S{<symbol>}      is  @{S{<symbol>}}

   s                is  @{S{*}}

   g{<assignments>} is  @{g{<assignments>}}



where the last was implemented by Morten in his mail of December 13, 

2007. Listed as above, this doesn't look too impressive, but the fun 

begins when one can start composing processors. With only the above, 

there aren't all that many combinations that make sense, but one that 

can be made is



   @{ o g{\@sanitize} }



---first change catcodes (as in the argument of \index), then look for 

and grab an optional argument while these catcodes are in effect.



To really see the power of this, it is however necessary to have some 

processors which do things to mandatory arguments. Those that there are 

implementations for in the code above are:



   h                         convert argument to a "harmless character

                             sequence", i.e., any problematic character

                             is encoded as \PrintChar{<ASCII code>}



   t                         argument is \stringed



   x{<bool>}{<pre>}{<post>}  argument gets expanded; more precisely, if

                             <bool> is true then

                                <pre>{<argument>}<post>

                             gets expanded and otherwise

                                <pre><argument><post>

                             gets expanded



   .{intrange}{<min>}{<max>} Argument must be some <number>, at least

                             <min> and at most <max>.



The . here is syntactic sugar for "multiletter processor name follows"; 

as with xparse specifiers currently, it would be possible to write 

{intrange}{<min>}{<max>} instead, but having an unbraced character at 

the beginning of a processor specifier makes it easier to read.



With the above, Morten's d specifier (grab and \detokenize) would be a 

shorthand for



   @{ x{\BooleanTrue}{\detokenize}{} }



and it would have the optional counterpart



   @{ x{\BooleanTrue}{\detokenize}{} o }



or it could be combined with catcode changes as



   @{ x{\BooleanTrue}{\detokenize}{} g{\catcode`\%=12} }



and of course all three can be combined, into



   @{ x{\BooleanTrue}{\detokenize}{} o g{\catcode`\%=12} }



However, \detokenize isn't satisfactory for my needs; the main problem 

being that detokenized material cannot be written to a file (in 

particular .idx/.glo file) and then reliably read back. The "h" 

processor above employs xdoc2's more robust (and powerful) alternative 

of harmless character sequences. The reason I implemented this x 

processor in the first place was instead that I had a legacy grabber 

\XD@grab@harmless@asmacro which can be specified as



   @{

      x{\BooleanFalse}{\XD@unbackslash}{\@empty}

      h

      g{\catcode92=12\MakePrivateLetters}

   }



using the above; the x part here removes a leading backslash in the 

argument, and the \@empty is there in case the argument was empty.



Additional examples of grabbers in xdoc2 which can be expressed using 

this syntax are



   \XD@grab@harmless@oarg  as  @{ho}

   \XD@grab@harmless@cs    as

                 @{h t g{\MakePrivateLetters\escapechar=-1}}



(see table in xdoc2l3.dtx).



Finally, I also implemented an argument specifier = which stores the 

argument into a macro. Its syntax is



   ={<control sequence>}{<processors>}



but since the <processors> part is just as for the @ specifier, the 

composition turned out to not be what was characteristic for this. I 

considered renaming @ to #, but decided not to (for now) as that is 

likely to lead to quoting problems in real life usage. And maybe this 

list can come up with better names anyway.





It probably wouldn't be hard to convert \DeclareArgumentType from 

defining an argument type specifier to defining an argument processor 

specifier---concretely one that takes some fancy sort of argument and 

converts it to a more regular mandatory argument. IMHO defining a 

processor would be much better, as it allows for independent control of 

orthogonal aspects of argument processing. I didn't implement anything 

in that area though, as I don't have any immediate need for exotic 

argument delimiters.



It may however be even better to construct argument specifiers as 

defined by \DeclareArgumentType as a composition of two argument 

processors: one that looks ahead for an optional argument and one which 

actually grabs it. The reason is demonstrated by \testI in the example 

document; concretely the problem is that in



   \NewDocumentCommand{\test}{ @{ O{world} g{\catcode`\%=12} } }{%

      Hello, #1!%

   }

   \test % Is this a comment?

   \relax



the % gets tokenized already when looking for the left bracket 

delimiter of the optional argument, and it stays a token even after it 

has been determined that there wasn't such an argument.
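
A minimal self-contained sketch of this effect in plain LaTeX2e (not
xdoc2l3 code; \peekdemo and \peekdemo@done are names made up for the
illustration). The catcode of % is changed inside a group, one token is
peeked at with \futurelet, and the group is closed again; the peeked %
nevertheless keeps the category code it was tokenized with:

   \documentclass{article}
   \makeatletter
   % Make % an ordinary character, peek at the next token, then let
   % \peekdemo@done close the group (restoring the comment catcode).
   \newcommand*\peekdemo{%
      \begingroup
      \catcode`\%=12
      \futurelet\@let@token\peekdemo@done
   }
   \newcommand*\peekdemo@done{\endgroup}
   \makeatother
   \begin{document}
   A\peekdemo % this text is typeset, percent sign included
   B
   \end{document}

Only the % itself is affected; the rest of that line is tokenized with
the normal catcodes, after the group has ended.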



If instead Q was a processor that does everything of O short of 

grabbing the argument, and b was a processor that just grabs a 

bracket-delimited argument, then the above specifier could have been



   \NewDocumentCommand{\test}{ @{ b g{\catcode`\%=12} Q{world} } }{%

      Hello, #1!%

   }



instead, for which things work more as expected:



   \test % This is a comment!

   \test[% This is not a comment!]



Splitting the two provides higher orthogonality (I find the long 

argument sequence of \DeclareArgumentType bewildering, and I don't 

think it is only a matter of lack of documentation), but it could make 

it harder to produce good error messages.





PART II: Technical issues



This mail discusses details of the implementation of the xdoc2l3 

\NewDocumentCommand command, and issues with these.



First a bit of context: In my old code, all argument grabbers were 

being expanded in the context



   <grabber> <subsequent grabbers> \XD@endgrab <next argument> ...



A grabber would do whatever it had to do to grab the <next argument>, 

and was then supposed to append it to \toks@; thereafter the 

\XD@endgrab macro expanded to \the\toks@. Xparse does roughly the same 

thing, but has an explicit \toks_use:N\l_xparse_grabbed_args_toks 

instead of the \XD@endgrab macro and uses \l_xparse_grabbed_args_toks 

as the token marking the beginning of arguments.



Implementing processors in this context was actually very easy, as all 

they had to do was to grab the <next argument>, do something to it, and 

then put the result back as a mandatory argument after \XD@endgrab, 

where the next processor would expect it. I'm a bit worried that the 

xparse construction, exposing the implementation detail of 

\l_xparse_grabbed_args_toks, in the long run will be restrictive; 

processors don't need to know the name of the token register (or even 

that there is one), but they do need to know the name of the mark token.



*Example*  The x{<bool>}{<pre>}{<post>} specifier compiles into the 

processor \XD@gp@expanded{<bool>}{<pre>}{<post>}, which is simply 

defined as



\def\XD@gp@expanded#1#2#3#4\XD@endgrab#5{%

    \protected@edef\@tempa{{%

       \IfBooleanTF{#1}{#2{#5}#3}{#2#5#3}%

    }}%

    \expandafter\XD@gh@put \expandafter{\@tempa}{#4}%

}

\def\XD@gh@put#1#2{#2\XD@endgrab#1} % Generic helper macro



---shovelling things around a bit, but quite straightforward. The 

(possibly less roundabout) approach of not throwing a processed 

argument back to the other side of the \XD@endgrab marker has the 

disadvantage that it forces a distinction between different types of 

processors and thus increases the complexity of the system.
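
The flow of tokens around the marker can be tried out in isolation with
the following miniature, which is only a sketch under assumptions: the
\mini... names are hypothetical stand-ins (not xdoc2 or xparse code),
there is a single processor without parameters, and the "command body"
is just a macro that shows its argument.

   \documentclass{article}
   \makeatletter
   % Hypothetical miniature of the scheme; the \mini@... names are not
   % from xdoc2. #1 = initial contents of \toks@, #2 = list of grabbers.
   \def\mini@grab#1#2{\toks@={#1}#2\mini@end}
   \def\mini@end{\the\toks@}
   % Plain grabber: append the next argument, braced, to \toks@.
   \def\mini@store#1\mini@end#2{%
      \toks@=\expandafter{\the\toks@{#2}}%
      #1\mini@end
   }
   % Processor: expand the next argument and put the result back as a
   % mandatory argument after the marker, for the next grabber (#1).
   \def\mini@expand#1\mini@end#2{%
      \protected@edef\@tempa{{#2}}%
      \expandafter\mini@put \expandafter{\@tempa}{#1}%
   }
   \def\mini@put#1#2{#2\mini@end#1}
   \newcommand*\minishow[1]{[\detokenize{#1}]}
   \newcommand*\mininame{world}
   \newcommand*\minidemo{\mini@grab{\minishow}{\mini@expand\mini@store}}
   \makeatother
   \begin{document}
   \minidemo{Hello, \mininame} % typesets [Hello, world]
   \end{document}

Here \mini@expand never touches \toks@; it only needs to know the
marker \mini@end, which is the property discussed above.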





Two things in xparse which I didn't implement are the P and W 

specifiers. In the case of W this is mostly because the LaTeX2e 

\@ifnextchar gobbles spaces, and I didn't feel like rolling my own; I 

have however encountered cases where I would have wanted an "o" 

specifier that didn't gobble spaces, so I certainly see the need for 

it. In the case of P, I was simply lazy; I don't think xdoc will need 

\long arguments anyway, and they increase the complexity quite 

considerably.
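
For reference, the space-gobbling behaviour of the LaTeX2e \@ifnextchar
can be seen in a small test file such as the following (\pairdemo and
\pairdemo@ are names invented for this sketch):

   \documentclass{article}
   \makeatletter
   % \pairdemo{<arg>}[<opt>] with the optional part found by \@ifnextchar.
   \newcommand*\pairdemo[1]{%
      \@ifnextchar[{\pairdemo@{#1}}{\pairdemo@{#1}[none]}%
   }
   \def\pairdemo@#1[#2]{(#1,#2)}
   \makeatother
   \begin{document}
   \pairdemo{a} [b]  % typesets (a,b): the space before [ was skipped
   \pairdemo{a} x    % typesets (a,none)x: the space before x is lost
   \end{document}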



However, while thinking about the matter, it occurred to me that the 

status for some grabber/processor of being non-\long implies that one 

cannot use \par in the <default> for an O argument, even if that 

argument itself is \long; the reason being that such a \par token then 

appears within the <subsequent grabbers>. This issue should at least be 

remarked upon in xparse.dtx.





One *new* thing I did in this implementation is that I used groups to 

isolate processing of different arguments and also the argument 

grabbing procedure as a whole. Concretely the latter was just the 

matter of changing the definitions



\def\XD@grab@arguments#1#2{%

    \toks@={#1}%

    #2\XD@endgrab

}

\def\XD@endgrab{\the\toks@}



(as found in current xdoc2) to



\def\XD@grab@arguments#1#2{%

    \bgroup

    \toks@={#1}%

    #2\XD@endgrab

}

\def\XD@endgrab{\expandafter\egroup \the\toks@}
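
The \expandafter is what lets the grabbed material survive the end of
the group: \the\toks@ is expanded while the local value of \toks@ is
still in force, and only then does \egroup act. A stand-alone sketch of
just this step (\groupdemo is a made-up name):

   \documentclass{article}
   \makeatletter
   \newcommand*\groupdemo{%
      \bgroup
      \toks@={\textbf{kept}}%        local assignment inside the group
      \expandafter\egroup \the\toks@ % \the\toks@ expands before \egroup acts
   }
   \makeatother
   \begin{document}
   \groupdemo % typesets "kept" in bold, with no group left open
   \end{document}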



As for the per-argument groups, if you look at xdoc2l3test.dvi you'll 

see that each argument grabber begins with an explicit \bgroup, whereas 

the matching \egroup is hidden in the ending \XD@ge@parameter or 

\XD@[log in to unmask] At the very least, the requirement that g{<assignments>} 

processors affect tokenisation and can be combined with optional-argument 

type processors means that things have to be done this way. There are 

however at least two problems with doing it in general:



  1. If there is an error when TeX is grabbing arguments, then the

     macro simply disappears, along with all text grabbed so far. Such

     errors arise easily as typos, and would in this case cause the

     closing \egroups to disappear, throwing off the nesting of groups.



Seeking to limit the damage caused by this, I chose to use \bgroup and 

\egroup so that at least the next end of an environment would get 

things back on track, but that causes a new problem.



  2. In math mode, \bgroup\egroup is far from invisible.



For xdoc this is acceptable, since the commands defined are anyway 

mostly for use in vertical or horizontal mode, but xparse should be 

able to define general commands and thus cannot accept this.
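
A small test file makes the difference visible. It relies only on
standard TeX behaviour: an empty \bgroup\egroup pair in math mode
leaves an empty ordinary atom in the math list, whereas
\begingroup\endgroup leaves nothing.

   \documentclass{article}
   \begin{document}
   $a=-1$                     % unary minus, set tight against the 1
   $a=\bgroup\egroup-1$       % empty atom before the minus: it becomes binary
   $a=\begingroup\endgroup-1$ % no atom is added: spaced like the first line
   \end{document}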



\begin{TeX-extension-idea}

   It occurs to me that it would be useful to have groups that nest even

   weaker than \bgroup-\egroup, and which do not contribute anything to

   math lists.

\end{TeX-extension-idea}



The existing alternative is to use \begingroup and \endgroup, but then 

one must guard against getting these mismatched. I have one idea for 

this, but I haven't tried it out yet:



\def\XD@gh@insure#1{%

    \let\reserved@a=#1%

    \expandafter\futurelet \expandafter\@let@token

      \expandafter\XD@gh@insure@

}

\def\XD@gh@insure@{%

    \ifx \reserved@a\@let@token \else \egroup\egroup \fi

}



The idea is that, instead of a naked macro (like \XD@gp@expanded) that 

processes material after \XD@endgrab, one writes



   \XD@gh@insure{\protected@edef}\XD@gp@expanded



where \protected@edef is the first token in the expected expansion of 

\XD@gp@expanded. If that is not what lies ahead, then apparently 

expansion of \XD@gp@expanded failed, and it is necessary to insert some 

emergency \egroups to get the nesting back to normal.



  3. Adding groups makes the command non-mouth-expandable.

     In particular, it will throw TeX out of the inner loop

     (disrupt kerning etc.)



Again, this is something I can live with in xdoc, but if the LaTeX3 

\NewDocumentCommand is going to replace \newcommand, then there will 

probably be cases where one wants to define a command that is really 

just a macro. This is probably doable, if \NewDocumentCommand only 

introduces the groups if they're really needed -- using a simpler 

implementation for the nice cases, similar to that for the tail of m 

arguments in current xparse -- but could lead to a rather complex 

codebase. Still, making the distinction be between "commands with only 

`m' arguments" and "all other commands" wouldn't need to make the code 

any more complicated than it is today. The only disruption that one 

would then get by starting and ending groups as part of argument 

grabbing is between an \accent and its base character, but perhaps that 

is acceptable.



However, I couldn't help dreaming a bit... If one was allowed to add 

one new TeX primitive, for the purpose of solving the issues above, 

then what should that be? Much of our problems stem from the 

restriction that TeX does not allow assignments while expanding things 

(even though the mouth is Turing-complete), so my choice would be to 

offer an escape from this restriction.



\begin{TeX-extension-idea}

   Add an \emph{expandable} primitive that allows for some commands

   (primarily assignments) to be \emph{executed} and affect the

   expansion. The syntax could be

     \expandedwith{<text>}{<commands>}

   and the semantics could be as follows:

   \begin{enumerate}

     \item

        A new group is begun.

     \item

        The <commands> are executed.

     \item

        The <text> is expanded (as discussed regarding

        new \expanded primitive). The result of this will

        be the one-step expansion of the entire \expandedwith.

     \item

        The new group is closed.

   \end{enumerate}

   With this, a non-spacebypassing \@ifnextchar could be conservatively

   implemented as

   \begin{verbatim}

     \long\def\@ifnextchar#1#2#3{%

        \expandedwith{%

           \ifx\@let@token\reserved@d

              \noexpand\@firstoftwo

           \else

              \noexpand\@secondoftwo

           \fi

           \the\toks@

        }{%

           \let\reserved@d=#1%

           \toks@={{#2}{#3}}%

           \futurelet\@let@token

        }%

     }

   \end{verbatim}

   (it would look as if the \@ifnextchar<char> is replaced by

   \@firstoftwo or \@secondoftwo in one expansion step).



   More relevant here is however that \XD@grab@arguments could be

   implemented as

   \begin{verbatim}

     \def\XD@grab@arguments#1#2{%

        \expandedwith{\the\toks@}{\iffalse}\fi

        \toks@={#1}%

        #2\XD@endgrab

     }

     \def\XD@endgrab{\iffalse{\fi}}

   \end{verbatim}

   ---certainly a braintwister, but everything in this area tends

   to be. \emoticon{:-)}



   Implementation-wise, I would try making a procedure which is

   like a simplified main_control and have \expandedwith work

   primarily by calling this to execute the <commands>. Since most

   of the complexity of main_control seems to lie in the "inner loop"

   spaghetti, this might actually be straightforward. In terms of

   the traditional TeX anatomy, this would (sensitive readers beware)

   amount to fitting TeX's mouth with an auxiliary stomach that's

   severely ostomized; anything solid (such as ordinary text characters)

   mistakenly fed down it should throw errors, but invisible stuff not

   involving typesetting should be OK.

\end{TeX-extension-idea}





Such dreaming aside, I think the Big Issue is whether argument grabbers 

should operate in a group or not.





Lars Hellström