Joseph Wright wrote:
> Hello Lars,
>
> Thanks for taking the time to look at all of this: I really appreciate it.
Just have patience with me; I'm not all that quick at the moment...
> Lars Hellström wrote:
>> Have begun review, but only gotten about a third of the way. Some
>> remarks so far:
>>
>> 1. Is it possible to use a space as delimiter of an argument (perhaps
>> most interesting for u arguments)? I believe I spotted some places in
>> the d->D conversion that would gobble a space as <token>.
>
> I tend to find it best to test things, even though I had a feeling I
> knew the answers here. The u argument works with spaces:
>
> \ExplSyntaxOn
> \DeclareDocumentCommand \foo { u{~stop~} } { (#1) }
> \ExplSyntaxOff
> \foo word stop more
>
> results in "(word)more".
I was thinking more about single spaces, as in
\moveto 0 0 \curveto 47 0 100 53 100 100
(the idea being to express a bunch of graphic data compactly while
still allowing the code to survive reflowing in a text editor), but
this is of course on the boundary of what can be considered LaTeX2e-ish
syntax.
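To make the idea concrete, here is a sketch of what such a signature might look like, assuming single-space delimiters were in fact supported for u arguments (both the u{~} behaviour and the \my_graphics_moveto:nn backend are hypothetical):

```latex
\ExplSyntaxOn
% Hypothetical: grab two space-delimited coordinates. In \ExplSyntaxOn
% code a literal space must be written as ~.
\DeclareDocumentCommand \moveto { u{~} u{~} } {
  % #1 = x coordinate, #2 = y coordinate
  \my_graphics_moveto:nn {#1} {#2} % backend name is made up
}
\ExplSyntaxOff
```

With that, "\moveto 0 0 \curveto ..." would pick up "0" and "0" and leave \curveto in the stream.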
> The D specifiers, on the other hand, do not work using a space as one
> delimiter. That is not affected by the shorthand: if you try something
like { D[~{default} } or { D[{~}{default} }, all hell breaks loose. That
> is basically what I expected, although I guess it should be documented.
> Do we really want to support something like { d~~ } ?
Probably not, considering that automatic skipping of spaces in front of
optional arguments has already made spaces special.
>> 2. I believe \xparse_prepare_next:w should be listed as a variable (or
>> maybe "variable function"), since it is getting redefined rather
>> frequently and seems to be keeping track of the state of the argspec
>> parser automaton.
>
> I've tried to improve the documentation here a bit.
> \xparse_prepare_next:w is not a variable, as it contains things to be
> executed. So it has to be a function! I have tried to explain what it is
> grabbing. You often see this with "next" functions, so this is just one
> case of something that is bound to come up again.
OK, I think I understand how this part of the system works now:
1. Processors occur as \xparse_process_arg:n{<code>} items in the
sequence of grabbers (the thing being constructed in
\l_xparse_signature_toks), _before_ the grabber for the argument to
which they will be applied.
2. At _runtime_ these pieces of code are placed into scratch functions
with names of the form \xparse_processor_<N>:n, where N=1,2,3,...
depending on how many processors there are.
\l_xparse_processor_total_int is the number of processors stored so far
(for the current argument).
3. When a grabber has successfully grabbed an argument, it hands it
over to \xparse_add_arg:n. This function applies the processors that
have been stored (if any), incrementing \l_xparse_processor_use_int as
it goes. Only after all processors have been applied will it append the
result to \l_xparse_args_toks.
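So, as I understand it, for a signature like ">{\ProcessorA} >{\ProcessorB} o" the define-time result would schematically be (grabber internals elided, so this is a sketch rather than the literal contents):

```latex
% Contents of \l_xparse_signature_toks (schematic):
\xparse_process_arg:n { \ProcessorA }
\xparse_process_arg:n { \ProcessorB }
\xparse_grab_o:w { ... }
% At runtime the two \xparse_process_arg:n items become the scratch
% functions \xparse_processor_1:n and \xparse_processor_2:n, which
% \xparse_add_arg:n then applies to the grabbed optional argument.
```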
One noticeable difference from what I believe was discussed is in the
order of processors. If ">{A} o" means "grab optional argument, then
apply A", then I think it stands to reason that ">{B} >{A} o" should
mean "grab optional argument, then apply A, finally apply B", but as
implemented in revision 1494 it's B before A. (Reversing the processing
order would allow one to do without \l_xparse_processor_use_int.)
If you find it important that processors should be applied in
left-to-right order, then I believe they should appear after the
argspec base, perhaps as "o <{A} <{B}", to keep as invariant that the
thing being done first is closest to the argspec base. (SYNTAX CHANGE)
This may also have the added value of being more intuitive for users.
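The order difference can be demonstrated with a pair of toy processors (the names \WrapA and \WrapB are made up; each stores its result back in \l_xparse_arg_toks, as processors are required to):

```latex
\ExplSyntaxOn
% Hypothetical processors: each wraps the current argument in a marker.
\cs_new:Npn \WrapA #1 { \toks_set:Nn \l_xparse_arg_toks { A(#1) } }
\cs_new:Npn \WrapB #1 { \toks_set:Nn \l_xparse_arg_toks { B(#1) } }
\DeclareDocumentCommand \demo { >{\WrapB} >{\WrapA} m } { [#1] }
\ExplSyntaxOff
```

With the revision 1494 (left-to-right) order, \WrapB runs first and \demo{x} gives [A(B(x))]; with nearest-first application as suggested above, it would give [B(A(x))].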
Regarding implementation, one could also consider an approach where the
next processor is a continuation to which the previous passes the
argument. This would mean \l_xparse_signature_toks for the above
example ends up with an item saying
\xparse_grab_o:w {
\xparse_process:nnn { \ProcessorA } {
\xparse_process:nnn { \ProcessorB } {
\xparse_add_arg:n
}
}
}
where we have (mixing l3 and 2e styles)
\def \xparse_process:nnn #1 #2 #3 {
#1 { #3 }
\exp_args:NnV \@firstofone {#2} \l_xparse_arg_toks
% How to do without \@firstofone here?
}
\def \xparse_add_arg:n #1 {
\toks_put_right:Nn \l_xparse_args_toks {#1}
}
Alternatively, one could make it a non-nested continuation
\xparse_grab_o:w {
\xparse_alt_process:nnn { \ProcessorA }
} {
\xparse_alt_process:nnn { \ProcessorB }
} \xparse_add_arg:n
with
\def \xparse_alt_process:nnn #1 #2 #3 {
#1 { #2 }
\exp_args:NnV \@firstofone {#3} \l_xparse_arg_toks
}
The primary advantage of a continuation approach is that the runtime
logic is simplified.
Finally, there is the issue that a processor has to put the argument in
a toks register. I understand this is for generality (the only sane way to
pass along # tokens), but my experience with this type of API is that
one should make it easy to use commands that happen to already exist.
In this case, it would mean to also support a processor that would
store its result in a macro rather than a toks register, since I'm
quite sure this is what people tend to do unless they definitely need a
toks register.
In the current implementation, such processors could be supported by
defining a relative of >{CODE} that instead of
\xparse_process_arg:n{CODE} puts \xparse_process_tl_arg:n{CODE} in the
signature, where
\cs_new:Nn \xparse_process_tl_arg:n {
\int_incr:N \l_xparse_processor_total_int
\cs_set:cn {
xparse_processor_ \int_use:N \l_xparse_processor_total_int :n
} {
#1 {##1}
\toks_set:No \l_xparse_arg_toks \l_xparse_arg_tl
}
}
With a continuation-style implementation of processors, it would not
even be necessary to take the detour over \l_xparse_arg_toks.
>> 3. Though provided in .dtx format, I find the implementation section
>> somewhat illiterate (i.e., not up to literate programming standards).
>> Some concrete examples:
>
> No-one ever said I was any good at literate programming :-)
>
>> * \l_xparse_processor_int and \l_xparse_processor_use_int are
>> documented as "For keeping a count of post-processors and then using
>> them." Well, I could guess about as much from the names alone. What
>> would be more interesting to see spelt out is /how/ these keep track
>> of post-processors; what do the actual values of these variables mean? At
>> what stage in the process are they used?
>
> I've tried to improve the documentation on this (whether I have I leave
> to others). I've also altered the name \l_xparse_processor_int to
> \l_xparse_processor_total_int. Trying to explain what has to happen here
> is not easy, at least for me.
That is a catch of literate programming: you can't do it if you can't
explain how your program works. (Of course, it might be argued that one
shouldn't write programs that one cannot explain, period. :-) ) At
least for me, this puts a restriction on how sleepy I can be when
programming, as my ability to produce text tends to disappear before my
ability to produce code...
Is the description I gave above of use to you?
> At the parsing stage, the processor
> functions are in front of the parser in reverse order. So as each one is
> found, it is saved and \l_xparse_processor_total_int is increased. Then
> the argument is grabbed, and the processors are used. This has to start
> at 1, so \l_xparse_processor_use_int is used to count up until it is
> equal to \l_xparse_processor_total_int, at which point the processing is
> done.
>
>> * The overall structure of the code starts with lots of little things
>> (macros concerned with parsing specific argspecs) and ends (I'm
>> guessing, from xparse-alt) with putting them together. The more literate
>> approach would be to start with the big picture -- either from the "what
>> goes on at run-time" or "what goes on at define-time" point of view --
>> so that one knows what the little things will fit into when one gets to
>> them.
>
> This reflects how I think, I suspect. I like to start at the low level
> and work up,
Literate programming does not mandate a top-down or bottom-up order of
code, but it does require you to construct a comprehensible "story"
about how the code works. If you want to start at the low level then
that is fine -- one can for example argue that the grabbers
(\xparse_grab_D:w etc.) are low level and the parsing helpers
(\xparse_add_type_D:w etc.) are higher level -- but you need to provide
enough information about the things defined that one can understand
what they do. In the case of xparse, a crucial piece of information is
what the token stream will look like when a particular function is
expanded. To the extent that they modify the interpreter state, enough
about this state must be known to understand that, too.
> hence variables come first (after the lead-off), then
> internal functions,
The Pascal standard requires that you define constants first, then
types, then variables, and finally functions and procedures, but that's
1970s computer science. Hardly best practice these days...
> the user functions. I've divided internal functions
> into what seem to me to be logical "blocks",
They could do with finer divisions. 12 pages is quite a lot.
> then I do everything
> alphabetically. So I can quickly find a function if I know its name.
Programmer's editors tend to have commands to help you with that (not
requiring definitions to be sorted), e.g. mine has a pop-up menu of
things defined in a file. Of course, you may need to teach it the l3
conventions first...
In the typeset form, there is an index which should provide the same
functionality via hyperlinks (although for some reason all the links
seem to go to page 1; is that just for me or is it broken in l3doc in
general?)
> (I
> never read code from start to end, or even in typeset form. I always
> read it in my editor, find a function, read it, then find the next
> function, etc. So for me alphabetical is best.) In my defence, xparse
> was in roughly the same order before any changes were made by me.
With complex "shorthands" such as \xparse_grab_D_aux:NNnnNn, it would
be nice to have it spelt out what each argument is.
>>> I'd missed this before as I'd not run makeindex on xparse. I've
>>> corrected my batch file so this happens!
>> A similar issue: Today I pdflatex'ed source3.tex (rev. 1464), and it
>> goes into an infinite loop at \end{document} (\seq_if_in:NnT being one
>> thingie involved).
>
> All seems okay to me on the SVN.
Some further introspection reveals that this happens inside an
expansion of \g_doc_functions_seq, which is very, VERY long. Hence I
suppose this "infinite loop" may in fact be an O(n^2) operation for a
very large n. Perhaps you should examine switching to a faster
algorithm, or provide some indication of progress?
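For instance, if the bottleneck really is repeated \seq_if_in:NnT membership tests over \g_doc_functions_seq, one faster approach (a sketch only; I have not profiled the actual l3doc code, and both function names below are hypothetical) would be to mirror the sequence in a prop, so that membership testing is a key lookup rather than a scan of all entries:

```latex
% Sketch: record each function name in a prop alongside the seq.
\prop_new:N \g_doc_functions_prop
\cs_new_protected:Nn \doc_record_function:n {
  \seq_gput_right:Nn \g_doc_functions_seq {#1}
  \prop_gput:Nnn \g_doc_functions_prop {#1} { }
}
\cs_new:Nn \doc_if_function_known:nT {
  \prop_if_in:NnT \g_doc_functions_prop {#1} {#2}
}
```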
Lars Hellström