Joseph Wright wrote:
> Hello Lars,
>
> Thanks for taking the time to look at all of this: I really appreciate it.
Just have patience with me; I'm not all that quick at the moment...
> Lars Hellström wrote:
>> Have begun review, but only gotten about a third of the way. Some
>> remarks so far:
>>
>> 1. Is it possible to use a space as delimiter of an argument (perhaps
>> most interesting for u arguments)? I believe I spotted some places in
>> the d->D conversion that would gobble a space as <token>.
>
> I tend to find it best to test things, even though I had a feeling I
> knew the answers here. The u argument works with spaces:
>
> \ExplSyntaxOn
> \DeclareDocumentCommand \foo { u{~stop~} } { (#1) }
> \ExplSyntaxOff
> \foo word stop more
>
> results in "(word)more".
I was thinking more about single spaces, as in
\moveto 0 0 \curveto 47 0 100 53 100 100
(the idea being to express a bunch of graphic data compactly while
still allowing the code to survive reflowing in a text editor), but
this is of course on the boundary of what can be considered LaTeX2e-ish
syntax.
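To make the idea concrete, here is a sketch of what such a signature might look like, assuming single-space delimiters were in fact supported for u arguments (both the u{~} behaviour and the \my_graphics_moveto:nn backend are hypothetical):

```latex
\ExplSyntaxOn
% Hypothetical: grab two space-delimited coordinates. In \ExplSyntaxOn
% code a literal space must be written as ~.
\DeclareDocumentCommand \moveto { u{~} u{~} } {
  % #1 = x coordinate, #2 = y coordinate
  \my_graphics_moveto:nn {#1} {#2} % backend name is made up
}
\ExplSyntaxOff
```

With that, "\moveto 0 0 \curveto ..." would pick up "0" and "0" and leave \curveto in the stream.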
> The D specifiers, on the other hand, do not work using a space as one
> delimiter. That is not affected by the shorthand: if you try something
like { D[~{default} } or { D[{~}{default} }, all hell breaks loose. That
> is basically what I expected, although I guess it should be documented.
> Do we really want to support something like { d~~ } ?
Probably not, considering that automatic skipping of spaces in front of
optional arguments has already made spaces special.
>> 2. I believe \xparse_prepare_next:w should be listed as a variable (or
>> maybe "variable function"), since it is getting redefined rather
>> frequently and seems to be keeping track of the state of the argspec
>> parser automaton.
>
> I've tried to improve the documentation here a bit.
> \xparse_prepare_next:w is not a variable, as it contains things to be
> executed. So it has to be a function! I have tried to explain what it is
> grabbing. You often see this with "next" functions, so this is just one
> case of something that is bound to come up again.
OK, I think I understand how this part of the system works now:
1. Processors occur as \xparse_process_arg:n{<code>} items in the
sequence of grabbers (the thing being constructed in
\l_xparse_signature_toks), _before_ the grabber for the argument to
which they will be applied.
2. At _runtime_ these pieces of code are placed into scratch functions
with names of the form \xparse_processor_<N>:n, where N=1,2,3,...
depending on how many processors there are.
\l_xparse_processor_total_int is the number of processors stored so far
(for the current argument).
3. When a grabber has successfully grabbed an argument, it hands it
over to \xparse_add_arg:n. This function applies the processors that
have been stored (if any), incrementing \l_xparse_processor_use_int as
it goes. Only after all processors have been applied will it append the
result to \l_xparse_args_toks.
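So, as I understand it, for a signature like ">{\ProcessorA} >{\ProcessorB} o" the define-time result would schematically be (grabber internals elided, so this is a sketch rather than the literal contents):

```latex
% Contents of \l_xparse_signature_toks (schematic):
\xparse_process_arg:n { \ProcessorA }
\xparse_process_arg:n { \ProcessorB }
\xparse_grab_o:w { ... }
% At runtime the two \xparse_process_arg:n items become the scratch
% functions \xparse_processor_1:n and \xparse_processor_2:n, which
% \xparse_add_arg:n then applies to the grabbed optional argument.
```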
One noticeable difference from what I believe was discussed is in the
order of processors. If ">{A} o" means "grab optional argument, then
apply A", then I think it stands to reason that ">{B} >{A} o" should
mean "grab optional argument, then apply A, finally apply B", but as
implemented in revision 1494 it's B before A. (Reversing the processing
order would allow one to do without \l_xparse_processor_use_int.)
If you find it important that processors should be applied in
left-to-right order, then I believe they should appear after the
argspec base, perhaps as "o <{A} <{B}", to keep as invariant that the
thing being done first is closest to the argspec base. (SYNTAX CHANGE)
This may also have the added value of being more intuitive for users.
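The order difference can be demonstrated with a pair of toy processors (the names \WrapA and \WrapB are made up; each stores its result back in \l_xparse_arg_toks, as processors are required to):

```latex
\ExplSyntaxOn
% Hypothetical processors: each wraps the current argument in a marker.
\cs_new:Npn \WrapA #1 { \toks_set:Nn \l_xparse_arg_toks { A(#1) } }
\cs_new:Npn \WrapB #1 { \toks_set:Nn \l_xparse_arg_toks { B(#1) } }
\DeclareDocumentCommand \demo { >{\WrapB} >{\WrapA} m } { [#1] }
\ExplSyntaxOff
```

With the revision 1494 (left-to-right) order, \WrapB runs first and \demo{x} gives [A(B(x))]; with nearest-first application as suggested above, it would give [B(A(x))].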
Regarding implementation, one could also consider an approach where the
next processor is a continuation to which the previous passes the
argument. This would mean \l_xparse_signature_toks for the above
example ends up with an item saying
\xparse_grab_o:w {
\xparse_process:nnn { \ProcessorA } {
\xparse_process:nnn { \ProcessorB } {
\xparse_add_arg:n
}
}
}
where we have (mixing l3 and 2e styles)
\def \xparse_process:nnn #1 #2 #3 {
#1 { #3 }
\exp_args:NnV \@firstofone {#2} \l_xparse_arg_toks
% How to do without \@firstofone here?
}
\def \xparse_add_arg:n #1 {
\toks_put_right:Nn \l_xparse_args_toks {#1}
}
Alternatively, one could make it a non-nested continuation
\xparse_grab_o:w {
\xparse_alt_process:nnn { \ProcessorA }
} {
\xparse_alt_process:nnn { \ProcessorB }
} \xparse_add_arg:n
with
\def \xparse_alt_process:nnn #1 #2 #3 {
#1 { #2 }
\exp_args:NnV \@firstofone {#3} \l_xparse_arg_toks
}
The primary advantage of a continuation approach is that the runtime
logic is simplified.
Finally, there is the issue that a processor has to put the argument in
a toks register. I understand this is for generality (the only sane way to
pass along # tokens), but my experience with this type of API is that
one should make it easy to use commands that happen to already exist.
In this case, it would mean to also support a processor that would
store its result in a macro rather than a toks register, since I'm
quite sure this is what people tend to do unless they definitely need a
toks register.
In the current implementation, such processors could be supported by
defining a relative of >{CODE} that instead of
\xparse_process_arg:n{CODE} puts \xparse_process_tl_arg:n{CODE} in the
signature, where
\cs_new:Nn \xparse_process_tl_arg:n {
\int_incr:N \l_xparse_processor_total_int
\cs_set:cn {
xparse_processor_ \int_use:N \l_xparse_processor_total_int :n
} {
#1 {##1}
\toks_set:No \l_xparse_arg_toks \l_xparse_arg_tl
}
}
With a continuation-style implementation of processors, it would not
even be necessary to take the detour over \l_xparse_arg_toks.
>> 3. Though provided in .dtx format, I find the implementation section
>> somewhat illiterate (i.e., not up to literate programming standards).
>> Some concrete examples:
>
> No-one ever said I was any good at literate programming :-)
>
>> * \l_xparse_processor_int and \l_xparse_processor_use_int are
>> documented as "For keeping a count of post-processors and then using
>> them." Well, I could guess about as much from the names alone. What
>> would be more interesting to see spelt out is /how/ these keep track
>> of post-processors; what do the actual values of these variables mean? At
>> what stage in the process are they used?
>
> I've tried to improve the documentation on this (whether I have I leave
> to others). I've also altered the name \l_xparse_processor_int to
> \l_xparse_processor_total_int. Trying to explain what has to happen here
> is not easy, at least for me.
That is a catch of literate programming: you can't do it if you can't
explain how your program works. (Of course, it might be argued that one
shouldn't write programs that one cannot explain, period. :-) ) At
least for me, this puts a restriction on how sleepy I can be when
programming, as my ability to produce text tends to disappear before my
ability to produce code...
Is the description I gave above of use to you?
> At the parsing stage, the processor
> functions are in front of the parser in reverse order. So as each one is
> found, it is saved and \l_xparse_processor_total_int is increased. Then
> the argument is grabbed, and the processors are used. This has to start
> at 1, so \l_xparse_processor_use_int is used to count up until it is
> equal to \l_xparse_processor_total_int, at which point the processing is
> done.
>
>> * The overall structure of the code starts with lots of little things
>> (macros concerned with parsing specific argspecs) and ends (I'm
>> guessing, from xparse-alt) with putting them together. The more literate
>> approach would be to start with the big picture -- either from the "what
>> goes on at run-time" or "what goes on at define-time" point of view --
>> so that one knows what the little things will fit into when one gets to
>> them.
>
> This reflects how I think, I suspect. I like to start at the low level
> and work up,
Literate programming does not mandate a top-down or bottom-up order of
code, but it does require you to construct a comprehensible "story"
about how the code works. If you want to start at the low level then
that is fine -- one can for example argue that the grabbers
(\xparse_grab_D:w etc.) are low level and the parsing helpers
(\xparse_add_type_D:w etc.) are higher level -- but you need to provide
enough information about the things defined that one can understand
what they do. In the case of xparse, a crucial piece of information is
what the token stream will look like when a particular function is
expanded. To the extent that they modify the interpreter state, enough
about this state must be known to understand that, too.
> hence variables come first (after the lead-off), then
> internal functions,
The Pascal standard requires that you define constants first, then
types, then variables, and finally functions and procedures, but that's
1970s computer science. Hardly best practice these days...
> the user functions. I've divided internal functions
> into what seem to me to be logical "blocks",
They could do with finer divisions. 12 pages is quite a lot.
> then I do everything
> alphabetically. So I can quickly find a function if I know its name.
Programmer's editors tend to have commands to help you with that (not
requiring definitions to be sorted), e.g. mine has a pop-up menu of
things defined in a file. Of course, you may need to teach it the l3
conventions first...
In the typeset form, there is an index which should provide the same
functionality via hyperlinks (although for some reason all the links
seem to go to page 1; is that just for me or is it broken in l3doc in
general?)
> (I
> never read code from start to end, or even in typeset form. I always
> read it in my editor, find a function, read it, then find the next
> function, etc. So for me alphabetical is best.) In my defence, xparse
> was in roughly the same order before any changes were made by me.
With complex "shorthands" such as \xparse_grab_D_aux:NNnnNn, it would
be nice to have it spelt out what each argument is.
>>> I'd missed this before as I'd not run makeindex on xparse. I've
>>> corrected my batch file so this happens!
>> A similar issue: Today I pdflatex'ed source3.tex (rev. 1464), and it
>> goes into an infinite loop at \end{document} (\seq_if_in:NnT being one
>> thingie involved).
>
> All seems okay to me on the SVN.
Some further introspection reveals that this happens inside an
expansion of \g_doc_functions_seq, which is very, VERY long. Hence I
suppose this "infinite loop" may in fact be an O(n^2) operation for a
very large n. Perhaps you should examine switching to a faster
algorithm, or provide some indication of progress?
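For instance, if the bottleneck really is repeated \seq_if_in:NnT membership tests over \g_doc_functions_seq, one faster approach (a sketch only; I have not profiled the actual l3doc code, and both function names below are hypothetical) would be to mirror the sequence in a prop, so that membership testing is a key lookup rather than a scan of all entries:

```latex
% Sketch: record each function name in a prop alongside the seq.
\prop_new:N \g_doc_functions_prop
\cs_new_protected:Nn \doc_record_function:n {
  \seq_gput_right:Nn \g_doc_functions_seq {#1}
  \prop_gput:Nnn \g_doc_functions_prop {#1} { }
}
\cs_new:Nn \doc_if_function_known:nT {
  \prop_if_in:NnT \g_doc_functions_prop {#1} {#2}
}
```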
Lars Hellström