LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Subject:
From: Philipp Stephani <[log in to unmask]>
Reply-To: Mailing list for the LaTeX3 project <[log in to unmask]>
Date: Thu, 11 Feb 2010 00:57:56 +0100
Content-Type: text/plain
Parts/Attachments: text/plain (355 lines)

Hello Joseph,

thanks for your comprehensive answer. I'll just respond to a few points:

Am 06.02.2010 um 22:06 schrieb Joseph Wright:

> On Feb 6, 1:29 am, Philipp Stephani <[log in to unmask]> wrote:
>> I can't say a lot here because I'm "tainted" by TeX programming.  I can
>> only tell you a bit about my experience with source3.pdf.  The following
>> list is in completely random order and contains some of my suggestions
>> for expl3 and the LaTeX 3 documentation:
> 
> Hello Philipp,
> 
> First, thanks very much for taking the time to provide this list of
> useful suggestions: it's really helpful. I'll start with a few general
> comments before looking at the issues you raise point by point (or at
> least a number of them).
> 
> As you'll be aware, expl3 has been developed over a long period. That
> means that some of the material carries historical "baggage", and that
> the documentation is not all written in the same way. Last year's
> refactor dealt with a lot of the problems that were present in the
> code at the time, but of course there are those that remain. There is
> also the fact that getting things written requires that there is a
> driving force. So the modules that get written depend on what the team
> want/need to do. At least part of this is because it becomes apparent
> what is needed when you need it yourself.
> 
> There are also time issues. The amount of "man-hour" effort available
> is limited, and there are several things to try to do. We could spend
> for ever trying to improve expl3, but without other stuff as well this
> won't get us very far. It's an iterative process, to some extent.

Agreed. I think the team should try to reimplement the 100 or so most popular packages. That alone is a lot of work, and it would give a fairly good picture of what is really needed.

> 
> Another general question is who is best to write the documentation.
> Anyone who feels they can contribute can send updates to the team: I'm
> sure we'll add useful material to the sources if we have it!

Perhaps a better structure would help: for each macro we need to know whether it is expandable (and if not, it should be protected), what it expands to, and what its side effects are. The manual mentions that terms like "function" and "result" are used, but these terms must be clearly defined (e.g. function = expandable macro, result = expansion), because you can never hide TeX completely.

>> - Probably you should give a rigorous definition and many examples for
>>   each of the specifiers.  For example, every LaTeX programmer must gain
>>   deep knowledge about expansion, and it should be made clear that both
>>   macros and token lists can be expanded, while all other data types
>>   can't.
>> 
>> - In general, the documentation should be more lengthy and rigorous.
>>  Nearly all macro descriptions leave a lot of open questions.  For
>>  example, consider \muskip_use:N.  It is clear from the naming scheme
>>  that the only argument must be a single token.  The description just
>>  says: "This function returns the length value kept in ⟨muskip⟩ in a
>>  way suitable for further processing."  Now what does "function" and
>>  "return" mean?  What will happen if <muskip> is a character token or a
>>  control sequence token that does not refer to a muskip register.  What
>>  is the "further processing" the description talks about?  What does
>>  "suitable" mean?  Is the macro expandable?—We have to look at the
>>  implementation to find out that \muskip_use:N is in fact an alias for
>>  \the and thus expands to the string representation of any register.
>  If we were to guess only from the name \muskip_use, we would rather
>>  assume that this macro inserts a math glue item into the current list.
> 
> I assume that in the first point you mean that a token list can
> contain both expanded and unexpanded material, whereas something like
> an integer expands whatever you pass it until it finds something
> unexpandable and of the correct form.

No, I mean that there is a difference between

\tl_set:Nx \l_tmpa_tl { \l_tmpb_tl }  % a tl variable is a macro, so it expands here
\tl_set:Nx \l_tmpa_tl { \tmpb: }      % an expandable function expands here as well

and

\tl_set:Nx \l_tmpa_tl { \l_tmpa_int } % an int register is unexpandable and is left behind as-is

And \l_tmpa_tl and \tmpb: can be used directly in the typesetting process, while a bare \l_tmpa_int is a syntax error.
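
For comparison, an integer has to go through an access function before its value can appear in an x-type expansion or in typesetting (a sketch; \l_my_test_int is a made-up scratch variable):

\ExplSyntaxOn
\int_new:N \l_my_test_int
\int_set:Nn \l_my_test_int { 42 }
\tl_set:Nx \l_tmpa_tl { \int_use:N \l_my_test_int } % \l_tmpa_tl is now "42"
\ExplSyntaxOff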

> I think that it's clear that we need a "Programming in LaTeX3" guide
> that starts from assuming only a working knowledge of LaTeX as a user,
> rather than the current situation where you need to know things like
> expansion already. This needs some real thought, and at the moment I'm
> not sure who is best placed to do it!

It must be somebody who knows LaTeX 3 inside out, but also has not lost contact with "normal" users. I think that TeX by Topic is a good starting point: it is totally self-contained, yet compact.

>> - The documentation sometimes contradicts the implementation: for
>>   example, \tl_if_empty:nTF is listed as unexpandable, and \exp_arg:x is
>>   documented, but not implemented.
> 
> Listing expandable functions was not done to start with, and so I'm
> sure this is not the only missing item. (Again, a proper re-write
> would probably help here.) On \exp_arg:x, it got taken out of the main
> part of expl3 recently (it was supposed to be a wrapper for \expanded,
> but as pdfTeX 1.50 does not seem likely as a release we've dropped
> it.) I thought we'd removed it from the docs: can you point it out to
> me in the current CTAN snapshot?

I think this should be reimplemented, because everybody needs it. I tried to write a small package using only expl3 and xparse, but I still needed an implementation of \exp_arg:x. The l3expan module helps a bit, but there is simply nothing as general and powerful as \edef. Maybe you could define two macros, one expandable and one unexpandable.
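
Something along these lines is what I have in mind for the unexpandable variant (just a sketch with made-up names, built on \tl_set:Nx and hence ultimately on \edef; it goes through the scratch variable \l_tmpa_tl, so nesting needs care). The expandable variant is exactly what would need the \expanded primitive:

\ExplSyntaxOn
\cs_new_protected:Npn \my_exp_arg_x:Nn #1#2 {
  \tl_set:Nx \l_tmpa_tl {#2}   % exhaustively expand the argument
  \exp_args:No #1 \l_tmpa_tl   % expand the scratch variable once and hand its contents to #1
}
\ExplSyntaxOff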

> 
>> - What is the conceptual difference between token lists and token
>>   registers?  In what cases should I use which one?  
> 
> At the implementation level, a token list is a macro, whereas a token
> register is, well, a token register. Most of the time, token lists are
> more convenient as they don't need an access function. There are a
> really limited number of places where a token register is needed:
>  - When the stored material might contain # tokens
>  - When you want to have exactly one expansion inside an x scenario
> That is of course the same as any other TeX programming, but we should
> I guess discuss this somewhere.

I'm trying to ask from a new user's perspective. The manual must document what exactly a token is (and that the tl functions sometimes treat brace groups as single tokens). If users want to store data somewhere (a very common task), they will wonder which data structure to use.

> 
>> Same questions for
>> integers/numbers and comma lists/sequences.
> 
> The num data type is one of the historical things, which I've also
> noted on the team list and asked about. I think we sort-of agreed to
> remove it but got somewhat stalled: I'll try again.
> 
> On comma lists versus stacks, comma list handling is needed as it goes
> up to the user level (commas are convenient for lists). On the other
> hand, sequences use an internal marker to separate items, so can
> contain just about any input without confusion about the list
> boundaries. For example, if I do:
> 
> \clist_new:N \l_my_clist
> \clist_put_right:Nn \l_my_clist { a , b , c }
> 
> the clist now contains three entries, which might not have been the
> intention. However, with a sequence
> 
> \seq_new:N \l_my_seq
> \seq_put_right:Nn \l_my_seq { a , b , c }
> 
> only puts one item onto the sequence. (That said, I've used comma
> lists almost exclusively to date: see the next item on your list.)

I don't know whether I like this. Perhaps the clist type should even be removed completely. If you look at other programming languages, you find various sequential containers (e.g. lists in Python, std::vector in C++), together with string functions that split user input at arbitrary delimiters. The only reason for comma lists in TeX is efficiency (no conversion from user input required).
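
If clists were dropped, splitting user input could become an explicit one-off conversion, something like this sketch (made-up names; it assumes a clist mapping such as \clist_map_inline:nn is available -- I'm writing the function names from memory):

\ExplSyntaxOn
\seq_new:N \l_my_items_seq
\cs_new_protected:Npn \my_split_to_seq:n #1 {
  \seq_clear:N \l_my_items_seq
  \clist_map_inline:nn {#1} { \seq_put_right:Nn \l_my_items_seq {##1} }
}
\ExplSyntaxOff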

> 
>> - Why is there no separate stack/queue data type?
> 
> That is what sequence stacks are for: expl3.pdf, p. 12:
> 
> "l3seq This implements data-types such as queues and stacks"
> 
> This is where I have used the seq data type rather than using comma
> lists: when wanting to set up a stack.

l3clist provides exactly the same stack functions. In other programming languages such as C++, stacks are often container adaptors distinct from the underlying container; where that is not the case (as in Python), there is one unified interface rather than two parallel ones. My suggestion would thus be: remove clists, use sequences exclusively, and create a new stack data type that simply uses sequences internally.
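
The adaptor could be as thin as this sketch (made-up names; it assumes \seq_push:Nn and \seq_pop:NN exist and consistently treat one end of the sequence as the top of the stack):

\ExplSyntaxOn
\cs_new_eq:NN \stack_new:N   \seq_new:N
\cs_new_eq:NN \stack_push:Nn \seq_push:Nn
\cs_new_eq:NN \stack_pop:NN  \seq_pop:NN
\ExplSyntaxOff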

> 
>> - Why are there no expandable stack peek operations?
> 
> I'm not sure quite how you'd work this. For xparse, I took some ideas
> from etextools for doing an expandable test for optional arguments.
> However, this is not really that robust (as I hope the xparse
> documentation makes clear), and so is not supposed to be encouraged.
> TeX only gives us \futurelet, and that is not expandable.

\def\peek#1{\expandafter\peeek#1,\nil}  % #1 must be a macro holding a comma-separated list
\def\peeek#1,#2\nil{#1}                 % keep only the material before the first comma
\def\a{x,y,z}

Then \peek\a is fully expandable.
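
The same idea in expl3 syntax (a sketch with made-up names; it assumes the clist variable is non-empty):

\ExplSyntaxOn
\cs_new:Npn \my_clist_peek:N #1 {
  \exp_after:wN \my_clist_peek_aux:w #1 , \q_nil
}
\cs_new:Npn \my_clist_peek_aux:w #1 , #2 \q_nil {#1}
\ExplSyntaxOff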

> 
>> - Why only a global undefine macro?
>> 
>> - Sometimes the "new" macros are local (\cs_new), sometimes global
>>   (\tl_new).
> 
> These two points go together: I'll address the second one first :-)
> \tl_new is consistent with all of the other \<var>_new functions, in
> that it works globally. (I only realised this a few weeks ago while
> testing something out.) So the question is why does \cs_new not work
> globally while \cs_gundefine is global. I'm not sure I have a good
> answer to this: I'd probably prefer names to be "taken" globally
> irrespective of whether they are variables or functions.

The point is that \int_new does a global allocation, so with the current allocator it must be global. I think this is a general problem with the current allocator: it forces users to declare all variables long before they are used, which is generally considered bad programming style. A good C++ programmer would always write:

for (...) {
  int j = ...;  // declared exactly where it is needed
}

instead of

int j;          // declared long before its first use
for (...) {
  j = ...;
}

Similarly, many LaTeX users write

\whiledo{...}{
  \newcount\j  % allocates a fresh register for the same name on every iteration
}

and are then surprised by an error about insufficient room for a new \count register.
Either the practice of declaring all variables globally (and up front) should be enforced and documented, or something like the C++ snippet should be made possible.
Macros, on the other hand, have no such limitation. I'm really unsure what the best set of rules is here, but I do think it should be consistent.
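
A runnable LaTeX 2e illustration of the pattern the current allocator effectively forces on us (with a made-up register name, so as not to clobber \j):

\documentclass{minimal}
\usepackage{ifthen}
\newcounter{i}
\newcount\myregister          % allocated exactly once, outside the loop
\begin{document}
\setcounter{i}{0}
\whiledo{\value{i}<10}{%
  \myregister=\value{i}\relax % the same register is reused on every pass
  \stepcounter{i}%
}
\end{document}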

> At present, there are two pretty obvious gaps in expl3:
> 
> - strings (something like xstrings or stringstrings or ...)
> - floating points (something like fp or pgfmath or ...)
> 
> Both of these are on the "to do" list, but this depends on someone
> doing it. I've got reasons for being interested in implementing at
> least part of "l3fp" sooner rather than later, but at present I'm not
> sure who will take the strings issue (Will Robertson did mention it at
> one point, but he's currently rather busy with other things). Probably
> some discussion is needed about what to implement, but we certainly
> need something.

I don't think the problem is a lack of manpower: look at Hans and Taco ;-)
Perhaps the whole bottom-up programming approach is the problem: maybe one should not start with expl3, but with the 100 most popular document classes and packages, implement their combined functionality in a new kernel, and extend expl3 along the way.

> 
>> - l3io is too low-level.  \io_new should only check whether the control
>>   sequence already exists, and the allocation should be done by
>>   \io_open.  As long as we have only 16 streams, the allocation should
>>   not use the plain TeX allocator, but a "heap allocator" with a list of
>>   free streams instead.  Closing a stream should deallocate the stream
>>   handle.  This is the normal behavior in all programming languages.  On
>>   the contrary, the current implementation forces everybody to
>>   preallocate stream handles.
> 
> I wrote l3io as we had nothing at all and I needed some functions
> available: it's therefore not had too much reviewing just yet. I'm
> very much learning these things (I have no formal programming
> background) so I make mistakes, I'm afraid, and just went with
> essentially a re-code of the latex.ltx material in this area. Your
> suggestion is pretty sensible, and as long as the rest of the team
> look happy with this I'd hope a re-write can be arranged. Feel free to
> contribute more ideas [or even code :-)].

Most of my ideas actually come from personal experience. I'm currently working on a medium-sized LaTeX project and have found that there are simply no adequate solutions for my needs, so I had to code a lot of things from the ground up. At the moment I have a build script of 819 lines, a document class of 710 lines, a private package for the project with another 236 lines, plus some other home-grown packages with 1846 lines in total. Most of this is boilerplate that should be provided by the base system.

For example, most other compilers issue messages in the file:line:error format, but TeX outputs useless noise such as page numbers or the names of included image files, and no sensibly formatted messages at all. That is fine for a five-page document kept in a single file, or on a computer so slow that each page takes a minute to compile. Even worse, to find out which file an error occurred in, you have to parse the whole log file, and a single misplaced parenthesis breaks the parsing. I didn't want to adapt one of the standard build scripts because I'm faster writing my own in Python. So I redefined an internal macro from scrlfile.sty to maintain a stack of the file names currently being read and to print the current file name with a unique prefix whenever it changes, and I redefined \GenericError and friends to emit file:line:error messages (with \MessageBreak = space). The result is compiler output that a simple parser can handle by walking over the lines with a few regular expressions.

So my first suggestion would be: remove l3file (it merely copies the current behaviour, which is flawed anyway) and implement a new file-input interface with only \IfFileExists, \InputIfFileExists and \input{}, but without unchecked input (the \input primitive) or \input@path (the latter is considered wrong even by the LaTeX developers). This new module should output, and make available to the user, the name of the file currently being read, manage a file stack, provide hooks before/after each file/package/class (like memoir and scrlfile), hooks that are called before or after every file, and maybe also the other scrlfile features (such as replacing packages).

Another thing I had to implement in Python was the nomenclature: I want it sorted automatically, which is much more complex than ordinary index processing because the (mathematical) symbols to be sorted are complex combinations of Latin, Greek and symbolic characters.
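
To give an idea of the \GenericError part, here is a deliberately simplified sketch (not the actual code from my project; \CurrentFile stands for a hypothetical macro holding the name of the file currently being read, maintained by the file-stack code mentioned above):

\makeatletter
\let\my@original@GenericError\GenericError
\def\GenericError#1#2#3#4{%
  \begingroup
    \let\MessageBreak\space
    \typeout{\CurrentFile:\the\inputlineno: error: #2}%
  \endgroup
  \my@original@GenericError{#1}{#2}{#3}{#4}%
}
\makeatother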

> 
>> - A file stack should be implemented so that there is always information
>>   about the file currently being read.
> 
> At some point, yes, but at the moment this is one of the "where to put
> it" bits. We'll need some kind of higher-level loading mechanism in
> any case, so it might go there. (l3io is always going to be pretty low-
> level I suspect.)

I think this is very low-level indeed, just like \InputIfFileExists in LaTeX 2e. The system must always know which files are currently open and which are currently being read, so this has to be implemented at the lowest level.
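
A sketch of the lowest-level piece I have in mind (made-up names, and I'm writing the seq stack functions from memory, so the exact names may be off): every input goes through one wrapper that maintains a stack of the files currently being read.

\ExplSyntaxOn
\seq_new:N \g_my_file_stack_seq
\tl_new:N \l_my_popped_tl
\cs_new_protected:Npn \my_input:n #1 {
  \seq_gpush:Nn \g_my_file_stack_seq {#1}
  \typeout { (entering~#1) }
  \InputIfFileExists {#1} { } { \typeout { (missing~file~#1) } }
  \seq_gpop:NN \g_my_file_stack_seq \l_my_popped_tl
  \typeout { (leaving~#1) }
}
\ExplSyntaxOff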

> 
>> - GetIdInfo requires a certain version control system and should thus be
>>   removed sooner or later.
> 
> Not sure about this. The Team uses SVN for the code, and I doubt this
> will change, and so we need something to turn the $Id data into
> something to put into the output. What would you suggest as an
> alternative? (\GetIdInfo works with both SVN and CVS: do other version
> control systems use the $Id line but with differing syntaxes?)

This point actually comes from a question by Will Robertson, who wants to use Git for unicode-math. The general agreement there is that CVS-like keyword properties are defective by design, should never be used, and will never be implemented in other systems:
http://stackoverflow.com/questions/39742/does-git-have-anything-like-svn-propset-svnkeywords-or-pre-post-commit-hooks


> 
>> - The specifier "d" is used, but not documented.
> 
> Should be removed totally: can you point out where this is? (We did
> our best but may have missed something.)

fgrep -n 'd:' *.dtx says that there is \exp_not:d in l3expan.dtx:952. Perhaps \exp_not is sometimes a misnomer? \exp_not:o effectively means "expand exactly once (and keep the result from expanding further)", not "do not expand at all". Using that convention, you could also define a macro \exp_not:x for exhaustive expansion. I think the meaning of the specifiers should not contradict the meaning of the macro name.
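
To illustrate what I mean about the naming (just a sketch):

\ExplSyntaxOn
\tl_set:Nn \l_tmpa_tl { \LaTeX }
\tl_set:Nx \l_tmpb_tl { \exp_not:o { \l_tmpa_tl } }
% \l_tmpb_tl now holds the single token \LaTeX: the o-type argument was
% expanded exactly once, and only the result of that expansion was kept
% from expanding any further inside the x-type expansion.
\ExplSyntaxOff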

>> - There are no functions to convert between a bool variable and its
>>   string representation.
> 
> I wonder where this would be used. Do you mean something like
> \bool_display:N (to work a bit like \prop_display:N), or for
> typesetting the result (and if so, when would this be used)?

I mean a function that expands to the character sequence "true" when given a true boolean variable, and "false" otherwise.
Consider this real-world example:

\documentclass{minimal}

\usepackage{expl3}
\usepackage{xparse}
\usepackage{xtemplate}

\ExplSyntaxOn

\cs_new_nopar:Nn \bool_to_str:N {
  \bool_if:NTF #1 { true } { false }
}

\DeclareObjectType { test } { 0 }

\DeclareTemplateInterface { test } { test } { 0 } {
  test : boolean
}

\bool_new:N \l_test_bool

\DeclareTemplateCode { test } { test } { 0 } {
  test = \l_test_bool
} {
  \AssignTemplateKeys
  \bool_to_str:N \l_test_bool
}

\NewDocumentCommand \test { s } {
  \UseTemplate { test } { test }
    % { test = #1 }% fails
    { test = \bool_to_str:N #1 }
}

\ExplSyntaxOff

\begin{document}

\test \relax

\test*\relax

\end{document}

Probably xtemplate is to blame here (why is it not possible to assign a boolean variable to a boolean key?), but the \bool_to_str macro is quite handy in any case.

>> - There should exist a message class between "error" and "fatal" that
>>   stops reading the current input file, but does not stop the LaTeX run
>>   completely.
> 
> We recently re-worked some of the message handling stuff, and things
> did change a bit here. Apart from fatal errors we moved to messages
> sticking purely to that: giving a message. So any file-loading issues
> should be addressed separately (indeed, should we even have fatal
> messages: should have thought about this before).

Why not? The l3msg module is only useful if the message classes actually behave differently. And a class between fatal and error seems to be much sought after: a short grep over the LaTeX tree shows some 500 occurrences of \input that do not start a new line, suggesting heavy use of this kind of error. (For comparison, errors are only about four times as common, and infos only slightly more common.)

>> - A macro that tests whether a token list contains exactly one token.
> 
> I can see where this is probably going, but can you give a scenario to
> use this in? (It's always handy to see what people want.)

That suggestion comes from a discussion on de.c.t.t:
http://groups.google.de/group/de.comp.text.tex/browse_thread/thread/dee4fed8bd0b9385
The exact token counting was probably not required for the OP, but a lot of tricky code (see the posts by Ulrich Diez) appears in that thread. Roughly, the question is: how do you distinguish between "a" and "{a}"? It is clear that the latter contains three tokens, but how do you count them?
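
For what it is worth, here is a sketch of a "single item" test (made-up names, and I'm writing the conditional-definition interface from memory, so the exact names may be off). Note that it deliberately treats a brace group as one item and skips leading explicit spaces, so it does not by itself distinguish "a" from "{a}" -- that token-level distinction is exactly where the tricky code in the thread comes in:

\ExplSyntaxOn
\prg_new_conditional:Npnn \my_tl_if_single:n #1 { TF } {
  \tl_if_empty:nTF {#1}
    { \prg_return_false: }
    { \my_tl_if_single_aux:w #1 \q_stop }
}
\cs_new:Npn \my_tl_if_single_aux:w #1 #2 \q_stop {
  \tl_if_empty:nTF {#2} { \prg_return_true: } { \prg_return_false: }
}
\ExplSyntaxOff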
