Hello Joel,

I promised to get back to you earlier but didn't, sorry about that.
I'm replying to two emails in one, and the result is somewhat long,
hopefully helpful.

> I've been developing my xpeek package [...]
> see <https://github.com/jcsalomon/xpeek>.

I see that you use the "NPC" prefix in xpeek, probably because of some
code I had written (back when you were asking for a \NewPeekCommand
command). It may be better to use xpeek as a prefix: since there can
be no two packages on CTAN with the same name, using that name as a
prefix for internal commands should avoid clashes.

Furthermore, it would be best if you use the convention \__xpeek_...
for internal commands, and \l__xpeek_... for internal variables. You
probably don't have any public code-level functions \xpeek_... or
variables \l_xpeek_..., but this would be the conventional beginning.

To make the internal convention more convenient and shorter to type,
we recently introduced l3docstrip. Replace docstrip by l3docstrip, and
replace "xpeek" (or "NPC") by "@@" in all names. Then add

    %    \begin{macrocode}
    %<@@=xpeek>
    %    \end{macrocode}

near the start of the implementation section (see e.g., some l3kernel
modules for a model). This change will make it very easy to change
the module name if needed, will make the code shorter, and will make
the command names less accessible from outside.

>     \textit{foof}\xspace.
>     \textit{foof}\xspace!
>
> Thinking about the problem, it seems I need the ability to scan ahead
> in the input stream, ignoring tokens from one list while looking for
> tokens from another. In Expl3 terms, I’m hoping to define something
> like `\peek_inlist_ignore_auxlist:nnTF`.

It should be \xpeek, or \@@ (transformed to \__xpeek), not \peek in
any case :). I think it is very important not to use the kernel
namespace even when the command name would make more sense with such
a name. For instance, in randomwalk.sty I have
\@@_int_set_to_random:Nnn, not \int_set_to_random:Nnn.
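
As a minimal sketch of the @@ convention described above: the names \@@_example:n and \l_@@_example_tl are hypothetical and only serve as an illustration; they are not part of xpeek. Assuming the .dtx file is processed with l3docstrip, the source would look roughly like this:

```latex
%    \begin{macrocode}
%<@@=xpeek>
%    \end{macrocode}
%
% Hypothetical internal variable and function, written with @@.
%    \begin{macrocode}
\tl_new:N \l_@@_example_tl
\cs_new_protected:Npn \@@_example:n #1
  { \tl_set:Nn \l_@@_example_tl {#1} }
%    \end{macrocode}
%
% After extraction, the generated .sty should contain the expanded
% names instead:
%
%   \tl_new:N \l__xpeek_example_tl
%   \cs_new_protected:Npn \__xpeek_example:n #1
%     { \tl_set:Nn \l__xpeek_example_tl {#1} }
```

Changing the module name later then only requires editing the %<@@=xpeek> line.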

>     \peek_ignore_list:N \ignorelist
>     `\l_peek_token'

This syntax is impossible to achieve since \peek_ignore_list:N has no
way to know where the `\l_peek_token' "argument" is supposed to end.

> The direction I’m considering is to read ahead, consuming tokens. Each
> token read is added to a save-list and compared to the ignore-list. If
> it’s on the ignore-list, continue; otherwise put the save-list back on
> the input stream and stop.
>
> Does this sound reasonable so far?

Somewhat reasonable, yes. I'm not sure what the best approach is.
You need to collect the tokens in your ignore list, and you then need
to perform an action depending on the next token. It is possible to
define \xpeek_collect_do:nn, whose first argument is a list of tokens
to ignore, and whose second argument is some operation to perform,
which will receive the collected tokens as an argument:

    \xpeek_collect_do:nn { abc } { \foo \bar } caada
    => \foo \bar { caa } da

Assuming that we have this function (see below for an implementation),
and that the following token (the first which is not collected) has
its meaning copied to \l_peek_token (like any \peek function), then we
can build a \nextnonpunct as

    \DeclareDocumentCommand { \nextnonpunct } { }
      {
        \xpeek_collect_do:nn { .,!? }
          { ` \l_peek_token ' \use:n }
      }

where the \use:n unbraces whatever punctuation \xpeek_collect_do:nn
has collected.

How is \xpeek_collect_do:nn implemented? I'm introducing a quark just
to have a macro different from anything you may see when peeking
ahead: then \peek_meaning:NF always takes the F branch. Not happy
about that hack.

    \quark_new:N \q_@@
    \bool_new:N \l_@@_ignore_bool
    \cs_new_protected:Npn \xpeek_collect_do:nn #1#2
      { \@@_collect_do:nnnn { #1 } { #2 } { } { } }
    \cs_new_protected:Npn \@@_collect_do:nnnn #1#2#3#4
      {
        \peek_meaning:NF \q_@@
          {
            \bool_set_false:N \l_@@_ignore_bool
            \tl_map_inline:nn {#1}
              {
                \token_if_eq_charcode:NNT \l_peek_token ##1
                  { \bool_set_true:N \l_@@_ignore_bool \tl_map_break: }
              }
            \bool_if:NTF \l_@@_ignore_bool
              { \@@_collect_do:nnnn {#1} {#2} { #3#4 } }
              { #2 { #3#4 } }
          }
      }

> To consume tokens one-by-one, I built this function:
>
>     \cs_new_protected:Npn \peek_meaning_really_remove:NTF #1 #2 #3
>       {
>         \peek_meaning_remove:NTF #1
>           { #2 }
>           {
>             \peek_meaning_remove:NT \l_peek_token
>               { #3 }
>           }
>       }

Well, that would remove tokens, not collect them.

> (This should be created via \prg_new_conditional, but I haven’t yet
> figured that out.)

It is (pretty much?) impossible to define peek-like functions as
conditionals.

> Is the direction I'm taking appropriate for what I'm trying to do?

Yes.

> Is there some existing functionality that would help that I'm overlooking?

Not really. I think we should add \peek_after:nw to cover my use of
\peek_meaning:NF \q_@@ in the code above. That would make the code
reasonably clean. I've added this function to l3trial/l3kernel-extras,
not on CTAN, only on the SVN repository.

One correct long-term approach would be to provide a parser for some
class of grammar, but that is extremely hard in TeX (the regular
expression parser l3regex took me about 4 months of hard work). So
don't expect this any time soon.

At least for now, I think the \xpeek_collect_do:nn code I give above
is (up to a few improvements) a reasonable approach to practical
situations where someone wants to look ahead in the input stream. So
I'd say, provide \xpeek_collect_do:nn or a similar functionality as a
public code-level function in your xpeek package.

On 7/30/12, Joel C.
Salomon wrote:
> After some experimentation, it seems that the \peek_* family of
> functions don't work well inside l3prg conditionals; source3.pdf seems
> to bear this out in the justification for \__peek_def:nnnn.

Indeed: consider

    \prg_new_conditional:Npnn \foo:n #1 { TF } { \prg_return_true: }

This is (currently) equivalent to

    \cs_new:Npn \foo:nTF #1 { \prg_return_true: \c_zero }

and the \prg_return_true: \c_zero combination is equivalent to
\use_i:nn (see definition of \prg_return_true:), which selects the
true branch and discards the false branch. Note how the \foo:nTF
macro only takes one argument: the other two "arguments" are left in
the input stream until the last moment, where \prg_return_true/false:
selects one of the two.

The problem with peek functions is that they need to see past those
conditional branches in the input stream. Thus, \peek_meaning:NTF is
roughly

    \cs_new_protected:Npn \peek_meaning:NTF #1#2#3
      {
        \cs_set_eq:NN \l__peek_search_token #1
        \cs_set_nopar:Npx \__peek_true:w { \exp_not:n {#2} }
        \cs_set_nopar:Npx \__peek_false:w { \exp_not:n {#3} }
        \peek_after:Nw \__peek_meaning:
      }
    \cs_new_protected_nopar:Npn \__peek_meaning:
      {
        \token_if_eq_meaning:NNTF \l__peek_search_token \l_peek_token
          { \__peek_true:w }
          { \__peek_false:w }
      }

The T and F arguments must be taken out of the input stream, stored
into dedicated functions \__peek_true:w and \__peek_false:w, and put
back after the test.

> On TeX.SE, Clemens Niederberger posted an answer to the specific
> question I'd posed; see <http://tex.stackexchange.com/a/64351/2966>.
> It works well, but it's built on recursive expansion of macros with :w
> specifiers that I'm really not understanding. I'm thinking, therefore,
> that I'm better off getting help implementing the functionality I want
> in parts.

I suspect his solution is needlessly complicated (he seems to test if
the token is in the ignore list in a roundabout way).
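
To make the earlier point about the T and F branches concrete, here is a minimal sketch of \peek_meaning:NTF as seen from the user side. The command name \starornot is hypothetical, and the example assumes a LaTeX installation where expl3 and xparse can be loaded as packages. Both branches are absorbed as ordinary arguments, so the token that gets examined is the one following the F branch in the input stream:

```latex
\documentclass{article}
\usepackage{expl3,xparse}
\ExplSyntaxOn
% Hypothetical demo command: reports whether a star follows,
% without removing the star from the input stream.
\NewDocumentCommand \starornot { }
  {
    \peek_meaning:NTF *
      { [star~follows] }
      { [no~star] }
  }
\ExplSyntaxOff
\begin{document}
% The first call should take the true branch, the second the false one.
\starornot* \starornot x
\end{document}
```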

> What sorts of restrictions are there on the use of \l_peek_token
> inside the true-code & false-code branches of the \peek_* functions?

None, as far as I know.

> Is it reasonable to use \__peek_def:nnnn to generate something like
> \peek_unconditional:TF? (The false-code branch should never execute, I
> expect.)

Definitely not. \__peek_def:nnnn is internal, and may change at a
whim. We have been careful to mark internal functions as such, and
make no guarantee whatsoever that they will remain. The function you
want is \peek_after:nw (see l3kernel-extras), and for now, you can use
your own copy

    \tl_new:N \l__xpeek_code_tl
    \cs_new_protected:Npn \xpeek_after:nw #1
      {
        \tl_set:Nn \l__xpeek_code_tl {#1}
        \peek_after:Nw \l__xpeek_code_tl
      }

> Actually, it's \peek_unconditional_remove:T I think I need.

I don't think you need that one since the token should be kept
somewhere. The copy \l_peek_token is not appropriate, since that
control sequence will later be changed to the next token in the input
stream. Think of \l_peek_token as a pointer (that's almost not a
lie), which TeX can unfortunately not dereference.

> \tl_new:N \g_jcs_matchlist_tl
> \tl_new:N \g_jcs_ignorelist_tl
> \tl_new:N \l_jcs_ignored_tokens_tl
>
> \cs_new:Npn \jcs_peek_in_matchlist_ignore_ignorelist:TF #1#2
>   {
>     \tl_clear:N \l_jcs_ignored_tokens_tl
>     \__jcs_peek_in_matchlist_ignore_ignorelist_aux:TF {#1}{#2}
>   }

Braces missing.

> \cs_new:Npn \__jcs_peek_in_matchlist_ignore_ignorelist_aux:TF #1#2
>   {
>     \peek_unconditional_remove:T
>       {
>         \tl_if_in:N?TF \g_jcs_ignorelist_tl { something involving
> \l_peek_token }

Not possible, unfortunately. You have to map through
\g_jcs_ignorelist_tl, comparing \l_peek_token to each token in the
ignorelist (see code for \xpeek_collect_do:nn above).

>           {
>             \tl_put_right:N? \l_jcs_ignored_tokens_tl { something
> involving \l_peek_token }
>             keep looking, probably by recursing

Yes, that's roughly what I'm doing.
I'm storing the tokens as macro arguments #3 and #4 of
\@@_collect_do:nnnn, but that's not very sensible; storing them in a
token list would be better.

>             \tl_use:N \l_jcs_ignored_tokens_tl
>             \tl_if_in:N?TF \g_jcs_matchlistlist_tl { something
> involving \l_peek_token }
>               {#1} {#2}

Again, \tl_if_in is not usable here. You should probably define an
auxiliary test

    \prg_new_protected_conditional:Npnn \@@_if_in:NN #1#2 { TF }
      {
        \bool_set_false:N \l_@@_bool
        \tl_map_inline:Nn #1
          {
            \token_if_eq_charcode:NNT #2 ##1
              { \bool_set_true:N \l_@@_bool \tl_map_break: }
          }
        \bool_if:NTF \l_@@_bool
          { \prg_return_true: }
          { \prg_return_false: }
      }

Used as \@@_if_in:NNTF \g_jcs_ignorelist_tl \l_peek_token { } { }.

> Does this sound like the correct path to head down?

Yes.

Best regards,
Bruno