LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


Sender: Mailing list for the LaTeX3 project <[log in to unmask]>
Date: Sun, 16 Oct 2011 21:56:05 -0400
Reply-To: Mailing list for the LaTeX3 project <[log in to unmask]>
From: Bruno Le Floch <[log in to unmask]>

> I wonder if \regex_set:Nn would be better as \regex_save:Nn. My
> reasoning is that \<var>_set:Nn functions are used with a variable of
> name \l_..._<var>, while here we are (currently) naming as \l_..._tl.

Thinking about this, I feel that \regex_const:Nn would make the most
sense, since a regular expression is typically not a "dynamic" variable.

> I'm not sure about the approach on submatches. You say
>
> % Submatches with numbers higher than $10$ are accessed in the same way,
> % namely |\10|, |\11|, \emph{etc}. To insert in the replacement text
> % a submatch followed by a digit, the digit must be entered using the
> % |\x| escape sequence: for instance, to get the first submatch followed
> % by the digit $7$, use |\1\x37|, because $7$ has character code |37|
> % (in hexadecimal).
>
> I wonder how likely it is that we'll need more than 9 submatches in the
> sort of scenario that l3regex is likely to be applied in. TeX programmers
> are already used to the idea that we have up to 9 numbered parameters,
> so why not limit to nine submatches and avoid the need to use "\x" syntax?
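
To make the quoted rule concrete, here is a minimal Python sketch (not l3regex code) of how such a replacement text could be tokenized. The function name tokenize_replacement is hypothetical, and it assumes, as the quote suggests, that all digits after a backslash form one submatch number and that \x takes exactly two hexadecimal digits.

```python
import re

# Hypothetical tokenizer for the replacement syntax described in the quote:
#   \<digits>  -> one submatch reference (all digits are read, so \10 is 10)
#   \xHH       -> the character with hexadecimal code HH (two digits assumed)
#   anything else is copied literally.
TOKEN = re.compile(r"\\(\d+)|\\x([0-9A-Fa-f]{2})|(.)", re.DOTALL)


def tokenize_replacement(text):
    tokens = []
    for m in TOKEN.finditer(text):
        if m.group(1) is not None:
            tokens.append(("submatch", int(m.group(1))))
        elif m.group(2) is not None:
            tokens.append(("char", chr(int(m.group(2), 16))))
        else:
            tokens.append(("char", m.group(3)))
    return tokens


print(tokenize_replacement(r"\17"))     # [('submatch', 17)]
print(tokenize_replacement(r"\1\x37"))  # [('submatch', 1), ('char', '7')]
```

Under this rule "\17" can only mean submatch 17, which is why submatch 1 followed by a literal 7 needs the \x37 detour.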

I agree that \x37 is pretty awkward. But unlike macro arguments,
submatches can quickly number more than nine. One case where this
happens is recognizing a date:

  \regex_const:Nn \c_date_regex
    { ((Jan)(uary)?|(Feb)(ruary)?|...) \ (\d\d?) }

Then "\2\4\6\8\10\12\14\16\18\20\22\24" gives the first three letters
of the month that was found (only one of these submatches is non-empty),
and "\26" is the day of the month. I guess that would be done better
with

  \regex_const:Nn \c_date_regex
    { (Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|...) \ (\d\d?) }

then extracting the first three letters of \1 for the month; \2
now holds the day. Note that I had to use non-capturing groups
(?: ... ); with capturing groups the day would be \14, again too large.
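
The group counts above can be checked mechanically. The following Python sketch uses the standard re module, which numbers capturing groups by their opening parenthesis just as in the patterns above; the month table (including an empty suffix for May) and the helper date_regex are illustrative assumptions, not part of l3regex.

```python
import re

# Illustrative month table: (three-letter abbreviation, rest of the name).
# May has an empty rest but still gets a (possibly empty) group below, so
# every month contributes the same number of groups.
MONTHS = [("Jan", "uary"), ("Feb", "ruary"), ("Mar", "ch"), ("Apr", "il"),
          ("May", ""), ("Jun", "e"), ("Jul", "y"), ("Aug", "ust"),
          ("Sep", "tember"), ("Oct", "ober"), ("Nov", "ember"),
          ("Dec", "ember")]


def date_regex(month_template):
    """Wrap the month alternatives in one group, then a space and the day."""
    months = "|".join(month_template.format(a=a, r=r) for a, r in MONTHS)
    # Python matches the separating space with a plain space; the l3regex
    # patterns in this message write it as "\ ".
    return re.compile("(" + months + r") (\d\d?)")


ALL_CAPTURING = date_regex("({a})({r})?")    # ((Jan)(uary)?|...) (\d\d?)
SUFFIX_CAPTURING = date_regex("{a}({r})?")   # (Jan(uary)?|...) (\d\d?)
NON_CAPTURING = date_regex("{a}(?:{r})?")    # (Jan(?:uary)?|...) (\d\d?)

# 1 outer group + 24, 12 or 0 month groups + 1 day group.
print(ALL_CAPTURING.groups, SUFFIX_CAPTURING.groups, NON_CAPTURING.groups)
# 26 14 2

text = "Meeting on October 16, 2011"

m = ALL_CAPTURING.search(text)
# Only one of \2, \4, ..., \24 participates; the others are None.
abbrev = "".join(m.group(i) or "" for i in range(2, 25, 2))
print(abbrev, m.group(26))  # Oct 16

m = NON_CAPTURING.search(text)
print(m.group(1)[:3], m.group(2))  # Oct 16
```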

The best fix for that particular example may be for me to finally
implement (?| ... ) groups (branch-reset groups, as in Perl and PCRE):
non-capturing groups in which the submatch numbering restarts for each
alternative:

  \regex_const:Nn \c_date_regex
    { ( (?|Jan(uary)?|Feb(ruary)?|...) ) \ (\d\d?) }

Then the interesting submatches are \1 (the month) and \3 (the day).
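
Python's standard re module does not support (?| ... ), so this sketch only counts capturing groups. The helper count_submatches is hypothetical: it resets the counter at each top-level "|" of a branch-reset group and, after the group, continues from the highest number any alternative used, as Perl does. It is simplified and assumes no parentheses or "|" inside [...] classes.

```python
import re


def count_submatches(pattern):
    """Count capturing groups, treating (?| ... ) as a branch-reset group."""
    count = 0
    stack = []  # one [kind, start, highest] entry per open group
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            i += 2  # skip the escaped character, e.g. "\(" or "\ "
            continue
        if pattern.startswith("(?|", i):
            stack.append(["reset", count, count])
            i += 3  # skip all three characters so "|" is not an alternative
            continue
        if c == "(":
            if pattern.startswith("(?", i):
                stack.append(["other", count, count])  # (?: ...) etc.
            else:
                count += 1
                stack.append(["capture", count, count])
        elif c == "|" and stack and stack[-1][0] == "reset":
            frame = stack[-1]
            frame[2] = max(frame[2], count)  # highest number seen so far
            count = frame[1]                 # next alternative restarts here
        elif c == ")" and stack:
            kind, _, highest = stack.pop()
            if kind == "reset":
                count = max(highest, count)  # continue after the largest
        i += 1
    return count


# The patterns from this message, truncated to two months.
branch_reset = r"( (?|Jan(uary)?|Feb(ruary)?) ) \ (\d\d?)"
all_capturing = r"((Jan)(uary)?|(Feb)(ruary)?) \ (\d\d?)"
non_capturing = r"(Jan(?:uary)?|Feb(?:ruary)?) \ (\d\d?)"

print(count_submatches(branch_reset))    # 3: the day is \3
print(count_submatches(all_capturing))   # 6
print(re.compile(all_capturing).groups)  # 6, agrees for plain patterns
```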

All in all, I don't know what's best. Perhaps we could provide
\g{...} (or whatever is standard) for submatches above 9?
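
For comparison only (this is Python, not l3regex): Python's re module faces the same ambiguity and resolves it with a bracketed form, \g<n>, in the replacement text, while a bare \17 is read as a reference to group 17. The pattern below is an illustrative assumption.

```python
import re

pattern = re.compile(r"(Jan)(uary)? (\d\d?)")

# \g<1> followed by a literal 7: the angle brackets end the group number.
print(pattern.sub(r"\g<1>7 \3", "January 16"))  # Jan7 16

# Without brackets, \17 is read as a reference to group 17, which does
# not exist here, so sub() raises re.error instead of inserting \1 + "7".
try:
    pattern.sub(r"\17 \3", "January 16")
    error = None
except re.error as exc:
    error = exc
print("rejected:", error)
```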

--
Bruno
