Mon, 10 Oct 2011 19:24:46 +0200
Bruno Le Floch wrote on 2011-10-10 17:07:
> Hello all,
> We just added on CTAN two related modules: l3str (string manipulation)
> and l3regex (regular expression matching and replacement).
Without having looked at it, I'll still say: Wow!
> Speed requirements forbid a back-tracking approach,
Does that mean you compile to a finite automaton? Then double wow!
> hence back-references cannot be supported. Only "truly regular" features are
I, for one, have no love for backreferences (a.k.a. The Feature from the
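To make the "finite automaton" question concrete: a non-backtracking matcher can compile the pattern to an NFA (Thompson's construction) and then track the set of all live states in parallel, so each input character is read exactly once. Below is a minimal Python sketch of that idea. It is not how l3regex is implemented. It supports only literals, '.', '*', '|' and parentheses, and the names State, Parser, closure and fullmatch are made up for illustration. Back-references have no place in it, because the matcher only remembers which states are live, never what text a group captured.

```python
class State:
    """One NFA state: consumes `char` ('.' = any character) or, if char is None,
    is an epsilon state whose successors are reached without reading input."""

    def __init__(self, char=None):
        self.char = char
        self.out = []


class Parser:
    """Recursive-descent parser that builds NFA fragments (start, end) directly.
    The `end` of every fragment is always an epsilon state."""

    def __init__(self, pattern):
        self.p, self.i = pattern, 0

    def peek(self):
        return self.p[self.i] if self.i < len(self.p) else None

    def parse_alt(self):  # alt := concat ('|' concat)*
        start, end = self.parse_concat()
        while self.peek() == '|':
            self.i += 1
            s2, e2 = self.parse_concat()
            fork, join = State(), State()
            fork.out += [start, s2]      # epsilon-branch into either side
            end.out.append(join)
            e2.out.append(join)
            start, end = fork, join
        return start, end

    def parse_concat(self):  # concat := star*   (may be empty)
        start = end = State()
        while self.peek() not in (None, '|', ')'):
            s, e = self.parse_star()
            end.out.append(s)            # chain fragments with epsilon edges
            end = e
        return start, end

    def parse_star(self):  # star := atom '*'*
        start, end = self.parse_atom()
        while self.peek() == '*':
            self.i += 1
            loop, exit_ = State(), State()
            loop.out += [start, exit_]   # enter the body or skip it
            end.out += [start, exit_]    # repeat the body or leave
            start, end = loop, exit_
        return start, end

    def parse_atom(self):  # atom := '(' alt ')' | literal | '.'
        c = self.peek()
        if c == '*':
            raise ValueError("'*' with nothing to repeat")
        self.i += 1
        if c == '(':
            start, end = self.parse_alt()
            if self.peek() != ')':
                raise ValueError("missing ')'")
            self.i += 1
            return start, end
        s, e = State(c), State()
        s.out.append(e)
        return s, e


def closure(states):
    """All states reachable from `states` through epsilon edges only."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        if s.char is None:
            for t in s.out:
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
    return seen


def fullmatch(pattern, text):
    """Match the whole text in O(len(pattern) * len(text)), with no backtracking."""
    parser = Parser(pattern)
    start, accept = parser.parse_alt()
    if parser.i != len(pattern):
        raise ValueError("unbalanced ')'")
    current = closure([start])
    for ch in text:
        step = [s.out[0] for s in current
                if s.char is not None and s.char in ('.', ch)]
        current = closure(step)
    return accept in current


print(fullmatch("a(b|c)*d", "abcbd"))  # True
print(fullmatch("a(b|c)*d", "abxd"))   # False
```

The set of live states plays the role of a single DFA state computed on the fly; caching those sets would give a lazily built DFA.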
> - I had the idea of providing # as a shorthand for .*? (arbitrary
> sequence of characters, lazy), mimicking what TeX does when finding a
> macro parameter. Is it useful?
For something called # in analogy with macros, one would probably also
expect some interaction with capturing parentheses. Feels like overkill to me.
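A sketch of that interaction, assuming (hypothetically) that each '#' turns into a lazy capturing group '(.*?)' and every other character is matched literally. It uses Python's backtracking 're' module only to show the intended semantics. hash_to_regex is a made-up helper, not an l3regex function.

```python
import re


def hash_to_regex(template):
    """Hypothetical helper: each '#' becomes a lazy capturing group '(.*?)',
    mimicking how TeX grabs a delimited macro argument (shortest match).
    Everything else in the template is escaped and matched literally."""
    parts = template.split('#')
    return '(.*?)'.join(re.escape(p) for p in parts)


# Like \def\foo#1,#2.{...}: the arguments are delimited by ',' and '.'
m = re.match(hash_to_regex('#,#.'), 'a,b,c.')
print(m.groups())  # ('a', 'b,c')
```

As with the TeX macro, the first argument stops at the first ',' and the second absorbs everything up to the '.'. Unlike TeX, this ignores brace groups, so a ',' inside {...} would still end the first argument.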
> - Same question for caseless matching, and for look-ahead/look-behind
> - A facility for matching a balanced group (e.g., as xparse does for
> optional arguments)? That is non-regular, and is difficult to
> implement, so I will only look at it if it is really needed.
For parsing text where balancing matters, I would suggest using Parsing
Expression Grammars (instead of mimicking Perlish extensions to regexps):
most of the expressive power of BNFs (and then some), none of the ambiguity,
and linear-time parsing with packrat memoization!
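For example, a brace-balanced group (the non-regular case from the question above) is a two-rule PEG: Group <- '{' Content* '}' and Content <- Group / [^{}]. Below is a minimal hand-written packrat parser for it in Python. It is a sketch under my own assumptions (the grammar and match_group are illustrative, not taken from xparse or l3regex). Memoizing each (rule, position) pair means every rule runs at most once per position, which is where the linear-time bound comes from.

```python
from functools import lru_cache


def match_group(text, start=0):
    """Return the index just past the balanced {...} group at `start`, or None."""

    @lru_cache(maxsize=None)
    def group(i):
        # Group <- '{' Content* '}'
        if i >= len(text) or text[i] != '{':
            return None
        i += 1
        while True:                      # Content* is greedy and never backtracks
            j = content(i)
            if j is None:
                break
            i = j
        if i < len(text) and text[i] == '}':
            return i + 1
        return None

    @lru_cache(maxsize=None)
    def content(i):
        # Content <- Group / [^{}]   (ordered choice: try Group first)
        j = group(i)
        if j is not None:
            return j
        if i < len(text) and text[i] not in '{}':
            return i + 1
        return None

    return group(start)


print(match_group("{a{b}c}d"))  # 7
print(match_group("{a{b}c"))    # None
```

Because PEG choice is ordered and repetition is greedy, there is exactly one parse, so there is no ambiguity to resolve. Deeply nested input would hit Python's recursion limit here; a production version would use an explicit stack.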