Bruno Le Floch wrote 2011-10-10 17.07:
> Hello all,
>
> We just added on CTAN two related modules: l3str (string manipulation)
> and l3regex (regular expression matching and replacement).

Without having looked at it, I'll still say: Wow!

[snip]
> Speed requirements forbid a back-tracking approach,

Does that mean you compile to a finite automaton? Then double wow!

> hence back-references cannot be supported. Only "truly regular" features are
> implemented.

I, for one, have no love for backreferences (a.k.a. The Feature from the 
Black Lagoon).

> - I had the idea of providing # as a shorthand for .*? (arbitrary
> sequence of characters, lazy), mimicking what TeX does when finding a
> macro parameter. Is it useful?

For something called # in analogy with macros, one would probably also 
expect some interaction with capturing parentheses. Feels like overkill to me.
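
For reference, here is a small illustration of what the lazy .*? means, written with Python's re module rather than l3regex (an assumption made purely for illustration; Python's engine backtracks, so only the matching semantics carry over). The lazy form stops at the first possible delimiter, much as a delimited macro parameter in TeX does, while the greedy form runs to the last one.

```python
import re

text = "a1b2b"

# Lazy: .*? takes as few characters as possible, so the match
# ends at the first "b", like a TeX parameter delimited by "b".
lazy = re.search(r"a(.*?)b", text).group(1)

# Greedy: .* takes as many characters as possible, so the match
# ends at the last "b".
greedy = re.search(r"a(.*)b", text).group(1)

print(lazy)    # 1
print(greedy)  # 1b2
```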

> - Same question for caseless matching, and for look-ahead/look-behind
> assertions.
>
> - A facility for matching a balanced group (e.g., as xparse does for
> optional arguments)? That is non-regular, and is difficult to
> implement, so I will only look at it if it is really needed.

For parsing text where balancing matters, I would suggest using Parsing 
Expression Grammars (instead of mimicking Perlish extensions to regexps): 
most of the expressive power of BNFs (and then some), none of the ambiguity, 
and capable of doing it in linear time!
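
As a minimal sketch of the idea (in Python, not expl3, and not taken from any existing package), the single PEG rule Group <- '{' (Group / [^{}])* '}' can be turned directly into a recursive function. The name parse_group is hypothetical. Each character is examined once, so this recognizer runs in time linear in the input length.

```python
def parse_group(text, pos=0):
    """Match the PEG rule  Group <- '{' (Group / [^{}])* '}'  at pos.

    Returns the index just past the matching '}', or None if no
    balanced group starts at pos.
    """
    if pos >= len(text) or text[pos] != "{":
        return None
    pos += 1
    while pos < len(text):
        ch = text[pos]
        if ch == "}":
            # Closing brace of this group: success.
            return pos + 1
        if ch == "{":
            # Ordered choice, first alternative: a nested Group.
            end = parse_group(text, pos)
            if end is None:
                return None
            pos = end
        else:
            # Second alternative: any single non-brace character.
            pos += 1
    # Input ended before the group was closed.
    return None


print(parse_group("{a{b}c}d"))  # 7
print(parse_group("{a{b}"))     # None
```

The balanced group in the first call is "{a{b}c}", which is why the result is 7 and the trailing "d" is left unconsumed.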

Lars Hellström