Date: Mon, 10 Oct 2011 19:24:46 +0200
Bruno Le Floch wrote 2011-10-10 17.07:
> Hello all,
>
> We just added on CTAN two related modules: l3str (string manipulation)
> and l3regex (regular expression matching and replacement).
Without having looked at it, I'll still say: Wow!
[snip]
> Speed requirements forbid a back-tracking approach,
Does that mean you compile to a finite automaton? Then double wow!
> hence back-references cannot be supported. Only "truly regular" features are
> implemented.
I, for one, have no love for backreferences (a.k.a. The Feature from the
Black Lagoon).
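
For readers unfamiliar with the distinction, here is a minimal sketch of non-backtracking matching, written in Python purely for illustration. It makes no claim about how l3regex works internally. The automaton (a hand-written NFA for the pattern (a|b)*abb) and the names TRANSITIONS and matches are assumptions invented for this example.

```python
# Hypothetical hand-built NFA for the pattern (a|b)*abb.
# TRANSITIONS maps (state, character) to the set of possible next states.
TRANSITIONS = {
    (0, "a"): {0, 1},  # nondeterministic: stay in the loop or start "abb"
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START, ACCEPT = 0, 3


def matches(text):
    """Return True if the whole of text is accepted by the automaton."""
    current = {START}
    for ch in text:
        # Advance all live states together; each input character is read
        # exactly once, so there is nothing to backtrack over.
        current = {
            nxt
            for state in current
            for nxt in TRANSITIONS.get((state, ch), ())
        }
        if not current:
            return False
    return ACCEPT in current
```

Because the matcher carries a set of states forward instead of retrying alternatives, the work per input character is bounded by the size of the automaton. The price is that nothing remembers which text an earlier group matched, which is why back-references do not fit this model.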
> - I had the idea of providing # as a shorthand for .*? (arbitrary
> sequence of characters, lazy), mimicking what TeX does when finding a
> macro parameter. Is it useful?
For something called # in analogy with macros, one would probably also
expect some interaction with capturing parentheses. Feels like overkill to me.
> - Same question for caseless matching, and for look-ahead/look-behind
> assertions.
>
> - A facility for matching a balanced group (e.g., as xparse does for
> optional arguments)? That is non-regular, and is difficult to
> implement, so I will only look at it if it is really needed.
For parsing text where balancing matters, I would suggest using Parsing
Expression Grammars (instead of mimicking Perlish extensions to regexps):
most of the expressive power of BNFs (and then some), none of the ambiguity,
and capable of doing it in linear time!
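
To make the suggestion concrete, here is a rough sketch in Python of a PEG for brace-balanced groups, written as a pair of recursive functions. The grammar and the function names parse_balanced and parse_item are assumptions made up for this illustration, not anything taken from xparse or l3regex:

    Balanced <- '{' Item* '}'
    Item     <- Balanced / [^{}]

```python
def parse_balanced(text, pos=0):
    """Balanced <- '{' Item* '}'

    Return the position just past the closing brace, or None on failure.
    """
    if pos >= len(text) or text[pos] != "{":
        return None
    pos += 1
    # Item*: greedily consume items until one fails.
    while True:
        nxt = parse_item(text, pos)
        if nxt is None:
            break
        pos = nxt
    if pos < len(text) and text[pos] == "}":
        return pos + 1
    return None


def parse_item(text, pos):
    """Item <- Balanced / [^{}]

    Ordered choice: try a nested group first, then a single plain character.
    """
    end = parse_balanced(text, pos)
    if end is not None:
        return end
    if pos < len(text) and text[pos] not in "{}":
        return pos + 1
    return None
```

For example, parse_balanced("{a{b}c} rest") returns 7, the index just past the outer group, while parse_balanced("{a{b}") returns None. The ordered choice "/" is what removes ambiguity: the first alternative that succeeds wins. In this small grammar the next character decides each choice, so no input is re-read; general PEGs reach linear time by memoizing results (packrat parsing).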
Lars Hellström