## LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


```
On 12/10/2011 03:59, Bruno Le Floch wrote:
>> I'm not really clear/keen on the 'save the regex' stuff. The result
>> seems to be we have 'N' argument functions which need a pre-compiled
>> regex, and 'n' ones which need a normal regex. I don't really like this,
>> and am really not sure it's necessary to provide optimisation in this
>> way. In the absence of use cases, I'm not sure about needing this type
>
> For short strings (e.g., matching \d\d\d\d-\d\d-\d\d on 2011-10-11),
> one third of the time is spent on building the automaton from the
> regular expression, and two thirds on running the automaton. I don't
> know how important that is in practice. Two aspects:
>
>   - providing it requires more code --- true
>
>   - the N arguments may be confusing (e.g., some people may think that
> it expects the regex as a string variable) --- not such a problem
> because the variable is checked to indeed be a proper compiled regex.
>
> If the feeling is that it should go, then I'll remove that this weekend.

At this stage, I'm simply raising the issue and hoping others have some
input. For any regex that is reused, I guess there will be a speed gain
even for short strings. Perhaps adding 'fast', 'precom' or 'prebuilt' to
the names would be sensible, as this would keep the speed gains but avoid
the naming ambiguity.
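
    % A minimal sketch of the "compile once, reuse" idea, assuming the
    % current l3regex interface (\regex_const:Nn, \regex_match:NnTF),
    % which postdates this thread; \c_demo_date_regex is a hypothetical
    % name chosen for illustration.
    \ExplSyntaxOn
    % Build the automaton for the date pattern a single time ...
    \regex_const:Nn \c_demo_date_regex { \A \d\d\d\d - \d\d - \d\d \Z }
    % ... so that each test below only has to run it.
    \regex_match:NnTF \c_demo_date_regex { 2011-10-11 } { valid } { invalid } % -> valid
    \regex_match:NnTF \c_demo_date_regex { 2011-1-11 } { valid } { invalid }  % -> invalid
    \ExplSyntaxOff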

(On use cases, I guess most strings will be short. I can see regexes
being useful for things like validating input such as numbers and dates.
A more complex application might be syntax highlighting in code samples,
but even then we should be looking at relatively small blocks.)
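
    % A minimal sketch of input validation with a one-off regex, again
    % assuming the current l3regex interface; \demo_if_integer:nTF is a
    % hypothetical name. The regex is given inline (n-type), so it is
    % rebuilt on every call rather than precompiled.
    \ExplSyntaxOn
    \prg_new_protected_conditional:Npnn \demo_if_integer:n #1 { TF }
      {
        % Optional sign, then one or more digits, anchored at both ends.
        \regex_match:nnTF { \A [\+\-]? \d+ \Z } {#1}
          { \prg_return_true: }
          { \prg_return_false: }
      }
    % Usage: "-42" matches, so the true branch is taken.
    \demo_if_integer:nTF { -42 } { integer } { not~an~integer }
    \ExplSyntaxOff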

>> since
>>     \regex_split:nnN { (\w+) } { Hello,~world! } \l_result_seq
>> gives
>>     {} {Hello} {,~} {world} {!}
>> if I understand you correctly. Maybe all that is needed is an example in the
>> documentation.
>
> I'll add \regex_extract_all:nnN this weekend.
>
>> Rather than \regex_extract_once:nnN, I would probably call it
>> \regex_extract_first:nnN.
>