Michiel Helvensteijn skrev 2013-10-24 02.12: > Secondly, an implementation question (which might lead to a > discussion): You're now running a regex as a NFA (Nondeterministic > Finite Automaton), keeping track of all active branches while matching > against an input. Is there a particular reason you're not translating > it into a DFA (Deterministic Finite Automaton) during the compilation > phase, i.e., applying the powerset construction? Not knowing the details of the implementation, I anyway hazard the guess that this is because running the NFA tends to be faster in practice. Sure, when executing an NFA you need to keep track of all active branches for this particular input, but in carrying out the powerset construction you need to consider any possible set of active branches for any possible input! You need to have run the match against a rather long string before NFA matching work starts to catch up with the work required by the powerset construction. So unless you plan to run a single regexp against a very large amount of text, it's in all likelyhood not worth it. Even in cases such as yours, where you actually do mean to run the same regexp against lots of text, it's not entirely clear that it's worth it. The reason is that once the machinery is there, it is easy to underestimate how large the NFA is, and the exponential computational complexity of the powerset construction really penalises even small NFA size increases. At the same time, it's easy to overestimate the amount of text a matching on average gets applied to. An optimal approach for your setting would perhaps be a lazy powerset construction: don't carry it out for a particular state in the DFA until you actually reach that state, but then record the result; matching typically doesn't reach all the states of the DFA, since you don't match against arbitrary data. This, however, assumes that you have some efficient method of memoising the results from lazy computations. (I suppose a family of control sequence names could do the trick.) Lars Hellström