On Thu, Oct 24, 2013 at 3:06 PM, Paulo Roberto Massa Cereda
<[log in to unmask]> wrote:

> What sometimes comes back to haunt me is the fact that in a DFA we have a
> transition function which has to be total by definition, so we might end
> up with a converted DFA which has 2^n states in the worst case, and the
> number of transitions might end up being far greater than the number of
> states.

In my experience, practical cases never trigger the exponential
state-space explosion. An example that does: "the language of strings
over the alphabet {0,1} in which there are at least n characters, the
n-th from last of which is 1" (Wikipedia). Keep in mind, though, that
using an NFA for this (with n+1 states) does *not* get rid of the
complexity, but merely moves it from compile-time to run-time: the
simulation must track a set of active states at every step, and up to
2^n distinct such sets can occur over the course of a match.

(No one should be using a regular expression for such a task anyway.)
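
To see the blow-up concretely, one can build that very (n+1)-state NFA
and run the textbook subset construction on it, counting the reachable
state sets. A minimal sketch in Python (nothing to do with l3regex's
internals, just the classroom algorithm):

    # NFA for "the n-th symbol from the end is 1": state 0 loops on
    # both symbols and guesses the marked 1; states 1..n-1 just shift.
    def nfa_delta(state, symbol, n):
        targets = []
        if state == 0:
            targets.append(0)
            if symbol == '1':
                targets.append(1)    # guess: this 1 is n-th from last
        elif state < n:
            targets.append(state + 1)
        return targets               # state n is accepting, no exits

    # Determinize by exploring reachable state sets; return their count.
    def subset_construction(n):
        start = frozenset({0})
        seen, stack = {start}, [start]
        while stack:
            current = stack.pop()
            for symbol in '01':
                nxt = frozenset(t for s in current
                                  for t in nfa_delta(s, symbol, n))
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return len(seen)

    for n in range(1, 8):
        print(n, subset_construction(n))   # prints exactly 2^n

Each reachable set records exactly which of the last n symbols read
were 1, so all 2^n subsets really do occur.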

> I think we have benefits and shortcomings in both situations. I'm pro DFA,
> but the construction is quite time-consuming.

I've been thinking about that. Why not memoize a
regex-to-minimized-DFA translation in an auxiliary file? That way you
only have to compile a new regex once, even across LaTeX runs.
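
Roughly what I have in mind, as a hypothetical sketch (Python rather
than expl3; the cache file name and the compile_to_min_dfa hook are
made up for illustration):

    import os, pickle

    CACHE_FILE = "regex-cache.aux"   # invented name for the aux file

    def compiled_dfa(pattern, compile_to_min_dfa):
        # Load the cache written by previous runs, if any.
        cache = {}
        if os.path.exists(CACHE_FILE):
            with open(CACHE_FILE, "rb") as f:
                cache = pickle.load(f)
        # Compile only on a cache miss, then persist for later runs.
        if pattern not in cache:
            cache[pattern] = compile_to_min_dfa(pattern)
            with open(CACHE_FILE, "wb") as f:
                pickle.dump(cache, f)
        return cache[pattern]

The expensive regex -> NFA -> DFA -> minimization pipeline then runs
at most once per pattern, no matter how many LaTeX runs follow.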

By the way, I have to admit I'm not sure which of these optimizations
sub-group capturing can survive. (Determinization and minimization
merge paths and states, which can destroy the bookkeeping needed to
report where a group matched.)

> One possible alternative is to
> use an NFA, applying Thompson's construction or a similar algorithm

I assume this is what l3regex already does, to construct the NFA.
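
For reference, the construction is small enough to fit in a few lines;
a sketch (again Python, certainly not the actual l3regex code), where
each operator adds a constant number of states and epsilon edges, so
the NFA stays linear in the size of the regex:

    EPS = None   # label used for epsilon transitions

    class NFA:
        def __init__(self, start, accept, edges):
            self.start, self.accept, self.edges = start, accept, edges

    counter = 0
    def new_state():
        global counter
        counter += 1
        return counter

    def lit(c):                      # single-character regex
        s, a = new_state(), new_state()
        return NFA(s, a, [(s, c, a)])

    def concat(m, n):                # regex mn
        return NFA(m.start, n.accept,
                   m.edges + n.edges + [(m.accept, EPS, n.start)])

    def alt(m, n):                   # regex m|n
        s, a = new_state(), new_state()
        return NFA(s, a, m.edges + n.edges +
                   [(s, EPS, m.start), (s, EPS, n.start),
                    (m.accept, EPS, a), (n.accept, EPS, a)])

    def star(m):                     # regex m*
        s, a = new_state(), new_state()
        return NFA(s, a, m.edges +
                   [(s, EPS, m.start), (s, EPS, a),
                    (m.accept, EPS, m.start), (m.accept, EPS, a)])

    # (0|1)*1(0|1): "the second symbol from the end is 1"
    nfa = concat(star(alt(lit('0'), lit('1'))),
                 concat(lit('1'), alt(lit('0'), lit('1'))))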

-- 
www.mhelvens.net