On Thu, Oct 24, 2013 at 3:06 PM, Paulo Roberto Massa Cereda
<[log in to unmask]> wrote:
> What sometimes comes back to haunt me is the fact that in a DFA we have a
> transition function which has to be total by definition, so we might end up
> with a converted DFA which has 2^n states in the worst case, and the number
> of transitions might end up being way greater than the number of states.
In my experience, practical cases never hit the exponential
state-space explosion. An example that does: "the language of strings
over the alphabet {0,1} in which there are at least n characters, the
nth from last of which is 1" (Wikipedia). Keep in mind, though, that
using an NFA for this (with n+1 states) does *not* get rid of the
complexity, but merely moves it from compile time to run time, where
the simulation can end up in any of 2^n distinct subsets of states.
(No one should be using a regular expression for such a task anyway.)
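To make the blowup concrete, here is a rough Python sketch (purely illustrative, not related to l3regex's code) that runs the subset construction on the textbook (n+1)-state NFA for that language and counts the reachable DFA states. The NFA encoding is my own: state 0 loops on both symbols and nondeterministically "guesses" the marked 1; states 1..n just advance.

```python
def nfa_delta(state, sym, n):
    # NFA for "the n-th symbol from the end is 1", states 0..n:
    # state 0 loops on 0/1 and guesses the marked 1; state n accepts.
    succ = set()
    if state == 0:
        succ.add(0)
        if sym == '1':
            succ.add(1)
    elif state < n:
        succ.add(state + 1)
    return succ

def subset_construction_size(n):
    # Count the DFA states reachable via the subset construction.
    start = frozenset({0})
    seen = {start}
    frontier = [start]
    while frontier:
        cur = frontier.pop()
        for sym in '01':
            nxt = frozenset(s for q in cur for s in nfa_delta(q, sym, n))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)

for n in (3, 4, 5):
    print(n, subset_construction_size(n))
```

Every subset {0} plus any combination of {1..n} turns out to be reachable, so the count doubles with each increment of n.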
> I think we have benefits and shortcomings in both situations. I'm pro-DFA,
> but the construction is quite time-consuming.
I've been thinking about that. Why not memoize a
regex-to-minimized-DFA translation in an auxiliary file? That way you
only have to compile a new regex once, even across LaTeX runs.
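Something along these lines, as a Python sketch: key the cache on a hash of the pattern, and persist it to an auxiliary file between runs. The file name and the `expensive_compile` stand-in (which would be the real regex-to-minimized-DFA translation) are both hypothetical.

```python
import hashlib
import os
import pickle

CACHE_FILE = "regex-cache.pkl"  # hypothetical auxiliary file

def expensive_compile(pattern):
    # Stand-in for the actual regex -> minimized-DFA translation;
    # deterministic per pattern so cached results stay valid.
    return {"pattern": pattern, "dfa": "<minimized DFA for %r>" % pattern}

def compile_cached(pattern):
    # Load the cache from the auxiliary file, if one exists.
    cache = {}
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE, "rb") as f:
            cache = pickle.load(f)
    key = hashlib.sha256(pattern.encode()).hexdigest()
    if key not in cache:
        # Only pay the compilation cost once per distinct pattern.
        cache[key] = expensive_compile(pattern)
        with open(CACHE_FILE, "wb") as f:
            pickle.dump(cache, f)
    return cache[key]
```

In the LaTeX setting the auxiliary file would be written alongside the .aux file, so a second run (or a second document using the same pattern) skips the construction entirely.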
By the way, I have to admit I'm not sure under which optimizations
subgroup capturing can survive.
> One possible alternative is to
> use an NFA, applying Thompson's construction or a similar algorithm
I assume this is what l3regex already does, to construct the NFA.
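For reference, Thompson's construction is small enough to sketch in full. This is my own toy Python version covering literals, concatenation, union, and star, with an epsilon-closure-based simulation; it is not how l3regex is implemented, just the textbook algorithm.

```python
class State:
    def __init__(self):
        # list of (symbol, target); symbol None means an epsilon edge
        self.edges = []

def literal(c):
    s, t = State(), State()
    s.edges.append((c, t))
    return s, t

def concat(a, b):
    (s1, t1), (s2, t2) = a, b
    t1.edges.append((None, s2))  # glue a's accept to b's start
    return s1, t2

def union(a, b):
    (s1, t1), (s2, t2) = a, b
    s, t = State(), State()
    s.edges += [(None, s1), (None, s2)]  # branch into either operand
    t1.edges.append((None, t))
    t2.edges.append((None, t))
    return s, t

def star(a):
    s1, t1 = a
    s, t = State(), State()
    s.edges += [(None, s1), (None, t)]   # zero iterations allowed
    t1.edges += [(None, s1), (None, t)]  # loop back or leave
    return s, t

def matches(nfa, string):
    start, accept = nfa
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for sym, r in q.edges:
                if sym is None and r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen
    cur = closure({start})
    for c in string:
        cur = closure({r for q in cur for sym, r in q.edges if sym == c})
    return accept in cur
```

The earlier worst-case language for n = 1 is just (0|1)*1, i.e. `concat(star(union(literal('0'), literal('1'))), literal('1'))`, and the `matches` simulation above is exactly the "subset at runtime" strategy discussed earlier in the thread.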

www.mhelvens.net
