On Thu, Oct 24, 2013 at 3:06 PM, Paulo Roberto Massa Cereda <[log in to unmask]> wrote:

> What sometimes comes back to haunt me is the fact that in a DFA we have
> a transition function which has to be total by definition, so we might
> end up with a converted DFA which has 2^n states in the worst case, and
> the number of transitions might end up being way greater than the
> number of states.

In my experience, practical cases never trigger the exponential state-space explosion. An example that does: "the language of strings over the alphabet {0,1} in which there are at least n characters, the n-th from last of which is 1" (Wikipedia). Keep in mind, though, that using an NFA for this (with n+1 states) does *not* get rid of the complexity, but merely moves it from compile time to run time, where up to 2^n distinct sets of active states may have to be distinguished. (No one should be using a regular expression for such a task anyway.) A toy demonstration is in the first P.S. below.

> I think we have benefits and shortcomings in both situations. I'm pro
> DFA, but the construction is quite time-consuming.

I've been thinking about that. Why not memoize the regex-to-minimized-DFA translation in an auxiliary file? That way you only have to compile a new regex once, even across LaTeX runs. (A sketch is in the second P.S.) By the way, I have to admit I'm not sure which optimizations sub-group capturing can survive.

> One possible alternative is to use an NFA, applying Thompson's
> construction or a similar algorithm

I assume this is what l3regex already does to construct the NFA. (The third P.S. has a toy version of the construction.)

--
www.mhelvens.net
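
P.S. Here is a toy demonstration of the blow-up, in Python rather than expl3 (all names are my own; this is a sketch, not l3regex code). The NFA for "the n-th symbol from the end is 1", i.e. (0|1)*1(0|1){n-1}, has n+1 states, but subset construction yields 2^n reachable DFA states, and minimization cannot collapse any of them, because the DFA has to remember the last n input symbols.

    def nfa_delta(state, symbol, n):
        """Transitions of the (n+1)-state NFA for (0|1)*1(0|1){n-1}."""
        targets = set()
        if state == 0:
            targets.add(0)            # self-loop on both symbols
            if symbol == '1':
                targets.add(1)        # guess: this 1 is the n-th from last
        elif state < n:
            targets.add(state + 1)    # count off the remaining n-1 symbols
        return targets                # state n accepts, no outgoing edges

    def determinize(n):
        """Subset construction; returns the number of reachable DFA states."""
        start = frozenset({0})
        seen, todo = {start}, [start]
        while todo:
            current = todo.pop()
            for symbol in '01':
                nxt = frozenset(t for s in current
                                  for t in nfa_delta(s, symbol, n))
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
        return len(seen)

    for n in range(1, 8):
        print(n, determinize(n))      # prints 2, 4, 8, ..., i.e. 2^n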
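
P.P.S. The memoization idea, again sketched in Python standing in for expl3; the cache file name, the helper, and the DFA encoding are all made up for illustration. A real implementation would probably also key on an l3regex version stamp, so a stale cache gets invalidated.

    import json, os

    CACHE_FILE = 'regex-dfa-cache.json'   # hypothetical auxiliary file

    def compile_to_minimized_dfa(pattern):
        # stand-in for the expensive pipeline:
        # parse -> Thompson NFA -> subset construction -> minimization
        return {'pattern': pattern, 'table': []}

    def compiled_dfa(pattern):
        """Return a DFA for `pattern`, compiling it at most once across runs."""
        cache = {}
        if os.path.exists(CACHE_FILE):
            with open(CACHE_FILE) as f:
                cache = json.load(f)
        if pattern not in cache:          # cache miss: compile and persist
            cache[pattern] = compile_to_minimized_dfa(pattern)
            with open(CACHE_FILE, 'w') as f:
                json.dump(cache, f)
        return cache[pattern]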
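
P.P.P.S. For reference, Thompson's construction itself is tiny; here is a toy Python encoding (my own, not what l3regex does internally). Each operator adds at most two states and only epsilon edges, which is why the NFA side stays linear in the size of the pattern.

    from dataclasses import dataclass, field

    @dataclass
    class State:
        edges: list = field(default_factory=list)   # (symbol or None, State); None = epsilon

    def literal(c):                   # fragment for a single character
        s, t = State(), State()
        s.edges.append((c, t))
        return s, t                   # (start, accept)

    def concat(a, b):                 # fragment for AB
        a[1].edges.append((None, b[0]))
        return a[0], b[1]

    def alternate(a, b):              # fragment for A|B
        s, t = State(), State()
        s.edges += [(None, a[0]), (None, b[0])]
        a[1].edges.append((None, t))
        b[1].edges.append((None, t))
        return s, t

    def star(a):                      # fragment for A*
        s, t = State(), State()
        s.edges += [(None, a[0]), (None, t)]
        a[1].edges += [(None, a[0]), (None, t)]
        return s, t

    nfa = star(alternate(literal('0'), literal('1')))   # fragment for (0|1)*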