LATEX-L Archives

Mailing list for the LaTeX3 project


Options: Use Forum View

Use Monospaced Font
Show Text Part by Default
Show All Mail Headers

Message: [<< First] [< Prev] [Next >] [Last >>]
Topic: [<< First] [< Prev] [Next >] [Last >>]
Author: [<< First] [< Prev] [Next >] [Last >>]

Print Reply
Michiel Helvensteijn <[log in to unmask]>
Reply To:
Mailing list for the LaTeX3 project <[log in to unmask]>
Thu, 24 Oct 2013 15:43:30 +0200
text/plain (36 lines)
On Thu, Oct 24, 2013 at 3:06 PM, Paulo Roberto Massa Cereda
<[log in to unmask]> wrote:

> What sometimes comes back to haunt me is the fact the in a DFA we have a
> transition function which has to be total by definition, so we might end up
> with a converted DFA which has 2^n states in the worst case, an the number
> of transition might end up being way greater than the number of states.

In my experience, practical cases never get the exponential
state-space explosion. An example that does: "the language of strings
over the alphabet {0,1} in which there are at least n characters, the
n-th from last of which is 1" (Wikipedia). Keep in mind, though, that
the using a NFA for this (with n+1 states), does *not* get rid of this
complexity, but merely moves it from compile-time to run-time, where
2^n active paths will need to be remembered.

(No one should be using a regular expression for such a task anyway.)

> I think we have benefits and shortcomings in both situations. I'm pro DFA,
> but the construction is quite time-consuming.

I've been thinking about that. Why not memoize a
regex-to-minimized-DFA translation in an auxiliary file? That way you
only have to compile a new regex once, even across LaTeX runs.

By the way, I have to admit I'm not sure under which optimizations
sub-group capturing can survive.

> One possible alternative is to
> use a NFA and applying Thompson's construction or a similar algorithm

I assume this is what l3regex already does, to construct the NFA.