On Thu, Oct 24, 2013 at 5:39 PM, Lars Hellström
<[log in to unmask]> wrote:

>> Interesting, I haven't come across the idea of a lazy powerset
>> construction before. But I don't think I fully understand what you are
>> proposing. Using "a family of csnames" suggests to me the general
>> technique of maintaining an input-output table for a pure function.
> I was thinking one csname per state of the DFA (subset of states of the
> NFA), that in turn acted as a function from input character to new state of
> DFA. This should work perfectly well in your one-token-at-a-time setting.

I was trying to determine what was 'lazy' about it. But I think I get
it now. You're only building states as you need them and leaving those
transition functions partially defined. You always try the DFA first,
but if you run into a dead end (no transition to fit the input), you
have to fall back to the NFA. Yes, that might work.

>> (2) I would also need the ability to merge new (N|D)FA's into an
>> existing one and mark the corresponding accepting states with an
>> identifier. I could then query this identifier when a match was found,
>> so I could know which original pattern(s) matched.
> Doable, but whichever way you do it, you effectively end up executing all
> the component automata in parallel. With DFAs, the new state space is the
> cartesian product of the component state spaces. With NFAs, you have a set
> of active branches for each component automaton. So if you're anyway feeding
> data to your automaton one token at a time, then you might as well feed each
> token to a sequence of automata instead.

Properly merging DFA's will gain you plenty if some rudimentary
optimizations are performed. But not so much for NFA's, as you say,
which is a shame. I've defined near 200 separate math-mode lexemes in
my thesis so far. I wouldn't want to start up 200 automata for every
potential occurrence. My current implementation uses a trie, i.e., a
very primitive DFA.