On Thu, Oct 24, 2013 at 5:39 PM, Lars Hellström <[log in to unmask]> wrote:
>> Interesting, I haven't come across the idea of a lazy powerset
>> construction before. But I don't think I fully understand what you are
>> proposing. Using "a family of csnames" suggests to me the general
>> technique of maintaining an input-output table for a pure function.
>
> I was thinking one csname per state of the DFA (subset of states of the
> NFA), that in turn acted as a function from input character to new state
> of the DFA. This should work perfectly well in your one-token-at-a-time
> setting.

I was trying to determine what was 'lazy' about it, but I think I get it
now: you only build DFA states as you need them, leaving the transition
functions partially defined. You always try the DFA first, but if you run
into a dead end (no transition fits the input), you fall back to the NFA.
Yes, that might work.

>> (2) I would also need the ability to merge new (N|D)FA's into an
>> existing one and mark the corresponding accepting states with an
>> identifier. I could then query this identifier when a match was found,
>> so I could know which original pattern(s) matched.
>
> Doable, but whichever way you do it, you effectively end up executing
> all the component automata in parallel. With DFAs, the new state space
> is the cartesian product of the component state spaces. With NFAs, you
> have a set of active branches for each component automaton. So if you're
> feeding data to your automaton one token at a time anyway, then you
> might as well feed each token to a sequence of automata instead.

Properly merging DFA's would gain you plenty if some rudimentary
optimizations are performed, but, as you say, not so much for NFA's,
which is a shame. I've defined nearly 200 separate math-mode lexemes in
my thesis so far, and I wouldn't want to start up 200 automata for every
potential occurrence. My current implementation uses a trie, i.e., a very
primitive DFA.

-- 
www.mhelvens.net
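
P.S. For the archives, here is a minimal sketch of the lazy subset
construction being discussed, written in Python rather than TeX (the NFA
here is a made-up example, not anything from my thesis, and epsilon
transitions are omitted for brevity). DFA states are frozensets of NFA
states; a transition is computed from the NFA only the first time it is
needed, then cached — so the DFA table stays partially defined, exactly
as described above:

```python
# Lazy powerset construction: DFA states/transitions built on demand.
# Hypothetical example NFA accepting strings over {a, b} ending in "ab".
# NFA transition table: state -> {symbol -> set of successor states}
nfa = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {},
}
nfa_start = {0}
nfa_accept = {2}

# Cache of DFA transitions: (frozenset of NFA states, symbol) -> frozenset
dfa = {}

def step(state, symbol):
    """Try the (partial) DFA first; on a dead end, fall back to the
    NFA to compute the successor state set, and cache the result."""
    key = (state, symbol)
    if key not in dfa:  # transition not built yet: consult the NFA
        nxt = set()
        for q in state:
            nxt |= nfa.get(q, {}).get(symbol, set())
        dfa[key] = frozenset(nxt)
    return dfa[key]

def matches(word):
    """Run the lazily-built DFA over `word` one symbol at a time."""
    state = frozenset(nfa_start)
    for ch in word:
        state = step(state, ch)
        if not state:       # empty set: reject early
            return False
    return bool(state & nfa_accept)
```

After a few calls to `matches`, only the DFA states actually reachable
from the inputs seen so far exist in the cache — the full powerset is
never enumerated, which is the whole point of the laziness.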