LISTSERV - LATEX-L Archives - LISTSERV.UNI-HEIDELBERG.DE

LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

	LISTSERV Archives
	LATEX-L Home

	Log In
	Register

	Subscribe or Unsubscribe

	Search Archives

Options:	Use Forum View Use Monospaced Font Show Text Part by Default Show All Mail Headers
Message:	[<< First] [< Prev] [Next >] [Last >>]
Topic:	[<< First] [< Prev] [Next >] [Last >>]
Author:	[<< First] [< Prev] [Next >] [Last >>]

Subject:	Re: l3regex feature request + a question about the implementation
From:	Lars Hellström <[log in to unmask]>
Reply To:	Mailing list for the LaTeX3 project <[log in to unmask]>
Date:	Thu, 24 Oct 2013 17:39:22 +0200
Content-Type:	text/plain
Parts/Attachments:	text/plain (51 lines)

Michiel Helvensteijn skrev 2013-10-24 13.43:
> On Thu, Oct 24, 2013 at 10:01 AM, Lars Hellström
> <[log in to unmask]>  wrote:
>> An optimal approach for your setting would perhaps be a lazy powerset
>> construction: don't carry it out for a particular state in the DFA until you
>> actually reach that state
during matching (should I perhaps have added for clarity)
>> , but then record the result; matching typically
>> doesn't reach all the states of the DFA, since you don't match against
>> arbitrary data. This, however, assumes that you have some efficient method
>> of memoising the results from lazy computations. (I suppose a family of
>> control sequence names could do the trick.)
>
> Interesting, I haven't come across the idea of a lazy powerset
> construction before. But I don't think I fully understand what you are
> proposing. Using "a family of csnames" suggests to me the general
> technique of maintaining an input-output table for a pure function.

I was thinking one csname per state of the DFA (subset of states of the 
NFA), that in turn acted as a function from input character to new state of 
DFA. This should work perfectly well in your one-token-at-a-time setting.

> But that would only memoize exact inputs; variations on the same
> pattern wouldn't get any benefit, and it wouldn't work in my setting
> anyway, where I get one token at a time. My current implementation
> uses a "trie" datastructure, which might be one step better than a
> simple table. But still not good enough by far, I think.
>
> When I first read the term "lazy powerset construction" in your post,
> I imagined restructuring the automaton on-the-fly somehow, which
> sounded pretty cool.

In my experience, most operations on automata tend to produce automata with 
epsilon-transitions, which one then at some cost (not cool) has to get rid 
of before executing them. Unless the operation does something even worse to 
the state set... :-(

> (2) I would also need the ability to merge new (N|D)FA's into an
> existing one and mark the corresponding accepting states with an
> identifier. I could then query this identifier when a match was found,
> so I could know which original pattern(s) matched.

Doable, but whichever way you do it, you effectively end up executing all 
the component automata in parallel. With DFAs, the new state space is the 
cartesian product of the component state spaces. With NFAs, you have a set 
of active branches for each component automaton. So if you're anyway feeding 
data to your automaton one token at a time, then you might as well feed each 
token to a sequence of automata instead.

Lars Hellström

ATOM RSS1 RSS2

LISTSERV.UNI-HEIDELBERG.DE
Universität Heidelberg \| Impressum \| Datenschutzerklärung