LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Options: Use Forum View

Use Proportional Font
Show Text Part by Default
Condense Mail Headers

Message: [<< First] [< Prev] [Next >] [Last >>]
Topic: [<< First] [< Prev] [Next >] [Last >>]
Author: [<< First] [< Prev] [Next >] [Last >>]

Print Reply
Message-ID:
Sender:
Mailing list for the LaTeX3 project <[log in to unmask]>
Subject:
From:
Michiel Helvensteijn <[log in to unmask]>
Date:
Thu, 24 Oct 2013 02:12:10 +0200
Content-Type:
text/plain; charset=ISO-8859-1
MIME-Version:
1.0
Reply-To:
Mailing list for the LaTeX3 project <[log in to unmask]>
Parts/Attachments:
text/plain (43 lines)
Hi all!  (and in particular, Bruno)

I've been exploring l3regex a bit more. Kudos on that one! It seems to
work well, and it will be very useful to me in the near future.

It would be even more useful if the public interface could expose just
a bit more of the lower level functionality, which brings me to my
feature request:

I'd like to match against a compiled regex, but feed it one token at a
time, rather than the entire token list at once. At each point in
between, I want to know whether a match is still possible. If not, I
want to go back one step and retrieve the captured groups (and perhaps
other available meta-data). From how the package is implemented, this
functionality should already exist, just not in the public interface.

My motivation behind this request is as follows: I'm still working on
a lexical analyzer package (already using it for personal documents;
lets me do some pretty cool things). And I now realize that in writing
my implementation I'm basically duplicating the effort that Bruno has
already gone through (and ending up with significantly less advanced
features, I might add). But to use l3regex I need to be able to supply
it with tokens while I scan.

Secondly, an implementation question (which might lead to a
discussion): You're now running a regex as a NFA (Nondeterministic
Finite Automaton), keeping track of all active branches while matching
against an input. Is there a particular reason you're not translating
it into a DFA (Deterministic Finite Automaton) during the compilation
phase, i.e., applying the powerset construction?

This would certainly speed up matching a great deal, and reduce memory
usage in most cases (except for those artificial ones that would
require an exponential number of states). It seems you're already
determined to restrict the package to regular languages, so a DFA
should always do the trick. It's possible I'm not aware of all the
complications, though. I'd be interested in hearing your thoughts.

Cheers!

-- 
www.mhelvens.net

ATOM RSS1 RSS2