The following header lines are retained to effect attribution:
|Date: Mon, 03 Mar 1997 12:14:03 +0100
|From: [log in to unmask] (Hans Aberg)
|Subject: Re: Shortref mechanism
|To: Mailing list for the LaTeX3 project <[log in to unmask]>
|Cc: "Randolph J. Herber" <[log in to unmask]>
|"Randolph J. Herber" <[log in to unmask]> writes:
|>| I cannot follow the details in your reasoning, but I can note that with
|>|deterministic parsing, the method generally used in LaTeX, conditional
|> ^^^^^^^^^^^^^ ^^^^^^^ ^^^^^^^^^^^
|>|parsing have such limits.
|> ^^^^^^^
|>| But with non-deterministic parsing more general things can be done:
|> ^^^^^^^^^^^^^^^^^
|>| For example, I just made a definition command that can produce commands
|>|having optional arguments; in this general approach, I had to switch from
|>|LaTeX style deterministic parsing to non-deterministic parsing.
|>To someone who has written several small compilers and has studied automata
|>theory at the doctorate level, your word choice as highlighted above is
|>quite jarring. By using a power automaton, a non-deterministic automaton
|>can be reduced to a deterministic automaton. Therefore, one does not gain
|>any power of expression by using a non-deterministic automaton, rather one
|>only gains compaction of the description.
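The power-automaton (subset) construction referred to above can be sketched in a few lines of Python. This is only a minimal illustration, assuming an NFA without epsilon moves; the names `nfa_to_dfa` and `dfa_accepts` and the example automaton are hypothetical and are not taken from any TeX or LaTeX source.

```python
from collections import deque


def nfa_to_dfa(nfa, start, accepting, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start_set = frozenset([start])
    dfa = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue  # already expanded
        dfa[current] = {}
        for symbol in alphabet:
            # Union of all NFA moves from every state in the current subset.
            target = frozenset(
                t for s in current for t in nfa.get((s, symbol), ())
            )
            dfa[current][symbol] = target
            if target not in dfa:
                queue.append(target)
    # A subset is accepting if it contains at least one accepting NFA state.
    dfa_accepting = {state for state in dfa if state & accepting}
    return dfa, start_set, dfa_accepting


def dfa_accepts(dfa, start_set, dfa_accepting, word):
    """Run the deterministic automaton: exactly one move per input symbol."""
    state = start_set
    for symbol in word:
        state = dfa[state][symbol]
    return state in dfa_accepting


# Hypothetical NFA for strings over {a, b} that end in "ab".
NFA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
}
DFA, START, ACCEPTING = nfa_to_dfa(NFA, "q0", {"q2"}, "ab")
```

Both automata accept the same language; the deterministic one merely needs its states spelled out as sets of the original states, which is the "compaction of the description" point made above.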
Your following comments are not pertinent to my comments above.
You may feel that you are making up nomenclature to describe your
proposed algorithms. But, in fact, you are using nomenclature
with already assigned meaning in the field of computer language
processing.
Furthermore, you ``added insult to injury'' by deleting my
provision of the proper nomenclature from the field of computer
language processing that does pertain to your proposed changes
to TeX's handling of its input, to wit:
I believe that what you intended is the distinction of
context free and context sensitive languages. From
what I have read in the TeX book, the tokenizer of TeX
is context sensitive with a single character look-ahead
and the TeX language based on the recognized tokens is
context free.
It is a significant change in the behavior of the TeX
language to change it from being context free to being
context sensitive. But, it may be a necessary change.
Most modern computer languages are context sensitive
with a single token look ahead. A few look ahead two
tokens in some situations. I imagine that some look
ahead three tokens. Parser generators for single token
look ahead are readily available.
What you are proposing is a change from zero token look
ahead to one token look ahead.
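To make the phrase "one token look ahead" concrete, here is a minimal Python sketch of a parser that inspects the next token, without consuming it, to decide whether a command is followed by an optional argument. It is an illustration under stated assumptions only: the tokenizer, the function `parse_call`, and the `[...]`/`{...}` conventions are hypothetical simplifications and do not reproduce TeX's actual scanning rules.

```python
import re

# Hypothetical token classes: a control word, a flat brace group, or any
# other single non-space character. (Spaces between tokens are dropped.)
TOKEN = re.compile(r"\\[A-Za-z]+|\{[^{}]*\}|\S")


def tokenize(text):
    return TOKEN.findall(text)


def parse_call(tokens, pos, default="default"):
    """Parse <command> [optional] {mandatory} starting at tokens[pos]."""
    name = tokens[pos]
    pos += 1
    optional = default
    # One token of look-ahead: peek at the next token without consuming it.
    if pos < len(tokens) and tokens[pos] == "[":
        close = tokens.index("]", pos)
        optional = "".join(tokens[pos + 1:close])
        pos = close + 1
    # Assumption: the mandatory argument is a single flat brace group.
    mandatory = tokens[pos][1:-1]  # strip the outer braces
    return name, optional, mandatory, pos + 1


with_opt = parse_call(tokenize(r"\sec[short]{Long title}"), 0)
without_opt = parse_call(tokenize(r"\sec{Long title}"), 0)
```

The decision between the two argument forms is made from the single peeked token; nothing already consumed has to be re-read.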
Please, would you use the proper nomenclature?
The pairing from your improper nomenclature to what I believe
is the pertinent nomenclature for what you are attempting to
discuss is:
deterministic ==> context free
non-deterministic ==> context sensitive
Computer languages have both structure, i.e. syntax, and meaning,
i.e. semantics. Computer languages that are studied for their
syntactical properties might not have associated semantics. All
others do have semantics. Many computer languages are context
free in their syntax even though they have semantics and therefore
context among the language elements because of those semantics.
I believe that TeX (with the exception of the one character
look ahead in its tokenizer which is used to locate the
termination of tokens) is context free __in its syntax.__
This does not mean that TeX does not have semantics nor does
it mean that these semantic elements do not have context
among the various semantic elements.
I believe that Frank Mittelbach's point and position (not
``problem,'' as you say) is that changing TeX from a context
free to a context sensitive syntax (grammar, if you wish)
is too large a change to be considered.
| This reasoning would be true in any sufficiently general purpose, but
|TeX is not such a language (or it is unknown if it is).
My observations above pertain to all languages which have syntax.
TeX is a language which has syntax. Therefore, it is such a language.
| The second thing is that, even though something may be theoretically
|possible, it may be practically impossible, because you simply do not have
|time both to do that implementation and to pay your bills.
This is Frank Mittelbach's point as I understand it.
| The third thing one must consider, is that a computer language is not
|only used to manipulate logical data, but logical data that has a semantic
|interpretation attached to it. Any logical transformation must keep track of
|that semantic interpretation, and this is related to the practicality
|question, I guess.
At the syntax level of language processing, the semantics do
not pertain. At the semantics level of language processing,
semantics is the entire purpose of the processing. Any
compiler or interpreter that does any semantic processing
must perform the processing dictated by the semantics
associated with the syntactical elements.
| With TeX the problem is this:
| You have a variable #1 equal to some parameter text, say ##1##2.
|When #1 picks up an argument, in the first pass, an argument of the form
| {section}{theorem}
|will be transformed into
| sectiontheorem,
|so, when writing a deterministic parser by hand, one puts back the argument
|to the next command as {##1}{##2}, say, if you want to put it back at all. Now,
|working in this generality, there is no obvious way of transforming
| #1 --> #1_new
|by a command doing
| ##1##2 --> {##1}{##2}
|implicitly.
I have written several compilers and know how to process
context sensitive grammars.
| By reverting to non-deterministic parsing, one can get around this
|problem, by first picking up some text that surely contains the original
|##1##2, and then sending this original text to the next command, instead of
|the text partially parsed by #1 (which may be corrupted).
This passing along, unchanged, those syntactical elements which
have been determined by the grammar to belong to following
elements is part of the processing that occurs in a context
sensitive parser.
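The two strategies under discussion (re-bracing arguments after they have been absorbed, versus passing the original text along unchanged) can be modelled with a short Python sketch. It assumes only the well-known TeX behaviour that the outer braces of an undelimited argument are removed when the argument is absorbed; the helper names `split_groups` and `absorb` are hypothetical.

```python
def split_groups(text):
    """Return the top-level brace groups of text, braces included."""
    groups, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    return groups


def absorb(text):
    """Model argument absorption: the outer braces of each group are lost."""
    return [group[1:-1] for group in split_groups(text)]


original = "{section}{theorem}"

# What remains if the absorbed arguments are simply concatenated.
stripped = "".join(absorb(original))

# Strategy 1: re-brace by hand; requires knowing how many arguments there are.
rebraced = "".join("{" + arg + "}" for arg in absorb(original))

# Strategy 2: forward the untouched original text to the next command.
forwarded = original
```

Re-bracing works only when the argument pattern is known in advance; forwarding the original text avoids that requirement, at the cost of having to capture the text before any argument is absorbed.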
| But this does not solve Frank Mittelbach's problem, as he pointed out.
Unless you consider Frank Mittelbach's lack of interest in
redesigning or reimplementing or a lack of resources to redesign
and reimplement TeX's syntax processing as ``Frank Mittelbach's
problem,'' Frank Mittelbach does not have a problem here. I do
not have a problem here; computer languages are a major portion
of my education and work.
|>I believe that what you intended is the distinction of context free and
|>context sensitive languages. From what I have read in the TeX book, the
|>tokenizer of TeX is context sensitive with a single character look-ahead
|>and the TeX language based on the recognized tokens is context free.
| TeX is highly context sensitive, and this is much of the point with TeX:
|Each environment or grouping has its own set of local variables, which can
|be used to change the context rather radically. This is unrelated to the
|stuff I discussed above.
Please read my comments above. There is a major, significant
difference between syntax and semantics. I do not deny that
TeX is quite sensitive to the semantic context of the material
it processes. It would not be useful if it were not so. This
does not prevent TeX from having a context free grammar.
| Hans Aberg
Deciding whether TeX should have a context free or a context
sensitive grammar is an appropriate topic for this forum.
Since context sensitive grammars tend to be more complex and
to use more computer resources to process, I believe the
TeX developers will not change the grammar of TeX in such a
way as to make TeX's grammar context sensitive.
Randolph J. Herber, [log in to unmask], +1 630 840 2966,
CD/OSS/CDF CDF-PK-149O Mail Stop 234
Fermilab, Kirk & Pine Rds., P.O. Box 500, Batavia, IL 60190-0500.
(Speaking for myself and not for US, US DOE, FNAL nor URA.)
(Product, trade, or service marks herein belong to their respective owners.)
N 41 50 26.3 W 88 14 54.4 and altitude 700' approximately, WGS84 datum.