At 00.10 +0200 2001-06-11, I wrote:
>The main problem I see with context labels is that of when they should be
>attached, since one cannot do any context-dependent processing before the
>context is determined. I can think of at least three different models:
[snip]
>3. Have command-like markup for context-switching, but attach labels as
>part of the tokenization. This has the merit of looking like current LaTeX
>markup and allowing LaTeX to keep all ICR strings fully context-labeled,
>but it would also mean that processing of markup is a two-step process
>(first all language markup is processed, then all the rest). That doesn't
>feel right.
Here I was thinking of having the processing done by OCPs or something
similar. The problem with this is of course that these OCPs would have to
parse the input rather thoroughly to actually determine that a certain
command is markup for a context switch and not part of something else. In
particular, one would need a separate interpreting OCP for every set of
catcodes in use, since they would otherwise almost certainly get things
wrong. This is rather unrealistic, and having OCPs do the interpretation
would probably also, in effect, be an invitation to syntax inconsistencies.
There is however another way of doing it, by introducing a mechanism which
generalizes \outer (thus I'm still in the game of imagining extensions to
TeX). Let's say a macro is `exceptional' if it uses this mechanism. Like
outer macros, an exceptional macro causes TeX to stop if it occurs in a
place where TeX is "absorbing tokens at high speed" (TeXbook p. 206), but
unlike outer tokens it doesn't make TeX report an error. Instead TeX should
make a note of everything it is currently doing and push that onto some
stack, after which it starts executing the replacement text of the macro;
in particular, it must be possible to make assignments. What the macro is
expected to do is to grab its arguments (with whatever catcodes, input
OCPs, language context, etc. in force that are needed for this) and then
return (using some new primitive) the resulting token list to TeX, after
which TeX resumes whatever processing was interrupted by the exceptional
macro.
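For comparison, here is a minimal sketch of how plain \outer behaves
today (the names \marker and \swallow are made up for the illustration).
An exceptional macro would be detected at the same point as the outer
one, but instead of raising the error TeX would suspend the scan and run
the macro.

```latex
\documentclass{article}
% \marker and \swallow are hypothetical names; \outer is a TeX primitive.
\outer\def\marker{[marker]}
\newcommand{\swallow}[1]{(#1)}
\begin{document}
% Fine: at top level TeX is not absorbing tokens at high speed.
\marker

% Fine: an ordinary argument.
\swallow{text}

% Uncommenting the next line stops TeX with
% "Forbidden control sequence found while scanning use of \swallow",
% because the outer macro turns up while an argument is being absorbed.
% \swallow{\marker}
\end{document}
```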
With such a mechanism, one could in the infamous example
\newcommand{\foo}{\languageIC{mandarin}{\unichar{<Unicode code>}}}
have \languageIC being such an exceptional macro, and thus have the
\unichar{<Unicode code>} tagged as being mandarin *even in the replacement
text of \foo*! Another fun use of this mechanism would be to define a
\verb command that _can_ be used in the arguments (or even replacement
texts) of commands! I suspect such a feature could be a useful
argument in convincing users untroubled by multilinguality problems to
switch to a new typesetting engine.
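The difficulty that both \languageIC and \verb run into in current TeX
is that the tokens in an argument or replacement text have already been
made by the time the command gets to change anything. A small sketch of
this, using a catcode change as the stand-in for a context switch
(\frozen is a made-up name):

```latex
\documentclass{article}
% \frozen is a hypothetical name. The ~ in its replacement text was
% tokenized as an active character (catcode 13) when \newcommand read it.
\newcommand{\frozen}{\begingroup\catcode`\~=12 \texttt{[~]}\endgroup}
\begin{document}
% Typed directly: the \catcode change takes effect before the ~ is read
% from the file, so the ~ becomes an ordinary character.
\begingroup\catcode`\~=12 \texttt{[~]}\endgroup

% Via the macro: the same \catcode change comes too late; the stored ~
% is still active and acts as a nonbreaking space.
\frozen
\end{document}
```

With the default setup the first line should come out with a tilde
between the brackets and the second with only a space. An exceptional
macro would get the chance to set things up before its arguments are
tokenized, even when the call sits inside another macro's definition.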
The crux of the matter is of course how much TeX would have to be changed
to allow such a mechanism. Seeing the exceptional macros wouldn't be a
problem, as TeX is already looking for outer macros. The mechanisms for
expanding the next token are already fairly reentrant, so I wouldn't expect
many problems there either. What could be tricky is actually executing
commands (since TeX's main_control procedure is never called recursively),
but even that doesn't look like such a problem if we stay away from doing
typesetting; it seems after_token (related to \afterassignment) is the only
global variable that definitely must be saved away!
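To see why after_token would need saving, consider this sketch of what
happens when an unrelated assignment slips in between \afterassignment
and the assignment it was meant for, which is just what an exceptional
macro making assignments could cause. All the names (\democount,
\othercount, \report) are made up for the illustration.

```latex
\documentclass{article}
\newcount\democount   % hypothetical names throughout
\newcount\othercount
\newcommand{\report}{[democount is \the\democount]}
\begin{document}
% Intended use: \report is stored and inserted right after the assignment,
% so it sees the new value 42.
\afterassignment\report
\democount=42

% An assignment that sneaks in first consumes the stored token instead,
% so \report runs while \democount still holds its old value 0.
\democount=0
\afterassignment\report
\othercount=1
\democount=42
\end{document}
```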
Could perhaps someone with experience of implementing and/or extending TeX
please comment on these ideas?
Lars Hellström