LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Frank Mittelbach
Sat, 18 Jan 2003 19:19:02 +0100
Hello everybody

LICR = LaTeX Internal Character Representation

LICR objects are also known as "font encoding specific commands", ie stuff
that is declared by the font encoding files, eg

 \"  \ss \texteuro ...

since 1996 or so when inputenc was first introduced it was an open question
whether input chars mapped by inputenc should be "text", "math", "text+math"

in fact some commands like \copyright or \pounds have a 2.09 definition that
allows them to be used both in text and math, eg

\DeclareRobustCommand{\pounds}{%
   \ifmmode\mathsterling\else\textsterling\fi}

understandably some people wanted that for any LICR to work. however there are
serious limitations within TeX that do not really make this feasible.

nevertheless below is a .dtx file that tries an implementation as well as
explains the problems in this area.

i would be very much interested in comments concerning the approach, the
feasibility, or anything else you find useful to discuss in this respect, eg

 should such an implementation be incorporated (in some form, not necessarily this one),
    - as a package,
    - as a standard,
    - not at all ...

frank


ps since the project web area is still under reconstruction i post the package below

------------ inpmath.ins
\input docstrip

\preamble

This is a generated file.

Copyright 2003
Frank Mittelbach

It may be distributed and/or modified under the
conditions of the LaTeX Project Public License, either version 1.2
of this license or (at your option) any later version.
The latest version of this license is in
   http://www.latex-project.org/lppl.txt
and version 1.2 or later is part of all distributions of LaTeX
version 1999/12/01 or later.

\endpreamble

\keepsilent

\generate{\file{inpmath.sty}{\from{inpmath.dtx}{package}}
          \file{inpmath-test.tex}{\from{inpmath.dtx}{test}}
          }

\endbatchfile
------------ inpmath.dtx
%
% \iffalse
%<*driver>
\documentclass{ltxdoc}
\usepackage[T1]{fontenc}
\usepackage{inpmath}
\GetFileInfo{inpmath.sty}
\title{Providing math support for  \texttt{inputenc}}
\date{\fileversion\space\filedate{} printed \today}
 \author{Frank Mittelbach}
\begin{document}
 \maketitle
 \tableofcontents
 \DocInput{inpmath.dtx}
\end{document}
%</driver>
% \fi
%
%
% \section{Introduction}
%
% Ever since the introduction of \texttt{inputenc} there
%    has been the understandable wish to use any keyboard character
%    in text as well as in math.
%
% Unfortunately there is a limitation in \TeX{} which makes this
%    anything but bullet-proof. To find out whether you are in
%    math mode one can use |\ifmmode...\fi| and structure the encoding
%    specific commands so that they check whether or not they are
%    being used in math. However, this fails the moment such an
%    encoding specific command is used at the start of a cell in a
%    math alignment. Why? Because in such a place \TeX{} is scanning
%    ahead to look for |\omit| or |\noalign| and during this process
%    it is not in math mode yet. However, it will expand tokens until
%    it is sure not to find an |\omit| or |\noalign|; thus when seeing
%    the |\ifmmode| it will happily select the wrong branch.
%    Afterwards it will surround everything with |$..$| and the
%    result will be an error message or worse.
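%
%    A minimal sketch of the failure, using a hypothetical expandable
%    command |\foo| (an assumption for illustration, not part of this
%    package) whose two branches are chosen only to make the selected
%    branch visible:
%    \begin{verbatim}
%   \documentclass{article}
%   % \foo is a hypothetical stand-in for an expandable LICR object
%   \newcommand*\foo{\ifmmode\alpha\else\textbf{a}\fi}
%   \begin{document}
%   $\begin{array}{c}
%     \foo \\      % expanded while TeX looks for \omit: text branch
%     \relax\foo   % \relax ends the look-ahead first: math branch
%   \end{array}$
%   \end{document}
%    \end{verbatim}
%    In the first row |\foo| is expanded before the math shift of the
%    cell template has been inserted, so the text branch is taken
%    although its result ends up inside math.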
%
%    The \TeX{}book even mentions this danger and suggests using
%    |\relax\ifmmode| instead. WRONG! because the |\relax| will kill
%    any kerning or ligature that might exist between a previous
%    character and the character to typeset. Thus without an extension
%    to \TeX{} the program one can choose between three (bad)
%    solutions:
%    \begin{itemize}
%    \item Disallow the use in math (basically what the current
%    solution does)
%    \item Use |\relax\ifmmode| and lose the kerning in
%    \begin{quote}
%       \sffamily T\"ur (kerned) \\
%       T\relax\"ur (unkerned) \\
%       V\"A (kerned) \\
%       V\relax\"A  (unkerned) \\
%    \end{quote}
%    and similar places (can you spot\footnote{Well you can't with
%    ``T\"ur'' as in the current font it doesn't have any kerning:-)}
%    the difference?).
%    \item
%    Keep the definition expandable and fail if at the start of
%    \texttt{eqnarray} and the like.
%    \end{itemize}
%
%    The code below sets out to implement the third variant. If
%    somebody has any bright ideas how to detect (in an expandable
%    way) that one is at the start of an array row---shout!
%
%
%
% \subsection{The \LaTeXe{} model for encoding specific  text commands}
% \label{sec:currimp}
%
%    An encoding specific command |\foo| expands to three control sequences:
%    \begin{quote}
%       \ttfamily
%       \fbox{\bslash\emph{<enc>}-cmd} \fbox{\string\foo}
%       \fbox{\bslash\emph{<enc>}\string\foo}
%    \end{quote}
%    e.g.,
%\begin{quote}
%    |\"|  expands to  |\T1-cmd \"\T1\"|
%\end{quote}
%   Instead of |T1| a different encoding might appear, e.g., |OT1|,
%    |TS1|, etc. Which encoding is used depends on the order of
%    loading the definitions. It is not important except for speed.
%
%    The command \fbox{\ttfamily\bslash\emph{<enc>}-cmd} is either
%    equal to |\@current@cmd| (if the current encoding is |<enc>|) or
%    to |\@changed@cmd| if not.
%
%    |\@current@cmd| checks if typesetting should be done and if so
%    executes the third token, otherwise reinserts the second token
%    unexpanded and throws away the third.
%
%    |\@changed@cmd| also checks for typesetting action, but if
%    typesetting is called for it tries to find the right definition
%    for |\foo| in the current encoding (which should be saved in the
%    token \fbox{\ttfamily\bslash\emph{<curr-enc>}-cmd}). If that
%    doesn't exist it looks for a default definition stored in the
%    command \fbox{\ttfamily\bslash?-cmd}. If that also doesn't exist
%    it will generate a |\TextSymbolUnavailable| error.
%
%
%    The above roughly describes the mechanisms behind the font
%    encoding specific commands (ignoring a number of finer points
%    like the need to keep most of the stuff fully expandable etc.).
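%
%    The first expansion level can be inspected directly. The
%    following sketch assumes a \LaTeX{} kernel that implements |\"|
%    as described above; other kernel versions or engines may define
%    it differently, so treat the log output as something to check
%    rather than rely on:
%    \begin{verbatim}
%   \documentclass{article}
%   \usepackage[T1]{fontenc}
%   \begin{document}
%   % write the one-level expansion of \" to the terminal and log
%   \typeout{\meaning\"}
%   % write the meaning of the dispatcher token \T1-cmd
%   \typeout{\expandafter\meaning\csname T1-cmd\endcsname}
%   \end{document}
%    \end{verbatim}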
%
% \subsection{Extending the model for math}
%
%    Now to support math typesetting we introduce the following ideas:
%    \begin{itemize}
%    \item The glyph to typeset in math is independent of the current
%    text encoding, i.e., there exists at most one definition for the
%    LICR object |\foo| for use in math (in other words a sort of
%    ``math'' encoding).
%    \item
%      That definition is stored in the token \fbox{\ttfamily\bslash
%    Math\string\foo}.
%    \item
%      There is no direct support for composite objects in the LICR,
%      e.g., |\"| could be mapped to |\ddot| in math but for the LICR
%      object |\"a| no declaration possibilities exist.\footnote{Of
%    course, instead of \texttt{\bslash ddot} one could have a command
%    with an argument that picks up the ``a'' and thereby moves the
%    mapping to a sort of second level.} This is in line with the
%    standard usage in math, but is mainly dictated by the chosen
%    implementation.
%    \item
%      We assign math meanings for objects of the LICR via
%    \begin{quote}
%     |\DeclareMathMeaning{\"}{\ddot}|
%    \end{quote}
%    where the name is more than open for suggestions :-)
%    \end{itemize}
%
%
%
%
% \section{The code}
%
%    \begin{macrocode}
%<*package>
\ProvidesPackage{inpmath}[2003/01/18 v0.0a test for math LICR objects]
%    \end{macrocode}
%
% \begin{macro}{\@current@cmd}
%    Redefinition of |\@current@cmd| means we have to check if we are
%    in math mode in case typesetting is called
%    for. |\@current@cmd@math| gets the encoding specific command as
%    an argument and we do a bit of |\expandafter| to get the
%    conditionals out of the way in case we need to pick up arguments.
%    \begin{macrocode}
\def\@current@cmd#1{%
   \ifx\protect\@typeset@protect
      \expandafter
       \@current@cmd@math
      \expandafter #1%
   \else
      \noexpand#1\expandafter\@gobble
   \fi}
%    \end{macrocode}
% \end{macro}
%
%
% \begin{macro}{\@current@cmd@math}
%    If in math mode we look if \fbox{\ttfamily\bslash
%    Math\string\foo} is defined, if not we raise an error.
%    Otherwise\ldots
%    \begin{macrocode}
\def\@current@cmd@math#1{%
  \ifmmode
    \expandafter
    \ifx\csname Math\string#1\endcsname\relax
% no math defined
       \@inmatherr #1%
    \else
%    \end{macrocode}
%    \ldots we execute this definition, but first we need to get the
%    two dangling |\fi|'s and the third token
%    \fbox{\bslash\emph{<enc>}\string\foo} from the original encoding
%    specific command expansion out of the way; this is done by
%    |\@current@cmd@math@clear@input|.
%    \begin{macrocode}
      \csname Math\string#1\expandafter\endcsname
      \@current@cmd@math@clear@input
    \fi
  \fi
}
%    \end{macrocode}
% \end{macro}
%
%
% \begin{macro}{\@current@cmd@math@clear@input}
%    This can probably be implemented better\ldots
%    \begin{macrocode}
\def\@current@cmd@math@clear@input#1#2#3{\fi\fi}
%    \end{macrocode}
% \end{macro}
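%
%    To see why three tokens are removed, here is a rough trace of
%    |$\"a$|, assuming \texttt{T1} is the current encoding and
%    |\ddot| has been declared as the math meaning of |\"|. Each line
%    shows the input after one or more expansion steps; |\Math\"|
%    stands for the single token built by |\csname|, and spaces are
%    added for readability only:
%    \begin{verbatim}
%   \"a
%   \T1-cmd \" \T1\" a
%   \@current@cmd@math \" \T1\" a
%   \csname Math\string\"\expandafter\endcsname
%      \@current@cmd@math@clear@input \fi \fi \T1\" a
%   \Math\" \fi \fi a
%   \ddot \fi \fi a
%    \end{verbatim}
%    Note that the two |\fi| tokens are still in front of the ``a''
%    when the math meaning is expanded. The trace therefore assumes a
%    math meaning that does not absorb them as a macro argument.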
%
%
% \begin{macro}{\@changed@cmd}
%    The original |\@changed@cmd| is quite complex as it has to do all
%    the work explained in section~\ref{sec:currimp}. We split it into
%    two now:
%    \begin{macrocode}
\def\@changed@cmd#1#2{%
   \ifx\protect\@typeset@protect
% do typesetting
%    \end{macrocode}
%    This is inefficient as we test for math mode twice, so a real
%    implementation should do better.
%    \begin{macrocode}
     \ifmmode
       \expandafter
       \@current@cmd@math
     \else
%    \end{macrocode}
%    If not in math we call |\x@changed@cmd|, the main code of the
%    original |\@changed@cmd|.
%    \begin{macrocode}
       \expandafter
        \x@changed@cmd
     \fi
      #1#2%
   \else
      \noexpand#1%
   \fi}
%    \end{macrocode}
% \end{macro}
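%
%    A rough trace of both branches, assuming for illustration that
%    the top-level expansion of |\"| names \texttt{OT1} while
%    \texttt{T1} is the current encoding, and that |\ddot| is the
%    declared math meaning (spaces added for readability only):
%    \begin{verbatim}
%   \"a
%   \OT1-cmd \" \OT1\" a
%   \@changed@cmd \" \OT1\" a
%
%   in math:
%   \@current@cmd@math \" \OT1\" \else \noexpand\" \fi a
%   \Math\" \fi \fi \else \noexpand\" \fi a
%   \ddot a
%
%   in text:
%   \x@changed@cmd \" \OT1\" \else \noexpand\" \fi a
%   \T1\" a
%    \end{verbatim}
%    In both branches the trailing |\else ... \fi| of the outer
%    protection test is skipped once the remaining conditionals have
%    been expanded.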
%
%
% \begin{macro}{\x@changed@cmd}
%    Nice piece, isn't it? Explanation in the \LaTeX{} sources.
%    \begin{macrocode}
\def\x@changed@cmd#1#2{%
      \expandafter\ifx\csname\cf@encoding\string#1\endcsname\relax
         \expandafter\ifx\csname ?\string#1\endcsname\relax
            \expandafter\def\csname ?\string#1\endcsname{%
               \TextSymbolUnavailable#1%
            }%
         \fi
         \global\expandafter\let
               \csname\cf@encoding \string#1\expandafter\endcsname
               \csname ?\string#1\endcsname
      \fi
      \csname\cf@encoding\string#1%
         \expandafter\endcsname
}
%    \end{macrocode}
% \end{macro}
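%
%    A small sketch that exercises the default lookup in text mode.
%    The command |\textfoo| and its definition are hypothetical and
%    exist only for this illustration:
%    \begin{verbatim}
%   \documentclass{article}
%   \usepackage[T1]{fontenc}
%   % hypothetical command that only has a default definition
%   \DeclareTextCommandDefault{\textfoo}{[foo]}
%   \begin{document}
%   % no \T1\textfoo exists, so the default \?\textfoo is found,
%   % copied globally to \T1\textfoo, and then executed
%   \textfoo
%   \end{document}
%    \end{verbatim}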
%
%
% \begin{macro}{\DeclareMathMeaning}
%    And here is a trivial declaration command for testing; the name
%    and the real operation (with checks etc.) should be changed.
%    \begin{macrocode}
\def\DeclareMathMeaning#1#2{\global\@namedef{Math\string#1}{#2}}
%</package>
%    \end{macrocode}
% \end{macro}
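%
%    As a sketch of what the declaration stores: with the definition
%    above,
%    \begin{verbatim}
%   \DeclareMathMeaning{\"}{\ddot}
%    \end{verbatim}
%    amounts to
%    \begin{verbatim}
%   \expandafter\gdef\csname Math\string\"\endcsname{\ddot}
%    \end{verbatim}
%    so a later declaration for the same command silently replaces an
%    earlier one, since no check is made.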
%
%
%    \begin{macrocode}
%<*test>
\documentclass{article}

\usepackage{inpmath}

\usepackage[TS1,T1]{fontenc}
\usepackage[latin1]{inputenc}
\usepackage{trace}


\DeclareMathMeaning{\"}{\ddot}
\DeclareMathMeaning{\textsurd}{\surd}

\begin{document}

\typeout{\textsurd}

%\traceon

ä \textsurd

$ä \textsurd$
\stop
%</test>
%    \end{macrocode}
%
% \Finale
%
\endinput
