LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Bruno Le Floch <[log in to unmask]>
Sat, 12 Feb 2011 21:01:26 -0500
Hello, and sorry for the long title (useful perhaps for searching
purposes later on).

There was recently a question on tex.stackexchange about writing a purely
expandable version of LaTeX2e's \MakeUppercase. Joseph Wright and I
posted two answers with different interpretations of uppercasing, and he asked
me to transfer the discussion to this list. For the code, see
         http://tex.stackexchange.com/questions/10805/
and in particular our two answers.

His method yields
"\Uppercase{Som{e } {te{x}t} with $math$.}" -> "SOMe  te{x}t WITH $math$."
Mine yields:
"\Uppercase{Som{e } {te{x}t} with $math$.}" -> "SOM{E } {TE{X}T} WITH $MATH$."

Two questions:
- what precise behaviour do we want an uppercase function to have? Note that
we could even provide hooks to let the user choose. (See near the bottom of
this long email.)
- what do you think of the advantages/drawbacks described below?


== Joseph's way: (correct me if I didn't understand your code properly)
- Time: ~50*NL,
where L~26 is the number of letters and various accent tokens (\ae,\oe,etc),
and N is the length of the string to be uppercased.
- Number of expansions: O(NL)?
- Braces disappear, and protect their argument against uppercasing.
- Spaces are dropped at the start and end, kept in the middle.
- The stuff between dollars is kept.
- It expands its argument?
- It does not pollute the macro namespace.

It relies on comparing the current token with a, then b, etc., until
z, for each token, and replacing it by the uppercase letter. If the
token is not found, we keep it. The function that does the replacement
looks like

\prg_case_str:nnn {#1} {
      { a } { A }
      { b } { B }
      ...
    } {#1}

So it has L lines, and is difficult to patch: a user who wants to add
a custom accent with a given uppercase behaviour has to redefine the
whole function. That said, I don't yet understand Joseph's code well
enough to be sure of this.
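
A minimal sketch of the comparison-list idea, not Joseph's actual code. It
assumes a current expl3, where (as far as I know) this kind of string switch
is available as \str_case:nnF, and it uses a shortened three-letter table.
The \demo_... and \DemoUpper names are made up for the illustration. Note that
\tl_map_function:nN skips spaces and strips braces, so the space and brace
handling listed above is not reproduced here.

```latex
\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Compare one token against each table entry in turn;
% if no entry matches, the token is kept unchanged.
\cs_new:Npn \demo_upper_token:n #1
  {
    \str_case:nnF {#1}
      {
        { a } { A }
        { b } { B }
        { c } { C }
        % ... one line per letter or accent command
      }
      {#1}
  }
% Apply the lookup to each item of the argument, by expansion only.
\cs_new:Npn \demo_upper:n #1
  { \tl_map_function:nN {#1} \demo_upper_token:n }
\cs_new_eq:NN \DemoUpper \demo_upper:n
\ExplSyntaxOff
\begin{document}
\DemoUpper{abcd}
\end{document}
```

Because the table lives inside the body of \demo_upper_token:n, adding an
entry means redefining that function, which is the patching difficulty
mentioned above.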



== My way:
- Time: ~100*N^2.
- Number of expansions: 2. (thanks to an \ifcsname hack)
- All spaces and braces are kept, but braces don't protect against
uppercasing (can be changed).
- Dollars could be taken care of.
- It does not expand the argument at all.
- It pollutes the macro namespace: uses L~26 macros.

It relies on having one macro for each token that should be transformed
by the case change. Namely, for uppercase, we would have defined the
following case table:
...
\tl_new:cn{UL_table_u_m}{M} ...
\tl_new:cn{UL_table_u_\string\ae}{\AE}
...
Then we read the tokens one by one. Say we see "\oe". If \UL_table_u_\oe
is defined, then we use it. Otherwise, we leave \oe unchanged.
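
A minimal sketch of this table lookup, assuming a current expl3. It is not the
code from my answer: the demo_table_u_ prefix and the \demo_... names are made
up for the illustration, the table has only three entries, and the mapping
drops spaces and strips braces instead of keeping them. The existence test in
\cs_if_exist_use:cF is done with \ifcsname internally.

```latex
\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% One macro per token that changes; the macro name embeds
% the string form of the token.
\cs_new:cpn { demo_table_u_a } { A }
\cs_new:cpn { demo_table_u_m } { M }
\cs_new:cpn { demo_table_u_ \token_to_str:N \ae } { \AE }
% Use the table entry if it exists, otherwise keep the token.
% Assumes #1 is a single token.
\cs_new:Npn \demo_lookup:n #1
  { \cs_if_exist_use:cF { demo_table_u_ \token_to_str:N #1 } {#1} }
\cs_new:Npn \demo_upper:n #1
  { \tl_map_function:nN {#1} \demo_lookup:n }
\cs_new_eq:NN \DemoUpper \demo_upper:n
\ExplSyntaxOff
\begin{document}
\DemoUpper{ma\ae x}
\end{document}
```

Adding a custom entry only takes one more \cs_new:cpn line with the same
prefix; the lookup function itself stays untouched.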


== Hooks
It should not be too hard to give hooks to the user so that he can
- decide the behaviour of braces
- define some commands that "do things" (e.g. protect their argument against
uppercasing)
- others?

== Final comments on namespace pollution

I don't know whether time is an issue here, or whether having more macros
introduces an unacceptable overhead. Several times in the past, when trying
to convert one list of tokens into another, I found that putting each token
in a \csname construction and defining one macro per token made things much
easier. Possible issue: after `\let\?=?` and `\escapechar=-1\relax`, one cannot
distinguish between `\?` and `?`.
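
The issue can be checked with a short test using only TeX primitives plus
LaTeX's \typeout. The \demoA and \demoB names are made up for the
illustration.

```latex
\documentclass{article}
\begin{document}
\begingroup
  \let\?=?             % \? is now an implicit character token equal to ?
  \escapechar=-1\relax % \string no longer prints a leading backslash
  \edef\demoA{\string\?}
  \edef\demoB{\string ?}
  % Both macros now hold the single character ?, so a \csname name
  % built from either token would be identical.
  \ifx\demoA\demoB \typeout{string forms: same}\else \typeout{string forms: different}\fi
  % \ifx also treats the implicit character \? and the character ? as equal.
  \ifx\??\typeout{ifx: equal}\else \typeout{ifx: different}\fi
\endgroup
\end{document}
```

Both tests take the true branch, so neither the \string form used for the
\csname name nor an \ifx comparison tells the two tokens apart.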

This idea of defining macros rather than comparing with a list of tokens makes
the second method easily customizable: the user can define arbitrary
"case-change" tables by setting the relevant macros \UC_table_mytable_<token>.
That would lead to a "static" variant of \prg_case_str:nnn.

Best regards,
Bruno
@Joseph: were you thinking of the expansion control part?
