Bruno Le Floch
Tue, 15 Feb 2011 03:22:06 -0500
> My personal opinion on uppercasing/lowercasing is that it should be a
> property of the font;

Both Will and Frank agree on this, but currently that's not possible in
many fonts. Also, it is in fact possible to write an algorithm that
replaces some tokens in a given token list (tl) by a corresponding macro.
Yes, a macro: it can even take arguments.

For instance, with my current code (using a specific "case table"),

    \expandsome{A\foo BC{\expandthis\foo{\B\expandthis\foo{A}} \D\E} !}

will expand in two steps to

    A\foo BC{arg={\B arg=A.}. \D\E} !

Also, we now have \expandafter:nw, which expands the token after its
argument before carrying on with the argument. It works by
"\expandafter-casing" the first argument, namely, replacing every
token <token> by "\expandafter<token>" (including braces and spaces).
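
To make "\expandafter-casing" concrete, here is a hand-written sketch in plain TeX of what the result looks like for the argument "\grab{x}" followed by the token \rest. It is not the actual \expandafter:nw code; \grab, \rest and \result are names made up for this illustration.

```tex
% Hypothetical helpers, for illustration only.
\def\grab#1#2{\def\result{#1:#2}}
\def\rest{{y}}

% Without the trick, \grab would take the unexpanded \rest as #2.
% With every token of "\grab{x}" prefixed by \expandafter (braces
% included), \rest is expanded once before \grab runs:
\expandafter\grab\expandafter{\expandafter x\expandafter}\rest

\message{\meaning\result}% shows: macro:->x:y
\bye
```

The chain of \expandafter tokens walks through \grab, the opening brace, x and the closing brace, expands \rest to {y}, and only then lets \grab read its two arguments.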

> In terms of the algorithms to perform these operations, I prefer the way
> Joseph's code executes (e.g., keeping the number of csnames low) but I
> prefer the extensibility of Bruno's (although I suspect Bruno's is faster --
> but a better question to ask is whether Joseph's is too slow).

After some work, I realized that there are two separate questions:
(1) whether to use many macros, or to look at a bunch of cases for each character;
(2) whether to be careful with braces and spaces or not.

The second point allows us to do what I mentioned above. The first point
is not necessary for this extensibility; it only matters for speed.
We are talking about defining (26 + #accents) macros
for uppercase, and the same number for lowercase (although I guess
that with UTF8, this can become much bigger).
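
A minimal sketch of the "many macros" approach in plain TeX, assuming e-TeX's \ifcsname is available (as it is in current pdftex). This is not the code discussed in this thread: all macro names are invented for the example, and it deliberately ignores point (2), so spaces are dropped and braced groups are not handled.

```tex
% One macro per character: \csname upper@a\endcsname expands to A, etc.
\def\defupper#1#2{\expandafter\def\csname upper@#1\endcsname{#2}}
\defupper aA \defupper bB \defupper cC

% Map one character, leaving it alone if the table has no entry.
% \ifcsname (e-TeX) does not create a csname when the test fails.
\def\upperchar#1{%
  \ifcsname upper@#1\endcsname
    \csname upper@#1\endcsname
  \else
    #1%
  \fi}

% Expandable loop over single tokens, stopped by the marker \qstop.
\def\qstop{\qstop}
\def\upperloop#1{%
  \ifx\qstop#1\else
    \upperchar{#1}\expandafter\upperloop
  \fi}

\edef\result{\upperloop abcd!\qstop}
\message{\meaning\result}% shows: macro:->ABCd!
\bye
```

Only a, b and c have table entries here, so d and ! pass through unchanged. A full table would need one such definition per letter and per accented character, in each direction.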

For a typical input (sentences, braced stuff) of 5000 tokens, run with
\tracingall, wc on the log files (lines, words, bytes) gives:

 2128102  7187359 67511843 Joseph-ULcase.log
  230901  1161005  8589159 ULcase.log

where ULcase.log is from my current version, with brace and space checking,
and Joseph-ULcase.log is from Joseph's code, with no brace checking. So my
trace is roughly 9 times shorter in lines and 8 times smaller in bytes.
My version could be optimized significantly (2-3x) by using the fact that
the replacement that we want for each token takes no argument, but as I
said, I want to stay general, because it becomes much more powerful.

> something like "\prg_case_str:nVn {#1} \g_uc_replacements_tl { <else> }".

I think that it would work. And in fact Joseph's way combined with
some ideas I have had will allow us to have a

\tl_expand_some:nn {abca} { {a} {A} {b} {\use_ii_i:nn} }   => AAc
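
To see why the result is AAc: after the replacements, the token list reads A, \use_ii_i:nn, c, A, and the macro then swaps the two tokens that follow it. A small plain TeX sketch of just that last step, where \swap is a stand-in name (assuming \use_ii_i:nn means "second argument, then first"):

```tex
\def\swap#1#2{#2#1}% stand-in for \use_ii_i:nn
\edef\result{A\swap cA}% "abca" after a -> A and b -> \swap
\message{\meaning\result}% shows: macro:->AAc
\bye
```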

And in fact, we _should_ be able to replace #text at definition time
as well, allowing 9 _named_ arguments (I'm not taking this very
seriously ;-) ). Namely, replace #first by #1 and #second by #2 in the

\keyworddef\foo#first#second{arg1 is #first, arg2 is #second}
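
For comparison, this is the ordinary definition that the \keyworddef line above would have to turn into, once #first and #second have been replaced (a sketch of the intended end result only; \keyworddef itself is not implemented here):

```tex
\def\foo#1#2{arg1 is #1, arg2 is #2}
\message{\foo{x}{y}}% shows: arg1 is x, arg2 is y
\bye
```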

I don't know where I should put the code, so it is at
Note that it is really just a plain TeX file with no \bye, compilable
with pdftex, pdflatex, etc.