LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Subject:
From:
Joseph Wright <[log in to unmask]>
Reply To:
Mailing list for the LaTeX3 project <[log in to unmask]>
Date:
Sun, 8 Feb 2015 12:36:05 +0000
Hello all,

A few months ago now we added various expandable case changing functions
to expl3 with clearly 'experimental' status. I've recently had some
useful feedback on aspects of the behaviour and have revised some of the
code. I've now got some more questions, so thought it would be useful to
raise those here. (Note: I've updated the SVN code but this has yet to
go to CTAN. I can arrange a release if people want to test but would
rather not grab the code from GitHub.)

*Background*

The current implementation has six functions

  \tl_upper_case:n
  \tl_lower_case:n
  \tl_mixed_case:n
  \tl_upper_case:nn
  \tl_lower_case:nn
  \tl_mixed_case:nn

where the two-argument versions deal with language-specific case
changing. The functions are x-type expandable. 'Letters' can be case
changed from the full Unicode range when using XeTeX/LuaTeX and the
mappings do not have to be 1-1 (cf. \uppercase/\lowercase).
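
As a rough illustration of the difference between the one- and
two-argument forms, the sketch below upper-cases the same word with and
without a language tag. It assumes that Turkish ("tr") is among the
languages implemented and that XeTeX or LuaTeX is in use; in Turkish the
upper case of "i" is the dotted "İ", so the two results would be
expected to differ.

    \ExplSyntaxOn
    % Default (language-neutral) mapping
    \tl_set:Nx \l_tmpa_tl { \tl_upper_case:n { ilk } }
    % Language-specific mapping; the "tr" tag is an assumption here
    \tl_set:Nx \l_tmpb_tl { \tl_upper_case:nn { tr } { ilk } }
    \tl_show:N \l_tmpa_tl
    \tl_show:N \l_tmpb_tl
    \ExplSyntaxOff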

There is also \str_fold_case:n which does folding for programmatic
applications. That function has a different set of use cases and is not
considered further here.

*Escaping from case changing*

The current implementation follows a BibTeX-like convention for
preventing case changing: braced content is not changed. In the original
approach there was no mechanism to do case changing inside the argument
to a command as a result. I have now altered this to include a list of
commands where case changing should be applied, so for example it would
be possible to arrange that

    \tl_upper_case:n { Hello~\emph{world} }

will case change the argument to \emph. At present, this functionality
is designed to work with commands taking one argument (i.e. a second or
subsequent argument will be unaffected).
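
A minimal sketch of the braced-content convention itself, assuming the
experimental functions are available: the braced "pH" should be left
alone while the rest of the text is upper-cased.

    \ExplSyntaxOn
    \tl_set:Nx \l_tmpa_tl
      { \tl_upper_case:n { the~{pH}~of~water } }
    % Inspect the result: the braced group is expected to be unchanged
    \tl_show:N \l_tmpa_tl
    \ExplSyntaxOff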

The alternative to such an approach is to case change everything and
provide an escape mechanism (cf. the textcase package and
\NoChangeCase). As a user, I can see advantages to both approaches.
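
For comparison, the textcase approach looks like this in a LaTeX2e
document: everything is case changed unless explicitly wrapped in
\NoChangeCase.

    \documentclass{article}
    \usepackage{textcase}
    \begin{document}
    % "Hello" is upper-cased; the \NoChangeCase argument is protected
    \MakeTextUppercase{Hello \NoChangeCase{world}}
    \end{document}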

One thing that is not currently covered is dealing automatically with
math mode content. That is doable but would require some consistent
interface. In particular, while dealing with "$ ... $" and "\( ... \)"
is straightforward (single-token delimiters), it would be more
challenging to cover "\begin{math} ... \end{math}" or similar. Some of
this has a relationship to expandability: see the next area.

*Expandability*

The current implementation is expandable as this allows the 'natural' usage

    \tl_set:Nx \l_tmpa_tl
      { \tl_upper_case:n { foo } }
    \tl_show:N \l_tmpa_tl % => "FOO"

Expandability imposes some restrictions on the code and does have a
performance knock-on. The need to deal with changes that are not 1-1 or
have other context-dependence means that the performance aspect is not
so important: a full solution using \uppercase/\lowercase would still
require a mapping or similar to deal with all of the possibilities.

One area that is more tricky in this regard is input which is not fully
expanded. For example

    \def\myname{Joseph Wright}
    \MakeUppercase{Written by \myname}

will yield "WRITTEN BY JOSEPH WRIGHT" as there is an \edef inside the
LaTeX2e command before case changing. In contrast, the expl3 functions
currently do no expansion so

    \tl_upper_case:n { Written~by~\myname }

gives "WRITTEN BY Joseph Wright". Notably, if used in setting a token
list the content would be "WRITTEN BY \myname", i.e. further expansion
is inhibited.
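
One possible way to get a \MakeUppercase-like result with the current
functions is to expand the text in a separate first step and case change
the result afterwards. This is only a sketch: \l_my_name_tl is a
hypothetical variable used for illustration, and plain x-type expansion
is only safe here because the input contains no fragile or robust
LaTeX2e commands.

    \ExplSyntaxOn
    \cs_generate_variant:Nn \tl_upper_case:n { V }
    \tl_new:N \l_my_name_tl  % hypothetical storage for the name
    \tl_set:Nn \l_my_name_tl { Joseph~Wright }
    % Step 1: expand the stored name into the surrounding text
    \tl_set:Nx \l_tmpa_tl { Written~by~\l_my_name_tl }
    % Step 2: case change the already-expanded text
    \tl_set:Nx \l_tmpb_tl { \tl_upper_case:V \l_tmpa_tl }
    \tl_show:N \l_tmpb_tl
    \ExplSyntaxOff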

It is not clear to me what the 'expected' outcome might be. It would be
possible to use f-type expansion to deal with stored tokens before case
changing, but for input such as

    \tl_upper_case:n { Written~by \\ Joseph~Wright }

that could break outcomes with LaTeX2e: \\ would be 'lost' and this
could be problematic if the text were used later in, for example, a
center environment. A non-expandable implementation could use the same
logic as \MakeUppercase but at the cost that case changing for storage
would then need dedicated functions, for example

    \tl_set_upper_case:Nn
    \tl_set_lower_case:Nnn

This loses the 'natural' approach to case changing inside a tl setting
and requires separate 'set a tl with case changing' and 'typeset case
changed text' functions.

*LICR/Non-native input*

The original implementation for the expl3 functions only case changes
letters. Adding an 'escape' to cover e.g. \emph also allows coverage of
things like "\'{e}" and so it was natural to consider LICR input. I have
therefore extended the code to allow coverage of everything handled by
\MakeUppercase when T1/T2A/T2B/T2C/T4/T5/LGR encodings are in use. There
is of course a performance hit, but this should be comparable to that
for processing letters.

That then leaves the question of input outside of the ASCII range when
using pdfTeX. It would I think be possible to do this using an approach
detecting inputenc active chars, but I am reluctant to go this way (in
the longer term it will be increasingly hard to justify using an 8-bit
program as the world standardises on Unicode). With inputenc loaded, case
changing does work if the input goes via LICR:

    \documentclass{article}
    \usepackage[utf8]{inputenc}
    \usepackage{expl3}
    \makeatletter
    \ExplSyntaxOn
    \cs_generate_variant:Nn \tl_upper_case:n { V }
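    \cs_new_protected:Npn \MakeExplUpperCase #1
      {
        \group_begin:
          % Expand the input first so that inputenc's active chars are
          % converted to their LICR form (for example \'e for é)
          \protected@edef \l_tmpa_tl {#1}
          % Case change the expanded text and typeset the result
          \tl_upper_case:V \l_tmpa_tl
        \group_end:
      }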
    \ExplSyntaxOff
    \makeatother
    \begin{document}
    \MakeExplUpperCase{Héllo}
    \end{document}

Again, this has a link to expandability.

*Naming*

As noted in previous mails on this topic, the naming here (\tl_...) at
least in part reflects the fact this code is difficult to name. Any better
naming schemes welcome!

*Conclusions*

The current code works but there are open questions. What I am hoping
for is feedback on the ideas and in particular what issues come up with
real use cases. Ideas about all or any of the above, or indeed other
aspects, most welcome.
--
Joseph Wright
