LATEX-L Archives

Mailing list for the LaTeX3 project


From: Joseph Wright
Reply-To: Mailing list for the LaTeX3 project
Date: Mon, 30 Jun 2014 17:59:07 +0100

On 30/06/2014 14:05, Joel C. Salomon wrote:
> There’s an important use-case that seems not to have been addressed,
> but perhaps this is better handled in a different layer:
> mixed-language strings.
> For example, consider a document with the title, “The Interesting Life
> of Ragıp Hulûsi Özdem” (to choose the first Turkish name I could find
> with both dotted i and dotless ı). Somehow, within the \title{}
> declaration, the change of language must be indicated so that (e.g.)
> at the top of the page this will be transformed to “THE INTERESTING
> LIFE OF RAGIP HULÛSİ ÖZDEM”.
>
> A similar situation arises in German, where within geographical names
> ‘ß’ should capitalize to the recently-defined ‘ẞ’, not ‘SS’.
> (According to <ẞ>, this rule was
> adopted in 2010.)
> As I said, this is probably best handled in a separate layer:  Code
> that capitalizes user-provided text would need to defer to the LaTeX3
> equivalent of Babel, which would scan the text for user-level
> language-change commands, and (among other things) call
> \tl_upper_case:nn with the appropriate language argument.  But I think
> it’s important that the interface to the casing functions being
> defined now be aware of the way they will likely be used.

As you say, this looks much more like a 'high level' requirement: it's
tricky to see how nesting can work without the code being tied to one
particular design. For example, we can mark up a language in the input
easily enough:

  The Interesting Life of \SomeLangCommand{tr}{Ragıp Hulûsi Özdem}

but the problem is then making sure that \SomeLangCommand does the case
changing itself at the point of use. One might imagine the case-changing
command giving it a flexible local definition:

   \cs_new_protected:Npn \TitleCase #1
     {
       \group_begin:
         \cs_set_eq:NN \SomeLangCommand \text_title_case:nn
         % ... other similar stuff
         \text_title_case:Vn \l_language_current_tl {#1}
       \group_end:
     }

That might still leave a question about x- versus f-type expansion: if
the outcome is meant to be 'just text' then you need to expand
\SomeLangCommand, which the current case-changing code deliberately
avoids. Of course, that issue might be side-stepped by doing a pre-parse,
as you suggest:

   \cs_new_protected:Npn \TitleCase #1
     {
       \group_begin:
         \cs_set_eq:NN \SomeLangCommand \text_title_case_and_brace:nn
         % ... other similar stuff
         \tl_set:Nx \l_some_tmpa_tl {#1}
         % Now "The Interesting Life of {RAGIP HULÛSİ ÖZDEM}"
         \tl_set:Nx \l_some_tmpa_tl
           { \text_title_case:VV \l_language_current_tl \l_some_tmpa_tl }
         \tl_use:N \l_some_tmpa_tl % typeset the result
       \group_end:
     }
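
For comparison only: the effect such a pre-parse is after can be
sketched by hand with the l3text functions that expl3 gained well after
this message (an assumption: a kernel that includes l3text, compiled
with LuaLaTeX so the Turkish letters need no input set-up). Each
language segment is case-changed with its own BCP-47 tag and the
results are joined; the 'en' tag for the main text is also an
assumption, and a real implementation would find the segments by
parsing \SomeLangCommand rather than splitting them up manually:

   % Compile with LuaLaTeX; assumes a kernel that includes l3text.
   \documentclass{article}
   \ExplSyntaxOn
   % Language-neutral mapping: i -> I, so "Hulûsi" loses its dot
   \iow_term:x { \text_uppercase:n { Ragıp~Hulûsi~Özdem } }
   % Each segment case-changed with its own language tag, then joined
   \tl_set:Nx \l_tmpa_tl
     {
       \text_uppercase:nn { en } { The~Interesting~Life~of~ }
       \text_uppercase:nn { tr } { Ragıp~Hulûsi~Özdem }
     }
   \iow_term:x { \l_tmpa_tl }
   \ExplSyntaxOff
   \begin{document}
   \end{document}

The terminal should show "RAGIP HULÛSI ÖZDEM" for the first line and
"THE INTERESTING LIFE OF RAGIP HULÛSİ ÖZDEM" for the second, which is
the title Joel asked for.
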

On the German question, I'd already wondered how best to add 'capital
Eszett' support for the simple case. I'm not quite sure how one is
meant to request it (not de-DE or whatever, but de-<something>!). I'm
also not sure whether people who do use it will use it for all Eszetts
(if not, upper casing might lead to some odd decisions). As you say,
this could of course occur in input where such a decision applies only
to some words.
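
One way to spell 'de-<something>' is a BCP-47 private-use subtag. Later
expl3 releases appear to use exactly this, with the tag de-x-eszett;
treat that tag name as an assumption to check against the l3text
documentation. A minimal sketch, again assuming LuaLaTeX:

   % Compile with LuaLaTeX; "de-x-eszett" is assumed to be the variant
   % tag recognised by recent l3text releases.
   \documentclass{article}
   \ExplSyntaxOn
   % Standard (Unicode SpecialCasing) mapping: ß -> SS
   \iow_term:x { \text_uppercase:nn { de } { Straße } }
   % Variant mapping: ß -> ẞ (capital Eszett)
   \iow_term:x { \text_uppercase:nn { de-x-eszett } { Straße } }
   \ExplSyntaxOff
   \begin{document}
   \end{document}

The first line should give STRASSE and the second STRAẞE. Because the
tag is per call, input where only some words (such as geographical
names) want ẞ can be handled by tagging just those segments.
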

What is worth noting is that while the commands I've added take general
text as input, we see them as building blocks for, e.g., a
hypothetical \text_title_case:nn. That and related operations need to
worry about things like 'words', and in that context splitting up the
input is needed anyway. (Will has an expandable approach to do that. We
might imagine adding to that some form of 'recursion' for nested
language markup.)

BTW, nice Turkish name: that one is going into the test suite for this area!
Joseph Wright