LATEX-L Archives

Mailing list for the LaTeX3 project


Sender: Mailing list for the LaTeX3 project <[log in to unmask]>
Date: Tue, 14 Mar 2023 23:42:04 +0100
From: Arthur Rosendahl <[log in to unmask]>
On Tue, Mar 14, 2023 at 06:40:06PM +0100, Javier Bezos wrote:
> Although intentional, this is not quite correct if we follow the
> Unicode CLDR, which considers the ‘default’ Norwegian is ‘no’
> (technically it’s a macrolanguage, and ‘nb’=“Norwegian Bokmål” is
> empty and just inherits from it). This is the correspondence made
> by babel, which follows Unicode.

  Except that that explanation is a bit turned on its head :-)  First,
the Common Locale Data Repository is not a part of the Unicode Standard
(which encodes scripts, not languages), it’s simply hosted on the
Unicode servers.  Second, the fact that [nb] inherits from [no] without
changes in the CLDR means precisely the opposite of the interpretation
you make of it: it means that for the CLDR, Norwegian is [nb], that’s
the relevant piece of information.  (That Norwegian is also [no] follows
from ISO 639-1, which defines these codes.)  Finally, when naming
hyphenation pattern files we don’t need to care about what happens when
a user requests the code [no] or [nb] or whatever else, we just need to
find the best possible way to tag the particular sets of patterns that
we actually have.  In fact, we would not have any file called
hyph-no.tex, were it not for:

>                                  However, internally, hyphenation
> rules are set to the name ‘norwegian’ as set in language.dat,
> which loads ‘nb’, which, in turn, loads ‘no’, so in the end it
> works as it should 🙂.

  Not quite.  The pattern sets for [nb] and [nn] are actually nearly
identical, so hyph-no.tex contains their intersection; hyph-nb.tex and
hyph-nn.tex both load it, and each adds a very small number of patterns.
Hence the code [no] is only used as a convenience and does not mean
“these are the patterns you get when you request [no]”.  That
information is beyond the scope of hyph-utf8; however there is, as you
note, a very similar mapping in language.dat (which in TeX Live is
entirely generated by information contained in the hyph-* files), since
hyph-nb.tex defines the name “norwegian” as a synonym; and language.dat
is indexed by language names, not codes.
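
  To make the layout concrete, here is a toy model in Python.  It is only
a sketch: the pattern strings are invented and the helper name load is
hypothetical; nothing here is taken from the real hyph-no.tex, hyph-nb.tex
or hyph-nn.tex.  It shows the shared file as a set intersection, with each
language file loading it and adding its own few patterns.

```python
# Toy model of the layout described above. The pattern strings are made up
# and are NOT taken from the real hyph-no.tex, hyph-nb.tex or hyph-nn.tex.

# Hypothetical complete pattern sets for the two written standards.
full_nb = {"1ba", "1be", "2b1d", "a1b"}
full_nn = {"1ba", "1be", "2b1d", "e1g"}

# The shared file holds the intersection of the two sets.
common_no = full_nb & full_nn

# Each language file only needs to add what the shared file lacks.
extra_nb = full_nb - common_no
extra_nn = full_nn - common_no


def load(common, extra):
    """Model a language file that loads the shared set, then its additions."""
    return common | extra


patterns_nb = load(common_no, extra_nb)
patterns_nn = load(common_no, extra_nn)
```

In this model common_no is a building block rather than a complete pattern
set for either language, which is the sense in which [no] is only a
convenience.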

  Anyway, the more general point is that the comments in the hyph-*
files have been very carefully tailored to generate exactly the contents
of language.dat (and other infrastructure files) that we need; so that
if one file seems to lack information, the right thing to do is to
ignore it.  It is quite nice that Thierry came to the same conclusion in
kerTeX :-)
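
  As a rough illustration of that generation step, here is a minimal
Python sketch.  Everything in it is an assumption made for the example:
the record fields, the language names "bokmal" and "nynorsk", and the
loader file names are hypothetical and do not reproduce the actual header
format of the hyph-* files or the real generator.  The only thing relied
upon is the language.dat syntax itself, where a line "name file" declares a
language and a following "=synonym" line declares a synonym for it.

```python
# Hypothetical metadata records. The field names are invented for this
# sketch and do not reproduce the real header format of the hyph-* files.
records = [
    {"file": "hyph-no.tex", "name": None, "synonyms": [], "loader": None},
    {"file": "hyph-nb.tex", "name": "bokmal", "synonyms": ["norwegian"],
     "loader": "loadhyph-nb.tex"},
    {"file": "hyph-nn.tex", "name": "nynorsk", "synonyms": [],
     "loader": "loadhyph-nn.tex"},
]


def language_dat_lines(records):
    """Return language.dat-style lines, ignoring records without a name."""
    lines = []
    for rec in records:
        if rec["name"] is None:
            # A file that carries no language information is simply skipped.
            continue
        lines.append(f"{rec['name']} {rec['loader']}")
        lines.extend(f"={synonym}" for synonym in rec["synonyms"])
    return lines


for line in language_dat_lines(records):
    print(line)
```

The record for hyph-no.tex carries no name, so it produces no entry, which
mirrors the point that a file lacking information is to be ignored.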