LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

From: Frank Mittelbach
Sender: Mailing list for the LaTeX3 project
Date: Sun, 5 Mar 2006 21:30:58 +0100

Heiko,

 > I don't, but this is used in next.def, where 0xFE and 0xFF aren't
 > part of the NextStep encoding:
 >   \DeclareInputText{254}{}
 >   \DeclareInputText{255}{}
 > Thus actually an empty "LICR" is used here.

As I said, that was a mistake by the person who contributed that encoding, and
harmless anyway. It has been removed by now.

 > > >Thus the entry for U+02C6 in utf8enc.dfu is not really correct:
 > > >  \DeclareUnicodeCharacter{02C6}{\textasciicircum}
 > > >  U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT
 > > >"\^" would be more correct, except that grabbing the
 > > >argument isn't too trivial in case of utf-8 characters
 > > >consisting of several bytes.
 > > 
 > > Aren't you thinking of the COMBINING circumflex accent here?
 > 
 > Yes.
 > 
 > > MODIFIER characters are more phonetic alphabet thingies.
 > 
 > Thanks.

But the combining characters don't work in TeX either: Unicode defines them as
following the base character, and TeX requires the accent to precede the base
character. In short, you can't turn Unicode text with combining characters into
TeX/LaTeX code without a preprocessing step, as you can't make base characters
act on the input that follows them.
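
To illustrate the ordering problem, here is a minimal sketch, assuming T1 fonts and the utf8 option of inputenc (neither is prescribed by this thread):

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\begin{document}
% TeX order: the accent command comes first and takes the base as argument.
\^{e}

% Precomposed U+00EA is a single character, so inputenc can map it
% to the same accent command followed by the base letter.
ê

% "e" followed by U+0302 COMBINING CIRCUMFLEX ACCENT is the other way
% round: by the time the accent arrives, the "e" has already been
% processed, which is why a preprocessing step would be needed.
\end{document}
```

The combining sequence is deliberately left out of the document body, since it is the case that does not work.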

That is not to say that the line

 > > >  \DeclareUnicodeCharacter{02C6}{\textasciicircum}

is right. It is probably wrong; most likely it should be

  \DeclareUnicodeCharacter{005E}{\textasciicircum}

and several other entries have similar defects. It would be good if that got checked.

But if you do that, remember the direction of the check.

One has to start from what the font encoding provides, e.g. \textasciicircum in
this case. Then one has to find the Unicode code point for that, and that is what
should show up in the dfu file for the font encoding.
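
A sketch of that direction of check, assuming T1 as the font encoding. This is a fragment for illustration, not a file to load: the slot numbers are quoted from the T1 (Cork) layout, and the 005E line reflects the suggestion made above, not necessarily what the distributed dfu file contains.

```latex
% Step 1: start from what the font encoding provides.
\DeclareTextSymbol{\textasciicircum}{T1}{94}   % slot "5E
\DeclareTextSymbol{\textsterling}{T1}{191}     % slot "BF

% Step 2: find the Unicode code point of each of those characters and
% list exactly that in the dfu file for the encoding.
\DeclareUnicodeCharacter{005E}{\textasciicircum} % U+005E CIRCUMFLEX ACCENT
\DeclareUnicodeCharacter{00A3}{\textsterling}    % U+00A3 POUND SIGN
```

Going the other way, from a Unicode name that merely sounds similar to a LICR command, is how an entry such as the one for 02C6 can end up in the file.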

 > > >What is the LICR of "fi"?
 > > >  U+FB01 LATIN SMALL LIGATURE FI
 > > >The ligature mechanism depends on the used fonts, "fi" is not
 > > >always available. What is better?
 > > >  \DeclareUnicodeCharacter{FB01}{\textfi}
 > > >  \ProvideTextCommandDefault{\textfi}{fi}
 > > >vs.
 > > >  \DeclareUnicodeCharacter{FB01}{fi}
 > > 
 > > Definitely the latter. As I understand it, these ligatures are in 
 > > unicode mostly for compatibility with legacy encodings (and perhaps for 
 > > font designers who need to assign something to these glyphs). At least 
 > > as far as TeX is concerned, "fi" doesn't carry any semantic information 
 > > different from "f" "i".
 > 
 > Example: Assuming there is a word "deaffish" and the
 > author does not want a ligature ffi spanning both word parts.
 > Therefore, having a good editor, he uses the Unicode sequence
 > U+0066 U+FB01 to specify the correct and desired ligature.
 > Using the latter case of \DeclareUnicodeCharacter{FB01}
 > TeX would get "ffi" and then form the wrong ligature.

Wrong example, in my opinion. As Lars said, the fi and ffi ligatures ended up in
Unicode as legacy codes because they were in legacy 8-bit encodings. A million
other ligatures are not available as "chars", because Unicode, like most other
standards, is heavily influenced by what is right for certain countries but
not others. Using "fi" in this way is like using tables in HTML to position
elements on the page, i.e. it works for that example but ...

So the right thing is not to use the fi character at all here, but to have a
generic method to denote subword boundaries (or whatever) that allows the
formatter not to use the ligature. TeX's method would be \textcompwordmark ...
but Unicode never considered encoding such logical information to be the task
of the standard.
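
For the "deaffish" example quoted above, TeX's method might look like the following minimal sketch. T1 fonts are assumed here, since that encoding provides a compound word mark glyph.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
% Unmarked: the font's ligature table is free to form "ffi"
% across the boundary between the two word parts.
deaffish

% Boundary marked: the first "f" stands alone, and the "fi" ligature
% can still form inside "fish".
deaf\textcompwordmark fish
\end{document}
```

The source then records where the boundary is, and which ligatures result is left to the formatter and the font.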

Some of Chris' and my musings on this subject can be found in the paper we gave
at the Unicode conference in 1996:

http://www.latex-project.org/papers/unicode5.pdf

frank
