## LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


Subject: Re: latex/3480: Support for UTF-8 missing in inputenc.sty
From: Dominique Unruh
Date: Thu, 5 Dec 2002 23:57:12 +0100

Short info on what this discussion is about: We were discussing the possibility of adding UTF-8 inputenc support to LaTeX. The existing package ucs.sty is deemed too big and resource-consuming for inclusion in the kernel. This discussion has now moved to LATEX-L.

Frank wrote:
> it seems important to me to follow up the question Chris has posted
> about what are input and what are output (font) encodings.

Yes, I do understand this difference. But when adding UTF-8 support, it is probably unwise to load all supported UTF-8 sequences. Therefore I proposed adding to each font encoding the information about which Unicode range is to be loaded for it. To clarify this, here is an example. If we have code like the following:

    \usepackage[utf8]{inputenc}
    \usepackage[T2A]{fontenc}

the file t2aenc.def could contain a line like:

    \FontencUnicodeRange{"400-"4FF}

and, at \AtBeginDocument, UTF-8 sequences would be loaded only for the ranges given by the font encodings, relieving the user of the need to decide which sequences to load. If UTF-8 is not needed, the \FontencUnicodeRange declarations are ignored.

Of course, the mappings from font encoding to Unicode range could also live in a separate file, removing the need to change the existing font encodings.

> commands, eg instead of
> \DeclareInputText{164}{\textcurrency}
> we probably need something like
> [...]
> \DeclareUTFeightInputText{}{\textcurrency}

Code for this can be extracted from utf8.def as shipped with ucs.sty. Interested people could have a look at the following macros in this file (unfortunately mostly undocumented as yet):

\utf@viii@map{number} constructs the macro for a UTF-8 sequence, named in the form \u8-n-BCD, where n is the first character of the sequence (as a decimal number) and BCD are the one, two or three further characters (as characters).
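As an illustration of the arithmetic involved, the following minimal sketch splits a Unicode number from the two-byte range ("80 to "7FF, which includes the Cyrillic block "400-"4FF) into its two UTF-8 bytes and joins them again. The names \SplitUTFviiiTwo, \JoinUTFviiiTwo, \UTFlead, \UTFtrail and \UTFnumber are hypothetical and do not come from utf8.def; the real macros may well be organised differently.

```latex
\documentclass{article}
% Hypothetical helpers for illustration only; none of these names
% are taken from utf8.def or ucs.sty.
\newcount\UTFlead    % first byte of a two-byte sequence
\newcount\UTFtrail   % second (continuation) byte
\newcount\UTFnumber  % Unicode number

% Split a Unicode number in the range "80.."7FF into two UTF-8 bytes.
\newcommand*\SplitUTFviiiTwo[1]{%
  \UTFlead=#1\relax
  \divide\UTFlead by 64\relax       % upper five bits of the number
  \UTFtrail=\UTFlead
  \multiply\UTFtrail by -64\relax
  \advance\UTFtrail by #1\relax     % lower six bits: n - 64*(n div 64)
  \advance\UTFlead by 192\relax     % add the 110xxxxx marker ("C0)
  \advance\UTFtrail by 128\relax}   % add the 10xxxxxx marker ("80)

% Join two byte values (given as numbers) into the Unicode number.
\newcommand*\JoinUTFviiiTwo[2]{%
  \UTFnumber=#1\relax
  \advance\UTFnumber by -192\relax  % strip the 110xxxxx marker
  \multiply\UTFnumber by 64\relax
  \advance\UTFnumber by #2\relax
  \advance\UTFnumber by -128\relax} % strip the 10xxxxxx marker

\begin{document}
\SplitUTFviiiTwo{"410}%
U+0410 is encoded as the bytes \the\UTFlead\ and \the\UTFtrail. % 208 and 144

\JoinUTFviiiTwo{208}{144}%
The bytes 208 and 144 decode to \the\UTFnumber. % 1040 = "410
\end{document}
```

For U+0410 the lead byte is 208, so under the naming scheme described above the corresponding macro name would begin with u8-208-.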
The content of the macro that \utf@viii@map defines is currently just the number, but the macros can easily be changed to define it as anything given (e.g. \textcurrency).

\utf@viii@undef{number}{char}{char}{char} calculates the Unicode number for a UTF-8 sequence (given again as number, char, char, char, with \@nil in place of the chars for shorter sequences).

A UTF-8 sequence starter would then have to be defined approximately as follows (here the example for the sequence starter "E3 = 227):

    % assumes the character "E3 is active, e.g. after \catcode"E3=\active;
    % TeX's ^^ notation requires lowercase hex digits
    \def^^e3#1#2{\expandafter\ifx\csname u8-227-#1#2\endcsname\relax
      \utf@viii@undef{227}#1#2\@nil\else
      \csname u8-227-#1#2\endcsname\fi}

\utf@viii@make does the job of defining such macros (with some additional code).

Chris wrote:
> I tried to understand Dominique's approach and to compare it with
> David's but both, as on CTAN, consist of undocumented code ... so
> I gave up. Have you looked at David's code?

My code is documented (though only partly). The comments can be found in utf8.dtx, or in the files in the CVS archive (see http://www.unruh.de/DniQ/latex/unicode/). I don't know David's code; could you give me a CTAN location?

DniQ.