## LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


 Sender: Mailing list for the LaTeX3 project
 From: Heiko Oberdiek
 Date: Thu, 13 Oct 2011 12:54:04 +0200

```
On Thu, Oct 13, 2011 at 05:56:14AM -0400, Bruno Le Floch wrote:

> > I wouldn't do it manually. There are mappings files for Unicode:
> >   http://unicode.org/Public/MAPPINGS/
> > In project I am using these mappings together with a perl script
> > to generate the .def files.
>
> Thank you for the link. It seems that the simplest would be to
> directly use the tables provided there as the .def files. Simply
> \catcode`\#=14, and set a few other default catcodes, then input the
> file, looping over the lines. Are all of the lines of the form
>
> 0xHH 0xHHHH # comment

Not all, dec-mcs.txt is different:
  sprintf('=%02X U+%04X %s\n', , , )
  no comments

> (or comment lines), with H = some hexadecimal digit? In other words,
> are all those encodings 8-bit only, and with only Unicode points
> <65536?

In the directory MAPPINGS there are encodings with > 8-bit.
And a quick look doesn't reveal Unicode points > U+FFFF.

> > It could be made even expandable in linear time
> > with a large lookup table (256).
>
> Right. I was thinking in terms of UTF-8 for some reason, and the
> lookup table would be too big.

In practice the table would be larger than 256 (16x16) to support
lowercase and uppercase digits ([0-9a-fA-F]). The size would be
484 = (10 + 2 x 6) x (10 + 2 x 6).

Yours sincerely
  Heiko Oberdiek
```
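
The two technical points in the message can be checked with a short sketch: reading one line of the `0xHH 0xHHHH # comment` form, and counting the entries of a lookup table that accepts hex digits in either case. This is only an illustration in Python, not the TeX or Perl code discussed in the thread. The names `parse_mapping_line` and `build_hex_pair_table` are hypothetical, and the line format is assumed to be exactly the one quoted above (so it would not handle the different layout of dec-mcs.txt).

```python
import re
from itertools import product

# Assumed line shape, taken from the thread: "0xHH <ws> 0xHHHH <ws> # comment".
# One or more hex digits are allowed so that encodings wider than 8 bits also parse.
MAPPING_LINE = re.compile(r"^0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)")


def parse_mapping_line(line):
    """Return (encoded_value, code_point), or None if the line has no mapping.

    Comment lines, blank lines, and lines that lack a second 0x... field
    all yield None.
    """
    match = MAPPING_LINE.match(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


# 10 decimal digits plus 6 lowercase and 6 uppercase letters: 22 characters.
HEX_DIGITS = "0123456789abcdefABCDEF"


def build_hex_pair_table():
    """Map every two-character hex string, in any case mix, to its value."""
    return {
        high + low: int(high + low, 16)
        for high, low in product(HEX_DIGITS, repeat=2)
    }
```

With these definitions, `len(build_hex_pair_table())` comes out as 22 x 22 = 484, matching the size given in the message, and keys such as `"aF"` and `"AF"` both map to the same value.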