LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Re: Using Lua to Preprocess Unicode Data

Joseph Wright <[log in to unmask]>

Mon, 16 Mar 2020 17:32:29 +0000

On 16/03/2020 17:01, Kelly Smith wrote:
> Hello!
> 
> I’ve been thinking: since Lua is already involved in the build process,
> by way of l3build, wouldn’t it be reasonable to use a lua script
> to preprocess Unicode data into forms that are easily consumed by LaTeX
> during the format-building process?
> 
> Warmly,
> Kelly
> 

It depends on the outcome you are after.

The original loading method for Unicode data in XeTeX was via a Perl 
script. That created a .tex file containing (for example) catcode data. 
To update the Unicode data, one had to run the Perl script, then send 
the processed files to CTAN. There were two issues. First, any change
required active work not only to get the data from Unicode but also to
manipulate it. Second, and more significant, it was
*slower* than just reading the files in TeX. (This only became apparent 
when I wrote some test parsers.)
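
For anyone unfamiliar with what such a pre-processing step looks like, here is a minimal sketch. It is not the original Perl script, and it is written in Python purely for illustration (the same shape would apply in Lua). It assumes input in the semicolon-separated UnicodeData.txt layout (code point, name, general category, ...), treats every general category starting with "L" as a letter, and skips the "First>"/"Last>" range markers entirely. The function name catcode_lines and the choice of catcode 11 are assumptions made for the example.

```python
def catcode_lines(unicode_data_lines):
    """Yield one TeX \\catcode assignment per letter in UnicodeData.txt-style input.

    Hypothetical helper: a real script would also have to expand the
    '<..., First>' / '<..., Last>' range pairs, which this sketch skips.
    """
    for raw in unicode_data_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(";")
        if len(fields) < 3:
            continue
        codepoint, name, category = fields[0], fields[1], fields[2]
        if name.endswith(", First>") or name.endswith(", Last>"):
            continue  # range markers are not handled in this sketch
        if category.startswith("L"):
            # Catcode 11 is TeX's "letter" category.
            yield '\\catcode"{}=11\\relax'.format(codepoint.upper())


SAMPLE = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;",
    "00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;"
    "LATIN SMALL LETTER E ACUTE;;00C9;;00C9",
]

for tex_line in catcode_lines(SAMPLE):
    print(tex_line)
# prints:
# \catcode"0041=11\relax
# \catcode"00E9=11\relax
```

The output would be written to a .tex file and \input during format-building, which is the extra step (and the extra file to maintain) that reading the raw .txt data directly in TeX avoids.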

Now, there is more data being loaded today than when I did that work, 
and some of it is in LuaTeX so could be done Lua-only. It's also 
possible that the Perl script was sub-optimal, or that as part of a 
general 'install' function the time would not really show. However, 
XeTeX needs the data, so one is still looking at having to explicitly 
pre-process in Lua. Moreover, most of the time taken for format-building 
is not about reading Unicode data. With LuaTeX, pre-loading expl3 does 
cut out a slight 'stall' when loading everything for case-changing, but 
having a LuaTeX and a XeTeX path separately is not attractive.

The current set-up means that updating the Unicode files is just a
question of copy-pasting the raw .txt files into a form that CTAN can
accept. Pre-digesting would still leave us needing some way to
co-ordinate between packages (format, luaotfload, expl3, specialist
stuff), as well as having to do the explicit extraction.

As format-building is all about saving time for 'normal' runs, I'm not 
seeing there is a massive need to speed up the process. I know there is 
one engine in development that doesn't use format files, so that might 
be a place to consider things, but I think we'd need a strong case to 
alter the approach for XeTeX/LuaTeX (pdfTeX, ...).

Joseph