LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Subject:	Re: Using Lua to Preprocess Unicode Data
From:	Joseph Wright
Reply To:	Mailing list for the LaTeX3 project
Date:	Mon, 16 Mar 2020 17:32:29 +0000

On 16/03/2020 17:01, Kelly Smith wrote:
> Hello!
>
> I’ve been thinking: since Lua is already involved in the build process,
> by way of l3build, wouldn’t it be reasonable to use a lua script
> to preprocess Unicode data into forms that are easily consumed by LaTeX
> during the format-building process?
>
> Warmly,
> Kelly

It depends on the outcome you are after.

The original loading method for Unicode data in XeTeX was via a Perl
script. That created a .tex file containing (for example) catcode data.
To update the Unicode data, one had to run the Perl script, then send
the processed files to CTAN. There were two issues. First, any change
required active work, not only to get the data from Unicode but also to
manipulate it. Second, and more significant, the Perl route was
*slower* than just reading the raw files in TeX. (This only became
apparent when I wrote some test parsers.)
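
To make the pre-processing idea concrete, here is a rough sketch of that
kind of step, written in Python rather than the original Perl (which is
not reproduced here). The function name, the sample lines and the rule
"general category starting with L gets catcode 11" are assumptions for
illustration only; the real script and the current TeX loader apply more
rules than this.

```python
def catcode_lines(unicode_data_lines):
    """Yield a TeX catcode assignment for each letter in UnicodeData.txt-style input."""
    for raw in unicode_data_lines:
        line = raw.strip()
        if not line:
            continue
        fields = line.split(";")
        codepoint, name, category = fields[0], fields[1], fields[2]
        # Range markers such as "<CJK Ideograph, First>" are skipped in this
        # sketch; a real converter would have to expand them.
        if name.startswith("<") and name.endswith(("First>", "Last>")):
            continue
        # Assumed rule: every general category beginning with "L" is a letter.
        if category.startswith("L"):
            # The trailing space terminates the number when TeX reads it.
            yield f'\\catcode"{codepoint}=11 '


# Three lines in the UnicodeData.txt format (semicolon-separated fields).
sample = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;",
    "00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;"
    "LATIN SMALL LETTER E ACUTE;;00C9;;00C9",
]

catcode_result = list(catcode_lines(sample))
for tex_line in catcode_result:
    print(tex_line)
```

The output is the sort of .tex file content described above: one
assignment per letter, with the digit skipped.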

Now, there is more data being loaded today than when I did that work,
and some of it is loaded in LuaTeX, so could be done Lua-only. It's also
possible that the Perl script was sub-optimal, or that as part of a
general 'install' function the time would not really show. However,
XeTeX needs the data too, so one is still looking at having to
pre-process explicitly in Lua. Moreover, most of the time taken for
format-building is not spent reading Unicode data. With LuaTeX,
pre-loading expl3 does cut out a slight 'stall' when loading everything
for case-changing, but having separate LuaTeX and XeTeX paths is not
attractive.

The current set-up means that updating the Unicode files is just a
question of copying the raw .txt files into a form that CTAN can
accept. Pre-digesting still leaves us needing some way to co-ordinate
between packages (format, luaotfload, expl3, specialist stuff), plus
having to do the explicit extraction.
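
For a sense of what "explicit extraction" would involve, here is a
minimal Python sketch that pulls the simple case mappings out of
UnicodeData.txt-style lines and writes them as \lccode/\uccode
assignments. The function name and sample data are hypothetical, and
this is not how expl3 or the format actually stores its case-changing
data; it only shows the shape of the extra step each consumer would
have to agree on.

```python
def case_code_lines(unicode_data_lines):
    """Yield TeX lccode/uccode assignments from UnicodeData.txt-style input."""
    for raw in unicode_data_lines:
        fields = raw.strip().split(";")
        if len(fields) < 15:
            continue  # not a complete UnicodeData.txt record
        # Field 12 is the simple uppercase mapping, field 13 the simple
        # lowercase mapping (both empty when there is none).
        codepoint, upper, lower = fields[0], fields[12], fields[13]
        if lower:
            yield f'\\lccode"{codepoint}="{lower} '
        if upper:
            yield f'\\uccode"{codepoint}="{upper} '


# Two lines in the UnicodeData.txt format (semicolon-separated fields).
case_sample = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;"
    "LATIN SMALL LETTER E ACUTE;;00C9;;00C9",
]

case_result = list(case_code_lines(case_sample))
for tex_line in case_result:
    print(tex_line)
```

Every package that wanted a different slice of the data would need its
own variant of this, which is the co-ordination problem.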

As format-building is all about saving time for 'normal' runs, I don't
see a massive need to speed up the process. I know there is one engine
in development that doesn't use format files, so that might be a place
to consider things, but I think we'd need a strong case to alter the
approach for XeTeX/LuaTeX (pdfTeX, ...).

Joseph
