LATEX-L Archives

Mailing list for the LaTeX3 project


From: Joseph Wright <[log in to unmask]>
Date: Tue, 17 Mar 2020 08:30:54 +0000
On 16/03/2020 22:54, Kelly Smith wrote:
>> As format-building is all about saving time for 'normal' runs, I'm not
>> seeing there is a massive need to speed up the process. I know there is
>> one engine in development that doesn't use format files, so that might
>> be a place to consider things, but I think we'd need a strong case to
>> alter the approach for XeTeX/LuaTeX (pdfTeX, ...).
>> Joseph
> Sorry, I should’ve clarified: the point of preprocessing the data wouldn’t
> be to speed anything up; instead, it would be to do complex processing
> that would be very difficult or even impossible in LaTeX.
> For example, if the l3regex module were extended so that precompiled
> regexes could be used as parts of other regexes, then Unicode properties
> could be simply implemented by referring to precompiled regexes whose
> content was created by running filters over the Unicode character database.
> Another example would be processing the very complex XML files that are
> used in supplementary Unicode files, like the Common Locale Data Repository,
> which could help with localization and language-specific date and number
> parsing/formatting.
> This idea of preprocessing could be applied to any complex data set that
> LaTeX3 may need to work with, but I used the example of Unicode data
> because that’s the one that immediately came to mind.
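The first idea above — building character data for a Unicode property by running a filter over the Unicode character database — can be sketched roughly like so. (Python rather than Lua, purely for illustration; the inline sample records and the helper names are assumptions, and the real UnicodeData.txt has many more fields and tens of thousands of lines.)

```python
# Sketch: derive a regex-style character class for a Unicode property
# by filtering records from UnicodeData.txt. Each record is a line of
# semicolon-separated fields: codepoint;name;general_category;...
# A few sample lines stand in for the real database here.
SAMPLE_UCD = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0042;LATIN CAPITAL LETTER B;Lu;0;L;;;;;N;;;;0062;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;0041;;0041
0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
"""

def codepoints_with_category(ucd_text, category):
    """Yield codepoints whose general category matches, e.g. 'Lu'."""
    for line in ucd_text.splitlines():
        fields = line.split(";")
        if fields[2] == category:
            yield int(fields[0], 16)

def character_class(codepoints):
    """Render codepoints as a character class like [\\u0041\\u0042]."""
    return "[" + "".join("\\u%04X" % cp for cp in sorted(codepoints)) + "]"

# All uppercase letters (category Lu) in the sample data:
print(character_class(codepoints_with_category(SAMPLE_UCD, "Lu")))
# [\u0041\u0042]
```

The output of such a filter would be the "precompiled" data that an extended l3regex could then refer to by name, rather than LaTeX parsing the database at run time.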

When there is a need to do complex pre-processing, Lua is the obvious 
way to go nowadays, and that's reflected in a number of scripts. The 
issue tends to be not the scripted data extraction itself, but rather 
the target at the TeX end.

For example, on the regex idea, I suspect performance would be the major 
concern: regex processing is already a lot of work, and I have a feeling 
Bruno would want to optimise how the data were stored inside TeX. (I'm 
not sure where the balance between data extraction and storage lies 
here.) Similarly, a lot of work is done by Javier for babel using the 
CLDR, but many of the outcomes are not amenable to scripting: it's 
about how you set up the TeX 'just right'.
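The extraction half of that CLDR work is straightforward scripting, which may be why it is not the bottleneck. A minimal sketch, again in Python for illustration only: the XML fragment below imitates (in heavily simplified form) the month-name structure of a CLDR locale file, and the function name is my own invention.

```python
# Sketch: pull locale data out of CLDR-style XML using only the
# standard library. Real CLDR files nest this inside calendar and
# width elements and are far larger; this fragment is simplified.
import xml.etree.ElementTree as ET

CLDR_SNIPPET = """\
<ldml>
  <dates>
    <months>
      <month type="1">janvier</month>
      <month type="2">février</month>
      <month type="3">mars</month>
    </months>
  </dates>
</ldml>
"""

def month_names(xml_text):
    """Map month number -> localised name from a CLDR-like fragment."""
    root = ET.fromstring(xml_text)
    return {int(m.get("type")): m.text
            for m in root.iterfind("./dates/months/month")}

print(month_names(CLDR_SNIPPET)[3])
# mars
```

The hard part, as noted, is on the other side: deciding what TeX-level form (token lists, property lists, …) the extracted names and patterns should take so that babel or expl3 can use them efficiently.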

So whilst there is no reason not to use Lua when it works, at the moment 
there are no pressing areas where the pre-processing is the barrier.