LATEX-L Archives

Mailing list for the LaTeX3 project


From: Joseph Wright <[log in to unmask]>
Date: Tue, 17 Mar 2020 08:30:54 +0000
On 16/03/2020 22:54, Kelly Smith wrote:
>> As format-building is all about saving time for 'normal' runs, I'm not
>> seeing there is a massive need to speed up the process. I know there is
>> one engine in development that doesn't use format files, so that might
>> be a place to consider things, but I think we'd need a strong case to
>> alter the approach for XeTeX/LuaTeX (pdfTeX, ...).
>> Joseph
> Sorry, I should’ve clarified: the point of preprocessing the data wouldn’t
> be to speed anything up; instead, it would be to do complex processing
> that would be very difficult or even impossible in LaTeX.
> For example, if the l3regex module were extended so that precompiled
> regexes could be used as parts of other regexes, then Unicode properties
> could be simply implemented by referring to precompiled regexes whose
> content was created by running filters over the Unicode character database.
> Another example would be processing the very complex XML files that are
> used in supplementary Unicode files, like the Common Locale Data Repository,
> which could help with localization and language-specific date and number
> parsing/formatting.
> This idea of preprocessing could be applied to any complex data set that
> LaTeX3 may need to work with, but I used the example of Unicode data
> because that’s the one that immediately came to mind.
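The first idea above — building character data for a Unicode property by running a filter over the Unicode character database — can be sketched roughly like so. (Python rather than Lua, purely for illustration; the inline sample records and the helper names are assumptions, and the real UnicodeData.txt has many more fields and tens of thousands of lines.)

```python
# Sketch: derive a regex-style character class for a Unicode property
# by filtering records from UnicodeData.txt. Each record is a line of
# semicolon-separated fields: codepoint;name;general_category;...
# A few sample lines stand in for the real database here.
SAMPLE_UCD = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0042;LATIN CAPITAL LETTER B;Lu;0;L;;;;;N;;;;0062;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;0041;;0041
0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
"""

def codepoints_with_category(ucd_text, category):
    """Yield codepoints whose general category matches, e.g. 'Lu'."""
    for line in ucd_text.splitlines():
        fields = line.split(";")
        if fields[2] == category:
            yield int(fields[0], 16)

def character_class(codepoints):
    """Render codepoints as a character class like [\\u0041\\u0042]."""
    return "[" + "".join("\\u%04X" % cp for cp in sorted(codepoints)) + "]"

# All uppercase letters (category Lu) in the sample data:
print(character_class(codepoints_with_category(SAMPLE_UCD, "Lu")))
# [\u0041\u0042]
```

The output of such a filter would be the "precompiled" data that an extended l3regex could then refer to by name, rather than LaTeX parsing the database at run time.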

When there is a need to do complex pre-processing, Lua is the obvious 
way to go nowadays, and that's reflected in a number of scripts. The 
issue tends to be not the scripted data extraction itself, but rather 
the target at the TeX end.

For example, on the regex idea, I suspect performance would be the major 
concern: regex processing is already a lot of work, and I have a feeling 
Bruno would want to optimise how the data were stored inside TeX. (I'm 
not sure where the balance between data extraction and storage lies 
here.) Similarly, a lot of work is done by Javier for babel using the 
CLDR, but many of the outcomes are not amenable to scripting: it's 
about how you set up the TeX 'just right'.
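The extraction half of that CLDR work is straightforward scripting, which may be why it is not the bottleneck. A minimal sketch, again in Python for illustration only: the XML fragment below imitates (in heavily simplified form) the month-name structure of a CLDR locale file, and the function name is my own invention.

```python
# Sketch: pull locale data out of CLDR-style XML using only the
# standard library. Real CLDR files nest this inside calendar and
# width elements and are far larger; this fragment is simplified.
import xml.etree.ElementTree as ET

CLDR_SNIPPET = """\
<ldml>
  <dates>
    <months>
      <month type="1">janvier</month>
      <month type="2">février</month>
      <month type="3">mars</month>
    </months>
  </dates>
</ldml>
"""

def month_names(xml_text):
    """Map month number -> localised name from a CLDR-like fragment."""
    root = ET.fromstring(xml_text)
    return {int(m.get("type")): m.text
            for m in root.iterfind("./dates/months/month")}

print(month_names(CLDR_SNIPPET)[3])
# mars
```

The hard part, as noted, is on the other side: deciding what TeX-level form (token lists, property lists, …) the extracted names and patterns should take so that babel or expl3 can use them efficiently.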

So whilst there is no reason not to use Lua when it works, at the moment 
there are no pressing areas where the pre-processing is the barrier.