Many thanks for this.
I'm really impressed with how deep into the thicket of baroque markup LaTeXML has penetrated, even down to plain TeX with 1990s-style AMS markup, e.g., https://ar5iv.org/html/alg-geom/9304003
(which, by the way, is a great article).
arXiv should go for this.
I am happy to announce a new preview site for arXiv.org's scientific preprints:
To try it out, change the "X" in any arXiv article link to the "5" in
ar5iv to get a modern HTML5 document.
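
As a rough illustration of the link rewrite described above, here is a minimal Python sketch. The helper name to_ar5iv is hypothetical, and the sketch assumes the only change needed is swapping the host name while the rest of the link stays as it is.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ar5iv(url: str) -> str:
    """Rewrite an arxiv.org article link to the same path on ar5iv.org.

    Hypothetical helper: it assumes the path is kept unchanged and only
    the host name differs.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() not in ("arxiv.org", "www.arxiv.org"):
        raise ValueError(f"not an arXiv link: {url}")
    # SplitResult is a named tuple, so _replace swaps just the host.
    return urlunsplit(parts._replace(netloc="ar5iv.org"))


print(to_ar5iv("https://arxiv.org/abs/alg-geom/9304003"))
```

Links that do not point at arxiv.org are rejected rather than guessed at.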
What is included?
You can expect to find a collection of 1.75 million preprints,
readable (and machine-readable) today.
The underlying data is converted from TeX/LaTeX to HTML5 by LaTeXML.
It is indeed the latest installment of the exact data I have been
packaging as a versioned "arXMLiv dataset", available under
A new "arXMLiv 2021" edition is planned for February.
My hope for the short term is that the new preview site will be
helpful to stress-test MathML Core support in all browsers, especially
ones striving to newly add support in 2022.
For the longer term, we hope to reintegrate with the main
arXiv.org site and propose a reliable transition away from PDF,
especially for the 2 million articles that have already been
And for the readers in the lists here - to enable a wide variety of
exciting MathNLP and MathIR research!
- The formulas and fragments in ar5iv are enriched with "id"
attributes, making them attractive targets for search indexes
- Our attempt at generating "Scholarly HTML5" offers a range of
reliable additional markup, such as classes for a variety of
scientific statements (theorems, proofs, definitions, ...), and more!
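
To give a feel for how the "id" attributes and statement classes could feed a search index, here is a small sketch using only the Python standard library. The sample markup, the class names "ltx_theorem" and "ltx_proof", and the id values are assumptions made for illustration, not a description of the exact ar5iv output.

```python
from html.parser import HTMLParser

# Assumed class names for scientific statements (illustrative only).
STATEMENT_CLASSES = ("ltx_theorem", "ltx_proof")

# Hypothetical fragment standing in for a converted article.
SAMPLE = """
<div class="ltx_theorem" id="S1.Thmtheorem1">
  <p>Let <math id="S1.Thmtheorem1.p1.m1" alttext="x"><mi>x</mi></math> be real.</p>
</div>
<div class="ltx_proof" id="S1.1">
  <p><math id="S1.1.p1.m1" alttext="y"><mi>y</mi></math></p>
</div>
"""


class FragmentIndexer(HTMLParser):
    """Collect ids of formulas and of classified statement blocks."""

    def __init__(self):
        super().__init__()
        self.formula_ids = []   # ids of <math> elements
        self.statements = []    # (class name, id) pairs

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        elem_id = attr.get("id")
        if elem_id is None:
            return  # only elements with an id are addressable targets
        if tag == "math":
            self.formula_ids.append(elem_id)
        classes = (attr.get("class") or "").split()
        for kind in STATEMENT_CLASSES:
            if kind in classes:
                self.statements.append((kind, elem_id))


indexer = FragmentIndexer()
indexer.feed(SAMPLE)
print(indexer.formula_ids)
print(indexer.statements)
```

Each collected id can then serve as a fragment anchor (the part after "#" in a link) pointing back at the exact formula or statement.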
You're welcome to use and reuse, share widely, and send feedback my
way. The plan is to keep actively improving, while offering this
service to the wider academic community.
For a more detailed overview, here is a link to the social media announcement:
I will also give an overview talk in next week's SIGMathLing seminar,
and would welcome early feedback and discussion from anyone who has
time to attend. My gratitude to Michael and Frederik for organizing!
Details as usual at:
LaTeXML mailing list