At 13:03 -0600 2001/02/15, Randolph J. Herber wrote:
> The Java language specification defines all three character
> sequences as line terminators and '\032' (also known as Control-Z)
> if it is the last character of the file as a file terminator
> (in MSDOS, Control-Z does mark the logical end of a text file).
> TeX and LaTeX could take a similar approach:
> If the underlying operating system file is
> record structured and therefore is not a
> character stream file, then suffix each record
> with one of the above line terminator sequences.
> Then, in TeX's mouth, any of the line terminator
> sequences could be recognized as being a line
> terminator and, if Java's example is followed,
> then '\032' or end-of-file would mark the logical
> end of the input file.
> With such processing, it would not matter that one had
> transferred a text file between systems with incompatible text
> file structures as binary files (e.g., the scientists at
> CDF do that frequently then ask me to repair the problem).
Right. This is what I was hinting at: as a matter of practice, one ends up
with a flood of UNIX, MacOS, and MSOS files via the Internet, and it is
difficult to keep track of which files should be translated and which
should not.
Therefore, at least for computer-related software, such as compilers
and their text editors, it is now common on the MacOS to use that
Java convention or something similar. Thus, I no longer translate the
text files I download, but merely give them the attribute 'TEXT', even
though they have UNIX newlines in them.
Experimenting with Hugs, though (that is, the sources I ported to MacOS), I
found it tricky to write UNIX newlines, because sometimes one writes to a
console, and it does not accept \n as a newline. So the safe thing is to
write files as "text", while reading them with the Java convention,
either by tweaking the library routines, or by opening the files as binary
and doing the parsing then. -- But with the latter approach, one will have to
check what happens under VMS and other platforms that have yet other
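
To make the "open as binary and parse" reading side concrete, here is a
minimal sketch in Python (not Hugs or TeX code). The names split_lines and
read_lines are hypothetical, chosen only for illustration. It assumes a
byte-stream file, so record-structured files such as those on VMS are not
covered, and it drops a Control-Z only when it is the very last byte.

```python
import re

CTRL_Z = b"\x1a"  # '\032' in octal, the MS-DOS logical end-of-file marker

# Try \r\n first so a CR LF pair counts as one terminator, not two.
_TERMINATOR = re.compile(rb"\r\n|\r|\n")


def split_lines(data):
    # Hypothetical helper: split raw bytes into lines, accepting any of
    # the three terminators regardless of the platform the file came from.
    if data.endswith(CTRL_Z):
        data = data[:-1]  # a trailing Control-Z only marks end of input
    lines = _TERMINATOR.split(data)
    if lines and lines[-1] == b"":
        lines.pop()  # no phantom empty line after the final terminator
    return lines


def read_lines(path):
    # Open as binary so the C library does no newline translation of its own.
    with open(path, "rb") as f:
        return split_lines(f.read())
```

With this, split_lines(b"a\r\nb\rc\n\x1a") gives the three lines a, b, c,
whichever system wrote the file. Writing is left to the platform's normal
text mode, as described above.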
Another approach would be to make a TeX version that pretends to be 32-bit
internally, and require the \r, \n, and \r\n newlines to be translated into
the Unicode line separator.
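
As a rough illustration of that translation step, here is a sketch in
Python rather than anything TeX-internal. The name to_line_separator is
hypothetical, and the sketch assumes the input has already been decoded to
Unicode text. The Unicode line separator is U+2028.

```python
import re

LINE_SEPARATOR = "\u2028"  # Unicode LINE SEPARATOR

# \r\n is listed first so that a CR LF pair becomes one separator, not two.
_NEWLINE = re.compile(r"\r\n|\r|\n")


def to_line_separator(text):
    # Hypothetical helper: replace every \r\n, \r, or \n with U+2028.
    return _NEWLINE.sub(LINE_SEPARATOR, text)
```

After this pass, the rest of the program would only ever need to recognize
one line-ending character, whatever the origin of the file.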