LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Hans Aberg <[log in to unmask]>
Wed, 14 Feb 2001 16:31:45 +0100
At 11:08 +0100 2001/02/14, Bernd Raichle wrote:
> > >Incidentally one reason why xmltex can not support utf16 is that
> > >TeX buffers to ^J (or ^M) and throws away any bytes with value 32 that
> > >occur at the end of this buffer, which might just be half of a 16bit
> > >quantity that you'd rather keep. there's no way to control this
> > >behaviour from within TeX.
> >
> > So TeX is a lot less sophisticated than it appears at first sight.
>
>David has simplified it a lot.  Instead of saying ``TeX buffers to ^J
>(or ^M)'' it should read ``TeX buffers to the system-dependent and
>file-type-dependent end-of-line marker''.

This is the normal behaviour for "text" files (but not for "binary"
files) under, say, C/C++; because TeX is written in Pascal, people tend
to forget about it.
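
To make the UTF-16 problem quoted at the top concrete, here is a minimal
C++ sketch. It is only a simulation of the behaviour David describes
(buffer up to ^J, then drop trailing bytes of value 32), not TeX's actual
code; the function name read_lines_stripping_spaces is made up for this
illustration.

```cpp
#include <string>
#include <vector>

// Simulation only (an assumption, not TeX's real input routine):
// split a raw byte stream at ^J (0x0A) and discard any trailing
// bytes of value 32 (0x20) from each resulting line buffer.
std::vector<std::string> read_lines_stripping_spaces(const std::string& bytes) {
    std::vector<std::string> lines;
    std::string line;

    // Strip trailing 0x20 bytes, then store the line buffer.
    auto finish = [&] {
        while (!line.empty() && line.back() == '\x20') line.pop_back();
        lines.push_back(line);
        line.clear();
    };

    for (char c : bytes) {
        if (c == '\x0A') finish();   // end-of-line marker: close the buffer
        else line += c;              // any other byte is buffered as-is
    }
    if (!line.empty()) finish();     // last line without a terminating ^J
    return lines;
}
```

For example, in little-endian UTF-16 the euro sign U+20AC followed by a
newline is the byte sequence AC 20 0A 00. Fed to the function above, the
first buffer ends in AC 20, the 0x20 byte is stripped, and only the lone
byte AC remains, so half of the 16-bit quantity is lost.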

>  Nowadays stream oriented
>files are common, where a special character (^J or ^M) or a special
>combination of characters (^M^J) are used as end-of-line markers.

UNIX (^J), MacOS (^M) & MS-DOS (^M^J).

>  In
>the past and even nowadays there exist other file types where the
>end-of-line marker is not part of the file (i.e. a special character),
>e.g. files with a fixed-width record (aka line) length.

Today, which ones?

>And if you have to deal with files using a fixed-width record length
>usually padded by blanks, it was (and still is?) a good idea to remove
>these padding characters at an appropriate stage ... why not directly
>after reading the line?

No, this is not the normal way to parse: the normal way would be to open
the file, read the characters one by one, and let the lexical analyzer
remove the space characters whenever it sees them. A simple way to stack
the input files in, say, C++ is to have a function that creates and opens
the stream and then parses it. When the function finishes, the parsed file
is closed automatically. Normally, one does not allow lexical and grammar
constructs to span files (i.e., one does not allow a word in one file to be
completed by additional letters in another file, or matching braces to have
"{" in one file and "}" in another), which simplifies this method.

This way one does not have to worry about buffering at all, as it is
handled automatically by the stream classes.
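
A minimal sketch of what I mean, in C++. The "@name" convention for
including another file and the name parse_file are assumptions made up
for this illustration; a real lexer would of course recognise more than
whitespace-separated words.

```cpp
#include <cctype>
#include <fstream>
#include <string>
#include <vector>

// Toy parser: reads a file character by character, lets the "lexer"
// skip whitespace, and collects the words it finds. A word of the
// hypothetical form "@name" means "parse the file called name here".
void parse_file(const std::string& path, std::vector<std::string>& words) {
    std::ifstream in(path);   // the stream class does the buffering
    if (!in) return;          // unreadable file: contribute nothing

    std::string word;

    // Complete the current word: either recurse into an included
    // file or record the word.
    auto flush = [&] {
        if (word.empty()) return;
        if (word[0] == '@') parse_file(word.substr(1), words);
        else words.push_back(word);
        word.clear();
    };

    char c;
    while (in.get(c)) {
        if (std::isspace(static_cast<unsigned char>(c))) flush();
        else word += c;
    }
    flush();   // a word never continues into another file
}              // `in` goes out of scope here and the file is closed
```

The nesting of input files is simply the nesting of calls to parse_file,
so no explicit file stack and no hand-written line buffer are needed.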

> > TeX really is a program from another age...
>
>Yes!  TeX was written between 1977 and 1982!

My thought was that perhaps this modern way of handling streams was
(is?) not available in Pascal, so when it had to be implemented
by hand, simplifications were made.

  Hans Aberg
