LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Subject:
From:
Hans Aberg <[log in to unmask]>
Reply To:
Mailing list for the LaTeX3 project <[log in to unmask]>
Date:
Sun, 7 Jan 2001 15:57:14 +0100
At 14:50 -0500 1-01-06, William F. Hammond wrote:
>> If you are in the need of various translations, have you tried using Flex
>> (lexical analyzer generator) and Bison (parser generator, or
>> compiler-compiler), see
>
>Are you saying that it's easier to code translations from XML using
>lex and yacc descendants rather than using standard XML tools such as
>sgmlspl, jade, or xt?  I find that hard to believe.  (Of course, the
>situation before 1996 was different.)

I do not know exactly what you want to achieve: I get the impression that
you have a language of your own of some sort, and want to be able to
translate it into different formats. If your language is just a dialect of
XML, and there are XML parser generators available similar to Bison, then
use those.

The translation I needed was as follows: from my own language, I want to
output C++ code. This proved very difficult, because local code generates
information (such as include files, declarations, and definitions) that
must be output in different places and different files of the C++ output.

Therefore, instead of parsing immediately into a new language, I invented
an intermediate "formatting" language: given a set of macro definitions,
normally provided in a formatting file (thus supplying the data specific to
the output language, in my case C++), and a set of iterated lookup tables
(in an internal binary format) produced by the parsing, it knows how to
piece together suitable output files.

The idea is to make the actual parsing as independent as possible of any
output language, only producing the lookup tables. Then by merely switching
the formatting file with the macro definitions, one can generate output to
different languages.

>> -- I use them together with C++, which is convenient as the latter has
>> standard string classes.
>
>Although I've written in C, I've never gotten into C++.  Are there
>good regular expression libraries for C++?

If you need full regular expressions and a full LR(1) parser within your
language, then the simplest approach is to let your language output Flex .l
and Bison .y files; then compile these files using Flex and Bison, and
finally compile the files so output using a C++ compiler. This is sort of a
standard computer technique: for example, the Haskell compiler GHC produces
.c files in this way.

Also note that Flex and Bison are themselves compilers, and one can use
Flex and Bison to write new versions of them. -- Actually, they do. :-)

-- I only use C++ because it is convenient to produce an internal binary
representation, which later can be used to produce the C++ output format.
The iterated lookup tables I use are just
  map<string, variable>
(meaning that one can index a finite set of variables by string keys),
where "variable" is a class with suitable lookup information.

Let's take a simple example: In the output in my application, I need to
build a sequence of classes, which can have a sequence of methods, with
definitions that should be output in various places. The main point is that
one has a sequence of lookup localities, like in most modern computer
languages.

In my formatting file, I may have something like the material below. Here,
  <#header|...|header#>
  <#header|...|#>
each enclose a macro definition (the second being the short closing form),
and <|header_name|> is an invocation of the variable "header_name", and so
on.

<#header|
#ifndef Synergy_<|header_name|>_header
#define Synergy_<|header_name|>_header

#if !__cplusplus
#error Header file "<|header_name|>" only for C++.
#endif

#include <stdexcept>

#include "data"
#include "construct"

<|header_preamble|>

namespace Synergy {

<|class.declaration|>

} // namespace Synergy

#endif // <|header_name|>_
|header#>

<#class.declaration|
extern Synergy::data global_<|class_name:cpp|>;
class <|class_name:cpp|> : public virtual construct {
public:
  static const char* category;
  static object_method_base* lookup_method(const std::string&);
  static Synergy::data global;
  class object;
  typedef <|class_name:cpp|> constructor;
  virtual root* clone() const
    { return new Synergy::<|class_name:cpp|>(*this); }
  virtual bool cloneable() { return <|object_cloneable|>; }
  <|object_copy_to_clone_method|>
  virtual Synergy::data method_method(Synergy::data&);
  virtual Synergy::data method_object(Synergy::data& x)
    { return new object(x); }
  virtual Synergy::data method_object_method(Synergy::data& x);
  <|constructor_method.declaration|>
  <|constructor_cpp.declare|>

  class object : <|object_base|><|object_cpp_base|> {
  public:
    static const char* category;
    static object_method_base* lookup_method(const std::string&);
    <|object_constructor.declaration|>
    virtual root* clone() const { <|clone_method_definition|> }
    <|copy_method|>
    <|object_data|>
    virtual Synergy::data method_constructor(Synergy::data&)
      { return Synergy::global_<|class_name:cpp|>; }
    virtual Synergy::data method_method(Synergy::data&);
    <|method.declaration|>
    <|object_cpp.declare|>
  };
};
|class.declaration#>

<#method.declaration|
virtual Synergy::data method_<|method_name:cpp|>(Synergy::data&)<|method_is_abstract|>;
|#>


In my approach each variable can actually have a sequence of lookups
attached to it, so it becomes easy to sequence a series of classes with the
same template.

Suppose that we want to format a class named `foo' with an object method
named `bar' (among other data). Then the C++ code for that (the way I
implemented it) would look something like
    // Create a new class named "foo":
  (*table)["class"][push_back]["class_name"] = "foo";
    // Create a method named "bar" belonging to last created class ("foo"):
  (*table)["class"][last]["method"][push_back]["method_name"] = "bar";
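The iterated lookup table itself can be sketched in a few lines of C++. This is a minimal reconstruction, not my actual implementation; the names `variable`, `push_back`, and `last` are assumptions mirroring the bracketed indices above:

```cpp
// Minimal sketch of an iterated lookup table: each variable either holds
// a string value or a sequence of sub-tables, so map<string, variable>
// nests recursively (class -> method -> ...).
#include <map>
#include <string>
#include <vector>

struct variable {
    std::string value;                                   // leaf data
    std::vector<std::map<std::string, variable>> items;  // iterated sub-tables

    // Append a fresh sub-table and return it (cf. the [push_back] index).
    std::map<std::string, variable>& push_back() {
        items.emplace_back();
        return items.back();
    }
    // Return the most recently created sub-table (cf. the [last] index).
    std::map<std::string, variable>& last() { return items.back(); }
};

// Build the table from the example above: class "foo" with method "bar".
std::map<std::string, variable> example_table() {
    std::map<std::string, variable> table;
    // Create a new class named "foo":
    table["class"].push_back()["class_name"].value = "foo";
    // Create a method "bar" belonging to the last created class:
    table["class"].last()["method"].push_back()["method_name"].value = "bar";
    return table;
}
```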

The formatter then uses this lookup table with the same kind of iterated
localities as in, say, TeX or any other modern computer language: when it
prints out the "header" macro and encounters the "class.declaration"
variable, it iterates through all classes using the "class.declaration"
macro definition. Then, within the "class.declaration" definition, when it
encounters "method.declaration", it iterates through all methods
_in_that_class_. If a name is not found locally, it iterates towards the
base to find a more global name.
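That local-to-global name resolution can be sketched as a chain of localities; this is again a hypothetical reconstruction, and `locality` and `lookup` are invented names:

```cpp
// Sketch of scope-chained name lookup: each locality points to its
// enclosing locality, and a name not found locally is searched for
// towards the base (the global level).
#include <map>
#include <string>

struct locality {
    const locality* enclosing;                // nullptr at the global level
    std::map<std::string, std::string> names; // local bindings

    // Walk from this locality towards the base; return the first match,
    // or nullptr if the name is bound nowhere.
    const std::string* lookup(const std::string& key) const {
        for (const locality* l = this; l != nullptr; l = l->enclosing) {
            auto it = l->names.find(key);
            if (it != l->names.end()) return &it->second;
        }
        return nullptr;
    }
};
```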

>> One approach is to parse objects into something like the DOM (Document
>> Object Model, http://www.w3.org/), and then onto that hook a program that
>> can translate into several different formats.
>
>Of course, sgmlspl, jade, xt, and other standard sgml/xml tools
>provide good frameworks for translating into as many different formats
>as one likes by writing, respectively, Perl, DSSSL, and XSLT.
>(Possibly also it would be viable to use David Carlisle's xmltex
>followed by Eitan Gurari's tex4ht in which case one writes TeX.)

So actually, I do not parse into a language, but into a binary model, which
has essentially the same general capabilities (a local lookup system) as
any language. Then I use another program to format that into a suitable
language.

>  I wonder how some
>of these things would survive a double translation
>
>      gellmu/article ---(hypothetical)---> TEI ----> LaTeX .

So what I use is something like your "hypothetical" step here, except that
it is not a language, but a binary model: a sequence of iterated lookup
tables.

>2.  The default "article" document type for _regular_ GELLMU provides
>three character names for each of the 33 non-alphanumeric but
>printable ASCII characters.

As it is a binary model, such parsing concerns are irrelevant.

For example, I wanted to write classes with _arbitrary_ binary string
names, which does not work in C++, which only allows alphanumeric names
and underscore, with some restrictions. But it is easy to mangle (encode)
arbitrary binary string names, which I did by an addition to the
formatter; then it is also irrelevant what kind of parsing I use in my
original language to produce arbitrary binary string names.
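One possible mangling, shown here only as an illustration (not my actual scheme): alphanumeric bytes pass through, every other byte becomes `_` plus two hex digits, and a fixed `_z` prefix keeps the encoding unambiguous, since the escapes below never produce a `z` after an underscore:

```cpp
// Sketch: encode an arbitrary byte string as a valid C++ identifier.
#include <cctype>
#include <string>

std::string mangle(const std::string& name) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out = "_z";  // fixed prefix, never produced by the escapes
    for (unsigned char c : name) {
        if (std::isalnum(c))
            out += static_cast<char>(c);
        else {                       // escape as _ plus two hex digits
            out += '_';
            out += hex[c >> 4];
            out += hex[c & 0xF];
        }
    }
    return out;
}
```

Since the underscore itself (byte 0x5F) is escaped, distinct binary names always mangle to distinct identifiers.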

If one keeps playing this game, one ends up developing a better and better
intermediate binary model. For example, suppose I want to write a floating
number. Right now, it would suffice to use, say, the C++ syntax, and parse
such numbers as strings which are output verbatim in the C++ files. But
suppose I want to produce output to some languages with a different syntax
than C++ in this respect. Then it would be natural to represent the
floating numbers in some internal binary model, and add to the formatter
the capacity to write out floating point numbers in different formats.
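A minimal sketch of that capacity, assuming a `%g`-style rendering for C++ and the Fortran convention of a `d` exponent for double precision literals (the target names and the choice of `%g` are illustrative):

```cpp
// Sketch: store a floating number internally as a binary double, and let
// the formatter render it in the syntax of each target language.
#include <cstdio>
#include <string>

std::string format_float(double x, const std::string& target) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%g", x);  // C-style rendering
    std::string s = buf;
    if (target == "fortran") {                // Fortran spells the exponent 'd'
        for (char& c : s)
            if (c == 'e') c = 'd';
    }
    return s;
}
```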

Of course, my needs are specialized in OOPL -> OOPL language translations,
and DPL ("document PL") translations may have other needs.

  Hans Aberg
