At 14:50 -0500 1-01-06, William F. Hammond wrote:
>> If you are in the need of various translations, have you tried using Flex
>> (lexical analyzer generator) and Bison (parser generator, or
>> compiler-compiler), see
>
>Are you saying that it's easier to code translations from XML using
>lex and yacc descendants rather than using standard XML tools such as
>sgmlspl, jade, or xt?  I find that hard to believe.  (Of course, the
>situation before 1996 was different.)

I do not know exactly what you want to achieve: I get the impression that you have a language of your own of some sort, and want to be able to translate it into different formats. If your language is just a dialect of XML, and there are XML parser generators available similar to Bison, then use one of those.

The translation I needed was as follows: from my own language, I want to output C++ code. This proved very difficult, because local code generates information (such as include files, declarations, and definitions) that must be output in different places and files in the C++ output.

Therefore, instead of parsing immediately into a new language, I invented an intermediate "formatting" language: given a set of macro definitions, normally provided in a formatting file (thus supplying the data specific to the output language, in my case C++), and a set of iterated lookup tables (in an internal binary format) produced by the parsing, it knows how to piece together suitable output files. The idea is to make the actual parsing as independent as possible of any output language, producing only the lookup tables. Then, by merely switching the formatting file with its macro definitions, one can generate output in different languages.

>> -- I use them together with C++, which is convenient as the latter has
>> standard string classes.
>
>Although I've written in C, I've never gotten into C++.  Are there
>good regular expression libraries for C++?
If you need full regular expressions and a full LR(1) parser within your language, then the simplest approach is to let your language output Flex .l and Bison .y files, compile these using Flex and Bison, and finally compile the resulting files using a C++ compiler. This is a fairly standard technique: for example, the Haskell compiler GHC produces .c files in this way. Note also that Flex and Bison are themselves compilers, and one could use Flex and Bison to write new versions of themselves. -- Actually, they do. :-) --

I use C++ only because it is convenient for producing an internal binary representation, which can later be used to produce the C++ output. The iterated lookup tables I use are just map<string, variable> (meaning that one can index a finite set of variables by string keys), where "variable" is a class with suitable lookup information.

Let's take a simple example: in the output of my application, I need to build a sequence of classes, each of which can have a sequence of methods, with definitions that must be output in various places. The main point is that one has a sequence of lookup localities, as in most modern computer languages. In my formatting file, I may have something like the material below. Here, <#header|...|header#> (or <#header|...|#>) encloses a macro definition, <|header_name|> is an invocation of the variable "header_name", and so on.

<#header|
#ifndef Synergy_<|header_name|>_header
#define Synergy_<|header_name|>_header

#if !__cplusplus
#error Header file "<|header_name|>" only for C++.
#endif

#include <stdexcept>

#include "data"
#include "construct"

<|header_preamble|>

namespace Synergy {

<|class.declaration|>

} // namespace Synergy

#endif // <|header_name|>_
|header#>

<#class.declaration|
extern Synergy::data global_<|class_name:cpp|>;

class <|class_name:cpp|> : public virtual construct {
public:
  static const char* category;
  static object_method_base* lookup_method(const std::string&);
  static Synergy::data global;

  class object;

  typedef <|class_name:cpp|> constructor;

  virtual root* clone() const { return new Synergy::<|class_name:cpp|>(*this); }
  virtual bool cloneable() { return <|object_cloneable|>; }
  <|object_copy_to_clone_method|>

  virtual Synergy::data method_method(Synergy::data&);
  virtual Synergy::data method_object(Synergy::data& x) { return new object(x); }
  virtual Synergy::data method_object_method(Synergy::data& x);

  <|constructor_method.declaration|>
  <|constructor_cpp.declare|>

  class object : <|object_base|><|object_cpp_base|> {
  public:
    static const char* category;
    static object_method_base* lookup_method(const std::string&);

    <|object_constructor.declaration|>

    virtual root* clone() const { <|clone_method_definition|> }
    <|copy_method|>
    <|object_data|>

    virtual Synergy::data method_constructor(Synergy::data&) { return Synergy::global_<|class_name:cpp|>; }
    virtual Synergy::data method_method(Synergy::data&);

    <|method.declaration|>
    <|object_cpp.declare|>
  };
};
|class.declaration#>

<#method.declaration|
virtual Synergy::data method_<|method_name:cpp|>(Synergy::data&)<|method_is_abstract|>;
|#>

In my approach each variable can actually have a sequence of lookups attached to it, so it becomes easy to sequence a series of classes with the same template. Suppose that we want to format a class named `foo' with an object method named `bar' (among other data).
Then the C++ code for that (the way I implemented it) would look something like:

// Create a new class named "foo":
(*table)["class"][push_back]["class_name"] = "foo";

// Create a method named "bar" belonging to the last created class ("foo"):
(*table)["class"][last]["method"][push_back]["method_name"] = "bar";

The formatter then uses this lookup table with the same kind of iterated localities as in, say, TeX or any other modern computer language: when printing the "header" macro, on encountering the "class.declaration" variable, it iterates through all classes using the "class.declaration" macro definition. Then, within the "class.declaration" definition, on encountering "method.declaration", it iterates through all methods _in_that_class_. If a name is not found locally, it iterates towards the base to find a more global name.

>> One approach is to parse objects into something like the DOM (Document
>> Object Model, http://www.w3.org/), and then onto that hook a program that
>> can translate into several different formats.
>
>Of course, sgmlspl, jade, xt, and other standard sgml/xml tools
>provide good frameworks for translating into as many different formats
>as one likes by writing, respectively, Perl, DSSSL, and XSLT.
>(Possibly also it would be viable to use David Carlisle's xmltex
>followed by Eitan Gurari's tex4ht in which case one writes TeX.)

So, in fact, I do not parse into a language, but into a binary model, which has essentially the same general capacities (a local lookup system) as any language. Then I use another program to format that into a suitable language.

> I wonder how some
>of these things would survive a double translation
>
>  gellmu/article ---(hypothetical)---> TEI ----> LaTeX .

So what I use is something like your "hypothetical" stage here, except that what I use is not a language but a binary model, a sequence of iterated lookup tables.

>2.
>The default "article" document type for _regular_ GELLMU provides
>three character names for each of the 33 non-alphanumeric but
>printable ASCII characters.

As it is a binary model, such parsing concerns are irrelevant. For example, I wanted to write classes with _arbitrary_ binary string names, which does not work in C++, as C++ allows only alphanumeric characters and the underscore in names, with some restrictions. But it is easy to mangle (encode) arbitrary binary string names, which I did by an addition to the formatter; it is then also irrelevant what kind of parsing I use in my original language to produce such names.

If one keeps playing this game, one ends up developing a better and better intermediate binary model. For example, suppose I want to write a floating-point number. Right now, it suffices to use, say, the C++ syntax, and parse such numbers as strings which are output verbatim in the C++ files. But suppose I want to produce output for some language with a different floating-point syntax than C++. Then it would be natural to represent the floating-point numbers in some internal binary model, and to add to the formatter the capacity to write them out in different formats.

Of course, my needs are specialized to OOPL -> OOPL language translations, and DPL ("document PL") translations may have other needs.

  Hans Aberg
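P.S. One possible such mangling might look like the sketch below (the function name and the exact encoding are hypothetical, not necessarily the ones actually used): alphanumeric bytes pass through, and every other byte, including `_' itself, is escaped as _XX with two hex digits, so the mapping is invertible.

```cpp
#include <cctype>
#include <string>

// Sketch only: encode an arbitrary binary string as a valid C++
// identifier.  Alphanumeric bytes pass through; every other byte,
// including '_' itself, becomes _XX (two hex digits), so no two distinct
// inputs collide and the encoding can be reversed.
inline std::string mangle(const std::string& name) {
    static const char* hex = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : name) {
        if (std::isalnum(c)) {
            out += static_cast<char>(c);
        } else {
            out += '_';
            out += hex[c >> 4];
            out += hex[c & 0xF];
        }
    }
    // A C++ identifier may not begin with a digit.
    if (out.empty() || std::isdigit(static_cast<unsigned char>(out[0])))
        out.insert(0, "id_");
    return out;
}
```

For example, mangle("a b") yields "a_20b", and mangle("3d") yields "id_3d".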