At 14:50 -0500 1-01-06, William F. Hammond wrote:
>2. The default "article" document type for _regular_ GELLMU provides
>three character names for each of the 33 non-alphanumeric but
>printable ASCII characters. Each of those is at risk for some
>conceivable translation target.
I will describe the mangling technique I used for my own OOPL -> C++
translation, which avoids all such problems:
C++ names (identifiers) are allowed to be alpha-numerical with underscores
_, but may not start with a digit; in addition, names starting with an
underscore, or containing two adjacent underscores are reserved for the
implementation of the compiler.
So, in order to make things simple, I started off in my OOPL with names
only containing letters and underscores, with the restriction that names
cannot start or end with an underscore or have two adjacent underscores.
(Names in math normally do not contain digits.) For example, foo_bar is OK,
but not foo__bar, _foo, bar_, or fo0.
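
As an illustration, the naming rule just described can be sketched in C++
roughly as follows. The function name is_plain_name is made up for this
example and is not taken from the actual translator; the sketch also
assumes that "letters" means the ASCII letters a-z and A-Z.

```cpp
#include <string>

// Hypothetical checker for unquoted names: ASCII letters and underscores
// only, no leading or trailing underscore, no two adjacent underscores.
bool is_plain_name(const std::string& s) {
    if (s.empty() || s.front() == '_' || s.back() == '_') return false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char c = s[i];
        bool letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        if (!letter && c != '_') return false;                   // rejects digits etc.
        if (c == '_' && i > 0 && s[i - 1] == '_') return false;  // rejects "__"
    }
    return true;
}
```

With this sketch, is_plain_name("foo_bar") is true, while foo__bar, _foo,
bar_, and fo0 are all rejected.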
My idea is really that the _ ought to be a shorthand for a space, so these
limitations seem reasonable. Then I wanted to extend this so that _any_
binary string is allowed as a name. I did this by allowing names within
quotes ` and ', and using standard C-string conventions with backslash for
special characters, and octal and hexadecimal character representations.
The idea is also that, say,
foo_bar = `foo bar'
so that when foo_bar is parsed, it is given the same binary translation as
`foo bar', which is the same as the C-string "foo bar" (minus the C
Now, I want to create a C++ label for every such binary string: It is now
irrelevant how I obtained this from the parsing in my OOPL. It is also
irrelevant how I mangle the names, as long as I stick to the same mangling
convention if different translation units should work together. If I change
mangling conventions in the future, the old sources must be recompiled, but
that is all.
It is not so difficult to invent a mangling convention. In my case I
decided that an isolated space, as in `foo bar' should be translated to an
underscore, so that in fact
foo_bar -> binary `foo bar' -> C++ foo_bar.
If there are more spaces, or any other binary character, I merely write
each one out in a hexadecimal-like way, as two characters: the first a
digit 0-7 and the second a digit 0-9 or a letter A-V (8 x 32 = 256
combinations, enough for every byte value). One also needs to prepend
something to each name, so that it cannot start with a digit (which I need
to do anyhow, in order to put different categories of labels into
different namespaces).
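
A minimal C++ sketch of such a mangling convention is given below. It is
an illustration under several assumptions of mine, not the translator's
actual code: the names to_binary, escape_byte, and mangle and the prefix
"N" are hypothetical; the first escape character is taken to hold the
high three bits of the byte and the second the low five bits; a space
counts as "isolated" only when it sits between two letters; and every
other non-letter byte (including digits and a literal underscore) is
escaped, so that a digit in the output always starts an escape.

```cpp
#include <string>

// Unquoted name -> binary string: '_' is shorthand for a space.
std::string to_binary(const std::string& plain) {
    std::string out = plain;
    for (char& c : out) if (c == '_') c = ' ';
    return out;
}

// Encode one byte as two characters: a digit 0-7 (assumed: high three
// bits) followed by one of 0-9, A-V (assumed: low five bits).
std::string escape_byte(unsigned char b) {
    static const char low[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
    std::string out;
    out += static_cast<char>('0' + (b >> 5));
    out += low[b & 31];
    return out;
}

bool is_letter(unsigned char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// Binary string -> C++ identifier. The prefix keeps the result from
// starting with a digit; "N" is an arbitrary choice for this sketch.
std::string mangle(const std::string& name, const std::string& prefix = "N") {
    std::string out = prefix;
    for (std::size_t i = 0; i < name.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(name[i]);
        bool between_letters = i > 0 && i + 1 < name.size() &&
                               is_letter(name[i - 1]) && is_letter(name[i + 1]);
        if (is_letter(c))                      out += name[i];  // letters pass through
        else if (c == ' ' && between_letters)  out += '_';      // isolated space
        else                                   out += escape_byte(c);
    }
    return out;
}
```

Under these assumptions, mangle(to_binary("foo_bar")) gives Nfoo_bar, and
a name such as `x+y' becomes Nx1By. Since underscores only ever appear
between letters, the result never contains two adjacent underscores.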
This idea can then be used in many ways: As long as the output language
admits an infinitude of names, one can allow whatever names one wants in
the input and mangle them into the output language.