symbol table

aarne
2004-09-20 14:28:52 +00:00
parent 825fb52b92
commit 0ff7e33a85


Compositionality also prevents optimizations during linearization
by clever instruction selection, elimination of superfluous
labels and jumps, etc.

One way to achieve compositional JVM linearization would be
to change the abstract syntax
so that variables carry not only a string but
also a relative address. This would certainly be possible
with dependent types, but it would clutter the abstract
syntax in a way that is hard to motivate when we are in
the business of describing the syntax of C. The abstract syntax would,
so to speak, have to anticipate all demands of the compiler's
target languages.
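To make this first option concrete, here is a minimal sketch in Haskell rather than GF, assuming a toy language of assignments. The names \verb6Var6, \verb6linC6 and \verb6linJVM6 and the JVM-like instruction strings are illustrative assumptions, not part of our grammar. Once every variable carries its address, both linearizations are plain structural folds, but whoever builds the tree must already know the addresses, which is exactly the clutter described above.

```haskell
-- Hypothetical sketch: variables carry both a source name and a
-- relative address, so each target is a purely structural fold.
data Var = Var { varName :: String, varAddr :: Int }

data Exp = EVar Var | EInt Int | EAdd Exp Exp

data Stm = Assign Var Exp

-- Linearization to C reads only the name field.
linC :: Stm -> String
linC (Assign v e) = varName v ++ " = " ++ expC e ++ " ;"
  where
    expC (EVar w)   = varName w
    expC (EInt n)   = show n
    expC (EAdd a b) = expC a ++ " + " ++ expC b

-- Linearization to JVM-like code reads only the address field.
linJVM :: Stm -> [String]
linJVM (Assign v e) = expJVM e ++ ["istore " ++ show (varAddr v)]
  where
    expJVM (EVar w)   = ["iload " ++ show (varAddr w)]
    expJVM (EInt n)   = ["ldc " ++ show n]
    expJVM (EAdd a b) = expJVM a ++ expJVM b ++ ["iadd"]
```

For instance, with \verb6x6 at address 1 and \verb6y6 at address 2, the statement \verb6x = y + 3 ;6 linearizes to \verb6iload 26, \verb6ldc 36, \verb6iadd6, \verb6istore 16.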
In fact, translation systems for natural
languages have similar problems. For instance, to translate
the English pronoun \eex{you} to German, you have to choose
between \eex{du, ihr, Sie}; for Italian, there are four
variants, and so on. All semantic distinctions
made in any of the involved languages have to be present
in the common abstract syntax. The usual solution to
this problem is \empha{transfer}: you do not just linearize
the same syntax tree, but define a function that translates
the trees of one language into the trees of another.
Using transfer in the compiler
back end is precisely what traditional compilers do.
The transfer function in our case would be a noncompositional
function from the abstract syntax of C to a different abstract
syntax of JVM. The abstract syntax notation of GF permits
definitions of functions, and the GF interpreter can be used
for evaluating terms into normal form. Thus one could write
\begin{verbatim}
transStm env (Assign typ var exp rest) = ...
\end{verbatim}
This would be cumbersome in practice, because
GF does not have programming-language facilities
such as built-in lists and tuples, or monads. Moreover,
such a compiler could no longer be inverted into a decompiler,
in the way true linearization can be inverted into a parser.
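The kind of transfer function meant here can be sketched in Haskell, where lists, tuples, and finite maps are available. The sketch is hypothetical: it drops the type argument of \verb6Assign6, uses a \verb6Data.Map6 as the symbol table, and numbers JVM local-variable slots from 0. It is noncompositional because the translation of \verb6rest6 depends on an environment produced by the earlier declarations.

```haskell
import qualified Data.Map as Map

-- Hypothetical source syntax in continuation style, loosely mirroring
-- the shape Assign typ var exp rest (the type argument is omitted).
data Exp = EVar String | EInt Int | EAdd Exp Exp

data Stm
  = Decl String Stm          -- declare a variable, then continue
  | Assign String Exp Stm    -- assign to a variable, then continue
  | End

-- Symbol table: variable -> local-variable slot, plus the next free slot.
data Env = Env { slots :: Map.Map String Int, nextSlot :: Int }

emptyEnv :: Env
emptyEnv = Env Map.empty 0

-- Noncompositional transfer: 'rest' is translated in an environment
-- extended by the current declaration.
transStm :: Env -> Stm -> [String]
transStm env (Decl x rest) =
  let env' = Env (Map.insert x (nextSlot env) (slots env)) (nextSlot env + 1)
  in transStm env' rest
transStm env (Assign x e rest) =
  transExp env e ++ ["istore " ++ show (slotOf env x)] ++ transStm env rest
transStm _ End = []

transExp :: Env -> Exp -> [String]
transExp env (EVar x)   = ["iload " ++ show (slotOf env x)]
transExp _   (EInt n)   = ["ldc " ++ show n]
transExp env (EAdd a b) = transExp env a ++ transExp env b ++ ["iadd"]

-- Look up a variable's slot; an undeclared variable is a hard error.
slotOf :: Env -> String -> Int
slotOf env x =
  Map.findWithDefault (error ("undeclared variable " ++ x)) x (slots env)
```

Declaring \verb6x6 and \verb6y6 and then assigning 5 to \verb6x6 and \verb6x + 16 to \verb6y6 yields \verb6ldc 56, \verb6istore 06, \verb6iload 06, \verb6ldc 16, \verb6iadd6, \verb6istore 16. Writing the same threading in GF's abstract syntax, without a built-in map type, is what makes this approach cumbersome there.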
One more idea would be to hard-code some support
for symbol tables into the extension of GF tuned for
compiler construction. For instance, the concrete syntax of HOAS
could not only keep track of variable symbols but also
assign a unique index to each symbol.
Linearization to C could then use the symbols, as
in this paper, and linearization to JVM could use
the indexes.
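A minimal sketch of how such built-in support might behave, again in Haskell and under stated assumptions: declarations bind variables through Haskell functions (a HOAS-style encoding), and the linearizer, not the abstract syntax, creates each bound variable with both its source symbol and an index counted from the outermost declaration. All type and function names here are hypothetical.

```haskell
-- Hypothetical sketch: the abstract syntax binds variables (HOAS style);
-- the linearizer supplies each bound variable with a symbol and an index.
data Var = Var { symbol :: String, index :: Int }

data Stm
  = Decl String (Var -> Stm)   -- source symbol, body abstracted over the variable
  | Assign Var Int Stm         -- assign an integer constant, then continue
  | End

-- Both linearizations thread a counter: the declaration reached after
-- n earlier ones (counting from the outermost) gets index n.
linC :: Stm -> [String]
linC = go 0
  where
    go n (Decl s body)  = ("int " ++ s ++ " ;") : go (n + 1) (body (Var s n))
    go n (Assign v k r) = (symbol v ++ " = " ++ show k ++ " ;") : go n r
    go _ End            = []

-- The JVM linearization ignores symbols and uses only the indexes.
linJVM :: Stm -> [String]
linJVM = go 0
  where
    go n (Decl s body)  = go (n + 1) (body (Var s n))
    go n (Assign v k r) = ["ldc " ++ show k, "istore " ++ show (index v)] ++ go n r
    go _ End            = []
```

The design point is that the abstract syntax stays free of addresses: the same tree yields \verb6int a ;6, \verb6int b ;6, \verb6b = 7 ;6, \verb6a = 1 ;6 in C and \verb6istore 16, \verb6istore 06 for the two assignments in JVM code.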
\section{Conclusion}
We have managed to compile a representative
subset of C to JVM, and growing it
does not necessarily pose any new kinds of problems.
Using HOAS and dependent types to describe the abstract
syntax of C works fine, and defining the concrete syntax
The parser generated by GF is not able to parse all
source programs, because some cyclic parse
rules (of the form $C ::= C$) are generated from our grammar.
Recovery from cyclic rules is ongoing work in GF independently of this
experiment. For the time being, the interactive editor is the best way to
construct C programs using our grammar.
The most serious difficulty with using GF as a compiler tool
is how to generate machine code by linearization if this depends on
a symbol table mapping variables to addresses.
Since the compositional linearization model of GF does not
support this, we needed postprocessing to get real JVM code
from the linearization result. The question is whether this problem can
be solved by some simple and natural extension of GF.
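The postprocessing step can be pictured as follows. This is a hypothetical illustration, not the postprocessor actually used: it assumes the linearization emits JVM-like lines such as \verb6iload x6 and \verb6istore x6 with source variable names as operands, and it replaces each name by a slot number allocated on first occurrence.

```haskell
import qualified Data.Map as Map

-- Hypothetical postprocessor: rewrite symbolic operands of iload/istore
-- into slot numbers, allocating a new slot the first time a name appears.
resolve :: [String] -> [String]
resolve = go Map.empty
  where
    go _ [] = []
    go env (line : rest) = case words line of
      [op, x] | op `elem` ["iload", "istore"] ->
        let (slot, env') = lookupOrAdd x env
        in unwords [op, show slot] : go env' rest
      _ -> line : go env rest   -- all other lines pass through unchanged

    -- Return the slot for a name, extending the table if it is new.
    lookupOrAdd x env = case Map.lookup x env of
      Just n  -> (n, env)
      Nothing -> let n = Map.size env in (n, Map.insert x n env)
```

This pass ignores scoping, so two variables with the same name in different blocks would share a slot. Handling that correctly is precisely the symbol-table bookkeeping that compositional linearization cannot express.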