symbol table

aarne
2004-09-20 14:28:52 +00:00
parent 07464264da
commit 6afcb5009a

@@ -678,8 +678,30 @@ Compositionality also prevents optimizations during linearization
 by clever instruction selection, elimination of superfluous
 labels and jumps, etc.
-It would of course be possible to implement the compiler
-back end in GF in the traditional way, as a noncompositional
+One way to achieve compositional JVM linearization would be
+to change the abstract syntax
+so that variables do not only carry a string with them but
+also a relative address. This would certainly be possible
+with dependent types; but it would clutter the abstract
+syntax in a way that is hard to motivate when we are in
+the business of describing the syntax of C. The abstract syntax would
+have to, so to say, anticipate all demands of the compiler's
+target languages.
+In fact, translation systems for natural
+languages have similar problems. For instance, to translate
+the English pronoun \eex{you} to German, you have to choose
+between \eex{du, ihr, Sie}; for Italian, there are four
+variants, and so on. All semantic distinctions
+made in any of the involved languages have to be present
+in the common abstract syntax. The usual solution to
+this problem is \empha{transfer}: you do not just linearize
+the same syntax tree, but define a function that translates
+the trees of one language into the trees of another.
+Using transfer in the compiler
+back end is precisely what traditional compilers do.
+The transfer function in our case would be a noncompositional
 function from the abstract syntax of C to a different abstract
 syntax of JVM. The abstract syntax notation of GF permits
 definitions of functions, and the GF interpreter can be used
@@ -692,27 +714,20 @@ for evaluating terms into normal form. Thus one could write
 transStm env (Assign typ var exp rest) = ...
 \end{verbatim}
 This would be cumbersome in practice, because
-GF does not have facilities like built-in lists and tuples,
-or monads. Of course, the compiler could no longer be
-inverted into a decompiler, in the way true linearization
-can be inverted into a parser.
-Yet another possibility is to change the abstract syntax
-so that variables do not only carry a string with them but
-also a relative address. This would certainly be possible
-with dependent types; but it would clutter the abstract
-syntax in a way that is hard to motivate when we are in
-the business of describing the syntax of C.
-Perhaps the key idea would be to hard-code some support
+GF does not have programming-language facilities
+like built-in lists and tuples, or monads. Of course,
+the compiler could no longer be inverted into a decompiler,
+in the way true linearization can be inverted into a parser.
+One more idea would be to hard-code some support
 for symbol tables into the extension of GF tuned for
-compiler construction. For instance, the linearization
-of a binding could store, in addition to the variable
-symbol field \verb6.$06, an integer-valued fiels \verb6.#06.
-These fields correspond to an automatic renaming of variables
-to \verb6x1, x2, x36,\ldots starting from the outermost one.
-Linearization to C could then use the \verb6.$06 field, as
-in this paper, and linearization to JVM the \verb6.#06 field.
+compiler construction. For instance, the concrete syntax of HOAS
+could not only keep track of variable symbols but also
+assign a unique index to each symbol.
+Linearization to C could then use the symbols, as
+in this paper, and linearization to JVM could use
+the indexes.
@@ -753,7 +768,8 @@ semantics that is actually used in the implementation.
 \section{Conclusion}
-We managed to compile a large subset of C, and growing it
+We have managed to compile a representative
+subset of C to JVM, and growing it
 does not necessarily pose any new kinds of problems.
 Using HOAS and dependent types to describe the abstract
 syntax of C works fine, and defining the concrete syntax
@@ -765,17 +781,16 @@ The parser generated by GF is not able to parse all
 source programs, because some cyclic parse
 rules (of the form $C ::= C$) are generated from our grammar.
 Recovery from cyclic rules is ongoing work in GF independently of this
-experiment.
-For the time being, the interactive editor is the best way to
+experiment. For the time being, the interactive editor is the best way to
 construct C programs using our grammar.
 The most serious difficulty with using GF as a compiler tool
 is how to generate machine code by linearization if this depends on
-an evolving symbol table mapping variables to addresses.
+a symbol table mapping variables to addresses.
 Since the compositional linearization model of GF does not
 support this, we needed postprocessing to get real JVM code
 from the linearization result. The question is this problem can
-be solved by some simple and natural new feature of GF.
+be solved by some simple and natural extension of GF.