forked from GitHub/gf-core
symbol table
@@ -678,8 +678,30 @@ Compositionality also prevents optimizations during linearization
 by clever instruction selection, elimination of superfluous
 labels and jumps, etc.
 
-It would of course be possible to implement the compiler
-back end in GF in the traditional way, as a noncompositional
+One way to achieve compositional JVM linearization would be
+to change the abstract syntax
+so that variables do not only carry a string with them but
+also a relative address. This would certainly be possible
+with dependent types; but it would clutter the abstract
+syntax in a way that is hard to motivate when we are in
+the business of describing the syntax of C. The abstract syntax would
+have to, so to say, anticipate all demands of the compiler's
+target languages.
+
+In fact, translation systems for natural
+languages have similar problems. For instance, to translate
+the English pronoun \eex{you} to German, you have to choose
+between \eex{du, ihr, Sie}; for Italian, there are four
+variants, and so on. All semantic distinctions
+made in any of the involved languages have to be present
+in the common abstract syntax. The usual solution to
+this problem is \empha{transfer}: you do not just linearize
+the same syntax tree, but define a function that translates
+the trees of one language into the trees of another.
+
+Using transfer in the compiler
+back end is precisely what traditional compilers do.
+The transfer function in our case would be a noncompositional
 function from the abstract syntax of C to a different abstract
 syntax of JVM. The abstract syntax notation of GF permits
 definitions of functions, and the GF interpreter can be used
@@ -692,27 +714,20 @@ for evaluating terms into normal form. Thus one could write
 transStm env (Assign typ var exp rest) = ...
 \end{verbatim}
 This would be cumbersome in practice, because
-GF does not have facilities like built-in lists and tuples,
-or monads. Of course, the compiler could no longer be
-inverted into a decompiler, in the way true linearization
-can be inverted into a parser.
+GF does not have programming-language facilities
+like built-in lists and tuples, or monads. Of course,
+the compiler could no longer be inverted into a decompiler,
+in the way true linearization can be inverted into a parser.
 
-Yet another possibility is to change the abstract syntax
-so that variables do not only carry a string with them but
-also a relative address. This would certainly be possible
-with dependent types; but it would clutter the abstract
-syntax in a way that is hard to motivate when we are in
-the business of describing the syntax of C.
-
-Perhaps the key idea would be to hard-code some support
+One more idea would be to hard-code some support
 for symbol tables into the extension of GF tuned for
-compiler construction. For instance, the linearization
-of a binding could store, in addition to the variable
-symbol field \verb6.$06, an integer-valued fiels \verb6.#06.
-These fields correspond to an automatic renaming of variables
-to \verb6x1, x2, x36,\ldots starting from the outermost one.
-Linearization to C could then use the \verb6.$06 field, as
-in this paper, and linearization to JVM the \verb6.#06 field.
+compiler construction. For instance, the concrete syntax of HOAS
+could not only keep track of variable symbols but also
+assign a unique index to each symbol.
+Linearization to C could then use the symbols, as
+in this paper, and linearization to JVM could use
+the indexes.
+
 
 
 
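Note on the hunk above: the idea in the added lines, where each bound variable carries both its symbol and a unique index, can be illustrated outside GF. The following Python sketch is not GF code and not the author's implementation. The tiny AST (`Decl`, `Return`, `Var`, `Int`, `Add`), the function names `to_c` and `to_jvm`, and the choice to number indexes from 0 are all assumptions made for illustration. It linearizes one tree to C-like text using the symbols and to JVM-like instructions using the indexes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Var:
    name: str


@dataclass
class Int:
    value: int


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Int, Add]


@dataclass
class Decl:
    """Hypothetical binding node: `int name = init;` followed by `rest`."""
    name: str
    init: Expr
    rest: "Stm"


@dataclass
class Return:
    exp: Expr


Stm = Union[Decl, Return]


def expr_c(e):
    # C output only needs the variable symbol.
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Int):
        return str(e.value)
    return f"{expr_c(e.left)} + {expr_c(e.right)}"


def to_c(s):
    if isinstance(s, Decl):
        return [f"int {s.name} = {expr_c(s.init)};"] + to_c(s.rest)
    return [f"return {expr_c(s.exp)};"]


def expr_jvm(e, env):
    # JVM output only needs the index assigned at the binding site.
    if isinstance(e, Var):
        return [f"iload {env[e.name]}"]
    if isinstance(e, Int):
        return [f"ldc {e.value}"]
    return expr_jvm(e.left, env) + expr_jvm(e.right, env) + ["iadd"]


def to_jvm(s, env=None, next_index=0):
    env = {} if env is None else env
    if isinstance(s, Decl):
        # Each binding gets a fresh index, outermost first (assumed to start at 0).
        inner = {**env, s.name: next_index}
        return (expr_jvm(s.init, env)
                + [f"istore {next_index}"]
                + to_jvm(s.rest, inner, next_index + 1))
    return expr_jvm(s.exp, env) + ["ireturn"]
```

In this sketch the index is threaded through the traversal as an extra argument, which is exactly the kind of context that a purely compositional linearization cannot pass down. That is why the text proposes building such support into GF itself.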
@@ -753,7 +768,8 @@ semantics that is actually used in the implementation.
 
 \section{Conclusion}
 
-We managed to compile a large subset of C, and growing it
+We have managed to compile a representative
+subset of C to JVM, and growing it
 does not necessarily pose any new kinds of problems.
 Using HOAS and dependent types to describe the abstract
 syntax of C works fine, and defining the concrete syntax
@@ -765,17 +781,16 @@ The parser generated by GF is not able to parse all
 source programs, because some cyclic parse
 rules (of the form $C ::= C$) are generated from our grammar.
 Recovery from cyclic rules is ongoing work in GF independently of this
-experiment.
-For the time being, the interactive editor is the best way to
+experiment. For the time being, the interactive editor is the best way to
 construct C programs using our grammar.
 
 The most serious difficulty with using GF as a compiler tool
 is how to generate machine code by linearization if this depends on
-an evolving symbol table mapping variables to addresses.
+a symbol table mapping variables to addresses.
 Since the compositional linearization model of GF does not
 support this, we needed postprocessing to get real JVM code
 from the linearization result. The question is this problem can
-be solved by some simple and natural new feature of GF.
+be solved by some simple and natural extension of GF.
 
 
 
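Note on the hunk above: the text says postprocessing was needed to get real JVM code from the linearization result, but this diff does not show what that intermediate output looks like. The Python sketch below is therefore only a guess at the general shape of such a pass. It assumes a hypothetical `alloc NAME` pseudo-instruction emitted at each declaration and symbolic operands on `iload` and `istore`. The function name `resolve_addresses` is also an assumption, and scoping is ignored.

```python
def resolve_addresses(instructions):
    """Replace symbolic variable names with numeric addresses.

    `alloc NAME` is a hypothetical pseudo-instruction marking a declaration;
    it is consumed here and does not appear in the output.
    """
    table = {}      # the symbol table: variable name -> address
    resolved = []
    for instruction in instructions:
        op, *args = instruction.split()
        if op == "alloc":
            # Assign the next free address to the newly declared variable.
            table[args[0]] = len(table)
        elif op in ("iload", "istore"):
            resolved.append(f"{op} {table[args[0]]}")
        else:
            resolved.append(instruction)
    return resolved
```

The table grows as the instruction list is read from left to right, so the meaning of `iload x` depends on everything that came before it. This sequential dependence is what a compositional linearization, which builds each result only from the results of its subtrees, does not provide.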