diff --git a/examples/gfcc/complin.tex b/examples/gfcc/complin.tex
index 835e92702..3f601e8ec 100644
--- a/examples/gfcc/complin.tex
+++ b/examples/gfcc/complin.tex
@@ -678,8 +678,30 @@ Compositionality also prevents optimizations during linearization by
 clever instruction selection, elimination of superfluous labels and
 jumps, etc.
-It would of course be possible to implement the compiler
-back end in GF in the traditional way, as a noncompositional
+One way to achieve compositional JVM linearization would be
+to change the abstract syntax
+so that variables do not only carry a string with them but
+also a relative address. This would certainly be possible
+with dependent types; but it would clutter the abstract
+syntax in a way that is hard to motivate when we are in
+the business of describing the syntax of C. The abstract syntax would
+have to, so to say, anticipate all demands of the compiler's
+target languages.
+
+In fact, translation systems for natural
+languages have similar problems. For instance, to translate
+the English pronoun \eex{you} to German, you have to choose
+between \eex{du, ihr, Sie}; for Italian, there are four
+variants, and so on. All semantic distinctions
+made in any of the involved languages have to be present
+in the common abstract syntax. The usual solution to
+this problem is \empha{transfer}: you do not just linearize
+the same syntax tree, but define a function that translates
+the trees of one language into the trees of another.
+
+Using transfer in the compiler
+back end is precisely what traditional compilers do.
+The transfer function in our case would be a noncompositional
 function from the abstract syntax of C to a different abstract
 syntax of JVM. The abstract syntax notation of GF permits definitions
 of functions, and the GF interpreter can be used
@@ -692,27 +714,20 @@ for evaluating terms into normal form. Thus one could write
   transStm env (Assign typ var exp rest) = ...
 \end{verbatim}
 This would be cumbersome in practice, because
-GF does not have facilities like built-in lists and tuples,
-or monads. Of course, the compiler could no longer be
-inverted into a decompiler, in the way true linearization
-can be inverted into a parser.
+GF does not have programming-language facilities
+like built-in lists and tuples, or monads. Of course,
+the compiler could no longer be inverted into a decompiler,
+in the way true linearization can be inverted into a parser.

-Yet another possibility is to change the abstract syntax
-so that variables do not only carry a string with them but
-also a relative address. This would certainly be possible
-with dependent types; but it would clutter the abstract
-syntax in a way that is hard to motivate when we are in
-the business of describing the syntax of C.
-
-Perhaps the key idea would be to hard-code some support
+One more idea would be to hard-code some support
 for symbol tables into the extension of GF tuned for
-compiler construction. For instance, the linearization
-of a binding could store, in addition to the variable
-symbol field \verb6.$06, an integer-valued fiels \verb6.#06.
-These fields correspond to an automatic renaming of variables
-to \verb6x1, x2, x36,\ldots starting from the outermost one.
-Linearization to C could then use the \verb6.$06 field, as
-in this paper, and linearization to JVM the \verb6.#06 field.
+compiler construction. For instance, the concrete syntax of HOAS
+could not only keep track of variable symbols but also
+assign a unique index to each symbol.
+Linearization to C could then use the symbols, as
+in this paper, and linearization to JVM could use
+the indexes.
+
@@ -753,7 +768,8 @@ semantics that is actually used in the implementation.

 \section{Conclusion}

-We managed to compile a large subset of C, and growing it
+We have managed to compile a representative
+subset of C to JVM, and growing it
 does not necessarily pose any new kinds of problems.
 Using HOAS and dependent types to describe the abstract
 syntax of C works fine, and defining the concrete syntax
@@ -765,17 +781,16 @@ The parser generated by GF is not able to parse all source programs,
 because some cyclic parse rules (of the form $C ::= C$) are
 generated from our grammar. Recovery from cyclic rules is
 ongoing work in GF independently of this
-experiment.
-For the time being, the interactive editor is the best way to
+experiment. For the time being, the interactive editor is the best way to
 construct C programs using our grammar.

 The most serious difficulty with using GF as a compiler tool is how to
 generate machine code by linearization if this depends on
-an evolving symbol table mapping variables to addresses.
+a symbol table mapping variables to addresses.
 Since the compositional linearization model of GF does not support
 this, we needed postprocessing to get real JVM code from the
 linearization result. The question is this problem can
-be solved by some simple and natural new feature of GF.
+be solved by some simple and natural extension of GF.
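
The postprocessing step mentioned in the conclusion, which maps variable symbols to addresses after linearization, could look roughly like the following. This is only a minimal sketch under stated assumptions, not the implementation used for the paper: the pseudo-instruction `alloc`, the one-instruction-per-line format and the function name `resolve_addresses` are hypothetical, and every variable is assumed to occupy one local-variable slot with no block scoping. Only `iload`, `istore` and `bipush` are actual JVM mnemonics.

```python
def resolve_addresses(lines):
    """Replace variable symbols in JVM-like pseudo-code by relative addresses.

    Assumptions (all hypothetical, for illustration only):
      - "alloc x" marks the declaration of variable x and emits no code;
      - "iload x" / "istore x" refer to a variable by its symbol;
      - every variable takes exactly one slot (true for int, not for double).
    """
    table = {}       # symbol table: variable symbol -> relative address
    next_addr = 0    # next free local-variable slot
    out = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue                      # skip empty lines
        op, args = parts[0], parts[1:]
        if op == "alloc":
            # Declaration: bind the symbol to a fresh address.
            table[args[0]] = next_addr
            next_addr += 1
            continue
        if op in ("iload", "istore"):
            # Use: look the symbol up (KeyError if it was never declared).
            args = [str(table[args[0]])]
        out.append(" ".join([op] + args))
    return out


if __name__ == "__main__":
    linearized = ["alloc x", "bipush 5", "istore x",
                  "alloc y", "iload x", "istore y"]
    for instr in resolve_addresses(linearized):
        print(instr)
    # prints: bipush 5, istore 0, iload 0, istore 1 (one per line)
```

The symbol table here grows as the instruction sequence is traversed, so the result for one statement depends on the statements before it. That dependence on context is what a compositional linearization, which assembles each result only from the results of its subtrees, cannot express directly.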