forked from GitHub/gf-core
report ready
@@ -14,7 +14,7 @@ abstract Imper = {
|
|||||||
fun
|
fun
|
||||||
Empty : Program ;
|
Empty : Program ;
|
||||||
Funct : (AS : ListTyp) -> (V : Typ) ->
|
Funct : (AS : ListTyp) -> (V : Typ) ->
|
||||||
(Body AS) -> (Fun AS V -> Program) -> Program ;
|
Body AS -> (Fun AS V -> Program) -> Program ;
|
||||||
|
|
||||||
BodyNil : Stm -> Body NilTyp ;
|
BodyNil : Stm -> Body NilTyp ;
|
||||||
BodyCons : (A : Typ) -> (AS : ListTyp) ->
|
BodyCons : (A : Typ) -> (AS : ListTyp) ->
|
||||||
@@ -31,16 +31,12 @@ abstract Imper = {
|
|||||||
EVar : (A : Typ) -> Var A -> Exp A ;
|
EVar : (A : Typ) -> Var A -> Exp A ;
|
||||||
EInt : Int -> Exp TInt ;
|
EInt : Int -> Exp TInt ;
|
||||||
EFloat : Int -> Int -> Exp TFloat ;
|
EFloat : Int -> Int -> Exp TFloat ;
|
||||||
EAddI : Exp TInt -> Exp TInt -> Exp TInt ;
|
|
||||||
EAddF : Exp TFloat -> Exp TFloat -> Exp TFloat ;
|
|
||||||
ESubI : Exp TInt -> Exp TInt -> Exp TInt ;
|
|
||||||
ESubF : Exp TFloat -> Exp TFloat -> Exp TFloat ;
|
|
||||||
EMulI : Exp TInt -> Exp TInt -> Exp TInt ;
|
|
||||||
EMulF : Exp TFloat -> Exp TFloat -> Exp TFloat ;
|
|
||||||
ELtI : Exp TInt -> Exp TInt -> Exp TInt ;
|
ELtI : Exp TInt -> Exp TInt -> Exp TInt ;
|
||||||
ELtF : Exp TFloat -> Exp TFloat -> Exp TInt ;
|
ELtF : Exp TFloat -> Exp TFloat -> Exp TInt ;
|
||||||
EApp : (AS : ListTyp) -> (V : Typ) -> Fun AS V -> ListExp AS -> Exp V ;
|
EApp : (AS : ListTyp) -> (V : Typ) -> Fun AS V -> ListExp AS -> Exp V ;
|
||||||
|
EAddI, EMulI, ESubI : Exp TInt -> Exp TInt -> Exp TInt ;
|
||||||
|
EAddF, EMulF, ESubF : Exp TFloat -> Exp TFloat -> Exp TFloat ;
|
||||||
|
|
||||||
TInt : Typ ;
|
TInt : Typ ;
|
||||||
TFloat : Typ ;
|
TFloat : Typ ;
|
||||||
|
|
||||||
|
|||||||
@@ -46,27 +46,31 @@ The challenge was to use dependent types to write a compiler from
|
|||||||
C to bytecode. This paper does not meet the challenge quite literally,
|
C to bytecode. This paper does not meet the challenge quite literally,
|
||||||
since our compiler is for a different subset of C than Augustsson's
|
since our compiler is for a different subset of C than Augustsson's
|
||||||
specification, and since the bytecode that we generate is JVM instead
|
specification, and since the bytecode that we generate is JVM instead
|
||||||
of his format.
|
of his format. But it definitely makes use of dependent types.
|
||||||
|
|
||||||
Augustsson's challenge did not specify \textit{how} dependent
|
Augustsson's challenge did not specify \textit{how} dependent
|
||||||
types are to be used, and the first of the two points we make in this
|
types are to be used, and the first of the two points we make in this
|
||||||
paper is how we interpreted the use of dependent types:
|
paper (and its title) reflects our interpretation:
|
||||||
we use dependent types in combination with higher-order abstract syntax (HOAS)
|
we use dependent types, in combination with higher-order abstract syntax (HOAS),
|
||||||
to define the grammar of the source language (here, the fragment of C).
|
to define the grammar of the source language (here, the fragment of C).
|
||||||
This grammar constitutes a single, declarative source, from which
|
The grammar constitutes the single, declarative source from which
|
||||||
the compiler front end is derived, comprising both parser and type
|
the compiler front end is derived, comprising both parser and type
|
||||||
checker.
|
checker.
|
||||||
|
|
||||||
|
The second point, code generation by linearization, means that the
|
||||||
|
back end is likewise implemented by a grammar of the target
|
||||||
|
language (in this case, a fragment of JVM). This grammar is the
|
||||||
|
declarative source from which the compiler back end is derived.
|
||||||
|
|
||||||
The complete code of the compiler is 300 lines. It is found in
|
The complete code of the compiler is 300 lines. It is found in
|
||||||
the appendices of this paper.
|
the appendices of this paper.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\section{The Grammatical Framework}
|
\section{The Grammatical Framework}
|
||||||
|
|
||||||
The second point mentioned in the title,
|
The tool we have used for implementing the compiler is
|
||||||
code generation as linearization, was suggested by
|
GF, the \empha{Grammatical Framework} \cite{gf-jfp}. GF
|
||||||
the tool we have used for implementing the grammar:
|
|
||||||
the \empha{Grammatical Framework} \cite{gf-jfp}. GF
|
|
||||||
is similar to a Logical Framework (LF) extended with
|
is similar to a Logical Framework (LF) extended with
|
||||||
a notation for defining concrete syntax. GF was originally
|
a notation for defining concrete syntax. GF was originally
|
||||||
designed to help building multilingual
|
designed to help building multilingual
|
||||||
@@ -74,25 +78,29 @@ translation systems for natural languages and also
|
|||||||
between formal and natural languages. The translation model
|
between formal and natural languages. The translation model
|
||||||
implemented by GF is very simple:
|
implemented by GF is very simple:
|
||||||
\begin{verbatim}
|
\begin{verbatim}
|
||||||
linearization
|
|
||||||
--------------->
|
|
||||||
Abstract Syntax Language_i
|
parsing linearization
|
||||||
<---------------
|
------------> ------------>
|
||||||
parsing
|
Language_1 Abstract Syntax Language_2
|
||||||
|
<------------ <------------
|
||||||
|
linearization parsing
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
An abstract syntax is similar to a \empha{theory} ,or a
|
An abstract syntax is similar to a \empha{theory}, or a
|
||||||
\empha{signature} in a logical framework. A
|
\empha{signature} in a logical framework. A
|
||||||
concrete syntax defines, in a declarative way,
|
concrete syntax defines, in a declarative way,
|
||||||
a translation of abstract syntax trees (well-formed terms)
|
a translation of abstract syntax trees (well-formed terms)
|
||||||
into concrete language structures, and this definition can
|
into concrete language structures, and from this definition, one can
|
||||||
be used for both linearization and parsing.
|
derive both linearization and parsing.
|
||||||
The number of languages related to one abstract syntax in
|
The number of languages related to one abstract syntax in
|
||||||
this way is unlimited. If just one language is involved, GF
|
this way is of course not limited to two. Sometimes just one
|
||||||
works much the same way as any grammar formalism. The largest
|
language is involved;
|
||||||
application known to us links 88 languages and translates
|
GF then works much the same way as any grammar formalism or parser
|
||||||
numeral expressions between them.
|
generator.
|
||||||
|
The largest application known to us has 88 languages and translates
|
||||||
|
numeral expressions from 1 to 999,999 between them \cite{gf-homepage}.
|
||||||
|
|
||||||
From the GF point of view, the goal of the experiment
|
From the GF point of view, the goal of the compiler experiment
|
||||||
is to investigate if GF is capable of implementing
|
is to investigate if GF is capable of implementing
|
||||||
compilers using the ideas of single-source language definition
|
compilers using the ideas of single-source language definition
|
||||||
and code generation as linearization. The working hypothesis
|
and code generation as linearization. The working hypothesis
|
||||||
@@ -123,7 +131,7 @@ JVM code back into C code.
|
|||||||
The language has an interactive editor that also supports incremental
|
The language has an interactive editor that also supports incremental
|
||||||
compilation.
|
compilation.
|
||||||
\enqu
|
\enqu
|
||||||
The problems that we have encountered and their causes will be explained in
|
The problems that we encountered and their causes will be explained in
|
||||||
the relevant sections of this report. To summarize,
|
the relevant sections of this report. To summarize,
|
||||||
\bequ
|
\bequ
|
||||||
The scoping conditions resulting from HOAS are slightly different
|
The scoping conditions resulting from HOAS are slightly different
|
||||||
@@ -131,29 +139,43 @@ from the standard ones of C.
|
|||||||
|
|
||||||
Our JVM syntax is forced to be slightly different from original.
|
Our JVM syntax is forced to be slightly different from original.
|
||||||
|
|
||||||
Using HOAS to encode bindings of functions is somewhat cumbersome.
|
Using HOAS to encode all bindings is sometimes cumbersome.
|
||||||
|
|
||||||
The C parser derived from the GF grammar does not recognize all
|
The C parser derived from the GF grammar does not recognize all
|
||||||
valid programs.
|
valid programs.
|
||||||
\enqu
|
\enqu
|
||||||
The first two shortcomings seem to be inevitable with the technique
|
The first two shortcomings seem to be inevitable with the technique
|
||||||
we use. The real JVM syntax, however, is easy to obtain by simple
|
we use. The best we can do with the JVM syntax is to use simple
|
||||||
string processing from our one. The latter two shortcomings
|
postprocessing, on string level, to obtain valid JVM. The latter
|
||||||
|
two shortcomings have to do with the current implementation of GF
|
||||||
|
rather than its core. They
|
||||||
suggest that GF should be fine-tuned to give better support
|
suggest that GF should be fine-tuned to give better support
|
||||||
to compiler construction, which, after all, is not an intended
|
to compiler construction, which, after all, is not an intended
|
||||||
use of GF as it is now.
|
use of GF as it is now.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\section{The abstract syntax}
|
\section{The abstract syntax}
|
||||||
|
|
||||||
An abstract syntax in GF consists of \texttt{cat} judgements
|
An \empha{abstract syntax} in GF consists of \texttt{cat} judgements
|
||||||
declaring basic types, and \texttt{fun} judgements declaring
|
\[
|
||||||
functions. Syntax trees are well-formed terms of basic
|
\mbox{\texttt{cat}} \; C \; \Gamma
|
||||||
|
\]
|
||||||
|
declaring basic types (depending on a context $\Gamma$),
|
||||||
|
and \texttt{fun} judgements
|
||||||
|
\[
|
||||||
|
\mbox{\texttt{fun}} \; f \; \mbox{\texttt{:}} \; A
|
||||||
|
\]
|
||||||
|
declaring functions $f$ of any type $A$, which can be a basic type or
|
||||||
|
a function type.
|
||||||
|
\empha{Syntax trees} are well-formed terms of basic
|
||||||
types. As for notation, each judgement form is recognized by
|
types. As for notation, each judgement form is recognized by
|
||||||
its keyword, and the same keyword governs all judgements
|
its keyword (\texttt{cat}, \texttt{fun}, etc),
|
||||||
|
and the same keyword governs all judgements
|
||||||
until the next keyword is encountered.
|
until the next keyword is encountered.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\subsection{Statements}
|
\subsection{Statements}
|
||||||
|
|
||||||
Statements in C may involve variables, expressions, and
|
Statements in C may involve variables, expressions, and
|
||||||
@@ -168,8 +190,9 @@ that are needed to construct statements
|
|||||||
Var Typ ;
|
Var Typ ;
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
The type \texttt{Typ} is the type of C's datatypes.
|
The type \texttt{Typ} is the type of C's datatypes.
|
||||||
Expressions (\texttt{Exp}) is a dependent type: we
|
Expressions (\texttt{Exp}) is a dependent type, since it has a
|
||||||
will give rules to construct well-typed expressions of
|
nonempty context. The rules for \texttt{Exp}
|
||||||
|
will thus be rules to construct well-typed expressions of
|
||||||
a given type. \texttt{Var}\ is the type of variables,
|
a given type. \texttt{Var}\ is the type of variables,
|
||||||
of a given type, that get bound in C's variable
|
of a given type, that get bound in C's variable
|
||||||
declarations.
|
declarations.
|
||||||
@@ -185,7 +208,7 @@ rules define their abstract syntax:
|
|||||||
Dependent types are used in \texttt{Assign} to
|
Dependent types are used in \texttt{Assign} to
|
||||||
control that a variable is always assigned a value of proper
|
control that a variable is always assigned a value of proper
|
||||||
type. The \texttt{Decl}\ function captures the rule that
|
type. The \texttt{Decl}\ function captures the rule that
|
||||||
a variable must be declared before it can be assigned to:
|
a variable must be declared before it can be used or assigned to:
|
||||||
its second argument is a \empha{continuation}, which is
|
its second argument is a \empha{continuation}, which is
|
||||||
the sequence of statements that depend on (= may refer to)
|
the sequence of statements that depend on (= may refer to)
|
||||||
the declared variable.
|
the declared variable.
|
||||||
@@ -194,7 +217,7 @@ We will treat all statements, except
|
|||||||
\texttt{return}s, in terms of continuations. A sequence of
|
\texttt{return}s, in terms of continuations. A sequence of
|
||||||
statements (which always has the type \texttt{Stm}) thus
|
statements (which always has the type \texttt{Stm}) thus
|
||||||
always ends in a \texttt{return}, or, abruptly, in
|
always ends in a \texttt{return}, or, abruptly, in
|
||||||
an empty statement. Here are rules for some other
|
an empty statement, \texttt{End}. Here are rules for some other
|
||||||
statement forms:
|
statement forms:
|
||||||
\begin{verbatim}
|
\begin{verbatim}
|
||||||
Return : (A : Typ) -> Exp A -> Stm ;
|
Return : (A : Typ) -> Exp A -> Stm ;
|
||||||
@@ -209,7 +232,8 @@ Here is an example of a piece of code and its abstract syntax.
|
|||||||
x = 5 ; Assign TInt x (EInt 5) (
|
x = 5 ; Assign TInt x (EInt 5) (
|
||||||
return x ; Return TInt (EVar TInt x)))
|
return x ; Return TInt (EVar TInt x)))
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
(Expression syntax is explained in the next section.)
|
(Expression syntax will be explained in the next section.)
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\subsection{Expressions and types}
|
\subsection{Expressions and types}
|
||||||
@@ -225,14 +249,10 @@ from variables, and by means of binary operations.
|
|||||||
EVar : (A : Typ) -> Var A -> Exp A ;
|
EVar : (A : Typ) -> Var A -> Exp A ;
|
||||||
EInt : Int -> Exp TInt ;
|
EInt : Int -> Exp TInt ;
|
||||||
EFloat : Int -> Int -> Exp TFloat ;
|
EFloat : Int -> Int -> Exp TFloat ;
|
||||||
EAddI : Exp TInt -> Exp TInt -> Exp TInt ;
|
|
||||||
EAddF : Exp TFloat -> Exp TFloat -> Exp TFloat ;
|
|
||||||
ESubI : Exp TInt -> Exp TInt -> Exp TInt ;
|
|
||||||
ESubF : Exp TFloat -> Exp TFloat -> Exp TFloat ;
|
|
||||||
EMulI : Exp TInt -> Exp TInt -> Exp TInt ;
|
|
||||||
EMulF : Exp TFloat -> Exp TFloat -> Exp TFloat ;
|
|
||||||
ELtI : Exp TInt -> Exp TInt -> Exp TInt ;
|
ELtI : Exp TInt -> Exp TInt -> Exp TInt ;
|
||||||
ELtF : Exp TFloat -> Exp TFloat -> Exp TInt ;
|
ELtF : Exp TFloat -> Exp TFloat -> Exp TInt ;
|
||||||
|
EAddI, EMulI, ESubI : Exp TInt -> Exp TInt -> Exp TInt ;
|
||||||
|
EAddF, EMulF, ESubF : Exp TFloat -> Exp TFloat -> Exp TFloat ;
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
Notice that GF has a built-in type \texttt{Int} of
|
Notice that GF has a built-in type \texttt{Int} of
|
||||||
integer literals, but floats are constructed by hand.
|
integer literals, but floats are constructed by hand.
|
||||||
@@ -277,14 +297,14 @@ of variables in statements, using a continuation.
|
|||||||
fun
|
fun
|
||||||
Empty : Program ;
|
Empty : Program ;
|
||||||
Funct : (AS : ListTyp) -> (V : Typ) ->
|
Funct : (AS : ListTyp) -> (V : Typ) ->
|
||||||
(Body AS) -> (Fun AS V -> Program) -> Program ;
|
Body AS -> (Fun AS V -> Program) -> Program ;
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
However, here we must also account for the binding of
|
However, here we must also account for the binding of
|
||||||
a function's parameters in its body. We could use
|
a function's parameters in its body. We could use
|
||||||
vectors of variables, in the same way as vectors of
|
vectors of variables, in the same way as vectors of
|
||||||
expressions are used to give arguments to functions.
|
expressions are used to give arguments to functions.
|
||||||
However, this would lead to the need of cumbersome
|
However, this would lead to the need of cumbersome
|
||||||
projection functions when dereferencing the parameters
|
projection functions when using the parameters
|
||||||
in the function body. A more elegant solution is
|
in the function body. A more elegant solution is
|
||||||
to use HOAS to build function bodies:
|
to use HOAS to build function bodies:
|
||||||
\begin{verbatim}
|
\begin{verbatim}
|
||||||
@@ -313,16 +333,36 @@ name of the function appears in the syntax trees.
|
|||||||
A more serious shortcoming of our way of defining functions
|
A more serious shortcoming of our way of defining functions
|
||||||
is that it does not allow recursion. The reason is simple:
|
is that it does not allow recursion. The reason is simple:
|
||||||
the function symbol is only bound in the continuation
|
the function symbol is only bound in the continuation
|
||||||
of the program, not in the function body.
|
of the program, not in the function body. It seems we could
|
||||||
|
save recursive functions with the following variant:
|
||||||
|
\begin{verbatim}
|
||||||
|
cat
|
||||||
|
BodyProgram ListTyp ;
|
||||||
|
Body ListTyp ;
|
||||||
|
|
||||||
|
fun
|
||||||
|
mkBodyProgram : (AS : ListTyp) ->
|
||||||
|
Body AS -> Program -> BodyProgram AS ;
|
||||||
|
RecFunct : (AS : ListTyp) -> (V : Typ) ->
|
||||||
|
(Fun AS V -> BodyProgram AS) -> Program ;
|
||||||
|
\end{verbatim}
|
||||||
|
but we have not yet investigated this.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\section{The concrete syntax of C}
|
\section{The concrete syntax of C}
|
||||||
|
|
||||||
A concrete syntax, for a given abstract syntax,
|
A concrete syntax, for a given abstract syntax,
|
||||||
consists of \texttt{lincat} judgements
|
consists of \texttt{lincat} judgements
|
||||||
defining the \empha{linearization types} of each category,
|
\[
|
||||||
|
\mbox{\texttt{lincat}} \; C \; \mbox{\texttt{=}} \; T
|
||||||
|
\]
|
||||||
|
defining the \empha{linearization types} $T$ of each category $C$,
|
||||||
and \texttt{lin} judgements
|
and \texttt{lin} judgements
|
||||||
defining the \empha{linearization functions} of each function
|
\[
|
||||||
|
\mbox{\texttt{lin}} \; f \; \mbox{\texttt{=}} \; t
|
||||||
|
\]
|
||||||
|
defining the \empha{linearization functions} $t$ of each function $f$
|
||||||
in the abstract syntax. The linearization functions are
|
in the abstract syntax. The linearization functions are
|
||||||
checked to be well-typed with respect the \texttt{lincat}
|
checked to be well-typed with respect the \texttt{lincat}
|
||||||
definitions, and the syntax of GF forces them to be \empha{compositional}
|
definitions, and the syntax of GF forces them to be \empha{compositional}
|
||||||
@@ -348,9 +388,9 @@ linearization functions, and linearizations of trees.
|
|||||||
Because of compositionality, no case analysis on expressions
|
Because of compositionality, no case analysis on expressions
|
||||||
is possible in linearization rules. The values of linearization
|
is possible in linearization rules. The values of linearization
|
||||||
therefore have to carry information on how they are used in
|
therefore have to carry information on how they are used in
|
||||||
different situations, which means that linearization
|
different situations. Therefore linearization
|
||||||
types are record types instead of just the string type.
|
types are generally record types instead of just the string type.
|
||||||
The simplest record type that is used is
|
The simplest record type that is used in GF is
|
||||||
\begin{verbatim}
|
\begin{verbatim}
|
||||||
{s : Str}
|
{s : Str}
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
@@ -365,13 +405,23 @@ used by default.
|
|||||||
Resource modules define auxiliary notions that can be
|
Resource modules define auxiliary notions that can be
|
||||||
used in concrete syntax. These notions include
|
used in concrete syntax. These notions include
|
||||||
\empha{parameter types} defined by \texttt{param}
|
\empha{parameter types} defined by \texttt{param}
|
||||||
judgements, and \empha{operations} defined by
|
judgements
|
||||||
\texttt{oper} judgements. These judgements are
|
\[
|
||||||
|
\mbox{\texttt{param}} \; P \; \mbox{\texttt{=}}
|
||||||
|
\; C_{1} \; \Gamma_{1} \; \mid \; \cdots \; \mid \;
|
||||||
|
\; C_{n} \; \Gamma_{n}
|
||||||
|
\]
|
||||||
|
and \empha{operations} defined by
|
||||||
|
\texttt{oper} judgements
|
||||||
|
\[
|
||||||
|
\mbox{\texttt{oper}} \; f \; \mbox{\texttt{:}} \; T \; \mbox{\texttt{=}} \; t
|
||||||
|
\]
|
||||||
|
These judgements are
|
||||||
similar to datatype and function definitions
|
similar to datatype and function definitions
|
||||||
in functional programming, with the restriction
|
in functional programming, with the restriction
|
||||||
that parameter types must be finite and operations
|
that parameter types must be finite and operations
|
||||||
may not be recursive. It is due to these restrictions that
|
may not be recursive. It is due to these restrictions that
|
||||||
the parsing algorithms are possible to derive from
|
we can always derive a parsing algorithm from a set of
|
||||||
linearization rules.
|
linearization rules.
|
||||||
|
|
||||||
The following string operations are useful in almost
|
The following string operations are useful in almost
|
||||||
@@ -435,6 +485,7 @@ of superfluous parentheses, which is not allowed by
|
|||||||
the present grammar.
|
the present grammar.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\subsection{Statements}
|
\subsection{Statements}
|
||||||
|
|
||||||
Statements in C have
|
Statements in C have
|
||||||
@@ -448,7 +499,7 @@ the use of semicolons on a high level.
|
|||||||
statement : Str -> SS = \s -> ss (s ++ ";");
|
statement : Str -> SS = \s -> ss (s ++ ";");
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
As for declarations, which bind variables, we notice the
|
As for declarations, which bind variables, we notice the
|
||||||
special projection \verb6.$06 to refet to the bound variable.
|
special projection \verb6.$06 to refer to the bound variable.
|
||||||
Technically, terms that are linearized must be in $\eta$-long
|
Technically, terms that are linearized must be in $\eta$-long
|
||||||
form, which guarantees that a variable symbol can always be
|
form, which guarantees that a variable symbol can always be
|
||||||
found, and that a field representing it can be added to the
|
found, and that a field representing it can be added to the
|
||||||
@@ -465,6 +516,7 @@ linearization record.
|
|||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\subsection{Functions}
|
\subsection{Functions}
|
||||||
|
|
||||||
The only new problem presented by functions is the proper
|
The only new problem presented by functions is the proper
|
||||||
@@ -475,7 +527,7 @@ Since compositionality prevents taking cases of list forms, e.g.
|
|||||||
-- ConsExp _ _ e NilExp = e ;
|
-- ConsExp _ _ e NilExp = e ;
|
||||||
-- ConsExp _ _ e es = ss (e.s ++ "," ++ es.s) ;
|
-- ConsExp _ _ e es = ss (e.s ++ "," ++ es.s) ;
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
we have to encode the size of a list in a parameter:
|
we have to encode the size of a list (zero, one, or more) in a parameter:
|
||||||
\begin{verbatim}
|
\begin{verbatim}
|
||||||
param
|
param
|
||||||
Size = Zero | One | More ;
|
Size = Zero | One | More ;
|
||||||
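Note on the hunk above: the idea of encoding list size in a finite parameter can be sketched outside GF. The following Python fragment is only an illustration under stated assumptions. The names `ListLin`, `nil_exp`, and `cons_exp` are hypothetical and do not come from the grammar. It shows how a list linearization can choose whether to emit a comma by inspecting a finite `size` field of the tail, without taking cases on the tail's syntax tree.

```python
from dataclasses import dataclass

# Finite parameter type, mirroring `param Size = Zero | One | More`.
ZERO, ONE, MORE = "Zero", "One", "More"


@dataclass(frozen=True)
class ListLin:
    """Hypothetical linearization record of a list: a string plus its size."""
    s: str
    size: str  # one of ZERO, ONE, MORE


def nil_exp() -> ListLin:
    # The empty list linearizes to the empty string.
    return ListLin("", ZERO)


def cons_exp(e: str, es: ListLin) -> ListLin:
    # Only the finite `size` field of the tail is inspected,
    # never the shape of the tree that produced it.
    if es.size == ZERO:
        return ListLin(e, ONE)
    return ListLin(e + " , " + es.s, MORE)
```

With these definitions, `cons_exp("x", cons_exp("y", nil_exp())).s` yields `x , y`, and a singleton list carries no trailing comma.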
@@ -526,40 +578,56 @@ component.
|
|||||||
|
|
||||||
JVM syntax is, linguistically, more straightforward than
|
JVM syntax is, linguistically, more straightforward than
|
||||||
the syntax of C, and could be described by a regular
|
the syntax of C, and could be described by a regular
|
||||||
expression. The transition of our abstract syntax to JVM,
|
expression. The translation from our abstract syntax to JVM,
|
||||||
however, is tricky because variables are replaced by
|
however, is tricky because variables are replaced by
|
||||||
their addresses (relative to the frame pointer), and
|
their addresses (relative to the frame pointer), and
|
||||||
linearization must maintain a symbol table that permits
|
linearization must therefore maintain a symbol table that permits
|
||||||
the lookup of a variable address. As shown in the code
|
the lookup of a variable address. As shown in the code
|
||||||
in Appendix C, we have not quite succeeded to do this
|
in Appendix C, we have not quite succeeded to do this
|
||||||
in the generated code. Instead, we generate (non-JVM)
|
in the code generated by linearization.
|
||||||
\texttt{alloc} instructions and use another pass,
|
Instead, we use variable symbols instead of addresses
|
||||||
written in Haskell (Appendix E), to replace variable
|
in \texttt{load} and \texttt{store} instructions, and
|
||||||
symbols by their addresses.
|
generate \texttt{alloc} pseudoinstructions from declarations.
|
||||||
|
Then we use another pass, written in Haskell (Appendix E),
|
||||||
|
to replace variable symbols by their addresses. The following example
|
||||||
|
shows what the three representations (C, pseudo-JVM, JVM) look like
|
||||||
|
for a piece of code.
|
||||||
|
\begin{verbatim}
|
||||||
|
int x ; alloc i x ; x has address 0
|
||||||
|
int y ; alloc i y ; y has address 1
|
||||||
|
x = 5 ; i _push 5 ipush 5
|
||||||
|
i _store x istore 0
|
||||||
|
y = x ; i _load x iload 0
|
||||||
|
i _store y istore 1
|
||||||
|
\end{verbatim}
|
||||||
|
|
||||||
A related problem is the generation of fresh labels for
|
A related problem is the generation of fresh labels for
|
||||||
jumps. We solve this by maintaining a growing label
|
jumps. We solve this by maintaining a growing label suffix
|
||||||
as a field of the linearization of statements into
|
as a field of the linearization of statements into
|
||||||
instructions. The problem remains that the two branches
|
instructions. The problem remains that the two branches
|
||||||
in an \texttt{if-else} statement can use the same
|
in an \texttt{if-else} statement can use the same
|
||||||
labels. Making them unique will have to be
|
labels. Making them unique will have to be
|
||||||
added to the post-processing pass. This is, however,
|
added to the post-processing pass. This is
|
||||||
always possible, because labels are nested in a
|
always possible, because labels are nested in a
|
||||||
disciplined way.
|
disciplined way, and jumps can never go to remote labels.
|
||||||
|
|
||||||
As it turned out too laborious to thread the label counter
|
As it turned out laborious to thread the label counter
|
||||||
to expressions, we decided to compile comparison
|
to expressions, we decided to compile comparison \verb6x < y6
|
||||||
expressions into method calls, which should be provided
|
expressions into function calls, which should be provided
|
||||||
by a run-time library.
|
by a run-time library. This would no longer work for the
|
||||||
|
conjunction \verb6x && y6
|
||||||
|
and disjunction \verb6x || y6, if we want to keep their semantics
|
||||||
|
lazy, since function calls are strict in their arguments.
|
||||||
|
|
||||||
The JVM syntax used is from the Jasmin assembler
|
The JVM syntax used is from the Jasmin assembler
|
||||||
\cite{jasmin}, with small deviation which will
|
\cite{jasmin}, with small deviation which will
|
||||||
be removed shortly.
|
be removed shortly.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\subsection{A code example}
|
\subsection{A code example}
|
||||||
|
|
||||||
Here is a C source program, the JVM code obtained by linearization, and
|
Here is a complete C source program, the JVM code obtained by linearization, and
|
||||||
the postprocessed JVM code.
|
the postprocessed JVM code.
|
||||||
\small
|
\small
|
||||||
\begin{verbatim}
|
\begin{verbatim}
|
||||||
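Note on the hunk above: the postprocessing pass that replaces variable symbols by addresses is written in Haskell in the paper (Appendix E) and is not shown in this diff. The following Python fragment is a minimal sketch of the same idea, not the author's code. It assumes the pseudo-JVM line format of the small example in the hunk (`alloc i x`, `i _store x`), assumes that every variable occupies one slot, and ignores scoping. The function name `resolve_addresses` is hypothetical.

```python
def resolve_addresses(lines):
    """Turn pseudo-JVM lines into JVM-like lines with numeric addresses.

    Assumptions (not taken from the original pass): one slot per
    variable, addresses handed out in declaration order from 0,
    and no handling of nested scopes.
    """
    env = {}        # variable symbol -> address
    next_addr = 0   # next free address
    out = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "alloc" and len(tokens) == 3:
            # "alloc i x": record the address, emit no instruction.
            env[tokens[2]] = next_addr
            next_addr += 1
        elif len(tokens) == 3 and tokens[1] in ("_load", "_store"):
            # "i _store x" becomes "istore 0".
            typ, op, var = tokens
            out.append(f"{typ}{op[1:]} {env[var]}")
        elif len(tokens) == 3 and tokens[1].startswith("_"):
            # "i _push 5" becomes "ipush 5"; the argument is kept as is.
            typ, op, arg = tokens
            out.append(f"{typ}{op[1:]} {arg}")
        else:
            # Anything else is passed through unchanged.
            out.append(line.strip())
    return out
```

Applied to the six pseudo-JVM lines of the example in the hunk, the sketch produces `ipush 5`, `istore 0`, `iload 0`, `istore 1`, which matches the right-hand column there.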
@@ -594,6 +662,7 @@ int main () { call ilt ; call ilt ;
|
|||||||
\normalsize
|
\normalsize
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\section{Related work}
|
\section{Related work}
|
||||||
|
|
||||||
The theoretical ideas behind our compiler experiment
|
The theoretical ideas behind our compiler experiment
|
||||||
@@ -603,22 +672,23 @@ dependent types and higher-order abstract syntax
|
|||||||
has been studied in various logical frameworks
|
has been studied in various logical frameworks
|
||||||
\cite{harper-honsell,magnusson-nordstr,twelf}.
|
\cite{harper-honsell,magnusson-nordstr,twelf}.
|
||||||
The idea of using a common abstract syntax for different
|
The idea of using a common abstract syntax for different
|
||||||
languages was clearly exposed in \cite{landin}. The view of
|
languages was clearly exposed by Landin \cite{landin}. The view of
|
||||||
code generation as linearization is a central aspect of
|
code generation as linearization is a central aspect of
|
||||||
the classic compiler textbook \cite{aho-ullman}. The use
|
the classic compiler textbook by Aho, Sethi, and Ullman
|
||||||
of the same grammar both for parsing and linearization
|
\cite{aho-ullman}.
|
||||||
|
The use of the same grammar both for parsing and linearization
|
||||||
is a guiding principle of unification-based linguistic grammar
|
is a guiding principle of unification-based linguistic grammar
|
||||||
formalisms \cite{pereira}. Interactive editors derived from
|
formalisms \cite{pereira-shieber}. Interactive editors derived from
|
||||||
grammars have been used in various programming and proof
|
grammars have been used in various programming and proof
|
||||||
assistants \cite{teitelbaum,metal,ALF}.
|
assistants \cite{teitelbaum,metal,magnusson-nordstr}.
|
||||||
|
|
||||||
Even though the different ideas are well-known, they are
|
Even though the different ideas are well-known, they are
|
||||||
applied less in practice than in theory. In particular,
|
applied less in practice than in theory. In particular,
|
||||||
we have not seen them used together to construct a complete
|
we have not seen them used together to construct a complete
|
||||||
compiler. In our view, putting these ideas together is
|
compiler. In our view, putting these ideas together is
|
||||||
an attractive approach to compiling, since a compiler written
|
an attractive approach to compiling, since a compiler written
|
||||||
in this way is completely declarative, and therefore concise,
|
in this way is completely declarative, hence concise,
|
||||||
and therefore easy to modify and extend. For instance, if
|
and therefore easy to modify and extend. What is more, if
|
||||||
a new language construct is added, the GF type checker
|
a new language construct is added, the GF type checker
|
||||||
verifies that the addition is propagated to all components
|
verifies that the addition is propagated to all components
|
||||||
of the compiler. As the implementation is declarative,
|
of the compiler. As the implementation is declarative,
|
||||||
@@ -631,6 +701,51 @@ semantics that is actually used in the implementation.
|
|||||||
|
|
||||||
We managed to compile a large subset of C, and growing it
|
We managed to compile a large subset of C, and growing it
|
||||||
does not necessarily pose any new kinds of problems.
|
does not necessarily pose any new kinds of problems.
|
||||||
|
Using HOAS and dependent types to describe the abstract
|
||||||
|
syntax of C works fine, and defining the concrete syntax
|
||||||
|
of C on top of this using GF linearization machinery is
|
||||||
|
already possible, even though more support could be
|
||||||
|
desired for things like literals and precedences.
|
||||||
|
|
||||||
|
The parser generated by GF is not able to parse all
|
||||||
|
source programs, because of cyclic context-free
|
||||||
|
rules (of the form $C ::= C$). Recovery from cyclic
|
||||||
|
rules is ongoing work in GF independently of this
|
||||||
|
experiment.
|
||||||
|
|
||||||
|
The most serious problem is that the idea of compilation
|
||||||
|
as linearization does not quite work in the generation
|
||||||
|
of JVM code, if linearization is understood in the
|
||||||
|
sense of GF. In GF, linearization rules must be
|
||||||
|
compositional, and can only depend on parameters from
|
||||||
|
finite parameter sets. Hence it is not possible to encode
|
||||||
|
linearization with updates to and lookups from a symbol table,
|
||||||
|
as is usual in code generation.
|
||||||
|
|
||||||
|
Compositionality also prevents optimizations during linearization
|
||||||
|
by clever instruction selection, elimination of superfluous
|
||||||
|
labels and jumps, etc.
|
||||||
|
|
||||||
|
It would of course be possible to implement the compiler
|
||||||
|
back end in GF in the traditional way, as a noncompositional
|
||||||
|
function from the abstract syntax of C to a different abstract
|
||||||
|
syntax of JVM. The abstract syntax notation of GF permits
|
||||||
|
definitions of functions, and the GF interpreter can be used
|
||||||
|
for evaluating terms into normal form. Thus one could write
|
||||||
|
\begin{verbatim}
|
||||||
|
fun
|
||||||
|
transStm : Env -> Stm -> EnvInstr ;
|
||||||
|
def
|
||||||
|
transStm env (Decl typ rest) = ...
|
||||||
|
transStm env (Assign typ var exp rest) = ...
|
||||||
|
\end{verbatim}
|
||||||
|
This would be cumbersome in practice, because
|
||||||
|
GF does not have facilities like built-in lists and tuples,
|
||||||
|
or monads. Of course, the compiler could no longer be
|
||||||
|
inverted into a decompiler, in the way true linearization
|
||||||
|
can be inverted into a parser.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\bibliographystyle{plain}
|
\bibliographystyle{plain}
|
||||||
@@ -778,7 +893,7 @@ concrete ImperJVM of Imper = open ResImper in {
|
|||||||
BodyCons a as body = instrb a.s [] (body ** {s3 = []});
|
BodyCons a as body = instrb a.s [] (body ** {s3 = []});
|
||||||
|
|
||||||
Decl typ cont = instrb typ.s (
|
Decl typ cont = instrb typ.s (
|
||||||
"alloc_" ++ typ.s ++ cont.$0
|
"alloc" ++ typ.s ++ cont.$0
|
||||||
) cont ;
|
) cont ;
|
||||||
Assign t x exp = instrc (
|
Assign t x exp = instrc (
|
||||||
exp.s ++
|
exp.s ++
|
||||||
|
|||||||