formal rules in the compilation doc

2026-07-08 14:42:46 -06:00 · 2006-11-01 09:47:28 +00:00
parent 264bcfb28e
commit 81b973e197
1 changed files with 156 additions and 10 deletions
@@ -30,7 +30,10 @@ productivity.

 For efficiency, the grammar is compiled to lower-level formats.

-Type checking is another essential compilation phase.
+Type checking is another essential compilation phase. Its purpose is
+twofold, as usual:
+- checking the correctness of the grammar
+- type-annotating expressions for code generation


 #NEW
@@ -133,28 +136,171 @@ weakly-typed interfaces or dynamically-checked interfaces."
 (B. Stroustrup, 1994, p. 107)


+
 #NEW

-==The computation model==
+==The computation model: abstract syntax==

 An abstract syntax defines a free algebra of trees (using
 dependent types, recursion, higher-order abstract syntax: 
 GF includes a complete Logical Framework).
+```
+  cat C (x_1 : A_1)...(x_n : A_n)  
+  a_1 : A_1 
+  ... 
+  a_n : A_n{x_1 : A_1,...,x_n-1 : A_n-1}
+  ----------------------------------------------------
+  (C a_1 ... a_n) : Type 
+
+
+  fun f : (x_1 : A_1) -> ... -> (x_n : A_n) -> A
+  a_1 : A_1 
+  ... 
+  a_n : A_n{x_1 : A_1,...,x_n-1 : A_n-1}
+  ----------------------------------------------------
+  (f a_1 ... a_n) : A{x_1 : A_1,...,x_n : A_n}
+
+
+ A : Type  x : A |- B : Type       x : A |- b : B        f : (x : A) -> B   a : A
+ ----------------------------   ----------------------   ------------------------
+     (x : A) -> B : Type        \x -> b : (x : A) -> B       f a : B{x := A}
+```
+Notice that all syntax trees are in eta-long form.
+
+
+#NEW
+
+==The computation model: concrete syntax==

 A concrete syntax defines a homomorphism (compositional mapping)
-from the abstract syntax to a system of tuples of strings.
+from the abstract syntax to a system of concrete syntax objects.
+```
+  cat C _
+  --------------------
+  lincat C = C* : Type

+  fun f : (x_1 : A_1) -> ... -> (x_n : A_n) -> A
+  -----------------------------------------------
+  lin f = f* : A_1* -> ... -> A_n* -> A*
+
+  (f a_1 ... a_n)*  =  f* a_1* ... a_n*
+```
 The homomorphism can as such be used as linearization function.

 It is a functional program, but a restricted one, since it works 
 in the end on finite data structures only.

+But a more efficient program is obtained via compilation to 
 GFC = Canonical GF: the "machine code" of GF.

 The parsing problem of GFC can be reduced to that of MPCFG (Multiple
 Parallel Context Free Grammars), see P. Ljunglöf's thesis (2004).


+
+#NEW
+
+==The core type system of concrete syntax: basic types==
+
+```
+                   param P      P : PType
+  PType : Type    ---------     ---------
+                  P : PType      P : Type
+
+                                          s : Str  t : Str
+  Str : type     "foo" : Str   [] : Str   ----------------
+                                            s ++ t : Str
+```
+
+
+#NEW
+
+==The core type system of concrete syntax: functions and tables==
+
+```
+ A : Type  x : A |- B : Type       x : A |- b : B        f : (x : A) -> B   a : A
+ ----------------------------   ----------------------   ------------------------
+     (x : A) -> B : Type        \x -> b : (x : A) -> B       f a : B{x := A}
+
+
+        P : PType   A : Type      t : P => A  p : p
+        --------------------      -----------------
+            P => A : Type            t ! p : A
+
+                   v_1,...,v_n : A
+       ---------------------------------------------- P = {C_1,...,C_n}
+       table {C_1 => v_1 ; ... ; C_n => v_n} : P => A 
+```
+Pattern matching is treated as an abbreviation for tables. Notice that
+```
+  case e of {...}  ==  table {...} ! e
+```
+
+
+#NEW
+
+==The core type system of concrete syntax: records==
+
+```
+              A_1,...,A_n : Type
+     ------------------------------------ n >= 0
+     {r_1 : A_1 ; ... ; r_n : A_n} : Type
+
+
+                 a_1 : A_1  ...  a_n : A_n
+     ------------------------------------------------------------
+     {r_1 = a_1 ; ... ; r_n = a_n} : {r_1 : A_1 ; ... ; r_n : A_n}
+
+
+             r : {r_1 : A_1 ; ... ; r_n : A_n}
+            ----------------------------------- i = 1,...,n
+                          r.r_1 : A_1
+```
+Subtyping: if ``r : R`` then ``r : R ** {r : A}``
+
+
+
+#NEW
+
+==Computation rules==
+
+```
+  (\x -> b) a = b{x := a}
+
+  (table {C_1 => v_1 ; ... ; C_n => v_n} : P => A) ! C_i  =  v_i
+
+  {r_1 = a_1 ; ... ; r_n = a_n}.r_i  =  a_i
+```
+
+
+
+#NEW
+
+==Canonical GF==
+
+Concrete syntax type system:
+```
+                             A_1 : Type ... A_n : Type
+  Str : Type   Int : Type    -------------------------   $i : A
+                               [A_1, ..., A_n] : Type
+
+
+     a_1 : A_1  ...  a_n : A_n         t : [A_1, ..., A_n]
+  ---------------------------------    ------------------- i = 1,..,n
+  [a_1, ..., a_n] : [A_1, ..., A_n]        t ! i : A_i 
+```
+Tuples represent both records and tables.
+
+There are no functions.
+
+Linearization:
+```
+  lin f = f*
+
+  (f a_1 ... a_n)*  =  f*{$1 = a_1*, ..., $n = a_n*} 
+```
+
+
 #NEW

 ==The compilation task, again==
@@ -295,8 +441,8 @@ The additional rule now is:

 ==Techniques used==

-The compiler is written in Haskell, using some C foreign function calls
-(readline, killing threads).
+The compiler is written in Haskell, with some C foreign function calls
+in the interactive version (readline, killing threads).

 BNFC is used for generating both the parsers and printers.
 This has helped to make the formats portable.
@@ -417,7 +563,7 @@ Irrelevant table branches are thrown away, which can reduce the size.
 But, since tables are expanded and auxiliary functions are inlined,
 the size can grow exponentially.

-How can we keep the first and eliminate the second?
+How can we keep the first property and eliminate the second?


 #NEW
@@ -450,14 +596,14 @@ By maintaining a canonical order of parameters in a type, we can
 eliminate the left hand sides of branches.
 ```
  table {              table T [
-    P => t ;    --->     P ;
-    Q => u               Q
+    P => t ;    --->     t ;
+    Q => u               u
    }                    ]
 ```
 The treatment is similar to ``Enum`` instances in Haskell.

-Actually, all parameter types could be translated to
-initial segments of integers. This is done in the GFCC format.
+In the end, all parameter types can be translated to
+initial segments of integers.


 #NEW