documented new GFCC

2026-05-24 18:28:55 -06:00 · 2007-10-05 16:18:23 +00:00
parent 2905d5552c
commit 51ac00a987
4 changed files with 1164 additions and 489 deletions
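
Note on the documented term semantics: the following is a minimal sketch, not the code from GF/GFCC/Linearize.hs. It shows how the linearization terms described in this commit (R, P, S, K, V, C) evaluate for the small English example in the documentation. It assumes a cut-down Term type with Int indices, leaves out F, W, FV, TM and variants, and uses hypothetical fallback cases; the names predEng and linPred are made up for illustration.

```haskell
-- Sketch only: a cut-down Term type covering the constructors R, P, S, K, V, C.
-- Indices are Int here for brevity (an assumption of this sketch).
data Term
  = R [Term]      -- array (record/table)
  | P Term Term   -- access to field (projection/selection)
  | S [Term]      -- concatenated sequence
  | K String      -- token
  | V Int         -- argument (subtree)
  | C Int         -- array index (label/parameter value)
  deriving (Eq, Show)

-- Evaluate a term, given the already computed linearizations of the arguments.
compute :: [Term] -> Term -> Term
compute args = comp
  where
    comp trm = case trm of
      P r p -> proj (comp r) (comp p)
      R ts  -> R (map comp ts)
      S ts  -> S (filter (/= S []) (map comp ts))
      V i   -> args !! i
      _     -> trm

    -- Array indexing replaces record projection and table selection.
    proj (R ts) (C i) = ts !! i
    proj r p          = P r p   -- hypothetical fallback: leave the term stuck

-- Realize a computed term as a string, always picking the first field of a record.
realize :: Term -> String
realize trm = case trm of
  R (t:_) -> realize t
  R []    -> ""
  S ts    -> unwords (filter (not . null) (map realize ts))
  K s     -> s
  _       -> "?"                -- hypothetical fallback for uncomputed terms

-- The English linearizations from the example grammar.
she, they, sleep, predEng :: Term
she     = R [C 0, K "she"]
they    = R [C 1, K "they"]
sleep   = R [R [K "sleeps", K "sleep"]]
predEng = R [S [P (V 0) (C 1), P (P (V 1) (C 0)) (P (V 0) (C 0))]]

-- Linearize Pred applied to an NP and a VP.
linPred :: Term -> Term -> String
linPred np vp = realize (compute [np, vp] predEng)
```

Loaded into GHCi, linPred she sleep selects the Sg form through the index stored in the NP record, and linPred they sleep selects the Pl form. The record fields are in label order (n before s), which is why the string of an NP sits at index 1.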
--- a/src/GF/GFCC/GFCC.cf
+++ b/src/GF/GFCC/GFCC.cf
@@ -29,11 +29,11 @@ Lin. LinDef   ::= CId "=" Term ;
 DTyp. Type    ::= "[" [Hypo] "]" CId [Exp] ;         -- dependent type
 DTr.  Exp     ::= "[" "(" [CId] ")" Atom [Exp] "]" ; -- term with bindings

-AC.  Atom     ::= CId ;
-AS.  Atom     ::= String ;
-AI.  Atom     ::= Integer ;
-AF.  Atom     ::= Double ;
-AM.  Atom     ::= "?" Integer ;
+AC.   Atom     ::= CId ;
+AS.   Atom     ::= String ;
+AI.   Atom     ::= Integer ;
+AF.   Atom     ::= Double ;
+AM.   Atom     ::= "?" Integer ;

 R.   Term     ::= "[" [Term] "]" ;          -- record/table
 P.   Term     ::= "(" Term "!" Term ")" ;   -- projection/selection
--- a/src/GF/GFCC/doc/gfcc.html
+++ b/src/GF/GFCC/doc/gfcc.html
@@ -7,41 +7,9 @@
 <P ALIGN="center"><CENTER><H1>The GFCC Grammar Format</H1>
 <FONT SIZE="4">
 <I>Aarne Ranta</I><BR>
-October 19, 2006
+October 5, 2007
 </FONT></CENTER>

-<P></P>
-<HR NOSHADE SIZE=1>
-<P></P>
-    <UL>
-    <LI><A HREF="#toc1">What is GFCC</A>
-    <LI><A HREF="#toc2">GFCC vs. GFC</A>
-    <LI><A HREF="#toc3">The syntax of GFCC files</A>
-      <UL>
-      <LI><A HREF="#toc4">Top level</A>
-      <LI><A HREF="#toc5">Abstract syntax</A>
-      <LI><A HREF="#toc6">Concrete syntax</A>
-      </UL>
-    <LI><A HREF="#toc7">The semantics of concrete syntax terms</A>
-      <UL>
-      <LI><A HREF="#toc8">Linearization and realization</A>
-      <LI><A HREF="#toc9">Term evaluation</A>
-      <LI><A HREF="#toc10">The special term constructors</A>
-      </UL>
-    <LI><A HREF="#toc11">Compiling to GFCC</A>
-      <UL>
-      <LI><A HREF="#toc12">Problems in GFCC compilation</A>
-      <LI><A HREF="#toc13">The representation of linearization types</A>
-      <LI><A HREF="#toc14">Running the compiler and the GFCC interpreter</A>
-      </UL>
-    <LI><A HREF="#toc15">The reference interpreter</A>
-    <LI><A HREF="#toc16">Interpreter in C++</A>
-    <LI><A HREF="#toc17">Some things to do</A>
-    </UL>
-
-<P></P>
-<HR NOSHADE SIZE=1>
-<P></P>
 <P>
 Author's address:
 <A HREF="http://www.cs.chalmers.se/~aarne"><CODE>http://www.cs.chalmers.se/~aarne</CODE></A>
@@ -50,11 +18,11 @@ Author's address:
 History:
 </P>
 <UL>
+<LI>5 Oct 2007: new, better structured GFCC with full expressive power
 <LI>19 Oct: translation of lincats, new figures on C++
 <LI>3 Oct 2006: first version
 </UL>

-<A NAME="toc1"></A>
 <H2>What is GFCC</H2>
 <P>
 GFCC is a low-level format for GF grammars. Its aim is to contain the minimum
@@ -68,18 +36,20 @@ advantages:
 </UL>

 <P>
-The idea is that all embedded GF applications are compiled to GFCC.
+Thus we also want to call GFCC the <B>portable grammar format</B>.
+</P>
+<P>
+The idea is that all embedded GF applications use GFCC.
 The GF system would be primarily used as a compiler and as a grammar
 development tool.
 </P>
 <P>
 Since GFCC is implemented in BNFC, a parser of the format is readily
-available for C, C++, Haskell, Java, and OCaml. Also an XML 
-representation is generated in BNFC. A 
+available for C, C++, C#, Haskell, Java, and OCaml. Also an XML 
+representation can be generated in BNFC. A 
 <A HREF="../">reference implementation</A>
 of linearization and some other functions has been written in Haskell.
 </P>
-<A NAME="toc2"></A>
 <H2>GFCC vs. GFC</H2>
 <P>
 GFCC is aimed to replace GFC as the run-time grammar format. GFC was designed
@@ -92,7 +62,14 @@ run-time. In particular, the pattern matching syntax and semantics of GFC is
 complex and therefore difficult to implement in new platforms.
 </P>
 <P>
-The main differences of GFCC compared with GFC can be summarized as follows:
+In fact, GFC is also planned to be dropped as the target format of
+separate compilation, where plain GF (type annotated and partially evaluated)
+will be used instead. GFC provides only marginal advantages over GF as a
+target format, so carrying it around is just extra weight.
+</P>
+<P>
+The main differences of GFCC compared with GFC (and GF) can be summarized as follows:
 </P>
 <UL>
 <LI>there are no modules, and therefore no qualified names
@@ -101,56 +78,56 @@ The main differences of GFCC compared with GFC can be summarized as follows:
 <LI>records and tables are replaced by arrays
 <LI>record labels and parameter values are replaced by integers
 <LI>record projection and table selection are replaced by array indexing
-<LI>there is (so far) no support for dependent types or higher-order abstract
-  syntax (which would be easy to add, but make interpreters much more difficult
-  to write)
+<LI>even though the format does support dependent types and higher-order abstract
+  syntax, there is no interpreter yet that does this
 </UL>

 <P>
 Here is an example of a GF grammar, consisting of three modules, 
-as translated to GFCC. The representations are aligned, with the exceptions
-due to the alphabetical sorting of GFCC grammars.
+as translated to GFCC. The representations are aligned, so they do not exactly
+reflect the order of judgements in GFCC files, which order the blocks of
+judgements differently and sort the judgements alphabetically.
 </P>
 <PRE>
                                      grammar Ex(Eng,Swe);
  
  abstract Ex = {                     abstract {
-    cat 
-      S ; NP ; VP ;
-    fun 
-      Pred : NP -&gt; VP -&gt; S ;            Pred : NP,VP -&gt; S = (Pred);
-      She, They : NP ;                  She : -&gt; NP = (She);
-      Sleep : VP ;                      Sleep : -&gt; VP = (Sleep); 
-                                        They : -&gt; NP = (They);
+    cat                                 cat
+      S ; NP ; VP ;                      NP[]; S[]; VP[];
+    fun                                 fun
+      Pred : NP -&gt; VP -&gt; S ;             Pred=[(($ 0! 1),(($ 1! 0)!($ 0! 0)))];
+      She, They : NP ;                   She=[0,"she"];
+      Sleep : VP ;                       They=[1,"they"];
+                                         Sleep=[["sleeps","sleep"]];
  }                                     } ;
                                      
  concrete Eng of Ex = {              concrete Eng {
-    lincat
-      S  = {s : Str} ;
-      NP = {s : Str ; n : Num} ;
-      VP = {s : Num =&gt; Str} ;
+    lincat                             lincat
+      S  = {s : Str} ;                  S=[()];
+      NP = {s : Str ; n : Num} ;        NP=[1,()];
+      VP = {s : Num =&gt; Str} ;           VP=[[(),()]];
    param
      Num = Sg | Pl ;
-    lin
-      Pred np vp = {                    Pred = [(($0!1),(($1!0)!($0!0)))];
+    lin                                lin
+      Pred np vp = {                    Pred=[(($ 0! 1),(($ 1! 0)!($ 0! 0)))];
        s = np.s ++ vp.s ! np.n} ;      
-      She = {s = "she" ; n = Sg} ;      She = [0, "she"];
-      They = {s = "they" ; n = Pl} ;    
-      Sleep = {s = table {              Sleep = [("sleep" + ["s",""])];
+      She = {s = "she" ; n = Sg} ;      She=[0,"she"];
+      They = {s = "they" ; n = Pl} ;    They = [1, "they"];
+      Sleep = {s = table {              Sleep=[["sleeps","sleep"]];
        Sg =&gt; "sleeps" ; 
-        Pl =&gt; "sleep"                   They = [1, "they"];
-        }                               } ;
+        Pl =&gt; "sleep"                   
+        }                               
      } ;
-  }
+  }                                   } ;
  
  concrete Swe of Ex = {              concrete Swe {
-    lincat
-      S  = {s : Str} ;
-      NP = {s : Str} ;
-      VP = {s : Str} ;
+    lincat                             lincat
+      S  = {s : Str} ;                  S=[()];
+      NP = {s : Str} ;                  NP=[()];
+      VP = {s : Str} ;                  VP=[()];
    param
      Num = Sg | Pl ;
-    lin
+    lin                                lin
      Pred np vp = {                    Pred = [(($0!0),($1!0))];
        s = np.s ++ vp.s} ;
      She = {s = "hon"} ;               She = ["hon"];
@@ -159,9 +136,12 @@ due to the alphabetical sorting of GFCC grammars.
  }                                     } ;                                   
 </PRE>
 <P></P>
-<A NAME="toc3"></A>
 <H2>The syntax of GFCC files</H2>
-<A NAME="toc4"></A>
+<P>
+The complete BNFC grammar, from which  
+the rules in this section are taken, is in the file 
+<A HREF="../GFCC.cf"><CODE>GF/GFCC/GFCC.cf</CODE></A>.
+</P>
 <H3>Top level</H3>
 <P>
 A grammar has a header telling the name of the abstract syntax
@@ -170,25 +150,43 @@ the concrete languages. The abstract syntax and the concrete
 syntaxes themselves follow.
 </P>
 <PRE>
-    Grammar  ::= Header ";" Abstract ";" [Concrete] ;
-    Header   ::= "grammar" CId "(" [CId] ")" ;
-    Abstract ::= "abstract" "{" [AbsDef] "}" ;
-    Concrete ::= "concrete" CId "{" [CncDef] "}" ;
+    Grm. Grammar  ::= 
+      "grammar" CId "(" [CId] ")" ";" 
+      Abstract ";" 
+      [Concrete] ;
+  
+    Abs. Abstract ::= 
+      "abstract" "{" 
+        "flags" [Flag] 
+        "fun"   [FunDef] 
+        "cat"   [CatDef] 
+      "}" ;
+  
+    Cnc. Concrete ::= 
+      "concrete" CId "{" 
+        "flags"  [Flag] 
+        "lin"    [LinDef] 
+        "oper"   [LinDef] 
+        "lincat" [LinDef] 
+        "lindef" [LinDef] 
+        "printname" [LinDef]
+      "}" ;
 </PRE>
 <P>
-Abstract syntax judgements give typings and semantic definitions.
-Concrete syntax judgements give linearizations.
+This syntax organizes each module into a sequence of <B>fields</B>, such
+as flags, linearizations, operations, linearization types, etc.
+It is envisaged that particular applications can ignore some
+of the fields, typically so that earlier fields are more
+important than later ones.
 </P>
-<PRE>
-    AbsDef   ::= CId ":" Type "=" Exp ;
-    CncDef   ::= CId "=" Term ;
-</PRE>
 <P>
-Also flags are possible, local to each "module" (i.e. abstract and concretes).
+The judgement forms have the following syntax.
 </P>
 <PRE>
-    AbsDef   ::= "%" CId "=" String ;
-    CncDef   ::= "%" CId "=" String ;
+    Flg. Flag     ::= CId "=" String ;
+    Cat. CatDef   ::= CId "[" [Hypo] "]" ;
+    Fun. FunDef   ::= CId ":" Type "=" Exp ;
+    Lin. LinDef   ::= CId "=" Term ;
 </PRE>
 <P>
 For the run-time system, the reference implementation in Haskell
@@ -203,33 +201,84 @@ uses a structure that gives efficient look-up:
      }
  
    data Abstr = Abstr {
-      funs :: Map CId Type,   -- find the type of a fun
-      cats :: Map CId [CId]   -- find the funs giving a cat
+      aflags  :: Map CId String,     -- value of a flag
+      funs    :: Map CId (Type,Exp), -- type and def of a fun
+      cats    :: Map CId [Hypo],     -- context of a cat
+      catfuns :: Map CId [CId]       -- funs yielding a cat (redundant, for fast lookup)
      }
  
-    type Concr = Map CId Term
+    data Concr = Concr {
+      flags   :: Map CId String, -- value of a flag
+      lins    :: Map CId Term,   -- lin of a fun
+      opers   :: Map CId Term,   -- oper generated by subex elim
+      lincats :: Map CId Term,   -- lin type of a cat
+      lindefs :: Map CId Term,   -- lin default of a cat
+      printnames :: Map CId Term -- printname of a cat or a fun
+      }
+</PRE>
+<P>
+These definitions are from <A HREF="../DataGFCC.hs"><CODE>GF/GFCC/DataGFCC.hs</CODE></A>.
+</P>
+<P>
+Identifiers (<CODE>CId</CODE>) are like <CODE>Ident</CODE> in GF, except that
+the compiler produces constants prefixed with <CODE>_</CODE> in
+the common subterm elimination optimization.
+</P>
+<PRE>
+    token CId (('_' | letter) (letter | digit | '\'' | '_')*) ;
 </PRE>
 <P></P>
-<A NAME="toc5"></A>
 <H3>Abstract syntax</H3>
 <P>
-Types are first-order function types built from
+Types are first-order function types built from argument type
+contexts and value types, which in turn are built from
 category symbols. Syntax trees (<CODE>Exp</CODE>) are
-rose trees with the head (<CODE>Atom</CODE>) either a function
-constant, a metavariable, or a string, integer, or float
+rose trees with nodes consisting of a head (<CODE>Atom</CODE>) and
+bound variables (<CODE>CId</CODE>). 
+</P>
+<PRE>
+    DTyp. Type  ::= "[" [Hypo] "]" CId [Exp] ;        
+    DTr.  Exp   ::= "[" "(" [CId] ")" Atom [Exp] "]" ;
+    Hyp.  Hypo  ::= CId ":" Type ;
+</PRE>
+<P>
+The head Atom is a function
+constant, a bound variable, a metavariable, or a string, integer, or float
 literal.
 </P>
 <PRE>
-    Type     ::= [CId] "-&gt;" CId ;
-    Exp      ::= "(" Atom [Exp] ")" ;
-    Atom     ::= CId ;        -- function constant
-    Atom     ::= "?" ;        -- metavariable
-    Atom     ::= String ;     -- string literal
-    Atom     ::= Integer ;    -- integer literal
-    Atom     ::= Double ;     -- float literal
+    AC.   Atom  ::= CId ;
+    AS.   Atom  ::= String ;
+    AI.   Atom  ::= Integer ;
+    AF.   Atom  ::= Double ;
+    AM.   Atom  ::= "?" Integer ;
 </PRE>
-<P></P>
-<A NAME="toc6"></A>
+<P>
+The context-free types and trees of the "old GFCC" are special
+cases, which can be defined as follows:
+</P>
+<PRE>
+    Typ.  Type  ::= [CId] "-&gt;" CId
+    Typ args val = DTyp [Hyp (CId "_") arg | arg &lt;- args] val
+  
+    Tr.   Exp   ::= "(" CId [Exp] ")"
+    Tr fun exps  = DTr [] fun exps
+</PRE>
+<P>
+To store semantic (<CODE>def</CODE>) definitions by cases, the following expression
+form is provided, but it is only meaningful in the last field of a function
+declaration in an abstract syntax:
+</P>
+<PRE>
+    EEq. Exp      ::= "{" [Equation] "}" ;
+    Equ. Equation ::= [Exp] "-&gt;" Exp ;
+</PRE>
+<P>
+Notice that expressions are used to encode patterns. Primitive notions
+(the default semantics in GF) are encoded as empty sets of equations
+(<CODE>[]</CODE>). For a constructor (canonical form) of a category <CODE>C</CODE>, we
+aim to encode it as the application <CODE>(_constr C)</CODE>.
+</P>
 <H3>Concrete syntax</H3>
 <P>
 Linearization terms (<CODE>Term</CODE>) are built as follows.
@@ -237,12 +286,12 @@ Constructor names are shown to make the later code
 examples readable.
 </P>
 <PRE>
-    R.  Term ::= "[" [Term] "]" ;        -- array
-    P.  Term ::= "(" Term "!" Term ")" ; -- access to indexed field
-    S.  Term ::= "(" [Term] ")" ;        -- sequence with ++
+    R.  Term ::= "[" [Term] "]" ;        -- array (record/table)
+    P.  Term ::= "(" Term "!" Term ")" ; -- access to field (projection/selection)
+    S.  Term ::= "(" [Term] ")" ;        -- concatenated sequence
    K.  Term ::= Tokn ;                  -- token
-    V.  Term ::= "$" Integer ;           -- argument
-    C.  Term ::= Integer ;               -- array index
+    V.  Term ::= "$" Integer ;           -- argument (subtree)
+    C.  Term ::= Integer ;               -- array index (label/parameter value)
    FV. Term ::= "[|" [Term] "|]" ;      -- free variation
    TM. Term ::= "?" ;                   -- linearization of metavariable
 </PRE>
@@ -256,7 +305,7 @@ variant lists.
    Var. Variant  ::= [String] "/" [String] ;
 </PRE>
 <P>
-Three special forms of terms are introduced by the compiler
+Two special forms of terms are introduced by the compiler
 as optimizations. They can in principle be eliminated, but
 their presence makes grammars much more compact. Their semantics
 will be explained in a later section.
@@ -264,20 +313,20 @@ will be explained in a later section.
 <PRE>
    F.  Term ::= CId ;                     -- global constant
    W.  Term ::= "(" String "+" Term ")" ; -- prefix + suffix table
-    RP. Term ::= "(" Term "@" Term ")";    -- record parameter alias
 </PRE>
 <P>
-Identifiers are like <CODE>Ident</CODE> in GF and GFC, except that
-the compiler produces constants prefixed with <CODE>_</CODE> in
-the common subterm elimination optimization.
+There is also a deprecated form of "record parameter alias",
 </P>
 <PRE>
-    token CId (('_' | letter) (letter | digit | '\'' | '_')*) ;
+    RP. Term ::= "(" Term "@" Term ")";    -- DEPRECATED
 </PRE>
-<P></P>
-<A NAME="toc7"></A>
+<P>
+which will be removed when the migration to new GFCC is complete.
+</P>
 <H2>The semantics of concrete syntax terms</H2>
-<A NAME="toc8"></A>
+<P>
+The code in this section is from <A HREF="../Linearize.hs"><CODE>GF/GFCC/Linearize.hs</CODE></A>.
+</P>
 <H3>Linearization and realization</H3>
 <P>
 The linearization algorithm is essentially the same as in
@@ -289,18 +338,21 @@ in which linearization is performed.
 </P>
 <PRE>
    linExp :: GFCC -&gt; CId -&gt; Exp -&gt; Term
-    linExp mcfg lang tree@(Tr at trees) = case at of
+    linExp gfcc lang tree@(DTr _ at trees) = case at of
      AC fun -&gt; comp (Prelude.map lin trees) $ look fun
      AS s   -&gt; R [kks (show s)] -- quoted
      AI i   -&gt; R [kks (show i)]
      AF d   -&gt; R [kks (show d)]
      AM     -&gt; TM
     where
-       lin  = linExp mcfg lang
-       comp = compute mcfg lang
-       look = lookLin mcfg lang
+       lin  = linExp gfcc lang
+       comp = compute gfcc lang
+       look = lookLin gfcc lang
 </PRE>
 <P>
+TODO: bindings must be supported.
+</P>
+<P>
 The result of linearization is usually a record, which is realized as
 a string using the following algorithm.
 </P>
@@ -316,12 +368,12 @@ a string using the following algorithm.
      TM       -&gt; "?"
 </PRE>
 <P>
-Since the order of record fields is not necessarily
-the same as in GF source,
-this realization does not work securely for
-categories whose lincats more than one field.
+Notice that realization always picks the first field of a record.
+If a linearization type has more than one field, the first field
+does not necessarily contain the desired string.
+Also notice that the order of record fields in GFCC is not necessarily
+the same as in GF source.
 </P>
-<A NAME="toc9"></A>
 <H3>Term evaluation</H3>
 <P>
 Evaluation follows call-by-value order, with two environments
@@ -339,10 +391,9 @@ deep patterns (such as Java and C++).
 </P>
 <PRE>
  compute :: GFCC -&gt; CId -&gt; [Term] -&gt; Term -&gt; Term
-  compute mcfg lang args = comp where
+  compute gfcc lang args = comp where
    comp trm = case trm of
      P r p  -&gt; proj (comp r) (comp p)
-      RP i t -&gt; RP (comp i) (comp t)
      W s t  -&gt; W s (comp t)
      R ts   -&gt; R $ Prelude.map comp ts
      V i    -&gt; idx args (fromInteger i)  -- already computed
@@ -351,7 +402,7 @@ deep patterns (such as Java and C++).
      S ts   -&gt; S $ Prelude.filter (/= S []) $ Prelude.map comp ts
      _ -&gt; trm
  
-    look = lookLin mcfg lang
+    look = lookOper gfcc lang
  
    idx xs i = xs !! i
  
@@ -377,7 +428,6 @@ deep patterns (such as Java and C++).
      _ -&gt; trace ("ERROR in grammar compiler: field from " ++ show t) t
 </PRE>
 <P></P>
-<A NAME="toc10"></A>
 <H3>The special term constructors</H3>
 <P>
 The three forms introduced by the compiler may a need special
@@ -391,13 +441,13 @@ Global constants
 </PRE>
 <P>
 are shorthands for complex terms. They are produced by the
-compiler by (iterated) common subexpression elimination.
+compiler by (iterated) <B>common subexpression elimination</B>.
 They are often more powerful than hand-devised code sharing in the source
 code. They could be computed off-line by replacing each identifier by 
 its definition.
 </P>
 <P>
-Prefix-suffix tables 
+<B>Prefix-suffix tables</B>
 </P>
 <PRE>
    Term ::= "(" String "+" Term ")" ; 
@@ -428,56 +478,6 @@ explains the used syntax rather than the more accurate
 since we want the suffix part to be a <CODE>Term</CODE> for the optimization to
 take effect.
 </P>
-<P>
-The most curious construct of GFCC is the parameter array alias, 
-</P>
-<PRE>
-    Term ::= "(" Term "@" Term ")";
-</PRE>
-<P>
-This form is used as the value of parameter records, such as the type
-</P>
-<PRE>
-    {n : Number ; p : Person}
-</PRE>
-<P>
-The problem with parameter records is their double role.
-They can be used like parameter values, as indices in selection,
-</P>
-<PRE>
-    VP.s ! {n = Sg ; p = P3}
-</PRE>
-<P>
-but also as records, from which parameters can be projected:
-</P>
-<PRE>
-    {n = Sg ; p = P3}.n
-</PRE>
-<P>
-Whichever use is selected as primary, a prohibitively complex
-case expression must be generated at compilation to GFCC to get the
-other use. The adopted
-solution is to generate a pair containing both a parameter value index 
-and an array of indices of record fields. For instance, if we have
-</P>
-<PRE>
-    param Number = Sg | Pl ; Person = P1 | P2 | P3 ;
-</PRE>
-<P>
-we get the encoding
-</P>
-<PRE>
-    {n = Sg ; p = P3}  ---&gt; (2 @ [0,2])
-</PRE>
-<P>
-The GFCC computation rules are essentially
-</P>
-<PRE>
-    (t ! (i @ _)) = (t ! i)
-    ((_ @ r) ! j)  =(r ! j)
-</PRE>
-<P></P>
-<A NAME="toc11"></A>
 <H2>Compiling to GFCC</H2>
 <P>
 Compilation to GFCC is performed by the GF grammar compiler, and
@@ -489,32 +489,24 @@ in the process.
 The compilation phases are the following
 </P>
 <OL>
-<LI>translate GF source to GFC, as always in GF
-<LI>undo GFC back-end optimizations
-<LI>perform the <CODE>values</CODE> optimization to normalize tables
-<LI>create a symbol table mapping the GFC parameter and record types to
+<LI>type check and partially evaluate GF source
+<LI>create a symbol table mapping the GF parameter and record types to
  fixed-size arrays, and parameter values and record labels to integers
 <LI>traverse the linearization rules replacing parameters and labels by integers
-<LI>reorganize the created GFC grammar so that it has just one abstract syntax
+<LI>reorganize the created GF grammar so that it has just one abstract syntax
  and one concrete syntax per language
-<LI>apply UTF8 encoding to the grammar, if not yet applied (this is told by the
+<LI>TODO: apply UTF8 encoding to the grammar, if not yet applied (this is told by the
  <CODE>coding</CODE> flag)
-<LI>translate the GFC syntax tree to a GFCC syntax tree, using a simple
+<LI>translate the GF grammar object to a GFCC grammar object, using a simple
  compositional mapping
 <LI>perform the word-suffix optimization on GFCC linearization terms
 <LI>perform subexpression elimination on each concrete syntax module
 <LI>print out the GFCC code
 </OL>

-<P>
-Notice that a major part of the compilation is done within GFC, so that
-GFC-related tasks (such as parser generation) could be performed by
-using the old algorithms.
-</P>
-<A NAME="toc12"></A>
 <H3>Problems in GFCC compilation</H3>
 <P>
-Two major problems had to be solved in compiling GFC to GFCC:
+Two major problems had to be solved in compiling GF to GFCC:
 </P>
 <UL>
 <LI>consistent order of tables and records, to permit the array translation
@@ -527,17 +519,11 @@ to generate correct code. Any errors remaining are likely to be
 related to the two problems just mentioned.
 </P>
 <P>
-The order problem is solved in different ways for tables and records.
-For tables, the <CODE>values</CODE> optimization of GFC already manages to
-maintain a canonical order. But this order can be destroyed by the
-<CODE>share</CODE> optimization. To make sure that GFCC compilation works properly,
-it is safest to recompile the GF grammar by using the <CODE>values</CODE>
-optimization flag.
-</P>
-<P>
-Records can be canonically ordered by sorting them by labels.
-In fact, this was done in connection of the GFCC work as a part
-of the GFC generation, to guarantee consistency. This means that
+The order problem is solved in slightly different ways for tables and records.
+In both cases, <B>eta expansion</B> is used to establish a
+canonical order. Tables are ordered by applying the preorder induced
+by <CODE>param</CODE> definitions. Records are ordered by sorting them by labels.
+This means that
 e.g. the <CODE>s</CODE> field will in general no longer appear as the first
 field, even if it does so in the GF source code. But relying on the
 order of fields in a labelled record would be misplaced anyway.
@@ -547,7 +533,7 @@ The canonical form of records is further complicated by lock fields,
 i.e. dummy fields of form <CODE>lock_C = &lt;&gt;</CODE>, which are added to grammar
 libraries to force intensionality of linearization types. The problem
 is that the absence of a lock field only generates a warning, not
-an error. Therefore a GFC grammar can contain objects of the same
+an error. Therefore a GF grammar can contain objects of the same
 type with and without a lock field. This problem was solved in GFCC
 generation by just removing all lock fields (defined as fields whose
 type is the empty record type). This has the further advantage of
@@ -634,10 +620,22 @@ a case expression,
      }
 </PRE>
 <P>
-To avoid the code bloat resulting from this, we chose the alias representation
-which is easy enough to deal with in interpreters.
+To avoid the code bloat resulting from this, we have chosen to
+deal with records by a <B>currying</B> transformation:
 </P>
-<A NAME="toc13"></A>
+<PRE>
+    table {n : Number ; p : Person} {... ...}
+     ===&gt;
+    table Number {Sg =&gt; table Person {...} ; Pl =&gt; table Person {...}}
+</PRE>
+<P>
+This is performed when GFCC is generated. Selections with
+records have to be treated likewise,
+</P>
+<PRE>
+    t ! r   ===&gt; t ! r.n ! r.p
+</PRE>
+<P></P>
 <H3>The representation of linearization types</H3>
 <P>
 Linearization types (<CODE>lincat</CODE>) are not needed when generating with
@@ -647,14 +645,12 @@ concrete syntax, by using terms to represent types. Here is the table
 showing how different linearization types are encoded.
 </P>
 <PRE>
-    P*                         = size(P)        -- parameter type              
-    {_ : I ; __ : R}*          = (I* @ R*)      -- record of parameters
-    {r1 : T1 ; ... ; rn : Tn}* = [T1*,...,Tn*]  -- other record
-    (P =&gt; T)*                  = [T* ,...,T*]   -- size(P) times
+    P*                         = max(P)         -- parameter type
+    {r1 : T1 ; ... ; rn : Tn}* = [T1*,...,Tn*]  -- record
+    (P =&gt; T)*                  = [T* ,...,T*]   -- table, size(P) cases
    Str*                       = ()
 </PRE>
 <P>
-The category symbols are prefixed with two underscores (<CODE>__</CODE>).
 For example, the linearization type <CODE>present/CatEng.NP</CODE> is
 translated as follows:
 </P>
@@ -667,10 +663,9 @@ translated as follows:
      s : {ResEng.Case} =&gt; Str  -- 3 values
    }
  
-    __NP = [(6@[2,3]),[(),(),()]]
+    __NP = [[1,2],[(),(),()]]
 </PRE>
 <P></P>
-<A NAME="toc14"></A>
 <H3>Running the compiler and the GFCC interpreter</H3>
 <P>
 GFCC generation is a part of the 
@@ -679,8 +674,7 @@ of GF since September 2006. To invoke the compiler, the flag
 <CODE>-printer=gfcc</CODE> to the command
 <CODE>pm = print_multi</CODE> is used. It is wise to recompile the grammar from
 source, since previously compiled libraries may not obey the canonical
-order of records. To <CODE>strip</CODE> the grammar before
-GFCC translation removes unnecessary interface references.
+order of records. 
 Here is an example, performed in
 <A HREF="../../../../../examples/bronzeage">example/bronzeage</A>.
 </P>
@@ -690,8 +684,20 @@ Here is an example, performed in
    strip
    pm -printer=gfcc | wf bronze.gfcc
 </PRE>
+<P>
+There is also an experimental batch compiler, which does not use the GFC
+format or the record aliases. It can be produced by
+</P>
+<PRE>
+    make gfc
+</PRE>
+<P>
+in <CODE>GF/src</CODE>, and invoked by
+</P>
+<PRE>
+    gfc --make FILES
+</PRE>
 <P></P>
-<A NAME="toc15"></A>
 <H2>The reference interpreter</H2>
 <P>
 The reference interpreter written in Haskell consists of the following files:
@@ -701,23 +707,37 @@ The reference interpreter written in Haskell consists of the following files:
    GFCC.cf       -- labelled BNF grammar of gfcc
  
    -- files generated by BNFC
-    AbsGFCC.hs    -- abstrac syntax of gfcc
+    AbsGFCC.hs    -- abstract syntax datatypes
    ErrM.hs       -- error monad used internally
    LexGFCC.hs    -- lexer of gfcc files
    ParGFCC.hs    -- parser of gfcc files and syntax trees
    PrintGFCC.hs  -- printer of gfcc files and syntax trees
  
    -- hand-written files
-    DataGFCC.hs   -- post-parser grammar creation, linearization and evaluation
-    GenGFCC.hs    -- random and exhaustive generation, generate-and-test parsing
-    RunGFCC.hs    -- main function - a simple command interpreter
+    DataGFCC.hs   -- grammar datatype, post-parser grammar creation
+    Linearize.hs  -- linearization and evaluation
+    Macros.hs     -- utilities abstracting away from GFCC datatypes
+    Generate.hs   -- random and exhaustive generation
+    API.hs        -- functionalities accessible in embedded GF applications
+    Shell.hs      -- main function - a simple command interpreter
 </PRE>
 <P>
 It is included in the
 <A HREF="http://www.cs.chalmers.se/Cs/Research/Language-technology/darcs/GF/doc/darcs.html">developers' version</A>
-of GF, in the subdirectory <A HREF="../"><CODE>GF/src/GF/Canon/GFCC</CODE></A>.
+of GF, in the subdirectories <A HREF="../"><CODE>GF/src/GF/GFCC</CODE></A> and 
+<A HREF="../../Devel"><CODE>GF/src/GF/Devel</CODE></A>.
 </P>
 <P>
+As of September 2007, default parsing in main GF uses GFCC (implemented by Krasimir
+Angelov). The interpreter uses the relevant modules
+</P>
+<PRE>
+    GF/Conversions/SimpleToFCFG.hs  -- generate parser from GFCC
+    GF/Parsing/FCFG.hs              -- run the parser
+</PRE>
+<P></P>
+<P>
 To compile the interpreter, type
 </P>
 <PRE>
@@ -741,87 +761,34 @@ The available commands are
  and show their linearizations in all languages
 <LI><CODE>gtt &lt;Cat&gt; &lt;Int&gt;</CODE>:  generate a number of trees in category from smallest,
  and show the trees and their linearizations in all languages
-<LI><CODE>p &lt;Int&gt; &lt;Cat&gt; &lt;String&gt;</CODE>: "parse", i.e. generate trees until match or 
-  until the given number have been generated
-<LI><CODE>&lt;Tree&gt;</CODE>: linearize tree in all languages, also showing full records
-<LI><CODE>quit</CODE>: terminate the system cleanly
+<LI><CODE>p &lt;Lang&gt; &lt;Cat&gt; &lt;String&gt;</CODE>: parse a string into a set of trees
+<LI><CODE>lin &lt;Tree&gt;</CODE>: linearize tree in all languages, also showing full records
+<LI><CODE>q</CODE>: terminate the system cleanly
 </UL>

-<A NAME="toc16"></A>
-<H2>Interpreter in C++</H2>
-<P>
-A base-line interpreter in C++ has been started.
-Its main functionality is random generation of trees and linearization of them.
-</P>
-<P>
-Here are some results from running the different interpreters, compared
-to running the same grammar in GF, saved in <CODE>.gfcm</CODE> format.
-The grammar contains the English, German, and Norwegian
-versions of Bronzeage. The experiment was carried out on
-Ubuntu Linux laptop with 1.5 GHz Intel centrino processor.
-</P>
-<TABLE CELLPADDING="4" BORDER="1">
-<TR>
-<TH></TH>
-<TH>GF</TH>
-<TH>gfcc(hs)</TH>
-<TH>gfcc++</TH>
-</TR>
-<TR>
-<TD>program size</TD>
-<TD ALIGN="center">7249k</TD>
-<TD ALIGN="center">803k</TD>
-<TD ALIGN="right">113k</TD>
-</TR>
-<TR>
-<TD>grammar size</TD>
-<TD ALIGN="center">336k</TD>
-<TD ALIGN="center">119k</TD>
-<TD ALIGN="right">119k</TD>
-</TR>
-<TR>
-<TD>read grammar</TD>
-<TD ALIGN="center">1150ms</TD>
-<TD ALIGN="center">510ms</TD>
-<TD ALIGN="right">100ms</TD>
-</TR>
-<TR>
-<TD>generate 222</TD>
-<TD ALIGN="center">9500ms</TD>
-<TD ALIGN="center">450ms</TD>
-<TD ALIGN="right">800ms</TD>
-</TR>
-<TR>
-<TD>memory</TD>
-<TD ALIGN="center">21M</TD>
-<TD ALIGN="center">10M</TD>
-<TD ALIGN="right">20M</TD>
-</TR>
-</TABLE>
-
-<P></P>
-<P>
-To summarize:
-</P>
+<H2>Embedded formats</H2>
 <UL>
-<LI>going from GF to gfcc is a major win in both code size and efficiency
-<LI>going from Haskell to C++ interpreter is not a win yet, because of a space
-  leak in the C++ version
+<LI>JavaScript: compiler of linearization and abstract syntax
+<P></P>
+<LI>Haskell: compiler of abstract syntax and interpreter with parsing,
+  linearization, and generation
+<P></P>
+<LI>C: compiler of linearization (old GFCC)
+<P></P>
+<LI>C++: embedded interpreter supporting linearization (old GFCC)
 </UL>

-<A NAME="toc17"></A>
 <H2>Some things to do</H2>
 <P>
+Support for dependent types, higher-order abstract syntax, and
+semantic definition in GFCC generation and interpreters.
+</P>
+<P>
+Replacing the entire GF shell by one based on GFCC.
+</P>
+<P>
 Interpreter in Java.
 </P>
-<P>
-Parsing via MCFG 
-</P>
-<UL>
-<LI>the FCFG format can possibly be simplified
-<LI>parser grammars should be saved in files to make interpreters easier
-</UL>
-
 <P>
 Hand-written parsers for GFCC grammars to reduce code size
 (and efficiency?) of interpreters.
@@ -838,5 +805,5 @@ word-suffix sharing better (depth-one tables, as in FM).
 </P>

 <!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
-<!-- cmdline: txt2tags -thtml -\-toc gfcc.txt -->
+<!-- cmdline: txt2tags -thtml gfcc.txt -->
 </BODY></HTML>
--- a/src/GF/GFCC/doc/gfcc.txt
+++ b/src/GF/GFCC/doc/gfcc.txt
@@ -1,6 +1,6 @@
 The GFCC Grammar Format
 Aarne Ranta
-October 19, 2006
+October 5, 2007

 Author's address:
 [``http://www.cs.chalmers.se/~aarne`` http://www.cs.chalmers.se/~aarne]
@@ -8,6 +8,7 @@ Author's address:
 % to compile: txt2tags -thtml --toc gfcc.txt

 History:
+- 5 Oct 2007: new, better structured GFCC with full expressive power
 - 19 Oct: translation of lincats, new figures on C++
 - 3 Oct 2006: first version

@@ -22,13 +23,15 @@ advantages:
 - simple definition of interpreters


-The idea is that all embedded GF applications are compiled to GFCC.
+Thus we also want to call GFCC the **portable grammar format**.
+
+The idea is that all embedded GF applications use GFCC.
 The GF system would be primarily used as a compiler and as a grammar
 development tool.

 Since GFCC is implemented in BNFC, a parser of the format is readily
-available for C, C++, Haskell, Java, and OCaml. Also an XML 
-representation is generated in BNFC. A 
+available for C, C++, C#, Haskell, Java, and OCaml. An XML
+representation can also be generated by BNFC. A
 [reference implementation ../]
 of linearization and some other functions has been written in Haskell.

@@ -44,61 +47,68 @@ such as type annotations, which is only needed in compilation and not at
 run-time. In particular, the pattern matching syntax and semantics of GFC is
 complex and therefore difficult to implement in new platforms.

-The main differences of GFCC compared with GFC can be summarized as follows:
+In fact, GFC is also planned to be dropped as the target format of
+separate compilation, where plain GF (type-annotated and partially evaluated)
+will be used instead. GFC provides only marginal advantages as a target format
+compared with GF, so keeping it around
+is just extra weight.
+
+The main differences of GFCC compared with GFC (and GF) can be summarized as follows:
 - there are no modules, and therefore no qualified names
 - a GFCC grammar is multilingual, and consists of a common abstract syntax 
  together with one concrete syntax per language
 - records and tables are replaced by arrays
 - record labels and parameter values are replaced by integers
 - record projection and table selection are replaced by array indexing
- there is (so far) no support for dependent types or higher-order abstract
-  syntax (which would be easy to add, but make interpreters much more difficult
-  to write)
+- even though the format does support dependent types and higher-order abstract
+  syntax, there is no interpreter yet that supports them
+


 Here is an example of a GF grammar, consisting of three modules, 
-as translated to GFCC. The representations are aligned, with the exceptions
-due to the alphabetical sorting of GFCC grammars.
+as translated to GFCC. The representations are aligned, so they do not exactly
+reflect the order of judgements in GFCC files, where the blocks of judgements
+come in a different order and are sorted alphabetically.
 ```  
                                    grammar Ex(Eng,Swe);

 abstract Ex = {                     abstract {
-  cat 
-    S ; NP ; VP ;
-  fun 
-    Pred : NP -> VP -> S ;            Pred : NP,VP -> S = (Pred);
-    She, They : NP ;                  She : -> NP = (She);
-    Sleep : VP ;                      Sleep : -> VP = (Sleep); 
-                                      They : -> NP = (They);
+  cat                                 cat
+    S ; NP ; VP ;                      NP[]; S[]; VP[];
+  fun                                 fun
+    Pred : NP -> VP -> S ;             Pred=[(($ 0! 1),(($ 1! 0)!($ 0! 0)))];
+    She, They : NP ;                   She=[0,"she"];
+    Sleep : VP ;                       They=[1,"they"];
+                                       Sleep=[["sleeps","sleep"]];
 }                                     } ;
                                    
 concrete Eng of Ex = {              concrete Eng {
-  lincat
-    S  = {s : Str} ;
-    NP = {s : Str ; n : Num} ;
-    VP = {s : Num => Str} ;
+  lincat                             lincat
+    S  = {s : Str} ;                  S=[()];
+    NP = {s : Str ; n : Num} ;        NP=[1,()];
+    VP = {s : Num => Str} ;           VP=[[(),()]];
  param
    Num = Sg | Pl ;
-  lin
-    Pred np vp = {                    Pred = [(($0!1),(($1!0)!($0!0)))];
+  lin                                lin
+    Pred np vp = {                    Pred=[(($ 0! 1),(($ 1! 0)!($ 0! 0)))];
      s = np.s ++ vp.s ! np.n} ;      
-    She = {s = "she" ; n = Sg} ;      She = [0, "she"];
-    They = {s = "they" ; n = Pl} ;    
-    Sleep = {s = table {              Sleep = [("sleep" + ["s",""])];
+    She = {s = "she" ; n = Sg} ;      She=[0,"she"];
+    They = {s = "they" ; n = Pl} ;    They = [1, "they"];
+    Sleep = {s = table {              Sleep=[["sleeps","sleep"]];
      Sg => "sleeps" ; 
-      Pl => "sleep"                   They = [1, "they"];
-      }                               } ;
+      Pl => "sleep"                   
+      }                               
    } ;
-}
+}                                   } ;

 concrete Swe of Ex = {              concrete Swe {
-  lincat
-    S  = {s : Str} ;
-    NP = {s : Str} ;
-    VP = {s : Str} ;
+  lincat                             lincat
+    S  = {s : Str} ;                  S=[()];
+    NP = {s : Str} ;                  NP=[()];
+    VP = {s : Str} ;                  VP=[()];
  param
    Num = Sg | Pl ;
-  lin
+  lin                                lin
    Pred np vp = {                    Pred = [(($0!0),($1!0))];
      s = np.s ++ vp.s} ;
    She = {s = "hon"} ;               She = ["hon"];
@@ -109,6 +119,11 @@ concrete Swe of Ex = {              concrete Swe {

 ==The syntax of GFCC files==

+The complete BNFC grammar, from which
+the rules in this section are taken, is in the file
+[``GF/GFCC/GFCC.cf`` ../GFCC.cf].
+
+
 ===Top level===

 A grammar has a header telling the name of the abstract syntax
@@ -116,21 +131,40 @@ A grammar has a header telling the name of the abstract syntax
 the concrete languages. The abstract syntax and the concrete
 syntaxes themselves follow.
 ```
-  Grammar  ::= Header ";" Abstract ";" [Concrete] ;
-  Header   ::= "grammar" CId "(" [CId] ")" ;
-  Abstract ::= "abstract" "{" [AbsDef] "}" ;
-  Concrete ::= "concrete" CId "{" [CncDef] "}" ;
+  Grm. Grammar  ::= 
+    "grammar" CId "(" [CId] ")" ";" 
+    Abstract ";" 
+    [Concrete] ;
+
+  Abs. Abstract ::= 
+    "abstract" "{" 
+      "flags" [Flag] 
+      "fun"   [FunDef] 
+      "cat"   [CatDef] 
+    "}" ;
+
+  Cnc. Concrete ::= 
+    "concrete" CId "{" 
+      "flags"  [Flag] 
+      "lin"    [LinDef] 
+      "oper"   [LinDef] 
+      "lincat" [LinDef] 
+      "lindef" [LinDef] 
+      "printname" [LinDef]
+    "}" ;
 ```
-Abstract syntax judgements give typings and semantic definitions.
-Concrete syntax judgements give linearizations.
+This syntax organizes each module into a sequence of **fields**, such
+as flags, linearizations, operations, and linearization types.
+It is envisaged that particular applications can ignore some
+of the fields; the fields are ordered so that earlier ones are
+typically more important than later ones.
+
+The judgement forms have the following syntax.
 ```
-  AbsDef   ::= CId ":" Type "=" Exp ;
-  CncDef   ::= CId "=" Term ;
-```
-Also flags are possible, local to each "module" (i.e. abstract and concretes).
-```
-  AbsDef   ::= "%" CId "=" String ;
-  CncDef   ::= "%" CId "=" String ;
+  Flg. Flag     ::= CId "=" String ;
+  Cat. CatDef   ::= CId "[" [Hypo] "]" ;
+  Fun. FunDef   ::= CId ":" Type "=" Exp ;
+  Lin. LinDef   ::= CId "=" Term ;
 ```
 For the run-time system, the reference implementation in Haskell
 uses a structure that gives efficient look-up:
@@ -143,30 +177,74 @@ uses a structure that gives efficient look-up:
    }

  data Abstr = Abstr {
-    funs :: Map CId Type,   -- find the type of a fun
-    cats :: Map CId [CId]   -- find the funs giving a cat
+    aflags  :: Map CId String,     -- value of a flag
+    funs    :: Map CId (Type,Exp), -- type and def of a fun
+    cats    :: Map CId [Hypo],     -- context of a cat
+    catfuns :: Map CId [CId]       -- funs yielding a cat (redundant, for fast lookup)
    }

-  type Concr = Map CId Term
+  data Concr = Concr {
+    flags   :: Map CId String, -- value of a flag
+    lins    :: Map CId Term,   -- lin of a fun
+    opers   :: Map CId Term,   -- oper generated by subex elim
+    lincats :: Map CId Term,   -- lin type of a cat
+    lindefs :: Map CId Term,   -- lin default of a cat
+    printnames :: Map CId Term -- printname of a cat or a fun
+    }
+```
+These definitions are from [``GF/GFCC/DataGFCC.hs`` ../DataGFCC.hs].
+
+Identifiers (``CId``) are like ``Ident`` in GF, except that
+the compiler produces constants prefixed with ``_`` in
+the common subterm elimination optimization.
+```
+  token CId (('_' | letter) (letter | digit | '\'' | '_')*) ;
 ```


 ===Abstract syntax===

-Types are first-order function types built from
+Types are first-order function types whose argument type
+contexts and value types are built from
 category symbols. Syntax trees (``Exp``) are
-rose trees with the head (``Atom``) either a function
-constant, a metavariable, or a string, integer, or float
+rose trees with nodes consisting of a head (``Atom``) and
+bound variables (``CId``). 
+```
+  DTyp. Type  ::= "[" [Hypo] "]" CId [Exp] ;        
+  DTr.  Exp   ::= "[" "(" [CId] ")" Atom [Exp] "]" ;
+  Hyp.  Hypo  ::= CId ":" Type ;
+```
+The head ``Atom`` is either a function
+constant, a bound variable, a metavariable, or a string, integer, or float
 literal.
 ```
-  Type     ::= [CId] "->" CId ;
-  Exp      ::= "(" Atom [Exp] ")" ;
-  Atom     ::= CId ;        -- function constant
-  Atom     ::= "?" ;        -- metavariable
-  Atom     ::= String ;     -- string literal
-  Atom     ::= Integer ;    -- integer literal
-  Atom     ::= Double ;     -- float literal
+  AC.   Atom  ::= CId ;
+  AS.   Atom  ::= String ;
+  AI.   Atom  ::= Integer ;
+  AF.   Atom  ::= Double ;
+  AM.   Atom  ::= "?" Integer ;
 ```
+The context-free types and trees of the "old GFCC" are special
+cases, which can be defined as follows:
+```
+  Typ.  Type  ::= [CId] "->" CId
+  Typ args val = DTyp [Hyp (CId "_") (DTyp [] arg []) | arg <- args] val []
+
+  Tr.   Exp   ::= "(" CId [Exp] ")"
+  Tr fun exps  = DTr [] (AC fun) exps
+```
+To store semantic (``def``) definitions by cases, the following expression
+form is provided, but it is only meaningful in the last field of a function
+declaration in an abstract syntax:
+```
+  EEq. Exp      ::= "{" [Equation] "}" ;
+  Equ. Equation ::= [Exp] "->" Exp ;
+```
+Notice that expressions are used to encode patterns. Primitive notions
+(the default semantics in GF) are encoded as empty sets of equations
+(``[]``). For a constructor (canonical form) of a category ``C``, we
+aim to use the encoding as the application ``(_constr C)``.
+


 ===Concrete syntax===
@@ -175,12 +253,12 @@ Linearization terms (``Term``) are built as follows.
 Constructor names are shown to make the later code
 examples readable.
 ```
-  R.  Term ::= "[" [Term] "]" ;        -- array
-  P.  Term ::= "(" Term "!" Term ")" ; -- access to indexed field
-  S.  Term ::= "(" [Term] ")" ;        -- sequence with ++
+  R.  Term ::= "[" [Term] "]" ;        -- array (record/table)
+  P.  Term ::= "(" Term "!" Term ")" ; -- access to field (projection/selection)
+  S.  Term ::= "(" [Term] ")" ;        -- concatenated sequence
  K.  Term ::= Tokn ;                  -- token
-  V.  Term ::= "$" Integer ;           -- argument
-  C.  Term ::= Integer ;               -- array index
+  V.  Term ::= "$" Integer ;           -- argument (subtree)
+  C.  Term ::= Integer ;               -- array index (label/parameter value)
  FV. Term ::= "[|" [Term] "|]" ;      -- free variation
  TM. Term ::= "?" ;                   -- linearization of metavariable
 ```
@@ -191,25 +269,27 @@ variant lists.
  KP.  Tokn     ::= "[" "pre" [String] "[" [Variant] "]" "]" ;
  Var. Variant  ::= [String] "/" [String] ;
 ```
-Three special forms of terms are introduced by the compiler
+Two special forms of terms are introduced by the compiler
 as optimizations. They can in principle be eliminated, but
 their presence makes grammars much more compact. Their semantics
 will be explained in a later section.
 ```
  F.  Term ::= CId ;                     -- global constant
  W.  Term ::= "(" String "+" Term ")" ; -- prefix + suffix table
-  RP. Term ::= "(" Term "@" Term ")";    -- record parameter alias
 ```
-Identifiers are like ``Ident`` in GF and GFC, except that
-the compiler produces constants prefixed with ``_`` in
-the common subterm elimination optimization.
+There is also a deprecated form of "record parameter alias",
 ```
-  token CId (('_' | letter) (letter | digit | '\'' | '_')*) ;
+  RP. Term ::= "(" Term "@" Term ")";    -- DEPRECATED
 ```
+which will be removed when the migration to new GFCC is complete.
+


 ==The semantics of concrete syntax terms==

+The code in this section is from [``GF/GFCC/Linearize.hs`` ../Linearize.hs].
+
+
 ===Linearization and realization===

 The linearization algorithm is essentially the same as in
@@ -220,17 +300,19 @@ The function also needs to know the language (i.e. concrete syntax)
 in which linearization is performed.
 ```
  linExp :: GFCC -> CId -> Exp -> Term
-  linExp mcfg lang tree@(Tr at trees) = case at of
+  linExp gfcc lang tree@(DTr _ at trees) = case at of
    AC fun -> comp (Prelude.map lin trees) $ look fun
    AS s   -> R [kks (show s)] -- quoted
    AI i   -> R [kks (show i)]
    AF d   -> R [kks (show d)]
    AM     -> TM
   where
-     lin  = linExp mcfg lang
-     comp = compute mcfg lang
-     look = lookLin mcfg lang
+     lin  = linExp gfcc lang
+     comp = compute gfcc lang
+     look = lookLin gfcc lang
 ```
+TODO: bindings must be supported.
+
 The result of linearization is usually a record, which is realized as
 a string using the following algorithm.
 ```
@@ -244,10 +326,11 @@ a string using the following algorithm.
    FV (t:_) -> realize t
    TM       -> "?"
 ```
-Since the order of record fields is not necessarily
-the same as in GF source,
-this realization does not work securely for
-categories whose lincats more than one field.
+Notice that realization always picks the first field of a record.
+If a linearization type has more than one field, the first field
+does not necessarily contain the desired string.
+Also notice that the order of record fields in GFCC is not necessarily
+the same as in GF source.


 ===Term evaluation===
@@ -263,10 +346,9 @@ enable reimplementations in languages that do not permit
 deep patterns (such as Java and C++).
 ```
 compute :: GFCC -> CId -> [Term] -> Term -> Term
-compute mcfg lang args = comp where
+compute gfcc lang args = comp where
  comp trm = case trm of
    P r p  -> proj (comp r) (comp p)
-    RP i t -> RP (comp i) (comp t)
    W s t  -> W s (comp t)
    R ts   -> R $ Prelude.map comp ts
    V i    -> idx args (fromInteger i)  -- already computed
@@ -275,7 +357,7 @@ compute mcfg lang args = comp where
    S ts   -> S $ Prelude.filter (/= S []) $ Prelude.map comp ts
    _ -> trm

-  look = lookLin mcfg lang
+  look = lookOper gfcc lang

  idx xs i = xs !! i

@@ -311,12 +393,12 @@ Global constants
  Term ::= CId ;
 ```
 are shorthands for complex terms. They are produced by the
-compiler by (iterated) common subexpression elimination.
+compiler by (iterated) **common subexpression elimination**.
 They are often more powerful than hand-devised code sharing in the source
 code. They could be computed off-line by replacing each identifier by 
 its definition.

-Prefix-suffix tables 
+**Prefix-suffix tables**
 ```
  Term ::= "(" String "+" Term ")" ; 
 ```
@@ -339,40 +421,6 @@ explains the used syntax rather than the more accurate
 since we want the suffix part to be a ``Term`` for the optimization to
 take effect.

-The most curious construct of GFCC is the parameter array alias, 
-```
-  Term ::= "(" Term "@" Term ")";
-```
-This form is used as the value of parameter records, such as the type
-```
-  {n : Number ; p : Person}
-```
-The problem with parameter records is their double role.
-They can be used like parameter values, as indices in selection,
-```
-  VP.s ! {n = Sg ; p = P3}
-```
-but also as records, from which parameters can be projected:
-```
-  {n = Sg ; p = P3}.n
-```
-Whichever use is selected as primary, a prohibitively complex
-case expression must be generated at compilation to GFCC to get the
-other use. The adopted
-solution is to generate a pair containing both a parameter value index 
-and an array of indices of record fields. For instance, if we have
-```
-  param Number = Sg | Pl ; Person = P1 | P2 | P3 ;
-```
-we get the encoding
-```
-  {n = Sg ; p = P3}  ---> (2 @ [0,2])
-```
-The GFCC computation rules are essentially
-```
-  (t ! (i @ _)) = (t ! i)
-  ((_ @ r) ! j)  =(r ! j)
-```


 ==Compiling to GFCC==
@@ -383,31 +431,26 @@ however, it might be interesting to know what happens to the grammars
 in the process.

 The compilation phases are the following
-+ translate GF source to GFC, as always in GF
-+ undo GFC back-end optimizations
-+ perform the ``values`` optimization to normalize tables
-+ create a symbol table mapping the GFC parameter and record types to
+ type check and partially evaluate GF source
+ create a symbol table mapping the GF parameter and record types to
  fixed-size arrays, and parameter values and record labels to integers
 + traverse the linearization rules replacing parameters and labels by integers
-+ reorganize the created GFC grammar so that it has just one abstract syntax
+ reorganize the created GF grammar so that it has just one abstract syntax
  and one concrete syntax per language
-+ apply UTF8 encoding to the grammar, if not yet applied (this is told by the
+ TODO: apply UTF8 encoding to the grammar, if not yet applied (this is told by the
  ``coding`` flag)
-+ translate the GFC syntax tree to a GFCC syntax tree, using a simple
+ translate the GF grammar object to a GFCC grammar object, using a simple
  compositional mapping
 + perform the word-suffix optimization on GFCC linearization terms
 + perform subexpression elimination on each concrete syntax module
 + print out the GFCC code


-Notice that a major part of the compilation is done within GFC, so that
-GFC-related tasks (such as parser generation) could be performed by
-using the old algorithms.


 ===Problems in GFCC compilation===

-Two major problems had to be solved in compiling GFC to GFCC:
+Two major problems had to be solved in compiling GF to GFCC:
 - consistent order of tables and records, to permit the array translation
 - run-time variables in complex parameter values.

@@ -416,16 +459,11 @@ The current implementation is still experimental and may fail
 to generate correct code. Any errors remaining are likely to be 
 related to the two problems just mentioned.

-The order problem is solved in different ways for tables and records.
-For tables, the ``values`` optimization of GFC already manages to
-maintain a canonical order. But this order can be destroyed by the
-``share`` optimization. To make sure that GFCC compilation works properly,
-it is safest to recompile the GF grammar by using the ``values``
-optimization flag.
-
-Records can be canonically ordered by sorting them by labels.
-In fact, this was done in connection of the GFCC work as a part
-of the GFC generation, to guarantee consistency. This means that
+The order problem is solved in slightly different ways for tables and records.
+In both cases, **eta expansion** is used to establish a
+canonical order. Tables are ordered by applying the preorder induced
+by ``param`` definitions. Records are ordered by sorting them by labels.
+This means that
 e.g. the ``s`` field will in general no longer appear as the first
 field, even if it does so in the GF source code. But relying on the
 order of fields in a labelled record would be misplaced anyway.
@@ -434,7 +472,7 @@ The canonical form of records is further complicated by lock fields,
 i.e. dummy fields of form ``lock_C = <>``, which are added to grammar
 libraries to force intensionality of linearization types. The problem
 is that the absence of a lock field only generates a warning, not
-an error. Therefore a GFC grammar can contain objects of the same
+an error. Therefore a GF grammar can contain objects of the same
 type with and without a lock field. This problem was solved in GFCC
 generation by just removing all lock fields (defined as fields whose
 type is the empty record type). This has the further advantage of
@@ -503,8 +541,18 @@ a case expression,
    5 => 1
    }
 ```
-To avoid the code bloat resulting from this, we chose the alias representation
-which is easy enough to deal with in interpreters.
+To avoid the code bloat resulting from this, we have chosen to
+deal with records by a **currying** transformation:
+```
+  table {n : Number ; p : Person} {... ...}
+   ===>
+  table Number {Sg => table Person {...} ; Pl => table Person {...}}
+```
+This is performed when GFCC is generated. Selections with
+records have to be treated likewise,
+```
+  t ! r   ===> t ! r.n ! r.p
+```


 ===The representation of linearization types===
@@ -515,13 +563,11 @@ GFCC. The linearization type definitions are shown as a part of the
 concrete syntax, by using terms to represent types. Here is the table
 showing how different linearization types are encoded.
 ```
-  P*                         = size(P)        -- parameter type              
-  {_ : I ; __ : R}*          = (I* @ R*)      -- record of parameters
-  {r1 : T1 ; ... ; rn : Tn}* = [T1*,...,Tn*]  -- other record
-  (P => T)*                  = [T* ,...,T*]   -- size(P) times
+  P*                         = max(P)         -- parameter type
+  {r1 : T1 ; ... ; rn : Tn}* = [T1*,...,Tn*]  -- record
+  (P => T)*                  = [T* ,...,T*]   -- table, size(P) cases
  Str*                       = ()
 ```
-The category symbols are prefixed with two underscores (``__``).
 For example, the linearization type ``present/CatEng.NP`` is
 translated as follows:
 ```
@@ -533,7 +579,7 @@ translated as follows:
    s : {ResEng.Case} => Str  -- 3 values
  }

-  __NP = [(6@[2,3]),[(),(),()]]
+  __NP = [[1,2],[(),(),()]]
 ```


@@ -547,8 +593,7 @@ of GF since September 2006. To invoke the compiler, the flag
 ``-printer=gfcc`` to the command
 ``pm = print_multi`` is used. It is wise to recompile the grammar from
 source, since previously compiled libraries may not obey the canonical
-order of records. To ``strip`` the grammar before
-GFCC translation removes unnecessary interface references.
+order of records. 
 Here is an example, performed in
 [example/bronzeage ../../../../../examples/bronzeage].
 ```
@@ -557,6 +602,16 @@ Here is an example, performed in
  strip
  pm -printer=gfcc | wf bronze.gfcc
 ```
+There is also an experimental batch compiler, which does not use the GFC
+format or the record aliases. It can be produced by
+```
+  make gfc
+```
+in ``GF/src``, and invoked by
+```
+  gfc --make FILES
+```
+



@@ -568,20 +623,33 @@ The reference interpreter written in Haskell consists of the following files:
  GFCC.cf       -- labelled BNF grammar of gfcc

  -- files generated by BNFC
-  AbsGFCC.hs    -- abstrac syntax of gfcc
+  AbsGFCC.hs    -- abstract syntax datatypes
  ErrM.hs       -- error monad used internally
  LexGFCC.hs    -- lexer of gfcc files
  ParGFCC.hs    -- parser of gfcc files and syntax trees
  PrintGFCC.hs  -- printer of gfcc files and syntax trees

  -- hand-written files
-  DataGFCC.hs   -- post-parser grammar creation, linearization and evaluation
-  GenGFCC.hs    -- random and exhaustive generation, generate-and-test parsing
-  RunGFCC.hs    -- main function - a simple command interpreter
+  DataGFCC.hs   -- grammar datatype, post-parser grammar creation
+  Linearize.hs  -- linearization and evaluation
+  Macros.hs     -- utilities abstracting away from GFCC datatypes
+  Generate.hs   -- random and exhaustive generation, generate-and-test parsing
+  API.hs        -- functionalities accessible in embedded GF applications
+  Generate.hs   -- random and exhaustive generation
+  Shell.hs      -- main function - a simple command interpreter
 ```
 It is included in the
 [developers' version http://www.cs.chalmers.se/Cs/Research/Language-technology/darcs/GF/doc/darcs.html]
-of GF, in the subdirectory [``GF/src/GF/Canon/GFCC`` ../].
+of GF, in the subdirectories [``GF/src/GF/GFCC`` ../] and 
+[``GF/src/GF/Devel`` ../../Devel].
+
+As of September 2007, default parsing in main GF uses GFCC (implemented by Krasimir
+Angelov). The interpreter uses the relevant modules
+```
+  GF/Conversions/SimpleToFCFG.hs  -- generate parser from GFCC
+  GF/Parsing/FCFG.hs              -- run the parser
+```
+

 To compile the interpreter, type
 ```
@@ -600,48 +668,34 @@ The available commands are
  and show their linearizations in all languages
 - ``gtt <Cat> <Int>``:  generate a number of trees in category from smallest,
  and show the trees and their linearizations in all languages
- ``p <Int> <Cat> <String>``: "parse", i.e. generate trees until match or 
-  until the given number have been generated
- ``<Tree>``: linearize tree in all languages, also showing full records
- ``quit``: terminate the system cleanly
-
-
-==Interpreter in C++==
-
-A base-line interpreter in C++ has been started.
-Its main functionality is random generation of trees and linearization of them.
-
-Here are some results from running the different interpreters, compared
-to running the same grammar in GF, saved in ``.gfcm`` format.
-The grammar contains the English, German, and Norwegian
-versions of Bronzeage. The experiment was carried out on
-Ubuntu Linux laptop with 1.5 GHz Intel centrino processor.
-
-||                | GF        | gfcc(hs) | gfcc++ |
-| program size    |   7249k   |   803k   |  113k
-| grammar size    |    336k   |  119k    |  119k
-| read grammar    |   1150ms  |  510ms   |  100ms
-| generate 222    |   9500ms  |  450ms   |  800ms
-| memory          |     21M   |   10M    |   20M
+- ``p <Lang> <Cat> <String>``: parse a string into a set of trees
+- ``lin <Tree>``: linearize tree in all languages, also showing full records
+- ``q``: terminate the system cleanly



-To summarize:
- going from GF to gfcc is a major win in both code size and efficiency
- going from Haskell to C++ interpreter is not a win yet, because of a space
-  leak in the C++ version
+==Embedded formats==
+
+- JavaScript: compiler of linearization and abstract syntax
+
+- Haskell: compiler of abstract syntax and interpreter with parsing,
+  linearization, and generation
+
+- C: compiler of linearization (old GFCC)
+
+- C++: embedded interpreter supporting linearization (old GFCC)



 ==Some things to do==

+Support for dependent types, higher-order abstract syntax, and
+semantic definition in GFCC generation and interpreters.
+
+Replacing the entire GF shell by one based on GFCC.
+
 Interpreter in Java.

-Parsing via MCFG 
- the FCFG format can possibly be simplified
- parser grammars should be saved in files to make interpreters easier
-
-
 Hand-written parsers for GFCC grammars to reduce code size
 (and efficiency?) of interpreters.

@@ -652,5 +706,3 @@ Syntax editor based on GFCC.
 Rewriting of resource libraries in order to exploit the
 word-suffix sharing better (depth-one tables, as in FM).

-
-
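The currying transformation that the gfcc.txt changes describe (a table over a parameter record becomes nested tables, and `t ! r` becomes `t ! r.n ! r.p`) can be illustrated with a minimal Haskell sketch. This is not code from the GF sources: the two-constructor `Term` type and the names `curryTable`, `select` and `chunksOf` are hypothetical, and the sketch assumes that the cases of the flat table are ordered lexicographically by the record fields.

```haskell
module Main where

-- Hypothetical miniature of GFCC terms: only arrays and tokens.
data Term = R [Term] | K String
  deriving (Eq, Show)

-- Split a list into consecutive chunks of length n.
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | n <= 0 || null xs = []
  | otherwise         = take n xs : chunksOf n (drop n xs)

-- Curry a flat table over a parameter record into nested arrays.
-- The first argument lists the number of values of each record field;
-- the cases are assumed to be in lexicographic order of the fields.
curryTable :: [Int] -> [Term] -> Term
curryTable []     [t] = t
curryTable []     _   = error "curryTable: sizes do not match the cases"
curryTable (_:ss) ts  = R [curryTable ss c | c <- chunksOf (product ss) ts]

-- t ! r  ===>  t ! r.n ! r.p : one array access per record field.
select :: Term -> [Int] -> Term
select = foldl step
  where
    step (R ts) i = ts !! i
    step _      _ = error "select: not an array"

main :: IO ()
main = do
  let flat   = [K (n ++ "," ++ p) | n <- ["Sg", "Pl"], p <- ["P1", "P2", "P3"]]
      nested = curryTable [2, 3] flat
  -- Look up the case for {n = Sg ; p = P3}, i.e. field indices 0 and 2.
  print (select nested [0, 2])
```

With `Number = Sg | Pl` and `Person = P1 | P2 | P3`, the six flat cases become two arrays of three cases each, so an interpreter needs nothing beyond plain array indexing to select with a parameter record.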
--- a/src/GF/GFCC/doc/old-gfcc.txt
+++ b/src/GF/GFCC/doc/old-gfcc.txt
@@ -0,0 +1,656 @@
+The GFCC Grammar Format
+Aarne Ranta
+October 19, 2006
+
+Author's address:
+[``http://www.cs.chalmers.se/~aarne`` http://www.cs.chalmers.se/~aarne]
+
+% to compile: txt2tags -thtml --toc gfcc.txt
+
+History:
+- 19 Oct: translation of lincats, new figures on C++
+- 3 Oct 2006: first version
+
+
+==What is GFCC==
+
+GFCC is a low-level format for GF grammars. Its aim is to contain the minimum
+that is needed to process GF grammars at runtime. This minimality has three
+advantages:
+- compact grammar files and run-time objects
+- time and space efficient processing
+- simple definition of interpreters
+
+
+The idea is that all embedded GF applications are compiled to GFCC.
+The GF system would be primarily used as a compiler and as a grammar
+development tool.
+
+Since GFCC is implemented in BNFC, a parser of the format is readily
+available for C, C++, Haskell, Java, and OCaml. Also an XML 
+representation is generated in BNFC. A 
+[reference implementation ../]
+of linearization and some other functions has been written in Haskell.
+
+
+==GFCC vs. GFC==
+
+GFCC is aimed to replace GFC as the run-time grammar format. GFC was designed
+to be a run-time format, but also to
+support separate compilation of grammars, i.e.
+to store the results of compiling 
+individual GF modules. But this means that GFC has to contain extra information,
+such as type annotations, which is only needed in compilation and not at
+run-time. In particular, the pattern matching syntax and semantics of GFC is
+complex and therefore difficult to implement in new platforms.
+
+The main differences of GFCC compared with GFC can be summarized as follows:
+- there are no modules, and therefore no qualified names
+- a GFCC grammar is multilingual, and consists of a common abstract syntax 
+  together with one concrete syntax per language
+- records and tables are replaced by arrays
+- record labels and parameter values are replaced by integers
+- record projection and table selection are replaced by array indexing
+- there is (so far) no support for dependent types or higher-order abstract
+  syntax (which would be easy to add, but make interpreters much more difficult
+  to write)
+
+
+Here is an example of a GF grammar, consisting of three modules, 
+as translated to GFCC. The representations are aligned, with the exceptions
+due to the alphabetical sorting of GFCC grammars.
+```  
+                                    grammar Ex(Eng,Swe);
+
+abstract Ex = {                     abstract {
+  cat 
+    S ; NP ; VP ;
+  fun 
+    Pred : NP -> VP -> S ;            Pred : NP,VP -> S = (Pred);
+    She, They : NP ;                  She : -> NP = (She);
+    Sleep : VP ;                      Sleep : -> VP = (Sleep); 
+                                      They : -> NP = (They);
+}                                     } ;
+                                    
+concrete Eng of Ex = {              concrete Eng {
+  lincat
+    S  = {s : Str} ;
+    NP = {s : Str ; n : Num} ;
+    VP = {s : Num => Str} ;
+  param
+    Num = Sg | Pl ;
+  lin
+    Pred np vp = {                    Pred = [(($0!1),(($1!0)!($0!0)))];
+      s = np.s ++ vp.s ! np.n} ;      
+    She = {s = "she" ; n = Sg} ;      She = [0, "she"];
+    They = {s = "they" ; n = Pl} ;    
+    Sleep = {s = table {              Sleep = [("sleep" + ["s",""])];
+      Sg => "sleeps" ; 
+      Pl => "sleep"                   They = [1, "they"];
+      }                               } ;
+    } ;
+}
+
+concrete Swe of Ex = {              concrete Swe {
+  lincat
+    S  = {s : Str} ;
+    NP = {s : Str} ;
+    VP = {s : Str} ;
+  param
+    Num = Sg | Pl ;
+  lin
+    Pred np vp = {                    Pred = [(($0!0),($1!0))];
+      s = np.s ++ vp.s} ;
+    She = {s = "hon"} ;               She = ["hon"];
+    They = {s = "de"} ;               They = ["de"];
+    Sleep = {s = "sover"} ;           Sleep = ["sover"];
+}                                     } ;                                   
+```
+
+==The syntax of GFCC files==
+
+===Top level===
+
+A grammar has a header telling the name of the abstract syntax
+(often specifying an application domain), and the names of
+the concrete languages. The abstract syntax and the concrete
+syntaxes themselves follow.
+```
+  Grammar  ::= Header ";" Abstract ";" [Concrete] ;
+  Header   ::= "grammar" CId "(" [CId] ")" ;
+  Abstract ::= "abstract" "{" [AbsDef] "}" ;
+  Concrete ::= "concrete" CId "{" [CncDef] "}" ;
+```
+Abstract syntax judgements give typings and semantic definitions.
+Concrete syntax judgements give linearizations.
+```
+  AbsDef   ::= CId ":" Type "=" Exp ;
+  CncDef   ::= CId "=" Term ;
+```
+Flags are also possible. They are local to each "module" (i.e. to the abstract
+syntax and to each concrete syntax).
+```
+  AbsDef   ::= "%" CId "=" String ;
+  CncDef   ::= "%" CId "=" String ;
+```
+For the run-time system, the reference implementation in Haskell
+uses a structure that gives efficient look-up:
+```
+  data GFCC = GFCC {
+    absname   :: CId ,
+    cncnames  :: [CId] ,
+    abstract  :: Abstr ,
+    concretes :: Map CId Concr
+    }
+
+  data Abstr = Abstr {
+    funs :: Map CId Type,   -- find the type of a fun
+    cats :: Map CId [CId]   -- find the funs giving a cat
+    }
+
+  type Concr = Map CId Term
+```
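+As an illustration of how such a structure could be filled in, the ``cats``
+index can be derived from the ``funs`` map by grouping functions on their
+value category. This is only a sketch: ``CId`` is simplified to ``String``,
+and the ``Typ`` constructor and the names ``mkCats`` and ``exFuns`` are
+assumptions made for the example, not taken from the reference implementation.
+```
+import qualified Data.Map as Map
+
+type CId = String
+
+-- first-order type: argument categories and value category (assumed shape)
+data Type = Typ [CId] CId
+
+-- group the function names by their value category
+mkCats :: Map.Map CId Type -> Map.Map CId [CId]
+mkCats funs = Map.fromListWith (flip (++))
+  [(cat, [f]) | (f, Typ _ cat) <- Map.toList funs]
+
+-- the abstract syntax of the example grammar Ex
+exFuns :: Map.Map CId Type
+exFuns = Map.fromList
+  [ ("Pred",  Typ ["NP","VP"] "S")
+  , ("She",   Typ [] "NP")
+  , ("Sleep", Typ [] "VP")
+  , ("They",  Typ [] "NP") ]
+```
+With these definitions, ``Map.lookup "NP" (mkCats exFuns)`` gives
+``Just ["She","They"]``.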
+
+
+===Abstract syntax===
+
+Types are first-order function types built from
+category symbols. Syntax trees (``Exp``) are
+rose trees with the head (``Atom``) either a function
+constant, a metavariable, or a string, integer, or float
+literal.
+```
+  Type     ::= [CId] "->" CId ;
+  Exp      ::= "(" Atom [Exp] ")" ;
+  Atom     ::= CId ;        -- function constant
+  Atom     ::= "?" ;        -- metavariable
+  Atom     ::= String ;     -- string literal
+  Atom     ::= Integer ;    -- integer literal
+  Atom     ::= Double ;     -- float literal
+```
+
+
+===Concrete syntax===
+
+Linearization terms (``Term``) are built as follows.
+Constructor names are shown to make the later code
+examples readable.
+```
+  R.  Term ::= "[" [Term] "]" ;        -- array
+  P.  Term ::= "(" Term "!" Term ")" ; -- access to indexed field
+  S.  Term ::= "(" [Term] ")" ;        -- sequence with ++
+  K.  Term ::= Tokn ;                  -- token
+  V.  Term ::= "$" Integer ;           -- argument
+  C.  Term ::= Integer ;               -- array index
+  FV. Term ::= "[|" [Term] "|]" ;      -- free variation
+  TM. Term ::= "?" ;                   -- linearization of metavariable
+```
+Tokens are strings or (maybe obsolescent) prefix-dependent
+variant lists.
+```
+  KS.  Tokn     ::= String ;
+  KP.  Tokn     ::= "[" "pre" [String] "[" [Variant] "]" "]" ;
+  Var. Variant  ::= [String] "/" [String] ;
+```
+Three special forms of terms are introduced by the compiler
+as optimizations. They can in principle be eliminated, but
+their presence makes grammars much more compact. Their semantics
+will be explained in a later section.
+```
+  F.  Term ::= CId ;                     -- global constant
+  W.  Term ::= "(" String "+" Term ")" ; -- prefix + suffix table
+  RP. Term ::= "(" Term "@" Term ")";    -- record parameter alias
+```
+Identifiers are like ``Ident`` in GF and GFC, except that
+the compiler produces constants prefixed with ``_`` in
+the common subterm elimination optimization.
+```
+  token CId (('_' | letter) (letter | digit | '\'' | '_')*) ;
+```
+
+
+==The semantics of concrete syntax terms==
+
+===Linearization and realization===
+
+The linearization algorithm is essentially the same as in
+GFC: a tree is linearized by evaluating its linearization term
+in the environment of the linearizations of the subtrees.
+Literal atoms are linearized in the obvious way.
+The function also needs to know the language (i.e. concrete syntax)
+in which linearization is performed.
+```
+  linExp :: GFCC -> CId -> Exp -> Term
+  linExp mcfg lang tree@(Tr at trees) = case at of
+    AC fun -> comp (Prelude.map lin trees) $ look fun
+    AS s   -> R [kks (show s)] -- quoted
+    AI i   -> R [kks (show i)]
+    AF d   -> R [kks (show d)]
+    AM     -> TM
+   where
+     lin  = linExp mcfg lang
+     comp = compute mcfg lang
+     look = lookLin mcfg lang
+```
+The result of linearization is usually a record, which is realized as
+a string using the following algorithm.
+```
+  realize :: Term -> String
+  realize trm = case trm of
+    R (t:_)  -> realize t
+    S ss     -> unwords $ Prelude.map realize ss
+    K (KS s) -> s
+    K (KP s _) -> unwords s ---- prefix choice TODO
+    W s t    -> s ++ realize t
+    FV (t:_) -> realize t
+    TM       -> "?"
+```
+Since the order of record fields is not necessarily
+the same as in GF source,
+this realization does not work reliably for
+categories whose lincats have more than one field.
+
+
+===Term evaluation===
+
+Evaluation follows call-by-value order, with two environments
+needed:
+- the grammar (a concrete syntax) to give the global constants
+- an array of terms to give the subtree linearizations
+
+
+The code is presented in one-level pattern matching, to
+enable reimplementations in languages that do not permit
+deep patterns (such as Java and C++).
+```
+compute :: GFCC -> CId -> [Term] -> Term -> Term
+compute mcfg lang args = comp where
+  comp trm = case trm of
+    P r p  -> proj (comp r) (comp p)
+    RP i t -> RP (comp i) (comp t)
+    W s t  -> W s (comp t)
+    R ts   -> R $ Prelude.map comp ts
+    V i    -> idx args (fromInteger i)  -- already computed
+    F c    -> comp $ look c             -- not computed (if contains V)
+    FV ts  -> FV $ Prelude.map comp ts
+    S ts   -> S $ Prelude.filter (/= S []) $ Prelude.map comp ts
+    _ -> trm
+
+  look = lookLin mcfg lang
+
+  idx xs i = xs !! i
+
+  proj r p = case (r,p) of
+    (_,     FV ts) -> FV $ Prelude.map (proj r) ts
+    (W s t, _)     -> kks (s ++ getString (proj t p))
+    _              -> comp $ getField r (getIndex p)
+
+  getString t = case t of
+    K (KS s) -> s
+    _ -> trace ("ERROR in grammar compiler: string from "++ show t) "ERR"
+
+  getIndex t =  case t of
+    C i    -> fromInteger i
+    RP p _ -> getIndex p
+    TM     -> 0  -- default value for parameter
+    _ -> trace ("ERROR in grammar compiler: index from " ++ show t) 0
+
+  getField t i = case t of
+    R rs   -> idx rs i
+    RP _ r -> getField r i
+    TM     -> TM
+    _ -> trace ("ERROR in grammar compiler: field from " ++ show t) t
+```
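+To see the evaluation rules at work, here is a cut-down, self-contained
+variant of ``compute`` and ``realize`` applied to the English example
+grammar above. It is a sketch under simplifying assumptions: tokens are
+plain strings (``K String``), only the constructors needed by the example
+are included, and the names ``linPred``, ``linShe``, ``linThey``,
+``linSleep`` and ``sentence`` are introduced just for the illustration.
+```
+data Term = R [Term] | P Term Term | S [Term] | K String
+          | V Int | C Int | W String Term
+  deriving (Eq, Show)
+
+-- evaluate a term, given the linearizations of the subtrees
+compute :: [Term] -> Term -> Term
+compute args = comp where
+  comp trm = case trm of
+    P r p -> proj (comp r) (comp p)
+    W s t -> W s (comp t)
+    R ts  -> R (map comp ts)
+    V i   -> args !! i
+    S ts  -> S (filter (/= S []) (map comp ts))
+    _     -> trm
+  proj r p = case (r, p) of
+    (W s t, _)  -> case proj t p of
+      K suffix -> K (s ++ suffix)
+      t'       -> error ("string expected: " ++ show t')
+    (R ts, C i) -> comp (ts !! i)
+    _           -> error ("cannot project: " ++ show (r, p))
+
+-- turn an evaluated term into a string, using the first record field
+realize :: Term -> String
+realize trm = case trm of
+  R (t:_) -> realize t
+  S ss    -> unwords (map realize ss)
+  K s     -> s
+  W s t   -> s ++ realize t
+  _       -> "?"
+
+-- linearization terms of concrete Eng, as in the example grammar
+linPred, linShe, linThey, linSleep :: Term
+linPred  = R [S [P (V 0) (C 1), P (P (V 1) (C 0)) (P (V 0) (C 0))]]
+linShe   = R [C 0, K "she"]
+linThey  = R [C 1, K "they"]
+linSleep = R [W "sleep" (R [K "s", K ""])]
+
+-- linearize the tree (Pred np Sleep) for a given subject
+sentence :: Term -> String
+sentence np = realize (compute [np, linSleep] linPred)
+```
+With these definitions, ``sentence linShe`` gives ``"she sleeps"`` and
+``sentence linThey`` gives ``"they sleep"``.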
+
+===The special term constructors===
+
+The three forms introduced by the compiler may need a special
+explanation.
+
+Global constants
+```
+  Term ::= CId ;
+```
+are shorthands for complex terms. They are produced by the
+compiler by (iterated) common subexpression elimination.
+They are often more powerful than hand-devised code sharing in the source
+code. They could be computed off-line by replacing each identifier by 
+its definition.
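+
+The off-line expansion mentioned above can be sketched as follows. The
+``Term`` type is cut down to three constructors, ``inline`` is a
+hypothetical helper name, and the sketch assumes that the definitions
+are not cyclic.
+```
+import qualified Data.Map as Map
+
+data Term = R [Term] | K String | F String
+  deriving (Eq, Show)
+
+-- replace every global constant by its (recursively expanded) definition;
+-- constants without a definition are left as they are
+inline :: Map.Map String Term -> Term -> Term
+inline defs trm = case trm of
+  F c  -> maybe trm (inline defs) (Map.lookup c defs)
+  R ts -> R (map (inline defs) ts)
+  _    -> trm
+
+-- a suffix table shared through the constant _1 (made-up example data)
+exDefs :: Map.Map String Term
+exDefs = Map.fromList [("_1", R [K "s", K ""])]
+
+expanded :: Term
+expanded = inline exDefs (R [F "_1", F "_1"])
+```
+Here ``expanded`` contains two copies of the array ``["s",""]`` and no
+constants.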
+
+Prefix-suffix tables 
+```
+  Term ::= "(" String "+" Term ")" ; 
+```
+represent tables of word forms divided into the longest common prefix
+and an array of suffixes. In the example grammar above, we have
+```
+  Sleep = [("sleep" + ["s",""])]
+```
+which in fact is equal to the array of full forms
+```
+  ["sleeps", "sleep"]
+```
+The power of this construction comes from the fact that suffix sets
+tend to be repeated in a language, and can therefore be collected
+by common subexpression elimination. It is this technique that
+explains why the syntax above is used rather than the more accurate
+```
+  "(" String "+" [String] ")"
+```
+since we want the suffix part to be a ``Term`` for the optimization to
+take effect.
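+
+A possible way to compute the split is sketched below. The function names
+are chosen for this illustration only, and the compiler may well proceed
+differently.
+```
+-- longest common prefix of a list of word forms
+commonPrefix :: [String] -> String
+commonPrefix [] = ""
+commonPrefix ws = foldr1 lcp ws
+  where
+    lcp (x:xs) (y:ys) | x == y = x : lcp xs ys
+    lcp _      _      = ""
+
+-- divide word forms into the common prefix and the array of suffixes
+splitForms :: [String] -> (String, [String])
+splitForms ws = (p, map (drop (length p)) ws)
+  where p = commonPrefix ws
+```
+For instance, ``splitForms ["sleeps","sleep"]`` evaluates to
+``("sleep",["s",""])``.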
+
+The most curious construct of GFCC is the parameter array alias, 
+```
+  Term ::= "(" Term "@" Term ")";
+```
+This form is used for the values of parameter records, i.e. values of types such as
+```
+  {n : Number ; p : Person}
+```
+The problem with parameter records is their double role.
+They can be used like parameter values, as indices in selection,
+```
+  VP.s ! {n = Sg ; p = P3}
+```
+but also as records, from which parameters can be projected:
+```
+  {n = Sg ; p = P3}.n
+```
+Whichever use is selected as primary, a prohibitively complex
+case expression must be generated at compilation to GFCC to get the
+other use. The adopted
+solution is to generate a pair containing both a parameter value index 
+and an array of indices of record fields. For instance, if we have
+```
+  param Number = Sg | Pl ; Person = P1 | P2 | P3 ;
+```
+we get the encoding
+```
+  {n = Sg ; p = P3}  ---> (2 @ [0,2])
+```
+The GFCC computation rules are essentially
+```
+  (t ! (i @ _)) = (t ! i)
+  ((_ @ r) ! j) = (r ! j)
+```
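+In an interpreter, the two rules amount to ignoring one half of the pair,
+depending on the position in which the alias occurs. A minimal sketch, with
+a cut-down ``Term`` type and a function ``select`` that is a name made up
+for the example (``getIndex`` is modelled on the function of the same name
+in the evaluation code above):
+```
+data Term = R [Term] | C Int | K String | RP Term Term
+  deriving (Eq, Show)
+
+-- the integer denoted by a parameter value: (i @ _) counts as i
+getIndex :: Term -> Int
+getIndex t = case t of
+  C i    -> i
+  RP i _ -> getIndex i
+  _      -> error ("no index in " ++ show t)
+
+-- indexing: (_ @ r) ! j is treated as r ! j
+select :: Term -> Term -> Term
+select t p = case t of
+  R ts   -> ts !! getIndex p
+  RP _ r -> select r p
+  _      -> error ("no fields in " ++ show t)
+
+-- {n = Sg ; p = P3} and a table with one form per Number-Person pair
+-- (the six forms are made-up data for the example)
+agr, forms :: Term
+agr   = RP (C 2) (R [C 0, C 2])
+forms = R (map K ["am","are","is","are","are","are"])
+```
+Here ``select forms agr`` gives ``K "is"`` (the alias used as an index),
+whereas ``select agr (C 0)`` gives ``C 0``, i.e. the ``n`` field ``Sg``
+(the alias used as a record).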
+
+
+==Compiling to GFCC==
+
+Compilation to GFCC is performed by the GF grammar compiler, and
+GFCC interpreters need not know what it does. For grammar writers,
+however, it might be interesting to know what happens to the grammars
+in the process.
+
+The compilation phases are the following:
++ translate GF source to GFC, as always in GF
++ undo GFC back-end optimizations
++ perform the ``values`` optimization to normalize tables
++ create a symbol table mapping the GFC parameter and record types to
+  fixed-size arrays, and parameter values and record labels to integers
++ traverse the linearization rules replacing parameters and labels by integers
++ reorganize the created GFC grammar so that it has just one abstract syntax
+  and one concrete syntax per language
++ apply UTF8 encoding to the grammar, if not yet applied (this is indicated by the
+  ``coding`` flag)
++ translate the GFC syntax tree to a GFCC syntax tree, using a simple
+  compositional mapping
++ perform the word-suffix optimization on GFCC linearization terms
++ perform subexpression elimination on each concrete syntax module
++ print out the GFCC code
+
+
+Notice that a major part of the compilation is done within GFC, so that
+GFC-related tasks (such as parser generation) could be performed by
+using the old algorithms.
+
+
+===Problems in GFCC compilation===
+
+Two major problems had to be solved in compiling GFC to GFCC:
+- consistent order of tables and records, to permit the array translation
+- run-time variables in complex parameter values.
+
+
+The current implementation is still experimental and may fail
+to generate correct code. Any errors remaining are likely to be 
+related to the two problems just mentioned.
+
+The order problem is solved in different ways for tables and records.
+For tables, the ``values`` optimization of GFC already manages to
+maintain a canonical order. But this order can be destroyed by the
+``share`` optimization. To make sure that GFCC compilation works properly,
+it is safest to recompile the GF grammar by using the ``values``
+optimization flag.
+
+Records can be canonically ordered by sorting them by labels.
+In fact, this was done in connection with the GFCC work as a part
+of the GFC generation, to guarantee consistency. This means that
+e.g. the ``s`` field will in general no longer appear as the first
+field, even if it does so in the GF source code. But relying on the
+order of fields in a labelled record would be misplaced anyway.
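+
+The effect of sorting on field positions can be illustrated with a small
+sketch (``labelIndex`` is a name invented for the example, and labels are
+simplified to plain strings):
+```
+import Data.List (sort)
+
+-- assign array positions to record labels by sorting the labels
+labelIndex :: [String] -> [(String, Int)]
+labelIndex labels = zip (sort labels) [0..]
+```
+For the English ``NP`` of the example grammar, ``labelIndex ["s","n"]``
+gives ``[("n",0),("s",1)]``, which is why ``np.s`` appears there as
+``($0!1)``.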
+
+The canonical form of records is further complicated by lock fields,
+i.e. dummy fields of the form ``lock_C = <>``, which are added to grammar
+libraries to force intensionality of linearization types. The problem
+is that the absence of a lock field only generates a warning, not
+an error. Therefore a GFC grammar can contain objects of the same
+type with and without a lock field. This problem was solved in GFCC
+generation by just removing all lock fields (defined as fields whose
+type is the empty record type). This has the further advantage of
+(slightly) reducing the grammar size. More importantly, it is safe
+to remove lock fields, because they are never used in computation,
+and because intensional types are only needed in grammars reused
+as libraries, not in grammars used at runtime.
+
+While the order problem is rather bureaucratic in nature, run-time 
+variables are an interesting problem. They arise in the presence
+of complex parameter values, created by argument-taking constructors
+and parameter records. To give an example, consider the GF parameter
+type system
+```
+  Number = Sg | Pl ;
+  Person = P1 | P2 | P3 ;
+  Agr = Ag Number Person ;
+```
+The values can be translated to integers in the expected way,
+```
+  Sg = 0, Pl = 1
+  P1 = 0, P2 = 1, P3 = 2
+  Ag Sg P1 = 0, Ag Sg P2 = 1, Ag Sg P3 = 2,
+  Ag Pl P1 = 3, Ag Pl P2 = 4, Ag Pl P3 = 5
+```
+However, an argument of ``Ag`` can be a run-time variable, as in
+```
+  Ag np.n P3
+```
+This expression must first be translated to a case expression,
+```
+  case np.n of {
+    0 => 2 ;
+    1 => 5
+    }
+```
+which can then be translated to the GFCC term
+```
+  ([2,5] ! ($0 ! 1))
+```
+assuming that the variable ``np`` is the first argument and that its
+``Number`` field is the second in the record.
+
+This transformation of course has to be performed recursively, since
+there can be several run-time variables in a parameter value:
+```
+  Ag np.n np.p
+```
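+The generation of such tables can be sketched as follows. The sketch
+assumes the numbering of values shown above (the last argument varies
+fastest). All names in it (``valueIndex``, ``Arg``, ``Table``, ``table``)
+are made up for the illustration and do not come from the compiler.
+```
+-- number of values of each argument type of a constructor, e.g. [2,3] for Ag
+type Sizes = [Int]
+
+-- index of a fully known value, e.g. Ag Pl P2 as valueIndex [2,3] [1,1]
+valueIndex :: Sizes -> [Int] -> Int
+valueIndex sizes args = foldl (\acc (n, i) -> acc * n + i) 0 (zip sizes args)
+
+-- an argument is either known at compile time or a run-time variable
+data Arg = Known Int | RunTime
+
+-- nested table of value indices, one level per run-time variable
+data Table = Leaf Int | Branch [Table]
+  deriving (Eq, Show)
+
+table :: Sizes -> [Arg] -> Table
+table sizes args = go [] (zip sizes args)
+  where
+    go done []                    = Leaf (valueIndex sizes (reverse done))
+    go done ((_, Known i) : rest) = go (i : done) rest
+    go done ((n, RunTime) : rest) =
+      Branch [go (i : done) rest | i <- [0 .. n - 1]]
+```
+With these definitions, ``table [2,3] [RunTime, Known 2]`` gives
+``Branch [Leaf 2,Leaf 5]``, corresponding to the array ``[2,5]`` for
+``Ag np.n P3``. For ``Ag np.n np.p``, ``table [2,3] [RunTime, RunTime]``
+gives a two-level table with the rows ``0,1,2`` and ``3,4,5``.
+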
+A similar transformation would be possible to deal with the double
+role of parameter records discussed above. Thus the type
+```
+  RNP = {n : Number ; p : Person}
+```
+could be uniformly translated into the set ``{0,1,2,3,4,5}``
+as ``Agr`` above. Selections would be simple instances of indexing.
+But any projection from the record would then have to be translated into
+a case expression,
+```
+  rnp.n  ===> 
+  case rnp of {
+    0 => 0 ;
+    1 => 0 ;
+    2 => 0 ;
+    3 => 1 ;
+    4 => 1 ;
+    5 => 1
+    }
+```
+To avoid the code bloat resulting from this, we chose the alias representation,
+which is easy enough to deal with in interpreters.
+
+
+===The representation of linearization types===
+
+Linearization types (``lincat``) are not needed when generating with
+GFCC, but they have been added to enable parser generation directly from
+GFCC. The linearization type definitions are shown as a part of the
+concrete syntax, by using terms to represent types. Here is the table
+showing how different linearization types are encoded.
+```
+  P*                         = size(P)        -- parameter type              
+  {_ : I ; __ : R}*          = (I* @ R*)      -- record of parameters
+  {r1 : T1 ; ... ; rn : Tn}* = [T1*,...,Tn*]  -- other record
+  (P => T)*                  = [T* ,...,T*]   -- size(P) times
+  Str*                       = ()
+```
+The category symbols are prefixed with two underscores (``__``).
+For example, the linearization type ``present/CatEng.NP`` is
+translated as follows:
+```
+  NP = {
+    a : {                     -- 6 = 2*3 values
+      n : {ParamX.Number} ;   -- 2 values
+      p : {ParamX.Person}     -- 3 values
+    } ;
+    s : {ResEng.Case} => Str  -- 3 values
+  }
+
+  __NP = [(6@[2,3]),[(),(),()]]
+```
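+The encoding can be sketched as a function from types to terms. The sketch
+follows the ``NP`` example above. The ``LinType`` data type, its
+constructor names and the functions ``size``, ``encode`` and ``render``
+are assumptions made for the illustration. Parameter types are represented
+only by their number of values.
+```
+import Data.List (intercalate)
+
+data LinType
+  = Param Int          -- parameter type with the given number of values
+  | ParRec [LinType]   -- record of parameters
+  | Rec [LinType]      -- other record, fields in canonical order
+  | Table Int LinType  -- table over a parameter type of the given size
+  | Str
+
+data Term = R [Term] | C Int | RP Term Term | S [Term]
+
+-- number of values of a parameter type or a record of parameters
+size :: LinType -> Int
+size t = case t of
+  Param n   -> n
+  ParRec ts -> product (map size ts)
+  _         -> error "not a parameter type"
+
+encode :: LinType -> Term
+encode t = case t of
+  Param n   -> C n
+  ParRec ts -> RP (C (size t)) (R (map encode ts))
+  Rec ts    -> R (map encode ts)
+  Table n u -> R (replicate n (encode u))
+  Str       -> S []
+
+-- print a term in GFCC notation
+render :: Term -> String
+render trm = case trm of
+  R ts   -> "[" ++ intercalate "," (map render ts) ++ "]"
+  C i    -> show i
+  RP i r -> "(" ++ render i ++ "@" ++ render r ++ ")"
+  S ts   -> "(" ++ intercalate "," (map render ts) ++ ")"
+
+-- the linearization type of NP from the example
+npType :: LinType
+npType = Rec [ParRec [Param 2, Param 3], Table 3 Str]
+```
+With these definitions, ``render (encode npType)`` gives
+``[(6@[2,3]),[(),(),()]]``, as in the example.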
+
+
+
+
+===Running the compiler and the GFCC interpreter===
+
+GFCC generation is a part of the 
+[developers' version http://www.cs.chalmers.se/Cs/Research/Language-technology/darcs/GF/doc/darcs.html] 
+of GF since September 2006. To invoke the compiler, give the flag
+``-printer=gfcc`` to the command ``pm = print_multi``.
+It is wise to recompile the grammar from
+source, since previously compiled libraries may not obey the canonical
+order of records. Running ``strip`` on the grammar before
+GFCC translation removes unnecessary interface references.
+Here is an example, performed in
+[example/bronzeage ../../../../../examples/bronzeage].
+```
+  i -src -path=.:prelude:resource-1.0/* -optimize=all_subs BronzeageEng.gf
+  i -src -path=.:prelude:resource-1.0/* -optimize=all_subs BronzeageGer.gf
+  strip
+  pm -printer=gfcc | wf bronze.gfcc
+```
+
+
+
+==The reference interpreter==
+
+The reference interpreter written in Haskell consists of the following files:
+```
+  -- source file for BNFC
+  GFCC.cf       -- labelled BNF grammar of gfcc
+
+  -- files generated by BNFC
+  AbsGFCC.hs    -- abstract syntax of gfcc
+  ErrM.hs       -- error monad used internally
+  LexGFCC.hs    -- lexer of gfcc files
+  ParGFCC.hs    -- parser of gfcc files and syntax trees
+  PrintGFCC.hs  -- printer of gfcc files and syntax trees
+
+  -- hand-written files
+  DataGFCC.hs   -- post-parser grammar creation, linearization and evaluation
+  GenGFCC.hs    -- random and exhaustive generation, generate-and-test parsing
+  RunGFCC.hs    -- main function - a simple command interpreter
+```
+It is included in the
+[developers' version http://www.cs.chalmers.se/Cs/Research/Language-technology/darcs/GF/doc/darcs.html]
+of GF, in the subdirectory [``GF/src/GF/Canon/GFCC`` ../].
+
+To compile the interpreter, type
+```
+  make gfcc
+```
+in ``GF/src``. To run it, type
+```
+  ./gfcc <GFCC-file>
+```
+The available commands are
+- ``gr <Cat> <Int>``:  generate a number of random trees in the category,
+  and show their linearizations in all languages
+- ``grt <Cat> <Int>``:  generate a number of random trees in the category,
+  and show the trees and their linearizations in all languages
+- ``gt <Cat> <Int>``:  generate a number of trees in the category, starting from
+  the smallest, and show their linearizations in all languages
+- ``gtt <Cat> <Int>``:  generate a number of trees in the category, starting from
+  the smallest, and show the trees and their linearizations in all languages
+- ``p <Int> <Cat> <String>``: "parse", i.e. generate trees until match or 
+  until the given number have been generated
+- ``<Tree>``: linearize tree in all languages, also showing full records
+- ``quit``: terminate the system cleanly
+
+
+==Interpreter in C++==
+
+A base-line interpreter in C++ has been started.
+Its main functionality is random generation of trees and linearization of them.
+
+Here are some results from running the different interpreters, compared
+to running the same grammar in GF, saved in ``.gfcm`` format.
+The grammar contains the English, German, and Norwegian
+versions of Bronzeage. The experiment was carried out on
+an Ubuntu Linux laptop with a 1.5 GHz Intel Centrino processor.
+
+||                | GF        | gfcc(hs) | gfcc++ |
+| program size    |   7249k   |   803k   |  113k
+| grammar size    |    336k   |  119k    |  119k
+| read grammar    |   1150ms  |  510ms   |  100ms
+| generate 222    |   9500ms  |  450ms   |  800ms
+| memory          |     21M   |   10M    |   20M
+
+
+
+To summarize:
+- going from GF to gfcc is a major win in both code size and efficiency
+- going from the Haskell to the C++ interpreter is not a win yet, because of a space
+  leak in the C++ version
+
+
+
+==Some things to do==
+
+Interpreter in Java.
+
+Parsing via MCFG 
+- the FCFG format can possibly be simplified
+- parser grammars should be saved in files to make interpreters easier
+
+
+Hand-written parsers for GFCC grammars to reduce the code size
+(and improve the efficiency?) of interpreters.
+
+Binary format and/or file compression of GFCC output.
+
+Syntax editor based on GFCC.
+
+Rewriting of resource libraries in order to exploit the
+word-suffix sharing better (depth-one tables, as in FM).
+
+
+