doc on gfcc-lincat

2006-10-19 16:25:55 +00:00
parent 4b528b6ee2
commit 98e916831a
2 changed files with 95 additions and 17 deletions
--- a/src/GF/Canon/GFCC/doc/gfcc.html
+++ b/src/GF/Canon/GFCC/doc/gfcc.html
@@ -7,7 +7,7 @@
 <P ALIGN="center"><CENTER><H1>The GFCC Grammar Format</H1>
 <FONT SIZE="4">
 <I>Aarne Ranta</I><BR>
-October 3, 2006
+October 19, 2006
 </FONT></CENTER>
 <P></P>
@@ -31,11 +31,12 @@ October 3, 2006
    <LI><A HREF="#toc11">Compiling to GFCC</A>
      <UL>
      <LI><A HREF="#toc12">Problems in GFCC compilation</A>
-      <LI><A HREF="#toc13">Running the compiler and the GFCC interpreter</A>
+      <LI><A HREF="#toc13">The representation of linearization types</A>
      <LI><A HREF="#toc14">Running the compiler and the GFCC interpreter</A>
      </UL>
-    <LI><A HREF="#toc14">The reference interpreter</A>
+    <LI><A HREF="#toc15">The reference interpreter</A>
-    <LI><A HREF="#toc15">Interpreter in C++</A>
+    <LI><A HREF="#toc16">Interpreter in C++</A>
-    <LI><A HREF="#toc16">Some things to do</A>
+    <LI><A HREF="#toc17">Some things to do</A>
    </UL>
 <P></P>
@@ -45,6 +46,14 @@ October 3, 2006
 Author's address:
 <A HREF="http://www.cs.chalmers.se/~aarne"><CODE>http://www.cs.chalmers.se/~aarne</CODE></A>
 </P>
 <P>
 History:
 </P>
 <UL>
 <LI>19 Oct: translation of lincats, new figures on C++
 <LI>3 Oct 2006: first version
 </UL>
 <A NAME="toc1"></A>
 <H2>What is GFCC</H2>
 <P>
@@ -629,6 +638,39 @@ To avoid the code bloat resulting from this, we chose the alias representation
 which is easy enough to deal with in interpreters.
 </P>
 <A NAME="toc13"></A>
 <H3>The representation of linearization types</H3>
 <P>
 Linearization types (<CODE>lincat</CODE>) are not needed when generating with
 GFCC, but they have been added to enable parser generation directly from
 GFCC. The linearization type definitions are shown as a part of the
 concrete syntax, by using terms to represent types. Here is the table
 showing how different linearization types are encoded.
 </P>
 <PRE>
    P*                         = size(P)        -- parameter type              
    {_ : I ; __ : R}*          = (I* @ R*)      -- record of parameters
    {r1 : T1 ; ... ; rn : Tn}* = [T1*,...,Tn*]  -- other record
    (P =&gt; T)*                  = [T* ,...,T*]   -- size(P) times
    Str*                       = ()
 </PRE>
 <P>
 The category symbols are prefixed with two underscores (<CODE>__</CODE>).
 For example, the linearization type <CODE>present/CatEng.NP</CODE> is
 translated as follows:
 </P>
 <PRE>
    NP = {
      a : {                     -- 6 = 2*3 values
        n : {ParamX.Number} ;   -- 2 values
        p : {ParamX.Person}     -- 3 values
      } ;
      s : {ResEng.Case} =&gt; Str  -- 3 values
    }
    __NP = [(6@[2,3]),[(),(),()]]
 </PRE>
 <P></P>
 <A NAME="toc14"></A>
 <H3>Running the compiler and the GFCC interpreter</H3>
 <P>
 GFCC generation is a part of the 
@@ -649,7 +691,7 @@ Here is an example, performed in
    pm -printer=gfcc | wf bronze.gfcc
 </PRE>
 <P></P>
-<A NAME="toc14"></A>
+<A NAME="toc15"></A>
 <H2>The reference interpreter</H2>
 <P>
 The reference interpreter written in Haskell consists of the following files:
@@ -705,7 +747,7 @@ The available commands are
 <LI><CODE>quit</CODE>: terminate the system cleanly
 </UL>
-<A NAME="toc15"></A>
+<A NAME="toc16"></A>
 <H2>Interpreter in C++</H2>
 <P>
 A base-line interpreter in C++ has been started.
@@ -741,7 +783,7 @@ Ubuntu Linux laptop with 1.5 GHz Intel centrino processor.
 <TD>read grammar</TD>
 <TD ALIGN="center">1150ms</TD>
 <TD ALIGN="center">510ms</TD>
-<TD ALIGN="right">150ms</TD>
+<TD ALIGN="right">100ms</TD>
 </TR>
 <TR>
 <TD>generate 222</TD>
@@ -753,7 +795,7 @@ Ubuntu Linux laptop with 1.5 GHz Intel centrino processor.
 <TD>memory</TD>
 <TD ALIGN="center">21M</TD>
 <TD ALIGN="center">10M</TD>
-<TD ALIGN="right">2M</TD>
+<TD ALIGN="right">20M</TD>
 </TR>
 </TABLE>
@@ -763,11 +805,11 @@ To summarize:
 </P>
 <UL>
 <LI>going from GF to gfcc is a major win in both code size and efficiency
-<LI>going from Haskell to C++ interpreter is a win in code size and memory,
+<LI>going from the Haskell to the C++ interpreter is not a win yet, because of a space
-  but not so much in speed
+  leak in the C++ version
 </UL>
-<A NAME="toc16"></A>
+<A NAME="toc17"></A>
 <H2>Some things to do</H2>
 <P>
 Interpreter in Java.
--- a/src/GF/Canon/GFCC/doc/gfcc.txt
+++ b/src/GF/Canon/GFCC/doc/gfcc.txt
@@ -1,12 +1,17 @@
 The GFCC Grammar Format
 Aarne Ranta
-October 3, 2006
+October 19, 2006
 Author's address:
 [``http://www.cs.chalmers.se/~aarne`` http://www.cs.chalmers.se/~aarne]
 % to compile: txt2tags -thtml --toc gfcc.txt
 History:
 - 19 Oct: translation of lincats, new figures on C++
 - 3 Oct 2006: first version
 ==What is GFCC==
 GFCC is a low-level format for GF grammars. Its aim is to contain the minimum
@@ -502,6 +507,37 @@ To avoid the code bloat resulting from this, we chose the alias representation
 which is easy enough to deal with in interpreters.
 ===The representation of linearization types===
 Linearization types (``lincat``) are not needed when generating with
 GFCC, but they have been added to enable parser generation directly from
 GFCC. The linearization type definitions are shown as a part of the
 concrete syntax, by using terms to represent types. Here is the table
 showing how different linearization types are encoded.
 ```
  P*                         = size(P)        -- parameter type              
  {_ : I ; __ : R}*          = (I* @ R*)      -- record of parameters
  {r1 : T1 ; ... ; rn : Tn}* = [T1*,...,Tn*]  -- other record
  (P => T)*                  = [T* ,...,T*]   -- size(P) times
  Str*                       = ()
 ```
 The category symbols are prefixed with two underscores (``__``).
 For example, the linearization type ``present/CatEng.NP`` is
 translated as follows:
 ```
  NP = {
    a : {                     -- 6 = 2*3 values
      n : {ParamX.Number} ;   -- 2 values
      p : {ParamX.Person}     -- 3 values
    } ;
    s : {ResEng.Case} => Str  -- 3 values
  }
  __NP = [(6@[2,3]),[(),(),()]]
 ```
 ===Running the compiler and the GFCC interpreter===
@@ -584,16 +620,16 @@ Ubuntu Linux laptop with 1.5 GHz Intel centrino processor.
 ||                | GF        | gfcc(hs) | gfcc++ |
 | program size    |   7249k   |   803k   |  113k
 | grammar size    |    336k   |  119k    |  119k
-| read grammar    |   1150ms  |  510ms   |  150ms
+| read grammar    |   1150ms  |  510ms   |  100ms
 | generate 222    |   9500ms  |  450ms   |  800ms
-| memory          |     21M   |   10M    |    2M
+| memory          |     21M   |   10M    |   20M
 To summarize:
 - going from GF to gfcc is a major win in both code size and efficiency
-- going from Haskell to C++ interpreter is a win in code size and memory,
+- going from the Haskell to the C++ interpreter is not a win yet, because of a space
-  but not so much in speed
+  leak in the C++ version
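
As an illustration of the lincat encoding table added in this commit, here is a minimal Python sketch of the five translation rules. It is not taken from the GF sources: the class names (`Param`, `Str`, `Table`, `Record`) and the function `encode` are hypothetical, and two points are assumptions read off the `NP` example rather than stated rules, namely that a record counts as a "record of parameters" when all of its fields are parameter types, and that its size is the product of the field sizes (as in `6 = 2*3`).

```python
from dataclasses import dataclass
from math import prod
from typing import Tuple, Union


@dataclass(frozen=True)
class Param:
    """A parameter type P with a known number of values, size(P)."""
    name: str
    size: int


@dataclass(frozen=True)
class Str:
    """The string type Str."""


@dataclass(frozen=True)
class Table:
    """A table type (P => T)."""
    arg: "LinType"
    value: "LinType"


@dataclass(frozen=True)
class Record:
    """A record type {r1 : T1 ; ... ; rn : Tn}."""
    fields: Tuple[Tuple[str, "LinType"], ...]


LinType = Union[Param, Str, Table, Record]


def is_param_type(t: LinType) -> bool:
    # Assumption: a record is a "record of parameters" when every field is one.
    if isinstance(t, Param):
        return True
    if isinstance(t, Record):
        return all(is_param_type(f) for _, f in t.fields)
    return False


def size(t: LinType) -> int:
    # Assumption: the size of a parameter record is the product of field sizes.
    if isinstance(t, Param):
        return t.size
    if isinstance(t, Record):
        return prod(size(f) for _, f in t.fields)
    raise TypeError(f"not a parameter type: {t!r}")


def encode(t: LinType) -> str:
    """Render a linearization type as a GFCC-style term string."""
    if isinstance(t, Param):          # P* = size(P)
        return str(t.size)
    if isinstance(t, Str):            # Str* = ()
        return "()"
    if isinstance(t, Table):          # (P => T)* = [T*,...,T*], size(P) times
        return "[" + ",".join([encode(t.value)] * size(t.arg)) + "]"
    if isinstance(t, Record):
        body = "[" + ",".join(encode(f) for _, f in t.fields) + "]"
        if is_param_type(t):          # record of parameters: (I* @ R*)
            return f"({size(t)}@{body})"
        return body                   # other record: [T1*,...,Tn*]
    raise TypeError(f"unknown type: {t!r}")


# The NP lincat from the example, with the sizes given in its comments.
np_type = Record((
    ("a", Record((
        ("n", Param("ParamX.Number", 2)),
        ("p", Param("ParamX.Person", 3)),
    ))),
    ("s", Table(Param("ResEng.Case", 3), Str())),
))

print("__NP = " + encode(np_type))
```

Run on the `NP` type, the sketch reproduces the term shown in the documentation, `[(6@[2,3]),[(),(),()]]`. How nested parameter records or empty records should be encoded is not covered by the example, so the behaviour of the sketch in those cases is a guess.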