From 98e916831a97df96ced70befb23650ed53d8ef6b Mon Sep 17 00:00:00 2001
From: aarne
The GFCC Grammar Format
Aarne Ranta
-October 3, 2006
+October 19, 2006
http://www.cs.chalmers.se/~aarne
+History:
+- 19 Oct: translation of lincats, new figures on C++
+- 3 Oct 2006: first version
+
@@ -629,6 +638,39 @@ To avoid the code bloat resulting from this, we chose the alias representation
 which is easy enough to deal with in interpreters.
+The representation of linearization types
+
+Linearization types (lincat) are not needed when generating with
+GFCC, but they have been added to enable parser generation directly from
+GFCC. The linearization type definitions are shown as a part of the
+concrete syntax, by using terms to represent types. Here is the table
+showing how different linearization types are encoded.
+
+  P*                          = size(P)         -- parameter type
+  {_ : I ; __ : R}*           = (I* @ R*)       -- record of parameters
+  {r1 : T1 ; ... ; rn : Tn}*  = [T1*,...,Tn*]   -- other record
+  (P => T)*                   = [T*,...,T*]     -- size(P) times
+  Str*                        = ()
+
+
+The category symbols are prefixed with two underscores (__).
+For example, the linearization type present/CatEng.NP is
+translated as follows:
+
+  NP = {
+    a : {                      -- 6 = 2*3 values
+      n : {ParamX.Number} ;    -- 2 values
+      p : {ParamX.Person}      -- 3 values
+      } ;
+    s : {ResEng.Case} => Str   -- 3 values
+    }
+
+  __NP = [(6@[2,3]),[(),(),()]]
+
+
+
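The encoding table can be exercised on the NP example with a short program. What follows is a minimal Python sketch, not the GFCC compiler's own code: the classes Param, Str, Table and Record and the functions is_param_type, size and encode are hypothetical names. It assumes that a record counts as a "record of parameters" when it is non-empty and all its fields are parameter types, and that such a record is written as (total number of values @ list of field encodings), which is how (6@[2,3]) arises from 2*3 = 6. Record fields are taken in the order given.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical model of GF linearization types (names are illustrative, not GFCC's).


@dataclass(frozen=True)
class Param:
    """A parameter type P with size(P) values, e.g. ParamX.Number with 2."""
    name: str
    size: int


@dataclass(frozen=True)
class Str:
    """The string type Str."""


@dataclass(frozen=True)
class Table:
    """A table type P => T."""
    arg: "LinType"
    val: "LinType"


@dataclass(frozen=True)
class Record:
    """A record type {r1 : T1 ; ... ; rn : Tn}, fields in the given order."""
    fields: Tuple[Tuple[str, "LinType"], ...]


LinType = Union[Param, Str, Table, Record]


def is_param_type(t: LinType) -> bool:
    """Assumption: a Param, or a non-empty record whose fields are all parameter types."""
    if isinstance(t, Param):
        return True
    if isinstance(t, Record):
        return bool(t.fields) and all(is_param_type(ft) for _, ft in t.fields)
    return False


def size(t: LinType) -> int:
    """Number of values of a parameter type (product of field sizes for records)."""
    if isinstance(t, Param):
        return t.size
    if isinstance(t, Record) and is_param_type(t):
        n = 1
        for _, ft in t.fields:
            n *= size(ft)
        return n
    raise ValueError("size() is defined only for parameter types")


def encode(t: LinType) -> str:
    """Encode a linearization type T as the term T* from the table."""
    if isinstance(t, Str):
        return "()"                                  # Str* = ()
    if isinstance(t, Param):
        return str(t.size)                           # P* = size(P)
    if isinstance(t, Record):
        parts = ",".join(encode(ft) for _, ft in t.fields)
        if is_param_type(t):
            return f"({size(t)}@[{parts}])"          # record of parameters
        return f"[{parts}]"                          # other record
    if isinstance(t, Table):
        # (P => T)* = [T*,...,T*], with size(P) copies of T*
        return "[" + ",".join([encode(t.val)] * size(t.arg)) + "]"
    raise TypeError(f"not a linearization type: {t!r}")


# The lincat of present/CatEng.NP from the example.
np = Record((
    ("a", Record((("n", Param("ParamX.Number", 2)),
                  ("p", Param("ParamX.Person", 3))))),
    ("s", Table(Param("ResEng.Case", 3), Str())),
))

print("__NP =", encode(np))
```

Running it prints __NP = [(6@[2,3]),[(),(),()]], the term shown in the example. Nested records of parameters are encoded recursively here, which is an extrapolation beyond the single example in the text.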
GFCC generation is a part of the
@@ -649,7 +691,7 @@ Here is an example, performed in
pm -printer=gfcc | wf bronze.gfcc
-
+The reference interpreter written in Haskell consists of the following files:
@@ -705,7 +747,7 @@ The available commands are
quit: terminate the system cleanly
-
+
A base-line interpreter in C++ has been started.
@@ -741,7 +783,7 @@ Ubuntu Linux laptop with 1.5 GHz Intel centrino processor.
Interpreter in Java.
diff --git a/src/GF/Canon/GFCC/doc/gfcc.txt b/src/GF/Canon/GFCC/doc/gfcc.txt
index daa55137b..6ffd9bd64 100644
--- a/src/GF/Canon/GFCC/doc/gfcc.txt
+++ b/src/GF/Canon/GFCC/doc/gfcc.txt
@@ -1,12 +1,17 @@
 The GFCC Grammar Format
 Aarne Ranta
-October 3, 2006
+October 19, 2006
 
 Author's address:
 [``http://www.cs.chalmers.se/~aarne`` http://www.cs.chalmers.se/~aarne]
 
 % to compile: txt2tags -thtml --toc gfcc.txt
 
+History:
+- 19 Oct: translation of lincats, new figures on C++
+- 3 Oct 2006: first version
+
+
 ==What is GFCC==
 
 GFCC is a low-level format for GF grammars. Its aim is to contain the minimum
@@ -502,6 +507,37 @@ To avoid the code bloat resulting from this, we chose the alias representation
 which is easy enough to deal with in interpreters.
 
 
+===The representation of linearization types===
+
+Linearization types (``lincat``) are not needed when generating with
+GFCC, but they have been added to enable parser generation directly from
+GFCC. The linearization type definitions are shown as a part of the
+concrete syntax, by using terms to represent types. Here is the table
+showing how different linearization types are encoded.
+```
+  P*                          = size(P)         -- parameter type
+  {_ : I ; __ : R}*           = (I* @ R*)       -- record of parameters
+  {r1 : T1 ; ... ; rn : Tn}*  = [T1*,...,Tn*]   -- other record
+  (P => T)*                   = [T*,...,T*]     -- size(P) times
+  Str*                        = ()
+```
+The category symbols are prefixed with two underscores (``__``).
+For example, the linearization type ``present/CatEng.NP`` is
+translated as follows:
+```
+  NP = {
+    a : {                      -- 6 = 2*3 values
+      n : {ParamX.Number} ;    -- 2 values
+      p : {ParamX.Person}      -- 3 values
+      } ;
+    s : {ResEng.Case} => Str   -- 3 values
+    }
+
+  __NP = [(6@[2,3]),[(),(),()]]
+```
+
+
+
 
 
 ===Running the compiler and the GFCC interpreter===
@@ -584,16 +620,16 @@ Ubuntu Linux laptop with 1.5 GHz Intel centrino processor.
 || | GF | gfcc(hs) | gfcc++ |
 | program size | 7249k | 803k | 113k
 | grammar size | 336k | 119k | 119k
-| read grammar | 1150ms | 510ms | 150ms
+| read grammar | 1150ms | 510ms | 100ms
 | generate 222 | 9500ms | 450ms | 800ms
-| memory | 21M | 10M | 2M
+| memory | 21M | 10M | 20M
 
 To summarize:
 - going from GF to gfcc is a major win in both code size and efficiency
-- going from Haskell to C++ interpreter is a win in code size and memory,
-  but not so much in speed
+- going from the Haskell to the C++ interpreter is not a win yet, because of a space
+  leak in the C++ version
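The summary can be checked against the updated table with a little arithmetic. The following Python sketch is a hypothetical helper, not part of the GF or GFCC distribution: it divides the new figures column by column, so a factor above 1 means the later system is smaller or faster. It assumes the units are as written in the table (k, ms, M) and uses only the updated values.

```python
# Updated figures from the table (hypothetical helper; units: k, ms, M as in the table).
FIGURES = {
    "program size": {"GF": 7249, "gfcc(hs)": 803, "gfcc++": 113},
    "grammar size": {"GF": 336, "gfcc(hs)": 119, "gfcc++": 119},
    "read grammar": {"GF": 1150, "gfcc(hs)": 510, "gfcc++": 100},
    "generate 222": {"GF": 9500, "gfcc(hs)": 450, "gfcc++": 800},
    "memory": {"GF": 21, "gfcc(hs)": 10, "gfcc++": 20},
}


def gain(row: str, old: str, new: str) -> float:
    """Factor by which `new` is smaller or faster than `old` (> 1 means a win)."""
    return FIGURES[row][old] / FIGURES[row][new]


for row in FIGURES:
    print(f"{row:<13} GF -> gfcc(hs): {gain(row, 'GF', 'gfcc(hs)'):5.2f}x   "
          f"gfcc(hs) -> gfcc++: {gain(row, 'gfcc(hs)', 'gfcc++'):5.2f}x")
```

With these figures, going from GF to gfcc(hs) shrinks the program about 9 times and speeds up generation about 21 times. Going from gfcc(hs) to gfcc++ still wins on program size (about 7 times) and grammar reading (about 5 times), but generation takes about 1.8 times as long (800 ms versus 450 ms) and memory use doubles, which is consistent with the note about a space leak.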