doc on gfcc-lincat
@@ -7,7 +7,7 @@
 <P ALIGN="center"><CENTER><H1>The GFCC Grammar Format</H1>
 <FONT SIZE="4">
 <I>Aarne Ranta</I><BR>
-October 3, 2006
+October 19, 2006
 </FONT></CENTER>
 
 <P></P>
@@ -31,11 +31,12 @@ October 3, 2006
 <LI><A HREF="#toc11">Compiling to GFCC</A>
 <UL>
 <LI><A HREF="#toc12">Problems in GFCC compilation</A>
-<LI><A HREF="#toc13">Running the compiler and the GFCC interpreter</A>
+<LI><A HREF="#toc13">The representation of linearization types</A>
+<LI><A HREF="#toc14">Running the compiler and the GFCC interpreter</A>
 </UL>
-<LI><A HREF="#toc14">The reference interpreter</A>
-<LI><A HREF="#toc15">Interpreter in C++</A>
-<LI><A HREF="#toc16">Some things to do</A>
+<LI><A HREF="#toc15">The reference interpreter</A>
+<LI><A HREF="#toc16">Interpreter in C++</A>
+<LI><A HREF="#toc17">Some things to do</A>
 </UL>
 
 <P></P>
@@ -45,6 +46,14 @@ October 3, 2006
 Author's address:
 <A HREF="http://www.cs.chalmers.se/~aarne"><CODE>http://www.cs.chalmers.se/~aarne</CODE></A>
 </P>
+<P>
+History:
+</P>
+<UL>
+<LI>19 Oct: translation of lincats, new figures on C++
+<LI>3 Oct 2006: first version
+</UL>
+
 <A NAME="toc1"></A>
 <H2>What is GFCC</H2>
 <P>
@@ -629,6 +638,39 @@ To avoid the code bloat resulting from this, we chose the alias representation
 which is easy enough to deal with in interpreters.
 </P>
 <A NAME="toc13"></A>
+<H3>The representation of linearization types</H3>
+<P>
+Linearization types (<CODE>lincat</CODE>) are not needed when generating with
+GFCC, but they have been added to enable parser generation directly from
+GFCC. The linearization type definitions are shown as a part of the
+concrete syntax, by using terms to represent types. Here is the table
+showing how different linearization types are encoded.
+</P>
+<PRE>
+  P*                          = size(P)          -- parameter type
+  {_ : I ; __ : R}*           = (I* @ R*)        -- record of parameters
+  {r1 : T1 ; ... ; rn : Tn}*  = [T1*,...,Tn*]    -- other record
+  (P => T)*                   = [T* ,...,T*]     -- size(P) times
+  Str*                        = ()
+</PRE>
+<P>
+The category symbols are prefixed with two underscores (<CODE>__</CODE>).
+For example, the linearization type <CODE>present/CatEng.NP</CODE> is
+translated as follows:
+</P>
+<PRE>
+  NP = {
+    a : {                       -- 6 = 2*3 values
+      n : {ParamX.Number} ;     -- 2 values
+      p : {ParamX.Person}       -- 3 values
+    } ;
+    s : {ResEng.Case} => Str    -- 3 values
+  }
+
+  __NP = [(6@[2,3]),[(),(),()]]
+</PRE>
+<P></P>
+<A NAME="toc14"></A>
 <H3>Running the compiler and the GFCC interpreter</H3>
 <P>
 GFCC generation is a part of the
@@ -649,7 +691,7 @@ Here is an example, performed in
 pm -printer=gfcc | wf bronze.gfcc
 </PRE>
 <P></P>
-<A NAME="toc14"></A>
+<A NAME="toc15"></A>
 <H2>The reference interpreter</H2>
 <P>
 The reference interpreter written in Haskell consists of the following files:
@@ -705,7 +747,7 @@ The available commands are
 <LI><CODE>quit</CODE>: terminate the system cleanly
 </UL>
 
-<A NAME="toc15"></A>
+<A NAME="toc16"></A>
 <H2>Interpreter in C++</H2>
 <P>
 A base-line interpreter in C++ has been started.
@@ -741,7 +783,7 @@ Ubuntu Linux laptop with 1.5 GHz Intel centrino processor.
 <TD>read grammar</TD>
 <TD ALIGN="center">1150ms</TD>
 <TD ALIGN="center">510ms</TD>
-<TD ALIGN="right">150ms</TD>
+<TD ALIGN="right">100ms</TD>
 </TR>
 <TR>
 <TD>generate 222</TD>
@@ -753,7 +795,7 @@ Ubuntu Linux laptop with 1.5 GHz Intel centrino processor.
 <TD>memory</TD>
 <TD ALIGN="center">21M</TD>
 <TD ALIGN="center">10M</TD>
-<TD ALIGN="right">2M</TD>
+<TD ALIGN="right">20M</TD>
 </TR>
 </TABLE>
 
@@ -763,11 +805,11 @@ To summarize:
 </P>
 <UL>
 <LI>going from GF to gfcc is a major win in both code size and efficiency
-<LI>going from Haskell to C++ interpreter is a win in code size and memory,
-but not so much in speed
+<LI>going from Haskell to C++ interpreter is not a win yet, because of a space
+leak in the C++ version
 </UL>
 
-<A NAME="toc16"></A>
+<A NAME="toc17"></A>
 <H2>Some things to do</H2>
 <P>
 Interpreter in Java.
@@ -1,12 +1,17 @@
 The GFCC Grammar Format
 Aarne Ranta
-October 3, 2006
+October 19, 2006
 
 Author's address:
 [``http://www.cs.chalmers.se/~aarne`` http://www.cs.chalmers.se/~aarne]
 
 % to compile: txt2tags -thtml --toc gfcc.txt
 
+History:
+- 19 Oct: translation of lincats, new figures on C++
+- 3 Oct 2006: first version
+
+
 ==What is GFCC==
 
 GFCC is a low-level format for GF grammars. Its aim is to contain the minimum
@@ -502,6 +507,37 @@ To avoid the code bloat resulting from this, we chose the alias representation
 which is easy enough to deal with in interpreters.
 
 
+===The representation of linearization types===
+
+Linearization types (``lincat``) are not needed when generating with
+GFCC, but they have been added to enable parser generation directly from
+GFCC. The linearization type definitions are shown as a part of the
+concrete syntax, by using terms to represent types. Here is the table
+showing how different linearization types are encoded.
+```
+  P*                          = size(P)          -- parameter type
+  {_ : I ; __ : R}*           = (I* @ R*)        -- record of parameters
+  {r1 : T1 ; ... ; rn : Tn}*  = [T1*,...,Tn*]    -- other record
+  (P => T)*                   = [T* ,...,T*]     -- size(P) times
+  Str*                        = ()
+```
+The category symbols are prefixed with two underscores (``__``).
+For example, the linearization type ``present/CatEng.NP`` is
+translated as follows:
+```
+  NP = {
+    a : {                       -- 6 = 2*3 values
+      n : {ParamX.Number} ;     -- 2 values
+      p : {ParamX.Person}       -- 3 values
+    } ;
+    s : {ResEng.Case} => Str    -- 3 values
+  }
+
+  __NP = [(6@[2,3]),[(),(),()]]
+```
+
+
+
 
 ===Running the compiler and the GFCC interpreter===
 
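The encoding table for linearization types added in the hunk above can be read as a small recursive function. The Python sketch below is not part of the commit or of GF. The tuple-based type representation and the names `param`, `rec`, `table`, `STR`, `param_size`, and `encode` are hypothetical. It also assumes that a record counts as a "record of parameters" exactly when every field is itself built only from parameter types, and that field order is already fixed. Under those assumptions it reproduces the `__NP` term from the example.

```python
from math import prod

# Hypothetical mini-representation of linearization types (illustrative only).
def param(n):            # parameter type P with size(P) = n
    return ("param", n)

def rec(*fields):        # record type; labels are dropped, field order is kept
    return ("rec", fields)

def table(n, t):         # table type P => T with size(P) = n
    return ("table", n, t)

STR = ("str",)           # the type Str


def param_size(t):
    """Number of values of t if t consists only of parameter types, else None."""
    if t[0] == "param":
        return t[1]
    if t[0] == "rec":
        sizes = [param_size(f) for f in t[1]]
        return None if None in sizes else prod(sizes)
    return None


def encode(t):
    """Render the term that represents linearization type t."""
    if t[0] == "param":                      # P* = size(P)
        return str(t[1])
    if t[0] == "str":                        # Str* = ()
        return "()"
    if t[0] == "table":                      # (P => T)* = [T*,...,T*], size(P) times
        return "[" + ",".join([encode(t[2])] * t[1]) + "]"
    fields = "[" + ",".join(encode(f) for f in t[1]) + "]"
    size = param_size(t)
    if size is None:                         # other record: [T1*,...,Tn*]
        return fields
    return f"({size}@{fields})"              # record of parameters: (I* @ R*)


# The NP example: a : {n : Number ; p : Person} ; s : Case => Str
np_type = rec(rec(param(2), param(3)), table(3, STR))
print("__NP = " + encode(np_type))           # prints: __NP = [(6@[2,3]),[(),(),()]]
```

Deciding the "record of parameters" case by checking the fields is a simplification chosen for this sketch; the actual compiler may make that decision differently.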
@@ -584,16 +620,16 @@ Ubuntu Linux laptop with 1.5 GHz Intel centrino processor.
 || | GF | gfcc(hs) | gfcc++ |
 | program size | 7249k | 803k | 113k
 | grammar size | 336k | 119k | 119k
-| read grammar | 1150ms | 510ms | 150ms
+| read grammar | 1150ms | 510ms | 100ms
 | generate 222 | 9500ms | 450ms | 800ms
-| memory | 21M | 10M | 2M
+| memory | 21M | 10M | 20M
 
 
 
 To summarize:
 - going from GF to gfcc is a major win in both code size and efficiency
-- going from Haskell to C++ interpreter is a win in code size and memory,
-but not so much in speed
+- going from Haskell to C++ interpreter is not a win yet, because of a space
+leak in the C++ version
 
 
 