<html>
<body bgcolor="#FFFFFF" text="#000000">
<center>
<h1>Grammatical Framework Version 2.2</h1>
Highlights of GF version 2.2.
<p>
9/5/2005
<p>
<a href="http://www.cs.chalmers.se/~aarne">Aarne Ranta</a>
</center>
<h2>Summary of novelties in Version 2.2 in comparison to 2.1</h2>
<ul>
<li> New optimizations to reduce the size of GFC files
<li> Improved parsing algorithms
<li> Lots of bug fixes
<li> Separate <tt>reuse</tt> modules no longer needed
<li> Several new command options
<li> New documentation:
<ul>
<li> <a href="gf-modules.html">module system document</a>
<li> <a href="tutorial/gf-tutorial2.html">new tutorial</a>, based on the module system (unfinished)
</ul>
<li> New resource libraries
<li> New example grammars
<li> Visualization of the module dependency graph
<li> In the editor GUI, text corresponding to subtrees with constraints is marked in red
<li> Hierarchic modules used in the source code
<li> <a href="http://www.haskell.org/haddock">haddock</a> documentation available for the source code
<li> Optimizations to reduce GF's memory footprint when using large grammars
<li> The <tt>pm</tt> command can now convert identifiers in the grammar to UTF-8
</ul>
<h3>Compiler optimizations</h3>
The sometimes exploding size of generated <tt>gfc</tt> and
<tt>gfr</tt> files has made it urgent to find optimizations
that reduce the size of the code. There are five
optimization settings that can be chosen as the value of the
<tt>optimize</tt> flag:
<ul>
<li> <tt>share</tt>: group tables so that common branch values are shared
by the use of disjunctive patterns.
<li> <tt>parametrize</tt>: if table branches differ at most at the
occurrence of the pattern, replace the expanded table by a one-branch
table with a variable. If this fails, perform <tt>share</tt>.
<li> <tt>values</tt>: only show the values of table branches, not the
patterns.
<li> <tt>all</tt>: try <tt>parametrize</tt>; if this fails, do <tt>values</tt>.
<li> <tt>none</tt>: don't do any optimizations
</ul>
The <tt>share</tt> and <tt>parametrize</tt> optimizations are always
beneficial, whereas the <tt>values</tt> optimization may slow down the
use of the table. However, it works very well for grammars that consist mostly
of the inflection tables of lexical items: it can reduce the file size
by a factor of 4.
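<p>
As a rough illustration of the three table optimizations, the following sketch
writes the expanded and the optimized forms side by side as operations in a GF
source module. The module <tt>OptimizeDemo</tt> and all names in it are
hypothetical. The compiler applies these transformations to the generated code,
not to source files, and the actual GFC notation differs; whether GF 2.2 accepts
the last form in source files is also an assumption.
<pre>
-- OptimizeDemo.gf (hypothetical module, for illustration only)
resource OptimizeDemo = {
  param Number = Sg | Pl ;
  oper
    -- expanded tables, roughly what optimize=none keeps
    fish   : Number => Str = table {Sg => "fish" ; Pl => "fish"} ;
    dog    : Number => Str = table {Sg => "dog" ; Pl => "dogs"} ;
    theDog : Number => Str =
      table {Sg => "the" ++ dog ! Sg ; Pl => "the" ++ dog ! Pl} ;

    -- share: branches with equal values are grouped under a disjunctive pattern
    fishShared : Number => Str = table {Sg | Pl => "fish"} ;

    -- parametrize: the branches differ only where the pattern occurs,
    -- so one branch with a variable replaces them
    theDogParam : Number => Str = table {n => "the" ++ dog ! n} ;

    -- values: the patterns are dropped and only the branch values are listed,
    -- in the order of the parameter type
    dogValues : Number => Str = table Number ["dog" ; "dogs"] ;
}
</pre>
The <tt>dog</tt> table cannot be shared or parametrized, since its branch values
differ and do not depend on the pattern in a uniform way. This is the typical
situation for inflection tables, where only <tt>values</tt> saves space.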
<p>
An optimization can be selected individually for each
<tt>resource</tt> and <tt>concrete</tt> module by including
the judgement
<pre>
flags optimize=(share|parametrize|values|all|none) ;
</pre>
in the module body. These flags can be overridden by a flag given
in the <tt>i</tt> command, e.g.
<pre>
i -src -optimize=none Foo.gf
</pre>
Notice that the option <tt>-src</tt> is needed if there already are
generated files created with other optimization flags.
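<p>
A minimal sketch of where the judgement goes, assuming a hypothetical grammar
consisting of the modules <tt>Foo</tt> and <tt>FooEng</tt>, each in its own file.
The category, function and parameter names are made up for the illustration.
<pre>
-- Foo.gf (hypothetical abstract syntax)
abstract Foo = {
  cat N ;
  fun Fish : N ;
}

-- FooEng.gf (hypothetical concrete syntax)
concrete FooEng of Foo = {
  flags optimize=values ;   -- applies to this module only
  param Number = Sg | Pl ;
  lincat N = {s : Number => Str} ;
  lin Fish = {s = table {Sg => "fish" ; Pl => "fish"}} ;
}
</pre>
With these files, the command <tt>i -src -optimize=none FooEng.gf</tt> would
recompile from source and ignore the <tt>values</tt> setting given in the module.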
<p>
<b>Important notice</b>: If you use the
<a href="http://www.cs.chalmers.se/~bringert/gf/gf-java.html">
Embedded GF Interpreter</a>,
or the improved parsing algorithms described below,
only the values <tt>none</tt>,
<tt>share</tt> and <tt>values</tt> can be used; the stronger optimizations are not
supported yet.
Also note that currently, GF aborts and reports an error if the stronger optimizations are used
when creating the grammar for the Embedded GF Interpreter, or when trying to parse.
<h3>Improved parsing algorithms</h3>
We have implemented some of the parsing algorithms suggested in
Peter Ljunglöf's <a href="http://www.cs.chalmers.se/~peb/pubs.html">PhD thesis</a>.
The following parsing options are now available:
<ul>
<li>The default parser. It uses a (possibly) very overgenerating context-free grammar, and filters the resulting parse trees by type-checking.
<li>The <tt>-cfg</tt> flag. It uses a much less overgenerating context-free grammar, and filters as above.
<li>The <tt>-mcfg</tt> flag. It uses an even less overgenerating <em>multiple context-free grammar</em>.
If the abstract syntax is context-free, meaning that there are no dependent types and only first-order functions,
the trees do not have to be filtered at all.
</ul>
The option <tt>-parser=X</tt> selects the parsing strategy. The default parser has the strategies
<tt>chart</tt>, <tt>bottomup</tt>, <tt>topdown</tt>, <tt>old</tt>, with the first one being the default.
The <tt>-cfg</tt> and <tt>-mcfg</tt> parsers only recognize the <tt>bottomup</tt> and <tt>topdown</tt> strategies.
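<p>
The parser flags and the strategy option can be combined in one <tt>p</tt> command.
The following calls are a sketch built only from the flags described above; the
input string is just an example, and the order of the flags is an assumption.
<pre>
> p -cat=S -parser=topdown "you will be running"
> p -cat=S -cfg -parser=bottomup "you will be running"
> p -cat=S -mcfg -parser=topdown "you will be running"
</pre>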
<p>
<b>Note</b> that the <tt>-cfg</tt> and <tt>-mcfg</tt> parsers can take a very long time on their first call, since
they have to convert the GF grammar. This will only happen once in a GF run, provided the GF files are not changed.
<p>
<b>Tips</b> for choosing the best parser for your grammar: try the default parser first; if it is too slow, try the other two.
Remember that they will be very slow the first time you parse, since they have to build parsing information.
The <tt>-cfg</tt> parser is best on grammars with many parameters and inflection tables, and
the <tt>-mcfg</tt> parser is even better when the grammar also has discontinuous constituents.
<p>
Here is a small example from the resource library:
<pre>
> i -src -optimize=share lib/resource/english/LangEng.gf
> p -cat=S ""
> p -cat=S -cfg ""
> p -cat=S -mcfg ""
{Comment: Just some dummy parsing calls to calculate the parsing information}
> p -cat=S -rawtrees=200000 "you will be running"
{Comment: Nr of unfiltered trees: 169296 -- 99,996% of the trees are ill-typed}
UseCl (PosTP TFuture ASimul) (SPredProgVP thou_NP (IPredV AAnter run_V))
UseCl (PosTP TFuture ASimul) (SPredProgVP thou_NP (IPredV ASimul run_V))
UseCl (PosTP TFuture ASimul) (SPredProgVP ye_NP (IPredV AAnter run_V))
UseCl (PosTP TFuture ASimul) (SPredProgVP ye_NP (IPredV ASimul run_V))
UseCl (PosTP TFuture ASimul) (SPredProgVP you_NP (IPredV AAnter run_V))
UseCl (PosTP TFuture ASimul) (SPredProgVP you_NP (IPredV ASimul run_V))
17730 msec
> p -cat=S -cfg "you will be running"
{Comment: Nr of unfiltered trees: 246 -- 97,5% of the trees are ill-typed}
UseCl (PosTP TFuture ASimul) (SPredProgVP thou_NP (IPredV AAnter run_V))
UseCl (PosTP TFuture ASimul) (SPredProgVP thou_NP (IPredV ASimul run_V))
UseCl (PosTP TFuture ASimul) (SPredProgVP ye_NP (IPredV AAnter run_V))
UseCl (PosTP TFuture ASimul) (SPredProgVP ye_NP (IPredV ASimul run_V))
UseCl (PosTP TFuture ASimul) (SPredProgVP you_NP (IPredV AAnter run_V))
UseCl (PosTP TFuture ASimul) (SPredProgVP you_NP (IPredV ASimul run_V))
1580 msec
> p -cat=S -mcfg "you will be running"
{Comment: Nr of unfiltered trees: 6 -- all trees are type-correct}
UseCl (PosTP TFuture ASimul) (SPredProgVP thou_NP (IPredV AAnter run_V))
UseCl (PosTP TFuture ASimul) (SPredProgVP thou_NP (IPredV ASimul run_V))
UseCl (PosTP TFuture ASimul) (SPredProgVP ye_NP (IPredV AAnter run_V))
UseCl (PosTP TFuture ASimul) (SPredProgVP ye_NP (IPredV ASimul run_V))
UseCl (PosTP TFuture ASimul) (SPredProgVP you_NP (IPredV AAnter run_V))
UseCl (PosTP TFuture ASimul) (SPredProgVP you_NP (IPredV ASimul run_V))
470 msec
</pre>
</body>
</html>