
<html>

<body bgcolor="#FFFFFF" text="#000000">

<center>

<h1>Grammatical Framework Version 2</h1>

Highlights, versions 2.0, 2.1, and 2.2 (2.2 coming soon)

<p>

13/10/2003 - 25/11 - 2/4/2004 - 18/6 - 13/10 - 16/2/2005

<p>

<a href="http://www.cs.chalmers.se/~aarne">Aarne Ranta</a>

</center>


<h2>Syntax of GF</h2>

An accurate <a href="DocGF.pdf">language specification</a> is now available.


<h2>Summary of novelties in Versions 2.0 to 2.2</h2>

<h4>Module system</h4>

<ul>
<li> Separate modules for <tt>abstract</tt>,
     <tt>concrete</tt>, and <tt>resource</tt>.
<li> Replaces the file-based <tt>include</tt> system.
<li> Name space handling with qualified names.
<li> Hierarchical structure (single inheritance <tt>**</tt>) +
     cross-cutting reuse (<tt>open</tt>).
<li> Separate compilation, one module per file.
<li> Reuse of <tt>abstract</tt>+<tt>concrete</tt> as <tt>resource</tt>.<br>
     <b>Version 2.2</b>: separate <tt>reuse</tt> modules are no longer needed.
<li> Parametrized modules:
     <tt>interface</tt>, <tt>instance</tt>, <tt>incomplete</tt>.
<li> New experimental module types: <tt>transfer</tt>,
     <tt>union</tt>.
<li> Version 2.1: multiple inheritance in module extension.
</ul>

<h4>Canonical format GFC</h4>

<ul>
<li> The target of the GF compiler; to reuse a compiled module, just read it in.
<li> Readable by Haskell/Java/C++/C applications.
<li> Version 2.1: Java interpreter available for GFC (by Björn Bringert).
<li> <b>Version 2.2</b>: new optimizations to reduce the size of GFC files.
</ul>


<h4>New features in expression language</h4>

<ul>
<li> Disjunctive patterns <tt>P | ... | Q</tt>.
<li> String patterns <tt>"foo"</tt>.
<li> Binding token <tt>&+</tt>, which glues adjacent tokens together in
     the unlexing phase, and an unlexer that resolves it.
<li> New syntax alternatives for local definitions: <tt>let</tt> without
     braces, and <tt>where</tt>.
<li> Pattern variables can be used on the left-hand sides of <tt>oper</tt> definitions.
<li> New Unicode transliterations (by Harald Hammarström).
<li> Version 2.1: Initial segments of integers
     (<tt>Ints</tt><i>n</i>) available as parameter types.
</ul>
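
<p>

A minimal Python sketch of the unlexing step for the binding token is
given below. It is an illustration only, not GF's unlexer: it assumes
that tokens are otherwise separated by single spaces, and the name
<tt>unlex_bind</tt> is hypothetical.
<pre>
BIND = "&+"

def unlex_bind(tokens):
    """Join tokens with spaces, but glue the neighbours of each BIND token.

    Hypothetical sketch, not GF's actual unlexer.
    """
    words = []
    glue_next = False
    for tok in tokens:
        if tok == BIND:
            glue_next = True           # attach the next token to the previous word
        elif glue_next and words:
            words[-1] += tok
            glue_next = False
        else:
            words.append(tok)
            glue_next = False
    return " ".join(words)

print(unlex_bind(["the", "car", "&+", "s", "run"]))   # the cars run
</pre>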


<h4>New shell commands and command functionalities</h4>

<ul>
<li> <tt>pi</tt> = <tt>print_info</tt>: information on an identifier in scope.
<li> <tt>h</tt> = <tt>help</tt> is now available in long or short form,
     and for individual commands.
<li> <tt>gt</tt> = <tt>generate_trees</tt>: all trees of a given
     category, or all instantiations of a given incomplete term, up to a
     given depth.
<li> <tt>gr</tt> = <tt>generate_random</tt> can now be given
     an incomplete term as an argument, to constrain generation.
<li> <tt>so</tt> = <tt>show_opers</tt> shows all <tt>oper</tt>
     operations with a given value type.
<li> <tt>pm</tt> = <tt>print_multi</tt> prints the multilingual
     grammar resident in the current state to a ready-compiled
     <tt>.gfcm</tt> file.
<li> <b>Version 2.2</b>: several new command options.
<li> <b>Version 2.2</b>: <tt>vg</tt> visualizes the module dependency graph.
<li> All commands have both long and short names (see help). Short
     names are easier to type, whereas long names
     make scripts more readable.
<li> Meaningless command options generate warnings.
</ul>
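
<p>

To make "up to a given depth" concrete, the Python sketch below enumerates
all trees of category <tt>Exp</tt> in the <tt>Sums</tt> grammar shown
later in this document (functions <tt>One</tt> and <tt>plus</tt>). It only
illustrates the idea behind <tt>gt</tt>: the depth convention (a leaf has
depth 1) and the tree notation are assumptions, not a description of the
implementation.
<pre>
def trees(depth):
    """All Exp trees of the Sums grammar whose depth is at most `depth`.

    Hypothetical sketch: One has depth 1, and (plus x y) is one level
    deeper than its deeper argument.
    """
    if depth == 0:
        return []
    smaller = trees(depth - 1)
    return ["One"] + [f"(plus {x} {y})" for x in smaller for y in smaller]

for t in trees(3):
    print(t)
</pre>
With depth 3 this prints five trees, from <tt>One</tt> up to
<tt>(plus (plus One One) (plus One One))</tt>.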


<h4>New editor features</h4>

<ul>
<li> Active text field: click the middle mouse button in the focus to send
     the text in as a refinement through the parser.
<li> Clipboard: copy complex terms into the refine menu.
<li> <b>Version 2.2</b>: text corresponding to subtrees with constraints is marked in red.
</ul>


<h4>Improved implementation</h4>

<ul>
<li> Haskell source code is organized into subdirectories.
<li> The BNF Converter is used for defining the languages GF and GFC, which also
     gives reliable LaTeX documentation.
<li> Lexical rules are sorted out by the option <tt>-cflexer</tt> for efficient
     parsing with large lexica.
<li> GHC optimizations and strictness flags are used for improving performance.
<li> <b>Version 2.2</b>: started <a
     href="http://www.haskell.org/haddock">haddock</a> documentation
     by using uniform module headers.
</ul>


<h4>New parser (work in progress)</h4>

<ul>
<li> By Peter Ljunglöf, based on MCFG (multiple context-free grammars).
<li> Much more efficient for morphology and discontinuous constituents.
<li> Treatment of cyclic rules.
<li> Version 2.1: improved generation of speech recognition
     grammars (by Björn Bringert).
<li> Version 2.1: output of Labelled BNF files readable by the
     BNF Converter.
</ul>


<!-- NEW -->

<h2>Abstract, concrete, and resource modules</h2>

Judgement forms are sorted as follows:
<ul>
<li> abstract:
  <tt>cat</tt>, <tt>fun</tt>, <tt>def</tt>, <tt>data</tt>, <tt>flags</tt>
<li> concrete:
  <tt>lincat</tt>, <tt>lin</tt>, <tt>printname</tt>, <tt>flags</tt>
<li> resource:
  <tt>param</tt>, <tt>oper</tt>, <tt>flags</tt>
</ul>
Example:
<pre>
  abstract Sums = {
    cat
      Exp ;
    fun
      One : Exp ;
      plus : Exp -> Exp -> Exp ;
  }

  concrete EnglishSums of Sums = open ResEng in {
    lincat
      Exp = {s : Str ; n : Number} ;
    lin
      One = expSg "one" ;
      plus x y = expSg ("the" ++ "sum" ++ "of" ++ x.s ++ "and" ++ y.s) ;
  }

  resource ResEng = {
    param
      Number = Sg | Pl ;
    oper
      expSg : Str -> {s : Str ; n : Number} = \s -> {s = s ; n = Sg} ;
  }
</pre>
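
<p>

One way to read the concrete syntax is as a recursive function from trees
to records. The Python sketch below mimics <tt>EnglishSums</tt> for
illustration only: records become dictionaries, the <tt>oper</tt>
<tt>expSg</tt> becomes an ordinary function, and trees are encoded as
<tt>"One"</tt> or as tuples <tt>("plus", x, y)</tt>, an encoding assumed
here and not used by GF.
<pre>
def exp_sg(s):
    """Counterpart of the oper expSg: a singular expression record."""
    return {"s": s, "n": "Sg"}

def lin(tree):
    """Linearize a Sums tree, following the lin rules of EnglishSums."""
    if tree == "One":
        return exp_sg("one")
    fun, x, y = tree
    if fun == "plus":
        words = ["the", "sum", "of", lin(x)["s"], "and", lin(y)["s"]]
        return exp_sg(" ".join(words))
    raise ValueError(f"unknown function: {fun}")

print(lin(("plus", "One", ("plus", "One", "One")))["s"])
# the sum of one and the sum of one and one
</pre>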


<!-- NEW -->

<h2>Opening and extending modules</h2>

A <tt>concrete</tt> or <tt>resource</tt> can <b>open</b> a
<tt>resource</tt>. This means that
<ul>
<li> the names defined in <tt>resource</tt> can be used ("become visible")
<li> but: these names are not included in ("exported from") the opening module
</ul>
A module of any type can moreover <b>extend</b> a module of the same type.
This means that
<ul>
<li> the names defined in the extended module can be used ("become visible")
<li> and also: these names are included in ("exported from") the extending module
</ul>
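
<p>

The difference between opening and extending can be modelled with sets of
names. In the following Python sketch, which is an illustration and not how
GF represents modules, each module records the names it defines, the modules
it extends, and the modules it opens. The table reuses the module names from
the examples in this document.
<pre>
MODULES = {
    "Sums":        {"defines": {"Exp", "One", "plus"}, "extends": [], "opens": []},
    "Products":    {"defines": {"times"}, "extends": ["Sums"], "opens": []},
    "ResEng":      {"defines": {"Number", "Sg", "Pl", "expSg"},
                    "extends": [], "opens": []},
    "EnglishSums": {"defines": {"Exp", "One", "plus"}, "extends": [],
                    "opens": ["ResEng"]},
    "English":     {"defines": {"times"}, "extends": ["EnglishSums"],
                    "opens": ["ResEng"]},
}

def exported(name):
    """Names exported by a module: its own names plus those of extended modules."""
    mod = MODULES[name]
    names = set(mod["defines"])
    for parent in mod["extends"]:
        names |= exported(parent)
    return names

def visible(name):
    """Names usable inside a module: its exports plus the exports of opened modules."""
    names = exported(name)
    for opened in MODULES[name]["opens"]:
        names |= exported(opened)
    return names

print(sorted(exported("English")))   # ['Exp', 'One', 'plus', 'times']
print("expSg" in visible("English"), "expSg" in exported("English"))   # True False
</pre>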
Examples of extension:
<pre>
  abstract Products = Sums ** {
    fun times : Exp -> Exp -> Exp ;
  }
  -- names exported: Exp, One, plus, times

  concrete English of Products = EnglishSums ** open ResEng in {
    lin times x y = expSg ("the" ++ "product" ++ "of" ++ x.s ++ "and" ++ y.s) ;
  }
</pre>

<p>

Opening, but not extension, can be <b>qualified</b>:
<pre>
  concrete NumberSystems of Systems = open (Bin = Binary), (Dec = Decimal) in {
    lin
      BZero = Bin.Zero ;
      DZero = Dec.Zero ;
  }
</pre>

<p>

<b>Version 2.1</b> introduces <b>multiple inheritance</b>: a module
can extend several modules at the same time, for instance,
<pre>
  abstract Dialogue = User, System ** { ...}
</pre>
may be used to put together "User's moves" and "System's moves" into
one Dialogue System grammar.


<!-- NEW -->

<h2>Compiling modules</h2>

Separate compilation assumes there is <b>one module per file</b>.

<p>

The <b>module header</b> is the beginning of the module code up to the
first left brace (<tt>{</tt>). The header gives
<ul>
<li> the module type: <tt>abstract</tt>, <tt>concrete</tt> (<tt>of</tt> <i>A</i>),
  or <tt>resource</tt>
<li> the name of the module (next to the module type keyword)
<li> the names of extended modules (between <tt>=</tt> and <tt>**</tt>)
<li> the names of opened modules
</ul>
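
<p>

Because the header names all the modules a module depends on, it can be
split into these parts without looking at the module body. The Python
sketch below does so for the header shapes appearing in this document
(including qualified opening and multiple extension). The helper
<tt>parse_header</tt> is hypothetical and is not GF's parser.
<pre>
import re

def parse_header(source):
    """Parse a GF module header, i.e. the text before the first '{'."""
    header = source.split("{", 1)[0]
    lhs, _, rhs = header.partition("=")    # the first '=' follows the module name
    words = lhs.split()
    info = {
        "type": words[0],                    # abstract, concrete, resource, ...
        "name": words[1],
        "of": words[3] if words[0] == "concrete" else None,
        "extends": [],
        "opens": [],                         # pairs (qualifier or None, module)
    }
    if "**" in rhs:
        ext, _, rhs = rhs.partition("**")
        info["extends"] = [m.strip() for m in ext.split(",") if m.strip()]
    match = re.search(r"\bopen\b(.*)\bin\b", rhs, re.S)
    if match:
        for item in match.group(1).split(","):
            item = item.strip().strip("()")
            if "=" in item:
                alias, module = (part.strip() for part in item.split("="))
                info["opens"].append((alias, module))
            else:
                info["opens"].append((None, item))
    return info

print(parse_header("concrete English of Products = EnglishSums ** open ResEng in {"))
</pre>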

<p>

<b>filename</b> = <b>modulename</b> <tt>.</tt> <b>extension</b>

<p>

File name extensions:
<ul>
<li> <tt>gf</tt>: GF source file (uses GF syntax, is type checked and compiled)
<li> <tt>gfc</tt>: canonical GF file (uses GFC syntax, is simply read
in instead of compiled; produced from all kinds of modules)
<li> <tt>gfr</tt>: GF resource file (uses GF syntax, is only read in; produced from
<tt>resource</tt> modules)
<li> <tt>gfcm</tt>: canonical multilingual GF file
(uses GFC syntax, is only read in; produced
from a set of <tt>abstract</tt> and <tt>concrete</tt> modules)
</ul>
Only <tt>gf</tt> files should ever be written/edited manually!

<p>

What the make facility does when compiling <tt>Foo.gf</tt>:
<ol>
<li> read the module header of <tt>Foo.gf</tt>, and recursively all headers from
the modules it <b>depends</b> on (i.e. extends or opens)
<li> build a dependency graph of these modules, and do topological sorting
<li> starting from the first module in topological order,
compare the modification times of each module's <tt>gf</tt> and <tt>gfc</tt> files:
<ul>
<li> if the <tt>gf</tt> file is newer, compile the module and all modules depending on it
<li> if the <tt>gfc</tt> file is newer, just read in the module
</ul>
</ol>
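
<p>

The steps above can be sketched in Python as follows. This is a hypothetical
illustration, not GF's make facility: it uses the standard
<tt>graphlib</tt> module (Python 3.9 or later) for the topological sort,
and represents modification times as plain numbers.
<pre>
from graphlib import TopologicalSorter

def make_plan(deps, gf_time, gfc_time):
    """Decide for each module whether to compile it or just read its gfc file.

    deps maps a module to the modules it extends or opens; gf_time and
    gfc_time map a module to the modification times of its gf and gfc
    files (a missing gfc entry means there is no gfc file yet).
    """
    plan = {}
    # static_order() lists every module after all the modules it depends on.
    for mod in TopologicalSorter(deps).static_order():
        stale = gfc_time.get(mod) is None or gf_time[mod] > gfc_time[mod]
        # A module is also recompiled if anything it depends on was recompiled.
        if stale or any(plan[d] == "compile" for d in deps.get(mod, ())):
            plan[mod] = "compile"
        else:
            plan[mod] = "read"
    return plan

deps = {"Sums": [], "Products": ["Sums"], "ResEng": [],
        "EnglishSums": ["ResEng"], "English": ["EnglishSums", "ResEng"]}
gf_time = {"Sums": 1, "Products": 1, "ResEng": 3, "EnglishSums": 1, "English": 1}
gfc_time = {"Sums": 2, "Products": 2, "ResEng": 2, "EnglishSums": 2, "English": 2}
print(make_plan(deps, gf_time, gfc_time))
</pre>
Here only <tt>ResEng.gf</tt> has been edited since it was last compiled, so
<tt>ResEng</tt>, <tt>EnglishSums</tt>, and <tt>English</tt> are recompiled,
while <tt>Sums</tt> and <tt>Products</tt> are just read in.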
Inside the GF shell, the time stamps of modules already read into memory
are also taken into account. Thus a module need not be read from a file if
it is already in memory and the file has not been modified since.

<p>

If the compilation of a grammar fails at some module, the state of the
GF shell contains all modules read up to that point. This makes it
faster to compile the faulty module again after fixing it.

<p>

Use the command <tt>po</tt> = <tt>print_options</tt> to see what
modules are in the state.

<p>

To force compilation:
<ul>
<li> The flag <i>-src</i> in the import command forces compilation from
     source even if more recent object files exist. This is useful
     when testing new versions of GF.
<li> The flag <i>-retain</i> in the import command forces reading in
     <tt>gfr</tt> files in addition to <tt>gfc</tt> files. This is useful
     when testing operations with the <tt>cc</tt> command.
</ul>

<!-- NEW -->

<h3>Compiler optimizations</h3>

<b>Version 2.2</b>

<p>

The size of the generated <tt>gfc</tt> and
<tt>gfr</tt> files, which sometimes explodes, has made it urgent to find
optimizations that reduce the size of the code. There are five
optimization settings that can be chosen as the value of the
<tt>optimize</tt> flag:
<ul>
<li> <tt>share</tt>: group tables so that common branch values are shared
by the use of disjunctive patterns.
<li> <tt>parametrize</tt>: if the table branches differ only at the
occurrences of each branch's own pattern, replace the expanded table by a
one-branch table with a variable. If this fails, perform <tt>share</tt>.
<li> <tt>values</tt>: only show the values of table branches, not the
patterns.
<li> <tt>all</tt>: try <tt>parametrize</tt>; if this fails, do <tt>values</tt>.
<li> <tt>none</tt>: don't do any optimizations
</ul>
The <tt>share</tt> and <tt>parametrize</tt> optimizations are always
beneficial, whereas the <tt>values</tt> optimization may slow down the
use of the table. However, <tt>values</tt> works very well for grammars
consisting mostly of the inflection tables of lexical items: it can reduce
the file size by a factor of 4.
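
<p>

The <tt>share</tt> and <tt>values</tt> optimizations can be pictured on a
table represented as a list of (pattern, value) pairs. The Python sketch
below only illustrates the idea on a hypothetical noun table; it does not
produce the GFC code that the compiler generates.
<pre>
def share(table):
    """Group branches with equal values into one disjunctive-pattern branch."""
    groups = {}                          # value -> patterns yielding that value
    for pattern, value in table:
        groups.setdefault(value, []).append(pattern)
    return [(" | ".join(patterns), value) for value, patterns in groups.items()]

def values(table):
    """Keep only the branch values; the patterns are implied by their order."""
    return [value for _, value in table]

noun = [("Nom", "car"), ("Acc", "car"), ("Gen", "car's")]
print(share(noun))    # [('Nom | Acc', 'car'), ('Gen', "car's")]
print(values(noun))   # ['car', 'car', "car's"]
</pre>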

<p>

An optimization can be selected individually for each
<tt>resource</tt> and <tt>concrete</tt> module by including
the judgement
<pre>
  flags optimize=(share|parametrize|values|all|none) ;
</pre>
in the module body. These flags can be overridden by a flag given
in the <tt>i</tt> command, e.g.
<pre>
  i -src -optimize=none Foo.gf
</pre>
Note that the option <tt>-src</tt> is needed if generated files
created with other optimization flags already exist.


<!-- NEW -->

<h2>Module search paths</h2>

Modules can reside in different directories. Use the <tt>path</tt>
flag to extend the directory search path. For instance,
<pre>
  -path=.:../resource/russian:../prelude
</pre>
enables files to be found in three different directories.
By default, only the current directory is included.
If a <tt>path</tt> flag is given, the current directory
<tt>.</tt> must be explicitly included if it is wanted.

<p>

The <tt>path</tt> flag can be set in any of the following
places:
<ul>
<li> when invoking GF: <tt>gf -path=xxx</tt>
<li> when importing a module: <tt>i -path=xxx Foo.gf</tt>
<li> as a pragma in a topmost file: <tt>--# -path=xxx</tt>
</ul>
A flag set on the command line overrides flags set in files.
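
<p>

How a module file is looked up along the search path can be sketched in
Python as follows. The function names are hypothetical, the sketch only
looks for <tt>gf</tt> files, and it reduces flag precedence to a single
rule: a path given on the command line wins over one given in a file pragma.
<pre>
import os

def effective_path(command_line=None, pragma=None):
    """The directories to search: the command-line flag overrides a pragma,
    and without any path flag only the current directory is searched."""
    value = command_line or pragma or "."
    return value.split(":")

def find_module(name, path, exists=os.path.isfile):
    """Return the first name.gf found in the path directories, or None."""
    for directory in path:
        candidate = os.path.join(directory, name + ".gf")
        if exists(candidate):
            return candidate
    return None

path = effective_path(pragma=".:../resource/russian:../prelude")
print(path)   # ['.', '../resource/russian', '../prelude']
</pre>
Note that with a path flag such as <tt>../prelude</tt> alone, the current
directory is not searched, which mirrors the rule stated above.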


<!-- NEW -->

<h2>How to use GF 1.* files</h2>

Backward compatibility with respect to old GF grammars has been
a central goal. All GF grammars, from version 0.9, should work in
the old way in GF2. The main exceptions are some features that
are rarely used.
<ul>
<li> The <tt>package</tt> system introduced in GF 1.2 cannot be
     interpreted in the module system of GF 2.0, since packages are in
     mutual scope with the top level.
<li> <tt>tokenizer</tt> pragmas can no longer be parsed. In GF
     1.2, they had already been replaced by <tt>lexer</tt> flags.
<li> <tt>var</tt> pragmas can no longer be parsed.
</ul>

<p>

Very old GF grammars (from versions before 0.9), which use a completely
different notation, do not work. They should first be converted to
GF1 by using GF version 1.2.

<p>

The import command <tt>i</tt> can be given the option <tt>-old</tt>. E.g.
<pre>
  i -old tut1.Eng.gf
</pre>
But this is no longer necessary: GF2 automatically detects whether a grammar
is in the GF1 format.

<p>

Importing a GF1 file such as <tt>tut1.Eng.gf</tt> generates, internally, three modules:
<pre>
  abstract tut1 = ...
  resource ResEng = ...
  concrete Eng of tut1 = open ResEng in ...
</pre>
(The names are different if the file name has fewer parts.)


<p>

The option <tt>-o</tt> causes GF2 to write these modules into files.

<p>

The flags <tt>-abs</tt>, <tt>-cnc</tt>, and <tt>-res</tt> can be used
to give custom names to the modules. In particular, it is good to use
the <tt>-abs</tt> flag to guarantee that the abstract syntax module
has the same name for all grammars in a multilingual environment:
<pre>
  i -old -abs=Numerals hungarian.gf
  i -old -abs=Numerals tamil.gf
  i -old -abs=Numerals sanskrit.gf
</pre>
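
<p>

The naming of the generated modules can be summarized in a small Python
sketch. It is hypothetical: it generalizes the <tt>tut1.Eng.gf</tt> example
above (an assumption), covers only three-part file names of the form
<i>abstract</i><tt>.</tt><i>language</i><tt>.gf</tt>, and lets its optional
arguments play the role of the <tt>-abs</tt>, <tt>-cnc</tt>, and
<tt>-res</tt> flags.
<pre>
def gf1_module_names(filename, abs_name=None, cnc_name=None, res_name=None):
    """Names of the abstract, concrete, and resource modules made from a GF1 file."""
    abs_part, lang, _ = filename.split(".")      # e.g. tut1.Eng.gf
    return {
        "abstract": abs_name or abs_part,
        "concrete": cnc_name or lang,
        "resource": res_name or "Res" + lang,
    }

print(gf1_module_names("tut1.Eng.gf"))
print(gf1_module_names("tut1.Eng.gf", abs_name="Tutorial"))
</pre>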

<p>

The same flags as in the import command can be used when invoking
GF2 from the system shell. Many grammars can be imported on the same command
line, e.g.
<pre>
  % gf2 -old -abs=Tutorial tut1.Eng.gf tut1.Fin.gf tut1.Fra.gf
</pre>

<p>

To write a GF2 grammar back to GF1 (as one big file), use the command
<pre>
  > pg -old
</pre>


<p>


GF2 has more reserved words than GF 1.2. When old files are read, a preprocessor
replaces every identifier that has the shape of a new reserved word
with a variant where the last letter is replaced by <tt>Z</tt>, e.g.
<tt>instance</tt> is replaced by <tt>instancZ</tt>. This method is of course
unsafe and should be replaced by something better.
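
<p>

The renaming can be sketched in Python as below. The set of reserved words
is an illustrative subset (module keywords mentioned in this document,
assumed to be reserved), not GF2's full list. Like any purely lexical
rewrite, the sketch also renames matching words inside string literals,
which is one way such a method can go wrong.
<pre>
import re

# Illustrative subset of reserved words; not the complete GF2 list.
NEW_RESERVED = {"instance", "interface", "incomplete", "transfer", "union"}

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_']*")

def rename_reserved(source, reserved=NEW_RESERVED):
    """Replace the last letter of every identifier that is a reserved word by Z."""
    def fix(match):
        word = match.group(0)
        return word[:-1] + "Z" if word in reserved else word
    return IDENT.sub(fix, source)

print(rename_reserved('fun instance : Kind ; lin instance = ss "instance" ;'))
# fun instancZ : Kind ; lin instancZ = ss "instancZ" ;
</pre>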


<!-- NEW -->

<h2>Missing features of GF 1.2 (13/10/2004)</h2>

Generally, GF1 grammars can be automatically translated to GF2, although the
result is not as good as a manual translation, since indentation and
comments are lost.
The results can be
saved in GF2 files, but this is not necessary.
Some rarely used GF1 features are no longer supported (see the previous section).
It is also possible to write a GF2 grammar back to GF1, with the
command <tt>pg -printer=old</tt>.


<p>

Resource libraries
and some example grammars have been
converted. Most old example grammars work without any changes.
However, a new resource API with
many new constructions is available and recommended.

<p>

Soundness checking of module dependencies and completeness is not yet
complete. This means that some errors may show up too late.

<p>

LaTeX and XML printing of grammars do not work yet.


</body>
</html>