Grammatical Framework Version 2

Release of Version 2.0

Planned: 24 June 2004

Aarne Ranta

Highlights

Module system.

Separate compilation to canonical GF.

Improved GUI.

Improved parser generation.

Improved shell (new commands and options, help, error messages).

Accurate language specification (also of GFC).

Extended resource library.

Extended Numerals library.

Module system

  • Separate modules for abstract, concrete, and resource.
  • Replaces the file-based include system
  • Name space handling with qualified names
  • Hierarchic structure (single inheritance **) + cross-cutting reuse (open)
  • Separate compilation, one module per file
  • Reuse of abstract+concrete as resource
  • Parametrized modules: interface, instance, incomplete.
  • New experimental module types: transfer, union.

    Canonical format GFC

  • The target of GF compiler; to reuse, just read in.
  • Readable by Haskell/Java/C++/C applications (by BNFC generated parsers).

    New features in expression language

    In addition to the module system:

  • Disjunctive patterns P | ... | Q.
  • String patterns "foo".
  • (?) Integer patterns 74.
  • Binding token &+ to glue separate tokens at unlexing phase, and unlexer to resolve this.
  • New syntax alternatives for local definitions: let without braces and where.
  • Pattern variables can be used on lhs's of oper definitions.
  • New Unicode transliterations (by Harald Hammarström).
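
    The effect of the binding token &+ listed above can be pictured with a small Python sketch. It is only an illustration of the idea, not the GF unlexer: the function name unlex and the exact gluing rule (join the tokens on both sides of &+ without a space) are assumptions.

```python
BIND = "&+"  # the binding token named in the feature list


def unlex(tokens):
    """Join tokens with spaces, gluing together the neighbours of each BIND."""
    out = []
    glue_next = False
    for tok in tokens:
        if tok == BIND:
            # Remember that the next real token attaches to the previous one.
            glue_next = True
            continue
        if glue_next and out:
            out[-1] += tok
        else:
            out.append(tok)
        glue_next = False
    return " ".join(out)
```

    For instance, unlex(["walk", "&+", "ed", "home"]) returns "walked home".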

    New shell commands and command functionalities

  • pi = print_info: information on an identifier in scope.
  • h = help now in long or short form, and on individual commands.
  • gt = generate_trees: all trees of a given category or instantiations of a given incomplete term, up to a given depth.
  • gr = generate_random can now be given an incomplete term as an argument, to constrain generation.
  • so = show_opers shows all oper operations with a given value type.
  • pm = print_multi prints the multilingual grammar resident in the current state to a ready-compiled .gfcm file.
  • All commands have both long and short names (see help). Short names are easier to type, whereas long names make scripts more readable.
  • Meaningless command options generate warnings.
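
    What gt = generate_trees is described as computing, all trees of a category up to a given depth, can be sketched in Python. This is not the GF implementation: the table SUMS is a hand-written copy of the signature of the Sums example grammar, and the function name and depth convention (a constant has depth 1) are assumptions.

```python
from itertools import product

# Signature of the Sums abstract syntax: name -> (argument categories, value category)
SUMS = {
    "One": ([], "Exp"),
    "plus": (["Exp", "Exp"], "Exp"),
}


def generate_trees(sig, cat, depth):
    """Return all trees of category `cat` whose depth is at most `depth`."""
    if depth <= 0:
        return []
    trees = []
    for fun, (args, val) in sig.items():
        if val != cat:
            continue
        if not args:
            trees.append(fun)
            continue
        # Each argument subtree may use at most depth - 1 levels.
        choices = [generate_trees(sig, a, depth - 1) for a in args]
        for combo in product(*choices):
            trees.append("(" + " ".join((fun,) + combo) + ")")
    return trees
```

    With this sketch, generate_trees(SUMS, "Exp", 2) returns ["One", "(plus One One)"].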

    New editor features

  • Active text field: click the middle button in the focus to send in refinement through the parser.
  • Clipboard: copy complex terms into the refine menu.
  • Two-step refinements generated by the "Generate" operation.

    Improved implementation

  • Haskell source code is organized into subdirectories.
  • BNF Converter is used for defining the languages GF and GFC, which also give reliable LaTeX documentation.
  • Lexical rules sorted out by option -cflexer for efficient parsing with large lexica.
  • GHC optimizations and strictness flags are used for improving performance.

    New parser (work in progress)

  • By Peter Ljunglöf, based on MCFG.
  • Much more efficient for morphology and discontinuous constituents.
  • Treatment of cyclic rules.
  • Currently lots of alternative parsers via flags -parser=newX.

    Status (21/6/2004)

    Grammar compiler, editor GUIs, and shell work on all platforms (with restrictions for Solaris).

    The updated HelpFile (accessible through h command) marks unsupported features present in GF 1.2 with *. They will be supported again if interested users appear.

    GF1 grammars can be automatically translated to GF2 (although the result is not as good as a manual translation, since indentation and comments are destroyed). The results can be saved in GF2 files, but this is not necessary. Some rarely used GF1 features are no longer supported (see next section).

    It is also possible to write a GF2 grammar back to GF1, with the command pg -printer=old. Resource libraries and some example grammars have been converted. Most old example grammars work without any changes. There is a new resource API with many new constructions.

    A make facility works, finding out which modules have to be recompiled.

    Soundness checking of module dependencies and completeness is not complete. This means that some errors may show up too late.

    The environment variable GF_LIB_PATH needs some more work.

    LaTeX and XML printing of grammars do not work yet.

    How to use GF 1.* files

    Backward compatibility with respect to old GF grammars has been a central goal. All GF grammars, from version 0.9, should work in the old way in GF2. The main exceptions are some features that are rarely used.

    Very old GF grammars (from versions before 0.9), with a completely different notation, do not work. They should first be converted to GF1 by using GF version 1.2.

    The import command i can be given the option -old. E.g.

      i -old tut1.Eng.g2
    
    But this is no longer necessary: GF2 detects automatically if a grammar is in the GF1 format.

    Importing a set of GF1 files generates, internally, three modules:

      abstract tut1 = ...
      resource ResEng = ...
      concrete Eng of tut1 = open ResEng in ...
    
    (The names are different if the file name has fewer parts.)
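
    The naming shown above for tut1.Eng.gf can be written out as a small Python sketch. Only the three-part file name is covered, since the rule for shorter names is not given here; the function name module_names is hypothetical, and the sketch is not how GF2 itself computes the names.

```python
def module_names(filename):
    """Derive the three internal module names from a file name such as tut1.Eng.gf."""
    parts = filename.split(".")
    if len(parts) != 3:
        # The text only says that the names differ in this case.
        raise ValueError("naming for this file-name shape is not specified here")
    abstract, lang, _extension = parts
    return {
        "abstract": abstract,
        "resource": "Res" + lang,
        "concrete": lang,
    }
```

    Here module_names("tut1.Eng.gf") gives abstract tut1, resource ResEng, and concrete Eng, matching the three modules listed above.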

    The option -o causes GF2 to write these modules into files. The flags -abs, -cnc, and -res can be used to give custom names to the modules. In particular, it is good to use the -abs flag to guarantee that the abstract syntax module has the same name for all grammars in a multilingual environment:

      i -old -abs=Numerals hungarian.gf
      i -old -abs=Numerals tamil.gf
      i -old -abs=Numerals sanskrit.gf
    

    The same flags as in the import command can be used when invoking GF2 from the system shell. Many grammars can be imported on the same command line, e.g.

      % gf2 -old -abs=Tutorial tut1.Eng.gf tut1.Fin.gf tut1.Fra.gf
    

    To write a GF2 grammar back to GF1 (as one big file), use the command

      > pg -old
    
    GF2 has more reserved words than GF 1.2. When old files are read, a preprocessor replaces every identifier that has the shape of a new reserved word with a variant where the last letter is replaced by Z, e.g. instance is replaced by instancZ. This method is of course unsafe and should be replaced by something better.
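
    The renaming step can be illustrated with a short Python sketch. It is not the GF2 preprocessor: the set NEW_RESERVED is a hypothetical sample in which only instance is confirmed by the text, and the identifier pattern is an assumption. The sketch also rewrites matches inside string literals, which is one way in which such a method is unsafe.

```python
import re

# Hypothetical sample of new GF2 reserved words; only "instance" is
# confirmed by the paragraph above.
NEW_RESERVED = {"instance", "interface", "incomplete"}

# Assumed shape of an identifier: letter or underscore, then letters,
# digits, underscores, or primes.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_']*")


def rename_reserved(source, reserved=NEW_RESERVED):
    """Replace the last letter of every identifier in `reserved` by Z."""
    def fix(match):
        word = match.group(0)
        return word[:-1] + "Z" if word in reserved else word
    return IDENT.sub(fix, source)
```

    For example, rename_reserved("fun instance : Exp ;") returns "fun instancZ : Exp ;", while an identifier such as instances is left alone because it is matched as a whole word.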

    Abstract, concrete, and resource modules

    Judgement forms are sorted as follows:

    Example:
      abstract Sums = {
        cat 
          Exp ;
        fun 
          One : Exp ;
          plus : Exp -> Exp -> Exp ;
      }
    
      concrete EnglishSums of Sums = open ResEng in {
        lincat 
          Exp = {s : Str ; n : Number} ;
        lin
          One = expSg "one" ;
          plus x y = expSg ("the" ++ "sum" ++ "of" ++ x.s ++ "and" ++ y.s) ;
      }
    
      resource ResEng = {
        param 
          Number = Sg | Pl ;
        oper 
          expSg : Str -> {s : Str ; n : Number} = \s -> {s = s ; n = Sg} ;
      }
    

    Opening and extending modules

    A concrete or resource can open a resource. A module of any type can moreover extend a module of the same type.

    Examples of extension:
      abstract Products = Sums ** {
        fun times : Exp -> Exp -> Exp ;
      }
      -- names exported: Exp, One, plus, times
    
      concrete English of Products = EnglishSums ** open ResEng in {
        lin times x y = expSg ("the" ++ "product" ++ "of" ++ x.s ++ "and" ++ y.s) ;
      }
    
    Another important difference:
  • extension is single
  • opening can be multiple: open Foo, Bar, Baz in {...}

    Moreover:
  • opening can be qualified

    Example of qualified opening:

      concrete NumberSystems of Systems = open (Bin = Binary), (Dec = Decimal) in {
        lin 
          BZero = Bin.Zero ;
          DZero = Dec.Zero ;
      }
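
    The difference between extending and opening can be modelled in a few lines of Python. This is a simplified picture rather than the GF scope checker: the table format and function names are hypothetical, Binary and Decimal are assumed to be resources that each define Zero, and it is assumed that extension passes names on to the extending module, opening does not, and a qualified opening makes names reachable only through the given prefix.

```python
# Hypothetical module table: what each module extends, opens, and defines.
MODULES = {
    "Sums": {"extends": None, "opens": [], "defines": ["Exp", "One", "plus"]},
    "Products": {"extends": "Sums", "opens": [], "defines": ["times"]},
    "Binary": {"extends": None, "opens": [], "defines": ["Zero"]},
    "Decimal": {"extends": None, "opens": [], "defines": ["Zero"]},
    "NumberSystems": {
        "extends": None,
        "opens": [("Bin", "Binary"), ("Dec", "Decimal")],  # qualified openings
        "defines": ["BZero", "DZero"],
    },
}


def exported(mods, name):
    """Names a module exports: inherited names first, then its own."""
    mod = mods[name]
    inherited = exported(mods, mod["extends"]) if mod["extends"] else []
    return inherited + mod["defines"]


def in_scope(mods, name):
    """Names usable inside a module: exported names plus opened ones."""
    names = list(exported(mods, name))
    for opened in mods[name]["opens"]:
        if isinstance(opened, tuple):
            # Qualified opening: reachable only as Prefix.name.
            prefix, target = opened
            names += [prefix + "." + n for n in exported(mods, target)]
        else:
            names += exported(mods, opened)
    return names
```

    In this model Products exports the names of Sums together with times, whereas NumberSystems sees Bin.Zero and Dec.Zero without exporting either of them, so the two Zero constants do not clash.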
    

    Compiling modules

    Separate compilation assumes there is one module per file.

    The module header is the beginning of the module code up to the first left bracket ({). The header gives

    filename = modulename . extension

    File name extensions:

    Only gf files should ever be written/edited manually!

    What the make facility does when compiling Foo.gf:
    1. read the module header of Foo.gf, and recursively all headers from the modules it depends on (i.e. extends or opens)
    2. build a dependency graph of these modules, and do topological sorting
    3. starting from the first module in topological order, compare the modification times of each gf and gfc file:
      • if gf is later, compile the module and all modules depending on it
      • if gfc is later, just read in the module
    Inside the GF shell, also time stamps of modules read into memory are taken into account. Thus a module need not be read from a file if the module is in the memory and the file has not been modified. If the compilation of a grammar fails at some module, the state of the GF shell contains all modules read up to that point. This makes it faster to compile the faulty module again after fixing it.
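
    The three make steps can be sketched in Python as follows. This is an illustration only, not the GF implementation: make_plan and its arguments are hypothetical names, the module headers are assumed to be already parsed into a dependency table, and modification times are plain numbers. Equal gf and gfc times are treated as up to date, which the description above does not specify, and time stamps of modules held in memory are left out.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def make_plan(deps, gf_mtime, gfc_mtime):
    """Decide, in dependency order, whether each module is compiled or read.

    deps      : module -> list of modules it extends or opens
    gf_mtime  : module -> modification time of its gf file
    gfc_mtime : module -> modification time of its gfc file (absent if none)
    """
    # Step 2: topological order, dependencies first.
    order = TopologicalSorter(deps).static_order()
    recompiled = set()
    plan = []
    for mod in order:
        gfc = gfc_mtime.get(mod)
        # Step 3: the gf file is newer than the gfc file, or no gfc exists.
        stale = gfc is None or gf_mtime[mod] > gfc
        # A module is also recompiled if something it depends on was.
        if stale or any(d in recompiled for d in deps.get(mod, ())):
            recompiled.add(mod)
            plan.append((mod, "compile"))
        else:
            plan.append((mod, "read gfc"))
    return plan
```

    If only Sums.gf has changed in the example grammars above, this plan compiles Sums and every module that depends on it, directly or indirectly, and simply reads ResEng.gfc.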

    Use the command po = print_options to see what modules are in the state.

    To force compilation:

    Module search paths

    Modules can reside in different directories. Use the path flag to extend the directory search path. For instance,
      -path=.:../resource/russian:../prelude
    
    enables files to be found in three different directories. By default, only the current directory is included. If a path flag is given, the current directory . must be explicitly included if it is wanted.

    The path flag can be set in any of the following places:

    A flag set on a command line overrides ones set in files.

    The value of the environment variable GF_LIB_PATH is appended to the user-given path.
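
    The path rules of this section can be summarized in a Python sketch. It rests on assumptions: the function names are hypothetical, GF_LIB_PATH is treated as a colon-separated list of directories placed after the user-given ones, and it is assumed to apply also when no path flag is given. It is not the GF implementation.

```python
def resolve_path_flag(command_line=None, in_file=None):
    """A flag set on a command line overrides one set in a file."""
    return command_line if command_line is not None else in_file


def search_dirs(path_flag=None, gf_lib_path=None):
    """Return the directories to search, in order."""
    # Without a path flag only the current directory is searched; with one,
    # "." is searched only if the flag lists it explicitly.
    dirs = path_flag.split(":") if path_flag else ["."]
    if gf_lib_path:
        # Assumption: the environment variable is appended at the end.
        dirs = dirs + gf_lib_path.split(":")
    return dirs
```

    For the flag shown above, search_dirs(".:../resource/russian:../prelude") returns [".", "../resource/russian", "../prelude"].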

    To do

    Testing

    Documentation

    Packaging

    Nasty details

  • Readline in Solaris
  • Proper treatment of file search paths
  • Unicode fonts in GUIs
  • Directionality of Semitic alphabets