Grammatical Framework Version 2

Highlights, preliminary version

13/10/2003 - 25/11 - 24/3/2004

Aarne Ranta

Syntax of GF

An accurate language specification is now available.

Summary of novelties

Module system

  • Separate modules for abstract, concrete, and resource.
  • Replaces the file-based include system
  • Name space handling with qualified names
  • Hierarchic structure (extend) + cross-cutting reuse (open)
  • Separate compilation, one module per file
  • Reuse of abstract+concrete as resource
  • New module types: interface, instance, incomplete.
  • New experimental module types: transfer, union.

    Canonical format GFC

  • The target of the GF compiler; to reuse a compiled grammar, just read it in
  • Readable by Haskell/Java/C++/C applications

    New features in expression language

  • Disjunctive patterns P | ... | Q.
  • String patterns "foo".
  • Binding token &+ to glue separate tokens at the unlexing phase, and an unlexer to resolve it.
  • New syntax alternatives for local definitions: let without braces and where.
  • Pattern variables can be used on lhs's of oper definitions.
  • New Unicode transliterations (by Harald Hammarström).

    New parser (forthcoming)

  • By Peter Ljunglöf, based on MCFG
  • Much more efficient for morphology and discontinuous constituents
  • Treatment of cyclic rules

    New editor features

  • Active text field (forthcoming, by Janna Khegai)
  • Clipboard

    Improved implementation

  • Haskell source code organized into subdirectories.
  • BNF Converter used for defining the languages GF and GFC, which also give reliable LaTeX documentation.
  • Lexical rules sorted out by option -cflexer for efficient parsing with large lexica.

    Status (24/3/2004)

    Grammar compiler, editor GUIs, and shell work.

    The updated HelpFile (accessible through h command) marks unsupported but expected features with *.

    GF1 grammars can be automatically translated to GF2, although the result is not as good as a manual translation, since indentation and comments are destroyed. The results can be saved in GF2 files, but this is not necessary.

    It is also possible to write a GF2 grammar back to GF1.

    Example grammars and resource libraries have been converted. There is a new resource API with many new constructions. The new versions lie in grammars/newresource.

    A make facility works, finding out which modules have to be recompiled. There is some room for improvement.

    transfer modules have to be called by flags.

    Checking of the soundness and completeness of module dependencies is not yet complete. This means that some errors may show up too late.

    How to use GF 1.* files

    The import command i is given the option -old. E.g.
      i -old tut1.Eng.g2
    
    This generates, internally, three modules:
      abstract tut1 = ...
      resource ResEng = ...
      concrete Eng of tut1 = open ResEng in ...
    
    (The names are different if the file name has fewer parts.)
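
    The naming in the three-part case above can be sketched in Python. This is an illustration only: old_module_names is a hypothetical helper, not part of GF, and it covers just the name.Lang.extension shape shown in the example, since the rule for shorter file names is not spelled out here.

```python
def old_module_names(filename):
    """Module names generated for a three-part GF1 file name.

    Hypothetical helper: only the name.Lang.extension case from the
    text is covered; the rule for shorter names is not given there.
    """
    parts = filename.split(".")
    if len(parts) != 3:
        raise ValueError("only three-part file names are covered by this sketch")
    base, lang, _extension = parts
    return {
        "abstract": base,          # e.g. tut1
        "resource": "Res" + lang,  # e.g. ResEng
        "concrete": lang,          # e.g. Eng
    }


print(old_module_names("tut1.Eng.gf"))
```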

    The option -o causes GF2 to write these modules into files.

    The flags -abs, -cnc, and -res can be used to give custom names to the modules. In particular, it is good to use the -abs flag to guarantee that the abstract syntax module has the same name for all grammars in a multilingual environment:

      i -old -abs=Numerals hungarian.gf
      i -old -abs=Numerals tamil.gf
      i -old -abs=Numerals sanskrit.gf
    

    The same flags as in the import command can be used when invoking GF2 from the system shell. Many grammars can be imported on the same command line, e.g.

      % gf2 -old -abs=Tutorial tut1.Eng.gf tut1.Fin.gf tut1.Fra.gf
    

    To write a GF2 grammar back to GF1 (as one big file), use the command

      > pg -old
    

    GF2 has more reserved words than GF 1.2. When old files are read, a preprocessor replaces every identifier that has the shape of a new reserved word with a variant where the last letter is replaced by Z, e.g. instance is replaced by instancZ. This method is of course unsafe and should be replaced by something better.
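
    The renaming can be sketched in Python as follows. This is not the actual GF2 preprocessor: the reserved-word set is an illustrative subset (the full list is not given in this document), and the identifier pattern is an assumption about what counts as an identifier.

```python
import re

# Illustrative subset only; the full list of new GF2 reserved words
# is not given in this document.
NEW_RESERVED = {"instance", "interface", "incomplete"}

# Assumed identifier shape: a letter or underscore, then letters,
# digits, underscores, or primes.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_']*")


def rename_reserved(source, reserved=NEW_RESERVED):
    """Replace the last letter of each identifier found in `reserved` by Z."""
    def fix(match):
        word = match.group(0)
        return word[:-1] + "Z" if word in reserved else word
    return IDENT.sub(fix, source)


print(rename_reserved("fun instance : Exp -> Exp ;"))
# prints: fun instancZ : Exp -> Exp ;
```

    The sketch also shows why the method is unsafe: a grammar that already uses the identifier instancZ would end up with a name clash, and this version makes no attempt to skip string literals or comments.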

    Abstract, concrete, and resource modules

    Judgement forms are sorted by module type, as in the following example:
      abstract Sums = {
        cat 
          Exp ;
        fun 
          One : Exp ;
          plus : Exp -> Exp -> Exp ;
      }
    
      concrete EnglishSums of Sums = open ResEng in {
        lincat 
          Exp = {s : Str ; n : Number} ;
        lin
          One = expSg "one" ;
          plus x y = expSg ("the" ++ "sum" ++ "of" ++ x.s ++ "and" ++ y.s) ;
      }
    
      resource ResEng = {
        param 
          Number = Sg | Pl ;
        oper 
          expSg : Str -> {s : Str ; n : Number} = \s -> {s = s ; n = Sg} ;
      }
    

    Opening and extending modules

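    A concrete or resource can open a resource. This means that the names defined in the opened resource can be used in the opening module, but they are not exported from it. A module of any type can moreover extend a module of the same type. This means that the names defined in the extended module are inherited and exported by the extending module. Examples of extension: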
      abstract Products = Sums ** {
        fun times : Exp -> Exp -> Exp ;
      }
      -- names exported: Exp, One, plus, times
    
      concrete English of Products = EnglishSums ** open ResEng in {
        lin times x y = expSg ("the" ++ "product" ++ "of" ++ x.s ++ "and" ++ y.s) ;
      }
    
    Another important difference:
  • extension is single
  • opening can be multiple: open Foo, Bar, Baz in {...}

    Moreover:

  • opening can be qualified

    Example of qualified opening:

      concrete NumberSystems of Systems = open (Bin = Binary), (Dec = Decimal) in {
        lin 
          BZero = Bin.Zero ;
          DZero = Dec.Zero
      }
    

    Compiling modules

    Separate compilation assumes there is one module per file.

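    The module header is the beginning of the module code up to the first left curly bracket ({). The header gives the type and name of the module and the modules that it extends and opens.

    The file name is derived from the module name:

    filename = modulename . extension

    File name extensions include gf (source module) and gfc (compiled canonical format).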

    Only gf files should ever be written/edited manually!

    What the make facility does when compiling Foo.gf

    1. read the module header of Foo.gf, and recursively all headers from the modules it depends on (i.e. extends or opens)
    2. build a dependency graph of these modules, and do topological sorting
    3. starting from the first module in topological order, compare the modification times of each gf and gfc file:
      • if gf is later, compile the module and all modules depending on it
      • if gfc is later, just read in the module
    Inside the GF shell, the time stamps of modules already read into memory are also taken into account. Thus a module need not be read from a file again if it is in memory and the file has not been modified since.
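
    The three steps can be sketched in Python. This is a minimal illustration, not the GF implementation: the function names are hypothetical, the dependency table stands in for what would be read from module headers, and plain numbers stand in for file modification times. The in-memory time stamps used by the GF shell are not modelled.

```python
def topological_order(deps):
    """Return the modules so that each one comes after its dependencies."""
    order, state = [], {}

    def visit(module):
        if state.get(module) == "done":
            return
        if state.get(module) == "active":
            raise ValueError("cyclic module dependency at " + module)
        state[module] = "active"
        for dep in deps.get(module, []):
            visit(dep)
        state[module] = "done"
        order.append(module)

    for module in sorted(deps):
        visit(module)
    return order


def plan(deps, gf_time, gfc_time):
    """Decide per module whether to compile it or just read its gfc file."""
    actions, recompiled = {}, set()
    for module in topological_order(deps):
        # A missing gfc file counts as older than any gf file.
        stale = gf_time[module] > gfc_time.get(module, float("-inf"))
        if stale or any(dep in recompiled for dep in deps.get(module, [])):
            actions[module] = "compile"
            recompiled.add(module)
        else:
            actions[module] = "read gfc"
    return actions


# Dependencies as they would be read from the headers (extends or opens).
deps = {"Sums": [], "Products": ["Sums"], "ResEng": [], "EnglishSums": ["ResEng"]}
gf_time = {"Sums": 5, "Products": 1, "ResEng": 1, "EnglishSums": 1}
gfc_time = {"Sums": 2, "Products": 3, "ResEng": 3, "EnglishSums": 3}
print(plan(deps, gf_time, gfc_time))
```

    In this example Sums.gf is newer than Sums.gfc, so Sums is compiled, and Products is recompiled because it extends Sums; ResEng and EnglishSums are only read in.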