diff --git a/doc/release2.html b/doc/release2.html
new file mode 100644
index 000000000..d34b49cc1
--- /dev/null
+++ b/doc/release2.html
@@ -0,0 +1,546 @@
+ + + + +
+ +

Grammatical Framework Version 2

+ +Release of Version 2.0 + +

+ +Planned: 24 June 2004 + +

+ +Aarne Ranta + +

+ + + + +

Highlights

+ +Module system. + +

+ +Separate compilation to canonical GF. + +

+ +Improved GUI. + +

+ +Improved parser generation. + +

+ +Improved shell (new commands and options, help, error messages). + +

+ +Accurate language specification +(also of GFC). + +

+ +Extended resource library. + +

+ +Extended Numerals library. + + + + + + +

Module system

+ +
  • Separate modules for abstract, + concrete, and resource. +
  • Replaces the file-based include system +
  • Name space handling with qualified names +
  • Hierarchic structure (single inheritance **) + + cross-cutting reuse (open) +
  • Separate compilation, one module per file +
  • Reuse of abstract+concrete as resource +
  • Parametrized modules: + interface, instance, incomplete. +
  • New experimental module types: transfer, + union. + + + + +

    Canonical format GFC

    + +
  • The target of the GF compiler; to reuse a compiled module, just read it in. + +
  • Readable by Haskell/Java/C++/C applications (via BNFC-generated parsers). + + + + + +

    New features in expression language

    + +In addition to the module system: + +

    + +

  • Disjunctive patterns P | ... | Q. +
  • String patterns "foo". +
  • (?) Integer patterns 74. +
  • Binding token &+ to glue separate tokens at unlexing phase, + and unlexer to resolve this. +
  • New syntax alternatives for local definitions: let without + braces and where. +
  • Pattern variables can be used on lhs's of oper definitions. +
  • New Unicode transliterations (by Harald Hammarström). + + + + +
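The binding token &+ from the list above can be illustrated with a small unlexer. This is only a sketch under stated assumptions: tokens are plain strings, &+ glues its two neighbours, and the name `unlexBind` is hypothetical and not taken from the GF sources.

```haskell
-- Sketch of an unlexer step that resolves the binding token "&+".
-- Assumption: the token list is already linearized, and "&+" means
-- "join the previous and the next token without a space".

bindToken :: String
bindToken = "&+"

-- Join tokens with single spaces, except around "&+",
-- where the two neighbouring tokens are glued together.
unlexBind :: [String] -> String
unlexBind = unwords . glue
  where
    glue (x : b : y : rest)
      | b == bindToken = glue ((x ++ y) : rest)  -- glue, then look again for chains
    glue (x : rest) = x : glue rest
    glue []         = []
```

For instance, `unlexBind ["the", "walk", "&+", "ed", "dog"]` joins "walk" and "ed" into one word. A binding token with no neighbour on one side is left in the output as it stands; a real unlexer would have to decide how to report that case.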

    New shell commands and command functionalities

    + +
  • pi = print_info: information on an identifier in scope. +
  • h = help now in long or short form, + and on individual commands. +
  • gt = generate_trees: all trees of a given + category or instantiations of a given incomplete term, up to a + given depth. +
  • gr = generate_random can now be given + an incomplete term as an argument, to constrain generation. +
  • so = show_opers shows all oper + operations with a given value type. +
  • pm = print_multi prints the multilingual + grammar resident in the current state to a ready-compiled + .gfcm file. +
  • All commands have both long and short names (see help). Short + names are easier to type, whereas long names + make scripts more readable. +
  • Meaningless command options generate warnings. + + + + +

    New editor features

    + +
  • Active text field: click the middle button in the focus to send + in a refinement through the parser. +
  • Clipboard: copy complex terms into the refine menu. +
  • Two-step refinements generated by the "Generate" operation. + + + +

    Improved implementation

    + +
  • Haskell source code is organized into subdirectories. +
  • BNF Converter is used for defining the languages GF and GFC, which also + give reliable LaTeX documentation. +
  • Lexical rules sorted out by option -cflexer for efficient + parsing with large lexica. +
  • GHC optimizations and strictness flags are used for improving performance. + + + + +

    New parser (work in progress)

    + +
  • By Peter Ljunglöf, based on MCFG. +
  • Much more efficient for morphology and discontinuous constituents. +
  • Treatment of cyclic rules. +
  • Currently lots of alternative parsers via flags -parser=newX. + + + + +

    Status (21/6/2004)

    + +Grammar compiler, editor GUIs, and shell work for all platforms +(with restrictions for Solaris). + +

    + +The updated HelpFile (accessible through h command) +marks unsupported features present in GF 1.2 with *. +They will be supported again if interested users appear. + +

    + +GF1 grammars can be automatically translated to GF2 (although the +result is not as good +as a manual translation, since indentation and comments are destroyed). The results can be +saved in GF2 files, but this is not necessary. +Some rarely used GF1 features are no longer supported (see next section). + +

    + +It is also possible to write a GF2 grammar back to GF1, with the +command pg -printer=old. + + + + +Resource libraries +and some example grammars have been +converted. Most old example grammars work without any changes. +There is a new resource API with +many new constructions. + +

    + +A make facility works, finding out which modules have to be recompiled. + +

    + +Soundness checking of module dependencies and completeness is not yet +complete. This means that some errors may show up too late. + +

    + +The environment variable GF_LIB_PATH needs some more work. + +

    + +LaTeX and XML printing of grammars do not work yet. + + + + + +

    How to use GF 1.* files

    + +Backward compatibility with respect to old GF grammars has been +a central goal. All GF grammars, from version 0.9, should work in +the old way in GF2. The main exceptions are some features that +are rarely used. + + +

    + +Very old GF grammars (from versions before 0.9), with the completely +different notation, do not work. They should be first converted to +GF1 by using GF version 1.2. + + + + + +The import command i can be given the option -old. E.g. +

    +  i -old tut1.Eng.g2
    +
    +But this is no longer necessary: GF2 detects automatically if a grammar +is in the GF1 format. + +

    + +Importing a set of GF2 files generates, internally, three modules: +

    +  abstract tut1 = ...
    +  resource ResEng = ...
    +  concrete Eng of tut1 = open ResEng in ...
    +
    +(The names are different if the file name has fewer parts.) + + +

    + +The option -o causes GF2 to write these modules into files. + + + + +The flags -abs, -cnc, and -res can be used +to give custom names to the modules. In particular, it is good to use +the -abs flag to guarantee that the abstract syntax module +has the same name for all grammars in a multilingual environment: +

    +  i -old -abs=Numerals hungarian.gf
    +  i -old -abs=Numerals tamil.gf
    +  i -old -abs=Numerals sanskrit.gf
    +
    + +

    + +The same flags as in the import command can be used when invoking +GF2 from the system shell. Many grammars can be imported on the same command +line, e.g. +

    +  % gf2 -old -abs=Tutorial tut1.Eng.gf tut1.Fin.gf tut1.Fra.gf
    +
    + +

    + +To write a GF2 grammar back to GF1 (as one big file), use the command +

    +  > pg -old
    +
    + + + + + + + +GF2 has more reserved words than GF 1.2. When old files are read, a preprocessor +replaces every identifier that has the shape of a new reserved word +with a variant where the last letter is replaced by Z, e.g. +instance is replaced by instancZ. This method is of course +unsafe and should be replaced by something better. + + + + + + +
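The renaming described above can be sketched in a few lines of Haskell. This is not the preprocessor from the GF sources. The list `newReserved` is an assumption (only `instance` is confirmed by the text), and so is the definition of what counts as an identifier character.

```haskell
import Data.Char (isAlphaNum)

-- Assumption: an illustrative subset of the reserved words new in GF2.
-- Only "instance" is named in the release notes.
newReserved :: [String]
newReserved = ["instance", "interface", "incomplete"]

-- Replace the last letter of a clashing identifier by 'Z'.
renameIdent :: String -> String
renameIdent w
  | w `elem` newReserved = init w ++ "Z"
  | otherwise            = w

-- Apply the renaming to every identifier-shaped word of a source text.
-- Assumption: identifiers are maximal runs of letters, digits, '_' and '\''.
renameSource :: String -> String
renameSource [] = []
renameSource s@(c : cs)
  | isIdentChar c = let (w, rest) = span isIdentChar s
                    in renameIdent w ++ renameSource rest
  | otherwise     = c : renameSource cs
  where
    isIdentChar x = isAlphaNum x || x == '_' || x == '\''
```

The sketch also shows why the method is unsafe: it rewrites matching words inside string literals and comments, and it does not check whether the grammar already uses an identifier such as `instancZ`.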

    Abstract, concrete, and resource modules

    + +Judgement forms are sorted as follows: + + + + + +Example: +
    +  abstract Sums = {
    +    cat 
    +      Exp ;
    +    fun 
    +      One : Exp ;
    +      plus : Exp -> Exp -> Exp ;
    +  }
    +
    +  concrete EnglishSums of Sums = open ResEng in {
    +    lincat 
    +      Exp = {s : Str ; n : Number} ;
    +    lin
    +      One = expSg "one" ;
    +      plus x y = expSg ("the" ++ "sum" ++ "of" ++ x.s ++ "and" ++ y.s) ;
    +  }
    +
    +  resource ResEng = {
    +    param 
    +      Number = Sg | Pl ;
    +    oper 
    +      expSg : Str -> {s : Str ; n : Number} = \s -> {s = s ; n = Sg} ;
    +  }
    +
    + + + + + +

    Opening and extending modules

    + +A concrete or resource can open a +resource. This means that + +A module of any type can moreover extend a module of the same type. +This means that + +Examples of extension: +
    +  abstract Products = Sums ** {
    +    fun times : Exp -> Exp -> Exp ;
    +  }
    +  -- names exported: Exp, One, plus, times
    +
    +  concrete English of Products = EnglishSums ** open ResEng in {
    +    lin times x y = expSg ("the" ++ "product" ++ "of" ++ x.s ++ "and" ++ y.s) ;
    +  }
    +
    +Another important difference: +
  • extension is single +
  • opening can be multiple: open Foo, Bar, Baz in {...} + + + +Moreover: +
  • opening can be qualified +

    +Example of qualified opening: +

    +  concrete NumberSystems of Systems = open (Bin = Binary), (Dec = Decimal) in {
    +    lin 
    +      BZero = Bin.Zero ;
    +      DZero = Dec.Zero ;
    +  }
    +
    + + + + +

    Compiling modules

    + +Separate compilation assumes there is one module per file. + +

    + +The module header is the beginning of the module code up to the +first left curly bracket ({). The header gives +

    + + + + +filename = modulename . extension + +

    + +File name extensions: +

    +Only gf files should ever be written/edited manually! + + + + + + +What the make facility does when compiling Foo.gf +
      +
    1. read the module header of Foo.gf, and recursively all headers from +the modules it depends on (i.e. extends or opens) +
    2. build a dependency graph of these modules, and do topological sorting +
    3. starting from the first module in topological order, +compare the modification times of each gf and gfc file: +
        +
      • if gf is later, compile the module and all modules depending on it +
      • if gfc is later, just read in the module +
      +
    +Inside the GF shell, also time stamps of modules read into memory are +taken into account. Thus a module need not be read from a file if the +module is in the memory and the file has not been modified. + + + + +If the compilation of a grammar fails at some module, the state of the +GF shell contains all modules read up to that point. This makes it +faster to compile the faulty module again after fixing it. + +
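The make steps listed above (dependency graph, topological sorting, comparison of modification times) can be sketched as follows. This is a minimal model, not the code of the GF compiler: the record `ModInfo`, the integer time stamps, and the function names are all hypothetical, and header parsing and file access are left out.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set
import Data.Graph (graphFromEdges, topSort)

type ModName = String

-- Hypothetical summary of one module: what its header says it depends on,
-- and the modification times of its source and compiled files.
data ModInfo = ModInfo
  { deps    :: [ModName]      -- modules it extends or opens
  , gfTime  :: Integer        -- modification time of the gf file
  , gfcTime :: Maybe Integer  -- modification time of the gfc file, if it exists
  }

data Action = Compile | ReadGFC deriving (Eq, Show)

-- Topological order with dependencies first.
-- topSort puts a module before the modules it points to, hence the reverse.
buildOrder :: Map.Map ModName ModInfo -> [ModName]
buildOrder ms = reverse [ name v | v <- topSort g ]
  where
    (g, fromV, _) = graphFromEdges [ (m, m, deps i) | (m, i) <- Map.toList ms ]
    name v = let (_, k, _) = fromV v in k

-- Walk the order once: a module is compiled if its gf file is later than
-- its gfc file (or no gfc exists), or if anything it depends on was compiled.
plan :: Map.Map ModName ModInfo -> [(ModName, Action)]
plan ms = go Set.empty (buildOrder ms)
  where
    go _ [] = []
    go dirty (m : rest) =
      let i         = ms Map.! m
          stale     = maybe True (gfTime i >) (gfcTime i)
          recompile = stale || any (`Set.member` dirty) (deps i)
          dirty'    = if recompile then Set.insert m dirty else dirty
      in (m, if recompile then Compile else ReadGFC) : go dirty' rest
```

Propagating the `dirty` set along the sorted order is one way to realise "compile the module and all modules depending on it". Time stamps of modules already in memory, which the GF shell also takes into account, are not modelled here.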

    + +Use the command po = print_options to see what +modules are in the state. + +

    + +To force compilation: +

    + + + +

    Module search paths

    + +Modules can reside in different directories. Use the path +flag to extend the directory search path. For instance, +
    +  -path=.:../resource/russian:../prelude
    +
    +enables files to be found in three different directories. +By default, only the current directory is included. +If a path flag is given, the current directory +. must be explicitly included if it is wanted. + +

    + +The path flag can be set in any of the following +places: +

    +A flag set on a command line overrides ones set in files. + +

    + +The value of the environment variable GF_LIB_PATH is +appended to the user-given path. + + + + +
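The way the environment variable extends the search path can be sketched as below. This is a simplified illustration and not the code of the compiler; the names `splitPaths`, `extendPath`, and `searchPath` are hypothetical, and dropping empty path segments is an assumption of the sketch.

```haskell
import System.Environment (lookupEnv)

-- Split a colon-separated list of directories.
-- Assumption: empty segments are dropped.
splitPaths :: String -> [FilePath]
splitPaths s = case break (== ':') s of
  (f, [])     -> [f | not (null f)]
  (f, _ : cs) -> [f | not (null f)] ++ splitPaths cs

-- Append the directories from the environment to the user-given path.
-- User-given directories come first, so they are searched first.
extendPath :: Maybe String -> [FilePath] -> [FilePath]
extendPath env ps = ps ++ maybe [] splitPaths env

-- Read GF_LIB_PATH; an unset variable contributes nothing.
searchPath :: [FilePath] -> IO [FilePath]
searchPath ps = do
  env <- lookupEnv "GF_LIB_PATH"
  return (extendPath env ps)
```

With the flag value from the example above, `splitPaths ".:../resource/russian:../prelude"` yields the three directories, and anything in GF_LIB_PATH is added after them. Keeping `extendPath` pure makes the ordering easy to check without touching the real environment.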

    To do

    + +Testing + +

    + +Documentation + +

    + +Packaging + + + + + +

    Nasty details

    + + +
  • Readline in Solaris + +
  • Proper treatment of file search paths + +
  • Unicode fonts in GUIs + +
  • directionality of Semitic alphabets + + + + +
diff --git a/src/GF/API.hs b/src/GF/API.hs
index c3d160bcd..ca97af146 100644
--- a/src/GF/API.hs
+++ b/src/GF/API.hs
@@ -148,8 +148,9 @@ string2srcTerm gr m s = do
 randomTreesIO :: Options -> GFGrammar -> Int -> IO [Tree]
 randomTreesIO opts gr n = do
   gen <- myStdGen mx
-  t <- err (\s -> putStrLnFlush s >> return []) (return . singleton) $
-         mkRandomTree gen mx g catfun
+  t <- err (\s -> putS s >> return [])
+           (return . singleton) $
+             mkRandomTree gen mx g catfun
   ts <- if n==1 then return [] else randomTreesIO opts gr (n-1)
   return $ t ++ ts
  where
@@ -158,6 +159,8 @@ randomTreesIO opts gr n = do
      _ -> Left $ firstAbsCat opts gr
    g = grammar gr
    mx = optIntOrN opts flagDepth 41
+   putS s = if oElem beSilent opts then return () else putStrLnFlush s
+
 
 generateTrees :: Options -> GFGrammar -> Maybe Tree -> [Tree]
 generateTrees opts gr mt =
diff --git a/src/GF/Compile/Compile.hs b/src/GF/Compile/Compile.hs
index fa2e65a3c..78f3a1bb1 100644
--- a/src/GF/Compile/Compile.hs
+++ b/src/GF/Compile/Compile.hs
@@ -35,6 +35,10 @@ import Arch
 
 import Monad
 
+-- environment variable for grammar search path
+
+gfGrammarPathVar = "GF_LIB_PATH"
+
 -- in batch mode: write code in a file
 
 batchCompile f = liftM fst $ compileModule defOpts emptyShellState f
@@ -86,9 +90,10 @@ compileModule opts1 st0 file = do
   let opts = addOptions opts1 opts0
   let ps0 = pathListOpts opts
   let fpath = justInitPath file
-  let ps = if useFileOpt
-             then (map (prefixPathName fpath) ps0)
-             else ps0
+  let ps1 = if useFileOpt
+              then (map (prefixPathName fpath) ps0)
+              else ps0
+  ps <- ioeIO $ extendPathEnv gfGrammarPathVar ps1
   let ioeIOIf = if oElem beSilent opts then (const (return ())) else ioeIO
   ioeIOIf $ putStrLn $ "module search path:" +++ show ps ----
   let putp = putPointE opts
diff --git a/src/GF/Infra/UseIO.hs b/src/GF/Infra/UseIO.hs
index 243ead306..3dc41fadc 100644
--- a/src/GF/Infra/UseIO.hs
+++ b/src/GF/Infra/UseIO.hs
@@ -81,6 +81,13 @@ doesFileExistPath paths file = do
   mpfile <- ioeIO $ getFilePath paths file
   return $ maybe False (const True) mpfile
 
+-- path in environment variable has lower priority
+extendPathEnv :: String -> [FilePath] -> IO [FilePath]
+extendPathEnv var ps = do
+  s <- catch (getEnv var) (const (return ""))
+  let fs = pFilePaths s
+  return $ ps ++ fs
+
 pFilePaths :: String -> [FilePath]
 pFilePaths s = case span (/=':') s of
   (f,_:cs) -> f : pFilePaths cs
diff --git a/src/GF/Shell/TeachYourself.hs b/src/GF/Shell/TeachYourself.hs
index 623bd7b72..e3576e7ed 100644
--- a/src/GF/Shell/TeachYourself.hs
+++ b/src/GF/Shell/TeachYourself.hs
@@ -24,7 +24,7 @@ teachTranslation opts ig og = do
 
 transTrainList :: Options -> GFGrammar -> GFGrammar -> Integer -> IO [(String,[String])]
 transTrainList opts ig og number = do
-  ts <- randomTreesIO opts ig (fromInteger number)
+  ts <- randomTreesIO (addOption beSilent opts) ig (fromInteger number)
   return $ map mkOne $ ts
  where
    cat = firstCatOpts opts ig
@@ -39,7 +39,7 @@ teachMorpho opts ig = useIOE () $ do
 
 morphoTrainList :: Options -> GFGrammar -> Integer -> IOE [(String,[String])]
 morphoTrainList opts ig number = do
-  ts <- ioeIO $ randomTreesIO opts ig (fromInteger number)
+  ts <- ioeIO $ randomTreesIO (addOption beSilent opts) ig (fromInteger number)
   gen <- ioeIO $ myStdGen (fromInteger number)
   mkOnes gen ts
  where
@@ -49,9 +49,9 @@ morphoTrainList opts ig number = do
     let (i,gen') = randomR (0, length pss - 1) gen
     (ps,ss) <- ioeErr $ pss !? i
     (_,ss0) <- ioeErr $ pss !? 0
-    let bas = concat $ take 1 ss0
+    let bas = unwords ss0 --- concat $ take 1 ss0
     more <- mkOnes gen' ts
-    return $ (bas +++ ":" +++ unwords (map prt_ ps), return (concat ss)) : more
+    return $ (bas +++ ":" +++ unwords (map prt_ ps), return (unwords ss)) : more
   mkOnes gen [] = return []
   gr = grammar ig