diff --git a/doc/gf-course.html b/doc/gf-course.html deleted file mode 100644 index 039bbe72c..000000000 --- a/doc/gf-course.html +++ /dev/null @@ -1,221 +0,0 @@ - - - - -Graduate Course: GF (Grammatical Framework) - -

Graduate Course: GF (Grammatical Framework)

- -Aarne Ranta
-Wed Oct 24 09:49:27 2007 -
- -

-GSLT, -NGSLT, -and -Department of Computer Science and Engineering, -Chalmers University of Technology and Gothenburg University. -

-

-Autumn Term 2007. -

-

News

-

-24/10 Tomorrow's session starts at 8.15. A detailed plan has been added to -the table below. Material (new chapters) will appear later today. -It will explain some of the files in -

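- syntax/: linguistic grammar programming
- semantics/: a question-answer system based on logical semantics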

-12/9 The course starts tomorrow at 8.00. A detailed plan for the day is -right below. Don't forget to -

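- join the mailing list (send a mail to gf-subscribe at gslt hum gu se)
- install GF on your laptops from here (../download.html)
- take with you a copy of the book (as sent to the mailing list yesterday)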

-31/8 Revised the description of the one- and five-point variants. -

-

-21/8 Course mailing list started. -To subscribe, send a mail to gf-subscribe at gslt hum gu se -(replacing spaces by dots except around the word at, where the spaces -are just removed, and the word itself is replaced by the at symbol). -

-

-20/8/2007 Schedule. -The course will start on Thursday 13 September in Room C430 at the Humanities -Building of Gothenburg University ("Humanisten"). -

-

Plan

-

-First week (13-14/9) -

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|| Time | Subject | Assignment ||
| Thu 8.00-9.30 | Chapters 1-3 | Hello and Food in a new language |
| Thu 10.00-11.30 | Chapters 3-4 | Foods in a new language |
| Thu 13.15-14.45 | Chapter 5 | ExtFoods in a new language |
| Thu 15.15-16.45 | Chapters 6-7 | straight code compiler |
| Fri 8.00-9.30 | Chapter 8 | application in Haskell or Java |
- -

-

-Second week (25/10) -

- - - - - - - - - - - - - - - - - - - - - -
|| Time | Subject | Assignment ||
| Thu 8.15-9.45 | Chapters 13-15 | mini resource in a new language |
| Thu 10.15-11.45 | Chapters 12,16 | query system for a new domain |
| Thu 13.15-14.45 | presentations | explain your own project |
- -

-

-The structure of each lecture will be the following: -

- ca. 75min lecture, going through the book
- ca. 15min work on computer, individually or in pairs

-In order for this to work out, it is important that enough participants -have a working GF installation, including the directory -examples/tutorial. This directory is -included in the Darcs version, as well as in the updated binary -packages from 12 September. -

-

Purpose

-

-GF -(Grammatical Framework) is a grammar formalism, i.e. a special-purpose -programming language for writing grammars. It is suitable for many -natural language processing tasks, in particular, -

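- multilingual applications
- systems where grammar-based components are needed for e.g. parsing, translation, or speech recognition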

-The goal of the course is to develop an understanding of GF and -practical skills in using it. -

-

Contents

-

-The course consists of two modules. The first module is a one-week -intensive course (during the first intensive week of GSLT), which -can also stand alone as a one-week intensive course for doctoral studies, -if completed with a small course project. -

-

-The second module is a larger programming project, written -by each student (possibly working in groups) during the Autumn term. -The projects are discussed during the second intensive week of GSLT -(see schedule), -and presented at a date that will be set later. -

-

-The first module goes through the basics of GF, including -

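- using the GF programming language
- writing multilingual grammars
- using the GF resource grammar library
- generating speech recognition systems from GF grammars
- using embedded grammars as components of software systems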

-The lectures follow a draft of the GF book. It contains a heavily updated -version of the -GF Tutorial; -thus the on-line tutorial is not adequate for this course. To get the course -book, join the course mailing list. -

-

-Those who just want to do the first module will write a simple application -as their course work during and after the first intensive week. -

-

-Those who continue with the second module will choose a more substantial -project. Possible topics are -

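- building a dialogue system by using GF
- implementing a multilingual document generator
- experimenting with synthesized multilingual tree banks
- extending the GF resource grammar library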

Prerequisites

-

-Experience in programming. No earlier natural language processing -or functional programming experience is necessary. -

-

-The course is thus suitable both for GSLT and NGSLT students, -and for graduate students in computer science. -

-

-We will in particular welcome students from the Baltic countries -who wish to build resources for their own language in GF. -

- - - - diff --git a/doc/gf-course.txt b/doc/gf-course.txt deleted file mode 100644 index 846186049..000000000 --- a/doc/gf-course.txt +++ /dev/null @@ -1,149 +0,0 @@ -Graduate Course: GF (Grammatical Framework) -Aarne Ranta -%%date(%c) - -% NOTE: this is a txt2tags file. -% Create an html file from this file using: -% txt2tags -thtml --toc gf-reference.html - -%!target:html - -[GSLT http://www.gslt.hum.gu.se], -[NGSLT http://ngslt.org/], -and -[Department of Computer Science and Engineering http://www.chalmers.se/cse/EN/], -Chalmers University of Technology and Gothenburg University. - -Autumn Term 2007. - - -=News= - -24/10 Tomorrow's session starts at 8.15. A detailed plan has been added to -the table below. Material (new chapters) will appear later today. -It will explain some of the files in -- [``syntax/`` http://digitalgrammars.com/gf/examples/tutorial/syntax/]: - linguistic grammar programming -- [``semantics/`` http://digitalgrammars.com/gf/examples/tutorial/semantics/]: - a question-answer system based on logical semantics - - - -12/9 The course starts tomorrow at 8.00. A detailed plan for the day is -right below. Don't forget to -- join the mailing list (send a mail to ``gf-subscribe at gslt hum gu se``) -- install GF on your laptops from [here ../download.html] -- take with you a copy of the book (as sent to the mailing list yesterday) - - -31/8 Revised the description of the one- and five-point variants. - -21/8 Course mailing list started. -To subscribe, send a mail to ``gf-subscribe at gslt hum gu se`` -(replacing spaces by dots except around the word at, where the spaces -are just removed, and the word itself is replaced by the at symbol). - -20/8/2007 [Schedule http://www.gslt.hum.gu.se/courses/schedule.html]. -The course will start on Thursday 13 September in Room C430 at the Humanities -Building of Gothenburg University ("Humanisten"). 
- - -=Plan= - -First week (13-14/9) - -|| Time | Subject | Assignment || -| Thu 8.00-9.30 | Chapters 1-3 | Hello and Food in a new language | -| Thu 10.00-11.30 | Chapters 3-4 | Foods in a new language | -| Thu 13.15-14.45 | Chapter 5 | ExtFoods in a new language | -| Thu 15.15-16.45 | Chapters 6-7 | straight code compiler | -| Fri 8.00-9.30 | Chapter 8 | application in Haskell or Java | - -Second week (25/10) - -|| Time | Subject | Assignment || -| Thu 8.15-9.45 | Chapters 13-15 | mini resource in a new language | -| Thu 10.15-11.45 | Chapters 12,16 | query system for a new domain | -| Thu 13.15-14.45 | presentations | explain your own project | - - - -The structure of each lecture will be the following: -- ca. 75min lecture, going through the book -- ca. 15min work on computer, individually or in pairs - - -In order for this to work out, it is important that enough participants -have a working GF installation, including the directory -[``examples/tutorial`` ../examples/tutorial]. This directory is -included in the Darcs version, as well as in the updated binary -packages from 12 September. - - - -=Purpose= - -[GF http://www.cs.chalmers.se/~aarne/GF/] -(Grammatical Framework) is a grammar formalism, i.e. a special-purpose -programming language for writing grammars. It is suitable for many -natural language processing tasks, in particular, -- multilingual applications -- systems where grammar-based components are needed for e.g. - parsing, translation, or speech recognition - - -The goal of the course is to develop an understanding of GF and -practical skills in using it. - - -=Contents= - -The course consists of two modules. The first module is a one-week -intensive course (during the first intensive week of GSLT), which -can also stand alone as a one-week intensive course for doctoral studies, -if completed with a small course project. - -The second module is a larger programming project, written -by each student (possibly working in groups) during the Autumn term. 
-The projects are discussed during the second intensive week of GSLT -(see [schedule http://www.gslt.hum.gu.se/courses/schedule.html]), -and presented at a date that will be set later. - -The first module goes through the basics of GF, including -- using the GF programming language -- writing multilingual grammars -- using the - [GF resource grammar library http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/doc/] -- generating speech recognition systems from GF grammars -- using embedded grammars as components of software systems - - -The lectures follow a draft of the GF book. It contains a heavily updated -version of the -[GF Tutorial http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html]; -thus the on-line tutorial is not adequate for this course. To get the course -book, join the course mailing list. - -Those who just want to do the first module will write a simple application -as their course work during and after the first intensive week. - -Those who continue with the second module will choose a more substantial -project. Possible topics are -- building a dialogue system by using GF -- implementing a multilingual document generator -- experimenting with synthesized multilingual tree banks -- extending the GF resource grammar library - - - -=Prerequisites= - -Experience in programming. No earlier natural language processing -or functional programming experience is necessary. - -The course is thus suitable both for GSLT and NGSLT students, -and for graduate students in computer science. - -We will in particular welcome students from the Baltic countries -who wish to build resources for their own language in GF. - diff --git a/doc/gf-help.txt b/doc/gf-help.txt deleted file mode 100644 index d77e9aff7..000000000 --- a/doc/gf-help.txt +++ /dev/null @@ -1,699 +0,0 @@ -=GF Command Help= - -Each command has a long and a short name, options, and zero or more -arguments. Commands are sorted by functionality. The short name is -given first. 
- -Commands and options marked with * are currently not implemented. - -==Commands that change the state== - -``` -i, import: i File - Reads a grammar from File and compiles it into a GF runtime grammar. - Files "include"d in File are read recursively, nubbing repetitions. - If a grammar with the same language name is already in the state, - it is overwritten - but only if compilation succeeds. - The grammar parser depends on the file name suffix: - .gf normal GF source - .gfc canonical GF - .gfr precompiled GF resource - .gfcm multilingual canonical GF - .gfe example-based grammar files (only with the -ex option) - .gfwl multilingual word list (preprocessed to abs + cncs) - .ebnf Extended BNF format - .cf Context-free (BNF) format - .trc TransferCore format - options: - -old old: parse in GF<2.0 format (not necessary) - -v verbose: give lots of messages - -s silent: don't give error messages - -src from source: ignore precompiled gfc and gfr files - -gfc from gfc: use compiled modules whenever they exist - -retain retain operations: read resource modules (needed in comm cc) - -nocf don't build old-style context-free grammar (default without HOAS) - -docf do build old-style context-free grammar (default with HOAS) - -nocheckcirc don't eliminate circular rules from CF - -cflexer build an optimized parser with separate lexer trie - -noemit do not emit code (default with old grammar format) - -o do emit code (default with new grammar format) - -ex preprocess .gfe files if needed - -prob read probabilities from top grammar file (format --# prob Fun Double) - -treebank read a treebank file to memory (xml format) - flags: - -abs set the name used for abstract syntax (with -old option) - -cnc set the name used for concrete syntax (with -old option) - -res set the name used for resource (with -old option) - -path use the (colon-separated) search path to find modules - -optimize select an optimization to override file-defined flags - -conversion select parsing method 
(values strict|nondet) - -probs read probabilities from file (format (--# prob) Fun Double) - -preproc use a preprocessor on each source file - -noparse read nonparsable functions from file (format --# noparse Funs) - examples: - i English.gf -- ordinary import of Concrete - i -retain german/ParadigmsGer.gf -- import of Resource to test - -r, reload: r - Executes the previous import (i) command. - -rl, remove_language: rl Language - Takes away the language from the state. - -e, empty: e - Takes away all languages and resets all global flags. - -sf, set_flags: sf Flag* - The values of the Flags are set for Language. If no language - is specified, the flags are set globally. - examples: - sf -nocpu -- stop showing CPU time - sf -lang=Swe -- make Swe the default concrete - -s, strip: s - Prune the state by removing source and resource modules. - -dc, define_command: dc Name Anything - Add a new defined command. The Name must start with '%'. Later, - if 'Name X' is used, it is replaced by Anything where #1 is replaced - by X. - Restrictions: Currently at most one argument is possible, and a defined - command cannot appear in a pipe. - To see what definitions are in scope, use help -defs. - examples: - dc %tnp p -cat=NP -lang=Eng #1 | l -lang=Swe -- translate NPs - %tnp "this man" -- translate and parse - -dt, define_term: dt Name Tree - Add a constant for a tree. The constant can later be called by - prefixing it with '$'. - Restriction: These terms are not yet usable as a subterm. - To see what definitions are in scope, use help -defs. - examples: - p -cat=NP "this man" | dt tm -- define tm as parse result - l -all $tm -- linearize tm in all forms -``` - -==Commands that give information about the state== - -``` -pg, print_grammar: pg - Prints the actual grammar (overridden by the -lang=X flag). - The -printer=X flag sets the format in which the grammar is - written. - N.B. 
since grammars are compiled when imported, this command - generally does not show the grammar in the same format as the - source. In particular, the -printer=latex is not supported. - Use the command tg -printer=latex File to print the source - grammar in LaTeX. - options: - -utf8 apply UTF8-encoding to the grammar - flags: - -printer - -lang - -startcat -- The start category of the generated grammar. - Only supported by some grammar printers. - examples: - pg -printer=cf -- show the context-free skeleton - -pm, print_multigrammar: pm - Prints the current multilingual grammar in .gfcm form. - (Automatically executes the strip command (s) before doing this.) - options: - -utf8 apply UTF8 encoding to the tokens in the grammar - -utf8id apply UTF8 encoding to the identifiers in the grammar - examples: - pm | wf Letter.gfcm -- print the grammar into the file Letter.gfcm - pm -printer=graph | wf D.dot -- then do 'dot -Tps D.dot > D.ps' - -vg, visualize_graph: vg - Show the dependency graph of multilingual grammar via dot and gv. - -po, print_options: po - Print what modules there are in the state. Also - prints those flag values in the current state that differ from defaults. - -pl, print_languages: pl - Prints the names of currently available languages. - -pi, print_info: pi Ident - Prints information on the identifier. -``` - -==Commands that execute and show the session history== - -``` -eh, execute_history: eh File - Executes commands in the file. - -ph, print_history: ph - Prints the commands issued during the GF session. - The result is readable by the eh command. - examples: - ph | wf foo.hist -- save the history into a file -``` - - -==Linearization, parsing, translation, and computation== - -``` -l, linearize: l PattList? Tree - Shows all linearization forms of Tree by the actual grammar - (which is overridden by the -lang flag). - The pattern list has the form [P, ... ,Q] where P,...,Q follow GF - syntax for patterns. 
All those forms are generated that match with the - pattern list. Too short lists are filled with variables in the end. - Only the -table flag is available if a pattern list is specified. - HINT: see GF language specification for the syntax of Pattern and Term. - You can also copy and paste parsing results. - options: - -struct bracketed form - -table show parameters (not compatible with -record, -all) - -record record, i.e. explicit GF concrete syntax term (not compatible with -table, -all) - -all show all forms and variants (not compatible with -record, -table) - -multi linearize to all languages (can be combined with the other options) - flags: - -lang linearize in this grammar - -number give this number of forms at most - -unlexer filter output through unlexer - examples: - l -lang=Swe -table -- show full inflection table in Swe - -p, parse: p String - Shows all Trees returned for String by the actual - grammar (overridden by the -lang flag), in the category S (overridden - by the -cat flag). 
- options for batch input: - -lines parse each line of input separately, ignoring empty lines - -all as -lines, but also parse empty lines - -prob rank results by probability - -cut stop after first lexing result leading to parser success - -fail show strings whose parse fails prefixed by #FAIL - -ambiguous show strings that have more than one parse prefixed by #AMBIGUOUS - options for selecting parsing method: - -fcfg parse using a fast variant of MCFG (default is no HOAS in grammar) - -old parse using an overgenerating CFG (default if HOAS in grammar) - -cfg parse using a much less overgenerating CFG - -mcfg parse using an even less overgenerating MCFG - Note: the first time parsing with -cfg, -mcfg, and -fcfg may take a long time - options that only work for the -old default parsing method: - -n non-strict: tolerates morphological errors - -ign ignore unknown words when parsing - -raw return context-free terms in raw form - -v verbose: give more information if parsing fails - flags: - -cat parse in this category - -lang parse in this grammar - -lexer filter input through this lexer - -parser use this parsing strategy - -number return this many results at most - examples: - p -cat=S -mcfg "jag är gammal" -- parse an S with the MCFG - rf examples.txt | p -lines -- parse each non-empty line of the file - -at, apply_transfer: at (Module.Fun | Fun) - Transfer a term using Fun from Module, or the topmost transfer - module. Transfer modules are given in the .trc format. They are - shown by the 'po' command. - flags: - -lang typecheck the result in this lang instead of default lang - examples: - p -lang=Cncdecimal "123" | at num2bin | l -- convert dec to bin - -tb, tree_bank: tb - Generate a multilingual treebank from a list of trees (default) or compare - to an existing treebank. 
- options: - -c compare to existing xml-formatted treebank - -trees return the trees of the treebank - -all show all linearization alternatives (branches and variants) - -table show tables of linearizations with parameters - -record show linearization records - -xml wrap the treebank (or comparison results) with XML tags - -mem write the treebank in memory instead of a file TODO - examples: - gr -cat=S -number=100 | tb -xml | wf tb.xml -- random treebank into file - rf tb.xml | tb -c -- compare-test treebank from file - rf old.xml | tb -trees | tb -xml -- create new treebank from old - -ut, use_treebank: ut String - Lookup a string in a treebank and return the resulting trees. - Use 'tb' to create a treebank and 'i -treebank' to read one from - a file. - options: - -assocs show all string-trees associations in the treebank - -strings show all strings in the treebank - -trees show all trees in the treebank - -raw return the lookup result as string, without typechecking it - flags: - -treebank use this treebank (instead of the latest introduced one) - examples: - ut "He adds this to that" | l -multi -- use treebank lookup as parser in translation - ut -assocs | grep "ComplV2" -- show all associations with ComplV2 - -tt, test_tokenizer: tt String - Show the token list sent to the parser when String is parsed. - HINT: can be useful when debugging the parser. - flags: - -lexer use this lexer - examples: - tt -lexer=codelit "2*(x + 3)" -- a favourite lexer for program code - -g, grep: g String1 String2 - Grep the String1 in the String2. String2 is read line by line, - and only those lines that contain String1 are returned. - flags: - -v return those lines that do not contain String1. - examples: - pg -printer=cf | grep "mother" -- show cf rules with word mother - -cc, compute_concrete: cc Term - Compute a term by concrete syntax definitions. Uses the topmost - resource module (the last in listing by command po) to resolve - constant names. - N.B. 
You need the flag -retain when importing the grammar, if you want - the oper definitions to be retained after compilation; otherwise this - command does not expand oper constants. - N.B.' The resulting Term is not a term in the sense of abstract syntax, - and hence not a valid input to a Tree-demanding command. - flags: - -table show output in a similar readable format as 'l -table' - -res use another module than the topmost one - examples: - cc -res=ParadigmsFin (nLukko "hyppy") -- inflect "hyppy" with nLukko - -so, show_operations: so Type - Show oper operations with the given value type. Uses the topmost - resource module to resolve constant names. - N.B. You need the flag -retain when importing the grammar, if you want - the oper definitions to be retained after compilation; otherwise this - command does not find any oper constants. - N.B.' The value type may not be defined in a supermodule of the - topmost resource. In that case, use appropriate qualified name. - flags: - -res use another module than the topmost one - examples: - so -res=ParadigmsFin ResourceFin.N -- show N-paradigms in ParadigmsFin - -t, translate: t Lang Lang String - Parses String in Lang1 and linearizes the resulting Trees in Lang2. - flags: - -cat - -lexer - -parser - examples: - t Eng Swe -cat=S "every number is even or odd" - -gr, generate_random: gr Tree? - Generates a random Tree of a given category. If a Tree - argument is given, the command completes the Tree with values to - the metavariables in the tree. - options: - -prob use probabilities (works for nondep types only) - -cf use a very fast method (works for nondep types only) - flags: - -cat generate in this category - -lang use the abstract syntax of this grammar - -number generate this number of trees (not impl. with Tree argument) - -depth use this number of search steps at most - examples: - gr -cat=Query -- generate in category Query - gr (PredVP ? 
(NegVG ?)) -- generate a random tree of this form - gr -cat=S -tr | l -- generate and linearize - -gt, generate_trees: gt Tree? - Generates all trees up to a given depth. If the depth is large, - a small -alts is recommended. If a Tree argument is given, the - command completes the Tree with values to the metavariables in - the tree. - options: - -metas also return trees that include metavariables - -all generate all (can be infinitely many, lazily) - -lin linearize result of -all (otherwise, use pipe to linearize) - flags: - -depth generate to this depth (default 3) - -atoms take this number of atomic rules of each category (default unlimited) - -alts take this number of alternatives at each branch (default unlimited) - -cat generate in this category - -nonub don't remove duplicates (faster, not effective with -mem) - -mem use a memorizing algorithm (often faster, usually more memory-consuming) - -lang use the abstract syntax of this grammar - -number generate (at most) this number of trees (also works with -all) - -noexpand don't expand these categories (comma-separated, e.g. -noexpand=V,CN) - -doexpand only expand these categories (comma-separated, e.g. -doexpand=V,CN) - examples: - gt -depth=10 -cat=NP -- generate all NP's to depth 10 - gt (PredVP ? (NegVG ?)) -- generate all trees of this form - gt -cat=S -tr | l -- generate and linearize - gt -noexpand=NP | l -mark=metacat -- the only NP is meta, linearized "?0 +NP" - gt | l | p -lines -ambiguous | grep "#AMBIGUOUS" -- show ambiguous strings - -ma, morphologically_analyse: ma String - Runs morphological analysis on each word in String and displays - the results line by line. 
- options: - -short show analyses in bracketed words, instead of separate lines - -status show just the word at success, prefixed with "*" at failure - flags: - -lang - examples: - rf Bible.txt | ma -short | wf Bible.tagged -- analyse the Bible -``` - - -==Elementary generation of Strings and Trees== - -``` -ps, put_string: ps String - Returns its argument String, like Unix echo. - HINT. The strength of ps comes from the possibility to receive the - argument from a pipeline, and altering it by the -filter flag. - flags: - -filter filter the result through this string processor - -length cut the string after this number of characters - examples: - gr -cat=Letter | l | ps -filter=text -- random letter as text - -pt, put_tree: pt Tree - Returns its argument Tree, like a specialized Unix echo. - HINT. The strength of pt comes from the possibility to receive - the argument from a pipeline, and altering it by the -transform flag. - flags: - -transform transform the result by this term processor - -number generate this number of terms at most - examples: - p "zero is even" | pt -transform=solve -- solve ?'s in parse result - -* st, show_tree: st Tree - Prints the tree as a string. Unlike pt, this command cannot be - used in a pipe to produce a tree, since its output is a string. - flags: - -printer show the tree in a special format (-printer=xml supported) - -wt, wrap_tree: wt Fun - Wraps the tree as the sole argument of Fun. - flags: - -c compute the resulting new tree to normal form - -vt, visualize_tree: vt Tree - Shows the abstract syntax tree via dot and gv (via temporary files - grphtmp.dot, grphtmp.ps). - flags: - -c show categories only (no functions) - -f show functions only (no categories) - -g show as graph (sharing uses of the same function) - -o just generate the .dot file - examples: - p "hello world" | vt -o | wf my.dot ;; ! 
open -a GraphViz my.dot - -- This writes the parse tree into my.dot and opens the .dot file - -- with another application without generating .ps. -``` - -==Subshells== - -``` -es, editing_session: es - Opens an interactive editing session. - N.B. Exit from a Fudget session is to the Unix shell, not to GF. - options: - -f Fudget GUI (necessary for Unicode; only available in X Window System) - -ts, translation_session: ts - Translates input lines from any of the actual languages to all other ones. - To exit, type a full stop (.) alone on a line. - N.B. Exit from a Fudget session is to the Unix shell, not to GF. - HINT: Set -parser and -lexer locally in each grammar. - options: - -f Fudget GUI (necessary for Unicode; only available in X Windows) - -lang prepend translation results with language names - flags: - -cat the parser category - examples: - ts -cat=Numeral -lang -- translate numerals, show language names - -tq, translation_quiz: tq Lang Lang - Random-generates translation exercises from Lang1 to Lang2, - keeping score of success. - To interrupt, type a full stop (.) alone on a line. - HINT: Set -parser and -lexer locally in each grammar. - flags: - -cat - examples: - tq -cat=NP TestResourceEng TestResourceSwe -- quiz for NPs - -tl, translation_list: tl Lang Lang - Random-generates a list of ten translation exercises from Lang1 - to Lang2. The number can be changed by a flag. - HINT: use wf to save the exercises in a file. - flags: - -cat - -number - examples: - tl -cat=NP TestResourceEng TestResourceSwe -- quiz list for NPs - -mq, morphology_quiz: mq - Random-generates morphological exercises, - keeping score of success. - To interrupt, type a full stop (.) alone on a line. - HINT: use printname judgements in your grammar to - produce nice expressions for desired forms. 
- flags: - -cat - -lang - examples: - mq -cat=N -lang=TestResourceSwe -- quiz for Swedish nouns - -ml, morphology_list: ml - Random-generates a list of ten morphological exercises, - keeping score of success. The number can be changed with a flag. - HINT: use wf to save the exercises in a file. - flags: - -cat - -lang - -number - examples: - ml -cat=N -lang=TestResourceSwe -- quiz list for Swedish nouns -``` - - -==IO-related commands== - -``` -rf, read_file: rf File - Returns the contents of File as a String; error if File does not exist. - -wf, write_file: wf File String - Writes String into File; File is created if it does not exist. - N.B. the command overwrites File without a warning. - -af, append_file: af File - Writes String into the end of File; File is created if it does not exist. - -* tg, transform_grammar: tg File - Reads File, parses as a grammar, - but instead of compiling further, prints it. - The environment is not changed. When parsing the grammar, the same file - name suffixes are supported as in the i command. - HINT: use this command to print the grammar in - another format (the -printer flag); pipe it to wf to save this format. - flags: - -printer (only -printer=latex supported currently) - -* cl, convert_latex: cl File - Reads File, which is expected to be in LaTeX form. - -sa, speak_aloud: sa String - Uses the Flite speech generator to produce speech for String. - Works for American English spelling. - examples: - h | sa -- listen to the list of commands - gr -cat=S | l | sa -- generate a random sentence and speak it aloud - -si, speech_input: si - Uses an ATK speech recognizer to get speech input. - flags: - -lang: The grammar to use with the speech recognizer. - -cat: The grammar category to get input in. - -language: Use acoustic model and dictionary for this language. - -number: The number of utterances to recognize. - -h, help: h Command? - Displays the paragraph concerning the command from this help file. 
- Without the argument, shows the first lines of all paragraphs. - options - -all show the whole help file - -defs show user-defined commands and terms - -FLAG show the values of FLAG (works for grammar-independent flags) - examples: - h print_grammar -- show all information on the pg command - -q, quit: q - Exits GF. - HINT: you can use 'ph | wf history' to save your session. - -!, system_command: ! String - Issues a system command. No value is returned to GF. - example: - ! ls - -?, system_command: ? String - Issues a system command that receives its arguments from GF pipe - and returns a value to GF. - example: - h | ? 'wc -l' | p -cat=Num -``` - - -==Flags== - -The availability of flags is defined separately for each command. -``` --cat, category in which parsing is performed. - The default is S. - --depth, the search depth in e.g. random generation. - The default depends on application. - --filter, operation performed on a string. The default is identity. - -filter=identity no change - -filter=erase erase the text - -filter=take100 show the first 100 characters - -filter=length show the length of the string - -filter=text format as text (punctuation, capitalization) - -filter=code format as code (spacing, indentation) - --lang, grammar used when executing a grammar-dependent command. - The default is the last-imported grammar. - --language, voice used by Festival as its --language flag in the sa command. - The default is system-dependent. - --length, the maximum number of characters shown of a string. - The default is unlimited. - --lexer, tokenization transforming a string into lexical units for a parser. - The default is words. - -lexer=words tokens are separated by spaces or newlines - -lexer=literals like words, but GF integer and string literals recognized - -lexer=vars like words, but "x","x_...","$...$" as vars, "?..." 
as meta - -lexer=chars each character is a token - -lexer=code use Haskell's lex - -lexer=codevars like code, but treat unknown words as variables, ?? as meta - -lexer=textvars like text, but treat unknown words as variables, ?? as meta - -lexer=text with conventions on punctuation and capital letters - -lexer=codelit like code, but treat unknown words as string literals - -lexer=textlit like text, but treat unknown words as string literals - -lexer=codeC use a C-like lexer - -lexer=ignore like literals, but ignore unknown words - -lexer=subseqs like ignore, but then try all subsequences from longest - --number, the maximum number of generated items in a list. - The default is unlimited. - --optimize, optimization on generated code. - The default is share for concrete, none for resource modules. - Each of the flags can have the suffix _subs, which performs - common subexpression elimination after the main optimization. - Thus, -optimize=all_subs is the most aggressive one. The _subs - strategy only works in GFC, and applies therefore in concrete but - not in resource modules. - -optimize=share share common branches in tables - -optimize=parametrize first try parametrize then do share with the rest - -optimize=values represent tables as courses-of-values - -optimize=all first try parametrize then do values with the rest - -optimize=none no optimization - --parser, parsing strategy. The default is chart. If -cfg or -mcfg are - selected, only bottomup and topdown are recognized. - -parser=chart bottom-up chart parsing - -parser=bottomup a more up to date bottom-up strategy - -parser=topdown top-down strategy - -parser=old an old bottom-up chart parser - --printer, format in which the grammar is printed. The default is - gfc. Those marked with M are (only) available for pm, the rest - for pg. 
- -printer=gfc GFC grammar - -printer=gf GF grammar - -printer=old old GF grammar - -printer=cf context-free grammar, with profiles - -printer=bnf context-free grammar, without profiles - -printer=lbnf labelled context-free grammar for BNF Converter - -printer=plbnf grammar for BNF Converter, with precedence levels - *-printer=happy source file for Happy parser generator (use lbnf!) - -printer=haskell abstract syntax in Haskell, with transl to/from GF - -printer=haskell_gadt abstract syntax GADT in Haskell, with transl to/from GF - -printer=morpho full-form lexicon, long format - *-printer=latex LaTeX file (for the tg command) - -printer=fullform full-form lexicon, short format - *-printer=xml XML: DTD for the pg command, object for st - -printer=old old GF: file readable by GF 1.2 - -printer=stat show some statistics of generated GFC - -printer=probs show probabilities of all functions - -printer=gsl Nuance GSL speech recognition grammar - -printer=jsgf Java Speech Grammar Format - -printer=jsgf_sisr_old Java Speech Grammar Format with semantic tags in - SISR WD 20030401 format - -printer=srgs_abnf SRGS ABNF format - -printer=srgs_abnf_non_rec SRGS ABNF format, without any recursion. - -printer=srgs_abnf_sisr_old SRGS ABNF format, with semantic tags in - SISR WD 20030401 format - -printer=srgs_xml SRGS XML format - -printer=srgs_xml_non_rec SRGS XML format, without any recursion. - -printer=srgs_xml_prob SRGS XML format, with weights - -printer=srgs_xml_sisr_old SRGS XML format, with semantic tags in - SISR WD 20030401 format - -printer=vxml Generate a dialogue system in VoiceXML. 
- -printer=slf a finite automaton in the HTK SLF format - -printer=slf_graphviz the same automaton as slf, but in Graphviz format - -printer=slf_sub a finite automaton with sub-automata in the - HTK SLF format - -printer=slf_sub_graphviz the same automaton as slf_sub, but in - Graphviz format - -printer=fa_graphviz a finite automaton with labelled edges - -printer=regular a regular grammar in a simple BNF - -printer=unpar a gfc grammar with parameters eliminated - -printer=functiongraph abstract syntax functions in 'dot' format - -printer=typegraph abstract syntax categories in 'dot' format - -printer=transfer Transfer language datatype (.tr file format) - -printer=cfg-prolog M cfg in prolog format (also pg) - -printer=gfc-prolog M gfc in prolog format (also pg) - -printer=gfcm M gfcm file (default for pm) - -printer=graph M module dependency graph in 'dot' (graphviz) format - -printer=header M gfcm file with header (for GF embedded in Java) - -printer=js M JavaScript type annotator and linearizer - -printer=mcfg-prolog M mcfg in prolog format (also pg) - -printer=missing M the missing linearizations of each concrete - --startcat, like -cat, but used in grammars (to avoid clash with keyword cat) - --transform, transformation performed on a syntax tree. The default is identity. - -transform=identity no change - -transform=compute compute by using definitions in the grammar - -transform=nodup return the term only if it has no constants duplicated - -transform=nodupatom return the term only if it has no atomic constants duplicated - -transform=typecheck return the term only if it is type-correct - -transform=solve solve metavariables as derived refinements - -transform=context solve metavariables by unique refinements as variables - -transform=delete replace the term by metavariable - --unlexer, untokenization transforming linearization output into a string. - The default is unwords. 
- -unlexer=unwords space-separated token list (like unwords) - -unlexer=text format as text: punctuation, capitals, paragraph

- -unlexer=code format as code (spacing, indentation) - -unlexer=textlit like text, but remove string literal quotes - -unlexer=codelit like code, but remove string literal quotes - -unlexer=concat remove all spaces - -unlexer=bind like identity, but bind at "&+" - --mark, marking of parts of tree in linearization. The default is none. - -mark=metacat append "+CAT" to every metavariable, showing its category - -mark=struct show tree structure with brackets - -mark=java show tree structure with XML tags (used in gfeditor) - --coding, Some grammars are in UTF-8, some in isolatin-1. - If the letters ä (a-umlaut) and ö (o-umlaut) look strange, either - change your terminal to isolatin-1, or rewrite the grammar with - 'pg -utf8'. -``` diff --git a/doc/gf-history.html b/doc/gf-history.html deleted file mode 100644 index 3fe8153e2..000000000 --- a/doc/gf-history.html +++ /dev/null @@ -1,865 +0,0 @@ - - -

- - - -

Grammatical Framework History of Changes

- - - -Changes in functionality since May 17, 2005, release of GF Version 2.2 - -
- -

- -25/6 (BB) -Added new speech recognition grammar printers for non-recursive SRGS grammars, -as used by Nuance Recognizer 9.0. Try pg -printer=srgs_xml_non_rec -or pg -printer=srgs_abnf_non_rec. - -

- -19/6 (AR) -Extended the functor syntax (with modules) so that the functor can have -restricted import and a module body (whose function is normally to complete restricted -import). Thus the following format is now possible: -

-  concrete C of A = E ** CI - [f,g] with (...) ** open R in {...}
-
-At the same time, the possibility of an empty module body was added to other modules -for symmetry. This can be useful for "proxy modules" that just collect other modules -without adding anything, e.g. -
-  abstract Math = Arithmetic, Geometry ;
-
- - -

- - -18/6 (AR) -Added a warning for clashing constants. A constant coming from multiple opened modules -was interpreted as "the first" found by the compiler, which was a source of difficult -errors. Clashing is officially forbidden, but we chose to give a warning instead of -raising an error to begin with (in version 2.8). - -

- -30/1/2007 (AR) -Semantics of variants fixed for complex types. Officially, it was only -defined for basic types (Str and parameters). When used for records, results were -multiplicative, which was not usable. But now variants should work for any type. - -

- -


- -

- -22/12 (AR) Release of GF version 2.7. - -

- -21/12 (AR) -Overloading rules for GF version 2.7: -

    -
  1. If a unique instance is found by exact match with argument types, - that instance is used. -
  2. Otherwise, if exact match with the expected value type gives a - unique instance, that instance is used. -
  3. Otherwise, if among possible instances only one returns a non-function - type, that instance is used, but a warning is issued. -
  4. Otherwise, an error results, and the list of possible instances is shown. -
-These rules are still experimental, but all future developments will guarantee -that their type-correct use will work. Rule (3) is only needed because the -current type checker does not always know an expected type. It can give -an incorrect result which is caught later in the compilation. Note -in particular that exact match is required. Match by subtyping will be -investigated later. - -

- -21/12 (BB) Java Speech Grammar Format with SISR tags can now be generated. -Use pg -printer=jsgf_sisr_old. The SISR tags are in Working Draft -20030401 format, which is supported by the OptimTALK VoiceXML interpreter -and the IBM XHTML+Voice implementation used by the Opera web browser. - -

- -21/12 (BB) -VoiceXML 2.0 dialog systems can now be generated from GF grammars. -Use pg -printer=vxml. - -

- -21/12 (BB) -JavaScript code for linearization and type annotation can now be -generated from a multilingual GF grammar. Use pm -printer=js. - - -

- -5/12 (BB) -A new tool for generating C linearization libraries -from a GFCC file. make gfcc2c in src -compiles the tool. The generated -code includes header files in lib/c and should be linked -against libgfcc.a in lib/c. For an example of -using the generated code, see src/tools/c/examples/bronzeage. -make in that directory generates a GFCC file, then generates -C code from that, and then compiles a program bronzeage-test. -The main function for that program is defined in -bronzeage-test.c. - - -

- -20/11 (AR) Type error messages in concrete syntax are printed with a -heuristic where a type of the form {... ; lock_C : {} ; ...} -is printed as C. This gives more readable error messages, but -can produce wrong results if lock fields are hand-written or if subtypes -of lock-fielded categories are used. - -

- -17/11 (AR) -Operation overloading: an oper can have many types, -from which one is picked at compile time. The types must have different -argument lists. Exact match with the arguments given to the oper -is required. An example is given in -Constructors.gf. -The purpose of overloading is to make libraries easier to use, since -only one name for each grammatical operation is needed: predication, modification, -coordination, etc. The concrete syntax is, at this experimental level, not -extended but relies on using a record with the function name repeated -as label name (see the example). The treatment of overloading is inspired -by C++, and was first suggested by Björn Bringert. - -

- - -3/10 (AR) A new low-level format gfcc ("Canonical Canonical GF"). -It is going to replace the gfc format later, but is already now -an efficient format for multilingual generation. -See GFCC document -for more information. - -

- -1/9 (AR) New way for managing errors in grammar compilation: -

-  Predef.Error : Type ;
-  Predef.error : Str -> Predef.Error ;
-
-Denotationally, Error is the empty type and thus a -subtype of any other type: it can be used anywhere. But the -error function is not canonical. Hence the compilation -is interrupted when (error s) is translated to GFC, and -the message s is emitted. An example use is given in -english/ParadigmsEng.gf: -
-  regDuplV : Str -> V ;
-  regDuplV fit = 
-    case last fit of {
-      ("a" | "e" | "i" | "o" | "u" | "y") => 
-        Predef.error (["final duplication makes no sense for"] ++ fit) ;
-      t =>
-       let fitt = fit + t in
-       mkV fit (fit + "s") (fitt + "ed") (fitt + "ed") (fitt + "ing")
-      } ;
-
-This function thus cannot be applied to a stem ending with a vowel, -which is exactly what we want. In future, it may be good to add similar -checks to all morphological paradigms in the resource. - - -

- -16/8 (AR) New generation algorithm: slower but works with less -memory. Default of gt; use gt -mem for the old -algorithm. The new option gt -all lazily generates all -trees until interrupted. It cannot be piped to other GF commands, -hence use gt -all -lin to print out linearized strings -rather than trees. - -


- - -22/6 (AR) Release of GF version 2.6. - -

- -20/6 (AR) The FCFG parser is now the default, as it even handles literals. -The old default can be selected by p -old. Since -FCFG does not support variable bindings, -old is automatically -selected if the grammar has bindings - and unless the -fcfg flag -is used. - -

- -17/6 (AR) The FCFG parser is now the recommended method for parsing -heavy grammars such as the resource grammars. It does not yet support -literals and variable bindings. - -

- -1/6 (AR) Added the FCFG parser written by Krasimir Angelov. Invoked by -p -fcfg. This parser is as general as MCFG but faster. -It needs more testing and debugging. - -

- -1/6 (AR) The command r = reload repeats the latest -i = import command. - -

- -30/5 (AR) It is now possible to use the flags -all, -table, -record -in combination with l -multi, and also with tb. - -

- -18/5 (AR) Introduced a wordlist format gfwl for -quick creation of language exercises and (in future) multilingual lexica. -The format is now very simple: -

-  # Svenska - Franska - Finska
-  berg      - montagne            - vuori
-  klättra   - grimper / escalader - kiivetä / kiipeillä
-
-but can be extended to cover paradigm functions in addition to just -words. - -

- -3/4 (AR) The predefined abstract syntax type Int now has two -inherent parameters indicating its last digit and its size. The (hard-coded) -linearization type is -

-  {s : Str ; size : Predef.Ints 1 ; last : Predef.Ints 9}
-
-The size field has value 1 for integers greater than 9, and -value 0 for other integers (which are never negative). This parameter can -be used e.g. in calculating number agreement, -
-    Risala i = {s = i.s ++ table (Predef.Ints 1 * Predef.Ints 9) {
-      <0,1>  => "risalah" ;
-      <0,2>  => "risalatan" ;
-      <0,_> | <1,0> => "rasail" ; 
-      _ => "risalah"
-      } ! <i.size,i.last>
-    } ;
-
-Notice that the table has to be typed explicitly for Ints k, -because type inference would otherwise return Int and therefore -fail to expand the table. - - -
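The parameter computation described above can be sketched outside GF. The Python below is only an illustration and not GF's implementation: the names `int_params` and `risala` are hypothetical, and the case analysis simply mirrors the description of `size` and `last` and the order of the branches in the Risala table.

```python
def int_params(n):
    """Return (size, last) as described for the linearization type of Int."""
    if n < 0:
        raise ValueError("Int literals are never negative")
    size = 1 if n > 9 else 0   # corresponds to Predef.Ints 1
    last = n % 10              # corresponds to Predef.Ints 9
    return size, last


def risala(n):
    """Mirror the Risala table; branches are tried in the order written."""
    size, last = int_params(n)
    if (size, last) == (0, 1):
        form = "risalah"
    elif (size, last) == (0, 2):
        form = "risalatan"
    elif size == 0 or (size, last) == (1, 0):
        form = "rasail"
    else:
        form = "risalah"
    # i.s ++ form: the numeral token followed by the selected noun form
    return f"{n} {form}"
```

Since `size` only distinguishes one-digit from longer numerals, every number above 9 that ends in 0 selects the same branch as 10 in this sketch.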

- -31/3 (AR) Added flags and options to some commands, to help generation: -

- -

- -


- -21/3/2006 Release of GF 2.5. - -

- -16/3 (AR) Added two flag values to pt -transform=X: -nodup which excludes terms where a constant is duplicated, -and -nodupatom which excludes terms where an atomic constant is duplicated. -The latter, in particular, is useful as a filter in generation: -

-  gt -cat=Cl | pt -transform=nodupatom
-
-This gives a corpus where words don't (usually) occur twice in the same clause. - -

- -6/3 (AR) Generalized the gfe file format in two ways: -

    -
  1. Use the real grammar parser, hence (in M.C "foo") expressions - may occur anywhere. But the ad hoc word substitution syntax is - abandoned: ordinary let (and where) expressions - can now be used instead. -
  2. The resource may now be a treebank, not just a grammar. Parsing - is thus replaced by treebank lookup, which in most cases is faster. -
-A minor novelty is that the --# -resource=FILE flag can now be -relative to GF_LIB_PATH, both for grammars and treebanks. -The flag --# -treebank=IDENT gives the language whose treebank -entries are used, in case of a multilingual treebank. - -

- -4/3 (AR) Added command use_treebank = ut for lookup in a treebank. -This command can be used as a fast substitute for parsing, but also as a -way to browse treebanks. -

-  ut "He adds this to that" | l -multi   -- use treebank lookup as parser in translation
-  ut -assocs | grep "ComplV2"            -- show all associations with ComplV2
-
- -

- -3/3 (AR) Added option -treebank to the i command. This adds treebanks to -the shell state. The possible file formats are -

    -
  1. XML file with a multilingual treebank, produced by tb -xml -
  2. tab-organized text file with a unilingual treebank, produced by ut -assocs -
-Notice that the treebanks in shell state are unilingual, and have strings as keys. -Multilingual treebanks have trees as keys. In case 1, one unilingual treebank per -language is built in the shell state. - - -

- -1/3 (AR) Added option -trees to the command tree_bank = tb. -By this option, the command just returns the trees in the treebank. It can be -used for producing new treebanks with the same trees: -

-  rf old.xml | tb -trees | tb -xml | wf new.xml
-
-Recall that only treebanks in the XML format can be read with the -trees -and -c flags. - -

- -1/3 (AR) A .gfe file can have a --# -path=PATH on its -second line. The file given on the first line (--# -resource=FILE) -is then read w.r.t. this path. This is useful if the resource file has -no path itself, which happens when it is gfc-only. - -

- -25/2 (AR) The flag preproc of the i command (and thereby -to gf itself) causes GF to apply a preprocessor to each sourcefile -it reads. - -

- -8/2 (AR) The command tb = tree_bank for creating and testing against -multilingual treebanks. Example uses: -

-  gr -cat=S -number=100 | tb -xml | wf tb.xml -- random treebank into file
-  rf tb.txt | tb -c                           -- read comparison treebank from file
-
- -

- -10/1 (AR) Forbade variable binding inside negation and Kleene star -patterns. - -

- -7/1 (AR) Full set of regular expression patterns, with -as-patterns to enable variable bindings to matched expressions: -

-The last three apply to all types of patterns, the first two only to token strings. -Example: plural formation in Swedish 2nd declension -(pojke-pojkar, nyckel-nycklar, seger-segrar, bil-bilar): -
-  plural2 : Str -> Str = \w -> case w of {
-    pojk + "e"                       => pojk + "ar" ;
-    nyck + "e" + l@("l" | "r" | "n") => nyck + l + "ar" ;
-    bil                              => bil + "ar"
-    } ;
-
-Semantics: variables are always bound to the first match, in the sequence defined -as the list Match p v as follows: -
-  Match (p1|p2) v = Match p1 v ++ Match p2 v
-  Match (p1+p2) s = [Match p1 s1 ++ Match p2 s2 | i <- [0..length s], (s1,s2) = splitAt i s]
-  Match p*      s = Match "" s ++ Match p s ++ Match (p + p) s ++ ...
-  Match c       v = [[]] if c == v  -- for constant patterns c
-  Match x       v = [[(x,v)]]       -- for variable patterns x
-  Match x@p     v = [[(x,v)]] + M   if M = Match p v /= []
-  Match p       v = [] otherwise    -- failure
-
-Examples: - -
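The first-match semantics can be tried out with a small sketch. The Python below is an illustration under stated assumptions, not the GF implementation: the tuple encoding of patterns and the helper names (`lit`, `var`, `alt`, `seq`, `as_`, `match`, `plural2`) are hypothetical, concatenation is read as combining every binding of the left part with every binding of the right part, and the Kleene star and negation patterns are left out.

```python
def lit(c):
    return ("lit", c)

def var(x):
    return ("var", x)

def alt(p1, p2):
    return ("alt", p1, p2)

def as_(x, p):
    return ("as", x, p)

def seq(*ps):
    """Right-nested concatenation p1 + (p2 + (...))."""
    if len(ps) == 1:
        return ps[0]
    return ("seq", ps[0], seq(*ps[1:]))

def match(p, s):
    """Return all binding lists for pattern p on string s; [] means failure."""
    kind = p[0]
    if kind == "lit":
        return [[]] if p[1] == s else []
    if kind == "var":
        return [[(p[1], s)]]
    if kind == "alt":
        return match(p[1], s) + match(p[2], s)
    if kind == "seq":
        results = []
        for i in range(len(s) + 1):          # split points 0..length s
            for b1 in match(p[1], s[:i]):
                for b2 in match(p[2], s[i:]):
                    results.append(b1 + b2)
        return results
    if kind == "as":
        return [[(p[1], s)] + b for b in match(p[2], s)]
    raise ValueError(f"unknown pattern kind: {kind}")

def plural2(w):
    """The Swedish 2nd declension example, using the first match of each case."""
    cases = [
        (seq(var("pojk"), lit("e")),
         lambda b: b["pojk"] + "ar"),
        (seq(var("nyck"), lit("e"),
             as_("l", alt(lit("l"), alt(lit("r"), lit("n"))))),
         lambda b: b["nyck"] + b["l"] + "ar"),
        (var("bil"),
         lambda b: b["bil"] + "ar"),
    ]
    for pattern, rhs in cases:
        matches = match(pattern, w)
        if matches:
            return rhs(dict(matches[0]))     # variables bound to the first match
    raise ValueError("no case matches")
```

With this encoding, `plural2` applied to pojke, nyckel, seger and bil gives the four plural forms listed above, and an ambiguous pattern such as `seq(var("x"), var("y"))` binds `x` to the empty string first, because split points are tried from 0 upwards.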

- -6/1 (AR) Concatenative string patterns to help morphology definitions... -This can be seen as a step towards regular expression string patterns. -The natural notation p1 + p2 will be considered later. -Note. This was done on 7/1. - -

- -5/1/2006 (BB) New grammar printers slf_sub and slf_sub_graphviz -for creating SLF networks with sub-automata. - -


- -22/12 Release of GF 2.4. - -

- -21/12 (AR) It now works to parse escaped string literals from command -line, and also string literals with spaces: -

-  gf examples/tram0/TramEng.gf
-  > p -lexer=literals "I want to go to \"Gustaf Adolfs torg\" ;"
-  QInput (GoTo (DestNamed "Gustaf Adolfs torg"))
-
- -

- -20/12 (AR) Support for full disjunctive patterns (P|Q) i.e. -not just on top level. - -

- -14/12 (BB) The command si (speech_input) which creates -a speech recognizer from a grammar for English and admits speech input -of strings has been added. The command uses an -ATK recognizer and -creates a recognition -network which accepts strings in the currently active grammar. -In order to use the si command, -you need to install the -atkrec library -and configure GF with ./configure --with-atk before compiling. -You need to set two environment variables for the si command to -work. ATK_HOME should contain the path to your copy of ATK -and GF_ATK_CFG should contain the path to your GF ATK configuration -file. A default version of this file can be found in - GF/src/gf_atk.cfg. - - -

- -11/12 (AR) Parsing of float literals now possible in object language. -Use the flag lexer=literals. - -

- -6/12 (AR) Accept param and oper definitions in -concrete modules. The definitions are just inlined in the -current module and not inherited. The purpose is to support rapid -prototyping of grammars. - -

- -2/12 (AR) The built-in type Float added to abstract syntax (and -resource). Values are stored as Haskell's Double precision -floats. For the syntax of float literals, see BNFC document. -NB: some bug still prevents parsing float literals in object -languages. Bug fixed 11/12. - -

- -1/12 (BB,AR) The command at = apply_transfer, which applies -a transfer function to a term. This is used for noncompositional -translation. Transfer functions are defined in a special transfer -language (file suffix .tr), which is compiled into a -run-time transfer core language (file suffix .trc). -The compiler is included in GF/transfer. The following is -a complete example of how to try out transfer: -

-  % cd GF/transfer
-  % make                            -- compile the trc compiler
-  % cd examples                     -- GF/transfer/examples
-  % ../compile_to_core -i../lib numerals.tr
-  % mv numerals.trc ../../examples/numerals
-  % cd ../../examples/numerals      -- GF/examples/numerals
-  % gf
-     > i decimal.gf
-     > i BinaryDigits.gf
-     > i numerals.trc
-     > p -lang=Cncdecimal "123" | at num2bin | l
-     1 0 0 1 1 0 0 1 1 1 0
-
-Other relevant commands are: - -For more information on the commands, see help. Documentation on -the transfer language: to appear. - -

- -17/11 (AR) Made it possible for lexers to be nondeterministic. -The current simple-minded implementation sends the parser -each lexing result in turn. The option -cut is used for -breaking after the first lexing that leads to a successful parse. The only -nondeterministic lexer right now is -lexer=subseqs, which -first filters with -lexer=ignore (dropping words that are neither in -the grammar nor literals) and then starts ignoring other words from -longest to shortest subsequence. This is usable for parsing tasks -of the keyword-spotting type, but expensive (2^n) for long input. -A smarter implementation is therefore desirable. - -
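The cost of trying subsequences can be seen in a short sketch. The Python below is not the GF lexer: the function name `subseqs_longest_first` is hypothetical, and the order among subsequences of equal length is an assumption. It only illustrates why enumerating subsequences of n tokens from longest to shortest is exponential in n.

```python
from itertools import combinations

def subseqs_longest_first(tokens):
    """Yield every subsequence of tokens, longest first, keeping word order."""
    n = len(tokens)
    for k in range(n, -1, -1):                    # lengths n, n-1, ..., 0
        for indices in combinations(range(n), k):
            yield [tokens[i] for i in indices]

candidates = list(subseqs_longest_first(["please", "open", "the", "door"]))
```

For four tokens `candidates` has 16 entries, starting with the full token list and ending with the empty one, which is why stopping at the first successful parse matters for long input.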

- -14/11 (AR) Functions can be made unparsable (or "internal" as -in BNFC). This is done by i -noparse=file, where -the nonparsable functions are given in file using the -line format --# noparse Funs. This can be used e.g. to -rule out expensive parsing rules. It is used in -lib/resource/abstract/LangVP.gf to get parse values -structured with VP, which is obtained via transfer. -So far only the default (= old) parser generator supports this. - -

- -14/11 (AR) Removed the restrictions on what a lincat may look like. -Now any record type that has a value in GFC (i.e. without any -functions in it) can be used, e.g. {np : NP ; cn : Bool => CN}. -To display linearization values, only l -record shows -nice results. - -

- -9/11 (AR) GF shell state can now have several abstract syntaxes with -their associated concrete syntaxes. This allows e.g. parsing with -resource while testing an application. One can also have a -parse-transfer-lin chain from one abstract syntax to another. - -

-7/11 (BB) Running commands can now be interrupted with Ctrl-C, without -killing the GF process. This feature is not supported on Windows. - -

- -1/11 (AR) Yet another method for adding probabilities: append - --# prob Double to the end of a line defining a function. -This can be (1) a .cf rule (2) a fun rule, or -(3) a lin rule. The probability is attached to the -first identifier on the line. - -

-1/11 (BB) Added generation of weighted SRGS grammars. The weights -are calculated from the function probabilities. The algorithm -for calculating the weights is not yet very good. -Use pg -printer=srgs_xml_prob. - -

-31/10 (BB) Added option for converting grammars to SRGS grammars in XML format. -Use pg -printer=srgs_xml. - -

- -31/10 (AR) Probabilistic grammars. Probabilities can be used to -weight random generation (gr -prob) and to rank parse -results (p -prob). They are read from a separate file -(flag i -probs=File, format --# prob Fun Double) -or from the top-level grammar file itself (option i -prob). -To see the probabilities, use pg -printer=probs. -
-As a by-product, the probabilistic random generation algorithm is -available for any context-free abstract syntax. Use the flag -gr -cf. This algorithm is much faster than the -old (more general) one, but it may sometimes loop. - -

- -12/10 (AR) Flag -atoms=Int to the command gt = generate_trees -takes away all zero-argument functions except Int per category. In -this way, it is possible to generate a corpus illustrating each -syntactic structure even when the lexicon (which consists of -zero-argument functions) is large. - -

- -6/10 (AR) New commands dc = define_command and -dt = define_tree to define macros in a GF session. -See help for details and examples. - -

- -5/10 (AR) Printing missing linearization rules: -pm -printer=missing. Command g = grep, -which works in a way similar to Unix grep. - -

- -5/10 (PL) Printing graphs with function and category dependencies: -pg -printer=functiongraph, pg -printer=typegraph. - -

- -20/9 (AR) Added optimization by common subexpression elimination. -It works on GFC modules and creates oper definitions for -subterms that occur more than once in lin definitions. These -oper definitions are automatically reinlined in functionalities -that don't support opers in GFC. This conversion is done by -module and the opers are not inherited. Moreover, the subterms -can contain free variables which means that the opers are not -always well typed. However, since all variables in GFC are type-specific -(and local variables are lin-specific), this does not destroy -subject reduction or cause illegal captures. -
-The optimization is triggered by the flag optimize=OPT_subs, -where OPT is any of the other optimizations (see h -optimize). -The most aggressive value of the flag is all_subs. In experiments, -the size of a GFC module can shrink by 85% compared to plain all. - -

- -18/9 (AR) Removed superfluous spaces from GFC printing. This shrinks -the GFC size by 5-10%. - -

- -15/9 (AR) Fixed some bugs in dependent-type type checking of abstract -modules at compile time. The type checker is more severe now, which means -that some old grammars may fail to compile - but this is usually the -right result. However, the type checker of def judgements still -needs work. - -

- -14/9 (AR) Added printing of grammars to a format without parameters, in -the spirit of Peano's "Latino sine flexione". The command pg -unpar -does the trick, and the result can be saved in a gfcm file. The generated -concrete syntax modules get the prefix UP_. The translation is briefly: -

-  (P => T)*               =  T*
-  (t ! p)*                =  t*
-  (table {p => t ; ...})* =  t*
-
-In order for this to be maximally useful, the grammar should be written in such -a way that the first value of every parameter type is the desired one. For -instance, in Peano's case it would be the ablative for noun cases, the singular for -numbers, and the 2nd person singular imperative for verb forms. - -

- -14/9 (BB) Added finite state approximation of grammars. -Internally the conversion is done cfg -> regular -> fa -> slf, so the -different printers can be used to check the output of each stage. -The new options are: -

-
pg -printer=slf
-
A finite automaton in the HTK SLF format.
-
pg -printer=slf_graphviz
-
The same FA as in SLF, but in Graphviz format.
-
pg -printer=fa_graphviz
-
A finite automaton with labelled edges, instead of labelled nodes which SLF has.
-
pg -printer=regular
-
A regular grammar in a simple BNF.
-
- -

- -4/9 (AR) Added the option pg -printer=stat to show -statistics of gfc compilation result. To be extended with new information. -The most important stats now are the top-40 sized definitions. - -

-


- -1/7 Release of GF 2.3. - -

- - -1/7 (AR) Added the flag -o to the vt command -to just write the .dot file without going to .ps -(cf. 20/6). - -

- -29/6 (AR) The printer used by Embedded Java GF Interpreter -(pm -header) now produces -working code from all optimized grammars - hence you need not select a -weaker optimization just to use the interpreter. However, the -optimization -optimize=share usually produces smaller object -grammars because the "unoptimizer" just undoes all optimizations. -(This is to be considered a temporary solution until the interpreter -knows how to handle stronger optimizations.) - -

- -27/6 (AR) The flag flags optimize=noexpand placed in a -resource module prevents the optimization phase of the compiler when -the .gfr file is created. This can prevent serious code -explosion, but it will also make the processing of modules using the -resource slower. A favourable example is lib/resource/finnish/ParadigmsFin. - -

- -23/6 (HD,AR) The new editor GUI gfeditor by Hans-Joachim -Daniels can now be used. It is based on Janna Khegai's jgf. -New functionality includes HTML display (gfeditor -h) and -programmable refinement tooltips. - -

- -23/6 (AR) The flag unlexer=finnish can be used to bind -Finnish suffixes (e.g. possessives) to preceding words. The GF source -notation is e.g. "isä" ++ "&*" ++ "nsa" ++ "&*" ++ "ko", -which unlexes to "isänsäkö". There is no corresponding lexer -support yet. - - -

- -22/6 (PL,AR) The MCFG parser (p -mcfg) now works on all -optimized grammars - hence you need not select a weaker optimization -to use this parser. The same concerns the CFGM printer (pm -printer=cfgm). - -

- -20/6 (AR) Added the command visualize_tree = vt, to -display syntax trees graphically. Like vg, this command uses -GraphViz and Ghostview. The foremost use is to pipe the parser to this -command. - -

- -17/6 (BB) There is now support for lists in GF abstract syntax. -A list category is declared as: -

-cat [C]
-
-or -
-cat [C]{n}
-
-where C is a category and n is a non-negative integer. -cat [C] is equivalent to cat [C]{0}. List category -syntax can be used wherever categories are used. - -

- -cat [C]{n} is equivalent to the declarations: -

-cat ListC
-fun BaseC : C^n -> ListC
-fun ConsC : C -> ListC -> ListC
-
- -where C^0 -> X means X, and C^m (where -m > 0) means C -> C^(m-1). - -
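The expansion of `cat [C]{n}` can be spelled out mechanically. The Python below is a minimal sketch of that expansion as described here, not code from the GF compiler; the function name `expand_list_cat` is hypothetical and the output is just the three declarations as text.

```python
def expand_list_cat(c, n=0):
    """Return the declarations that cat [C]{n} abbreviates, as strings."""
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    list_cat = f"List{c}"
    # C^n -> ListC: n argument positions of type C, then the list category
    base_type = " -> ".join([c] * n + [list_cat])
    return [
        f"cat {list_cat}",
        f"fun Base{c} : {base_type}",
        f"fun Cons{c} : {c} -> {list_cat} -> {list_cat}",
    ]

declarations = expand_list_cat("Prop", 2)
```

With n = 0 the base constructor takes no arguments, so its type is just the list category itself.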

- -A lincat declaration of the form: -

-lincat [C] = T
-
-is equivalent to -
-lincat ListC = T
-
- -The linearizations of the list constructors are written -just like they would be if the function declarations above -had been made manually, e.g.: -
-lin BaseC x_1 ... x_n = t
-lin ConsC x xs = t'
-
- -

- -10/6 (AR) Preprocessing of .gfe files can now be performed as part of -any grammar compilation. The flag -ex causes GF to look for -the .gfe files and preprocess those that are younger -than the corresponding .gf files. The files are first sorted -and grouped by the resource, so that each resource only needs to be compiled once. - -

- -10/6 (AR) Editor GUI can now be alternatively invoked by the shell -command gf -edit (equivalent to jgf). - -

- -10/6 (AR) Editor GUI command pc Int to pop Int -items from the clip board. - -

- -4/6 (AR) Sequence of commands in the Java editor GUI now possible. -The commands are separated by ;; (notice the space on -both sides of the two semicolons). Such a sequence can be sent -from the "GF Command" pop-up field, but is mostly intended -for external processes that communicate with GF. - -

- -3/6 (AR) The format .gfe defined to support -grammar writing by examples. Files of this format are first -converted to .gf files by the command -

-  gf -examples File.gfe
-
-See -../lib/resource/doc/examples/QuestionsI.gfe -for an example. - -

- -31/5 (AR) Default of p -rawtrees=k changed to 999999. - -

- -31/5 (AR) Support for restricted inheritance. Syntax: -

-  M          -- inherit everything from M, as before
-  M [a,b,c]  -- only inherit constants a,b,c
-  M-[a,b,c]  -- inherit everything except a,b,c
-
-Caution: there is no check yet for completeness and -consistency, so restricted inheritance can create -run-time failures. - -

- -29/5 (AR) Parser support for reading GFC files line per line. -The category Line in GFC.cf can be used -as entrypoint instead of Grammar to achieve this. - -

- -28/5 (AR) Environment variables and path wild cards. -

-

- - -26/5/2005 (BB) Notation for list categories. - - - - - diff --git a/doc/gf-modules.html b/doc/gf-modules.html deleted file mode 100644 index 6292bd855..000000000 --- a/doc/gf-modules.html +++ /dev/null @@ -1,1183 +0,0 @@ - - - - -The Module System of GF - -

The Module System of GF

- -Aarne Ranta
-8/4/2005 - 5/7/2007 -
- -

-
-

- - -

-
-

-

-A GF grammar consists of a set of modules, which can be -combined in different ways to build different grammars. -There are several different types of modules: -

- - -

-We will go through the module types in this order, which is also -their order of "importance" from the most basic to -the more advanced ones. -

-

-This document presupposes knowledge of GF judgements and expressions, which can -be gained from the GF tutorial. It aims -to give a systematic description of the module system; -some tutorial information is repeated to make the document -self-contained. -

- -

The principal module types

- -

Abstract syntax

-

-Any GF grammar that is used in an application -will probably contain at least one module -of the abstract module type. Here is an example of -such a module, defining a fragment of propositional logic. -

-
-    abstract Logic = {
-      cat Prop ;
-      fun Conj : Prop -> Prop -> Prop ;
-      fun Disj : Prop -> Prop -> Prop ;
-      fun Impl : Prop -> Prop -> Prop ;
-      fun Falsum : Prop ;
-      }
-
-

-The name of this module is Logic. -

-

-An abstract module defines an abstract syntax, which -is a language-independent representation of a fragment of language. -It consists of two kinds of judgements: -

- - -

-There can also be def and data judgements in an -abstract syntax. -
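-As a minimal sketch of a def judgement, negation can be added to the logic fragment as a defined function. The module name LogicDef and the function Neg are assumptions made for this illustration; they do not occur elsewhere in this document. Only def is illustrated here.

```gf
abstract LogicDef = {
  cat Prop ;
  fun Impl : Prop -> Prop -> Prop ;
  fun Falsum : Prop ;
  -- Neg is declared like any other function ...
  fun Neg : Prop -> Prop ;
  -- ... and then given a definition: "not a" is "a implies falsum"
  def Neg a = Impl a Falsum ;
}
```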

- -

Compilation of abstract syntax

-

-The GF grammar compiler expects to find the module Logic in a file named -Logic.gf. When the compiler is run, it produces -another file, named Logic.gfc. This file is in the -format called canonical GF, which is the "machine language" -of GF. Next time that the module Logic is needed in -compiling a grammar, it can be read from the compiled (gfc) -file instead of the source (gf) file, unless the source -has been changed after the compilation. -

- -

Concrete syntax

-

-In order for a GF grammar to describe a concrete language, the abstract -syntax must be completed with a concrete syntax of it. -For this purpose, we use modules of type concrete: for instance, -

-
-    concrete LogicEng of Logic = {
-      lincat Prop = {s : Str} ;
-      lin Conj a b = {s = a.s ++ "and" ++ b.s} ;
-      lin Disj a b = {s = a.s ++ "or"  ++ b.s} ;
-      lin Impl a b = {s = "if" ++ a.s ++ "then"  ++ b.s} ;
-      lin Falsum = {s = ["we have a contradiction"]} ;
-      }
-
-

-The module LogicEng is a concrete syntax of the -abstract syntax Logic. The GF grammar compiler checks that -the concrete is valid with respect to the abstract syntax of -which it is claimed to be. The validity requires that there has to be -

- - -

-Validity also requires that the linearization functions defined by -lin judgements are type-correct with respect to the -linearization types of the arguments and value of the function. -

-

-There can also be lindef and printname judgements in a -concrete syntax. -

- -

Top-level grammar

-

-When a concrete module is successfully compiled, a gfc -file is produced in the same way as for abstract modules. The -pair of an abstract and a corresponding concrete module -is a top-level grammar, which can be used in the GF system to -perform various tasks. The most fundamental tasks are -

- - -

-In the current grammar, infinitely many trees and strings are recognized, although -none of them are very interesting. For example, the tree -

-
-    Impl (Disj Falsum Falsum) Falsum
-
-

-has the linearization -

-
-    if we have a contradiction or we have a contradiction then we have a contradiction
-
-

-which in turn can be parsed uniquely as that tree. -
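-Parsing is not always unique in this grammar. As a worked example derived by hand from the lin rules of LogicEng (not copied from a GF session), the following two trees have the same linearization, so parsing that string should yield both of them:

```gf
-- Two distinct abstract syntax trees ...
Conj Falsum (Disj Falsum Falsum)
Disj (Conj Falsum Falsum) Falsum

-- ... with one and the same LogicEng linearization:
-- we have a contradiction and we have a contradiction or we have a contradiction
```

-The ambiguity arises because the linearizations of Conj and Disj add no brackets around their arguments.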

- -

Compiling top-level grammars

-

-When GF compiles the module LogicEng it also has to compile -all modules that it depends on (in this case, just Logic). -The compilation process starts with dependency analysis to find -all these modules, recursively, starting from the explicitly imported one. -The compiler then reads either gf or gfc files, in -a dependency order. The decision on which files to read depends on -time stamps and dependencies in a natural way, so that all and only -those modules that have to be compiled are compiled. (This behaviour can -be changed with flags, see below.) -

- -

Using top-level grammars

-

-To use a top-level grammar in the GF system, one uses the import -command (short name i). For instance, -

-
-    i LogicEng.gf
-
-

-It is also possible to specify the imported grammar(s) on the command -line when invoking GF: -

-
-    gf LogicEng.gf
-
-

-Various compilation flags can be added to both ways of compiling a module: -

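- - --src forces compilation from source files - --v gives more verbose information on compilation - --s makes compilation silent (except if it fails with an error message) - - -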

-A complete list of flags can be obtained in GF by help i. -

-

-Importing a grammar makes it visible in GF's internal state. To see -what modules are available, use the command print_options (po). -You can empty the state with the command empty (e); this is -needed if you want to read in grammars with a different abstract syntax -than the current one without exiting GF. -

-

-Grammar modules can reside in different directories. They can then be found -by means of a search path, which is a flag such as -

-
-    -path=.:api/toplevel:prelude
-
-

-given to the import command or the shell command invoking GF. -(It can also be defined in the grammar file; see below.) The compiler -writes every gfc file in the same directory as the corresponding -gf file. -

-

-The path is relative to the working directory pwd, so that -all directories listed are primarily interpreted as subdirectories of -pwd. Secondarily, they are searched relative to the value of the -environment variable GF_LIB_PATH, which is by default set to -/usr/local/share/GF. -

-

-Parsing and linearization can be performed with the parse -(p) and linearize (l) commands, respectively. -For instance, -

-
-    > l Impl (Disj Falsum Falsum) Falsum
-     if we have a contradiction or we have a contradiction then we have a contradiction
-  
-    > p -cat=Prop "we have a contradiction"
-    Falsum
-
-

-Notice that the parse command needs the parsing category -as a flag. This is necessary since a grammar can have several -possible parsing categories ("entry points"). -

- -

Multilingual grammar

-

-One abstract syntax can have several concrete syntaxes. -Here are two new ones for Logic: -

-
-    concrete LogicFre of Logic = {
-      lincat Prop = {s : Str} ;
-      lin Conj a b = {s = a.s ++ "et" ++ b.s} ;
-      lin Disj a b = {s = a.s ++ "ou"  ++ b.s} ;
-      lin Impl a b = {s = "si" ++ a.s ++ "alors"  ++ b.s} ;
-      lin Falsum = {s = ["nous avons une contradiction"]} ;
-      }
-  
-    concrete LogicSymb of Logic = {
-      lincat Prop = {s : Str} ;
-      lin Conj a b = {s = "(" ++ a.s ++ "&" ++ b.s ++ ")"} ;
-      lin Disj a b = {s = "(" ++ a.s ++ "v" ++ b.s ++ ")"} ;
-      lin Impl a b = {s = "(" ++ a.s ++ "->" ++ b.s ++ ")"} ;
-      lin Falsum = {s = "_|_"} ;
-      }
-
-

-The four modules Logic, LogicEng, LogicFre, and -LogicSymb together form a multilingual grammar, in which -it is possible to perform parsing and linearization with respect to any -of the concrete syntaxes. As a combination of parsing and linearization, -one can also perform translation from one language to another. -(By language we mean the set of expressions generated by one -concrete syntax.) -

- -

Using multilingual grammars

-

-Any combination of abstract syntax and corresponding concrete syntaxes -is thus a multilingual grammar. With many languages and other enrichments -(as described below), a multilingual grammar easily grows to the size of -tens of modules. The grammar developer, having finished her job, can -package the result in a multilingual canonical grammar, a file -with the suffix .gfcm. For instance, to compile the set of grammars -described by now, the following sequence of GF commands can be used: -

-
-    i LogicEng.gf
-    i LogicFre.gf
-    i LogicSymb.gf
-    pm | wf logic.gfcm
-
-

-The "end user" of the grammar only needs the file logic.gfcm to -access all the functionality of the multilingual grammar. It can be -imported in the GF system in the same way as .gf files. But -it can also be used in the -Embedded Java Interpreter for GF -to build Java programs of which the multilingual grammar functionalities -(linearization, parsing, translation) form a part. -

-

-In a multilingual grammar, the concrete syntax module names work as -names of languages that can be selected for linearization and parsing: -

-
-    > l -lang=LogicFre Impl Falsum Falsum
-    si nous avons une contradiction alors nous avons une contradiction
-  
-    > l -lang=LogicSymb Impl Falsum Falsum
-    ( _|_ -> _|_ )
-  
-    > p -cat=Prop -lang=LogicSymb "( _|_ & _|_ )"
-    Conj Falsum Falsum
-
-

-The option -multi gives linearization to all languages: -

-
-    > l -multi Impl Falsum Falsum
-    if we have a contradiction then we have a contradiction
-    si nous avons une contradiction alors nous avons une contradiction
-    ( _|_ -> _|_ )
-
-

-Translation can be obtained by using a pipe from a parser -to a linearizer: -

-
-    > p -cat=Prop -lang=LogicSymb "( _|_ & _|_ )" | l -lang=LogicEng
-    we have a contradiction and we have a contradiction
-
-

- -

Resource modules

-

-The concrete modules shown above would look much nicer if -we used the main idea of functional programming: avoid repetitive -code by using functions that capture repeated patterns of -expressions. A collection of such functions can be a valuable -resource for a programmer, reusable in many different -top-level grammars. Thus we introduce the resource -module type, with the first example -

-
-    resource Util = {
-      oper SS : Type = {s : Str} ;
-      oper ss : Str -> SS = \s -> {s = s} ;
-      oper paren : Str -> Str = \s -> "(" ++ s ++ ")" ;
-      oper infix : Str -> SS -> SS -> SS = \h,x,y ->
-        ss (x.s ++ h ++ y.s) ;
-      oper infixp : Str -> SS -> SS -> SS = \h,x,y ->
-        ss (paren (infix h x y)) ;
-      }
-
-

-Modules of resource type have two forms of judgement: -

- - -
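-The module Util contains only oper judgements. A minimal sketch of a resource that also uses a param judgement could look as follows; the names Morpho, Number, Noun, and regNoun are assumptions chosen for this illustration.

```gf
resource Morpho = {
  -- a parameter type with two values
  param Number = Sg | Pl ;
  -- a noun is a table from numbers to strings
  oper Noun : Type = {s : Number => Str} ;
  -- regular nouns form the plural by adding "s"
  oper regNoun : Str -> Noun = \w ->
    {s = table {Sg => w ; Pl => w + "s"}} ;
}
```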

-A resource can be used in a concrete (or another -resource) by opening it. This means that -all operations (and parameter types) defined in the resource -module become usable in the module that opens it. For instance, -we can rewrite the module LogicSymb much more concisely: -

-
-    concrete LogicSymb of Logic = open Util in {
-      lincat Prop = SS ;
-      lin Conj = infixp "&" ;
-      lin Disj = infixp "v" ;
-      lin Impl = infixp "->" ;
-      lin Falsum = ss "_|_" ;
-      }
-
-

-What happens when this variant of LogicSymb is -compiled is that the oper-defined constants -of Util are inlined in the -right-hand-sides of the judgements of LogicSymb, -and these expressions are partially evaluated, i.e. -computed as far as possible. The generated gfc file -will look just like the file generated for the first version -of LogicSymb - at least, it will do the same job. -

-

-Several resource modules can be opened -at the same time. If the modules contain the same names, the -conflict can be resolved by qualified opening and -reference. For instance, -

-
-    concrete LogicSymb of Logic = open Util, Prelude in { ...
-      } ;
-
-

-(where Prelude is a standard library of GF) brings -into scope two definitions of the constant SS. -To specify which one is used, you can write -Util.SS or Prelude.SS instead of just SS. -You can also introduce abbreviations to avoid long qualifiers, e.g. -

-
-    concrete LogicSymb of Logic = open (U=Util), (P=Prelude) in { ...
-      } ;
-
-

-which means that you can write U.SS and P.SS. -

-

-Judgements of param and oper forms may also be used -in concrete modules, and they are then considered local -to those modules, i.e. they are not exported. -
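-A local operation can be sketched as follows. This is a variant of LogicEng in which the helper binary (a name assumed for this illustration) is defined inside the concrete module itself, without any resource:

```gf
concrete LogicEng of Logic = {
  lincat Prop = {s : Str} ;
  -- local helper: usable in this module, but not exported
  oper binary : Str -> {s : Str} -> {s : Str} -> {s : Str} =
    \h,x,y -> {s = x.s ++ h ++ y.s} ;
  lin Conj = binary "and" ;
  lin Disj = binary "or" ;
  lin Impl a b = {s = "if" ++ a.s ++ "then" ++ b.s} ;
  lin Falsum = {s = ["we have a contradiction"]} ;
}
```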

- -

Compiling resource modules

-

-The compilation of a resource module differs -from the compilation of abstract and -concrete modules because oper operations -do not in general have values in gfc. A gfc -file is generated, but it contains only -param judgements (also recall that opers -are inlined in their top-level use sites, so it is not -necessary to save them in the compiled grammar). -However, since computing the operations over and over -again can be time-consuming, and since type checking -resource modules also takes time, a third kind -of file is generated for resource modules: a .gfr -file. This file is written in the GF source code notation, -but it is type checked and type annotated, and opers -are computed as far as possible. -

-

-If you look at any gfc or gfr file generated -by the GF compiler, you see that all names have been replaced by -their qualified variants. This is an important first step (after parsing) -the compiler does. As for the commands in the GF shell, some output -qualified names and some not. The difference does not always result -from firm principles. -

- -

Using resource modules

-

-The typical use is through open in a -concrete module, which means that -resource modules are not imported on their own. -However, in the developing and testing phase of grammars, it -can be useful to evaluate opers with different -arguments. To prevent them from being thrown away after inlining, the --retain option can be used: -

-
-    > i -retain Util.gf
-
-

-The command compute_concrete (cc) -can now be used for evaluating expressions that may contain -operations defined in Util: -

-
-    > cc ss (paren "foo")
-    {s = "(" ++ "foo" ++ ")"}
-
-

-To find out what opers are available for a given type, -the command show_operations (so) can be used: -

-
-    > so SS
-    Util.ss : Str -> SS ;
-    Util.infix : Str -> SS -> SS -> SS ;
-    Util.infixp : Str -> SS -> SS -> SS ;
-
-

- -

Inheritance

-

-The most characteristic modularity of GF lies in the division of -grammars into abstract, concrete, and -resource modules. This permits writing multilingual -grammars and sharing the maximum of code between different -languages. -

-

-In addition to this special kind of modularity, GF provides inheritance, -which is familiar from other programming languages (in particular, -object-oriented ones). Inheritance means that a module inherits all -judgements from another module; we also say that it extends -the other module. Inheritance is useful to divide big grammars into -smaller units, and also to reuse the same units in different bigger -grammars. -

-

-The first example of inheritance is for abstract syntax. Let us -extend the module Logic to Arithmetic: -

-
-    abstract Arithmetic = Logic ** {
-      cat Nat ;
-      fun Even : Nat -> Prop ;
-      fun Odd  : Nat -> Prop ;
-      fun Zero : Nat ;
-      fun Succ : Nat -> Nat ;
-      }
-
-

-In parallel with the extension of the abstract syntax -Logic to Arithmetic, we can extend -the concrete syntax LogicEng to ArithmeticEng: -

-
-    concrete ArithmeticEng of Arithmetic = LogicEng ** open Util in {
-      lincat Nat = SS ;
-      lin Even x = ss (x.s ++ "is" ++ "even") ;
-      lin Odd x = ss (x.s ++ "is" ++ "odd") ;
-      lin Zero = ss "zero" ;
-      lin Succ x = ss ("the" ++ "successor" ++ "of" ++ x.s) ;
-      }
-
-

-Another extension of Logic is Geometry, -

-
-    abstract Geometry = Logic ** {
-      cat Point ;
-      cat Line ;
-      fun Incident : Point -> Line -> Prop ;
-      }
-
-

-The corresponding concrete syntax is left as an exercise. -
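-One possible solution, given only as a sketch: the module name GeometryEng and the English wording are assumptions, and Util is the resource defined earlier.

```gf
concrete GeometryEng of Geometry = LogicEng ** open Util in {
  lincat Point = SS ;
  lincat Line = SS ;
  -- "p lies on l"; the choice of wording is arbitrary
  lin Incident p l = ss (p.s ++ "lies" ++ "on" ++ l.s) ;
}
```

-Since Geometry declares no functions that build a Point or a Line, this grammar needs further extensions before it has complete trees to linearize.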

- -

Multiple inheritance

-

-Inheritance can be multiple, which means that a module -may extend many modules at the same time. Suppose, for instance, -that we want to build a module for mathematics covering both -arithmetic and geometry, and the underlying logic. We then write -

-
-    abstract Mathematics = Arithmetic, Geometry ** {
-      } ;
-
-

-We could of course add some new judgements in this module, but -it is not necessary to do so. If no new judgements are added, the -module body can be omitted: -

-
-    abstract Mathematics = Arithmetic, Geometry ;
-
-

-

-The module Mathematics shows that it is possible -to extend a module already built by extension. The correctness -criterion for extensions is that the same name -(cat, fun, oper, or param) -may not be defined twice in the resulting union of names. -That the names defined in Logic are "inherited twice" -by Mathematics (via both Arithmetic and -Geometry) is no violation of this rule; the usual -problems of multiple inheritance do not arise, since -the definitions of inherited constants cannot be changed. -
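-The same pattern should carry over to concrete syntax. The following sketch assumes that a concrete syntax GeometryEng of Geometry exists; both that name and MathematicsEng are assumptions made for this illustration.

```gf
-- LogicEng is inherited via both ArithmeticEng and GeometryEng,
-- which is allowed because its definitions cannot be changed.
concrete MathematicsEng of Mathematics = ArithmeticEng, GeometryEng ** {
  } ;
```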

- -

Restricted inheritance

-

-Inheritance can be restricted, which means that only some of -the constants are inherited. There are two dual notations for this: -

-
-    A [f,g]
-
-

-meaning that only f and g are inherited from A, and -

-
-    A-[f,g]
-
-

-meaning that everything except f and g is inherited from A. -

-

-Constants that are not inherited may be redefined in the inheriting module. -
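-As a hedged sketch of such a redefinition, the following module inherits everything from LogicEng except the linearization of Falsum, which it then defines anew. The module name LogicEngAlt and the new wording are assumptions.

```gf
concrete LogicEngAlt of Logic = LogicEng - [Falsum] ** {
  -- Falsum is excluded from inheritance, so it is defined here
  lin Falsum = {s = "absurdity"} ;
}
```

-If the excluded constant were not redefined, the concrete syntax would lack a linearization for Falsum.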

- -

Compiling inheritance

-

-Inherited judgements are not copied into the inheriting modules. -Instead, an indirection is created for each inherited name, -as can be seen by looking into the generated gfc (and -gfr) files. Thus for instance the names -

-
-    Mathematics.Prop  Arithmetic.Prop  Geometry.Prop Logic.Prop
-
-

-all refer to the same category, declared in the module -Logic. -

- -

Inspecting grammar hierarchies

-

-The command visualize_graph (vg) shows the -dependency graph in the current GF shell state. The graph can -also be saved in a file and used e.g. in documentation, by the -command print_multi -graph (pm -graph). -

-

-The vg command uses the free software packages Graphviz (command dot) -and Ghostscript (command gv). -

- -

Reuse of top-level grammars as resources

-

-Top-level grammars have a straightforward translation to -resource modules. The translation concerns -pairs of abstract-concrete judgements: -

-
-    cat C ;               ===>  oper C : Type = T ;
-    lincat C = T ;
-  
-    fun f : A ;           ===>  oper f : A = t ;
-    lin f = t ;
-
-

-Due to this translation, a concrete module -can be opened in the same way as a -resource module; the translation is done -on the fly (it is computationally very cheap). -

-

-Modular grammar engineering often means that some grammarians -focus on the semantics of the domain whereas others take care -of linguistic details. Thus a typical reuse opens a -linguistically oriented resource grammar, -

-
-    abstract Resource = {
-      cat S ; NP ; A ;
-      fun PredA : NP -> A -> S ;
-      }
-    concrete ResourceEng of Resource = {
-      lincat S = ... ; 
-      lin PredA = ... ;
-      }
-
-

-The application grammar, instead of giving linearizations -explicitly, just reduces them to categories and functions in the -resource grammar: -

-
-    concrete ArithmeticEng of Arithmetic = LogicEng ** open ResourceEng in {
-      lincat Nat = NP ;
-      lin Even x = PredA x (regA "even") ;
-      }
-
-

-If the resource grammar is only capable of generating grammatically -correct expressions, then the grammaticality of the application -grammar is also guaranteed: the type checker of GF is used as -grammar checker. -To guarantee distinctions between categories that have -the same linearization type, the actual translation used -in GF adds to every linearization type and linearization -a lock field, -

-
-    cat C ;                    ===>  oper C : Type = T ** {lock_C : {}} ;
-    lincat C = T ;
-  
-    fun f : C_1 ... C_n -> C ; ===>  oper f : C_1 ... C_n -> C = \x_1,...,x_n -> 
-    lin f = t ;                        t x_1 ... x_n ** {lock_C = <>};
-
-

-(Notice that the latter translation is type-correct because of -record subtyping, which means that t can ignore the -lock fields of its arguments.) An application grammarian who -only uses resource grammar categories and functions never -needs to write these lock fields herself. Having to do so -serves as a warning that the grammaticality guarantee given -by the resource grammar no longer holds. -
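-Applied by hand to part of LogicEng, the translation scheme with lock fields would give roughly the following operations. This is a conceptual sketch of what the compiler does on the fly, not code that a grammarian writes.

```gf
-- cat Prop ; lincat Prop = {s : Str} ;
oper Prop : Type = {s : Str} ** {lock_Prop : {}} ;

-- fun Falsum : Prop ; lin Falsum = {s = ["we have a contradiction"]} ;
oper Falsum : Prop =
  {s = ["we have a contradiction"]} ** {lock_Prop = <>} ;

-- fun Conj : Prop -> Prop -> Prop ; lin Conj a b = {s = a.s ++ "and" ++ b.s} ;
oper Conj : Prop -> Prop -> Prop = \a,b ->
  {s = a.s ++ "and" ++ b.s} ** {lock_Prop = <>} ;
```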

-

-Note. The lock field mechanism is experimental, and may be changed -to a stronger abstraction mechanism in the future. This may result in -hand-written lock fields ceasing to work. -

- -

Additional module types

- -

Interfaces, instances, and incomplete grammars

-

-One difference between top-level grammars and resource -modules is that the former systematically separate the -declarations of categories and functions from their definitions. -In the reuse translation creating an oper judgement, -the declaration coming from the abstract module is put -together with the definition coming from the concrete -module. -

-

-However, the separation of declarations and definitions is so -useful a notion that GF also has specific module types that split -resource modules into two parts. In this splitting, -an interface module corresponds to an abstract syntax, -in giving the declarations of operations (and parameter types). -For instance, a generic markup interface would look as follows: -

-
-    interface Markup = open Util in {
-      oper Boldface : Str -> Str ;
-      oper Heading  : Str -> Str ;
-      oper markupSS : (Str -> Str) -> SS -> SS = \f,r ->
-        ss (f r.s) ;
-      } 
-
-

-The definitions of the constants declared in an interface -are given in an instance module (which is always of -an interface, in the same way as a concrete is always -of an abstract). The following instances -define markup in HTML and LaTeX. -

-
-    instance MarkupHTML of Markup = open Util in {
-      oper Boldface s = "<b>" ++ s ++ "</b>" ; 
-      oper Heading  s = "<h2>" ++ s ++ "</h2>" ; 
-      } 
-  
-    instance MarkupLatex of Markup = open Util in {
-      oper Boldface s = "\\textbf{" ++ s ++ "}" ; 
-      oper Heading  s = "\\section{" ++ s ++ "}" ; 
-      } 
-
-

-Notice that both interfaces and instances may -open resources (and also reused top-level grammars). -An interface may moreover define some of the operations it -declares; these definitions are inherited by all instances and cannot -be changed in them. Inheritance by module extension -is possible, as always, between modules of the same type. -

- -

Using an interface

-

-An interface or an instance -can be opened in -a concrete using the same syntax as when opening -a resource. For an instance, the semantics -is the same as when opening the definitions together with -the type signatures - one can think of an interface -and an instance of it together forming an ordinary -resource. Opening an interface, however, -is different: functions that are only declared without -having a definition cannot be compiled (inlined); neither -can functions whose definitions depend on undefined functions. -

-

-A module that opens an interface is therefore -incomplete, and has to be completed with an -instance of the interface to become complete. To make -this situation clear, GF requires any module that opens an -interface to be marked as incomplete. Thus -the module -

-
-    incomplete concrete DocMarkup of Doc = open Markup in {
-      ...
-      }
-
-

-uses the interface Markup to place markup in -chosen places in its linearization rules, but the -implementation of markup - whether in HTML or in LaTeX - is -left unspecified. This is a powerful way of sharing -the code of a whole module with just differences in -the definitions of some constants. -
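-To make the idea concrete without guessing at the elided body of DocMarkup, here is a separate, purely hypothetical example of the same shape. The names Memo, MemoMarkup, Text, Hello, Title, and Strong are assumptions; Util and Markup are the modules defined above.

```gf
abstract Memo = {
  cat Text ;
  fun Hello : Text ;
  fun Title : Text -> Text ;
  fun Strong : Text -> Text ;
}

incomplete concrete MemoMarkup of Memo = open Util, Markup in {
  lincat Text = SS ;
  lin Hello = ss "hello" ;
  -- Heading and Boldface are only declared in Markup,
  -- so this module cannot be compiled until an instance is supplied
  lin Title = markupSS Heading ;
  lin Strong = markupSS Boldface ;
}
```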

-

-Another terminology for incomplete modules is -parametrized modules or functors. -The interface gives the list of parameters -that the functor depends on. -

- -

Instantiating an interface

-

-To complete an incomplete module, each interface -that it opens has to be provided with an instance. The following -syntax is used for this: -

-
-    concrete DocHTML of Doc = DocMarkup with (Markup = MarkupHTML) ;
-
-

-Instantiation of Markup with MarkupLatex is -another one-liner. -
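-Spelled out by analogy with DocHTML, it could read as follows; the module name DocLatex is an assumption.

```gf
concrete DocLatex of Doc = DocMarkup with (Markup = MarkupLatex) ;
```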

-

-If more interfaces than one are instantiated, a comma-separated -list of equations in parentheses is used, e.g. -

-
-    concrete MusicIta = MusicI with
-      (Syntax = SyntaxIta), (LexMusic = LexMusicIta) ;
-
-

-This example shows a common design pattern for building applications: -the concrete syntax is a functor on the generic resource grammar library -interface Syntax and a domain-specific lexicon interface, here -LexMusic. -

-

-All interfaces that are opened in the completed module -must be instantiated. -

-

-Notice that the completion of an incomplete module -may at the same time extend modules of the same type (which need -not be completions). It can also add new judgements in a module body, -and restrict inheritance from the functor. -

-
-    concrete MusicIta = MusicI - [f] with
-      (Syntax = SyntaxIta), (LexMusic = LexMusicIta) ** {
-  
-    lin f = ...
-  
-    } ;
-
-

- -

Compiling interfaces, instances, and parametrized modules

-

-Interfaces, instances, and parametric modules are purely a -front-end feature of GF: these module types do not exist in -the gfc and gfr formats. The compiler has -nevertheless to keep track of their dependencies and modification -times. Here is a summary of how they are compiled: -

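- - -An interface is compiled into a resource with an empty body. - -An instance is compiled into a resource in union with its interface. - -An incomplete module (concrete or resource) is compiled into a module of similar type with an empty body. - -A completion module (concrete or resource) is compiled into a module of similar type by compiling its functor so that, instead of each interface, its given instance is used. - - -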

-This means that some generated code is duplicated, because those operations that -do have complete definitions in an interface are copied to each of -the instances. -

- -

Summary of module syntax and semantics

- -

Abstract syntax modules

-

-Syntax: -

-

-abstract A = (A1,...,An **)? -{J1 ; ... ; Jm ; } -

-

-where -

- - -

-Semantic conditions: -

- - - -

Concrete syntax modules

-

-Syntax: -

-

-incomplete? concrete C of A = -(C1,...,Cn **)? -(open O1,...,Ok in)? -{J1 ; ... ; Jm ; } -

-

-where -

- - -

- where R is a resource, instance, or concrete, and Q is any identifier -

- - -

-If the modifier incomplete appears, then any R in -an open specification may also be an interface or an abstract. -

-

-Semantic conditions: -

- - - -

Resource modules

-

-Syntax: -

-

-resource R = -(R1,...,Rn **)? -(open O1,...,Ok in)? -{J1 ; ... ; Jm ; } -

-

-where -

- - -

- where P is a resource, instance, or concrete, and Q is any identifier -

- - -

-Semantic conditions: -

- - - -

Interface modules

-

-Syntax: -

-

-interface R = -(R1,...,Rn **)? -(open O1,...,Ok in)? -{J1 ; ... ; Jm ; } -

-

-where -

- - -

- where P is a resource, instance, or concrete, and Q is any identifier -

- - -

-Semantic conditions: -

- - - -

Instance modules

-

-Syntax: -

-

-instance R of I = -(R1,...,Rn **)? -(open O1,...,Ok in)? -{J1 ; ... ; Jm ; } -

-

-where -

- - -

- where P is a resource, instance, or concrete, and Q is any identifier -

- - -

-Semantic conditions: -

- - - -

Instantiated concrete syntax modules

-

-Syntax: -

-

-concrete C of A = -(C1,...,Cn **)? -B -with -(I1 =J1), ... -, (Ip =Jp) -(-? [c1,...,cq ])? -(**? -(open O1,...,Ok in)? -{J1 ; ... ; Jm ; })? ; -

-

-where -

- - -

- where R is a resource, instance, or concrete, and Q is any identifier -

- - - - - - diff --git a/doc/gf-modules.txt b/doc/gf-modules.txt deleted file mode 100644 index 1a4067b40..000000000 --- a/doc/gf-modules.txt +++ /dev/null @@ -1,994 +0,0 @@ -The Module System of GF -Aarne Ranta -8/4/2005 - 5/7/2007 - -%!postproc(html): #SUB1 1 -%!postproc(html): #SUBk k -%!postproc(html): #SUBi i -%!postproc(html): #SUBm m -%!postproc(html): #SUBn n -%!postproc(html): #SUBp p -%!postproc(html): #SUBq q - - -% to compile: txt2tags --toc -thtml modulesystem.txt - - -A GF grammar consists of a set of **modules**, which can be -combined in different ways to build different grammars. -There are several different **types of modules**: -- ``abstract`` -- ``concrete`` -- ``resource`` -- ``interface`` -- ``instance`` -- ``incomplete concrete`` - - -We will go through the module types in this order, which is also -their order of "importance" from the most basic to -the more advanced ones. - -This document presupposes knowledge of GF judgements and expressions, which can -be gained from the [GF tutorial tutorial/gf-tutorial2.html]. It aims -to give a systamatic description of the module system; -some tutorial information is repeated to make the document -self-contained. - - - - -=The principal module types= - -==Abstract syntax== - -Any GF grammar that is used in an application -will probably contain at least one module -of the ``abstract`` module type. Here is an example of -such a module, defining a fragment of propositional logic. -``` - abstract Logic = { - cat Prop ; - fun Conj : Prop -> Prop -> Prop ; - fun Disj : Prop -> Prop -> Prop ; - fun Impl : Prop -> Prop -> Prop ; - fun Falsum : Prop ; - } -``` -The **name** of this module is ``Logic``. - - - -An ``abstract`` module defines an **abstract syntax**, which -is a language-independent representation of a fragment of language. 
-It consists of two kinds of **judgements**: -- ``cat`` judgements telling what **categories** there are - (types of abstract syntax trees) -- ``fun`` judgements telling what **functions** there are - (to build abstract syntax trees) - - -There can also be ``def`` and ``data`` judgements in an -abstract syntax. - - -===Compilation of abstract syntax=== - -The GF grammar compiler expects to find the module ``Logic`` in a file named -``Logic.gf``. When the compiler is run, it produces -another file, named ``Logic.gfc``. This file is in the -format called **canonical GF**, which is the "machine language" -of GF. Next time that the module ``Logic`` is needed in -compiling a grammar, it can be read from the compiled (``gfc``) -file instead of the source (``gf``) file, unless the source -has been changed after the compilation. - - -==Concrete syntax== - -In order for a GF grammar to describe a concrete language, the abstract -syntax must be completed with a **concrete syntax** of it. -For this purpose, we use modules of type ``concrete``: for instance, -``` - concrete LogicEng of Logic = { - lincat Prop = {s : Str} ; - lin Conj a b = {s = a.s ++ "and" ++ b.s} ; - lin Disj a b = {s = a.s ++ "or" ++ b.s} ; - lin Impl a b = {s = "if" ++ a.s ++ "then" ++ b.s} ; - lin Falsum = {s = ["we have a contradiction"]} ; - } -``` -The module ``LogicEng`` is a concrete syntax ``of`` the -abstract syntax ``Logic``. The GF grammar compiler checks that -the concrete is valid with respect to the abstract syntax ``of`` -which it is claimed to be. 
The validity requires that there has to be -- a ``lincat`` judgement for each ``cat`` judgement, telling what the - **linearization types** of categories are -- a ``lin`` judgement for each ``fun`` judgement, telling what the - **linearization functions** corresponding to functions are - - -Validity also requires that the linearization functions defined by -``lin`` judgements are type-correct with respect to the -linearization types of the arguments and value of the function. - - - -There can also be ``lindef`` and ``printname`` judgements in a -concrete syntax. - - -==Top-level grammar== - -When a ``concrete`` module is successfully compiled, a ``gfc`` -file is produced in the same way as for ``abstract`` modules. The -pair of an ``abstract`` and a corresponding ``concrete`` module -is a **top-level grammar**, which can be used in the GF system to -perform various tasks. The most fundamental tasks are -- **linearization**: take an abstract syntax tree and find the corresponding string -- **parsing**: take a string and find the corresponding abstract syntax - trees (which can be zero, one, or many) - - -In the current grammar, infinitely many trees and strings are recognized, although -no very interesting ones. For example, the tree -``` - Impl (Disj Falsum Falsum) Falsum -``` -has the linearization -``` - if we have a contradiction or we have a contradiction then we have a contradiction -``` -which in turn can be parsed uniquely as that tree. - - -===Compiling top-level grammars=== - -When GF compiles the module ``LogicEng`` it also has to compile -all modules that it **depends** on (in this case, just ``Logic``). -The compilation process starts with dependency analysis to find -all these modules, recursively, starting from the explicitly imported one. -The compiler then reads either ``gf`` or ``gfc`` files, in -a dependency order. 
 The decision on which files to read depends on -time stamps and dependencies in a natural way, so that all and only -those modules that have to be compiled are compiled. (This behaviour can -be changed with flags, see below.) - - -===Using top-level grammars=== - -To use a top-level grammar in the GF system, one uses the ``import`` -command (short name ``i``). For instance, -``` - i LogicEng.gf -``` -It is also possible to specify the imported grammar(s) on the command -line when invoking GF: -``` - gf LogicEng.gf -``` -Various **compilation flags** can be added to both ways of compiling a module: -- ``-src`` forces compilation from source files -- ``-v`` gives more verbose information on compilation -- ``-s`` makes compilation silent (except if it fails with an error message) - - -A complete list of flags can be obtained in GF by ``help i``. - -Importing a grammar makes it visible in GF's **internal state**. To see -what modules are available, use the command ``print_options`` (``po``). -You can empty the state with the command ``empty`` (``e``); this is -needed if you want to read in grammars with a different abstract syntax -than the current one without exiting GF. - - - -Grammar modules can reside in different directories. They can then be found -by means of a **search path**, which is a flag such as -``` - -path=.:api/toplevel:prelude -``` -given to the ``import`` command or the shell command invoking GF. -(It can also be defined in the grammar file; see below.) The compiler -writes every ``gfc`` file in the same directory as the corresponding -``gf`` file. - -The ``path`` is relative to the working directory ``pwd``, so that -all directories listed are primarily interpreted as subdirectories of -``pwd``. Secondarily, they are searched relative to the value of the -environment variable ``GF_LIB_PATH``, which is by default set to -``/usr/local/share/GF``. 
- -Parsing and linearization can be performed with the ``parse`` -(``p``) and ``linearize`` (``l``) commands, respectively. -For instance, -``` - > l Impl (Disj Falsum Falsum) Falsum - if we have a contradiction or we have a contradiction then we have a contradiction - - > p -cat=Prop "we have a contradiction" - Falsum -``` -Notice that the ``parse`` command needs the parsing category -as a flag. This is necessary since a grammar can have several -possible parsing categories ("entry points"). - - - -==Multilingual grammar== - -One ``abstract`` syntax can have several ``concrete`` syntaxes. -Here are two new ones for ``Logic``: -``` - concrete LogicFre of Logic = { - lincat Prop = {s : Str} ; - lin Conj a b = {s = a.s ++ "et" ++ b.s} ; - lin Disj a b = {s = a.s ++ "ou" ++ b.s} ; - lin Impl a b = {s = "si" ++ a.s ++ "alors" ++ b.s} ; - lin Falsum = {s = ["nous avons une contradiction"]} ; - } - - concrete LogicSymb of Logic = { - lincat Prop = {s : Str} ; - lin Conj a b = {s = "(" ++ a.s ++ "&" ++ b.s ++ ")"} ; - lin Disj a b = {s = "(" ++ a.s ++ "v" ++ b.s ++ ")"} ; - lin Impl a b = {s = "(" ++ a.s ++ "->" ++ b.s ++ ")"} ; - lin Falsum = {s = "_|_"} ; - } -``` -The four modules ``Logic``, ``LogicEng``, ``LogicFre``, and -``LogicSymb`` together form a **multilingual grammar**, in which -it is possible to perform parsing and linearization with respect to any -of the concrete syntaxes. As a combination of parsing and linearization, -one can also perform **translation** from one language to another. -(By **language** we mean the set of expressions generated by one -concrete syntax.) - - -===Using multilingual grammars=== - -Any combination of abstract syntax and corresponding concrete syntaxes -is thus a multilingual grammar. With many languages and other enrichments -(as described below), a multilingual grammar easily grows to the size of -tens of modules. 
The grammar developer, having finished her job, can -package the result in a **multilingual canonical grammar**, a file -with the suffix ``.gfcm``. For instance, to compile the set of grammars -described so far, the following sequence of GF commands can be used: -``` - i LogicEng.gf - i LogicFre.gf - i LogicSymb.gf - pm | wf logic.gfcm -``` -The "end user" of the grammar only needs the file ``logic.gfcm`` to -access all the functionality of the multilingual grammar. It can be -imported in the GF system in the same way as ``.gf`` files. But -it can also be used in the -[Embedded Java Interpreter for GF http://www.cs.chalmers.se/~bringert/gf/gf-java.html] -to build Java programs of which the multilingual grammar functionalities -(linearization, parsing, translation) form a part. - -In a multilingual grammar, the concrete syntax module names work as -names of languages that can be selected for linearization and parsing: -``` - > l -lang=LogicFre Impl Falsum Falsum - si nous avons une contradiction alors nous avons une contradiction - - > l -lang=LogicSymb Impl Falsum Falsum - ( _|_ -> _|_ ) - - > p -cat=Prop -lang=LogicSymb "( _|_ & _|_ )" - Conj Falsum Falsum -``` -The option ``-multi`` gives linearization to all languages: -``` - > l -multi Impl Falsum Falsum - if we have a contradiction then we have a contradiction - si nous avons une contradiction alors nous avons une contradiction - ( _|_ -> _|_ ) -``` -Translation can be obtained by using a **pipe** from a parser -to a linearizer: -``` - > p -cat=Prop -lang=LogicSymb "( _|_ -> _|_ )" | l -lang=LogicEng - if we have a contradiction then we have a contradiction -``` - - - -==Resource modules== - -The ``concrete`` modules shown above would look much nicer if -we used the main idea of functional programming: avoid repetitive -code by using **functions** that capture repeated patterns of -expressions. 
A collection of such functions can be a valuable -**resource** for a programmer, reusable in many different -top-level grammars. Thus we introduce the ``resource`` -module type, with the first example -``` - resource Util = { - oper SS : Type = {s : Str} ; - oper ss : Str -> SS = \s -> {s = s} ; - oper paren : Str -> Str = \s -> "(" ++ s ++ ")" ; - oper infix : Str -> SS -> SS -> SS = \h,x,y -> - ss (x.s ++ h ++ y.s) ; - oper infixp : Str -> SS -> SS -> SS = \h,x,y -> - ss (paren (infix h x y)) ; - } -``` -Modules of ``resource`` type have two forms of judgement: - -- ``oper`` defining auxiliary operations -- ``param`` defining parameter types - - -A ``resource`` can be used in a ``concrete`` (or another -``resource``) by ``open``ing it. This means that -all operations (and parameter types) defined in the resource -module become usable in the module that opens it. For instance, -we can rewrite the module ``LogicSymb`` much more concisely: -``` - concrete LogicSymb of Logic = open Util in { - lincat Prop = SS ; - lin Conj = infixp "&" ; - lin Disj = infixp "v" ; - lin Impl = infixp "->" ; - lin Falsum = ss "_|_" ; - } -``` -What happens when this variant of ``LogicSymb`` is -compiled is that the ``oper``-defined constants -of ``Util`` are **inlined** in the -right-hand-sides of the judgements of ``LogicSymb``, -and these expressions are **partially evaluated**, i.e. -computed as far as possible. The generated ``gfc`` file -will look just like the file generated for the first version -of ``LogicSymb`` - at least, it will do the same job. - - -Several ``resource`` modules can be ``open``ed -at the same time. If the modules contain the same names, the -conflict can be resolved by **qualified** opening and -reference. For instance, -``` - concrete LogicSymb of Logic = open Util, Prelude in { ... - } ; -``` -(where ``Prelude`` is a standard library of GF) brings -into scope two definitions of the constant ``SS``. 
-To specify which one is used, you can write -``Util.SS`` or ``Prelude.SS`` instead of just ``SS``. -You can also introduce abbreviations to avoid long qualifiers, e.g. -``` - concrete LogicSymb of Logic = open (U=Util), (P=Prelude) in { ... - } ; -``` -which means that you can write ``U.SS`` and ``P.SS``. - -Judgements of ``param`` and ``oper`` forms may also be used -in ``concrete`` modules, and they are then considered local -to those modules, i.e. they are not exported. - - - -===Compiling resource modules=== - -The compilation of a ``resource`` module differs -from the compilation of ``abstract`` and -``concrete`` modules because ``oper`` operations -do not in general have values in ``gfc``. A ``gfc`` -file //is// generated, but it contains only -``param`` judgements (also recall that ``oper``s -are inlined in their top-level use sites, so it is not -necessary to save them in the compiled grammar). -However, since computing the operations over and over -again can be time-consuming, and since type checking -``resource`` modules also takes time, a third kind -of file is generated for resource modules: a ``.gfr`` -file. This file is written in the GF source code notation, -but it is type checked and type annotated, and ``oper``s -are computed as far as possible. - - - -If you look at any ``gfc`` or ``gfr`` file generated -by the GF compiler, you see that all names have been replaced by -their qualified variants. This is an important first step (after parsing) -that the compiler performs. As for the commands in the GF shell, some output -qualified names and some do not. The difference does not always result -from firm principles. - - -===Using resource modules=== - -The typical use is through ``open`` in a -``concrete`` module, which means that -``resource`` modules are not imported on their own. -However, in the developing and testing phase of grammars, it -can be useful to evaluate ``oper``s with different -arguments. 
To prevent them from being thrown away after inlining, the -``-retain`` option can be used: -``` - > i -retain Util.gf -``` -The command ``compute_concrete`` (``cc``) -can now be used for evaluating expressions that may contain -operations defined in ``Util``: -``` - > cc ss (paren "foo") - {s = "(" ++ "foo" ++ ")"} -``` -To find out what ``oper``s are available for a given type, -the command ``show_operations`` (``so``) can be used: -``` - > so SS - Util.ss : Str -> SS ; - Util.infix : Str -> SS -> SS -> SS ; - Util.infixp : Str -> SS -> SS -> SS ; -``` - - - - -==Inheritance== - -The most characteristic modularity of GF lies in the division of -grammars into ``abstract``, ``concrete``, and -``resource`` modules. This permits writing multilingual -grammar and sharing the maximum of code between different -languages. - - -In addition to this special kind of modularity, GF provides **inheritance**, -which is familiar from other programming languages (in particular, -object-oriented ones). Inheritance means that a module inherits all -judgements from another module; we also say that it **extends** -the other module. Inheritance is useful to divide big grammars into -smaller units, and also to reuse the same units in different bigger -grammars. - - - -The first example of inheritance is for abstract syntax. 
Let us -extend the module ``Logic`` to ``Arithmetic``: -``` - abstract Arithmetic = Logic ** { - cat Nat ; - fun Even : Nat -> Prop ; - fun Odd : Nat -> Prop ; - fun Zero : Nat ; - fun Succ : Nat -> Nat ; - } -``` -In parallel with the extension of the abstract syntax -``Logic`` to ``Arithmetic``, we can extend -the concrete syntax ``LogicEng`` to ``ArithmeticEng``: -``` - concrete ArithmeticEng of Arithmetic = LogicEng ** open Util in { - lincat Nat = SS ; - lin Even x = ss (x.s ++ "is" ++ "even") ; - lin Odd x = ss (x.s ++ "is" ++ "odd") ; - lin Zero = ss "zero" ; - lin Succ x = ss ("the" ++ "successor" ++ "of" ++ x.s) ; - } -``` -Another extension of ``Logic`` is ``Geometry``, -``` - abstract Geometry = Logic ** { - cat Point ; - cat Line ; - fun Incident : Point -> Line -> Prop ; - } -``` -The corresponding concrete syntax is left as an exercise. - - -===Multiple inheritance=== - - -Inheritance can be **multiple**, which means that a module -may extend many modules at the same time. Suppose, for instance, -that we want to build a module for mathematics covering both -arithmetic and geometry, and the underlying logic. We then write -``` - abstract Mathematics = Arithmetic, Geometry ** { - } ; -``` -We could of course add some new judgements in this module, but -it is not necessary to do so. If no new judgements are added, the -module body can be omitted: -``` - abstract Mathematics = Arithmetic, Geometry ; -``` - -The module ``Mathematics`` shows that it is possible -to extend a module already built by extension. The correctness -criterion for extensions is that the same name -(``cat``, ``fun``, ``oper``, or ``param``) -may not be defined twice in the resulting union of names. -That the names defined in ``Logic`` are "inherited twice" -by ``Mathematics`` (via both ``Arithmetic`` and -``Geometry``) is no violation of this rule; the usual -problems of multiple inheritance do not arise, since -the definitions of inherited constants cannot be changed. 
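
The concrete side of ``Mathematics`` can be built by the same kind of multiple
extension. The following is only a sketch: the module names ``GeometryEng`` and
``MathematicsEng`` and the wording "lies on" are assumptions made for the sake
of illustration (``GeometryEng`` is one possible solution to the exercise
above), not modules that come with GF.
```
  concrete GeometryEng of Geometry = LogicEng ** open Util in {
    lincat Point = SS ;
    lincat Line = SS ;
    -- hypothetical wording for incidence
    lin Incident p l = ss (p.s ++ "lies" ++ "on" ++ l.s) ;
  }

  concrete MathematicsEng of Mathematics = ArithmeticEng, GeometryEng ** {
  } ;
```
Here ``LogicEng`` is inherited twice by ``MathematicsEng``, via
``ArithmeticEng`` and via ``GeometryEng``, which is permitted for the same
reason as on the abstract side. ``Util`` is only ``open``ed, not inherited,
so its names take no part in the union of inherited names.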
- - - -===Restricted inheritance=== - -Inheritance can be **restricted**, which means that only some of -the constants are inherited. There are two dual notations for this: -``` - A [f,g] -``` -meaning that //only// ``f`` and ``g`` are inherited from ``A``, and -``` - A-[f,g] -``` -meaning that //everything except// ``f`` and ``g`` is inherited from ``A``. - -Constants that are not inherited may be redefined in the inheriting module. - - - - -===Compiling inheritance=== - -Inherited judgements are not copied into the inheriting modules. -Instead, an **indirection** is created for each inherited name, -as can be seen by looking into the generated ``gfc`` (and -``gfr``) files. Thus for instance the names -``` - Mathematics.Prop Arithmetic.Prop Geometry.Prop Logic.Prop -``` -all refer to the same category, declared in the module -``Logic``. - - - -===Inspecting grammar hierarchies=== - -The command ``visualize_graph`` (``vg``) shows the -dependency graph in the current GF shell state. The graph can -also be saved in a file and used e.g. in documentation, by the -command ``print_multi -graph`` (``pm -graph``). - -The ``vg`` command uses the free software packages Graphviz (command ``dot``) -and Ghostscript (command ``gv``). - - - -==Reuse of top-level grammars as resources== - -Top-level grammars have a straightforward translation to -``resource`` modules. The translation concerns -pairs of abstract-concrete judgements: -``` - cat C ; ===> oper C : Type = T ; - lincat C = T ; - - fun f : A ; ===> oper f : A = t ; - lin f = t ; -``` -Due to this translation, a ``concrete`` module -can be ``open``ed in the same way as a -``resource`` module; the translation is done -on the fly (it is computationally very cheap). - -Modular grammar engineering often means that some grammarians -focus on the semantics of the domain whereas others take care -of linguistic details. 
Thus a typical reuse opens a -linguistically oriented **resource grammar**, -``` - abstract Resource = { - cat S ; NP ; A ; - fun PredA : NP -> A -> S ; - } - concrete ResourceEng of Resource = { - lincat S = ... ; - lin PredA = ... ; - } -``` -The **application grammar**, instead of giving linearizations -explicitly, just reduces them to categories and functions in the -resource grammar: -``` - concrete ArithmeticEng of Arithmetic = LogicEng ** open ResourceEng in { - lincat Nat = NP ; - lin Even x = PredA x (regA "even") ; - } -``` -If the resource grammar is only capable of generating grammatically -correct expressions, then the grammaticality of the application -grammar is also guaranteed: the type checker of GF is used as -grammar checker. -To guarantee distinctions between categories that have -the same linearization type, the actual translation used -in GF adds to every linearization type and linearization -a **lock field**, -``` - cat C ; ===> oper C : Type = T ** {lock_C : {}} ; - lincat C = T ; - - fun f : C_1 ... C_n -> C ; ===> oper f : C_1 ... C_n -> C = \x_1,...,x_n -> - lin f = t ; t x_1 ... x_n ** {lock_C = <>}; -``` -(Notice that the latter translation is type-correct because of -record subtyping, which means that ``t`` can ignore the -lock fields of its arguments.) An application grammarian who -only uses resource grammar categories and functions never -needs to write these lock fields herself. Having to do so -serves as a warning that the grammaticality guarantee given -by the resource grammar no longer holds. - -**Note**. The lock field mechanism is experimental, and may be changed -to a stronger abstraction mechanism in the future. This may result in -hand-written lock fields ceasing to work. 
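
To make the warning concrete, here is a sketch of what a hand-written lock
field could look like. It rests on an assumption that the text above does not
state: that ``ResourceEng`` defines ``lincat A = {s : Str}``. Under that
assumption, an application grammarian who builds an adjective directly as a
record, instead of going through the resource lexicon, has to supply the lock
field explicitly for the rule to type check:
```
  -- assumption: ResourceEng has  lincat A = {s : Str}
  -- the record below bypasses the resource grammar, hence the lock field
  lin Even x = PredA x {s = "even" ; lock_A = <>} ;
```
The visible ``lock_A`` marks the place where the record was not produced by a
resource function, so its well-formedness as an adjective is no longer checked
by the resource grammar.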
- - -=Additional module types= - -==Interfaces, instances, and incomplete grammars== - -One difference between top-level grammars and ``resource`` -modules is that the former systematically separate the -declarations of categories and functions from their definitions. -In the reuse translation creating an ``oper`` judgement, -the declaration coming from the ``abstract`` module is put -together with the definition coming from the ``concrete`` -module. - - - -However, the separation of declarations and definitions is so -useful a notion that GF also has specific module types that -split ``resource`` modules into two parts. In this splitting, -an ``interface`` module corresponds to an abstract syntax, -in giving the declarations of operations (and parameter types). -For instance, a generic markup interface would look as follows: -``` - interface Markup = open Util in { - oper Boldface : Str -> Str ; - oper Heading : Str -> Str ; - oper markupSS : (Str -> Str) -> SS -> SS = \f,r -> - ss (f r.s) ; - } -``` -The definitions of the constants declared in an ``interface`` -are given in an ``instance`` module (which is always ``of`` -an interface, in the same way as a ``concrete`` is always -``of`` an abstract). The following ``instance``s -define markup in HTML and LaTeX. -``` - instance MarkupHTML of Markup = open Util in { - oper Boldface s = "<b>" ++ s ++ "</b>" ; - oper Heading s = "<h2>" ++ s ++ "</h2>" ; - } - - instance MarkupLatex of Markup = open Util in { - oper Boldface s = "\\textbf{" ++ s ++ "}" ; - oper Heading s = "\\section{" ++ s ++ "}" ; - } -``` -Notice that both ``interface``s and ``instance``s may -``open`` ``resource``s (and also reused top-level grammars). -An ``interface`` may moreover define some of the operations it -declares; these definitions are inherited by all instances and cannot -be changed in them. Inheritance by module extension -is possible, as always, between modules of the same type. 
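
Further instances follow the same pattern. As a sketch, a plain-text variant
could look as follows; the module name ``MarkupText`` and the asterisk
convention are assumptions chosen only for illustration.
```
  instance MarkupText of Markup = open Util in {
    oper Boldface s = "*" ++ s ++ "*" ;   -- hypothetical plain-text emphasis
    oper Heading s = s ;                  -- headings are left unmarked
  }
```
Since ``markupSS`` is already defined in the interface, it does not have to be
repeated here. By its definition, an expression such as
``markupSS Boldface (ss "foo")`` amounts to ``ss (Boldface "foo")``, whose
``s`` field in this instance would be ``"*" ++ "foo" ++ "*"``.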
- - -===Using an interface=== - -An ``interface`` or an ``instance`` -can be ``open``ed in -a ``concrete`` using the same syntax as when opening -a ``resource``. For an ``instance``, the semantics -is the same as when opening the definitions together with -the type signatures - one can think of an ``interface`` -and an ``instance`` of it together forming an ordinary -``resource``. Opening an ``interface``, however, -is different: functions that are only declared without -having a definition cannot be compiled (inlined); neither -can functions whose definitions depend on undefined functions. - - - -A module that ``open``s an ``interface`` is therefore -**incomplete**, and has to be **completed** with an -``instance`` of the interface to become complete. To make -this situation clear, GF requires any module that opens an -``interface`` to be marked as ``incomplete``. Thus -the module -``` - incomplete concrete DocMarkup of Doc = open Markup in { - ... - } -``` -uses the interface ``Markup`` to place markup in -chosen places in its linearization rules, but the -implementation of markup - whether in HTML or in LaTeX - is -left unspecified. This is a powerful way of sharing -the code of a whole module with just differences in -the definitions of some constants. - - - -Another terminology for ``incomplete`` modules is -**parametrized modules** or **functors**. -The ``interface`` gives the list of parameters -that the functor depends on. - - -===Instantiating an interface=== - -To complete an ``incomplete`` module, each ``interface`` -that it opens has to be provided with an ``instance``. The following -syntax is used for this: -``` - concrete DocHTML of Doc = DocMarkup with (Markup = MarkupHTML) ; -``` -Instantiation of ``Markup`` with ``MarkupLatex`` is -another one-liner. - -If more interfaces than one are instantiated, a comma-separated -list of equations in parentheses is used, e.g. 
-``` - concrete MusicIta = MusicI with - (Syntax = SyntaxIta), (LexMusic = LexMusicIta) ; -``` -This example shows a common design pattern for building applications: -the concrete syntax is a functor on the generic resource grammar library -interface ``Syntax`` and a domain-specific lexicon interface, here -``LexMusic``. - -All interfaces that are ``open``ed in the completed module -must be completed. - -Notice that the completion of an ``incomplete`` module -may at the same time extend modules of the same type (which need -not be completions). It can also add new judgements in a module body, -and restrict inheritance from the functor. -``` - concrete MusicIta = MusicI - [f] with - (Syntax = SyntaxIta), (LexMusic = LexMusicIta) ** { - - lin f = ... - - } ; -``` - - -===Compiling interfaces, instances, and parametrized modules=== - -Interfaces, instances, and parametric modules are purely a -front-end feature of GF: these module types do not exist in -the ``gfc`` and ``gfr`` formats. The compiler has -nevertheless to keep track of their dependencies and modification -times. Here is a summary of how they are compiled: -- an ``interface`` is compiled into a ``resource`` with an empty body -- an ``instance`` is compiled into a ``resource`` in union with its - ``interface`` -- an ``incomplete`` module (``concrete`` or ``resource``) is compiled - into a module of the same type with an empty body -- a completion module (``concrete`` or ``resource``) is compiled - into a module of the same type by compiling its functor so that, instead of - each ``interface``, its given ``instance`` is used - - -This means that some generated code is duplicated, because those operations that -do have complete definitions in an ``interface`` are copied to each of -the ``instances``. - - -=Summary of module syntax and semantics= - - -==Abstract syntax modules== - -Syntax: - -``abstract`` A ``=`` (A#SUB1,...,A#SUBn ``**``)? -``{``J#SUB1 ``;`` ... 
``;`` J#SUBm ``; }`` - - - -where -- i >= 0 -- each //A#SUBi// is itself an abstract module, - possibly with restrictions on inheritance, i.e. //A#SUBi//``-[``//f,..,g//``]`` - or //A#SUBi//``[``//f,..,g//``]`` -- each //J#SUBi// is a judgement of one of the forms - ``cat, fun, def, data`` - - -Semantic conditions: -- all inherited names declared in each //A#SUBi// and //A// must be distinct -- names in restriction lists must be defined in the restricted module -- inherited constants may not depend on names excluded by restriction - - - -==Concrete syntax modules== - -Syntax: - -``incomplete``? ``concrete`` C ``of`` A ``=`` -(C#SUB1,...,C#SUBn ``**``)? -(``open`` O#SUB1,...,O#SUBk ``in``)? -``{``J#SUB1 ``;`` ... ``;`` J#SUBm ``; }`` - - - -where -- i >= 0 -- //A// is an abstract module -- each //C#SUBi// is a concrete module, - possibly with restrictions on inheritance, i.e. //C#SUBi//``-[``//f,..,g//``]`` -- each //O#SUBi// is an open specification, of one of the forms - - //R// - - ``(``//Q//``=``//R//``)`` - - - where //R// is a resource, instance, or concrete, and //Q// is any identifier -- each //J#SUBi// is a judgement of one of the forms - ``lincat, lin, lindef, printname``; also the forms ``oper, param`` are - allowed, but they cannot be inherited. - - - -If the modifier ``incomplete`` appears, then any //R// in -an open specification may also be an interface or an abstract. - - -Semantic conditions: -- each ``cat`` judgement in //A// - must have a corresponding, unique - ``lincat`` judgement in //C// -- each ``fun`` judgement in //A// - must have a corresponding, unique - ``lin`` judgement in //C// -- names in restriction lists must be defined in the restricted module -- inherited constants may not depend on names excluded by restriction - - - -==Resource modules== - -Syntax: - -``resource`` R ``=`` -(R#SUB1,...,R#SUBn ``**``)? -(``open`` O#SUB1,...,O#SUBk ``in``)? -``{``J#SUB1 ``;`` ... 
``;`` J#SUBm ``; }`` - - -where -- i >= 0 -- each //R#SUBi// is a resource, instance, or concrete module, - possibly with restrictions on inheritance, i.e. //R#SUBi//``-[``//f,..,g//``]`` -- each //O#SUBi// is an open specification, of one of the forms - - //P// - - ``(``//Q//``=``//R//``)`` - - - where //P// is a resource, instance, or concrete, and //Q// is any identifier -- each //J#SUBi// is a judgement of one of the forms ``oper, param`` - - - - -Semantic conditions: -- all names defined in each //R#SUBi// and //R// must be distinct -- all constants declared must have a definition -- names in restriction lists must be defined in the restricted module -- inherited constants may not depend on names excluded by restriction - - - -==Interface modules== - -Syntax: - -``interface`` R ``=`` -(R#SUB1,...,R#SUBn ``**``)? -(``open`` O#SUB1,...,O#SUBk ``in``)? -``{``J#SUB1 ``;`` ... ``;`` J#SUBm ``; }`` - - -where -- i >= 0 -- each //R#SUBi// is an interface or abstract module, - possibly with restrictions on inheritance, i.e. //R#SUBi//``-[``//f,..,g//``]`` -- each //O#SUBi// is an open specification, of one of the forms - - //P// - - ``(``//Q//``=``//R//``)`` - - - where //P// is a resource, instance, or concrete, and //Q// is any identifier -- each //J#SUBi// is a judgement of one of the forms ``oper, param`` - - - - -Semantic conditions: -- all names declared in each //R#SUBi// and //R// must be distinct -- names in restriction lists must be defined in the restricted module -- inherited constants may not depend on names excluded by restriction - - - - -==Instance modules== - -Syntax: - -``instance`` R ``of`` I ``=`` -(R#SUB1,...,R#SUBn ``**``)? -(``open`` O#SUB1,...,O#SUBk ``in``)? -``{``J#SUB1 ``;`` ... ``;`` J#SUBm ``; }`` - - -where -- i >= 0 -- //I// is an interface module -- each //R#SUBi// is an instance, resource, or concrete module, - possibly with restrictions on inheritance, i.e. 
//R#SUBi//``-[``//f,..,g//``]`` - -- each //O#SUBi// is an open specification, of one of the forms - - //P// - - ``(``//Q//``=``//R//``)`` - - - where //P// is a resource, instance, or concrete, and //Q// is any identifier -- each //J#SUBi// is a judgement of one of the forms - ``oper, param`` - - - - -Semantic conditions: -- all names declared in each //R#SUBi//, //I//, and //R// must be distinct -- all constants declared in //I// must have a definition either in - //I// or //R// -- names in restriction lists must be defined in the restricted module -- inherited constants may not depend on names excluded by restriction - - - -==Instantiated concrete syntax modules== - -Syntax: - -``concrete`` C ``of`` A ``=`` -(C#SUB1,...,C#SUBn ``**``)? -B -``with`` -``(``I#SUB1 ``=``J#SUB1``),`` ... -``, (``I#SUBp ``=``J#SUBp``)`` -(``-``? ``[``c#SUB1,...,c#SUBq ``]``)? -(``**``? -(``open`` O#SUB1,...,O#SUBk ``in``)? -``{``J#SUB1 ``;`` ... ``;`` J#SUBm ``; }``)? ``;`` - -where -- i >= 0 -- //A// is an abstract module -- each //C#SUBi// is a concrete module, - possibly with restrictions on inheritance, i.e. //R#SUBi//``-[``//f,..,g//``]`` -- //B// is an incomplete concrete syntax of //A// -- each //I#SUBi// is an interface or an abstract -- each //J#SUBi// is an instance or a concrete of //I#SUBi// -- each //O#SUBi// is an open specification, of one of the forms - - //R// - - ``(``//Q//``=``//R//``)`` - - - where //R// is a resource, instance, or concrete, and //Q// is any identifier -- each //J#SUBi// is a judgement of one of the forms - ``lincat, lin, lindef, printname``; also the forms ``oper, param`` are - allowed, but they cannot be inherited. - - - - diff --git a/doc/overview-resource.txt b/doc/overview-resource.txt deleted file mode 100644 index 2f9b2cd04..000000000 --- a/doc/overview-resource.txt +++ /dev/null @@ -1,300 +0,0 @@ -==Texts. phrases, and utterances== - -The outermost linguistic structure is ``Text``. 
``Text``s are composed -from Phrases (``Phr``) followed by punctuation marks - either of ".", "?" or -"!" (with their proper variants in Spanish and Arabic). Here is an -example of a ``Text`` string. -``` - John walks. Why? He doesn't want to sleep! -``` -Phrases are mostly built from Utterances (``Utt``), which in turn are -declarative sentences, questions, or imperatives - but there -are also "one-word utterances" consisting of noun phrases -or other subsentential phrases. Some Phrases are atomic, -for instance "yes" and "no". Here are some examples of Phrases. -``` - yes - come on, John - but John walks - give me the stick please - don't you know that he is sleeping - a glass of wine - a glass of wine please -``` -There is no connection between the punctuation marks and the -types of utterances. This reflects the fact that the punctuation -mark in a real text is selected as a function of the speech act -rather than the grammatical form of an utterance. The following -text is thus well-formed. -``` - John walks. John walks? John walks! -``` -What is the difference between Phrase and Utterance? Just technical: -a Phrase is an Utterance with an optional leading conjunction ("but") -and an optional tailing vocative ("John", "please"). - - -==Sentences and clauses== - -TODO: use overloaded operations in the examples. - -The richest of the categories below Utterance is ``S``, Sentence. A Sentence -is formed from a Clause (``Cl``), by fixing its Tense, Anteriority, and Polarity. -For example, each of the following strings has a distinct syntax tree -in the category Sentence: -``` - John walks - John doesn't walk - John walked - John didn't walk - John has walked - John hasn't walked - John will walk - John won't walk - ... -``` -whereas in the category Clause all of them are just different forms of -the same tree. -The difference between Sentence and Clause is thus also rather technical. 
-It may not correspond exactly to any standard usage of the terms -"clause" and "sentence". - -Figure 1 shows a type-annotated syntax tree of the Text "John walks." -and gives an overview of the structural levels. - -#BFIG - -``` -Node Constructor Value type Other constructors ------------------------------------------------------------ - 1. TFullStop Text TQuestMark - 2. (PhrUtt Phr - 3. NoPConj PConj but_PConj - 4. (UttS Utt UttQS - 5. (UseCl S UseQCl - 6. TPres Tense TPast - 7. ASimul Anter AAnter - 8. PPos Pol PNeg - 9. (PredVP Cl -10. (UsePN NP UsePron, DetCN -11. john_PN) PN mary_PN -12. (UseV VP ComplV2, ComplV3 -13. walk_V)))) V sleep_V -14. NoVoc) Voc please_Voc -15. TEmpty Text -``` - -#BCENTER -Figure 1. Type-annotated syntax tree of the Text "John walks." -#ECENTER - -#EFIG - -Here are some examples of the results of changing constructors. -``` - 1. TFullStop -> TQuestMark John walks? - 3. NoPConj -> but_PConj But John walks. - 6. TPres -> TPast John walked. - 7. ASimul -> AAnter John has walked. - 8. PPos -> PNeg John doesn't walk. -11. john_PN -> mary_PN Mary walks. -13. walk_V -> sleep_V John sleeps. -14. NoVoc -> please_Voc John walks please. -``` -Of course, not all constructors can be changed so freely, because the -resulting tree would not remain well-typed. Here are some changes involving -many constructors: -``` - 4- 5. UttS (UseCl ...) -> - UttQS (UseQCl (... QuestCl ...)) Does John walk? -10-11. UsePN john_PN -> - UsePron we_Pron We walk. -12-13. UseV walk_V -> - ComplV2 love_V2 this_NP John loves this. -``` - - -==Parts of sentences== - -The linguistic phenomena mostly discussed in both traditional grammars and modern -syntax belong to the level of Clauses, that is, lines 9-13, and occasionally -to Sentences, lines 5-13. At this level, the major categories are -``NP`` (Noun Phrase) and ``VP`` (Verb Phrase). A Clause typically -consists of just an ``NP`` and a ``VP``. 
-The internal structure of both ``NP`` and ``VP`` can be very complex, -and these categories are mutually recursive: not only can a ``VP`` -contain an ``NP``, -``` - [VP loves [NP Mary]] -``` -but also an ``NP`` can contain a ``VP`` -``` - [NP every man [RS who [VP walks]]] -``` -(a labelled bracketing like this is of course just a rough approximation of -a GF syntax tree, but still a useful device of exposition). - -Most of the resource modules thus define functions that are used inside -NPs and VPs. Here is a brief overview: - -**Noun**. How to construct NPs. The main three mechanisms -for constructing NPs are -- from proper names: "John" -- from pronouns: "we" -- from common nouns by determiners: "this man" - - -The ``Noun`` module also defines the construction of common nouns. -The most frequent ways are -- lexical noun items: "man" -- adjectival modification: "old man" -- relative clause modification: "man who sleeps" -- application of relational nouns: "successor of the number" - - -**Verb**. -How to construct VPs. The main mechanism is verbs with their arguments, -for instance, -- one-place verbs: "walks" -- two-place verbs: "loves Mary" -- three-place verbs: "gives her a kiss" -- sentence-complement verbs: "says that it is cold" -- VP-complement verbs: "wants to give her a kiss" - - -A special verb is the copula, "be" in English but not even realized -by a verb in all languages. -A copula can take different kinds of complement: -- an adjectival phrase: "(John is) old" -- an adverb: "(John is) here" -- a noun phrase: "(John is) a man" - - -**Adjective**. -How to construct ``AP``s. The main ways are -- positive forms of adjectives: "old" -- comparative forms with object of comparison: "older than John" - - -**Adverb**. -How to construct ``Adv``s. The main ways are -- from adjectives: "slowly" -- as prepositional phrases: "in the car" - - -==Modules and their names== - -This section is not necessary for users of the library. - -TODO: explain the overloaded API. 
-
-The resource modules are named after the kind of
-phrases that are constructed in them,
-and they can be roughly classified by the "level" or "size" of the expressions that are
-formed in them:
-- Larger than sentence: ``Text``, ``Phrase``
-- Same level as sentence: ``Sentence``, ``Question``, ``Relative``
-- Parts of sentence: ``Adjective``, ``Adverb``, ``Noun``, ``Verb``
-- Cross-cut (coordination): ``Conjunction``
-
-
-Because of mutual recursion, such as in embedded sentences, this classification is
-not a complete order. However, no mutual dependence is needed between the
-modules themselves: they can all be compiled separately. This is due
-to the module ``Cat``, which defines the type system common to the other modules.
-For instance, the types ``NP`` and ``VP`` are defined in ``Cat``,
-and the module ``Verb`` only
-needs to know what is given in ``Cat``, not what is given in ``Noun``. To implement
-a rule such as
-```
-  Verb.ComplV2 : V2 -> NP -> VP
-```
-it is enough to know the linearization type of ``NP``
-(as well as those of ``V2`` and ``VP``, all
-given in ``Cat``). It is not necessary to know what
-ways there are to build ``NP``s (given in ``Noun``), since all these ways must
-conform to the linearization type defined in ``Cat``. Thus the format of
-category-specific modules is as follows:
-```
-  abstract Adjective = Cat ** {...}
-  abstract Noun      = Cat ** {...}
-  abstract Verb      = Cat ** {...}
-```
-
-
-==Top-level grammar and lexicon==
-
-The module ``Grammar`` collects all the category-specific modules into
-a complete grammar:
-```
-  abstract Grammar =
-    Adjective, Noun, Verb, ..., Structural, Idiom
-```
-The module ``Structural`` is a lexicon of structural words (function words),
-such as determiners.
-
-The module ``Idiom`` is a collection of idiomatic structures whose
-implementation is very language-dependent. An example is existential
-structures ("there is", "es gibt", "il y a", etc).
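-
-The layout described in the last two sections can be illustrated with a
-minimal sketch. It is not the library's actual source: the ``Mini`` module
-names, the reduced category list, the rule ``UsePN`` and the start category
-are assumptions made for this example. Only ``ComplV2`` is taken from the
-text above. Each module would be placed in a file of its own.
-```
-  -- MiniCat.gf: the type system shared by all other modules (assumed subset)
-  abstract MiniCat = {
-    cat NP ; VP ; V2 ; PN ;
-  }
-
-  -- MiniVerb.gf: extends MiniCat only, so it never looks into MiniNoun
-  abstract MiniVerb = MiniCat ** {
-    fun ComplV2 : V2 -> NP -> VP ;   -- "loves Mary"
-  }
-
-  -- MiniNoun.gf: one hypothetical way to build an NP
-  abstract MiniNoun = MiniCat ** {
-    fun UsePN : PN -> NP ;           -- "John"
-  }
-
-  -- MiniGrammar.gf: collects the category-specific modules
-  abstract MiniGrammar = MiniNoun, MiniVerb ** {
-    flags startcat = VP ;            -- arbitrary choice for the sketch
-  }
-```
-In this sketch ``MiniVerb`` and ``MiniNoun`` can be compiled in either order,
-since each of them refers only to the categories declared in ``MiniCat``.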
-
-The module ``Lang`` combines ``Grammar`` with a ``Lexicon`` of
-ca. 350 content words:
-```
-  abstract Lang = Grammar, Lexicon
-```
-Using ``Lang`` instead of ``Grammar`` as a library may provide,
-for free, some of the words needed in an application. Its main purpose, however, is to
-help test the resource library rather than to serve as a resource itself.
-It does not even seem realistic to develop
-a general-purpose multilingual resource lexicon.
-
-The diagram in Figure 2 shows the structure of the API.
-
-#BFIG
-
-#GRAMMAR
-
-#BCENTER
-Figure 2. The resource syntax API.
-#ECENTER
-
-#EFIG
-
-==Language-specific syntactic structures==
-
-The API collected in ``Grammar`` has been designed to be implementable for
-all languages in the resource package. It does contain some rules that
-are strange or superfluous in some languages; for instance, the distinction
-between definite and indefinite articles does not apply to Finnish and Russian.
-But such rules are still easy to implement: they only create some superfluous
-ambiguity in the languages in question.
-
-The library makes no claim, however, that all languages should have exactly the same
-abstract syntax. The common API is therefore extended by language-dependent
-rules. The top level of each language looks as follows (with English as example):
-```
-  abstract English = Grammar, ExtraEngAbs, DictEngAbs
-```
-where ``ExtraEngAbs`` is a collection of syntactic structures specific to English,
-and ``DictEngAbs`` is an English dictionary
-(at the moment, it consists of ``IrregEngAbs``,
-the irregular verbs of English). Each of these language-specific grammars has
-the potential to grow into a full-scale grammar of the language. These grammars
-can also be used as libraries, but the possibility of using functors is lost.
-
-To give a better overview of language-specific structures,
-modules like ``ExtraEngAbs``
-are built from a language-independent module ``ExtraAbs``
-by restricted inheritance:
-```
-  abstract ExtraEngAbs = Extra [f,g,...]
-```
-Thus any category and function in ``Extra`` may be shared by a subset of all
-languages. One can see this set-up as a matrix, which tells
-which ``Extra`` structures
-are implemented in which languages. For the common API in ``Grammar``, the matrix
-is filled with 1's (everything is implemented in every language).
-
-Language-specific extensions and the use of restricted
-inheritance are a recent addition to the resource grammar library, and
-have only been exploited on a very small scale so far.
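-
-The matrix view can be made concrete with a small sketch. Everything in it
-is hypothetical: the ``Mini`` modules, the functions ``GenNP`` and
-``ProDrop``, and the second language module are invented for illustration
-and are not claimed to match the contents of the library.
-```
-  -- MiniCat.gf: shared categories (assumed subset)
-  abstract MiniCat = {
-    cat NP ; Quant ; Pron ;
-  }
-
-  -- MiniExtra.gf: language-independent pool of optional structures
-  abstract MiniExtra = MiniCat ** {
-    fun
-      GenNP   : NP -> Quant ;    -- genitive NP used as a determiner
-      ProDrop : Pron -> Pron ;   -- unpronounced subject pronoun
-  }
-
-  -- MiniExtraEngAbs.gf: matrix row with GenNP = 1, ProDrop = 0
-  abstract MiniExtraEngAbs = MiniCat, MiniExtra [GenNP] ** {} ;
-
-  -- MiniExtraItaAbs.gf: matrix row with GenNP = 0, ProDrop = 1
-  abstract MiniExtraItaAbs = MiniCat, MiniExtra [ProDrop] ** {} ;
-```
-Each language module picks its own row of the matrix by listing, in square
-brackets, the names it inherits. ``MiniCat`` is inherited in full here so
-that the categories used in the selected function types are in scope.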