diff --git a/doc/gf2-highlights.html b/doc/gf2-highlights.html
deleted file mode 100644
index 3d8a150a9..000000000
--- a/doc/gf2-highlights.html
+++ /dev/null
@@ -1,490 +0,0 @@
-
- -- -13/10/2003 - 25/11 - 2/4/2004 - 18/6 - 13/10 - 16/2/2005 - -
- -Aarne Ranta - -
- abstract Sums = {
- cat
- Exp ;
- fun
- One : Exp ;
- plus : Exp -> Exp -> Exp ;
- }
-
- concrete EnglishSums of Sums = open ResEng in {
- lincat
- Exp = {s : Str ; n : Number} ;
- lin
- One = expSg "one" ;
- plus x y = expSg ("the" ++ "sum" ++ "of" ++ x.s ++ "and" ++ y.s) ;
- }
-
- resource ResEng = {
- param
- Number = Sg | Pl ;
- oper
- expSg : Str -> {s : Str ; n : Number} = \s -> {s = s ; n = Sg} ;
- }
-
-
-
-
-
-
-
- abstract Products = Sums ** {
- fun times : Exp -> Exp -> Exp ;
- }
- -- names exported: Exp, plus, times
-
- concrete English of Products = EnglishSums ** open ResEng in {
- lin times x y = expSg ("the" ++ "product" ++ "of" ++ x.s ++ "and" ++ y.s) ;
- }
-
-
-- -Opening, but not extension, can be qualified: -
- concrete NumberSystems of Systems = open (Bin = Binary), (Dec = Decimal) in {
- lin
- BZero = Bin.Zero ;
- DZero = Dec.Zero
- }
-
-
-- -Version 2.1 introduces multiple inheritance: a module -can extend several modules at the same time, for instance, -
- abstract Dialogue = User, System ** { ...}
-
-may be used to put together "User's moves" and "System's moves" into
-one Dialogue System grammar.
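-
-A minimal sketch of what the two parent modules could look like. The
-category and function names (UserMove, SystemMove, Ask, Answer, Exchange,
-MkExchange) are hypothetical and chosen only for illustration; only the
-header of Dialogue is taken from the text above.
-
-  abstract User = {
-    cat UserMove ;
-    fun Ask : UserMove ;
-  }
-
-  abstract System = {
-    cat SystemMove ;
-    fun Answer : SystemMove ;
-  }
-
-  -- Dialogue inherits the categories and functions of both parents
-  abstract Dialogue = User, System ** {
-    cat Exchange ;
-    fun MkExchange : UserMove -> SystemMove -> Exchange ;
-  }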
-
-
-
-
-
-- -The module header is the beginning of the module code up to the -first left bracket ({). The header gives -
- -filename = modulename . extension - -
- -File name extensions: -
- -What the make facility does when compiling Foo.gf -
- -If the compilation of a grammar fails at some module, the state of the -GF shell contains all modules read up to that point. This makes it -faster to compile the faulty module again after fixing it. - -
- -Use the command po = print_options to see what -modules are in the state. - -
- -To force compilation: -
- -The sometimes exploding size of generated gfc and -gfr files has made it urgent to find optimizations -that reduce the size of the code. There are five -combinations of optimizations that can be chosen, as the value of the -optimize flag: -
- -An optimization can be selected individually for each -resource and concrete module by including -the judgement -
- flags optimize=(share|parametrize|values|all|none) ; --in the module body. These flags can be overridden by a flag given -in the i command, e.g. -
- i -src -optimize=none Foo.gf --Notice that the option -src is needed if there already are -generated files created with other optimization flags. - - - - - -
- -path=.:../resource/russian:../prelude --enables files to be found in three different directories. -By default, only the current directory is included. -If a path flag is given, the current directory -. must be explicitly included if it is wanted. - -
- -The path flag can be set in any of the following -places: -
- -Very old GF grammars (from versions before 0.9), with the completely -different notation, do not work. They should be first converted to -GF1 by using GF version 1.2. - -
- -The import command i can be given the option -old. E.g. -
- i -old tut1.Eng.g2 --But this is no longer necessary: GF2 automatically detects whether a grammar -is in the GF1 format. - -
- -Importing a set of GF2 files generates, internally, three modules: -
- abstract tut1 = ... - resource ResEng = ... - concrete Eng of tut1 = open ResEng in ... --(The names are different if the file name has fewer parts.) - - -
- -The option -o causes GF2 to write these modules into files. - -
- -The flags -abs, -cnc, and -res can be used -to give custom names to the modules. In particular, it is good to use -the -abs flag to guarantee that the abstract syntax module -has the same name for all grammars in a multilingual environment: -
- i -old -abs=Numerals hungarian.gf - i -old -abs=Numerals tamil.gf - i -old -abs=Numerals sanskrit.gf -- -
- -The same flags as in the import command can be used when invoking -GF2 from the system shell. Many grammars can be imported on the same command -line, e.g. -
- % gf2 -old -abs=Tutorial tut1.Eng.gf tut1.Fin.gf tut1.Fra.gf -- -
- -To write a GF2 grammar back to GF1 (as one big file), use the command -
- > pg -old -- - -
- - -GF2 has more reserved words than GF 1.2. When old files are read, a preprocessor -replaces every identifier that has the shape of a new reserved word -with a variant where the last letter is replaced by Z, e.g. -instance is replaced by instancZ. This method is of course -unsafe and should be replaced by something better. - - - - -
- -Resource libraries -and some example grammars have been -converted. Most old example grammars work without any changes. -However, there is a new resource API with -many new constructions, and which is recommended. - -
- -Soundness checking of module dependencies and completeness is not -complete. This means that some errors may show up too late. - -
-
-Latex and XML printing of grammars do not work yet.
-
diff --git a/doc/gf2.2-highlights.html b/doc/gf2.2-highlights.html
deleted file mode 100644
index 58ccd5256..000000000
--- a/doc/gf2.2-highlights.html
+++ /dev/null
@@ -1,173 +0,0 @@
-
- -- -9/5/2005 - -
- -Aarne Ranta - -
- -An optimization can be selected individually for each -resource and concrete module by including -the judgement -
- flags optimize=(share|parametrize|values|all|none) ; --in the module body. These flags can be overridden by a flag given -in the i command, e.g. -
- i -src -optimize=none Foo.gf --Notice that the option -src is needed if there already are -generated files created with other optimization flags. - -
- -Important notice: If you use the - -Embedded GF Interpreter, -or the improved parsing algorithms described below, -only the values none, -share and values can be used; the stronger optimizations are not -supported yet. -Also note that currently, GF aborts and reports an error if the stronger optimizations are used -when creating the grammar for the Embedded GF Interpreter, or when trying to parse. - - -
- -Note that the -cfg and -mcfg parsers can take a very long time on their first call, since -they have to convert the GF grammar. This will only happen once in a GF run, provided the GF files are not changed. - -
- -Tips for choosing the best parser for your grammar. Try with the default parser; if it is too slow, try the other two. -Remember that the first time you parse they will be very slow, since they have to build parsing information. -The -cfg parser is best on grammars with many parameters and inflection tables, and -the -mcfg parser is even better when the grammar also has discontinuous constituents. - -
- -Here is a small example from the resource library: -
-> i -src -optimize=share lib/resource/english/LangEng.gf
-> p -cat=S ""
-> p -cat=S -cfg ""
-> p -cat=S -mcfg ""
-{Comment: Just some dummy parsing calls to calculate the parsing information}
-
-> p -cat=S -rawtrees=200000 "you will be running"
-{Comment: Nr of unfiltered trees: 169296 -- 99,996% of the trees are ill-typed}
-
-UseCl (PosTP TFuture ASimul) (SPredProgVP thou_NP (IPredV AAnter run_V))
-UseCl (PosTP TFuture ASimul) (SPredProgVP thou_NP (IPredV ASimul run_V))
-UseCl (PosTP TFuture ASimul) (SPredProgVP ye_NP (IPredV AAnter run_V))
-UseCl (PosTP TFuture ASimul) (SPredProgVP ye_NP (IPredV ASimul run_V))
-UseCl (PosTP TFuture ASimul) (SPredProgVP you_NP (IPredV AAnter run_V))
-UseCl (PosTP TFuture ASimul) (SPredProgVP you_NP (IPredV ASimul run_V))
-
-17730 msec
-
-> p -cat=S -cfg "you will be running"
-{Comment: Nr of unfiltered trees: 246 -- 97,5% of the trees are ill-typed}
-
-UseCl (PosTP TFuture ASimul) (SPredProgVP thou_NP (IPredV AAnter run_V))
-UseCl (PosTP TFuture ASimul) (SPredProgVP thou_NP (IPredV ASimul run_V))
-UseCl (PosTP TFuture ASimul) (SPredProgVP ye_NP (IPredV AAnter run_V))
-UseCl (PosTP TFuture ASimul) (SPredProgVP ye_NP (IPredV ASimul run_V))
-UseCl (PosTP TFuture ASimul) (SPredProgVP you_NP (IPredV AAnter run_V))
-UseCl (PosTP TFuture ASimul) (SPredProgVP you_NP (IPredV ASimul run_V))
-
-1580 msec
-
-> p -cat=S -mcfg "you will be running"
-{Comment: Nr of unfiltered trees: 6 -- all trees are type-correct}
-
-UseCl (PosTP TFuture ASimul) (SPredProgVP thou_NP (IPredV AAnter run_V))
-UseCl (PosTP TFuture ASimul) (SPredProgVP thou_NP (IPredV ASimul run_V))
-UseCl (PosTP TFuture ASimul) (SPredProgVP ye_NP (IPredV AAnter run_V))
-UseCl (PosTP TFuture ASimul) (SPredProgVP ye_NP (IPredV ASimul run_V))
-UseCl (PosTP TFuture ASimul) (SPredProgVP you_NP (IPredV AAnter run_V))
-UseCl (PosTP TFuture ASimul) (SPredProgVP you_NP (IPredV ASimul run_V))
-
-470 msec
-
-
-
-
diff --git a/doc/gfcc.pdf b/doc/gfcc.pdf
deleted file mode 100644
index 9d7b2193f..000000000
Binary files a/doc/gfcc.pdf and /dev/null differ
diff --git a/doc/grammars-and-types.txt b/doc/grammars-and-types.txt
deleted file mode 100644
index 27667589d..000000000
--- a/doc/grammars-and-types.txt
+++ /dev/null
@@ -1,56 +0,0 @@
-Grammars and Types
-
-==Historical introduction==
-
-Stoics ?
-
-Port-Royal ?
-
-Lyons
-
-Frege
-
-Ajdukiewicz
-
-Bar-Hillel
-
-Lambek
-
-Curry
-
-Montague
-
-PATR, HPSG
-
-LFG
-
-GF
-
-ACG, HOG
-
-
-==Syntactic and semantic grammars==
-
-in GF
-
-==Cross-linguistic types==
-
-generalizations over type systems, parametrized modules
-
-
-==Grammatical concepts formalized==
-
-POS, category
-
-inherent and parametric features
-
-agreement
-
-rection
-
-endocentric and exocentric concepts
-
-(see Lyons and Jespersen for more)
-
-a core syntax (latin.gf)
-
diff --git a/doc/intro-resource.txt b/doc/intro-resource.txt
deleted file mode 100644
index c4c292fca..000000000
--- a/doc/intro-resource.txt
+++ /dev/null
@@ -1,511 +0,0 @@
-
-
-==Coverage==
-
-The GF Resource Grammar Library contains grammar rules for
-10 languages (in addition, 2 languages are available as incomplete
-implementations, and a few more are under construction). Its purpose
-is to make these rules available for application programmers,
-who can thereby concentrate on the semantic and stylistic
-aspects of their grammars, without having to think about
-grammaticality. The targeted level of application grammarians
-is that of a skilled programmer with
-a practical knowledge of the target languages, but without
-theoretical knowledge about their grammars.
-Such a combination of
-skills is typical of programmers who, for instance, want to localize
-software to new languages.
-
-The current resource languages are
-- ``Ara``bic (incomplete)
-- ``Cat``alan (incomplete)
-- ``Dan``ish
-- ``Eng``lish
-- ``Fin``nish
-- ``Fre``nch
-- ``Ger``man
-- ``Ita``lian
-- ``Nor``wegian
-- ``Rus``sian
-- ``Spa``nish
-- ``Swe``dish
-
-
-The first three letters (``Eng`` etc) are used in grammar module names.
-The incomplete Arabic and Catalan implementations are
-enough to be used in many applications; they both contain, among other
-things, complete inflectional morphology.
-
-
-
-==A first example==
-
-To give an example application, consider a system for steering
-music playing devices by voice commands. In the application,
-we may have a semantical category ``Kind``, examples
-of ``Kind``s being ``Song`` and ``Artist``. In German, for instance, ``Song``
-is linearized into the noun "Lied", but knowing this is not
-enough to make the application work, because the noun must be
-produced in both singular and plural, and in four different
-cases. By using the resource grammar library, it is enough to
-write
-```
- lin Song = mkN "Lied" "Lieder" neuter
-```
-and the eight forms are correctly generated. The resource grammar
-library contains a complete set of inflectional paradigms (such as
-``mkN`` here), enabling the definition of any lexical items.
-
-The resource grammar library is not only about inflectional paradigms - it
-also has syntax rules. The music player application
-might also want to modify songs with properties, such as "American",
-"old", "good". The German grammar for adjectival modifications is
-particularly complex, because adjectives have to agree in gender,
-number, and case, and also depend on what determiner is used
-("ein amerikanisches Lied" vs. "das amerikanische Lied"). All this
-variation is taken care of by the resource grammar function
-```
- mkCN : AP -> CN -> CN
-```
-(see the table at the end of this document for the list of all resource grammar
-functions). The resource grammar implementation of the rule adding properties
-to kinds is
-```
- lin PropKind kind prop = mkCN prop kind
-```
-given that
-```
- lincat Prop = AP
- lincat Kind = CN
-```
-The resource library API is divided into language-specific
-and language-independent parts. To put it roughly,
-- the lexicon API is language-specific
-- the syntax API is language-independent
-
-
-Thus, to render the above example in French instead of German, we need to
-pick a different linearization of ``Song``,
-```
- lin Song = mkN "chanson" feminine
-```
-But to linearize ``PropKind``, we can use the very same rule as in German.
-The resource function ``mkCN`` has different implementations in the two
-languages (e.g. a different word order in French),
-but the application programmer need not care about the difference.
-
-
-
-==Note on APIs==
-
-From version 1.1 onwards, the resource library is available via two
-APIs:
-- original ``fun`` and ``oper`` definitions
-- overloaded ``oper`` definitions
-
-
-Introducing overloading in GF version 2.7 has been a success in improving
-the accessibility of libraries. It has also created a layer of abstraction
-between the writers and users of libraries, and thereby makes the library
-easier to modify. We shall therefore use the overloaded API
-in this document. The original function names are mainly interesting
-for those who want to write or modify libraries.
-
-
-
-==A complete example==
-
-To summarize the example, and also give a template for a programmer to work on,
-here is the complete implementation of a small system with songs and properties.
-The abstract syntax defines a "domain ontology":
-```
- abstract Music = {
-
- cat
- Kind,
- Property ;
- fun
- PropKind : Kind -> Property -> Kind ;
- Song : Kind ;
- American : Property ;
- }
-```
-The concrete syntax is defined by a functor (parametrized module),
-independently of language, by opening
-two interfaces: the resource ``Syntax`` and an application lexicon.
-```
- incomplete concrete MusicI of Music =
- open Syntax, MusicLex in {
- lincat
- Kind = CN ;
- Property = AP ;
- lin
- PropKind k p = mkCN p k ;
- Song = mkCN song_N ;
- American = mkAP american_A ;
- }
-```
-The application lexicon ``MusicLex`` is an interface
-opening the resource category system ``Cat``.
-```
- interface MusicLex = Cat ** {
- oper
- song_N : N ;
- american_A : A ;
- }
-```
-It could also be an abstract syntax that extends ``Cat``, but
-this would limit the kind of constructions that are possible in
-the interface.
-
-Each language has its own instance of the lexicon interface, which opens the
-inflectional paradigms module for that language:
-```
- instance MusicLexGer of MusicLex =
- CatGer ** open ParadigmsGer in {
- oper
- song_N = mkN "Lied" "Lieder" neuter ;
- american_A = mkA "amerikanisch" ;
- }
-
- instance MusicLexFre of MusicLex =
- CatFre ** open ParadigmsFre in {
- oper
- song_N = mkN "chanson" feminine ;
- american_A = mkA "américain" ;
- }
-```
-The top-level ``Music`` grammars are obtained by
-instantiating the two interfaces of ``MusicI``:
-```
- concrete MusicGer of Music = MusicI with
- (Syntax = SyntaxGer),
- (MusicLex = MusicLexGer) ;
-
- concrete MusicFre of Music = MusicI with
- (Syntax = SyntaxFre),
- (MusicLex = MusicLexFre) ;
-```
-Both of these files can use the same ``path``, defined as
-```
- --# -path=.:present:prelude
-```
-The ``present`` directory contains the compiled resources, restricted to
-present tense; ``alltenses`` has the full resources.
-
-To localize the music player system to a new language,
-all that is needed is two modules,
-one implementing ``MusicLex`` and the other
-instantiating ``Music``. The latter is
-completely trivial, whereas the former one involves the choice of correct
-vocabulary and inflectional paradigms. For instance, Finnish is added as follows:
-```
- instance MusicLexFin of MusicLex =
- CatFin ** open ParadigmsFin in {
- oper
- song_N = mkN "kappale" ;
- american_A = mkA "amerikkalainen" ;
- }
-
- concrete MusicFin of Music = MusicI with
- (Syntax = SyntaxFin),
- (MusicLex = MusicLexFin) ;
-```
-More work is of course needed if the language-independent linearizations in
-MusicI are not satisfactory for some language. The resource grammar guarantees
-that the linearizations are possible in all languages, in the sense of grammatical,
-but they might of course be inadequate for stylistic reasons. Assume,
-for the sake of argument, that adjectival modification does not sound good in
-English, but that a relative clause would be preferable. One can then use
-restricted inheritance of the functor:
-```
- concrete MusicEng of Music =
- MusicI - [PropKind]
- with
- (Syntax = SyntaxEng),
- (MusicLex = MusicLexEng) **
- open SyntaxEng in {
- lin
- PropKind k p = mkCN k (mkRS (mkRCl which_RP (mkVP p))) ;
- }
-```
-The lexicon is as expected:
-```
- instance MusicLexEng of MusicLex =
- CatEng ** open ParadigmsEng in {
- oper
- song_N = mkN "song" ;
- american_A = mkA "American" ;
- }
-```
-
-
-==Lock fields==
-
-//This section is only relevant as a guide to error messages that have to do with lock fields, and can be skipped otherwise.//
-
-FIXME: this section may become obsolete.
-
-When the categories of the resource grammar are used
-in applications, a **lock field** is added to their linearization types.
-The lock field for a category ``C`` is a record field
-```
- lock_C : {}
-```
-with the only possible value
-```
- lock_C = <>
-```
-The lock field carries no information, but its presence
-makes the linearization type of ``C``
-unique, so that categories
-with the same implementation are not confused with each other.
-(This is inspired by the ``newtype`` discipline in Haskell.)
-
-For example, the lincats of adverbs and conjunctions are the same
-in ``CatEng`` (and therefore in ``GrammarEng``, which inherits it):
-```
- lincat Adv = {s : Str} ;
- lincat Conj = {s : Str} ;
-```
-But when these category symbols are used to denote their linearization
-types in an application, these definitions are translated to
-```
- oper Adv : Type = {s : Str ; lock_Adv : {}} ;
- oper Conj : Type = {s : Str ; lock_Conj : {}} ;
-```
-In this way, the user of a resource grammar cannot confuse adverbs with
-conjunctions. In other words, the lock fields force the type checker
-to function as grammaticality checker.
-
-When the resource grammar is ``open``ed in an application grammar,
-and only functions from the resource are used in a type-correct way, the
-lock fields are never seen (except possibly in type error messages).
-If an application grammarian has to write lock fields herself,
-it is a sign that the guarantees given by the resource grammar
-no longer hold. But since the resource may be incomplete, the
-application grammarian may occasionally have to provide the dummy
-values of lock fields (always ``<>``, the empty record).
-Here is an example:
-```
- mkUtt : Str -> Utt ;
- mkUtt s = {s = s ; lock_Utt = <>} ;
-```
-Currently, missing lock fields produce warnings rather than errors,
-but this behaviour of GF may change in future.
-
-
-==Parsing with resource grammars?==
-
-The intended use of the resource grammar is as a library for writing
-application grammars. It is not designed for parsing e.g. newspaper text. There
-are several reasons why this is not practical:
-- Efficiency: the resource grammar uses complex data structures, in
-particular, discontinuous constituents, which make parsing slow and the
-parser size huge.
-- Completeness: the resource grammar does not necessarily cover all rules
-of the language - only enough of them to be able to express everything
-in one way or another.
-- Lexicon: the resource grammar has a very small lexicon, only meant for test
-purposes.
-- Semantics: the resource grammar has very little semantic control, and may
-accept strange input or deliver strange interpretations.
-- Ambiguity: parsing in the resource grammar may return lots of results, many
-of which are implausible.
-
-
-All of these problems should be solved in application grammars.
-The task of resource grammars is just to take care of low-level linguistic
-details such as inflection, agreement, and word order.
-
-It is for the same reasons that resource grammars are not adequate for translation.
-That the syntax API is implemented for different languages of course makes
-it possible to translate via it - but there is no guarantee of translation
-equivalence. Of course, the use of functor implementations such as ``MusicI``
-above only extends to those cases where the syntax API does give translation
-equivalence - but this must be seen as a limiting case, and bigger applications
-will often use only restricted inheritance of ``MusicI``.
-
-
-
-=To find rules in the resource grammar library=
-
-==Inflection paradigms==
-
-Inflection paradigms are defined separately for each language //L//
-in the module ``Paradigms``//L//. To test them, the command
-``cc`` (= ``compute_concrete``)
-can be used:
-```
- > i -retain german/ParadigmsGer.gf
-
- > cc mkN "Schlange"
- {
- s : Number => Case => Str = table Number {
- Sg => table Case {
- Nom => "Schlange" ;
- Acc => "Schlange" ;
- Dat => "Schlange" ;
- Gen => "Schlange"
- } ;
- Pl => table Case {
- Nom => "Schlangen" ;
- Acc => "Schlangen" ;
- Dat => "Schlangen" ;
- Gen => "Schlangen"
- }
- } ;
- g : Gender = Fem
- }
-```
-For the sake of convenience, every language implements these five paradigms:
-```
- oper
- mkN : Str -> N ; -- regular nouns
- mkA : Str -> A ; -- regular adjectives
- mkV : Str -> V ; -- regular verbs
- mkPN : Str -> PN ; -- regular proper names
- mkV2 : V -> V2 ; -- direct transitive verbs
-```
-It is often possible to initialize a lexicon by just using these functions,
-and later revise it by using the more involved paradigms. For instance, in
-German we cannot use ``mkN "Lied"`` for ``Song``, because the result would be a
-Masculine noun with the plural form ``"Liede"``.
-The individual ``Paradigms`` modules
-tell what cases are covered by the regular heuristics.
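-
-As a sketch of this two-step workflow, using only the German forms already
-quoted in this document (the oper name ``song_N`` is the one from the
-``MusicLex`` example; the first definition is the deliberately naive draft):
-```
-  -- first draft: regular paradigm, which gets the gender and plural wrong
-  song_N = mkN "Lied" ;
-
-  -- revision: singular, plural, and gender given explicitly
-  song_N = mkN "Lied" "Lieder" neuter ;
-```
-The two definitions are alternatives: a module contains only one of them at a time.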
-
-As a limiting case, one could even initialize the lexicon for a new language
-by copying the English (or some other already existing) lexicon. This would
-produce a language with correct grammar but with content words directly borrowed from
-English - maybe not so strange in certain technical domains.
-
-
-
-==Syntax rules==
-
-Syntax rules should be looked for in the module ``Constructors``.
-Below this top-level module exposing overloaded constructors,
-there are around 10 abstract modules, each defining constructors for
-a group of one or more related categories. For instance, the module
-``Noun`` defines how to construct common nouns, noun phrases, and determiners.
-But these special modules are seldom or never needed by the users of the library.
-
-TODO: when are they needed?
-
-Browsing the libraries is helped by the gfdoc-generated HTML pages,
-whose LaTeX versions are included in the present document.
-
-
-==Special-purpose APIs==
-
-To give an analogy with the well-known typesetting software, GF can be compared
-with TeX and the resource grammar library with LaTeX.
-Just like TeX frees the author
-from thinking about low-level problems of page layout, so GF frees the grammarian
-from writing parsing and generation algorithms. But quite a lot of knowledge of
-//how// to write grammars is still needed, and the resource grammar library helps
-GF grammarians in a way similar to how the LaTeX macro package helps TeX authors.
-
-But even LaTeX is often too detailed and low-level, and users are encouraged to
-develop their own macro packages. The same applies to GF resource grammars:
-the application grammarian might not need all the choices that the resource
-provides, but would prefer less writing and higher-level programming.
-To this end, application grammarians may want to write their own views on the
-resource grammar.
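-
-A minimal sketch of such a view, assuming an English application. The module
-name ``MusicViewsEng`` and the oper ``kind2`` are hypothetical; the only
-library function used is ``mkCN : AP -> CN -> CN`` quoted earlier in this
-document.
-```
-  resource MusicViewsEng = open SyntaxEng in {
-    oper
-      -- a kind modified by two properties in one call
-      kind2 : AP -> AP -> CN -> CN = \p,q,k -> mkCN p (mkCN q k) ;
-  }
-```
-An application grammar that opens this module can then write ``kind2 p q k``
-instead of nesting the two ``mkCN`` calls by hand.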
-
-
-==Browsing by the parser==
-
-A method alternative to browsing library documentation is
-to use the parser.
-Even though parsing is not an intended end-user application
-of resource grammars, it is a useful technique for application grammarians
-to browse the library. To find out which resource function implements
-a particular structure, one can just parse a string that exemplifies this
-structure. For instance, to find out how sentences are built using
-transitive verbs, write
-```
- > i english/LangEng.gf
-
- > p -cat=Cl "she loves him"
- PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))
-```
-The parser returns original constructors, not overloaded ones. Overloaded
-constructors can be returned, so far with experimental heuristics, by using
-the grammar ``api/toplevel/OverLangEng.gf`` and a special flag:
-```
- > i api/toplevel/OverLangEng.gf
-
- > p -cat=Cl -overload "she loves him"
- mkCl (mkNP she_Pron) love_V2 (mkNP he_Pron)
-```
-Parsing with the English resource grammar has an acceptable speed, but
-with most languages it takes too many resources even to build the
-parser. However, examples parsed in one language can always be linearized into
-other languages:
-```
- > i italian/LangIta.gf
-
- > l PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))
- lo ama
-```
-Therefore, one can use the English parser to write an Italian grammar, and also
-to write a language-independent (incomplete) grammar. One can also parse strings
-that are bizarre in English but the intended way of expression in another language.
-For instance, the phrase for "I am hungry" in Italian is literally "I have hunger".
-This can be built by parsing "I have beer" in ``OverLangEng`` and then writing
-```
- lin IamHungry =
- let beer_N = mkN "fame" feminine
- in
- mkCl (mkNP i_Pron) have_V2 (mkNP massQuant beer_N)
-```
-which uses ``ParadigmsIta.mkN``.
-
-
-
-==Example-based grammar writing==
-
-The technique of parsing with the resource grammar can be used in GF source files,
-endowed with the suffix ``.gfe`` ("GF examples"). The suffix tells GF to preprocess
-the file by replacing all expressions of the form
-```
- in Module.Cat "example string"
-```
-by the syntax trees obtained by parsing "example string" in ``Cat`` in ``Module``.
-For instance,
-```
- lin IamHungry =
- let beer_N = mkN "fame" feminine
- in
- (in LangEng.Cl "I have beer") ;
-```
-will result in the rule displayed in the previous section. The normal binding rules
-of functional programming (and GF) guarantee that local bindings of identifiers
-take precedence over constants of the same forms. Thus it is also possible to
-linearize functions taking arguments in this way:
-```
- lin
- PropKind car_N old_A = in LangEng.CN "old car" ;
-```
-However, the technique of example-based grammar writing has some limitations:
-- Ambiguity. If a string has several parses, the first one is returned, and
-it may not be the intended one. The other parses are shown in a comment, from
-where they must/can be picked manually.
-- Lexicality. The arguments of a function must be atomic identifiers, and are thus
-not available for categories that have no lexical items.
-For instance, the ``PropKind`` rule above gives the result
-```
- lin
- PropKind car_N old_A = AdjCN (UseN car_N) (PositA old_A) ;
-```
-However, it is possible to write a special lexicon that gives atomic rules for
-all those categories that can be used as arguments, for instance,
-```
- fun
- cat_CN : CN ;
- old_AP : AP ;
-```
-and then use this lexicon instead of the standard one included in ``Lang``.
-
-
diff --git a/doc/multimodal.html b/doc/multimodal.html
deleted file mode 100644
index 9f2b43902..000000000
--- a/doc/multimodal.html
+++ /dev/null
@@ -1,863 +0,0 @@
-
-
-
-
--This document shows a method to write grammars -in which spoken utterances are accompanied by -pointing gestures. A computer application of such -grammars is the multimodal dialogue system, in -which the pointing gestures are performed by -mouse clicks and movements. -
--After an introduction to the notions of -demonstratives and integrated multimodality, -we will show by a concrete example -how multimodal grammars can be written in GF -and how they can be used in dialogue systems. -The explanation is given in three stages: -
--Demonstrative expressions are an old idea. Such -expressions get their meaning from the context. -
-- This train is faster than that airplane. -- -
- I want to go from this place to this place. -- -
-In particular, as in these examples, the meaning -can be obtained from accompanying pointing gestures. -
--Thus the meaning-bearing unit is neither the words nor the -gestures alone, but their combination. Demonstratives -thus provide an example of integrated multimodality, -as opposed to parallel multimodality. In parallel -multimodality, speech and other modes of communication -are just alternative ways to convey the same information. -
- --When formalizing the semantics of demonstratives, we can combine syntax with coordinates: -
-- I want to go from this place to this place -- -
-is interpreted as something like -
-- want(I, go, this(place,(123,45)), this(place,(98,10))) --
-Now, the same semantic value can be given in many ways, by performing -the clicks at different points of time in relation to the speech: -
-- I want to go from this place CLICK(123,45) to this place CLICK(98,10) -- -
- I want to go from this place to this place CLICK(123,45) CLICK(98,10) -- -
- CLICK(123,45) CLICK(98,10) I want to go from this place to this place -- -
-How do we build the value compositionally in parsing? -Traditional parsing is sequential: its input is a string of tokens. -It works for demonstratives only if the pointing is adjacent to -the spoken expression. In the actual input, the demonstrative word -can be separated from the accompanying click by other words. The two -can also be simultaneous. -
- --What we need is a notion of asynchronous parsing, as opposed to -sequential parsing (where demonstrative words and clicks must be -adjacent). -
--We can implement asynchronous parsing in GF by exploiting the generality -of linearization types. A linearization type is the type of -the concrete syntax objects assigned to semantic values. -What a GF grammar defines is a relation -
-- abstract syntax trees <---> concrete syntax objects --
-When modelling context-free grammar in GF, -the concrete syntax objects are just strings. -But they can be more structured objects as well - in general, they are -records of different kinds of objects. For example, -a demonstrative expression can be linearized into a record of two strings. -
-
- {s = "this place" ;
- this place (coord 123 45) <---> p = "(123,45)"
- }
-
--The record -
-
- {s = "I want to go from this place to this place" ;
- p = "(123,45) (98,10)"
- }
-
--represents any combination of the sentence and the clicks, as long -as the clicks appear in this order. -
- --A simple example of a multimodal GF grammar is the one called -the Tram Demo grammar. It was written by Björn Bringert within -the TALK project as a part of a dialogue system that -deals with queries about tram timetables. The system interprets -a speech input in combination with mouse clicks on a digital map. -
--The abstract syntax of (a minimal fragment of) the Tram Demo -grammar is -
-- cat - Input, Dep, Dest, Click ; - fun - GoFromTo : Dep -> Dest -> Input ; -- "I want to go from x to y" - DepHere : Click -> Dep ; -- "from here" with click - DestHere : Click -> Dest ; -- "to here" with click - - CCoord : Int -> Int -> Click ; -- click coordinates --
-An English concrete syntax of the grammar is -
-
- lincat
- Input, Dep, Dest = {s : Str ; p : Str} ;
- Click = {p : Str} ;
-
- lin
- GoFromTo x y = {s = ["I want to go"] ++ x.s ++ y.s ; p = x.p ++ y.p} ;
- DepHere c = {s = ["from here"] ; p = c.p} ;
- DestHere c = {s = ["to here"] ; p = c.p} ;
-
- CCoord x y = {p = "(" ++ x.s ++ "," ++ y.s ++ ")"} ;
-
--When the grammar is used in the actual system, standard parsing methods -are used for interpreting the integrated speech and click input. -Parsing appears on two levels: the speech input parsing -performed by the Nuance speech recognition program (without the clicks), -and the semantics-yielding parser sending input to the dialogue manager. -The latter parser just attaches the clicks to the speech input. The order -of the clicks is preserved, and the parser can hence associate each of -the clicks with proper demonstratives. Here is the grammar used in the -two parsing phases. -
-
- cat
- Query, -- whole content
- Speech ; -- speech only
- fun
- QueryInput : Input -> Query ; -- the whole content shown
- SpeechInput : Input -> Speech ; -- only the speech shown
-
- lincat
- Query, Speech = {s : Str} ;
- lin
- QueryInput i = {s = i.s ++ ";" ++ i.p} ;
- SpeechInput i = {s = i.s} ;
-
-
-
--The GF representation of integrated multimodality is -similar to the representation of discontinous constituents. -For instance, assume has arrived is a verb phrase in English, -which can be used both in declarative sentences and questions, -
-- she has arrived -- -
- has she arrived -- -
-In the question, the two words are separated from each other. If
-has arrived is a constituent of the question, it is thus discontinuous.
-To represent such constituents in GF, records can be used:
-we split verb phrases (VP) into a finite and infinitive part.
-
- lincat VP = {fin, inf : Str} ;
-
- lin Indic np vp = {s = np.s ++ vp.fin ++ vp.inf} ;
- lin Quest np vp = {s = vp.fin ++ np.s ++ vp.inf} ;
-
-
-
--The general recipe for using GF when building dialogue systems -is to write a grammar with the following components: -
--The engineering advantages of this approach have to do partly with -the declarativity of the description, partly with the tools provided -by GF to derive different components of the system: -
--An example of this process is Björn Bringert's TramDemo. -More recently, grammars have been integrated to the GoDiS dialogue -manager by Prolog representations of abstract syntax. -
- --This section gives a recipe for making any unimodal grammar -multimodal, by adding pointing gestures to chosen expressions. The recipe -guarantees that the resulting grammar remains semantically well-formed, -i.e. type correct. -
- --The multimodal conversion of a grammar consists of seven -steps, of which the first is always the same, the second -involves a decision, and the rest are derivative: -
-`Point` with a standard linearization type.
-
- cat Point ;
- lincat Point = {point : Str} ;
-
-Point` as their last argument.
- The new type signatures for such constructors d have the form
-- fun d : ... -> Point -> D --
point field to the linearization type L of any
- demonstrative category D, i.e. a category that has at least one demonstrative
- constructor:
-
- lincat D = L ** {point : Str} ;
-
-point field in the linearization t of any
- constructor d that has been made demonstrative:
-
- lin d x1 ... xn p = t x1 ... xn ** {point = p.point} ;
-
-
- lin f x_1 ... x_m =
- t x_1 ... x_m ** {point = x_d1.point ++ ... ++ x_dn.point} ;
-
- Make sure that the pointings x_d1.point ... x_dn.point are concatenated
- in the same order as the arguments appear in the linearization t,
- which is not necessarily the same as the abstract argument order.
-point field to the linearization t of any
- constructor c of a demonstrative category:
-
- lin c x1 ... xn = t x1 ... xn ** {point = []} ;
-
--Start with a Tram Demo grammar with no demonstratives, but just -tram stop names and the indexical here (interpreted as e.g. the user's -standing place). -
-- cat - Input, Dep, Dest, Name ; - fun - GoFromTo : Dep -> Dest -> Input ; - DepHere : Dep ; - DestHere : Dest ; - DepName : Name -> Dep ; - DestName : Name -> Dest ; - - Almedal : Name ; --
-A unimodal English concrete syntax of the grammar is -
-
- lincat
- Input, Dep, Dest, Name = {s : Str} ;
-
- lin
- GoFromTo x y = {s = ["I want to go"] ++ x.s ++ y.s} ;
- DepHere = {s = ["from here"]} ;
- DestHere = {s = ["to here"]} ;
- DepName n = {s = ["from"] ++ n.s} ;
- DestName n = {s = ["to"] ++ n.s} ;
-
- Almedal = {s = "Almedal"} ;
-
--Let us follow the steps of the recipe. -
-Point and its linearization type.
-DepHere and DestHere involve a pointing gesture.
-point to the linearization types of Dep and Dest.
-point to Input. (But Name remains unimodal.)
-p.point to the linearizations of DepHere and DestHere.
-GoFromTo.
-point to DepName and DestName.
--In the resulting grammar, one category is added and -two functions are changed in the abstract syntax (annotated by the step numbers): -
-- cat - Point ; -- 1 - fun - DepHere : Point -> Dep ; -- 2 - DestHere : Point -> Dest ; -- 2 - --
-The concrete syntax in its entirety looks as follows -
-
- lincat
- Dep, Dest = {s : Str ; point : Str} ; -- 3
- Input = {s : Str ; point : Str} ; -- 4
- Name = {s : Str} ;
- Point = {point : Str} ; -- 1
- lin
- GoFromTo x y = {s = ["I want to go"] ++ x.s ++ y.s ; -- 6
- point = x.point ++ y.point
- } ;
- DepHere p = {s = ["from here"] ; -- 5
- point = p.point
- } ;
- DestHere p = {s = ["to here"] : -- 5
- point = p.point
- } ;
- DepName n = {s = ["from"] ++ n.s ; -- 7
- point = []
- } ;
- DestName n = {s = ["to"] ++ n.s ; -- 7
- point = []
- } ;
- Almedal = {s = "Almedal"} ;
-
--What we need in addition, to use the grammar in applications, are -
-Point, e.g. coordinate pairs.
-Query and Speech in the original.
--But their proper place is probably in another grammar module, so that -the core Tram Demo grammar can be used in different systems e.g. -encoding clicks in different ways. -
- --GF is a functional programming language, and we exploit this -by providing a set of combinators that makes the multimodal conversion easier -and clearer. We start with the type of sequences of pointing gestures. -
-
- Point : Type = {point : Str} ;
-
-
-To make a record type multimodal is to extend it with Point.
-The record extension operator ** is needed here.
-
- Dem : Type -> Type = \t -> t ** Point ; --
-To construct, use, and concatenate pointings: -
-
- mkPoint : Str -> Point = \s -> {point = s} ;
-
- noPoint : Point = mkPoint [] ;
-
- point : Point -> Str = \p -> p.point ;
-
- concatPoint : (x,y : Point) -> Point = \x,y ->
- mkPoint (point x ++ point y) ;
-
--Finally, to add pointing to a record, with the limiting case of no demonstrative needed. -
-- mkDem : (t : Type) -> t -> Point -> Dem t = \_,x,s -> x ** s ; - - nonDem : (t : Type) -> t -> Dem t = \t,x -> mkDem t x noPoint ; --
-Let us rewrite the Tram Demo grammar by using these combinators: -
-
- oper
- SS : Type = {s : Str} ;
- lincat
- Input, Dep, Dest = Dem SS ;
- Name = SS ;
-
- lin
- GoFromTo x y = {s = ["I want to go"] ++ x.s ++ y.s} **
- concatPoint x y ;
- DepHere = mkDem SS {s = ["from here"]} ;
- DestHere = mkDem SS {s = ["to here"]} ;
- DepName n = nonDem SS {s = ["from"] ++ n.s} ;
- DestName n = nonDem SS {s = ["to"] ++ n.s} ;
-
- Almedal = {s = "Almedal"} ;
-
-
-The type synonym SS is introduced to make the combinator applications
-concise. Notice the use of partial application in DepHere and
-DestHere; an equivalent way to write is
-
- DepHere p = mkDem SS {s = ["from here"]} p ;
-
-
-
--The main advantage of using GF when building dialogue systems is -that various components of the system -can be automatically generated from GF grammars. -Writing these grammars, however, can still be a considerable -task. A case in point are multilingual systems: -how to localize e.g. a system built in a car to -the languages of all those customers to whom the -car is sold? This problem has been the main focus of -GF for some years, and the solution on which most work has been -done is the development of resource grammar libraries. -These libraries work in the same way as program libraries -in software engineering, enabling a division of labour -between linguists and domain experts. -
-
-One of the goals in the resource grammars of different
-languages has been to provide a language-independent API,
-which makes the same resource grammar functions available for
-different languages. For instance, the categories
-S, NP, and VP are available in all of the
-10 languages currently supported, and so is the function
-
- PredVP : NP -> VP -> S --
-which corresponds to the rule S -> NP VP in phrase
-structure grammar. However, there are several levels of abstraction
-between the function PredVP and the phrase structure rule,
-because the rule is implemented in so different ways in different
-languages. In particular, discontinuous constituents are needed in
-various degrees to make the rule work in different languages.
-
-Now, dealing with discontinuous constituents is one of the demanding -aspects of multilingual grammar writing that the resource grammar -API is designed to hide. But the proposed treatment of integrated -multimodality is heavily dependent on similar things. What can we -do to make multimodal grammars easier to write (for different languages)? -There are two orthogonal answers: -
--The multimodal resource grammar library has been obtained from -the unimodal one by applying the multimodal conversion manually. -In addition, the API has been simplified -by leaving out structures needed in written technical documents -(the original application area of GF) but not in spoken dialogue. -
--In the following subsections, we will show a part of the -multimodal resource grammar API, limited to a fragment that -is needed to get the main ideas and to reimplement the -Tram Demo grammar. The reimplementation shows one more advantage -of the resource grammar approach: dialogue systems can be -automatically instantiated to different languages. -
- --The resource grammar API has three main kinds of entries: -
-- PredVP : NP -> VP -> S ; -- "Mary helps him" --
- TopicObj : NP -> VP -> S ; -- "honom hjälper Mary" --
- irregV : (sing,sang,sung : Str) -> V ; --
-The first two kinds of entries are cat and fun definitions
-in an abstract syntax. The multimodal, restricted API has
-e.g. the following categories. Their names are obtained from
-the corresponding unimodal categories by prefixing M.
-
- MS ; -- multimodal sentence or question - MQS ; -- multimodal wh question - MImp ; -- multimodal imperative - MVP ; -- multimodal verb phrase - MNP ; -- multimodal (demonstrative) noun phrase - MAdv ; -- multimodal (demonstrative) adverbial - - Point ; -- pointing gesture -- - -
-Demonstrative pronouns can be used both as noun phrases and -as determiners. -
-- this_MNP : Point -> MNP ; -- this - thisDet_MNP : CN -> Point -> MNP ; -- this car --
-There are also demonstrative adverbs, and prepositions give -a productive way to build more adverbs. -
-- here_MAdv : Point -> MAdv ; -- here - here7from_MAdv : Point -> MAdv ; -- from here - - MPrepNP : Prep -> MNP -> MAdv ; -- in this car -- - -
-A handful of predication rules construct sentences, questions, and imperatives. -
-- MPredVP : MNP -> MVP -> MS ; -- this plane flies here - MQPredVP : MNP -> MVP -> MQS ; -- does this plane fly here - MQuestVP : IP -> MVP -> MQS ; -- who flies here - MImpVP : MVP -> MImp ; -- fly here! --
-Verb phrases are constructed from verbs (inherited as such from -the unimodal API) by providing their complements. -
-- MUseV : V -> MVP ; -- flies - MComplV2 : V2 -> MNP -> MVP ; -- takes this - MComplVV : VV -> MVP -> MVP ; -- wants to take this --
-A multimodal adverb can be attached to a verb phrase. -
-- MAdvVP : MVP -> MAdv -> MVP ; -- flies here -- - -
-The implementation makes heavy use of the multimodal conversion
-combinators. It adds a point field to whatever the implementation of the unimodal
-category is in any language. Thus, for example
-
- lincat
- MVP = Dem VP ;
- MNP = Dem NP ;
- MAdv = Dem Adv ;
-
- lin
- this_MNP = mkDem NP this_NP ;
- -- i.e. this_MNP p = this_NP ** {point = p.point} ;
-
- MComplV2 verb obj = mkDem VP (ComplV2 verb obj) obj ;
-
- MAdvVP vp adv = mkDem VP (AdvVP vp adv) (concatPoint vp adv) ;
-
-
-
--Using nondemonstrative expressions as demonstratives: -
-- DemNP : NP -> MNP ; - DemAdv : Adv -> MAdv ; --
-Building top-level phrases: -
-- PhrMS : Pol -> MS -> Phr ; - PhrMS : Pol -> MS -> Phr ; - PhrMQS : Pol -> MQS -> Phr ; - PhrMImp : Pol -> MImp -> Phr ; -- - -
-The implementation above has only used the resource grammar API,
-not the concrete implementations. The library Demonstrative
-is a parametrized module, also called a functor, which
-has the following structure
-
- incomplete concrete DemonstrativeI of Demonstrative =
- Cat, TenseX ** open Test, Structural in {
-
- -- lincat and lin rules
-
- }
-
--It can be instantiated to different languages as follows. -
-- concrete DemonstrativeEng of Demonstrative = - CatEng, TenseX ** DemonstrativeI with - (Test = TestEng), - (Structural = StructuralEng) ; - - concrete DemonstrativeSwe of Demonstrative = - CatSwe, TenseX ** DemonstrativeI with - (Test = TestSwe), - (Structural = StructuralSwe) ; -- - -
-Again using the functor idea, we reimplement TramDemo
-as follows:
-
- incomplete concrete TramI of Tram = open Multimodal in {
-
- lincat
- Query = Phr ; Input = MS ;
- Dep, Dest = MAdv ; Click = Point ;
- lin
- QInput = PhrMS PPos ;
-
- GoFromTo x y =
- MPredVP (DemNP (UsePron i_Pron))
- (MAdvVP (MAdvVP (MComplVV want_VV (MUseV go_V)) x) y) ;
-
- DepHere = here7from_MAdv ;
- DestHere = here7to_MAdv ;
- DepName s = MPrepNP from_Prep (DemNP (UsePN (SymbPN (MkSymb s)))) ;
- DestName s = MPrepNP to_Prep (DemNP (UsePN (SymbPN (MkSymb s)))) ;
-
-
-
-Then we can instantiate this to all languages for which
-the Multimodal API has been implemented:
-
- concrete TramEng of Tram = TramI with - (Multimodal = MultimodalEng) ; - - concrete TramSwe of Tram = TramI with - (Multimodal = MultimodalSwe) ; - - concrete TramFre of Tram = TramI with - (Multimodal = MultimodalFre) ; -- - -
-It was pointed out in the section on the multimodal conversion that -the concrete word order may be different from the abstract one, -and vary between different languages. For instance, Swedish -topicalization -
-- Det här tåget vill den här kunden inte ta. -- -
-(``this train, this customer doesn't want to take'') may well have -an abstract syntax of a form in which the customer appears -before the train. -
--This is a problem for the implementor of the resource grammar. -It means that some parts of the resource must be written manually -and not as a functor. -However, the user of the resource can safely -ignore the word order problem, if it is correctly dealt with in -the resource. -
- --When starting to develop resource grammars, we believed they -would be all that -an application grammarian needs to write a concrete syntax. -However, experience has shown that it can be tough to start -grammar development in this way: selecting functions from -a resource API requires more abstract thinking than just -writing strings, and its take longer to reach testable -results. The most light-weight format is -maybe to start with context-free grammars (which notation is -also supported by GF). Context-free grammars that -give acceptable even though over-generating -results for languages like English are quick to produce. -
--The experience has led to the following -steps for grammar development. While giving the work -a quick start, this recipe -increases abstraction at a later level, when it is time to -to localize the grammar to different languages. -If context-free notation is used, steps 1 and 2 can -be merged. -
-Domain.
-DomainRough.
- This can be oversimplified and overgenerating.
-DomainI.
- This can helped by example-based grammar writing, where
- the examples are generated from DomainRough.
-DomainI to different languages,
- and test the results by generating linearizations.
-