From e7b7def3130881852ff4acd1845dd31266c166fe Mon Sep 17 00:00:00 2001
From: aarne
Date: Thu, 31 May 2007 13:43:46 +0000
Subject: [PATCH] resource doc in tutorial

---
 doc/resource.txt              |   4 +-
 doc/tutorial/gf-tutorial2.txt | 458 +++++++++++++++++++++++-----------
 2 files changed, 320 insertions(+), 142 deletions(-)

diff --git a/doc/resource.txt b/doc/resource.txt
index cfad3000e..a1c855fb7 100644
--- a/doc/resource.txt
+++ b/doc/resource.txt
@@ -1,5 +1,5 @@
-The GF Resource Grammar Library
-Author: Aarne Ranta, Ali El Dada, and Janna Khegai
+The GF Resource Grammar Library, Version 1.2
+Authors: Aarne Ranta, Ali El Dada, Janna Khegai, and Björn Bringert
 Last update: %%date(%c)
 % NOTE: this is a txt2tags file.
diff --git a/doc/tutorial/gf-tutorial2.txt b/doc/tutorial/gf-tutorial2.txt
index 9c3ae71b2..3ca7414d9 100644
--- a/doc/tutorial/gf-tutorial2.txt
+++ b/doc/tutorial/gf-tutorial2.txt
@@ -1658,174 +1658,352 @@ All of the following uses of ``mkN`` are easy to resolve:
 %--!
 ==Using the resource grammar library TODO==
-A resource grammar is a grammar built on linguistic grounds,
-to describe a language rather than a domain.
-The GF resource grammar library, which contains resource grammars for
-10 languages, is described more closely in the following
-documents:
-- [Resource library API documentation ../../lib/resource-1.0/doc/]:
-  for application grammarians using the resource.
-- [Resource writing HOWTO ../../lib/resource-1.0/doc/Resource-HOWTO.html]:
-  for resource grammarians developing the resource.
+===Coverage===
+
+The GF Resource Grammar Library contains grammar rules for
+10 languages (in addition, 2 languages are available as incomplete
+implementations, and a few more are under construction). Its purpose
+is to make these rules available for application programmers,
+who can thereby concentrate on the semantic and stylistic
+aspects of their grammars, without having to think about
+grammaticality.
+The targeted level of application grammarians
+is that of a skilled programmer with
+a practical knowledge of the target languages, but without
+theoretical knowledge about their grammars.
+Such a combination of
+skills is typical of programmers who want to localize
+software to new languages.
+
+The current resource languages are
+- ``Ara``bic
+- ``Cat``alan
+- ``Dan``ish
+- ``Eng``lish
+- ``Fin``nish
+- ``Fre``nch
+- ``Ger``man
+- ``Ita``lian
+- ``Nor``wegian
+- ``Rus``sian
+- ``Spa``nish
+- ``Swe``dish
-===Interfaces, instances, and functors===
+The first three letters (``Eng`` etc) are used in grammar module names.
+The Arabic and Catalan implementations are still incomplete, but
+enough to be used in many applications.
-===The simplest way===
-
-The simplest way is to ``open`` a top-level ``Lang`` module
-and a ``Paradigms`` module:
+To give an example application, consider
+music playing devices. In the application,
+we may have a semantical category ``Kind``, examples
+of ``Kind``s being ``Song`` and ``Artist``. In German, for instance, ``Song``
+is linearized into the noun "Lied", but knowing this is not
+enough to make the application work, because the noun must be
+produced in both singular and plural, and in four different
+cases. By using the resource grammar library, it is enough to
+write
 ```
-  abstract Foo = ...
-
-  concrete FooEng = open LangEng, ParadigmsEng in ...
-  concrete FooSwe = open LangSwe, ParadigmsSwe in ...
+  lin Song = mkN "Lied" "Lieder" neuter
 ```
-Here is an example.
+and the eight forms are correctly generated. The resource grammar
+library contains a complete set of inflectional paradigms (such as
+``mkN`` here), enabling the definition of any lexical items.
+
+The resource grammar library is not only about inflectional paradigms - it
+also has syntax rules. The music player application
+might also want to modify songs with properties, such as "American",
+"old", "good".
+The German grammar for adjectival modifications is
+particularly complex, because adjectives have to agree in gender,
+number, and case, and also depend on what determiner is used
+("ein amerikanisches Lied" vs. "das amerikanische Lied"). All this
+variation is taken care of by the resource grammar function
 ```
-abstract Arithm = {
-  cat
-    Prop ;
-    Nat ;
-  fun
-    Zero : Nat ;
-    Succ : Nat -> Nat ;
-    Even : Nat -> Prop ;
-    And : Prop -> Prop -> Prop ;
-}
+  fun AdjCN : AP -> CN -> CN
+```
+(see the tables at the end of this document for the list of all resource grammar
+functions). The resource grammar implementation of the rule adding properties
+to kinds is
+```
+  lin PropKind kind prop = AdjCN prop kind
+```
+given that
+```
+  lincat Prop = AP
+  lincat Kind = CN
+```
+The resource library API is divided into language-specific
+and language-independent parts. To put it roughly,
+- the lexicon API is language-specific
+- the syntax API is language-independent
---# -path=.:alltenses:prelude
-concrete ArithmEng of Arithm = open LangEng, ParadigmsEng in {
-  lincat
-    Prop = S ;
-    Nat = NP ;
-  lin
-    Zero =
-      UsePN (regPN "zero" nonhuman) ;
-    Succ n =
-      DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 (regN2 "successor") n) ;
-    Even n =
-      UseCl TPres ASimul PPos
-        (PredVP n (UseComp (CompAP (PositA (regA "even"))))) ;
-    And x y =
-      ConjS and_Conj (BaseS x y) ;
+Thus, to render the above example in French instead of German, we need to
+pick a different linearization of ``Song``,
+```
+  lin Song = mkN "chanson" feminine
+```
+But to linearize ``PropKind``, we can use the very same rule as in German.
+The resource function ``AdjCN`` has different implementations in the two
+languages (e.g. a different word order in French),
+but the application programmer need not care about the difference.
-}
---# -path=.:alltenses:prelude
+===Note on APIs===
-concrete ArithmSwe of Arithm = open LangSwe, ParadigmsSwe in {
-  lincat
-    Prop = S ;
-    Nat = NP ;
-  lin
-    Zero =
-      UsePN (regPN "noll" neutrum) ;
-    Succ n =
-      DetCN (DetSg (SgQuant DefArt) NoOrd)
-        (ComplN2 (mkN2 (mk2N "efterföljare" "efterföljare")
-          (mkPreposition "till")) n) ;
-    Even n =
-      UseCl TPres ASimul PPos
-        (PredVP n (UseComp (CompAP (PositA (regA "jämn"))))) ;
-    And x y =
-      ConjS and_Conj (BaseS x y) ;
-}
+From version 1.1 onwards, the resource library is available via two
+APIs:
+- original ``fun`` and ``oper`` definitions
+- overloaded ``oper`` definitions
+
+
+Introducing overloading in GF version 2.7 has been a success in improving
+the accessibility of libraries. It has also created a layer of abstraction
+between the writers and users of libraries, and thereby makes the library
+easier to modify. We shall therefore use the overloaded API
+in this document. The original function names are mainly interesting
+for those who want to write or modify libraries.
+
+
+
+===A complete example===
+
+To summarize the example, and also give a template for a programmer to work on,
+here is the complete implementation of a small system with songs and properties.
+The abstract syntax defines a "domain ontology":
+```
+  abstract Music = {
+    cat
+      Kind,
+      Property ;
+    fun
+      PropKind : Kind -> Property -> Kind ;
+      Song : Kind ;
+      American : Property ;
+  }
+```
+The concrete syntax is defined by a functor (parametrized module),
+independently of language, by opening
+two interfaces: the resource ``Grammar`` and an application lexicon.
+```
+  incomplete concrete MusicI of Music = open Grammar, MusicLex in {
+    lincat
+      Kind = CN ;
+      Property = AP ;
+    lin
+      PropKind k p = AdjCN p k ;
+      Song = UseN song_N ;
+      American = PositA american_A ;
+  }
+```
+The application lexicon ``MusicLex`` has an abstract syntax that extends
+the resource category system ``Cat``.
+```
+  abstract MusicLex = Cat ** {
+    fun
+      song_N : N ;
+      american_A : A ;
+  }
+```
+Each language has its own concrete syntax, which opens the
+inflectional paradigms module for that language:
+```
+  concrete MusicLexGer of MusicLex =
+    CatGer ** open ParadigmsGer in {
+    lin
+      song_N = reg2N "Lied" "Lieder" neuter ;
+      american_A = regA "amerikanisch" ;
+    }
+
+  concrete MusicLexFre of MusicLex =
+    CatFre ** open ParadigmsFre in {
+    lin
+      song_N = regGenN "chanson" feminine ;
+      american_A = regA "américain" ;
+    }
+```
+The top-level ``Music`` grammars are obtained by
+instantiating the two interfaces of ``MusicI``:
+```
+  concrete MusicGer of Music = MusicI with
+    (Grammar = GrammarGer),
+    (MusicLex = MusicLexGer) ;
+
+  concrete MusicFre of Music = MusicI with
+    (Grammar = GrammarFre),
+    (MusicLex = MusicLexFre) ;
+```
+Both of these files can use the same ``path``, defined as
+```
+  --# -path=.:present:prelude
+```
+The ``present`` directory contains the compiled resources, restricted to
+present tense; ``alltenses`` has the full resources.
+
+To localize the music player system to a new language,
+all that is needed is two modules,
+one implementing ``MusicLex`` and the other
+instantiating ``Music``. The latter is
+completely trivial, whereas the former one involves the choice of correct
+vocabulary and inflectional paradigms. For instance, Finnish is added as follows:
+```
+  concrete MusicLexFin of MusicLex =
+    CatFin ** open ParadigmsFin in {
+    lin
+      song_N = regN "kappale" ;
+      american_A = regA "amerikkalainen" ;
+    }
+
+  concrete MusicFin of Music = MusicI with
+    (Grammar = GrammarFin),
+    (MusicLex = MusicLexFin) ;
+```
+More work is of course needed if the language-independent linearizations in
+MusicI are not satisfactory for some language. The resource grammar guarantees
+that the linearizations are possible in all languages, in the sense of being grammatical,
+but they might of course be inadequate for stylistic reasons.
+Assume,
+for the sake of argument, that adjectival modification does not sound good in
+English, but that a relative clause would be preferable. One can then start as
+before,
+```
+  concrete MusicLexEng of MusicLex =
+    CatEng ** open ParadigmsEng in {
+    lin
+      song_N = regN "song" ;
+      american_A = regA "American" ;
+    }
+
+  concrete MusicEng0 of Music = MusicI with
+    (Grammar = GrammarEng),
+    (MusicLex = MusicLexEng) ;
+```
+The module ``MusicEng0`` would not be used on the top level, however, but
+another module would be built on top of it, with a restricted import from
+``MusicEng0``. ``MusicEng`` inherits everything from ``MusicEng0``
+except ``PropKind``, and
+gives its own definition of this function:
+```
+  concrete MusicEng of Music =
+    MusicEng0 - [PropKind] ** open GrammarEng in {
+    lin
+      PropKind k p =
+        RelCN k (UseRCl TPres ASimul PPos
+          (RelVP IdRP (UseComp (CompAP p)))) ;
+    }
 ```
+===To find rules in the resource grammar library===
-===How to find resource functions===
+====Inflection paradigms====
-The definitions in this example were found by parsing:
+Inflection paradigms are defined separately for each language //L//
+in the module ``Paradigms``//L//. To test them, the command
+``cc`` (= ``compute_concrete``)
+can be used:
 ```
-  > i LangEng.gf
+  > i -retain german/ParadigmsGer.gf
-  -- for Successor:
-  > p -cat=NP -mcfg -parser=topdown "the mother of Paris"
-
-  -- for Even:
-  > p -cat=S -mcfg -parser=topdown "Paris is old"
-
-  -- for And:
-  > p -cat=S -mcfg -parser=topdown "Paris is old and I am old"
+  > cc mkN "Schlange"
+  {
+    s : Number => Case => Str = table Number {
+      Sg => table Case {
+        Nom => "Schlange" ;
+        Acc => "Schlange" ;
+        Dat => "Schlange" ;
+        Gen => "Schlange"
+      } ;
+      Pl => table Case {
+        Nom => "Schlangen" ;
+        Acc => "Schlangen" ;
+        Dat => "Schlangen" ;
+        Gen => "Schlangen"
+      }
+    } ;
+    g : Gender = Fem
+  }
 ```
-The use of parsing can be systematized by **example-based grammar writing**,
-to which we will return later.
-
-
-===A functor implementation===
-
-The interesting thing now is that the
-code in ``ArithmSwe`` is similar to the code in ``ArithmEng``, except for
-some lexical items ("noll" vs. "zero", "efterföljare" vs. "successor",
-"jämn" vs. "even"). How can we exploit the similarities and
-actually share code between the languages?
-
-The solution is to use a functor: an ``incomplete`` module that opens
-an ``abstract`` as an ``interface``, and then instantiate it to different
-languages that implement the interface. The structure is as follows:
+For the sake of convenience, every language implements these five paradigms:
 ```
-  abstract Foo ...
-
-  incomplete concrete FooI = open Lang, Lex in ...
-
-  concrete FooEng of Foo = FooI with (Lang=LangEng), (Lex=LexEng) ;
-  concrete FooSwe of Foo = FooI with (Lang=LangSwe), (Lex=LexSwe) ;
+  oper
+    mkN : Str -> N ; -- regular nouns
+    mkA : Str -> A ; -- regular adjectives
+    mkV : Str -> V ; -- regular verbs
+    mkPN : Str -> PN ; -- regular proper names
+    mkV2 : V -> V2 ; -- direct transitive verbs
 ```
-where ``Lex`` is an abstract lexicon that includes the vocabulary
-specific to this application:
+It is often possible to initialize a lexicon by just using these functions,
+and later revise it by using the more involved paradigms. For instance, in
+German we cannot use ``mkN "Lied"`` for ``Song``, because the result would be a
+masculine noun with the plural form ``"Liede"``.
+The individual ``Paradigms`` modules
+tell what cases are covered by the regular heuristics.
+
+As a limiting case, one could even initialize the lexicon for a new language
+by copying the English (or some other already existing) lexicon. This would
+produce language with correct grammar but with content words directly borrowed from
+English - maybe not so strange in certain technical domains.
+
+
+
+====Syntax rules====
+
+Syntax rules should be looked for in the module ``Constructors``.
+Below this top-level module exposing overloaded constructors,
+there are around 10 abstract modules, each defining constructors for
+a group of one or more related categories. For instance, the module
+``Noun`` defines how to construct common nouns, noun phrases, and determiners.
+But these special modules are seldom needed by the users of the library.
+
+TODO: when are they needed?
+
+Browsing the libraries is helped by the gfdoc-generated HTML pages,
+whose LaTeX versions are included in the present document.
+
+
+
+====Browsing by the parser====
+
+A method alternative to browsing library documentation is
+to use the parser.
+Even though parsing is not an intended end-user application
+of resource grammars, it is a useful technique for application grammarians
+to browse the library. To find out which resource function implements
+a particular structure, one can just parse a string that exemplifies this
+structure. For instance, to find out how sentences are built using
+transitive verbs, write
 ```
-  abstract Lex = Cat ** ...
+  > i english/LangEng.gf
+
+  > p -cat=Cl -fcfg "she loves him"
-  concrete LexEng of Lex = CatEng ** open ParadigmsEng in ...
-  concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in ...
+  PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))
 ```
-Here, again, a complete example (``abstract Arithm`` is as above):
+The parser returns original constructors, not overloaded ones.
+
+Parsing with the English resource grammar has an acceptable speed, but
+with most languages it takes just too many resources even to build the
+parser.
+However, examples parsed in one language can always be linearized into
+other languages:
 ```
-incomplete concrete ArithmI of Arithm = open Lang, Lex in {
-  lincat
-    Prop = S ;
-    Nat = NP ;
-  lin
-    Zero =
-      UsePN zero_PN ;
-    Succ n =
-      DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 successor_N2 n) ;
-    Even n =
-      UseCl TPres ASimul PPos
-        (PredVP n (UseComp (CompAP (PositA even_A)))) ;
-    And x y =
-      ConjS and_Conj (BaseS x y) ;
-}
+  > i italian/LangIta.gf
---# -path=.:alltenses:prelude
-concrete ArithmEng of Arithm = ArithmI with
-  (Lang = LangEng),
-  (Lex = LexEng) ;
+  > l PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))
---# -path=.:alltenses:prelude
-concrete ArithmSwe of Arithm = ArithmI with
-  (Lang = LangSwe),
-  (Lex = LexSwe) ;
-
-abstract Lex = Cat ** {
-  fun
-    zero_PN : PN ;
-    successor_N2 : N2 ;
-    even_A : A ;
-}
-
-concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in {
-  lin
-    zero_PN = regPN "noll" neutrum ;
-    successor_N2 =
-      mkN2 (mk2N "efterföljare" "efterföljare") (mkPreposition "till") ;
-    even_A = regA "jämn" ;
-}
+  lo ama
 ```
+Therefore, one can use the English parser to write an Italian grammar, and also
+to write a language-independent (incomplete) grammar. One can also parse strings
+that are bizarre in English but the intended way of expression in another language.
+For instance, the phrase for "I am hungry" in Italian is literally "I have hunger".
+This can be built by parsing "I have beer" in LangEng and then writing
+```
+  lin IamHungry =
+    let beer_N = regGenN "fame" feminine
+    in
+    PredVP (UsePron i_Pron) (ComplV2 have_V2
+      (DetCN (DetSg MassDet NoOrd) (UseN beer_N))) ;
+```
+which uses ParadigmsIta.regGenN.
-===Restricted inheritance and qualified opening===
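The ``IamHungry`` rule can be placed in a small grammar of its own. The following is only a minimal sketch under stated assumptions, not part of the resource library: the module names ``Hunger`` and ``HungerIta`` and the category ``Statement`` are hypothetical, the ``path`` is the one used for the ``Music`` grammars, and the local constant is called ``fame_N`` instead of ``beer_N`` for readability. It assumes that ``LangIta`` exports the original constructors returned by the parser.
```
  -- file Hunger.gf (hypothetical)
  abstract Hunger = {
    cat
      Statement ;              -- hypothetical application category
    fun
      IamHungry : Statement ;
  }

  -- file HungerIta.gf (hypothetical)
  --# -path=.:present:prelude
  concrete HungerIta of Hunger = open LangIta, ParadigmsIta in {
    lincat
      Statement = Cl ;         -- clause, the category used for parsing with -cat=Cl
    lin
      IamHungry =
        let fame_N = regGenN "fame" feminine   -- "hunger"
        in
        PredVP (UsePron i_Pron) (ComplV2 have_V2
          (DetCN (DetSg MassDet NoOrd) (UseN fame_N))) ;
  }
```
With both files in place, importing ``HungerIta.gf`` with ``i`` and linearizing ``IamHungry`` with ``l`` should show the Italian forms of the clause. The exact output depends on the library version.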