resource doc in tutorial

This commit is contained in:
aarne
2007-05-31 13:43:46 +00:00
parent 74c032b688
commit ad1af38d60
2 changed files with 320 additions and 142 deletions


@@ -1,5 +1,5 @@
The GF Resource Grammar Library The GF Resource Grammar Library, Version 1.2
Author: Aarne Ranta, Ali El Dada, and Janna Khegai Authors: Aarne Ranta, Ali El Dada, Janna Khegai, and Björn Bringert
Last update: %%date(%c) Last update: %%date(%c)
% NOTE: this is a txt2tags file. % NOTE: this is a txt2tags file.


@@ -1658,174 +1658,352 @@ All of the following uses of ``mkN`` are easy to resolve:
%--! %--!
==Using the resource grammar library TODO== ==Using the resource grammar library TODO==
A resource grammar is a grammar built on linguistic grounds, ===Coverage===
to describe a language rather than a domain.
The GF resource grammar library, which contains resource grammars for The GF Resource Grammar Library contains grammar rules for
10 languages, is described more closely in the following 10 languages (in addition, 2 languages are available as incomplete
documents: implementations, and a few more are under construction). Its purpose
- [Resource library API documentation ../../lib/resource-1.0/doc/]: is to make these rules available for application programmers,
for application grammarians using the resource. who can thereby concentrate on the semantic and stylistic
- [Resource writing HOWTO ../../lib/resource-1.0/doc/Resource-HOWTO.html]: aspects of their grammars, without having to think about
for resource grammarians developing the resource. grammaticality. The targeted level of application grammarians
is that of a skilled programmer with
a practical knowledge of the target languages, but without
theoretical knowledge about their grammars.
Such a combination of
skills is typical of programmers who want to localize
software to new languages.
The current resource languages are
- ``Ara``bic
- ``Cat``alan
- ``Dan``ish
- ``Eng``lish
- ``Fin``nish
- ``Fre``nch
- ``Ger``man
- ``Ita``lian
- ``Nor``wegian
- ``Rus``sian
- ``Spa``nish
- ``Swe``dish
===Interfaces, instances, and functors=== The first three letters (``Eng`` etc) are used in grammar module names.
The Arabic and Catalan implementations are still incomplete, but
enough to be used in many applications.
===The simplest way=== To give an example application, consider
music playing devices. In the application,
The simplest way is to ``open`` a top-level ``Lang`` module we may have a semantical category ``Kind``, examples
and a ``Paradigms`` module: of ``Kind``s being ``Song`` and ``Artist``. In German, for instance, ``Song``
is linearized into the noun "Lied", but knowing this is not
enough to make the application work, because the noun must be
produced in both singular and plural, and in four different
cases. By using the resource grammar library, it is enough to
write
``` ```
abstract Foo = ... lin Song = mkN "Lied" "Lieder" neuter
concrete FooEng = open LangEng, ParadigmsEng in ...
concrete FooSwe = open LangSwe, ParadigmsSwe in ...
``` ```
Here is an example. and the eight forms are correctly generated. The resource grammar
library contains a complete set of inflectional paradigms (such as
``mkN`` here), enabling the definition of any lexical items.
The resource grammar library is not only about inflectional paradigms - it
also has syntax rules. The music player application
might also want to modify songs with properties, such as "American",
"old", "good". The German grammar for adjectival modifications is
particularly complex, because adjectives have to agree in gender,
number, and case, and also depend on what determiner is used
("ein amerikanisches Lied" vs. "das amerikanische Lied"). All this
variation is taken care of by the resource grammar function
``` ```
abstract Arithm = { fun AdjCN : AP -> CN -> CN
cat ```
Prop ; (see the tables at the end of this document for the list of all resource grammar
Nat ; functions). The resource grammar implementation of the rule adding properties
fun to kinds is
Zero : Nat ; ```
Succ : Nat -> Nat ; lin PropKind kind prop = AdjCN prop kind
Even : Nat -> Prop ; ```
And : Prop -> Prop -> Prop ; given that
} ```
lincat Prop = AP
lincat Kind = CN
```
The resource library API is divided into language-specific
and language-independent parts. To put it roughly,
- the lexicon API is language-specific
- the syntax API is language-independent
--# -path=.:alltenses:prelude
concrete ArithmEng of Arithm = open LangEng, ParadigmsEng in { Thus, to render the above example in French instead of German, we need to
lincat pick a different linearization of ``Song``,
Prop = S ; ```
Nat = NP ; lin Song = mkN "chanson" feminine
lin ```
Zero = But to linearize ``PropKind``, we can use the very same rule as in German.
UsePN (regPN "zero" nonhuman) ; The resource function ``AdjCN`` has different implementations in the two
Succ n = languages (e.g. a different word order in French),
DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 (regN2 "successor") n) ; but the application programmer need not care about the difference.
Even n =
UseCl TPres ASimul PPos
(PredVP n (UseComp (CompAP (PositA (regA "even"))))) ;
And x y =
ConjS and_Conj (BaseS x y) ;
}
--# -path=.:alltenses:prelude ===Note on APIs===
concrete ArithmSwe of Arithm = open LangSwe, ParadigmsSwe in { From version 1.1 onwards, the resource library is available via two
lincat APIs:
Prop = S ; - original ``fun`` and ``oper`` definitions
Nat = NP ; - overloaded ``oper`` definitions
lin
Zero =
UsePN (regPN "noll" neutrum) ; Introducing overloading in GF version 2.7 has been a success in improving
Succ n = the accessibility of libraries. It has also created a layer of abstraction
DetCN (DetSg (SgQuant DefArt) NoOrd) between the writers and users of libraries, and thereby makes the library
(ComplN2 (mkN2 (mk2N "efterföljare" "efterföljare") easier to modify. We shall therefore use the overloaded API
(mkPreposition "till")) n) ; in this document. The original function names are mainly interesting
Even n = for those who want to write or modify libraries.
UseCl TPres ASimul PPos
(PredVP n (UseComp (CompAP (PositA (regA "jämn"))))) ;
And x y =
ConjS and_Conj (BaseS x y) ; ===A complete example===
}
To summarize the example, and also give a template for a programmer to work on,
here is the complete implementation of a small system with songs and properties.
The abstract syntax defines a "domain ontology":
```
abstract Music = {
cat
Kind,
Property ;
fun
PropKind : Kind -> Property -> Kind ;
Song : Kind ;
American : Property ;
}
```
The concrete syntax is defined by a functor (parametrized module),
independently of language, by opening
two interfaces: the resource ``Grammar`` and an application lexicon.
```
incomplete concrete MusicI of Music = open Grammar, MusicLex in {
lincat
Kind = CN ;
Property = AP ;
lin
PropKind k p = AdjCN p k ;
Song = UseN song_N ;
American = PositA american_A ;
}
```
The application lexicon ``MusicLex`` has an abstract syntax that extends
the resource category system ``Cat``.
```
abstract MusicLex = Cat ** {
fun
song_N : N ;
american_A : A ;
}
```
Each language has its own concrete syntax, which opens the
inflectional paradigms module for that language:
```
concrete MusicLexGer of MusicLex =
CatGer ** open ParadigmsGer in {
lin
song_N = reg2N "Lied" "Lieder" neuter ;
american_A = regA "amerikanisch" ;
}
concrete MusicLexFre of MusicLex =
CatFre ** open ParadigmsFre in {
lin
song_N = regGenN "chanson" feminine ;
american_A = regA "américain" ;
}
```
The top-level ``Music`` grammars are obtained by
instantiating the two interfaces of ``MusicI``:
```
concrete MusicGer of Music = MusicI with
(Grammar = GrammarGer),
(MusicLex = MusicLexGer) ;
concrete MusicFre of Music = MusicI with
(Grammar = GrammarFre),
(MusicLex = MusicLexFre) ;
```
Both of these files can use the same ``path``, defined as
```
--# -path=.:present:prelude
```
The ``present`` directory contains the compiled resources, restricted to
present tense; ``alltenses`` has the full resources.
To localize the music player system to a new language,
all that is needed is two modules,
one implementing ``MusicLex`` and the other
instantiating ``Music``. The latter is
completely trivial, whereas the former one involves the choice of correct
vocabulary and inflectional paradigms. For instance, Finnish is added as follows:
```
concrete MusicLexFin of MusicLex =
CatFin ** open ParadigmsFin in {
lin
song_N = regN "kappale" ;
american_A = regA "amerikkalainen" ;
}
concrete MusicFin of Music = MusicI with
(Grammar = GrammarFin),
(MusicLex = MusicLexFin) ;
```
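As a further illustration, and only as a sketch, Swedish could be added in the same way. The module names ``MusicLexSwe`` and ``MusicSwe`` and the choice of words are assumptions made for this example; it also assumes that ``ParadigmsSwe`` provides ``mk2N`` (singular and plural forms) and ``regA``, and that ``CatSwe`` and ``GrammarSwe`` are available on the same ``path`` as above.
```
-- Hypothetical Swedish lexicon, following the Finnish pattern.
concrete MusicLexSwe of MusicLex =
  CatSwe ** open ParadigmsSwe in {
  lin
    -- the plural is given explicitly rather than left to the regular heuristics
    song_N = mk2N "låt" "låtar" ;
    american_A = regA "amerikansk" ;
}
-- Hypothetical top-level module: only the two interfaces are instantiated.
concrete MusicSwe of Music = MusicI with
  (Grammar = GrammarSwe),
  (MusicLex = MusicLexSwe) ;
```
As with Finnish, the only language-specific decisions are the vocabulary and the paradigms; the functor ``MusicI`` is reused unchanged.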
More work is of course needed if the language-independent linearizations in
MusicI are not satisfactory for some language. The resource grammar guarantees
that the linearizations are possible in all languages, in the sense of being grammatical,
but they might of course be inadequate for stylistic reasons. Assume,
for the sake of argument, that adjectival modification does not sound good in
English, but that a relative clause would be preferable. One can then start as
before,
```
concrete MusicLexEng of MusicLex =
CatEng ** open ParadigmsEng in {
lin
song_N = regN "song" ;
american_A = regA "American" ;
}
concrete MusicEng0 of Music = MusicI with
(Grammar = GrammarEng),
(MusicLex = MusicLexEng) ;
```
The module ``MusicEng0`` would not be used on the top level, however, but
another module would be built on top of it, with a restricted import from
``MusicEng0``. ``MusicEng`` inherits everything from ``MusicEng0``
except ``PropKind``, and
gives its own definition of this function:
```
concrete MusicEng of Music =
MusicEng0 - [PropKind] ** open GrammarEng in {
lin
PropKind k p =
RelCN k (UseRCl TPres ASimul PPos
(RelVP IdRP (UseComp (CompAP p)))) ;
}
``` ```
===To find rules in the resource grammar library===
===How to find resource functions=== ====Inflection paradigms====
The definitions in this example were found by parsing: Inflection paradigms are defined separately for each language //L//
in the module ``Paradigms``//L//. To test them, the command
``cc`` (= ``compute_concrete``)
can be used:
``` ```
> i LangEng.gf > i -retain german/ParadigmsGer.gf
-- for Successor: > cc mkN "Schlange"
> p -cat=NP -mcfg -parser=topdown "the mother of Paris" {
s : Number => Case => Str = table Number {
-- for Even: Sg => table Case {
> p -cat=S -mcfg -parser=topdown "Paris is old" Nom => "Schlange" ;
Acc => "Schlange" ;
-- for And: Dat => "Schlange" ;
> p -cat=S -mcfg -parser=topdown "Paris is old and I am old" Gen => "Schlange"
} ;
Pl => table Case {
Nom => "Schlangen" ;
Acc => "Schlangen" ;
Dat => "Schlangen" ;
Gen => "Schlangen"
}
} ;
g : Gender = Fem
}
``` ```
The use of parsing can be systematized by **example-based grammar writing**, For the sake of convenience, every language implements these five paradigms:
to which we will return later.
===A functor implementation===
The interesting thing now is that the
code in ``ArithmSwe`` is similar to the code in ``ArithmEng``, except for
some lexical items ("noll" vs. "zero", "efterföljare" vs. "successor",
"jämn" vs. "even"). How can we exploit the similarities and
actually share code between the languages?
The solution is to use a functor: an ``incomplete`` module that opens
an ``abstract`` as an ``interface``, and then instantiate it to different
languages that implement the interface. The structure is as follows:
``` ```
abstract Foo ... oper
mkN : Str -> N ; -- regular nouns
incomplete concrete FooI = open Lang, Lex in ... mkA : Str -> A ; -- regular adjectives
mkV : Str -> V ; -- regular verbs
concrete FooEng of Foo = FooI with (Lang=LangEng), (Lex=LexEng) ; mkPN : Str -> PN ; -- regular proper names
concrete FooSwe of Foo = FooI with (Lang=LangSwe), (Lex=LexSwe) ; mkV2 : V -> V2 ; -- direct transitive verbs
``` ```
where ``Lex`` is an abstract lexicon that includes the vocabulary It is often possible to initialize a lexicon by just using these functions,
specific to this application: and later revise it by using the more involved paradigms. For instance, in
German we cannot use ``mkN "Lied"`` for ``Song``, because the result would be a
masculine noun with the plural form ``"Liede"``.
The individual ``Paradigms`` modules
tell what cases are covered by the regular heuristics.
As a limiting case, one could even initialize the lexicon for a new language
by copying the English (or some other already existing) lexicon. This would
produce language with correct grammar but with content words directly borrowed from
English - maybe not so strange in certain technical domains.
====Syntax rules====
Syntax rules should be looked for in the module ``Constructors``.
Below this top-level module exposing overloaded constructors,
there are around 10 abstract modules, each defining constructors for
a group of one or more related categories. For instance, the module
``Noun`` defines how to construct common nouns, noun phrases, and determiners.
But these special modules are seldom needed by the users of the library.
TODO: when are they needed?
Browsing the libraries is helped by the gfdoc-generated HTML pages,
whose LaTeX versions are included in the present document.
====Browsing by the parser====
An alternative to browsing the library documentation is
to use the parser.
Even though parsing is not an intended end-user application
of resource grammars, it is a useful technique for application grammarians
to browse the library. To find out which resource function implements
a particular structure, one can just parse a string that exemplifies this
structure. For instance, to find out how sentences are built using
transitive verbs, write
``` ```
abstract Lex = Cat ** ... > i english/LangEng.gf
> p -cat=Cl -fcfg "she loves him"
concrete LexEng of Lex = CatEng ** open ParadigmsEng in ... PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))
concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in ...
``` ```
Here, again, a complete example (``abstract Arithm`` is as above): The parser returns original constructors, not overloaded ones.
Parsing with the English resource grammar has an acceptable speed, but
with most languages it takes just too many resources even to build the
parser. However, examples parsed in one language can always be linearized into
other languages:
``` ```
incomplete concrete ArithmI of Arithm = open Lang, Lex in { > i italian/LangIta.gf
lincat
Prop = S ;
Nat = NP ;
lin
Zero =
UsePN zero_PN ;
Succ n =
DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 successor_N2 n) ;
Even n =
UseCl TPres ASimul PPos
(PredVP n (UseComp (CompAP (PositA even_A)))) ;
And x y =
ConjS and_Conj (BaseS x y) ;
}
--# -path=.:alltenses:prelude > l PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))
concrete ArithmEng of Arithm = ArithmI with
(Lang = LangEng),
(Lex = LexEng) ;
--# -path=.:alltenses:prelude lo ama
concrete ArithmSwe of Arithm = ArithmI with
(Lang = LangSwe),
(Lex = LexSwe) ;
abstract Lex = Cat ** {
fun
zero_PN : PN ;
successor_N2 : N2 ;
even_A : A ;
}
concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in {
lin
zero_PN = regPN "noll" neutrum ;
successor_N2 =
mkN2 (mk2N "efterföljare" "efterföljare") (mkPreposition "till") ;
even_A = regA "jämn" ;
}
``` ```
Therefore, one can use the English parser to write an Italian grammar, and also
to write a language-independent (incomplete) grammar. One can also parse strings
that are bizarre in English but are the intended way of expression in another language.
For instance, the phrase for "I am hungry" in Italian is literally "I have hunger".
This can be built by parsing "I have beer" in ``LangEng`` and then writing
```
lin IamHungry =
let beer_N = regGenN "fame" feminine
in
PredVP (UsePron i_Pron) (ComplV2 have_V2
(DetCN (DetSg MassDet NoOrd) (UseN beer_N))) ;
```
which uses ParadigmsIta.regGenN.
===Restricted inheritance and qualified opening===