resource doc in tutorial

aarne
2007-05-31 13:43:46 +00:00
parent 76268417db
commit e7b7def313
2 changed files with 320 additions and 142 deletions


@@ -1,5 +1,5 @@
The GF Resource Grammar Library
Author: Aarne Ranta, Ali El Dada, and Janna Khegai
The GF Resource Grammar Library, Version 1.2
Authors: Aarne Ranta, Ali El Dada, Janna Khegai, and Björn Bringert
Last update: %%date(%c)
% NOTE: this is a txt2tags file.


@@ -1658,174 +1658,352 @@ All of the following uses of ``mkN`` are easy to resolve:
%--!
==Using the resource grammar library TODO==
A resource grammar is a grammar built on linguistic grounds,
to describe a language rather than a domain.
The GF resource grammar library, which contains resource grammars for
10 languages, is described more closely in the following
documents:
- [Resource library API documentation ../../lib/resource-1.0/doc/]:
for application grammarians using the resource.
- [Resource writing HOWTO ../../lib/resource-1.0/doc/Resource-HOWTO.html]:
for resource grammarians developing the resource.
===Coverage===
The GF Resource Grammar Library contains grammar rules for
10 languages (in addition, 2 languages are available as incomplete
implementations, and a few more are under construction). Its purpose
is to make these rules available to application programmers,
who can then concentrate on the semantic and stylistic
aspects of their grammars without having to think about
grammaticality. The intended application grammarian
is a skilled programmer with
a practical knowledge of the target languages, but without
theoretical knowledge about their grammars.
Such a combination of
skills is typical of programmers who want to localize
software to new languages.
The current resource languages are
- ``Ara``bic
- ``Cat``alan
- ``Dan``ish
- ``Eng``lish
- ``Fin``nish
- ``Fre``nch
- ``Ger``man
- ``Ita``lian
- ``Nor``wegian
- ``Rus``sian
- ``Spa``nish
- ``Swe``dish
The first three letters (``Eng`` etc.) are used in grammar module names.
The Arabic and Catalan implementations are still incomplete, but
enough to be used in many applications.
To give an example application, consider
music playing devices. In the application,
we may have a semantic category ``Kind``, examples
of ``Kind``s being ``Song`` and ``Artist``. In German, for instance, ``Song``
is linearized into the noun "Lied", but knowing this is not
enough to make the application work, because the noun must be
produced in both singular and plural, and in four different
cases. By using the resource grammar library, it is enough to
write
```
lin Song = mkN "Lied" "Lieder" neuter ;
```
and the eight forms are correctly generated. The resource grammar
library contains a complete set of inflectional paradigms (such as
``mkN`` here), enabling the definition of any lexical item.
The resource grammar library is not only about inflectional paradigms: it
also has syntax rules. The music player application
might also want to modify songs with properties, such as "American",
"old", "good". The German grammar for adjectival modifications is
particularly complex, because adjectives have to agree in gender,
number, and case, and also depend on what determiner is used
("ein amerikanisches Lied" vs. "das amerikanische Lied"). All this
variation is taken care of by the resource grammar function
```
fun AdjCN : AP -> CN -> CN ;
```
(see the tables at the end of this document for the list of all resource grammar
functions). With this function, the rule that adds properties
to kinds is implemented as
```
lin PropKind kind prop = AdjCN prop kind ;
```
given that
```
lincat Property = AP ;
lincat Kind = CN ;
```
The resource library API is divided into language-specific
and language-independent parts. To put it roughly,
- the lexicon API is language-specific
- the syntax API is language-independent
Thus, to render the above example in French instead of German, we need to
pick a different linearization of ``Song``,
```
lin Song = mkN "chanson" feminine ;
```
But to linearize ``PropKind``, we can use the very same rule as in German.
The resource function ``AdjCN`` has different implementations in the two
languages (e.g. a different word order in French),
but the application programmer need not care about the difference.
===Note on APIs===
From version 1.1 onwards, the resource library is available via two
APIs:
- original ``fun`` and ``oper`` definitions
- overloaded ``oper`` definitions
Introducing overloading in GF version 2.7 has considerably improved
the accessibility of libraries. It has also created a layer of abstraction
between the writers and users of libraries, which makes the library
easier to modify. We shall therefore use the overloaded API
in this document. The original function names are mainly interesting
for those who want to write or modify libraries.
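As a hedged illustration of the difference, here is the ``PropKind`` rule from above written in both styles. The overloaded name ``mkCN`` is an assumption: the sketch supposes that ``Constructors`` exports an instance of ``mkCN`` with the type ``AP -> CN -> CN``, so check the ``Constructors`` documentation before relying on it.

```
-- original API: the syntax function is called by its own name
lin PropKind kind prop = AdjCN prop kind ;

-- overloaded API (sketch): assumes Constructors has an overload
-- mkCN : AP -> CN -> CN, chosen by the compiler from the argument
-- types. Only one of the two rules may appear in a given module.
-- lin PropKind kind prop = mkCN prop kind ;
```

In the overloaded style, the function name follows the category being built (here ``CN``), so fewer distinct names have to be remembered.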
===A complete example===
To summarize the example, and also give a template for a programmer to work on,
here is the complete implementation of a small system with songs and properties.
The abstract syntax defines a "domain ontology":
```
abstract Music = {
cat
Kind,
Property ;
fun
PropKind : Kind -> Property -> Kind ;
Song : Kind ;
American : Property ;
}
```
The concrete syntax is defined by a functor (parametrized module),
independently of language, by opening
two interfaces: the resource ``Grammar`` and an application lexicon.
```
incomplete concrete MusicI of Music = open Grammar, MusicLex in {
lincat
Kind = CN ;
Property = AP ;
lin
PropKind k p = AdjCN p k ;
Song = UseN song_N ;
American = PositA american_A ;
}
```
The application lexicon ``MusicLex`` has an abstract syntax that extends
the resource category system ``Cat``.
```
abstract MusicLex = Cat ** {
fun
song_N : N ;
american_A : A ;
}
```
Each language has its own concrete syntax, which opens the
inflectional paradigms module for that language:
```
concrete MusicLexGer of MusicLex =
CatGer ** open ParadigmsGer in {
lin
song_N = reg2N "Lied" "Lieder" neuter ;
american_A = regA "amerikanisch" ;
}
concrete MusicLexFre of MusicLex =
CatFre ** open ParadigmsFre in {
lin
song_N = regGenN "chanson" feminine ;
american_A = regA "américain" ;
}
```
The top-level ``Music`` grammars are obtained by
instantiating the two interfaces of ``MusicI``:
```
concrete MusicGer of Music = MusicI with
(Grammar = GrammarGer),
(MusicLex = MusicLexGer) ;
concrete MusicFre of Music = MusicI with
(Grammar = GrammarFre),
(MusicLex = MusicLexFre) ;
```
Both of these files can use the same ``path``, defined as
```
--# -path=.:present:prelude
```
The ``present`` directory contains the compiled resources, restricted to
the present tense; ``alltenses`` contains the full resources.
To localize the music player system to a new language,
only two modules are needed:
one implementing ``MusicLex`` and the other
instantiating ``Music``. The latter is
completely trivial, whereas the former involves the choice of correct
vocabulary and inflectional paradigms. For instance, Finnish is added as follows:
```
concrete MusicLexFin of MusicLex =
CatFin ** open ParadigmsFin in {
lin
song_N = regN "kappale" ;
american_A = regA "amerikkalainen" ;
}
concrete MusicFin of Music = MusicI with
(Grammar = GrammarFin),
(MusicLex = MusicLexFin) ;
```
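Following the same pattern, an Italian version might look as follows. This is a sketch under assumptions. It supposes that ``CatIta``, ``GrammarIta``, and ``ParadigmsIta`` exist alongside the Finnish and French modules. It also assumes that ``regGenN`` takes a string and a gender, as in the Italian "fame" example later in this section, and that ``ParadigmsIta`` provides ``regA`` in the same way ``ParadigmsFre`` does. As with the other examples, each module goes in its own file.

```
concrete MusicLexIta of MusicLex =
  CatIta ** open ParadigmsIta in {
  lin
    -- "canzone" is feminine; regGenN assumed to be Str -> Gender -> N
    song_N = regGenN "canzone" feminine ;
    -- assumption: regA : Str -> A exists in ParadigmsIta, as in ParadigmsFre
    american_A = regA "americano" ;
}
concrete MusicIta of Music = MusicI with
  (Grammar = GrammarIta),
  (MusicLex = MusicLexIta) ;
```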
More work is needed if the language-independent linearizations in
``MusicI`` are not satisfactory for some language. The resource grammar guarantees
that the linearizations are grammatical in all languages,
but they may still be inadequate for stylistic reasons. Assume,
for the sake of argument, that adjectival modification does not sound good in
English, and that a relative clause would be preferable. One can then start as
before,
```
concrete MusicLexEng of MusicLex =
CatEng ** open ParadigmsEng in {
lin
song_N = regN "song" ;
american_A = regA "American" ;
}
concrete MusicEng0 of Music = MusicI with
(Grammar = GrammarEng),
(MusicLex = MusicLexEng) ;
```
The module ``MusicEng0`` would not be used on the top level, however, but
another module would be built on top of it, with a restricted import from
``MusicEng0``. ``MusicEng`` inherits everything from ``MusicEng0``
except ``PropKind``, and
gives its own definition of this function:
```
concrete MusicEng of Music =
MusicEng0 - [PropKind] ** open GrammarEng in {
lin
PropKind k p =
RelCN k (UseRCl TPres ASimul PPos
(RelVP IdRP (UseComp (CompAP p)))) ;
}
```
===How to find resource functions===
====Inflection paradigms====
Inflection paradigms are defined separately for each language //L//
in the module ``Paradigms``//L//. To test them, the command
``cc`` (= ``compute_concrete``)
can be used:
```
> i -retain german/ParadigmsGer.gf
> cc mkN "Schlange"
{
s : Number => Case => Str = table Number {
Sg => table Case {
Nom => "Schlange" ;
Acc => "Schlange" ;
Dat => "Schlange" ;
Gen => "Schlange"
} ;
Pl => table Case {
Nom => "Schlangen" ;
Acc => "Schlangen" ;
Dat => "Schlangen" ;
Gen => "Schlangen"
}
} ;
g : Gender = Fem
}
```
For the sake of convenience, every language implements these five paradigms:
```
oper
mkN : Str -> N ; -- regular nouns
mkA : Str -> A ; -- regular adjectives
mkV : Str -> V ; -- regular verbs
mkPN : Str -> PN ; -- regular proper names
mkV2 : V -> V2 ; -- direct transitive verbs
```
It is often possible to initialize a lexicon by just using these functions,
and later revise it by using the more involved paradigms. For instance, in
German we cannot use ``mkN "Lied"`` for ``Song``, because the result would be a
masculine noun with the plural form ``"Liede"``.
The individual ``Paradigms`` modules
tell what cases are covered by the regular heuristics.
As a limiting case, one could even initialize the lexicon for a new language
by copying the English (or some other existing) lexicon. This would
produce text with correct grammar but with content words borrowed directly from
English, which may not be so strange in certain technical domains.
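Here is a minimal sketch of this workflow as an alternative version of ``MusicLexGer``, using only paradigms named in this section. The first draft relies on the one-argument ``mkN``. The revision switches to the form with explicit plural and gender that was used for ``Song`` earlier. Check in ``ParadigmsGer`` that these overloaded ``mkN`` and ``mkA`` forms are available in your library version.

```
concrete MusicLexGer of MusicLex =
  CatGer ** open ParadigmsGer in {
  lin
    -- first draft, regular heuristics only: per the text above, this
    -- gives a masculine noun with plural "Liede", so it is replaced
    -- song_N = mkN "Lied" ;
    -- revision: plural form and gender given explicitly
    song_N = mkN "Lied" "Lieder" neuter ;
    -- the regular adjective paradigm is assumed to be adequate here
    american_A = mkA "amerikanisch" ;
}
```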
====Syntax rules====
Syntax rules are found in the module ``Constructors``.
Below this top-level module exposing overloaded constructors,
there are around 10 abstract modules, each defining constructors for
a group of one or more related categories. For instance, the module
``Noun`` defines how to construct common nouns, noun phrases, and determiners.
But these special modules are seldom needed by the users of the library.
TODO: when are they needed?
Browsing the libraries is made easier by the gfdoc-generated HTML pages,
whose LaTeX versions are included in the present document.
====Browsing by the parser====
An alternative to browsing the library documentation is
to use the parser.
Even though parsing is not an intended end-user application
of resource grammars, it is a useful technique for application grammarians
to browse the library. To find out which resource function implements
a particular structure, one can just parse a string that exemplifies this
structure. For instance, to find out how sentences are built using
transitive verbs, write
```
> i english/LangEng.gf
> p -cat=Cl -fcfg "she loves him"
PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))
```
The parser returns original constructors, not overloaded ones.
Parsing with the English resource grammar is acceptably fast, but
for most languages even building the parser requires too many resources.
However, examples parsed in one language can always be linearized into
other languages:
```
> i italian/LangIta.gf
> l PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))
lo ama
```
Therefore, one can use the English parser to write an Italian grammar, and also
to write a language-independent (incomplete) grammar. One can also parse strings
that are bizarre in English but are the natural way of saying something in another language.
For instance, the Italian phrase for "I am hungry" is literally "I have hunger".
This can be built by parsing "I have beer" in ``LangEng`` and then writing
```
lin IamHungry =
let beer_N = regGenN "fame" feminine
in
PredVP (UsePron i_Pron) (ComplV2 have_V2
(DetCN (DetSg MassDet NoOrd) (UseN beer_N))) ;
```
which uses ``regGenN`` from ``ParadigmsIta``.
===Restricted inheritance and qualified opening===