The GF Resource Grammar Library, Version 1.2
Authors: Aarne Ranta, Ali El Dada, Janna Khegai, and Björn Bringert
Last update: %%date(%c)

% NOTE: this is a txt2tags file.

%--!
==Using the resource grammar library TODO==

A resource grammar is a grammar built on linguistic grounds,
to describe a language rather than a domain.
The GF resource grammar library, which contains resource grammars for
10 languages, is described in more detail in the following
documents:
- [Resource library API documentation ../../lib/resource-1.0/doc/]:
  for application grammarians using the resource.
- [Resource writing HOWTO ../../lib/resource-1.0/doc/Resource-HOWTO.html]:
  for resource grammarians developing the resource.

===Coverage===

The GF Resource Grammar Library contains grammar rules for
10 languages (in addition, 2 languages are available as incomplete
implementations, and a few more are under construction). Its purpose
is to make these rules available for application programmers,
who can thereby concentrate on the semantic and stylistic
aspects of their grammars, without having to think about
grammaticality. The targeted application grammarian
is a skilled programmer with
a practical knowledge of the target languages, but without
theoretical knowledge about their grammars.
Such a combination of
skills is typical of programmers who want to localize
software to new languages.

The current resource languages are
- ``Ara``bic
- ``Cat``alan
- ``Dan``ish
- ``Eng``lish
- ``Fin``nish
- ``Fre``nch
- ``Ger``man
- ``Ita``lian
- ``Nor``wegian
- ``Rus``sian
- ``Spa``nish
- ``Swe``dish


The first three letters (``Eng`` etc.) are used in grammar module names.
The Arabic and Catalan implementations are still incomplete, but
complete enough to be used in many applications.

===Interfaces, instances, and functors===

===The simplest way===

The simplest way is to ``open`` a top-level ``Lang`` module
and a ``Paradigms`` module:
```
abstract Foo = ...

concrete FooEng of Foo = open LangEng, ParadigmsEng in ...
concrete FooSwe of Foo = open LangSwe, ParadigmsSwe in ...
```
Here is an example.
```
abstract Arithm = {
  cat
    Prop ;
    Nat ;
  fun
    Zero : Nat ;
    Succ : Nat -> Nat ;
    Even : Nat -> Prop ;
    And  : Prop -> Prop -> Prop ;
}

--# -path=.:alltenses:prelude

concrete ArithmEng of Arithm = open LangEng, ParadigmsEng in {
  lincat
    Prop = S ;
    Nat  = NP ;
  lin
    Zero =
      UsePN (regPN "zero" nonhuman) ;
    Succ n =
      DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 (regN2 "successor") n) ;
    Even n =
      UseCl TPres ASimul PPos
        (PredVP n (UseComp (CompAP (PositA (regA "even"))))) ;
    And x y =
      ConjS and_Conj (BaseS x y) ;
}

--# -path=.:alltenses:prelude

concrete ArithmSwe of Arithm = open LangSwe, ParadigmsSwe in {
  lincat
    Prop = S ;
    Nat  = NP ;
  lin
    Zero =
      UsePN (regPN "noll" neutrum) ;
    Succ n =
      DetCN (DetSg (SgQuant DefArt) NoOrd)
        (ComplN2 (mkN2 (mk2N "efterföljare" "efterföljare")
          (mkPreposition "till")) n) ;
    Even n =
      UseCl TPres ASimul PPos
        (PredVP n (UseComp (CompAP (PositA (regA "jämn"))))) ;
    And x y =
      ConjS and_Conj (BaseS x y) ;
}
```
To give an example application, consider
music playing devices. In the application,
we may have a semantic category ``Kind``, examples
of ``Kind``s being ``Song`` and ``Artist``. In German, for instance, ``Song``
is linearized into the noun "Lied", but knowing this is not
enough to make the application work, because the noun must be
produced in both singular and plural, and in four different
cases. By using the resource grammar library, it is enough to
write
```
lin Song = mkN "Lied" "Lieder" neuter
```
and all eight forms (two numbers times four cases) are correctly generated.
The resource grammar library contains a complete set of inflectional
paradigms (such as ``mkN`` here), enabling the definition of any lexical item.

The resource grammar library is not only about inflectional paradigms: it
also has syntax rules. The music player application
might also want to modify songs with properties, such as "American",
"old", "good". The German grammar for adjectival modifications is
particularly complex, because adjectives have to agree in gender,
number, and case, and also depend on what determiner is used
("ein amerikanisches Lied" vs. "das amerikanische Lied"). All this
variation is taken care of by the resource grammar function
```
fun AdjCN : AP -> CN -> CN
```
(see the tables at the end of this document for the list of all resource grammar
functions). The resource grammar implementation of the rule adding properties
to kinds is
```
lin PropKind kind prop = AdjCN prop kind
```
given that
```
lincat Prop = AP
lincat Kind = CN
```
The resource library API is divided into language-specific
and language-independent parts. To put it roughly,
- the lexicon API is language-specific
- the syntax API is language-independent


Thus, to render the above example in French instead of German, we need to
pick a different linearization of ``Song``,
```
lin Song = mkN "chanson" feminine
```
But to linearize ``PropKind``, we can use the very same rule as in German.
The resource function ``AdjCN`` has different implementations in the two
languages (e.g. a different word order in French),
but the application programmer need not care about the difference.


===Note on APIs===

From version 1.1 onwards, the resource library is available via two
APIs:
- original ``fun`` and ``oper`` definitions
- overloaded ``oper`` definitions


Introducing overloading in GF version 2.7 has been a success in improving
the accessibility of libraries. It has also created a layer of abstraction
between the writers and users of libraries, and thereby makes the library
easier to modify. We shall therefore use the overloaded API
in this document. The original function names are mainly interesting
for those who want to write or modify libraries.


===A complete example===

To summarize the example, and also give a template for a programmer to work on,
here is the complete implementation of a small system with songs and properties.
The abstract syntax defines a "domain ontology":
```
abstract Music = {
  cat
    Kind,
    Property ;
  fun
    PropKind : Kind -> Property -> Kind ;
    Song : Kind ;
    American : Property ;
}
```
The concrete syntax is defined by a functor (parametrized module),
independently of language, by opening
two interfaces: the resource ``Grammar`` and an application lexicon.
```
incomplete concrete MusicI of Music = open Grammar, MusicLex in {
  lincat
    Kind = CN ;
    Property = AP ;
  lin
    PropKind k p = AdjCN p k ;
    Song = UseN song_N ;
    American = PositA american_A ;
}
```
The application lexicon ``MusicLex`` has an abstract syntax that extends
the resource category system ``Cat``.
```
abstract MusicLex = Cat ** {
  fun
    song_N : N ;
    american_A : A ;
}
```
Each language has its own concrete syntax, which opens the
inflectional paradigms module for that language:
```
concrete MusicLexGer of MusicLex =
  CatGer ** open ParadigmsGer in {
  lin
    song_N = reg2N "Lied" "Lieder" neuter ;
    american_A = regA "amerikanisch" ;
}

concrete MusicLexFre of MusicLex =
  CatFre ** open ParadigmsFre in {
  lin
    song_N = regGenN "chanson" feminine ;
    american_A = regA "américain" ;
}
```
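
Following the Note on APIs, the German lexicon can also be written with the overloaded paradigms. The version below is an untested sketch that could replace the ``MusicLexGer`` definition above. It assumes that ``ParadigmsGer`` offers the overloaded ``mkN`` with the two-forms-plus-gender pattern used in the introductory ``Song`` example, and the regular ``mkA : Str -> A`` that every language is said to implement.

```
concrete MusicLexGer of MusicLex =
  CatGer ** open ParadigmsGer in {
  lin
    -- overloaded mkN: singular, plural, gender (original API name: reg2N)
    song_N = mkN "Lied" "Lieder" neuter ;
    -- overloaded mkA: regular adjective (original API name: regA)
    american_A = mkA "amerikanisch" ;
}
```

Since the overloaded names are resolved by argument types, the lexicon no longer needs to mention which paradigm variant is meant.
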
The top-level ``Music`` grammars are obtained by
instantiating the two interfaces of ``MusicI``:
```
concrete MusicGer of Music = MusicI with
  (Grammar = GrammarGer),
  (MusicLex = MusicLexGer) ;

concrete MusicFre of Music = MusicI with
  (Grammar = GrammarFre),
  (MusicLex = MusicLexFre) ;
```
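
To try the German grammar out, it can be imported into the GF shell and a tree of the ``Music`` abstract syntax linearized. This is a sketch that uses only the ``i`` and ``l`` commands appearing elsewhere in this document; it assumes that ``MusicGer.gf`` is in the current directory and carries the ``path`` setting described below. No output is shown, since the exact form printed depends on the library version.

```
> i MusicGer.gf

-- a tree built from the abstract syntax Music
> l PropKind Song American
```

Importing ``MusicFre.gf`` instead tests the French version with the same tree.
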
Both of these files can use the same ``path``, defined as
```
--# -path=.:present:prelude
```
The ``present`` directory contains the compiled resources, restricted to
present tense; ``alltenses`` has the full resources.

To localize the music player system to a new language,
all that is needed is two modules,
one implementing ``MusicLex`` and the other
instantiating ``Music``. The latter is
completely trivial, whereas the former involves the choice of correct
vocabulary and inflectional paradigms. For instance, Finnish is added as follows:
```
concrete MusicLexFin of MusicLex =
  CatFin ** open ParadigmsFin in {
  lin
    song_N = regN "kappale" ;
    american_A = regA "amerikkalainen" ;
}

concrete MusicFin of Music = MusicI with
  (Grammar = GrammarFin),
  (MusicLex = MusicLexFin) ;
```
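
Following the same recipe, a Swedish version might look as follows. This is an untested sketch: it borrows ``mk2N`` and ``regA`` from the ``ArithmSwe`` example earlier in this document, and it assumes that a ``GrammarSwe`` module exists by analogy with ``GrammarGer``, ``GrammarFre``, and ``GrammarFin``.

```
concrete MusicLexSwe of MusicLex =
  CatSwe ** open ParadigmsSwe in {
  lin
    -- singular and plural given explicitly, as for "efterföljare"
    song_N = mk2N "sång" "sånger" ;
    american_A = regA "amerikansk" ;
}

concrete MusicSwe of Music = MusicI with
  (Grammar = GrammarSwe),
  (MusicLex = MusicLexSwe) ;
```
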
More work is of course needed if the language-independent linearizations in
``MusicI`` are not satisfactory for some language. The resource grammar guarantees
that the linearizations are possible in all languages, in the sense of being
grammatical, but they might of course be inadequate for stylistic reasons. Assume,
for the sake of argument, that adjectival modification does not sound good in
English, but that a relative clause would be preferable. One can then start as
before,
```
concrete MusicLexEng of MusicLex =
  CatEng ** open ParadigmsEng in {
  lin
    song_N = regN "song" ;
    american_A = regA "American" ;
}

concrete MusicEng0 of Music = MusicI with
  (Grammar = GrammarEng),
  (MusicLex = MusicLexEng) ;
```
The module ``MusicEng0`` would not be used on the top level, however, but
another module would be built on top of it, with a restricted import from
``MusicEng0``. ``MusicEng`` inherits everything from ``MusicEng0``
except ``PropKind``, and
gives its own definition of this function:
```
concrete MusicEng of Music =
  MusicEng0 - [PropKind] ** open GrammarEng in {
  lin
    PropKind k p =
      RelCN k (UseRCl TPres ASimul PPos
        (RelVP IdRP (UseComp (CompAP p)))) ;
}
```

===To find rules in the resource grammar library===

====Inflection paradigms====

Inflection paradigms are defined separately for each language //L//
in the module ``Paradigms``//L//. To test them, the command
``cc`` (= ``compute_concrete``)
can be used:
```
> i -retain german/ParadigmsGer.gf

> cc mkN "Schlange"
{
  s : Number => Case => Str = table Number {
    Sg => table Case {
      Nom => "Schlange" ;
      Acc => "Schlange" ;
      Dat => "Schlange" ;
      Gen => "Schlange"
    } ;
    Pl => table Case {
      Nom => "Schlangen" ;
      Acc => "Schlangen" ;
      Dat => "Schlangen" ;
      Gen => "Schlangen"
    }
  } ;
  g : Gender = Fem
}
```
For the sake of convenience, every language implements these five paradigms:
```
oper
  mkN  : Str -> N ;    -- regular nouns
  mkA  : Str -> A ;    -- regular adjectives
  mkV  : Str -> V ;    -- regular verbs
  mkPN : Str -> PN ;   -- regular proper names
  mkV2 : V -> V2 ;     -- direct transitive verbs
```
It is often possible to initialize a lexicon by just using these functions,
and later revise it by using the more involved paradigms. For instance, in
German we cannot use ``mkN "Lied"`` for ``Song``, because the result would be a
masculine noun with the plural form ``"Liede"``.
The individual ``Paradigms`` modules
tell what cases are covered by the regular heuristics.

As a limiting case, one could even initialize the lexicon for a new language
by copying the English (or some other already existing) lexicon. This would
produce language with correct grammar but with content words directly borrowed from
English, which may not be so strange in certain technical domains.


====Syntax rules====

Syntax rules should be looked for in the module ``Constructors``.
Below this top-level module exposing overloaded constructors,
there are around 10 abstract modules, each defining constructors for
a group of one or more related categories. For instance, the module
``Noun`` defines how to construct common nouns, noun phrases, and determiners.
But these special modules are seldom needed by the users of the library.

TODO: when are they needed?

Browsing the libraries is helped by the gfdoc-generated HTML pages,
whose LaTeX versions are included in the present document.


====Browsing by the parser====

An alternative to browsing the library documentation is
to use the parser.
Even though parsing is not an intended end-user application
of resource grammars, it is a useful technique for application grammarians
to browse the library. To find out which resource function implements
a particular structure, one can just parse a string that exemplifies this
structure. For instance, to find out how sentences are built using
transitive verbs, write
```
> i english/LangEng.gf

> p -cat=Cl -fcfg "she loves him"

PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))
```
The parser returns original constructors, not overloaded ones.

Parsing with the English resource grammar has an acceptable speed, but
with most languages it takes too many resources even to build the
parser. However, examples parsed in one language can always be linearized into
other languages:
```
> i italian/LangIta.gf

> l PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))

lo ama
```
Therefore, one can use the English parser to write an Italian grammar, and also
to write a language-independent (incomplete) grammar. One can also parse strings
that are bizarre in English but the intended way of expression in another language.
For instance, the phrase for "I am hungry" in Italian is literally "I have hunger".
This can be built by parsing "I have beer" in ``LangEng`` and then writing
```
lin IamHungry =
  let beer_N = regGenN "fame" feminine
  in
  PredVP (UsePron i_Pron) (ComplV2 have_V2
    (DetCN (DetSg MassDet NoOrd) (UseN beer_N))) ;
```
which uses ``ParadigmsIta.regGenN``.


===How to find resource functions===

The definitions in the ``Arithm`` example were found by parsing:
```
> i LangEng.gf

-- for Successor:
> p -cat=NP -mcfg -parser=topdown "the mother of Paris"

-- for Even:
> p -cat=S -mcfg -parser=topdown "Paris is old"

-- for And:
> p -cat=S -mcfg -parser=topdown "Paris is old and I am old"
```
The use of parsing can be systematized by **example-based grammar writing**,
to which we will return later.


===A functor implementation===

The interesting thing now is that the
code in ``ArithmSwe`` is similar to the code in ``ArithmEng``, except for
some lexical items ("noll" vs. "zero", "efterföljare" vs. "successor",
"jämn" vs. "even"). How can we exploit the similarities and
actually share code between the languages?

The solution is to use a functor: an ``incomplete`` module that opens
an ``abstract`` as an ``interface``, and then instantiate it to different
languages that implement the interface. The structure is as follows:
```
abstract Foo ...

incomplete concrete FooI of Foo = open Lang, Lex in ...

concrete FooEng of Foo = FooI with (Lang=LangEng), (Lex=LexEng) ;
concrete FooSwe of Foo = FooI with (Lang=LangSwe), (Lex=LexSwe) ;
```
where ``Lex`` is an abstract lexicon that includes the vocabulary
specific to this application:
```
abstract Lex = Cat ** ...

concrete LexEng of Lex = CatEng ** open ParadigmsEng in ...
concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in ...
```
Here, again, a complete example (``abstract Arithm`` is as above):
```
incomplete concrete ArithmI of Arithm = open Lang, Lex in {
  lincat
    Prop = S ;
    Nat  = NP ;
  lin
    Zero =
      UsePN zero_PN ;
    Succ n =
      DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 successor_N2 n) ;
    Even n =
      UseCl TPres ASimul PPos
        (PredVP n (UseComp (CompAP (PositA even_A)))) ;
    And x y =
      ConjS and_Conj (BaseS x y) ;
}

--# -path=.:alltenses:prelude
concrete ArithmEng of Arithm = ArithmI with
  (Lang = LangEng),
  (Lex = LexEng) ;

--# -path=.:alltenses:prelude
concrete ArithmSwe of Arithm = ArithmI with
  (Lang = LangSwe),
  (Lex = LexSwe) ;

abstract Lex = Cat ** {
  fun
    zero_PN : PN ;
    successor_N2 : N2 ;
    even_A : A ;
}

concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in {
  lin
    zero_PN = regPN "noll" neutrum ;
    successor_N2 =
      mkN2 (mk2N "efterföljare" "efterföljare") (mkPreposition "till") ;
    even_A = regA "jämn" ;
}
```

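The functor example instantiates ``Lex = LexEng`` but only shows ``LexSwe``. An English lexicon can be sketched by moving the lexical items out of the non-functor ``ArithmEng`` given earlier; the paradigm calls below are copied from that module, and the sketch has not been compiled.

```
concrete LexEng of Lex = CatEng ** open ParadigmsEng in {
  lin
    -- the same paradigm calls as in the non-functor ArithmEng
    zero_PN = regPN "zero" nonhuman ;
    successor_N2 = regN2 "successor" ;
    even_A = regA "even" ;
}
```

With this module in place, the language-specific part of ``ArithmEng`` shrinks to the three lexical entries, while the syntactic rules live once in ``ArithmI``.
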
===Restricted inheritance and qualified opening===