The GF Resource Grammar Library
Author: Aarne Ranta
Last update: %%date(%c)

% NOTE: this is a txt2tags file.
% Create a LaTeX file from this file using:
% txt2tags -ttex --toc gf-formalism.txt

%!target:tex


This document is about the
GF Resource Grammar Library. It presupposes knowledge of GF and its
module system, knowledge that can be acquired e.g. from the GF
tutorial. We start with an introduction to the library, and proceed to
details with the aim of covering all that one needs to know
in order to use the library.
How to write one's own resource grammar (i.e. how to implement the API for
a new language) is covered by a separate Resource-HOWTO document.


==Motivation==

The GF Resource Grammar Library contains grammar rules for
10 languages (some more are under construction). Its purpose
is to make these rules available for application programmers,
who can thereby concentrate on the semantic and stylistic
aspects of their grammars, without having to think about
grammaticality. The intended application grammarian
is a skilled programmer with
a practical knowledge of the target languages, but without
theoretical knowledge of their grammars.
Such a combination of
skills is typical of programmers who want to localize
software to new languages.

The current resource languages are
- ``Dan``ish
- ``Eng``lish
- ``Fin``nish
- ``Fre``nch
- ``Ger``man
- ``Ita``lian
- ``Nor``wegian
- ``Rus``sian
- ``Spa``nish
- ``Swe``dish


The first three letters (``Dan`` etc.) are used in grammar module names.

To give an example application, consider
music playing devices. In the application,
we may have a semantic category ``Kind``, examples
of ``Kind``s being ``Song`` and ``Artist``. In German, for instance, ``Song``
is linearized into the noun "Lied", but knowing this is not
enough to make the application work, because the noun must be
produced in both singular and plural, and in four different
cases. By using the resource grammar library, it is enough to
write
```
lin Song = reg2N "Lied" "Lieder" neuter ;
```
and the eight forms are correctly generated. The resource grammar
library contains a complete set of inflectional paradigms (such as
``reg2N`` here), making it possible to define any lexical item.

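To make the "eight forms" concrete, here is a toy Python model of a two-form noun paradigm. It returns a ``Number`` by ``Case`` table, much like the record that ``cc`` prints further below. The name ``reg2n_sketch`` and its ending rules are simplifying assumptions and are not the actual ``ParadigmsGer`` code: genitive singular adds "es" for non-feminine nouns, and dative plural adds "n" unless the plural already ends in "n" or "s".

```
from itertools import product

NUMBERS = ("Sg", "Pl")
CASES = ("Nom", "Acc", "Dat", "Gen")

def reg2n_sketch(sg, pl, gender):
    """Toy two-form noun paradigm returning a Number x Case table.

    Assumed rules, for illustration only: masculine and neuter nouns add
    "es" in the genitive singular; the dative plural adds "n" unless the
    plural form already ends in "n" or "s".
    """
    gen_sg = sg if gender == "feminine" else sg + "es"
    dat_pl = pl if pl.endswith(("n", "s")) else pl + "n"
    table = {}
    for number, case in product(NUMBERS, CASES):
        if number == "Sg":
            table[(number, case)] = gen_sg if case == "Gen" else sg
        else:
            table[(number, case)] = dat_pl if case == "Dat" else pl
    return {"s": table, "g": gender}

lied = reg2n_sketch("Lied", "Lieder", "neuter")
for (number, case), form in lied["s"].items():
    print(number, case, form)
```

Applied to "Schlange"/"Schlangen" with feminine gender, the same toy function produces the same table that ``cc regN "Schlange"`` displays in the section on inflection paradigms. Comparing against ``cc`` output in this way is a convenient check for a sketch like this one.
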
The resource grammar library is not only about inflectional paradigms: it
also has syntax rules. The music player application
might also want to modify songs with properties, such as "American",
"old", "good". The German grammar for adjectival modifications is
particularly complex, because adjectives have to agree in gender,
number, and case, and also depend on what determiner is used
("ein amerikanisches Lied" vs. "das amerikanische Lied"). All this
variation is taken care of by the resource grammar function
```
fun AdjCN : AP -> CN -> CN ;
```

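The dependence on the determiner can be seen in a toy model of a tiny slice of German adjective declension. It covers only the nominative singular, with one ending after the definite article and another after the indefinite article. The tables and the function ``np_nom_sg`` are illustrative assumptions. The resource grammar covers the full range of genders, numbers, cases, and determiners.

```
# Nominative singular only: (after definite article, after indefinite article).
ARTICLES = {"masculine": ("der", "ein"),
            "feminine": ("die", "eine"),
            "neuter": ("das", "ein")}
ENDINGS = {"masculine": ("e", "er"),
           "feminine": ("e", "e"),
           "neuter": ("e", "es")}

def np_nom_sg(definite, adj_stem, noun, gender):
    """Article + adjective + noun; the adjective ending depends on both
    the gender of the noun and the kind of determiner."""
    i = 0 if definite else 1
    return f"{ARTICLES[gender][i]} {adj_stem}{ENDINGS[gender][i]} {noun}"

print(np_nom_sg(False, "amerikanisch", "Lied", "neuter"))  # ein amerikanisches Lied
print(np_nom_sg(True, "amerikanisch", "Lied", "neuter"))   # das amerikanische Lied
```

Even this small slice needs two lookup keys (gender and determiner kind) for a single adjective form. That is the kind of bookkeeping ``AdjCN`` hides from the application programmer.
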
Using the resource grammar, the application rule that adds properties
to kinds is implemented as
```
lin PropKind kind prop = AdjCN prop kind ;
```
given that
```
lincat Property = AP ;
lincat Kind = CN ;
```
The resource library API is divided into language-specific
and language-independent parts. To put it roughly,
- the lexicon API is language-specific
- the syntax API is language-independent


Thus, to render the above example in French instead of German, we need to
pick a different linearization of ``Song``,
```
lin Song = regGenN "chanson" feminine ;
```
But to linearize ``PropKind``, we can use the very same rule as in German.
The resource function ``AdjCN`` has different implementations in the two
languages (e.g. a different word order in French),
but the application programmer need not care about the difference.


===A complete example===

To summarize the example, and also give a template for a programmer to work on,
here is the complete implementation of a small system with songs and properties.
The abstract syntax defines a "domain ontology":
```
abstract Music = {
  cat
    Kind,
    Property ;
  fun
    PropKind : Kind -> Property -> Kind ;
    Song : Kind ;
    American : Property ;
}
```
The concrete syntax is defined by a functor (parametrized module),
independently of language, by opening
two interfaces: the resource ``Grammar`` and an application lexicon.
```
incomplete concrete MusicI of Music = open Grammar, MusicLex in {
  lincat
    Kind = CN ;
    Property = AP ;
  lin
    PropKind k p = AdjCN p k ;
    Song = UseN song_N ;
    American = PositA american_A ;
}
```
The application lexicon ``MusicLex`` has an abstract syntax that extends
the resource category system ``Cat``.
```
abstract MusicLex = Cat ** {
  fun
    song_N : N ;
    american_A : A ;
}
```
Each language has its own concrete syntax, which opens the
inflectional paradigms module for that language:
```
concrete MusicLexGer of MusicLex = CatGer ** open ParadigmsGer in {
  lin
    song_N = reg2N "Lied" "Lieder" neuter ;
    american_A = regA "amerikanisch" ;
}

concrete MusicLexFre of MusicLex = CatFre ** open ParadigmsFre in {
  lin
    song_N = regGenN "chanson" feminine ;
    american_A = regA "américain" ;
}
```
The top-level ``Music`` grammars are obtained by
instantiating the two interfaces of ``MusicI``:
```
concrete MusicGer of Music = MusicI with
  (Grammar = GrammarGer),
  (MusicLex = MusicLexGer) ;

concrete MusicFre of Music = MusicI with
  (Grammar = GrammarFre),
  (MusicLex = MusicLexFre) ;
```
Both of these files can use the same ``path``, defined as
```
--# -path=.:present:prelude
```
The ``present`` directory contains the compiled resources, restricted to
present tense; ``alltenses`` has the full resources.

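The functor mechanism can be imitated in Python with a function that is written once against two interfaces and then applied to language-specific implementations of them. This is a rough analogy to ``MusicI with (Grammar = ...), (MusicLex = ...)``. In the sketch, ``music_i``, ``grammar_ger``, ``grammar_fre``, ``lex_ger``, ``lex_fre``, and the record fields are all hypothetical stand-ins, not GF's API. The German adjective is given in the weak form used after a definite article, which sidesteps the determiner dependence discussed earlier.

```
from types import SimpleNamespace

def music_i(grammar, lex):
    """Toy 'functor' MusicI: written once, against two interfaces.

    grammar provides use_n, posit_a, adj_cn (stand-in for Grammar);
    lex provides song_N, american_A (stand-in for MusicLex).
    """
    return SimpleNamespace(
        Song=lambda: grammar.use_n(lex.song_N),
        American=lambda: grammar.posit_a(lex.american_A),
        PropKind=lambda k, p: grammar.adj_cn(p, k),
    )

# Toy "GrammarGer": adjective before the noun, weak ending -e.
grammar_ger = SimpleNamespace(
    use_n=lambda n: {"s": n["s"], "g": n["g"]},
    posit_a=lambda a: {"s": lambda g: a["s"] + "e"},
    adj_cn=lambda ap, cn: {"s": ap["s"](cn["g"]) + " " + cn["s"], "g": cn["g"]},
)

# Toy "GrammarFre": adjective after the noun, agreeing in gender.
grammar_fre = SimpleNamespace(
    use_n=lambda n: {"s": n["s"], "g": n["g"]},
    posit_a=lambda a: {"s": lambda g: a["s"] + ("e" if g == "fem" else "")},
    adj_cn=lambda ap, cn: {"s": cn["s"] + " " + ap["s"](cn["g"]), "g": cn["g"]},
)

lex_ger = SimpleNamespace(song_N={"s": "Lied", "g": "neut"},
                          american_A={"s": "amerikanisch"})
lex_fre = SimpleNamespace(song_N={"s": "chanson", "g": "fem"},
                          american_A={"s": "américain"})

# Roughly: concrete MusicGer of Music = MusicI with (Grammar = GrammarGer), ...
music_ger = music_i(grammar_ger, lex_ger)
music_fre = music_i(grammar_fre, lex_fre)

for m in (music_ger, music_fre):
    print(m.PropKind(m.Song(), m.American())["s"])
# amerikanische Lied
# chanson américaine
```

The body of ``music_i`` never mentions word order or gender. Those details live entirely in the two "grammar" objects, which is why adding a language only requires new arguments to the same function.
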
To localize the music player system to a new language,
only two modules are needed,
one implementing ``MusicLex`` and the other
instantiating ``Music``. The latter is
completely trivial, whereas the former involves the choice of correct
vocabulary and inflectional paradigms. For instance, Finnish is added as follows:
```
concrete MusicLexFin of MusicLex = CatFin ** open ParadigmsFin in {
  lin
    song_N = regN "kappale" ;
    american_A = regA "amerikkalainen" ;
}

concrete MusicFin of Music = MusicI with
  (Grammar = GrammarFin),
  (MusicLex = MusicLexFin) ;
```
More work is of course needed if the language-independent linearizations in
``MusicI`` are not satisfactory for some language. The resource grammar guarantees
that the linearizations are possible in all languages, in the sense that they are grammatical,
but they might still be inadequate for stylistic reasons. Assume,
for the sake of argument, that adjectival modification does not sound good in
English, but that a relative clause would be preferable. One can then start as
before,
```
concrete MusicLexEng of MusicLex = CatEng ** open ParadigmsEng in {
  lin
    song_N = regN "song" ;
    american_A = regA "American" ;
}

concrete MusicEng0 of Music = MusicI with
  (Grammar = GrammarEng),
  (MusicLex = MusicLexEng) ;
```
The module ``MusicEng0`` would not be used on the top level, however.
Instead, another module is built on top of it, with a restricted import from
``MusicEng0``: ``MusicEng`` inherits everything from ``MusicEng0``
except ``PropKind``, and
gives its own definition of this function:
```
concrete MusicEng of Music = MusicEng0 - [PropKind] ** open GrammarEng in {
  lin
    PropKind k p =
      RelCN k (UseRCl TPres ASimul PPos (RelVP IdRP (UseComp (CompAP p)))) ;
}
```


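Restricted inheritance with an override can be pictured as dictionary manipulation. You copy every linearization rule except the excluded ones, then add the new definitions. This is only an analogy to ``MusicEng0 - [PropKind] ** {...}``. The helper ``restrict_and_override`` and the string-only English rules are hypothetical, and the relative-clause output is a toy string, not what ``RelCN`` and ``IdRP`` actually produce.

```
def restrict_and_override(base, excluded, overrides):
    """Imitate 'MusicEng0 - [PropKind] ** {...}': inherit every rule of
    `base` except those in `excluded`, then add the `overrides`."""
    inherited = {name: rule for name, rule in base.items() if name not in excluded}
    clash = inherited.keys() & overrides.keys()
    if clash:
        raise ValueError(f"rules defined twice: {sorted(clash)}")
    return {**inherited, **overrides}

# Toy English linearizations (plain strings, no agreement).
music_eng0 = {
    "Song": lambda: "song",
    "American": lambda: "American",
    "PropKind": lambda k, p: f"{p} {k}",        # adjectival modification
}

music_eng = restrict_and_override(
    music_eng0,
    excluded={"PropKind"},
    overrides={"PropKind": lambda k, p: f"{k} that is {p}"},  # relative clause
)

for grammar in (music_eng0, music_eng):
    print(grammar["PropKind"](grammar["Song"](), grammar["American"]()))
# American song
# song that is American
```

The sketch raises an error if an override collides with a rule that is still inherited. This is a design choice of the sketch that makes the role of the exclusion list explicit. ``Song`` and ``American`` are shared unchanged between the two dictionaries.

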
===Parsing with resource grammars?===

The intended use of the resource grammar is as a library for writing
application grammars. It is not designed for parsing e.g. newspaper text. There
are several reasons why this is not practical:
- Efficiency: the resource grammar uses complex data structures, in
particular, discontinuous constituents, which make parsing slow and the
parser size huge.
- Completeness: the resource grammar does not necessarily cover all rules
of the language, only enough of them to be able to express everything
in one way or another.
- Lexicon: the resource grammar has a very small lexicon, only meant for test
purposes.
- Semantics: the resource grammar has very little semantic control, and may
accept strange input or deliver strange interpretations.
- Ambiguity: parsing in the resource grammar may return lots of results, many
of which are implausible.


All of these problems should be solved in application grammars.
The task of resource grammars is just to take care of low-level linguistic
details such as inflection, agreement, and word order.

For the same reasons, resource grammars are not adequate for translation.
Since the syntax API is implemented for different languages, it is of course
possible to translate via it, but there is no guarantee of translation
equivalence. The use of functor implementations such as ``MusicI``
above only extends to those cases where the syntax API does give translation
equivalence. This must be seen as a limiting case, and bigger applications
will often use only restricted inheritance of ``MusicI``.



==To find rules in the resource grammar library==

===Inflection paradigms===

Inflection paradigms are defined separately for each language //L//
in the module ``Paradigms``//L//. To test them, the command
``cc`` (= ``compute_concrete``)
can be used:
```
> i -retain german/ParadigmsGer.gf

> cc regN "Schlange"
{
  s : Number => Case => Str = table Number {
    Sg => table Case {
      Nom => "Schlange" ;
      Acc => "Schlange" ;
      Dat => "Schlange" ;
      Gen => "Schlange"
    } ;
    Pl => table Case {
      Nom => "Schlangen" ;
      Acc => "Schlangen" ;
      Dat => "Schlangen" ;
      Gen => "Schlangen"
    }
  } ;
  g : Gender = Fem
}
```
For the sake of convenience, every language implements these four paradigms:
```
oper
  regN : Str -> N ;   -- regular nouns
  regA : Str -> A ;   -- regular adjectives
  regV : Str -> V ;   -- regular verbs
  dirV : V -> V2 ;    -- direct transitive verbs
```
It is often possible to initialize a lexicon by just using these functions,
and later revise it by using the more involved paradigms. For instance, in
German we cannot use ``regN "Lied"`` for ``Song``, because the result would be a
masculine noun with the plural form ``"Liede"``.
The individual ``Paradigms`` modules
document which cases are covered by the regular heuristics.

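The "start regular, then revise" workflow can be illustrated with a toy guessing heuristic that reproduces the two German results mentioned in this section. It yields feminine "Schlangen" for "Schlange" and masculine "Liede" for "Lied". The rule "ends in -e means feminine with plural -n, anything else is masculine with plural -e" is an assumption made for the sketch. The real ``ParadigmsGer`` heuristics may differ, and the function names are hypothetical.

```
def reg_n_guess(sg):
    """Toy German noun heuristic (assumption, not ParadigmsGer):
    nouns ending in -e are feminine with plural -n,
    all other nouns are masculine with plural -e."""
    if sg.endswith("e"):
        return {"pl": sg + "n", "g": "feminine"}
    return {"pl": sg + "e", "g": "masculine"}

def lexicon_entry(sg, exceptions):
    """Start from the heuristic, but let a hand-written entry win."""
    return exceptions.get(sg, reg_n_guess(sg))

# The revision step: a more informative paradigm for the irregular noun,
# in the spirit of reg2N "Lied" "Lieder" neuter.
exceptions = {"Lied": {"pl": "Lieder", "g": "neuter"}}

print(reg_n_guess("Schlange"))            # {'pl': 'Schlangen', 'g': 'feminine'}
print(reg_n_guess("Lied"))                # {'pl': 'Liede', 'g': 'masculine'}
print(lexicon_entry("Lied", exceptions))  # {'pl': 'Lieder', 'g': 'neuter'}
```
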
As a limiting case, one could even initialize the lexicon for a new language
by copying the English (or some other already existing) lexicon. This would
produce language with correct grammar but with content words directly borrowed from
English, which may not be so strange in certain technical domains.



===Syntax rules===

Syntax rules should be looked for in the abstract modules defining the
API. There are around 10 such modules, each defining constructors for
a group of one or more related categories. For instance, the module
``Noun`` defines how to construct common nouns, noun phrases, and determiners.
Thus the proper place to find out how nouns are modified with adjectives
is ``Noun``, because the result of the construction is again a common noun.

Browsing the libraries is helped by the gfdoc-generated HTML pages,
whose LaTeX versions are included in the present document.
However, this is still not easy, and the most efficient way is
probably to use the parser.
Even though parsing is not an intended end-user application
of resource grammars, it is a useful technique for application grammarians
to browse the library. To find out which resource function implements
a particular structure, one can just parse a string that exemplifies this
structure. For instance, to find out how sentences are built using
transitive verbs, write
```
> i english/LangEng.gf

> p -cat=Cl -fcfg "she loves him"

PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))
```
Parsing with the English resource grammar has an acceptable speed, but
with most languages it takes too many resources even to build the
parser. However, examples parsed in one language can always be linearized into
other languages:
```
> i italian/LangIta.gf

> l PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))

lo ama
```
Therefore, one can use the English parser to write an Italian grammar, and also
to write a language-independent (incomplete) grammar. One can also parse strings
that are odd in English but correspond to the natural way of expressing something in another language.
For instance, the phrase for "I am hungry" in Italian is literally "I have hunger".
This can be built by parsing "I have beer" in ``LangEng`` and then writing
```
lin IamHungry =
  let beer_N = regGenN "fame" feminine
  in
  PredVP (UsePron i_Pron) (ComplV2 have_V2
    (DetCN (DetSg MassDet NoOrd) (UseN beer_N))) ;
```
which uses ``ParadigmsIta.regGenN``.


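The parse-in-English, linearize-in-Italian workflow described above can be mimicked on a toy scale. The toy uses one tree shape, two tiny lexicons, and a brute-force "parser" that generates all trees of that shape and keeps those whose English linearization matches the input. Everything here except the constructor names taken from the trees above (the lexicon tables and the functions ``lin_eng``, ``lin_ita``, ``parse_eng``) is a hypothetical illustration. The GF commands ``p`` and ``l`` do the real work. Italian elision is not modelled.

```
from itertools import product

# Toy abstract tree for "she loves him", using the constructor names above.
TREE = ("PredVP", ("UsePron", "she_Pron"),
        ("ComplV2", "love_V2", ("UsePron", "he_Pron")))

# Toy lexicons with just the forms this one tree shape needs (assumption).
ENG = {"she_Pron": {"nom": "she", "acc": "her"},
       "he_Pron": {"nom": "he", "acc": "him"},
       "love_V2": {"3sg": "loves"}}
ITA = {"she_Pron": {"clitic": "la"},
       "he_Pron": {"clitic": "lo"},
       "love_V2": {"3sg": "ama"}}

def lin_eng(tree):
    _, (_, subj), (_, verb, (_, obj)) = tree
    return f"{ENG[subj]['nom']} {ENG[verb]['3sg']} {ENG[obj]['acc']}"

def lin_ita(tree):
    # The subject pronoun is dropped and the object pronoun becomes a
    # clitic placed before the verb.
    _, _subj, (_, verb, (_, obj)) = tree
    return f"{ITA[obj]['clitic']} {ITA[verb]['3sg']}"

def parse_eng(text):
    """Brute-force 'parser': generate every tree of this one shape and
    keep those whose English linearization matches the input."""
    pronouns = ["she_Pron", "he_Pron"]
    trees = [("PredVP", ("UsePron", s), ("ComplV2", "love_V2", ("UsePron", o)))
             for s, o in product(pronouns, pronouns)]
    return [t for t in trees if lin_eng(t) == text]

for tree in parse_eng("she loves him"):
    print(lin_ita(tree))   # lo ama
```

The point of the sketch is the division of labour. The tree is the only thing shared between the languages, and each language contributes only its own linearization function.

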
===Example-based grammar writing===

The technique of parsing with the resource grammar can be used in GF source files
with the suffix ``.gfe`` ("GF examples"). The suffix tells GF to preprocess
the file by replacing all expressions of the form
```
in Module.Cat "example string"
```
by the syntax trees obtained by parsing "example string" in ``Cat`` in ``Module``.
For instance,
```
lin IamHungry =
  let beer_N = regGenN "fame" feminine
  in
  (in LangEng.Cl "I have beer") ;
```
will result in the rule displayed in the previous section. The normal binding rules
of functional programming (and GF) guarantee that local bindings of identifiers
take precedence over constants of the same name. Thus it is also possible to
linearize functions taking arguments in this way:
```
lin
  PropKind car_N old_A = in LangEng.CN "old car" ;
```
However, the technique of example-based grammar writing has some limitations:
- Ambiguity. If a string has several parses, the first one is returned, and
it may not be the intended one. The other parses are shown in a comment, from
which they can be picked manually.
- Lexicality. The arguments of a function must be atomic identifiers, and are thus
not available for categories that have no lexical items.
For instance, the ``PropKind`` rule above gives the result
```
lin
  PropKind car_N old_A = AdjCN (PositA old_A) (UseN car_N) ;
```
However, it is possible to write a special lexicon that gives atomic rules for
all those categories that can be used as arguments, for instance,
```
fun
  car_CN : CN ;
  old_AP : AP ;
```
and then use this lexicon instead of the standard one included in ``Lang``.



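The preprocessing step of ``.gfe`` files can be sketched as a regular-expression substitution. Each ``in Module.Cat "..."`` expression is replaced by its first parse, and any further parses are kept in a GF block comment. This is a hypothetical reimplementation of the behaviour described above, not GF's own preprocessor. The ``parse`` callback stands in for the real parser, and the stub below only knows the one tree quoted in the previous section.

```
import re

# Matches an example expression:  in Module.Cat "example string"
EXAMPLE = re.compile(r'\bin\s+(\w+)\.(\w+)\s+"([^"]*)"')

def preprocess_gfe(source, parse):
    """Replace each example expression in `source` by its first parse.

    `parse(module, cat, text)` is a hypothetical callback returning a list
    of tree strings. Further parses are kept in a GF block comment so
    that they can be picked manually.
    """
    def replace(match):
        module, cat, text = match.groups()
        trees = parse(module, cat, text)
        if not trees:
            raise ValueError(f"no parse for {text!r} in {module}.{cat}")
        result = trees[0]
        if len(trees) > 1:
            result += " {- other parses: " + " | ".join(trees[1:]) + " -}"
        return result
    return EXAMPLE.sub(replace, source)

# Stub parser that only knows the "I have beer" example.
KNOWN = {("LangEng", "Cl", "I have beer"):
         ["PredVP (UsePron i_Pron) (ComplV2 have_V2 "
          "(DetCN (DetSg MassDet NoOrd) (UseN beer_N)))"]}

def fake_parse(module, cat, text):
    return KNOWN.get((module, cat, text), [])

src = ('lin IamHungry = let beer_N = regGenN "fame" feminine '
       'in (in LangEng.Cl "I have beer") ;')
print(preprocess_gfe(src, fake_parse))
```

Because the substitution is purely textual, the local binding of ``beer_N`` in the ``let`` is untouched. The identifier in the inserted tree is then resolved by GF's normal scoping, as described above.

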
===Special-purpose APIs===

To give an analogy with the well-known typesetting software, GF can be compared
with TeX and the resource grammar library with LaTeX.
Just as TeX frees the author
from thinking about low-level problems of page layout, so GF frees the grammarian
from writing parsing and generation algorithms. But quite a lot of knowledge of
//how// to write grammars is still needed, and the resource grammar library helps
GF grammarians in a way similar to how the LaTeX macro package helps TeX authors.

But even LaTeX is often too detailed and low-level, and users are encouraged to
develop their own macro packages. The same applies to GF resource grammars:
the application grammarian might not need all the choices that the resource
provides, but would prefer less writing and higher-level programming.
To this end, application grammarians may want to write their own views on the
resource grammar. An example of this is already provided, in
``mathematical/Predication``.
Instead of the ``NP-VP`` structure, it permits clause construction directly from
verbs and adjectives and their arguments:
```
predV     : V  -> NP -> Cl ;               -- "x converges"
predV2    : V2 -> NP -> NP -> Cl ;         -- "x intersects y"
predV3    : V3 -> NP -> NP -> NP -> Cl ;   -- "x intersects y at z"
predVColl : V  -> NP -> NP -> Cl ;         -- "x and y intersect"
predA     : A  -> NP -> Cl ;               -- "x is even"
predA2    : A2 -> NP -> NP -> Cl ;         -- "x is divisible by y"
```
The implementation of this module is the functor ``PredicationI``:
```
predV v x = PredVP x (UseV v) ;
predV2 v x y = PredVP x (ComplV2 v y) ;
predV3 v x y z = PredVP x (ComplV3 v y z) ;
predVColl v x y = PredVP (ConjNP and_Conj (BaseNP x y)) (UseV v) ;
predA a x = PredVP x (UseComp (CompAP (PositA a))) ;
predA2 a x y = PredVP x (UseComp (CompAP (ComplA2 a y))) ;
```
Of course, ``Predication`` can be opened together with ``Grammar``, but using
the resulting grammar for parsing can be frustrating, since having both
ways of building clauses simultaneously available will produce spurious
ambiguities. But using just ``Predication`` without ``Verb``
for parsing is a good idea,
since parsing is more efficient without rules producing verb phrases.

The use of special-purpose APIs is to some extent just an alternative
to grammar writing by parsing, and its importance may decrease as parsing
with resource grammars becomes more practical.



==Overview of syntactic structures==

===Texts, phrases, and utterances===

The outermost linguistic structure is ``Text``. ``Text``s are composed
from Phrases (``Phr``) followed by punctuation marks, one of ".", "?" or
"!" (with their proper variants in Spanish and Arabic). Here is an
example of a ``Text`` string.
```
John walks. Why? He doesn't want to sleep!
```
Phrases are mostly built from Utterances (``Utt``), which in turn are
declarative sentences, questions, or imperatives, but there
are also "one-word utterances" consisting of noun phrases
or other subsentential phrases. Some Phrases are atomic,
for instance "yes" and "no". Here are some examples of Phrases.
```
yes
come on, John
but John walks
give me the stick please
don't you know that he is sleeping
a glass of wine
a glass of wine please
```
There is no connection between the punctuation marks and the
types of utterances. This reflects the fact that the punctuation
mark in a real text is selected as a function of the speech act
rather than the grammatical form of an utterance. The following
text is thus well-formed.
```
John walks. John walks? John walks!
```
What is the difference between Phrase and Utterance? Just technical:
a Phrase is an Utterance with an optional leading conjunction ("but")
and an optional trailing vocative ("John", "please").


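The layering can be made concrete with a toy model in which a Phrase wraps an Utterance with an optional conjunction and vocative, and a Text chains phrases, each closed by a punctuation mark. The Python names are hypothetical. ``t_full_stop``, ``t_quest_mark``, and ``T_EMPTY`` only echo ``TFullStop``, ``TQuestMark``, and ``TEmpty`` from the tree further below, ``excl_mark`` is an invented name for the "!" case, and capitalizing the first letter is an assumption of the sketch.

```
def phr_utt(pconj, utt, voc):
    """Toy PhrUtt: optional leading conjunction + utterance + optional vocative."""
    return " ".join(part for part in (pconj, utt, voc) if part)

def punct(mark):
    """Build a toy Text constructor that closes a phrase with `mark`
    and prepends it to the rest of the text."""
    def build(phr, text):
        sentence = phr[:1].upper() + phr[1:] + mark
        return sentence + (" " + text if text else "")
    return build

t_full_stop = punct(".")   # cf. TFullStop
t_quest_mark = punct("?")  # cf. TQuestMark
excl_mark = punct("!")     # hypothetical name for the "!" case
T_EMPTY = ""               # cf. TEmpty

john_walks = phr_utt(None, "John walks", None)
text = t_full_stop(john_walks, t_quest_mark(john_walks, excl_mark(john_walks, T_EMPTY)))
print(text)  # John walks. John walks? John walks!
print(t_full_stop(phr_utt("but", "John walks", "please"), T_EMPTY))  # But John walks please.
```

Any punctuation constructor accepts any phrase, which mirrors the point made above: the mark is chosen by the speech act, not by the grammatical form.

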
===Sentences and clauses===

The richest of the categories below Utterance is ``S``, Sentence. A Sentence
is formed from a Clause (``Cl``), by fixing its Tense, Anteriority, and Polarity.
The difference between Sentence and Clause is thus also rather technical.
For example, each of the following strings has a distinct syntax tree
in the category Sentence:
```
John walks
John doesn't walk
John walked
John didn't walk
John has walked
John hasn't walked
John will walk
John won't walk
...
```
whereas in the category Clause all of them are just different forms of
the same tree.

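One way to picture "different forms of the same tree" is to treat the Clause as a function of Tense, Anteriority, and Polarity, with a Sentence as that function applied to fixed values. The toy below covers one clause, one regular verb, and only three tenses. The names ``cl_john_walk`` and ``use_cl`` are hypothetical, and only the parameter values are taken from the tree table below.

```
from itertools import product

TENSES = ("TPres", "TPast", "TFut")
ANTS = ("ASimul", "AAnter")
POLS = ("PPos", "PNeg")

# Negated auxiliaries, in the contracted forms used in the examples above.
NEG = {"does": "doesn't", "did": "didn't", "has": "hasn't",
       "had": "hadn't", "will": "won't"}

def cl_john_walk(tense, ant, pol):
    """Toy Clause: one tree whose linearization is a table over
    Tense x Anteriority x Polarity (only three tenses are covered)."""
    neg = pol == "PNeg"
    if ant == "AAnter":
        aux, rest = {"TPres": ("has", "walked"),
                     "TPast": ("had", "walked"),
                     "TFut": ("will", "have walked")}[tense]
    elif tense == "TFut":
        aux, rest = "will", "walk"
    elif not neg:
        # Simple positive present and past need no auxiliary.
        return "John " + {"TPres": "walks", "TPast": "walked"}[tense]
    else:
        aux, rest = {"TPres": "does", "TPast": "did"}[tense], "walk"
    return f"John {NEG[aux] if neg else aux} {rest}"

def use_cl(tense, ant, pol, cl):
    """Toy UseCl: a Sentence is a Clause with the three parameters fixed."""
    return cl(tense, ant, pol)

for tense, ant, pol in product(TENSES, ANTS, POLS):
    print(f"{tense:5} {ant:6} {pol:4}  {use_cl(tense, ant, pol, cl_john_walk)}")
```

The loop prints twelve distinct sentences from a single clause, one per parameter combination.
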
The following syntax tree of the Text "John walks." gives an overview
of the structural levels.
```
Node  Constructor    Value type  Other constructors
---------------------------------------------------
 1.   TFullStop      Text        TQuestMark
 2.   (PhrUtt        Phr
 3.   NoPConj        PConj       but_PConj
 4.   (UttS          Utt         UttQS
 5.   (UseCl         S           UseQCl
 6.   TPres          Tense       TPast
 7.   ASimul         Anter       AAnter
 8.   PPos           Pol         PNeg
 9.   (PredVP        Cl
10.   (UsePN         NP          UsePron, DetCN
11.   john_PN)       PN          mary_PN
12.   (UseV          VP          ComplV2, ComplV3
13.   walk_V))))     V           sleep_V
14.   NoVoc)         Voc         please_Voc
15.   TEmpty         Text
```
Here are some examples of the results of changing constructors.
```
 1.  TFullStop -> TQuestMark   John walks?
 3.  NoPConj -> but_PConj      But John walks.
 6.  TPres -> TPast            John walked.
 7.  ASimul -> AAnter          John has walked.
 8.  PPos -> PNeg              John doesn't walk.
11.  john_PN -> mary_PN        Mary walks.
13.  walk_V -> sleep_V         John sleeps.
14.  NoVoc -> please_Voc       John walks please.
```
Of course, not all constructors can be changed so freely, because the
resulting tree would not remain well-typed. Here are some changes involving
several constructors:
```
 4- 5.  UttS (UseCl ...) ->
        UttQS (UseQCl (... QuestCl ...))   Does John walk?
10-11.  UsePN john_PN ->
        UsePron we_Pron                    We walk.
12-13.  UseV walk_V ->
        ComplV2 love_V2 this_NP            John loves this.
```


===Parts of sentences===

The linguistic phenomena most often discussed in both traditional grammars and modern
syntax belong to the level of Clauses, that is, nodes 9-13 of the tree above, and occasionally
to Sentences, nodes 5-13. At this level, the major categories are
``NP`` (Noun Phrase) and ``VP`` (Verb Phrase). A Clause typically
consists of just an ``NP`` and a ``VP``.
The internal structure of both ``NP`` and ``VP`` can be very complex,
and these categories are mutually recursive: not only can a ``VP``
contain an ``NP``,
```
[VP loves [NP Mary]]
```
but also an ``NP`` can contain a ``VP``
```
[NP every man [RS who [VP walks]]]
```
(a labelled bracketing like this is of course just a rough approximation of
a GF syntax tree, but still a useful device of exposition).

Most of the resource modules thus define functions that are used inside
NPs and VPs. Here is a brief overview:

**Noun**. How to construct NPs. The three main mechanisms
for constructing NPs are
- from proper names: "John"
- from pronouns: "we"
- from common nouns by determiners: "this man"


The ``Noun`` module also defines the construction of common nouns.
The most frequent ways are
- lexical noun items: "man"
- adjectival modification: "old man"
- relative clause modification: "man who sleeps"
- application of relational nouns: "successor of the number"


**Verb**.
How to construct VPs. The main mechanism is verbs with their arguments,
for instance,
- one-place verbs: "walks"
- two-place verbs: "loves Mary"
- three-place verbs: "gives her a kiss"
- sentence-complement verbs: "says that it is cold"
- VP-complement verbs: "wants to give her a kiss"


A special verb is the copula, which is "be" in English but is not
realized by a verb in every language.
A copula can take different kinds of complement:
- an adjectival phrase: "(John is) old"
- an adverb: "(John is) here"
- a noun phrase: "(John is) a man"


**Adjective**.
How to construct ``AP``s. The main ways are
- positive forms of adjectives: "old"
- comparative forms with object of comparison: "older than John"


**Adverb**.
How to construct ``Adv``s. The main ways are
- from adjectives: "slowly"
- as prepositional phrases: "in the car"


===Modules and their names===

The resource modules are named after the kind of
phrases that are constructed in them,
and they can be roughly classified by the "level" or "size" of expressions that are
formed in them:
- Larger than sentence: ``Text``, ``Phrase``
- Same level as sentence: ``Sentence``, ``Question``, ``Relative``
- Parts of sentence: ``Adjective``, ``Adverb``, ``Noun``, ``Verb``
- Cross-cut (coordination): ``Conjunction``


Because of mutual recursion such as in embedded sentences, this classification is
not a strict hierarchy. However, no mutual dependence is needed between the
modules in a formal sense: they can all be compiled separately. This is due
to the module ``Cat``, which defines the type system common to the other modules.
For instance, the types ``NP`` and ``VP`` are defined in ``Cat``,
and the module ``Verb`` only
needs to know what is given in ``Cat``, not what is given in ``Noun``. To implement
a rule such as
```
Verb.ComplV2 : V2 -> NP -> VP
```
it is enough to know the linearization type of ``NP``
(as well as those of ``V2`` and ``VP``, all
given in ``Cat``). It is not necessary to know what
ways there are to build ``NP``s (given in ``Noun``), since all these ways must
conform to the linearization type defined in ``Cat``. Thus the format of
category-specific modules is as follows:
```
abstract Adjective = Cat ** {...}
abstract Noun = Cat ** {...}
abstract Verb = Cat ** {...}
```


===Top-level grammar and lexicon===

The module ``Grammar`` collects all the category-specific modules into
a complete grammar:
```
abstract Grammar =
  Adjective, Noun, Verb, ..., Structural, Idiom
```
The module ``Structural`` is a lexicon of structural words (function words),
such as determiners.

The module ``Idiom`` is a collection of idiomatic structures whose
implementation is very language-dependent. An example is existential
structures ("there is", "es gibt", "il y a", etc.).

The module ``Lang`` combines ``Grammar`` with a ``Lexicon`` of
ca. 350 content words:
```
abstract Lang = Grammar, Lexicon
```
Using ``Lang`` instead of ``Grammar`` as a library may give
for free some words needed in an application. But its main purpose is to
help testing the resource library. It does not seem possible to maintain
a general-purpose multilingual lexicon, yet that is the form that the module
``Lexicon`` has.



===Language-specific syntactic structures===

The API collected in ``Grammar`` has been designed to be implementable for
all languages in the resource package. It does contain some rules that
are strange or superfluous in some languages; for instance, the distinction
between definite and indefinite articles does not apply to Finnish and Russian.
But such rules are still easy to implement: they only create some superfluous
ambiguity in the languages in question.

But the library makes no claim that all languages should have exactly the same
abstract syntax. The common API is therefore extended by language-dependent
rules. The top level of each language looks as follows (with English as example):
```
abstract English = Grammar, ExtraEngAbs, DictEngAbs
```
where ``ExtraEngAbs`` is a collection of syntactic structures specific to English,
and ``DictEngAbs`` is an English dictionary
(at the moment, it consists of ``IrregEngAbs``,
the irregular verbs of English). Each of these language-specific grammars has
the potential to grow into a full-scale grammar of the language. These grammars
can also be used as libraries, but the possibility of using functors is lost.

To give a better overview of language-specific structures,
modules like ``ExtraEngAbs``
are built from a language-independent module ``ExtraAbs``
by restricted inheritance:
```
abstract ExtraEngAbs = Extra [f,g,...]
```
Thus any category and function in ``Extra`` may be shared by a subset of all
languages. One can see this set-up as a matrix, which tells
what ``Extra`` structures
are implemented in what languages. For the common API in ``Grammar``, the matrix
is filled with 1's (everything is implemented in every language).

Language-specific extensions and the use of restricted
inheritance are a recent addition to the resource grammar library, and
have only been exploited on a very small scale so far.


==API Documentation==

===Top-level modules===

%!include: ../lib/resource-1.0/abstract/Grammar.txt
%!include: ../lib/resource-1.0/abstract/Lang.txt


===Type system===

%!include: ../lib/resource-1.0/abstract/Cat.txt
%!include: ../lib/resource-1.0/abstract/Common.txt


===Phrase category modules===

%!include: ../lib/resource-1.0/abstract/Adjective.txt
%!include: ../lib/resource-1.0/abstract/Adverb.txt
%!include: ../lib/resource-1.0/abstract/Conjunction.txt
%!include: ../lib/resource-1.0/abstract/Idiom.txt
%!include: ../lib/resource-1.0/abstract/Noun.txt
%!include: ../lib/resource-1.0/abstract/Numeral.txt
%!include: ../lib/resource-1.0/abstract/Phrase.txt
%!include: ../lib/resource-1.0/abstract/Question.txt
%!include: ../lib/resource-1.0/abstract/Relative.txt
%!include: ../lib/resource-1.0/abstract/Sentence.txt
%!include: ../lib/resource-1.0/abstract/Structural.txt
%!include: ../lib/resource-1.0/abstract/Text.txt
%!include: ../lib/resource-1.0/abstract/Verb.txt


===Inflectional paradigms===

%!include: ../lib/resource-1.0/danish/ParadigmsDan.txt
%!include: ../lib/resource-1.0/english/ParadigmsEng.txt
%!include: ../lib/resource-1.0/finnish/ParadigmsFin.txt
%!include: ../lib/resource-1.0/french/ParadigmsFre.txt
%!include: ../lib/resource-1.0/german/ParadigmsGer.txt
%!include: ../lib/resource-1.0/italian/ParadigmsIta.txt
%!include: ../lib/resource-1.0/norwegian/ParadigmsNor.txt
%!include: ../lib/resource-1.0/russian/ParadigmsRus.txt
%!include: ../lib/resource-1.0/spanish/ParadigmsSpa.txt
%!include: ../lib/resource-1.0/swedish/ParadigmsSwe.txt