forked from GitHub/gf-core
320 lines
12 KiB
Plaintext

The GF Resource Grammar Library

The GF Resource Grammar Library contains grammar rules for
10 languages (some more are under construction). Its purpose
is to make these rules available for application programmers,
who can thereby concentrate on the semantic and stylistic
aspects of their grammars, without having to think about
grammaticality.

To give an example, an application dealing with
music players may have a semantic category ``Kind``, examples
of Kinds being Song and Artist. In German, for instance, Song
is linearized into the noun "Lied", but knowing this is not
enough to make the application work, because the noun must be
produced in both singular and plural, and in four different
cases. By using the resource grammar library, it is enough to
write

    lin Song = reg2N "Lied" "Lieder" neuter

and the eight forms (two numbers times four cases) are correctly generated.
The use of the resource grammar extends from lexical items to syntax rules.
The application might also want to modify songs with properties, such as
"American", "old", "good". The German grammar for adjectival modification is
particularly complex, because adjectives have to agree in gender,
number, and case, and their form also depends on the determiner used
("ein amerikanisches Lied" vs. "das amerikanische Lied"). All this
variation is taken care of by the resource grammar function

    fun AdjCN : AP -> CN -> CN

and the implementation of the application rule adding properties
to kinds, in terms of the resource grammar, is

    lin PropKind kind prop = AdjCN prop kind

given that

    lincat Prop = AP
    lincat Kind = CN
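
To see what AdjCN has to handle, here is a minimal Python sketch. It is not GF code and not how the resource grammar is implemented. It models German adjective agreement in the nominative singular only. The functions `use_n`, `adj_cn`, `prop_kind`, `indef_np` and `def_np` are hypothetical stand-ins for the GF rules mentioned above. The endings follow standard German: weak endings after der/die/das, mixed endings after ein/eine.

```python
# Toy model of German adjective agreement, nominative singular only.
# Illustration only: this is NOT how the GF resource grammar is implemented.

DEF_ART = {"masc": "der", "fem": "die", "neut": "das"}
INDEF_ART = {"masc": "ein", "fem": "eine", "neut": "ein"}

WEAK = {"masc": "e", "fem": "e", "neut": "e"}      # endings after der/die/das
MIXED = {"masc": "er", "fem": "e", "neut": "es"}   # endings after ein/eine


def use_n(noun, gender):
    """A CN built from a bare noun: same form whatever the determiner."""
    return {"s": lambda det_kind: noun, "gender": gender}


def adj_cn(adj_stem, cn):
    """Hypothetical AdjCN: the result is again a CN, whose string depends on
    the kind of determiner ("def" or "indef") it will be combined with."""
    def s(det_kind):
        endings = WEAK if det_kind == "def" else MIXED
        return f"{adj_stem}{endings[cn['gender']]} {cn['s'](det_kind)}"
    return {"s": s, "gender": cn["gender"]}


def prop_kind(kind, prop):
    # The application rule from the text: PropKind kind prop = AdjCN prop kind
    return adj_cn(prop, kind)


def indef_np(cn):
    return f"{INDEF_ART[cn['gender']]} {cn['s']('indef')}"


def def_np(cn):
    return f"{DEF_ART[cn['gender']]} {cn['s']('def')}"


song = use_n("Lied", "neut")
american_song = prop_kind(song, "amerikanisch")
print(indef_np(american_song))  # ein amerikanisches Lied
print(def_np(american_song))    # das amerikanische Lied
```

Because `adj_cn` returns another CN, modifiers stack. For example, `indef_np(prop_kind(american_song, "alt"))` yields "ein altes amerikanisches Lied".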

The resource library API is divided into language-specific and language-independent
parts. To put it roughly,

- syntax is language-independent
- lexicon is language-specific

Thus, to render the above example in French instead of German, we need to
pick a different linearization of Song,

    lin Song = regGenN "chanson" feminine

But to linearize PropKind, we can use the very same rule as in German.
The resource function AdjCN has different implementations in the two
languages, but the application programmer need not care about the difference.
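
The division of labour can be pictured with a toy Python sketch, again not GF. The application rule `prop_kind` is written once against an abstract `adj_cn` interface, and each toy "concrete syntax" class supplies its own implementation. The classes `German` and `French` and their rules are simplifying assumptions. They cover only noun phrases with the indefinite article, and the French feminine rule just appends "e".

```python
from typing import Protocol


class Concrete(Protocol):
    """What the application rule needs from a (toy) resource grammar."""

    def adj_cn(self, adj: str, cn: dict) -> dict: ...

    def indef_np(self, cn: dict) -> str: ...


class German:
    # Toy simplification: endings assume the CN will follow "ein"/"eine"
    ENDINGS = {"masc": "er", "fem": "e", "neut": "es"}
    INDEF = {"masc": "ein", "fem": "eine", "neut": "ein"}

    def adj_cn(self, adj: str, cn: dict) -> dict:
        # Adjective precedes the noun and takes a gender-dependent ending
        return {"s": f"{adj}{self.ENDINGS[cn['g']]} {cn['s']}", "g": cn["g"]}

    def indef_np(self, cn: dict) -> str:
        return f"{self.INDEF[cn['g']]} {cn['s']}"


class French:
    INDEF = {"masc": "un", "fem": "une"}

    def adj_cn(self, adj: str, cn: dict) -> dict:
        # Adjective follows the noun; toy feminine rule: append "e"
        form = adj + "e" if cn["g"] == "fem" else adj
        return {"s": f"{cn['s']} {form}", "g": cn["g"]}

    def indef_np(self, cn: dict) -> str:
        return f"{self.INDEF[cn['g']]} {cn['s']}"


def prop_kind(lang: Concrete, kind: dict, prop: str) -> dict:
    # Written once for all languages: PropKind kind prop = AdjCN prop kind
    return lang.adj_cn(prop, kind)


# Language-specific lexicon
SONG = {"ger": {"s": "Lied", "g": "neut"}, "fre": {"s": "chanson", "g": "fem"}}
AMERICAN = {"ger": "amerikanisch", "fre": "américain"}

for code, lang in (("ger", German()), ("fre", French())):
    print(lang.indef_np(prop_kind(lang, SONG[code], AMERICAN[code])))
```

`prop_kind` never inspects which language it is given. Swapping `German()` for `French()` changes word order and agreement without touching the application rule.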

==To use a resource grammar==

===Parsing===

The intended use of the resource grammar is as a library for writing
application grammars. It is not designed for, e.g., parsing text. There
are several reasons why parsing with it is not practical:

- efficiency: the resource grammar uses complex data structures, in particular discontinuous constituents, which make parsing slow and the parser huge
- completeness: the resource grammar does not necessarily cover all rules of the language, only enough of them to make it possible to express everything in one way or another
- lexicon: the resource grammar has a very small lexicon, only meant for test purposes
- semantics: the resource grammar has very little semantic control, and may accept strange input or deliver strange interpretations
- ambiguity: parsing in the resource grammar may return lots of results, many of which are implausible

All of these problems should be settled in application grammars: the very point
of resource grammars is to isolate the low-level linguistic details, such as
inflection, agreement, and word order, from the semantic questions that
the application grammarians should solve.

===Inflection paradigms===

The inflection paradigms are defined separately for each language L
in the module ParadigmsL. To test them, the command cc (= compute_concrete)
can be used:

    > i -retain german/ParadigmsGer.gf

    > cc regN "Schlange"
    {
      s : Number => Case => Str = table Number {
        Sg => table Case {
          Nom => "Schlange" ;
          Acc => "Schlange" ;
          Dat => "Schlange" ;
          Gen => "Schlange"
        } ;
        Pl => table Case {
          Nom => "Schlangen" ;
          Acc => "Schlangen" ;
          Dat => "Schlangen" ;
          Gen => "Schlangen"
        }
      } ;
      g : Gender = Fem
    }
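
The record printed by cc is a table indexed by Number and Case, plus an inherent Gender. As a rough Python analogue, here is a hypothetical `reg_n_fem_e` that builds the same nested table. This is an assumption for illustration, not how ParadigmsGer is implemented. It only covers feminine nouns ending in -e, whose plural adds -n and whose forms do not vary with case.

```python
NUMBERS = ("Sg", "Pl")
CASES = ("Nom", "Acc", "Dat", "Gen")


def reg_n_fem_e(sg: str) -> dict:
    """Toy paradigm (hypothetical) for feminine nouns in -e such as
    "Schlange": the plural adds -n, and no form varies with case."""
    if not sg.endswith("e"):
        raise ValueError("this toy paradigm only handles nouns ending in -e")
    stem = {"Sg": sg, "Pl": sg + "n"}
    # s : Number => Case => Str, mirroring the record printed by cc
    s = {n: {c: stem[n] for c in CASES} for n in NUMBERS}
    return {"s": s, "g": "Fem"}


schlange = reg_n_fem_e("Schlange")
for number, row in schlange["s"].items():
    print(number, row)
print("g =", schlange["g"])
```

Nouns outside this class, such as "Lied", are rejected. The introduction's `reg2N` avoids guessing by taking the plural "Lieder" as an explicit argument.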

===Syntax rules===

Syntax rules are best looked up in the abstract modules defining the
API. There are around 10 such modules, each defining constructors for
a group of one or more related categories. For instance, the module
Noun defines how to construct common nouns, noun phrases, and determiners.
Thus the proper place to find out how nouns are modified with adjectives
is Noun, because the result of the construction is again a common noun.

Browsing the libraries is helped by the HTML pages generated by gfdoc.
However, this is still not easy, and the most efficient way is
probably to use the parser.
Even though parsing is not an intended end-user application
of resource grammars, it is a useful technique for application grammarians
to browse the library. To find out which resource function does some
particular job, you can just parse a string that exemplifies this job. For
instance, to find out how sentences are built using transitive verbs, write

    > i english/LangEng.gf

    > p -cat=Cl -fcfg "she loves him"

    PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))

Parsing with the English resource grammar has an acceptable speed, but
with most languages it takes too many resources even to build the
parser. However, examples parsed in one language can always be linearized in
other languages:

    > i italian/LangIta.gf

    > l PredVP (UsePron she_Pron) (ComplV2 love_V2 (UsePron he_Pron))

    lo ama
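
This works because the tree is language-independent and each concrete syntax linearizes it in its own way. Here is a toy Python sketch with hand-written English and Italian rules. These are simplifying assumptions: only pronoun subjects and objects, present tense. In the Italian rule, the subject pronoun is dropped and the object pronoun becomes a clitic before the verb, which is what produces "lo ama".

```python
# Abstract syntax tree as nested tuples: (constructor, *children)
tree = ("PredVP", ("UsePron", "she_Pron"),
        ("ComplV2", "love_V2", ("UsePron", "he_Pron")))

# Toy lexicons (assumptions for illustration, not the GF lexicon modules)
ENG = {"she_Pron": {"nom": "she", "acc": "her", "agr": "3sg"},
       "he_Pron": {"nom": "he", "acc": "him", "agr": "3sg"},
       "love_V2": {"3sg": "loves"}}

ITA = {"she_Pron": {"clitic": "la", "agr": "3sg"},  # elision (l'ama) not modeled
       "he_Pron": {"clitic": "lo", "agr": "3sg"},
       "love_V2": {"3sg": "ama"}}


def lin_eng(t):
    _, (_, subj), (_, verb, (_, obj)) = t
    s = ENG[subj]
    return f"{s['nom']} {ENG[verb][s['agr']]} {ENG[obj]['acc']}"


def lin_ita(t):
    _, (_, subj), (_, verb, (_, obj)) = t
    # Pro-drop: the subject shows up only as verb agreement;
    # the object pronoun is a clitic placed before the verb.
    return f"{ITA[obj]['clitic']} {ITA[verb][ITA[subj]['agr']]}"


print(lin_eng(tree))  # she loves him
print(lin_ita(tree))  # lo ama
```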

==Overview of linguistic structures==

The outermost linguistic structure is Text. Texts are composed
from Phrases followed by punctuation marks, either ".", "?" or
"!" (with their proper variants in Spanish and Arabic). Here is an
example of a Text.

    John walks. Why? He doesn't want to sleep!

Phrases are mostly built from Utterances, which in turn are
declarative sentences, questions, or imperatives - but there
are also "one-word utterances" consisting of noun phrases
or other subsentential phrases. Some Phrases are more primitive,
for instance "yes" and "no". Here are some examples of Phrases.

    yes
    come on, John
    but John walks
    give me the stick please
    don't you know that he is sleeping
    a glass of wine
    a glass of wine please

There is no connection between the punctuation marks and the
types of utterances. This reflects the fact that the punctuation
mark in a real text is selected as a function of the speech act
rather than the grammatical form of an utterance. The following
text is thus well-formed.

    John walks. John walks? John walks!

What is the difference between Phrase and Utterance? Just technical:
a Phrase is an Utterance with an optional leading conjunction ("but")
and an optional trailing vocative ("John", "please").

The richest of the categories below Utterance is S, Sentence. A Sentence
is formed from a Clause, by fixing its Tense, Anteriority, and Polarity.
The difference between Sentence and Clause is thus also rather technical.
For example, each of the following strings has a distinct syntax tree
of category Sentence:

    John walks
    John doesn't walk
    John walked
    John didn't walk
    John has walked
    John hasn't walked
    John will walk
    John won't walk
    ...

whereas in the category Clause all of them are just different forms of
the same tree.
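
One way to picture the difference: a Clause is a function from Tense, Anteriority and Polarity to a string, and a Sentence is the result of fixing those three. The Python sketch below is a toy English model covering a third person singular subject with one intransitive verb. `pred_vp` and `use_cl` borrow their names from PredVP and UseCl but are not the library's code. A `TFut` value is included, as an assumption, to cover the "will" examples above.

```python
from itertools import product

TENSES = ("TPres", "TPast", "TFut")
ANTERS = ("ASimul", "AAnter")
POLS = ("PPos", "PNeg")


def pred_vp(subj, inf, pres3sg, past, past_part):
    """Toy Clause: a 3rd person singular subject with an intransitive verb.
    The result is a function from (tense, anter, pol) to a string."""
    def s(tense, anter, pol):
        neg = pol == "PNeg"
        if anter == "AAnter":
            # Anterior forms: auxiliary "have" + past participle
            aux = {"TPres": ("has", "hasn't"),
                   "TPast": ("had", "hadn't"),
                   "TFut": ("will have", "won't have")}[tense][neg]
            return f"{subj} {aux} {past_part}"
        # Simultaneous forms: (auxiliary or None, verb form)
        aux, verb = {
            ("TPres", False): (None, pres3sg), ("TPres", True): ("doesn't", inf),
            ("TPast", False): (None, past), ("TPast", True): ("didn't", inf),
            ("TFut", False): ("will", inf), ("TFut", True): ("won't", inf),
        }[(tense, neg)]
        return " ".join(w for w in (subj, aux, verb) if w)
    return s


def use_cl(tense, anter, pol, cl):
    """Toy Sentence: fixes the three parameters of a Clause."""
    return cl(tense, anter, pol)


john_walks = pred_vp("John", "walk", "walks", "walked", "walked")
for t, a, p in product(TENSES, ANTERS, POLS):
    print(f"{t:6}{a:7}{p:5}{use_cl(t, a, p, john_walks)}")
```

The single value `john_walks` yields twelve distinct sentences. This mirrors the point that they are different Sentence trees built over one Clause tree.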

The following syntax tree of the Text "John walks." gives an overview
of the structural levels.

    Node  Constructor     Type of subtree   Alternative constructors

     1.   TFullStop       : Text            TQuestMark
     2.   (PhrUtt         : Phr
     3.   NoPConj         : PConj           but_PConj
     4.   (UttS           : Utt             UttQS
     5.   (UseCl          : S               UseQCl
     6.   TPres           : Tense           TPast
     7.   ASimul          : Anter           AAnter
     8.   PPos            : Pol             PNeg
     9.   (PredVP         : Cl
    10.   (UsePN          : NP              UsePron, DetCN
    11.   john_PN)        : PN              mary_PN
    12.   (UseV           : VP              ComplV2, ComplV3
    13.   walk_V))))      : V               sleep_V
    14.   NoVoc)          : Voc             please_Voc
    15.   TEmpty          : Text

Here are some examples of the results of changing constructors.

     1. TFullStop -> TQuestMark    John walks?
     3. NoPConj -> but_PConj       But John walks.
     6. TPres -> TPast             John walked.
     7. ASimul -> AAnter           John has walked.
     8. PPos -> PNeg               John doesn't walk.
    11. john_PN -> mary_PN         Mary walks.
    13. walk_V -> sleep_V          John sleeps.
    14. NoVoc -> please_Voc        John sleeps please.
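
The outer levels of the tree in the table above can be mimicked with a small Python sketch. It linearizes a toy version of that tree and reproduces some of the single-constructor changes just listed. Everything here is an assumption for illustration. The Clause is collapsed into a plain string under `UttS`, and only the constructors needed for these examples are modeled. `lin` and `text` are hypothetical helpers.

```python
LEXICON = {"but_PConj": "but", "please_Voc": "please"}


def lin(tree):
    """Linearize a toy Text tree given as nested tuples (leaves are strings)."""
    head, *args = tree if isinstance(tree, tuple) else (tree,)
    if head in ("TFullStop", "TQuestMark"):
        phr, rest = args
        mark = "." if head == "TFullStop" else "?"
        return f"{lin(phr)}{mark} {lin(rest)}".strip()
    if head == "PhrUtt":
        pconj, utt, voc = (lin(a) for a in args)
        s = " ".join(w for w in (pconj, utt) if w)
        s = s[0].upper() + s[1:]            # capitalize the first word
        return f"{s} {voc}" if voc else s
    if head == "UttS":
        return args[0]                       # toy: the sentence is a plain string
    if head in ("TEmpty", "NoPConj", "NoVoc"):
        return ""
    return LEXICON[head]


def text(mark="TFullStop", pconj="NoPConj", utt="John walks", voc="NoVoc"):
    """The tree of the table, with the Clause collapsed into a string."""
    return (mark, ("PhrUtt", pconj, ("UttS", utt), voc), "TEmpty")


print(lin(text()))                                     # John walks.
print(lin(text(mark="TQuestMark")))                    # John walks?
print(lin(text(pconj="but_PConj")))                    # But John walks.
print(lin(text(utt="John sleeps", voc="please_Voc")))  # John sleeps please.
```

Because the last argument of `TFullStop` is again a Text, texts chain. For example, a `TQuestMark` Text nested in place of `TEmpty` adds a second phrase ending in "?".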

Of course, not all constructors can be changed so freely, because the
resulting tree would not remain well-typed. Here are some changes involving
many constructors:

     4-5.  UttS (UseCl ...) -> UttQS (UseQCl (... QuestCl ...))   Does John walk?
    10-11. UsePN john_PN -> UsePron we_Pron                       We walk.
    12-13. UseV walk_V -> ComplV2 love_V2 this_NP                 John loves this.
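
The types alone show why some swaps are fine and others are not. The Python sketch below type-checks trees against a small, hand-copied fragment of signatures. It includes only constructors mentioned in this section; the `SIGS` dictionary is an illustrative assumption, not generated from the library. Replacing walk_V by sleep_V keeps the tree well-typed. Replacing it by love_V2 does not, which is why a transitive verb needs the two-constructor change from UseV to ComplV2.

```python
# Hand-copied fragment of the abstract syntax: name -> (argument types, value type)
SIGS = {
    "PredVP": (("NP", "VP"), "Cl"),
    "UsePN": (("PN",), "NP"),
    "UsePron": (("Pron",), "NP"),
    "UseV": (("V",), "VP"),
    "ComplV2": (("V2", "NP"), "VP"),
    "john_PN": ((), "PN"),
    "we_Pron": ((), "Pron"),
    "this_NP": ((), "NP"),
    "walk_V": ((), "V"),
    "sleep_V": ((), "V"),
    "love_V2": ((), "V2"),
}


def infer(tree):
    """Return the category of a tree (nested tuples), or raise TypeError."""
    fun, *args = tree if isinstance(tree, tuple) else (tree,)
    arg_types, val_type = SIGS[fun]
    if len(args) != len(arg_types):
        raise TypeError(f"{fun} expects {len(arg_types)} argument(s)")
    for arg, expected in zip(args, arg_types):
        actual = infer(arg)
        if actual != expected:
            raise TypeError(f"{fun}: expected {expected}, got {actual}")
    return val_type


def clause(np, vp):
    return ("PredVP", np, vp)


john = ("UsePN", "john_PN")
print(infer(clause(john, ("UseV", "sleep_V"))))                   # Cl
print(infer(clause(("UsePron", "we_Pron"), ("UseV", "walk_V"))))  # Cl
print(infer(clause(john, ("ComplV2", "love_V2", "this_NP"))))     # Cl
try:
    infer(clause(john, ("UseV", "love_V2")))   # single swap: ill-typed
except TypeError as err:
    print(err)                                 # UseV: expected V, got V2
```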

The linguistic phenomena mostly discussed in traditional grammars and modern
syntax belong to the level of Clauses, that is, nodes 9-13 of the tree above, and occasionally
to Sentences, nodes 5-13. At this level, the major categories are
NP (Noun Phrase) and VP (Verb Phrase). A Clause typically consists of an
NP and a VP. The internal structure of both NP and VP can be very complex,
and these categories are mutually recursive: not only can a VP contain an NP,

    [VP loves [NP Mary]]

but an NP can also contain a VP

    [NP every man [RS who [VP walks]]]

(a labelled bracketing like this is of course just a rough approximation of
a GF syntax tree, but still a useful device of exposition).
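
The mutual recursion can be written down directly as data types that refer to each other. The Python dataclasses below are a toy English model assumed for illustration, not the GF categories themselves. A relative clause is reduced to "who" plus a VP, and verbs are stored in their finished third person singular form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VP:
    verb: str                      # finite verb form, e.g. "walks", "loves"
    obj: Optional["NP"] = None     # a VP may contain an NP ...

    def lin(self) -> str:
        return self.verb + (f" {self.obj.lin()}" if self.obj else "")


@dataclass
class RS:
    vp: VP                         # toy relative clause: "who" + VP

    def lin(self) -> str:
        return f"who {self.vp.lin()}"


@dataclass
class NP:
    det: str                       # "" for proper names
    noun: str
    rs: Optional[RS] = None        # ... and an NP may contain a VP (inside an RS)

    def lin(self) -> str:
        words = [self.det, self.noun, self.rs.lin() if self.rs else ""]
        return " ".join(w for w in words if w)


def pred_vp(np: NP, vp: VP) -> str:
    return f"{np.lin()} {vp.lin()}"


mary = NP("", "Mary")
every_man_who_walks = NP("every", "man", RS(VP("walks")))
print(VP("loves", mary).lin())                          # loves Mary
print(every_man_who_walks.lin())                        # every man who walks
print(pred_vp(every_man_who_walks, VP("loves", mary)))  # every man who walks loves Mary
```

Since each type can contain the other, nesting is unbounded. For example, `NP("every", "man", RS(VP("loves", NP("a", "woman", RS(VP("sleeps"))))))` reads "every man who loves a woman who sleeps".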

Most of the resource modules thus define functions that are used inside
NPs and VPs. Here is a brief overview:

Noun: How to construct NPs. The three main mechanisms
for constructing NPs are

- from proper names: John
- from pronouns: we
- from common nouns by determiners: this man

The Noun module also defines the construction of common nouns. The most frequent ways are

- lexical noun items: man
- adjectival modification: old man
- relative clause modification: man who sleeps

Verb: How to construct VPs. The main mechanism is verbs with their arguments:

- one-place verbs: walks
- two-place verbs: loves Mary
- three-place verbs: gives her a kiss
- sentence-complement verbs: says that it is cold
- VP-complement verbs: wants to give her a kiss

A special verb is the copula, which is "be" in English but is not
realized by a verb in all languages.
A copula can take different kinds of complement:

- an adjectival phrase: (John is) old
- an adverb: (John is) here
- a noun phrase: (John is) a man
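
The three complement types can be sketched as three ways of building one complement type, which a single copula rule then turns into a VP. In the toy Python below, all function names are hypothetical. The `has_copula=False` setting stands for a language that leaves the copula unexpressed, as Russian does in the present tense; its output is only an English-word gloss.

```python
from dataclasses import dataclass


@dataclass
class Comp:
    """A complement of the copula, built from an AP, an adverb or an NP."""
    s: str


def comp_ap(ap: str) -> Comp:      # (John is) old
    return Comp(ap)


def comp_adv(adv: str) -> Comp:    # (John is) here
    return Comp(adv)


def comp_np(np: str) -> Comp:      # (John is) a man
    return Comp(np)


def use_comp(comp: Comp, has_copula: bool = True) -> str:
    """Toy VP from a complement (present tense, 3rd person singular).
    has_copula=False models a language that leaves the copula out."""
    return f"is {comp.s}" if has_copula else comp.s


for c in (comp_ap("old"), comp_adv("here"), comp_np("a man")):
    print("John", use_comp(c), "|", "John", use_comp(c, has_copula=False))
```

Funnelling all three complements through one type means the copula rule is written once per language. Only that rule needs to know whether the language realizes the copula.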

The resource modules are named after the kind of phrases that are constructed in them,
and they can be roughly classified by the "level" or "size" of expressions that are
formed in them:

- Larger than sentence: Text, Phrase
- Same level as sentence: Sentence, Question, Relative
- Parts of sentence: Adjective, Adverb, Noun, Verb
- Cross-cut: Conjunction

Because of mutual recursion, such as embedded sentences, this classification is
not a strict hierarchy. However, the modules need not depend on each other in a
formal sense: they can all be compiled separately. This is due
to the module Cat, which defines the type system common to the other modules.
For instance, the types NP and VP are defined in Cat, and the module Verb only
needs to know what is given in Cat, not what is given in Noun. To implement
a rule such as

    Verb.ComplV2 : V2 -> NP -> VP

it is enough to know the linearization type of NP (given in Cat), not what
ways there are to build NPs (given in Noun), since all these ways must
conform to the linearization type defined in Cat.
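
The role of Cat can be illustrated with a small Python analogue. The shared NP type is defined once, Noun-like constructors produce values of that type, and a ComplV2-like function consumes them without knowing how they were built. The module split is only simulated with comments. The record fields (`s` as a case table, `agr`) and the functions `use_pron`, `use_pn` and `compl_v2` are simplifying assumptions, not the library's actual lincat or code.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# ---- "Cat": the linearization types shared by all modules ----
@dataclass
class NP:
    s: Dict[str, str]   # case -> form, e.g. {"nom": "he", "acc": "him"}
    agr: str            # agreement features, e.g. "3sg"


@dataclass
class V2:
    s: Dict[str, str]   # subject agreement -> verb form


VP = Callable[[str], str]   # toy VP: subject agreement -> string


# ---- "Noun": ways of building NPs (never looked at by "Verb") ----
def use_pron(nom: str, acc: str, agr: str) -> NP:
    return NP({"nom": nom, "acc": acc}, agr)


def use_pn(name: str) -> NP:
    return NP({"nom": name, "acc": name}, "3sg")


# ---- "Verb": depends only on the types from "Cat" ----
def compl_v2(v: V2, obj: NP) -> VP:
    return lambda agr: f"{v.s[agr]} {obj.s['acc']}"


love = V2({"1sg": "love", "3sg": "loves"})
print("she", compl_v2(love, use_pron("he", "him", "3sg"))("3sg"))  # she loves him
print("I", compl_v2(love, use_pn("Mary"))("1sg"))                  # I love Mary
```

`compl_v2` works for both NPs because it relies only on the fields of the shared `NP` type. Adding another NP constructor in the "Noun" section would not require changing it.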