mirror of
https://github.com/GrammaticalFramework/gf-core.git
synced 2026-05-25 18:58:56 -06:00
started next version of tutorial
300
doc/overview-resource.txt
Normal file
@@ -0,0 +1,300 @@
==Texts, phrases, and utterances==

The outermost linguistic structure is ``Text``. ``Text``s are composed
from Phrases (``Phr``) followed by punctuation marks: one of ".", "?", or
"!" (with their proper variants in Spanish and Arabic). Here is an
example of a ``Text`` string.
```
  John walks. Why? He doesn't want to sleep!
```
Phrases are mostly built from Utterances (``Utt``), which in turn are
declarative sentences, questions, or imperatives - but there
are also "one-word utterances" consisting of noun phrases
or other subsentential phrases. Some Phrases are atomic,
for instance "yes" and "no". Here are some examples of Phrases.
```
  yes
  come on, John
  but John walks
  give me the stick please
  don't you know that he is sleeping
  a glass of wine
  a glass of wine please
```
There is no connection between the punctuation marks and the
types of utterances. This reflects the fact that the punctuation
mark in a real text is selected as a function of the speech act
rather than the grammatical form of an utterance. The following
text is thus well-formed.
```
  John walks. John walks? John walks!
```
What is the difference between Phrase and Utterance? Just technical:
a Phrase is an Utterance with an optional leading conjunction ("but")
and an optional trailing vocative ("John", "please").

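The technical relation between Phrase and Utterance can be sketched as a
small abstract syntax. This is only an illustration: the module name
``MiniPhrase`` is hypothetical, and the type of ``PhrUtt`` is inferred from
nodes 2-4 and 14 of Figure 1 rather than copied from the library.
```
  -- MiniPhrase.gf: a hypothetical sketch, not the library module itself
  abstract MiniPhrase = {
    cat
      Phr ;    -- Phrase
      Utt ;    -- Utterance
      PConj ;  -- optional leading conjunction
      Voc ;    -- optional trailing vocative
    fun
      -- a Phrase wraps an Utterance between the two optional parts
      PhrUtt     : PConj -> Utt -> Voc -> Phr ;
      NoPConj    : PConj ;  -- no conjunction
      but_PConj  : PConj ;  -- "but"
      NoVoc      : Voc ;    -- no vocative
      please_Voc : Voc ;    -- "please"
  }
```
Given some ``u : Utt`` for "John walks", the tree ``PhrUtt but_PConj u NoVoc``
would correspond to "but John walks", and ``PhrUtt NoPConj u NoVoc`` to the
bare utterance.
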
==Sentences and clauses==

TODO: use overloaded operations in the examples.

The richest of the categories below Utterance is ``S``, Sentence. A Sentence
is formed from a Clause (``Cl``), by fixing its Tense, Anteriority, and Polarity.
For example, each of the following strings has a distinct syntax tree
in the category Sentence:
```
  John walks
  John doesn't walk
  John walked
  John didn't walk
  John has walked
  John hasn't walked
  John will walk
  John won't walk
  ...
```
whereas in the category Clause all of them are just different forms of
the same tree.
The difference between Sentence and Clause is thus also rather technical.
It may not correspond exactly to any standard usage of the terms
"clause" and "sentence".

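The step from Clause to Sentence can be sketched as a small abstract syntax.
The module name ``MiniSentence`` and the constant ``john_walks_Cl`` are
hypothetical stand-ins; the constructor names and the argument order of
``UseCl`` are taken from nodes 5-9 of Figure 1.
```
  -- MiniSentence.gf: a hypothetical sketch of how a Cl becomes an S
  abstract MiniSentence = {
    cat
      S ; Cl ; Tense ; Anter ; Pol ;
    fun
      -- a Sentence is a Clause with tense, anteriority, and polarity fixed
      UseCl          : Tense -> Anter -> Pol -> Cl -> S ;
      TPres, TPast   : Tense ;
      ASimul, AAnter : Anter ;
      PPos, PNeg     : Pol ;
      john_walks_Cl  : Cl ;  -- hypothetical stand-in for a real clause tree
  }

  -- One clause, several distinct Sentence trees:
  --   UseCl TPres ASimul PPos john_walks_Cl    John walks
  --   UseCl TPast ASimul PPos john_walks_Cl    John walked
  --   UseCl TPres AAnter PPos john_walks_Cl    John has walked
  --   UseCl TPres ASimul PNeg john_walks_Cl    John doesn't walk
```
With two values for each parameter, this sketch yields eight Sentence trees
per Clause. The list of examples also contains future forms, so the library
evidently provides more tenses than the two declared here.
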
Figure 1 shows a type-annotated syntax tree of the Text "John walks."
and gives an overview of the structural levels.

#BFIG

```
  Node  Constructor            Value type  Other constructors
  -----------------------------------------------------------
   1.   TFullStop              Text        TQuestMark
   2.     (PhrUtt              Phr
   3.       NoPConj            PConj       but_PConj
   4.       (UttS              Utt         UttQS
   5.         (UseCl           S           UseQCl
   6.           TPres          Tense       TPast
   7.           ASimul         Anter       AAnter
   8.           PPos           Pol         PNeg
   9.           (PredVP        Cl
  10.             (UsePN       NP          UsePron, DetCN
  11.               john_PN)   PN          mary_PN
  12.             (UseV        VP          ComplV2, ComplV3
  13.               walk_V)))) V           sleep_V
  14.       NoVoc)             Voc         please_Voc
  15.     TEmpty               Text
```

#BCENTER
Figure 1. Type-annotated syntax tree of the Text "John walks."
#ECENTER

#EFIG

Here are some examples of the results of changing constructors.
```
   1. TFullStop -> TQuestMark     John walks?
   3. NoPConj   -> but_PConj      But John walks.
   6. TPres     -> TPast          John walked.
   7. ASimul    -> AAnter         John has walked.
   8. PPos      -> PNeg           John doesn't walk.
  11. john_PN   -> mary_PN        Mary walks.
  13. walk_V    -> sleep_V        John sleeps.
  14. NoVoc     -> please_Voc     John walks please.
```
Of course, not all constructors can be changed so freely, because the
resulting tree would not remain well-typed. Here are some changes involving
several constructors at once:
```
   4- 5. UttS (UseCl ...) ->
           UttQS (UseQCl (... QuestCl ...))   Does John walk?
  10-11. UsePN john_PN ->
           UsePron we_Pron                    We walk.
  12-13. UseV walk_V ->
           ComplV2 love_V2 this_NP            John loves this.
```

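The typing constraints behind these changes can be made explicit with a
sketch of the clause level. The module name ``MiniClause`` and the category
name ``Pron`` are assumptions; the function types are inferred from nodes
9-13 of Figure 1 and from the changes listed above.
```
  -- MiniClause.gf: a hypothetical sketch of the clause level
  abstract MiniClause = {
    cat
      Cl ; NP ; VP ; PN ; Pron ; V ; V2 ;
    fun
      PredVP  : NP -> VP -> Cl ;
      UsePN   : PN -> NP ;
      UsePron : Pron -> NP ;
      UseV    : V -> VP ;
      ComplV2 : V2 -> NP -> VP ;
      john_PN, mary_PN : PN ;
      we_Pron          : Pron ;
      walk_V, sleep_V  : V ;
      love_V2          : V2 ;
      this_NP          : NP ;
  }

  -- Well-typed:
  --   PredVP (UsePN john_PN) (UseV walk_V)
  --   PredVP (UsePron we_Pron) (UseV walk_V)
  --   PredVP (UsePN john_PN) (ComplV2 love_V2 this_NP)
  -- Not well-typed, hence the need to change two constructors together:
  --   PredVP (UsePN we_Pron) (UseV walk_V)     -- we_Pron is not a PN
  --   PredVP (UsePN john_PN) (UseV love_V2)    -- love_V2 is not a V
```
Replacing ``john_PN`` by ``mary_PN`` keeps the type ``PN`` and can be done in
isolation, whereas replacing it by ``we_Pron`` changes the type of the
argument, so the enclosing ``UsePN`` has to become ``UsePron`` as well.
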
==Parts of sentences==

The linguistic phenomena most often discussed in both traditional grammars and modern
syntax belong to the level of Clauses, that is, lines 9-13 of Figure 1, and occasionally
to Sentences, lines 5-13. At this level, the major categories are
``NP`` (Noun Phrase) and ``VP`` (Verb Phrase). A Clause typically
consists of just an ``NP`` and a ``VP``.
The internal structure of both ``NP`` and ``VP`` can be very complex,
and these categories are mutually recursive: not only can a ``VP``
contain an ``NP``,
```
  [VP loves [NP Mary]]
```
but also an ``NP`` can contain a ``VP``
```
  [NP every man [RS who [VP walks]]]
```
(a labelled bracketing like this is of course just a rough approximation of
a GF syntax tree, but still a useful device of exposition).

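The mutual recursion can be sketched as a deliberately simplified abstract
syntax. Only ``ComplV2``, ``UseV``, and ``DetCN`` are names that occur in this
document; the module name ``MiniRecursion``, the functions ``ModRS`` and
``SubjRS``, the type given to ``DetCN``, and the lexical constants are all
hypothetical, and the sketch ignores the relative pronoun and the tense of
the relative clause.
```
  -- MiniRecursion.gf: a hypothetical, simplified sketch
  abstract MiniRecursion = {
    cat
      NP ; VP ; CN ; RS ; Det ; V ; V2 ;
    fun
      UseV    : V -> VP ;
      ComplV2 : V2 -> NP -> VP ;  -- a VP that contains an NP
      DetCN   : Det -> CN -> NP ; -- type assumed for this sketch
      ModRS   : CN -> RS -> CN ;  -- hypothetical: noun modified by a relative
      SubjRS  : VP -> RS ;        -- hypothetical: "who" + VP
      every_Det : Det ;
      man_CN    : CN ;
      walk_V    : V ;
      love_V2   : V2 ;
  }

  -- [NP every man [RS who [VP walks]]]
  --   DetCN every_Det (ModRS man_CN (SubjRS (UseV walk_V)))
  -- The NP can in turn be embedded in a VP, and so on:
  --   ComplV2 love_V2 (DetCN every_Det (ModRS man_CN (SubjRS (UseV walk_V))))
```
The cycle ``NP`` -> ``CN`` -> ``RS`` -> ``VP`` -> ``NP`` in the argument
types is what makes the two categories mutually recursive.
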
Most of the resource modules thus define functions that are used inside
NPs and VPs. Here is a brief overview:

**Noun**. How to construct NPs. The main three mechanisms
for constructing NPs are
- from proper names: "John"
- from pronouns: "we"
- from common nouns by determiners: "this man"


The ``Noun`` module also defines the construction of common nouns.
The most frequent ways are
- lexical noun items: "man"
- adjectival modification: "old man"
- relative clause modification: "man who sleeps"
- application of relational nouns: "successor of the number"


**Verb**.
How to construct VPs. The main mechanism is verbs with their arguments,
for instance,
- one-place verbs: "walks"
- two-place verbs: "loves Mary"
- three-place verbs: "gives her a kiss"
- sentence-complement verbs: "says that it is cold"
- VP-complement verbs: "wants to give her a kiss"


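The verb classes listed above correspond to functions of roughly the
following shape. Only ``ComplV2`` has its type stated in this document;
``UseV`` is inferred from Figure 1, and the types of ``ComplV3``, ``ComplVS``,
and ``ComplVV``, the category names ``V3``, ``VS``, and ``VV``, and the module
name ``MiniComplement`` are assumptions of this sketch.
```
  -- MiniComplement.gf: a hypothetical sketch of verb complementation
  abstract MiniComplement = {
    cat
      VP ; NP ; S ;
      V ; V2 ; V3 ;  -- one-, two-, and three-place verbs
      VS ; VV ;      -- sentence- and VP-complement verbs
    fun
      UseV    : V  -> VP ;              -- "walks"
      ComplV2 : V2 -> NP -> VP ;        -- "loves Mary"
      ComplV3 : V3 -> NP -> NP -> VP ;  -- "gives her a kiss"
      ComplVS : VS -> S  -> VP ;        -- "says that it is cold"
      ComplVV : VV -> VP -> VP ;        -- "wants to give her a kiss"
  }
```
Each verb category fixes the number and the categories of the complements,
so that a verb can only be combined with arguments of the right kind.
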
A special verb is the copula: "be" in English, but not realized
by a verb at all in some languages.
A copula can take different kinds of complement:
- an adjectival phrase: "(John is) old"
- an adverb: "(John is) here"
- a noun phrase: "(John is) a man"


**Adjective**.
How to construct ``AP``s. The main ways are
- positive forms of adjectives: "old"
- comparative forms with object of comparison: "older than John"


**Adverb**.
How to construct ``Adv``s. The main ways are
- from adjectives: "slowly"
- as prepositional phrases: "in the car"


==Modules and their names==

This section is not necessary for users of the library.

TODO: explain the overloaded API.

The resource modules are named after the kind of
phrases that are constructed in them,
and they can be roughly classified by the "level" or "size" of expressions that are
formed in them:
- Larger than sentence: ``Text``, ``Phrase``
- Same level as sentence: ``Sentence``, ``Question``, ``Relative``
- Parts of sentence: ``Adjective``, ``Adverb``, ``Noun``, ``Verb``
- Cross-cut (coordination): ``Conjunction``


Because of mutual recursion such as in embedded sentences, this classification is
not a complete order. However, no mutual dependence is needed between the
modules themselves - they can all be compiled separately. This is due
to the module ``Cat``, which defines the type system common to the other modules.
For instance, the types ``NP`` and ``VP`` are defined in ``Cat``,
and the module ``Verb`` only
needs to know what is given in ``Cat``, not what is given in ``Noun``. To implement
a rule such as
```
  Verb.ComplV2 : V2 -> NP -> VP
```
it is enough to know the linearization type of ``NP``
(as well as those of ``V2`` and ``VP``, all
given in ``Cat``). It is not necessary to know what
ways there are to build ``NP``s (given in ``Noun``), since all these ways must
conform to the linearization type defined in ``Cat``. Thus the format of
category-specific modules is as follows:
```
  abstract Adjective = Cat ** {...}
  abstract Noun = Cat ** {...}
  abstract Verb = Cat ** {...}
```

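This separation can be tried out on a miniature scale. In the following
sketch, all module names (``MiniCat``, ``MiniNoun``, ``MiniVerb``, and their
``Eng`` concrete syntaxes) are hypothetical, the type of ``UsePN`` is inferred
from Figure 1, and the record type ``{s : Str}`` is a deliberately naive
linearization type, not the one used in the library. Each module would go
into a file of its own.
```
  -- MiniCat.gf: the shared type system
  abstract MiniCat = {
    cat NP ; VP ; V2 ; PN ;
  }

  -- MiniNoun.gf and MiniVerb.gf: neither mentions the other
  abstract MiniNoun = MiniCat ** {
    fun UsePN : PN -> NP ;
  }
  abstract MiniVerb = MiniCat ** {
    fun ComplV2 : V2 -> NP -> VP ;
  }

  -- MiniCatEng.gf: linearization types, fixed once for all modules
  concrete MiniCatEng of MiniCat = {
    lincat NP, VP, V2, PN = {s : Str} ;
  }

  -- MiniVerbEng.gf: uses only the lincat of NP, not the ways of building NPs
  concrete MiniVerbEng of MiniVerb = MiniCatEng ** {
    lin ComplV2 v np = {s = v.s ++ np.s} ;
  }

  -- MiniNounEng.gf: must return something of the lincat of NP
  concrete MiniNounEng of MiniNoun = MiniCatEng ** {
    lin UsePN pn = pn ;
  }
```
``MiniVerbEng`` depends on ``MiniCatEng`` only, so adding new ways of building
``NP``s to ``MiniNoun`` would not require touching it, as long as the
linearization type of ``NP`` stays the same.
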
==Top-level grammar and lexicon==

The module ``Grammar`` collects all the category-specific modules into
a complete grammar:
```
  abstract Grammar =
    Adjective, Noun, Verb, ..., Structural, Idiom
```
The module ``Structural`` is a lexicon of structural words (function words),
such as determiners.

The module ``Idiom`` is a collection of idiomatic structures whose
implementation is very language-dependent. An example is existential
structures ("there is", "es gibt", "il y a", etc).

The module ``Lang`` combines ``Grammar`` with a ``Lexicon`` of
ca. 350 content words:
```
  abstract Lang = Grammar, Lexicon
```
Using ``Lang`` instead of ``Grammar`` as a library may provide, for free,
some of the words needed in an application. Its main purpose, however, is to
help test the resource library rather than to serve as a resource itself.
Developing a general-purpose multilingual resource lexicon
does not even seem realistic.

The diagram in Figure 2 shows the structure of the API.

#BFIG

#GRAMMAR

#BCENTER
Figure 2. The resource syntax API.
#ECENTER

#EFIG

==Language-specific syntactic structures==

The API collected in ``Grammar`` has been designed to be implementable for
all languages in the resource package. It does contain some rules that
are strange or superfluous in some languages; for instance, the distinction
between definite and indefinite articles does not apply to Finnish and Russian.
But such rules are still easy to implement: they only create some superfluous
ambiguity in the languages in question.

But the library makes no claim that all languages should have exactly the same
abstract syntax. The common API is therefore extended by language-dependent
rules. The top level of each language looks as follows (with English as an example):
```
  abstract English = Grammar, ExtraEngAbs, DictEngAbs
```
where ``ExtraEngAbs`` is a collection of syntactic structures specific to English,
and ``DictEngAbs`` is an English dictionary
(at the moment, it consists of ``IrregEngAbs``,
the irregular verbs of English). Each of these language-specific grammars has
the potential to grow into a full-scale grammar of the language. These grammars
can also be used as libraries, but the possibility of using functors is lost.

To give a better overview of language-specific structures,
modules like ``ExtraEngAbs``
are built from a language-independent module ``ExtraAbs``
by restricted inheritance:
```
  abstract ExtraEngAbs = Extra [f,g,...]
```
Thus any category and function in ``Extra`` may be shared by a subset of all
languages. One can see this set-up as a matrix, which tells
what ``Extra`` structures
are implemented in what languages. For the common API in ``Grammar``, the matrix
is filled with 1's (everything is implemented in every language).

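As a minimal sketch of restricted inheritance, consider a hypothetical
module ``MiniExtra`` with three functions, of which an English-specific module
inherits two. All names here are made up for illustration; only the bracket
notation is taken from the example above.
```
  -- MiniExtra.gf: hypothetical language-independent collection
  abstract MiniExtra = {
    cat X ;
    fun f, g, h : X ;
  }

  -- MiniExtraEngAbs.gf: inherits X, f, and g, but not h
  abstract MiniExtraEngAbs = MiniExtra [X, f, g] ** {}
```
The category ``X`` is listed explicitly because the types of ``f`` and ``g``
depend on it. In the matrix view, the row for English would then have 1's
for ``f`` and ``g`` and a 0 for ``h``.
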
Language-specific extensions and the use of restricted
inheritance are recent additions to the resource grammar library, and
have so far been exploited only on a very small scale.