mirror of
https://github.com/GrammaticalFramework/gf-core.git
synced 2026-04-14 15:29:31 -06:00
196 lines
10 KiB
Plaintext
+Syntax: general rules+

The rules of syntax specify how words are combined into **phrases**, and how phrases are combined into even longer phrases.
Phrases, just like words, belong to different categories, which are equipped with inflectional and inherent features and
with semantic types. Moreover, each syntactic rule has a corresponding **semantic rule**, which specifies how the meaning
of the new phrase is constructed from the meanings of its parts.

The RGL has around 30 categories of phrases, on top of the lexical categories. The widest category is ``Text``, which covers
entire texts consisting of sentences, questions, interjections, etc., with punctuation. The following picture shows all RGL
categories as a dependency tree, where ``Text`` is at the root (so it is an upside-down tree) and the lexical categories
are at the leaves. Being above another category in the tree means that phrases of the higher category can have phrases of
the lower category as parts. But these dependencies can work in both directions: for instance, the noun phrase (``NP``)
//every man who owns a donkey// has as its part the relative clause (``RCl``), which in turn has as its part the noun phrase
//a donkey//.

===Figure: the principal dependencies of phrasal and lexical categories===

[../categories.png]

Lexical categories appear in boxes rather than ellipses, with several categories gathered in some of the boxes.
++The structure of a clause++

It is convenient to start from the middle of the RGL: from the structure of a **clause** (``Cl``). A clause is an application
of a verb to its arguments. For instance, //John paints the house yellow// is an application of the ``V2A`` verb //paint//
to the arguments //John//, //the house//, and //yellow//. Recalling the table of lexical categories from Chapter 1,
we can summarize the semantic types of these parts as follows:
```
paint : e -> e -> (e -> t) -> t
John : e
the house : e
yellow : e -> t
```
Hence the verb //paint// is a **predicate**, a function that can be applied to arguments to return a proposition.
In this case, we can build the application
```
paint John (the house) yellow : t
```
which is thus an object of type ``t``.
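The same application can be sketched in Python (used here only for illustration; it is not GF code). The sketch assumes that entities of type ``e`` are modelled as strings and propositions of type ``t`` as booleans; the sets ``PAINTED`` and ``YELLOW`` are a hypothetical toy world, not part of the RGL.

```
# Toy model of the semantic types: e = str (entities), t = bool (propositions).
# PAINTED and YELLOW form a hypothetical world used only for this illustration.
PAINTED = {("John", "the house")}   # pairs (painter, thing painted)
YELLOW = {"the house"}              # things that are yellow

def yellow(x):
    """yellow : e -> t"""
    return x in YELLOW

def paint(x):
    """paint : e -> e -> (e -> t) -> t, curried so arguments come one at a time."""
    return lambda y: lambda p: (x, y) in PAINTED and p(y)

john = "John"            # John : e
the_house = "the house"  # the house : e

# paint John (the house) yellow : t
proposition = paint(john)(the_house)(yellow)
print(proposition)  # True in this toy world
```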
Applying verbs to arguments is how clauses work on the semantic level. However, the syntactic fine-structure is
a bit more complex. The predication process is therefore divided into several steps, which involve intermediate categories.
Following these steps, a clause is built by adding one argument at a time. Doing it this way, rather than adding
all arguments at once, has two advantages:
- the grammar doesn't need to specify the same things again and again for different verb categories
- at each step of the construction, a rule other than adding an argument can apply, for instance, adding an adverb


Here are the steps in which //John paints the house yellow// is constructed from its arguments in the RGL:
- //paints// and //yellow// are combined into a **verb phrase missing a noun phrase** (``VPSlash``)
- //paints - yellow// and //the house// are combined into a **verb phrase** (``VP``)
- //John// and //paints the house yellow// are combined into a **clause** (``Cl``)


The structure is shown by the following tree:

#BECE
[paint-abstract.png]
#ENCE
This tree is called the **abstract syntax tree** of the sentence. It shows the structural components from which the
sentence has been constructed. Its nodes show the GF names associated with syntax rules and internally used for building
structures. Thus for instance ``PredVP`` encodes the rule that combines a noun phrase and a verb phrase into a clause,
``UsePN`` converts a proper name to a noun phrase, and so on. Mathematically, these names
denote **functions** that build abstract syntax trees from other trees. Every tree belongs to some category.
The GF notation for the ``PredVP`` rule is
```
PredVP : NP -> VP -> Cl
```
In words, ``PredVP`` //is a function that takes a noun phrase and a verb phrase and returns a clause//.

The tree is thus in fact built by function applications. A computer-friendly notation for trees uses
parentheses rather than graphical trees:
```
PredVP
  (UsePN john_PN)
  (ComplSlash
    (SlashV2A paint_V2A (PositA yellow_A))
    (DetCN (DetQuant DefArt NumSg) (UseN house_N)))
```
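To make the idea of trees as function applications concrete, the following Python sketch (an illustration only, not how GF stores trees) represents the tree above as nested tuples and computes the category of each subtree from a table of function types. The type of ``PredVP`` is the one given above; the other entries of ``SIGNATURES`` are written down as assumptions about the RGL signatures, and the names ``category`` and ``show`` are hypothetical helpers.

```
# Each function name maps to (argument categories, result category).
# Entries other than PredVP are assumptions made for this sketch.
SIGNATURES = {
    "PredVP":     (["NP", "VP"], "Cl"),
    "ComplSlash": (["VPSlash", "NP"], "VP"),
    "SlashV2A":   (["V2A", "AP"], "VPSlash"),
    "UsePN":      (["PN"], "NP"),
    "PositA":     (["A"], "AP"),
    "DetCN":      (["Det", "CN"], "NP"),
    "DetQuant":   (["Quant", "Num"], "Det"),
    "UseN":       (["N"], "CN"),
    "DefArt":     ([], "Quant"),
    "NumSg":      ([], "Num"),
}

# A tree is either a constant (a string) or a tuple (function, subtree, ...).
tree = ("PredVP",
        ("UsePN", "john_PN"),
        ("ComplSlash",
         ("SlashV2A", "paint_V2A", ("PositA", "yellow_A")),
         ("DetCN", ("DetQuant", "DefArt", "NumSg"), ("UseN", "house_N"))))

def category(t):
    """Return the category of a tree, checking every argument on the way."""
    if isinstance(t, str):
        if t in SIGNATURES:               # zero-place function such as DefArt
            return SIGNATURES[t][1]
        return t.rsplit("_", 1)[1]        # lexical constant: john_PN has category PN
    fun, args = t[0], t[1:]
    expected, result = SIGNATURES[fun]
    found = [category(a) for a in args]
    if found != expected:
        raise TypeError(f"{fun} expects {expected}, found {found}")
    return result

def show(t, top=True):
    """Render a tree in the parenthesized notation, on one line."""
    if isinstance(t, str):
        return t
    text = " ".join([t[0]] + [show(a, top=False) for a in t[1:]])
    return text if top else "(" + text + ")"

print(category(tree))  # Cl
print(show(tree))
```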
Before going to the details of phrasal categories and rules, let us compare the abstract syntax tree with
another tree, known as a **parse tree** or **concrete syntax tree**:

#BECE
[paint-concrete.png]
#ENCE
This tree shows, on its leaves, the clause that results from the combination of categories. Each node
is labelled with the category to which the part of the clause under it belongs. As shown by the label
``VPSlash``, this part can consist of several separate groups of words, between which words from constructions
higher up are inserted.

As parse trees display the actual words of a particular language, in a language-specific
order, they are less interesting from the multilingual point of view than the abstract syntax trees.
A GF grammar is thus primarily specified by its abstract syntax functions, which are language-neutral,
and secondarily by the **linearization rules** that convert them to different languages.

Let us specify the phrasal categories that are used for making up predications. The lexical category ``V2A`` of
two-place adjective-complement verbs was explained in Chapter 1.
===Table: phrasal categories involved in predication===

|| GF name | text name | example | inflectional features | inherent features | parts | semantics ||
| ``Cl`` | clause | //he paints it blue// | temporal, polarity | (none) | one | ``t``
| ``VP`` | verb phrase | //paints it blue// | temporal, polarity, agreement | subject case | verb, complements | ``e -> t``
| ``VPSlash`` | slash verb phrase | //paints - blue// | temporal, polarity, agreement | subject and complement case | verb, complements | ``e -> e -> t``
| ``NP`` | noun phrase | //the house// | case | agreement | one | ``(e -> t) -> t``
| ``AP`` | adjectival phrase | //very blue// | gender, number, case | position | one | ``a`` = ``e -> t``

TODO explain **agreement** and **temporal**.

TODO explain the semantic type of ``NP``.

The functions that build up the clause in our example tree are given in the following table, together with functions that
build the semantics of the constructed trees. The latter functions operate on variables belonging to the semantic types of
the arguments of the function.
===Table: abstract syntax functions involved in predication===

|| GF name | type | example | semantics ||
| ``PredVP`` | ``NP -> VP -> Cl`` | //he// + //paints the house blue// | ``np vp``
| ``ComplSlash`` | ``VPSlash -> NP -> VP`` | //paints - blue// + //the house// | ``\x -> np (\y -> vpslash x y)``
| ``SlashV2A`` | ``V2A -> AP -> VPSlash`` | //paints// + //blue// | ``\x,y -> v2a x y ap``

TODO explain lambda abstraction.

The semantics of the clause //John paints the house yellow// can now be computed from the assumed meanings
```
John* : e
paint* : e -> e -> (e -> t) -> t
the_house* : e
yellow* : e -> t
```
as follows:
```
(PredVP John (ComplSlash (SlashV2A paint yellow) the_house))*
  = (ComplSlash (SlashV2A paint yellow) the_house)* John*
  = (SlashV2A paint yellow)* John* the_house*
  = paint* John* the_house* yellow*
```
for the moment ignoring the internal structure of noun phrases, which will be explained later.
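The same computation can be sketched in Python, with each semantic rule of the table written as a function. This is only an illustration under stated assumptions: entities are strings, propositions are booleans, the toy world ``PAINTED``/``YELLOW`` is invented, and the helper ``lift`` (hypothetical, not an RGL function) turns an entity into the noun phrase type ``(e -> t) -> t`` so that the rules can be applied literally.

```
# Hypothetical toy world: e = str, t = bool.
PAINTED = {("John", "the house")}   # pairs (painter, thing painted)
YELLOW = {"the house"}              # things that are yellow

def paint(x):
    """paint* : e -> e -> (e -> t) -> t"""
    return lambda y: lambda p: (x, y) in PAINTED and p(y)

def yellow(x):
    """yellow* : e -> t"""
    return x in YELLOW

def lift(x):
    """Assumed helper: view an entity as a noun phrase of type (e -> t) -> t."""
    return lambda p: p(x)

# The semantic rules, following the 'semantics' column of the table.
def PredVP(np, vp):
    return np(vp)                                      # np vp

def ComplSlash(vpslash, np):
    return lambda x: np(lambda y: vpslash(x)(y))       # \x -> np (\y -> vpslash x y)

def SlashV2A(v2a, ap):
    return lambda x: lambda y: v2a(x)(y)(ap)           # \x,y -> v2a x y ap

clause = PredVP(lift("John"),
                ComplSlash(SlashV2A(paint, yellow), lift("the house")))

# The composed meaning equals the direct application paint* John* the_house* yellow*.
print(clause == paint("John")("the house")(yellow))  # True
```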
The linearization rules work very much in the same way as the semantic rules. They obey the definitions of
inflectional and inherent features and discontinuous parts, which together define linearization types of
the phrasal categories. These types are at this point schematic, because we don't assume any particular
language. But what we can read off from the category table above is as follows:

===Table: schematic linearization types===

|| GF name | text name | linearization type ||
| ``Cl`` | clause | ``{s : Temp => Pol => Str}``
| ``VP`` | verb phrase | ``{v : V ; c : Agr => Str ; sc : Case}``
| ``VPSlash`` | slash verb phrase | ``{v : V ; c : Agr => Str ; sc, cc : Case}``
| ``NP`` | noun phrase | ``{s : Case => Str ; a : Agr}``
| ``AP`` | adjectival phrase | ``{s : Gender => Number => Case => Str ; isPre : Bool}``

TODO explain these types, in particular the use of ``V``

These types suggest the following linearization rules:
```
PredVP np vp = {s = \\t,p => np.s ! vp.sc ++ vp.v ! t ! p ! np.a ++ vp.c ! np.a}
ComplSlash vpslash np = {v = vpslash.v ; c = \\a => np.s ! vpslash.cc ++ vpslash.c ! a}
SlashV2A v2a ap = {v = v2a ; c = ap.s ! v2a.ac ; cc = v2a.ap}
```
TODO explain these rules

The linearization of the example proceeds in a way analogous to the computation of the semantics.
It is in both cases **compositional**, which means that the semantics/linearization only
depends on the semantics/linearization of the immediate arguments, not on the tree structure
of those arguments. Assuming the following linearizations of the words,
```
John* : mkPN "John"
paint* : mkV "paint" ** {cc = Acc ; ca = Nom}
the_house* : mkPN "the house"
yellow* : mkA "yellow"
```
we get the linearization of the clause as follows:
```
(PredVP John (ComplSlash (SlashV2A paint yellow) the_house))*
  = "John" ++ vp.v ! SgP3 ++ vp.c ! SgP3
    where vp = (ComplSlash (SlashV2A paint yellow) the_house)*
             = {v = mkV "paint" ; c = \\_ => "the house yellow"}
  = "John paints the house yellow"
```
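The linearization can likewise be imitated in Python. The sketch below is a loose analogy under several assumptions, not the GF rules themselves: records are dictionaries, tables are functions (so GF's ``!`` becomes a function call and ``++`` becomes joining with a space), only present tense and positive polarity are covered, the adjectival phrase is reduced to a plain string, and the Python functions ``mkPN`` and ``mkV2A`` are simplified stand-ins named after the GF paradigms.

```
def mkPN(s):
    """Noun phrase record: a string table over case, plus inherent agreement."""
    return {"s": lambda case: s, "a": "SgP3"}

def mkV2A(stem):
    """Verb record: form table over agreement (English-like), plus complement case."""
    return {"v": lambda agr: stem + "s" if agr == "SgP3" else stem, "cc": "Acc"}

def SlashV2A(v2a, ap):
    """Verb + adjective: the adjective becomes the complement part; subject case assumed Nom."""
    return {"v": v2a["v"], "c": lambda agr: ap, "cc": v2a["cc"], "sc": "Nom"}

def ComplSlash(vpslash, np):
    """Insert the object noun phrase in front of the existing complement."""
    return {"v": vpslash["v"],
            "c": lambda agr: np["s"](vpslash["cc"]) + " " + vpslash["c"](agr),
            "sc": vpslash["sc"]}

def PredVP(np, vp):
    """Subject, then the verb agreeing with the subject, then the complements."""
    return " ".join([np["s"](vp["sc"]), vp["v"](np["a"]), vp["c"](np["a"])])

vp = ComplSlash(SlashV2A(mkV2A("paint"), "yellow"), mkPN("the house"))
sentence = PredVP(mkPN("John"), vp)
print(sentence)  # John paints the house yellow
```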
Rules similar to those for ``V2A`` apply to all subcategories of verbs. Verbs of the ``V2`` family that take a non-NP
complement in addition to the NP are first made into ``VPSlash`` by giving the non-NP complement. ``V3`` verbs can take
their two NP complements in either order, which means that there are two ``VPSlash``-producing rules. This
makes it possible to form both the questions //what did she give him// and //whom did she give it//.
The other ``V`` categories are turned into ``VP`` without going through ``VPSlash``, since they have
no noun phrase complements.
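The effect of having two ``VPSlash``-producing rules for ``V3`` can be sketched semantically in Python. Everything in the sketch is an assumption made for illustration: the toy world ``GIVEN``, the argument order of ``give``, and the helper names ``slash_theme_filled`` and ``slash_recipient_filled``, which stand in for the two rules without claiming to be their RGL definitions.

```
# Toy world (hypothetical): triples (giver, thing given, recipient).
GIVEN = {("she", "the book", "him")}
ENTITIES = ["she", "him", "the book"]

def give(x):
    """give : e -> e -> e -> t, with assumed argument order giver, thing, recipient."""
    return lambda y: lambda z: (x, y, z) in GIVEN

def lift(x):
    """Assumed helper: view an entity as a noun phrase of type (e -> t) -> t."""
    return lambda p: p(x)

def slash_theme_filled(v3, np):
    """Fill the 'thing given' slot; the recipient slot stays open (type e -> e -> t)."""
    return lambda x: lambda z: np(lambda y: v3(x)(y)(z))

def slash_recipient_filled(v3, np):
    """Fill the recipient slot; the 'thing given' slot stays open (type e -> e -> t)."""
    return lambda x: lambda y: np(lambda z: v3(x)(y)(z))

# "whom did she give it": the open slot is the recipient.
whom = [z for z in ENTITIES
        if slash_theme_filled(give, lift("the book"))("she")(z)]

# "what did she give him": the open slot is the thing given.
what = [y for y in ENTITIES
        if slash_recipient_filled(give, lift("him"))("she")(y)]

print(whom, what)  # ['him'] ['the book']
```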