forked from GitHub/gf-core

getting into syntax in the grammar document

This commit is contained in:
aarne
2013-08-28 05:59:06 +00:00
parent b60cf09a9f
commit cc259fea43
5 changed files with 323 additions and 5 deletions


@@ -23,14 +23,16 @@ of the verb, but it may be separated from it by an object: //Please switch it of
| ``Person`` | person | first, second, third
| ``Case`` | case | nominative, genitive
| ``Degree`` | degree | positive, comparative, superlative
| ``AForm`` | adjective form | degrees, adverbial
| ``VForm`` | verb form | infinitive, present, past, past participle, present participle
| ``VVType`` | infinitive form (for a VV) | bare infinitive, //to// infinitive, //ing// form
The assignment of parameter types and the identification of the separate parts of categories defines
the **data structures** in which the words are stored in a lexicon.
This data structure is in GF called the **linearization type** of the category. From the computer's
point of view, it is important that the data structures are well defined for all words, even if this may
sound unnecessary for the human. For instance, since some verbs need a particle part, all verbs must uniformly have a
storage for this particle, even if it is empty most of the time. This property is guaranteed by
an operation called **type checking**. It is performed by GF as a part of **grammar compilation**, which
is the process in which the human-readable description of the grammar is converted to bits executable
@@ -42,7 +44,7 @@ by the computer.
|| GF name | text name | example | inflectional features | inherent features ||
| ``N`` | noun | //house// | number, case | (none)
| ``PN`` | proper name | //Paris// | case | (none)
| ``A`` | adjective | //blue// | adjective form | (none)
| ``V`` | verb | //sleep// | verb form | particle
| ``Adv`` | adverb | //here// | (none) | (none)
| ``V2`` | two-place verb | //love// | verb form | particle, preposition
@@ -56,3 +58,149 @@ but a string.
We have done the same with the preposition strings that define the complement features of verb
and other subcategories.
The "digital grammar" representations of these types are **records**, where for instance the ``VV``
record type is formally written
```
{s : VForm => Str ; p : Str ; i : VVType}
```
The record has **fields** for different types of data. In the record above, there are three fields:
- the field labelled ``s``, storing an **inflection table** that produces a **string** (``Str``) depending on verb form,
- the field labelled ``p``, storing a string representing the particle,
- the field labelled ``i``, storing an inherent feature for the infinitive form required.
Thus for instance the record for verb-complement verb //try// (//to do something//) in the lexicon looks as follows:
```
{s = table {
VInf => "try" ;
VPres => "tries" ;
VPast => "tried" ;
VPastPart => "tried" ;
VPresPart => "trying"
} ;
p = "" ;
i = VVInf
}
```
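For readers who prefer a programming-language analogy, the same record can be mirrored as an ordinary Python dictionary. This is a hypothetical illustration, not GF code; the field names ``s``, ``p``, ``i`` and the feature names are copied from the record above:

```
# A sketch of the GF lexicon record for "try" as a Python dictionary.
try_VV = {
    "s": {                       # inflection table: verb form -> string
        "VInf": "try",
        "VPres": "tries",
        "VPast": "tried",
        "VPastPart": "tried",
        "VPresPart": "trying",
    },
    "p": "",                     # particle field, empty for "try"
    "i": "VVInf",                # inherent feature: required infinitive form
}

# Looking up one form, analogous to selecting from a GF table:
present = try_VV["s"]["VPres"]
```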
We have not introduced the GF names of the features, as we will not make essential use of them: we will prefer
informal explanations for all rules. So these records are a little hint for those who want to understand the
whole chain, from the rules as we state them in natural language, down to machine-readable digital grammars,
which ultimately have the same structure as our statements.
++Inflection paradigms++
In many languages, the description of inflectional forms occupies a large part of grammar books. Words, in particular
verbs, can have dozens of forms, and there can be dozens of different ways of building those forms. Each type of
inflection is described in a **paradigm**, which is a table including all forms of an example verb. For other
verbs, it is enough to indicate the number of the paradigm, to say that this verb is inflected "in the same way"
as the model verb.
===Nouns===
Computationally, inflection paradigms are **functions** that take as their arguments **stems**, to which suffixes
(and sometimes prefixes) are added. Here is, for instance, the English **regular noun** paradigm:
|| form | singular | plural ||
| nominative | //dog// | //dogs//
| genitive | //dog's// | //dogs'//
As a function, it is interpreted as follows: the word //dog// is the stem to which endings are added. Replacing it
with //cat//, //donkey//, //rabbit//, etc, will yield the forms of these words.
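The "paradigm as function" idea can be sketched in Python as follows (the function name and the feature encoding are ours, not GF's):

```
def regular_noun(stem):
    """Regular English noun paradigm: maps a stem to all four forms."""
    return {
        ("Sg", "Nom"): stem,            # dog
        ("Pl", "Nom"): stem + "s",      # dogs
        ("Sg", "Gen"): stem + "'s",     # dog's
        ("Pl", "Gen"): stem + "s'",     # dogs'
    }

# Replacing the stem yields the forms of another regular noun:
cat_forms = regular_noun("cat")
```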
In addition to nouns that are inflected with exactly the same suffixes as //dog//, English has
inflection types such as //fly-flies//, //kiss-kisses//, //bush-bushes//, //echo-echoes//. Each of these inflection types
could be described by a paradigm of its own. However, it is more attractive to see these as variations of the regular
paradigm, which are predictable by studying the singular nominative. This leads to a generalization of paradigms which
in the RGL are called **smart paradigms**.
Here is the smart paradigm of English nouns. It tells how the plural nominative is formed from the singular; the
genitive forms are always formed by just adding //'s// in the singular and //'// in the plural.
- for nouns ending with //s//, //z//, //x//, //sh//, //ch//, the forms are like //kiss - kisses//
- for nouns ending with a vowel (one of //a//,//e//,//i//,//o//,//u//) followed by //y//, the forms are like //boy - boys//
- for all other nouns ending with //y//, the forms are like //baby - babies//
- for nouns ending with a vowel or //y// followed by //o//, the forms are like //embryo - embryos//
- for all other nouns ending with //o//, the forms are like //echo - echoes//
- for all other nouns, the forms are like //dog - dogs//
The same rules are expressed in GF by **regular expression pattern matching** which, although formal and machine-readable,
might in fact be a nice notation for humans to read as well:
```
"s" | "z" | "x" | "sh" | "ch" => <word, word + "es">
#vowel + "y" => <word, word + "s">
"y" => <word, init word + "ies">
(#vowel | "y") + "o" => <word, word + "s">
"o" => <word, word + "es">
_ => <word, word + "s">
```
In this notation, ``|`` means "or" and ``+`` means "followed by". The pattern that is matched is followed by
an arrow ``=>``, after which the two forms appear within angle brackets. The patterns are matched in the given
order, and ``_`` means "anything that was not matched before". Finally, the function ``init`` returns the
initial segment of a word (e.g. //happ// for //happy//), and the pattern ``#vowel`` is defined as
``"a" | "e" | "i" | "o" | "u"``.
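The same pattern matching is easy to mimic in an ordinary programming language. Here is a Python sketch of the smart paradigm for the plural nominative (the function name and structure are ours):

```
VOWELS = "aeiou"

def plural(word):
    """Smart paradigm for the English plural nominative.
    The cases are tried in the same order as the GF patterns above."""
    if word.endswith(("s", "z", "x", "sh", "ch")):
        return word + "es"                    # kiss - kisses
    if word.endswith("y") and len(word) > 1 and word[-2] in VOWELS:
        return word + "s"                     # boy - boys
    if word.endswith("y"):
        return word[:-1] + "ies"              # baby - babies
    if word.endswith("o") and len(word) > 1 and word[-2] in VOWELS + "y":
        return word + "s"                     # embryo - embryos
    if word.endswith("o"):
        return word + "es"                    # echo - echoes
    return word + "s"                         # dog - dogs
```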
In addition to regular and predictable nouns, English has **irregular nouns**, such as //man - men//,
//formula - formulae//, //ox - oxen//. These nouns have their plural genitive formed by //'s//: //men's//.
===Adjectives===
English adjectives inflect for degree, with three values, and also have an adverbial form in their linearization type.
Here are some regular variations:
- for adjectives ending with consonant + vowel + consonant: //dim, dimmer, dimmest, dimly//
- for adjectives ending with //y// not preceded by a vowel: //happy, happier, happiest, happily//
- for other adjectives: //quick, quicker, quickest, quickly//
The comparison forms only work for adjectives with at most two syllables. For longer ones,
they are formed syntactically: //expensive, more expensive, most expensive//. There are also
some irregular adjectives, the most extreme one being perhaps //good, better, best, well//.
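The regular variations can likewise be captured in smart-paradigm style. The following Python sketch (our own function, covering only the regular cases) returns the positive, comparative, superlative, and adverbial forms:

```
VOWELS = "aeiou"

def adjective_forms(a):
    """Smart paradigm for regular English adjectives."""
    # y not preceded by a vowel: happy - happier - happiest - happily
    if a.endswith("y") and len(a) > 1 and a[-2] not in VOWELS:
        stem = a[:-1]
        return (a, stem + "ier", stem + "iest", stem + "ily")
    # consonant + vowel + consonant: double the final consonant
    if (len(a) >= 3 and a[-1] not in VOWELS + "wxy"
            and a[-2] in VOWELS and a[-3] not in VOWELS):
        return (a, a + a[-1] + "er", a + a[-1] + "est", a + "ly")
    # default: quick - quicker - quickest - quickly
    return (a, a + "er", a + "est", a + "ly")
```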
===Verbs===
English verbs have five different forms, e.g.
//sing, sings, sang, sung, singing//. The exception is the verb //be//, which has some more forms.
But //be// is also special syntactically and semantically, and is in the RGL introduced
in the syntax rather than in the lexicon.
Two forms, the past (indicative) and the past participle are the same for the so-called **regular verbs**
(e.g. //play, plays, played, played, playing//). The regular verb paradigm thus looks as follows:
|| feature | form ||
| infinitive | //play//
| present | //plays//
| past | //played//
| past participle | //played//
| present participle | //playing//
The predictable variations are related to the ones we have seen in nouns and adjectives:
the present tense of verbs varies in the same way as the plural of nouns,
and the past varies in the same way as the comparative of adjectives. The most important variations are
- for verbs ending with //s//, //z//, //x//, //sh//, //ch//: //kiss, kisses, kissed, kissing//
- for verbs ending with consonant + vowel + consonant: //dim, dims, dimmed, dimming//
- for verbs ending with //y// not preceded by a vowel: //cry, cries, cried, crying//
- for verbs ending with //ee//: //free, frees, freed, freeing//
- for verbs ending with //ie//: //die, dies, died, dying//
- for other verbs ending with //e//: //use, uses, used, using//
- for other verbs: //walk, walks, walked, walking//
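These variations can again be sketched as a smart paradigm in Python (our own function, returning infinitive, present, past, past participle, and present participle; for regular verbs the past and the past participle coincide):

```
VOWELS = "aeiou"

def verb_forms(v):
    """Smart paradigm for regular English verbs, cases in the order above."""
    if v.endswith(("s", "z", "x", "sh", "ch")):             # kiss
        return (v, v + "es", v + "ed", v + "ed", v + "ing")
    if (len(v) >= 3 and v[-1] not in VOWELS + "wxy"
            and v[-2] in VOWELS and v[-3] not in VOWELS):   # dim: doubling
        c = v[-1]
        return (v, v + "s", v + c + "ed", v + c + "ed", v + c + "ing")
    if v.endswith("y") and len(v) > 1 and v[-2] not in VOWELS:  # cry
        stem = v[:-1]
        return (v, stem + "ies", stem + "ied", stem + "ied", v + "ing")
    if v.endswith("ee"):                                    # free
        return (v, v + "s", v + "d", v + "d", v + "ing")
    if v.endswith("ie"):                                    # die
        return (v, v + "s", v + "d", v + "d", v[:-2] + "ying")
    if v.endswith("e"):                                     # use
        return (v, v + "s", v + "d", v + "d", v[:-1] + "ing")
    return (v, v + "s", v + "ed", v + "ed", v + "ing")      # walk
```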
English also has a couple of hundred **irregular verbs**, whose infinitive, past, and past participle forms have to be stored
separately. These three forms determine the other forms in the same way as for regular verbs. Thus
- from //cut, cut, cut//, you also get //cuts, cutting//
- from //fly, flew, flown//, you also get //flies, flying//
- from //write, wrote, written//, you also get //writes, writing//
===Structural words===


@@ -3,6 +3,8 @@ Aarne Ranta
%!Encoding:utf8
%!style(html): ../revealpopup.css
%!postproc(tex) : "#BECE" "begin{center}"
%!postproc(html) : "#BECE" "<center>"
%!postproc(tex) : "#ENCE" "end{center}"
@@ -233,7 +235,7 @@ future additions of more subcategories for verbs.
===Table: subcategories of verbs===
|| GF name | text name | example | inherent complement features | semantics ||
| ``V2`` | two-place verb | //love// (//someone//) | case or preposition | ``e -> e -> t``
| ``V3`` | three-place verb | //give// (//something to someone//) | two cases or prepositions | ``e -> e -> e -> t``
| ``VV`` | verb-complement verb | //try// (//to do something//) | infinitive form | ``e -> v -> t``
| ``VS`` | sentence-complement verb | //know// (//that something happens//) | sentence mood | ``e -> t -> t``


@@ -2,4 +2,172 @@
+Syntax: general rules+
The rules of syntax specify how words are combined to **phrases**, and how phrases are combined to even longer phrases.
Phrases, just like words, belong to different categories, which are equipped with inflectional and inherent features and
with semantic types. Moreover, each syntactic rule has a corresponding **semantic rule**, which specifies how the meaning
of the new phrases is constructed from the meanings of its parts.
The RGL has around 30 categories of phrases, on top of the lexical categories. The widest category is ``Text``, which covers
entire texts consisting of sentences, questions, interjections, etc, with punctuation. The following picture shows all RGL
categories as a dependency tree, where ``Text`` is in the root (so it is an upside-down tree), and the lexical categories
in the leaves. Being above another category in the tree means that phrases of higher categories can have phrases of lower
categories as parts. But these dependencies can work in both directions: for instance, the noun phrase (``NP``)
//every man who owns a donkey// has as its part the relative clause (``RCl``), which in turn has as its part the noun phrase
//a donkey//.
===Figure: the principal dependencies of phrasal and lexical categories===
[../categories.png]
Lexical categories appear in boxes rather than ellipses, with several categories gathered in some of the boxes.
++The structure of a clause++
It is convenient to start from the middle of the RGL: from the structure of a **clause** (``Cl``). A clause is an application
of a verb to its arguments. For instance, //John paints the house yellow// is an application of the ``V2A`` verb //paint//
to the arguments //John//, //the house//, and //yellow//. Recalling the table of lexical categories from Chapter 1,
we can summarize the semantic types of these parts as follows:
```
paint : e -> e -> (e -> t) -> t
John : e
the house : e
yellow : e -> t
```
Hence the verb //paint// is a **predicate**, a function that can be applied to arguments to return a proposition.
In this case, we can build the application
```
paint John (the house) yellow : t
```
which is thus an object of type ``t``.
Applying verbs to arguments is how clauses work on the semantic level. However, the syntactic fine-structure is
a bit more complex. The predication process is hence divided into several steps, which involve intermediate categories.
Following these steps, a clause is built by adding one argument at a time. Doing it this way, rather than adding
all arguments at once, has two advantages:
- the grammar doesn't need to specify the same things again and again for different verb categories
- at each step of the construction, a rule other than argument addition can apply - for instance, adding an adverb
Here are the steps in which //John paints the house yellow// is constructed from its arguments in the RGL:
- //paints// and //yellow// are combined to a **verb phrase missing a noun phrase** (``VPSlash``)
- //paints - yellow// and //the house// are combined to a **verb phrase** (``VP``)
- //John// and //paints the house yellow// are combined to a **clause** (``Cl``)
The structure is shown by the following tree:
#BECE
[paint-abstract.png]
#ENCE
This tree is called the **abstract syntax tree** of the sentence. It shows the structural components from which the
sentence has been constructed. Its nodes show the GF names associated with syntax rules and internally used for building
structures. Thus for instance ``PredVP`` encodes the rule that combines a noun phrase and a verb phrase into a clause,
``UsePN`` converts a proper name to a noun phrase, and so on. Mathematically, these names
denote **functions** that build abstract syntax trees from other trees. Every tree belongs to some category.
The GF notation for the ``PredVP`` rule is
```
PredVP : NP -> VP -> Cl
```
in words, ``PredVP`` //is a function that takes a noun phrase and a verb phrase and returns a clause//.
The tree is thus in fact built by function applications. A computer-friendly notation for trees uses
parentheses rather than graphical trees:
```
PredVP
(UsePN john_PN)
(ComplSlash
(SlashV2A paint_V2A (PositA yellow_A))
(DetCN (DetQuant DefArt NumSg) (UseN house_N)))
```
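Such trees are simple nested data. As a hypothetical illustration (not part of GF), one can encode them as nested Python tuples and recover the parenthesized notation with a small printer:

```
# The abstract syntax tree as nested tuples: (function, argument, ...).
tree = ("PredVP",
        ("UsePN", "john_PN"),
        ("ComplSlash",
         ("SlashV2A", "paint_V2A", ("PositA", "yellow_A")),
         ("DetCN", ("DetQuant", "DefArt", "NumSg"), ("UseN", "house_N"))))

def show(t):
    """Render a tree in the parenthesized notation used above."""
    if isinstance(t, str):
        return t
    head, *args = t
    return " ".join([head] + ["(" + show(a) + ")" if isinstance(a, tuple) else a
                              for a in args])
```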
Before going to the details of phrasal categories and rules, let us compare the abstract syntax tree with
another tree, known as **parse tree** or **concrete syntax tree**:
#BECE
[paint-concrete.png]
#ENCE
This tree shows, on its leaves, the clause that results from the combination of categories. Each node
is labelled with the category to which the part of the clause under it belongs. As shown by the label
``VPSlash``, this part can consist of many separate groups of words, where words from constructions from
higher up are inserted.
As parse trees display the actual words of a particular language, in a language-specific
order, they are less interesting from the multilingual point of view than the abstract syntax trees.
A GF grammar is thus primarily specified by its abstract syntax functions, which are language-neutral,
and secondarily by the **linearization rules** that convert them to different languages.
Let us specify the phrasal categories that are used for making up predications. The lexical category ``V2A`` of
two-place adjective-complement verbs was explained in Chapter 1.
===Table: phrasal categories involved in predication===
|| GF name | text name | example | inflection features | inherent features | parts | semantics ||
| ``Cl`` | clause | //he paints it blue// | temporal, polarity | (none) | one | ``t``
| ``VP`` | verb phrase | //paints it blue// | temporal, polarity, agreement | subject case | verb, complement | ``e -> t``
| ``VPSlash`` | slash verb phrase | //paints - blue// | temporal, polarity, agreement | subject and complement case | verb, complement | ``e -> e -> t``
| ``NP`` | noun phrase | //the house// | case | agreement | one | ``(e -> t) -> t``
| ``AP`` | adjectival phrase | //very blue// | gender, number, case | position | one | ``a`` = ``e -> t``
TODO explain **agreement** and **temporal**.
TODO explain the semantic type of ``NP``.
The functions that build up the clause in our example tree are given in the following table, together with functions that
build the semantics of the constructed trees. The latter functions operate on variables belonging to the semantic types of
the arguments of the function.
===Table: abstract syntax functions involved in predication===
|| GF name | type | example | semantics ||
| ``PredVP`` | ``NP -> VP -> Cl`` | //he// + //paints the house blue// | ``np vp``
| ``ComplSlash`` | ``VPSlash -> NP -> VP`` | //paints - blue// + //the house// | ``\x -> np (\y -> vpslash x y)``
| ``SlashV2A`` | ``V2A -> AP -> VPSlash`` | //paints// + //blue// | ``\x,y -> v2a x y ap``
TODO explain lambda abstraction.
The semantics of the clause //John paints the house yellow// can now be computed from the assumed meanings
```
John* : e
paint* : e -> e -> (e -> t) -> t
the_house* : e
yellow* : e -> t
```
as follows:
```
(PredVP John (ComplSlash (SlashV2A paint yellow) the_house))*
= (ComplSlash (SlashV2A paint yellow) the_house)* John*
= (SlashV2A paint yellow)* John* the_house*
= paint* John* the_house* yellow*
```
for the moment ignoring the internal structure of noun phrases, which will be explained later.
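The computation can be replayed with higher-order functions. In the following Python sketch (all names are ours), entities are symbolic strings, the meaning of //paint// builds a logical-form string, and the AP meaning is simplified to a symbol; as in the derivation, the subject is applied directly, ignoring the internal NP structure:

```
# Assumed meanings, with the object NP type-raised: (e -> t) -> t.
paint = lambda x: lambda y: lambda p: f"paint({x},{y},{p})"
john = "John"
the_house = lambda p: p("the_house")
yellow = "yellow"

# SlashV2A:   \x,y -> v2a x y ap
vpslash = lambda x: lambda y: paint(x)(y)(yellow)
# ComplSlash: \x -> np (\y -> vpslash x y)
vp = lambda x: the_house(lambda y: vpslash(x)(y))
# PredVP: apply the verb phrase to the subject
clause = vp(john)   # "paint(John,the_house,yellow)"
```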
The linearization rules work very much in the same way as the semantic rules. They obey the definitions of
inflectional and inherent features and discontinuous parts, which together define linearization types of
the phrasal categories. These types are at this point schematic, because we don't assume any particular
language. But what we can read out from the category table above is as follows:
===Table: schematic linearization types===
|| GF name | text name | linearization type ||
| ``Cl`` | clause | ``{s : Temp => Pol => Str}``
| ``VP`` | verb phrase | ``{s : Temp => Pol => Agr => {verb,compl : Str} ; sc : Case}``
| ``VPSlash`` | slash verb phrase | ``{s : Temp => Pol => Agr => {verb,compl : Str} ; sc, cc : Case}``
| ``NP`` | noun phrase | ``{s : Case => Str ; a : Agr}``
| ``AP`` | adjectival phrase | ``{s : Gender => Number => Case => Str ; isPre : Bool}``
These types suggest the following linearization rules:
```
PredVP np vp = {s = \\t,p => let vps = vp.s ! t ! p ! np.a in np.s ! vp.sc ++ vps.verb ++ vps.compl}
```
TODO linearization of the example
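To see the string assembly concretely, here is a Python sketch of the ``PredVP`` rule over toy record values (all data hypothetical), with the inflection table of the verb phrase simplified to a function of tense, polarity, and agreement:

```
# Toy NP and VP records mirroring the schematic linearization types.
np = {"s": {"Nom": "John", "Gen": "John's"},      # s : Case => Str
      "a": "Sg3"}                                 # a : Agr
vp = {"s": lambda t, p, a: {"verb": "paints",     # s : Temp => Pol => Agr => ...
                            "compl": "the house yellow"},
      "sc": "Nom"}                                # sc : Case (subject case)

def PredVP(np, vp):
    """Combine a noun phrase and a verb phrase into a clause record."""
    def s(t, p):
        vps = vp["s"](t, p, np["a"])   # select forms by tense, polarity, agreement
        return " ".join([np["s"][vp["sc"]], vps["verb"], vps["compl"]])
    return {"s": s}

# PredVP(np, vp)["s"]("Pres", "Pos") yields "John paints the house yellow"
```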
Rules similar to those for ``V2A`` apply to all subcategories of verbs. The ``V2`` verbs are first made into ``VPSlash``
by giving the non-NP complement. ``V3`` verbs can take their two NP complements in either order, which
means that there are two ``VPSlash``-producing rules. This
makes it possible to form both the questions //what did she give him// and //whom did she give it//.
The other ``V`` categories are turned into ``VP`` without going through ``VPSlash``, since they have
no noun phrase complements.
