diff --git a/lib/doc/languages/gf-english-1.txt b/lib/doc/languages/gf-english-1.txt index 8a4862c9a..33ddf1013 100644 --- a/lib/doc/languages/gf-english-1.txt +++ b/lib/doc/languages/gf-english-1.txt @@ -23,14 +23,16 @@ of the verb, but it may be separated from it by an object: //Please switch it of | ``Person`` | person | first, second, third | ``Case`` | case | nominative, genitive | ``Degree`` | degree | positive, comparative, superlative +| ``AForm`` | adjective form | degrees, adverbial | ``VForm`` | verb form | infinitive, present, past, past participle, present participle +| ``VVType`` | infinitive form (for a VV) | bare infinitive, //to// infinitive, //ing// form -The assignment of parameter types and the identification of the separate parts of categories defines a +The assignment of parameter types and the identification of the separate parts of categories defines the **data structures** in which the words are stored in a lexicon. -This data structure is in GF called the **linearization type** of the category. From the computational +This data structure is in GF called the **linearization type** of the category. From the computer's point of view, it is important that the data structures are well defined for all words, even if this may -sound unnecessary. For instance, since some verbs need a particle part, all verbs must uniformly have a +sound unnecessary for the human. For instance, since some verbs need a particle part, all verbs must uniformly have a storage for this particle, even if it is empty most of the time. This property is guaranteed by an operation called **type checking**. It is performed by GF as a part of **grammar compilation**, which is the process in which the human-readable description of the grammar is converted to bits executable @@ -42,7 +44,7 @@ by the computer. 
|| GF name | text name | example | inflectional features | inherent features || | ``N`` | noun | //house// | number, case | (none) | ``PN`` | proper name | //Paris// | case | (none) -| ``A`` | adjective | //blue// | degree | (none) +| ``A`` | adjective | //blue// | adjective form | (none) | ``V`` | verb | //sleep// | verb form | particle | ``Adv`` | adverb | //here// | (none) | (none) | ``V2`` | two-place verb | //love// | verb form | particle, preposition @@ -56,3 +58,149 @@ but a string. We have done the same with the preposition strings that define the complement features of verb and other subcategories. +The "digital grammar" representations of these types are **records**, where for instance the ``VV`` +record type is formally written +``` + {s : VForm => Str ; p : Str ; i : VVType} +``` +The record has **fields** for different types of data. In the record above, there are three fields: +- the field labelled ``s``, storing an **inflection table** that produces a **string** (``Str``) depending on verb form, +- the field labelled ``p``, storing a string representing the particle, +- the field labelled ``i``, storing an inherent feature for the infinitive form required + + +Thus for instance the record for the verb-complement verb //try// (//to do something//) in the lexicon looks as follows: +``` + {s = table { + VInf => "try" ; + VPres => "tries" ; + VPast => "tried" ; + VPastPart => "tried" ; + VPresPart => "trying" + } ; + p = "" ; + i = VVInf + } +``` +We have not introduced the GF names of the features, as we will not make essential use of them: we will prefer +informal explanations for all rules. So these records are a little hint for those who want to understand the +whole chain, from the rules as we state them in natural language, down to machine-readable digital grammars, +which ultimately have the same structure as our statements. 
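For readers more used to general-purpose languages, the record for //try// can be approximated in Python. This is only a rough analogue, not GF: the class name ``VV``, the tuple ``VFORMS`` and the use of a dictionary for the inflection table are assumptions made for this sketch. The check in ``__post_init__`` imitates, very loosely, what type checking guarantees: every verb has a string for every verb form and a (possibly empty) particle.

```python
from dataclasses import dataclass

# Verb forms, mirroring the VForm values used in the record above.
VFORMS = ("VInf", "VPres", "VPast", "VPastPart", "VPresPart")


@dataclass(frozen=True)
class VV:
    s: dict  # inflection table: verb form -> string
    p: str   # particle, empty for most verbs
    i: str   # inherent feature: infinitive form required of the complement

    def __post_init__(self):
        # Loose imitation of type checking: the table must cover every form.
        missing = [f for f in VFORMS if f not in self.s]
        if missing:
            raise ValueError(f"missing verb forms: {missing}")


try_VV = VV(
    s={"VInf": "try", "VPres": "tries", "VPast": "tried",
       "VPastPart": "tried", "VPresPart": "trying"},
    p="",
    i="VVInf",
)

# Selecting a form from the inflection table.
print(try_VV.s["VPast"])
```

A verb with an incomplete table, such as ``VV(s={"VInf": "go"}, p="", i="VVInf")``, is rejected with a ``ValueError`` in this sketch.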
+ + +++Inflection paradigms++ + +In many languages, the description of inflectional forms occupies a large part of grammar books. Words, in particular +verbs, can have dozens of forms, and there can be dozens of different ways of building those forms. Each type of +inflection is described in a **paradigm**, which is a table including all forms of an example verb. For other +verbs, it is enough to indicate the number of the paradigm, to say that this verb is inflected "in the same way" +as the model verb. + + +===Nouns=== + +Computationally, inflection paradigms are **functions** that take as their arguments **stems**, to which suffixes +(and sometimes prefixes) are added. Here is, for instance, the English **regular noun** paradigm: + +|| form | singular | plural || +| nominative | //dog// | //dogs// +| genitive | //dog's// | //dogs'// + +As a function, it is interpreted as follows: the word //dog// is the stem to which endings are added. Replacing it +with //cat//, //donkey//, //rabbit//, etc., will yield the forms of these words. + +In addition to nouns that are inflected with exactly the same suffixes as //dog//, English has +inflection types such as //fly-flies//, //kiss-kisses//, //bush-bushes//, //echo-echoes//. Each of these inflection types +could be described by a paradigm of its own. However, it is more attractive to see these as variations of the regular +paradigm, which are predictable by studying the singular nominative. This leads to a generalization of paradigms which +in the RGL are called **smart paradigms**. + +Here is the smart paradigm of English nouns. It tells how the plural nominative is formed from the singular; the +genitive forms are always formed by just adding //'s// in the singular and //'// in the plural. 
+- for nouns ending with //s//, //z//, //x//, //sh//, //ch//, the forms are like //kiss - kisses// +- for nouns ending with a vowel (one of //a//,//e//,//i//,//o//,//u//) followed by //y//, the forms are like //boy - boys// +- for all other nouns ending with //y//, the forms are like //baby - babies// +- for nouns ending with a vowel or //y// followed by //o//, the forms are like //embryo - embryos// +- for all other nouns ending with //o//, the forms are like //echo - echoes// +- for all other nouns, the forms are like //dog - dogs// + + +The same rules are in GF expressed by **regular expression pattern matching** which, although formal and machine-readable, +might in fact be a nice notation for humans to read as well: +``` + "s" | "z" | "x" | "sh" | "ch" => + #vowel + "y" => + "y" => + (#vowel | "y") + "o" => + "o" => + _ => +``` +In this notation, ``|`` means "or" and ``+`` means "followed by". The pattern that is matched is followed by +an arrow ``=>``, after which the two forms appear within angle brackets. The patterns are matched in the given +order, and ``_`` means "anything that was not matched before". Finally, the function ``init`` returns the +initial segment of a word (e.g. //happ// for //happy//), and the pattern ``#vowel`` is defined as +``"a" | "e" | "i" | "o" | "u"``. + +In addition to regular and predictable nouns, English has **irregular nouns**, such as //man - men//, +//formula - formulae//, //ox - oxen//. These nouns have their plural genitive formed by //'s//: //men's//. + + + +===Adjectives=== + +English adjectives inflect for degree, with three values, and also have an adverbial form in their linearization type. 
+Here are some regular variations: +- for adjectives ending with consonant + vowel + consonant: //dim, dimmer, dimmest, dimly// +- for adjectives ending with //y// not preceded by a vowel: //happy, happier, happiest, happily// +- for other adjectives: //quick, quicker, quickest, quickly// + + +The comparison forms only work for adjectives with at most two syllables. For longer ones, +they are formed syntactically: //expensive, more expensive, most expensive//. There are also +some irregular adjectives, the most extreme one being perhaps //good, better, best, well//. + + + +===Verbs=== + +English verbs have five different forms, e.g. //sing, sings, sang, sung, singing//. +The exception is the verb //be//, which has some more forms. +But //be// is also special syntactically and semantically, and is in the RGL introduced +in the syntax rather than in the lexicon. + +Two forms, the past (indicative) and the past participle, are the same for the so-called **regular verbs** +(e.g. //play, plays, played, played, playing//). The regular verb paradigm thus looks as follows: + +|| feature | form || +| infinitive | //play// +| present | //plays// +| past | //played// +| past participle | //played// +| present participle | //playing// + +The predictable variations are related to the ones we have seen in nouns and adjectives: +the present tense of verbs varies in the same way as the plural of nouns, +and the past varies in the same way as the comparative of adjectives. 
The most important variations are +- for verbs ending with //s//, //z//, //x//, //sh//, //ch//: //kiss, kisses, kissed, kissing// +- for verbs ending with consonant + vowel + consonant: //dim, dims, dimmed, dimming// +- for verbs ending with //y// not preceded by a vowel: //cry, cries, cried, crying// +- for verbs ending with //ee//: //free, frees, freed, freeing// +- for verbs ending with //ie//: //die, dies, died, dying// +- for other verbs ending with //e//: //use, uses, used, using// +- for other verbs: //walk, walks, walked, walking// + + + +English also has a couple of hundred **irregular verbs**, whose infinitive, past, and past participle forms have to be stored +separately. These three forms determine the other forms in the same way as regular verbs. Thus +- from //cut, cut, cut//, you also get //cuts, cutting// +- from //fly, flew, flown//, you also get //flies, flying// +- from //write, wrote, written//, you also get //writes, writing// + + + +===Structural words=== + + + + diff --git a/lib/doc/languages/gf-general-1.txt b/lib/doc/languages/gf-general-1.txt index 8d518ed16..2729f5865 100644 --- a/lib/doc/languages/gf-general-1.txt +++ b/lib/doc/languages/gf-general-1.txt @@ -3,6 +3,8 @@ Aarne Ranta %!Encoding:utf8 +%!style(html): ../revealpopup.css + %!postproc(tex) : "#BECE" "begin{center}" %!postproc(html) : "#BECE" "
" %!postproc(tex) : "#ENCE" "end{center}" @@ -233,7 +235,7 @@ future additions of more subcategories for verbs. ===Table: subcategories of verbs=== || GF name | text name | example | inherent complement features | semantics || -| ``V2`` | two-place verb | //love// (//someone// | case or preposition | ``e -> e -> t`` +| ``V2`` | two-place verb | //love// (//someone//) | case or preposition | ``e -> e -> t`` | ``V3`` | three-place verb | //give// (//something to someone//) | two cases or prepositions | ``e -> e -> e -> t`` | ``VV`` | verb-complement verb | //try// (//to do something//) | infinitive form | ``e -> v -> t`` | ``VS`` | sentence-complement verb | //know// (//that something happens//) | sentence mood | ``e -> t -> t`` diff --git a/lib/doc/languages/gf-general-2.txt b/lib/doc/languages/gf-general-2.txt index 100c8b794..cc7596ebb 100644 --- a/lib/doc/languages/gf-general-2.txt +++ b/lib/doc/languages/gf-general-2.txt @@ -2,4 +2,172 @@ +Syntax: general rules+ +The rules of syntax specify how words are combined to **phrases**, and how phrases are combined to even longer phrases. +Phrases, just like words, belong to different categories, which are equipped with inflectional and inherent features and +with semantic types. Moreover, each syntactic rule has a corresponding **semantic rule**, which specifies how the meaning +of the new phrases is constructed from the meanings of its parts. +The RGL has around 30 categories of phrases, on top of the lexical categories. The widest category is ``Text``, which cover +entire texts consisting of sentences, questions, interjections, etc, with punctuation. The following picture shows all RGL +categories as a dependency tree, where ``Text`` is in the root (so it is an upside-down tree), and the lexical categories +in the leaves. Being above another category in the tree means that phrases of higher categories can have phrases of lower +categories as parts. 
But these dependencies can work in both directions: for instance, the noun phrase (``NP``) +//every man who owns a donkey// has as its part the relative clause (``RCl``), which in turn has as its part the noun phrase +//a donkey//. + +===Figure: the principal dependencies of phrasal and lexical categories=== + +[../categories.png] + +Lexical categories appear in boxes rather than ellipses, with several categories gathered in some of the boxes. + + +++The structure of a clause++ + +It is convenient to start from the middle of the RGL: from the structure of a **clause** (``Cl``). A clause is an application +of a verb to its arguments. For instance, //John paints the house yellow// is an application of the ``V2A`` verb //paint// +to the arguments //John//, //the house//, and //yellow//. Recalling the table of lexical categories from Chapter 1, +we can summarize the semantic types of these parts as follows: +``` + paint : e -> e -> (e -> t) -> t + John : e + the house : e + yellow : e -> t +``` +Hence the verb //paint// is a **predicate**, a function that can be applied to arguments to return a proposition. +In this case, we can build the application +``` + paint John (the house) yellow : t +``` +which is thus an object of type ``t``. + +Applying verbs to arguments is how clauses work on the semantic level. However, the syntactic fine-structure is +a bit more complex. The predication process is hence divided into several steps, which involve intermediate categories. +Following these steps, a clause is built by adding one argument at a time. 
Doing it this way, rather than adding +all arguments at once, has two advantages: +- the grammar doesn't need to specify the same things again and again for different verb categories +- at each step of construction, some other rule could apply than adding an argument - for instance, adding an adverb + + +Here are the steps in which //John paints the house yellow// is constructed from its arguments in the RGL: +- //paints// and //yellow// are combined into a **verb phrase missing a noun phrase** (``VPSlash``) +- //paints - yellow// and //the house// are combined into a **verb phrase** (``VP``) +- //John// and //paints the house yellow// are combined into a **clause** (``Cl``) + + + +The structure is shown by the following tree: + +#BECE +[paint-abstract.png] +#ENCE +This tree is called the **abstract syntax tree** of the sentence. It shows the structural components from which the +sentence has been constructed. Its nodes show the GF names associated with syntax rules and internally used for building +structures. Thus for instance ``PredVP`` encodes the rule that combines a noun phrase and a verb phrase into a clause, +``UsePN`` converts a proper name to a noun phrase, and so on. Mathematically, these names +denote **functions** that build abstract syntax trees from other trees. Every tree belongs to some category. +The GF notation for the ``PredVP`` rule is +``` + PredVP : NP -> VP -> Cl +``` +in words, ``PredVP`` //is a function that takes a noun phrase and a verb phrase and returns a clause//. + +The tree is thus in fact built by function applications. 
A computer-friendly notation for trees uses +parentheses rather than graphical trees: +``` + PredVP + (UsePN john_PN) + (ComplSlash + (SlashV2A paint_V2A (PositA yellow_A)) + (DetCN (DetQuant DefArt NumSg) (UseN house_N))) +``` +Before going to the details of phrasal categories and rules, let us compare the abstract syntax tree with +another tree, known as a **parse tree** or **concrete syntax tree**: + + +#BECE +[paint-concrete.png] +#ENCE +This tree shows, on its leaves, the clause that results from the combination of categories. Each node +is labelled with the category to which the part of the clause under it belongs. As shown by the label +``VPSlash``, this part can consist of many separate groups of words, where words from constructions from +higher up are inserted. + +As parse trees display the actual words of a particular language, in a language-specific +order, they are less interesting from the multilingual point of view than the abstract syntax trees. +A GF grammar is thus primarily specified by its abstract syntax functions, which are language-neutral, +and secondarily by the **linearization rules** that convert them to different languages. + +Let us specify the phrasal categories that are used for making up predications. The lexical category ``V2A`` of +two-place adjective-complement verbs was explained in Chapter 1. 
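The idea that every abstract syntax tree belongs to a category, determined by the types of the functions it is built from, can be sketched in a few lines of Python. The signatures of ``PredVP``, ``ComplSlash``, ``SlashV2A`` and ``UsePN`` follow the text; the remaining signatures, and the categories ``Det``, ``CN``, ``Quant`` and ``Num`` they mention, are not introduced in this chapter and are filled in here as assumptions based on the RGL. The table ``SIGS`` and the function ``category`` are names invented for this sketch.

```python
# Function signatures: name -> (argument categories, result category).
# Lexical items and constants are functions without arguments.
SIGS = {
    "PredVP":     (("NP", "VP"), "Cl"),
    "UsePN":      (("PN",), "NP"),
    "ComplSlash": (("VPSlash", "NP"), "VP"),
    "SlashV2A":   (("V2A", "AP"), "VPSlash"),
    "PositA":     (("A",), "AP"),            # assumed from the RGL
    "DetCN":      (("Det", "CN"), "NP"),     # assumed from the RGL
    "DetQuant":   (("Quant", "Num"), "Det"), # assumed from the RGL
    "UseN":       (("N",), "CN"),            # assumed from the RGL
    "DefArt":     ((), "Quant"),
    "NumSg":      ((), "Num"),
    "john_PN":    ((), "PN"),
    "paint_V2A":  ((), "V2A"),
    "yellow_A":   ((), "A"),
    "house_N":    ((), "N"),
}


def category(tree):
    """Return the category of a tree, checking every function application."""
    fun, *args = tree
    arg_cats, result = SIGS[fun]
    found = tuple(category(a) for a in args)
    if found != arg_cats:
        raise TypeError(f"{fun} expects {arg_cats}, got {found}")
    return result


# The tree of "John paints the house yellow" as nested tuples.
tree = ("PredVP",
        ("UsePN", ("john_PN",)),
        ("ComplSlash",
         ("SlashV2A", ("paint_V2A",), ("PositA", ("yellow_A",))),
         ("DetCN", ("DetQuant", ("DefArt",), ("NumSg",)),
                   ("UseN", ("house_N",)))))

print(category(tree))  # prints Cl
```

An ill-formed tree, for instance ``PredVP`` applied to two noun phrases, raises a ``TypeError`` in this sketch, which is roughly what GF's type checking does for abstract syntax.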
+ +===Table: phrasal categories involved in predication=== + +|| GF name | text name | example | inflection features | inherent features | parts | semantics || +| ``Cl`` | clause | //he paints it blue// | temporal, polarity | (none) | one | ``t`` +| ``VP`` | verb phrase | //paints it blue// | temporal, polarity, agreement | subject case | verb, complement | ``e -> t`` +| ``VPSlash`` | slash verb phrase | //paints - blue// | temporal, polarity, agreement | subject and complement case | verb, complement | ``e -> e -> t`` +| ``NP`` | noun phrase | //the house// | case | agreement | one | ``(e -> t) -> t`` +| ``AP`` | adjectival phrase | //very blue// | gender, number, case | position | one | ``a`` = ``e -> t`` + +TODO explain **agreement** and **temporal**. + +TODO explain the semantic type of ``NP``. + +The functions that build up the clause in our example tree are given in the following table, together with functions that +build the semantics of the constructed trees. The latter functions operate on variables belonging to the semantic types of +the arguments of the function. + +===Table: abstract syntax functions involved in predication=== + +|| GF name | type | example | semantics || +| ``PredVP`` | ``NP -> VP -> Cl`` | //he// + //paints the house blue// | ``np vp`` +| ``ComplSlash`` | ``VPSlash -> NP -> VP`` | //paints - blue// + //the house// | ``\x -> np (\y -> vpslash x y)`` +| ``SlashV2A`` | ``V2A -> AP -> VPSlash`` | //paints// + //blue// | ``\x,y -> v2a x y ap`` + +TODO explain lambda abstraction. 
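The semantic functions in the table above can be tried out in Python, where ``lambda x: ...`` plays the role of ``\x -> ...``. The following is a toy sketch under loud assumptions: the "world" consisting of the sets ``PAINTS`` and ``YELLOW_THINGS`` is invented, truth values stand in for type ``t`` and strings for type ``e``, and the meaning given to ``paint`` is just one simple choice. Noun phrases are modelled with the type ``(e -> t) -> t`` from the category table, so that ``PredVP`` can apply the noun phrase to the verb phrase.

```python
# An invented toy world: who paints what, and what ends up yellow.
PAINTS = {("John", "the_house")}
YELLOW_THINGS = {"the_house"}


def yellow(x):                      # AP : e -> t
    return x in YELLOW_THINGS


def paint(x):                       # V2A : e -> e -> (e -> t) -> t
    return lambda y: lambda a: (x, y) in PAINTS and a(y)


# Noun phrases of type (e -> t) -> t: apply a predicate to an entity.
def john(p):
    return p("John")


def mary(p):
    return p("Mary")


def the_house(p):
    return p("the_house")


# Semantic rules, transcribed from the table.
def SlashV2A(v2a, ap):              # \x,y -> v2a x y ap
    return lambda x: lambda y: v2a(x)(y)(ap)


def ComplSlash(vpslash, np):        # \x -> np (\y -> vpslash x y)
    return lambda x: np(lambda y: vpslash(x)(y))


def PredVP(np, vp):                 # np vp
    return np(vp)


vp = ComplSlash(SlashV2A(paint, yellow), the_house)
print(PredVP(john, vp))  # prints True
print(PredVP(mary, vp))  # prints False
```

Because each rule adds one argument at a time, the same ``vp`` value can be combined with different subjects, which mirrors the stepwise construction of the clause described earlier.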
+ +The semantics of the clause //John paints the house yellow// can now be computed from the assumed meanings +``` + John* : e + paint* : e -> e -> (e -> t) -> t + the_house* : e + yellow* : e -> t +``` +as follows: +``` + (PredVP John (ComplSlash (SlashV2A paint yellow) the_house))* + = (ComplSlash (SlashV2A paint yellow) the_house)* John* + = (SlashV2A paint yellow)* John* the_house* + = paint* John* the_house* yellow* +``` +for the moment ignoring the internal structure of noun phrases, which will be explained later. + +The linearization rules work very much in the same way as the semantic rules. They obey the definitions of +inflectional and inherent features and discontinuous parts, which together define linearization types of +the phrasal categories. These types are at this point schematic, because we don't assume any particular +language. But what we can read out from the category table above is as follows: + +===Table: schematic linearization types=== + +|| GF name | text name | linearization type || +| ``Cl`` | clause | ``{s : Temp => Pol => Str}`` +| ``VP`` | verb phrase | ``{s : Temp => Pol => Agr => {verb,compl : Str} ; sc : Case}`` +| ``VPSlash`` | slash verb phrase | ``{s : Temp => Pol => Agr => {verb,compl : Str} ; sc, cc : Case}`` +| ``NP`` | noun phrase | ``{s : Case => Str ; a : Agr}`` +| ``AP`` | adjectival phrase | ``{s : Gender => Number => Case => Str ; isPre : Bool}`` + + +These types suggest the following linearization rules: +``` + PredVP np vp = {s = \\t,p => np.s ! vp.sc ++ vps.verb ++ vps.compl where vps = vp.s ! t ! p ! np.a} +``` +TODO linearization of the example + + +Rules similar to those for ``V2A`` apply to all subcategories of verbs. The ``V2`` verbs are first made into ``VPSlash`` +by giving the non-NP complement. ``V3`` verbs can take their two NP complements in either order, which +means that there are two ``VPSlash``-producing rules. 
This +makes it possible to form both the questions //what did she give him// and //whom did she give it//. +The other ``V`` categories are turned into ``VP`` without going through ``VPSlash``, since they have +no noun phrase complements. diff --git a/lib/doc/languages/paint-abstract.png b/lib/doc/languages/paint-abstract.png new file mode 100644 index 000000000..e7420eba3 Binary files /dev/null and b/lib/doc/languages/paint-abstract.png differ diff --git a/lib/doc/languages/paint-concrete.png b/lib/doc/languages/paint-concrete.png new file mode 100644 index 000000000..e786eaced Binary files /dev/null and b/lib/doc/languages/paint-concrete.png differ