diff --git a/lib/doc/languages/Makefile b/lib/doc/languages/Makefile new file mode 100644 index 000000000..9ea49db68 --- /dev/null +++ b/lib/doc/languages/Makefile @@ -0,0 +1,5 @@ +all: english + +english: + txt2tags -thtml --toc gf-english.txt + diff --git a/lib/doc/languages/gf-english.txt b/lib/doc/languages/gf-english.txt new file mode 100644 index 000000000..dad7139d6 --- /dev/null +++ b/lib/doc/languages/gf-english.txt @@ -0,0 +1,183 @@ +English: A Digital Grammar +Aarne Ranta +%%date + + +%!postproc(tex) : "#BECE" "begin{center}" +%!postproc(html) : "#BECE" "<center>" +%!postproc(tex) : "#ENCE" "end{center}" +%!postproc(html) : "#ENCE" "</center>" + + **Digital grammars** are grammars usable by computers, so that they can mechanically perform +tasks like interpreting, producing, and translating languages. The **GF Resource Grammar Library** +(RGL) is a set of digital grammars which, at the time of writing, covers 28 languages. These grammars +are written in GF, **Grammatical Framework**, which is a programming language designed for +writing digital grammars. + +The grammars in the RGL have been written by linguists, computer scientists, and +programmers who know the languages thoroughly, both in practice and in theory. Almost 50 people from +around the world have contributed to this work, and ongoing projects are expected to give us many new +languages soon. + +The leading idea of the RGL is that different languages share large parts of their grammars, despite +their observed differences. One important thing that is shared is the set of **categories**, that is, the +types of words and expressions. For instance, every language in the RGL has a category of **nouns**, but +what exactly a noun is varies from language to language. Thus English nouns have four forms +(singular and plural, nominative and genitive, as in //house, houses, house's, houses'//) +whereas French nouns have just two forms (singular and plural //maison, maisons//, "house"), but they also +have a piece of information that English nouns don't have, namely gender (masculine and feminine). +Chinese nouns have just one form (房子 //fangzi// "house"), which is used for both singular and plural, but in +addition, a little bit like the French gender, they have a **classifier** (间 //jian// for the word +"house"). German nouns have 8 forms and a gender, Finnish nouns have 26 forms, and so on. + + + ++Lexical categories+ + +Categories of words are called **lexical categories**. +The language-specific variation in lexical categories is due to **morphology**, that is, the different forms that +one and the same word can have in different contexts.
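The noun examples from the introduction (English //house//, French //maison//, Chinese //fangzi//) can be pictured as small records. The Python sketch below is only an illustration under stated assumptions: it is not GF code and not the RGL's actual representation, the names `EnglishNoun`, `english_noun`, `french_noun` and so on are hypothetical, and the paradigms are naive (they only add //-s//). The different forms of a word are stored in a table indexed by number and case, while information that does not give rise to different forms, such as French gender or the Chinese classifier, is a single value per word.

```python
from dataclasses import dataclass
from enum import Enum


class Number(Enum):
    SG = "singular"
    PL = "plural"


class Case(Enum):
    NOM = "nominative"
    GEN = "genitive"


class Gender(Enum):
    MASC = "masculine"
    FEM = "feminine"


@dataclass
class EnglishNoun:
    forms: dict  # one form per (Number, Case) pair: four forms in total


@dataclass
class FrenchNoun:
    forms: dict      # one form per Number: two forms in total
    gender: Gender   # a single fixed value; not a choice between forms


@dataclass
class ChineseNoun:
    form: str        # one form, used for both singular and plural
    classifier: str  # a single fixed value, a little like French gender


def english_noun(sg: str) -> EnglishNoun:
    # Hypothetical naive paradigm: plural in -s, genitive with 's,
    # and a bare apostrophe after the -s of the plural.
    pl = sg + "s"
    return EnglishNoun({
        (Number.SG, Case.NOM): sg,
        (Number.PL, Case.NOM): pl,
        (Number.SG, Case.GEN): sg + "'s",
        (Number.PL, Case.GEN): pl + "'",
    })


def french_noun(sg: str, gender: Gender) -> FrenchNoun:
    # Hypothetical naive paradigm: plural in -s.
    return FrenchNoun({Number.SG: sg, Number.PL: sg + "s"}, gender)


house = english_noun("house")
maison = french_noun("maison", Gender.FEM)
fangzi = ChineseNoun("房子", classifier="间")
```

Real paradigms need many more patterns (//child, children//), but the split between a table of forms and fixed per-word information is the point of the sketch.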
If we look at the 28 languages in the RGL, we can +see that the classification of words is common to all the languages, and the +differences are in morphology. In this chapter, we will explain all lexical categories and give an overview +of their morphological aspects. Details of morphology for each language are given in the language-specific documents. + + +++Main parts of speech: content words++ + +The most important categories of words are given in the following table. More precisely, we will give the +categories of **content words**, which, so to speak, describe things and events in the real world. +Content words are distinguished from **structural words**, whose purpose is to combine words into syntactic +structures. Each category of content words may have thousands of words, and new words can be introduced +continuously; therefore, these categories are also called **open categories**. In contrast, structural +words are very few (perhaps a few dozen), and new ones are very seldom added. + +Each category has a GF name, that is, a short symbolic name, which is the name actually used in the GF program code. +In the text we usually use the text names, but we will sometimes find the GF names handy as well, since they +give us a short and precise way to state grammatical rules. + + +===Table: categories of content words=== + +|| GF name | text name | example | inflectional features | inherent features || +| ``N`` | noun | //house// | number, case | gender, classifier +| ``PN`` | proper name | //Paris// | case | gender +| ``A`` | adjective | //blue// | gender, number, case, degree | position +| ``V`` | verb | //sleep// | number, person, tense, aspect, mood | subject case +| ``Adv`` | adverb | //here// | (none) | adverb type (place, time, manner) + + +In addition to the names and examples, the table lists the **inflectional features** and **inherent features** +typical of each category. Inflectional features are those that create different forms of words.
For instance, +French nouns have forms for number (singular and plural), or, as one often says, +French nouns are //inflected for number//. In contrast to number, gender does not give rise to different forms +of French nouns: //maison// ("house") //is// feminine, inherently, and there is no masculine form of //maison//. +(Of course, there are some nouns that do have masculine and feminine forms, such as //étudiant, étudiante// +"male/female student", but this only applies to a minority of French nouns and shouldn't be taken as an +indication of an inflectional gender.) + + +++Syntactic implications++ + +The features given in the table are rough indications of what one can expect in different languages. For +instance, some languages have no gender at all, and hence their nouns and adjectives won't have +genders either. But the table is a rather good generalization from the 28 languages of the RGL: we can +safely say that, if a language //does// have gender, then nouns have an inherent gender and adjectives have +a variable gender. This is not a coincidence but has to do with **syntax**, that is, the combination of words +into complex expressions. For instance, nouns are combined with adjectives that modify them, as in +#BECE +//blue// + //house// = //blue house// +#ENCE +Now, adjectives have to be combinable with all nouns, independently of the gender of the noun: there are no +separate classes of masculine and feminine adjectives (again, with some apparent exceptions, such as //pregnant//, +but even these adjectives have at least grammatically correct metaphoric uses with nouns of other genders). +This means that we must be able to choose the gender of the adjective in agreement with the gender of the noun +that it modifies; hence the gender of adjectives must be inflectional.
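The agreement argument can be made concrete with a minimal Python sketch, assuming French adjectives and nouns as data. It is not GF code and not the RGL's representation; the names `Noun`, `Adjective`, and `modify` are hypothetical, and only singular forms are modelled. The noun stores its gender as one fixed value, the adjective stores one form per gender plus its position (listed as an inherent feature of adjectives in the table of content words), and modification lets the noun's gender select the adjective's form.

```python
from dataclasses import dataclass
from enum import Enum


class Gender(Enum):
    MASC = "masculine"
    FEM = "feminine"


@dataclass
class Noun:
    form: str        # singular form only, to keep the sketch small
    gender: Gender   # inherent: one fixed value per noun


@dataclass
class Adjective:
    forms: dict        # inflectional: Gender -> form
    before_noun: bool  # inherent: position relative to the noun


def modify(adj: Adjective, noun: Noun) -> str:
    # Agreement: the noun's inherent gender selects the adjective's form.
    a = adj.forms[noun.gender]
    # Position: the adjective's inherent feature decides the word order.
    return f"{a} {noun.form}" if adj.before_noun else f"{noun.form} {a}"


bleu = Adjective({Gender.MASC: "bleu", Gender.FEM: "bleue"}, before_noun=False)
vieux = Adjective({Gender.MASC: "vieux", Gender.FEM: "vieille"}, before_noun=True)
maison = Noun("maison", Gender.FEM)
livre = Noun("livre", Gender.MASC)

print(modify(bleu, maison))  # maison bleue
print(modify(bleu, livre))   # livre bleu
print(modify(vieux, livre))  # vieux livre
```

Because the adjective carries a form for every gender, any adjective can combine with any noun, which is exactly why the gender of adjectives has to be inflectional.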
In French, for example, the adjective +for "blue" is //bleu//, with the feminine form //bleue//, and works as follows: +#BECE +//bleu// + //maison// = //maison bleue// ("blue house", feminine) + +//bleu// + //livre// = //livre bleu// ("blue book", masculine) +#ENCE +French also provides examples of adjectives with different **positions**: //bleu// is put after the noun +it modifies, whereas //vieux// ("old") is put before the noun: //vieux livre// ("old book"). + +We will return to syntax later. At this point, it is sufficient to say that the morphological features of +words are not there for nothing: they play an important role in how words are combined in syntax. +In particular, they determine to a great extent how **agreement** works, that is, how the features of +words depend on each other in combinations. + + +++Subcategorization++ + +In addition to the features needed for inflection and agreement, the lexicon must give information about //what// +combinations are possible with each word. For most nouns and adjectives, this is simple: a noun can be modified +by an adjective, for instance, and there is a uniform syntax rule for this. However, there are some nouns and adjectives +that are trickier, because they don't correspond to simple things but to **relations**. For instance, //brother// is +a **relational noun**, since its primary usage is not alone but in phrases like //brother of this man//. +In the same way, //similar// +is a **relational adjective**, since its primary use is in phrases like //similar to this//. The additional +term attached to such a word is called its **complement**; thus //this// is the complement in //similar to this//. +The categories of words that take complements are called **subcategories**. They are morphologically similar to +the main categories, but need extra information about how their complements are attached. + +The RGL has categories +for relational nouns and adjectives, and nouns also have a variant with two complements +(e.g.
//distance from Paris to Munich//). +From the logical point of view, a word's argument positions are called **places**; there is one place for each complement and one more for the entity the word is said of, so the number of places +is one plus the number of complements. Hence, for instance, ``N2`` is a **two-place noun**, and +in a phrase like +#BECE +//John is a brother of Mary//, +#ENCE +//John// occupies the "first place" and //Mary// occupies the "second place". This terminology is ultimately +borrowed from logic, where this phrase is represented as the application of a **two-place predicate**, +#BECE +//brother//(//John//,//Mary//). +#ENCE +Ordinary nouns (``N``) have one place, and could therefore in principle be called ``N1``. + +The following table shows the categories of relational nouns and adjectives in the RGL. The inflectional and +inherent features are the same as for one-place nouns and adjectives, but for each complement, the lexicon +must tell what preposition, if any, is needed to attach that complement. For instance, the preposition for +//similar// is //to//, whereas the preposition for //different// is //from//. In languages with richer case +systems (such as German, Latin, and Finnish), the complement information also determines the case (genitive, +dative, ablative, and so on). + + +===Table: subcategories of nouns and adjectives=== + +|| GF name | text name | example | inherent complement features || +| ``N2`` | two-place noun | //brother// (//of someone//) | case or preposition +| ``N3`` | three-place noun | //distance// (//from some place to some place//) | case or preposition +| ``A2`` | two-place adjective | //similar// (//to something//) | case or preposition + + +Verbs show a particularly rich variation in subcategorization. The most familiar distinction is the one between +**intransitive** and **transitive** verbs: intransitive verbs need only a **subject** (like //she// in //she sleeps//), +whereas transitive verbs also need an **object** (like //him// in //she loves him//). Our category ``V`` obviously includes +intransitive verbs.
But there is no category for transitive verbs in the RGL. Instead, we have a more general category of +**two-place verbs**, which includes transitive verbs but also verbs that need a preposition (such as //at// in +//she looks at him//). As with relational nouns and adjectives, the complement of a two-place verb can require +different cases and prepositions. + +The following table shows the subcategories of verbs in the RGL. The list is long, but it may still be incomplete. For +example, there are no four-place verbs (//she paid him one million pounds for the house//). Such constructions can +be built, as we will see later, for instance by using a ``V3`` verb with an additional adverb. Further subcategories of verbs may +be added in the future. + + +===Table: subcategories of verbs=== + +|| GF name | text name | example | inherent complement features || +| ``V2`` | two-place verb | //love// (//someone//) | case or preposition +| ``V3`` | three-place verb | //give// (//something to someone//) | two cases or prepositions +| ``VV`` | verb-complement verb | //try// (//to do something//) | infinitive form +| ``VS`` | sentence-complement verb | //know// (//that something happens//) | sentence mood +| ``VQ`` | question-complement verb | //ask// (//what happens//) | question mood +| ``VA`` | adjective-complement verb | //become// (//something, e.g. old//) | adjective case +| ``V2V`` | two-place verb-complement verb | //force// (//someone to do something//) | infinitive form, control type +| ``V2S`` | two-place sentence-complement verb | //tell// (//someone that something happens//) | object case, sentence mood +| ``V2Q`` | two-place question-complement verb | //ask// (//someone what happens//) | object case, question mood +| ``V2A`` | two-place adjective-complement verb | //paint// (//something in some colour, e.g. blue//) | object and adjective case +
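The complement information in the tables above can be pictured as data attached to each lexical entry. The Python sketch below is a hypothetical illustration, not GF code and not how the RGL stores verbs; the names `Verb` and `predicate` are invented for this example. Each verb lists one preposition per complement (`None` standing for a direct object, as with //love//), the number of places is computed as one plus the number of complements, and a clause with the wrong number of arguments is rejected. Case, agreement, and word-order variation are left out, and only the ``V``, ``V2``, and ``V3`` patterns are covered.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Verb:
    form: str  # third person singular present; the only form used here
    # One entry per complement: the preposition that introduces it,
    # or None for a direct object. The empty tuple gives a one-place verb.
    complements: Tuple[Optional[str], ...] = ()

    @property
    def places(self) -> int:
        # The subject's place plus one place per complement.
        return 1 + len(self.complements)


def predicate(verb: Verb, subject: str, *args: str) -> str:
    """Build a clause, attaching each argument with its preposition."""
    if len(args) != len(verb.complements):
        raise ValueError(
            f"{verb.form!r} takes {len(verb.complements)} complement(s), got {len(args)}"
        )
    words = [subject, verb.form]
    for prep, arg in zip(verb.complements, args):
        if prep is not None:
            words.append(prep)
        words.append(arg)
    return " ".join(words)


sleep = Verb("sleeps")              # V
love = Verb("loves", (None,))       # V2: direct object
look = Verb("looks", ("at",))       # V2: object introduced by a preposition
give = Verb("gives", (None, "to"))  # V3: direct object plus a "to" phrase

print(predicate(look, "she", "him"))            # she looks at him
print(predicate(give, "she", "a book", "him"))  # she gives a book to him
print(give.places)                              # 3
```

The same shape would cover ``N2`` and ``A2`` entries (//of// for //brother//, //to// for //similar//). The richer verb subcategories such as ``VS`` or ``V2V`` would need more than a preposition per complement, for example a mood or a control type, as the table indicates.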