started grammar description text

aarne
2013-08-25 09:47:39 +00:00
parent 288bcafb79
commit 7492aa01d9
2 changed files with 188 additions and 0 deletions


@@ -0,0 +1,5 @@
all: english
english:
	txt2tags -thtml --toc gf-english.txt


@@ -0,0 +1,183 @@
English: A Digital Grammar
Aarne Ranta
%%date
%!postproc(tex) : "#BECE" "begin{center}"
%!postproc(html) : "#BECE" "<center>"
%!postproc(tex) : "#ENCE" "end{center}"
%!postproc(html) : "#ENCE" "</center>"
**Digital grammars** are grammars usable by computers, so that they can mechanically perform
tasks like interpreting, producing, and translating languages. The **GF Resource Grammar Library**
(RGL) is a set of digital grammars which, at the time of writing, covers 28 languages. These grammars
are written in GF, **Grammatical Framework**, which is a programming language designed for
writing digital grammars.
The grammars in the RGL have been written by linguists, computer scientists, and
programmers who know the languages thoroughly, both in practice and in theory. Almost 50 persons from
around the world have contributed to this work, and ongoing projects are expected to give us many new
languages soon.
The leading idea of the RGL is that different languages share large parts of their grammars, despite
their observed differences. One important thing they share is the set of **categories**, that is, the
types of words and expressions. For instance, every language in the RGL has a category of **nouns**, but
what exactly a noun is varies from language to language. Thus English nouns have four forms
(singular and plural, nominative and genitive, as in //house, houses, house's, houses'//)
whereas French nouns have just two forms (singular and plural //maison, maisons//, "house"), but they also
have a piece of information that English nouns don't have, namely gender (masculine and feminine).
Chinese nouns have just one form (房子 //fangzi// "house"), which is used for both singular and plural, but in
addition, a little bit like the French gender, they have a **classifier** (间 //jian// for the word
"house"). German nouns have 8 forms and a gender, Finnish nouns have 26 forms, and so on.
+Lexical categories+
Categories of words are called **lexical categories**.
The language-specific variation in lexical categories is due to **morphology**, that is, the different forms that
one and the same word can have in different contexts. If we look at the 28 languages in the RGL, we can
see that the classification of words is common to all the languages, and the
differences are in morphology. In this chapter, we will explain all lexical categories and give an overview
of their morphological aspects. Details of morphology for each language are given in the language-specific documents.
++Main parts of speech: content words++
The most important categories of words are given in the following table. More precisely, we will give the
categories of **content words**, which, so to say, describe things and events in the real world.
Content words are distinguished from **structural words**, whose purpose is to combine words into syntactic
structures. Each category of content words may have thousands of words, and new words can be introduced
continuously; therefore, these categories are also called **open categories**. In contrast, structural
words are very few (maybe some dozens), and new ones are very seldom added.
Each category has a GF name, that is, a short symbolic name, which is the name actually used in the GF program code.
In the text we usually use the text names, but will sometimes find the GF names handy to use as well, since they
give us a short and precise way to state grammatical rules.
===Table: categories of content words===
|| GF name | text name | example | inflectional features | inherent features ||
| ``N`` | noun | //house// | number, case | gender, classifier
| ``PN`` | proper name | //Paris// | case | gender
| ``A`` | adjective | //blue// | gender, number, case, degree | position
| ``V`` | verb | //sleep// | number, person, tense, aspect, mood | subject case
| ``Adv`` | adverb | //here// | (none) | adverb type (place, time, manner)
In addition to the names and examples, the table lists the **inflectional features** and **inherent features**
typical of each category. Inflectional features are those that create different forms of words. For instance,
French nouns have forms for number (singular and plural) - or, as one often says,
French nouns are //inflected for number//. In contrast to number, the gender does not give rise to different forms
of French nouns: //maison// ("house") //is// feminine, inherently, and there is no masculine form of //maison//.
(Of course, there are some nouns that do have masculine and feminine forms, such as //étudiant, étudiante//
"male/female student", but this only applies to a minority of French nouns and shouldn't be taken as an
indication of an inflectional gender.)
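The distinction can be made concrete with the three nouns for "house" discussed earlier. The following is a minimal sketch in plain Python, not GF code: the names ``Noun``, ``forms`` and ``inherent`` are assumptions made for this illustration only. Inflectional features index a table of forms, whereas inherent features are stored once for the whole noun.

```
from dataclasses import dataclass, field

@dataclass
class Noun:
    # Inflectional features: a tuple of feature values selects a form.
    forms: dict
    # Inherent features: fixed for the noun, they never select a form.
    inherent: dict = field(default_factory=dict)

# English: four forms (number x case), no inherent features.
house_eng = Noun(forms={
    ("sg", "nom"): "house",
    ("pl", "nom"): "houses",
    ("sg", "gen"): "house's",
    ("pl", "gen"): "houses'",
})

# French: two forms (number), plus an inherent gender.
house_fre = Noun(
    forms={("sg",): "maison", ("pl",): "maisons"},
    inherent={"gender": "feminine"},
)

# Chinese: a single form, plus an inherent classifier.
house_chi = Noun(
    forms={(): "房子"},
    inherent={"classifier": "间"},
)

for language, noun in [("English", house_eng),
                       ("French", house_fre),
                       ("Chinese", house_chi)]:
    print(language, len(noun.forms), "form(s), inherent:", noun.inherent)
```

Asking for a masculine form of //maison// has no meaning in this representation: gender is not a key of the form table, which is exactly what "inherent" means here.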
++Syntactic implications++
The features given in the table are rough indications of what one can expect in different languages. Thus,
for instance, some languages have no gender at all, and hence their nouns and adjectives won't have
genders either. But the table is a rather good generalization from the 28 languages of the RGL: we can
safely say that, if a language //does// have gender, then nouns have an inherent gender and adjectives have
a variable gender. This is not a coincidence but has to do with **syntax**, that is, the combination of words
into complex expressions. Thus, for instance, nouns are combined with adjectives that modify them, so that
#BECE
//blue// + //house// = //blue house//
#ENCE
Now, adjectives have to be combinable with all nouns, independently of the gender of the noun: there are no
separate classes of masculine and feminine adjectives (again, with some apparent exceptions, such as //pregnant//,
but even these adjectives have at least grammatically correct metaphoric uses with nouns of other genders).
This means that we must be able to pick the gender of the adjective in agreement with the gender of the noun
that it modifies, which means that the gender of adjectives must be inflectional. Thus in French the adjective
for "blue" is //bleu//, with the feminine form //bleue//, and works as follows:
#BECE
//bleu// + //maison// = //maison bleue// ("blue house", feminine)
//bleu// + //livre// = //livre bleu// ("blue book", masculine)
#ENCE
French also provides examples of adjectives with different **positions**: //bleu// is put after the noun
it modifies, whereas //vieux// ("old") is put before the noun: //vieux livre// ("old book").
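The French examples above can be sketched as a small function in which the noun hands its inherent gender to the adjective, and the adjective's inherent position decides the word order. This is a Python illustration under stated assumptions, not the RGL implementation: ``Noun``, ``Adjective`` and ``modify`` are hypothetical names, and the plural forms are added only to fill out the tables.

```
from dataclasses import dataclass

@dataclass
class Noun:
    forms: dict     # number -> form (inflectional)
    gender: str     # inherent

@dataclass
class Adjective:
    forms: dict     # (gender, number) -> form (inflectional)
    position: str   # inherent: "pre" or "post"

def modify(adj, noun, number="sg"):
    # Agreement: the adjective form is selected by the noun's gender.
    a = adj.forms[(noun.gender, number)]
    n = noun.forms[number]
    return f"{a} {n}" if adj.position == "pre" else f"{n} {a}"

maison = Noun({"sg": "maison", "pl": "maisons"}, gender="fem")
livre = Noun({"sg": "livre", "pl": "livres"}, gender="masc")

bleu = Adjective({
    ("masc", "sg"): "bleu", ("fem", "sg"): "bleue",
    ("masc", "pl"): "bleus", ("fem", "pl"): "bleues",
}, position="post")

# Simplification: the special form "vieil" used before vowels is ignored.
vieux = Adjective({
    ("masc", "sg"): "vieux", ("fem", "sg"): "vieille",
    ("masc", "pl"): "vieux", ("fem", "pl"): "vieilles",
}, position="pre")

print(modify(bleu, maison))   # maison bleue
print(modify(bleu, livre))    # livre bleu
print(modify(vieux, livre))   # vieux livre
```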
We will return to syntax later. At this point, it is sufficient to say that the morphological features of
words are not arbitrary: they play an important role in how words are combined in syntax.
In particular, they determine to a great extent how **agreement** works, that is, how the features of
words depend on each other in combinations.
++Subcategorization++
In addition to the features needed for inflection and agreement, the lexicon must give information about //what//
combinations are possible with each word. For most nouns and adjectives, this is simple: a noun can be modified
by an adjective, for instance, and there is a uniform syntax rule for this. However, there are some nouns and adjectives
that are trickier, because they don't correspond to simple things but to **relations**. For instance, //brother// is
a **relational noun**, since its primary use is not on its own but in phrases like //brother of this man//.
In the same way, //similar//
is a **relational adjective**, since its primary use is in phrases like //similar to this//. The additional
term attached to such a word is called its **complement**; thus //this// is the complement in //similar to this//.
The categories of words that take complements are called **subcategories**. They are morphologically similar to
the main categories, but need extra information for the usage of complements.
The RGL has categories
for relational nouns and adjectives, and nouns also have a variant with two complements
(e.g. //distance from Paris to Munich//).
From the logical point of view, the argument positions of a word are called **places**, and the number of places
is one plus the number of complements. Hence, for instance, ``N2`` is a **two-place noun**, and
in a phrase like
#BECE
//John is a brother of Mary//,
#ENCE
//John// occupies the "first place" and //Mary// occupies the "second place". This terminology is ultimately
borrowed from logic, where this phrase is represented as the application of a **two-place predicate**,
#BECE
//brother//(//John//,//Mary//).
#ENCE
Ordinary nouns (``N``) have one place, and could therefore in principle be called ``N1``.
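The counting of places can be sketched as follows. This is a Python illustration only: the class ``Predicate`` and its methods are hypothetical names, not part of GF or the RGL. A word with //n// complements has //n// + 1 places, and applying it to that many arguments gives the logical notation used above.

```
from dataclasses import dataclass

@dataclass
class Predicate:
    name: str
    complements: int   # how many complements the word takes

    @property
    def places(self):
        # One place for the thing described, plus one per complement.
        return 1 + self.complements

    def apply(self, *args):
        if len(args) != self.places:
            raise ValueError(f"{self.name} needs {self.places} argument(s)")
        return f"{self.name}({','.join(args)})"

house = Predicate("house", complements=0)        # N: one place
brother = Predicate("brother", complements=1)    # N2: two places
distance = Predicate("distance", complements=2)  # N3: three places

print(brother.places)                 # 2
print(brother.apply("John", "Mary"))  # brother(John,Mary)
```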
The following table shows the categories of relational nouns and adjectives in the RGL. The inflectional and
inherent features are the same as for one-place nouns and adjectives, but for each complement, the lexicon
must tell what preposition, if any, is needed to attach that complement. For instance, the preposition for
//similar// is //to//, whereas the preposition for //different// is //from//. In languages with richer case
systems (such as German, Latin, and Finnish), the complement information also determines the case (genitive,
dative, ablative, and so on).
===Table: subcategories of nouns and adjectives===
|| GF name | text name | example | inherent complement features ||
| ``N2`` | two-place noun | //brother// (//of someone//) | case or preposition
| ``N3`` | three-place noun | //distance// (//from some place to some place//) | case or preposition
| ``A2`` | two-place adjective | //similar// (//to something//) | case or preposition
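How the lexicon can carry the complement information is sketched below for English, where the information is simply a preposition. This is a Python illustration under stated assumptions: ``TwoPlace`` and ``with_complement`` are hypothetical names, and case selection, which languages such as German or Finnish would need, is left out.

```
from dataclasses import dataclass

@dataclass
class TwoPlace:
    word: str
    prep: str   # preposition that attaches the complement; "" if none

    def with_complement(self, complement):
        # Skip the preposition slot when the word does not need one.
        parts = [self.word, self.prep, complement]
        return " ".join(p for p in parts if p)

brother = TwoPlace("brother", "of")          # N2
similar = TwoPlace("similar", "to")          # A2
different = TwoPlace("different", "from")    # A2

print(brother.with_complement("this man"))   # brother of this man
print(similar.with_complement("this"))       # similar to this
print(different.with_complement("this"))     # different from this
```

Storing the preposition with the word, rather than in the syntax rule, lets one rule for attaching complements serve every two-place noun and adjective.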
Verbs show a particularly rich variation in subcategorization. The most familiar distinction is the one between
**intransitive** and **transitive** verbs: intransitive verbs need only a **subject** (like //she// in //she sleeps//),
whereas transitive verbs also need an **object** (like //him// in //she loves him//). Our category ``V`` obviously includes
intransitive verbs. But there is no category for transitive verbs in the RGL. Instead, we have a more general category of
**two-place verbs**, which includes transitive verbs but also verbs that need a preposition (such as //at// in
//she looks at him//). Just like for relational nouns and adjectives, the complement of a two-place verb has variations
in cases and prepositions.
The following table shows the subcategories of verbs in the RGL. The list is long but it may still be incomplete. For
example, there are no four-place verbs (//she paid him one million pounds for the house//). Such constructions can
be built, as we will see later, by using for instance a ``V3`` verb with an additional adverb. But we can envisage
future additions of more subcategories for verbs.
===Table: subcategories of verbs===
|| GF name | text name | example | inherent complement features ||
| ``V2`` | two-place verb | //love// (//someone//) | case or preposition
| ``V3`` | three-place verb | //give// (//something to someone//) | two cases or prepositions
| ``VV`` | verb-complement verb | //try// (//to do something//) | infinitive form
| ``VS`` | sentence-complement verb | //know// (//that something happens//) | sentence mood
| ``VQ`` | question-complement verb | //ask// (//what happens//) | question mood
| ``VA`` | adjective-complement verb | //become// (//something, e.g. old//) | adjective case
| ``V2V`` | two-place verb-complement verb | //force// (//someone to do something//) | infinitive form, control type
| ``V2S`` | two-place sentence-complement verb | //tell// (//someone that something happens//) | object case, sentence mood
| ``V2Q`` | two-place question-complement verb | //ask// (//someone what happens//) | object case, question mood
| ``V2A`` | two-place adjective-complement verb | //paint// (//something in some colour, e.g. blue//) | object and adjective case