forked from GitHub/gf-core

editing the tutorial

aarne
2006-04-03 15:37:39 +00:00
parent 41f899b228
commit 1b2f70545c

@@ -101,7 +101,8 @@ These grammars can be used as **libraries** to define application grammars.
In this way, it is possible to write a high-quality grammar without In this way, it is possible to write a high-quality grammar without
knowing about linguistics: in general, to write an application grammar knowing about linguistics: in general, to write an application grammar
by using the resource library just requires practical knowledge of by using the resource library just requires practical knowledge of
the target language. the target language. and all theoretical knowledge about its grammar
is given by the libraries.
@@ -135,9 +136,10 @@ notation (also known as BNF). The BNF format is often a good
starting point for GF grammar development, because it is starting point for GF grammar development, because it is
simple and widely used. However, the BNF format is not simple and widely used. However, the BNF format is not
good for multilingual grammars. While it is possible to good for multilingual grammars. While it is possible to
translate the words contained in a BNF grammar to another "translate" by just changing the words contained in a
language, proper translation usually involves more, e.g. BNF grammar to words of some other
changing the word order in language, proper translation usually involves more.
For instance, the order of words may have to be changed:
``` Italian cheese ===> formaggio italiano ``` Italian cheese ===> formaggio italiano
The full GF grammar format is designed to support such The full GF grammar format is designed to support such
changes, by separating between the **abstract syntax** changes, by separating between the **abstract syntax**
@@ -150,13 +152,13 @@ they have vary from language to language. For instance,
Italian adjectives usually have four forms where English Italian adjectives usually have four forms where English
has just one: has just one:
``` ```
delicious (wine | wines | pizza | pizzas) delicious (wine, wines, pizza, pizzas)
vino delizioso, vini deliziosi, pizza deliziosa, pizze deliziose vino delizioso, vini deliziosi, pizza deliziosa, pizze deliziose
``` ```
The **morphology** of a language describes the The **morphology** of a language describes the
forms of its words. While the complete description of morphology forms of its words. While the complete description of morphology
belongs to resource grammars, the tutorial will explain the belongs to resource grammars, this tutorial will explain the
main programming concepts involved. This will moreover programming concepts involved in morphology. This will moreover
make it possible to grow the fragment covered by the food example. make it possible to grow the fragment covered by the food example.
The tutorial will in fact build a toy resource grammar in order The tutorial will in fact build a toy resource grammar in order
to illustrate the module structure of library-based application to illustrate the module structure of library-based application
@@ -212,7 +214,6 @@ The command
will give you a list of available commands. will give you a list of available commands.
As a common convention in this Tutorial, we will use As a common convention in this Tutorial, we will use
- ``%`` as a prompt that marks system commands - ``%`` as a prompt that marks system commands
- ``>`` as a prompt that marks GF commands - ``>`` as a prompt that marks GF commands
@@ -427,7 +428,7 @@ a sentence but a sequence of ten sentences.
===Labelled context-free grammars=== ===Labelled context-free grammars===
The syntax trees returned by GF's parser in the previous examples The syntax trees returned by GF's parser in the previous examples
are not so nice to look at. The identifiers of form ``Mks`` are not so nice to look at. The identifiers that form the tree
are **labels** of the BNF rules. To see which label corresponds to are **labels** of the BNF rules. To see which label corresponds to
which rule, you can use the ``print_grammar = pg`` command which rule, you can use the ``print_grammar = pg`` command
with the ``printer`` flag set to ``cf`` (which means context-free): with the ``printer`` flag set to ``cf`` (which means context-free):
@@ -471,7 +472,7 @@ labels to each rule.
In files with the suffix ``.cf``, you can prefix rules with In files with the suffix ``.cf``, you can prefix rules with
labels that you provide yourself - these may be more useful labels that you provide yourself - these may be more useful
than the automatically generated ones. The following is a possible than the automatically generated ones. The following is a possible
labelling of ``paleolithic.cf`` with nicer-looking labels. labelling of ``food.cf`` with nicer-looking labels.
``` ```
Is. S ::= Item "is" Quality ; Is. S ::= Item "is" Quality ;
That. Item ::= "that" Kind ; That. Item ::= "that" Kind ;
@@ -498,7 +499,7 @@ With this grammar, the trees look as follows:
%--! %--!
==The ``.gf`` grammar format== ==The .gf grammar format==
To see what there is in GF's shell state when a grammar To see what there is in GF's shell state when a grammar
has been imported, you can give the plain command has been imported, you can give the plain command
@@ -529,7 +530,7 @@ A GF grammar consists of two main parts:
- **concrete syntax**, defining how trees are linearized into strings - **concrete syntax**, defining how trees are linearized into strings
The EBNF and CF formats fuse these two things together, but it is possible The CF format fuses these two things together, but it is possible
to take them apart. For instance, the sentence formation rule to take them apart. For instance, the sentence formation rule
``` ```
Is. S ::= Item "is" Quality ; Is. S ::= Item "is" Quality ;
@@ -573,7 +574,7 @@ judgement forms:
We return to the precise meanings of these judgement forms later. We return to the precise meanings of these judgement forms later.
First we will look at how judgements are grouped into modules, and First we will look at how judgements are grouped into modules, and
show how the paleolithic grammar is show how the food grammar is
expressed by using modules and judgements. expressed by using modules and judgements.
@@ -728,7 +729,7 @@ one abstract syntax can be equipped with many concrete syntaxes.
A system with this property is called a **multilingual grammar**. A system with this property is called a **multilingual grammar**.
Multilingual grammars can be used for applications such as Multilingual grammars can be used for applications such as
translation. Let us buid an Italian concrete syntax for translation. Let us build an Italian concrete syntax for
``Food`` and then test the resulting ``Food`` and then test the resulting
multilingual grammar. multilingual grammar.
@@ -946,6 +947,7 @@ The graph uses
- black-headed arrows for inheritance - black-headed arrows for inheritance
- white-headed arrows for the concrete-of-abstract relation - white-headed arrows for the concrete-of-abstract relation
[Foodmarket.png] [Foodmarket.png]
@@ -967,7 +969,7 @@ shell escape symbol ``!``. The resulting graph was shown in the previous section
The command ``print_multi = pm`` is used for printing the current multilingual The command ``print_multi = pm`` is used for printing the current multilingual
grammar in various formats, of which the format ``-printer=graph`` just grammar in various formats, of which the format ``-printer=graph`` just
shows the module dependencies. Use the ``help`` to see what other formats shows the module dependencies. Use ``help`` to see what other formats
are available: are available:
``` ```
> help pm > help pm
@@ -982,9 +984,9 @@ are available:
===The golden rule of functional programming=== ===The golden rule of functional programming===
In comparison to the ``.cf`` format, the ``.gf`` format still looks rather In comparison to the ``.cf`` format, the ``.gf`` format looks rather
verbose, and demands lots more characters to be written. You have probably verbose, and demands lots more characters to be written. You have probably
done this by the copy-paste-modify method, which is a standard way to done this by the copy-paste-modify method, which is a common way to
avoid repeating work. avoid repeating work.
However, there is a more elegant way to avoid repeating work than the copy-and-paste However, there is a more elegant way to avoid repeating work than the copy-and-paste
@@ -995,8 +997,8 @@ method. The **golden rule of functional programming** says that
A function separates the shared parts of different computations from the A function separates the shared parts of different computations from the
changing parts, parameters. In functional programming languages, such as changing parts, parameters. In functional programming languages, such as
[Haskell http://www.haskell.org], it is possible to share muc more than in [Haskell http://www.haskell.org], it is possible to share much more than in
the languages such as C and Java. languages such as C and Java.
===Operation definitions=== ===Operation definitions===
@@ -1041,11 +1043,8 @@ strings and records.
resource StringOper = { resource StringOper = {
oper oper
SS : Type = {s : Str} ; SS : Type = {s : Str} ;
ss : Str -> SS = \x -> {s = x} ; ss : Str -> SS = \x -> {s = x} ;
cc : SS -> SS -> SS = \x,y -> ss (x.s ++ y.s) ; cc : SS -> SS -> SS = \x,y -> ss (x.s ++ y.s) ;
prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s) ; prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s) ;
} }
``` ```
@@ -1181,7 +1180,7 @@ a **paradigm** - a formula telling how the inflection
forms of a word are formed. forms of a word are formed.
From GF point of view, a paradigm is a function that takes a **lemma** - From GF point of view, a paradigm is a function that takes a **lemma** -
a string also known as a **dictionary form** - and returns an inflection also known as a **dictionary form** - and returns an inflection
table of desired type. Paradigms are not functions in the sense of the table of desired type. Paradigms are not functions in the sense of the
``fun`` judgements of abstract syntax (which operate on trees and not ``fun`` judgements of abstract syntax (which operate on trees and not
on strings), but operations defined in ``oper`` judgements. on strings), but operations defined in ``oper`` judgements.
@@ -1204,13 +1203,13 @@ are written together to form one **token**. Thus, for instance,
%--! %--!
===Worst-case macros and data abstraction=== ===Worst-case functions and data abstraction===
Some English nouns, such as ``mouse``, are so irregular that Some English nouns, such as ``mouse``, are so irregular that
it makes no sense to see them as instances of a paradigm. Even it makes no sense to see them as instances of a paradigm. Even
then, it is useful to perform **data abstraction** from the then, it is useful to perform **data abstraction** from the
definition of the type ``Noun``, and introduce a constructor definition of the type ``Noun``, and introduce a constructor
operation, a **worst-case macro** for nouns: operation, a **worst-case function** for nouns:
``` ```
oper mkNoun : Str -> Str -> Noun = \x,y -> { oper mkNoun : Str -> Str -> Noun = \x,y -> {
s = table { s = table {
@@ -1230,7 +1229,7 @@ and
``` ```
instead of writing the inflection table explicitly. instead of writing the inflection table explicitly.
The grammar engineering advantage of worst-case macros is that The grammar engineering advantage of worst-case functions is that
the author of the resource module may change the definitions of the author of the resource module may change the definitions of
``Noun`` and ``mkNoun``, and still retain the ``Noun`` and ``mkNoun``, and still retain the
interface (i.e. the system of type signatures) that makes it interface (i.e. the system of type signatures) that makes it
@@ -1240,7 +1239,7 @@ terms, ``Noun`` is then treated as an **abstract datatype**.
%--! %--!
===A system of paradigms using ``Prelude`` operations=== ===A system of paradigms using Prelude operations===
In addition to the completely regular noun paradigm ``regNoun``, In addition to the completely regular noun paradigm ``regNoun``,
some other frequent noun paradigms deserve to be some other frequent noun paradigms deserve to be
@@ -1432,7 +1431,7 @@ The rule of subject-verb agreement in English says that the verb
phrase must be inflected in the number of the subject. This phrase must be inflected in the number of the subject. This
means that a noun phrase (functioning as a subject), inherently means that a noun phrase (functioning as a subject), inherently
//has// a number, which it passes to the verb. The verb does not //has// a number, which it passes to the verb. The verb does not
//have// a number, but must be able to receive whatever number the //have// a number, but must be able to //receive// whatever number the
subject has. This distinction is nicely represented by the subject has. This distinction is nicely represented by the
different linearization types of **noun phrases** and **verb phrases**: different linearization types of **noun phrases** and **verb phrases**:
``` ```
@@ -1440,7 +1439,8 @@ different linearization types of **noun phrases** and **verb phrases**:
lincat VP = {s : Number => Str} ; lincat VP = {s : Number => Str} ;
``` ```
We say that the number of ``NP`` is an **inherent feature**, We say that the number of ``NP`` is an **inherent feature**,
whereas the number of ``NP`` is **parametric**. whereas the number of ``NP`` is a **variable feature** (or a
**parametric feature**).
The agreement rule itself is expressed in the linearization rule of The agreement rule itself is expressed in the linearization rule of
the predication structure: the predication structure: