From 5161a93ae8e728e70eac16087cb83df5436ac8a6 Mon Sep 17 00:00:00 2001
From: aarne Grammatical Framework Tutorial
Author: Aarne Ranta <aarne (at) cs.chalmers.se>
-Last update: Fri Dec 16 21:04:37 2005
+Last update: Fri Dec 16 22:10:53 2005
-
-
+
+
+
+
@@ -109,7 +144,7 @@ To start the GF program, assuming you have installed it, just type
in the shell. You will see GF's welcome message and the prompt >.
Now you are ready to try out your first grammar.
We start with one that is not written in GF language, but
@@ -260,7 +295,7 @@ generate ten strings with one and the same command:
-To generate <i>all<i> sentence that a grammar
+To generate all sentences that a grammar
can generate, use the command generate_trees = gt.
@@ -301,9 +336,10 @@ want to see:
> gr -tr | l -tr | p
- Mks_0 (Mks_7 Mks_10) (Mks_1 Mks_18)
- a louse sleeps
- Mks_0 (Mks_7 Mks_10) (Mks_1 Mks_18)
+
+ S_NP_VP (NP_the_CN CN_snake) (VP_V V_sleeps)
+ the snake sleeps
+ S_NP_VP (NP_the_CN CN_snake) (VP_V V_sleeps)
This facility is good for test purposes: for instance, you
@@ -324,7 +360,7 @@ You can read the file back to GF with the
read_file = rf command,
- > read_file exx.tmp | l -tr | p -lines
+ > read_file exx.tmp | p -lines
Notice the flag -lines given to the parsing
@@ -338,45 +374,51 @@ a sentence but a sequence of ten sentences.
The syntax trees returned by GF's parser in the previous examples
are not so nice to look at. The identifiers of form
Mks
-are labels of the EBNF rules. To see which label corresponds to
+are labels of the BNF rules. To see which label corresponds to
which rule, you can use the print_grammar = pg command
with the printer flag set to cf (which means context-free):
> print_grammar -printer=cf
- Mks_10. CN ::= "louse" ;
- Mks_11. CN ::= "snake" ;
- Mks_12. CN ::= "worm" ;
- Mks_8. CN ::= A CN ;
- Mks_9. CN ::= "boy" ;
- Mks_4. NP ::= "this" CN ;
- Mks_15. A ::= "thick" ;
+
+ V_laughs. V ::= "laughs" ;
+ V_sleeps. V ::= "sleeps" ;
+ V_swims. V ::= "swims" ;
+ VP_TV_NP. VP ::= TV NP ;
+ VP_V. VP ::= V ;
+ VP_is_A. VP ::= "is" A ;
+ TV_eats. TV ::= "eats" ;
+ TV_kills. TV ::= "kills" ;
+ TV_washes. TV ::= "washes" ;
+ S_NP_VP. S ::= NP VP ;
+ NP_a_CN. NP ::= "a" ;
 ...
A syntax tree such as
- Mks_4 (Mks_8 Mks_15 Mks_12) + NP_this_CN (CN_A_CN A_thick CN_worm) this thick wormencodes the sequence of grammar rules used for building the -expression. If you look at this tree, you will notice that
-Mks_4-is the label of the rule prefixingthisto a common noun, -Mks_15is the label of the adjectivethick, -and so on. --<h4>The labelled context-free format<h4> +expression. If you look at this tree, you will notice that
+ +NP_this_CN+is the label of the rule prefixingthisto a common noun (CN), +thereby forming a noun phrase (NP). +A_thickis the label of the adjectivethick, +and so on. These labels are formed automatically when the grammar +is compiled by GF.The labelled context-free format
The labelled context-free grammar format permits user-defined labels to each rule. -GF recognizes files of this format by the suffix -
.cf. It is intermediate between EBNF and full GF format. -Let us include the following rules in the file -paleolithic.cf. +In files with the suffix.cf, you can prefix rules with +labels that you provide yourself - these may be more useful +than the automatically generated ones. The following is a possible +labelling ofpaleolithic.cfwith nicer-looking labels.PredVP. S ::= NP VP ; @@ -403,25 +445,10 @@ Let us include the following rules in the file Kill. TV ::= "kills" Wash. TV ::= "washes" ;--<h4>Using the labelled context-free format<h4> -
--The GF commands for the
.cfformat are -exactly the same as for the.ebnfformat. -Just the syntax trees become nicer to read and -to remember. Notice that before reading in -a new grammar in GF you often (but not always, -as we will see later) have first to give the -command (empty = e), which removes the -old grammar from the GF shell state. +With this grammar, the trees look as follows:- > empty - - > i paleolithic.cf - > p "the boy eats a snake" PredVP (Def Boy) (ComplTV Eat (Indef Snake)) @@ -430,10 +457,10 @@ old grammar from the GF shell state. a louse is thick- -The GF grammar format
+ +The ``.gf`` grammar format
-To see what there really is in GF's shell state when a grammar +To see what there is in GF's shell state when a grammar has been imported, you can give the plain command
@@ -446,15 +473,16 @@ you did not need to write the grammar in that notation, but that the GF grammar compiler produced it.print_grammar = pg.-However, we will now start to show how GF's own notation gives you -much more expressive power than the
- +.cfand.ebnf-formats. We will introduce the.gfformat by presenting +However, we will now start the demonstration +how GF's own notation gives you +much more expressive power than the.cf+format. We will introduce the.gfformat by presenting one more way of defining the same grammar as in -paleolithic.cfandpaleolithic.ebnf. +paleolithic.cf. Then we will show how the full GF grammar format enables you to do things that are not possible in the weaker formats.Abstract and concrete syntax
A GF grammar consists of two main parts: @@ -482,16 +510,15 @@ is interpreted as the following pair of rules: The former rule, with the keyword
fun, belongs to the abstract syntax. It defines the functionPredVPwhich constructs syntax trees of form -(PredVP<i>x<i> <i>y<i>). +(PredVPx y).The latter rule, with the keyword
-lin, belongs to the concrete syntax. It defines the linearization function for -syntax trees of form (PredVP<i>x<i> <i>y<i>). --<h4>Judgement forms<h4> +syntax trees of form (
+ +PredVPx y).Judgement forms
Rules in a GF grammar are called judgements, and the keywords
fun and lin are used for distinguishing between two
@@ -543,27 +570,25 @@ judgement forms:
We return to the precise meanings of these judgement forms later.
First we will look at how judgements are grouped into modules, and
-show how the grammar paleolithic.cf is
+show how the paleolithic grammar is
expressed by using modules and judgements.
-<h4>Module types<h4>
-
+ +Module types
A GF grammar consists of modules, into which judgements are grouped. The most important module forms are
abstract A = M``, abstract syntax A with judgements in
+ abstract A = M, abstract syntax A with judgements in
the module body M.
- concrete C of A = M``, concrete syntax C of the
+ concrete C of A = M, concrete syntax C of the
abstract syntax A, with judgements in the module body M.
-<h4>Record types, records, and Strs<h4>
-
The linearization type of a category is a record type, with
zero or more fields of different types. The simplest record
@@ -579,8 +604,8 @@ which has one field, with label s and type Str.
Examples of records of this type are
- [s = "foo"}
- [s = "hello" ++ "world"}
+ {s = "foo"}
+ {s = "hello" ++ "world"}
The type Str is really the type of token lists, but
@@ -589,18 +614,26 @@ denoted by string literals in double quotes.
Whenever a record r of type {s : Str} is given,
-r.s is an object of type Str. This is of course
+r.s is an object of type Str. This is
a special case of the projection rule, allowing the extraction
-of fields from a record.
+of fields from a record:
{ ... p : T ... } then r.p : T
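
The projection rule can be seen at work in a very small grammar. The following is only a sketch: the module names Hello and HelloEng, the category Greeting, and the functions HelloWorld and Twice are made up for illustration and do not belong to the tutorial's grammars. The two modules are assumed to live in separate files, Hello.gf and HelloEng.gf.

```gf
-- File Hello.gf (hypothetical example module)
abstract Hello = {
  cat Greeting ;
  fun
    HelloWorld : Greeting ;
    Twice      : Greeting -> Greeting ;
}

-- File HelloEng.gf (hypothetical example module)
concrete HelloEng of Hello = {
  lincat
    Greeting = {s : Str} ;                    -- record type with one field s
  lin
    HelloWorld = {s = "hello" ++ "world"} ;   -- a record of that type
    Twice g    = {s = g.s ++ g.s} ;           -- g : {s : Str}, hence g.s : Str
}
```

In the rule for Twice, the argument g is a record of type {s : Str}, so the projection g.s yields a Str that can be concatenated with ++.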
+-<h4>An abstract syntax example<h4> -
-
-Each nonterminal occurring in the grammar paleolithic.cf is
-introduced by a cat judgement. Each
-rule label is introduced by a fun judgement.
+To express the abstract syntax of paleolithic.cf in
+a file Paleolithic.gf, we write two kinds of judgements:
cat judgement.
+fun judgement,
+ with the type formed from the nonterminals of the rule.
+
abstract Paleolithic = {
cat
@@ -623,9 +656,8 @@ Notice the use of shorthands permitting the sharing of
the keyword in subsequent judgements, and of the type
in subsequent fun judgements.
-
-<h4>A concrete syntax example<h4>
-
+
+A concrete syntax example
Each category introduced in Paleolithic.gf is
given a lincat rule, and each
@@ -663,9 +695,8 @@ apply as in abstract modules.
}
--<h4>Modules and files<h4> -
+ +
Module name + .gf = file name
.gfc file or to generate
a new one, by looking at modification times.
--<h4>Multilingual grammar<h4> -
+ +
The main advantage of separating abstract from concrete syntax is that
one abstract syntax can be equipped with many concrete syntaxes.
@@ -705,9 +735,8 @@ translation. Let us build an Italian concrete syntax for
Paleolithic and then test the resulting
multilingual grammar.
-<h4>An Italian concrete syntax<h4> -
+ +
concrete PaleolithicIta of Paleolithic = {
lincat
@@ -739,9 +768,8 @@ multilingual grammar.
}
--<h4>Using a multilingual grammar<h4> -
+ +Import without first emptying
@@ -767,9 +795,8 @@ Translate by using a pipe: il ragazzo mangia il serpente --<h4>Translation quiz<h4> -
+ +
This is a simple language exercise that can be automatically
generated from a multilingual grammar. The system generates a set of
@@ -802,9 +829,8 @@ file for later use, by the command translation_list = tl
The number flag gives the number of sentences generated.
--<h4>The multilingual shell state<h4> -
+ +A GF shell is at any time in a state, which contains a multilingual grammar. One of the concrete @@ -825,9 +851,10 @@ things), you can use the command all concretes : PaleolithicIta PaleolithicEng
--<h4>Extending a grammar<h4> -
+ +The module system of GF makes it possible to extend a grammar in different ways. The syntax of extension is @@ -856,9 +883,8 @@ be built for concrete syntaxes: The effect of extension is that all of the contents of the extended and extending module are put together.
--<h4>Multiple inheritance<h4> -
+ +Specialized vocabularies can be represented as small grammars that only do "one thing" each, e.g. @@ -887,9 +913,8 @@ same time: }
--<h4>Visualizing module structure<h4> -
+ +
When you have created all the abstract syntaxes and
one set of concrete syntaxes needed for Gatherer,
@@ -918,9 +943,8 @@ The command print_multi = pm is used for printing the current multi
grammar in various formats, of which the format -printer=graph just
shows the module dependencies.
-<h4>The module structure of GathererEng<h4>
-
The graph uses
@@ -934,8 +958,8 @@ The graph uses<img src="Gatherer.gif">
- -
Suppose we want to say, with the vocabulary included in
Paleolithic.gf, things like
@@ -946,7 +970,7 @@ Suppose we want to say, with the vocabulary included in
The new grammatical facility we need are the plural forms -of nouns and verbs (<i>boys, sleep<i>), as opposed to their +of nouns and verbs (boys, sleep), as opposed to their singular forms.
@@ -969,9 +993,8 @@ To be able to do all this, we need two new judgement forms, a new module form, and a generalization of linearization types from strings to more complex types.
--<h4>Parameters and tables<h4> -
+ +
We define the parameter type of number in English by
using a new form of judgement:
@@ -1011,13 +1034,12 @@ operator !. For instance,
is a selection, whose value is "boys".
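
A parameter type, a table over it, and a selection can be put together as follows. This is a minimal sketch, wrapped in a resource module (the new module form announced above) so that it is self-contained; the module name NumberDemo and the constant names boy and boys are assumptions made for the example.

```gf
resource NumberDemo = {
  param
    Number = Sg | Pl ;        -- parameter type with two values

  oper
    -- A table assigning a string to each value of Number.
    boy : Number => Str = table {
      Sg => "boy" ;
      Pl => "boys"
    } ;

    -- Selection with !, picking the Pl branch of the table.
    boys : Str = boy ! Pl ;
}
```

Note that the table type is written with the double arrow =>, whereas function types use the single arrow ->.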
-<h4>Inflection tables, paradigms, and oper definitions<h4>
-
All English common nouns are inflected in number, most of them in the same way: the plural form is formed from the singular form by adding the -ending <i>s<i>. This rule is an example of +ending s. This rule is an example of a paradigm - a formula telling how the inflection forms of a word are formed.
@@ -1046,9 +1068,8 @@ the function, and the glueing operator+ telling that
the string held in the variable x and the ending "s"
are written together to form one token.
-
-<h4>The resource module type<h4>
-
Parameter and operator definitions do not belong to the abstract syntax. They can be used when defining concrete syntax - but they are not @@ -1080,7 +1101,7 @@ Resource modules can extend other resource modules, in the same way as modules of other types can extend modules of the same type.
- +
Any number of resource modules can be
@@ -1114,9 +1135,8 @@ available through resource grammars, whose users only need
to pick the right operations and not to know their implementation
details.
-<h4>Worst-case macros and data abstraction<h4> -
+ +
Some English nouns, such as louse, are so irregular that
it makes little sense to see them as instances of a paradigm. Even
@@ -1149,9 +1169,8 @@ interface (i.e. the system of type signatures) that makes it
correct to use these functions in concrete modules. In programming
terms, Noun is then treated as an abstract datatype.
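
A worst-case macro of this kind might be sketched as follows. The module name WorstCaseDemo and the constants louse and boy are hypothetical; Noun, mkNoun and regNoun follow the names used in the text, but their definitions here are an illustration and not necessarily identical to those in the tutorial's own resource module.

```gf
resource WorstCaseDemo = {
  param
    Number = Sg | Pl ;

  oper
    -- The type of nouns, used through its name only.
    Noun : Type = {s : Number => Str} ;

    -- Worst-case macro: both forms are given explicitly.
    mkNoun : Str -> Str -> Noun = \sg,pl -> {
      s = table {
        Sg => sg ;
        Pl => pl
      }
    } ;

    -- A regular paradigm expressed by means of the macro.
    regNoun : Str -> Noun = \x -> mkNoun x (x + "s") ;

    -- An irregular noun and a regular one.
    louse : Noun = mkNoun "louse" "lice" ;
    boy   : Noun = regNoun "boy" ;
}
```

If the record type behind Noun is later changed, only mkNoun needs to be updated, as long as other definitions build their nouns with mkNoun and regNoun.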
-<h4>A system of paradigms using Prelude operations<h4>
-
The regular noun paradigm regNoun can - and should - of course be defined
by the worst-case macro mkNoun. In addition, some more noun paradigms
@@ -1162,8 +1181,8 @@ could be defined, for instance,
sNoun : Str -> Noun = \kiss -> mkNoun kiss (kiss + "es") ;
-What about nouns like <i>fly<i>, with the plural <i>flies<i>? The already -available solution is to use the so-called "technical stem" <i>fl<i> as +What about nouns like fly, with the plural flies? The already +available solution is to use the so-called "technical stem" fl as argument, and define
@@ -1183,9 +1202,8 @@ The operator-initbelongs to a set of operations in the resource modulePrelude, which therefore has to beopened so thatinitcan be used. --<h4>An intelligent noun paradigm using
+ +caseexpressions<h4> -An intelligent noun paradigm using ``case`` expressions
It may be hard for the user of a resource morphology to pick the right inflection paradigm. A way to help this is to define a more intelligent @@ -1207,16 +1225,15 @@ these forms are explained in the following section.
The paradigm regNoun does not give the correct forms for
-all nouns. For instance, <i>louse - lice<i> and
-<i>fish - fish<i> must be given by using mkNoun.
-Also the word <i>boy<i> would be inflected incorrectly; to prevent
+all nouns. For instance, louse - lice and
+fish - fish must be given by using mkNoun.
+Also the word boy would be inflected incorrectly; to prevent
this, either use mkNoun or modify
regNoun so that the "y" case does not
apply if the second-last character is a vowel.
-<h4>Pattern matching<h4>
-
+ +Pattern matching
Expressions of the
tableform are built from lists of argument-value pairs. These pairs are called the branches @@ -1251,9 +1268,8 @@ programming languages are syntactic sugar for table selections: case e of {...} === table {...} ! e
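
The equivalence between case expressions and table selections can be illustrated with two definitions of the same function. This is a sketch under assumed names: the module CaseDemo and the operations ending and endingT are not part of the tutorial.

```gf
resource CaseDemo = {
  param
    Number = Sg | Pl ;

  oper
    -- Written with a case expression.
    ending : Number -> Str = \n ->
      case n of {
        Sg => "" ;
        Pl => "s"
      } ;

    -- The same function written as a table selection.
    endingT : Number -> Str = \n ->
      table {
        Sg => "" ;
        Pl => "s"
      } ! n ;
}
```

Both operations return the same string for each value of Number; the case form is the sugared notation for the table form.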
-<h4>Morphological analysis and morphology quiz<h4> -
+ +
Even though in GF morphology
is mostly seen as an auxiliary of syntax, a morphology once defined
@@ -1292,14 +1308,13 @@ file for later use, by the command morpho_list = ml
The number flag gives the number of exercises generated.
--<h4>Parametric vs. inherent features, agreement<h4> -
+ +The rule of subject-verb agreement in English says that the verb phrase must be inflected in the number of the subject. This means that a noun phrase (functioning as a subject), in some sense -<i>has<i> a number, which it "sends" to the verb. The verb does not +has a number, which it "sends" to the verb. The verb does not have a number, but must be able to receive whatever number the subject has. This distinction is nicely represented by the different linearization types of noun phrases and verb phrases: @@ -1329,9 +1344,8 @@ regular only in the present tense). The reader is invited to inspect the way in which agreement works in the formation of noun phrases and verb phrases.
--<h4>English concrete syntax with parameters<h4> -
+ +
concrete PaleolithicEng of Paleolithic = open MorphoEng in {
lincat
@@ -1358,9 +1372,8 @@ the formation of noun phrases and verb phrases.
}
--<h4>Hierarchic parameter types<h4> -
+ +The reader familiar with a functional programming language such as <a href="http://www.haskell.org">Haskell<a> must have noticed the similarity @@ -1401,15 +1414,14 @@ the adjectival paradigm in which the two singular forms are the same, can be def }
--<h4>Discontinuous constituents<h4> -
+ +A linearization type may contain more strings than one. An example of where this is useful are English particle -verbs, such as <i>switch off<i>. The linearization of +verbs, such as switch off. The linearization of a sentence may place the object between the verb and the particle: -<i>he switched it off<i>. +he switched it off.
The first of the following judgements defines transitive verbs as a
@@ -1427,27 +1439,27 @@ GF currently requires that all fields in linearization records that
have a table with value type Str have as labels
either s or s with an integer index.