diff --git a/doc/tutorial/gf-tutorial2.html b/doc/tutorial/gf-tutorial2.html index 00caa1d58..d657f7cc8 100644 --- a/doc/tutorial/gf-tutorial2.html +++ b/doc/tutorial/gf-tutorial2.html @@ -7,7 +7,7 @@
Italian cheese ===> formaggio italiano @@ -279,14 +288,14 @@ Italian adjectives usually have four forms where English has just one:- delicious (wine | wines | pizza | pizzas) + delicious (wine, wines, pizza, pizzas) vino delizioso, vini deliziosi, pizza deliziosa, pizze delizioseThe morphology of a language describes the forms of its words. While the complete description of morphology -belongs to resource grammars, the tutorial will explain the -main programming concepts involved. This will moreover +belongs to resource grammars, this tutorial will explain the +programming concepts involved in morphology. This will moreover make it possible to grow the fragment covered by the food example. The tutorial will in fact build a toy resource grammar in order to illustrate the module structure of library-based application @@ -584,7 +593,7 @@ a sentence but a sequence of ten sentences.
Labelled context-free grammars
The syntax trees returned by GF's parser in the previous examples -are not so nice to look at. The identifiers of form
Mks
+are not so nice to look at. The identifiers that form the tree are labels of the BNF rules. To see which label corresponds to which rule, you can use the print_grammar = pg command with the printer flag set to cf (which means context-free): @@ -631,7 +640,7 @@ labels to each rule. In files with the suffix .cf, you can prefix rules with labels that you provide yourself - these may be more useful than the automatically generated ones. The following is a possible -labelling of paleolithic.cf with nicer-looking labels. +labelling of food.cf with nicer-looking labels. Is. S ::= Item "is" Quality ; @@ -661,7 +670,7 @@ With this grammar, the trees look as follows:-
The ``.gf`` grammar format
+The .gf grammar format
To see what there is in GF's shell state when a grammar has been imported, you can give the plain command @@ -696,7 +705,7 @@ A GF grammar consists of two main parts:
-The EBNF and CF formats fuse these two things together, but it is possible +The CF format fuses these two things together, but it is possible to take them apart. For instance, the sentence formation rule
@@ -773,7 +782,7 @@ judgement forms:We return to the precise meanings of these judgement forms later. First we will look at how judgements are grouped into modules, and -show how the paleolithic grammar is +show how the food grammar is expressed by using modules and judgements.
@@ -950,7 +959,7 @@ A system with this property is called a multilingual grammar. Multilingual grammars can be used for applications such as -translation. Let us buid an Italian concrete syntax for +translation. Let us build an Italian concrete syntax for
@@ -1179,10 +1188,11 @@ The graph uses Food and then test the resulting multilingual grammar.
+
+
+
@@ -1203,7 +1213,7 @@ shell escape symbol !. The resulting graph was shown in the previou
The command print_multi = pm is used for printing the current multilingual
grammar in various formats, of which the format -printer=graph just
-shows the module dependencies. Use the help to see what other formats
+shows the module dependencies. Use help to see what other formats
are available:
@@ -1216,9 +1226,9 @@ are available:@@ -1433,7 +1440,7 @@ forms of a word are formed.The golden rule of functional programming
-In comparison to the
.cf format, the .gf format still looks rather +In comparison to the .cf format, the .gf format looks rather verbose, and requires many more characters to be typed. You have probably -done this by the copy-paste-modify method, which is a standard way to +done this by the copy-paste-modify method, which is a common way to avoid repeating work. @@ -1232,8 +1242,8 @@ method. The golden rule of functional programming says that
A function separates the shared parts of different computations from the changing parts, the parameters. In functional programming languages, such as -Haskell, it is possible to share muc more than in -the languages such as C and Java. +Haskell, it is possible to share much more than in +languages such as C and Java.
Operation definitions
@@ -1283,11 +1293,8 @@ strings and records. resource StringOper = { oper SS : Type = {s : Str} ; - ss : Str -> SS = \x -> {s = x} ; - cc : SS -> SS -> SS = \x,y -> ss (x.s ++ y.s) ; - prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s) ; }
From the GF point of view, a paradigm is a function that takes a lemma -
-a string also known as a dictionary form - and returns an inflection
+also known as a dictionary form - and returns an inflection
table of desired type. Paradigms are not functions in the sense of the
fun judgements of abstract syntax (which operate on trees and not
on strings), but operations defined in oper judgements.
@@ -1457,13 +1464,13 @@ are written together to form one token. Thus, for instance,
Some English nouns, such as mouse, are so irregular that
it makes no sense to see them as instances of a paradigm. Even
then, it is useful to perform data abstraction from the
definition of the type Noun, and introduce a constructor
-operation, a worst-case macro for nouns:
+operation, a worst-case function for nouns:
oper mkNoun : Str -> Str -> Noun = \x,y -> {
@@ -1490,7 +1497,7 @@ and
instead of writing the inflection table explicitly.
-The grammar engineering advantage of worst-case macros is that
+The grammar engineering advantage of worst-case functions is that
the author of the resource module may change the definitions of
Noun and mkNoun, and still retain the
interface (i.e. the system of type signatures) that makes it
@@ -1498,7 +1505,7 @@ correct to use these functions in concrete modules. In programming
terms, Noun is then treated as an abstract datatype.
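As an analogy outside GF (this is Python, not GF), the following sketch models a noun as an inflection table and hides that representation behind constructor functions. The names mk_noun, reg_noun and Number are hypothetical stand-ins for mkNoun, regNoun and the GF parameter type, and the regular plural is simply lemma + "s", an assumption made only for this illustration. Client code calls only the constructors, so the table representation could change without breaking it.

```python
from enum import Enum


class Number(Enum):
    """Hypothetical stand-in for the GF parameter type Number = Sg | Pl."""
    SG = "Sg"
    PL = "Pl"


# A "Noun" is an inflection table from Number to a string form.
Noun = dict


def mk_noun(sg: str, pl: str) -> Noun:
    """Worst-case constructor: both forms are given explicitly."""
    return {Number.SG: sg, Number.PL: pl}


def reg_noun(lemma: str) -> Noun:
    """Regular paradigm, defined in terms of the worst-case constructor."""
    return mk_noun(lemma, lemma + "s")


mouse = mk_noun("mouse", "mice")   # irregular: not an instance of a paradigm
cheese = reg_noun("cheese")        # regular

print(mouse[Number.PL], cheese[Number.PL])  # mice cheeses
```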
-A system of paradigms using ``Prelude`` operations
+A system of paradigms using Prelude operations
In addition to the completely regular noun paradigm regNoun,
some other frequent noun paradigms deserve to be
@@ -1707,7 +1714,7 @@ The rule of subject-verb agreement in English says that the verb
phrase must be inflected in the number of the subject. This
means that a noun phrase (functioning as a subject) inherently
has a number, which it passes to the verb. The verb does not
-have a number, but must be able to receive whatever number the
+have a number, but must be able to receive whatever number the
subject has. This distinction is nicely represented by the
different linearization types of noun phrases and verb phrases:
@@ -1717,7 +1724,8 @@ different linearization types of noun phrases and verb phrases:
We say that the number of NP is an inherent feature,
-whereas the number of NP is parametric.
+whereas the number of VP is a variable feature (or a
+parametric feature).
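The difference between an inherent and a variable feature can be mimicked in Python (an analogy, not GF code): the noun phrase carries a number value, while the verb phrase carries a table indexed by number, and agreement is a table lookup. The lexical items john, they and walk are hypothetical.

```python
# Python model (not GF) of the two linearization types:
#   NP: {"s": str, "n": number}      -- the number is inherent
#   VP: {"s": {number: str}}         -- the number is a variable feature


def pred_vp(np: dict, vp: dict) -> str:
    """Agreement: the subject's inherent number selects the verb form."""
    return np["s"] + " " + vp["s"][np["n"]]


john = {"s": "John", "n": "Sg"}               # hypothetical lexical items
they = {"s": "they", "n": "Pl"}
walk = {"s": {"Sg": "walks", "Pl": "walk"}}

print(pred_vp(john, walk))  # John walks
print(pred_vp(they, walk))  # they walk
```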
The agreement rule itself is expressed in the linearization rule of @@ -1823,7 +1831,7 @@ Here is an example of pattern matching, the paradigm of regular adjectives. }
-A constructor can have patterns as arguments. For instance, +A constructor can be used as a pattern whose arguments are themselves patterns. For instance, the adjectival paradigm in which the two singular forms are the same can be defined
@@ -1837,9 +1845,9 @@ can be defined
-Even though in GF morphology
-is mostly seen as an auxiliary of syntax, a morphology once defined
-can be used on its own right. The command morpho_analyse = ma
+Even though morphology in GF is
+mostly used as an auxiliary to syntax, it
+can also be useful in its own right. The command morpho_analyse = ma
can be used to read a text and return for each word the analyses that
it has in the current concrete syntax.
S. For instance,
Score 0/1
-Finally, a list of morphological exercises and save it in a
+Finally, a list of morphological exercises can be generated
+off-line and saved in a
file for later use, by the command morpho_list = ml
- > morpho_list -number=25 -cat=V + > morpho_list -number=25 -cat=V | wf exx.txt
The number flag gives the number of exercises generated.
@@ -1884,25 +1893,36 @@ a sentence may place the object between the verb and the particle:
he switched it off.
-The first of the following judgements defines transitive verbs as +The following judgement defines transitive verbs as discontinuous constituents, i.e. as having a linearization -type with two strings and not just one. The second judgement +type with two strings and not just one. +
+
+ lincat TV = {s : Number => Str ; part : Str} ;
+
++This linearization rule shows how the constituents are separated by the object in complementation.
- lincat TV = {s : Number => Str ; part : Str} ;
lin PredTV tv obj = {s = \\n => tv.s ! n ++ obj.s ++ tv.part} ;
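To see why the verb form and the particle must be stored separately, here is a Python mirror of the PredTV rule (an analogy, not GF code). The transitive verb switch_off and its present-tense forms are hypothetical; the point is that the object string is placed between the two constituents.

```python
def pred_tv(tv: dict, obj: dict) -> dict:
    r"""Python mirror of: lin PredTV tv obj = {s = \\n => tv.s ! n ++ obj.s ++ tv.part}
    The verb form and the particle are kept apart so the object can go between them."""
    return {"s": {n: f"{form} {obj['s']} {tv['part']}" for n, form in tv["s"].items()}}


switch_off = {"s": {"Sg": "switches", "Pl": "switch"}, "part": "off"}  # hypothetical TV
it = {"s": "it"}

vp = pred_tv(switch_off, it)
print(vp["s"]["Sg"])  # switches it off
print(vp["s"]["Pl"])  # switch it off
```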
There is no restriction on the number of discontinuous constituents
(or other fields) a lincat may contain. The only condition is that
the fields must be of finite types, i.e. built from records, tables,
-parameters, and Str, and not functions. A mathematical result
+parameters, and Str, and not functions.
+
+A mathematical result
about parsing in GF says that the worst-case complexity of parsing
-increases with the number of discontinuous constituents. Moreover,
-the parsing and linearization commands only give reliable results
-for categories whose linearization type has a unique Str valued
-field labelled s.
+increases with the number of discontinuous constituents. This is
+one reason to avoid discontinuous constituents where they are not needed.
+Moreover, the parsing and linearization commands only give accurate
+results for categories whose linearization type has a unique Str-valued
+field labelled s. Therefore, discontinuous constituents
+are not a good idea in top-level categories accessed by the users
+of a grammar application.
variants should be used cautiously. It is not
recommended for modules intended as libraries, because the
user of the library has no way to choose among the variants.
-Moreover, even though variants admits lists of any type,
-its semantics for complex types can cause surprises.
+Moreover, variants is only defined for basic types (Str
+and parameter types). The grammar compiler will admit
+variants for any type, but it will push them down to the
+level of basic types in a way that may be unwanted.
+For instance, German has two words meaning "car",
+Wagen, which is Masculine, and Auto, which is Neuter.
+However, if one writes
+
+
+ variants {{s = "Wagen" ; g = Masc} ; {s = "Auto" ; g = Neutr}}
+
++this will compute to +
+
+ {s = variants {"Wagen" ; "Auto"} ; g = variants {Masc ; Neutr}}
+
++which will also accept erroneous combinations of strings and genders.
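The loss of information can be reproduced with a few lines of Python (a model of the set semantics only, not of how the GF compiler is implemented): once each field holds its own set of variants, the record stands for every combination of field values, including the two mismatched ones.

```python
from itertools import product

# The two intended variants, each a record with a consistent gender.
intended = [{"s": "Wagen", "g": "Masc"}, {"s": "Auto", "g": "Neutr"}]

# Pushing variants down to the basic types keeps one set of values per field ...
pushed = {field: {r[field] for r in intended} for field in ("s", "g")}

# ... and such a record accepts every combination of the field values.
accepted = set(product(pushed["s"], pushed["g"]))
wanted = {(r["s"], r["g"]) for r in intended}

print(sorted(accepted - wanted))  # [('Auto', 'Masc'), ('Wagen', 'Neutr')]
```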
-(New since 7 January 2006.) -
-To define string operations computed at compile time, such as in morphology, it is handy to use regular expression patterns:
@@ -2076,7 +2110,6 @@ Another example: English noun plural formation. x + "y" => x + "ies" ; _ => w + "s" } ; -
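The two branches visible above can be mirrored in Python with re.fullmatch, trying the branches in order as a GF case expression does. This is an analogy, not GF code, and only the last two branches of the paradigm appear in this excerpt; the result for "boy" shows why a real plural paradigm needs further cases before x + "y".

```python
import re


def plural(w: str) -> str:
    """First-match case analysis over the two branches shown above:
         x + "y" => x + "ies" ;
         _       => w + "s"
    """
    m = re.fullmatch(r"(.*)y", w)  # x + "y": x is everything before a final "y"
    if m:
        return m.group(1) + "ies"
    return w + "s"  # catch-all branch _


print(plural("fly"), plural("pizza"), plural("boy"))  # flies pizzas boies
```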
Semantics: variables are always bound to the first match, which is the first
@@ -2085,8 +2118,10 @@ in the sequence of binding lists Match p v defined as follows. In t
Match (p1|p2) v = Match p1 v ++ Match p2 v
- Match (p1+p2) s = [Match p1 s1 ++ Match p2 s2 | i <- [0..length s], (s1,s2) = splitAt i s]
- Match p* s = Match "" s ++ Match p s ++ Match (p + p) s ++ ...
+ Match (p1+p2) s = [Match p1 s1 ++ Match p2 s2 |
+ i <- [0..length s], (s1,s2) = splitAt i s]
+ Match p* s = [[]] if Match "" s ++ Match p s ++ Match (p+p) s ++... /= []
+ Match -p v = [[]] if Match p v = []
Match c v = [[]] if c == v -- for constant and literal patterns c
Match x v = [[(x,v)]] -- for variable patterns x
Match x@p v = [[(x,v)]] + M if M = Match p v /= []
@@ -2097,14 +2132,18 @@ Examples:
x + "e" + y matches "peter" with x = "p", y = "ter"
-x@("foo"*) matches any token with x = ""
-x + y@("er"*) matches "burgerer" with x = "burg", y = "erer"
+x + "er"* matches "burgerer" with x = "burg"
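The Match rules can be turned into a small Python interpreter to check the examples. This is an illustration, not GF's implementation, and it assumes one reading of the pseudocode: in p1 + p2 each binding list of the left part is concatenated with each binding list of the right part, x@p adds (x, v) to every binding list of p, and p* is computed by splitting the string into non-empty pieces. The wildcard _ is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

Bindings = List[Tuple[str, str]]


@dataclass(frozen=True)
class Lit:   # constant or literal pattern c
    c: str


@dataclass(frozen=True)
class Var:   # variable pattern x
    x: str


@dataclass(frozen=True)
class Alt:   # p1 | p2
    p1: "Pat"
    p2: "Pat"


@dataclass(frozen=True)
class Seq:   # p1 + p2
    p1: "Pat"
    p2: "Pat"


@dataclass(frozen=True)
class Star:  # p*
    p: "Pat"


@dataclass(frozen=True)
class Neg:   # -p
    p: "Pat"


@dataclass(frozen=True)
class As:    # x@p
    x: str
    p: "Pat"


Pat = Union[Lit, Var, Alt, Seq, Star, Neg, As]


def match(p: Pat, s: str) -> List[Bindings]:
    """All binding lists with which p matches the whole string s, in rule order."""
    if isinstance(p, Lit):
        return [[]] if p.c == s else []
    if isinstance(p, Var):
        return [[(p.x, s)]]
    if isinstance(p, Alt):
        return match(p.p1, s) + match(p.p2, s)
    if isinstance(p, Seq):
        # Try every split point i = 0 .. len(s), shortest left part first.
        return [b1 + b2
                for i in range(len(s) + 1)
                for b1 in match(p.p1, s[:i])
                for b2 in match(p.p2, s[i:])]
    if isinstance(p, Star):
        return [[]] if star_matches(p.p, s) else []
    if isinstance(p, Neg):
        return [[]] if not match(p.p, s) else []
    if isinstance(p, As):
        return [[(p.x, s)] + b for b in match(p.p, s)]
    raise TypeError(f"unknown pattern {p!r}")


def star_matches(p: Pat, s: str) -> bool:
    """p* matches "" and any s that splits into non-empty pieces matching p.
    Skipping empty pieces keeps the infinite union "" | p | p+p | ... finite."""
    if s == "":
        return True
    return any(match(p, s[:i]) and star_matches(p, s[i:])
               for i in range(1, len(s) + 1))


def first_match(p: Pat, s: str) -> Optional[Dict[str, str]]:
    """Variables are bound by the first match; None if there is no match."""
    ms = match(p, s)
    return dict(ms[0]) if ms else None


print(first_match(Seq(Var("x"), Seq(Lit("e"), Var("y"))), "peter"))  # {'x': 'p', 'y': 'ter'}
print(first_match(Seq(Var("x"), Star(Lit("er"))), "burgerer"))       # {'x': 'burg'}
```

Because Seq tries the shortest left part first, x receives the shortest prefix that lets the rest of the pattern match, which is why x = "burg" rather than "burger".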
-The construct exemplified in
+Sometimes a token has different forms depending on the token
+that follows. An example is the English indefinite article,
+which is "an" if a vowel follows and "a" otherwise.
+Which form is chosen can only be decided at run time, i.e.
+when a string is actually built. GF has a special construct for
+such tokens, the pre construct exemplified in
oper artIndef : Str =
@@ -2152,22 +2191,61 @@ they can be used as arguments. For example:
-- e.g. (StreetAddress 10 "Downing Street") : Address
-
+
+The linearization type is {s : Str} for all these categories.
+
+In this section, we will show how +to encode advanced semantic concepts in an abstract syntax. +We use concepts inherited from type theory. Type theory +is the basis of many systems known as logical frameworks, which are +used for representing mathematical theorems and their proofs on a computer. +In fact, GF has a logical framework as a proper part: +this part is the abstract syntax. +
+
+In a logical framework, the formalization of a mathematical theory
+is a set of type and function declarations. The following is an example
+of such a theory, represented as an abstract module in GF.
+
+ abstract Geometry = {
+ cat
+ Line ; Point ; Circle ; -- basic types of figures
+ Prop ; -- proposition
+ fun
+ Parallel : Line -> Line -> Prop ; -- x is parallel to y
+ Centre : Circle -> Point ; -- the centre of c
+ }
+
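To make the idea of a theory as a set of typed function declarations concrete, here is a toy type checker in Python for trees built from the Geometry functions (a hypothetical sketch with simple types only, not GF's own type checker). Since Geometry declares no constants of type Line or Circle, the trees use variables whose categories are given in an environment.

```python
# Hypothetical encoding of the fun judgements of abstract Geometry:
# function name -> (argument categories, value category).
SIGNATURE = {
    "Parallel": (["Line", "Line"], "Prop"),
    "Centre": (["Circle"], "Point"),
}


def infer(tree, env):
    """Return the category of a tree, or raise TypeError if it is ill-typed.

    A tree is either a variable name, whose category is looked up in env,
    or a tuple (function_name, arg_1, ..., arg_n).
    """
    if isinstance(tree, str):
        return env[tree]
    fun, *args = tree
    arg_cats, value_cat = SIGNATURE[fun]
    if len(args) != len(arg_cats):
        raise TypeError(f"{fun} expects {len(arg_cats)} arguments, got {len(args)}")
    for arg, expected in zip(args, arg_cats):
        actual = infer(arg, env)
        if actual != expected:
            raise TypeError(f"{fun}: expected {expected}, got {actual}")
    return value_cat


# Variables standing for some given figures.
env = {"l1": "Line", "l2": "Line", "c": "Circle"}
print(infer(("Parallel", "l1", "l2"), env))  # Prop
print(infer(("Centre", "c"), env))           # Point
# infer(("Parallel", "l1", ("Centre", "c")), env) raises TypeError:
# a Point is not a Line.
```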
+
+A resource grammar is a grammar built on linguistic grounds, to describe a language rather than a domain. -The GF resource grammar library contains resource grammars for +The GF resource grammar library, which contains resource grammars for 10 languages, is described more closely in the following documents:
resource. Its API consists of the following
-modules:
+three modules:
-+Syntax - syntactic structures, language-independent: +
++ ++
+LexEng - lexical paradigms, English: +
++ ++
+LexIta - lexical paradigms, Italian: +
++ ++
Only these three modules should be opened in applications.
The implementations of the resource are given in the following four modules:
+MorphoEng, +
++ ++
+MorphoIta: low-level morphology +
+The example files of this chapter can be found in
+the directory arithm.
+
+The simplest way is to open a top-level Lang module
+and a Paradigms module:
+
+ abstract Foo = ... + + concrete FooEng of Foo = open LangEng, ParadigmsEng in ... + concrete FooSwe of Foo = open LangSwe, ParadigmsSwe in ... ++
+Here is an example. +
+
+ abstract Arithm = {
+ cat
+ Prop ;
+ Nat ;
+ fun
+ Zero : Nat ;
+ Succ : Nat -> Nat ;
+ Even : Nat -> Prop ;
+ And : Prop -> Prop -> Prop ;
+ }
+
+ --# -path=.:alltenses:prelude
+
+ concrete ArithmEng of Arithm = open LangEng, ParadigmsEng in {
+ lincat
+ Prop = S ;
+ Nat = NP ;
+ lin
+ Zero =
+ UsePN (regPN "zero" nonhuman) ;
+ Succ n =
+ DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 (regN2 "successor") n) ;
+ Even n =
+ UseCl TPres ASimul PPos
+ (PredVP n (UseComp (CompAP (PositA (regA "even"))))) ;
+ And x y =
+ ConjS and_Conj (BaseS x y) ;
+
+ }
+
+ --# -path=.:alltenses:prelude
+
+ concrete ArithmSwe of Arithm = open LangSwe, ParadigmsSwe in {
+ lincat
+ Prop = S ;
+ Nat = NP ;
+ lin
+ Zero =
+ UsePN (regPN "noll" neutrum) ;
+ Succ n =
+ DetCN (DetSg (SgQuant DefArt) NoOrd)
+ (ComplN2 (mkN2 (mk2N "efterföljare" "efterföljare")
+ (mkPreposition "till")) n) ;
+ Even n =
+ UseCl TPres ASimul PPos
+ (PredVP n (UseComp (CompAP (PositA (regA "jämn"))))) ;
+ And x y =
+ ConjS and_Conj (BaseS x y) ;
+ }
+
+
+
++The definitions in this example were found by parsing: +
++ > i LangEng.gf + + -- for Successor: + > p -cat=NP -mcfg -parser=topdown "the mother of Paris" + + -- for Even: + > p -cat=S -mcfg -parser=topdown "Paris is old" + + -- for And: + > p -cat=S -mcfg -parser=topdown "Paris is old and I am old" ++
+The use of parsing can be systematized by example-based grammar writing, +to which we will return later. +
+ +
+The interesting thing now is that the
+code in ArithmSwe is similar to the code in ArithmEng, except for
+some lexical items ("noll" vs. "zero", "efterföljare" vs. "successor",
+"jämn" vs. "even"). How can we exploit the similarities and
+actually share code between the languages?
+
+The solution is to use a functor: an incomplete module that opens
+an abstract module as an interface and is then instantiated to different
+languages that implement the interface. The structure is as follows:
+
+ abstract Foo ... + + incomplete concrete FooI of Foo = open Lang, Lex in ... + + concrete FooEng of Foo = FooI with (Lang=LangEng), (Lex=LexEng) ; + concrete FooSwe of Foo = FooI with (Lang=LangSwe), (Lex=LexSwe) ; ++
+where Lex is an abstract lexicon that includes the vocabulary
+specific to this application:
+
+ abstract Lex = Cat ** ... + + concrete LexEng of Lex = CatEng ** open ParadigmsEng in ... + concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in ... ++
+Here, again, a complete example (abstract Arithm is as above):
+
+ incomplete concrete ArithmI of Arithm = open Lang, Lex in {
+ lincat
+ Prop = S ;
+ Nat = NP ;
+ lin
+ Zero =
+ UsePN zero_PN ;
+ Succ n =
+ DetCN (DetSg (SgQuant DefArt) NoOrd) (ComplN2 successor_N2 n) ;
+ Even n =
+ UseCl TPres ASimul PPos
+ (PredVP n (UseComp (CompAP (PositA even_A)))) ;
+ And x y =
+ ConjS and_Conj (BaseS x y) ;
+ }
+
+ --# -path=.:alltenses:prelude
+ concrete ArithmEng of Arithm = ArithmI with
+ (Lang = LangEng),
+ (Lex = LexEng) ;
+
+ --# -path=.:alltenses:prelude
+ concrete ArithmSwe of Arithm = ArithmI with
+ (Lang = LangSwe),
+ (Lex = LexSwe) ;
+
+ abstract Lex = Cat ** {
+ fun
+ zero_PN : PN ;
+ successor_N2 : N2 ;
+ even_A : A ;
+ }
+
+ concrete LexSwe of Lex = CatSwe ** open ParadigmsSwe in {
+ lin
+ zero_PN = regPN "noll" neutrum ;
+ successor_N2 =
+ mkN2 (mk2N "efterföljare" "efterföljare") (mkPreposition "till") ;
+ even_A = regA "jämn" ;
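+ }
+
+ concrete LexEng of Lex = CatEng ** open ParadigmsEng in {
+ lin
+ zero_PN = regPN "zero" nonhuman ;
+ successor_N2 = regN2 "successor" ;
+ even_A = regA "even" ;
+ }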
+
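The functor idea, shared code parameterized by the modules it opens, can be sketched in Python as a function that takes a Lang and a Lex and returns the linearization functions. Everything here is a hypothetical toy: LANG_ENG is a drastically simplified stand-in for LangEng, and LEX_ENG folds the preposition "of" into the N2 entry. A Swedish instance would be obtained by passing a different pair of dictionaries to the same function.

```python
from typing import Callable, Dict

# Hypothetical toy "Lang" for English: just enough to linearize Arithm.
LANG_ENG = {
    "use_pn": lambda pn: pn,
    "succ": lambda n2, np: f"the {n2} {np}",
    "pred_a": lambda np, a: f"{np} is {a}",
    "conj_and": lambda x, y: f"{x} and {y}",
}

# Hypothetical English lexicon with the same entries as abstract Lex.
LEX_ENG = {"zero_PN": "zero", "successor_N2": "successor of", "even_A": "even"}


def arithm_i(lang: Dict[str, Callable], lex: Dict[str, str]) -> Dict[str, Callable]:
    """The 'functor': code shared by all languages, parameterized by Lang and Lex."""
    return {
        "Zero": lambda: lang["use_pn"](lex["zero_PN"]),
        "Succ": lambda n: lang["succ"](lex["successor_N2"], n),
        "Even": lambda n: lang["pred_a"](n, lex["even_A"]),
        "And": lambda x, y: lang["conj_and"](x, y),
    }


# Analogous to: ArithmI with (Lang = LangEng), (Lex = LexEng)
arithm_eng = arithm_i(LANG_ENG, LEX_ENG)
one = arithm_eng["Succ"](arithm_eng["Zero"]())
print(arithm_eng["Even"](one))  # the successor of zero is even
```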
+
+
Transfer means noncompositional tree-transforming operations. @@ -2241,9 +2501,9 @@ See the transfer language documentation for more information.
- +
Lexers and unlexers can be chosen from
@@ -2279,7 +2539,7 @@ Given by help -lexer, help -unlexer:
Issues: @@ -2290,7 +2550,7 @@ Issues:
-mcfg vs. others
The speak_aloud = sa command sends a string to the speech
@@ -2320,7 +2580,7 @@ The method works only for grammars of English.
Both Flite and ATK are freely available through the links
above, but they are not distributed together with GF.
The @@ -2337,12 +2597,12 @@ Here is a snapshot of the editor: The grammars of the snapshot are from the Letter grammar package.
- +Forthcoming.
- +Other processes can communicate with the GF command interpreter, @@ -2359,7 +2619,7 @@ Thus the most silent way to invoke GF is - +
GF grammars can be used as parts of programs written in the @@ -2371,15 +2631,15 @@ following languages. The links give more documentation.
A summary is given in the following chart of GF grammar compiler phases:
Formal and Informal Software Specifications, @@ -2392,6 +2652,6 @@ English and German. A simpler example will be explained here.
- +