From 3d9a05f8434344d37f0cf6cd2994233fbecc0780 Mon Sep 17 00:00:00 2001 From: aarne Date: Sun, 18 Dec 2005 21:27:23 +0000 Subject: [PATCH] txt2tags result --- doc/tutorial/gf-tutorial2.html | 1000 ++++++++++++++++---------------- 1 file changed, 502 insertions(+), 498 deletions(-) diff --git a/doc/tutorial/gf-tutorial2.html b/doc/tutorial/gf-tutorial2.html index 42926b668..e2a88d9d3 100644 --- a/doc/tutorial/gf-tutorial2.html +++ b/doc/tutorial/gf-tutorial2.html @@ -7,7 +7,7 @@

Grammatical Framework Tutorial

Author: Aarne Ranta <aarne (at) cs.chalmers.se>
-Last update: Sun Dec 18 21:43:08 2005 +Last update: Sun Dec 18 22:27:21 2005

@@ -77,44 +77,6 @@ Last update: Sun Dec 18 21:43:08 2005 -
  • More constructs for concrete syntax - -
  • More features of the module system - -
  • More concepts of abstract syntax - -
  • Transfer modules -
  • Practical issues - -
  • Case studies - @@ -833,7 +795,7 @@ Try generation now: > gr | l quello formaggio molto noioso è italiano - > gr | l -lang=PaleolithicEng + > gr | l -lang=FoodEng this fish is warm

    @@ -1139,30 +1101,34 @@ Any number of resource modules can be makes definitions contained in the resource usable in the concrete syntax. Here is an example, where the resource StringOper is -opened in a new version of PaleolithicEng. +opened in a new version of FoodEng.

    -  concrete PalEng of Paleolithic = open StringOper in {
    -    lincat 
    -      S, NP, VP, CN, A, V, TV = SS ; 
    +    concrete Food2Eng of Food = open StringOper in {
    +  
    +    lincat
    +      S, Item, Kind, Quality = SS ;
    +  
         lin
    -      PredVP  = cc ;
    -      UseV v  = v ;
    -      ComplTV = cc ; 
    -      UseA  = prefix "is" ;
    -      This  = prefix "this" ;
    -      That  = prefix "that" ;
    -      Def   = prefix "the" ;
    -      Indef = prefix "a" ;
    -      ModA  = cc ;
    -      Boy    = ss "boy" ;
    -      Louse  = ss "louse" ;
    -      Snake  = ss "snake" ;
    -      -- etc
    -  }
    +      Is item quality = cc item (prefix "is" quality) ;
    +      This = prefix "this" ;
    +      That = prefix "that" ;
    +      QKind = cc ;
    +      Wine = ss "wine" ;
    +      Cheese = ss "cheese" ;
    +      Fish = ss "fish" ;
    +      Very = prefix "very" ;
    +      Fresh = ss "fresh" ;
    +      Warm = ss "warm" ;
    +      Italian = ss "Italian" ;
    +      Expensive = ss "expensive" ;
    +      Delicious = ss "delicious" ;
    +      Boring = ss "boring" ;
    +  
    +    }
     

    -The same string operations could be use to write PaleolithicIta +The same string operations could be used to write FoodIta more concisely.

    @@ -1181,15 +1147,14 @@ details.

    Morphology

    Suppose we want to say, with the vocabulary included in -Paleolithic.gf, things like +Food.gf, things like

    -    the boy eats two snakes
    -    all boys sleep  
    +    all Italian wines are delicious
     

    The new grammatical facility we need are the plural forms -of nouns and verbs (boys, sleep), as opposed to their +of nouns and verbs (wines, are), as opposed to their singular forms.

    @@ -1208,9 +1173,9 @@ We want to express such special features of languages in the concrete syntax while ignoring them in the abstract syntax.

    -To be able to do all this, we need one new judgement form, -many new expression forms, -and a generalizarion of linearization types +To be able to do all this, we need one new judgement form +and many new expression forms. +We also need to generalize linearization types from strings to more complex types.

    @@ -1223,12 +1188,12 @@ using a new form of judgement: param Number = Sg | Pl ;

    -To express that nouns in English have a linearization +To express that Kind expressions in English have a linearization depending on number, we replace the linearization type {s : Str} with a type where the s field is a table depending on number:

    -    lincat CN = {s : Number => Str} ;
    +    lincat Kind = {s : Number => Str} ;
     

    The table type Number => Str is in many respects similar to @@ -1238,9 +1203,9 @@ that the argument-value pairs can be listed in a finite table. The following example shows such a table:

    -    lin Boy = {s = table {
    -      Sg => "boy" ;
    -      Pl => "boys"
    +    lin Cheese = {s = table {
    +      Sg => "cheese" ;
    +      Pl => "cheeses"
           }
         } ;
     
    @@ -1249,10 +1214,10 @@ The application of a table to a parameter is done by the selection operator !. For instance,

    -    Boy.s ! Pl
    +    Cheese.s ! Pl
     

    -is a selection, whose value is "boys". +is a selection, whose value is "cheeses".

    Inflection tables, paradigms, and ``oper`` definitions

    @@ -1280,18 +1245,18 @@ The following operation defines the regular noun paradigm of English: } ;

    -The glueing operator + tells that +The gluing operator + tells that the string held in the variable x and the ending "s" are written together to form one token. Thus, for instance,

    -    (regNoun "boy").s ! Pl  ---> "boy" + "s"  --->  "boys"
    +    (regNoun "cheese").s ! Pl  ---> "cheese" + "s"  --->  "cheeses"
     

    Worst-case macros and data abstraction

    -Some English nouns, such as louse, are so irregular that +Some English nouns, such as mouse, are so irregular that it makes no sense to see them as instances of a paradigm. Even then, it is useful to perform data abstraction from the definition of the type Noun, and introduce a constructor @@ -1306,10 +1271,10 @@ operation, a worst-case macro for nouns: } ;

    -Thus we define +Thus we could define

    -    lin Louse = mkNoun "louse" "lice" ;
    +    lin Mouse = mkNoun "mouse" "mice" ;
     

    and @@ -1384,7 +1349,7 @@ these forms are explained in the next section.

    The paradigm regNoun does not give the correct forms for -all nouns. For instance, louse - lice and +all nouns. For instance, mouse - mice and fish - fish must be given by using mkNoun. Also the word boy would be inflected incorrectly; to prevent this, either use mkNoun or modify @@ -1541,7 +1506,7 @@ means that a noun phrase (functioning as a subject), inherently has a number, which it passes to the verb. The verb does not have a number, but must be able to receive whatever number the subject has. This distinction is nicely represented by the -different linearization types of noun phrases and verb phrases: +different linearization types of noun phrases and verb phrases:

         lincat NP = {s : Str ; n : Number} ;
    @@ -1559,437 +1524,476 @@ the predication structure:
         lin PredVP np vp = {s = np.s ++ vp.s ! np.n} ;
     

    -The following section will present a new version of -PaleolithingEng, assuming an abstract syntax -xextended with All and Two. -It also assumes that MorphoEng has a paradigm -regVerb for regular verbs (which need only be -regular only in the present tensse). +The following section will present +FoodsEng, assuming the abstract syntax Foods +that is similar to Food but also has the +plural determiners All and Most. The reader is invited to inspect the way in which agreement works in -the formation of noun phrases and verb phrases. +the formation of sentences.

    English concrete syntax with parameters

    -  concrete PaleolithicEng of Paleolithic = open Prelude, MorphoEng in {
    -  lincat 
    -    S, A          = SS ; 
    -    VP, CN, V, TV = {s : Number => Str} ; 
    -    NP            = {s : Str ; n : Number} ; 
    -  lin
    -    PredVP np vp  = ss (np.s ++ vp.s ! np.n) ;
    -    UseV   v      = v ;
    -    ComplTV tv np = {s = \\n => tv.s ! n ++ np.s} ;
    -    UseA a   = {s = \\n => case n of {Sg => "is" ; Pl => "are"} ++ a.s} ;
    -    This     = det Sg "this" ;
    -    Indef    = det Sg "a" ;
    -    All      = det Pl "all" ;
    -    Two      = det Pl "two" ;
    -    ModA  a cn = {s = \\n => a.s ++ cn.s ! n} ;
    -    Louse  = mkNoun "louse" "lice" ;
    -    Snake  = regNoun "snake" ;
    -    Green  = ss "green" ;
    -    Warm   = ss "warm" ;
    -    Laugh  = regVerb "laugh" ;
    -    Sleep  = regVerb "sleep" ;
    -    Kill   = regVerb "kill" ;
    -  oper
    -    det : Number -> Str -> Noun -> {s : Str ; n : Number} = \n,d,cn -> {
    -      s = d ++ n.s ! n ;
    -      n = n
    -      } ;
    +  --# -path=.:prelude
    +  
    +  concrete FoodsEng of Foods = open Prelude, MorphoEng in {
    +  
    +    lincat
    +      S, Quality = SS ; 
    +      Kind = {s : Number => Str} ; 
    +      Item = {s : Str ; n : Number} ; 
    +  
    +    lin
    +      Is item quality = ss (item.s ++ (mkVerb "are" "is").s ! item.n ++ quality.s) ;
    +      This = det Sg "this" ;
    +      That = det Sg "that" ;
    +      All  = det Pl "all" ;
    +      Most = det Pl "most" ;
    +      QKind quality kind = {s = \\n => quality.s ++ kind.s ! n} ;
    +      Wine = regNoun "wine" ;
    +      Cheese = regNoun "cheese" ;
    +      Fish = mkNoun "fish" "fish" ;
    +      Very = prefixSS "very" ;
    +      Fresh = ss "fresh" ;
    +      Warm = ss "warm" ;
    +      Italian = ss "Italian" ;
    +      Expensive = ss "expensive" ;
    +      Delicious = ss "delicious" ;
    +      Boring = ss "boring" ;
    +  
    +    oper
    +      det : Number -> Str -> Noun -> {s : Str ; n : Number} = \n,d,cn -> {
    +        s = d ++ cn.s ! n ;
    +        n = n
    +        } ;
    +  
       }
    -
    -

    - -

    Hierarchic parameter types

    -

    -The reader familiar with a functional programming language such as -Haskell must have noticed the similarity -between parameter types in GF and algebraic datatypes (data definitions -in Haskell). The GF parameter types are actually a special case of algebraic -datatypes: the main restriction is that in GF, these types must be finite. -(It is this restriction that makes it possible to invert linearization rules into -parsing methods.) -

    -

    -However, finite is not the same thing as enumerated. Even in GF, parameter -constructors can take arguments, provided these arguments are from other -parameter types - only recursion is forbidden. Such parameter types impose a -hierarchic order among parameters. They are often needed to define -the linguistically most accurate parameter systems. -

    -

    -To give an example, Swedish adjectives -are inflected in number (singular or plural) and -gender (uter or neuter). These parameters would suggest 2*2=4 different -forms. However, the gender distinction is done only in the singular. Therefore, -it would be inaccurate to define adjective paradigms using the type -Gender => Number => Str. The following hierarchic definition -yields an accurate system of three adjectival forms. -

    -
    -    param AdjForm = ASg Gender | APl ;
    -    param Gender  = Uter | Neuter ;
    -
    -

    -In pattern matching, a constructor can have patterns as arguments. For instance, -the adjectival paradigm in which the two singular forms are the same, can be defined -

    -
    -    oper plattAdj : Str -> AdjForm => Str = \x -> table {
    -      ASg _ => x ;
    -      APl   => x + "a" ;
    -      }
    -
    -

    - -

    Morphological analysis and morphology quiz

    -

    -Even though in GF morphology -is mostly seen as an auxiliary of syntax, a morphology once defined -can be used on its own right. The command morpho_analyse = ma -can be used to read a text and return for each word the analyses that -it has in the current concrete syntax. -

    -
    -    > rf bible.txt | morpho_analyse
    -
    -

    -In the same way as translation exercises, morphological exercises can -be generated, by the command morpho_quiz = mq. Usually, -the category is set to be something else than S. For instance, -

    -
    -    > i lib/resource/french/VerbsFre.gf
    -    > morpho_quiz -cat=V
    +      ```
       
    -    Welcome to GF Morphology Quiz.
    -    ...
       
    -    réapparaître : VFin VCondit  Pl  P2
    -    réapparaitriez
    -    > No, not réapparaitriez, but
    -    réapparaîtriez
    -    Score 0/1
    -
    -

    -Finally, a list of morphological exercises and save it in a -file for later use, by the command morpho_list = ml -

    -
    -    > morpho_list -number=25 -cat=V
    -
    -

    -The number flag gives the number of exercises generated. -

    - -

    Discontinuous constituents

    -

    -A linearization type may contain more strings than one. -An example of where this is useful are English particle -verbs, such as switch off. The linearization of -a sentence may place the object between the verb and the particle: -he switched it off. -

    -

    -The first of the following judgements defines transitive verbs as -discontinuous constituents, i.e. as having a linearization -type with two strings and not just one. The second judgement -shows how the constituents are separated by the object in complementization. -

    -
    -    lincat TV = {s : Number => Str ; s2 : Str} ;
    -    lin ComplTV tv obj = {s = \\n => tv.s ! n ++ obj.s ++ tv.s2} ;
    -
    -

    -There is no restriction in the number of discontinuous constituents -(or other fields) a lincat may contain. The only condition is that -the fields must be of finite types, i.e. built from records, tables, -parameters, and Str, and not functions. A mathematical result -about parsing in GF says that the worst-case complexity of parsing -increases with the number of discontinuous constituents. Moreover, -the parsing and linearization commands only give reliable results -for categories whose linearization type has a unique Str valued -field labelled s. -

    - -

    More constructs for concrete syntax

    - -

    Free variation

    -

    -Sometimes there are many alternative ways to define a concrete syntax. -For instance, the verb negation in English can be expressed both by -does not and doesn't. In linguistic terms, these expressions -are in free variation. The variants construct of GF can -be used to give a list of strings in free variation. For example, -

    -
    -    NegVerb verb = {s = variants {["does not"] ; "doesn't} ++ verb.s} ;
    -
    -

    -An empty variant list -

    -
    -    variants {}
    -
    -

    -can be used e.g. if a word lacks a certain form. -

    -

    -In general, variants should be used cautiously. It is not -recommended for modules aimed to be libraries, because the -user of the library has no way to choose among the variants. -Moreover, even though variants admits lists of any type, -its semantics for complex types can cause surprises. -

    - -

    Record extension and subtyping

    -

    -Record types and records can be extended with new fields. For instance, -in German it is natural to see transitive verbs as verbs with a case. -The symbol ** is used for both constructs. -

    -
    -    lincat TV = Verb ** {c : Case} ;
       
    -    lin Follow = regVerb "folgen" ** {c = Dative} ; 
    -
    -

    -To extend a record type or a record with a field whose label it -already has is a type error. -

    -

    -A record type T is a subtype of another one R, if T has -all the fields of R and possibly other fields. For instance, -an extension of a record type is always a subtype of it. -

    -

    -If T is a subtype of R, an object of T can be used whenever -an object of R is required. For instance, a transitive verb can -be used whenever a verb is required. -

    -

    -Contravariance means that a function taking an R as argument -can also be applied to any object of a subtype T. -

    - -

    Tuples and product types

    -

    -Product types and tuples are syntactic sugar for record types and records: -

    -
    -    T1 * ... * Tn   ===   {p1 : T1 ; ... ; pn : Tn}
    -    <t1, ...,  tn>  ===   {p1 = T1 ; ... ; pn = Tn}
    -
    -

    -Thus the labels p1, p2,...` are hard-coded. -

    - -

    Predefined types and operations

    -

    -GF has the following predefined categories in abstract syntax: -

    -
    -    cat Int ;     -- integers, e.g. 0, 5, 743145151019
    -    cat Float ;   -- floats, e.g.   0.0, 3.1415926
    -    cat String ;  -- strings, e.g.  "", "foo", "123"
    -
    -

    -The objects of each of these categories are literals -as indicated in the comments above. No fun definition -can have a predefined category as its value type, but -they can be used as arguments. For example: -

    -
    -    fun StreetAddress : Int -> String -> Address ;
    -    lin StreetAddress number street = {s = number.s ++ street.s} ;
    +  %--!
    +  ===Hierarchic parameter types===
       
    -    -- e.g. (StreetAddress 10 "Downing Street") : Address
    -
    -

    - -

    More features of the module system

    - -

    Resource grammars and their reuse

    -

    -See -resource library documentation -

    - -

    Interfaces, instances, and functors

    -

    -See an -example built this way -

    - -

    Restricted inheritance and qualified opening

    - -

    More concepts of abstract syntax

    - -

    Dependent types

    - -

    Higher-order abstract syntax

    - -

    Semantic definitions

    - -

    Transfer modules

    -

    -Transfer means noncompositional tree-transforming operations. -The command apply_transfer = at is typically used in a pipe: -

    -
    -    > p "John walks and John runs" | apply_transfer aggregate | l
    -    John walks and runs
    -
    -

    -See the -sources of this example. -

    -

    -See the -transfer language documentation -for more information. -

    - -

    Practical issues

    - -

    Lexers and unlexers

    -

    -Lexers and unlexers can be chosen from -a list of predefined ones, using the flags-lexer and `` -unlexer`` either -in the grammar file or on the GF command line. -

    -

    -Given by help -lexer, help -unlexer: -

    -
    -      The default is words.
    -      -lexer=words         tokens are separated by spaces or newlines
    -      -lexer=literals      like words, but GF integer and string literals recognized
    -      -lexer=vars          like words, but "x","x_...","$...$" as vars, "?..." as meta
    -      -lexer=chars         each character is a token
    -      -lexer=code          use Haskell's lex
    -      -lexer=codevars      like code, but treat unknown words as variables, ?? as meta
    -      -lexer=text          with conventions on punctuation and capital letters
    -      -lexer=codelit       like code, but treat unknown words as string literals
    -      -lexer=textlit       like text, but treat unknown words as string literals
    -      -lexer=codeC         use a C-like lexer
    -      -lexer=ignore        like literals, but ignore unknown words
    -      -lexer=subseqs       like ignore, but then try all subsequences from longest
    +  The reader familiar with a functional programming language such as
    +  [Haskell http://www.haskell.org] must have noticed the similarity
    +  between parameter types in GF and **algebraic datatypes** (``data`` definitions
    +  in Haskell). The GF parameter types are actually a special case of algebraic
    +  datatypes: the main restriction is that in GF, these types must be finite.
    +  (It is this restriction that makes it possible to invert linearization rules into
    +  parsing methods.)
       
    -      The default is unwords.
    -      -unlexer=unwords     space-separated token list (like unwords)
    -      -unlexer=text        format as text: punctuation, capitals, paragraph <p>
    -      -unlexer=code        format as code (spacing, indentation)
    -      -unlexer=textlit     like text, but remove string literal quotes
    -      -unlexer=codelit     like code, but remove string literal quotes
    -      -unlexer=concat      remove all spaces
    -      -unlexer=bind        like identity, but bind at "&+"
    +  However, finite is not the same thing as enumerated. Even in GF, parameter
    +  constructors can take arguments, provided these arguments are from other
    +  parameter types - only recursion is forbidden. Such parameter types impose a
    +  hierarchic order among parameters. They are often needed to define
    +  the linguistically most accurate parameter systems.
    +  
    +  To give an example, Swedish adjectives
    +  are inflected in number (singular or plural) and
    +  gender (uter or neuter). These parameters would suggest 2*2=4 different
    +  forms. However, the gender distinction is done only in the singular. Therefore,
    +  it would be inaccurate to define adjective paradigms using the type
    +  ``Gender => Number => Str``. The following hierarchic definition
    +  yields an accurate system of three adjectival forms.
    +
    +

    + param AdjForm = ASg Gender | APl ; + param Gender = Uter | Neuter ; +

    +
    +  In pattern matching, a constructor can have patterns as arguments. For instance,
    +  the adjectival paradigm in which the two singular forms are the same, can be defined
    +
    +

    + oper plattAdj : Str -> AdjForm => Str = \x -> table { + ASg _ => x ; + APl => x + "a" ; + } +

    +
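As a minimal sketch of how such a hierarchic parameter type is used, the resource module below repeats the two definitions above and adds a worst-case macro for adjectives whose neuter singular differs. The module name AdjSketch, the operation mkAdj and the example words are assumptions made for illustration; they are not part of the tutorial's grammars.

```gf
resource AdjSketch = {

  param Gender  = Uter | Neuter ;
  param AdjForm = ASg Gender | APl ;

  -- paradigm from the text: both singular forms are the same
  oper plattAdj : Str -> AdjForm => Str = \x -> table {
    ASg _ => x ;
    APl   => x + "a"
    } ;

  -- hypothetical worst-case macro: all three forms are given explicitly
  oper mkAdj : Str -> Str -> Str -> AdjForm => Str = \ut,neu,pl -> table {
    ASg Uter   => ut ;
    ASg Neuter => neu ;
    APl        => pl
    } ;

  oper platt : AdjForm => Str = plattAdj "platt" ;
  oper stor  : AdjForm => Str = mkAdj "stor" "stort" "stora" ;

  -- platt ! ASg Neuter  --->  "platt"
  -- platt ! APl         --->  "platt" + "a"  --->  "platta"
  -- stor  ! ASg Neuter  --->  "stort"
}
```

Each table has exactly three branches, one per value of AdjForm, so no form is defined for a gender distinction in the plural.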
    +  
    +  
    +  %--!
    +  ===Morphological analysis and morphology quiz===
    +  
    +  Even though in GF morphology
    +  is mostly seen as an auxiliary of syntax, a morphology once defined
    +  can be used in its own right. The command ``morpho_analyse = ma``
    +  can be used to read a text and return for each word the analyses that
    +  it has in the current concrete syntax.
    +
    +

    + > rf bible.txt | morpho_analyse +

    +
    +  In the same way as translation exercises, morphological exercises can
    +  be generated, by the command ``morpho_quiz = mq``. Usually,
    +  the category is set to be something other than ``S``. For instance,
    +
    +

    + > i lib/resource/french/VerbsFre.gf + > morpho_quiz -cat=V +

    +

    + Welcome to GF Morphology Quiz. + ... +

    +

    + réapparaître : VFin VCondit Pl P2 + réapparaitriez + > No, not réapparaitriez, but + réapparaîtriez + Score 0/1 +

    +
    +  Finally, a list of morphological exercises can be generated and saved in a
    +  file for later use, by the command ``morpho_list = ml``:
    +
    +

    + > morpho_list -number=25 -cat=V +

    +
    +  The ``number`` flag gives the number of exercises generated.
    +  
    +  
    +  
    +  %--!
    +  ===Discontinuous constituents===
    +  
    +  A linearization type may contain more strings than one. 
    +  One example of where this is useful is English particle
    +  verbs, such as //switch off//. The linearization of
    +  a sentence may place the object between the verb and the particle:
    +  //he switched it off//.
    +  
    +  The first of the following judgements defines transitive verbs as
    +  **discontinuous constituents**, i.e. as having a linearization
    +  type with two strings and not just one. The second judgement
    +  shows how the constituents are separated by the object in complementization.
    +
    +

    + lincat TV = {s : Number => Str ; s2 : Str} ; + lin ComplTV tv obj = {s = \\n => tv.s ! n ++ obj.s ++ tv.s2} ; +

    +
    +  There is no restriction on the number of discontinuous constituents
    +  (or other fields) a  ``lincat`` may contain. The only condition is that
    +  the fields must be of finite types, i.e. built from records, tables,
    +  parameters, and ``Str``, and not functions. A mathematical result
    +  about parsing in GF says that the worst-case complexity of parsing
    +  increases with the number of discontinuous constituents. Moreover,
    +  the parsing and linearization commands only give reliable results
    +  for categories whose linearization type has a unique ``Str`` valued
    +  field labelled ``s``.
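The two judgements above can be placed in a small self-contained grammar, sketched here under some assumptions: the module names Particle and ParticleEng, the functions SwitchOff and It, and the simplified linearization type of NP (a plain string record without a number field) are all hypothetical and chosen only to keep the example short. The two modules would go into separate files named after the modules.

```gf
-- file Particle.gf (hypothetical)
abstract Particle = {
  cat VP ; TV ; NP ;
  fun ComplTV   : TV -> NP -> VP ;
  fun SwitchOff : TV ;
  fun It        : NP ;
}

-- file ParticleEng.gf (hypothetical)
concrete ParticleEng of Particle = {
  param Number = Sg | Pl ;
  lincat
    VP = {s : Number => Str} ;
    TV = {s : Number => Str ; s2 : Str} ;  -- verb forms and particle kept apart
    NP = {s : Str} ;
  lin
    ComplTV tv obj = {s = \\n => tv.s ! n ++ obj.s ++ tv.s2} ;
    SwitchOff = {s = table {Sg => "switches" ; Pl => "switch"} ; s2 = "off"} ;
    It = {s = "it"} ;
}
```

With these definitions, the value of ``(ComplTV SwitchOff It).s ! Sg`` is "switches" ++ "it" ++ "off": the object lands between the two fields of the verb.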
    +  
    +  
    +  %--!
    +  ==More constructs for concrete syntax==
    +  
    +  
    +  %--!
    +  ===Free variation===
    +  
    +  Sometimes there are many alternative ways to define a concrete syntax.
    +  For instance, the verb negation in English can be expressed both by
    +  //does not// and //doesn't//. In linguistic terms, these expressions
    +  are in **free variation**. The ``variants`` construct of GF can
    +  be used to give a list of strings in free variation. For example,
    +
    +

    + NegVerb verb = {s = variants {["does not"] ; "doesn't"} ++ verb.s} ; +

    +
    +  An empty variant list
    +
    +

    + variants {} +

    +
    +  can be used e.g. if a word lacks a certain form.
    +  
    +  In general, ``variants`` should be used cautiously. It is not
    +  recommended for modules aimed to be libraries, because the
    +  user of the library has no way to choose among the variants.
    +  Moreover, even though ``variants`` admits lists of any type,
    +  its semantics for complex types can cause surprises.
    +  
    +  
    +  
    +  
    +  ===Record extension and subtyping===
    +  
    +  Record types and records can be **extended** with new fields. For instance,
    +  in German it is natural to see transitive verbs as verbs with a case.
    +  The symbol ``**`` is used for both constructs.
    +
    +

    + lincat TV = Verb ** {c : Case} ; +

    +

    + lin Follow = regVerb "folgen" ** {c = Dative} ; +

    +
    +  To extend a record type or a record with a field whose label it
    +  already has is a type error.
    +  
    +  A record type //T// is a **subtype** of another one //R//, if //T// has
    +  all the fields of //R// and possibly other fields. For instance,
    +  an extension of a record type is always a subtype of it.
    +  
    +  If //T// is a subtype of //R//, an object of //T// can be used whenever
    +  an object of //R// is required. For instance, a transitive verb can
    +  be used whenever a verb is required.
    +  
    +  **Contravariance** means that a function taking an //R// as argument
    +  can also be applied to any object of a subtype //T//.
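The subtyping rule can be sketched as a resource module. The module name SubtypeSketch, the operations plForm, folgen and folgenPl, the two-valued Case type, and the explicit verb forms are assumptions made for illustration; the tutorial's own ``regVerb`` is not used because its definition is not shown here.

```gf
resource SubtypeSketch = {

  param Number = Sg | Pl ;
  param Case   = Accusative | Dative ;

  oper Verb : Type = {s : Number => Str} ;
  oper TV   : Type = Verb ** {c : Case} ;   -- record type extension

  -- an operation that expects any Verb and selects its plural form
  oper plForm : Verb -> Str = \v -> v.s ! Pl ;

  -- a transitive verb written out as a full record
  oper folgen : TV = {s = table {Sg => "folgt" ; Pl => "folgen"} ; c = Dative} ;

  -- accepted because TV is a subtype of Verb; the field c is simply not used
  oper folgenPl : Str = plForm folgen ;
}
```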
    +  
    +  
    +  
    +  ===Tuples and product types===
    +  
    +  Product types and tuples are syntactic sugar for record types and records:
    +
    +

    + T1 * ... * Tn === {p1 : T1 ; ... ; pn : Tn} + <t1, ..., tn> === {p1 = t1 ; ... ; pn = tn} +

    +
    +  Thus the labels ``p1, p2,...`` are hard-coded.
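A short sketch of the equivalence, assuming a hypothetical resource module TupleSketch with illustrative operation names:

```gf
resource TupleSketch = {

  param Number = Sg | Pl ;

  oper NumStr : Type   = Str * Number ;   -- same as {p1 : Str ; p2 : Number}
  oper wineSg : NumStr = <"wine", Sg> ;   -- same as {p1 = "wine" ; p2 = Sg}

  -- components are accessed by the hard-coded labels
  oper word : Str    = wineSg.p1 ;        -- "wine"
  oper num  : Number = wineSg.p2 ;        -- Sg
}
```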
    +  
    +  
    +  %--!
    +  ===Prefix-dependent choices===
    +  
    +  The ``pre`` construct makes the choice of a string depend on the prefix of the following token, as exemplified in
    +
    +

    + oper artIndef : Str = + pre {"a" ; "an" / strs {"a" ; "e" ; "i" ; "o"}} ; +

    +
    +  Thus
    +
    +

    + artIndef ++ "cheese" ---> "a" ++ "cheese" + artIndef ++ "apple" ---> "an" ++ "apple" +

    +
    +  This very example does not work in all situations: the prefix
    +  //u// has no general rules, and some problematic words are
    +  //euphemism, one-eyed, n-gram//. It is possible to write
    +
    +

    + oper artIndef : Str = + pre {"a" ; + "a" / strs {"eu" ; "one"} ; + "an" / strs {"a" ; "e" ; "i" ; "o" ; "n-"} + } ; +

    +
    +  
    +  
    +  
    +  ===Predefined types and operations===
    +  
    +  GF has the following predefined categories in abstract syntax:
    +
    +

    + cat Int ; -- integers, e.g. 0, 5, 743145151019 + cat Float ; -- floats, e.g. 0.0, 3.1415926 + cat String ; -- strings, e.g. "", "foo", "123" +

    +
    +  The objects of each of these categories are **literals**
    +  as indicated in the comments above. No ``fun`` definition
    +  can have a predefined category as its value type, but
    +  they can be used as arguments. For example:
    +
    +

    + fun StreetAddress : Int -> String -> Address ; + lin StreetAddress number street = {s = number.s ++ street.s} ; +

    +

    + -- e.g. (StreetAddress 10 "Downing Street") : Address +

    +
    +  
    +  
    +  %--!
    +  ==More features of the module system==
    +  
    +  
    +  ===Resource grammars and their reuse===
    +  
    +  See 
    +  [resource library documentation  ../../lib/resource/doc/gf-resource.html]
    +  
    +  
    +  ===Interfaces, instances, and functors===
    +  
    +  See an
    +  [example built this way ../../examples/mp3/mp3-resource.html]
    +  
    +  
    +  ===Restricted inheritance and qualified opening===
    +  
    +  
    +  
    +  ==More concepts of abstract syntax==
    +  
    +  
    +  ===Dependent types===
    +  
    +  ===Higher-order abstract syntax===
    +  
    +  ===Semantic definitions===
    +  
    +  
    +  
    +  ==Transfer modules==
    +  
    +  Transfer means noncompositional tree-transforming operations.
    +  The command ``apply_transfer = at`` is typically used in a pipe:
    +
    +

    + > p "John walks and John runs" | apply_transfer aggregate | l + John walks and runs +

    +
    +  See the
    +  [sources ../../transfer/examples/aggregation] of this example.
    +  
    +  See the
    +  [transfer language documentation  ../transfer.html]
    +  for more information.
    +  
    +  
    +  ==Practical issues==
    +  
    +  
    +  ===Lexers and unlexers===
    +  
    +  Lexers and unlexers can be chosen from
    +  a list of predefined ones, using the flags ``-lexer`` and ``-unlexer`` either
    +  in the grammar file or on the GF command line.
    +  
    +  Given by ``help -lexer``, ``help -unlexer``:
    +
    +

    + The default is words. + -lexer=words tokens are separated by spaces or newlines + -lexer=literals like words, but GF integer and string literals recognized + -lexer=vars like words, but "x","x_...","$...$" as vars, "?..." as meta + -lexer=chars each character is a token + -lexer=code use Haskell's lex + -lexer=codevars like code, but treat unknown words as variables, ?? as meta + -lexer=text with conventions on punctuation and capital letters + -lexer=codelit like code, but treat unknown words as string literals + -lexer=textlit like text, but treat unknown words as string literals + -lexer=codeC use a C-like lexer + -lexer=ignore like literals, but ignore unknown words + -lexer=subseqs like ignore, but then try all subsequences from longest +

    +

    + The default is unwords. + -unlexer=unwords space-separated token list (like unwords) + -unlexer=text format as text: punctuation, capitals, paragraph <p> + -unlexer=code format as code (spacing, indentation) + -unlexer=textlit like text, but remove string literal quotes + -unlexer=codelit like code, but remove string literal quotes + -unlexer=concat remove all spaces + -unlexer=bind like identity, but bind at "&+" +

    +
    +  
    +  
    +  ===Efficiency of grammars===
    +  
    +  Issues:
    +  
    +  - the choice of datastructures in ``lincat``s
    +  - the value of the ``optimize`` flag 
    +  - parsing efficiency: ``-mcfg`` vs. others
    +  
    +  
    +  ===Speech input and output===
    +  
    +  The ``speak_aloud = sa`` command sends a string to the speech
    +  synthesizer 
    +  [Flite http://www.speech.cs.cmu.edu/flite/doc/].
    +  It is typically used via a pipe:
    +  ```  generate_random | linearize | speak_aloud
    +  The result is only satisfactory for English.
    +  
    +  The ``speech_input = si`` command receives a string from a
    +  speech recognizer that requires the installation of
    +  [ATK http://mi.eng.cam.ac.uk/~sjy/software.htm].
    +  It is typically used to pipe input to a parser:
    +  ```  speech_input -tr | parse
    +  The method works only for grammars of English.
    +  
    +  Both Flite and ATK are freely available through the links
    +  above, but they are not distributed together with GF.
    +  
    +  
    +  
    +  
    +  ===Multilingual syntax editor===
    +  
    +  The 
    +  [Editor User Manual http://www.cs.chalmers.se/~aarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm]
    +  describes the use of the editor, which works for any multilingual GF grammar.
    +  
    +  Here is a snapshot of the editor:
    +  
    +  [../quick-editor.gif]
    +   
    +  The grammars of the snapshot are from the
    +  [Letter grammar package http://www.cs.chalmers.se/~aarne/GF/examples/letter].
    +  
    +  
    +  
    +  ===Interactive Development Environment (IDE)===
    +  
    +  Forthcoming.
    +  
    +  
    +  ===Communicating with GF===
    +  
    +  Other processes can communicate with the GF command interpreter,
    +  and also with the GF syntax editor.
    +  
    +  
    +  ===Embedded grammars in Haskell, Java, and Prolog===
    +  
    +  GF grammars can be used as parts of programs written in the
    +  following languages. The links give more documentation.
    +  
    +  - [Java http://www.cs.chalmers.se/~bringert/gf/gf-java.html]
    +  - [Haskell http://www.cs.chalmers.se/~aarne/GF/src/GF/Embed/EmbedAPI.hs]
    +  - [Prolog http://www.cs.chalmers.se/~peb/software.html]
    +  
    +  
    +  ===Alternative input and output grammar formats===
    +  
    +  A summary is given in the following chart of GF grammar compiler phases:
    +  [../gf-compiler.png]
    +  
    +  
    +  ==Case studies==
    +  
    +  ===Interfacing formal and natural languages===
    +  
    +  [Formal and Informal Software Specifications http://www.cs.chalmers.se/~krijo/thesis/thesisA4.pdf],
    +  PhD Thesis by
    +  [Kristofer Johannisson http://www.cs.chalmers.se/~krijo], is an extensive example of this.
    +  The system is based on a multilingual grammar relating the formal language OCL with
    +  English and German.
    +  
    +  A simpler example will be explained here.
       
     
    -

    - -

    Efficiency of grammars

    -

    -Issues: -

    - - - -

    Speech input and output

    -

    -Thespeak_aloud = sa command sends a string to the speech -synthesizer -Flite. -It is typically used via a pipe: -

    -
    -   generate_random | linearize | speak_aloud
    -
    -

    -The result is only satisfactory for English. -

    -

    -The speech_input = si command receives a string from a -speech recognizer that requires the installation of -ATK. -It is typically used to pipe input to a parser: -

    -
    -   speech_input -tr | parse
    -
    -

    -The method words only for grammars of English. -

    -

    -Both Flite and ATK are freely available through the links -above, but they are not distributed together with GF. -

    - -

    Multilingual syntax editor

    -

    -The -Editor User Manual -describes the use of the editor, which works for any multilingual GF grammar. -

    -

    -Here is a snapshot of the editor: -

    -

    - -

    -

    -The grammars of the snapshot are from the -Letter grammar package. -

    - -

    Interactive Development Environment (IDE)

    -

    -Forthcoming. -

    - -

    Communicating with GF

    -

    -Other processes can communicate with the GF command interpreter, -and also with the GF syntax editor. -

    - -

    Embedded grammars in Haskell, Java, and Prolog

    -

    -GF grammars can be used as parts of programs written in the -following languages. The links give more documentation. -

    - - - -

    Alternative input and output grammar formats

    -

    -A summary is given in the following chart of GF grammar compiler phases: - -

    - -

    Case studies

    - -

    Interfacing formal and natural languages

    -

    -Formal and Informal Software Specifications, -PhD Thesis by -Kristofer Johannisson, is an extensive example of this. -The system is based on a multilingual grammar relating the formal language OCL with -English and German. -

    -

    -A simpler example will be explained here. -