diff --git a/doc/tutorial/gf-tutorial2.html b/doc/tutorial/gf-tutorial2.html index b1bd541ae..2ca7949cc 100644 --- a/doc/tutorial/gf-tutorial2.html +++ b/doc/tutorial/gf-tutorial2.html @@ -51,7 +51,7 @@ It will guide you

Getting the GF program

-The program is open-source free software, which you can download from the +The program is open-source free software, which you can download via the GF Homepage:
http://www.cs.chalmers.se/~aarne/GF @@ -290,8 +290,10 @@ and so on.

The labelled context-free format

The labelled context-free grammar format permits user-defined -labels to each rule. GF recognizes files of this format by the suffix -.cf. Let us include the following rules in the file +labels to each rule. +GF recognizes files of this format by the suffix +.cf. It is intermediate between EBNF and full GF format. +Let us include the following rules in the file paleolithic.cf.
   PredVP.  S   ::= NP VP ;
@@ -407,16 +409,20 @@ Rules in a GF grammar are called judgements, and the keywords
 judgement forms:
 
 We return to the precise meanings of these judgement forms later.
 First we will look at how judgements are grouped into modules, and
 show how the grammar paleolithic.cf is
@@ -436,10 +442,41 @@ module forms are
        abstract syntax A, with judgements in the module body M.
 
 
+
+
+

Record types, records, and Strs

+ +The linearization type of a category is a record type, with +zero or more fields of different types. The simplest record +type used for linearization in GF is +
+  {s : Str}
+
+which has one field, with label s and type Str. + +

+ +Examples of records of this type are +

+  {s = "foo"}
+  {s = "hello" ++ "world"}
+
+The type Str is really the type of token lists, but +most of the time one can conveniently think of it as the type of strings, +denoted by string literals in double quotes. + +

+ +Whenever a record r of type {s : Str} is given, +r.s is an object of type Str. This is of course +a special case of the projection rule, allowing the extraction +of fields from a record. + +

An abstract syntax example

-Each nonterminal occurring in paleolithic.cf is +Each nonterminal occurring in the grammar paleolithic.cf is introduced by a cat judgement. Each rule label is introduced by a fun judgement.
@@ -520,11 +557,11 @@ Import PaleolithicEng.gf and try what happens
 
The GF program does not only read the file PaleolithicEng.gf, but also all other files that it -depends on - in this case, Paleolithic.gf. +depends on - in this case, Paleolithic.gf.

-For each file that is compiles, a .gfc file +For each file that is compiled, a .gfc file is generated. The GFC format (="GF Canonical") is the "machine code" of GF, which is faster to process than GF source files. When reading a module, GF knows whether @@ -611,7 +648,7 @@ Translate by using a pipe:

Translation quiz

-This is a simple kind of language exercises that can be automatically +This is a simple language exercise that can be automatically generated from a multilingual grammar. The system generates a set of random sentences, displays them in one language, and checks the user's answer given in another language. The command translation_quiz = tq @@ -706,7 +743,7 @@ only do "one thing" each, e.g. fun Cep, Agaric : Mushroom ; }
-They can afterwards be combined in bigger grammars by using +They can afterwards be combined into bigger grammars by using multiple inheritance, i.e. extension of several grammars at the same time:
@@ -786,14 +823,14 @@ The introduction of plural forms requires two things:
 
 Different languages have different rules of inflection and agreement.
 For instance, Italian has also agreement in gender (masculine vs. feminine).
-We want to be able to ignore such differences in the abstract
-syntax.
+We want to express such special features of languages precisely in
+concrete syntax while ignoring them in abstract syntax.
 
 

-To be able to do all this, we need a couple of new judgement forms, -a new module form, and a more powerful way of expressing linearization -rules. +To be able to do all this, we need two new judgement forms, +a new module form, and a generalization of linearization types +from strings to more complex types. @@ -1018,7 +1055,7 @@ these forms are explained in the following section. The paradigms regNoun does not give the correct forms for all nouns. For instance, louse - lice and -fish - fish must be given by using mkNoun. +fish - fish must be given by using mkNoun. Also the word boy would be inflected incorrectly; to prevent this, either use mkNoun or modify regNoun so that the "y" case does not @@ -1165,7 +1202,7 @@ lin

Hierarchic parameter types

The reader familiar with a functional programming language such as -Haskell must have noticed the similarity +Haskell must have noticed the similarity between parameter types in GF and algebraic datatypes (data definitions in Haskell). The GF parameter types are actually a special case of algebraic datatypes: the main restriction is that in GF, these types must be finite. diff --git a/index.html b/index.html index d523c1a1f..dce5b4f1b 100644 --- a/index.html +++ b/index.html @@ -150,7 +150,7 @@ of a multimodal dialogue system built with embedded grammars.

-Resource grammar library: +Resource grammar library: basic structures of ten languages (Danish, English, Finnish, French, German, Italian, Norwegian, Russian, Spanish, Swedish). @@ -240,6 +240,10 @@ outdated). Language specification of the GF grammar formalism. +

  • + +Resource grammar library documentation. +
  • Highlights of Version 2.1 and 2.0 (in comparison with version 1.2). diff --git a/lib/resource/doc/Makefile b/lib/resource/doc/Makefile index 8e3c79697..af91c3a1b 100644 --- a/lib/resource/doc/Makefile +++ b/lib/resource/doc/Makefile @@ -31,7 +31,7 @@ gfdoc: gfdoc ../italian/BeschIta.gf ; mv ../italian/BeschIta.html . gfdoc ../spanish/ParadigmsSpa.gf ; mv ../spanish/ParadigmsSpa.html . -# gfdoc ../spanish/BasicSpa.gf ; mv ../spanish/BasicSpa.html . + gfdoc ../spanish/BasicSpa.gf ; mv ../spanish/BasicSpa.html . gfdoc ../spanish/BeschSpa.gf ; mv ../spanish/BeschSpa.html . gifs: api lang scand low diff --git a/lib/resource/doc/example/Animals.gf b/lib/resource/doc/example/Animals.gf index e6d084243..17e64c7b1 100644 --- a/lib/resource/doc/example/Animals.gf +++ b/lib/resource/doc/example/Animals.gf @@ -5,6 +5,6 @@ abstract Animals = Questions ** { fun -- a lexicon of animals and actions among them Dog, Cat, Mouse, Lion, Zebra : Entity ; - Chase, Eat, Like : Action ; + Chase, Eat, See : Action ; } diff --git a/lib/resource/doc/example/AnimalsEng.gf b/lib/resource/doc/example/AnimalsEng.gf index a25a95ae2..e8a9c474a 100644 --- a/lib/resource/doc/example/AnimalsEng.gf +++ b/lib/resource/doc/example/AnimalsEng.gf @@ -11,5 +11,5 @@ concrete AnimalsEng of Animals = QuestionsEng ** Zebra = regN "zebra" ; Chase = dirV2 (regV "chase") ; Eat = dirV2 eat_V ; - Like = dirV2 (regV "like") ; + See = dirV2 see_V ; } diff --git a/lib/resource/doc/example/AnimalsFre.gf b/lib/resource/doc/example/AnimalsFre.gf index 635f47208..d35de8243 100644 --- a/lib/resource/doc/example/AnimalsFre.gf +++ b/lib/resource/doc/example/AnimalsFre.gf @@ -11,5 +11,5 @@ concrete AnimalsFre of Animals = QuestionsFre ** Zebra = regN "zèbre" masculine ; Chase = dirV2 (regV "chasser") ; Eat = dirV2 (regV "manger") ; - Like = dirV2 (regV "aimer") ; + See = voir_V2 ; } diff --git a/lib/resource/doc/example/AnimalsSwe.gf b/lib/resource/doc/example/AnimalsSwe.gf index f7aaa0cd6..acd839317 100644 --- 
a/lib/resource/doc/example/AnimalsSwe.gf +++ b/lib/resource/doc/example/AnimalsSwe.gf @@ -11,5 +11,5 @@ concrete AnimalsSwe of Animals = QuestionsSwe ** Zebra = regN "zebra" utrum ; Chase = dirV2 (regV "jaga") ; Eat = dirV2 äta_V ; - Like = mkV2 (mk2V "tycka" "tycker") "om" ; + See = dirV2 se_V ; } diff --git a/lib/resource/doc/example/QuestionsI.gf b/lib/resource/doc/example/QuestionsI.gf index 7dda8ca8c..9915cc8d5 100644 --- a/lib/resource/doc/example/QuestionsI.gf +++ b/lib/resource/doc/example/QuestionsI.gf @@ -1,6 +1,6 @@ --# -path=.:resource/abstract:resource/../prelude --- Language-independent question grammar parametwized on Resource. +-- Language-independent question grammar parametrized on Resource. incomplete concrete QuestionsI of Questions = open Resource in { lincat diff --git a/lib/resource/doc/example/mkAnimals.gfcm b/lib/resource/doc/example/mkAnimals.gfs similarity index 100% rename from lib/resource/doc/example/mkAnimals.gfcm rename to lib/resource/doc/example/mkAnimals.gfs diff --git a/lib/resource/doc/gf-resource.html b/lib/resource/doc/gf-resource.html index 902cc5432..795f333f6 100644 --- a/lib/resource/doc/gf-resource.html +++ b/lib/resource/doc/gf-resource.html @@ -9,9 +9,11 @@

    -Second Version, Gothenburg, 1 March 2005 +Third Version, 22 May 2005
    -First Draft, Gothenburg, 7 February 2005 +Second Version, 1 March 2005 +
    +First Draft, 7 February 2005

    @@ -31,7 +33,8 @@ A grammar formalism based on functional programming and type theory.

    -Designed to be nice for ordinary programmers to use. +Designed to be nice for ordinary programmers to use: by this +we mean programmers without training in linguistics.

    @@ -47,6 +50,7 @@ Thus not primarily another theoretical framework for linguists. +

    Multilingual grammars

    @@ -90,6 +94,7 @@ wenn 2 ist gerade, dann 2+2 ist gerade
    om 2 är jämnt, 2+2 är jämnt
    +

    Solving the difficulties

    @@ -197,17 +202,15 @@ Where do we get the data from?
  • automatic extraction or hand-writing?
  • reuse of existing resources? - -

    - -Extra constraint: we want open-source free software. - +
    +Extra constraint: we want open-source free software and +hence cannot use existing proprietary resources. -

    The scope of the resource grammar library

    +

    The scope of a resource grammar library for a language

    All morphological paradigms @@ -228,6 +231,7 @@ Currently,
    +

    Success criteria

    @@ -251,24 +255,33 @@ families, using the module system of GF.

    These are not our success criteria

    -Language coverage: you can parse all expressions. Example: +Language coverage: to be able to parse all expressions. +
    +Example: the French passé simple tense, although covered by the -morhology, is not used in the language-independent API, but -only the passé composé is. +morphology, is not used in the language-independent API, but +only the passé composé is. However, an application +accessing the French-specific (or Romance-specific) +modules can use the passé simple.

    -Semantic correctness +Semantic correctness: only to produce meaningful expressions. +
    +Example: the following sentences can be generated

       colourless green ideas sleep furiously
     
       the time is seventy past forty-two
     
    +However, an application grammar can use a domain-specific +semantics to guarantee semantic well-formedness.

    (Warning for linguists:) theoretical innovation in -syntax (and it will all be hidden anyway!) +syntax is not among the goals +(and it would be hidden from users anyway!). @@ -334,6 +347,7 @@ The current GF Resource Project covers ten languages: The first three letters (Dan etc) are used in grammar module names +

    Library structure 1: language-independent API

    @@ -351,6 +365,7 @@ conjunctions, pronouns), e.g. and_Conj : Conj ;
  • +

    Library structure 2: language-dependent modules

    @@ -477,6 +492,8 @@ Alternative views on sentence formation: Spanish paradigms
    +example use of Spanish paradigms +
    Spanish verb conjugations

    @@ -491,7 +508,7 @@ Alternative views on sentence formation:

    Use as top-level grammar: testing

    -Import a set of $LangX$ grammars: +Import a set of LangX grammars:
       i english/LangEng.gf
       i swedish/LangSwe.gf
    @@ -532,11 +549,14 @@ Import directly by open:
     
       concrete AppNor of App = open LangNor, ParadigmsNor in {...}
     
    -No more dummy reuse modules and bulky .gfr files! +(Note for the users of GF 2.1 and older: +the dummy reuse modules and their bulky .gfr versions +are no longer needed!)

    -If you need to convert resource category records to/from strings, use +If you need to convert resource records to strings, and don't want to know +the concrete type (as you never should), you can use

       Predef.toStr : (L : Type) -> L -> Str ; 
     
    @@ -548,65 +568,99 @@ If you need to convert resource category records to/from strings, use +

    Use as library through parser

    -Use the parser when developing a resource. +You can use the parser with a LangX grammar +when developing a resource. + +

    + +Using the -v option shows if the parser fails because +of unknown words.

       > p -cat=S -v "jag ska åka till Chalmers"
       unknown tokens [TS "åka",TS "Chalmers"]
    -
    +
    +Then try to select words that LangX recognizes: +
       > p -cat=S "jag ska gå till Danmark"
       UseCl (PosTP TFuture ASimul)
         (AdvCl (SPredV i_NP go_V)
         (AdvPP (PrepNP to_Prep (UsePN (PNCountry Denmark)))))
     
    -Extend vocabulary at need. +Use these API structures and extend vocabulary to match your need.
       åka_V = lexV "åker" ; 
       Chalmers = regPN "Chalmers" neutrum ;
     
    + +

    Syntax editor as library browser

    + +You can run the syntax editor on LangX to +find resource API functions through context-sensitive menus. +For instance, the shell command +
    +  jgf LangEng.gf LangFre.gf
    +
    +opens the editor with English and French views. The + +Editor User Manual gives more information on the use of the editor. + +

    + +A restriction of the editor is that it does not give access to +ParadigmsX modules. An IDE environment extending the editor +to a grammar programming tool is work in progress. + + +

    Example application: a small translation system

    -You can say things like the following: +In this system, you can express questions and answers of +the following forms:
    -  who chases mice ?
    -  whom does the lion chase ?
    -  the dog chases cats
    +  Who chases mice ?
    +  Whom does the lion chase ?
    +  The dog chases cats.
     
    -Source modules: +We build the abstract syntax in two phases: +

    -Abstract syntax: -Questions, -Animals +The concrete syntax of English is built in three phases: +

    -Concrete syntax of questions parametrized on the resource API: -QuestionsI +The concrete syntax of Swedish is built upon QuestionsI +in a similar way, with the modules +QuestionsSwe and +AnimalsSwe.

    -English concrete syntax: -QuestionsEng, -AnimalsEng +The concrete syntax of French consists similarly of the modules +QuestionsFre and +AnimalsFre. -

    - -French concrete syntax: -QuestionsFre, -AnimalsFre - -

    - -Swedish concrete syntax: -QuestionsSwe, -AnimalsSwe @@ -635,27 +689,13 @@ and you get an end-user grammar animals.gfcm. You can also write the commands in a gfs (GF script) file, say -mkAnimals.gfs, +mkAnimals.gfs, and then call GF with

       gf <mkAnimals.gfs
     
    - -

    Further simplifications of the application grammar

    - -Step 1: use a simplified access to present-tense sentences, -SentenceX (to be written...) - -

    - -Step 2: factor out the categories and purely combinational -rules into an incomplete module (to be shown... but -this does not work for French, which uses different structures: -e.g. Qui aime les lions ? with a definite phrase -where English has Who loves lions? -

    Implementation details: the structure of low-level files

    @@ -678,6 +718,7 @@ In two language families: +

    Current status

    @@ -701,6 +742,7 @@ X = implemented (few exceptions may occur) - = not implemented +

    Known bugs and limitations

    @@ -737,10 +779,11 @@ some verbs in Basic should be reflexive Swedish +

    Obtaining it

    -Get the grammar package atDownload from +Get the grammar package from GF Download Page. The current libraries are in lib/resource. Version 0.6 is in diff --git a/lib/resource/french/StructuralFre.gf b/lib/resource/french/StructuralFre.gf index 2665b3607..4016d7c03 100644 --- a/lib/resource/french/StructuralFre.gf +++ b/lib/resource/french/StructuralFre.gf @@ -6,7 +6,7 @@ concrete StructuralFre of Structural = lin - UseNumeral n = {s = \\g => n.s !g ; n = n.n} ; + UseNumeral n = {s = \\g => n.s !g ; n = n.n ; isNo = False} ; above_Prep = {s = ["au dessus"] ; c = genitive} ; after_Prep = justPrep "après" ; diff --git a/lib/resource/italian/StructuralIta.gf b/lib/resource/italian/StructuralIta.gf index c37abbf21..1eec02262 100644 --- a/lib/resource/italian/StructuralIta.gf +++ b/lib/resource/italian/StructuralIta.gf @@ -5,7 +5,7 @@ concrete StructuralIta of Structural = CategoriesIta, NumeralsIta ** lin - UseNumeral n = {s = \\g => n.s !g ; n = n.n} ; + UseNumeral n = {s = \\g => n.s !g ; n = n.n ; isNo = False} ; above_Prep = justPrep "sopra" ; after_Prep = justPrep "dopo" ; diff --git a/lib/resource/romance/CategoriesRomance.gf b/lib/resource/romance/CategoriesRomance.gf index abeb751d2..c5b063091 100644 --- a/lib/resource/romance/CategoriesRomance.gf +++ b/lib/resource/romance/CategoriesRomance.gf @@ -40,7 +40,7 @@ lincat -- = CommNoun ** {s2 : Preposition ; c : CaseA} ; N3 = Function ** {s3 : Preposition ; c3 : CaseA} ; Prep = {s : Preposition ; c : CaseA} ; - Num = {s : Gender => Str ; n : Number} ; + Num = {s : Gender => Str ; n : Number ; isNo : Bool} ; A = Adjective ; -- = {s : AForm => Str ; p : Bool} ; diff --git a/lib/resource/romance/RulesRomance.gf b/lib/resource/romance/RulesRomance.gf index 19abcc3a9..886cf4b20 100644 --- a/lib/resource/romance/RulesRomance.gf +++ b/lib/resource/romance/RulesRomance.gf @@ -35,7 +35,7 @@ lin ModGenOne = npGenDet singular ; ModGenNum = npGenDetNum ; - UseInt i = {s = \\_ => i.s ; n = Pl} ; ---- n + UseInt i = {s = 
\\_ => i.s ; n = Pl ; isNo = False} ; ---- n NoNum = noNum ; UseA = adj2adjPhrase ; diff --git a/lib/resource/romance/SyntaxRomance.gf b/lib/resource/romance/SyntaxRomance.gf index 6e4778264..2b87e8cf7 100644 --- a/lib/resource/romance/SyntaxRomance.gf +++ b/lib/resource/romance/SyntaxRomance.gf @@ -60,9 +60,10 @@ oper pronNounPhrase : Pronoun -> NounPhrase = \pro -> pro ; -- Many determiners can be modified with numerals, which may be inflected in --- gender. +-- gender. The label $isNo$ is a hack used to force $des$ for plural +-- indefinite with $noNum$. - Numeral : Type = {s : Gender => Str ; n : Number} ; + Numeral : Type = {s : Gender => Str ; n : Number ; isNo : Bool} ; pronWithNum : Pronoun -> Numeral -> Pronoun = \nous,deux -> {s = \\c => nous.s ! c ++ deux.s ! pgen2gen nous.g ; @@ -72,7 +73,7 @@ oper c = nous.c } ; - noNum : Numeral = {s = \\_ => [] ; n = Pl} ; + noNum : Numeral = {s = \\_ => [] ; n = Pl ; isNo = True} ; -- The existence construction "il y a", "c'è / ci sono" is defined separately, -- and ad hoc, in each language. @@ -138,7 +139,11 @@ oper indefNounPhraseNum : Numeral -> CommNounPhrase -> NounPhrase = \nu,mec -> normalNounPhrase - (\\c => prepCase c ++ nu.s ! mec.g ++ mec.s ! nu.n) + (\\c => case nu.isNo of { + True => artIndef mec.g Pl c ++ mec.s ! Pl ; + _ => prepCase c ++ nu.s ! mec.g ++ mec.s ! nu.n + } + ) mec.g nu.n ; diff --git a/lib/resource/spanish/StructuralSpa.gf b/lib/resource/spanish/StructuralSpa.gf index 0dee0ec74..afd6b0e94 100644 --- a/lib/resource/spanish/StructuralSpa.gf +++ b/lib/resource/spanish/StructuralSpa.gf @@ -6,7 +6,7 @@ concrete StructuralSpa of Structural = CategoriesSpa, NumeralsSpa ** lin - UseNumeral n = {s = \\g => n.s !g ; n = n.n} ; + UseNumeral n = {s = \\g => n.s !g ; n = n.n ; isNo = False} ; above_Prep = justPrep "sobre" ; after_Prep = {s = "después" ; c = genitive} ;