improved resource doc

2026-07-14 01:22:46 -06:00 · 2005-05-22 18:43:00 +00:00
parent 60427e170c
commit e451bc03ba
16 changed files with 188 additions and 99 deletions
@@ -51,7 +51,7 @@ It will guide you
 <!-- NEW -->
 <h3>Getting the GF program</h3>

-The program is open-source free software, which you can download from the
+The program is open-source free software, which you can download via the
 GF Homepage:<br>
 <a href="http://www.cs.chalmers.se/%7Eaarne/GF">
 <tt>http://www.cs.chalmers.se/~aarne/GF</tt></a>
@@ -290,8 +290,10 @@ and so on.
 <h4>The labelled context-free format</h4>

 The <b>labelled context-free grammar</b> format permits user-defined
-labels to each rule. GF recognizes files of this format by the suffix
-<tt>.cf</tt>. Let us include the following rules in the file
+labels to each rule.
+GF recognizes files of this format by the suffix
+<tt>.cf</tt>. It is intermediate between EBNF and full GF format.
+Let us include the following rules in the file
 <tt>paleolithic.cf</tt>.
 <pre>
  PredVP.  S   ::= NP VP ;
@@ -407,16 +409,20 @@ Rules in a GF grammar are called <b>judgements</b>, and the keywords
 judgement forms:
 <ul>
  <li> abstract syntax
-  <ul>
-    <li> cat C
-    <li> fun f : A
-  </ul>
+  <p>
+  <table>
+    <tr> <td>form </td><td>reading        </td></tr>
+    <tr> <td><tt>cat</tt> C</td><td>C is a category</td></tr>
+    <tr> <td><tt>fun</tt> f <tt>:</tt> A</td><td>f is a function of type A</td></tr>
+  </table>
  <li> concrete syntax
-  <ul>
-    <li> lincat C = T
-    <li> lin f x ... y = t
+  <p>
+    <table>
+    <tr> <td>form </td><td>reading        </td></tr>
+    <tr> <td><tt>lincat</tt> C <tt>=</tt> T</td><td>category C has linearization type T</td></tr>
+    <tr> <td><tt>lin</tt> f <tt>=</tt> t</td><td>function f has linearization t</td></tr>
+  </table>
  </ul>
-</ul>
 We return to the precise meanings of these judgement forms later.
 First we will look at how judgements are grouped into modules, and
 show how the grammar <tt>paleolithic.cf</tt> is
@@ -436,10 +442,41 @@ module forms are
       abstract syntax A, with judgements in the module body M.
 </ul>

+
+<!-- NEW -->
+<h4>Record types, records, and <tt>Str</tt>s</h4>
+
+The linearization type of a category is a <b>record type</b>, with
+zero or more <b>fields</b> of different types. The simplest record
+type used for linearization in GF is
+<pre>
+  {s : Str}
+</pre>
+which has one field, with <b>label</b> <tt>s</tt> and type <tt>Str</tt>.
+
+<p>
+
+Examples of records of this type are
+<pre>
+  {s = "foo"}
+  {s = "hello" ++ "world"}
+</pre>
+The type <tt>Str</tt> is really the type of <b>token lists</b>, but
+most of the time one can conveniently think of it as the type of strings,
+denoted by string literals in double quotes.
+
+<p>
+
+Whenever a record <tt>r</tt> of type <tt>{s : Str}</tt> is given,
+<tt>r.s</tt> is an object of type <tt>Str</tt>. This is of course
+a special case of the <b>projection</b> rule, allowing the extraction
+of fields from a record.
+
+
 <!-- NEW -->
 <h4>An abstract syntax example</h4>

-Each nonterminal occurring in <tt>paleolithic.cf</tt> is
+Each nonterminal occurring in the grammar <tt>paleolithic.cf</tt> is
 introduced by a <tt>cat</tt> judgement. Each
 rule label is introduced by a <tt>fun</tt> judgement.
 <pre>
@@ -520,11 +557,11 @@ Import <tt>PaleolithicEng.gf</tt> and try what happens
 </pre>
 The GF program does not only read the file 
 <tt>PaleolithicEng.gf</tt>, but also all other files that it
-depends on - in this case,  <tt>Paleolithic.gf</tt>.
+depends on - in this case, <tt>Paleolithic.gf</tt>.

 <p>

-For each file that is compiles, a <tt>.gfc</tt> file
+For each file that is compiled, a <tt>.gfc</tt> file
 is generated. The GFC format (="GF Canonical") is the
 "machine code" of GF, which is faster to process than
 GF source files. When reading a module, GF knows whether
@@ -611,7 +648,7 @@ Translate by using a pipe:
 <!-- NEW -->
 <h4>Translation quiz</h4>

-This is a simple kind of language exercises that can be automatically
+This is a simple language exercise that can be automatically
 generated from a multilingual grammar. The system generates a set of
 random sentence, displays them in one language, and checks the user's
 answer given in another language. The command <tt>translation_quiz = tq</tt>
@@ -706,7 +743,7 @@ only do "one thing" each, e.g.
    fun Cep, Agaric : Mushroom ;
  }
 </pre>
-They can afterwards be combined in bigger grammars by using
+They can afterwards be combined into bigger grammars by using
 <b>multiple inheritance</b>, i.e. extension of several grammars at the
 same time:
 <pre>
@@ -786,14 +823,14 @@ The introduction of plural forms requires two things:
 </ul>
 Different languages have different rules of inflection and agreement.
 For instance, Italian has also agreement in gender (masculine vs. feminine).
-We want to be able to ignore such differences in the abstract
-syntax.
+We want to express such special features of languages precisely in
+concrete syntax while ignoring them in abstract syntax.

 <p>

-To be able to do all this, we need a couple of new judgement forms,
-a new module form, and a more powerful way of expressing linearization
-rules.
+To be able to do all this, we need two new judgement forms,
+a new module form, and a generalization of linearization types
+from strings to more complex types.


 <!-- NEW -->
@@ -1018,7 +1055,7 @@ these forms are explained in the following section.

 The paradigms <tt>regNoun</tt> does not give the correct forms for
 all nouns. For instance, <i>louse - lice</i> and
-<i>fish - fish</i> must be given by using <tt>mkNoun</i>.
+<i>fish - fish</i> must be given by using <tt>mkNoun</tt>.
 Also the word <i>boy</i> would be inflected incorrectly; to prevent
 this, either use <tt>mkNoun</tt> or modify 
 <tt>regNoun</tt> so that the <tt>"y"</tt> case does not
@@ -1165,7 +1202,7 @@ lin
 <h4>Hierarchic parameter types</h4>

 The reader familiar with a functional programming language such as
-<a href="www.haskell.org">Haskell</a> must have noticed the similarity
+<a href="http://www.haskell.org">Haskell</a> must have noticed the similarity
 between parameter types in GF and algebraic datatypes (<tt>data</tt> definitions
 in Haskell). The GF parameter types are actually a special case of algebraic
 datatypes: the main restriction is that in GF, these types must be finite.
@@ -150,7 +150,7 @@ of a multimodal dialogue system built with embedded grammars.

 <p>

-<a href="lib/resource/doc/gf-resource.html">Resource grammar library</a>: 
+<a href="lib/resource/doc/01-gf-resource.html">Resource grammar library</a>: 
 basic structures of ten languages
 (Danish, English, Finnish, French, German,
 Italian, Norwegian, Russian, Spanish, Swedish).
@@ -240,6 +240,10 @@ outdated).
 <a href="doc/DocGF.pdf">
 Language specification</a> of the GF grammar formalism.

+</li><li>
+<a href="lib/resource/doc/01-gf-resource.html">
+Resource grammar library documentation</a>.
+
 </li><li>
 <a href="../GF2.0/doc/gf2-highlights.html">
 Highlights</a> of Version 2.1 and 2.0 (in comparison with version 1.2).
@@ -31,7 +31,7 @@ gfdoc:
 	gfdoc ../italian/BeschIta.gf ; mv ../italian/BeschIta.html .

 	gfdoc ../spanish/ParadigmsSpa.gf ; mv ../spanish/ParadigmsSpa.html .
-#	gfdoc ../spanish/BasicSpa.gf ; mv ../spanish/BasicSpa.html .
+	gfdoc ../spanish/BasicSpa.gf ; mv ../spanish/BasicSpa.html .
 	gfdoc ../spanish/BeschSpa.gf ; mv ../spanish/BeschSpa.html .

 gifs: api lang scand low
@@ -5,6 +5,6 @@ abstract Animals = Questions ** {
  fun
    -- a lexicon of animals and actions among them
    Dog, Cat, Mouse, Lion, Zebra : Entity ;
-    Chase, Eat, Like : Action ;
+    Chase, Eat, See : Action ;
 }

@@ -11,5 +11,5 @@ concrete AnimalsEng of Animals = QuestionsEng **
    Zebra = regN "zebra" ;
    Chase = dirV2 (regV "chase") ;
    Eat = dirV2 eat_V ;
-    Like = dirV2 (regV "like") ;
+    See = dirV2 see_V ;
 }
@@ -11,5 +11,5 @@ concrete AnimalsFre of Animals = QuestionsFre **
    Zebra = regN "zèbre" masculine ;
    Chase = dirV2 (regV "chasser") ;
    Eat   = dirV2 (regV "manger") ;
-    Like  = dirV2 (regV "aimer") ;
+    See   = voir_V2 ;
 }
@@ -11,5 +11,5 @@ concrete AnimalsSwe of Animals = QuestionsSwe **
    Zebra = regN "zebra" utrum ;
    Chase = dirV2 (regV "jaga") ;
    Eat = dirV2 äta_V ;
-    Like = mkV2 (mk2V "tycka" "tycker") "om" ;
+    See = dirV2 se_V ;
 }
@@ -1,6 +1,6 @@
 --# -path=.:resource/abstract:resource/../prelude

-- Language-independent question grammar parametwized on Resource.
+-- Language-independent question grammar parametrized on Resource.

 incomplete concrete QuestionsI of Questions = open Resource in {
  lincat
@@ -9,9 +9,11 @@

 <p>

-Second Version, Gothenburg, 1 March 2005
+Third Version, 22 May 2005
 <br>
-First Draft, Gothenburg, 7 February 2005
+Second Version, 1 March 2005
+<br>
+First Draft, 7 February 2005

 </p><p>

@@ -31,7 +33,8 @@ A grammar formalism based on functional programming and type theory.

 <p>

-Designed to be nice for <i>ordinary programmers</i> to use.
+Designed to be nice for <i>ordinary programmers</i> to use: by this
+we mean programmers without training in linguistics.

 <p>

@@ -47,6 +50,7 @@ Thus <i>not</i> primarily another theoretical framework for
 linguists.


+
 <!-- NEW -->
 <h2>Multilingual grammars</h2>

@@ -90,6 +94,7 @@ wenn 2 ist gerade, dann 2+2 ist gerade<br>
 om 2 är jämnt, 2+2 är jämnt<br>
 </i>

+
 <!-- NEW -->
 <h2>Solving the difficulties</h2>

@@ -197,17 +202,15 @@ Where do we get the data from?
 <li> automatic extraction or hand-writing?
 <br>
 <li> reuse of existing resources?
-
-<p>
-
-Extra constraint: we want open-source free software.
-
+<br>
+Extra constraint: we want open-source free software and
+hence cannot use existing proprietary resources.




 <!-- NEW -->
-<h2>The scope of the resource grammar library</h2>
+<h2>The scope of a resource grammar library for a language</h2>

 All morphological paradigms

@@ -228,6 +231,7 @@ Currently,<br>



+
 <!-- NEW -->
 <h2>Success criteria</h2>

@@ -251,24 +255,33 @@ families, using the module system of GF.
 <!-- NEW -->
 <h2>These are not our success criteria</h2>

-Language coverage: you can parse all expressions. Example:
+Language coverage: to be able to parse all expressions.
+<br>
+Example:
 the French <i>passé simple</i> tense, although covered by the
-morhology, is not used in the language-independent API, but
-only the <i>passé composé</i> is.
+morphology, is not used in the language-independent API, but
+only the <i>passé composé</i> is. However, an application
+accessing the French-specific (or Romance-specific)
+modules can use the passé simple.

 <p>

-Semantic correctness
+Semantic correctness: only to produce meaningful expressions.
+<br>
+Example: the following sentences can be generated
 <pre>
  colourless green ideas sleep furiously

  the time is seventy past forty-two
 </pre>
+However, an application grammar can use a domain-specific
+semantics to guarantee semantic well-formedness.

 <p>

 (Warning for linguists:) theoretical innovation in
-syntax (and it will all be hidden anyway!)
+syntax is not among the goals
+(and it would be hidden from users anyway!).



@@ -334,6 +347,7 @@ The current GF Resource Project covers ten languages:
 The first three letters (<tt>Dan</tt> etc) are used in grammar module names


+
 <!-- NEW -->
 <h2>Library structure 1: language-independent API</h2>

@@ -351,6 +365,7 @@ conjunctions, pronouns), e.g.
  and_Conj : Conj ;
 </pre>

+
 <!-- NEW -->
 <h2>Library structure 2: language-dependent modules</h2>

@@ -477,6 +492,8 @@ Alternative views on sentence formation:

 <a href="ParadigmsSpa.html">Spanish paradigms</a>
 <br>
+<a href="BasicSpa.html">example use of Spanish paradigms</a>
+<br>
 <a href="BeschSpa.html">Spanish verb conjugations</a>
 <p>

@@ -491,7 +508,7 @@ Alternative views on sentence formation:
 <!-- NEW -->
 <h2>Use as top-level grammar: testing</h2>

-Import a set of $LangX$ grammars:
+Import a set of <tt>LangX</tt> grammars:
 <pre>
  i english/LangEng.gf
  i swedish/LangSwe.gf
@@ -532,11 +549,14 @@ Import directly by <tt>open</tt>:
 <pre>
  concrete AppNor of App = open LangNor, ParadigmsNor in {...}
 </pre>
-No more dummy <tt>reuse</tt> modules and bulky <tt>.gfr</tt> files!
+(Note for the users of GF 2.1 and older:
+the dummy <tt>reuse</tt> modules and their bulky <tt>.gfr</tt> versions
+are no longer needed!)

 <p>

-If you need to convert resource category records to/from strings, use
+If you need to convert resource records to strings, and don't want to know
+the concrete type (as you never should), you can use
 <pre>
  Predef.toStr : (L : Type) -> L -> Str ; 
 </pre>
@@ -548,65 +568,99 @@ If you need to convert resource category records to/from strings, use



+
 <!-- NEW -->
 <h2>Use as library through parser</h2>

-Use the parser when developing a resource.
+You can use the parser with a <tt>LangX</tt> grammar
+when developing a resource.
+
+<p>
+
+Using the <tt>-v</tt> option shows if the parser fails because
+of unknown words.
 <pre>
  > p -cat=S -v "jag ska åka till Chalmers"
  unknown tokens [TS "åka",TS "Chalmers"]
-
+</pre>
+Then try to select words that <tt>LangX</tt> recognizes:
+<pre>
  > p -cat=S "jag ska gå till Danmark"
  UseCl (PosTP TFuture ASimul)
    (AdvCl (SPredV i_NP go_V)
    (AdvPP (PrepNP to_Prep (UsePN (PNCountry Denmark)))))
 </pre>
-Extend vocabulary at need.
+Use these API structures and extend the vocabulary as needed.
 <pre>
  åka_V = lexV "åker" ; 
  Chalmers = regPN "Chalmers" neutrum ;
 </pre>

+<!-- NEW -->
+<h2>Syntax editor as library browser</h2>
+
+You can run the syntax editor on <tt>LangX</tt> to
+find resource API functions through context-sensitive menus.
+For instance, the shell command
+<pre>
+  jgf LangEng.gf LangFre.gf
+</pre>
+opens the editor with English and French views. The
+<a href="http://www.cs.chalmers.se/%7Eaarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm">
+Editor User Manual</a> gives more information on the use of the editor.
+
+<p>
+
+A restriction of the editor is that it does not give access to
+<tt>ParadigmsX</tt> modules. An IDE extending the editor
+to a grammar programming tool is work in progress.
+
+
+

 <!-- NEW -->
 <h2>Example application: a small translation system</h2>

-You can say things like the following:
+In this system, you can express questions and answers of
+the following forms:
 <pre>
-  who chases mice ?
-  whom does the lion chase ?
-  the dog chases cats
+  Who chases mice ?
+  Whom does the lion chase ?
+  The dog chases cats.
 </pre>
-Source modules:
+We build the abstract syntax in two phases:
+<ul>
+<li> <a href=example/Questions.gf>Questions</a> defines question and
+  answer forms independently of domain
+<li> <a href=example/Animals.gf>Animals</a> defines a lexicon with
+  animals and things that animals do.
+</ul>

 <p>

-Abstract syntax:
-<a href=example/Questions.gf>Questions</a>,
-<a href=example/Animals.gf>Animals</a>
+The concrete syntax of English is built in three phases:
+<ul>
+<li> <a href="example/QuestionsI.gf">QuestionsI</a> is a parametrized module
+  using the API module <tt>Resource</tt>.
+<li> <a href="example/QuestionsEng.gf">QuestionsEng</a> is an instantiation
+  of the API with <tt>ResourceEng</tt>.
+<li> <a href="example/AnimalsEng.gf">AnimalsEng</a> is a concrete syntax
+  of <tt>Animals</tt> using <tt>ParadigmsEng</tt> and <tt>VerbsEng</tt>.
+</ul>

 <p>

-Concrete syntax of questions parametrized on the resource API:
-<a href=example/QuestionsI.gf>QuestionsI</a>
+The concrete syntax of Swedish is built upon <tt>QuestionsI</tt>
+in a similar way, with the modules
+<a href=example/QuestionsSwe.gf>QuestionsSwe</a> and
+<a href=example/AnimalsSwe.gf>AnimalsSwe</a>.

 <p>

-English concrete syntax:
-<a href=example/QuestionsEng.gf>QuestionsEng</a>,
-<a href=example/AnimalsEng.gf>AnimalsEng</a>
+The concrete syntax of French consists similarly of the modules
+<a href=example/QuestionsFre.gf>QuestionsFre</a> and
+<a href=example/AnimalsFre.gf>AnimalsFre</a>.

-<p>
-
-French concrete syntax:
-<a href=example/QuestionsFre.gf>QuestionsFre</a>,
-<a href=example/AnimalsFre.gf>AnimalsFre</a>
-
-<p>
-
-Swedish concrete syntax:
-<a href=example/QuestionsSwe.gf>QuestionsSwe</a>,
-<a href=example/AnimalsSwe.gf>AnimalsSwe</a>



@@ -635,27 +689,13 @@ and you get an end-user grammar <tt>animals.gfcm</tt>.

 You can also write the commands in a <tt>gfs</tt> (<b>GF script</b>)
 file, say
-<a href=mkAnimals.gfc><tt>mkAnimals.gfs</tt></a>,
+<a href="example/mkAnimals.gfs"><tt>mkAnimals.gfs</tt></a>,
 and then call GF with
 <pre>
  gf &lt;mkAnimals.gfs
 </pre>


-<!-- NEW -->
-<h2>Further simplifications of the application grammar</h2>
-
-Step 1: use a simplified access to present-tense sentences,
-<tt>SentenceX</tt> (to be written...)
-
-<p>
-
-Step 2: factor out the categories and purely combinational
-rules into an <tt>incomplete</tt> module (to be shown... but
-this does not work for French, which uses different structures:
-e.g. <i>Qui aime les lions ?</i> with a definite phrase
-where English has <i>Who loves lions?</i>
-

 <!-- NEW -->
 <h2>Implementation details: the structure of low-level files</h2>
@@ -678,6 +718,7 @@ In two language families:
 </center>


+
 <!-- NEW -->
 <h2>Current status</h2>

@@ -701,6 +742,7 @@ X = implemented (few exceptions may occur)
 - = not implemented


+
 <!-- NEW -->
 <h2>Known bugs and limitations</h2>

@@ -737,10 +779,11 @@ some verbs in Basic should be reflexive
 Swedish      


+
 <!-- NEW -->
 <h2>Obtaining it</h2>

-Get the grammar package atDownload from
+Get the grammar package from
 <a href="http://sourceforge.net/project/showfiles.php?group_id=132285">
 GF Download Page</a>. The current libraries are in
 <tt>lib/resource</tt>. Version 0.6 is in
@@ -6,7 +6,7 @@ concrete StructuralFre of Structural =

 lin

-  UseNumeral n = {s = \\g => n.s !g ; n = n.n} ;
+  UseNumeral n = {s = \\g => n.s !g ; n = n.n ; isNo = False} ;

  above_Prep = {s = ["au dessus"] ; c = genitive} ;
  after_Prep = justPrep "après" ;
@@ -5,7 +5,7 @@ concrete StructuralIta of Structural = CategoriesIta, NumeralsIta **

 lin

-  UseNumeral n = {s = \\g => n.s !g ; n = n.n} ;
+  UseNumeral n = {s = \\g => n.s !g ; n = n.n ; isNo = False} ;

  above_Prep = justPrep "sopra" ;
  after_Prep = justPrep "dopo" ;
@@ -40,7 +40,7 @@ lincat
      -- = CommNoun ** {s2 : Preposition ; c : CaseA} ;
  N3     = Function ** {s3 : Preposition ; c3 : CaseA} ;
  Prep   = {s : Preposition ; c : CaseA} ; 
-  Num    = {s : Gender => Str ; n : Number} ;
+  Num    = {s : Gender => Str ; n : Number ; isNo : Bool} ;

  A      = Adjective ;
      -- = {s : AForm => Str ; p : Bool} ;
@@ -35,7 +35,7 @@ lin
  ModGenOne = npGenDet singular ;
  ModGenNum = npGenDetNum ;

-  UseInt i = {s = \\_ => i.s ; n = Pl} ; ---- n
+  UseInt i = {s = \\_ => i.s ; n = Pl ; isNo = False} ; ---- n
  NoNum = noNum ;

  UseA = adj2adjPhrase ;
@@ -60,9 +60,10 @@ oper
  pronNounPhrase : Pronoun -> NounPhrase = \pro -> pro ;

 -- Many determiners can be modified with numerals, which may be inflected in
-- gender.
+-- gender. The label $isNo$ is a hack used to force $des$ for plural
+-- indefinite with $noNum$.

-  Numeral : Type = {s : Gender => Str ; n : Number} ;
+  Numeral : Type = {s : Gender => Str ; n : Number ; isNo : Bool} ;

  pronWithNum : Pronoun -> Numeral -> Pronoun = \nous,deux ->
    {s = \\c => nous.s ! c ++ deux.s ! pgen2gen nous.g ; 
@@ -72,7 +73,7 @@ oper
     c = nous.c
    } ;

-  noNum : Numeral = {s = \\_ => [] ; n = Pl} ;
+  noNum : Numeral = {s = \\_ => [] ; n = Pl ; isNo = True} ;

 -- The existence construction "il y a", "c'è / ci sono" is defined separately,
 -- and ad hoc, in each language.
@@ -138,7 +139,11 @@ oper

  indefNounPhraseNum : Numeral -> CommNounPhrase -> NounPhrase = \nu,mec -> 
    normalNounPhrase 
-      (\\c => prepCase c ++ nu.s ! mec.g ++ mec.s ! nu.n)
+      (\\c => case nu.isNo of {
+                True => artIndef mec.g Pl c ++ mec.s ! Pl ;
+                _ => prepCase c ++ nu.s ! mec.g ++ mec.s ! nu.n
+                }
+      )
      mec.g 
      nu.n ; 

@@ -6,7 +6,7 @@ concrete StructuralSpa of Structural = CategoriesSpa, NumeralsSpa **
  
 lin

-  UseNumeral n = {s = \\g => n.s !g ; n = n.n} ;
+  UseNumeral n = {s = \\g => n.s !g ; n = n.n ; isNo = False} ;

  above_Prep = justPrep "sobre" ;
  after_Prep = {s = "después" ; c = genitive} ;