fixed SyntaxEng.fromAgr (ASgP3 now yields p = P3 instead of P1)

2026-04-23 19:42:50 -06:00 · 2005-11-03 10:31:27 +00:00
parent 26c1c12825
commit b59faa21df
4 changed files with 232 additions and 67 deletions
--- a/lib/resource/doc/example/Animals.gf
+++ b/lib/resource/doc/example/Animals.gf
@@ -2,6 +2,8 @@
 abstract Animals = Questions ** {
  flags startcat=Phrase ;
  fun
    -- a lexicon of animals and actions among them
    Dog, Cat, Mouse, Lion, Zebra : Entity ;
--- a/lib/resource/doc/example/mkAnimals.gfs
+++ b/lib/resource/doc/example/mkAnimals.gfs
@@ -1,4 +1,4 @@
-  i -src AnimalsEng.gf ;; s
+  i -ex AnimalsEng.gf ;; s
-  i -src AnimalsFre.gf ;; s
+  i AnimalsFre.gf ;; s
-  i -src AnimalsSwe.gf ;; s
+  i AnimalsSwe.gf ;; s
  pm | wf animals.gfcm
--- a/lib/resource/doc/gf-resource.html
+++ b/lib/resource/doc/gf-resource.html
@@ -5,15 +5,17 @@
 <img src="gf-logo.gif">
-<h1>GF Resource Grammar Library</h1>
+<h1>The GF Resource Grammar Library</h1>
 <p>
 <font size=2>
 Fourth Version, 2 November 2005.
 <br>
 Third Version, 22 May 2005. Completed 1 July.
 <br>
 Second Version, 1 March 2005
 <br>
 First Draft, 7 February 2005
 </font>
 </p><p>
@@ -29,16 +31,16 @@ Aarne Ranta
 <!-- NEW -->
 <h2>GF = Grammatical Framework</h2>
-A grammar formalism based on functional programming and type theory.
+GF is a grammar formalism based on functional programming and type theory.
 <p>
-Designed to be nice for <i>ordinary programmers</i> to use: by this
+GF was designed to be nice for <i>ordinary programmers</i> to use: by this
-we mean programmers without training in linguistics.
+we mean programmers without training in linguistics. 
 <p>
-Mission: to make natural-language applications available for
+The mission of GF is to make natural-language applications available for
 ordinary programmers, in tasks like
 <ul>
 <li> software documentation
@@ -46,7 +48,7 @@ ordinary programmers, in tasks like
 <li> human-computer interaction
 <li> dialogue systems
 </ul>
-Thus <i>not</i> primarily another theoretical framework for
+Thus GF is <i>not</i> primarily another theoretical framework for
 linguists.
@@ -54,10 +56,16 @@ linguists.
 <!-- NEW -->
 <h2>Multilingual grammars</h2>
 A GF grammar consists of an abstract syntax and a set
 of concrete syntaxes.
 <p>
 <b>Abstract syntax</b>: language-independent representation
 <pre>
  cat Prop ; Nat ;
  fun Even : Nat -> Prop ;
  fun NInt : Int -> Nat ;
 </pre>
 <b>Concrete syntax</b>: mapping from abstract syntax trees to strings in a language
 (English, French, German, Swedish,...)
@@ -70,11 +78,18 @@ linguists.
  lin Even x = {s = x.s ++ "är" ++ "jämnt"} ;
 </pre>
-We can <b>translate</b> between language via the abstract syntax.
+We can <b>translate</b> between languages via the abstract syntax:
 <pre>
  4 is even                  4 ist gerade
             \              /
               Even (NInt 4)
             /              \
  4 est pair                  4 är jämnt
 </pre>
 <p>
-Is it really so simple?
+But is it really so simple?
 <!-- NEW -->
@@ -93,6 +108,8 @@ la somme de 3 et de 5 est pair<br>
 wenn 2 ist gerade, dann 2+2 ist gerade<br>
 om 2 är jämnt, 2+2 är jämnt<br>
 </i>
 All these sentences are grammatically incorrect.
 <!-- NEW -->
@@ -120,7 +137,10 @@ and <b>record types</b>. For instance, French:
      }
    } ;
 </pre>
-
+To learn more about these constructs, consult the GF documentation, e.g. the
 <a href="../../../doc/tutorial/01-gf-tutorial2.html">New Grammarian's Tutorial</a>.
 However, in what follows we will show how to avoid learning them and
 still write linguistically correct grammars.
 <!-- NEW -->
@@ -131,7 +151,7 @@ theoretical knowledge about the language.
 <p>
-Which kind of a programmer is easier to find?
+Which kind of a programmer is it easier to find?
 <ul>
 <li> one who can write a sorting algorithm 
 <li> one who can write a grammar for Swedish determiners
@@ -177,6 +197,10 @@ To use library functions for syntax and morphology:
 <pre>
  Even = predA (regA "jämn") ;
 </pre>
 For the French version, we write
 <pre>
  Even = predA (regA "pair") ;
 </pre>
@@ -207,6 +231,44 @@ Extra constraint: we want open-source free software and
 hence cannot use existing proprietary resources.
 <!-- NEW -->
 <h2>Answers to questions in grammar library design</h2>
 The current GF resource grammar library has
 made the following decisions:
 <p>
 The library has, for each language
 <br>
 <li> complete morphology, some lexicon (500 words), representative fragment of syntax,
 very little semantics,
 <p>
 Organization and presentation:
 <br>
 <li> division into top-level (API) modules, and internal modules (only
 interesting for resource implementors)
 <br>
 <li> the API is, as much as possible, common in different languages
 <br>
 <li> we favour "school grammar" concepts rather than innovative linguistic theory
 <p>
 Where do we get the data from?
 <br>
 <li> morphology and syntax are hand-written
 <br>
 <li> the 500-word lexicon is hand-written, but a tool is provided
     for automatic lexicon extraction
 <br>
 <li> we have not reused existing resources
 <br>
 The resource grammar library is entirely
 open-source free software (under GNU GPL license).
 <!-- NEW -->
@@ -351,6 +413,11 @@ The first three letters (<tt>Dan</tt> etc) are used in grammar module names
 <!-- NEW -->
 <h2>Library structure 1: language-independent API</h2>
 <li> <tt>Lang</tt> is the top module collecting all of the following.
 <p>
 <li> syntactic <tt>Categories</tt> (parts of speech, word classes), e.g.
 <pre>
  V ; NP ; CN ; Det ;  -- verb, noun phrase, common noun, determiner
@@ -360,29 +427,44 @@ The first three letters (<tt>Dan</tt> etc) are used in grammar module names
  DetNP : Det -> CN -> NP ; -- combine Det and CN into NP
 </pre>
 <li> the most common <tt>Structural</tt> words (determiners,
-conjunctions, pronouns), e.g.
+conjunctions, pronouns) (now 83), e.g.
 <pre>
  and_Conj : Conj ;
 </pre>
 <li> <tt>Numerals</tt>, number words from 1 to 999,999 with their
 inflections, e.g.
 <pre>
  n8 : Digit ;
 </pre>
 <li> <tt>Basic</tt> lexicon of (now 218) frequent everyday words
 <pre>
  man_N : N ;
 </pre>
 <p>
 In addition, and not included in <tt>Lang</tt>, there is
 <li> <tt>SwadeshLex</tt>, lexicon of (now 206) words from the
 <a href="http://en.wiktionary.org/wiki/Swadesh_List">Swadesh list</a>, e.g.
 <pre>
  squeeze_V : V ;
 </pre>
 Of course, there is some overlap between <tt>SwadeshLex</tt> and the other modules.
 <!-- NEW -->
 <h2>Library structure 2: language-dependent modules</h2>
-<li> morphological <tt>Paradigms</tt>, e.g.
+<li> morphological <tt>Paradigms</tt>, e.g. Swedish
 <pre>
  mkN : Str -> Str -> Str -> Str -> Gender -> N ; -- worst-case nouns
  mkN : Str -> N ;                                -- regular nouns
 </pre>
-<li> irregular <tt>Verbs</tt>, e.g.
+<li> (in some languages) irregular <tt>Verbs</tt>, e.g.
 <pre>
  angripa_V = irregV "angripa" "angrep" "angripit" ;
 </pre>
-<li> <tt>Lexicon</tt> of frequent words
+<li> (not yet available) <tt>Ext</tt>ended syntax with language-specific rules
 <pre>
  man_N = mkN "man" "mannen" "män" "männen" masculine ;
 </pre>
 <li> <tt>Ext</tt>ended syntax with language-specific rules
 <pre>
  PassBli : V2 -> NP -> VP ;  -- bli överkörd av ngn
 </pre>
@@ -399,28 +481,20 @@ to implement the current API.
 Reservations:
 <ul>
-<li> does not necessarily extend to all other languages
+<li> this does not necessarily extend to all other languages
-<li> does not necessarily cover the most idiomatic expressions
+<li> this does not necessarily cover the most idiomatic expressions
     of each language
-<li> may not be the easiest API to implement (e.g. negation and
+<li> this may not be the easiest API to implement (e.g. negation and
 inversion with  <i>do</i> in English suggest that some other
 structure would be more natural)
-<li> does not guarantee that same structure has the same semantics
+<li> it is not guaranteed that the same structure has the same semantics
-in different languages
+in all different languages
 <p>
 <!-- NEW -->
 <h2>Library structure: language-independent API</h2>
 <center>
 <img src="Resource.gif">
 </center>
 <!-- NEW -->
 <h2>Library structure: test bed for the language-independent API</h2>
 <center>
 <img src="Lang.gif">
 </center>
@@ -435,7 +509,7 @@ in different languages
 <a href="Rules.html">Rules</a>
 <p>
-Alternative views on sentence formation:
+Two alternative views on sentence formation by predication:
 <a href="Clause.html">Clause</a>,
 <a href="Verbphrase.html">Verbphrase</a>
@@ -519,11 +593,27 @@ Import a set of <tt>LangX</tt> grammars:
  i english/LangEng.gf
  i swedish/LangSwe.gf
 </pre>
-Test with random generation, translation, morphological analysis...
+Alternatively, you can <tt>make</tt> a precompiled package of
 all the languages by using <tt>lib/resource/Makefile</tt>:
 <pre>
-
+  make
-
+  gf langs.gfcm
 </pre>
 Then you can test with translation, random generation, morphological analysis...
 <pre>
  > p -lang=LangEng "I have loved her." | l -lang=LangFre
  Je l' ai aimée.
  > gr -cat=NP | l -multi
  The sock
  Strumpan
  Strømpen
  La media
  La calza
  La chaussette
  Sukka
 </pre>
 <!-- NEW -->
 <h2>Use as top-level grammar: language learning quizzes</h2>
@@ -586,14 +676,14 @@ when developing a resource.
 Using the <tt>-v</tt> option shows if the parser fails because
 of unknown words.
 <pre>
-  > p -cat=S -v "jag ska åka till Chalmers"
+  > p -cat=S -v -lexer=words "jag ska åka till Chalmers"
  unknown tokens [TS "åka",TS "Chalmers"]
 </pre>
 Then try to select words that <tt>LangX</tt> recognizes:
 <pre>
-  > p -cat=S "jag ska gå till Danmark"
+  > p -cat=S "jag ska springa till Danmark"
  UseCl (PosTP TFuture ASimul)
-    (AdvCl (SPredV i_NP go_V)
+    (AdvCl (SPredV i_NP run_V)
    (AdvPP (PrepNP to_Prep (UsePN (PNCountry Denmark)))))
 </pre>
 Use these API structures and extend vocabulary to match your need.
@@ -609,7 +699,7 @@ You can run the syntax editor on <tt>LangX</tt> to
 find resource API functions through context-sensitive menus.
 For instance, the shell command
 <pre>
-  jgf LangEng.gf LangFre.gf
+  gfeditor LangEng.gf LangFre.gf
 </pre>
 opens the editor with English and French views. The
 <a href="http://www.cs.chalmers.se/%7Eaarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm">
@@ -682,42 +772,48 @@ for this set of grammars.
 <p>
-Just issue the following GF commands 
+To produce an end-user multilingual grammar <tt>animals.gfcm</tt>,
-<pre>
+write the sequence of compilation commands in a <tt>gfs</tt> (<b>GF script</b>)
  i -src AnimalsEng.gf ;; s
  i -src AnimalsFre.gf ;; s
  i -src AnimalsSwe.gf ;; s
  pm | wf animals.gfcm
 </pre>
 and you get an end-user grammar <tt>animals.gfcm</tt>.
 <p>
 You can also write the commands in a <tt>gfs</tt> (<b>GF script</b>)
 file, say
 <a href="example/mkAnimals.gfs"><tt>mkAnimals.gfs</tt></a>,
 and then call GF with
 <pre>
  gf &lt;mkAnimals.gfs
 </pre>
 To try out the grammar,
 <pre>
  > i animals.gfcm
  > gr | l -multi
  vem jagar hundar ?
  qui chasse des chiens ?
  who chases dogs ?
 </pre>
 <!-- NEW -->
 <h2>Grammar writing by examples</h2>
-(New in GF 3/6/2005)
+(New in GF 2.3)
 <p>
 You can use the resource grammar as a parser on a special file format,
-<tt>.gfe</tt> ("GF examples"). Here is the new source,
+<tt>.gfe</tt> ("GF examples"). Here is the real source,
 <a href="example/QuestionsI.gfe">QuestionsI.gfe</a>, which
-generates
+generated
-<a href="example/QuestionsI.gf">QuestionsI.gf</a>,
+<a href="example/QuestionsI.gf">QuestionsI.gf</a>
-when you execute the command
+when you executed the GF command
 <pre>
-  gf -examples QuestionsI.gfe
+  i -ex AnimalsEng.gf
 </pre>
 Since <tt>QuestionsI</tt> is an incomplete module ("functor"),
 it need only be built once. This is why only the first
 command in <tt>mkAnimals.gfs</tt> needs the flag <tt>-ex</tt>.
 <p>
 Of course, the grammar of any language can be created by
 parsing any language, as long as they have a common resource API.
 The use of English resource is generally recommended, because it
@@ -792,6 +888,30 @@ If many substitutions are needed, semicolons are used as separators:
 </pre>
 <!-- NEW -->
 <h2>Implementation details: low-level files</h2>
 <b>For developers of resource grammars.</b>
 The modules listed in this section should never be imported in application
 grammars.
 <p>
 Each of the API implementations uses the following auxiliary resource modules:
 <ul>
 <li> <tt>Types</tt>, the morphological paradigms and word classes
 <li> <tt>Morpho</tt>, inflection machinery
 <li> <tt>Syntax</tt>, complex categories and their combinations
 </ul>
 In addition, the following language-independent modules from <tt>lib/prelude</tt>
 are used.
 <ul>
 <li> <tt>Predef</tt>, operations whose definitions are hard-coded in GF
 <li> <tt>Prelude</tt>, generic string and boolean operations
 <li> <tt>Coordination</tt>, coordination structures for arbitrary categories
 </ul>
 <!-- NEW -->
 <h2>Implementation details: the structure of low-level files</h2>
@@ -800,14 +920,53 @@ If many substitutions are needed, semicolons are used as separators:
 </center>
 <!-- NEW -->
 <h2>How to change a resource grammar?</h2>
 In many cases, the source of a bug is in one of
 the low-level modules. Try to trace it back there
 by starting from the high-level module.
 <p>
 (Much more to be written...)
 <!-- NEW -->
 <h2>How to write a resource grammar?</h2>
 Start with a more limited goal, e.g. to implement
 the <tt>stoneage</tt> grammar (<tt>examples/stoneage</tt>)
 for your language.
 <p>
 For this, you need
 <ul>
 <li> most of <tt>Types</tt>
 <li> most of <tt>Morpho</tt>
 <li> some of <tt>Syntax</tt>
 <li> most of <tt>Paradigms</tt>
 </ul>
 <p>
 A useful command to test <tt>oper</tt>s:
 <pre>
  i -retain MorphoRot.gf
  cc regNoun "foo"
 </pre>
 <!-- NEW -->
 <h2>The use of parametrized modules</h2>
-In two language families:
+In two language families, a lot of code is shared.
 <ul>
 <li> Romance: French, Italian, Spanish
 <li> Scandinavian: Danish, Norwegian, Swedish
 </ul>
 The structure looks like this.
 <center>
 <img src="Scand.gif">
 </center>
@@ -850,11 +1009,10 @@ the previous page</i>.)
 Danish            
 <p>
 English:      
 missing uncontracted negations.
 <p>
 Finnish:
 missing many nominal forms of verbs;
 compiling the heuristic paradigms is slow;
 the basic lexicon has some erroneous inflectional forms;
 possessive and interrogative suffixes have no proper lexer.
 <p>
 French:
@@ -869,7 +1027,7 @@ some verbs in Basic should be reflexive;
 bad forms of reflexive infinitives
 <p>
 Norwegian:
-possessives <i>bilen min</i> not included
+possessives of type <i>bilen min</i> not included
 <p>
 Russian
 <p>
@@ -894,4 +1052,9 @@ GF Download Page</a>. The current libraries are in
 <tt>lib/resource</tt>. Version 0.6 is in
 <tt>lib/resource-0.6</tt>.
 <p>
 The very very latest version of GF and its libraries is in
 <a href="http://www.cs.chalmers.se/~bringert/gf/downloads/snapshots/">Snapshots</a>.
 </body></html>
--- a/lib/resource/english/SyntaxEng.gf
+++ b/lib/resource/english/SyntaxEng.gf
@@ -62,7 +62,7 @@ oper
      case a of {
        ASgP1   => {n = Sg ; p = P1 ; g = human} ;
        ASgP2   => {n = Sg ; p = P2 ; g = human} ;
-        ASgP3 g => {n = Sg ; p = P1 ; g = g} ;
+        ASgP3 g => {n = Sg ; p = P3 ; g = g} ;
        APl   p => {n = Pl ; p = p ; g = human}
        } ;
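
The SyntaxEng.gf hunk above changes only the person field in the ASgP3 branch. As a rough illustration of the corrected mapping (not the library's code), the following Python sketch mirrors the four branches of the case expression. The class name Agr, the function name from_agr, the string encoding of parameter values, and the gender value "nonhuman" are assumptions made for this sketch; only the branch structure and the field values are taken from the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agr:
    """Hypothetical stand-in for the GF agreement parameter."""
    tag: str                   # "ASgP1", "ASgP2", "ASgP3" or "APl"
    arg: Optional[str] = None  # gender for ASgP3, person for APl


def from_agr(a: Agr) -> dict:
    """Unpack an agreement value into number (n), person (p), gender (g)."""
    if a.tag == "ASgP1":
        return {"n": "Sg", "p": "P1", "g": "human"}
    if a.tag == "ASgP2":
        return {"n": "Sg", "p": "P2", "g": "human"}
    if a.tag == "ASgP3":
        # Before the fix this branch returned p = P1.
        return {"n": "Sg", "p": "P3", "g": a.arg}
    if a.tag == "APl":
        return {"n": "Pl", "p": a.arg, "g": "human"}
    raise ValueError(f"unknown agreement tag: {a.tag}")


if __name__ == "__main__":
    # "nonhuman" is an assumed gender value used only for illustration.
    print(from_agr(Agr("ASgP3", "nonhuman")))
```

With the pre-fix mapping, a third-person singular subject would have been treated as first person wherever the person field is consulted, which is presumably what the commit message refers to.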