fixed SyntaxEng.fromAgr

aarne
2005-11-03 10:31:27 +00:00
parent 26c1c12825
commit b59faa21df
4 changed files with 232 additions and 67 deletions

@@ -2,6 +2,8 @@
abstract Animals = Questions ** {
flags startcat=Phrase ;
fun
-- a lexicon of animals and actions among them
Dog, Cat, Mouse, Lion, Zebra : Entity ;

@@ -1,4 +1,4 @@
i -src AnimalsEng.gf ;; s
i -src AnimalsFre.gf ;; s
i -src AnimalsSwe.gf ;; s
i -ex AnimalsEng.gf ;; s
i AnimalsFre.gf ;; s
i AnimalsSwe.gf ;; s
pm | wf animals.gfcm

@@ -5,15 +5,17 @@
<img src="gf-logo.gif">
<h1>GF Resource Grammar Library</h1>
<h1>The GF Resource Grammar Library</h1>
<p>
<font size=2>
Fourth Version, 2 November 2005.
<br>
Third Version, 22 May 2005. Completed 1 July.
<br>
Second Version, 1 March 2005
<br>
First Draft, 7 February 2005
</font>
</p><p>
@@ -29,16 +31,16 @@ Aarne Ranta
<!-- NEW -->
<h2>GF = Grammatical Framework</h2>
A grammar formalism based on functional programming and type theory.
GF is a grammar formalism based on functional programming and type theory.
<p>
Designed to be nice for <i>ordinary programmers</i> to use: by this
we mean programmers without training in linguistics.
GF was designed to be nice for <i>ordinary programmers</i> to use: by this
we mean programmers without training in linguistics.
<p>
Mission: to make natural-language applications available for
The mission of GF is to make natural-language applications available for
ordinary programmers, in tasks like
<ul>
<li> software documentation
@@ -46,7 +48,7 @@ ordinary programmers, in tasks like
<li> human-computer interaction
<li> dialogue systems
</ul>
Thus <i>not</i> primarily another theoretical framework for
Thus GF is <i>not</i> primarily another theoretical framework for
linguists.
@@ -54,10 +56,16 @@ linguists.
<!-- NEW -->
<h2>Multilingual grammars</h2>
A GF grammar consists of an abstract syntax and a set
of concrete syntaxes.
<p>
<b>Abstract syntax</b>: language-independent representation
<pre>
cat Prop ; Nat ;
fun Even : Nat -> Prop ;
fun NInt : Int -> Nat ;
</pre>
<b>Concrete syntax</b>: mapping from abstract syntax trees to strings in a language
(English, French, German, Swedish,...)
@@ -70,11 +78,18 @@ linguists.
lin Even x = {s = x.s ++ "är" ++ "jämnt"} ;
</pre>
We can <b>translate</b> between language via the abstract syntax.
We can <b>translate</b> between languages via the abstract syntax:
<pre>
4 is even 4 ist gerade
\ /
Even (NInt 4)
/ \
4 est pair 4 är jämnt
</pre>
<p>
Is it really so simple?
But is it really so simple?
<!-- NEW -->
@@ -93,6 +108,8 @@ la somme de 3 et de 5 est pair<br>
wenn 2 ist gerade, dann 2+2 ist gerade<br>
om 2 är jämnt, 2+2 är jämnt<br>
</i>
All these sentences are grammatically incorrect.
<!-- NEW -->
@@ -120,7 +137,10 @@ and <b>record types</b>. For instance, French:
}
} ;
</pre>
To learn more about these constructs, consult GF documentation, e.g. the
<a href="../../../doc/tutorial/01-gf-tutorial2.html">New Grammarian's Tutorial</a>.
However, in what follows we will show how to avoid learning them and
still write linguistically correct grammars.
<!-- NEW -->
@@ -131,7 +151,7 @@ theoretical knowledge about the language.
<p>
Which kind of a programmer is easier to find?
Which kind of a programmer is it easier to find?
<ul>
<li> one who can write a sorting algorithm
<li> one who can write a grammar for Swedish determiners
@@ -177,6 +197,10 @@ To use library functions for syntax and morphology:
<pre>
Even = predA (regA "jämn") ;
</pre>
For the French version, we write
<pre>
Even = predA (regA "pair") ;
</pre>
@@ -207,6 +231,44 @@ Extra constraint: we want open-source free software and
hence cannot use existing proprietary resources.
<!-- NEW -->
<h2>Answers to questions in grammar library design</h2>
The current GF resource grammar library has
made the following decisions:
<p>
The library has, for each language
<br>
<li> complete morphology, some lexicon (500 words), representative fragment of syntax,
very little semantics,
<p>
Organization and presentation:
<br>
<li> division into top-level (API) modules, and internal modules (only
interesting for resource implementors)
<br>
<li> the API is, as much as possible, common in different languages
<br>
<li> we favour "school grammar" concepts rather than innovative linguistic theory
<p>
Where do we get the data from?
<br>
<li> morphology and syntax are hand-written
<br>
<li> the 500-word lexicon is hand-written, but a tool is provided
for automatic lexicon extraction
<br>
<li> we have not reused existing resources
<br>
The resource grammar library is entirely
open-source free software (under GNU GPL license).
<!-- NEW -->
@@ -351,6 +413,11 @@ The first three letters (<tt>Dan</tt> etc) are used in grammar module names
<!-- NEW -->
<h2>Library structure 1: language-independent API</h2>
<li> <tt>Lang</tt> is the top module collecting all of the following.
<p>
<li> syntactic <tt>Categories</tt> (parts of speech, word classes), e.g.
<pre>
V ; NP ; CN ; Det ; -- verb, noun phrase, common noun, determiner
@@ -360,29 +427,44 @@ The first three letters (<tt>Dan</tt> etc) are used in grammar module names
DetNP : Det -> CN -> NP ; -- combine Det and CN into NP
</pre>
<li> the most common <tt>Structural</tt> words (determiners,
conjunctions, pronouns), e.g.
conjunctions, pronouns) (now 83), e.g.
<pre>
and_Conj : Conj ;
</pre>
<li> <tt>Numerals</tt>, number words from 1 to 999,999 with their
inflections, e.g.
<pre>
n8 : Digit ;
</pre>
<li> <tt>Basic</tt> lexicon of (now 218) frequent everyday words
<pre>
man_N : N ;
</pre>
<p>
In addition, and not included in <tt>Lang</tt>, there is
<li> <tt>SwadeshLex</tt>, lexicon of (now 206) words from the
<a href="http://en.wiktionary.org/wiki/Swadesh_List">Swadesh list</a>, e.g.
<pre>
squeeze_V : V ;
</pre>
Of course, there is some overlap between <tt>SwadeshLex</tt> and the other modules.
<!-- NEW -->
<h2>Library structure 2: language-dependent modules</h2>
<li> morphological <tt>Paradigms</tt>, e.g.
<li> morphological <tt>Paradigms</tt>, e.g. Swedish
<pre>
mkN : Str -> Str -> Str -> Str -> Gender -> N ; -- worst-case nouns
mkN : Str -> N ; -- regular nouns
</pre>
<li> irregular <tt>Verbs</tt>, e.g.
<li> (in some languages) irregular <tt>Verbs</tt>, e.g.
<pre>
angripa_V = irregV "angripa" "angrep" "angripit" ;
</pre>
<li> <tt>Lexicon</tt> of frequent words
<pre>
man_N = mkN "man" "mannen" "män" "männen" masculine ;
</pre>
<li> <tt>Ext</tt>ended syntax with language-specific rules
<li> (not yet available) <tt>Ext</tt>ended syntax with language-specific rules
<pre>
PassBli : V2 -> NP -> VP ; -- bli överkörd av ngn
</pre>
@@ -399,28 +481,20 @@ to implement the current API.
Reservations:
<ul>
<li> does not necessarily extend to all other languages
<li> does not necessarily cover the most idiomatic expressions
<li> this does not necessarily extend to all other languages
<li> this does not necessarily cover the most idiomatic expressions
of each language
<li> may not be the easiest API to implement (e.g. negation and
<li> this may not be the easiest API to implement (e.g. negation and
inversion with <i>do</i> in English suggest that some other
structure would be more natural)
<li> does not guarantee that same structure has the same semantics
in different languages
<li> it is not guaranteed that the same structure has the same semantics
in all languages
<p>
<!-- NEW -->
<h2>Library structure: language-independent API</h2>
<center>
<img src="Resource.gif">
</center>
<!-- NEW -->
<h2>Library structure: test bed for the language-independent API</h2>
<center>
<img src="Lang.gif">
</center>
@@ -435,7 +509,7 @@ in different languages
<a href="Rules.html">Rules</a>
<p>
Alternative views on sentence formation:
Two alternative views on sentence formation by predication:
<a href="Clause.html">Clause</a>,
<a href="Verbphrase.html">Verbphrase</a>
@@ -519,11 +593,27 @@ Import a set of <tt>LangX</tt> grammars:
i english/LangEng.gf
i swedish/LangSwe.gf
</pre>
Test with random generation, translation, morphological analysis...
Alternatively, you can <tt>make</tt> a precompiled package of
all the languages by using <tt>lib/resource/Makefile</tt>:
<pre>
make
gf langs.gfcm
</pre>
Then you can test with translation, random generation, morphological analysis...
<pre>
> p -lang=LangEng "I have loved her." | l -lang=LangFre
Je l' ai aimée.
> gr -cat=NP | l -multi
The sock
Strumpan
Strømpen
La media
La calza
La chaussette
Sukka
</pre>
<!-- NEW -->
<h2>Use as top-level grammar: language learning quizzes</h2>
@@ -586,14 +676,14 @@ when developing a resource.
Using the <tt>-v</tt> option shows if the parser fails because
of unknown words.
<pre>
> p -cat=S -v "jag ska åka till Chalmers"
> p -cat=S -v -lexer=words "jag ska åka till Chalmers"
unknown tokens [TS "åka",TS "Chalmers"]
</pre>
Then try to select words that <tt>LangX</tt> recognizes:
<pre>
> p -cat=S "jag ska till Danmark"
> p -cat=S "jag ska springa till Danmark"
UseCl (PosTP TFuture ASimul)
(AdvCl (SPredV i_NP go_V)
(AdvCl (SPredV i_NP run_V)
(AdvPP (PrepNP to_Prep (UsePN (PNCountry Denmark)))))
</pre>
Use these API structures and extend vocabulary to match your need.
@@ -609,7 +699,7 @@ You can run the syntax editor on <tt>LangX</tt> to
find resource API functions through context-sensitive menus.
For instance, the shell command
<pre>
jgf LangEng.gf LangFre.gf
gfeditor LangEng.gf LangFre.gf
</pre>
opens the editor with English and French views. The
<a href="http://www.cs.chalmers.se/%7Eaarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm">
@@ -682,42 +772,48 @@ for this set of grammars.
<p>
Just issue the following GF commands
<pre>
i -src AnimalsEng.gf ;; s
i -src AnimalsFre.gf ;; s
i -src AnimalsSwe.gf ;; s
pm | wf animals.gfcm
</pre>
and you get an end-user grammar <tt>animals.gfcm</tt>.
<p>
You can also write the commands in a <tt>gfs</tt> (<b>GF script</b>)
To produce an end-user multilingual grammar <tt>animals.gfcm</tt>,
write the sequence of compilation commands in a <tt>gfs</tt> (<b>GF script</b>)
file, say
<a href="example/mkAnimals.gfs"><tt>mkAnimals.gfs</tt></a>,
and then call GF with
<pre>
gf &lt;mkAnimals.gfs
</pre>
To try out the grammar,
<pre>
> i animals.gfcm
> gr | l -multi
vem jagar hundar ?
qui chasse des chiens ?
who chases dogs ?
</pre>
<!-- NEW -->
<h2>Grammar writing by examples</h2>
(New in GF 3/6/2005)
(New in GF 2.3)
<p>
You can use the resource grammar as a parser on a special file format,
<tt>.gfe</tt> ("GF examples"). Here is the new source,
<tt>.gfe</tt> ("GF examples"). Here is the real source,
<a href="example/QuestionsI.gfe">QuestionsI.gfe</a>, which
generates
<a href="example/QuestionsI.gf">QuestionsI.gf</a>,
when you execute the command
generated
<a href="example/QuestionsI.gf">QuestionsI.gf</a>
when you executed the GF command
<pre>
gf -examples QuestionsI.gfe
i -ex AnimalsEng.gf
</pre>
Since <tt>QuestionsI</tt> is an incomplete module ("functor"),
it need only be built once. This is why only the first
command in <tt>mkAnimals.gfs</tt> needs the flag <tt>-ex</tt>.
<p>
Of course, the grammar of any language can be created by
parsing any language, as long as they have a common resource API.
The use of English resource is generally recommended, because it
@@ -792,6 +888,30 @@ If many substitutions are needed, semicolons are used as separators:
</pre>
<!-- NEW -->
<h2>Implementation details: low-level files</h2>
<b>For developers of resource grammars.</b>
The modules listed in this section should never be imported in application
grammars.
<p>
Each of the API implementations uses the following auxiliary resource modules:
<ul>
<li> <tt>Types</tt>, the morphological paradigms and word classes
<li> <tt>Morpho</tt>, inflection machinery
<li> <tt>Syntax</tt>, complex categories and their combinations
</ul>
In addition, the following language-independent modules from <tt>lib/prelude</tt>
are used.
<ul>
<li> <tt>Predef</tt>, operations whose definitions are hard-coded in GF
<li> <tt>Prelude</tt>, generic string and boolean operations
<li> <tt>Coordination</tt>, coordination structures for arbitrary categories
</ul>
<!-- NEW -->
<h2>Implementation details: the structure of low-level files</h2>
@@ -800,14 +920,53 @@ If many substitutions are needed, semicolons are used as separators:
</center>
<!-- NEW -->
<h2>How to change a resource grammar?</h2>
In many cases, the source of a bug is in one of
the low-level modules. Try to trace it back there
by starting from the high-level module.
<p>
(Much more to be written...)
<!-- NEW -->
<h2>How to write a resource grammar?</h2>
Start with a more limited goal, e.g. to implement
the <tt>stoneage</tt> grammar (<tt>examples/stoneage</tt>)
for your language.
<p>
For this, you need
<ul>
<li> most of <tt>Types</tt>
<li> most of <tt>Morpho</tt>
<li> some of <tt>Syntax</tt>
<li> most of <tt>Paradigms</tt>
</ul>
<p>
A useful command to test <tt>oper</tt>s:
<pre>
i -retain MorphoRot.gf
cc regNoun "foo"
</pre>
<!-- NEW -->
<h2>The use of parametrized modules</h2>
In two language families:
In two language families, a lot of code is shared.
<ul>
<li> Romance: French, Italian, Spanish
<li> Scandinavian: Danish, Norwegian, Swedish
</ul>
The structure looks like this.
<center>
<img src="Scand.gif">
</center>
@@ -850,11 +1009,10 @@ the previous page</i>.)
Danish
<p>
English:
missing uncontracted negations.
<p>
Finnish:
missing many nominal forms of verbs;
compiling the heuristic paradigms is slow;
the basic lexicon has some erroneous inflectional forms;
possessive and interrogative suffixes have no proper lexer.
<p>
French:
@@ -869,7 +1027,7 @@ some verbs in Basic should be reflexive;
bad forms of reflexive infinitives
<p>
Norwegian:
possessives <i>bilen min</i> not included
possessives of type <i>bilen min</i> not included
<p>
Russian
<p>
@@ -894,4 +1052,9 @@ GF Download Page</a>. The current libraries are in
<tt>lib/resource</tt>. Version 0.6 is in
<tt>lib/resource-0.6</tt>.
<p>
The very latest version of GF and its libraries is in
<a href="http://www.cs.chalmers.se/~bringert/gf/downloads/snapshots/">Snapshots</a>.
</body></html>

@@ -62,7 +62,7 @@ oper
case a of {
ASgP1 => {n = Sg ; p = P1 ; g = human} ;
ASgP2 => {n = Sg ; p = P2 ; g = human} ;
ASgP3 g => {n = Sg ; p = P1 ; g = g} ;
ASgP3 g => {n = Sg ; p = P3 ; g = g} ;
APl p => {n = Pl ; p = p ; g = human}
} ;
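
The one-line change in this last hunk (`p = P1` becomes `p = P3` in the `ASgP3 g` branch) is the fix named in the commit title. Its effect can be checked with a small model outside GF. The Python sketch below is only an illustration and not part of the commit: it assumes the agreement type has exactly the four constructors visible in the hunk, the name `from_agr` is a hypothetical stand-in for `fromAgr`, and the gender value `"nonhuman"` is invented for the example, since only `human` appears in the diff.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical Python model of the four agreement constructors in the hunk.
@dataclass(frozen=True)
class ASgP1:
    pass


@dataclass(frozen=True)
class ASgP2:
    pass


@dataclass(frozen=True)
class ASgP3:
    g: str  # gender carried by third person singular


@dataclass(frozen=True)
class APl:
    p: str  # person carried by the plural


Agr = Union[ASgP1, ASgP2, ASgP3, APl]


def from_agr(a: Agr) -> dict:
    """Unpack an agreement value into number, person and gender."""
    if isinstance(a, ASgP1):
        return {"n": "Sg", "p": "P1", "g": "human"}
    if isinstance(a, ASgP2):
        return {"n": "Sg", "p": "P2", "g": "human"}
    if isinstance(a, ASgP3):
        # The fixed branch: third person singular must yield P3, not P1.
        return {"n": "Sg", "p": "P3", "g": a.g}
    if isinstance(a, APl):
        return {"n": "Pl", "p": a.p, "g": "human"}
    raise TypeError(f"not an agreement value: {a!r}")


# Usage: with the old "p = P1" branch, this would have reported first person.
print(from_agr(ASgP3("nonhuman")))
```

Before the fix, every third person singular subject would have been unpacked as first person, so anything downstream that selects verb forms by person would pick the wrong form for such subjects.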