improved resource doc

This commit is contained in:
aarne
2005-05-22 18:43:00 +00:00
parent 60427e170c
commit e451bc03ba
16 changed files with 188 additions and 99 deletions

@@ -51,7 +51,7 @@ It will guide you
<!-- NEW -->
<h3>Getting the GF program</h3>
The program is open-source free software, which you can download from the
The program is open-source free software, which you can download via the
GF Homepage:<br>
<a href="http://www.cs.chalmers.se/%7Eaarne/GF">
<tt>http://www.cs.chalmers.se/~aarne/GF</tt></a>
@@ -290,8 +290,10 @@ and so on.
<h4>The labelled context-free format</h4>
The <b>labelled context-free grammar</b> format permits user-defined
labels to each rule. GF recognizes files of this format by the suffix
<tt>.cf</tt>. Let us include the following rules in the file
labels to each rule.
GF recognizes files of this format by the suffix
<tt>.cf</tt>. It is intermediate between EBNF and full GF format.
Let us include the following rules in the file
<tt>paleolithic.cf</tt>.
<pre>
PredVP. S ::= NP VP ;
@@ -407,16 +409,20 @@ Rules in a GF grammar are called <b>judgements</b>, and the keywords
judgement forms:
<ul>
<li> abstract syntax
<ul>
<li> cat C
<li> fun f : A
</ul>
<p>
<table>
<tr> <td>form </td><td>reading </td></tr>
<tr> <td><tt>cat</tt> C</td><td>C is a category</td></tr>
<tr> <td><tt>fun</tt> f <tt>:</tt> A</td><td>f is a function of type A</td></tr>
</table>
<li> concrete syntax
<ul>
<li> lincat C = T
<li> lin f x ... y = t
<p>
<table>
<tr> <td>form </td><td>reading </td></tr>
<tr> <td><tt>lincat</tt> C <tt>=</tt> T</td><td>category C has linearization type T</td></tr>
<tr> <td><tt>lin</tt> f <tt>=</tt> t</td><td>function f has linearization t</td></tr>
</table>
</ul>
</ul>
We return to the precise meanings of these judgement forms later.
First we will look at how judgements are grouped into modules, and
show how the grammar <tt>paleolithic.cf</tt> is
@@ -436,10 +442,41 @@ module forms are
abstract syntax A, with judgements in the module body M.
</ul>
<!-- NEW -->
<h4>Record types, records, and <tt>Str</tt>s</h4>
The linearization type of a category is a <b>record type</b>, with
zero or more <b>fields</b> of different types. The simplest record
type used for linearization in GF is
<pre>
{s : Str}
</pre>
which has one field, with <b>label</b> <tt>s</tt> and type <tt>Str</tt>.
<p>
Examples of records of this type are
<pre>
{s = "foo"}
{s = "hello" ++ "world"}
</pre>
The type <tt>Str</tt> is really the type of <b>token lists</b>, but
most of the time one can conveniently think of it as the type of strings,
denoted by string literals in double quotes.
<p>
Whenever a record <tt>r</tt> of type <tt>{s : Str}</tt> is given,
<tt>r.s</tt> is an object of type <tt>Str</tt>. This is of course
a special case of the <b>projection</b> rule, allowing the extraction
of fields from a record.
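<p>
As a minimal sketch of how record types and projection appear in a grammar,
consider the following pair of modules. The names <tt>Greeting</tt>,
<tt>GreetingEng</tt>, <tt>Phrase</tt>, <tt>Name</tt>, <tt>Hello</tt>, and
<tt>World</tt> are hypothetical and chosen only for illustration; each module
is assumed to be stored in its own file.
<pre>
-- file Greeting.gf (hypothetical example)
abstract Greeting = {
  cat Phrase ;
  cat Name ;
  fun Hello : Name -> Phrase ;
  fun World : Name ;
}

-- file GreetingEng.gf (hypothetical example)
concrete GreetingEng of Greeting = {
  lincat Phrase = {s : Str} ;
  lincat Name = {s : Str} ;
  -- n is a record of type {s : Str}; the projection n.s is its Str field
  lin Hello n = {s = "hello" ++ n.s} ;
  lin World = {s = "world"} ;
}
</pre>
With these definitions, the tree <tt>Hello World</tt> should linearize to
the token list <i>hello world</i>.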
<!-- NEW -->
<h4>An abstract syntax example</h4>
Each nonterminal occurring in <tt>paleolithic.cf</tt> is
Each nonterminal occurring in the grammar <tt>paleolithic.cf</tt> is
introduced by a <tt>cat</tt> judgement. Each
rule label is introduced by a <tt>fun</tt> judgement.
<pre>
@@ -520,11 +557,11 @@ Import <tt>PaleolithicEng.gf</tt> and try what happens
</pre>
The GF program does not only read the file
<tt>PaleolithicEng.gf</tt>, but also all other files that it
depends on - in this case, <tt>Paleolithic.gf</tt>.
depends on - in this case, <tt>Paleolithic.gf</tt>.
<p>
For each file that is compiles, a <tt>.gfc</tt> file
For each file that is compiled, a <tt>.gfc</tt> file
is generated. The GFC format (="GF Canonical") is the
"machine code" of GF, which is faster to process than
GF source files. When reading a module, GF knows whether
@@ -611,7 +648,7 @@ Translate by using a pipe:
<!-- NEW -->
<h4>Translation quiz</h4>
This is a simple kind of language exercises that can be automatically
This is a simple language exercise that can be automatically
generated from a multilingual grammar. The system generates a set of
random sentences, displays them in one language, and checks the user's
answer given in another language. The command <tt>translation_quiz = tq</tt>
@@ -706,7 +743,7 @@ only do "one thing" each, e.g.
fun Cep, Agaric : Mushroom ;
}
</pre>
They can afterwards be combined in bigger grammars by using
They can afterwards be combined into bigger grammars by using
<b>multiple inheritance</b>, i.e. extension of several grammars at the
same time:
<pre>
@@ -786,14 +823,14 @@ The introduction of plural forms requires two things:
</ul>
Different languages have different rules of inflection and agreement.
For instance, Italian also has agreement in gender (masculine vs. feminine).
We want to be able to ignore such differences in the abstract
syntax.
We want to express such special features of languages precisely in
concrete syntax while ignoring them in abstract syntax.
<p>
To be able to do all this, we need a couple of new judgement forms,
a new module form, and a more powerful way of expressing linearization
rules.
To be able to do all this, we need two new judgement forms,
a new module form, and a generalization of linearization types
from strings to more complex types.
<!-- NEW -->
@@ -1018,7 +1055,7 @@ these forms are explained in the following section.
The paradigm <tt>regNoun</tt> does not give the correct forms for
all nouns. For instance, <i>louse - lice</i> and
<i>fish - fish</i> must be given by using <tt>mkNoun</i>.
<i>fish - fish</i> must be given by using <tt>mkNoun</tt>.
Also the word <i>boy</i> would be inflected incorrectly; to prevent
this, either use <tt>mkNoun</tt> or modify
<tt>regNoun</tt> so that the <tt>"y"</tt> case does not
@@ -1165,7 +1202,7 @@ lin
<h4>Hierarchic parameter types</h4>
The reader familiar with a functional programming language such as
<a href="www.haskell.org">Haskell</a> must have noticed the similarity
<a href="http://www.haskell.org">Haskell</a> must have noticed the similarity
between parameter types in GF and algebraic datatypes (<tt>data</tt> definitions
in Haskell). The GF parameter types are actually a special case of algebraic
datatypes: the main restriction is that in GF, these types must be finite.
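<p>
To make the comparison concrete, here is a small sketch of a parameter type
whose constructor takes arguments. The names <tt>ParamDemo</tt>,
<tt>Number</tt>, <tt>Gender</tt>, <tt>Agr</tt>, and <tt>Ag</tt> are
hypothetical and serve only as an illustration.
<pre>
resource ParamDemo = {
  param
    Number = Sg | Pl ;
    Gender = Masc | Fem ;
    -- Ag takes two arguments, like the Haskell constructor in
    --   data Agr = Ag Number Gender
    -- The type is finite: it has 2 * 2 = 4 values.
    Agr = Ag Number Gender ;
}
</pre>
A recursive definition, such as a list type, would not qualify as a
parameter type, since it would have infinitely many values.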

@@ -150,7 +150,7 @@ of a multimodal dialogue system built with embedded grammars.
<p>
<a href="lib/resource/doc/gf-resource.html">Resource grammar library</a>:
<a href="lib/resource/doc/01-gf-resource.html">Resource grammar library</a>:
basic structures of ten languages
(Danish, English, Finnish, French, German,
Italian, Norwegian, Russian, Spanish, Swedish).
@@ -240,6 +240,10 @@ outdated).
<a href="doc/DocGF.pdf">
Language specification</a> of the GF grammar formalism.
</li><li>
<a href="lib/resource/doc/01-gf-resource.html">
Resource grammar library documentation</a>.
</li><li>
<a href="../GF2.0/doc/gf2-highlights.html">
Highlights</a> of Version 2.1 and 2.0 (in comparison with version 1.2).

@@ -31,7 +31,7 @@ gfdoc:
gfdoc ../italian/BeschIta.gf ; mv ../italian/BeschIta.html .
gfdoc ../spanish/ParadigmsSpa.gf ; mv ../spanish/ParadigmsSpa.html .
# gfdoc ../spanish/BasicSpa.gf ; mv ../spanish/BasicSpa.html .
gfdoc ../spanish/BasicSpa.gf ; mv ../spanish/BasicSpa.html .
gfdoc ../spanish/BeschSpa.gf ; mv ../spanish/BeschSpa.html .
gifs: api lang scand low

@@ -5,6 +5,6 @@ abstract Animals = Questions ** {
fun
-- a lexicon of animals and actions among them
Dog, Cat, Mouse, Lion, Zebra : Entity ;
Chase, Eat, Like : Action ;
Chase, Eat, See : Action ;
}

@@ -11,5 +11,5 @@ concrete AnimalsEng of Animals = QuestionsEng **
Zebra = regN "zebra" ;
Chase = dirV2 (regV "chase") ;
Eat = dirV2 eat_V ;
Like = dirV2 (regV "like") ;
See = dirV2 see_V ;
}

@@ -11,5 +11,5 @@ concrete AnimalsFre of Animals = QuestionsFre **
Zebra = regN "zèbre" masculine ;
Chase = dirV2 (regV "chasser") ;
Eat = dirV2 (regV "manger") ;
Like = dirV2 (regV "aimer") ;
See = voir_V2 ;
}

@@ -11,5 +11,5 @@ concrete AnimalsSwe of Animals = QuestionsSwe **
Zebra = regN "zebra" utrum ;
Chase = dirV2 (regV "jaga") ;
Eat = dirV2 äta_V ;
Like = mkV2 (mk2V "tycka" "tycker") "om" ;
See = dirV2 se_V ;
}

@@ -1,6 +1,6 @@
--# -path=.:resource/abstract:resource/../prelude
-- Language-independent question grammar parametwized on Resource.
-- Language-independent question grammar parametrized on Resource.
incomplete concrete QuestionsI of Questions = open Resource in {
lincat

@@ -9,9 +9,11 @@
<p>
Second Version, Gothenburg, 1 March 2005
Third Version, 22 May 2005
<br>
First Draft, Gothenburg, 7 February 2005
Second Version, 1 March 2005
<br>
First Draft, 7 February 2005
</p><p>
@@ -31,7 +33,8 @@ A grammar formalism based on functional programming and type theory.
<p>
Designed to be nice for <i>ordinary programmers</i> to use.
Designed to be nice for <i>ordinary programmers</i> to use: by this
we mean programmers without training in linguistics.
<p>
@@ -47,6 +50,7 @@ Thus <i>not</i> primarily another theoretical framework for
linguists.
<!-- NEW -->
<h2>Multilingual grammars</h2>
@@ -90,6 +94,7 @@ wenn 2 ist gerade, dann 2+2 ist gerade<br>
om 2 är jämnt, 2+2 är jämnt<br>
</i>
<!-- NEW -->
<h2>Solving the difficulties</h2>
@@ -197,17 +202,15 @@ Where do we get the data from?
<li> automatic extraction or hand-writing?
<br>
<li> reuse of existing resources?
<p>
Extra constraint: we want open-source free software.
<br>
Extra constraint: we want open-source free software and
hence cannot use existing proprietary resources.
<!-- NEW -->
<h2>The scope of the resource grammar library</h2>
<h2>The scope of a resource grammar library for a language</h2>
All morphological paradigms
@@ -228,6 +231,7 @@ Currently,<br>
<!-- NEW -->
<h2>Success criteria</h2>
@@ -251,24 +255,33 @@ families, using the module system of GF.
<!-- NEW -->
<h2>These are not our success criteria</h2>
Language coverage: you can parse all expressions. Example:
Language coverage: to be able to parse all expressions.
<br>
Example:
the French <i>passé simple</i> tense, although covered by the
morhology, is not used in the language-independent API, but
only the <i>passé composé</i> is.
morphology, is not used in the language-independent API, but
only the <i>passé composé</i> is. However, an application
accessing the French-specific (or Romance-specific)
modules can use the passé simple.
<p>
Semantic correctness
Semantic correctness: only to produce meaningful expressions.
<br>
Example: the following sentences can be generated
<pre>
colourless green ideas sleep furiously
the time is seventy past forty-two
</pre>
However, an application grammar can use a domain-specific
semantics to guarantee semantic well-formedness.
<p>
(Warning for linguists:) theoretical innovation in
syntax (and it will all be hidden anyway!)
syntax is not among the goals
(and it would be hidden from users anyway!).
@@ -334,6 +347,7 @@ The current GF Resource Project covers ten languages:
The first three letters (<tt>Dan</tt> etc) are used in grammar module names
<!-- NEW -->
<h2>Library structure 1: language-independent API</h2>
@@ -351,6 +365,7 @@ conjunctions, pronouns), e.g.
and_Conj : Conj ;
</pre>
<!-- NEW -->
<h2>Library structure 2: language-dependent modules</h2>
@@ -477,6 +492,8 @@ Alternative views on sentence formation:
<a href="ParadigmsSpa.html">Spanish paradigms</a>
<br>
<a href="BasicSpa.html">example use of Spanish paradigms</a>
<br>
<a href="BeschSpa.html">Spanish verb conjugations</a>
<p>
@@ -491,7 +508,7 @@ Alternative views on sentence formation:
<!-- NEW -->
<h2>Use as top-level grammar: testing</h2>
Import a set of $LangX$ grammars:
Import a set of <tt>LangX</tt> grammars:
<pre>
i english/LangEng.gf
i swedish/LangSwe.gf
@@ -532,11 +549,14 @@ Import directly by <tt>open</tt>:
<pre>
concrete AppNor of App = open LangNor, ParadigmsNor in {...}
</pre>
No more dummy <tt>reuse</tt> modules and bulky <tt>.gfr</tt> files!
(Note for the users of GF 2.1 and older:
the dummy <tt>reuse</tt> modules and their bulky <tt>.gfr</tt> versions
are no longer needed!)
<p>
If you need to convert resource category records to/from strings, use
If you need to convert resource records to strings, and don't want to know
the concrete type (as you never should), you can use
<pre>
Predef.toStr : (L : Type) -> L -> Str ;
</pre>
@@ -548,65 +568,99 @@ If you need to convert resource category records to/from strings, use
<!-- NEW -->
<h2>Use as library through parser</h2>
Use the parser when developing a resource.
You can use the parser with a <tt>LangX</tt> grammar
when developing a resource.
<p>
Using the <tt>-v</tt> option shows if the parser fails because
of unknown words.
<pre>
> p -cat=S -v "jag ska åka till Chalmers"
unknown tokens [TS "åka",TS "Chalmers"]
</pre>
Then try to select words that <tt>LangX</tt> recognizes:
<pre>
> p -cat=S "jag ska gå till Danmark"
UseCl (PosTP TFuture ASimul)
(AdvCl (SPredV i_NP go_V)
(AdvPP (PrepNP to_Prep (UsePN (PNCountry Denmark)))))
</pre>
Extend vocabulary at need.
Use these API structures and extend the vocabulary as needed.
<pre>
åka_V = lexV "åker" ;
Chalmers = regPN "Chalmers" neutrum ;
</pre>
<!-- NEW -->
<h2>Syntax editor as library browser</h2>
You can run the syntax editor on <tt>LangX</tt> to
find resource API functions through context-sensitive menus.
For instance, the shell command
<pre>
jgf LangEng.gf LangFre.gf
</pre>
opens the editor with English and French views. The
<a href="http://www.cs.chalmers.se/%7Eaarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm">
Editor User Manual</a> gives more information on the use of the editor.
<p>
A restriction of the editor is that it does not give access to
<tt>ParadigmsX</tt> modules. An IDE environment extending the editor
to a grammar programming tool is work in progress.
<!-- NEW -->
<h2>Example application: a small translation system</h2>
You can say things like the following:
In this system, you can express questions and answers of
the following forms:
<pre>
who chases mice ?
whom does the lion chase ?
the dog chases cats
Who chases mice ?
Whom does the lion chase ?
The dog chases cats.
</pre>
Source modules:
We build the abstract syntax in two phases:
<ul>
<li> <a href=example/Questions.gf>Questions</a> defines question and
answer forms independently of domain
<li> <a href=example/Animals.gf>Animals</a> defines a lexicon with
animals and things that animals do.
</ul>
<p>
Abstract syntax:
<a href=example/Questions.gf>Questions</a>,
<a href=example/Animals.gf>Animals</a>
The concrete syntax of English is built in three phases:
<ul>
<li> <a href="example/QuestionsI.gf">QuestionsI</a> is a parametrized module
using the API module <tt>Resource</tt>.
<li> <a href="example/QuestionsEng.gf">QuestionsEng</a> is an instantiation
of the API with <tt>ResourceEng</tt>.
<li> <a href="example/AnimalsEng.gf">AnimalsEng</a> is a concrete syntax
of <tt>Animals</tt> using <tt>ParadigmsEng</tt> and <tt>VerbsEng</tt>.
</ul>
<p>
Concrete syntax of questions parametrized on the resource API:
<a href=example/QuestionsI.gf>QuestionsI</a>
The concrete syntax of Swedish is built upon <tt>QuestionsI</tt>
in a similar way, with the modules
<a href=example/QuestionsSwe.gf>QuestionsSwe</a> and
<a href=example/AnimalsSwe.gf>AnimalsSwe</a>.
<p>
English concrete syntax:
<a href=example/QuestionsEng.gf>QuestionsEng</a>,
<a href=example/AnimalsEng.gf>AnimalsEng</a>
The concrete syntax of French consists similarly of the modules
<a href=example/QuestionsFre.gf>QuestionsFre</a> and
<a href=example/AnimalsFre.gf>AnimalsFre</a>.
<p>
French concrete syntax:
<a href=example/QuestionsFre.gf>QuestionsFre</a>,
<a href=example/AnimalsFre.gf>AnimalsFre</a>
<p>
Swedish concrete syntax:
<a href=example/QuestionsSwe.gf>QuestionsSwe</a>,
<a href=example/AnimalsSwe.gf>AnimalsSwe</a>
@@ -635,27 +689,13 @@ and you get an end-user grammar <tt>animals.gfcm</tt>.
You can also write the commands in a <tt>gfs</tt> (<b>GF script</b>)
file, say
<a href=mkAnimals.gfc><tt>mkAnimals.gfs</tt></a>,
<a href="example/mkAnimals.gfs"><tt>mkAnimals.gfs</tt></a>,
and then call GF with
<pre>
gf &lt;mkAnimals.gfs
</pre>
<!-- NEW -->
<h2>Further simplifications of the application grammar</h2>
Step 1: use a simplified access to present-tense sentences,
<tt>SentenceX</tt> (to be written...)
<p>
Step 2: factor out the categories and purely combinational
rules into an <tt>incomplete</tt> module (to be shown... but
this does not work for French, which uses different structures:
e.g. <i>Qui aime les lions ?</i> with a definite phrase
where English has <i>Who loves lions?</i>
<!-- NEW -->
<h2>Implementation details: the structure of low-level files</h2>
@@ -678,6 +718,7 @@ In two language families:
</center>
<!-- NEW -->
<h2>Current status</h2>
@@ -701,6 +742,7 @@ X = implemented (few exceptions may occur)
- = not implemented
<!-- NEW -->
<h2>Known bugs and limitations</h2>
@@ -737,10 +779,11 @@ some verbs in Basic should be reflexive
Swedish
<!-- NEW -->
<h2>Obtaining it</h2>
Get the grammar package atDownload from
Get the grammar package from
<a href="http://sourceforge.net/project/showfiles.php?group_id=132285">
GF Download Page</a>. The current libraries are in
<tt>lib/resource</tt>. Version 0.6 is in

@@ -6,7 +6,7 @@ concrete StructuralFre of Structural =
lin
UseNumeral n = {s = \\g => n.s !g ; n = n.n} ;
UseNumeral n = {s = \\g => n.s !g ; n = n.n ; isNo = False} ;
above_Prep = {s = ["au dessus"] ; c = genitive} ;
after_Prep = justPrep "après" ;

@@ -5,7 +5,7 @@ concrete StructuralIta of Structural = CategoriesIta, NumeralsIta **
lin
UseNumeral n = {s = \\g => n.s !g ; n = n.n} ;
UseNumeral n = {s = \\g => n.s !g ; n = n.n ; isNo = False} ;
above_Prep = justPrep "sopra" ;
after_Prep = justPrep "dopo" ;

@@ -40,7 +40,7 @@ lincat
-- = CommNoun ** {s2 : Preposition ; c : CaseA} ;
N3 = Function ** {s3 : Preposition ; c3 : CaseA} ;
Prep = {s : Preposition ; c : CaseA} ;
Num = {s : Gender => Str ; n : Number} ;
Num = {s : Gender => Str ; n : Number ; isNo : Bool} ;
A = Adjective ;
-- = {s : AForm => Str ; p : Bool} ;

@@ -35,7 +35,7 @@ lin
ModGenOne = npGenDet singular ;
ModGenNum = npGenDetNum ;
UseInt i = {s = \\_ => i.s ; n = Pl} ; ---- n
UseInt i = {s = \\_ => i.s ; n = Pl ; isNo = False} ; ---- n
NoNum = noNum ;
UseA = adj2adjPhrase ;

@@ -60,9 +60,10 @@ oper
pronNounPhrase : Pronoun -> NounPhrase = \pro -> pro ;
-- Many determiners can be modified with numerals, which may be inflected in
-- gender.
-- gender. The label $isNo$ is a hack used to force $des$ for plural
-- indefinite with $noNum$.
Numeral : Type = {s : Gender => Str ; n : Number} ;
Numeral : Type = {s : Gender => Str ; n : Number ; isNo : Bool} ;
pronWithNum : Pronoun -> Numeral -> Pronoun = \nous,deux ->
{s = \\c => nous.s ! c ++ deux.s ! pgen2gen nous.g ;
@@ -72,7 +73,7 @@ oper
c = nous.c
} ;
noNum : Numeral = {s = \\_ => [] ; n = Pl} ;
noNum : Numeral = {s = \\_ => [] ; n = Pl ; isNo = True} ;
-- The existence construction "il y a", "c'è / ci sono" is defined separately,
-- and ad hoc, in each language.
@@ -138,7 +139,11 @@ oper
indefNounPhraseNum : Numeral -> CommNounPhrase -> NounPhrase = \nu,mec ->
normalNounPhrase
(\\c => prepCase c ++ nu.s ! mec.g ++ mec.s ! nu.n)
(\\c => case nu.isNo of {
True => artIndef mec.g Pl c ++ mec.s ! Pl ;
_ => prepCase c ++ nu.s ! mec.g ++ mec.s ! nu.n
}
)
mec.g
nu.n ;

@@ -6,7 +6,7 @@ concrete StructuralSpa of Structural = CategoriesSpa, NumeralsSpa **
lin
UseNumeral n = {s = \\g => n.s !g ; n = n.n} ;
UseNumeral n = {s = \\g => n.s !g ; n = n.n ; isNo = False} ;
above_Prep = justPrep "sobre" ;
after_Prep = {s = "después" ; c = genitive} ;