<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html><head><title></title></head>

<body bgcolor="#ffffff" text="#000000">

<center>

<img src="gf-logo.gif">

<h1>GF Resource Grammar Library</h1>

<p>
Second Version, Gothenburg, 1 March 2005
<br>
First Draft, Gothenburg, 7 February 2005
</p><p>
Aarne Ranta
</p><p>
<tt>aarne@cs.chalmers.se</tt>
</p></center>

<!-- NEW -->

<h2>GF = Grammatical Framework</h2>

A grammar formalism based on functional programming and type theory.

<p>

Designed to be nice for <i>ordinary programmers</i> to use.

<p>

Mission: to make natural-language applications available for
ordinary programmers, in tasks like
<ul>
<li> software documentation
<li> domain-specific translation
<li> human-computer interaction
<li> dialogue systems
</ul>
Thus <i>not</i> primarily another theoretical framework for
linguists.

<!-- NEW -->

<h2>Multilingual grammars</h2>

<b>Abstract syntax</b>: language-independent representation
<pre>
cat Prop ; Nat ;
fun Even : Nat -> Prop ;
</pre>
<b>Concrete syntax</b>: mapping from abstract syntax trees to strings in a language
(English, French, German, Swedish,...)
<pre>
lin Even x = {s = x.s ++ "is" ++ "even"} ;     -- English
lin Even x = {s = x.s ++ "est" ++ "pair"} ;    -- French
lin Even x = {s = x.s ++ "ist" ++ "gerade"} ;  -- German
lin Even x = {s = x.s ++ "är" ++ "jämnt"} ;    -- Swedish
</pre>
We can <b>translate</b> between languages via the abstract syntax.

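<p>A toy Python model (not GF itself) may make the picture concrete: one tree type shared by all languages, one linearization function per language, and translation as parsing in one language followed by linearization in another. The class names, the <tt>EVEN_WORDS</tt> table and the pattern-matching parser are assumptions made for this sketch; GF's real parser is far more general.</p>

```python
from dataclasses import dataclass

# Abstract syntax: one tree type shared by all languages
# (a toy model of `cat Prop ; Nat ;` and `fun Even : Nat -> Prop`).
@dataclass(frozen=True)
class Num:
    digits: str          # hypothetical Nat leaf, e.g. "2"

@dataclass(frozen=True)
class Even:
    arg: Num             # Even : Nat -> Prop

# Concrete syntax: the words each language puts after the argument,
# mirroring the naive `lin Even x = {s = x.s ++ ...}` rules.
EVEN_WORDS = {
    "Eng": ["is", "even"],
    "Fre": ["est", "pair"],
    "Ger": ["ist", "gerade"],
    "Swe": ["är", "jämnt"],
}

def linearize(tree: Even, lang: str) -> str:
    """Map an abstract tree to a string in one language."""
    return " ".join([tree.arg.digits] + EVEN_WORDS[lang])

def parse(text: str, lang: str) -> Even:
    """Invert linearize for this single rule (a toy parser, not GF's)."""
    tokens = text.split()
    if tokens[1:] != EVEN_WORDS[lang]:
        raise ValueError(f"cannot parse {text!r} as {lang}")
    return Even(Num(tokens[0]))

def translate(text: str, source: str, target: str) -> str:
    """Translate by going through the abstract syntax tree."""
    return linearize(parse(text, source), target)

print(translate("2 is even", "Eng", "Fre"))   # 2 est pair
print(translate("4 är jämnt", "Swe", "Ger"))  # 4 ist gerade
```

<p>Adding a language amounts to adding one entry to the table while the tree type stays fixed; that separation is what makes translation via the abstract syntax possible.</p>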
<p>

Is it really so simple?

<!-- NEW -->

<h2>Difficulties with concrete syntax</h2>

Most languages have rules of <b>inflection</b>, <b>agreement</b>,
and <b>word order</b>, which have to be obeyed when putting together
expressions.

<p>

The previous multilingual grammar breaks these rules in many situations:

<p><i>
2 and 3 is even<br>
la somme de 3 et de 5 est pair<br>
wenn 2 ist gerade, dann 2+2 ist gerade<br>
om 2 är jämnt, 2+2 är jämnt<br>
</i>

<!-- NEW -->

<h2>Solving the difficulties</h2>

GF has tools for expressing the linguistic rules that are needed to
produce correct translations in different languages.

<p>

Instead of just strings, we need <b>parameters</b>, <b>tables</b>,
and <b>record types</b>. For instance, French:
<pre>
param Mod = Ind | Subj ;
param Gen = Masc | Fem ;

lincat Nat = {s : Str ; g : Gen} ;
lincat Prop = {s : Mod => Str} ;

lin Even x = {s =
  table {
    m => x.s ++
         case m of {Ind => "est" ; Subj => "soit"} ++
         case x.g of {Masc => "pair" ; Fem => "paire"}
    }
  } ;
</pre>

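<p>The French rule can be mimicked in Python to show what parameters and tables buy us. In this sketch (an illustration, not how GF evaluates tables) <tt>Mod</tt> and <tt>Gen</tt> become enums, a table becomes a dictionary keyed by a parameter value, and <tt>case</tt> becomes a dictionary lookup; the record passed in for <i>la somme de 3 et de 5</i> is a made-up example.</p>

```python
from enum import Enum

# Toy model of: param Mod = Ind | Subj ; param Gen = Masc | Fem ;
class Mod(Enum):
    IND = "Ind"
    SUBJ = "Subj"

class Gen(Enum):
    MASC = "Masc"
    FEM = "Fem"

def lin_even(x: dict) -> dict:
    """Model of `lin Even x`: returns a record whose field s is a table Mod => Str.

    `x` plays the role of the Nat record {s : Str ; g : Gen}.
    """
    verb = {Mod.IND: "est", Mod.SUBJ: "soit"}           # case m of {...}
    adj = {Gen.MASC: "pair", Gen.FEM: "paire"}[x["g"]]  # case x.g of {...}
    # One entry per value of Mod, like `table { m => ... }`.
    return {"s": {m: f"{x['s']} {verb[m]} {adj}" for m in Mod}}

# "la somme" is feminine, so the adjective must agree: "paire".
somme = {"s": "la somme de 3 et de 5", "g": Gen.FEM}
prop = lin_even(somme)
print(prop["s"][Mod.IND])   # la somme de 3 et de 5 est paire
print(prop["s"][Mod.SUBJ])  # la somme de 3 et de 5 soit paire
```

<p>Because the gender travels with the <tt>Nat</tt> record, the feminine <i>somme</i> now selects <i>paire</i>, and the mood table lets a caller ask for <i>soit</i> wherever French requires the subjunctive.</p>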
<!-- NEW -->

<h2>Language + Libraries</h2>

Writing natural-language grammars still requires
theoretical knowledge about the language.

<p>

Which kind of programmer is easier to find?
<ul>
<li> one who can write a sorting algorithm
<li> one who can write a grammar for Swedish determiners
</ul>

<p>

In mainstream programming, sorting algorithms are not
written by hand but taken from <b>libraries</b>.

<p>

In the same way, we want to create grammar libraries that encapsulate
basic linguistic facts.

<p>

Cf. the Java success story: the language is only half of the
success; the libraries are the other half.

<!-- NEW -->

<h2>Example of library-based grammar writing</h2>

To define a Swedish expression of a mathematical predicate from scratch:
<pre>
lin Even x =
  let jämn = case &lt;x.n,x.g&gt; of {
        &lt;Sg,Utr&gt;   => "jämn" ;
        &lt;Sg,Neutr&gt; => "jämnt" ;
        &lt;Pl,_&gt;     => "jämna"
        }
  in
  {s = table {
     Main => x.s ! Nom ++ "är" ++ jämn ;
     Inv  => "är" ++ x.s ! Nom ++ jämn ;
     Sub  => x.s ! Nom ++ "är" ++ jämn
     }
  } ;
</pre>
To use library functions for syntax and morphology:
<pre>
lin Even = predA (regA "jämn") ;
</pre>

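<p>The point of the second version is that the agreement table and the word-order table are written once, inside the library. A toy Python model, with hypothetical functions <tt>reg_a</tt> and <tt>pred_a</tt> standing in for the GF library's <tt>regA</tt> and <tt>predA</tt> (and a simplified regular-adjective rule: <i>-t</i> for neuter singular, <i>-a</i> for plural), suggests how the one-line definition can expand into the hand-written one:</p>

```python
# Illustrative stand-ins for regA / predA; not the GF library's code.

def reg_a(base: str) -> dict:
    """Regular Swedish adjective: -t for neuter singular, -a for plural."""
    return {
        ("Sg", "Utr"): base,
        ("Sg", "Neutr"): base + "t",
        ("Pl", "Utr"): base + "a",
        ("Pl", "Neutr"): base + "a",
    }

def pred_a(adj: dict):
    """Build a predicate: subject + 'är' + agreeing adjective, per word order."""
    def lin(x: dict) -> dict:
        form = adj[(x["n"], x["g"])]      # agreement with the subject
        subj = x["s"]["Nom"]              # like x.s ! Nom
        return {"s": {
            "Main": f"{subj} är {form}",  # main clause
            "Inv": f"är {subj} {form}",   # inverted order
            "Sub": f"{subj} är {form}",   # subordinate clause
        }}
    return lin

lin_even = pred_a(reg_a("jämn"))          # lin Even = predA (regA "jämn")

talet = {"s": {"Nom": "talet"}, "n": "Sg", "g": "Neutr"}   # "the number"
print(lin_even(talet)["s"]["Main"])   # talet är jämnt
print(lin_even(talet)["s"]["Inv"])    # är talet jämnt
```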
<!-- NEW -->

<h2>Questions in grammar library design</h2>

What should the library contain?
<ul>
<li> morphology, lexicon, syntax, semantics,...
</ul>

<p>

How do we organize and present the library?
<ul>
<li> division into modules, level of granularity
<li> "school grammar" vs. sophisticated linguistic concepts
</ul>

<p>

Where do we get the data from?
<ul>
<li> automatic extraction or hand-writing?
<li> reuse of existing resources?
</ul>

<p>

Extra constraint: we want open-source free software.

<!-- NEW -->

<h2>The scope of the resource grammar library</h2>

All morphological paradigms

<p>

Basic lexicon of structural, common, and irregular words

<p>

Basic syntactic structures

<p>

Currently,
<ul>
<li> <i>no</i> semantics,
<li> <i>no</i> language-specific structures if not necessary for expressivity.
</ul>

<!-- NEW -->

<h2>Success criteria</h2>

Grammatical correctness

<p>

Semantic coverage: you can express whatever you want.

<p>

Usability as a library for non-linguists.

<p>

(Bonus for linguists:) nice generalizations w.r.t. language
families, using the module system of GF.

<!-- NEW -->

<h2>These are not our success criteria</h2>

Language coverage: you can parse all expressions. Example:
the French <i>passé simple</i> tense, although covered by the
morphology, is not used in the language-independent API;
only the <i>passé composé</i> is.

<p>

Semantic correctness
<pre>
colourless green ideas sleep furiously

the time is seventy past forty-two
</pre>

<p>

(Warning for linguists:) theoretical innovation in
syntax (and it will all be hidden anyway!)

<!-- NEW -->

<h2>So where is semantics?</h2>

GF incorporates a <b>Logical Framework</b> and is therefore
capable of expressing logical semantics <i>à la</i> Montague
or any other flavour, including anaphora and discourse.

<p>

But we do <i>not</i> try to give semantics once and
for all for the whole language.

<p>

Instead, we expect semantics to be given in
<b>application grammars</b> built on semantic models
of different domains.

<p>

Example application: number theory
<pre>
fun Even : Nat -> Prop ;               -- a mathematical predicate

lin Even = predA (regA "even") ;       -- English translation
lin Even = predA (regA "pair") ;       -- French translation
lin Even = predA (regA "jämn") ;       -- Swedish translation
</pre>
How could the resource predict that just <i>these</i>
translations are correct in this domain?

<p>

Application grammars are built by experts in these domains,
who, thanks to resource grammars, no longer need to be
experts in linguistics.

<!-- NEW -->

<h2>Languages</h2>

The current GF Resource Project covers ten languages:
<ul>
<li><tt>Dan</tt>ish
<li><tt>Eng</tt>lish
<li><tt>Fin</tt>nish
<li><tt>Fre</tt>nch
<li><tt>Ger</tt>man
<li><tt>Ita</tt>lian
<li><tt>Nor</tt>wegian
<li><tt>Rus</tt>sian
<li><tt>Spa</tt>nish
<li><tt>Swe</tt>dish
</ul>
The first three letters (<tt>Dan</tt> etc.) are used in grammar module names.

<!-- NEW -->

<h2>Library structure 1: language-independent API</h2>

<ul>
<li> syntactic <tt>Categories</tt> (parts of speech, word classes), e.g.
<pre>
V ; NP ; CN ; Det ;   -- verb, noun phrase, common noun, determiner
</pre>
<li> <tt>Rules</tt> for combining words and phrases, e.g.
<pre>
DetNP : Det -> CN -> NP ;   -- combine Det and CN into NP
</pre>
<li> the most common <tt>Structural</tt> words (determiners,
conjunctions, pronouns), e.g.
<pre>
and_Conj : Conj ;
</pre>
</ul>

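<p>To see what a rule such as <tt>DetNP</tt> has to do, here is a toy Python model for English. The record layout, the regular <i>-s</i> plural and the determiner names <tt>every_Det</tt> and <tt>all_Det</tt> are assumptions for illustration, not the resource's actual linearization types: the determiner carries a number, the noun carries a table over numbers, and the rule picks the matching form.</p>

```python
# Toy model of the API rule  DetNP : Det -> CN -> NP  for English.

def reg_cn(sg: str) -> dict:
    """A common noun with a singular/plural table (regular -s plural only)."""
    return {"s": {"Sg": sg, "Pl": sg + "s"}}

every_Det = {"s": "every", "n": "Sg"}   # hypothetical determiner records
all_Det = {"s": "all", "n": "Pl"}

def det_np(det: dict, cn: dict) -> dict:
    """Combine Det and CN into NP: the determiner selects the noun's number."""
    return {"s": det["s"] + " " + cn["s"][det["n"]], "n": det["n"]}

cat = reg_cn("cat")
print(det_np(every_Det, cat)["s"])   # every cat
print(det_np(all_Det, cat)["s"])     # all cats
```

<p>The resulting noun phrase keeps its number, so a later predication rule could choose between <i>is</i> and <i>are</i>; this is the kind of bookkeeping the API hides from its users.</p>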
<!-- NEW -->

<h2>Library structure 2: language-dependent modules</h2>

<ul>
<li> morphological <tt>Paradigms</tt>, e.g.
<pre>
mkN : Str -> Str -> Str -> Str -> Gender -> N ;   -- worst-case nouns
mkN : Str -> N ;                                  -- regular nouns
</pre>
<li> irregular <tt>Verbs</tt>, e.g.
<pre>
angripa_V = irregV "angripa" "angrep" "angripit" ;
</pre>
<li> <tt>Lexicon</tt> of frequent words
<pre>
man_N = mkN "man" "mannen" "män" "männen" masculine ;
</pre>
<li> <tt>Ext</tt>ended syntax with language-specific rules
<pre>
PassBli : V2 -> NP -> VP ;   -- bli överkörd av ngn ("be run over by sb")
</pre>
</ul>

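<p>The difference between a worst-case and a regular paradigm can be sketched in Python. <tt>mk_n_worst</tt> mirrors the four-form <tt>mkN</tt> above; <tt>mk_n_reg</tt> is a hypothetical one-argument version that only knows one Swedish noun class (common-gender nouns in <i>-a</i>, such as <i>flicka</i>) and is a deliberate simplification, not the library's regular-noun rule. Gender values are plain strings here.</p>

```python
# Toy worst-case and regular noun paradigms for Swedish.

FORMS = ("sg_indef", "sg_def", "pl_indef", "pl_def")

def mk_n_worst(sg_indef, sg_def, pl_indef, pl_def, gender):
    """Worst case: every form is given explicitly."""
    return {"s": dict(zip(FORMS, (sg_indef, sg_def, pl_indef, pl_def))),
            "g": gender}

def mk_n_reg(word: str):
    """Regular case: derive all forms from one string (flicka-type nouns only)."""
    if not word.endswith("a"):
        raise ValueError("this sketch only handles nouns ending in -a")
    stem = word[:-1]
    return mk_n_worst(word, word + "n", stem + "or", stem + "orna", "utrum")

man_N = mk_n_worst("man", "mannen", "män", "männen", "masculine")
flicka_N = mk_n_reg("flicka")
print(flicka_N["s"]["pl_def"])   # flickorna
```

<p>The trade-off is the usual one for paradigms: the worst-case function always applies but is verbose, while the one-argument function is convenient but only correct for the words its rule anticipates.</p>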
<!-- NEW -->

<h2>How much can be language-independent?</h2>

For the ten languages we have considered, it <i>is</i> possible
to implement the current API.

<p>

Reservations:
<ul>
<li> does not necessarily extend to all other languages
<li> does not necessarily cover the most idiomatic expressions
of each language
<li> may not be the easiest API to implement (e.g. negation and
inversion with <i>do</i> in English suggest that some other
structure would be more natural)
<li> does not guarantee that the same structure has the same semantics
in different languages
</ul>

<!-- NEW -->

<h2>Library structure: language-independent API</h2>

<center>
<img src="Resource.gif">
</center>

<!-- NEW -->

<h2>Library structure: test bed for the language-independent API</h2>

<center>
<img src="Lang.gif">
</center>

<!-- NEW -->

<h2>API documentation</h2>

<a href="Categories.html">Categories</a>

<p>
<a href="Rules.html">Rules</a>

<p>
Alternative views on sentence formation:
<a href="Clause.html">Clause</a>,
<a href="Verbphrase.html">Verbphrase</a>

<p>
<a href="Structural.html">Structural</a>

<p>
<a href="Time.html">Time</a>

<p>
<a href="Basic.html">Basic</a>

<p>
<a href="Lang.html">Lang</a>

<p>

<!-- NEW -->

<h2>Paradigms documentation</h2>

<a href="ParadigmsEng.html">English paradigms</a>
<br>
<a href="BasicEng.html">example use of English paradigms</a>
<br>
<a href="VerbsEng.html">English verbs</a>

<p>
<a href="ParadigmsFre.html">French paradigms</a>
<br>
<a href="BasicFre.html">example use of French paradigms</a>
<br>
<a href="VerbsFre.html">French verbs</a>

<p>
<a href="ParadigmsIta.html">Italian paradigms</a>
<br>
<a href="BasicIta.html">example use of Italian paradigms</a>
<br>
<a href="BeschIta.html">Italian verb conjugations</a>

<p>
<a href="ParadigmsNor.html">Norwegian paradigms</a>
<br>
<a href="BasicNor.html">example use of Norwegian paradigms</a>
<br>
<a href="VerbsNor.html">Norwegian verbs</a>

<p>
<a href="ParadigmsSpa.html">Spanish paradigms</a>
<br>
<a href="BeschSpa.html">Spanish verb conjugations</a>

<p>
<a href="ParadigmsSwe.html">Swedish paradigms</a>
<br>
<a href="BasicSwe.html">example use of Swedish paradigms</a>
<br>
<a href="VerbsSwe.html">Swedish verbs</a>

<!-- NEW -->

<h2>Use as top-level grammar: testing</h2>

Import a set of <tt>LangX</tt> grammars:
<pre>
i english/LangEng.gf
i swedish/LangSwe.gf
</pre>
Test with random generation, translation, morphological analysis...
<pre>
</pre>

<!-- NEW -->

<h2>Use as top-level grammar: language learning quizzes</h2>

Morpho quiz with words:
<pre>
i
</pre>
Morpho quiz with phrases:
<pre>
</pre>
Translation quiz with sentences:
<pre>
</pre>

<!-- NEW -->

<h2>Use as library</h2>

Import directly by <tt>open</tt>:
<pre>
concrete AppNor of App = open LangNor, ParadigmsNor in {...}
</pre>
No more dummy <tt>reuse</tt> modules and bulky <tt>.gfr</tt> files!

<p>

If you need to convert resource category records to strings, use
<pre>
Predef.toStr : (L : Type) -> L -> Str ;
</pre>
<tt>L</tt> must be a linearization type. For instance,
<pre>
toStr LangNor.CN (ModAP (PositADeg old_ADeg) (UseN car_N))
  ---> "gammel bil"
</pre>

<!-- NEW -->

<h2>Use as library through parser</h2>

Use the parser when developing a resource.
<pre>
> p -cat=S -v "jag ska åka till Chalmers"
unknown tokens [TS "åka",TS "Chalmers"]

> p -cat=S "jag ska gå till Danmark"
UseCl (PosTP TFuture ASimul)
  (AdvCl (SPredV i_NP go_V)
    (AdvPP (PrepNP to_Prep (UsePN (PNCountry Denmark)))))
</pre>
Extend the vocabulary as needed.
<pre>
åka_V = lexV "åker" ;
Chalmers = regPN "Chalmers" neutrum ;
</pre>

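<p>The first answer above can be understood with a toy model: every input token has to be covered by some lexical entry before syntactic rules can apply. This Python sketch is not GF's parser, and its word list is a made-up fragment, but it mirrors the token check behind the first result and the effect of extending the vocabulary.</p>

```python
# Toy model of a lexical check run before parsing.
# LEXICON is a hypothetical fragment, not GF's Swedish lexicon.

LEXICON = {"jag", "ska", "gå", "till", "Danmark"}

def unknown_tokens(sentence: str, lexicon: set) -> list:
    """Return the tokens that no lexical entry covers, in input order."""
    return [tok for tok in sentence.split() if tok not in lexicon]

print(unknown_tokens("jag ska åka till Chalmers", LEXICON))  # ['åka', 'Chalmers']
print(unknown_tokens("jag ska gå till Danmark", LEXICON))    # []

# Extending the vocabulary (as with åka_V and Chalmers)
# makes the first sentence lexically parsable too.
LEXICON |= {"åka", "Chalmers"}
print(unknown_tokens("jag ska åka till Chalmers", LEXICON))  # []
```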
<!-- NEW -->

<h2>Example application: a small translation system</h2>

You can say things like the following:
<pre>
who chases mice ?
whom does the lion chase ?
the dog chases cats
</pre>
Source modules:

<p>
<a href=example/Animals.gf>Animals</a>

<p>
<a href=example/AnimalsEng.gf>AnimalsEng</a>

<p>
<a href=example/AnimalsFre.gf>AnimalsFre</a>

<p>
<a href=example/AnimalsSwe.gf>AnimalsSwe</a>

<!-- NEW -->

<h2>Compiling the example application</h2>

The resources are bulky, and it therefore takes a lot of
time and memory to load the grammars. However, they can be
compiled into the <tt>gfcm</tt>
(<b>GF canonical multilingual</b>) format,
which for this set of grammars is almost one thousand times
smaller and faster to load.

<p>

Just issue the following GF commands:
<pre>
i -src AnimalsEng.gf ;; s
i -src AnimalsFre.gf ;; s
i -src AnimalsSwe.gf ;; s
pm | wf animals.gfcm
</pre>
and you get an end-user grammar <tt>animals.gfcm</tt>.

<p>

You can also write the commands in a <tt>gfs</tt> (<b>GF script</b>)
file, say
<a href=mkAnimals.gfc><tt>mkAnimals.gfs</tt></a>,
and then call GF with
<pre>
gf &lt;mkAnimals.gfs
</pre>

<!-- NEW -->

<h2>Further simplifications of the application grammar</h2>

Step 1: use simplified access to present-tense sentences,
<tt>SentenceX</tt> (to be written...)

<p>

Step 2: factor out the categories and purely combinational
rules into an <tt>incomplete</tt> module (to be shown...). But
this does not work for French, which uses different structures:
e.g. <i>Qui aime les lions ?</i> with a definite phrase
where English has <i>Who loves lions?</i>

<!-- NEW -->

<h2>Implementation details: the structure of low-level files</h2>

<center>
<img src="Low.gif">
</center>

<!-- NEW -->

<h2>The use of parametrized modules</h2>

In two language families:
<ul>
<li> Romance: French, Italian, Spanish
<li> Scandinavian: Danish, Norwegian, Swedish
</ul>
<center>
<img src="Scand.gif">
</center>

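<p>The idea behind a parametrized module can be sketched in Python as a function from language-specific differences to a set of shared rules. The single shared rule and the <tt>DIFF</tt> dictionaries below are hypothetical and far smaller than GF's actual Scandinavian modules; in this sketch only the copula varies between the three languages.</p>

```python
# Toy model of a functor: shared rules written once, instantiated per language.

def make_scand(diff: dict) -> dict:
    """Instantiate the shared rules with one language's differences."""
    def pred_a(subj: str, adj: str) -> str:
        # Same word order for all three languages; only the copula differs.
        return f"{subj} {diff['copula']} {adj}"
    return {"pred_a": pred_a}

DIFF = {
    "Dan": {"copula": "er"},
    "Nor": {"copula": "er"},
    "Swe": {"copula": "är"},
}
SCAND = {lang: make_scand(d) for lang, d in DIFF.items()}

print(SCAND["Swe"]["pred_a"]("bilen", "gammal"))  # bilen är gammal
print(SCAND["Dan"]["pred_a"]("bilen", "gammel"))  # bilen er gammel
```

<p>The shared word order lives in one place; a realistic family module would also have to factor out inflection and the other points where the languages differ.</p>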
<!-- NEW -->

<h2>Current status</h2>

<table border=1>
<tr><td>Language</td> <td>v0.6</td> <td>API</td> <td>Paradigms</td> <td>Basic lexicon</td> <td>Verbs</td></tr>
<tr><td>Danish</td> <td> </td> <td>X</td> <td> </td> <td> </td> <td> </td></tr>
<tr><td>English</td> <td>X</td> <td>X</td> <td>X</td> <td>X</td> <td>X</td></tr>
<tr><td>Finnish</td> <td>X</td> <td> </td> <td> </td> <td> </td> <td> </td></tr>
<tr><td>French</td> <td>X</td> <td>X</td> <td>X</td> <td>X</td> <td>X</td></tr>
<tr><td>German</td> <td>X</td> <td> </td> <td>*</td> <td> </td> <td> </td></tr>
<tr><td>Italian</td> <td>X</td> <td>X</td> <td>X</td> <td>X</td> <td>X</td></tr>
<tr><td>Norwegian</td> <td> </td> <td>X</td> <td>X</td> <td>X</td> <td>X</td></tr>
<tr><td>Russian</td> <td>X</td> <td>*</td> <td>*</td> <td> </td> <td> </td></tr>
<tr><td>Spanish</td> <td> </td> <td>X</td> <td>X</td> <td> </td> <td>X</td></tr>
<tr><td>Swedish</td> <td>X</td> <td>X</td> <td>X</td> <td>X</td> <td>X</td></tr>
</table>

<!-- NEW -->

<h2>Obtaining it</h2>

Now on CVS at Chalmers:
<pre>
cvs -d /users/cs/aarne/cvs checkout GF2.0/lib
</pre>

<p>

To appear later on the GF homepage:<p>

<a href="http://www.cs.chalmers.se/%7Eaarne/GF">
<tt>http://www.cs.chalmers.se/~aarne/GF</tt></a>

</p></body></html>