<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
|
|
<html><head><title></title></head>
|
|
<body bgcolor="#ffffff" text="#000000">
|
|
<center>
|
|
|
|
<img src="gf-logo.gif">
|
|
|
|
<h1>The GF Resource Grammar Library</h1>
|
|
|
|
<p>
|
|
|
|
<font size=2>
|
|
|
|
|
|
Fifth Version, 18 January 2006 (some notes on v 1.0).
|
|
<br>
|
|
Fourth Version, 2 November 2005.
|
|
<br>
|
|
Third Version, 22 May 2005. Completed 1 July.
<br>
Second Version, 1 March 2005.
<br>
First Draft, 7 February 2005.
|
|
</font>
|
|
|
|
</p><p>
|
|
|
|
Aarne Ranta
|
|
|
|
</p><p>
|
|
|
|
<tt>aarne@cs.chalmers.se</tt>
|
|
</p></center>
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>GF = Grammatical Framework</h2>
|
|
|
|
GF is a grammar formalism based on functional programming and type theory.
|
|
|
|
<p>
|
|
|
|
GF was designed to be nice for <i>ordinary programmers</i> to use: by this
|
|
we mean programmers without training in linguistics.
|
|
|
|
<p>
|
|
|
|
The mission of GF is to make natural-language applications available for
|
|
ordinary programmers, in tasks like
|
|
<ul>
|
|
<li> software documentation
|
|
<li> domain-specific translation
|
|
<li> human-computer interaction
|
|
<li> dialogue systems
|
|
</ul>
|
|
Thus GF is <i>not</i> primarily another theoretical framework for
|
|
linguists.
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Multilingual grammars</h2>
|
|
|
|
A GF grammar consists of an abstract syntax and a set
|
|
of concrete syntaxes.
|
|
|
|
<p>
|
|
|
|
<b>Abstract syntax</b>: language-independent representation
|
|
<pre>
|
|
cat Prop ; Nat ;
|
|
fun Even : Nat -> Prop ;
|
|
fun NInt : Int -> Nat ;
|
|
</pre>
|
|
<b>Concrete syntax</b>: mapping from abstract syntax trees to strings in a language
|
|
(English, French, German, Swedish,...)
|
|
<pre>
|
|
lin Even x = {s = x.s ++ "is" ++ "even"} ;
|
|
|
|
lin Even x = {s = x.s ++ "est" ++ "pair"} ;
|
|
|
|
lin Even x = {s = x.s ++ "ist" ++ "gerade"} ;
|
|
|
|
lin Even x = {s = x.s ++ "är" ++ "jämnt"} ;
|
|
</pre>
|
|
We can <b>translate</b> between languages via the abstract syntax:
|
|
<pre>
|
|
  4 is even             4 ist gerade
            \          /
           Even (NInt 4)
            /          \
  4 est pair            4 är jämnt
|
|
</pre>
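<p>
As a minimal sketch, the rules above could be packaged into complete modules
along the following lines. The module names <tt>Arith</tt> and <tt>ArithEng</tt>
are hypothetical, chosen only for this illustration, and each module would
normally live in its own file.
<pre>
  abstract Arith = {
    cat Prop ; Nat ;
    fun Even : Nat -> Prop ;
    fun NInt : Int -> Nat ;
  }

  -- English; the other languages differ only in the strings used for Even
  concrete ArithEng of Arith = {
    lincat Prop = {s : Str} ;
    lincat Nat  = {s : Str} ;
    lin Even x = {s = x.s ++ "is" ++ "even"} ;
    lin NInt i = {s = i.s} ;  -- an integer literal is linearized as itself
  }
</pre>
With such modules, linearizing the tree <tt>Even (NInt 4)</tt> in the English
concrete syntax is expected to give <i>4 is even</i>, as in the diagram above.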
|
|
|
|
<p>
|
|
|
|
But is it really so simple?
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Difficulties with concrete syntax</h2>
|
|
|
|
Most languages have rules of <b>inflection</b>, <b>agreement</b>,
|
|
and <b>word order</b>, which have to be obeyed when putting together
|
|
expressions.
|
|
|
|
<p>
|
|
|
|
The previous multilingual grammar breaks these rules in many situations:
|
|
<p><i>
|
|
2 and 3 is even<br>
|
|
la somme de 3 et de 5 est pair<br>
|
|
wenn 2 ist gerade, dann 2+2 ist gerade<br>
|
|
om 2 är jämnt, 2+2 är jämnt<br>
|
|
</i>
|
|
All these sentences are grammatically incorrect.
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Solving the difficulties</h2>
|
|
|
|
GF has tools for expressing the linguistic rules that are needed to
|
|
produce correct translations in different languages.
|
|
|
|
<p>
|
|
|
|
Instead of just strings, we need <b>parameters</b>, <b>tables</b>,
|
|
and <b>record types</b>. For instance, French:
|
|
<pre>
|
|
param Mod = Ind | Subj ;
|
|
param Gen = Masc | Fem ;
|
|
|
|
lincat Nat = {s : Str ; g : Gen} ;
|
|
lincat Prop = {s : Mod => Str} ;
|
|
|
|
  lin Even x = {s =
    table {
      m => x.s ++
           case m of {Ind => "est" ; Subj => "soit"} ++
           case x.g of {Masc => "pair" ; Fem => "paire"}
      }
    } ;
|
|
</pre>
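<p>
To see how the gender parameter does its work, here is a sketch of two rules
that could supply the <tt>Nat</tt> arguments. The function <tt>Sum</tt> is
hypothetical (it is not part of the abstract syntax shown earlier) and would
need the declaration given in the comment.
<pre>
  -- assumed abstract syntax:  fun Sum : Nat -> Nat -> Nat ;

  lin NInt i  = {s = i.s ; g = Masc} ;
  lin Sum x y = {s = "la" ++ "somme" ++ "de" ++ x.s ++ "et" ++ "de" ++ y.s ;
                 g = Fem} ;
</pre>
Because <i>somme</i> is feminine, <tt>Even (Sum (NInt 3) (NInt 5))</tt> selects
<i>paire</i> in both moods: the <tt>Ind</tt> form reads
<i>la somme de 3 et de 5 est paire</i>, and the <tt>Subj</tt> form uses
<i>soit</i> in place of <i>est</i>.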
|
|
To learn more about these constructs, consult GF documentation, e.g. the
|
|
<a href="../../../doc/tutorial/gf-tutorial2.html">New Grammarian's Tutorial</a>.
|
|
However, in what follows we will show how to avoid learning them and
|
|
still write linguistically correct grammars.
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Language + Libraries</h2>
|
|
|
|
Writing natural language grammars still requires
|
|
theoretical knowledge about the language.
|
|
|
|
<p>
|
|
|
|
Which kind of programmer is easier to find?
|
|
<ul>
|
|
<li> one who can write a sorting algorithm
|
|
<li> one who can write a grammar for Swedish determiners
|
|
</ul>
|
|
|
|
<p>
|
|
|
|
In mainstream programming, sorting algorithms are not
|
|
written by hand but taken from <b>libraries</b>.
|
|
|
|
<p>
|
|
|
|
In the same way, we want to create grammar libraries that encapsulate
|
|
basic linguistic facts.
|
|
|
|
<p>
|
|
|
|
Cf. the Java success story: the language is only half of the
success; the libraries are the other half.
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Example of library-based grammar writing</h2>
|
|
|
|
To define a Swedish expression of a mathematical predicate from scratch:
|
|
<pre>
|
|
  Even x =
    let jämn = case &lt;x.n,x.g&gt; of {
      &lt;Sg,Utr&gt;   => "jämn" ;
      &lt;Sg,Neutr&gt; => "jämnt" ;
      &lt;Pl,_&gt;     => "jämna"
      }
    in
    {s = table {
      Main => x.s ! Nom ++ "är" ++ jämn ;
      Inv  => "är" ++ x.s ! Nom ++ jämn ;
      Sub  => x.s ! Nom ++ "är" ++ jämn
      }
    }
|
|
</pre>
|
|
To use library functions for syntax and morphology:
|
|
<pre>
|
|
Even = predA (regA "jämn") ;
|
|
</pre>
|
|
For the French version, we write
|
|
<pre>
|
|
Even = predA (regA "pair") ;
|
|
</pre>
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Questions in grammar library design</h2>
|
|
|
|
What should there be in the library?
|
|
<br>
|
|
<li> morphology, lexicon, syntax, semantics,...
|
|
|
|
<p>
|
|
|
|
How do we organize and present the library?
|
|
<br>
|
|
<li> division into modules, level of granularity
|
|
<br>
|
|
<li> "school grammar" vs. sophisticated linguistic concepts
|
|
|
|
<p>
|
|
|
|
Where do we get the data from?
|
|
<br>
|
|
<li> automatic extraction or hand-writing?
|
|
<br>
|
|
<li> reuse of existing resources?
|
|
<br>
|
|
Extra constraint: we want open-source free software and
|
|
hence cannot use existing proprietary resources.
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Answers to questions in grammar library design</h2>
|
|
|
|
The current GF resource grammar library has
|
|
made the following decisions:
|
|
<p>
|
|
The library has, for each language
|
|
<br>
|
|
<li> complete morphology, some lexicon (500 words), representative fragment of syntax,
|
|
very little semantics
|
|
|
|
<p>
|
|
|
|
Organization and presentation:
|
|
<br>
|
|
<li> division into top-level (API) modules, and internal modules (only
|
|
interesting for resource implementors)
|
|
<br>
|
|
<li> the API is, as much as possible, common in different languages
|
|
<br>
|
|
<li> we favour "school grammar" concepts rather than innovative linguistic theory
|
|
|
|
<p>
|
|
|
|
Where do we get the data from?
|
|
<br>
|
|
<li> morphology and syntax are hand-written
|
|
<br>
|
|
<li> the 500-word lexicon is hand-written, but a tool is provided
|
|
for automatic lexicon extraction
|
|
<br>
|
|
<li> we have not reused existing resources
|
|
<br>
|
|
The resource grammar library is entirely
|
|
open-source free software (under GNU GPL license).
|
|
|
|
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>The scope of a resource grammar library for a language</h2>
|
|
|
|
All morphological paradigms
|
|
|
|
<p>
|
|
|
|
Basic lexicon of structural, common, and irregular words
|
|
|
|
<p>
|
|
|
|
Basic syntactic structures
|
|
|
|
<p>
|
|
|
|
Currently,<br>
|
|
<li> <i>no</i> semantics,<br>
|
|
<li> <i>no</i> language-specific structures if not necessary for expressivity.
|
|
|
|
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Success criteria</h2>
|
|
|
|
Grammatical correctness
|
|
|
|
<p>
|
|
|
|
Semantic coverage: you can express whatever you want.
|
|
|
|
<p>
|
|
|
|
Usability as library for non-linguists.
|
|
|
|
<p>
|
|
|
|
(Bonus for linguists:) nice generalizations w.r.t. language
|
|
families, using the module system of GF.
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>These are not our success criteria</h2>
|
|
|
|
Language coverage: to be able to parse all expressions.
|
|
<br>
|
|
Example:
|
|
the French <i>passé simple</i> tense, although covered by the
|
|
morphology, is not used in the language-independent API, but
|
|
only the <i>passé composé</i> is. However, an application
|
|
accessing the French-specific (or Romance-specific)
|
|
modules can use the passé simple.
|
|
|
|
<p>
|
|
|
|
Semantic correctness: only to produce meaningful expressions.
|
|
<br>
|
|
Example: the following sentences can be generated
|
|
<pre>
|
|
colourless green ideas sleep furiously
|
|
|
|
the time is seventy past forty-two
|
|
</pre>
|
|
However, an application grammar can use a domain-specific
|
|
semantics to guarantee semantic well-formedness.
|
|
|
|
<p>
|
|
|
|
(Warning for linguists:) theoretical innovation in
|
|
syntax is not among the goals
|
|
(and it would be hidden from users anyway!).
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>So where is semantics?</h2>
|
|
|
|
GF incorporates a <b>Logical Framework</b> and is therefore
|
|
capable of expressing logical semantics <i>à la</i> Montague
|
|
or any other flavour, including anaphora and discourse.
|
|
|
|
<p>
|
|
|
|
But we do <i>not</i> try to give semantics once and
|
|
for all for the whole language.
|
|
|
|
<p>
|
|
|
|
Instead, we expect semantics to be given in
|
|
<b>application grammars</b> built on semantic models
|
|
of different domains.
|
|
|
|
<p>
|
|
|
|
Example application: number theory
|
|
<pre>
|
|
fun Even : Nat -> Prop ; -- a mathematical predicate
|
|
|
|
lin Even = predA (regA "even") ; -- English translation
|
|
lin Even = predA (regA "pair") ; -- French translation
|
|
lin Even = predA (regA "jämn") ; -- Swedish translation
|
|
</pre>
|
|
How could the resource predict that just <i>these</i>
|
|
translations are correct in this domain?
|
|
|
|
<p>
|
|
|
|
Application grammars are built by experts of these domains
|
|
who, thanks to resource grammars, no longer need to be
|
|
experts in linguistics.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Languages</h2>
|
|
|
|
The current GF Resource Project covers ten languages:
|
|
<ul>
|
|
<li><tt>Dan</tt>ish
|
|
<li><tt>Eng</tt>lish
|
|
<li><tt>Fin</tt>nish
|
|
<li><tt>Fre</tt>nch
|
|
<li><tt>Ger</tt>man
|
|
<li><tt>Ita</tt>lian
|
|
<li><tt>Nor</tt>wegian
|
|
<li><tt>Rus</tt>sian
|
|
<li><tt>Spa</tt>nish
|
|
<li><tt>Swe</tt>dish
|
|
</ul>
|
|
The first three letters (<tt>Dan</tt> etc.) are used in grammar module names.
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Library structure 1: language-independent API</h2>
|
|
|
|
|
|
<li> <tt>Lang</tt> is the top module collecting all of the following.
|
|
|
|
<p>
|
|
|
|
<li> syntactic <tt>Categories</tt> (parts of speech, word classes), e.g.
|
|
<pre>
|
|
V ; NP ; CN ; Det ; -- verb, noun phrase, common noun, determiner
|
|
</pre>
|
|
<li> <tt>Rules</tt> for combining words and phrases, e.g.
|
|
<pre>
|
|
DetNP : Det -> CN -> NP ; -- combine Det and CN into NP
|
|
</pre>
|
|
<li> the most common <tt>Structural</tt> words (determiners,
|
|
conjunctions, pronouns) (now 83), e.g.
|
|
<pre>
|
|
and_Conj : Conj ;
|
|
</pre>
|
|
<li> <tt>Numerals</tt>, number words from 1 to 999,999 with their
|
|
inflections, e.g.
|
|
<pre>
|
|
n8 : Digit ;
|
|
</pre>
|
|
<li> <tt>Basic</tt> lexicon of (now 218) frequent everyday words
|
|
<pre>
|
|
man_N : N ;
|
|
</pre>
|
|
|
|
<p>
|
|
|
|
In addition, and not included in <tt>Lang</tt>, there is
|
|
<li> <tt>SwadeshLex</tt>, lexicon of (now 206) words from the
|
|
<a href="http://en.wiktionary.org/wiki/Swadesh_List">Swadesh list</a>, e.g.
|
|
<pre>
|
|
squeeze_V : V ;
|
|
</pre>
|
|
Of course, there is some overlap between <tt>SwadeshLex</tt> and the other modules.
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Library structure 2: language-dependent modules</h2>
|
|
|
|
<li> morphological <tt>Paradigms</tt>, e.g. Swedish
|
|
<pre>
|
|
mkN : Str -> Str -> Str -> Str -> Gender -> N ; -- worst-case nouns
|
|
mkN : Str -> N ; -- regular nouns
|
|
</pre>
|
|
<li> (in some languages) irregular <tt>Verbs</tt>, e.g.
|
|
<pre>
|
|
angripa_V = irregV "angripa" "angrep" "angripit" ;
|
|
</pre>
|
|
<li> (not yet available) <tt>Ext</tt>ended syntax with language-specific rules
|
|
<pre>
|
|
PassBli : V2 -> NP -> VP ; -- bli överkörd av ngn
|
|
</pre>
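<p>
As an illustration of the worst-case noun paradigm above, the following sketch
assumes that the four strings are the singular indefinite, singular definite,
plural indefinite, and plural definite forms, in that order, and that the
gender constants are called <tt>utrum</tt> and <tt>neutrum</tt>. Both points
are assumptions and should be checked against <tt>ParadigmsSwe</tt>; the
constant names <tt>bil_N</tt> and <tt>hus_N</tt> are illustrative.
<pre>
  bil_N = mkN "bil" "bilen" "bilar" "bilarna" utrum ;    -- car
  hus_N = mkN "hus" "huset" "hus"   "husen"   neutrum ;  -- house
</pre>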
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>How much can be language-independent?</h2>
|
|
|
|
For the ten languages we have considered, it <i>is</i> possible
|
|
to implement the current API.
|
|
|
|
<p>
|
|
|
|
Reservations:
|
|
<ul>
|
|
<li> this does not necessarily extend to all other languages
|
|
<li> this does not necessarily cover the most idiomatic expressions
|
|
of each language
|
|
<li> this may not be the easiest API to implement (e.g. negation and
|
|
inversion with <i>do</i> in English suggest that some other
|
|
structure would be more natural)
|
|
<li> it is not guaranteed that the same structure has the same semantics
|
|
in all different languages
|
|
</ul>
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Library structure: language-independent API</h2>
|
|
|
|
<center>
|
|
<img src="Lang.gif">
|
|
</center>
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>API documentation</h2>
|
|
|
|
<a href="Categories.html">Categories</a>
|
|
|
|
<p>
|
|
<a href="Rules.html">Rules</a>
|
|
|
|
<p>
|
|
Two alternative views on sentence formation by predication:
|
|
<a href="Clause.html">Clause</a>,
|
|
<a href="Verbphrase.html">Verbphrase</a>
|
|
|
|
<p>
|
|
<a href="Structural.html">Structural</a>
|
|
|
|
<p>
|
|
|
|
<a href="Time.html">Time</a>
|
|
|
|
<p>
|
|
<a href="Basic.html">Basic</a>
|
|
|
|
<p>
|
|
|
|
<a href="Lang.html">Lang</a>
|
|
|
|
<p>
|
|
|
|
See also <a href="../../resource-1.0/doc/gfdoc">resource v 1.0 documentation</a>,
|
|
now implemented for English, German, and Swedish.
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Paradigms documentation</h2>
|
|
|
|
<a href="ParadigmsEng.html">English paradigms</a>
|
|
<br>
|
|
<a href="BasicEng.html">example use of English paradigms</a>
|
|
<br>
|
|
<a href="VerbsEng.html">English verbs</a>
|
|
|
|
<p>
|
|
|
|
<a href="ParadigmsFin.html">Finnish paradigms</a>
|
|
<br>
|
|
<a href="BasicFin.html">example use of Finnish paradigms</a>
|
|
|
|
<p>
|
|
|
|
<a href="ParadigmsFre.html">French paradigms</a>
|
|
<br>
|
|
<a href="BasicFre.html">example use of French paradigms</a>
|
|
<br>
|
|
<a href="VerbsFre.html">French verbs</a>
|
|
|
|
<p>
|
|
|
|
<a href="ParadigmsIta.html">Italian paradigms</a>
|
|
<br>
|
|
<a href="BasicIta.html">example use of Italian paradigms</a>
|
|
<br>
|
|
<a href="BeschIta.html">Italian verb conjugations</a>
|
|
|
|
<p>
|
|
|
|
<a href="ParadigmsNor.html">Norwegian paradigms</a>
|
|
<br>
|
|
<a href="BasicNor.html">example use of Norwegian paradigms</a>
|
|
<br>
|
|
<a href="VerbsNor.html">Norwegian verbs</a>
|
|
<p>
|
|
|
|
<a href="ParadigmsSpa.html">Spanish paradigms</a>
|
|
<br>
|
|
<a href="BasicSpa.html">example use of Spanish paradigms</a>
|
|
<br>
|
|
<a href="BeschSpa.html">Spanish verb conjugations</a>
|
|
<p>
|
|
|
|
<a href="ParadigmsSwe.html">Swedish paradigms</a>
|
|
<br>
|
|
<a href="BasicSwe.html">example use of Swedish paradigms</a>
|
|
<br>
|
|
<a href="VerbsSwe.html">Swedish verbs</a>
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Use as top-level grammar: testing</h2>
|
|
|
|
Import a set of <tt>LangX</tt> grammars:
|
|
<pre>
|
|
i english/LangEng.gf
|
|
i swedish/LangSwe.gf
|
|
</pre>
|
|
Alternatively, you can <tt>make</tt> a precompiled package of
|
|
all the languages by using <tt>lib/resource/Makefile</tt>:
|
|
<pre>
|
|
make
|
|
gf langs.gfcm
|
|
</pre>
|
|
Then you can test with translation, random generation, morphological analysis...
|
|
<pre>
|
|
> p -lang=LangEng "I have loved her." | l -lang=LangFre
|
|
Je l' ai aimée.
|
|
|
|
> gr -cat=NP | l -multi
|
|
The sock
|
|
Strumpan
|
|
Strømpen
|
|
La media
|
|
La calza
|
|
La chaussette
|
|
Sukka
|
|
</pre>
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Use as top-level grammar: language learning quizzes</h2>
|
|
|
|
Morpho quiz with words (e.g. French verbs):
|
|
<pre>
|
|
i french/VerbsFre.gf
|
|
mq -cat=V
|
|
</pre>
|
|
Morpho quiz with phrases (e.g. Swedish clauses):
|
|
<pre>
|
|
i swedish/LangSwe.gf
|
|
mq -cat=Cl
|
|
</pre>
|
|
Translation quiz with sentences (e.g. sentences from English to Swedish):
|
|
<pre>
|
|
  i english/LangEng.gf
|
|
i swedish/LangSwe.gf
|
|
tq -cat=S LangEng LangSwe
|
|
</pre>
|
|
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Use as library</h2>
|
|
|
|
Import directly by <tt>open</tt>:
|
|
<pre>
|
|
concrete AppNor of App = open LangNor, ParadigmsNor in {...}
|
|
</pre>
|
|
(Note for the users of GF 2.1 and older:
|
|
the dummy <tt>reuse</tt> modules and their bulky <tt>.gfr</tt> versions
|
|
are no longer needed!)
|
|
|
|
<p>
|
|
|
|
If you need to convert resource records to strings, and don't want to know
|
|
the concrete type (as you never should), you can use
|
|
<pre>
|
|
Predef.toStr : (L : Type) -> L -> Str ;
|
|
</pre>
|
|
<tt>L</tt> must be a linearization type. For instance,
|
|
<pre>
|
|
toStr LangNor.CN (ModAP (PositADeg old_ADeg) (UseN car_N))
|
|
---> "gammel bil"
|
|
</pre>
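<p>
Put together, an application grammar using the library might be sketched as
follows. The abstract syntax <tt>App</tt>, its category <tt>Item</tt>, and the
function <tt>OldCar</tt> are hypothetical; the resource expression is the one
from the <tt>toStr</tt> example above.
<pre>
  abstract App = {
    cat Item ;
    fun OldCar : Item ;
  }

  concrete AppNor of App = open LangNor, ParadigmsNor in {
    lincat Item = CN ;  -- reuse the resource type without inspecting it
    lin OldCar = ModAP (PositADeg old_ADeg) (UseN car_N) ;
    -- ParadigmsNor is opened so that new words can be added when needed
  }
</pre>
The application never mentions the record fields of <tt>CN</tt>, so it keeps
compiling if the internal representation in the resource changes.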
|
|
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Use as library through parser</h2>
|
|
|
|
You can use the parser with a <tt>LangX</tt> grammar
|
|
when developing a resource.
|
|
|
|
<p>
|
|
|
|
Using the <tt>-v</tt> option shows if the parser fails because
|
|
of unknown words.
|
|
<pre>
|
|
> p -cat=S -v -lexer=words "jag ska åka till Chalmers"
|
|
unknown tokens [TS "åka",TS "Chalmers"]
|
|
</pre>
|
|
Then try to select words that <tt>LangX</tt> recognizes:
|
|
<pre>
|
|
> p -cat=S "jag ska springa till Danmark"
|
|
UseCl (PosTP TFuture ASimul)
|
|
(AdvCl (SPredV i_NP run_V)
|
|
(AdvPP (PrepNP to_Prep (UsePN (PNCountry Denmark)))))
|
|
</pre>
|
|
Use these API structures and extend vocabulary to match your need.
|
|
<pre>
|
|
åka_V = lexV "åker" ;
|
|
Chalmers = regPN "Chalmers" neutrum ;
|
|
</pre>
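<p>
With these two definitions in place, the term for the original sentence can be
obtained from the parse result above by substituting the new words. This is a
sketch under the assumption that <tt>Chalmers</tt> has type <tt>PN</tt> and can
therefore be given directly to <tt>UsePN</tt>:
<pre>
  UseCl (PosTP TFuture ASimul)
    (AdvCl (SPredV i_NP åka_V)
       (AdvPP (PrepNP to_Prep (UsePN Chalmers))))
</pre>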
|
|
|
|
<!-- NEW -->
|
|
<h2>Syntax editor as library browser</h2>
|
|
|
|
You can run the syntax editor on <tt>LangX</tt> to
|
|
find resource API functions through context-sensitive menus.
|
|
For instance, the shell command
|
|
<pre>
|
|
gfeditor LangEng.gf LangFre.gf
|
|
</pre>
|
|
opens the editor with English and French views. The
|
|
<a href="http://www.cs.chalmers.se/%7Eaarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm">
|
|
Editor User Manual</a> gives more information on the use of the editor.
|
|
|
|
<p>
|
|
|
|
A restriction of the editor is that it does not give access to
|
|
<tt>ParadigmsX</tt> modules. An IDE extending the editor
|
|
to a grammar programming tool is work in progress.
|
|
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Example application: a small translation system</h2>
|
|
|
|
In this system, you can express questions and answers of
|
|
the following forms:
|
|
<pre>
|
|
Who chases mice ?
|
|
Whom does the lion chase ?
|
|
The dog chases cats.
|
|
</pre>
|
|
We build the abstract syntax in two phases:
|
|
<ul>
|
|
<li> <a href=example/Questions.gf>Questions</a> defines question and
|
|
answer forms independently of domain
|
|
<li> <a href=example/Animals.gf>Animals</a> defines a lexicon with
|
|
animals and things that animals do.
|
|
</ul>
|
|
|
|
<p>
|
|
|
|
The concrete syntax of English is built in three phases:
|
|
<ul>
|
|
<li> <a href="example/HandQuestionsI.gf">QuestionsI</a> is a parametrized module
|
|
using the API module <tt>Resource</tt>.
|
|
<li> <a href="example/QuestionsEng.gf">QuestionsEng</a> is an instantiation
|
|
of the API with <tt>ResourceEng</tt>.
|
|
<li> <a href="example/AnimalsEng.gf">AnimalsEng</a> is a concrete syntax
|
|
of <tt>Animals</tt> using <tt>ParadigmsEng</tt> and <tt>VerbsEng</tt>.
|
|
</ul>
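<p>
The division of labour between the first two modules can be sketched with
module headers like these. The bodies are elided and the exact headers in the
linked files may differ, so this only shows the general shape of a parametrized
module and its instantiation.
<pre>
  -- written once, against the language-independent API
  incomplete concrete QuestionsI of Questions = open Resource in {
    -- lincat and lin rules using only functions of Resource
  }

  -- one line per language: plug in the implementation of the API
  concrete QuestionsEng of Questions = QuestionsI with (Resource = ResourceEng) ;
</pre>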
|
|
|
|
<p>
|
|
|
|
The concrete syntax of Swedish is built upon <tt>QuestionsI</tt>
|
|
in a similar way, with the modules
|
|
<a href=example/QuestionsSwe.gf>QuestionsSwe</a> and
|
|
<a href=example/AnimalsSwe.gf>AnimalsSwe</a>.
|
|
|
|
<p>
|
|
|
|
The concrete syntax of French consists similarly of the modules
|
|
<a href=example/QuestionsFre.gf>QuestionsFre</a> and
|
|
<a href=example/AnimalsFre.gf>AnimalsFre</a>.
|
|
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Compiling the example application</h2>
|
|
|
|
The resources are bulky, and it therefore takes a lot of
|
|
time and memory to load the grammars. However, they can be
|
|
compiled into the <tt>gfcm</tt>
|
|
(<b>GF canonical multilingual</b>) format,
|
|
which is almost one thousand times smaller and faster to load
|
|
for this set of grammars.
|
|
|
|
<p>
|
|
|
|
To produce an end-user multilingual grammar <tt>animals.gfcm</tt>,
|
|
write the sequence of compilation commands in a <tt>gfs</tt> (<b>GF script</b>)
|
|
file, say
|
|
<a href="example/mkAnimals.gfs"><tt>mkAnimals.gfs</tt></a>,
|
|
and then call GF with
|
|
<pre>
|
|
  gf &lt;mkAnimals.gfs
|
|
</pre>
|
|
To try out the grammar,
|
|
<pre>
|
|
> i animals.gfcm
|
|
|
|
> gr | l -multi
|
|
vem jagar hundar ?
|
|
qui chasse des chiens ?
|
|
who chases dogs ?
|
|
</pre>
|
|
|
|
|
|
<!-- NEW -->
|
|
|
|
<h2>Grammar writing by examples</h2>
|
|
|
|
(New in GF 2.3)
|
|
|
|
<p>
|
|
|
|
You can use the resource grammar as a parser on a special file format,
|
|
<tt>.gfe</tt> ("GF examples"). Here is the real source,
|
|
<a href="example/QuestionsI.gfe">QuestionsI.gfe</a>, which
|
|
generated
|
|
<a href="example/QuestionsI.gf">QuestionsI.gf</a>
when you executed the GF command
|
|
<pre>
|
|
i -ex AnimalsEng.gf
|
|
</pre>
|
|
Since <tt>QuestionsI</tt> is an incomplete module ("functor"),
|
|
it need only be built once. This is why only the first
|
|
command in <tt>mkAnimals.gfs</tt> needs the flag <tt>-ex</tt>.
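<p>
A script of this kind might look roughly as follows. This is only a sketch:
the actual contents of <tt>mkAnimals.gfs</tt> may differ, and the last line
assumes the GF 2 shell commands <tt>pm</tt> (print the multilingual grammar)
and <tt>wf</tt> (write to a file).
<pre>
  i -ex AnimalsEng.gf
  i AnimalsFre.gf
  i AnimalsSwe.gf
  pm | wf animals.gfcm
</pre>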
|
|
|
|
<p>
|
|
|
|
Of course, the grammar of any language can be created by
|
|
parsing any language, as long as they have a common resource API.
|
|
The English resource is generally recommended, because it
is smaller and faster to parse with than the resources of the other languages.
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Constants and variables in examples</h2>
|
|
|
|
The file <a href="example/QuestionsI.gfe">QuestionsI.gfe</a> uses
|
|
as resource <tt>LangEng</tt>, which contains all resource syntax and
|
|
a lexicon of ca. 300 words. A linearization rule, such as
|
|
<pre>
|
|
lin Who love_V2 man_N = in Phr "who loves men ?" ;
|
|
</pre>
|
|
uses, as its argument variables, constants for words that are found in
the lexicon. This is what makes it possible to parse the example.
|
|
When the resulting rule,
|
|
<pre>
|
|
lin Who love_V2 man_N =
|
|
QuestPhrase (UseQCl (PosTP TPresent ASimul)
|
|
(QPredV2 who8one_IP love_V2 (IndefNumNP NoNum (UseN man_N)))) ;
|
|
</pre>
|
|
is read by the GF compiler, the identifiers <tt>love_V2</tt> and
|
|
<tt>man_N</tt> are not treated as constants, but, following
|
|
the normal binding rules of functional languages, as bound variables.
|
|
This is what gives the example method the generality that is needed.
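<p>
To make the binding explicit: since the names on the left-hand side are bound
variables, they can be renamed without changing the meaning of the rule. The
rule above is thus equivalent to the following, where <tt>v</tt> and <tt>n</tt>
are arbitrary variable names chosen for this illustration:
<pre>
  lin Who v n =
    QuestPhrase (UseQCl (PosTP TPresent ASimul)
      (QPredV2 who8one_IP v (IndefNumNP NoNum (UseN n)))) ;
</pre>
Any verb and noun supplied as arguments to <tt>Who</tt> therefore take the
places that <i>love</i> and <i>man</i> had in the example sentence.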
|
|
|
|
<p>
|
|
|
|
To write linearization rules by examples one thus has to know at
|
|
least one abstract syntax constant for each category for which
|
|
one needs a variable.
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Extending the lexicon on the fly</h2>
|
|
|
|
The greatest limitation of the example method is that the lexicon
|
|
may lack many of the words that are needed in examples. If parsing
|
|
fails because of this, the compiler gives a list of unknown words
|
|
in its error message. An obvious solution is,
|
|
of course, to extend the resource lexicon and try again.
|
|
A more light-weight solution is to add a <b>substitution</b> to
|
|
the example. For instance, if you want the example "the pope"
|
|
but the lexicon does not have the word "pope", you can write
|
|
<pre>
|
|
lin Pope = in NP "the man" {man_N = regN "pope"} ;
|
|
</pre>
|
|
The resulting linearization rule is initially
|
|
<pre>
|
|
lin Pope = DefOneNP (UseN man_N) ;
|
|
</pre>
|
|
but the substitution changes this to
|
|
<pre>
|
|
lin Pope = DefOneNP (UseN (regN "pope")) ;
|
|
</pre>
|
|
In this way, you do not have to extend the resource lexicon, but you
|
|
need to open the Paradigms module to compile the resulting term.
|
|
|
|
<p>
|
|
|
|
Of course, the substituted expressions may come from another language
|
|
than the main language of the example:
|
|
<pre>
|
|
lin Pope = in NP "the man" {man_N = regN "pape" masculine} ;
|
|
</pre>
|
|
If many substitutions are needed, semicolons are used as separators:
|
|
<pre>
|
|
  {man_N = regN "pope" ; walk_V = regV "pray"} ;
|
|
</pre>
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Implementation details: low-level files</h2>
|
|
|
|
<b>For developers of resource grammars.</b>
|
|
The modules listed in this section should never be imported in application
|
|
grammars.
|
|
|
|
<p>
|
|
|
|
Each of the API implementations uses the following auxiliary resource modules:
|
|
<ul>
|
|
<li> <tt>Types</tt>, the morphological paradigms and word classes
|
|
<li> <tt>Morpho</tt>, inflection machinery
|
|
<li> <tt>Syntax</tt>, complex categories and their combinations
|
|
</ul>
|
|
In addition, the following language-independent modules from <tt>lib/prelude</tt>
|
|
are used.
|
|
<ul>
|
|
<li> <tt>Predef</tt>, operations whose definitions are hard-coded in GF
|
|
<li> <tt>Prelude</tt>, generic string and boolean operations
|
|
<li> <tt>Coordination</tt>, coordination structures for arbitrary categories
|
|
</ul>
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Implementation details: the structure of low-level files</h2>
|
|
|
|
<center>
|
|
<img src="Low.gif">
|
|
</center>
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>How to change a resource grammar?</h2>
|
|
|
|
In many cases, the source of a bug is in one of
|
|
the low-level modules. Try to trace it back there
|
|
by starting from the high-level module.
|
|
|
|
<p>
|
|
|
|
(Much more to be written...)
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>How to write a resource grammar?</h2>
|
|
|
|
Start with a more limited goal, e.g. to implement
|
|
the <tt>stoneage</tt> grammar (<tt>examples/stoneage</tt>)
|
|
for your language.
|
|
|
|
<p>
|
|
|
|
For this, you need
|
|
<ul>
|
|
<li> most of <tt>Types</tt>
|
|
<li> most of <tt>Morpho</tt>
|
|
<li> some of <tt>Syntax</tt>
|
|
<li> most of <tt>Paradigms</tt>
|
|
</ul>
|
|
|
|
<p>
|
|
|
|
A useful command to test <tt>oper</tt>s:
|
|
<pre>
|
|
i -retain MorphoRot.gf
|
|
cc regNoun "foo"
|
|
</pre>
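<p>
For the command above to have something to compute, the module must define the
operation being tested. A minimal sketch of such a resource module is given
below; its contents are hypothetical (a toy language that forms plurals by
adding <i>s</i>) and are meant only to show the shape of an <tt>oper</tt>
definition.
<pre>
  resource MorphoRot = {
    param Number = Sg | Pl ;

    oper Noun : Type = {s : Number => Str} ;

    -- regular nouns: the plural is the singular with "s" glued on
    oper regNoun : Str -> Noun = \w ->
      {s = table {Sg => w ; Pl => w + "s"}} ;
  }
</pre>
With this module imported using <tt>-retain</tt>, <tt>cc regNoun "foo"</tt>
should display a record whose table contains <i>foo</i> and <i>foos</i>.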
|
|
|
|
<p>
|
|
|
|
See also <a href="../../resource-1.0/doc/Resource-HOWTO.html">Resource-HOWTO</a>
|
|
(under construction).
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>The use of parametrized modules</h2>
|
|
|
|
In two language families, a lot of code is shared.
|
|
<ul>
|
|
<li> Romance: French, Italian, Spanish
|
|
<li> Scandinavian: Danish, Norwegian, Swedish
|
|
</ul>
|
|
The structure looks like this.
|
|
<center>
|
|
<img src="Scand.gif">
|
|
</center>
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Current status</h2>
|
|
|
|
<table border=1>
|
|
<tr><td>Language</td> <td>v0.6</td> <td>v0.9</td> <td>v1.0</td> <td>Paradigms</td> <td>Basic lex</td> <td>Verbs</td></tr>
|
|
<tr><td>Danish</td>    <td>-</td> <td>X</td> <td>-</td> <td>-</td> <td>-</td> <td>-</td></tr>
<tr><td>English</td>   <td>X</td> <td>X</td> <td>X</td> <td>X</td> <td>X</td> <td>X</td></tr>
<tr><td>Finnish</td>   <td>X</td> <td>+</td> <td>-</td> <td>X</td> <td>X</td> <td>0</td></tr>
<tr><td>French</td>    <td>X</td> <td>X</td> <td>X</td> <td>X</td> <td>X</td> <td>X</td></tr>
<tr><td>German</td>    <td>X</td> <td>-</td> <td>X</td> <td>X</td> <td>-</td> <td>-</td></tr>
<tr><td>Italian</td>   <td>X</td> <td>X</td> <td>-</td> <td>X</td> <td>X</td> <td>X</td></tr>
<tr><td>Norwegian</td> <td>-</td> <td>X</td> <td>-</td> <td>X</td> <td>X</td> <td>X</td></tr>
<tr><td>Russian</td>   <td>X</td> <td>X</td> <td>-</td> <td>*</td> <td>-</td> <td>-</td></tr>
<tr><td>Spanish</td>   <td>-</td> <td>X</td> <td>-</td> <td>X</td> <td>X</td> <td>X</td></tr>
<tr><td>Swedish</td>   <td>X</td> <td>X</td> <td>X</td> <td>X</td> <td>X</td> <td>X</td></tr>
|
|
</table>
|
|
X = implemented (few exceptions may occur)
|
|
<br>
|
|
+ = implemented for a large part
|
|
<br>
|
|
* = linguistic material ready for implementation
|
|
<br>
|
|
- = not implemented
|
|
<br>
|
|
0 = not applicable
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Known bugs and limitations</h2>
|
|
|
|
(<i>The listed limitations are ones that do not follow from the table on
|
|
the previous page</i>.)
|
|
<p>
|
|
Danish
|
|
<p>
|
|
English:
|
|
missing uncontracted negations.
|
|
<p>
|
|
Finnish:
|
|
compiling the heuristic paradigms is slow;
|
|
possessive and interrogative suffixes have no proper lexer.
|
|
<p>
|
|
French:
|
|
no inverted questions;
|
|
some verbs in Basic should be reflexive
|
|
<p>
|
|
German
|
|
<p>
|
|
Italian:
|
|
no omission of unstressed subject pronouns;
|
|
some verbs in Basic should be reflexive;
|
|
bad forms of reflexive infinitives
|
|
<p>
|
|
Norwegian:
|
|
possessives of type <i>bilen min</i> not included
|
|
<p>
|
|
Russian
|
|
<p>
|
|
Spanish:
|
|
no omission of unstressed subject pronouns;
|
|
no switch to dative case for human objects;
|
|
some verbs in Basic should be reflexive;
|
|
bad forms of reflexive infinitives;
|
|
spurious parameter for verb auxiliary inherited from Romance
|
|
|
|
<p>
|
|
Swedish:
|
|
|
|
|
|
|
|
<!-- NEW -->
|
|
<h2>Obtaining it</h2>
|
|
|
|
Get the grammar package from
|
|
<a href="http://sourceforge.net/project/showfiles.php?group_id=132285">
|
|
GF Download Page</a>. The current libraries are in
|
|
<tt>lib/resource</tt>. Version 0.6 is in
|
|
<tt>lib/resource-0.6</tt>.
|
|
|
|
<p>
|
|
|
|
The very latest version of GF and its libraries is in
|
|
<a href="http://www.cs.chalmers.se/~bringert/gf/downloads/snapshots/">Snapshots</a>.
|
|
|
|
</body></html>
|