GF Resource Grammar Library
|
|
Author: Aarne Ranta <aarne (at) cs.chalmers.se>
|
|
Last update: %%date(%c)
|
|
|
|
% NOTE: this is a txt2tags file.
|
|
% Create an html file from this file using:
|
|
% txt2tags --toc gf-resource.txt
|
|
|
|
%!target:html
|
|
|
|
%!postproc(html): #NEW <!-- NEW -->
|
|
|
|
|
|
#NEW
|
|
==GF = Grammatical Framework==
|
|
|
|
GF is a grammar formalism based on functional programming and type theory.
|
|
|
|
|
|
|
|
GF was designed to be nice for //ordinary programmers// to use: by this
|
|
we mean programmers without training in linguistics.
|
|
|
|
|
|
|
|
The mission of GF is to make natural-language applications available for
|
|
ordinary programmers, in tasks like
|
|
|
|
- software documentation
|
|
- domain-specific translation
|
|
- human-computer interaction
|
|
- dialogue systems
|
|
|
|
Thus GF is //not// primarily another theoretical framework for
|
|
linguists.
|
|
|
|
|
|
|
|
#NEW
|
|
==Multilingual grammars==
|
|
|
|
A GF grammar consists of an abstract syntax and a set
|
|
of concrete syntaxes.
|
|
|
|
|
|
|
|
**Abstract syntax**: language-independent representation
|
|
```
cat Prop ; Nat ;
fun Even : Nat -> Prop ;
fun NInt : Int -> Nat ;
```
|
|
**Concrete syntax**: mapping from abstract syntax trees to strings in a language
|
|
(English, French, German, Swedish,...)
|
|
```
lin Even x = {s = x.s ++ "is" ++ "even"} ;     -- English

lin Even x = {s = x.s ++ "est" ++ "pair"} ;    -- French

lin Even x = {s = x.s ++ "ist" ++ "gerade"} ;  -- German

lin Even x = {s = x.s ++ "är" ++ "jämnt"} ;    -- Swedish
```
|
|
We can **translate** between languages via the abstract syntax:
|
|
```
4 is even                 4 ist gerade
          \              /
           Even (NInt 4)
          /              \
4 est pair                4 är jämnt
```
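The translation diagram can be imitated with a toy Python model. This is only a sketch under simplifying assumptions, not how GF is implemented: the classes ``NInt`` and ``Even`` stand in for the abstract syntax functions, the ``CONCRETE`` table holds the words used by the four naive ``lin Even`` rules, and ``parse`` handles only sentences of the form //number, copula, adjective//.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NInt:
    """Abstract syntax: NInt : Int -> Nat."""
    value: int


@dataclass(frozen=True)
class Even:
    """Abstract syntax: Even : Nat -> Prop."""
    arg: NInt


# Words used by the naive "lin Even" rule of each concrete syntax.
CONCRETE = {
    "Eng": ("is", "even"),
    "Fre": ("est", "pair"),
    "Ger": ("ist", "gerade"),
    "Swe": ("är", "jämnt"),
}


def linearize(tree, lang: str) -> str:
    """Map an abstract syntax tree to a string of the given language."""
    if isinstance(tree, NInt):
        return str(tree.value)
    copula, adjective = CONCRETE[lang]
    return " ".join([linearize(tree.arg, lang), copula, adjective])


def parse(sentence: str, lang: str) -> Even:
    """Invert linearize for this one-rule grammar: '<int> <copula> <adjective>'."""
    number, copula, adjective = sentence.split()
    if (copula, adjective) != CONCRETE[lang]:
        raise ValueError(f"not covered by the {lang} grammar: {sentence!r}")
    return Even(NInt(int(number)))


def translate(sentence: str, source: str, target: str) -> str:
    """Translate by parsing into the abstract syntax and linearizing again."""
    return linearize(parse(sentence, source), target)


print(translate("4 ist gerade", "Ger", "Swe"))  # 4 är jämnt
```

The tree ``Even (NInt 4)`` is the pivot for every language pair, so in this toy model adding a language means adding one concrete entry rather than one translator per pair.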
|
|
|
|
|
|
|
|
But is it really so simple?
|
|
|
|
|
|
#NEW
|
|
==Difficulties with concrete syntax==
|
|
|
|
Most languages have rules of **inflection**, **agreement**,
|
|
and **word order**, which have to be obeyed when putting together
|
|
expressions.
|
|
|
|
|
|
|
|
The previous multilingual grammar breaks these rules in many situations:
|
|
//
|
|
2 and 3 is even
|
|
la somme de 3 et de 5 est pair
|
|
wenn 2 ist gerade, dann 2+2 ist gerade
|
|
om 2 är jämnt, 2+2 är jämnt
|
|
//
|
|
All these sentences are grammatically incorrect.
|
|
|
|
|
|
|
|
#NEW
|
|
==Solving the difficulties==
|
|
|
|
GF has tools for expressing the linguistic rules that are needed to
|
|
produce correct translations in different languages.
|
|
|
|
|
|
|
|
Instead of just strings, we need **parameters**, **tables**,
|
|
and **record types**. For instance, French:
|
|
```
param Mod = Ind | Subj ;
param Gen = Masc | Fem ;

lincat Nat = {s : Str ; g : Gen} ;
lincat Prop = {s : Mod => Str} ;

lin Even x = {s =
  table {
    m => x.s ++
         case m of {Ind => "est" ; Subj => "soit"} ++
         case x.g of {Masc => "pair" ; Fem => "paire"}
    }
  } ;
```
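To see what the French rule computes, here is a rough Python transcription, a sketch rather than GF semantics: a ``param`` type becomes an ``Enum``, a record becomes a dataclass, and a table ``Mod => Str`` becomes a dictionary with one entry for every value of ``Mod``. The names ``NatLin`` and ``even`` are invented for this illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Mod(Enum):  # param Mod = Ind | Subj
    IND = "Ind"
    SUBJ = "Subj"


class Gen(Enum):  # param Gen = Masc | Fem
    MASC = "Masc"
    FEM = "Fem"


@dataclass(frozen=True)
class NatLin:  # lincat Nat = {s : Str ; g : Gen}
    s: str
    g: Gen


def even(x: NatLin) -> Dict[Mod, str]:
    """lin Even: the verb form depends on the mood, the adjective agrees with x.g."""
    verb = {Mod.IND: "est", Mod.SUBJ: "soit"}
    adjective = {Gen.MASC: "pair", Gen.FEM: "paire"}[x.g]
    # The result is a table: one string for every value of Mod.
    return {m: f"{x.s} {verb[m]} {adjective}" for m in Mod}


sum_3_5 = NatLin("la somme de 3 et de 5", Gen.FEM)
print(even(sum_3_5)[Mod.IND])  # la somme de 3 et de 5 est paire
```

The feminine subject now yields //paire//, which fixes the agreement error in //la somme de 3 et de 5 est pair// from the previous section.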
|
|
To learn more about these constructs, consult GF documentation, e.g. the
|
|
[../../../doc/tutorial/gf-tutorial2.html New Grammarian's Tutorial].
|
|
However, in what follows we will show how to avoid learning them and
|
|
still write linguistically correct grammars.
|
|
|
|
|
|
#NEW
|
|
==Language + Libraries==
|
|
|
|
Writing natural language grammars still requires
|
|
theoretical knowledge about the language.
|
|
|
|
|
|
|
|
Which kind of programmer is easier to find?
|
|
|
|
- one who can write a sorting algorithm
|
|
- one who can write a grammar for Swedish determiners
|
|
|
|
|
|
|
|
|
|
In mainstream programming, sorting algorithms are not
|
|
written by hand but taken from **libraries**.
|
|
|
|
|
|
|
|
In the same way, we want to create grammar libraries that encapsulate
|
|
basic linguistic facts.
|
|
|
|
|
|
|
|
Cf. the Java success story: the language is only half of the
success; libraries are the other half.
|
|
|
|
|
|
|
|
#NEW
|
|
==Example of library-based grammar writing==
|
|
|
|
To define a Swedish expression of a mathematical predicate from scratch:
|
|
```
Even x =
  let jämn = case <x.n,x.g> of {
        <Sg,Utr>   => "jämn" ;
        <Sg,Neutr> => "jämnt" ;
        <Pl,_>     => "jämna"
        }
  in
  {s = table {
     Main => x.s ! Nom ++ "är" ++ jämn ;
     Inv  => "är" ++ x.s ! Nom ++ jämn ;
     Sub  => x.s ! Nom ++ "är" ++ jämn
     }
  }
```
|
|
To use library functions for syntax and morphology:
|
|
```
Even = predA (regA "jämn") ;
```
|
|
For the French version, we write
|
|
```
Even = predA (regA "pair") ;
```
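The point of a call like ``predA (regA "jämn")`` is that the agreement and word-order tables of the hand-written Swedish rule are produced by library code. The following Python sketch imitates that division of labour; ``reg_a`` and ``pred_a`` are hypothetical stand-ins for ``regA`` and ``predA``, the ``NP`` record keeps only the nominative form, number and gender, and the adjective paradigm ignores irregular adjectives.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class NP:
    nom: str  # x.s ! Nom
    n: str    # number: "Sg" or "Pl"
    g: str    # gender: "Utr" or "Neutr"


def reg_a(stem: str) -> Dict[str, str]:
    """Hypothetical regular adjective: jämn -> jämn / jämnt / jämna."""
    return {"SgUtr": stem, "SgNeutr": stem + "t", "Pl": stem + "a"}


def pred_a(adj: Dict[str, str]) -> Callable[[NP], Dict[str, str]]:
    """Hypothetical predication: builds the Main/Inv/Sub table for 'x är ADJ'."""
    def lin(x: NP) -> Dict[str, str]:
        form = adj["Pl"] if x.n == "Pl" else adj["Sg" + x.g]  # agreement
        return {
            "Main": f"{x.nom} är {form}",  # main clause
            "Inv": f"är {x.nom} {form}",   # inverted word order
            "Sub": f"{x.nom} är {form}",   # subordinate clause
        }
    return lin


even = pred_a(reg_a("jämn"))  # cf. Even = predA (regA "jämn")
print(even(NP("talet", "Sg", "Neutr"))["Main"])  # talet är jämnt
```

Only the stem changes from one predicate to the next; inflection and agreement stay inside the library functions.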
|
|
|
|
|
|
|
|
#NEW
|
|
==Questions in grammar library design==
|
|
|
|
What should there be in the library?
|
|
|
|
- morphology, lexicon, syntax, semantics,...
|
|
|
|
|
|
|
|
How do we organize and present the library?
|
|
|
|
- division into modules, level of granularity
|
|
|
|
- "school grammar" vs. sophisticated linguistic concepts
|
|
|
|
|
|
|
|
Where do we get the data from?
|
|
|
|
- automatic extraction or hand-writing?
|
|
|
|
- reuse of existing resources?
|
|
|
|
Extra constraint: we want open-source free software and
|
|
hence cannot use existing proprietary resources.
|
|
|
|
|
|
#NEW
|
|
==Answers to questions in grammar library design==
|
|
|
|
The current GF resource grammar library has
|
|
made the following decisions:
|
|
|
|
The library has, for each language:
|
|
|
|
- complete morphology, a basic lexicon (500 words), a representative fragment of syntax,
and very little semantics.
|
|
|
|
|
|
|
|
Organization and presentation:
|
|
|
|
- division into top-level (API) modules, and internal modules (only
|
|
interesting for resource implementors)
|
|
|
|
- the API is, as much as possible, common in different languages
|
|
|
|
- we favour "school grammar" concepts rather than innovative linguistic theory
|
|
|
|
|
|
|
|
Where do we get the data from?
|
|
|
|
- morphology and syntax are hand-written
|
|
|
|
- the 500-word lexicon is hand-written, but a tool is provided
|
|
for automatic lexicon extraction
|
|
|
|
- we have not reused existing resources
|
|
|
|
The resource grammar library is entirely
|
|
open-source free software (under the GNU GPL).
|
|
|
|
|
|
|
|
|
|
|
|
#NEW
|
|
==The scope of a resource grammar library for a language==
|
|
|
|
All morphological paradigms
|
|
|
|
|
|
|
|
Basic lexicon of structural, common, and irregular words
|
|
|
|
|
|
|
|
Basic syntactic structures
|
|
|
|
|
|
|
|
Currently,
|
|
- //no// semantics,
|
|
- //no// language-specific structures if not necessary for expressivity.
|
|
|
|
|
|
|
|
|
|
|
|
#NEW
|
|
==Success criteria==
|
|
|
|
Grammatical correctness
|
|
|
|
|
|
|
|
Semantic coverage: you can express whatever you want.
|
|
|
|
|
|
|
|
Usability as library for non-linguists.
|
|
|
|
|
|
|
|
(Bonus for linguists:) nice generalizations w.r.t. language
|
|
families, using the module system of GF.
|
|
|
|
|
|
|
|
#NEW
|
|
==These are not our success criteria==
|
|
|
|
Language coverage: to be able to parse all expressions.
|
|
|
|
Example:
|
|
the French //passé simple// tense, although covered by the
|
|
morphology, is not used in the language-independent API, but
|
|
only the //passé composé// is. However, an application
|
|
accessing the French-specific (or Romance-specific)
|
|
modules can use the passé simple.
|
|
|
|
|
|
|
|
Semantic correctness: only to produce meaningful expressions.
|
|
|
|
Example: the following sentences can be generated
|
|
```
colourless green ideas sleep furiously

the time is seventy past forty-two
```
|
|
However, an application grammar can use a domain-specific
|
|
semantics to guarantee semantic well-formedness.
|
|
|
|
|
|
|
|
(Warning for linguists:) theoretical innovation in
|
|
syntax is not among the goals
|
|
(and it would be hidden from users anyway!).
|
|
|
|
|
|
|
|
#NEW
|
|
==So where is semantics?==
|
|
|
|
GF incorporates a **Logical Framework** and is therefore
|
|
capable of expressing logical semantics //à la// Montague
|
|
or any other flavour, including anaphora and discourse.
|
|
|
|
|
|
|
|
But we do //not// try to give semantics once and
|
|
for all for the whole language.
|
|
|
|
|
|
|
|
Instead, we expect semantics to be given in
|
|
**application grammars** built on semantic models
|
|
of different domains.
|
|
|
|
|
|
|
|
Example application: number theory
|
|
```
fun Even : Nat -> Prop ;          -- a mathematical predicate

lin Even = predA (regA "even") ;  -- English translation
lin Even = predA (regA "pair") ;  -- French translation
lin Even = predA (regA "jämn") ;  -- Swedish translation
```
|
|
How could the resource predict that just //these//
|
|
translations are correct in this domain?
|
|
|
|
|
|
|
|
Application grammars are built by experts in these domains
who, thanks to resource grammars, no longer need to be
experts in linguistics.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
#NEW
|
|
==Languages==
|
|
|
|
The current GF Resource Project covers ten languages:
|
|
|
|
- ``Dan``ish
- ``Eng``lish
- ``Fin``nish
- ``Fre``nch
- ``Ger``man
- ``Ita``lian
- ``Nor``wegian
- ``Rus``sian
- ``Spa``nish
- ``Swe``dish
|
|
|
|
The first three letters (``Dan`` etc.) are used in grammar module names.
|
|
|
|
|
|
|
|
#NEW
|
|
==Library structure 1: language-independent API==
|
|
|
|
|
|
- ``Lang`` is the top module collecting all of the following.
|
|
|
|
|
|
|
|
- syntactic ``Categories`` (parts of speech, word classes), e.g.
|
|
```
V ; NP ; CN ; Det ;  -- verb, noun phrase, common noun, determiner
```
|
|
- ``Rules`` for combining words and phrases, e.g.
|
|
```
DetNP : Det -> CN -> NP ;  -- combine Det and CN into NP
```
|
|
- the most common ``Structural`` words (determiners,
|
|
conjunctions, pronouns) (now 83), e.g.
|
|
```
and_Conj : Conj ;
```
|
|
- ``Numerals``, number words from 1 to 999,999 with their
|
|
inflections, e.g.
|
|
```
n8 : Digit ;
```
|
|
- ``Basic`` lexicon of (now 218) frequent everyday words, e.g.
|
|
```
man_N : N ;
```
|
|
|
|
|
|
|
|
In addition, and not included in ``Lang``, there is
|
|
- ``SwadeshLex``, lexicon of (now 206) words from the
|
|
[http://en.wiktionary.org/wiki/Swadesh_List Swadesh list], e.g.
|
|
```
squeeze_V : V ;
```
|
|
Of course, there is some overlap between ``SwadeshLex`` and the other modules.
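Since ``Categories`` gives the types and ``Rules`` the functions, the abstract syntax already rules out ill-formed combinations before anything is linearized. A minimal Python sketch of that check follows; the signature table is a tiny hypothetical excerpt (``the_Det`` in particular is an invented name), and trees are written as nested tuples ``(function, argument, ...)``.

```python
from typing import Dict, List, Tuple

# Hypothetical excerpt of an API signature: function -> (argument categories, value category).
SIGNATURE: Dict[str, Tuple[List[str], str]] = {
    "DetNP": (["Det", "CN"], "NP"),  # DetNP : Det -> CN -> NP
    "UseN": (["N"], "CN"),           # UseN : N -> CN
    "man_N": ([], "N"),              # man_N : N
    "the_Det": ([], "Det"),          # invented determiner for the example
}


def infer(tree: tuple) -> str:
    """Return the category of a tree, or raise TypeError if it is ill-typed."""
    name, *args = tree
    if name not in SIGNATURE:
        raise TypeError(f"unknown function {name}")
    arg_cats, value_cat = SIGNATURE[name]
    if len(args) != len(arg_cats):
        raise TypeError(f"{name} expects {len(arg_cats)} arguments, got {len(args)}")
    for arg, expected in zip(args, arg_cats):
        actual = infer(arg)
        if actual != expected:
            raise TypeError(f"{name}: expected {expected}, got {actual}")
    return value_cat


print(infer(("DetNP", ("the_Det",), ("UseN", ("man_N",)))))  # NP
```

Passing ``man_N`` directly as the second argument of ``DetNP`` raises ``TypeError``, because an ``N`` must first be turned into a ``CN`` with ``UseN``.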
|
|
|
|
|
|
#NEW
|
|
==Library structure 2: language-dependent modules==
|
|
|
|
- morphological ``Paradigms``, e.g. Swedish
|
|
```
mkN : Str -> Str -> Str -> Str -> Gender -> N ;  -- worst-case nouns
mkN : Str -> N ;                                 -- regular nouns
```
|
|
- (in some languages) irregular ``Verbs``, e.g.
|
|
```
angripa_V = irregV "angripa" "angrep" "angripit" ;
```
|
|
- (not yet available) ``Ext``ended syntax with language-specific rules
|
|
```
PassBli : V2 -> NP -> VP ;  -- bli överkörd av ngn
```
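The two ``mkN`` entries differ only in how much the grammar writer must supply. The Python sketch below mimics that split for Swedish nouns under stated assumptions: the four strings are taken to be the indefinite and definite singular and plural, Python has no overloading so the two cases get separate (hypothetical) names, and the regular case uses a deliberately crude guess rather than the library's actual heuristics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class N:
    sg_indef: str
    sg_def: str
    pl_indef: str
    pl_def: str
    gender: str  # "utrum" or "neutrum"


def mk_n_worst(sg_indef: str, sg_def: str, pl_indef: str, pl_def: str, gender: str) -> N:
    """Worst case: every form is given explicitly."""
    return N(sg_indef, sg_def, pl_indef, pl_def, gender)


def mk_n_regular(word: str) -> N:
    """Regular case: guess all forms from one string (crude illustrative heuristic)."""
    if word.endswith("a"):  # flicka - flickan - flickor - flickorna
        stem = word[:-1]
        return mk_n_worst(word, word + "n", stem + "or", stem + "orna", "utrum")
    # bil - bilen - bilar - bilarna
    return mk_n_worst(word, word + "en", word + "ar", word + "arna", "utrum")


hus = mk_n_worst("hus", "huset", "hus", "husen", "neutrum")  # not guessable from "hus"
print(mk_n_regular("flicka"))
```

Words like //hus//, whose plural is identical to the singular, are where the worst-case function is needed.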
|
|
|
|
|
|
|
|
#NEW
|
|
==How much can be language-independent?==
|
|
|
|
For the ten languages we have considered, it //is// possible
|
|
to implement the current API.
|
|
|
|
|
|
|
|
Reservations:
|
|
|
|
- this does not necessarily extend to all other languages
|
|
- this does not necessarily cover the most idiomatic expressions
|
|
of each language
|
|
- this may not be the easiest API to implement (e.g. negation and
|
|
inversion with //do// in English suggest that some other
|
|
structure would be more natural)
|
|
- it is not guaranteed that the same structure has the same semantics
in all languages
|
|
|
|
|
|
|
|
#NEW
|
|
==Library structure: language-independent API==
|
|
|
|
%#center
|
|
[Lang.gif]
|
|
%#center
|
|
|
|
|
|
#NEW
|
|
==API documentation==
|
|
|
|
[Categories.html Categories]
|
|
|
|
|
|
[Rules.html Rules]
|
|
|
|
|
|
Two alternative views on sentence formation by predication:
|
|
[Clause.html Clause],
|
|
[Verbphrase.html Verbphrase]
|
|
|
|
|
|
[Structural.html Structural]
|
|
|
|
|
|
|
|
[Time.html Time]
|
|
|
|
|
|
[Basic.html Basic]
|
|
|
|
|
|
|
|
[Lang.html Lang]
|
|
|
|
|
|
|
|
See also [../../resource-1.0/doc/gfdoc resource v 1.0 documentation],
|
|
now implemented for English, German, and Swedish.
|
|
|
|
|
|
|
|
#NEW
|
|
==Paradigms documentation==
|
|
|
|
[ParadigmsEng.html English paradigms]
|
|
|
|
[BasicEng.html example use of English paradigms]
|
|
|
|
[VerbsEng.html English verbs]
|
|
|
|
|
|
|
|
[ParadigmsFin.html Finnish paradigms]
|
|
|
|
[BasicFin.html example use of Finnish paradigms]
|
|
|
|
|
|
|
|
[ParadigmsFre.html French paradigms]
|
|
|
|
[BasicFre.html example use of French paradigms]
|
|
|
|
[VerbsFre.html French verbs]
|
|
|
|
|
|
|
|
[ParadigmsIta.html Italian paradigms]
|
|
|
|
[BasicIta.html example use of Italian paradigms]
|
|
|
|
[BeschIta.html Italian verb conjugations]
|
|
|
|
|
|
|
|
[ParadigmsNor.html Norwegian paradigms]
|
|
|
|
[BasicNor.html example use of Norwegian paradigms]
|
|
|
|
[VerbsNor.html Norwegian verbs]
|
|
|
|
|
|
[ParadigmsSpa.html Spanish paradigms]
|
|
|
|
[BasicSpa.html example use of Spanish paradigms]
|
|
|
|
[BeschSpa.html Spanish verb conjugations]
|
|
|
|
|
|
[ParadigmsSwe.html Swedish paradigms]
|
|
|
|
[BasicSwe.html example use of Swedish paradigms]
|
|
|
|
[VerbsSwe.html Swedish verbs]
|
|
|
|
|
|
|
|
#NEW
|
|
==Use as top-level grammar: testing==
|
|
|
|
Import a set of ``LangX`` grammars:
|
|
```
i english/LangEng.gf
i swedish/LangSwe.gf
```
|
|
Alternatively, you can ``make`` a precompiled package of
|
|
all the languages by using ``lib/resource/Makefile``:
|
|
```
make
gf langs.gfcm
```
|
|
Then you can test with translation, random generation, morphological analysis...
|
|
```
> p -lang=LangEng "I have loved her." | l -lang=LangFre
Je l' ai aimée.

> gr -cat=NP | l -multi
The sock
Strumpan
Strømpen
La media
La calza
La chaussette
Sukka
```
|
|
|
|
|
|
#NEW
|
|
==Use as top-level grammar: language learning quizzes==
|
|
|
|
Morpho quiz with words (e.g. French verbs):
|
|
```
i french/VerbsFre.gf
mq -cat=V
```
|
|
Morpho quiz with phrases (e.g. Swedish clauses):
|
|
```
i swedish/LangSwe.gf
mq -cat=Cl
```
|
|
Translation quiz with sentences (e.g. sentences from English to Swedish):
|
|
```
i english/LangEng.gf
i swedish/LangSwe.gf
tq -cat=S LangEng LangSwe
```
|
|
|
|
|
|
|
|
|
|
#NEW
|
|
==Use as library==
|
|
|
|
Import directly by ``open``:
|
|
```
concrete AppNor of App = open LangNor, ParadigmsNor in {...}
```
|
|
(Note for the users of GF 2.1 and older:
|
|
the dummy ``reuse`` modules and their bulky ``.gfr`` versions
|
|
are no longer needed!)
|
|
|
|
|
|
|
|
If you need to convert resource records to strings without knowing
their concrete type (which you should never need to know), you can use
|
|
```
Predef.toStr : (L : Type) -> L -> Str ;
```
|
|
``L`` must be a linearization type. For instance,
|
|
```
toStr LangNor.CN (ModAP (PositADeg old_ADeg) (UseN car_N))
---> "gammel bil"
```
|
|
|
|
|
|
|
|
|
|
#NEW
|
|
==Use as library through parser==
|
|
|
|
You can use the parser with a ``LangX`` grammar
|
|
when developing a resource.
|
|
|
|
|
|
|
|
Using the ``-v`` option shows if the parser fails because
|
|
of unknown words.
|
|
```
> p -cat=S -v -lexer=words "jag ska åka till Chalmers"
unknown tokens [TS "åka",TS "Chalmers"]
```
|
|
Then try to select words that ``LangX`` recognizes:
|
|
```
> p -cat=S "jag ska springa till Danmark"
UseCl (PosTP TFuture ASimul)
  (AdvCl (SPredV i_NP run_V)
    (AdvPP (PrepNP to_Prep (UsePN (PNCountry Denmark)))))
```
|
|
Use these API structures, and extend the vocabulary to match your needs.
|
|
```
åka_V = lexV "åker" ;
Chalmers = regPN "Chalmers" neutrum ;
```
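The ``-v`` diagnostic in this section amounts to checking each token against the grammar's lexicon. A small Python sketch of the idea; the five-word ``LEXICON`` is an invented stand-in for the words ``LangSwe`` actually knows, and whitespace splitting is a simplification of what the ``words`` lexer does.

```python
from typing import List, Set

# Invented stand-in for the word forms a LangSwe-style lexicon recognizes.
LEXICON: Set[str] = {"jag", "ska", "springa", "till", "Danmark"}


def unknown_tokens(sentence: str, lexicon: Set[str] = LEXICON) -> List[str]:
    """Split on whitespace and return the tokens not found in the lexicon, in order."""
    return [token for token in sentence.split() if token not in lexicon]


print(unknown_tokens("jag ska åka till Chalmers"))  # ['åka', 'Chalmers']
```

An empty result means every word is covered, which is the situation reached by using //springa// and //Danmark//; the remaining task is then to add entries such as ``åka_V``.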
|
|
|
|
#NEW
|
|
==Syntax editor as library browser==
|
|
|
|
You can run the syntax editor on ``LangX`` to
|
|
find resource API functions through context-sensitive menus.
|
|
For instance, the shell command
|
|
```
gfeditor LangEng.gf LangFre.gf
```
|
|
opens the editor with English and French views. The
|
|
[Editor User Manual http://www.cs.chalmers.se/%7Eaarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm] gives more information on the use of the editor.
|
|
|
|
|
|
|
|
A restriction of the editor is that it does not give access to
|
|
``ParadigmsX`` modules. An IDE extending the editor
|
|
to a grammar programming tool is work in progress.
|
|
|
|
|
|
|
|
|
|
#NEW
|
|
==Example application: a small translation system==
|
|
|
|
In this system, you can express questions and answers of
|
|
the following forms:
|
|
```
Who chases mice ?
Whom does the lion chase ?
The dog chases cats.
```
|
|
We build the abstract syntax in two phases:
|
|
|
|
- [example/Questions.gf Questions] defines question and
|
|
answer forms independently of domain
|
|
- [example/Animals.gf Animals] defines a lexicon with
|
|
animals and things that animals do.
|
|
|
|
|
|
|
|
|
|
The concrete syntax of English is built in three phases:
|
|
|
|
- [example/HandQuestionsI.gf QuestionsI] is a parametrized module
|
|
using the API module ``Resource``.
|
|
- [example/QuestionsEng.gf QuestionsEng] is an instantiation
|
|
of the API with ``ResourceEng``.
|
|
- [example/AnimalsEng.gf AnimalsEng] is a concrete syntax
|
|
of ``Animals`` using ``ParadigmsEng`` and ``VerbsEng``.
|
|
|
|
|
|
|
|
|
|
The concrete syntax of Swedish is built upon ``QuestionsI``
|
|
in a similar way, with the modules
|
|
[example/QuestionsSwe.gf QuestionsSwe] and
|
|
[example/AnimalsSwe.gf AnimalsSwe].
|
|
|
|
|
|
|
|
The concrete syntax of French consists similarly of the modules
|
|
[example/QuestionsFre.gf QuestionsFre] and
|
|
[example/AnimalsFre.gf AnimalsFre].
|
|
|
|
|
|
|
|
|
|
#NEW
|
|
==Compiling the example application==
|
|
|
|
The resources are bulky, and it therefore takes a lot of
|
|
time and memory to load the grammars. However, they can be
|
|
compiled into the ``gfcm``
|
|
(**GF canonical multilingual**) format,
|
|
which is almost one thousand times smaller and faster to load
|
|
for this set of grammars.
|
|
|
|
|
|
|
|
To produce an end-user multilingual grammar ``animals.gfcm``,
|
|
write the sequence of compilation commands in a ``gfs`` (**GF script**)
|
|
file, say
|
|
[example/mkAnimals.gfs ``mkAnimals.gfs``],
|
|
and then call GF with
|
|
```
gf <mkAnimals.gfs
```
|
|
To try out the grammar,
|
|
```
> i animals.gfcm

> gr | l -multi
vem jagar hundar ?
qui chasse des chiens ?
who chases dogs ?
```
|
|
|
|
|
|
#NEW
|
|
|
|
==Grammar writing by examples==
|
|
|
|
(New in GF 2.3)
|
|
|
|
|
|
|
|
You can use the resource grammar as a parser on a special file format,
|
|
``.gfe`` ("GF examples"). Here is the actual source,
[example/QuestionsI.gfe QuestionsI.gfe], from which
[example/QuestionsI.gf QuestionsI.gf]
was generated when the following GF command was executed:
|
|
```
i -ex AnimalsEng.gf
```
|
|
Since ``QuestionsI`` is an incomplete module ("functor"),
|
|
it need only be built once. This is why only the first
|
|
command in ``mkAnimals.gfs`` needs the flag ``-ex``.
|
|
|
|
|
|
|
|
Of course, the grammar of any language can be created by
parsing examples written in any language, as long as the languages share a common resource API.
Using the English resource is generally recommended, because it
is smaller and faster to parse than the other languages.
|
|
|
|
|
|
#NEW
|
|
==Constants and variables in examples==
|
|
|
|
The file [example/QuestionsI.gfe QuestionsI.gfe] uses
|
|
as resource ``LangEng``, which contains all resource syntax and
|
|
a lexicon of ca. 300 words. A linearization rule, such as
|
|
```
lin Who love_V2 man_N = in Phr "who loves men ?" ;
```
|
|
uses, as argument variables, constants for words that can be found in
the lexicon. This is what makes it possible to parse the example.
|
|
When the resulting rule,
|
|
```
lin Who love_V2 man_N =
  QuestPhrase (UseQCl (PosTP TPresent ASimul)
    (QPredV2 who8one_IP love_V2 (IndefNumNP NoNum (UseN man_N)))) ;
```
|
|
is read by the GF compiler, the identifiers ``love_V2`` and
|
|
``man_N`` are not treated as constants, but, following
|
|
the normal binding rules of functional languages, as bound variables.
|
|
This is what gives the example method the generality that is needed.
|
|
|
|
|
|
|
|
To write linearization rules by examples one thus has to know at
|
|
least one abstract syntax constant for each category for which
|
|
one needs a variable.
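The step from a parsed example to a general rule is lambda abstraction: the constants named on the left-hand side of ``lin`` become parameters of the tree. The Python sketch below shows that step on a shortened version of the tree for //who loves men ?// (the outer ``QuestPhrase`` and ``UseQCl`` layers are dropped), with trees as nested tuples; ``see_V2`` and ``woman_N`` are hypothetical constants used only to apply the result.

```python
from typing import Callable, Dict, Sequence

# Shortened tree for "who loves men ?" (outer QuestPhrase/UseQCl layers omitted).
EXAMPLE = ("QPredV2", ("who8one_IP",), ("love_V2",),
           ("IndefNumNP", ("NoNum",), ("UseN", ("man_N",))))


def abstract_over(tree: tuple, params: Sequence[str]) -> Callable[..., tuple]:
    """Turn the constants named in params into bound variables of a function."""
    def substitute(t: tuple, env: Dict[str, tuple]) -> tuple:
        name, *args = t
        if not args and name in env:  # a leaf that is one of the variables
            return env[name]
        return (name, *(substitute(a, env) for a in args))

    def rule(*actuals: tuple) -> tuple:
        if len(actuals) != len(params):
            raise TypeError(f"expected {len(params)} arguments, got {len(actuals)}")
        return substitute(tree, dict(zip(params, actuals)))

    return rule


who = abstract_over(EXAMPLE, ["love_V2", "man_N"])  # cf. lin Who love_V2 man_N = ...
print(who(("see_V2",), ("woman_N",)))
```

Applying the rule to the original constants gives back the example tree, while any other verb and noun of the same categories yield the corresponding question tree.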
|
|
|
|
|
|
|
|
#NEW
|
|
==Extending the lexicon on the fly==
|
|
|
|
The greatest limitation of the example method is that the lexicon
|
|
may lack many of the words that are needed in examples. If parsing
|
|
fails because of this, the compiler gives a list of unknown words
|
|
in its error message. An obvious solution is,
|
|
of course, to extend the resource lexicon and try again.
|
|
A more light-weight solution is to add a **substitution** to
|
|
the example. For instance, if you want the example "the pope"
|
|
but the lexicon does not have the word "pope", you can write
|
|
```
lin Pope = in NP "the man" {man_N = regN "pope"} ;
```
|
|
The resulting linearization rule is initially
|
|
```
lin Pope = DefOneNP (UseN man_N) ;
```
|
|
but the substitution changes this to
|
|
```
lin Pope = DefOneNP (UseN (regN "pope")) ;
```
|
|
In this way, you do not have to extend the resource lexicon, but you
|
|
need to open the Paradigms module to compile the resulting term.
|
|
|
|
|
|
|
|
Of course, the substituted expressions may come from a language
other than the main language of the example:
|
|
```
lin Pope = in NP "the man" {man_N = regN "pape" masculine} ;
```
|
|
If many substitutions are needed, semicolons are used as separators:
|
|
```
{man_N = regN "pope" ; walk_V = regV "pray"} ;
```
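A substitution only replaces leaves of the parsed tree before it is printed as a ``lin`` body. Here is a Python sketch of that post-processing under simplifying assumptions: the right-hand sides are kept as uninterpreted GF text, ``;`` and ``=`` inside string literals are not handled, and an argument is parenthesized whenever its printed form contains a space. None of this is taken from the GF implementation.

```python
from typing import Dict


def parse_substitution(text: str) -> Dict[str, str]:
    """Parse '{x = e1 ; y = e2}' into {'x': 'e1', 'y': 'e2'}."""
    body = text.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("a substitution must be enclosed in braces")
    pairs: Dict[str, str] = {}
    for item in body[1:-1].split(";"):
        if item.strip():
            name, expr = item.split("=", 1)
            pairs[name.strip()] = expr.strip()
    return pairs


def show(tree: tuple, subst: Dict[str, str]) -> str:
    """Print a tree in GF application syntax, replacing substituted constants."""
    name, *args = tree
    if not args:
        return subst.get(name, name)
    parts = [name]
    for arg in args:
        printed = show(arg, subst)
        parts.append(f"({printed})" if " " in printed else printed)
    return " ".join(parts)


pope_tree = ("DefOneNP", ("UseN", ("man_N",)))
subst = parse_substitution('{man_N = regN "pope" ; walk_V = regV "pray"}')
print(show(pope_tree, {}))     # DefOneNP (UseN man_N)
print(show(pope_tree, subst))  # DefOneNP (UseN (regN "pope"))
```

The unused ``walk_V`` binding is harmless: constants that do not occur in the tree are simply never looked up.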
|
|
|
|
|
|
#NEW
|
|
==Implementation details: low-level files==
|
|
|
|
**For developers of resource grammars.**
|
|
The modules listed in this section should never be imported in application
|
|
grammars.
|
|
|
|
|
|
|
|
Each of the API implementations uses the following auxiliary resource modules:
|
|
|
|
- ``Types``, the morphological paradigms and word classes
|
|
- ``Morpho``, inflection machinery
|
|
- ``Syntax``, complex categories and their combinations
|
|
|
|
In addition, the following language-independent modules from ``lib/prelude``
|
|
are used.
|
|
|
|
- ``Predef``, operations whose definitions are hard-coded in GF
|
|
- ``Prelude``, generic string and boolean operations
|
|
- ``Coordination``, coordination structures for arbitrary categories
|
|
|
|
|
|
|
|
#NEW
|
|
==Implementation details: the structure of low-level files==
|
|
|
|
%#center
|
|
[Low.gif]
|
|
%#center
|
|
|
|
|
|
#NEW
|
|
==How to change a resource grammar?==
|
|
|
|
In many cases, the source of a bug is in one of
|
|
the low-level modules. Try to trace it back there
|
|
by starting from the high-level module.
|
|
|
|
|
|
|
|
(Much more to be written...)
|
|
|
|
|
|
#NEW
|
|
==How to write a resource grammar?==
|
|
|
|
Start with a more limited goal, e.g. to implement
|
|
the ``stoneage`` grammar (``examples/stoneage``)
|
|
for your language.
|
|
|
|
|
|
|
|
For this, you need
|
|
|
|
- most of ``Types``
|
|
- most of ``Morpho``
|
|
- some of ``Syntax``
|
|
- most of ``Paradigms``
|
|
|
|
|
|
|
|
|
|
A useful command to test ``oper``s:
|
|
```
i -retain MorphoRot.gf
cc regNoun "foo"
```
|
|
|
|
|
|
|
|
See also [../../resource-1.0/doc/Resource-HOWTO.html Resource-HOWTO]
|
|
(under construction).
|
|
|
|
|
|
#NEW
|
|
==The use of parametrized modules==
|
|
|
|
Within two language families, much of the code is shared:
|
|
- Romance: French, Italian, Spanish
|
|
- Scandinavian: Danish, Norwegian, Swedish
|
|
|
|
|
|
The structure looks like this.
|
|
|
|
[]
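A parametrized module can be thought of as a function from modules to modules: the shared code is written once and applied to a small language-specific module. The Python sketch below models that idea for the Romance family; ``RomanceDiff``, ``romance_functor`` and the two rules are hypothetical simplifications (one article form, adjectives always after the noun, no elision), not the structure of the actual library modules.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class RomanceDiff:
    """Hypothetical language-specific part: here just one article form."""
    def_art_masc_sg: str


def romance_functor(diff: RomanceDiff) -> Dict[str, Callable[..., str]]:
    """Shared Romance code, instantiated once per language."""
    def mod_ap(adjective: str, noun: str) -> str:
        return f"{noun} {adjective}"  # shared rule: adjective after the noun

    def def_np(common_noun: str) -> str:
        return f"{diff.def_art_masc_sg} {common_noun}"  # article from the language module

    return {"ModAP": mod_ap, "DefNP": def_np}


FRE = romance_functor(RomanceDiff("le"))
ITA = romance_functor(RomanceDiff("il"))
SPA = romance_functor(RomanceDiff("el"))

for lang, noun, adjective in [(FRE, "chat", "noir"), (ITA, "gatto", "nero"), (SPA, "gato", "negro")]:
    print(lang["DefNP"](lang["ModAP"](adjective, noun)))
```

A bug fixed in the shared rule is fixed for all three languages at once, which is the practical payoff of sharing code within a family.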
|
|
|
|
|
|
#NEW
|
|
==Current status==
|
|
|
|
|| Language | v0.6 | v0.9 | v1.0 | Paradigms | Lexicon | Verbs |
| Arabic    | -    | -    | +    | X         | X       | -     |
| Danish    | -    | X    | X    | X         | X       | X     |
| English   | X    | X    | X    | X         | X       | X     |
| Finnish   | X    | +    | X    | X         | X       | 0     |
| French    | X    | X    | X    | X         | X       | X     |
| German    | X    | -    | X    | X         | X       | X     |
| Italian   | X    | X    | X    | X         | X       | X     |
| Norwegian | -    | X    | X    | X         | X       | X     |
| Russian   | X    | X    | X    | X         | X       | -     |
| Spanish   | -    | X    | X    | X         | X       | X     |
| Swedish   | X    | X    | X    | X         | X       | X     |

X = implemented (few exceptions may occur)
|
|
|
|
+ = implemented for a large part
|
|
|
|
* = linguistic material ready for implementation
|
|
|
|
- = not implemented
|
|
|
|
0 = not applicable
|
|
|
|
|
|
#NEW
|
|
==Known bugs and limitations==
|
|
|
|
(//The listed limitations are ones that do not follow from the table on
|
|
the previous page//.)
|
|
|
|
Danish
|
|
|
|
English
|
|
|
|
Finnish:
|
|
compiling the heuristic regular paradigms is slow;
|
|
possessive and interrogative suffixes have no proper lexer.
|
|
|
|
French:
|
|
no inverted questions
|
|
|
|
German
|
|
|
|
Italian:
|
|
no binding of clitics with infinitive
|
|
|
|
Norwegian
|
|
|
|
Russian:
|
|
missing rules for ordinal numbers
|
|
|
|
Spanish
|
|
|
|
Swedish
|
|
|
|
|
|
|
|
#NEW
|
|
==Obtaining it==
|
|
|
|
Get the grammar package from
|
|
[GF Download Page http://sourceforge.net/project/showfiles.php?group_id=132285]. The current libraries are in
|
|
``lib/resource-1.0``. Version 0.9 is in
|
|
``lib/resource-0.9``. Version 0.6 is in
|
|
``lib/resource-0.6``.
|
|
|
|
|
|
|
|
The very latest version of GF and its libraries is in the
|
|
[Darcs repository http://www.cs.chalmers.se/Cs/Research/Language-technology/darcs/GF/doc/darcs.html].
|
|
|