Second Version, Gothenburg, 1 March 2005
First Draft, Gothenburg, 7 February 2005
Aarne Ranta
aarne@cs.chalmers.se
Designed to be nice for ordinary programmers to use.
Mission: to make natural-language applications available for ordinary programmers, in tasks like
cat Prop ; Nat ;
fun Even : Nat -> Prop ;
Concrete syntax: mapping from abstract syntax trees to strings in a language (English, French, German, Swedish,...)
lin Even x = {s = x.s ++ "is" ++ "even"} ;
lin Even x = {s = x.s ++ "est" ++ "pair"} ;
lin Even x = {s = x.s ++ "ist" ++ "gerade"} ;
lin Even x = {s = x.s ++ "är" ++ "jämnt"} ;
We can translate between languages via the abstract syntax.
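As a minimal sketch, the fragments above can be assembled into one abstract module and one concrete module per language. The module names Arith, ArithEng, and ArithFre and the constant Two are assumptions made for illustration; each module would normally live in its own file.

```gf
-- Arith.gf: abstract syntax shared by all languages
-- (module names and the constant Two are hypothetical)
abstract Arith = {
  flags startcat = Prop ;
  cat Prop ; Nat ;
  fun Even : Nat -> Prop ;
  fun Two  : Nat ;
}

-- ArithEng.gf: English concrete syntax, plain strings only
concrete ArithEng of Arith = {
  lincat Prop, Nat = {s : Str} ;
  lin Even x = {s = x.s ++ "is" ++ "even"} ;
  lin Two    = {s = "2"} ;
}

-- ArithFre.gf: French concrete syntax, same trees, different strings
concrete ArithFre of Arith = {
  lincat Prop, Nat = {s : Str} ;
  lin Even x = {s = x.s ++ "est" ++ "pair"} ;
  lin Two    = {s = "2"} ;
}
```

Under these assumptions, the English string "2 is even" corresponds to the tree Even Two, and linearizing that tree with the French module gives "2 est pair". Translation is parsing in one concrete syntax followed by linearization in another.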
Is it really so simple?
The previous multilingual grammar breaks these rules in many situations:
2 and 3 is even
la somme de 3 et de 5 est pair
wenn 2 ist gerade, dann 2+2 ist gerade
om 2 är jämnt, 2+2 är jämnt
Instead of just strings, we need
parameters, tables, and record types. For instance, French:
param Mod = Ind | Subj ;
param Gen = Masc | Fem ;
lincat Nat = {s : Str ; g : Gen} ;
lincat Prop = {s : Mod => Str} ;
lin Even x = {s =
  table {
    m => x.s ++
         case m of {Ind => "est" ; Subj => "soit"} ++
         case x.g of {Masc => "pair" ; Fem => "paire"}
  }
} ;
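A table such as the one in Prop is consumed with the selection operator !. The sketch below shows this with a function Poss ("it is possible that ...") that forces the subjunctive in its argument. The modules ModDemo and ModDemoFre, the function Poss, and the constant Two are hypothetical and are introduced only for this illustration.

```gf
-- ModDemo.gf (hypothetical abstract syntax)
abstract ModDemo = {
  flags startcat = Prop ;
  cat Prop ; Nat ;
  fun Even : Nat -> Prop ;
  fun Poss : Prop -> Prop ;   -- "it is possible that ..."
  fun Two  : Nat ;
}

-- ModDemoFre.gf (hypothetical French concrete syntax)
concrete ModDemoFre of ModDemo = {
  param Mod = Ind | Subj ;
  param Gen = Masc | Fem ;
  lincat Nat  = {s : Str ; g : Gen} ;
  lincat Prop = {s : Mod => Str} ;

  lin Two = {s = "2" ; g = Masc} ;

  -- the proposition provides both moods; the adjective agrees in gender
  lin Even x = {s =
    table {
      m => x.s ++
           case m of {Ind => "est" ; Subj => "soit"} ++
           case x.g of {Masc => "pair" ; Fem => "paire"}
    }
  } ;

  -- "que" governs the subjunctive, so the argument is selected with ! Subj
  lin Poss p = {s =
    table {
      m => "il" ++
           case m of {Ind => "est" ; Subj => "soit"} ++
           "possible" ++ "que" ++ p.s ! Subj
    }
  } ;
}
```

With these definitions, the indicative form of Poss (Even Two) is "il est possible que 2 soit pair": the outer verb follows the mood requested from outside, while the embedded clause is always taken in the subjunctive.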
Which kind of a programmer is easier to find?
In mainstream programming, sorting algorithms are not written by hand but taken from libraries.
In the same way, we want to create grammar libraries that encapsulate basic linguistic facts.
Cf. the Java success story: the language is only half of the success; the libraries are the other half.
Even x =
  let jämn = case <x.n,x.g> of {
    <Sg,Utr>   => "jämn" ;
    <Sg,Neutr> => "jämnt" ;
    <Pl,_>     => "jämna"
    }
  in
  {s = table {
     Main => x.s ! Nom ++ "är" ++ jämn ;
     Inv  => "är" ++ x.s ! Nom ++ jämn ;
     Sub  => x.s ! Nom ++ "är" ++ jämn
     }
  }
To use library functions for syntax and morphology:
Even = predA (regA "jämn") ;
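To give an idea of what a paradigm function such as regA hides, here is a minimal sketch of a regular Swedish adjective paradigm written directly in GF. It is not the library's actual definition: the module name MiniAdjSwe, the type Adj, the parameter type names Num and Gen, and the operation regAdj are assumptions made for illustration, and a real library would cover more forms.

```gf
-- MiniAdjSwe.gf (hypothetical; not part of the resource library)
resource MiniAdjSwe = {
  param Num = Sg | Pl ;
  param Gen = Utr | Neutr ;

  -- an adjective is a table over number and gender
  oper Adj : Type = {s : Num => Gen => Str} ;

  -- regular paradigm, built from the stem: jämn, jämnt, jämna
  oper regAdj : Str -> Adj = \jamn -> {
    s = table {
      Sg => table {Utr => jamn ; Neutr => jamn + "t"} ;
      Pl => table {_ => jamn + "a"}
    }
  } ;
}
```

With this sketch, (regAdj "jämn").s ! Sg ! Neutr computes to "jämnt", the same form that the hand-written case expression above selects for a singular neuter subject. The application programmer only supplies the stem.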
How do we organize and present the library?
Where do we get the data from?
Extra constraint: we want open-source free software.
Basic lexicon of structural, common, and irregular words
Basic syntactic structures
Currently,
Semantic coverage: you can express whatever you want.
Usability as library for non-linguists.
(Bonus for linguists:) nice generalizations w.r.t. language families, using the module system of GF.
Semantic correctness
colourless green ideas sleep furiously
the time is seventy past forty-two
(Warning for linguists:) theoretical innovation in syntax (and it will all be hidden anyway!)
But we do not try to give semantics once and for all for the whole language.
Instead, we expect semantics to be given in application grammars built on semantic models of different domains.
Example application: number theory
fun Even : Nat -> Prop ;           -- a mathematical predicate
lin Even = predA (regA "even") ;   -- English translation
lin Even = predA (regA "pair") ;   -- French translation
lin Even = predA (regA "jämn") ;   -- Swedish translation
How could the resource predict that just these translations are correct in this domain?
Application grammars are built by experts in these domains who, thanks to resource grammars, no longer need to be experts in linguistics.
V ; NP ; CN ; Det ; -- verb, noun phrase, common noun, determiner
DetNP : Det -> CN -> NP ; -- combine Det and CN into NP
and_Conj : Conj ;
mkN : Str -> Str -> Str -> Str -> Gender -> N ; -- worst-case nouns
mkN : Str -> N ;                                -- regular nouns
angripa_V = irregV "angripa" "angrep" "angripit" ;
man_N = mkN "man" "mannen" "män" "männen" masculine ;
PassBli : V2 -> NP -> VP ; -- bli överkörd av ngn
Reservations:
Alternative views on sentence formation: Clause, Verbphrase
French paradigms
example use of French paradigms
French verbs
Italian paradigms
example use of Italian paradigms
Italian verb conjugations
Norwegian paradigms
example use of Norwegian paradigms
Norwegian verbs
Spanish paradigms
Spanish verb conjugations
Swedish paradigms
example use of Swedish paradigms
Swedish verbs
i english/LangEng.gf
i swedish/LangSwe.gf
Test with random generation, translation, morphological analysis...
Morpho quiz with phrases:
Translation quiz with sentences:
concrete AppNor of App = open LangNor, ParadigmsNor in {...}
No more dummy reuse modules and bulky .gfr files!
If you need to convert resource category records to/from strings, use
Predef.toStr : (L : Type) -> L -> Str ;
L must be a linearization type. For instance,
toStr LangNor.CN (ModAP (PositADeg old_ADeg) (UseN car_N)) ---> "gammel bil"
> p -cat=S -v "jag ska åka till Chalmers"
unknown tokens [TS "åka",TS "Chalmers"]
> p -cat=S "jag ska gå till Danmark"
UseCl (PosTP TFuture ASimul)
(AdvCl (SPredV i_NP go_V)
(AdvPP (PrepNP to_Prep (UsePN (PNCountry Denmark)))))
Extend vocabulary at need.
åka_V = lexV "åker" ;
Chalmers = regPN "Chalmers" neutrum ;
who chases mice ?
whom does the lion chase ?
the dog chases cats
Source modules:
Just issue the following GF commands
i -src AnimalsEng.gf ;; s
i -src AnimalsFre.gf ;; s
i -src AnimalsSwe.gf ;; s
pm | wf animals.gfcm
and you get an end-user grammar animals.gfcm.
You can also write the commands in a gfs (GF script) file, say mkAnimals.gfs, and then call GF with
gf <mkAnimals.gfs
Step 2: factor out the categories and purely combinational rules into an incomplete module (to be shown...). But this does not work for French, which uses different structures: e.g. Qui aime les lions ? with a definite noun phrase, where English has Who loves lions ?
| Language  | v0.6 | API | Paradigms | Basic lex | Verbs |
| Danish    | X    |     |           |           |       |
| English   | X    | X   | X         | X         | X     |
| Finnish   | X    |     |           |           |       |
| French    | X    | X   | X         | X         | X     |
| German    | X    | *   |           |           |       |
| Italian   | X    | X   | X         | X         | X     |
| Norwegian | X    | X   | X         | X         |       |
| Russian   | X    | *   | *         |           |       |
| Spanish   | X    | X   | X         |           |       |
| Swedish   | X    | X   | X         | X         | X     |
cvs -d /users/cs/aarne/cvs checkout GF2.0/lib
To appear later at GF Homepage: