diff --git a/lib/resource-1.0/doc/gslt-sem-2006.html b/lib/resource-1.0/doc/gslt-sem-2006.html new file mode 100644 index 000000000..ce3514db5 --- /dev/null +++ b/lib/resource-1.0/doc/gslt-sem-2006.html @@ -0,0 +1,1153 @@ + + +
+ ++ +
++Current funding +
++Previous funding +
++Main applications +
++ +
++Staff contributions to grammar libraries: +
++Student projects on grammar libraries: +
++Technology, also: +
++ +
++The main device of division of labour in programming. +
++Instead of writing a sorting algorithm over and over again, programmers take it from a library. In Haskell, you write
++ Data.List.sort xs ++
+instead of a lot of code actually implementing sorting. +
++Practical advantages: +
++ +
++Libraries promote abstraction: you abstract away from details. +
++The use of libraries is therefore a good programming style. +
++It is also scientifically interesting to create libraries: you have to think about abstractions in your domain of expertise.
++Notice: libraries can bring abstraction to almost any language, as long as it supports functions or macros.
++ +
++Example: we want to create a GUI (Graphical User Interface) button +that says yes, and localize it to different languages: +
++ Yes Ja Kyllä Oui Ja Sì ++
+Possible ways to do this: +
++ yesButton english = button "Yes"
+ yesButton swedish = button "Ja"
+ yesButton finnish = button "Kyllä" ++ +
+ +
+
+3. Use a library Text such that you can write
+
+ yesButton lang = button (Text.render lang Text.Yes) ++
+The library has an API (Application Programming Interface) with: +
++ Yes : Text
+ No  : Text ++
+ render : Language -> Text -> String ++
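Here is a minimal Haskell sketch of such a Text library. It assumes a closed set of languages. The constructor names (`English`, `Swedish`, ...) and the bracket-drawing `button` stub are hypothetical and do not come from any real GUI toolkit.

```haskell
-- Hypothetical Text library: the application names *what* to say,
-- the library knows *how* each language says it.
data Language = English | Swedish | Finnish | French | German | Italian
  deriving (Show, Eq, Enum, Bounded)

data Text = Yes | No
  deriving (Show, Eq, Enum, Bounded)

-- render : Language -> Text -> String
render :: Language -> Text -> String
render English Yes = "Yes"
render English No  = "No"
render Swedish Yes = "Ja"
render Swedish No  = "Nej"
render Finnish Yes = "Kyllä"
render Finnish No  = "Ei"
render French  Yes = "Oui"
render French  No  = "Non"
render German  Yes = "Ja"
render German  No  = "Nein"
render Italian Yes = "Sì"
render Italian No  = "No"

-- The application code no longer mentions any language.
yesButton :: Language -> String
yesButton lang = button (render lang Yes)
  where
    -- Stand-in for a real GUI call: just draw the label in brackets.
    button label = "[" ++ label ++ "]"
```

Adding a language means adding `render` equations in the library. `yesButton` stays unchanged.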
+ +
++This is the kind of feedback you often see from a program: +
++ You have 1 messages. ++
+Or perhaps with a little more thought: +
++ You have 1 message(s). ++
+The code that should be written is of course +
++ mess n = "You have " ++ show n ++ " " ++ messages ++ "."
+   where
+     messages = if n == 1 then "message" else "messages" ++
+(E.g. VoiceXML supports this.) +
++ +
++The same as with "Yes": you have to know the words "you", +"have", "message". +
++Moreover, you have to know the inflection of the equivalent +of "message": +
++ if n == 1 then "meddelande" else "meddelanden" ++
+Moreover, you have to know how the noun agrees with different numbers (e.g. in Arabic): +
++ if n == 1 then "risAlaö" else
+ if n == 2 then "risAlatAn" else
+ if n < 11 then "rasA'il" else
+ "risAlaö" ++ +
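This Haskell sketch collects the number-dependent forms shown so far into one function per language. The forms are exactly those on these slides, with Arabic in the slides' transliteration. The names `messageForm` and `youHaveEng` are hypothetical. The sketch shows how much language-specific knowledge an application programmer would otherwise have to carry.

```haskell
data Language = English | Swedish | Arabic
  deriving (Show, Eq)

-- The form of the noun "message" to use after the number n.
-- Each language needs its own rule.
messageForm :: Language -> Int -> String
messageForm English n = if n == 1 then "message" else "messages"
messageForm Swedish n = if n == 1 then "meddelande" else "meddelanden"
messageForm Arabic n
  | n == 1    = "risAlaö"
  | n == 2    = "risAlatAn"
  | n < 11    = "rasA'il"
  | otherwise = "risAlaö"

-- "You have n messages" in English, built on top of messageForm.
youHaveEng :: Int -> String
youHaveEng n = "You have " ++ show n ++ " " ++ messageForm English n ++ "."
```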
+ +
++You also have to know the case required by the verb "have", e.g. in Finnish: +
++ 1 viesti   -- nominative
+ 4 viestiä  -- partitive ++
+Moreover, you have to know the proper way to address the user politely: +
++ Du har 3 meddelanden / Ni har 3 meddelanden
+ Vous avez 3 messages / Tu as 3 messages ++
+(This can also depend on country and the kind of program.) +
++ +
++In analogy with the "Yes" case, you write +
++ mess lang n = render lang (Text.YouHaveMessages n) ++
+Hmm, is this so smart? What if you want to say +
++ You have 4 documents.
+ You have 5 jewels.
+ I have 7 surprises. ++
+It is time to move from canned text to a grammar. +
++ +
++You may want to write +
++ mess lang n = render lang (Have PolYou (Num n Message))
+ sword lang n = render lang (Have FamYou (Num n Jewel))
+ surpr lang n = render lang (Have I (Num n Surprise)) ++
+For this purpose, you need a library with the API +
++ Have : NounPhrase -> NounPhrase -> Sentence
+
+ PolYou : NounPhrase
+ FamYou : NounPhrase
+ I      : NounPhrase
+
+ Num : Int -> Noun -> NounPhrase
+
+ Message  : Noun
+ Jewel    : Noun
+ Surprise : Noun ++ +
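One way to model this API in Haskell is with algebraic data types, defining `render` by pattern matching. This is only a sketch. It covers English and Swedish only and does no capitalization. The Swedish words for jewel and surprise are added here for illustration and are not library data.

```haskell
-- Hypothetical grammar API as Haskell data types.
data Noun = Message | Jewel | Surprise deriving (Show, Eq)

data NounPhrase = PolYou | FamYou | I | Num Int Noun
  deriving (Show, Eq)

data Sentence = Have NounPhrase NounPhrase deriving (Show, Eq)

data Language = English | Swedish deriving (Show, Eq)

-- Noun inflection: (singular, plural).
noun :: Language -> Noun -> (String, String)
noun English Message  = ("message", "messages")
noun English Jewel    = ("jewel", "jewels")
noun English Surprise = ("surprise", "surprises")
noun Swedish Message  = ("meddelande", "meddelanden")
noun Swedish Jewel    = ("juvel", "juveler")
noun Swedish Surprise = ("överraskning", "överraskningar")

-- Noun phrases: polite vs. familiar "you" differ only in Swedish.
np :: Language -> NounPhrase -> String
np English PolYou = "you"
np English FamYou = "you"
np English I      = "I"
np Swedish PolYou = "ni"
np Swedish FamYou = "du"
np Swedish I      = "jag"
np lang (Num n c) = show n ++ " " ++ (if n == 1 then sg else pl)
  where (sg, pl) = noun lang c

verbHave :: Language -> String
verbHave English = "have"
verbHave Swedish = "har"

render :: Language -> Sentence -> String
render lang (Have subj obj) =
  unwords [np lang subj, verbHave lang, np lang obj] ++ "."
```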
+ +
++The library API for language will certainly grow big and become +difficult to use. Why couldn't I just write +
++ mess lang n = render lang (parse english "you have n messages") ++
+To this end, the API should provide the top-level function +
++ parse : Language -> String -> Sentence ++
+The library that we will present actually has this as well! +
++ +
+
+The only complication is that parse does not always return
just one sentence. There may be zero:
+
+ "you have n mesaggse" + ++
+or many: +
++ "you have n messages"
+
+ Have PolYou (Num n Message)
+ Have FamYou (Num n Message)
+ Have PlurYou (Num n Message) ++
+Thus some amount of interaction is needed. +
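Below is a toy sketch of a list-returning `parse` for the single English pattern above. It assumes the only ambiguity is in "you". `PlurYou` is taken from the trees above, and `parseEnglish` is hypothetical. A misspelled input yields no trees and a correct one yields three.

```haskell
import Text.Read (readMaybe)

data Noun = Message deriving (Show, Eq)
data NounPhrase = PolYou | FamYou | PlurYou | Num Int Noun deriving (Show, Eq)
data Sentence = Have NounPhrase NounPhrase deriving (Show, Eq)

-- English "you" marks neither politeness nor number.
youReadings :: [NounPhrase]
youReadings = [PolYou, FamYou, PlurYou]

-- Toy parser: every tree whose English rendering is the input;
-- the empty list means "no parse".
parseEnglish :: String -> [Sentence]
parseEnglish input = case words input of
  ["you", "have", num, noun] ->
    [ Have you (Num n Message)
    | Just n <- [readMaybe num]
    , noun == (if n == 1 then "message" else "messages")
    , you <- youReadings ]
  _ -> []
```

An application would then have to choose among the results, for instance by asking the user.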
++ +
++The library has construction functions like +
++ Have : NounPhrase -> NounPhrase -> Sentence
+ PolYou : NounPhrase ++
+These functions build grammatical structures, which +can have different realizations in different languages. +
++Therefore we also need realization functions, +
++ render : Language -> Sentence -> String
+ parse : Language -> String -> [Sentence] ++
+Both of them require linguistic expertise to write, but once this is done, they can be used by application programmers with very little linguistic knowledge! +
++ +
++GF = Grammatical Framework +
++Those who know GF have already recognized the introduction as a seduction argument leading to GF.
++In GF, +
++ +
++Simplest possible example: +
+
+ abstract Text = {
+ cat Text ;
+ fun Yes : Text ;
+ fun No : Text ;
+ }
+
  concrete TextEng of Text = open Prelude in {
    lincat Text = SS ;
    lin Yes = ss "yes" ;
    lin No = ss "no" ;
  }

  concrete TextFin of Text = open Prelude in {
    lincat Text = SS ;
    lin Yes = ss "kyllä" ;
    lin No = ss "ei" ;
  }
+
+
++ +
+
The realization function is, for each language, implemented by
+linearization rules (lin).
+
+The linearization rules directly give the render method:
+
+ render english x = TextEng.lin x ++
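Here is the same grammar transcribed into Haskell as a sketch. A concrete syntax is modelled as a record holding a linearization function. This is a hypothetical encoding, not GF's runtime representation. Because this abstract syntax is finite, parsing can be obtained by inverting linearization, which gives a small taste of reversibility.

```haskell
-- Abstract syntax: the functions of the Text grammar.
data Text = Yes | No deriving (Show, Eq, Enum, Bounded)

-- A concrete syntax assigns a string to each abstract tree.
newtype Concrete = Concrete { lin :: Text -> String }

textEng, textFin :: Concrete
textEng = Concrete $ \t -> case t of
  Yes -> "yes"
  No  -> "no"
textFin = Concrete $ \t -> case t of
  Yes -> "kyllä"
  No  -> "ei"

-- render is just linearization in the chosen concrete syntax.
render :: Concrete -> Text -> String
render = lin

-- Parsing by inverting linearization: all trees that linearize to s.
-- (Exhaustive search works only because Text is finite.)
parse :: Concrete -> String -> [Text]
parse c s = [t | t <- [minBound .. maxBound], lin c t == s]
```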
+The GF formalism moreover has the property of reversibility: +
++ +
++multilingual grammar = abstract syntax + concrete syntaxes +
++Examples of the idea: +
++ +
++An abstract syntax has other names: +
++The concrete syntax defines how the ontology +is represented in a language. +
++The following requirements are made: +
++Benefit: translation via semantic model of domain can reach high quality. +
++Problem: the expertise of both a linguist and a domain expert is required.
++ +
++Arithmetic of natural numbers: abstract syntax +
++ cat Prop ; Nat ;
+ fun Even : Nat -> Prop ; ++
+Concrete syntax: mapping from abstract syntax trees to strings in a language +(English, French, German, Swedish,...) +
+
+ lin Even x = {s = x.s ++ "is" ++ "even"} ;
+ lin Even x = {s = x.s ++ "est" ++ "pair"} ;
+ lin Even x = {s = x.s ++ "ist" ++ "gerade"} ;
+ lin Even x = {s = x.s ++ "är" ++ "jämnt"} ;
+
+
++ +
++We can translate using the abstract syntax as interlingua: +
++ 4 is even          4 ist gerade
+          \          /
+          Even (NInt 4)
+          /          \
+ 4 est pair          4 är jämnt ++
+This idea is used e.g. in the WebALT project to generate mathematical +teaching material in 7 languages. +
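Translation through the abstract syntax can be sketched in Haskell as parse-then-linearize. The parser below is a hypothetical stand-in that handles only this one construction. Like the simple grammar above, the linearizations deliberately ignore agreement.

```haskell
-- Abstract syntax of the fragment (NInt as in the tree above).
data Nat  = NInt Int deriving (Show, Eq)
data Prop = Even Nat deriving (Show, Eq)

data Language = English | French | German | Swedish
  deriving (Show, Eq, Enum, Bounded)

-- Linearization, following the four lin rules for Even.
linearize :: Language -> Prop -> String
linearize lang (Even (NInt n)) = unwords (show n : predicate lang)
  where
    predicate English = ["is", "even"]
    predicate French  = ["est", "pair"]
    predicate German  = ["ist", "gerade"]
    predicate Swedish = ["är", "jämnt"]

-- Toy parser for this fragment only: read the number, then check
-- that linearizing the resulting tree gives back the input.
parse :: Language -> String -> [Prop]
parse lang s = case words s of
  (w : _) | [(n, "")] <- reads w, linearize lang (Even (NInt n)) == s
    -> [Even (NInt n)]
  _ -> []

-- Translation = parse in the source language, linearize in the target.
translate :: Language -> Language -> String -> [String]
translate from to = map (linearize to) . parse from
```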
++But is it really so simple? +
++ +
++The previous multilingual grammar breaks these rules in many situations: +
++ 2 and 3 is even
+ la somme de 3 et de 5 est pair
+ wenn 2 ist gerade, dann 2+2 ist gerade
+ om x är jämnt, summan av x och 2 är jämnt ++
+All these sentences are grammatically incorrect. +
++ +
++GF can express the linguistic rules that are needed to +produce correct translations: +
++In addition to strings, we use parameters, tables, +and record types. For instance, French: +
+
+ param Mod = Ind | Subj ;
+ param Gen = Masc | Fem ;
+
+ lincat Nat = {s : Str ; g : Gen} ;
+ lincat Prop = {s : Mod => Str} ;
+
+ lin Even x = {s =
+ table {
+ m => x.s ++
+ case m of {Ind => "est" ; Subj => "soit"} ++
+ case x.g of {Masc => "pair" ; Fem => "paire"}
+ }
+ } ;
+
++Most of the size of this grammar is due to linguistic knowledge.
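The French rule above can be transcribed into Haskell as a sketch. Parameters become data types, a table becomes a function from the parameter type, and a record with a gender field becomes a Haskell record. `NatLin`, `PropLin` and `sp` are hypothetical names.

```haskell
-- param Mod = Ind | Subj ; param Gen = Masc | Fem
data Mod = Ind | Subj deriving (Show, Eq)
data Gen = Masc | Fem deriving (Show, Eq)

-- lincat Nat = {s : Str ; g : Gen}
data NatLin = NatLin { s :: String, g :: Gen }

-- lincat Prop = {s : Mod => Str}: a table is a function from Mod.
newtype PropLin = PropLin { sp :: Mod -> String }

-- lin Even: the verb form depends on the mood requested by the
-- context, the adjective form on the gender of the subject.
evenFr :: NatLin -> PropLin
evenFr x = PropLin $ \m -> unwords
  [ s x
  , case m of { Ind -> "est"; Subj -> "soit" }
  , case g x of { Masc -> "pair"; Fem -> "paire" } ]
```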
++ +
++Application grammar ("semantic grammar") +
++Resource grammar ("syntactic grammar") +
++ +
++The expressive power is between TAG and HPSG. +
++The language is more high-level: a modern, typed functional programming language. +
++It enables linguistic generalizations and abstractions. +
++But we don't want to bother application grammarians with these details. +
++We have built a module system that can hide details. +
++ +
++Assume the following API +
++ cat S ; NP ; A ;
+
+ fun predA : A -> NP -> S ;
+
+ oper regA : Str -> A ; ++
+Now implement Even for four languages:
+
+ lincat
+   Prop = S ;
+   Nat = NP ;
+ lin
+   Even = predA (regA "even") ;    -- English
+   Even = predA (regA "jämn") ;    -- Swedish
+   Even = predA (regA "pair") ;    -- French
+   Even = predA (regA "gerade") ;  -- German ++
+Notice: the choice of adjective is domain expert knowledge. +
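To see what the application grammarian is spared, here is a hypothetical Haskell sketch of a Swedish `regA`/`predA` pair in which the adjective agrees with the gender of the subject. It is not the library's implementation. It covers only singular forms of regular adjectives.

```haskell
-- Swedish gender: utrum (common) and neutrum.
data Gender = Utr | Neutr deriving (Show, Eq)

-- A noun phrase carries the gender that adjectives agree with.
data NP = NP { npStr :: String, npGen :: Gender }

-- An adjective is a table from gender to form.
newtype A = A (Gender -> String)

newtype S = S String deriving (Show, Eq)

-- Regular adjectives (singular only): the neuter adds "t".
regA :: String -> A
regA jamn = A $ \gen -> case gen of
  Utr   -> jamn
  Neutr -> jamn ++ "t"

-- predA hides both the copula and the agreement from its user.
predA :: A -> NP -> S
predA (A adj) np = S (unwords [npStr np, "är", adj (npGen np)])

-- The application grammar only picks the word, as on the slide.
evenSwe :: NP -> S
evenSwe = predA (regA "jämn")
```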
++ +
++What should there be in the library? +
++How do we organize and present the library? +
++Where to get the data from? +
++Extra constraint: we want open-source free software and +hence cannot use existing proprietary resources. +
++ +
++Coverage, for each language: +
++Organization: +
++Presentation: +
+gfdoc for generating HTML from grammars
++ +
++Where do we get the data from? +
++The resource grammar library is entirely open-source free software (licensed under the GNU GPL).
++ +
++Grammatical correctness of everything generated. +
++Semantic coverage: you can express whatever you want. +
++Usability as library for non-linguists. +
++Evaluation: tested in third-party projects. +
++ +
++Language coverage: +
++Semantic correctness: +
++ colourless green ideas sleep furiously
+ the time is seventy past forty-two ++
+Linguistic innovation in syntax: +
++ +
++Application grammars use domain-specific +semantics to guarantee semantic well-formedness. +
++GF incorporates a Logical Framework and can express +
++Language-independent API is a rough semantic model. +
++But we do not try to give semantics once and +for all for the whole language. +
++ +
++Grammar composition: any grammar can serve as resource to another one. +
++There is no fixed set of representation levels; here are some examples for
++ 2 is even
+ 2 är jämnt ++
+In Arithm
+
+ Even 2 ++
+In Predication (high level resource API)
+
+ predA (regA "even") (IntNP 2)
+ predA (regA "jämn") (IntNP 2) ++
+In Lang (ground level resource API)
+
+ UseCl TPres ASimul PPos (PredVP (UsePN (IntPN 2))
+   (UseComp (CompAP (PositA (regA "even")))))
+ UseCl TPres ASimul PPos (PredVP (UsePN (IntPN 2))
+   (UseComp (CompAP (PositA (regA "jämn"))))) ++ +
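The three levels can be mimicked by ordinary function definitions, with each level defined by trees of the level below. This Haskell sketch uses an untyped tree type with the constructor names from the Lang tree above. `RegA` stands in for the `regA` oper, and the real resource modules are GF, not Haskell. The argument order follows the signature `predA : A -> NP -> S` assumed earlier.

```haskell
-- Ground level: a fragment of Lang trees as plain data.
data Tree
  = UseCl Tree Tree Tree Tree
  | TPres | ASimul | PPos
  | PredVP Tree Tree
  | UsePN Tree | IntPN Int
  | UseComp Tree | CompAP Tree
  | PositA Tree | RegA String
  deriving (Show, Eq)

-- High level (Predication): predA is defined by a Lang tree.
predA :: Tree -> Tree -> Tree
predA adj np =
  UseCl TPres ASimul PPos (PredVP np (UseComp (CompAP (PositA adj))))

intNP :: Int -> Tree
intNP n = UsePN (IntPN n)

-- Application level (Arithm): Even is defined by the high-level API.
evenEng :: Int -> Tree
evenEng n = predA (RegA "even") (intNP n)
```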
+ +
++The current GF Resource Project covers ten languages: +
+Danish
+English
+Finnish
+French
+German
+Italian
+Norwegian (bokmål)
+Russian
+Spanish
+Swedish
++Implementation of API v1.0 is projected for the end of February.
++In addition, we have parts (morphology) of Arabic, Estonian, Latin, and Urdu.
++ +
+
+
+
+Cf. "matrix" in BLARK, LinGo +
++ +
+ParadigmsSwe
++ mkN : (man,mannen,män,männen : Str) -> N ;  -- worst-case nouns
+ regV : (leker : Str) -> V ;                  -- regular verbs ++
IrregSwe
++ angripa_V = irregV "angripa" "angrep" "angripit" ; ++
ExtNor
++ PostPoss : CN -> Pron -> NP ; -- bilen min ++
+ +
++English: negation and auxiliary vs. non-auxiliary verbs +
++Finnish: object case +
++German: double infinitives +
++Romance: clitic pronouns +
++Scandinavian: determiners +
++In particular: how to make the grammars efficient +
++ +
++For the ten languages we have considered, it is possible +to implement the current API. +
++Reservations: +
++ +
++Simplest case: use the API in the same way for all languages. +
++In practice: use the API in different ways for different languages +
++ -- Eng: x's name is y
+ Name x y = predNP (GenCN x (regN "name")) (StringNP y)
+ -- Swe: x heter y
+ Name x y = predV2 x heta_V2 (StringNP y) ++
+This amounts to compile-time transfer. +
++Surprisingly, writing an application grammar requires more native-speaker knowledge +than writing a resource grammar! +
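In Haskell terms, compile-time transfer means that one abstract function gets a structurally different definition per language. Here is a minimal sketch, with plain strings standing in for resource trees. `name` is a hypothetical function.

```haskell
data Language = English | Swedish deriving (Show, Eq)

-- One application-level function, two different constructions:
-- English uses a genitive + "name" + copula, Swedish the verb heta.
name :: Language -> String -> String -> String
name English x y = x ++ "'s name is " ++ y
name Swedish x y = x ++ " heter " ++ y
```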
++ +
++We can go even further than sharing an abstract API: we can share implementations among related languages.
++Exploited in two families: +
++The declarations of Scandinavian syntax differences +
++ +
++We cannot anticipate all vocabulary needed in application grammars. +
++Therefore we provide high-level paradigms to add new words. +
++Example heuristic, from ParadigmsSwe:
+
+ regV : (leker : Str) -> V ;
+
+ regV leker = case leker of {
+ lek + ("a" | "ar") => conj1 (lek + "a") ;
+ lek + "er" => conj2 (lek + "a") ;
+ bo + "r" => conj3 bo
+ }
+
+
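Here is the same heuristic in Haskell, as a sketch. `stripSuffix` from `Data.List` plays the role of the GF string patterns. The hypothetical `Conj` type only records which conjugation was chosen and its infinitive, rather than building a full verb. The GF `case` has no default branch, so here an unrecognized form gives `Nothing`.

```haskell
import Data.List (stripSuffix)

-- Which regular conjugation, identified by the infinitive.
data Conj = Conj1 String | Conj2 String | Conj3 String
  deriving (Show, Eq)

-- Guess the conjugation from the present-tense form, trying the
-- patterns in the same order as the GF case expression.
regV :: String -> Maybe Conj
regV leker
  | Just lek <- stripSuffix "ar" leker = Just (Conj1 (lek ++ "a"))
  | Just lek <- stripSuffix "a"  leker = Just (Conj1 (lek ++ "a"))
  | Just lek <- stripSuffix "er" leker = Just (Conj2 (lek ++ "a"))
  | Just bo  <- stripSuffix "r"  leker = Just (Conj3 bo)
  | otherwise                          = Nothing
```

For example, `regV "talar"` gives `Just (Conj1 "tala")` and `regV "leker"` gives `Just (Conj2 "leka")`.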
++ +
+
+ decl2Noun : Str -> N = \bil ->
+ let
+ bb : Str * Str = case bil of {
+ pojk + "e" => <pojk + "ar", bil + "n"> ;
+ nyck + "e" + l@("l" | "r") => <nyck + l + "ar",bil + "n"> ;
+ sock + "e" + "n" => <sock + "nar", sock + "nen"> ;
+ _ => <bil + "ar", bil + "en">
+ } ;
+ in mkN bil bb.p2 bb.p1 (bb.p1 + "na") ;
+
+
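Here is a Haskell transcription of `decl2Noun`, as a sketch. It assumes a noun is just its four forms; the record `N` and its field names are hypothetical. The guards are tried in the same order as the GF patterns.

```haskell
import Data.List (stripSuffix)

-- The four forms taken by mkN: (man, mannen, män, männen).
data N = N { sgIndef, sgDef, plIndef, plDef :: String }
  deriving (Show, Eq)

mkN :: String -> String -> String -> String -> N
mkN = N

-- Guess the plural and the definite singular from the dictionary form.
decl2Noun :: String -> N
decl2Noun bil = mkN bil definite plural (plural ++ "na")
  where
    (plural, definite) = guess
    guess
      | Just pojk <- stripSuffix "e" bil  = (pojk ++ "ar", bil ++ "n")
      | Just (nyck, l) <- elOrEr          = (nyck ++ l ++ "ar", bil ++ "n")
      | Just sock <- stripSuffix "en" bil = (sock ++ "nar", sock ++ "nen")
      | otherwise                         = (bil ++ "ar", bil ++ "en")
    -- Matches nyck + "e" + l@("l" | "r"), e.g. "nyckel" -> ("nyck", "l").
    elOrEr = case reverse bil of
      (c : 'e' : rest) | c `elem` "lr" -> Just (reverse rest, [c])
      _                                -> Nothing
```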
++ +
++ -printer=lbnf           BNF Converter, thereby C/Bison, Java/JavaCup
+ -printer=fullform       full-form lexicon, short format
+ -printer=xml            XML: DTD for the pg command, object for st
+ -printer=gsl            Nuance GSL speech recognition grammar
+ -printer=jsgf           Java Speech Grammar Format
+ -printer=srgs_xml       SRGS XML format
+ -printer=srgs_xml_prob  SRGS XML format, with weights
+ -printer=slf            a finite automaton in the HTK SLF format
+ -printer=regular        a regular grammar in a simple BNF
+ -printer=gfc-prolog     gfc in prolog format (also pg) ++ +
+ +
++Haskell, Java, Prolog +
++Parsing, generation, translation +
++Push-button creation of spoken language translators (using Nuance) +
++ +
++Can we use the libraries outside domain-specific fragments? +
++We seem to be approaching full coverage from below. +
++The resource API is not good for heavy-duty parsing (too abstract and +therefore too inefficient). +
++Two ideas: +
++ +
++The most general format is multilingual treebank generation: +
++ > gr -tr | l -multi
+ UseCl TCond AAnter PNeg (PredVP (DetCN (DetSg DefSg NoOrd)
+   (AdjCN (PositA young_A) (UseN woman_N))) (ComplV2 love_V2 (UsePron he_Pron)))
+
+ The young woman wouldn't have loved him
+ Den unga kvinnan skulle inte ha älskat honom
+ Den unge kvinna ville ikke ha elska ham
+ La joven mujer no lo habría amado
+ La giovane donna non lo avrebbe amato
+ La jeune femme ne l' aurait pas aimé
+ Nuori nainen ei olisi rakastanut häntä ++
+This is either exhaustive or random, possibly +with probability weights attached to constructors. +
++A special case is corpus generation: just keep only one language.
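Here is a sketch of exhaustive treebank generation in Haskell over a tiny hypothetical arithmetic grammar, with English and German only. All trees up to a given depth are enumerated and each is paired with its linearizations. Random generation with weights would also need a random-number source and is not shown. Corpus generation is the same enumeration with a single language.

```haskell
data Nat  = NInt Int | Sum Nat Nat deriving (Show, Eq)
data Prop = Even Nat deriving (Show, Eq)

data Language = English | German deriving (Show, Eq, Enum, Bounded)

-- All Nat trees up to the given depth, over the numerals 1 and 2.
nats :: Int -> [Nat]
nats 0 = []
nats d = [NInt 1, NInt 2] ++ [Sum x y | x <- smaller, y <- smaller]
  where smaller = nats (d - 1)

props :: Int -> [Prop]
props d = map Even (nats d)

linNat :: Language -> Nat -> String
linNat _ (NInt n) = show n
linNat English (Sum x y) =
  "the sum of " ++ linNat English x ++ " and " ++ linNat English y
linNat German (Sum x y) =
  "die Summe von " ++ linNat German x ++ " und " ++ linNat German y

linProp :: Language -> Prop -> String
linProp English (Even x) = linNat English x ++ " is even"
linProp German  (Even x) = linNat German x ++ " ist gerade"

-- Treebank: each tree together with its linearization in every language.
treebank :: Int -> [(Prop, [String])]
treebank d = [(t, [linProp l t | l <- [minBound .. maxBound]]) | t <- props d]

-- Corpus: keep only one language.
corpus :: Language -> Int -> [String]
corpus lang d = map (linProp lang) (props d)
```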
++Can this be useful? Cf. Rebecca Jonson this afternoon. +
++ +
++CLE = Core Language Engine +
++LinGo Matrix project (HPSG) +
++Parsing detached from grammar (Nivre) - grammar detached from parsing +
++ +
++Stoneage grammar, based on the Swadesh word list. +
++Implemented as application on top of the resource grammar. +
++Illustrate generation and spoken-language parsing. +
+ + + +