From 9d6586b43c69554437492b2bb47739d06e30ba13 Mon Sep 17 00:00:00 2001
From: aarne
Date: Mon, 30 Jan 2006 15:24:46 +0000
Subject: [PATCH] added sem HTML

---
 lib/resource-1.0/doc/gslt-sem-2006.html | 1153 +++++++++++++++++++++++
 1 file changed, 1153 insertions(+)
 create mode 100644 lib/resource-1.0/doc/gslt-sem-2006.html

diff --git a/lib/resource-1.0/doc/gslt-sem-2006.html b/lib/resource-1.0/doc/gslt-sem-2006.html
new file mode 100644
index 000000000..ce3514db5
--- /dev/null
+++ b/lib/resource-1.0/doc/gslt-sem-2006.html
@@ -0,0 +1,1153 @@
+
+
+
+
+Grammars as Software Libraries
+
+

Grammars as Software Libraries

+ +Author: Aarne Ranta <aarne (at) cs.chalmers.se>
+Last update: Mon Jan 30 10:53:18 2006 +
+ +

+ +

+

Setting

+

+Current funding +

+ + +

+Previous funding +

+ + +

+Main applications +

+ + +

+ +

+

People

+

+Staff contributions to grammar libraries: +

+ + +

+Student projects on grammar libraries: +

+ + +

+Technology, also: +

+ + +

+ +

+

Software Libraries

+

+The main device of division of labour in programming. +

+

+Instead of writing a sorting algorithm over and over again, +programmers take it from a library. In Haskell, you write +

+
+    Data.List.sort xs
+
+

+instead of a lot of code actually implementing sorting. +
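As a small illustration of the point, the sketch below contrasts a hand-written insertion sort with the library call. `insertionSort` is a name made up for this example; `Data.List.sort` is the standard library function mentioned above.

```haskell
import Data.List (sort)

-- Hand-written insertion sort: the kind of code a library spares you.
-- (insertionSort is a name made up for this illustration.)
insertionSort :: Ord a => [a] -> [a]
insertionSort = foldr insert []
  where
    insert x [] = [x]
    insert x (y : ys)
      | x <= y    = x : y : ys
      | otherwise = y : insert x ys

main :: IO ()
main = do
  let xs = [3, 1, 2] :: [Int]
  print (insertionSort xs)  -- hand-written version
  print (sort xs)           -- library version, same result
```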

+

+Practical advantages: +

+ + +

+ +

+

Abstraction

+

+Libraries promote abstraction: you abstract away from details. +

+

+The use of libraries is therefore a good programming style. +

+

+It is also scientifically interesting to create libraries: +you have to think about the abstractions in your domain of expertise. +

+

+Notice: libraries can bring abstraction to almost any language, +as long as it has support for functions or macros. +

+

+ +

+

Grammars as libraries?

+

+Example: we want to create a GUI (Graphical User Interface) button +that says yes, and localize it to different languages: +

+
+    Yes   Ja   Kyllä   Oui   Ja   Sì
+
+

+Possible ways to do this: +

+
    +
  1. Look the word up in dictionaries for the different languages +
    +    yesButton english = button "Yes"
    +    yesButton swedish = button "Ja"
    +    yesButton finnish = button "Kyllä"
    +
    +

    +
  2. Hire more programmers to perform localization in different languages +
+ +

+ +

+

+3. Use a library Text such that you can write +

+
+    yesButton lang = button (Text.render lang Text.Yes) 
+
+

+The library has an API (Application Programmer's Interface) with: +

+
    +
  1. A repository of text elements such as +
    +    Yes    : Text
    +    No     : Text
    +
    +
  2. A function rendering text elements in different languages: +
    +    render : Language -> Text -> String
    +
    +
+ +
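The API described above might be sketched in Haskell as follows. This is only an illustration under assumptions: the `Language` type, the Swedish and Finnish words for "no", and the string-valued `button` stand-in for a GUI toolkit are introduced here and are not part of any actual library.

```haskell
-- Hypothetical sketch of the Text library described above.
data Language = English | Swedish | Finnish

-- The repository of text elements.
data Text = Yes | No

-- Render a text element in a given language.
render :: Language -> Text -> String
render English Yes = "Yes"
render English No  = "No"
render Swedish Yes = "Ja"
render Swedish No  = "Nej"
render Finnish Yes = "Kyllä"
render Finnish No  = "Ei"

-- Stand-in for a GUI toolkit's button constructor (an assumption):
-- here a button is just its label in brackets.
button :: String -> String
button label = "[" ++ label ++ "]"

-- One definition serves every language the library covers.
yesButton :: Language -> String
yesButton lang = button (render lang Yes)

main :: IO ()
main = mapM_ (putStrLn . yesButton) [English, Swedish, Finnish]
```

The application programmer only chooses `Yes`; the words themselves live in the library.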

+ +

+

A slightly more advanced example

+

+This is what you often see as feedback from a program: +

+
+    You have 1 messages.
+
+

+Or perhaps with a little more thought: +

+
+    You have 1 message(s).
+
+

+The code that should be written is of course +

+
+    mess n = "You have" +++ show n +++ messages ++ "." 
+      where
+        messages = if n==1 then "message" else "messages"
+        a +++ b  = a ++ " " ++ b  -- concatenation with a space in between
+
+

+(E.g. VoiceXML supports this.) +

+

+ +

+

Problems with the more advanced example

+

+The same as with "Yes": you have to know the words "you", +"have", "message". +

+

+Moreover, you have to know the inflection of the equivalent +of "message": +

+
+    if n == 1 then "meddelande" else "meddelanden"
+
+

+Moreover, you have to know how the noun agrees with different numbers +(e.g. in Arabic): +

+
+    if n == 1 then "risAlaö" else
+    if n == 2 then "risAlatAn" else 
+    if n < 11 then "rasA'il" else
+                   "risAlaö"
+
+
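Collected into one function, the number agreement rules quoted so far look roughly like this. It is a sketch that encodes only the rules shown above (Arabic in the same transliteration); the `Language` type and the name `messageForm` are assumptions made for the example.

```haskell
data Language = English | Swedish | Arabic

-- The form of the word for "message" that agrees with the number n,
-- following the rules quoted in the text.
messageForm :: Language -> Int -> String
messageForm English n
  | n == 1    = "message"
  | otherwise = "messages"
messageForm Swedish n
  | n == 1    = "meddelande"
  | otherwise = "meddelanden"
messageForm Arabic n
  | n == 1    = "risAlaö"
  | n == 2    = "risAlatAn"
  | n < 11    = "rasA'il"
  | otherwise = "risAlaö"

main :: IO ()
main = mapM_ (putStrLn . messageForm Arabic) [1, 2, 5, 11]
```

Every new language adds another clause of this kind, which is exactly the knowledge an application programmer usually lacks.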

+

+ +

+

More problems with the advanced example

+

+You also have to know the case required by the verb "have", +e.g. in Finnish: +

+
+    1 viesti   -- nominative 
+    4 viestiä  -- partitive
+
+

+Moreover, you have to know the proper way to address +the user politely: +

+
+    Du har 3 meddelanden / Ni har 3 meddelanden
+    Vous avez 3 messages / Tu as 3 messages
+
+

+(This can also depend on country and the kind of program.) +

+

+ +

+

A library-based solution

+

+In analogy with the "Yes" case, you write +

+
+    mess lang n = render lang (Text.YouHaveMessages n)
+
+

+Hmm, is this so smart? What if you want to say +

+
+    You have 4 documents.
+    You have 5 jewels.
+    I have 7 surprises.
+
+

+It is time to move from canned text to a grammar. +

+

+ +

+

An improved library-based solution

+

+You may want to write +

+
+    mess  lang n = render lang (Have PolYou (Num n Message))
+    sword lang n = render lang (Have FamYou (Num n Jewel))
+    surpr lang n = render lang (Have I      (Num n Surprise))
+
+

+For this purpose, you need a library with the API +

+
+    Have     : NounPhrase -> NounPhrase -> Sentence
+  
+    PolYou   : NounPhrase
+    FamYou   : NounPhrase
+    I        : NounPhrase
+  
+    Num      : Int -> Noun -> NounPhrase
+  
+    Message  : Noun  
+    Jewel    : Noun
+    Surprise : Noun
+
+
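A minimal Haskell sketch of such a library, for English and Swedish only, could look like the following. The constructors follow the API above, together with `I` and `Surprise` as used in the examples; the `Language` type, the small lexicon, and the helper names `nounForms`, `renderNP` and `have` are assumptions made for the illustration. Subject-verb agreement is ignored, since the subjects in the examples are only "I" and "you".

```haskell
import Data.Char (toUpper)

data Language   = English | Swedish
data Noun       = Message | Jewel | Surprise
data NounPhrase = PolYou | FamYou | I | Num Int Noun
data Sentence   = Have NounPhrase NounPhrase

-- Singular and plural form of each noun (a tiny hand-made lexicon).
nounForms :: Language -> Noun -> (String, String)
nounForms English Message  = ("message", "messages")
nounForms English Jewel    = ("jewel", "jewels")
nounForms English Surprise = ("surprise", "surprises")
nounForms Swedish Message  = ("meddelande", "meddelanden")
nounForms Swedish Jewel    = ("juvel", "juveler")
nounForms Swedish Surprise = ("överraskning", "överraskningar")

renderNP :: Language -> NounPhrase -> String
renderNP lang (Num n noun) = show n ++ " " ++ (if n == 1 then sg else pl)
  where (sg, pl) = nounForms lang noun
renderNP English PolYou = "you"
renderNP English FamYou = "you"
renderNP English I      = "I"
renderNP Swedish PolYou = "ni"
renderNP Swedish FamYou = "du"
renderNP Swedish I      = "jag"

-- The verb "have"; agreement with the subject is left out of this sketch.
have :: Language -> String
have English = "have"
have Swedish = "har"

render :: Language -> Sentence -> String
render lang (Have subj obj) =
  capitalize (unwords [renderNP lang subj, have lang, renderNP lang obj]) ++ "."
  where
    capitalize (c : cs) = toUpper c : cs
    capitalize []       = []

mess, sword, surpr :: Language -> Int -> String
mess  lang n = render lang (Have PolYou (Num n Message))
sword lang n = render lang (Have FamYou (Num n Jewel))
surpr lang n = render lang (Have I      (Num n Surprise))

main :: IO ()
main = do
  putStrLn (mess English 1)   -- You have 1 message.
  putStrLn (mess Swedish 3)   -- Ni har 3 meddelanden.
  putStrLn (surpr English 7)  -- I have 7 surprises.
```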

+

+ +

+

The ultimate solution?

+

+The library API for language will certainly grow big and become +difficult to use. Why couldn't I just write +

+
+    mess lang n = render lang (parse english "you have n messages")
+
+

+To this end, the API should provide the top-level function +

+
+    parse : Language -> String -> Sentence
+
+

+The library that we will present actually has this as well! +

+

+ +

+

+The only complication is that parse does not always return +just one sentence. There may be none: +

+
+    "you have n mesaggse"
+  
+
+

+or many: +

+
+    "you have n messages"
+  
+    Have PolYou  (Num n Message)
+    Have FamYou  (Num n Message)
+    Have PlurYou (Num n Message)
+
+

+Thus some amount of interaction is needed. +

+

+ +

+

The components of a grammar library

+

+The library has construction functions like +

+
+    Have   : NounPhrase -> NounPhrase -> Sentence
+    PolYou : NounPhrase
+
+

+These functions build grammatical structures, which +can have different realizations in different languages. +

+

+Therefore we also need realization functions, +

+
+    render : Language -> Sentence -> String
+    parse  : Language -> String   -> [Sentence]
+
+

+Both of them require linguistic expertise to write - but, +once this is done, they can be used with very little linguistic +knowledge by application programmers! +
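To make the two realization functions concrete, here is a toy Haskell sketch in which `parse` is obtained from `render` by generate-and-test: it enumerates candidate trees and keeps those whose rendering equals the input. This is only an illustration of why `parse` returns a list, not how an actual parser works; the `Language` type, the lexicon and the helper names are assumptions.

```haskell
data Language   = English | Swedish
data Noun       = Message | Jewel                          deriving Show
data NounPhrase = PolYou | FamYou | PlurYou | Num Int Noun deriving Show
data Sentence   = Have NounPhrase NounPhrase               deriving Show

nounForm :: Language -> Bool -> Noun -> String
nounForm English sg Message = if sg then "message" else "messages"
nounForm English sg Jewel   = if sg then "jewel" else "jewels"
nounForm Swedish sg Message = if sg then "meddelande" else "meddelanden"
nounForm Swedish sg Jewel   = if sg then "juvel" else "juveler"

renderNP :: Language -> NounPhrase -> String
renderNP lang (Num n noun) = unwords [show n, nounForm lang (n == 1) noun]
renderNP English _         = "you"  -- English does not distinguish the three
renderNP Swedish FamYou    = "du"
renderNP Swedish _         = "ni"   -- polite and plural coincide

render :: Language -> Sentence -> String
render lang (Have subj obj) = unwords [renderNP lang subj, have, renderNP lang obj]
  where
    have = case lang of
      English -> "have"
      Swedish -> "har"

-- All trees whose rendering is exactly the input string.
parse :: Language -> String -> [Sentence]
parse lang s =
  [ tree
  | subj <- [PolYou, FamYou, PlurYou]
  , n    <- numbers
  , noun <- [Message, Jewel]
  , let tree = Have subj (Num n noun)
  , render lang tree == s
  ]
  where
    -- Candidate numbers are the numerals occurring in the input.
    numbers :: [Int]
    numbers = [ n | w <- words s, (n, "") <- reads w ]

main :: IO ()
main = do
  print (parse English "you have 3 messages")   -- three readings
  print (parse English "you have 3 mesaggse")   -- no reading
  print (parse Swedish "du har 3 meddelanden")  -- one reading
```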

+

+ +

+

Implementing a grammar library in GF

+

+GF = Grammatical Framework +

+

+Those who know GF will already have recognized the introduction as an +argument leading up to GF. +

+

+In GF, +

+ + +

+ +

+

+Simplest possible example: +

+
+    abstract Text = {
+      cat Text ;
+      fun Yes : Text ;
+      fun No  : Text ;
+      } 
+  
+    concrete TextEng of Text = {
+      lin Yes = ss "yes" ;
+      lin No  = ss "no" ;
+      } 
+  
+    concrete TextFin of Text = {
+      lin Yes = ss "kyllä" ;
+      lin No  = ss "ei" ;
+      } 
+
+

+

+ +

+

Linearization and parsing

+

+The realization function is, for each language, implemented by +linearization rules (lin). +

+

+The linearization rules directly give the render method: +

+
+    render english x = TextEng.lin x
+
+

+The GF formalism moreover has the property of reversibility: +

+ + +

+ +

+

Applying GF

+

+multilingual grammar = abstract syntax + concrete syntaxes +

+

+Examples of the idea: +

+ + +

+ +

+

Domain, ontology, idiom

+

+An abstract syntax has other names: +

+ + +

+The concrete syntax defines how the ontology +is represented in a language. +

+

+The following requirements are made: +

+ + +

+Benefit: translation via semantic model of domain can reach high quality. +

+

+Problem: the expertise of both a linguist and a domain expert is required. +

+

+ +

+

Example domain

+

+Arithmetic of natural numbers: abstract syntax +

+
+    cat Prop ; Nat ;
+    fun Even : Nat -> Prop ;
+
+

+Concrete syntax: mapping from abstract syntax trees to strings in a language +(English, French, German, Swedish,...) +

+
+    lin Even x = {s = x.s ++ "is"  ++ "even"} ; 
+    lin Even x = {s = x.s ++ "est" ++ "pair"} ;
+    lin Even x = {s = x.s ++ "ist" ++ "gerade"} ;
+    lin Even x = {s = x.s ++ "är"  ++ "jämnt"} ;
+
+

+

+ +

+

Translation system

+

+We can translate using the abstract syntax as interlingua: +

+
+    4 is even                  4 ist gerade
+               \              /
+                 Even (NInt 4)
+               /              \
+    4 est pair                  4 är jämnt
+
+

+This idea is used e.g. in the WebALT project to generate mathematical +teaching material in 7 languages. +
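The interlingua idea can be sketched in Haskell as parsing into the abstract syntax and linearizing out of it. The naive string-based linearizations are the ones given above; the `Language` type and the functions `linearize`, `parse` and `translate` are names assumed for this illustration, and `parse` simply inverts `linearize` by trying the number found at the start of the input.

```haskell
data Language = English | French | German | Swedish

-- Abstract syntax: the language-independent interlingua.
newtype Nat  = NInt Int
newtype Prop = Even Nat

-- Concrete syntax: one string-based rule per language.
linearize :: Language -> Prop -> String
linearize lang (Even (NInt n)) = unwords [show n, copula, adjective]
  where
    (copula, adjective) = case lang of
      English -> ("is",  "even")
      French  -> ("est", "pair")
      German  -> ("ist", "gerade")
      Swedish -> ("är",  "jämnt")

-- Parsing by inverting linearization on the leading numeral.
parse :: Language -> String -> Maybe Prop
parse lang s = case words s of
  (w : _) | [(n, "")] <- reads w
          , linearize lang (Even (NInt n)) == s -> Just (Even (NInt n))
  _ -> Nothing

-- Translation goes through the abstract syntax tree.
translate :: Language -> Language -> String -> Maybe String
translate from to s = linearize to <$> parse from s

main :: IO ()
main = mapM_ (putStrLn . maybe "(no parse)" id)
  [ translate English German  "4 is even"
  , translate French  Swedish "4 est pair"
  , translate English French  "4 are even"
  ]
```

Adding a language means adding one clause to `linearize`; no language pair needs its own translation rules.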

+

+But is it really so simple? +

+

+ +

+

Difficulties with concrete syntax

+

+The previous multilingual grammar breaks these rules in many situations: +

+
+    2 and 3 is even
+    la somme de 3 et de 5 est pair
+    wenn 2 ist gerade, dann 2+2 ist gerade
+    om x är jämnt, summan av x och 2 är jämnt
+
+

+All these sentences are grammatically incorrect. +

+

+ +

+

Solving the difficulties

+

+GF can express the linguistic rules that are needed to +produce correct translations: +

+

+In addition to strings, we use parameters, tables, +and record types. For instance, French: +

+
+    param Mod = Ind | Subj ;
+    param Gen = Masc | Fem ;
+  
+    lincat Nat  = {s : Str ; g : Gen} ;
+    lincat Prop = {s : Mod => Str} ;
+  
+    lin Even x = {s =
+        table {
+          m => x.s ++
+               case m   of {Ind  => "est" ;  Subj => "soit"} ++
+               case x.g of {Masc => "pair" ; Fem  => "paire"}
+          }
+        } ;
+
+

+Linguistic knowledge accounts for most of the size of this grammar. +
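For readers more at home in Haskell, the French rule can be mirrored roughly as follows: parameters become data types, the table becomes a function from the parameter type, and the records become Haskell records. The names `evenProp`, `natS`, `natG`, `propS` and the two example numbers are assumptions made for the sketch.

```haskell
-- Parameters: finite types of grammatical features.
data Mod = Ind | Subj
data Gen = Masc | Fem

-- Linearization types: a number carries its gender,
-- a proposition is a table from mood to string.
data Nat = Nat { natS :: String, natG :: Gen }
newtype Prop = Prop { propS :: Mod -> String }

-- The copula depends on the mood, the adjective on the gender of x.
evenProp :: Nat -> Prop
evenProp x = Prop $ \m -> unwords
  [ natS x
  , case m of { Ind -> "est"; Subj -> "soit" }
  , case natG x of { Masc -> "pair"; Fem -> "paire" }
  ]

four, sumOf3And5 :: Nat
four       = Nat "4" Masc
sumOf3And5 = Nat "la somme de 3 et de 5" Fem

main :: IO ()
main = do
  putStrLn (propS (evenProp four) Ind)
  putStrLn (propS (evenProp sumOf3And5) Ind)
  putStrLn (propS (evenProp sumOf3And5) Subj)
```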

+

+ +

+

Application grammars vs. resource grammars

+

+Application grammar ("semantic grammar") +

+ + +

+Resource grammar ("syntactic grammar") +

+ + +

+ +

+

GF as programming language

+

+The expressive power is between TAG and HPSG. +

+

+The language is more high-level: a modern, typed functional programming language. +

+

+It enables linguistic generalizations and abstractions. +

+

+But we don't want to bother application grammarians with these details. +

+

+We have built a module system that can hide details. +

+

+ +

+

Concrete syntax using library

+

+Assume the following API +

+
+    cat S ; NP ; A ;
+  
+    fun predA : A -> NP -> S ;
+  
+    oper regA : Str -> A ;
+
+

+Now implement Even for four languages +

+
+    lincat
+      Prop = S ;
+      Nat  = NP ;
+    lin
+      Even = predA (regA "even") ;   -- English
+      Even = predA (regA "jämn") ;   -- Swedish
+      Even = predA (regA "pair") ;   -- French
+      Even = predA (regA "gerade") ; -- German
+
+

+Notice: the choice of adjective is domain expert knowledge. +
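What the library hides behind `predA` and `regA` can be hinted at with a toy Haskell version for Swedish. It is a sketch under assumptions, not the resource library itself: the types, the three-form regular adjective paradigm, and the treatment of numerals as neuter singular noun phrases (so that the result matches "2 är jämnt") are all simplifications introduced here. `intNP` is written in lower case because Haskell functions cannot be capitalized.

```haskell
data Gender = Utr | Neutr
data Number = Sg | Pl

-- Library side: categories with their hidden linguistic structure.
newtype A = A { aForm :: Gender -> Number -> String }
data NP   = NP { npS :: String, npG :: Gender, npN :: Number }
newtype S = S { sS :: String }

-- Regular adjective paradigm: jämn, jämnt, jämna.
regA :: String -> A
regA stem = A form
  where
    form _     Pl = stem ++ "a"
    form Utr   Sg = stem
    form Neutr Sg = stem ++ "t"

-- Predication: the adjective agrees with the subject noun phrase.
predA :: A -> NP -> S
predA adj np = S (unwords [npS np, "är", aForm adj (npG np) (npN np)])

-- Numerals as noun phrases (assumed neuter singular).
intNP :: Int -> NP
intNP n = NP (show n) Neutr Sg

-- Application side: the only thing the domain expert writes.
evenS :: NP -> S
evenS = predA (regA "jämn")

main :: IO ()
main = putStrLn (sS (evenS (intNP 2)))
```

The application grammarian supplies only the adjective; agreement is the library's business.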

+

+ +

+

Design questions for the grammar library

+

+What should there be in the library? +

+ + +

+How do we organize and present the library? +

+ + +

+Where to get the data from? +

+ + +

+Extra constraint: we want open-source free software and +hence cannot use existing proprietary resources. +

+

+ +

+

Design decisions

+

+Coverage, for each language: +

+ + +

+Organization: +

+ + +

+Presentation: +

+ + +

+ +

+

Design decisions, cont'd

+

+Where do we get the data from? +

+ + +

+The resource grammar library is entirely open-source free software (under the GNU GPL). +

+

+ +

+

Success criteria and evaluation

+

+Grammatical correctness of everything generated. +

+

+Semantic coverage: you can express whatever you want. +

+

+Usability as library for non-linguists. +

+

+Evaluation: tested in third-party projects. +

+

+ +

+

These are not our success criteria

+

+Language coverage: +

+ + +

+Semantic correctness: +

+ + +

+Linguistic innovation in syntax: +

+ + +

+ +

+

Where is semantics?

+

+Application grammars use domain-specific +semantics to guarantee semantic well-formedness. +

+

+GF incorporates a Logical Framework and can express +

+ + +

+Language-independent API is a rough semantic model. +

+

+But we do not try to give semantics once and +for all for the whole language. +

+

+ +

+

Representations in different APIs

+

+Grammar composition: any grammar can serve as resource to another one. +

+

+No fixed set of representation levels; here are some examples for +

+
+    2 is even
+    2 är jämnt
+
+

+In Arithm +

+
+    Even 2
+
+

+In Predication (high level resource API) +

+
+    predA (regA "even") (IntNP 2)
+    predA (regA "jämn") (IntNP 2)
+
+

+In Lang (ground level resource API) +

+
+    UseCl TPres ASimul PPos (PredVP (UsePN (IntPN 2)) 
+      (UseComp (CompAP (PositA (regA "even")))))
+    UseCl TPres ASimul PPos (PredVP (UsePN (IntPN 2)) 
+      (UseComp (CompAP (PositA (regA "jämn")))))
+
+

+

+ +

+

Languages

+

+The current GF Resource Project covers ten languages: +

+ + +

+Implementation of API v 1.0 projected for the end of February. +

+

+In addition, we have parts (morphology) of Arabic, Estonian, Latin, and Urdu. +

+

+ +

+

Library structure 1: language-independent API

+

+ +

+

+Resource index page +

+

+Examples of each category +

+

+Cf. "matrix" in BLARK, LinGo +

+

+ +

+

Library structure 2: language-dependent APIs

+ + +

+ +

+

Difficulties encountered

+

+English: negation and auxiliary vs. non-auxiliary verbs +

+

+Finnish: object case +

+

+German: double infinitives +

+

+Romance: clitic pronouns +

+

+Scandinavian: determiners +

+

+In particular: how to make the grammars efficient +

+

+ +

+

How much can be language-independent?

+

+For the ten languages we have considered, it is possible +to implement the current API. +

+

+Reservations: +

+ + + + +

+ +

+

Using the library

+

+Simplest case: use the API in the same way for all languages. +

+ + +

+In practice: use the API in different ways for different languages +

+
+    -- Eng: x's name is y
+    Name x y = predNP (GenCN x (regN "name")) (StringNP y) 
+    -- Swe: x heter y
+    Name x y = predV2 x heta_V2 (StringNP y)               
+
+

+This amounts to compile-time transfer. +

+

+Surprisingly, writing an application grammar requires more native-speaker knowledge +than writing a resource grammar! +

+

+ +

+

Parametrized modules

+

+We can go even further than sharing an abstract API: we can share implementations +among related languages. +

+

+Exploited in two families: +

+ + +

+The declarations of Scandinavian syntax differences +

+

+ +

+

Lexicon extension

+

+We cannot anticipate all vocabulary needed in application grammars. +

+

+Therefore we provide high-level paradigms to add new words. +

+

+Example heuristic, from ParadigmsSwe: +

+
+    regV : (leker : Str) -> V ;
+  
+    regV leker = case leker of {
+      lek + ("a" | "ar")  => conj1 (lek + "a") ;
+      lek + "er"          => conj2 (lek + "a") ;
+      bo  + "r"           => conj3 bo
+      }
+
+
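The same suffix-based heuristic can be approximated in Haskell with guards. The sketch below is an assumption-laden paraphrase: `V` holds only the infinitive and the present tense, `conj1`, `conj2` and `conj3` are simplified stand-ins for the real paradigms, and `stripSuffix` is defined locally. Unlike the GF rule, it returns `Nothing` when no pattern matches.

```haskell
import Data.List (stripPrefix)

-- A verb reduced to two forms, enough to show the heuristic.
data V = V { inf :: String, pres :: String } deriving (Eq, Show)

-- Simplified stand-ins for the three conjugations.
conj1, conj2, conj3 :: String -> V
conj1 tala = V tala (tala ++ "r")        -- tala, talar
conj2 leka = V leka (init leka ++ "er")  -- leka, leker
conj3 bo   = V bo (bo ++ "r")            -- bo, bor

-- Remove a suffix if it is present.
stripSuffix :: String -> String -> Maybe String
stripSuffix suf s = reverse <$> stripPrefix (reverse suf) (reverse s)

-- Guess the conjugation from the ending of the given form.
regV :: String -> Maybe V
regV leker
  | Just lek <- stripSuffix "ar" leker = Just (conj1 (lek ++ "a"))
  | Just lek <- stripSuffix "a"  leker = Just (conj1 (lek ++ "a"))
  | Just lek <- stripSuffix "er" leker = Just (conj2 (lek ++ "a"))
  | Just bo  <- stripSuffix "r"  leker = Just (conj3 bo)
  | otherwise                          = Nothing

main :: IO ()
main = mapM_ (print . regV) ["talar", "leker", "bor"]
```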

+

+ +

+

Example low-level morphological definition

+
+    decl2Noun : Str -> N = \bil ->
+      let 
+        bb : Str * Str = case bil of {
+          pojk + "e"                 => <pojk + "ar",    bil  + "n"> ;
+          nyck + "e" + l@("l" | "r") => <nyck + l + "ar",bil  + "n"> ;
+          sock + "e" + "n"           => <sock + "nar",   sock + "nen"> ;
+          _                          => <bil + "ar",     bil  + "en">
+          } ;
+      in mkN bil bb.p2 bb.p1 (bb.p1 + "na") ;
+
+
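A Haskell rendering of the same definition may make the pattern matching easier to follow for readers who do not know GF. It is a sketch: the record `N`, its field names and the local `stripSuffix` are assumptions, and the alternation over "l" and "r" is spelled out as two guards.

```haskell
import Data.List (stripPrefix)

-- Four noun forms, in the argument order used by mkN above.
data N = N { sgIndef, sgDef, plIndef, plDef :: String } deriving (Eq, Show)

mkN :: String -> String -> String -> String -> N
mkN = N

-- Remove a suffix if it is present.
stripSuffix :: String -> String -> Maybe String
stripSuffix suf s = reverse <$> stripPrefix (reverse suf) (reverse s)

-- Second declension: the plural and the definite singular
-- are chosen by the ending of the base form.
decl2Noun :: String -> N
decl2Noun bil = mkN bil def pl (pl ++ "na")
  where
    (pl, def)
      | Just pojk <- stripSuffix "e"  bil = (pojk ++ "ar",  bil ++ "n")
      | Just nyck <- stripSuffix "el" bil = (nyck ++ "lar", bil ++ "n")
      | Just nyck <- stripSuffix "er" bil = (nyck ++ "rar", bil ++ "n")
      | Just sock <- stripSuffix "en" bil = (sock ++ "nar", sock ++ "nen")
      | otherwise                         = (bil ++ "ar",   bil ++ "en")

main :: IO ()
main = mapM_ (print . decl2Noun) ["pojke", "nyckel", "socken", "bil"]
```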

+

+ +

+

Some formats that can be generated from GF grammars

+
+  -printer=lbnf           BNF Converter, thereby C/Bison, Java/JavaCup
+  -printer=fullform       full-form lexicon, short format
+  -printer=xml            XML: DTD for the pg command, object for st
+  -printer=gsl            Nuance GSL speech recognition grammar
+  -printer=jsgf           Java Speech Grammar Format
+  -printer=srgs_xml       SRGS XML format
+  -printer=srgs_xml_prob  SRGS XML format, with weights
+  -printer=slf            a finite automaton in the HTK SLF format
+  -printer=regular        a regular grammar in a simple BNF
+  -printer=gfc-prolog     gfc in prolog format (also pg)
+
+

+

+ +

+

Use as program components

+

+Haskell, Java, Prolog +

+

+Parsing, generation, translation +

+

+Push-button creation of spoken language translators (using Nuance) +

+

+ +

+

Grammar library as linguistic resource

+

+Can we use the libraries outside domain-specific fragments? +

+

+We seem to be approaching full coverage from below. +

+

+The resource API is not good for heavy-duty parsing (too abstract and +therefore too inefficient). +

+

+Two ideas: +

+ + +

+ +

+

Corpus generation

+

+The most general format is multilingual treebank generation: +

+
+    > gr -tr | l -multi
+    UseCl TCond AAnter PNeg (PredVP (DetCN (DetSg DefSg NoOrd) 
+      (AdjCN (PositA young_A) (UseN woman_N))) (ComplV2 love_V2 (UsePron he_Pron)))
+  
+    The young woman wouldn't have loved him
+    Den unga kvinnan skulle inte ha älskat honom
+    Den unge kvinna ville ikke ha elska ham
+    La joven mujer no lo habría amado
+    La giovane donna non lo avrebbe amato
+    La jeune femme ne l' aurait pas aimé
+    Nuori nainen ei olisi rakastanut häntä
+
+

+This is either exhaustive or random, possibly +with probability weights attached to constructors. +

+

+A special case is corpus generation: keep just one language. +
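Exhaustive generation can be sketched in Haskell for a toy grammar: enumerate every tree, linearize each one in every language to get a multilingual treebank, or in a single language to get a corpus. The grammar, lexicon and function names below are assumptions made for the illustration and are far smaller than the resource grammar; random generation with weights is left out.

```haskell
data Language = English | Swedish deriving (Show, Enum, Bounded)
data NP = Woman | Man             deriving (Show, Enum, Bounded)
data V2 = Love | See              deriving (Show, Enum, Bounded)
data S  = Pred NP V2 NP           deriving Show

-- Exhaustive generation: every tree the abstract syntax allows.
allTrees :: [S]
allTrees =
  [ Pred subj verb obj
  | subj <- [minBound ..], verb <- [minBound ..], obj <- [minBound ..] ]

linearize :: Language -> S -> String
linearize lang (Pred subj verb obj) = unwords [np subj, v2 verb, np obj]
  where
    np Woman = case lang of { English -> "the woman"; Swedish -> "kvinnan" }
    np Man   = case lang of { English -> "the man";   Swedish -> "mannen" }
    v2 Love  = case lang of { English -> "loves";     Swedish -> "älskar" }
    v2 See   = case lang of { English -> "sees";      Swedish -> "ser" }

-- Multilingual treebank: each tree with all its linearizations.
treebank :: [(S, [String])]
treebank = [ (t, [linearize l t | l <- [minBound ..]]) | t <- allTrees ]

-- Corpus: the special case where only one language is kept.
corpus :: Language -> [String]
corpus lang = map (linearize lang) allTrees

main :: IO ()
main = do
  mapM_ (\(t, ss) -> print t >> mapM_ putStrLn ss) treebank
  mapM_ putStrLn (corpus Swedish)
```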

+

+Can this be useful? Cf. Rebecca Jonson this afternoon. +

+

+ +

+

Related work

+

+CLE = Core Language Engine +

+ + +

+LinGo Matrix project (HPSG) +

+ + +

+Parsing detached from grammar (Nivre) - grammar detached from parsing +

+

+ +

+

Demo

+

+Stoneage grammar, based on the Swadesh word list. +

+

+Implemented as application on top of the resource grammar. +

+

+Illustrate generation and spoken-language parsing. +

+ + + +