changed names of resource-1.3; added a note on homepage on release

2026-05-07 02:02:51 -06:00 · 2008-06-25 16:54:35 +00:00
parent 7d721eb16e
commit c5c6d13546
1729 changed files with 113 additions and 32 deletions
--- a/lib/resource-1.3/doc/gslt-sem-2006.txt
+++ b/lib/resource-1.3/doc/gslt-sem-2006.txt
@@ -0,0 +1,933 @@
+Grammars as Software Libraries
+Author: Aarne Ranta <aarne (at) cs.chalmers.se>
+Last update: %%date(%c)
+
+% NOTE: this is a txt2tags file.
+% Create an html file from this file using:
+% txt2tags --toc gslt-sem-2006.txt
+
+%!target:html
+
+%!postproc(html): #NEW <!-- NEW -->
+
+#NEW
+
+==Setting==
+
+Current funding
+- VR: Library-Based Grammar Engineering (2006-2008) 
+  - Lars Borin (Swedish)
+  - Robin Cooper (Computational Linguistics)
+  - Sibylle Schupp and Aarne Ranta (Computer Science)
+
+
+Previous funding
+- VR: Record Types and Dialogue Semantics (2003-2005)
+- VINNOVA: Interactive Language Technology (2001-2004)
+
+
+Main applications
+- TALK: multilingual and multimodal dialogue systems
+- WebALT: multilingual generation of mathematical teaching material
+- KeY: multilingual authoring of software specifications
+
+
+#NEW
+
+==People==
+
+Staff contributions to grammar libraries:
+- Björn Bringert
+- Markus Forsberg
+- Harald Hammarström
+- Janna Khegai
+- Aarne Ranta
+
+
+Student projects on grammar libraries:
+- Inger Andersson & Therese Söderberg: Spanish morphology
+- Ludmilla Bogavac: Russian morphology
+- Karin Cavallin: comparison with Svenska Akademins Grammatik
+- Ali El Dada: Arabic morphology and syntax
+- Muhammad Humayoun: Urdu morphology
+- Michael Pellauer: Estonian morphology
+
+
+Technology, also:
+- Håkan Burden
+- Hans-Joachim Daniels
+- Kristofer Johannisson
+- Peter Ljunglöf
+
+
+Various grammar library contributions from the multilingual Chalmers community:
+- Ana Bove, Koen Claessen, Carlos Gonzalía, Patrik Jansson, 
+Wojciech Mostowski, Karol Ostrovský, David Wahlstedt
+
+
+Resource library patches and suggestions from the WebALT staff:
+- Lauri Carlson, Glòria Casanellas, Anni Laine, Wanjiku Ng'ang'a, Jordi Saludes
+
+
+#NEW
+
+==Software Libraries==
+
+The main device of **division of labour** in programming.
+
+Instead of writing a sorting algorithm over and over again,
+programmers take it from a library. In Haskell, you write
+```
+  Data.List.sort xs
+```
+instead of a lot of code actually implementing sorting.
+
+Practical advantages:
+- faster development of new software
+- quality guarantee and automatic improvements
+
+
+#NEW
+
+==Abstraction==
+
+Libraries promote **abstraction**: you abstract away from details.
+
+The use of libraries is therefore a good programming style.
+
+It is also **scientifically interesting** to create libraries:
+you have to think about abstractions on your domain of expertise.
+
+Notice: libraries can bring abstraction to almost any language,
+as long as it supports functions or macros.
+
+
+#NEW
+
+==Grammars as libraries?==
+
+Example: we want to create a GUI (Graphical User Interface) button
+that says //yes//, and **localize** it to different languages:
+```
+  Yes   Ja   Kyllä   Oui   Ja   Sì
+```
+Possible ways to do this:
+1. Go around dictionaries to find the word in different languages:
+```
+  yesButton english = button "Yes"
+  yesButton swedish = button "Ja"
+  yesButton finnish = button "Kyllä"
+```
+
+2. Hire more programmers to perform localization in different languages.
+
+
+#NEW
+
+3. Use a library ``Text`` such that you can write
+```
+  yesButton lang = button (Text.render lang Text.Yes) 
+```
+The library has an API (Application Programming Interface) with:
+1. A repository of text elements, such as
+```
+  Yes    : Text
+  No     : Text
+```
+2. A function rendering text elements in different languages:
+```
+  render : Language -> Text -> String
+```
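+
+A minimal Haskell sketch of such a ``Text`` library. The ``Language`` type and
+the restriction to two text elements are assumptions for illustration; the
+strings are the button labels shown above (with the second //Ja// taken as German):
+```
+  -- Hypothetical set of target languages.
+  data Language = English | Swedish | Finnish | French | German | Italian
+    deriving (Show, Eq, Enum, Bounded)
+
+  -- The repository of text elements (only two in this sketch).
+  data Text = Yes | No
+    deriving (Show, Eq)
+
+  -- Rendering a text element in a given language.
+  render :: Language -> Text -> String
+  render English Yes = "Yes"
+  render English No  = "No"
+  render Swedish Yes = "Ja"
+  render Swedish No  = "Nej"
+  render Finnish Yes = "Kyllä"
+  render Finnish No  = "Ei"
+  render French  Yes = "Oui"
+  render French  No  = "Non"
+  render German  Yes = "Ja"
+  render German  No  = "Nein"
+  render Italian Yes = "Sì"
+  render Italian No  = "No"
+```
+With this, ``map (\l -> render l Yes) [minBound .. maxBound]`` yields the six
+labels above in order. Adding a language means adding one constructor and its
+``render`` equations, without touching the code that builds the button.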
+
+
+#NEW
+
+==A slightly more advanced example==
+
+This is what you often see as feedback from a program:
+```
+  You have 1 messages.
+```
+Or perhaps with a little more thought:
+```
+  You have 1 message(s).
+```
+The code that should be written is of course
+```
+  mess n = "You have" +++ show n +++ messages ++ "."
+    where
+      messages = if n == 1 then "message" else "messages"
+      a +++ b  = a ++ " " ++ b   -- join two strings with a space
+```
+(E.g. VoiceXML supports this.)
+
+
+#NEW
+
+==Problems with the more advanced example==
+
+The same problem as with "Yes": you have to know the words "you",
+"have", and "message".
+
+//Moreover//, you have to know the inflection of the equivalent
+of "message" (e.g. Swedish):
+```
+  if n == 1 then "meddelande" else "meddelanden"
+```
+//Moreover//, you have to know how the noun agrees with different numbers
+(e.g. Arabic):
+```
+  if n == 1 then "risAlaö" else
+  if n == 2 then "risAlatAn" else 
+  if n < 11 then "rasA'il" else
+                 "risAlaö"
+```
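+
+The three rules can be gathered into one function, which is the kind of
+knowledge a grammar library would encapsulate. A minimal Haskell sketch,
+assuming a hypothetical ``Language`` type; the Arabic forms and number ranges
+are copied from the simplified rule above, not a complete account of Arabic:
+```
+  data Language = English | Swedish | Arabic
+
+  -- Form of "message" after the number n, one rule per language.
+  messageWord :: Language -> Int -> String
+  messageWord English n = if n == 1 then "message" else "messages"
+  messageWord Swedish n = if n == 1 then "meddelande" else "meddelanden"
+  messageWord Arabic n
+    | n == 1    = "risAlaö"
+    | n == 2    = "risAlatAn"
+    | n < 11    = "rasA'il"
+    | otherwise = "risAlaö"
+
+  -- The number followed by the correctly inflected noun.
+  countPhrase :: Language -> Int -> String
+  countPhrase lang n = show n ++ " " ++ messageWord lang n
+```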
+
+#NEW
+
+==More problems with the advanced example==
+
+You also have to know which case the noun takes in this construction,
+e.g. in Finnish, where it depends on the number:
+```
+  1 viesti   -- nominative 
+  4 viestiä  -- partitive
+```
+//Moreover//, you have to know the proper way to address the user
+politely:
+```
+  Du har 3 meddelanden / Ni har 3 meddelanden
+  Vous avez 3 messages / Tu as 3 messages
+```
+(This can also depend on country and the kind of program.)
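+
+Both distinctions can be treated as extra parameters that the library has to
+compute or the application has to supply. A small Haskell sketch with
+hypothetical ``Case`` and ``Address`` types; the Finnish, Swedish, and French
+forms are the ones on this slide:
+```
+  -- Finnish: nominative after 1, partitive after other numbers.
+  data Case = Nominative | Partitive deriving (Show, Eq)
+
+  caseAfter :: Int -> Case
+  caseAfter n = if n == 1 then Nominative else Partitive
+
+  viesti :: Case -> String
+  viesti Nominative = "viesti"
+  viesti Partitive  = "viestiä"
+
+  -- Polite vs. familiar address: a choice made by the application.
+  data Address = Polite | Familiar deriving (Show, Eq)
+
+  youHaveSwe, youHaveFre :: Address -> String
+  youHaveSwe Polite   = "Ni har"
+  youHaveSwe Familiar = "Du har"
+  youHaveFre Polite   = "Vous avez"
+  youHaveFre Familiar = "Tu as"
+```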
+
+
+#NEW
+
+==A library-based solution==
+
+In analogy with the "Yes" case, you write
+```
+  mess lang n = render lang (Text.YouHaveMessages n)
+```
+Hmm, is this so smart? What if you want to say
+```
+  You have 4 documents.
+  You have 5 jewels.
+  I have 7 surprises.
+```
+It is time to move from **canned text** to a **grammar**.
+
+
+
+#NEW
+
+==An improved library-based solution==
+
+You may want to write
+```
+  mess  lang n = render lang (Have PolYou (Num n Message))
+  sword lang n = render lang (Have FamYou (Num n Jewel))
+  surpr lang n = render lang (Have I      (Num n Surprise))
+```
+For this purpose, you need a library with the API
+```
+  Have     : NounPhrase -> NounPhrase -> Sentence
+
+  PolYou   : NounPhrase
+  FamYou   : NounPhrase
+  I        : NounPhrase
+
+  Num      : Int -> Noun -> NounPhrase
+
+  Message  : Noun
+  Jewel    : Noun
+  Surprise : Noun
+```
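+
+One way to picture such an API is as a set of data constructors plus a
+``render`` function per language. A minimal Haskell sketch covering English
+and Swedish, including ``I`` and ``Surprise`` from the examples; the Swedish
+pronouns and noun forms are illustrative additions, not taken from the library:
+```
+  import Data.Char (toUpper)
+
+  data Noun       = Message | Jewel | Surprise
+  data NounPhrase = PolYou | FamYou | I | Num Int Noun
+  data Sentence   = Have NounPhrase NounPhrase
+  data Language   = English | Swedish
+
+  -- Singular and plural of each noun.
+  noun :: Language -> Noun -> (String, String)
+  noun English Message  = ("message", "messages")
+  noun English Jewel    = ("jewel", "jewels")
+  noun English Surprise = ("surprise", "surprises")
+  noun Swedish Message  = ("meddelande", "meddelanden")
+  noun Swedish Jewel    = ("juvel", "juveler")
+  noun Swedish Surprise = ("överraskning", "överraskningar")
+
+  nounPhrase :: Language -> NounPhrase -> String
+  nounPhrase lang (Num n c) = show n ++ " " ++ (if n == 1 then sg else pl)
+    where (sg, pl) = noun lang c
+  nounPhrase English I      = "I"
+  nounPhrase English _      = "you"   -- no polite/familiar distinction
+  nounPhrase Swedish PolYou = "ni"
+  nounPhrase Swedish FamYou = "du"
+  nounPhrase Swedish I      = "jag"
+
+  render :: Language -> Sentence -> String
+  render lang (Have subj obj) = capitalize sentence ++ "."
+    where
+      verb English = "have"
+      verb Swedish = "har"
+      sentence = unwords [nounPhrase lang subj, verb lang, nounPhrase lang obj]
+      capitalize (c : cs) = toUpper c : cs
+      capitalize []       = []
+```
+For example, ``render English (Have PolYou (Num 4 Jewel))`` gives
+//You have 4 jewels.// and ``render Swedish (Have PolYou (Num 3 Message))``
+gives //Ni har 3 meddelanden.//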
+
+
+#NEW
+
+==The ultimate solution?==
+
+The library API for language will certainly grow big and become
+difficult to use. Why couldn't I just write
+```
+  mess lang n = render lang (parse english "you have n messages")
+```
+To this end, the API should provide the top-level function
+```
+  parse : Language -> String -> Sentence
+```
+The library that we will present actually has this as well!
+
+
+#NEW
+
+The only complication is that ``parse`` does not always return
+just one sentence. There may be none:
+```
+  "you have n mesaggse"
+
+```
+or many:
+```
+  "you have n messages"
+
+  Have PolYou  (Num n Message)
+  Have FamYou  (Num n Message)
+  Have PlurYou (Num n Message)
+```
+Thus some amount of interaction is needed.
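+
+The zero-or-many behaviour can be reproduced with a naive generate-and-test
+parser: enumerate the trees of a small fragment and keep those whose rendering
+equals the input. A Haskell sketch; the fragment, the bound 0..99 on numbers,
+and the English-only rendering are assumptions, and GF derives a real parser
+from the grammar rather than searching like this:
+```
+  data Noun = Message | Jewel
+    deriving (Show, Eq, Enum, Bounded)
+  data NounPhrase = PolYou | FamYou | PlurYou | I | Num Int Noun
+    deriving (Show, Eq)
+  data Sentence = Have NounPhrase NounPhrase
+    deriving (Show, Eq)
+
+  -- English rendering: all three kinds of "you" come out the same.
+  renderNP :: NounPhrase -> String
+  renderNP (Num n c) = show n ++ " " ++ word c ++ plural
+    where
+      word Message = "message"
+      word Jewel   = "jewel"
+      plural       = if n == 1 then "" else "s"
+  renderNP I = "I"
+  renderNP _ = "you"
+
+  render :: Sentence -> String
+  render (Have subj obj) = unwords [renderNP subj, "have", renderNP obj]
+
+  -- Every sentence of the fragment, with numbers bounded to keep the search finite.
+  sentences :: [Sentence]
+  sentences =
+    [ Have subj (Num n c)
+    | subj <- [PolYou, FamYou, PlurYou, I]
+    , n <- [0 .. 99]
+    , c <- [minBound .. maxBound]
+    ]
+
+  -- Naive parsing: keep every sentence whose rendering equals the input.
+  parse :: String -> [Sentence]
+  parse str = [s | s <- sentences, render s == str]
+```
+Here ``parse "you have 4 messages"`` returns the three readings with ``PolYou``,
+``FamYou``, and ``PlurYou``, while ``parse "you have 4 mesaggse"`` returns ``[]``.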
+
+
+#NEW
+
+==The components of a grammar library==
+
+The library has **construction functions** like
+```
+  Have   : NounPhrase -> NounPhrase -> Sentence
+  PolYou : NounPhrase
+```
+These functions build **grammatical structures**, which
+can have different realizations in different languages.
+
+Therefore we also need **realization functions**, 
+```
+  render : Language -> Sentence -> String
+  parse  : Language -> String   -> [Sentence]
+```
+Both of them require linguistic expertise to write, but
+once this is done, they can be used with very little linguistic
+knowledge by application programmers!
+
+
+#NEW
+
+==Implementing a grammar library in GF==
+
+GF = Grammatical Framework
+
+Those who know GF will already have recognized the introduction
+as a seductive argument leading up to GF.
+
+In GF,
+- construction functions = **abstract syntax**
+- realization functions = **concrete syntax**
+
+
+#NEW
+
+Simplest possible example:
+```
+  abstract Text = {
+    cat Text ;
+    fun Yes : Text ;
+    fun No  : Text ;
+    }
+
+  concrete TextEng of Text = open Prelude in {
+    lincat Text = SS ;
+    lin Yes = ss "yes" ;
+    lin No  = ss "no" ;
+    }
+
+  concrete TextFin of Text = open Prelude in {
+    lincat Text = SS ;
+    lin Yes = ss "kyllä" ;
+    lin No  = ss "ei" ;
+    }
+```
+
+
+#NEW
+
+==Linearization and parsing==
+
+The realization function is, for each language, implemented by
+**linearization rules** (``lin``).
+
+The linearization rules directly give the ``render`` method:
+```
+  render english x = TextEng.lin x
+```
+The GF formalism moreover has the property of **reversibility**:
+- a set of linearization rules automatically generates a parser.
+
+
+%While reversibility has a minor importance for the applications
+%shown above, it is crucial for other applications of GF grammars.
+
+
+#NEW
+
+==Applying GF==
+
+**multilingual grammar** = abstract syntax + concrete syntaxes
+
+Examples of the idea: 
+- domain-specific translation
+- multilingual authoring
+- dialogue systems
+
+
+
+#NEW
+
+==Domain, ontology, idiom==
+
+An abstract syntax has other names:
+- a **semantic model**
+- an **ontology**
+
+
+The concrete syntax defines how the ontology
+is represented in a language.
+
+The concrete syntax has to satisfy the following requirements:
+- linguistic correctness (inflection, agreement, word order,...)
+- semantic correctness (express the concepts properly)
+- conformance to the domain idiom (use proper terms and phrasing)
+
+
+Benefit: translation via semantic model of domain can reach high quality.
+
+Problem: the expertise of both a linguist and a domain expert is required.
+
+
+#NEW
+
+==Example domain==
+
+Arithmetic of natural numbers: abstract syntax
+```
+  cat Prop ; Nat ;
+  fun Even : Nat -> Prop ;
+```
+**Concrete syntax**: mapping from abstract syntax trees to strings in a language
+(English, French, German, Swedish,...)
+```
+  lin Even x = {s = x.s ++ "is"  ++ "even"} ; 
+  lin Even x = {s = x.s ++ "est" ++ "pair"} ;
+  lin Even x = {s = x.s ++ "ist" ++ "gerade"} ;
+  lin Even x = {s = x.s ++ "är"  ++ "jämnt"} ;
+```
+
+#NEW
+
+==Translation system==
+
+We can translate using the abstract syntax as interlingua:
+```
+  4 is even                  4 ist gerade
+             \              /
+               Even (NInt 4)
+             /              \
+  4 est pair                  4 är jämnt
+```
+This idea is used e.g. in the WebALT project to generate mathematical
+teaching material in 7 languages.
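+
+The interlingua idea in miniature: parse in one language to the abstract tree,
+then linearize the tree in another. A Haskell sketch using the naive rules of the
+previous slide; the bounded search over the numbers 0..100 is an assumption that
+stands in for a real parser:
+```
+  data Nat  = NInt Int deriving (Show, Eq)
+  data Prop = Even Nat deriving (Show, Eq)
+
+  data Language = English | French | German | Swedish
+    deriving (Show, Eq)
+
+  -- Linearization: the naive rule for each language.
+  lin :: Language -> Prop -> String
+  lin lang (Even (NInt n)) = unwords (show n : predicate lang)
+    where
+      predicate English = ["is", "even"]
+      predicate French  = ["est", "pair"]
+      predicate German  = ["ist", "gerade"]
+      predicate Swedish = ["är", "jämnt"]
+
+  -- Naive parser: search a bounded set of trees for those matching the input.
+  parse :: Language -> String -> [Prop]
+  parse lang str = [t | n <- [0 .. 100], let t = Even (NInt n), lin lang t == str]
+
+  -- Translation with the abstract syntax as interlingua.
+  translate :: Language -> Language -> String -> [String]
+  translate from to str = map (lin to) (parse from str)
+```
+For example, ``translate English German "4 is even"`` gives ``["4 ist gerade"]``.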
+
+But is it really so simple?
+
+
+#NEW
+==Difficulties with concrete syntax==
+
+The previous multilingual grammar violates linguistic correctness in many situations:
+```
+  2 and 3 is even
+  la somme de 3 et de 5 est pair
+  wenn 2 ist gerade, dann 2+2 ist gerade
+  om x är jämnt, summan av x och 2 är jämnt
+```
+All these sentences are grammatically incorrect.
+
+
+
+#NEW
+
+==Solving the difficulties==
+
+GF //can// express the linguistic rules that are needed to
+produce correct translations:
+
+In addition to strings, we use **parameters**, **tables**,
+and **record types**. For instance, French:
+```
+  param Mod = Ind | Subj ;
+  param Gen = Masc | Fem ;
+
+  lincat Nat  = {s : Str ; g : Gen} ;
+  lincat Prop = {s : Mod => Str} ;
+
+  lin Even x = {s =
+      table {
+        m => x.s ++
+             case m   of {Ind  => "est" ;  Subj => "soit"} ++
+             case x.g of {Masc => "pair" ; Fem  => "paire"}
+        }
+      } ;
+```
+Linguistic knowledge accounts for most of the size of this grammar.
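+
+For readers who know functional programming: a GF table over a finite parameter
+type behaves much like a function on that type. A rough Haskell analogue of the
+French rule above; the record fields mirror the ``lincat`` definitions, but the
+encoding and the name ``evenFre`` are ours, not GF's:
+```
+  data Mod = Ind | Subj deriving (Show, Eq)
+  data Gen = Masc | Fem deriving (Show, Eq)
+
+  -- lincat Nat = {s : Str ; g : Gen}
+  data Nat = Nat { s :: String, g :: Gen }
+
+  -- lincat Prop = {s : Mod => Str}: the table becomes a function on Mod
+  newtype Prop = Prop { table :: Mod -> String }
+
+  -- lin Even: verb form depends on the mood, adjective form on the gender
+  evenFre :: Nat -> Prop
+  evenFre x = Prop $ \m -> unwords
+    [ s x
+    , case m of { Ind -> "est"; Subj -> "soit" }
+    , case g x of { Masc -> "pair"; Fem -> "paire" }
+    ]
+```
+The feminine subject //la somme de 3 et de 5// now gets //paire//, and the
+subjunctive row gives //soit// where a construction requires it.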
+
+
+#NEW
+
+==Application grammars vs. resource grammars==
+
+Application grammar ("semantic grammar")
+- abstract syntax: domain semantics
+- concrete syntax: "controlled language"
+- author: domain expert
+
+
+Resource grammar ("syntactic grammar")
+- abstract syntax: linguistic structures
+- concrete syntax: (approximation of) entire language
+- author: linguist
+
+
+#NEW
+==GF as programming language==
+
+The expressive power is between TAG and HPSG.
+
+The language is more high-level: a modern, **typed functional programming language**.
+
+It enables linguistic generalizations and abstractions.
+
+But we don't want to bother application grammarians with these details.
+
+We have built a **module system** that can hide details.
+
+
+#NEW
+
+==Concrete syntax using library==
+
+Assume the following API
+```
+  cat S ; NP ; A ;
+
+  fun predA : A -> NP -> S ;
+
+  oper regA : Str -> A ;
+```
+Now implement ``Even`` for four languages:
+```
+  lincat
+    Prop = S ;
+    Nat  = NP ;
+  lin
+    Even = predA (regA "even") ;   -- English
+    Even = predA (regA "jämn") ;   -- Swedish
+    Even = predA (regA "pair") ;   -- French
+    Even = predA (regA "gerade") ; -- German
+```
+Notice: the choice of adjective is domain expert knowledge.
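+
+To illustrate what the library hides from the domain expert, here is a
+hypothetical Swedish-only Haskell sketch of ``regA`` and ``predA``. Treating
+//2// as neuter follows the //2 är jämnt// example; forming the neuter by adding
+//t// is a simplification (it fails e.g. for //röd//, neuter //rött//):
+```
+  data Gender = Utr | Neutr deriving (Show, Eq)
+
+  data NP = NP { npStr :: String, npGen :: Gender }
+  data A  = A { utrForm :: String, neutrForm :: String }
+  newtype S = S String deriving (Show, Eq)
+
+  -- Regular adjective: neuter formed by adding "t" (a simplification).
+  regA :: String -> A
+  regA jamn = A { utrForm = jamn, neutrForm = jamn ++ "t" }
+
+  -- Predication: the adjective agrees in gender with the subject.
+  predA :: A -> NP -> S
+  predA a np = S (unwords [npStr np, "är", form])
+    where form = if npGen np == Neutr then neutrForm a else utrForm a
+
+  -- The application grammar only picks the adjective.
+  evenSwe :: NP -> S
+  evenSwe = predA (regA "jämn")
+```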
+
+
+#NEW
+==Design questions for the grammar library==
+
+What should there be in the library?
+- morphology, lexicon, syntax, semantics,...
+
+
+How do we organize and present the library?
+- division into modules, level of granularity
+- "school grammar" vs. sophisticated linguistic concepts
+
+
+Where to get the data from?
+- automatic extraction or hand-writing?
+- reuse of existing resources?
+
+
+Extra constraint: we want open-source free software and
+hence cannot use existing proprietary resources.
+
+
+#NEW
+==Design decisions==
+
+Coverage, for each language:
+- complete morphology
+- lexicon of the most important structural words
+- test lexicon of ca. 300 content words
+- representative fragment of syntax (cf. CLE (Core Language Engine))
+- rather flat semantics (cf. Quasi-Logical Form of CLE)
+
+
+Organization:
+- top-level (API) modules
+- Ground API + special-purpose APIs ("macro packages")
+- "school grammar" concepts rather than advanced linguistic theory
+
+
+Presentation:
+- tool ``gfdoc`` for generating HTML from grammars
+- example collections
+
+
+#NEW
+==Design decisions, cont'd==
+
+Where do we get the data from?
+- morphology and syntax are hand-written
+- the test lexicon is hand-written
+- APIs for manual lexicon extension
+- tool for automatic lexicon extraction
+- we have not reused existing resources
+
+
+The resource grammar library is entirely open-source free software
+(under the GNU GPL).
+
+
+
+
+
+#NEW
+==Success criteria and evaluation==
+
+Grammatical correctness of everything generated.
+
+Semantic coverage: you can express whatever you want.
+
+Usability as library for non-linguists.
+
+Evaluation: tested in third-party projects.
+
+Tools for regression testing (treebank generation and comparison).
+
+
+
+#NEW
+==These are not our success criteria==
+
+Language coverage: 
+- to be able to parse all expressions.
+- Example: French //passé simple//, although covered by the
+morphology, is not available through the language-independent API.
+- But: this is being reconsidered, to improve example-based grammar writing
+
+
+Semantic correctness: 
+- only to produce meaningful expressions.
+- Example: the following sentences can be generated
+```
+  colourless green ideas sleep furiously
+  the time is seventy past forty-two
+```
+
+
+Linguistic innovation in syntax:
+- rather a presentation of "known facts"
+- innovation would be hidden from users anyway...
+
+
+
+#NEW
+==Where is semantics?==
+
+Application grammars use domain-specific
+semantics to guarantee semantic well-formedness.
+
+GF incorporates a **Logical Framework** and can express
+- logical semantics //à la// Montague
+- anaphora and discourse using dependent types
+
+
+The language-independent API is a rough semantic model.
+
+But we do //not// try to give semantics once and
+for all for the whole language.
+
+
+#NEW
+==Representations in different APIs==
+
+**Grammar composition**: any grammar can serve as resource to another one.
+
+No fixed set of representation levels; here are some examples for
+```
+  2 is even
+  2 är jämnt
+```
+In ``Arithm``
+```
+  Even 2
+```
+In ``Predication`` (high level resource API)
+```
+  predA (IntNP 2) (regA "even")
+  predA (IntNP 2) (regA "jämn")
+```
+In ``Lang`` (ground level resource API)
+```
+  UseCl TPres ASimul PPos (PredVP (UsePN (IntPN 2)) 
+    (UseComp (CompAP (PositA (regA "even")))))
+  UseCl TPres ASimul PPos (PredVP (UsePN (IntPN 2)) 
+    (UseComp (CompAP (PositA (regA "jämn")))))
+```
+
+
+
+#NEW
+==Languages==
+
+The current GF Resource Project covers ten languages:
+- ``Dan``ish
+- ``Eng``lish
+- ``Fin``nish
+- ``Fre``nch
+- ``Ger``man
+- ``Ita``lian
+- ``Nor``wegian (bokmål)
+- ``Rus``sian
+- ``Spa``nish
+- ``Swe``dish
+
+
+Implementation of API v 1.0 projected for the end of February.
+
+In addition, we have parts (morphology) of Arabic, Estonian, Latin, and Urdu.
+
+
+#NEW
+==Library structure 1: language-independent API==
+
+[Lang.png]
+
+[Resource index page index.html]
+
+[Examples of each category  gfdoc/Cat.html]
+
+Cf. "matrix" in BLARK, LinGo
+
+
+#NEW
+==Library structure 2: language-dependent APIs==
+
+- morphological paradigms, e.g. ``ParadigmsSwe``
+```
+  mkN  : (man,mannen,män,männen : Str) -> N ;   -- worst-case nouns
+  regV : (leker : Str) -> V ;                   -- regular verbs
+```
+- irregular words esp. verbs, e.g. ``IrregSwe``
+```
+  angripa_V = irregV "angripa" "angrep" "angripit" ;
+```
+- extended syntax with language-specific rules, e.g. ``ExtNor``
+```
+  PostPoss : CN -> Pron -> NP ;     -- bilen min
+```
+
+
+
+#NEW
+==Difficulties encountered==
+
+English: negation and auxiliary vs. non-auxiliary verbs
+
+Finnish: object case
+
+German: double infinitives
+
+Romance: clitic pronouns
+
+Scandinavian: determiners
+
+//In particular//: how to make the grammars efficient
+
+
+#NEW
+==How much can be language-independent?==
+
+For the ten languages we have considered, it //is// possible
+to implement the current API.
+
+Reservations:
+
+- does not necessarily extend to all other languages
+- does not necessarily cover the most idiomatic expressions of each language
+- may not be the easiest API to implement 
+  - e.g. negation and inversion with //do// in English suggest that some other
+  structure would be more natural
+
+
+- the structures may not have the same semantics in all different languages
+
+
+#NEW
+==Using the library==
+
+Simplest case: use the API in the same way for all languages.
+- **+** grammar localization for free
+- **-** not the best idioms for each language
+
+
+In practice: use the API in different ways for different languages
+```
+  -- Eng: x's name is y
+  Name x y = predNP (GenCN x (regN "name")) (StringNP y) 
+  -- Swe: x heter y
+  Name x y = predV2 x heta_V2 (StringNP y)               
+```
+This amounts to **compile-time transfer**.
+
+Surprisingly, writing an application grammar requires more native-speaker knowledge
+than writing a resource grammar!
+
+
+#NEW
+==Parametrized modules==
+
+We can go even further than sharing an abstract API: we can share implementations
+among related languages.
+
+Exploited in two families:
+- Romance: French, Italian, Spanish
+- Scandinavian: Danish, Norwegian, Swedish
+
+
+[The declarations of Scandinavian syntax differences  ../scandinavian/DiffScand.gf]
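+
+In Haskell terms, a parametrized module is roughly code written once against a
+record of language-specific differences. A minimal sketch with hypothetical
+names (the actual contents of ``DiffScand`` are not reproduced here), using
+double definiteness, which Swedish has and Danish lacks after an adjective,
+as in //den unga kvinnan// vs. //den unge kvinde//:
+```
+  -- What differs between the languages (hypothetical, cf. DiffScand).
+  data Diff = Diff { article :: String, doubleDefinite :: Bool }
+
+  data Noun = Noun { indefForm :: String, defForm :: String }
+
+  swedish, danish :: Diff
+  swedish = Diff { article = "den", doubleDefinite = True }
+  danish  = Diff { article = "den", doubleDefinite = False }
+
+  -- Shared implementation, written once for all the languages:
+  -- a definite noun phrase with an adjective.
+  defAdjNP :: Diff -> String -> Noun -> String
+  defAdjNP d adj n = unwords [article d, adj, noun]
+    where noun = if doubleDefinite d then defForm n else indefForm n
+```
+Then ``defAdjNP swedish "unga" (Noun "kvinna" "kvinnan")`` gives //den unga kvinnan//
+and ``defAdjNP danish "unge" (Noun "kvinde" "kvinden")`` gives //den unge kvinde//.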
+
+
+
+
+
+
+#NEW
+==Lexicon extension==
+
+We cannot anticipate all vocabulary needed in application grammars.
+
+Therefore we provide high-level paradigms to add new words.
+
+Example heuristic, from [ParadigmsSwe gfdoc/ParadigmsSwe.html]:
+```
+  regV : (leker : Str) -> V ;
+
+  regV leker = case leker of {
+    lek + ("a" | "ar")  => conj1 (lek + "a") ;
+    lek + "er"          => conj2 (lek + "a") ;
+    bo  + "r"           => conj3 bo
+    }
+```
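+
+The same suffix heuristic written in Haskell, as a sketch: it returns the
+guessed conjugation class together with the form that the GF code passes to
+``conj1``, ``conj2``, or ``conj3``. Unlike the GF ``case``, it returns
+``Nothing`` for forms that match no pattern:
+```
+  import Data.List (isSuffixOf)
+
+  data Conj = Conj1 | Conj2 | Conj3 deriving (Show, Eq)
+
+  -- Guess the conjugation and the form passed to conj1/conj2/conj3.
+  regV :: String -> Maybe (Conj, String)
+  regV w
+    | "ar" `isSuffixOf` w = Just (Conj1, dropEnd 1 w)         -- talar -> tala
+    | "a"  `isSuffixOf` w = Just (Conj1, w)                   -- tala  -> tala
+    | "er" `isSuffixOf` w = Just (Conj2, dropEnd 2 w ++ "a")  -- leker -> leka
+    | "r"  `isSuffixOf` w = Just (Conj3, dropEnd 1 w)         -- bor   -> bo
+    | otherwise           = Nothing
+    where
+      dropEnd k xs = take (length xs - k) xs
+```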
+
+#NEW
+==Example low-level morphological definition==
+
+```
+  decl2Noun : Str -> N = \bil ->
+    let 
+      bb : Str * Str = case bil of {
+        pojk + "e"                 => <pojk + "ar",    bil  + "n"> ;
+        nyck + "e" + l@("l" | "r") => <nyck + l + "ar",bil  + "n"> ;
+        sock + "e" + "n"           => <sock + "nar",   sock + "nen"> ;
+        _                          => <bil + "ar",     bil  + "en">
+        } ;
+    in mkN bil bb.p2 bb.p1 (bb.p1 + "na") ;
+```
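+
+A Haskell transcription of ``decl2Noun``, as a sketch: the four forms are
+returned in the order used by ``mkN`` (singular, singular definite, plural,
+plural definite), and the GF string patterns are replaced by suffix tests:
+```
+  import Data.List (isSuffixOf)
+
+  -- Forms in the order of mkN: sg, sg definite, pl, pl definite.
+  data N = N { sg, sgDef, pl, plDef :: String } deriving (Show, Eq)
+
+  decl2Noun :: String -> N
+  decl2Noun bil = N bil sgDefForm plForm (plForm ++ "na")
+    where
+      (plForm, sgDefForm)
+        | "e" `isSuffixOf` bil                = (stem 1 ++ "ar", bil ++ "n")               -- pojke
+        | any (`isSuffixOf` bil) ["el", "er"] = (stem 2 ++ [last bil] ++ "ar", bil ++ "n") -- nyckel
+        | "en" `isSuffixOf` bil               = (stem 2 ++ "nar", stem 2 ++ "nen")         -- socken
+        | otherwise                           = (bil ++ "ar", bil ++ "en")                 -- bil
+      stem k = take (length bil - k) bil
+```
+For example, ``decl2Noun "nyckel"`` gives //nyckel, nyckeln, nycklar, nycklarna//,
+and ``decl2Noun "socken"`` gives //socken, socknen, socknar, socknarna//.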
+
+
+#NEW
+==Some formats that can be generated from GF grammars==
+
+```
+-printer=lbnf           BNF Converter, thereby C/Bison, Java/JavaCup
+-printer=fullform       full-form lexicon, short format
+-printer=xml            XML: DTD for the pg command, object for st
+-printer=gsl            Nuance GSL speech recognition grammar
+-printer=jsgf           Java Speech Grammar Format
+-printer=srgs_xml       SRGS XML format
+-printer=srgs_xml_prob  SRGS XML format, with weights
+-printer=slf            a finite automaton in the HTK SLF format
+-printer=regular        a regular grammar in a simple BNF
+-printer=gfc-prolog     gfc in prolog format (also pg)
+```
+
+
+#NEW
+==Use as program components==
+
+Haskell, Java, Prolog
+
+Parsing, generation, translation
+
+Push-button creation of spoken language translators (using Nuance)
+
+
+
+
+#NEW
+==Grammar library as linguistic resource==
+
+Can we use the libraries outside domain-specific fragments?
+
+We seem to be approaching full coverage from below.
+
+The resource API is not good for heavy-duty parsing (too abstract and 
+therefore too inefficient).
+
+Two ideas:
+- write shallow parsers as application grammars
+- generate corpora and use statistical parsing methods
+
+
+
+#NEW
+==Corpus generation==
+
+The most general format is **multilingual treebank** generation:
+```
+  > gr -tr | l -multi
+  UseCl TCond AAnter PNeg (PredVP (DetCN (DetSg DefSg NoOrd) 
+    (AdjCN (PositA young_A) (UseN woman_N))) (ComplV2 love_V2 (UsePron he_Pron)))
+
+  The young woman wouldn't have loved him
+  Den unga kvinnan skulle inte ha älskat honom
+  Den unge kvinna ville ikke ha elska ham
+  La joven mujer no lo habría amado
+  La giovane donna non lo avrebbe amato
+  La jeune femme ne l' aurait pas aimé
+  Nuori nainen ei olisi rakastanut häntä
+```
+Generation can be either exhaustive or random, possibly
+with probability weights attached to constructors.
+
+A special case is **corpus generation**: keep just one language.
+
+Can this be useful? Cf. Rebecca Jonson this afternoon.
+
+
+#NEW
+==Related work==
+
+CLE = Core Language Engine
+- the closest point of comparison in terms of coverage and purpose
+- resource API similar to "Quasi-Logical Form" 
+- parametrized modules instead of grammar porting via macro packages
+- grammar specialization via partial evaluation instead of explanation-based learning
+  - therefore, transfer at compile time as often as possible
+
+
+LinGo Matrix project (HPSG)
+- methodology rather than formal discipline for multilingual grammars
+- not aimed as library, no grammar specialization?
+- wider coverage - parsing real texts
+
+
+Parsing detached from grammar (Nivre) vs. grammar detached from parsing
+
+#NEW
+==Demo==
+
+Stoneage grammar, based on the Swadesh word list.
+
+Implemented as application on top of the resource grammar.
+
+Illustrate generation and spoken-language parsing.
+
+
+
+%http://www.boost.org/