started CLT sem slides

2006-03-04 13:23:02 +00:00
parent 0eb9f74977
commit 277a333a02
5 changed files with 888 additions and 1 deletions
--- a/lib/resource-1.0/doc/Makefile
+++ b/lib/resource-1.0/doc/Makefile
@@ -1,3 +1,6 @@
+clt:
+	txt2tags clt2006.txt
+	htmls clt2006.html
 gslt:
 	txt2tags gslt-sem-2006.txt
 	htmls gslt-sem-2006.html
--- a/lib/resource-1.0/doc/clt2006.html
+++ b/lib/resource-1.0/doc/clt2006.html
@@ -0,0 +1,474 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
+<HTML>
+<HEAD>
+<META NAME="generator" CONTENT="http://txt2tags.sf.net">
+<TITLE>The GF Resource Grammar Library Version 1.0</TITLE>
+</HEAD><BODY BGCOLOR="white" TEXT="black">
+<P ALIGN="center"><CENTER><H1>The GF Resource Grammar Library Version 1.0</H1>
+<FONT SIZE="4">
+<I>Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;</I><BR>
+Last update: Sat Mar  4 14:20:07 2006
+</FONT></CENTER>
+
+<P>
+<!-- NEW --> 
+</P>
+<H2>Plan</H2>
+<P>
+Purpose
+</P>
+<P>
+Background
+</P>
+<P>
+Coverage
+</P>
+<P>
+Structure
+</P>
+<P>
+How to use
+</P>
+<P>
+How to implement a new language
+</P>
+<P>
+How to extend the API
+</P>
+<P>
+<!-- NEW -->
+</P>
+<H2>Purpose</H2>
+<H3>Library for applications</H3>
+<P>
+High-level access to grammatical rules
+</P>
+<P>
+E.g. <I>You have k new messages</I> rendered in ten languages <I>X</I>
+</P>
+<PRE>
+    render X (Have (You (Number (k (New Message)))))
+</PRE>
+<P></P>
+<P>
+Usability for different purposes
+</P>
+<UL>
+<LI>translation systems
+<LI>software localization
+<LI>dialogue systems
+<LI>language teaching
+</UL>
+
+<P>
+<!-- NEW -->
+</P>
+<H3>Grammar as parser</H3>
+<P>
+Often in NLP, a grammar is just high-level code for a parser.
+</P>
+<P>
+But writing a grammar can be inadequate for parsing:
+</P>
+<UL>
+<LI>too much manual work
+<LI>too inefficient
+<LI>not robust
+<LI>too ambiguous
+</UL>
+
+<P>
+Moreover, a grammar fine-tuned for parsing may not be reusable
+</P>
+<UL>
+<LI>for generation
+<LI>for specialized grammars
+<LI>as library
+</UL>
+
+<P>
+<!-- NEW -->
+</P>
+<H3>Grammar as language definition</H3>
+<P>
+Linguistic ontology: <B>abstract syntax</B>
+</P>
+<P>
+E.g. adjectival modification
+</P>
+<PRE>
+    AdjCN : AP -&gt; CN -&gt; CN ;
+</PRE>
+<P></P>
+<P>
+Rendering in different languages: <B>concrete syntax</B>
+</P>
+<P>
+Resource grammars take a generation perspective rather than a parsing perspective
+</P>
+<UL>
+<LI>abstract syntax serves as a key to expressions in different languages
+</UL>
+
+<P>
+<!-- NEW -->
+</P>
+<H3>Usability by non-linguists</H3>
+<P>
+Division of labour: resource grammars hide linguistic details
+</P>
+<P>
+Presentation: "school grammar" concepts, dictionary-like conventions
+</P>
+<P>
+API = Application Programmer's Interface
+</P>
+<P>
+Documentation: <CODE>gfdoc</CODE>
+</P>
+<P>
+IDE = Interactive Development Environment (forthcoming)
+</P>
+<P>
+Example-based grammar writing
+</P>
+<PRE>
+    render Ita (parse Eng "you have k messages")
+</PRE>
+<P></P>
+<P>
+<!-- NEW -->
+</P>
+<H3>Scientific interest</H3>
+<P>
+Linguistics
+</P>
+<UL>
+<LI>definition of linguistic ontology
+<LI>coping with different problems in different languages
+<LI>sharing concrete-syntax code between languages
+<LI>creating a resource for other NLP applications
+</UL>
+
+<P>
+Computer science
+</P>
+<UL>
+<LI>data structures for grammar rules
+<LI>type systems for grammars
+<LI>algorithms: parsing, generation, grammar compilation
+<LI>domain-specific programming language (GF)
+<LI>module system
+</UL>
+
+<P>
+<!-- NEW -->
+</P>
+<H2>Background</H2>
+<H3>History</H3>
+<P>
+2002: v. 0.2
+</P>
+<UL>
+<LI>English, French, German, Swedish
+</UL>
+
+<P>
+2003: v. 0.6
+</P>
+<UL>
+<LI>module system
+<LI>added Finnish, Italian, Russian
+<LI>used in KeY
+</UL>
+
+<P>
+2005: v. 0.9 
+</P>
+<UL>
+<LI>tenses
+<LI>added Danish, Norwegian, Spanish; no German
+<LI>used in WebALT
+</UL>
+
+<P>
+2006: v. 1.0
+</P>
+<UL>
+<LI>approximate CLE coverage
+<LI>reorganized module system and implementation
+<LI>not yet implemented (as of 4 March 2006) for Danish and Russian
+</UL>
+
+<P>
+<!-- NEW -->
+</P>
+<H3>Authors</H3>
+<P>
+Janna Khegai (Russian modules, forthcoming),
+Björn Bringert (many Swadesh lexica),
+Carlos Gonzalia (Spanish cardinals), 
+Patrik Jansson (Swedish cardinals),
+Aarne Ranta.
+</P>
+<P>
+We are grateful for the contributions and 
+comments of several other people who have used this and 
+previous versions of the resource library, including
+Ana Bove,
+David Burke,
+Lauri Carlson,
+Gloria Casanellas,
+Karin Cavallin,
+Hans-Joachim Daniels,
+Kristofer Johannisson,
+Anni Laine,
+Wanjiku Ng'ang'a,
+Jordi Saludes.
+</P>
+<P>
+<!-- NEW -->
+</P>
+<H3>Related work</H3>
+<P>
+CLE (Core Language Engine, 
+<A HREF="http://mitpress.mit.edu/catalog/item/default.asp?tid=7739&amp;ttype=2">Book 1992</A>)
+</P>
+<UL>
+<LI>English, Swedish, French, Danish
+<LI>uses Definite Clause Grammars, implementation in Prolog
+<LI>coverage for SACTI corpus, 
+  <A HREF="http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=0521770777">Spoken Language Translator (2001)</A>
+<LI>grammar specialization via explanation-based learning
+</UL>
+
+<P>
+<A HREF="http://www.delph-in.net/matrix/">LinGO Grammar Matrix</A>
+</P>
+<UL>
+<LI>English, German, Japanese, Spanish, ...
+<LI>uses HPSG, implementation in LKB
+<LI>a check list for parallel grammar implementations
+</UL>
+
+<P>
+<A HREF="http://www2.parc.com/istl/groups/nltt/pargram/">Pargram</A>
+</P>
+<UL>
+<LI>aimed at: Arabic, Chinese, English, French, German, Hungarian, Japanese, 
+Malagasy, Norwegian, Turkish, Urdu, Vietnamese, and Welsh
+<LI>uses LFG
+<LI>one set of big grammars, transfer rules
+</UL>
+
+<P>
+Rosetta Machine Translation (<A HREF="http://citeseer.ist.psu.edu/181924.html">Book 1994</A>)
+</P>
+<UL>
+<LI>Dutch, English, French
+<LI>uses M-grammars, compositional translation inspired by Montague
+<LI>compositional transfer rules
+</UL>
+
+<P>
+<!-- NEW -->
+</P>
+<H2>Coverage</H2>
+<H3>Languages</H3>
+<P>
+The current GF Resource Project covers ten languages:
+</P>
+<UL>
+<LI><CODE>Dan</CODE>ish
+<LI><CODE>Eng</CODE>lish
+<LI><CODE>Fin</CODE>nish
+<LI><CODE>Fre</CODE>nch
+<LI><CODE>Ger</CODE>man
+<LI><CODE>Ita</CODE>lian
+<LI><CODE>Nor</CODE>wegian (bokmål)
+<LI><CODE>Rus</CODE>sian
+<LI><CODE>Spa</CODE>nish
+<LI><CODE>Swe</CODE>dish
+</UL>
+
+<P>
+In addition, parts (morphology) of Arabic, Estonian, Latin, and Urdu
+</P>
+<P>
+API 1.0 not yet implemented for Danish and Russian
+</P>
+<P>
+<!-- NEW -->
+</P>
+<H3>Morphology</H3>
+<P>
+Complete inflection engine
+</P>
+<UL>
+<LI>all word classes
+<LI>all forms
+<LI>all inflectional paradigms
+</UL>
+
+<P>
+High-level access via <CODE>ParadigmsX</CODE>; e.g. Swedish:
+</P>
+<UL>
+<LI>worst-case functions
+<PRE>
+      mkV : (supa,super,sup,söp,supit,supen : Str) -&gt; V ;
+</PRE>
+<LI>common patterns
+<PRE>
+      regV   : (talar : Str) -&gt; V ;
+      irregV : (dricka, drack, druckit : Str) -&gt; V ;
+</PRE>
+<LI>irregular words in <CODE>IrregX</CODE>:
+<PRE>
+      draga_V : V = 
+        mkV (variants { "dra" ; "draga" }) (variants { "drar" ; "drager" }) 
+            (variants { "dra" ; "drag" }) "drog" "dragit" "dragen" ;
+</PRE>
+</UL>
+
+<P>
+<!-- NEW -->
+</P>
+<H3>Syntactic structures</H3>
+<P>
+<IMG ALIGN="middle" SRC="Lang.png" BORDER="0" ALT="">
+</P>
+<P>
+<!-- NEW -->
+</P>
+<H3>Quantitative measures</H3>
+<P>
+67 categories
+</P>
+<P>
+150 abstract syntax combination rules
+</P>
+<P>
+100 structural words
+</P>
+<P>
+350 content words in a test lexicon
+</P>
+<P>
+Lines of source code (4 March 2006):
+</P>
+<PRE>
+    abstract     1131
+    english      2344
+    german       2386
+    finnish      3396
+    norwegian    1257
+    swedish      1465
+    scandinavian 1023
+    french       3246 -- Besch + Irreg + Morpho 2111
+    italian      7797 -- Besch 6512
+    spanish      7120 -- Besch 5877
+    romance      1066
+</PRE>
+<P></P>
+<P>
+<!-- NEW -->
+</P>
+<H2>Structure</H2>
+<P>
+<!-- NEW -->
+</P>
+<H3>Language-independent ground API</H3>
+<P>
+<!-- NEW -->
+</P>
+<H3>Language-dependent paradigm modules</H3>
+<P>
+<!-- NEW -->
+</P>
+<H3>Language-dependent syntax extensions</H3>
+<P>
+<!-- NEW -->
+</P>
+<H3>Special-purpose APIs</H3>
+<P>
+<!-- NEW -->
+</P>
+<H3>How to use as top-level grammar</H3>
+<P>
+<!-- NEW -->
+</P>
+<H3>Parsing</H3>
+<P>
+<!-- NEW -->
+</P>
+<H3>Treebank generation</H3>
+<P>
+<!-- NEW -->
+</P>
+<H3>Treebank-based parsing</H3>
+<P>
+<!-- NEW -->
+</P>
+<H3>Morphology</H3>
+<P>
+<!-- NEW -->
+</P>
+<P>
+<!-- NEW -->
+</P>
+<H3>Syntax editing</H3>
+<P>
+<!-- NEW -->
+</P>
+<H3>Efficient parsing via application grammar</H3>
+<P>
+<!-- NEW -->
+</P>
+<H2>How to use as library</H2>
+<H3>Specialization through parametrized modules</H3>
+<P>
+<!-- NEW -->
+</P>
+<H3>Compile-time transfer</H3>
+<P>
+<!-- NEW -->
+</P>
+<H3>A natural division into modules</H3>
+<P>
+<!-- NEW -->
+</P>
+<H3>Example-based grammar writing</H3>
+<P>
+<!-- NEW -->
+</P>
+<H2>How to implement a new language</H2>
+<H3>Ordinary modules</H3>
+<P>
+<!-- NEW -->
+</P>
+<H3>Parametrized modules</H3>
+<P>
+<!-- NEW -->
+</P>
+<H3>The kernel of the API</H3>
+<P>
+<!-- NEW -->
+</P>
+<H3>How to proceed</H3>
+<P>
+<!-- NEW -->
+</P>
+<H2>How to extend the API</H2>
+<P>
+<!-- NEW -->
+</P>
+<H3>Extend old modules or add a new one?</H3>
+
+<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
+<!-- cmdline: txt2tags clt2006.txt -->
+</BODY></HTML>
--- a/lib/resource-1.0/doc/clt2006.txt
+++ b/lib/resource-1.0/doc/clt2006.txt
@@ -0,0 +1,409 @@
+The GF Resource Grammar Library Version 1.0
+Author: Aarne Ranta <aarne (at) cs.chalmers.se>
+Last update: %%date(%c)
+
+% NOTE: this is a txt2tags file.
+% Create an html file from this file using:
+% txt2tags --toc clt2006.txt
+
+%!target:html
+
+%!postproc(html): #NEW <!-- NEW -->
+
+
+#NEW 
+
+==Plan==
+
+Purpose
+
+Background
+
+Coverage
+
+Structure
+
+How to use
+
+How to implement a new language
+
+How to extend the API
+
+
+
+#NEW
+
+==Purpose==
+
+===Library for applications===
+
+High-level access to grammatical rules
+
+E.g. //You have k new messages// rendered in ten languages //X//
+```
+  render X (Have (You (Number (k (New Message)))))
+```
+
+Usability for different purposes
+- translation systems
+- software localization
+- dialogue systems
+- language teaching
+
+
+#NEW
+
+===Grammar as parser===
+
+Often in NLP, a grammar is just high-level code for a parser.
+
+But writing a grammar can be inadequate for parsing:
+- too much manual work
+- too inefficient
+- not robust
+- too ambiguous
+
+
+Moreover, a grammar fine-tuned for parsing may not be reusable
+- for generation
+- for specialized grammars
+- as library
+
+
+#NEW
+
+===Grammar as language definition===
+
+Linguistic ontology: **abstract syntax**
+
+E.g. adjectival modification
+```
+  AdjCN : AP -> CN -> CN ;
+```
+
+Rendering in different languages: **concrete syntax**
+
+Resource grammars take a generation perspective rather than a parsing perspective
+- abstract syntax serves as a key to expressions in different languages
+
+
+
+#NEW
+
+===Usability by non-linguists===
+
+Division of labour: resource grammars hide linguistic details
+
+Presentation: "school grammar" concepts, dictionary-like conventions
+
+API = Application Programmer's Interface
+
+Documentation: ``gfdoc``
+
+IDE = Interactive Development Environment (forthcoming)
+
+Example-based grammar writing
+```
+  render Ita (parse Eng "you have k messages")
+```
+
+
+#NEW
+
+===Scientific interest===
+
+Linguistics
+- definition of linguistic ontology
+- coping with different problems in different languages
+- sharing concrete-syntax code between languages
+- creating a resource for other NLP applications
+
+
+Computer science
+- data structures for grammar rules
+- type systems for grammars
+- algorithms: parsing, generation, grammar compilation
+- domain-specific programming language (GF)
+- module system
+
+
+
+#NEW
+
+==Background==
+
+===History===
+
+2002: v. 0.2
+- English, French, German, Swedish
+
+
+2003: v. 0.6
+- module system
+- added Finnish, Italian, Russian
+- used in KeY
+
+
+2005: v. 0.9 
+- tenses
+- added Danish, Norwegian, Spanish; no German
+- used in WebALT
+
+
+2006: v. 1.0
+- approximate CLE coverage
+- reorganized module system and implementation
+- not yet implemented (as of 4 March 2006) for Danish and Russian
+
+
+#NEW
+
+===Authors===
+
+Janna Khegai (Russian modules, forthcoming),
+Björn Bringert (many Swadesh lexica),
+Carlos Gonzalia (Spanish cardinals), 
+Patrik Jansson (Swedish cardinals),
+Aarne Ranta.
+
+We are grateful for the contributions and 
+comments of several other people who have used this and 
+previous versions of the resource library, including
+Ana Bove,
+David Burke,
+Lauri Carlson,
+Gloria Casanellas,
+Karin Cavallin,
+Hans-Joachim Daniels,
+Kristofer Johannisson,
+Anni Laine,
+Wanjiku Ng'ang'a,
+Jordi Saludes.
+
+
+#NEW
+
+===Related work===
+
+CLE (Core Language Engine, 
+[Book 1992 http://mitpress.mit.edu/catalog/item/default.asp?tid=7739&ttype=2])
+- English, Swedish, French, Danish
+- uses Definite Clause Grammars, implementation in Prolog
+- coverage for SACTI corpus, 
+  [Spoken Language Translator (2001) http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=0521770777]
+- grammar specialization via explanation-based learning
+
+
+[LinGO Grammar Matrix http://www.delph-in.net/matrix/]
+- English, German, Japanese, Spanish, ...
+- uses HPSG, implementation in LKB
+- a check list for parallel grammar implementations
+
+
+[Pargram http://www2.parc.com/istl/groups/nltt/pargram/]
+- aimed at: Arabic, Chinese, English, French, German, Hungarian, Japanese, 
+Malagasy, Norwegian, Turkish, Urdu, Vietnamese, and Welsh
+- uses LFG
+- one set of big grammars, transfer rules
+
+
+Rosetta Machine Translation ([Book 1994 http://citeseer.ist.psu.edu/181924.html])
+- Dutch, English, French
+- uses M-grammars, compositional translation inspired by Montague
+- compositional transfer rules
+
+
+#NEW
+
+==Coverage==
+
+===Languages===
+
+The current GF Resource Project covers ten languages:
+- ``Dan``ish
+- ``Eng``lish
+- ``Fin``nish
+- ``Fre``nch
+- ``Ger``man
+- ``Ita``lian
+- ``Nor``wegian (bokmål)
+- ``Rus``sian
+- ``Spa``nish
+- ``Swe``dish
+
+
+In addition, parts (morphology) of Arabic, Estonian, Latin, and Urdu
+
+API 1.0 not yet implemented for Danish and Russian
+
+
+
+#NEW
+
+===Morphology===
+
+Complete inflection engine
+- all word classes
+- all forms
+- all inflectional paradigms
+
+
+High-level access via ``ParadigmsX``; e.g. Swedish:
+- worst-case functions
+```
+    mkV : (supa,super,sup,söp,supit,supen : Str) -> V ;
+```
+- common patterns
+```
+    regV   : (talar : Str) -> V ;
+    irregV : (dricka, drack, druckit : Str) -> V ;
+```
+- irregular words in ``IrregX``:
+```
+    draga_V : V = 
+      mkV (variants { "dra" ; "draga" }) (variants { "drar" ; "drager" }) 
+          (variants { "dra" ; "drag" }) "drog" "dragit" "dragen" ;
+```
+
+
+
+
+
+
+#NEW
+
+===Syntactic structures===
+
+[Lang.png]
+
+
+#NEW
+
+===Quantitative measures===
+
+67 categories
+
+150 abstract syntax combination rules
+
+100 structural words
+
+350 content words in a test lexicon
+
+Lines of source code (4 March 2006):
+```
+  abstract     1131
+  english      2344
+  german       2386
+  finnish      3396
+  norwegian    1257
+  swedish      1465
+  scandinavian 1023
+  french       3246 -- Besch + Irreg + Morpho 2111
+  italian      7797 -- Besch 6512
+  spanish      7120 -- Besch 5877
+  romance      1066
+```
+
+
+#NEW
+
+==Structure==
+
+#NEW
+
+===Language-independent ground API===
+
+#NEW
+
+===Language-dependent paradigm modules===
+
+#NEW
+
+===Language-dependent syntax extensions===
+
+#NEW
+
+===Special-purpose APIs===
+
+
+
+#NEW
+
+===How to use as top-level grammar===
+
+#NEW
+
+===Parsing===
+
+#NEW
+
+===Treebank generation===
+
+#NEW
+
+===Treebank-based parsing===
+
+#NEW
+
+===Morphology===
+
+#NEW
+
+#NEW
+
+===Syntax editing===
+
+#NEW
+
+===Efficient parsing via application grammar===
+
+
+
+#NEW
+
+==How to use as library==
+
+===Specialization through parametrized modules===
+
+#NEW
+
+===Compile-time transfer===
+
+#NEW
+
+===A natural division into modules===
+
+#NEW
+
+===Example-based grammar writing===
+
+
+
+#NEW
+
+==How to implement a new language==
+
+===Ordinary modules===
+
+#NEW
+
+===Parametrized modules===
+
+#NEW
+
+===The kernel of the API===
+
+#NEW
+
+===How to proceed===
+
+
+
+#NEW
+
+==How to extend the API==
+
+#NEW
+
+===Extend old modules or add a new one?===
+
--- a/lib/resource-1.0/doc/gslt-sem-2006.html
+++ b/lib/resource-1.0/doc/gslt-sem-2006.html
@@ -7,7 +7,7 @@
 <P ALIGN="center"><CENTER><H1>Grammars as Software Libraries</H1>
 <FONT SIZE="4">
 <I>Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;</I><BR>
-Last update: Thu Feb  9 13:03:45 2006
+Last update: Sat Mar  4 14:16:15 2006
 </FONT></CENTER>

 <P>
--- a/lib/resource-1.0/doc/index.txt
+++ b/lib/resource-1.0/doc/index.txt
@@ -34,6 +34,7 @@ Aarne Ranta.
 We are grateful for contributions and 
 comments to several other people who have used this and 
 the previous versions of the resource library, including
+Ana Bove,
 David Burke,
 Lauri Carlson,
 Gloria Casanellas,