forked from GitHub/gf-core

started CLT sem slides

This commit is contained in:
aarne
2006-03-04 13:23:02 +00:00
parent 0eb9f74977
commit 277a333a02
5 changed files with 888 additions and 1 deletions

View File

@@ -1,3 +1,6 @@
clt:
txt2tags clt2006.txt
htmls clt2006.html
gslt:
txt2tags gslt-sem-2006.txt
htmls gslt-sem-2006.html

View File

@@ -0,0 +1,474 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
<TITLE>The GF Resource Grammar Library Version 1.0</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">
<P ALIGN="center"><CENTER><H1>The GF Resource Grammar Library Version 1.0</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;</I><BR>
Last update: Sat Mar 4 14:20:07 2006
</FONT></CENTER>
<P>
<!-- NEW -->
</P>
<H2>Plan</H2>
<P>
Purpose
</P>
<P>
Background
</P>
<P>
Coverage
</P>
<P>
Structure
</P>
<P>
How to use
</P>
<P>
How to implement a new language
</P>
<P>
How to extend the API
</P>
<P>
<!-- NEW -->
</P>
<H2>Purpose</H2>
<H3>Library for applications</H3>
<P>
High-level access to grammatical rules
</P>
<P>
E.g. <I>You have k new messages</I> rendered in ten languages <I>X</I>
</P>
<PRE>
render X (Have (You (Number (k (New Message)))))
</PRE>
<P></P>
<P>
Usability for different purposes
</P>
<UL>
<LI>translation systems
<LI>software localization
<LI>dialogue systems
<LI>language teaching
</UL>
<P>
<!-- NEW -->
</P>
<H3>Grammar as parser</H3>
<P>
Often in NLP, a grammar is just high-level code for a parser.
</P>
<P>
But writing a grammar can be inadequate for parsing:
</P>
<UL>
<LI>too much manual work
<LI>too inefficient
<LI>not robust
<LI>too ambiguous
</UL>
<P>
Moreover, a grammar fine-tuned for parsing may not be reusable
</P>
<UL>
<LI>for generation
<LI>for specialized grammars
<LI>as library
</UL>
<P>
<!-- NEW -->
</P>
<H3>Grammar as language definition</H3>
<P>
Linguistic ontology: <B>abstract syntax</B>
</P>
<P>
E.g. adjectival modification
</P>
<PRE>
AdjCN : AP -&gt; CN -&gt; CN ;
</PRE>
<P></P>
<P>
Rendering in different languages: <B>concrete syntax</B>
</P>
<P>
Resource grammars take a generation perspective rather than a parsing perspective
</P>
<UL>
<LI>abstract syntax serves as a key to expressions in different languages
</UL>
<P>
<!-- NEW -->
</P>
<H3>Usability by non-linguists</H3>
<P>
Division of labour: resource grammars hide linguistic details
</P>
<P>
Presentation: "school grammar" concepts, dictionary-like conventions
</P>
<P>
API = Application Programmer's Interface
</P>
<P>
Documentation: <CODE>gfdoc</CODE>
</P>
<P>
IDE = Interactive Development Environment (forthcoming)
</P>
<P>
Example-based grammar writing
</P>
<PRE>
render Ita (parse Eng "you have k messages")
</PRE>
<P></P>
<P>
<!-- NEW -->
</P>
<H3>Scientific interest</H3>
<P>
Linguistics
</P>
<UL>
<LI>definition of linguistic ontology
<LI>coping with different problems in different languages
<LI>sharing concrete-syntax code between languages
<LI>creating a resource for other NLP applications
</UL>
<P>
Computer science
</P>
<UL>
<LI>data structures for grammar rules
<LI>type systems for grammars
<LI>algorithms: parsing, generation, grammar compilation
<LI>domain-specific programming language (GF)
<LI>module system
</UL>
<P>
<!-- NEW -->
</P>
<H2>Background</H2>
<H3>History</H3>
<P>
2002: v. 0.2
</P>
<UL>
<LI>English, French, German, Swedish
</UL>
<P>
2003: v. 0.6
</P>
<UL>
<LI>module system
<LI>added Finnish, Italian, Russian
<LI>used in KeY
</UL>
<P>
2005: v. 0.9
</P>
<UL>
<LI>tenses
<LI>added Danish, Norwegian, Spanish; no German
<LI>used in WebALT
</UL>
<P>
2006: v. 1.0
</P>
<UL>
<LI>approximate CLE coverage
<LI>reorganized module system and implementation
<LI>not yet (4/3/2006) for Danish and Russian
</UL>
<P>
<!-- NEW -->
</P>
<H3>Authors</H3>
<P>
Janna Khegai (Russian modules, forthcoming),
Bjorn Bringert (many Swadesh lexica),
Carlos Gonzalia (Spanish cardinals),
Patrik Jansson (Swedish cardinals),
Aarne Ranta.
</P>
<P>
We are grateful for the contributions and
comments of several other people who have used this and
the previous versions of the resource library, including
Ana Bove,
David Burke,
Lauri Carlson,
Gloria Casanellas,
Karin Cavallin,
Hans-Joachim Daniels,
Kristofer Johannisson,
Anni Laine,
Wanjiku Ng'ang'a,
Jordi Saludes.
</P>
<P>
<!-- NEW -->
</P>
<H3>Related work</H3>
<P>
CLE (Core Language Engine,
<A HREF="http://mitpress.mit.edu/catalog/item/default.asp?tid=7739&amp;ttype=2">Book 1992</A>)
</P>
<UL>
<LI>English, Swedish, French, Danish
<LI>uses Definite Clause Grammars, implementation in Prolog
<LI>coverage for SACTI corpus,
<A HREF="http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=0521770777">Spoken Language Translator (2001)</A>
<LI>grammar specialization via explanation-based learning
</UL>
<P>
<A HREF="http://www.delph-in.net/matrix/">LinGO Grammar Matrix</A>
</P>
<UL>
<LI>English, German, Japanese, Spanish, ...
<LI>uses HPSG, implementation in LKB
<LI>a check list for parallel grammar implementations
</UL>
<P>
<A HREF="http://www2.parc.com/istl/groups/nltt/pargram/">Pargram</A>
</P>
<UL>
<LI>Aimed: Arabic, Chinese, English, French, German, Hungarian, Japanese,
Malagasy, Norwegian, Turkish, Urdu, Vietnamese, and Welsh
<LI>uses LFG
<LI>one set of big grammars, transfer rules
</UL>
<P>
Rosetta Machine Translation (<A HREF="http://citeseer.ist.psu.edu/181924.html">Book 1994</A>)
</P>
<UL>
<LI>Dutch, English, French
<LI>uses M-grammars, compositional translation inspired by Montague
<LI>compositional transfer rules
</UL>
<P>
<!-- NEW -->
</P>
<H2>Coverage</H2>
<H3>Languages</H3>
<P>
The current GF Resource Project covers ten languages:
</P>
<UL>
<LI><CODE>Dan</CODE>ish
<LI><CODE>Eng</CODE>lish
<LI><CODE>Fin</CODE>nish
<LI><CODE>Fre</CODE>nch
<LI><CODE>Ger</CODE>man
<LI><CODE>Ita</CODE>lian
<LI><CODE>Nor</CODE>wegian (bokmål)
<LI><CODE>Rus</CODE>sian
<LI><CODE>Spa</CODE>nish
<LI><CODE>Swe</CODE>dish
</UL>
<P>
In addition, parts (morphology) of Arabic, Estonian, Latin, and Urdu
</P>
<P>
API 1.0 not yet implemented for Danish and Russian
</P>
<P>
<!-- NEW -->
</P>
<H3>Morphology</H3>
<P>
Complete inflection engine
</P>
<UL>
<LI>all word classes
<LI>all forms
<LI>all inflectional paradigms
</UL>
<P>
High-level access via <CODE>ParadigmsX</CODE>; e.g. Swedish:
</P>
<UL>
<LI>worst-case functions
<PRE>
mkV : (supa,super,sup,söp,supit,supen : Str) -&gt; V ;
</PRE>
<LI>common patterns
<PRE>
regV : (talar : Str) -&gt; V ;
irregV : (dricka, drack, druckit : Str) -&gt; V ;
</PRE>
<LI>irregular words in <CODE>IrregX</CODE>:
<PRE>
draga_V : V =
mkV (variants { "dra"; "draga"}) (variants { "drar" ; "drager"})
(variants { "dra" ; "drag" }) "drog" "dragit" "dragen" ;
</PRE>
</UL>
<P>
<!-- NEW -->
</P>
<H3>Syntactic structures</H3>
<P>
<IMG ALIGN="middle" SRC="Lang.png" BORDER="0" ALT="">
</P>
<P>
<!-- NEW -->
</P>
<H3>Quantitative measures</H3>
<P>
67 categories
</P>
<P>
150 abstract syntax combination rules
</P>
<P>
100 structural words
</P>
<P>
350 content words in a test lexicon
</P>
<P>
Lines of source code (4/3/2006):
</P>
<PRE>
abstract 1131
english 2344
german 2386
finnish 3396
norwegian 1257
swedish 1465
scandinavian 1023
french 3246 -- Besch + Irreg + Morpho 2111
italian 7797 -- Besch 6512
spanish 7120 -- Besch 5877
romance 1066
</PRE>
<P></P>
<P>
<!-- NEW -->
</P>
<H2>Structure</H2>
<P>
<!-- NEW -->
</P>
<H3>Language-independent ground API</H3>
<P>
<!-- NEW -->
</P>
<H3>Language-dependent paradigm modules</H3>
<P>
<!-- NEW -->
</P>
<H3>Language-dependent syntax extensions</H3>
<P>
<!-- NEW -->
</P>
<H3>Special-purpose APIs</H3>
<P>
<!-- NEW -->
</P>
<H3>How to use as top-level grammar</H3>
<P>
<!-- NEW -->
</P>
<H3>Parsing</H3>
<P>
<!-- NEW -->
</P>
<H3>Treebank generation</H3>
<P>
<!-- NEW -->
</P>
<H3>Treebank-based parsing</H3>
<P>
<!-- NEW -->
</P>
<H3>Morphology</H3>
<P>
<!-- NEW -->
</P>
<P>
<!-- NEW -->
</P>
<H3>Syntax editing</H3>
<P>
<!-- NEW -->
</P>
<H3>Efficient parsing via application grammar</H3>
<P>
<!-- NEW -->
</P>
<H2>How to use as library</H2>
<H3>Specialization through parametrized modules</H3>
<P>
<!-- NEW -->
</P>
<H3>Compile-time transfer</H3>
<P>
<!-- NEW -->
</P>
<H3>A natural division into modules</H3>
<P>
<!-- NEW -->
</P>
<H3>Example-based grammar writing</H3>
<P>
<!-- NEW -->
</P>
<H2>How to implement a new language</H2>
<H3>Ordinary modules</H3>
<P>
<!-- NEW -->
</P>
<H3>Parametrized modules</H3>
<P>
<!-- NEW -->
</P>
<H3>The kernel of the API</H3>
<P>
<!-- NEW -->
</P>
<H3>How to proceed</H3>
<P>
<!-- NEW -->
</P>
<H2>How to extend the API</H2>
<P>
<!-- NEW -->
</P>
<H3>Extend old modules or add a new one?</H3>
<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags clt2006.txt -->
</BODY></HTML>

View File

@@ -0,0 +1,409 @@
The GF Resource Grammar Library Version 1.0
Author: Aarne Ranta <aarne (at) cs.chalmers.se>
Last update: %%date(%c)
% NOTE: this is a txt2tags file.
% Create an html file from this file using:
% txt2tags --toc clt2006.txt
%!target:html
%!postproc(html): #NEW <!-- NEW -->
#NEW
==Plan==
Purpose
Background
Coverage
Structure
How to use
How to implement a new language
How to extend the API
#NEW
==Purpose==
===Library for applications===
High-level access to grammatical rules
E.g. //You have k new messages// rendered in ten languages //X//
```
render X (Have (You (Number (k (New Message)))))
```
Usability for different purposes
- translation systems
- software localization
- dialogue systems
- language teaching
#NEW
===Grammar as parser===
Often in NLP, a grammar is just high-level code for a parser.
But writing a grammar can be inadequate for parsing:
- too much manual work
- too inefficient
- not robust
- too ambiguous
Moreover, a grammar fine-tuned for parsing may not be reusable
- for generation
- for specialized grammars
- as library
#NEW
===Grammar as language definition===
Linguistic ontology: **abstract syntax**
E.g. adjectival modification
```
AdjCN : AP -> CN -> CN ;
```
Rendering in different languages: **concrete syntax**
Resource grammars take a generation perspective rather than a parsing perspective
- abstract syntax serves as a key to expressions in different languages
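The split between abstract and concrete syntax can be sketched outside GF. The following is a rough Python analogy, not GF code and not the library's implementation: the tree type mirrors ``AdjCN : AP -> CN -> CN``, while the toy lexicons, the gender table and the function names ``lin_eng`` and ``lin_fre`` are assumptions made up for this illustration.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class UseN:
    """A common noun (CN) built from a lexicon key (illustrative)."""
    noun: str


@dataclass(frozen=True)
class AdjCN:
    """Mirrors AdjCN : AP -> CN -> CN (adjectival modification)."""
    ap: str
    cn: "CN"


CN = Union[UseN, AdjCN]

# Toy lexicons, assumed for this sketch only.
ENG = {"black": "black", "house": "house", "book": "book"}
FRE_NOUNS = {"house": ("maison", "f"), "book": ("livre", "m")}
FRE_ADJS = {"black": {"m": "noir", "f": "noire"}}


def lin_eng(tree: CN) -> str:
    """English concrete syntax: adjective before noun, no agreement."""
    if isinstance(tree, UseN):
        return ENG[tree.noun]
    return f"{ENG[tree.ap]} {lin_eng(tree.cn)}"


def lin_fre(tree: CN) -> Tuple[str, str]:
    """French concrete syntax: returns (string, gender).

    The adjective follows the noun and agrees with its gender.
    """
    if isinstance(tree, UseN):
        return FRE_NOUNS[tree.noun]
    form, gender = lin_fre(tree.cn)
    return f"{form} {FRE_ADJS[tree.ap][gender]}", gender


tree = AdjCN("black", UseN("house"))
print(lin_eng(tree))     # black house
print(lin_fre(tree)[0])  # maison noire
```

The same tree is the key to both strings; word order and agreement live only in the per-language functions.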
#NEW
===Usability by non-linguists===
Division of labour: resource grammars hide linguistic details
Presentation: "school grammar" concepts, dictionary-like conventions
API = Application Programmer's Interface
Documentation: ``gfdoc``
IDE = Interactive Development Environment (forthcoming)
Example-based grammar writing
```
render Ita (parse Eng "you have k messages")
```
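The ``render Ita (parse Eng ...)`` idea can be imitated with a toy sketch in Python. This is only an analogy under stated assumptions: the tiny vocabulary, the tuple representation of trees and the functions ``parse_eng``, ``render_ita`` and ``translate`` are all hypothetical, and parsing is done naively by enumerating every tree and comparing its English rendering with the input.

```python
from itertools import product

# Toy abstract syntax: a tree is a pair (adjective key, noun key).
ADJECTIVES = ["red", "black"]
NOUNS = ["house", "book"]

# Toy Italian lexicon, assumed for this sketch only.
ITA_NOUNS = {"house": ("casa", "f"), "book": ("libro", "m")}
ITA_ADJS = {
    "red": {"m": "rosso", "f": "rossa"},
    "black": {"m": "nero", "f": "nera"},
}


def all_trees():
    """Enumerate every abstract tree of the toy grammar."""
    return list(product(ADJECTIVES, NOUNS))


def render_eng(tree):
    """English rendering: adjective before noun."""
    adj, noun = tree
    return f"{adj} {noun}"


def render_ita(tree):
    """Italian rendering: adjective after noun, agreeing in gender."""
    adj, noun = tree
    form, gender = ITA_NOUNS[noun]
    return f"{form} {ITA_ADJS[adj][gender]}"


def parse_eng(text):
    """Return all trees whose English rendering equals the input."""
    return [t for t in all_trees() if render_eng(t) == text]


def translate(text):
    """Sketch of render Ita (parse Eng text)."""
    return [render_ita(t) for t in parse_eng(text)]


print(translate("black house"))  # ['casa nera']
```

The user supplies only an English example; the Italian form, including gender agreement, comes from the grammar.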
#NEW
===Scientific interest===
Linguistics
- definition of linguistic ontology
- coping with different problems in different languages
- sharing concrete-syntax code between languages
- creating a resource for other NLP applications
Computer science
- data structures for grammar rules
- type systems for grammars
- algorithms: parsing, generation, grammar compilation
- domain-specific programming language (GF)
- module system
#NEW
==Background==
===History===
2002: v. 0.2
- English, French, German, Swedish
2003: v. 0.6
- module system
- added Finnish, Italian, Russian
- used in KeY
2005: v. 0.9
- tenses
- added Danish, Norwegian, Spanish; no German
- used in WebALT
2006: v. 1.0
- approximate CLE coverage
- reorganized module system and implementation
- not yet (4/3/2006) for Danish and Russian
#NEW
===Authors===
Janna Khegai (Russian modules, forthcoming),
Bjorn Bringert (many Swadesh lexica),
Carlos Gonzalia (Spanish cardinals),
Patrik Jansson (Swedish cardinals),
Aarne Ranta.
We are grateful for the contributions and
comments of several other people who have used this and
the previous versions of the resource library, including
Ana Bove,
David Burke,
Lauri Carlson,
Gloria Casanellas,
Karin Cavallin,
Hans-Joachim Daniels,
Kristofer Johannisson,
Anni Laine,
Wanjiku Ng'ang'a,
Jordi Saludes.
#NEW
===Related work===
CLE (Core Language Engine,
[Book 1992 http://mitpress.mit.edu/catalog/item/default.asp?tid=7739&ttype=2])
- English, Swedish, French, Danish
- uses Definite Clause Grammars, implementation in Prolog
- coverage for SACTI corpus,
[Spoken Language Translator (2001) http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=0521770777]
- grammar specialization via explanation-based learning
[LinGO Grammar Matrix http://www.delph-in.net/matrix/]
- English, German, Japanese, Spanish, ...
- uses HPSG, implementation in LKB
- a check list for parallel grammar implementations
[Pargram http://www2.parc.com/istl/groups/nltt/pargram/]
- Aimed: Arabic, Chinese, English, French, German, Hungarian, Japanese,
Malagasy, Norwegian, Turkish, Urdu, Vietnamese, and Welsh
- uses LFG
- one set of big grammars, transfer rules
Rosetta Machine Translation ([Book 1994 http://citeseer.ist.psu.edu/181924.html])
- Dutch, English, French
- uses M-grammars, compositional translation inspired by Montague
- compositional transfer rules
#NEW
==Coverage==
===Languages===
The current GF Resource Project covers ten languages:
- ``Dan``ish
- ``Eng``lish
- ``Fin``nish
- ``Fre``nch
- ``Ger``man
- ``Ita``lian
- ``Nor``wegian (bokmål)
- ``Rus``sian
- ``Spa``nish
- ``Swe``dish
In addition, parts (morphology) of Arabic, Estonian, Latin, and Urdu
API 1.0 not yet implemented for Danish and Russian
#NEW
===Morphology===
Complete inflection engine
- all word classes
- all forms
- all inflectional paradigms
High-level access via ``ParadigmsX``; e.g. Swedish:
- worst-case functions
```
mkV : (supa,super,sup,söp,supit,supen : Str) -> V ;
```
- common patterns
```
regV : (talar : Str) -> V ;
irregV : (dricka, drack, druckit : Str) -> V ;
```
- irregular words in ``IrregX``:
```
draga_V : V =
mkV (variants { "dra"; "draga"}) (variants { "drar" ; "drager"})
(variants { "dra" ; "drag" }) "drog" "dragit" "dragen" ;
```
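The relation between a worst-case function and a common-pattern function can be sketched in Python. This is an analogy, not the GF source: ``mk_v`` takes the six forms in the order shown for ``mkV`` above, and ``reg_v`` is a deliberately simplified guess that handles only presents ending in "-ar" (first conjugation); the real ``regV`` may cover more patterns.

```python
from typing import NamedTuple


class Verb(NamedTuple):
    """Six principal forms, in the argument order of mkV above."""
    infinitive: str
    present: str
    imperative: str
    past: str
    supine: str
    past_participle: str


def mk_v(infinitive, present, imperative, past, supine, past_participle):
    """Worst-case function: every form is given explicitly."""
    return Verb(infinitive, present, imperative, past, supine,
                past_participle)


def reg_v(present):
    """Common pattern, simplified: derive all forms from the present.

    Assumption of this sketch: only first-conjugation verbs whose
    present ends in "-ar" are handled; anything else is rejected.
    """
    if not present.endswith("ar"):
        raise ValueError(f"not a first-conjugation present: {present}")
    stem = present[:-1]  # "talar" -> "tala"
    return mk_v(stem, present, stem, stem + "de", stem + "t", stem + "d")


print(reg_v("talar"))
```

An irregular verb such as "supa" would bypass ``reg_v`` and call ``mk_v`` with all six forms, which is the role ``IrregX`` plays for the library.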
#NEW
===Syntactic structures===
[Lang.png]
#NEW
===Quantitative measures===
67 categories
150 abstract syntax combination rules
100 structural words
350 content words in a test lexicon
Lines of source code (4/3/2006):
```
abstract 1131
english 2344
german 2386
finnish 3396
norwegian 1257
swedish 1465
scandinavian 1023
french 3246 -- Besch + Irreg + Morpho 2111
italian 7797 -- Besch 6512
spanish 7120 -- Besch 5877
romance 1066
```
#NEW
==Structure==
#NEW
===Language-independent ground API===
#NEW
===Language-dependent paradigm modules===
#NEW
===Language-dependent syntax extensions===
#NEW
===Special-purpose APIs===
#NEW
===How to use as top-level grammar===
#NEW
===Parsing===
#NEW
===Treebank generation===
#NEW
===Treebank-based parsing===
#NEW
===Morphology===
#NEW
#NEW
===Syntax editing===
#NEW
===Efficient parsing via application grammar===
#NEW
==How to use as library==
===Specialization through parametrized modules===
#NEW
===Compile-time transfer===
#NEW
===A natural division into modules===
#NEW
===Example-based grammar writing===
#NEW
==How to implement a new language==
===Ordinary modules===
#NEW
===Parametrized modules===
#NEW
===The kernel of the API===
#NEW
===How to proceed===
#NEW
==How to extend the API==
#NEW
===Extend old modules or add a new one?===

View File

@@ -7,7 +7,7 @@
<P ALIGN="center"><CENTER><H1>Grammars as Software Libraries</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;</I><BR>
Last update: Thu Feb 9 13:03:45 2006
Last update: Sat Mar 4 14:16:15 2006
</FONT></CENTER>
<P>

View File

@@ -34,6 +34,7 @@ Aarne Ranta.
We are grateful for the contributions and
comments of several other people who have used this and
the previous versions of the resource library, including
Ana Bove,
David Burke,
Lauri Carlson,
Gloria Casanellas,