diff --git a/lib/resource-1.0/doc/Makefile b/lib/resource-1.0/doc/Makefile
index 28eb5712b..fe776c896 100644
--- a/lib/resource-1.0/doc/Makefile
+++ b/lib/resource-1.0/doc/Makefile
@@ -1,3 +1,6 @@
+clt:
+	txt2tags clt2006.txt
+	htmls clt2006.html
 gslt:
 	txt2tags gslt-sem-2006.txt
 	htmls gslt-sem-2006.html
diff --git a/lib/resource-1.0/doc/clt2006.html b/lib/resource-1.0/doc/clt2006.html
new file mode 100644
index 000000000..4ee440495
--- /dev/null
+++ b/lib/resource-1.0/doc/clt2006.html
@@ -0,0 +1,474 @@
+
+
+
+
+The GF Resource Grammar Library Version 1.0
+
+

The GF Resource Grammar Library Version 1.0

+ +Author: Aarne Ranta <aarne (at) cs.chalmers.se>
+Last update: Sat Mar 4 14:20:07 2006 +
+ +

+ +

+

Plan

+

+Purpose +

+

+Background +

+

+Coverage +

+

+Structure +

+

+How to use +

+

+How to implement a new language +

+

+How to extend the API +

+

+ +

+

Purpose

+

Library for applications

+

+High-level access to grammatical rules +

+

+E.g. You have k new messages rendered in ten languages X +

+
+    render X (Have (You (Number (k (New Message)))))
+
+

+

+Usability for different purposes +

+ + +

+ +

+

Grammar as parser

+

+Often in NLP, a grammar is just high-level code for a parser. +

+

+But writing a grammar can be inadequate for parsing: +

+ + +

+Moreover, a grammar fine-tuned for parsing may not be reusable +

+ + +

+ +

+

Grammar as language definition

+

+Linguistic ontology: abstract syntax +

+

+E.g. adjectival modification +

+
+    AdjCN : AP -> CN -> CN ;
+
+
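As a minimal sketch of how one abstract rule can be rendered differently per language, the toy grammar below keeps the `AdjCN` signature from above and gives it two concrete syntaxes. Everything else is an assumption made for illustration: the module names `Mini`, `MiniEng`, `MiniFre`, the constants `red_AP` and `wine_CN`, and the string-only linearization types, which ignore agreement and inflection that a real resource grammar would have to handle. Each module is assumed to live in its own file.

```
-- File Mini.gf: toy abstract syntax (all names except AdjCN are hypothetical).
abstract Mini = {
  cat AP ; CN ;
  fun
    AdjCN   : AP -> CN -> CN ;   -- adjectival modification
    red_AP  : AP ;
    wine_CN : CN ;
}

-- File MiniEng.gf: English places the adjective before the noun.
concrete MiniEng of Mini = {
  lincat AP, CN = {s : Str} ;
  lin
    AdjCN ap cn = {s = ap.s ++ cn.s} ;
    red_AP  = {s = "red"} ;
    wine_CN = {s = "wine"} ;
}

-- File MiniFre.gf: French places this adjective after the noun.
concrete MiniFre of Mini = {
  lincat AP, CN = {s : Str} ;
  lin
    AdjCN ap cn = {s = cn.s ++ ap.s} ;
    red_AP  = {s = "rouge"} ;
    wine_CN = {s = "vin"} ;
}
```

The tree `AdjCN red_AP wine_CN` is the same in both languages; only the `lin` rule for `AdjCN` decides the word order.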

+

+Rendering in different languages: concrete syntax +

+

+Resource grammars take a generation perspective rather than a parsing perspective +

+ + +

+ +

+

Usability by non-linguists

+

+Division of labour: resource grammars hide linguistic details +

+

+Presentation: "school grammar" concepts, dictionary-like conventions +

+

+API = Application Programmer's Interface +

+

+Documentation: gfdoc +

+

+IDE = Interactive Development Environment (forthcoming) +

+

+Example-based grammar writing +

+
+    render Ita (parse Eng "you have k messages")
+
+

+

+ +

+

Scientific interest

+

+Linguistics +

+ + +

+Computer science +

+ + +

+ +

+

Background

+

History

+

+2002: v. 0.2 +

+ + +

+2003: v. 0.6 +

+ + +

+2005: v. 0.9 +

+ + +

+2006: v. 1.0 +

+ + +

+ +

+

Authors

+

+Janna Khegai (Russian modules, forthcoming), +Bjorn Bringert (many Swadesh lexica), +Carlos Gonzalia (Spanish cardinals), +Patrik Jansson (Swedish cardinals), +Aarne Ranta. +

+

+We are grateful for the contributions and +comments of several other people who have used this and +previous versions of the resource library, including +Ana Bove, +David Burke, +Lauri Carlson, +Gloria Casanellas, +Karin Cavallin, +Hans-Joachim Daniels, +Kristofer Johannisson, +Anni Laine, +Wanjiku Ng'ang'a, +Jordi Saludes. +

+

+ +

+

Related work

+

+CLE (Core Language Engine, +Book 1992) +

+ + +

+LinGO Grammar Matrix +

+ + +

+Pargram +

+ + +

+Rosetta Machine Translation (Book 1994) +

+ + +

+ +

+

Coverage

+

Languages

+

+The current GF Resource Project covers ten languages: +

+ + +

+In addition, parts (morphology) of Arabic, Estonian, Latin, and Urdu +

+

+API 1.0 not yet implemented for Danish and Russian +

+

+ +

+

Morphology

+

+Complete inflection engine +

+ + +

+High-level access via ParadigmsX; e.g. Swedish: +

+ + +
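A hedged sketch of how an application lexicon might call the Swedish paradigm functions: it uses only `regV` and `irregV` with the argument shapes given in the txt2tags source of these slides (`regV : (talar : Str) -> V` and `irregV : (dricka, drack, druckit : Str) -> V`). The module name `DemoLexSwe` and the constant names are hypothetical, and it is an assumption that the category `V` comes into scope by opening a module named `CatSwe` alongside `ParadigmsSwe`.

```
-- File DemoLexSwe.gf (hypothetical module, for illustration only).
-- Assumes CatSwe supplies the type V and ParadigmsSwe the functions below.
resource DemoLexSwe = open CatSwe, ParadigmsSwe in {
  oper
    -- regular verb: only the present tense form is given
    tala_V   : V = regV "talar" ;
    -- irregular verb: infinitive, past tense, supine
    dricka_V : V = irregV "dricka" "drack" "druckit" ;
}
```

The six-argument worst-case `mkV` should only be needed when neither pattern fits, as with `draga_V` in `IrregX`.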

+ +

+

Syntactic structures

+

+ +

+

+ +

+

Quantitative measures

+

+67 categories +

+

+150 abstract syntax combination rules +

+

+100 structural words +

+

+350 content words in a test lexicon +

+

+Lines of source code (4/3/2006): +

+
+    abstract     1131
+    english      2344
+    german       2386
+    finnish      3396
+    norwegian    1257
+    swedish      1465
+    scandinavian 1023
+    french       3246 -- Besch + Irreg + Morpho 2111
+    italian      7797 -- Besch 6512
+    spanish      7120 -- Besch 5877
+    romance      1066
+
+

+

+ +

+

Structure

+

+ +

+

Language-independent ground API

+

+ +

+

Language-dependent paradigm modules

+

+ +

+

Language-dependent syntax extensions

+

+ +

+

Special-purpose APIs

+

+ +

+

How to use as top-level grammar

+

+ +

+

Parsing

+

+ +

+

Treebank generation

+

+ +

+

Treebank-based parsing

+

+ +

+

Morphology

+

+ +

+

+ +

+

Syntax editing

+

+ +

+

Efficient parsing via application grammar

+

+ +

+

How to use as library

+

Specialization through parametrized modules

+

+ +

+

Compile-time transfer

+

+ +

+

A natural division into modules

+

+ +

+

Example-based grammar writing

+

+ +

+

How to implement a new language

+

Ordinary modules

+

+ +

+

Parametrized modules

+

+ +

+

The kernel of the API

+

+ +

+

How to proceed

+

+ +

+

How to extend the API

+

+ +

+

Extend old modules or add a new one?

+ + + + diff --git a/lib/resource-1.0/doc/clt2006.txt b/lib/resource-1.0/doc/clt2006.txt new file mode 100644 index 000000000..5b215297a --- /dev/null +++ b/lib/resource-1.0/doc/clt2006.txt @@ -0,0 +1,409 @@ +The GF Resource Grammar Library Version 1.0 +Author: Aarne Ranta +Last update: %%date(%c) + +% NOTE: this is a txt2tags file. +% Create an html file from this file using: +% txt2tags --toc clt2006.txt + +%!target:html + +%!postproc(html): #NEW + + +#NEW + +==Plan== + +Purpose + +Background + +Coverage + +Structure + +How to use + +How to implement a new language + +How to extend the API + + + +#NEW + +==Purpose== + +===Library for applications=== + +High-level access to grammatical rules + +E.g. //You have k new messages// rendered in ten languages //X// +``` + render X (Have (You (Number (k (New Message))))) +``` + +Usability for different purposes +- translation systems +- software localization +- dialogue systems +- language teaching + + +#NEW + +===Grammar as parser=== + +Often in NLP, a grammar is just high-level code for a parser. + +But writing a grammar can be inadequate for parsing: +- too much manual work +- too inefficient +- not robust +- too ambiguous + + +Moreover, a grammar fine-tuned for parsing may not be reusable +- for generation +- for specialized grammars +- as library + + +#NEW + +===Grammar as language definition=== + +Linguistic ontology: **abstract syntax** + +E.g. 
adjectival modification +``` + AdjCN : AP -> CN -> CN ; +``` + +Rendering in different languages: **concrete syntax** + +Resource grammars have generation perspective, rather than parsing +- abstract syntax serves as a key to expressions in different languages + + + +#NEW + +===Usability by non-linguists=== + +Division of labour: resource grammars hide linguistic details + +Presentation: "school grammar" concepts, dictionary-like conventions + +API = Application Programmer's Interface + +Documentation: ``gfdoc`` + +IDE = Interactive Development Environment (forthcoming) + +Example-based grammar writing +``` + render Ita (parse Eng "you have k messages") +``` + + +#NEW + +===Scientific interest=== + +Linguistics +- definition of linguistic ontology +- coping with different problems in different languages +- sharing concrete-syntax code between languages +- creating a resource for other NLP applications + + +Computer science +- datastructures for grammar rules +- type systems for grammars +- algorithms: parsing, generation, grammar compilation +- domain-specific programming language (GF) +- module system + + + +#NEW + +==Background== + +===History=== + +2002: v. 0.2 +- English, French, German, Swedish + + +2003: v. 0.6 +- module system +- added Finnish, Italian, Russian +- used in KeY + + +2005: v. 0.9 +- tenses +- added Danish, Norwegian, Spanish; no German +- used in WebALT + + +2006: v. 1.0 +- approximate CLE coverage +- reorganized module system and implementation +- not yet (4/3/2006) for Danish and Russian + + +#NEW + +===Authors=== + +Janna Khegai (Russian modules, forthcoming), +Bjorn Bringert (many Swadesh lexica), +Carlos Gonzalia (Spanish cardinals), +Partik Jansson (Swedish cardinals), +Aarne Ranta. 
+ +We are grateful for contributions and +comments to several other people who have used this and +the previous versions of the resource library, including +Ana Bove, +David Burke, +Lauri Carlson, +Gloria Casanellas, +Karin Cavallin, +Hans-Joachim Daniels, +Kristofer Johannisson, +Anni Laine, +Wanjiku Ng'ang'a, +Jordi Saludes. + + +#NEW + +===Related work=== + +CLE (Core Language Engine, +[Book 1992 http://mitpress.mit.edu/catalog/item/default.asp?tid=7739&ttype=2]) +- English, Swedish, French, Danish +- uses Definita Clause Grammars, implementation in Prolog +- coverage for SACTI corpus, + [Spoken Language Translator (2001) http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=0521770777] +- grammar specialization via explanation-based learning + + +[LinGO Grammar Matrix http://www.delph-in.net/matrix/] +- English, German, Japanese, Spanish, ... +- uses HPSG, implementation in LKB +- a check list for parallel grammar implementations + + +[Pargram http://www2.parc.com/istl/groups/nltt/pargram/] +- Aimed: Arabic, Chinese, English, French, German, Hungarian, Japanese, +Malagasy, Norwegian, Turkish, Urdu, Vietnamese, and Welsh +- uses LFG +- one set of big grammars, transfer rules + + +Rosetta Machine Translation ([Book 1994 http://citeseer.ist.psu.edu/181924.html]) +- Dutch, English, French +- uses M-grammars, compositional translation inspired by Montague +- compositional transfer rules + + +#NEW + +==Coverage== + +===Languages==== + +The current GF Resource Project covers ten languages: +- ``Dan``ish +- ``Eng``lish +- ``Fin``nish +- ``Fre``nch +- ``Ger``man +- ``Ita``lian +- ``Nor``wegian (bokmål) +- ``Rus``sian +- ``Spa``nish +- ``Swe``dish + + +In addition, parts (morphology) of Arabic, Estonian, Latin, and Urdu + +API 1.0 not yet implemented for Danish and Russian + + + +#NEW + +===Morphology==== + +Complete inflection engine +- all word classes +- all forms +- all inflectional paradigms + + +High-level access via ``ParadigmsX``; e.g. 
Swedish: +- worst-case functions +``` + mkV : (supa,super,sup,söp,supit,supen : Str) -> V ; +``` +- common patterns +``` + regV : (talar : Str) -> V ; + irregV : (dricka, drack, druckit : Str) -> V ; +``` +- irregular words in ``IrregX``: +``` + draga_V : V = + mkV (variants { "dra"; "draga"}) (variants { "drar" ; "drager"}) + (variants { "dra" ; "drag" }) "drog" "dragit" "dragen" ; +``` + + + + + + +#NEW + +===Syntactic structures=== + +[Lang.png] + + +#NEW + +===Quantitative measures=== + +67 categories + +150 abstract syntax combination rules + +100 structural words + +350 content words in a test lexicon + +Lines of source code (4/3/2006): +``` + abstract 1131 + english 2344 + german 2386 + finnish 3396 + norwegian 1257 + swedish 1465 + scandinavian 1023 + french 3246 -- Besch + Irreg + Morpho 2111 + italian 7797 -- Besch 6512 + spanish 7120 -- Besch 5877 + romance 1066 +``` + + +#NEW + +==Structure== + +#NEW + +===Language-independent ground API=== + +#NEW + +===Language-dependent paradigm modules=== + +#NEW + +===Language-dependent syntax extensions=== + +#NEW + +===Special-purpose APIs=== + + + +#NEW + +===How to use as top-level grammar=== + +#NEW + +===Parsing=== + +#NEW + +===Treebank generation=== + +#NEW + +===Treebank-based parsing=== + +#NEW + +===Morphology=== + +#NEW + +#NEW + +===Syntax editing=== + +#NEW + +===Efficient parsing via application grammar=== + + + +#NEW + +==How to use as library== + +===Specialization through parametrized modules=== + +#NEW + +===Compile-time transfer=== + +#NEW + +===A natural division into modules=== + +#NEW + +===Example-based grammar writing=== + + + +#NEW + +==How to implement a new language== + +===Ordinary modules=== + +#NEW + +===Parametrized modules=== + +#NEW + +===The kernel of the API=== + +#NEW + +===How to proceed=== + + + +#NEW + +==How to extend the API== + +#NEW + +===Extend old modules or add a new one?=== + diff --git a/lib/resource-1.0/doc/gslt-sem-2006.html 
b/lib/resource-1.0/doc/gslt-sem-2006.html index e1fefb366..533d14a40 100644 --- a/lib/resource-1.0/doc/gslt-sem-2006.html +++ b/lib/resource-1.0/doc/gslt-sem-2006.html @@ -7,7 +7,7 @@

Grammars as Software Libraries

Author: Aarne Ranta <aarne (at) cs.chalmers.se>
-Last update: Thu Feb 9 13:03:45 2006
+Last update: Sat Mar 4 14:16:15 2006

diff --git a/lib/resource-1.0/doc/index.txt b/lib/resource-1.0/doc/index.txt
index 37a1a40ff..01d380bf8 100644
--- a/lib/resource-1.0/doc/index.txt
+++ b/lib/resource-1.0/doc/index.txt
@@ -34,6 +34,7 @@ Aarne Ranta.
 We are grateful for contributions and
 comments to several other people who have used this and
 the previous versions of the resource library, including
+Ana Bove,
 David Burke,
 Lauri Carlson,
 Gloria Casanellas,