forked from GitHub/gf-core

seminar slides

aarne
2006-02-10 08:02:34 +00:00
parent dd3a575080
commit c12ee01480
2 changed files with 8 additions and 102 deletions


@@ -7,67 +7,12 @@
<P ALIGN="center"><CENTER><H1>Grammars as Software Libraries</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;</I><BR>
Last update: Thu Feb 9 11:57:20 2006
Last update: Thu Feb 9 13:03:45 2006
</FONT></CENTER>
<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<UL>
<LI><A HREF="#toc1">Setting</A>
<LI><A HREF="#toc2">People</A>
<LI><A HREF="#toc3">Software Libraries</A>
<LI><A HREF="#toc4">Abstraction</A>
<LI><A HREF="#toc5">Grammars as libraries?</A>
<LI><A HREF="#toc6">A slightly more advanced example</A>
<LI><A HREF="#toc7">Problems with the more advanced example</A>
<LI><A HREF="#toc8">More problems with the advanced example</A>
<LI><A HREF="#toc9">A library-based solution</A>
<LI><A HREF="#toc10">An improved library-based solution</A>
<LI><A HREF="#toc11">The ultimate solution?</A>
<LI><A HREF="#toc12">The components of a grammar library</A>
<LI><A HREF="#toc13">Implementing a grammar library in GF</A>
<LI><A HREF="#toc14">Linearization and parsing</A>
<LI><A HREF="#toc15">Applying GF</A>
<LI><A HREF="#toc16">Domain, ontology, idiom</A>
<LI><A HREF="#toc17">Example domain</A>
<LI><A HREF="#toc18">Translation system</A>
<LI><A HREF="#toc19">Difficulties with concrete syntax</A>
<LI><A HREF="#toc20">Solving the difficulties</A>
<LI><A HREF="#toc21">Application grammars vs. resource grammars</A>
<LI><A HREF="#toc22">GF as programming language</A>
<LI><A HREF="#toc23">Concrete syntax using library</A>
<LI><A HREF="#toc24">Design questions for the grammar library</A>
<LI><A HREF="#toc25">Design decisions</A>
<LI><A HREF="#toc26">Design decisions, cont'd</A>
<LI><A HREF="#toc27">Success criteria and evaluation</A>
<LI><A HREF="#toc28">These are not our success criteria</A>
<LI><A HREF="#toc29">Where is semantics?</A>
<LI><A HREF="#toc30">Representations in different APIs</A>
<LI><A HREF="#toc31">Languages</A>
<LI><A HREF="#toc32">Library structure 1: language-independent API</A>
<LI><A HREF="#toc33">Library structure 2: language-dependent APIs</A>
<LI><A HREF="#toc34">Difficulties encountered</A>
<LI><A HREF="#toc35">How much can be language-independent?</A>
<LI><A HREF="#toc36">Using the library</A>
<LI><A HREF="#toc37">Parametrized modules</A>
<LI><A HREF="#toc38">Lexicon extension</A>
<LI><A HREF="#toc39">Example low-level morphological definition</A>
<LI><A HREF="#toc40">Some formats that can be generated from GF grammars</A>
<LI><A HREF="#toc41">Use as program components</A>
<LI><A HREF="#toc42">Grammar library as linguistic resource</A>
<LI><A HREF="#toc43">Corpus generation</A>
<LI><A HREF="#toc44">Related work</A>
<LI><A HREF="#toc45">Demo</A>
</UL>
<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<P>
<!-- NEW -->
</P>
<A NAME="toc1"></A>
<H2>Setting</H2>
<P>
Current funding
@@ -101,7 +46,6 @@ Main applications
<P>
<!-- NEW -->
</P>
<A NAME="toc2"></A>
<H2>People</H2>
<P>
Staff contributions to grammar libraries:
@@ -154,7 +98,6 @@ Resource library patches and suggestions from the WebALT staff:
<P>
<!-- NEW -->
</P>
<A NAME="toc3"></A>
<H2>Software Libraries</H2>
<P>
The main device of <B>division of labour</B> in programming.
@@ -180,7 +123,6 @@ Practical advantages:
<P>
<!-- NEW -->
</P>
<A NAME="toc4"></A>
<H2>Abstraction</H2>
<P>
Libraries promote <B>abstraction</B>: you abstract away from details.
@@ -199,7 +141,6 @@ if it just has a support for functions or macros.
<P>
<!-- NEW -->
</P>
<A NAME="toc5"></A>
<H2>Grammars as libraries?</H2>
<P>
Example: we want to create a GUI (Graphical User Interface) button
@@ -249,7 +190,6 @@ The library has an API (Application Programmer's Interface) with:
<P>
<!-- NEW -->
</P>
<A NAME="toc6"></A>
<H2>A slightly more advanced example</H2>
<P>
This is what you often see as a feedback from a program:
@@ -277,7 +217,6 @@ The code that should be written is of course
<P>
<!-- NEW -->
</P>
<A NAME="toc7"></A>
<H2>Problems with the more advanced example</H2>
<P>
The same as with "Yes": you have to know the words "you",
@@ -304,7 +243,6 @@ of "message":
<P>
<!-- NEW -->
</P>
<A NAME="toc8"></A>
<H2>More problems with the advanced example</H2>
<P>
You also have to know the case required by the verb "have"
@@ -328,7 +266,6 @@ address the user:
<P>
<!-- NEW -->
</P>
<A NAME="toc9"></A>
<H2>A library-based solution</H2>
<P>
In analogy with the "Yes" case, you write
@@ -350,7 +287,6 @@ It is time to move from <B>canned text</B> to a <B>grammar</B>.
<P>
<!-- NEW -->
</P>
<A NAME="toc10"></A>
<H2>An improved library-based solution</H2>
<P>
You may want to write
@@ -378,7 +314,6 @@ For this purpose, you need a library with the API
<P>
<!-- NEW -->
</P>
<A NAME="toc11"></A>
<H2>The ultimate solution?</H2>
<P>
The library API for language will certainly grow big and become
@@ -423,7 +358,6 @@ Thus some amount of interaction is needed.
<P>
<!-- NEW -->
</P>
<A NAME="toc12"></A>
<H2>The components of a grammar library</H2>
<P>
The library has <B>construction functions</B> like
@@ -451,7 +385,6 @@ knowledge by application programmers!
<P>
<!-- NEW -->
</P>
<A NAME="toc13"></A>
<H2>Implementing a grammar library in GF</H2>
<P>
GF = Grammatical Framework
@@ -495,7 +428,6 @@ Simplest possible example:
<P>
<!-- NEW -->
</P>
<A NAME="toc14"></A>
<H2>Linearization and parsing</H2>
<P>
The realization function is, for each language, implemented by
@@ -517,7 +449,6 @@ The GF formalism moreover has the property of <B>reversibility</B>:
<P>
<!-- NEW -->
</P>
<A NAME="toc15"></A>
<H2>Applying GF</H2>
<P>
<B>multilingual grammar</B> = abstract syntax + concrete syntaxes
@@ -534,7 +465,6 @@ Examples of the idea:
<P>
<!-- NEW -->
</P>
<A NAME="toc16"></A>
<H2>Domain, ontology, idiom</H2>
<P>
An abstract syntax has other names:
@@ -566,7 +496,6 @@ Problem: the expertise of both a linguist and a domain expert are required.
<P>
<!-- NEW -->
</P>
<A NAME="toc17"></A>
<H2>Example domain</H2>
<P>
Arithmetic of natural numbers: abstract syntax
@@ -589,7 +518,6 @@ Arithmetic of natural numbers: abstract syntax
<P>
<!-- NEW -->
</P>
<A NAME="toc18"></A>
<H2>Translation system</H2>
<P>
We can translate using the abstract syntax as interlingua:
@@ -611,7 +539,6 @@ But is it really so simple?
<P>
<!-- NEW -->
</P>
<A NAME="toc19"></A>
<H2>Difficulties with concrete syntax</H2>
<P>
The previous multilingual grammar breaks these rules in many situations:
@@ -628,7 +555,6 @@ All these sentences are grammatically incorrect.
<P>
<!-- NEW -->
</P>
<A NAME="toc20"></A>
<H2>Solving the difficulties</H2>
<P>
GF <I>can</I> express the linguistic rules that are needed to
@@ -659,7 +585,6 @@ Linguistic knowledge dominates in the size of this grammar.
<P>
<!-- NEW -->
</P>
<A NAME="toc21"></A>
<H2>Application grammars vs. resource grammars</H2>
<P>
Application grammar ("semantic grammar")
@@ -682,7 +607,6 @@ Resource grammar ("syntactic grammar")
<P>
<!-- NEW -->
</P>
<A NAME="toc22"></A>
<H2>GF as programming language</H2>
<P>
The expressive power is between TAG and HPSG.
@@ -702,7 +626,6 @@ We have built a <B>module system</B> that can hide details.
<P>
<!-- NEW -->
</P>
<A NAME="toc23"></A>
<H2>Concrete syntax using library</H2>
<P>
Assume the following API
@@ -733,7 +656,6 @@ Notice: the choice of adjective is domain expert knowledge.
<P>
<!-- NEW -->
</P>
<A NAME="toc24"></A>
<H2>Design questions for the grammar library</H2>
<P>
What should there be in the library?
@@ -765,7 +687,6 @@ hence cannot use existing proprietary resources.
<P>
<!-- NEW -->
</P>
<A NAME="toc25"></A>
<H2>Design decisions</H2>
<P>
Coverage, for each language:
@@ -798,7 +719,6 @@ Presentation:
<P>
<!-- NEW -->
</P>
<A NAME="toc26"></A>
<H2>Design decisions, cont'd</H2>
<P>
Where do we get the data from?
@@ -818,7 +738,6 @@ The resource grammar library is entirely open-source free software
<P>
<!-- NEW -->
</P>
<A NAME="toc27"></A>
<H2>Success criteria and evaluation</H2>
<P>
Grammatical correctness of everything generated.
@@ -838,7 +757,6 @@ Tools for regression testing (treebank generation and comparison)
<P>
<!-- NEW -->
</P>
<A NAME="toc28"></A>
<H2>These are not our success criteria</H2>
<P>
Language coverage:
@@ -873,7 +791,6 @@ Linguistic innovation in syntax:
<P>
<!-- NEW -->
</P>
<A NAME="toc29"></A>
<H2>Where is semantics?</H2>
<P>
Application grammars use domain-specific
@@ -897,7 +814,6 @@ for all for the whole language.
<P>
<!-- NEW -->
</P>
<A NAME="toc30"></A>
<H2>Representations in different APIs</H2>
<P>
<B>Grammar composition</B>: any grammar can serve as resource to another one.
@@ -935,7 +851,6 @@ In <CODE>Lang</CODE> (ground level resource API)
<P>
<!-- NEW -->
</P>
<A NAME="toc31"></A>
<H2>Languages</H2>
<P>
The current GF Resource Project covers ten languages:
@@ -962,7 +877,6 @@ In addition, we have parts (morphology) of Arabic, Estonian, Latin, and Urdu
<P>
<!-- NEW -->
</P>
<A NAME="toc32"></A>
<H2>Library structure 1: language-independent API</H2>
<P>
<IMG ALIGN="middle" SRC="Lang.png" BORDER="0" ALT="">
@@ -979,7 +893,6 @@ Cf. "matrix" in BLARK, LinGo
<P>
<!-- NEW -->
</P>
<A NAME="toc33"></A>
<H2>Library structure 2: language-dependent APIs</H2>
<UL>
<LI>morphological paradigms, e.g. <CODE>ParadigmsSwe</CODE>
@@ -1000,7 +913,6 @@ Cf. "matrix" in BLARK, LinGo
<P>
<!-- NEW -->
</P>
<A NAME="toc34"></A>
<H2>Difficulties encountered</H2>
<P>
English: negation and auxiliary vs. non-auxiliary verbs
@@ -1023,7 +935,6 @@ Scandinavian: determiners
<P>
<!-- NEW -->
</P>
<A NAME="toc35"></A>
<H2>How much can be language-independent?</H2>
<P>
For the ten languages we have considered, it <I>is</I> possible
@@ -1049,7 +960,6 @@ Reservations:
<P>
<!-- NEW -->
</P>
<A NAME="toc36"></A>
<H2>Using the library</H2>
<P>
Simplest case: use the API in the same way for all languages.
@@ -1078,7 +988,6 @@ than writing a resource grammar!
<P>
<!-- NEW -->
</P>
<A NAME="toc37"></A>
<H2>Parametrized modules</H2>
<P>
We can go even farther than share an abstract API: we can share implementations
@@ -1098,7 +1007,6 @@ Exploited in two families:
<P>
<!-- NEW -->
</P>
<A NAME="toc38"></A>
<H2>Lexicon extension</H2>
<P>
We cannot anticipate all vocabulary needed in application grammars.
@@ -1122,7 +1030,6 @@ Example heuristic, from <A HREF="gfdoc/ParadigmsSwe.html">ParadigmsSwe</A>:
<P>
<!-- NEW -->
</P>
<A NAME="toc39"></A>
<H2>Example low-level morphological definition</H2>
<PRE>
decl2Noun : Str -&gt; N = \bil -&gt;
@@ -1139,7 +1046,6 @@ Example heuristic, from <A HREF="gfdoc/ParadigmsSwe.html">ParadigmsSwe</A>:
<P>
<!-- NEW -->
</P>
<A NAME="toc40"></A>
<H2>Some formats that can be generated from GF grammars</H2>
<PRE>
-printer=lbnf BNF Converter, thereby C/Bison, Java/JavaCup
@@ -1157,7 +1063,6 @@ Example heuristic, from <A HREF="gfdoc/ParadigmsSwe.html">ParadigmsSwe</A>:
<P>
<!-- NEW -->
</P>
<A NAME="toc41"></A>
<H2>Use as program components</H2>
<P>
Haskell, Java, Prolog
@@ -1171,7 +1076,6 @@ Push-button creation of spoken language translators (using Nuance)
<P>
<!-- NEW -->
</P>
<A NAME="toc42"></A>
<H2>Grammar library as linguistic resource</H2>
<P>
Can we use the libraries outside domain-specific fragments?
@@ -1194,7 +1098,6 @@ Two ideas:
<P>
<!-- NEW -->
</P>
<A NAME="toc43"></A>
<H2>Corpus generation</H2>
<P>
The most general format is <B>multilingual treebank</B> generation:
@@ -1225,7 +1128,6 @@ Can this be useful? Cf. Rebecca Jonson this afternoon.
<P>
<!-- NEW -->
</P>
<A NAME="toc44"></A>
<H2>Related work</H2>
<P>
CLE = Core Language Engine
@@ -1255,7 +1157,6 @@ Parsing detached from grammar (Nivre) - grammar detached from parsing
<P>
<!-- NEW -->
</P>
<A NAME="toc45"></A>
<H2>Demo</H2>
<P>
Stoneage grammar, based on the Swadesh word list.
@@ -1267,6 +1168,6 @@ Implemented as application on top of the resource grammar.
Illustrate generation and spoken-language parsing.
</P>
<!-- html code generated by txt2tags 2.0 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags -\-toc -thtml gslt-sem-2006.txt -->
<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags gslt-sem-2006.txt -->
</BODY></HTML>


@@ -1,6 +1,11 @@
--# prob ConjNP 0.05
--# prob ConsNP 0.1
--# prob ConsAP 0.05
--# prob ConjAP 0.1
--# prob ConjAdv 0.1
--# prob AdvVP 0.1
--# prob ConjS 0.1
--# prob PredSCVP 0.05
--# prob PredetNP 0.05
--# prob SentCN 0.05
--# prob QuestCN 0.05