seminar slides
@@ -7,67 +7,12 @@
<P ALIGN="center"><CENTER><H1>Grammars as Software Libraries</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta <aarne (at) cs.chalmers.se></I><BR>
Last update: Thu Feb 9 11:57:20 2006
Last update: Thu Feb 9 13:03:45 2006
</FONT></CENTER>

<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<UL>
<LI><A HREF="#toc1">Setting</A>
<LI><A HREF="#toc2">People</A>
<LI><A HREF="#toc3">Software Libraries</A>
<LI><A HREF="#toc4">Abstraction</A>
<LI><A HREF="#toc5">Grammars as libraries?</A>
<LI><A HREF="#toc6">A slightly more advanced example</A>
<LI><A HREF="#toc7">Problems with the more advanced example</A>
<LI><A HREF="#toc8">More problems with the advanced example</A>
<LI><A HREF="#toc9">A library-based solution</A>
<LI><A HREF="#toc10">An improved library-based solution</A>
<LI><A HREF="#toc11">The ultimate solution?</A>
<LI><A HREF="#toc12">The components of a grammar library</A>
<LI><A HREF="#toc13">Implementing a grammar library in GF</A>
<LI><A HREF="#toc14">Linearization and parsing</A>
<LI><A HREF="#toc15">Applying GF</A>
<LI><A HREF="#toc16">Domain, ontology, idiom</A>
<LI><A HREF="#toc17">Example domain</A>
<LI><A HREF="#toc18">Translation system</A>
<LI><A HREF="#toc19">Difficulties with concrete syntax</A>
<LI><A HREF="#toc20">Solving the difficulties</A>
<LI><A HREF="#toc21">Application grammars vs. resource grammars</A>
<LI><A HREF="#toc22">GF as programming language</A>
<LI><A HREF="#toc23">Concrete syntax using library</A>
<LI><A HREF="#toc24">Design questions for the grammar library</A>
<LI><A HREF="#toc25">Design decisions</A>
<LI><A HREF="#toc26">Design decisions, cont'd</A>
<LI><A HREF="#toc27">Success criteria and evaluation</A>
<LI><A HREF="#toc28">These are not our success criteria</A>
<LI><A HREF="#toc29">Where is semantics?</A>
<LI><A HREF="#toc30">Representations in different APIs</A>
<LI><A HREF="#toc31">Languages</A>
<LI><A HREF="#toc32">Library structure 1: language-independent API</A>
<LI><A HREF="#toc33">Library structure 2: language-dependent APIs</A>
<LI><A HREF="#toc34">Difficulties encountered</A>
<LI><A HREF="#toc35">How much can be language-independent?</A>
<LI><A HREF="#toc36">Using the library</A>
<LI><A HREF="#toc37">Parametrized modules</A>
<LI><A HREF="#toc38">Lexicon extension</A>
<LI><A HREF="#toc39">Example low-level morphological definition</A>
<LI><A HREF="#toc40">Some formats that can be generated from GF grammars</A>
<LI><A HREF="#toc41">Use as program components</A>
<LI><A HREF="#toc42">Grammar library as linguistic resource</A>
<LI><A HREF="#toc43">Corpus generation</A>
<LI><A HREF="#toc44">Related work</A>
<LI><A HREF="#toc45">Demo</A>
</UL>

<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<P>
<!-- NEW -->
</P>
<A NAME="toc1"></A>
<H2>Setting</H2>
<P>
Current funding
@@ -101,7 +46,6 @@ Main applications
<P>
<!-- NEW -->
</P>
<A NAME="toc2"></A>
<H2>People</H2>
<P>
Staff contributions to grammar libraries:
@@ -154,7 +98,6 @@ Resource library patches and suggestions from the WebALT staff:
<P>
<!-- NEW -->
</P>
<A NAME="toc3"></A>
<H2>Software Libraries</H2>
<P>
The main device of <B>division of labour</B> in programming.
@@ -180,7 +123,6 @@ Practical advantages:
<P>
<!-- NEW -->
</P>
<A NAME="toc4"></A>
<H2>Abstraction</H2>
<P>
Libraries promote <B>abstraction</B>: you abstract away from details.
@@ -199,7 +141,6 @@ if it just has a support for functions or macros.
<P>
<!-- NEW -->
</P>
<A NAME="toc5"></A>
<H2>Grammars as libraries?</H2>
<P>
Example: we want to create a GUI (Graphical User Interface) button
@@ -249,7 +190,6 @@ The library has an API (Application Programmer's Interface) with:
<P>
<!-- NEW -->
</P>
<A NAME="toc6"></A>
<H2>A slightly more advanced example</H2>
<P>
This is what you often see as a feedback from a program:
@@ -277,7 +217,6 @@ The code that should be written is of course
<P>
<!-- NEW -->
</P>
<A NAME="toc7"></A>
<H2>Problems with the more advanced example</H2>
<P>
The same as with "Yes": you have to know the words "you",
@@ -304,7 +243,6 @@ of "message":
<P>
<!-- NEW -->
</P>
<A NAME="toc8"></A>
<H2>More problems with the advanced example</H2>
<P>
You also have to know the case required by the verb "have"
@@ -328,7 +266,6 @@ address the user:
<P>
<!-- NEW -->
</P>
<A NAME="toc9"></A>
<H2>A library-based solution</H2>
<P>
In analogy with the "Yes" case, you write
@@ -350,7 +287,6 @@ It is time to move from <B>canned text</B> to a <B>grammar</B>.
<P>
<!-- NEW -->
</P>
<A NAME="toc10"></A>
<H2>An improved library-based solution</H2>
<P>
You may want to write
@@ -378,7 +314,6 @@ For this purpose, you need a library with the API
<P>
<!-- NEW -->
</P>
<A NAME="toc11"></A>
<H2>The ultimate solution?</H2>
<P>
The library API for language will certainly grow big and become
@@ -423,7 +358,6 @@ Thus some amount of interaction is needed.
<P>
<!-- NEW -->
</P>
<A NAME="toc12"></A>
<H2>The components of a grammar library</H2>
<P>
The library has <B>construction functions</B> like
@@ -451,7 +385,6 @@ knowledge by application programmers!
<P>
<!-- NEW -->
</P>
<A NAME="toc13"></A>
<H2>Implementing a grammar library in GF</H2>
<P>
GF = Grammatical Framework
@@ -495,7 +428,6 @@ Simplest possible example:
<P>
<!-- NEW -->
</P>
<A NAME="toc14"></A>
<H2>Linearization and parsing</H2>
<P>
The realizatin function is, for each language, implemented by
@@ -517,7 +449,6 @@ The GF formalism moreover has the property of <B>reversibility</B>:
<P>
<!-- NEW -->
</P>
<A NAME="toc15"></A>
<H2>Applying GF</H2>
<P>
<B>multilingual grammar</B> = abstract syntax + concrete syntaxes
@@ -534,7 +465,6 @@ Examples of the idea:
<P>
<!-- NEW -->
</P>
<A NAME="toc16"></A>
<H2>Domain, ontology, idiom</H2>
<P>
An abstract syntax has other names:
@@ -566,7 +496,6 @@ Problem: the expertise of both a linguist and a domain expert are required.
<P>
<!-- NEW -->
</P>
<A NAME="toc17"></A>
<H2>Example domain</H2>
<P>
Arithmetic of natural numbers: abstract syntax
@@ -589,7 +518,6 @@ Arithmetic of natural numbers: abstract syntax
<P>
<!-- NEW -->
</P>
<A NAME="toc18"></A>
<H2>Translation system</H2>
<P>
We can translate using the abstract syntax as interlingua:
@@ -611,7 +539,6 @@ But is it really so simple?
<P>
<!-- NEW -->
</P>
<A NAME="toc19"></A>
<H2>Difficulties with concrete syntax</H2>
<P>
The previous multilingual grammar breaks these rules in many situations:
@@ -628,7 +555,6 @@ All these sentences are grammatically incorrect.
<P>
<!-- NEW -->
</P>
<A NAME="toc20"></A>
<H2>Solving the difficulties</H2>
<P>
GF <I>can</I> express the linguistic rules that are needed to
@@ -659,7 +585,6 @@ Linguistic knowledge dominates in the size of this grammar.
<P>
<!-- NEW -->
</P>
<A NAME="toc21"></A>
<H2>Application grammars vs. resource grammars</H2>
<P>
Application grammar ("semantic grammar")
@@ -682,7 +607,6 @@ Resource grammar ("syntactic grammar")
<P>
<!-- NEW -->
</P>
<A NAME="toc22"></A>
<H2>GF as programming language</H2>
<P>
The expressive power is between TAG and HPSG.
@@ -702,7 +626,6 @@ We have built a <B>module system</B> that can hide details.
<P>
<!-- NEW -->
</P>
<A NAME="toc23"></A>
<H2>Concrete syntax using library</H2>
<P>
Assume the following API
@@ -733,7 +656,6 @@ Notice: the choice of adjective is domain expert knowledge.
<P>
<!-- NEW -->
</P>
<A NAME="toc24"></A>
<H2>Design questions for the grammar library</H2>
<P>
What should there be in the library?
@@ -765,7 +687,6 @@ hence cannot use existing proprietary resources.
<P>
<!-- NEW -->
</P>
<A NAME="toc25"></A>
<H2>Design decisions</H2>
<P>
Coverage, for each language:
@@ -798,7 +719,6 @@ Presentation:
<P>
<!-- NEW -->
</P>
<A NAME="toc26"></A>
<H2>Design decisions, cont'd</H2>
<P>
Where do we get the data from?
@@ -818,7 +738,6 @@ The resource grammar library is entirely open-source free software
<P>
<!-- NEW -->
</P>
<A NAME="toc27"></A>
<H2>Success criteria and evaluation</H2>
<P>
Grammatical correctness of everything generated.
@@ -838,7 +757,6 @@ Tools for regression testing (treebank generation and comparison)
<P>
<!-- NEW -->
</P>
<A NAME="toc28"></A>
<H2>These are not our success criteria</H2>
<P>
Language coverage:
@@ -873,7 +791,6 @@ Linguistic innovation in syntax:
<P>
<!-- NEW -->
</P>
<A NAME="toc29"></A>
<H2>Where is semantics?</H2>
<P>
Application grammars use domain-specific
@@ -897,7 +814,6 @@ for all for the whole language.
<P>
<!-- NEW -->
</P>
<A NAME="toc30"></A>
<H2>Representations in different APIs</H2>
<P>
<B>Grammar composition</B>: any grammar can serve as resource to another one.
@@ -935,7 +851,6 @@ In <CODE>Lang</CODE> (ground level resource API)
<P>
<!-- NEW -->
</P>
<A NAME="toc31"></A>
<H2>Languages</H2>
<P>
The current GF Resource Project covers ten languages:
@@ -962,7 +877,6 @@ In addition, we have parts (morphology) of Arabic, Estonian, Latin, and Urdu
<P>
<!-- NEW -->
</P>
<A NAME="toc32"></A>
<H2>Library structure 1: language-independent API</H2>
<P>
<IMG ALIGN="middle" SRC="Lang.png" BORDER="0" ALT="">
@@ -979,7 +893,6 @@ Cf. "matrix" in BLARK, LinGo
<P>
<!-- NEW -->
</P>
<A NAME="toc33"></A>
<H2>Library structure 2: language-dependent APIs</H2>
<UL>
<LI>morphological paradigms, e.g. <CODE>ParadigmsSwe</CODE>
@@ -1000,7 +913,6 @@ Cf. "matrix" in BLARK, LinGo
<P>
<!-- NEW -->
</P>
<A NAME="toc34"></A>
<H2>Difficulties encountered</H2>
<P>
English: negation and auxiliary vs. non-auxiliary verbs
@@ -1023,7 +935,6 @@ Scandinavian: determiners
<P>
<!-- NEW -->
</P>
<A NAME="toc35"></A>
<H2>How much can be language-independent?</H2>
<P>
For the ten languages we have considered, it <I>is</I> possible
@@ -1049,7 +960,6 @@ Reservations:
<P>
<!-- NEW -->
</P>
<A NAME="toc36"></A>
<H2>Using the library</H2>
<P>
Simplest case: use the API in the same way for all languages.
@@ -1078,7 +988,6 @@ than writing a resource grammar!
<P>
<!-- NEW -->
</P>
<A NAME="toc37"></A>
<H2>Parametrized modules</H2>
<P>
We can go even farther than share an abstract API: we can share implementations
@@ -1098,7 +1007,6 @@ Exploited in two families:
<P>
<!-- NEW -->
</P>
<A NAME="toc38"></A>
<H2>Lexicon extension</H2>
<P>
We cannot anticipate all vocabulary needed in application grammars.
@@ -1122,7 +1030,6 @@ Example heuristic, from <A HREF="gfdoc/ParadigmsSwe.html">ParadigsSwe</A>:
<P>
<!-- NEW -->
</P>
<A NAME="toc39"></A>
<H2>Example low-level morphological definition</H2>
<PRE>
decl2Noun : Str -> N = \bil ->
@@ -1139,7 +1046,6 @@ Example heuristic, from <A HREF="gfdoc/ParadigmsSwe.html">ParadigsSwe</A>:
<P>
<!-- NEW -->
</P>
<A NAME="toc40"></A>
<H2>Some formats that can be generated from GF grammars</H2>
<PRE>
-printer=lbnf BNF Converter, thereby C/Bison, Java/JavaCup
@@ -1157,7 +1063,6 @@ Example heuristic, from <A HREF="gfdoc/ParadigmsSwe.html">ParadigsSwe</A>:
<P>
<!-- NEW -->
</P>
<A NAME="toc41"></A>
<H2>Use as program components</H2>
<P>
Haskell, Java, Prolog
@@ -1171,7 +1076,6 @@ Push-button creation of spoken language translators (using Nuance)
<P>
<!-- NEW -->
</P>
<A NAME="toc42"></A>
<H2>Grammar library as linguistic resource</H2>
<P>
Can we use the libraries outside domain-specific fragments?
@@ -1194,7 +1098,6 @@ Two ideas:
<P>
<!-- NEW -->
</P>
<A NAME="toc43"></A>
<H2>Corpus generation</H2>
<P>
The most general format is <B>multilingual treebank</B> generation:
@@ -1225,7 +1128,6 @@ Can this be useful? Cf. Rebecca Jonson this afternoon.
<P>
<!-- NEW -->
</P>
<A NAME="toc44"></A>
<H2>Related work</H2>
<P>
CLE = Core Language Engine
@@ -1255,7 +1157,6 @@ Parsing detached from grammar (Nivre) - grammar detached from parsing
<P>
<!-- NEW -->
</P>
<A NAME="toc45"></A>
<H2>Demo</H2>
<P>
Stoneage grammar, based on the Swadesh word list.
@@ -1267,6 +1168,6 @@ Implemented as application on top of the resource grammar.
Illustrate generation and spoken-language parsing.
</P>

<!-- html code generated by txt2tags 2.0 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags -\-toc -thtml gslt-sem-2006.txt -->
<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags gslt-sem-2006.txt -->
</BODY></HTML>
@@ -1,6 +1,11 @@
--# prob ConjNP 0.05
--# prob ConsNP 0.1
--# prob ConsAP 0.05
--# prob ConjAP 0.1
--# prob ConjAdv 0.1
--# prob AdvVP 0.1
--# prob ConjS 0.1
--# prob PredSCVP 0.05
--# prob PredetNP 0.05
--# prob SentCN 0.05
--# prob QuestCN 0.05