forked from GitHub/gf-core

CLT slides, final

aarne
2006-03-08 11:07:10 +00:00
parent e74f367971
commit 1ce8ef0ba9
2 changed files with 278 additions and 42 deletions


@@ -7,7 +7,7 @@
<P ALIGN="center"><CENTER><H1>The GF Resource Grammar Library Version 1.0</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;</I><BR>
Last update: Wed Mar 8 09:47:07 2006
Last update: Wed Mar 8 12:04:15 2006
</FONT></CENTER>
<P>
@@ -63,7 +63,7 @@ Usability for different purposes
<P>
<!-- NEW -->
</P>
<H3>Grammar as parser</H3>
<H3>Not primarily code for a parser</H3>
<P>
Often in NLP, a grammar is just high-level code for a parser.
</P>
@@ -94,20 +94,31 @@ Moreover, a grammar fine-tuned for parsing may not be reusable
Linguistic ontology: <B>abstract syntax</B>
</P>
<P>
E.g. adjectival modification
E.g. adjectival modification rule
</P>
<PRE>
AdjCN : AP -&gt; CN -&gt; CN ;
</PRE>
<P></P>
<P>
Rendering in different languages: <B>concrete syntax</B>
</P>
<PRE>
AdjCN (PositA even_A) (UseN number_N)
even number, even sums
jämnt tal, jämna summor
nombre pair, sommes paires
</PRE>
<P>
Abstract away from inflection, agreement, word order.
</P>
<P>
Resource grammars have generation perspective, rather than parsing
</P>
<UL>
<LI>abstract syntax serves as a key to expressions in different languages
<LI>abstract syntax serves as a key to renderings in different languages
</UL>
<P>
@@ -314,7 +325,7 @@ The current GF Resource Project covers ten languages:
</UL>
<P>
In addition, parts (morphology) of Arabic, Estonian, Latin, and Urdu
In addition, parts of Arabic, Estonian, Latin, and Urdu
</P>
<P>
API 1.0 not yet implemented for Danish and Russian
@@ -449,7 +460,7 @@ proper names, pronouns, determiners, possessives, cardinals and ordinals
<P>
<!-- NEW -->
</P>
<H3>Structure in syntax editor</H3>
<H3>The structure in the syntax editor</H3>
<P>
<IMG ALIGN="middle" SRC="editor.png" BORDER="0" ALT="">
</P>
@@ -476,7 +487,7 @@ In Swedish, less so:
regN "val" ---&gt; val, valen, valar, valarna
</PRE>
<P>
Initializing a lexicon with <CODE>regX</CODE>s is
Initializing a lexicon with <CODE>regX</CODE> for every entry is
usually a good starting point in grammar development.
</P>
<P>
@@ -494,9 +505,9 @@ In Swedish, giving the gender of <CODE>N</CODE> improves a lot
There are also special constructs taking other forms:
</P>
<PRE>
mk2N : (nyckel,nycklar : Str) -&gt; N
mk2N : (nyckel,nycklar : Str) -&gt; N
mk1N : (bilarna : Str) -&gt; N
mk1N : (bilarna : Str) -&gt; N
irregV : (dricka, drack, druckit : Str) -&gt; V
</PRE>
@@ -533,8 +544,13 @@ Irregular words in <CODE>IrregX</CODE>, e.g. Swedish:
</P>
<PRE>
draga_V : V =
mkV (variants { "dra"; "draga"}) (variants { "drar" ; "drager"})
(variants { "dra" ; "drag" }) "drog" "dragit" "dragen" ;
mkV
(variants { "dra" ; "draga"})
(variants { "drar" ; "drager"})
(variants { "dra" ; "drag" })
"drog"
"dragit"
"dragen" ;
</PRE>
<P>
Goal: eliminate the user's need of worst-case functions.
@@ -547,14 +563,18 @@ Goal: eliminate the user's need of worst-case functions.
Syntactic structures that are not shared by all languages.
</P>
<P>
Alternative (and often more idiomatic) ways to say what is already covered by the API.
</P>
<P>
Not implemented yet.
</P>
<P>
Candidates:
</P>
<UL>
<LI><CODE>Nor</CODE> post-possessives: <CODE>bilen min</CODE>
<LI><CODE>Fre</CODE> question forms: <CODE>est-ce que tu dors ?</CODE>
<LI>Norwegian post-possessives: <CODE>bilen min</CODE>
<LI>French question forms: <CODE>est-ce que tu dors ?</CODE>
<LI>Romance simple past tenses
</UL>
<P>
@@ -600,7 +620,7 @@ files again. Just do some of
gf -nocf -path=alltenses:prelude alltenses/LangSwe.gfc -- Swedish only
gf -nocf -path=alltenses:prelude present/LangSwe.gfc -- Swedish only, present tense only
gf -nocf -path=present:prelude present/LangSwe.gfc -- Swedish in present tense only
</PRE>
<P></P>
<P>
@@ -608,10 +628,14 @@ files again. Just do some of
</P>
<H3>Parsing</H3>
<P>
The default parser does not work!
The default parser does not work! (It is obsolete anyway.)
</P>
<P>
The MCFG parser works in some languages, after waiting appr. 20 seconds
The MCFG parser (the new standard) works in theory, but can
in practice be too slow to build.
</P>
<P>
But it does work in some languages, after waiting approx. 20 seconds
</P>
<PRE>
p -mcfg -lang=LangEng -cat=S "I would see her"
@@ -621,6 +645,14 @@ The MCFG parser works in some languages, after waiting appr. 20 seconds
<P>
Parsing in <CODE>present/</CODE> versions is quicker.
</P>
<P>
Remedies:
</P>
<UL>
<LI>write application grammars for parsing
<LI>use treebank lookup instead
</UL>
<P>
<!-- NEW -->
</P>
@@ -818,7 +850,7 @@ Problems:
<P>
<!-- NEW -->
</P>
<H3>The core of the API</H3>
<H3>The core API</H3>
<P>
Everything else is variations of this
</P>
@@ -842,15 +874,105 @@ Everything else is variations of this
<P>
<!-- NEW -->
</P>
<H3>The core of the API</H3>
<H3>The core API in Latin: parameters</H3>
<P>
This <A HREF="latin.gf">toy Latin grammar</A> shows in a nutshell how the core
can be implemented.
</P>
<PRE>
param
Number = Sg | Pl ;
Person = P1 | P2 | P3 ;
Tense = Pres | Past ;
Polarity = Pos | Neg ;
Case = Nom | Acc | Dat ;
Gender = Masc | Fem | Neutr ;
oper
Agr = {g : Gender ; n : Number ; p : Person} ; -- agreement features
</PRE>
<P></P>
<P>
Use this API as a first approximation when designing the parameter system of a new
language.
<!-- NEW -->
</P>
<H3>The core API in Latin: linearization types</H3>
<PRE>
lincat
Cl = {
s : Tense =&gt; Polarity =&gt; Str
} ;
VP = {
verb : Tense =&gt; Polarity =&gt; Agr =&gt; Str ; -- finite verb
neg : Polarity =&gt; Str ; -- negation
compl : Agr =&gt; Str -- complement
} ;
V2 = {
s : Tense =&gt; Number =&gt; Person =&gt; Str ;
c : Case -- complement case
} ;
NP = {
s : Case =&gt; Str ;
a : Agr -- agreement features
} ;
CN = {
s : Number =&gt; Case =&gt; Str ;
g : Gender
} ;
Det = {
s : Gender =&gt; Case =&gt; Str ;
n : Number
} ;
AP = {
s : Gender =&gt; Number =&gt; Case =&gt; Str
} ;
</PRE>
<P></P>
<P>
<!-- NEW -->
</P>
<H3>The core API in Latin: predication and complementization</H3>
<PRE>
lin
PredVP np vp = {
s = \\t,p =&gt;
let
agr = np.a ;
subject = np.s ! Nom ;
object = vp.compl ! agr ;
verb = vp.neg ! p ++ vp.verb ! t ! p ! agr
in
subject ++ object ++ verb
} ;
ComplV2 v np = {
verb = \\t,p,a =&gt; v.s ! t ! a.n ! a.p ;
compl = \\_ =&gt; np.s ! v.c ;
neg = table {Pos =&gt; [] ; Neg =&gt; "non"}
} ;
</PRE>
<P></P>
<P>
<!-- NEW -->
</P>
<H3>The core API in Latin: determination and modification</H3>
<PRE>
DetCN det cn =
let
g = cn.g ;
n = det.n
in {
s = \\c =&gt; det.s ! g ! c ++ cn.s ! n ! c ;
a = {g = g ; n = n ; p = P3}
} ;
ModCN ap cn =
let
g = cn.g
in {
s = \\n,c =&gt; cn.s ! n ! c ++ ap.s ! g ! n ! c ;
g = g
} ;
</PRE>
<P></P>
<P>
<!-- NEW -->
</P>
@@ -886,6 +1008,6 @@ Exception: if you are working with a language-specific API extension,
you can work directly in that module.
</P>
<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
<!-- html code generated by txt2tags 2.0 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags clt2006.txt -->
</BODY></HTML>


@@ -53,7 +53,7 @@ Usability for different purposes
#NEW
===Grammar as parser===
===Not primarily code for a parser===
Often in NLP, a grammar is just high-level code for a parser.
@@ -76,15 +76,24 @@ Moreover, a grammar fine-tuned for parsing may not be reusable
Linguistic ontology: **abstract syntax**
E.g. adjectival modification
E.g. adjectival modification rule
```
AdjCN : AP -> CN -> CN ;
```
Rendering in different languages: **concrete syntax**
```
AdjCN (PositA even_A) (UseN number_N)
even number, even sums
jämnt tal, jämna summor
nombre pair, sommes paires
```
Abstract away from inflection, agreement, word order.
Resource grammars have generation perspective, rather than parsing
- abstract syntax serves as a key to expressions in different languages
- abstract syntax serves as a key to renderings in different languages
@@ -247,7 +256,7 @@ The current GF Resource Project covers ten languages:
- ``Swe``dish
In addition, parts (morphology) of Arabic, Estonian, Latin, and Urdu
In addition, parts of Arabic, Estonian, Latin, and Urdu
API 1.0 not yet implemented for Danish and Russian
@@ -368,7 +377,7 @@ TFullStop : Phr -> Text -> Text | TQuestMark, TExclMar
#NEW
===Structure in syntax editor===
===The structure in the syntax editor===
[editor.png]
@@ -392,7 +401,7 @@ In Swedish, less so:
```
regN "val" ---> val, valen, valar, valarna
```
Initializing a lexicon with ``regX``s is
Initializing a lexicon with ``regX`` for every entry is
usually a good starting point in grammar development.
@@ -407,9 +416,9 @@ In Swedish, giving the gender of ``N`` improves a lot
There are also special constructs taking other forms:
```
mk2N : (nyckel,nycklar : Str) -> N
mk2N : (nyckel,nycklar : Str) -> N
mk1N : (bilarna : Str) -> N
mk1N : (bilarna : Str) -> N
irregV : (dricka, drack, druckit : Str) -> V
```
@@ -442,8 +451,13 @@ To cover all situations, worst-case paradigms are given. E.g. Swedish
Irregular words in ``IrregX``, e.g. Swedish:
```
draga_V : V =
mkV (variants { "dra"; "draga"}) (variants { "drar" ; "drager"})
(variants { "dra" ; "drag" }) "drog" "dragit" "dragen" ;
mkV
(variants { "dra" ; "draga"})
(variants { "drar" ; "drager"})
(variants { "dra" ; "drag" })
"drog"
"dragit"
"dragen" ;
```
Goal: eliminate the user's need of worst-case functions.
@@ -455,11 +469,14 @@ Goal: eliminate the user's need of worst-case functions.
Syntactic structures that are not shared by all languages.
Alternative (and often more idiomatic) ways to say what is already covered by the API.
Not implemented yet.
Candidates:
- ``Nor`` post-possessives: ``bilen min``
- ``Fre`` question forms: ``est-ce que tu dors ?``
- Norwegian post-possessives: ``bilen min``
- French question forms: ``est-ce que tu dors ?``
- Romance simple past tenses
#NEW
@@ -498,7 +515,7 @@ files again. Just do some of
gf -nocf -path=alltenses:prelude alltenses/LangSwe.gfc -- Swedish only
gf -nocf -path=alltenses:prelude present/LangSwe.gfc -- Swedish only, present tense only
gf -nocf -path=present:prelude present/LangSwe.gfc -- Swedish in present tense only
```
@@ -506,9 +523,12 @@ files again. Just do some of
===Parsing===
The default parser does not work!
The default parser does not work! (It is obsolete anyway.)
The MCFG parser works in some languages, after waiting appr. 20 seconds
The MCFG parser (the new standard) works in theory, but can
in practice be too slow to build.
But it does work in some languages, after waiting approx. 20 seconds
```
p -mcfg -lang=LangEng -cat=S "I would see her"
@@ -516,6 +536,10 @@ The MCFG parser works in some languages, after waiting appr. 20 seconds
```
Parsing in ``present/`` versions is quicker.
Remedies:
- write application grammars for parsing
- use treebank lookup instead
#NEW
@@ -679,7 +703,7 @@ Problems:
#NEW
===The core of the API===
===The core API===
Everything else is variations of this
```
@@ -701,13 +725,103 @@ fun
#NEW
===The core of the API===
===The core API in Latin: parameters===
This [toy Latin grammar latin.gf] shows in a nutshell how the core
can be implemented.
```
param
Number = Sg | Pl ;
Person = P1 | P2 | P3 ;
Tense = Pres | Past ;
Polarity = Pos | Neg ;
Case = Nom | Acc | Dat ;
Gender = Masc | Fem | Neutr ;
oper
Agr = {g : Gender ; n : Number ; p : Person} ; -- agreement features
```
Use this API as a first approximation when designing the parameter system of a new
language.
#NEW
===The core API in Latin: linearization types===
```
lincat
Cl = {
s : Tense => Polarity => Str
} ;
VP = {
verb : Tense => Polarity => Agr => Str ; -- finite verb
neg : Polarity => Str ; -- negation
compl : Agr => Str -- complement
} ;
V2 = {
s : Tense => Number => Person => Str ;
c : Case -- complement case
} ;
NP = {
s : Case => Str ;
a : Agr -- agreement features
} ;
CN = {
s : Number => Case => Str ;
g : Gender
} ;
Det = {
s : Gender => Case => Str ;
n : Number
} ;
AP = {
s : Gender => Number => Case => Str
} ;
```
#NEW
===The core API in Latin: predication and complementization===
```
lin
PredVP np vp = {
s = \\t,p =>
let
agr = np.a ;
subject = np.s ! Nom ;
object = vp.compl ! agr ;
verb = vp.neg ! p ++ vp.verb ! t ! p ! agr
in
subject ++ object ++ verb
} ;
ComplV2 v np = {
verb = \\t,p,a => v.s ! t ! a.n ! a.p ;
compl = \\_ => np.s ! v.c ;
neg = table {Pos => [] ; Neg => "non"}
} ;
```
#NEW
===The core API in Latin: determination and modification===
```
DetCN det cn =
let
g = cn.g ;
n = det.n
in {
s = \\c => det.s ! g ! c ++ cn.s ! n ! c ;
a = {g = g ; n = n ; p = P3}
} ;
ModCN ap cn =
let
g = cn.g
in {
s = \\n,c => cn.s ! n ! c ++ ap.s ! g ! n ! c ;
g = g
} ;
```
#NEW
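The linearization types above say what a ``CN`` must contain, but not how a lexical entry fills it. The sketch below shows one possible way, in the spirit of ``regN``: a paradigm that builds the whole table from one string. It is not taken from ``latin.gf``. The module name ``ToyLatinLex``, the oper ``decl1`` and the example noun are assumptions made for illustration, and the paradigm covers only first-declension feminine nouns with the three cases of the toy parameter system.

```
resource ToyLatinLex = open Prelude in {

  param
    Number = Sg | Pl ;
    Case   = Nom | Acc | Dat ;
    Gender = Masc | Fem | Neutr ;

  oper
    -- same shape as the CN linearization type of the toy grammar
    CN : Type = {s : Number => Case => Str ; g : Gender} ;

    -- hypothetical paradigm: first declension, feminine (e.g. puella)
    decl1 : Str -> CN = \puella ->
      let
        puell = init puella   -- Prelude.init drops the last character
      in {
        s = table {
          Sg => table {
            Nom => puella ;
            Acc => puella + "m" ;
            Dat => puell + "ae"
            } ;
          Pl => table {
            Nom => puell + "ae" ;
            Acc => puell + "as" ;
            Dat => puell + "is"
            }
          } ;
        g = Fem
      } ;

}
```

With the ``prelude`` directory on the path, importing the module with ``i -retain ToyLatinLex.gf`` and evaluating ``cc (decl1 "puella").s ! Pl ! Acc`` should compute ``puellas``. A record built this way can be passed to ``DetCN`` and ``ModCN`` as they are defined above, since both only select from ``s`` and read ``g``.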