CLT slides, final

2006-03-08 11:07:10 +00:00
parent e74f367971
commit 1ce8ef0ba9
2 changed files with 278 additions and 42 deletions
@@ -7,7 +7,7 @@
 <P ALIGN="center"><CENTER><H1>The GF Resource Grammar Library Version 1.0</H1>
 <FONT SIZE="4">
 <I>Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;</I><BR>
-Last update: Wed Mar  8 09:47:07 2006
+Last update: Wed Mar  8 12:04:15 2006
 </FONT></CENTER>

 <P>
@@ -63,7 +63,7 @@ Usability for different purposes
 <P>
 <!-- NEW -->
 </P>
-<H3>Grammar as parser</H3>
+<H3>Not primarily code for a parser</H3>
 <P>
 Often in NLP, a grammar is just high-level code for a parser.
 </P>
@@ -94,20 +94,31 @@ Moreover, a grammar fine-tuned for parsing may not be reusable
 Linguistic ontology: <B>abstract syntax</B>
 </P>
 <P>
-E.g. adjectival modification
+E.g. adjectival modification rule
 </P>
 <PRE>
    AdjCN : AP -&gt; CN -&gt; CN ;
 </PRE>
-<P></P>
 <P>
 Rendering in different languages: <B>concrete syntax</B>
 </P>
+<PRE>
+    AdjCN (PositA even_A) (UseN number_N)
+  
+    even number, even sums
+  
+    jämnt tal, jämna summor
+  
+    nombre pair, sommes paires
+</PRE>
+<P>
+Abstract away from inflection, agreement, word order.
+</P>
 <P>
 Resource grammars have a generation perspective, rather than a parsing one
 </P>
 <UL>
-<LI>abstract syntax serves as a key to expressions in different languages
+<LI>abstract syntax serves as a key to renderings in different languages
 </UL>

 <P>
@@ -314,7 +325,7 @@ The current GF Resource Project covers ten languages:
 </UL>

 <P>
-In addition, parts (morphology) of Arabic, Estonian, Latin, and Urdu
+In addition, parts of Arabic, Estonian, Latin, and Urdu
 </P>
 <P>
 API 1.0 not yet implemented for Danish and Russian
@@ -449,7 +460,7 @@ proper names, pronouns, determiners, possessives, cardinals and ordinals
 <P>
 <!-- NEW -->
 </P>
-<H3>Structure in syntax editor</H3>
+<H3>The structure in the syntax editor</H3>
 <P>
 <IMG ALIGN="middle" SRC="editor.png" BORDER="0" ALT="">
 </P>
@@ -476,7 +487,7 @@ In Swedish, less so:
    regN "val" ---&gt; val, valen, valar, valarna
 </PRE>
 <P>
-Initializing a lexicon with <CODE>regX</CODE>s is
+Initializing a lexicon with <CODE>regX</CODE> for every entry is
 usually a good starting point in grammar development.
 </P>
 <P>
@@ -494,9 +505,9 @@ In Swedish, giving the gender of <CODE>N</CODE> improves a lot
 There are also special constructs taking other forms:
 </P>
 <PRE>
-    mk2N : (nyckel,nycklar : Str) -&gt; N
+    mk2N   : (nyckel,nycklar : Str) -&gt; N
  
-    mk1N : (bilarna : Str) -&gt; N
+    mk1N   : (bilarna : Str) -&gt; N
  
    irregV : (dricka, drack, druckit : Str) -&gt; V
 </PRE>
@@ -533,8 +544,13 @@ Irregular words in <CODE>IrregX</CODE>, e.g. Swedish:
 </P>
 <PRE>
      draga_V : V = 
-        mkV (variants { "dra"; "draga"}) (variants { "drar" ; "drager"}) 
-            (variants { "dra" ; "drag" }) "drog" "dragit" "dragen" ;
+        mkV 
+          (variants { "dra"  ; "draga"}) 
+          (variants { "drar" ; "drager"}) 
+          (variants { "dra"  ; "drag" }) 
+          "drog" 
+          "dragit" 
+          "dragen" ;
 </PRE>
 <P>
 Goal: eliminate the user's need of worst-case functions.
@@ -547,14 +563,18 @@ Goal: eliminate the user's need of worst-case functions.
 Syntactic structures that are not shared by all languages.
 </P>
 <P>
+Alternative (and often more idiomatic) ways to say what is already covered by the API.
+</P>
+<P>
 Not implemented yet.
 </P>
 <P>
 Candidates:
 </P>
 <UL>
-<LI><CODE>Nor</CODE> post-possessives: <CODE>bilen min</CODE>
-<LI><CODE>Fre</CODE> question forms: <CODE>est-ce que tu dors ?</CODE>
+<LI>Norwegian post-possessives: <CODE>bilen min</CODE>
+<LI>French question forms: <CODE>est-ce que tu dors ?</CODE>
+<LI>Romance simple past tenses
 </UL>

 <P>
@@ -600,7 +620,7 @@ files again. Just do some of
   
    gf -nocf -path=alltenses:prelude alltenses/LangSwe.gfc -- Swedish only
  
-    gf -nocf -path=alltenses:prelude present/LangSwe.gfc   -- Swedish only, present tense only
+    gf -nocf -path=present:prelude present/LangSwe.gfc     -- Swedish in present tense only
 </PRE>
 <P></P>
 <P>
@@ -608,10 +628,14 @@ files again. Just do some of
 </P>
 <H3>Parsing</H3>
 <P>
-The default parser does not work!
+The default parser does not work! (It is obsolete anyway.)
 </P>
 <P>
-The MCFG parser works in some languages, after waiting appr. 20 seconds
+The MCFG parser (the new standard) works in theory, but can
+in practice be too slow to build.
+</P>
+<P>
+But it does work in some languages, after waiting approx. 20 seconds
 </P>
 <PRE>
    p -mcfg -lang=LangEng -cat=S "I would see her"
@@ -621,6 +645,14 @@ The MCFG parser works in some languages, after waiting appr. 20 seconds
 <P>
 Parsing in <CODE>present/</CODE> versions is quicker.
 </P>
+<P>
+Remedies:
+</P>
+<UL>
+<LI>write application grammars for parsing
+<LI>use treebank lookup instead
+</UL>
+
 <P>
 <!-- NEW -->
 </P>
@@ -818,7 +850,7 @@ Problems:
 <P>
 <!-- NEW -->
 </P>
-<H3>The core of the API</H3>
+<H3>The core API</H3>
 <P>
 Everything else is variations of this
 </P>
@@ -842,15 +874,105 @@ Everything else is variations of this
 <P>
 <!-- NEW -->
 </P>
-<H3>The core of the API</H3>
+<H3>The core API in Latin: parameters</H3>
 <P>
 This <A HREF="latin.gf">toy Latin grammar</A> shows in a nutshell how the core
 can be implemented.
 </P>
+<PRE>
+  param
+    Number   = Sg | Pl ;
+    Person   = P1 | P2 | P3 ;
+    Tense    = Pres | Past ;
+    Polarity = Pos | Neg ;
+    Case     = Nom | Acc | Dat ;
+    Gender   = Masc | Fem | Neutr ;
+  oper
+    Agr = {g : Gender ; n : Number ; p : Person} ; -- agreement features
+</PRE>
+<P></P>
 <P>
-Use this API as a first approximation when designing the parameter system of a new
-language. 
+<!-- NEW -->
 </P>
+<H3>The core API in Latin: linearization types</H3>
+<PRE>
+  lincat
+    Cl = {
+      s : Tense =&gt; Polarity =&gt; Str
+      } ;
+    VP  = {
+      verb  : Tense =&gt; Polarity =&gt; Agr =&gt; Str ;  -- finite verb
+      neg   : Polarity =&gt; Str ;                  -- negation
+      compl : Agr =&gt; Str                         -- complement
+      } ;
+    V2 = {
+      s : Tense =&gt; Number =&gt; Person =&gt; Str ; 
+      c : Case                                   -- complement case
+      } ;
+    NP = {
+      s : Case =&gt; Str ; 
+      a : Agr                                    -- agreement features
+      } ;
+    CN = {
+      s : Number =&gt; Case =&gt; Str ; 
+      g : Gender
+      } ;
+    Det = {
+      s : Gender =&gt; Case =&gt; Str ; 
+      n : Number
+      } ;
+    AP = {
+      s : Gender =&gt; Number =&gt; Case =&gt; Str
+      } ;
+</PRE>
+<P></P>
+<P>
+<!-- NEW -->
+</P>
+<H3>The core API in Latin: predication and complementization</H3>
+<PRE>
+  lin
+    PredVP np vp = {
+      s = \\t,p =&gt; 
+        let
+          agr = np.a ;
+          subject = np.s ! Nom ;
+          object  = vp.compl ! agr ;
+          verb    = vp.neg ! p ++ vp.verb ! t ! p ! agr  
+        in                      
+        subject ++ object ++ verb
+      } ;
+  
+    ComplV2 v np = {
+      verb  = \\t,p,a =&gt; v.s ! t ! a.n ! a.p ;
+      compl = \\_ =&gt; np.s ! v.c ;
+      neg   = table {Pos =&gt; [] ; Neg =&gt; "non"}
+      } ;
+</PRE>
+<P></P>
+<P>
+<!-- NEW -->
+</P>
+<H3>The core API in Latin: determination and modification</H3>
+<PRE>
+    DetCN det cn = 
+      let 
+        g = cn.g ; 
+        n = det.n
+      in {
+        s = \\c =&gt; det.s ! g ! c ++ cn.s ! n ! c ;
+        a = {g = g ; n = n ; p = P3}
+        } ;
+  
+    ModCN ap cn = 
+      let 
+        g = cn.g 
+      in {
+        s = \\n,c =&gt; cn.s ! n ! c ++ ap.s ! g ! n ! c ;
+        g = g
+        } ;
+</PRE>
+<P></P>
 <P>
 <!-- NEW -->
 </P>
@@ -886,6 +1008,6 @@ Exception: if you are working with a language-specific API extension,
 you can work directly in that module.
 </P>

-<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
+<!-- html code generated by txt2tags 2.0 (http://txt2tags.sf.net) -->
 <!-- cmdline: txt2tags clt2006.txt -->
 </BODY></HTML>
@@ -53,7 +53,7 @@ Usability for different purposes

 #NEW

-===Grammar as parser===
+===Not primarily code for a parser===

 Often in NLP, a grammar is just high-level code for a parser.

@@ -76,15 +76,24 @@ Moreover, a grammar fine-tuned for parsing may not be reusable

 Linguistic ontology: **abstract syntax**

-E.g. adjectival modification
+E.g. adjectival modification rule
 ```
  AdjCN : AP -> CN -> CN ;
 ```
-
 Rendering in different languages: **concrete syntax**
+```
+  AdjCN (PositA even_A) (UseN number_N)
+
+  even number, even sums
+
+  jämnt tal, jämna summor
+
+  nombre pair, sommes paires
+```
+Abstract away from inflection, agreement, word order.

 Resource grammars have a generation perspective, rather than a parsing one
- abstract syntax serves as a key to expressions in different languages
+- abstract syntax serves as a key to renderings in different languages



@@ -247,7 +256,7 @@ The current GF Resource Project covers ten languages:
 - ``Swe``dish


-In addition, parts (morphology) of Arabic, Estonian, Latin, and Urdu
+In addition, parts of Arabic, Estonian, Latin, and Urdu

 API 1.0 not yet implemented for Danish and Russian

@@ -368,7 +377,7 @@ TFullStop              : Phr -> Text -> Text              | TQuestMark, TExclMar

 #NEW

-===Structure in syntax editor===
+===The structure in the syntax editor===

 [editor.png]

@@ -392,7 +401,7 @@ In Swedish, less so:
 ```
  regN "val" ---> val, valen, valar, valarna
 ```
-Initializing a lexicon with ``regX``s is
+Initializing a lexicon with ``regX`` for every entry is
 usually a good starting point in grammar development.


@@ -407,9 +416,9 @@ In Swedish, giving the gender of ``N`` improves a lot

 There are also special constructs taking other forms:
 ```
-  mk2N : (nyckel,nycklar : Str) -> N
+  mk2N   : (nyckel,nycklar : Str) -> N

-  mk1N : (bilarna : Str) -> N
+  mk1N   : (bilarna : Str) -> N

  irregV : (dricka, drack, druckit : Str) -> V
 ```
@@ -442,8 +451,13 @@ To cover all situations, worst-case paradigms are given. E.g. Swedish
 Irregular words in ``IrregX``, e.g. Swedish:
 ```
    draga_V : V = 
-      mkV (variants { "dra"; "draga"}) (variants { "drar" ; "drager"}) 
-          (variants { "dra" ; "drag" }) "drog" "dragit" "dragen" ;
+      mkV 
+        (variants { "dra"  ; "draga"}) 
+        (variants { "drar" ; "drager"}) 
+        (variants { "dra"  ; "drag" }) 
+        "drog" 
+        "dragit" 
+        "dragen" ;
 ```
 Goal: eliminate the user's need of worst-case functions.

@@ -455,11 +469,14 @@ Goal: eliminate the user's need of worst-case functions.

 Syntactic structures that are not shared by all languages.

+Alternative (and often more idiomatic) ways to say what is already covered by the API.
+
 Not implemented yet.

 Candidates:
- ``Nor`` post-possessives: ``bilen min``
- ``Fre`` question forms: ``est-ce que tu dors ?``
+- Norwegian post-possessives: ``bilen min``
+- French question forms: ``est-ce que tu dors ?``
+- Romance simple past tenses


 #NEW
@@ -498,7 +515,7 @@ files again. Just do some of
 
  gf -nocf -path=alltenses:prelude alltenses/LangSwe.gfc -- Swedish only

-  gf -nocf -path=alltenses:prelude present/LangSwe.gfc   -- Swedish only, present tense only
+  gf -nocf -path=present:prelude present/LangSwe.gfc     -- Swedish in present tense only
 ```


@@ -506,9 +523,12 @@ files again. Just do some of

 ===Parsing===

-The default parser does not work!
+The default parser does not work! (It is obsolete anyway.)

-The MCFG parser works in some languages, after waiting appr. 20 seconds
+The MCFG parser (the new standard) works in theory, but can
+in practice be too slow to build.
+
+But it does work in some languages, after waiting approx. 20 seconds
 ```
  p -mcfg -lang=LangEng -cat=S "I would see her"

@@ -516,6 +536,10 @@ The MCFG parser works in some languages, after waiting appr. 20 seconds
 ```
 Parsing in ``present/`` versions is quicker.

+Remedies:
+- write application grammars for parsing
+- use treebank lookup instead
+

 #NEW

@@ -679,7 +703,7 @@ Problems:

 #NEW

-===The core of the API===
+===The core API===

 Everything else is variations of this
 ```
@@ -701,13 +725,103 @@ fun

 #NEW

-===The core of the API===
+===The core API in Latin: parameters===

 This [toy Latin grammar  latin.gf] shows in a nutshell how the core
 can be implemented.
+```
+param
+  Number   = Sg | Pl ;
+  Person   = P1 | P2 | P3 ;
+  Tense    = Pres | Past ;
+  Polarity = Pos | Neg ;
+  Case     = Nom | Acc | Dat ;
+  Gender   = Masc | Fem | Neutr ;
+oper
+  Agr = {g : Gender ; n : Number ; p : Person} ; -- agreement features
+```

-Use this API as a first approximation when designing the parameter system of a new
-language. 
+#NEW
+
+===The core API in Latin: linearization types===
+
+```
+lincat
+  Cl = {
+    s : Tense => Polarity => Str
+    } ;
+  VP  = {
+    verb  : Tense => Polarity => Agr => Str ;  -- finite verb
+    neg   : Polarity => Str ;                  -- negation
+    compl : Agr => Str                         -- complement
+    } ;
+  V2 = {
+    s : Tense => Number => Person => Str ; 
+    c : Case                                   -- complement case
+    } ;
+  NP = {
+    s : Case => Str ; 
+    a : Agr                                    -- agreement features
+    } ;
+  CN = {
+    s : Number => Case => Str ; 
+    g : Gender
+    } ;
+  Det = {
+    s : Gender => Case => Str ; 
+    n : Number
+    } ;
+  AP = {
+    s : Gender => Number => Case => Str
+    } ;
+```
+
+#NEW
+
+===The core API in Latin: predication and complementization===
+
+```
+lin
+  PredVP np vp = {
+    s = \\t,p => 
+      let
+        agr = np.a ;
+        subject = np.s ! Nom ;
+        object  = vp.compl ! agr ;
+        verb    = vp.neg ! p ++ vp.verb ! t ! p ! agr  
+      in                      
+      subject ++ object ++ verb
+    } ;
+
+  ComplV2 v np = {
+    verb  = \\t,p,a => v.s ! t ! a.n ! a.p ;
+    compl = \\_ => np.s ! v.c ;
+    neg   = table {Pos => [] ; Neg => "non"}
+    } ;
+```
+
+#NEW
+
+===The core API in Latin: determination and modification===
+
+```
+  DetCN det cn = 
+    let 
+      g = cn.g ; 
+      n = det.n
+    in {
+      s = \\c => det.s ! g ! c ++ cn.s ! n ! c ;
+      a = {g = g ; n = n ; p = P3}
+      } ;
+
+  ModCN ap cn = 
+    let 
+      g = cn.g 
+    in {
+      s = \\n,c => cn.s ! n ! c ++ ap.s ! g ! n ! c ;
+      g = g
+      } ;
+```


 #NEW