more in the runtime documentation

Krasimir Angelov
2017-08-28 14:23:47 +02:00
parent 85417da2e3
commit a0fc2f28e8
2 changed files with 144 additions and 40 deletions


@@ -1,6 +1,5 @@
<html>
<head>
<link rel="stylesheet" type="text/css" href="cloud.css" title="Cloud">
<style>
body { background: #eee; }
@@ -268,7 +267,7 @@ red theatre
</pre>
This method produces only a single linearization. If you use variants
in the grammar then you might want to see all possible linearizations.
For that purpose you should use linearizeAll:
For that purpose you should use <tt>linearizeAll</tt>:
<pre class="python">
>>> for s in eng.linearizeAll(e):
...     print(s)
@@ -294,8 +293,8 @@ then the right method to use is <tt>tabularLinearize</tt>:
{'s Sg Nom': 'red theatre', 's Pl Nom': 'red theatres', 's Pl Gen': "red theatres'", 's Sg Gen': "red theatre's"}
</pre>
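<span class="python">
The keys of this dictionary appear to consist of a field label followed by parameter values (here <tt>s</tt>, then number and case). Assuming that layout, the flat table can be regrouped into a number-by-case grid. The helper <tt>to_grid</tt> below is hypothetical and not part of the <tt>pgf</tt> module; it works on the dictionary copied from the output above, so it runs without a grammar.

```python
# Output of eng.tabularLinearize(e), copied from the example above.
table = {'s Sg Nom': 'red theatre', 's Pl Nom': 'red theatres',
         's Pl Gen': "red theatres'", 's Sg Gen': "red theatre's"}

def to_grid(table):
    """Hypothetical helper: regroup 'field number case' keys into
    grid[field][number][case]. Assumes every key has exactly three parts."""
    grid = {}
    for key, form in table.items():
        field, number, case = key.split()
        grid.setdefault(field, {}).setdefault(number, {})[case] = form
    return grid

grid = to_grid(table)
print(grid['s']['Pl']['Gen'])  # red theatres'
```

For categories with a different set of parameters the unpacking of <tt>key.split()</tt> would have to be adapted.
</span>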
<pre class="haskell">
Prelude PGF2> tabularLinearize eng e
{'s Sg Nom': 'red theatre', 's Pl Nom': 'red theatres', 's Pl Gen': "red theatres'", 's Sg Gen': "red theatre's"}
Prelude PGF2> tabularLinearize eng e ---- TODO
fromList [("s Sg Nom", "red theatre"), ("s Pl Nom", "red theatres"), ("s Pl Gen", "red theatres'"), ("s Sg Gen", "red theatre's")]
</pre>
<pre class="java">
for (Map.Entry&lt;String,String&gt; entry : eng.tabularLinearize(e)) {
@@ -316,20 +315,53 @@ a list of phrases:
(CN:4 (AP:1 (A:0 red)) (CN:3 (N:2 theatre)))
</pre>
<pre class="haskell">
Prelude PGF2> let [b] = bracketedLinearize eng e
Prelude PGF2> let [b] = bracketedLinearize eng e ---- TODO
Prelude PGF2> print b
(CN:4 (AP:1 (A:0 red)) (CN:3 (N:2 theatre)))
</pre>
<pre class="java">
Object[] bs = eng.bracketedLinearize(e);
</pre>
Each bracket is actually an object of type <tt>pgf.Bracket</tt>. The property
<tt>cat</tt> of the object gives you the name of the category and
the property <tt>children</tt> gives you a list of nested brackets.
If a phrase is discontinuous then it is represented as more than
one bracket with the same category name. In that case, the index
that you see in the example above will have the same value for all
brackets of the same phrase.
<span class="python">
Each element in the sequence above is either a string or an object
of type <tt>pgf.Bracket</tt>. When it is actually a bracket then
the object has the following properties:
<ul>
<li><tt>cat</tt> - the syntactic category for this bracket</li>
<li><tt>fid</tt> - an id which identifies this bracket in the bracketed string. If there are discontinuous phrases this id will be shared for all brackets belonging to the same phrase.</li>
<li><tt>lindex</tt> - the constituent index</li>
<li><tt>fun</tt> - the abstract function for this bracket</li>
<li><tt>children</tt> - a list with the children of this bracket</li>
</ul>
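Since a bracket is just a tree whose leaves are strings, it can be processed with ordinary recursion. The sketch below uses a hypothetical stand-in class with the same properties as <tt>pgf.Bracket</tt>, so that it runs without the <tt>pgf</tt> module; the <tt>fun</tt> and <tt>lindex</tt> values in the sample are illustrative guesses for the tree <tt>AdjCN (PositA red_A) (UseN theatre_N)</tt>. It prints a bracket in the same notation as above, collects the words, and groups them by <tt>fid</tt>, which is what reunites the pieces of a discontinuous phrase.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Bracket:
    """Hypothetical stand-in for pgf.Bracket with the same properties."""
    cat: str
    fid: int
    lindex: int
    fun: str
    children: List[Union[str, "Bracket"]] = field(default_factory=list)

def show(b):
    """Render a bracket as (cat:fid child ...), like print(b) above."""
    if isinstance(b, str):
        return b
    return "(%s:%d %s)" % (b.cat, b.fid, " ".join(show(c) for c in b.children))

def words(b):
    """Collect the words under a bracket from left to right."""
    if isinstance(b, str):
        return [b]
    return [w for c in b.children for w in words(c)]

def phrases(b, acc=None):
    """Map each fid to its words; brackets sharing a fid (the parts of a
    discontinuous phrase) are merged under the same key."""
    if acc is None:
        acc = {}
    if not isinstance(b, str):
        acc.setdefault(b.fid, []).extend(words(b))
        for c in b.children:
            phrases(c, acc)
    return acc

# Sample bracket mirroring (CN:4 (AP:1 (A:0 red)) (CN:3 (N:2 theatre)))
b = Bracket("CN", 4, 0, "AdjCN", [
        Bracket("AP", 1, 0, "PositA", [Bracket("A", 0, 0, "red_A", ["red"])]),
        Bracket("CN", 3, 0, "UseN", [Bracket("N", 2, 0, "theatre_N", ["theatre"])])])

print(show(b))   # (CN:4 (AP:1 (A:0 red)) (CN:3 (N:2 theatre)))
print(words(b))  # ['red', 'theatre']
print(phrases(b))
```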
</span>
<span class="haskell">
The list above contains elements of type <tt>BracketedString</tt>.
This type has two constructors:
<ul>
<li><tt>Leaf</tt> with only one argument of type <tt>String</tt> that contains the current word</li>
<li><tt>Bracket</tt> with the following arguments:
<ul>
<li><tt>cat :: String</tt> - the syntactic category for this bracket</li>
<li><tt>fid :: Int</tt> - an id which identifies this bracket in the bracketed string. If there are discontinuous phrases this id will be shared for all brackets belonging to the same phrase.</li>
<li><tt>lindex :: Int</tt> - the constituent index</li>
<li><tt>fun :: String</tt> - the abstract function for this bracket</li>
<li><tt>children :: [BracketedString]</tt> - a list with the children of this bracket</li>
</ul>
</li>
</ul>
</span>
<span class="java">
Each element in the sequence above is either a string or an object
of type <tt>Bracket</tt>. When it is actually a bracket then
the object has the following public final fields:
<ul>
<li><tt>String cat</tt> - the syntactic category for this bracket</li>
<li><tt>int fid</tt> - an id which identifies this bracket in the bracketed string. If there are discontinuous phrases this id will be shared for all brackets belonging to the same phrase.</li>
<li><tt>int lindex</tt> - the constituent index</li>
<li><tt>String fun</tt> - the abstract function for this bracket</li>
<li><tt>Object[] children</tt> - a list with the children of this bracket</li>
</ul>
</span>
</p>
The linearization works even if there are functions in the tree
@@ -357,20 +389,45 @@ a tree into a function name and a list of arguments:
>>> e.unpack()
('AdjCN', [&lt;pgf.Expr object at 0x7f7df6db78c8&gt;, &lt;pgf.Expr object at 0x7f7df6db7878&gt;])
</pre>
<pre class="haskell">
Prelude PGF2> unApp e
Just ("AdjCN", [..., ...])
</pre>
</p>
<p>
<span class="python">
The result from <tt>unpack</tt> can be different depending on the form of the
tree. If the tree is a function application then you always get
a tuple of function name and a list of arguments. If instead the
a tuple of a function name and a list of arguments. If instead the
tree is just a literal string then the return value is the actual
literal. For example the result from:
</span>
<pre class="python">
>>> pgf.readExpr('"literal"').unpack()
'literal'
</pre>
is just the string 'literal'. Situations like this can be detected
<span class="haskell">
The result from <tt>unApp</tt> is <tt>Just</tt> if the expression
is an application and <tt>Nothing</tt> in all other cases.
Similarly, if the tree is a literal string then the return value
from <tt>unStr</tt> will be <tt>Just</tt> with the actual literal.
For example the result from:
</span>
<pre class="haskell">
Prelude PGF2> readExpr "\"literal\"" >>= unStr
Just "literal"
</pre>
is just the string "literal".
<span class="python">Situations like this can be detected
in Python by checking the type of the result from <tt>unpack</tt>.
It is also possible to get an integer or a floating point number
for the other possible literal types in GF.</span>
<span class="haskell">
There are also the functions <tt>unAbs</tt>, <tt>unInt</tt>, <tt>unFloat</tt> and <tt>unMeta</tt> for all other possible cases.
</span>
</p>
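<span class="python">
Because <tt>unpack</tt> returns values of different types, a caller usually branches on the type of the result. A minimal sketch, assuming the return shapes described above (a tuple for an application, a plain <tt>str</tt>, <tt>int</tt> or <tt>float</tt> for a literal). The function <tt>describe</tt> is hypothetical and operates on such results directly, so it runs without <tt>pgf</tt>:

```python
def describe(result):
    """Classify a value as returned by pgf.Expr.unpack()."""
    if isinstance(result, tuple):
        fun, args = result
        return "application of %s to %d argument(s)" % (fun, len(args))
    if isinstance(result, str):
        return "string literal %r" % result
    if isinstance(result, int):
        return "integer literal %d" % result
    if isinstance(result, float):
        return "float literal %g" % result
    # Any other kind of tree is not covered by this sketch.
    return "other: %r" % (result,)

print(describe(('AdjCN', ['ap', 'cn'])))  # application of AdjCN to 2 argument(s)
print(describe('literal'))                # string literal 'literal'
print(describe(42))                       # integer literal 42
```
</span>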
<span class="python">
<p>
For more complex analyses you can use the visitor pattern.
In object-oriented languages this is just a clumsy way to do
@@ -406,10 +463,12 @@ the current tree is <tt>DetCN</tt> or <tt>AdjCN</tt>
correspondingly. In this example we just print a message and
we call <tt>visit</tt> recursively to go deeper into the tree.
</p>
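The name-based dispatch that such a visitor relies on can be illustrated without the <tt>pgf</tt> module. In the sketch below, <tt>Expr</tt> is a hypothetical stand-in that offers <tt>unpack()</tt> and <tt>visit()</tt>; its <tt>visit</tt> looks up a method named <tt>on_</tt> plus the function name and falls back to a <tt>default</tt> method. The fallback is an assumption of this sketch, not a description of the real binding.

```python
# Hypothetical stand-in for pgf.Expr, just enough to show name-based dispatch.
class Expr:
    def __init__(self, fun, args=()):
        self.fun = fun
        self.args = list(args)

    def unpack(self):
        # Applications unpack to (function name, list of arguments).
        return (self.fun, self.args)

    def visit(self, visitor):
        fun, args = self.unpack()
        # Call visitor.on_<fun>(*args) if it exists, else visitor.default.
        method = getattr(visitor, "on_" + fun, None)
        if method is None:
            return visitor.default(fun, args)
        return method(*args)


class CountAdjCN:
    """Counts AdjCN nodes anywhere in a tree."""
    def __init__(self):
        self.count = 0

    def on_AdjCN(self, ap, cn):
        self.count += 1
        ap.visit(self)
        cn.visit(self)

    def default(self, fun, args):
        # Any other function: just descend into the arguments.
        for arg in args:
            arg.visit(self)


e2 = Expr("DetCN", [Expr("DetQuant", [Expr("IndefArt"), Expr("NumSg")]),
                    Expr("AdjCN", [Expr("PositA", [Expr("red_A")]),
                                   Expr("UseN", [Expr("theatre_N")])])])
v = CountAdjCN()
e2.visit(v)
print(v.count)  # 1
```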
</span>
Constructing new trees is also easy. You can either use
<tt>readExpr</tt> to read trees from strings, or you can
construct new trees from existing pieces. This is possible by
<span class="python">
using the constructor for <tt>pgf.Expr</tt>:
<pre class="python">
>>> quant = pgf.readExpr("DetQuant IndefArt NumSg")
@@ -417,7 +476,18 @@ using the constructor for <tt>pgf.Expr</tt>:
>>> print(e2)
DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA red_A) (UseN theatre_N))
</pre>
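The printed form above follows a simple rule: arguments that are themselves applications are wrapped in parentheses, while bare constants are not. A sketch of that rule over nested <tt>(fun, args)</tt> tuples, modelled on what <tt>unpack</tt> returns for applications. The functions <tt>show_expr</tt> and <tt>app</tt> are hypothetical and ignore literals and other kinds of trees:

```python
def show_expr(tree, nested=False):
    """Render a (fun, args) tree in GF's usual notation.
    Only applications are handled; literals are out of scope here."""
    fun, args = tree
    if not args:
        return fun  # a constant needs no parentheses
    text = " ".join([fun] + [show_expr(arg, nested=True) for arg in args])
    # Parenthesize applications that appear as arguments.
    return "(" + text + ")" if nested else text

def app(fun, *args):
    """Hypothetical helper that builds a (fun, args) tuple."""
    return (fun, list(args))

e2 = app("DetCN", app("DetQuant", app("IndefArt"), app("NumSg")),
                  app("AdjCN", app("PositA", app("red_A")),
                               app("UseN", app("theatre_N"))))
print(show_expr(e2))
# DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA red_A) (UseN theatre_N))
```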
</span>
<span class="haskell">
using the functions <tt>mkApp</tt>, <tt>mkStr</tt>, <tt>mkInt</tt>, <tt>mkFloat</tt> and <tt>mkMeta</tt>:
<pre class="haskell">
Prelude PGF2> let Just quant = readExpr "DetQuant IndefArt NumSg"
Prelude PGF2> let e2 = mkApp "DetCN" [quant, e]
Prelude PGF2> print e2
DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA red_A) (UseN theatre_N))
</pre>
</span>
<span class="python">
<h2>Embedded GF Grammars</h2>
The GF compiler allows for easy integration of grammars in Haskell
@@ -439,6 +509,7 @@ functions:
>>> print(App.DetCN(quant,e))
DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA red_A) (UseN house_N))
</pre>
</span>
<h2>Access the Morphological Lexicon</h2>
@@ -447,18 +518,27 @@ lexicon. The first makes it possible to dump the full form lexicon.
The following code just iterates over the lexicon and prints each
word form with its possible analyses:
<pre class="python">
for entry in eng.fullFormLexicon():
print(entry)
>>> for entry in eng.fullFormLexicon():
...     print(entry)
</pre>
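<span class="python">
If you need the opposite direction, from lemmas to word forms, one option is to invert the full-form lexicon once. A minimal sketch, assuming each entry is a pair of a word form and a list of <tt>(lemma, analysis, prob)</tt> triples, as in the Haskell version. The sample entries are made up for illustration and <tt>forms_by_lemma</tt> is a hypothetical helper:

```python
from collections import defaultdict

def forms_by_lemma(entries):
    """Invert (form, [(lemma, analysis, prob), ...]) entries into
    a dict mapping lemma -> list of (form, analysis)."""
    index = defaultdict(list)
    for form, analyses in entries:
        for lemma, analysis, _prob in analyses:
            index[lemma].append((form, analysis))
    return dict(index)

inf = float("inf")
# Made-up sample entries in the assumed shape (not real grammar output).
entries = [
    ("letter",  [("letter_1_N", "s Sg Nom", inf), ("letter_2_N", "s Sg Nom", inf)]),
    ("letters", [("letter_1_N", "s Pl Nom", inf), ("letter_2_N", "s Pl Nom", inf)]),
]
index = forms_by_lemma(entries)
print(index["letter_1_N"])  # [('letter', 's Sg Nom'), ('letters', 's Pl Nom')]
```

With a loaded grammar you would pass <tt>eng.fullFormLexicon()</tt> instead of the sample list, under the same assumption about the entry shape.
</span>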
<pre class="haskell">
Prelude PGF2> mapM_ print [(form,lemma,analysis,prob) | (form,analyses) &lt;- fullFormLexicon eng, (lemma,analysis,prob) &lt;- analyses]
</pre>
<pre class="java">
for (entry in eng.fullFormLexicon()) {
System.out.println(entry);
for (FullFormEntry entry : eng.fullFormLexicon()) {
for (MorphoAnalysis analysis : entry.getAnalyses()) {
System.out.println(entry.getForm()+" "+analysis.getProb()+" "+analysis.getLemma()+" "+analysis.getField());
}
}
</pre>
The second one implements a simple lookup. The argument is a word
form and the result is a list of analyses:
<pre class="python">
print(eng.lookupMorpho("letter"))
>>> print(eng.lookupMorpho("letter"))
[('letter_1_N', 's Sg Nom', inf), ('letter_2_N', 's Sg Nom', inf)]
</pre>
<pre class="haskell">
Prelude PGF2> print (lookupMorpho eng "letter")
[("letter_1_N","s Sg Nom",Infinity),("letter_2_N","s Sg Nom",Infinity)]
</pre>
<pre class="java">
@@ -588,6 +668,7 @@ Expr e = gr.checkExpr(e,Type.readType("A"))
pgf.TypeError: The expected type of the expression AdjCN (PositA red_A) (UseN theatre_N) is A but CN is infered
</pre></p>
<span class="python">
<h2>Partial Grammar Loading</h2>
<p>By default the whole grammar is compiled into a single file
@@ -600,12 +681,6 @@ This is done by using the option <tt>-split-pgf</tt> in the compiler:
<pre class="python">
$ gf -make -split-pgf App12.pgf
</pre>
<pre class="haskell">
$ gf -make -split-pgf App12.pgf
</pre>
<pre class="java">
$ gf -make -split-pgf App12.pgf
</pre>
</p>
Now you can load the grammar as usual but this time only the
@@ -616,10 +691,6 @@ concrete syntax objects:
>>> gr = pgf.readPGF("App.pgf")
>>> eng = gr.languages["AppEng"]
</pre>
<pre class="java">
PGF gr = PGF.readPGF("App.pgf");
Concr eng = gr.getLanguages().get("AppEng");
</pre>
However, if you now try to use the concrete syntax then you will
get an exception:
<pre class="python">
@@ -628,12 +699,6 @@ Traceback (most recent call last):
File "<stdin>", line 1, in <module>
pgf.PGFError: The concrete syntax is not loaded
</pre>
<pre class="java">
eng.lookupMorpho("letter");
// throws an exception because the concrete syntax is not loaded
</pre>
Before using the concrete syntax, you need to explicitly load it:
<pre class="python">
@@ -641,6 +706,47 @@ Before using the concrete syntax, you need to explicitly load it:
>>> print(eng.lookupMorpho("letter"))
[('letter_1_N', 's Sg Nom', inf), ('letter_2_N', 's Sg Nom', inf)]
</pre>
When you don't need the language anymore then you can simply
unload it:
<pre class="python">
>>> eng.unload()
</pre>
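Since every <tt>load</tt> should eventually be matched by an <tt>unload</tt>, a context manager is one way to make that pairing automatic. This is a sketch, assuming that the concrete syntax object has <tt>load(path)</tt> and <tt>unload()</tt> methods as in the examples here; <tt>loaded</tt> and the stub class <tt>FakeConcr</tt> are hypothetical and only exist so that the code runs without a split grammar.

```python
from contextlib import contextmanager

@contextmanager
def loaded(concr, path):
    """Load a concrete syntax for the duration of a with-block.
    Assumes concr has load(path) and unload() methods."""
    concr.load(path)
    try:
        yield concr
    finally:
        # Runs even if the body raises, so memory is always released.
        concr.unload()

class FakeConcr:
    """Hypothetical stub that only records whether it is loaded."""
    def __init__(self):
        self.is_loaded = False

    def load(self, path):
        self.is_loaded = True

    def unload(self):
        self.is_loaded = False

eng = FakeConcr()
with loaded(eng, "AppEng.pgf_c") as concr:
    print(concr.is_loaded)  # True
print(eng.is_loaded)        # False
```

With a real split grammar, <tt>eng</tt> would be <tt>gr.languages["AppEng"]</tt> instead of the stub.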
</span>
<span class="java">
<h2>Partial Grammar Loading</h2>
<p>By default the whole grammar is compiled into a single file
which consists of an abstract syntax together with all concrete
languages. For large grammars with many languages this might be
inconvenient because loading becomes slower and the grammar takes
more memory. For that purpose you could split the grammar into
one file for the abstract syntax and one file for every concrete syntax.
This is done by using the option <tt>-split-pgf</tt> in the compiler:
<pre class="java">
$ gf -make -split-pgf App12.pgf
</pre>
</p>
Now you can load the grammar as usual but this time only the
abstract syntax will be loaded. You can still use the method <tt>getLanguages()</tt>
to get the map of languages and the corresponding
concrete syntax objects:
<pre class="java">
PGF gr = PGF.readPGF("App.pgf");
Concr eng = gr.getLanguages().get("AppEng");
</pre>
However, if you now try to use the concrete syntax then you will
get an exception:
<pre class="java">
eng.lookupMorpho("letter");
// throws an exception because the concrete syntax is not loaded
</pre>
Before using the concrete syntax, you need to explicitly load it:
<pre class="java">
eng.load("AppEng.pgf_c");
for (MorphoAnalysis an : eng.lookupMorpho("letter")) {
@@ -652,12 +758,10 @@ letter_2_N, s Sg Nom, inf
When you don't need the language anymore then you can simply
unload it:
<pre class="python">
>>> eng.unload()
</pre>
<pre class="java">
eng.unload();
</pre>
</span>
<h2>GraphViz</h2>


@@ -1990,7 +1990,7 @@ static PyMemberDef Bracket_members[] = {
{"fun", T_OBJECT_EX, offsetof(BracketObject, fun), 0,
"the abstract function for this bracket"},
{"fid", T_INT, offsetof(BracketObject, fid), 0,
"a unique id which identifies this bracket in the whole bracketed string"},
"an id which identifies this bracket in the bracketed string. If there are discontinuous phrases this id will be shared for all brackets belonging to the same phrase."},
{"lindex", T_INT, offsetof(BracketObject, lindex), 0,
"the constituent index"},
{"children", T_OBJECT_EX, offsetof(BracketObject, children), 0,