<html>
<head>
<style>
body { background: #eee; padding-top: 200px; }
pre.python {background-color:#ffc; display: none}
pre.haskell {background-color:#ffc; display: block}
pre.java {background-color:#ffc; display: none}
pre.csharp {background-color:#ffc; display: none}
span.python {display: none}
span.haskell {display: inline}
span.java {display: none}
span.csharp {display: none}
.header {
position: fixed;
top: 0;
left: 0;
background: #ddd;
width: 100%;
padding: 5pt;
border-bottom: solid #bbb 2pt;
}
</style>
<script lang="javascript">
function change_language(href) {
var name = href.split("#")[1];
if (name == null)
name = "haskell";
for (var s = 0; s < document.styleSheets.length; s++) {
var sheet = document.styleSheets[s];
if (sheet.href == null) {
var rules = sheet.cssRules ? sheet.cssRules : sheet.rules;
if (rules == null) return;
for (var i = 0; i < rules.length; i++) {
if (rules[i].selectorText.endsWith(name)) {
if (rules[i].selectorText.startsWith("pre"))
rules[i].style["display"] = "block";
else
rules[i].style["display"] = "inline";
} else if (rules[i].selectorText.startsWith("pre") || rules[i].selectorText.startsWith("span")) {
rules[i].style["display"] = "none";
}
}
}
}
}
</script>
</head>
<body onload="change_language(window.location.href); window.addEventListener('hashchange', function(e){change_language(window.location.href);});">
<span class="header">
<h1>Using the <span class="python">Python</span> <span class="haskell">Haskell</span> <span class="java">Java</span> <span class="csharp">C#</span> binding to the C runtime</h1>
Choose a language: <a href="#haskell">Haskell</a> <a href="#python">Python</a> <a href="#java">Java</a> <a href="#csharp">C#</a>
</span>
<h4>Krasimir Angelov, July 2015 - August 2017</h4>
<h2>Loading the Grammar</h2>
Before you use the <span class="python">Python</span><span class="haskell">Haskell</span><span class="java">Java</span><span class="csharp">C#</span> binding you need to import the <span class="haskell">PGF2 module</span><span class="python">pgf module</span><span class="java">pgf package</span><span class="csharp">PGFSharp package</span>:
<pre class="python">
>>> import pgf
</pre>
<pre class="haskell">
Prelude> import PGF2
</pre>
<pre class="java">
import org.grammaticalframework.pgf.*;
</pre>
<pre class="csharp">
using PGFSharp;
</pre>
<span class="python">Once you have the module imported, you can use the <tt>dir</tt> and
<tt>help</tt> functions to see what kind of functionality is available.
<tt>dir</tt> takes an object and returns a list of methods available
in the object:
<pre class="python">
>>> dir(pgf)
</pre>
<tt>help</tt> goes a step further: it produces more human-readable
documentation, which moreover includes the comments:
<pre class="python">
>>> help(pgf)
</pre>
</span>
A grammar is loaded by calling <span class="python">the method pgf.readPGF</span><span class="haskell">the function readPGF</span><span class="java">the method PGF.readPGF</span><span class="csharp">the method PGF.ReadPGF</span>:
<pre class="python">
>>> gr = pgf.readPGF("App12.pgf")
</pre>
<pre class="haskell">
Prelude PGF2> gr &lt;- readPGF "App12.pgf"
</pre>
<pre class="java">
PGF gr = PGF.readPGF("App12.pgf");
</pre>
<pre class="csharp">
PGF gr = PGF.ReadPGF("App12.pgf");
</pre>
From the grammar you can query the set of available languages.
It is accessible through the property <tt>languages</tt>, which
is a map from language name to an object of <span class="python">class <tt>pgf.Concr</tt></span><span class="haskell">type <tt>Concr</tt></span><span class="java">class <tt>Concr</tt></span><span class="csharp">class <tt>Concr</tt></span>
which represents the language.
For example, the following extracts the English language:
<pre class="python">
>>> eng = gr.languages["AppEng"]
>>> print(eng)
&lt;pgf.Concr object at 0x7f7dfa4471d0&gt;
</pre>
<pre class="haskell">
Prelude PGF2> let Just eng = Data.Map.lookup "AppEng" (languages gr)
Prelude PGF2> :t eng
eng :: Concr
</pre>
<pre class="java">
Concr eng = gr.getLanguages().get("AppEng");
</pre>
<pre class="csharp">
Concr eng = gr.Languages["AppEng"];
</pre>
<h2>Parsing</h2>
All language-specific services are available as
<span class="python">methods of the class <tt>pgf.Concr</tt></span><span class="haskell">functions that take as an argument an object of type <tt>Concr</tt></span><span class="java">methods of the class <tt>Concr</tt></span><span class="csharp">methods of the class <tt>Concr</tt></span>.
For example to invoke the parser, you can call:
<pre class="python">
>>> i = eng.parse("this is a small theatre")
</pre>
<pre class="haskell">
Prelude PGF2> let res = parse eng (startCat gr) "this is a small theatre"
</pre>
<pre class="java">
Iterable&lt;ExprProb&gt; iterable = eng.parse(gr.getStartCat(), "this is a small theatre");
</pre>
<pre class="csharp">
IEnumerable&lt;Tuple&lt;Expr, float&gt;&gt; enumerable = eng.Parse("this is a small theatre");
</pre>
<span class="python">
This gives you an iterator which can enumerate all possible
abstract trees. You can get the next tree by calling <tt>next</tt>:
<pre class="python">
>>> p,e = i.next()
</pre>
or, if you are using Python 3, by calling <tt>__next__</tt>:
<pre class="python">
>>> p,e = i.__next__()
</pre>
</span>
<span class="haskell">
This gives you a result of type <tt>Either String [(Expr, Float)]</tt>.
If the result is <tt>Left</tt> then the parser has failed and you will
get the token where the parser got stuck. If the parsing was successful
then you get a potentially infinite list of parse results:
<pre class="haskell">
Prelude PGF2> let Right ((e,p):rest) = res
</pre>
</span>
<span class="java">
This gives you an iterable which can enumerate all possible
abstract trees. You can get the next tree by calling <tt>next</tt>:
<pre class="java">
Iterator&lt;ExprProb&gt; iter = iterable.iterator();
ExprProb ep = iter.next();
</pre>
</span>
<span class="csharp">
This gives you an enumerable which can enumerate all possible
abstract trees. You can get the next tree by calling <tt>MoveNext</tt>:
<pre class="csharp">
IEnumerator&lt;Tuple&lt;Expr, float&gt;&gt; enumerator = enumerable.GetEnumerator();
enumerator.MoveNext();
Tuple&lt;Expr, float&gt; ep = enumerator.Current;
</pre>
</span>
<p>Each result pairs a tree with its probability. The probabilities
are negated logarithmic probabilities, which means that the lowest
number encodes the most probable result. The possible trees are
returned in decreasing probability order (i.e. increasing negated logarithm).
The first tree should therefore have the smallest <tt>p</tt>:
</p>
<pre class="python">
>>> print(p)
35.9166526794
</pre>
<pre class="haskell">
Prelude PGF2> print p
35.9166526794
</pre>
<pre class="java">
System.out.println(ep.getProb());
35.9166526794
</pre>
<pre class="csharp">
Console.WriteLine(ep.Item2);
35.9166526794
</pre>
and this is the corresponding abstract tree:
<pre class="python">
>>> print(e)
PhrUtt NoPConj (UttS (UseCl (TTAnt TPres ASimul) PPos (PredVP (DetNP (DetQuant this_Quant NumSg)) (UseComp (CompNP (DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA small_A) (UseN theatre_N)))))))) NoVoc
</pre>
<pre class="haskell">
Prelude PGF2> print e
PhrUtt NoPConj (UttS (UseCl (TTAnt TPres ASimul) PPos (PredVP (DetNP (DetQuant this_Quant NumSg)) (UseComp (CompNP (DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA small_A) (UseN theatre_N)))))))) NoVoc
</pre>
<pre class="java">
System.out.println(ep.getExpr());
PhrUtt NoPConj (UttS (UseCl (TTAnt TPres ASimul) PPos (PredVP (DetNP (DetQuant this_Quant NumSg)) (UseComp (CompNP (DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA small_A) (UseN theatre_N)))))))) NoVoc
</pre>
<pre class="csharp">
Console.WriteLine(ep.Item1);
PhrUtt NoPConj (UttS (UseCl (TTAnt TPres ASimul) PPos (PredVP (DetNP (DetQuant this_Quant NumSg)) (UseComp (CompNP (DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA small_A) (UseN theatre_N)))))))) NoVoc
</pre>
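<span class="python">
<p>To relate a negated logarithmic probability to an ordinary probability,
the value can be exponentiated. The following is only a sketch: it assumes
that the runtime uses the natural logarithm, which the text above does not
state, and <tt>to_probability</tt> is a hypothetical helper rather than
part of the binding:</p>
<pre class="python">
import math

def to_probability(neg_log_p):
    # Assumption: natural logarithm, so p = exp(-neg_log_p).
    return math.exp(-neg_log_p)

# Negated log-probability printed for the first tree above.
prob = to_probability(35.9166526794)
print(prob)
</pre>
A smaller negated logarithm always maps to a larger probability, which is
why the first tree returned is the most probable one.
</span>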
<p>Note that, depending on the grammar, it is entirely possible to get
infinitely many trees for a single sentence.
In other cases the number of trees might be finite but still enormous.
The parser is specifically designed to be lazy, which means that
each tree is returned as soon as it is found, before the full
search space is exhausted. For grammars with a pathological number of
trees it is advisable to pick only the top <tt>N</tt> trees
and to ignore the rest.</p>
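<span class="python">
<p>Because the results are produced lazily, taking only the first
<tt>N</tt> of them never forces the rest to be computed. The sketch below
shows the idea with <tt>itertools.islice</tt>; <tt>fake_parse</tt> is a
hypothetical stand-in for the iterator returned by the parser, so that the
example runs without a grammar:</p>
<pre class="python">
import itertools

def fake_parse():
    # Hypothetical stand-in for eng.parse(...): an endless, lazily produced
    # stream of (negated log-probability, tree) pairs in increasing order.
    for k in itertools.count():
        yield (float(k), "Tree%d" % k)

# Keep only the N best results; the rest of the stream is never computed.
N = 3
top = list(itertools.islice(fake_parse(), N))
for p, e in top:
    print(p, e)
</pre>
With a real grammar the same <tt>islice</tt> call would presumably be
applied to the iterator obtained from <tt>eng.parse</tt>.
</span>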
<span class="python">
The <tt>parse</tt> method also has the following optional parameters:
<table border=1>
<tr><td>cat</td><td>start category</td></tr>
<tr><td>n</td><td>maximum number of trees</td></tr>
<tr><td>heuristics</td><td>a real number from 0 to 1</td></tr>
<tr><td>callbacks</td><td>a list of pairs of category and callback function</td></tr>
</table>
<p>By using these parameters it is possible for instance to change the start category for
the parser or to limit the number of trees returned from the parser. For example
parsing with a different start category can be done as follows:</p>
<pre class="python">
>>> i = eng.parse("a small theatre", cat=pgf.readType("NP"))
</pre>
</span>
<span class="haskell">
There is also the function <tt>parseWithHeuristics</tt>, which
takes two more parameters that give you better control
over the parser's behaviour:
<pre class="haskell">
Prelude PGF2> let res = parseWithHeuristics eng (startCat gr) "this is a small theatre" heuristic_factor callbacks
</pre>
</span>
<span class="java">
There is also the method <tt>parseWithHeuristics</tt>, which
takes two more parameters that give you better control
over the parser's behaviour:
<pre class="java">
Iterable&lt;ExprProb&gt; iterable = eng.parseWithHeuristics(gr.getStartCat(), "this is a small theatre", heuristic_factor, callbacks);
</pre>
</span>
<span class="csharp">
The <tt>Parse</tt> method also has the following optional parameters:
<table border=1>
<tr><td>cat</td><td>start category</td></tr>
<tr><td>heuristics</td><td>a real number from 0 to 1</td></tr>
</table>
<p>By using these parameters it is possible for instance to change the start category for
the parser. For example parsing with a different start category can be done as follows:</p>
<pre class="csharp">
IEnumerable&lt;Tuple&lt;Expr, float&gt;&gt; enumerable = eng.Parse("a small theatre", cat: Type.ReadType("NP"));
</pre>
</span>
<p>The heuristics factor can be used to trade ranking quality for parsing speed.
By default the list of trees is sorted by probability, which corresponds
to factor 0.0. As the factor increases, parsing becomes faster
but the sorting becomes less precise; it is least precise
at factor 1.0. In any case the parser always returns the same set of
trees, only in a different order. Our experience is that even a factor
of about 0.6-0.8 with the translation grammar still puts
the most probable tree on top of the list, but further down the list
the trees become shuffled.
</p>
<p>
The callbacks are a list of functions that can be used for recognizing
literals. For example, we use them for recognizing names and unknown
words in the translator.
</p>
<h2>Linearization</h2>
You can either linearize the result from the parser back to another
language, or you can explicitly construct a tree and then
linearize it in any language. For example, we can create
a new expression like this:
<pre class="python">
>>> e = pgf.readExpr("AdjCN (PositA red_A) (UseN theatre_N)")
</pre>
<pre class="haskell">
Prelude PGF2> let Just e = readExpr "AdjCN (PositA red_A) (UseN theatre_N)"
</pre>
<pre class="java">
Expr e = Expr.readExpr("AdjCN (PositA red_A) (UseN theatre_N)");
</pre>
<pre class="csharp">
Expr e = Expr.ReadExpr("AdjCN (PositA red_A) (UseN theatre_N)");
</pre>
and then we can linearize it:
<pre class="python">
>>> print(eng.linearize(e))
red theatre
</pre>
<pre class="haskell">
Prelude PGF2> putStrLn (linearize eng e)
red theatre
</pre>
<pre class="java">
System.out.println(eng.linearize(e));
red theatre
</pre>
<pre class="csharp">
Console.WriteLine(eng.Linearize(e));
red theatre
</pre>
This method produces only a single linearization. If you use variants
in the grammar then you might want to see all possible linearizations.
For that purpose you should use <tt>linearizeAll</tt>:
<pre class="python">
>>> for s in eng.linearizeAll(e):
...     print(s)
...
red theatre
red theater
</pre>
<pre class="haskell">
Prelude PGF2> mapM_ putStrLn (linearizeAll eng e)
red theatre
red theater
</pre>
<pre class="java">
for (String s : eng.linearizeAll(e)) {
    System.out.println(s);
}
red theatre
red theater
</pre>
<pre class="csharp">
foreach (String s in eng.LinearizeAll(e)) {
    Console.WriteLine(s);
}
red theatre
red theater
</pre>
If, instead, you need an inflection table with all possible forms
then the right method to use is <tt>tabularLinearize</tt>:
<pre class="python">
>>> eng.tabularLinearize(e)
{'s Sg Nom': 'red theatre', 's Pl Nom': 'red theatres', 's Pl Gen': "red theatres'", 's Sg Gen': "red theatre's"}
</pre>
<pre class="haskell">
Prelude PGF2> tabularLinearize eng e
[("s Sg Nom","red theatre"),("s Sg Gen","red theatre's"),("s Pl Nom","red theatres"),("s Pl Gen","red theatres'")]
</pre>
<pre class="java">
for (Map.Entry&lt;String,String&gt; entry : eng.tabularLinearize(e).entrySet()) {
    System.out.println(entry.getKey() + ": " + entry.getValue());
}
s Sg Nom: red theatre
s Pl Nom: red theatres
s Pl Gen: red theatres'
s Sg Gen: red theatre's
</pre>
<pre class="csharp">
foreach (Map.Entry&lt;String,String&gt; entry in eng.TabularLinearize(e).EntrySet()) { //// TODO
Console.WriteLine(entry.Key + ": " + entry.Value);
}
s Sg Nom: red theatre
s Pl Nom: red theatres
s Pl Gen: red theatres'
s Sg Gen: red theatre's
</pre>
<p>
Finally, you could also get a linearization which is bracketed into
a list of phrases:
<pre class="python">
>>> [b] = eng.bracketedLinearize(e)
>>> print(b)
(CN:4 (AP:1 (A:0 red)) (CN:3 (N:2 theatre)))
</pre>
<pre class="haskell">
Prelude PGF2> let [b] = bracketedLinearize eng e
Prelude PGF2> putStrLn (showBracketedString b)
(CN:4 (AP:1 (A:0 red)) (CN:3 (N:2 theatre)))
</pre>
<pre class="java">
Object[] bs = eng.bracketedLinearize(e);
</pre>
<pre class="csharp">
Bracket b = eng.BracketedLinearize(e);
</pre>
<span class="python">
Each element in the sequence above is either a string or an object
of type <tt>pgf.Bracket</tt>. When it is actually a bracket then
the object has the following properties:
<ul>
<li><tt>cat</tt> - the syntactic category for this bracket</li>
<li><tt>fid</tt> - an id which identifies this bracket in the bracketed string. If there are discontinuous phrases this id will be shared for all brackets belonging to the same phrase.</li>
<li><tt>lindex</tt> - the constituent index</li>
<li><tt>fun</tt> - the abstract function for this bracket</li>
<li><tt>children</tt> - a list with the children of this bracket</li>
</ul>
</span>
<span class="haskell">
The list above contains elements of type <tt>BracketedString</tt>.
This type has two constructors:
<ul>
<li><tt>Leaf</tt> with only one argument of type <tt>String</tt> that contains the current word</li>
<li><tt>Bracket</tt> with the following arguments:
<ul>
<li><tt>cat :: String</tt> - the syntactic category for this bracket</li>
<li><tt>fid :: Int</tt> - an id which identifies this bracket in the bracketed string. If there are discontinuous phrases this id will be shared for all brackets belonging to the same phrase.</li>
<li><tt>lindex :: Int</tt> - the constituent index</li>
<li><tt>fun :: String</tt> - the abstract function for this bracket</li>
<li><tt>children :: [BracketedString]</tt> - a list with the children of this bracket</li>
</ul>
</li>
</ul>
</span>
<span class="java">
Each element in the sequence above is either a string or an object
of type <tt>Bracket</tt>. When it is actually a bracket then
the object has the following public final variables:
<ul>
<li><tt>String cat</tt> - the syntactic category for this bracket</li>
<li><tt>int fid</tt> - an id which identifies this bracket in the bracketed string. If there are discontinuous phrases this id will be shared for all brackets belonging to the same phrase.</li>
<li><tt>int lindex</tt> - the constituent index</li>
<li><tt>String fun</tt> - the abstract function for this bracket</li>
<li><tt>Object[] children</tt> - a list with the children of this bracket</li>
</ul>
</span>
<span class="csharp">
Each element in the sequence above is either a string or an object
of type <tt>Bracket</tt>. When it is actually a bracket then
the object has the following public final variables:
<ul>
<li><tt>String cat</tt> - the syntactic category for this bracket</li>
<li><tt>int fid</tt> - an id which identifies this bracket in the bracketed string. If there are discontinuous phrases this id will be shared for all brackets belonging to the same phrase.</li>
<li><tt>int lindex</tt> - the constituent index</li>
<li><tt>String fun</tt> - the abstract function for this bracket</li>
<li><tt>Object[] children</tt> - a list with the children of this bracket</li>
</ul>
</span>
</p>
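<span class="python">
<p>A bracketed string is naturally processed with a recursive function
that treats strings as leaves and brackets as inner nodes. The sketch below
collects the words of the example above. <tt>FakeBracket</tt> is a
hypothetical stand-in that only mimics the documented properties of
<tt>pgf.Bracket</tt>, and the <tt>lindex</tt> values in it are made up
for illustration:</p>
<pre class="python">
class FakeBracket:
    # Hypothetical stand-in that only mimics the documented properties
    # of pgf.Bracket; a real bracket comes from eng.bracketedLinearize(e).
    def __init__(self, cat, fid, lindex, fun, children):
        self.cat = cat
        self.fid = fid
        self.lindex = lindex
        self.fun = fun
        self.children = children

def words(node):
    # A leaf is a plain string; a bracket contributes the words of its children.
    if isinstance(node, str):
        return [node]
    result = []
    for child in node.children:
        result.extend(words(child))
    return result

# Mirrors (CN:4 (AP:1 (A:0 red)) (CN:3 (N:2 theatre))); lindex 0 is assumed.
b = FakeBracket("CN", 4, 0, "AdjCN", [
        FakeBracket("AP", 1, 0, "PositA", [FakeBracket("A", 0, 0, "red_A", ["red"])]),
        FakeBracket("CN", 3, 0, "UseN", [FakeBracket("N", 2, 0, "theatre_N", ["theatre"])])])
print(" ".join(words(b)))
</pre>
</span>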
The linearization works even if there are functions in the tree
that don't have linearization definitions. In that case you
will just see the name of the function in the generated string.
It is sometimes helpful to check whether a function
is linearizable or not. This can be done as follows:
<pre class="python">
>>> print(eng.hasLinearization("apple_N"))
True
</pre>
<pre class="haskell">
Prelude PGF2> print (hasLinearization eng "apple_N")
True
</pre>
<pre class="java">
System.out.println(eng.hasLinearization("apple_N"));
true
</pre>
<pre class="csharp">
Console.WriteLine(eng.HasLinearization("apple_N")); //// TODO
true
</pre>
<h2>Analysing and Constructing Expressions</h2>
<p>
An already constructed tree can be analyzed and transformed
in the host application. For example you can deconstruct
a tree into a function name and a list of arguments:
<pre class="python">
>>> e.unpack()
('AdjCN', [&lt;pgf.Expr object at 0x7f7df6db78c8&gt;, &lt;pgf.Expr object at 0x7f7df6db7878&gt;])
</pre>
<pre class="haskell">
Prelude PGF2> unApp e
Just ("AdjCN", [..., ...])
</pre>
<pre class="java">
ExprApplication app = e.unApp();
System.out.println(app.getFunction());
for (Expr arg : app.getArguments()) {
    System.out.println(arg);
}
</pre>
<pre class="csharp">
ExprApplication app = e.UnApp();
Console.WriteLine(app.Function);
foreach (Expr arg in app.Arguments) {
    Console.WriteLine(arg);
}
</pre>
</p>
<p>
<span class="python">
The result from unpack can be different depending on the form of the
tree. If the tree is a function application then you always get
a tuple of a function name and a list of arguments. If instead the
tree is just a literal string then the return value is the actual
literal. For example the result from:
</span>
<pre class="python">
>>> pgf.readExpr('"literal"').unpack()
'literal'
</pre>
<span class="haskell">
The result from <tt>unApp</tt> is <tt>Just</tt> if the expression
is an application and <tt>Nothing</tt> in all other cases.
Similarly, if the tree is a literal string then the return value
from <tt>unStr</tt> will be <tt>Just</tt> with the actual literal.
For example the result from:
</span>
<pre class="haskell">
Prelude PGF2> readExpr "\"literal\"" >>= unStr
Just "literal"
</pre>
<span class="java">
The result from <tt>unApp</tt> is not <tt>null</tt> if the expression
is an application, and <tt>null</tt> in all other cases.
Similarly, if the tree is a literal string then <tt>unStr</tt>
returns the actual literal instead of <tt>null</tt>.
For example the output from:
</span>
<pre class="java">
Expr elit = Expr.readExpr("\"literal\"");
System.out.println(elit.unStr());
</pre>
<span class="csharp">
The result from <tt>UnApp</tt> is not <tt>null</tt> if the expression
is an application, and <tt>null</tt> in all other cases.
Similarly, if the tree is a literal string then <tt>UnStr</tt>
returns the actual literal instead of <tt>null</tt>.
For example the output from:
</span>
<pre class="csharp">
Expr elit = Expr.ReadExpr("\"literal\"");
Console.WriteLine(elit.UnStr());
</pre>
is just the string "literal".
<span class="python">Situations like this can be detected
in Python by checking the type of the result from <tt>unpack</tt>.
It is also possible to get an integer or a floating point number
for the other possible literal types in GF.</span>
<span class="haskell">
There are also the functions <tt>unAbs</tt>, <tt>unInt</tt>, <tt>unFloat</tt> and <tt>unMeta</tt> for all other possible cases.
</span>
<span class="java">
There are also the methods <tt>unAbs</tt>, <tt>unInt</tt>, <tt>unFloat</tt> and <tt>unMeta</tt> for all other possible cases.
</span>
<span class="csharp">
There are also the methods <tt>UnAbs</tt>, <tt>UnInt</tt>, <tt>UnFloat</tt> and <tt>UnMeta</tt> for all other possible cases.
</span>
</p>
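<span class="python">
<p>Checking the type of the value returned by <tt>unpack</tt> can be
wrapped in a small helper. The following sketch is an illustration only:
<tt>describe</tt> is a hypothetical function, and it is fed plain Python
values in place of real <tt>unpack</tt> results so that it runs without
a grammar:</p>
<pre class="python">
def describe(unpacked):
    # 'unpacked' stands for the value returned by e.unpack().
    if isinstance(unpacked, tuple):
        fun, args = unpacked
        return "application of %s to %d argument(s)" % (fun, len(args))
    if isinstance(unpacked, str):
        return "string literal"
    if isinstance(unpacked, int):
        return "integer literal"
    if isinstance(unpacked, float):
        return "floating point literal"
    return "something else"

print(describe(("AdjCN", ["arg1", "arg2"])))
print(describe("literal"))
print(describe(42))
print(describe(3.14))
</pre>
</span>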
Constructing new trees is also easy. You can either use
<tt>readExpr</tt> to read trees from strings, or you can
construct new trees from existing pieces. This is possible by
<span class="python">
using the constructor for <tt>pgf.Expr</tt>:
<pre class="python">
>>> quant = pgf.readExpr("DetQuant IndefArt NumSg")
>>> e2 = pgf.Expr("DetCN", [quant, e])
>>> print(e2)
DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA red_A) (UseN theatre_N))
</pre>
</span>
<span class="haskell">
using the functions <tt>mkApp</tt>, <tt>mkStr</tt>, <tt>mkInt</tt>, <tt>mkFloat</tt> and <tt>mkMeta</tt>:
<pre class="haskell">
Prelude PGF2> let Just quant = readExpr "DetQuant IndefArt NumSg"
Prelude PGF2> let e2 = mkApp "DetCN" [quant, e]
Prelude PGF2> print e2
DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA red_A) (UseN theatre_N))
</pre>
</span>
<span class="java">
using the constructor for <tt>Expr</tt>:
<pre class="java">
Expr quant = Expr.readExpr("DetQuant IndefArt NumSg");
Expr e2 = new Expr("DetCN", new Expr[] {quant, e});
System.out.println(e2);
</pre>
</span>
<span class="csharp">
using the constructor for <tt>Expr</tt>:
<pre class="csharp">
Expr quant = Expr.ReadExpr("DetQuant IndefArt NumSg");
Expr e2 = new Expr("DetCN", new Expr[] {quant, e});
Console.WriteLine(e2);
</pre>
</span>
<h2>Embedded GF Grammars</h2>
<p>If the host application needs to do a lot of expression manipulations,
then it is helpful to use a higher-level API to the grammar,
also known as "embedded grammars" in GF. The advantage is that
you can construct and analyze expressions in a more compact way.</p>
<span class="python">
<p>In Python you first have to <tt>embed</tt> the grammar by calling:
<pre class="python">
>>> gr.embed("App")
&lt;module 'App' (built-in)&gt;
</pre>
After that whenever you need the API you should import the module:
<pre class="python">
>>> import App
</pre>
</p>
<p>Now creating new trees is just a matter of calling ordinary Python
functions:
<pre class="python">
>>> print(App.DetCN(quant,e))
DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA red_A) (UseN theatre_N))
</pre>
</p>
</span>
<span class="haskell">
<p>In order to access the API you first need to generate
one boilerplate Haskell module with the compiler:
<pre class="haskell">
$ gf -make -output-format=haskell App.pgf
</pre>
This module will expose all functions in the abstract syntax
as data type constructors together with methods for conversion from
a generic expression to Haskell data and vice versa. When you need the API you can just import the module:
<pre class="haskell">
Prelude PGF2> import App
</pre>
</p>
<p>Now creating new trees is just a matter of writing ordinary Haskell
code:
<pre class="haskell">
Prelude PGF2 App> print (gf (GDetCN (GDetQuant GIndefArt GNumSg) (GAdjCN (GPositA Gred_A) (GUseN Ghouse_N))))
</pre>
The only difference is that to the name of every abstract syntax function
the compiler adds a capital 'G' in order to guarantee that there are no conflicts
and that all names are valid names for Haskell data constructors. Here <tt>gf</tt> is a function
which converts from the data type representation to generic GF expressions.</p>
<p>The converse function <tt>fg</tt> converts an expression to a data type expression.
This is useful for instance if you want to do pattern matching
on the structure of the expression:
<pre class="haskell">
visit = case fg e2 of
          GDetCN quant cn -> do putStrLn "Found DetCN"
                                visit cn
          GAdjCN adj cn   -> do putStrLn "Found AdjCN"
                                visit cn
          e               -> return ()
</pre>
</p>
</span>
<span class="java">
<p>In order to access the API you first need to generate
one boilerplate Java class with the compiler:
<pre class="java">
$ gf -make -output-format=java App.pgf
</pre>
This class will expose all functions in the abstract syntax
as methods. Now creating new trees is just a matter of writing ordinary Java
code:
<pre class="java">
System.out.println(App.DetCN(quant, cn));
</pre>
If the grammar name is too long to write it in front of every function
name then you can create an instance with a shorter name:
<pre class="java">
App a = new App();
System.out.println(a.DetCN(quant, cn));
</pre>
</p>
</span>
<span class="csharp">
<p>In C# you first have to <tt>embed</tt> the grammar by calling:
<pre class="csharp">
dynamic g = gr.Embed();
</pre>
</p>
<p>Now creating new trees is just a matter of calling ordinary C#
methods:
<pre class="csharp">
Console.WriteLine(g.DetCN(quant,e));
DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA red_A) (UseN theatre_N))
</pre>
</p>
</span>
<span class="python">
<p>
Analysing expressions is also made easier by using the visitor pattern.
In object-oriented languages this is a clumsy way to do
what is called pattern matching in most functional languages.
You need to define a class which has one method for each function
in the abstract syntax that you want to handle. If the function is called
<tt>f</tt> then you need a method called <tt>on_f</tt>. The method
will be called each time the corresponding function is encountered,
and its arguments will be the arguments from the original tree.
If there is no matching method name then the runtime will
call the method <tt>default</tt>. The following is an example:
<pre class="python">
>>> class ExampleVisitor:
...     def on_DetCN(self,quant,cn):
...         print("Found DetCN")
...         cn.visit(self)
...     def on_AdjCN(self,adj,cn):
...         print("Found AdjCN")
...         cn.visit(self)
...     def default(self,e):
...         pass
...
>>> e2.visit(ExampleVisitor())
Found DetCN
Found AdjCN
</pre>
Here we call the method <tt>visit</tt> from the tree e2 and we give
it, as parameter, an instance of class <tt>ExampleVisitor</tt>.
<tt>ExampleVisitor</tt> has two methods <tt>on_DetCN</tt>
and <tt>on_AdjCN</tt> which are called when the top function of
the current tree is <tt>DetCN</tt> or <tt>AdjCN</tt>
correspondingly. In this example we just print a message and
we call <tt>visit</tt> recursively to go deeper into the tree.
</p>
</span>
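<span class="python">
<p>The dispatch on method names described above can be pictured with
<tt>getattr</tt>. The sketch below is not the runtime's actual
implementation: <tt>FakeExpr</tt> and <tt>CountingVisitor</tt> are
hypothetical classes that merely imitate the documented behaviour, so that
the mechanism can be tried without a grammar:</p>
<pre class="python">
class FakeExpr:
    # Hypothetical stand-in for pgf.Expr: a function name plus argument trees.
    def __init__(self, fun, args=()):
        self.fun = fun
        self.args = list(args)

    def visit(self, visitor):
        # Look for a method named on_FUN; fall back to 'default' otherwise.
        method = getattr(visitor, "on_" + self.fun, None)
        if method is not None:
            return method(*self.args)
        return visitor.default(self)

class CountingVisitor:
    def __init__(self):
        self.seen = []
    def on_DetCN(self, quant, cn):
        self.seen.append("DetCN")
        cn.visit(self)
    def on_AdjCN(self, adj, cn):
        self.seen.append("AdjCN")
        cn.visit(self)
    def default(self, e):
        pass

tree = FakeExpr("DetCN", [FakeExpr("DetQuant"),
                          FakeExpr("AdjCN", [FakeExpr("PositA"), FakeExpr("UseN")])])
v = CountingVisitor()
tree.visit(v)
print(v.seen)
</pre>
Collecting the visited function names in a list, instead of printing them,
makes the traversal order easy to inspect afterwards.
</span>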
<span class="java">
<p>
Analysing expressions is also made easier by using the visitor pattern.
In object-oriented languages this is a clumsy way to do
what is called pattern matching in most functional languages.
You need to define a class which has one method for each function
in the abstract syntax that you want to handle. If the function is called
<tt>f</tt> then you need a method called <tt>on_f</tt>. The method
will be called each time the corresponding function is encountered,
and its arguments will be the arguments from the original tree.
If there is no matching method name then the runtime will
call the method <tt>defaultCase</tt>. The following is an example:
<pre class="java">
e2.visit(new Object() {
    public void on_DetCN(Expr quant, Expr cn) {
        System.out.println("Found DetCN");
        cn.visit(this);
    }
    public void on_AdjCN(Expr adj, Expr cn) {
        System.out.println("Found AdjCN");
        cn.visit(this);
    }
    public void defaultCase(Expr e) {
        System.out.println("Found "+e);
    }
});
Found DetCN
Found AdjCN
Found UseN theatre_N
</pre>
Here we call the method <tt>visit</tt> from the tree e2 and we give
it, as parameter, an instance of a class with two methods <tt>on_DetCN</tt>
and <tt>on_AdjCN</tt> which are called when the top function of
the current tree is <tt>DetCN</tt> or <tt>AdjCN</tt>
correspondingly. In this example we just print a message and
we call <tt>visit</tt> recursively to go deeper into the tree.
</p>
</span>
<h2>Access the Morphological Lexicon</h2>
There are two methods that give you direct access to the morphological
lexicon. The first makes it possible to dump the full-form lexicon.
The following code iterates over the lexicon and prints each
word form with its possible analyses:
<pre class="python">
>>> for entry in eng.fullFormLexicon():
...     print(entry)
...
</pre>
<pre class="haskell">
Prelude PGF2> mapM_ print [(form,lemma,analysis,prob) | (form,analyses) &lt;- fullFormLexicon eng, (lemma,analysis,prob) &lt;- analyses]
</pre>
<pre class="java">
for (FullFormEntry entry : eng.fullFormLexicon()) {
    for (MorphoAnalysis analysis : entry.getAnalyses()) {
        System.out.println(entry.getForm()+" "+analysis.getProb()+" "+analysis.getLemma()+" "+analysis.getField());
    }
}
</pre>
<pre class="csharp">
foreach (FullFormEntry entry in eng.FullFormLexicon) { //// TODO
    foreach (MorphoAnalysis analysis in entry.Analyses) {
        Console.WriteLine(entry.Form+" "+analysis.Prob+" "+analysis.Lemma+" "+analysis.Field);
    }
}
</pre>
The second one implements a simple lookup. The argument is a word
form and the result is a list of analyses:
<pre class="python">
>>> print(eng.lookupMorpho("letter"))
[('letter_1_N', 's Sg Nom', inf), ('letter_2_N', 's Sg Nom', inf)]
</pre>
<pre class="haskell">
Prelude PGF2> print (lookupMorpho eng "letter")
[("letter_1_N","s Sg Nom",Infinity),("letter_2_N","s Sg Nom",Infinity)]
</pre>
<pre class="java">
for (MorphoAnalysis an : eng.lookupMorpho("letter")) {
    System.out.println(an.getLemma()+", "+an.getField()+", "+an.getProb());
}
letter_1_N, s Sg Nom, inf
letter_2_N, s Sg Nom, inf
</pre>
<pre class="csharp">
foreach (MorphoAnalysis an in eng.LookupMorpho("letter")) { //// TODO
    Console.WriteLine(an.Lemma+", "+an.Field+", "+an.Prob);
}
letter_1_N, s Sg Nom, inf
letter_2_N, s Sg Nom, inf
</pre>
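<span class="python">
<p>The list returned by the lookup can be post-processed like any other
Python list. The sketch below works on a copy of the sample result shown
above rather than on a live grammar. It assumes that the third component of
each entry is a negated logarithmic probability, as for the parser, so that
the smallest value marks the most probable analysis:</p>
<pre class="python">
# Sample result copied from the lookupMorpho example above;
# each entry is assumed to be (lemma, field, negated log-probability).
analyses = [('letter_1_N', 's Sg Nom', float('inf')),
            ('letter_2_N', 's Sg Nom', float('inf'))]

# Collect the distinct lemmas, preserving their order.
lemmas = []
for lemma, field, prob in analyses:
    if lemma not in lemmas:
        lemmas.append(lemma)
print(lemmas)

# Pick the analysis with the smallest negated log-probability
# (min keeps the first entry when several are tied).
best = min(analyses, key=lambda a: a[2])
print(best[0])
</pre>
</span>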
<h2>Access the Abstract Syntax</h2>
There is a simple API for accessing the abstract syntax. For example,
you can get a list of abstract functions:
<pre class="python">
>>> gr.functions
....
</pre>
<pre class="haskell">
Prelude PGF2> functions gr
....
</pre>
<pre class="java">
List&lt;String&gt; funs = gr.getFunctions();
....
</pre>
<pre class="csharp">
IEnumerable&lt;String&gt; funs = gr.Functions;
....
</pre>
or a list of categories:
<pre class="python">
>>> gr.categories
....
</pre>
<pre class="haskell">
Prelude PGF2> categories gr
....
</pre>
<pre class="java">
List&lt;String&gt; cats = gr.getCategories();
....
</pre>
<pre class="csharp">
IEnumerable&lt;String&gt; cats = gr.Categories;
....
</pre>
You can also access all functions with the same result category:
<pre class="python">
>>> gr.functionsByCat("Weekday")
['friday_Weekday', 'monday_Weekday', 'saturday_Weekday', 'sunday_Weekday', 'thursday_Weekday', 'tuesday_Weekday', 'wednesday_Weekday']
</pre>
<pre class="haskell">
Prelude PGF2> functionsByCat gr "Weekday"
["friday_Weekday","monday_Weekday","saturday_Weekday","sunday_Weekday","thursday_Weekday","tuesday_Weekday","wednesday_Weekday"]
</pre>
<pre class="java">
List&lt;String&gt; funsByCat = gr.getFunctionsByCat("Weekday");
....
</pre>
<pre class="csharp">
IEnumerable&lt;String&gt; funsByCat = gr.FunctionsByCat("Weekday");
....
</pre>
The full type of a function can be retrieved as:
<pre class="python">
>>> print(gr.functionType("DetCN"))
Det -> CN -> NP
</pre>
<pre class="haskell">
Prelude PGF2> print (functionType gr "DetCN")
Det -> CN -> NP
</pre>
<pre class="java">
System.out.println(gr.getFunctionType("DetCN"));
Det -> CN -> NP
</pre>
<pre class="csharp">
Console.WriteLine(gr.FunctionType("DetCN"));
Det -> CN -> NP
</pre>
<h2>Type Checking Abstract Trees</h2>
<p>The runtime type checker can do type checking and type inference
for simple types. Dependent types are still not fully implemented
in the current runtime. The inference is done with <tt>inferExpr</tt>:
<pre class="python">
>>> e,ty = gr.inferExpr(e)
>>> print(e)
AdjCN (PositA red_A) (UseN theatre_N)
>>> print(ty)
CN
</pre>
<pre class="haskell">
Prelude PGF2> let Right (e',ty) = inferExpr gr e
Prelude PGF2> print e'
AdjCN (PositA red_A) (UseN theatre_N)
Prelude PGF2> print ty
CN
</pre>
<pre class="java">
TypedExpr te = gr.inferExpr(e);
System.out.println(te.getExpr()+" : "+te.getType());
AdjCN (PositA red_A) (UseN theatre_N) : CN
</pre>
<pre class="csharp">
TypedExpr te = gr.InferExpr(e); //// TODO
Console.WriteLine(te.Expr+" : "+te.Type);
AdjCN (PositA red_A) (UseN theatre_N) : CN
</pre>
The result is a potentially updated expression and its type. Since we
currently deal only with simple types, the new
expression will always be equal to the original expression. This
will no longer hold once dependent types are supported.
</p>
<p>Type checking is also trivial:
<pre class="python">
>>> e = gr.checkExpr(e,pgf.readType("CN"))
>>> print(e)
AdjCN (PositA red_A) (UseN theatre_N)
</pre>
<pre class="haskell">
Prelude PGF2> let Just ty = readType "CN"
Prelude PGF2> let Right e' = checkExpr gr e ty
Prelude PGF2> print e'
AdjCN (PositA red_A) (UseN theatre_N)
</pre>
<pre class="java">
Expr new_e = gr.checkExpr(e,Type.readType("CN"));
System.out.println(new_e);
</pre>
<pre class="csharp">
Expr new_e = gr.CheckExpr(e,Type.ReadType("CN")); //// TODO
Console.WriteLine(new_e);
</pre>
<p>If the expression does not have the expected type, you get a type error:
<pre class="python">
>>> e = gr.checkExpr(e,pgf.readType("A"))
pgf.TypeError: The expected type of the expression AdjCN (PositA red_A) (UseN theatre_N) is A but CN is infered
</pre>
<pre class="haskell">
Prelude PGF2> let Just ty = readType "A"
Prelude PGF2> let Left msg = checkExpr gr e ty
Prelude PGF2> putStrLn msg
</pre>
<pre class="java">
Expr new_e = gr.checkExpr(e,Type.readType("A"));
TypeError: The expected type of the expression AdjCN (PositA red_A) (UseN theatre_N) is A but CN is infered
</pre></p>
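<span class="python">
<p>If a type mismatch is an expected outcome rather than a failure, the
exception can be caught and turned into a boolean. This is a minimal sketch:
the helper name <tt>has_type</tt> is hypothetical, and the exception class
is passed in as a parameter so that the function does not depend on how the
<tt>pgf</tt> module is imported:</p>
<pre class="python">
def has_type(gr, e, ty, type_error):
    """Return True if checkExpr accepts e at type ty, False otherwise.

    type_error is the exception class to treat as a mismatch,
    e.g. pgf.TypeError as seen in the error message above.
    """
    try:
        gr.checkExpr(e, ty)
        return True
    except type_error:
        return False

# With a loaded grammar one could call, for example:
#   has_type(gr, e, pgf.readType("A"), pgf.TypeError)
</pre>
</span>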
<span class="python">
<h2>Partial Grammar Loading</h2>
<p>By default the whole grammar is compiled into a single file
which consists of an abstract syntax together with all concrete
languages. For large grammars with many languages this can be
inconvenient because loading becomes slower and the grammar takes
more memory. To avoid this, you can split the grammar into
one file for the abstract syntax and one file for every concrete syntax.
This is done by using the option <tt>-split-pgf</tt> in the compiler:
<pre class="python">
$ gf -make -split-pgf App12.pgf
</pre>
</p>
Now you can load the grammar as usual but this time only the
abstract syntax will be loaded. You can still use the <tt>languages</tt>
property to get the list of languages and the corresponding
concrete syntax objects:
<pre class="python">
>>> gr = pgf.readPGF("App.pgf")
>>> eng = gr.languages["AppEng"]
</pre>
However, if you now try to use the concrete syntax then you will
get an exception:
<pre class="python">
>>> eng.lookupMorpho("letter")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
pgf.PGFError: The concrete syntax is not loaded
</pre>
Before using the concrete syntax, you need to explicitly load it:
<pre class="python">
>>> eng.load("AppEng.pgf_c")
>>> print(eng.lookupMorpho("letter"))
[('letter_1_N', 's Sg Nom', inf), ('letter_2_N', 's Sg Nom', inf)]
</pre>
When you no longer need the language, you can simply
unload it:
<pre class="python">
>>> eng.unload()
</pre>
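To make sure that a language is always unloaded again, the
<tt>load</tt> and <tt>unload</tt> calls can be paired in a context manager.
This is only a sketch: the helper name <tt>loaded</tt> is hypothetical and
not part of the <tt>pgf</tt> module, and it uses nothing beyond the two
methods shown above:
<pre class="python">
from contextlib import contextmanager

@contextmanager
def loaded(concr, path):
    """Load a concrete syntax for the duration of a with-block."""
    concr.load(path)
    try:
        yield concr
    finally:
        # Release the memory even if the body raised an exception.
        concr.unload()

# With the split grammar from above one could write, for example:
#   with loaded(eng, "AppEng.pgf_c") as lang:
#       print(lang.lookupMorpho("letter"))
</pre>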
</span>
<span class="java">
<h2>Partial Grammar Loading</h2>
<p>By default the whole grammar is compiled into a single file
which consists of an abstract syntax together with all concrete
languages. For large grammars with many languages this can be
inconvenient because loading becomes slower and the grammar takes
more memory. To avoid this, you can split the grammar into
one file for the abstract syntax and one file for every concrete syntax.
This is done by using the option <tt>-split-pgf</tt> in the compiler:
<pre class="java">
$ gf -make -split-pgf App12.pgf
</pre>
</p>
Now you can load the grammar as usual but this time only the
abstract syntax will be loaded. You can still use the <tt>getLanguages</tt>
method to get the list of languages and the corresponding
concrete syntax objects:
<pre class="java">
PGF gr = PGF.readPGF("App.pgf");
Concr eng = gr.getLanguages().get("AppEng");
</pre>
However, if you now try to use the concrete syntax then you will
get an exception:
<pre class="java">
eng.lookupMorpho("letter");
PGFError: The concrete syntax is not loaded
</pre>
Before using the concrete syntax, you need to explicitly load it:
<pre class="java">
eng.load("AppEng.pgf_c");
for (MorphoAnalysis an : eng.lookupMorpho("letter")) {
System.out.println(an.getLemma()+", "+an.getField()+", "+an.getProb());
}
letter_1_N, s Sg Nom, inf
letter_2_N, s Sg Nom, inf
</pre>
When you no longer need the language, you can simply
unload it:
<pre class="java">
eng.unload();
</pre>
</span>
<h2>GraphViz</h2>
<p>GraphViz is used for visualizing abstract syntax trees and parse trees.
In both cases the result is GraphViz code that can be used to
render the trees. See the examples below:</p>
<pre class="python">
>>> print(gr.graphvizAbstractTree(e))
graph {
n0[label = "AdjCN", style = "solid", shape = "plaintext"]
n1[label = "PositA", style = "solid", shape = "plaintext"]
n2[label = "red_A", style = "solid", shape = "plaintext"]
n1 -- n2 [style = "solid"]
n0 -- n1 [style = "solid"]
n3[label = "UseN", style = "solid", shape = "plaintext"]
n4[label = "theatre_N", style = "solid", shape = "plaintext"]
n3 -- n4 [style = "solid"]
n0 -- n3 [style = "solid"]
}
</pre>
<pre class="haskell">
Prelude PGF2> putStrLn (graphvizAbstractTree gr graphvizDefaults e)
graph {
n0[label = "AdjCN", style = "solid", shape = "plaintext"]
n1[label = "PositA", style = "solid", shape = "plaintext"]
n2[label = "red_A", style = "solid", shape = "plaintext"]
n1 -- n2 [style = "solid"]
n0 -- n1 [style = "solid"]
n3[label = "UseN", style = "solid", shape = "plaintext"]
n4[label = "theatre_N", style = "solid", shape = "plaintext"]
n3 -- n4 [style = "solid"]
n0 -- n3 [style = "solid"]
}
</pre>
<pre class="java">
System.out.println(gr.graphvizAbstractTree(e));
graph {
n0[label = "AdjCN", style = "solid", shape = "plaintext"]
n1[label = "PositA", style = "solid", shape = "plaintext"]
n2[label = "red_A", style = "solid", shape = "plaintext"]
n1 -- n2 [style = "solid"]
n0 -- n1 [style = "solid"]
n3[label = "UseN", style = "solid", shape = "plaintext"]
n4[label = "theatre_N", style = "solid", shape = "plaintext"]
n3 -- n4 [style = "solid"]
n0 -- n3 [style = "solid"]
}
</pre>
<pre class="csharp">
Console.WriteLine(gr.GraphvizAbstractTree(e)); //// TODO
graph {
n0[label = "AdjCN", style = "solid", shape = "plaintext"]
n1[label = "PositA", style = "solid", shape = "plaintext"]
n2[label = "red_A", style = "solid", shape = "plaintext"]
n1 -- n2 [style = "solid"]
n0 -- n1 [style = "solid"]
n3[label = "UseN", style = "solid", shape = "plaintext"]
n4[label = "theatre_N", style = "solid", shape = "plaintext"]
n3 -- n4 [style = "solid"]
n0 -- n3 [style = "solid"]
}
</pre>
<pre class="python">
>>> print(eng.graphvizParseTree(e))
graph {
node[shape=plaintext]
subgraph {rank=same;
n4[label="CN"]
}
subgraph {rank=same;
edge[style=invis]
n1[label="AP"]
n3[label="CN"]
n1 -- n3
}
n4 -- n1
n4 -- n3
subgraph {rank=same;
edge[style=invis]
n0[label="A"]
n2[label="N"]
n0 -- n2
}
n1 -- n0
n3 -- n2
subgraph {rank=same;
edge[style=invis]
n100000[label="red"]
n100001[label="theatre"]
n100000 -- n100001
}
n0 -- n100000
n2 -- n100001
}
</pre>
<pre class="haskell">
Prelude PGF2> putStrLn (graphvizParseTree eng graphvizDefaults e)
graph {
node[shape=plaintext]
subgraph {rank=same;
n4[label="CN"]
}
subgraph {rank=same;
edge[style=invis]
n1[label="AP"]
n3[label="CN"]
n1 -- n3
}
n4 -- n1
n4 -- n3
subgraph {rank=same;
edge[style=invis]
n0[label="A"]
n2[label="N"]
n0 -- n2
}
n1 -- n0
n3 -- n2
subgraph {rank=same;
edge[style=invis]
n100000[label="red"]
n100001[label="theatre"]
n100000 -- n100001
}
n0 -- n100000
n2 -- n100001
}
</pre>
<pre class="java">
System.out.println(eng.graphvizParseTree(e));
graph {
node[shape=plaintext]
subgraph {rank=same;
n4[label="CN"]
}
subgraph {rank=same;
edge[style=invis]
n1[label="AP"]
n3[label="CN"]
n1 -- n3
}
n4 -- n1
n4 -- n3
subgraph {rank=same;
edge[style=invis]
n0[label="A"]
n2[label="N"]
n0 -- n2
}
n1 -- n0
n3 -- n2
subgraph {rank=same;
edge[style=invis]
n100000[label="red"]
n100001[label="theatre"]
n100000 -- n100001
}
n0 -- n100000
n2 -- n100001
}
</pre>
<pre class="csharp">
Console.WriteLine(eng.GraphvizParseTree(e)); //// TODO
graph {
node[shape=plaintext]
subgraph {rank=same;
n4[label="CN"]
}
subgraph {rank=same;
edge[style=invis]
n1[label="AP"]
n3[label="CN"]
n1 -- n3
}
n4 -- n1
n4 -- n3
subgraph {rank=same;
edge[style=invis]
n0[label="A"]
n2[label="N"]
n0 -- n2
}
n1 -- n0
n3 -- n2
subgraph {rank=same;
edge[style=invis]
n100000[label="red"]
n100001[label="theatre"]
n100000 -- n100001
}
n0 -- n100000
n2 -- n100001
}
</pre>
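<span class="python">
<p>The generated code still has to be passed to GraphViz to obtain a picture.
The following sketch pipes it to the <tt>dot</tt> command line program, which
is assumed to be installed and on the search path. The helper names
<tt>dot_command</tt> and <tt>render_dot</tt> are hypothetical and not part of
the <tt>pgf</tt> module:</p>
<pre class="python">
import subprocess

def dot_command(out_path, fmt="png"):
    """Build the argument list for the GraphViz 'dot' program."""
    return ["dot", "-T" + fmt, "-o", out_path]

def render_dot(dot_source, out_path, fmt="png"):
    """Pipe GraphViz code to 'dot' and write the picture to out_path."""
    # 'dot' reads the graph from standard input when no file is given.
    subprocess.run(dot_command(out_path, fmt), input=dot_source,
                   text=True, check=True)

# With a loaded grammar one could call, for example:
#   render_dot(gr.graphvizAbstractTree(e), "abstract.png")
#   render_dot(eng.graphvizParseTree(e), "parse.svg", fmt="svg")
</pre>
</span>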
</body>
</html>