From 270ba6887febd4d29cc91071b60acf4ed4c6392d Mon Sep 17 00:00:00 2001
From: krasimir
+The heuristics factor can be used to trade parsing speed for quality.
+By default the list of trees is sorted by probability, which corresponds
+to factor 0.0. When we increase the factor, parsing becomes faster
+but at the same time the sorting becomes imprecise. The worst
+factor is 1.0. In any case the parser always returns the same set of
+trees, only in a different order. Our experience with the translation
+grammar is that even with a factor of about 0.6-0.8 the most probable
+tree is still at the top of the list, but further down the list
+the trees become shuffled.
+
+The callbacks parameter is a list of functions that can be used for
+recognizing literals. For example, we use them for recognizing names
+and unknown words in the translator.
+
+Finally, you could also get a linearization which is bracketed into
+a list of phrases:
+Using the Python binding to the C runtime
+ Krasimir Angelov, July 2015
+
+Loading the Grammar
+
+Before you use the Python binding you need to import the pgf module.
+
+>>> import pgf
+
+
+Once you have the module imported, you can use the dir and
+help functions to see what kind of functionality is available.
+dir takes an object and returns a list of methods available
+in the object:
+
+>>> dir(pgf)
+
+help is a little more advanced: it tries
+to produce more human-readable documentation, which moreover
+contains comments:
+
+>>> help(pgf)
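+The list returned by dir also contains special names such as __doc__.
+A small helper can keep only the public names. The sketch below is plain
+Python: public_names is our own hypothetical helper, not part of the pgf
+module, and it is shown on the standard math module so that it runs
+without a grammar. With the binding imported you would call public_names(pgf).

```python
import math


def public_names(obj):
    """Return the attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


names = public_names(math)
print(names[:5])
```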
+
+
+A grammar is loaded by calling the method readPGF:
+
+>>> gr = pgf.readPGF("App12.pgf")
+
+
+From the grammar you can query the set of available languages.
+It is accessible through the property languages, which
+is a map from language name to an object of class pgf.Concr
+that represents the language.
+For example, the following will extract the English language:
+
+>>> eng = gr.languages["AppEng"]
+>>> print eng
+<pgf.Concr object at 0x7f7dfa4471d0>
+
+
+Parsing
+
+All language specific services are available as methods of the
+class pgf.Concr. For example to invoke the parser, you
+can call:
+
+>>> i = eng.parse("this is a small theatre")
+
+This gives you an iterator which enumerates all possible
+abstract trees. You can get the next tree by calling next:
+
+>>> p,e = i.next()
+
+The results are always pairs of probability and tree. The probabilities
+are negated logarithmic probabilities, which means that the lowest
+number encodes the most probable result. The possible trees are
+returned in decreasing probability order (i.e. increasing negated logarithm).
+The first tree should have the smallest p:
+
+>>> print p
+35.9166526794
+
+and this is the corresponding abstract tree:
+
+>>> print e
+PhrUtt NoPConj (UttS (UseCl (TTAnt TPres ASimul) PPos (PredVP (DetNP (DetQuant this_Quant NumSg)) (UseComp (CompNP (DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA small_A) (UseN theatre_N)))))))) NoVoc
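+Since p is a negated logarithm, the probability itself can be recovered
+as exp(-p). The sketch below assumes that the natural logarithm is used,
+and it replaces the trees with strings. Only the score 35.9166526794 is
+taken from the example above; the second pair is made up for illustration.

```python
import math

# Hypothetical parser results: (negated log probability, tree) pairs.
# Only 35.9166526794 comes from the example; 37.5 is made up.
results = [(35.9166526794, "tree A"), (37.5, "tree B")]


def to_probability(neg_log_p):
    """Turn a negated natural-log probability back into a probability."""
    return math.exp(-neg_log_p)


for p, tree in results:
    print("%s: p = %.4f, probability = %.3e" % (tree, p, to_probability(p)))

# The smallest p corresponds to the largest probability.
best_p, best_tree = min(results, key=lambda pair: pair[0])
```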
+
+
+The parse method also has the following optional parameters:
+
+ cat          start category
+ n            maximum number of trees
+ heuristics   a real number from 0 to 1
+ callbacks    a list of pairs of a category and a callback function
+
+By using these parameters it is possible, for instance, to change the start category for
+the parser or to limit the number of trees returned from the parser. For example,
+parsing with a different start category can be done as follows:
+
+>>> i = eng.parse("a small theatre", cat="NP")
+
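+Because parse returns an iterator, you can also stop consuming results on
+the Python side, for instance with itertools.islice. In the sketch below,
+fake_parse is a hypothetical stand-in that yields (probability, tree)
+pairs the way eng.parse does, so that the code runs without a compiled
+grammar; with a real grammar you would pass eng.parse(...) to first_trees.

```python
import itertools


def fake_parse(sentence):
    """Hypothetical stand-in for eng.parse: yields (probability, tree) pairs."""
    for i, word in enumerate(sentence.split()):
        yield (10.0 + i, "Tree%d(%s)" % (i, word))


def first_trees(results, k):
    """Return at most k (probability, tree) pairs from a parse iterator."""
    return list(itertools.islice(results, k))


top = first_trees(fake_parse("a small theatre"), 2)
for p, tree in top:
    print("%.1f %s" % (p, tree))
```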
+
+Linearization
+
+You can either linearize the result from the parser back to another
+language, or you can explicitly construct a tree and then
+linearize it in any language. For example, we can create
+a new expression like this:
+
+>>> e = pgf.readExpr("AdjCN (PositA red_A) (UseN theatre_N)")
+
+and then we can linearize it:
+
+>>> print eng.linearize(e)
+red theatre
+
+This method produces only a single linearization. If you use variants
+in the grammar then you might want to see all possible linearizations.
+For that purpose you should use linearizeAll:
+
+>>> for s in eng.linearizeAll(e):
+...     print s
+...
+red theatre
+red theater
+
+If, instead, you need an inflection table with all possible forms
+then the right method to use is tabularLinearize:
+
+>>> eng.tabularLinearize(e)
+{'s Sg Nom': 'red theatre', 's Pl Nom': 'red theatres', 's Pl Gen': "red theatres'", 's Sg Gen': "red theatre's"}
+
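+The result of tabularLinearize is an ordinary Python dictionary from form
+labels to strings, so it can be filtered with normal dictionary operations.
+The sketch copies the table printed above and keeps only the plural forms.
+Treating "Pl" as a separate word in the label is an assumption that fits
+these particular labels, not a general rule for every grammar.

```python
# The table printed by eng.tabularLinearize(e) above, copied as a literal.
table = {'s Sg Nom': 'red theatre', 's Pl Nom': 'red theatres',
         's Pl Gen': "red theatres'", 's Sg Gen': "red theatre's"}

# Keep only the labels that contain "Pl" as a separate word.
plural = dict((label, form) for label, form in table.items()
              if "Pl" in label.split())

for label in sorted(plural):
    print("%s: %s" % (label, plural[label]))
```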
+
+
+>>> [b] = eng.bracketedLinearize(e)
+>>> print b
+(CN:4 (AP:1 (A:0 red)) (CN:3 (N:2 theatre)))
+
+Each bracket is actually an object of type pgf.Bracket. The property
+cat of the object gives you the name of the category and
+the property children gives you a list of nested brackets.
+If a phrase is discontinuous then it is represented as more than
+one bracket with the same category name. In that case, the index
+that you see in the example above will have the same value for all
+brackets of the same phrase.
+
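+Since brackets nest, they are naturally processed recursively. The sketch
+below uses a hypothetical stand-in class with only the two properties
+described above, cat and children, and assumes that the leaves among the
+children are plain strings holding the words; check this against the real
+pgf.Bracket objects before relying on it. It rebuilds the bracketing printed
+above and lists the words covered by each category.

```python
class Bracket(object):
    """Hypothetical stand-in for pgf.Bracket with only cat and children."""

    def __init__(self, cat, children):
        self.cat = cat
        self.children = children


def words(node):
    """Collect the words (string leaves) below a bracket, left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node.children:
        result.extend(words(child))
    return result


def phrases(node, out=None):
    """Return (category, phrase) pairs for every bracket, outermost first."""
    if out is None:
        out = []
    if not isinstance(node, str):
        out.append((node.cat, " ".join(words(node))))
        for child in node.children:
            phrases(child, out)
    return out


# (CN:4 (AP:1 (A:0 red)) (CN:3 (N:2 theatre)))
b = Bracket("CN", [Bracket("AP", [Bracket("A", ["red"])]),
                   Bracket("CN", [Bracket("N", ["theatre"])])])
for cat, text in phrases(b):
    print("%s: %s" % (cat, text))
```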
+>>> print eng.hasLinearization("apple_N")
+
+
+An already constructed tree can be analyzed and transformed
+in the host application. For example you can deconstruct
+a tree into a function name and a list of arguments:
+
+>>> e.unpack()
+('AdjCN', [<pgf.Expr object at 0x7f7df6db78c8>, <pgf.Expr object at 0x7f7df6db7878>])
+
+
+The result from unpack can be different depending on the form of the
+tree. If the tree is a function application then you always get
+a tuple of function name and a list of arguments. If instead the
+tree is just a literal string then the return value is the actual
+literal. For example the result from:
+
+>>> pgf.readExpr('"literal"').unpack()
+'literal'
+
+is just the string 'literal'. Situations like this can be detected
+in Python by checking the type of the result from unpack.
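+The type check on the result of unpack combines naturally with recursion,
+for example to convert a whole tree into nested Python tuples. So that the
+sketch runs without the binding, FakeApp and FakeLit are hypothetical
+stand-ins whose unpack follows the behaviour described above: a tuple of
+function name and argument list for applications, and the plain value for
+literals. Since literals are strings or numbers, a tuple result always
+means an application.

```python
class FakeApp(object):
    """Hypothetical stand-in for an application node of pgf.Expr."""

    def __init__(self, fun, args=()):
        self.fun, self.args = fun, list(args)

    def unpack(self):
        return (self.fun, self.args)


class FakeLit(object):
    """Hypothetical stand-in for a literal node of pgf.Expr."""

    def __init__(self, value):
        self.value = value

    def unpack(self):
        return self.value


def to_nested(expr):
    """Convert a tree into nested tuples, keeping literals as they are."""
    result = expr.unpack()
    if isinstance(result, tuple):  # function application
        fun, args = result
        return (fun,) + tuple(to_nested(arg) for arg in args)
    return result  # literal value


# AdjCN (PositA red_A) (UseN theatre_N)
e = FakeApp("AdjCN", [FakeApp("PositA", [FakeApp("red_A")]),
                      FakeApp("UseN", [FakeApp("theatre_N")])])
print(to_nested(e))
print(to_nested(FakeLit("literal")))
```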
+
+
+For more complex analyses you can use the visitor pattern.
+In object oriented languages this is just a clumsy way to do
+what is called pattern matching in most functional languages.
+You need to define a class which has one method for each function
+in the abstract syntax of the grammar. If the function is called
+f then you need a method called on_f. The method
+will be called each time the corresponding function is encountered,
+and its arguments will be the arguments from the original tree.
+If there is no matching method name then the runtime will
+call the method default. The following is an example:
+
+>>> class ExampleVisitor:
+...     def on_DetCN(self,quant,cn):
+...         print "Found DetCN"
+...         cn.visit(self)
+...     def on_AdjCN(self,adj,cn):
+...         print "Found AdjCN"
+...         cn.visit(self)
+...     def default(self,e):
+...         pass
+...
+>>> e2.visit(ExampleVisitor())
+Found DetCN
+Found AdjCN
+
+Here we call the method visit on the tree e2 and give
+it, as a parameter, an instance of the class ExampleVisitor.
+ExampleVisitor has two methods, on_DetCN
+and on_AdjCN, which are called when the top function of
+the current tree is DetCN or AdjCN
+respectively. In this example we just print a message and
+call visit recursively to go deeper into the tree.
+
+
+Constructing new trees is also easy. You can either use
+readExpr to read trees from strings, or you can
+construct new trees from existing pieces. This is possible by
+using the constructor for pgf.Expr:
+
+>>> quant = pgf.readExpr("DetQuant IndefArt NumSg")
+>>> e2 = pgf.Expr("DetCN", [quant, e])
+>>> print e2
+DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA red_A) (UseN theatre_N))
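+The dispatch rule of visit (call on_f for the function f, otherwise call
+default) can be modelled in a few lines of Python. This is only a model of
+the behaviour described above, not the binding's implementation: Node is a
+hypothetical stand-in built like pgf.Expr(fun, args), and the visitor
+repeats the logic of ExampleVisitor but collects its messages in a list
+instead of printing them. Applied to the same e2 it finds DetCN and then AdjCN.

```python
class Node(object):
    """Hypothetical stand-in for pgf.Expr, built as Node(fun, args)."""

    def __init__(self, fun, args=()):
        self.fun, self.args = fun, list(args)

    def visit(self, visitor):
        # Call on_<fun> with the arguments if it exists, else default(self).
        method = getattr(visitor, "on_" + self.fun, None)
        if method is None:
            return visitor.default(self)
        return method(*self.args)


class ExampleVisitor(object):
    def __init__(self):
        self.seen = []

    def on_DetCN(self, quant, cn):
        self.seen.append("Found DetCN")
        cn.visit(self)

    def on_AdjCN(self, adj, cn):
        self.seen.append("Found AdjCN")
        cn.visit(self)

    def default(self, e):
        pass


quant = Node("DetQuant", [Node("IndefArt"), Node("NumSg")])
e = Node("AdjCN", [Node("PositA", [Node("red_A")]),
                   Node("UseN", [Node("theatre_N")])])
e2 = Node("DetCN", [quant, e])  # mirrors pgf.Expr("DetCN", [quant, e])
v = ExampleVisitor()
e2.visit(v)
print(v.seen)
```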
+
+
+
+>>> gr.embed("App")
+<module 'App' (built-in)>
+>>> import App
+
+Now creating new trees is just a matter of calling ordinary Python
+functions:
+>>> print App.DetCN(quant,e)
+DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA red_A) (UseN theatre_N))
+
+
+>>> for entry in eng.fullFormLexicon():
+...     print entry
+...
+
+The second one implements a simple lookup. The argument is a word
+form and the result is a list of analyses:
+
+>>> print eng.lookupMorpho("letter")
+[('letter_1_N', 's Sg Nom', inf), ('letter_2_N', 's Sg Nom', inf)]
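+Each analysis is a triple of an abstract function, a form label and a
+third value (shown as inf above), which we leave aside here. The sketch
+groups the form labels by function, using the output above as data;
+inf is written as float("inf") so that the literal is valid Python.

```python
# The result of eng.lookupMorpho("letter") shown above, as a Python literal.
inf = float("inf")
analyses = [('letter_1_N', 's Sg Nom', inf), ('letter_2_N', 's Sg Nom', inf)]

# Group the form labels by the abstract function they belong to.
by_function = {}
for fun, label, _third in analyses:
    by_function.setdefault(fun, []).append(label)

for fun in sorted(by_function):
    print("%s: %s" % (fun, ", ".join(by_function[fun])))
```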
+
+
+>>> gr.functions
+....
+
+or a list of categories:
+
+>>> gr.categories
+....
+
+You can also access all functions with the same result category:
+
+>>> gr.functionsByCat("Weekday")
+['friday_Weekday', 'monday_Weekday', 'saturday_Weekday', 'sunday_Weekday', 'thursday_Weekday', 'tuesday_Weekday', 'wednesday_Weekday']
+
+The full type of a function can be retrieved as:
+
+>>> print gr.functionType("DetCN")
+Det -> CN -> NP
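+The printed form of a simple function type lists the argument categories
+and then the result category, separated by ->. If you only have that
+string, a sketch like the following splits it. It is a hypothetical helper
+that works on the printed text rather than on a type object, and it
+assumes simple types without dependent or higher-order arguments.

```python
def split_simple_type(type_text):
    """Split a printed simple type such as 'Det -> CN -> NP'.

    Returns (argument categories, result category). Assumes no dependent
    types and no higher-order arguments written in parentheses.
    """
    parts = [part.strip() for part in type_text.split("->")]
    return parts[:-1], parts[-1]


args, result = split_simple_type("Det -> CN -> NP")
print("arguments: %s, result: %s" % (", ".join(args), result))
```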
+
+
+The runtime type checker can do type checking and type inference
+for simple types. Dependent types are still not fully implemented
+in the current runtime. The inference is done with the method inferExpr:
+
+>>> e,ty = gr.inferExpr(e)
+>>> print e
+AdjCN (PositA red_A) (UseN theatre_N)
+>>> print ty
+CN
+
+The result is a potentially updated expression and its type. In this
+case we always deal with simple types, which means that the new
+expression will always be equal to the original expression. However, this
+wouldn't be true when dependent types are added.
+
+
+Type checking is also trivial:
+
+>>> e = gr.checkExpr(e,pgf.readType("CN"))
+>>> print e
+AdjCN (PositA red_A) (UseN theatre_N)
+
+In case of type error you will get an exception:
+
+>>> e = gr.checkExpr(e,pgf.readType("A"))
+pgf.TypeError: The expected type of the expression AdjCN (PositA red_A) (UseN theatre_N) is A but CN is infered
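+In a program you will usually want to catch the exception rather than let
+it propagate. The helper below is a sketch, not part of the binding: it
+takes the grammar and the exception class as parameters, so with the real
+module you would call check_or_none(gr, e, pgf.readType("A"), pgf.TypeError).
+FakeGrammar and FakeTypeError are hypothetical stand-ins that only imitate
+the behaviour shown above, so that the code runs without a grammar.

```python
def check_or_none(grammar, expr, expected_type, error_class):
    """Type check expr against expected_type; return None on a type error."""
    try:
        return grammar.checkExpr(expr, expected_type)
    except error_class as err:
        print("type error: %s" % err)
        return None


# Hypothetical stand-ins so that the sketch runs without pgf.
class FakeTypeError(Exception):
    pass


class FakeGrammar(object):
    def checkExpr(self, expr, expected_type):
        if expected_type != "CN":
            raise FakeTypeError("expected %s but CN is inferred" % expected_type)
        return expr


tree = "AdjCN (PositA red_A) (UseN theatre_N)"
ok = check_or_none(FakeGrammar(), tree, "CN", FakeTypeError)
bad = check_or_none(FakeGrammar(), tree, "A", FakeTypeError)
```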
+
+
+
+$ gf -make -split-pgf App12.pgf
+
+
+Now you can load the grammar as usual but this time only the
+abstract syntax will be loaded. You can still use the languages
+property to get the list of languages and the corresponding
+concrete syntax objects:
+
+>>> gr = pgf.readPGF("App.pgf")
+>>> eng = gr.languages["AppEng"]
+
+However, if you now try to use the concrete syntax then you will
+get an exception:
+
+>>> gr.languages["AppEng"].lookupMorpho("letter")
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+pgf.PGFError: The concrete syntax is not loaded
+
+
+Before using the concrete syntax, you need to explicitly load it:
+
+>>> eng.load("AppEng.pgf_c")
+>>> print eng.lookupMorpho("letter")
+[('letter_1_N', 's Sg Nom', inf), ('letter_2_N', 's Sg Nom', inf)]
+
+
+When you don't need the language anymore then you can simply
+unload it:
+>>> eng.unload()
+
+
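+If a language is needed only for a short task, load and unload can be
+paired in a context manager, so that the language is unloaded even when
+an exception is raised. This sketch relies only on the load and unload
+methods shown above; FakeConcr is a hypothetical stand-in for a pgf.Concr
+object that just records the calls.

```python
from contextlib import contextmanager


@contextmanager
def loaded(concr, path):
    """Load a concrete syntax from path and unload it when the block ends."""
    concr.load(path)
    try:
        yield concr
    finally:
        concr.unload()


class FakeConcr(object):
    """Hypothetical stand-in for pgf.Concr that records calls."""

    def __init__(self):
        self.calls = []

    def load(self, path):
        self.calls.append(("load", path))

    def unload(self):
        self.calls.append(("unload",))


eng = FakeConcr()
with loaded(eng, "AppEng.pgf_c") as lang:
    pass  # with a real pgf.Concr: lang.lookupMorpho("letter")
print(eng.calls)
```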
+>>> print gr.graphvizAbstractTree(e)
+graph {
+n0[label = "AdjCN", style = "solid", shape = "plaintext"]
+n1[label = "PositA", style = "solid", shape = "plaintext"]
+n2[label = "red_A", style = "solid", shape = "plaintext"]
+n1 -- n2 [style = "solid"]
+n0 -- n1 [style = "solid"]
+n3[label = "UseN", style = "solid", shape = "plaintext"]
+n4[label = "theatre_N", style = "solid", shape = "plaintext"]
+n3 -- n4 [style = "solid"]
+n0 -- n3 [style = "solid"]
+}
+
+
+
+>>> print eng.graphvizParseTree(e)
+graph {
+ node[shape=plaintext]
+
+ subgraph {rank=same;
+ n4[label="CN"]
+ }
+
+ subgraph {rank=same;
+ edge[style=invis]
+ n1[label="AP"]
+ n3[label="CN"]
+ n1 -- n3
+ }
+ n4 -- n1
+ n4 -- n3
+
+ subgraph {rank=same;
+ edge[style=invis]
+ n0[label="A"]
+ n2[label="N"]
+ n0 -- n2
+ }
+ n1 -- n0
+ n3 -- n2
+
+ subgraph {rank=same;
+ edge[style=invis]
+ n100000[label="red"]
+ n100001[label="theatre"]
+ n100000 -- n100001
+ }
+ n0 -- n100000
+ n2 -- n100001
+}
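+Judging by the output above, both methods produce the graph as text in the
+DOT language of Graphviz, so you can save it and render it with the dot
+tool. The sketch below writes the text to a file and calls dot only if it
+can be found on the PATH; the file names are arbitrary, and dot_source
+would normally come from gr.graphvizAbstractTree(e) or eng.graphvizParseTree(e).

```python
import os
import shutil
import subprocess
import tempfile


def save_graph(dot_source, dot_path, png_path=None):
    """Write DOT text to dot_path; render a PNG if Graphviz is available."""
    with open(dot_path, "w") as out:
        out.write(dot_source)
    which = getattr(shutil, "which", None)  # shutil.which needs Python 3.3+
    if png_path and which and which("dot"):
        subprocess.check_call(["dot", "-Tpng", dot_path, "-o", png_path])
        return True
    return False


# A tiny graph in the same style as the abstract tree output above.
dot_source = 'graph {\nn0[label = "UseN"]\nn1[label = "theatre_N"]\nn0 -- n1\n}\n'
dot_path = os.path.join(tempfile.mkdtemp(), "tree.dot")
rendered = save_graph(dot_source, dot_path, dot_path[:-4] + ".png")
```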
+
+
+
+
+
diff --git a/index.html b/index.html
index 7bf14c9e6..57e65713f 100644
--- a/index.html
+++ b/index.html
@@ -81,7 +81,8 @@ function sitesearch() {