Merge branch 'master' of www.grammaticalframework.org:/usr/local/www/GF

2026-07-13 09:02:45 -06:00 · 2017-08-28 21:37:35 +02:00
parent fe2aeb839b 4e4d07aeae
commit 685307bf0d
4 changed files with 196 additions and 41 deletions
@@ -1,6 +1,5 @@
 <html>
 	<head>
 		<link rel="stylesheet" type="text/css" href="cloud.css" title="Cloud">
 		<style>
 			body { background: #eee; }
@@ -268,7 +267,7 @@ red theatre
 </pre>
 This method produces only a single linearization. If you use variants
 in the grammar then you might want to see all possible linearizations.
-For that purpouse you should use linearizeAll:
+For that purpose you should use <tt>linearizeAll</tt>:
 <pre class="python">
 >>> for s in eng.linearizeAll(e):
       print(s)
@@ -294,8 +293,8 @@ then the right method to use is <tt>tabularLinearize</tt>:
 {'s Sg Nom': 'red theatre', 's Pl Nom': 'red theatres', 's Pl Gen': "red theatres'", 's Sg Gen': "red theatre's"}
 </pre>
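The dictionary shown above can be queried like any other Python dictionary. The following is only a minimal sketch: the table is copied by hand from the example output so that the snippet runs without a grammar, and the helper <tt>forms_with</tt> is hypothetical, not part of the <tt>pgf</tt> module. It assumes that labels are space-separated, as in the output above.

```python
# Table copied by hand from the tabularLinearize output above,
# so the snippet runs without loading a grammar.
table = {
    "s Sg Nom": "red theatre",
    "s Pl Nom": "red theatres",
    "s Pl Gen": "red theatres'",
    "s Sg Gen": "red theatre's",
}

def forms_with(table, param):
    """Return the entries whose label mentions the given parameter value.

    Hypothetical helper: assumes labels are space-separated, as in the
    example output. It is not part of the pgf module.
    """
    return {label: form for label, form in table.items()
            if param in label.split()}

# Print all plural forms in a stable order.
plural = forms_with(table, "Pl")
for label in sorted(plural):
    print(label, "->", plural[label])
```

With a real grammar the hand-written table would be replaced by the value returned from <tt>eng.tabularLinearize(e)</tt>.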
 <pre class="haskell">
-Prelude PGF2> tabularLinearize eng e
+Prelude PGF2> tabularLinearize eng e   ---- TODO
-{'s Sg Nom': 'red theatre', 's Pl Nom': 'red theatres', 's Pl Gen': "red theatres'", 's Sg Gen': "red theatre's"}
+fromList [("s Sg Nom", "red theatre"), ("s Pl Nom", "red theatres"), ("s Pl Gen", "red theatres'"), ("s Sg Gen", "red theatre's")]
 </pre>
 <pre class="java">
 for (Map.Entry&lt;String,String&gt; entry : eng.tabularLinearize(e)) {
@@ -316,20 +315,53 @@ a list of phrases:
 (CN:4 (AP:1 (A:0 red)) (CN:3 (N:2 theatre)))
 </pre>
 <pre class="haskell">
-Prelude PGF2> let [b] = bracketedLinearize eng e
+Prelude PGF2> let [b] = bracketedLinearize eng e   ---- TODO
 Prelude PGF2> print b
 (CN:4 (AP:1 (A:0 red)) (CN:3 (N:2 theatre)))
 </pre>
 <pre class="java">
 Object[] bs = eng.bracketedLinearize(e)
 </pre>
-Each bracket is actually an object of type pgf.Bracket. The property
+<span class="python">
-<tt>cat</tt> of the object gives you the name of the category and 
+Each element in the sequence above is either a string or an object
-the property children gives you a list of nested brackets.
+of type <tt>pgf.Bracket</tt>. When it is actually a bracket then
-If a phrase is discontinuous then it is represented as more than
+the object has the following properties:
-one brackets with the same category name. In that case, the index
+<ul>
-that you see in the example above will have the same value for all
+	<li><tt>cat</tt> - the syntactic category for this bracket</li>
-brackets of the same phrase.
+	<li><tt>fid</tt> - an id which identifies this bracket in the bracketed string. If there are discontinuous phrases this id will be shared for all brackets belonging to the same phrase.</li>
 	<li><tt>lindex</tt> - the constituent index</li>
 	<li><tt>fun</tt> - the abstract function for this bracket</li>
 	<li><tt>children</tt> - a list with the children of this bracket</li>
 </ul>
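The properties listed above are enough to walk a bracketed string recursively. The sketch below does not import <tt>pgf</tt>: the <tt>Bracket</tt> dataclass is a stand-in that only mirrors the documented property names, the example value is rebuilt by hand from the printed bracketing, and the <tt>lindex</tt> values are placeholders. The helpers <tt>words</tt> and <tt>categories</tt> are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Bracket:
    """Stand-in for pgf.Bracket; only the documented property names are real."""
    cat: str
    fid: int
    lindex: int
    fun: str
    children: List[Union["Bracket", str]] = field(default_factory=list)

def words(node):
    """Collect the words below a bracket (or a plain string) in order."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node.children:
        result.extend(words(child))
    return result

def categories(node):
    """List (cat, fid) pairs for every bracket, outermost first."""
    if isinstance(node, str):
        return []
    pairs = [(node.cat, node.fid)]
    for child in node.children:
        pairs.extend(categories(child))
    return pairs

# Hand-built copy of (CN:4 (AP:1 (A:0 red)) (CN:3 (N:2 theatre))).
# The lindex values (0) are placeholders, not taken from a real grammar.
b = Bracket("CN", 4, 0, "AdjCN", [
    Bracket("AP", 1, 0, "PositA", [Bracket("A", 0, 0, "red_A", ["red"])]),
    Bracket("CN", 3, 0, "UseN", [Bracket("N", 2, 0, "theatre_N", ["theatre"])]),
])

print(" ".join(words(b)))
print(categories(b))
```

Because brackets of a discontinuous phrase share the same <tt>fid</tt>, the pairs returned by <tt>categories</tt> could be grouped by <tt>fid</tt> to reassemble such phrases.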
 </span>
 <span class="haskell">
 The list above contains elements of type <tt>BracketedString</tt>.
 This type has two constructors:
 <ul>
 	<li><tt>Leaf</tt> with only one argument of type <tt>String</tt> that contains the current word</li>
 	<li><tt>Bracket</tt> with the following arguments:
 		<ul>
 			<li><tt>cat :: String</tt> - the syntactic category for this bracket</li>
 			<li><tt>fid :: Int</tt> - an id which identifies this bracket in the bracketed string. If there are discontinuous phrases this id will be shared for all brackets belonging to the same phrase.</li>
 			<li><tt>lindex :: Int</tt> - the constituent index</li>
 			<li><tt>fun :: String</tt> - the abstract function for this bracket</li>
 			<li><tt>children :: [BracketedString]</tt> - a list with the children of this bracket</li>
 		</ul>
 	</li>
 </ul>
 </span>
 <span class="java">
 Each element in the sequence above is either a string or an object
 of type <tt>Bracket</tt>. When it is actually a bracket then
 the object has the following public final variables:
 <ul>
 	<li><tt>String cat</tt> - the syntactic category for this bracket</li>
 	<li><tt>int fid</tt> - an id which identifies this bracket in the bracketed string. If there are discontinuous phrases this id will be shared for all brackets belonging to the same phrase.</li>
 	<li><tt>int lindex</tt> - the constituent index</li>
 	<li><tt>String fun</tt> - the abstract function for this bracket</li>
 	<li><tt>Object[] children</tt> - a list with the children of this bracket</li>
 </ul>
 </span>
 </p>
 The linearization works even if there are functions in the tree 
@@ -357,20 +389,45 @@ a tree into a function name and a list of arguments:
 >>> e.unpack()
 ('AdjCN', [&lt;pgf.Expr object at 0x7f7df6db78c8&gt;, &lt;pgf.Expr object at 0x7f7df6db7878&gt;])
 </pre>
-
+<pre class="haskell">
 Prelude PGF2> unApp e
 Just ("AdjCN", [..., ...])
 </pre>
 </p>
 <p>
 <span class="python">
 The result from unpack can be different depending on the form of the
 tree. If the tree is a function application then you always get
-a tuple of function name and a list of arguments. If instead the
+a tuple of a function name and a list of arguments. If instead the
 tree is just a literal string then the return value is the actual
 literal. For example the result from:
 </span>
 <pre class="python">
 >>> pgf.readExpr('"literal"').unpack()
 'literal'
 </pre>
-is just the string 'literal'. Situations like this can be detected
+<span class="haskell">
 The result from <tt>unApp</tt> is <tt>Just</tt> if the expression
 is an application and <tt>Nothing</tt> in all other cases.
 Similarly, if the tree is a literal string then the return value 
 from <tt>unStr</tt> will be <tt>Just</tt> with the actual literal. 
 For example the result from:
 </span>
 <pre class="haskell">
 Prelude PGF2> readExpr "\"literal\"" >>= unStr
 Just "literal"
 </pre>
 is just the string "literal". 
 <span class="python">Situations like this can be detected
 in Python by checking the type of the result from <tt>unpack</tt>.
 It is also possible to get an integer or a floating point number
 for the other possible literal types in GF.</span>
 <span class="haskell">
 There are also the functions <tt>unAbs</tt>, <tt>unInt</tt>, <tt>unFloat</tt> and <tt>unMeta</tt> for all other possible cases.
 </span>
 </p>
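In Python, checking the type of the result from <tt>unpack</tt> can be written as a small dispatch function. This is a sketch under the assumptions stated in the text above: an application yields a tuple of a function name and an argument list, and literals yield a string, an integer or a floating point number. The function <tt>describe</tt> is hypothetical, and any other result shape is simply rejected.

```python
def describe(result):
    """Classify a value shaped like the result of unpack().

    Hypothetical helper: only the cases described in the text are
    handled; anything else raises TypeError.
    """
    if isinstance(result, tuple):
        fun, args = result
        return "application of %s to %d argument(s)" % (fun, len(args))
    if isinstance(result, str):
        return "string literal %r" % result
    if isinstance(result, int):
        return "integer literal %d" % result
    if isinstance(result, float):
        return "float literal %r" % result
    raise TypeError("unexpected unpack() result: %r" % (result,))

# Placeholder strings stand in for the pgf.Expr arguments.
print(describe(("AdjCN", ["<arg 1>", "<arg 2>"])))
print(describe("literal"))
print(describe(42))
print(describe(3.14))
```

With a real tree the call would be <tt>describe(e.unpack())</tt>.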
 <span class="python">
 <p>
 For more complex analyses you can use the visitor pattern.
 In object-oriented languages this is just a clumsy way to do
@@ -406,10 +463,12 @@ the current tree is <tt>DetCN</tt> or <tt>AdjCN</tt>
 correspondingly. In this example we just print a message and
 we call <tt>visit</tt> recursively to go deeper into the tree.
 </p>
 </span>
 Constructing new trees is also easy. You can either use 
 <tt>readExpr</tt> to read trees from strings, or you can
 construct new trees from existing pieces. This is possible by
 <span class="python">
 using the constructor for <tt>pgf.Expr</tt>:
 <pre class="python">
 >>> quant = pgf.readExpr("DetQuant IndefArt NumSg")
@@ -417,7 +476,18 @@ using the constructor for <tt>pgf.Expr</tt>:
 >>> print(e2)
 DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA red_A) (UseN theatre_N))
 </pre>
 </span>
 <span class="haskell">
 using the functions <tt>mkApp</tt>, <tt>mkStr</tt>, <tt>mkInt</tt>, <tt>mkFloat</tt> and <tt>mkMeta</tt>:
 <pre class="haskell">
 Prelude PGF2> let Just quant = readExpr "DetQuant IndefArt NumSg"
 Prelude PGF2> let e2 = mkApp "DetCN" [quant, e]
 Prelude PGF2> print e2
 DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA red_A) (UseN theatre_N))
 </pre>
 </span>
 <span class="python">
 <h2>Embedded GF Grammars</h2>
 The GF compiler allows for easy integration of grammars in Haskell
@@ -439,6 +509,7 @@ functions:
 >>> print(App.DetCN(quant,e))
 DetCN (DetQuant IndefArt NumSg) (AdjCN (PositA red_A) (UseN house_N))
 </pre>
 </span>
 <h2>Access the Morphological Lexicon</h2>
@@ -447,18 +518,27 @@ lexicon. The first makes it possible to dump the full form lexicon.
 The following code just iterates over the lexicon and prints each
 word form with its possible analyses:
 <pre class="python">
-for entry in eng.fullFormLexicon():
+>>> for entry in eng.fullFormLexicon():
-	print(entry)
+...    print(entry)
 </pre>
 <pre class="haskell">
 Prelude PGF2> mapM_ print [(form,lemma,analysis,prob) | (form,analyses) &lt;- fullFormLexicon eng, (lemma,analysis,prob) &lt;- analyses]
 </pre>
 <pre class="java">
-for (entry in eng.fullFormLexicon()) {
+for (FullFormEntry entry : eng.fullFormLexicon()) {
-    System.out.println(entry);
+	for (MorphoAnalysis analysis : entry.getAnalyses()) {
 		System.out.println(entry.getForm()+" "+analysis.getProb()+" "+analysis.getLemma()+" "+analysis.getField());
 	}
 }
 </pre>
 The second one implements a simple lookup. The argument is a word
 form and the result is a list of analyses:
 <pre class="python">
-print(eng.lookupMorpho("letter"))
+>>> print(eng.lookupMorpho("letter"))
 [('letter_1_N', 's Sg Nom', inf), ('letter_2_N', 's Sg Nom', inf)]
 </pre>
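The list returned by <tt>lookupMorpho</tt> is a plain list of tuples, so it can be post-processed with ordinary Python. A minimal sketch, using a hand-copied version of the output above instead of a loaded grammar; the helper <tt>by_lemma</tt> is hypothetical, and the third tuple element is assumed to be the probability value, as the field names in the Haskell and Java examples suggest.

```python
from collections import defaultdict

# Copied by hand from the lookupMorpho output above.
analyses = [("letter_1_N", "s Sg Nom", float("inf")),
            ("letter_2_N", "s Sg Nom", float("inf"))]

def by_lemma(analyses):
    """Group the analysis labels by lemma (hypothetical helper)."""
    grouped = defaultdict(list)
    for lemma, analysis, prob in analyses:
        # prob is ignored here; it is assumed to be the third element.
        grouped[lemma].append(analysis)
    return dict(grouped)

print(by_lemma(analyses))
```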
 <pre class="haskell">
 Prelude PGF2> print (lookupMorpho eng "letter")
 [('letter_1_N', 's Sg Nom', inf), ('letter_2_N', 's Sg Nom', inf)]
 </pre>
 <pre class="java">
@@ -588,6 +668,7 @@ Expr e = gr.checkExpr(e,Type.readType("A"))
 pgf.TypeError: The expected type of the expression AdjCN (PositA red_A) (UseN theatre_N) is A but CN is infered
 </pre></p>
 <span class="python">
 <h2>Partial Grammar Loading</h2>
 <p>By default the whole grammar is compiled into a single file
@@ -600,12 +681,6 @@ This is done by using the option <tt>-split-pgf</tt> in the compiler:
 <pre class="python">
 $ gf -make -split-pgf App12.pgf
 </pre>
 <pre class="haskell">
 $ gf -make -split-pgf App12.pgf
 </pre>
 <pre class="java">
 $ gf -make -split-pgf App12.pgf
 </pre>
 </p>
 Now you can load the grammar as usual but this time only the
@@ -616,10 +691,6 @@ concrete syntax objects:
 >>> gr = pgf.readPGF("App.pgf")
 >>> eng = gr.languages["AppEng"]
 </pre>
 <pre class="java">
 PGF gr = PGF.readPGF("App.pgf")
 Concr eng = gr.getLanguages().get("AppEng")
 </pre>
 However, if you now try to use the concrete syntax then you will
 get an exception:
 <pre class="python">
@@ -628,12 +699,6 @@ Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
 pgf.PGFError: The concrete syntax is not loaded
 </pre>
 <pre class="java">
 eng.lookupMorpho("letter")
 Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
 pgf.PGFError: The concrete syntax is not loaded
 </pre>
 Before using the concrete syntax, you need to explicitly load it: 
 <pre class="python">
@@ -641,6 +706,47 @@ Before using the concrete syntax, you need to explicitly load it:
 >>> print(eng.lookupMorpho("letter"))
 [('letter_1_N', 's Sg Nom', inf), ('letter_2_N', 's Sg Nom', inf)]
 </pre>
 When you don't need the language anymore then you can simply
 unload it:
 <pre class="python">
 >>> eng.unload()
 </pre>
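Loading and unloading can be paired in a context manager so that the language is always released. This is only a sketch: <tt>loaded</tt> is a hypothetical helper, <tt>FakeConcr</tt> is a stand-in that records calls so the snippet runs without a grammar, and the <tt>load</tt> call with the file name <tt>AppEng.pgf_c</tt> is assumed to mirror the Java example.

```python
from contextlib import contextmanager

@contextmanager
def loaded(concr, path):
    """Load a concrete syntax for the duration of a with-block.

    Hypothetical helper: assumes the object has load(path) and unload().
    """
    concr.load(path)
    try:
        yield concr
    finally:
        # Runs even if the body of the with-block raises.
        concr.unload()

class FakeConcr:
    """Stand-in that records calls; replace with a real concrete syntax."""
    def __init__(self):
        self.calls = []
    def load(self, path):
        self.calls.append(("load", path))
    def unload(self):
        self.calls.append(("unload",))

eng = FakeConcr()
with loaded(eng, "AppEng.pgf_c") as lang:
    pass  # use the loaded language here, e.g. lang.lookupMorpho("letter")
print(eng.calls)
```

With a real grammar, <tt>eng</tt> would be the object obtained from <tt>gr.languages["AppEng"]</tt>.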
 </span>
 <span class="java">
 <h2>Partial Grammar Loading</h2>
 <p>By default the whole grammar is compiled into a single file
 which consists of an abstract syntax together with all concrete
 languages. For large grammars with many languages this might be
 inconvenient because loading becomes slower and the grammar takes
 more memory. For that purpose you could split the grammar into
 one file for the abstract syntax and one file for every concrete syntax.
 This is done by using the option <tt>-split-pgf</tt> in the compiler:
 <pre class="java">
 $ gf -make -split-pgf App12.pgf
 </pre>
 </p>
 Now you can load the grammar as usual but this time only the
 abstract syntax will be loaded. You can still use the <tt>languages</tt>
 property to get the list of languages and the corresponding
 concrete syntax objects:
 <pre class="java">
 PGF gr = PGF.readPGF("App.pgf");
 Concr eng = gr.getLanguages().get("AppEng");
 </pre>
 However, if you now try to use the concrete syntax then you will
 get an exception:
 <pre class="java">
 eng.lookupMorpho("letter")
 Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
 pgf.PGFError: The concrete syntax is not loaded
 </pre>
 Before using the concrete syntax, you need to explicitly load it: 
 <pre class="java">
 eng.load("AppEng.pgf_c")
 for (MorphoAnalysis an : eng.lookupMorpho("letter")) {
@@ -652,12 +758,10 @@ letter_2_N, s Sg Nom, inf
 When you don't need the language anymore then you can simply
 unload it:
 <pre class="python">
 >>> eng.unload()
 </pre>
 <pre class="java">
 eng.unload()
 </pre>
 </span>
 <h2>GraphViz</h2>
@@ -51,7 +51,7 @@ module PGF2 (-- * PGF
             -- * Concrete syntax
             ConcName,Concr,languages,
             -- ** Linearization
-             linearize,linearizeAll,
+             linearize,linearizeAll,tabularLinearize,
             FId, LIndex, BracketedString(..), showBracketedString, flattenBracketedString,
             alignWords,
@@ -640,6 +640,54 @@ linearizeAll lang e = unsafePerformIO $
        else do gu_pool_free pl
                throwIO (PGFError "The abstract tree cannot be linearized")
 -- | Generates a table of linearizations for an expression
 tabularLinearize :: Concr -> Expr -> Map.Map String String
 tabularLinearize lang e = unsafePerformIO $
  withGuPool $ \tmpPl -> do
    exn <- gu_new_exn tmpPl
    cts <- pgf_lzr_concretize (concr lang) (expr e) exn tmpPl
    failed <- gu_exn_is_raised exn
    if failed
      then throwExn exn
      else do ctree <- alloca $ \ptr -> do gu_enum_next cts ptr tmpPl
                                           peek ptr
              if ctree == nullPtr
                then do touchExpr e
                        return Map.empty
                else do labels <- alloca $ \p_n_lins ->
                                  alloca $ \p_labels -> do
                                    pgf_lzr_get_table (concr lang) ctree p_n_lins p_labels
                                    n_lins <- peek p_n_lins
                                    labels <- peek p_labels
                                    labels <- peekArray (fromIntegral n_lins) labels
                                    labels <- mapM peekCString labels
                                    return labels
                        lins <- collect lang ctree 0 labels exn tmpPl
                        return (Map.fromList lins)
  where
    collect lang ctree lin_idx []             exn tmpPl = return []
    collect lang ctree lin_idx (label:labels) exn tmpPl = do
      (sb,out) <- newOut tmpPl
      pgf_lzr_linearize_simple (concr lang) ctree lin_idx out exn tmpPl
      failed <- gu_exn_is_raised exn
      if failed
        then do is_nonexist <- gu_exn_caught exn gu_exn_type_PgfLinNonExist
                if is_nonexist
                  then collect lang ctree (lin_idx+1) labels exn tmpPl
                  else throwExn exn
        else do lin <- gu_string_buf_freeze sb tmpPl
                s  <- peekUtf8CString lin
                ss <- collect lang ctree (lin_idx+1) labels exn tmpPl
                return ((label,s):ss)
    throwExn exn = do
      is_exn <- gu_exn_caught exn gu_exn_type_PgfExn
      if is_exn
        then do c_msg <- (#peek GuExn, data.data) exn
                msg <- peekUtf8CString c_msg
                throwIO (PGFError msg)
        else do throwIO (PGFError "The abstract tree cannot be linearized")
 type FId    = Int
 type LIndex = Int
@@ -202,6 +202,9 @@ foreign import ccall "pgf/pgf.h pgf_lzr_wrap_linref"
 foreign import ccall "pgf/pgf.h pgf_lzr_linearize_simple"
  pgf_lzr_linearize_simple :: Ptr PgfConcr -> Ptr PgfCncTree -> CInt -> Ptr GuOut -> Ptr GuExn -> Ptr GuPool -> IO ()
 foreign import ccall "pgf/pgf.h pgf_lzr_get_table"
  pgf_lzr_get_table :: Ptr PgfConcr -> Ptr PgfCncTree -> Ptr CInt -> Ptr (Ptr CString) -> IO ()
 foreign import ccall "pgf/pgf.h pgf_align_words"
  pgf_align_words :: Ptr PgfConcr -> PgfExpr -> Ptr GuExn -> Ptr GuPool -> IO (Ptr GuSeq)
@@ -1990,7 +1990,7 @@ static PyMemberDef Bracket_members[] = {
    {"fun", T_OBJECT_EX, offsetof(BracketObject, fun), 0,
     "the abstract function for this bracket"},
    {"fid", T_INT, offsetof(BracketObject, fid), 0,
-     "an unique id which identifies this bracket in the whole bracketed string"},
+     "an id which identifies this bracket in the bracketed string. If there are discontinuous phrases this id will be shared for all brackets belonging to the same phrase."},
    {"lindex", T_INT, offsetof(BracketObject, lindex), 0,
     "the constituent index"},
    {"children", T_OBJECT_EX, offsetof(BracketObject, children), 0,