mirror of
https://github.com/GrammaticalFramework/gf-core.git
synced 2026-04-11 05:49:31 -06:00
866 lines
29 KiB
HTML
<html>
|
|
<body bgcolor="#FFFFFF" text="#000000" >
|
|
<center>
|
|
<IMG SRC="gf-logo.gif">
|
|
|
|
|
|
<h1>Grammatical Framework History of Changes</h1>
|
|
|
|
|
|
|
|
Changes in functionality since May 17, 2005, release of GF Version 2.2
|
|
|
|
</center>
|
|
|
|
<p>
|
|
|
|
25/6 (BB)
|
|
Added new speech recognition grammar printers for non-recursive SRGS grammars,
|
|
as used by Nuance Recognizer 9.0. Try <tt>pg -printer=srgs_xml_non_rec</tt>
|
|
or <tt>pg -printer=srgs_abnf_non_rec</tt>.
|
|
|
|
<p>
|
|
|
|
19/6 (AR)
|
|
Extended the functor syntax (<tt>with</tt> modules) so that the functor can have
|
|
restricted import and a module body (whose function is normally to complete restricted
|
|
import). Thus the following format is now possible:
|
|
<pre>
|
|
concrete C of A = E ** CI - [f,g] with (...) ** open R in {...}
|
|
</pre>
|
|
At the same time, the possibility of an empty module body was added to other modules
|
|
for symmetry. This can be useful for "proxy modules" that just collect other modules
|
|
without adding anything, e.g.
|
|
<pre>
|
|
abstract Math = Arithmetic, Geometry ;
|
|
</pre>
|
|
|
|
|
|
<p>
|
|
|
|
|
|
18/6 (AR)
|
|
Added a warning for clashing constants. A constant coming from multiple opened modules
|
|
was interpreted as "the first" found by the compiler, which was a source of difficult
|
|
errors. Clashing is officially forbidden, but we chose to give a warning instead of
|
|
raising an error to begin with (in version 2.8).
|
|
|
|
<p>
|
|
|
|
30/1/2007 (AR)
|
|
Semantics of variants fixed for complex types. Officially, it was only
|
|
defined for basic types (Str and parameters). When used for records, results were
|
|
multiplicative, which was not usable. But now variants should work for any type.
|
|
|
|
<p>
|
|
|
|
<hr>
|
|
|
|
<p>
|
|
|
|
22/12 (AR) <b>Release of GF version 2.7</b>.
|
|
|
|
<p>
|
|
|
|
21/12 (AR)
|
|
Overloading rules for GF version 2.7:
|
|
<ol>
|
|
<li> If a unique instance is found by exact match with argument types,
|
|
that instance is used.
|
|
<li> Otherwise, if exact match with the expected value type gives a
|
|
unique instance, that instance is used.
|
|
<li> Otherwise, if among possible instances only one returns a non-function
|
|
type, that instance is used, but a warning is issued.
|
|
<li> Otherwise, an error results, and the list of possible instances is shown.
|
|
</ol>
|
|
These rules are still experimental, but all future developments will guarantee
|
|
that their type-correct use will work. Rule (3) is only needed because the
|
|
current type checker does not always know an expected type. It can give
|
|
an incorrect result which is caught later in the compilation. Note in
particular that exact match is required. Match by subtyping will be
|
|
investigated later.
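<p>
The four rules can be sketched as follows. This is a minimal Python model, not the GF type checker: the representation of an instance as a pair of argument types and result type, the name <tt>resolve_overload</tt>, and the reading of "possible instances" as those whose leading argument types match exactly are all assumptions made for illustration.

```python
def resolve_overload(instances, arg_types, expected=None):
    """Pick one instance of an overloaded oper, following rules 1-4.

    instances: list of (argument_types, result_type) pairs (hypothetical model).
    arg_types: types of the arguments actually supplied.
    expected:  expected value type as (remaining_argument_types, result_type),
               or None when the checker does not know it.
    Returns (instance, warning); raises TypeError when rule 4 applies.
    """
    n = len(arg_types)

    def value_type(inst):
        # Type left over after consuming n arguments; () means "not a function".
        args, result = inst
        return (tuple(args[n:]), result)

    # Exact match only: no subtyping is attempted.
    possible = [i for i in instances if tuple(i[0][:n]) == tuple(arg_types)]

    # Rule 1: unique exact match with the argument types.
    if len(possible) == 1:
        return possible[0], None

    # Rule 2: unique exact match with the expected value type.
    if expected is not None:
        by_value = [i for i in possible if value_type(i) == expected]
        if len(by_value) == 1:
            return by_value[0], None

    # Rule 3: exactly one candidate returns a non-function type (with a warning).
    non_fun = [i for i in possible if value_type(i)[0] == ()]
    if len(non_fun) == 1:
        return non_fun[0], "warning: ambiguous overloading resolved by rule 3"

    # Rule 4: give up and show the candidates.
    raise TypeError("no unique overload instance; candidates: %r" % (possible,))
```

With the two hypothetical instances <tt>A -> B</tt> and <tt>A -> C -> D</tt>, an application to a single <tt>A</tt> is resolved by rule 2 when the expected type is known and by rule 3 (with a warning) when it is not.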
|
|
|
|
<p>
|
|
|
|
21/12 (BB) Java Speech Grammar Format with SISR tags can now be generated.
|
|
Use <tt>pg -printer=jsgf_sisr_old</tt>. The SISR tags are in Working Draft
|
|
20030401 format, which is supported by the OptimTALK VoiceXML interpreter
|
|
and the IBM XHTML+Voice implementation used by the Opera web browser.
|
|
|
|
<p>
|
|
|
|
21/12 (BB) <a name="voicexml"></a>
|
|
VoiceXML 2.0 dialog systems can now be generated from GF grammars.
|
|
Use <tt>pg -printer=vxml</tt>.
|
|
|
|
<p>
|
|
|
|
21/12 (BB) <a name="javascript"></a>
|
|
JavaScript code for linearization and type annotation can now be
|
|
generated from a multilingual GF grammar. Use <tt>pm -printer=js</tt>.
|
|
|
|
|
|
<p>
|
|
|
|
5/12 (BB) <a name="gfcc2c"></a>
|
|
A new tool for generating C linearization libraries
|
|
from a GFCC file. <tt>make gfcc2c</tt> in <tt>src</tt>
|
|
compiles the tool. The generated
|
|
code includes header files in <tt>lib/c</tt> and should be linked
|
|
against <tt>libgfcc.a</tt> in <tt>lib/c</tt>. For an example of
|
|
using the generated code, see <tt>src/tools/c/examples/bronzeage</tt>.
|
|
<tt>make</tt> in that directory generates a GFCC file, then generates
|
|
C code from that, and then compiles a program <tt>bronzeage-test</tt>.
|
|
The <tt>main</tt> function for that program is defined in
|
|
<tt>bronzeage-test.c</tt>.
|
|
|
|
|
|
<p>
|
|
|
|
20/11 (AR) Type error messages in concrete syntax are printed with a
|
|
heuristic where a type of the form <tt>{... ; lock_C : {} ; ...}</tt>
|
|
is printed as <tt>C</tt>. This gives more readable error messages, but
|
|
can produce wrong results if lock fields are hand-written or if subtypes
|
|
of lock-fielded categories are used.
|
|
|
|
<p>
|
|
|
|
17/11 (AR) <a name="overloading"></a>
|
|
Operation overloading: an <tt>oper</tt> can have many types,
|
|
from which one is picked at compile time. The types must have different
|
|
argument lists. Exact match with the arguments given to the <tt>oper</tt>
|
|
is required. An example is given in
|
|
<a href="../lib/resource-1.0/doc/gfdoc/Constructors.gf"><tt>Constructors.gf</tt></a>.
|
|
The purpose of overloading is to make libraries easier to use, since
|
|
only one name for each grammatical operation is needed: predication, modification,
|
|
coordination, etc. The concrete syntax is, at this experimental level, not
|
|
extended but relies on using a record with the function name repeated
|
|
as label name (see the example). The treatment of overloading is inspired
|
|
by C++, and was first suggested by Björn Bringert.
|
|
|
|
<p>
|
|
|
|
|
|
3/10 (AR) A new low-level format <tt>gfcc</tt> ("Canonical Canonical GF").
|
|
It is going to replace the <tt>gfc</tt> format later, but is already now
|
|
an efficient format for multilingual generation.
|
|
See <a href="../src/GF/Canon/GFCC/doc/gfcc.html">GFCC document</a>
|
|
for more information.
|
|
|
|
<p>
|
|
|
|
1/9 (AR) New way for managing errors in grammar compilation:
|
|
<pre>
|
|
Predef.Error : Type ;
|
|
Predef.error : Str -> Predef.Error ;
|
|
</pre>
|
|
Denotationally, <tt>Error</tt> is the empty type and thus a
|
|
subtype of any other type: it can be used anywhere. But the
|
|
<tt>error</tt> function is not canonical. Hence the compilation
|
|
is interrupted when <tt>(error s)</tt> is translated to GFC, and
|
|
the message <tt>s</tt> is emitted. An example use is given in
|
|
<tt>english/ParadigmsEng.gf</tt>:
|
|
<pre>
|
|
regDuplV : Str -> V ;
|
|
regDuplV fit =
|
|
case last fit of {
|
|
("a" | "e" | "i" | "o" | "u" | "y") =>
|
|
Predef.error (["final duplication makes no sense for"] ++ fit) ;
|
|
t =>
|
|
let fitt = fit + t in
|
|
mkV fit (fit + "s") (fitt + "ed") (fitt + "ed") (fitt + "ing")
|
|
} ;
|
|
</pre>
|
|
This function thus cannot be applied to a stem ending with a vowel,
|
|
which is exactly what we want. In future, it may be good to add similar
|
|
checks to all morphological paradigms in the resource.
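<p>
The behaviour of <tt>regDuplV</tt> can be imitated outside GF. The following is a minimal Python sketch, assuming that a raised <tt>ValueError</tt> stands in for <tt>Predef.error</tt> and that a plain five-element tuple stands in for the result of <tt>mkV</tt>; the name <tt>reg_dupl_v</tt> is hypothetical.

```python
def reg_dupl_v(fit):
    """Five verb forms with final-consonant duplication (fit, fits, fitted, ...)."""
    if not fit:
        raise ValueError("empty stem")
    last = fit[-1]
    if last in "aeiouy":
        # Plays the role of Predef.error: the paradigm refuses vowel-final stems.
        raise ValueError("final duplication makes no sense for " + fit)
    fitt = fit + last  # duplicate the final consonant
    return (fit, fit + "s", fitt + "ed", fitt + "ed", fitt + "ing")
```

Unlike the GF version, which stops grammar compilation, this sketch fails only when the function is called.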
|
|
|
|
|
|
<p>
|
|
|
|
16/8 (AR) New generation algorithm: slower but works with less
|
|
memory. Default of <tt>gt</tt>; use <tt>gt -mem</tt> for the old
|
|
algorithm. The new option <tt>gt -all</tt> lazily generates all
|
|
trees until interrupted. It cannot be piped to other GF commands,
|
|
hence use <tt>gt -all -lin</tt> to print out linearized strings
|
|
rather than trees.
|
|
|
|
<hr>
|
|
|
|
|
|
22/6 (AR) <b>Release of GF version 2.6</b>.
|
|
|
|
<p>
|
|
|
|
20/6 (AR) The FCFG parser is now the default, as it even handles literals.
|
|
The old default can be selected by <tt>p -old</tt>. Since
|
|
FCFG does not support variable bindings, <tt>-old</tt> is automatically
|
|
selected if the grammar has bindings - and unless the <tt>-fcfg</tt> flag
|
|
is used.
|
|
|
|
<p>
|
|
|
|
17/6 (AR) The FCFG parser is now the recommended method for parsing
|
|
heavy grammars such as the resource grammars. It does not yet support
|
|
literals and variable bindings.
|
|
|
|
<p>
|
|
|
|
1/6 (AR) Added the FCFG parser written by Krasimir Angelov. Invoked by
|
|
<tt>p -fcfg</tt>. This parser is as general as MCFG but faster.
|
|
It needs more testing and debugging.
|
|
|
|
<p>
|
|
|
|
1/6 (AR) The command <tt>r = reload</tt> repeats the latest
|
|
<tt>i = import</tt> command.
|
|
|
|
<p>
|
|
|
|
30/5 (AR) It is now possible to use the flags <tt>-all, -table, -record</tt>
|
|
in combination with <tt>l -multi</tt>, and also with <tt>tb</tt>.
|
|
|
|
<p>
|
|
|
|
18/5 (AR) Introduced a wordlist format <tt>gfwl</tt> for
|
|
quick creation of language exercises and (in future) multilingual lexica.
|
|
The format is now very simple:
|
|
<pre>
|
|
# Svenska - Franska - Finska
|
|
berg - montagne - vuori
|
|
klättra - grimper / escalader - kiivetä / kiipeillä
|
|
</pre>
|
|
but can be extended to cover paradigm functions in addition to just
|
|
words.
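<p>
A reader for the simple format shown above could look like this. It is a Python sketch only: the separator rules (<tt> - </tt> between languages, <tt> / </tt> between alternatives, <tt>#</tt> for the header line) are inferred from the three example lines, and the function name <tt>parse_gfwl</tt> is hypothetical.

```python
SAMPLE = """\
# Svenska - Franska - Finska
berg - montagne - vuori
klättra - grimper / escalader - kiivetä / kiipeillä
"""

def parse_gfwl(text):
    """Return (languages, entries); each entry has one list of variants per language."""
    languages, entries = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header line: the language names.
            languages = [name.strip() for name in line[1:].split(" - ")]
        else:
            # Entry line: one field per language, alternatives separated by " / ".
            entries.append([[v.strip() for v in field.split(" / ")]
                            for field in line.split(" - ")])
    return languages, entries
```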
|
|
|
|
<p>
|
|
|
|
3/4 (AR) The predefined abstract syntax type <tt>Int</tt> now has two
|
|
inherent parameters indicating its last digit and its size. The (hard-coded)
|
|
linearization type is
|
|
<pre>
|
|
{s : Str ; size : Predef.Ints 1 ; last : Predef.Ints 9}
|
|
</pre>
|
|
The <tt>size</tt> field has value <tt>1</tt> for integers greater than 9, and
|
|
value <tt>0</tt> for other integers (which are never negative). This parameter can
|
|
be used e.g. in calculating number agreement,
|
|
<pre>
|
|
  Risala i = {s = i.s ++ table (Predef.Ints 1 * Predef.Ints 9) {
    &lt;0,1&gt;  => "risalah" ;
    &lt;0,2&gt;  => "risalatan" ;
    &lt;0,_&gt; | &lt;1,0&gt;  => "rasail" ;
    _      => "risalah"
    } ! &lt;i.size,i.last&gt;
  } ;
|
|
</pre>
|
|
Notice that the table has to be typed explicitly for <tt>Ints k</tt>,
|
|
because type inference would otherwise return <tt>Int</tt> and therefore
|
|
fail to expand the table.
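<p>
The two parameters and the agreement table can be modelled as follows. This is a Python sketch of the description above, not GF code; the names <tt>lin_int</tt> and <tt>risala</tt> are hypothetical, and a single space stands in for the token concatenation <tt>++</tt>.

```python
def lin_int(n):
    """Model of the linearization record of Int: {s, size, last}."""
    if n < 0:
        raise ValueError("integer literals are never negative")
    return {"s": str(n), "size": 1 if n > 9 else 0, "last": n % 10}

def risala(n):
    """Number agreement following the Risala table, branch by branch."""
    i = lin_int(n)
    key = (i["size"], i["last"])
    if key == (0, 1):
        word = "risalah"
    elif key == (0, 2):
        word = "risalatan"
    elif key[0] == 0 or key == (1, 0):
        word = "rasail"
    else:
        word = "risalah"
    return i["s"] + " " + word
```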
|
|
|
|
|
|
<p>
|
|
|
|
31/3 (AR) Added flags and options to some commands, to help generation:
|
|
<ul>
|
|
<li> <tt>gt -noexpand=NP,V,TV</tt> does not expand these categories,
|
|
but only generates metavariables for them.
|
|
<li> <tt>gt -doexpand=NP,V,TV</tt> only expands these categories,
|
|
and generates metavariables for others.
|
|
<li> <tt>gr -cf</tt> has the same flags.
|
|
<li> <tt>l -mark=metacat</tt> marks the metavariables with their categories.
|
|
<li> <tt>p -fail</tt> marks with <tt>#FAIL</tt> strings that have no parse.
|
|
<li> <tt>p -ambiguous</tt> marks as <tt>#AMBIGUOUS</tt>
|
|
strings that have more than one parse.
|
|
</ul>
|
|
|
|
<p>
|
|
|
|
<hr>
|
|
|
|
21/3/2006 <b>Release of GF 2.5</b>.
|
|
|
|
<p>
|
|
|
|
16/3 (AR) Added two flag values to <tt>pt -transform=X</tt>:
|
|
<tt>nodup</tt> which excludes terms where a constant is duplicated,
|
|
and
|
|
<tt>nodupatom</tt> which excludes terms where an atomic constant is duplicated.
|
|
The latter, in particular, is useful as a filter in generation:
|
|
<pre>
|
|
gt -cat=Cl | pt -transform=nodupatom
|
|
</pre>
|
|
This gives a corpus where words don't (usually) occur twice in the same clause.
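<p>
The difference between the two filters can be illustrated on a toy tree representation. The Python sketch below assumes that a tree is a tuple <tt>(function, child, ...)</tt> and that an "atomic constant" is a function without arguments; the tree shapes and function names in the usage note are illustrative only.

```python
from collections import Counter

def constants(tree, atoms_only=False):
    """Yield the function names in a tree; with atoms_only, only the leaves."""
    fun, *children = tree
    if not (atoms_only and children):
        yield fun
    for child in children:
        yield from constants(child, atoms_only)

def nodup(tree):
    """True if no constant occurs twice in the tree."""
    return all(count == 1 for count in Counter(constants(tree)).values())

def nodupatom(tree):
    """True if no zero-argument constant occurs twice in the tree."""
    counts = Counter(constants(tree, atoms_only=True))
    return all(count == 1 for count in counts.values())
```

A tree such as <tt>Pred (UsePN John) (Compl Love (UsePN Mary))</tt> passes <tt>nodupatom</tt> but not <tt>nodup</tt>, because <tt>UsePN</tt> is repeated while no word is.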
|
|
|
|
<p>
|
|
|
|
6/3 (AR) Generalized the <tt>gfe</tt> file format in two ways:
|
|
<ol>
|
|
<li> Use the real grammar parser, hence <tt>(in M.C "foo")</tt> expressions
|
|
may occur anywhere. But the <i>ad hoc</i> word substitution syntax is
|
|
abandoned: ordinary <tt>let</tt> (and <tt>where</tt>) expressions
|
|
can now be used instead.
|
|
<li> The resource may now be a treebank, not just a grammar. Parsing
|
|
is thus replaced by treebank lookup, which in most cases is faster.
|
|
</ol>
|
|
A minor novelty is that the <tt>--# -resource=FILE</tt> flag can now be
|
|
relative to <tt>GF_LIB_PATH</tt>, both for grammars and treebanks.
|
|
The flag <tt> --# -treebank=IDENT</tt> gives the language whose treebank
|
|
entries are used, in case of a multilingual treebank.
|
|
|
|
<p>
|
|
|
|
4/3 (AR) Added command <tt>use_treebank = ut</tt> for lookup in a treebank.
|
|
This command can be used as a fast substitute for parsing, but also as a
|
|
way to browse treebanks.
|
|
<pre>
|
|
ut "He adds this to that" | l -multi -- use treebank lookup as parser in translation
|
|
ut -assocs | grep "ComplV2" -- show all associations with ComplV2
|
|
</pre>
|
|
|
|
<p>
|
|
|
|
3/3 (AR) Added option <tt>-treebank</tt> to the <tt>i</tt> command. This adds treebanks to
|
|
the shell state. The possible file formats are
|
|
<ol>
|
|
<li> XML file with a multilingual treebank, produced by <tt>tb -xml</tt>
|
|
<li> tab-organized text file with a unilingual treebank, produced by <tt>ut -assocs</tt>
|
|
</ol>
|
|
Notice that the treebanks in shell state are unilingual, and have strings as keys.
|
|
Multilingual treebanks have trees as keys. In case 1, one unilingual treebank per
|
|
language is built in the shell state.
|
|
|
|
|
|
<p>
|
|
|
|
1/3 (AR) Added option <tt>-trees</tt> to the command <tt>tree_bank = tb</tt>.
|
|
By this option, the command just returns the trees in the treebank. It can be
|
|
used for producing new treebanks with the same trees:
|
|
<pre>
|
|
rf old.xml | tb -trees | tb -xml | wf new.xml
|
|
</pre>
|
|
Recall that only treebanks in the XML format can be read with the <tt>-trees</tt>
|
|
and <tt>-c</tt> flags.
|
|
|
|
<p>
|
|
|
|
1/3 (AR) A <tt>.gfe</tt> file can have a <tt>--# -path=PATH</tt> on its
|
|
second line. The file given on the first line (<tt>--# -resource=FILE</tt>)
|
|
is then read w.r.t. this path. This is useful if the resource file has
|
|
no path itself, which happens when it is gfc-only.
|
|
|
|
<p>
|
|
|
|
25/2 (AR) The flag <tt>preproc</tt> of the <tt>i</tt> command (and thereby
of <tt>gf</tt> itself) causes GF to apply a preprocessor to each source file
|
|
it reads.
|
|
|
|
<p>
|
|
|
|
8/2 (AR) The command <tt>tb = tree_bank</tt> for creating and testing against
|
|
multilingual treebanks. Example uses:
|
|
<pre>
|
|
gr -cat=S -number=100 | tb -xml | wf tb.xml -- random treebank into file
|
|
rf tb.txt | tb -c -- read comparison treebank from file
|
|
</pre>
|
|
|
|
<p>
|
|
|
|
10/1 (AR) Forbade variable binding inside negation and Kleene star
|
|
patterns.
|
|
|
|
<p>
|
|
|
|
7/1 (AR) Full set of regular expression patterns, with
|
|
as-patterns to enable variable bindings to matched expressions:
|
|
<ul>
|
|
<li> <i>p</i> <tt>+</tt> <i>q</i> : token consisting of <i>p</i> followed by <i>q</i>
|
|
<li> <i>p</i> <tt>*</tt> : token <i>p</i> repeated 0 or more times
|
|
     (max the length of the string to be matched)
|
|
<li> <tt>-</tt> <i>p</i> : matches anything that <i>p</i> does not match
|
|
<li> <i>x</i> <tt>@</tt> <i>p</i> : bind to <i>x</i> what <i>p</i> matches
|
|
<li> <i>p</i> <tt>|</tt> <i>q</i> : matches what either <i>p</i> or <i>q</i> matches
|
|
</ul>
|
|
The last three apply to all types of patterns, the first two only to token strings.
|
|
Example: plural formation in Swedish 2nd declension
|
|
(<i>pojke-pojkar, nyckel-nycklar, seger-segrar, bil-bilar</i>):
|
|
<pre>
|
|
plural2 : Str -> Str = \w -> case w of {
|
|
pojk + "e" => pojk + "ar" ;
|
|
nyck + "e" + l@("l" | "r" | "n") => nyck + l + "ar" ;
|
|
bil => bil + "ar"
|
|
} ;
|
|
</pre>
|
|
Semantics: variables are always bound to the <b>first match</b>, in the sequence defined
|
|
as the list <tt>Match p v</tt> as follows:
|
|
<pre>
|
|
Match (p1|p2) v = Match p1 v ++ Match p2 v
|
|
Match (p1+p2) s = [Match p1 s1 ++ Match p2 s2 | i <- [0..length s], (s1,s2) = splitAt i s]
|
|
Match p* s = Match "" s ++ Match p s ++ Match (p + p) s ++ ...
|
|
Match c v = [[]] if c == v -- for constant patterns c
|
|
Match x v = [[(x,v)]] -- for variable patterns x
|
|
Match x@p v = [[(x,v)]] + M if M = Match p v /= []
|
|
Match p v = [] otherwise -- failure
|
|
</pre>
|
|
Examples:
|
|
<ul>
|
|
<li> <tt>x + "e" + y</tt> matches <tt>"peter"</tt> with <tt>x = "p", y = "ter"</tt>
|
|
<li> <tt>x@("foo"*)</tt> matches any token with <tt>x = ""</tt>
|
|
<li> <tt>x + y@("er"*)</tt> matches <tt>"burgerer"</tt> with <tt>x = "burg", y = "erer"</tt>
|
|
</ul>
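<p>
The <tt>Match</tt> equations can be tried out with a small interpreter. The following Python sketch is one reading of them, not the GF implementation: patterns are encoded as tuples built by the hypothetical helpers <tt>lit</tt>, <tt>var</tt>, <tt>alt</tt>, <tt>cat</tt>, <tt>star</tt> and <tt>as_</tt>, negation is left out, and <tt>+</tt> is taken to be left-associative.

```python
def lit(s): return ("lit", s)
def var(x): return ("var", x)
def alt(p, q): return ("alt", p, q)
def cat(p, q): return ("cat", p, q)
def star(p): return ("star", p)
def as_(x, p): return ("as", x, p)

def match(p, s):
    """All ways of matching pattern p against string s, as lists of bindings."""
    kind = p[0]
    if kind == "lit":
        return [[]] if p[1] == s else []
    if kind == "var":
        return [[(p[1], s)]]
    if kind == "alt":
        return match(p[1], s) + match(p[2], s)
    if kind == "cat":
        # Try every split point, shortest left part first.
        return [b1 + b2
                for i in range(len(s) + 1)
                for b1 in match(p[1], s[:i])
                for b2 in match(p[2], s[i:])]
    if kind == "star":
        # "" , p , p + p , ... up to len(s) repetitions.
        results, rep = [], lit("")
        for _ in range(len(s) + 1):
            results += match(rep, s)
            rep = cat(rep, p[1])
        return results
    if kind == "as":
        return [[(p[1], s)] + b for b in match(p[2], s)]
    raise ValueError("unknown pattern: %r" % (p,))

def first_match(p, s):
    """Bindings of the first match as a dict, or None on failure."""
    results = match(p, s)
    return dict(results[0]) if results else None

def plural2(w):
    """The Swedish 2nd declension example, one (pattern, result) pair per branch."""
    branches = [
        (cat(var("pojk"), lit("e")),
         lambda b: b["pojk"] + "ar"),
        (cat(cat(var("nyck"), lit("e")),
             as_("l", alt(alt(lit("l"), lit("r")), lit("n")))),
         lambda b: b["nyck"] + b["l"] + "ar"),
        (var("bil"),
         lambda b: b["bil"] + "ar"),
    ]
    for pattern, result in branches:
        bindings = first_match(pattern, w)
        if bindings is not None:
            return result(bindings)
```

Under this reading, <tt>x + "e" + y</tt> against <tt>"peter"</tt> and <tt>x + y@("er"*)</tt> against <tt>"burgerer"</tt> give the bindings listed in the examples, and <tt>plural2</tt> produces the four plural forms quoted above.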
|
|
<p>
|
|
|
|
6/1 (AR) Concatenative string patterns to help morphology definitions...
|
|
This can be seen as a step towards regular expression string patterns.
|
|
The natural notation <tt>p1 + p2</tt> will be considered later.
|
|
<b>Note</b>. This was done on 7/1.
|
|
|
|
<p>
|
|
|
|
5/1/2006 (BB) New grammar printers <tt>slf_sub</tt> and <tt>slf_sub_graphviz</tt>
|
|
for creating SLF networks with sub-automata.
|
|
|
|
<hr>
|
|
|
|
22/12 <b>Release of GF 2.4</b>.
|
|
|
|
<p>
|
|
|
|
21/12 (AR) It is now possible to parse escaped string literals from the command
|
|
line, and also string literals with spaces:
|
|
<pre>
|
|
gf examples/tram0/TramEng.gf
|
|
> p -lexer=literals "I want to go to \"Gustaf Adolfs torg\" ;"
|
|
QInput (GoTo (DestNamed "Gustaf Adolfs torg"))
|
|
</pre>
|
|
|
|
<p>
|
|
|
|
20/12 (AR) Support for full disjunctive patterns (<tt>P|Q</tt>) i.e.
|
|
not just on top level.
|
|
|
|
<p>
|
|
|
|
14/12 (BB) The command <tt>si</tt> (<tt>speech_input</tt>) which creates
|
|
a speech recognizer from a grammar for English and admits speech input
|
|
of strings has been added. The command uses an
|
|
<a href="http://htk.eng.cam.ac.uk/develop/atk.shtml">ATK</a> recognizer and
|
|
creates a recognition
|
|
network which accepts strings in the currently active grammar.
|
|
In order to use the <tt>si</tt> command,
|
|
you need to install the
|
|
<a href="http://www.cs.chalmers.se/~bringert/darcs/atkrec/">atkrec library</a>
|
|
and configure GF with <tt>./configure --with-atk</tt> before compiling.
|
|
You need to set two environment variables for the <tt>si</tt> command to
|
|
work. <tt>ATK_HOME</tt> should contain the path to your copy of ATK
|
|
and <tt>GF_ATK_CFG</tt> should contain the path to your GF ATK configuration
|
|
file. A default version of this file can be found in
|
|
<tt>GF/src/gf_atk.cfg</tt>.
|
|
|
|
|
|
<p>
|
|
|
|
11/12 (AR) Parsing of float literals now possible in object language.
|
|
Use the flag <tt>lexer=literals</tt>.
|
|
|
|
<p>
|
|
|
|
6/12 (AR) Accept <tt>param</tt> and <tt>oper</tt> definitions in
|
|
<tt>concrete</tt> modules. The definitions are just inlined in the
|
|
current module and not inherited. The purpose is to support rapid
|
|
prototyping of grammars.
|
|
|
|
<p>
|
|
|
|
2/12 (AR) The built-in type <tt>Float</tt> added to abstract syntax (and
|
|
resource). Values are stored as Haskell's <tt>Double</tt> precision
|
|
floats. For the syntax of float literals, see BNFC document.
|
|
NB: some bug still prevents parsing float literals in object
|
|
languages. <b>Bug fixed 11/12.</b>
|
|
|
|
<p>
|
|
|
|
1/12 (BB,AR) The command <tt>at = apply_transfer</tt>, which applies
|
|
a transfer function to a term. This is used for noncompositional
|
|
translation. Transfer functions are defined in a special transfer
|
|
language (file suffix <tt>.tr</tt>), which is compiled into a
|
|
run-time transfer core language (file suffix <tt>.trc</tt>).
|
|
The compiler is included in <tt>GF/transfer</tt>. The following is
|
|
a complete example of how to try out transfer:
|
|
<pre>
|
|
% cd GF/transfer
|
|
% make -- compile the trc compiler
|
|
% cd examples -- GF/transfer/examples
|
|
% ../compile_to_core -i../lib numerals.tr
|
|
% mv numerals.trc ../../examples/numerals
|
|
% cd ../../examples/numerals -- GF/examples/numerals
|
|
% gf
|
|
> i decimal.gf
|
|
> i BinaryDigits.gf
|
|
> i numerals.trc
|
|
> p -lang=Cncdecimal "123" | at num2bin | l
|
|
1 0 0 1 1 0 0 1 1 1 0
|
|
</pre>
|
|
Other relevant commands are:
|
|
<ul>
|
|
<li> <tt>i file.trc</tt>: import a transfer module
|
|
<li> <tt>pg -printer=transfer</tt>: create a syntax datatype in <tt>.tr</tt> format
|
|
</ul>
|
|
For more information on the commands, see <tt>help</tt>. Documentation on
|
|
the transfer language: to appear.
|
|
|
|
<p>
|
|
|
|
17/11 (AR) Made it possible for lexers to be nondeterministic.
|
|
The current implementation is simple-minded: the parser is sent
each lexing result in turn. The option <tt>-cut</tt> stops
after the first lexing that leads to a successful parse. The only
|
|
nondeterministic lexer right now is <tt>-lexer=subseqs</tt>, which
|
|
first filters with <tt>-lexer=ignore</tt> (dropping words neither in
|
|
the grammar nor literals) and then starts ignoring other words from
|
|
longest to shortest subsequence. This is usable for parser tasks
|
|
of keyword spotting type, but expensive (2<sup>n</sup>) in long input.
|
|
A smarter implementation is therefore desirable.
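<p>
The cost of the subsequence lexer is easy to see in a small model. The Python sketch below enumerates subsequences from longest to shortest and stops at the first one that parses, as <tt>-cut</tt> does; the function names and the <tt>parse</tt> callback are hypothetical, and the order among subsequences of equal length is an assumption.

```python
from itertools import combinations

def subseqs(tokens):
    """Yield all subsequences of tokens, longest first (2^n of them)."""
    n = len(tokens)
    for k in range(n, -1, -1):
        for indices in combinations(range(n), k):
            yield [tokens[i] for i in indices]

def parse_with_cut(tokens, parse):
    """Send each lexing result to the parser; stop at the first success."""
    for candidate in subseqs(tokens):
        trees = parse(candidate)
        if trees:
            return candidate, trees
    return None
```

An input of 20 words already yields more than a million candidate lexings, which is why the entry calls the method expensive.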
|
|
|
|
<p>
|
|
|
|
14/11 (AR) Functions can be made unparsable (or "internal" as
|
|
in BNFC). This is done by <tt>i -noparse=file</tt>, where
|
|
the nonparsable functions are given in <tt>file</tt> using the
|
|
line format <tt>--# noparse Funs</tt>. This can be used e.g. to
|
|
rule out expensive parsing rules. It is used in
|
|
<tt>lib/resource/abstract/LangVP.gf</tt> to get parse values
|
|
structured with <tt>VP</tt>, which is obtained via transfer.
|
|
So far only the default (= old) parser generator supports this.
|
|
|
|
<p>
|
|
|
|
14/11 (AR) Removed the restrictions on what a lincat may look like.
|
|
Now any record type that has a value in GFC (i.e. without any
|
|
functions in it) can be used, e.g. <tt>{np : NP ; cn : Bool => CN}</tt>.
|
|
To display linearization values, only <tt>l -record</tt> shows
|
|
nice results.
|
|
|
|
<p>
|
|
|
|
9/11 (AR) GF shell state can now have several abstract syntaxes with
|
|
their associated concrete syntaxes. This allows e.g. parsing with
|
|
resource while testing an application. One can also have a
|
|
parse-transfer-lin chain from one abstract syntax to another.
|
|
|
|
<p>
|
|
7/11 (BB) Running commands can now be interrupted with Ctrl-C, without
|
|
killing the GF process. This feature is not supported on Windows.
|
|
|
|
<p>
|
|
|
|
1/11 (AR) Yet another method for adding probabilities: append
|
|
<tt> --# prob Double</tt> to the end of a line defining a function.
|
|
This can be (1) a <tt>.cf</tt> rule (2) a <tt>fun</tt> rule, or
|
|
(3) a <tt>lin</tt> rule. The probability is attached to the
|
|
first identifier on the line.
|
|
|
|
<p>
|
|
1/11 (BB) Added generation of weighted SRGS grammars. The weights
|
|
are calculated from the function probabilities. The algorithm
|
|
for calculating the weights is not yet very good.
|
|
Use <tt>pg -printer=srgs_xml_prob</tt>.
|
|
|
|
<p>
|
|
31/10 (BB) Added option for converting grammars to SRGS grammars in XML format.
|
|
Use <tt>pg -printer=srgs_xml</tt>.
|
|
|
|
<p>
|
|
|
|
31/10 (AR) Probabilistic grammars. Probabilities can be used to
|
|
weight random generation (<tt>gr -prob</tt>) and to rank parse
|
|
results (<tt>p -prob</tt>). They are read from a separate file
|
|
(flag <tt>i -probs=File</tt>, format <tt>--# prob Fun Double</tt>)
|
|
or from the top-level grammar file itself (option <tt>i -prob</tt>).
|
|
To see the probabilities, use <tt>pg -printer=probs</tt>.
|
|
<br>
|
|
As a by-product, the probabilistic random generation algorithm is
|
|
available for any context-free abstract syntax. Use the flag
|
|
<tt>gr -cf</tt>. This algorithm is much faster than the
|
|
old (more general) one, but it may sometimes loop.
|
|
|
|
<p>
|
|
|
|
12/10 (AR) Flag <tt>-atoms=Int</tt> to the command <tt>gt = generate_trees</tt>
|
|
takes away all zero-argument functions except Int per category. In
|
|
this way, it is possible to generate a corpus illustrating each
|
|
syntactic structure even when the lexicon (which consists of
|
|
zero-argument functions) is large.
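<p>
The effect of the flag can be pictured as a filter on the function declarations. This Python sketch assumes a declaration is a triple of name, argument categories and value category, and that the atoms kept are simply the first ones encountered; both points, and the name <tt>restrict_atoms</tt>, are assumptions.

```python
def restrict_atoms(funs, k):
    """Keep every function with arguments, but at most k atoms per category."""
    kept, seen = [], {}
    for name, args, category in funs:
        if not args:
            seen[category] = seen.get(category, 0) + 1
            if seen[category] > k:
                continue  # this category already has k zero-argument functions
        kept.append((name, args, category))
    return kept
```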
|
|
|
|
<p>
|
|
|
|
6/10 (AR) New commands <tt>dc = define_command</tt> and
|
|
<tt>dt = define_tree</tt> to define macros in a GF session.
|
|
See <tt>help</tt> for details and examples.
|
|
|
|
<p>
|
|
|
|
5/10 (AR) Printing missing linearization rules:
|
|
<tt>pm -printer=missing</tt>. Command <tt>g = grep</tt>,
|
|
which works in a way similar to Unix grep.
|
|
|
|
<p>
|
|
|
|
5/10 (PL) Printing graphs with function and category dependencies:
|
|
<tt>pg -printer=functiongraph</tt>, <tt>pg -printer=typegraph</tt>.
|
|
|
|
<p>
|
|
|
|
20/9 (AR) Added optimization by <b>common subexpression elimination</b>.
|
|
It works on GFC modules and creates <tt>oper</tt> definitions for
|
|
subterms that occur more than once in <tt>lin</tt> definitions. These
|
|
<tt>oper</tt> definitions are automatically reinlined in functionalities
|
|
that don't support <tt>oper</tt>s in GFC. This conversion is done by
|
|
module and the <tt>oper</tt>s are not inherited. Moreover, the subterms
|
|
can contain free variables which means that the <tt>oper</tt>s are not
|
|
always well typed. However, since all variables in GFC are type-specific
|
|
(and local variables are <tt>lin</tt>-specific), this does not destroy
|
|
subject reduction or cause illegal captures.
|
|
<br>
|
|
The optimization is triggered by the flag <tt>optimize=OPT_subs</tt>,
|
|
where <tt>OPT</tt> is any of the other optimizations (see <tt>h -optimize</tt>).
|
|
The most aggressive value of the flag is <tt>all_subs</tt>. In experiments,
|
|
the size of a GFC module can shrink by 85% compared to plain <tt>all</tt>.
|
|
|
|
<p>
|
|
|
|
18/9 (AR) Removed superfluous spaces from GFC printing. This shrinks
|
|
the GFC size by 5-10%.
|
|
|
|
<p>
|
|
|
|
15/9 (AR) Fixed some bugs in dependent-type type checking of abstract
|
|
modules at compile time. The type checker is more severe now, which means
|
|
that some old grammars may fail to compile - but this is usually the
|
|
right result. However, the type checker of <tt>def</tt> judgements still
|
|
needs work.
|
|
|
|
<p>
|
|
|
|
14/9 (AR) Added printing of grammars to a format without parameters, in
|
|
the spirit of Peano's "Latino sine flexione". The command <tt>pg -unpar</tt>
|
|
does the trick, and the result can be saved in a <tt>gfcm</tt> file. The generated
|
|
concrete syntax modules get the prefix <tt>UP_</tt>. The translation is briefly:
|
|
<pre>
|
|
(P => T)* = T*
|
|
(t ! p)* = t*
|
|
(table {p => t ; ...})* = t*
|
|
</pre>
|
|
In order for this to be maximally useful, the grammar should be written in such
|
|
a way that the first value of every parameter type is the desired one. For
|
|
instance, in Peano's case it would be the ablative for noun cases, the singular for
|
|
numbers, and the 2nd person singular imperative for verb forms.
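<p>
The three equations can be read as a recursive erasure of tables and selections. The Python sketch below works on a toy term and type representation (tagged tuples) that is assumed for illustration and is not the GFC format; the function names are hypothetical.

```python
def unpar_term(t):
    """Erase parameters from a term, keeping the first branch of every table."""
    tag = t[0]
    if tag == "table":     # (table {p => t ; ...})* = t*
        return unpar_term(t[1][0][1])
    if tag == "select":    # (t ! p)* = t*
        return unpar_term(t[1])
    if tag == "concat":
        return ("concat", unpar_term(t[1]), unpar_term(t[2]))
    if tag == "record":
        return ("record", {label: unpar_term(v) for label, v in t[1].items()})
    return t               # strings and variables are left unchanged

def unpar_type(ty):
    """Erase parameters from a type."""
    tag = ty[0]
    if tag == "tabletype":  # (P => T)* = T*
        return unpar_type(ty[2])
    if tag == "rectype":
        return ("rectype", {label: unpar_type(v) for label, v in ty[1].items()})
    return ty
```

Because the first branch is always chosen, the order of parameter values in the source grammar decides which form survives, as the paragraph above points out.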
|
|
|
|
<p>
|
|
|
|
14/9 (BB) Added finite state approximation of grammars.
|
|
Internally the conversion is done <tt>cfg -> regular -> fa -> slf</tt>, so the
|
|
different printers can be used to check the output of each stage.
|
|
The new options are:
|
|
<dl>
|
|
<dt><tt>pg -printer=slf</tt></dt>
|
|
<dd>A finite automaton in the HTK SLF format.</dd>
|
|
<dt><tt>pg -printer=slf_graphviz</tt></dt>
|
|
<dd>The same FA as in SLF, but in Graphviz format.</dd>
|
|
<dt><tt>pg -printer=fa_graphviz</tt></dt>
|
|
<dd>A finite automaton with labelled edges, instead of labelled nodes which SLF has.</dd>
|
|
<dt><tt>pg -printer=regular</tt></dt>
|
|
<dd>A regular grammar in a simple BNF.</dd>
|
|
</dl>
|
|
|
|
<p>
|
|
|
|
4/9 (AR) Added the option <tt>pg -printer=stat</tt> to show
|
|
statistics of gfc compilation result. To be extended with new information.
|
|
The most important stats now are the top-40 sized definitions.
|
|
|
|
<p>
|
|
<hr>
|
|
|
|
1/7 <b>Release of GF 2.3</b>.
|
|
|
|
<p>
|
|
|
|
|
|
1/7 (AR) Added the flag <tt>-o</tt> to the <tt>vt</tt> command
|
|
to just write the <tt>.dot</tt> file without going to <tt>.ps</tt>
|
|
(cf. 20/6).
|
|
|
|
<p>
|
|
|
|
29/6 (AR) The printer used by Embedded Java GF Interpreter
|
|
(<tt>pm -header</tt>) now produces
|
|
working code from all optimized grammars - hence you need not select a
|
|
weaker optimization just to use the interpreter. However, the
|
|
optimization <tt>-optimize=share</tt> usually produces smaller object
|
|
grammars because the "unoptimizer" just undoes all optimizations.
|
|
(This is to be considered a temporary solution until the interpreter
|
|
knows how to handle stronger optimizations.)
|
|
|
|
<p>
|
|
|
|
27/6 (AR) The flag <tt>flags optimize=noexpand</tt> placed in a
|
|
resource module prevents the optimization phase of the compiler when
|
|
the <tt>.gfr</tt> file is created. This can prevent serious code
|
|
explosion, but it will also make the processing of modules using the
|
|
resource slower. A favourable example is <tt>lib/resource/finnish/ParadigmsFin</tt>.
|
|
|
|
<p>
|
|
|
|
23/6 (HD,AR) The new editor GUI <tt>gfeditor</tt> by Hans-Joachim
|
|
Daniels can now be used. It is based on Janna Khegai's <tt>jgf</tt>.
|
|
New functionality includes HTML display (<tt>gfeditor -h</tt>) and
|
|
programmable refinement tooltips.
|
|
|
|
<p>
|
|
|
|
23/6 (AR) The flag <tt>unlexer=finnish</tt> can be used to bind
|
|
Finnish suffixes (e.g. possessives) to preceding words. The GF source
|
|
notation is e.g. <tt>"isä" ++ "&*" ++ "nsa" ++ "&*" ++ "ko"</tt>,
|
|
which unlexes to <tt>"isänsäkö"</tt>. There is no corresponding lexer
|
|
support yet.
|
|
|
|
|
|
<p>
|
|
|
|
22/6 (PL,AR) The MCFG parser (<tt>p -mcfg</tt>) now works on all
|
|
optimized grammars - hence you need not select a weaker optimization
|
|
to use this parser. The same concerns the CFGM printer (<tt>pm -printer=cfgm</tt>).
|
|
|
|
<p>
|
|
|
|
20/6 (AR) Added the command <tt>visualize_tree</tt> = <tt>vt</tt>, to
|
|
display syntax trees graphically. Like <tt>vg</tt>, this command uses
|
|
GraphViz and Ghostview. The foremost use is to pipe the parser to this
|
|
command.
|
|
|
|
<p>
|
|
|
|
17/6 (BB) There is now support for lists in GF abstract syntax.
|
|
A list category is declared as:
|
|
<pre>
|
|
cat [C]
|
|
</pre>
|
|
or
|
|
<pre>
|
|
cat [C]{n}
|
|
</pre>
|
|
where <tt>C</tt> is a category and <tt>n</tt> is a non-negative integer.
|
|
<tt>cat [C]</tt> is equivalent to <tt>cat [C]{0}</tt>. List category
|
|
syntax can be used wherever categories are used.
|
|
|
|
<p>
|
|
|
|
<tt>cat [C]{n}</tt> is equivalent to the declarations:
|
|
<pre>
|
|
cat ListC
|
|
fun BaseC : C^n -> ListC
|
|
fun ConsC : C -> ListC -> ListC
|
|
</pre>
|
|
|
|
where <tt>C^0 -> X</tt> means <tt>X</tt>, and <tt>C^m</tt> (where
|
|
m > 0) means <tt>C -> C^(m-1)</tt>.
|
|
|
|
<p>
|
|
|
|
A lincat declaration of the form:
|
|
<pre>
|
|
lincat [C] = T
|
|
</pre>
|
|
is equivalent to
|
|
<pre>
|
|
lincat ListC = T
|
|
</pre>
|
|
|
|
The linearizations of the list constructors are written
|
|
just like they would be if the function declarations above
|
|
had been made manually, e.g.:
|
|
<pre>
|
|
lin BaseC x_1 ... x_n = t
|
|
lin ConsC x xs = t'
|
|
</pre>
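<p>
The expansion of <tt>cat [C]{n}</tt> into ordinary declarations is mechanical. As an illustration, the Python sketch below prints the three equivalent declarations as text; the function name <tt>expand_list_cat</tt> is hypothetical and the output is only meant to mirror the equivalence stated above.

```python
def expand_list_cat(cat, n=0):
    """Declarations equivalent to `cat [C]{n}`, as strings."""
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    list_cat = "List" + cat
    # C^n -> ListC: n copies of C followed by the list category.
    base_type = " -> ".join([cat] * n + [list_cat])
    return [
        "cat " + list_cat,
        "fun Base" + cat + " : " + base_type,
        "fun Cons" + cat + " : " + cat + " -> " + list_cat + " -> " + list_cat,
    ]
```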
|
|
|
|
<p>
|
|
|
|
10/6 (AR) Preprocessing of <tt>.gfe</tt> files can now be performed as part of
|
|
any grammar compilation. The flag <tt>-ex</tt> causes GF to look for
|
|
the <tt>.gfe</tt> files and preprocess those that are younger
|
|
than the corresponding <tt>.gf</tt> files. The files are first sorted
|
|
and grouped by the resource, so that each resource only need be compiled once.
|
|
|
|
<p>
|
|
|
|
10/6 (AR) Editor GUI can now be alternatively invoked by the shell
|
|
command <tt>gf -edit</tt> (equivalent to <tt>jgf</tt>).
|
|
|
|
<p>
|
|
|
|
10/6 (AR) Editor GUI command <tt>pc Int</tt> to pop <tt>Int</tt>
|
|
items from the clip board.
|
|
|
|
<p>
|
|
|
|
4/6 (AR) Sequence of commands in the Java editor GUI now possible.
|
|
The commands are separated by <tt> ;; </tt> (notice the space on
|
|
both sides of the two semicolons). Such a sequence can be sent
|
|
from the "GF Command" pop-up field, but is mostly intended
|
|
for external processes that communicate with GF.
|
|
|
|
<p>
|
|
|
|
3/6 (AR) The format <tt>.gfe</tt> defined to support
|
|
<b>grammar writing by examples</b>. Files of this format are first
|
|
converted to <tt>.gf</tt> files by the command
|
|
<pre>
|
|
gf -examples File.gfe
|
|
</pre>
|
|
See <a href="../lib/resource/doc/example/QuestionsI.gfe">
|
|
<tt>../lib/resource/doc/examples/QuestionsI.gfe</tt></a>
|
|
for an example.
|
|
|
|
<p>
|
|
|
|
31/5 (AR) Default of <tt>p -rawtrees=k</tt> changed to 999999.
|
|
|
|
<p>
|
|
|
|
31/5 (AR) Support for restricted inheritance. Syntax:
|
|
<pre>
|
|
M -- inherit everything from M, as before
|
|
M [a,b,c] -- only inherit constants a,b,c
|
|
M-[a,b,c] -- inherit everything except a,b,c
|
|
</pre>
|
|
Caution: there is no check yet for completeness and
consistency, so restricted inheritance can create
|
|
run-time failures.
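<p>
The three forms amount to simple set operations on the constants of the inherited module. The Python sketch below parses the notation and applies it to a table of module contents; the function name <tt>inherit</tt>, the regular expression and the module table are assumptions, and like GF at this stage it does no completeness or consistency check.

```python
import re

# Matches "M", "M [a,b,c]" and "M-[a,b,c]".
SPEC = re.compile(r"^\s*(\w+)\s*(-)?\s*(?:\[(.*)\])?\s*$")

def inherit(spec, modules):
    """Return the constants inherited from a module under a restriction."""
    m = SPEC.match(spec)
    if m is None:
        raise ValueError("malformed inheritance: " + spec)
    name, minus, listed = m.groups()
    constants = modules[name]
    if listed is None:
        if minus:
            raise ValueError("malformed inheritance: " + spec)
        return list(constants)                            # M
    names = {c.strip() for c in listed.split(",") if c.strip()}
    if minus:
        return [c for c in constants if c not in names]   # M-[a,b,c]
    return [c for c in constants if c in names]           # M [a,b,c]
```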
|
|
|
|
<p>
|
|
|
|
29/5 (AR) Parser support for reading GFC files line per line.
|
|
The category <tt>Line</tt> in <tt>GFC.cf</tt> can be used
|
|
as entrypoint instead of <tt>Grammar</tt> to achieve this.
|
|
|
|
<p>
|
|
|
|
28/5 (AR) Environment variables and path wild cards.
|
|
<ul>
|
|
<li> <tt>GF_LIB_PATH</tt> gives the location of <tt>GF/lib</tt>
|
|
<li> <tt>GF_GRAMMAR_PATH</tt> gives a list of directories appended
|
|
to the explicitly given path
|
|
<li> <tt>DIR/*</tt> is expanded to the union of all subdirectories
|
|
of <tt>DIR</tt>
|
|
</ul>
|
|
<p>
|
|
|
|
|
|
26/5/2005 (BB) Notation for list categories.
|
|
|
|
|
|
|
|
</body>
|
|
</html>
|