The sequence operator (x+y) was implemented by splitting the string to be
matched at all positions and trying to match the parts against the two
subpatterns. To reduce the number of splits, we now estimate the minimum and
maximum length of the string that the subpatterns could match. For common
cases, where one of the subpatterns is a string of known length, like
in (x+"y") or (x + ("a"|"o"|"u"|"e")+"y"), only one split will be tried.
The parser works on raw byte sequences read from source files. If parsing
succeeds the raw byte sequences are converted to proper Unicode characters
in a later phase. But the parser calls the function buildAnyTree, which can
fail and generate error messages containing source code fragments, which might
then containing raw byte sequences. To render these error messages correctly,
they need to be converted in accordance with the coding flag in the source
file. This is now done for UTF-8-encoded source files, but should ideally also
be done for other character encodings. (Latin-1-encoded files never suffered
from this problem, since raw bytes are proper Unicode characters in this case.)
+ Instead of "Internal error in ...", you now get a proper error message with
a source location and a function name.
+ Also added some missing error value propagation in the partial evaluator.
+ Also some other minor cleanup and error handling fixes.
It failed to delay table selection when the selector contains a run-time
variable, causing "gf: Prelude.(!!): index too large" instead.
Also:
+ Show better source locations on unexpected errors, to aid bug hunting.
+ Removed unused SourceGrammar argument to value2term.
The work done by the partial evaluator is now divied in two stages:
- A static "term traversal" stage that happens only once per term and uses
only statically known information. In particular, the values of lambda bound
variables are unknown during this stage. Some tables are transformed to
reduce the cost of pattern matching.
- A dynamic "function application" stage, where function bodies can be
evaluated repeatedly with different arguments, without the term traversal
overhead and without recomputing statically known information.
Also the treatment of predefined functions has been reworked to take advantage
of the staging and better handle partial applications.
* Evaluate operators once, not every time they are looked up
* Remember the list of parameter values instead of recomputing it from the
pattern type every time a table selection is made.
* Quick fix for partial application of some predefined functions.
The pretty printer produced
mkDet pre {"a"; "an" / vowel} Sg
which is not accepted by the parser. The parser assigns pre { ... }, to
prededence level 4, and this is now reflected in the pretty printer, so
it prints
mkDet (pre {"a"; "an" / vowel}) Sg
(This caused a problem in GFSE since it parsers pretty printed grammars...)
GF.Compile.Compute.ConcreteNew + two new modules contain a new
partial evaluator intended to solve some performance problems with the old
partial evalutator in GF.Compile.Compute.ConcreteLazy. It has been around for
a while, but is now complete enough to compile the RGL and the Phrasebook.
The old partial evaluator is still used by default. The new one can be activated
in two ways:
- by using the command line option -new-comp when invoking GF.
- by using cabal configure -fnew-comp to make -new-comp the default. In this
case you can also use the command line option -old-comp to revert to the old
partial evaluator.
In the GF shell, the cc command uses the old evaluator regardless of -new-comp
for now, but you can use "cc -new ..." to invoke the new evaluator.
With -new-comp, computations happen in GF.Compile.GeneratePMCFG instead of
GF.Compile.Optimize. This is implemented by testing the flag optNewComp in
both modules, to omit calls to the old partial evaluator from GF.Compile.Optimize
and add calls to the new partial evaluator in GF.Compile.GeneratePMCFG.
This also means that -new-comp effectively implies -noexpand.
In GF.Compile.CheckGrammar, there is a check that restricted inheritance is used
correctly. However, when -noexpand is used, this check causes unexpected errors,
so it has been converted to generate warnings, for now.
-new-comp no longer enables the new type checker in
GF.Compile.Typeckeck.ConcreteNew.
The GF version number has been bumped to 3.3.10-darcs
There was 55 lines of rather repetitive code with calls to 6 compiler passes.
They have been replaced with 19 lines that call the 6 compiler passes
plus 26 lines of helper functions.
The output from commands is represented as ([Expr],String), where the [Expr] is
used when data is piped between commands and the String is used for the final
output. The String can represent the same list of trees as the [Expr] and/or
contain diagnostic information.
Sometimes the data that is piped between commands is not a list of trees, but
e.g. a string or a list of strings. In those cases, functions like fromStrings
and toStrings are used to encode the data as a [Expr].
This patch introduces a newtype for CommandOutput and collects the functions
dealing with command output in one place to make it clearer what is going on.
It also makes it easier to change to a more direct representation of piped
data, and make pipes more "type safe", if desired.
This is a simple change in GF.Grammar.Lookup.allOpers, which is used only in
the implementation of the show_operations command in the shell.
This is useful when importing a concrete syntax (like LexiconEng) as a resource.
However, the types don't always look as nice as I hoped...
+ The restrictions on arbitrary IO when GF is running in restricted mode is now
enforced in the types.
+ This hopefully also solves an intermittent problem when accessing the GF
shell through the web API provided by gf -server. This was visible in the
Simple Translation Tool and probably caused by some low-level bug in the
GHC IO libraries.
The SIO monad is a restriction of the IO monad with two purposes:
+ Access to arbitrary IO operations can be turned off by setting the environment
variable GF_RESTRICTED. There is a limited set of IO operations that are
considered safe and always allowed.
+ It allows output to stdout to be captured. This can be used in gf -server
mode, where output of GF shell commands is made part of HTTP responses
returned to clients.
The dependency on PGFEnv has been moved from the list to the exec function of
the commands in the list. This means that the help command no longer needs
to generate a new list of commands and that the state of the shell
(type GF.Command.Interpreter.CommandEnv) no longer needs to contain the list
of commands.