Most of the explicit uses of ByteStrings were eliminated by using identS,
identS = identC . BS.pack
which was found in GF.Grammar.CF and moved to GF.Infra.Ident. The function
prefixIdent :: String -> Ident -> Ident
allowed one additional import of ByteString to be eliminated. The functions
isArgIdent :: Ident -> Bool
getArgIndex :: Ident -> Maybe Int
were needed to eliminate explicit pattern matching on Ident from two modules.
Before, they were silently converted to linear patterns.
Nonlinear patterns in MorphoCat.gf, ParadigmsGre.gf and ParadigmsFin.gf have
been make linear by renaming pattern variables.
The refresh pass does not correctly keep track of the scope of local variables
and can convert things like \x->(\x->x) x into \x1->(\x2->x2) x2. Fortunately,
it appears that the refresh pass is not needed anymore, so it has been removed.
The function GF.Grammar.PatternMatch.isInConstantForm returned False for all
tables, causing matchPattern to fail, claiming that "variables occur in" the
term if it contains tables.
This problem is several years old, confirmed present in GF 3.2.10 (Oct 2010).
Also in Commands.hs: be explicit about things imported from the PGF library
that are not in the public API.
Also a couple of haddock documentation fixes.
When compiling a grammar containing characters that are not supported in the
current locale, warning messages could cause GF fail with
hPutChar: invalid argument (Invalid or incomplete multibyte or wide character)
With this quick fix, warning messages that can not be displayed are silently
truncated instead, and compilation continues.
This should prevent errors like
Internal error in Compute.ConcreteNew:
Applying Predef.drop: Expected a value of type String, got VFV [VString "gewandt",VString "gewendet"]
The sequence operator (x+y) was implemented by splitting the string to be
matched at all positions and trying to match the parts against the two
subpatterns. To reduce the number of splits, we now estimate the minimum and
maximum length of the string that the subpatterns could match. For common
cases, where one of the subpatterns is a string of known length, like
in (x+"y") or (x + ("a"|"o"|"u"|"e")+"y"), only one split will be tried.
The parser works on raw byte sequences read from source files. If parsing
succeeds the raw byte sequences are converted to proper Unicode characters
in a later phase. But the parser calls the function buildAnyTree, which can
fail and generate error messages containing source code fragments, which might
then containing raw byte sequences. To render these error messages correctly,
they need to be converted in accordance with the coding flag in the source
file. This is now done for UTF-8-encoded source files, but should ideally also
be done for other character encodings. (Latin-1-encoded files never suffered
from this problem, since raw bytes are proper Unicode characters in this case.)
+ Instead of "Internal error in ...", you now get a proper error message with
a source location and a function name.
+ Also added some missing error value propagation in the partial evaluator.
+ Also some other minor cleanup and error handling fixes.
It failed to delay table selection when the selector contains a run-time
variable, causing "gf: Prelude.(!!): index too large" instead.
Also:
+ Show better source locations on unexpected errors, to aid bug hunting.
+ Removed unused SourceGrammar argument to value2term.