The following are the outcomes:
- Predef.nonExist is fully supported by both the Haskell and the C runtimes
- Predef.BIND is now an internal compiler defined token. For now
it behaves just as usual for the Haskell runtime, i.e. it generates &+.
However, the special treatment will let us to handle it properly in
the C runtime.
- This required a major change in the PGF format since both
nonExist and BIND may appear inside 'pre' and this was not supported
before.
The fact that identifiers are represented as ByteStrings is now an internal
implentation detail in module GF.Infra.Ident. Conversion between ByteString
and identifiers is only needed in the lexer and the Binary instances.
Most of the explicit uses of ByteStrings were eliminated by using identS,
identS = identC . BS.pack
which was found in GF.Grammar.CF and moved to GF.Infra.Ident. The function
prefixIdent :: String -> Ident -> Ident
allowed one additional import of ByteString to be eliminated. The functions
isArgIdent :: Ident -> Bool
getArgIndex :: Ident -> Maybe Int
were needed to eliminate explicit pattern matching on Ident from two modules.
The refresh pass does not correctly keep track of the scope of local variables
and can convert things like \x->(\x->x) x into \x1->(\x2->x2) x2. Fortunately,
it appears that the refresh pass is not needed anymore, so it has been removed.
The function GF.Grammar.PatternMatch.isInConstantForm returned False for all
tables, causing matchPattern to fail, claiming that "variables occur in" the
term if it contains tables.
This problem is several years old, confirmed present in GF 3.2.10 (Oct 2010).
The sequence operator (x+y) was implemented by splitting the string to be
matched at all positions and trying to match the parts against the two
subpatterns. To reduce the number of splits, we now estimate the minimum and
maximum length of the string that the subpatterns could match. For common
cases, where one of the subpatterns is a string of known length, like
in (x+"y") or (x + ("a"|"o"|"u"|"e")+"y"), only one split will be tried.
The parser works on raw byte sequences read from source files. If parsing
succeeds the raw byte sequences are converted to proper Unicode characters
in a later phase. But the parser calls the function buildAnyTree, which can
fail and generate error messages containing source code fragments, which might
then containing raw byte sequences. To render these error messages correctly,
they need to be converted in accordance with the coding flag in the source
file. This is now done for UTF-8-encoded source files, but should ideally also
be done for other character encodings. (Latin-1-encoded files never suffered
from this problem, since raw bytes are proper Unicode characters in this case.)
* Evaluate operators once, not every time they are looked up
* Remember the list of parameter values instead of recomputing it from the
pattern type every time a table selection is made.
* Quick fix for partial application of some predefined functions.
The pretty printer produced
mkDet pre {"a"; "an" / vowel} Sg
which is not accepted by the parser. The parser assigns pre { ... }, to
prededence level 4, and this is now reflected in the pretty printer, so
it prints
mkDet (pre {"a"; "an" / vowel}) Sg
(This caused a problem in GFSE since it parsers pretty printed grammars...)
This is a simple change in GF.Grammar.Lookup.allOpers, which is used only in
the implementation of the show_operations command in the shell.
This is useful when importing a concrete syntax (like LexiconEng) as a resource.
However, the types don't always look as nice as I hoped...
In GF.Compile.CheckGrammar, use a new topological sorting function that
groups independent judgements, allowing them all to be checked before
continuing or reporting errors.
This turns error messages like
gf: too few bytes. Failed reading at byte position 1
gf: /some/path/somefile.gfo: too few bytes. Failed reading at byte position 1
but a better fix would be to ignore bad .gfo files and compile from source.
The problem is the way this decision is made in
GF.Compile.ReadFiles.selectFormat...
As a temporary workaround, alex is no longer invoked automatically when
building with cabal. Developers who want to modify the lexer need to run
alex on Lexer.x manually and record the modified Lexer.hs.
src/compiler/GF/Grammar/lexer/Lexer.x -- hidden from cabal
src/compiler/GF/Grammar/Lexer.hs -- update it manually