2 modules: Name clashes caused by Applicative-Monad change in Prelude
2 modules: Ambiguities caused by Foldable/Traversable in Prelude
2 modules: Backwards incompatible changes in time-1.5 for defaultTimeLocale
9 modules: {-# LANGUAGE FlexibleContexts #-} (because GHC checks inferred types
now, in addition to explicitly given type signatures)
Also silenced warnings about tab characters in source files.
Included renamings:
SourceGrammar -> Grammar
SourceModule -> Module
SourceModInfo -> ModuleInfo
emptySourceGrammar -> emptyGrammar
Also introduces a type synonym (which might be good to turn into a newtype):
type ModuleName = Ident
The reason is to make types like the following more self documenting:
type Module = (ModuleName,ModuleInfo)
type QIdent = (ModuleName,Ident)
* The following modules are no longer used and have been removed completely:
GF.Compile.Compute.ConcreteLazy
GF.Compile.Compute.ConcreteStrict
GF.Compile.Refresh
* The STM monad has been commented out. It was only used in
GF.Compile.SubExpOpt, where could be replaced with a plain State monad,
since no error handling was needed. One of the functions was hardwired to
the Err monad, but did in fact not use error handling, so it was turned
into a pure function.
* The function errVal has been renamed to fromErr (since it is analogous to
fromMaybe).
* Replaced 'fail' with 'raise' and 'return ()' with 'done' in a few places.
* Some additional old code that was already commented out has been removed.
(1) introduces the module GF.Infra.Concurreny with lifted concurrency
operators (to reduce uses of liftIO) and some additional concurrency
utilities, e.g. a function for sequential logging that is used in
both GF.CompileInParallel and GFServer.
(2) avoids leaving broken .gfo files behind if compilation is aborted.
GF.Text.Pretty provides the class Pretty and overloaded versions of the pretty
printing combinators in Text.PrettyPrint, allowing pretty printable values to
be used directly instead of first having to convert them to Doc with functions
like text, int, char and ppIdent. Some modules have been converted to use
GF.Text.Pretty, but not all. Precedences could be added to simplify the pretty
printers for terms and patterns.
GF.Infra.Location contains the types Location and L, factored out from
GF.Grammar.Grammar, and the class HasSourcePath. This allowed the import
of GF.Grammar.Grammar to be removed from GF.Infra.CheckM, making it more
like a pure library module.
PGF exports the public, stable API.
PGF.Internal exports additional things needed in the GF compiler & shell,
including the nonstardard version of Data.Binary.
This was a fairly simple change thanks to previous work on making the Ident
type abstract and the fact that PGF.CId already uses UTF-8-encoded
ByteStrings.
One potential pitfall is that Data.ByteString.UTF8 uses the same type for
ByteStrings as Data.ByteString. I renamed ident2bs to ident2utf8 and
bsCId to utf8CId, to make it clearer that they work with UTF-8-encoded
ByteStrings.
Since both the compiler input and identifiers are now UTF-8-encoded
ByteStrings, the lexer now creates identifiers without copying any characters.
**END OF DESCRIPTION***
Place the long patch description above the ***END OF DESCRIPTION*** marker.
The first line of this file will be the patch name.
This patch contains the following changes:
M ./src/compiler/GF/Compile/CheckGrammar.hs -3 +3
M ./src/compiler/GF/Compile/GrammarToPGF.hs -2 +2
M ./src/compiler/GF/Grammar/Binary.hs -5 +1
M ./src/compiler/GF/Grammar/Lexer.x -11 +13
M ./src/compiler/GF/Infra/Ident.hs -19 +36
M ./src/runtime/haskell/PGF.hs -1 +1
M ./src/runtime/haskell/PGF/CId.hs -2 +3
1. The default encoding is changed from Latin-1 to UTF-8.
2. Alternate encodings should be specified as "--# -coding=enc", the old
"flags coding=enc" declarations have no effect but are still checked for
consistency.
3. A transitional warning is generated for files that contain non-ASCII
characters without specifying a character encoding:
"Warning: default encoding has changed from Latin-1 to UTF-8"
4. Conversion to Unicode is now done *before* lexing. This makes it possible
to allow arbitrary Unicode characters in identifiers. But identifiers are
still stored as ByteStrings, so they are limited to Latin-1 characters
for now.
5. Lexer.hs is no longer part of the repository. We now generate the lexer
from Lexer.x with alex>=3. Some workarounds for bugs in alex-3.0 were
needed. These bugs might already be fixed in newer versions of alex, but
we should be compatible with what is shipped in the Haskell Platform.
+ Eliminated vairous ad-hoc coersion functions between specific monads
(IO, Err, IOE, Check) in favor of more general lifting functions
(liftIO, liftErr).
+ Generalized many basic monadic operations from specific monads to
arbitrary monads in the appropriate class (MonadIO and/or ErrorMonad),
thereby completely eliminating the need for lifting functions in lots
of places.
This can be considered a small step forward towards a cleaner
compiler API and more malleable compiler code in general.
The CF parser in GF.Grammar.CF assigns function names to the rules, but they
are not always unique, causing rules to be dropped in the follwing CF->GF
conversion. So a pass has been added before the CF->GF conversion, to make
sure that function names are unique.
A comment says "rules have an amazingly easy parser", but the parser looks
like quick hack. It is very sloppy and silently ignores many errors, e.g.
- Explicitly given function names should end with '.', but if the do not, the
last character in the function name is silently dropped.
- Everything following a ';' is silently dropped.
+ References to modules under src/compiler have been eliminated from the PGF
library (under src/runtime/haskell). Only two functions had to be moved (from
GF.Data.Utilities to PGF.Utilities) to make this possible, other apparent
dependencies turned out to be vacuous.
+ In gf.cabal, the GF executable no longer directly depends on the PGF library
source directory, but only on the exposed library modules. This means that
there is less duplication in gf.cabal and that the 30 modules in the
PGF library will no longer be compiled twice while building GF.
To make this possible, additional PGF library modules have been exposed, even
though they should probably be considered for internal use only. They could
be collected in a PGF.Internal module, or marked as "unstable", to make
this explicit.
+ Also, by using the -fwarn-unused-imports flag, ~220 redundant imports were
found and removed, reducing the total number of imports by ~15%.
The standard binary package has improved efficiency and error handling [1], so
in the long run we should consider switching to it. At the moment, using it is
possible but not recommended, since it results in incomatible PGF files.
The modified modules from the binary package have been moved from
src/runtime/haskell to src/binary.
[1] http://lennartkolmodin.blogspot.se/2013/03/binary-07.html
The following are the outcomes:
- Predef.nonExist is fully supported by both the Haskell and the C runtimes
- Predef.BIND is now an internal compiler defined token. For now
it behaves just as usual for the Haskell runtime, i.e. it generates &+.
However, the special treatment will let us to handle it properly in
the C runtime.
- This required a major change in the PGF format since both
nonExist and BIND may appear inside 'pre' and this was not supported
before.
The fact that identifiers are represented as ByteStrings is now an internal
implentation detail in module GF.Infra.Ident. Conversion between ByteString
and identifiers is only needed in the lexer and the Binary instances.
Most of the explicit uses of ByteStrings were eliminated by using identS,
identS = identC . BS.pack
which was found in GF.Grammar.CF and moved to GF.Infra.Ident. The function
prefixIdent :: String -> Ident -> Ident
allowed one additional import of ByteString to be eliminated. The functions
isArgIdent :: Ident -> Bool
getArgIndex :: Ident -> Maybe Int
were needed to eliminate explicit pattern matching on Ident from two modules.
The refresh pass does not correctly keep track of the scope of local variables
and can convert things like \x->(\x->x) x into \x1->(\x2->x2) x2. Fortunately,
it appears that the refresh pass is not needed anymore, so it has been removed.
The function GF.Grammar.PatternMatch.isInConstantForm returned False for all
tables, causing matchPattern to fail, claiming that "variables occur in" the
term if it contains tables.
This problem is several years old, confirmed present in GF 3.2.10 (Oct 2010).
The sequence operator (x+y) was implemented by splitting the string to be
matched at all positions and trying to match the parts against the two
subpatterns. To reduce the number of splits, we now estimate the minimum and
maximum length of the string that the subpatterns could match. For common
cases, where one of the subpatterns is a string of known length, like
in (x+"y") or (x + ("a"|"o"|"u"|"e")+"y"), only one split will be tried.
The parser works on raw byte sequences read from source files. If parsing
succeeds the raw byte sequences are converted to proper Unicode characters
in a later phase. But the parser calls the function buildAnyTree, which can
fail and generate error messages containing source code fragments, which might
then containing raw byte sequences. To render these error messages correctly,
they need to be converted in accordance with the coding flag in the source
file. This is now done for UTF-8-encoded source files, but should ideally also
be done for other character encodings. (Latin-1-encoded files never suffered
from this problem, since raw bytes are proper Unicode characters in this case.)
* Evaluate operators once, not every time they are looked up
* Remember the list of parameter values instead of recomputing it from the
pattern type every time a table selection is made.
* Quick fix for partial application of some predefined functions.
The pretty printer produced
mkDet pre {"a"; "an" / vowel} Sg
which is not accepted by the parser. The parser assigns pre { ... }, to
prededence level 4, and this is now reflected in the pretty printer, so
it prints
mkDet (pre {"a"; "an" / vowel}) Sg
(This caused a problem in GFSE since it parsers pretty printed grammars...)