On my laptop these changes speed up the full build of the RGL and example
grammars with 'cabal build' from ~95s to ~43s and the zero build from ~18s
to ~5s.
The main change is the introduction of the module GF.CompileInParallel that
replaces GF.Compile and the function GF.Compile.ReadFiles.getAllFiles. At
present, it is activated with the new -j flag, and it is only used when
combined with --make or --batch. In addition, to get parallel computations,
you need to add GHC run-time flags, e.g., +RTS -N -A20M -RTS, to the command
line.
The Setup.hs script has been modified to pass the appropriate flags to GF
for parallel compilation when compiling the RGL and example grammars, but you
need a recent version of Cabal for this to work (probably >=1.20).
Some additional refactoring was done during this work. A new monad is used to
prevent warnings and error messages from different modules from being
interleaved when compiling in parallel, so some functions that were hardwired
to the IO or IOE monads have been lifted to work in arbitrary monads that are
instances of the appropriate classes.
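A minimal sketch of the idea, not the actual GF code: the class MonadLog, the
monad Buffered and the function checkModule are hypothetical names used only
for illustration. A function written against the class can run directly in IO,
or in a buffering monad whose messages are collected per module and can be
printed as one block when the module is done.

```haskell
-- Hypothetical class: monads in which a message can be emitted.
class Monad m => MonadLog m where
  logMsg :: String -> m ()

-- In IO, messages are printed immediately.
instance MonadLog IO where
  logMsg = putStrLn

-- A writer-like monad that collects messages instead of printing them.
newtype Buffered a = Buffered { runBuffered :: (a, [String]) }

instance Functor Buffered where
  fmap f (Buffered (a, w)) = Buffered (f a, w)

instance Applicative Buffered where
  pure a = Buffered (a, [])
  Buffered (f, w1) <*> Buffered (a, w2) = Buffered (f a, w1 ++ w2)

instance Monad Buffered where
  Buffered (a, w1) >>= k =
    let Buffered (b, w2) = k a in Buffered (b, w1 ++ w2)

instance MonadLog Buffered where
  logMsg s = Buffered ((), [s])

-- A toy "compiler pass" that works in any MonadLog monad.
checkModule :: MonadLog m => String -> m Int
checkModule name = do
  logMsg ("checking " ++ name)
  logMsg ("warning in " ++ name)
  return (length name)
```

With runBuffered (checkModule "Foo") the two messages come back as a list
together with the result, so a parallel driver could print them in one go.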
I prefer small functions with descriptive names over large monolithic chunks
of code, so I grouped the compiler passes called from compileSourceModule
into functions named frontend, middle and backend. This also makes decisions
about which passes to run clearly visible up front.
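The grouping can be pictured with the toy sketch below. Only the names
frontend, middle, backend and compileSourceModule come from the text above;
the Module type, the pass names and the needsBackend flag are assumptions made
up for illustration and do not reflect the real GF types.

```haskell
import Control.Monad ((>=>))

-- Toy stand-in for a module under compilation: its name and the passes run so far.
data Module = Module { modName :: String, passesRun :: [String] }
  deriving (Eq, Show)

type Pass = Module -> Either String Module

-- Record that a named pass ran; a real pass would transform the module.
pass :: String -> Pass
pass name m = Right m { passesRun = passesRun m ++ [name] }

-- Hypothetical pass names, grouped into three stages.
frontend, middle, backend :: Pass
frontend = pass "rename" >=> pass "typecheck"
middle   = pass "optimize"
backend  = pass "generate"

-- The decision about which stages to run is made in one place.
compileSourceModule :: Bool -> Pass
compileSourceModule needsBackend =
  frontend >=> middle >=> (if needsBackend then backend else Right)
```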
Also made some small changes in GF.Compile.
In particular, the function compileOne has been moved to the new module
GF.CompileOne and its type has been changed from
compileOne :: ... -> CompileEnv -> FilePath -> IOE CompileEnv
to
compileOne :: ... -> SourceGrammar -> FilePath -> IOE OneCompiledModule
making it more suitable for use in a parallel compiler.
The def rules are now compiled to byte code by the compiler and then to
native code by the JIT compiler in the runtime. Not all constructions
are implemented yet. The partial implementation is now in the repository
but it is not activated by default since this requires changes in the
PGF format. I will enable it only after it is complete.
GF.Text.Pretty provides the class Pretty and overloaded versions of the pretty
printing combinators in Text.PrettyPrint, allowing pretty printable values to
be used directly instead of first having to convert them to Doc with functions
like text, int, char and ppIdent. Some modules have been converted to use
GF.Text.Pretty, but not all. Precedences could be added to simplify the pretty
printers for terms and patterns.
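A minimal sketch of how such overloading can be set up on top of
Text.PrettyPrint. The method names pp and ppList and the selection of
instances are assumptions for illustration; the actual interface of
GF.Text.Pretty may differ.

```haskell
import qualified Text.PrettyPrint as PP
import Text.PrettyPrint (Doc, render)

-- Hypothetical class: anything that can be turned into a Doc.
class Pretty a where
  pp :: a -> Doc
  -- Same trick as showList: lets String get its own rendering.
  ppList :: [a] -> Doc
  ppList = PP.fsep . map pp

instance Pretty Doc  where pp = id
instance Pretty Int  where pp = PP.int
instance Pretty Char where
  pp     = PP.char
  ppList = PP.text
instance Pretty a => Pretty [a] where pp = ppList

-- Overloaded version of the horizontal composition operator.
infixl 6 <+>
(<+>) :: (Pretty a, Pretty b) => a -> b -> Doc
a <+> b = pp a PP.<+> pp b

-- Strings and Ints are used directly, without text or int.
example :: String
example = render ("arity" <+> (2 :: Int) <+> PP.parens (pp "ok"))
```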
GF.Infra.Location contains the types Location and L, factored out from
GF.Grammar.Grammar, and the class HasSourcePath. This allowed the import
of GF.Grammar.Grammar to be removed from GF.Infra.CheckM, making it more
like a pure library module.
PGF exports the public, stable API.
PGF.Internal exports additional things needed in the GF compiler & shell,
including the nonstandard version of Data.Binary.
(table { p_i => t_i } ! x).l ==> table { p_i => t_i.l } ! x
This was used in the old partial evaluator and can significantly reduce term
sizes in some cases.
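The rewrite rule above can be sketched on a toy term type. The Term type and
the function pushProj are hypothetical and much simpler than the real GF term
representation (patterns are plain strings here).

```haskell
-- Toy terms: variables, projection t.l, tables and selection t ! x.
data Term
  = Var String
  | Proj Term String
  | Table [(String, Term)]
  | Select Term Term
  deriving (Eq, Show)

-- (table { p_i => t_i } ! x).l  ==>  table { p_i => t_i.l } ! x
-- Terms of any other shape are returned unchanged.
pushProj :: Term -> Term
pushProj (Proj (Select (Table cases) x) l) =
  Select (Table [ (p, Proj t l) | (p, t) <- cases ]) x
pushProj t = t
```

After this step each t_i.l can be simplified on its own, which is where the
size reduction would come from when the t_i are large records.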
Eta expansion is applied between partial evaluation and PMCFG generation.
The buggy version generated type incorrect terms, but PMCFG generation
apparently worked anyway.
Most PGF web API commands that produce linearizations now accept an
unlexer parameter. Possible values are "text", "code" and "mixed".
The web service now includes Date and Last-Modified headers in the HTTP
responses. This means that browsers can treat responses as static content and
cache them, so it becomes less critical to cache parse results in the server.
Also did some cleanup in PGFService.hs, e.g. removed a couple of functions
that can now be imported from PGF.Lexing instead.
The old type was [String] -> String. This function was only used
in GF.Text.Lexing.stringOp, which now uses (unwords . bindTok) instead,
with no change in behaviour.
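A simplified sketch of what a list-returning token binder might look like,
assuming the GF convention that the token "&+" glues its neighbours together.
The name bindTokSketch is hypothetical, and the real function may handle more
cases.

```haskell
-- Glue together tokens separated by "&+", returning a list of tokens.
bindTokSketch :: [String] -> [String]
bindTokSketch (w1 : "&+" : w2 : ws) = bindTokSketch ((w1 ++ w2) : ws)
bindTokSketch ("&+" : ws)           = bindTokSketch ws  -- stray bind token
bindTokSketch (w : ws)              = w : bindTokSketch ws
bindTokSketch []                    = []

-- A [String] -> String function is recovered by composing with unwords.
joined :: [String] -> String
joined = unwords . bindTokSketch
```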
The capitalization of the first word was done in GF.Text.Lexing.stringOp,
but is now done in the functions unlexText and unlexMixed in PGF.Lexing.
These functions are only used in stringOp and in PGFService (where the change
is needed), so the subtle change in behaviour should not cause any bugs.
+ The current type checker for concrete syntax is in
GF.Compile.TypeCheck.RConcrete, but GF.Compile.TypeCheck.Concrete was
still imported in GFI.
+ Fixed a bug that allowed Ints n as a subtype of Ints m, regardless of
m and n. It now requires n<=m. Note: the type checker still allows Int
as a subtype of Ints m, regardless of m.
+ Fixed a potential efficiency problem with large record types, by reducing
the number of recursive calls from |R|*|S| to |R| when checking if R<=S.
+ Fixed a misleading comment: "alpha g t u" checks that u is a subtype of t,
not the other way around. Similarly, "checkIfEqLType gr g t u trm" checks that
u is a subtype of t, not the other way around, and not that t is equal to u.
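The Ints rule from the list above can be sketched on a toy type
representation. The Type data type and the function isSubtype are hypothetical
and cover only the cases mentioned in the text; record types are left out.

```haskell
-- Toy types: Int, Ints n, and Str as an unrelated base type.
data Type = TInt | TInts Int | TStr
  deriving (Eq, Show)

-- isSubtype t u: may a value of type t be used where u is expected?
isSubtype :: Type -> Type -> Bool
isSubtype (TInts n) (TInts m) = n <= m   -- the fixed rule
isSubtype TInt      (TInts _) = True     -- still allowed, regardless of m
isSubtype t         u         = t == u
```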
This bug was introduced sometime between 2013-08-21 and 2013-11-01 and caused
the function convertTerm in GF.Compile.GeneratePMCFG to encounter an EPatt where
it expected Strs. I fixed it by applying the function getPatts (from the old
partial evaluator) to the pattern.
PGF service requests are stateless and can run in parallel, but some other
requests handled by the server are not and might even change the current
working directory temporarily, and this affects all threads, so it is
important that the PGF service requests access PGF files by absolute paths.
If the C run-time library is compiled and installed on your system, you can now
do 'cabal configure -fc-runtime' to get the following extras:
+ The Haskell binding to the C run-time library will be included in the
PGF library (so you can import it in Haskell applications).
Documentation on the new modules will be included when you run
'cabal haddock'.
+ The new command 'pgf-shell', implemented on top of the Haskell binding to
the C run-time system.
+ Three new commands in the web API: c-parse, c-linearize and
c-translate. Their interfaces are similar to the corresponding commands
without the "c-" prefix, but they should be considered preliminary.
When running a command like
gf -make L_1.gf ... L_n.gf
gf now avoids recreating the target PGF file if it already exists and is
up-to-date.
gf still reads all required .gfo files, so significant additional speed
improvements are still possible. This could be done by reading .gfo files
more lazily...
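An up-to-date check of this kind can be sketched as a pure function over
timestamps. The name isUpToDate is hypothetical, and which files gf actually
compares against the target is an assumption not stated above.

```haskell
-- The target is up to date if it exists (Just its modification time)
-- and no input is newer than it.
isUpToDate :: Ord t => Maybe t -> [t] -> Bool
isUpToDate Nothing       _      = False
isUpToDate (Just target) inputs = all (<= target) inputs
```

In a real driver the timestamps would come from the file system, and the
target would be rewritten only when this check returns False.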