This output format converts a GF grammar to a "canonical" GF grammar. A
canonical GF grammar consists of
- one self-contained module for the abstract syntax
- one self-contained module per concrete syntax
The concrete syntax modules contain only param, lincat and lin definitions;
everything else has been eliminated by the partial evaluator, including
references to resource library modules and functors. Record types
and tables are retained.
The -output-format canonical_gf option writes canonical GF grammars to a
subdirectory "canonical/". The canonical GF grammars are written as
normal GF ".gf" source files, which can be compiled with GF in the normal way.
The translation to canonical form goes via an AST for canonical GF grammars,
defined in GF.Grammar.Canonical. This is a simple, self-contained format that
doesn't cover everything in GF (it omits, e.g., dependent types and HOAS), but it
is complete enough to translate the Foods and Phrasebook grammars found in
gf-contrib. The AST is based on the GF grammar "GFCanonical" presented here:
https://github.com/GrammaticalFramework/gf-core/issues/30#issuecomment-453556553
The translation of concrete syntax to canonical form is based on the
previously existing translation of concrete syntax to Haskell, implemented
in module GF.Compile.ConcreteToHaskell. This module could now be reimplemented
and simplified significantly by going via the canonical format. Perhaps exports
to other output formats could benefit by going via the canonical format too.
There is also the possibility of completing the GFCanonical grammar
mentioned above and using GF itself to convert canonical GF grammars to
other formats...
Traditionally, GF_LIB_PATH points to something like
`.../share/ghc-8.0.2-x86_64/gf-3.9/lib`
and if you want prelude and alltenses and present, you add a
`--# -path=.:present`
compiler pragma to the top of your .gf file.
But if you are developing some kind of application grammar
library or contrib of your own, you might find yourself
repeating your library path at the top of all your .gf files.
After painstakingly maintaining the same library path at the
top of all your .gf files, you might say, let's factor this
out into GF_LIB_PATH.
You might then find to your surprise that GF_LIB_PATH
doesn't accept the usual colon:separated:path notation
familiar from, say, unix PATH and MANPATH.
This patch allows you to define
`GF_LIB_PATH=gf-3.9.lib:$HOME/gf-contrib/whatever/lib`
in a more natural way.
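As a rough illustration of the colon-separated notation, the following minimal sketch splits a GF_LIB_PATH value into its directories. The function name splitLibPath is hypothetical and this is not the code from the patch; it only shows the intended behaviour under the assumption that empty entries are dropped.

```haskell
-- Hypothetical helper, not taken from the GF sources: split a
-- colon-separated search path such as "dir1:dir2" into directories.
splitLibPath :: String -> [FilePath]
splitLibPath s = filter (not . null) (go s)
  where
    -- Cut the string at each ':'; empty entries are filtered out above
    -- (dropping them is an assumption of this sketch).
    go str = case break (== ':') str of
      (dir, [])       -> [dir]
      (dir, _ : rest) -> dir : go rest
```

For example, splitLibPath "gf-3.9.lib:/home/me/gf-contrib/whatever/lib" yields the two directories in the order given, which is presumably also the order in which they would be searched.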
If you are an RGL hacker and have your own version of the
RGL tree sitting somewhere, you should be able to have both
paths in the GF_LIB_PATH, for added convenience. This minor
convenience will probably lead to obscure bugs and great
frustration when you find that your changes are mysteriously
not being picked up by GF; so keep this in mind and use it
cautiously.
This caution should probably sit in the documentation
somewhere. A subsequent commit will do that.
If you use zsh, you can do this to quickly build up a big
GF_LIB_PATH:
% gf_lib_path=( $HOME/src/GF/lib/src/{api,abstract,common,english,api/libraryBrowser,prelude,..} )
% typeset -xT GF_LIB_PATH gf_lib_path
Some C run-time functionality is now available in the GF shell, by starting
GF with 'gf -cshell' or 'gf -crun'. Only limited functionality is available
when running the shell in these modes:
- You can only import .pgf files, not source files.
- The -retain flag cannot be used and the commands that require it to work
are not available.
- Only 18 of the 40 commands available in the usual shell have been
implemented. The 'linearize' and 'parse' commands are the only ones
that call the C run-time system, and they support only a limited set of
options and flags. Use the 'help' command for details.
- A new command 'generate_all', that calls PGF2.generateAll, has been added.
Unfortunately, using it causes a segmentation fault.
This is implemented by adding two new modules: GF.Command.Commands2 and
GF.Interactive2. They are copied and modified versions of GF.Command.Commands
and GF.Interactive, respectively. Code for unimplemented commands and other
code that has not been adapted to the C run-time system has been left in
place, but commented out, pending further work.
By adding the flag -haskell=variants to the command line, GF will now generate
linearization functions in Haskell that support variants. Variants are
represented as lists in Haskell.
Variants inside pre { ... } expressions are still ignored.
TODO: apply some monad laws to generate more compact code (using an
intermediate representation of the generated Haskell code, instead of
pretty printing directly from the GF code).
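To illustrate what "variants as lists" can look like, here is a minimal sketch in the list monad. The names Variants, colourNoun and detNoun are hypothetical; this is not the code that GF generates, only an example of the representation described above.

```haskell
-- A value that may have several variants is modelled as a list
-- (hypothetical type synonym, for illustration only).
type Variants a = [a]

-- A noun with two spelling variants.
colourNoun :: Variants String
colourNoun = ["colour", "color"]

-- Combine a determiner and a noun in the list monad, so that every
-- combination of variants is produced.
detNoun :: Variants String -> Variants String -> Variants String
detNoun dets nouns = do
  det  <- dets
  noun <- nouns
  pure (det ++ " " ++ noun)
```

Here detNoun ["the"] colourNoun evaluates to ["the colour", "the color"]. The monad laws mentioned in the TODO are the kind of rewrite that would shrink such code, e.g. replacing do { x <- xs; pure x } by xs.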
The translation is currently good enough to translate all concrete syntaxes
of the Foods and Letter grammars, and some concrete syntaxes of the Phrasebook
grammar (e.g. PhrasebookEng & PhrasebookSpa work, but there are problems with
e.g. PhrasebookSwe and PhrasebookChi).
This functionality is enabled by running
gf -make -output-format=haskell -haskell=concrete ...
TODO:
- variants
- pre { ... }
- eta expansion of linearization functions
- record subtyping can still cause type errors in the Haskell code
in some cases
- reduce code size for large tables
On my laptop these changes speed up the full build of the RGL and example
grammars with 'cabal build' from ~95s to ~43s and the zero build from ~18s
to ~5s.
The main change is the introduction of the module GF.CompileInParallel that
replaces GF.Compile and the function GF.Compile.ReadFiles.getAllFiles. At
present, it is activated with the new -j flag, and it is only used when
combined with --make or --batch. In addition, to get parallel computations,
you need to add GHC run-time flags, e.g., +RTS -N -A20M -RTS, to the command
line.
The Setup.hs script has been modified to pass the appropriate flags to GF
for parallel compilation when compiling the RGL and example grammars, but you
need a recent version of Cabal for this to work (probably >=1.20).
Some additional refactoring was done during this work. A new monad is used to
keep warnings and error messages from different modules from being interleaved
when compiling in parallel, so some functions that were hardwired to the IO or
IOE monads have been lifted to work in arbitrary monads that are instances of
the appropriate classes.
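The idea of keeping each module's messages together can be sketched as follows. This is not the monad used in GF.CompileInParallel; compileModule and runAll are hypothetical names, and the sketch assumes that each job returns its messages instead of printing them.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Hypothetical stand-in for compiling one module: the messages are
-- returned as a block rather than written to the terminal directly.
compileModule :: String -> IO [String]
compileModule name =
  pure ["- compiling " ++ name, "  warning: example message for " ++ name]

-- Run every job in its own thread and collect the message blocks in
-- job order, so output from different modules is never interleaved.
-- Note: a job that throws an exception would leave its MVar empty;
-- a real implementation would have to handle that case.
runAll :: [IO [String]] -> IO [[String]]
runAll jobs = do
  vars <- mapM spawn jobs
  mapM takeMVar vars
  where
    spawn job = do
      var <- newEmptyMVar
      _ <- forkIO (job >>= putMVar var)
      pure var
```

A caller could then print the blocks one after another, e.g. runAll (map compileModule ["A", "B"]) >>= mapM_ (mapM_ putStrLn). To actually use several cores, the program has to be built with the threaded run-time system and started with run-time flags such as the +RTS -N ... -RTS mentioned above.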
PGF exports the public, stable API.
PGF.Internal exports additional things needed in the GF compiler & shell,
including the nonstandard version of Data.Binary.
This means that the -old-comp and -new-comp flags are not recognized anymore.
The only functional difference is that printnames were still normalized with
the old partial evaluator. Now that is done with the new partial evaluator.
1. The default encoding is changed from Latin-1 to UTF-8.
2. Alternate encodings should be specified as "--# -coding=enc"; the old
"flags coding=enc" declarations have no effect but are still checked for
consistency.
3. A transitional warning is generated for files that contain non-ASCII
characters without specifying a character encoding:
"Warning: default encoding has changed from Latin-1 to UTF-8"
4. Conversion to Unicode is now done *before* lexing. This makes it possible
to allow arbitrary Unicode characters in identifiers. But identifiers are
still stored as ByteStrings, so they are limited to Latin-1 characters
for now.
5. Lexer.hs is no longer part of the repository. We now generate the lexer
from Lexer.x with alex>=3. Some workarounds for bugs in alex-3.0 were
needed. These bugs might already be fixed in newer versions of alex, but
we should be compatible with what is shipped in the Haskell Platform.
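The rules in items 1 to 3 can be summarised in a small sketch. The functions codingPragma and chooseEncoding are hypothetical, not the GF implementation, and for simplicity the file contents are taken as a String rather than as raw bytes.

```haskell
import Data.Char (isAscii, isSpace)
import Data.List (stripPrefix)
import Data.Maybe (listToMaybe)

-- Find the first "--# -coding=enc" pragma line, if any (simplified:
-- the pragma must start at the beginning of a line).
codingPragma :: String -> Maybe String
codingPragma src =
  listToMaybe
    [ takeWhile (not . isSpace) enc
    | Just enc <- map (stripPrefix "--# -coding=") (lines src) ]

-- Pick the encoding name and an optional transitional warning:
-- an explicit pragma wins, otherwise the default is UTF-8, with a
-- warning if the file contains non-ASCII characters.
chooseEncoding :: String -> (String, Maybe String)
chooseEncoding src = case codingPragma src of
  Just enc -> (enc, Nothing)
  Nothing
    | all isAscii src -> ("UTF-8", Nothing)
    | otherwise ->
        ("UTF-8", Just "Warning: default encoding has changed from Latin-1 to UTF-8")
```

Old-style "flags coding=enc" declarations are deliberately ignored here, matching item 2.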
+ References to modules under src/compiler have been eliminated from the PGF
library (under src/runtime/haskell). Only two functions had to be moved (from
GF.Data.Utilities to PGF.Utilities) to make this possible; other apparent
dependencies turned out to be vacuous.
+ In gf.cabal, the GF executable no longer directly depends on the PGF library
source directory, but only on the exposed library modules. This means that
there is less duplication in gf.cabal and that the 30 modules in the
PGF library will no longer be compiled twice while building GF.
To make this possible, additional PGF library modules have been exposed, even
though they should probably be considered for internal use only. They could
be collected in a PGF.Internal module, or marked as "unstable", to make
this explicit.
+ Also, by using the -fwarn-unused-imports flag, ~220 redundant imports were
found and removed, reducing the total number of imports by ~15%.
Most of the explicit uses of ByteStrings were eliminated by using identS,
identS = identC . BS.pack
which was found in GF.Grammar.CF and moved to GF.Infra.Ident. The function
prefixIdent :: String -> Ident -> Ident
allowed one additional import of ByteString to be eliminated. The functions
isArgIdent :: Ident -> Bool
getArgIndex :: Ident -> Maybe Int
were needed to eliminate explicit pattern matching on Ident from two modules.
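The following sketch shows how such helper functions let client modules avoid both ByteStrings and pattern matching on Ident. The Ident type here is a simplified, hypothetical stand-in with String fields (the real type in GF.Infra.Ident has more constructors and stores ByteStrings); only the function names and the three type signatures quoted above come from the text.

```haskell
-- Simplified, hypothetical stand-in for the real Ident type.
data Ident
  = IC String      -- plain identifier
  | IA String Int  -- argument variable with its position
  deriving (Eq, Show)

-- Build an identifier from a String. In GF this is identC . BS.pack;
-- this sketch uses String throughout, so no packing is needed.
identS :: String -> Ident
identS = IC

-- Add a prefix to the name part of an identifier.
prefixIdent :: String -> Ident -> Ident
prefixIdent p (IC s)   = IC (p ++ s)
prefixIdent p (IA s i) = IA (p ++ s) i

-- Queries that spare other modules from matching on constructors.
isArgIdent :: Ident -> Bool
isArgIdent (IA _ _) = True
isArgIdent _        = False

getArgIndex :: Ident -> Maybe Int
getArgIndex (IA _ i) = Just i
getArgIndex _        = Nothing
```

With these in place, a module can write prefixIdent "gf_" (identS "x") or getArgIndex x without importing ByteString or knowing how Ident is represented.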
GF.Compile.Compute.ConcreteNew + two new modules contain a new
partial evaluator intended to solve some performance problems with the old
partial evaluator in GF.Compile.Compute.ConcreteLazy. It has been around for
a while, but is now complete enough to compile the RGL and the Phrasebook.
The old partial evaluator is still used by default. The new one can be activated
in two ways:
- by using the command line option -new-comp when invoking GF.
- by using cabal configure -fnew-comp to make -new-comp the default. In this
case you can also use the command line option -old-comp to revert to the old
partial evaluator.
In the GF shell, the cc command uses the old evaluator regardless of -new-comp
for now, but you can use "cc -new ..." to invoke the new evaluator.
With -new-comp, computations happen in GF.Compile.GeneratePMCFG instead of
GF.Compile.Optimize. This is implemented by testing the flag optNewComp in
both modules, to omit calls to the old partial evaluator from GF.Compile.Optimize
and add calls to the new partial evaluator in GF.Compile.GeneratePMCFG.
This also means that -new-comp effectively implies -noexpand.
In GF.Compile.CheckGrammar, there is a check that restricted inheritance is used
correctly. However, when -noexpand is used, this check causes unexpected errors,
so it has been converted to generate warnings, for now.
-new-comp no longer enables the new type checker in
GF.Compile.TypeCheck.ConcreteNew.
The GF version number has been bumped to 3.3.10-darcs.
There were 55 lines of rather repetitive code with calls to 6 compiler passes.
They have been replaced with 19 lines that call the 6 compiler passes
plus 26 lines of helper functions.
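The general shape of such a refactoring can be sketched as below: the passes are listed once and a small helper runs them in sequence. The names Pass and runPasses are hypothetical, and the real passes work on GF modules rather than on a generic value; this is only an illustration of the pattern, not the code from the patch.

```haskell
import Control.Monad (foldM)

-- A compiler pass either fails with a message or returns a new value
-- (hypothetical type, for illustration only).
type Pass a = a -> Either String a

-- Run named passes in order, stopping at the first failure and
-- prefixing the error message with the name of the failing pass.
runPasses :: [(String, Pass a)] -> a -> Either String a
runPasses passes x0 = foldM step x0 passes
  where
    step x (name, pass) = case pass x of
      Left err -> Left (name ++ ": " ++ err)
      Right y  -> Right y
```

The per-pass boilerplate (error reporting, threading the result to the next pass) then lives in one place, and the call site reduces to a list of passes.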