+ Abstract syntax is now converted directly from the Grammar and not via PGF,
so you can use `gf -batch -no-pmcfg -f canonical_gf ...` to export to
canonical_gf while skipping PMCFG and PGF file generation completely.
+ Flags that are normally copied to PGF files are now included in the
canonical_gf output as well (in particular the startcat flag).
This output format converts a GF grammar to a "canonical" GF grammar. A
canonical GF grammar consists of
- one self-contained module for the abstract syntax
- one self-contained module per concrete syntax
The concrete syntax modules contain param, lincat and lin definitions;
everything else has been eliminated by the partial evaluator, including
references to resource library modules and functors. Record types
and tables are retained.
The -output-format canonical_gf option writes canonical GF grammars to a
subdirectory "canonical/". The canonical GF grammars are written as
normal GF ".gf" source files, which can be compiled with GF in the normal way.
The translation to canonical form goes via an AST for canonical GF grammars,
defined in GF.Grammar.Canonical. This is a simple, self-contained format that
doesn't cover everything in GF (e.g. it omits dependent types and HOAS), but it
is complete enough to translate the Foods and Phrasebook grammars found in
gf-contrib. The AST is based on the GF grammar "GFCanonical" presented here:
https://github.com/GrammaticalFramework/gf-core/issues/30#issuecomment-453556553
The translation of concrete syntax to canonical form is based on the
previously existing translation of concrete syntax to Haskell, implemented
in module GF.Compile.ConcreteToHaskell. This module could now be reimplemented
and simplified significantly by going via the canonical format. Perhaps exports
to other output formats could benefit by going via the canonical format too.
There is also the possibility of completing the GFCanonical grammar
mentioned above and using GF itself to convert canonical GF grammars to
other formats...
When debugging labels, I find it useful to have comments giving
the original sentence (lazy, I know) and the original tree (depending
on the treebank, the trees can be similar).
I know this is not the goal exactly, but UDv2 treebanks
(http://universaldependencies.org/format.html) should always have a
'text =' comment, and a 'sent_id =' comment (which would be easy to
implement too, but not that useful).
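For illustration, the two UDv2 comment lines mentioned above could be produced as follows. This is a minimal shell sketch, not the code in this commit; the function name `conllu_header` and the sample values are hypothetical.

```shell
# Hypothetical helper: print the two sentence-level comments that UDv2
# treebanks carry before the token lines of a sentence.
conllu_header() {
  # $1 = sentence id, $2 = original sentence text
  printf '# sent_id = %s\n# text = %s\n' "$1" "$2"
}

conllu_header 1 "the cat sleeps"
```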
Traditionally, GF_LIB_PATH points to something like
`.../share/ghc-8.0.2-x86_64/gf-3.9/lib`
and if you want prelude and alltenses and present, you add a
`--# -path=.:present`
compiler pragma to the top of your .gf file.
But if you are developing some kind of application grammar
library or contrib of your own, you might find yourself
repeating your library path at the top of all your .gf files.
After painstakingly maintaining the same library path at the
top of all your .gf files, you might say, let's factor this
out into GF_LIB_PATH.
You might then find to your surprise that GF_LIB_PATH
doesn't accept the usual colon:separated:path notation
familiar from, say, unix PATH and MANPATH.
This patch allows you to define
`GF_LIB_PATH=gf-3.9.lib:$HOME/gf-contrib/whatever/lib`
in a more natural way.
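The idea of a colon-separated search path can be pictured with the following shell sketch. It is an illustration only, not GF's implementation: the function name `first_in_lib_path` is hypothetical, and it assumes that directories are tried left to right and the first match wins.

```shell
# Hypothetical illustration of a colon-separated search path:
# print the first directory in $1 that contains the file named $2.
first_in_lib_path() {
  rest=$1:           # trailing colon so the last entry is handled like the others
  while [ -n "$rest" ]; do
    dir=${rest%%:*}  # text before the first colon
    rest=${rest#*:}  # text after the first colon
    if [ -n "$dir" ] && [ -e "$dir/$2" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1           # not found in any directory
}
```

Calling `first_in_lib_path "$GF_LIB_PATH" MyLib.gf` (with `MyLib.gf` standing in for any module file) would print the directory that supplies the file. Under the left-to-right assumption, an earlier directory shadows a later one.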
If you are an RGL hacker and have your own version of the
RGL tree sitting somewhere, you should be able to have both
paths in the GF_LIB_PATH, for added convenience. This minor
convenience will probably lead to obscure bugs and great
frustration when you find that your changes are mysteriously
not being picked up by GF; so keep this in mind and use it
cautiously.
This caution should probably sit in the documentation
somewhere. A subsequent commit will do that.
If you use zsh, you can do this to quickly build up a big
GF_LIB_PATH:
% gf_lib_path=( $HOME/src/GF/lib/src/{api,abstract,common,english,api/libraryBrowser,prelude,..} )
% typeset -xT GF_LIB_PATH gf_lib_path
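In shells without zsh's tied arrays, a similar result can be had by joining the directories by hand. The following is a minimal POSIX sh sketch; the function name `join_lib_path` is hypothetical, and the subdirectory names are a subset of those in the zsh example above.

```shell
# Hypothetical helper: join "<base>/<subdir>" entries with colons.
# $1 = base directory, remaining arguments = subdirectory names
join_lib_path() {
  base=$1
  shift
  joined=
  for sub in "$@"; do
    # Put a colon before every entry except the first.
    joined=${joined:+$joined:}$base/$sub
  done
  printf '%s\n' "$joined"
}

GF_LIB_PATH=$(join_lib_path "$HOME/src/GF/lib/src" api abstract common english prelude)
export GF_LIB_PATH
```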