+ Abstract syntax now is converted directly from the Grammar and not via PGF,
so you can use `gf -batch -no-pmcfg -f canonical_gf ...`, to export to
canonical_gf while skipping PMCFG and PGF file generation completely.
+ Flags that are normally copied to PGF files are now included in the
caninical_gf output as well (in particular the startcat flag).
This output format converts a GF grammar to a "canonical" GF grammar. A
canonical GF grammar consists of
- one self-contained module for the abstract syntax
- one self-contained module per concrete syntax
The concrete syntax modules contain param, lincat and lin definitions,
everything else has been eliminated by the partial evaluator, including
references to resource library modules and functors. Record types
and tables are retained.
The -output-format canonical_gf option writes canonical GF grammars to a
subdirectory "canonical/". The canonical GF grammars are written as
normal GF ".gf" source files, which can be compiled with GF in the normal way.
The translation to canonical form goes via an AST for canonical GF grammars,
defined in GF.Grammar.Canonical. This is a simple, self-contained format that
doesn't cover everyting in GF (e.g. omitting dependent types and HOAS), but it
is complete enough to translate the Foods and Phrasebook grammars found in
gf-contrib. The AST is based on the GF grammar "GFCanonical" presented here:
https://github.com/GrammaticalFramework/gf-core/issues/30#issuecomment-453556553
The translation of concrete syntax to canonical form is based on the
previously existing translation of concrete syntax to Haskell, implemented
in module GF.Compile.ConcreteToHaskell. This module could now be reimplemented
and simplified significantly by going via the canonical format. Perhaps exports
to other output formats could benefit by going via the canonical format too.
There is also the possibility of completing the GFCanonical grammar
mentioned above and using GF itself to convert canonical GF grammars to
other formats...
when debbuging labels, I find it useful to have comments saying what's
the original sentence (lazy, I know) and the original tree (depending
on the treebank, the trees can be similar).
I know this is not the goal exactly, but UDv2 treebanks
(http://universaldependencies.org/format.html) should always have a
'text =' comment, and a 'sent_id =' comment (which would be easy to
implement too, but not that useful).
The raw HTML was invalid, and this way we use the common website template
for a uniform look without any duplication.
It seems gf-refman.html was once generated from txt2tags, although I have
been unable to find this original .t2t file.
I also tried to re-generate txt2tags from HTML but was not able to.
However I was able to convert HTML to Markdown using Pandoc and I think
the result is pretty good, so I think we should use this.
The original gf-refman.html can be obtained from git history, e.g.:
a7e43d872f/doc/gf-refman.html