mirror of
https://github.com/GrammaticalFramework/gf-core.git
synced 2026-05-21 17:12:50 -06:00
reorganize the directories under src, and rescue the JavaScript interpreter from deprecated
260
src/FILES
@@ -1,260 +0,0 @@
Code map for GF source files.

$Author: peb $
$Date: 2005/02/07 10:58:08 $

Directories:

[top level]   GF main function and runtime-related modules
api           high-level access to GF functionalities
canonical     GFC (= GF Canonical) basic functionalities
cf            context-free skeleton used in parsing
cfgm          multilingual context-free skeleton exported to Java
compile       compilation phases from GF to GFC
conversions   [OBSOLETE] formats used in parser generation
for-ghc       GHC-specific files (Glasgow Haskell Compiler)
for-hugs      Hugs-specific files (a Haskell interpreter)
for-windows   Windows-specific files (an operating system from Microsoft)
grammar       basic functionalities of GF grammars used in compilation
infra         GF-independent infrastructure and auxiliaries
newparsing    parsing with GF grammars: current version (cf. parsing)
notrace       debugging utilities for parser development (cf. trace)
parsers       parsers of GF and GFC files
parsing       [OBSOLETE] parsing with GF grammars: old version (cf. newparsing)
shell         interaction shells
source        utilities for reading in GF source files
speech        generation of speech recognition grammars
trace         debugging utilities for parser development (cf. notrace)
useGrammar    grammar functionalities for applications
util          utilities for using GF


Individual files:

GF.hs                               the Main module
GFModes.hs
HelpFile.hs                         [AUTO] help file generated by util/MkHelpFile
Today.hs                            [AUTO] file generated by "make today"

api/API.hs                          high-level access to GF functionalities
api/BatchTranslate.hs
api/GetMyTree.hs
api/GrammarToHaskell.hs
api/IOGrammar.hs
api/MyParser.hs                     slot for defining your own parser

canonical/AbsGFC.hs                 [AUTO] abstract syntax of GFC
canonical/CanonToGrammar.hs
canonical/CMacros.hs
canonical/ErrM.hs
canonical/GetGFC.hs
canonical/GFC.cf                    [LBNF] source of GFC parser
canonical/GFC.hs
canonical/LexGFC.hs
canonical/Look.hs
canonical/MkGFC.hs
canonical/PrExp.hs
canonical/PrintGFC.hs               pretty-printer of GFC
canonical/Share.hs
canonical/SkelGFC.hs                [AUTO]
canonical/TestGFC.hs                [AUTO]
canonical/Unlex.hs

cf/CanonToCF.hs
cf/CF.hs                            abstract syntax of context-free grammars
cf/CFIdent.hs
cf/CFtoGrammar.hs
cf/CFtoSRG.hs
cf/ChartParser.hs                   the current default parsing method
cf/EBNF.hs
cf/PPrCF.hs
cf/PrLBNF.hs
cf/Profile.hs

cfgm/CFG.cf                         [LBNF] source
cfgm/AbsCFG.hs                      [AUTO]
cfgm/LexCFG.hs                      [AUTO]
cfgm/ParCFG.hs                      [AUTO]
cfgm/PrintCFG.hs                    [AUTO]
cfgm/PrintCFGrammar.hs

compile/CheckGrammar.hs
compile/Compile.hs                  the complete compiler pipeline
compile/Extend.hs
compile/GetGrammar.hs
compile/GrammarToCanon.hs
compile/MkResource.hs
compile/MkUnion.hs
compile/ModDeps.hs
compile/Optimize.hs
compile/PGrammar.hs
compile/PrOld.hs
compile/Rebuild.hs
compile/RemoveLiT.hs
compile/Rename.hs
compile/ShellState.hs               the run-time multilingual grammar datastructure
compile/Update.hs

for-ghc/ArchEdit.hs
for-ghc/Arch.hs

for-ghc-nofud/ArchEdit.hs@
for-ghc-nofud/Arch.hs@

for-hugs/ArchEdit.hs
for-hugs/Arch.hs
for-hugs/JGF.hs
for-hugs/MoreCustom.hs
for-hugs/Unicode.hs

for-hugs/Arch.hs
for-hugs/ArchEdit.hs
for-hugs/JGF.hs
for-hugs/LexCFG.hs                  dummy CFG lexer
for-hugs/LexGF.hs                   dummy GF lexer
for-hugs/LexGFC.hs                  dummy GFC lexer
for-hugs/MoreCustom.hs
for-hugs/ParCFG.hs                  dummy CFG parser
for-hugs/ParGFC.hs                  dummy GFC parser
for-hugs/ParGF.hs                   dummy GF parser
for-hugs/Tracing.hs
for-hugs/Unicode.hs

for-windows/ArchEdit.hs
for-windows/Arch.hs

grammar/AbsCompute.hs
grammar/Abstract.hs                 GF and GFC abstract syntax datatypes
grammar/AppPredefined.hs
grammar/Compute.hs
grammar/Grammar.hs                  GF source grammar datatypes
grammar/LookAbs.hs
grammar/Lookup.hs
grammar/Macros.hs                   macros for creating GF terms and types
grammar/MMacros.hs                  more macros, mainly for abstract syntax
grammar/PatternMatch.hs
grammar/PrGrammar.hs                the top-level grammar printer
grammar/Refresh.hs
grammar/ReservedWords.hs
grammar/TC.hs                       Coquand's type checking engine
grammar/TypeCheck.hs
grammar/Unify.hs
grammar/Values.hs

infra/Arabic.hs                     ASCII coding of Arabic Unicode
infra/Assoc.hs                      finite maps/association lists as binary search trees
infra/CheckM.hs
infra/Comments.hs
infra/Devanagari.hs                 ASCII coding of Devanagari Unicode
infra/ErrM.hs
infra/Ethiopic.hs
infra/EventF.hs
infra/ExtendedArabic.hs
infra/ExtraDiacritics.hs
infra/FudgetOps.hs
infra/Glue.hs
infra/Greek.hs
infra/Hebrew.hs
infra/Hiragana.hs
infra/Ident.hs
infra/LatinASupplement.hs
infra/Map.hs                        finite maps as red black trees
infra/Modules.hs
infra/OCSCyrillic.hs
infra/Operations.hs                 library of strings, search trees, error monads
infra/Option.hs
infra/OrdMap2.hs                    abstract class of finite maps + implementation as association lists
infra/OrdSet.hs                     abstract class of sets + implementation as sorted lists
infra/Parsers.hs
infra/ReadFiles.hs
infra/RedBlack.hs                   red black trees
infra/RedBlackSet.hs                sets and maps as red black trees
infra/Russian.hs
infra/SortedList.hs                 sets as sorted lists
infra/Str.hs
infra/Tamil.hs
infra/Text.hs
infra/Trie2.hs
infra/Trie.hs
infra/UnicodeF.hs
infra/Unicode.hs
infra/UseIO.hs
infra/UTF8.hs                       UTF8 en/decoding
infra/Zipper.hs

newparsing/CFGrammar.hs             type definitions for context-free grammars
newparsing/CFParserGeneral.hs       several variants of general CFG chart parsing
newparsing/CFParserIncremental.hs   several variants of incremental (Earley-style) CFG chart parsing
newparsing/ConvertGFCtoMCFG.hs      converting GFC to MCFG
newparsing/ConvertGrammar.hs        conversions between different grammar formats
newparsing/ConvertMCFGtoCFG.hs      converting MCFG to CFG
newparsing/GeneralChart.hs          Haskell framework for "parsing as deduction"
newparsing/GrammarTypes.hs          instantiations of grammar types
newparsing/IncrementalChart.hs      Haskell framework for incremental chart parsing
newparsing/MCFGrammar.hs            type definitions for multiple CFG
newparsing/MCFParserBasic.hs        MCFG chart parser
newparsing/MCFRange.hs              ranges for MCFG parsing
newparsing/ParseCFG.hs              parsing of CFG
newparsing/ParseCF.hs               parsing of the CF format
newparsing/ParseGFC.hs              parsing of GFC
newparsing/ParseMCFG.hs             parsing of MCFG
newparsing/Parser.hs                general definitions for parsers
newparsing/PrintParser.hs           pretty-printing class for parsers
newparsing/PrintSimplifiedTerm.hs   simplified pretty-printing for GFC terms

notrace/Tracing.hs                  tracing predicates when we DON'T want tracing capabilities (normal case)

parsers/ParGFC.hs                   [AUTO]
parsers/ParGF.hs                    [AUTO]

shell/CommandF.hs
shell/CommandL.hs                   line-based syntax of editor commands
shell/Commands.hs                   commands of GF editor shell
shell/IDE.hs
shell/JGF.hs
shell/PShell.hs
shell/ShellCommands.hs              commands of GF main shell
shell/Shell.hs
shell/SubShell.hs
shell/TeachYourself.hs

source/AbsGF.hs                     [AUTO]
source/ErrM.hs
source/GF.cf                        [LBNF] source of GF parser
source/GrammarToSource.hs
source/LexGF.hs                     [AUTO]
source/PrintGF.hs                   [AUTO]
source/SourceToGrammar.hs

speech/PrGSL.hs
speech/PrJSGF.hs
speech/SRG.hs
speech/TransformCFG.hs

trace/Tracing.hs                    tracing predicates when we want tracing capabilities

translate/GFT.hs                    Main module of html-producing batch translator

useGrammar/Custom.hs                database for customizable commands
useGrammar/Editing.hs
useGrammar/Generate.hs
useGrammar/GetTree.hs
useGrammar/Information.hs
useGrammar/Linear.hs                the linearization algorithm
useGrammar/MoreCustom.hs
useGrammar/Morphology.hs
useGrammar/Paraphrases.hs
useGrammar/Parsing.hs               the top-level parsing algorithm
useGrammar/Randomized.hs
useGrammar/RealMoreCustom.hs
useGrammar/Session.hs
useGrammar/TeachYourself.hs
useGrammar/Tokenize.hs              lexer definitions (listed in Custom)
useGrammar/Transfer.hs

util/GFDoc.hs                       utility for producing LaTeX and HTML from GF
util/HelpFile                       source of ../HelpFile.hs
util/Htmls.hs                       utility for chopping a HTML document to slides
util/MkHelpFile.hs
util/WriteF.hs
693
src/HelpFile
@@ -1,693 +0,0 @@
-- GF help file updated for GF 2.6, 17/6/2006.
-- *: Commands and options marked with * are currently not implemented.
--
-- Each command has a long and a short name, options, and zero or more
-- arguments. Commands are sorted by functionality. The short name is
-- given first.

-- Type "h -all" for full help file, "h <CommandName>" for full help on a command.

-- commands that change the state

i, import: i File
  Reads a grammar from File and compiles it into a GF runtime grammar.
  Files "include"d in File are read recursively, nubbing repetitions.
  If a grammar with the same language name is already in the state,
  it is overwritten - but only if compilation succeeds.
  The grammar parser depends on the file name suffix:
    .gf    normal GF source
    .gfc   canonical GF
    .gfr   precompiled GF resource
    .gfcm  multilingual canonical GF
    .gfe   example-based grammar files (only with the -ex option)
    .gfwl  multilingual word list (preprocessed to abs + cncs)
    .ebnf  Extended BNF format
    .cf    Context-free (BNF) format
    .trc   TransferCore format
  options:
    -old          old: parse in GF<2.0 format (not necessary)
    -v            verbose: give lots of messages
    -s            silent: don't give error messages
    -src          from source: ignore precompiled gfc and gfr files
    -gfc          from gfc: use compiled modules whenever they exist
    -retain       retain operations: read resource modules (needed in comm cc)
    -nocf         don't build old-style context-free grammar (default without HOAS)
    -docf         do build old-style context-free grammar (default with HOAS)
    -nocheckcirc  don't eliminate circular rules from CF
    -cflexer      build an optimized parser with separate lexer trie
    -noemit       do not emit code (default with old grammar format)
    -o            do emit code (default with new grammar format)
    -ex           preprocess .gfe files if needed
    -prob         read probabilities from top grammar file (format --# prob Fun Double)
    -treebank     read a treebank file to memory (xml format)
  flags:
    -abs          set the name used for abstract syntax (with -old option)
    -cnc          set the name used for concrete syntax (with -old option)
    -res          set the name used for resource (with -old option)
    -path         use the (colon-separated) search path to find modules
    -optimize     select an optimization to override file-defined flags
    -conversion   select parsing method (values strict|nondet)
    -probs        read probabilities from file (format (--# prob) Fun Double)
    -preproc      use a preprocessor on each source file
    -noparse      read nonparsable functions from file (format --# noparse Funs)
  examples:
    i English.gf                       -- ordinary import of Concrete
    i -retain german/ParadigmsGer.gf   -- import of Resource to test
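
The suffix-based choice of grammar parser described for the import command can be pictured as a plain lookup table. The sketch below is illustrative Python, not GF's own (Haskell) code: the names `SUFFIX_FORMATS` and `grammar_format` are hypothetical, and only the suffixes, their descriptions, and the `-ex` restriction on `.gfe` files are taken from the help text.

```python
import os

# Suffix table copied from the help text; the names here are illustrative only.
SUFFIX_FORMATS = {
    ".gf": "normal GF source",
    ".gfc": "canonical GF",
    ".gfr": "precompiled GF resource",
    ".gfcm": "multilingual canonical GF",
    ".gfe": "example-based grammar file",
    ".gfwl": "multilingual word list",
    ".ebnf": "Extended BNF format",
    ".cf": "context-free (BNF) format",
    ".trc": "TransferCore format",
}


def grammar_format(path, ex_option=False):
    """Return the format description for a grammar file, chosen by its suffix."""
    suffix = os.path.splitext(path)[1]
    if suffix not in SUFFIX_FORMATS:
        raise ValueError("unknown grammar file suffix: " + repr(suffix))
    # Per the help text, .gfe files are only handled with the -ex option.
    if suffix == ".gfe" and not ex_option:
        raise ValueError(".gfe files require the -ex option")
    return SUFFIX_FORMATS[suffix]


print(grammar_format("English.gf"))
print(grammar_format("Letter.gfcm"))
```

A table keeps the dispatch in one place, so supporting a further suffix would be a one-line change in this sketch.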

r, reload: r
  Executes the previous import (i) command.

rl, remove_language: rl Language
  Takes away the language from the state.

e, empty: e
  Takes away all languages and resets all global flags.

sf, set_flags: sf Flag*
  The values of the Flags are set for Language. If no language
  is specified, the flags are set globally.
  examples:
    sf -nocpu     -- stop showing CPU time
    sf -lang=Swe  -- make Swe the default concrete

s, strip: s
  Prune the state by removing source and resource modules.

dc, define_command: dc Name Anything
  Add a new defined command. The Name must start with '%'. Later,
  if 'Name X' is used, it is replaced by Anything where #1 is replaced
  by X.
  Restrictions: Currently at most one argument is possible, and a defined
  command cannot appear in a pipe.
  To see what definitions are in scope, use help -defs.
  examples:
    dc %tnp p -cat=NP -lang=Eng #1 | l -lang=Swe  -- translate NPs
    %tnp "this man"                               -- translate and parse
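
The substitution rule for defined commands ('Name X' becomes Anything with #1 replaced by X) is a one-argument macro expansion. The following Python sketch only illustrates that rule under the restrictions stated above; the function names `define_command` and `expand` and the dictionary `defs` are hypothetical and do not describe how the GF shell stores its definitions.

```python
def define_command(defs, name, body):
    """Register a macro; the help text requires names to start with '%'."""
    if not name.startswith("%"):
        raise ValueError("defined command names must start with '%'")
    defs[name] = body


def expand(defs, line):
    """Expand 'Name X' to the stored body with '#1' replaced by X."""
    name, _, arg = line.strip().partition(" ")
    if name not in defs:
        return line  # not a defined command: leave the line untouched
    return defs[name].replace("#1", arg.strip())


defs = {}
define_command(defs, "%tnp", "p -cat=NP -lang=Eng #1 | l -lang=Swe")
print(expand(defs, '%tnp "this man"'))
# prints: p -cat=NP -lang=Eng "this man" | l -lang=Swe
```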

dt, define_term: dt Name Tree
  Add a constant for a tree. The constant can later be called by
  prefixing it with '$'.
  Restriction: These terms are not yet usable as a subterm.
  To see what definitions are in scope, use help -defs.
  examples:
    p -cat=NP "this man" | dt tm  -- define tm as parse result
    l -all $tm                    -- linearize tm in all forms

-- commands that give information about the state

pg, print_grammar: pg
  Prints the actual grammar (overridden by the -lang=X flag).
  The -printer=X flag sets the format in which the grammar is
  written.
  N.B. since grammars are compiled when imported, this command
  generally does not show the grammar in the same format as the
  source. In particular, the -printer=latex is not supported.
  Use the command tg -printer=latex File to print the source
  grammar in LaTeX.
  options:
    -utf8      apply UTF8-encoding to the grammar
  flags:
    -printer
    -lang
    -startcat  -- The start category of the generated grammar.
               Only supported by some grammar printers.
  examples:
    pg -printer=cf  -- show the context-free skeleton

pm, print_multigrammar: pm
  Prints the current multilingual grammar in .gfcm form.
  (Automatically executes the strip command (s) before doing this.)
  options:
    -utf8    apply UTF8 encoding to the tokens in the grammar
    -utf8id  apply UTF8 encoding to the identifiers in the grammar
  examples:
    pm | wf Letter.gfcm           -- print the grammar into the file Letter.gfcm
    pm -printer=graph | wf D.dot  -- then do 'dot -Tps D.dot > D.ps'

vg, visualize_graph: vg
  Show the dependency graph of multilingual grammar via dot and gv.

po, print_options: po
  Print what modules there are in the state. Also
  prints those flag values in the current state that differ from defaults.

pl, print_languages: pl
  Prints the names of currently available languages.

pi, print_info: pi Ident
  Prints information on the identifier.

-- commands that execute and show the session history

eh, execute_history: eh File
  Executes commands in the file.

ph, print_history: ph
  Prints the commands issued during the GF session.
  The result is readable by the eh command.
  examples:
    ph | wf foo.hist  -- save the history into a file

-- linearization, parsing, translation, and computation

l, linearize: l PattList? Tree
  Shows all linearization forms of Tree by the actual grammar
  (which is overridden by the -lang flag).
  The pattern list has the form [P, ... ,Q] where P,...,Q follow GF
  syntax for patterns. All those forms are generated that match with the
  pattern list. Too short lists are filled with variables in the end.
  Only the -table flag is available if a pattern list is specified.
  HINT: see GF language specification for the syntax of Pattern and Term.
  You can also copy and paste parsing results.
  options:
    -struct   bracketed form
    -table    show parameters (not compatible with -record, -all)
    -record   record, i.e. explicit GF concrete syntax term (not compatible with -table, -all)
    -all      show all forms and variants (not compatible with -record, -table)
    -multi    linearize to all languages (can be combined with the other options)
  flags:
    -lang     linearize in this grammar
    -number   give this number of forms at most
    -unlexer  filter output through unlexer
  examples:
    l -lang=Swe -table  -- show full inflection table in Swe
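
The remark that too short pattern lists are "filled with variables in the end" can be read as padding the list with wildcards before matching it against each form's parameters. The Python sketch below shows that reading on a made-up inflection table. Everything in it is an assumption for illustration: real GF patterns are richer than constants and a single wildcard, and the names `pad_patterns` and `matching_forms` are hypothetical.

```python
WILDCARD = "_"  # stands in for a GF pattern variable in this sketch


def pad_patterns(patterns, arity):
    """Fill a too-short pattern list with wildcards at the end."""
    if len(patterns) > arity:
        raise ValueError("pattern list longer than the parameter list")
    return list(patterns) + [WILDCARD] * (arity - len(patterns))


def matching_forms(table, patterns):
    """Select the forms whose parameter tuple matches the pattern list."""
    result = {}
    for params, form in table.items():
        padded = pad_patterns(patterns, len(params))
        if all(p == WILDCARD or p == v for p, v in zip(padded, params)):
            result[params] = form
    return result


# Hypothetical inflection table keyed by (number, case) parameters.
table = {
    ("Sg", "Nom"): "man",
    ("Sg", "Gen"): "man's",
    ("Pl", "Nom"): "men",
    ("Pl", "Gen"): "men's",
}

# The one-element list ["Pl"] is padded to ["Pl", "_"] and selects both plural forms.
print(matching_forms(table, ["Pl"]))
```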

p, parse: p String
  Shows all Trees returned for String by the actual
  grammar (overridden by the -lang flag), in the category S (overridden
  by the -cat flag).
  options for batch input:
    -lines      parse each line of input separately, ignoring empty lines
    -all        as -lines, but also parse empty lines
    -prob       rank results by probability
    -cut        stop after first lexing result leading to parser success
    -fail       show strings whose parse fails prefixed by #FAIL
    -ambiguous  show strings that have more than one parse prefixed by #AMBIGUOUS
  options for selecting parsing method:
    -fcfg       parse using a fast variant of MCFG (default if no HOAS in grammar)
    -old        parse using an overgenerating CFG (default if HOAS in grammar)
    -cfg        parse using a much less overgenerating CFG
    -mcfg       parse using an even less overgenerating MCFG
    Note: the first time parsing with -cfg, -mcfg, and -fcfg may take a long time
  options that only work for the -old default parsing method:
    -n          non-strict: tolerates morphological errors
    -ign        ignore unknown words when parsing
    -raw        return context-free terms in raw form
    -v          verbose: give more information if parsing fails
  flags:
    -cat        parse in this category
    -lang       parse in this grammar
    -lexer      filter input through this lexer
    -parser     use this parsing strategy
    -number     return this many results at most
  examples:
    p -cat=S -mcfg "jag är gammal"  -- parse an S with the MCFG
    rf examples.txt | p -lines      -- parse each non-empty line of the file

at, apply_transfer: at (Module.Fun | Fun)
  Transfer a term using Fun from Module, or the topmost transfer
  module. Transfer modules are given in the .trc format. They are
  shown by the 'po' command.
  flags:
    -lang  typecheck the result in this lang instead of default lang
  examples:
    p -lang=Cncdecimal "123" | at num2bin | l  -- convert dec to bin

tb, tree_bank: tb
  Generate a multilingual treebank from a list of trees (default) or compare
  to an existing treebank.
  options:
    -c       compare to existing xml-formatted treebank
    -trees   return the trees of the treebank
    -all     show all linearization alternatives (branches and variants)
    -table   show tables of linearizations with parameters
    -record  show linearization records
    -xml     wrap the treebank (or comparison results) with XML tags
    -mem     write the treebank in memory instead of a file TODO
  examples:
    gr -cat=S -number=100 | tb -xml | wf tb.xml  -- random treebank into file
    rf tb.xml | tb -c                            -- compare-test treebank from file
    rf old.xml | tb -trees | tb -xml             -- create new treebank from old

ut, use_treebank: ut String
  Look up a string in a treebank and return the resulting trees.
  Use 'tb' to create a treebank and 'i -treebank' to read one from
  a file.
  options:
    -assocs    show all string-trees associations in the treebank
    -strings   show all strings in the treebank
    -trees     show all trees in the treebank
    -raw       return the lookup result as string, without typechecking it
  flags:
    -treebank  use this treebank (instead of the latest introduced one)
  examples:
    ut "He adds this to that" | l -multi  -- use treebank lookup as parser in translation
    ut -assocs | grep "ComplV2"           -- show all associations with ComplV2

tt, test_tokenizer: tt String
  Show the token list sent to the parser when String is parsed.
  HINT: can be useful when debugging the parser.
  flags:
    -lexer  use this lexer
  examples:
    tt -lexer=codelit "2*(x + 3)"  -- a favourite lexer for program code

g, grep: g String1 String2
  Grep the String1 in the String2. String2 is read line by line,
  and only those lines that contain String1 are returned.
  flags:
    -v  return those lines that do not contain String1.
  examples:
    pg -printer=cf | grep "mother"  -- show cf rules with word mother
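
The behaviour described for grep (keep the lines of String2 that contain String1, or with -v those that do not) amounts to a line filter. A minimal Python sketch follows, assuming plain substring matching as the help text suggests; the function name `gf_grep` and the sample rules are hypothetical.

```python
def gf_grep(needle, text, invert=False):
    """Return the lines of text that contain needle, or with invert those that do not."""
    kept = [line for line in text.splitlines() if (needle in line) != invert]
    return "\n".join(kept)


# Made-up context-free rules standing in for the output of 'pg -printer=cf'.
rules = 'S ::= NP VP\nN ::= "mother"\nN ::= "father"'

print(gf_grep("mother", rules))
print(gf_grep("mother", rules, invert=True))
```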

cc, compute_concrete: cc Term
  Compute a term by concrete syntax definitions. Uses the topmost
  resource module (the last in listing by command po) to resolve
  constant names.
  N.B. You need the flag -retain when importing the grammar, if you want
  the oper definitions to be retained after compilation; otherwise this
  command does not expand oper constants.
  N.B.' The resulting Term is not a term in the sense of abstract syntax,
  and hence not a valid input to a Tree-demanding command.
  flags:
    -table  show output in a similar readable format as 'l -table'
    -res    use another module than the topmost one
  examples:
    cc -res=ParadigmsFin (nLukko "hyppy")  -- inflect "hyppy" with nLukko

so, show_operations: so Type
  Show oper operations with the given value type. Uses the topmost
  resource module to resolve constant names.
  N.B. You need the flag -retain when importing the grammar, if you want
  the oper definitions to be retained after compilation; otherwise this
  command does not find any oper constants.
  N.B.' The value type may not be defined in a supermodule of the
  topmost resource. In that case, use appropriate qualified name.
  flags:
    -res  use another module than the topmost one
  examples:
    so -res=ParadigmsFin ResourceFin.N  -- show N-paradigms in ParadigmsFin

t, translate: t Lang Lang String
  Parses String in Lang1 and linearizes the resulting Trees in Lang2.
  flags:
    -cat
    -lexer
    -parser
  examples:
    t Eng Swe -cat=S "every number is even or odd"

gr, generate_random: gr Tree?
  Generates a random Tree of a given category. If a Tree
  argument is given, the command completes the Tree with values to
  the metavariables in the tree.
  options:
    -prob    use probabilities (works for nondep types only)
    -cf      use a very fast method (works for nondep types only)
  flags:
    -cat     generate in this category
    -lang    use the abstract syntax of this grammar
    -number  generate this number of trees (not impl. with Tree argument)
    -depth   use this number of search steps at most
  examples:
    gr -cat=Query            -- generate in category Query
    gr (PredVP ? (NegVG ?))  -- generate a random tree of this form
    gr -cat=S -tr | l        -- generate and linearize

gt, generate_trees: gt Tree?
  Generates all trees up to a given depth. If the depth is large,
  a small -alts is recommended. If a Tree argument is given, the
  command completes the Tree with values to the metavariables in
  the tree.
  options:
    -metas     also return trees that include metavariables
    -all       generate all (can be infinitely many, lazily)
    -lin       linearize result of -all (otherwise, use pipe to linearize)
  flags:
    -depth     generate to this depth (default 3)
    -atoms     take this number of atomic rules of each category (default unlimited)
    -alts      take this number of alternatives at each branch (default unlimited)
    -cat       generate in this category
    -nonub     don't remove duplicates (faster, not effective with -mem)
    -mem       use a memorizing algorithm (often faster, usually more memory-consuming)
    -lang      use the abstract syntax of this grammar
    -number    generate (at most) this number of trees (also works with -all)
    -noexpand  don't expand these categories (comma-separated, e.g. -noexpand=V,CN)
    -doexpand  only expand these categories (comma-separated, e.g. -doexpand=V,CN)
  examples:
    gt -depth=10 -cat=NP               -- generate all NP's to depth 10
    gt (PredVP ? (NegVG ?))            -- generate all trees of this form
    gt -cat=S -tr | l                  -- generate and linearize
    gt -noexpand=NP | l -mark=metacat  -- the only NP is meta, linearized "?0 +NP"
    gt | l | p -lines -ambiguous | grep "#AMBIGUOUS"  -- show ambiguous strings
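
Generating all trees up to a given depth can be sketched as a recursion over the functions of an abstract syntax: a tree of depth at most n applies a function to argument trees of depth at most n - 1. The Python below shows this on a made-up three-function grammar. It is an illustration under stated assumptions, not GF's algorithm: the grammar `FUNS`, the names `generate_trees` and `show`, and the depth convention (a constant has depth 1) are all hypothetical, and flags such as -alts, -atoms and -mem are not modelled.

```python
from itertools import product

# Hypothetical toy abstract syntax: function name -> (argument categories, value category).
FUNS = {
    "Zero": ([], "Nat"),
    "Succ": (["Nat"], "Nat"),
    "Plus": (["Nat", "Nat"], "Nat"),
}


def generate_trees(cat, depth):
    """List all trees of category cat whose depth is at most depth."""
    if depth <= 0:
        return []
    trees = []
    for fun, (args, value) in FUNS.items():
        if value != cat:
            continue
        # Every argument may itself be any tree of depth at most depth - 1.
        choices = [generate_trees(arg, depth - 1) for arg in args]
        for combo in product(*choices):
            trees.append((fun,) + combo)
    return trees


def show(tree):
    """Render a tree in a GF-like prefix notation."""
    fun, args = tree[0], tree[1:]
    if not args:
        return fun
    return "(" + " ".join([fun] + [show(a) for a in args]) + ")"


for tree in generate_trees("Nat", 2):
    print(show(tree))
```

The number of trees grows quickly with depth even for this small grammar (1, 3, 13 trees for depths 1, 2, 3), which is consistent with the advice above to restrict alternatives when the depth is large.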

ma, morphologically_analyse: ma String
  Runs morphological analysis on each word in String and displays
  the results line by line.
  options:
    -short   show analyses in bracketed words, instead of separate lines
    -status  show just the word at success, prefixed with "*" at failure
  flags:
    -lang
  examples:
    rf Bible.txt | ma -short | wf Bible.tagged  -- analyse the Bible


-- elementary generation of Strings and Trees

ps, put_string: ps String
  Returns its argument String, like Unix echo.
  HINT. The strength of ps comes from the possibility to receive the
  argument from a pipeline, and altering it by the -filter flag.
  flags:
    -filter  filter the result through this string processor
    -length  cut the string after this number of characters
  examples:
    gr -cat=Letter | l | ps -filter=text  -- random letter as text

pt, put_tree: pt Tree
  Returns its argument Tree, like a specialized Unix echo.
  HINT. The strength of pt comes from the possibility to receive
  the argument from a pipeline, and altering it by the -transform flag.
  flags:
    -transform  transform the result by this term processor
    -number     generate this number of terms at most
  examples:
    p "zero is even" | pt -transform=solve  -- solve ?'s in parse result

* st, show_tree: st Tree
  Prints the tree as a string. Unlike pt, this command cannot be
  used in a pipe to produce a tree, since its output is a string.
  flags:
    -printer  show the tree in a special format (-printer=xml supported)

wt, wrap_tree: wt Fun
  Wraps the tree as the sole argument of Fun.
  flags:
    -c  compute the resulting new tree to normal form

vt, visualize_tree: vt Tree
  Shows the abstract syntax tree via dot and gv (via temporary files
  grphtmp.dot, grphtmp.ps).
  flags:
    -c  show categories only (no functions)
    -f  show functions only (no categories)
    -g  show as graph (sharing uses of the same function)
    -o  just generate the .dot file
  examples:
    p "hello world" | vt -o | wf my.dot ;; ! open -a GraphViz my.dot
    -- This writes the parse tree into my.dot and opens the .dot file
    -- with another application without generating .ps.

-- subshells

es, editing_session: es
  Opens an interactive editing session.
  N.B. Exit from a Fudget session is to the Unix shell, not to GF.
  options:
    -f  Fudget GUI (necessary for Unicode; only available in X Window System)

ts, translation_session: ts
  Translates input lines from any of the actual languages to all other ones.
  To exit, type a full stop (.) alone on a line.
  N.B. Exit from a Fudget session is to the Unix shell, not to GF.
  HINT: Set -parser and -lexer locally in each grammar.
  options:
    -f     Fudget GUI (necessary for Unicode; only available in X Window System)
    -lang  prepend translation results with language names
  flags:
    -cat   the parser category
  examples:
    ts -cat=Numeral -lang  -- translate numerals, show language names

tq, translation_quiz: tq Lang Lang
  Random-generates translation exercises from Lang1 to Lang2,
  keeping score of success.
  To interrupt, type a full stop (.) alone on a line.
  HINT: Set -parser and -lexer locally in each grammar.
  flags:
    -cat
  examples:
    tq -cat=NP TestResourceEng TestResourceSwe  -- quiz for NPs

tl, translation_list: tl Lang Lang
  Random-generates a list of ten translation exercises from Lang1
  to Lang2. The number can be changed by a flag.
  HINT: use wf to save the exercises in a file.
  flags:
    -cat
    -number
  examples:
    tl -cat=NP TestResourceEng TestResourceSwe  -- quiz list for NPs

mq, morphology_quiz: mq
  Random-generates morphological exercises,
  keeping score of success.
  To interrupt, type a full stop (.) alone on a line.
  HINT: use printname judgements in your grammar to
  produce nice expressions for desired forms.
  flags:
    -cat
    -lang
  examples:
    mq -cat=N -lang=TestResourceSwe  -- quiz for Swedish nouns

ml, morphology_list: ml
  Random-generates a list of ten morphological exercises,
  keeping score of success. The number can be changed with a flag.
  HINT: use wf to save the exercises in a file.
  flags:
    -cat
    -lang
    -number
  examples:
    ml -cat=N -lang=TestResourceSwe  -- quiz list for Swedish nouns


-- IO related commands

rf, read_file: rf File
  Returns the contents of File as a String; error if File does not exist.

wf, write_file: wf File String
  Writes String into File; File is created if it does not exist.
  N.B. the command overwrites File without a warning.

af, append_file: af File String
  Writes String into the end of File; File is created if it does not exist.

* tg, transform_grammar: tg File
  Reads File, parses as a grammar,
  but instead of compiling further, prints it.
  The environment is not changed. When parsing the grammar, the same file
  name suffixes are supported as in the i command.
  HINT: use this command to print the grammar in
  another format (the -printer flag); pipe it to wf to save this format.
  flags:
    -printer  (only -printer=latex supported currently)

* cl, convert_latex: cl File
  Reads File, which is expected to be in LaTeX form.
  Three environments are treated in special ways:
    \begGF - \end{verbatim}, which contains GF judgements,
    \begTGF - \end{verbatim}, which contains a GF expression (displayed)
    \begInTGF - \end{verbatim}, which contains a GF expression (inlined).
  Moreover, certain macros should be included in the file; you can
  get those macros by applying 'tg -printer=latex foo.gf' to any grammar
  foo.gf. Notice that the same File can be imported as a GF grammar,
  consisting of all the judgements in \begGF environments.
  HINT: pipe with 'wf Foo.tex' to generate a new Latex file.

sa, speak_aloud: sa String
|
||||
Uses the Flite speech generator to produce speech for String.
|
||||
Works for American English spelling.
|
||||
examples:
|
||||
h | sa -- listen to the list of commands
|
||||
gr -cat=S | l | sa -- generate a random sentence and speak it aloud
|
||||
|
||||
si, speech_input: si
|
||||
Uses an ATK speech recognizer to get speech input.
|
||||
flags:
|
||||
-lang: The grammar to use with the speech recognizer.
|
||||
-cat: The grammar category to get input in.
|
||||
-language: Use acoustic model and dictionary for this language.
|
||||
-number: The number of utterances to recognize.
|
||||
|
||||
h, help: h Command?
|
||||
Displays the paragraph concerning the command from this help file.
|
||||
Without the argument, shows the first lines of all paragraphs.
|
||||
options
|
||||
-all show the whole help file
|
||||
-defs show user-defined commands and terms
|
||||
-FLAG show the values of FLAG (works for grammar-independent flags)
|
||||
examples:
|
||||
h print_grammar -- show all information on the pg command
|
||||
|
||||
q, quit: q
|
||||
Exits GF.
|
||||
HINT: you can use 'ph | wf history' to save your session.
|
||||
|
||||
!, system_command: ! String
|
||||
Issues a system command. No value is returned to GF.
|
||||
example:
|
||||
! ls
|
||||
|
||||
?, system_command: ? String
|
||||
Issues a system command that receives its arguments from GF pipe
|
||||
and returns a value to GF.
|
||||
example:
|
||||
h | ? 'wc -l' | p -cat=Num
|
||||
|
||||
|
||||
-- Flags. The availability of flags is defined separately for each command.
|
||||
|
||||
-cat, category in which parsing is performed.
|
||||
The default is S.
|
||||
|
||||
-depth, the search depth in e.g. random generation.
|
||||
The default depends on application.
|
||||
|
||||
-filter, operation performed on a string. The default is identity.
|
||||
-filter=identity no change
|
||||
-filter=erase erase the text
|
||||
-filter=take100 show the first 100 characters
|
||||
-filter=length show the length of the string
|
||||
-filter=text format as text (punctuation, capitalization)
|
||||
-filter=code format as code (spacing, indentation)
|
||||
|
||||
-lang, grammar used when executing a grammar-dependent command.
|
||||
The default is the last-imported grammar.
|
||||
|
||||
-language, voice used by Festival as its --language flag in the sa command.
|
||||
The default is system-dependent.
|
||||
|
||||
-length, the maximum number of characters shown of a string.
|
||||
The default is unlimited.
|
||||
|
||||
-lexer, tokenization transforming a string into lexical units for a parser.
|
||||
The default is words.
|
||||
-lexer=words tokens are separated by spaces or newlines
|
||||
-lexer=literals like words, but GF integer and string literals recognized
|
||||
-lexer=vars like words, but "x","x_...","$...$" as vars, "?..." as meta
|
||||
-lexer=chars each character is a token
|
||||
-lexer=code use Haskell's lex
|
||||
-lexer=codevars like code, but treat unknown words as variables, ?? as meta
|
||||
-lexer=textvars like text, but treat unknown words as variables, ?? as meta
|
||||
-lexer=text with conventions on punctuation and capital letters
|
||||
-lexer=codelit like code, but treat unknown words as string literals
|
||||
-lexer=textlit like text, but treat unknown words as string literals
|
||||
-lexer=codeC use a C-like lexer
|
||||
-lexer=ignore like literals, but ignore unknown words
|
||||
-lexer=subseqs like ignore, but then try all subsequences from longest
|
||||
|
||||
-number, the maximum number of generated items in a list.
|
||||
The default is unlimited.
|
||||
|
||||
-optimize, optimization on generated code.
|
||||
The default is share for concrete, none for resource modules.
|
||||
Each of the flags can have the suffix _subs, which performs
|
||||
common subexpression elimination after the main optimization.
|
||||
Thus, -optimize=all_subs is the most aggressive one. The _subs
|
||||
strategy only works in GFC, and applies therefore in concrete but
|
||||
not in resource modules.
|
||||
-optimize=share share common branches in tables
|
||||
-optimize=parametrize first try parametrize then do share with the rest
|
||||
-optimize=values represent tables as courses-of-values
|
||||
-optimize=all first try parametrize then do values with the rest
|
||||
-optimize=none no optimization
|
||||
|
||||
-parser, parsing strategy. The default is chart. If -cfg or -mcfg are
|
||||
selected, only bottomup and topdown are recognized.
|
||||
-parser=chart bottom-up chart parsing
|
||||
-parser=bottomup a more up to date bottom-up strategy
|
||||
-parser=topdown top-down strategy
|
||||
-parser=old an old bottom-up chart parser
|
||||
|
||||
-printer, format in which the grammar is printed. The default is
|
||||
gfc. Those marked with M are (only) available for pm, the rest
|
||||
for pg.
|
||||
-printer=gfc GFC grammar
|
||||
-printer=gf GF grammar
|
||||
-printer=old old GF grammar
|
||||
-printer=cf context-free grammar, with profiles
|
||||
-printer=bnf context-free grammar, without profiles
|
||||
-printer=lbnf labelled context-free grammar for BNF Converter
|
||||
-printer=plbnf grammar for BNF Converter, with precedence levels
|
||||
*-printer=happy source file for Happy parser generator (use lbnf!)
|
||||
-printer=haskell abstract syntax in Haskell, with transl to/from GF
|
||||
-printer=haskell_gadt abstract syntax GADT in Haskell, with transl to/from GF
|
||||
-printer=morpho full-form lexicon, long format
|
||||
*-printer=latex LaTeX file (for the tg command)
|
||||
-printer=fullform full-form lexicon, short format
|
||||
*-printer=xml XML: DTD for the pg command, object for st
|
||||
-printer=old old GF: file readable by GF 1.2
|
||||
-printer=stat show some statistics of generated GFC
|
||||
-printer=probs show probabilities of all functions
|
||||
-printer=gsl Nuance GSL speech recognition grammar
|
||||
-printer=jsgf Java Speech Grammar Format
|
||||
-printer=jsgf_sisr_old Java Speech Grammar Format with semantic tags in
|
||||
SISR WD 20030401 format
|
||||
-printer=srgs_abnf SRGS ABNF format
|
||||
-printer=srgs_abnf_non_rec SRGS ABNF format, without any recursion.
|
||||
-printer=srgs_abnf_sisr_old SRGS ABNF format, with semantic tags in
|
||||
SISR WD 20030401 format
|
||||
-printer=srgs_xml SRGS XML format
|
||||
-printer=srgs_xml_non_rec SRGS XML format, without any recursion.
|
||||
-printer=srgs_xml_prob SRGS XML format, with weights
|
||||
-printer=srgs_xml_sisr_old SRGS XML format, with semantic tags in
|
||||
SISR WD 20030401 format
|
||||
-printer=vxml Generate a dialogue system in VoiceXML.
|
||||
-printer=slf a finite automaton in the HTK SLF format
|
||||
-printer=slf_graphviz the same automaton as slf, but in Graphviz format
|
||||
-printer=slf_sub a finite automaton with sub-automata in the
|
||||
HTK SLF format
|
||||
-printer=slf_sub_graphviz the same automaton as slf_sub, but in
|
||||
Graphviz format
|
||||
-printer=fa_graphviz a finite automaton with labelled edges
|
||||
-printer=regular a regular grammar in a simple BNF
|
||||
-printer=unpar a gfc grammar with parameters eliminated
|
||||
-printer=functiongraph abstract syntax functions in 'dot' format
|
||||
-printer=typegraph abstract syntax categories in 'dot' format
|
||||
-printer=transfer Transfer language datatype (.tr file format)
|
||||
-printer=cfg-prolog M cfg in prolog format (also pg)
|
||||
-printer=gfc-prolog M gfc in prolog format (also pg)
|
||||
-printer=gfcm M gfcm file (default for pm)
|
||||
-printer=graph M module dependency graph in 'dot' (graphviz) format
|
||||
-printer=header M gfcm file with header (for GF embedded in Java)
|
||||
-printer=js M JavaScript type annotator and linearizer
|
||||
-printer=mcfg-prolog M mcfg in prolog format (also pg)
|
||||
-printer=missing M the missing linearizations of each concrete
|
||||
|
||||
-startcat, like -cat, but used in grammars (to avoid clash with keyword cat)
|
||||
|
||||
-transform, transformation performed on a syntax tree. The default is identity.
|
||||
-transform=identity no change
|
||||
-transform=compute compute by using definitions in the grammar
|
||||
-transform=nodup return the term only if it has no constants duplicated
|
||||
-transform=nodupatom return the term only if it has no atomic constants duplicated
|
||||
-transform=typecheck return the term only if it is type-correct
|
||||
-transform=solve solve metavariables as derived refinements
|
||||
-transform=context solve metavariables by unique refinements as variables
|
||||
-transform=delete replace the term by metavariable
|
||||
|
||||
-unlexer, untokenization transforming linearization output into a string.
|
||||
The default is unwords.
|
||||
-unlexer=unwords space-separated token list (like unwords)
|
||||
-unlexer=text format as text: punctuation, capitals, paragraph <p>
|
||||
-unlexer=code format as code (spacing, indentation)
|
||||
-unlexer=textlit like text, but remove string literal quotes
|
||||
-unlexer=codelit like code, but remove string literal quotes
|
||||
-unlexer=concat remove all spaces
|
||||
-unlexer=bind like identity, but bind at "&+"
|
||||
|
||||
-mark, marking of parts of tree in linearization. The default is none.
|
||||
-mark=metacat append "+CAT" to every metavariable, showing its category
|
||||
-mark=struct show tree structure with brackets
|
||||
-mark=java show tree structure with XML tags (used in gfeditor)
|
||||
|
||||
-coding, Some grammars are in UTF-8, some in isolatin-1.
|
||||
If the letters ä (a-umlaut) and ö (o-umlaut) look strange, either
|
||||
change your terminal to isolatin-1, or rewrite the grammar with
|
||||
'pg -utf8'.
|
||||
|
||||
-- *: Commands and options marked with * are not currently implemented.
|
||||
250
src/Makefile
@@ -1,250 +0,0 @@
|
||||
include config.mk
|
||||
|
||||
|
||||
GHMAKE=$(GHC) --make
|
||||
GHCXMAKE=ghcxmake
|
||||
GHCFLAGS+= -fglasgow-exts
|
||||
GHCOPTFLAGS=-O2
|
||||
GHCFUDFLAG=
|
||||
|
||||
DIST_DIR=GF-$(PACKAGE_VERSION)
|
||||
NOT_IN_DIST= \
|
||||
grammars \
|
||||
download \
|
||||
doc/release2.html \
|
||||
src/tools/AlphaConvGF.hs
|
||||
|
||||
BIN_DIST_DIR=$(DIST_DIR)-$(host)
|
||||
|
||||
GRAMMAR_PACKAGE_VERSION=$(shell date +%Y%m%d)
|
||||
GRAMMAR_DIST_DIR=gf-grammars-$(GRAMMAR_PACKAGE_VERSION)
|
||||
|
||||
MSI_FILE=gf-$(subst .,_,$(PACKAGE_VERSION)).msi
|
||||
|
||||
GF_DATA_DIR=$(datadir)/GF-$(PACKAGE_VERSION)
|
||||
GF_LIB_DIR=$(GF_DATA_DIR)/lib
|
||||
|
||||
EMBED = GF/Embed/TemplateApp
|
||||
|
||||
# use the temporary binary file name 'gf-bin' to not clash with directory 'GF'
|
||||
# on case insensitive file systems (such as FAT)
|
||||
GF_EXE=gf$(EXEEXT)
|
||||
GF_EXE_TMP=gf-bin$(EXEEXT)
|
||||
GF_DOC_EXE=gfdoc$(EXEEXT)
|
||||
|
||||
|
||||
ifeq ("$(READLINE)","readline")
|
||||
GHCFLAGS += -package readline -DUSE_READLINE
|
||||
endif
|
||||
|
||||
ifneq ("$(CPPFLAGS)","")
|
||||
GHCFLAGS += $(addprefix -optP, $(CPPFLAGS))
|
||||
endif
|
||||
|
||||
ifneq ("$(LDFLAGS)","")
|
||||
GHCFLAGS += $(addprefix -optl, $(LDFLAGS))
|
||||
endif
|
||||
|
||||
ifeq ("$(INTERRUPT)","yes")
|
||||
GHCFLAGS += -DUSE_INTERRUPT
|
||||
endif
|
||||
|
||||
ifeq ("$(ATK)","yes")
|
||||
GHCFLAGS += -DUSE_ATK
|
||||
endif
|
||||
|
||||
ifeq ("$(ENABLE_JAVA)", "yes")
|
||||
BUILD_JAR=jar
|
||||
else
|
||||
BUILD_JAR=
|
||||
endif
|
||||
|
||||
.PHONY: all unix jar tags gfdoc windows install install-gf \
|
||||
lib temp install-gfdoc \
|
||||
today help clean windows-msi dist gfc
|
||||
|
||||
all: unix gfc lib
|
||||
|
||||
static: GHCFLAGS += -optl-static
|
||||
static: unix
|
||||
|
||||
|
||||
gf: unix
|
||||
|
||||
unix: today opt
|
||||
|
||||
windows: unix
|
||||
|
||||
temp: today noopt
|
||||
|
||||
|
||||
build:
|
||||
$(GHMAKE) $(GHCFLAGS) GF.hs -o $(GF_EXE_TMP)
|
||||
strip $(GF_EXE_TMP)
|
||||
mv $(GF_EXE_TMP) ../bin/$(GF_EXE)
|
||||
|
||||
opt: GHCFLAGS += $(GHCOPTFLAGS)
|
||||
opt: build
|
||||
|
||||
embed: GHCFLAGS += $(GHCOPTFLAGS)
|
||||
embed:
|
||||
$(GHMAKE) $(GHCFLAGS) $(EMBED) -o $(EMBED)
|
||||
strip $(EMBED)
|
||||
|
||||
noopt: build
|
||||
|
||||
clean:
|
||||
find . '(' -name '*~' -o -name '*.hi' -o -name '*.ghi' -o -name '*.o' ')' -exec rm -f '{}' ';'
|
||||
-rm -f gf.wixobj
|
||||
-rm -f ../bin/$(GF_EXE)
|
||||
$(MAKE) -C tools/c clean
|
||||
$(MAKE) -C ../lib/c clean
|
||||
-rm -f ../bin/gfcc2c
|
||||
|
||||
distclean: clean
|
||||
-rm -f tools/$(GF_DOC_EXE)
|
||||
-rm -f config.status config.mk config.log
|
||||
-rm -f *.tgz *.zip
|
||||
-rm -rf $(DIST_DIR) $(BIN_DIST_DIR)
|
||||
-rm -rf gf.wxs *.msi
|
||||
|
||||
today:
|
||||
echo 'module Paths_gf (version, getDataDir) where' > Paths_gf.hs
|
||||
echo 'import Data.Version' >> Paths_gf.hs
|
||||
echo '{-# NOINLINE version #-}' >> Paths_gf.hs
|
||||
echo 'version :: Version' >> Paths_gf.hs
|
||||
echo 'version = Version {versionBranch = [3,0], versionTags = ["beta3"]}' >> Paths_gf.hs
|
||||
echo 'getDataDir = return "$(GF_DATA_DIR)" :: IO FilePath' >> Paths_gf.hs
|
||||
|
||||
|
||||
showflags:
|
||||
@echo $(GHCFLAGS)
|
||||
|
||||
# added by peb:
|
||||
tracing: GHCFLAGS += -DTRACING
|
||||
tracing: temp
|
||||
|
||||
ghci-trace: GHCFLAGS += -DTRACING
|
||||
ghci-trace: ghci
|
||||
|
||||
#touch-files:
|
||||
# rm -f GF/System/Tracing.{hi,o}
|
||||
# touch GF/System/Tracing.hs
|
||||
|
||||
# profiling
|
||||
prof: GHCOPTFLAGS += -prof -auto-all
|
||||
prof: unix
|
||||
|
||||
tags:
|
||||
find GF Transfer -name '*.hs' | xargs hasktags
|
||||
|
||||
#
|
||||
# Help file
|
||||
#
|
||||
|
||||
tools/MkHelpFile: tools/MkHelpFile.hs
|
||||
$(GHMAKE) -o $@ $^
|
||||
|
||||
help: GF/Shell/HelpFile.hs
|
||||
|
||||
GF/Shell/HelpFile.hs: tools/MkHelpFile HelpFile
|
||||
tools/MkHelpFile
|
||||
|
||||
#
|
||||
# Tools
|
||||
#
|
||||
|
||||
gfdoc: tools/$(GF_DOC_EXE)
|
||||
|
||||
tools/$(GF_DOC_EXE): tools/GFDoc.hs
|
||||
$(GHMAKE) $(GHCOPTFLAGS) -o $@ $^
|
||||
|
||||
gfc: gf
|
||||
echo GFC!
|
||||
cp -f gfc ../bin/
|
||||
chmod a+x ../bin/gfc
|
||||
|
||||
gfcc2c:
|
||||
$(MAKE) -C tools/c
|
||||
$(MAKE) -C ../lib/c
|
||||
mv tools/c/gfcc2c ../bin
|
||||
|
||||
#
|
||||
# Resource grammars
|
||||
#
|
||||
|
||||
lib:
|
||||
$(MAKE) -C ../lib/resource clean all
|
||||
|
||||
#
|
||||
# Distribution
|
||||
#
|
||||
|
||||
dist:
|
||||
-rm -rf $(DIST_DIR)
|
||||
darcs dist --dist-name=$(DIST_DIR)
|
||||
tar -zxf ../$(DIST_DIR).tar.gz
|
||||
rm ../$(DIST_DIR).tar.gz
|
||||
cd $(DIST_DIR)/src && perl -pi -e "s/^AC_INIT\(\[GF\],\[[^\]]*\]/AC_INIT([GF],[$(PACKAGE_VERSION)]/" configure.ac
|
||||
cd $(DIST_DIR)/src && autoconf && rm -rf autom4te.cache
|
||||
# cd $(DIST_DIR)/grammars && sh mkLib.sh
|
||||
cd $(DIST_DIR) && rm -rf $(NOT_IN_DIST)
|
||||
$(TAR) -zcf $(DIST_DIR).tgz $(DIST_DIR)
|
||||
rm -rf $(DIST_DIR)
|
||||
|
||||
snapshot: PACKAGE_VERSION=$(shell date +%Y%m%d)
|
||||
snapshot: DIST_DIR=GF-$(PACKAGE_VERSION)
|
||||
snapshot: dist
|
||||
|
||||
rpm: dist
|
||||
rpmbuild -ta $(DIST_DIR).tgz
|
||||
|
||||
|
||||
binary-dist:
|
||||
rm -rf $(BIN_DIST_DIR)
|
||||
mkdir $(BIN_DIST_DIR)
|
||||
mkdir $(BIN_DIST_DIR)/lib
|
||||
./configure --host="$(host)" --build="$(build)"
|
||||
$(MAKE) gfc gfdoc
|
||||
$(INSTALL) ../bin/$(GF_EXE) tools/$(GF_DOC_EXE) $(BIN_DIST_DIR)
|
||||
$(INSTALL) configure config.guess config.sub install-sh config.mk.in $(BIN_DIST_DIR)
|
||||
$(INSTALL) gfc.in $(BIN_DIST_DIR)
|
||||
$(INSTALL) -m 0644 ../README ../LICENSE $(BIN_DIST_DIR)
|
||||
$(INSTALL) -m 0644 INSTALL.binary $(BIN_DIST_DIR)/INSTALL
|
||||
$(INSTALL) -m 0644 Makefile.binary $(BIN_DIST_DIR)/Makefile
|
||||
# $(TAR) -C $(BIN_DIST_DIR)/lib -zxf ../lib/compiled.tgz
|
||||
$(TAR) -zcf GF-$(PACKAGE_VERSION)-$(host).tgz $(BIN_DIST_DIR)
|
||||
rm -rf $(BIN_DIST_DIR)
|
||||
|
||||
grammar-dist:
|
||||
-rm -rf $(GRAMMAR_DIST_DIR)
|
||||
mkdir $(GRAMMAR_DIST_DIR)
|
||||
cp -r ../_darcs/current/{lib,examples} $(GRAMMAR_DIST_DIR)
|
||||
$(MAKE) GF_LIB_PATH=.. -C $(GRAMMAR_DIST_DIR)/lib/resource-1.0 show-path prelude present alltenses mathematical api multimodal langs
|
||||
$(TAR) -zcf $(GRAMMAR_DIST_DIR).tgz $(GRAMMAR_DIST_DIR)
|
||||
rm -rf $(GRAMMAR_DIST_DIR)
|
||||
|
||||
gf.wxs: config.status gf.wxs.in
|
||||
./config.status --file=$@
|
||||
|
||||
windows-msi: gf.wxs
|
||||
candle -nologo gf.wxs
|
||||
light -nologo -o $(MSI_FILE) gf.wixobj
|
||||
|
||||
#
|
||||
# Installation
|
||||
#
|
||||
|
||||
install: install-gf install-gfdoc install-lib
|
||||
|
||||
install-gf:
|
||||
$(INSTALL) -d $(bindir)
|
||||
$(INSTALL) ../bin/$(GF_EXE) $(bindir)
|
||||
|
||||
install-gfdoc:
|
||||
$(INSTALL) -d $(bindir)
|
||||
$(INSTALL) tools/$(GF_DOC_EXE) $(bindir)
|
||||
|
||||
install-lib:
|
||||
$(INSTALL) -d $(GF_LIB_DIR)
|
||||
$(TAR) -C $(GF_LIB_DIR) -zxf ../lib/compiled.tgz
|
||||
@@ -1,20 +0,0 @@
|
||||
include config.mk
|
||||
|
||||
GF_DATA_DIR=$(datadir)/GF-$(PACKAGE_VERSION)
|
||||
GF_LIB_DIR=$(GF_DATA_DIR)/lib
|
||||
|
||||
.PHONY: install uninstall
|
||||
|
||||
install:
|
||||
$(INSTALL) -d $(bindir)
|
||||
$(INSTALL) gf$(EXEEXT) gfdoc$(EXEEXT) $(bindir)
|
||||
$(INSTALL) gfc$(EXEEXT) $(bindir)
|
||||
$(INSTALL) -d $(GF_DATA_DIR)
|
||||
cp -r lib $(GF_DATA_DIR)
|
||||
|
||||
uninstall:
|
||||
-rm -f $(bindir)/gf$(EXEEXT) $(bindir)/gfdoc$(EXEEXT)
|
||||
-rm -f $(GF_LIB_DIR)/*/*.gf{o}
|
||||
-rmdir $(GF_LIB_DIR)/*
|
||||
-rmdir $(GF_LIB_DIR)
|
||||
-rmdir $(GF_DATA_DIR)
|
||||
@@ -1,13 +0,0 @@
|
||||
concrete Eng of Ex = {
|
||||
lincat
|
||||
S = {s : Str} ;
|
||||
NP = {s : Str ; n : Num} ;
|
||||
VP = {s : Num => Str} ;
|
||||
param
|
||||
Num = Sg | Pl ;
|
||||
lin
|
||||
Pred np vp = {s = np.s ++ vp.s ! np.n} ;
|
||||
She = {s = "she" ; n = Sg} ;
|
||||
They = {s = "they" ; n = Pl} ;
|
||||
Sleep = {s = table {Sg => "sleeps" ; Pl => "sleep"}} ;
|
||||
}
|
||||
@@ -1,8 +0,0 @@
|
||||
abstract Ex = {
|
||||
cat
|
||||
S ; NP ; VP ;
|
||||
fun
|
||||
Pred : NP -> VP -> S ;
|
||||
She, They : NP ;
|
||||
Sleep : VP ;
|
||||
}
|
||||
@@ -1,13 +0,0 @@
|
||||
concrete Swe of Ex = {
|
||||
lincat
|
||||
S = {s : Str} ;
|
||||
NP = {s : Str} ;
|
||||
VP = {s : Str} ;
|
||||
param
|
||||
Num = Sg | Pl ;
|
||||
lin
|
||||
Pred np vp = {s = np.s ++ vp.s} ;
|
||||
She = {s = "hon"} ;
|
||||
They = {s = "de"} ;
|
||||
Sleep = {s = "sover"} ;
|
||||
}
|
||||
@@ -1,64 +0,0 @@
|
||||
-- to test GFCC compilation
|
||||
|
||||
flags coding=utf8 ;
|
||||
|
||||
cat S ; NP ; N ; VP ;
|
||||
|
||||
fun Pred : NP -> VP -> S ;
|
||||
fun Pred2 : NP -> VP -> NP -> S ;
|
||||
fun Det, Dets : N -> NP ;
|
||||
fun Mina, Sina, Me, Te : NP ;
|
||||
fun Raha, Paska, Pallo : N ;
|
||||
fun Puhua, Munia, Sanoa : VP ;
|
||||
|
||||
param Person = P1 | P2 | P3 ;
|
||||
param Number = Sg | Pl ;
|
||||
param Case = Nom | Part ;
|
||||
|
||||
param NForm = NF Number Case ;
|
||||
param VForm = VF Number Person ;
|
||||
|
||||
lincat N = Noun ;
|
||||
lincat VP = Verb ;
|
||||
|
||||
oper Noun = {s : NForm => Str} ;
|
||||
oper Verb = {s : VForm => Str} ;
|
||||
|
||||
lincat NP = {s : Case => Str ; a : {n : Number ; p : Person}} ;
|
||||
|
||||
lin Pred np vp = {s = np.s ! Nom ++ vp.s ! VF np.a.n np.a.p} ;
|
||||
lin Pred2 np vp ob = {s = np.s ! Nom ++ vp.s ! VF np.a.n np.a.p ++ ob.s ! Part} ;
|
||||
lin Det no = {s = \\c => no.s ! NF Sg c ; a = {n = Sg ; p = P3}} ;
|
||||
lin Dets no = {s = \\c => no.s ! NF Pl c ; a = {n = Pl ; p = P3}} ;
|
||||
lin Mina = {s = table Case ["minä" ; "minua"] ; a = {n = Sg ; p = P1}} ;
|
||||
lin Te = {s = table Case ["te" ; "teitä"] ; a = {n = Pl ; p = P2}} ;
|
||||
lin Sina = {s = table Case ["sinä" ; "sinua"] ; a = {n = Sg ; p = P2}} ;
|
||||
lin Me = {s = table Case ["me" ; "meitä"] ; a = {n = Pl ; p = P1}} ;
|
||||
|
||||
lin Raha = mkN "raha" ;
|
||||
lin Paska = mkN "paska" ;
|
||||
lin Pallo = mkN "pallo" ;
|
||||
lin Puhua = mkV "puhu" ;
|
||||
lin Munia = mkV "muni" ;
|
||||
lin Sanoa = mkV "sano" ;
|
||||
|
||||
oper mkN : Str -> Noun = \raha -> {
|
||||
s = table {
|
||||
NF Sg Nom => raha ;
|
||||
NF Sg Part => raha + "a" ;
|
||||
NF Pl Nom => raha + "t" ;
|
||||
NF Pl Part => Predef.tk 1 raha + "oja"
|
||||
}
|
||||
} ;
|
||||
|
||||
oper mkV : Str -> Verb = \puhu -> {
|
||||
s = table {
|
||||
VF Sg P1 => puhu + "n" ;
|
||||
VF Sg P2 => puhu + "t" ;
|
||||
VF Sg P3 => puhu + Predef.dp 1 puhu ;
|
||||
VF Pl P1 => puhu + "mme" ;
|
||||
VF Pl P2 => puhu + "tte" ;
|
||||
VF Pl P3 => puhu + "vat"
|
||||
}
|
||||
} ;
|
||||
|
||||
@@ -1,809 +0,0 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<HTML>
|
||||
<HEAD>
|
||||
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
|
||||
<TITLE>The GFCC Grammar Format</TITLE>
|
||||
</HEAD><BODY BGCOLOR="white" TEXT="black">
|
||||
<P ALIGN="center"><CENTER><H1>The GFCC Grammar Format</H1>
|
||||
<FONT SIZE="4">
|
||||
<I>Aarne Ranta</I><BR>
|
||||
October 5, 2007
|
||||
</FONT></CENTER>
|
||||
|
||||
<P>
|
||||
Author's address:
|
||||
<A HREF="http://www.cs.chalmers.se/~aarne"><CODE>http://www.cs.chalmers.se/~aarne</CODE></A>
|
||||
</P>
|
||||
<P>
|
||||
History:
|
||||
</P>
|
||||
<UL>
|
||||
<LI>5 Oct 2007: new, better structured GFCC with full expressive power
|
||||
<LI>19 Oct: translation of lincats, new figures on C++
|
||||
<LI>3 Oct 2006: first version
|
||||
</UL>
|
||||
|
||||
<H2>What is GFCC</H2>
|
||||
<P>
|
||||
GFCC is a low-level format for GF grammars. Its aim is to contain the minimum
|
||||
that is needed to process GF grammars at runtime. This minimality has three
|
||||
advantages:
|
||||
</P>
|
||||
<UL>
|
||||
<LI>compact grammar files and run-time objects
|
||||
<LI>time and space efficient processing
|
||||
<LI>simple definition of interpreters
|
||||
</UL>
|
||||
|
||||
<P>
|
||||
Thus we also want to call GFCC the <B>portable grammar format</B>.
|
||||
</P>
|
||||
<P>
|
||||
The idea is that all embedded GF applications use GFCC.
|
||||
The GF system would be primarily used as a compiler and as a grammar
|
||||
development tool.
|
||||
</P>
|
||||
<P>
|
||||
Since GFCC is implemented in BNFC, a parser of the format is readily
|
||||
available for C, C++, C#, Haskell, Java, and OCaml. Also an XML
|
||||
representation can be generated in BNFC. A
|
||||
<A HREF="../">reference implementation</A>
|
||||
of linearization and some other functions has been written in Haskell.
|
||||
</P>
|
||||
<H2>GFCC vs. GFC</H2>
|
||||
<P>
|
||||
GFCC is intended to replace GFC as the run-time grammar format. GFC was designed
|
||||
to be a run-time format, but also to
|
||||
support separate compilation of grammars, i.e.
|
||||
to store the results of compiling
|
||||
individual GF modules. But this means that GFC has to contain extra information,
|
||||
such as type annotations, which is only needed in compilation and not at
|
||||
run-time. In particular, the pattern matching syntax and semantics of GFC are
|
||||
complex and therefore difficult to implement on new platforms.
|
||||
</P>
|
||||
<P>
|
||||
Actually, GFC is planned to be omitted also as the target format of
|
||||
separate compilation, where plain GF (type annotated and partially evaluated)
|
||||
will be used instead. GFC provides only marginal advantages as a target format
|
||||
compared with GF, and it is therefore just extra weight to carry around this
|
||||
format.
|
||||
</P>
|
||||
<P>
|
||||
The main differences of GFCC compared with GFC (and GF) can be summarized as follows:
|
||||
</P>
|
||||
<UL>
|
||||
<LI>there are no modules, and therefore no qualified names
|
||||
<LI>a GFCC grammar is multilingual, and consists of a common abstract syntax
|
||||
together with one concrete syntax per language
|
||||
<LI>records and tables are replaced by arrays
|
||||
<LI>record labels and parameter values are replaced by integers
|
||||
<LI>record projection and table selection are replaced by array indexing
|
||||
<LI>even though the format does support dependent types and higher-order abstract
|
||||
syntax, there is no interpreter yet that does this
|
||||
</UL>
|
||||
|
||||
<P>
|
||||
Here is an example of a GF grammar, consisting of three modules,
|
||||
as translated to GFCC. The representations are aligned; thus they do not completely
|
||||
reflect the order of judgements in GFCC files, which have different orders of
|
||||
blocks of judgements, and alphabetical sorting.
|
||||
</P>
|
||||
<PRE>
|
||||
grammar Ex(Eng,Swe);
|
||||
|
||||
abstract Ex = { abstract {
|
||||
cat cat
|
||||
S ; NP ; VP ; NP[]; S[]; VP[];
|
||||
fun fun
|
||||
Pred : NP -> VP -> S ; Pred=[(($ 0! 1),(($ 1! 0)!($ 0! 0)))];
|
||||
She, They : NP ; She=[0,"she"];
|
||||
Sleep : VP ; They=[1,"they"];
|
||||
Sleep=[["sleeps","sleep"]];
|
||||
} } ;
|
||||
|
||||
concrete Eng of Ex = { concrete Eng {
|
||||
lincat lincat
|
||||
S = {s : Str} ; S=[()];
|
||||
NP = {s : Str ; n : Num} ; NP=[1,()];
|
||||
VP = {s : Num => Str} ; VP=[[(),()]];
|
||||
param
|
||||
Num = Sg | Pl ;
|
||||
lin lin
|
||||
Pred np vp = { Pred=[(($ 0! 1),(($ 1! 0)!($ 0! 0)))];
|
||||
s = np.s ++ vp.s ! np.n} ;
|
||||
She = {s = "she" ; n = Sg} ; She=[0,"she"];
|
||||
They = {s = "they" ; n = Pl} ; They = [1, "they"];
|
||||
Sleep = {s = table { Sleep=[["sleeps","sleep"]];
|
||||
Sg => "sleeps" ;
|
||||
Pl => "sleep"
|
||||
}
|
||||
} ;
|
||||
} } ;
|
||||
|
||||
concrete Swe of Ex = { concrete Swe {
|
||||
lincat lincat
|
||||
S = {s : Str} ; S=[()];
|
||||
NP = {s : Str} ; NP=[()];
|
||||
VP = {s : Str} ; VP=[()];
|
||||
param
|
||||
Num = Sg | Pl ;
|
||||
lin lin
|
||||
Pred np vp = { Pred = [(($0!0),($1!0))];
|
||||
s = np.s ++ vp.s} ;
|
||||
She = {s = "hon"} ; She = ["hon"];
|
||||
They = {s = "de"} ; They = ["de"];
|
||||
Sleep = {s = "sover"} ; Sleep = ["sover"];
|
||||
} } ;
|
||||
</PRE>
|
||||
<P></P>
|
||||
<H2>The syntax of GFCC files</H2>
|
||||
<P>
|
||||
The complete BNFC grammar, from which
|
||||
the rules in this section are taken, is in the file
|
||||
<A HREF="../DataGFCC.cf"><CODE>GF/GFCC/GFCC.cf</CODE></A>.
|
||||
</P>
|
||||
<H3>Top level</H3>
|
||||
<P>
|
||||
A grammar has a header telling the name of the abstract syntax
|
||||
(often specifying an application domain), and the names of
|
||||
the concrete languages. The abstract syntax and the concrete
|
||||
syntaxes themselves follow.
|
||||
</P>
|
||||
<PRE>
|
||||
Grm. Grammar ::=
|
||||
"grammar" CId "(" [CId] ")" ";"
|
||||
Abstract ";"
|
||||
[Concrete] ;
|
||||
|
||||
Abs. Abstract ::=
|
||||
"abstract" "{"
|
||||
"flags" [Flag]
|
||||
"fun" [FunDef]
|
||||
"cat" [CatDef]
|
||||
"}" ;
|
||||
|
||||
Cnc. Concrete ::=
|
||||
"concrete" CId "{"
|
||||
"flags" [Flag]
|
||||
"lin" [LinDef]
|
||||
"oper" [LinDef]
|
||||
"lincat" [LinDef]
|
||||
"lindef" [LinDef]
|
||||
"printname" [LinDef]
|
||||
"}" ;
|
||||
</PRE>
|
||||
<P>
|
||||
This syntax organizes each module into a sequence of <B>fields</B>, such
|
||||
as flags, linearizations, operations, linearization types, etc.
|
||||
It is envisaged that particular applications can ignore some
|
||||
of the fields, typically so that earlier fields are more
|
||||
important than later ones.
|
||||
</P>
|
||||
<P>
|
||||
The judgement forms have the following syntax.
|
||||
</P>
|
||||
<PRE>
|
||||
Flg. Flag ::= CId "=" String ;
|
||||
Cat. CatDef ::= CId "[" [Hypo] "]" ;
|
||||
Fun. FunDef ::= CId ":" Type "=" Exp ;
|
||||
Lin. LinDef ::= CId "=" Term ;
|
||||
</PRE>
|
||||
<P>
|
||||
For the run-time system, the reference implementation in Haskell
|
||||
uses a structure that gives efficient look-up:
|
||||
</P>
|
||||
<PRE>
|
||||
data GFCC = GFCC {
|
||||
absname :: CId ,
|
||||
cncnames :: [CId] ,
|
||||
abstract :: Abstr ,
|
||||
concretes :: Map CId Concr
|
||||
}
|
||||
|
||||
data Abstr = Abstr {
|
||||
aflags :: Map CId String, -- value of a flag
|
||||
funs :: Map CId (Type,Exp), -- type and def of a fun
|
||||
cats :: Map CId [Hypo], -- context of a cat
|
||||
catfuns :: Map CId [CId] -- funs yielding a cat (redundant, for fast lookup)
|
||||
}
|
||||
|
||||
data Concr = Concr {
|
||||
flags :: Map CId String, -- value of a flag
|
||||
lins :: Map CId Term, -- lin of a fun
|
||||
opers :: Map CId Term, -- oper generated by subex elim
|
||||
lincats :: Map CId Term, -- lin type of a cat
|
||||
lindefs :: Map CId Term, -- lin default of a cat
|
||||
printnames :: Map CId Term -- printname of a cat or a fun
|
||||
}
|
||||
</PRE>
|
||||
<P>
|
||||
These definitions are from <A HREF="../DataGFCC.hs"><CODE>GF/GFCC/DataGFCC.hs</CODE></A>.
|
||||
</P>
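<P>
The <CODE>catfuns</CODE> field is redundant because it can be derived from
<CODE>funs</CODE>. The following is a minimal sketch of that derivation, not
the code of the reference implementation. The names <CODE>SimpleType</CODE>,
<CODE>valCat</CODE> and <CODE>mkCatFuns</CODE> are hypothetical, and a type is
simplified here to its argument categories and its value category.
</P>
<PRE>
import qualified Data.Map as Map

newtype CId = CId String deriving (Eq, Ord, Show)

-- Assumption: a simplified type, without hypotheses or dependent arguments.
data SimpleType = SimpleType [CId] CId

-- The category that a function of this type yields.
valCat :: SimpleType -> CId
valCat (SimpleType _ c) = c

-- Group function names by their value category.
-- The order of names within one category is not specified.
mkCatFuns :: Map.Map CId SimpleType -> Map.Map CId [CId]
mkCatFuns funs =
  Map.fromListWith (++) [ (valCat ty, [f]) | (f, ty) <- Map.toList funs ]
</PRE>
<P>
Storing the result once trades a little space for not having to scan all of
<CODE>funs</CODE> each time the functions of a category are needed, for
instance in generation.
</P>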
|
||||
<P>
|
||||
Identifiers (<CODE>CId</CODE>) are like <CODE>Ident</CODE> in GF, except that
|
||||
the compiler produces constants prefixed with <CODE>_</CODE> in
|
||||
the common subterm elimination optimization.
|
||||
</P>
|
||||
<PRE>
|
||||
token CId (('_' | letter) (letter | digit | '\'' | '_')*) ;
|
||||
</PRE>
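<P>
For readers without BNFC at hand, the token rule can be approximated by a
hand-written recognizer. This is only a sketch: the function name
<CODE>isCId</CODE> is hypothetical, and <CODE>isAlpha</CODE> accepts more
characters than the <CODE>letter</CODE> class of BNFC, so the two may disagree
on letters outside the range that BNFC covers.
</P>
<PRE>
import Data.Char (isAlpha, isDigit)

-- True if the string has the shape of a CId token:
-- an underscore or a letter, followed by letters, digits, quotes, underscores.
-- Assumption: isAlpha stands in for the 'letter' class of BNFC.
isCId :: String -> Bool
isCId []     = False
isCId (c:cs) = (c == '_' || isAlpha c) && all rest cs
  where
    rest x = isAlpha x || isDigit x || x == '\'' || x == '_'
</PRE>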
|
||||
<P></P>
|
||||
<H3>Abstract syntax</H3>
|
||||
<P>
|
||||
Types are first-order function types built from argument type
|
||||
contexts and value types.
|
||||
Syntax trees (<CODE>Exp</CODE>) are
|
||||
rose trees with nodes consisting of a head (<CODE>Atom</CODE>) and
|
||||
bound variables (<CODE>CId</CODE>).
|
||||
</P>
|
||||
<PRE>
|
||||
DTyp. Type ::= "[" [Hypo] "]" CId [Exp] ;
|
||||
DTr. Exp ::= "[" "(" [CId] ")" Atom [Exp] "]" ;
|
||||
Hyp. Hypo ::= CId ":" Type ;
|
||||
</PRE>
|
||||
<P>
|
||||
The head Atom is either a function
|
||||
constant, a bound variable, a metavariable, or a string, integer, or float
|
||||
literal.
|
||||
</P>
|
||||
<PRE>
|
||||
AC. Atom ::= CId ;
|
||||
AS. Atom ::= String ;
|
||||
AI. Atom ::= Integer ;
|
||||
AF. Atom ::= Double ;
|
||||
AM. Atom ::= "?" Integer ;
|
||||
</PRE>
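<P>
As an illustration, the rules above can be read as Haskell datatypes, with one
constructor per rule label. The sketch below is written by hand rather than
generated by BNFC, so details may differ from the generated code. The values
<CODE>sheSleeps</CODE> and <CODE>predType</CODE> and the function
<CODE>headIds</CODE> are hypothetical names used only for this example.
</P>
<PRE>
newtype CId = CId String deriving (Eq, Show)

data Type = DTyp [Hypo] CId [Exp]  deriving (Eq, Show)
data Hypo = Hyp CId Type           deriving (Eq, Show)
data Exp  = DTr [CId] Atom [Exp]   deriving (Eq, Show)
data Atom = AC CId | AS String | AI Integer | AF Double | AM Integer
  deriving (Eq, Show)

-- The tree "Pred She Sleep": no bound variables, three function constants.
sheSleeps :: Exp
sheSleeps = DTr [] (AC (CId "Pred")) [leaf "She", leaf "Sleep"]
  where leaf f = DTr [] (AC (CId f)) []

-- The type "NP -> VP -> S", with anonymous hypotheses named "_".
predType :: Type
predType = DTyp [hyp "NP", hyp "VP"] (CId "S") []
  where hyp c = Hyp (CId "_") (DTyp [] (CId c) [])

-- Identifiers occurring as heads in a tree, from left to right.
-- These are function constants or bound variables; literals and
-- metavariables are skipped.
headIds :: Exp -> [CId]
headIds (DTr _ at args) = here ++ concatMap headIds args
  where
    here = case at of
      AC f -> [f]
      _    -> []
</PRE>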
|
||||
<P>
|
||||
The context-free types and trees of the "old GFCC" are special
|
||||
cases, which can be defined as follows:
|
||||
</P>
|
||||
<PRE>
|
||||
Typ. Type ::= [CId] "->" CId
|
||||
Typ args val = DTyp [Hyp (CId "_") (DTyp [] arg []) | arg <- args] val []
|
||||
|
||||
Tr. Exp ::= "(" CId [Exp] ")"
|
||||
Tr fun exps = DTr [] (AC fun) exps
|
||||
</PRE>
|
||||
<P>
|
||||
To store semantic (<CODE>def</CODE>) definitions by cases, the following expression
|
||||
form is provided, but it is only meaningful in the last field of a function
|
||||
declaration in an abstract syntax:
|
||||
</P>
|
||||
<PRE>
|
||||
EEq. Exp ::= "{" [Equation] "}" ;
|
||||
Equ. Equation ::= [Exp] "->" Exp ;
|
||||
</PRE>
|
||||
<P>
|
||||
Notice that expressions are used to encode patterns. Primitive notions
|
||||
(the default semantics in GF) are encoded as empty sets of equations
|
||||
(<CODE>[]</CODE>). For a constructor (canonical form) of a category <CODE>C</CODE>, we
|
||||
aim to use the encoding as the application <CODE>(_constr C)</CODE>.
|
||||
</P>
|
||||
<H3>Concrete syntax</H3>
|
||||
<P>
|
||||
Linearization terms (<CODE>Term</CODE>) are built as follows.
|
||||
Constructor names are shown to make the later code
|
||||
examples readable.
|
||||
</P>
|
||||
<PRE>
|
||||
R. Term ::= "[" [Term] "]" ; -- array (record/table)
|
||||
P. Term ::= "(" Term "!" Term ")" ; -- access to field (projection/selection)
|
||||
S. Term ::= "(" [Term] ")" ; -- concatenated sequence
|
||||
K. Term ::= Tokn ; -- token
|
||||
V. Term ::= "$" Integer ; -- argument (subtree)
|
||||
C. Term ::= Integer ; -- array index (label/parameter value)
|
||||
FV. Term ::= "[|" [Term] "|]" ; -- free variation
|
||||
TM. Term ::= "?" ; -- linearization of metavariable
|
||||
</PRE>
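<P>
To see how these constructors fit together, here is a small evaluator applied
to the English <CODE>Pred</CODE> rule of the example grammar. It is a
simplified sketch under stated assumptions, not the <CODE>compute</CODE>
function of the reference implementation: tokens are plain strings, free
variation and metavariables are left out, and the names <CODE>eval</CODE>,
<CODE>she</CODE>, <CODE>they</CODE>, <CODE>sleep</CODE> and
<CODE>predEng</CODE> are hypothetical.
</P>
<PRE>
data Term
  = R [Term]     -- array (record/table)
  | P Term Term  -- access to field (projection/selection)
  | S [Term]     -- concatenated sequence
  | K String     -- token (assumption: plain strings only)
  | V Int        -- argument (subtree)
  | C Int        -- array index (label/parameter value)
  deriving (Eq, Show)

-- Evaluate a term, given the already evaluated linearizations of the subtrees.
eval :: [Term] -> Term -> Term
eval args trm = case trm of
  V i   -> args !! i
  R ts  -> R (map (eval args) ts)
  S ts  -> S (map (eval args) ts)
  P t u -> case (eval args t, eval args u) of
             (R ts, C i) -> ts !! i   -- array indexing
             (t', u')    -> P t' u'   -- cannot be reduced; left as it is
  _     -> trm

-- She=[0,"she"], They=[1,"they"], Sleep=[["sleeps","sleep"]]
she, they, sleep :: Term
she   = R [C 0, K "she"]
they  = R [C 1, K "they"]
sleep = R [R [K "sleeps", K "sleep"]]

-- Pred=[(($ 0! 1),(($ 1! 0)!($ 0! 0)))]
predEng :: Term
predEng = R [S [P (V 0) (C 1), P (P (V 1) (C 0)) (P (V 0) (C 0))]]
</PRE>
<P>
With these definitions, <CODE>eval [she, sleep] predEng</CODE> gives
<CODE>R [S [K "she",K "sleeps"]]</CODE>, and
<CODE>eval [they, sleep] predEng</CODE> gives
<CODE>R [S [K "they",K "sleep"]]</CODE>: the number stored in the subject
selects the verb form by array indexing alone.
</P>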
|
||||
<P>
|
||||
Tokens are strings or (maybe obsolescent) prefix-dependent
|
||||
variant lists.
|
||||
</P>
|
||||
<PRE>
|
||||
KS. Tokn ::= String ;
|
||||
KP. Tokn ::= "[" "pre" [String] "[" [Variant] "]" "]" ;
|
||||
Var. Variant ::= [String] "/" [String] ;
|
||||
</PRE>
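<P>
The grammar rules alone do not say how a variant is chosen. The sketch below
shows one possible reading, stated here as an assumption: in
<CODE>Var ss ps</CODE>, the words <CODE>ss</CODE> are used when the following
word begins with one of the prefixes <CODE>ps</CODE>, and the default words of
<CODE>KP</CODE> are used otherwise. The function name <CODE>choose</CODE> is
hypothetical.
</P>
<PRE>
import Data.List (isPrefixOf)

data Variant = Var [String] [String]  -- words to use / triggering prefixes
data Tokn    = KS String | KP [String] [Variant]

-- Choose the words of a token, given the word that follows it, if any.
-- Assumption: the first variant with a matching prefix wins.
choose :: Tokn -> Maybe String -> [String]
choose (KS s)        _    = [s]
choose (KP def vars) next = case next of
  Nothing -> def
  Just w  -> case [ss | Var ss ps <- vars, any (`isPrefixOf` w) ps] of
               (ss:_) -> ss
               []     -> def
</PRE>
<P>
For example, the English indefinite article could be encoded as
<CODE>KP ["a"] [Var ["an"] ["a","e","i","o","u"]]</CODE>, for which
<CODE>choose</CODE> returns <CODE>["an"]</CODE> before <CODE>"apple"</CODE>
and <CODE>["a"]</CODE> before <CODE>"car"</CODE>.
</P>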
|
||||
<P>
|
||||
Two special forms of terms are introduced by the compiler
|
||||
as optimizations. They can in principle be eliminated, but
|
||||
their presence makes grammars much more compact. Their semantics
|
||||
will be explained in a later section.
|
||||
</P>
|
||||
<PRE>
|
||||
F. Term ::= CId ; -- global constant
|
||||
W. Term ::= "(" String "+" Term ")" ; -- prefix + suffix table
|
||||
</PRE>
|
||||
<P>
|
||||
There is also a deprecated form of "record parameter alias",
|
||||
</P>
|
||||
<PRE>
|
||||
RP. Term ::= "(" Term "@" Term ")"; -- DEPRECATED
|
||||
</PRE>
|
||||
<P>
|
||||
which will be removed when the migration to new GFCC is complete.
|
||||
</P>
|
||||
<H2>The semantics of concrete syntax terms</H2>
|
||||
<P>
|
||||
The code in this section is from <A HREF="../Linearize.hs"><CODE>GF/GFCC/Linearize.hs</CODE></A>.
|
||||
</P>
|
||||
<H3>Linearization and realization</H3>
|
||||
<P>
|
||||
The linearization algorithm is essentially the same as in
|
||||
GFC: a tree is linearized by evaluating its linearization term
|
||||
in the environment of the linearizations of the subtrees.
|
||||
Literal atoms are linearized in the obvious way.
|
||||
The function also needs to know the language (i.e. concrete syntax)
|
||||
in which linearization is performed.
|
||||
</P>
|
||||
<PRE>
|
||||
linExp :: GFCC -> CId -> Exp -> Term
|
||||
linExp gfcc lang tree@(DTr _ at trees) = case at of
|
||||
AC fun -> comp (Prelude.map lin trees) $ look fun
|
||||
AS s -> R [kks (show s)] -- quoted
|
||||
AI i -> R [kks (show i)]
|
||||
AF d -> R [kks (show d)]
|
||||
AM _ -> TM
|
||||
where
|
||||
lin = linExp gfcc lang
|
||||
comp = compute gfcc lang
|
||||
look = lookLin gfcc lang
|
||||
</PRE>
|
||||
<P>
|
||||
TODO: bindings must be supported.
|
||||
</P>
|
||||
<P>
|
||||
The result of linearization is usually a record, which is realized as
|
||||
a string using the following algorithm.
|
||||
</P>
|
||||
<PRE>
|
||||
realize :: Term -> String
|
||||
realize trm = case trm of
|
||||
R (t:_) -> realize t
|
||||
S ss -> unwords $ Prelude.map realize ss
|
||||
K (KS s) -> s
|
||||
K (KP s _) -> unwords s ---- prefix choice TODO
|
||||
W s t -> s ++ realize t
|
||||
FV (t:_) -> realize t
|
||||
TM -> "?"
|
||||
</PRE>
|
||||
<P>
|
||||
Notice that realization always picks the first field of a record.
|
||||
If a linearization type has more than one field, the first field
|
||||
does not necessarily contain the desired string.
|
||||
Also notice that the order of record fields in GFCC is not necessarily
|
||||
the same as in GF source.
|
||||
</P>
|
||||
<H3>Term evaluation</H3>
|
||||
<P>
|
||||
Evaluation follows call-by-value order, with two environments
|
||||
needed:
|
||||
</P>
|
||||
<UL>
|
||||
<LI>the grammar (a concrete syntax) to give the global constants
|
||||
<LI>an array of terms to give the subtree linearizations
|
||||
</UL>
|
||||
|
||||
<P>
|
||||
The code is presented in one-level pattern matching, to
|
||||
enable reimplementations in languages that do not permit
|
||||
deep patterns (such as Java and C++).
|
||||
</P>
|
||||
<PRE>
|
||||
compute :: GFCC -> CId -> [Term] -> Term -> Term
|
||||
compute gfcc lang args = comp where
|
||||
comp trm = case trm of
|
||||
P r p -> proj (comp r) (comp p)
|
||||
W s t -> W s (comp t)
|
||||
R ts -> R $ Prelude.map comp ts
|
||||
V i -> idx args (fromInteger i) -- already computed
|
||||
F c -> comp $ look c -- not computed (if contains V)
|
||||
FV ts -> FV $ Prelude.map comp ts
|
||||
S ts -> S $ Prelude.filter (/= S []) $ Prelude.map comp ts
|
||||
_ -> trm
|
||||
|
||||
look = lookOper gfcc lang
|
||||
|
||||
idx xs i = xs !! i
|
||||
|
||||
proj r p = case (r,p) of
|
||||
(_, FV ts) -> FV $ Prelude.map (proj r) ts
|
||||
(W s t, _) -> kks (s ++ getString (proj t p))
|
||||
_ -> comp $ getField r (getIndex p)
|
||||
|
||||
getString t = case t of
|
||||
K (KS s) -> s
|
||||
_ -> trace ("ERROR in grammar compiler: string from "++ show t) "ERR"
|
||||
|
||||
getIndex t = case t of
|
||||
C i -> fromInteger i
|
||||
RP p _ -> getIndex p
|
||||
TM -> 0 -- default value for parameter
|
||||
_ -> trace ("ERROR in grammar compiler: index from " ++ show t) 0
|
||||
|
||||
getField t i = case t of
|
||||
R rs -> idx rs i
|
||||
RP _ r -> getField r i
|
||||
TM -> TM
|
||||
_ -> trace ("ERROR in grammar compiler: field from " ++ show t) t
|
||||
</PRE>
|
||||
<P></P>
|
||||
<H3>The special term constructors</H3>
|
||||
<P>
|
||||
The three forms introduced by the compiler may need a special
|
||||
explanation.
|
||||
</P>
|
||||
<P>
|
||||
Global constants
|
||||
</P>
|
||||
<PRE>
|
||||
Term ::= CId ;
|
||||
</PRE>
|
||||
<P>
|
||||
are shorthands for complex terms. They are produced by the
|
||||
compiler by (iterated) <B>common subexpression elimination</B>.
|
||||
They are often more powerful than hand-devised code sharing in the source
|
||||
code. They could be computed off-line by replacing each identifier by
|
||||
its definition.
|
||||
</P>
|
||||
<P>
|
||||
<B>Prefix-suffix tables</B>
|
||||
</P>
|
||||
<PRE>
|
||||
Term ::= "(" String "+" Term ")" ;
|
||||
</PRE>
|
||||
<P>
|
||||
represent tables of word forms divided into the longest common prefix
|
||||
and its array of suffixes. In the example grammar above, we have
|
||||
</P>
|
||||
<PRE>
|
||||
Sleep = [("sleep" + ["s",""])]
|
||||
</PRE>
|
||||
<P>
|
||||
which in fact is equal to the array of full forms
|
||||
</P>
|
||||
<PRE>
|
||||
["sleeps", "sleep"]
|
||||
</PRE>
|
||||
<P>
|
||||
The power of this construction comes from the fact that suffix sets
|
||||
tend to be repeated in a language, and can therefore be collected
|
||||
by common subexpression elimination. It is this technique that
|
||||
explains the syntax used, rather than the more accurate
|
||||
</P>
|
||||
<PRE>
|
||||
"(" String "+" [String] ")"
|
||||
</PRE>
|
||||
<P>
|
||||
since we want the suffix part to be a <CODE>Term</CODE> for the optimization to
|
||||
take effect.
|
||||
</P>
|
||||
<H2>Compiling to GFCC</H2>
|
||||
<P>
|
||||
Compilation to GFCC is performed by the GF grammar compiler, and
|
||||
GFCC interpreters need not know what it does. For grammar writers,
|
||||
however, it might be interesting to know what happens to the grammars
|
||||
in the process.
|
||||
</P>
|
||||
<P>
|
||||
The compilation phases are the following:
|
||||
</P>
|
||||
<OL>
|
||||
<LI>type check and partially evaluate GF source
|
||||
<LI>create a symbol table mapping the GF parameter and record types to
|
||||
fixed-size arrays, and parameter values and record labels to integers
|
||||
<LI>traverse the linearization rules replacing parameters and labels by integers
|
||||
<LI>reorganize the created GF grammar so that it has just one abstract syntax
|
||||
and one concrete syntax per language
|
||||
<LI>TODO: apply UTF8 encoding to the grammar, if not yet applied (this is told by the
|
||||
<CODE>coding</CODE> flag)
|
||||
<LI>translate the GF grammar object to a GFCC grammar object, using a simple
|
||||
compositional mapping
|
||||
<LI>perform the word-suffix optimization on GFCC linearization terms
|
||||
<LI>perform subexpression elimination on each concrete syntax module
|
||||
<LI>print out the GFCC code
|
||||
</OL>
|
||||
|
||||
<H3>Problems in GFCC compilation</H3>
|
||||
<P>
|
||||
Two major problems had to be solved in compiling GF to GFCC:
|
||||
</P>
|
||||
<UL>
|
||||
<LI>consistent order of tables and records, to permit the array translation
|
||||
<LI>run-time variables in complex parameter values.
|
||||
</UL>
|
||||
|
||||
<P>
|
||||
The current implementation is still experimental and may fail
|
||||
to generate correct code. Any errors remaining are likely to be
|
||||
related to the two problems just mentioned.
|
||||
</P>
|
||||
<P>
|
||||
The order problem is solved in slightly different ways for tables and records.
|
||||
In both cases, <B>eta expansion</B> is used to establish a
|
||||
canonical order. Tables are ordered by applying the preorder induced
|
||||
by <CODE>param</CODE> definitions. Records are ordered by sorting them by labels.
|
||||
This means that
|
||||
e.g. the <CODE>s</CODE> field will in general no longer appear as the first
|
||||
field, even if it does so in the GF source code. But relying on the
|
||||
order of fields in a labelled record would be misplaced anyway.
|
||||
</P>
|
||||
<P>
|
||||
The canonical form of records is further complicated by lock fields,
|
||||
i.e. dummy fields of form <CODE>lock_C = <></CODE>, which are added to grammar
|
||||
libraries to force intensionality of linearization types. The problem
|
||||
is that the absence of a lock field only generates a warning, not
|
||||
an error. Therefore a GF grammar can contain objects of the same
|
||||
type with and without a lock field. This problem was solved in GFCC
|
||||
generation by just removing all lock fields (defined as fields whose
|
||||
type is the empty record type). This has the further advantage of
|
||||
(slightly) reducing the grammar size. More importantly, it is safe
|
||||
to remove lock fields, because they are never used in computation,
|
||||
and because intensional types are only needed in grammars reused
|
||||
as libraries, not in grammars used at runtime.
|
||||
</P>
|
||||
<P>
|
||||
While the order problem is rather bureaucratic in nature, run-time
|
||||
variables are an interesting problem. They arise in the presence
|
||||
of complex parameter values, created by argument-taking constructors
|
||||
and parameter records. To give an example, consider the GF parameter
|
||||
type system
|
||||
</P>
|
||||
<PRE>
|
||||
Number = Sg | Pl ;
|
||||
Person = P1 | P2 | P3 ;
|
||||
Agr = Ag Number Person ;
|
||||
</PRE>
|
||||
<P>
|
||||
The values can be translated to integers in the expected way,
|
||||
</P>
|
||||
<PRE>
|
||||
Sg = 0, Pl = 1
|
||||
P1 = 0, P2 = 1, P3 = 2
|
||||
Ag Sg P1 = 0, Ag Sg P2 = 1, Ag Sg P3 = 2,
|
||||
Ag Pl P1 = 3, Ag Pl P2 = 4, Ag Pl P3 = 5
|
||||
</PRE>
|
||||
<P>
|
||||
However, an argument of <CODE>Agr</CODE> can be a run-time variable, as in
|
||||
</P>
|
||||
<PRE>
|
||||
Ag np.n P3
|
||||
</PRE>
|
||||
<P>
|
||||
This expression must first be translated to a case expression,
|
||||
</P>
|
||||
<PRE>
|
||||
case np.n of {
|
||||
0 => 2 ;
|
||||
1 => 5
|
||||
}
|
||||
</PRE>
|
||||
<P>
|
||||
which can then be translated to the GFCC term
|
||||
</P>
|
||||
<PRE>
|
||||
([2,5] ! ($0 ! 1))
|
||||
</PRE>
|
||||
<P>
|
||||
assuming that the variable <CODE>np</CODE> is the first argument and that its
|
||||
<CODE>Number</CODE> field is the second in the record.
|
||||
</P>
|
||||
<P>
|
||||
This transformation of course has to be performed recursively, since
|
||||
there can be several run-time variables in a parameter value:
|
||||
</P>
|
||||
<PRE>
|
||||
Ag np.n np.p
|
||||
</PRE>
|
||||
<P>
|
||||
A similar transformation would be possible to deal with the double
|
||||
role of parameter records discussed above. Thus the type
|
||||
</P>
|
||||
<PRE>
|
||||
RNP = {n : Number ; p : Person}
|
||||
</PRE>
|
||||
<P>
|
||||
could be uniformly translated into the set <CODE>{0,1,2,3,4,5}</CODE>
|
||||
as <CODE>Agr</CODE> above. Selections would be simple instances of indexing.
|
||||
But any projection from the record should be translated into
|
||||
a case expression,
|
||||
</P>
|
||||
<PRE>
|
||||
rnp.n ===>
|
||||
case rnp of {
|
||||
0 => 0 ;
|
||||
1 => 0 ;
|
||||
2 => 0 ;
|
||||
3 => 1 ;
|
||||
4 => 1 ;
|
||||
5 => 1
|
||||
}
|
||||
</PRE>
|
||||
<P>
|
||||
To avoid the code bloat resulting from this, we have chosen to
|
||||
deal with records by a <B>currying</B> transformation:
|
||||
</P>
|
||||
<PRE>
|
||||
table {n : Number ; p : Person} {... ...}
|
||||
===>
|
||||
table Number {Sg => table Person {...} ; Pl => table Person {...}}
|
||||
</PRE>
|
||||
<P>
|
||||
This is performed when GFCC is generated. Selections with
|
||||
records have to be treated likewise,
|
||||
</P>
|
||||
<PRE>
|
||||
t ! r ===> t ! r.n ! r.p
|
||||
</PRE>
|
||||
<P></P>
|
||||
<H3>The representation of linearization types</H3>
|
||||
<P>
|
||||
Linearization types (<CODE>lincat</CODE>) are not needed when generating with
|
||||
GFCC, but they have been added to enable parser generation directly from
|
||||
GFCC. The linearization type definitions are shown as a part of the
|
||||
concrete syntax, by using terms to represent types. Here is the table
|
||||
showing how different linearization types are encoded.
|
||||
</P>
|
||||
<PRE>
|
||||
P* = max(P) -- parameter type
|
||||
{r1 : T1 ; ... ; rn : Tn}* = [T1*,...,Tn*] -- record
|
||||
(P => T)* = [T* ,...,T*] -- table, size(P) cases
|
||||
Str* = ()
|
||||
</PRE>
|
||||
<P>
|
||||
For example, the linearization type <CODE>present/CatEng.NP</CODE> is
|
||||
translated as follows:
|
||||
</P>
|
||||
<PRE>
|
||||
NP = {
|
||||
a : { -- 6 = 2*3 values
|
||||
n : {ParamX.Number} ; -- 2 values
|
||||
p : {ParamX.Person} -- 3 values
|
||||
} ;
|
||||
s : {ResEng.Case} => Str -- 3 values
|
||||
}
|
||||
|
||||
__NP = [[1,2],[(),(),()]]
|
||||
</PRE>
|
||||
<P></P>
|
||||
<H3>Running the compiler and the GFCC interpreter</H3>
|
||||
<P>
|
||||
GFCC generation is a part of the
|
||||
<A HREF="http://www.cs.chalmers.se/Cs/Research/Language-technology/darcs/GF/doc/darcs.html">developers' version</A>
|
||||
of GF since September 2006. To invoke the compiler, the flag
|
||||
<CODE>-printer=gfcc</CODE> to the command
|
||||
<CODE>pm = print_multi</CODE> is used. It is wise to recompile the grammar from
|
||||
source, since previously compiled libraries may not obey the canonical
|
||||
order of records.
|
||||
Here is an example, performed in
|
||||
<A HREF="../../../../../examples/bronzeage">examples/bronzeage</A>.
|
||||
</P>
|
||||
<PRE>
|
||||
i -src -path=.:prelude:resource-1.0/* -optimize=all_subs BronzeageEng.gf
|
||||
i -src -path=.:prelude:resource-1.0/* -optimize=all_subs BronzeageGer.gf
|
||||
strip
|
||||
pm -printer=gfcc | wf bronze.gfcc
|
||||
</PRE>
|
||||
<P>
|
||||
There is also an experimental batch compiler, which does not use the GFC
|
||||
format or the record aliases. It can be produced by
|
||||
</P>
|
||||
<PRE>
|
||||
make gfc
|
||||
</PRE>
|
||||
<P>
|
||||
in <CODE>GF/src</CODE>, and invoked by
|
||||
</P>
|
||||
<PRE>
|
||||
gfc --make FILES
|
||||
</PRE>
|
||||
<P></P>
|
||||
<H2>The reference interpreter</H2>
|
||||
<P>
|
||||
The reference interpreter written in Haskell consists of the following files:
|
||||
</P>
|
||||
<PRE>
|
||||
-- source file for BNFC
|
||||
GFCC.cf -- labelled BNF grammar of gfcc
|
||||
|
||||
-- files generated by BNFC
|
||||
AbsGFCC.hs -- abstract syntax datatypes
|
||||
ErrM.hs -- error monad used internally
|
||||
LexGFCC.hs -- lexer of gfcc files
|
||||
ParGFCC.hs -- parser of gfcc files and syntax trees
|
||||
PrintGFCC.hs -- printer of gfcc files and syntax trees
|
||||
|
||||
-- hand-written files
|
||||
DataGFCC.hs -- grammar datatype, post-parser grammar creation
|
||||
Linearize.hs -- linearization and evaluation
|
||||
Macros.hs -- utilities abstracting away from GFCC datatypes
|
||||
Generate.hs -- random and exhaustive generation, generate-and-test parsing
|
||||
API.hs -- functionalities accessible in embedded GF applications
|
||||
Shell.hs -- main function - a simple command interpreter
|
||||
</PRE>
|
||||
<P>
|
||||
It is included in the
|
||||
<A HREF="http://www.cs.chalmers.se/Cs/Research/Language-technology/darcs/GF/doc/darcs.html">developers' version</A>
|
||||
of GF, in the subdirectories <A HREF="../"><CODE>GF/src/GF/GFCC</CODE></A> and
|
||||
<A HREF="../../Devel"><CODE>GF/src/GF/Devel</CODE></A>.
|
||||
</P>
|
||||
<P>
|
||||
As of September 2007, default parsing in main GF uses GFCC (implemented by Krasimir
|
||||
Angelov). The interpreter uses the relevant modules
|
||||
</P>
|
||||
<PRE>
|
||||
GF/Conversions/SimpleToFCFG.hs -- generate parser from GFCC
|
||||
GF/Parsing/FCFG.hs -- run the parser
|
||||
</PRE>
|
||||
<P></P>
|
||||
<P>
|
||||
To compile the interpreter, type
|
||||
</P>
|
||||
<PRE>
|
||||
make gfcc
|
||||
</PRE>
|
||||
<P>
|
||||
in <CODE>GF/src</CODE>. To run it, type
|
||||
</P>
|
||||
<PRE>
|
||||
./gfcc <GFCC-file>
|
||||
</PRE>
|
||||
<P>
|
||||
The available commands are
|
||||
</P>
|
||||
<UL>
|
||||
<LI><CODE>gr <Cat> <Int></CODE>: generate a number of random trees in category,
|
||||
and show their linearizations in all languages
|
||||
<LI><CODE>grt <Cat> <Int></CODE>: generate a number of random trees in category,
|
||||
and show the trees and their linearizations in all languages
|
||||
<LI><CODE>gt <Cat> <Int></CODE>: generate a number of trees in category from smallest,
|
||||
and show their linearizations in all languages
|
||||
<LI><CODE>gtt <Cat> <Int></CODE>: generate a number of trees in category from smallest,
|
||||
and show the trees and their linearizations in all languages
|
||||
<LI><CODE>p <Lang> <Cat> <String></CODE>: parse a string into a set of trees
|
||||
<LI><CODE>lin <Tree></CODE>: linearize tree in all languages, also showing full records
|
||||
<LI><CODE>q</CODE>: terminate the system cleanly
|
||||
</UL>
|
||||
|
||||
<H2>Embedded formats</H2>
|
||||
<UL>
|
||||
<LI>JavaScript: compiler of linearization and abstract syntax
|
||||
<P></P>
|
||||
<LI>Haskell: compiler of abstract syntax and interpreter with parsing,
|
||||
linearization, and generation
|
||||
<P></P>
|
||||
<LI>C: compiler of linearization (old GFCC)
|
||||
<P></P>
|
||||
<LI>C++: embedded interpreter supporting linearization (old GFCC)
|
||||
</UL>
|
||||
|
||||
<H2>Some things to do</H2>
|
||||
<P>
|
||||
Support for dependent types, higher-order abstract syntax, and
|
||||
semantic definition in GFCC generation and interpreters.
|
||||
</P>
|
||||
<P>
|
||||
Replacing the entire GF shell by one based on GFCC.
|
||||
</P>
|
||||
<P>
|
||||
Interpreter in Java.
|
||||
</P>
|
||||
<P>
|
||||
Hand-written parsers for GFCC grammars to reduce code size
|
||||
(and improve efficiency?) of interpreters.
|
||||
</P>
|
||||
<P>
|
||||
Binary format and/or file compression of GFCC output.
|
||||
</P>
|
||||
<P>
|
||||
Syntax editor based on GFCC.
|
||||
</P>
|
||||
<P>
|
||||
Rewriting of resource libraries in order to exploit the
|
||||
word-suffix sharing better (depth-one tables, as in FM).
|
||||
</P>
|
||||
|
||||
<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
|
||||
<!-- cmdline: txt2tags -thtml gfcc.txt -->
|
||||
</BODY></HTML>
|
||||
@@ -1,712 +0,0 @@
|
||||
The GFCC Grammar Format
|
||||
Aarne Ranta
|
||||
December 14, 2007
|
||||
|
||||
Author's address:
|
||||
[``http://www.cs.chalmers.se/~aarne`` http://www.cs.chalmers.se/~aarne]
|
||||
|
||||
% to compile: txt2tags -thtml --toc gfcc.txt
|
||||
|
||||
History:
|
||||
- 14 Dec 2007: simpler, Lisp-like concrete syntax of GFCC
|
||||
- 5 Oct 2007: new, better structured GFCC with full expressive power
|
||||
- 19 Oct: translation of lincats, new figures on C++
|
||||
- 3 Oct 2006: first version
|
||||
|
||||
|
||||
==What is GFCC==
|
||||
|
||||
GFCC is a low-level format for GF grammars. Its aim is to contain the minimum
|
||||
that is needed to process GF grammars at runtime. This minimality has three
|
||||
advantages:
|
||||
- compact grammar files and run-time objects
|
||||
- time and space efficient processing
|
||||
- simple definition of interpreters
|
||||
|
||||
|
||||
Thus we also want to call GFCC the **portable grammar format**.
|
||||
|
||||
The idea is that all embedded GF applications use GFCC.
|
||||
The GF system would be primarily used as a compiler and as a grammar
|
||||
development tool.
|
||||
|
||||
Since GFCC is implemented in BNFC, a parser of the format is readily
|
||||
available for C, C++, C#, Haskell, Java, and OCaml. Also an XML
|
||||
representation can be generated in BNFC. A
|
||||
[reference implementation ../]
|
||||
of linearization and some other functions has been written in Haskell.
|
||||
|
||||
|
||||
==GFCC vs. GFC==
|
||||
|
||||
GFCC is intended to replace GFC as the run-time grammar format. GFC was designed
|
||||
to be a run-time format, but also to
|
||||
support separate compilation of grammars, i.e.
|
||||
to store the results of compiling
|
||||
individual GF modules. But this means that GFC has to contain extra information,
|
||||
such as type annotations, which is only needed in compilation and not at
|
||||
run-time. In particular, the pattern matching syntax and semantics of GFC are
|
||||
complex and therefore difficult to implement on new platforms.
|
||||
|
||||
Actually, GFC is planned to be omitted also as the target format of
|
||||
separate compilation, where plain GF (type annotated and partially evaluated)
|
||||
will be used instead. GFC provides only marginal advantages as a target format
|
||||
compared with GF, and it is therefore just extra weight to carry around this
|
||||
format.
|
||||
|
||||
The main differences of GFCC compared with GFC (and GF) can be
|
||||
summarized as follows:
|
||||
- there are no modules, and therefore no qualified names
|
||||
- a GFCC grammar is multilingual, and consists of a common abstract syntax
|
||||
together with one concrete syntax per language
|
||||
- records and tables are replaced by arrays
|
||||
- record labels and parameter values are replaced by integers
|
||||
- record projection and table selection are replaced by array indexing
|
||||
- even though the format does support dependent types and higher-order abstract
|
||||
syntax, there is no interpreter yet that supports them
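The replacement of labels and parameter values by integers can be sketched as follows. This is a minimal illustration, not code from the GF sources: the helper names ``labelIndex`` and ``paramIndex`` are hypothetical, and it assumes that record labels are numbered in alphabetical order and parameter values in their order of declaration, which agrees with the example grammar (``n`` precedes ``s`` in ``She=[0,"she"]``).

```haskell
import Data.List (elemIndex, sort)

-- Hypothetical helper: the array position of a record label,
-- assuming that the fields are sorted alphabetically by label.
labelIndex :: [String] -> String -> Maybe Int
labelIndex labels l = elemIndex l (sort labels)

-- Hypothetical helper: the integer standing for a parameter value,
-- assuming that values are numbered in their order of declaration.
paramIndex :: [String] -> String -> Maybe Int
paramIndex values v = elemIndex v values
```

With the ``NP`` record ``{s : Str ; n : Num}``, ``labelIndex ["s","n"] "s"`` gives position 1, which is why ``np.s`` appears as ``($ 0! 1)`` in the linearization of ``Pred``.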
|
||||
|
||||
|
||||
|
||||
Here is an example of a GF grammar, consisting of three modules,
|
||||
as translated to GFCC. The representations are aligned;
|
||||
thus they do not completely
|
||||
reflect the order of judgements in GFCC files, which have different orders of
|
||||
blocks of judgements, and alphabetical sorting.
|
||||
```
|
||||
grammar Ex(Eng,Swe);
|
||||
|
||||
abstract Ex = { abstract {
|
||||
cat cat
|
||||
S ; NP ; VP ; NP[]; S[]; VP[];
|
||||
fun fun
|
||||
Pred : NP -> VP -> S ; Pred=[(($ 0! 1),(($ 1! 0)!($ 0! 0)))];
|
||||
She, They : NP ; She=[0,"she"];
|
||||
Sleep : VP ; They=[1,"they"];
|
||||
Sleep=[["sleeps","sleep"]];
|
||||
} } ;
|
||||
|
||||
concrete Eng of Ex = { concrete Eng {
|
||||
lincat lincat
|
||||
S = {s : Str} ; S=[()];
|
||||
NP = {s : Str ; n : Num} ; NP=[1,()];
|
||||
VP = {s : Num => Str} ; VP=[[(),()]];
|
||||
param
|
||||
Num = Sg | Pl ;
|
||||
lin lin
|
||||
Pred np vp = { Pred=[(($ 0! 1),(($ 1! 0)!($ 0! 0)))];
|
||||
s = np.s ++ vp.s ! np.n} ;
|
||||
She = {s = "she" ; n = Sg} ; She=[0,"she"];
|
||||
They = {s = "they" ; n = Pl} ; They = [1, "they"];
|
||||
Sleep = {s = table { Sleep=[["sleeps","sleep"]];
|
||||
Sg => "sleeps" ;
|
||||
Pl => "sleep"
|
||||
}
|
||||
} ;
|
||||
} } ;
|
||||
|
||||
concrete Swe of Ex = { concrete Swe {
|
||||
lincat lincat
|
||||
S = {s : Str} ; S=[()];
|
||||
NP = {s : Str} ; NP=[()];
|
||||
VP = {s : Str} ; VP=[()];
|
||||
param
|
||||
Num = Sg | Pl ;
|
||||
lin lin
|
||||
Pred np vp = { Pred = [(($0!0),($1!0))];
|
||||
s = np.s ++ vp.s} ;
|
||||
She = {s = "hon"} ; She = ["hon"];
|
||||
They = {s = "de"} ; They = ["de"];
|
||||
Sleep = {s = "sover"} ; Sleep = ["sover"];
|
||||
} } ;
|
||||
```
|
||||
|
||||
==The syntax of GFCC files==
|
||||
|
||||
The complete BNFC grammar, from which
|
||||
the rules in this section are taken, is in the file
|
||||
[``GF/GFCC/GFCC.cf`` ../DataGFCC.cf].
|
||||
|
||||
|
||||
===Top level===
|
||||
|
||||
A grammar has a header telling the name of the abstract syntax
|
||||
(often specifying an application domain), and the names of
|
||||
the concrete languages. The abstract syntax and the concrete
|
||||
syntaxes themselves follow.
|
||||
```
|
||||
Grm. Grammar ::=
|
||||
"grammar" CId "(" [CId] ")" ";"
|
||||
Abstract ";"
|
||||
[Concrete] ;
|
||||
|
||||
Abs. Abstract ::=
|
||||
"abstract" "{"
|
||||
"flags" [Flag]
|
||||
"fun" [FunDef]
|
||||
"cat" [CatDef]
|
||||
"}" ;
|
||||
|
||||
Cnc. Concrete ::=
|
||||
"concrete" CId "{"
|
||||
"flags" [Flag]
|
||||
"lin" [LinDef]
|
||||
"oper" [LinDef]
|
||||
"lincat" [LinDef]
|
||||
"lindef" [LinDef]
|
||||
"printname" [LinDef]
|
||||
"}" ;
|
||||
```
|
||||
This syntax organizes each module into a sequence of **fields**, such
|
||||
as flags, linearizations, operations, linearization types, etc.
|
||||
It is envisaged that particular applications can ignore some
|
||||
of the fields, typically so that earlier fields are more
|
||||
important than later ones.
|
||||
|
||||
The judgement forms have the following syntax.
|
||||
```
|
||||
Flg. Flag ::= CId "=" String ;
|
||||
Cat. CatDef ::= CId "[" [Hypo] "]" ;
|
||||
Fun. FunDef ::= CId ":" Type "=" Exp ;
|
||||
Lin. LinDef ::= CId "=" Term ;
|
||||
```
|
||||
For the run-time system, the reference implementation in Haskell
|
||||
uses a structure that gives efficient look-up:
|
||||
```
|
||||
data GFCC = GFCC {
|
||||
absname :: CId ,
|
||||
cncnames :: [CId] ,
|
||||
abstract :: Abstr ,
|
||||
concretes :: Map CId Concr
|
||||
}
|
||||
|
||||
data Abstr = Abstr {
|
||||
aflags :: Map CId String, -- value of a flag
|
||||
funs :: Map CId (Type,Exp), -- type and def of a fun
|
||||
cats :: Map CId [Hypo], -- context of a cat
|
||||
catfuns :: Map CId [CId] -- funs yielding a cat (redundant, for fast lookup)
|
||||
}
|
||||
|
||||
data Concr = Concr {
|
||||
flags :: Map CId String, -- value of a flag
|
||||
lins :: Map CId Term, -- lin of a fun
|
||||
opers :: Map CId Term, -- oper generated by subex elim
|
||||
lincats :: Map CId Term, -- lin type of a cat
|
||||
lindefs :: Map CId Term, -- lin default of a cat
|
||||
printnames :: Map CId Term -- printname of a cat or a fun
|
||||
}
|
||||
```
|
||||
These definitions are from [``GF/GFCC/DataGFCC.hs`` ../DataGFCC.hs].
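To illustrate why the map-based structure gives efficient look-up, here is a reduced sketch. It is not the code of ``DataGFCC.hs``: terms are kept as plain strings, most fields are dropped, and the function ``lookLinMaybe`` and the value ``exGrammar`` are hypothetical names introduced only for this example.

```haskell
import qualified Data.Map as Map

-- Simplified stand-ins for the run-time structures; terms are kept as
-- plain strings because only the lookup is of interest here.
type CId  = String
type Term = String

data Concr = Concr
  { lins  :: Map.Map CId Term   -- lin of a fun
  , opers :: Map.Map CId Term   -- oper generated by subex elim
  }

data GFCC = GFCC
  { absname   :: CId
  , concretes :: Map.Map CId Concr
  }

-- Hypothetical lookup: the linearization of a function in a language,
-- or Nothing if either the language or the function is unknown.
lookLinMaybe :: GFCC -> CId -> CId -> Maybe Term
lookLinMaybe gfcc lang fun =
  Map.lookup lang (concretes gfcc) >>= Map.lookup fun . lins

-- A fragment of the example grammar Ex with two concrete syntaxes.
exGrammar :: GFCC
exGrammar = GFCC
  { absname   = "Ex"
  , concretes = Map.fromList
      [ ("Eng", Concr (Map.fromList [("She", "[0,\"she\"]")]) Map.empty)
      , ("Swe", Concr (Map.fromList [("She", "[\"hon\"]")]) Map.empty)
      ]
  }
```

Both the language and the function are found by logarithmic-time map look-ups, so no list of judgements has to be traversed at run time.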
|
||||
|
||||
Identifiers (``CId``) are like ``Ident`` in GF, except that
|
||||
the compiler produces constants prefixed with ``_`` in
|
||||
the common subterm elimination optimization.
|
||||
```
|
||||
token CId (('_' | letter) (letter | digit | '\'' | '_')*) ;
|
||||
```
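The token rule can be read as a simple predicate on strings. The following sketch is not part of the GFCC sources; ``isCId`` is a hypothetical helper, and ``isAlpha`` is only an approximation of what BNFC counts as a ``letter``.

```haskell
import Data.Char (isAlpha, isDigit)

-- Hypothetical helper: checks a string against the CId token rule
--   ('_' | letter) (letter | digit | '\'' | '_')*
-- Assumption: isAlpha stands in for BNFC's notion of "letter".
isCId :: String -> Bool
isCId []     = False
isCId (c:cs) = isHead c && all isTail cs
  where
    isHead x = x == '_' || isAlpha x
    isTail x = isAlpha x || isDigit x || x == '\'' || x == '_'
```

Under this reading, compiler-generated names such as ``_a1`` are accepted, whereas a name starting with a digit is not.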
|
||||
|
||||
|
||||
===Abstract syntax===
|
||||
|
||||
Types are first-order function types built from argument type
|
||||
contexts and value types.
|
||||
Syntax trees (``Exp``) are
|
||||
rose trees with nodes consisting of a head (``Atom``) and
|
||||
bound variables (``CId``).
|
||||
```
|
||||
DTyp. Type ::= "[" [Hypo] "]" CId [Exp] ;
|
||||
DTr. Exp ::= "[" "(" [CId] ")" Atom [Exp] "]" ;
|
||||
Hyp. Hypo ::= CId ":" Type ;
|
||||
```
|
||||
The head Atom is either a function
|
||||
constant, a bound variable, or a metavariable, or a string, integer, or float
|
||||
literal.
|
||||
```
|
||||
AC. Atom ::= CId ;
|
||||
AS. Atom ::= String ;
|
||||
AI. Atom ::= Integer ;
|
||||
AF. Atom ::= Double ;
|
||||
AM. Atom ::= "?" Integer ;
|
||||
```
|
||||
The context-free types and trees of the "old GFCC" are special
|
||||
cases, which can be defined as follows:
|
||||
```
|
||||
Typ. Type ::= [CId] "->" CId
|
||||
Typ args val = DTyp [Hyp (CId "_") arg | arg <- args] val
|
||||
|
||||
Tr. Exp ::= "(" CId [Exp] ")"
|
||||
Tr fun exps = DTr [] fun exps
|
||||
```
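The two special cases can be written out as ordinary Haskell functions. This is a sketch under assumptions: the datatypes are simplified copies of what BNFC would generate from the rules above, the names ``typ`` and ``tr`` are hypothetical, and the details that the definitions above leave implicit (wrapping each argument category into a ``Type``, the empty list of type arguments, and the ``AC`` atom for the head) are filled in by guesswork.

```haskell
-- Simplified copies of the abstract syntax datatypes, for illustration only.
newtype CId = CId String deriving (Eq, Show)

data Type = DTyp [Hypo] CId [Exp] deriving (Eq, Show)
data Hypo = Hyp CId Type          deriving (Eq, Show)
data Exp  = DTr [CId] Atom [Exp]  deriving (Eq, Show)
data Atom = AC CId | AS String | AI Integer | AF Double | AM Integer
  deriving (Eq, Show)

-- Context-free type: every argument is a plain category bound to a dummy "_".
typ :: [CId] -> CId -> Type
typ args val = DTyp [Hyp (CId "_") (DTyp [] arg []) | arg <- args] val []

-- Context-free tree: no bound variables, and the head is a function constant.
tr :: CId -> [Exp] -> Exp
tr fun exps = DTr [] (AC fun) exps

-- Pred : NP -> VP -> S from the example grammar.
predType :: Type
predType = typ [CId "NP", CId "VP"] (CId "S")

-- The tree (Pred (She) (Sleep)).
sheSleeps :: Exp
sheSleeps = tr (CId "Pred") [tr (CId "She") [], tr (CId "Sleep") []]
```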
|
||||
To store semantic (``def``) definitions by cases, the following expression
|
||||
form is provided, but it is only meaningful in the last field of a function
|
||||
declaration in an abstract syntax:
|
||||
```
|
||||
EEq. Exp ::= "{" [Equation] "}" ;
|
||||
Equ. Equation ::= [Exp] "->" Exp ;
|
||||
```
|
||||
Notice that expressions are used to encode patterns. Primitive notions
|
||||
(the default semantics in GF) are encoded as empty sets of equations
|
||||
(``[]``). For a constructor (canonical form) of a category ``C``, we
|
||||
aim to use the encoding as the application ``(_constr C)``.
|
||||
|
||||
|
||||
|
||||
===Concrete syntax===
|
||||
|
||||
Linearization terms (``Term``) are built as follows.
|
||||
Constructor names are shown to make the later code
|
||||
examples readable.
|
||||
```
|
||||
R. Term ::= "[" [Term] "]" ; -- array (record/table)
|
||||
P. Term ::= "(" Term "!" Term ")" ; -- access to field (projection/selection)
|
||||
S. Term ::= "(" [Term] ")" ; -- concatenated sequence
|
||||
K. Term ::= Tokn ; -- token
|
||||
V. Term ::= "$" Integer ; -- argument (subtree)
|
||||
C. Term ::= Integer ; -- array index (label/parameter value)
|
||||
FV. Term ::= "[|" [Term] "|]" ; -- free variation
|
||||
TM. Term ::= "?" ; -- linearization of metavariable
|
||||
```
|
||||
Tokens are strings or (maybe obsolescent) prefix-dependent
|
||||
variant lists.
|
||||
```
|
||||
KS. Tokn ::= String ;
|
||||
KP. Tokn ::= "[" "pre" [String] "[" [Variant] "]" "]" ;
|
||||
Var. Variant ::= [String] "/" [String] ;
|
||||
```
|
||||
Two special forms of terms are introduced by the compiler
|
||||
as optimizations. They can in principle be eliminated, but
|
||||
their presence makes grammars much more compact. Their semantics
|
||||
will be explained in a later section.
|
||||
```
|
||||
F. Term ::= CId ; -- global constant
|
||||
W. Term ::= "(" String "+" Term ")" ; -- prefix + suffix table
|
||||
```
|
||||
There is also a deprecated form of "record parameter alias",
|
||||
```
|
||||
RP. Term ::= "(" Term "@" Term ")"; -- DEPRECATED
|
||||
```
|
||||
which will be removed when the migration to new GFCC is complete.
|
||||
|
||||
|
||||
|
||||
==The semantics of concrete syntax terms==
|
||||
|
||||
The code in this section is from [``GF/GFCC/Linearize.hs`` ../Linearize.hs].
|
||||
|
||||
|
||||
===Linearization and realization===
|
||||
|
||||
The linearization algorithm is essentially the same as in
|
||||
GFC: a tree is linearized by evaluating its linearization term
|
||||
in the environment of the linearizations of the subtrees.
|
||||
Literal atoms are linearized in the obvious way.
|
||||
The function also needs to know the language (i.e. concrete syntax)
|
||||
in which linearization is performed.
|
||||
```
|
||||
linExp :: GFCC -> CId -> Exp -> Term
|
||||
linExp gfcc lang tree@(DTr _ at trees) = case at of
|
||||
AC fun -> comp (Prelude.map lin trees) $ look fun
|
||||
AS s -> R [kks (show s)] -- quoted
|
||||
AI i -> R [kks (show i)]
|
||||
AF d -> R [kks (show d)]
|
||||
AM _ -> TM
|
||||
where
|
||||
lin = linExp gfcc lang
|
||||
comp = compute gfcc lang
|
||||
look = lookLin gfcc lang
|
||||
```
|
||||
TODO: bindings must be supported.
|
||||
|
||||
The result of linearization is usually a record, which is realized as
|
||||
a string using the following algorithm.
|
||||
```
|
||||
realize :: Term -> String
|
||||
realize trm = case trm of
|
||||
R (t:_) -> realize t
|
||||
S ss -> unwords $ Prelude.map realize ss
|
||||
K (KS s) -> s
|
||||
K (KP s _) -> unwords s ---- prefix choice TODO
|
||||
W s t -> s ++ realize t
|
||||
FV (t:_) -> realize t
|
||||
TM -> "?"
|
||||
```
|
||||
Notice that realization always picks the first field of a record.
|
||||
If a linearization type has more than one field, the first field
|
||||
does not necessarily contain the desired string.
|
||||
Also notice that the order of record fields in GFCC is not necessarily
|
||||
the same as in GF source.
|
||||
|
||||
|
||||
===Term evaluation===
|
||||
|
||||
Evaluation follows call-by-value order, with two environments
|
||||
needed:
|
||||
- the grammar (a concrete syntax) to give the global constants
|
||||
- an array of terms to give the subtree linearizations
|
||||
|
||||
|
||||
The code is presented in one-level pattern matching, to
|
||||
enable reimplementations in languages that do not permit
|
||||
deep patterns (such as Java and C++).
|
||||
```
|
||||
compute :: GFCC -> CId -> [Term] -> Term -> Term
|
||||
compute gfcc lang args = comp where
|
||||
comp trm = case trm of
|
||||
P r p -> proj (comp r) (comp p)
|
||||
W s t -> W s (comp t)
|
||||
R ts -> R $ Prelude.map comp ts
|
||||
V i -> idx args (fromInteger i) -- already computed
|
||||
F c -> comp $ look c -- not computed (if contains V)
|
||||
FV ts -> FV $ Prelude.map comp ts
|
||||
S ts -> S $ Prelude.filter (/= S []) $ Prelude.map comp ts
|
||||
_ -> trm
|
||||
|
||||
look = lookOper gfcc lang
|
||||
|
||||
idx xs i = xs !! i
|
||||
|
||||
proj r p = case (r,p) of
|
||||
(_, FV ts) -> FV $ Prelude.map (proj r) ts
|
||||
(FV ts, _ ) -> FV $ Prelude.map (\t -> proj t p) ts
|
||||
(W s t, _) -> kks (s ++ getString (proj t p))
|
||||
_ -> comp $ getField r (getIndex p)
|
||||
|
||||
getString t = case t of
|
||||
K (KS s) -> s
|
||||
_ -> trace ("ERROR in grammar compiler: string from "++ show t) "ERR"
|
||||
|
||||
getIndex t = case t of
|
||||
C i -> fromInteger i
|
||||
RP p _ -> getIndex p
|
||||
TM -> 0 -- default value for parameter
|
||||
_ -> trace ("ERROR in grammar compiler: index from " ++ show t) 0
|
||||
|
||||
getField t i = case t of
|
||||
R rs -> idx rs i
|
||||
RP _ r -> getField r i
|
||||
TM -> TM
|
||||
_ -> trace ("ERROR in grammar compiler: field from " ++ show t) t
|
||||
```
|
||||
|
||||
===The special term constructors===
|
||||
|
||||
The three forms introduced by the compiler may need a special
|
||||
explanation.
|
||||
|
||||
Global constants
|
||||
```
|
||||
Term ::= CId ;
|
||||
```
|
||||
are shorthands for complex terms. They are produced by the
|
||||
compiler by (iterated) **common subexpression elimination**.
|
||||
They are often more powerful than hand-devised code sharing in the source
|
||||
code. They could be computed off-line by replacing each identifier by
|
||||
its definition.
|
||||
|
||||
**Prefix-suffix tables**
|
||||
```
|
||||
Term ::= "(" String "+" Term ")" ;
|
||||
```
|
||||
represent tables of word forms divided into the longest common prefix
|
||||
and its array of suffixes. In the example grammar above, we have
|
||||
```
|
||||
Sleep = [("sleep" + ["s",""])]
|
||||
```
|
||||
which in fact is equal to the array of full forms
|
||||
```
|
||||
["sleeps", "sleep"]
|
||||
```
|
||||
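The equality between a prefix-suffix table and its array of full forms can be
made concrete with a small expansion function. This is only a sketch: the
``Term`` type below is a cut-down stand-in for the GFCC datatype, and the name
``expandW`` is hypothetical, not part of the reference interpreter.

```haskell
-- Simplified stand-in for the GFCC Term type (an assumption of this sketch):
-- only the constructors needed to illustrate prefix-suffix tables.
data Term
  = R [Term]        -- array (record or table)
  | KS String       -- string token
  | W String Term   -- prefix + suffix table
  deriving (Eq, Show)

-- Replace every prefix-suffix table by the array of full forms it denotes.
expandW :: Term -> Term
expandW trm = case trm of
  W pre t -> prepend pre (expandW t)
  R ts    -> R (map expandW ts)
  KS _    -> trm

-- Glue a prefix onto a token, or onto every token of an array.
prepend :: String -> Term -> Term
prepend pre trm = case trm of
  KS s  -> KS (pre ++ s)
  R ts  -> R (map (prepend pre) ts)
  W p t -> W (pre ++ p) t
```

Applied to the ``Sleep`` term, ``expandW`` turns ``("sleep" + ["s",""])`` into
the array containing ``"sleeps"`` and ``"sleep"``.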
The power of this construction comes from the fact that suffix sets
|
||||
tend to be repeated in a language, and can therefore be collected
|
||||
by common subexpression elimination. It is this technique that
|
||||
explains the syntax used, rather than the more accurate
|
||||
```
|
||||
"(" String "+" [String] ")"
|
||||
```
|
||||
since we want the suffix part to be a ``Term`` for the optimization to
|
||||
take effect.
|
||||
|
||||
|
||||
|
||||
==Compiling to GFCC==
|
||||
|
||||
Compilation to GFCC is performed by the GF grammar compiler, and
|
||||
GFCC interpreters need not know what it does. For grammar writers,
|
||||
however, it might be interesting to know what happens to the grammars
|
||||
in the process.
|
||||
|
||||
The compilation phases are the following:
|
||||
+ type check and partially evaluate GF source
|
||||
+ create a symbol table mapping the GF parameter and record types to
|
||||
fixed-size arrays, and parameter values and record labels to integers
|
||||
+ traverse the linearization rules replacing parameters and labels by integers
|
||||
+ reorganize the created GF grammar so that it has just one abstract syntax
|
||||
and one concrete syntax per language
|
||||
+ TODO: apply UTF8 encoding to the grammar, if not yet applied (this is told by the
|
||||
``coding`` flag)
|
||||
+ translate the GF grammar object to a GFCC grammar object, using a simple
|
||||
compositional mapping
|
||||
+ perform the word-suffix optimization on GFCC linearization terms
|
||||
+ perform subexpression elimination on each concrete syntax module
|
||||
+ print out the GFCC code
|
||||
|
||||
|
||||
|
||||
|
||||
===Problems in GFCC compilation===
|
||||
|
||||
Two major problems had to be solved in compiling GF to GFCC:
|
||||
- consistent order of tables and records, to permit the array translation
|
||||
- run-time variables in complex parameter values.
|
||||
|
||||
|
||||
The current implementation is still experimental and may fail
|
||||
to generate correct code. Any errors remaining are likely to be
|
||||
related to the two problems just mentioned.
|
||||
|
||||
The order problem is solved in slightly different ways for tables and records.
|
||||
In both cases, **eta expansion** is used to establish a
|
||||
canonical order. Tables are ordered by applying the preorder induced
|
||||
by ``param`` definitions. Records are ordered by sorting them by labels.
|
||||
This means that
|
||||
e.g. the ``s`` field will in general no longer appear as the first
|
||||
field, even if it does so in the GF source code. But relying on the
|
||||
order of fields in a labelled record would be misplaced anyway.
|
||||
|
||||
The canonical form of records is further complicated by lock fields,
|
||||
i.e. dummy fields of form ``lock_C = <>``, which are added to grammar
|
||||
libraries to force intensionality of linearization types. The problem
|
||||
is that the absence of a lock field only generates a warning, not
|
||||
an error. Therefore a GF grammar can contain objects of the same
|
||||
type with and without a lock field. This problem was solved in GFCC
|
||||
generation by just removing all lock fields (defined as fields whose
|
||||
type is the empty record type). This has the further advantage of
|
||||
(slightly) reducing the grammar size. More importantly, it is safe
|
||||
to remove lock fields, because they are never used in computation,
|
||||
and because intensional types are only needed in grammars reused
|
||||
as libraries, not in grammars used at runtime.
|
||||
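The two record normalizations described above (sorting by label and removing
lock fields) can be sketched as follows. The ``Ty`` type and the function name
``canonical`` are hypothetical and much simpler than the GF compiler's own
datatypes; a lock field is recognized here, as in the text, by having the
empty record as its type.

```haskell
import Data.List (sortOn)

-- Hypothetical, simplified model of linearization types (not the GF datatypes).
data Ty
  = Str                    -- token list
  | Param String           -- a parameter type, by name
  | Record [(String, Ty)]  -- labelled fields in source order
  deriving (Eq, Show)

-- Drop lock fields (fields whose type is the empty record) and sort the
-- remaining fields by label, recursively.
canonical :: Ty -> Ty
canonical ty = case ty of
  Record fs -> Record (sortOn fst [ (l, canonical t) | (l, t) <- fs, t /= Record [] ])
  _         -> ty
```

For a type written as ``{s : Str ; n : Number ; lock_NP : {}}`` in the source,
the result has the fields ``n`` and ``s`` in that order, which illustrates why
``s`` is in general no longer the first field.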
|
||||
While the order problem is rather bureaucratic in nature, run-time
|
||||
variables are an interesting problem. They arise in the presence
|
||||
of complex parameter values, created by argument-taking constructors
|
||||
and parameter records. To give an example, consider the GF parameter
|
||||
type system
|
||||
```
|
||||
Number = Sg | Pl ;
|
||||
Person = P1 | P2 | P3 ;
|
||||
Agr = Ag Number Person ;
|
||||
```
|
||||
The values can be translated to integers in the expected way,
|
||||
```
|
||||
Sg = 0, Pl = 1
|
||||
P1 = 0, P2 = 1, P3 = 2
|
||||
Ag Sg P1 = 0, Ag Sg P2 = 1, Ag Sg P3 = 2,
|
||||
Ag Pl P1 = 3, Ag Pl P2 = 4, Ag Pl P3 = 5
|
||||
```
|
||||
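The numbering of the ``Agr`` values above follows a mixed-radix scheme: the
index of ``Ag n p`` is the index of ``n`` times the number of ``Person``
values, plus the index of ``p``. A minimal sketch, assuming a parameter type
with a single constructor (the names ``paramIndex`` and ``agrIndices`` are
illustrative only):

```haskell
-- Index of a constructor application, computed from the number of values of
-- each argument type and the index of each argument (a mixed-radix numeral).
-- Assumes a parameter type with a single constructor, as in Agr.
paramIndex :: [Int] -> [Int] -> Int
paramIndex sizes args = foldl step 0 (zip sizes args)
  where step acc (size, i) = acc * size + i

-- All values of Agr = Ag Number Person, in canonical order.
agrIndices :: [Int]
agrIndices = [ paramIndex [2, 3] [n, p] | n <- [0, 1], p <- [0, 1, 2] ]
```

With these definitions, ``paramIndex [2,3] [1,2]`` is the index of
``Ag Pl P3``, and ``agrIndices`` enumerates the six values in the order shown
above.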
However, an argument of ``Agr`` can be a run-time variable, as in
|
||||
```
|
||||
Ag np.n P3
|
||||
```
|
||||
This expression must first be translated to a case expression,
|
||||
```
|
||||
case np.n of {
|
||||
0 => 2 ;
|
||||
1 => 5
|
||||
}
|
||||
```
|
||||
which can then be translated to the GFCC term
|
||||
```
|
||||
([2,5] ! ($0 ! 1))
|
||||
```
|
||||
assuming that the variable ``np`` is the first argument and that its
|
||||
``Number`` field is the second in the record.
|
||||
|
||||
This transformation of course has to be performed recursively, since
|
||||
there can be several run-time variables in a parameter value:
|
||||
```
|
||||
Ag np.n np.p
|
||||
```
|
||||
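The table of result values for a constructor application with any number of
run-time arguments can be computed by enumerating those arguments. The sketch
below is an illustration under simplifying assumptions: a single constructor,
a flat result table rather than the nested one a compiler would emit, and
hypothetical names (``Arg``, ``caseTable``).

```haskell
-- An argument of a parameter constructor: a value known at compile time,
-- or a run-time variable (the string is only a display name).
data Arg = Known Int | RunTime String

-- One result index per assignment of values to the run-time arguments;
-- the leftmost run-time argument varies slowest.
caseTable :: [Int] -> [Arg] -> [Int]
caseTable sizes args =
  map (foldl step 0 . zip sizes) (mapM choices (zip sizes args))
  where
    choices (_, Known i)      = [i]
    choices (size, RunTime _) = [0 .. size - 1]
    step acc (size, i)        = acc * size + i
```

For ``Ag np.n P3`` this gives the two-element table ``[2,5]`` used above. For
``Ag np.n np.p`` it gives all six values, which would be grouped into one
array per value of ``np.n`` and selected first by ``np.n`` and then by
``np.p``.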
A similar transformation would be possible to deal with the double
|
||||
role of parameter records discussed above. Thus the type
|
||||
```
|
||||
RNP = {n : Number ; p : Person}
|
||||
```
|
||||
could be uniformly translated into the set ``{0,1,2,3,4,5}``
|
||||
as ``Agr`` above. Selections would be simple instances of indexing.
|
||||
But any projection from the record should be translated into
|
||||
a case expression,
|
||||
```
|
||||
rnp.n ===>
|
||||
case rnp of {
|
||||
0 => 0 ;
|
||||
1 => 0 ;
|
||||
2 => 0 ;
|
||||
3 => 1 ;
|
||||
4 => 1 ;
|
||||
5 => 1
|
||||
}
|
||||
```
|
||||
To avoid the code bloat resulting from this, we have chosen to
|
||||
deal with records by a **currying** transformation:
|
||||
```
|
||||
table {n : Number ; p : Person} {... ...}
|
||||
===>
|
||||
table Number {Sg => table Person {...} ; Pl => table Person {...}}
|
||||
```
|
||||
This is performed when GFCC is generated. Selections with
|
||||
records have to be treated likewise,
|
||||
```
|
||||
t ! r ===> t ! r.n ! r.p
|
||||
```
|
||||
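On arrays, the currying transformation amounts to splitting a flat table into
rows and replacing one selection by two. The following sketch uses plain
Haskell lists in place of GFCC terms; ``curryTable`` and ``selectCurried`` are
hypothetical names.

```haskell
-- Split a flat table into rows of the given length: the list analogue of
-- turning a table over a record into a table of tables.
curryTable :: Int -> [a] -> [[a]]
curryTable rowLen xs
  | rowLen <= 0 || null xs = []
  | otherwise              = take rowLen xs : curryTable rowLen (drop rowLen xs)

-- Selection with a record argument becomes two successive selections.
selectCurried :: [[a]] -> Int -> Int -> a
selectCurried t n p = t !! n !! p
```

For a table over ``{n : Number ; p : Person}`` the row length is 3 (the number
of ``Person`` values), and selecting with ``n`` and then ``p`` from the curried
table yields the same element as selecting with ``n * 3 + p`` from the flat
one.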
|
||||
|
||||
===The representation of linearization types===
|
||||
|
||||
Linearization types (``lincat``) are not needed when generating with
|
||||
GFCC, but they have been added to enable parser generation directly from
|
||||
GFCC. The linearization type definitions are shown as a part of the
|
||||
concrete syntax, by using terms to represent types. Here is the table
|
||||
showing how different linearization types are encoded.
|
||||
```
|
||||
P* = max(P) -- parameter type
|
||||
{r1 : T1 ; ... ; rn : Tn}* = [T1*,...,Tn*] -- record
|
||||
(P => T)* = [T* ,...,T*] -- table, size(P) cases
|
||||
Str* = ()
|
||||
```
|
||||
For example, the linearization type ``present/CatEng.NP`` is
|
||||
translated as follows:
|
||||
```
|
||||
NP = {
|
||||
a : { -- 6 = 2*3 values
|
||||
n : {ParamX.Number} ; -- 2 values
|
||||
p : {ParamX.Person} -- 3 values
|
||||
} ;
|
||||
s : {ResEng.Case} => Str -- 3 values
|
||||
}
|
||||
|
||||
__NP = [[1,2],[(),(),()]]
|
||||
```
|
||||
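The encoding table can be read as a recursive function from linearization
types to terms. The sketch below is one possible reading, not the compiler's
code: ``LinType`` is a hypothetical model in which a parameter type is
represented only by its number of values, and ``Term`` is reduced to the three
constructors that occur in encoded types.

```haskell
import Data.List (intercalate)

-- Hypothetical model of GF linearization types.
data LinType
  = Param Int                   -- parameter type with this many values
  | Record [(String, LinType)]  -- fields, assumed already sorted by label
  | Table Int LinType           -- table over a parameter type of this size
  | Str

-- Cut-down GFCC terms: array, sequence, integer.
data Term = R [Term] | S [Term] | C Integer
  deriving (Eq, Show)

-- The T* translation of the table above.
encode :: LinType -> Term
encode ty = case ty of
  Param n   -> C (toInteger n - 1)          -- max(P)
  Record fs -> R (map (encode . snd) fs)
  Table n t -> R (replicate n (encode t))   -- size(P) copies of T*
  Str       -> S []                         -- printed as ()

-- Print a term in GFCC concrete syntax.
printTerm :: Term -> String
printTerm trm = case trm of
  R ts -> "[" ++ intercalate "," (map printTerm ts) ++ "]"
  S ts -> "(" ++ intercalate "," (map printTerm ts) ++ ")"
  C i  -> show i

-- The NP type of the example, with fields in label order.
npType :: LinType
npType = Record
  [ ("a", Record [("n", Param 2), ("p", Param 3)])
  , ("s", Table 3 Str) ]
```

Here ``printTerm (encode npType)`` reproduces the right-hand side of the
``__NP`` definition shown above.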
|
||||
|
||||
|
||||
|
||||
===Running the compiler and the GFCC interpreter===
|
||||
|
||||
GFCC generation is a part of the
|
||||
[developers' version http://www.cs.chalmers.se/Cs/Research/Language-technology/darcs/GF/doc/darcs.html]
|
||||
of GF since September 2006. To invoke the compiler, pass the flag
``-printer=gfcc`` to the command
``pm = print_multi``. It is wise to recompile the grammar from
|
||||
source, since previously compiled libraries may not obey the canonical
|
||||
order of records.
|
||||
Here is an example, performed in
|
||||
[example/bronzeage ../../../../../examples/bronzeage].
|
||||
```
|
||||
i -src -path=.:prelude:resource-1.0/* -optimize=all_subs BronzeageEng.gf
|
||||
i -src -path=.:prelude:resource-1.0/* -optimize=all_subs BronzeageGer.gf
|
||||
strip
|
||||
pm -printer=gfcc | wf bronze.gfcc
|
||||
```
|
||||
There is also an experimental batch compiler, which does not use the GFC
|
||||
format or the record aliases. It can be produced by
|
||||
```
|
||||
make gfc
|
||||
```
|
||||
in ``GF/src``, and invoked by
|
||||
```
|
||||
gfc --make FILES
|
||||
```
|
||||
|
||||
|
||||
|
||||
|
||||
==The reference interpreter==
|
||||
|
||||
The reference interpreter written in Haskell consists of the following files:
|
||||
```
|
||||
-- source file for BNFC
|
||||
GFCC.cf -- labelled BNF grammar of gfcc
|
||||
|
||||
-- files generated by BNFC
|
||||
AbsGFCC.hs -- abstract syntax datatypes
|
||||
ErrM.hs -- error monad used internally
|
||||
LexGFCC.hs -- lexer of gfcc files
|
||||
ParGFCC.hs -- parser of gfcc files and syntax trees
|
||||
PrintGFCC.hs -- printer of gfcc files and syntax trees
|
||||
|
||||
-- hand-written files
|
||||
DataGFCC.hs -- grammar datatype, post-parser grammar creation
|
||||
Linearize.hs -- linearization and evaluation
|
||||
Macros.hs -- utilities abstracting away from GFCC datatypes
|
||||
Generate.hs -- random and exhaustive generation, generate-and-test parsing
|
||||
API.hs -- functionalities accessible in embedded GF applications
|
||||
Shell.hs -- main function - a simple command interpreter
|
||||
```
|
||||
It is included in the
|
||||
[developers' version http://www.cs.chalmers.se/Cs/Research/Language-technology/darcs/GF/doc/darcs.html]
|
||||
of GF, in the subdirectories [``GF/src/GF/GFCC`` ../] and
|
||||
[``GF/src/GF/Devel`` ../../Devel].
|
||||
|
||||
As of September 2007, default parsing in main GF uses GFCC (implemented by Krasimir
|
||||
Angelov). The interpreter uses the relevant modules
|
||||
```
|
||||
GF/Conversions/SimpleToFCFG.hs -- generate parser from GFCC
|
||||
GF/Parsing/FCFG.hs -- run the parser
|
||||
```
|
||||
|
||||
|
||||
To compile the interpreter, type
|
||||
```
|
||||
make gfcc
|
||||
```
|
||||
in ``GF/src``. To run it, type
|
||||
```
|
||||
./gfcc <GFCC-file>
|
||||
```
|
||||
The available commands are
|
||||
- ``gr <Cat> <Int>``: generate a number of random trees in category
|
||||
and show their linearizations in all languages
|
||||
- ``grt <Cat> <Int>``: generate a number of random trees in category
|
||||
and show the trees and their linearizations in all languages
|
||||
- ``gt <Cat> <Int>``: generate a number of trees in category from smallest,
|
||||
and show their linearizations in all languages
|
||||
- ``gtt <Cat> <Int>``: generate a number of trees in category from smallest,
|
||||
and show the trees and their linearizations in all languages
|
||||
- ``p <Lang> <Cat> <String>``: parse a string into a set of trees
|
||||
- ``lin <Tree>``: linearize tree in all languages, also showing full records
|
||||
- ``q``: terminate the system cleanly
|
||||
|
||||
|
||||
|
||||
==Embedded formats==
|
||||
|
||||
- JavaScript: compiler of linearization and abstract syntax
|
||||
|
||||
- Haskell: compiler of abstract syntax and interpreter with parsing,
|
||||
linearization, and generation
|
||||
|
||||
- C: compiler of linearization (old GFCC)
|
||||
|
||||
- C++: embedded interpreter supporting linearization (old GFCC)
|
||||
|
||||
|
||||
|
||||
==Some things to do==
|
||||
|
||||
Support for dependent types, higher-order abstract syntax, and
|
||||
semantic definition in GFCC generation and interpreters.
|
||||
|
||||
Replacing the entire GF shell by one based on GFCC.
|
||||
|
||||
Interpreter in Java.
|
||||
|
||||
Hand-written parsers for GFCC grammars to reduce the code size
(and improve the efficiency?) of interpreters.
|
||||
|
||||
Binary format and/or file compression of GFCC output.
|
||||
|
||||
Syntax editor based on GFCC.
|
||||
|
||||
Rewriting of resource libraries in order to exploit the
|
||||
word-suffix sharing better (depth-one tables, as in FM).
|
||||
|
||||
@@ -1,50 +0,0 @@
|
||||
Grm. Grammar ::= Header ";" Abstract ";" [Concrete] ;
|
||||
Hdr. Header ::= "grammar" CId "(" [CId] ")" ;
|
||||
Abs. Abstract ::= "abstract" "{" [AbsDef] "}" ;
|
||||
Cnc. Concrete ::= "concrete" CId "{" [CncDef] "}" ;
|
||||
|
||||
Fun. AbsDef ::= CId ":" Type "=" Exp ;
|
||||
--AFl. AbsDef ::= "%" CId "=" String ; -- flag
|
||||
Lin. CncDef ::= CId "=" Term ;
|
||||
--CFl. CncDef ::= "%" CId "=" String ; -- flag
|
||||
|
||||
Typ. Type ::= [CId] "->" CId ;
|
||||
Tr. Exp ::= "(" Atom [Exp] ")" ;
|
||||
AC. Atom ::= CId ;
|
||||
AS. Atom ::= String ;
|
||||
AI. Atom ::= Integer ;
|
||||
AF. Atom ::= Double ;
|
||||
AM. Atom ::= "?" ;
|
||||
trA. Exp ::= Atom ;
|
||||
define trA a = Tr a [] ;
|
||||
|
||||
R. Term ::= "[" [Term] "]" ; -- record/table
|
||||
P. Term ::= "(" Term "!" Term ")" ; -- projection/selection
|
||||
S. Term ::= "(" [Term] ")" ; -- sequence with ++
|
||||
K. Term ::= Tokn ; -- token
|
||||
V. Term ::= "$" Integer ; -- argument
|
||||
C. Term ::= Integer ; -- parameter value/label
|
||||
F. Term ::= CId ; -- global constant
|
||||
FV. Term ::= "[|" [Term] "|]" ; -- free variation
|
||||
W. Term ::= "(" String "+" Term ")" ; -- prefix + suffix table
|
||||
RP. Term ::= "(" Term "@" Term ")"; -- record parameter alias
|
||||
TM. Term ::= "?" ; -- lin of metavariable
|
||||
|
||||
L. Term ::= "(" CId "->" Term ")" ; -- lambda abstracted table
|
||||
BV. Term ::= "#" CId ; -- lambda-bound variable
|
||||
|
||||
KS. Tokn ::= String ;
|
||||
KP. Tokn ::= "[" "pre" [String] "[" [Variant] "]" "]" ;
|
||||
Var. Variant ::= [String] "/" [String] ;
|
||||
|
||||
|
||||
terminator Concrete ";" ;
|
||||
terminator AbsDef ";" ;
|
||||
terminator CncDef ";" ;
|
||||
separator CId "," ;
|
||||
separator Term "," ;
|
||||
terminator Exp "" ;
|
||||
terminator String "" ;
|
||||
separator Variant "," ;
|
||||
|
||||
token CId (('_' | letter) (letter | digit | '\'' | '_')*) ;
|
||||
@@ -1,656 +0,0 @@
|
||||
The GFCC Grammar Format
|
||||
Aarne Ranta
|
||||
October 19, 2006
|
||||
|
||||
Author's address:
|
||||
[``http://www.cs.chalmers.se/~aarne`` http://www.cs.chalmers.se/~aarne]
|
||||
|
||||
% to compile: txt2tags -thtml --toc gfcc.txt
|
||||
|
||||
History:
|
||||
- 19 Oct: translation of lincats, new figures on C++
|
||||
- 3 Oct 2006: first version
|
||||
|
||||
|
||||
==What is GFCC==
|
||||
|
||||
GFCC is a low-level format for GF grammars. Its aim is to contain the minimum
|
||||
that is needed to process GF grammars at runtime. This minimality has three
|
||||
advantages:
|
||||
- compact grammar files and run-time objects
|
||||
- time and space efficient processing
|
||||
- simple definition of interpreters
|
||||
|
||||
|
||||
The idea is that all embedded GF applications are compiled to GFCC.
|
||||
The GF system would be primarily used as a compiler and as a grammar
|
||||
development tool.
|
||||
|
||||
Since GFCC is implemented in BNFC, a parser of the format is readily
|
||||
available for C, C++, Haskell, Java, and OCaml. Also an XML
|
||||
representation is generated in BNFC. A
|
||||
[reference implementation ../]
|
||||
of linearization and some other functions has been written in Haskell.
|
||||
|
||||
|
||||
==GFCC vs. GFC==
|
||||
|
||||
GFCC is intended to replace GFC as the run-time grammar format. GFC was designed
|
||||
to be a run-time format, but also to
|
||||
support separate compilation of grammars, i.e.
|
||||
to store the results of compiling
|
||||
individual GF modules. But this means that GFC has to contain extra information,
|
||||
such as type annotations, which is only needed in compilation and not at
|
||||
run-time. In particular, the pattern matching syntax and semantics of GFC is
|
||||
complex and therefore difficult to implement on new platforms.
|
||||
|
||||
The main differences of GFCC compared with GFC can be summarized as follows:
|
||||
- there are no modules, and therefore no qualified names
|
||||
- a GFCC grammar is multilingual, and consists of a common abstract syntax
|
||||
together with one concrete syntax per language
|
||||
- records and tables are replaced by arrays
|
||||
- record labels and parameter values are replaced by integers
|
||||
- record projection and table selection are replaced by array indexing
|
||||
- there is (so far) no support for dependent types or higher-order abstract
|
||||
syntax (which would be easy to add, but would make interpreters much more difficult
|
||||
to write)
|
||||
|
||||
|
||||
Here is an example of a GF grammar, consisting of three modules,
|
||||
as translated to GFCC. The representations are aligned, except where the
alphabetical sorting of GFCC grammars changes the order.
|
||||
```
|
||||
grammar Ex(Eng,Swe);
|
||||
|
||||
abstract Ex = { abstract {
|
||||
cat
|
||||
S ; NP ; VP ;
|
||||
fun
|
||||
Pred : NP -> VP -> S ; Pred : NP,VP -> S = (Pred);
|
||||
She, They : NP ; She : -> NP = (She);
|
||||
Sleep : VP ; Sleep : -> VP = (Sleep);
|
||||
They : -> NP = (They);
|
||||
} } ;
|
||||
|
||||
concrete Eng of Ex = { concrete Eng {
|
||||
lincat
|
||||
S = {s : Str} ;
|
||||
NP = {s : Str ; n : Num} ;
|
||||
VP = {s : Num => Str} ;
|
||||
param
|
||||
Num = Sg | Pl ;
|
||||
lin
|
||||
Pred np vp = { Pred = [(($0!1),(($1!0)!($0!0)))];
|
||||
s = np.s ++ vp.s ! np.n} ;
|
||||
She = {s = "she" ; n = Sg} ; She = [0, "she"];
|
||||
They = {s = "they" ; n = Pl} ;
|
||||
Sleep = {s = table { Sleep = [("sleep" + ["s",""])];
|
||||
Sg => "sleeps" ;
|
||||
Pl => "sleep" They = [1, "they"];
|
||||
} } ;
|
||||
} ;
|
||||
}
|
||||
|
||||
concrete Swe of Ex = { concrete Swe {
|
||||
lincat
|
||||
S = {s : Str} ;
|
||||
NP = {s : Str} ;
|
||||
VP = {s : Str} ;
|
||||
param
|
||||
Num = Sg | Pl ;
|
||||
lin
|
||||
Pred np vp = { Pred = [(($0!0),($1!0))];
|
||||
s = np.s ++ vp.s} ;
|
||||
She = {s = "hon"} ; She = ["hon"];
|
||||
They = {s = "de"} ; They = ["de"];
|
||||
Sleep = {s = "sover"} ; Sleep = ["sover"];
|
||||
} } ;
|
||||
```
|
||||
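To see how the English terms of this example produce a string, the following
sketch evaluates ``Pred`` on the linearizations of its subtrees. It is a
simplified illustration, not the reference interpreter: tokens are plain
strings, only the constructors used by the example are modelled, and the
function names ``eval`` and ``proj`` are assumptions of the sketch.

```haskell
-- Cut-down GFCC terms, with tokens simplified to plain strings.
data Term
  = R [Term]        -- array
  | P Term Term     -- indexing
  | S [Term]        -- sequence
  | K String        -- token
  | V Int           -- argument
  | C Int           -- array index
  | W String Term   -- prefix + suffix table
  deriving (Eq, Show)

-- Evaluate a term given the linearizations of the subtrees.
eval :: [Term] -> Term -> Term
eval args trm = case trm of
  P r p -> proj (eval args r) (eval args p)
  R ts  -> R (map (eval args) ts)
  S ts  -> S (map (eval args) ts)
  W s t -> W s (eval args t)
  V i   -> args !! i
  _     -> trm

-- Index into an array, looking through prefix-suffix tables.
proj :: Term -> Term -> Term
proj r p = case (r, p) of
  (W s t, _)  -> case proj t p of
                   K u -> K (s ++ u)
                   u   -> W s u
  (R ts, C i) -> ts !! i
  _           -> error ("cannot project " ++ show p ++ " from " ++ show r)

-- Turn an evaluated term into a string, picking the first field of a record.
realize :: Term -> String
realize trm = case trm of
  R (t:_) -> realize t
  S ts    -> unwords (map realize ts)
  K s     -> s
  W s t   -> s ++ realize t
  _       -> "?"

-- The English linearization terms of the example grammar.
she, they, sleep, predT :: Term
she   = R [C 0, K "she"]
they  = R [C 1, K "they"]
sleep = R [W "sleep" (R [K "s", K ""])]
predT = R [S [P (V 0) (C 1), P (P (V 1) (C 0)) (P (V 0) (C 0))]]
```

With these definitions, ``realize (eval [she, sleep] predT)`` selects the
singular form of the verb and ``realize (eval [they, sleep] predT)`` the
plural form.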
|
||||
==The syntax of GFCC files==
|
||||
|
||||
===Top level===
|
||||
|
||||
A grammar has a header telling the name of the abstract syntax
|
||||
(often specifying an application domain), and the names of
|
||||
the concrete languages. The abstract syntax and the concrete
|
||||
syntaxes themselves follow.
|
||||
```
|
||||
Grammar ::= Header ";" Abstract ";" [Concrete] ;
|
||||
Header ::= "grammar" CId "(" [CId] ")" ;
|
||||
Abstract ::= "abstract" "{" [AbsDef] "}" ;
|
||||
Concrete ::= "concrete" CId "{" [CncDef] "}" ;
|
||||
```
|
||||
Abstract syntax judgements give typings and semantic definitions.
|
||||
Concrete syntax judgements give linearizations.
|
||||
```
|
||||
AbsDef ::= CId ":" Type "=" Exp ;
|
||||
CncDef ::= CId "=" Term ;
|
||||
```
|
||||
Flags are also possible, local to each "module" (i.e. abstract and concretes).
|
||||
```
|
||||
AbsDef ::= "%" CId "=" String ;
|
||||
CncDef ::= "%" CId "=" String ;
|
||||
```
|
||||
For the run-time system, the reference implementation in Haskell
|
||||
uses a structure that gives efficient look-up:
|
||||
```
|
||||
data GFCC = GFCC {
|
||||
absname :: CId ,
|
||||
cncnames :: [CId] ,
|
||||
abstract :: Abstr ,
|
||||
concretes :: Map CId Concr
|
||||
}
|
||||
|
||||
data Abstr = Abstr {
|
||||
funs :: Map CId Type, -- find the type of a fun
|
||||
cats :: Map CId [CId] -- find the funs giving a cat
|
||||
}
|
||||
|
||||
type Concr = Map CId Term
|
||||
```
|
||||
|
||||
|
||||
===Abstract syntax===
|
||||
|
||||
Types are first-order function types built from
|
||||
category symbols. Syntax trees (``Exp``) are
|
||||
rose trees with the head (``Atom``) either a function
|
||||
constant, a metavariable, or a string, integer, or float
|
||||
literal.
|
||||
```
|
||||
Type ::= [CId] "->" CId ;
|
||||
Exp ::= "(" Atom [Exp] ")" ;
|
||||
Atom ::= CId ; -- function constant
|
||||
Atom ::= "?" ; -- metavariable
|
||||
Atom ::= String ; -- string literal
|
||||
Atom ::= Integer ; -- integer literal
|
||||
Atom ::= Double ; -- float literal
|
||||
```
|
||||
|
||||
|
||||
===Concrete syntax===
|
||||
|
||||
Linearization terms (``Term``) are built as follows.
|
||||
Constructor names are shown to make the later code
|
||||
examples readable.
|
||||
```
|
||||
R. Term ::= "[" [Term] "]" ; -- array
|
||||
P. Term ::= "(" Term "!" Term ")" ; -- access to indexed field
|
||||
S. Term ::= "(" [Term] ")" ; -- sequence with ++
|
||||
K. Term ::= Tokn ; -- token
|
||||
V. Term ::= "$" Integer ; -- argument
|
||||
C. Term ::= Integer ; -- array index
|
||||
FV. Term ::= "[|" [Term] "|]" ; -- free variation
|
||||
TM. Term ::= "?" ; -- linearization of metavariable
|
||||
```
|
||||
Tokens are strings or (maybe obsolescent) prefix-dependent
|
||||
variant lists.
|
||||
```
|
||||
KS. Tokn ::= String ;
|
||||
KP. Tokn ::= "[" "pre" [String] "[" [Variant] "]" "]" ;
|
||||
Var. Variant ::= [String] "/" [String] ;
|
||||
```
|
||||
Three special forms of terms are introduced by the compiler
|
||||
as optimizations. They can in principle be eliminated, but
|
||||
their presence makes grammars much more compact. Their semantics
|
||||
will be explained in a later section.
|
||||
```
|
||||
F. Term ::= CId ; -- global constant
|
||||
W. Term ::= "(" String "+" Term ")" ; -- prefix + suffix table
|
||||
RP. Term ::= "(" Term "@" Term ")"; -- record parameter alias
|
||||
```
|
||||
Identifiers are like ``Ident`` in GF and GFC, except that
|
||||
the compiler produces constants prefixed with ``_`` in
|
||||
the common subterm elimination optimization.
|
||||
```
|
||||
token CId (('_' | letter) (letter | digit | '\'' | '_')*) ;
|
||||
```
|
||||
|
||||
|
||||
==The semantics of concrete syntax terms==
|
||||
|
||||
===Linearization and realization===
|
||||
|
||||
The linearization algorithm is essentially the same as in
|
||||
GFC: a tree is linearized by evaluating its linearization term
|
||||
in the environment of the linearizations of the subtrees.
|
||||
Literal atoms are linearized in the obvious way.
|
||||
The function also needs to know the language (i.e. concrete syntax)
|
||||
in which linearization is performed.
|
||||
```
|
||||
linExp :: GFCC -> CId -> Exp -> Term
|
||||
linExp mcfg lang tree@(Tr at trees) = case at of
|
||||
AC fun -> comp (Prelude.map lin trees) $ look fun
|
||||
AS s -> R [kks (show s)] -- quoted
|
||||
AI i -> R [kks (show i)]
|
||||
AF d -> R [kks (show d)]
|
||||
AM -> TM
|
||||
where
|
||||
lin = linExp mcfg lang
|
||||
comp = compute mcfg lang
|
||||
look = lookLin mcfg lang
|
||||
```
|
||||
The result of linearization is usually a record, which is realized as
|
||||
a string using the following algorithm.
|
||||
```
|
||||
realize :: Term -> String
|
||||
realize trm = case trm of
|
||||
R (t:_) -> realize t
|
||||
S ss -> unwords $ Prelude.map realize ss
|
||||
K (KS s) -> s
|
||||
K (KP s _) -> unwords s ---- prefix choice TODO
|
||||
W s t -> s ++ realize t
|
||||
FV (t:_) -> realize t
|
||||
TM -> "?"
|
||||
```
|
||||
Since the order of record fields is not necessarily
|
||||
the same as in GF source,
|
||||
this realization does not work reliably for
categories whose lincats have more than one field.
|
||||
|
||||
|
||||
===Term evaluation===
|
||||
|
||||
Evaluation follows call-by-value order, with two environments
|
||||
needed:
|
||||
- the grammar (a concrete syntax) to give the global constants
|
||||
- an array of terms to give the subtree linearizations
|
||||
|
||||
|
||||
The code is presented in one-level pattern matching, to
|
||||
enable reimplementations in languages that do not permit
|
||||
deep patterns (such as Java and C++).
|
||||
```
|
||||
compute :: GFCC -> CId -> [Term] -> Term -> Term
|
||||
compute mcfg lang args = comp where
|
||||
comp trm = case trm of
|
||||
P r p -> proj (comp r) (comp p)
|
||||
RP i t -> RP (comp i) (comp t)
|
||||
W s t -> W s (comp t)
|
||||
R ts -> R $ Prelude.map comp ts
|
||||
V i -> idx args (fromInteger i) -- already computed
|
||||
F c -> comp $ look c -- not computed (if contains V)
|
||||
FV ts -> FV $ Prelude.map comp ts
|
||||
S ts -> S $ Prelude.filter (/= S []) $ Prelude.map comp ts
|
||||
_ -> trm
|
||||
|
||||
look = lookLin mcfg lang
|
||||
|
||||
idx xs i = xs !! i
|
||||
|
||||
proj r p = case (r,p) of
|
||||
(_, FV ts) -> FV $ Prelude.map (proj r) ts
|
||||
(W s t, _) -> kks (s ++ getString (proj t p))
|
||||
_ -> comp $ getField r (getIndex p)
|
||||
|
||||
getString t = case t of
|
||||
K (KS s) -> s
|
||||
_ -> trace ("ERROR in grammar compiler: string from "++ show t) "ERR"
|
||||
|
||||
getIndex t = case t of
|
||||
C i -> fromInteger i
|
||||
RP p _ -> getIndex p
|
||||
TM -> 0 -- default value for parameter
|
||||
_ -> trace ("ERROR in grammar compiler: index from " ++ show t) 0
|
||||
|
||||
getField t i = case t of
|
||||
R rs -> idx rs i
|
||||
RP _ r -> getField r i
|
||||
TM -> TM
|
||||
_ -> trace ("ERROR in grammar compiler: field from " ++ show t) t
|
||||
```
|
||||
|
||||
===The special term constructors===
|
||||
|
||||
The three forms introduced by the compiler may need a special
|
||||
explanation.
|
||||
|
||||
Global constants
|
||||
```
|
||||
Term ::= CId ;
|
||||
```
|
||||
are shorthands for complex terms. They are produced by the
|
||||
compiler by (iterated) common subexpression elimination.
|
||||
They are often more powerful than hand-devised code sharing in the source
|
||||
code. They could be computed off-line by replacing each identifier by
|
||||
its definition.
|
||||
|
||||
Prefix-suffix tables
|
||||
```
|
||||
Term ::= "(" String "+" Term ")" ;
|
||||
```
|
||||
represent tables of word forms divided into the longest common prefix
|
||||
and its array of suffixes. In the example grammar above, we have
|
||||
```
|
||||
Sleep = [("sleep" + ["s",""])]
|
||||
```
|
||||
which in fact is equal to the array of full forms
|
||||
```
|
||||
["sleeps", "sleep"]
|
||||
```
|
||||
The power of this construction comes from the fact that suffix sets
|
||||
tend to be repeated in a language, and can therefore be collected
|
||||
by common subexpression elimination. It is this technique that
|
||||
explains the syntax used, rather than the more accurate
|
||||
```
|
||||
"(" String "+" [String] ")"
|
||||
```
|
||||
since we want the suffix part to be a ``Term`` for the optimization to
|
||||
take effect.
|
||||
|
||||
The most curious construct of GFCC is the parameter array alias,
|
||||
```
|
||||
Term ::= "(" Term "@" Term ")";
|
||||
```
|
||||
This form is used as the value of parameter records, such as the type
|
||||
```
|
||||
{n : Number ; p : Person}
|
||||
```
|
||||
The problem with parameter records is their double role.
|
||||
They can be used like parameter values, as indices in selection,
|
||||
```
|
||||
VP.s ! {n = Sg ; p = P3}
|
||||
```
|
||||
but also as records, from which parameters can be projected:
|
||||
```
|
||||
{n = Sg ; p = P3}.n
|
||||
```
|
||||
Whichever use is selected as primary, a prohibitively complex
|
||||
case expression must be generated at compilation to GFCC to get the
|
||||
other use. The adopted
|
||||
solution is to generate a pair containing both a parameter value index
|
||||
and an array of indices of record fields. For instance, if we have
|
||||
```
|
||||
param Number = Sg | Pl ; Person = P1 | P2 | P3 ;
|
||||
```
|
||||
we get the encoding
|
||||
```
|
||||
{n = Sg ; p = P3} ---> (2 @ [0,2])
|
||||
```
|
||||
The GFCC computation rules are essentially
|
||||
```
|
||||
(t ! (i @ _)) = (t ! i)
|
||||
((_ @ r) ! j) = (r ! j)
|
||||
```
|
||||
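The two rules can be sketched as a selection function that looks through the
alias on either side. This is a simplified illustration with a cut-down
``Term`` type; it returns ``Maybe`` instead of tracing an error, and the name
``select`` is hypothetical.

```haskell
-- Cut-down GFCC terms for illustrating the alias rules.
data Term
  = R [Term]       -- array
  | C Integer      -- parameter value or label, as an integer
  | RP Term Term   -- alias: (parameter index @ array of field indices)
  | KS String      -- string token
  deriving (Eq, Show)

-- (t ! p): use the index part of p and the array part of t.
select :: Term -> Term -> Maybe Term
select t p = do
  i <- getIndex p
  getField t i

-- An alias used as an index contributes its first component.
getIndex :: Term -> Maybe Int
getIndex trm = case trm of
  C i    -> Just (fromInteger i)
  RP i _ -> getIndex i
  _      -> Nothing

-- An alias used as a record contributes its second component.
getField :: Term -> Int -> Maybe Term
getField trm i = case trm of
  R ts | i >= 0 && i < length ts -> Just (ts !! i)
  RP _ r -> getField r i
  _      -> Nothing
```

With the encoding ``(2 @ [0,2])`` of ``{n = Sg ; p = P3}``, selecting from a
three-element table picks the element at index 2, while projecting field 0 or
field 1 from the alias gives the index of ``Sg`` or ``P3`` respectively.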
|
||||
|
||||
==Compiling to GFCC==
|
||||
|
||||
Compilation to GFCC is performed by the GF grammar compiler, and
|
||||
GFCC interpreters need not know what it does. For grammar writers,
|
||||
however, it might be interesting to know what happens to the grammars
|
||||
in the process.
|
||||
|
||||
The compilation phases are the following:
|
||||
+ translate GF source to GFC, as always in GF
|
||||
+ undo GFC back-end optimizations
|
||||
+ perform the ``values`` optimization to normalize tables
|
||||
+ create a symbol table mapping the GFC parameter and record types to
|
||||
fixed-size arrays, and parameter values and record labels to integers
|
||||
+ traverse the linearization rules replacing parameters and labels by integers
|
||||
+ reorganize the created GFC grammar so that it has just one abstract syntax
|
||||
and one concrete syntax per language
|
||||
+ apply UTF8 encoding to the grammar, if not yet applied (this is told by the
|
||||
``coding`` flag)
|
||||
+ translate the GFC syntax tree to a GFCC syntax tree, using a simple
|
||||
compositional mapping
|
||||
+ perform the word-suffix optimization on GFCC linearization terms
|
||||
+ perform subexpression elimination on each concrete syntax module
|
||||
+ print out the GFCC code
|
||||
|
||||
|
||||
Notice that a major part of the compilation is done within GFC, so that
|
||||
GFC-related tasks (such as parser generation) could be performed by
|
||||
using the old algorithms.
|
||||
|
||||
|
||||
===Problems in GFCC compilation===
|
||||
|
||||
Two major problems had to be solved in compiling GFC to GFCC:
|
||||
- consistent order of tables and records, to permit the array translation
|
||||
- run-time variables in complex parameter values.
|
||||
|
||||
|
||||
The current implementation is still experimental and may fail
|
||||
to generate correct code. Any errors remaining are likely to be
|
||||
related to the two problems just mentioned.
|
||||
|
||||
The order problem is solved in different ways for tables and records.
|
||||
For tables, the ``values`` optimization of GFC already manages to
|
||||
maintain a canonical order. But this order can be destroyed by the
|
||||
``share`` optimization. To make sure that GFCC compilation works properly,
|
||||
it is safest to recompile the GF grammar by using the ``values``
|
||||
optimization flag.
|
||||
|
||||
Records can be canonically ordered by sorting them by labels.
|
||||
In fact, this was done in connection with the GFCC work as a part
|
||||
of the GFC generation, to guarantee consistency. This means that
|
||||
e.g. the ``s`` field will in general no longer appear as the first
|
||||
field, even if it does so in the GF source code. But relying on the
|
||||
order of fields in a labelled record would be misplaced anyway.
|
||||
|
||||
The canonical form of records is further complicated by lock fields,
|
||||
i.e. dummy fields of form ``lock_C = <>``, which are added to grammar
|
||||
libraries to force intensionality of linearization types. The problem
|
||||
is that the absence of a lock field only generates a warning, not
|
||||
an error. Therefore a GFC grammar can contain objects of the same
|
||||
type with and without a lock field. This problem was solved in GFCC
|
||||
generation by just removing all lock fields (defined as fields whose
|
||||
type is the empty record type). This has the further advantage of
|
||||
(slightly) reducing the grammar size. More importantly, it is safe
|
||||
to remove lock fields, because they are never used in computation,
|
||||
and because intensional types are only needed in grammars reused
|
||||
as libraries, not in grammars used at runtime.
|
||||
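The two record normalizations described above, sorting by label and dropping lock fields, can be sketched together. The ``Ty`` type below is a simplified stand-in for linearization types, not the GFC data type, and ``isLock`` and ``canonical`` are hypothetical names.

```haskell
import Data.List (sortOn)

type Label = String

-- Simplified linearization types (illustrative only).
data Ty
  = TStr                  -- Str
  | TParam String         -- a parameter type, by name
  | TRec [(Label, Ty)]    -- a record type
  deriving (Eq, Show)

-- A lock field is any field whose type is the empty record type.
isLock :: (Label, Ty) -> Bool
isLock (_, TRec []) = True
isLock _            = False

-- Drop lock fields and sort the remaining fields by label, recursively.
canonical :: Ty -> Ty
canonical (TRec fs) =
  TRec (sortOn fst [ (l, canonical t) | (l, t) <- fs, not (isLock (l, t)) ])
canonical t = t
```

With this sketch, a record type written as ``{s : Str ; lock_NP : {} ; a : Agr}`` is normalized to the fields ``a`` and ``s``, in that order, which also shows why ``s`` no longer comes first.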
|
||||
While the order problem is rather bureaucratic in nature, run-time
|
||||
variables are an interesting problem. They arise in the presence
|
||||
of complex parameter values, created by argument-taking constructors
|
||||
and parameter records. To give an example, consider the GF parameter
|
||||
type system
|
||||
```
|
||||
Number = Sg | Pl ;
|
||||
Person = P1 | P2 | P3 ;
|
||||
Agr = Ag Number Person ;
|
||||
```
|
||||
The values can be translated to integers in the expected way,
|
||||
```
|
||||
Sg = 0, Pl = 1
|
||||
P1 = 0, P2 = 1, P3 = 2
|
||||
Ag Sg P1 = 0, Ag Sg P2 = 1, Ag Sg P3 = 2,
|
||||
Ag Pl P1 = 3, Ag Pl P2 = 4, Ag Pl P3 = 5
|
||||
```
|
||||
However, an argument of ``Agr`` can be a run-time variable, as in
|
||||
```
|
||||
Ag np.n P3
|
||||
```
|
||||
This expression must first be translated to a case expression,
|
||||
```
|
||||
case np.n of {
|
||||
0 => 2 ;
|
||||
1 => 5
|
||||
}
|
||||
```
|
||||
which can then be translated to the GFCC term
|
||||
```
|
||||
([2,5] ! ($0 ! $1))
|
||||
```
|
||||
assuming that the variable ``np`` is the first argument and that its
|
||||
``Number`` field is the second in the record.
|
||||
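The numbering of constructor values and the expansion of a run-time variable into a table can be sketched as below. This is an illustration under simplifying assumptions: variables are kept as plain strings rather than GFCC argument references, and ``Arg``, ``Code``, ``index`` and ``translate`` are hypothetical names.

```haskell
-- An argument of a constructor such as Ag: either a known value index
-- or a run-time variable, kept here as an uninterpreted name.
data Arg = Known Int | Var String
  deriving (Eq, Show)

-- Mixed-radix index of a fully known value:
-- sizes [2,3] and digits [1,2] give 1*3 + 2 = 5.
index :: [Int] -> [Int] -> Int
index sizes digits = foldl (\acc (n, d) -> acc * n + d) 0 (zip sizes digits)

-- Result of the translation: a constant, or a table selected by a variable.
data Code = Const Int | Table String [Code]
  deriving (Eq, Show)

-- Expand run-time variables from left to right into nested tables.
-- The first argument lists the sizes of the argument types.
translate :: [Int] -> [Arg] -> Code
translate sizes = go []
  where
    go done [] = Const (index sizes (reverse done))
    go done (Known d : rest) = go (d : done) rest
    go done (Var v : rest) =
      Table v [ go (d : done) rest | d <- [0 .. n - 1] ]
      where n = sizes !! length done
```

For ``Ag np.n P3``, that is ``translate [2,3] [Var "np.n", Known 2]``, the sketch yields a table over ``np.n`` with the entries 2 and 5, matching the array ``[2,5]`` in the GFCC term above.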
|
||||
This transformation of course has to be performed recursively, since
|
||||
there can be several run-time variables in a parameter value:
|
||||
```
|
||||
Ag np.n np.p
|
||||
```
|
||||
A similar transformation would be possible to deal with the double
|
||||
role of parameter records discussed above. Thus the type
|
||||
```
|
||||
RNP = {n : Number ; p : Person}
|
||||
```
|
||||
could be uniformly translated into the set ``{0,1,2,3,4,5}``
|
||||
as ``Agr`` above. Selections would be simple instances of indexing.
|
||||
But any projection from the record should be translated into
|
||||
a case expression,
|
||||
```
|
||||
rnp.n ===>
|
||||
case rnp of {
|
||||
0 => 0 ;
|
||||
1 => 0 ;
|
||||
2 => 0 ;
|
||||
3 => 1 ;
|
||||
4 => 1 ;
|
||||
5 => 1
|
||||
}
|
||||
```
|
||||
To avoid the code bloat resulting from this, we chose the alias representation,
|
||||
which is easy enough to deal with in interpreters.
|
||||
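The size of the case tables that the uniform integer translation would need can be made concrete with a small sketch. It assumes the same mixed-radix numbering as for ``Agr``; the names ``digits`` and ``projTable`` are hypothetical.

```haskell
-- Decode a mixed-radix index into its digits:
-- sizes [2,3] and value 4 give [1,1].
digits :: [Int] -> Int -> [Int]
digits sizes v = reverse (go (reverse sizes) v)
  where
    go []     _ = []
    go (n:ns) x = x `mod` n : go ns (x `div` n)

-- The case table for projecting field i out of a packed record value:
-- one entry for every value of the whole record.
projTable :: [Int] -> Int -> [Int]
projTable sizes i = [ digits sizes v !! i | v <- [0 .. product sizes - 1] ]
```

For ``RNP``, ``projTable [2,3] 0`` gives the six right-hand sides of the case expression for ``rnp.n``. Every projection needs a table with as many entries as the record has values, which is the code bloat that the alias representation avoids.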
|
||||
|
||||
===The representation of linearization types===
|
||||
|
||||
Linearization types (``lincat``) are not needed when generating with
|
||||
GFCC, but they have been added to enable parser generation directly from
|
||||
GFCC. The linearization type definitions are shown as a part of the
|
||||
concrete syntax, by using terms to represent types. Here is the table
|
||||
showing how different linearization types are encoded.
|
||||
```
|
||||
P* = size(P) -- parameter type
|
||||
{_ : I ; __ : R}* = (I* @ R*) -- record of parameters
|
||||
{r1 : T1 ; ... ; rn : Tn}* = [T1*,...,Tn*] -- other record
|
||||
(P => T)* = [T* ,...,T*] -- size(P) times
|
||||
Str* = ()
|
||||
```
|
||||
The category symbols are prefixed with two underscores (``__``).
|
||||
For example, the linearization type ``present/CatEng.NP`` is
|
||||
translated as follows:
|
||||
```
|
||||
NP = {
|
||||
a : { -- 6 = 2*3 values
|
||||
n : {ParamX.Number} ; -- 2 values
|
||||
p : {ParamX.Person} -- 3 values
|
||||
} ;
|
||||
s : {ResEng.Case} => Str -- 3 values
|
||||
}
|
||||
|
||||
__NP = [(6@[2,3]),[(),(),()]]
|
||||
```
|
||||
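The encoding table above can be read as a recursive function from types to terms. The following sketch prints the encoding as a string. It assumes that records arrive with their fields already sorted and lock fields removed, and it uses a hypothetical ``Ty`` type in which parameter types are represented only by their sizes.

```haskell
import Data.List (intercalate)

-- Simplified linearization types (illustrative only).
data Ty
  = Param Int      -- parameter type with the given number of values
  | Str            -- token list
  | Rec [Ty]       -- record, fields already in canonical order
  | Tbl Int Ty     -- table P => T, with size(P) given
  deriving (Eq, Show)

-- Number of values of a parameter type or of a record of parameters.
size :: Ty -> Maybe Int
size (Param n) = Just n
size (Rec ts)  = product <$> mapM size ts
size _         = Nothing

-- The encoding T* of the table above, printed as text.
encode :: Ty -> String
encode (Param n) = show n
encode Str       = "()"
encode (Tbl n t) = list (replicate n (encode t))
encode r@(Rec ts) = case size r of
  Just n  -> "(" ++ show n ++ "@" ++ list (map encode ts) ++ ")"
  Nothing -> list (map encode ts)

list :: [String] -> String
list xs = "[" ++ intercalate "," xs ++ "]"
```

Applied to ``Rec [Rec [Param 2, Param 3], Tbl 3 Str]``, which models the ``NP`` type above, ``encode`` produces the same text as the ``__NP`` line.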
|
||||
|
||||
|
||||
|
||||
===Running the compiler and the GFCC interpreter===
|
||||
|
||||
GFCC generation is a part of the
|
||||
[developers' version http://www.cs.chalmers.se/Cs/Research/Language-technology/darcs/GF/doc/darcs.html]
|
||||
of GF since September 2006. To invoke the compiler, pass the flag
``-printer=gfcc`` to the command
``pm = print_multi``. It is wise to recompile the grammar from
|
||||
source, since previously compiled libraries may not obey the canonical
|
||||
order of records. Running ``strip`` on the grammar before
|
||||
GFCC translation removes unnecessary interface references.
|
||||
Here is an example, performed in
|
||||
[example/bronzeage ../../../../../examples/bronzeage].
|
||||
```
|
||||
i -src -path=.:prelude:resource-1.0/* -optimize=all_subs BronzeageEng.gf
|
||||
i -src -path=.:prelude:resource-1.0/* -optimize=all_subs BronzeageGer.gf
|
||||
strip
|
||||
pm -printer=gfcc | wf bronze.gfcc
|
||||
```
|
||||
|
||||
|
||||
|
||||
==The reference interpreter==
|
||||
|
||||
The reference interpreter written in Haskell consists of the following files:
|
||||
```
|
||||
-- source file for BNFC
|
||||
GFCC.cf -- labelled BNF grammar of gfcc
|
||||
|
||||
-- files generated by BNFC
|
||||
AbsGFCC.hs -- abstract syntax of gfcc
|
||||
ErrM.hs -- error monad used internally
|
||||
LexGFCC.hs -- lexer of gfcc files
|
||||
ParGFCC.hs -- parser of gfcc files and syntax trees
|
||||
PrintGFCC.hs -- printer of gfcc files and syntax trees
|
||||
|
||||
-- hand-written files
|
||||
DataGFCC.hs -- post-parser grammar creation, linearization and evaluation
|
||||
GenGFCC.hs -- random and exhaustive generation, generate-and-test parsing
|
||||
RunGFCC.hs -- main function - a simple command interpreter
|
||||
```
|
||||
It is included in the
|
||||
[developers' version http://www.cs.chalmers.se/Cs/Research/Language-technology/darcs/GF/doc/darcs.html]
|
||||
of GF, in the subdirectory [``GF/src/GF/Canon/GFCC`` ../].
|
||||
|
||||
To compile the interpreter, type
|
||||
```
|
||||
make gfcc
|
||||
```
|
||||
in ``GF/src``. To run it, type
|
||||
```
|
||||
./gfcc <GFCC-file>
|
||||
```
|
||||
The available commands are
|
||||
- ``gr <Cat> <Int>``: generate a number of random trees in category,
|
||||
and show their linearizations in all languages
|
||||
- ``grt <Cat> <Int>``: generate a number of random trees in category,
|
||||
and show the trees and their linearizations in all languages
|
||||
- ``gt <Cat> <Int>``: generate a number of trees in category from smallest,
|
||||
and show their linearizations in all languages
|
||||
- ``gtt <Cat> <Int>``: generate a number of trees in category from smallest,
|
||||
and show the trees and their linearizations in all languages
|
||||
- ``p <Int> <Cat> <String>``: "parse", i.e. generate trees until match or
|
||||
until the given number have been generated
|
||||
- ``<Tree>``: linearize tree in all languages, also showing full records
|
||||
- ``quit``: terminate the system cleanly
|
||||
|
||||
|
||||
==Interpreter in C++==
|
||||
|
||||
A base-line interpreter in C++ has been started.
|
||||
Its main functionality is random generation of trees and their linearization.
|
||||
|
||||
Here are some results from running the different interpreters, compared
|
||||
to running the same grammar in GF, saved in ``.gfcm`` format.
|
||||
The grammar contains the English, German, and Norwegian
|
||||
versions of Bronzeage. The experiment was carried out on
|
||||
an Ubuntu Linux laptop with a 1.5 GHz Intel Centrino processor.
|
||||
|
||||
||              | GF     | gfcc(hs) | gfcc++ |
| program size  | 7249k  | 803k     | 113k
| grammar size  | 336k   | 119k     | 119k
| read grammar  | 1150ms | 510ms    | 100ms
| generate 222  | 9500ms | 450ms    | 800ms
| memory        | 21M    | 10M      | 20M
|
||||
|
||||
|
||||
|
||||
To summarize:
|
||||
- going from GF to gfcc is a major win in both code size and efficiency
|
||||
- going from Haskell to C++ interpreter is not a win yet, because of a space
|
||||
leak in the C++ version
|
||||
|
||||
|
||||
|
||||
==Some things to do==
|
||||
|
||||
Interpreter in Java.
|
||||
|
||||
Parsing via MCFG
|
||||
- the FCFG format can possibly be simplified
|
||||
- parser grammars should be saved in files to make interpreters easier
|
||||
|
||||
|
||||
Hand-written parsers for GFCC grammars to reduce the code size
(and possibly improve the efficiency) of interpreters.
|
||||
|
||||
Binary format and/or file compression of GFCC output.
|
||||
|
||||
Syntax editor based on GFCC.
|
||||
|
||||
Rewriting of resource libraries in order to exploit the
|
||||
word-suffix sharing better (depth-one tables, as in FM).
|
||||
|
||||
|
||||
|
||||
@@ -1,180 +0,0 @@
|
||||
GFCC Syntax
|
||||
|
||||
|
||||
==Syntax of GFCC files==
|
||||
|
||||
The parser syntax is very simple, as defined in BNF:
|
||||
```
|
||||
Grm. Grammar ::= [RExp] ;
|
||||
|
||||
App. RExp ::= "(" CId [RExp] ")" ;
|
||||
AId. RExp ::= CId ;
|
||||
AInt. RExp ::= Integer ;
|
||||
AStr. RExp ::= String ;
|
||||
AFlt. RExp ::= Double ;
|
||||
AMet. RExp ::= "?" ;
|
||||
|
||||
terminator RExp "" ;
|
||||
|
||||
token CId (('_' | letter) (letter | digit | '\'' | '_')*) ;
|
||||
```
|
||||
While a parser and a printer can be generated for many languages
|
||||
from this grammar by using the BNF Converter, a parser is also
|
||||
easy to write by hand using recursive descent.
|
||||
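As an illustration of the hand-written alternative, here is a minimal recursive-descent sketch for the grammar above. It is not the BNFC-generated parser: the lexer is simplified (no comments, no exponents in floats, only backslash-escaped characters in strings), and all names (``Tok``, ``tokens``, ``rexp``, ``rexps``, ``parseGrammar``) are hypothetical.

```haskell
import Data.Char (isAlpha, isAlphaNum, isDigit, isSpace)

data RExp = App String [RExp] | AId String | AInt Integer
          | AFlt Double | AStr String | AMet
  deriving (Eq, Show)

data Tok = TOpen | TClose | TQuest | TId String
         | TInt Integer | TFlt Double | TStr String
  deriving (Eq, Show)

-- Simplified lexer; errors are reported as Left.
tokens :: String -> Either String [Tok]
tokens [] = Right []
tokens s@(c:cs)
  | isSpace c = tokens cs
  | c == '('  = (TOpen :)  <$> tokens cs
  | c == ')'  = (TClose :) <$> tokens cs
  | c == '?'  = (TQuest :) <$> tokens cs
  | c == '"'  = string "" cs
  | c == '_' || isAlpha c =
      let (x, rest) = span (\d -> isAlphaNum d || d == '_' || d == '\'') s
      in (TId x :) <$> tokens rest
  | isDigit c = number s
  | otherwise = Left ("unexpected character " ++ show c)
  where
    string acc ('"' : rest)      = (TStr (reverse acc) :) <$> tokens rest
    string acc ('\\' : d : rest) = string (d : acc) rest
    string acc (d : rest)        = string (d : acc) rest
    string _   []                = Left "unterminated string"
    number t = case span isDigit t of
      (ds, '.' : rest@(d : _)) | isDigit d ->
        let (fs, rest') = span isDigit rest
        in (TFlt (read (ds ++ "." ++ fs)) :) <$> tokens rest'
      (ds, rest) -> (TInt (read ds) :) <$> tokens rest

-- One RExp, returning the remaining tokens.
rexp :: [Tok] -> Either String (RExp, [Tok])
rexp (TOpen : TId f : ts) = do
  (args, rest) <- rexps ts
  case rest of
    TClose : rest' -> Right (App f args, rest')
    _              -> Left "expected ')'"
rexp (TId x  : ts) = Right (AId x, ts)
rexp (TInt n : ts) = Right (AInt n, ts)
rexp (TFlt d : ts) = Right (AFlt d, ts)
rexp (TStr s : ts) = Right (AStr s, ts)
rexp (TQuest : ts) = Right (AMet, ts)
rexp ts            = Left ("unexpected input at " ++ show (take 1 ts))

-- Zero or more RExps, stopping at ')' or at the end of the input.
rexps :: [Tok] -> Either String ([RExp], [Tok])
rexps ts@(TClose : _) = Right ([], ts)
rexps []              = Right ([], [])
rexps ts = do
  (e, rest)   <- rexp ts
  (es, rest') <- rexps rest
  Right (e : es, rest')

-- A whole file: a list of RExps with nothing left over.
parseGrammar :: String -> Either String [RExp]
parseGrammar src = do
  ts <- tokens src
  (es, rest) <- rexps ts
  if null rest then Right es else Left "unbalanced ')'"
```

Each grammar rule corresponds to one clause of ``rexp``, so the parser needs no lookahead beyond the next two tokens.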
|
||||
|
||||
==Syntax of well-formed GFCC code==
|
||||
|
||||
Here is a summary of well-formed syntax,
|
||||
with a comment on the semantics of each construction.
|
||||
```
|
||||
Grammar ::=
|
||||
("grammar" CId CId*) -- abstract syntax name and concrete syntax names
|
||||
"(" "flags" Flag* ")" -- global and abstract flags
|
||||
"(" "abstract" Abstract ")" -- abstract syntax
|
||||
"(" "concrete" Concrete* ")" -- concrete syntaxes
|
||||
|
||||
Abstract ::=
|
||||
"(" "fun" FunDef* ")" -- function definitions
|
||||
"(" "cat" CatDef* ")" -- category definitions
|
||||
|
||||
Concrete ::=
|
||||
"(" CId -- language name
|
||||
"flags" Flag* -- concrete flags
|
||||
"lin" LinDef* -- linearization rules
|
||||
"oper" LinDef* -- operations (macros)
|
||||
"lincat" LinDef* -- linearization type definitions
|
||||
"lindef" LinDef* -- linearization default definitions
|
||||
"printname" LinDef* -- printname definitions
|
||||
"param" LinDef* -- lincats with labels and parameter value names
|
||||
")"
|
||||
|
||||
Flag ::= "(" CId String ")" -- flag and value
|
||||
FunDef ::= "(" CId Type Exp ")" -- function, type, and definition
|
||||
CatDef ::= "(" CId Hypo* ")" -- category and context
|
||||
LinDef ::= "(" CId Term ")" -- function and definition
|
||||
|
||||
Type ::=
|
||||
"(" CId -- value category
|
||||
"(" "H" Hypo* ")" -- argument context
|
||||
"(" "X" Exp* ")" ")" -- arguments (of dependent value type)
|
||||
|
||||
Exp ::=
|
||||
"(" CId -- function
|
||||
"(" "B" CId* ")" -- bindings
|
||||
"(" "X" Exp* ")" ")" -- arguments
|
||||
| CId -- variable
|
||||
| "?" -- metavariable
|
||||
| "(" "Eq" Equation* ")" -- group of pattern equations
|
||||
| Integer -- integer literal (non-negative)
|
||||
| Float -- floating-point literal (non-negative)
|
||||
| String -- string literal (in double quotes)
|
||||
|
||||
Hypo ::= "(" CId Type ")" -- variable and type
|
||||
|
||||
Equation ::= "(" "E" Exp Exp* ")" -- value and pattern list
|
||||
|
||||
Term ::=
|
||||
"(" "R" Term* ")" -- array (record or table)
|
||||
| "(" "S" Term* ")" -- concatenated sequence
|
||||
| "(" "FV" Term* ")" -- free variant list
|
||||
| "(" "P" Term Term ")" -- access to index (projection or selection)
|
||||
| "(" "W" String Term ")" -- token prefix with suffix list
|
||||
| "(" "A" Integer ")" -- pointer to subtree
|
||||
| String -- token (in double quotes)
|
||||
| Integer -- index in array
|
||||
| CId -- macro constant
|
||||
| "?" -- metavariable
|
||||
```
|
||||
|
||||
|
||||
==GFCC interpreter==
|
||||
|
||||
The first phase in interpreting GFCC is to parse a GFCC file and
|
||||
build an internal abstract syntax representation, as specified
|
||||
in the previous section.
|
||||
|
||||
With this representation, linearization can be performed by
|
||||
a straightforward function from expressions (``Exp``) to terms
|
||||
(``Term``). All expressions except groups of pattern equations
|
||||
can be linearized.
|
||||
|
||||
Here is a reference Haskell implementation of linearization:
|
||||
```
|
||||
linExp :: GFCC -> CId -> Exp -> Term
|
||||
linExp gfcc lang tree@(DTr _ at trees) = case at of
|
||||
AC fun -> comp (map lin trees) $ look fun
|
||||
AS s -> R [K (show s)] -- quoted
|
||||
AI i -> R [K (show i)]
|
||||
AF d -> R [K (show d)]
|
||||
AM -> TM
|
||||
where
|
||||
lin = linExp gfcc lang
|
||||
comp = compute gfcc lang
|
||||
look = lookLin gfcc lang
|
||||
```
|
||||
TODO: bindings must be supported.
|
||||
|
||||
Terms resulting from linearization are evaluated in
|
||||
call-by-value order, with two environments needed:
|
||||
- the grammar (a concrete syntax) to give the global constants
|
||||
- an array of terms to give the subtree linearizations
|
||||
|
||||
|
||||
The Haskell implementation works as follows:
|
||||
```
|
||||
compute :: GFCC -> CId -> [Term] -> Term -> Term
|
||||
compute gfcc lang args = comp where
|
||||
comp trm = case trm of
|
||||
P r p -> proj (comp r) (comp p)
|
||||
W s t -> W s (comp t)
|
||||
R ts -> R $ map comp ts
|
||||
V i -> idx args (fromInteger i) -- already computed
|
||||
F c -> comp $ look c -- not computed (if contains V)
|
||||
FV ts -> FV $ Prelude.map comp ts
|
||||
S ts -> S $ Prelude.filter (/= S []) $ Prelude.map comp ts
|
||||
_ -> trm
|
||||
|
||||
look = lookOper gfcc lang
|
||||
|
||||
idx xs i = xs !! i
|
||||
|
||||
proj r p = case (r,p) of
|
||||
(_, FV ts) -> FV $ Prelude.map (proj r) ts
|
||||
(FV ts, _ ) -> FV $ Prelude.map (\t -> proj t p) ts
|
||||
(W s t, _) -> kks (s ++ getString (proj t p))
|
||||
_ -> comp $ getField r (getIndex p)
|
||||
|
||||
getString t = case t of
|
||||
K (KS s) -> s
|
||||
_ -> trace ("ERROR in grammar compiler: string from "++ show t) "ERR"
|
||||
|
||||
getIndex t = case t of
|
||||
C i -> fromInteger i
|
||||
RP p _ -> getIndex p
|
||||
TM -> 0 -- default value for parameter
|
||||
_ -> trace ("ERROR in grammar compiler: index from " ++ show t) 0
|
||||
|
||||
getField t i = case t of
|
||||
R rs -> idx rs i
|
||||
RP _ r -> getField r i
|
||||
TM -> TM
|
||||
_ -> trace ("ERROR in grammar compiler: field from " ++ show t) t
|
||||
```
|
||||
The result of linearization is usually a record, which is realized as
|
||||
a string using the following algorithm.
|
||||
```
|
||||
realize :: Term -> String
|
||||
realize trm = case trm of
|
||||
R (t:_) -> realize t
|
||||
S ss -> unwords $ map realize ss
|
||||
K s -> s
|
||||
W s t -> s ++ realize t
|
||||
FV (t:_) -> realize t -- TODO: all variants
|
||||
TM -> "?"
|
||||
```
|
||||
Notice that realization always picks the first field of a record.
|
||||
If a linearization type has more than one field, the first field
|
||||
does not necessarily contain the desired string.
|
||||
Also notice that the order of record fields in GFCC is not necessarily
|
||||
the same as in GF source.
|
||||
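The TODO on free variants in ``realize`` could be addressed by returning a list of strings instead of one. The sketch below is not part of the reference interpreter: it uses a simplified ``Term`` type with plain string tokens, and ``realizeAll`` is a hypothetical name.

```haskell
-- Simplified term type for illustration (tokens are plain strings).
data Term
  = R [Term]        -- record or table
  | S [Term]        -- concatenated sequence
  | FV [Term]       -- free variants
  | W String Term   -- prefix with suffix
  | K String        -- token
  | TM              -- metavariable
  deriving (Eq, Show)

-- All realizations of a term, one string per choice of variants.
-- As in realize, only the first field of a record is used.
realizeAll :: Term -> [String]
realizeAll trm = case trm of
  R (t:_) -> realizeAll t
  R []    -> []
  S ss    -> map unwords (mapM realizeAll ss)  -- every combination of parts
  K s     -> [s]
  W s t   -> map (s ++) (realizeAll t)
  FV ts   -> concatMap realizeAll ts
  TM      -> ["?"]
```

Here ``mapM`` in the list monad enumerates every combination of the parts of a sequence, so a sequence containing one two-way variant list yields two strings.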
@@ -1,153 +0,0 @@
|
||||
Procedure for making a GF release:
|
||||
|
||||
1. Make sure everything that should be in the release has been
|
||||
checked in.
|
||||
|
||||
2. Go to the src/ dir.
|
||||
|
||||
$ cd src
|
||||
|
||||
3. Edit configure.ac to set the right version number
|
||||
(the second argument to the AC_INIT macro).
|
||||
|
||||
4. Edit gf.spec to set the version and release numbers
|
||||
(change %define version and %define release).
|
||||
|
||||
5. Commit configure.ac and gf.spec:
|
||||
|
||||
$ darcs record -m 'Updated version numbers.' configure.ac gf.spec
|
||||
|
||||
6. Run autoconf to generate configure with the right version number:
|
||||
|
||||
$ autoconf
|
||||
|
||||
7. Go back to the root of the tree.
|
||||
|
||||
$ cd ..
|
||||
|
||||
8. Tag the release. (X_X should be replaced by the version number, with
|
||||
_ instead of ., e.g. 2_0)
|
||||
|
||||
$ darcs tag -m RELEASE-X_X
|
||||
|
||||
9. Push the changes that you made for the release to the main repo:
|
||||
|
||||
$ darcs push
|
||||
|
||||
10. Build a source package:
|
||||
|
||||
$ cd src
|
||||
$ ./configure
|
||||
$ make dist
|
||||
|
||||
11. (Only if releasing a new grammars distribution)
|
||||
Build a grammar tarball:
|
||||
|
||||
$ cd src
|
||||
$ ./configure && make grammar-dist
|
||||
|
||||
12. Build an x86/linux RPM (should be done on a Mandrake Linux box):
|
||||
|
||||
Setup for building RPMs (first time only):
|
||||
|
||||
- Make sure that you have the directories necessary to build
|
||||
RPMs:
|
||||
|
||||
$ mkdir -p ~/rpm/{BUILD,RPMS/i586,RPMS/noarch,SOURCES,SRPMS,SPECS,tmp}
|
||||
|
||||
- Create ~/.rpmrc with the following contents:
|
||||
|
||||
buildarchtranslate: i386: i586
|
||||
buildarchtranslate: i486: i586
|
||||
buildarchtranslate: i586: i586
|
||||
buildarchtranslate: i686: i586
|
||||
|
||||
- Create ~/.rpmmacros with the following contents:
|
||||
|
||||
%_topdir %(echo ${HOME}/rpm)
|
||||
%_tmppath %{_topdir}/tmp
|
||||
|
||||
%packager Your Name <yourusername@cs.chalmers.se>
|
||||
|
||||
Build the RPM:
|
||||
|
||||
$ cd src
|
||||
$ ./configure && make rpm
|
||||
|
||||
13. Build a generic binary x86/linux package (should be done on a Linux box,
|
||||
e.g. banded.medic.chalmers.se):
|
||||
|
||||
$ cd src
|
||||
$ ./configure --host=i386-pc-linux-gnu && make binary-dist
|
||||
|
||||
14. Build a generic binary sparc/solaris package (should be done
|
||||
on a Solaris box, e.g. remote1.cs.chalmers.se):
|
||||
|
||||
$ cd src
|
||||
$ ./configure --host=sparc-sun-solaris2 && gmake binary-dist
|
||||
|
||||
15. Build a Mac OS X package (should be done on a Mac OS X box,
|
||||
e.g. csmisc99.cs.chalmers.se):
|
||||
|
||||
$ cd src
|
||||
$ ./configure && make binary-dist
|
||||
|
||||
Note that to run GHC-compiled binaries on OS X, you need
|
||||
a "Haskell Support Framework". This should be available
|
||||
separately from the GF download page.
|
||||
|
||||
TODO: Use OS X PackageMaker to build a .pkg-file which can
|
||||
be installed using the standard OS X Installer program.
|
||||
|
||||
16. Build a binary Cygwin package (should be done on a Windows
|
||||
machine with Cygwin):
|
||||
|
||||
$ cd src
|
||||
$ ./configure && make binary-dist
|
||||
|
||||
17. Build a Windows MSI package (FIXME: This doesn't work right,
|
||||
pathnames with backslashes and spaces are not handled
|
||||
correctly in Windows. We only release a binary tarball
|
||||
for Cygwin right now.):
|
||||
|
||||
$ cd src
|
||||
$ ./configure && make all windows-msi
|
||||
|
||||
18. Add new GF package release to SourceForge:
|
||||
|
||||
- https://sourceforge.net/projects/gf-tools
|
||||
|
||||
- Project page -> Admin -> File releases -> Add release (for the
|
||||
GF package)
|
||||
|
||||
- New release name: X.X (just the version number, e.g. 2.2)
|
||||
|
||||
- Paste in release notes
|
||||
|
||||
- Upload files using anonymous FTP to upload.sourceforge.net
|
||||
in the incoming directory.
|
||||
|
||||
- Add the files to the release and set the processor
|
||||
and file type for each file (remember to press
|
||||
Update/Refresh for each file):
|
||||
* x86 rpm -> i386/.rpm
|
||||
* source rpm -> Any/Source .rpm
|
||||
* x86 binary tarball -> i386/.gz
|
||||
* sparc binary tarball -> Sparc/.gz
|
||||
* source package -> Any/Source .gz
|
||||
|
||||
19. Add new GF-editor release. Repeat the steps above, but
|
||||
with GF-editor:
|
||||
|
||||
- Add files and set properties:
|
||||
|
||||
* editor rpm -> i386/.rpm (not really true, but I haven't
|
||||
figured out how to make noarch rpms from the same spec as
|
||||
arch-specific ones)
|
||||
|
||||
20. Mail to gf-tools-users@lists.sourceforge.net
|
||||
|
||||
21. Update website.
|
||||
|
||||
22. Party!
|
||||
|
||||