forked from GitHub/gf-core

more rm in doc

This commit is contained in:
aarne
2008-06-27 11:32:49 +00:00
parent 032531c6a6
commit 64d2a981a9
7 changed files with 0 additions and 4411 deletions


@@ -1,221 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
<TITLE>Graduate Course: GF (Grammatical Framework)</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">
<P ALIGN="center"><CENTER><H1>Graduate Course: GF (Grammatical Framework)</H1>
<FONT SIZE="4">
<I>Aarne Ranta</I><BR>
Wed Oct 24 09:49:27 2007
</FONT></CENTER>
<P>
<A HREF="http://www.gslt.hum.gu.se">GSLT</A>,
<A HREF="http://ngslt.org/">NGSLT</A>,
and
<A HREF="http://www.chalmers.se/cse/EN/">Department of Computer Science and Engineering</A>,
Chalmers University of Technology and Gothenburg University.
</P>
<P>
Autumn Term 2007.
</P>
<H1>News</H1>
<P>
24/10 Tomorrow's session starts at 8.15. A detailed plan has been added to
the table below. Material (new chapters) will appear later today.
It will explain some of the files in
</P>
<UL>
<LI><A HREF="http://digitalgrammars.com/gf/examples/tutorial/syntax/"><CODE>syntax/</CODE></A>:
linguistic grammar programming
<LI><A HREF="http://digitalgrammars.com/gf/examples/tutorial/semantics/"><CODE>semantics/</CODE></A>:
a question-answer system based on logical semantics
</UL>
<P>
12/9 The course starts tomorrow at 8.00. A detailed plan for the day is
right below. Don't forget to
</P>
<UL>
<LI>join the mailing list (send a mail to <CODE>gf-subscribe at gslt hum gu se</CODE>)
<LI>install GF on your laptops from <A HREF="../download.html">here</A>
<LI>take with you a copy of the book (as sent to the mailing list yesterday)
</UL>
<P>
31/8 Revised the description of the one- and five-point variants.
</P>
<P>
21/8 Course mailing list started.
To subscribe, send a mail to <CODE>gf-subscribe at gslt hum gu se</CODE>
(replacing spaces by dots except around the word at, where the spaces
are just removed, and the word itself is replaced by the at symbol).
</P>
<P>
20/8/2007 <A HREF="http://www.gslt.hum.gu.se/courses/schedule.html">Schedule</A>.
The course will start on Thursday 13 September in Room C430 at the Humanities
Building of Gothenburg University ("Humanisten").
</P>
<H1>Plan</H1>
<P>
First week (13-14/9)
</P>
<TABLE CELLPADDING="4" BORDER="1">
<TR>
<TH>Time</TH>
<TH>Subject</TH>
<TH COLSPAN="2">Assignment</TH>
</TR>
<TR>
<TD>Thu 8.00-9.30</TD>
<TD>Chapters 1-3</TD>
<TD>Hello and Food in a new language</TD>
</TR>
<TR>
<TD>Thu 10.00-11.30</TD>
<TD>Chapters 3-4</TD>
<TD>Foods in a new language</TD>
</TR>
<TR>
<TD>Thu 13.15-14.45</TD>
<TD>Chapter 5</TD>
<TD>ExtFoods in a new language</TD>
</TR>
<TR>
<TD>Thu 15.15-16.45</TD>
<TD>Chapters 6-7</TD>
<TD>straight code compiler</TD>
</TR>
<TR>
<TD>Fri 8.00-9.30</TD>
<TD>Chapter 8</TD>
<TD>application in Haskell or Java</TD>
</TR>
</TABLE>
<P></P>
<P>
Second week (25/10)
</P>
<TABLE CELLPADDING="4" BORDER="1">
<TR>
<TH>Time</TH>
<TH>Subject</TH>
<TH COLSPAN="2">Assignment</TH>
</TR>
<TR>
<TD>Thu 8.15-9.45</TD>
<TD>Chapters 13-15</TD>
<TD>mini resource in a new language</TD>
</TR>
<TR>
<TD>Thu 10.15-11.45</TD>
<TD>Chapters 12,16</TD>
<TD>query system for a new domain</TD>
</TR>
<TR>
<TD>Thu 13.15-14.45</TD>
<TD>presentations</TD>
<TD>explain your own project</TD>
</TR>
</TABLE>
<P></P>
<P>
The structure of each lecture will be the following:
</P>
<UL>
<LI>ca. 75min lecture, going through the book
<LI>ca. 15min work on computer, individually or in pairs
</UL>
<P>
In order for this to work out, it is important that enough participants
have a working GF installation, including the directory
<A HREF="../examples/tutorial"><CODE>examples/tutorial</CODE></A>. This directory is
included in the Darcs version, as well as in the updated binary
packages from 12 September.
</P>
<H1>Purpose</H1>
<P>
<A HREF="http://www.cs.chalmers.se/~aarne/GF/">GF</A>
(Grammatical Framework) is a grammar formalism, i.e. a special-purpose
programming language for writing grammars. It is suitable for many
natural language processing tasks, in particular,
</P>
<UL>
<LI>multilingual applications
<LI>systems where grammar-based components are needed for e.g.
parsing, translation, or speech recognition
</UL>
<P>
The goal of the course is to develop an understanding of GF and
practical skills in using it.
</P>
<H1>Contents</H1>
<P>
The course consists of two modules. The first module is a one-week
intensive course (during the first intensive week of GSLT), which
is as such usable as a one-week intensive course for doctoral studies,
if completed with a small course project.
</P>
<P>
The second module is a larger programming project, written
by each student (possibly working in groups) during the Autumn term.
The projects are discussed during the second intensive week of GSLT
(see <A HREF="http://www.gslt.hum.gu.se/courses/schedule.html">schedule</A>),
and presented at a date that will be set later.
</P>
<P>
The first module goes through the basics of GF, including
</P>
<UL>
<LI>using the GF programming language
<LI>writing multilingual grammars
<LI>using the
<A HREF="http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/doc/">GF resource grammar library</A>
<LI>generating speech recognition systems from GF grammars
<LI>using embedded grammars as components of software systems
</UL>
<P>
The lectures follow a draft of the GF book, which contains a heavily updated
version of the
<A HREF="http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html">GF Tutorial</A>;
thus the on-line tutorial is not adequate for this course. To get the course
book, join the course mailing list.
</P>
<P>
Those who just want to do the first module will write a simple application
as their course work during and after the first intensive week.
</P>
<P>
Those who continue with the second module will choose a more substantial
project. Possible topics are
</P>
<UL>
<LI>building a dialogue system by using GF
<LI>implementing a multilingual document generator
<LI>experimenting with synthesized multilingual tree banks
<LI>extending the GF resource grammar library
</UL>
<H1>Prerequisites</H1>
<P>
Experience in programming. No earlier natural language processing
or functional programming experience is necessary.
</P>
<P>
The course is thus suitable both for GSLT and NGSLT students,
and for graduate students in computer science.
</P>
<P>
We will in particular welcome students from the Baltic countries
who wish to build resources for their own language in GF.
</P>
<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags gf-course.txt -->
</BODY></HTML>


@@ -1,149 +0,0 @@
Graduate Course: GF (Grammatical Framework)
Aarne Ranta
%%date(%c)
% NOTE: this is a txt2tags file.
% Create an html file from this file using:
% txt2tags -thtml --toc gf-reference.html
%!target:html
[GSLT http://www.gslt.hum.gu.se],
[NGSLT http://ngslt.org/],
and
[Department of Computer Science and Engineering http://www.chalmers.se/cse/EN/],
Chalmers University of Technology and Gothenburg University.
Autumn Term 2007.
=News=
24/10 Tomorrow's session starts at 8.15. A detailed plan has been added to
the table below. Material (new chapters) will appear later today.
It will explain some of the files in
- [``syntax/`` http://digitalgrammars.com/gf/examples/tutorial/syntax/]:
linguistic grammar programming
- [``semantics/`` http://digitalgrammars.com/gf/examples/tutorial/semantics/]:
a question-answer system based on logical semantics
12/9 The course starts tomorrow at 8.00. A detailed plan for the day is
right below. Don't forget to
- join the mailing list (send a mail to ``gf-subscribe at gslt hum gu se``)
- install GF on your laptops from [here ../download.html]
- take with you a copy of the book (as sent to the mailing list yesterday)
31/8 Revised the description of the one- and five-point variants.
21/8 Course mailing list started.
To subscribe, send a mail to ``gf-subscribe at gslt hum gu se``
(replacing spaces by dots except around the word at, where the spaces
are just removed, and the word itself is replaced by the at symbol).
20/8/2007 [Schedule http://www.gslt.hum.gu.se/courses/schedule.html].
The course will start on Thursday 13 September in Room C430 at the Humanities
Building of Gothenburg University ("Humanisten").
=Plan=
First week (13-14/9)
|| Time | Subject | Assignment ||
| Thu 8.00-9.30 | Chapters 1-3 | Hello and Food in a new language |
| Thu 10.00-11.30 | Chapters 3-4 | Foods in a new language |
| Thu 13.15-14.45 | Chapter 5 | ExtFoods in a new language |
| Thu 15.15-16.45 | Chapters 6-7 | straight code compiler |
| Fri 8.00-9.30 | Chapter 8 | application in Haskell or Java |
Second week (25/10)
|| Time | Subject | Assignment ||
| Thu 8.15-9.45 | Chapters 13-15 | mini resource in a new language |
| Thu 10.15-11.45 | Chapters 12,16 | query system for a new domain |
| Thu 13.15-14.45 | presentations | explain your own project |
The structure of each lecture will be the following:
- ca. 75min lecture, going through the book
- ca. 15min work on computer, individually or in pairs
In order for this to work out, it is important that enough participants
have a working GF installation, including the directory
[``examples/tutorial`` ../examples/tutorial]. This directory is
included in the Darcs version, as well as in the updated binary
packages from 12 September.
=Purpose=
[GF http://www.cs.chalmers.se/~aarne/GF/]
(Grammatical Framework) is a grammar formalism, i.e. a special-purpose
programming language for writing grammars. It is suitable for many
natural language processing tasks, in particular,
- multilingual applications
- systems where grammar-based components are needed for e.g.
parsing, translation, or speech recognition
The goal of the course is to develop an understanding of GF and
practical skills in using it.
=Contents=
The course consists of two modules. The first module is a one-week
intensive course (during the first intensive week of GSLT), which
is as such usable as a one-week intensive course for doctoral studies,
if completed with a small course project.
The second module is a larger programming project, written
by each student (possibly working in groups) during the Autumn term.
The projects are discussed during the second intensive week of GSLT
(see [schedule http://www.gslt.hum.gu.se/courses/schedule.html]),
and presented at a date that will be set later.
The first module goes through the basics of GF, including
- using the GF programming language
- writing multilingual grammars
- using the
[GF resource grammar library http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/doc/]
- generating speech recognition systems from GF grammars
- using embedded grammars as components of software systems
The lectures follow a draft of the GF book, which contains a heavily updated
version of the
[GF Tutorial http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html];
thus the on-line tutorial is not adequate for this course. To get the course
book, join the course mailing list.
Those who just want to do the first module will write a simple application
as their course work during and after the first intensive week.
Those who continue with the second module will choose a more substantial
project. Possible topics are
- building a dialogue system by using GF
- implementing a multilingual document generator
- experimenting with synthesized multilingual tree banks
- extending the GF resource grammar library
=Prerequisites=
Experience in programming. No earlier natural language processing
or functional programming experience is necessary.
The course is thus suitable both for GSLT and NGSLT students,
and for graduate students in computer science.
We will in particular welcome students from the Baltic countries
who wish to build resources for their own language in GF.


@@ -1,699 +0,0 @@
=GF Command Help=
Each command has a long and a short name, options, and zero or more
arguments. Commands are sorted by functionality. The short name is
given first.
Commands and options marked with * are currently not implemented.
==Commands that change the state==
```
i, import: i File
Reads a grammar from File and compiles it into a GF runtime grammar.
Files "include"d in File are read recursively, nubbing repetitions.
If a grammar with the same language name is already in the state,
it is overwritten - but only if compilation succeeds.
The grammar parser depends on the file name suffix:
.gf normal GF source
.gfc canonical GF
.gfr precompiled GF resource
.gfcm multilingual canonical GF
.gfe example-based grammar files (only with the -ex option)
.gfwl multilingual word list (preprocessed to abs + cncs)
.ebnf Extended BNF format
.cf Context-free (BNF) format
.trc TransferCore format
options:
-old old: parse in GF<2.0 format (not necessary)
-v verbose: give lots of messages
-s silent: don't give error messages
-src from source: ignore precompiled gfc and gfr files
-gfc from gfc: use compiled modules whenever they exist
-retain retain operations: read resource modules (needed in comm cc)
-nocf don't build old-style context-free grammar (default without HOAS)
-docf do build old-style context-free grammar (default with HOAS)
-nocheckcirc don't eliminate circular rules from CF
-cflexer build an optimized parser with separate lexer trie
-noemit do not emit code (default with old grammar format)
-o do emit code (default with new grammar format)
-ex preprocess .gfe files if needed
-prob read probabilities from top grammar file (format --# prob Fun Double)
-treebank read a treebank file to memory (xml format)
flags:
-abs set the name used for abstract syntax (with -old option)
-cnc set the name used for concrete syntax (with -old option)
-res set the name used for resource (with -old option)
-path use the (colon-separated) search path to find modules
-optimize select an optimization to override file-defined flags
-conversion select parsing method (values strict|nondet)
-probs read probabilities from file (format (--# prob) Fun Double)
-preproc use a preprocessor on each source file
-noparse read nonparsable functions from file (format --# noparse Funs)
examples:
i English.gf -- ordinary import of Concrete
i -retain german/ParadigmsGer.gf -- import of Resource to test
r, reload: r
Executes the previous import (i) command.
rl, remove_language: rl Language
Takes away the language from the state.
e, empty: e
Takes away all languages and resets all global flags.
sf, set_flags: sf Flag*
The values of the Flags are set for Language. If no language
is specified, the flags are set globally.
examples:
sf -nocpu -- stop showing CPU time
sf -lang=Swe -- make Swe the default concrete
s, strip: s
Prune the state by removing source and resource modules.
dc, define_command: dc Name Anything
Add a new defined command. The Name must start with '%'. Later,
if 'Name X' is used, it is replaced by Anything where #1 is replaced
by X.
Restrictions: Currently at most one argument is possible, and a defined
command cannot appear in a pipe.
To see what definitions are in scope, use help -defs.
examples:
dc %tnp p -cat=NP -lang=Eng #1 | l -lang=Swe -- translate NPs
%tnp "this man" -- translate and parse
dt, define_term Name Tree
Add a constant for a tree. The constant can later be called by
prefixing it with '$'.
Restriction: These terms are not yet usable as a subterm.
To see what definitions are in scope, use help -defs.
examples:
p -cat=NP "this man" | dt tm -- define tm as parse result
l -all $tm -- linearize tm in all forms
```
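
Taken together, these commands support a typical edit-compile-test loop.
A minimal sketch, assuming hypothetical grammar files Foods.gf and
FoodsEng.gf; every command used is documented in this help:
```
i FoodsEng.gf          -- compile the grammar and add it to the state
sf -lang=FoodsEng      -- make it the default concrete syntax
dc %lin l -table #1    -- define %lin as a shorthand for tabular linearization
r                      -- after editing the source files, recompile with reload
```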
==Commands that give information about the state==
```
pg, print_grammar: pg
Prints the actual grammar (overridden by the -lang=X flag).
The -printer=X flag sets the format in which the grammar is
written.
N.B. since grammars are compiled when imported, this command
generally does not show the grammar in the same format as the
source. In particular, the -printer=latex is not supported.
Use the command tg -printer=latex File to print the source
grammar in LaTeX.
options:
-utf8 apply UTF8-encoding to the grammar
flags:
-printer
-lang
-startcat -- The start category of the generated grammar.
Only supported by some grammar printers.
examples:
pg -printer=cf -- show the context-free skeleton
pm, print_multigrammar: pm
Prints the current multilingual grammar in .gfcm form.
(Automatically executes the strip command (s) before doing this.)
options:
-utf8 apply UTF8 encoding to the tokens in the grammar
-utf8id apply UTF8 encoding to the identifiers in the grammar
examples:
pm | wf Letter.gfcm -- print the grammar into the file Letter.gfcm
pm -printer=graph | wf D.dot -- then do 'dot -Tps D.dot > D.ps'
vg, visualize_graph: vg
Show the dependency graph of multilingual grammar via dot and gv.
po, print_options: po
Print what modules there are in the state. Also
prints those flag values in the current state that differ from defaults.
pl, print_languages: pl
Prints the names of currently available languages.
pi, print_info: pi Ident
Prints information on the identifier.
```
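
The inspection commands combine naturally with the write_file command
(wf, under IO-related commands below) to save snapshots of the state;
for example:
```
pg -printer=cf | wf skeleton.cf   -- save the context-free skeleton to a file
pl                                -- list the currently available languages
```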
==Commands that execute and show the session history==
```
eh, execute_history: eh File
Executes commands in the file.
ph, print_history: ph
Prints the commands issued during the GF session.
The result is readable by the eh command.
examples:
ph | wf foo.hist -- save the history into a file
```
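
Together with the IO-related commands described below, the history
commands make sessions reproducible: save the history to a file and
replay it later. A sketch, with a hypothetical file name:
```
ph | wf mysession.gfs   -- save the commands of this session to a file
eh mysession.gfs        -- in a later session, replay them
```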
==Linearization, parsing, translation, and computation==
```
l, linearize: l PattList? Tree
Shows all linearization forms of Tree by the actual grammar
(which is overridden by the -lang flag).
The pattern list has the form [P, ... ,Q] where P,...,Q follow GF
syntax for patterns. All forms matching the pattern list are generated;
lists that are too short are padded with variables at the end.
Only the -table flag is available if a pattern list is specified.
HINT: see GF language specification for the syntax of Pattern and Term.
You can also copy and paste parsing results.
options:
-struct bracketed form
-table show parameters (not compatible with -record, -all)
-record record, i.e. explicit GF concrete syntax term (not compatible with -table, -all)
-all show all forms and variants (not compatible with -record, -table)
-multi linearize to all languages (can be combined with the other options)
flags:
-lang linearize in this grammar
-number give this number of forms at most
-unlexer filter output through unlexer
examples:
l -lang=Swe -table -- show full inflection table in Swe
p, parse: p String
Shows all Trees returned for String by the actual
grammar (overridden by the -lang flag), in the category S (overridden
by the -cat flag).
options for batch input:
-lines parse each line of input separately, ignoring empty lines
-all as -lines, but also parse empty lines
-prob rank results by probability
-cut stop after first lexing result leading to parser success
-fail show strings whose parse fails prefixed by #FAIL
-ambiguous show strings that have more than one parse prefixed by #AMBIGUOUS
options for selecting parsing method:
-fcfg parse using a fast variant of MCFG (default is no HOAS in grammar)
-old parse using an overgenerating CFG (default if HOAS in grammar)
-cfg parse using a much less overgenerating CFG
-mcfg parse using an even less overgenerating MCFG
Note: the first time you parse with -cfg, -mcfg, or -fcfg, it may take a long time
options that only work for the -old default parsing method:
-n non-strict: tolerates morphological errors
-ign ignore unknown words when parsing
-raw return context-free terms in raw form
-v verbose: give more information if parsing fails
flags:
-cat parse in this category
-lang parse in this grammar
-lexer filter input through this lexer
-parser use this parsing strategy
-number return this many results at most
examples:
p -cat=S -mcfg "jag är gammal" -- parse an S with the MCFG
rf examples.txt | p -lines -- parse each non-empty line of the file
at, apply_transfer: at (Module.Fun | Fun)
Transfer a term using Fun from Module, or the topmost transfer
module. Transfer modules are given in the .trc format. They are
shown by the 'po' command.
flags:
-lang typecheck the result in this lang instead of default lang
examples:
p -lang=Cncdecimal "123" | at num2bin | l -- convert dec to bin
tb, tree_bank: tb
Generate a multilingual treebank from a list of trees (default) or compare
to an existing treebank.
options:
-c compare to existing xml-formatted treebank
-trees return the trees of the treebank
-all show all linearization alternatives (branches and variants)
-table show tables of linearizations with parameters
-record show linearization records
-xml wrap the treebank (or comparison results) with XML tags
-mem write the treebank in memory instead of a file TODO
examples:
gr -cat=S -number=100 | tb -xml | wf tb.xml -- random treebank into file
rf tb.xml | tb -c -- compare-test treebank from file
rf old.xml | tb -trees | tb -xml -- create new treebank from old
ut, use_treebank: ut String
Lookup a string in a treebank and return the resulting trees.
Use 'tb' to create a treebank and 'i -treebank' to read one from
a file.
options:
-assocs show all string-trees associations in the treebank
-strings show all strings in the treebank
-trees show all trees in the treebank
-raw return the lookup result as string, without typechecking it
flags:
-treebank use this treebank (instead of the latest introduced one)
examples:
ut "He adds this to that" | l -multi -- use treebank lookup as parser in translation
ut -assocs | grep "ComplV2" -- show all associations with ComplV2
tt, test_tokenizer: tt String
Show the token list sent to the parser when String is parsed.
HINT: can be useful when debugging the parser.
flags:
-lexer use this lexer
examples:
tt -lexer=codelit "2*(x + 3)" -- a favourite lexer for program code
g, grep: g String1 String2
Grep the String1 in the String2. String2 is read line by line,
and only those lines that contain String1 are returned.
flags:
-v return those lines that do not contain String1.
examples:
pg -printer=cf | grep "mother" -- show cf rules with word mother
cc, compute_concrete: cc Term
Compute a term by concrete syntax definitions. Uses the topmost
resource module (the last in listing by command po) to resolve
constant names.
N.B. You need the flag -retain when importing the grammar, if you want
the oper definitions to be retained after compilation; otherwise this
command does not expand oper constants.
N.B.' The resulting Term is not a term in the sense of abstract syntax,
and hence not a valid input to a Tree-demanding command.
flags:
-table show output in a similar readable format as 'l -table'
-res use another module than the topmost one
examples:
cc -res=ParadigmsFin (nLukko "hyppy") -- inflect "hyppy" with nLukko
so, show_operations: so Type
Show oper operations with the given value type. Uses the topmost
resource module to resolve constant names.
N.B. You need the flag -retain when importing the grammar, if you want
the oper definitions to be retained after compilation; otherwise this
command does not find any oper constants.
N.B.' The value type may not be defined in a supermodule of the
topmost resource. In that case, use appropriate qualified name.
flags:
-res use another module than the topmost one
examples:
so -res=ParadigmsFin ResourceFin.N -- show N-paradigms in ParadigmsFin
t, translate: t Lang1 Lang2 String
Parses String in Lang1 and linearizes the resulting Trees in Lang2.
flags:
-cat
-lexer
-parser
examples:
t Eng Swe -cat=S "every number is even or odd"
gr, generate_random: gr Tree?
Generates a random Tree of a given category. If a Tree
argument is given, the command completes the Tree with values to
the metavariables in the tree.
options:
-prob use probabilities (works for nondep types only)
-cf use a very fast method (works for nondep types only)
flags:
-cat generate in this category
-lang use the abstract syntax of this grammar
-number generate this number of trees (not impl. with Tree argument)
-depth use this number of search steps at most
examples:
gr -cat=Query -- generate in category Query
gr (PredVP ? (NegVG ?)) -- generate a random tree of this form
gr -cat=S -tr | l -- generate and linearize
gt, generate_trees: gt Tree?
Generates all trees up to a given depth. If the depth is large,
a small -alts is recommended. If a Tree argument is given, the
command completes the Tree with values to the metavariables in
the tree.
options:
-metas also return trees that include metavariables
-all generate all (can be infinitely many, lazily)
-lin linearize result of -all (otherwise, use pipe to linearize)
flags:
-depth generate to this depth (default 3)
-atoms take this number of atomic rules of each category (default unlimited)
-alts take this number of alternatives at each branch (default unlimited)
-cat generate in this category
-nonub don't remove duplicates (faster, not effective with -mem)
-mem use a memorizing algorithm (often faster, usually more memory-consuming)
-lang use the abstract syntax of this grammar
-number generate (at most) this number of trees (also works with -all)
-noexpand don't expand these categories (comma-separated, e.g. -noexpand=V,CN)
-doexpand only expand these categories (comma-separated, e.g. -doexpand=V,CN)
examples:
gt -depth=10 -cat=NP -- generate all NP's to depth 10
gt (PredVP ? (NegVG ?)) -- generate all trees of this form
gt -cat=S -tr | l -- generate and linearize
gt -noexpand=NP | l -mark=metacat -- the only NP is meta, linearized "?0 +NP"
gt | l | p -lines -ambiguous | grep "#AMBIGUOUS" -- show ambiguous strings
ma, morphologically_analyse: ma String
Runs morphological analysis on each word in String and displays
the results line by line.
options:
-short show analyses in bracketed words, instead of separate lines
-status show just the word at success, prefixed with "*" at failure
flags:
-lang
examples:
rf Bible.txt | ma -short | wf Bible.tagged -- analyse the Bible
```
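
The generation, linearization, and parsing commands compose in pipes,
which gives a quick consistency check for a grammar: randomly generated
trees should linearize to strings that parse back. A sketch, assuming a
grammar with start category S has been imported:
```
gr -cat=S -number=5 | l       -- five random trees, shown as strings
gr -cat=S | l | p -cat=S      -- one tree through the full round trip
```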
==Elementary generation of Strings and Trees==
```
ps, put_string: ps String
Returns its argument String, like Unix echo.
HINT. The strength of ps comes from the possibility to receive the
argument from a pipeline, and altering it by the -filter flag.
flags:
-filter filter the result through this string processor
-length cut the string after this number of characters
examples:
gr -cat=Letter | l | ps -filter=text -- random letter as text
pt, put_tree: pt Tree
Returns its argument Tree, like a specialized Unix echo.
HINT. The strength of pt comes from the possibility to receive
the argument from a pipeline, and altering it by the -transform flag.
flags:
-transform transform the result by this term processor
-number generate this number of terms at most
examples:
p "zero is even" | pt -transform=solve -- solve ?'s in parse result
* st, show_tree: st Tree
Prints the tree as a string. Unlike pt, this command cannot be
used in a pipe to produce a tree, since its output is a string.
flags:
-printer show the tree in a special format (-printer=xml supported)
wt, wrap_tree: wt Fun
Wraps the tree as the sole argument of Fun.
flags:
-c compute the resulting new tree to normal form
vt, visualize_tree: vt Tree
Shows the abstract syntax tree via dot and gv (via temporary files
grphtmp.dot, grphtmp.ps).
flags:
-c show categories only (no functions)
-f show functions only (no categories)
-g show as graph (sharing uses of the same function)
-o just generate the .dot file
examples:
p "hello world" | vt -o | wf my.dot ;; ! open -a GraphViz my.dot
-- This writes the parse tree into my.dot and opens the .dot file
-- with another application without generating .ps.
```
==Subshells==
```
es, editing_session: es
Opens an interactive editing session.
N.B. Exit from a Fudget session is to the Unix shell, not to GF.
options:
-f Fudget GUI (necessary for Unicode; only available in X Window System)
ts, translation_session: ts
Translates input lines from any of the actual languages to all other ones.
To exit, type a full stop (.) alone on a line.
N.B. Exit from a Fudget session is to the Unix shell, not to GF.
HINT: Set -parser and -lexer locally in each grammar.
options:
-f Fudget GUI (necessary for Unicode; only available in X Windows)
-lang prepend translation results with language names
flags:
-cat the parser category
examples:
ts -cat=Numeral -lang -- translate numerals, show language names
tq, translation_quiz: tq Lang1 Lang2
Random-generates translation exercises from Lang1 to Lang2,
keeping score of success.
To interrupt, type a full stop (.) alone on a line.
HINT: Set -parser and -lexer locally in each grammar.
flags:
-cat
examples:
tq -cat=NP TestResourceEng TestResourceSwe -- quiz for NPs
tl, translation_list: tl Lang1 Lang2
Random-generates a list of ten translation exercises from Lang1
to Lang2. The number can be changed by a flag.
HINT: use wf to save the exercises in a file.
flags:
-cat
-number
examples:
tl -cat=NP TestResourceEng TestResourceSwe -- quiz list for NPs
mq, morphology_quiz: mq
Random-generates morphological exercises,
keeping score of success.
To interrupt, type a full stop (.) alone on a line.
HINT: use printname judgements in your grammar to
produce nice expressions for desired forms.
flags:
-cat
-lang
examples:
mq -cat=N -lang=TestResourceSwe -- quiz for Swedish nouns
ml, morphology_list: ml
Random-generates a list of ten morphological exercises,
keeping score of success. The number can be changed with a flag.
HINT: use wf to save the exercises in a file.
flags:
-cat
-lang
-number
examples:
ml -cat=N -lang=TestResourceSwe -- quiz list for Swedish nouns
```
==IO-related commands==
```
rf, read_file: rf File
Returns the contents of File as a String; error if File does not exist.
wf, write_file: wf File String
Writes String into File; File is created if it does not exist.
N.B. the command overwrites File without a warning.
af, append_file: af File String
Writes String into the end of File; File is created if it does not exist.
* tg, transform_grammar: tg File
Reads File, parses as a grammar,
but instead of compiling further, prints it.
The environment is not changed. When parsing the grammar, the same file
name suffixes are supported as in the i command.
HINT: use this command to print the grammar in
another format (the -printer flag); pipe it to wf to save this format.
flags:
-printer (only -printer=latex supported currently)
* cl, convert_latex: cl File
Reads File, which is expected to be in LaTeX form.
sa, speak_aloud: sa String
Uses the Flite speech generator to produce speech for String.
Works for American English spelling.
examples:
h | sa -- listen to the list of commands
gr -cat=S | l | sa -- generate a random sentence and speak it aloud
si, speech_input: si
Uses an ATK speech recognizer to get speech input.
flags:
-lang: The grammar to use with the speech recognizer.
-cat: The grammar category to get input in.
-language: Use acoustic model and dictionary for this language.
-number: The number of utterances to recognize.
h, help: h Command?
Displays the paragraph concerning the command from this help file.
Without the argument, shows the first lines of all paragraphs.
options:
-all show the whole help file
-defs show user-defined commands and terms
-FLAG show the values of FLAG (works for grammar-independent flags)
examples:
h print_grammar -- show all information on the pg command
q, quit: q
Exits GF.
HINT: you can use 'ph | wf history' to save your session.
!, system_command: ! String
Issues a system command. No value is returned to GF.
example:
! ls
?, system_command: ? String
Issues a system command that receives its arguments from GF pipe
and returns a value to GF.
example:
h | ? 'wc -l' | p -cat=Num
```
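
The file commands also compose with generation, for instance to
accumulate a test corpus across several runs. A sketch, with a
hypothetical file name:
```
gr -cat=S -number=20 | l | af corpus.txt   -- append 20 random sentences
rf corpus.txt | p -lines                   -- later, parse the corpus back
```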
==Flags==
The availability of flags is defined separately for each command.
```
-cat, category in which parsing is performed.
The default is S.
-depth, the search depth in e.g. random generation.
The default depends on application.
-filter, operation performed on a string. The default is identity.
-filter=identity no change
-filter=erase erase the text
-filter=take100 show the first 100 characters
-filter=length show the length of the string
-filter=text format as text (punctuation, capitalization)
-filter=code format as code (spacing, indentation)
-lang, grammar used when executing a grammar-dependent command.
The default is the last-imported grammar.
-language, voice used by Festival as its --language flag in the sa command.
The default is system-dependent.
-length, the maximum number of characters shown of a string.
The default is unlimited.
-lexer, tokenization transforming a string into lexical units for a parser.
The default is words.
-lexer=words tokens are separated by spaces or newlines
-lexer=literals like words, but GF integer and string literals recognized
-lexer=vars like words, but "x","x_...","$...$" as vars, "?..." as meta
-lexer=chars each character is a token
-lexer=code use Haskell's lex
-lexer=codevars like code, but treat unknown words as variables, ?? as meta
-lexer=textvars like text, but treat unknown words as variables, ?? as meta
-lexer=text with conventions on punctuation and capital letters
-lexer=codelit like code, but treat unknown words as string literals
-lexer=textlit like text, but treat unknown words as string literals
-lexer=codeC use a C-like lexer
-lexer=ignore like literals, but ignore unknown words
-lexer=subseqs like ignore, but then try all subsequences from longest
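-- illustrative examples (assume a grammar whose tokens match the input):
--   p -lexer=literals "add 3 to \"foo bar\""   -- integer and string literals as tokens
--   p -lexer=chars "abc"                       -- one token per character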
-number, the maximum number of generated items in a list.
The default is unlimited.
-optimize, optimization on generated code.
The default is share for concrete, none for resource modules.
Each of the values can have the suffix _subs, which performs
common subexpression elimination after the main optimization.
Thus, -optimize=all_subs is the most aggressive one. The _subs
strategy only works in GFC, and therefore applies in concrete but
not in resource modules.
-optimize=share share common branches in tables
-optimize=parametrize first try parametrize then do share with the rest
-optimize=values represent tables as courses-of-values
-optimize=all first try parametrize then do values with the rest
-optimize=none no optimization
-parser, parsing strategy. The default is chart. If -cfg or -mcfg are
selected, only bottomup and topdown are recognized.
-parser=chart bottom-up chart parsing
-parser=bottomup a more up-to-date bottom-up strategy
-parser=topdown top-down strategy
-parser=old an old bottom-up chart parser
-printer, format in which the grammar is printed. The default is
gfc. Those marked with M are (only) available for pm, the rest
for pg.
-printer=gfc GFC grammar
-printer=gf GF grammar
-printer=old old GF grammar
-printer=cf context-free grammar, with profiles
-printer=bnf context-free grammar, without profiles
-printer=lbnf labelled context-free grammar for BNF Converter
-printer=plbnf grammar for BNF Converter, with precedence levels
*-printer=happy source file for Happy parser generator (use lbnf!)
-printer=haskell abstract syntax in Haskell, with transl to/from GF
-printer=haskell_gadt abstract syntax GADT in Haskell, with transl to/from GF
-printer=morpho full-form lexicon, long format
*-printer=latex LaTeX file (for the tg command)
-printer=fullform full-form lexicon, short format
*-printer=xml XML: DTD for the pg command, object for st
-printer=old old GF: file readable by GF 1.2
-printer=stat show some statistics of generated GFC
-printer=probs show probabilities of all functions
-printer=gsl Nuance GSL speech recognition grammar
-printer=jsgf Java Speech Grammar Format
-printer=jsgf_sisr_old Java Speech Grammar Format with semantic tags in
SISR WD 20030401 format
-printer=srgs_abnf SRGS ABNF format
-printer=srgs_abnf_non_rec SRGS ABNF format, without any recursion.
-printer=srgs_abnf_sisr_old SRGS ABNF format, with semantic tags in
SISR WD 20030401 format
-printer=srgs_xml SRGS XML format
-printer=srgs_xml_non_rec SRGS XML format, without any recursion.
-printer=srgs_xml_prob SRGS XML format, with weights
-printer=srgs_xml_sisr_old SRGS XML format, with semantic tags in
SISR WD 20030401 format
-printer=vxml Generate a dialogue system in VoiceXML.
-printer=slf a finite automaton in the HTK SLF format
-printer=slf_graphviz the same automaton as slf, but in Graphviz format
-printer=slf_sub a finite automaton with sub-automata in the
HTK SLF format
-printer=slf_sub_graphviz the same automaton as slf_sub, but in
Graphviz format
-printer=fa_graphviz a finite automaton with labelled edges
-printer=regular a regular grammar in a simple BNF
-printer=unpar a gfc grammar with parameters eliminated
-printer=functiongraph abstract syntax functions in 'dot' format
-printer=typegraph abstract syntax categories in 'dot' format
-printer=transfer Transfer language datatype (.tr file format)
-printer=cfg-prolog M cfg in prolog format (also pg)
-printer=gfc-prolog M gfc in prolog format (also pg)
-printer=gfcm M gfcm file (default for pm)
-printer=graph M module dependency graph in 'dot' (graphviz) format
-printer=header M gfcm file with header (for GF embedded in Java)
-printer=js M JavaScript type annotator and linearizer
-printer=mcfg-prolog M mcfg in prolog format (also pg)
-printer=missing M the missing linearizations of each concrete
-startcat, like -cat, but used in grammars (to avoid clash with keyword cat)
-transform, transformation performed on a syntax tree. The default is identity.
-transform=identity no change
-transform=compute compute by using definitions in the grammar
-transform=nodup return the term only if it has no constants duplicated
-transform=nodupatom return the term only if it has no atomic constants duplicated
-transform=typecheck return the term only if it is type-correct
-transform=solve solve metavariables as derived refinements
-transform=context solve metavariables by unique refinements as variables
-transform=delete replace the term by metavariable
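-- illustrative pipeline: generate all clauses, then keep only the trees
-- in which no atomic constant is duplicated:
--   gt -cat=Cl | pt -transform=nodupatom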
-unlexer, untokenization transforming linearization output into a string.
The default is unwords.
-unlexer=unwords space-separated token list (like unwords)
-unlexer=text format as text: punctuation, capitals, paragraph <p>
-unlexer=code format as code (spacing, indentation)
-unlexer=textlit like text, but remove string literal quotes
-unlexer=codelit like code, but remove string literal quotes
-unlexer=concat remove all spaces
-unlexer=bind like identity, but bind at "&+"
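-- illustrative: linearize a random sentence with text conventions applied:
--   gr -cat=S | l -unlexer=text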
-mark, marking of parts of tree in linearization. The default is none.
-mark=metacat append "+CAT" to every metavariable, showing its category
-mark=struct show tree structure with brackets
-mark=java show tree structure with XML tags (used in gfeditor)
-coding, the character encoding of the grammar. Some grammars are in UTF-8,
some in isolatin-1. If the letters ä (a-umlaut) and ö (o-umlaut) look
strange, either change your terminal to isolatin-1, or rewrite the grammar
with 'pg -utf8'.
```

<html>
<body bgcolor="#FFFFFF" text="#000000" >
<center>
<IMG SRC="gf-logo.gif">
<h1>Grammatical Framework History of Changes</h1>
Changes in functionality since May 17, 2005, release of GF Version 2.2
</center>
<p>
25/6 (BB)
Added new speech recognition grammar printers for non-recursive SRGS grammars,
as used by Nuance Recognizer 9.0. Try <tt>pg -printer=srgs_xml_non_rec</tt>
or <tt>pg -printer=srgs_abnf_non_rec</tt>.
<p>
19/6 (AR)
Extended the functor syntax (<tt>with</tt> modules) so that the functor can have
restricted import and a module body (whose function is normally to complete restricted
import). Thus the following format is now possible:
<pre>
concrete C of A = E ** CI - [f,g] with (...) ** open R in {...}
</pre>
At the same time, the possibility of an empty module body was added to other modules
for symmetry. This can be useful for "proxy modules" that just collect other modules
without adding anything, e.g.
<pre>
abstract Math = Arithmetic, Geometry ;
</pre>
<p>
18/6 (AR)
Added a warning for clashing constants. A constant coming from multiple opened modules
was interpreted as "the first" found by the compiler, which was a source of difficult
errors. Clashing is officially forbidden, but we chose to give a warning instead of
raising an error to begin with (in version 2.8).
<p>
30/1/2007 (AR)
Semantics of variants fixed for complex types. Officially, it was only
defined for basic types (Str and parameters). When used for records, results were
multiplicative, which was not usable. But now variants should work for any type.
<p>
<hr>
<p>
22/12 (AR) <b>Release of GF version 2.7</b>.
<p>
21/12 (AR)
Overloading rules for GF version 2.7:
<ol>
<li> If a unique instance is found by exact match with argument types,
that instance is used.
<li> Otherwise, if exact match with the expected value type gives a
unique instance, that instance is used.
<li> Otherwise, if among possible instances only one returns a non-function
type, that instance is used, but a warning is issued.
<li> Otherwise, an error results, and the list of possible instances is shown.
</ol>
These rules are still experimental, but all future developments will guarantee
that their type-correct use will work. Rule (3) is only needed because the
current type checker does not always know an expected type. It can give
an incorrect result, which is caught later in the compilation. Note,
in particular, that exact match is required. Match by subtyping will be
investigated later.
<p>
21/12 (BB) Java Speech Grammar Format with SISR tags can now be generated.
Use <tt>pg -printer=jsgf_sisr_old</tt>. The SISR tags are in Working Draft
20030401 format, which is supported by the OptimTALK VoiceXML interpreter
and the IBM XHTML+Voice implementation used by the Opera web browser.
<p>
21/12 (BB) <a name="voicexml">
VoiceXML 2.0 dialog systems can now be generated from GF grammars.
Use <tt>pg -printer=vxml</tt>.
<p>
21/12 (BB) <a name="javascript">
JavaScript code for linearization and type annotation can now be
generated from a multilingual GF grammar. Use <tt>pm -printer=js</tt>.
<p>
5/12 (BB) <a name="gfcc2c">
A new tool for generating C linearization libraries
from a GFCC file. <tt>make gfcc2c</tt> in <tt>src</tt>
compiles the tool. The generated
code includes header files in <tt>lib/c</tt> and should be linked
against <tt>libgfcc.a</tt> in <tt>lib/c</tt>. For an example of
using the generated code, see <tt>src/tools/c/examples/bronzeage</tt>.
<tt>make</tt> in that directory generates a GFCC file, then generates
C code from that, and then compiles a program <tt>bronzeage-test</tt>.
The <tt>main</tt> function for that program is defined in
<tt>bronzeage-test.c</tt>.
<p>
20/11 (AR) Type error messages in concrete syntax are printed with a
heuristic where a type of the form <tt>{... ; lock_C : {} ; ...}</tt>
is printed as <tt>C</tt>. This gives more readable error messages, but
can produce wrong results if lock fields are hand-written or if subtypes
of lock-fielded categories are used.
<p>
17/11 (AR) <a name="overloading">
Operation overloading: an <tt>oper</tt> can have many types,
from which one is picked at compile time. The types must have different
argument lists. Exact match with the arguments given to the <tt>oper</tt>
is required. An example is given in
<a href="../lib/resource-1.0/doc/gfdoc/Constructors.gf"><tt>Constructors.gf</tt></a>.
The purpose of overloading is to make libraries easier to use, since
only one name for each grammatical operation is needed: predication, modification,
coordination, etc. The concrete syntax is, at this experimental level, not
extended but relies on using a record with the function name repeated
as label name (see the example). The treatment of overloading is inspired
by C++, and was first suggested by Björn Bringert.
<p>
3/10 (AR) A new low-level format <tt>gfcc</tt> ("Canonical Canonical GF").
It is going to replace the <tt>gfc</tt> format later, but is already now
an efficient format for multilingual generation.
See <a href="../src/GF/Canon/GFCC/doc/gfcc.html">GFCC document</a>
for more information.
<p>
1/9 (AR) New way for managing errors in grammar compilation:
<pre>
Predef.Error : Type ;
Predef.error : Str -> Predef.Error ;
</pre>
Denotationally, <tt>Error</tt> is the empty type and thus a
subtype of any other types: it can be used anywhere. But the
<tt>error</tt> function is not canonical. Hence the compilation
is interrupted when <tt>(error s)</tt> is translated to GFC, and
the message <tt>s</tt> is emitted. An example use is given in
<tt>english/ParadigmsEng.gf</tt>:
<pre>
regDuplV : Str -> V ;
regDuplV fit =
case last fit of {
("a" | "e" | "i" | "o" | "u" | "y") =>
Predef.error (["final duplication makes no sense for"] ++ fit) ;
t =>
let fitt = fit + t in
mkV fit (fit + "s") (fitt + "ed") (fitt + "ed") (fitt + "ing")
} ;
</pre>
This function thus cannot be applied to a stem ending with a vowel,
which is exactly what we want. In future, it may be good to add similar
checks to all morphological paradigms in the resource.
<p>
16/8 (AR) New generation algorithm: slower but works with less
memory. Default of <tt>gt</tt>; use <tt>gt -mem</tt> for the old
algorithm. The new option <tt>gt -all</tt> lazily generates all
trees until interrupted. It cannot be piped to other GF commands,
hence use <tt>gt -all -lin</tt> to print out linearized strings
rather than trees.
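<p>
For example, the following call (illustrative) lazily prints linearized
sentences until interrupted:
<pre>
  gt -cat=S -all -lin
</pre>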
<hr>
22/6 (AR) <b>Release of GF version 2.6</b>.
<p>
20/6 (AR) The FCFG parser is now the default, as it even handles literals.
The old default can be selected by <tt>p -old</tt>. Since
FCFG does not support variable bindings, <tt>-old</tt> is automatically
selected if the grammar has bindings - and unless the <tt>-fcfg</tt> flag
is used.
<p>
17/6 (AR) The FCFG parser is now the recommended method for parsing
heavy grammars such as the resource grammars. It does not yet support
literals and variable bindings.
<p>
1/6 (AR) Added the FCFG parser written by Krasimir Angelov. Invoked by
<tt>p -fcfg</tt>. This parser is as general as MCFG but faster.
It needs more testing and debugging.
<p>
1/6 (AR) The command <tt>r = reload</tt> repeats the latest
<tt>i = import</tt> command.
<p>
30/5 (AR) It is now possible to use the flags <tt>-all, -table, -record</tt>
in combination with <tt>l -multi</tt>, and also with <tt>tb</tt>.
<p>
18/5 (AR) Introduced a wordlist format <tt>gfwl</tt> for
quick creation of language exercises and (in future) multilingual lexica.
The format is now very simple:
<pre>
# Svenska - Franska - Finska
berg - montagne - vuori
klättra - grimper / escalader - kiivetä / kiipeillä
</pre>
but can be extended to cover paradigm functions in addition to just
words.
<p>
3/4 (AR) The predefined abstract syntax type <tt>Int</tt> now has two
inherent parameters indicating its last digit and its size. The (hard-coded)
linearization type is
<pre>
{s : Str ; size : Predef.Ints 1 ; last : Predef.Ints 9}
</pre>
The <tt>size</tt> field has value <tt>1</tt> for integers greater than 9, and
value <tt>0</tt> for other integers (which are never negative). This parameter can
be used e.g. in calculating number agreement,
<pre>
Risala i = {s = i.s ++ table (Predef.Ints 1 * Predef.Ints 9) {
&lt;0,1&gt; =&gt; "risalah" ;
&lt;0,2&gt; =&gt; "risalatan" ;
&lt;0,_&gt; | &lt;1,0&gt; =&gt; "rasail" ;
_ =&gt; "risalah"
} ! &lt;i.size,i.last&gt;
} ;
</pre>
Notice that the table has to be typed explicitly for <tt>Ints k</tt>,
because type inference would otherwise return <tt>Int</tt> and therefore
fail to expand the table.
<p>
31/3 (AR) Added flags and options to some commands, to help generation:
<ul>
<li> <tt>gt -noexpand=NP,V,TV</tt> does not expand these categories,
but only generates metavariables for them.
<li> <tt>gt -doexpand=NP,V,TV</tt> only expands these categories,
and generates metavariables for others.
<li> <tt>gr -cf</tt> has the same flags.
<li> <tt>l -mark=metacat</tt> marks the metavariables with their categories.
<li> <tt>p -fail</tt> marks with <tt>#FAIL</tt> strings that have no parse.
<li> <tt>p -ambiguous</tt> marks as <tt>#AMBIGUOUS</tt>
strings that have more than one parse.
</ul>
<p>
<hr>
21/3/2006 <b>Release of GF 2.5</b>.
<p>
16/3 (AR) Added two flag values to <tt>pt -transform=X</tt>:
<tt>nodup</tt> which excludes terms where a constant is duplicated,
and
<tt>nodupatom</tt> which excludes terms where an atomic constant is duplicated.
The latter, in particular, is useful as a filter in generation:
<pre>
gt -cat=Cl | pt -transform=nodupatom
</pre>
This gives a corpus where words don't (usually) occur twice in the same clause.
<p>
6/3 (AR) Generalized the <tt>gfe</tt> file format in two ways:
<ol>
<li> Use the real grammar parser, hence <tt>(in M.C "foo")</tt> expressions
may occur anywhere. But the <i>ad hoc</i> word substitution syntax is
abandoned: ordinary <tt>let</tt> (and <tt>where</tt>) expressions
can now be used instead.
<li> The resource may now be a treebank, not just a grammar. Parsing
is thus replaced by treebank lookup, which in most cases is faster.
</ol>
A minor novelty is that the <tt>--# -resource=FILE</tt> flag can now be
relative to <tt>GF_LIB_PATH</tt>, both for grammars and treebanks.
The flag <tt> --# -treebank=IDENT</tt> gives the language whose treebank
entries are used, in case of a multilingual treebank.
<p>
4/3 (AR) Added command <tt>use_treebank = ut</tt> for lookup in a treebank.
This command can be used as a fast substitute for parsing, but also as a
way to browse treebanks.
<pre>
ut "He adds this to that" | l -multi -- use treebank lookup as parser in translation
ut -assocs | grep "ComplV2" -- show all associations with ComplV2
</pre>
<p>
3/3 (AR) Added option <tt>-treebank</tt> to the <tt>i</tt> command. This adds treebanks to
the shell state. The possible file formats are
<ol>
<li> XML file with a multilingual treebank, produced by <tt>tb -xml</tt>
<li> tab-organized text file with a unilingual treebank, produced by <tt>ut -assocs</tt>
</ol>
Notice that the treebanks in shell state are unilingual, and have strings as keys.
Multilingual treebanks have trees as keys. In case 1, one unilingual treebank per
language is built in the shell state.
<p>
1/3 (AR) Added option <tt>-trees</tt> to the command <tt>tree_bank = tb</tt>.
By this option, the command just returns the trees in the treebank. It can be
used for producing new treebanks with the same trees:
<pre>
rf old.xml | tb -trees | tb -xml | wf new.xml
</pre>
Recall that only treebanks in the XML format can be read with the <tt>-trees</tt>
and <tt>-c</tt> flags.
<p>
1/3 (AR) A <tt>.gfe</tt> file can have a <tt>--# -path=PATH</tt> on its
second line. The file given on the first line (<tt>--# -resource=FILE</tt>)
is then read w.r.t. this path. This is useful if the resource file has
no path itself, which happens when it is gfc-only.
<p>
25/2 (AR) The flag <tt>preproc</tt> of the <tt>i</tt> command (and thereby
of <tt>gf</tt> itself) causes GF to apply a preprocessor to each source file
it reads.
<p>
8/2 (AR) The command <tt>tb = tree_bank</tt> for creating and testing against
multilingual treebanks. Example uses:
<pre>
gr -cat=S -number=100 | tb -xml | wf tb.xml -- random treebank into file
rf tb.txt | tb -c -- read comparison treebank from file
</pre>
<p>
10/1 (AR) Forbade variable binding inside negation and Kleene star
patterns.
<p>
7/1 (AR) Full set of regular expression patterns, with
as-patterns to enable variable bindings to matched expressions:
<ul>
<li> <i>p</i> <tt>+</tt> <i>q</i> : token consisting of <i>p</i> followed by <i>q</i>
<li> <i>p</i> <tt>*</tt> : token <i>p</i> repeated 0 or more times
(at most the length of the string to be matched)
<li> <tt>-</tt> <i>p</i> : matches anything that <i>p</i> does not match
<li> <i>x</i> <tt>@</tt> <i>p</i> : bind to <i>x</i> what <i>p</i> matches
<li> <i>p</i> <tt>|</tt> <i>q</i> : matches what either <i>p</i> or <i>q</i> matches
</ul>
The last three apply to all types of patterns, the first two only to token strings.
Example: plural formation in Swedish 2nd declension
(<i>pojke-pojkar, nyckel-nycklar, seger-segrar, bil-bilar</i>):
<pre>
plural2 : Str -> Str = \w -> case w of {
pojk + "e" => pojk + "ar" ;
nyck + "e" + l@("l" | "r" | "n") => nyck + l + "ar" ;
bil => bil + "ar"
} ;
</pre>
Semantics: variables are always bound to the <b>first match</b>, in the sequence defined
as the list <tt>Match p v</tt> as follows:
<pre>
Match (p1|p2) v = Match p1 v ++ Match p2 v
Match (p1+p2) s = [Match p1 s1 ++ Match p2 s2 | i <- [0..length s], (s1,s2) = splitAt i s]
Match p* s = Match "" s ++ Match p s ++ Match (p + p) s ++ ...
Match c v = [[]] if c == v -- for constant patterns c
Match x v = [[(x,v)]] -- for variable patterns x
Match x@p v = [[(x,v)]] + M if M = Match p v /= []
Match p v = [] otherwise -- failure
</pre>
Examples:
<ul>
<li> <tt>x + "e" + y</tt> matches <tt>"peter"</tt> with <tt>x = "p", y = "ter"</tt>
<li> <tt>x@("foo"*)</tt> matches any token with <tt>x = ""</tt>
<li> <tt>x + y@("er"*)</tt> matches <tt>"burgerer"</tt> with <tt>x = "burg", y = "erer"</tt>
</ul>
<p>
6/1 (AR) Concatenative string patterns to help morphology definitions...
This can be seen as a step towards regular expression string patterns.
The natural notation <tt>p1 + p2</tt> will be considered later.
<b>Note</b>. This was done on 7/1.
<p>
5/1/2006 (BB) New grammar printers <tt>slf_sub</tt> and <tt>slf_sub_graphviz</tt>
for creating SLF networks with sub-automata.
<hr>
22/12 <b>Release of GF 2.4</b>.
<p>
21/12 (AR) It now works to parse escaped string literals from command
line, and also string literals with spaces:
<pre>
gf examples/tram0/TramEng.gf
> p -lexer=literals "I want to go to \"Gustaf Adolfs torg\" ;"
QInput (GoTo (DestNamed "Gustaf Adolfs torg"))
</pre>
<p>
20/12 (AR) Support for full disjunctive patterns (<tt>P|Q</tt>) i.e.
not just on top level.
<p>
14/12 (BB) The command <tt>si</tt> (<tt>speech_input</tt>) which creates
a speech recognizer from a grammar for English and admits speech input
of strings has been added. The command uses an
<a href="http://htk.eng.cam.ac.uk/develop/atk.shtml">ATK</a> recognizer and
creates a recognition
network which accepts strings in the currently active grammar.
In order to use the <tt>si</tt> command,
you need to install the
<a href="http://www.cs.chalmers.se/~bringert/darcs/atkrec/">atkrec library</a>
and configure GF with <tt>./configure --with-atk</tt> before compiling.
You need to set two environment variables for the <tt>si</tt> command to
work. <tt>ATK_HOME</tt> should contain the path to your copy of ATK
and <tt>GF_ATK_CFG</tt> should contain the path to your GF ATK configuration
file. A default version of this file can be found in
<tt>GF/src/gf_atk.cfg</tt>.
<p>
11/12 (AR) Parsing of float literals now possible in object language.
Use the flag <tt>lexer=literals</tt>.
<p>
6/12 (AR) Accept <tt>param</tt> and <tt>oper</tt> definitions in
<tt>concrete</tt> modules. The definitions are just inlined in the
current module and not inherited. The purpose is to support rapid
prototyping of grammars.
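<p>
A minimal prototype concrete module using such inline definitions could look
as follows (the grammar and all names are purely illustrative):
<pre>
  concrete FooEng of Foo = {
    param Number = Sg | Pl ;
    oper regN : Str -> {s : Number => Str} =
      \x -> {s = table {Sg => x ; Pl => x + "s"}} ;
    lincat Item = {s : Number => Str} ;
    lin item = regN "thing" ;
  }
</pre>
Here the <tt>param</tt> and <tt>oper</tt> are local to the module and not inherited.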
<p>
2/12 (AR) The built-in type <tt>Float</tt> added to abstract syntax (and
resource). Values are stored as Haskell's <tt>Double</tt> precision
floats. For the syntax of float literals, see BNFC document.
NB: some bug still prevents parsing float literals in object
languages. <b>Bug fixed 11/12.</b>
<p>
1/12 (BB,AR) The command <tt>at = apply_transfer</tt>, which applies
a transfer function to a term. This is used for noncompositional
translation. Transfer functions are defined in a special transfer
language (file suffix <tt>.tr</tt>), which is compiled into a
run-time transfer core language (file suffix <tt>.trc</tt>).
The compiler is included in <tt>GF/transfer</tt>. The following is
a complete example of how to try out transfer:
<pre>
% cd GF/transfer
% make -- compile the trc compiler
% cd examples -- GF/transfer/examples
% ../compile_to_core -i../lib numerals.tr
% mv numerals.trc ../../examples/numerals
% cd ../../examples/numerals -- GF/examples/numerals
% gf
> i decimal.gf
> i BinaryDigits.gf
> i numerals.trc
> p -lang=Cncdecimal "123" | at num2bin | l
1 0 0 1 1 0 0 1 1 1 0
</pre>
Other relevant commands are:
<ul>
<li> <tt>i file.trc</tt>: import a transfer module
<li> <tt>pg -printer=transfer</tt>: create a syntax datatype in <tt>.tr</tt> format
</ul>
For more information on the commands, see <tt>help</tt>. Documentation on
the transfer language: to appear.
<p>
17/11 (AR) Made it possible for lexers to be nondeterministic.
The current implementation is simple-minded: the parser is sent
each lexing result in turn. The option <tt>-cut</tt> is used for
breaking after first lexing leading to successful parse. The only
nondeterministic lexer right now is <tt>-lexer=subseqs</tt>, which
first filters with <tt>-lexer=ignore</tt> (dropping words neither in
the grammar nor literals) and then starts ignoring other words from
longest to shortest subsequence. This is usable for parser tasks
of keyword spotting type, but expensive (2<sup>n</sup>) in long input.
A smarter implementation is therefore desirable.
<p>
14/11 (AR) Functions can be made unparsable (or "internal" as
in BNFC). This is done by <tt>i -noparse=file</tt>, where
the nonparsable functions are given in <tt>file</tt> using the
line format <tt>--# noparse Funs</tt>. This can be used e.g. to
rule out expensive parsing rules. It is used in
<tt>lib/resource/abstract/LangVP.gf</tt> to get parse values
structured with <tt>VP</tt>, which is obtained via transfer.
So far only the default (= old) parser generator supports this.
<p>
14/11 (AR) Removed the restrictions on what a lincat may look like.
Now any record type that has a value in GFC (i.e. without any
functions in it) can be used, e.g. {np : NP ; cn : Bool => CN}.
To display linearization values, only <tt>l -record</tt> shows
nice results.
<p>
9/11 (AR) GF shell state can now have several abstract syntaxes with
their associated concrete syntaxes. This allows e.g. parsing with
resource while testing an application. One can also have a
parse-transfer-lin chain from one abstract syntax to another.
<p>
7/11 (BB) Running commands can now be interrupted with Ctrl-C, without
killing the GF process. This feature is not supported on Windows.
<p>
1/11 (AR) Yet another method for adding probabilities: append
<tt> --# prob Double</tt> to the end of a line defining a function.
This can be (1) a <tt>.cf</tt> rule, (2) a <tt>fun</tt> rule, or
(3) a <tt>lin</tt> rule. The probability is attached to the
first identifier on the line.
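<p>
For example (with illustrative function names):
<pre>
  fun Walk : NP -> S ;  --# prob 0.7
  fun Run  : NP -> S ;  --# prob 0.3
</pre>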
<p>
1/11 (BB) Added generation of weighted SRGS grammars. The weights
are calculated from the function probabilities. The algorithm
for calculating the weights is not yet very good.
Use <tt>pg -printer=srgs_xml_prob</tt>.
<p>
31/10 (BB) Added option for converting grammars to SRGS grammars in XML format.
Use <tt>pg -printer=srgs_xml</tt>.
<p>
31/10 (AR) Probabilistic grammars. Probabilities can be used to
weight random generation (<tt>gr -prob</tt>) and to rank parse
results (<tt>p -prob</tt>). They are read from a separate file
(flag <tt>i -probs=File</tt>, format <tt>--# prob Fun Double</tt>)
or from the top-level grammar file itself (option <tt>i -prob</tt>).
To see the probabilities, use <tt>pg -printer=probs</tt>.
<br>
As a by-product, the probabilistic random generation algorithm is
available for any context-free abstract syntax. Use the flag
<tt>gr -cf</tt>. This algorithm is much faster than the
old (more general) one, but it may sometimes loop.
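<p>
A probability file is simply a list of such declarations, e.g. (with
illustrative function names):
<pre>
  --# prob Walk 0.7
  --# prob Run 0.3
</pre>
It is attached when importing the grammar, e.g. <tt>i -probs=walk.probs Grammar.gf</tt>.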
<p>
12/10 (AR) Flag <tt>-atoms=Int</tt> to the command <tt>gt = generate_trees</tt>
takes away all zero-argument functions except Int per category. In
this way, it is possible to generate a corpus illustrating each
syntactic structure even when the lexicon (which consists of
zero-argument functions) is large.
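<p>
For instance, with a large lexicon one can (illustratively) run
<pre>
  gt -cat=S -atoms=1
</pre>
to generate trees using at most one zero-argument function per category.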
<p>
6/10 (AR) New commands <tt>dc = define_command</tt> and
<tt>dt = define_tree</tt> to define macros in a GF session.
See <tt>help</tt> for details and examples.
<p>
5/10 (AR) Printing missing linearization rules:
<tt>pm -printer=missing</tt>. Command <tt>g = grep</tt>,
which works in a way similar to Unix grep.
<p>
5/10 (PL) Printing graphs with function and category dependencies:
<tt>pg -printer=functiongraph</tt>, <tt>pg -printer=typegraph</tt>.
<p>
20/9 (AR) Added optimization by <b>common subexpression elimination</b>.
It works on GFC modules and creates <tt>oper</tt> definitions for
subterms that occur more than once in <tt>lin</tt> definitions. These
<tt>oper</tt> definitions are automatically reinlined in functionalities
that don't support <tt>oper</tt>s in GFC. This conversion is done by
module and the <tt>oper</tt>s are not inherited. Moreover, the subterms
can contain free variables which means that the <tt>oper</tt>s are not
always well typed. However, since all variables in GFC are type-specific
(and local variables are <tt>lin</tt>-specific), this does not destroy
subject reduction or cause illegal captures.
<br>
The optimization is triggered by the flag <tt>optimize=OPT_subs</tt>,
where <tt>OPT</tt> is any of the other optimizations (see <tt>h -optimize</tt>).
The most aggressive value of the flag is <tt>all_subs</tt>. In experiments,
the size of a GFC module can shrink by 85% compared to plain <tt>all</tt>.
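<p>
The flag can also be set in the source file itself; a concrete module may
(illustratively) contain
<pre>
  flags optimize=all_subs ;
</pre>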
<p>
18/9 (AR) Removed superfluous spaces from GFC printing. This shrinks
the GFC size by 5-10%.
<p>
15/9 (AR) Fixed some bugs in dependent-type type checking of abstract
modules at compile time. The type checker is more severe now, which means
that some old grammars may fail to compile - but this is usually the
right result. However, the type checker of <tt>def</tt> judgements still
needs work.
<p>
14/9 (AR) Added printing of grammars to a format without parameters, in
the spirit of Peano's "Latino sine flexione". The command <tt>pg -unpar</tt>
does the trick, and the result can be saved in a <tt>gfcm</tt> file. The generated
concrete syntax modules get the prefix <tt>UP_</tt>. The translation is briefly:
<pre>
(P => T)* = T*
(t ! p)* = t*
(table {p => t ; ...})* = t*
</pre>
In order for this to be maximally useful, the grammar should be written in such
a way that the first value of every parameter type is the desired one. For
instance, in Peano's case it would be the ablative for noun cases, the singular for
numbers, and the 2nd person singular imperative for verb forms.
<p>
14/9 (BB) Added finite state approximation of grammars.
Internally the conversion is done <tt>cfg -&gt; regular -&gt; fa -&gt; slf</tt>, so the
different printers can be used to check the output of each stage.
The new options are:
<dl>
<dt><tt>pg -printer=slf</tt></dt>
<dd>A finite automaton in the HTK SLF format.</dd>
<dt><tt>pg -printer=slf_graphviz</tt></dt>
<dd>The same FA as in SLF, but in Graphviz format.</dd>
<dt><tt>pg -printer=fa_graphviz</tt></dt>
<dd>A finite automaton with labelled edges, instead of labelled nodes which SLF has.</dd>
<dt><tt>pg -printer=regular</tt></dt>
<dd>A regular grammar in a simple BNF.</dd>
</dl>
<p>
4/9 (AR) Added the option <tt>pg -printer=stat</tt> to show
statistics of gfc compilation result. To be extended with new information.
The most important stats now are the top-40 sized definitions.
<p>
<hr>
1/7 <b>Release of GF 2.3</b>.
<p>
1/7 (AR) Added the flag <tt>-o</tt> to the <tt>vt</tt> command
to just write the <tt>.dot</tt> file without going to <tt>.ps</tt>
(cf. 20/6).
<p>
29/6 (AR) The printer used by Embedded Java GF Interpreter
(<tt>pm -header</tt>) now produces
working code from all optimized grammars - hence you need not select a
weaker optimization just to use the interpreter. However, the
optimization <tt>-optimize=share</tt> usually produces smaller object
grammars because the "unoptimizer" just undoes all optimizations.
(This is to be considered a temporary solution until the interpreter
knows how to handle stronger optimizations.)
<p>
27/6 (AR) The flag <tt>flags optimize=noexpand</tt> placed in a
resource module prevents the optimization phase of the compiler when
the <tt>.gfr</tt> file is created. This can prevent serious code
explosion, but it will also make the processing of modules using the
resource slower. A favourable example is <tt>lib/resource/finnish/ParadigmsFin</tt>.
<p>
23/6 (HD,AR) The new editor GUI <tt>gfeditor</tt> by Hans-Joachim
Daniels can now be used. It is based on Janna Khegai's <tt>jgf</tt>.
New functionality includes HTML display (<tt>gfeditor -h</tt>) and
programmable refinement tooltips.
<p>
23/6 (AR) The flag <tt>unlexer=finnish</tt> can be used to bind
Finnish suffixes (e.g. possessives) to preceding words. The GF source
notation is e.g. <tt>"isä" ++ "&*" ++ "nsa" ++ "&*" ++ "ko"</tt>,
which unlexes to <tt>"isänsäkö"</tt>. There is no corresponding lexer
support yet.
<p>
22/6 (PL,AR) The MCFG parser (<tt>p -mcfg</tt>) now works on all
optimized grammars - hence you need not select a weaker optimization
to use this parser. The same concerns the CFGM printer (<tt>pm -printer=cfgm</tt>).
<p>
20/6 (AR) Added the command <tt>visualize_tree</tt> = <tt>vt</tt>, to
display syntax trees graphically. Like <tt>vg</tt>, this command uses
GraphViz and Ghostview. The foremost use is to pipe the parser to this
command.
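For example, with a grammar loaded, one can parse a string and display the
resulting tree (the category <tt>S</tt> and the string here are only placeholders
for whatever grammar is in use):
<pre>
  p -cat=S "some sentence" | vt
</pre>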
<p>
17/6 (BB) There is now support for lists in GF abstract syntax.
A list category is declared as:
<pre>
cat [C]
</pre>
or
<pre>
cat [C]{n}
</pre>
where <tt>C</tt> is a category and <tt>n</tt> is a non-negative integer.
<tt>cat [C]</tt> is equivalent to <tt>cat [C]{0}</tt>. List category
syntax can be used wherever categories are used.
<p>
<tt>cat [C]{n}</tt> is equivalent to the declarations:
<pre>
cat ListC
fun BaseC : C^n -&gt; ListC
fun ConsC : C -&gt; ListC -&gt; ListC
</pre>
where <tt>C^0 -&gt; X</tt> means <tt>X</tt>, and <tt>C^m</tt> (where
m &gt; 0) means <tt>C -&gt; C^(m-1)</tt>.
<p>
A lincat declaration of the form:
<pre>
lincat [C] = T
</pre>
is equivalent to
<pre>
lincat ListC = T
</pre>
The linearizations of the list constructors are written
just like they would be if the function declarations above
had been made manually, e.g.:
<pre>
lin BaseC x_1 ... x_n = t
lin ConsC x xs = t'
</pre>
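For example, the declaration below (using a hypothetical category <tt>Adj</tt>)
<pre>
  cat [Adj]{2}
</pre>
is shorthand for
<pre>
  cat ListAdj
  fun BaseAdj : Adj -&gt; Adj -&gt; ListAdj
  fun ConsAdj : Adj -&gt; ListAdj -&gt; ListAdj
</pre>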
<p>
10/6 (AR) Preprocessing of <tt>.gfe</tt> files can now be performed as part of
any grammar compilation. The flag <tt>-ex</tt> causes GF to look for
<tt>.gfe</tt> files and preprocess those that are newer
than the corresponding <tt>.gf</tt> files. The files are first sorted
and grouped by resource, so that each resource needs to be compiled only once.
<p>
10/6 (AR) Editor GUI can now be alternatively invoked by the shell
command <tt>gf -edit</tt> (equivalent to <tt>jgf</tt>).
<p>
10/6 (AR) Editor GUI command <tt>pc Int</tt> to pop <tt>Int</tt>
items from the clipboard.
<p>
4/6 (AR) Sequence of commands in the Java editor GUI now possible.
The commands are separated by <tt> ;; </tt> (notice the space on
both sides of the two semicolons). Such a sequence can be sent
from the "GF Command" pop-up field, but is mostly intended
for external processes that communicate with GF.
<p>
3/6 (AR) The format <tt>.gfe</tt> defined to support
<b>grammar writing by examples</b>. Files of this format are first
converted to <tt>.gf</tt> files by the command
<pre>
gf -examples File.gfe
</pre>
See <a href="../lib/resource/doc/example/QuestionsI.gfe">
<tt>../lib/resource/doc/example/QuestionsI.gfe</tt></a>
for an example.
<p>
31/5 (AR) Default of <tt>p -rawtrees=k</tt> changed to 999999.
<p>
31/5 (AR) Support for restricted inheritance. Syntax:
<pre>
M -- inherit everything from M, as before
M [a,b,c] -- only inherit constants a,b,c
M-[a,b,c] -- inherit everything except a,b,c
</pre>
Caution: there is no check yet for completeness and
consistency, so restricted inheritance can create
run-time failures.
<p>
29/5 (AR) Parser support for reading GFC files line by line.
The category <tt>Line</tt> in <tt>GFC.cf</tt> can be used
as entrypoint instead of <tt>Grammar</tt> to achieve this.
<p>
28/5 (AR) Environment variables and path wild cards.
<ul>
<li> <tt>GF_LIB_PATH</tt> gives the location of <tt>GF/lib</tt>
<li> <tt>GF_GRAMMAR_PATH</tt> gives a list of directories appended
to the explicitly given path
<li> <tt>DIR/*</tt> is expanded to the union of all subdirectories
of <tt>DIR</tt>
</ul>
<p>
26/5/2005 (BB) Notation for list categories.
</body>
</html>

The Module System of GF
Aarne Ranta
8/4/2005 - 5/7/2007
%!postproc(html): #SUB1 <sub>1</sub>
%!postproc(html): #SUBk <sub>k</sub>
%!postproc(html): #SUBi <sub>i</sub>
%!postproc(html): #SUBm <sub>m</sub>
%!postproc(html): #SUBn <sub>n</sub>
%!postproc(html): #SUBp <sub>p</sub>
%!postproc(html): #SUBq <sub>q</sub>
% to compile: txt2tags --toc -thtml modulesystem.txt
A GF grammar consists of a set of **modules**, which can be
combined in different ways to build different grammars.
There are several different **types of modules**:
- ``abstract``
- ``concrete``
- ``resource``
- ``interface``
- ``instance``
- ``incomplete concrete``
We will go through the module types in this order, which is also
their order of "importance" from the most basic to
the more advanced ones.
This document presupposes knowledge of GF judgements and expressions, which can
be gained from the [GF tutorial tutorial/gf-tutorial2.html]. It aims
to give a systematic description of the module system;
some tutorial information is repeated to make the document
self-contained.
=The principal module types=
==Abstract syntax==
Any GF grammar that is used in an application
will probably contain at least one module
of the ``abstract`` module type. Here is an example of
such a module, defining a fragment of propositional logic.
```
abstract Logic = {
cat Prop ;
fun Conj : Prop -> Prop -> Prop ;
fun Disj : Prop -> Prop -> Prop ;
fun Impl : Prop -> Prop -> Prop ;
fun Falsum : Prop ;
}
```
The **name** of this module is ``Logic``.
An ``abstract`` module defines an **abstract syntax**, which
is a language-independent representation of a fragment of language.
It consists of two kinds of **judgements**:
- ``cat`` judgements telling what **categories** there are
(types of abstract syntax trees)
- ``fun`` judgements telling what **functions** there are
(to build abstract syntax trees)
There can also be ``def`` and ``data`` judgements in an
abstract syntax.
===Compilation of abstract syntax===
The GF grammar compiler expects to find the module ``Logic`` in a file named
``Logic.gf``. When the compiler is run, it produces
another file, named ``Logic.gfc``. This file is in the
format called **canonical GF**, which is the "machine language"
of GF. Next time that the module ``Logic`` is needed in
compiling a grammar, it can be read from the compiled (``gfc``)
file instead of the source (``gf``) file, unless the source
has been changed after the compilation.
==Concrete syntax==
In order for a GF grammar to describe a concrete language, the abstract
syntax must be completed with a **concrete syntax** of it.
For this purpose, we use modules of type ``concrete``: for instance,
```
concrete LogicEng of Logic = {
lincat Prop = {s : Str} ;
lin Conj a b = {s = a.s ++ "and" ++ b.s} ;
lin Disj a b = {s = a.s ++ "or" ++ b.s} ;
lin Impl a b = {s = "if" ++ a.s ++ "then" ++ b.s} ;
lin Falsum = {s = ["we have a contradiction"]} ;
}
```
The module ``LogicEng`` is a concrete syntax ``of`` the
abstract syntax ``Logic``. The GF grammar compiler checks that
the concrete syntax is valid with respect to the abstract syntax ``of``
which it is claimed to be. The validity requires that there has to be
- a ``lincat`` judgement for each ``cat`` judgement, telling what the
**linearization types** of categories are
- a ``lin`` judgement for each ``fun`` judgement, telling what the
**linearization functions** corresponding to functions are
Validity also requires that the linearization functions defined by
``lin`` judgements are type-correct with respect to the
linearization types of the arguments and value of the function.
There can also be ``lindef`` and ``printname`` judgements in a
concrete syntax.
==Top-level grammar==
When a ``concrete`` module is successfully compiled, a ``gfc``
file is produced in the same way as for ``abstract`` modules. The
pair of an ``abstract`` and a corresponding ``concrete`` module
is a **top-level grammar**, which can be used in the GF system to
perform various tasks. The most fundamental tasks are
- **linearization**: take an abstract syntax tree and find the corresponding string
- **parsing**: take a string and find the corresponding abstract syntax
trees (which can be zero, one, or many)
In the current grammar, infinitely many trees and strings are recognized, although
none of them are very interesting. For example, the tree
```
Impl (Disj Falsum Falsum) Falsum
```
has the linearization
```
if we have a contradiction or we have a contradiction then we have a contradiction
```
which in turn can be parsed uniquely as that tree.
===Compiling top-level grammars===
When GF compiles the module ``LogicEng`` it also has to compile
all modules that it **depends** on (in this case, just ``Logic``).
The compilation process starts with dependency analysis to find
all these modules, recursively, starting from the explicitly imported one.
The compiler then reads either ``gf`` or ``gfc`` files, in
a dependency order. The decision on which files to read depends on
time stamps and dependencies in a natural way, so that all and only
those modules that have to be compiled are compiled. (This behaviour can
be changed with flags, see below.)
===Using top-level grammars===
To use a top-level grammar in the GF system, one uses the ``import``
command (short name ``i``). For instance,
```
i LogicEng.gf
```
It is also possible to specify the imported grammar(s) on the command
line when invoking GF:
```
gf LogicEng.gf
```
Various **compilation flags** can be added to both ways of compiling a module:
- ``-src`` forces compilation from source files
- ``-v`` gives more verbose information on compilation
- ``-s`` makes compilation silent (except if it fails with an error message)
A complete list of flags can be obtained in GF by ``help i``.
Importing a grammar makes it visible in GF's **internal state**. To see
what modules are available, use the command ``print_options`` (``po``).
You can empty the state with the command ``empty`` (``e``); this is
needed if you want to read in grammars with a different abstract syntax
than the current one without exiting GF.
Grammar modules can reside in different directories. They can then be found
by means of a **search path**, which is a flag such as
```
-path=.:api/toplevel:prelude
```
given to the ``import`` command or the shell command invoking GF.
(It can also be defined in the grammar file; see below.) The compiler
writes every ``gfc`` file in the same directory as the corresponding
``gf`` file.
The ``path`` is relative to the working directory ``pwd``, so that
all directories listed are primarily interpreted as subdirectories of
``pwd``. Secondarily, they are searched relative to the value of the
environment variable ``GF_LIB_PATH``, which is by default set to
``/usr/local/share/GF``.
Parsing and linearization can be performed with the ``parse``
(``p``) and ``linearize`` (``l``) commands, respectively.
For instance,
```
> l Impl (Disj Falsum Falsum) Falsum
if we have a contradiction or we have a contradiction then we have a contradiction
> p -cat=Prop "we have a contradiction"
Falsum
```
Notice that the ``parse`` command needs the parsing category
as a flag. This is necessary since a grammar can have several
possible parsing categories ("entry points").
==Multilingual grammar==
One ``abstract`` syntax can have several ``concrete`` syntaxes.
Here are two new ones for ``Logic``:
```
concrete LogicFre of Logic = {
lincat Prop = {s : Str} ;
lin Conj a b = {s = a.s ++ "et" ++ b.s} ;
lin Disj a b = {s = a.s ++ "ou" ++ b.s} ;
lin Impl a b = {s = "si" ++ a.s ++ "alors" ++ b.s} ;
lin Falsum = {s = ["nous avons une contradiction"]} ;
}
concrete LogicSymb of Logic = {
lincat Prop = {s : Str} ;
lin Conj a b = {s = "(" ++ a.s ++ "&" ++ b.s ++ ")"} ;
lin Disj a b = {s = "(" ++ a.s ++ "v" ++ b.s ++ ")"} ;
lin Impl a b = {s = "(" ++ a.s ++ "->" ++ b.s ++ ")"} ;
lin Falsum = {s = "_|_"} ;
}
```
The four modules ``Logic``, ``LogicEng``, ``LogicFre``, and
``LogicSymb`` together form a **multilingual grammar**, in which
it is possible to perform parsing and linearization with respect to any
of the concrete syntaxes. As a combination of parsing and linearization,
one can also perform **translation** from one language to another.
(By **language** we mean the set of expressions generated by one
concrete syntax.)
===Using multilingual grammars===
Any combination of abstract syntax and corresponding concrete syntaxes
is thus a multilingual grammar. With many languages and other enrichments
(as described below), a multilingual grammar easily grows to the size of
tens of modules. The grammar developer, having finished her job, can
package the result in a **multilingual canonical grammar**, a file
with the suffix ``.gfcm``. For instance, to compile the set of grammars
described by now, the following sequence of GF commands can be used:
```
i LogicEng.gf
i LogicFre.gf
i LogicSymb.gf
pm | wf logic.gfcm
```
The "end user" of the grammar only needs the file ``logic.gfcm`` to
access all the functionality of the multilingual grammar. It can be
imported in the GF system in the same way as ``.gf`` files. But
it can also be used in the
[Embedded Java Interpreter for GF http://www.cs.chalmers.se/~bringert/gf/gf-java.html]
to build Java programs of which the multilingual grammar functionalities
(linearization, parsing, translation) form a part.
In a multilingual grammar, the concrete syntax module names work as
names of languages that can be selected for linearization and parsing:
```
> l -lang=LogicFre Impl Falsum Falsum
si nous avons une contradiction alors nous avons une contradiction
> l -lang=LogicSymb Impl Falsum Falsum
( _|_ -> _|_ )
> p -cat=Prop -lang=LogicSymb "( _|_ & _|_ )"
Conj Falsum Falsum
```
The option ``-multi`` gives linearization to all languages:
```
> l -multi Impl Falsum Falsum
if we have a contradiction then we have a contradiction
si nous avons une contradiction alors nous avons une contradiction
( _|_ -> _|_ )
```
Translation can be obtained by using a **pipe** from a parser
to a linearizer:
```
> p -cat=Prop -lang=LogicSymb "( _|_ & _|_ )" | l -lang=LogicEng
if we have a contradiction then we have a contradiction
```
==Resource modules==
The ``concrete`` modules shown above would look much nicer if
we used the main idea of functional programming: avoid repetitive
code by using **functions** that capture repeated patterns of
expressions. A collection of such functions can be a valuable
**resource** for a programmer, reusable in many different
top-level grammars. Thus we introduce the ``resource``
module type, with the first example
```
resource Util = {
oper SS : Type = {s : Str} ;
oper ss : Str -> SS = \s -> {s = s} ;
oper paren : Str -> Str = \s -> "(" ++ s ++ ")" ;
oper infix : Str -> SS -> SS -> SS = \h,x,y ->
ss (x.s ++ h ++ y.s) ;
oper infixp : Str -> SS -> SS -> SS = \h,x,y ->
ss (paren (infix h x y)) ;
}
```
Modules of ``resource`` type have two forms of judgement:
- ``oper`` defining auxiliary operations
- ``param`` defining parameter types
A ``resource`` can be used in a ``concrete`` (or another
``resource``) by ``open``ing it. This means that
all operations (and parameter types) defined in the resource
module become usable in the module that opens it. For instance,
we can rewrite the module ``LogicSymb`` much more concisely:
```
concrete LogicSymb of Logic = open Util in {
lincat Prop = SS ;
lin Conj = infixp "&" ;
lin Disj = infixp "v" ;
lin Impl = infixp "->" ;
lin Falsum = ss "_|_" ;
}
```
What happens when this variant of ``LogicSymb`` is
compiled is that the ``oper``-defined constants
of ``Util`` are **inlined** in the
right-hand-sides of the judgements of ``LogicSymb``,
and these expressions are **partially evaluated**, i.e.
computed as far as possible. The generated ``gfc`` file
will look just like the file generated for the first version
of ``LogicSymb`` - at least, it will do the same job.
Several ``resource`` modules can be ``open``ed
at the same time. If the modules contain the same names, the
conflict can be resolved by **qualified** opening and
reference. For instance,
```
concrete LogicSymb of Logic = open Util, Prelude in { ...
} ;
```
(where ``Prelude`` is a standard library of GF) brings
into scope two definitions of the constant ``SS``.
To specify which one is used, you can write
``Util.SS`` or ``Prelude.SS`` instead of just ``SS``.
You can also introduce abbreviations to avoid long qualifiers, e.g.
```
concrete LogicSymb of Logic = open (U=Util), (P=Prelude) in { ...
} ;
```
which means that you can write ``U.SS`` and ``P.SS``.
Judgements of ``param`` and ``oper`` forms may also be used
in ``concrete`` modules, and they are then considered local
to those modules, i.e. they are not exported.
===Compiling resource modules===
The compilation of a ``resource`` module differs
from the compilation of ``abstract`` and
``concrete`` modules because ``oper`` operations
do not in general have values in ``gfc``. A ``gfc``
file //is// generated, but it contains only
``param`` judgements (also recall that ``oper``s
are inlined in their top-level use sites, so it is not
necessary to save them in the compiled grammar).
However, since computing the operations over and over
again can be time-consuming, and since type checking
``resource`` modules also takes time, a third kind
of file is generated for resource modules: a ``.gfr``
file. This file is written in the GF source code notation,
but it is type checked and type annotated, and ``oper``s
are computed as far as possible.
If you look at any ``gfc`` or ``gfr`` file generated
by the GF compiler, you see that all names have been replaced by
their qualified variants. This qualification is one of the first steps
(after parsing) that the compiler performs. As for the commands in the GF shell,
some output qualified names and some do not; the difference does not always
result from firm principles.
===Using resource modules===
The typical use is through ``open`` in a
``concrete`` module, which means that
``resource`` modules are not imported on their own.
However, in the developing and testing phase of grammars, it
can be useful to evaluate ``oper``s with different
arguments. To prevent them from being thrown away after inlining, the
``-retain`` option can be used:
```
> i -retain Util.gf
```
The command ``compute_concrete`` (``cc``)
can now be used for evaluating expressions that may contain
operations defined in ``Util``:
```
> cc ss (paren "foo")
{s = "(" ++ "foo" ++ ")"}
```
To find out what ``oper``s are available for a given type,
the command ``show_operations`` (``so``) can be used:
```
> so SS
Util.ss : Str -> SS ;
Util.infix : Str -> SS -> SS -> SS ;
Util.infixp : Str -> SS -> SS -> SS ;
```
==Inheritance==
The most characteristic modularity of GF lies in the division of
grammars into ``abstract``, ``concrete``, and
``resource`` modules. This permits writing multilingual
grammar and sharing the maximum of code between different
languages.
In addition to this special kind of modularity, GF provides **inheritance**,
which is familiar from other programming languages (in particular,
object-oriented ones). Inheritance means that a module inherits all
judgements from another module; we also say that it **extends**
the other module. Inheritance is useful to divide big grammars into
smaller units, and also to reuse the same units in different bigger
grammars.
The first example of inheritance is for abstract syntax. Let us
extend the module ``Logic`` to ``Arithmetic``:
```
abstract Arithmetic = Logic ** {
cat Nat ;
fun Even : Nat -> Prop ;
fun Odd : Nat -> Prop ;
fun Zero : Nat ;
fun Succ : Nat -> Nat ;
}
```
In parallel with the extension of the abstract syntax
``Logic`` to ``Arithmetic``, we can extend
the concrete syntax ``LogicEng`` to ``ArithmeticEng``:
```
concrete ArithmeticEng of Arithmetic = LogicEng ** open Util in {
lincat Nat = SS ;
lin Even x = ss (x.s ++ "is" ++ "even") ;
lin Odd x = ss (x.s ++ "is" ++ "odd") ;
lin Zero = ss "zero" ;
lin Succ x = ss ("the" ++ "successor" ++ "of" ++ x.s) ;
}
```
Another extension of ``Logic`` is ``Geometry``,
```
abstract Geometry = Logic ** {
cat Point ;
cat Line ;
fun Incident : Point -> Line -> Prop ;
}
```
The corresponding concrete syntax is left as an exercise.
===Multiple inheritance===
Inheritance can be **multiple**, which means that a module
may extend many modules at the same time. Suppose, for instance,
that we want to build a module for mathematics covering both
arithmetic and geometry, and the underlying logic. We then write
```
abstract Mathematics = Arithmetic, Geometry ** {
} ;
```
We could of course add some new judgements in this module, but
it is not necessary to do so. If no new judgements are added, the
module body can be omitted:
```
abstract Mathematics = Arithmetic, Geometry ;
```
The module ``Mathematics`` shows that it is possible
to extend a module already built by extension. The correctness
criterion for extensions is that the same name
(``cat``, ``fun``, ``oper``, or ``param``)
may not be defined twice in the resulting union of names.
That the names defined in ``Logic`` are "inherited twice"
by ``Mathematics`` (via both ``Arithmetic`` and
``Geometry``) is no violation of this rule; the usual
problems of multiple inheritance do not arise, since
the definitions of inherited constants cannot be changed.
===Restricted inheritance===
Inheritance can be **restricted**, which means that only some of
the constants are inherited. There are two dual notations for this:
```
A [f,g]
```
meaning that //only// ``f`` and ``g`` are inherited from ``A``, and
```
A-[f,g]
```
meaning that //everything except// ``f`` and ``g`` is inherited from ``A``.
Constants that are not inherited may be redefined in the inheriting module.
===Compiling inheritance===
Inherited judgements are not copied into the inheriting modules.
Instead, an **indirection** is created for each inherited name,
as can be seen by looking into the generated ``gfc`` (and
``gfr``) files. Thus for instance the names
```
Mathematics.Prop Arithmetic.Prop Geometry.Prop Logic.Prop
```
all refer to the same category, declared in the module
``Logic``.
===Inspecting grammar hierarchies===
The command ``visualize_graph`` (``vg``) shows the
dependency graph in the current GF shell state. The graph can
also be saved in a file and used e.g. in documentation, by the
command ``print_multi -graph`` (``pm -graph``).
The ``vg`` command uses the free software packages Graphviz (command ``dot``)
and Ghostscript (command ``gv``).
==Reuse of top-level grammars as resources==
Top-level grammars have a straightforward translation to
``resource`` modules. The translation concerns
pairs of abstract-concrete judgements:
```
cat C ; ===> oper C : Type = T ;
lincat C = T ;
fun f : A ; ===> oper f : A = t ;
lin f = t ;
```
Due to this translation, a ``concrete`` module
can be ``open``ed in the same way as a
``resource`` module; the translation is done
on the fly (it is computationally very cheap).
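For instance, opening the pair ``Logic``/``LogicEng`` above gives access to
operations roughly as if one had written (a sketch; the lock fields described
further down are omitted):
```
  oper Prop : Type = {s : Str} ;
  oper Conj : Prop -> Prop -> Prop =
    \a,b -> {s = a.s ++ "and" ++ b.s} ;
```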
Modular grammar engineering often means that some grammarians
focus on the semantics of the domain whereas others take care
of linguistic details. Thus a typical reuse opens a
linguistically oriented **resource grammar**,
```
abstract Resource = {
cat S ; NP ; A ;
fun PredA : NP -> A -> S ;
}
concrete ResourceEng of Resource = {
lincat S = ... ;
lin PredA = ... ;
}
```
The **application grammar**, instead of giving linearizations
explicitly, just reduces them to categories and functions in the
resource grammar:
```
concrete ArithmeticEng of Arithmetic = LogicEng ** open ResourceEng in {
lincat Nat = NP ;
lin Even x = PredA x (regA "even") ;
}
```
If the resource grammar is only capable of generating grammatically
correct expressions, then the grammaticality of the application
grammar is also guaranteed: the type checker of GF is used as
grammar checker.
To guarantee distinctions between categories that have
the same linearization type, the actual translation used
in GF adds to every linearization type and linearization
a **lock field**,
```
cat C ; ===> oper C : Type = T ** {lock_C : {}} ;
lincat C = T ;
fun f : C_1 ... C_n -> C ; ===> oper f : C_1 ... C_n -> C = \x_1,...,x_n ->
lin f = t ;                           t x_1 ... x_n ** {lock_C = <>};
```
(Notice that the latter translation is type-correct because of
record subtyping, which means that ``t`` can ignore the
lock fields of its arguments.) An application grammarian who
only uses resource grammar categories and functions never
needs to write these lock fields herself. Having to do so
serves as a warning that the grammaticality guarantee given
by the resource grammar no longer holds.
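Applied to the resource grammar ``Resource``/``ResourceEng`` above, this
translation exports, for instance (a sketch, where ``T`` stands for whatever
the ``lincat`` of ``S`` is in ``ResourceEng``):
```
  oper S : Type = T ** {lock_S : {}} ;
```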
**Note**. The lock field mechanism is experimental, and may be changed
to a stronger abstraction mechanism in the future. This may result in
hand-written lock fields ceasing to work.
=Additional module types=
==Interfaces, instances, and incomplete grammars==
One difference between top-level grammars and ``resource``
modules is that the former systematically separate the
declarations of categories and functions from their definitions.
In the reuse translation creating an ``oper`` judgement,
the declaration coming from the ``abstract`` module is put
together with the definition coming from the ``concrete``
module.
However, the separation of declarations and definitions is so
useful a notion that GF also has specific module types that
split ``resource`` modules into two parts. In this splitting,
an ``interface`` module corresponds to an abstract syntax,
in giving the declarations of operations (and parameter types).
For instance, a generic markup interface would look as follows:
```
interface Markup = open Util in {
oper Boldface : Str -> Str ;
oper Heading : Str -> Str ;
oper markupSS : (Str -> Str) -> SS -> SS = \f,r ->
ss (f r.s) ;
}
```
The definitions of the constants declared in an ``interface``
are given in an ``instance`` module (which is always ``of``
an interface, in the same way as a ``concrete`` is always
``of`` an abstract). The following ``instance``s
define markup in HTML and LaTeX.
```
instance MarkupHTML of Markup = open Util in {
oper Boldface s = "<b>" ++ s ++ "</b>" ;
oper Heading s = "<h2>" ++ s ++ "</h2>" ;
}
instance MarkupLatex of Markup = open Util in {
oper Boldface s = "\\textbf{" ++ s ++ "}" ;
oper Heading s = "\\section{" ++ s ++ "}" ;
}
```
Notice that both ``interface``s and ``instance``s may
``open`` ``resource``s (and also reused top-level grammars).
An ``interface`` may moreover define some of the operations it
declares; these definitions are inherited by all instances and cannot
be changed in them. Inheritance by module extension
is possible, as always, between modules of the same type.
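A module that opens ``Markup`` can then use these operations in its
linearization rules; for instance (the category ``Title`` and function
``MkTitle`` are hypothetical):
```
  lin MkTitle t = markupSS Heading t ;
```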
===Using an interface===
An ``interface`` or an ``instance``
can be ``open``ed in
a ``concrete`` using the same syntax as when opening
a ``resource``. For an ``instance``, the semantics
is the same as when opening the definitions together with
the type signatures - one can think of an ``interface``
and an ``instance`` of it together forming an ordinary
``resource``. Opening an ``interface``, however,
is different: functions that are only declared without
having a definition cannot be compiled (inlined); neither
can functions whose definitions depend on undefined functions.
A module that ``open``s an ``interface`` is therefore
**incomplete**, and has to be **completed** with an
``instance`` of the interface to become complete. To make
this situation clear, GF requires any module that opens an
``interface`` to be marked as ``incomplete``. Thus
the module
```
incomplete concrete DocMarkup of Doc = open Markup in {
...
}
```
uses the interface ``Markup`` to place markup in
chosen places in its linearization rules, but the
implementation of markup - whether in HTML or in LaTeX - is
left unspecified. This is a powerful way of sharing
the code of a whole module with just differences in
the definitions of some constants.
Another terminology for ``incomplete`` modules is
**parametrized modules** or **functors**.
The ``interface`` gives the list of parameters
that the functor depends on.
===Instantiating an interface===
To complete an ``incomplete`` module, each ``interface``
that it opens has to be provided with an ``instance``. The following
syntax is used for this:
```
concrete DocHTML of Doc = DocMarkup with (Markup = MarkupHTML) ;
```
Instantiation of ``Markup`` with ``MarkupLatex`` is
another one-liner.
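Spelled out in full (the module name ``DocLatex`` is our choice):
```
  concrete DocLatex of Doc = DocMarkup with (Markup = MarkupLatex) ;
```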
If more interfaces than one are instantiated, a comma-separated
list of equations in parentheses is used, e.g.
```
concrete MusicIta = MusicI with
(Syntax = SyntaxIta), (LexMusic = LexMusicIta) ;
```
This example shows a common design pattern for building applications:
the concrete syntax is a functor on the generic resource grammar library
interface ``Syntax`` and a domain-specific lexicon interface, here
``LexMusic``.
All interfaces that are ``open``ed in the completed module
must be instantiated.
Notice that the completion of an ``incomplete`` module
may at the same time extend modules of the same type (which need
not be completions). It can also add new judgements in a module body,
and restrict inheritance from the functor.
```
concrete MusicIta = MusicI - [f] with
(Syntax = SyntaxIta), (LexMusic = LexMusicIta) ** {
lin f = ...
} ;
```
===Compiling interfaces, instances, and parametrized modules===
Interfaces, instances, and parametric modules are purely a
front-end feature of GF: these module types do not exist in
the ``gfc`` and ``gfr`` formats. The compiler has
nevertheless to keep track of their dependencies and modification
times. Here is a summary of how they are compiled:
- an ``interface`` is compiled into a ``resource`` with an empty body
- an ``instance`` is compiled into a ``resource`` in union with its
``interface``
- an ``incomplete`` module (``concrete`` or ``resource``) is compiled
into a module of the same type with an empty body
- a completion module (``concrete`` or ``resource``) is compiled
into a module of the same type by compiling its functor so that, instead of
each ``interface``, its given ``instance`` is used
This means that some generated code is duplicated, because those operations that
do have complete definitions in an ``interface`` are copied to each of
the ``instances``.
=Summary of module syntax and semantics=
==Abstract syntax modules==
Syntax:
``abstract`` A ``=`` (A#SUB1,...,A#SUBn ``**``)?
``{``J#SUB1 ``;`` ... ``;`` J#SUBm ``; }``
where
- n, m >= 0
- each //A#SUBi// is itself an abstract module,
possibly with restrictions on inheritance, i.e. //A#SUBi//``-[``//f,..,g//``]``
or //A#SUBi//``[``//f,..,g//``]``
- each //J#SUBi// is a judgement of one of the forms
``cat, fun, def, data``
Semantic conditions:
- all inherited names declared in each //A#SUBi// and //A// must be distinct
- names in restriction lists must be defined in the restricted module
- inherited constants may not depend on names excluded by restriction
==Concrete syntax modules==
Syntax:
``incomplete``? ``concrete`` C ``of`` A ``=``
(C#SUB1,...,C#SUBn ``**``)?
(``open`` O#SUB1,...,O#SUBk ``in``)?
``{``J#SUB1 ``;`` ... ``;`` J#SUBm ``; }``
where
- n, k, m >= 0
- //A// is an abstract module
- each //C#SUBi// is a concrete module,
possibly with restrictions on inheritance, i.e. //C#SUBi//``-[``//f,..,g//``]``
- each //O#SUBi// is an open specification, of one of the forms
- //R//
- ``(``//Q//``=``//R//``)``
where //R// is a resource, instance, or concrete, and //Q// is any identifier
- each //J#SUBi// is a judgement of one of the forms
``lincat, lin, lindef, printname``; also the forms ``oper, param`` are
allowed, but they cannot be inherited.
If the modifier ``incomplete`` appears, then any //R// in
an open specification may also be an interface or an abstract.
Semantic conditions:
- each ``cat`` judgement in //A//
must have a corresponding, unique
``lincat`` judgement in //C//
- each ``fun`` judgement in //A//
must have a corresponding, unique
``lin`` judgement in //C//
- names in restriction lists must be defined in the restricted module
- inherited constants may not depend on names excluded by restriction
==Resource modules==
Syntax:
``resource`` R ``=``
(R#SUB1,...,R#SUBn ``**``)?
(``open`` O#SUB1,...,O#SUBk ``in``)?
``{``J#SUB1 ``;`` ... ``;`` J#SUBm ``; }``
where
- n, m, k >= 0
- each //R#SUBi// is a resource, instance, or concrete module,
possibly with restrictions on inheritance, i.e. //R#SUBi//``-[``//f,..,g//``]``
- each //O#SUBi// is an open specification, of one of the forms
- //P//
- ``(``//Q//``=``//P//``)``
where //P// is a resource, instance, or concrete, and //Q// is any identifier
- each //J#SUBi// is a judgement of one of the forms ``oper, param``
Semantic conditions:
- all names defined in each //R#SUBi// and //R// must be distinct
- all constants declared must have a definition
- names in restriction lists must be defined in the restricted module
- inherited constants may not depend on names excluded by restriction
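A hedged sketch of a resource module satisfying these conditions (all names invented for illustration; real library resources are much richer):

```
resource StrOper = {
  param Number = Sg | Pl ;
  oper Noun : Type = {s : Number => Str} ;
  -- crude regular plural formation, for illustration only
  oper regNoun : Str -> Noun =
    \w -> {s = table {Sg => w ; Pl => w + "s"}} ;
}
```

Note that every declared constant has a definition, as required: a resource, unlike an interface, may not leave ``oper``s undefined.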
==Interface modules==
Syntax:
``interface`` R ``=``
(R#SUB1,...,R#SUBn ``**``)?
(``open`` O#SUB1,...,O#SUBk ``in``)?
``{``J#SUB1 ``;`` ... ``;`` J#SUBm ``; }``
where
- n, m, k >= 0
- each //R#SUBi// is an interface or abstract module,
possibly with restrictions on inheritance, i.e. //R#SUBi//``-[``//f,..,g//``]``
- each //O#SUBi// is an open specification, of one of the forms
- //P//
- ``(``//Q//``=``//P//``)``
where //P// is a resource, instance, or concrete, and //Q// is any identifier
- each //J#SUBi// is a judgement of one of the forms ``oper, param``
Semantic conditions:
- all names declared in each //R#SUBi// and //R// must be distinct
- names in restriction lists must be defined in the restricted module
- inherited constants may not depend on names excluded by restriction
==Instance modules==
Syntax:
``instance`` R ``of`` I ``=``
(R#SUB1,...,R#SUBn ``**``)?
(``open`` O#SUB1,...,O#SUBk ``in``)?
``{``J#SUB1 ``;`` ... ``;`` J#SUBm ``; }``
where
- n, m, k >= 0
- //I// is an interface module
- each //R#SUBi// is an instance, resource, or concrete module,
possibly with restrictions on inheritance, i.e. //R#SUBi//``-[``//f,..,g//``]``
- each //O#SUBi// is an open specification, of one of the forms
- //P//
- ``(``//Q//``=``//P//``)``
where //P// is a resource, instance, or concrete, and //Q// is any identifier
- each //J#SUBi// is a judgement of one of the forms
``oper, param``
Semantic conditions:
- all names declared in each //R#SUBi//, //I//, and //R// must be distinct
- all constants declared in //I// must have a definition either in
//I// or //R//
- names in restriction lists must be defined in the restricted module
- inherited constants may not depend on names excluded by restriction
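The division of labour between an interface and its instances can be sketched as follows (module and constant names invented for illustration):

```
interface LexV = {
  oper walk_V : Str ;      -- declared in the interface, no definition
}

instance LexVEng of LexV = {
  oper walk_V = "walk" ;   -- the definition is supplied by the instance
}
```

This satisfies the condition that every constant declared in the interface has a definition either there or in the instance.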
==Instantiated concrete syntax modules==
Syntax:
``concrete`` C ``of`` A ``=``
(C#SUB1,...,C#SUBn ``**``)?
B
``with``
``(``I#SUB1 ``=``J#SUB1``),`` ...
``, (``I#SUBp ``=``J#SUBp``)``
(``-``? ``[``c#SUB1,...,c#SUBq ``]``)?
(``**``?
(``open`` O#SUB1,...,O#SUBk ``in``)?
``{``J#SUB1 ``;`` ... ``;`` J#SUBm ``; }``)? ``;``
where
- n, m, p, q, k >= 0
- //A// is an abstract module
- each //C#SUBi// is a concrete module,
possibly with restrictions on inheritance, i.e. //C#SUBi//``-[``//f,..,g//``]``
- //B// is an incomplete concrete syntax of //A//
- each //I#SUBi// is an interface or an abstract
- each //J#SUBi// is an instance or a concrete of //I#SUBi//
- each //O#SUBi// is an open specification, of one of the forms
- //R//
- ``(``//Q//``=``//R//``)``
where //R// is a resource, instance, or concrete, and //Q// is any identifier
- each //J#SUBi// is a judgement of one of the forms
``lincat, lin, lindef, printname``; also the forms ``oper, param`` are
allowed, but they cannot be inherited.
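Putting the pieces together, a functor instantiation can be sketched like this (a hedged example with invented names, assuming an interface ``LexV`` declaring ``walk_V : Str`` and an instance ``LexVEng`` defining it):

```
-- an incomplete concrete, parameterized over the interface LexV
incomplete concrete FooI of Foo = open LexV in {
  lincat S = {s : Str} ;
  lin Walk = {s = walk_V} ;
}

-- a complete concrete, obtained by instantiating LexV
concrete FooEng of Foo = FooI with (LexV = LexVEng) ;
```

One functor ``FooI`` can thus be instantiated once per language, which is how the resource library shares linearization code across related languages.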
==Texts, phrases, and utterances==
The outermost linguistic structure is ``Text``. ``Text``s are composed
from Phrases (``Phr``), each followed by one of the punctuation marks ".", "?",
or "!" (with their proper variants in Spanish and Arabic). Here is an
example of a ``Text`` string.
```
John walks. Why? He doesn't want to sleep!
```
Phrases are mostly built from Utterances (``Utt``), which in turn are
declarative sentences, questions, or imperatives - but there
are also "one-word utterances" consisting of noun phrases
or other subsentential phrases. Some Phrases are atomic,
for instance "yes" and "no". Here are some examples of Phrases.
```
yes
come on, John
but John walks
give me the stick please
don't you know that he is sleeping
a glass of wine
a glass of wine please
```
There is no connection between the punctuation marks and the
types of utterances. This reflects the fact that the punctuation
mark in a real text is selected as a function of the speech act
rather than the grammatical form of an utterance. The following
text is thus well-formed.
```
John walks. John walks? John walks!
```
What is the difference between Phrase and Utterance? It is just technical:
a Phrase is an Utterance with an optional leading conjunction ("but")
and an optional trailing vocative ("John", "please").
==Sentences and clauses==
TODO: use overloaded operations in the examples.
The richest of the categories below Utterance is ``S``, Sentence. A Sentence
is formed from a Clause (``Cl``), by fixing its Tense, Anteriority, and Polarity.
For example, each of the following strings has a distinct syntax tree
in the category Sentence:
```
John walks
John doesn't walk
John walked
John didn't walk
John has walked
John hasn't walked
John will walk
John won't walk
...
```
whereas in the category Clause all of them are just different forms of
the same tree.
The difference between Sentence and Clause is thus also rather technical.
It may not correspond exactly to any standard usage of the terms
"clause" and "sentence".
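The distinction can be sketched in GF's abstract syntax, using the constructor names of the API discussed in this section:

```
-- one Clause, with Tense, Anteriority, and Polarity still open
PredVP (UsePN john_PN) (UseV walk_V)

-- two distinct Sentences built from that same Clause
UseCl TPres ASimul PPos (PredVP (UsePN john_PN) (UseV walk_V))
  -- "John walks"
UseCl TPast ASimul PNeg (PredVP (UsePN john_PN) (UseV walk_V))
  -- "John didn't walk"
```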
Figure 1 shows a type-annotated syntax tree of the Text "John walks."
and gives an overview of the structural levels.
#BFIG
```
Node Constructor Value type Other constructors
-----------------------------------------------------------
1. TFullStop Text TQuestMark
2. (PhrUtt Phr
3. NoPConj PConj but_PConj
4. (UttS Utt UttQS
5. (UseCl S UseQCl
6. TPres Tense TPast
7. ASimul Anter AAnter
8. PPos Pol PNeg
9. (PredVP Cl
10. (UsePN NP UsePron, DetCN
11. john_PN) PN mary_PN
12. (UseV VP ComplV2, ComplV3
13. walk_V)))) V sleep_V
14. NoVoc) Voc please_Voc
15. TEmpty Text
```
#BCENTER
Figure 1. Type-annotated syntax tree of the Text "John walks."
#ECENTER
#EFIG
Here are some examples of the results of changing constructors.
```
1. TFullStop -> TQuestMark John walks?
3. NoPConj -> but_PConj But John walks.
6. TPres -> TPast John walked.
7. ASimul -> AAnter John has walked.
8. PPos -> PNeg John doesn't walk.
11. john_PN -> mary_PN Mary walks.
13. walk_V -> sleep_V John sleeps.
14. NoVoc -> please_Voc John sleeps please.
```
Not all constructors can, of course, be changed so freely, because the
resulting tree would not always remain well-typed. Here are some changes involving
many constructors:
```
4- 5. UttS (UseCl ...) ->
UttQS (UseQCl (... QuestCl ...)) Does John walk?
10-11. UsePN john_PN ->
UsePron we_Pron We walk.
12-13. UseV walk_V ->
ComplV2 love_V2 this_NP John loves this.
```
==Parts of sentences==
The linguistic phenomena mostly discussed in both traditional grammars and modern
syntax belong to the level of Clauses, that is, lines 9-13, and occasionally
to Sentences, lines 5-13. At this level, the major categories are
``NP`` (Noun Phrase) and ``VP`` (Verb Phrase). A Clause typically
consists of just an ``NP`` and a ``VP``.
The internal structure of both ``NP`` and ``VP`` can be very complex,
and these categories are mutually recursive: not only can a ``VP``
contain an ``NP``,
```
[VP loves [NP Mary]]
```
but also an ``NP`` can contain a ``VP``
```
[NP every man [RS who [VP walks]]]
```
(a labelled bracketing like this is of course just a rough approximation of
a GF syntax tree, but still a useful expository device).
Most of the resource modules thus define functions that are used inside
NPs and VPs. Here is a brief overview:
**Noun**. How to construct NPs. The main three mechanisms
for constructing NPs are
- from proper names: "John"
- from pronouns: "we"
- from common nouns by determiners: "this man"
The ``Noun`` module also defines the construction of common nouns.
The most frequent ways are
- lexical noun items: "man"
- adjectival modification: "old man"
- relative clause modification: "man who sleeps"
- application of relational nouns: "successor of the number"
**Verb**.
How to construct VPs. The main mechanism is verbs with their arguments,
for instance,
- one-place verbs: "walks"
- two-place verbs: "loves Mary"
- three-place verbs: "gives her a kiss"
- sentence-complement verbs: "says that it is cold"
- VP-complement verbs: "wants to give her a kiss"
A special verb is the copula, "be" in English, which in some languages
is not realized by a verb at all.
A copula can take different kinds of complement:
- an adjectival phrase: "(John is) old"
- an adverb: "(John is) here"
- a noun phrase: "(John is) a man"
**Adjective**.
How to construct ``AP``s. The main ways are
- positive forms of adjectives: "old"
- comparative forms with object of comparison: "older than John"
**Adverb**.
How to construct ``Adv``s. The main ways are
- from adjectives: "slowly"
- as prepositional phrases: "in the car"
==Modules and their names==
This section is not necessary for users of the library.
TODO: explain the overloaded API.
The resource modules are named after the kind of
phrases that are constructed in them,
and they can be roughly classified by the "level" or "size" of expressions that are
formed in them:
- Larger than sentence: ``Text``, ``Phrase``
- Same level as sentence: ``Sentence``, ``Question``, ``Relative``
- Parts of sentence: ``Adjective``, ``Adverb``, ``Noun``, ``Verb``
- Cross-cut (coordination): ``Conjunction``
Because of mutual recursion, such as in embedded sentences, this classification is
not a strict ordering. However, no mutual dependence is needed between the
modules themselves - they can all be compiled separately. This is due
to the module ``Cat``, which defines the type system common to the other modules.
For instance, the types ``NP`` and ``VP`` are defined in ``Cat``,
and the module ``Verb`` only
needs to know what is given in ``Cat``, not what is given in ``Noun``. To implement
a rule such as
```
Verb.ComplV2 : V2 -> NP -> VP
```
it is enough to know the linearization type of ``NP``
(as well as those of ``V2`` and ``VP``, all
given in ``Cat``). It is not necessary to know what
ways there are to build ``NP``s (given in ``Noun``), since all these ways must
conform to the linearization type defined in ``Cat``. Thus the format of
category-specific modules is as follows:
```
abstract Adjective = Cat ** {...}
abstract Noun = Cat ** {...}
abstract Verb = Cat ** {...}
```
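To make the point concrete, here is a deliberately simplified sketch of what an English implementation of ``ComplV2`` could look like, assuming (hypothetically) that the lincats in ``Cat`` reduce to plain string records; the real lincats carry inflection tables and agreement features, but the principle is the same:

```
-- Verb only needs the lincats of V2, NP, and VP from Cat,
-- not the NP-building functions of Noun
lin ComplV2 v2 np = {s = v2.s ++ np.s} ;
```

Whatever trees ``Noun`` builds for ``NP``, their linearizations conform to the ``NP`` lincat in ``Cat``, so this rule compiles without ``Verb`` ever importing ``Noun``.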
==Top-level grammar and lexicon==
The module ``Grammar`` collects all the category-specific modules into
a complete grammar:
```
abstract Grammar =
Adjective, Noun, Verb, ..., Structural, Idiom
```
The module ``Structural`` is a lexicon of structural words (function words),
such as determiners.
The module ``Idiom`` is a collection of idiomatic structures whose
implementation is very language-dependent. An example is existential
structures ("there is", "es gibt", "il y a", etc).
The module ``Lang`` combines ``Grammar`` with a ``Lexicon`` of
ca. 350 content words:
```
abstract Lang = Grammar, Lexicon
```
Using ``Lang`` instead of ``Grammar`` as a library may provide
for free some words needed in an application. But its main purpose is to
help test the resource library, rather than to serve as a resource itself.
It does not even seem realistic to develop
a general-purpose multilingual resource lexicon.
The diagram in Figure 2 shows the structure of the API.
#BFIG
#GRAMMAR
#BCENTER
Figure 2. The resource syntax API.
#ECENTER
#EFIG
==Language-specific syntactic structures==
The API collected in ``Grammar`` has been designed to be implementable for
all languages in the resource package. It does contain some rules that
are strange or superfluous in some languages; for instance, the distinction
between definite and indefinite articles does not apply to Finnish and Russian.
But such rules are still easy to implement: they only create some superfluous
ambiguity in the languages in question.
But the library makes no claim that all languages should have exactly the same
abstract syntax. The common API is therefore extended by language-dependent
rules. The top level of each language looks as follows (with English as example):
```
abstract English = Grammar, ExtraEngAbs, DictEngAbs
```
where ``ExtraEngAbs`` is a collection of syntactic structures specific to English,
and ``DictEngAbs`` is an English dictionary
(at the moment, it consists of ``IrregEngAbs``,
the irregular verbs of English). Each of these language-specific grammars has
the potential to grow into a full-scale grammar of the language. These grammars
can also be used as libraries, but the possibility of using functors is lost.
To give a better overview of language-specific structures,
modules like ``ExtraEngAbs``
are built from a language-independent module ``ExtraAbs``
by restricted inheritance:
```
abstract ExtraEngAbs = Extra [f,g,...]
```
Thus any category and function in ``Extra`` may be shared by a subset of all
languages. One can see this set-up as a matrix, which tells
what ``Extra`` structures
are implemented in what languages. For the common API in ``Grammar``, the matrix
is filled with 1's (everything is implemented in every language).
Language-specific extensions and the use of restricted
inheritance are a recent addition to the resource grammar library, and
have so far been exploited only on a very small scale.