mirror of
https://github.com/GrammaticalFramework/gf-core.git
synced 2026-04-09 04:59:31 -06:00
more rm in doc
@@ -1,221 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
<TITLE>Graduate Course: GF (Grammatical Framework)</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">

<P ALIGN="center"><CENTER><H1>Graduate Course: GF (Grammatical Framework)</H1>
<FONT SIZE="4">
<I>Aarne Ranta</I><BR>
Wed Oct 24 09:49:27 2007
</FONT></CENTER>

<P>
<A HREF="http://www.gslt.hum.gu.se">GSLT</A>,
<A HREF="http://ngslt.org/">NGSLT</A>,
and
<A HREF="http://www.chalmers.se/cse/EN/">Department of Computer Science and Engineering</A>,
Chalmers University of Technology and Gothenburg University.
</P>

<P>
Autumn Term 2007.
</P>

<H1>News</H1>

<P>
24/10 Tomorrow's session starts at 8.15. A detailed plan has been added to
the table below. Material (new chapters) will appear later today.
It will explain some of the files in
</P>

<UL>
<LI><A HREF="http://digitalgrammars.com/gf/examples/tutorial/syntax/"><CODE>syntax/</CODE></A>:
linguistic grammar programming
<LI><A HREF="http://digitalgrammars.com/gf/examples/tutorial/semantics/"><CODE>semantics/</CODE></A>:
a question-answer system based on logical semantics
</UL>

<P>
12/9 The course starts tomorrow at 8.00. A detailed plan for the day is
right below. Don't forget to
</P>

<UL>
<LI>join the mailing list (send a mail to <CODE>gf-subscribe at gslt hum gu se</CODE>)
<LI>install GF on your laptops from <A HREF="../download.html">here</A>
<LI>take with you a copy of the book (as sent to the mailing list yesterday)
</UL>

<P>
31/8 Revised the description of the one- and five-point variants.
</P>

<P>
21/8 Course mailing list started.
To subscribe, send a mail to <CODE>gf-subscribe at gslt hum gu se</CODE>
(replacing spaces by dots except around the word at, where the spaces
are just removed, and the word itself is replaced by the at symbol).
</P>

<P>
20/8/2007 <A HREF="http://www.gslt.hum.gu.se/courses/schedule.html">Schedule</A>.
The course will start on Thursday 13 September in Room C430 at the Humanities
Building of Gothenburg University ("Humanisten").
</P>
<H1>Plan</H1>

<P>
First week (13-14/9)
</P>

<TABLE CELLPADDING="4" BORDER="1">
<TR>
<TH>Time</TH>
<TH>Subject</TH>
<TH COLSPAN="2">Assignment</TH>
</TR>
<TR>
<TD>Thu 8.00-9.30</TD>
<TD>Chapters 1-3</TD>
<TD>Hello and Food in a new language</TD>
</TR>
<TR>
<TD>Thu 10.00-11.30</TD>
<TD>Chapters 3-4</TD>
<TD>Foods in a new language</TD>
</TR>
<TR>
<TD>Thu 13.15-14.45</TD>
<TD>Chapter 5</TD>
<TD>ExtFoods in a new language</TD>
</TR>
<TR>
<TD>Thu 15.15-16.45</TD>
<TD>Chapters 6-7</TD>
<TD>straight code compiler</TD>
</TR>
<TR>
<TD>Fri 8.00-9.30</TD>
<TD>Chapter 8</TD>
<TD>application in Haskell or Java</TD>
</TR>
</TABLE>

<P></P>
<P>
Second week (25/10)
</P>

<TABLE CELLPADDING="4" BORDER="1">
<TR>
<TH>Time</TH>
<TH>Subject</TH>
<TH COLSPAN="2">Assignment</TH>
</TR>
<TR>
<TD>Thu 8.15-9.45</TD>
<TD>Chapters 13-15</TD>
<TD>mini resource in a new language</TD>
</TR>
<TR>
<TD>Thu 10.15-11.45</TD>
<TD>Chapters 12,16</TD>
<TD>query system for a new domain</TD>
</TR>
<TR>
<TD>Thu 13.15-14.45</TD>
<TD>presentations</TD>
<TD>explain your own project</TD>
</TR>
</TABLE>

<P></P>
<P>
The structure of each lecture will be the following:
</P>

<UL>
<LI>ca. 75min lecture, going through the book
<LI>ca. 15min work on computer, individually or in pairs
</UL>

<P>
In order for this to work out, it is important that enough participants
have a working GF installation, including the directory
<A HREF="../examples/tutorial"><CODE>examples/tutorial</CODE></A>. This directory is
included in the Darcs version, as well as in the updated binary
packages from 12 September.
</P>
<H1>Purpose</H1>

<P>
<A HREF="http://www.cs.chalmers.se/~aarne/GF/">GF</A>
(Grammatical Framework) is a grammar formalism, i.e. a special-purpose
programming language for writing grammars. It is suitable for many
natural language processing tasks, in particular,
</P>

<UL>
<LI>multilingual applications
<LI>systems where grammar-based components are needed for e.g.
parsing, translation, or speech recognition
</UL>

<P>
The goal of the course is to develop an understanding of GF and
practical skills in using it.
</P>

<H1>Contents</H1>

<P>
The course consists of two modules. The first module is a one-week
intensive course (during the first intensive week of GSLT), which
is as such usable as a one-week intensive course for doctoral studies,
if completed with a small course project.
</P>

<P>
The second module is a larger programming project, written
by each student (possibly working in groups) during the Autumn term.
The projects are discussed during the second intensive week of GSLT
(see <A HREF="http://www.gslt.hum.gu.se/courses/schedule.html">schedule</A>),
and presented at a date that will be set later.
</P>

<P>
The first module goes through the basics of GF, including
</P>

<UL>
<LI>using the GF programming language
<LI>writing multilingual grammars
<LI>using the
<A HREF="http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/doc/">GF resource grammar library</A>
<LI>generating speech recognition systems from GF grammars
<LI>using embedded grammars as components of software systems
</UL>

<P>
The lectures follow a draft of the GF book. It contains a heavily updated
version of the
<A HREF="http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html">GF Tutorial</A>;
thus the on-line tutorial is not adequate for this course. To get the course
book, join the course mailing list.
</P>

<P>
Those who just want to do the first module will write a simple application
as their course work during and after the first intensive week.
</P>

<P>
Those who continue with the second module will choose a more substantial
project. Possible topics are
</P>

<UL>
<LI>building a dialogue system by using GF
<LI>implementing a multilingual document generator
<LI>experimenting with synthesized multilingual tree banks
<LI>extending the GF resource grammar library
</UL>

<H1>Prerequisites</H1>

<P>
Experience in programming. No earlier natural language processing
or functional programming experience is necessary.
</P>

<P>
The course is thus suitable both for GSLT and NGSLT students,
and for graduate students in computer science.
</P>

<P>
We will in particular welcome students from the Baltic countries
who wish to build resources for their own language in GF.
</P>

<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags gf-course.txt -->
</BODY></HTML>
699
doc/gf-help.txt
@@ -1,699 +0,0 @@
=GF Command Help=

Each command has a long and a short name, options, and zero or more
arguments. Commands are sorted by functionality. The short name is
given first.

Commands and options marked with * are currently not implemented.

==Commands that change the state==

```
i, import: i File
Reads a grammar from File and compiles it into a GF runtime grammar.
Files "include"d in File are read recursively, nubbing repetitions.
If a grammar with the same language name is already in the state,
it is overwritten - but only if compilation succeeds.
The grammar parser depends on the file name suffix:
.gf normal GF source
.gfc canonical GF
.gfr precompiled GF resource
.gfcm multilingual canonical GF
.gfe example-based grammar files (only with the -ex option)
.gfwl multilingual word list (preprocessed to abs + cncs)
.ebnf Extended BNF format
.cf Context-free (BNF) format
.trc TransferCore format
options:
-old old: parse in GF<2.0 format (not necessary)
-v verbose: give lots of messages
-s silent: don't give error messages
-src from source: ignore precompiled gfc and gfr files
-gfc from gfc: use compiled modules whenever they exist
-retain retain operations: read resource modules (needed in comm cc)
-nocf don't build old-style context-free grammar (default without HOAS)
-docf do build old-style context-free grammar (default with HOAS)
-nocheckcirc don't eliminate circular rules from CF
-cflexer build an optimized parser with separate lexer trie
-noemit do not emit code (default with old grammar format)
-o do emit code (default with new grammar format)
-ex preprocess .gfe files if needed
-prob read probabilities from top grammar file (format --# prob Fun Double)
-treebank read a treebank file to memory (xml format)
flags:
-abs set the name used for abstract syntax (with -old option)
-cnc set the name used for concrete syntax (with -old option)
-res set the name used for resource (with -old option)
-path use the (colon-separated) search path to find modules
-optimize select an optimization to override file-defined flags
-conversion select parsing method (values strict|nondet)
-probs read probabilities from file (format (--# prob) Fun Double)
-preproc use a preprocessor on each source file
-noparse read nonparsable functions from file (format --# noparse Funs)
examples:
i English.gf -- ordinary import of Concrete
i -retain german/ParadigmsGer.gf -- import of Resource to test

r, reload: r
Executes the previous import (i) command.

rl, remove_language: rl Language
Takes away the language from the state.

e, empty: e
Takes away all languages and resets all global flags.

sf, set_flags: sf Flag*
The values of the Flags are set for Language. If no language
is specified, the flags are set globally.
examples:
sf -nocpu -- stop showing CPU time
sf -lang=Swe -- make Swe the default concrete

s, strip: s
Prune the state by removing source and resource modules.

dc, define_command: dc Name Anything
Add a new defined command. The Name must start with '%'. Later,
if 'Name X' is used, it is replaced by Anything, where #1 is replaced
by X.
Restrictions: Currently at most one argument is possible, and a defined
command cannot appear in a pipe.
To see what definitions are in scope, use help -defs.
examples:
dc %tnp p -cat=NP -lang=Eng #1 | l -lang=Swe -- translate NPs
%tnp "this man" -- translate and parse

dt, define_term: dt Name Tree
Add a constant for a tree. The constant can later be called by
prefixing it with '$'.
Restriction: These terms are not yet usable as subterms.
To see what definitions are in scope, use help -defs.
examples:
p -cat=NP "this man" | dt tm -- define tm as parse result
l -all $tm -- linearize tm in all forms
```
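As a sketch of how the state-changing commands above combine in one session (the grammar file names and the %tr command name are illustrative; the commands and flags are only those documented above):

```
i English.gf                                  -- compile and load a concrete syntax
i -retain german/ParadigmsGer.gf              -- load a resource, retaining opers for cc
sf -lang=Eng                                  -- make Eng the default concrete
dc %tr p -cat=S -lang=Eng #1 | l -lang=Swe    -- define a one-argument translation command
%tr "this man"                                -- run the defined command on a string
```

Note that, per the restrictions above, a defined command like %tr takes at most one argument and cannot itself appear inside a pipe.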

==Commands that give information about the state==

```
pg, print_grammar: pg
Prints the current grammar (overridden by the -lang=X flag).
The -printer=X flag sets the format in which the grammar is
written.
N.B. Since grammars are compiled when imported, this command
generally does not show the grammar in the same format as the
source. In particular, -printer=latex is not supported.
Use the command tg -printer=latex File to print the source
grammar in LaTeX.
options:
-utf8 apply UTF8 encoding to the grammar
flags:
-printer
-lang
-startcat -- The start category of the generated grammar.
Only supported by some grammar printers.
examples:
pg -printer=cf -- show the context-free skeleton

pm, print_multigrammar: pm
Prints the current multilingual grammar in .gfcm form.
(Automatically executes the strip command (s) before doing this.)
options:
-utf8 apply UTF8 encoding to the tokens in the grammar
-utf8id apply UTF8 encoding to the identifiers in the grammar
examples:
pm | wf Letter.gfcm -- print the grammar into the file Letter.gfcm
pm -printer=graph | wf D.dot -- then do 'dot -Tps D.dot > D.ps'

vg, visualize_graph: vg
Show the dependency graph of the multilingual grammar via dot and gv.

po, print_options: po
Prints the modules in the state. Also
prints those flag values in the current state that differ from defaults.

pl, print_languages: pl
Prints the names of currently available languages.

pi, print_info: pi Ident
Prints information on the identifier.
```

==Commands that execute and show the session history==

```
eh, execute_history: eh File
Executes the commands in the file.

ph, print_history: ph
Prints the commands issued during the GF session.
The result is readable by the eh command.
examples:
ph | wf foo.hist -- save the history into a file
```
==Linearization, parsing, translation, and computation==

```
l, linearize: l PattList? Tree
Shows all linearization forms of Tree by the current grammar
(which is overridden by the -lang flag).
The pattern list has the form [P, ... ,Q] where P,...,Q follow GF
syntax for patterns. All forms that match the pattern list are
generated. Too short lists are padded with variables at the end.
Only the -table flag is available if a pattern list is specified.
HINT: see the GF language specification for the syntax of Pattern and Term.
You can also copy and paste parsing results.
options:
-struct bracketed form
-table show parameters (not compatible with -record, -all)
-record record, i.e. explicit GF concrete syntax term (not compatible with -table, -all)
-all show all forms and variants (not compatible with -record, -table)
-multi linearize to all languages (can be combined with the other options)
flags:
-lang linearize in this grammar
-number give this number of forms at most
-unlexer filter output through unlexer
examples:
l -lang=Swe -table -- show full inflection table in Swe

p, parse: p String
Shows all Trees returned for String by the current
grammar (overridden by the -lang flag), in the category S (overridden
by the -cat flag).
options for batch input:
-lines parse each line of input separately, ignoring empty lines
-all as -lines, but also parse empty lines
-prob rank results by probability
-cut stop after first lexing result leading to parser success
-fail show strings whose parse fails prefixed by #FAIL
-ambiguous show strings that have more than one parse prefixed by #AMBIGUOUS
options for selecting parsing method:
-fcfg parse using a fast variant of MCFG (default if no HOAS in grammar)
-old parse using an overgenerating CFG (default if HOAS in grammar)
-cfg parse using a much less overgenerating CFG
-mcfg parse using an even less overgenerating MCFG
Note: the first parse with -cfg, -mcfg, and -fcfg may take a long time
options that only work for the -old default parsing method:
-n non-strict: tolerates morphological errors
-ign ignore unknown words when parsing
-raw return context-free terms in raw form
-v verbose: give more information if parsing fails
flags:
-cat parse in this category
-lang parse in this grammar
-lexer filter input through this lexer
-parser use this parsing strategy
-number return this many results at most
examples:
p -cat=S -mcfg "jag är gammal" -- parse an S with the MCFG
rf examples.txt | p -lines -- parse each non-empty line of the file

at, apply_transfer: at (Module.Fun | Fun)
Transfer a term using Fun from Module, or the topmost transfer
module. Transfer modules are given in the .trc format. They are
shown by the 'po' command.
flags:
-lang typecheck the result in this lang instead of the default lang
examples:
p -lang=Cncdecimal "123" | at num2bin | l -- convert dec to bin

tb, tree_bank: tb
Generate a multilingual treebank from a list of trees (default) or compare
to an existing treebank.
options:
-c compare to existing xml-formatted treebank
-trees return the trees of the treebank
-all show all linearization alternatives (branches and variants)
-table show tables of linearizations with parameters
-record show linearization records
-xml wrap the treebank (or comparison results) with XML tags
-mem write the treebank in memory instead of a file TODO
examples:
gr -cat=S -number=100 | tb -xml | wf tb.xml -- random treebank into file
rf tb.xml | tb -c -- compare-test treebank from file
rf old.xml | tb -trees | tb -xml -- create new treebank from old

ut, use_treebank: ut String
Look up a string in a treebank and return the resulting trees.
Use 'tb' to create a treebank and 'i -treebank' to read one from
a file.
options:
-assocs show all string-trees associations in the treebank
-strings show all strings in the treebank
-trees show all trees in the treebank
-raw return the lookup result as a string, without typechecking it
flags:
-treebank use this treebank (instead of the latest introduced one)
examples:
ut "He adds this to that" | l -multi -- use treebank lookup as parser in translation
ut -assocs | grep "ComplV2" -- show all associations with ComplV2

tt, test_tokenizer: tt String
Show the token list sent to the parser when String is parsed.
HINT: can be useful when debugging the parser.
flags:
-lexer use this lexer
examples:
tt -lexer=codelit "2*(x + 3)" -- a favourite lexer for program code

g, grep: g String1 String2
Greps for String1 in String2. String2 is read line by line,
and only those lines that contain String1 are returned.
flags:
-v return those lines that do not contain String1.
examples:
pg -printer=cf | grep "mother" -- show cf rules with the word mother

cc, compute_concrete: cc Term
Compute a term by concrete syntax definitions. Uses the topmost
resource module (the last in the listing by the po command) to resolve
constant names.
N.B. You need the flag -retain when importing the grammar, if you want
the oper definitions to be retained after compilation; otherwise this
command does not expand oper constants.
N.B.' The resulting Term is not a term in the sense of abstract syntax,
and hence not a valid input to a Tree-demanding command.
flags:
-table show output in a readable format similar to 'l -table'
-res use another module than the topmost one
examples:
cc -res=ParadigmsFin (nLukko "hyppy") -- inflect "hyppy" with nLukko

so, show_operations: so Type
Show oper operations with the given value type. Uses the topmost
resource module to resolve constant names.
N.B. You need the flag -retain when importing the grammar, if you want
the oper definitions to be retained after compilation; otherwise this
command does not find any oper constants.
N.B.' The value type may not be defined in a supermodule of the
topmost resource. In that case, use an appropriate qualified name.
flags:
-res use another module than the topmost one
examples:
so -res=ParadigmsFin ResourceFin.N -- show N-paradigms in ParadigmsFin

t, translate: t Lang Lang String
Parses String in Lang1 and linearizes the resulting Trees in Lang2.
flags:
-cat
-lexer
-parser
examples:
t Eng Swe -cat=S "every number is even or odd"

gr, generate_random: gr Tree?
Generates a random Tree of a given category. If a Tree
argument is given, the command completes the Tree by filling in
values for the metavariables in the tree.
options:
-prob use probabilities (works for nondep types only)
-cf use a very fast method (works for nondep types only)
flags:
-cat generate in this category
-lang use the abstract syntax of this grammar
-number generate this number of trees (not impl. with Tree argument)
-depth use this number of search steps at most
examples:
gr -cat=Query -- generate in category Query
gr (PredVP ? (NegVG ?)) -- generate a random tree of this form
gr -cat=S -tr | l -- generate and linearize

gt, generate_trees: gt Tree?
Generates all trees up to a given depth. If the depth is large,
a small -alts is recommended. If a Tree argument is given, the
command completes the Tree by filling in values for the
metavariables in the tree.
options:
-metas also return trees that include metavariables
-all generate all (can be infinitely many, lazily)
-lin linearize result of -all (otherwise, use pipe to linearize)
flags:
-depth generate to this depth (default 3)
-atoms take this number of atomic rules of each category (default unlimited)
-alts take this number of alternatives at each branch (default unlimited)
-cat generate in this category
-nonub don't remove duplicates (faster, not effective with -mem)
-mem use a memoizing algorithm (often faster, usually more memory-consuming)
-lang use the abstract syntax of this grammar
-number generate (at most) this number of trees (also works with -all)
-noexpand don't expand these categories (comma-separated, e.g. -noexpand=V,CN)
-doexpand only expand these categories (comma-separated, e.g. -doexpand=V,CN)
examples:
gt -depth=10 -cat=NP -- generate all NP's to depth 10
gt (PredVP ? (NegVG ?)) -- generate all trees of this form
gt -cat=S -tr | l -- generate and linearize
gt -noexpand=NP | l -mark=metacat -- the only NP is meta, linearized "?0 +NP"
gt | l | p -lines -ambiguous | grep "#AMBIGUOUS" -- show ambiguous strings

ma, morphologically_analyse: ma String
Runs morphological analysis on each word in String and displays
the results line by line.
options:
-short show analyses in bracketed words, instead of separate lines
-status show just the word at success, prefixed with "*" at failure
flags:
-lang
examples:
rf Bible.txt | ma -short | wf Bible.tagged -- analyse the Bible
```
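To illustrate how the commands above compose into translation pipelines, here is a sketch (the language names Eng and Swe and the sentence are taken from the examples above; whether they parse depends on the loaded grammars):

```
p -lang=Eng -cat=S "every number is even or odd" | l -lang=Swe  -- parse, then linearize
t Eng Swe -cat=S "every number is even or odd"                  -- the same in one command
gr -cat=S -number=5 | l -multi                                  -- five random S's in all languages
```

The first two lines are equivalent in effect: t Lang1 Lang2 is shorthand for a parse in Lang1 piped into a linearization in Lang2.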

==Elementary generation of Strings and Trees==

```
ps, put_string: ps String
Returns its argument String, like Unix echo.
HINT. The strength of ps comes from the possibility to receive the
argument from a pipeline and to alter it with the -filter flag.
flags:
-filter filter the result through this string processor
-length cut the string after this number of characters
examples:
gr -cat=Letter | l | ps -filter=text -- random letter as text

pt, put_tree: pt Tree
Returns its argument Tree, like a specialized Unix echo.
HINT. The strength of pt comes from the possibility to receive
the argument from a pipeline and to alter it with the -transform flag.
flags:
-transform transform the result by this term processor
-number generate this number of terms at most
examples:
p "zero is even" | pt -transform=solve -- solve ?'s in parse result

* st, show_tree: st Tree
Prints the tree as a string. Unlike pt, this command cannot be
used in a pipe to produce a tree, since its output is a string.
flags:
-printer show the tree in a special format (-printer=xml supported)

wt, wrap_tree: wt Fun
Wraps the tree as the sole argument of Fun.
flags:
-c compute the resulting new tree to normal form

vt, visualize_tree: vt Tree
Shows the abstract syntax tree via dot and gv (via temporary files
grphtmp.dot, grphtmp.ps).
flags:
-c show categories only (no functions)
-f show functions only (no categories)
-g show as graph (sharing uses of the same function)
-o just generate the .dot file
examples:
p "hello world" | vt -o | wf my.dot ;; ! open -a GraphViz my.dot
-- This writes the parse tree into my.dot and opens the .dot file
-- with another application, without generating .ps.
```
|
||||

==Subshells==

```
es, editing_session: es
Opens an interactive editing session.
N.B. Exit from a Fudget session is to the Unix shell, not to GF.
options:
-f Fudget GUI (necessary for Unicode; only available in X Window System)

ts, translation_session: ts
Translates input lines from any of the currently loaded languages to all the others.
To exit, type a full stop (.) alone on a line.
N.B. Exit from a Fudget session is to the Unix shell, not to GF.
HINT: Set -parser and -lexer locally in each grammar.
options:
-f Fudget GUI (necessary for Unicode; only available in X Window System)
-lang prepend translation results with language names
flags:
-cat the parser category
examples:
ts -cat=Numeral -lang -- translate numerals, show language names

tq, translation_quiz: tq Lang1 Lang2
Random-generates translation exercises from Lang1 to Lang2,
keeping score of success.
To interrupt, type a full stop (.) alone on a line.
HINT: Set -parser and -lexer locally in each grammar.
flags:
-cat
examples:
tq -cat=NP TestResourceEng TestResourceSwe -- quiz for NPs

tl, translation_list: tl Lang1 Lang2
Random-generates a list of ten translation exercises from Lang1
to Lang2. The number can be changed by a flag.
HINT: use wf to save the exercises in a file.
flags:
-cat
-number
examples:
tl -cat=NP TestResourceEng TestResourceSwe -- quiz list for NPs

mq, morphology_quiz: mq
Random-generates morphological exercises,
keeping score of success.
To interrupt, type a full stop (.) alone on a line.
HINT: use printname judgements in your grammar to
produce nice expressions for desired forms.
flags:
-cat
-lang
examples:
mq -cat=N -lang=TestResourceSwe -- quiz for Swedish nouns

ml, morphology_list: ml
Random-generates a list of ten morphological exercises.
The number can be changed with a flag.
HINT: use wf to save the exercises in a file.
flags:
-cat
-lang
-number
examples:
ml -cat=N -lang=TestResourceSwe -- quiz list for Swedish nouns
```

==IO-related commands==

```
rf, read_file: rf File
Returns the contents of File as a String; error if File does not exist.

wf, write_file: wf File String
Writes String into File; File is created if it does not exist.
N.B. the command overwrites File without a warning.

af, append_file: af File String
Appends String to the end of File; File is created if it does not exist.

* tg, transform_grammar: tg File
Reads File and parses it as a grammar,
but prints it instead of compiling it further.
The environment is not changed. When parsing the grammar, the same file
name suffixes are supported as in the i command.
HINT: use this command to print the grammar in
another format (the -printer flag); pipe it to wf to save this format.
flags:
-printer (only -printer=latex supported currently)

* cl, convert_latex: cl File
Reads File, which is expected to be in LaTeX form.

sa, speak_aloud: sa String
Uses the Flite speech generator to produce speech for String.
Works for American English spelling.
examples:
h | sa -- listen to the list of commands
gr -cat=S | l | sa -- generate a random sentence and speak it aloud

si, speech_input: si
Uses an ATK speech recognizer to get speech input.
flags:
-lang: The grammar to use with the speech recognizer.
-cat: The grammar category to get input in.
-language: Use the acoustic model and dictionary for this language.
-number: The number of utterances to recognize.

h, help: h Command?
Displays the paragraph concerning the command from this help file.
Without the argument, shows the first lines of all paragraphs.
options:
-all show the whole help file
-defs show user-defined commands and terms
-FLAG show the values of FLAG (works for grammar-independent flags)
examples:
h print_grammar -- show all information on the pg command

q, quit: q
Exits GF.
HINT: you can use 'ph | wf history' to save your session.

!, system_command: ! String
Issues a system command. No value is returned to GF.
example:
! ls

?, system_command: ? String
Issues a system command that receives its arguments from the GF pipe
and returns a value to GF.
example:
h | ? 'wc -l' | p -cat=Num
```

==Flags==

The availability of flags is defined separately for each command.
```
-cat, category in which parsing is performed.
The default is S.

-depth, the search depth in e.g. random generation.
The default depends on the application.

-filter, operation performed on a string. The default is identity.
-filter=identity no change
-filter=erase erase the text
-filter=take100 show the first 100 characters
-filter=length show the length of the string
-filter=text format as text (punctuation, capitalization)
-filter=code format as code (spacing, indentation)

-lang, grammar used when executing a grammar-dependent command.
The default is the last-imported grammar.

-language, voice used by Festival as its --language flag in the sa command.
The default is system-dependent.

-length, the maximum number of characters shown of a string.
The default is unlimited.

-lexer, tokenization transforming a string into lexical units for a parser.
The default is words.
-lexer=words tokens are separated by spaces or newlines
-lexer=literals like words, but GF integer and string literals recognized
-lexer=vars like words, but "x","x_...","$...$" as vars, "?..." as meta
-lexer=chars each character is a token
-lexer=code use Haskell's lex
-lexer=codevars like code, but treat unknown words as variables, ?? as meta
-lexer=textvars like text, but treat unknown words as variables, ?? as meta
-lexer=text with conventions on punctuation and capital letters
-lexer=codelit like code, but treat unknown words as string literals
-lexer=textlit like text, but treat unknown words as string literals
-lexer=codeC use a C-like lexer
-lexer=ignore like literals, but ignore unknown words
-lexer=subseqs like ignore, but then try all subsequences from longest

-number, the maximum number of generated items in a list.
The default is unlimited.

-optimize, optimization of the generated code.
The default is share for concrete modules, none for resource modules.
Each of the values can have the suffix _subs, which performs
common subexpression elimination after the main optimization.
Thus, -optimize=all_subs is the most aggressive one. The _subs
strategy only works in GFC, and therefore applies in concrete but
not in resource modules.
-optimize=share share common branches in tables
-optimize=parametrize first try parametrize, then do share with the rest
-optimize=values represent tables as courses-of-values
-optimize=all first try parametrize, then do values with the rest
-optimize=none no optimization

-parser, parsing strategy. The default is chart. If -cfg or -mcfg is
selected, only bottomup and topdown are recognized.
-parser=chart bottom-up chart parsing
-parser=bottomup a more up-to-date bottom-up strategy
-parser=topdown top-down strategy
-parser=old an old bottom-up chart parser

-printer, format in which the grammar is printed. The default is
gfc. Those marked with M are (only) available for pm, the rest
for pg.
-printer=gfc GFC grammar
-printer=gf GF grammar
-printer=old old GF grammar
-printer=cf context-free grammar, with profiles
-printer=bnf context-free grammar, without profiles
-printer=lbnf labelled context-free grammar for BNF Converter
-printer=plbnf grammar for BNF Converter, with precedence levels
*-printer=happy source file for Happy parser generator (use lbnf!)
-printer=haskell abstract syntax in Haskell, with transl to/from GF
-printer=haskell_gadt abstract syntax GADT in Haskell, with transl to/from GF
-printer=morpho full-form lexicon, long format
*-printer=latex LaTeX file (for the tg command)
-printer=fullform full-form lexicon, short format
*-printer=xml XML: DTD for the pg command, object for st
-printer=old old GF: file readable by GF 1.2
-printer=stat show some statistics of the generated GFC
-printer=probs show probabilities of all functions
-printer=gsl Nuance GSL speech recognition grammar
-printer=jsgf Java Speech Grammar Format
-printer=jsgf_sisr_old Java Speech Grammar Format with semantic tags in
SISR WD 20030401 format
-printer=srgs_abnf SRGS ABNF format
-printer=srgs_abnf_non_rec SRGS ABNF format, without any recursion
-printer=srgs_abnf_sisr_old SRGS ABNF format, with semantic tags in
SISR WD 20030401 format
-printer=srgs_xml SRGS XML format
-printer=srgs_xml_non_rec SRGS XML format, without any recursion
-printer=srgs_xml_prob SRGS XML format, with weights
-printer=srgs_xml_sisr_old SRGS XML format, with semantic tags in
SISR WD 20030401 format
-printer=vxml generate a dialogue system in VoiceXML
-printer=slf a finite automaton in the HTK SLF format
-printer=slf_graphviz the same automaton as slf, but in Graphviz format
-printer=slf_sub a finite automaton with sub-automata in the
HTK SLF format
-printer=slf_sub_graphviz the same automaton as slf_sub, but in
Graphviz format
-printer=fa_graphviz a finite automaton with labelled edges
-printer=regular a regular grammar in a simple BNF
-printer=unpar a gfc grammar with parameters eliminated
-printer=functiongraph abstract syntax functions in 'dot' format
-printer=typegraph abstract syntax categories in 'dot' format
-printer=transfer Transfer language datatype (.tr file format)
-printer=cfg-prolog M cfg in prolog format (also pg)
-printer=gfc-prolog M gfc in prolog format (also pg)
-printer=gfcm M gfcm file (default for pm)
-printer=graph M module dependency graph in 'dot' (graphviz) format
-printer=header M gfcm file with header (for GF embedded in Java)
-printer=js M JavaScript type annotator and linearizer
-printer=mcfg-prolog M mcfg in prolog format (also pg)
-printer=missing M the missing linearizations of each concrete

-startcat, like -cat, but used in grammars (to avoid a clash with the keyword cat)

-transform, transformation performed on a syntax tree. The default is identity.
-transform=identity no change
-transform=compute compute by using definitions in the grammar
-transform=nodup return the term only if it has no constants duplicated
-transform=nodupatom return the term only if it has no atomic constants duplicated
-transform=typecheck return the term only if it is type-correct
-transform=solve solve metavariables as derived refinements
-transform=context solve metavariables by unique refinements as variables
-transform=delete replace the term by a metavariable

-unlexer, untokenization transforming linearization output into a string.
The default is unwords.
-unlexer=unwords space-separated token list (like unwords)
-unlexer=text format as text: punctuation, capitals, paragraph <p>
-unlexer=code format as code (spacing, indentation)
-unlexer=textlit like text, but remove string literal quotes
-unlexer=codelit like code, but remove string literal quotes
-unlexer=concat remove all spaces
-unlexer=bind like identity, but bind at "&+"

-mark, marking of parts of the tree in linearization. The default is none.
-mark=metacat append "+CAT" to every metavariable, showing its category
-mark=struct show tree structure with brackets
-mark=java show tree structure with XML tags (used in gfeditor)

-coding, character encoding. Some grammars are in UTF-8, some in isolatin-1.
If the letters ä (a-umlaut) and ö (o-umlaut) look strange, either
change your terminal to isolatin-1, or rewrite the grammar with
'pg -utf8'.
```
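As a rough illustration of the words/text conventions above, the basic lexer/unlexer pair can be sketched in a few lines of Python (the function names are ours, and GF's own implementation of -lexer and -unlexer handles many more cases):

```python
# Sketch of -lexer=words and -unlexer=text: illustrative only,
# not GF's actual lexer/unlexer code.

def lex_words(s):
    """-lexer=words: tokens are separated by spaces or newlines."""
    return s.split()

def unlex_text(tokens):
    """-unlexer=text: glue punctuation to the preceding word and
    capitalize the first letter, as in the 'text' convention."""
    out = ""
    for t in tokens:
        if t in ".,;:!?" and out:
            out += t          # no space before punctuation
        elif out:
            out += " " + t
        else:
            out = t
    return out[:1].upper() + out[1:]
```

For example, `unlex_text(lex_words("zero is even") + ["."])` yields a sentence-like string, mirroring a `l -unlexer=text` pipeline.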

@@ -1,865 +0,0 @@
<html>
<body bgcolor="#FFFFFF" text="#000000" >
<center>
<IMG SRC="gf-logo.gif">

<h1>Grammatical Framework History of Changes</h1>

Changes in functionality since May 17, 2005, release of GF Version 2.2

</center>

<p>

25/6 (BB)
Added new speech recognition grammar printers for non-recursive SRGS grammars,
as used by Nuance Recognizer 9.0. Try <tt>pg -printer=srgs_xml_non_rec</tt>
or <tt>pg -printer=srgs_abnf_non_rec</tt>.

<p>

19/6 (AR)
Extended the functor syntax (<tt>with</tt> modules) so that the functor can have
a restricted import and a module body (whose purpose is normally to complete the
restricted import). Thus the following format is now possible:
<pre>
concrete C of A = E ** CI - [f,g] with (...) ** open R in {...}
</pre>
At the same time, the possibility of an empty module body was added to other modules
for symmetry. This can be useful for "proxy modules" that just collect other modules
without adding anything, e.g.
<pre>
abstract Math = Arithmetic, Geometry ;
</pre>

<p>

18/6 (AR)
Added a warning for clashing constants. A constant coming from multiple opened modules
was interpreted as "the first" found by the compiler, which was a source of
hard-to-find errors. Clashing is officially forbidden, but we chose to give a warning
instead of raising an error to begin with (in version 2.8).

<p>

30/1/2007 (AR)
Semantics of variants fixed for complex types. Officially, it was only
defined for basic types (Str and parameters). When used for records, results were
multiplicative, which was not usable. But now variants should work for any type.

<p>

<hr>

<p>

22/12 (AR) <b>Release of GF version 2.7</b>.

<p>

21/12 (AR)
Overloading rules for GF version 2.7:
<ol>
<li> If a unique instance is found by exact match with the argument types,
that instance is used.
<li> Otherwise, if exact match with the expected value type gives a
unique instance, that instance is used.
<li> Otherwise, if among the possible instances only one returns a non-function
type, that instance is used, but a warning is issued.
<li> Otherwise, an error results, and the list of possible instances is shown.
</ol>
These rules are still experimental, but all future developments will guarantee
that their type-correct use will work. Rule (3) is only needed because the
current type checker does not always know an expected type. It can give
an incorrect result, which is caught later in the compilation. Note,
in particular, that an exact match is required. Match by subtyping will be
investigated later.
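The four rules can be read as a resolution procedure. The following Python sketch is a simplified model of that reading, with type signatures encoded as (argument-types, value-type) pairs; it is not the compiler's actual algorithm or data representation:

```python
# Simplified model of the four overloading rules above (not GF's code).
def resolve_overload(instances, arg_types, expected=None):
    """instances: list of (args, value) type signatures.
    Returns (instance, warning) or raises ValueError listing candidates."""
    # Rule 1: a unique exact match with the argument types
    by_args = [i for i in instances if i[0] == arg_types]
    if len(by_args) == 1:
        return by_args[0], None
    # Rule 2: a unique exact match with the expected value type
    if expected is not None:
        by_val = [i for i in instances if i[1] == expected]
        if len(by_val) == 1:
            return by_val[0], None
    # Rule 3: a unique instance returning a non-function type, with a warning
    candidates = by_args or instances
    non_fun = [i for i in candidates if "->" not in i[1]]
    if len(non_fun) == 1:
        return non_fun[0], "warning: resolved by non-function value type"
    # Rule 4: error, showing the possible instances
    raise ValueError("ambiguous overload; candidates: %r" % (candidates,))
```

Rule (3) fires exactly when the expected type is unknown, which matches the remark that it compensates for the type checker not always knowing an expected type.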

<p>

21/12 (BB) Java Speech Grammar Format with SISR tags can now be generated.
Use <tt>pg -printer=jsgf_sisr_old</tt>. The SISR tags are in Working Draft
20030401 format, which is supported by the OptimTALK VoiceXML interpreter
and the IBM XHTML+Voice implementation used by the Opera web browser.

<p>

21/12 (BB) <a name="voicexml">
VoiceXML 2.0 dialog systems can now be generated from GF grammars.
Use <tt>pg -printer=vxml</tt>.

<p>

21/12 (BB) <a name="javascript">
JavaScript code for linearization and type annotation can now be
generated from a multilingual GF grammar. Use <tt>pm -printer=js</tt>.

<p>

5/12 (BB) <a name="gfcc2c">
A new tool for generating C linearization libraries
from a GFCC file. <tt>make gfcc2c</tt> in <tt>src</tt>
compiles the tool. The generated
code includes header files in <tt>lib/c</tt> and should be linked
against <tt>libgfcc.a</tt> in <tt>lib/c</tt>. For an example of
using the generated code, see <tt>src/tools/c/examples/bronzeage</tt>.
<tt>make</tt> in that directory generates a GFCC file, then generates
C code from that, and then compiles a program <tt>bronzeage-test</tt>.
The <tt>main</tt> function for that program is defined in
<tt>bronzeage-test.c</tt>.

<p>

20/11 (AR) Type error messages in concrete syntax are printed with a
heuristic where a type of the form <tt>{... ; lock_C : {} ; ...}</tt>
is printed as <tt>C</tt>. This gives more readable error messages, but
can produce wrong results if lock fields are hand-written or if subtypes
of lock-fielded categories are used.

<p>

17/11 (AR) <a name="overloading">
Operation overloading: an <tt>oper</tt> can have many types,
from which one is picked at compile time. The types must have different
argument lists. An exact match with the arguments given to the <tt>oper</tt>
is required. An example is given in
<a href="../lib/resource-1.0/doc/gfdoc/Constructors.gf"><tt>Constructors.gf</tt></a>.
The purpose of overloading is to make libraries easier to use, since
only one name is needed for each grammatical operation: predication, modification,
coordination, etc. The concrete syntax is, at this experimental level, not
extended but relies on using a record with the function name repeated
as label name (see the example). The treatment of overloading is inspired
by C++, and was first suggested by Björn Bringert.

<p>

3/10 (AR) A new low-level format <tt>gfcc</tt> ("Canonical Canonical GF").
It is going to replace the <tt>gfc</tt> format later, but is already now
an efficient format for multilingual generation.
See the <a href="../src/GF/Canon/GFCC/doc/gfcc.html">GFCC document</a>
for more information.

<p>

1/9 (AR) New way of managing errors in grammar compilation:
<pre>
Predef.Error : Type ;
Predef.error : Str -> Predef.Error ;
</pre>
Denotationally, <tt>Error</tt> is the empty type and thus a
subtype of any other type: it can be used anywhere. But the
<tt>error</tt> function is not canonical. Hence the compilation
is interrupted when <tt>(error s)</tt> is translated to GFC, and
the message <tt>s</tt> is emitted. An example use is given in
<tt>english/ParadigmsEng.gf</tt>:
<pre>
regDuplV : Str -> V ;
regDuplV fit =
  case last fit of {
    ("a" | "e" | "i" | "o" | "u" | "y") =>
      Predef.error (["final duplication makes no sense for"] ++ fit) ;
    t =>
      let fitt = fit + t in
      mkV fit (fit + "s") (fitt + "ed") (fitt + "ed") (fitt + "ing")
  } ;
</pre>
This function thus cannot be applied to a stem ending in a vowel,
which is exactly what we want. In the future, it may be good to add similar
checks to all morphological paradigms in the resource.
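The behaviour of regDuplV can be mimicked outside GF. The sketch below is an illustrative Python model of the same check-and-duplicate logic, not the resource code itself:

```python
# Illustrative model of regDuplV above (not the GF resource code).
def reg_dupl_v(fit):
    """Build the forms of an English verb whose final consonant is
    duplicated (fit -> fitted, fitting). Raises an error for
    vowel-final stems, as Predef.error does in the GF version."""
    last = fit[-1]
    if last in "aeiouy":
        raise ValueError("final duplication makes no sense for " + fit)
    fitt = fit + last  # duplicate the final consonant
    return [fit, fit + "s", fitt + "ed", fitt + "ed", fitt + "ing"]
```

The vowel check corresponds to the first case branch; everything else is the `mkV` application with the duplicated stem.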

<p>

16/8 (AR) New generation algorithm: slower, but works with less
memory. It is the default of <tt>gt</tt>; use <tt>gt -mem</tt> for the old
algorithm. The new option <tt>gt -all</tt> lazily generates all
trees until interrupted. It cannot be piped to other GF commands;
hence use <tt>gt -all -lin</tt> to print out linearized strings
rather than trees.

<hr>

22/6 (AR) <b>Release of GF version 2.6</b>.

<p>

20/6 (AR) The FCFG parser is now the default, as it even handles literals.
The old default can be selected by <tt>p -old</tt>. Since
FCFG does not support variable bindings, <tt>-old</tt> is automatically
selected if the grammar has bindings, unless the <tt>-fcfg</tt> flag
is used.

<p>

17/6 (AR) The FCFG parser is now the recommended method for parsing
heavy grammars such as the resource grammars. It does not yet support
literals and variable bindings.

<p>

1/6 (AR) Added the FCFG parser written by Krasimir Angelov. Invoked by
<tt>p -fcfg</tt>. This parser is as general as MCFG but faster.
It needs more testing and debugging.

<p>

1/6 (AR) The command <tt>r = reload</tt> repeats the latest
<tt>i = import</tt> command.

<p>

30/5 (AR) It is now possible to use the flags <tt>-all, -table, -record</tt>
in combination with <tt>l -multi</tt>, and also with <tt>tb</tt>.

<p>

18/5 (AR) Introduced a wordlist format <tt>gfwl</tt> for
quick creation of language exercises and (in the future) multilingual lexica.
The format is currently very simple:
<pre>
# Svenska - Franska - Finska
berg - montagne - vuori
klättra - grimper / escalader - kiivetä / kiipeillä
</pre>
but can be extended to cover paradigm functions in addition to just
words.

<p>

3/4 (AR) The predefined abstract syntax type <tt>Int</tt> now has two
inherent parameters indicating its last digit and its size. The (hard-coded)
linearization type is
<pre>
{s : Str ; size : Predef.Ints 1 ; last : Predef.Ints 9}
</pre>
The <tt>size</tt> field has value <tt>1</tt> for integers greater than 9, and
value <tt>0</tt> for other integers (which are never negative). This parameter can
be used e.g. in calculating number agreement:
<pre>
Risala i = {s = i.s ++ table (Predef.Ints 1 * Predef.Ints 9) {
  <0,1> => "risalah" ;
  <0,2> => "risalatan" ;
  <0,_> | <1,0> => "rasail" ;
  _ => "risalah"
  } ! <i.size,i.last>
} ;
</pre>
Notice that the table has to be typed explicitly for <tt>Ints k</tt>,
because type inference would otherwise return <tt>Int</tt> and therefore
fail to expand the table.
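How size and last drive the table above can be traced with a small model. The Python sketch below is illustrative only; the field values follow the description of the hard-coded linearization type, and the case order mirrors the Risala table:

```python
# Illustrative model of Int's inherent parameters and the Risala table
# above (not GF code).

def int_lin(i):
    """Model of {s : Str ; size : Ints 1 ; last : Ints 9}."""
    return {"s": str(i), "size": 1 if i > 9 else 0, "last": i % 10}

def risala(i):
    """Select the noun form by <size,last>, in the table's case order."""
    lin = int_lin(i)
    sel = (lin["size"], lin["last"])
    if sel == (0, 1):
        form = "risalah"
    elif sel == (0, 2):
        form = "risalatan"
    elif sel[0] == 0 or sel == (1, 0):
        form = "rasail"
    else:
        form = "risalah"
    return lin["s"] + " " + form
```

Note that the earlier cases shadow the later ones, just as in the GF table: `<0,1>` and `<0,2>` are consumed before the `<0,_>` pattern.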

<p>

31/3 (AR) Added flags and options to some commands, to help generation:
<ul>
<li> <tt>gt -noexpand=NP,V,TV</tt> does not expand these categories,
but only generates metavariables for them.
<li> <tt>gt -doexpand=NP,V,TV</tt> only expands these categories,
and generates metavariables for the others.
<li> <tt>gr -cf</tt> has the same flags.
<li> <tt>l -mark=metacat</tt> marks the metavariables with their categories.
<li> <tt>p -fail</tt> marks with <tt>#FAIL</tt> strings that have no parse.
<li> <tt>p -ambiguous</tt> marks as <tt>#AMBIGUOUS</tt>
strings that have more than one parse.
</ul>

<p>

<hr>

21/3/2006 <b>Release of GF 2.5</b>.

<p>

16/3 (AR) Added two flag values to <tt>pt -transform=X</tt>:
<tt>nodup</tt>, which excludes terms where a constant is duplicated,
and
<tt>nodupatom</tt>, which excludes terms where an atomic constant is duplicated.
The latter, in particular, is useful as a filter in generation:
<pre>
gt -cat=Cl | pt -transform=nodupatom
</pre>
This gives a corpus where words don't (usually) occur twice in the same clause.

<p>

6/3 (AR) Generalized the <tt>gfe</tt> file format in two ways:
<ol>
<li> Use the real grammar parser, so that <tt>(in M.C "foo")</tt> expressions
may occur anywhere. But the <i>ad hoc</i> word substitution syntax is
abandoned: ordinary <tt>let</tt> (and <tt>where</tt>) expressions
can now be used instead.
<li> The resource may now be a treebank, not just a grammar. Parsing
is thus replaced by treebank lookup, which in most cases is faster.
</ol>
A minor novelty is that the <tt>--# -resource=FILE</tt> flag can now be
relative to <tt>GF_LIB_PATH</tt>, both for grammars and treebanks.
The flag <tt> --# -treebank=IDENT</tt> gives the language whose treebank
entries are used, in the case of a multilingual treebank.

<p>

4/3 (AR) Added the command <tt>use_treebank = ut</tt> for lookup in a treebank.
This command can be used as a fast substitute for parsing, but also as a
way to browse treebanks.
<pre>
ut "He adds this to that" | l -multi -- use treebank lookup as parser in translation
ut -assocs | grep "ComplV2" -- show all associations with ComplV2
</pre>
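Treebank lookup as a fast substitute for parsing amounts to a table from strings to trees. The class below is a toy Python model of that idea (our own simplification; GF's shell-state treebanks additionally record the language of each entry, and trees are structured terms rather than strings):

```python
# Toy model of string-keyed treebank lookup, cf. the ut command (not GF code).
class Treebank:
    def __init__(self):
        self.entries = {}

    def add(self, sentence, tree):
        self.entries[sentence] = tree

    def lookup(self, sentence):
        # a fast substitute for parsing: dictionary lookup
        # instead of chart parsing
        return self.entries.get(sentence)

    def assocs(self, fun):
        # cf. 'ut -assocs | grep ...': entries whose tree mentions fun
        return {s: t for s, t in self.entries.items() if fun in t}
```

The trade-off is the usual one: lookup only covers sentences seen before, whereas a parser covers the whole language of the grammar.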

<p>

3/3 (AR) Added the option <tt>-treebank</tt> to the <tt>i</tt> command. This adds treebanks to
the shell state. The possible file formats are
<ol>
<li> an XML file with a multilingual treebank, produced by <tt>tb -xml</tt>
<li> a tab-organized text file with a unilingual treebank, produced by <tt>ut -assocs</tt>
</ol>
Notice that the treebanks in the shell state are unilingual, and have strings as keys.
Multilingual treebanks have trees as keys. In case 1, one unilingual treebank per
language is built in the shell state.

<p>

1/3 (AR) Added the option <tt>-trees</tt> to the command <tt>tree_bank = tb</tt>.
With this option, the command just returns the trees in the treebank. It can be
used for producing new treebanks with the same trees:
<pre>
rf old.xml | tb -trees | tb -xml | wf new.xml
</pre>
Recall that only treebanks in the XML format can be read with the <tt>-trees</tt>
and <tt>-c</tt> flags.

<p>

1/3 (AR) A <tt>.gfe</tt> file can have a <tt>--# -path=PATH</tt> line as its
second line. The file given on the first line (<tt>--# -resource=FILE</tt>)
is then read w.r.t. this path. This is useful if the resource file has
no path itself, which happens when it is gfc-only.

<p>

25/2 (AR) The flag <tt>preproc</tt> of the <tt>i</tt> command (and thereby
of <tt>gf</tt> itself) causes GF to apply a preprocessor to each source file
it reads.

<p>

8/2 (AR) The command <tt>tb = tree_bank</tt> for creating and testing against
multilingual treebanks. Example uses:
<pre>
gr -cat=S -number=100 | tb -xml | wf tb.xml -- random treebank into file
rf tb.txt | tb -c -- read comparison treebank from file
</pre>

<p>

10/1 (AR) Forbade variable binding inside negation and Kleene star
patterns.

<p>

7/1 (AR) Full set of regular expression patterns, with
as-patterns to enable variable bindings to matched expressions:
<ul>
<li> <i>p</i> <tt>+</tt> <i>q</i> : token consisting of <i>p</i> followed by <i>q</i>
<li> <i>p</i> <tt>*</tt> : token <i>p</i> repeated 0 or more times
(at most the length of the string to be matched)
<li> <tt>-</tt> <i>p</i> : matches anything that <i>p</i> does not match
<li> <i>x</i> <tt>@</tt> <i>p</i> : bind to <i>x</i> what <i>p</i> matches
<li> <i>p</i> <tt>|</tt> <i>q</i> : matches what either <i>p</i> or <i>q</i> matches
</ul>
The last three apply to all types of patterns, the first two only to token strings.
Example: plural formation in the Swedish 2nd declension
(<i>pojke-pojkar, nyckel-nycklar, seger-segrar, bil-bilar</i>):
<pre>
plural2 : Str -> Str = \w -> case w of {
  pojk + "e" => pojk + "ar" ;
  nyck + "e" + l@("l" | "r" | "n") => nyck + l + "ar" ;
  bil => bil + "ar"
} ;
</pre>
Semantics: variables are always bound to the <b>first match</b>, in the sequence defined
by the list <tt>Match p v</tt> as follows:
<pre>
Match (p1|p2) v = Match p1 v ++ Match p2 v
Match (p1+p2) s = [Match p1 s1 ++ Match p2 s2 | i <- [0..length s], (s1,s2) = splitAt i s]
Match p* s = Match "" s ++ Match p s ++ Match (p + p) s ++ ...
Match c v = [[]] if c == v -- for constant patterns c
Match x v = [[(x,v)]] -- for variable patterns x
Match x@p v = [[(x,v)]] + M if M = Match p v /= []
Match p v = [] otherwise -- failure
</pre>
Examples:
<ul>
<li> <tt>x + "e" + y</tt> matches <tt>"peter"</tt> with <tt>x = "p", y = "ter"</tt>
<li> <tt>x@("foo"*)</tt> matches any token with <tt>x = ""</tt>
<li> <tt>x + y@("er"*)</tt> matches <tt>"burgerer"</tt> with <tt>x = "burg", y = "erer"</tt>
</ul>
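The first-match semantics above can be executed directly. The Python sketch below implements the Match list for the |, +, @, constant, and variable cases (our own pattern encoding, not GF syntax; the * and - cases are omitted):

```python
# Executable sketch of the first-match semantics above (not GF code).
# Patterns: ("lit", c) | ("var", x) | ("seq", p, q) | ("alt", p, q)
#         | ("as", x, p)

def match(p, s):
    """All bindings of pattern p against token s, first match first.
    Returns a list of dicts (possibly empty)."""
    kind = p[0]
    if kind == "lit":                       # Match c v = [[]] if c == v
        return [{}] if p[1] == s else []
    if kind == "var":                       # Match x v = [[(x,v)]]
        return [{p[1]: s}]
    if kind == "alt":                       # Match (p1|p2) v
        return match(p[1], s) + match(p[2], s)
    if kind == "seq":                       # Match (p1+p2) s: all splits
        res = []
        for i in range(len(s) + 1):
            for m1 in match(p[1], s[:i]):
                for m2 in match(p[2], s[i:]):
                    res.append({**m1, **m2})
        return res
    if kind == "as":                        # Match x@p v
        return [{p[1]: s, **m} for m in match(p[2], s)]
    raise ValueError("unknown pattern")

def first_match(p, s):
    ms = match(p, s)
    return ms[0] if ms else None
```

Running `first_match` on the encoding of `x + "e" + y` against "peter" reproduces the first documented example.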
|
||||
<p>
|
||||
|
||||
6/1 (AR) Concatenative string patterns to help morphology definitions...
|
||||
This can be seen as a step towards regular expression string patterns.
|
||||
The natural notation <tt>p1 + p2</tt> will be considered later.
|
||||
<b>Note</b>. This was done on 7/1.
|
||||
|
||||
<p>
|
||||
|
||||
5/1/2006 (BB) New grammar printers <tt>slf_sub</tt> and <tt>slf_sub_graphviz</tt>
|
||||
for creating SLF networks with sub-automata.
|
||||
|
||||
<hr>
|
||||
|
||||
22/12 <b>Release of GF 2.4</b>.
|
||||
|
||||
<p>
|
||||
|
||||
21/12 (AR) It now works to parse escaped string literals from command
|
||||
line, and also string literals with spaces:
|
||||
<pre>
|
||||
gf examples/tram0/TramEng.gf
|
||||
> p -lexer=literals "I want to go to \"Gustaf Adolfs torg\" ;"
|
||||
QInput (GoTo (DestNamed "Gustaf Adolfs torg"))
|
||||
</pre>
<p>

20/12 (AR) Support for full disjunctive patterns (<tt>P|Q</tt>), i.e.
not just on the top level.

<p>

14/12 (BB) Added the command <tt>si</tt> (<tt>speech_input</tt>), which creates
a speech recognizer from a grammar for English and admits speech input
of strings. The command uses an
<a href="http://htk.eng.cam.ac.uk/develop/atk.shtml">ATK</a> recognizer and
creates a recognition network which accepts strings in the currently
active grammar.
In order to use the <tt>si</tt> command, you need to install the
<a href="http://www.cs.chalmers.se/~bringert/darcs/atkrec/">atkrec library</a>
and configure GF with <tt>./configure --with-atk</tt> before compiling.
You also need to set two environment variables for the <tt>si</tt> command to
work: <tt>ATK_HOME</tt> should contain the path to your copy of ATK,
and <tt>GF_ATK_CFG</tt> should contain the path to your GF ATK configuration
file. A default version of this file can be found in
<tt>GF/src/gf_atk.cfg</tt>.
<p>

11/12 (AR) Parsing of float literals is now possible in the object language.
Use the flag <tt>lexer=literals</tt>.

<p>

6/12 (AR) <tt>param</tt> and <tt>oper</tt> definitions are now accepted in
<tt>concrete</tt> modules. The definitions are just inlined in the
current module and not inherited. The purpose is to support rapid
prototyping of grammars.

<p>

2/12 (AR) The built-in type <tt>Float</tt> added to abstract syntax (and
resource). Values are stored as Haskell's <tt>Double</tt>-precision
floats. For the syntax of float literals, see the BNFC documentation.
NB: a bug still prevents parsing float literals in object
languages. <b>Bug fixed 11/12.</b>
<p>

1/12 (BB,AR) Added the command <tt>at = apply_transfer</tt>, which applies
a transfer function to a term. This is used for noncompositional
translation. Transfer functions are defined in a special transfer
language (file suffix <tt>.tr</tt>), which is compiled into a
run-time transfer core language (file suffix <tt>.trc</tt>).
The compiler is included in <tt>GF/transfer</tt>. The following is
a complete example of how to try out transfer:
<pre>
% cd GF/transfer
% make                           -- compile the trc compiler
% cd examples                    -- GF/transfer/examples
% ../compile_to_core -i../lib numerals.tr
% mv numerals.trc ../../examples/numerals
% cd ../../examples/numerals     -- GF/examples/numerals
% gf
> i decimal.gf
> i BinaryDigits.gf
> i numerals.trc
> p -lang=Cncdecimal "123" | at num2bin | l
1 0 0 1 1 0 0 1 1 1 0
</pre>
Other relevant commands are:
<ul>
<li> <tt>i file.trc</tt>: import a transfer module
<li> <tt>pg -printer=transfer</tt>: create a syntax datatype in <tt>.tr</tt> format
</ul>
For more information on the commands, see <tt>help</tt>. Documentation on
the transfer language is to appear.
<p>

17/11 (AR) Made it possible for lexers to be nondeterministic.
The current implementation is simple-minded: the parser is sent
each lexing result in turn. The option <tt>-cut</tt> is used for
stopping after the first lexing that leads to a successful parse. The only
nondeterministic lexer right now is <tt>-lexer=subseqs</tt>, which
first filters with <tt>-lexer=ignore</tt> (dropping words that are neither in
the grammar nor literals) and then starts ignoring other words, from the
longest to the shortest subsequence. This is usable for parsing tasks
of the keyword-spotting type, but expensive (2<sup>n</sup>) on long input.
A smarter implementation is therefore desirable.

<p>

14/11 (AR) Functions can be made unparsable (or "internal", as
in BNFC). This is done by <tt>i -noparse=file</tt>, where
the unparsable functions are given in <tt>file</tt> using the
line format <tt>--# noparse Funs</tt>. This can be used e.g. to
rule out expensive parsing rules. It is used in
<tt>lib/resource/abstract/LangVP.gf</tt> to get parse values
structured with <tt>VP</tt>, which is obtained via transfer.
So far only the default (= old) parser generator supports this.
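<p>
For instance (the function names here are made up for illustration, and it is
assumed that several functions can be listed on one <tt>--# noparse</tt> line),
such a file could contain:
<pre>
--# noparse PredVP AdvVP
</pre>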
<p>

14/11 (AR) Removed the restrictions on what a lincat may look like.
Now any record type that has a value in GFC (i.e. one without any
functions in it) can be used, e.g. <tt>{np : NP ; cn : Bool => CN}</tt>.
To display such linearization values, only <tt>l -record</tt> shows
nice results.

<p>

9/11 (AR) The GF shell state can now have several abstract syntaxes with
their associated concrete syntaxes. This allows e.g. parsing with a
resource grammar while testing an application. One can also have a
parse-transfer-lin chain from one abstract syntax to another.

<p>

7/11 (BB) Running commands can now be interrupted with Ctrl-C, without
killing the GF process. This feature is not supported on Windows.

<p>

1/11 (AR) Yet another method for adding probabilities: append
<tt> --# prob Double</tt> to the end of a line defining a function.
This can be (1) a <tt>.cf</tt> rule, (2) a <tt>fun</tt> rule, or
(3) a <tt>lin</tt> rule. The probability is attached to the
first identifier on the line.
<p>

1/11 (BB) Added generation of weighted SRGS grammars. The weights
are calculated from the function probabilities. The algorithm
for calculating the weights is not yet very good.
Use <tt>pg -printer=srgs_xml_prob</tt>.

<p>

31/10 (BB) Added an option for converting grammars to SRGS grammars in XML format.
Use <tt>pg -printer=srgs_xml</tt>.

<p>

31/10 (AR) Probabilistic grammars. Probabilities can be used to
weight random generation (<tt>gr -prob</tt>) and to rank parse
results (<tt>p -prob</tt>). They are read from a separate file
(flag <tt>i -probs=File</tt>, format <tt>--# prob Fun Double</tt>)
or from the top-level grammar file itself (option <tt>i -prob</tt>).
To see the probabilities, use <tt>pg -printer=probs</tt>.
<br>
As a by-product, the probabilistic random generation algorithm is
available for any context-free abstract syntax. Use the flag
<tt>gr -cf</tt>. This algorithm is much faster than the
old (more general) one, but it may sometimes loop.
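<p>
As a sketch of the stated <tt>--# prob Fun Double</tt> format (the function
names and numbers are illustrative only), a probabilities file assigns one
function per line:
<pre>
--# prob Conj 0.4
--# prob Disj 0.4
--# prob Falsum 0.2
</pre>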
<p>

12/10 (AR) The flag <tt>-atoms=Int</tt> to the command <tt>gt = generate_trees</tt>
keeps only <tt>Int</tt> zero-argument functions per category and removes the rest. In
this way, it is possible to generate a corpus illustrating each
syntactic structure even when the lexicon (which consists of
zero-argument functions) is large.

<p>

6/10 (AR) New commands <tt>dc = define_command</tt> and
<tt>dt = define_tree</tt> to define macros in a GF session.
See <tt>help</tt> for details and examples.

<p>

5/10 (AR) Printing of missing linearization rules:
<tt>pm -printer=missing</tt>. Added the command <tt>g = grep</tt>,
which works in a way similar to Unix grep.

<p>

5/10 (PL) Printing graphs with function and category dependencies:
<tt>pg -printer=functiongraph</tt>, <tt>pg -printer=typegraph</tt>.
<p>

20/9 (AR) Added optimization by <b>common subexpression elimination</b>.
It works on GFC modules and creates <tt>oper</tt> definitions for
subterms that occur more than once in <tt>lin</tt> definitions. These
<tt>oper</tt> definitions are automatically re-inlined in functionalities
that don't support <tt>oper</tt>s in GFC. This conversion is done per
module, and the <tt>oper</tt>s are not inherited. Moreover, the subterms
can contain free variables, which means that the <tt>oper</tt>s are not
always well typed. However, since all variables in GFC are type-specific
(and local variables are <tt>lin</tt>-specific), this does not destroy
subject reduction or cause illegal captures.
<br>
The optimization is triggered by the flag <tt>optimize=OPT_subs</tt>,
where <tt>OPT</tt> is any of the other optimizations (see <tt>h -optimize</tt>).
The most aggressive value of the flag is <tt>all_subs</tt>. In experiments,
the size of a GFC module can shrink by 85% compared to plain <tt>all</tt>.

<p>

18/9 (AR) Removed superfluous spaces from GFC printing. This shrinks
the GFC size by 5-10%.

<p>

15/9 (AR) Fixed some bugs in the compile-time type checking of dependent
types in abstract modules. The type checker is stricter now, which means
that some old grammars may fail to compile - but this is usually the
right result. However, the type checker for <tt>def</tt> judgements still
needs work.
<p>

14/9 (AR) Added printing of grammars to a format without parameters, in
the spirit of Peano's "Latino sine flexione". The command <tt>pg -unpar</tt>
does the trick, and the result can be saved in a <tt>gfcm</tt> file. The generated
concrete syntax modules get the prefix <tt>UP_</tt>. The translation is briefly:
<pre>
(P => T)* = T*
(t ! p)* = t*
(table {p => t ; ...})* = t*
</pre>
In order for this to be maximally useful, the grammar should be written in such
a way that the first value of every parameter type is the desired one. For
instance, in Peano's case it would be the ablative for noun cases, the singular for
numbers, and the 2nd person singular imperative for verb forms.
<p>

14/9 (BB) Added finite-state approximation of grammars.
Internally the conversion is done <tt>cfg -> regular -> fa -> slf</tt>, so the
different printers can be used to check the output of each stage.
The new options are:
<dl>
<dt><tt>pg -printer=slf</tt></dt>
<dd>A finite automaton in the HTK SLF format.</dd>
<dt><tt>pg -printer=slf_graphviz</tt></dt>
<dd>The same FA as in SLF, but in Graphviz format.</dd>
<dt><tt>pg -printer=fa_graphviz</tt></dt>
<dd>A finite automaton with labelled edges, instead of the labelled nodes that SLF has.</dd>
<dt><tt>pg -printer=regular</tt></dt>
<dd>A regular grammar in a simple BNF.</dd>
</dl>

<p>

4/9 (AR) Added the option <tt>pg -printer=stat</tt> to show
statistics of the GFC compilation result. To be extended with new information.
The most important stats now are the top-40 largest definitions.

<p>
<hr>
1/7 <b>Release of GF 2.3</b>.

<p>

1/7 (AR) Added the flag <tt>-o</tt> to the <tt>vt</tt> command
to just write the <tt>.dot</tt> file without going on to <tt>.ps</tt>
(cf. 20/6).

<p>

29/6 (AR) The printer used by the Embedded Java GF Interpreter
(<tt>pm -header</tt>) now produces
working code from all optimized grammars - hence you need not select a
weaker optimization just to use the interpreter. However, the
optimization <tt>-optimize=share</tt> usually produces smaller object
grammars, because the "unoptimizer" just undoes all optimizations.
(This is to be considered a temporary solution until the interpreter
knows how to handle stronger optimizations.)
<p>

27/6 (AR) The flag <tt>flags optimize=noexpand</tt> placed in a
resource module prevents the optimization phase of the compiler when
the <tt>.gfr</tt> file is created. This can prevent serious code
explosion, but it will also make the processing of modules using the
resource slower. A favourable example is <tt>lib/resource/finnish/ParadigmsFin</tt>.

<p>

23/6 (HD,AR) The new editor GUI <tt>gfeditor</tt> by Hans-Joachim
Daniels can now be used. It is based on Janna Khegai's <tt>jgf</tt>.
New functionality includes an HTML display (<tt>gfeditor -h</tt>) and
programmable refinement tooltips.

<p>

23/6 (AR) The flag <tt>unlexer=finnish</tt> can be used to bind
Finnish suffixes (e.g. possessives) to preceding words. The GF source
notation is e.g. <tt>"isä" ++ "&*" ++ "nsa" ++ "&*" ++ "ko"</tt>,
which unlexes to <tt>"isänsäkö"</tt>. There is no corresponding lexer
support yet.
<p>

22/6 (PL,AR) The MCFG parser (<tt>p -mcfg</tt>) now works on all
optimized grammars - hence you need not select a weaker optimization
to use this parser. The same holds for the CFGM printer (<tt>pm -printer=cfgm</tt>).

<p>

20/6 (AR) Added the command <tt>visualize_tree</tt> = <tt>vt</tt>, to
display syntax trees graphically. Like <tt>vg</tt>, this command uses
GraphViz and Ghostview. The foremost use is to pipe the parser to this
command.

<p>
17/6 (BB) There is now support for lists in GF abstract syntax.
A list category is declared as:
<pre>
cat [C]
</pre>
or
<pre>
cat [C]{n}
</pre>
where <tt>C</tt> is a category and <tt>n</tt> is a non-negative integer.
<tt>cat [C]</tt> is equivalent to <tt>cat [C]{0}</tt>. List category
syntax can be used wherever categories are used.

<p>

<tt>cat [C]{n}</tt> is equivalent to the declarations:
<pre>
cat ListC
fun BaseC : C^n -> ListC
fun ConsC : C -> ListC -> ListC
</pre>
where <tt>C^0 -> X</tt> means <tt>X</tt>, and <tt>C^m</tt> (where
m > 0) means <tt>C -> C^(m-1)</tt>.

<p>

A lincat declaration of the form:
<pre>
lincat [C] = T
</pre>
is equivalent to
<pre>
lincat ListC = T
</pre>
The linearizations of the list constructors are written
just as they would be if the function declarations above
had been made manually, e.g.:
<pre>
lin BaseC x_1 ... x_n = t
lin ConsC x xs = t'
</pre>
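<p>
For instance (a sketch; the category name <tt>S</tt> and the linearizations are
made up for illustration), <tt>cat [S]{2}</tt> introduces <tt>ListS</tt> with
<tt>BaseS : S -> S -> ListS</tt> and <tt>ConsS : S -> ListS -> ListS</tt>,
which can be linearized as a comma-separated list:
<pre>
cat [S]{2} ;

lincat [S] = {s : Str} ;
lin BaseS x y  = {s = x.s ++ "and" ++ y.s} ;  -- two base arguments, since n = 2
lin ConsS x xs = {s = x.s ++ "," ++ xs.s} ;   -- "A , B and C"
</pre>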
<p>

10/6 (AR) Preprocessing of <tt>.gfe</tt> files can now be performed as part of
any grammar compilation. The flag <tt>-ex</tt> causes GF to look for
the <tt>.gfe</tt> files and preprocess those that are younger
than the corresponding <tt>.gf</tt> files. The files are first sorted
and grouped by resource, so that each resource needs to be compiled only once.

<p>

10/6 (AR) The editor GUI can now alternatively be invoked by the shell
command <tt>gf -edit</tt> (equivalent to <tt>jgf</tt>).

<p>

10/6 (AR) Editor GUI command <tt>pc Int</tt> to pop <tt>Int</tt>
items from the clipboard.

<p>

4/6 (AR) Sequences of commands are now possible in the Java editor GUI.
The commands are separated by <tt> ;; </tt> (notice the space on
both sides of the two semicolons). Such a sequence can be sent
from the "GF Command" pop-up field, but is mostly intended
for external processes that communicate with GF.
<p>

3/6 (AR) The format <tt>.gfe</tt> is defined to support
<b>grammar writing by examples</b>. Files of this format are first
converted to <tt>.gf</tt> files by the command
<pre>
gf -examples File.gfe
</pre>
See <a href="../lib/resource/doc/example/QuestionsI.gfe">
<tt>../lib/resource/doc/examples/QuestionsI.gfe</tt></a>
for an example.

<p>

31/5 (AR) Default of <tt>p -rawtrees=k</tt> changed to 999999.

<p>
31/5 (AR) Support for restricted inheritance. Syntax:
<pre>
M          -- inherit everything from M, as before
M [a,b,c]  -- only inherit the constants a,b,c
M-[a,b,c]  -- inherit everything except a,b,c
</pre>
Caution: there is no check yet for completeness and
consistency, so restricted inheritance can create
run-time failures.
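<p>
For example (the module and function names here are purely illustrative), an
abstract syntax reusing everything from a module <tt>Logic</tt> except one
function could be declared as:
<pre>
abstract Logic2 = Logic - [Falsum] ** {
  fun Neg : Prop -> Prop ;
}
</pre>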
<p>

29/5 (AR) Parser support for reading GFC files line by line.
The category <tt>Line</tt> in <tt>GFC.cf</tt> can be used
as an entry point instead of <tt>Grammar</tt> to achieve this.

<p>

28/5 (AR) Environment variables and path wildcards.
<ul>
<li> <tt>GF_LIB_PATH</tt> gives the location of <tt>GF/lib</tt>
<li> <tt>GF_GRAMMAR_PATH</tt> gives a list of directories appended
to the explicitly given path
<li> <tt>DIR/*</tt> is expanded to the union of all subdirectories
of <tt>DIR</tt>
</ul>

<p>

26/5/2005 (BB) Notation for list categories.

</body>
</html>
doc/gf-modules.html
The Module System of GF
Aarne Ranta
8/4/2005 - 5/7/2007

%!postproc(html): #SUB1 <sub>1</sub>
%!postproc(html): #SUBk <sub>k</sub>
%!postproc(html): #SUBi <sub>i</sub>
%!postproc(html): #SUBm <sub>m</sub>
%!postproc(html): #SUBn <sub>n</sub>
%!postproc(html): #SUBp <sub>p</sub>
%!postproc(html): #SUBq <sub>q</sub>

% to compile: txt2tags --toc -thtml modulesystem.txt
A GF grammar consists of a set of **modules**, which can be
combined in different ways to build different grammars.
There are several different **types of modules**:
- ``abstract``
- ``concrete``
- ``resource``
- ``interface``
- ``instance``
- ``incomplete concrete``


We will go through the module types in this order, which is also
their order of "importance", from the most basic to
the more advanced ones.

This document presupposes knowledge of GF judgements and expressions, which can
be gained from the [GF tutorial tutorial/gf-tutorial2.html]. It aims
to give a systematic description of the module system;
some tutorial information is repeated to make the document
self-contained.

=The principal module types=

==Abstract syntax==

Any GF grammar that is used in an application
will probably contain at least one module
of the ``abstract`` module type. Here is an example of
such a module, defining a fragment of propositional logic.
```
abstract Logic = {
  cat Prop ;
  fun Conj : Prop -> Prop -> Prop ;
  fun Disj : Prop -> Prop -> Prop ;
  fun Impl : Prop -> Prop -> Prop ;
  fun Falsum : Prop ;
}
```
The **name** of this module is ``Logic``.

An ``abstract`` module defines an **abstract syntax**, which
is a language-independent representation of a fragment of language.
It consists of two kinds of **judgements**:
- ``cat`` judgements telling what **categories** there are
  (types of abstract syntax trees)
- ``fun`` judgements telling what **functions** there are
  (to build abstract syntax trees)


There can also be ``def`` and ``data`` judgements in an
abstract syntax.

===Compilation of abstract syntax===

The GF grammar compiler expects to find the module ``Logic`` in a file named
``Logic.gf``. When the compiler is run, it produces
another file, named ``Logic.gfc``. This file is in the
format called **canonical GF**, which is the "machine language"
of GF. The next time the module ``Logic`` is needed in
compiling a grammar, it can be read from the compiled (``gfc``)
file instead of the source (``gf``) file, unless the source
has been changed after the compilation.

==Concrete syntax==

In order for a GF grammar to describe a concrete language, the abstract
syntax must be completed with a **concrete syntax**.
For this purpose, we use modules of type ``concrete``: for instance,
```
concrete LogicEng of Logic = {
  lincat Prop = {s : Str} ;
  lin Conj a b = {s = a.s ++ "and" ++ b.s} ;
  lin Disj a b = {s = a.s ++ "or" ++ b.s} ;
  lin Impl a b = {s = "if" ++ a.s ++ "then" ++ b.s} ;
  lin Falsum = {s = ["we have a contradiction"]} ;
}
```
The module ``LogicEng`` is a concrete syntax ``of`` the
abstract syntax ``Logic``. The GF grammar compiler checks that
the concrete syntax is valid with respect to the abstract syntax ``of``
which it is claimed to be. Validity requires that there is
- a ``lincat`` judgement for each ``cat`` judgement, telling what the
  **linearization types** of categories are
- a ``lin`` judgement for each ``fun`` judgement, telling what the
  **linearization functions** corresponding to functions are


Validity also requires that the linearization functions defined by
``lin`` judgements are type-correct with respect to the
linearization types of the arguments and value of the function.

There can also be ``lindef`` and ``printname`` judgements in a
concrete syntax.

==Top-level grammar==

When a ``concrete`` module is successfully compiled, a ``gfc``
file is produced in the same way as for ``abstract`` modules. The
pair of an ``abstract`` and a corresponding ``concrete`` module
is a **top-level grammar**, which can be used in the GF system to
perform various tasks. The most fundamental tasks are
- **linearization**: take an abstract syntax tree and find the corresponding string
- **parsing**: take a string and find the corresponding abstract syntax
  trees (of which there can be zero, one, or many)


In the current grammar, infinitely many trees and strings are recognized, although
none that are very interesting. For example, the tree
```
Impl (Disj Falsum Falsum) Falsum
```
has the linearization
```
if we have a contradiction or we have a contradiction then we have a contradiction
```
which in turn can be parsed uniquely as that tree.

===Compiling top-level grammars===

When GF compiles the module ``LogicEng``, it also has to compile
all modules that it **depends** on (in this case, just ``Logic``).
The compilation process starts with a dependency analysis to find
all these modules, recursively, starting from the explicitly imported one.
The compiler then reads either ``gf`` or ``gfc`` files, in
dependency order. The decision on which files to read depends on
time stamps and dependencies in a natural way, so that all and only
those modules that have to be compiled are compiled. (This behaviour can
be changed with flags; see below.)

===Using top-level grammars===

To use a top-level grammar in the GF system, one uses the ``import``
command (short name ``i``). For instance,
```
i LogicEng.gf
```
It is also possible to specify the imported grammar(s) on the command
line when invoking GF:
```
gf LogicEng.gf
```
Various **compilation flags** can be added to both ways of compiling a module:
- ``-src`` forces compilation from source files
- ``-v`` gives more verbose information on compilation
- ``-s`` makes compilation silent (except if it fails with an error message)


A complete list of flags can be obtained in GF by ``help i``.

Importing a grammar makes it visible in GF's **internal state**. To see
what modules are available, use the command ``print_options`` (``po``).
You can empty the state with the command ``empty`` (``e``); this is
needed if you want to read in grammars with a different abstract syntax
than the current one without exiting GF.

Grammar modules can reside in different directories. They can then be found
by means of a **search path**, which is a flag such as
```
-path=.:api/toplevel:prelude
```
given to the ``import`` command or the shell command invoking GF.
(It can also be defined in the grammar file; see below.) The compiler
writes every ``gfc`` file in the same directory as the corresponding
``gf`` file.

The ``path`` is relative to the working directory ``pwd``, so that
all directories listed are primarily interpreted as subdirectories of
``pwd``. Secondarily, they are searched relative to the value of the
environment variable ``GF_LIB_PATH``, which is by default set to
``/usr/local/share/GF``.

Parsing and linearization can be performed with the ``parse``
(``p``) and ``linearize`` (``l``) commands, respectively.
For instance,
```
> l Impl (Disj Falsum Falsum) Falsum
if we have a contradiction or we have a contradiction then we have a contradiction

> p -cat=Prop "we have a contradiction"
Falsum
```
Notice that the ``parse`` command needs the parsing category
as a flag. This is necessary since a grammar can have several
possible parsing categories ("entry points").

==Multilingual grammar==

One ``abstract`` syntax can have several ``concrete`` syntaxes.
Here are two new ones for ``Logic``:
```
concrete LogicFre of Logic = {
  lincat Prop = {s : Str} ;
  lin Conj a b = {s = a.s ++ "et" ++ b.s} ;
  lin Disj a b = {s = a.s ++ "ou" ++ b.s} ;
  lin Impl a b = {s = "si" ++ a.s ++ "alors" ++ b.s} ;
  lin Falsum = {s = ["nous avons une contradiction"]} ;
}

concrete LogicSymb of Logic = {
  lincat Prop = {s : Str} ;
  lin Conj a b = {s = "(" ++ a.s ++ "&" ++ b.s ++ ")"} ;
  lin Disj a b = {s = "(" ++ a.s ++ "v" ++ b.s ++ ")"} ;
  lin Impl a b = {s = "(" ++ a.s ++ "->" ++ b.s ++ ")"} ;
  lin Falsum = {s = "_|_"} ;
}
```
The four modules ``Logic``, ``LogicEng``, ``LogicFre``, and
``LogicSymb`` together form a **multilingual grammar**, in which
it is possible to perform parsing and linearization with respect to any
of the concrete syntaxes. As a combination of parsing and linearization,
one can also perform **translation** from one language to another.
(By **language** we mean the set of expressions generated by one
concrete syntax.)

===Using multilingual grammars===

Any combination of abstract syntax and corresponding concrete syntaxes
is thus a multilingual grammar. With many languages and other enrichments
(as described below), a multilingual grammar easily grows to the size of
tens of modules. The grammar developer, having finished her job, can
package the result in a **multilingual canonical grammar**, a file
with the suffix ``.gfcm``. For instance, to compile the set of grammars
described so far, the following sequence of GF commands can be used:
```
i LogicEng.gf
i LogicFre.gf
i LogicSymb.gf
pm | wf logic.gfcm
```
The "end user" of the grammar only needs the file ``logic.gfcm`` to
access all the functionality of the multilingual grammar. It can be
imported in the GF system in the same way as ``.gf`` files. But
it can also be used in the
[Embedded Java Interpreter for GF http://www.cs.chalmers.se/~bringert/gf/gf-java.html]
to build Java programs of which the multilingual grammar functionalities
(linearization, parsing, translation) form a part.

In a multilingual grammar, the concrete syntax module names work as
names of languages that can be selected for linearization and parsing:
```
> l -lang=LogicFre Impl Falsum Falsum
si nous avons une contradiction alors nous avons une contradiction

> l -lang=LogicSymb Impl Falsum Falsum
( _|_ -> _|_ )

> p -cat=Prop -lang=LogicSymb "( _|_ & _|_ )"
Conj Falsum Falsum
```
The option ``-multi`` gives linearizations in all languages:
```
> l -multi Impl Falsum Falsum
if we have a contradiction then we have a contradiction
si nous avons une contradiction alors nous avons une contradiction
( _|_ -> _|_ )
```
Translation can be obtained by using a **pipe** from a parser
to a linearizer:
```
> p -cat=Prop -lang=LogicSymb "( _|_ & _|_ )" | l -lang=LogicEng
if we have a contradiction then we have a contradiction
```

==Resource modules==

The ``concrete`` modules shown above would look much nicer if
we used the main idea of functional programming: avoid repetitive
code by using **functions** that capture repeated patterns of
expressions. A collection of such functions can be a valuable
**resource** for a programmer, reusable in many different
top-level grammars. Thus we introduce the ``resource``
module type, with the first example
```
resource Util = {
  oper SS : Type = {s : Str} ;
  oper ss : Str -> SS = \s -> {s = s} ;
  oper paren : Str -> Str = \s -> "(" ++ s ++ ")" ;
  oper infix : Str -> SS -> SS -> SS = \h,x,y ->
    ss (x.s ++ h ++ y.s) ;
  oper infixp : Str -> SS -> SS -> SS = \h,x,y ->
    ss (paren (infix h x y)) ;
}
```
Modules of ``resource`` type have two forms of judgement:

- ``oper``, defining auxiliary operations
- ``param``, defining parameter types


A ``resource`` can be used in a ``concrete`` (or in another
``resource``) by ``open``ing it. This means that
all operations (and parameter types) defined in the resource
module become usable in the module that opens it. For instance,
we can rewrite the module ``LogicSymb`` much more concisely:
```
concrete LogicSymb of Logic = open Util in {
  lincat Prop = SS ;
  lin Conj = infixp "&" ;
  lin Disj = infixp "v" ;
  lin Impl = infixp "->" ;
  lin Falsum = ss "_|_" ;
}
```
What happens when this variant of ``LogicSymb`` is
compiled is that the ``oper``-defined constants
of ``Util`` are **inlined** in the
right-hand sides of the judgements of ``LogicSymb``,
and these expressions are **partially evaluated**, i.e.
computed as far as possible. The generated ``gfc`` file
will look just like the file generated for the first version
of ``LogicSymb`` - at least, it will do the same job.

Several ``resource`` modules can be ``open``ed
at the same time. If the modules contain the same names, the
conflict can be resolved by **qualified** opening and
reference. For instance,
```
concrete LogicSymb of Logic = open Util, Prelude in { ...
} ;
```
(where ``Prelude`` is a standard library of GF) brings
into scope two definitions of the constant ``SS``.
To specify which one is used, you can write
``Util.SS`` or ``Prelude.SS`` instead of just ``SS``.
You can also introduce abbreviations to avoid long qualifiers, e.g.
```
concrete LogicSymb of Logic = open (U=Util), (P=Prelude) in { ...
} ;
```
which means that you can write ``U.SS`` and ``P.SS``.

Judgements of ``param`` and ``oper`` form may also be used
in ``concrete`` modules; they are then considered local
to those modules, i.e. they are not exported.

===Compiling resource modules===

The compilation of a ``resource`` module differs
from the compilation of ``abstract`` and
``concrete`` modules because ``oper`` operations
do not in general have values in ``gfc``. A ``gfc``
file //is// generated, but it contains only
``param`` judgements (recall that ``oper``s
are inlined at their top-level use sites, so it is not
necessary to save them in the compiled grammar).
However, since computing the operations over and over
again can be time-consuming, and since type checking
``resource`` modules also takes time, a third kind
of file is generated for resource modules: a ``.gfr``
file. This file is written in the GF source code notation,
but it is type checked and type annotated, and ``oper``s
are computed as far as possible.

If you look at any ``gfc`` or ``gfr`` file generated
by the GF compiler, you will see that all names have been replaced by
their qualified variants. This qualification is an important first step
(after parsing) that the compiler performs. As for the commands in the
GF shell, some output qualified names and some do not; the difference
does not always follow firm principles.

===Using resource modules===

The typical use is through ``open`` in a
``concrete`` module, which means that
``resource`` modules are not imported on their own.
However, in the development and testing phase of grammars, it
can be useful to evaluate ``oper``s with different
arguments. To prevent them from being thrown away after inlining, the
``-retain`` option can be used:
```
> i -retain Util.gf
```
The command ``compute_concrete`` (``cc``)
can now be used for evaluating expressions that may contain
operations defined in ``Util``:
```
> cc ss (paren "foo")
{s = "(" ++ "foo" ++ ")"}
```
To find out what ``oper``s are available for a given type,
the command ``show_operations`` (``so``) can be used:
```
> so SS
Util.ss : Str -> SS ;
Util.infix : Str -> SS -> SS -> SS ;
Util.infixp : Str -> SS -> SS -> SS ;
```

==Inheritance==

The most characteristic modularity of GF lies in the division of
grammars into ``abstract``, ``concrete``, and
``resource`` modules. This permits writing multilingual
grammars and sharing a maximum of code between different
languages.

In addition to this special kind of modularity, GF provides **inheritance**,
which is familiar from other programming languages (in particular,
object-oriented ones). Inheritance means that a module inherits all
judgements from another module; we also say that it **extends**
the other module. Inheritance is useful for dividing big grammars into
smaller units, and for reusing the same units in different bigger
grammars.

The first example of inheritance is for abstract syntax. Let us
extend the module ``Logic`` to ``Arithmetic``:
```
abstract Arithmetic = Logic ** {
  cat Nat ;
  fun Even : Nat -> Prop ;
  fun Odd : Nat -> Prop ;
  fun Zero : Nat ;
  fun Succ : Nat -> Nat ;
}
```
In parallel with the extension of the abstract syntax
``Logic`` to ``Arithmetic``, we can extend
the concrete syntax ``LogicEng`` to ``ArithmeticEng``:
```
concrete ArithmeticEng of Arithmetic = LogicEng ** open Util in {
  lincat Nat = SS ;
  lin Even x = ss (x.s ++ "is" ++ "even") ;
  lin Odd x = ss (x.s ++ "is" ++ "odd") ;
  lin Zero = ss "zero" ;
  lin Succ x = ss ("the" ++ "successor" ++ "of" ++ x.s) ;
}
```
Another extension of ``Logic`` is ``Geometry``:
```
abstract Geometry = Logic ** {
  cat Point ;
  cat Line ;
  fun Incident : Point -> Line -> Prop ;
}
```
The corresponding concrete syntax is left as an exercise.

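As a starting point for that exercise, a ``GeometryEng`` can follow the
same pattern as ``ArithmeticEng`` above. This is only a sketch: the
English wording chosen for ``Incident`` is our own, not a fixed part of
the grammar.
```
concrete GeometryEng of Geometry = LogicEng ** open Util in {
  lincat Point = SS ;
  lincat Line = SS ;
  lin Incident p l = ss (p.s ++ "is" ++ "incident" ++ "with" ++ l.s) ;
}
```
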
===Multiple inheritance===

Inheritance can be **multiple**, which means that a module
may extend many modules at the same time. Suppose, for instance,
that we want to build a module for mathematics covering both
arithmetic and geometry, and the underlying logic. We then write
```
abstract Mathematics = Arithmetic, Geometry ** {
} ;
```
We could of course add some new judgements in this module, but
it is not necessary to do so. If no new judgements are added, the
module body can be omitted:
```
abstract Mathematics = Arithmetic, Geometry ;
```

The module ``Mathematics`` shows that it is possible
to extend a module already built by extension. The correctness
criterion for extensions is that the same name
(``cat``, ``fun``, ``oper``, or ``param``)
may not be defined twice in the resulting union of names.
That the names defined in ``Logic`` are "inherited twice"
by ``Mathematics`` (via both ``Arithmetic`` and
``Geometry``) is no violation of this rule; the usual
problems of multiple inheritance do not arise, since
the definitions of inherited constants cannot be changed.

===Restricted inheritance===

Inheritance can be **restricted**, which means that only some of
the constants are inherited. There are two dual notations for this:
```
A [f,g]
```
meaning that //only// ``f`` and ``g`` are inherited from ``A``, and
```
A-[f,g]
```
meaning that //everything except// ``f`` and ``g`` is inherited from ``A``.

Constants that are not inherited may be redefined in the inheriting module.

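For instance, a variant symbolic notation could inherit everything from
``LogicSymb`` except ``Disj`` and then redefine it with another symbol.
Both the module name and the new symbol below are invented for illustration:
```
concrete LogicSymb2 of Logic = LogicSymb - [Disj] ** open Util in {
  lin Disj = infixp "|" ;
}
```
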
===Compiling inheritance===

Inherited judgements are not copied into the inheriting modules.
Instead, an **indirection** is created for each inherited name,
as can be seen by looking into the generated ``gfc`` (and
``gfr``) files. Thus, for instance, the names
```
Mathematics.Prop Arithmetic.Prop Geometry.Prop Logic.Prop
```
all refer to the same category, declared in the module
``Logic``.

===Inspecting grammar hierarchies===

The command ``visualize_graph`` (``vg``) shows the
dependency graph in the current GF shell state. The graph can
also be saved in a file and used e.g. in documentation, by the
command ``print_multi -graph`` (``pm -graph``).

The ``vg`` command uses the free software packages Graphviz (command ``dot``)
and Ghostscript (command ``gv``).

==Reuse of top-level grammars as resources==

Top-level grammars have a straightforward translation to
``resource`` modules. The translation concerns
pairs of abstract-concrete judgements:
```
cat C ;          ===>   oper C : Type = T ;
lincat C = T ;

fun f : A ;      ===>   oper f : A = t ;
lin f = t ;
```
Due to this translation, a ``concrete`` module
can be ``open``ed in the same way as a
``resource`` module; the translation is done
on the fly (it is computationally very cheap).

Modular grammar engineering often means that some grammarians
focus on the semantics of the domain whereas others take care
of linguistic details. Thus a typical reuse opens a
linguistically oriented **resource grammar**,
```
abstract Resource = {
  cat S ; NP ; A ;
  fun PredA : NP -> A -> S ;
}
concrete ResourceEng of Resource = {
  lincat S = ... ;
  lin PredA = ... ;
}
```
The **application grammar**, instead of giving linearizations
explicitly, just reduces them to categories and functions in the
resource grammar:
```
concrete ArithmeticEng of Arithmetic = LogicEng ** open ResourceEng in {
  lincat Nat = NP ;
  lin Even x = PredA x (regA "even") ;
}
```
If the resource grammar is only capable of generating grammatically
correct expressions, then the grammaticality of the application
grammar is also guaranteed: the type checker of GF is used as a
grammar checker.

To guarantee distinctions between categories that have
the same linearization type, the actual translation used
in GF adds a **lock field** to every linearization type and linearization:
```
cat C ;          ===>   oper C : Type = T ** {lock_C : {}} ;
lincat C = T ;

fun f : C_1 ... C_n -> C ;   ===>   oper f : C_1 ... C_n -> C = \x_1,...,x_n ->
lin f = t ;                           t x_1 ... x_n ** {lock_C = <>} ;
```
(Notice that the latter translation is type-correct because of
record subtyping, which means that ``t`` can ignore the
lock fields of its arguments.) An application grammarian who
only uses resource grammar categories and functions never
needs to write these lock fields herself. Having to do so
serves as a warning that the grammaticality guarantee given
by the resource grammar no longer holds.

**Note**. The lock field mechanism is experimental, and may be changed
to a stronger abstraction mechanism in the future. This may result in
hand-written lock fields ceasing to work.

=Additional module types=

==Interfaces, instances, and incomplete grammars==

One difference between top-level grammars and ``resource``
modules is that the former systematically separate the
declarations of categories and functions from their definitions.
In the reuse translation creating an ``oper`` judgement,
the declaration coming from the ``abstract`` module is put
together with the definition coming from the ``concrete``
module.

However, the separation of declarations and definitions is so
useful a notion that GF also has specific module types that split
``resource`` modules into two parts. In this splitting,
an ``interface`` module corresponds to an abstract syntax,
in giving the declarations of operations (and parameter types).
For instance, a generic markup interface would look as follows:
```
interface Markup = open Util in {
  oper Boldface : Str -> Str ;
  oper Heading : Str -> Str ;
  oper markupSS : (Str -> Str) -> SS -> SS = \f,r ->
    ss (f r.s) ;
}
```
The definitions of the constants declared in an ``interface``
are given in an ``instance`` module (which is always ``of``
an interface, in the same way as a ``concrete`` is always
``of`` an abstract). The following ``instance``s
define markup in HTML and LaTeX.
```
instance MarkupHTML of Markup = open Util in {
  oper Boldface s = "<b>" ++ s ++ "</b>" ;
  oper Heading s = "<h2>" ++ s ++ "</h2>" ;
}

instance MarkupLatex of Markup = open Util in {
  oper Boldface s = "\\textbf{" ++ s ++ "}" ;
  oper Heading s = "\\section{" ++ s ++ "}" ;
}
```
Notice that both ``interface``s and ``instance``s may
``open`` ``resource``s (and also reused top-level grammars).
An ``interface`` may moreover define some of the operations it
declares; these definitions are inherited by all instances and cannot
be changed in them. Inheritance by module extension
is possible, as always, between modules of the same type.

===Using an interface===

An ``interface`` or an ``instance``
can be ``open``ed in
a ``concrete`` using the same syntax as when opening
a ``resource``. For an ``instance``, the semantics
is the same as opening the definitions together with
the type signatures - one can think of an ``interface``
and an ``instance`` of it together forming an ordinary
``resource``. Opening an ``interface``, however,
is different: functions that are only declared without
having a definition cannot be compiled (inlined); neither
can functions whose definitions depend on undefined functions.

A module that ``open``s an ``interface`` is therefore
**incomplete**, and has to be **completed** with an
``instance`` of the interface. To make
this situation clear, GF requires any module that opens an
``interface`` to be marked as ``incomplete``. Thus
the module
```
incomplete concrete DocMarkup of Doc = open Markup in {
  ...
}
```
uses the interface ``Markup`` to place markup in
chosen places in its linearization rules, but the
implementation of markup - whether in HTML or in LaTeX - is
left unspecified. This is a powerful way of sharing
the code of a whole module with just differences in
the definitions of some constants.

Another terminology for ``incomplete`` modules is
**parametrized modules** or **functors**.
The ``interface`` gives the list of parameters
that the functor depends on.

===Instantiating an interface===

To complete an ``incomplete`` module, each ``interface``
that it opens has to be provided with an ``instance``. The following
syntax is used for this:
```
concrete DocHTML of Doc = DocMarkup with (Markup = MarkupHTML) ;
```
Instantiation of ``Markup`` with ``MarkupLatex`` is
another one-liner.

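Spelled out, that one-liner would read as follows; the module name
``DocLatex`` is our own choice:
```
concrete DocLatex of Doc = DocMarkup with (Markup = MarkupLatex) ;
```
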
If more interfaces than one are instantiated, a comma-separated
list of equations in parentheses is used, e.g.
```
concrete MusicIta = MusicI with
  (Syntax = SyntaxIta), (LexMusic = LexMusicIta) ;
```
This example shows a common design pattern for building applications:
the concrete syntax is a functor on the generic resource grammar library
interface ``Syntax`` and a domain-specific lexicon interface, here
``LexMusic``.

All interfaces that are ``open``ed in the completed module
must be instantiated.

Notice that the completion of an ``incomplete`` module
may at the same time extend modules of the same type (which need
not be completions). It can also add new judgements in a module body,
and restrict inheritance from the functor.
```
concrete MusicIta = MusicI - [f] with
  (Syntax = SyntaxIta), (LexMusic = LexMusicIta) ** {

  lin f = ...

} ;
```

===Compiling interfaces, instances, and parametrized modules===

Interfaces, instances, and parametrized modules are purely a
front-end feature of GF: these module types do not exist in
the ``gfc`` and ``gfr`` formats. The compiler
nevertheless has to keep track of their dependencies and modification
times. Here is a summary of how they are compiled:
- an ``interface`` is compiled into a ``resource`` with an empty body
- an ``instance`` is compiled into a ``resource`` in union with its
  ``interface``
- an ``incomplete`` module (``concrete`` or ``resource``) is compiled
  into a module of the same type with an empty body
- a completion module (``concrete`` or ``resource``) is compiled
  into a module of the same type by compiling its functor so that, instead of
  each ``interface``, its given ``instance`` is used

This means that some generated code is duplicated, because those operations that
do have complete definitions in an ``interface`` are copied to each of
the ``instance``s.

=Summary of module syntax and semantics=

==Abstract syntax modules==

Syntax:

``abstract`` A ``=`` (A#SUB1,...,A#SUBn ``**``)?
``{``J#SUB1 ``;`` ... ``;`` J#SUBm ``; }``

where
- n, m >= 0
- each //A#SUBi// is itself an abstract module,
  possibly with restrictions on inheritance, i.e. //A#SUBi//``-[``//f,..,g//``]``
  or //A#SUBi//``[``//f,..,g//``]``
- each //J#SUBi// is a judgement of one of the forms
  ``cat, fun, def, data``

Semantic conditions:
- all inherited names declared in each //A#SUBi// and //A// must be distinct
- names in restriction lists must be defined in the restricted module
- inherited constants may not depend on names excluded by restriction

==Concrete syntax modules==

Syntax:

``incomplete``? ``concrete`` C ``of`` A ``=``
(C#SUB1,...,C#SUBn ``**``)?
(``open`` O#SUB1,...,O#SUBk ``in``)?
``{``J#SUB1 ``;`` ... ``;`` J#SUBm ``; }``

where
- n, k, m >= 0
- //A// is an abstract module
- each //C#SUBi// is a concrete module,
  possibly with restrictions on inheritance, i.e. //C#SUBi//``-[``//f,..,g//``]``
- each //O#SUBi// is an open specification, of one of the forms
  - //R//
  - ``(``//Q//``=``//R//``)``

  where //R// is a resource, instance, or concrete, and //Q// is any identifier
- each //J#SUBi// is a judgement of one of the forms
  ``lincat, lin, lindef, printname``; also the forms ``oper, param`` are
  allowed, but they cannot be inherited.

If the modifier ``incomplete`` appears, then any //R// in
an open specification may also be an interface or an abstract.

Semantic conditions:
- each ``cat`` judgement in //A// must have a corresponding, unique
  ``lincat`` judgement in //C//
- each ``fun`` judgement in //A// must have a corresponding, unique
  ``lin`` judgement in //C//
- names in restriction lists must be defined in the restricted module
- inherited constants may not depend on names excluded by restriction

==Resource modules==

Syntax:

``resource`` R ``=``
(R#SUB1,...,R#SUBn ``**``)?
(``open`` O#SUB1,...,O#SUBk ``in``)?
``{``J#SUB1 ``;`` ... ``;`` J#SUBm ``; }``

where
- n, k, m >= 0
- each //R#SUBi// is a resource, instance, or concrete module,
  possibly with restrictions on inheritance, i.e. //R#SUBi//``-[``//f,..,g//``]``
- each //O#SUBi// is an open specification, of one of the forms
  - //P//
  - ``(``//Q//``=``//P//``)``

  where //P// is a resource, instance, or concrete, and //Q// is any identifier
- each //J#SUBi// is a judgement of one of the forms ``oper, param``

Semantic conditions:
- all names defined in each //R#SUBi// and //R// must be distinct
- all constants declared must have a definition
- names in restriction lists must be defined in the restricted module
- inherited constants may not depend on names excluded by restriction

==Interface modules==

Syntax:

``interface`` R ``=``
(R#SUB1,...,R#SUBn ``**``)?
(``open`` O#SUB1,...,O#SUBk ``in``)?
``{``J#SUB1 ``;`` ... ``;`` J#SUBm ``; }``

where
- n, k, m >= 0
- each //R#SUBi// is an interface or abstract module,
  possibly with restrictions on inheritance, i.e. //R#SUBi//``-[``//f,..,g//``]``
- each //O#SUBi// is an open specification, of one of the forms
  - //P//
  - ``(``//Q//``=``//P//``)``

  where //P// is a resource, instance, or concrete, and //Q// is any identifier
- each //J#SUBi// is a judgement of one of the forms ``oper, param``

Semantic conditions:
- all names declared in each //R#SUBi// and //R// must be distinct
- names in restriction lists must be defined in the restricted module
- inherited constants may not depend on names excluded by restriction

==Instance modules==

Syntax:

``instance`` R ``of`` I ``=``
(R#SUB1,...,R#SUBn ``**``)?
(``open`` O#SUB1,...,O#SUBk ``in``)?
``{``J#SUB1 ``;`` ... ``;`` J#SUBm ``; }``

where
- n, k, m >= 0
- //I// is an interface module
- each //R#SUBi// is an instance, resource, or concrete module,
  possibly with restrictions on inheritance, i.e. //R#SUBi//``-[``//f,..,g//``]``
- each //O#SUBi// is an open specification, of one of the forms
  - //P//
  - ``(``//Q//``=``//P//``)``

  where //P// is a resource, instance, or concrete, and //Q// is any identifier
- each //J#SUBi// is a judgement of one of the forms
  ``oper, param``

Semantic conditions:
- all names declared in each //R#SUBi//, //I//, and //R// must be distinct
- all constants declared in //I// must have a definition either in
  //I// or //R//
- names in restriction lists must be defined in the restricted module
- inherited constants may not depend on names excluded by restriction

==Instantiated concrete syntax modules==

Syntax:

``concrete`` C ``of`` A ``=``
(C#SUB1,...,C#SUBn ``**``)?
B
``with``
``(``I#SUB1 ``=`` J#SUB1``),`` ... ``, (``I#SUBp ``=`` J#SUBp``)``
(``-``? ``[``c#SUB1,...,c#SUBq ``]``)?
(``**``?
(``open`` O#SUB1,...,O#SUBk ``in``)?
``{``J#SUB1 ``;`` ... ``;`` J#SUBm ``; }``)? ``;``

where
- n, p, q, k, m >= 0
- //A// is an abstract module
- each //C#SUBi// is a concrete module,
  possibly with restrictions on inheritance, i.e. //C#SUBi//``-[``//f,..,g//``]``
- //B// is an incomplete concrete syntax of //A//
- each //I#SUBi// is an interface or an abstract
- each //J#SUBi// is an instance or a concrete of //I#SUBi//
- each //O#SUBi// is an open specification, of one of the forms
  - //R//
  - ``(``//Q//``=``//R//``)``

  where //R// is a resource, instance, or concrete, and //Q// is any identifier
- each judgement in the body is of one of the forms
  ``lincat, lin, lindef, printname``; also the forms ``oper, param`` are
  allowed, but they cannot be inherited.

==Texts, phrases, and utterances==

The outermost linguistic structure is ``Text``. ``Text``s are composed
of Phrases (``Phr``) followed by punctuation marks - one of ".", "?", or
"!" (with their proper variants in Spanish and Arabic). Here is an
example of a ``Text`` string.
```
John walks. Why? He doesn't want to sleep!
```
Phrases are mostly built from Utterances (``Utt``), which in turn are
declarative sentences, questions, or imperatives - but there
are also "one-word utterances" consisting of noun phrases
or other subsentential phrases. Some Phrases are atomic,
for instance "yes" and "no". Here are some examples of Phrases.
```
yes
come on, John
but John walks
give me the stick please
don't you know that he is sleeping
a glass of wine
a glass of wine please
```
There is no connection between the punctuation marks and the
types of utterances. This reflects the fact that the punctuation
mark in a real text is selected as a function of the speech act
rather than the grammatical form of an utterance. The following
text is thus well-formed.
```
John walks. John walks? John walks!
```
What is the difference between Phrase and Utterance? It is just technical:
a Phrase is an Utterance with an optional leading conjunction ("but")
and an optional trailing vocative ("John", "please").

==Sentences and clauses==

TODO: use overloaded operations in the examples.

The richest of the categories below Utterance is ``S``, Sentence. A Sentence
is formed from a Clause (``Cl``) by fixing its Tense, Anteriority, and Polarity.
For example, each of the following strings has a distinct syntax tree
in the category Sentence:
```
John walks
John doesn't walk
John walked
John didn't walk
John has walked
John hasn't walked
John will walk
John won't walk
...
```
whereas in the category Clause all of them are just different forms of
the same tree.
The difference between Sentence and Clause is thus also rather technical.
It may not correspond exactly to any standard usage of the terms
"clause" and "sentence".

Figure 1 shows a type-annotated syntax tree of the Text "John walks."
and gives an overview of the structural levels.

#BFIG

```
Node Constructor       Value type   Other constructors
-----------------------------------------------------------
 1.  TFullStop         Text         TQuestMark
 2.   (PhrUtt          Phr
 3.     NoPConj        PConj        but_PConj
 4.     (UttS          Utt          UttQS
 5.       (UseCl       S            UseQCl
 6.         TPres      Tense        TPast
 7.         ASimul     Anter        AAnter
 8.         PPos       Pol          PNeg
 9.         (PredVP    Cl
10.           (UsePN   NP           UsePron, DetCN
11.             john_PN)  PN        mary_PN
12.           (UseV    VP           ComplV2, ComplV3
13.             walk_V))))  V       sleep_V
14.     NoVoc)         Voc          please_Voc
15.  TEmpty            Text
```

#BCENTER
Figure 1. Type-annotated syntax tree of the Text "John walks."
#ECENTER

#EFIG

Here are some examples of the results of changing constructors.
```
 1. TFullStop -> TQuestMark    John walks?
 3. NoPConj   -> but_PConj     But John walks.
 6. TPres     -> TPast         John walked.
 7. ASimul    -> AAnter        John has walked.
 8. PPos      -> PNeg          John doesn't walk.
11. john_PN   -> mary_PN       Mary walks.
13. walk_V    -> sleep_V       John sleeps.
14. NoVoc     -> please_Voc    John sleeps please.
```
Not all constructors can, of course, be changed so freely, because the
resulting tree would not remain well-typed. Here are some changes involving
many constructors:
```
 4- 5. UttS (UseCl ...) ->
         UttQS (UseQCl (... QuestCl ...))   Does John walk?
10-11. UsePN john_PN ->
         UsePron we_Pron                    We walk.
12-13. UseV walk_V ->
         ComplV2 love_V2 this_NP            John loves this.
```

==Parts of sentences==

The linguistic phenomena mostly discussed in both traditional grammars and modern
syntax belong to the level of Clauses, that is, lines 9-13, and occasionally
to Sentences, lines 5-13. At this level, the major categories are
``NP`` (Noun Phrase) and ``VP`` (Verb Phrase). A Clause typically
consists of just an ``NP`` and a ``VP``.
The internal structure of both ``NP`` and ``VP`` can be very complex,
and these categories are mutually recursive: not only can a ``VP``
contain an ``NP``,
```
[VP loves [NP Mary]]
```
but also an ``NP`` can contain a ``VP``
```
[NP every man [RS who [VP walks]]]
```
(a labelled bracketing like this is of course just a rough approximation of
a GF syntax tree, but still a useful device of exposition).

Most of the resource modules thus define functions that are used inside
NPs and VPs. Here is a brief overview:

**Noun**. How to construct NPs. The three main mechanisms
for constructing NPs are
- from proper names: "John"
- from pronouns: "we"
- from common nouns by determiners: "this man"

The ``Noun`` module also defines the construction of common nouns.
The most frequent ways are
- lexical noun items: "man"
- adjectival modification: "old man"
- relative clause modification: "man who sleeps"
- application of relational nouns: "successor of the number"

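Putting these mechanisms together, a noun phrase such as //this old man//
can be built from resource functions roughly as follows. ``DetCN``
appears in Figure 1, while ``AdjCN``, ``PositA``, ``UseN``, and the
lexical constants follow the library's naming pattern and should be
checked against the actual API:
```
-- an NP tree for "this old man" (constructor names to be verified)
DetCN this_Det (AdjCN (PositA old_A) (UseN man_N))
```
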
**Verb**.
How to construct VPs. The main mechanism is verbs with their arguments,
for instance,
- one-place verbs: "walks"
- two-place verbs: "loves Mary"
- three-place verbs: "gives her a kiss"
- sentence-complement verbs: "says that it is cold"
- VP-complement verbs: "wants to give her a kiss"

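For instance, the two-place case above corresponds to ``ComplV2``
(shown in Figure 1), applied to a verb and its object:
```
-- a VP tree for "loves Mary"
ComplV2 love_V2 (UsePN mary_PN)
```
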
A special verb is the copula, "be" in English, which is not even realized
by a verb in all languages.
A copula can take different kinds of complement:
- an adjectival phrase: "(John is) old"
- an adverb: "(John is) here"
- a noun phrase: "(John is) a man"

**Adjective**.
How to construct ``AP``s. The main ways are
- positive forms of adjectives: "old"
- comparative forms with an object of comparison: "older than John"

**Adverb**.
How to construct ``Adv``s. The main ways are
- from adjectives: "slowly"
- as prepositional phrases: "in the car"

==Modules and their names==

This section is not necessary for users of the library.

TODO: explain the overloaded API.

The resource modules are named after the kind of
phrases that are constructed in them,
and they can be roughly classified by the "level" or "size" of the expressions
formed in them:
- Larger than sentence: ``Text``, ``Phrase``
- Same level as sentence: ``Sentence``, ``Question``, ``Relative``
- Parts of sentence: ``Adjective``, ``Adverb``, ``Noun``, ``Verb``
- Cross-cut (coordination): ``Conjunction``

Because of mutual recursion such as in embedded sentences, this classification is
not a complete ordering. However, no mutual dependence is needed between the
modules themselves - they can all be compiled separately. This is due
to the module ``Cat``, which defines the type system common to the other modules.

For instance, the types ``NP`` and ``VP`` are defined in ``Cat``,
|
||||
and the module ``Verb`` only
|
||||
needs to know what is given in ``Cat``, not what is given in ``Noun``. To implement
|
||||
a rule such as
|
||||
```
|
||||
Verb.ComplV2 : V2 -> NP -> VP
|
||||
```
|
||||
it is enough to know the linearization type of ``NP``
|
||||
(as well as those of ``V2`` and ``VP``, all
|
||||
given in ``Cat``). It is not necessary to know what
|
||||
ways there are to build ``NP``s (given in ``Noun``), since all these ways must
|
||||
conform to the linearization type defined in ``Cat``. Thus the format of
|
||||
category-specific modules is as follows:
|
||||
```
|
||||
abstract Adjective = Cat ** {...}
|
||||
abstract Noun = Cat ** {...}
|
||||
abstract Verb = Cat ** {...}
|
||||
```
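To see why ``Cat`` is enough, consider a hypothetical, much simplified
concrete module for English. It implements ``ComplV2`` against linearization
types assumed to be declared in ``CatEng``; the sketch pretends that ``VP``,
``V2`` and ``NP`` are plain string records, which the real library's richer
types (with agreement, tense, etc.) are not:
```
-- hypothetical sketch, not the actual English implementation:
-- assumes V2 carries its complement preposition in a field c2
concrete VerbEng of Verb = CatEng ** {
  lin ComplV2 v2 np = {s = v2.s ++ v2.c2 ++ np.s} ;
}
```
The module never mentions how the ``np`` argument was built, so ``VerbEng``
compiles without ``NounEng`` being present at all.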


==Top-level grammar and lexicon==

The module ``Grammar`` collects all the category-specific modules into
a complete grammar:
```
abstract Grammar =
  Adjective, Noun, Verb, ..., Structural, Idiom
```
The module ``Structural`` is a lexicon of structural words (function words),
such as determiners.

The module ``Idiom`` is a collection of idiomatic structures whose
implementation is very language-dependent. An example is existential
structures ("there is", "es gibt", "il y a", etc.).

The module ``Lang`` combines ``Grammar`` with a ``Lexicon`` of
ca. 350 content words:
```
abstract Lang = Grammar, Lexicon
```
Using ``Lang`` instead of ``Grammar`` as a library may give some words
needed in an application for free. But its main purpose is to help in
testing the resource library, rather than to serve as a resource itself.
It does not even seem realistic to develop a general-purpose multilingual
resource lexicon.
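This testing role shows up in a typical GF shell session. The session below
is a sketch: the file name ``LangEng.gf`` assumes the English instance of
``Lang`` and may vary between versions and installations, and the example
sentence assumes the corresponding words are in ``Lexicon``:
```
$ gf
> import LangEng.gf
> parse "John walks"
> generate_random | linearize
```
Random generation piped into linearization, as in the last command, is a
quick way to spot-check that the concrete syntax produces grammatical
output.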
The diagram in Figure 2 shows the structure of the API.

#BFIG
#GRAMMAR
#BCENTER
Figure 2. The resource syntax API.
#ECENTER
#EFIG

==Language-specific syntactic structures==

The API collected in ``Grammar`` has been designed to be implementable for
all languages in the resource package. It does contain some rules that are
strange or superfluous in some languages; for instance, the distinction
between definite and indefinite articles does not apply to Finnish and
Russian. But such rules are still easy to implement: they only create some
superfluous ambiguity in the languages in question.

But the library makes no claim that all languages should have exactly the
same abstract syntax. The common API is therefore extended by
language-dependent rules. The top level of each language looks as follows
(with English as the example):
```
abstract English = Grammar, ExtraEngAbs, DictEngAbs
```
where ``ExtraEngAbs`` is a collection of syntactic structures specific to
English, and ``DictEngAbs`` is an English dictionary (at the moment, it
consists of ``IrregEngAbs``, the irregular verbs of English). Each of these
language-specific grammars has the potential to grow into a full-scale
grammar of the language. These grammars can also be used as libraries, but
the possibility of using functors is lost.
To give a better overview of language-specific structures, modules like
``ExtraEngAbs`` are built from a language-independent module ``ExtraAbs``
by restricted inheritance:
```
abstract ExtraEngAbs = Extra [f,g,...]
```
Thus any category and function in ``Extra`` may be shared by a subset of
all languages. One can see this set-up as a matrix, which tells which
``Extra`` structures are implemented in which languages. For the common API
in ``Grammar``, the matrix is filled with 1's (everything is implemented in
every language).
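As a hypothetical illustration of restricted inheritance (the function
names ``Foc`` and ``Clitic`` are invented for this example, not taken from
the actual contents of ``Extra``):
```
-- Extra declares many optional structures in one place
abstract Extra = Cat ** {
  fun
    Foc    : NP -> S -> S ;   -- focus construction
    Clitic : NP -> VP -> VP ; -- clitic object placement
}

-- each language inherits only the subset it implements
abstract ExtraEngAbs = Extra [Foc]
abstract ExtraFreAbs = Extra [Clitic]
```
The bracketed name lists are exactly the rows of the matrix described
above: one list per language, marking which ``Extra`` functions that
language supports.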
Language-specific extensions and the use of restricted inheritance are a
recent addition to the resource grammar library, and have so far been
exploited only on a small scale.