forked from GitHub/gf-core
This commit is contained in:
aarne
2005-12-18 21:29:55 +00:00
parent 3d9a05f843
commit 7878cd5e0a
4 changed files with 384 additions and 351 deletions

doc/tutorial/Foodmarket.png (binary file, not shown; size 2.9 KiB)

doc/tutorial/Tree2.png (binary file, not shown; size 1.7 KiB)


@@ -7,7 +7,7 @@
<P ALIGN="center"><CENTER><H1>Grammatical Framework Tutorial</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;</I><BR>
Last update: Sun Dec 18 22:27:21 2005
Last update: Sun Dec 18 22:29:50 2005
</FONT></CENTER>
<P></P>
@@ -77,6 +77,45 @@ Last update: Sun Dec 18 22:27:21 2005
<UL>
<LI><A HREF="#toc47">Parametric vs. inherent features, agreement</A>
<LI><A HREF="#toc48">English concrete syntax with parameters</A>
<LI><A HREF="#toc49">Hierarchic parameter types</A>
<LI><A HREF="#toc50">Morphological analysis and morphology quiz</A>
<LI><A HREF="#toc51">Discontinuous constituents</A>
</UL>
<LI><A HREF="#toc52">More constructs for concrete syntax</A>
<UL>
<LI><A HREF="#toc53">Free variation</A>
<LI><A HREF="#toc54">Record extension and subtyping</A>
<LI><A HREF="#toc55">Tuples and product types</A>
<LI><A HREF="#toc56">Prefix-dependent choices</A>
<LI><A HREF="#toc57">Predefined types and operations</A>
</UL>
<LI><A HREF="#toc58">More features of the module system</A>
<UL>
<LI><A HREF="#toc59">Resource grammars and their reuse</A>
<LI><A HREF="#toc60">Interfaces, instances, and functors</A>
<LI><A HREF="#toc61">Restricted inheritance and qualified opening</A>
</UL>
<LI><A HREF="#toc62">More concepts of abstract syntax</A>
<UL>
<LI><A HREF="#toc63">Dependent types</A>
<LI><A HREF="#toc64">Higher-order abstract syntax</A>
<LI><A HREF="#toc65">Semantic definitions</A>
</UL>
<LI><A HREF="#toc66">Transfer modules</A>
<LI><A HREF="#toc67">Practical issues</A>
<UL>
<LI><A HREF="#toc68">Lexers and unlexers</A>
<LI><A HREF="#toc69">Efficiency of grammars</A>
<LI><A HREF="#toc70">Speech input and output</A>
<LI><A HREF="#toc71">Multilingual syntax editor</A>
<LI><A HREF="#toc72">Interactive Development Environment (IDE)</A>
<LI><A HREF="#toc73">Communicating with GF</A>
<LI><A HREF="#toc74">Embedded grammars in Haskell, Java, and Prolog</A>
<LI><A HREF="#toc75">Alternative input and output grammar formats</A>
</UL>
<LI><A HREF="#toc76">Case studies</A>
<UL>
<LI><A HREF="#toc77">Interfacing formal and natural languages</A>
</UL>
</UL>
@@ -1568,432 +1607,426 @@ the formation of sentences.
} ;
}
```
%--!
===Hierarchic parameter types===
The reader familiar with a functional programming language such as
[Haskell http://www.haskell.org] must have noticed the similarity
between parameter types in GF and **algebraic datatypes** (``data`` definitions
in Haskell). The GF parameter types are actually a special case of algebraic
datatypes: the main restriction is that in GF, these types must be finite.
(It is this restriction that makes it possible to invert linearization rules into
parsing methods.)
However, finite is not the same thing as enumerated. Even in GF, parameter
constructors can take arguments, provided these arguments are from other
parameter types - only recursion is forbidden. Such parameter types impose a
hierarchic order among parameters. They are often needed to define
the linguistically most accurate parameter systems.
To give an example, Swedish adjectives
are inflected in number (singular or plural) and
gender (uter or neuter). These parameters would suggest 2*2=4 different
forms. However, the gender distinction is made only in the singular. Therefore,
it would be inaccurate to define adjective paradigms using the type
``Gender =&gt; Number =&gt; Str``. The following hierarchic definition
yields an accurate system of three adjectival forms.
</PRE>
<P></P>
<A NAME="toc49"></A>
<H3>Hierarchic parameter types</H3>
<P>
param AdjForm = ASg Gender | APl ;
param Gender = Uter | Neuter ;
The reader familiar with a functional programming language such as
<A HREF="http://www.haskell.org">Haskell</A> must have noticed the similarity
between parameter types in GF and <B>algebraic datatypes</B> (<CODE>data</CODE> definitions
in Haskell). The GF parameter types are actually a special case of algebraic
datatypes: the main restriction is that in GF, these types must be finite.
(It is this restriction that makes it possible to invert linearization rules into
parsing methods.)
</P>
<P>
However, finite is not the same thing as enumerated. Even in GF, parameter
constructors can take arguments, provided these arguments are from other
parameter types - only recursion is forbidden. Such parameter types impose a
hierarchic order among parameters. They are often needed to define
the linguistically most accurate parameter systems.
</P>
<P>
To give an example, Swedish adjectives
are inflected in number (singular or plural) and
gender (uter or neuter). These parameters would suggest 2*2=4 different
forms. However, the gender distinction is made only in the singular. Therefore,
it would be inaccurate to define adjective paradigms using the type
<CODE>Gender =&gt; Number =&gt; Str</CODE>. The following hierarchic definition
yields an accurate system of three adjectival forms.
</P>
<PRE>
In pattern matching, a constructor can have patterns as arguments. For instance,
the adjectival paradigm in which the two singular forms are the same can be defined as follows:
param AdjForm = ASg Gender | APl ;
param Gender = Uter | Neuter ;
</PRE>
<P>
oper plattAdj : Str -&gt; AdjForm =&gt; Str = \x -&gt; table {
ASg _ =&gt; x ;
APl =&gt; x + "a" ;
}
In pattern matching, a constructor can have patterns as arguments. For instance,
the adjectival paradigm in which the two singular forms are the same can be defined as follows:
</P>
<PRE>
%--!
===Morphological analysis and morphology quiz===
Even though in GF morphology
is mostly seen as an auxiliary of syntax, a morphology once defined
can be used in its own right. The command ``morpho_analyse = ma``
can be used to read a text and return for each word the analyses that
it has in the current concrete syntax.
oper plattAdj : Str -&gt; AdjForm =&gt; Str = \x -&gt; table {
ASg _ =&gt; x ;
APl =&gt; x + "a" ;
}
</PRE>
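<P>
The count of three adjectival forms can be checked mechanically. The following
is a Python sketch, not GF code: it models <CODE>AdjForm</CODE> values as pairs and
<CODE>plattAdj</CODE> as a dictionary. The helper names <CODE>adj_forms</CODE> and
<CODE>platt_adj</CODE>, and the sample word <I>platt</I>, are assumptions made
for the illustration.
</P>

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Gender(Enum):
    UTER = "Uter"
    NEUTER = "Neuter"


# An AdjForm value is modelled as a pair (constructor name, optional Gender),
# mirroring  param AdjForm = ASg Gender | APl ;
AdjForm = Tuple[str, Optional[Gender]]


def adj_forms() -> List[AdjForm]:
    """Enumerate every value of AdjForm: ASg applied to each Gender, plus APl."""
    return [("ASg", g) for g in Gender] + [("APl", None)]


def platt_adj(x: str) -> Dict[AdjForm, str]:
    """Model of plattAdj: both singular forms are x, the plural is x + "a"."""
    table = {}
    for form in adj_forms():
        constructor, _gender = form  # the pattern ASg _ ignores the gender
        table[form] = x if constructor == "ASg" else x + "a"
    return table


forms = adj_forms()
table = platt_adj("platt")
print(len(forms))            # 3, not 2*2 = 4
print(table[("APl", None)])  # platta
```

<P>
Because the constructor arguments range over a finite type and recursion is
excluded, the enumeration always terminates, which is the finiteness property
the text relies on.
</P>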
<P></P>
<A NAME="toc50"></A>
<H3>Morphological analysis and morphology quiz</H3>
<P>
&gt; rf bible.txt | morpho_analyse
Even though in GF morphology
is mostly seen as an auxiliary of syntax, a morphology once defined
can be used in its own right. The command <CODE>morpho_analyse = ma</CODE>
can be used to read a text and return for each word the analyses that
it has in the current concrete syntax.
</P>
<PRE>
In the same way as translation exercises, morphological exercises can
be generated, by the command ``morpho_quiz = mq``. Usually,
the category is set to something other than ``S``. For instance,
&gt; rf bible.txt | morpho_analyse
</PRE>
<P>
&gt; i lib/resource/french/VerbsFre.gf
&gt; morpho_quiz -cat=V
</P>
<P>
Welcome to GF Morphology Quiz.
...
</P>
<P>
réapparaître : VFin VCondit Pl P2
réapparaitriez
&gt; No, not réapparaitriez, but
réapparaîtriez
Score 0/1
In the same way as translation exercises, morphological exercises can
be generated, by the command <CODE>morpho_quiz = mq</CODE>. Usually,
the category is set to something other than <CODE>S</CODE>. For instance,
</P>
<PRE>
Finally, a list of morphological exercises can be generated and saved in a
file for later use, by the command ``morpho_list = ml``:
&gt; i lib/resource/french/VerbsFre.gf
&gt; morpho_quiz -cat=V
Welcome to GF Morphology Quiz.
...
réapparaître : VFin VCondit Pl P2
réapparaitriez
&gt; No, not réapparaitriez, but
réapparaîtriez
Score 0/1
</PRE>
<P>
&gt; morpho_list -number=25 -cat=V
Finally, a list of morphological exercises can be generated and saved in a
file for later use, by the command <CODE>morpho_list = ml</CODE>:
</P>
<PRE>
The ``number`` flag gives the number of exercises generated.
%--!
===Discontinuous constituents===
A linearization type may contain more strings than one.
One example where this is useful is English particle
verbs, such as //switch off//. The linearization of
a sentence may place the object between the verb and the particle:
//he switched it off//.
The first of the following judgements defines transitive verbs as
**discontinuous constituents**, i.e. as having a linearization
type with two strings and not just one. The second judgement
shows how the constituents are separated by the object in complementization.
&gt; morpho_list -number=25 -cat=V
</PRE>
<P>
lincat TV = {s : Number =&gt; Str ; s2 : Str} ;
lin ComplTV tv obj = {s = \\n =&gt; tv.s ! n ++ obj.s ++ tv.s2} ;
The <CODE>number</CODE> flag gives the number of exercises generated.
</P>
<A NAME="toc51"></A>
<H3>Discontinuous constituents</H3>
<P>
A linearization type may contain more strings than one.
One example where this is useful is English particle
verbs, such as <I>switch off</I>. The linearization of
a sentence may place the object between the verb and the particle:
<I>he switched it off</I>.
</P>
<P>
The first of the following judgements defines transitive verbs as
<B>discontinuous constituents</B>, i.e. as having a linearization
type with two strings and not just one. The second judgement
shows how the constituents are separated by the object in complementization.
</P>
<PRE>
There is no restriction on the number of discontinuous constituents
(or other fields) a ``lincat`` may contain. The only condition is that
the fields must be of finite types, i.e. built from records, tables,
parameters, and ``Str``, and not functions. A mathematical result
about parsing in GF says that the worst-case complexity of parsing
increases with the number of discontinuous constituents. Moreover,
the parsing and linearization commands only give reliable results
for categories whose linearization type has a unique ``Str`` valued
field labelled ``s``.
%--!
==More constructs for concrete syntax==
%--!
===Free variation===
Sometimes there are many alternative ways to define a concrete syntax.
For instance, the verb negation in English can be expressed both by
//does not// and //doesn't//. In linguistic terms, these expressions
are in **free variation**. The ``variants`` construct of GF can
be used to give a list of strings in free variation. For example,
lincat TV = {s : Number =&gt; Str ; s2 : Str} ;
lin ComplTV tv obj = {s = \\n =&gt; tv.s ! n ++ obj.s ++ tv.s2} ;
</PRE>
<P>
NegVerb verb = {s = variants {["does not"] ; "doesn't"} ++ verb.s} ;
There is no restriction on the number of discontinuous constituents
(or other fields) a <CODE>lincat</CODE> may contain. The only condition is that
the fields must be of finite types, i.e. built from records, tables,
parameters, and <CODE>Str</CODE>, and not functions. A mathematical result
about parsing in GF says that the worst-case complexity of parsing
increases with the number of discontinuous constituents. Moreover,
the parsing and linearization commands only give reliable results
for categories whose linearization type has a unique <CODE>Str</CODE> valued
field labelled <CODE>s</CODE>.
</P>
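<P>
The effect of the two judgements can be imitated outside GF. The following
Python sketch models a <CODE>TV</CODE> record as a dictionary with a table
<CODE>s</CODE> and a particle <CODE>s2</CODE>; the function name
<CODE>compl_tv</CODE> and the lexicon entries are hypothetical, chosen only to
reproduce the <I>switch off</I> example.
</P>

```python
from typing import Dict


def compl_tv(tv: Dict, obj: Dict) -> Dict:
    """Model of ComplTV: for each number n, verb form ++ object ++ particle."""
    return {
        "s": {n: " ".join([form, obj["s"], tv["s2"]]) for n, form in tv["s"].items()}
    }


# Hypothetical lexicon entries, invented for this sketch.
switch_off = {"s": {"Sg": "switches", "Pl": "switch"}, "s2": "off"}
it = {"s": "it"}

vp = compl_tv(switch_off, it)
print(vp["s"]["Sg"])  # switches it off
print(vp["s"]["Pl"])  # switch it off
```

<P>
The two strings of the verb stay separate until the object is known, which is
what allows the object to be placed between them.
</P>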
<A NAME="toc52"></A>
<H2>More constructs for concrete syntax</H2>
<A NAME="toc53"></A>
<H3>Free variation</H3>
<P>
Sometimes there are many alternative ways to define a concrete syntax.
For instance, the verb negation in English can be expressed both by
<I>does not</I> and <I>doesn't</I>. In linguistic terms, these expressions
are in <B>free variation</B>. The <CODE>variants</CODE> construct of GF can
be used to give a list of strings in free variation. For example,
</P>
<PRE>
An empty variant list
NegVerb verb = {s = variants {["does not"] ; "doesn't"} ++ verb.s} ;
</PRE>
<P>
variants {}
An empty variant list
</P>
<PRE>
can be used e.g. if a word lacks a certain form.
In general, ``variants`` should be used cautiously. It is not
recommended for modules aimed to be libraries, because the
user of the library has no way to choose among the variants.
Moreover, even though ``variants`` admits lists of any type,
its semantics for complex types can cause surprises.
===Record extension and subtyping===
Record types and records can be **extended** with new fields. For instance,
in German it is natural to see transitive verbs as verbs with a case.
The symbol ``**`` is used for both constructs.
variants {}
</PRE>
<P>
lincat TV = Verb ** {c : Case} ;
can be used e.g. if a word lacks a certain form.
</P>
<P>
lin Follow = regVerb "folgen" ** {c = Dative} ;
In general, <CODE>variants</CODE> should be used cautiously. It is not
recommended for modules aimed to be libraries, because the
user of the library has no way to choose among the variants.
Moreover, even though <CODE>variants</CODE> admits lists of any type,
its semantics for complex types can cause surprises.
</P>
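<P>
One way to picture free variation is to treat a linearization as a list of
alternative strings. The Python sketch below is a simplified model, not a
description of how GF implements <CODE>variants</CODE>: the helpers
<CODE>variants</CODE>, <CODE>concat</CODE> and <CODE>neg_verb</CODE> are
assumptions, and the empty variant list is modelled as "no string at all".
</P>

```python
from itertools import product
from typing import List


def variants(*alternatives: str) -> List[str]:
    """A list of strings in free variation; variants() is the empty list."""
    return list(alternatives)


def concat(*parts: List[str]) -> List[str]:
    """Token concatenation (++) lifted to lists of alternatives."""
    return [" ".join(choice) for choice in product(*parts)]


def neg_verb(verb_s: str) -> List[str]:
    """Model of NegVerb: either negation form followed by the verb."""
    return concat(variants("does not", "doesn't"), [verb_s])


print(neg_verb("walk"))              # ['does not walk', "doesn't walk"]
print(concat(variants(), ["walk"]))  # []
```

<P>
In this model an empty variant list propagates: any phrase built from it has
no realization, which matches the use of <CODE>variants {}</CODE> for a
missing form.
</P>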
<A NAME="toc54"></A>
<H3>Record extension and subtyping</H3>
<P>
Record types and records can be <B>extended</B> with new fields. For instance,
in German it is natural to see transitive verbs as verbs with a case.
The symbol <CODE>**</CODE> is used for both constructs.
</P>
<PRE>
To extend a record type or a record with a field whose label it
already has is a type error.
lincat TV = Verb ** {c : Case} ;
A record type //T// is a **subtype** of another one //R//, if //T// has
all the fields of //R// and possibly other fields. For instance,
an extension of a record type is always a subtype of it.
If //T// is a subtype of //R//, an object of //T// can be used whenever
an object of //R// is required. For instance, a transitive verb can
be used whenever a verb is required.
**Contravariance** means that a function taking an //R// as argument
can also be applied to any object of a subtype //T//.
===Tuples and product types===
Product types and tuples are syntactic sugar for record types and records:
lin Follow = regVerb "folgen" ** {c = Dative} ;
</PRE>
<P>
T1 * ... * Tn === {p1 : T1 ; ... ; pn : Tn}
&lt;t1, ..., tn&gt; === {p1 = t1 ; ... ; pn = tn}
To extend a record type or a record with a field whose label it
already has is a type error.
</P>
<P>
A record type <I>T</I> is a <B>subtype</B> of another one <I>R</I>, if <I>T</I> has
all the fields of <I>R</I> and possibly other fields. For instance,
an extension of a record type is always a subtype of it.
</P>
<P>
If <I>T</I> is a subtype of <I>R</I>, an object of <I>T</I> can be used whenever
an object of <I>R</I> is required. For instance, a transitive verb can
be used whenever a verb is required.
</P>
<P>
<B>Contravariance</B> means that a function taking an <I>R</I> as argument
can also be applied to any object of a subtype <I>T</I>.
</P>
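<P>
The subtyping condition and the label-clash rule can be stated as a small
check. The Python sketch below is a simplification: record types are
dictionaries from labels to type names, field types are compared by plain
equality, and the field layout given for <CODE>Verb</CODE> is hypothetical.
</P>

```python
from typing import Dict

RecordType = Dict[str, str]  # field label -> name of the field's type


def is_subtype(t: RecordType, r: RecordType) -> bool:
    """T is a subtype of R if T has every field of R (same label, same type)."""
    return all(t.get(label) == ty for label, ty in r.items())


def extend(r: RecordType, extra: RecordType) -> RecordType:
    """Model of R ** {...}: adding a label R already has is a type error."""
    clash = sorted(set(r) & set(extra))
    if clash:
        raise TypeError(f"label already present: {clash}")
    return {**r, **extra}


# Hypothetical field layout for Verb; the tutorial does not spell it out here.
verb = {"s": "Number => Str"}
tv = extend(verb, {"c": "Case"})

print(is_subtype(tv, verb))  # True: a transitive verb can be used as a verb
print(is_subtype(verb, tv))  # False: a plain verb lacks the field c
```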
<A NAME="toc55"></A>
<H3>Tuples and product types</H3>
<P>
Product types and tuples are syntactic sugar for record types and records:
</P>
<PRE>
Thus the labels ``p1, p2, ...`` are hard-coded.
%--!
===Prefix-dependent choices===
The ``pre`` construct, exemplified in the following definition,
chooses a string depending on the prefix of the next token:
T1 * ... * Tn === {p1 : T1 ; ... ; pn : Tn}
&lt;t1, ..., tn&gt; === {p1 = t1 ; ... ; pn = tn}
</PRE>
<P>
oper artIndef : Str =
pre {"a" ; "an" / strs {"a" ; "e" ; "i" ; "o"}} ;
Thus the labels <CODE>p1, p2, ...</CODE> are hard-coded.
</P>
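<P>
The desugaring of tuples into records can be mimicked in a few lines. This
Python sketch uses a hypothetical helper <CODE>tuple_to_record</CODE>; it only
illustrates how the labels <CODE>p1</CODE>, <CODE>p2</CODE>, and so on are
assigned by position.
</P>

```python
from typing import Any, Dict


def tuple_to_record(*components: Any) -> Dict[str, Any]:
    """Model of <t1, ..., tn> === {p1 = t1 ; ... ; pn = tn}."""
    return {f"p{i}": c for i, c in enumerate(components, start=1)}


pair = tuple_to_record("folgen", "Dative")
print(pair)        # {'p1': 'folgen', 'p2': 'Dative'}
print(pair["p2"])  # Dative
```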
<A NAME="toc56"></A>
<H3>Prefix-dependent choices</H3>
<P>
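The <CODE>pre</CODE> construct, exemplified in the following definition,
chooses a string depending on the prefix of the next token: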
</P>
<PRE>
Thus
oper artIndef : Str =
pre {"a" ; "an" / strs {"a" ; "e" ; "i" ; "o"}} ;
</PRE>
<P>
artIndef ++ "cheese" ---&gt; "a" ++ "cheese"
artIndef ++ "apple" ---&gt; "an" ++ "apple"
Thus
</P>
<PRE>
This very example does not work in all situations: the prefix
//u// has no general rules, and some problematic words are
//euphemism, one-eyed, n-gram//. It is possible to write
artIndef ++ "cheese" ---&gt; "a" ++ "cheese"
artIndef ++ "apple" ---&gt; "an" ++ "apple"
</PRE>
<P>
oper artIndef : Str =
pre {"a" ;
"a" / strs {"eu" ; "one"} ;
"an" / strs {"a" ; "e" ; "i" ; "o" ; "n-"}
} ;
This very example does not work in all situations: the prefix
<I>u</I> has no general rules, and some problematic words are
<I>euphemism, one-eyed, n-gram</I>. It is possible to write
</P>
<PRE>
===Predefined types and operations===
GF has the following predefined categories in abstract syntax:
oper artIndef : Str =
pre {"a" ;
"a" / strs {"eu" ; "one"} ;
"an" / strs {"a" ; "e" ; "i" ; "o" ; "n-"}
} ;
</PRE>
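<P>
The behaviour of the refined <CODE>artIndef</CODE> can be simulated as follows.
This Python sketch rests on an assumption not stated in the text: the
alternatives are tried in the order written, and the first one whose prefix
list matches the next token wins. The function names <CODE>pre</CODE> and
<CODE>art_indef</CODE> are chosen to mirror the GF definition.
</P>

```python
from typing import List, Sequence, Tuple

Alternative = Tuple[str, Sequence[str]]  # (string to use, triggering prefixes)


def pre(default: str, alternatives: List[Alternative], next_token: str) -> str:
    """Return the first alternative whose prefix list matches, else the default."""
    for string, prefixes in alternatives:
        if any(next_token.startswith(p) for p in prefixes):
            return string
    return default


def art_indef(next_token: str) -> str:
    """Model of the refined artIndef from the text."""
    return pre(
        "a",
        [
            ("a", ["eu", "one"]),
            ("an", ["a", "e", "i", "o", "n-"]),
        ],
        next_token,
    )


for word in ["cheese", "apple", "euphemism", "one-eyed", "n-gram"]:
    print(art_indef(word), word)
# a cheese
# an apple
# a euphemism
# a one-eyed
# an n-gram
```

<P>
Under this reading, the exceptions <I>eu</I> and <I>one</I> must be listed
before the general vowel prefixes, since otherwise <I>e</I> and <I>o</I> would
match first.
</P>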
<P></P>
<A NAME="toc57"></A>
<H3>Predefined types and operations</H3>
<P>
cat Int ; -- integers, e.g. 0, 5, 743145151019
cat Float ; -- floats, e.g. 0.0, 3.1415926
cat String ; -- strings, e.g. "", "foo", "123"
GF has the following predefined categories in abstract syntax:
</P>
<PRE>
The objects of each of these categories are **literals**
as indicated in the comments above. No ``fun`` definition
can have a predefined category as its value type, but
they can be used as arguments. For example:
cat Int ; -- integers, e.g. 0, 5, 743145151019
cat Float ; -- floats, e.g. 0.0, 3.1415926
cat String ; -- strings, e.g. "", "foo", "123"
</PRE>
<P>
fun StreetAddress : Int -&gt; String -&gt; Address ;
lin StreetAddress number street = {s = number.s ++ street.s} ;
</P>
<P>
-- e.g. (StreetAddress 10 "Downing Street") : Address
The objects of each of these categories are <B>literals</B>
as indicated in the comments above. No <CODE>fun</CODE> definition
can have a predefined category as its value type, but
they can be used as arguments. For example:
</P>
<PRE>
fun StreetAddress : Int -&gt; String -&gt; Address ;
lin StreetAddress number street = {s = number.s ++ street.s} ;
%--!
==More features of the module system==
===Resource grammars and their reuse===
See
[resource library documentation ../../lib/resource/doc/gf-resource.html]
===Interfaces, instances, and functors===
See an
[example built this way ../../examples/mp3/mp3-resource.html]
===Restricted inheritance and qualified opening===
==More concepts of abstract syntax==
===Dependent types===
===Higher-order abstract syntax===
===Semantic definitions===
==Transfer modules==
Transfer means noncompositional tree-transforming operations.
The command ``apply_transfer = at`` is typically used in a pipe:
-- e.g. (StreetAddress 10 "Downing Street") : Address
</PRE>
<P></P>
<A NAME="toc58"></A>
<H2>More features of the module system</H2>
<A NAME="toc59"></A>
<H3>Resource grammars and their reuse</H3>
<P>
&gt; p "John walks and John runs" | apply_transfer aggregate | l
John walks and runs
See
<A HREF="../../lib/resource/doc/gf-resource.html">resource library documentation</A>
</P>
<A NAME="toc60"></A>
<H3>Interfaces, instances, and functors</H3>
<P>
See an
<A HREF="../../examples/mp3/mp3-resource.html">example built this way</A>
</P>
<A NAME="toc61"></A>
<H3>Restricted inheritance and qualified opening</H3>
<A NAME="toc62"></A>
<H2>More concepts of abstract syntax</H2>
<A NAME="toc63"></A>
<H3>Dependent types</H3>
<A NAME="toc64"></A>
<H3>Higher-order abstract syntax</H3>
<A NAME="toc65"></A>
<H3>Semantic definitions</H3>
<A NAME="toc66"></A>
<H2>Transfer modules</H2>
<P>
Transfer means noncompositional tree-transforming operations.
The command <CODE>apply_transfer = at</CODE> is typically used in a pipe:
</P>
<PRE>
See the
[sources ../../transfer/examples/aggregation] of this example.
See the
[transfer language documentation ../transfer.html]
for more information.
==Practical issues==
===Lexers and unlexers===
Lexers and unlexers can be chosen from
a list of predefined ones, using the flags ``-lexer`` and ``-unlexer``, either
in the grammar file or on the GF command line.
The available options, as listed by ``help -lexer`` and ``help -unlexer``:
&gt; p "John walks and John runs" | apply_transfer aggregate | l
John walks and runs
</PRE>
<P>
The default is words.
-lexer=words tokens are separated by spaces or newlines
-lexer=literals like words, but GF integer and string literals recognized
-lexer=vars like words, but "x","x_...","$...$" as vars, "?..." as meta
-lexer=chars each character is a token
-lexer=code use Haskell's lex
-lexer=codevars like code, but treat unknown words as variables, ?? as meta
-lexer=text with conventions on punctuation and capital letters
-lexer=codelit like code, but treat unknown words as string literals
-lexer=textlit like text, but treat unknown words as string literals
-lexer=codeC use a C-like lexer
-lexer=ignore like literals, but ignore unknown words
-lexer=subseqs like ignore, but then try all subsequences from longest
See the
<A HREF="../../transfer/examples/aggregation">sources</A> of this example.
</P>
<P>
The default is unwords.
-unlexer=unwords space-separated token list (like unwords)
-unlexer=text format as text: punctuation, capitals, paragraph &lt;p&gt;
-unlexer=code format as code (spacing, indentation)
-unlexer=textlit like text, but remove string literal quotes
-unlexer=codelit like code, but remove string literal quotes
-unlexer=concat remove all spaces
-unlexer=bind like identity, but bind at "&amp;+"
See the
<A HREF="../transfer.html">transfer language documentation</A>
for more information.
</P>
<A NAME="toc67"></A>
<H2>Practical issues</H2>
<A NAME="toc68"></A>
<H3>Lexers and unlexers</H3>
<P>
Lexers and unlexers can be chosen from
a list of predefined ones, using the flags <CODE>-lexer</CODE> and <CODE>-unlexer</CODE>, either
in the grammar file or on the GF command line.
</P>
<P>
The available options, as listed by <CODE>help -lexer</CODE> and <CODE>help -unlexer</CODE>:
</P>
<PRE>
The default is words.
-lexer=words tokens are separated by spaces or newlines
-lexer=literals like words, but GF integer and string literals recognized
-lexer=vars like words, but "x","x_...","$...$" as vars, "?..." as meta
-lexer=chars each character is a token
-lexer=code use Haskell's lex
-lexer=codevars like code, but treat unknown words as variables, ?? as meta
-lexer=text with conventions on punctuation and capital letters
-lexer=codelit like code, but treat unknown words as string literals
-lexer=textlit like text, but treat unknown words as string literals
-lexer=codeC use a C-like lexer
-lexer=ignore like literals, but ignore unknown words
-lexer=subseqs like ignore, but then try all subsequences from longest
===Efficiency of grammars===
Issues:
- the choice of data structures in ``lincat``s
- the value of the ``optimize`` flag
- parsing efficiency: ``-mcfg`` vs. others
===Speech input and output===
The ``speak_aloud = sa`` command sends a string to the speech
synthesizer
[Flite http://www.speech.cs.cmu.edu/flite/doc/].
It is typically used via a pipe:
``` generate_random | linearize | speak_aloud
The result is only satisfactory for English.
The ``speech_input = si`` command receives a string from a
speech recognizer that requires the installation of
[ATK http://mi.eng.cam.ac.uk/~sjy/software.htm].
It is typically used to pipe input to a parser:
``` speech_input -tr | parse
The method works only for grammars of English.
Both Flite and ATK are freely available through the links
above, but they are not distributed together with GF.
===Multilingual syntax editor===
The
[Editor User Manual http://www.cs.chalmers.se/~aarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm]
describes the use of the editor, which works for any multilingual GF grammar.
Here is a snapshot of the editor:
[../quick-editor.gif]
The grammars of the snapshot are from the
[Letter grammar package http://www.cs.chalmers.se/~aarne/GF/examples/letter].
===Interactive Development Environment (IDE)===
Forthcoming.
===Communicating with GF===
Other processes can communicate with the GF command interpreter,
and also with the GF syntax editor.
===Embedded grammars in Haskell, Java, and Prolog===
GF grammars can be used as parts of programs written in the
following languages. The links give more documentation.
- [Java http://www.cs.chalmers.se/~bringert/gf/gf-java.html]
- [Haskell http://www.cs.chalmers.se/~aarne/GF/src/GF/Embed/EmbedAPI.hs]
- [Prolog http://www.cs.chalmers.se/~peb/software.html]
===Alternative input and output grammar formats===
A summary is given in the following chart of GF grammar compiler phases:
[../gf-compiler.png]
==Case studies==
===Interfacing formal and natural languages===
[Formal and Informal Software Specifications http://www.cs.chalmers.se/~krijo/thesis/thesisA4.pdf],
PhD Thesis by
[Kristofer Johannisson http://www.cs.chalmers.se/~krijo], is an extensive example of this.
The system is based on a multilingual grammar relating the formal language OCL with
English and German.
A simpler example will be explained here.
The default is unwords.
-unlexer=unwords space-separated token list (like unwords)
-unlexer=text format as text: punctuation, capitals, paragraph &lt;p&gt;
-unlexer=code format as code (spacing, indentation)
-unlexer=textlit like text, but remove string literal quotes
-unlexer=codelit like code, but remove string literal quotes
-unlexer=concat remove all spaces
-unlexer=bind like identity, but bind at "&amp;+"
</PRE>
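<P>
To make the one-line descriptions above more concrete, here is a Python sketch
of three of the unlexers. It is an interpretation of the help text, not the GF
implementation: in particular, the reading of <CODE>bind</CODE> as "glue the
tokens on both sides of <CODE>&amp;+</CODE>" and all function names are
assumptions.
</P>

```python
from typing import List


def unlex_unwords(tokens: List[str]) -> str:
    """-unlexer=unwords: space-separated token list."""
    return " ".join(tokens)


def unlex_concat(tokens: List[str]) -> str:
    """-unlexer=concat: remove all spaces."""
    return "".join(tokens)


def unlex_bind(tokens: List[str]) -> str:
    """-unlexer=bind (assumed reading): join the tokens around each "&+"."""
    out: List[str] = []
    glue = False
    for tok in tokens:
        if tok == "&+":
            glue = True              # the next token attaches to the previous one
        elif glue and out:
            out[-1] += tok
            glue = False
        else:
            out.append(tok)
            glue = False
    return " ".join(out)


tokens = ["walk", "&+", "ed", "home"]
print(unlex_unwords(tokens))  # walk &+ ed home
print(unlex_concat(["a", "b", "c"]))  # abc
print(unlex_bind(tokens))     # walked home
```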
<P></P>
<A NAME="toc69"></A>
<H3>Efficiency of grammars</H3>
<P>
Issues:
</P>
<UL>
<LI>the choice of data structures in <CODE>lincat</CODE>s
<LI>the value of the <CODE>optimize</CODE> flag
<LI>parsing efficiency: <CODE>-mcfg</CODE> vs. others
</UL>
<A NAME="toc70"></A>
<H3>Speech input and output</H3>
<P>
The <CODE>speak_aloud = sa</CODE> command sends a string to the speech
synthesizer
<A HREF="http://www.speech.cs.cmu.edu/flite/doc/">Flite</A>.
It is typically used via a pipe:
</P>
<PRE>
generate_random | linearize | speak_aloud
</PRE>
<P>
The result is only satisfactory for English.
</P>
<P>
The <CODE>speech_input = si</CODE> command receives a string from a
speech recognizer that requires the installation of
<A HREF="http://mi.eng.cam.ac.uk/~sjy/software.htm">ATK</A>.
It is typically used to pipe input to a parser:
</P>
<PRE>
speech_input -tr | parse
</PRE>
<P>
The method works only for grammars of English.
</P>
<P>
Both Flite and ATK are freely available through the links
above, but they are not distributed together with GF.
</P>
<A NAME="toc71"></A>
<H3>Multilingual syntax editor</H3>
<P>
The
<A HREF="http://www.cs.chalmers.se/~aarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm">Editor User Manual</A>
describes the use of the editor, which works for any multilingual GF grammar.
</P>
<P>
Here is a snapshot of the editor:
</P>
<P>
<IMG ALIGN="middle" SRC="../quick-editor.gif" BORDER="0" ALT="">
</P>
<P>
The grammars of the snapshot are from the
<A HREF="http://www.cs.chalmers.se/~aarne/GF/examples/letter">Letter grammar package</A>.
</P>
<A NAME="toc72"></A>
<H3>Interactive Development Environment (IDE)</H3>
<P>
Forthcoming.
</P>
<A NAME="toc73"></A>
<H3>Communicating with GF</H3>
<P>
Other processes can communicate with the GF command interpreter,
and also with the GF syntax editor.
</P>
<A NAME="toc74"></A>
<H3>Embedded grammars in Haskell, Java, and Prolog</H3>
<P>
GF grammars can be used as parts of programs written in the
following languages. The links give more documentation.
</P>
<UL>
<LI><A HREF="http://www.cs.chalmers.se/~bringert/gf/gf-java.html">Java</A>
<LI><A HREF="http://www.cs.chalmers.se/~aarne/GF/src/GF/Embed/EmbedAPI.hs">Haskell</A>
<LI><A HREF="http://www.cs.chalmers.se/~peb/software.html">Prolog</A>
</UL>
<A NAME="toc75"></A>
<H3>Alternative input and output grammar formats</H3>
<P>
A summary is given in the following chart of GF grammar compiler phases:
<IMG ALIGN="middle" SRC="../gf-compiler.png" BORDER="0" ALT="">
</P>
<A NAME="toc76"></A>
<H2>Case studies</H2>
<A NAME="toc77"></A>
<H3>Interfacing formal and natural languages</H3>
<P>
<A HREF="http://www.cs.chalmers.se/~krijo/thesis/thesisA4.pdf">Formal and Informal Software Specifications</A>,
PhD Thesis by
<A HREF="http://www.cs.chalmers.se/~krijo">Kristofer Johannisson</A>, is an extensive example of this.
The system is based on a multilingual grammar relating the formal language OCL with
English and German.
</P>
<P>
A simpler example will be explained here.
</P>
<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags -\-toc gf-tutorial2.txt -->


@@ -1345,7 +1345,7 @@ concrete FoodsEng of Foods = open Prelude, MorphoEng in {
} ;
}
```
```