<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
|
<HTML>
|
|
<HEAD>
|
|
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
|
|
<TITLE>Demonstrative Expressions and Multimodal Grammars</TITLE>
|
|
</HEAD><BODY BGCOLOR="white" TEXT="black">
|
|
<P ALIGN="center"><CENTER><H1>Demonstrative Expressions and Multimodal Grammars</H1>
|
|
<FONT SIZE="4">
|
|
<I>Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;</I><BR>
|
|
Last update: Mon Jan 9 20:29:45 2006
|
|
</FONT></CENTER>
|
|
|
|
<P></P>
|
|
<HR NOSHADE SIZE=1>
|
|
<P></P>
|
|
<UL>
|
|
<LI><A HREF="#toc1">Abstract</A>
|
|
<LI><A HREF="#toc2">Multimodal grammars</A>
|
|
<UL>
|
|
<LI><A HREF="#toc3">Representing demonstratives in semantics and grammar</A>
|
|
<LI><A HREF="#toc4">Asynchronous syntax in GF</A>
|
|
<LI><A HREF="#toc5">Example multimodal grammar: abstract syntax</A>
|
|
<LI><A HREF="#toc6">Digression: discontinuous constituents</A>
|
|
<LI><A HREF="#toc7">From grammars to dialogue systems</A>
|
|
</UL>
|
|
<LI><A HREF="#toc8">Adding multimodality to a unimodal grammar</A>
|
|
<UL>
|
|
<LI><A HREF="#toc9">The multimodal conversion</A>
|
|
<LI><A HREF="#toc10">An example of the conversion</A>
|
|
<LI><A HREF="#toc11">Multimodal conversion combinators</A>
|
|
</UL>
|
|
<LI><A HREF="#toc12">Multimodal resource grammars</A>
|
|
<UL>
|
|
<LI><A HREF="#toc13">Resource grammar API</A>
|
|
<LI><A HREF="#toc14">Multimodal API: functions for building demonstratives</A>
|
|
<LI><A HREF="#toc15">Multimodal API: functions for building sentences and phrases</A>
|
|
<LI><A HREF="#toc16">Language-independent implementation: examples</A>
|
|
<LI><A HREF="#toc17">Multimodal API: interface to unimodal expressions</A>
|
|
<LI><A HREF="#toc18">Instantiating multimodality to different languages</A>
|
|
<LI><A HREF="#toc19">Language-independent reimplementation of TramDemo</A>
|
|
<LI><A HREF="#toc20">The order problem</A>
|
|
<LI><A HREF="#toc21">A recipe for using the resource library</A>
|
|
</UL>
|
|
</UL>
|
|
|
|
<P></P>
|
|
<HR NOSHADE SIZE=1>
|
|
<P></P>
|
|
<A NAME="toc1"></A>
|
|
<H2>Abstract</H2>
|
|
<P>
|
|
This document shows a method to write grammars
|
|
in which spoken utterances are accompanied by
|
|
pointing gestures. One computer application of such
grammars is <B>multimodal dialogue systems</B>, in
|
|
which the pointing gestures are performed by
|
|
mouse clicks and movements.
|
|
</P>
|
|
<P>
|
|
After an introduction to the notions of
|
|
<B>demonstratives</B> and <B>integrated multimodality</B>,
|
|
we will show by a concrete example
|
|
how multimodal grammars can be written in GF
|
|
and how they can be used in dialogue systems.
|
|
The explanation is given in three stages:
|
|
</P>
|
|
<OL>
|
|
<LI>How to write a multimodal grammar by hand.
|
|
<LI>How to add multimodality to a unimodal grammar.
|
|
<LI>How to use a multimodal resource grammar.
|
|
</OL>
|
|
|
|
<A NAME="toc2"></A>
|
|
<H2>Multimodal grammars</H2>
|
|
<P>
|
|
<B>Demonstrative expressions</B> are an old idea. Such
|
|
expressions get their meaning from the context.
|
|
</P>
|
|
<BLOCKQUOTE>
|
|
<I>This train</I> is faster than <I>that airplane</I>.
|
|
</BLOCKQUOTE>
|
|
<P></P>
|
|
<BLOCKQUOTE>
|
|
I want to go from <I>this place</I> to <I>this place</I>.
|
|
</BLOCKQUOTE>
|
|
<P></P>
|
|
<P>
|
|
In particular, as in these examples, the meaning
|
|
can be obtained from accompanying pointing gestures.
|
|
</P>
|
|
<P>
|
|
Thus the meaning-bearing unit is neither the words nor the
|
|
gestures alone, but their combination. Demonstratives
|
|
thus provide an example of <B>integrated multimodality</B>,
|
|
as opposed to parallel multimodality. In parallel
|
|
multimodality, speech and other modes of communication
|
|
are just alternative ways to convey the same information.
|
|
</P>
|
|
<A NAME="toc3"></A>
|
|
<H3>Representing demonstratives in semantics and grammar</H3>
|
|
<P>
|
|
When formalizing the semantics of demonstratives, we can combine syntax with coordinates:
|
|
</P>
|
|
<BLOCKQUOTE>
|
|
I want to go from this place to this place
|
|
</BLOCKQUOTE>
|
|
<P></P>
|
|
<P>
|
|
is interpreted as something like
|
|
</P>
|
|
<PRE>
|
|
want(I, go, this(place,(123,45)), this(place,(98,10)))
|
|
</PRE>
|
|
<P>
|
|
Now, the same semantic value can be given in many ways, by performing
|
|
the clicks at different points of time in relation to the speech:
|
|
</P>
|
|
<BLOCKQUOTE>
|
|
I want to go from this place CLICK(123,45) to this place CLICK(98,10)
|
|
</BLOCKQUOTE>
|
|
<P></P>
|
|
<BLOCKQUOTE>
|
|
I want to go from this place to this place CLICK(123,45) CLICK(98,10)
|
|
</BLOCKQUOTE>
|
|
<P></P>
|
|
<BLOCKQUOTE>
|
|
CLICK(123,45) CLICK(98,10) I want to go from this place to this place
|
|
</BLOCKQUOTE>
|
|
<P></P>
|
|
<P>
|
|
How do we build the value compositionally in parsing?
|
|
Traditional parsing is sequential: its input is a string of tokens.
|
|
It works for demonstratives only if the pointing is adjacent to
|
|
the spoken expression. In the actual input, the demonstrative word
|
|
can be separated from the accompanying click by other words. The two
|
|
can also be simultaneous.
|
|
</P>
|
|
<A NAME="toc4"></A>
|
|
<H3>Asynchronous syntax in GF</H3>
|
|
<P>
|
|
What we need is a notion of <B>asynchronous parsing</B>, as opposed to
|
|
sequential parsing (where demonstrative words and clicks must be
|
|
adjacent).
|
|
</P>
|
|
<P>
|
|
We can implement asynchronous parsing in GF by exploiting the generality
|
|
of <B>linearization types</B>. A linearization type is the type of
|
|
the <B>concrete syntax objects</B> assigned to semantic values.
|
|
What a GF grammar defines is a relation
|
|
</P>
|
|
<PRE>
|
|
abstract syntax trees <---> concrete syntax objects
|
|
</PRE>
|
|
<P>
|
|
When modelling context-free grammar in GF,
|
|
the concrete syntax objects are just strings.
|
|
But they can be more structured objects as well - in general, they are
|
|
<B>records</B> of different kinds of objects. For example,
|
|
a demonstrative expression can be linearized into a record of two strings.
|
|
</P>
|
|
<PRE>
|
|
                                 {s = "this place" ;
this place (coord 123 45) <--->   p = "(123,45)"
                                 }
|
|
</PRE>
|
|
<P>
|
|
The record
|
|
</P>
|
|
<PRE>
|
|
{s = "I want to go from this place to this place" ;
|
|
p = "(123,45) (98,10)"
|
|
}
|
|
</PRE>
|
|
<P>
|
|
represents any combination of the sentence and the clicks, as long
|
|
as the clicks appear in this order.
|
|
</P>
|
|
<A NAME="toc5"></A>
|
|
<H3>Example multimodal grammar: abstract syntax</H3>
|
|
<P>
|
|
A simple example of a multimodal GF grammar is the one called
|
|
the Tram Demo grammar. It was written by Björn Bringert within
|
|
the TALK project as a part of a dialogue system that
|
|
deals with queries about tram timetables. The system interprets
|
|
a speech input in combination with mouse clicks on a digital map.
|
|
</P>
|
|
<P>
|
|
The abstract syntax of (a minimal fragment of) the Tram Demo
|
|
grammar is
|
|
</P>
|
|
<PRE>
|
|
cat
|
|
Input, Dep, Dest, Click ;
|
|
fun
|
|
GoFromTo : Dep -> Dest -> Input ; -- "I want to go from x to y"
|
|
DepHere : Click -> Dep ; -- "from here" with click
|
|
DestHere : Click -> Dest ; -- "to here" with click
|
|
|
|
CCoord : Int -> Int -> Click ; -- click coordinates
|
|
</PRE>
|
|
<P>
|
|
An English concrete syntax of the grammar is
|
|
</P>
|
|
<PRE>
|
|
lincat
|
|
Input, Dep, Dest = {s : Str ; p : Str} ;
|
|
Click = {p : Str} ;
|
|
|
|
lin
|
|
GoFromTo x y = {s = ["I want to go"] ++ x.s ++ y.s ; p = x.p ++ y.p} ;
|
|
DepHere c = {s = ["from here"] ; p = c.p} ;
|
|
DestHere c = {s = ["to here"] ; p = c.p} ;
|
|
|
|
CCoord x y = {p = "(" ++ x.s ++ "," ++ y.s ++ ")"} ;
|
|
</PRE>
|
|
<P>
|
|
When the grammar is used in the actual system, standard parsing methods
|
|
are used for interpreting the integrated speech and click input.
|
|
Parsing appears on two levels: the speech input parsing
|
|
performed by the Nuance speech recognition program (without the clicks),
|
|
and the semantics-yielding parser sending input to the dialogue manager.
|
|
The latter parser just attaches the clicks to the speech input. The order
|
|
of the clicks is preserved, and the parser can hence associate each of
|
|
the clicks with proper demonstratives. Here is the grammar used in the
|
|
two parsing phases.
|
|
</P>
|
|
<PRE>
|
|
cat
|
|
Query, -- whole content
|
|
Speech ; -- speech only
|
|
fun
|
|
QueryInput : Input -> Query ; -- the whole content shown
|
|
SpeechInput : Input -> Speech ; -- only the speech shown
|
|
|
|
lincat
|
|
Query, Speech = {s : Str} ;
|
|
lin
|
|
QueryInput i = {s = i.s ++ ";" ++ i.p} ;
|
|
SpeechInput i = {s = i.s} ;
|
|
</PRE>
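<P>
As a rough sketch, the fragments above can be assembled into a pair of
loadable modules. The module names <CODE>TramFragment</CODE> and
<CODE>TramFragmentEng</CODE> are hypothetical and do not come from the
original Tram Demo sources. The sketch also assumes that each module is
stored in a file of its own, that categories are declared one by one,
and that multi-word strings are written as concatenations of single tokens.
</P>
<PRE>
  -- TramFragment.gf (hypothetical module name)
  abstract TramFragment = {
    flags startcat = Query ;
    cat
      Input ; Dep ; Dest ; Click ; Query ; Speech ;
    fun
      GoFromTo    : Dep -> Dest -> Input ;
      DepHere     : Click -> Dep ;
      DestHere    : Click -> Dest ;
      CCoord      : Int -> Int -> Click ;
      QueryInput  : Input -> Query ;    -- speech and clicks
      SpeechInput : Input -> Speech ;   -- speech only
  }

  -- TramFragmentEng.gf (hypothetical module name)
  concrete TramFragmentEng of TramFragment = {
    lincat
      Input, Dep, Dest = {s : Str ; p : Str} ;
      Click = {p : Str} ;
      Query, Speech = {s : Str} ;
    lin
      GoFromTo x y = {s = "I" ++ "want" ++ "to" ++ "go" ++ x.s ++ y.s ;
                      p = x.p ++ y.p} ;   -- clicks kept in spoken order
      DepHere c  = {s = "from" ++ "here" ; p = c.p} ;
      DestHere c = {s = "to" ++ "here" ; p = c.p} ;
      CCoord x y = {p = "(" ++ x.s ++ "," ++ y.s ++ ")"} ;
      QueryInput i  = {s = i.s ++ ";" ++ i.p} ;
      SpeechInput i = {s = i.s} ;
  }
</PRE>
<P>
With these modules, linearizing the tree
<CODE>QueryInput (GoFromTo (DepHere (CCoord 123 45)) (DestHere (CCoord 98 10)))</CODE>
should give the spoken words followed by a semicolon and the two coordinate
pairs in order. Replacing <CODE>QueryInput</CODE> with <CODE>SpeechInput</CODE>
should give the spoken words only.
</P>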
|
|
<P></P>
|
|
<A NAME="toc6"></A>
|
|
<H3>Digression: discontinuous constituents</H3>
|
|
<P>
|
|
The GF representation of integrated multimodality is
|
|
similar to the representation of <B>discontinuous constituents</B>.
|
|
For instance, assume <I>has arrived</I> is a verb phrase in English,
|
|
which can be used both in declarative sentences and questions,
|
|
</P>
|
|
<BLOCKQUOTE>
|
|
she <I>has arrived</I>
|
|
</BLOCKQUOTE>
|
|
<P></P>
|
|
<BLOCKQUOTE>
|
|
<I>has</I> she <I>arrived</I>
|
|
</BLOCKQUOTE>
|
|
<P></P>
|
|
<P>
|
|
In the question, the two words are separated from each other. If
|
|
<I>has arrived</I> is a constituent of the question, it is thus discontinuous.
|
|
To represent such constituents in GF, records can be used:
|
|
we split verb phrases (<CODE>VP</CODE>) into a finite and an infinitive part.
|
|
</P>
|
|
<PRE>
|
|
lincat VP = {fin, inf : Str} ;
|
|
|
|
lin Indic np vp = {s = np.s ++ vp.fin ++ vp.inf} ;
|
|
lin Quest np vp = {s = vp.fin ++ np.s ++ vp.inf} ;
|
|
</PRE>
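<P>
A minimal self-contained grammar around these two rules might look as
follows. The module names <CODE>Discont</CODE> and <CODE>DiscontEng</CODE>
and the lexical entries <CODE>She</CODE> and <CODE>Arrive</CODE> are
assumptions made for illustration; each module is assumed to be stored
in its own file.
</P>
<PRE>
  -- Discont.gf (hypothetical module name)
  abstract Discont = {
    flags startcat = S ;
    cat
      S ; NP ; VP ;
    fun
      Indic, Quest : NP -> VP -> S ;
      She    : NP ;
      Arrive : VP ;
  }

  -- DiscontEng.gf (hypothetical module name)
  concrete DiscontEng of Discont = {
    lincat
      S, NP = {s : Str} ;
      VP = {fin, inf : Str} ;   -- the two parts of the discontinuous VP
    lin
      Indic np vp = {s = np.s ++ vp.fin ++ vp.inf} ;
      Quest np vp = {s = vp.fin ++ np.s ++ vp.inf} ;
      She    = {s = "she"} ;
      Arrive = {fin = "has" ; inf = "arrived"} ;
  }
</PRE>
<P>
Here <CODE>Indic She Arrive</CODE> linearizes to <I>she has arrived</I> and
<CODE>Quest She Arrive</CODE> to <I>has she arrived</I>. The same verb phrase
record is used in both, just as one record of speech and clicks serves
several interleavings in the multimodal case.
</P>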
|
|
<P></P>
|
|
<A NAME="toc7"></A>
|
|
<H3>From grammars to dialogue systems</H3>
|
|
<P>
|
|
The general recipe for using GF when building dialogue systems
|
|
is to write a grammar with the following components:
|
|
</P>
|
|
<UL>
|
|
<LI>The abstract syntax defines the semantics (the "ontology")
|
|
of the domain of the system.
|
|
<LI>The concrete syntaxes define alternative modes of input and output.
|
|
</UL>
|
|
|
|
<P>
|
|
The engineering advantages of this approach have to do partly with
|
|
the declarativity of the description, partly with the tools provided
|
|
by GF to derive different components of the system:
|
|
</P>
|
|
<UL>
|
|
<LI>The type checker guarantees that all the input and output
|
|
modes match with the ontology.
|
|
<LI>The grammar compiler generates parsers for each input grammar
|
|
and generators for each output grammar.
|
|
<LI>Translators between GF's abstract syntax and other ontology
|
|
description languages enable communication with different
|
|
kinds of dialogue managers and cover e.g. Prolog terms and XML objects.
|
|
<LI>Translators from GF's concrete syntax to speech recognition formats
|
|
make it possible to generate e.g. Nuance grammars and ATK language
|
|
models.
|
|
</UL>
|
|
|
|
<P>
|
|
An example of this process is Björn Bringert's TramDemo.
|
|
More recently, grammars have been integrated with the GoDiS dialogue
manager via Prolog representations of abstract syntax.
|
|
</P>
|
|
<A NAME="toc8"></A>
|
|
<H2>Adding multimodality to a unimodal grammar</H2>
|
|
<P>
|
|
This section gives a recipe for making any unimodal grammar
|
|
multimodal, by adding pointing gestures to chosen expressions. The recipe
|
|
guarantees that the resulting grammar remains semantically well-formed,
|
|
i.e. type correct.
|
|
</P>
|
|
<A NAME="toc9"></A>
|
|
<H3>The multimodal conversion</H3>
|
|
<P>
|
|
The <B>multimodal conversion</B> of a grammar consists of seven
|
|
steps, of which the first is always the same, the second
|
|
involves a decision, and the rest are derivative:
|
|
</P>
|
|
<OL>
|
|
<LI>Add the category <CODE>Point</CODE> with a standard linearization type.
|
|
<PRE>
|
|
cat Point ;
|
|
lincat Point = {point : Str} ;
|
|
</PRE>
|
|
<LI>(Decision) Decide which constructors are demonstrative, i.e. take
|
|
a pointing gesture as an argument. Add a <CODE>Point</CODE> as their last argument.
|
|
The new type signatures for such constructors <I>d</I> have the form
|
|
<PRE>
|
|
fun d : ... -> Point -> D
|
|
</PRE>
|
|
<LI>(Derivative) Add a <CODE>point</CODE> field to the linearization type <I>L</I> of any
|
|
demonstrative category <I>D</I>, i.e. a category that has at least one demonstrative
|
|
constructor:
|
|
<PRE>
|
|
lincat D = L ** {point : Str} ;
|
|
</PRE>
|
|
<LI>(Derivative) If some other category <I>C</I> has a constructor <I>d</I> that takes
|
|
demonstratives as arguments, make it demonstrative by adding a <CODE>point</CODE> field
|
|
to its linearization type.
|
|
<LI>(Derivative) Store the <CODE>point</CODE> field in the linearization <I>t</I> of any
|
|
constructor <I>d</I> that has been made demonstrative:
|
|
<PRE>
|
|
lin d x1 ... xn p = t x1 ... xn ** {point = p.point} ;
|
|
</PRE>
|
|
<LI>(Derivative) For each constructor <I>f</I> that takes demonstratives <I>D_1,...,D_n</I>
|
|
as arguments, collect the <CODE>point</CODE> fields of the arguments in the <CODE>point</CODE>
|
|
field of the value:
|
|
<PRE>
|
|
lin f x_1 ... x_m =
|
|
t x_1 ... x_m ** {point = x_d1.point ++ ... ++ x_dn.point} ;
|
|
</PRE>
|
|
Make sure that the pointings <CODE>x_d1.point ... x_dn.point</CODE> are concatenated
|
|
in the same order as the arguments appear in the <I>linearization</I> <I>t</I>,
|
|
which is not necessarily the same as the abstract argument order.
|
|
<LI>(Derivative) To preserve type correctness, add an empty
|
|
<CODE>point</CODE> field to the linearization <I>t</I> of any
|
|
constructor <I>c</I> of a demonstrative category:
|
|
<PRE>
|
|
lin c x1 ... xn = t x1 ... xn ** {point = []} ;
|
|
</PRE>
|
|
</OL>
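<P>
The ordering caveat in step 6 can be illustrated with a small sketch.
The constructor <CODE>GoToFrom</CODE> is hypothetical. It takes its arguments
in the abstract order departure, destination, but speaks the destination
first. The categories <CODE>Dep</CODE>, <CODE>Dest</CODE>, and
<CODE>Input</CODE> are assumed to have the linearization type
<CODE>{s : Str ; point : Str}</CODE>.
</P>
<PRE>
  fun
    GoToFrom : Dep -> Dest -> Input ;   -- "I want to go to y from x"

  lin
    GoToFrom x y =
      {s = "I" ++ "want" ++ "to" ++ "go" ++ y.s ++ x.s ;  -- y is spoken first
       point = y.point ++ x.point                           -- so y's click comes first
      } ;
</PRE>
<P>
Writing <CODE>x.point ++ y.point</CODE> here would still type check, but it
would pair each click with the wrong demonstrative.
</P>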
|
|
|
|
<A NAME="toc10"></A>
|
|
<H3>An example of the conversion</H3>
|
|
<P>
|
|
Start with a Tram Demo grammar with no demonstratives, but just
|
|
tram stop names and the indexical <I>here</I> (interpreted as e.g. the user's
|
|
standing place).
|
|
</P>
|
|
<PRE>
|
|
cat
|
|
Input, Dep, Dest, Name ;
|
|
fun
|
|
GoFromTo : Dep -> Dest -> Input ;
|
|
DepHere : Dep ;
|
|
DestHere : Dest ;
|
|
DepName : Name -> Dep ;
|
|
DestName : Name -> Dest ;
|
|
|
|
Almedal : Name ;
|
|
</PRE>
|
|
<P>
|
|
A unimodal English concrete syntax of the grammar is
|
|
</P>
|
|
<PRE>
|
|
lincat
|
|
Input, Dep, Dest, Name = {s : Str} ;
|
|
|
|
lin
|
|
GoFromTo x y = {s = ["I want to go"] ++ x.s ++ y.s} ;
|
|
DepHere = {s = ["from here"]} ;
|
|
DestHere = {s = ["to here"]} ;
|
|
DepName n = {s = ["from"] ++ n.s} ;
|
|
DestName n = {s = ["to"] ++ n.s} ;
|
|
|
|
Almedal = {s = "Almedal"} ;
|
|
</PRE>
|
|
<P>
|
|
Let us follow the steps of the recipe.
|
|
</P>
|
|
<OL>
|
|
<LI>We add the category <CODE>Point</CODE> and its linearization type.
|
|
<LI>We decide that <CODE>DepHere</CODE> and <CODE>DestHere</CODE> involve a pointing gesture.
|
|
<LI>We add <CODE>point</CODE> to the linearization types of <CODE>Dep</CODE> and <CODE>Dest</CODE>.
|
|
<LI>Therefore, also add <CODE>point</CODE> to <CODE>Input</CODE>. (But <CODE>Name</CODE> remains unimodal.)
|
|
<LI>Add <CODE>p.point</CODE> to the linearizations of <CODE>DepHere</CODE> and <CODE>DestHere</CODE>.
|
|
<LI>Concatenate the points of the arguments of <CODE>GoFromTo</CODE>.
|
|
<LI>Add an empty <CODE>point</CODE> to <CODE>DepName</CODE> and <CODE>DestName</CODE>.
|
|
</OL>
|
|
|
|
<P>
|
|
In the resulting grammar, one category is added and
|
|
two functions are changed in the abstract syntax (annotated by the step numbers):
|
|
</P>
|
|
<PRE>
|
|
cat
|
|
Point ; -- 1
|
|
fun
|
|
DepHere : Point -> Dep ; -- 2
|
|
DestHere : Point -> Dest ; -- 2
|
|
|
|
</PRE>
|
|
<P>
|
|
The concrete syntax in its entirety looks as follows:
|
|
</P>
|
|
<PRE>
|
|
lincat
|
|
Dep, Dest = {s : Str ; point : Str} ; -- 3
|
|
Input = {s : Str ; point : Str} ; -- 4
|
|
Name = {s : Str} ;
|
|
Point = {point : Str} ; -- 1
|
|
lin
|
|
GoFromTo x y = {s = ["I want to go"] ++ x.s ++ y.s ; -- 6
|
|
point = x.point ++ y.point
|
|
} ;
|
|
DepHere p = {s = ["from here"] ; -- 5
|
|
point = p.point
|
|
} ;
|
|
DestHere p = {s = ["to here"] ; -- 5
|
|
point = p.point
|
|
} ;
|
|
DepName n = {s = ["from"] ++ n.s ; -- 7
|
|
point = []
|
|
} ;
|
|
DestName n = {s = ["to"] ++ n.s ; -- 7
|
|
point = []
|
|
} ;
|
|
Almedal = {s = "Almedal"} ;
|
|
</PRE>
|
|
<P>
|
|
What we need in addition, to use the grammar in applications, are
|
|
</P>
|
|
<OL>
|
|
<LI>Constructors for <CODE>Point</CODE>, e.g. coordinate pairs.
|
|
<LI>Top-level categories, like <CODE>Query</CODE> and <CODE>Speech</CODE> in the original.
|
|
</OL>
|
|
|
|
<P>
|
|
But their proper place is probably in another grammar module, so that
|
|
the core Tram Demo grammar can be used in different systems e.g.
|
|
encoding clicks in different ways.
|
|
</P>
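<P>
One possible shape for such a separate module is sketched below. It assumes
that the converted grammar is stored in modules named <CODE>TramCore</CODE>
and <CODE>TramCoreEng</CODE>. These names, the extension modules
<CODE>TramClick</CODE> and <CODE>TramClickEng</CODE>, and the constructor
<CODE>PCoord</CODE> are all hypothetical.
</P>
<PRE>
  -- TramClick.gf (hypothetical extension of the core abstract syntax)
  abstract TramClick = TramCore ** {
    flags startcat = Query ;
    cat
      Query ; Speech ;
    fun
      PCoord      : Int -> Int -> Point ;   -- a click as a coordinate pair
      QueryInput  : Input -> Query ;
      SpeechInput : Input -> Speech ;
  }

  -- TramClickEng.gf (hypothetical extension of the core concrete syntax)
  concrete TramClickEng of TramClick = TramCoreEng ** {
    lincat
      Query, Speech = {s : Str} ;
    lin
      PCoord x y    = {point = "(" ++ x.s ++ "," ++ y.s ++ ")"} ;
      QueryInput i  = {s = i.s ++ ";" ++ i.point} ;
      SpeechInput i = {s = i.s} ;
  }
</PRE>
<P>
A system that encodes clicks differently, for instance as identifiers of
map objects, could supply another extension module and leave the core
grammar unchanged.
</P>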
|
|
<A NAME="toc11"></A>
|
|
<H3>Multimodal conversion combinators</H3>
|
|
<P>
|
|
GF is a functional programming language, and we exploit this
|
|
by providing a set of combinators that makes the multimodal conversion easier
|
|
and clearer. We start with the type of sequences of pointing gestures.
|
|
</P>
|
|
<PRE>
|
|
Point : Type = {point : Str} ;
|
|
</PRE>
|
|
<P>
|
|
To make a record type multimodal is to extend it with <CODE>Point</CODE>.
|
|
The record extension operator <CODE>**</CODE> is needed here.
|
|
</P>
|
|
<PRE>
|
|
Dem : Type -> Type = \t -> t ** Point ;
|
|
</PRE>
|
|
<P>
|
|
To construct, use, and concatenate pointings:
|
|
</P>
|
|
<PRE>
|
|
mkPoint : Str -> Point = \s -> {point = s} ;
|
|
|
|
noPoint : Point = mkPoint [] ;
|
|
|
|
point : Point -> Str = \p -> p.point ;
|
|
|
|
concatPoint : (x,y : Point) -> Point = \x,y ->
|
|
mkPoint (point x ++ point y) ;
|
|
</PRE>
|
|
<P>
|
|
Finally, to add pointing to a record, with the limiting case where no demonstrative is needed:
|
|
</P>
|
|
<PRE>
|
|
mkDem : (t : Type) -> t -> Point -> Dem t = \_,x,s -> x ** s ;
|
|
|
|
nonDem : (t : Type) -> t -> Dem t = \t,x -> mkDem t x noPoint ;
|
|
</PRE>
|
|
<P>
|
|
Let us rewrite the Tram Demo grammar by using these combinators:
|
|
</P>
|
|
<PRE>
|
|
oper
|
|
SS : Type = {s : Str} ;
|
|
lincat
|
|
Input, Dep, Dest = Dem SS ;
|
|
Name = SS ;
|
|
|
|
lin
|
|
GoFromTo x y = {s = ["I want to go"] ++ x.s ++ y.s} **
|
|
concatPoint x y ;
|
|
DepHere = mkDem SS {s = ["from here"]} ;
|
|
DestHere = mkDem SS {s = ["to here"]} ;
|
|
DepName n = nonDem SS {s = ["from"] ++ n.s} ;
|
|
DestName n = nonDem SS {s = ["to"] ++ n.s} ;
|
|
|
|
Almedal = {s = "Almedal"} ;
|
|
</PRE>
|
|
<P>
|
|
The type synonym <CODE>SS</CODE> is introduced to make the combinator applications
|
|
concise. Notice the use of partial application in <CODE>DepHere</CODE> and
|
|
<CODE>DestHere</CODE>; an equivalent way to write this is
|
|
</P>
|
|
<PRE>
|
|
DepHere p = mkDem SS {s = ["from here"]} p ;
|
|
</PRE>
|
|
<P></P>
|
|
<A NAME="toc12"></A>
|
|
<H2>Multimodal resource grammars</H2>
|
|
<P>
|
|
The main advantage of using GF when building dialogue systems is
|
|
that various components of the system
|
|
can be automatically generated from GF grammars.
|
|
Writing these grammars, however, can still be a considerable
|
|
task. A case in point is multilingual systems:
|
|
how to localize e.g. a system built in a car to
|
|
the languages of all those customers to whom the
|
|
car is sold? This problem has been the main focus of
|
|
GF for some years, and the solution on which most work has been
|
|
done is the development of <B>resource grammar libraries</B>.
|
|
These libraries work in the same way as program libraries
|
|
in software engineering, enabling a division of labour
|
|
between linguists and domain experts.
|
|
</P>
|
|
<P>
|
|
One of the goals in the resource grammars of different
|
|
languages has been to provide a <B>language-independent API</B>,
|
|
which makes the same resource grammar functions available for
|
|
different languages. For instance, the categories
|
|
<CODE>S</CODE>, <CODE>NP</CODE>, and <CODE>VP</CODE> are available in all of the
|
|
10 languages currently supported, and so is the function
|
|
</P>
|
|
<PRE>
|
|
PredVP : NP -> VP -> S
|
|
</PRE>
|
|
<P>
|
|
which corresponds to the rule <CODE>S -> NP VP</CODE> in phrase
|
|
structure grammar. However, there are several levels of abstraction
|
|
between the function <CODE>PredVP</CODE> and the phrase structure rule,
|
|
because the rule is implemented in such different ways in different
|
|
languages. In particular, discontinuous constituents are needed in
|
|
various degrees to make the rule work in different languages.
|
|
</P>
|
|
<P>
|
|
Now, dealing with discontinuous constituents is one of the demanding
|
|
aspects of multilingual grammar writing that the resource grammar
|
|
API is designed to hide. But the proposed treatment of integrated
|
|
multimodality is heavily dependent on similar things. What can we
|
|
do to make multimodal grammars easier to write (for different languages)?
|
|
There are two orthogonal answers:
|
|
</P>
|
|
<OL>
|
|
<LI>Use resource grammars to write a unimodal dialogue grammar and
|
|
then apply the multimodal
|
|
conversion to manually chosen parts.
|
|
<LI>Use <B>multimodal resource grammars</B> to derive multimodal
|
|
dialogue system grammars directly.
|
|
</OL>
|
|
|
|
<P>
|
|
The multimodal resource grammar library has been obtained from
|
|
the unimodal one by applying the multimodal conversion manually.
|
|
In addition, the API has been simplified
|
|
by leaving out structures needed in written technical documents
|
|
(the original application area of GF) but not in spoken dialogue.
|
|
</P>
|
|
<P>
|
|
In the following subsections, we will show a part of the
|
|
multimodal resource grammar API, limited to a fragment that
|
|
is needed to get the main ideas and to reimplement the
|
|
Tram Demo grammar. The reimplementation shows one more advantage
|
|
of the resource grammar approach: dialogue systems can be
|
|
automatically instantiated to different languages.
|
|
</P>
|
|
<A NAME="toc13"></A>
|
|
<H3>Resource grammar API</H3>
|
|
<P>
|
|
The resource grammar API has three main kinds of entries:
|
|
</P>
|
|
<OL>
|
|
<LI>Language-independent linguistic structures ("linguistic ontology"), e.g.
|
|
<PRE>
|
|
PredVP : NP -> VP -> S ; -- "Mary helps him"
|
|
</PRE>
|
|
<LI>Language-specific syntax extensions, e.g. Swedish and German fronting
|
|
topicalization
|
|
<PRE>
|
|
TopicObj : NP -> VP -> S ; -- "honom hjälper Mary"
|
|
</PRE>
|
|
<LI>Language-specific lexical constructors, e.g. Germanic <I>Ablaut</I> patterns
|
|
<PRE>
|
|
irregV : (sing,sang,sung : Str) -> V ;
|
|
</PRE>
|
|
</OL>
|
|
|
|
<P>
|
|
The first two kinds of entries are <CODE>cat</CODE> and <CODE>fun</CODE> definitions
|
|
in an abstract syntax. The multimodal, restricted API has
|
|
e.g. the following categories. Their names are obtained from
|
|
the corresponding unimodal categories by prefixing <CODE>M</CODE>.
|
|
</P>
|
|
<PRE>
|
|
MS ; -- multimodal sentence or question
|
|
MQS ; -- multimodal wh question
|
|
MImp ; -- multimodal imperative
|
|
MVP ; -- multimodal verb phrase
|
|
MNP ; -- multimodal (demonstrative) noun phrase
|
|
MAdv ; -- multimodal (demonstrative) adverbial
|
|
|
|
Point ; -- pointing gesture
|
|
</PRE>
|
|
<P></P>
|
|
<A NAME="toc14"></A>
|
|
<H3>Multimodal API: functions for building demonstratives</H3>
|
|
<P>
|
|
Demonstrative pronouns can be used both as noun phrases and
|
|
as determiners.
|
|
</P>
|
|
<PRE>
|
|
this_MNP : Point -> MNP ; -- this
|
|
thisDet_MNP : CN -> Point -> MNP ; -- this car
|
|
</PRE>
|
|
<P>
|
|
There are also demonstrative adverbs, and prepositions give
|
|
a productive way to build more adverbs.
|
|
</P>
|
|
<PRE>
|
|
here_MAdv : Point -> MAdv ; -- here
|
|
here7from_MAdv : Point -> MAdv ; -- from here
|
|
|
|
MPrepNP : Prep -> MNP -> MAdv ; -- in this car
|
|
</PRE>
|
|
<P></P>
|
|
<A NAME="toc15"></A>
|
|
<H3>Multimodal API: functions for building sentences and phrases</H3>
|
|
<P>
|
|
A handful of predication rules construct sentences, questions, and imperatives.
|
|
</P>
|
|
<PRE>
|
|
MPredVP : MNP -> MVP -> MS ; -- this plane flies here
|
|
MQPredVP : MNP -> MVP -> MQS ; -- does this plane fly here
|
|
MQuestVP : IP -> MVP -> MQS ; -- who flies here
|
|
MImpVP : MVP -> MImp ; -- fly here!
|
|
</PRE>
|
|
<P>
|
|
Verb phrases are constructed from verbs (inherited as such from
|
|
the unimodal API) by providing their complements.
|
|
</P>
|
|
<PRE>
|
|
MUseV : V -> MVP ; -- flies
|
|
MComplV2 : V2 -> MNP -> MVP ; -- takes this
|
|
MComplVV : VV -> MVP -> MVP ; -- wants to take this
|
|
</PRE>
|
|
<P>
|
|
A multimodal adverb can be attached to a verb phrase.
|
|
</P>
|
|
<PRE>
|
|
MAdvVP : MVP -> MAdv -> MVP ; -- flies here
|
|
</PRE>
|
|
<P></P>
|
|
<A NAME="toc16"></A>
|
|
<H3>Language-independent implementation: examples</H3>
|
|
<P>
|
|
The implementation makes heavy use of the multimodal conversion
|
|
combinators. It adds a <CODE>point</CODE> field to whatever the implementation of the unimodal
|
|
category is in any language. Thus, for example
|
|
</P>
|
|
<PRE>
|
|
lincat
|
|
MVP = Dem VP ;
|
|
MNP = Dem NP ;
|
|
MAdv = Dem Adv ;
|
|
|
|
lin
|
|
this_MNP = mkDem NP this_NP ;
|
|
-- i.e. this_MNP p = this_NP ** {point = p.point} ;
|
|
|
|
MComplV2 verb obj = mkDem VP (ComplV2 verb obj) obj ;
|
|
|
|
MAdvVP vp adv = mkDem VP (AdvVP vp adv) (concatPoint vp adv) ;
|
|
</PRE>
|
|
<P></P>
|
|
<A NAME="toc17"></A>
|
|
<H3>Multimodal API: interface to unimodal expressions</H3>
|
|
<P>
|
|
Using nondemonstrative expressions as demonstratives:
|
|
</P>
|
|
<PRE>
|
|
DemNP : NP -> MNP ;
|
|
DemAdv : Adv -> MAdv ;
|
|
</PRE>
|
|
<P>
|
|
Building top-level phrases:
|
|
</P>
|
|
<PRE>
|
|
PhrMS : Pol -> MS -> Phr ;
|
|
PhrMQS : Pol -> MQS -> Phr ;
|
|
PhrMImp : Pol -> MImp -> Phr ;
|
|
</PRE>
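<P>
Given the combinators and the linearization types <CODE>MNP = Dem NP</CODE>
and <CODE>MAdv = Dem Adv</CODE> shown earlier, the two embedding functions
could plausibly be implemented with <CODE>nonDem</CODE>. This is a sketch
under those assumptions, not necessarily the definition used in the library.
</P>
<PRE>
  lin
    DemNP  = nonDem NP ;    -- i.e. DemNP np   = np  ** {point = []}
    DemAdv = nonDem Adv ;   -- i.e. DemAdv adv = adv ** {point = []}
</PRE>
<P>
The empty <CODE>point</CODE> field means that such an expression contributes
no click to the sequence collected by the enclosing phrase.
</P>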
|
|
<P></P>
|
|
<A NAME="toc18"></A>
|
|
<H3>Instantiating multimodality to different languages</H3>
|
|
<P>
|
|
The implementation above has only used the resource grammar API,
|
|
not the concrete implementations. The library <CODE>Demonstrative</CODE>
|
|
is a <B>parametrized module</B>, also called a <B>functor</B>, which
|
|
has the following structure
|
|
</P>
|
|
<PRE>
|
|
incomplete concrete DemonstrativeI of Demonstrative =
|
|
Cat, TenseX ** open Test, Structural in {
|
|
|
|
-- lincat and lin rules
|
|
|
|
}
|
|
</PRE>
|
|
<P>
|
|
It can be <B>instantiated</B> to different languages as follows.
|
|
</P>
|
|
<PRE>
|
|
concrete DemonstrativeEng of Demonstrative =
|
|
CatEng, TenseX ** DemonstrativeI with
|
|
(Test = TestEng),
|
|
(Structural = StructuralEng) ;
|
|
|
|
concrete DemonstrativeSwe of Demonstrative =
|
|
CatSwe, TenseX ** DemonstrativeI with
|
|
(Test = TestSwe),
|
|
(Structural = StructuralSwe) ;
|
|
</PRE>
|
|
<P></P>
|
|
<A NAME="toc19"></A>
|
|
<H3>Language-independent reimplementation of TramDemo</H3>
|
|
<P>
|
|
Again using the functor idea, we reimplement <CODE>TramDemo</CODE>
|
|
as follows:
|
|
</P>
|
|
<PRE>
|
|
incomplete concrete TramI of Tram = open Multimodal in {
|
|
|
|
lincat
|
|
Query = Phr ; Input = MS ;
|
|
Dep, Dest = MAdv ; Click = Point ;
|
|
lin
|
|
QInput = PhrMS PPos ;
|
|
|
|
GoFromTo x y =
|
|
MPredVP (DemNP (UsePron i_Pron))
|
|
(MAdvVP (MAdvVP (MComplVV want_VV (MUseV go_V)) x) y) ;
|
|
|
|
DepHere = here7from_MAdv ;
|
|
DestHere = here7to_MAdv ;
|
|
DepName s = MPrepNP from_Prep (DemNP (UsePN (SymbPN (MkSymb s)))) ;
|
|
DestName s = MPrepNP to_Prep (DemNP (UsePN (SymbPN (MkSymb s)))) ;
}
|
|
|
|
</PRE>
|
|
<P>
|
|
Then we can instantiate this to all languages for which
|
|
the <CODE>Multimodal</CODE> API has been implemented:
|
|
</P>
|
|
<PRE>
|
|
concrete TramEng of Tram = TramI with
|
|
(Multimodal = MultimodalEng) ;
|
|
|
|
concrete TramSwe of Tram = TramI with
|
|
(Multimodal = MultimodalSwe) ;
|
|
|
|
concrete TramFre of Tram = TramI with
|
|
(Multimodal = MultimodalFre) ;
|
|
</PRE>
|
|
<P></P>
|
|
<A NAME="toc20"></A>
|
|
<H3>The order problem</H3>
|
|
<P>
|
|
It was pointed out in the section on the multimodal conversion that
|
|
the concrete word order may be different from the abstract one,
|
|
and vary between different languages. For instance, Swedish
|
|
topicalization
|
|
</P>
|
|
<BLOCKQUOTE>
|
|
Det här tåget vill den här kunden inte ta.
|
|
</BLOCKQUOTE>
|
|
<P></P>
|
|
<P>
|
|
("this train, this customer doesn't want to take") may well have
|
|
an abstract syntax of a form in which the customer appears
|
|
before the train.
|
|
</P>
|
|
<P>
|
|
This is a problem for the implementor of the resource grammar.
|
|
It means that some parts of the resource must be written manually
|
|
and not as a functor.
|
|
However, the <I>user</I> of the resource can safely
|
|
ignore the word order problem, if it is correctly dealt with in
|
|
the resource.
|
|
</P>
|
|
<A NAME="toc21"></A>
|
|
<H3>A recipe for using the resource library</H3>
|
|
<P>
|
|
When starting to develop resource grammars, we believed they
|
|
would be all that
|
|
an application grammarian needs to write a concrete syntax.
|
|
However, experience has shown that it can be tough to start
|
|
grammar development in this way: selecting functions from
|
|
a resource API requires more abstract thinking than just
|
|
writing strings, and it takes longer to reach testable
|
|
results. The most light-weight format is
|
|
maybe to start with context-free grammars (a notation that is
|
|
also supported by GF). Context-free grammars that
|
|
give acceptable even though over-generating
|
|
results for languages like English are quick to produce.
|
|
</P>
|
|
<P>
|
|
The experience has led to the following
|
|
steps for grammar development. While giving the work
|
|
a quick start, this recipe
|
|
increases abstraction at a later stage, when it is time to
localize the grammar to different languages.
|
|
If context-free notation is used, steps 1 and 2 can
|
|
be merged.
|
|
</P>
|
|
<OL>
|
|
<LI>Encode domain ontology in an abstract syntax, <CODE>Domain</CODE>.
|
|
<LI>Write a rough concrete syntax in English, <CODE>DomainRough</CODE>.
|
|
This can be oversimplified and overgenerating.
|
|
<LI>Reimplement by using the resource library, and build a functor <CODE>DomainI</CODE>.
|
|
This can be helped by <B>example-based grammar writing</B>, where
|
|
the examples are generated from <CODE>DomainRough</CODE>.
|
|
<LI>Instantiate the functor <CODE>DomainI</CODE> to different languages,
|
|
and test the results by generating linearizations.
|
|
<LI>If some rule doesn't satisfy in some language, use the resource in
|
|
a different way for that case (<B>compile-time transfer</B>).
|
|
</OL>
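<P>
As a sketch of steps 1 and 2, a toy domain could start out as follows.
The categories <CODE>Request</CODE> and <CODE>Place</CODE> and the functions
<CODE>GoTo</CODE> and <CODE>Centre</CODE> are invented for illustration;
only the module names <CODE>Domain</CODE> and <CODE>DomainRough</CODE>
come from the recipe. Each module is assumed to be stored in its own file.
</P>
<PRE>
  -- Domain.gf (step 1: the ontology)
  abstract Domain = {
    flags startcat = Request ;
    cat
      Request ; Place ;
    fun
      GoTo   : Place -> Request ;
      Centre : Place ;
  }

  -- DomainRough.gf (step 2: rough English, plain strings only)
  concrete DomainRough of Domain = {
    lincat
      Request, Place = {s : Str} ;
    lin
      GoTo p = {s = "I" ++ "want" ++ "to" ++ "go" ++ "to" ++ p.s} ;
      Centre = {s = "the" ++ "centre"} ;
  }
</PRE>
<P>
In step 3, the string records would be replaced by resource API
categories and functions inside a functor <CODE>DomainI</CODE>, leaving the
abstract syntax <CODE>Domain</CODE> untouched.
</P>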
|
|
|
|
|
|
<!-- html code generated by txt2tags 2.3 (http://txt2tags.sf.net) -->
|
|
<!-- cmdline: txt2tags -\-toc multimodal.txt -->
|
|
</BODY></HTML>
|