diff --git a/doc/multimodal.html b/doc/multimodal.html new file mode 100644 index 000000000..b1f202a9d --- /dev/null +++ b/doc/multimodal.html @@ -0,0 +1,849 @@ + + +
+ ++This document shows a method to write grammars +in which spoken utterances are accompanied by +pointing gestures. Such grammars can be applied +in multimodal dialogue systems, in +which the pointing gestures are performed by +mouse clicks and movements. +
++After an introduction to the notions of +demonstratives and integrated multimodality, +we will show by a concrete example +how multimodal grammars can be written in GF +and how they can be used in dialogue systems. +The explanation is given in three stages: +
++Demonstrative expressions are an old idea. Such +expressions get their meaning from the context. +
++ This train is faster than that airplane. ++ +
+ I want to go from this place to this place. ++ +
+In particular, as in these examples, the meaning +can be obtained from accompanying pointing gestures. +
++The meaning-bearing unit is thus neither the words nor the +gestures alone, but their combination. Demonstratives +provide an example of integrated multimodality, +as opposed to parallel multimodality. In parallel +multimodality, speech and other modes of communication +are just alternative ways to convey the same information. +
+ ++When formalizing the semantics of demonstratives, we can combine syntax with coordinates: +
++ I want to go from this place to this place ++ +
+is interpreted as something like +
++ want(I, go, this(place,(123,45)), this(place,(98,10))) ++
+Now, the same semantic value can be given in many ways, by performing +the clicks at different points of time in relation to the speech: +
++ I want to go from this place CLICK(123,45) to this place CLICK(98,10) ++ +
+ I want to go from this place to this place CLICK(123,45) CLICK(98,10) ++ +
+ CLICK(123,45) CLICK(98,10) I want to go from this place to this place ++ +
+How do we build the value compositionally in parsing? +Traditional parsing is sequential: its input is a string of tokens. +It works for demonstratives only if the pointing is adjacent to +the spoken expression. In the actual input, the demonstrative word +can be separated from the accompanying click by other words. The two +can also be simultaneous. +
+ ++What we need is a notion of asynchronous parsing, as opposed to +sequential parsing (where demonstrative words and clicks must be +adjacent). +
++We can implement asynchronous parsing in GF by exploiting the generality +of linearization types. A linearization type is the type of +the concrete syntax objects assigned to semantic values. +What a GF grammar defines is a relation +
++ abstract syntax trees <---> concrete syntax objects ++
+When modelling context-free grammar in GF, +the concrete syntax objects are just strings. +But they can be more structured objects as well - in general, they are +records of different kinds of objects. For example, +a demonstrative expression can be linearized into a record of two strings. +
+
+ {s = "this place" ;
+ this place (coord 123 45) <---> p = "(123,45)"
+ }
+
++The record +
+
+ {s = "I want to go from this place to this place" ;
+ p = "(123,45) (98,10)"
+ }
+
++represents any combination of the sentence and the clicks, as long +as the clicks appear in this order. +
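+The order-preserving separation of speech and clicks can be illustrated outside GF.
+The following Python sketch is not part of the Tram Demo system: the names Click and
+to_record are hypothetical, and it assumes that clicks arrive as coordinate pairs
+interleaved with word tokens. It suggests why the three inputs shown earlier end up
+as the same record.

```python
from typing import NamedTuple, Union


class Click(NamedTuple):
    """A pointing gesture, encoded as map coordinates (hypothetical encoding)."""
    x: int
    y: int


Event = Union[str, Click]  # either a spoken word or a click


def to_record(events: list[Event]) -> dict[str, str]:
    """Separate a time-ordered event stream into speech (s) and pointing (p).

    The relative order inside each field is preserved; the interleaving
    between the two fields is deliberately forgotten.
    """
    words = [e for e in events if isinstance(e, str)]
    clicks = [e for e in events if isinstance(e, Click)]
    return {
        "s": " ".join(words),
        "p": " ".join(f"({c.x},{c.y})" for c in clicks),
    }


speech = "I want to go from this place to this place".split()
a, b = Click(123, 45), Click(98, 10)

interleaved = speech[:7] + [a] + speech[7:] + [b]  # click after each "this place"
clicks_last = speech + [a, b]                       # both clicks after the speech
clicks_first = [a, b] + speech                      # both clicks before the speech

records = [to_record(v) for v in (interleaved, clicks_last, clicks_first)]
print(records[0])
```

+Swapping the two clicks would give a different p field, which matches the
+requirement that the clicks appear in a fixed order.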
+ ++A simple example of a multimodal GF grammar is the one called +the Tram Demo grammar. It was written by Björn Bringert within +the TALK project as a part of a dialogue system that +deals with queries about tram timetables. The system interprets +a speech input in combination with mouse clicks on a digital map. +
++The abstract syntax of (a minimal fragment of) the Tram Demo +grammar is +
++ cat + Input, Dep, Dest, Click ; + fun + GoFromTo : Dep -> Dest -> Input ; -- "I want to go from x to y" + DepHere : Click -> Dep ; -- "from here" with click + DestHere : Click -> Dest ; -- "to here" with click + + CCoord : Int -> Int -> Click ; -- click coordinates ++
+An English concrete syntax of the grammar is +
+
+ lincat
+ Input, Dep, Dest = {s : Str ; p : Str} ;
+ Click = {p : Str} ;
+
+ lin
+ GoFromTo x y = {s = ["I want to go"] ++ x.s ++ y.s ; p = x.p ++ y.p} ;
+ DepHere c = {s = ["from here"] ; p = c.p} ;
+ DestHere c = {s = ["to here"] ; p = c.p} ;
+
+ CCoord x y = {p = "(" ++ x.s ++ "," ++ y.s ++ ")"} ;
+
++When the grammar is used in the actual system, standard parsing methods +are used for interpreting the integrated speech and click input. +Parsing appears on two levels: the speech input parsing +performed by the Nuance speech recognition program (without the clicks), +and the semantics-yielding parser sending input to the dialogue manager. +The latter parser just attaches the clicks to the speech input. The order +of the clicks is preserved, and the parser can hence associate each of +the clicks with proper demonstratives. Here is the grammar used in the +two parsing phases. +
+
+ cat
+ Query, -- whole content
+ Speech ; -- speech only
+ fun
+ QueryInput : Input -> Query ; -- the whole content shown
+ SpeechInput : Input -> Speech ; -- only the speech shown
+
+ lincat
+ Query, Speech = {s : Str} ;
+ lin
+ QueryInput i = {s = i.s ++ ";" ++ i.p} ;
+ SpeechInput i = {s = i.s} ;
+
+
+
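+As a rough illustration of the two views of the same tree, the Python sketch below
+mimics the linearization rules of the fragment with dictionaries. It is only a model,
+not how GF or the Tram Demo system is implemented: the helper cat is an assumed
+stand-in for GF's token concatenation, and the function names are transliterations
+of the constructors above.

```python
def cat(*parts: str) -> str:
    """Model of GF's ++ : join token strings with spaces, skipping empty ones."""
    return " ".join(p for p in parts if p)


def ccoord(x: int, y: int) -> dict:
    # CCoord x y = {p = "(" ++ x.s ++ "," ++ y.s ++ ")"}
    return {"p": cat("(", str(x), ",", str(y), ")")}


def dep_here(c: dict) -> dict:
    # DepHere c = {s = ["from here"] ; p = c.p}
    return {"s": "from here", "p": c["p"]}


def dest_here(c: dict) -> dict:
    # DestHere c = {s = ["to here"] ; p = c.p}
    return {"s": "to here", "p": c["p"]}


def go_from_to(x: dict, y: dict) -> dict:
    # Speech and pointing are concatenated separately, each in argument order.
    return {"s": cat("I want to go", x["s"], y["s"]), "p": cat(x["p"], y["p"])}


def query_input(i: dict) -> str:
    # QueryInput shows the whole content: speech, a separator, then the clicks.
    return cat(i["s"], ";", i["p"])


def speech_input(i: dict) -> str:
    # SpeechInput shows only the speech, as seen by the speech recognizer.
    return i["s"]


tree = go_from_to(dep_here(ccoord(123, 45)), dest_here(ccoord(98, 10)))
print(speech_input(tree))
print(query_input(tree))
```

+Because the p field keeps the clicks in the order of the demonstratives, the first
+click in the query string belongs to the departure and the second to the destination.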
+ ++The GF representation of integrated multimodality is +similar to the representation of discontinuous constituents. +For instance, assume that has arrived is a verb phrase in English, +which can be used both in declarative sentences and questions, +
++ she has arrived ++ +
+ has she arrived ++ +
+In the question, the two words are separated from each other. If
+has arrived is a constituent of the question, it is thus discontinuous.
+To represent such constituents in GF, records can be used:
+we split verb phrases (VP) into a finite and infinitive part.
+
+ lincat VP = {fin, inf : Str} ;
+
+ lin Indic np vp = {s = np.s ++ vp.fin ++ vp.inf} ;
+ lin Quest np vp = {s = vp.fin ++ np.s ++ vp.inf} ;
+
+
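+The same idea can be played through in a few lines of Python. This is a sketch under
+the assumption that records are modelled as dictionaries; the function names indic and
+quest simply mirror the two linearization rules and are not part of any library.

```python
def cat(*parts: str) -> str:
    """Model of GF's ++ : join token strings with spaces, skipping empty ones."""
    return " ".join(p for p in parts if p)


def indic(np: dict, vp: dict) -> dict:
    # Declarative: subject, then finite part, then infinitive part.
    return {"s": cat(np["s"], vp["fin"], vp["inf"])}


def quest(np: dict, vp: dict) -> dict:
    # Question: the finite part moves in front of the subject.
    return {"s": cat(vp["fin"], np["s"], vp["inf"])}


she = {"s": "she"}
has_arrived = {"fin": "has", "inf": "arrived"}  # one constituent, two fields

print(indic(she, has_arrived)["s"])
print(quest(she, has_arrived)["s"])
```

+The verb phrase stays a single value in both rules; only the placement of its fields
+differs, just as speech and pointing are two fields of one demonstrative.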
+
++The general recipe for using GF when building dialogue systems +is to write a grammar with the following components: +
++The engineering advantages of this approach have to do partly with +the declarativity of the description, partly with the tools provided +by GF to derive different components of the system: +
+ ++An example of this process is Björn Bringert's TramDemo. +More recently, grammars have been integrated into the GoDiS dialogue +manager by means of Prolog representations of abstract syntax. +
+ ++This section gives a recipe for making any unimodal grammar +multimodal, by adding pointing gestures to chosen expressions. The recipe +guarantees that the resulting grammar remains semantically well-formed, +i.e. type correct. +
+ ++The multimodal conversion of a grammar consists of seven +steps, of which the first is always the same, the second +involves a decision, and the rest are derivative: +
+Add the category `Point` with a standard linearization type.
+
+ cat Point ;
+ lincat Point = {point : Str} ;
+
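+Decide which constructors are demonstrative, i.e. take a pointing gesture as an argument, and give them `Point` as their last argument.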
+ The new type signatures for such constructors d have the form
++ fun d : ... -> Point -> D ++
Add a point field to the linearization type L of any
+ demonstrative category D, i.e. a category that has at least one demonstrative
+ constructor:
+
+ lincat D = L ** {point : Str} ;
+
+Set the point field in the linearization t of any
+ constructor d that has been made demonstrative:
+
+ lin d x1 ... xn p = t x1 ... xn ** {point = p.point} ;
+
+
+ lin f x_1 ... x_m =
+ t x_1 ... x_m ** {point = x_d1.point ++ ... ++ x_dn.point} ;
+
+ Make sure that the pointings x_d1.point ... x_dn.point are concatenated
+ in the same order as the arguments appear in the linearization t,
+ which is not necessarily the same as the abstract argument order.
+Add an empty point field to the linearization t of any
+ other constructor c of a demonstrative category:
+
+ lin c x1 ... xn = t x1 ... xn ** {point = []} ;
+
++Start with a Tram Demo grammar with no demonstratives, but just +tram stop names and the indexical here (interpreted as e.g. the user's +standing place). +
++ cat + Input, Dep, Dest, Name ; + fun + GoFromTo : Dep -> Dest -> Input ; + DepHere : Dep ; + DestHere : Dest ; + DepName : Name -> Dep ; + DestName : Name -> Dest ; + + Almedal : Name ; ++
+A unimodal English concrete syntax of the grammar is +
+
+ lincat
+ Input, Dep, Dest, Name = {s : Str} ;
+
+ lin
+ GoFromTo x y = {s = ["I want to go"] ++ x.s ++ y.s} ;
+ DepHere = {s = ["from here"]} ;
+ DestHere = {s = ["to here"]} ;
+ DepName n = {s = ["from"] ++ n.s} ;
+ DestName n = {s = ["to"] ++ n.s} ;
+
+ Almedal = {s = "Almedal"} ;
+
++Let us follow the steps of the recipe. +
+Add the category Point and its linearization type.
+Decide that DepHere and DestHere involve a pointing gesture.
+Add point to the linearization types of Dep and Dest.
+Consequently, add point to Input. (But Name remains unimodal.)
+Add p.point to the linearizations of DepHere and DestHere.
+Concatenate the points of the arguments in the linearization of GoFromTo.
+Add an empty point to DepName and DestName.
++In the resulting grammar, one category is added and +two functions are changed in the abstract syntax (annotated by the step numbers): +
++ cat + Point ; -- 1 + fun + DepHere : Point -> Dep ; -- 2 + DestHere : Point -> Dest ; -- 2 + ++
+The concrete syntax in its entirety looks as follows +
+
+ lincat
+ Dep, Dest = {s : Str ; point : Str} ; -- 3
+ Input = {s : Str ; point : Str} ; -- 4
+ Name = {s : Str} ;
+ Point = {point : Str} ; -- 1
+ lin
+ GoFromTo x y = {s = ["I want to go"] ++ x.s ++ y.s ; -- 6
+ point = x.point ++ y.point
+ } ;
+ DepHere p = {s = ["from here"] ; -- 5
+ point = p.point
+ } ;
+ DestHere p = {s = ["to here"] ; -- 5
+ point = p.point
+ } ;
+ DepName n = {s = ["from"] ++ n.s ; -- 7
+ point = []
+ } ;
+ DestName n = {s = ["to"] ++ n.s ; -- 7
+ point = []
+ } ;
+ Almedal = {s = "Almedal"} ;
+
++What we need in addition, to use the grammar in applications, are +
+constructors of Point, e.g. coordinate pairs.
+top-level categories, like Query and Speech in the original.
++But their proper place is probably in another grammar module, so that +the core Tram Demo grammar can be used in different systems, e.g. +ones that encode clicks in different ways. +
+ ++GF is a functional programming language, and we exploit this +by providing a set of combinators that makes the multimodal conversion easier +and clearer. We start with the type of sequences of pointing gestures. +
+
+ Point : Type = {point : Str} ;
+
+
+To make a record type multimodal is to extend it with Point.
+The record extension operator ** is needed here.
+
+ Dem : Type -> Type = \t -> t ** Point ; ++
+To construct, use, and concatenate pointings: +
+
+ mkPoint : Str -> Point = \s -> {point = s} ;
+
+ noPoint : Point = mkPoint [] ;
+
+ point : Point -> Str = \p -> p.point ;
+
+ concatPoint : (x,y : Point) -> Point = \x,y ->
+ mkPoint (point x ++ point y) ;
+
++Finally, to add pointing to a record, with the limiting case of no demonstrative needed. +
++ mkDem : (t : Type) -> t -> Point -> Dem t = \_,x,s -> x ** s ; + + nonDem : (t : Type) -> t -> Dem t = \t,x -> mkDem t x noPoint ; ++
+Let us rewrite the Tram Demo grammar by using these combinators: +
+
+ oper
+ SS : Type = {s : Str} ;
+ lincat
+ Input, Dep, Dest = Dem SS ;
+ Name = SS ;
+
+ lin
+ GoFromTo x y = {s = ["I want to go"] ++ x.s ++ y.s} **
+ concatPoint x y ;
+ DepHere = mkDem SS {s = ["from here"]} ;
+ DestHere = mkDem SS {s = ["to here"]} ;
+ DepName n = nonDem SS {s = ["from"] ++ n.s} ;
+ DestName n = nonDem SS {s = ["to"] ++ n.s} ;
+
+ Almedal = {s = "Almedal"} ;
+
+
+The type synonym SS is introduced to make the combinator applications
+concise. Notice the use of partial application in DepHere and
+DestHere; an equivalent way to write DepHere is
+
+ DepHere p = mkDem SS {s = ["from here"]} p ;
+
+
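+For readers who want to experiment with the combinators outside GF, here is a minimal
+Python model of them. It is a sketch under stated assumptions: records are dictionaries,
+the explicit type argument of mkDem and nonDem is dropped because Python is untyped
+here, and the snake_case names are transliterations, not an existing API.

```python
def mk_point(s: str) -> dict:
    """mkPoint: wrap a string as a pointing record."""
    return {"point": s}


NO_POINT = mk_point("")  # noPoint: the empty sequence of pointings


def point(p: dict) -> str:
    """point: project the pointing string out of any record that has one."""
    return p["point"]


def concat_point(x: dict, y: dict) -> dict:
    """concatPoint: concatenate two pointing sequences, skipping empty ones."""
    return mk_point(" ".join(t for t in (point(x), point(y)) if t))


def mk_dem(rec: dict, p: dict) -> dict:
    """mkDem: extend a record with a point field (GF: x ** s)."""
    return {**rec, "point": point(p)}


def non_dem(rec: dict) -> dict:
    """nonDem: the limiting case where no pointing gesture is needed."""
    return mk_dem(rec, NO_POINT)


def go_from_to(x: dict, y: dict) -> dict:
    speech = " ".join(["I want to go", x["s"], y["s"]])
    return {"s": speech, **concat_point(x, y)}


def dep_here(p: dict) -> dict:
    return mk_dem({"s": "from here"}, p)


def dest_name(n: dict) -> dict:
    return non_dem({"s": "to " + n["s"]})


almedal = {"s": "Almedal"}
result = go_from_to(dep_here(mk_point("(123,45)")), dest_name(almedal))
print(result)
```

+A demonstrative departure combined with a named destination carries exactly one
+pointing, since dest_name contributes the empty point.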
+
+ ++The main advantage of using GF when building dialogue systems is +that various components of the system +can be automatically generated from GF grammars. +Writing these grammars, however, can still be a considerable +task. A case in point is multilingual systems: +how does one localize e.g. a system built into a car to +the languages of all those customers to whom the +car is sold? This problem has been the main focus of +GF for some years, and the solution on which most work has been +done is the development of resource grammar libraries. +These libraries work in the same way as program libraries +in software engineering, enabling a division of labour +between linguists and domain experts. +
+
+One of the goals in the resource grammars of different
+languages has been to provide a language-independent API,
+which makes the same resource grammar functions available for
+different languages. For instance, the categories
+S, NP, and VP are available in all of the
+10 languages currently supported, and so is the function
+
+ PredVP : NP -> VP -> S ++
+which corresponds to the rule S -> NP VP in phrase
+structure grammar. However, there are several levels of abstraction
+between the function PredVP and the phrase structure rule,
+because the rule is implemented in such different ways in different
+languages. In particular, discontinuous constituents are needed in
+various degrees to make the rule work in different languages.
+
+Now, dealing with discontinuous constituents is one of the demanding +aspects of multilingual grammar writing that the resource grammar +API is designed to hide. But the proposed treatment of integrated +multimodality is heavily dependent on similar things. What can we +do to make multimodal grammars easier to write (for different languages)? +There are two orthogonal answers: +
++The multimodal resource grammar library has been obtained from +the unimodal one by applying the multimodal conversion manually. +In addition, the API has been simplified +by leaving out structures needed in written technical documents +(the original application area of GF) but not in spoken dialogue. +
++In the following subsections, we will show a part of the +multimodal resource grammar API, limited to a fragment that +is needed to get the main ideas and to reimplement the +Tram Demo grammar. The reimplementation shows one more advantage +of the resource grammar approach: dialogue systems can be +automatically instantiated to different languages. +
+ ++The resource grammar API has three main kinds of entries: +
++ PredVP : NP -> VP -> S ; -- "Mary helps him" ++
+ TopicObj : NP -> VP -> S ; -- "honom hjälper Mary" ++
+ irregV : (sing,sang,sung : Str) -> V ; ++
+The first two kinds of entries are cat and fun definitions
+in an abstract syntax. The multimodal, restricted API has
+e.g. the following categories. Their names are obtained from
+the corresponding unimodal categories by prefixing M.
+
+ MS ; -- multimodal sentence or question + MQS ; -- multimodal wh question + MImp ; -- multimodal imperative + MVP ; -- multimodal verb phrase + MNP ; -- multimodal (demonstrative) noun phrase + MAdv ; -- multimodal (demonstrative) adverbial + + Point ; -- pointing gesture ++ + +
+Demonstrative pronouns can be used both as noun phrases and +as determiners. +
++ this_MNP : Point -> MNP ; -- this + thisDet_MNP : CN -> Point -> MNP ; -- this car ++
+There are also demonstrative adverbs, and prepositions give +a productive way to build more adverbs. +
++ here_MAdv : Point -> MAdv ; -- here + here7from_MAdv : Point -> MAdv ; -- from here + + MPrepNP : Prep -> MNP -> MAdv ; -- in this car ++ + +
+A handful of predication rules construct sentences, questions, and imperatives. +
++ MPredVP : MNP -> MVP -> MS ; -- this plane flies here + MQPredVP : MNP -> MVP -> MQS ; -- does this plane fly here + MQuestVP : IP -> MVP -> MQS ; -- who flies here + MImpVP : MVP -> MImp ; -- fly here! ++
+Verb phrases are constructed from verbs (inherited as such from +the unimodal API) by providing their complements. +
++ MUseV : V -> MVP ; -- flies + MComplV2 : V2 -> MNP -> MVP ; -- takes this + MComplVV : VV -> MVP -> MVP ; -- wants to take this ++
+A multimodal adverb can be attached to a verb phrase. +
++ MAdvVP : MVP -> MAdv -> MVP ; -- flies here ++ + +
+The implementation makes heavy use of the multimodal conversion
+combinators. It adds a point field to whatever the implementation of the unimodal
+category is in any language. Thus, for example
+
+ lincat
+ MVP = Dem VP ;
+ MNP = Dem NP ;
+ MAdv = Dem Adv ;
+
+ lin
+ this_MNP = mkDem NP this_NP ;
+ -- i.e. this_MNP p = this_NP ** {point = p.point} ;
+
+ MComplV2 verb obj = mkDem VP (ComplV2 verb obj) obj ;
+
+ MAdvVP vp adv = mkDem VP (AdvVP vp adv) (concatPoint vp adv) ;
+
+
+
++Using nondemonstrative expressions as demonstratives: +
++ DemNP : NP -> MNP ; + DemAdv : Adv -> MAdv ; ++
+Building top-level phrases: +
+ PhrMS : Pol -> MS -> Phr ; + PhrMQS : Pol -> MQS -> Phr ; + PhrMImp : Pol -> MImp -> Phr ; ++ + +
+The implementation above has only used the resource grammar API,
+not the concrete implementations. The library Demonstrative
+is a parametrized module, also called a functor, which
+has the following structure
+
+ incomplete concrete DemonstrativeI of Demonstrative =
+ Cat, TenseX ** open Test, Structural in {
+
+ -- lincat and lin rules
+
+ }
+
++It can be instantiated to different languages as follows. +
++ concrete DemonstrativeEng of Demonstrative = + CatEng, TenseX ** DemonstrativeI with + (Test = TestEng), + (Structural = StructuralEng) ; + + concrete DemonstrativeSwe of Demonstrative = + CatSwe, TenseX ** DemonstrativeI with + (Test = TestSwe), + (Structural = StructuralSwe) ; ++ + +
+Again using the functor idea, we reimplement TramDemo
+as follows:
+
+ incomplete concrete TramI of Tram = open Multimodal in {
+
+ lincat
+ Query = Phr ; Input = MS ;
+ Dep, Dest = MAdv ; Click = Point ;
+ lin
+ QInput = PhrMS PPos ;
+
+ GoFromTo x y =
+ MPredVP (DemNP (UsePron i_Pron))
+ (MAdvVP (MAdvVP (MComplVV want_VV (MUseV go_V)) x) y) ;
+
+ DepHere = here7from_MAdv ;
+ DestHere = here7to_MAdv ;
+ DepName s = MPrepNP from_Prep (DemNP (UsePN (SymbPN (MkSymb s)))) ;
+ DestName s = MPrepNP to_Prep (DemNP (UsePN (SymbPN (MkSymb s)))) ;
+ }
+
+
+
+Then we can instantiate this to all languages for which
+the Multimodal API has been implemented:
+
+ concrete TramEng of Tram = TramI with + (Multimodal = MultimodalEng) ; + + concrete TramSwe of Tram = TramI with + (Multimodal = MultimodalSwe) ; + + concrete TramFre of Tram = TramI with + (Multimodal = MultimodalFre) ; ++ + +
+It was pointed out in the section on the multimodal conversion that +the concrete word order may be different from the abstract one, +and vary between different languages. For instance, Swedish +topicalization +
++ Det här tåget vill den här kunden inte ta. ++ +
+("this train, this customer doesn't want to take") may well have +an abstract syntax of a form in which the customer appears +before the train. +
++This is a problem for the implementor of the resource grammar. +It means that some parts of the resource must be written manually +and not as a functor. +However, the user of the resource can safely +ignore the word order problem, if it is correctly dealt with in +the resource. +
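+The word order issue can be made concrete with a small Python model. This is a sketch
+only: the two functions are hypothetical stand-ins for a plain and a topicalizing
+predication rule, the simplified Swedish sentence and the coordinates are made up
+for illustration, and none of this is the resource grammar's actual code.

```python
def dem(s: str, point: str) -> dict:
    """A demonstrative noun phrase: spoken words plus one pointing."""
    return {"s": s, "point": point}


def pred_svo(subj: dict, verb: str, obj: dict) -> dict:
    # Plain order: the subject is spoken first, so its click comes first.
    return {
        "s": " ".join([subj["s"], verb, obj["s"]]),
        "point": " ".join([subj["point"], obj["point"]]),
    }


def pred_topic_obj(subj: dict, verb: str, obj: dict) -> dict:
    # Topicalized order: same abstract arguments, but the object is spoken
    # first, so the pointings must be concatenated object-first as well.
    return {
        "s": " ".join([obj["s"], verb, subj["s"]]),
        "point": " ".join([obj["point"], subj["point"]]),
    }


customer = dem("den här kunden", "(10,20)")
train = dem("det här tåget", "(30,40)")

plain = pred_svo(customer, "tar", train)
topic = pred_topic_obj(customer, "tar", train)
print(plain)
print(topic)
```

+Both functions take their arguments in the same abstract order; only the rule that
+reorders the speech also reorders the pointings, which is the kind of decision the
+resource implementor makes once so that the application grammarian need not.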
+ ++In the beginning, we believed that resource grammars were all that +an application grammarian needs to write a concrete syntax. +However, experience has shown that it can be laborious to start +the grammar development in this way: selecting functions from +a resource API requires more abstract thinking than just +writing things (maybe even in a context-free grammar notation, +also supported by GF). This experience has led to the following +steps for grammar development, which, while permitting +a quick start of the work, towards the end increase abstraction +to localize the grammar in different languages. +
+Domain.
+DomainRough.
+ This can be oversimplified and overgenerating.
+DomainI.
+