From 8b0711062694eb29f9cad91e088308dfe1887752 Mon Sep 17 00:00:00 2001
From: aarne
Date: Sun, 8 Jan 2006 20:51:32 +0000
Subject: [PATCH] html version of multimodal doc

---
 doc/multimodal.html | 849 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 849 insertions(+)
 create mode 100644 doc/multimodal.html

diff --git a/doc/multimodal.html b/doc/multimodal.html
new file mode 100644
index 000000000..b1f202a9d
--- /dev/null
+++ b/doc/multimodal.html
@@ -0,0 +1,849 @@
+
+
+
+
+Demonstrative Expressions and Multimodal Grammars
+
+

Demonstrative Expressions and Multimodal Grammars

+ +Author: Aarne Ranta <aarne (at) cs.chalmers.se>
+Last update: Sun Jan 8 21:50:32 2006 +
+ +

+
+

+ + +

+
+

+ +

Abstract

+

+This document shows a method for writing grammars +in which spoken utterances are accompanied by +pointing gestures. One computer application of such +grammars is multimodal dialogue systems, in +which the pointing gestures are performed by +mouse clicks and movements. +

+

+After an introduction to the notions of +demonstratives and integrated multimodality, +we will show by a concrete example +how multimodal grammars can be written in GF +and how they can be used in dialogue systems. +The explanation is given in three stages: +

+
    +
  1. How to write a multimodal grammar by hand. +
  2. How to add multimodality to a unimodal grammar. +
  3. How to use a multimodal resource grammar. +
+ + +

Multimodal grammars

+

+Demonstrative expressions are an old idea. Such +expressions get their meaning from the context. +

+
+ This train is faster than that airplane. +
+

+
+ I want to go from this place to this place. +
+

+

+In particular, as in these examples, the meaning +can be obtained from accompanying pointing gestures. +

+

+Thus the meaning-bearing unit is neither the words nor the +gestures alone, but their combination. Demonstratives +thus provide an example of integrated multimodality, +as opposed to parallel multimodality. In parallel +multimodality, speech and other modes of communication +are just alternative ways to convey the same information. +

+ +

Representing demonstratives in semantics and grammar

+

+When formalizing the semantics of demonstratives, we can combine syntax with coordinates: +

+
+ I want to go from this place to this place +
+

+

+is interpreted as something like +

+
+    want(I, go, this(place,(123,45)), this(place,(98,10))) 
+
+

+Now, the same semantic value can be given in many ways, by performing +the clicks at different points of time in relation to the speech: +

+
+ I want to go from this place CLICK(123,45) to this place CLICK(98,10) +
+

+
+ I want to go from this place to this place CLICK(123,45) CLICK(98,10) +
+

+
+ CLICK(123,45) CLICK(98,10) I want to go from this place to this place +
+

+

+How do we build the value compositionally in parsing? +Traditional parsing is sequential: its input is a string of tokens. +It works for demonstratives only if the pointing is adjacent to +the spoken expression. In the actual input, the demonstrative word +can be separated from the accompanying click by other words. The two +can also be simultaneous. +

+ +

Asynchronous syntax in GF

+

+What we need is a notion of asynchronous parsing, as opposed to +sequential parsing (where demonstrative words and clicks must be +adjacent). +

+

+We can implement asynchronous parsing in GF by exploiting the generality +of linearization types. A linearization type is the type of +the concrete syntax objects assigned to semantic values. +What a GF grammar defines is a relation +

+
+        abstract syntax trees  <--->  concrete syntax objects
+
+

+When modelling context-free grammar in GF, +the concrete syntax objects are just strings. +But they can be more structured objects as well - in general, they are +records of different kinds of objects. For example, +a demonstrative expression can be linearized into a record of two strings. +

+
+                                       {s = "this place" ;
+    this place (coord 123 45)  <--->    p = "(123,45)"
+                                       }
+
+

+The record +

+
+    {s = "I want to go from this place to this place" ;
+     p = "(123,45) (98,10)"
+    }
+
+

+represents any combination of the sentence and the clicks, as long +as the clicks appear in this order. +

+ +

Example multimodal grammar: abstract syntax

+

+A simple example of a multimodal GF grammar is the one called +the Tram Demo grammar. It was written by Björn Bringert within +the TALK project as a part of a dialogue system that +deals with queries about tram timetables. The system interprets +a speech input in combination with mouse clicks on a digital map. +

+

+The abstract syntax of (a minimal fragment of) the Tram Demo +grammar is +

+
+  cat
+    Input, Dep, Dest, Click ;
+  fun
+    GoFromTo    : Dep  -> Dest -> Input ; -- "I want to go from x to y"
+    DepHere     : Click -> Dep ;          -- "from here" with click
+    DestHere    : Click -> Dest ;         -- "to here" with click
+  
+    CCoord      : Int -> Int -> Click ;   -- click coordinates
+
+

+An English concrete syntax of the grammar is +

+
+  lincat
+    Input, Dep, Dest = {s : Str ; p : Str} ;
+    Click            = {p : Str} ;
+  
+  lin
+    GoFromTo x y  = {s = ["I want to go"] ++ x.s ++ y.s ; p = x.p ++ y.p} ;
+    DepHere c     = {s = ["from here"]                  ; p = c.p} ;
+    DestHere c    = {s = ["to here"]                    ; p = c.p} ;
+  
+    CCoord x y    = {p = "(" ++ x.s ++ "," ++ y.s ++ ")"} ;
+
+
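+As a worked illustration of the abstract and concrete syntax above, the following hand evaluation linearizes one tree step by step. The tree and its coordinates are chosen here for the example and reuse the values from the earlier sections. The s and p fields are built independently of each other, and the spaces inside p arise because ++ concatenates tokens. +

```gf
-- abstract syntax tree (example values, not taken from the Tram Demo system)
GoFromTo (DepHere (CCoord 123 45)) (DestHere (CCoord 98 10))

-- linearization of the subtrees, by the lin rules above
--   CCoord 123 45            ==>  {p = "( 123 , 45 )"}
--   DepHere (CCoord 123 45)  ==>  {s = "from here" ; p = "( 123 , 45 )"}
--   DestHere (CCoord 98 10)  ==>  {s = "to here"   ; p = "( 98 , 10 )"}

-- linearization of the whole tree
--   {s = "I want to go from here to here" ;
--    p = "( 123 , 45 ) ( 98 , 10 )"}
```

+The record says nothing about when the clicks occurred relative to the words. It only fixes the order of the clicks among themselves. +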

+When the grammar is used in the actual system, standard parsing methods +are used for interpreting the integrated speech and click input. +Parsing appears on two levels: the speech input parsing +performed by the Nuance speech recognition program (without the clicks), +and the semantics-yielding parser sending input to the dialogue manager. +The latter parser just attaches the clicks to the speech input. The order +of the clicks is preserved, and the parser can hence associate each of +the clicks with proper demonstratives. Here is the grammar used in the +two parsing phases. +

+
+  cat
+    Query,    -- whole content
+    Speech ;  -- speech only
+  fun
+    QueryInput  : Input -> Query ;   -- the whole content shown
+    SpeechInput : Input -> Speech ;  -- only the speech shown
+  
+  lincat
+    Query, Speech = {s : Str} ;
+  lin
+    QueryInput i  = {s = i.s ++ ";" ++ i.p} ;
+    SpeechInput i = {s = i.s} ;
+
+
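+To make the difference between the two top-level functions concrete, here is a hand evaluation for one example tree, assuming the English concrete syntax of the Tram Demo fragment given earlier. The tree and its coordinates are made up for the illustration. +

```gf
-- i = GoFromTo (DepHere (CCoord 123 45)) (DestHere (CCoord 98 10))

-- QueryInput i: the speech, a separator, then the clicks in order
--   I want to go from here to here ; ( 123 , 45 ) ( 98 , 10 )

-- SpeechInput i: the p field is dropped
--   I want to go from here to here
```

+The second string belongs to the language seen by the speech recognizer. The first is the kind of string that the semantics-yielding parser receives once the clicks have been attached to the recognized speech. +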

+ +

Digression: discontinuous constituents

+

+The GF representation of integrated multimodality is +similar to the representation of discontinuous constituents. +For instance, assume that has arrived is a verb phrase in English, +which can be used both in declarative sentences and in questions: +

+
+ she has arrived +
+

+
+ has she arrived +
+

+

+In the question, the two words are separated from each other. If +has arrived is a constituent of the question, it is thus discontinuous. +To represent such constituents in GF, records can be used: +we split verb phrases (VP) into a finite and infinitive part. +

+
+    lincat VP = {fin, inf : Str} ;
+  
+    lin Indic np vp = {s = np.s ++ vp.fin ++ vp.inf} ;
+    lin Quest np vp = {s = vp.fin ++ np.s ++ vp.inf} ;
+
+
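+The fragment above can be completed into a small sketch by adding categories and two lexical entries. The names She and HasArrived, and the type signatures of Indic and Quest, are assumptions introduced only for this illustration. +

```gf
-- hypothetical abstract syntax completing the fragment
cat NP ; VP ; S ;
fun She : NP ;
fun HasArrived : VP ;
fun Indic, Quest : NP -> VP -> S ;

-- hypothetical concrete syntax; lincat VP and the lin rules
-- for Indic and Quest are the ones given above
lincat NP, S = {s : Str} ;
lin She        = {s = "she"} ;
lin HasArrived = {fin = "has" ; inf = "arrived"} ;

-- hand evaluation:
--   Indic She HasArrived  ==>  {s = "she has arrived"}
--   Quest She HasArrived  ==>  {s = "has she arrived"}
```

+The verb phrase is a single constituent in the abstract syntax, while its two fields end up in different positions in the question. The p field of a demonstrative is treated in the same way. +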

+ +

From grammars to dialogue systems

+

+The general recipe for using GF when building dialogue systems +is to write a grammar with the following components: +

+ + +

+The engineering advantages of this approach have to do partly with +the declarativity of the description, partly with the tools provided +by GF to derive different components of the system: +

+ + +

+An example of this process is Björn Bringert's TramDemo. +More recently, grammars have been integrated into the GoDiS dialogue +manager by means of Prolog representations of abstract syntax. +

+ +

Adding multimodality to a unimodal grammar

+

+This section gives a recipe for making any unimodal grammar +multimodal, by adding pointing gestures to chosen expressions. The recipe +guarantees that the resulting grammar remains semantically well-formed, +i.e. type correct. +

+ +

The multimodal conversion

+

+The multimodal conversion of a grammar consists of seven +steps, of which the first is always the same, the second +involves a decision, and the rest are derivative: +

+
    +
  1. Add the category `Point` with a standard linearization type. +
    +    cat Point ;
    +    lincat Point = {point : Str} ;
    +
    +
  2. (Decision) Decide which constructors are demonstrative, i.e. take + a pointing gesture as an argument. Add a `Point` as their last argument. + The new type signatures for such constructors d have the form +
    +     fun d : ... -> Point -> D 
    +
    +
  3. (Derivative) Add a point field to the linearization type L of any + demonstrative category D, i.e. a category that has at least one demonstrative + constructor: +
    +      lincat D = L ** {point : Str} ;
    +
    +
  4. (Derivative) If some other category C has a constructor d that takes + demonstratives as arguments, make it demonstrative by adding a point field + to its linearization type. +
  5. (Derivative) Store the point field in the linearization t of any + constructor d that has been made demonstrative: +
    +      lin d x1 ... xn p = t x1 ... xn ** {point = p.point} ;
    +
    +
  6. (Derivative) For each constructor f that takes demonstratives D_1,...,D_n + as arguments, collect the point fields of the arguments in the point + field of the value: +
    +    lin f x_1 ... x_m = 
    +      t x_1 ... x_m ** {point = x_d1.point ++ ... ++ x_dn.point} ;
    +
    + Make sure that the pointings x_d1.point ... x_dn.point are concatenated + in the same order as the arguments appear in the linearization t, + which is not necessarily the same as the abstract argument order. +
  7. (Derivative) To preserve type correctness, add an empty + point field to the linearization t of any + constructor c of a demonstrative category: +
    +      lin c x1 ... xn = t x1 ... xn ** {point = []} ;
    +
    +
+ + +
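+The order caveat in step 6 can be illustrated by a small sketch. GoToFrom is a hypothetical constructor, not part of any grammar in this document, whose linearization mentions the destination before the departure although the abstract argument order is the opposite. +

```gf
-- hypothetical constructor; Dep, Dest and Input are assumed to be
-- demonstrative categories with the type {s : Str ; point : Str}
fun GoToFrom : Dep -> Dest -> Input ;

lin GoToFrom x y =
  {s     = ["I want to go"] ++ y.s ++ x.s ;  -- y is spoken before x
   point = y.point ++ x.point                -- so the click of y comes first
  } ;
```

+Writing x.point ++ y.point here would still be type correct, but it would pair the clicks with the wrong demonstratives. +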

An example of the conversion

+

+Start with a Tram Demo grammar with no demonstratives, but just +tram stop names and the indexical here (interpreted as e.g. the user's +standing place). +

+
+  cat
+    Input, Dep, Dest, Name ;
+  fun
+    GoFromTo    : Dep  -> Dest -> Input ;
+    DepHere     : Dep ;                  
+    DestHere    : Dest ;                 
+    DepName     : Name -> Dep ;          
+    DestName    : Name -> Dest ;         
+  
+    Almedal     : Name ;                 
+
+

+A unimodal English concrete syntax of the grammar is +

+
+  lincat
+    Input, Dep, Dest, Name = {s : Str} ;
+  
+  lin
+    GoFromTo x y  = {s = ["I want to go"] ++ x.s ++ y.s} ;
+    DepHere       = {s = ["from here"]} ;
+    DestHere      = {s = ["to here"]} ;
+    DepName n     = {s = ["from"] ++ n.s} ;
+    DestName n    = {s = ["to"] ++ n.s} ;
+  
+    Almedal       = {s = "Almedal"} ;
+
+

+Let us follow the steps of the recipe. +

+
    +
  1. We add the category Point and its linearization type. +
  2. We decide that DepHere and DestHere involve a pointing gesture. +
  3. We add point to the linearization types of Dep and Dest. +
  4. Therefore, also add point to Input. (But Name remains unimodal.) +
  5. Add p.point to the linearizations of DepHere and DestHere. +
  6. Concatenate the points of the arguments of GoFromTo. +
  7. Add an empty point to DepName and DestName. +
+ +

+In the resulting grammar, one category is added and +two functions are changed in the abstract syntax (annotated by the step numbers): +

+
+  cat
+    Point ;                                               -- 1
+  fun
+    DepHere     : Point -> Dep ;                          -- 2
+    DestHere    : Point -> Dest ;                         -- 2
+  
+
+

+The concrete syntax in its entirety looks as follows +

+
+  lincat
+    Dep, Dest = {s : Str ; point : Str} ;                 -- 3    
+    Input = {s : Str ; point : Str} ;                     -- 4
+    Name = {s : Str} ;
+    Point = {point : Str} ;                               -- 1
+  lin
+    GoFromTo x y  = {s = ["I want to go"] ++ x.s ++ y.s ; -- 6
+                     point = x.point ++ y.point
+                    } ;
+    DepHere p     = {s = ["from here"] ;                  -- 5
+                     point = p.point
+                    } ;
+    DestHere p    = {s = ["to here"] ;                    -- 5
+                     point = p.point
+                    } ;
+    DepName n     = {s = ["from"] ++ n.s ;                -- 7
+                     point = []
+                    } ;
+    DestName n    = {s = ["to"] ++ n.s ;                  -- 7
+                     point = []
+                    } ;
+    Almedal       = {s = "Almedal"} ;
+
+

+What we need in addition, to use the grammar in applications, are +

+
    +
  1. Constructors for Point, e.g. coordinate pairs. +
  2. Top-level categories, like Query and Speech in the original. +
+ +

+But their proper place is probably in another grammar module, so that +the core Tram Demo grammar can be used in different systems e.g. +encoding clicks in different ways. +
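
+One possible shape for such a module is sketched below. It is only an illustration: the module names Tram, TramEng, TramClick and TramClickEng and the constructor PCoord are assumptions made here, and the coordinate encoding simply copies CCoord from the hand-written grammar. +

```gf
-- Assumes that the converted grammar lives in the modules Tram (abstract)
-- and TramEng (concrete). All module names here are hypothetical.
-- Each module would go in its own file, named after the module.

abstract TramClick = Tram ** {
  cat
    Query ; Speech ;
  fun
    PCoord      : Int -> Int -> Point ;  -- click coordinates
    QueryInput  : Input -> Query ;       -- speech and clicks
    SpeechInput : Input -> Speech ;      -- speech only
}

concrete TramClickEng of TramClick = TramEng ** {
  lincat
    Query, Speech = {s : Str} ;
  lin
    PCoord x y    = {point = "(" ++ x.s ++ "," ++ y.s ++ ")"} ;
    QueryInput i  = {s = i.s ++ ";" ++ i.point} ;
    SpeechInput i = {s = i.s} ;
}
```

+A system that encodes clicks differently, for instance as names of map regions, would replace only this extension module and keep the core grammar unchanged. +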

+ +

Multimodal conversion combinators

+

+GF is a functional programming language, and we exploit this +by providing a set of combinators that makes the multimodal conversion easier +and clearer. We start with the type of sequences of pointing gestures. +

+
+      Point : Type = {point : Str} ;
+
+

+To make a record type multimodal is to extend it with Point. +The record extension operator ** is needed here. +

+
+      Dem   : Type -> Type = \t -> t ** Point ;
+
+

+To construct, use, and concatenate pointings: +

+
+      mkPoint : Str -> Point = \s -> {point = s} ;
+  
+      noPoint : Point = mkPoint [] ;
+  
+      point   : Point -> Str = \p -> p.point ;
+  
+      concatPoint : (x,y : Point) -> Point = \x,y -> 
+        mkPoint (point x ++ point y) ;
+
+

+Finally, to add pointing to a record, with the limiting case in which no demonstrative is needed: +

+
+      mkDem : (t : Type) -> t -> Point -> Dem t = \_,x,s -> x ** s ;
+  
+      nonDem : (t : Type) -> t -> Dem t = \t,x -> mkDem t x noPoint ;
+
+
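+The following hand evaluations sketch how the combinators behave on sample arguments. SS abbreviates the record type {s : Str}, and the strings are made up for the example. +

```gf
-- with SS : Type = {s : Str}

-- mkDem SS {s = "here"} (mkPoint "(123,45)")
--   = {s = "here"} ** {point = "(123,45)"}
--   = {s = "here" ; point = "(123,45)"}

-- nonDem SS {s = "Almedal"}
--   = mkDem SS {s = "Almedal"} noPoint
--   = {s = "Almedal" ; point = []}

-- concatPoint (mkPoint "(123,45)") (mkPoint "(98,10)")
--   = mkPoint ("(123,45)" ++ "(98,10)")
--   = {point = "(123,45)" ++ "(98,10)"}
```

+The empty point produced by nonDem contributes nothing when pointings are concatenated, which is why non-demonstrative expressions can be mixed freely with demonstrative ones. +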

+Let us rewrite the Tram Demo grammar by using these combinators: +

+
+  oper
+    SS : Type = {s : Str} ;
+  lincat
+    Input, Dep, Dest = Dem SS ; 
+    Name = SS ;
+  
+  lin
+    GoFromTo x y  = {s = ["I want to go"] ++ x.s ++ y.s} ** 
+                    concatPoint x y ;
+    DepHere       = mkDem  SS {s = ["from here"]} ;
+    DestHere      = mkDem  SS {s = ["to here"]} ;
+    DepName n     = nonDem SS {s = ["from"] ++ n.s} ;
+    DestName n    = nonDem SS {s = ["to"] ++ n.s} ;
+  
+    Almedal       = {s = "Almedal"} ;
+
+

+The type synonym SS is introduced to make the combinator applications +concise. Notice the use of partial application in DepHere and +DestHere; an equivalent way to write DepHere is +

+
+    DepHere p     = mkDem  SS {s = ["from here"]} p ;
+
+

+ +

Multimodal resource grammars

+

+The main advantage of using GF when building dialogue systems is +that various components of the system +can be automatically generated from GF grammars. +Writing these grammars, however, can still be a considerable +task. A case in point is multilingual systems: +how does one localize, e.g., a system built into a car to +the languages of all those customers to whom the +car is sold? This problem has been the main focus of +GF for some years, and the solution on which most work has been +done is the development of resource grammar libraries. +These libraries work in the same way as program libraries +in software engineering, enabling a division of labour +between linguists and domain experts. +

+

+One of the goals in the resource grammars of different +languages has been to provide a language-independent API, +which makes the same resource grammar functions available for +different languages. For instance, the categories +S, NP, and VP are available in all of the +10 languages currently supported, and so is the function +

+
+    PredVP : NP -> VP -> S
+
+

+which corresponds to the rule S -> NP VP in phrase +structure grammar. However, there are several levels of abstraction +between the function PredVP and the phrase structure rule, +because the rule is implemented in such different ways in different +languages. In particular, discontinuous constituents are needed to +varying degrees to make the rule work in different languages. +

+

+Now, dealing with discontinuous constituents is one of the demanding +aspects of multilingual grammar writing that the resource grammar +API is designed to hide. But the proposed treatment of integrated +multimodality is heavily dependent on similar things. What can we +do to make multimodal grammars easier to write (for different languages)? +There are two orthogonal answers: +

+
    +
  1. Use resource grammars to write a unimodal dialogue grammar and + then apply the multimodal + conversion to manually chosen parts. +
  2. Use multimodal resource grammars to derive multimodal + dialogue system grammars directly. +
+ +

+The multimodal resource grammar library has been obtained from +the unimodal one by applying the multimodal conversion manually. +In addition, the API has been simplified +by leaving out structures needed in written technical documents +(the original application area of GF) but not in spoken dialogue. +

+

+In the following subsections, we will show a part of the +multimodal resource grammar API, limited to a fragment that +is needed to get the main ideas and to reimplement the +Tram Demo grammar. The reimplementation shows one more advantage +of the resource grammar approach: dialogue systems can be +automatically instantiated to different languages. +

+ +

Resource grammar API

+

+The resource grammar API has three main kinds of entries: +

+
    +
  1. Language-independent linguistic structures ("linguistic ontology"), e.g. +
    +    PredVP : NP -> VP -> S ;     -- "Mary helps him"
    +
    +
  2. Language-specific syntax extensions, e.g. Swedish and German fronting +topicalization +
    +    TopicObj : NP -> VP -> S ;   -- "honom hjälper Mary"
    +
    +
  3. Language-specific lexical constructors, e.g. Germanic Ablaut patterns +
    +    irregV : (sing,sang,sung : Str) -> V ;
    +
    +
+ +

+The first two kinds of entries are cat and fun definitions +in an abstract syntax. The multimodal, restricted API has +e.g. the following categories. Their names are obtained from +the corresponding unimodal categories by prefixing M. +

+
+    MS ;     -- multimodal sentence or question
+    MQS ;    -- multimodal wh question
+    MImp ;   -- multimodal imperative
+    MVP ;    -- multimodal verb phrase
+    MNP ;    -- multimodal (demonstrative) noun phrase
+    MAdv ;   -- multimodal (demonstrative) adverbial
+  
+    Point ;  -- pointing gesture
+
+

+ +

Multimodal API: functions for building demonstratives

+

+Demonstrative pronouns can be used both as noun phrases and +as determiners. +

+
+      this_MNP    : Point -> MNP ;        -- this
+      thisDet_MNP : CN -> Point -> MNP ;  -- this car
+
+

+There are also demonstrative adverbs, and prepositions give +a productive way to build more adverbs. +

+
+      here_MAdv      : Point -> MAdv ;    -- here
+      here7from_MAdv : Point -> MAdv ;    -- from here
+  
+      MPrepNP : Prep -> MNP -> MAdv ;     -- in this car
+
+

+ +

Multimodal API: functions for building sentences and phrases

+

+A handful of predication rules construct sentences, questions, and imperatives. +

+
+      MPredVP   : MNP -> MVP -> MS ;    -- this plane flies here
+      MQPredVP  : MNP -> MVP -> MQS ;   -- does this plane fly here
+      MQuestVP  : IP  -> MVP -> MQS ;   -- who flies here
+      MImpVP    : MVP -> MImp ;         -- fly here!
+
+

+Verb phrases are constructed from verbs (inherited as such from +the unimodal API) by providing their complements. +

+
+      MUseV     : V   -> MVP ;          -- flies
+      MComplV2  : V2  -> MNP -> MVP ;   -- takes this
+      MComplVV  : VV  -> MVP -> MVP ;   -- wants to take this
+
+

+A multimodal adverb can be attached to a verb phrase. +

+
+      MAdvVP    : MVP -> MAdv -> MVP ;  -- flies here
+
+

+ +

Language-independent implementation: examples

+

+The implementation makes heavy use of the multimodal conversion +combinators. It adds a point field to whatever the implementation of the unimodal +category is in any language. Thus, for example +

+
+    lincat
+      MVP   = Dem VP ;
+      MNP   = Dem NP ;
+      MAdv  = Dem Adv ;
+  
+    lin 
+      this_MNP = mkDem NP this_NP ;
+      -- i.e. this_MNP p = this_NP ** {point = p.point} ;
+  
+      MComplV2 verb obj = mkDem VP (ComplV2 verb obj) obj ;
+  
+      MAdvVP vp adv = mkDem VP (AdvVP vp adv) (concatPoint vp adv) ;
+
+
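+Unfolding the combinator definitions from the section on multimodal conversion combinators, the rule for MAdvVP can be read as follows. This is a hand expansion for illustration, in the style of the comment on this_MNP, and not text from the library source. +

```gf
-- MAdvVP vp adv
--   = mkDem VP (AdvVP vp adv) (concatPoint vp adv)
--   = AdvVP vp adv ** concatPoint vp adv
--   = AdvVP vp adv ** {point = vp.point ++ adv.point} ;
```

+The pointings of the verb phrase precede those of the adverb, which matches a linearization such as "flies here" where the adverb follows the verb phrase. +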

+ +

Multimodal API: interface to unimodal expressions

+

+Using nondemonstrative expressions as demonstratives: +

+
+      DemNP   : NP  -> MNP ;
+      DemAdv  : Adv -> MAdv ;
+
+

+Building top-level phrases: +

+
+      PhrMS   : Pol -> MS   -> Phr ;
+      PhrMQS  : Pol -> MQS  -> Phr ;
+      PhrMImp : Pol -> MImp -> Phr ;
+
+

+ +

Instantiating multimodality to different languages

+

+The implementation above has only used the resource grammar API, +not the concrete implementations. The library Demonstrative +is a parametrized module, also called a functor, which +has the following structure +

+
+    incomplete concrete DemonstrativeI of Demonstrative = 
+      Cat, TenseX ** open Test, Structural in {
+      
+      -- lincat and lin rules
+  
+      }
+
+

+It can be instantiated to different languages as follows. +

+
+    concrete DemonstrativeEng of Demonstrative = 
+      CatEng, TenseX ** DemonstrativeI with
+        (Test = TestEng),
+        (Structural = StructuralEng) ;
+  
+    concrete DemonstrativeSwe of Demonstrative = 
+      CatSwe, TenseX ** DemonstrativeI with
+        (Test = TestSwe),
+        (Structural = StructuralSwe) ;
+
+

+ +

Language-independent reimplementation of TramDemo

+

+Again using the functor idea, we reimplement TramDemo +as follows: +

+
+  incomplete concrete TramI of Tram = open Multimodal in {
+  
+  lincat
+    Query = Phr ; Input = MS ; 
+    Dep, Dest = MAdv ; Click = Point ;
+  lin
+    QInput = PhrMS PPos ;
+  
+    GoFromTo x y = 
+      MPredVP (DemNP (UsePron i_Pron)) 
+        (MAdvVP (MAdvVP (MComplVV want_VV (MUseV go_V)) x) y) ;
+  
+    DepHere    = here7from_MAdv ;
+    DestHere   = here7to_MAdv ;
+    DepName s  = MPrepNP from_Prep (DemNP (UsePN (SymbPN (MkSymb s)))) ;
+    DestName s = MPrepNP to_Prep (DemNP (UsePN (SymbPN (MkSymb s)))) ;
+  }
+
+

+Then we can instantiate this to all languages for which +the Multimodal API has been implemented: +

+
+    concrete TramEng of Tram = TramI with 
+      (Multimodal = MultimodalEng) ;
+  
+    concrete TramSwe of Tram = TramI with 
+      (Multimodal = MultimodalSwe) ;
+  
+    concrete TramFre of Tram = TramI with 
+      (Multimodal = MultimodalFre) ;
+
+

+ +

The order problem

+

+It was pointed out in the section on the multimodal conversion that +the concrete word order may be different from the abstract one, +and vary between different languages. For instance, Swedish +topicalization +

+
+ Det här tåget vill den här kunden inte ta. +
+

+

+("this train, this customer doesn't want to take") may well have +an abstract syntax of a form in which the customer appears +before the train. +

+

+This is a problem for the implementor of the resource grammar. +It means that some parts of the resource must be written manually +and not as a functor. +However, the user of the resource can safely +ignore the word order problem, if it is correctly dealt with in +the resource. +

+ +

A recipe for using a resource library

+

+In the beginning, we believed that resource grammars were all that +an application grammarian needs to write a concrete syntax. +However, experience has shown that it can be hard to start +grammar development in this way: selecting functions from +a resource API requires more abstract thinking than just +writing things down (maybe even in a context-free grammar notation, +which GF also supports). This experience has led to the following +steps for grammar development. They permit +a quick start and then, towards the end, increase abstraction +so that the grammar can be localized to different languages. +

+
    +
  1. Encode domain ontology in an abstract syntax, Domain. +
  2. Write a rough concrete syntax in English, DomainRough. + This can be oversimplified and overgenerating. +
  3. Reimplement by resource, and build a functor DomainI. +
  4. Instantiate this functor to different languages, and test. +
  5. If a rule is not satisfactory in a language, use its resource in + a different way (compile-time transfer). +
+ + + + +