multimodal grammar document

2026-05-25 02:38:55 -06:00 · 2006-01-08 18:57:17 +00:00
parent 4dec64349a
commit 54d77b022f
1 changed files with 691 additions and 0 deletions
--- a/doc/multimodal.txt
+++ b/doc/multimodal.txt
@@ -0,0 +1,691 @@
+Multimodal Resource Grammars
+Author: Aarne Ranta <aarne (at) cs.chalmers.se>
+Last update: %%date(%c)
+
+% NOTE: this is a txt2tags file.
+% Create an html file from this file using:
+% txt2tags --toc multimodal.txt
+
+% Create a latex file from this file using:
+% txt2tags -ttex multimodal.txt
+
+%!target:html
+
+
+==Plan==
+
+After an introduction to **demonstratives**
+and **integrated multimodality**,
+we will show how multimodal grammars can be written in GF
+and how they can be used in dialogue systems.
+The explanation is given in three stages:
+
+ How to write a multimodal grammar by hand.
+ How to add multimodality to a unimodal grammar.
+ How to use a multimodal resource grammar.
+
+
+==Multimodal expressions==
+
+**Demonstrative expressions** are an old idea. Such
+expressions get their meaning from the context.
+
+	    //This train// is faster than //that airplane//.
+
+	    I want to go from //this place// to //this place//.
+
+In particular, as in these examples, the meaning
+can be obtained from accompanying pointing gestures.
+
+Thus the meaning-bearing unit is neither the words nor the
+gesture alone, but their combination. Demonstratives
+thus provide an example of **integrated multimodality**,
+as opposed to parallel multimodality. In parallel
+multimodality, speech and other modes of communication 
+are just alternative ways to convey the same information.
+
+
+===Representing demonstratives in semantics and grammar===
+
+When formalizing the semantics of demonstratives, we can combine syntax with coordinates:
+
+	    I want to go from this place to this place
+
+is interpreted as something like
+```
+  want(I, go, this(place,(123,45)), this(place,(98,10))) 
+```
+Now, the same semantic value can be given in many ways, by performing
+the clicks at different points of time in relation to the speech: 
+
+	    I want to go from this place CLICK(123,45) to this place CLICK(98,10) 
+
+	    I want to go from this place to this place CLICK(123,45) CLICK(98,10) 
+
+	    CLICK(123,45) CLICK(98,10) I want to go from this place to this place 
+
+How do we build the value compositionally in parsing?
+Traditional parsing is sequential: its input is a string of tokens.
+It works for demonstratives only if the pointing is adjacent to 
+the spoken expression. In the actual input, the demonstrative word
+can be separated from the accompanying click by other words. The two
+can also be simultaneous. 
+
+
+===Asynchronous syntax in GF===
+
+What we need is a notion of **asynchronous parsing**, as opposed to
+sequential parsing (where demonstrative words and clicks must be
+adjacent). 
+
+We can implement asynchronous parsing in GF by exploiting the generality
+of **linearization types**. A linearization type is the type of
+the **concrete syntax objects** assigned to semantic values.
+What a GF grammar defines is a relation 
+```
+  abstract syntax trees  ---  concrete syntax objects
+```
+When modelling context-free grammar in GF,
+the concrete syntax objects are just strings. 
+But they can be more structured objects as well - in general, they are
+**records** of different kinds of objects. For example,
+a demonstrative expression can be linearized into a record of two strings.
+```
+                                     {s = "this place" ;
+  this place (coord 123 45)  <--->    p = "(123,45)"
+                                     }
+```
+The record
+```
+  {s = "I want to go from this place to this place" ;
+   p = "(123,45) (98,10)"
+  }
+```
+represents any combination of the sentence and the clicks, as long
+as the clicks appear in this order.
+
+
+===Example multimodal grammar: abstract syntax===
+
+A simple example of a multimodal GF grammar is the one called
+the Tram Demo grammar. It was written by Björn Bringert within
+the TALK project as a part of a dialogue system that
+deals with queries about tram timetables. The system interprets
+a speech input in combination with clicks on a digital map.
+
+The abstract syntax of (a minimal fragment of) the Tram Demo
+grammar is
+```
+cat
+  Input, Dep, Dest, Click ;
+fun
+  GoFromTo    : Dep  -> Dest -> Input ; -- "I want to go from x to y"
+  DepClick    : Click -> Dep ;          -- "from here" with click
+  DestClick   : Click -> Dest ;         -- "to here" with click
+
+  CCoord      : Int -> Int -> Click ;   -- click coordinates
+```
+An English concrete syntax of the grammar is
+```
+lincat
+  Input, Dep, Dest = {s : Str ; p : Str} ;
+  Click            = {p : Str} ;
+
+lin
+  GoFromTo x y  = {s = ["I want to go"] ++ x.s ++ y.s ; p = x.p ++ y.p} ;
+  DepClick c    = {s = ["from here"]                  ; p = c.p} ;
+  DestClick c   = {s = ["to here"]                    ; p = c.p} ;
+
+  CCoord x y    = {p = "(" ++ x.s ++ "," ++ y.s ++ ")"} ;
+```
+When the grammar is used in the actual system, standard parsing methods
+are used for interpreting the integrated speech and click input.
+Parsing appears on two levels: the speech input parsing
+performed by the Nuance speech recognition program (without the clicks),
+and the semantics-yielding parser sending input to the dialogue manager.
+The latter parser just attaches the clicks to the speech input. The order
+of the clicks is preserved, and the parser can hence associate each of
+the clicks with proper demonstratives. Here is the grammar used in the
+two parsing phases.
+```
+cat
+  Query,    -- whole content
+  Speech ;  -- speech only
+fun
+  QueryInput  : Input -> Query ;   -- the whole content shown
+  SpeechInput : Input -> Speech ;  -- only the speech shown
+
+lincat
+  Query, Speech = {s : Str} ;
+lin
+  QueryInput i  = {s = i.s ++ ";" ++ i.p} ;
+  SpeechInput i = {s = i.s} ;
+```
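+As a worked illustration of the rules above, consider a query with two
+clicks, using the coordinates of the earlier example. The tree is chosen
+here for illustration. Assuming that integer literals linearize to their
+digits in the field ``s``, and that tokens joined by ``++`` are printed
+with spaces between them, the records should come out as follows:
+```
+-- the Input subtree
+GoFromTo (DepClick (CCoord 123 45)) (DestClick (CCoord 98 10))
+  -->  {s = "I want to go from here to here" ;
+        p = "( 123 , 45 ) ( 98 , 10 )"}
+
+-- wrapped for the dialogue manager
+QueryInput (...)   -->  {s = "I want to go from here to here ; ( 123 , 45 ) ( 98 , 10 )"}
+
+-- wrapped for the speech recognizer
+SpeechInput (...)  -->  {s = "I want to go from here to here"}
+```
+The clicks do not appear in the ``Speech`` string at all, and in the
+``Query`` string they follow the words in their original order.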
+
+
+===Digression: discontinuous constituents===
+
+The GF representation of integrated multimodality is 
+similar to the representation of **discontinuous constituents**.
+For instance, assume //has arrived// is a verb phrase in English,
+which can be used both in declarative sentences and questions,
+
+	she //has arrived//
+
+	//has// she //arrived//
+
+In the question, the two words are separated from each other. If
+//has arrived// is a constituent of the question, it is thus discontinuous.
+To represent such constituents in GF, records can be used:
+we split verb phrases (``VP``) into a finite and infinitive part.
+```
+  lincat VP = {fin, inf : Str} ;
+
+  lin Indic np vp = {s = np.s ++ vp.fin ++ vp.inf} ;
+  lin Quest np vp = {s = vp.fin ++ np.s ++ vp.inf} ;
+```
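+For instance, with the following constants, the two rules place the parts
+of the same verb phrase differently. The names ``She`` and ``HasArrived``
+are hypothetical, added here for illustration, and the noun phrase is
+assumed to have the linearization type ``{s : Str}``.
+```
+  lin She        = {s = "she"} ;
+  lin HasArrived = {fin = "has" ; inf = "arrived"} ;
+
+  -- Indic She HasArrived  -->  {s = "she has arrived"}
+  -- Quest She HasArrived  -->  {s = "has she arrived"}
+```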
+
+==From grammars to dialogue systems==
+
+The general recipe for using GF when building dialogue systems
+is to write a grammar with the following components:
+
+- The abstract syntax defines the semantics (the "ontology")
+  of the domain of the system.
+- The concrete syntaxes define alternative modes of input and output.
+
+
+The engineering advantages of this approach have to do partly with
+the declarativity of the description, partly with the tools provided
+by GF to derive different components of the system:
+
+- The type checker guarantees that all the input and output
+  modes match with the ontology.
+- The grammar compiler generates parsers for each input grammar
+  and generators for each output grammar.
+- Translators between GF's abstract syntax and other ontology
+  description languages enable communication with different 
+  kinds of dialogue managers and cover e.g. Prolog terms and XML objects.
+- Translators from GF's concrete syntax to speech recognition formats
+  make it possible to generate e.g. Nuance grammars and ATK language
+  models. 
+
+
+An example of this process is Björn Bringert's TramDemo.
+More recently, grammars have been integrated into the GoDiS dialogue
+manager via Prolog representations of abstract syntax.
+
+
+==Adding multimodality to a unimodal grammar==
+
+This section gives a recipe for converting a unimodal grammar to
+multimodal, by adding pointing gestures to expressions. The recipe
+guarantees that the resulting grammar remains semantically well-formed,
+i.e. type correct.
+
+
+===The multimodal conversion===
+
+The **multimodal conversion** of a grammar consists of three
+steps involving a decision, and four derivative steps:
+
+ (Decision) Decide which categories are demonstrative. This means that their
+  expressions can (but need not) contain pointing gestures.
+ (Decision) Define constructors that are truly demonstrative, i.e. take
+  a pointing gesture as an argument. These constructors have the form
+```
+   fun d : ... -> Point -> D 
+```
+  In the simplest case, such a //d// is an already existing
+  constructor, to which a ``Point`` argument is added. But it is also
+  possible to add new constructors.
+ (Derivative) Add an extra ``point`` field to the linearization type //L// of any
+  demonstrative category //D//:
+```
+    lincat D = L ** {point : Str} ;
+```
+ (Derivative) Add an extra ``point`` field to the linearization //t// of any
+  constructor //d// that has been made demonstrative:
+```
+    lin d x1 ... xn p = t x1 ... xn ** p ;
+```
+ (Decision) Define the linearization rules of those demonstrative constructors
+  that are new.
+ (Derivative) If some other category //C// has a constructor //f// that takes
+  demonstratives as arguments, make it demonstrative by adding a //point// field
+  to its linearization type.
+ (Derivative) For each constructor //f// that takes demonstratives //D_1,...,D_n// 
+  as arguments, collect the //point// fields of the arguments in the //point//
+  field of the value:
+```
+  lin f x_1 ... x_m = t x_1 ... x_m ** {point = x_d1.point ++ ... ++ x_dn.point} ;
+```
+  Make sure that the pointings ``x_d1.point ... x_dn.point`` are concatenated
+  in the same order as the arguments appear in the //linearization// //t//,
+  which is not necessarily the same as the abstract argument order.
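+
+
+The ordering condition in the last step can be illustrated by a sketch.
+The constructor ``GoToFrom`` below is hypothetical, and its categories are
+assumed to have the linearization type ``{s : Str ; point : Str}``.
+Because the destination is linearized before the departure, its pointing
+must also come first:
+```
+  fun GoToFrom : Dep -> Dest -> Input ;
+
+  lin GoToFrom x y =
+    {s     = y.s ++ ["I want to go"] ++ x.s ;   -- "to here I want to go from here"
+     point = y.point ++ x.point                 -- same order as in s
+    } ;
+```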
+
+
+===An example of the conversion===
+
+Start with a Tram Demo grammar with no demonstratives, but just 
+tram stop names and the indexical //here// (referring to the user's 
+standing place). 
+```
+cat
+  Input, Dep, Dest, Name ;
+fun
+  GoFromTo    : Dep  -> Dest -> Input ;
+  DepHere     : Dep ;                  
+  DestHere    : Dest ;                 
+  DepName     : Name -> Dep ;          
+  DestName    : Name -> Dest ;         
+
+  Almedal     : Name ;                 
+```
+A unimodal English concrete syntax of the grammar is
+```
+lincat
+  Input, Dep, Dest, Name = {s : Str} ;
+
+lin
+  GoFromTo x y  = {s = ["I want to go"] ++ x.s ++ y.s} ;
+  DepHere       = {s = ["from here"]} ;
+  DestHere      = {s = ["to here"]} ;
+  DepName n     = {s = ["from"] ++ n.s} ;
+  DestName n    = {s = ["to"] ++ n.s} ;
+
+  Almedal       = {s = "Almedal"} ;
+```
+We now decide that the categories ``Dep`` and ``Dest`` are demonstrative.
+This means, derivatively, that ``Input`` is also demonstrative.
+But ``Name`` remains unimodal.
+
+We also decide that ``DepHere`` and ``DestHere`` involve a pointing gesture.
+This has consequences for ``GoFromTo`` but not for the other constructors.
+However, even here we have to add an empty pointing sequence if required by the
+linearization type. 
+
+In the resulting grammar, one category is added and 
+two functions are changed in the abstract syntax:
+```
+cat
+  Point ;
+fun
+  DepHere     : Point -> Dep ;
+  DestHere    : Point -> Dest ;
+
+```
+The concrete syntax in its entirety looks as follows:
+```
+lincat
+  Input, Dep, Dest = {s : Str ; point : Str} ;
+  Name = {s : Str} ;
+  Point = {point : Str} ;
+lin
+  GoFromTo x y  = {s = ["I want to go"] ++ x.s ++ y.s ; 
+                   point = x.point ++ y.point
+                  } ;
+  DepHere p     = {s = ["from here"] ;
+                   point = p.point
+                  } ;
+  DestHere p    = {s = ["to here"] ;
+                   point = p.point
+                  } ;
+  DepName n     = {s = ["from"] ++ n.s ;
+                   point = []
+                  } ;
+  DestName n    = {s = ["to"] ++ n.s ;
+                   point = []
+                  } ;
+  Almedal       = {s = "Almedal"} ;
+```
+What we need in addition, to use the grammar in applications, are
+
+ Constructors for ``Point``, e.g. coordinate pairs.
+ Top-level categories, like ``Query`` and ``Speech`` in the original.
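+
+
+A minimal sketch of these two additions, modelled on the original Tram
+Demo grammar. The constructor name ``PCoord`` is an assumption made here;
+it plays the role of ``CCoord`` in the original.
+```
+cat
+  Query ; Speech ;
+fun
+  PCoord      : Int -> Int -> Point ;   -- click coordinates
+  QueryInput  : Input -> Query ;        -- the whole content shown
+  SpeechInput : Input -> Speech ;       -- only the speech shown
+
+lincat
+  Query, Speech = {s : Str} ;
+lin
+  PCoord x y    = {point = "(" ++ x.s ++ "," ++ y.s ++ ")"} ;
+  QueryInput i  = {s = i.s ++ ";" ++ i.point} ;
+  SpeechInput i = {s = i.s} ;
+```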
+
+
+
+
+===Multimodal conversion combinators===
+
+GF is a functional programming language, and we exploit this
+by providing a set of combinators that makes the multimodal conversion easier
+and clearer. We start with the type of sequences of pointing gestures.
+```
+    Point : Type = {point : Str} ;
+```
+To make a record type multimodal is to extend it with ``Point``.
+The record extension operator ``**`` is needed here.
+```
+    Dem   : Type -> Type = \t -> t ** Point ;
+```
+To construct, use, and concatenate pointings:
+```
+    mkPoint : Str -> Point = \s -> {point = s} ;
+
+    noPoint : Point = mkPoint [] ;
+
+    point   : Point -> Str = \p -> p.point ;
+
+    concatPoint : (x,y : Point) -> Point = \x,y -> 
+      mkPoint (point x ++ point y) ;
+```
+Finally, to add pointing to a record, with the limiting case in which no pointing is needed:
+```
+    mkDem : (t : Type) -> t -> Point -> Dem t = \_,x,s -> x ** s ;
+
+    nonDem : (t : Type) -> t -> Dem t = \t,x -> mkDem t x noPoint ;
+```
+Let us rewrite the Tram Demo grammar by using these combinators:
+```
+oper
+  SS : Type = {s : Str} ;
+lincat
+  Input, Dep, Dest = Dem SS ; 
+  Name = SS ;
+
+lin
+  GoFromTo x y  = {s = ["I want to go"] ++ x.s ++ y.s} ** concatPoint x y ;
+  DepHere       = mkDem  SS {s = ["from here"]} ;
+  DestHere      = mkDem  SS {s = ["to here"]} ;
+  DepName n     = nonDem SS {s = ["from"] ++ n.s} ;
+  DestName n    = nonDem SS {s = ["to"] ++ n.s} ;
+
+  Almedal       = {s = "Almedal"} ;
+```
+The type synonym ``SS`` is introduced to make the combinator applications
+concise. Notice the use of partial application in ``DepHere`` and
+``DestHere``; an equivalent way to write ``DepHere`` is
+```
+  DepHere p     = mkDem  SS {s = ["from here"]} p ;
+```
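+Unfolding the definitions step by step shows that the combinator version
+yields the same records as the hand-written conversion. For ``DepName``:
+```
+  DepName n
+    = nonDem SS {s = ["from"] ++ n.s}
+    = mkDem SS {s = ["from"] ++ n.s} noPoint
+    = {s = ["from"] ++ n.s} ** {point = []}
+    = {s = ["from"] ++ n.s ; point = []}
+```
+In ``GoFromTo``, the arguments ``x`` and ``y`` have type ``Dem SS`` rather
+than ``Point``. The application ``concatPoint x y`` relies on record
+subtyping, by which a record with extra fields can be used where a
+``Point`` is expected.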
+
+
+==Multimodal resource grammars==
+
+The main advantage of using GF when building dialogue systems is
+that various components of the system
+can be automatically generated from GF grammars.
+Writing grammars, however, can still be a considerable
+task. Multilingual systems are a case in point:
+how can one localize e.g. a system built into a car to
+the languages of all those customers to whom the
+car is sold? This problem has been the main focus of
+GF for some years, and the solution being pursued
+is the development of **resource grammar libraries**.
+These libraries work in the same way as program libraries
+in software engineering, enabling a division of labour
+between, in the present case, linguists and domain experts.
+
+One of the challenges in the resource grammars of different
+languages has been to provide a **language-independent API**,
+which makes the same resource grammar functions available for
+different languages. For instance, the categories
+``S``, ``NP``, and ``VP`` are available in all of the
+10 languages currently supported, and so is the function
+```
+  PredVP : NP -> VP -> S
+```
+which corresponds to the rule ``S -> NP VP`` in phrase
+structure grammar. However, there are several levels of abstraction
+between the function ``PredVP`` and the phrase structure rule,
+because the rule is implemented in such different ways in different
+languages. In particular, discontinuous constituents are needed in
+various degrees to make the rule work in different languages.
+
+Now, dealing with discontinuous constituents is one of the demanding
+aspects of multilingual grammar writing that the resource grammar 
+API is designed to hide. But the proposed treatment of integrated
+multimodality is heavily dependent on similar things. What can we
+do to make multimodal grammars easier to write (for different languages)?
+There are two orthogonal answers:
+
+ Use resource grammars as before and then apply the multimodal
+  conversion to manually chosen parts.
+ Use **multimodal resource grammars** to derive multimodal
+  dialogue system grammars automatically.
+
+
+The multimodal resource grammar library has been obtained from
+the unimodal one by applying, manually, an idea similar to the
+multimodal conversion. In addition, the API has been simplified
+by leaving out structures needed in written technical documents 
+(the original application area of GF) but not in spoken dialogue.
+
+In the following subsections, we will show a part of the
+multimodal resource grammar API, limited to a fragment that
+is needed to get the main ideas and to reimplement the
+Tram Demo grammar. The reimplementation shows one more advantage
+of the resource grammar approach: dialogue systems can be 
+automatically instantiated to different languages.
+
+
+
+
+===Resource grammar API===
+
+The resource grammar API has three main kinds of entries:
+
+ Language-independent linguistic structures ("linguistic ontology"), e.g.
+```
+  PredVP : NP -> VP -> S ;     -- "Mary helps him"
+```
+ Language-specific syntax extensions, e.g. Swedish and German fronting
+topicalization
+```
+  TopicObj : NP -> VP -> S ;   -- "honom hjälper Mary"
+```
+ Language-specific lexical constructors, e.g. Germanic //Ablaut// patterns
+```
+  irregV : (sing,sang,sung : Str) -> V ;
+```
+
+
+The first two kinds of entries are ``cat`` and ``fun`` definitions
+in an abstract syntax. The multimodal, restricted API has
+e.g. the following categories. Their names are obtained from
+the corresponding unimodal categories by prefixing ``M``.
+```
+  MS ;     -- multimodal sentence or question
+  MQS ;    -- multimodal wh question
+  MImp ;   -- multimodal imperative
+  MVP ;    -- multimodal verb phrase
+  MNP ;    -- multimodal (demonstrative) noun phrase
+  MAdv ;   -- multimodal (demonstrative) adverbial
+
+  Point ;  -- pointing gesture
+```
+
+
+
+===Multimodal API: functions for building demonstratives===
+
+Demonstrative pronouns can be used both as noun phrases and
+as determiners.
+```
+    this_MNP    : Point -> MNP ;        -- this
+    thisDet_MNP : CN -> Point -> MNP ;  -- this car
+```
+There are also demonstrative adverbs, and prepositions give
+a productive way to build more adverbs.
+```
+    here_MAdv      : Point -> MAdv ;    -- here
+    here7from_MAdv : Point -> MAdv ;    -- from here
+
+    MPrepNP : Prep -> MNP -> MAdv ;     -- in this car
+```
+
+
+===Multimodal API: functions for building sentences and phrases===
+
+A handful of predication rules construct sentences, questions, and imperatives.
+```
+    MPredVP   : MNP -> MVP -> MS ;    -- this plane flies here
+    MQPredVP  : MNP -> MVP -> MQS ;   -- does this plane fly here
+    MQuestVP  : IP  -> MVP -> MQS ;   -- who flies here
+    MImpVP    : MVP -> MImp ;         -- fly here!
+```
+Verb phrases are constructed from verbs (inherited as such from
+the unimodal API) by providing their complements.
+```
+    MUseV     : V   -> MVP ;          -- flies
+    MComplV2  : V2  -> MNP -> MVP ;   -- takes this
+    MComplVV  : VV  -> MVP -> MVP ;   -- wants to take this
+```
+A multimodal adverb can be attached to a verb phrase.
+```
+    MAdvVP    : MVP -> MAdv -> MVP ;  -- flies here
+```
+
+
+
+
+===Language-independent implementation: examples===
+
+The implementation makes heavy use of the multimodal conversion
+combinators. It adds a ``point`` field to whatever the implementation of the unimodal
+category is in any language. Thus, for example
+```
+  lincat
+    MVP   = Dem VP ;
+    MNP   = Dem NP ;
+    MAdv  = Dem Adv ;
+
+  lin 
+    this_MNP = mkDem NP this_NP ;
+    -- i.e. this_MNP p = this_NP ** {point = p.point} ;
+
+    MComplV2 verb obj = mkDem VP (ComplV2 verb obj) obj ;
+
+    MAdvVP vp adv = mkDem VP (AdvVP vp adv) (concatPoint vp adv) ;
+```
+
+
+
+===Multimodal API: interface to unimodal expressions===
+
+Using nondemonstrative expressions as demonstratives:
+```
+    DemNP   : NP  -> MNP ;
+    DemAdv  : Adv -> MAdv ;
+```
+Building top-level phrases:
+```
+    PhrMS   : Pol -> MS   -> Phr ;
+    PhrMQS  : Pol -> MQS  -> Phr ;
+    PhrMImp : Pol -> MImp -> Phr ;
+```
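+One possible implementation of the coercions ``DemNP`` and ``DemAdv``,
+following the pattern of the previous section. This is a sketch under the
+assumption that ``nonDem`` is in scope in the implementation; the
+library's actual rules may differ.
+```
+  lin
+    DemNP  np  = nonDem NP  np ;    -- i.e. np  ** {point = []}
+    DemAdv adv = nonDem Adv adv ;   -- i.e. adv ** {point = []}
+```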
+
+
+===Instantiating multimodality to different languages===
+
+The implementation above has only used the resource grammar API,
+not the concrete implementations. The library ``Demonstrative``
+is a **parametrized module**, also called a **functor**, which
+has the following structure
+```
+  incomplete concrete DemonstrativeI of Demonstrative = 
+    Cat, TenseX ** open Test, Structural in {
+    
+    -- lincat and lin rules
+
+    }
+```
+It can be **instantiated** to different languages as follows.
+```
+  concrete DemonstrativeEng of Demonstrative = 
+    CatEng, TenseX ** DemonstrativeI with
+      (Test = TestEng),
+      (Structural = StructuralEng) ;
+
+  concrete DemonstrativeSwe of Demonstrative = 
+    CatSwe, TenseX ** DemonstrativeI with
+      (Test = TestSwe),
+      (Structural = StructuralSwe) ;
+```
+
+
+
+===Language-independent reimplementation of TramDemo===
+
+Again using the functor idea, we reimplement ``TramDemo``
+as follows:
+```
+incomplete concrete TramI of Tram = open Multimodal in {
+
+lincat
+  Query = Phr ; Input = MS ; 
+  Dep, Dest = MAdv ; Click = Point ;
+lin
+  QInput = PhrMS PPos ;
+
+  GoFromTo x y = 
+    MPredVP (DemNP (UsePron i_Pron)) 
+      (MAdvVP (MAdvVP (MComplVV want_VV (MUseV go_V)) x) y) ;
+
+  DepHere    = here7from_MAdv ;
+  DestHere   = here7to_MAdv ;
+  DepName s  = MPrepNP from_Prep (DemNP (UsePN (SymbPN (MkSymb s)))) ;
+  DestName s = MPrepNP to_Prep (DemNP (UsePN (SymbPN (MkSymb s)))) ;
+}
+```
+Then we can instantiate this to all languages for which
+the ``Multimodal`` API has been implemented:
+```
+  concrete TramEng of Tram = TramI with 
+    (Multimodal = MultimodalEng) ;
+
+  concrete TramSwe of Tram = TramI with 
+    (Multimodal = MultimodalSwe) ;
+
+  concrete TramFre of Tram = TramI with 
+    (Multimodal = MultimodalFre) ;
+```
+
+
+
+==A problem: switched order==
+
+It was pointed out in the section on the multimodal conversion that
+the concrete word order may be different from the abstract one,
+and vary between different languages. For instance, Swedish
+topicalization
+
+	 Det här tåget vill den här kunden inte ta.
+
+("this train, this customer doesn't want to take") may well have
+an abstract syntax of a form in which the customer appears 
+before the train.
+
+This is a problem for the implementor of the resource grammar.
+It means that some parts of the resource must be written manually
+and not as a functor.
+However, the //user// of the resource can safely
+ignore the word order problem, if it is correctly dealt with in
+the resource.
+
+
+==A recipe for using a resource library==
+
+In the beginning, we believed resource grammars are all that
+an application grammarian needs to write a concrete syntax.
+However, experience has shown that it can be heavy to start
+the grammar development in this way: selecting functions from
+a resource API requires more abstract thinking than just
+writing things (maybe even in a context-free grammar notation,
+also supported by GF). This experience has led to the following
+steps for grammar development, which at the same time give
+the work a quick start and in the end use increased abstraction
+to localize the grammar in different languages.
+
+ Encode domain ontology in an abstract syntax, ``Domain``.
+ Write a rough concrete syntax in English, ``DomainRough``.
+  This can be oversimplified and overgenerating.
+ Reimplement by resource, and build a functor ``DomainI``.
+ Instantiate this functor to different languages, and test.
+ If a rule is unsatisfactory in a language, use the resource in
+  a different way (**compile-time transfer**).
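+
+
+The first two steps might look as follows for a toy domain. The categories
+and functions are invented here for illustration; only the module names
+``Domain`` and ``DomainRough`` come from the recipe. Each module goes in
+its own file.
+```
+-- Domain.gf
+abstract Domain = {
+  cat
+    Input ; Place ;
+  fun
+    GoTo    : Place -> Input ;
+    Almedal : Place ;
+}
+
+-- DomainRough.gf
+concrete DomainRough of Domain = {
+  lincat
+    Input, Place = {s : Str} ;
+  lin
+    GoTo p  = {s = ["I want to go to"] ++ p.s} ;
+    Almedal = {s = "Almedal"} ;
+}
+```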
+
+