mirror of
https://github.com/GrammaticalFramework/gf-core.git
synced 2026-05-27 11:48:55 -06:00
gslt sem presentation
@@ -14,13 +14,19 @@ Last update: %%date(%c)
|
|||||||
|
|
||||||
==Setting==
|
==Setting==
|
||||||
|
|
||||||
Funding
|
Current funding
|
||||||
- VR: Library-Based Grammar Engineering (2006-2008)
|
- VR: Library-Based Grammar Engineering (2006-2008)
|
||||||
|
- Lars Borin (Swedish)
|
||||||
|
- Robin Cooper (Computational Linguistics)
|
||||||
|
- Sibylle Schupp and Aarne Ranta (Computer Science)
|
||||||
|
|
||||||
|
|
||||||
|
Previous funding
|
||||||
- VR: Record Types and Dialogue Semantics (2003-2005)
|
- VR: Record Types and Dialogue Semantics (2003-2005)
|
||||||
- VINNOVA: Interactive Language Technology (2001-2004)
|
- VINNOVA: Interactive Language Technology (2001-2004)
|
||||||
|
|
||||||
|
|
||||||
Applications
|
Main applications
|
||||||
- TALK: multilingual and multimodal dialogue systems
|
- TALK: multilingual and multimodal dialogue systems
|
||||||
- WebALT: multilingual generation of mathematical teaching material
|
- WebALT: multilingual generation of mathematical teaching material
|
||||||
- KeY: multilingual authoring of software specifications
|
- KeY: multilingual authoring of software specifications
|
||||||
@@ -30,16 +36,15 @@ Applications
|
|||||||
|
|
||||||
==People==
|
==People==
|
||||||
|
|
||||||
Staff:
|
Staff contributions to grammar libraries:
|
||||||
- Björn Bringert
|
- Björn Bringert
|
||||||
- Markus Forsberg
|
- Markus Forsberg
|
||||||
- Harald Hammarström
|
- Harald Hammarström
|
||||||
- Janna Khegai
|
- Janna Khegai
|
||||||
- Peter Ljunglöf
|
|
||||||
- Aarne Ranta
|
- Aarne Ranta
|
||||||
|
|
||||||
|
|
||||||
Student projects:
|
Student projects on libraries:
|
||||||
- Inger Andersson & Therese Söderberg: Spanish morphology
|
- Inger Andersson & Therese Söderberg: Spanish morphology
|
||||||
- Ludmilla Bogavac: Russian morphology
|
- Ludmilla Bogavac: Russian morphology
|
||||||
- Ali El Dada: Arabic morphology and syntax
|
- Ali El Dada: Arabic morphology and syntax
|
||||||
@@ -47,6 +52,7 @@ Student projects:
|
|||||||
- Michael Pellauer: Estonian morphology
|
- Michael Pellauer: Estonian morphology
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
|
|
||||||
==Software Libraries==
|
==Software Libraries==
|
||||||
@@ -97,8 +103,13 @@ Possible ways to do this:
|
|||||||
yesButton swedish = button "Ja"
|
yesButton swedish = button "Ja"
|
||||||
yesButton finnish = button "Kyllä"
|
yesButton finnish = button "Kyllä"
|
||||||
```
|
```
|
||||||
|
|
||||||
+ Hire more programmers to perform localization in different languages
|
+ Hire more programmers to perform localization in different languages
|
||||||
+ Use a library ``GUIText`` such that you can write
|
|
||||||
|
|
||||||
|
#NEW
|
||||||
|
|
||||||
|
3. Use a library ``GUIText`` such that you can write
|
||||||
```
|
```
|
||||||
yesButton lang = button (render lang GUIText.Yes)
|
yesButton lang = button (render lang GUIText.Yes)
|
||||||
```
|
```
|
||||||
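The third option can be sketched in a few lines. This is a minimal illustration in Python, not the actual library: `GUIText`, `render`, and `button` follow the names used above, while the `Language` enum and the lookup-table implementation are assumptions made for the example.

```python
from enum import Enum


class Language(Enum):
    SWEDISH = "swedish"
    FINNISH = "finnish"


class GUIText(Enum):
    # Language-independent message identifiers (only Yes appears in the text).
    YES = "Yes"


# Hypothetical realization table; a real library would compute these
# strings from a grammar instead of storing them.
_REALIZATIONS = {
    (Language.SWEDISH, GUIText.YES): "Ja",
    (Language.FINNISH, GUIText.YES): "Kyllä",
}


def render(lang: Language, text: GUIText) -> str:
    """Return the string for an abstract GUI text in the given language."""
    return _REALIZATIONS[(lang, text)]


def button(label: str) -> str:
    """Stand-in for a GUI toolkit call; it only formats the label."""
    return f"[ {label} ]"


def yes_button(lang: Language) -> str:
    # One definition serves every language; no per-language branches.
    return button(render(lang, GUIText.YES))
```

The application code mentions only `GUIText.YES`, so adding a language means extending the library, not the program.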
@@ -194,17 +205,18 @@ You may want to write
|
|||||||
For this purpose, you need a library with the following API
|
For this purpose, you need a library with the following API
|
||||||
(Application Programmer's Interface):
|
(Application Programmer's Interface):
|
||||||
```
|
```
|
||||||
Have : NounPhrase -> NounPhrase -> Sentence
|
Have : NounPhrase -> NounPhrase -> Sentence
|
||||||
|
|
||||||
PolYou, FamYou, I : NounPhrase
|
PolYou : NounPhrase
|
||||||
|
FamYou : NounPhrase
|
||||||
|
|
||||||
Num : Int -> Noun -> NounPhrase
|
Num : Int -> Noun -> NounPhrase
|
||||||
|
|
||||||
Message, Jewel, Surprise : Noun
|
Message : Noun
|
||||||
```
|
```
|
||||||
You also need a top-level rendering function
|
You also need a top-level rendering function
|
||||||
```
|
```
|
||||||
render : Language -> Sentence -> String
|
render : Language -> Sentence -> String
|
||||||
```
|
```
|
||||||
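A rough Python model of this API may help to see what the library has to do. The constructor names `Have`, `Num`, `PolYou`, `FamYou`, and `Message` come from the signatures above; the word tables, the restriction of the subject to a pronoun, and the two-language `Language` enum are simplifying assumptions for the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Language(Enum):
    ENGLISH = 1
    SWEDISH = 2


class Noun(Enum):
    MESSAGE = "Message"


class Pronoun(Enum):
    POL_YOU = "PolYou"
    FAM_YOU = "FamYou"


@dataclass(frozen=True)
class Num:
    """Num : Int -> Noun -> NounPhrase"""
    count: int
    noun: Noun


@dataclass(frozen=True)
class Have:
    """Have : NounPhrase -> NounPhrase -> Sentence (subject limited to pronouns here)."""
    subject: Pronoun
    obj: Num


# Assumed toy lexicon: pronoun forms, (singular, plural) noun forms, and the verb.
_PRONOUNS = {
    Language.ENGLISH: {Pronoun.POL_YOU: "you", Pronoun.FAM_YOU: "you"},
    Language.SWEDISH: {Pronoun.POL_YOU: "ni", Pronoun.FAM_YOU: "du"},
}
_NOUNS = {
    Language.ENGLISH: {Noun.MESSAGE: ("message", "messages")},
    Language.SWEDISH: {Noun.MESSAGE: ("meddelande", "meddelanden")},
}
_HAVE = {Language.ENGLISH: "have", Language.SWEDISH: "har"}


def render(lang: Language, sentence: Have) -> str:
    """render : Language -> Sentence -> String, with number agreement on the noun."""
    singular, plural = _NOUNS[lang][sentence.obj.noun]
    noun_form = singular if sentence.obj.count == 1 else plural
    subject = _PRONOUNS[lang][sentence.subject]
    return f"{subject} {_HAVE[lang]} {sentence.obj.count} {noun_form}"
```

Even in this toy version the linguistic knowledge (plural forms, polite versus familiar pronouns) lives in the library, and the caller only builds a tree such as `Have(Pronoun.FAM_YOU, Num(3, Noun.MESSAGE))`.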
|
|
||||||
|
|
||||||
@@ -223,6 +235,9 @@ To this end, the API should provide the top-level function
|
|||||||
```
|
```
|
||||||
The library that we will present actually has this as well!
|
The library that we will present actually has this as well!
|
||||||
|
|
||||||
|
|
||||||
|
#NEW
|
||||||
|
|
||||||
The only complication is that ``parse`` does not always return
|
The only complication is that ``parse`` does not always return
|
||||||
just one sentence. Those may be zero:
|
just one sentence. Those may be zero:
|
||||||
```
|
```
|
||||||
@@ -230,8 +245,8 @@ just one sentence. Those may be zero:
|
|||||||
```
|
```
|
||||||
or many:
|
or many:
|
||||||
```
|
```
|
||||||
Have PolYou (Num n Message)
|
Have PolYou (Num n Message)
|
||||||
Have FamYou (Num n Message)
|
Have FamYou (Num n Message)
|
||||||
Have PlurYou (Num n Message)
|
Have PlurYou (Num n Message)
|
||||||
```
|
```
|
||||||
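The zero and many cases can be reproduced with a small generate-and-test parser. This Python sketch is an illustration only: the tree names come from the example above, while `render_english`, `parse_english`, and the regular expression are hypothetical helpers, and trees are represented as plain strings.

```python
import re

# The three abstract subjects that English realizes with the same word.
SUBJECTS = ("PolYou", "FamYou", "PlurYou")


def render_english(subject: str, n: int) -> str:
    """Linearize Have <subject> (Num n Message); the subject is always "you"."""
    noun = "message" if n == 1 else "messages"
    return f"you have {n} {noun}"


def parse_english(text: str) -> list[str]:
    """Return every tree whose English rendering is exactly `text`."""
    match = re.fullmatch(r"you have (\d+) messages?", text)
    if match is None:
        return []  # zero results: the string is outside the grammar
    n = int(match.group(1))
    return [
        f"Have {subject} (Num {n} Message)"
        for subject in SUBJECTS
        if render_english(subject, n) == text
    ]
```

A caller therefore has to handle an empty list (reject the input) and a list with several trees (ask the user, or pick one by context).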
|
|
||||||
@@ -251,7 +266,7 @@ can have different realizations in different languages.
|
|||||||
Therefore we also need **realization functions**,
|
Therefore we also need **realization functions**,
|
||||||
```
|
```
|
||||||
render : Language -> Sentence -> String
|
render : Language -> Sentence -> String
|
||||||
parse : Language -> String -> [Sentence]
|
parse : Language -> String -> [Sentence]
|
||||||
```
|
```
|
||||||
Both of them require major linguistic expertise to write - but,
|
Both of them require major linguistic expertise to write - but,
|
||||||
once this is done, they can be used with very little linguistic
|
once this is done, they can be used with very little linguistic
|
||||||
@@ -272,15 +287,19 @@ In GF,
|
|||||||
- realization functions = **concrete syntax**
|
- realization functions = **concrete syntax**
|
||||||
|
|
||||||
|
|
||||||
Example:
|
#NEW
|
||||||
|
|
||||||
|
Simplest possible example:
|
||||||
```
|
```
|
||||||
abstract GUIText = {
|
abstract GUIText = {
|
||||||
cat Text ;
|
cat Text ;
|
||||||
fun Yes : Text ;
|
fun Yes : Text ;
|
||||||
}
|
}
|
||||||
|
|
||||||
concrete GUITextEng of GUIText = {
|
concrete GUITextEng of GUIText = {
|
||||||
lin Yes = ss "yes" ;
|
lin Yes = ss "yes" ;
|
||||||
}
|
}
|
||||||
|
|
||||||
concrete GUITextFin of GUIText = {
|
concrete GUITextFin of GUIText = {
|
||||||
lin Yes = ss "kyllä" ;
|
lin Yes = ss "kyllä" ;
|
||||||
}
|
}
|
||||||
@@ -302,8 +321,8 @@ The GF formalism moreover has the property of **reversibility**:
|
|||||||
a set of linearization rules automatically generates a parser as
|
a set of linearization rules automatically generates a parser as
|
||||||
well.
|
well.
|
||||||
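For a grammar as small as `GUIText`, reversibility can be imitated by inverting a table. The Python sketch below is a deliberately naive model: the concrete syntax names and strings are taken from the example above, but the dictionary representation and the functions `linearize`, `build_parser`, and `translate` are assumptions. GF itself derives a parser from general linearization rules, not from a finite table.

```python
from collections import defaultdict

# Each concrete syntax maps abstract functions to strings.
LINEARIZATIONS = {
    "GUITextEng": {"Yes": "yes"},
    "GUITextFin": {"Yes": "kyllä"},
}


def linearize(concrete: str, tree: str) -> str:
    """Map an abstract tree to its string in one concrete syntax."""
    return LINEARIZATIONS[concrete][tree]


def build_parser(concrete: str):
    """Derive a parser from the linearization table by inverting it."""
    inverse = defaultdict(list)
    for tree, string in LINEARIZATIONS[concrete].items():
        inverse[string].append(tree)

    def parse(string: str) -> list[str]:
        return list(inverse.get(string, []))

    return parse


def translate(source: str, target: str, string: str) -> list[str]:
    """Parse in the source language, then linearize in the target language."""
    return [linearize(target, tree) for tree in build_parser(source)(string)]
```

Translation falls out as parsing followed by linearization through the shared abstract syntax.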
|
|
||||||
While reversibility has a minor importance for the applications
|
%While reversibility has a minor importance for the applications
|
||||||
shown above, it is crucial for other applications of GF grammars.
|
%shown above, it is crucial for other applications of GF grammars.
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
@@ -312,34 +331,24 @@ shown above, it is crucial for other applications of GF grammars.
|
|||||||
|
|
||||||
**multilingual grammar** = abstract syntax + concrete syntaxes
|
**multilingual grammar** = abstract syntax + concrete syntaxes
|
||||||
|
|
||||||
Early instances of the idea (from 1998) - **application grammars**:
|
Examples of the idea:
|
||||||
- multilingual authoring
|
- multilingual authoring
|
||||||
- domain-specific translation
|
- domain-specific translation
|
||||||
- dialogue systems
|
- dialogue systems
|
||||||
|
|
||||||
|
|
||||||
Later development (from 2001) - **resource grammars**:
|
|
||||||
- grammar libraries with language-independent APIs
|
|
||||||
|
|
||||||
|
|
||||||
Of course, one important use of resource grammars is
|
|
||||||
to help writing application grammars in GF.
|
|
||||||
|
|
||||||
In addition to GF itself, GF grammars can be accessed in
|
|
||||||
Haskell, Prolog, and Java programs.
|
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
|
|
||||||
==Domain, ontology, idiom==
|
==Domain, ontology, idiom==
|
||||||
|
|
||||||
An abstract syntax can represent
|
An abstract syntax represents
|
||||||
- a **semantic model**
|
- a **semantic model**
|
||||||
- an **ontology**
|
- an **ontology**
|
||||||
|
|
||||||
|
|
||||||
The concrete syntax defines how the **concepts** of the ontology
|
The concrete syntax defines how the concepts of the ontology
|
||||||
are represented in natural language (or in a formal language).
|
are represented in a language.
|
||||||
|
|
||||||
The following requirements are made:
|
The following requirements are made:
|
||||||
- linguistic correctness (inflection, agreement, word order,...)
|
- linguistic correctness (inflection, agreement, word order,...)
|
||||||
@@ -406,17 +415,17 @@ All these sentences are grammatically incorrect.
|
|||||||
|
|
||||||
==Solving the difficulties==
|
==Solving the difficulties==
|
||||||
|
|
||||||
GF has tools for expressing the linguistic rules that are needed to
|
GF can express the linguistic rules that are needed to
|
||||||
produce correct translations in different languages. (Expressive power
|
produce correct translations. (Expressive power
|
||||||
between TAG and HPSG.)
|
between TAG and HPSG, but the language is more high-level.)
|
||||||
|
|
||||||
Instead of just strings, we need parameters**, **tables**,
|
Instead of just strings, we need **parameters**, **tables**,
|
||||||
and **record types**. For instance, French:
|
and **record types**. For instance, French:
|
||||||
```
|
```
|
||||||
param Mod = Ind | Subj ;
|
param Mod = Ind | Subj ;
|
||||||
param Gen = Masc | Fem ;
|
param Gen = Masc | Fem ;
|
||||||
|
|
||||||
lincat Nat = {s : Str ; g : Gen} ;
|
lincat Nat = {s : Str ; g : Gen} ;
|
||||||
lincat Prop = {s : Mod => Str} ;
|
lincat Prop = {s : Mod => Str} ;
|
||||||
|
|
||||||
lin Even x = {s =
|
lin Even x = {s =
|
||||||
@@ -424,12 +433,29 @@ and **record types**. For instance, French:
|
|||||||
m => x.s ++
|
m => x.s ++
|
||||||
case m of {Ind => "est" ; Subj => "soit"} ++
|
case m of {Ind => "est" ; Subj => "soit"} ++
|
||||||
case x.g of {Masc => "pair" ; Fem => "paire"}
|
case x.g of {Masc => "pair" ; Fem => "paire"}
|
||||||
}
|
}
|
||||||
} ;
|
} ;
|
||||||
```
|
```
|
||||||
Linguistic knowledge dominates in the size of this grammar.
|
Linguistic knowledge dominates in the size of this grammar.
|
||||||
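To see how parameters, tables, and records work together, the French rule for `Even` can be mimicked in Python. This is only an analogy under stated assumptions: `Mod`, `Gen`, `Nat`, and the word forms come from the GF code above, while the dictionary standing in for a GF table and the sample `Nat` values in the usage note are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Mod(Enum):  # param Mod = Ind | Subj
    IND = "Ind"
    SUBJ = "Subj"


class Gen(Enum):  # param Gen = Masc | Fem
    MASC = "Masc"
    FEM = "Fem"


@dataclass(frozen=True)
class Nat:
    """lincat Nat = {s : Str ; g : Gen}: a string with an inherent gender."""
    s: str
    g: Gen


def even(x: Nat) -> dict[Mod, str]:
    """lin Even x: a table from mood to string (lincat Prop = {s : Mod => Str})."""
    copula = {Mod.IND: "est", Mod.SUBJ: "soit"}
    adjective = {Gen.MASC: "pair", Gen.FEM: "paire"}
    # The verb form depends on the mood; the adjective agrees with x's gender.
    return {m: f"{x.s} {copula[m]} {adjective[x.g]}" for m in Mod}
```

For instance, `even(Nat("2", Gen.MASC))[Mod.IND]` gives `2 est pair`, and a feminine argument selects `paire`; the mood is left open until the proposition is used in a larger sentence.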
|
|
||||||
|
|
||||||
|
#NEW
|
||||||
|
|
||||||
|
==Application grammars vs. resource grammars==
|
||||||
|
|
||||||
|
Application grammar ("semantic grammar")
|
||||||
|
- abstract syntax: domain semantics
|
||||||
|
- concrete syntax: "controlled language"
|
||||||
|
- author: domain expert
|
||||||
|
|
||||||
|
|
||||||
|
Resource grammar ("syntactic grammar")
|
||||||
|
- abstract syntax: linguistic structures
|
||||||
|
- concrete syntax: (approximation of) entire language
|
||||||
|
- author: linguist
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
|
|
||||||
==Concrete syntax using library==
|
==Concrete syntax using library==
|
||||||
@@ -457,7 +483,7 @@ Notice: choice of adjective is domain expert knowledge.
|
|||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
==Questions in grammar library design==
|
==Design questions for the grammar library==
|
||||||
|
|
||||||
What should there be in the library?
|
What should there be in the library?
|
||||||
- morphology, lexicon, syntax, semantics,...
|
- morphology, lexicon, syntax, semantics,...
|
||||||
@@ -468,7 +494,7 @@ How do we organize and present the library?
|
|||||||
- "school grammar" vs. sophisticated linguistic concepts
|
- "school grammar" vs. sophisticated linguistic concepts
|
||||||
|
|
||||||
|
|
||||||
Where do we get the data from?
|
Where to get the data from?
|
||||||
- automatic extraction or hand-writing?
|
- automatic extraction or hand-writing?
|
||||||
- reuse of existing resources?
|
- reuse of existing resources?
|
||||||
|
|
||||||
@@ -478,14 +504,14 @@ hence cannot use existing proprietary resources.
|
|||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
==Answers to questions in grammar library design==
|
==Design decisions==
|
||||||
|
|
||||||
The current GF resource grammar library has, for each language,
|
The current GF resource grammar library has, for each language,
|
||||||
- complete morphology
|
- complete morphology
|
||||||
- lexicon of the most important structural words
|
- lexicon of the most important structural words
|
||||||
- test lexicon of ca. 300 content words
|
- test lexicon of ca. 300 content words
|
||||||
- representative fragment of syntax
|
- representative fragment of syntax (cf. CLE (Core Language Engine))
|
||||||
- very little semantics,
|
- rather flat semantics (cf. Quasi-Logical Form of CLE)
|
||||||
|
|
||||||
|
|
||||||
Organization and presentation:
|
Organization and presentation:
|
||||||
@@ -497,7 +523,7 @@ Organization and presentation:
|
|||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
==Answers to questions in grammar library design. cont'd==
|
==Design decisions, cont'd==
|
||||||
|
|
||||||
Where do we get the data from?
|
Where do we get the data from?
|
||||||
- morphology and syntax are hand-written
|
- morphology and syntax are hand-written
|
||||||
@@ -506,6 +532,7 @@ Where do we get the data from?
|
|||||||
- tool for automatic lexicon extraction
|
- tool for automatic lexicon extraction
|
||||||
- we have not reused existing resources
|
- we have not reused existing resources
|
||||||
|
|
||||||
|
|
||||||
The resource grammar library is entirely
|
The resource grammar library is entirely
|
||||||
open-source free software (under GNU GPL license).
|
open-source free software (under GNU GPL license).
|
||||||
|
|
||||||
@@ -513,27 +540,12 @@ open-source free software (under GNU GPL license).
|
|||||||
|
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
|
||||||
==The scope of a resource grammar library for a language==
|
|
||||||
|
|
||||||
All morphological paradigms
|
|
||||||
|
|
||||||
Basic lexicon of structural, common, and irregular words
|
|
||||||
|
|
||||||
Basic syntactic structures (approx. those of CLE, Core Language Engine)
|
|
||||||
|
|
||||||
Currently,
|
|
||||||
- //no// semantics,
|
|
||||||
- //no// language-specific structures if not necessary for expressivity.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
==Success criteria==
|
==Success criteria==
|
||||||
|
|
||||||
Grammatical correctness
|
Grammatical correctness of everything generated.
|
||||||
|
|
||||||
Semantic coverage: you can express whatever you want.
|
Semantic coverage: you can express whatever you want.
|
||||||
|
|
||||||
@@ -548,24 +560,18 @@ families, using the module system of GF.
|
|||||||
==These are not our success criteria==
|
==These are not our success criteria==
|
||||||
|
|
||||||
Language coverage: to be able to parse all expressions.
|
Language coverage: to be able to parse all expressions.
|
||||||
|
- Example: French //passé simple//, although covered by the
|
||||||
|
morphology, is not available through the language-independent API.
|
||||||
|
|
||||||
Example:
|
|
||||||
the French //passé simple// tense, although covered by the
|
|
||||||
morphology, is not used in the language-independent API, but
|
|
||||||
only the //passé composé// is. However, an application
|
|
||||||
accessing the French-specific (or Romance-specific)
|
|
||||||
modules can use the passé simple.
|
|
||||||
|
|
||||||
Semantic correctness: only to produce meaningful expressions.
|
Semantic correctness: only to produce meaningful expressions.
|
||||||
|
- Example: the following sentences can be generated
|
||||||
Example: the following sentences can be generated
|
|
||||||
```
|
```
|
||||||
colourless green ideas sleep furiously
|
colourless green ideas sleep furiously
|
||||||
|
|
||||||
the time is seventy past forty-two
|
the time is seventy past forty-two
|
||||||
```
|
```
|
||||||
However, an applicatio grammar can use a domain-specific
|
|
||||||
semantics to guarantee semantic well-formedness.
|
|
||||||
|
|
||||||
(Warning for linguists:) theoretical innovation in
|
(Warning for linguists:) theoretical innovation in
|
||||||
syntax is not among the goals
|
syntax is not among the goals
|
||||||
@@ -576,6 +582,9 @@ syntax is not among the goals
|
|||||||
#NEW
|
#NEW
|
||||||
==So where is semantics?==
|
==So where is semantics?==
|
||||||
|
|
||||||
|
Application grammars typically use domain-specific
|
||||||
|
semantics to guarantee semantic well-formedness.
|
||||||
|
|
||||||
GF incorporates a **Logical Framework** and is therefore
|
GF incorporates a **Logical Framework** and is therefore
|
||||||
capable of expressing logical semantics //à la// Montague
|
capable of expressing logical semantics //à la// Montague
|
||||||
or any other flavour, including anaphora and discourse.
|
or any other flavour, including anaphora and discourse.
|
||||||
@@ -588,6 +597,29 @@ Instead, we expect semantics to be given in
|
|||||||
of different domains.
|
of different domains.
|
||||||
|
|
||||||
|
|
||||||
|
#NEW
|
||||||
|
==Levels of representation==
|
||||||
|
|
||||||
|
No fixed set of levels; here are some examples:
|
||||||
|
```
|
||||||
|
2 is even
|
||||||
|
2 är jämnt
|
||||||
|
```
|
||||||
|
In ``Arithm``
|
||||||
|
```
|
||||||
|
Even 2
|
||||||
|
```
|
||||||
|
In ``Predication`` (high level resource API)
|
||||||
|
```
|
||||||
|
predA (IntNP 2) (regA "even")
|
||||||
|
predA (IntNP 2) (regA "jämn")
|
||||||
|
```
|
||||||
|
In ``Lang`` (ground level resource API)
|
||||||
|
```
|
||||||
|
UseCl TPres ASimul PPos (PredVP (UsePN (IntPN 2)) (UseComp (CompAP (PositA (regA "even")))))
|
||||||
|
UseCl TPres ASimul PPos (PredVP (UsePN (IntPN 2)) (UseComp (CompAP (PositA (regA "jämn")))))
|
||||||
|
```
|
||||||
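The step from the ``Arithm`` level to the ``Predication`` level can be pictured as a per-language expansion of one tree into another. The Python sketch below is only a model of that idea: `predA`, `IntNP`, `regA`, and the two adjectives come from the examples above, whereas the tuple representation of trees and the helpers `expand_even` and `show` are assumptions.

```python
# Adjective chosen for Even in each language (taken from the examples above).
ADJECTIVES = {"Eng": "even", "Swe": "jämn"}


def expand_even(lang: str, n: int) -> tuple:
    """Expand the Arithm tree `Even n` into a Predication-level tree."""
    return ("predA", ("IntNP", n), ("regA", ADJECTIVES[lang]))


def show(tree, top: bool = True) -> str:
    """Print a tree in prefix notation, parenthesizing nested applications."""
    if not isinstance(tree, tuple):
        # Leaves: strings are quoted, numbers are printed as they are.
        return f'"{tree}"' if isinstance(tree, str) else str(tree)
    head, *args = tree
    text = " ".join([head] + [show(arg, top=False) for arg in args])
    return text if top else f"({text})"
```

Here `show(expand_even("Swe", 2))` yields `predA (IntNP 2) (regA "jämn")`. The domain-level tree stays the same across languages, and only the expansion differs.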
|
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
@@ -729,17 +761,16 @@ Example heuristic, from [ParadigsSwe gfdoc/ParadigmsSwe.html]:
|
|||||||
==Some formats that can be generated from GF grammars==
|
==Some formats that can be generated from GF grammars==
|
||||||
|
|
||||||
```
|
```
|
||||||
-printer=lbnf BNF Converter, thereby C/Bison, Java/JavaCup
|
-printer=lbnf BNF Converter, thereby C/Bison, Java/JavaCup
|
||||||
-printer=fullform full-form lexicon, short format
|
-printer=fullform full-form lexicon, short format
|
||||||
-printer=xml XML: DTD for the pg command, object for st
|
-printer=xml XML: DTD for the pg command, object for st
|
||||||
-printer=gsl Nuance GSL speech recognition grammar
|
-printer=gsl Nuance GSL speech recognition grammar
|
||||||
-printer=jsgf Java Speech Grammar Format
|
-printer=jsgf Java Speech Grammar Format
|
||||||
-printer=srgs_xml SRGS XML format
|
-printer=srgs_xml SRGS XML format
|
||||||
-printer=srgs_xml_prob SRGS XML format, with weights
|
-printer=srgs_xml_prob SRGS XML format, with weights
|
||||||
-printer=slf a finite automaton in the HTK SLF format
|
-printer=slf a finite automaton in the HTK SLF format
|
||||||
-printer=regular a regular grammar in a simple BNF
|
-printer=regular a regular grammar in a simple BNF
|
||||||
-printer=gfc-prolog gfc in prolog format (also pg)
|
-printer=gfc-prolog gfc in prolog format (also pg)
|
||||||
|
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
@@ -749,15 +780,16 @@ Example heuristic, from [ParadigsSwe gfdoc/ParadigmsSwe.html]:
|
|||||||
The most general format is **multilingual treebank** generation:
|
The most general format is **multilingual treebank** generation:
|
||||||
```
|
```
|
||||||
> gr -tr | l -multi
|
> gr -tr | l -multi
|
||||||
Freeze (All Fruit)
|
UseCl TCond AAnter PPos (PredVP (DetCN (DetSg DefSg NoOrd)
|
||||||
|
(AdjCN (PositA young_A) (UseN man_N))) (ComplV2 love_V2 (UsePron she_Pron)))
|
||||||
|
|
||||||
all fruits freeze
|
den unga mannen skulle ha älskat henne
|
||||||
kaikki hedelmät jäätyvät
|
|
||||||
alla frukter fryser
|
der junge Mann würde sie geliebt haben
|
||||||
alle frukter fryser
|
|
||||||
todas las frutas congelan
|
le jeune homme l' aurait aimée
|
||||||
tutte le frutte gelano
|
|
||||||
tous les fruits gèlent
|
the young man would have loved her
|
||||||
```
|
```
|
||||||
A special case is corpus generation, either exhaustive or random with
|
A special case is corpus generation, either exhaustive or random with
|
||||||
or without probability weights attached to constructors.
|
or without probability weights attached to constructors.
|
||||||
@@ -765,6 +797,16 @@ or without probability weights attached to constructors.
|
|||||||
Cf. Rebecca Jonson this afternoon.
|
Cf. Rebecca Jonson this afternoon.
|
||||||
|
|
||||||
|
|
||||||
|
#NEW
|
||||||
|
==Use as program components==
|
||||||
|
|
||||||
|
Haskell, Java, Prolog
|
||||||
|
|
||||||
|
Parsing, generation, translation
|
||||||
|
|
||||||
|
Push-button creation of spoken language translators (using Nuance)
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
==Related work==
|
==Related work==
|
||||||
|
|
||||||
@@ -772,7 +814,8 @@ CLE = Core Language Engine
|
|||||||
- the closest point of comparison as for coverage and purpose
|
- the closest point of comparison as for coverage and purpose
|
||||||
- resource API similar to "Quasi-Logical Form"
|
- resource API similar to "Quasi-Logical Form"
|
||||||
- parametrized modules instead of grammar porting via macro packages
|
- parametrized modules instead of grammar porting via macro packages
|
||||||
- grammar specialization via partial evaluatio instead of explanation-based learning
|
- grammar specialization via partial evaluation instead of explanation-based learning
|
||||||
|
- therefore, transfer at compile time as often as possible
|
||||||
|
|
||||||
|
|
||||||
Lingo Matrix project (HPSG)
|
Lingo Matrix project (HPSG)
|
||||||
|
|||||||